> This again has the "costs" I referred to:
> creating a potentially large sequence,
> and shuffling it.
I thought I would see if I could do better, so I wrote this:

<code>
import random

def index_types(n, typelist=[True, False]):
    numtypes = len(typelist)
    total = float(n * numtypes)
    counts = [0] * numtypes
    while total > 0:
        index = int(random.random() * numtypes)
        if counts[index] < n:
            counts[index] += 1
            total -= 1
            yield typelist[index]

def shuffle_types(n, typelist=[True, False]):
    types = typelist * n
    random.shuffle(types)
    for next_type in types:
        yield next_type

def random_types(n, typelist=[True, False]):
    type0, type1 = typelist
    ct0, ct1 = 0, 0
    while ct0 + ct1 < 2 * n:
        if random.random() < ((n - ct0) / (2 * n - ct0 - ct1)):
            next_type = type0
            ct0 += 1
        else:
            next_type = type1
            ct1 += 1
        yield next_type

def test_index(n):
    for x in index_types(n):
        pass

def test_shuffle(n):
    for x in shuffle_types(n):
        pass

def test_random(n):
    for x in random_types(n):
        pass

if __name__ == '__main__':
    from time import sleep
    sleep(10)
    from timeit import Timer
    for function in ['test_index', 'test_shuffle', 'test_random']:
        for nvalue in [1000, 10000, 100000, 1000000]:
            t = Timer("%s(%d)" % (function, nvalue),
                      "from __main__ import %s" % function)
            print function, 'n =', nvalue, ':', t.timeit(number=100)
</code>

Which yielded the following results:

[EMAIL PROTECTED]:~$ python test.py
test_index n = 1000 : 0.599834918976
test_index n = 10000 : 5.78634595871
test_index n = 100000 : 56.1441719532
test_index n = 1000000 : 562.577429056
test_shuffle n = 1000 : 0.420483827591
test_shuffle n = 10000 : 4.62663197517
test_shuffle n = 100000 : 46.3557109833
test_shuffle n = 1000000 : 498.563408852
test_random n = 1000 : 0.440869092941
test_random n = 10000 : 4.77765703201
test_random n = 100000 : 47.6845810413
test_random n = 1000000 : 485.233494043

Which shows my function is the slowest (doh!) and your second function is the fastest. However, shuffle doesn't take much longer to create and randomize a fairly large list (4.99 secs per run vs. 4.85 secs for your second function at n = 1000000).
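As a quick sanity check (separate from the timings), the counting approach really does yield each value exactly n times, even for a typelist longer than two items. Here is a condensed, self-contained restatement of index_types above, with a Counter check of my own added:

```python
import random
from collections import Counter

def index_types(n, typelist=(True, False)):
    """Yield each element of typelist exactly n times, in random order,
    by skipping indices whose quota is already used up."""
    numtypes = len(typelist)
    total = n * numtypes          # items still to emit
    counts = [0] * numtypes       # how many of each we have emitted
    while total > 0:
        index = int(random.random() * numtypes)
        if counts[index] < n:     # quota for this type not yet exhausted
            counts[index] += 1
            total -= 1
            yield typelist[index]

# Works for an arbitrarily long typelist, not just two items:
counts = Counter(index_types(5, ('a', 'b', 'c')))
print(counts['a'], counts['b'], counts['c'])  # prints: 5 5 5
```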
In my defense, I wanted to see whether I could write the function to take an arbitrarily long list, rather than limiting it to two items.

I also noticed your second function is not working as intended:

>>> r = [x for x in test.random_types(10)]
>>> r
[False, False, False, False, False, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True]

I think it needs a cast to a float: with integer operands, (n-ct0)/(2*n-ct0-ct1) truncates to 0, so the comparison is always false and type1 is always chosen until its quota is exhausted, at which point the ratio becomes 1 and only type0 is emitted.

<code>
if random.random() < (float(n - ct0) / (2 * n - ct0 - ct1)):
</code>

I was too impatient to wait for the n = 1000000 results, so here it is without them:

[EMAIL PROTECTED]:~$ python test.py
test_index n = 1000 : 0.583498001099
test_index n = 10000 : 5.80185317993
test_index n = 100000 : 58.8963599205
test_shuffle n = 1000 : 0.431984901428
test_shuffle n = 10000 : 4.75261592865
test_shuffle n = 100000 : 48.1326880455
test_random n = 1000 : 0.697184085846
test_random n = 10000 : 4.41986989975
test_random n = 100000 : 45.7090520859

Once again, the difference is negligible.

My question is: does this function reduce the "randomness" of the data? Doesn't it push the data towards an alternating pattern of True, False? It seems to me that calling shuffle, or perhaps chaining PRNGs as suggested by Steven:

> If you really care about shuffling large lists, you can chain PRNGs. For
> instance, you might use shuffle() to generate a random batch of items,
> then use a completely different PRNG to shuffle the list again.

would produce the most "random" data...

Pete
--
http://mail.python.org/mailman/listinfo/python-list
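P.S. A quick check that the float() cast actually fixes random_types — this is a self-contained sketch of the corrected function (under Python 3 the cast is redundant, since / is already true division, but it doesn't hurt):

```python
import random

def random_types(n, typelist=(True, False)):
    """Yield n of each of the two values in typelist, picking the next
    value with probability proportional to how many of it remain."""
    type0, type1 = typelist
    ct0, ct1 = 0, 0
    while ct0 + ct1 < 2 * n:
        # float() forces true division under Python 2; without it the
        # ratio truncates to 0 (and later jumps to 1), producing the
        # all-False-then-all-True output shown above.
        if random.random() < (float(n - ct0) / (2 * n - ct0 - ct1)):
            next_type = type0
            ct0 += 1
        else:
            next_type = type1
            ct1 += 1
        yield next_type

r = list(random_types(10))
print(r.count(True), r.count(False))  # prints: 10 10
```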