Blind Anagram wrote:
> I would be grateful for any advice people can offer on the fastest way
> to count items in a sub-sequence of a large list.
>
> I have a list of boolean values that can contain many hundreds of
> millions of elements for which I want to count the number of True values
> in a sub-sequence, one from the start up to some value (say hi).
>
> I am currently using:
>
>     sieve[:hi].count(True)
>
> but I believe this may be costly because it copies a possibly large part
> of the sieve.
>
> Ideally I would like to be able to use:
>
>     sieve.count(True, hi)
>
> where 'hi' sets the end of the count but this function is, sadly, not
> available for lists.
>
> The use of a bytearray with a memoryview object instead of a list solves
> this particular problem but it is not a solution for me as it creates
> more problems than it solves in other aspects of the program.
>
> Can I assume that one possible solution would be to sub-class list and
> create a C based extension to provide list.count(value, limit)?
>
> Are there any other solutions that will avoid copying a large part of
> the list?
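
If you want to keep the plain list, one way to avoid building the slice is to count through an iterator. The sketch below is only an illustration: `count_range` is a hypothetical helper name, and it relies on nothing beyond `itertools.islice` and `operator.countOf` from the standard library. It saves the temporary list, but every element up to `hi` is still visited, so whether it beats `sieve[:hi].count(True)` in speed is something you would have to measure.

```python
import operator
from itertools import islice


def count_range(seq, value, lo=0, hi=None):
    """Hypothetical helper: count items equal to `value` in seq[lo:hi]
    without creating the slice.

    islice() yields the elements lazily, so no temporary list of
    hi - lo items is allocated. islice still has to step over the
    first `lo` items, so the cost is O(hi), not O(hi - lo).
    lo and hi must be non-negative (islice rejects negative indices).
    """
    # operator.countOf accepts any iterable and compares items with ==
    return operator.countOf(islice(seq, lo, hi), value)


sieve = [True, False, False] * 10
print(count_range(sieve, True, hi=10))  # 4, same as sieve[:10].count(True)
```

Like `list.count`, this compares with `==`, so on a list that mixes types `1` would also match `True`. For a list holding only booleans that makes no difference.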

If the list doesn't change often, you can convert it to a string:

>>> items = [True, False, False] * 10
>>> sitems = "".join("FT"[i] for i in items)
>>> sitems
'TFFTFFTFFTFFTFFTFFTFFTFFTFFTFF'
>>> sitems.count("T", 3, 10)
3
>>> sitems.count("F", 3, 10)
4

Or you can use a[3:10].sum() on a boolean numpy array. Its slices are views rather than copies:

>>> import numpy
>>> a = numpy.array([True, False, False] * 10)
>>> a[3:10].sum()
3

-- 
http://mail.python.org/mailman/listinfo/python-list