On 2009-01-09, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:
>> Marc 'BlackJack' Rintsch wrote:
>>> def iter_max_values(blocks, block_count):
>>>     for i, block in enumerate(blocks):
>>>         histogram = defaultdict(int)
>>>         for byte in block:
>>>             histogram[byte] += 1
>>>         yield max((count, value)
>>>                   for value, count in histogram.iteritems())[1]
>> [snip]
>> Would it be faster if histogram was a list initialised to [0] * 256?
> Don't know.  Then for every byte in the 2 GiB we have to call `ord()`.
> Maybe the speedup from the list compensates for this, maybe not.
> I think that having to do something with *every* byte of that really
> large file *at Python level* is the main problem here.  In C that's just
> some primitive numbers.  Python has all the object overhead.

Using buffers or arrays of bytes instead of strings/lists would
probably reduce the overhead quite a bit.
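
A rough, unbenchmarked sketch of that idea, assuming each block arrives as
a byte string and that `bytearray` is available (Python 2.6 or later). The
helper name `most_common_byte` is made up for illustration. Iterating a
`bytearray` yields small integers directly, so the 256-slot list suggested
above can be indexed without an `ord()` call per byte:

```python
def most_common_byte(block):
    """Return the most frequent byte value (0-255) in one block."""
    histogram = [0] * 256
    # A bytearray yields ints when iterated, so no ord() is needed.
    for value in bytearray(block):
        histogram[value] += 1
    # On ties, index() returns the lowest byte value with the top count.
    return histogram.index(max(histogram))


def iter_max_values(blocks):
    """Yield the most frequent byte value of each block in turn."""
    for block in blocks:
        yield most_common_byte(block)
```

The loop still touches every byte at Python level, so whether this is
actually faster than the defaultdict version would have to be measured.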

Grant Edwards                   grante             Yow! I've got an IDEA!!
                                  at               Why don't I STARE at you
                               visi.com            so HARD, you forget your
                                                   SOCIAL SECURITY NUMBER!!
