Marc 'BlackJack' Rintsch wrote:
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:

As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
thing in C also:

Yours took ~37 minutes for 2 GiB here.  This takes "just" ~15 minutes:

#!/usr/bin/env python
from __future__ import division, with_statement
import os
import sys
from collections import defaultdict
from functools import partial
from itertools import imap


def iter_max_values(blocks, block_count):
    for i, block in enumerate(blocks):
        histogram = defaultdict(int)
        for byte in block:
            histogram[byte] += 1
        yield max((count, value)
                  for value, count in histogram.iteritems())[1]
[snip]
Would it be faster if histogram was a list initialised to [0] * 256?
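
For reference, a minimal sketch of what the list-based variant might look like. It is not from the original script: the block_count parameter is dropped because the quoted part never uses it, and bytearray() is my own choice for getting integer byte values (0..255) that can index the list directly, on both Python 2 and 3. One visible difference is that it yields integers rather than one-character strings. Whether it is actually faster would need timing on the real data.

```python
def iter_max_values(blocks):
    """Yield the most frequent byte value (as an int) of each block."""
    for block in blocks:
        # One counter per possible byte value instead of a defaultdict.
        histogram = [0] * 256
        # bytearray() iterates as integers 0..255 for str (Python 2)
        # and bytes (Python 3) alike, so no ord() call is needed.
        for byte in bytearray(block):
            histogram[byte] += 1
        # Same tie-breaking as max((count, value), ...) on the dict:
        # on equal counts the higher byte value wins.
        yield max((count, value)
                  for value, count in enumerate(histogram))[1]
```

Called as list(iter_max_values([b"aab", b"ab"])), "a" wins the first block on count and "b" wins the second on the tie-break.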
--
http://mail.python.org/mailman/listinfo/python-list
