Duncan Booth, 30.04.2010 10:20:
So more than 3GB just for the strings (and that's for Python 2.x on
Python 3.x you'll need nearly 5GB).

Running on a 64 bit version of Python should be fine, but for a 32 bit
system a naive approach just isn't going to work.

Option 1: use a trie. That should reduce storage, maybe it will reduce
it enough, maybe not. It depends on the data.

Depending on the implementation and the data, a trie can also use a lot /more/ space than the set of strings that it contains. The 83 million 14 character strings can well include a relatively large subset of the possible permutations (imagine purely numeric, hex-numeric or lower-case alpha-numeric strings, for example), so the trie will likely need to branch very often with very low intermediate run length. If you use pointers for trie branches, that's at least 8 bytes per branch on a 64bit system, versus 1 byte per character in a byte string list. Depending on the ratio of branches to characters, one or the other may win. So a "naive approach" likely won't work for tries either.

Stefan

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to