On 2008-07-31 02:29, [EMAIL PROTECTED] wrote:
Are there any techniques I can use to strip a dictionary data
structure down to the smallest memory overhead possible?

I'm working on a project where my available RAM is limited to 2 GB
and I would like to use very large dictionaries instead of a
traditional database.

Background: I'm trying to identify duplicate records in very
large text-based transaction logs. I detect duplicates by
creating a SHA1 checksum of each record and using that checksum
as a dictionary key. This works great except for several files
so large that their checksum dictionaries no longer fit in my
workstation's 2 GB of RAM.
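
One quick saving before you go to disk: if you only need membership
tests, store the 20-byte binary digests (.digest() instead of
.hexdigest()) in a set rather than a dictionary. That cuts the
per-entry payload roughly in half and drops the unused values. A
sketch, with a hypothetical log file name:

    import hashlib

    seen = set()                                # a set, not a dict: no values needed
    duplicates = []

    for lineno, record in enumerate(open('transactions.log', 'rb')):
        digest = hashlib.sha1(record).digest()  # 20 raw bytes vs. 40 hex characters
        if digest in seen:
            duplicates.append(lineno)
        else:
            seen.add(digest)

If even that doesn't fit, you'll need to move the lookup structure
out of RAM.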

If you don't have a problem with taking a small performance hit,
I'd suggest having a look at mxBeeBase, which is an on-disk
dictionary implementation:

    http://www.egenix.com/products/python/mxBase/mxBeeBase/
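
Untested sketch of what that could look like; the BeeDict
constructor and method names here are assumptions based on the
mxBeeBase documentation, so check it against the docs:

    import hashlib
    from mx.BeeBase import BeeDict

    # Sketch only: API names assumed from the mxBeeBase docs, not tested here.
    seen = BeeDict.BeeDict('checksums')            # backing files created on disk

    for record in open('transactions.log', 'rb'):  # hypothetical log name
        digest = hashlib.sha1(record).hexdigest()
        if seen.has_key(digest):
            pass                                   # duplicate record
        else:
            seen[digest] = 1

    seen.commit()                                  # BeeDicts are transactional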

Of course, you could also use a database table for this. Together
with a proper index on the checksum column, that should work as
well (though it's likely slower than mxBeeBase).
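
For the database route, the sqlite3 module in the standard library
(Python 2.5 and later) is enough; here's a minimal sketch with a
hypothetical file name and schema, where the PRIMARY KEY on the
digest column doubles as the index:

    import hashlib
    import sqlite3

    conn = sqlite3.connect('checksums.db')         # hypothetical file name
    conn.execute('CREATE TABLE IF NOT EXISTS seen (digest BLOB PRIMARY KEY)')

    def is_duplicate(record):
        # INSERT OR IGNORE changes no rows when the digest is already present
        digest = sqlite3.Binary(hashlib.sha1(record).digest())
        cur = conn.execute('INSERT OR IGNORE INTO seen (digest) VALUES (?)',
                           (digest,))
        return cur.rowcount == 0

SQLite keeps its B-tree on disk, so memory use stays flat no matter
how many digests you store.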

--
Marc-Andre Lemburg
eGenix.com
