I have a very large dictionary object that is built from a text file
of about 800 MB -- it contains several million keys. Ideally I would
like to pickle this object so that I wouldn't have to parse the large
file to recompute the dictionary every time I run my program.
However, the pickled file is currently over 300 MB and takes a very
long time to write to disk -- even longer than recomputing the
dictionary from scratch.
I would like to split the dictionary into smaller ones, each
containing only hundreds of thousands of keys, and then pickle those
separately. Is there an easy way to do this?
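Something roughly like the following is what I have in mind (the
chunk size and the "dict_chunk" filename prefix are just arbitrary
examples):

import pickle
from itertools import islice

def pickle_in_chunks(big_dict, chunk_size=200000, prefix="dict_chunk"):
    # Slice the dictionary's items into batches of at most chunk_size
    # keys and dump each batch to its own file:
    # dict_chunk_0.pkl, dict_chunk_1.pkl, ...
    items = iter(big_dict.items())
    i = 0
    while True:
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            break
        with open("%s_%d.pkl" % (prefix, i), "wb") as out:
            pickle.dump(chunk, out, pickle.HIGHEST_PROTOCOL)
        i += 1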
While others have suggested databases, they may be a bit
overkill, depending on your needs. Python 2.5+ supplies the
sqlite3 module, and even older versions (at least back to 2.0)
offer the anydbm module (renamed "dbm" in 3.0), which lets you
create an on-disk string-to-string dictionary:
import anydbm
db = anydbm.open("data.db", "c")
# populate some data
# using "db" as your dictionary
import csv
f = file("800megs.txt")
data = csv.reader(f, delimiter='\t')
data.next() # discard a header row
for key, value in data:
    db[key] = value
f.close()
print db["some key"]
db.close()
The resulting DB object is a little sparsely documented, but for
the most part it can be treated like a dictionary. The advantage
is that, if the source data doesn't change, you can parse it once
and then just use your "data.db" file from there on out.
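For example, a later run could skip the 800-meg parse entirely and
just reopen the file built above (same "data.db" name and key as in
the example):

import anydbm

# Subsequent runs reuse the already-built file; "r" opens it read-only.
db = anydbm.open("data.db", "r")
print db["some key"]
db.close()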
-tkc