Scott McCarty wrote:
> Sorry to ask this question. I have searched the list archives and googled,
> but I don't even know what words would find what I am looking for; I am just
> looking for a little kick in the right direction.
>
> I have a Python based log analysis program called petit (
> http://crunchtools.com/petit). I am trying to modify it to manage the main
> object types to and from disk.
>
> Essentially, I have one object which is a list of a bunch of "Entry"
> objects. The Entry objects have date, time, etc. fields which I use
> for analysis techniques. At the very beginning I build up the list of
> objects, then would like to start pickling it while building to save
> memory. I want to be able to process more entries than I have memory. With
> a straight list it looks like I could build from xreadlines(), but once you
> turn it into a more complex object, I don't quite know where to go.
>
> I understand how to pickle the entire data structure, but I need something
> that will manage the memory/disk allocation. Any thoughts?
You can write multiple pickled objects into a single file:
import cPickle as pickle

def dump(filename, items):
    with open(filename, "wb") as out:
        dump = pickle.Pickler(out).dump
        for item in items:
            dump(item)

def load(filename):
    with open(filename, "rb") as instream:
        load = pickle.Unpickler(instream).load
        while True:
            try:
                item = load()
            except EOFError:
                break
            yield item

if __name__ == "__main__":
    filename = "tmp.pickle"
    from collections import namedtuple
    T = namedtuple("T", "alpha beta")
    dump(filename, (T(a, b) for a, b in zip("abc", [1, 2, 3])))
    for item in load(filename):
        print item
To get random access you'd have to maintain a list containing the offsets of
the entries in the file.
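A minimal sketch of that offset idea (function names are my own invention;
note that it dumps each entry with a fresh pickle so every entry can be
unpickled on its own after a seek):

    import pickle

    def dump_indexed(filename, items):
        """Write one self-contained pickle per item; return each
        entry's byte offset so it can be sought later."""
        offsets = []
        with open(filename, "wb") as out:
            for item in items:
                offsets.append(out.tell())   # position before this entry
                pickle.dump(item, out)       # fresh pickle, no shared memo
        return offsets

    def load_at(filename, offset):
        """Seek to a recorded offset and unpickle the single entry there."""
        with open(filename, "rb") as f:
            f.seek(offset)
            return pickle.load(f)

    if __name__ == "__main__":
        offsets = dump_indexed("tmp_indexed.pickle", ["a", "b", "c"])
        print(load_at("tmp_indexed.pickle", offsets[2]))

The offsets list itself stays in memory, but at a few bytes per entry that
is far cheaper than keeping the Entry objects around.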
However, a simple database like SQLite is probably sufficient for the kind
of entries you have in mind, and it allows operations like aggregation,
sorting and grouping out of the box.
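For example, with the sqlite3 module from the standard library (the column
names here are made up; use whatever fields your Entry objects have):

    import sqlite3

    conn = sqlite3.connect(":memory:")  # or a file, e.g. "petit.db"
    conn.execute(
        "CREATE TABLE entries (date TEXT, time TEXT, host TEXT, message TEXT)")
    rows = [
        ("2010-01-02", "12:00:01", "web1", "started"),
        ("2010-01-02", "12:00:05", "web1", "stopped"),
        ("2010-01-03", "09:30:00", "web2", "started"),
    ]
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
    conn.commit()

    # grouping and aggregation come for free
    for date, count in conn.execute(
            "SELECT date, COUNT(*) FROM entries GROUP BY date ORDER BY date"):
        print(date, count)

Only the rows a query returns are pulled into memory, so the table can grow
well past RAM.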
Peter
--
http://mail.python.org/mailman/listinfo/python-list