[Boris Borcic]
> Assuming that the items of my_stream share no content (they are
> dumps of db cursor fetches), is there a simple way to do the
> equivalent of
>
> def pickles(my_stream) :
>     from cPickle import load,dumps
>     while 1 :
>         yield dumps(load(my_stream))
>
> without the overhead associated with unpickling objects
> just to pickle them again ?

cPickle (but not pickle.py) Unpickler objects have a barely documented
noload() method.  This "acts like" load(), except that it doesn't import
modules or construct objects of user-defined classes.  The return value
of noload() is undocumented and usually useless.  ZODB uses it a lot ;-)

Anyway, that can go much faster than load(), and works even if the
classes and modules referenced by pickles aren't available in the
unpickling environment.  It doesn't return the individual pickle
strings, but they're easy to get at by paying attention to the file
position between noload() calls.  For example,

    import cPickle as pickle
    import os

    # Build a pickle file with 4 pickles.

    PICKLEFILE = "temp.pck"

    class C:
        pass

    f = open(PICKLEFILE, "wb")
    p = pickle.Pickler(f, 1)

    p.dump(2)
    p.dump([3, 4])
    p.dump(C())
    p.dump("all done")

    f.close()

    # Now use noload() to extract the 4 pickle
    # strings in that file.

    f = open(PICKLEFILE, "rb")
    limit = os.path.getsize(PICKLEFILE)
    u = pickle.Unpickler(f)
    pickles = []
    pos = 0
    while pos < limit:
        u.noload()
        thispos = f.tell()
        f.seek(pos)
        pickles.append(f.read(thispos - pos))
        pos = thispos

    from pprint import pprint
    pprint(pickles)

That prints a list containing the 4 pickle strings:

    ['K\x02.',
     ']q\x01(K\x03K\x04e.',
     '(c__main__\nC\nq\x02o}q\x03b.',
     'U\x08all doneq\x04.']

You could do much the same by calling pickletools.dis() and ignoring its
output, but that's likely to be slower.
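
The pickletools route mentioned above can be sketched too. This is only an illustration under stated assumptions: it targets Python 3 (where the cPickle module no longer exists), it uses pickletools.genops() rather than dis() so that nothing is printed, and iter_pickle_bytes is a hypothetical helper name, not part of any library. It also assumes the stream is seekable and contains nothing but back-to-back pickles. No claim is made about its speed relative to noload().

```python
import io
import pickle
import pickletools


def iter_pickle_bytes(stream):
    """Yield each pickle in a seekable binary stream as raw bytes.

    The opcodes are only scanned, never executed, so no modules are
    imported and no objects are constructed.
    """
    start = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(start)
    while start < end:
        # genops() reads exactly one pickle, up to and including STOP.
        for _ in pickletools.genops(stream):
            pass
        stop = stream.tell()
        # Go back and copy out the bytes that were just scanned.
        stream.seek(start)
        yield stream.read(stop - start)
        start = stop


if __name__ == "__main__":
    buf = io.BytesIO()
    for obj in (2, [3, 4], "all done"):
        pickle.dump(obj, buf, protocol=1)
    buf.seek(0)
    for chunk in iter_pickle_bytes(buf):
        print(repr(chunk))
```

As with the noload() loop, the trick is just to compare the file position before and after each scan; the pickles themselves are copied out untouched, so they can refer to classes that aren't importable where this runs.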