On 6/11/08, Eric Jonas <[EMAIL PROTECTED]> wrote:
> Lisandro,
> Profile tells me that pickle is taking 70 seconds with a cStringIO
> object and 76 seconds with a regular file object. Is that in line with
> what you've seen?

Well, as my use case is to serialize objects and send them over the
wire, I never use file objects. Anyway, your result is the expected one,
so the cStringIO approach will not help you much.

> I'm not entirely sure what pickle does that takes so
> long, but I have a feeling it's spending a lot of time walking various
> circular references -- this is a pretty dense graph i'm trying to
> pickle, although (to avoid the recursion limit) I do try and break most
> of the graph's references before-hand.

Well, you said your graph has ~600k nodes. That's quite a lot of nodes
for cPickle to traverse. I believe you will need to follow Stefan's
suggestion, that is, implement a custom way of persisting your data.
I hope you do not have general Python object attributes in your base
classes!

> On Wed, 2008-06-11 at 12:35 -0300, Lisandro Dalcin wrote:
> > Eric, I've tried hard in the past to speed up pickling for
> > communicating general Python objects via MPI (for my project mpi4py).
> > I'm now reimplementing mpi4py from scratch using Cython, but I've not
> > yet reimplemented my pickle machinery.
> >
> > Could you try to output your pickles to a cStringIO instance (instead
> > of a normal file instance) and tell me if you can get some speedup? If
> > you get some speedup, then perhaps we can hack a bit to make it even
> > faster for your use case...
> >
> > Of course, the other way would be to define from scratch a custom
> > 'format' for saving your data, but I anticipate that that would be
> > really a lot of work, and I even doubt you can get more speedup than
> > with cPickle if you still want to be able to serialize arbitrary
> > Python data...
> >
> > On 6/11/08, Eric Jonas <[EMAIL PROTECTED]> wrote:
> > > I assume most of us are using Cython because normal python is too slow
> > > for our particular operation.
> > > I've been writing a compiler in python
> > > which generates large graphs with ~600k nodes, and I'd like to
> > > serialize/checkpoint these to disk for later stages of the compiler
> > > pipeline to use. However, at the moment, it takes ~60 seconds to
> > > serialize the resulting graph to disk with cPickle.
> > >
> > > Have any Cython users found better/faster ways of serializing to disk? I
> > > could potentially use the "marshal" module, but my understanding is that
> > > it only works for built-in types, and of course, at the root of my
> > > class hierarchy I have four cython classes. I'd love to avoid writing my
> > > own serialize/deserialize methods, if at all possible.
> > >
> > > I can't be the only person in this position,
> > >
> > > Thanks,
> > > ...Eric

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
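
For what it's worth, the "custom way for data persistence" discussed in
this thread might look roughly like the sketch below. It is only an
illustration under stated assumptions: the Node class, its payload and
succs attributes, and the dump_graph/load_graph helpers are all
hypothetical, and it uses the Python 3 pickle module rather than
cPickle. The idea is to replace object references with integer indices,
so that pickle only sees two flat lists and never has to walk the
graph recursively. Whether this is actually faster for a 600k-node
graph would need to be profiled.

```python
import io
import pickle


class Node:
    """Hypothetical graph node: a payload plus references to successors."""

    def __init__(self, payload):
        self.payload = payload
        self.succs = []


def dump_graph(nodes, fileobj):
    """Write the graph as flat lists so pickle never follows node references.

    Assumes every successor of every node is itself present in `nodes`.
    """
    # Map each node object to its position in the list.
    index = {id(n): i for i, n in enumerate(nodes)}
    payloads = [n.payload for n in nodes]
    # Edges become (source index, destination index) pairs.
    edges = [(i, index[id(s)]) for i, n in enumerate(nodes) for s in n.succs]
    pickle.dump((payloads, edges), fileobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_graph(fileobj):
    """Rebuild the node objects and re-link them from the index pairs."""
    payloads, edges = pickle.load(fileobj)
    nodes = [Node(p) for p in payloads]
    for src, dst in edges:
        nodes[src].succs.append(nodes[dst])
    return nodes


def roundtrip_demo():
    """Round-trip a tiny cyclic graph through an in-memory buffer."""
    a, b, c = Node("a"), Node("b"), Node("c")
    a.succs = [b, c]
    b.succs = [c]
    c.succs = [a]  # cycle back to the first node
    buf = io.BytesIO()
    dump_graph([a, b, c], buf)
    buf.seek(0)
    return load_graph(buf)
```

Because cycles are stored as index pairs, the recursion limit is not an
issue, and the references do not need to be broken by hand before
saving. The payloads themselves still go through pickle, so they would
need to be simple (or picklable) values.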
