On 6/11/08, Eric Jonas <[EMAIL PROTECTED]> wrote:
> Lisandro,
>    Profile tells me that pickle is taking 70 seconds with a cStringIO
>  object and 76 seconds with a regular file object. Is that in line with
>  what you've seen?

Well, as my use case is to serialize objects and send them over the
wire, I never use file objects. Anyway, your result is the expected
one, so the cStringIO approach will not help you much.
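For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes Python 3, where `pickle` (backed by the C `_pickle` accelerator) and `io.BytesIO` play the roles of `cPickle` and `cStringIO`. `make_payload` and the temporary file path are hypothetical stand-ins for the real graph and checkpoint file.

```python
import io
import os
import pickle
import tempfile
import time


def make_payload(n):
    # Hypothetical stand-in for the real graph: n small records linked by index.
    return [{"id": i, "next": i + 1, "label": "node%d" % i} for i in range(n)]


def time_dump_to_buffer(obj):
    # Pickle into an in-memory buffer (the cStringIO approach).
    buf = io.BytesIO()
    start = time.perf_counter()
    pickle.dump(obj, buf, protocol=pickle.HIGHEST_PROTOCOL)
    return time.perf_counter() - start, buf.getvalue()


def time_dump_to_file(obj, path):
    # Pickle straight into a regular file object.
    start = time.perf_counter()
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return time.perf_counter() - start


if __name__ == "__main__":
    data = make_payload(100_000)
    t_buf, blob = time_dump_to_buffer(data)
    path = os.path.join(tempfile.gettempdir(), "graph.pkl")
    t_file = time_dump_to_file(data, path)
    print("buffer: %.3fs  file: %.3fs  size: %d bytes" % (t_buf, t_file, len(blob)))
```

If both timings come out close, as in the 70 s vs 76 s reported above, the time is going into walking and encoding the objects rather than into I/O. It is also worth passing a binary protocol explicitly: Python 2's `cPickle.dump` defaulted to the text protocol 0.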

> I'm not entirely sure what pickle does that takes so
>  long, but I have a feeling it's spending a lot of time walking various
>  circular references -- this is a pretty dense graph I'm trying to
>  pickle, although (to avoid the recursion limit) I do try and break most
>  of the graph's references before-hand.

Well, you said your graph has ~600k nodes. That's a lot of nodes for
cPickle to traverse.
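A minimal sketch of the reference-breaking idea Eric mentions: pickle recurses into each referenced object, so long chains of node-to-node references push it toward the recursion limit. Replacing object references with integer indices keeps the pickled structure shallow regardless of graph shape. It assumes Python 3's `pickle`, that every node appears in a known list `nodes`, and that node attributes are plain built-in values; `Node`, `flatten`, `unflatten`, `dumps_graph` and `loads_graph` are hypothetical names, not part of Eric's compiler or of Cython.

```python
import pickle


class Node:
    # Hypothetical node type: a payload plus references to neighbour nodes.
    def __init__(self, name, weight=0.0):
        self.name = name
        self.weight = weight
        self.edges = []  # list of Node objects


def flatten(nodes):
    # Map each node to an integer index so edges can be stored as ints.
    # Assumes every edge target is itself in `nodes` (else KeyError).
    index = {id(n): i for i, n in enumerate(nodes)}
    payload = [(n.name, n.weight) for n in nodes]
    edges = [[index[id(m)] for m in n.edges] for n in nodes]
    return payload, edges


def unflatten(payload, edges):
    # Rebuild the nodes first, then restore the references by index.
    nodes = [Node(name, weight) for name, weight in payload]
    for node, targets in zip(nodes, edges):
        node.edges = [nodes[j] for j in targets]
    return nodes


def dumps_graph(nodes):
    # Only lists, tuples, strings and numbers reach pickle, so its
    # recursion depth stays constant however long the reference chains are.
    return pickle.dumps(flatten(nodes), protocol=pickle.HIGHEST_PROTOCOL)


def loads_graph(blob):
    payload, edges = pickle.loads(blob)
    return unflatten(payload, edges)


if __name__ == "__main__":
    # A long chain closed into a cycle: awkward for naive pickling.
    chain = [Node("n%d" % i, float(i)) for i in range(100_000)]
    for a, b in zip(chain, chain[1:]):
        a.edges.append(b)
    chain[-1].edges.append(chain[0])
    restored = loads_graph(dumps_graph(chain))
    print(len(restored), restored[-1].edges[0] is restored[0])
```

This removes the recursion problem but still encodes one small tuple per node, so whether it is faster than pickling the live graph has to be measured.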

I believe that you will need to follow Stefan's suggestion, that is,
implement a custom way of persisting your data. I hope you do not have
attributes holding general Python objects in your base classes!
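If the node attributes really are plain numbers and strings, a custom format can stay small. The sketch below, under that assumption, converts the graph into a dict of built-in lists and writes it with the standard `marshal` module Eric mentions, which accepts only built-in types. `Node`, `graph_to_bytes`, `graph_from_bytes`, `save_graph`, `load_graph` and the `"version"` key are hypothetical names chosen for illustration.

```python
import marshal


class Node:
    # Hypothetical node type whose attributes are all built-in values;
    # __slots__ loosely mimics the fixed attribute layout of a cdef class.
    __slots__ = ("name", "weight", "edges")

    def __init__(self, name, weight=0.0):
        self.name = name
        self.weight = weight
        self.edges = []  # references to other Node objects


def graph_to_bytes(nodes):
    # Columnar layout: one list per attribute, edges as integer indices.
    index = {id(n): i for i, n in enumerate(nodes)}
    state = {
        "version": 1,  # hypothetical format tag for later compatibility checks
        "name": [n.name for n in nodes],
        "weight": [n.weight for n in nodes],
        "edges": [[index[id(m)] for m in n.edges] for n in nodes],
    }
    # marshal only accepts built-in types; any other attribute value
    # makes this raise ValueError.
    return marshal.dumps(state)


def graph_from_bytes(blob):
    state = marshal.loads(blob)
    nodes = [Node(name, w) for name, w in zip(state["name"], state["weight"])]
    for node, targets in zip(nodes, state["edges"]):
        node.edges = [nodes[j] for j in targets]
    return nodes


def save_graph(nodes, path):
    with open(path, "wb") as f:
        f.write(graph_to_bytes(nodes))


def load_graph(path):
    with open(path, "rb") as f:
        return graph_from_bytes(f.read())
```

The `marshal` format is not guaranteed to be compatible across Python versions, so this suits short-lived compiler checkpoints rather than long-term archives.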


>  On Wed, 2008-06-11 at 12:35 -0300, Lisandro Dalcin wrote:
>  > Eric, I've tried hard in the past to speed up pickling for
>  > communicating general Python objects via MPI (for my project mpi4py).
>  > I'm now reimplementing mpi4py from scratch using Cython, but I've not
>  > yet reimplemented my pickle machinery.
>  >
>  > Could you try to output your pickles to a cStringIO instance (instead
>  > of a normal file instance) and tell me if you can get some speedup? If
>  > you get some speedup, then perhaps we can hack a bit to make it even
>  > faster for your use case...
>  >
>  > Of course, the other way would be to define from scratch a custom
>  > 'format' for saving your data, but I anticipate that that would be
>  > really a lot of work, and I even doubt you can get more speedup than
>  > with cPickle if you still want to be able to serialize arbitrary
>  > Python data...
>  >
>  >
>  > On 6/11/08, Eric Jonas <[EMAIL PROTECTED]> wrote:
>  > > I assume most of us are using Cython because normal Python is too slow
>  > >  for our particular operation. I've been writing a compiler in Python
>  > >  which generates large graphs with ~600k nodes, and I'd like to
>  > >  serialize/checkpoint these to disk for later stages of the compiler
>  > >  pipeline to use. However, at the moment, it takes ~60 seconds to
>  > >  serialize the resulting graph to disk with cPickle.
>  > >
>  > >  Have any Cython users found better/faster ways of serializing to disk? I
>  > >  could potentially use the "marshal" module, but my understanding is that
>  > >  it only works for built-in types, and of course, at the root of my
>  > >  class hierarchy I have four Cython classes. I'd love to avoid writing my
>  > >  own serialize/deserialize methods, if at all possible.
>  > >
>  > >  I can't be the only person in this position.
>  > >
>  > >  Thanks,
>  > >                         ...Eric
>  > >
>  > >
>  > >
>  > >  _______________________________________________
>  > >  Cython-dev mailing list
>  > >  [email protected]
>  > >  http://codespeak.net/mailman/listinfo/cython-dev
>  > >
>  >
>  >
>
>


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594