> There are a couple of factual inaccuracies on the site that I'd like to clear up 
> first:
> Trivial benchmarks put cerealizer and banana/jelly on the same level as far 
> as performance goes:
> $ python -m timeit -s 'from cereal import dumps; L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
> 10000 loops, best of 3: 84.1 usec per loop
> $ python -m timeit -s 'from twisted.spread import banana, jelly; dumps = lambda o: banana.encode(jelly.jelly(o)); L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
> 10000 loops, best of 3: 89.7 usec per loop
>
> This is with cBanana though, which has to be explicitly enabled and, of 
> course, is written in C.  So Cerealizer looks like it has the potential to do 
> pretty well, performance-wise.

My own benchmark was different: it uses a list of 2000 objects of
the following class:

class O(object):
  def __init__(self):
    self.x = 1
    self.s = "jiba"
    self.o = None

where self.o refers to another O object. I think my benchmark,
although still very limited, is more representative, since it involves
objects, strings, numbers and lists.

See it here:
http://svn.gna.org/viewcvs/*checkout*/soya/trunk/cerealizer/test/test1.py?content-type=text%2Fplain&rev=31
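A minimal sketch of a similar benchmark setup, for anyone who wants to rerun something comparable. It uses the standard-library pickle module (the Python 3 successor of cPickle) because cerealizer's API is not shown in this message; another serializer can be tried by passing different dumps/loads callables to the hypothetical bench() helper. The linking scheme (each object points to the one created just before it) is an assumption, since the description only says self.o refers to another O object; it also keeps pickle's recursion shallow. Absolute timings will of course differ from the figures reported in this post.

```python
import pickle
import time


class O(object):
    """Same attributes as the benchmark class: a number, a string, a link."""

    def __init__(self):
        self.x = 1
        self.s = "jiba"
        self.o = None


def make_objects(n=2000):
    """Return a list of n O objects.

    Assumption: each object's .o refers to the object created just before
    it (the first one keeps None), so the list is saved front to back and
    pickle never has to recurse deeply.
    """
    objs = [O() for _ in range(n)]
    for prev, obj in zip(objs, objs[1:]):
        obj.o = prev
    return objs


def bench(dumps, loads, data):
    """Time one dumps call and one loads call.

    Returns (dump_seconds, load_seconds, size_in_bytes, restored_data).
    """
    t0 = time.perf_counter()
    blob = dumps(data)
    t1 = time.perf_counter()
    restored = loads(blob)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(blob), restored


if __name__ == "__main__":
    data = make_objects()
    dump_s, load_s, size, _ = bench(pickle.dumps, pickle.loads, data)
    print(f"pickle: dumps in {dump_s:.4f} s, {size} bytes; loads in {load_s:.4f} s")
```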

The results are (using Psyco):
With old-style classes:
        cerealizer
        dumps in 0.0619530677795 s, 114914 bytes length
        loads in 0.0313038825989 s

        cPickle
        dumps in 0.0301840305328 s, 116356 bytes length
        loads in 0.023097038269 s

        jelly + banana
        dumps in 0.168012142181 s, 169729 bytes length
        loads in 1.82081913948 s

        jelly + cBanana
        dumps in 0.082946062088 s, 169729 bytes length
        loads in 0.156159877777 s

With new-style classes:
        cerealizer
        dumps in 0.0575239658356 s, 114914 bytes length
        loads in 0.028165102005 s

        cPickle
        dumps in 0.07634806633 s, 116428 bytes length
        loads in 0.0278959274292 s

        jelly + banana
        dumps in 0.156242132187 s, 169729 bytes length
        (TypeError; I haven't investigated this problem yet, although
        it is surely solvable)

        jelly + cBanana
        dumps in 0.10772895813 s, 169729 bytes length
        (TypeError; I haven't investigated this problem yet, although
        it is surely solvable)

As you can see, cPickle dumps about 2 times faster than cerealizer with
old-style classes (and loads roughly 1.35 times faster), but cerealizer
beats cPickle at dumping new-style classes and is on par with it at
loading (which makes sense, since I have optimized it for new-style
classes). Jelly, however, is far behind, even with cBanana, especially
for loading.
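The comparison above can be checked directly against the reported numbers. This small sketch only divides the published timings (copied verbatim from the results in this post); slowdown() is a hypothetical helper name, and nothing is re-measured.

```python
# Timings in seconds, copied from the results above (Psyco enabled).
RESULTS = {
    "old-style": {
        "cerealizer": {"dumps": 0.0619530677795, "loads": 0.0313038825989},
        "cPickle": {"dumps": 0.0301840305328, "loads": 0.023097038269},
        "jelly + cBanana": {"dumps": 0.082946062088, "loads": 0.156159877777},
    },
    "new-style": {
        "cerealizer": {"dumps": 0.0575239658356, "loads": 0.028165102005},
        "cPickle": {"dumps": 0.07634806633, "loads": 0.0278959274292},
    },
}


def slowdown(style, name, op, baseline="cPickle"):
    """Return time(name) / time(baseline); above 1.0 means name is slower."""
    table = RESULTS[style]
    return table[name][op] / table[baseline][op]


if __name__ == "__main__":
    for style in RESULTS:
        for op in ("dumps", "loads"):
            r = slowdown(style, "cerealizer", op)
            print(f"{style:9} {op}: cerealizer / cPickle = {r:.2f}")
```

With these numbers the old-style ratios come out near 2.05 (dumps) and 1.36 (loads), and the new-style ones near 0.75 and 1.01, so the factor of 2 applies to dumping, while loading is much closer.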


> You talked about _Tuple and _Dereference on the website as well.  These are 
> internal implementation details. jelly also supports extension types, by way 
> of setUnjellyableForClass and similar functions.

The problem arises only when the extension type expects an attribute of
a specific class, e.g. (in Pyrex):

cdef class MyClass:
  cdef MyClass other

The other attribute of MyClass can only contain a reference to an
instance of MyClass (or None). Thus it cannot be set to an instance of
_Dereference or _Tuple, even temporarily; assigning
other = _Dereference(...) raises an exception.

I solve this problem in Cerealizer with a two-pass object creation:
first, create all the objects; second, set every object's state.
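The two-pass idea can be sketched in plain Python. The record format (a list of (class, state) pairs in which Ref(i) points at the i-th record) and the names Ref, Node and load_two_pass are hypothetical, not Cerealizer's actual format or code; Node stands in for a class with a typed attribute. Because pass 1 creates every object before pass 2 assigns any attribute, each reference is resolved to the real object before assignment, so no placeholder ever reaches the typed attribute, even in a cycle.

```python
class Ref:
    """Index of another record in the serialized table (hypothetical format)."""

    __slots__ = ("index",)

    def __init__(self, index):
        self.index = index


class Node:
    """Its `other` attribute only accepts a Node or None, like a typed attribute."""

    def __init__(self, name):
        self.name = name
        self._other = None

    @property
    def other(self):
        return self._other

    @other.setter
    def other(self, value):
        if value is not None and not isinstance(value, Node):
            raise TypeError("other must be a Node or None")
        self._other = value


def load_two_pass(records):
    """Rebuild objects from [(cls, state_dict), ...] in two passes."""
    # Pass 1: create every object without calling __init__.
    objs = [cls.__new__(cls) for cls, _ in records]
    # Pass 2: every target now exists, so each Ref is replaced by the
    # real object before it is assigned.
    for obj, (_, state) in zip(objs, records):
        for key, value in state.items():
            if isinstance(value, Ref):
                value = objs[value.index]
            setattr(obj, key, value)
    return objs


if __name__ == "__main__":
    # Two nodes that refer to each other: a cycle.
    records = [
        (Node, {"name": "a", "other": Ref(1)}),
        (Node, {"name": "b", "other": Ref(0)}),
    ]
    a, b = load_two_pass(records)
    print(a.name, a.other.name, b.other is a)
```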

> As far as security goes, no obvious problems jump out at me, either
> from the API or from skimming the code.  I think early-binding
> __new__, __getstate__, and __setstate__ may be going further than
> is necessary.  If someone can find code to set attributes on classes
> in your process space, they can probably already do anything they
> want to your program and don't need to exploit security problems in
> your serializer.

I agree with that; however, I prefer to be "over-secure" rather than
"just as secure as necessary" :-)
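To make "early-binding" concrete, here is a minimal sketch of the idea discussed in the quoted paragraph: a per-class handler looks up __new__, __getstate__ and __setstate__ once, at registration time, so code that patches the class later does not change how the serializer builds objects. Handler and Point are hypothetical names; this is not Cerealizer's API. As the quoted paragraph notes, code able to patch classes could usually patch the handler too, so this only closes one narrow path.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __getstate__(self):
        return (self.x, self.y)

    def __setstate__(self, state):
        self.x, self.y = state


class Handler:
    """Hypothetical per-class handler that early-binds its methods."""

    def __init__(self, cls):
        self.cls = cls
        # Early binding: look the methods up once, at registration time,
        # instead of on the class at every dump/load.
        self.new = cls.__new__
        self.getstate = cls.__getstate__
        self.setstate = cls.__setstate__

    def dump(self, obj):
        return self.getstate(obj)

    def load(self, state):
        obj = self.new(self.cls)
        self.setstate(obj, state)
        return obj


handler = Handler(Point)

# Patching the class after registration does not affect the handler.
Point.__setstate__ = lambda self, state: setattr(self, "x", "patched")
p = handler.load((1, 2))
print(p.x, p.y)  # prints "1 2"
```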

Thank you for your opinion!
I'm going to update my website.
Jiba

-- 
http://mail.python.org/mailman/listinfo/python-list
