Hi,
yes, I know the main use of marshal is to generate pyc files. But it is
also used for other things, and it is the fastest built-in serialization
method. For some use cases it makes sense to use it instead of pickle or
other serializers.
I found only one case with a performance regression in the newer protocol
versions in 3.4. We should take care of it and improve it. The beta phase
is the right time to handle this and fix it for the upcoming release, or
at least to document it. I think it is also useful for others to know
about the new versions, their usage and their behavior.
I also noticed that the new versions can be faster in some use cases. I
like the work done for this, and I think reducing the size of the
serialized output was useful too. I am not against it and do not want to
criticize it. I only want to improve all this further.
Regards,
Wolfgang
On 28.01.2014 06:14, Kristján Valur Jónsson wrote:
Hi there.
I think you should modify your program to marshal (and load) a compiled module.
This is where the optimizations in versions 3 and 4 become important.
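A minimal sketch of what marshalling a compiled module could look like, so that the test exercises code objects rather than plain data. The source string and the names SOURCE, blobs and namespace are made up for illustration; a real test would compile an actual module file and time the calls.

```python
import marshal

# Hypothetical module source; a real benchmark would read a .py file.
SOURCE = '''
def add(a, b):
    return a + b

def scale(values, factor):
    return [add(v, 0) * factor for v in values]
'''

# compile() yields the kind of code object that is stored in a pyc file.
code = compile(SOURCE, "<example>", "exec")

# Serialize the same code object with an old and a new protocol version.
blobs = {version: marshal.dumps(code, version) for version in (2, 4)}
for version, blob in blobs.items():
    print("v%d: %d bytes" % (version, len(blob)))

# Loading gives back an executable code object.
namespace = {}
exec(marshal.loads(blobs[4]), namespace)
print(namespace["scale"]([1, 2, 3], 10))
```

Code objects repeat many names and constants, which is the kind of input the reference sharing in the newer versions is aimed at.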
K
-----Original Message-----
From: Python-Dev [mailto:python-dev-[email protected]] On Behalf Of Victor Stinner
Sent: Monday, January 27, 2014 23:35
To: Wolfgang
Cc: Python-Dev
Subject: Re: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
Hi,
I'm surprised: marshal.dumps() doesn't raise an error if you pass an invalid
version. In fact, Python 3.3 only supports versions 0, 1 and 2. If you pass 3,
it will use version 2. (The same applies to version 99.)
Python 3.4 has two new versions: 3 and 4. Version 3 "shares common
object references"; version 4 adds short tuples and short strings
(producing smaller files).
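As a rough illustration of the difference, the sketch below serializes a list that refers to one tuple object many times and prints the output size per version. The data and the names shared, data and sizes are invented for this example; the exact byte counts depend on the interpreter, so none are shown.

```python
import marshal

# One tuple object referenced many times: version 3 and later can write it
# once and emit back-references for the repeats.
shared = tuple("item-%d" % i for i in range(20))
data = [shared] * 1000

sizes = {v: len(marshal.dumps(data, v)) for v in (2, 3, 4)}
for v in sorted(sizes):
    print("version %d: %d bytes" % (v, sizes[v]))

# The loaded value is the same whichever version wrote it.
assert marshal.loads(marshal.dumps(data, 4)) == data
```

Data without repeated objects, such as the benchmark list in this thread, gains nothing from the sharing in version 3, which matches the unchanged v3 size in the results.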
It would be nice to document the differences between marshal versions.
And what do you think of raising an error if the version is unknown in
marshal.dumps()?
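One way the proposed check could behave, sketched as a wrapper in pure Python. checked_dumps is a hypothetical helper, not part of the marshal module; it relies only on marshal.version, which gives the highest format version the running interpreter writes.

```python
import marshal

def checked_dumps(value, version=marshal.version):
    """Like marshal.dumps(), but reject versions this interpreter does not know.

    Hypothetical helper, not part of the marshal module.
    """
    if not 0 <= version <= marshal.version:
        raise ValueError(
            "unknown marshal version %r (supported: 0..%d)"
            % (version, marshal.version)
        )
    return marshal.dumps(value, version)

payload = checked_dumps([1, 2, 3], 2)
try:
    checked_dumps([1, 2, 3], 99)
except ValueError as exc:
    print("rejected:", exc)
```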
I modified your benchmark to also test loads() and to run the benchmark
10 times. Results:
---
Python 3.3.3+ (3.3:50aa9e3ab9a4, Jan 27 2014, 16:11:26)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
dumps v0: 391.9 ms
data size v0: 45582.9 kB
loads v0: 616.2 ms
dumps v1: 384.3 ms
data size v1: 45582.9 kB
loads v1: 594.0 ms
dumps v2: 153.1 ms
data size v2: 41395.4 kB
loads v2: 549.6 ms
dumps v3: 152.1 ms
data size v3: 41395.4 kB
loads v3: 535.9 ms
dumps v4: 152.3 ms
data size v4: 41395.4 kB
loads v4: 549.7 ms
---
And:
---
Python 3.4.0b3+ (default:dbad4564cd12, Jan 27 2014, 16:09:40)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
dumps v0: 389.4 ms
data size v0: 45582.9 kB
loads v0: 564.8 ms
dumps v1: 390.2 ms
data size v1: 45582.9 kB
loads v1: 545.6 ms
dumps v2: 165.5 ms
data size v2: 41395.4 kB
loads v2: 470.9 ms
dumps v3: 425.6 ms
data size v3: 41395.4 kB
loads v3: 528.2 ms
dumps v4: 369.2 ms
data size v4: 37000.9 kB
loads v4: 550.2 ms
---
Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with Python 3.4
produces the smallest file.
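The modified benchmark script itself is not included in this message. A sketch of a script that measures the same three quantities per version might look like the following. It assumes the best of several runs is reported and uses a smaller data set to keep the run short; gen_data and bench are illustrative names.

```python
import marshal
from time import perf_counter

def gen_data(amount):
    # Same record shape as the test code in the original report.
    for i in range(amount):
        yield (i, i + 2, i * 2, (i + 1, i + 4, i, 4),
               "my string template %s" % i, 1.01 * i, True)

def bench(data, versions=(0, 1, 2, 3, 4), repeat=10):
    """Return {version: (best dumps seconds, size in bytes, best loads seconds)}."""
    results = {}
    for version in versions:
        dump_times, load_times = [], []
        for _ in range(repeat):
            t0 = perf_counter()
            blob = marshal.dumps(data, version)
            t1 = perf_counter()
            marshal.loads(blob)
            t2 = perf_counter()
            dump_times.append(t1 - t0)
            load_times.append(t2 - t1)
        results[version] = (min(dump_times), len(blob), min(load_times))
    return results

# Assumption: 20000 records and 3 repeats, far fewer than in the report.
results = bench(list(gen_data(20000)), repeat=3)
for version, (dump_s, size, load_s) in sorted(results.items()):
    print("dumps v%d: %.1f ms" % (version, dump_s * 1000))
    print("data size v%d: %.1f kB" % (version, size / 1024))
    print("loads v%d: %.1f ms" % (version, load_s * 1000))
```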
Victor
2014-01-27 Wolfgang <[email protected]>:
Hi,
I tested the latest beta of 3.4 (b3) and noticed there is a new
marshal protocol version 3.
The documentation says little about the new features and does not go
into detail.
I ran a performance test with the new protocol version and noticed
that the new version is two times slower in serialization than version 2.
I tested it with a list of 500000 simple value tuples.
Nothing special. (It happens only if the tuple also contains a tuple.)
Copy of the test code:
from time import time
from marshal import dumps

def genData(amount=500000):
    for i in range(amount):
        yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i,
               1.01*i,
               True)

data = list(genData())
print(len(data))
t0 = time()
result = dumps(data, 2)
t1 = time()
print("duration p2: %f" % (t1-t0))
t0 = time()
result = dumps(data, 3)
t1 = time()
print("duration p3: %f" % (t1-t0))
Is the overhead for the recursion detection so high?
Note that this happens only if the tuples in the data list themselves contain a tuple.
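To check the nested-tuple observation in isolation, a sketch like the one below times flat and nested records under versions 2 and 3. The record layouts, the smaller element count and the names timed_dumps, flat, nested and timings are assumptions made for this example. No timings are shown because they depend on the machine and the Python build.

```python
import marshal
from time import perf_counter

def timed_dumps(data, version):
    # Time a single marshal.dumps() call, in seconds.
    t0 = perf_counter()
    marshal.dumps(data, version)
    return perf_counter() - t0

N = 100000  # smaller than the 500000 used above, to keep the run short
flat = [(i, i + 2, i * 2, i + 1, i + 4, i, 4) for i in range(N)]
nested = [(i, i + 2, i * 2, (i + 1, i + 4, i, 4)) for i in range(N)]

timings = {}
for name, data in (("flat", flat), ("nested", nested)):
    for version in (2, 3):
        timings[name, version] = timed_dumps(data, version)
        print("%-6s v%d: %.4f s" % (name, version, timings[name, version]))
```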
Regards,
Wolfgang
_______________________________________________
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com