Aaron Watters added the comment:
also:
I just modified the code to do iterations using increasingly large data
sizes
and I see the kind of very unpleasant behaviour for the old implementation
(max time varies wildly from min time) that I saw in my more complex
program. The new implementation doesn't have
these problems. First the runs and then the modified code
runs
arw:/home/arw/test>
arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python_old mtest1.pyold
old
0 40 elapsed max= 2.28881835938e-05 min= 4.76837158203e-06 ratio= 4.8
1 160 elapsed max= 1.59740447998e-05 min= 9.05990600586e-06 ratio=
1.76315789474
2 640 elapsed max= 2.40802764893e-05 min= 2.19345092773e-05 ratio=
1.09782608696
3 2560 elapsed max= 8.79764556885e-05 min= 3.981590271e-05 ratio=
2.20958083832
4 10240 elapsed max= 0.000290155410767 min= 0.000148057937622 ratio=
1.95974235105
5 40960 elapsed max= 0.000867128372192 min= 0.00060510635376 ratio=
1.43301812451
6 163840 elapsed max= 0.00739598274231 min= 0.00339317321777 ratio=
2.17966554244
7 655360 elapsed max= 0.0883929729462 min= 0.0139379501343 ratio=
6.34189189189
8 2621440 elapsed max= 1.69851398468 min= 0.0547370910645 ratio=
31.0304028155
9 10485760 elapsed max= 9.98945093155 min= 0.213104963303 ratio=
46.875730986
10 41943040 elapsed max= 132.281101942 min= 0.834150075912 ratio=
158.581897625
arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python mtest1.py new
new
0 40 elapsed max= 2.19345092773e-05 min= 5.00679016113e-06 ratio=
4.38095238095
1 160 elapsed max= 1.00135803223e-05 min= 9.05990600586e-06 ratio=
1.10526315789
2 640 elapsed max= 3.19480895996e-05 min= 1.28746032715e-05 ratio=
2.48148148148
3 2560 elapsed max= 5.69820404053e-05 min= 3.981590271e-05 ratio=
1.43113772455
4 10240 elapsed max= 0.000186920166016 min= 0.000138998031616 ratio=
1.34476843911
5 40960 elapsed max= 0.00355315208435 min= 0.000746965408325 ratio=
4.75678263645
6 163840 elapsed max= 0.00326490402222 min= 0.00304794311523 ratio=
1.07118272841
7 655360 elapsed max= 0.0127630233765 min= 0.0122020244598 ratio=
1.04597588855
8 2621440 elapsed max= 0.0511522293091 min= 0.0484230518341 ratio=
1.05636112082
9 10485760 elapsed max= 0.198891878128 min= 0.187420129776 ratio=
1.06120873124
10 41943040 elapsed max= 0.758435964584 min= 0.729014158249 ratio=
1.04035834696
arw:/home/arw/test>
Above high ratio numbers indicate strange and unpleasant performance
variance.
For iteration 7 and higher the old implementation has a much
worse max time performance than the new one.
Here is the test code:
def test():
from marshal import dumps
from time import time
size = 10
for i in range(11):
size = size*4
testString = "abc"*size
#print "now testing", i, size
minelapsed = None
for j in range(11):
now = time()
dump = dumps(testString)
elapsed = time()-now
if minelapsed is None:
minelapsed = elapsed
maxelapsed = elapsed
else:
minelapsed = min(elapsed, minelapsed)
maxelapsed = max(elapsed, maxelapsed)
print i, size, "elapsed max=", maxelapsed, "min=", minelapsed,
"ratio=", maxelapsed/minelapsed
if __name__=="__main__":
import sys
print sys.argv[1]
test()
-- Aaron Watters
On Jan 11, 2008 10:14 AM, Aaron Watters < [EMAIL PROTECTED]> wrote:
>
> Aaron Watters added the comment:
>
> Facundo
>
> 1) the +1024 was an accelerator to jump up to over 1k at the first resize.
> I think it's a good idea or at least doesn't hurt.
>
> 2) Here is an example program:
>
> def test():
> from marshal import dumps
> from time import time
> testString = "abc"*100000000
> print "now testing"
> now = time()
> dump = dumps(testString)
> elapsed = time()-now
> print "elapsed", elapsed
>
> if __name__=="__main__":
> test()
>
> Here are two runs: the first with the old marshal and the second with the
> patched marshal. The second is
> better than 2* faster than the first.
>
> arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python_old mtest1.py
> now testing
> elapsed 4.13367795944
> arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python mtest1.py
> now testing
> elapsed 1.7495341301
> arw:/home/arw/test>
>
> The example that inspired this research was very complicated and involved
> millions of calls to dumps
> which caused a number of anomalies (system calls went berzerk for some
> reason, maybe paging).
>
> -- Aaron Watters
>
> On Jan 11, 2008 9:25 AM, Facundo Batista <[EMAIL PROTECTED]> wrote:
>
> >
> > Facundo Batista added the comment:
> >
> > Why not just double the size? The "doubling + 1024" address some
> > specific issue? If so, it should be commented.
> >
> > Also, do you have an example of a marshal.dumps() that suffers from this
> > issue?
> >
> > Thank you!
> >
> > ----------
> > nosy: +facundobatista
> >
> > __________________________________
> > Tracker <[EMAIL PROTECTED]>
> > <http://bugs.python.org/issue1792>
> > __________________________________
> >
>
> Added file: http://bugs.python.org/file9124/unnamed
>
> __________________________________
> Tracker <[EMAIL PROTECTED]>
> <http://bugs.python.org/issue1792>
> __________________________________
>
Added file: http://bugs.python.org/file9125/unnamed
__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1792>
__________________________________
also:<br><br>I just modified the code to do iterations using increasingly large
data sizes<br>and I see the kind of very unpleasant behaviour for the old
implementation<br>(max time varies wildly from min time) that I saw in my more
complex
<br>program. The new implementation doesn't have
<br>these problems. First the runs and then the modified
code<br><br>runs<br><br>arw:/home/arw/test> <br>arw:/home/arw/test>
~/apache2/htdocs/pythonsrc/Python/python_old mtest1.py old<br>old<br>0 40
elapsed max=
2.28881835938e-05 min= 4.76837158203e-06 ratio= 4.8<br>1 160 elapsed max=
1.59740447998e-05 min= 9.05990600586e-06 ratio= 1.76315789474<br>2 640 elapsed
max= 2.40802764893e-05 min= 2.19345092773e-05 ratio= 1.09782608696<br>
3 2560 elapsed max= 8.79764556885e-05 min= 3.981590271e-05 ratio=
2.20958083832<br>4 10240 elapsed max= 0.000290155410767 min= 0.000148057937622
ratio= 1.95974235105<br>5 40960 elapsed max= 0.000867128372192 min=
0.00060510635376
ratio= 1.43301812451<br>6 163840 elapsed max= 0.00739598274231 min=
0.00339317321777 ratio= 2.17966554244<br>7 655360 elapsed max= 0.0883929729462
min= 0.0139379501343 ratio= 6.34189189189<br>8 2621440 elapsed max=
1.69851398468
min= 0.0547370910645 ratio= 31.0304028155<br>9 10485760 elapsed max=
9.98945093155 min= 0.213104963303 ratio= 46.875730986<br>10 41943040 elapsed
max= 132.281101942 min= 0.834150075912 ratio=
158.581897625<br>arw:/home/arw/test>
~/apache2/htdocs/pythonsrc/Python/python
mtest1.py new<br>new<br>0 40 elapsed max= 2.19345092773e-05 min=
5.00679016113e-06 ratio= 4.38095238095<br>1 160 elapsed max= 1.00135803223e-05
min= 9.05990600586e-06 ratio= 1.10526315789<br>2 640 elapsed max=
3.19480895996e-05
min= 1.28746032715e-05 ratio= 2.48148148148<br>3 2560 elapsed max=
5.69820404053e-05 min= 3.981590271e-05 ratio= 1.43113772455<br>4 10240 elapsed
max= 0.000186920166016 min= 0.000138998031616 ratio= 1.34476843911<br>5 40960
elapsed max=
0.00355315208435 min= 0.000746965408325 ratio= 4.75678263645<br>6 163840
elapsed max= 0.00326490402222 min= 0.00304794311523 ratio= 1.07118272841<br>7
655360 elapsed max= 0.0127630233765 min= 0.0122020244598 ratio= 1.04597588855
<br>8 2621440 elapsed max= 0.0511522293091 min= 0.0484230518341 ratio=
1.05636112082<br>9 10485760 elapsed max= 0.198891878128 min= 0.187420129776
ratio= 1.06120873124<br>10 41943040 elapsed max= 0.758435964584 min=
0.729014158249
ratio= 1.04035834696<br>arw:/home/arw/test> <br><br>Above high ratio
numbers indicate strange and unpleasant performance variance.<br>For iteration
7 and higher the old implementation has a much<br>worse max time performance
than the new one.
<br><br>Here is the test code:<br><br>def test():<br> from
marshal import dumps<br> from time import
time<br> size = 10<br> for i in
range(11):<br> size =
size*4<br> testString =
"abc"*size
<br> #print "now testing",
i, size<br> minelapsed =
None<br> for j in
range(11):<br>
now =
time()<br>
dump =
dumps(testString)<br>
elapsed =
time()-now<br>
if minelapsed is None:
<br>
minelapsed =
elapsed<br>
maxelapsed =
elapsed<br>
else:<br>
minelapsed = min(elapsed,
minelapsed)<br>
maxelapsed = max(elapsed,
maxelapsed)<br> print i, size,
"elapsed max=", maxelapsed, "min=", minelapsed,
"ratio=", maxelapsed/minelapsed
<br><br>if __name__=="__main__":<br> import
sys<br> print sys.argv[1]<br>
test()<br><br>-- Aaron Watters<br><br><div class="gmail_quote">On Jan 11, 2008
10:14 AM, Aaron Watters <
<a href="mailto:[EMAIL PROTECTED]" target="_blank">[EMAIL PROTECTED]</a>>
wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid
rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>Aaron Watters added the comment:
<br><div><div></div><div><br>Facundo<br><br>1) the +1024 was an accelerator to
jump up to over 1k at the first resize.<br>I think it's a good idea or at
least doesn't hurt.<br><br>2) Here is an example program:
<br><br>def test():<br> from marshal import dumps<br>
from time import time<br> testString =
"abc"*100000000<br> print "now testing"<br>
now = time()<br> dump = dumps(testString)<br>
elapsed = time()-now
<br> print "elapsed", elapsed<br><br>if
__name__=="__main__":<br> test()<br><br>Here are two
runs: the first with the old marshal and the second with the<br>patched
marshal. The second is<br>better than 2* faster than the first.
<br><br>arw:/home/arw/test> ~/apache2/htdocs/pythonsrc/Python/python_old
mtest1.py<br>now testing<br>elapsed 4.13367795944<br>arw:/home/arw/test>
~/apache2/htdocs/pythonsrc/Python/python mtest1.py<br>now testing<br>
elapsed 1.7495341301<br>arw:/home/arw/test><br><br>The example that inspired
this research was very complicated and involved<br>millions of calls to
dumps<br>which caused a number of anomalies (system calls went berzerk for some
<br>reason, maybe paging).<br><br> -- Aaron Watters<br><br>On Jan 11,
2008 9:25 AM, Facundo Batista <<a href="mailto:[EMAIL PROTECTED]"
target="_blank">[EMAIL PROTECTED]</a>> wrote:<br><br>><br></div>
</div><div><div>
</div><div>> Facundo Batista added the comment:<br>><br>> Why not just
double the size? The "doubling + 1024" address some<br>> specific
issue? If so, it should be commented.<br>><br>
> Also, do you have an example of a marshal.dumps() that suffers from
this<br>> issue?<br>><br>> Thank you!<br>><br>>
----------<br>> nosy: +facundobatista<br>><br>>
__________________________________
<br>> Tracker <<a href="mailto:[EMAIL PROTECTED]" target="_blank">[EMAIL
PROTECTED]</a>><br>> <<a href="http://bugs.python.org/issue1792"
target="_blank">http://bugs.python.org/issue1792</a>><br>
> __________________________________
<br>><br><br></div></div>Added file: <a
href="http://bugs.python.org/file9124/unnamed"
target="_blank">http://bugs.python.org/file9124/unnamed</a><br><div><div></div><div><br>__________________________________
<br>Tracker <<a href="mailto:[EMAIL PROTECTED]" target="_blank">[EMAIL
PROTECTED]</a>><br><<a href="http://bugs.python.org/issue1792"
target="_blank">http://bugs.python.org/issue1792</a>><br>__________________________________
</div></div></blockquote></div><br>
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com