[Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread A.M. Kuchling
Three weeks ago, Antoine Pitrou posted the pybench results 
for 2.6 trunk:
http://mail.python.org/pipermail/python-dev/2008-August/081951.html

The big discovery in those results was TryExcept being 48% slower,
but there was a patch in the bug tracker to improve things.  I've
re-run the tests to check the results.

Disclaimer: these results are probably not directly comparable.
Antoine was using a 32-bit Linux installation on an Athlon 3600+ X2;
I'm on a MacBook.

Good news: TryExcept is now only 10% slower than 2.5, not 48%.

Bad news: the big slowdowns are:

Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
                 CompareFloats:   117ms    98ms  +19.2%   118ms    99ms  +19.0%
               CompareIntegers:   110ms   104ms   +5.6%   110ms   105ms   +4.9%
            DictWithStringKeys:   118ms   105ms  +12.8%   133ms   108ms  +22.7%
                NestedForLoops:   125ms   116ms   +7.7%   127ms   118ms   +8.0%
                     Recursion:   193ms   159ms  +21.5%   197ms   163ms  +20.8%
                  SecondImport:   139ms   129ms   +8.4%   143ms   130ms   +9.9%
           SecondPackageImport:   150ms   139ms   +8.6%   152ms   140ms   +8.1%
         SecondSubmoduleImport:   211ms   191ms  +10.5%   214ms   195ms   +9.4%
       SimpleComplexArithmetic:   130ms   119ms   +9.4%   131ms   120ms   +9.2%

Antoine, your Recursion results were actually about the same (+2.2%)
from 2.5 to 2.6, so this big slowdown is novel.  I wonder if these
tests are simply slower on MacOS for some reason (compiler, CPU cache
size, etc.).  Does anyone see similar results?  Any idea what might
have made DictWithStringKeys and Recursion slow down?
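
For readers following the tables, the diff columns appear to be the relative
difference of "this" against "other" (here apparently 2.6 and 2.5). A minimal
Python sketch of that arithmetic, assuming the percentage is this/other - 1;
the name percent_diff is made up for illustration and is not part of pybench:

```python
def percent_diff(this_ms, other_ms):
    """Relative difference of `this` against `other`, in percent."""
    if other_ms == 0:
        return None  # no baseline to compare against
    return (float(this_ms) / other_ms - 1.0) * 100.0


# Recursion row from the table above: 193ms (this) vs 159ms (other).
print("%+.1f%%" % percent_diff(193, 159))  # prints +21.4%
```

This gives +21.4% for the Recursion row rather than the reported +21.5%,
presumably because the table shows times rounded to whole milliseconds.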

Complete results:

Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   140ms   148ms   -5.4%   142ms   153ms   -7.5%
           BuiltinMethodLookup:   120ms   135ms  -11.2%   122ms   137ms  -11.0%
                 CompareFloats:   117ms    98ms  +19.2%   118ms    99ms  +19.0%
         CompareFloatsIntegers:   109ms   119ms   -8.9%   109ms   121ms   -9.3%
               CompareIntegers:   110ms   104ms   +5.6%   110ms   105ms   +4.9%
        CompareInternedStrings:   128ms   153ms  -16.3%   131ms   158ms  -16.8%
                  CompareLongs:   102ms    99ms   +3.5%   105ms   101ms   +3.9%
                CompareStrings:   164ms   161ms   +2.0%   166ms   165ms   +0.7%
                CompareUnicode:   141ms   158ms  -10.5%   143ms   164ms  -12.6%
    ComplexPythonFunctionCalls:   159ms   272ms  -41.3%   164ms   277ms  -40.6%
                 ConcatStrings:   173ms   168ms   +3.2%   177ms   172ms   +3.1%
                 ConcatUnicode:   108ms   121ms  -10.8%   111ms   124ms  -10.4%
               CreateInstances:   168ms   180ms   -6.4%   176ms   182ms   -3.7%
            CreateNewInstances:   129ms   153ms  -15.6%   132ms   158ms  -16.0%
       CreateStringsWithConcat:   156ms   157ms   -0.7%   158ms   161ms   -1.9%
       CreateUnicodeWithConcat:   112ms   114ms   -1.8%   114ms   117ms   -2.2%
                  DictCreation:   104ms   112ms   -7.1%   106ms   114ms   -7.2%
             DictWithFloatKeys:   149ms   162ms   -7.7%   153ms   168ms   -8.7%
           DictWithIntegerKeys:   123ms   148ms  -16.8%   127ms   151ms  -15.9%
            DictWithStringKeys:   118ms   105ms  +12.8%   133ms   108ms  +22.7%
                      ForLoops:    91ms    88ms   +3.6%    91ms    88ms   +3.0%
                    IfThenElse:   108ms   102ms   +5.2%   109ms   103ms   +5.5%
                   ListSlicing:   155ms   239ms  -35.0%   157ms   241ms  -34.6%
                NestedForLoops:   125ms   116ms   +7.7%   127ms   118ms   +8.0%
          NormalClassAttribute:   135ms   140ms   -3.8%   139ms   146ms   -4.7%
       NormalInstanceAttribute:   123ms   126ms   -2.4%   125ms   130ms   -4.4%
           PythonFunctionCalls:   126ms   126ms   +0.0%   129ms   128ms   +0.9%
             PythonMethodCalls:   165ms   165ms   -0.1%   168ms   170ms   -1.1%
                     Recursion:   193ms   159ms  +21.5%   197ms   163ms  +20.8%
                  SecondImport:   139ms   129ms   +8.4%   143ms   130ms   +9.9%
           SecondPackageImport:   150ms   139ms   +8.6%   152ms   140ms   +8.1%
         SecondSubmoduleImport:   211ms   191ms  +10.5%   214ms   195ms   +9.4%
       SimpleComplexArithmetic:   130ms   119ms   +9.4%   131ms   120ms   +9.2%
        SimpleDictManipulation:   124ms   146ms  -14.6%   128ms   150ms  -14.8%
         SimpleFloatArithmetic:   127ms   132ms   -3.6%   131ms   144ms   -9.3%
      SimpleIntFloatArithmetic:    93ms   100ms   -6.5%    94ms   100ms   -5.6%
       SimpleIntegerArithmetic:    94ms    91ms   +2.8%    95ms    92ms   +3.1%
        SimpleListManipulation:   108ms   110ms   -1.1%   110ms   111ms   -1.2%
          SimpleLongArithmetic:   141ms   136ms   +3.8%   143ms   139ms   +2.8%
  

Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Antoine Pitrou
A.M. Kuchling amk at amk.ca writes:
 
> Bad news: the big slowdowns are:
[snip]

I don't get the same results, but there can be significant variations between
two pybench runs. Did you use the same compiler and the same flags for both
Python versions?


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Jean-Paul Calderone

On Sat, 13 Sep 2008 08:03:50 -0400, A.M. Kuchling [EMAIL PROTECTED] wrote:

> Three weeks ago, Antoine Pitrou posted the pybench results
> for 2.6 trunk:
> http://mail.python.org/pipermail/python-dev/2008-August/081951.html
>
> The big discovery in those results was TryExcept being 48% slower,
> but there was a patch in the bug tracker to improve things.  I've
> re-run the tests to check the results.
>
> Disclaimer: these results are probably not directly comparable.
> Antoine was using a 32-bit Linux installation on an Athlon 3600+ X2;
> I'm on a Macbook.
>
> Good news: TryExcept is now only 10% slower than 2.5, not 48%.
>
> Bad news: the big slowdowns are:
>
>                  CompareFloats:   117ms    98ms  +19.2%   118ms    99ms  +19.0%
>                CompareIntegers:   110ms   104ms   +5.6%   110ms   105ms   +4.9%
>             DictWithStringKeys:   118ms   105ms  +12.8%   133ms   108ms  +22.7%
>                 NestedForLoops:   125ms   116ms   +7.7%   127ms   118ms   +8.0%
>                      Recursion:   193ms   159ms  +21.5%   197ms   163ms  +20.8%
>                   SecondImport:   139ms   129ms   +8.4%   143ms   130ms   +9.9%
>            SecondPackageImport:   150ms   139ms   +8.6%   152ms   140ms   +8.1%
>          SecondSubmoduleImport:   211ms   191ms  +10.5%   214ms   195ms   +9.4%
>        SimpleComplexArithmetic:   130ms   119ms   +9.4%   131ms   120ms   +9.2%



I see similar results for some of these.  The complete results from a run
on an AMD Athlon(tm) 64 Processor 3200+ are attached.

Jean-Paul

---
PYBENCH 2.0
---
* using CPython 2.6rc1 (trunk:66421M, Sep 12 2008, 21:05:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

---
Benchmark: p26.pybench
---

Rounds: 10
Warp:   10
Timer:  time.time

Machine Details:
   Platform ID:    Linux-2.6.24-19-generic-i686-with-debian-lenny-sid
   Processor:

Python:
   Implementation: CPython
   Executable:     /home/exarkun/Projects/python/trunk//python
   Version:        2.6.0
   Compiler:       GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
   Bits:           32bit
   Build:          Sep 12 2008 21:05:52 (#trunk:66421M)
   Unicode:        UCS2


---
Comparing with: p25.pybench
---

Rounds: 10
Warp:   10
Timer:  time.time

Machine Details:
   Platform ID:    Linux-2.6.24-19-generic-i686-with-debian-lenny-sid
   Processor:

Python:
   Implementation: n/a
   Executable:     /home/exarkun/Projects/python/branches/release25-maint/python
   Version:        2.5.3a0
   Compiler:       GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
   Bits:           32bit
   Build:          Sep 13 2008 09:32:41 (#release25-maint:66444)
   Unicode:        UCS2


Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   178ms   187ms   -4.5%   184ms   193ms   -4.6%
           BuiltinMethodLookup:   151ms   165ms   -8.5%   155ms   167ms   -7.2%
                 CompareFloats:   150ms   146ms   +2.9%   153ms   150ms   +1.9%
         CompareFloatsIntegers:   143ms   147ms   -2.8%   150ms   150ms   +0.4%
               CompareIntegers:   180ms   182ms   -1.0%   182ms   190ms   -4.3%
        CompareInternedStrings:   159ms   160ms   -1.1%   163ms   166ms   -2.0%
                  CompareLongs:   135ms   136ms   -0.7%   136ms   139ms   -1.5%
                CompareStrings:   142ms   150ms   -5.4%   146ms   153ms   -4.5%
                CompareUnicode:   148ms   135ms   +9.6%   151ms   137ms  +10.6%
    ComplexPythonFunctionCalls:   155ms   226ms  -31.4%   158ms   229ms  -30.9%
                 ConcatStrings:   197ms   203ms   -2.8%   202ms   215ms   -6.4%
                 ConcatUnicode:   179ms   168ms   +6.6%   182ms   184ms   -0.8%
               CreateInstances:   159ms   157ms   +1.4%   162ms   161ms   +0.7%
            CreateNewInstances:   119ms   141ms  -15.4%   121ms   144ms  -16.2%
       CreateStringsWithConcat:   189ms   173ms   +9.3%   195ms   177ms  +10.2%
       CreateUnicodeWithConcat:   116ms   113ms   +2.3%   118ms   115ms   +2.6%
                  DictCreation:   109ms   140ms  -22.2%   112ms   143ms  -21.8%
             DictWithFloatKeys:   202ms   199ms   +1.6%   208ms   204ms   +1.6%
           DictWithIntegerKeys:   158ms   156ms   +1.0%   161ms

Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Nick Coghlan
A.M. Kuchling wrote:
> Antoine, your Recursion results were actually about the same (+2.2%)
> from 2.5 to 2.6, so this big slowdown is novel.  I wonder if these
> tests are simply slower on MacOS for some reason (compiler, CPU cache
> size, etc.).  Does anyone see similar results?  Any idea what might
> have made DictWithStringKeys and Recursion slow down?

The change to universal binaries, perhaps? Your results showed quite a
few slowdowns in number related code, while my local testing shows
primarily speed increases in those areas.

That said, I'm seeing big enough swings in the percentages between runs
that I'd like to get some tips on how to smooth out the variations -
e.g. will increasing the warp factor increase the amount of time each
individual run takes?

Although on a Mac OS X specific front... could the conversion to
universal binaries have made a difference? Do you get the same
performance numbers for a local build as you do for the version from the
installer?

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://www.boredomandlaziness.org


Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Martin v. Löwis

> The change to universal binaries, perhaps?

That shouldn't really matter - the machine code should still be the
same, and it should all get loaded at program startup. IOW, startup
and imports may get slower, but otherwise, it should have no impact.

Regards,
Martin


Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Antoine Pitrou
Nick Coghlan ncoghlan at gmail.com writes:
 
> That said, I'm seeing big enough swings in the percentages between runs
> that I'd like to get some tips on how to smooth out the variations -
> e.g. will increasing the warp factor increase the amount of time each
> individual run takes?

Increasing the number of rounds (-n) is probably better.
Also, if you are on a laptop or a modern desktop machine, check that CPU
frequency scaling is disabled before running any benchmark (on Linux,
cpufreq-set -g performance does the trick).
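
On Linux, one way to check this from Python is to read the governor that the
kernel exposes through sysfs. This is only a sketch: the sysfs path is an
assumption about a typical Linux cpufreq setup and is not mentioned in this
thread, and both function names are made up for illustration.

```python
import glob

# Assumed location of the per-CPU governor files on a Linux cpufreq system.
GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def scaling_governors(pattern=GOVERNOR_GLOB):
    """Return {path: governor} for every CPU that exposes a governor file."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                governors[path] = handle.read().strip()
        except (IOError, OSError):
            pass  # unreadable entry: skip it rather than guess
    return governors


def frequency_scaling_disabled(governors):
    """True only if governors were found and all are set to 'performance'."""
    return bool(governors) and all(
        name == "performance" for name in governors.values()
    )


if __name__ == "__main__":
    found = scaling_governors()
    print("governors:", sorted(set(found.values())) or "none found")
    print("safe to benchmark:", frequency_scaling_disabled(found))
```

An empty result is treated as "not confirmed" rather than "disabled", since
the files are simply absent on systems without cpufreq support.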





Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Leonardo Santagada


On Sep 13, 2008, at 1:03 PM, Antoine Pitrou wrote:

> Nick Coghlan ncoghlan at gmail.com writes:
>
>> That said, I'm seeing big enough swings in the percentages between runs
>> that I'd like to get some tips on how to smooth out the variations -
>> e.g. will increasing the warp factor increase the amount of time each
>> individual run takes?
>
> Increasing the number of rounds (-n) is probably better.
> Also, if you are on a laptop or a modern desktop machine, check that CPU
> frequency scaling is disabled before running any benchmark (on Linux,
> cpufreq-set -g performance does the trick).



I don't think there is any way to stop CPU frequency scaling on Mac
OS X. Also, comparing 2.6 rc1 to the system Python 2.5 is not fair either
(does anyone really know how Apple compiled its Python?). Comparing two
different processor lines on different operating systems also introduces
a fair number of variables.

I would suggest compiling 2.5 and 2.6 from source, running the benchmark x
times, taking the smallest time of each test (so OS and CPU scaling
don't influence the benchmark so much), and then comparing the results.
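
The "smallest time of several runs" idea can be sketched with the standard
timeit module. This is only an illustration under assumptions: best_of is a
made-up helper, and the dictionary lookup is a hypothetical stand-in for a
single pybench test, not one of the actual tests.

```python
import timeit


def best_of(stmt, setup="pass", repeats=5, number=100000):
    """Time `stmt` in `repeats` batches and return the fastest batch (seconds).

    The minimum is used because background load and frequency changes can
    only make a batch slower, never faster.
    """
    timings = timeit.repeat(stmt, setup=setup, repeat=repeats, number=number)
    return min(timings)


if __name__ == "__main__":
    # Hypothetical micro-benchmark standing in for one pybench test.
    best = best_of("d['key']", setup="d = {'key': 1}")
    print("fastest of 5 batches: %.2f ms" % (best * 1000.0))
```

Running the same script under each interpreter built from source and
comparing the two minima follows the procedure suggested above.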


--
Leonardo Santagada
santagada at gmail.com





Re: [Python-Dev] 2.6 rc1 performance results

2008-09-13 Thread Brett Hoerner
On Sat, Sep 13, 2008 at 6:05 PM, Leonardo Santagada [EMAIL PROTECTED] wrote:
> I would suggest compiling 2.5 and 2.6 from source, running the benchmark x
> times, taking the smallest time of each test (so OS and CPU scaling don't
> influence the benchmark so much), and then comparing the results.

I didn't actually run them and pick the smallest, but I did just
compile both from source to keep the environments as close as
possible.

Both compiled from source with Apple's GCC 4.0.1 on a MacBook Pro
(Intel Core 2 Duo, 4GB RAM) running OS X 10.5.4.  Minimal apps
running, plugged in to avoid obvious CPU scaling (I'm sure it drops
when you're on battery).

this = Python 2.6.0rc1
other = Python 2.5.2

Brett

---

Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   120ms   120ms   +0.2%   121ms   121ms   +0.1%
           BuiltinMethodLookup:    92ms   109ms  -15.7%    93ms   110ms  -15.4%
                 CompareFloats:    87ms    87ms   -0.8%    87ms    88ms   -0.9%
         CompareFloatsIntegers:    91ms    83ms   +9.6%    92ms    84ms   +9.6%
               CompareIntegers:    80ms    81ms   -1.4%    80ms    81ms   -1.5%
        CompareInternedStrings:    88ms    86ms   +2.9%    89ms    87ms   +1.3%
                  CompareLongs:    84ms    78ms   +8.1%    85ms    78ms   +8.0%
                CompareStrings:    69ms    71ms   -2.4%    72ms    74ms   -2.2%
                CompareUnicode:    96ms    96ms   -0.5%    99ms    99ms   -0.2%
    ComplexPythonFunctionCalls:   128ms     0ms     n/a   129ms     0ms     n/a
                 ConcatStrings:   125ms   119ms   +5.5%   129ms   122ms   +5.1%
                 ConcatUnicode:    78ms    70ms  +11.0%    79ms    71ms  +11.9%
               CreateInstances:   136ms   135ms   +1.1%   137ms   136ms   +0.8%
            CreateNewInstances:   102ms   118ms  -13.5%   103ms   118ms  -12.9%
       CreateStringsWithConcat:   111ms   117ms   -4.6%   112ms   118ms   -4.7%
       CreateUnicodeWithConcat:    82ms   122ms  -33.0%    84ms   125ms  -33.3%
                  DictCreation:    90ms    86ms   +4.9%    91ms    91ms   +0.5%
             DictWithFloatKeys:    89ms   107ms  -17.0%    91ms   110ms  -17.8%
           DictWithIntegerKeys:    86ms    85ms   +0.7%    87ms    86ms   +1.6%
            DictWithStringKeys:    79ms    80ms   -0.5%    80ms    81ms   -1.1%
                      ForLoops:    73ms    80ms   -8.6%    75ms    81ms   -7.5%
                    IfThenElse:    77ms    81ms   -4.7%    78ms    82ms   -4.8%
                   ListSlicing:   106ms   139ms  -23.9%   107ms   142ms  -24.2%
                NestedForLoops:   103ms   101ms   +1.5%   106ms   104ms   +1.8%
          NormalClassAttribute:    99ms   118ms  -16.3%   100ms   120ms  -16.7%
       NormalInstanceAttribute:    88ms   107ms  -17.9%    88ms   107ms  -17.6%
           PythonFunctionCalls:    89ms    94ms   -5.1%    90ms    95ms   -4.9%
             PythonMethodCalls:   131ms   135ms   -3.4%   132ms   137ms   -4.0%
                     Recursion:   124ms   128ms   -3.4%   125ms   130ms   -3.7%
                  SecondImport:    92ms    84ms   +9.1%    92ms    85ms   +8.9%
           SecondPackageImport:    97ms    88ms  +10.2%    98ms    89ms   +9.2%
         SecondSubmoduleImport:   125ms   112ms  +11.8%   126ms   113ms  +11.0%
       SimpleComplexArithmetic:   100ms    98ms   +2.4%   101ms    99ms   +1.8%
        SimpleDictManipulation:    88ms    92ms   -4.7%    89ms    94ms   -4.9%
         SimpleFloatArithmetic:    89ms   106ms  -16.2%    91ms   110ms  -16.5%
      SimpleIntFloatArithmetic:    73ms    87ms  -16.1%    73ms    87ms  -16.1%
       SimpleIntegerArithmetic:    73ms    88ms  -17.5%    73ms    89ms  -17.5%
        SimpleListManipulation:    84ms    78ms   +7.3%    85ms    84ms   +1.1%
          SimpleLongArithmetic:   108ms   106ms   +1.9%   109ms   107ms   +1.8%
                    SmallLists:   119ms   120ms   -0.9%   120ms   124ms   -3.1%
                   SmallTuples:   113ms   105ms   +7.7%   115ms   106ms   +7.9%
         SpecialClassAttribute:    96ms   116ms  -17.6%    97ms   118ms  -17.8%
      SpecialInstanceAttribute:   158ms   179ms  -11.9%   159ms   181ms  -12.1%
                StringMappings:   156ms   162ms   -3.8%   156ms   162ms   -3.6%
              StringPredicates:   128ms   153ms  -16.5%   129ms   154ms  -16.3%
                 StringSlicing:   113ms   100ms  +13.1%   121ms   103ms  +17.0%
                     TryExcept:    69ms    72ms   -3.6%    69ms    72ms   -3.8%
                    TryFinally:    96ms     0ms     n/a    97ms     0ms     n/a
                TryRaiseExcept:    98ms   101ms   -2.2%    99ms   101ms   -1.8%
                  TupleSlicing:   136ms   147ms   -7.2%   141ms   148ms   -4.9%
               UnicodeMappings:   112ms    99ms  +14.1%