[Python-Dev] 2.6 rc1 performance results
Three weeks ago, Antoine Pitrou posted the pybench results for 2.6 trunk:
http://mail.python.org/pipermail/python-dev/2008-August/081951.html

The big discovery in those results was TryExcept being 48% slower, but there was a patch in the bug tracker to improve things. I've re-run the tests to check the results.

Disclaimer: these results are probably not directly comparable. Antoine was using a 32-bit Linux installation on an Athlon 3600+ X2; I'm on a MacBook.

Good news: TryExcept is now only 10% slower than 2.5, not 48%.

Bad news: the big slowdowns are:

CompareFloats:               117ms    98ms   +19.2%   118ms    99ms   +19.0%
CompareIntegers:             110ms   104ms    +5.6%   110ms   105ms    +4.9%
DictWithStringKeys:          118ms   105ms   +12.8%   133ms   108ms   +22.7%
NestedForLoops:              125ms   116ms    +7.7%   127ms   118ms    +8.0%
Recursion:                   193ms   159ms   +21.5%   197ms   163ms   +20.8%
SecondImport:                139ms   129ms    +8.4%   143ms   130ms    +9.9%
SecondPackageImport:         150ms   139ms    +8.6%   152ms   140ms    +8.1%
SecondSubmoduleImport:       211ms   191ms   +10.5%   214ms   195ms    +9.4%
SimpleComplexArithmetic:     130ms   119ms    +9.4%   131ms   120ms    +9.2%

Antoine, your Recursion results were actually about the same (+2.2%) from 2.5 to 2.6, so this big slowdown is novel. I wonder if these tests are simply slower on MacOS for some reason (compiler, CPU cache size, etc.). Does anyone see similar results? Any idea what might have made DictWithStringKeys and Recursion slow down?
Complete results:

Test                           minimum run-time         average run-time
                              this   other     diff    this   other     diff
----------------------------------------------------------------------------
BuiltinFunctionCalls:        140ms   148ms    -5.4%   142ms   153ms    -7.5%
BuiltinMethodLookup:         120ms   135ms   -11.2%   122ms   137ms   -11.0%
CompareFloats:               117ms    98ms   +19.2%   118ms    99ms   +19.0%
CompareFloatsIntegers:       109ms   119ms    -8.9%   109ms   121ms    -9.3%
CompareIntegers:             110ms   104ms    +5.6%   110ms   105ms    +4.9%
CompareInternedStrings:      128ms   153ms   -16.3%   131ms   158ms   -16.8%
CompareLongs:                102ms    99ms    +3.5%   105ms   101ms    +3.9%
CompareStrings:              164ms   161ms    +2.0%   166ms   165ms    +0.7%
CompareUnicode:              141ms   158ms   -10.5%   143ms   164ms   -12.6%
ComplexPythonFunctionCalls:  159ms   272ms   -41.3%   164ms   277ms   -40.6%
ConcatStrings:               173ms   168ms    +3.2%   177ms   172ms    +3.1%
ConcatUnicode:               108ms   121ms   -10.8%   111ms   124ms   -10.4%
CreateInstances:             168ms   180ms    -6.4%   176ms   182ms    -3.7%
CreateNewInstances:          129ms   153ms   -15.6%   132ms   158ms   -16.0%
CreateStringsWithConcat:     156ms   157ms    -0.7%   158ms   161ms    -1.9%
CreateUnicodeWithConcat:     112ms   114ms    -1.8%   114ms   117ms    -2.2%
DictCreation:                104ms   112ms    -7.1%   106ms   114ms    -7.2%
DictWithFloatKeys:           149ms   162ms    -7.7%   153ms   168ms    -8.7%
DictWithIntegerKeys:         123ms   148ms   -16.8%   127ms   151ms   -15.9%
DictWithStringKeys:          118ms   105ms   +12.8%   133ms   108ms   +22.7%
ForLoops:                     91ms    88ms    +3.6%    91ms    88ms    +3.0%
IfThenElse:                  108ms   102ms    +5.2%   109ms   103ms    +5.5%
ListSlicing:                 155ms   239ms   -35.0%   157ms   241ms   -34.6%
NestedForLoops:              125ms   116ms    +7.7%   127ms   118ms    +8.0%
NormalClassAttribute:        135ms   140ms    -3.8%   139ms   146ms    -4.7%
NormalInstanceAttribute:     123ms   126ms    -2.4%   125ms   130ms    -4.4%
PythonFunctionCalls:         126ms   126ms    +0.0%   129ms   128ms    +0.9%
PythonMethodCalls:           165ms   165ms    -0.1%   168ms   170ms    -1.1%
Recursion:                   193ms   159ms   +21.5%   197ms   163ms   +20.8%
SecondImport:                139ms   129ms    +8.4%   143ms   130ms    +9.9%
SecondPackageImport:         150ms   139ms    +8.6%   152ms   140ms    +8.1%
SecondSubmoduleImport:       211ms   191ms   +10.5%   214ms   195ms    +9.4%
SimpleComplexArithmetic:     130ms   119ms    +9.4%   131ms   120ms    +9.2%
SimpleDictManipulation:      124ms   146ms   -14.6%   128ms   150ms   -14.8%
SimpleFloatArithmetic:       127ms   132ms    -3.6%   131ms   144ms    -9.3%
SimpleIntFloatArithmetic:     93ms   100ms    -6.5%    94ms   100ms    -5.6%
SimpleIntegerArithmetic:      94ms    91ms    +2.8%    95ms    92ms    +3.1%
SimpleListManipulation:      108ms   110ms    -1.1%   110ms   111ms    -1.2%
SimpleLongArithmetic:        141ms   136ms    +3.8%   143ms   139ms    +2.8%
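For anyone wanting to re-check the diff columns, here is a minimal sketch. It assumes the diff is the relative change of "this" against "other", which matches the figures above to within the rounding of the millisecond values. The function names and the 5% threshold are made up for illustration and are not part of pybench.

```python
def percent_diff(this_ms, other_ms):
    """Relative change of `this` against `other`, in percent.

    Returns None when `other_ms` is zero, since no ratio can be formed.
    """
    if other_ms == 0:
        return None
    return (float(this_ms) / other_ms - 1.0) * 100.0


def slowdowns(rows, threshold=5.0):
    """Return (name, diff) pairs slower than `threshold` percent, worst first.

    `rows` is an iterable of (test name, this_ms, other_ms) tuples.
    """
    flagged = []
    for name, this_ms, other_ms in rows:
        diff = percent_diff(this_ms, other_ms)
        if diff is not None and diff > threshold:
            flagged.append((name, diff))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# A few minimum run-times copied from the table above.
rows = [
    ("CompareFloats", 117, 98),
    ("Recursion", 193, 159),
    ("ListSlicing", 155, 239),
    ("PythonFunctionCalls", 126, 126),
]

for name, diff in slowdowns(rows):
    print("%-24s %+6.1f%%" % (name, diff))
```

Because the input here is already rounded to whole milliseconds, the recomputed percentages can differ from the printed ones by a few tenths of a percent.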
Re: [Python-Dev] 2.6 rc1 performance results
A.M. Kuchling amk at amk.ca writes:
Bad news: the big slowdowns are: [snip]

I don't get the same results, but there can be significant variations between two pybench runs. Did you use the same compiler and the same flags for both Python versions?

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 2.6 rc1 performance results
On Sat, 13 Sep 2008 08:03:50 -0400, A.M. Kuchling [EMAIL PROTECTED] wrote:
Three weeks ago, Antoine Pitrou posted the pybench results for 2.6 trunk:
http://mail.python.org/pipermail/python-dev/2008-August/081951.html

The big discovery in those results was TryExcept being 48% slower, but there was a patch in the bug tracker to improve things. I've re-run the tests to check the results.

Disclaimer: these results are probably not directly comparable. Antoine was using a 32-bit Linux installation on an Athlon 3600+ X2; I'm on a MacBook.

Good news: TryExcept is now only 10% slower than 2.5, not 48%.

Bad news: the big slowdowns are:

CompareFloats:               117ms    98ms   +19.2%   118ms    99ms   +19.0%
CompareIntegers:             110ms   104ms    +5.6%   110ms   105ms    +4.9%
DictWithStringKeys:          118ms   105ms   +12.8%   133ms   108ms   +22.7%
NestedForLoops:              125ms   116ms    +7.7%   127ms   118ms    +8.0%
Recursion:                   193ms   159ms   +21.5%   197ms   163ms   +20.8%
SecondImport:                139ms   129ms    +8.4%   143ms   130ms    +9.9%
SecondPackageImport:         150ms   139ms    +8.6%   152ms   140ms    +8.1%
SecondSubmoduleImport:       211ms   191ms   +10.5%   214ms   195ms    +9.4%
SimpleComplexArithmetic:     130ms   119ms    +9.4%   131ms   120ms    +9.2%

I see similar results for some of these. The complete results from a run on an AMD Athlon(tm) 64 Processor 3200+ are attached.
Jean-Paul

----------------------------------------------------------------------------
PYBENCH 2.0
----------------------------------------------------------------------------
* using CPython 2.6rc1 (trunk:66421M, Sep 12 2008, 21:05:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

----------------------------------------------------------------------------
Benchmark: p26.pybench
----------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Linux-2.6.24-19-generic-i686-with-debian-lenny-sid
       Processor:

    Python:
       Implementation: CPython
       Executable:     /home/exarkun/Projects/python/trunk//python
       Version:        2.6.0
       Compiler:       GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
       Bits:           32bit
       Build:          Sep 12 2008 21:05:52 (#trunk:66421M)
       Unicode:        UCS2

----------------------------------------------------------------------------
Comparing with: p25.pybench
----------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Linux-2.6.24-19-generic-i686-with-debian-lenny-sid
       Processor:

    Python:
       Implementation: n/a
       Executable:     /home/exarkun/Projects/python/branches/release25-maint/python
       Version:        2.5.3a0
       Compiler:       GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
       Bits:           32bit
       Build:          Sep 13 2008 09:32:41 (#release25-maint:66444)
       Unicode:        UCS2

Test                           minimum run-time         average run-time
                              this   other     diff    this   other     diff
----------------------------------------------------------------------------
BuiltinFunctionCalls:        178ms   187ms    -4.5%   184ms   193ms    -4.6%
BuiltinMethodLookup:         151ms   165ms    -8.5%   155ms   167ms    -7.2%
CompareFloats:               150ms   146ms    +2.9%   153ms   150ms    +1.9%
CompareFloatsIntegers:       143ms   147ms    -2.8%   150ms   150ms    +0.4%
CompareIntegers:             180ms   182ms    -1.0%   182ms   190ms    -4.3%
CompareInternedStrings:      159ms   160ms    -1.1%   163ms   166ms    -2.0%
CompareLongs:                135ms   136ms    -0.7%   136ms   139ms    -1.5%
CompareStrings:              142ms   150ms    -5.4%   146ms   153ms    -4.5%
CompareUnicode:              148ms   135ms    +9.6%   151ms   137ms   +10.6%
ComplexPythonFunctionCalls:  155ms   226ms   -31.4%   158ms   229ms   -30.9%
ConcatStrings:               197ms   203ms    -2.8%   202ms   215ms    -6.4%
ConcatUnicode:               179ms   168ms    +6.6%   182ms   184ms    -0.8%
CreateInstances:             159ms   157ms    +1.4%   162ms   161ms    +0.7%
CreateNewInstances:          119ms   141ms   -15.4%   121ms   144ms   -16.2%
CreateStringsWithConcat:     189ms   173ms    +9.3%   195ms   177ms   +10.2%
CreateUnicodeWithConcat:     116ms   113ms    +2.3%   118ms   115ms    +2.6%
DictCreation:                109ms   140ms   -22.2%   112ms   143ms   -21.8%
DictWithFloatKeys:           202ms   199ms    +1.6%   208ms   204ms    +1.6%
DictWithIntegerKeys:         158ms   156ms    +1.0%   161ms
Re: [Python-Dev] 2.6 rc1 performance results
A.M. Kuchling wrote:
Antoine, your Recursion results were actually about the same (+2.2%) from 2.5 to 2.6, so this big slowdown is novel. I wonder if these tests are simply slower on MacOS for some reason (compiler, CPU cache size, etc.). Does anyone see similar results? Any idea what might have made DictWithStringKeys and Recursion slow down?

The change to universal binaries, perhaps?

Your results showed quite a few slowdowns in number-related code, while my local testing shows primarily speed increases in those areas.

That said, I'm seeing big enough swings in the percentages between runs that I'd like to get some tips on how to smooth out the variations - e.g. will increasing the warp factor increase the amount of time each individual run takes?

Although on a Mac OS X specific front... could the conversion to universal binaries have made a difference? Do you get the same performance numbers for a local build as you do for the version from the installer?

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] 2.6 rc1 performance results
The change to universal binaries, perhaps?

That shouldn't really matter - the machine code should still be the same, and it should all get loaded at program startup. IOW, startup and imports may get slower, but otherwise, it should have no impact.

Regards,
Martin
Re: [Python-Dev] 2.6 rc1 performance results
Nick Coghlan ncoghlan at gmail.com writes:
That said, I'm seeing big enough swings in the percentages between runs that I'd like to get some tips on how to smooth out the variations - e.g. will increasing the warp factor increase the amount of time each individual run takes?

Increasing the number of rounds (-n) is probably better. Also, if you are on a laptop or a modern desktop machine, check that CPU frequency scaling is disabled before running any benchmark (on Linux, cpufreq-set -g performance does the trick).
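As a rough way to confirm the governor setting before a run, the sketch below reads the per-CPU scaling governor from sysfs. It assumes a Linux kernel that exposes cpufreq under /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor, which is not the case on every machine. The helper names are hypothetical and this is not something pybench does itself.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor.

    Returns an empty dict when no cpufreq entries are found, for example
    on a non-Linux system or a kernel without cpufreq support.
    """
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpuN/cpufreq/scaling_governor -> 'cpuN'
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as handle:
            governors[cpu] = handle.read().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(
        value == "performance" for value in governors.values()
    )


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information found; cannot check frequency scaling")
    elif all_performance(found):
        print("all CPUs use the performance governor")
    else:
        print("frequency scaling may be active: %r" % (found,))
```

An empty result only means the check could not be done, not that scaling is off, which is why `all_performance` returns False in that case.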
Re: [Python-Dev] 2.6 rc1 performance results
On Sep 13, 2008, at 1:03 PM, Antoine Pitrou wrote:
Nick Coghlan ncoghlan at gmail.com writes:
That said, I'm seeing big enough swings in the percentages between runs that I'd like to get some tips on how to smooth out the variations - e.g. will increasing the warp factor increase the amount of time each individual run takes?

Increasing the number of rounds (-n) is probably better. Also, if you are on a laptop or a modern desktop machine, check that CPU frequency scaling is disabled before running any benchmark (on Linux, cpufreq-set -g performance does the trick).

I don't think there is any way to stop CPU frequency scaling on Mac OS X. Comparing 2.6 rc1 to the system Python 2.5 is not fair either (does anyone really know how Apple compiled its Python?). Also, comparing two different processor lines on different operating systems introduces a fair number of variables into any comparison.

I would suggest compiling 2.5 and 2.6 from source, running the benchmark x times, taking the smallest time for each test (so that OS activity and CPU scaling influence the benchmark less), and then comparing the results.

--
Leonardo Santagada
santagada at gmail.com
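A minimal sketch of the "smallest time of each test" idea, assuming the per-test timings from each run have already been collected into dicts mapping test name to milliseconds. How they get collected is left out, and the function names and sample numbers are invented for illustration.

```python
def best_times(runs):
    """Return the smallest time seen for each test across all runs.

    `runs` is a list of dicts mapping test name -> time in milliseconds.
    """
    best = {}
    for run in runs:
        for name, ms in run.items():
            if name not in best or ms < best[name]:
                best[name] = ms
    return best


def compare_best(this_runs, other_runs):
    """Percent change of the best `this` time against the best `other` time.

    Only tests present in both sets of runs, with a non-zero `other` time,
    are reported.
    """
    this_best = best_times(this_runs)
    other_best = best_times(other_runs)
    report = {}
    for name in sorted(set(this_best) & set(other_best)):
        other = other_best[name]
        if other > 0:
            report[name] = (float(this_best[name]) / other - 1.0) * 100.0
    return report


# Made-up timings for two runs of each interpreter (not measured data).
py26_runs = [{"Recursion": 130, "TryExcept": 70}, {"Recursion": 124, "TryExcept": 75}]
py25_runs = [{"Recursion": 128, "TryExcept": 72}, {"Recursion": 135, "TryExcept": 80}]

for name, diff in sorted(compare_best(py26_runs, py25_runs).items()):
    print("%-12s %+6.1f%%" % (name, diff))
```

Taking the minimum rather than the mean means a run disturbed by background activity or a frequency drop can only be ignored, never averaged in.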
Re: [Python-Dev] 2.6 rc1 performance results
On Sat, Sep 13, 2008 at 6:05 PM, Leonardo Santagada [EMAIL PROTECTED] wrote:
I would suggest compiling 2.5 and 2.6 from source, running the benchmark x times, taking the smallest time for each test (so that OS activity and CPU scaling influence the benchmark less), and then comparing the results.

I didn't actually run them and pick the smallest, but I did just compile both from source to keep the environments as close as possible. Both compiled from source with Apple's GCC 4.0.1 on a MacBook Pro (Intel Core 2 Duo, 4GB RAM) running OS X 10.5.4. Minimal apps running, plugged in to avoid obvious CPU scaling (I'm sure it drops when you're on battery).

this = Python 2.6.0rc1
other = Python 2.5.2

Brett

----------------------------------------------------------------------------
Test                           minimum run-time         average run-time
                              this   other     diff    this   other     diff
----------------------------------------------------------------------------
BuiltinFunctionCalls:        120ms   120ms    +0.2%   121ms   121ms    +0.1%
BuiltinMethodLookup:          92ms   109ms   -15.7%    93ms   110ms   -15.4%
CompareFloats:                87ms    87ms    -0.8%    87ms    88ms    -0.9%
CompareFloatsIntegers:        91ms    83ms    +9.6%    92ms    84ms    +9.6%
CompareIntegers:              80ms    81ms    -1.4%    80ms    81ms    -1.5%
CompareInternedStrings:       88ms    86ms    +2.9%    89ms    87ms    +1.3%
CompareLongs:                 84ms    78ms    +8.1%    85ms    78ms    +8.0%
CompareStrings:               69ms    71ms    -2.4%    72ms    74ms    -2.2%
CompareUnicode:               96ms    96ms    -0.5%    99ms    99ms    -0.2%
ComplexPythonFunctionCalls:  128ms     0ms      n/a   129ms     0ms      n/a
ConcatStrings:               125ms   119ms    +5.5%   129ms   122ms    +5.1%
ConcatUnicode:                78ms    70ms   +11.0%    79ms    71ms   +11.9%
CreateInstances:             136ms   135ms    +1.1%   137ms   136ms    +0.8%
CreateNewInstances:          102ms   118ms   -13.5%   103ms   118ms   -12.9%
CreateStringsWithConcat:     111ms   117ms    -4.6%   112ms   118ms    -4.7%
CreateUnicodeWithConcat:      82ms   122ms   -33.0%    84ms   125ms   -33.3%
DictCreation:                 90ms    86ms    +4.9%    91ms    91ms    +0.5%
DictWithFloatKeys:            89ms   107ms   -17.0%    91ms   110ms   -17.8%
DictWithIntegerKeys:          86ms    85ms    +0.7%    87ms    86ms    +1.6%
DictWithStringKeys:           79ms    80ms    -0.5%    80ms    81ms    -1.1%
ForLoops:                     73ms    80ms    -8.6%    75ms    81ms    -7.5%
IfThenElse:                   77ms    81ms    -4.7%    78ms    82ms    -4.8%
ListSlicing:                 106ms   139ms   -23.9%   107ms   142ms   -24.2%
NestedForLoops:              103ms   101ms    +1.5%   106ms   104ms    +1.8%
NormalClassAttribute:         99ms   118ms   -16.3%   100ms   120ms   -16.7%
NormalInstanceAttribute:      88ms   107ms   -17.9%    88ms   107ms   -17.6%
PythonFunctionCalls:          89ms    94ms    -5.1%    90ms    95ms    -4.9%
PythonMethodCalls:           131ms   135ms    -3.4%   132ms   137ms    -4.0%
Recursion:                   124ms   128ms    -3.4%   125ms   130ms    -3.7%
SecondImport:                 92ms    84ms    +9.1%    92ms    85ms    +8.9%
SecondPackageImport:          97ms    88ms   +10.2%    98ms    89ms    +9.2%
SecondSubmoduleImport:       125ms   112ms   +11.8%   126ms   113ms   +11.0%
SimpleComplexArithmetic:     100ms    98ms    +2.4%   101ms    99ms    +1.8%
SimpleDictManipulation:       88ms    92ms    -4.7%    89ms    94ms    -4.9%
SimpleFloatArithmetic:        89ms   106ms   -16.2%    91ms   110ms   -16.5%
SimpleIntFloatArithmetic:     73ms    87ms   -16.1%    73ms    87ms   -16.1%
SimpleIntegerArithmetic:      73ms    88ms   -17.5%    73ms    89ms   -17.5%
SimpleListManipulation:       84ms    78ms    +7.3%    85ms    84ms    +1.1%
SimpleLongArithmetic:        108ms   106ms    +1.9%   109ms   107ms    +1.8%
SmallLists:                  119ms   120ms    -0.9%   120ms   124ms    -3.1%
SmallTuples:                 113ms   105ms    +7.7%   115ms   106ms    +7.9%
SpecialClassAttribute:        96ms   116ms   -17.6%    97ms   118ms   -17.8%
SpecialInstanceAttribute:    158ms   179ms   -11.9%   159ms   181ms   -12.1%
StringMappings:              156ms   162ms    -3.8%   156ms   162ms    -3.6%
StringPredicates:            128ms   153ms   -16.5%   129ms   154ms   -16.3%
StringSlicing:               113ms   100ms   +13.1%   121ms   103ms   +17.0%
TryExcept:                    69ms    72ms    -3.6%    69ms    72ms    -3.8%
TryFinally:                   96ms     0ms      n/a    97ms     0ms      n/a
TryRaiseExcept:               98ms   101ms    -2.2%    99ms   101ms    -1.8%
TupleSlicing:                136ms   147ms    -7.2%   141ms   148ms    -4.9%
UnicodeMappings:             112ms    99ms   +14.1%