[issue22501] Optimise PyLong division by 1 or -1

2015-04-22 Thread A.M. Kuchling

A.M. Kuchling added the comment:

From reading the discussion thread, it looks like the consensus is to not 
apply this set of patches because the speed-up is unfortunately small.  
Closing as won't-fix; please re-open if someone wishes to pursue this again.

--
nosy: +akuchling
resolution:  - wont fix
stage: patch review - resolved
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-27 Thread Stefan Behnel

Stefan Behnel added the comment:

One more comment: I also benchmarked the change in long_true_div() now and 
found that it's only a minor improvement for large numbers and a 
*pessimisation* for small numbers:

Before:

$ ./python -m timeit -s 'x = 5' 'x / -1'
1000 loops, best of 3: 0.0313 usec per loop
$ ./python -m timeit -s 'x = 5' 'x / 1'
1000 loops, best of 3: 0.0307 usec per loop

$ ./python -m timeit -s 'x = 2**200 + 3**234 + 5**89 + 7**123' 'x / 1'
1000 loops, best of 3: 0.101 usec per loop
$ ./python -m timeit -s 'x = 2**200 + 3**234 + 5**89 + 7**123' 'x / -1'
1000 loops, best of 3: 0.104 usec per loop


Patched:

$ ./python -m timeit -s 'x = 5' 'x / 1'  
1000 loops, best of 3: 0.0569 usec per loop
$ ./python -m timeit -s 'x = 5' 'x / -1'
1000 loops, best of 3: 0.0576 usec per loop

$ ./python -m timeit -s 'x = 2**200 + 3**234 + 5**89 + 7**123' 'x / -1'
1000 loops, best of 3: 0.056 usec per loop
$ ./python -m timeit -s 'x = 2**200 + 3**234 + 5**89 + 7**123' 'x / 1' 
1000 loops, best of 3: 0.056 usec per loop

$ ./python -m timeit -s 'x = 2**200 + 3**234 + 5**89 + 7**123' 'x / -2'
1000 loops, best of 3: 0.106 usec per loop


So, just for completeness, here's the patch without that part, with changes 
only in l_divmod() and long_mul(), with the timeit results as given in my 
previous comment.

--
Added file: http://bugs.python.org/file36746/mul_div_by_1_fast_path_3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

New submission from Stefan Behnel:

The attached patch adds fast paths for PyLong division by 1 and -1, as well as 
dividing 0 by something. This was found helpful for fractions normalisation, as 
the GCD that is divided by can often be |1|, but firing up the whole division 
machinery for this eats a lot of CPU cycles for nothing.

There are currently two test failures in test_long.py because dividing a huge 
number by 1 or -1 no longer raises an OverflowError. This is a behavioural 
change, but I find it acceptable. If others agree, I'll fix the tests and 
submit a new patch.

--
components: Interpreter Core
files: div_by_1_fast_path.patch
keywords: patch
messages: 227590
nosy: mark.dickinson, pitrou, scoder, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Optimise PyLong division by 1 or -1
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file36729/div_by_1_fast_path.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Perhaps it would be worth to special case multiplying on 0, 1 and -1 and adding 
0, 1 and -1 too.

--
stage:  - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread STINNER Victor

STINNER Victor added the comment:

Any optimization requires a benchmark. What is the speedup?

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread STINNER Victor

STINNER Victor added the comment:

I proposed an optimization for x  0 (as part of a larger patch to optimize 
2 ** x) but the issue was rejected:
http://bugs.python.org/issue21420#msg217802

Mark Dickson wrote (msg217863):
There are many, many tiny optimisations we *could* be making in 
Objects/longobject.c; each of those potential optimisations adds to the cost of 
maintaining the code, detracts from readability, and potentially even slows 
down the common cases fractionally.  In general, I think we should only be 
applying this sort of optimization when there's a clear benefit to real-world 
code.  I don't think this one crosses that line.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Attaching a similar patch for long_mul().

--
Added file: http://bugs.python.org/file36730/mul_by_1_fast_path.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

 Any optimization requires a benchmark. What is the speedup?

I gave numbers in ticket #22464.


Since many Fraction input values can already be normalised for some reason, the 
following change shaves off almost 30% of the calls to 
PyNumber_InPlaceFloorDivide() in the telco benchmark during Fraction 
instantiation according to callgrind, thus saving 20% of the CPU instructions 
that go into tp_new().


I then proposed to move this into the PyLong type in general, rather than 
letting Fraction itself do it less efficiently.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

@Serhiy: moving the fast path into l_divmod() has the disadvantage of making it 
even more complex because we'd then also want to determine the modulus, but 
only if requested, and it can be 1, 0 or -1, depending on the second value. 
Sounds like a lot more if's.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Combined patch for both mul and div that fixes the return value of 
long_true_div(), as found by Serhiy, and removes the useless change in 
long_divrem(), as found by Antoine. Thanks!

All test_long.py tests pass now.

--
Added file: http://bugs.python.org/file36731/mul_div_by_1_fast_path.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

@Serhiy: please ignore my comment in msg227599. I'll submit a patch that moves 
the specialisation to l_divmod().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Thanks for the reviews, here's a new patch.

--
Added file: http://bugs.python.org/file36732/mul_div_by_1_fast_path_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Changes by Stefan Behnel sco...@users.sourceforge.net:


Removed file: http://bugs.python.org/file36732/mul_div_by_1_fast_path_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Sorry, last patch version contained a use before type check bug.

--
Added file: http://bugs.python.org/file36733/mul_div_by_1_fast_path_3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Here is an incremental patch that adds fast paths for adding and subtracting 0.

Question: the module calls long_long() in some places (e.g. long_abs()) and 
thus forces the return type to be exactly a PyLong and not a subtype. My 
changes use a plain incref+return input value in some places. Should they 
call long_long() on it instead?

--
Added file: http://bugs.python.org/file36734/add_sub_0_fast_path.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Le 26/09/2014 12:57, Stefan Behnel a écrit :
 
 Question: the module calls long_long() in some places (e.g.
long_abs()) and thus forces the return type to be exactly a PyLong and
not a subtype. My changes use a plain incref+return input value in
some places. Should they call long_long() on it instead?

Ah, yes, they should. The return type should not depend on the input
*values* :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Changes by Stefan Behnel sco...@users.sourceforge.net:


Added file: http://bugs.python.org/file36736/mul_div_by_1_fast_path_3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Ok, updating both patches.

--
Added file: http://bugs.python.org/file36735/add_sub_0_fast_path_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

I reran the fractions benchmark over the final result and the overall gain 
turned out to be, well, small. It's a clearly reproducible 2-3% faster. That's 
not bad for the macro impact of a micro-optimisation, but it's not a clear 
argument for throwing more code at it either.

I'll leave it to you to decide.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Oh, such small gain and only on one specific benchmark not included still in 
standard benchmark suite, looks discourage. May be other benchmarks have gain 
from these changes?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread STINNER Victor

STINNER Victor added the comment:

 2-3% faster

3% is not enough to justify the change.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Since Serhiy gave another round of valid feedback, here's an updated patch.

--
Added file: http://bugs.python.org/file36739/mul_div_by_1_fast_path_3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22501] Optimise PyLong division by 1 or -1

2014-09-26 Thread Stefan Behnel

Stefan Behnel added the comment:

I callgrinded it again and it confirmed that the gain when doing this inside of 
long_div() and friends is way lower than doing it right in Fraction.__new__(). 
It's not safe to do there, though, as is tests on integers are generally not 
a good idea in Python code. (Although it doesn't break anything if it fails, as 
it's a pure optimisation to avoid useless overhead.)

The micro benchmarks with timeit confirm that it's as fast as expected.

Large numbers before:

$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' ' -x'
1000 loops, best of 3: 0.177 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x * -2'
100 loops, best of 3: 0.329 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x // -2'
10 loops, best of 3: 2.8 usec per loop

$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x * -1'
100 loops, best of 3: 0.329 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x // -1'
10 loops, best of 3: 2.36 usec per loop

$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x * 1'
100 loops, best of 3: 0.333 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x // 1'
10 loops, best of 3: 2.37 usec per loop


Patched:

$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' ' -x'
1000 loops, best of 3: 0.176 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x * -2'
100 loops, best of 3: 0.328 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x // -2'
10 loops, best of 3: 2.8 usec per loop

$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x * -1'
1000 loops, best of 3: 0.177 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x // -1'
1000 loops, best of 3: 0.178 usec per loop

$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x * 1'
1000 loops, best of 3: 0.0244 usec per loop
$ ./python -m timeit -s 'x = 2**2000 + 3**234 + 5**891 + 7**1234' 'x // 1'
1000 loops, best of 3: 0.0258 usec per loop


Small numbers before:

$ ./python -m timeit -s 'x = 5' 'x * -2'
1000 loops, best of 3: 0.0408 usec per loop
$ ./python -m timeit -s 'x = 5' 'x // -2'
1000 loops, best of 3: 0.0714 usec per loop

$ ./python -m timeit -s 'x = 5' 'x * -1'
1000 loops, best of 3: 0.0293 usec per loop
$ ./python -m timeit -s 'x = 5' 'x * 1'
1000 loops, best of 3: 0.0282 usec per loop

$ ./python -m timeit -s 'x = 5' 'x // 1'
1000 loops, best of 3: 0.0529 usec per loop
$ ./python -m timeit -s 'x = 5' 'x // -1'
1000 loops, best of 3: 0.0536 usec per loop


Patched:

$ ./python -m timeit -s 'x = 5' 'x * -2'
1000 loops, best of 3: 0.0391 usec per loop
$ ./python -m timeit -s 'x = 5' 'x // -2'
1000 loops, best of 3: 0.0718 usec per loop

$ ./python -m timeit -s 'x = 5' 'x * -1'
1000 loops, best of 3: 0.0267 usec per loop
$ ./python -m timeit -s 'x = 5' 'x * 1'
1000 loops, best of 3: 0.0265 usec per loop

$ ./python -m timeit -s 'x = 5' 'x // 1'
1000 loops, best of 3: 0.0259 usec per loop
$ ./python -m timeit -s 'x = 5' 'x // -1'
1000 loops, best of 3: 0.0285 usec per loop


Note: we're talking µsecs here, not usually something to worry about. And it's 
unlikely that other benchmarks see similarly high speedups as the one for 
fractions (due to the relatively high likelihood of the GCD being 1 there).

I'm ok with closing this ticket as won't fix.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22501
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com