As a bystander, I find this all fascinating (I had actually anticipated
that the //10 optimization came from PGO).

Does the optimization for //10 actually help in the real world? It would if
people did a lot of manual conversion to decimal, which is most easily
expressed using //10. But presumably people mostly end up using str() or
repr() for that, and those have their own custom code,
long_to_decimal_string_internal().
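
For illustration, here is a minimal sketch of the kind of manual decimal
conversion meant above. The name to_decimal_manual is hypothetical (it is
not part of CPython), and the sketch only handles non-negative integers;
each loop iteration performs one division of the running value by 10.

```python
def to_decimal_manual(n: int) -> str:
    """Convert a non-negative int to a decimal string by repeated division by 10.

    Hypothetical helper for illustration only; str(n) does not work this way.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    digits = []
    while n:
        # divmod(n, 10) yields the same quotient as n // 10, plus the low digit.
        n, r = divmod(n, 10)
        digits.append(chr(ord("0") + r))
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))
```

In practice str(n) gives the same result without any Python-level loop,
which is why the manual form is probably rare in real code.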

Then again I'm not sure what's *lost* even if this optimization is
pointless -- surely it doesn't slow other divisions down enough to be
measurable.
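
One way to check this on a given build would be to time floor division of
the same large integer by several small divisors, much like the timeit
runs quoted below. This is only a sketch: the function name
time_floor_division and its defaults are assumptions chosen for
illustration, and the numbers will depend on the machine and on how the
interpreter was built.

```python
import timeit


def time_floor_division(divisors=(10, 17, 1, 2), number=100_000, repeat=5):
    """Return {divisor: best time in nanoseconds per x // divisor}.

    Hypothetical helper; x is fixed at 10**1000 as in the quoted timeit runs.
    """
    x = 10**1000
    results = {}
    for y in divisors:
        timer = timeit.Timer("x // y", globals={"x": x, "y": y})
        # Take the best of several repeats, as the timeit CLI does.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[y] = best / number * 1e9
    return results


if __name__ == "__main__":
    for y, nsec in time_floor_division().items():
        print(f"x // {y}: {nsec:.0f} nsec per loop")
```

If the special case costs other divisors anything measurable, it should
show up as a difference between builds for the divisors that are not
specialised.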

On Sun, Jan 16, 2022 at 12:35 PM Mark Dickinson <dicki...@gmail.com> wrote:

> On Sun, Jan 16, 2022 at 12:08 PM Mark Dickinson <dicki...@gmail.com>
> wrote:
>
>> So gcc is anticipating divisions by 10 and introducing special-case
>> divide-by-reciprocal-multiply code for that case, and presumably the
>> profile generated for the PGO backs up this being a common enough case, so
>> we end up with the above code in the final compilation.
>>
>
> Nope, that's not what's happening. This analysis is backwards, and
> unfairly attributes to GCC the apparently arbitrary choice to
> optimise division by 10. But it's not GCC's fault; it's ours. What's
> *actually* happening is that GCC is simply recording values for n used in
> calls to divrem1 (via the -fprofile-values option, which is implied by
> -fprofile-generate, which is used as a result of the --enable-optimizations
> configure script option). It's then noticing that in our profile task
> (which consists of a selection of Lib/test/test_*.py test files) we most
> often do divisions by 10, and so it optimizes that case.
>
> To test this hypothesis I added a large number of tests for division by 17
> in test_long.py, and then recompiled from scratch (again with
> --enable-optimizations). Here are the results:
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s
> "x=10**1000; y=10" "x//y"
>
> 1000000 loops, best of 5: 1.14 usec per loop
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s
> "x=10**1000; y=17" "x//y"
>
> 1000000 loops, best of 5: 306 nsec per loop
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s
> "x=10**1000; y=1" "x//y"
>
> 1000000 loops, best of 5: 1.14 usec per loop
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s
> "x=10**1000; y=2" "x//y"
>
> 1000000 loops, best of 5: 1.15 usec per loop
>
> As expected, division by 17 is now optimised; division by 10 is as slow as
> division by other small scalars.
>
> --
> Mark
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/2MOQCVMEQBV7PATT47GUYHS42QIJHTRK/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/67DXYX3YLMDC5R4X6FI3NMRT2TGZDZHC/
