As a bystander, I find this all fascinating (I had actually anticipated that the //10 optimization came from PGO).
Does the optimization for //10 actually help in the real world? It would if people did a lot of manual conversion to decimal, which is most easily expressed using //10. But presumably for that people mostly end up using str() or repr(), which have their own custom code, long_to_decimal_string_internal(). Then again, I'm not sure what's *lost* even if this optimization is pointless -- surely it doesn't slow other divisions down enough to be measurable.

On Sun, Jan 16, 2022 at 12:35 PM Mark Dickinson <dicki...@gmail.com> wrote:

> On Sun, Jan 16, 2022 at 12:08 PM Mark Dickinson <dicki...@gmail.com>
> wrote:
>
>> So gcc is anticipating divisions by 10 and introducing special-case
>> divide-by-reciprocal-multiply code for that case, and presumably the
>> profile generated for the PGO backs up this being a common enough case, so
>> we end up with the above code in the final compilation.
>>
>
> Nope, that's not what's happening. This analysis is backwards, and
> unfairly attributes to GCC the apparently arbitrary choice to
> optimise division by 10. But it's not GCC's fault; it's ours. What's
> *actually* happening is that GCC is simply recording values for n used in
> calls to divrem1 (via the -fprofile-values option, which is implied by
> -fprofile-generate, which is used as a result of the --enable-optimizations
> configure script option). It's then noticing that in our profile task
> (which consists of a selection of Lib/test/test_*.py test files) we most
> often do divisions by 10, and so it optimizes that case.
>
> To test this hypothesis I added a large number of tests for division by 17
> in test_long.py, and then recompiled from scratch (again with
> --enable-optimizations). Here are the results:
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s "x=10**1000; y=10" "x//y"
> 1000000 loops, best of 5: 1.14 usec per loop
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s "x=10**1000; y=17" "x//y"
> 1000000 loops, best of 5: 306 nsec per loop
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s "x=10**1000; y=1" "x//y"
> 1000000 loops, best of 5: 1.14 usec per loop
>
> root@341b5fd44b23:/home/cpython# ./python -m timeit -n 1000000 -s "x=10**1000; y=2" "x//y"
> 1000000 loops, best of 5: 1.15 usec per loop
>
> As expected, division by 17 is now optimised; division by 10 is as slow as
> division by other small scalars.
>
> --
> Mark
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/2MOQCVMEQBV7PATT47GUYHS42QIJHTRK/
> Code of Conduct: http://python.org/psf/codeofconduct/

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
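To make the mechanism Mark describes concrete, here is a minimal Python sketch; it is not CPython's or GCC's actual code. It assumes a 64-bit unsigned dividend (an assumption about the C types involved) and models the profile-driven specialization as a single `if d == hot_divisor` check; the helper names `reciprocal_magic` and `make_divider` are hypothetical. The multiply-and-shift trick works for any constant divisor, which fits Mark's result: whether 10 or 17 gets the fast path depends on which value dominated the profile, not on anything special about 10.

```python
import random


def reciprocal_magic(d: int, bits: int = 64) -> tuple[int, int]:
    """Find (m, k) such that x // d == (x * m) >> k for all 0 <= x < 2**bits.

    Round-up reciprocal construction: m = ceil(2**k / d), where k = bits + s
    for the smallest s whose rounding error m*d - 2**k is at most 2**s.
    """
    if d <= 0:
        raise ValueError("divisor must be positive")
    s = 0
    while True:
        k = bits + s
        m = -(-(1 << k) // d)  # ceil(2**k / d) using only integer arithmetic
        if m * d - (1 << k) <= (1 << s):
            return m, k
        s += 1


def make_divider(hot_divisor: int, bits: int = 64):
    """Hypothetical model of value-profile specialization: the divisor seen
    most often while profiling gets a multiply-and-shift fast path, and
    every other divisor takes the generic path."""
    m, k = reciprocal_magic(hot_divisor, bits)

    def divide(x: int, d: int) -> int:
        if not 0 <= x < (1 << bits):
            raise ValueError("x out of range for the modelled word size")
        if d == hot_divisor:
            return (x * m) >> k  # fast path: multiply and shift, no divide
        return x // d            # generic path: a hardware divide in C
    return divide


if __name__ == "__main__":
    m10, k10 = reciprocal_magic(10)
    print(hex(m10), k10)  # prints 0xcccccccccccccccd 67
    div_hot_10 = make_divider(10)  # profile dominated by division by 10
    div_hot_17 = make_divider(17)  # profile after adding many //17 tests
    for _ in range(10_000):
        x = random.getrandbits(64)
        for d in (1, 2, 10, 17):
            assert div_hot_10(x, d) == div_hot_17(x, d) == x // d
    print("ok")
```

The multiplier and shift for 10 are the usual constants for unsigned 64-bit division by 10. In C the fast path pays off because a hardware divide is much slower than a multiply; in this Python model both paths are ordinary bigint operations, so it demonstrates correctness, not speed.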
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/67DXYX3YLMDC5R4X6FI3NMRT2TGZDZHC/
Code of Conduct: http://python.org/psf/codeofconduct/
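For reference, the "manual conversion to decimal" Guido mentions looks roughly like the sketch below: one division by 10 per output digit. The name `to_decimal` is hypothetical, and the idea that each `divmod(n, 10)` on a multi-digit int goes through the same small-divisor division path as `x//y` in Mark's timings is an assumption, not something measured in the thread.

```python
def to_decimal(n: int) -> str:
    """Convert an int to its decimal string using repeated division by 10.

    Hypothetical helper; str(n) and repr(n) use CPython's own
    long_to_decimal_string_internal() instead.
    """
    if n < 0:
        return "-" + to_decimal(-n)
    digits = []
    while True:
        n, r = divmod(n, 10)  # one //10 and %10 per output digit
        digits.append("0123456789"[r])
        if n == 0:
            break
    return "".join(reversed(digits))


if __name__ == "__main__":
    for value in (0, 7, 10, -42, 10**1000 + 12345):
        assert to_decimal(value) == str(value)
    print("ok")
```

Since str() and repr() take their own route, loops like this are presumably rare in practice, which is the crux of Guido's question about real-world benefit.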