[Barry Scott and Steve Dower share tips for convincing Visual Studio
 to show assembler without recompiling the file]

Thanks, fellows! That mostly ;-) worked. Problem remaining is that
breakpoints just didn't work. They showed up "visually", and in the table
of set breakpoints, but code went whizzing right by them every time.

I didn't investigate. It's possible, e.g., that the connection between C
source and the generated PGO code was so obscure that VS just gave up - or
just blew it.

Instead I wrote a Python loop to run a division of interest "forever". That
way I hoped I'd be likely to land in the loop of interest by luck when I
broke into the process.
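
Something along these lines would do. It's a sketch, not the exact loop that
was run: the function name, the operand size, and the "iterations" escape
hatch (there only so the thing can be tested) are all made up for
illustration.

```python
import itertools

def spin_division(n, divisor=10, iterations=None):
    """Repeat n // divisor so that breaking into the process is likely
    to land inside the C digit loop.

    iterations=None spins forever; a finite count is only for testing.
    """
    counter = itertools.count() if iterations is None else range(iterations)
    q = None
    for _ in counter:
        q = n // divisor   # many-digit int divided by a one-digit divisor
    return q

# A big dividend keeps each division inside the C loop for a long time.
BIG = 10 ** 100000         # hypothetical size; anything with many digits works

# spin_division(BIG)       # uncomment, then break into the process from VS
```

The larger the dividend, the bigger the share of time spent in the digit
loop rather than in interpreter overhead, so the better the odds of a lucky
break.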

Which worked! So here's the body of the main loop:

00007FFE451D2760  mov         eax,dword ptr [rcx-4]
00007FFE451D2763  lea         rcx,[rcx-4]
00007FFE451D2767  shl         r9,1Eh
00007FFE451D276B  or          r9,rax
00007FFE451D276E  cmp         r8,0Ah
00007FFE451D2772  jne         long_div+25Bh (07FFE451D27CBh)
00007FFE451D2774  mov         rax,rdi
00007FFE451D2777  mul         rax,r9
00007FFE451D277A  mov         rax,rdx
00007FFE451D277D  shr         rax,3
00007FFE451D2781  mov         dword ptr [r10+rcx],eax
00007FFE451D2785  mov         eax,eax
00007FFE451D2787  imul        rax,r8
00007FFE451D278B  sub         r9,rax
00007FFE451D278E  sub         rbx,1
00007FFE451D2792  jns         long_div+1F0h (07FFE451D2760h)
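
For anyone who'd rather not read assembler, here is my reading of the
multiply path of that loop as a Python model. It's a sketch under
assumptions: the register-to-variable mapping in the comments is inferred
from the listing, the helper names are invented, and Python's unbounded ints
stand in for the 128-bit product that mul leaves in rdx:rax.

```python
SHIFT = 30                        # digit width, from "shl r9,1Eh"
RECIP10 = 0xCCCCCCCCCCCCCCCD      # the constant held in rdi

def divrem_by_10(digits):
    """digits: little-endian list of 30-bit digits.
    Returns (quotient digits, remainder)."""
    quotient = [0] * len(digits)
    rem = 0                                    # r9
    for i in reversed(range(len(digits))):     # most significant digit first
        rem = (rem << SHIFT) | digits[i]       # shl r9,1Eh / or r9,rax
        q = (RECIP10 * rem) >> 67              # high 64 bits (rdx), then shr 3
        quotient[i] = q                        # mov dword ptr [r10+rcx],eax
        rem -= q * 10                          # imul rax,r8 / sub r9,rax
    return quotient, rem

def to_digits(n):
    """Hypothetical helper: split a non-negative int into 30-bit digits."""
    out = []
    while n:
        out.append(n & ((1 << SHIFT) - 1))
        n >>= SHIFT
    return out or [0]

def from_digits(digits):
    """Hypothetical helper: inverse of to_digits."""
    return sum(d << (SHIFT * i) for i, d in enumerate(digits))
```

Since the running remainder is always less than 10, the value fed to the
multiply is below 10 * 2**30, so the product fits easily in 128 bits.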

And above the loop is this line, which you'll recognize as loading the same
scaled reciprocal of 10 as the gcc code Mark posted earlier. The code above
moves %rdi into %rax before the mul instruction:

00007FFE451D2747  mov         rdi,0CCCCCCCCCCCCCCCDh
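
A quick way to convince yourself about that constant: it's the ceiling of
2**67 / 10, and multiplying by it and shifting right 67 bits gives the same
answer as floor division by 10 for any 64-bit input. The check below is
only a spot check over edge cases and random samples, with names I made up,
not a proof.

```python
import random

RECIP10 = 0xCCCCCCCCCCCCCCCD

def div10(x):
    """Divide a 64-bit unsigned value by 10 via multiply and shift."""
    return (x * RECIP10) >> 67

def spot_check(samples=10000, seed=0):
    """Compare div10 against // on edge cases plus random 64-bit values."""
    rng = random.Random(seed)
    xs = [0, 9, 10, 10 * 2**30 - 1, 2**64 - 1]
    xs += [rng.getrandbits(64) for _ in range(samples)]
    return all(div10(x) == x // 10 for x in xs)

# ceil(2**67 / 10), written with floor division on the negation
assert RECIP10 == -(-2**67 // 10)
assert spot_check()
```

The reason it works: 10 * RECIP10 is 2**67 + 2, so the scaled reciprocal
overshoots by too little to change the floor for any x below 2**66.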

Note an odd decision here: the MS code compares the divisor to 10 on _every_
iteration. There are not two, "10 or not 10?", loop bodies. Instead, if
the divisor isn't 10, "jne long_div+25Bh" jumps to code not shown here, a
few instructions that use hardware division, and then jump back into the
tail end of the loop above to finish computing the remainder (etc).
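
In Python terms, the shape of the generated code versus the shape you might
have expected looks roughly like this. Both functions are illustrative
sketches with invented names; the second is a hypothetical alternative, not
something any compiler is claimed to emit, and "rem // d" merely stands in
for the hardware-division detour.

```python
SHIFT = 30
RECIP10 = 0xCCCCCCCCCCCCCCCD

def divrem1_as_compiled(digits, d):
    """One loop body; the d == 10 test runs on every iteration."""
    quotient, rem = [0] * len(digits), 0
    for i in reversed(range(len(digits))):
        rem = (rem << SHIFT) | digits[i]
        if d == 10:                        # cmp r8,0Ah, every time around
            q = (RECIP10 * rem) >> 67      # multiply path
        else:
            q = rem // d                   # detour through hardware division
        quotient[i] = q
        rem -= q * d                       # shared tail of the loop
    return quotient, rem

def divrem1_hoisted(digits, d):
    """Hypothetical alternative: test once, then run a specialized loop."""
    quotient, rem = [0] * len(digits), 0
    if d == 10:
        for i in reversed(range(len(digits))):
            rem = (rem << SHIFT) | digits[i]
            q = (RECIP10 * rem) >> 67
            quotient[i] = q
            rem -= q * 10
    else:
        for i in reversed(range(len(digits))):
            rem = (rem << SHIFT) | digits[i]
            q, rem = divmod(rem, d)
            quotient[i] = q
    return quotient, rem
```

The two give identical results; the only difference is how often the
"is it 10?" question gets asked.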

So they not only optimized division by 10, they added a useless test and
two branches to every iteration of the loop when we're not dividing by 10
;-)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DGRMIVHOIAYQEQJD32ED4MCBOPFE5SIT/