[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Carl Love changed: What|Removed |Added Status|RESOLVED|CLOSED -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Carl Love changed: What|Removed |Added Resolution|--- |FIXED Status|REPORTED|RESOLVED --- Comment #13 from Carl Love --- The VEX and testsuite patch were committed. commit d2cbb78a151256290d490fcb7a805884d6406a7e Author: Carl Love Date: Tue May 28 11:33:00 2019 -0500 PPC64, Subnormal testcase changes VEX patch fixed issues with generating subnormal results. This patch adds a specific test case and updates the expected values for the existing test case. Update jm-vmx tests, add subnormal test case. https://bugs.kde.org/show_bug.cgi?id=406256 commit 991db2a39bcbdbf5cdb4337684c29f96c63070a8 Author: Carl Love Date: Tue May 28 11:26:13 2019 -0500 PPC64, fix issues with dnormal values in the vector fp instructions. The result of the floating point instructions vmaddfp, vnmsubfp, vaddfp, vsubfp, vmaxfp, vminfp, vrefp, vrsqrtefp, vcmpeqfp, vcmpeqfp, vcmpgefp, vcmpgtfp are controlled by the setting of the NJ bit in the VSCR register. If VSCR[NJ] = 0; then denormalized values are handled as specified by Java and the IEEE standard. If the bit is a 1, then the denormalized element in the vector is replaced with a zero. Valgrind was not properly handling the denormalized case for these instructions. This patch fixes the issue. https://bugs.kde.org/show_bug.cgi?id=406256 Closing -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #12 from Julian Seward --- > Updated patch to fix issues with dnormal values v5 (27.44 KB, patch) > 2019-05-15 21:27 UTC, Carl Love Details > Update test case, add new test (1.81 MB, patch) > 2019-05-15 21:28 UTC, Carl Love Carl, these look fine to land now. Thank you for your patience with this. I am particularly pleased that the supporting functions (is_NaN, etc) all got vectorised. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Julian Seward changed: What|Removed |Added Attachment #119920|0 |1 is obsolete|| -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Julian Seward changed: What|Removed |Added Attachment #119940|0 |1 is obsolete|| -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #11 from Carl Love --- Julian: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_sho > w-5Fbug.cgi-3Fid-3D406256&d=DwIFaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RFEmMkZAk > -- > _wFGN5tkM_A&m=5o2Sjz00Tqqxz4dOXWvuGwWsbia9aURAkghSmeY0Sm0&s=1rRXfsCpJ > Dor_oQ1SSmOvBBYZkeJgD7aT4Iz_gYMxGk&e= > > --- Comment #8 from Julian Seward --- > Thanks for the respin. I have mostly only minor comments about > it. Is OK to > land provided all the comments below are addressed, except for the > one about > vectorising negateVF32, which would be nice to fix if you can do so > relatively > easily, but is not essential. > > Also, when landing, please split the patch into two parts: the > implementation > and the tests, and land the implementation first. Done. I had all the changes in one patch to make it easier to move files around to 5 different machines for testing. > > > --- a/VEX/priv/guest_ppc_toIR.c > +++ b/VEX/priv/guest_ppc_toIR.c > > is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised > nicely. Is > it also possible to do negateVF32 with vectors, rather lane > by lane as at > present? Note, for a vector version of is_NaN, you can see more or > less how > to do it by looking at isNan() in host_ppc_isel.c. > Renamed negateVF32(value) to negate_Vector( size, value) to make the naming more consistent. Also, structured it to generalize easily to more vector sizes. Currently just supporting vector of F32. This change requires creating the new function is_NaN_Vector(size, value) as you eluded to above. Again, structured the function to easily extend to more vector sizes. > > +static IRExpr* dnormV32_adj ( IRExpr* src ) > > nit: maybe rename this to be more consistent with your other vector- > helper > function names (is_Zero_Vector etc) > Yea, consistency is a good thing. Renamed dnormV32_adj() to dnorm_adj_Vector(). Note did not restructure the function to easily extend to other vector sizes. Left that for the future. > > + assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128, > + unop( Iop_1Sto64, > + mkexpr( VSCR_NJ ) ) , > + unop( Iop_1Sto64, > + mkexpr( VSCR_NJ ) ) ) ); > > nit: VSCR_NJ isn't used past this point. Change its type to Ity_I64 > and lift > the 1Sto64 operation into that definition, so it isn't duplicated > here. > Yea, that would be better. Fixed. > > --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S > +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S > > LafterFP2: > + /* set host Vector Status Control Register bit NJ to zero > + to ensure the host generate subnormal results for the > + vector floating point instructions. */ > +mfvscr 16 /* Clear NJ bit */ > +vspltisw 9,0x1 /* 4x 0x0001 */ > +vspltisw 8,0x0 /* zero */ > +vsldoi 9,8,9,0x2 /* <<2*8 => 4x 0x0001 > */ > +vnor 9,9,9 /* 4x 0xFFFE */ > + vand 16,16,9 /* Mask out NJ bit */ > +mtvscr 16 > > (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff > in > this file. Otherwise this will fail when run on a non-Altivec > enabled target > (do we still support any of those)? And the same for the other to > assembly > files. > (2) (As a check) is the above sequence runnable even on the lowest > level > Altivec subset? Otherwise (again) it will fail at run time. > > (3) I wasn't entirely clear what the changes to the post-run > invariant checks > are (after label "postamble:"). IIUC, they already do check that > VSCR[NJ].host == 0, but the comments are wrong, and you've updated > the > comments, but not the code? Can you clarify/re-check? When I went in to fix up the code per you comments, I noticed that the code I added to set VSCR[NJ] = 0 is not doing anything. I had missed the code a bit lower: /* set host AltiVec control word to the default mode expected by VEX-generated code. */ ld 6,.tocent__vgPlain_machine_ppc64_has_VMX@toc(2) ld 6,0(6) cmpldi 6,0 beq .LafterVMX2 vspltisw 3,0x0 /* generate zero */ mtvscr 3 which is forcing the entire VSCR register to zero. The instruction vtvscr moves the contents of the specified register to VSCR. This is actually what I had originally been looking for when I was trying to figure out why Valgrind always generated subnormal results but natively I wasn't getting subnormal results on BE. This code t
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #10 from Carl Love --- Created attachment 120090 --> https://bugs.kde.org/attachment.cgi?id=120090&action=edit Update test case, add new test The test case patch that goes with the subnormal changes in VEX. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #9 from Carl Love --- Created attachment 120089 --> https://bugs.kde.org/attachment.cgi?id=120089&action=edit Updated patch to fix issues with dnormal values v5 Updated the patch per latest comments from Julian. Split patch into VEX patch and test case patch. Renamed negateVF32(value) to negate_Vector( size, value), vectorized. Created new function is_NaN_Vector(size, value) Renamed dnormV32_adj() to dnorm_adj_Vector(). Lifted VSCR_NJ, made it Ity_I64. The various assembly routines: Removed my new code to set VSCR[NJ]=0 as it is redundant given that there is existing code that is already clearing the register. Updated the comments in the code to make it clear where VSCR[NJ] is set to 0. Updated comments with regard to the invarent check to make it clear what is being checked and what to do based on the check. There are now no functional changes to the assembly functions as they already ensure the VSCR[NJ] but is set to 0. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #8 from Julian Seward --- Thanks for the respin. I have mostly only minor comments about it. Is OK to land provided all the comments below are addressed, except for the one about vectorising negateVF32, which would be nice to fix if you can do so relatively easily, but is not essential. Also, when landing, please split the patch into two parts: the implementation and the tests, and land the implementation first. --- a/VEX/priv/guest_ppc_toIR.c +++ b/VEX/priv/guest_ppc_toIR.c is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised nicely. Is it also possible to do negateVF32 with vectors, rather lane by lane as at present? Note, for a vector version of is_NaN, you can see more or less how to do it by looking at isNan() in host_ppc_isel.c. +static IRExpr* dnormV32_adj ( IRExpr* src ) nit: maybe rename this to be more consistent with your other vector-helper function names (is_Zero_Vector etc) + assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128, + unop( Iop_1Sto64, + mkexpr( VSCR_NJ ) ) , + unop( Iop_1Sto64, + mkexpr( VSCR_NJ ) ) ) ); nit: VSCR_NJ isn't used past this point. Change its type to Ity_I64 and lift the 1Sto64 operation into that definition, so it isn't duplicated here. --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S LafterFP2: + /* set host Vector Status Control Register bit NJ to zero + to ensure the host generate subnormal results for the + vector floating point instructions. */ +mfvscr 16 /* Clear NJ bit */ +vspltisw 9,0x1 /* 4x 0x0001 */ +vspltisw 8,0x0 /* zero */ +vsldoi 9,8,9,0x2 /* <<2*8 => 4x 0x0001 */ +vnor 9,9,9 /* 4x 0xFFFE */ + vand 16,16,9 /* Mask out NJ bit */ +mtvscr 16 (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff in this file. Otherwise this will fail when run on a non-Altivec enabled target (do we still support any of those)? And the same for the other to assembly files. (2) (As a check) is the above sequence runnable even on the lowest level Altivec subset? Otherwise (again) it will fail at run time. (3) I wasn't entirely clear what the changes to the post-run invariant checks are (after label "postamble:"). IIUC, they already do check that VSCR[NJ].host == 0, but the comments are wrong, and you've updated the comments, but not the code? Can you clarify/re-check? diff --git a/memcheck/tests/ppc32/vgcore.9687 b/memcheck/tests/ppc32/vgcore.9687 new file mode 100644 This should definitely not be in the patch! -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #7 from Carl Love --- Created attachment 119940 --> https://bugs.kde.org/attachment.cgi?id=119940&action=edit Updated patch to fix issues with dnormal values v5 Updated patch after finding issues on Power 7. The new assembly code was re-written to remove the mtvsrd and mfvsrd so the code will run. The specific files are: coregrind/m_dispatch/dispatch-ppc32-linux.S coregrind/m_dispatch/dispatch-ppc64be-linux.S coregrind/m_dispatch/dispatch-ppc64le-linux.S The updated patch has been tested on Power 6, Power 7, Power 8LE, Power 8BE and Power 9. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #6 from Carl Love --- Additional testing shows that the mtvsrd and mfvsrd instructions which are used in the assembly interface functions are not supported on P7 and earlier. Will need to rework current patch. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Carl Love changed: What|Removed |Added Attachment #119918|0 |1 is obsolete|| --- Comment #5 from Carl Love --- Created attachment 119920 --> https://bugs.kde.org/attachment.cgi?id=119920&action=edit Updated patch to fix issues with dnormal values v4 Testing on P7 found that I needed to make sure the system supports altivec or the subnormal_test will fail on illegal inst. Updated the subnormal_test.vgtest file -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Carl Love changed: What|Removed |Added Attachment #119504|0 |1 is obsolete|| --- Comment #4 from Carl Love --- Created attachment 119918 --> https://bugs.kde.org/attachment.cgi?id=119918&action=edit Updated patch to fix issues with dnormal values v3 Updated the patch. Moved the code to set VSCR[NJ] to the assembly routines for ppc64le, ppc64be, ppc32. Tested on P8 LE, P8 BE, P9. Manually verified the assembly code will jump to .invariant_violation if the VSCR[NJ] bit is set to 1 by replacing the "mfvscr 7" instruction with some instructions that sets contents of register v7 to 0x01 which is what the value would be if the NJ bit is set. The routine dnormV32_adj() was rewritten to use vector Iops. Fixed a couple of bugs that I found in retesting the patch. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Carl Love changed: What|Removed |Added Attachment #119258|0 |1 is obsolete|| --- Comment #3 from Carl Love --- Created attachment 119504 --> https://bugs.kde.org/attachment.cgi?id=119504&action=edit Updated patch to fix issues with dnormal values Updated patch, needs review by Julian -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #2 from Will Schmidt --- I have no input into a different better spot for getRRegUniverse_PPC changes. A few other patch comments: +++ b/Makefile.all.am the "-mvsx -maltivec" combo can be shrunk to just "-mvsx" . (mvsx is a superset that includes maltivec). + /* LE API requires NJ be set to 0. */ A few spots.. that API reference should prob be ABI. + vrm[0] = ~(1 << (127 - 111)) & vrm[0]; // Clear NJ bit LE case + vrm[1] = ~(1 << (127 - 111)) & vrm[1]; // Clear NJ bit BE case Preferably wrap this in some logic so we are not writing to reserved bits of the vscr in the non-LE or non-BE cases. +static IRExpr* dnorm_adj ( IRExpr* value ) just wondering, should this be instead named dnorm_adj_int or some variation to avoid conflict if we need to do anything similar with Longs or Shorts in the future. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 Carl Love changed: What|Removed |Added CC||will_schm...@vnet.ibm.com -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting
https://bugs.kde.org/show_bug.cgi?id=406256 --- Comment #1 from Carl Love --- Created attachment 119258 --> https://bugs.kde.org/attachment.cgi?id=119258&action=edit Proposed fix for making the subnormal results track the VSCR[NJ] bit The attachment is a proposed fix for the issue. The Iops to implement the various vector floating point instructions map to a subset of the vector floating point instructions. The underlying host hardware needs to run with the VSCR[NJ] bit set to zero to generate the subnormal results. The guest state needs to then adjust the arguments/results of the vector floating point instructions as needed based on the guest setting of VSCR[NJ]. Hence the need to setup the host with VSCR[NJ] = 0. This was done in VEX/priv/host_ppc_defs.c, function getRRegUniverse_PPC (). The function is actually setting up the guest register state. The function is called once as part of the initialization process. That is the right time to set the host configuration. Ideally there would be a host initialization function that would be called for doing this but I don't see one. Given that there is not host specific function, I had to put the code into the guest setup function. I don't consider this to be the ideal place for the code. I would be interested in ideas on a more appropriate location for this code. *** The host setup code requires the addition of the -mvsx and -maltivec command line options to be set for Power 7. These options are on by default when compiling for Power 8, 9. Hence the Makefile.all.am change. Finally, the current regression tests do not seem to cover the subnormal cases well enough. I created an explicit subnormal test which runs with the VSCR[NJ] bit set to zero and one. Again, would appreciate feedback on where best to do the host initialization. -- You are receiving this mail because: You are watching all bug changes.