Re: Linux Sparc FPU register corruption
From: James Y Knight
Date: Tue, 9 Jun 2015 17:54:10 -0400

> What distro are you running? Maybe I should try the same thing you're using...

Debian 7.0

--
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20150609.163754.1323765095739500628.da...@davemloft.net
Re: Linux Sparc FPU register corruption
On Jun 9, 2015, at 3:02 PM, David Miller wrote:
> Anyways, please put proper memory barriers into your testcase and
> let's see if the problem still triggers.

Okay, I edited my test case changing the asm to
"membar #Sync; stda %%f0, [%0] 0xf0; membar #Sync".
That should be entirely over-doing it, right? Problem still occurs within seconds.

And, don't forget Problem #2 -- which didn't even involve weird ASIs at all (at least, not in the test task itself).

On Jun 9, 2015, at 4:45 PM, David Miller wrote:
> Just FYI, I commented out the usleep() in your test program and have
> been running your:
>
>   seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'
>
> test, and it's been running flawlessly for 2 hours on my T4.
>
> This is with Linus's current tree.
>
> I'll fire up my T3 later and try to reproduce it there.

Gah! Thanks for testing it on your setup!...but that's really unfortunate that you can't reproduce the problem!

What distro are you running? Maybe I should try the same thing you're using...

Anyone else reading this able to reproduce problems with my test program (or not?)
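For readers following along, here is a rough sketch of what a barrier-wrapped block store like the one quoted above might look like as a C helper. It is not the actual test case from this thread. The function name, the preprocessor guard, and the non-SPARC fallback are all assumptions made for illustration; only the three-instruction asm string comes from the message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A block store via ASI_BLK_P moves 64 bytes to a 64-byte-aligned address. */
#define BLOCK_SIZE 64

/*
 * Store %f0..%f14 (eight double registers, 64 bytes) to dst with a
 * membar #Sync on each side, as in the edited test case.
 * Returns 1 if the SPARC block store was executed, 0 if this build
 * targets another CPU and the buffer was only zero-filled.
 */
int block_store_with_barriers(void *dst)
{
    /* Block transfers require a 64-byte-aligned target. */
    assert(((uintptr_t)dst % BLOCK_SIZE) == 0);

#if defined(__sparc__) && (defined(__sparc_v9__) || defined(__arch64__))
    /* Assumed guard: a V9-capable SPARC build with VIS available. */
    __asm__ __volatile__(
        "membar #Sync\n\t"
        "stda %%f0, [%0] 0xf0\n\t"   /* 0xf0 is ASI_BLK_P */
        "membar #Sync"
        :
        : "r"(dst)
        : "memory");
    return 1;
#else
    /* Hypothetical fallback so the sketch still builds elsewhere. */
    memset(dst, 0, BLOCK_SIZE);
    return 0;
#endif
}
```

A caller would pass a 64-byte-aligned buffer and then verify that the FP registers still hold the values loaded before the store. The stored memory itself is never read back, which is the point made in the message.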
Re: Linux Sparc FPU register corruption
Just FYI, I commented out the usleep() in your test program and have
been running your:

  seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'

test, and it's been running flawlessly for 2 hours on my T4.

This is with Linus's current tree.

I'll fire up my T3 later and try to reproduce it there.
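The shell one-liner above keeps 64 copies of the test running in parallel and restarts each one for as long as it exits successfully. A similar stress loop could be sketched in C roughly as follows. This is only an illustration: run_stress, check_ok and check_fail are hypothetical names, and unlike the xargs version each child loops in-process instead of re-executing ./a.out.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real register check: 0 means "no corruption". */
int check_ok(void)   { return 0; }
int check_fail(void) { return 1; }

/*
 * Fork `workers` children; each calls check() up to `iterations` times
 * and exits non-zero on the first failure. Returns the number of
 * children that reported a failure, or -1 if fork() or wait() failed.
 */
int run_stress(int workers, long iterations, int (*check)(void))
{
    int started = 0;
    int failed = 0;

    for (int i = 0; i < workers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* stop launching, reap what we have */
        if (pid == 0) {
            for (long n = 0; n < iterations; n++) {
                if (check() != 0)
                    _exit(1);           /* first failure ends this worker */
            }
            _exit(0);
        }
        started++;
    }

    /* Reap every child that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return started == workers ? failed : -1;
}
```

With the real check substituted for the placeholders, run_stress(64, iterations, check) would approximate the 64-way parallel run described above, with the iteration count standing in for "run until it fails".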
Re: Linux Sparc FPU register corruption
From: James Y Knight
Date: Tue, 9 Jun 2015 08:13:58 -0400

> Um, but my test isn't testing what is being stored to memory at
> all. It is storing to memory and **never loading from the memory
> after**. Why would writing FROM fp registers TO memory corrupt the
> *registers* due to a missing memory barrier?

The memory barrier is necessary for two reasons, only one of them is
to handle the asynchronousness of the memory operations.

The other reason is that there are strict rules for accessing the FPU
register file around block loads and stores.

Therefore if you don't do the proper memory barriers you can get
corrupted FPU registers as well as memory contents.

And complicating things even more, what you can get away with is
different on every single cpu variant.

That's why I really wish debian didn't disable multiarch as that makes
the code use the UltraSPARC-I/II/III memcpy, which might not be 100%
kosher on Niagara-T1 and later.

Anyways, please put proper memory barriers into your testcase and
let's see if the problem still triggers.

Thanks.
Re: Linux Sparc FPU register corruption
From: James Y Knight
Date: Tue, 9 Jun 2015 08:07:11 -0400

> Debian glibc has multiarch support disabled (done a couple years ago to try
> to workaround the unreliability, not entirely successfully..), so it's not
> using that routine you mention. It's using
> sysdeps/sparc/sparc32/sparcv9/memcpy.S which points to
> sysdeps/sparc/sparc64/memcpy.S

I bet you stop getting corruptions if you use the appropriate
optimized routine.
Re: Linux Sparc FPU register corruption
From: David Miller
Date: Tue, 09 Jun 2015 00:34:14 -0700 (PDT)

> Your test is faulty.
>
> You cannot use ASI_BLK_P loads or stores without appropriate memory
> barriers around them.

Um, but my test isn't testing what is being stored to memory at all. It is storing to memory and **never loading from the memory after**. Why would writing FROM fp registers TO memory corrupt the *registers* due to a missing memory barrier?

If problem 1 is due to a missing membar somewhere, I think it's surely gotta be the kernel that's missing it, not my test code. No?

> FWIW, you're probably hitting the bug fixed by the following commit in
> glibc:
>
> commit 834caf06f33d79be54cff63c274fba2845513593
> Author: Jose E. Marchesi
> Date: Sat May 17 11:20:27 2014 -0700
>
>     Fix sparc memcpy data corruption when using niagara2 optimized routines.
>
>     * sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Add missing
>     membar to avoid block loads/stores to overlap previous stores.

Debian glibc has multiarch support disabled (done a couple years ago to try to workaround the unreliability, not entirely successfully..), so it's not using that routine you mention. It's using sysdeps/sparc/sparc32/sparcv9/memcpy.S which points to sysdeps/sparc/sparc64/memcpy.S

[sorry for the duplicate post -- didn't realize vger.kernel.org doesn't accept emails sent by the android gmail client]
Re: Linux Sparc FPU register corruption
From: David Miller
Date: Tue, 09 Jun 2015 00:34:14 -0700 (PDT)

> Your test is faulty.
>
> You cannot use ASI_BLK_P loads or stores without appropriate memory
> barriers around them.
>
> The rules for when and where you need the memory barriers are
> complicated, especially if you want to incur the cost of the memory
> barrier as infrequently as possible.

FWIW, you're probably hitting the bug fixed by the following commit in
glibc:

commit 834caf06f33d79be54cff63c274fba2845513593
Author: Jose E. Marchesi
Date: Sat May 17 11:20:27 2014 -0700

    Fix sparc memcpy data corruption when using niagara2 optimized routines.

    * sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Add missing
    membar to avoid block loads/stores to overlap previous stores.
Re: Linux Sparc FPU register corruption
Your test is faulty.

You cannot use ASI_BLK_P loads or stores without appropriate memory
barriers around them.

The rules for when and where you need the memory barriers are
complicated, especially if you want to incur the cost of the memory
barrier as infrequently as possible.