Re: Linux Sparc FPU register corruption
On 2015-06-09 12:02, David Miller wrote:
> From: James Y Knight jykni...@google.com
> Date: Tue, 9 Jun 2015 08:13:58 -0400
>
> > Um, but my test isn't testing what is being stored to memory at all.
> > It is storing to memory and **never loading from the memory after**.
> > Why would writing FROM fp registers TO memory corrupt the *registers*
> > due to a missing memory barrier?
>
> The memory barrier is necessary for two reasons, only one of them is
> to handle the asynchronousness of the memory operations.
>
> The other reason is that there are strict rules for accessing the FPU
> register file around block loads and stores.
>
> Therefore if you don't do the proper memory barriers you can get
> corrupted FPU register as well as memory contents.
>
> And complicating things even more, what you can get away with is
> different on every single cpu variant. That's why I really wish

So it means the userland code doesn't run the same on the various CPU.
How are we supposed to do with static binaries?

> debian didn't disable multiarch as that makes the code use the
> UltraSPARC-I/II/III memcpy, which might not be %100 kosher on
> Niagara-T1 and later.

The UltraSPARC-I/II/III memcpy code might have issues, but it clearly
works a lot better than the Niagara-T1 code on a Niagara-T1 machine.
Disabling multiarch support improves a lot the stability on these
machines.

Aurelien

--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

--
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20150610075006.gd10...@aurel32.net
Re: Linux Sparc FPU register corruption
On 2015-06-10 02:18, David Miller wrote:
> > Disabling multiarch support improves a lot the stability on these
> > machines.
>
> By disabling it you are creating an even worse situation, for the
> reasons I've discussed already, plus guess what I test when I'm doing
> development?

How could it be worse? With the Niagara T1 memcpy routines, the machine
is not usable, as the processes crashes regularly with a segmentation
fault. With the default sparc v9 memcpy routines, the machine becomes
usable.

--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Archive: https://lists.debian.org/20150610093314.ga10...@aurel32.net
Re: Linux Sparc FPU register corruption
From: Aurelien Jarno aurel...@aurel32.net
Date: Wed, 10 Jun 2015 09:50:06 +0200

> So it means the userland code doesn't run the same on the various CPU.
> How are we supposed to do with static binaries?

Multiarch works perfectly fine in static binaries, just the same as it
does with dynamically linked executables.

Normally static binaries do not use PLT entries, but with multiarch it
does, so that the proper routine can be resolved at run time just as it
would via the dynamic linker.

> Disabling multiarch support improves a lot the stability on these
> machines.

By disabling it you are creating an even worse situation, for the
reasons I've discussed already, plus guess what I test when I'm doing
development?

Archive: https://lists.debian.org/20150610.021833.1402529650339691081.da...@davemloft.net
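As a rough illustration of the run-time selection described above, here is a minimal C sketch that picks a copy routine through a resolver function. It uses a plain function pointer to stand in for the PLT-based mechanism and is not glibc's actual code. The names `cpu_kind`, `copy_generic`, `copy_ultra3`, `copy_niagara` and `resolve_copy` are all hypothetical, and the three routines simply call memcpy instead of containing CPU-specific code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU classes; a real resolver would look at hardware
 * capability information instead of taking this as a parameter. */
enum cpu_kind { CPU_GENERIC, CPU_ULTRA3, CPU_NIAGARA };

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for per-CPU tuned routines; each one just calls memcpy. */
static void *copy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_ultra3(void *d, const void *s, size_t n)  { return memcpy(d, s, n); }
static void *copy_niagara(void *d, const void *s, size_t n) { return memcpy(d, s, n); }

/* Resolver: choose one implementation for the CPU the program runs on.
 * The choice is made at run time, so the same binary (static or
 * dynamic) can run on any of the CPU classes. */
static copy_fn resolve_copy(enum cpu_kind cpu)
{
    switch (cpu) {
    case CPU_NIAGARA: return copy_niagara;
    case CPU_ULTRA3:  return copy_ultra3;
    default:          return copy_generic;
    }
}
```

A caller would run `resolve_copy()` once at startup, keep the returned pointer, and make every later copy through it, which is roughly the role the PLT entry plays in the description above.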
Re: Linux Sparc FPU register corruption
From: David Miller da...@davemloft.net
Date: Tue, 09 Jun 2015 13:45:37 -0700 (PDT)

> Just FYI, I commented out the usleep() in your test program and have
> been running your:
>
> seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'
>
> test, and it's been running flawlessly for 2 hours on my T4. This is
> with Linus's current tree.
>
> I'll fire up my T3 later and try to reproduce it there.

Good news, I can reproduce it on my T3. I'll try to debug this.

Archive: https://lists.debian.org/20150610.132207.1606053216447272806.da...@davemloft.net
Re: Linux Sparc FPU register corruption
From: James Y Knight jykni...@google.com
Date: Wed, 10 Jun 2015 10:40:58 -0400

> But separately from the reliability issue, it seems rather unfortunate
> that the 'default' sparcv9 and sparc64 routines aren't actually coded
> to the base sparcv9 standard instruction set. It seems like probably
> the base routines should limit themselves to normal LDX/STX or
> LDDF/STDF instructions, and leave things like LDBLOCKF (which the docs
> mark CPU-specific, and deprecated, and potentially to be removed from
> future chips), for when a specific processor is targeted.

All sparc64 cpus support block loads and stores.

There are documents, and then there is reality. Those instructions
cannot be removed without breaking tons of code out there and the
people making changes to the sparc64 cpus are painfully aware of this.

Archive: https://lists.debian.org/20150610.110738.1101193887993858245.da...@davemloft.net
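For readers unfamiliar with the distinction drawn in the quoted suggestion, here is a minimal C sketch of a copy routine that moves data with ordinary 8-byte loads and stores only, with no block transfers and no FPU registers involved. `copy_words` is a hypothetical name and this is not the glibc routine. Whether a compiler turns each fixed-size memcpy into a single LDX/STX pair is an assumption that depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes using 8-byte moves for the bulk and byte moves for the
 * tail. The fixed-size memcpy calls are an alignment-safe way to express
 * a single 64-bit load and store in portable C. */
static void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* 8-byte load from the source */
        memcpy(d, &w, sizeof w);   /* 8-byte store to the destination */
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n > 0) {                /* remaining 0 to 7 bytes */
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Because such a routine goes through integer registers only, it does not touch the FPU register file at all; the trade-off is lower throughput on large copies compared with block loads and stores.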
Re: Linux Sparc FPU register corruption
From: Aurelien Jarno aurel...@aurel32.net
Date: Wed, 10 Jun 2015 11:33:14 +0200

> On 2015-06-10 02:18, David Miller wrote:
> > > Disabling multiarch support improves a lot the stability on these
> > > machines.
> >
> > By disabling it you are creating an even worse situation, for the
> > reasons I've discussed already, plus guess what I test when I'm doing
> > development?
>
> How could it be worse? With the Niagara T1 memcpy routines, the machine
> is not usable, as the processes crashes regularly with a segmentation
> fault. With the default sparc v9 memcpy routines, the machine becomes
> usable.

I do not think this will happen with current kernel and glibc and all
the bug fixes I've been doing over the past few years.

Archive: https://lists.debian.org/20150610.110250.1634796446687407507.da...@davemloft.net