Re: Linux Sparc FPU register corruption

2015-06-10 Thread Aurelien Jarno
On 2015-06-09 12:02, David Miller wrote:
 From: James Y Knight jykni...@google.com
 Date: Tue, 9 Jun 2015 08:13:58 -0400
 
  Um, but my test isn't testing what is being stored to memory at
  all. It is storing to memory and **never loading from the memory
  after**. Why would writing FROM fp registers TO memory corrupt the
  *registers* due to a missing memory barrier?
 
 The memory barrier is necessary for two reasons; only one of them is
 to handle the asynchronous nature of the memory operations.
 
 The other reason is that there are strict rules for accessing the FPU
 register file around block loads and stores.
 
 Therefore if you don't do the proper memory barriers you can get
 corrupted FPU registers as well as corrupted memory contents.
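
[Editor's sketch] The rule described above (finish a run of FP-register
stores with a barrier before the register file is reused) can be sketched
in C as follows. This is an illustration under assumptions, not the glibc
or kernel code: the helper names fpu_block_sync() and
store_doubles_then_sync() are hypothetical, the stores are ordinary
stores rather than real block stores, and the __sparc_v9__ check assumes
a GCC-style compiler. On other targets a C11 fence stands in for
`membar #Sync` so the sketch still builds.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: full synchronization point after FP stores. */
static inline void fpu_block_sync(void)
{
#if defined(__sparc_v9__)
    /* SPARC V9: wait for all earlier memory operations to complete. */
    __asm__ __volatile__("membar #Sync" : : : "memory");
#else
    /* Stand-in on other targets so the sketch compiles and runs. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical helper: store `count` doubles, then synchronize before
 * the caller touches the FP values again. */
void store_doubles_then_sync(double *dst, const double *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
    fpu_block_sync();
}
```

Where exactly the barrier has to sit relative to a real block store, and
which membar variant is enough, differs per CPU variant, as noted in the
message.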
 
 And complicating things even more, what you can get away with is
 different on every single cpu variant.  That's why I really wish

So it means the userland code doesn't run the same on the various
CPUs. What are we supposed to do with static binaries?

 Debian didn't disable multiarch as that makes the code use the
 UltraSPARC-I/II/III memcpy, which might not be 100% kosher on
 Niagara-T1 and later.

The UltraSPARC-I/II/III memcpy code might have issues, but it clearly
works a lot better than the Niagara-T1 code on a Niagara-T1 machine.
Disabling multiarch support improves stability a lot on these
machines.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20150610075006.gd10...@aurel32.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread Aurelien Jarno
On 2015-06-10 02:18, David Miller wrote:
  Disabling multiarch support improves stability a lot on these
  machines.
 
 By disabling it you are creating an even worse situation, for the
 reasons I've discussed already, plus guess what I test when I'm
 doing development?

How could it be worse? With the Niagara T1 memcpy routines, the machine
is not usable, as processes regularly crash with a segmentation
fault. With the default sparc v9 memcpy routines, the machine becomes
usable.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Archive: https://lists.debian.org/20150610093314.ga10...@aurel32.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: Aurelien Jarno aurel...@aurel32.net
Date: Wed, 10 Jun 2015 09:50:06 +0200

 So it means the userland code doesn't run the same on the various
 CPUs. What are we supposed to do with static binaries?

Multiarch works perfectly fine in static binaries, just the same as it
does with dynamically linked executables.

Normally static binaries do not use PLT entries, but with multiarch
they do, so that the proper routine can be resolved at run time just
as it would be via the dynamic linker.
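
[Editor's sketch] The run-time selection described above can be pictured
as one indirection that is resolved once and then called through. glibc
implements it with GNU indirect functions (IFUNC); the portable C below
only imitates the idea with a function pointer. Every name in it
(cpu_kind, resolve_memcpy, the memcpy_* stand-ins) is hypothetical, and
the stand-ins simply forward to a byte loop instead of containing tuned
code.

```c
#include <stddef.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical CPU classes; the real resolver inspects hardware
 * capability bits supplied by the kernel. */
enum cpu_kind { CPU_GENERIC, CPU_ULTRA3, CPU_NIAGARA1 };

/* Baseline byte-by-byte copy that is valid on any CPU. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-ins for CPU-specific variants (no tuned code here). */
static void *memcpy_ultra3(void *dst, const void *src, size_t n)
{
    return memcpy_generic(dst, src, n);
}

static void *memcpy_niagara1(void *dst, const void *src, size_t n)
{
    return memcpy_generic(dst, src, n);
}

/* Resolver: pick one implementation for the detected CPU. In the real
 * mechanism this runs once at startup and the PLT slot is patched. */
static memcpy_fn resolve_memcpy(enum cpu_kind cpu)
{
    switch (cpu) {
    case CPU_NIAGARA1: return memcpy_niagara1;
    case CPU_ULTRA3:   return memcpy_ultra3;
    default:           return memcpy_generic;
    }
}
```

A caller would store the result of resolve_memcpy() once and call
through that pointer afterwards, which is the role the PLT entry plays
in a multiarch static binary.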

 Disabling multiarch support improves stability a lot on these
 machines.

By disabling it you are creating an even worse situation, for the
reasons I've discussed already, plus guess what I test when I'm
doing development?


Archive: https://lists.debian.org/20150610.021833.1402529650339691081.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: David Miller da...@davemloft.net
Date: Tue, 09 Jun 2015 13:45:37 -0700 (PDT)

 Just FYI, I commented out the usleep() in your test program and have
 been running your:
 
 seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'
 
 test, and it's been running flawlessly for 2 hours on my T4.
 
 This is with Linus's current tree.
 
 I'll fire up my T3 later and try to reproduce it there.

Good news, I can reproduce it on my T3.

I'll try to debug this.


Archive: https://lists.debian.org/20150610.132207.1606053216447272806.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: James Y Knight jykni...@google.com
Date: Wed, 10 Jun 2015 10:40:58 -0400

 But separately from the reliability issue, it seems rather
 unfortunate that the 'default' sparcv9 and sparc64 routines aren't
 actually coded to the base sparcv9 standard instruction set. It
 seems like probably the base routines should limit themselves to
 normal LDX/STX or LDDF/STDF instructions, and leave things like
 LDBLOCKF (which the docs mark CPU-specific, and deprecated, and
 potentially to be removed from future chips), for when a specific
 processor is targeted.

All sparc64 cpus support block loads and stores.

There are documents, and then there is reality.

Those instructions cannot be removed without breaking tons of code
out there and the people making changes to the sparc64 cpus are
painfully aware of this.
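
[Editor's sketch] For comparison, the kind of baseline routine James
suggests in the quoted text (plain 64-bit integer loads and stores, no
block instructions, no FPU use) could look like the C below. This is a
sketch and not glibc's code: copy_words() is a hypothetical name, and
whether the 8-byte copies become single LDX/STX instructions on sparc64
is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline copy: 64-bit words when both pointers are
 * 8-byte aligned, bytes otherwise. Uses integer registers only. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 7u) == 0) {
        while (n >= sizeof(uint64_t)) {
            uint64_t w;
            /* Fixed-size memcpy avoids aliasing problems; compilers
             * commonly lower it to one load and one store. */
            memcpy(&w, s, sizeof w);
            memcpy(d, &w, sizeof w);
            s += sizeof w;
            d += sizeof w;
            n -= sizeof w;
        }
    }

    /* Tail bytes, or the whole buffer when unaligned. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Because it never touches the FP register file, a routine of this shape
sidesteps the block load/store rules discussed earlier in the thread,
at the cost of the bandwidth the block instructions are meant to
provide.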


Archive: https://lists.debian.org/20150610.110738.1101193887993858245.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: Aurelien Jarno aurel...@aurel32.net
Date: Wed, 10 Jun 2015 11:33:14 +0200

 On 2015-06-10 02:18, David Miller wrote:
  Disabling multiarch support improves stability a lot on these
  machines.
 
 By disabling it you are creating an even worse situation, for the
 reasons I've discussed already, plus guess what I test when I'm
 doing development?
 
 How could it be worse? With the Niagara T1 memcpy routines, the machine
 is not usable, as processes regularly crash with a segmentation
 fault. With the default sparc v9 memcpy routines, the machine becomes
 usable.

I do not think this will happen with a current kernel and glibc, given
all the bug fixes I've been doing over the past few years.


Archive: https://lists.debian.org/20150610.110250.1634796446687407507.da...@davemloft.net