Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: David Miller 
Date: Tue, 09 Jun 2015 13:45:37 -0700 (PDT)

> Just FYI, I commented out the usleep() in your test program and have
> been running your:
> 
> seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'
> 
> test, and it's been running flawlessly for 2 hours on my T4.
> 
> This is with Linus's current tree.
> 
> I'll fire up my T3 later and try to reproduce it there.

Good news, I can reproduce it on my T3.

I'll try to debug this.


-- 
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
https://lists.debian.org/20150610.132207.1606053216447272806.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: James Y Knight 
Date: Wed, 10 Jun 2015 10:40:58 -0400

> But separately from the reliability issue, it seems rather
> unfortunate that the 'default' sparcv9 and sparc64 routines aren't
> actually coded to the base sparcv9 standard instruction set. It
> seems like probably the base routines should limit themselves to
> normal LDX/STX or LDDF/STDF instructions, and leave things like
> LDBLOCKF (which the docs mark CPU-specific, and deprecated, and
> potentially to be removed from future chips), for when a specific
> processor is targeted.

All sparc64 cpus support block loads and stores.

There are documents, and then there is reality.

Those instructions cannot be removed without breaking tons of code
out there and the people making changes to the sparc64 cpus are
painfully aware of this.


-- 
Archive: 
https://lists.debian.org/20150610.110738.1101193887993858245.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: Aurelien Jarno 
Date: Wed, 10 Jun 2015 11:33:14 +0200

> On 2015-06-10 02:18, David Miller wrote:
>> > Disabling multiarch support improves stability a lot on these
>> > machines.
>> 
>> By disabling it you are creating an even worse situation, for the
>> reasons I've discussed already, plus guess what I test when I'm
>> doing development?
> 
> How could it be worse? With the Niagara T1 memcpy routines, the machine
> is not usable, as processes regularly crash with segmentation
> faults. With the default sparc v9 memcpy routines, the machine becomes
> usable.

I do not think this will happen with a current kernel and glibc and
all the bug fixes I've been doing over the past few years.


-- 
Archive: 
https://lists.debian.org/20150610.110250.1634796446687407507.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread David Miller
From: Aurelien Jarno 
Date: Wed, 10 Jun 2015 09:50:06 +0200

> So it means the userland code doesn't run the same way on the various
> CPUs. What are we supposed to do about static binaries?

Multiarch works perfectly fine in static binaries, just the same as it
does with dynamically linked executables.

Normally static binaries do not use PLT entries, but with multiarch
they do, so that the proper routine can be resolved at run time just
as it would be via the dynamic linker.
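
The run-time resolution described above can be pictured with a plain
function-pointer slot. This is only a sketch of the idea, not glibc's
actual IFUNC/PLT machinery: the names cpu_kind, resolve_copy, copy_slot,
init_dispatch and my_copy are hypothetical, and both "implementations"
are stand-ins that behave identically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical CPU identifiers, for illustration only. */
enum cpu_kind { CPU_GENERIC_V9, CPU_NIAGARA_T1 };

/* Stand-in for a baseline routine: a simple byte loop. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a CPU-tuned routine; here it just defers to memcpy. */
static void *copy_niagara(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Resolver: picks one implementation for the detected CPU. */
static copy_fn resolve_copy(enum cpu_kind cpu)
{
    return cpu == CPU_NIAGARA_T1 ? copy_niagara : copy_generic;
}

/* The slot every call goes through, playing the role of a PLT entry. */
static copy_fn copy_slot;

/* Filled in once at startup, whether the binary is static or dynamic. */
static void init_dispatch(enum cpu_kind cpu)
{
    copy_slot = resolve_copy(cpu);
}

/* Callers use this wrapper and never name a specific variant. */
static void *my_copy(void *dst, const void *src, size_t n)
{
    assert(copy_slot != NULL); /* init_dispatch() must have run first */
    return copy_slot(dst, src, n);
}
```

The point of the indirection is that the choice is made on the machine
the program runs on, not on the machine it was built on, which is why
linking statically does not change the outcome.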

> Disabling multiarch support improves stability a lot on these
> machines.

By disabling it you are creating an even worse situation, for the
reasons I've discussed already, plus guess what I test when I'm
doing development?


-- 
Archive: 
https://lists.debian.org/20150610.021833.1402529650339691081.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread Aurelien Jarno
On 2015-06-10 02:18, David Miller wrote:
> > Disabling multiarch support improves stability a lot on these
> > machines.
> 
> By disabling it you are creating an even worse situation, for the
> reasons I've discussed already, plus guess what I test when I'm
> doing development?

How could it be worse? With the Niagara T1 memcpy routines, the machine
is not usable, as processes regularly crash with segmentation
faults. With the default sparc v9 memcpy routines, the machine becomes
usable.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


-- 
Archive: https://lists.debian.org/20150610093314.ga10...@aurel32.net



Re: Linux Sparc FPU register corruption

2015-06-10 Thread Aurelien Jarno
On 2015-06-09 12:02, David Miller wrote:
> From: James Y Knight 
> Date: Tue, 9 Jun 2015 08:13:58 -0400
> 
> > Um, but my test isn't testing what is being stored to memory at
> > all. It is storing to memory and **never loading from the memory
> > after**. Why would writing FROM fp registers TO memory corrupt the
> > *registers* due to a missing memory barrier?
> 
> The memory barrier is necessary for two reasons; only one of them is
> to handle the asynchronous nature of the memory operations.
> 
> The other reason is that there are strict rules for accessing the FPU
> register file around block loads and stores.
> 
> Therefore, if you don't do the proper memory barriers, you can get
> corrupted FPU registers as well as corrupted memory contents.
> 
> And complicating things even more, what you can get away with is
> different on every single cpu variant.  That's why I really wish

So it means the userland code doesn't run the same way on the various
CPUs. What are we supposed to do about static binaries?

> debian didn't disable multiarch as that makes the code use the
> UltraSPARC-I/II/III memcpy, which might not be 100% kosher on
> Niagara-T1 and later.

The UltraSPARC-I/II/III memcpy code might have issues, but it clearly
works a lot better than the Niagara-T1 code on a Niagara-T1 machine.
Disabling multiarch support improves stability a lot on these
machines.
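
For readers following the barrier discussion quoted earlier in this
message, here is a rough sketch of what "doing the proper memory
barriers" around a block operation looks like from C. It assumes that
`membar #Sync` is the barrier in question and that one is wanted on both
sides of the operation; as the quoted text says, what is actually
required differs per CPU variant. The helper names block_op_barrier and
store_block_64 are hypothetical, and the block store itself is replaced
by a plain memcpy so the sketch builds on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Full barrier: membar #Sync on SPARC V9, a generic full fence elsewhere. */
static inline void block_op_barrier(void)
{
#if defined(__sparc_v9__) || defined(__arch64__)
    __asm__ __volatile__("membar #Sync" : : : "memory");
#else
    __sync_synchronize(); /* GCC/Clang builtin, used only as a fallback */
#endif
}

/*
 * Hypothetical helper standing in for a 64-byte block store.
 * Assumption: barriers on both sides; real placement is CPU-specific.
 */
static void store_block_64(void *dst, const void *src)
{
    block_op_barrier();   /* let earlier accesses finish first */
    memcpy(dst, src, 64); /* placeholder for the block store itself */
    block_op_barrier();   /* finish before registers or memory are reused */
}
```

This does not reproduce the corruption; it only shows where the
barriers sit relative to the operation they protect.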

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


-- 
Archive: https://lists.debian.org/20150610075006.gd10...@aurel32.net