Re: Linux Sparc FPU register corruption

2015-06-09 Thread David Miller
From: James Y Knight 
Date: Tue, 9 Jun 2015 17:54:10 -0400

> What distro are you running? Maybe I should try the same thing you're using...

Debian 7.0


-- 
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
https://lists.debian.org/20150609.163754.1323765095739500628.da...@davemloft.net



Re: Linux Sparc FPU register corruption

2015-06-09 Thread James Y Knight
On Jun 9, 2015, at 3:02 PM, David Miller  wrote:
> Anyways, please put proper memory barriers into your testcase and
> let's see if the problem still triggers.

Okay, I edited my test case, changing the asm to:

  "membar #Sync; stda %%f0, [%0] 0xf0; membar #Sync"

That should be entirely overdoing it, right? The problem still occurs
within seconds.
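
For readers following along, here is a minimal sketch of what such a barrier-wrapped block store could look like as C inline assembly. The actual test case is not reproduced in this thread, so the helper name block_store_with_barriers, the fp_block_t type, the __sparc_v9__ guard and the non-SPARC fallback are all assumptions made for illustration; only the instruction sequence itself is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A block store writes 64 bytes and needs a 64-byte-aligned target.
 * fp_block_t is a hypothetical type for this sketch. */
typedef struct {
    uint8_t bytes[64];
} __attribute__((aligned(64))) fp_block_t;

/* Hypothetical helper: store %f0-%f14 (eight doubles, 64 bytes) to dst
 * with a full barrier on each side of the block store.
 * Returns 1 if the SPARC path was compiled in, 0 for the fallback. */
static int block_store_with_barriers(fp_block_t *dst)
{
    assert(((uintptr_t)dst % 64) == 0);
#if defined(__sparc_v9__)
    /* Assumption: __sparc_v9__ is what the compiler defines for V9 targets. */
    __asm__ __volatile__(
        "membar #Sync\n\t"
        "stda %%f0, [%0] 0xf0\n\t"   /* 0xf0 is the ASI used in the thread (ASI_BLK_P) */
        "membar #Sync"
        : /* no outputs */
        : "r"(dst)
        : "memory");
    return 1;
#else
    /* Fallback so the sketch builds on other machines; it only zero-fills
     * the buffer and does not exercise the FPU at all. */
    memset(dst->bytes, 0, sizeof dst->bytes);
    return 0;
#endif
}
```

Calling block_store_with_barriers(&buf) on an fp_block_t local is enough to exercise it. Whether "membar #Sync" on both sides is sufficient on every CPU variant is exactly the open question in this thread, so treat this as the over-cautious form rather than a recommended one.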

And, don't forget Problem #2 -- which didn't even involve weird ASIs at all (at 
least, not in the test task itself).

On Jun 9, 2015, at 4:45 PM, David Miller  wrote:
> Just FYI, I commented out the usleep() in your test program and have
> been running your:
> 
> seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'
> 
> test, and it's been running flawlessly for 2 hours on my T4.
> 
> This is with Linus's current tree.
> 
> I'll fire up my T3 later and try to reproduce it there.

Gah! Thanks for testing it on your setup! But it's really unfortunate that
you can't reproduce the problem!

What distro are you running? Maybe I should try the same thing you're using...

Is anyone else reading this able to reproduce the problem with my test
program (or not)?




Re: Linux Sparc FPU register corruption

2015-06-09 Thread David Miller

Just FYI, I commented out the usleep() in your test program and have
been running your:

seq 64 | xargs -n1 -P64 /bin/sh -c 'while ./a.out; do : ; done'

test, and it's been running flawlessly for 2 hours on my T4.

This is with Linus's current tree.

I'll fire up my T3 later and try to reproduce it there.





Re: Linux Sparc FPU register corruption

2015-06-09 Thread David Miller
From: James Y Knight 
Date: Tue, 9 Jun 2015 08:13:58 -0400

> Um, but my test isn't testing what is being stored to memory at
> all. It is storing to memory and **never loading from the memory
> after**. Why would writing FROM fp registers TO memory corrupt the
> *registers* due to a missing memory barrier?

The memory barrier is necessary for two reasons; only one of them is
to handle the asynchronous nature of the memory operations.

The other reason is that there are strict rules for accessing the FPU
register file around block loads and stores.

Therefore, if you don't use the proper memory barriers you can get
corrupted FPU registers as well as corrupted memory contents.

And complicating things even more, what you can get away with is
different on every single CPU variant.  That's why I really wish
Debian didn't disable multiarch, as that makes the code use the
UltraSPARC-I/II/III memcpy, which might not be 100% kosher on
Niagara-T1 and later.

Anyways, please put proper memory barriers into your testcase and
let's see if the problem still triggers.

Thanks.





Re: Linux Sparc FPU register corruption

2015-06-09 Thread David Miller
From: James Y Knight 
Date: Tue, 9 Jun 2015 08:07:11 -0400

> Debian glibc has multiarch support disabled (done a couple years ago to try
> to work around the unreliability, not entirely successfully..), so it's not
> using that routine you mention. It's using
> sysdeps/sparc/sparc32/sparcv9/memcpy.S which points to
> sysdeps/sparc/sparc64/memcpy.S

I bet you stop getting corruptions if you use the appropriate optimized
routine.





Re: Linux Sparc FPU register corruption

2015-06-09 Thread James Y Knight
From: David Miller 
Date: Tue, 09 Jun 2015 00:34:14 -0700 (PDT)

> Your test is faulty.
> 
> You cannot use ASI_BLK_P loads or stores without appropriate memory
> barriers around them.

Um, but my test isn't testing what is being stored to memory at all. It is 
storing to memory and **never loading from the memory after**. Why would 
writing FROM fp registers TO memory corrupt the *registers* due to a missing 
memory barrier?

If problem 1 is due to a missing membar somewhere, I think it's surely gotta be 
the kernel that's missing it, not my test code. No?

> FWIW, you're probably hitting the bug fixed by the following commit in
> glibc:
> 
> commit 834caf06f33d79be54cff63c274fba2845513593
> Author: Jose E. Marchesi 
> Date:   Sat May 17 11:20:27 2014 -0700
> 
>     Fix sparc memcpy data corruption when using niagara2 optimized routines.
> 
>     * sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Add missing
>     membar to avoid block loads/stores to overlap previous stores.
> 

Debian glibc has multiarch support disabled (done a couple years ago to try to
work around the unreliability, not entirely successfully..), so it's not using
that routine you mention. It's using sysdeps/sparc/sparc32/sparcv9/memcpy.S
which points to sysdeps/sparc/sparc64/memcpy.S

[sorry for the duplicate post -- didn't realize vger.kernel.org doesn't accept 
emails sent by the android gmail client]




Re: Linux Sparc FPU register corruption

2015-06-09 Thread David Miller
From: David Miller 
Date: Tue, 09 Jun 2015 00:34:14 -0700 (PDT)

> Your test is faulty.
> 
> You cannot use ASI_BLK_P loads or stores without appropriate memory
> barriers around them.
> 
> The rules for when and where you need the memory barriers are
> complicated, especially if you want to incur the cost of the memory
> barrier as infrequently as possible.

FWIW, you're probably hitting the bug fixed by the following commit in
glibc:

commit 834caf06f33d79be54cff63c274fba2845513593
Author: Jose E. Marchesi 
Date:   Sat May 17 11:20:27 2014 -0700

    Fix sparc memcpy data corruption when using niagara2 optimized routines.

    * sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Add missing
    membar to avoid block loads/stores to overlap previous stores.





Re: Linux Sparc FPU register corruption

2015-06-09 Thread David Miller

Your test is faulty.

You cannot use ASI_BLK_P loads or stores without appropriate memory
barriers around them.

The rules for when and where you need the memory barriers are
complicated, especially if you want to incur the cost of the memory
barrier as infrequently as possible.

