Re: Good news on Debian Sparc port stability

2015-06-22 Thread Kezia
James Y Knight  google.com> writes:


Hi James,

I'm having a problem installing and/or using Oracle VM. Do you know how? 
Initially I had to install 2 VMs on my Sparc, where the primary is 
Solaris 11. I guess that VM would solve my problem, but I didn't find 
anything to help me install it on my Solaris. (initially I needed

Re: Good news on Debian Sparc port stability

2015-06-08 Thread Paul Wise
On Jun 4, 2015, at 11:07 AM, James Y Knight wrote:

> I hope this can help avoid Sparc needing to be deleted from Debian...

The official criteria for being in the main archive and being released
in a Debian stable release are available here:

https://ftp-master.debian.org/archive-criteria.html
https://release.debian.org/testing/arch_policy.html
https://release.debian.org/testing/arch_qualify.html

I'll add some further thoughts here:

There needs to be a team of Debian folk pro-actively looking at build
and stability issues, responding to questions from users and DSA and
keeping the website/wiki pages and other documentation updated.

https://buildd.debian.org/status/architecture.php?a=sparc&suite=sid
https://www.debian.org/ports/sparc/
https://wiki.debian.org/PortsSparc?action=fullsearch&value=sparc&titlesearch=Titles

There needs to be interest in Linux on SPARC upstream in the toolchain
community and the Linux community, both on bare metal and otherwise.

Something has to be decided about the sparc64 port: is it going to
replace the sparc port, co-exist with it, or what?

https://wiki.debian.org/Sparc64

It would be great if Oracle or other vendors would support the Debian
SPARC port(s) by donating more modern and faster hardware, as other
vendors have done for arm64, mips, ppc64el etc.

There need to be people or organisations actually interested in using
the Debian SPARC ports, otherwise the only point is portability. Right
now there is 1 known user of sparc64 and 120 of sparc.

http://popcon.debian.org/ 

-- 
bye,
pabs

https://wiki.debian.org/PaulWise





Re: Good news on Debian Sparc port stability

2015-06-08 Thread Paul Wise
On Jun 4, 2015, at 11:07 AM, James Y Knight wrote:

> I'm not sure what "lack of proper kernel support" means

I guess that is referring to the trouble DSA were having with sompek and
stadler, which would freeze almost every day and have to be reset via
the ALOM management processor. This was solved by using the Linux
backport from wheezy-backports. DSA would prefer to use wheezy kernels
with the relevant commits backported rather than full Linux backports.

https://db.debian.org/machines.cgi?host=sompek
https://db.debian.org/machines.cgi?host=stadler

-- 
bye,
pabs

https://wiki.debian.org/PaulWise





Re: Good news on Debian Sparc port stability

2015-06-05 Thread Patrick Baggett
I would ask David S. Miller about the sparc asm stuff; he seems to be the
resident sparc genius and a Linux kernel maintainer.

On Fri, Jun 5, 2015 at 4:18 PM, James Y Knight  wrote:

>
> On Jun 4, 2015, at 11:07 AM, James Y Knight  wrote:
> GLibc
> =====
> After that, everything seemed to be going fine, except that programs like
> GCC would randomly segfault and give parse errors. This has been reported
> before, e.g. http://thread.gmane.org/gmane.linux.ports.sparc/16835, from
> 2 years ago. Things were stable enough to use interactively, if you're
> willing to keep retrying a build until it works, but not stable enough to
> use for any autobuild system.
>
> After getting a hint from Aurelien that disabling the optimized memcpy
> routines in glibc (eglibc 2.19-1, on Wed, 04 Jun 2014 20:32:06 +0200) had
> improved, but not fixed, the problem, I started looking into that...
>
> ...And found that recompiling glibc, disabling the sparcv9 optimizations
> (that is: eliminating debian/patches/sparc/local-sparcv9-target.diff),
> *appears* to have completely fixed the stability issue!
>
> To try to verify that, I ran a loop building and rebuilding 'clang' (with
> full "ninja" parallelism) overnight, and it's had zero crashes in all 14
> builds of clang that it got through. Prior to fixing glibc, at least one of
> the ~2300 build steps (gcc/as/ld) was sure to crash unreproducibly.
>
> It'd be great if someone wants to try to figure out exactly /which/ of the
> asm routines in the various sysdeps/**/sparc32/sparcv9 are broken, to
> narrow down the problem better, too. I strongly suspect there's just
> something wrong in one or more of the hand-written asm files, but it's
> certainly possible there's some wider problem that the sparcv9
> optimizations of glibc (but nothing else I've seen so far) just happen to
> expose.
>
> So, bad news and good news:
>
> Bad News: the above solution of simply disabling sparcv9 breaks some
> things (other than gcc, which it does fix). It breaks something about
> atomics or semaphores, likely due to a mismatch of expectations between
> libc and other code (the sparc32 routines, when *NOT* compiled into a
> shared library, dynamically choose between the v8 and v9 ways of doing
> things, so it's entirely reasonable that doing it only the v8 way
> doesn't work right).
>
> Good News:
>
> My next attempt at a fix is to just disable the optimized string ops:
>
>   rm sysdeps/sparc/sparc32/sparcv9/*mem* sysdeps/sparc/sparc32/sparcv9/*st*
>
> That still seems to fix the random gcc crashes, AND doesn't break
> other things. :)
>
>
> Looking into what the deleted routines are doing that's "interesting":
>
> * memcpy and memset:
>
> They're using the LDBLOCKF/STBLOCKF "block copy" instructions, which are:
> 1) Not actually part of the Sparcv9 standard instruction set, but rather
> processor-specific (although these processor-specific instructions have
> been implemented since the UltraSPARC I).
> "The LDBLOCKF instruction is intended to be a processor-specific
> instruction, which may or may not be implemented in future Oracle SPARC
> Architecture implementations. Therefore, it should only be used in
> platform-specific dynamically-linked libraries or in software created by a
> runtime code generator that is aware of the specific virtual processor
> implementation on which it is executing."
>
> 2) Marked deprecated.
> "The LDBLOCKF instructions are deprecated and should not be used in
> new software. A sequence of LDDF instructions should be used instead."
>
> 3) Don't follow the normal TSO memory model ordering that everything else
> does; they require explicit MEMBARs in the right places to ensure even
> *single-thread/cpu* memory ordering correctness.
> "Block operations do not generally conform to dependence order on the
> issuing virtual processor; that is, no read-after-write or write-after-read
> checking occurs between block loads and stores. Explicit MEMBARs are
> required to enforce dependence ordering between block operations that
> reference the same address."
>
> It certainly looks like the author of those routines *tried* to do the
> right thing w.r.t. inserting membar instructions in the right place, but I
> can easily imagine it's wrong somehow. And it is entirely plausible that
> the behavior would be hardware-generation specific, since it has, by
> design, weird hardware-specific memory semantics. I'm placing my bets on
> this one being the problem.
>
> * memchr, memcmp, strcmp, strcpy, etc.
>
> These use a nonfaulting load instruction. Despite the name, a nonfaulting
> load doesn't mean the hardware never faults on loading from an unmapped
> page: unmapped pages still cause a fault, but the fault is supposed to be
> handled by the OS. It's also possible to map pages as "for use by
> nonfaulting loads only" (Linux doesn't appear to do this).
>
> That's a rare instruction -- not generated by GCC I think, so I could
> imagine there being a bug in the fault handler for it. I think th

Good news on Debian Sparc port stability

2015-06-04 Thread James Y Knight
I've recently acquired a Sparc T3-1 and installed Debian Unstable's Sparc port 
on it as a guest in an Oracle VM Server for Sparc ("ldm") VM.

I ran into a few issues, which I've cataloged in the story below. But it has a 
happy ending!

Kernel sunvdc module
====================

Installation wasn't 100% straightforward: the "sunvdc" virtual disk driver, at 
least in kernel 3.16.7-ckt9-3 (the version in the d-i image I downloaded from 
http://d-i.debian.org/daily-images/sparc/ at the time), seems to be basically 
100% broken. As soon as the installer got to the partitioner, the whole VM 
would hang. I see that there have been a lot of commits to that driver from 
Oracle people in the last few months, so I hope they're working on fixing it. 
Dunno. 

I also never tried installing on "bare metal" (which, judging from random forum 
posts, does work out of the box), since I wanted to keep Solaris (and didn't 
realize, going in, how hard I was making things for myself...).

So, long story short, I ended up doing an NFS root install instead, since the 
sunvnet network driver worked fine. It would be really nice if debian-installer 
could install to an NFS root out of the box; I had to extract the NFS modules 
manually from the normal kernel package and then run debootstrap by hand. (But 
I'm sure happy that Debian's initramfs has built-in support for NFS root!)

klibc-utils
===========
The next problem I found is that klibc-utils' ipconfig program gets a Bus Error 
when trying to obtain a DHCP address. I believe that DHCP client is only ever 
used in the initramfs, and only if you want an NFS root; the other DHCP 
daemons, e.g. the one in debian-installer, had worked fine. So I told it to use 
a static IP instead, which worked. (I'm sure the bug is just an obvious 
misaligned memory access; I can look into that later.)
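To illustrate the kind of misaligned access meant here: SPARC raises a Bus Error (SIGBUS) on a 4-byte load from an address that isn't 4-byte aligned, something x86 silently tolerates. Below is a minimal C++ sketch of the usual fix, assuming the fault comes from reading a multi-byte field at an odd offset in a packet buffer; `read_be32` and `load_u32_unaligned` are hypothetical helper names, not klibc's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for reading 32-bit fields from a packet buffer at
// arbitrary offsets. On SPARC, the tempting
//     *reinterpret_cast<const std::uint32_t *>(buf + 1)
// is a misaligned 4-byte load and raises SIGBUS ("Bus Error").

// Build a big-endian (network order) value one byte at a time: byte loads
// never need alignment, and the result does not depend on host byte order.
inline std::uint32_t read_be32(const unsigned char *p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Alternative: memcpy into an aligned local; the compiler emits whatever
// access sequence is legal on the target. The result is in host byte order.
inline std::uint32_t load_u32_unaligned(const unsigned char *p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

For protocol fields such as those in a DHCP packet, the byte-wise big-endian reader is usually the better fit, since it handles alignment and byte order in one step.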

GLibc
=====
After that, everything seemed to be going fine, except that programs like GCC 
would randomly segfault and give parse errors. This has been reported before, 
e.g. http://thread.gmane.org/gmane.linux.ports.sparc/16835, from 2 years ago. 
Things were stable enough to use interactively, if you're willing to keep 
retrying a build until it works, but not stable enough to use for any autobuild 
system.

After getting a hint from Aurelien that disabling the optimized memcpy routines 
in glibc (eglibc 2.19-1, on Wed, 04 Jun 2014 20:32:06 +0200) had improved, but 
not fixed, the problem, I started looking into that...

...And found that recompiling glibc, disabling the sparcv9 optimizations (that 
is: eliminating debian/patches/sparc/local-sparcv9-target.diff), *appears* to 
have completely fixed the stability issue!

To try to verify that, I ran a loop building and rebuilding 'clang' (with full 
"ninja" parallelism) overnight, and it's had zero crashes in all 14 builds of 
clang that it got through. Prior to fixing glibc, at least one of the ~2300 
build steps (gcc/as/ld) was sure to crash unreproducibly.
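A loop like that can be sketched as follows, assuming the whole build can be driven by a single shell command. `count_failures` is a hypothetical helper, and it counts any nonzero exit status as a failure, so a genuine compile error is indistinguishable from a random crash here.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical soak-test helper: run `cmd` through the shell `runs` times
// and return how many runs exited with a nonzero status. std::system also
// returns nonzero if the shell itself could not be started.
int count_failures(const std::string &cmd, int runs) {
    int failures = 0;
    for (int i = 1; i <= runs; ++i) {
        const int status = std::system(cmd.c_str());
        if (status != 0) {
            ++failures;
            std::fprintf(stderr, "run %d of %d failed (status %d)\n", i, runs, status);
        }
    }
    return failures;
}
```

For example, `count_failures("ninja -C build -t clean && ninja -C build", 14)` would repeat a clean parallel ninja build 14 times; the `build` directory name is an assumption.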

It'd be great if someone wants to try to figure out exactly /which/ of the asm 
routines in the various sysdeps/**/sparc32/sparcv9 are broken, to narrow down 
the problem better, too. I strongly suspect there's just something wrong in one 
or more of the hand-written asm files, but it's certainly possible there's some 
wider problem that the sparcv9 optimizations of glibc (but nothing else I've 
seen so far) just happen to expose.
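One cheap first step toward narrowing that down is a differential test: run a candidate copy routine over every small combination of source offset, destination offset and length, and compare the whole buffer, including guard bytes on either side of the destination, against a trivially correct byte loop. Below is a C++ sketch; `check_copy` and `reference_copy` are hypothetical names, and exercising the sparcv9 routines would mean running it against the glibc build in question. Being single-threaded and deterministic, it can only catch plain wrong results for some size or alignment; an ordering or fault-handler bug may only show up under a heavy parallel build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Signature shared by memcpy-like routines under test.
using CopyFn = void *(*)(void *, const void *, std::size_t);

// Reference implementation: a plain byte loop that is obviously correct.
void *reference_copy(void *dst, const void *src, std::size_t n) {
    auto *d = static_cast<unsigned char *>(dst);
    const auto *s = static_cast<const unsigned char *>(src);
    for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
    return dst;
}

// Hypothetical differential test: for every source offset and destination
// offset below max_off and every length up to max_len, copy with `fn` and
// with the reference into identically pre-filled buffers, then compare the
// whole buffers, so writes past either end of the destination are caught
// too. Returns the number of mismatching cases.
int check_copy(CopyFn fn, std::size_t max_len, std::size_t max_off) {
    const std::size_t guard = 16;
    const std::size_t size = max_len + max_off + 2 * guard;
    std::vector<unsigned char> src(size), got(size), want(size);
    for (std::size_t i = 0; i < size; ++i)
        src[i] = static_cast<unsigned char>(i * 7 + 1);  // varying byte pattern

    int bad = 0;
    for (std::size_t so = 0; so < max_off; ++so)
        for (std::size_t dof = 0; dof < max_off; ++dof)
            for (std::size_t len = 0; len <= max_len; ++len) {
                std::memset(got.data(), 0xAA, size);
                std::memset(want.data(), 0xAA, size);
                fn(got.data() + guard + dof, src.data() + guard + so, len);
                reference_copy(want.data() + guard + dof, src.data() + guard + so, len);
                if (std::memcmp(got.data(), want.data(), size) != 0) ++bad;
            }
    return bad;
}
```

To test the system's own memcpy, pass a captureless lambda that wraps `std::memcpy` rather than `&std::memcpy` itself, since C++ does not guarantee that the address of a standard library function can be taken; on a healthy system it should report zero mismatches.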

GCC
===

Oh, and I'll mention one more bug I ran into, which is not sparc-specific, but 
does affect building some C++11 software on Sparc:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65945

The workaround for that is usually to just compile at an optimization level 
greater than -O0, as the problematic construct typically only occurs in inline 
templates forwarding their arguments onto another function, which all just get 
inlined away at higher optimization levels.
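For reference, the shape of construct meant here is an inline variadic template that simply forwards its arguments to another function. The sketch below is hypothetical and does not claim to reproduce the bug (see the bug report for the actual trigger); `count_non_null` and `forward_to_count` are made-up names, and passing `nullptr` through the wrapper is an illustrative assumption.

```cpp
#include <cassert>
#include <utility>

// Hypothetical target function: counts how many of its pointer arguments
// are non-null.
inline int count_non_null(const int *a, const int *b) {
    return (a != nullptr) + (b != nullptr);
}

// An inline variadic template that only forwards its arguments to another
// function, the shape of code described above. At -O0 this wrapper (and
// each std::forward) is compiled as a real out-of-line call; at -O1 and
// above it is normally inlined away completely.
template <typename... Args>
inline int forward_to_count(Args &&...args) {
    return count_non_null(std::forward<Args>(args)...);
}
```

Because such wrappers only exist as separate functions at -O0, raising the optimization level removes the intermediate calls, which is consistent with the workaround described above.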

Conclusion
==========

It seems like the one change to glibc is probably a good-enough fix to get the 
Sparc port back to a position of stability.

And I hope this can help avoid Sparc needing to be deleted from Debian...

It seems to really *not* be in as bad a shape as one might be led to believe. 
E.g. I'm not sure what "lack of proper kernel support" means (from Joerg's 
https://lists.debian.org/debian-devel/2015/04/msg00284.html). The kernel 
appears to be working fine. I ran into some bugs, but besides the one glibc 
issue, none really seem fatal to the health of the port in Debian.

James