Re: Sparc architecture requalification

2006-05-22 Thread Jurij Smakov

On Mon, 22 May 2006, Gustavo Franco wrote:


> Is there a simple way to reproduce this critical bug on an ultra1
> (yeah!)? Btw, I've some suggestions to easily identify and backport
> the .17-rcX fix:


No, it's not easy. Neither I nor Clint Adams was able to reproduce it
locally on similar machines. Yet it was killing two different buildds
every time they tried to build openoffice and other large packages. So far
the only person who can reproduce it reliably (even with 2.6.17-rc3) is
Blars Blarson. If you are lucky (for some values of lucky :-), you can hit
another bug which probably affects ultra1: the esp SCSI driver is busted and
dies with DMA errors on any significant disk activity. Martin Habets is
currently looking into it.



> - Ask David S. Miller


He's aware of this problem.


> - If he can't tell us the exact commit, we can isolate the problem
> using git bisect[0]
>
> On an ultra1 the bisect game will take ages for me, so I couldn't do
> that, just reproduce the bug with an older kernel and test a patched
> .16.


So far there was only tentative agreement to adopt 2.6.16 for etch (at
least, that's my perception of the situation). Certain arguments against
2.6.16 were presented on the debian-kernel mailing list. For sparc, 2.6.16
is a lose-lose situation, because a) the status of the 2.6.16 kernel with
respect to the SMP crash is largely unknown, and testing it out extensively
on buildds is not very feasible; and b) 2.6.17 is the first kernel which
contains support for Sun's new Niagara processor. That support is not
trivially backportable, so if 2.6.16 is adopted as the etch kernel, we
might have to copy over the whole sparc64 directory from 2.6.17 and hope
that we can make it work.


Best regards,

Jurij Smakov   [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Sparc architecture requalification

2006-05-22 Thread Gustavo Franco

On 5/21/06, Steve Langasek <[EMAIL PROTECTED]> wrote:

> On Sat, May 20, 2006 at 05:34:08AM +, Aurelien Jarno wrote:
> (...)
> > - The kernel failures (that occur only on SMP boxes) seem to be gone,
> > at least on the build daemons. I don't know what has been done (if
> > somebody knows, please tell us), but the two packages that were killing
> > the buildds (ie glibc and openoffice.org) are now building correctly (4
> > last uploads for the glibc, last upload for openoffice.org).
>
> What's been done is to install a kernel which is newer than any that are
> actually available in sid or etch.  The fact that this seems to fix the
> problem is a positive step in the right direction, but it's not sufficient
> for the release qual as it leaves us with very low confidence in the
> usability of the port when we can't use the Debian kernels for etch on any
> of the relevant project machines.[1]
>
> So the ideal solution is that, now that we have a known-working version,
> someone determines whether 2.6.16 includes the same fixes and if not, gets
> them backported to 2.6.16 for etch.



Is there a simple way to reproduce this critical bug on an ultra1
(yeah!)? Btw, I've some suggestions to easily identify and backport
the .17-rcX fix:

- Ask David S. Miller
- If he can't tell us the exact commit, we can isolate the problem
using git bisect[0]

On an ultra1 the bisect game will take ages for me, so I couldn't do
that, just reproduce the bug with an older kernel and test a patched
.16.

[0] = 
http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt
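
The bisect idea above boils down to a binary search over the kernel
history between a known-broken and a known-fixed revision. What follows is
only a minimal sketch of that search in Python, not how git implements it.
It assumes a linear history with a single broken-to-fixed transition. The
names first_fixed, max_tests and is_fixed, and the revision labels r0..r9,
are hypothetical and made up for illustration. Note that git bisect itself
looks for the commit that introduced a "bad" state, so when hunting for a
fix the "fixed" state has to play the role of "bad".

```python
from typing import Callable, Sequence


def first_fixed(revisions: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_fixed() is true.

    Assumptions: revisions[0] is known broken, revisions[-1] is known
    fixed, and there is exactly one broken-to-fixed transition in between.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(revisions[mid]):  # one kernel build plus one test run
            hi = mid
        else:
            lo = mid
    return revisions[hi]


def max_tests(n_revisions: int) -> int:
    """Upper bound on build-and-test rounds: ceil(log2(n_revisions - 1))."""
    if n_revisions < 2:
        return 0
    # For m >= 1, ceil(log2(m)) == (m - 1).bit_length(); here m = n_revisions - 1.
    return (n_revisions - 2).bit_length()


if __name__ == "__main__":
    revs = [f"r{i}" for i in range(10)]  # hypothetical revision labels
    # Pretend the fix landed in r6; a real is_fixed would boot and stress-test.
    print(first_fixed(revs, lambda r: int(r[1:]) >= 6))
    print(max_tests(len(revs)))
```

Each call to is_fixed stands for a full kernel build and a reproduction
attempt, which is why the number of rounds matters on slow hardware and why
a bug that cannot be reproduced reliably makes every round uncertain.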

Hope that helps,
-- stratus



Re: Sparc architecture requalification

2006-05-21 Thread Steve Langasek
On Sat, May 20, 2006 at 05:34:08AM +, Aurelien Jarno wrote:

> It has been a long time since the sparc status on the architecture
> requalification page [1] has been updated. A few things seem to have
> changed:

> - There are now 3 sparc buildds (mrpurply, spontini and auric), so I
> think the "buildd redundancy" box could be set to green.

Yes, this appears to be correct; I checked with Ryan about this at DebConf,
and we do seem to have full redundancy now for sparc buildds.

> - The kernel failures (that occur only on SMP boxes) seem to be gone,
> at least on the build daemons. I don't know what has been done (if
> somebody knows, please tell us), but the two packages that were killing
> the buildds (ie glibc and openoffice.org) are now building correctly (4
> last uploads for the glibc, last upload for openoffice.org).

What's been done is to install a kernel which is newer than any that are
actually available in sid or etch.  The fact that this seems to fix the
problem is a positive step in the right direction, but it's not sufficient
for the release qual as it leaves us with very low confidence in the
usability of the port when we can't use the Debian kernels for etch on any
of the relevant project machines.[1]

So the ideal solution is that, now that we have a known-working version,
someone determines whether 2.6.16 includes the same fixes and if not, gets
them backported to 2.6.16 for etch.

There is also the question of having appropriate kernel images on the
buildds for the remainder of sarge's term as "stable", but I don't see any
way that this should be a blocker for sparc's inclusion as an etch release
arch if the *current* buildd kernel problems don't make sparc unreleasable
package-wise.

> If the kernel failures still appear to be present, would it be possible 
> to qualify the port for non-SMP only?

AIUI most of the sparc hardware people want to *use* Debian on is SMP kit,
so I think it would be a shame to call a UP port releasable but would
certainly take the opinions of the sparc porters into consideration.

Cheers,
-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
[EMAIL PROTECTED]   http://www.debian.org/

[1] independent of whether DSA actually uses stock Debian kernels on most
Debian systems, which TTBOMK is actually not the case




Re: Sparc architecture requalification

2006-05-20 Thread Jurij Smakov

On Sat, 20 May 2006, Aurelien Jarno wrote:

> - The kernel failures (that occur only on SMP boxes) seem to be gone, at
> least on the build daemons. I don't know what has been done (if somebody
> knows, please tell us), but the two packages that were killing the buildds (ie
> glibc and openoffice.org) are now building correctly (4 last uploads for the
> glibc, last upload for openoffice.org).


James Troup mentioned that the buildds stopped dying (*knock on wood*)
after 2.6.17-rc kernels were installed on them.


Best regards,

Jurij Smakov   [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC





Sparc architecture requalification

2006-05-19 Thread Aurelien Jarno

Hi all,

It has been a long time since the sparc status on the architecture
requalification page [1] has been updated. A few things seem to have
changed:


- There are now 3 sparc buildds (mrpurply, spontini and auric), so I
think the "buildd redundancy" box could be set to green.


- The kernel failures (that occur only on SMP boxes) seem to be gone,
at least on the build daemons. I don't know what has been done (if
somebody knows, please tell us), but the two packages that were killing
the buildds (ie glibc and openoffice.org) are now building correctly (4
last uploads for the glibc, last upload for openoffice.org).


If the kernel failures still appear to be present, would it be possible 
to qualify the port for non-SMP only?


Bye,
Aurelien

[1] http://release.debian.org/etch_arch_qualify.html

--
  .''`.  Aurelien Jarno | GPG: 1024D/F1BCDB73
 : :' :  Debian GNU/Linux developer | Electrical Engineer
 `. `'   [EMAIL PROTECTED] | [EMAIL PROTECTED]
   `-people.debian.org/~aurel32 | www.aurel32.net

