Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-06-27 Thread Josip Rodin
On Wed, Jun 27, 2007 at 04:01:15PM -0400, Clint Adams wrote:
> > It does seem to have issues -- kernel 2.6.19.2 had crashed at one point
> > soon after the buildd started running, and I seem to see some three more
> > 'down' entries now with 2.6.22-rc5... elmo might know more.
> 
> Perhaps you could start using official Debian kernels and file bugs
> when it crashes.

I don't believe these were necessarily crashes, they may have been just
reboots while someone was logged in. I was referring to the 'down' state
in the last(1) log.

-- 
 2. That which causes joy or happiness.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-06-27 Thread Clint Adams
On Wed, Jun 27, 2007 at 09:53:37PM +0200, Josip Rodin wrote:
> Right now I see some 42 *_sparc.deb files in there, so it seems like it
> might be approaching a working state of some sort. It does seem to have
> issues -- kernel 2.6.19.2 had crashed at one point soon after the buildd
> started running, and I seem to see some three more 'down' entries now with
> 2.6.22-rc5... elmo might know more.
> 
> In any case, I thought I shouldn't leave this thread unanswered after
> this much activity finally happened :)

Perhaps you could start using official Debian kernels and file bugs
when it crashes.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-06-27 Thread Josip Rodin
On Sat, Feb 24, 2007 at 02:57:15PM +0100, Josip Rodin wrote:
> > I finally plugged in that Fire 280R last night! :) It took some time, but
> > it's there.
> 
> The machine currently has 2 GB RAM memory and two 750 MHz processors.
> 
> Now it's better in that regard than both auric and spontini, which should
> make it worth while.

On Fri, May 18, 2007 at 03:56:06AM -0700, Steve Langasek wrote:
> > > Anyway, as far as getting something done, I would suggest opening an   
> > > RT ticket and documenting in that ticket [...]

Some time after I filed the ticket, and after James described auric as
completely gone ( :( ), he set the machine up as a buildd over the last
couple of weeks. It's called lebrun.d.o.

Right now I see some 42 *_sparc.deb files in there, so it seems like it
might be approaching a working state of some sort. It does seem to have
issues -- kernel 2.6.19.2 had crashed at one point soon after the buildd
started running, and I seem to see some three more 'down' entries now with
2.6.22-rc5... elmo might know more.

In any case, I thought I shouldn't leave this thread unanswered after
this much activity finally happened :)

-- 
 2. That which causes joy or happiness.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-02-24 Thread Josip Rodin
On Sat, Feb 24, 2007 at 06:28:59AM -0800, Earl Violet wrote:
> I sometimes come across Sun components while salvaging computers. If
> I know what is needed/wanted, I will put things aside. I am in
> Tucson, Arizona, USA. There are many components I am not familiar
> with but I will try.

That will be useful in the future, but I actually can't expand this machine
any more - I filled up all eight RAM slots, and both CPU slots.

But I'm sure if you pass along information about Sun components, people
could organize something.

-- 
 2. That which causes joy or happiness.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-02-24 Thread Earl Violet
I sometimes come across Sun components while salvaging computers. If
I know what is needed/wanted, I will put things aside. I am in
Tucson, Arizona, USA. There are many components I am not familiar
with but I will try.

Earl


URL http://deserthowler.cjb.net
Instant messenger: earlcoyote
ICQ:64033496


 






Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-02-24 Thread Josip Rodin
On Sat, Jan 27, 2007 at 10:20:16AM +0100, Josip Rodin wrote:
> > Yesterday I decommissioned that 280R and it's now waiting in a storage
> > room. I'll find some extra disks and plug them in, as well as a
> > permanent place in a network rack.
> 
> I finally plugged in that Fire 280R last night! :) It took some time, but
> it's there.
> 
> > I don't know what kind of memory these machines use, and judging by the
> > price of extra CPUs I doubt I'll be able to afford buying more, but
> > otherwise this is a machine stronger than either vore or spontini.
> 
> OK, so the machine currently has 1 GB RAM, and one 750 MHz processor.
> 
> If Debianites have any spare money at hand to buy extra components from one
> of those cheap American shops, and are also going to DebConf in Edinburgh,
> I'd be happy to reimburse them personally and pick the stuff up at the
> conference.

I got inventive and found a quicker way :) The machine currently has
2 GB of RAM and two 750 MHz processors.

Now it's better in that regard than both auric and spontini, which should
make it worthwhile.

-- 
 2. That which causes joy or happiness.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2007-01-27 Thread Josip Rodin
On Tue, Jan 10, 2006 at 01:45:28PM +0100, Josip Rodin wrote:
> > > sparc will be removed from consideration until we no longer have to
> > > worry about OpenOffice builds (or other intensive package builds)
> > > crashing the buildd machines.
> > 
> > I have a few Ultra5 machines available, as well as sufficient memory, disk
> > and bandwidth to support expansion. I could probably also get a 280R
> 
> Yesterday I decommissioned that 280R and it's now waiting in a storage room.
> I'll find some extra disks and plug them in, as well as a permanent place in
> a network rack.

I finally plugged in that Fire 280R last night! :) It took some time, but
it's there.

> I don't know what kind of memory these machines use, and judging by the
> price of extra CPUs I doubt I'll be able to afford buying more, but
> otherwise this is a machine stronger than either vore or spontini.

OK, so the machine currently has 1 GB RAM, and one 750 MHz processor.

If Debianites have any spare money at hand to buy extra components from one
of those cheap American shops, and are also going to DebConf in Edinburgh,
I'd be happy to reimburse them personally and pick the stuff up at the
conference.

I see sparc is at around the 98% mark, so I guess there's no hurry, but I
figure it would help to have this in the long run.

(I would also be willing to be educated about how to run a buildd myself at
the conference :)

-- 
 2. That which causes joy or happiness.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2006-01-17 Thread Josip Rodin
On Tue, Jan 10, 2006 at 01:45:28PM +0100, Josip Rodin wrote:
> I don't know what kind of memory [280R] machines use, and judging by the
> price of extra CPUs I doubt I'll be able to afford buying more

The Sun handbook says they are 232-pin 7ns memory modules.
$400 for a gigabyte at http://www.memorydealers.com/sunfire280ra35.html
Not exactly cheap.

-- 
 2. That which causes joy or happiness.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2006-01-10 Thread Josip Rodin
On Tue, Dec 27, 2005 at 02:18:43PM +0100, Josip Rodin wrote:
> > sparc will be removed from consideration until we no longer have to
> > worry about OpenOffice builds (or other intensive package builds)
> > crashing the buildd machines.
> 
> I have a few Ultra5 machines available, as well as sufficient memory, disk
> and bandwidth to support expansion. I could probably also get a 280R

Yesterday I decommissioned that 280R and it's now waiting in a storage room.
I'll find some extra disks and plug them in, as well as a permanent place in
a network rack.

I don't know what kind of memory these machines use, and judging by the
price of extra CPUs I doubt I'll be able to afford buying more, but
otherwise this is a machine stronger than either vore or spontini.

-- 
 2. That which causes joy or happiness.





Re: System freezes, time jumps 3.25 days (was Re: sparc buildd issues)

2006-01-03 Thread Richard Mortimer
Dave,

Below is a patch that closes a race condition reading the stick register
on Hummingbird cpus. Previously the code always incremented the high
word if the low had wrapped but we don't know if high was read before or
after the wrap.

This may (or may not :-) ) be the cause of a kernel lockup seen on a
number of Hummingbird systems running the Debian kernel 2.6.14 where
time seems to jump forward 3 days 6 hours shortly before the system
locks up.

Patch below.

Richard

<-- snip -->

Ensure STICK register value is read properly when register roll over
occurs.

Signed-off-by: Richard Mortimer <[EMAIL PROTECTED]>



--- linux-2.6-2.6.14+2.6.15-rc5/arch/sparc64/kernel/time.c.orig 2006-01-03 14:27:36.0 +
+++ linux-2.6-2.6.14+2.6.15-rc5/arch/sparc64/kernel/time.c  2006-01-03 22:20:19.0 +
@@ -280,9 +280,9 @@
  * Since STICK is constantly updating, we have to access it carefully.
  *
  * The sequence we use to read is:
- * 1) read low
- * 2) read high
- * 3) read low again, if it rolled over increment high by 1
+ * 1) read high
+ * 2) read low
+ * 3) read high again, if it rolled re-read both low and high again.
  *
  * Writing STICK safely is also tricky:
  * 1) write low to zero
@@ -295,18 +295,18 @@
 static unsigned long __hbird_read_stick(void)
 {
         unsigned long ret, tmp1, tmp2, tmp3;
-        unsigned long addr = HBIRD_STICK_ADDR;
+        unsigned long addr = HBIRD_STICK_ADDR+8;
 
         __asm__ __volatile__("ldxa  [%1] %5, %2\n\t"
-                             "add   %1, 0x8, %1\n\t"
-                             "ldxa  [%1] %5, %3\n\t"
+                             "1:\n\t"
                              "sub   %1, 0x8, %1\n\t"
+                             "ldxa  [%1] %5, %3\n\t"
+                             "add   %1, 0x8, %1\n\t"
                              "ldxa  [%1] %5, %4\n\t"
                              "cmp   %4, %2\n\t"
-                             "blu,a,pn  %%xcc, 1f\n\t"
-                             " add  %3, 1, %3\n"
-                             "1:\n\t"
-                             "sllx  %3, 32, %3\n\t"
+                             "bne,a,pn  %%xcc, 1b\n\t"
+                             " mov  %4, %2\n"
+                             "sllx  %4, 32, %4\n\t"
                              "or    %3, %4, %0\n\t"
                              : "=&r" (ret), "=&r" (addr),
                                "=&r" (tmp1), "=&r" (tmp2), "=&r" (tmp3)




On Sat, 2005-12-31 at 00:41 +, Richard Mortimer wrote:
> On Thu, 2005-12-29 at 16:46 -0800, Jurij Smakov wrote:
> > On Wed, 28 Dec 2005, Blars Blarson wrote:
> > 
> > > In article <[EMAIL PROTECTED]>
> > > [EMAIL PROTECTED] writes:
> > >
> > >> On a Sparc netra X1, the system partially freezes (some stuff continues
> > >> running but at least one of the operations necessary to log in gets 
> > >> stuck).
> > >> The logs show that as of the moment of the freeze, the clock has jumped
> > >> forward exactly 3 days, 6 hours,
> > >> 11 minutes and 15 seconds. The change is not gradual; it jumps between
> > >> syslog marks set a minute apart.
> > >
> > > This is not what I have seen.  It sounds like an unrelated issue.
> > 
> > Yeah, I haven't heard about the jumping time issue before. It was reported 
> > that it is absent in 2.6.14, could you please test it? If it's really 
> > gone, it would be the easiest way out.
> 
> I have seen occasional freezes on a Netra X1 running 2.6.14-2.
> Previously I had just put it down to bad hardware or power supply
> glitches (I only use the machine occasionally for testing stuff out).
> Now having seen these discussions I have looked back in my logs and can
> see at least one occurrence of something that looks like the 3 days 6
> hours stuff. I see entries in syslog that jump something like that time
> before for a couple of entries and then the machine hangs until I notice
> and powercycle.
> 
> Anyway that got me thinking as to what may cause this sort of thing.
> Past experience suggests that it would probably be caused by an overflow
> of the low 32 bits of a 64 bit counter or something like that. I
> couldn't make any of the clock frequencies that the machine claims to
> use look anything sensible.
> 
> But I did notice something in __hbird_read_stick in
> arch/sparc64/kernel/time.c 
> The comment (and indeed the code) says that it has to read two 32 bit
> registers in I/O space and that it has to take care of overflow using
> the following sequence.
> 
>  * The sequence we use to read is:
>  * 1) read low
>  * 2) read high
>  * 3) read low again, if it rolled over increment high by 1
> 
> Now to me it seems that if we see the low roll over then always
> incrementing high could be wrong because we could have read it before or
> after the roll over. I think a better solution would be as follows:
> 
> 1) read high
> 2) read low
> 3) read high, if high changed then start again
> 
> I do not know how this could cause the problem becaus

System freezes, time jumps 3.25 days (was Re: sparc buildd issues)

2005-12-30 Thread Richard Mortimer
On Thu, 2005-12-29 at 16:46 -0800, Jurij Smakov wrote:
> On Wed, 28 Dec 2005, Blars Blarson wrote:
> 
> > In article <[EMAIL PROTECTED]>
> > [EMAIL PROTECTED] writes:
> >
> >> On a Sparc netra X1, the system partially freezes (some stuff continues
> >> running but at least one of the operations necessary to log in gets stuck).
> >> The logs show that as of the moment of the freeze, the clock has jumped
> >> forward exactly 3 days, 6 hours,
> >> 11 minutes and 15 seconds. The change is not gradual; it jumps between
> >> syslog marks set a minute apart.
> >
> > This is not what I have seen.  It sounds like an unrelated issue.
> 
> Yeah, I haven't heard about the jumping time issue before. It was reported 
> that it is absent in 2.6.14, could you please test it? If it's really 
> gone, it would be the easiest way out.

I have seen occasional freezes on a Netra X1 running 2.6.14-2.
Previously I had just put it down to bad hardware or power supply
glitches (I only use the machine occasionally for testing stuff out).
Now having seen these discussions I have looked back in my logs and can
see at least one occurrence of something that looks like the 3 days 6
hours stuff. I see entries in syslog that jump something like that time
before for a couple of entries and then the machine hangs until I notice
and powercycle.

Anyway that got me thinking as to what may cause this sort of thing.
Past experience suggests that it would probably be caused by an overflow
of the low 32 bits of a 64 bit counter or something like that. I
couldn't make any of the clock frequencies that the machine claims to
use look anything sensible.

But I did notice something in __hbird_read_stick in
arch/sparc64/kernel/time.c 
The comment (and indeed the code) says that it has to read two 32 bit
registers in I/O space and that it has to take care of overflow using
the following sequence.

 * The sequence we use to read is:
 * 1) read low
 * 2) read high
 * 3) read low again, if it rolled over increment high by 1

Now to me it seems that if we see the low roll over then always
incrementing high could be wrong because we could have read it before or
after the roll over. I think a better solution would be as follows:

1) read high
2) read low
3) read high, if high changed then start again
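
To illustrate the difference between the two read orders, here is a minimal
Python sketch. It is a toy model, not the kernel code: SplitCounter and both
function names are hypothetical, and the counter is assumed to advance by one
tick after every register read, standing in for time passing between loads.

```python
MASK32 = 0xFFFFFFFF


class SplitCounter:
    """Hypothetical model: a 64-bit counter exposed as two 32-bit halves.

    The counter advances by one tick after every register read, which
    stands in for time passing between the separate loads.
    """

    def __init__(self, value):
        self.value = value

    def read_low(self):
        result = self.value & MASK32
        self.value += 1
        return result

    def read_high(self):
        result = self.value >> 32
        self.value += 1
        return result


def read_low_high_low(counter):
    """Old order: low, high, low; bump high if low appears to have wrapped."""
    low1 = counter.read_low()
    high = counter.read_high()
    low2 = counter.read_low()
    if low2 < low1:
        # Wrong when the wrap happened before high was read:
        # high already includes the carry, so it gets counted twice.
        high += 1
    return (high << 32) | low2


def read_high_low_high(counter):
    """Suggested order: high, low, high; start again if high changed."""
    high1 = counter.read_high()
    while True:
        low = counter.read_low()
        high2 = counter.read_high()
        if high2 == high1:
            return (high1 << 32) | low
        high1 = high2  # high moved under us, so re-read low and high


if __name__ == "__main__":
    # Low word is one tick away from wrapping when the read starts.
    start = 0x1FFFFFFFF
    old = read_low_high_low(SplitCounter(start))
    new = read_high_low_high(SplitCounter(start))
    print(hex(old), hex(new))
```

In this model, when the low word wraps between the first low read and the high
read, the old order returns a value about 2**32 ticks ahead of the true count,
while the retry loop returns a consistent pair. When the wrap falls between the
high read and the second low read, the old order happens to be correct.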

I do not know how this could cause the problem because the counter runs
at 5.5 MHz and would overflow every 773 seconds or so. But maybe the
possibility of hitting this every 10-15 minutes combined with another
problem could give us the symptoms that we see.
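
As a rough check of the rollover interval, the sketch below assumes the low
word is a full 32 bits wide and takes the 5.5 MHz rate as quoted. The names
are illustrative only.

```python
LOW_WORD_TICKS = 2 ** 32  # ticks before a 32-bit low word wraps


def rollover_seconds(freq_hz):
    """Seconds between wraps of the low word at the given tick rate."""
    return LOW_WORD_TICKS / freq_hz


# Assumption: exactly 5.5 MHz, the rounded figure quoted above.
period = rollover_seconds(5.5e6)
print(f"low word wraps every {period:.0f} s ({period / 60:.1f} min)")

# Tick rate that would give a wrap every 773 seconds.
implied_hz = LOW_WORD_TICKS / 773
print(f"773 s corresponds to about {implied_hz / 1e6:.3f} MHz")
```

At exactly 5.5 MHz this gives roughly 781 seconds, about 13 minutes; the 773
second figure corresponds to a slightly higher rate of about 5.56 MHz. Either
way the wrap falls inside the 10-15 minute window mentioned above.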

Richard

P.S. Sorry but I don't have time to generate a patch today but will do
so early next week unless someone else has beaten me to it.

-- 
Richard Mortimer <[EMAIL PROTECTED]>





Re: sparc buildd issues

2005-12-29 Thread Jurij Smakov

On Wed, 28 Dec 2005, Blars Blarson wrote:

> In article <[EMAIL PROTECTED]>
> [EMAIL PROTECTED] writes:
>
>> On a Sparc netra X1, the system partially freezes (some stuff continues
>> running but at least one of the operations necessary to log in gets stuck).
>> The logs show that as of the moment of the freeze, the clock has jumped
>> forward exactly 3 days, 6 hours,
>> 11 minutes and 15 seconds. The change is not gradual; it jumps between
>> syslog marks set a minute apart.
>
> This is not what I have seen.  It sounds like an unrelated issue.

Yeah, I haven't heard about the jumping time issue before. It was reported
that it is absent in 2.6.14, could you please test it? If it's really
gone, it would be the easiest way out.


Regarding the kernel problem which is plaguing the buildds: I think by now 
it is fair to say that it affects only SMP machines. Furthermore, it looks 
like it only crashes the machines with SCSI controllers. It has been 
discussed a while ago on the gentoo-sparc list, archived discussion is 
available at


http://marc.theaimsgroup.com/?t=11196707991&r=1&w=2
http://marc.theaimsgroup.com/?t=11299903061&r=1&w=2

There also seems to be a way to crash the kernel by creating heavy disk 
activity, just by tar'ing, copying and untar'ing big directory trees 
repeatedly (that's what the crashme script mentioned in these threads
does). Today I have received a second CPU for my Ultra60 (thanks a LOT to 
Clint Adams for providing it), so I'll try to reproduce the failure and 
poke around to see what I can do about it.


Best regards,

Jurij Smakov                                        [EMAIL PROTECTED]
Key: http://www.wooyd.org/pgpkey/   KeyID: C99E03CC





Linux 2.6.14 & blade 100 framebuffer (was: Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC])

2005-12-28 Thread Admar Schoonen
On Wed, Dec 28, 2005 at 03:48:51PM +0100, Luigi Gangitano wrote:
> >The Xorg setup that works flawlessly on my blade 100 with (custom)  
> >2.6.9 at 1280
> >x 960 at 85 Hz suddenly only works at 60 Hz and shows red dots with  
> >debian
> >2.6.14. I haven't investigated much in this issue, but perhaps you  
> >know a
> >solution / workaround?
> 
> I run it at 1024x768, and had many problems with the framebuffer that  
> have been solved in the latest 2.6.14 sources. Maybe you should try  
> the latest kernel image.

I just tried linux-image-2.6.14-2-sparc64 version 2.6.14-6 on my blade 100.
Console looks ok, except for some "snow" or noise when text is scrolling. Xorg
still insists on 1280x960 at 60 Hz; if I lower the resolution to something like
1024x768 or 800x600, it will put the refresh rate at 85 Hz. At all of those
resolutions though, there are red pixels on the screen, most notably when moving
a window.

If I boot the same kernel with 'video=atyfb:off vga=normal', Xorg suddenly does
run at 1280x960 at 85 Hz. Unfortunately, with that parameter, my second
framebuffer (another ati mach64 card) is not initialized, and Xorg only uses the
onboard card. The error in /var/log/Xorg.0.log is:

(==) ATI(1): Chipset:  "ati".
(**) ATI(1): Depth 24, (--) framebuffer bpp 32
(**) ATI(1): Option "reference_clock" "29.5 Mhz"
(EE) ATI(1): Adapter has not been initialised.

In summary: 2.6.14-2 & Xorg seem to work fine for a blade 100, provided that you
disable the ati framebuffer (although I haven't tested it longer than a few
hours).

Now, if I could only find a way to make either openboot, the linux kernel or
Xorg (correctly) initialize both cards...

Admar





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-28 Thread Luigi Gangitano

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


On 28 Dec 2005, at 15:29, Admar Schoonen wrote:
> Are you running the Debian 2.6.14 kernel, or another (self
> compiled?) version?

A self compiled version based on the latest debian sources (2.6.14-6).

> And what about Xorg? Do you have red dots in Xorg?

I don't use that machine as a desktop, so I run Xorg rarely. If I
recall correctly, last time I ran it there were some glitches in X. I
will check as soon as I have physical access to the machine.

> The Xorg setup that works flawlessly on my blade 100 with (custom)
> 2.6.9 at 1280 x 960 at 85 Hz suddenly only works at 60 Hz and shows
> red dots with debian 2.6.14. I haven't investigated much in this
> issue, but perhaps you know a solution / workaround?

I run it at 1024x768, and had many problems with the framebuffer that
have been solved in the latest 2.6.14 sources. Maybe you should try
the latest kernel image.


Regards,

- --
Luigi Gangitano -- <[EMAIL PROTECTED]> -- <[EMAIL PROTECTED]>
GPG: 1024D/924C0C26: 12F8 9C03 89D3 DB4A 9972  C24A F19B A618 924C 0C26


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFDsqXV8ZumGJJMDCYRAhlnAJ90QVb4ZhN+UtzvQJuip7EdIhEITwCeManj
hvwt0llLCnm3RYZ5Lo+Mqo0=
=RpQD
-END PGP SIGNATURE-





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-28 Thread Luigi Gangitano

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


On 27 Dec 2005, at 19:04, William Herrin wrote:
> On a Sparc netra X1, the system partially freezes (some stuff
> continues running but at least one of the operations necessary to
> log in gets stuck). The logs show that as of the moment of the
> freeze, the clock has jumped forward exactly 3 days, 6 hours,
> 11 minutes and 15 seconds. The change is not gradual; it jumps
> between syslog marks set a minute apart.

I had the same problem on a SunBlade 100 (basically the same machine
in a desktop case). Apparently the block subsystem is frozen (no
activity of the disk) and login can complete.

Now I'm running 2.6.14 and the bug has not occurred in the last 25
days, which is way longer than it was with 2.6.9 to 2.6.12 kernels.
It seems to me that it has been fixed.


Regards,

- --
Luigi Gangitano -- <[EMAIL PROTECTED]> -- <[EMAIL PROTECTED]>
GPG: 1024D/924C0C26: 12F8 9C03 89D3 DB4A 9972  C24A F19B A618 924C 0C26


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFDsppW8ZumGJJMDCYRAiD+AJ4zZP9RNY2vjdhszLtunulroWdF8gCdHqXm
nPQYxaR22a/0D/CQAYTjBoM=
=OqPy
-END PGP SIGNATURE-





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-28 Thread Blars Blarson
In article <[EMAIL PROTECTED]> 
[EMAIL PROTECTED] writes:

>On a Sparc netra X1, the system partially freezes (some stuff continues
>running but at least one of the operations necessary to log in gets stuck).
>The logs show that as of the moment of the freeze, the clock has jumped
>forward exactly 3 days, 6 hours,
>11 minutes and 15 seconds. The change is not gradual; it jumps between
>syslog marks set a minute apart.

This is not what I have seen.  It sounds like an unrelated issue.







-- 
Blars Blarson   [EMAIL PROTECTED]
http://www.blars.org/blars.html
With Microsoft, failure is not an option.  It is a standard feature.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-28 Thread Blars Blarson
In article <[EMAIL PROTECTED]> 
[EMAIL PROTECTED] writes:
>Can i also repoduce this on a sarge machine / what are the requirements
>to get this happen. only the kernel or more?
>is a chroot with sid/etch inside enough?
>What can i do to help the porters with this issue.?
>
>I know that rene is doing some OOo builds here, but i am didn't heard
>the last time that maurice has dumped.
>
>Daniel

Part of what needs to be done is to determine exactly under what
circumstances it happens.  I think this happened to me under 2.6.8, so it
may apply to sarge.  It may only happen on SMP boxes or SMP kernels.

It does not happen predictably, openoffice builds fine sometimes.

A guess is it may be some kernel code that needs a lock around it.

-- 
Blars Blarson   [EMAIL PROTECTED]
http://www.blars.org/blars.html
With Microsoft, failure is not an option.  It is a standard feature.





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread William Herrin
I hadn't seen that, thanks. That sunsolve issue is kind of light on the
details, but it implies a random drift of a few seconds. The symptoms I
see are a sudden forward jump of exactly 3d 6:11:15. Every time it
happened I saw exactly the same jump, after which some programs continue
running while others hang.

I had a limited ability to track it. The first hang point while trying
to log in following the time jump was when running "id -u" in
/etc/profile. I determined this by bracketing each section of
/etc/profile with echos.

I tried leaving a terminal live and logged in once, but "ps auxww" hung
as well. Bind, klog and syslog at least continue running. I suppose the
next trick would be to wait for a hang and run them with strace to see
which call hangs, but at that point I tried going back to kernel
2.6.8.2. It's been running smoothly since. Whatever the problem is, it
doesn't seem to affect 2.6.8.2.

Regards,
Bill Herrin

On 12/27/05, Alexander Zangerl <[EMAIL PROTECTED]> wrote:
> On Tue, 27 Dec 2005 13:04:50 EST, William Herrin writes:
> >On a Sparc netra X1, the system partially freezes (some stuff continues
> >running but at least one of the operations necessary to log in gets stuck).
> >The logs show that as of the moment of the freeze, the clock has jumped
> >forward exactly 3 days, 6 hours,
> >11 minutes and 15 seconds. The change is not gradual; it jumps between
> >syslog marks set a minute apart.
>
> i'm pretty sure you've seen
> http://sunsolve.sun.com/searchproxy/document.do?assetkey=1-26-49016-1
> where sun says "no workaround"...apparently it's a chip issue that needs
> to be worked around in the kernel. bah.

-- 
William D. Herrin  [EMAIL PROTECTED]  [EMAIL PROTECTED]
3005 Crane Dr.     Web:
Falls Church, VA 22042-3004



Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread Alexander Zangerl
On Tue, 27 Dec 2005 13:04:50 EST, William Herrin writes:
>On a Sparc netra X1, the system partially freezes (some stuff continues
>running but at least one of the operations necessary to log in gets stuck).
>The logs show that as of the moment of the freeze, the clock has jumped
>forward exactly 3 days, 6 hours,
>11 minutes and 15 seconds. The change is not gradual; it jumps between
>syslog marks set a minute apart.

i'm pretty sure you've seen
http://sunsolve.sun.com/searchproxy/document.do?assetkey=1-26-49016-1
where sun says "no workaround"...apparently it's a chip issue that needs
to be worked around in the kernel. bah.

az


-- 
+ Alexander Zangerl + DSA 42BD645D + (RSA 5B586291)
"Now that I think of it, O'Reilly is to a system administrator as a shoulder
length latex glove is to a veterinarian." -- Peter da Silva




Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread Admar Schoonen
On Tue, Dec 27, 2005 at 01:04:50PM -0500, William Herrin wrote:
> On 12/27/05, Josip Rodin <[EMAIL PROTECTED]> wrote:
> 
> > What is this kernel problem, how does it manifest? Which branch, 2.4 or
> > 2.6?
> >
> 
> I don't know if this is the same problem you're looking for, but here are
> the symptoms I've seen for kernels 2.6.11 and 2.6.12:
> 
> On a Sparc netra X1, the system partially freezes (some stuff continues
> running but at least one of the operations necessary to log in gets stuck).
> The logs show that as of the moment of the freeze, the clock has jumped
> forward exactly 3 days, 6 hours,
> 11 minutes and 15 seconds. The change is not gradual; it jumps between
> syslog marks set a minute apart.

The same problem occurs occasionally on my Blade 100 (same CPU) with a
custom-built 2.6.9. When the time is jumping 3 days and 6 hours, some things
still work, but usually things which involve a shell are so incredibly slow
that I give up hope and just reset the machine.

I have to admit that it's been a long while since I had the strange time jump.
For some reason, the system now seems to just freeze every now and again. I
don't know if all those freezes are related to the strange time jump; sometimes
I can't even move the mouse pointer and I don't find anything in syslog.

I don't know what the situation is with 2.6.8 or with 2.6.10 or newer. I don't
use 2.6.11 and newer since those have problems with the framebuffer / Xorg
(2.6.14 from Debian fixes a lot, but there are still some issues).

This e-mail might sound a bit like a complaint, but I really do appreciate all
the hard work you are doing.

Best regards,

Admar





Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread Daniel J. Priem
Can I also reproduce this on a sarge machine? What are the requirements
to make it happen: only the kernel, or more?
Is a chroot with sid/etch inside enough?
What can I do to help the porters with this issue?

I know that rene is doing some OOo builds here, but I haven't heard
lately that maurice has dumped.

Daniel


> To reproduce, run your system under heavy load, like building
> openoffice.org, gcc, or the kernel.  I run two builds at a time.  This
> may only happen on multi-processor machines.  (Auric is dual
> processor, I don't know about the new sparc buildds.)
> 
> 
> 
> -- 
> Blars Blarson [EMAIL PROTECTED]
>   http://www.blars.org/blars.html
> With Microsoft, failure is not an option.  It is a standard feature.
> 
> 




Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread William Herrin
On 12/27/05, Josip Rodin <[EMAIL PROTECTED]> wrote:
> What is this kernel problem, how does it manifest? Which branch, 2.4 or 2.6?

I don't know if this is the same problem you're looking for, but here
are the symptoms I've seen for kernels 2.6.11 and 2.6.12:

On a Sparc netra X1, the system partially freezes (some stuff continues
running but at least one of the operations necessary to log in gets
stuck). The logs show that as of the moment of the freeze, the clock
has jumped forward exactly 3 days, 6 hours,
11 minutes and 15 seconds. The change is not gradual; it jumps between syslog marks set a minute apart. 
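
For reference, the reported jump can be converted to seconds and compared
against a power of two. The sketch below is plain arithmetic; the comparison
with 2**48 nanoseconds is only a numerical observation, and nothing here
establishes that a nanosecond counter is involved.

```python
from datetime import timedelta

# The jump as reported in the logs.
jump = timedelta(days=3, hours=6, minutes=11, seconds=15)
jump_seconds = int(jump.total_seconds())
print("jump in seconds:", jump_seconds)

# For comparison only (assumption, not a diagnosis):
# 2**48 nanoseconds expressed in seconds.
two_pow_48_ns = 2 ** 48 / 1e9
print("2**48 ns in seconds:", two_pow_48_ns)
print("difference:", jump_seconds - two_pow_48_ns)
```

The jump works out to 281475 seconds, which agrees with 2**48 nanoseconds
to within the one-second resolution of the syslog timestamps.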

This happens intermittently. I can't force it to reproduce. With a
system under moderate load it will happen within 10 days. Under light
load it happens within 30 days. The problem doesn't happen under
Debian's 2.6.8.2, or at least it hasn't in the 40 days since I switched
back to it, which is longer than between any of the dozen prior
freezes.

The freeze-ups don't correlate with any particular activity on the
system. Nothing is scheduled to run, and with only Bind on the machine
there isn't much of an opportunity for it to be put under
external load.

$ cat /proc/cpuinfo
cpu : TI UltraSparc IIe (Hummingbird)
fpu : UltraSparc IIe integrated FPU
promlib : Version 3 Revision 0
prom    : 4.0.6
type    : sun4u
ncpus probed    : 1
ncpus active    : 1
Cpu0Bogo    : 794.62
Cpu0ClkTck  : 17d78400
MMU Type    : Spitfire
$ free
             total       used       free     shared    buffers     cached
Mem:        513064     408600     104464          0     101904     218800
-/+ buffers/cache:      87896     425168
Swap:            0          0          0

Regards,
Bill Herrin

-- 
William D. Herrin  [EMAIL PROTECTED]  [EMAIL PROTECTED]
3005 Crane Dr.     Web:
Falls Church, VA 22042-3004



Re: sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread Blars Blarson
In article <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes:
>What is this kernel problem, how does it manifest? Which branch, 2.4 or 2.6?

Assuming this is what I have seen on my sparc pbuilder (ultra 2, dual
300 MHz), the symptom is the system rebooting (or, less often, crashing
and needing a power cycle to reboot).  I'm running 2.6.12 (etch).
Some of my reboots are due to power problems though, since the system
is not on a UPS.  (Cheap UPSes die quickly with the frequent long
brownouts I get here.)

To reproduce, run your system under heavy load, like building
openoffice.org, gcc, or the kernel.  I run two builds at a time.  This
may only happen on multi-processor machines.  (Auric is dual
processor, I don't know about the new sparc buildds.)



-- 
Blars Blarson   [EMAIL PROTECTED]
http://www.blars.org/blars.html
With Microsoft, failure is not an option.  It is a standard feature.





sparc buildd issues [Re: Release candidate architecture requalification results; amd64 is RC]

2005-12-27 Thread Josip Rodin
On Mon, Dec 26, 2005 at 04:14:35AM -0800, Steve Langasek wrote:
> Sun SPARC
> -
> Initially, it seemed like sparc ought to have the easiest time of the
> four to get requalified.  Despite misgivings earlier this year about
> upstream support, sparc is generally doing well; since the sarge
> release, two new sparc buildds have been brought on-line.
> Unfortunately, in the same period two buildds have gone *off-line*,
> including vore.debian.org which was the designated porter machine; and
> conversations with DSA revealed the presence of persistent kernel
> problems affecting the stability of the remaining buildds.  It is taking
> some time to pin down this problem since it only shows up under load,
> but the porters are working their way through the bug step by step.  In
> the meantime, even if this bug might not affect all sparc hardware,
> having it affect all of our build daemons is certainly a showstopper, so 
> sparc will be removed from consideration until we no longer have to
> worry about OpenOffice builds (or other intensive package builds)
> crashing the buildd machines.

I have a few Ultra5 machines available, as well as sufficient memory, disk
and bandwidth to support expansion. I could probably also get a 280R, and in
the future possibly a V240 or such.

What is this kernel problem, how does it manifest? Which branch, 2.4 or 2.6?

-- 
 2. That which causes joy or happiness.

