Seeing system-lockups on recent current

2003-10-10 Thread Garance A Drosihn
For the past week or so, I have been having a frustrating time
with my freebsd-current/i386 system.  It is a dual Athlon
system.  It has been running -current just fine since December,
with me updating the OS every week or two.  I did not update it
for most of September, and then went to update it to pick up
the recent round of security-related fixes.

My first update run picked up a change which caused system
panics.  Other people were also seeing that panic, and it
wasn't long before updates were committed to current to fix
that problem.  However, ever since then my -current system
has very frequently locked up.  Totally locked.  The only way
to get it back is a hardware reset.

I have rebuilt the system at least a dozen times since then.
I have built it with snapshots of /usr/src from Sept 12th
to Oct 8th (which is what it's running at the moment).  I
have dropped back to a single-CPU kernel.  I turned off X
(in /etc/ttys) so that it doesn't start up at all.  None of those
attempts to get a reliable 5.x system has worked.
Sometimes the system will crash in the middle of a buildworld,
other times it will crash while it's basically idle and the
monitor is turned off.  One time it crashed in the middle of
an installworld -- right when it was replacing /lib files.
Boy was that a headache to recover from!

On the same PC, in a different DOS partition, is a 4.x-stable
system.  If I boot into 4.x, I have no problems.  I fire up
all the servers that I run, start buildworlds, run cvsup's,
and even had all the 5.x partitions mounted and was running
an infinite loop that MD5'd every file in the 5.x system.  I
had all of that going on at the same time, and the system was
fine.  While in the 4.x system, I've removed /usr/src on the
5.x system and recreated it, just in case there were some
files corrupted in there.  And once the problems started, I
made a point of always removing all of /usr/obj/usr/src
before starting the buildworld, in case there were corrupted
files in there.
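
A checksum loop like the one described above can be sketched in Python. This is a hypothetical reconstruction, not the script actually used; the function names are illustrative. It assumes nothing is writing to the tree during the run, so any file whose MD5 changes between passes suggests unreliable reads (RAM, disk controller, cabling) rather than real file changes. On a tree smaller than RAM, later passes may be served from the buffer cache instead of the disk.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def checksum_tree(root: str) -> Dict[str, str]:
    """Return {path: MD5 hex digest} for every regular file under root."""
    sums: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, device nodes, FIFOs and sockets; only hash plain files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.md5()
            try:
                with open(path, "rb") as f:
                    # Read in 64 KiB chunks so large files need not fit in memory.
                    for chunk in iter(lambda: f.read(1 << 16), b""):
                        digest.update(chunk)
            except OSError:
                continue  # unreadable file: skip it rather than abort the pass
            sums[path] = digest.hexdigest()
    return sums


def compare_passes(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Paths present in both passes whose checksum changed in between."""
    return sorted(p for p in before.keys() & after.keys() if before[p] != after[p])


def stress_check(root: str, passes: int) -> List[Tuple[int, str]]:
    """Re-read the whole tree `passes` times, comparing each pass to the first.

    On a tree nobody is writing to, every returned (pass, path) pair is a
    file that read back differently, which suggests flaky hardware.
    """
    baseline = checksum_tree(root)
    mismatches: List[Tuple[int, str]] = []
    for n in range(1, passes + 1):
        for path in compare_passes(baseline, checksum_tree(root)):
            mismatches.append((n, path))
    return mismatches
```

For example, `stress_check("/mnt/current", passes=1000)` (the mount point is hypothetical) would re-read a mounted 5.x tree a thousand times while other load runs.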

I still have a few things I want to try.  And I know it could
still be a hardware problem (although it bugs me that it fails
so consistently on 5.x and never fails on 4.x).  Perhaps it
is just some disk-corruption problem that occurred during the
first few panics.  But I thought I'd at least mention it, and
see if anyone else has been having similar problems.
--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Seeing system-lockups on recent current

2003-10-10 Thread Doug White
On Fri, 10 Oct 2003, Garance A Drosihn wrote:

> For the past week or so, I have been having a frustrating time
> with my freebsd-current/i386 system.  It is a dual Athlon
> system.  It has been running -current just fine since December,
> with me updating the OS every week or two.  I did not update it
> for most of September, and then went to update it to pick up
> the recent round of security-related fixes.

It would be useful to isolate exactly what day the problem started
occurring. That would simplify isolating the offending commit.  Use the
date specifier in cvsup to check out specific dates, then build & test.
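
When every test build can itself hang the machine, bisecting the date range keeps the number of full rebuilds near log2 of the number of days (five or six for about a month). The Python sketch below is a hypothetical helper, not part of cvsup: `is_bad` stands in for the manual check-out, build, and test cycle, and the `YYYY.MM.DD.hh.mm.ss` format assumed for the supfile `date=` field should be checked against cvsup(1). Because the lockups are intermittent, a "good" verdict only means no hang was seen, so each test run needs to be long enough to be convincing.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple


def cvsup_date(d: datetime) -> str:
    """Format a timestamp for a supfile date= field (assumed YYYY.MM.DD.hh.mm.ss)."""
    return d.strftime("%Y.%m.%d.%H.%M.%S")


def bisect_dates(
    good: datetime,
    bad: datetime,
    is_bad: Callable[[datetime], bool],
    resolution: timedelta = timedelta(days=1),
) -> Tuple[datetime, datetime, int]:
    """Binary-search between a known-good and a known-bad source date.

    is_bad(d) stands in for: check out sources as of d, rebuild world and
    kernel, stress the machine, and report whether it locked up.
    Returns (last_good, first_bad, number_of_rebuilds).
    """
    if not good < bad:
        raise ValueError("good date must precede bad date")
    rebuilds = 0
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        rebuilds += 1
        if is_bad(mid):
            bad = mid   # problem already present at mid: search the earlier half
        else:
            good = mid  # mid looked fine: search the later half
    return good, bad, rebuilds


if __name__ == "__main__":
    # Hypothetical stand-in: pretend the offending commit landed on 2003-09-20.
    culprit = datetime(2003, 9, 20)
    last_good, first_bad, n = bisect_dates(
        datetime(2003, 9, 1), datetime(2003, 10, 8), lambda d: d >= culprit
    )
    print(f"{n} rebuilds; problem introduced between "
          f"{cvsup_date(last_good)} and {cvsup_date(first_bad)}")
```

Each midpoint would be written into the supfile's `date=` field (together with `tag=.` for -current, also an assumption here) before a cvsup run and a full rebuild.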

-- 
Doug White               |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]        |  www.FreeBSD.org


Re: Seeing system-lockups on recent current

2003-10-10 Thread Garance A Drosihn
At 12:48 PM -0700 10/10/03, Doug White wrote:
> On Fri, 10 Oct 2003, Garance A Drosihn wrote:
>
> > For the past week or so, I have been having a frustrating time
> > with my freebsd-current/i386 system.  It is a dual Athlon
> > system.  It has been running -current just fine since December,
> > with me updating the OS every week or two.  I did not update it
> > for most of September, and then went to update it to pick up
> > the recent round of security-related fixes.
>
> It would be useful to isolate exactly what day the problem
> started occurring. That would simplify isolating the offending
> commit.  Use the date specifier in cvsup to check out specific
> dates, then build & test.

I've done that.  As mentioned in the message, I've done complete
system rebuilds using snapshots from about Sept 12th to Oct 8th.
The problem is that it's tedious to keep doing these rebuilds
when the very act of a buildworld or buildkernel can trigger
the system lockup.

I really am torn between thinking that it's a change in -current
and thinking it must be something about my specific system.
Depending on which set of observations I pick, I can make an
excellent case for either one being the culprit.  So, if no one
else *is* seeing this kind of problem, then it's more likely to
be my hardware (one way or another).  I'll keep trying things.
--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]


Re: Seeing system-lockups on recent current

2003-10-10 Thread Dag-Erling Smørgrav
Doug White <[EMAIL PROTECTED]> writes:
> On Fri, 10 Oct 2003, Garance A Drosihn wrote:
> > For the past week or so, I have been having a frustrating time
> > with my freebsd-current/i386 system.  It is a dual Athlon
> > system.  [...]
> It would be useful to isolate exactly what day the problem started
> occurring.

I experienced similar problems on a dual Athlon system (MSI K7D
Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+) which
is barely a couple of months old.  I ended up reverting to RELENG_5_1.
With -CURRENT, both UP and SMP kernels will crash with symptoms which
suggest hardware trouble.  With RELENG_5_1, UP is rock solid (knock on
wood) while SMP crashes within minutes of booting.  I've run out of
patience with this system, so I'll keep running RELENG_5_1 on it until
someone manages to convince me that -CURRENT will run properly on AMD
hardware (maybe around 5.3 or so...)

Now, my shiny new 2.4 GHz P4, on the other hand...  *drool*

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: Seeing system-lockups on recent current

2003-10-10 Thread Don Lewis
On 10 Oct, Dag-Erling Smørgrav wrote:
> Doug White <[EMAIL PROTECTED]> writes:
>> On Fri, 10 Oct 2003, Garance A Drosihn wrote:
>> > For the past week or so, I have been having a frustrating time
>> > with my freebsd-current/i386 system.  It is a dual Athlon
>> > system.  [...]
>> It would be useful to isolate exactly what day the problem started
>> occurring.
> 
> I experienced similar problems on a dual Athlon system (MSI K7D
> Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+) which
> is barely a couple of months old.  I ended up reverting to RELENG_5_1.
> With -CURRENT, both UP and SMP kernels will crash with symptoms which
> suggest hardware trouble.  With RELENG_5_1, UP is rock solid (knock on
> wood) while SMP crashes within minutes of booting.  I've run out of
> patience with this system, so I'll keep running RELENG_5_1 on it until
> someone manages to convince me that -CURRENT will run properly on AMD
> hardware (maybe around 5.3 or so...)

My Athlon XP 1900+/AMD 761 UP box is happily running a late October 6th
version of -current.


Re: Seeing system-lockups on recent current

2003-10-11 Thread Dag-Erling Smørgrav
Don Lewis <[EMAIL PROTECTED]> writes:
> My Athlon XP 1900+/AMD 761 UP box is happily running a late October 6th
> version of -current.

XP != MP

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: Seeing system-lockups on recent current

2003-10-11 Thread Urmas Lett

On Fri, 10 Oct 2003, Dag-Erling Smørgrav wrote:

> I experienced similar problems on a dual Athlon system (MSI K7D
> Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+) which
> is barely a couple of months old.  I ended up reverting to RELENG_5_1.

Same here: MSI K7D Master-L motherboard. With -CURRENT and an MP kernel
there is no way to get through a buildworld without a panic; even
buildkernel exits with random signals. With MP 4-STABLE, MP DragonFly and
WinXP, the same machine is rock stable.


Re: Seeing system-lockups on recent current

2003-10-12 Thread Cy Schubert
I'm seeing similar lockups; however, they started shortly after the new ATA
code was committed. The lockups usually occur when there's a lot of ATA
activity, e.g. heavy filesystem use or fsck. At the moment I can only guess
what the problem might be (a missing interrupt is my most educated guess),
but keeping the amount of ATA I/O to a minimum does help the situation.
Both machines which have suffered the problem have Intel chipsets. One is a
12 year old P120 (I cannot recall the exact chipset) and the other is a
PIII with an 815E chipset. On a couple of occasions I had systat running
and noticed that buffers in use climbed until the system just froze,
responding only to pings. In all cases the filesystems were generally
"clean", just with the dirty bit set, except for the filesystem on an ATA
drive (/var or /export), which required considerable cleanup. Filesystems
that reside on SCSI devices have yet to exhibit any symptoms, i.e. none has
required anything more than resetting the dirty bit.

Because of this problem I have yet to complete a portupgrade, which I've
been trying to finish for the last four weeks; it usually hangs the
system within 12 hours.


Cheers,
--
Cy Schubert <[EMAIL PROTECTED]>     http://www.komquats.com/
BC Government                       .   FreeBSD UNIX
[EMAIL PROTECTED]                   .   [EMAIL PROTECTED]
http://www.gov.bc.ca/               .   http://www.FreeBSD.org/

In message <[EMAIL PROTECTED]>, Garance A Drosihn 
writes:
> For the past week or so, I have been having a frustrating time
> with my freebsd-current/i386 system.  It is a dual Athlon
> system.  It has been running -current just fine since December,
> with me updating the OS every week or two.  I did not update it
> for most of September, and then went to update it to pick up
> the recent round of security-related fixes.
> [...]


Re: Seeing system-lockups on recent current

2003-10-17 Thread Garance A Drosihn
At 11:52 PM +0200 10/10/03, Dag-Erling Smørgrav wrote:
> Doug White <[EMAIL PROTECTED]> writes:
> > On Fri, 10 Oct 2003, Garance A Drosihn wrote:
> > > For the past week or so, I have been having a frustrating
> > > time with my freebsd-current/i386 system.  It is a dual
> > > Athlon system.  [...]
> > It would be useful to isolate exactly what day the problem
> > started occurring.
>
> I experienced similar problems on a dual Athlon system (MSI K7D
> Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+)
> which is barely a couple of months old.  I ended up reverting
> to RELENG_5_1.
> With -CURRENT, both UP and SMP kernels will crash with symptoms
> which suggest hardware trouble.  With RELENG_5_1, UP is rock
> solid (knock on wood) while SMP crashes within minutes of booting.

Just to follow up on this...

My symptoms were different, in that I have problems with both
UP and SMP (although UP did seem more stable).  I also tried a
clean install of 5.1-RELEASE (right off the CDs), and that
would also hang up.  Since I *know* this machine had been
running fine back at the time of 5.1-RELEASE, this was pretty
significant.

I took the PC back to the place I got it from, and they ran
some kind of diagnostics on it and said the motherboard is
bad.  They're replacing the motherboard.  So, unless I have
something more to say when I get that back, it looks pretty
likely that my headaches were hardware-related.  (My machine
also has different components than des's machine.)
--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]