Re: system hangup - I'm lost

2008-10-03 Thread Oliver Lehmann
cpghost wrote:


 If it's PATA, check the cabling, then check it again, and just to
 make sure, replace the cable even if the system used to work flawlessly
 in the past. I've had this on a few servers, but replacing the cables
 always fixed the problem for me.

It's SATA - it's a 3ware 9500S-4LP controller. I can only hope it would have
detected any drive problem (even one caused by bad cabling). If not, I don't
know why I have a RAID controller anyway ;)
The only other disk drive I have on that system is a USB-attached HDD for
backup purposes... So I can't really try putting the swap somewhere else...

//nudel /c0 show all
/c0 Driver Version = 3.60.04.003
/c0 Model = 9500S-4LP
/c0 Available Memory = 112MB
/c0 Firmware Version = FE9X 2.08.00.009
/c0 Bios Version = BE9X 2.03.01.052
/c0 Boot Loader Version = BL9X 2.02.00.001
/c0 Serial Number = D19004A5300589
/c0 PCB Version = Rev 019
/c0 PCHIP Version = 1.50
/c0 ACHIP Version = 3.20
/c0 Number of Ports = 4
/c0 Number of Drives = 4
/c0 Number of Units = 1
/c0 Total Optimal Units = 1
/c0 Not Optimal Units = 0 
/c0 JBOD Export Policy = off
/c0 Disk Spinup Policy = 1
/c0 Spinup Stagger Time Policy (sec) = 2
/c0 Cache on Degrade Policy = Follow Unit Policy

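Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       64K     698.461   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     232.88 GB   488397168     WD-WCANK1079272
p1     OK               u0     232.88 GB   488397168     WD-WCANK1120378
p2     OK               u0     232.88 GB   488397168     WD-WCANK1120936
p3     OK               u0     232.88 GB   488397168     WD-WCANK1120805

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    24-Aug-2008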

//nudel 
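Since the argument here is that the controller would have flagged a bad drive, one way to keep an eye on that is to check the port table automatically. Below is a minimal Python sketch that scans the text of a "/c0 show all" listing (as printed by 3ware's tw_cli, which is what the //nudel prompt appears to be) and reports every port whose Status column is not OK. It assumes the row layout shown above (port name first, status second); the function names are hypothetical, and the layout may differ between tw_cli versions. An all-OK listing only means the controller has not flagged a drive; it does not rule out a hung controller.

```python
import re
from typing import Dict

# Port rows look like: "p0  OK  u0  232.88 GB  488397168  WD-WCANK1079272"
PORT_NAME = re.compile(r"p\d+")


def port_status(show_all: str) -> Dict[str, str]:
    """Map each port (p0, p1, ...) to its Status column.

    Assumes the 'Port Status Unit Size Blocks Serial' table layout:
    the port name is the first field and the status is the second.
    """
    status = {}
    for line in show_all.splitlines():
        fields = line.split()
        if len(fields) >= 2 and PORT_NAME.fullmatch(fields[0]):
            status[fields[0]] = fields[1]
    return status


def ports_not_ok(show_all: str) -> Dict[str, str]:
    """Return only the ports whose status is anything other than OK."""
    return {port: st for port, st in port_status(show_all).items() if st != "OK"}
```

Feeding it the captured output (for example from a periodic cron job) and alerting whenever the result is non-empty would be one way to use it.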

-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: system hangup - I'm lost

2008-10-02 Thread Oliver Fromme
Jeremy Chadwick wrote:
  - Maxim MAX211ECA1, no idea but doesn't interest me

Just for completeness, this is a serial port driver IC.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich Registry Court, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Munich Registry
Court, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

[...]  one observation we can make here is that Python makes
an excellent pseudocoding language, with the wonderful attribute
that it can actually be executed.  --  Bruce Eckel


Re: system hangup - I'm lost

2008-10-02 Thread Oliver Lehmann
Oliver Lehmann wrote:

 Hi,
 
 today I had a crash again - I was not able to get a crash dump (I thought a
 panic at the end of the kdb session would do it, but it didn't - I should have
 run dumpon beforehand ;)) - so here is the information I was able to retrieve:
 
 OK, so far all I've got is this output, written to the console when the
 system hangs up:
 
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 ...
 
 and now the debugger stuff:

 [snipped]


So... no ideas? Anyone?

-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/


Re: system hangup - I'm lost

2008-10-02 Thread John Baldwin
On Wednesday 01 October 2008 11:29:43 am Oliver Lehmann wrote:
 Hi,
 
 today I had a crash again - I was not able to get a crash dump (I thought a
 panic at the end of the kdb session would do it, but it didn't - I should have
 run dumpon beforehand ;)) - so here is the information I was able to retrieve:
 
 OK, so far all I've got is this output, written to the console when the
 system hangs up:
 
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096

Sounds like your disk has died, or perhaps the controller is hung and not 
completing disk I/O requests anymore.

-- 
John Baldwin


Re: system hangup - I'm lost

2008-10-02 Thread Oliver Lehmann
John Baldwin wrote:

 Sounds like your disk has died, or perhaps the controller is hung and not 
 completing disk I/O requests anymore.

Hm - the 3ware event log does not shed any light on this - no events
occurred. So I can only guess that the controller and the disks are fine
(I once had a hard-failing disk and the controller detected it correctly).

Do you have an idea how to debug this further?

-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/


Re: system hangup - I'm lost

2008-10-02 Thread cpghost
On Thu, Oct 02, 2008 at 06:51:06PM +0200, Oliver Lehmann wrote:
  today I had a crash again - I was not able to get a crash dump (I thought a
  panic at the end of the kdb session would do it, but it didn't - I should have
  run dumpon beforehand ;)) - so here is the information I was able to retrieve:
  
  OK, so far all I've got is this output, written to the console when the
  system hangs up:
  
  swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
  swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
  swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
  swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
  ...
  
  and now the debugger stuff:
 
  [snipped]
 
 
 So... no ideas? Anyone?

If it's PATA, check the cabling, then check it again, and just to
make sure, replace the cable even if the system used to work flawlessly
in the past. I've had this on a few servers, but replacing the cables
always fixed the problem for me.

Oh, btw, you can reproduce this exact behavior on diskless workstations
with an NFS-mounted swap.

IIRC, it even happened on VERY slow hardware with GBDE or GELI-encrypted
swap partitions; but I'm not 100% sure it was due to slowness (it could
have been a bad cabling issue as well).

 -- 
  Oliver Lehmann
   http://www.pofo.de/
   http://wishlist.ans-netz.de/

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: system hangup - I'm lost

2008-10-01 Thread Jim Pingle
Jeremy Chadwick wrote:
 P.S. -- You're the 2nd person I've encountered in under a week who's
 using 440BX/GX-based hardware in present day.  I would not be
 surprised if the board is simply going bad/failing due to age.  :-)

I still have quite a few of these in active use. They are good
workhorses. Sure, they don't have the raw computing power of newer
servers, but for most of our tasks they get the job done. I also have a
couple stacks of these in 2U cases sitting unused for spare parts and
testing.

They make great FreeBSD boxes, and handle low-moderate loads pretty
well. We use them for all kinds of things: firewalls, personal/testing
servers, SVN repos, monitoring and traffic graphing, name servers, you
name it.

To bring this back on topic, they might be old, but I have yet to
encounter one single motherboard from that series that has failed on me
in any way. (*knock on wood*) However, mine are all Intel L440GX boards
with dual PIII CPUs in the 600-800MHz range.

We try to squeeze every last bit of value out of the hardware we have. :-)

Jim


Re: system hangup - I'm lost

2008-10-01 Thread Oliver Lehmann
Hi,

today I had a crash again - I was not able to get a crash dump (I thought a
panic at the end of the kdb session would do it, but it didn't - I should have
run dumpon beforehand ;)) - so here is the information I was able to retrieve:

OK, so far all I've got is this output, written to the console when the
system hangs up:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
...
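The four console lines above are identical. One reading, hedged because the thread never confirms it, is that the swap pager prints "indefinite wait buffer" when a page-in from swap has been waiting a long time for its I/O, so the same blkno repeating would point at one read that never completes rather than general slowness. A small Python sketch that tallies such messages by bufobj, blkno and size makes that distinction easy to see in a longer capture; it assumes the exact message format quoted here, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Message format assumed from the console output quoted above.
SWAP_WAIT = re.compile(
    r"swap_pager: indefinite wait buffer: "
    r"bufobj: (?P<bufobj>\S+), blkno: (?P<blkno>\d+), size: (?P<size>\d+)"
)


def stuck_swap_reads(console_lines: Iterable[str]) -> Counter:
    """Count 'indefinite wait buffer' messages per (bufobj, blkno, size)."""
    seen: Counter = Counter()
    for line in console_lines:
        m = SWAP_WAIT.search(line)
        if m:
            seen[(m["bufobj"], int(m["blkno"]), int(m["size"]))] += 1
    return seen
```

If every message maps to a single key, as in the excerpt above, it is the same request timing out again and again.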

and now the debugger stuff:

KDB: enter: manual escape to debugger
[thread pid 40 tid 100048 ]
Stopped at  kdb_enter+0x30: leave   
db> sh locks
exclusive sleep mutex Giant r = 0 (0xc07c73c0) locked
@ /usr/src/sys/kern/kern_intr.c:681
db> sh alllocks
Process 40 (irq1: atkbd0) thread 0xc4503a80 (100048)
exclusive sleep mutex Giant r = 0 (0xc07c73c0) locked
@ /usr/src/sys/kern/kern_intr.c:681
db> 

So there are no locks held except the one I caused myself, but anyhow:

db> bt 100048
Tracing pid 40 tid 100048 td 0xc4503a80
kdb_enter(c077aee6,4,1,0,1,...) at kdb_enter+0x30
scgetc(c0842b60,2,de391c88,c05ad0b7,c4609340,...) at scgetc+0x575
sckbdevent(c0823740,0,c0842b60,c07c73c0,8,...) at sckbdevent+0x210
atkbd_intr(c0823740,0,de391cd8,c05695b8,c0823740,...) at atkbd_intr+0xa1
atkbdintr(c0823740,0,c076448a,2a9,8,...) at atkbdintr+0x21
ithread_execute_handlers(c460cc90,c4449680,c076448a,30e,c4503a80,...) at ithread_execute_handlers+0x108
ithread_loop(c45f66c0,de391d38,c07642ea,30c,0,...) at ithread_loop+0x64
fork_exit(c05696b0,c45f66c0,de391d38) at fork_exit+0x78
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xde391d6c, ebp = 0 ---

db> sh pcpu
cpuid= 0
curthread= 0xc4503a80: pid 40 irq1: atkbd0
curpcb   = 0xde391d90
fpcurthread  = none
idlethread   = 0xc444c780: pid 11 idle: cpu0
APIC ID  = 1
currentldt   = 0x50
spin locks held:

And now the output of ps (beware, it is long; I have no idea why there are so
many cron processes - maybe cron still schedules jobs but they don't get processed?)

The output of show lockedvnods follows afterwards.

db> ps
  pid  ppid  pgrp   uid   state   wmesg     wchan        cmd
57919 57918   692 0  SV  ufs  0xc47857c8 cron
57918   692   692 0  S   ppwait   0xc6e63a78 cron
57917 57916   692 0  SV  ufs  0xc47857c8 cron
57916   692   692 0  S   ppwait   0xc6e63c90 cron
57915 57914   692 0  SV  ufs  0xc47857c8 cron
57914   692   692 0  S   ppwait   0xc6eb3000 cron
57913 57912   692 0  SV  ufs  0xc47857c8 cron
57912   692   692 0  S   ppwait   0xc70a9430 cron
57911 57908   692 0  SV  ufs  0xc47857c8 cron
57910 57907   692 0  SV  ufs  0xc47857c8 cron
57909 57906   692 0  SV  ufs  0xc47857c8 cron
57908   692   692 0  S   ppwait   0xc6eb3648 cron
57907   692   692 0  S   ppwait   0xc6eb3860 cron
57906   692   692 0  S   ppwait   0xc6eb3a78 cron
57905   686   686 25  S   ufs  0xc4953388 sendmail
57904 57902   692 0  SV  ufs  0xc47857c8 cron
57903 57901   692 0  SV  ufs  0xc47857c8 cron
57902   692   692 0  S   ppwait   0xc49a4430 cron
57901   692   692 0  S   ppwait   0xc49a4648 cron
57900 57899   692 0  SV  ufs  0xc47857c8 cron
57899   692   692 0  S   ppwait   0xc49a4860 cron
57898 57897   692 0  SV  ufs  0xc47857c8 cron
57897   692   692 0  S   ppwait   0xc49a4a78 cron
57896 57895   692 0  SV  ufs  0xc47857c8 cron
57895   692   692 0  S   ppwait   0xc49a4c90 cron
57894 57893   692 0  SV  ufs  0xc47857c8 cron
57893   692   692 0  S   ppwait   0xc6b7c648 cron
57892 57891   692 0  SV  ufs  0xc47857c8 cron
57891   692   692 0  S   ppwait   0xc66bc430 cron
57890 57889   692 0  SV  ufs  0xc47857c8 cron
57889   692   692 0  S   ppwait   0xc6b7c860 cron
57888 57887   692 0  SV  ufs  0xc47857c8 cron
57887   692   692 0  S   ppwait   0xc66bc860 cron
57886   686   686 25  S   ufs  0xc4953388 sendmail
57885 57884   692 0  SV  ufs  0xc47857c8 cron
57884   692   692 0  S   ppwait   0xc66bca78 cron
57883 57882   692 0  SV  ufs  0xc47857c8 cron
57882   692   692 0  S   ppwait   0xc66bcc90 cron
57881 57880   692 0  SV  ufs  0xc47857c8 cron
57880   692   692 0  S   ppwait   0xc6a65000 cron
57879 57878   692 0  SV  ufs  0xc47857c8 cron
57878   692   692 0  S   ppwait   0xc6a65218 cron
57877 57876   692 0  SV  ufs  0xc47857c8 cron
57876   692   692 0  S   ppwait   0xc6a65430 cron
57875 57874   692 0  SV  ufs  0xc47857c8 cron
57874   692   692 0  S   ppwait   0xc6a65648 cron
57873 57872   692 0  SV  ufs  0xc47857c8 cron
57872   692   692 0  S   ppwait   
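Most of the listing above is pairs of cron processes: a parent in ppwait and a child sleeping in "ufs" on the same wait channel, 0xc47857c8. One plausible reading, not confirmed in the thread, is that every new cron job blocks on the same UFS vnode lock and they simply pile up; show lockedvnods, as Robert Watson suggests, would then show which thread holds it. A small Python sketch that tallies a captured ps listing by (wmesg, wchan) makes such a pile-up easy to spot in long output. It assumes the DDB ps column order shown above; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sleepers_by_wchan(ps_lines: Iterable[str]) -> List[Tuple[Tuple[str, str], int]]:
    """Count processes per (wmesg, wchan), most common first.

    Assumes the column order: pid ppid pgrp uid state wmesg wchan cmd.
    The header, threads without a wait channel and truncated lines
    (fewer than eight fields) are skipped.
    """
    counts: Counter = Counter()
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        wmesg, wchan = fields[5], fields[6]
        counts[(wmesg, wchan)] += 1
    return counts.most_common()
```

Run over the full listing, a single (ufs, 0xc47857c8) entry with a large count at the top would support the one-lock reading.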

Re: system hangup - I'm lost

2008-10-01 Thread Oliver Lehmann
Jeremy Chadwick wrote:

 On Wed, Oct 01, 2008 at 06:53:09AM +0200, Oliver Lehmann wrote:

  Because it is a Server Board it offers a lot of managing features and
  other nice things like serial console at bootup and system monitoring
  features... but all unsupported by FreeBSD's software ;)
 
 Really?  That's interesting, because Charles Sprickman told me that
 there is no hardware monitoring information in the BIOS if you go in
 there.  Most motherboards provide that in the BIOS as a centralised
 place above all else.

You are right - I could have sworn that there was such a screen in the
BIOS, but all I can find are settings for things like enabling the event log
and posting it through a modem connection and so on (server-specific stuff),
but no screen displaying health information...
So you were right ;)

-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/


Re: system hangup - I'm lost

2008-09-30 Thread Gavin Atkinson
On Mon, 2008-09-29 at 22:14 +0200, Oliver Lehmann wrote:

   Any idea what I could do to shed some more light on this behaviour?
   Why it is happening and what really is causing it?
   Would enabling the kernel debugger really help here? I mean the system
   is really hanging - apart from answering pings it does not respond to
   anything except the reset switch ;)

If it's responding to ping, you should be able to get into the debugger.
Compile it in, along with options WITNESS and options
WITNESS_SKIPSPIN, and press ctrl-alt-escape when the machine next
hangs.

From there, it should hopefully be possible to get more info.  It's been
a long time since I've used the debugger under 6.x so some of the more
useful commands may not exist, but you should at least capture the output
of sh locks, sh alllocks and bt on any processes that seem to be holding locks.
Also sh pcpu and ps will help to determine exactly what was running
at the time.

Gavin


Re: system hangup - I'm lost

2008-09-30 Thread Bartosz Stec

Oliver Lehmann wrote:

Hi,

My file server has sporadic hangups running 6.3:

FreeBSD 6.3-STABLE #0: Thu Jun 19 00:21:00 CEST 2008
[EMAIL PROTECTED]:/usr/obj/i386-pentium3-6.3/usr/src/sys/NUDEL

The exact release doesn't matter since it happened before as well. It always
happens after the system has been under some load for a while (I'm building
ports with tinderbox and during the build process it just hangs up).

The system doesn't write anything to the console, neither to the CRT nor
to the serial console.

The system itself is:

CPU: Intel Pentium III (845.64-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x683  Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 805240832 (767 MB)
avail memory = 778481664 (742 MB)
ACPI APIC Table: <Intel  N440BX  >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  1
 cpu1 (AP): APIC ID:  0
ioapic0 Version 1.1 irqs 0-23 on motherboard

while the disk space is provided by a 3ware RAID:

twa0: <3ware 9000 series Storage Controller> port 0x2400-0x24ff mem 
0xf4101000-0xf41010ff,0xf480-0xf4ff irq 18 at device 11.0 on pci0
twa0: INFO: (0x04: 0x0053): Battery capacity test is overdue: 
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-4LP, 4 ports, Firmware FE9X 2.08.00.009, BIOS BE9X 2.03.01.052


da0 at twa0 bus 0 target 0 lun 0
da0: <AMCC 9500S-4LP  DISK 2.08> Fixed Direct Access SCSI-3 device 
da0: 100.000MB/s transfers

da0: 715224MB (1464778752 512 byte sectors: 255H 63S/T 91178C)
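The dmesg size above and the tw_cli listing earlier in the thread can be cross-checked with a little arithmetic. The Python sketch below converts 512-byte sector counts into binary units and compares the unit size with a naive RAID-5 estimate of (drives - 1) times one drive. It suggests that the "GB" in the tw_cli output are binary gigabytes (GiB), and that the unit is 412752 sectors (about 201.5 MiB) smaller than the naive estimate; where that space goes is not discussed in the thread, so treat "controller overhead" as a guess. The helper names are hypothetical.

```python
SECTOR_BYTES = 512
MIB = 1024 ** 2
GIB = 1024 ** 3


def sectors_to_mib(sectors: int) -> float:
    return sectors * SECTOR_BYTES / MIB


def sectors_to_gib(sectors: int) -> float:
    return sectors * SECTOR_BYTES / GIB


def raid5_data_sectors(drive_sectors: int, drives: int) -> int:
    """Naive RAID-5 capacity: one drive's worth of space goes to parity."""
    return (drives - 1) * drive_sectors


drive = 488397168  # "Blocks" per port in the tw_cli output
unit = 1464778752  # da0 size in the dmesg output
print(f"per drive: {sectors_to_gib(drive):.3f} GiB")
print(f"da0: {sectors_to_mib(unit):.0f} MiB = {sectors_to_gib(unit):.3f} GiB")
print(f"naive RAID-5 size minus da0: {raid5_data_sectors(drive, 4) - unit} sectors")
```

This prints 232.886 GiB per drive and 715224 MiB = 698.461 GiB for da0, matching both reports.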

In the past I sometimes found messages which indicated that the system
was not able to allocate swap space fast enough, if I recall correctly
(_not_ out of swap space!), but the RAID is kinda fast imho.

  Any idea what I could do to shed some more light on this behaviour?
  Why it is happening and what really is causing it?
  Would enabling the kernel debugger really help here? I mean the system
  is really hanging - apart from answering pings it does not respond to
  anything except the reset switch ;)

   Greetings, Oliver


  
Personally I'd rather bet on some hardware problem (overheating?). Try to 
install mbmon from ports. I also had similar problems with old 
motherboards with swollen capacitors.


--
Bartosz Stec



Re: system hangup - I'm lost

2008-09-30 Thread Robert Watson

On Tue, 30 Sep 2008, Gavin Atkinson wrote:


On Mon, 2008-09-29 at 22:14 +0200, Oliver Lehmann wrote:


  Any idea what I could do to shed some more light on this behaviour?
  Why it is happening and what really is causing it?
  Would enabling the kernel debugger really help here? I mean the system
  is really hanging - apart from answering pings it does not respond to
  anything except the reset switch ;)


If it's responding to ping, you should be able to get into the debugger. 
Compile it in, along with options WITNESS and options WITNESS_SKIPSPIN, 
and press ctrl-alt-escape when the machine next hangs.


From there, it should hopefully be possible to get more info.  It's been a 
long time since I've used the debugger under 6.x so some of the more useful 
commands may not exist, but you should at least capture the output of sh locks, 
sh alllocks and bt on any processes that seem to be holding locks. Also sh pcpu and 
ps will help to determine exactly what was running at the time.


show lockedvnods is also quite useful if the problem originates in the file 
system, as it lists vnodes that have been locked, and by which threads.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: system hangup - I'm lost

2008-09-30 Thread Jeremy Chadwick
On Tue, Sep 30, 2008 at 12:39:27PM +0200, Bartosz Stec wrote:
 Oliver Lehmann wrote:
 Hi,

 My file server has sporadic hangups running 6.3:

 FreeBSD 6.3-STABLE #0: Thu Jun 19 00:21:00 CEST 2008
 [EMAIL PROTECTED]:/usr/obj/i386-pentium3-6.3/usr/src/sys/NUDEL

 The exact release doesn't matter since it happened before as well. It always
 happens after the system has been under some load for a while (I'm building
 ports with tinderbox and during the build process it just hangs up).

 The system doesn't write anything to the console, neither to the CRT nor
 to the serial console.

 The system itself is:

 CPU: Intel Pentium III (845.64-MHz 686-class CPU)
   Origin = GenuineIntel  Id = 0x683  Stepping = 3
   Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
 real memory  = 805240832 (767 MB)
 avail memory = 778481664 (742 MB)
 ACPI APIC Table: <Intel  N440BX  >
 FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
  cpu0 (BSP): APIC ID:  1
  cpu1 (AP): APIC ID:  0
 ioapic0 Version 1.1 irqs 0-23 on motherboard

 while the disk space is provided by a 3ware RAID:

 twa0: <3ware 9000 series Storage Controller> port 0x2400-0x24ff mem 
 0xf4101000-0xf41010ff,0xf480-0xf4ff irq 18 at device 11.0 on pci0
 twa0: INFO: (0x04: 0x0053): Battery capacity test is overdue: 
 twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-4LP, 4 ports, 
 Firmware FE9X 2.08.00.009, BIOS BE9X 2.03.01.052

 da0 at twa0 bus 0 target 0 lun 0
 da0: <AMCC 9500S-4LP  DISK 2.08> Fixed Direct Access SCSI-3 device  
 da0: 100.000MB/s transfers
 da0: 715224MB (1464778752 512 byte sectors: 255H 63S/T 91178C)

 In the past I sometimes found messages which indicated that the system
 was not able to allocate swap space fast enough, if I recall correctly
 (_not_ out of swap space!), but the RAID is kinda fast imho.

   Any idea what I could do to shed some more light on this behaviour?
   Why it is happening and what really is causing it?
   Would enabling the kernel debugger really help here? I mean the system
   is really hanging - apart from answering pings it does not respond to
   anything except the reset switch ;)

    Greetings, Oliver


   
 Personally I'd rather bet on some hardware problem (overheating?). Try to  
 install mbmon from ports. I also had similar problems with old  
 motherboards with swollen capacitors.

Be careful with mbmon and healthd -- just because they compile and run
does not mean they're working properly (the values shown may be
completely unreliable/incorrect).

It's best to check such things in the system BIOS, unless you have
absolute certainty that your motherboard is supported by mbmon/healthd.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: system hangup - I'm lost

2008-09-30 Thread John Baldwin
On Monday 29 September 2008 04:14:08 pm Oliver Lehmann wrote:
 Hi,
 
 My file server has sporadic hangups running 6.3:
 
 FreeBSD 6.3-STABLE #0: Thu Jun 19 00:21:00 CEST 2008
 [EMAIL PROTECTED]:/usr/obj/i386-pentium3-6.3/usr/src/sys/NUDEL
 
 The exact release doesn't matter since it happened before as well. It always
 happens after the system has been under some load for a while (I'm building
 ports with tinderbox and during the build process it just hangs up).
 
 The system doesn't write anything to the console, neither to the CRT nor
 to the serial console.

1) Set up support for crash dumps.
2) Add 'options DDB' and 'options KDB' to your kernel config.  When it hangs,
break into the debugger (CTRL+ALT+ESC) and run 'panic' to generate a crash dump.
3) Run: ps -axl -M /var/crash/vmcore.X -N /boot/kernel/kernel

(where vmcore.X is the core file generated, probably vmcore.0).  That's the 
first place to start.
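Step 3 needs the newest vmcore.X. A hedged Python sketch of that step: it picks the highest-numbered vmcore.N in /var/crash (comparing numerically, so vmcore.10 wins over vmcore.9) and runs the ps command from step 3 against it. It assumes the vmcore.N naming described above; the helper names are hypothetical, and the -N kernel should be the one that was running when the machine crashed.

```python
import re
import subprocess
from pathlib import Path
from typing import List, Optional

VMCORE = re.compile(r"vmcore\.(\d+)")


def newest_vmcore(crash_dir: str = "/var/crash") -> Optional[Path]:
    """Return the vmcore.N with the largest N, or None if there is none."""
    best: Optional[Path] = None
    best_n = -1
    for path in Path(crash_dir).glob("vmcore.*"):
        m = VMCORE.fullmatch(path.name)
        if m and int(m.group(1)) > best_n:
            best_n, best = int(m.group(1)), path
    return best


def ps_on_core(core: Path, kernel: str = "/boot/kernel/kernel") -> List[str]:
    """Argument vector for step 3: ps run against the crash dump."""
    return ["ps", "-axl", "-M", str(core), "-N", kernel]


def show_processes(crash_dir: str = "/var/crash") -> None:
    """Run ps against the newest dump; raises if no dump is present."""
    core = newest_vmcore(crash_dir)
    if core is None:
        raise FileNotFoundError(f"no vmcore.N in {crash_dir}")
    subprocess.run(ps_on_core(core), check=True)
```

On the crashed box, calling show_processes() after savecore has run would print the same listing as the manual command.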

-- 
John Baldwin


Re: system hangup - I'm lost

2008-09-30 Thread Oliver Lehmann
Hi,

Jeremy Chadwick wrote:

 On Tue, Sep 30, 2008 at 12:39:27PM +0200, Bartosz Stec wrote:
  Personally I'd rather bet on some hardware problem (overheating?). Try to  
  install mbmon from ports. I also had similar problems with old  
  motherboards with swollen capacitors.
 
 Be careful with mbmon and healthd -- just because they compile and run
 does not mean they're working properly (the values shown may be
 completely unreliable/incorrect).
 
 It's best to check such things in the system BIOS, unless you have
 absolute certainty that your motherboard is supported by mbmon/healthd.

The system's chipset (440GX; the board is
http://www.intel.com/support/motherboards/server/l440gx/) is not
supported by mbmon. All I can check is the temperature of the hard drives,
and they are between 30 and 45 °C, which says nothing about the CPUs ;)

make world, for example, does not bring the system down - I only encounter
this during my tinderbox runs; who knows what stresses it that much then.

I'll now make a kernel with all the debugging stuff in it...


-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/


Re: system hangup - I'm lost

2008-09-30 Thread Oliver Lehmann
John Baldwin wrote:

 (CTRL+ALT+ESC) and run 'panic' to generate a crash dump.

The problem here is that after a memory upgrade my swap space is no longer
big enough to hold the full memory size. I'll try this as a last resort if
the interactive work with kdb does not help, and will remove some memory
beforehand...

-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/


Re: system hangup - I'm lost

2008-09-30 Thread John Baldwin
On Tuesday 30 September 2008 10:57:19 am Oliver Lehmann wrote:
 John Baldwin wrote:
 
  (CTRL+ALT+ESC) and run 'panic' to generate a crash dump.
 
 The problem here is that after a memory upgrade my swap space is no longer
 big enough to hold the full memory size. I'll try this as a last resort if
 the interactive work with kdb does not help, and will remove some memory
 beforehand...

Turn on minidumps.  Minidumps don't dump all of memory (generally a lot, lot 
less).
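The underlying constraint is simple arithmetic: a full dump writes (roughly) all of physical memory to the dump device, here the swap partition, so it only fits if that device is at least as large; a minidump writes mostly kernel memory and needs far less. A minimal Python sketch of the check, using the "real memory" figure from the dmesg earlier in the thread. The function name is hypothetical, the header allowance is an assumed margin rather than the exact on-disk layout, and on 6.x the minidump switch is, as far as I know, the debug.minidump sysctl (confirm with sysctl -d before relying on it).

```python
MIB = 1024 ** 2


def full_dump_fits(dumpdev_bytes: int, physmem_bytes: int,
                   header_allowance: int = 1024) -> bool:
    """True if a full (non-mini) dump of all physical memory fits.

    header_allowance is an assumed margin for dump headers, not a value
    taken from the FreeBSD sources.
    """
    return dumpdev_bytes >= physmem_bytes + header_allowance


physmem = 805240832  # "real memory" from the dmesg earlier in the thread
for swap_mib in (512, 768, 1024):
    ok = full_dump_fits(swap_mib * MIB, physmem)
    print(f"{swap_mib} MiB swap: full dump {'fits' if ok else 'does not fit'}")
```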

-- 
John Baldwin


Re: system hangup - I'm lost

2008-09-30 Thread Jeremy Chadwick
On Tue, Sep 30, 2008 at 04:55:34PM +0200, Oliver Lehmann wrote:
 Hi,
 
 Jeremy Chadwick wrote:
 
  On Tue, Sep 30, 2008 at 12:39:27PM +0200, Bartosz Stec wrote:
   Personally I'd rather bet on some hardware problem (overheating?). Try to  
   install mbmon from ports. I also had similar problems with old  
   motherboards with swollen capacitors.
  
  Be careful with mbmon and healthd -- just because they compile and run
  does not mean they're working properly (the values shown may be
  completely unreliable/incorrect).
  
  It's best to check such things in the system BIOS, unless you have
  absolute certainty that your motherboard is supported by mbmon/healthd.
 
 The system's chipset (440GX; the board is
 http://www.intel.com/support/motherboards/server/l440gx/) is not
 supported by mbmon. All I can check is the temperature of the hard drives,
 and they are between 30 and 45 °C, which says nothing about the CPUs ;)

The chipset rarely matters (I've yet to encounter any PC chipset that
natively handles full fan, voltage, and temperature monitoring), but
the motherboard model can tell me a lot.  :-)

Boards have to include an external H/W monitoring IC (such as one from
National Semiconductor (LMxx), AMD, or Winbond), have thermistors placed
around the board, and have the H/W IC tied into the ISA or SMBus.
Sometimes the H/W monitoring IC also acts as a super I/O chip (which
means it handles serial, parallel, keyboard, mouse, and floppy disks --
and sometimes IDE).

I can't find anything on Intel's site that clues me in; all the PDFs
are vague as far as what chips are on the board.

I tried searching for a high-resolution photo of the L440GX on Google
Images, but I find none which are sharp/clear enough.  The best I
could find was this:

http://bbs.yjfy.com/UploadFile/2008-2/20082818545062073.jpg

I see Intel northbridge and southbridges, a Cirrus Logic (VGA?) chip, an
Intel flash chip (probably for CMOS), and an Intel NIC.  Four chips I
don't recognise are an Intel chip on the far right, a mystery chip
at the bottom of the board (can't make out company logo), and two
chips with E in their company logo (right of PCI slots).  Possibly
one of these handles H/W monitoring.

If you can reboot the system and go into the BIOS, see if you can
find anything that looks remotely like CPU and system temperatures,
as well as voltages.  If there's no such menu, the board likely has
no support for such.

P.S. -- You're the 2nd person I've encountered in under a week who's
using 440BX/GX-based hardware in present day.  I would not be
surprised if the board is simply going bad/failing due to age.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: system hangup - I'm lost

2008-09-30 Thread Oliver Lehmann
Jeremy Chadwick wrote:

 I can't find anything on Intel's site that clues me in; all the PDFs
 are vague as far as what chips are on the board.

Have you tried the Product specifications?

http://download.intel.com/support/motherboards/server/l440gx/254151-003.pdf

Beginning on page 33 (43 of the pdf)

It has 3 different server management buses. The temperature monitoring is
handled by a Baseboard Management Controller (BMC), which is implemented
using a DS82CL10.
Because it is a Server Board it offers a lot of managing features and
other nice things like serial console at bootup and system monitoring
features... but all unsupported by FreeBSD's software ;)


 P.S. -- You're the 2nd person I've encountered in under a week who's
 using 440BX/GX-based hardware in present day.  I would not be
 surprised if the board is simply going bad/failing due to age.  :-)

Hm - I wonder whether that is the case. After all, I'm using even older
hardware (Tyan Tsunami S1830S, PII300, DAC960P, RAID-1 2*IBM DFHS S2W)
as a router without any problems ;)

-- 
 Oliver Lehmann
  http://www.pofo.de/
  http://wishlist.ans-netz.de/


Re: system hangup - I'm lost

2008-09-30 Thread Jeremy Chadwick
On Wed, Oct 01, 2008 at 06:53:09AM +0200, Oliver Lehmann wrote:
 Jeremy Chadwick wrote:
 
  I can't find anything on Intel's site that clues me in; all the PDFs
  are vague as far as what chips are on the board.
 
 Have you tried the Product specifications?

No need -- Charles Sprickman sent me high-resolution pictures of all the
ICs on the 440GX board, and I was able to identify all of them except a
few (and those are obviously bit-latches or gates of some kind, so not
important).

Here's the list:

- National Semiconductor Super I/O chip [1]
- Cirrus Logic GD5480 video/VGA chip
- Samsung SGRAM module for VGA chip; 16MBytes, 70ns
- Intel 82371EB (PIIX4E) chip [2]
- Dallas Semiconductor DS80CH11 power management chip
- EtronTech SRAM; 256kbit, 15ns
- Unknown, looks like flash or DRAM
- Intel S82093AA I/O APIC
- Octal bit-latch IC
- Intel SB21150BC PCI bridge; 66MHz
- Intel chip of some kind, can't make it out due to dust
- Texas Instruments UCC5638 SCSI terminator
- Texas Instruments UCC5638 SCSI terminator
- Cypress Semiconductor W48C101 clock chip
- Numerous other bit-latching ICs
- Cypress Semiconductor 3.3V SDRAM buffering chip; probably used to drive SDRAM
  DIMMs (system memory)
- ??? Model 684702-003; not sure what this does, but is of no interest
- Some TI chip, doesn't interest me
- 2x California Micro Devices ECP/EPP (parallel port) terminator
- Maxim MAX211ECA1, no idea but doesn't interest me

[1]: I'll have to look up datasheets on this chip to see if it supports
H/W monitoring.

[2]: This chip does a **lot**, the most important piece being it drives
the entire PCI bus.  It *does* support SMBus, but not I2C.  Linux
lm-sensors supports this chip, but I don't know how it supports it.
I will need to look up the specs/datasheets on it:
http://www.lm-sensors.org/browser/lm-sensors/trunk/doc/busses/i2c-piix4


 http://download.intel.com/support/motherboards/server/l440gx/254151-003.pdf
 
 Beginning on page 33 (43 of the pdf)
 
 It has 3 different server management buses. The temperature monitoring is
 handled by a Baseboard Management Controller (BMC), which is implemented
 using a DS82CL10.

This tells me very little.  :-)

 Because it is a Server Board it offers a lot of managing features and
 other nice things like serial console at bootup and system monitoring
 features... but all unsupported by FreeBSD's software ;)

Really?  That's interesting, because Charles Sprickman told me that
there is no hardware monitoring information in the BIOS if you go in
there.  Most motherboards provide that in the BIOS as a centralised
place above all else.

Either way, I'm going to look into the details.  Examining what exactly
Linux lm-sensors means by support will be the first step.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
