Re: System hangs for several minutes (disk IO related)

2013-09-02 Thread Nils Pascal Illenseer
Hi,

I see similar hangs on one of our Supermicro servers.
We have a ZFS pool of striped mirror vdevs, and when I use zfs receive to
receive snapshots, the whole system hangs at the end for ten minutes or
even longer.

Kernel: latest (9.2-RC3)
Adaptec 6805 RAID controller, exposing the disks to ZFS as JBOD

/var/log/messages and dmesg do not show anything related to the hangs.

I hope this helps to analyze the issue further.

Regards,
Nils Pascal Illenseer


--  Cut here  --

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.2-RC3 #0 r254795: Sat Aug 24 20:25:04 UTC 2013
r...@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: AMD Opteron(tm) Processor 6376  (2300.05-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x600f20  Family = 0x15  Model = 0x2  Stepping = 0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x3e98320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1ebbfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,b17,NodeId,TBM,Topology,b23,b24>
  Standard Extended Features=0x8
  TSC: P-state invariant, performance statistics
real memory  = 137438953472 (131072 MB)
avail memory = 133006090240 (126844 MB)
Event timer LAPIC quality 400
ACPI APIC Table: 050713 APIC1654
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 16 core(s)
…
aacraid0: Adaptec RAID Controller mem 0xfd80-0xfdbf,0xfd7bf800-0xfd7b,0xfd7bf400-0xfd7bf4ff irq 28 at device 0.0 on pci1
aacraid0: Enable Raw I/O
aacraid0: Enable 64-bit array
aacraid0: New comm. interface type1 enabled
aacraid0: Adaptec 6805, aacraid driver 3.1.1-1
aacraidp0 on aacraid0
aacraidp1 on aacraid0
aacraidp2 on aacraid0
aacraidp3 on aacraid0

--  Cut here  --


On 30.07.2013 at 19:19, Ewald Jenisch a...@jenisch.at wrote:


Re: System hangs for several minutes (disk IO related)

2013-07-31 Thread Mark Felder
If you Google SmartArray P400 I'm sure you will find tons of horror
stories. As soon as I saw that in your post I recalled looking into an
issue for a customer not too long ago. In short, it's a very bad
controller with tons of issues. I could be mistaking this for another
controller, but I'm pretty confident this is the same one.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: System hangs for several minutes (disk IO related)

2013-07-31 Thread John Johnstone

On 7/31/13 9:41 AM, Mark Felder wrote:

> If you Google SmartArray P400 I'm sure you will find tons of horror
> stories. As soon as I saw that in your post I recalled looking into an
> issue for a customer not too long ago. In short, it's a very bad
> controller with tons of issues. I could be mistaking this for another
> controller, but I'm pretty confident this is the same one.


FreeBSD 9.1 is running here on a few DL360 G5s with the P400i, and I've
seen no problems with them. I've also seen about 200 of the same machines
in production for a few years under Windows and Linux, without a
particularly notable failure rate. I just did a simple web search for P400
problems and didn't turn up much. I'd be surprised if there were a
significant difference between the P400i and the P400.


--
John J.


System hangs for several minutes (disk IO related)

2013-07-30 Thread Ewald Jenisch
Hi,

I'm seeing rather strange behavior on an HP DL585 G5 with regard to disk I/O:

Whenever there is any disk I/O, the machine freezes completely: no
console input is possible and there is no screen output, a complete hang.
After a few minutes the box returns to normal, but sure enough, the next
disk I/O freezes it again.

To give you a typical example: while a portsnap fetch extract was
running, I ran sync. Normally this should complete within milliseconds,
or a few seconds in the worst case, but dig this:

# date;time sync;date
Tue Jul 30 09:57:38 CEST 2013
0.000u 0.311s 9:54.69 0.0%  4+161k 0+1287io 0pf+0w
Tue Jul 30 10:07:38 CEST 2013
#

No, this is not a typo: the sync really took nearly ten minutes (!) to
complete. In the meantime every window and all activity (console, screen
output, etc.) is completely blocked. ('portsnap fetch extract' is only an
example here; the lockup occurs whenever there is disk I/O, e.g. from tar.)

This is a machine with decent hardware; here's an excerpt from dmesg:

--  Cut here  --

FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x100f23  Family = 0x10  Model = 0x2  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  TSC: P-state invariant
real memory  = 137438953472 (131072 MB)
avail memory = 132973432832 (126813 MB)
Event timer LAPIC quality 400
ACPI APIC Table: HP ProLiant
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
...
ciss0: HP Smart Array P400 port 0x3000-0x30ff mem 0xd9e0-0xd9ef,0xd9df-0xd9df0fff irq 16 at device 0.0 on pci8
ciss0: PERFORMANT Transport
...
da0 at ciss0 bus 0 scbus2 target 0 lun 0
da0: COMPAQ RAID 1(1+0) OK Fixed Direct Access SCSI-5 device 
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da0: quirks=0x1<NO_SYNC_CACHE>

--  Cut here  --

Kernel: latest kernel as of yesterday (9.2-BETA2)

BIOS: at the latest level (I installed the Spring 2013 support pack,
which updated the BIOS, iLO, etc.). Aside from that, I reset the BIOS to
default values just to be sure.

SmartArray P400 - Firmware 7.24 (latest)

Hard disks: two 146 GB disks running in RAID 1. I already tried
hot-swapping the disks; it didn't change anything.

Needless to say, there are no error messages in either dmesg or
/var/log/messages :-(

To me this looks like some sort of timing problem, but where
should I start looking?

Thanks much in advance for any help,
-ewald