Re: System hangs for several minutes (disk IO related)
Hi,

I see similar hangs on one of our Supermicro servers. We have a ZFS RAID (mirrored striped vdevs), and when I use zfs receive to receive snapshots, the whole system hangs for ten minutes or more at the end.

Kernel: latest (9.2-RC3)
Adaptec 6805 RAID controller provides disks for ZFS via JBOD

/var/log/messages and dmesg do not show anything related to the hangs.

I hope this helps to analyze the issue further.

Regards,
Nils Pascal Illenseer

-- Cut here --
Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.2-RC3 #0 r254795: Sat Aug 24 20:25:04 UTC 2013
    r...@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: AMD Opteron(tm) Processor 6376 (2300.05-MHz K8-class CPU)
  Origin = AuthenticAMD Id = 0x600f20 Family = 0x15 Model = 0x2 Stepping = 0
  Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT
  Features2=0x3e98320bSSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C
  AMD Features=0x2e500800SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM
  AMD Features2=0x1ebbfffLAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,b17,NodeId,TBM,Topology,b23,b24
  Standard Extended Features=0x8
  TSC: P-state invariant, performance statistics
real memory = 137438953472 (131072 MB)
avail memory = 133006090240 (126844 MB)
Event timer LAPIC quality 400
ACPI APIC Table: 050713 APIC1654
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 16 core(s)
…
aacraid0: Adaptec RAID Controller mem 0xfd80-0xfdbf,0xfd7bf800-0xfd7b,0xfd7bf400-0xfd7bf4ff irq 28 at device 0.0 on pci1
aacraid0: Enable Raw I/O
aacraid0: Enable 64-bit array
aacraid0: New comm. interface type1 enabled
aacraid0: Adaptec 6805, aacraid driver 3.1.1-1
aacraidp0 on aacraid0
aacraidp1 on aacraid0
aacraidp2 on aacraid0
aacraidp3 on aacraid0
-- Cut here --

On 30.07.2013 at 19:19, Ewald Jenisch a...@jenisch.at wrote:

Hi,

I'm seeing rather strange behavior on an HP DL585 G5 wrt. disk IO: When there's any disk IO the machine completely freezes, i.e. no console input possible, no screen output - complete hang. After some minutes the box comes back to normal again - but sure enough with the next disk IO it freezes again.

To give you a typical example: While a portsnap fetch extract was running I did a sync. Normally this should complete in a matter of milliseconds to seconds in the worst case - but dig this:

# date;time sync;date
Tue Jul 30 09:57:38 CEST 2013
0.000u 0.311s 9:54.69 0.0% 4+161k 0+1287io 0pf+0w
Tue Jul 30 10:07:38 CEST 2013
#

No, this is not a typo - it really took nearly ten minutes (!) for the sync to complete. In the meantime every window and all activity (console, screen output etc.) is completely blocked. ('portsnap fetch extract' was only given as an example here - the lockup occurs whenever there is disk IO, for example from tar.)

We're speaking about a machine with decent hardware here; here's an excerpt from dmesg:

-- Cut here --
FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
    root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
  Origin = AuthenticAMD Id = 0x100f23 Family = 0x10 Model = 0x2 Stepping = 3
  Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT
  Features2=0x802009SSE3,MON,CX16,POPCNT
  AMD Features=0xee400800SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!
  AMD Features2=0x7ffLAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS
  TSC: P-state invariant
real memory = 137438953472 (131072 MB)
avail memory = 132973432832 (126813 MB)
Event timer LAPIC quality 400
ACPI APIC Table: HP ProLiant
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
...
ciss0: HP Smart Array P400 port 0x3000-0x30ff mem 0xd9e0-0xd9ef,0xd9df-0xd9df0fff irq 16 at device 0.0 on pci8
ciss0: PERFORMANT Transport
...
da0 at ciss0 bus 0 scbus2 target 0 lun 0
da0: COMPAQ RAID 1(1+0) OK Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da0: quirks=0x1NO_SYNC_CACHE
-- Cut here --

Kernel: Latest kernel as of yesterday (9.2Beta)
BIOS: is at the latest level
Re: System hangs for several minutes (disk IO related)
If you Google SmartArray P400 I'm sure you will find tons of horror stories. As soon as I saw that in your post I recalled looking into an issue for a customer not too long ago. In short, it's a very bad controller with tons of issues. I could be mistaking this for another controller, but I'm pretty confident this is the same one.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: System hangs for several minutes (disk IO related)
On 7/31/13 9:41 AM, Mark Felder wrote:

If you Google SmartArray P400 I'm sure you will find tons of horror stories. As soon as I saw that in your post I recalled looking into an issue for a customer not too long ago. In short, it's a very bad controller with tons of issues. I could be mistaking this for another controller, but I'm pretty confident this is the same one.

FreeBSD 9.1 is running here on a few DL360 G5's with the P400i and I've seen no problems with them. I've seen about 200 of the same machines in production for a few years with Windows and Linux and not a particularly notable failure rate.

I just did a simple web search for P400 problems and didn't come up with much. I'd be surprised if there's a significant difference between the P400i and the P400.

- John J.
System hangs for several minutes (disk IO related)
Hi,

I'm seeing rather strange behavior on an HP DL585 G5 wrt. disk IO: When there's any disk IO the machine completely freezes, i.e. no console input possible, no screen output - complete hang. After some minutes the box comes back to normal again - but sure enough with the next disk IO it freezes again.

To give you a typical example: While a portsnap fetch extract was running I did a sync. Normally this should complete in a matter of milliseconds to seconds in the worst case - but dig this:

# date;time sync;date
Tue Jul 30 09:57:38 CEST 2013
0.000u 0.311s 9:54.69 0.0% 4+161k 0+1287io 0pf+0w
Tue Jul 30 10:07:38 CEST 2013
#

No, this is not a typo - it really took nearly ten minutes (!) for the sync to complete. In the meantime every window and all activity (console, screen output etc.) is completely blocked. ('portsnap fetch extract' was only given as an example here - the lockup occurs whenever there is disk IO, for example from tar.)

We're speaking about a machine with decent hardware here; here's an excerpt from dmesg:

-- Cut here --
FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
    root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
  Origin = AuthenticAMD Id = 0x100f23 Family = 0x10 Model = 0x2 Stepping = 3
  Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT
  Features2=0x802009SSE3,MON,CX16,POPCNT
  AMD Features=0xee400800SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!
  AMD Features2=0x7ffLAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS
  TSC: P-state invariant
real memory = 137438953472 (131072 MB)
avail memory = 132973432832 (126813 MB)
Event timer LAPIC quality 400
ACPI APIC Table: HP ProLiant
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
...
ciss0: HP Smart Array P400 port 0x3000-0x30ff mem 0xd9e0-0xd9ef,0xd9df-0xd9df0fff irq 16 at device 0.0 on pci8
ciss0: PERFORMANT Transport
...
da0 at ciss0 bus 0 scbus2 target 0 lun 0
da0: COMPAQ RAID 1(1+0) OK Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da0: quirks=0x1NO_SYNC_CACHE
-- Cut here --

Kernel: Latest kernel as of yesterday (9.2Beta)
BIOS: at the latest level (support pack as of spring 2013 installed, which updated BIOS, iLO etc.). Aside from that I reset the BIOS to default values just to be sure.
SmartArray P400: Firmware 7.24 (latest)
Harddisks: Two 146GB HDs running in RAID1 mode. Already tried hot-swapping the disks - didn't change anything.

Needless to say, there are no error messages in either dmesg or /var/log/messages :-(

To me it looks like this is some sort of timing problem - but where should I start looking?

Thanks much in advance for any help,
-ewald
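
For readers checking the numbers in the transcript above: the third field of the shell's time output (9:54.69) is the elapsed wall-clock time, and the two date stamps bracket the same sync call. The following is only a small sketch of that arithmetic in Python; the helper names elapsed_to_seconds and clock_gap_seconds are made up for illustration and are not part of any FreeBSD tool, and the sketch assumes both time stamps fall on the same day.

```python
from datetime import datetime


def elapsed_to_seconds(field):
    """Convert an elapsed field such as '9:54.69' (m:ss.cc or h:mm:ss) to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        # Each colon-separated part is worth 60x the part to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def clock_gap_seconds(start, end):
    """Seconds between two HH:MM:SS stamps, assumed to be on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


if __name__ == "__main__":
    # Values copied from the transcript in the original post.
    sync_seconds = elapsed_to_seconds("9:54.69")
    gap = clock_gap_seconds("09:57:38", "10:07:38")
    print(f"sync elapsed: {sync_seconds:.2f} s; gap between date stamps: {gap:.0f} s")
```

With the posted values, the elapsed field comes to about 594.69 seconds and the date stamps are 600 seconds apart, so the sync call accounts for almost the entire interval, while the 0.311s of system CPU time shows the process spent nearly all of it waiting rather than computing.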