Re: More than 32 CPUs under 8.4-P
Minutes after I typed that message the 2x16 system panicked with the following back trace:

  kdb_backtrace
  panic
  vdev_deadman
  vdev_deadman
  vdev_deadman
  spa_deadman
  softclock
  intr_event_execute_handlers
  ithread_loop
  fork_exit
  fork_trampoline

I had just created a memory disk when that happened:

  root@iirc:~ # mdconfig -a -t swap -s 1g -u 1
  root@iirc:~ # newfs -U /dev/md1
  root@iirc:~ # mount /dev/md1 /mnt
  root@iirc:~ # cp -p procstat kgdb /mnt
  root@iirc:~ # cd /rescue/
  root@iirc:/rescue # cp -p * /mnt

On Sun, 2013-05-19 at 18:45 -0700, Dennis Glatting wrote:
> On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
> > On May 19, 2013, at 11:51 AM, Dennis Glatting wrote:
> >
> > > ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS
> > > does not hang under 8.4. This (and one other 4 socket) is a
> > > production system.
> >
> > Can you be more specific? I have been running 9.0 and 9.1 systems with
> > multi-CPU and all ZFS with no (CPU related*) issues.
>
> I have (down to) ten FreeBSD/ZFS systems. Five of them are multi-socket
> populated. All are AMD CPUs of the 6200 series. Two of those
> multi-socketed systems are simply workstations and don't do much file
> I/O, so I have yet to see them fault.
>
> The remaining three perform significant I/O in the 1-8TB (simultaneous)
> file range, including sorting, compression, backup, etc. (ZFS
> compression is enabled on some data sets, as is dedup on a few minor
> data sets). I also do iSCSI and NFS from one of these systems.
>
> Simply, if I run 9.1 on those three busy systems ZFS will eventually
> hang under load (within ten hours to a few days), whereas it does not
> under 8.3/8.4. Two of those systems are 4x16 cores, one is 2x16, and
> two are 2x8 cores. Multiple, simultaneous pbzip2 runs on individual
> 2-5TB ASCII files generally cause a hang within 10-20 hours.
>
> "Hang" means the system is alive and on the network but disk I/O has
> stopped. Run any command except statically linked executables on a
> memory volume and they will not run (no output, no return to the
> command prompt). This includes "reboot," which never really reboots.
>
> The volumes where work is performed are typically 12-33TB RAIDz2
> volumes. For example:
>
> root@mc:~ # zpool list disk-1
> NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> disk-1  16.2T  5.86T  10.4T    36%  1.32x  ONLINE  -
>
> root@mc:~ # zpool status disk-1
>   pool: disk-1
>  state: ONLINE
>   scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55 2013
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         disk-1      ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>         cache
>           da0       ONLINE       0     0     0
>
> errors: No known data errors
>
> > * I say no CPU related issues because I have run into SATA timeout
> > issues with an external SATA enclosure with 4 drives (I know, SATA
> > port expanders are evil, but it is my best option here). Sometimes
> > the zpool hangs hard, sometimes it just becomes unresponsive for a
> > while. My "fix", such as it is, is to tune the ZFS per-vdev queue
> > depth as follows:
> >
> > vfs.zfs.vdev.min_pending="3"
> > vfs.zfs.vdev.max_pending="5"
>
> I've not tried those. Currently, these are mine:
>
> vfs.zfs.write_limit_override="1G"
> vfs.zfs.arc_max="8G"
> vfs.zfs.txg.timeout=15
> vfs.zfs.cache_flush_disable=1
>
> # Recommended from the net
> # April, 2013
> vfs.zfs.l2arc_norw=0            # Default is 1
> vfs.zfs.l2arc_feed_again=0      # Default is 1
> vfs.zfs.l2arc_noprefetch=0      # Default is 0
> vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200
>
> > The defaults are 5 and 10 respectively, and when I run with those I
> > have the timeout issues, but only under very heavy I/O load. I only
> > generate such load when migrating large amounts of data, which
> > thankfully does not happen all that often.
>
> Two days ago, when the 9.1 system hung, I was able to run a static
> procstat, which inadvertently(?) printed on the console that da0 wasn't
> responsive. Unfortunately I didn't have a static camcontrol ready, so I
> was unable to query the drive.
>
> That said, according to the criteria from
> https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS
> problem, yet hung it was.
>
> I have since (today) updated the firmware of most of the devices in
> that system, and it is currently running some tasks. Most of the disks
> in that system are Seagate, but the un-updated devices include three WD
> disks (RAID1 OS and a swap disk -- un-updated because I haven't been
> able to figure out WD firmware downloads) and a SSD where the
> manufacturer indicates the firmware diff is minor, though I plan to go
> back and flash it anyway.
Re: More than 32 CPUs under 8.4-P
On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
> On May 19, 2013, at 11:51 AM, Dennis Glatting wrote:
>
> > ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS
> > does not hang under 8.4. This (and one other 4 socket) is a
> > production system.
>
> Can you be more specific? I have been running 9.0 and 9.1 systems with
> multi-CPU and all ZFS with no (CPU related*) issues.

I have (down to) ten FreeBSD/ZFS systems. Five of them are multi-socket populated. All are AMD CPUs of the 6200 series. Two of those multi-socketed systems are simply workstations and don't do much file I/O, so I have yet to see them fault.

The remaining three perform significant I/O in the 1-8TB (simultaneous) file range, including sorting, compression, backup, etc. (ZFS compression is enabled on some data sets, as is dedup on a few minor data sets). I also do iSCSI and NFS from one of these systems.

Simply, if I run 9.1 on those three busy systems ZFS will eventually hang under load (within ten hours to a few days), whereas it does not under 8.3/8.4. Two of those systems are 4x16 cores, one is 2x16, and two are 2x8 cores. Multiple, simultaneous pbzip2 runs on individual 2-5TB ASCII files generally cause a hang within 10-20 hours.

"Hang" means the system is alive and on the network but disk I/O has stopped. Run any command except statically linked executables on a memory volume and they will not run (no output, no return to the command prompt). This includes "reboot," which never really reboots.

The volumes where work is performed are typically 12-33TB RAIDz2 volumes. For example:

root@mc:~ # zpool list disk-1
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
disk-1  16.2T  5.86T  10.4T    36%  1.32x  ONLINE  -

root@mc:~ # zpool status disk-1
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55 2013
config:

        NAME        STATE     READ WRITE CKSUM
        disk-1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
        cache
          da0       ONLINE       0     0     0

errors: No known data errors

> * I say no CPU related issues because I have run into SATA timeout
> issues with an external SATA enclosure with 4 drives (I know, SATA port
> expanders are evil, but it is my best option here). Sometimes the zpool
> hangs hard, sometimes it just becomes unresponsive for a while. My
> "fix", such as it is, is to tune the ZFS per-vdev queue depth as
> follows:
>
> vfs.zfs.vdev.min_pending="3"
> vfs.zfs.vdev.max_pending="5"

I've not tried those. Currently, these are mine:

vfs.zfs.write_limit_override="1G"
vfs.zfs.arc_max="8G"
vfs.zfs.txg.timeout=15
vfs.zfs.cache_flush_disable=1

# Recommended from the net
# April, 2013
vfs.zfs.l2arc_norw=0            # Default is 1
vfs.zfs.l2arc_feed_again=0      # Default is 1
vfs.zfs.l2arc_noprefetch=0      # Default is 0
vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200

> The defaults are 5 and 10 respectively, and when I run with those I
> have the timeout issues, but only under very heavy I/O load. I only
> generate such load when migrating large amounts of data, which
> thankfully does not happen all that often.

Two days ago, when the 9.1 system hung, I was able to run a static procstat, which inadvertently(?) printed on the console that da0 wasn't responsive. Unfortunately I didn't have a static camcontrol ready, so I was unable to query the drive.

That said, according to the criteria from https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS problem, yet hung it was.

I have since (today) updated the firmware of most of the devices in that system, and it is currently running some tasks. Most of the disks in that system are Seagate, but the un-updated devices include three WD disks (RAID1 OS and a swap disk -- un-updated because I haven't been able to figure out WD firmware downloads) and a SSD where the manufacturer indicates the firmware diff is minor, though I plan to go back and flash it anyway.

If my 4x16 system ever finishes I will be updating its devices' firmware too, but it is an 8.4-P system and doesn't give me any trouble. Another 4x16 system gave me ZFS trouble under 9.1, but since I downgraded it to 8.4-P it has been stable as a rock for the past 22 days, often under heavy load.

_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
Re: More than 32 CPUs under 8.4-P
On May 19, 2013, at 11:51 AM, Dennis Glatting wrote:

> ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS
> does not hang under 8.4. This (and one other 4 socket) is a production
> system.

Can you be more specific? I have been running 9.0 and 9.1 systems with multi-CPU and all ZFS with no (CPU related*) issues.

* I say no CPU related issues because I have run into SATA timeout issues with an external SATA enclosure with 4 drives (I know, SATA port expanders are evil, but it is my best option here). Sometimes the zpool hangs hard, sometimes it just becomes unresponsive for a while. My "fix", such as it is, is to tune the ZFS per-vdev queue depth as follows:

vfs.zfs.vdev.min_pending="3"
vfs.zfs.vdev.max_pending="5"

The defaults are 5 and 10 respectively, and when I run with those I have the timeout issues, but only under very heavy I/O load. I only generate such load when migrating large amounts of data, which thankfully does not happen all that often.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company
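[Editor's note: the thread never says where these settings live. On the 8.x/9.x releases under discussion they are boot-time tunables, normally set in /boot/loader.conf and read at the loader prompt; the quoted `"3"`/`"5"` syntax matches that file. A sketch, using Paul's values; the comment is illustrative:]

```conf
# /boot/loader.conf -- cap the ZFS per-vdev I/O queue depth
# (stock defaults on these releases were 5 and 10)
vfs.zfs.vdev.min_pending="3"
vfs.zfs.vdev.max_pending="5"
```

A reboot is required for loader tunables to take effect.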
Re: More than 32 CPUs under 8.4-P
On 5/19/2013 10:51 AM, Dennis Glatting wrote:
> On Sun, 2013-05-19 at 11:48 +0200, Tijl Coosemans wrote:
> > On 2013-05-18 19:13, Dennis Glatting wrote:
> > > I have a 4x16=64 core server running FreeBSD 8.4-P but only two of
> > > the CPUs (2x16=32) are enabled. Enabling the other 32 isn't as
> > > simple as changing MAXCPU in param.h (apparently) and recompiling.
> > >
> > > What do I need to do to enable the other 32 cores?
> >
> > Try FreeBSD 9.x. MAXCPU is 64 there.
>
> Not an option. ZFS hangs on multi-socket systems (Tyan, Supermicro)
> under 9.1. ZFS does not hang under 8.4. This (and one other 4 socket)
> is a production system.

Does this seem to describe the problem?

http://www.freebsd.org/cgi/query-pr.cgi?pr=177536

You can try switching to 9-stable. Looking at the source for 8.4, it's not being fixed in the 8 series.
Re: More than 32 CPUs under 8.4-P
On Sun, 2013-05-19 at 11:48 +0200, Tijl Coosemans wrote:
> On 2013-05-18 19:13, Dennis Glatting wrote:
> > I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> > CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> > changing MAXCPU in param.h (apparently) and recompiling.
> >
> > What do I need to do to enable the other 32 cores?
>
> Try FreeBSD 9.x. MAXCPU is 64 there.

Not an option. ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does not hang under 8.4. This (and one other 4 socket) is a production system.
Re: More than 32 CPUs under 8.4-P
On 2013-05-18 19:13, Dennis Glatting wrote:
> I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> changing MAXCPU in param.h (apparently) and recompiling.
>
> What do I need to do to enable the other 32 cores?

Try FreeBSD 9.x. MAXCPU is 64 there.
Re: More than 32 CPUs under 8.4-P
On Sat, 18 May 2013 10:13:08 -0700, Dennis Glatting wrote:
>
> I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> changing MAXCPU in param.h (apparently) and recompiling.
>
> What do I need to do to enable the other 32 cores?
>
> Copyright (c) 1992-2013 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
> 1994 The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.4-PRERELEASE #0 r250401: Wed May  8 21:46:23 PDT 2013
>     root@mc:/disk-2/obj/disk-1/src/sys/GENERIC amd64
> gcc version 4.2.1 20070831 patched [FreeBSD]
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Opteron(TM) Processor 6274 (2200.04-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x600f12  Family = 15  Model = 1  Stepping = 2
>   Features=0x178bfbff
>   Features2=0x1e98220b
>   AMD Features=0x2e500800
>   AMD Features2=0x1c9bfff
> TSC: P-state invariant
> real memory  = 137438953472 (131072 MB)
> avail memory = 132427862016 (126293 MB)
> ACPI APIC Table: <120911 APIC1027>
> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
> FreeBSD/SMP: 2 package(s) x 16 core(s)

http://svnweb.freebsd.org/base/releng/8.4/sys/amd64/include/param.h?revision=248810&view=markup

Set the MAXCPU parameter to 64.
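[Editor's note: for reference, the suggestion amounts to editing the header linked above. A sketch of the change on releng/8 amd64, where the stock SMP value is 32; the poster's own report suggests this change alone may not boot cleanly:]

```c
/* sys/amd64/include/param.h (releng/8, amd64) -- sketch of the
 * suggested edit, not a verified fix */
#ifdef SMP
#define MAXCPU 64   /* raised from the stock 32 */
#else
#define MAXCPU 1
#endif
```

A full kernel rebuild and install (make buildkernel && make installkernel) is needed afterwards.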
Re: More than 32 CPUs under 8.4-P
On Sat, 2013-05-18 at 20:24 +0300, Ivan Klymenko wrote:
> On Sat, 18 May 2013 10:13:08 -0700, Dennis Glatting wrote:
> >
> > I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> > CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> > changing MAXCPU in param.h (apparently) and recompiling.
>
> Oops, sorry :)

During the boot sequence the kernel kept printing errors to the console in a loop and never got to a command prompt.
Re: More than 32 CPUs under 8.4-P
On Sat, 18 May 2013 10:13:08 -0700, Dennis Glatting wrote:
>
> I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> changing MAXCPU in param.h (apparently) and recompiling.

Oops, sorry :)