Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Dennis Glatting

Minutes after I typed that message, the 2x16 system panicked with the
following backtrace:

kdb_backtrace
panic
vdev_deadman
vdev_deadman
vdev_deadman
spa_deadman
softclock
intr_event_execute_handlers
ithread_loop
fork_exit
fork_trampoline

I had just created a memory disk when that happened:

root@iirc:~ # mdconfig -a -t swap -s 1g -u 1
root@iirc:~ # newfs -U /dev/md1
root@iirc:~ # mount /dev/md1 /mnt
root@iirc:~ # cp -p procstat kgdb /mnt
root@iirc:~ # cd /rescue/
root@iirc:/rescue # cp -p * /mnt
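
The transcript above can be gathered into one reusable sketch for pre-staging a rescue toolkit: because only statically linked executables on a memory volume keep working once disk I/O wedges, the static /rescue tools are copied onto a swap-backed md device ahead of time (md unit 1 is assumed free):

```shell
# Pre-stage static debugging tools on a swap-backed memory disk so they
# remain runnable if ZFS I/O hangs (unit number and mount point assumed).
mdconfig -a -t swap -s 1g -u 1   # create a 1 GB swap-backed md device
newfs -U /dev/md1                # UFS with soft updates on it
mount /dev/md1 /mnt
cp -p /rescue/* /mnt             # /rescue binaries are statically linked
```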

On Sun, 2013-05-19 at 18:45 -0700, Dennis Glatting wrote:
> On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
> > On May 19, 2013, at 11:51 AM, Dennis Glatting  wrote:
> > 
> > > ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
> > > not hang under 8.4. This (and one other 4 socket) is a production
> > > system.
> > 
> > Can you be more specific? I have been running 9.0 and 9.1 systems with
> > multi-CPU and all ZFS with no (CPU related*) issues.
> > 
> 
> I have (down to) ten FreeBSD/ZFS systems. Five of them are multi-socket
> populated. All are AMD CPUs of the 6200 series. Two of those
> multi-socketed systems are simply workstations and don't do much file
> I/O, so I have yet to see them fault.
> 
> The remaining three perform significant I/O in the 1-8TB (simultaneous)
> file range, including sorting, compression, backup, etc (ZFS compression
> is enabled on some data sets as is dedup on a few minor data sets). I
> also do iSCSI and NFS from one of these systems.
> 
> Simply, if I run 9.1 on those three busy systems ZFS will eventually
> hang under load (within ten hours to a few days) whereas it does not
> under 8.3/4. Two of those systems are 4x16 cores, one 2x16, and two 2x8
> cores. Multiple, simultaneous pbzip2 runs on individual 2-5TB ASCII
> files generally causes a hang within 10-20 hours.
> 
> "Hang" means the system is alive and on the network but disk I/O has
> stopped. Run any command except statically linked executables on a
> memory volume and they will not run (no output or return to command
> prompt). This includes "reboot," which never really reboots.
> 
> The volumes where work is performed are typically 12-33TB RAIDz2
> volumes. For example:
> 
> root@mc:~ # zpool list disk-1
> NAME      SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
> disk-1   16.2T  5.86T  10.4T   36%  1.32x  ONLINE  -
> 
> root@mc:~ # zpool status disk-1
>   pool: disk-1
>  state: ONLINE
>   scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55
> 2013
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         disk-1      ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>         cache
>           da0       ONLINE       0     0     0
> 
> errors: No known data errors
> 
> 
> > * I say no CPU related issues because I have run into SATA timeout
> > issues with an external SATA enclosure with 4 drives (I know, SATA port
> > expanders are evil, but it is my best option here). Sometimes the zpool
> > hangs hard, sometimes just becomes unresponsive for a while. My "fix",
> > such as it is, is to tune the zfs per vdev queue depth as follows:
> > 
> > vfs.zfs.vdev.min_pending="3"
> > vfs.zfs.vdev.max_pending="5"
> > 
> 
> I've not tried those. Currently, these are mine:
> 
> vfs.zfs.write_limit_override="1G"
> vfs.zfs.arc_max="8G"
> vfs.zfs.txg.timeout=15
> vfs.zfs.cache_flush_disable=1
> 
> # Recommended from the net
> # April, 2013
> vfs.zfs.l2arc_norw=0            # Default is 1
> vfs.zfs.l2arc_feed_again=0      # Default is 1
> vfs.zfs.l2arc_noprefetch=0      # Default is 0
> vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200
> 
> 
> > The defaults are 5 and 10 respectively, and when I run with those I
> > have the timeout issues, but only under very heavy I/O load. I only
> > generate such load when migrating large amounts of data, which
> > thankfully does not happen all that often.
> > 
> 
> Two days ago, when the 9.1 system hung, I was able to run a statically
> linked procstat, which inadvertently(?) printed on the console that da0
> wasn't responsive. Unfortunately I didn't have a static camcontrol
> ready, so I was unable to query it.
> 
> That said, according to the criteria from
> https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS
> problem, yet hung it was.
> 
> I have since (today) updated the firmware of most of the devices in that
> system and it is currently running some tasks. Most of the disks in that
> system are Seagate, but the un-updated devices include three WD disks
> (RAID1 OS and a swap disk -- unupdated because I haven't been able to
> figure out WD firmware downloads) and an SSD where the manufacturer
> indicates the firmware diff is minor, though I plan to go back and flash
> it anyway.

Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Dennis Glatting
On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
> On May 19, 2013, at 11:51 AM, Dennis Glatting  wrote:
> 
> > ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
> > not hang under 8.4. This (and one other 4 socket) is a production
> > system.
> 
>   Can you be more specific? I have been running 9.0 and 9.1 systems with
> multi-CPU and all ZFS with no (CPU related*) issues.
> 

I have (down to) ten FreeBSD/ZFS systems. Five of them are multi-socket
populated. All are AMD CPUs of the 6200 series. Two of those
multi-socketed systems are simply workstations and don't do much file
I/O, so I have yet to see them fault.

The remaining three perform significant I/O in the 1-8TB (simultaneous)
file range, including sorting, compression, backup, etc (ZFS compression
is enabled on some data sets as is dedup on a few minor data sets). I
also do iSCSI and NFS from one of these systems.

Simply, if I run 9.1 on those three busy systems ZFS will eventually
hang under load (within ten hours to a few days) whereas it does not
under 8.3/4. Two of those systems are 4x16 cores, one 2x16, and two 2x8
cores. Multiple, simultaneous pbzip2 runs on individual 2-5TB ASCII
files generally causes a hang within 10-20 hours.

"Hang" means the system is alive and on the network but disk I/O has
stopped. Run any command except statically linked executables on a
memory volume and they will not run (no output or return to command
prompt). This includes "reboot," which never really reboots.

The volumes where work is performed are typically 12-33TB RAIDz2
volumes. For example:

root@mc:~ # zpool list disk-1
NAME      SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
disk-1   16.2T  5.86T  10.4T   36%  1.32x  ONLINE  -

root@mc:~ # zpool status disk-1
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55
2013
config:

        NAME        STATE     READ WRITE CKSUM
        disk-1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
        cache
          da0       ONLINE       0     0     0

errors: No known data errors


> * I say no CPU related issues because I have run into SATA timeout
> issues with an external SATA enclosure with 4 drives (I know, SATA port
> expanders are evil, but it is my best option here). Sometimes the zpool
> hangs hard, sometimes just becomes unresponsive for a while. My "fix",
> such as it is, is to tune the zfs per vdev queue depth as follows:
> 
> vfs.zfs.vdev.min_pending="3"
> vfs.zfs.vdev.max_pending="5"
> 

I've not tried those. Currently, these are mine:

vfs.zfs.write_limit_override="1G"
vfs.zfs.arc_max="8G"
vfs.zfs.txg.timeout=15
vfs.zfs.cache_flush_disable=1

# Recommended from the net
# April, 2013
vfs.zfs.l2arc_norw=0# Default is 1
vfs.zfs.l2arc_feed_again=0  # Default is 1
vfs.zfs.l2arc_noprefetch=0  # Default is 0
vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200
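
Loader tunables like these are set in /boot/loader.conf and take effect at the next boot; what the running kernel actually picked up can be checked at runtime, for example:

```shell
# Verify the values the running kernel is using (names from the list above).
sysctl vfs.zfs.arc_max vfs.zfs.txg.timeout
sysctl -d vfs.zfs.l2arc_norw   # -d also prints the tunable's description
```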


> The defaults are 5 and 10 respectively, and when I run with those I
> have the timeout issues, but only under very heavy I/O load. I only
> generate such load when migrating large amounts of data, which
> thankfully does not happen all that often.
> 

Two days ago, when the 9.1 system hung, I was able to run a statically
linked procstat, which inadvertently(?) printed on the console that da0
wasn't responsive. Unfortunately I didn't have a static camcontrol ready,
so I was unable to query it.

That said, according to the criteria from
https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS
problem, yet hung it was.

I have since (today) updated the firmware of most of the devices in that
system and it is currently running some tasks. Most of the disks in that
system are Seagate, but the un-updated devices include three WD disks
(RAID1 OS and a swap disk -- unupdated because I haven't been able to
figure out WD firmware downloads) and an SSD where the manufacturer
indicates the firmware diff is minor, though I plan to go back and flash
it anyway.

If my 4x16 system ever finishes its work I will update its devices'
firmware too, but it is an 8.4-P system and doesn't give me any trouble.
Another 4x16 system gave me ZFS trouble under 9.1, but since I downgraded
it to 8.4-P it has been stable as a rock for the past 22 days, often
under heavy load.





___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"


Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Paul Kraus
On May 19, 2013, at 11:51 AM, Dennis Glatting  wrote:

> ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
> not hang under 8.4. This (and one other 4 socket) is a production
> system.

Can you be more specific? I have been running 9.0 and 9.1 systems with
multi-CPU and all ZFS with no (CPU related*) issues.

* I say no CPU related issues because I have run into SATA timeout issues with 
an external SATA enclosure with 4 drives (I know, SATA port expanders are evil, 
but it is my best option here). Sometimes the zpool hangs hard, sometimes just 
becomes unresponsive for a while. My "fix", such as it is, is to tune the zfs 
per vdev queue depth as follows:

vfs.zfs.vdev.min_pending="3"
vfs.zfs.vdev.max_pending="5"

The defaults are 5 and 10 respectively, and when I run with those I have the 
timeout issues, but only under very heavy I/O load. I only generate such load 
when migrating large amounts of data, which thankfully does not happen all that 
often.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company



Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Joshua Isom

On 5/19/2013 10:51 AM, Dennis Glatting wrote:

On Sun, 2013-05-19 at 11:48 +0200, Tijl Coosemans wrote:

On 2013-05-18 19:13, Dennis Glatting wrote:

I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
changing MAXCPU in param.h (apparently) and recompiling.

What do I need to do to enable the other 32 cores?


Try FreeBSD 9.x. MAXCPU is 64 there.



Not an option.

ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
not hang under 8.4. This (and one other 4 socket) is a production
system.




Does this seem to describe the problem?

http://www.freebsd.org/cgi/query-pr.cgi?pr=177536

You can try switching to 9-stable.  Looking at the source for 8.4, it's 
not being fixed in the 8 series.
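
A sketch of that suggestion — moving the source tree to stable/9 with svn and rebuilding (repository URL and GENERIC config assumed; the usual buildworld caveats in /usr/src/UPDATING apply):

```shell
# Check out the stable/9 branch over the existing tree and rebuild.
svn checkout svn://svn.freebsd.org/base/stable/9 /usr/src
cd /usr/src
make buildworld && make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now   # boot the new kernel, then run: make installworld
```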



Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Dennis Glatting
On Sun, 2013-05-19 at 11:48 +0200, Tijl Coosemans wrote:
> On 2013-05-18 19:13, Dennis Glatting wrote:
> > I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> > CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> > changing MAXCPU in param.h (apparently) and recompiling.
> > 
> > What do I need to do to enable the other 32 cores?
> 
> Try FreeBSD 9.x. MAXCPU is 64 there.
> 

Not an option. 

ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
not hang under 8.4. This (and one other 4 socket) is a production
system.



Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Tijl Coosemans
On 2013-05-18 19:13, Dennis Glatting wrote:
> I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> changing MAXCPU in param.h (apparently) and recompiling.
> 
> What do I need to do to enable the other 32 cores?

Try FreeBSD 9.x. MAXCPU is 64 there.





Re: More than 32 CPUs under 8.4-P

2013-05-18 Thread Ivan Klymenko
On Sat, 18 May 2013 10:13:08 -0700, Dennis Glatting wrote:

> 
> I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> changing MAXCPU in param.h (apparently) and recompiling.
> 
> What do I need to do to enable the other 32 cores?
> 
> 
> Copyright (c) 1992-2013 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
> 1994 The Regents of the University of California. All rights
> reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.4-PRERELEASE #0 r250401: Wed May  8 21:46:23 PDT 2013
> root@mc:/disk-2/obj/disk-1/src/sys/GENERIC amd64
> gcc version 4.2.1 20070831 patched [FreeBSD]
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Opteron(TM) Processor 6274  (2200.04-MHz
> K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x600f12  Family = 15  Model = 1
> Stepping = 2
> 
> Features=0x178bfbff
> 
> Features2=0x1e98220b
>   AMD Features=0x2e500800
>   AMD Features2=0x1c9bfff
>   TSC: P-state invariant
> real memory  = 137438953472 (131072 MB)
> avail memory = 132427862016 (126293 MB)
> ACPI APIC Table: <120911 APIC1027>
> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
> FreeBSD/SMP: 2 package(s) x 16 core(s)
> 

http://svnweb.freebsd.org/base/releng/8.4/sys/amd64/include/param.h?revision=248810&view=markup

Set the MAXCPU parameter to 64 and rebuild the kernel.
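
A sketch of that workflow on 8.x, shown for reference only — note that the original poster reports this alone did not work on 8.4 (the kernel looped printing errors at boot):

```shell
# In sys/amd64/include/param.h, raise the CPU limit:
#   #define MAXCPU 64     /* was 32 in releng/8.4 */
# then rebuild and install the kernel (GENERIC config assumed):
cd /usr/src
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now
```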

Re: More than 32 CPUs under 8.4-P

2013-05-18 Thread Dennis Glatting
On Sat, 2013-05-18 at 20:24 +0300, Ivan Klymenko wrote:
> On Sat, 18 May 2013 10:13:08 -0700, Dennis Glatting wrote:
> 
> > 
> > I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> > CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> > changing MAXCPU in param.h (apparently) and recompiling.
> 
> Oops, sorry :)
> 

During the boot sequence the kernel kept printing errors to the console
in a loop and never reached a command prompt.





Re: More than 32 CPUs under 8.4-P

2013-05-18 Thread Ivan Klymenko
On Sat, 18 May 2013 10:13:08 -0700, Dennis Glatting wrote:

> 
> I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
> CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
> changing MAXCPU in param.h (apparently) and recompiling.

Oops, sorry :)