Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Tijl Coosemans
On 2013-05-18 19:13, Dennis Glatting wrote:
 I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
 CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
 changing MAXCPU in param.h (apparently) and recompiling.
 
 What do I need to do to enable the other 32 cores?

Try FreeBSD 9.x. MAXCPU is 64 there.





Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Dennis Glatting
On Sun, 2013-05-19 at 11:48 +0200, Tijl Coosemans wrote:
 On 2013-05-18 19:13, Dennis Glatting wrote:
  I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
  CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
  changing MAXCPU in param.h (apparently) and recompiling.
  
  What do I need to do to enable the other 32 cores?
 
 Try FreeBSD 9.x. MAXCPU is 64 there.
 

Not an option. 

ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
not hang under 8.4. This (and one other 4 socket) is a production
system.







___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Joshua Isom

On 5/19/2013 10:51 AM, Dennis Glatting wrote:

On Sun, 2013-05-19 at 11:48 +0200, Tijl Coosemans wrote:

On 2013-05-18 19:13, Dennis Glatting wrote:

I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
changing MAXCPU in param.h (apparently) and recompiling.

What do I need to do to enable the other 32 cores?


Try FreeBSD 9.x. MAXCPU is 64 there.



Not an option.

ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
not hang under 8.4. This (and one other 4 socket) is a production
system.




Does this seem to describe the problem?

http://www.freebsd.org/cgi/query-pr.cgi?pr=177536

You can try switching to 9-stable.  Looking at the source for 8.4, it's 
not being fixed in the 8 series.



Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Paul Kraus
On May 19, 2013, at 11:51 AM, Dennis Glatting free...@pki2.com wrote:

 ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
 not hang under 8.4. This (and one other 4 socket) is a production
 system.

Can you be more specific? I have been running multi-CPU, all-ZFS 9.0 and 9.1
systems with no (CPU-related*) issues.

* I say no CPU-related issues because I have run into SATA timeout issues with
an external SATA enclosure with 4 drives (I know, SATA port expanders are evil,
but it is my best option here). Sometimes the zpool hangs hard, sometimes it just
becomes unresponsive for a while. My fix, such as it is, is to tune the ZFS
per-vdev queue depth as follows:

vfs.zfs.vdev.min_pending=3
vfs.zfs.vdev.max_pending=5

The defaults are 5 and 10 respectively, and when I run with those I have the 
timeout issues, but only under very heavy I/O load. I only generate such load 
when migrating large amounts of data, which thankfully does not happen all that 
often.
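
For reference, the two values above could be written out as /boot/loader.conf lines with a small helper. This is only a sketch: the function name emit_vdev_queue_tunables is made up for illustration, the check that the minimum does not exceed the maximum is an assumed sanity rule, and the thread does not say whether these tunables were set at boot or at runtime.

```shell
#!/bin/sh
# Hypothetical helper: print loader.conf-style lines for the per-vdev
# queue depth tunables named in the message above.
# Usage: emit_vdev_queue_tunables MIN MAX
emit_vdev_queue_tunables() {
    min=$1
    max=$2
    # Reject empty or non-numeric arguments.
    for v in "$min" "$max"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "values must be non-negative integers" >&2
                return 1
                ;;
        esac
    done
    # Assumed sanity rule: the minimum must not exceed the maximum.
    if [ "$min" -gt "$max" ]; then
        echo "min must not exceed max" >&2
        return 1
    fi
    printf 'vfs.zfs.vdev.min_pending="%s"\n' "$min"
    printf 'vfs.zfs.vdev.max_pending="%s"\n' "$max"
}

# The reduced values from the message (the stated defaults are 5 and 10).
emit_vdev_queue_tunables 3 5
```

The output can be reviewed and then appended to /boot/loader.conf by hand, so nothing is changed on the system until the values have been checked.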

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company



Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Dennis Glatting
On Sun, 2013-05-19 at 16:28 -0400, Paul Kraus wrote:
 On May 19, 2013, at 11:51 AM, Dennis Glatting free...@pki2.com wrote:
 
  ZFS hangs on multi-socket systems (Tyan, Supermicro) under 9.1. ZFS does
  not hang under 8.4. This (and one other 4 socket) is a production
  system.
 
   Can you be more specific, I have been running 9.0 and 9.1 systems with
 multi-CPU and all ZFS with no (CPU related*) issues.
 

I have (down to) ten FreeBSD/ZFS systems. Five of them are multi-socket
populated. All are AMD CPUs of the 6200 series. Two of those
multi-socketed systems are simply workstations and don't do much file
I/O, so I have yet to see them fault.

The remaining three perform significant I/O in the 1-8TB (simultaneous)
file range, including sorting, compression, backup, etc. (ZFS compression
is enabled on some data sets, as is dedup on a few minor data sets). I
also do iSCSI and NFS from one of these systems.

Simply put, if I run 9.1 on those three busy systems, ZFS will eventually
hang under load (within ten hours to a few days), whereas it does not
under 8.3/4. Two of those systems are 4x16 cores, one 2x16, and two 2x8
cores. Multiple simultaneous pbzip2 runs on individual 2-5TB ASCII
files generally cause a hang within 10-20 hours.

Hang means the system is alive and on the network but disk I/O has
stopped. Run any command except statically linked executables on a
memory volume and they will not run (no output or return to command
prompt). This includes reboot, which never really reboots.

The volumes where work is performed are typically 12-33TB RAIDz2
volumes. For example:

root@mc:~ # zpool list disk-1
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
disk-1  16.2T  5.86T  10.4T    36%  1.32x  ONLINE  -

root@mc:~ # zpool status disk-1
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55 2013
config:

        NAME        STATE     READ WRITE CKSUM
        disk-1      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
        cache
          da0       ONLINE       0     0     0

errors: No known data errors


 * I say no CPU related issues because I have run into SATA timeout
 issues with an external SATA enclosure with 4 drives (I know, SATA port
 expanders are evil, but it is my best option here). Sometimes the zpool
 hangs hard, sometimes just becomes unresponsive for a while. My fix,
 such as it is, is to tune the zfs per vdev queue depth as follows:
 
 vfs.zfs.vdev.min_pending=3
 vfs.zfs.vdev.max_pending=5
 

I've not tried those. Currently, these are mine:

vfs.zfs.write_limit_override=1G
vfs.zfs.arc_max=8G
vfs.zfs.txg.timeout=15
vfs.zfs.cache_flush_disable=1

# Recommended from the net
# April, 2013
vfs.zfs.l2arc_norw=0            # Default is 1
vfs.zfs.l2arc_feed_again=0      # Default is 1
vfs.zfs.l2arc_noprefetch=0      # Default is 0
vfs.zfs.l2arc_feed_min_ms=1000  # Default is 200


 The defaults are 5 and 10 respectively, and when I run with those I
 have the timeout issues, but only under very heavy I/O load. I only
 generate such load when migrating large amounts of data, which
 thankfully does not happen all that often.
 

Two days ago, when the 9.1 system hung, I was able to run a static
procstat, and while it ran the console (inadvertently?) printed that da0
wasn't responsive. Unfortunately I didn't have a static camcontrol ready,
so I was unable to query the device.

That said, according to the criteria from
https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS
problem, yet hung it was.

I have since (today) updated the firmware of most of the devices in that
system, and it is currently running some tasks. Most of the disks in that
system are Seagate. The devices not yet updated are three WD disks
(RAID1 OS and a swap disk), because I haven't been able to figure out
the WD firmware download, and an SSD where the manufacturer indicates
the firmware diff is minor, though I plan to go back and flash it
anyway.

If my 4x16 system ever finishes, I will be updating its devices' firmware
too, but it is an 8.4-P system and doesn't give me any trouble. Another
4x16 system gave me ZFS trouble under 9.1, but since I downgraded it to
8.4-P it has been stable as a rock for the past 22 days, often under
heavy load.







Re: More than 32 CPUs under 8.4-P

2013-05-19 Thread Dennis Glatting

Minutes after I typed that message, the 2x16 system panicked with the
following backtrace:

kdb_backtrace
panic
vdev_deadman
vdev_deadman
vdev_deadman
spa_deadman
softclock
intr_event_execute_handlers
ithread_loop
fork_exit
fork_trampoline

I had just created a memory disk when that happened:

root@iirc:~ # mdconfig -a -t swap -s 1g -u 1
root@iirc:~ # newfs -U /dev/md1
root@iirc:~ # mount /dev/md1 /mnt
root@iirc:~ # cp -p procstat kgdb /mnt
root@iirc:~ # cd /rescue/
root@iirc:/rescue # cp -p * /mnt
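
Since only statically linked executables kept running during the hangs described earlier in the thread, it may be worth confirming that the tools copied to the memory disk really are static. A rough sketch, assuming file(1) is available and that its description contains the phrase "statically linked" for such binaries; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file(1) description read from stdin
# reports a statically linked executable.
is_static_description() {
    grep -q 'statically linked'
}

# Hypothetical helper: list regular files in a directory whose file(1)
# description does not say "statically linked".
report_non_static() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        file "$f" | is_static_description || echo "not static: $f"
    done
}

# Example call for the memory disk mounted above (left commented out
# so that the sketch has no side effects when sourced):
# report_non_static /mnt
```

Anything reported by report_non_static would depend on shared libraries from the root file system and so might not start once disk I/O has stopped.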








More than 32 CPUs under 8.4-P

2013-05-18 Thread Dennis Glatting

I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
changing MAXCPU in param.h (apparently) and recompiling.

What do I need to do to enable the other 32 cores?


Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.4-PRERELEASE #0 r250401: Wed May  8 21:46:23 PDT 2013
    root@mc:/disk-2/obj/disk-1/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Opteron(TM) Processor 6274  (2200.04-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x600f12  Family = 15  Model = 1  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,b23,b24>
  TSC: P-state invariant
real memory  = 137438953472 (131072 MB)
avail memory = 132427862016 (126293 MB)
ACPI APIC Table: <120911 APIC1027>
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 2 package(s) x 16 core(s)
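
The shortfall can be read straight from the boot messages. A minimal sketch, assuming the "Multiprocessor System Detected" line has exactly the form shown above; the function name detected_cpus is hypothetical, and the sample input is copied from this boot log rather than taken from a live system.

```shell
#!/bin/sh
# Hypothetical helper: extract the CPU count from the
# "FreeBSD/SMP: Multiprocessor System Detected: N CPUs" line on stdin.
detected_cpus() {
    sed -n 's/^FreeBSD\/SMP: Multiprocessor System Detected: \([0-9][0-9]*\) CPUs.*$/\1/p'
}

# Expected count from the question: 4 sockets x 16 cores.
expected=$((4 * 16))

# Sample input copied from the boot log above.
found=$(echo 'FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs' | detected_cpus)

if [ "$found" -lt "$expected" ]; then
    echo "only $found of $expected CPUs detected"
fi
```

On a running system the same function could be fed from `dmesg`, and `sysctl -n hw.ncpu` reports the number of CPUs the kernel is actually using.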




Re: More than 32 CPUs under 8.4-P

2013-05-18 Thread Ivan Klymenko
On Sat, 18 May 2013 10:13:08 -0700
Dennis Glatting free...@pki2.com wrote:

 
 I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
 CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
 changing MAXCPU in param.h (apparently) and recompiling.

Oops, sorry :)

Re: More than 32 CPUs under 8.4-P

2013-05-18 Thread Dennis Glatting
On Sat, 2013-05-18 at 20:24 +0300, Ivan Klymenko wrote:
 On Sat, 18 May 2013 10:13:08 -0700
 Dennis Glatting free...@pki2.com wrote:
 
  
  I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
  CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
  changing MAXCPU in param.h (apparently) and recompiling.
 
 Oops, sorry :)
 

During the boot sequence the kernel looped, printing errors to the
console, and never reached a command prompt.





Re: More than 32 CPUs under 8.4-P

2013-05-18 Thread Ivan Klymenko
On Sat, 18 May 2013 10:13:08 -0700
Dennis Glatting free...@pki2.com wrote:

 
 I have a 4x16=64 core server running FreeBSD 8.4-P but only two of the
 CPUs (2x16=32) are enabled. Enabling the other 32 isn't as simple as
 changing MAXCPU in param.h (apparently) and recompiling.
 
 What do I need to do to enable the other 32 cores?
 
 

http://svnweb.freebsd.org/base/releng/8.4/sys/amd64/include/param.h?revision=248810&view=markup

Set the MAXCPU parameter to 64.
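
Before editing, it may help to see which MAXCPU values the header currently defines. A minimal sketch: the function name maxcpu_values is hypothetical, and the sample text below is illustrative only, not a verbatim copy of the 8.4 header. Note that, as reported elsewhere in this thread, raising the value alone produced a kernel that looped printing errors at boot.

```shell
#!/bin/sh
# Hypothetical helper: print every numeric MAXCPU value defined in a
# C header read from stdin.
maxcpu_values() {
    sed -n 's/^#[[:space:]]*define[[:space:]][[:space:]]*MAXCPU[[:space:]][[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Illustrative input only; not a verbatim copy of any FreeBSD header.
sample='#ifdef SMP
#define MAXCPU 32
#else
#define MAXCPU 1
#endif'

printf '%s\n' "$sample" | maxcpu_values
```

Against a real source tree the function would be run as `maxcpu_values < /usr/src/sys/amd64/include/param.h`, where the /usr/src prefix is an assumption about where the sources are checked out.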