Hi Jim, here are the answers to your questions:

> What size and type of server?
SUNW,Sun-Fire-V240, Memory size: 2048 Megabytes

> What size and type of storage?

SAN-attached storage array, dual-path 2 Gb FC connection, 4 LUNs of 96 GB each:

# mpathadm list lu
        /dev/rdsk/c3t001738010140003Dd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c3t001738010140003Cd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c3t001738010140003Bd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c3t001738010140002Dd0s2
                Total Path Count: 2
                Operational Path Count: 2

> What release of Solaris?

Solaris 10 11/06 s10s_u3wos_05a SPARC, all patches installed.

> How many networks, and what type?

2 GbE interfaces, point-to-point between the NFS client and server: bge1-bge1, bge2-bge2.

> What is being used to generate the load for the testing?

SPEC SFS97_R1 (http://www.spec.org/sfs97r1/).

> What is the zpool configuration?

# zpool status tank1
  pool: tank1
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank1                    ONLINE       0     0     0
          c3t001738010140003Bd0  ONLINE       0     0     0
          c3t001738010140003Cd0  ONLINE       0     0     0
          c3t001738010140002Dd0  ONLINE       0     0     0
          c3t001738010140003Dd0  ONLINE       0     0     0

errors: No known data errors

I've created 160 ZFS filesystems in this zpool (a sketch of how that layout can be rebuilt follows the zpool iostat data below).

> What do the system stats look like while under load (e.g. mpstat), and how do they change when you see this behavior?

Under normal load:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    2    29   24 4236    0  589  489    0    38    1  10   0  89
  1    0   0    2  1772 1666 4034    3  604  512    0    25    0  11   0  89
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   97  2849 2205 4183   23  287  614    0    12    0  75   0  25
  1    0   0  639  5406 5206 3877   56  287  638    0    18    0  75   0  25
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  504   198  122 4955    0  730  246    0    57    0  15   0  85
  1    0   0    6  2154 2049 4512    5  697  345    0   166    0  16   0  84
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    5    39   31 4108    2  603  261    0    44    0  10   0  90
  1    0   0    4  1899 1791 3967    6  604  327    0    51    1  12   0  87

When I see this behavior:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    8   0  123  1065  836 3677   12  687  426    2    85    0  35   0  65
  1    7   0  276  2444 2266 3602   23  686  464    3    79    0  36   0  64
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  160  1400  971 6710    7 1395  608    3    12    0  59   0  41
  1    0   0  412  3733 3499 6596   28 1390  637    3    33    0  63   0  37

prstat | grep nfsd
   592 daemon     12M   10M sleep   60  -20   0:57:57  24% nfsd/1028
   592 daemon     12M   10M sleep   60  -20   0:57:59  24% nfsd/1028
   592 daemon     12M   10M sleep   60  -20   0:58:02  25% nfsd/1028

> What does "zpool iostat <zpool_name> 1" data look like while under load?

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank1       60.5G   321G  2.29K    675   144M  4.00M
tank1       60.5G   321G    496  2.45K  29.0M  9.26M
tank1       60.5G   321G    655  2.64K  40.6M  14.9M
tank1       60.5G   321G  2.73K    534   174M  6.96M
tank1       60.5G   321G  2.42K    719   151M  4.59M
tank1       60.5G   321G     32  2.91K  1.99M  14.8M
tank1       60.5G   321G     12  2.46K   824K  12.6M
tank1       60.5G   321G     63  1.83K  3.84M  13.9M
tank1       60.5G   321G  2.50K    903   150M  14.3M
tank1       60.5G   321G  2.93K    414   180M  10.1M
tank1       60.5G   321G  1.95K    998   124M  5.78M
tank1       60.5G   321G    164  2.65K  10.0M  12.1M
tank1       60.5G   321G    959  1.98K  58.6M  12.3M
tank1       60.5G   321G  2.82K    477   178M  7.56M
tank1       60.5G   321G    338  2.46K  20.8M  13.7M
tank1       60.5G   321G    166  3.01K  10.6M  17.8M
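In case it helps to reproduce the setup, here is a rough sketch of how the pool and the 160 filesystems can be created and shared. The fs1..fs160 names and the sharenfs-only settings are assumptions for illustration; the exact dataset names and properties I used may differ:

    #!/usr/bin/ksh
    # Build the pool from the four LUNs shown in the zpool status output above.
    zpool create tank1 c3t001738010140003Bd0 c3t001738010140003Cd0 \
        c3t001738010140002Dd0 c3t001738010140003Dd0

    # Create 160 filesystems and export each one over NFS
    # (fs1..fs160 are placeholder names).
    i=1
    while [ $i -le 160 ]; do
        zfs create tank1/fs$i
        zfs set sharenfs=on tank1/fs$i
        i=$((i + 1))
    done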
> Are you collecting nfsstat data - what is the rate of incoming NFS ops?

Server nfs:
calls     badcalls
18673     0
Version 3: (19090 calls)
null        getattr     setattr     lookup      access      readlink
0 0%        2178 11%    209 1%      5214 27%    1343 7%     1368 7%
read        write       create      mkdir       symlink     mknod
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
remove      rmdir       rename      link        readdir     readdirplus
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%
fsstat      fsinfo      pathconf    commit
0 0%        0 0%        0 0%        0 0%

> Can you characterize the load - read/write data intensive, metadata intensive?

There is a standard mix of NFS ops defined by the SPEC SFS benchmark:

SFS Aggregate Results for 1 Client(s), Thu Mar 1 18:28:04 2007
NFS Protocol Version 3

------------------------------------------------------------------------------
NFS          Target Actual    NFS      NFS    Mean      Std Dev   Std Error    Pcnt
Op           NFS    NFS       Op       Op     Response  Response  of Mean,95%  of
Type         Mix    Mix       Success  Error  Time      Time      Confidence   Total
             Pcnt   Pcnt      Count    Count  Msec/Op   Msec/Op   +- Msec/Op   Time
------------------------------------------------------------------------------
getattr      11%    11.1%      66693      0     1.18      1.50       0.01       5.9%
setattr       1%     1.0%       6097      0     3.54      3.46       0.05       1.6%
lookup       27%    27.5%     164643      0     1.98      2.62       0.01      24.5%
readlink      7%     7.1%      42300      0     0.94      0.99       0.01       3.0%
read         18%    17.6%     105538      0     3.43      3.60       0.01      27.3%
write         9%     8.8%      52640      0     2.68      3.03       0.01      10.6%
create        1%     1.0%       6194      0     4.50      6.45       0.06       2.1%
remove        1%     1.0%       5922      0     5.86      6.22       0.06       2.6%
readdir       2%     2.0%      12261      0     2.46      1.85       0.02       2.3%
fsstat        1%     1.0%       6031      0     0.83      0.65       0.02       0.4%
access        7%     7.2%      42826      0     0.85      0.69       0.01       2.7%
commit        5%     4.5%      27218      0     2.73      3.47       0.02       5.6%
fsinfo        1%     1.0%       6010      0     0.84      0.72       0.02       0.4%
readdirplus   9%     9.1%      54513      0     2.66      2.26       0.01      10.9%
------------------------------------------------------------------------------

> Are the client machines Solaris, or something else?

Solaris, with the same configuration as the server.

> Does this last for seconds, minutes, tens-of-minutes?

When it becomes bad (in my case, when the requested number of IOPS is 4000), it never gets back to normal behavior for the duration of the load.

[b]Just for comparison: 160 UFS filesystems on the same 4 LUNs, concatenated by SVM and divided into 160 soft partitions, work fine under a load of 11000 SFS IOPS, and the number of NFS threads never jumps that high (see the sketch in the P.S. below).[/b]

> Does the system remain in this state indefinitely until reboot, or does it normalize?

It remains in this state as long as the load continues.

> Can you consistently reproduce this problem?

Yes, I can.

Thanks in advance,
--
Leon
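P.S. For the UFS/SVM comparison, here is a rough sketch of how that layout can be rebuilt. The replica slice, metadevice numbers, soft-partition size (about 2 GB so that 160 of them fit in the roughly 384 GB concat) and mount points are assumptions for illustration, not the exact ones I used; the default metadevice limit (nmd in /kernel/drv/md.conf) may also need to be raised to hold 161 metadevices:

    #!/usr/bin/ksh
    # State database replicas must exist before any metainit
    # (the replica slice c1t0d0s7 is a placeholder).
    metadb -a -f -c 3 c1t0d0s7

    # Concatenate slices of the four LUNs into one large metadevice.
    metainit d100 4 1 c3t001738010140003Bd0s0 1 c3t001738010140003Cd0s0 \
        1 c3t001738010140002Dd0s0 1 c3t001738010140003Dd0s0

    # Carve 160 soft partitions out of the concat, then newfs, mount
    # and share each one.
    i=1
    while [ $i -le 160 ]; do
        metainit d$((100 + i)) -p d100 2g
        echo y | newfs /dev/md/rdsk/d$((100 + i))
        mkdir -p /export/ufs$i
        mount /dev/md/dsk/d$((100 + i)) /export/ufs$i
        share -F nfs /export/ufs$i
        i=$((i + 1))
    done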