Sam - Here's the output from netpipe between one client and one server:
[r...@lps-246 bin]# ./nplaunch ../NPtcp -h lps-246-compute-2
../NPtcp -h lps-246-compute-2
[1] 4534
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0: 1 bytes 1964 times --> 0.15 Mbps in 51.87 usec
  1: 2 bytes 1927 times --> 0.29 Mbps in 51.95 usec
  2: 3 bytes 1924 times --> 0.44 Mbps in 51.93 usec
  3: 4 bytes 1283 times --> 0.59 Mbps in 51.84 usec
  4: 6 bytes 1446 times --> 0.88 Mbps in 51.87 usec
  5: 8 bytes 964 times --> 1.18 Mbps in 51.89 usec
  6: 12 bytes 1204 times --> 1.76 Mbps in 51.88 usec
  7: 13 bytes 803 times --> 1.91 Mbps in 51.79 usec
  8: 16 bytes 891 times --> 2.35 Mbps in 52.02 usec
  9: 19 bytes 1081 times --> 2.79 Mbps in 52.01 usec
 10: 21 bytes 1214 times --> 3.07 Mbps in 52.13 usec
 11: 24 bytes 1278 times --> 3.52 Mbps in 52.01 usec
 12: 27 bytes 1361 times --> 3.96 Mbps in 52.04 usec
 13: 29 bytes 853 times --> 4.25 Mbps in 52.04 usec
 14: 32 bytes 927 times --> 4.69 Mbps in 52.07 usec
 15: 35 bytes 1020 times --> 5.14 Mbps in 52.00 usec
 16: 45 bytes 1098 times --> 6.58 Mbps in 52.20 usec
 17: 48 bytes 1277 times --> 7.01 Mbps in 52.21 usec
 18: 51 bytes 1316 times --> 7.47 Mbps in 52.11 usec
 19: 61 bytes 752 times --> 8.91 Mbps in 52.23 usec
 20: 64 bytes 941 times --> 9.34 Mbps in 52.30 usec
 21: 67 bytes 985 times --> 9.78 Mbps in 52.26 usec
 22: 93 bytes 1028 times --> 13.50 Mbps in 52.57 usec
 23: 96 bytes 1268 times --> 13.93 Mbps in 52.56 usec
 24: 99 bytes 1288 times --> 14.37 Mbps in 52.57 usec
 25: 125 bytes 691 times --> 18.13 Mbps in 52.59 usec
 26: 128 bytes 943 times --> 18.54 Mbps in 52.68 usec
 27: 131 bytes 963 times --> 18.97 Mbps in 52.69 usec
 28: 189 bytes 985 times --> 26.85 Mbps in 53.70 usec
 29: 192 bytes 1241 times --> 27.16 Mbps in 53.94 usec
 30: 195 bytes 1245 times --> 27.40 Mbps in 54.30 usec
 31: 253 bytes 642 times --> 31.53 Mbps in 61.23 usec
 32: 256 bytes 813 times --> 31.89 Mbps in 61.25 usec
 33: 259 bytes 822 times --> 32.82 Mbps in 60.20 usec
 34: 381 bytes 846 times --> 43.84 Mbps in 66.31 usec
 35: 384 bytes 1005 times --> 44.14 Mbps in 66.37 usec
 36: 387 bytes 1008 times --> 44.35 Mbps in 66.57 usec
 37: 509 bytes 512 times --> 55.88 Mbps in 69.49 usec
 38: 512 bytes 718 times --> 56.08 Mbps in 69.66 usec
 39: 515 bytes 720 times --> 56.42 Mbps in 69.64 usec
 40: 765 bytes 724 times --> 77.61 Mbps in 75.20 usec
 41: 768 bytes 886 times --> 77.89 Mbps in 75.22 usec
 42: 771 bytes 887 times --> 78.17 Mbps in 75.25 usec
 43: 1021 bytes 448 times --> 95.64 Mbps in 81.45 usec
 44: 1024 bytes 613 times --> 96.04 Mbps in 81.35 usec
 45: 1027 bytes 615 times --> 96.29 Mbps in 81.37 usec
 46: 1533 bytes 617 times --> 118.90 Mbps in 98.37 usec
 47: 1536 bytes 677 times --> 118.75 Mbps in 98.68 usec
 48: 1539 bytes 676 times --> 119.00 Mbps in 98.67 usec
 49: 2045 bytes 339 times --> 153.16 Mbps in 101.87 usec
 50: 2048 bytes 490 times --> 152.82 Mbps in 102.25 usec
 51: 2051 bytes 489 times --> 153.41 Mbps in 102.00 usec
 52: 3069 bytes 491 times --> 195.25 Mbps in 119.92 usec
 53: 3072 bytes 555 times --> 195.44 Mbps in 119.92 usec
 54: 3075 bytes 556 times --> 196.04 Mbps in 119.67 usec
 55: 4093 bytes 279 times --> 241.11 Mbps in 129.52 usec
 56: 4096 bytes 385 times --> 241.18 Mbps in 129.57 usec
 57: 4099 bytes 386 times --> 241.85 Mbps in 129.31 usec
 58: 6141 bytes 387 times --> 313.92 Mbps in 149.25 usec
 59: 6144 bytes 446 times --> 313.39 Mbps in 149.57 usec
 60: 6147 bytes 445 times --> 313.58 Mbps in 149.55 usec
 61: 8189 bytes 223 times --> 376.78 Mbps in 165.82 usec
 62: 8192 bytes 301 times --> 376.76 Mbps in 165.89 usec
 63: 8195 bytes 301 times --> 377.01 Mbps in 165.84 usec
 64: 12285 bytes 301 times --> 466.20 Mbps in 201.04 usec
 65: 12288 bytes 331 times --> 467.01 Mbps in 200.75 usec
 66: 12291 bytes 332 times --> 467.81 Mbps in 200.45 usec
 67: 16381 bytes 166 times --> 525.68 Mbps in 237.74 usec
 68: 16384 bytes 210 times --> 526.26 Mbps in 237.53 usec
 69: 16387 bytes 210 times --> 526.45 Mbps in 237.48 usec
 70: 24573 bytes 210 times --> 606.69 Mbps in 309.02 usec
 71: 24576 bytes 215 times --> 605.94 Mbps in 309.43 usec
 72: 24579 bytes 215 times --> 606.69 Mbps in 309.09 usec
 73: 32765 bytes 107 times --> 656.41 Mbps in 380.82 usec
 74: 32768 bytes 131 times --> 654.14 Mbps in 382.18 usec
 75: 32771 bytes 130 times --> 655.71 Mbps in 381.30 usec
 76: 49149 bytes 131 times --> 717.66 Mbps in 522.50 usec
 77: 49152 bytes 127 times --> 718.85 Mbps in 521.67 usec
 78: 49155 bytes 127 times --> 716.82 Mbps in 523.17 usec
 79: 65533 bytes 63 times --> 749.16 Mbps in 667.38 usec
 80: 65536 bytes 74 times --> 750.34 Mbps in 666.36 usec
 81: 65539 bytes 75 times --> 748.70 Mbps in 667.85 usec
 82: 98301 bytes 74 times --> 796.11 Mbps in 942.05 usec
 83: 98304 bytes 70 times --> 797.44 Mbps in 940.52 usec
 84: 98307 bytes 70 times --> 796.58 Mbps in 941.56 usec
 85: 131069 bytes 35 times --> 819.79 Mbps in 1219.80 usec
 86: 131072 bytes 40 times --> 819.94 Mbps in 1219.60 usec
 87: 131075 bytes 40 times --> 820.30 Mbps in 1219.09 usec
 88: 196605 bytes 41 times --> 839.50 Mbps in 1786.76 usec
 89: 196608 bytes 37 times --> 839.81 Mbps in 1786.12 usec
 90: 196611 bytes 37 times --> 840.53 Mbps in 1784.61 usec
 91: 262141 bytes 18 times --> 851.70 Mbps in 2348.22 usec
 92: 262144 bytes 21 times --> 852.22 Mbps in 2346.81 usec
 93: 262147 bytes 21 times --> 852.35 Mbps in 2346.48 usec
 94: 393213 bytes 21 times --> 864.02 Mbps in 3472.12 usec
 95: 393216 bytes 19 times --> 864.67 Mbps in 3469.55 usec
 96: 393219 bytes 19 times --> 863.81 Mbps in 3473.02 usec
 97: 524285 bytes 9 times --> 871.33 Mbps in 4590.67 usec
 98: 524288 bytes 10 times --> 871.13 Mbps in 4591.75 usec
 99: 524291 bytes 10 times --> 871.46 Mbps in 4590.00 usec
100: 786429 bytes 10 times --> 878.64 Mbps in 6828.69 usec
101: 786432 bytes 9 times --> 879.35 Mbps in 6823.22 usec
102: 786435 bytes 9 times --> 879.40 Mbps in 6822.89 usec
103: 1048573 bytes 4 times --> 883.66 Mbps in 9053.23 usec
104: 1048576 bytes 5 times --> 884.31 Mbps in 9046.60 usec
105: 1048579 bytes 5 times --> 884.45 Mbps in 9045.20 usec
106: 1572861 bytes 5 times --> 888.60 Mbps in 13504.41 usec
107: 1572864 bytes 4 times --> 888.71 Mbps in 13502.75 usec
108: 1572867 bytes 4 times --> 888.76 Mbps in 13502.00 usec
109: 2097149 bytes 3 times --> 891.10 Mbps in 17955.34 usec
110: 2097152 bytes 3 times --> 891.30 Mbps in 17951.33 usec
111: 2097155 bytes 3 times --> 891.17 Mbps in 17954.03 usec
112: 3145725 bytes 3 times --> 893.47 Mbps in 26861.51 usec
113: 3145728 bytes 3 times --> 893.33 Mbps in 26865.84 usec
114: 3145731 bytes 3 times --> 893.47 Mbps in 26861.47 usec
115: 4194301 bytes 3 times --> 894.52 Mbps in 35773.16 usec
116: 4194304 bytes 3 times --> 894.50 Mbps in 35774.15 usec
117: 4194307 bytes 3 times --> 894.55 Mbps in 35772.16 usec
118: 6291453 bytes 3 times --> 895.59 Mbps in 53596.18 usec
119: 6291456 bytes 3 times --> 895.64 Mbps in 53593.16 usec
120: 6291459 bytes 3 times --> 895.58 Mbps in 53596.34 usec
121: 8388605 bytes 3 times --> 896.17 Mbps in 71414.67 usec
122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec
123: 8388611 bytes 3 times --> 896.14 Mbps in 71417.49 usec

I'll run it on each node and let you know if anything is out of place. I believe the above results are fine for GigE, yes?

- Dave

On Wed, Jul 1, 2009 at 4:20 PM, Sam Lang <sl...@mcs.anl.gov> wrote:

>
> David,
> It sounds like your initial thought (that there is a network
> problem) could be correct. I would probably explore that first. What sort
> of numbers do you get from netpipe runs (or even bmi_pingpong) between
> client and server?
>
> -sam
>
> On Jul 1, 2009, at 5:15 PM, David Bonnie wrote:
>
> Sorry for not being clear.
>
> The hardware and software are unchanged. Runs from a few months ago (on
> 2.8.0) performed as expected. Current runs (on both 2.8.0 and 2.8.1) are
> slow.
>
> The nodes are sitting there with very low CPU usage even when running the
> benchmark. I'm the only one running any jobs and there aren't any processes
> running (the system load is < .02 and the CPU usage is pretty much 0%).
>
> The local disks haven't changed and are empty except for the pvfs2 storage
> space; performance is bad even when I put the PVFS2 file system storage onto
> a very fast (>300 MB/s local bandwidth) Atrato vlun connected over Fibre
> Channel.
>
> My initial thought is that some hardware along the line died but I can't
> seem to pinpoint it. All of the network interfaces show 0 errors and 0
> dropped packets.
>
> - Dave
>
> On Wed, Jul 1, 2009 at 4:10 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
>
>> Hi David,
>>
>> I still don't get it: when was the performance good? Same software and
>> hardware, just some time in the past? Or is there a software change?
>>
>> The nodes aren't being used for anything else, there are no rogue
>> processes, and the local file systems are otherwise empty?
>>
>> Thanks,
>>
>> Rob
>>
>> On Jul 1, 2009, at 5:05 PM, David Bonnie wrote:
>>
>> Rob -
>>>
>>> Performance is down across all PVFS2 installations. The benchmark simply
>>> creates files of a random size (between 1 and 25 MB) in a single folder on
>>> the mounted PVFS2 partition, 16 KB at a time. It's not anywhere near ideal,
>>> but it's the workload I'm working with.
>>>
>>> Prior to this problem we were getting ~22 MB/s write throughput and we're
>>> down to about 2.5 MB/s for no apparent reason. Reads are down from about 55
>>> MB/s to 30 MB/s. No hardware has changed and as far as I can tell no
>>> hardware has died either.
>>>
>>> - Dave
>>>
>>>
>>> On Wed, Jul 1, 2009 at 4:00 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
>>> Do you mean that 2.8.0 is fast and 2.8.1 is slow? Can you describe the
>>> benchmark and how you are doing your measurements?
>>>
>>> Rob
>>>
>>>
>>> On Jul 1, 2009, at 4:43 PM, David Bonnie wrote:
>>>
>>> Hello all -
>>>
>>> I'm having trouble figuring out a problem with performance degradation on
>>> a simple 10-node cluster. Prior runs on the cluster (before this problem
>>> manifested itself) resulted in bandwidth and IOPS about 10 times higher on a
>>> small file creation workload. Each node is running as a metadata server and
>>> a data server.
>>>
>>> The problem is persistent between versions and installations of PVFS2
>>> 2.8.0 and 2.8.1. Rebooting all of the nodes didn't improve anything. The
>>> network connections (simple GigE) showed no errors or dropped packets.
>>> Using different physical disks (both SAS and FC) didn't improve things.
>>> The kernel logs didn't show anything out of place nor did the pvfs2 server
>>> or client logs. It seems like a network issue but I can't seem to find
>>> anything wrong with any of the connections.
>>>
>>> Has anyone seen this kind of problem before? I seem to remember
>>> something on the list before about performance suddenly dropping but I can't
>>> find the message now (of course). Any insight would be appreciated!
>>>
>>> Thanks,
>>>
>>> - Dave
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers@beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
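
For anyone trying to reproduce the workload described earlier in the thread (files of a random size between 1 and 25 MB, written 16 KB at a time into a single folder), here is a minimal Python sketch. It is not the benchmark that was actually run; the function name, the file naming scheme, and the seed are all made up for illustration, and it does not call fsync, so on a local file system the page cache can inflate the numbers.

```python
import os
import random
import time

MB = 1024 * 1024
CHUNK = 16 * 1024  # 16 KB per write, as described in the thread


def write_random_files(directory, count, min_bytes=1 * MB, max_bytes=25 * MB,
                       chunk=CHUNK, seed=0):
    """Create `count` files of random size in `directory`, `chunk` bytes per write.

    Hypothetical helper: returns (total_bytes_written, elapsed_seconds).
    """
    rng = random.Random(seed)
    view = memoryview(b"\0" * chunk)
    total = 0
    start = time.perf_counter()
    for i in range(count):
        size = rng.randint(min_bytes, max_bytes)
        path = os.path.join(directory, f"file_{i:06d}")
        # buffering=0 so each write() call maps to one write of up to `chunk` bytes
        with open(path, "wb", buffering=0) as f:
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                # raw writes may be short; subtract what was actually written
                remaining -= f.write(view[:n])
        total += size
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Pointing `directory` at the PVFS2 mount point (whatever that is on a given cluster) and dividing `total` by `elapsed` gives a write throughput figure comparable in spirit to the ~22 MB/s and ~2.5 MB/s numbers quoted above.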
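
On the question of whether the NetPIPE numbers at the top of this message are fine for GigE: a rough way to check is to compare the peak throughput against what TCP can carry on a 1 Gbit/s link. The sketch below assumes a standard 1500-byte MTU, 38 bytes of per-frame Ethernet overhead (preamble, header, FCS, inter-frame gap), and 52 bytes of IP plus TCP header with timestamps; none of those settings are stated in the thread, and the function names are made up for illustration.

```python
import re

# One NetPIPE result row, e.g.
# "122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec"
ROW = re.compile(
    r"(\d+):\s+(\d+) bytes\s+(\d+) times\s+-->\s+([\d.]+) Mbps in\s+([\d.]+) usec"
)


def parse_netpipe(text):
    """Return a list of (size_bytes, mbps, usec) tuples found in NetPIPE output."""
    return [(int(m.group(2)), float(m.group(4)), float(m.group(5)))
            for m in ROW.finditer(text)]


def summarize(rows, line_rate_mbps=1000.0, mtu=1500,
              tcp_ip_header=52, frame_overhead=38):
    """Compare the peak measured rate with an assumed TCP-over-Ethernet ceiling."""
    _, _, small_usec = min(rows)            # row with the smallest message size
    peak = max(mbps for _, mbps, _ in rows)
    # Payload bytes per frame divided by bytes on the wire per frame (assumed values)
    ceiling = line_rate_mbps * (mtu - tcp_ip_header) / (mtu + frame_overhead)
    return {
        "small_msg_usec": small_usec,
        "peak_mbps": peak,
        "tcp_ceiling_mbps": ceiling,
        "fraction_of_ceiling": peak / ceiling,
    }


SAMPLE = """\
  0: 1 bytes 1964 times --> 0.15 Mbps in 51.87 usec
 62: 8192 bytes 301 times --> 376.76 Mbps in 165.89 usec
122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec
"""

if __name__ == "__main__":
    print(summarize(parse_netpipe(SAMPLE)))
```

Under those assumptions the ceiling works out to roughly 941 Mbps, so a peak of about 896 Mbps is close to what the link can deliver for large messages. That only covers this one client/server pair; it says nothing about the other nodes.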