Sam -

Here's the output from netpipe between one client and one server:

[r...@lps-246 bin]# ./nplaunch ../NPtcp -h lps-246-compute-2

../NPtcp -h lps-246-compute-2

[1] 4534
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:       1 bytes   1964 times -->      0.15 Mbps in      51.87 usec
  1:       2 bytes   1927 times -->      0.29 Mbps in      51.95 usec
  2:       3 bytes   1924 times -->      0.44 Mbps in      51.93 usec
  3:       4 bytes   1283 times -->      0.59 Mbps in      51.84 usec
  4:       6 bytes   1446 times -->      0.88 Mbps in      51.87 usec
  5:       8 bytes    964 times -->      1.18 Mbps in      51.89 usec
  6:      12 bytes   1204 times -->      1.76 Mbps in      51.88 usec
  7:      13 bytes    803 times -->      1.91 Mbps in      51.79 usec
  8:      16 bytes    891 times -->      2.35 Mbps in      52.02 usec
  9:      19 bytes   1081 times -->      2.79 Mbps in      52.01 usec
 10:      21 bytes   1214 times -->      3.07 Mbps in      52.13 usec
 11:      24 bytes   1278 times -->      3.52 Mbps in      52.01 usec
 12:      27 bytes   1361 times -->      3.96 Mbps in      52.04 usec
 13:      29 bytes    853 times -->      4.25 Mbps in      52.04 usec
 14:      32 bytes    927 times -->      4.69 Mbps in      52.07 usec
 15:      35 bytes   1020 times -->      5.14 Mbps in      52.00 usec
 16:      45 bytes   1098 times -->      6.58 Mbps in      52.20 usec
 17:      48 bytes   1277 times -->      7.01 Mbps in      52.21 usec
 18:      51 bytes   1316 times -->      7.47 Mbps in      52.11 usec
 19:      61 bytes    752 times -->      8.91 Mbps in      52.23 usec
 20:      64 bytes    941 times -->      9.34 Mbps in      52.30 usec
 21:      67 bytes    985 times -->      9.78 Mbps in      52.26 usec
 22:      93 bytes   1028 times -->     13.50 Mbps in      52.57 usec
 23:      96 bytes   1268 times -->     13.93 Mbps in      52.56 usec
 24:      99 bytes   1288 times -->     14.37 Mbps in      52.57 usec
 25:     125 bytes    691 times -->     18.13 Mbps in      52.59 usec
 26:     128 bytes    943 times -->     18.54 Mbps in      52.68 usec
 27:     131 bytes    963 times -->     18.97 Mbps in      52.69 usec
 28:     189 bytes    985 times -->     26.85 Mbps in      53.70 usec
 29:     192 bytes   1241 times -->     27.16 Mbps in      53.94 usec
 30:     195 bytes   1245 times -->     27.40 Mbps in      54.30 usec
 31:     253 bytes    642 times -->     31.53 Mbps in      61.23 usec
 32:     256 bytes    813 times -->     31.89 Mbps in      61.25 usec
 33:     259 bytes    822 times -->     32.82 Mbps in      60.20 usec
 34:     381 bytes    846 times -->     43.84 Mbps in      66.31 usec
 35:     384 bytes   1005 times -->     44.14 Mbps in      66.37 usec
 36:     387 bytes   1008 times -->     44.35 Mbps in      66.57 usec
 37:     509 bytes    512 times -->     55.88 Mbps in      69.49 usec
 38:     512 bytes    718 times -->     56.08 Mbps in      69.66 usec
 39:     515 bytes    720 times -->     56.42 Mbps in      69.64 usec
 40:     765 bytes    724 times -->     77.61 Mbps in      75.20 usec
 41:     768 bytes    886 times -->     77.89 Mbps in      75.22 usec
 42:     771 bytes    887 times -->     78.17 Mbps in      75.25 usec
 43:    1021 bytes    448 times -->     95.64 Mbps in      81.45 usec
 44:    1024 bytes    613 times -->     96.04 Mbps in      81.35 usec
 45:    1027 bytes    615 times -->     96.29 Mbps in      81.37 usec
 46:    1533 bytes    617 times -->    118.90 Mbps in      98.37 usec
 47:    1536 bytes    677 times -->    118.75 Mbps in      98.68 usec
 48:    1539 bytes    676 times -->    119.00 Mbps in      98.67 usec
 49:    2045 bytes    339 times -->    153.16 Mbps in     101.87 usec
 50:    2048 bytes    490 times -->    152.82 Mbps in     102.25 usec
 51:    2051 bytes    489 times -->    153.41 Mbps in     102.00 usec
 52:    3069 bytes    491 times -->    195.25 Mbps in     119.92 usec
 53:    3072 bytes    555 times -->    195.44 Mbps in     119.92 usec
 54:    3075 bytes    556 times -->    196.04 Mbps in     119.67 usec
 55:    4093 bytes    279 times -->    241.11 Mbps in     129.52 usec
 56:    4096 bytes    385 times -->    241.18 Mbps in     129.57 usec
 57:    4099 bytes    386 times -->    241.85 Mbps in     129.31 usec
 58:    6141 bytes    387 times -->    313.92 Mbps in     149.25 usec
 59:    6144 bytes    446 times -->    313.39 Mbps in     149.57 usec
 60:    6147 bytes    445 times -->    313.58 Mbps in     149.55 usec
 61:    8189 bytes    223 times -->    376.78 Mbps in     165.82 usec
 62:    8192 bytes    301 times -->    376.76 Mbps in     165.89 usec
 63:    8195 bytes    301 times -->    377.01 Mbps in     165.84 usec
 64:   12285 bytes    301 times -->    466.20 Mbps in     201.04 usec
 65:   12288 bytes    331 times -->    467.01 Mbps in     200.75 usec
 66:   12291 bytes    332 times -->    467.81 Mbps in     200.45 usec
 67:   16381 bytes    166 times -->    525.68 Mbps in     237.74 usec
 68:   16384 bytes    210 times -->    526.26 Mbps in     237.53 usec
 69:   16387 bytes    210 times -->    526.45 Mbps in     237.48 usec
 70:   24573 bytes    210 times -->    606.69 Mbps in     309.02 usec
 71:   24576 bytes    215 times -->    605.94 Mbps in     309.43 usec
 72:   24579 bytes    215 times -->    606.69 Mbps in     309.09 usec
 73:   32765 bytes    107 times -->    656.41 Mbps in     380.82 usec
 74:   32768 bytes    131 times -->    654.14 Mbps in     382.18 usec
 75:   32771 bytes    130 times -->    655.71 Mbps in     381.30 usec
 76:   49149 bytes    131 times -->    717.66 Mbps in     522.50 usec
 77:   49152 bytes    127 times -->    718.85 Mbps in     521.67 usec
 78:   49155 bytes    127 times -->    716.82 Mbps in     523.17 usec
 79:   65533 bytes     63 times -->    749.16 Mbps in     667.38 usec
 80:   65536 bytes     74 times -->    750.34 Mbps in     666.36 usec
 81:   65539 bytes     75 times -->    748.70 Mbps in     667.85 usec
 82:   98301 bytes     74 times -->    796.11 Mbps in     942.05 usec
 83:   98304 bytes     70 times -->    797.44 Mbps in     940.52 usec
 84:   98307 bytes     70 times -->    796.58 Mbps in     941.56 usec
 85:  131069 bytes     35 times -->    819.79 Mbps in    1219.80 usec
 86:  131072 bytes     40 times -->    819.94 Mbps in    1219.60 usec
 87:  131075 bytes     40 times -->    820.30 Mbps in    1219.09 usec
 88:  196605 bytes     41 times -->    839.50 Mbps in    1786.76 usec
 89:  196608 bytes     37 times -->    839.81 Mbps in    1786.12 usec
 90:  196611 bytes     37 times -->    840.53 Mbps in    1784.61 usec
 91:  262141 bytes     18 times -->    851.70 Mbps in    2348.22 usec
 92:  262144 bytes     21 times -->    852.22 Mbps in    2346.81 usec
 93:  262147 bytes     21 times -->    852.35 Mbps in    2346.48 usec
 94:  393213 bytes     21 times -->    864.02 Mbps in    3472.12 usec
 95:  393216 bytes     19 times -->    864.67 Mbps in    3469.55 usec
 96:  393219 bytes     19 times -->    863.81 Mbps in    3473.02 usec
 97:  524285 bytes      9 times -->    871.33 Mbps in    4590.67 usec
 98:  524288 bytes     10 times -->    871.13 Mbps in    4591.75 usec
 99:  524291 bytes     10 times -->    871.46 Mbps in    4590.00 usec
100:  786429 bytes     10 times -->    878.64 Mbps in    6828.69 usec
101:  786432 bytes      9 times -->    879.35 Mbps in    6823.22 usec
102:  786435 bytes      9 times -->    879.40 Mbps in    6822.89 usec
103: 1048573 bytes      4 times -->    883.66 Mbps in    9053.23 usec
104: 1048576 bytes      5 times -->    884.31 Mbps in    9046.60 usec
105: 1048579 bytes      5 times -->    884.45 Mbps in    9045.20 usec
106: 1572861 bytes      5 times -->    888.60 Mbps in   13504.41 usec
107: 1572864 bytes      4 times -->    888.71 Mbps in   13502.75 usec
108: 1572867 bytes      4 times -->    888.76 Mbps in   13502.00 usec
109: 2097149 bytes      3 times -->    891.10 Mbps in   17955.34 usec
110: 2097152 bytes      3 times -->    891.30 Mbps in   17951.33 usec
111: 2097155 bytes      3 times -->    891.17 Mbps in   17954.03 usec
112: 3145725 bytes      3 times -->    893.47 Mbps in   26861.51 usec
113: 3145728 bytes      3 times -->    893.33 Mbps in   26865.84 usec
114: 3145731 bytes      3 times -->    893.47 Mbps in   26861.47 usec
115: 4194301 bytes      3 times -->    894.52 Mbps in   35773.16 usec
116: 4194304 bytes      3 times -->    894.50 Mbps in   35774.15 usec
117: 4194307 bytes      3 times -->    894.55 Mbps in   35772.16 usec
118: 6291453 bytes      3 times -->    895.59 Mbps in   53596.18 usec
119: 6291456 bytes      3 times -->    895.64 Mbps in   53593.16 usec
120: 6291459 bytes      3 times -->    895.58 Mbps in   53596.34 usec
121: 8388605 bytes      3 times -->    896.17 Mbps in   71414.67 usec
122: 8388608 bytes      3 times -->    896.18 Mbps in   71413.99 usec
123: 8388611 bytes      3 times -->    896.14 Mbps in   71417.49 usec


I'll run it on each node and let you know if anything is out of place.  I
believe the above results are fine for GigE, yes?
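
For anyone who wants to sanity-check output like this without eyeballing it, here is a minimal Python sketch. It is not part of NetPIPE, and the names `parse_rows` and `summarize` are made up for illustration. It pulls the small-message time and the peak throughput out of the rows. Judging from the numbers above, the Mbps column appears to count 2^20 bits per megabit (row 122: 8388608 bytes is exactly 64 x 2^20 bits, and 64 / 0.07141399 s gives 896.18), so the sketch also recomputes the peak in decimal megabits per second, which is the unit the 1000 Mb/s GigE line rate is quoted in.

```python
import re

# Matches NetPIPE result rows such as:
#   "122: 8388608 bytes      3 times -->    896.18 Mbps in   71413.99 usec"
ROW = re.compile(
    r"^\s*(\d+):\s+(\d+) bytes\s+(\d+) times -->\s+([\d.]+) Mbps in\s+([\d.]+) usec"
)


def parse_rows(text):
    """Return one dict per result row; lines that are not result rows are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(
                {
                    "bytes": int(m.group(2)),
                    "mbps": float(m.group(4)),
                    "usec": float(m.group(5)),
                }
            )
    return rows


def summarize(rows):
    """Report the smallest-message time and the peak throughput."""
    smallest = min(rows, key=lambda r: r["bytes"])
    fastest = max(rows, key=lambda r: r["mbps"])
    # Bits per microsecond is numerically equal to 10^6 bits per second.
    decimal_mbps = fastest["bytes"] * 8 / fastest["usec"]
    return {
        "latency_usec": smallest["usec"],
        "peak_reported_mbps": fastest["mbps"],
        "peak_decimal_mbps": decimal_mbps,
    }


# Three rows copied from the run above; paste the full output to check a whole run.
SAMPLE = """\
  0:       1 bytes   1964 times -->      0.15 Mbps in      51.87 usec
 68:   16384 bytes    210 times -->    526.26 Mbps in     237.53 usec
122: 8388608 bytes      3 times -->    896.18 Mbps in   71413.99 usec
"""

if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE)))
```

On the rows above this gives a small-message time of about 52 usec and a peak of roughly 940 x 10^6 bits/s, about 94% of the nominal GigE rate, assuming the Mbps interpretation above is right.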

- Dave

On Wed, Jul 1, 2009 at 4:20 PM, Sam Lang <sl...@mcs.anl.gov> wrote:

>
> David,
> It sounds like your initial thought (that there is a network
> problem) could be correct.  I would probably explore that first.  What sort
> of numbers do you get from netpipe runs (or even bmi_pingpong) between
> client and server?
>
> -sam
>
> On Jul 1, 2009, at 5:15 PM, David Bonnie wrote:
>
> Sorry for not being clear.
>
> The hardware and software are unchanged.  Runs from a few months ago (on
> 2.8.0) performed as expected.  Current runs (on both 2.8.0 and 2.8.1) are
> slow.
>
> The nodes are sitting there with very low CPU usage even when running the
> benchmark.  I'm the only one running any jobs and there aren't any processes
> running (the system load is < .02 and the cpu usage is pretty much 0%).
>
> The local disks haven't changed and are empty except for the pvfs2 storage
> space; performance is bad even when I put the PVFS2 file system storage onto
> a very fast (>300 MB/s local bandwidth) Atrato vlun connected over fiber
> channel.
>
> My initial thought is that some hardware along the line died but I can't
> seem to pinpoint it.  All of the network interfaces show 0 errors and 0
> dropped packets.
>
> - Dave
>
> On Wed, Jul 1, 2009 at 4:10 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
>
>> Hi David,
>>
>> I still don't get it: when was the performance good? Same software and
>> hardware, just some time in the past? Or is there a software change?
>>
>> The nodes aren't being used for anything else, there are no rogue
>> processes, and the local file systems are otherwise empty?
>>
>> Thanks,
>>
>> Rob
>>
>> On Jul 1, 2009, at 5:05 PM, David Bonnie wrote:
>>
>>> Rob -
>>>
>>> Performance is down across all PVFS2 installations.  The benchmark simply
>>> creates files of a random size (between 1 and 25 MB) in a single folder on
>>> the mounted PVFS2 partition, 16 KB at a time.  It's not anywhere near ideal,
>>> but it's the workload I'm working with.
>>>
>>> Prior to this problem we were getting ~22 MB/s write throughput and we're
>>> down to about 2.5 MB/s for no apparent reason.  Reads are down from about 55
>>> MB/s to 30 MB/s.  No hardware has changed and as far as I can tell no
>>> hardware has died either.
>>>
>>> - Dave
>>>
>>>
>>> On Wed, Jul 1, 2009 at 4:00 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
>>> Do you mean that 2.8.0 is fast and 2.8.1 is slow? Can you describe the
>>> benchmark and how you are doing your measurements?
>>>
>>> Rob
>>>
>>>
>>> On Jul 1, 2009, at 4:43 PM, David Bonnie wrote:
>>>
>>> Hello all -
>>>
>>> I'm having trouble figuring out a problem with performance degradation on
>>> a simple 10 node cluster.  Prior runs on the cluster (before this problem
>>> manifested itself) resulted in bandwidth and IOPS about 10 times higher on a
>>> small file creation workload.  Each node is running as a metadata server and
>>> a data server.
>>>
>>> The problem is persistent between versions and installations of PVFS2
>>> 2.8.0 and 2.8.1.  Rebooting all of the nodes didn't improve anything.  The
>>> network connections (simple GigE) showed no errors or dropped packets.
>>> Using different physical disks (both SAS and FC) didn't improve things.
>>> The kernel logs didn't show anything out of place nor did the pvfs2 server
>>> or client logs.  It seems like a network issue but I can't seem to find
>>> anything wrong with any of the connections.
>>>
>>> Has anyone seen this kind of problem before?  I seem to remember
>>> something on the list before about performance suddenly dropping but I can't
>>> find the message now (of course).  Any insight would be appreciated!
>>>
>>> Thanks,
>>>
>>> - Dave
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers@beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
