On Thu, 24 Apr 2014 13:51:49 +0800 Indra Pramana wrote:

> Hi Christian,
> Good day to you, and thank you for your reply.
> On Wed, Apr 23, 2014 at 11:41 PM, Christian Balzer <ch...@gol.com> wrote:
> > > > > Using 32 concurrent writes, result is below. The speed really
> > > > > fluctuates.
> > > > >
> > > > > Total time run:         64.317049
> > > > > Total writes made:      1095
> > > > > Write size:             4194304
> > > > > Bandwidth (MB/sec):     68.100
> > > > >
> > > > > Stddev Bandwidth:       44.6773
> > > > > Max bandwidth (MB/sec): 184
> > > > > Min bandwidth (MB/sec): 0
> > > > > Average Latency:        1.87761
> > > > > Stddev Latency:         1.90906
> > > > > Max latency:            9.99347
> > > > > Min latency:            0.075849
> > > > >
> > > > That is really weird, it should get faster, not slower. ^o^
> > > > I assume you've run this a number of times?
> > > >
> > > > Also my apologies, the default is 16 threads, not 1, but that still
> > > > isn't enough to get my cluster to full speed:
> > > > ---
> > > > Bandwidth (MB/sec):     349.044
> > > >
> > > > Stddev Bandwidth:       107.582
> > > > Max bandwidth (MB/sec): 408
> > > > ---
> > > > at 64 threads it will ramp up from a slow start to:
> > > > ---
> > > > Bandwidth (MB/sec):     406.967
> > > >
> > > > Stddev Bandwidth:       114.015
> > > > Max bandwidth (MB/sec): 452
> > > > ---
> > > >
> > > > But what stands out is your latency. I don't have a 10GBE network
> > > > to compare, but my Infiniband based cluster (going through at
> > > > least one switch) gives me values like this:
> > > > ---
> > > > Average Latency:        0.335519
> > > > Stddev Latency:         0.177663
> > > > Max latency:            1.37517
> > > > Min latency:            0.1017
> > > > ---
> > > >
> > > > Of course that latency is not just the network.
> > > >
> > >
> > > What else can contribute to this latency? Storage node load, disk
> > > speed, anything else?
> > >
> > That and the network itself are pretty much it, you should know once
> > you've run those tests with atop or iostat on the storage nodes.
> >
> > >
> > > > I would suggest running atop (gives you more information at one
> > > > glance) or "iostat -x 3" on all your storage nodes during these
> > > > tests to identify any node or OSD that is overloaded in some way.
> > > >
> > >
> > > Will try.
> > >
> > Do that and let us know about the results.
> >
> I have done some tests using iostat and noted some OSDs on a particular
> storage node going up to the 100% limit when I run the rados bench test.

Dumping lots of text will make people skip over your mails; you need to
summarize, and preferably understand yourself, what these numbers mean.
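
A minimal sketch of what such a summary could look like, in Python: it
keeps only the "iostat -x" rows at or above a utilization threshold and
reports their write rate and await. It assumes the 14-column layout of the
quoted output, with each device row on a single line as iostat prints it.
The function name busiest_devices and the 90% threshold are made up for
illustration.

```python
def busiest_devices(iostat_lines, util_threshold=90.0):
    """Return (device, wkB/s, await_ms, %util) for rows at or above the threshold.

    Assumes this `iostat -x` column order (device name + 13 numbers):
    rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz
    await r_await w_await svctm %util
    """
    hits = []
    for line in iostat_lines:
        # Drop mail quoting ("> ") in case the rows were pasted from a reply.
        fields = line.lstrip("> ").split()
        if len(fields) != 14:
            continue  # avg-cpu lines, blank lines, anything else
        try:
            # strip("*") tolerates hand-added emphasis such as *100.00*
            values = [float(f.strip("*")) for f in fields[1:]]
        except ValueError:
            continue  # the "Device:" header row is not numeric
        wkb_s, await_ms, util = values[5], values[8], values[12]
        if util >= util_threshold:
            hits.append((fields[0], wkb_s, await_ms, util))
    return hits


# Two rows taken from the first sample of the quoted iostat output.
rows = [
    "sda 0.00 0.00 4.33 42.00 73.33 6980.00 304.46 0.29 6.22 0.00 6.86 1.50 6.93",
    "sdb 0.00 0.00 0.00 17.67 0.00 6344.00 718.19 59.64 854.26 0.00 854.26 56.60 *100.00*",
]
for dev, wkb_s, await_ms, util in busiest_devices(rows):
    print(f"{dev}: {util:.0f}% util, {wkb_s / 1024:.1f} MB/s written, await {await_ms:.0f} ms")
```

A few lines like that per node are far easier to read than the raw dump.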

The iostat output is not too conclusive, as the numbers when reaching 100%
utilization are not particularly impressive.
The fact that it happens, though, should make you look for anything
different about these OSDs, from smartctl checks to PG distribution, as
in "ceph pg dump" and then tallying up the PGs per OSD.
Also look at "ceph osd tree" and see if those OSDs or that node have a
higher weight than the others.
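
The tallying can be sketched in Python as below. This is only a rough
illustration that works on the saved plain-text output of "ceph pg dump".
It assumes that PG rows start with a PG id such as "0.1f" and that the
first bracketed list on such a row is the "up" set of OSD ids; check that
against the column header of your Ceph version. The name pgs_per_osd and
the sample rows are invented for the example.

```python
import re
from collections import Counter

PG_ROW = re.compile(r"^\d+\.[0-9a-f]+\s")   # e.g. "0.1f " at line start
OSD_SET = re.compile(r"\[([\d,]+)\]")       # e.g. "[3,7]"


def pgs_per_osd(pg_dump_lines):
    """Count how many PGs list each OSD in their first bracketed set."""
    counts = Counter()
    for line in pg_dump_lines:
        if not PG_ROW.match(line):
            continue  # headers, pool/osd summary lines
        sets = OSD_SET.findall(line)
        if not sets:
            continue
        # Assumption: the first bracketed list is the "up" set.
        for osd in sets[0].split(","):
            if osd:
                counts[int(osd)] += 1
    return counts


# Invented rows, shaped like plain "ceph pg dump" output.
example = [
    "pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting",
    "0.1f 10 0 0 0 4194304 5 5 active+clean 2014-04-24 13:00:00.0 10'5 20:30 [3,7] [3,7]",
    "0.2a 12 0 0 0 4194304 6 6 active+clean 2014-04-24 13:00:00.0 10'6 20:31 [7,12] [7,12]",
]
for osd, n in sorted(pgs_per_osd(example).items()):
    print(f"osd.{osd}: {n} PGs")
```

An OSD that holds noticeably more PGs than its peers will also see more
of the write load.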

The atop line indicates that sdb was being read at a rate of 100MB/s.
Assuming that your benchmark was more or less the only thing running at
that time, this would mean something very odd is going on, as all the other
OSDs had no significant reads going on and all were being written at
about the same speed.
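
As a cross-check of that reading, the MBr/s and MBw/s figures can be
reproduced from the request counts in the same atop DSK line. A small
Python sketch, using only numbers from the quoted atop output for sdb (10s
interval, 8115 reads at 130 KiB each, 695 writes at 194 KiB each); the
helper name mib_per_s is made up, and atop rounds KiB/r and KiB/w, so the
results only match approximately:

```python
INTERVAL_S = 10  # "10s elapsed" in the quoted atop header


def mib_per_s(requests, kib_per_request, interval_s=INTERVAL_S):
    """Average throughput in MiB/s from a request count and mean request size."""
    return requests * kib_per_request / 1024 / interval_s


# sdb, from the quoted DSK line
sdb_read = mib_per_s(8115, 130)   # atop itself reports MBr/s 103.34
sdb_write = mib_per_s(695, 194)   # atop itself reports MBw/s 13.22

print(f"sdb read:  {sdb_read:.1f} MiB/s")
print(f"sdb write: {sdb_write:.1f} MiB/s")
print(f"read/write ratio: {sdb_read / sdb_write:.1f}")
```

So sdb really was reading several times more than it was writing during a
write benchmark, while sda, sdc and sdd each show only 0.02 MBr/s.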


> ====
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.09    0.00    0.92   21.74    0.00   76.25
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00    4.33   42.00    73.33  6980.00   304.46     0.29    6.22    0.00    6.86   1.50   6.93
> sdb               0.00     0.00    0.00   17.67     0.00  6344.00   718.19    59.64  854.26    0.00  854.26  56.60 *100.00*
> sdc               0.00     0.00   12.33   59.33    70.67 18882.33   528.92    36.54  509.80   64.76  602.31  10.51  75.33
> sdd               0.00     0.00    3.33   54.33    24.00 15249.17   529.71     1.29   22.45    3.20   23.63   1.64   9.47
> sde               0.00     0.33    0.00    0.67     0.00     4.00    12.00     0.30  450.00    0.00  450.00 450.00  30.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.38    0.00    1.13    7.75    0.00   89.74
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00    5.00   69.00    30.67 19408.50   525.38     4.29   58.02    0.53   62.18   2.00  14.80
> sdb               0.00     0.00    7.00   63.33    41.33 20911.50   595.82    13.09  826.96   88.57  908.57   5.48  38.53
> sdc               0.00     0.00    2.67   30.00    17.33  6945.33   426.29     0.21    6.53    0.50    7.07   1.59   5.20
> sdd               0.00     0.00    2.67   58.67    16.00 20661.33   674.26     4.89   79.54   41.00   81.30   2.70  16.53
> sde               0.00     0.00    0.00    1.67     0.00     6.67     8.00     0.01    3.20    0.00    3.20   1.60   0.27
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.97    0.00    0.55    6.73    0.00   91.75
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00    1.67   15.33    21.33   120.00    16.63     0.02    1.18    0.00    1.30   0.63   1.07
> sdb               0.00     0.00    4.33   62.33    24.00 13299.17   399.69     2.68   11.18    1.23   11.87   1.94  12.93
> sdc               0.00     0.00    0.67   38.33    70.67  7881.33   407.79    37.66  202.15    0.00  205.67  13.61  53.07
> sdd               0.00     0.00    3.00   17.33    12.00   166.00    17.51     0.05    2.89    3.11    2.85   0.98   2.00
> sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.29    0.00    0.92   24.10    0.00   73.68
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00    0.00   45.33     0.00  4392.50   193.79     0.62   13.62    0.00   13.62   1.09   4.93
> sdb               0.00     0.00    0.00    8.67     0.00  3600.00   830.77    63.87 1605.54    0.00 1605.54 115.38 *100.00*
> sdc               0.00     0.33    8.67   42.67    37.33  5672.33   222.45    16.88  908.78    1.38 1093.09   7.06  36.27
> sdd               0.00     0.00    0.33   31.00     1.33   629.83    40.29     0.06    1.91    0.00    1.94   0.94   2.93
> sde               0.00     0.00    0.00    0.33     0.00     1.33     8.00     0.12  368.00    0.00  368.00 368.00  12.27
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.59    0.00    0.88    4.82    0.00   92.70
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00    0.00   29.00     0.00   235.00    16.21     0.06    1.98    0.00    1.98   0.97   2.80
> sdb               0.00     6.00    4.33  114.67    38.67  6422.33   108.59     9.19  513.19  265.23  522.56   2.08  24.80
> sdc               0.00     0.00    0.00   20.67     0.00   124.00    12.00     0.04    2.00    0.00    2.00   1.03   2.13
> sdd               0.00     5.00    1.67   81.00    12.00   546.17    13.50     0.10    1.21    0.80    1.22   0.39   3.20
> sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> ====
> And the high utilisation is randomly affecting other OSDs as well within
> the same node, and not only affecting one particular OSD.
> atop result on the node:
> ====
> ATOP -
> ceph-osd-07
> 2014/04/24
> 13:49:12
> ------                                                              10s
> elapsed
> PRC | sys    1.77s |  user   2.11s |              |               |
> #proc    164 |               | #trun      2 | #tslpi  2817  | #tslpu
> 0 |               | #zombie    0 | clones     4  |
> |               | #exit      0 |
> CPU | sys      14% |  user     20% |              |  irq       1%
> |              |               | idle    632% | wait    133%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.79GHz  | avgscal  54% |
> cpu | sys       6% |  user      7% |              |  irq       0%
> |              |               | idle     19% | cpu006 w 68%
> |              |               | steal     0% | guest     0%
> |              | avgf 2.42GHz  | avgscal  73% |
> cpu | sys       2% |  user      3% |              |  irq       0%
> |              |               | idle     88% | cpu002 w  7%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.68GHz  | avgscal  50% |
> cpu | sys       2% |  user      2% |              |  irq       0%
> |              |               | idle     86% | cpu003 w 10%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.67GHz  | avgscal  50% |
> cpu | sys       2% |  user      2% |              |  irq       0%
> |              |               | idle     75% | cpu001 w 21%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.83GHz  | avgscal  55% |
> cpu | sys       1% |  user      2% |              |  irq       1%
> |              |               | idle     70% | cpu000 w 26%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.85GHz  | avgscal  56% |
> cpu | sys       1% |  user      2% |              |  irq       0%
> |              |               | idle     97% | cpu004 w  1%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.64GHz  | avgscal  49% |
> cpu | sys       1% |  user      1% |              |  irq       0%
> |              |               | idle     98% | cpu005 w  0%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.60GHz  | avgscal  48% |
> cpu | sys       0% |  user      1% |              |  irq       0%
> |              |               | idle     98% | cpu007 w  0%
> |              |               | steal     0% | guest     0%
> |              | avgf 1.60GHz  | avgscal  48% |
> CPL | avg1    1.12 |               | avg5    0.90 |               | avg15
> 0.72 |               |              |               | csw   103682
> |               | intr   34330 |               |
> |               | numcpu     8 |
> MEM | tot    15.6G |               | free  158.2M |  cache  13.7G
> |              |  dirty 101.4M | buff   18.2M |               | slab
> 574.6M |               |              |               |
> |               |              |
> SWP | tot   518.0M |               | free  489.6M |
> |              |               |              |
> |              |               |              |               | vmcom
> 5.2G |               | vmlim   8.3G |
> PAG | scan  327450 |               |              |  stall      0
> |              |               |              |
> |              |               | swin       0 |
> |              |               | swout      0 |
> DSK |          sdb |               | busy     90% |  read    8115
> |              |  write    695 | KiB/r    130 |               | KiB/w
> 194 | MBr/s 103.34  |              | MBw/s  13.22  | avq     4.61
> |               | avio 1.01 ms |
> DSK |          sdc |               | busy     32% |  read      23
> |              |  write    431 | KiB/r      6 |               | KiB/w
> 318 | MBr/s   0.02  |              | MBw/s  13.41  | avq    34.86
> |               | avio 6.95 ms |
> DSK |          sda |               | busy     32% |  read      25
> |              |  write    674 | KiB/r      6 |               | KiB/w
> 193 | MBr/s   0.02  |              | MBw/s  12.76  | avq    41.00
> |               | avio 4.48 ms |
> DSK |          sdd |               | busy      7% |  read      26
> |              |  write    473 | KiB/r      7 |               | KiB/w
> 223 | MBr/s   0.02  |              | MBw/s  10.31  | avq    14.29
> |               | avio 1.45 ms |
> DSK |          sde |               | busy      2% |  read       0
> |              |  write      5 | KiB/r      0 |               | KiB/w
> 5 | MBr/s   0.00  |              | MBw/s   0.00  | avq     1.00
> |               | avio 44.8 ms |
> NET | transport    |  tcpi   21326 |              |  tcpo   27479 |
> udpi       0 |  udpo       0 | tcpao      0 |               | tcppo
> 2 | tcprs      3  | tcpie      0 | tcpor      0  |              | udpnp
> 0  | udpip      0 |
> NET | network      |               | ipi    21326 |  ipo    14340
> |              |  ipfrw      0 | deliv  21326 |
> |              |               |              |               | icmpi
> 0 |               | icmpo      0 |
> NET | p2p2    ---- |  pcki   12659 |              |  pcko   20931 | si
> 124 Mbps |               | so  107 Mbps | coll       0  | mlti       0
> |               | erri       0 | erro       0  |              | drpi
> 0  | drpo       0 |
> NET | p2p1    ---- |  pcki    8565 |              |  pcko    6443 | si
> 106 Mbps |               | so 7911 Kbps | coll       0  | mlti       0
> |               | erri       0 | erro       0  |              | drpi
> 0  | drpo       0 |
> NET | lo      ---- |  pcki     108 |              |  pcko     108 |
> si    8 Kbps |               | so    8 Kbps | coll       0  | mlti
> 0 |               | erri       0 | erro       0  |              | drpi
> 0  | drpo       0 |
>   PID         RUID              EUID              THR
> SYSCPU           USRCPU          VGROW           RGROW
> RDDSK          WRDSK          ST          EXC         S
> CPUNR           CPU         CMD         1/1
>  6881         root              root              538
> 0.74s            0.94s             0K            256K
> 1.0G         121.3M          --            -         S
> 3           17%         ceph-osd
> 28708         root              root              720
> 0.30s            0.69s           512K             -8K
> 160K         157.7M          --            -         S
> 3           10%         ceph-osd
> 31569         root              root              678
> 0.21s            0.30s           512K           -584K
> 156K         162.7M          --            -         S
> 0            5%         ceph-osd
> 32095         root              root              654
> 0.14s            0.16s             0K              0K
> 60K         105.9M          --            -         S
> 0            3%         ceph-osd
>    61         root              root                1
> 0.20s            0.00s             0K              0K
> 0K             0K          --            -         S
> 3            2%         kswapd0
> 10584         root              root                1
> 0.03s            0.02s           112K            112K
> 0K             0K          --            -         R
> 4            1%         atop
> 11618         root              root                1
> 0.03s            0.00s             0K              0K
> 0K             0K          --            -         S
> 6            0%         kworker/6:2
>    10         root              root                1
> 0.02s            0.00s             0K              0K
> 0K             0K          --            -         S
> 0            0%         rcu_sched
>    38         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 6            0%         ksoftirqd/6
>  1623         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 6            0%         kworker/6:1H
>  1993         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 2            0%         flush-8:48
>  2031         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 2            0%         flush-8:0
>  2032         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 0            0%         flush-8:16
>  2033         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 2            0%         flush-8:32
>  5787         root              root                1
> 0.01s            0.00s             0K              0K
> 4K             0K          --            -         S
> 3            0%         kworker/3:0
> 27605         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 1            0%         kworker/1:2
> 27823         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 0            0%         kworker/0:2
> 32511         root              root                1
> 0.01s            0.00s             0K              0K
> 0K             0K          --            -         S
> 2            0%         kworker/2:0
>  1536         root              root                1
> 0.00s            0.00s             0K              0K
> 0K             0K          --            -         S
> 2            0%         irqbalance
>   478         root              root                1
> 0.00s            0.00s             0K              0K
> 0K             0K          --            -         S
> 3            0%         usb-storage
>   494         root              root                1
> 0.00s            0.00s             0K              0K
> 0K             0K          --            -         S
> 1            0%         jbd2/sde1-8
>  1550         root              root                1
> 0.00s            0.00s             0K              0K
> 400K             0K          --            -         S
> 1            0%         xfsaild/sdb1
>  1750         root              root                1
> 0.00s            0.00s             0K              0K
> 128K             0K          --            -         S
> 2            0%         xfsaild/sdd1
>  1994         root              root                1
> 0.00s            0.00s             0K              0K
> 0K             0K          --            -         S
> 1            0%         flush-8:64
> ====
> I have tried to trim the SSD drives but the problem seems to persist.
> Last time, trimming the SSD drives helped to improve the performance.
> Any advice is greatly appreciated.
> Thank you.

Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Fusion Communications
ceph-users mailing list
