Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-13 Thread Paul Emmerich
On Wed, Nov 13, 2019 at 10:13 AM Stefan Bauer  wrote:
>
> Paul,
>
>
> I would like to take the chance to thank you, and to ask: could it be that
> subop_latency reports a high value (is that avgtime reported in seconds?)
> because the communication partner is slow in writing/committing?

no


Paul

>
>
> Don't want to follow a red herring :/
>
>
> We have the following times on our 11 OSDs. Attached image.
>
>
>
> -----Original Message-----
> From: Paul Emmerich
> Sent: Thursday, 7 November 2019 19:04
> To: Stefan Bauer
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] how to find the lazy egg - poor performance -
> interesting observations [klartext]
>
> You can have a look at subop_latency in "ceph daemon osd.XX perf
> dump"; it tells you how long an OSD took to reply to another OSD.
> That's usually a good indicator of whether an OSD is dragging down others.
> Or have a look at "ceph osd perf dump", which is basically disk
> latency; simpler to acquire, but with less information.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, Nov 7, 2019 at 6:55 PM Stefan Bauer  wrote:
> >
> > Hi folks,
> >
> >
> > we are running a 3-node Proxmox cluster with - of course - Ceph :)
> >
> > ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous 
> > (stable)
> >
> >
> > 10G network. iperf reports almost 10G between all nodes.
> >
> >
> > We are using mixed standard SSDs (Crucial / Samsung). We are aware that
> > these disks cannot deliver high IOPS or great throughput, but we have
> > several of these clusters and this one is showing very poor performance.
> >
> >
> > NOW the strange fact:
> >
> >
> > When a specific node is rebooting, the throughput is acceptable.
> >
> >
> > But when the specific node is back, the results drop by almost 100%.
> >
> >
> > 2 NODES (one rebooting)
> >
> >
> > # rados bench -p scbench 10 write --no-cleanup
> > hints = 1
> > Maintaining 16 concurrent writes of 4194304 bytes to objects of size 
> > 4194304 for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_pve3_1767693
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> >     0   0         0         0         0         0           -           0
> >     1  16        55        39   155.992       156   0.0445665    0.257988
> >     2  16       110        94    187.98       220    0.087097    0.291173
> >     3  16       156       140   186.645       184    0.462171    0.286895
> >     4  16       184       168    167.98       112   0.0235336    0.358085
> >     5  16       210       194   155.181       104    0.112401    0.347883
> >     6  16       252       236   157.314       168    0.134099    0.382159
> >     7  16       287       271   154.838       140   0.0264864     0.40092
> >     8  16       329       313   156.481       168   0.0609964    0.394753
> >     9  16       364       348   154.649       140    0.244309    0.392331
> >    10  16       416       400   159.981       208    0.277489    0.387424
> > Total time run: 10.335496
> > Total writes made:  417
> > Write size: 4194304
> > Object size:4194304
> > Bandwidth (MB/sec): 161.386
> > Stddev Bandwidth:   37.8065
> > Max bandwidth (MB/sec): 220
> > Min bandwidth (MB/sec): 104
> > Average IOPS:   40
> > Stddev IOPS:9
> > Max IOPS:   55
> > Min IOPS:   26
> > Average Latency(s): 0.396434
> > Stddev Latency(s):  0.428527
> > Max latency(s): 1.86968
> > Min latency(s): 0.020558
> >
> >
> >
> > THIRD NODE ONLINE:
> >
> >
> >
> > root@pve3:/# rados bench -p scbench 10 write --no-cleanup
> > hints = 1
> > Maintaining 16 concurrent writes of 4194304 bytes to objects of size 
> > 4194304 for up to 10 seconds or 0 objects
> > Object prefix: benchmark_data_pve3_1771977
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> >     0   0         0         0         0         0           -           0

Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-13 Thread Stefan Bauer
Paul,



I would like to take the chance to thank you, and to ask: could it be that



subop_latency reports a high value (is that avgtime reported in seconds?)

   "subop_latency": {
    "avgcount": 7782673,
    "sum": 38852.140794738,
    "avgtime": 0.004992133



because the communication partner is slow in writing/committing?
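
For what it's worth, my reading of these counters - not an authoritative
statement - is that avgtime is simply sum / avgcount and is reported in
seconds, so the values above come out to roughly 5 ms per sub-op. Quick
sanity check:

awk 'BEGIN { printf "%.9f s (~%.1f ms)\n", 38852.140794738/7782673, 1000*38852.140794738/7782673 }'
# prints: 0.004992133 s (~5.0 ms), matching the avgtime field above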



Don't want to follow a red herring :/



We have the following times on our 11 OSDs. Attached image.





-----Original Message-----
From: Paul Emmerich
Sent: Thursday, 7 November 2019 19:04
To: Stefan Bauer
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to find the lazy egg - poor performance -
interesting observations [klartext]



You can have a look at subop_latency in "ceph daemon osd.XX perf
dump"; it tells you how long an OSD took to reply to another OSD.
That's usually a good indicator of whether an OSD is dragging down others.
Or have a look at "ceph osd perf dump", which is basically disk
latency; simpler to acquire, but with less information.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Nov 7, 2019 at 6:55 PM Stefan Bauer  wrote:
>
> Hi folks,
>
>
> we are running a 3-node Proxmox cluster with - of course - Ceph :)
>
> ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous 
> (stable)
>
>
> 10G network. iperf reports almost 10G between all nodes.
>
>
> We are using mixed standard SSDs (Crucial / Samsung). We are aware that
> these disks cannot deliver high IOPS or great throughput, but we have
> several of these clusters and this one is showing very poor performance.
>
>
> NOW the strange fact:
>
>
> When a specific node is rebooting, the throughput is acceptable.
>
>
> But when the specific node is back, the results drop by almost 100%.
>
>
> 2 NODES (one rebooting)
>
>
> # rados bench -p scbench 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_pve3_1767693
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0   0         0         0         0         0           -           0
>     1  16        55        39   155.992       156   0.0445665    0.257988
>     2  16       110        94    187.98       220    0.087097    0.291173
>     3  16       156       140   186.645       184    0.462171    0.286895
>     4  16       184       168    167.98       112   0.0235336    0.358085
>     5  16       210       194   155.181       104    0.112401    0.347883
>     6  16       252       236   157.314       168    0.134099    0.382159
>     7  16       287       271   154.838       140   0.0264864     0.40092
>     8  16       329       313   156.481       168   0.0609964    0.394753
>     9  16       364       348   154.649       140    0.244309    0.392331
>    10  16       416       400   159.981       208    0.277489    0.387424
> Total time run: 10.335496
> Total writes made:  417
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 161.386
> Stddev Bandwidth:   37.8065
> Max bandwidth (MB/sec): 220
> Min bandwidth (MB/sec): 104
> Average IOPS:   40
> Stddev IOPS:9
> Max IOPS:   55
> Min IOPS:   26
> Average Latency(s): 0.396434
> Stddev Latency(s):  0.428527
> Max latency(s): 1.86968
> Min latency(s): 0.020558
>
>
>
> THIRD NODE ONLINE:
>
>
>
> root@pve3:/# rados bench -p scbench 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_pve3_1771977
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0   0         0         0         0         0           -           0
>     1  16        39        23   91.9943        92     0.21353    0.267249
>     2  16        46        30   59.9924        28     0.29527    0.268672
>     3  16        53        37   49.3271        28    0.122732    0.259731
>     4  16        53        37   36.9954         0           -    0.259731
>     5  16        53        37   29.5963         0           -    0.259731
>     6  16        87        71   47.3271       45.    0.241921     1.19831
>     7  16       106        90   51.4214        76    0.124821     1.07941
>     8  16       129       113    56.492        92   0.0314146    0.941378

[ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-09 Thread Philippe D'Anjou
Does this only happen with this one specific node? Have you checked the system
logs? Checked SMART on all disks? I mean, technically it's expected to have
slower writes when the third node is there; that's by Ceph design.
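
One way to see why every write depends on all three nodes once the third one
is back - a quick check, assuming the benchmark pool is the scbench pool from
the rados bench runs:

ceph osd pool get scbench size        # number of replicas, typically 3
ceph osd pool get scbench min_size    # minimum replicas that must be up for a PG to accept I/O
# With size 3 and one replica per node, a client write is only acknowledged after
# every replica has committed it, so a single slow node drags down all writes.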


Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-07 Thread Stefan Bauer
Thank you Paul. I'm not sure if these low values will be of any help:



osd commit_latency(ms) apply_latency(ms)
  0                  0                 0
  1                  0                 0
  5                  0                 0
  4                  0                 0
  3                  0                 0
  2                  0                 0
  6                  0                 0
  7                  3                 3
  8                  3                 3
  9                  3                 3
 10                  3                 3
 11                  0                 0



But still, a few OSDs show higher values than the rest.



If I run a stress test on a VM, the values increase heavily, but I'm unsure
whether this is just a peak caused by the data distribution through the CRUSH
map and simply part of the game.



osd commit_latency(ms) apply_latency(ms)
  0                  8                 8
  1                 18                18
  5                  0                 0
  4                  0                 0
  3                  0                 0
  2                  7                 7
  6                  0                 0
  7                100               100
  8                 44                44
  9                199               199
 10                512               512
 11                 15                15



osd commit_latency(ms) apply_latency(ms)
  0                 30                30
  1                  5                 5
  5                  0                 0
  4                  0                 0
  3                  0                 0
  2                719               719
  6                  0                 0
  7                150               150
  8                 22                22
  9                110               110
 10                 94                94
 11                 24                24

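Two things that might help narrow this down - only a sketch of how I would
look at it, with osd.10 just as an example id: watch whether it is always the
same OSDs that spike while the stress test runs, and then map those OSDs back
to their host.

watch -n 2 ceph osd perf    # re-run during the stress test; do the same OSD ids spike every time?
ceph osd find 10            # prints the CRUSH location / host of osd.10 (example id)
ceph osd tree               # full host -> OSD mapping, to check whether the slow OSDs share one node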








Stefan



From: Paul Emmerich


You can have a look at subop_latency in "ceph daemon osd.XX perf
dump"; it tells you how long an OSD took to reply to another OSD.
That's usually a good indicator of whether an OSD is dragging down others.
Or have a look at "ceph osd perf dump", which is basically disk
latency; simpler to acquire, but with less information.


Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-07 Thread Paul Emmerich
You can have a look at subop_latency in "ceph daemon osd.XX perf
dump"; it tells you how long an OSD took to reply to another OSD.
That's usually a good indicator of whether an OSD is dragging down others.
Or have a look at "ceph osd perf dump", which is basically disk
latency; simpler to acquire, but with less information.
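
A small loop along these lines can pull that counter from every OSD on a node
- a sketch, assuming the default admin socket path, that jq is installed, and
that the counter sits under the "osd" section of the perf dump:

for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=$(basename "$sock" .asok); id=${id#ceph-osd.}    # derive the OSD id from the socket name
    avg=$(ceph daemon osd."$id" perf dump | jq '.osd.subop_latency.avgtime')
    echo "osd.$id subop_latency avgtime: $avg s"
done
ceph osd perf    # the coarser commit/apply latency view for all OSDs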

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Nov 7, 2019 at 6:55 PM Stefan Bauer  wrote:
>
> Hi folks,
>
>
> we are running a 3-node Proxmox cluster with - of course - Ceph :)
>
> ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous 
> (stable)
>
>
> 10G network. iperf reports almost 10G between all nodes.
>
>
> We are using mixed standard SSDs (Crucial / Samsung). We are aware that
> these disks cannot deliver high IOPS or great throughput, but we have
> several of these clusters and this one is showing very poor performance.
>
>
> NOW the strange fact:
>
>
> When a specific node is rebooting, the throughput is acceptable.
>
>
> But when the specific node is back, the results drop by almost 100%.
>
>
> 2 NODES (one rebooting)
>
>
> # rados bench -p scbench 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_pve3_1767693
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0   0         0         0         0         0           -           0
>     1  16        55        39   155.992       156   0.0445665    0.257988
>     2  16       110        94    187.98       220    0.087097    0.291173
>     3  16       156       140   186.645       184    0.462171    0.286895
>     4  16       184       168    167.98       112   0.0235336    0.358085
>     5  16       210       194   155.181       104    0.112401    0.347883
>     6  16       252       236   157.314       168    0.134099    0.382159
>     7  16       287       271   154.838       140   0.0264864     0.40092
>     8  16       329       313   156.481       168   0.0609964    0.394753
>     9  16       364       348   154.649       140    0.244309    0.392331
>    10  16       416       400   159.981       208    0.277489    0.387424
> Total time run: 10.335496
> Total writes made:  417
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 161.386
> Stddev Bandwidth:   37.8065
> Max bandwidth (MB/sec): 220
> Min bandwidth (MB/sec): 104
> Average IOPS:   40
> Stddev IOPS:9
> Max IOPS:   55
> Min IOPS:   26
> Average Latency(s): 0.396434
> Stddev Latency(s):  0.428527
> Max latency(s): 1.86968
> Min latency(s): 0.020558
>
>
>
> THIRD NODE ONLINE:
>
>
>
> root@pve3:/# rados bench -p scbench 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_pve3_1771977
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0   0         0         0         0         0           -           0
>     1  16        39        23   91.9943        92     0.21353    0.267249
>     2  16        46        30   59.9924        28     0.29527    0.268672
>     3  16        53        37   49.3271        28    0.122732    0.259731
>     4  16        53        37   36.9954         0           -    0.259731
>     5  16        53        37   29.5963         0           -    0.259731
>     6  16        87        71   47.3271       45.    0.241921     1.19831
>     7  16       106        90   51.4214        76    0.124821     1.07941
>     8  16       129       113    56.492        92   0.0314146    0.941378
>     9  16       142       126   55.9919        52    0.285536    0.871445
>    10  16       147       131   52.3925        20    0.354803    0.852074
> Total time run: 10.138312
> Total writes made:  148
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 58.3924
> Stddev Bandwidth:   34.405
> Max bandwidth (MB/sec): 92
> Min bandwidth (MB/sec): 0
> Average IOPS:   14
> Stddev IOPS:8
> Max IOPS:   23
> Min IOPS:   0
> Average Latency(s): 1.08818
> Stddev Latency(s):  1.55967
> Max latency(s): 5.02514
> Min latency(s): 0.0255947
>
>
>
> Is here a single node faulty?
>
>
>
> root@pve3:/# ceph status
>   cluster:
> id: 138c857a-c4e6-4600-9320-9567011470d6
> health: HEALTH_WARN
> application not enabled on 1 pool(s) (that's just for benchmarking)
>
>   services:
> mon: 3 daemons, quorum pve1,pve2,pve3
> mgr: pve1(active), standbys: pve3, pve2

[ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-07 Thread Stefan Bauer
Hi folks,



we are running a 3-node Proxmox cluster with - of course - Ceph :)

ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous 
(stable)



10G network. iperf reports almost 10G between all nodes.



We are using mixed standard SSDs (Crucial / Samsung). We are aware that these
disks cannot deliver high IOPS or great throughput, but we have several of
these clusters and this one is showing very poor performance.



NOW the strange fact:



When a specific node is rebooting, the throughput is acceptable.



But when the specific node is back, the results drop by almost 100%.



2 NODES (one rebooting)



# rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve3_1767693
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0   0 0 0 0 0   -   0
    1  16    55    39   155.992   156   0.0445665    0.257988
    2  16   110    94    187.98   220    0.087097    0.291173
    3  16   156   140   186.645   184    0.462171    0.286895
    4  16   184   168    167.98   112   0.0235336    0.358085
    5  16   210   194   155.181   104    0.112401    0.347883
    6  16   252   236   157.314   168    0.134099    0.382159
    7  16   287   271   154.838   140   0.0264864 0.40092
    8  16   329   313   156.481   168   0.0609964    0.394753
    9  16   364   348   154.649   140    0.244309    0.392331
   10  16   416   400   159.981   208    0.277489    0.387424
Total time run: 10.335496
Total writes made:  417
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 161.386
Stddev Bandwidth:   37.8065
Max bandwidth (MB/sec): 220
Min bandwidth (MB/sec): 104
Average IOPS:   40
Stddev IOPS:    9
Max IOPS:   55
Min IOPS:   26
Average Latency(s): 0.396434
Stddev Latency(s):  0.428527
Max latency(s): 1.86968
Min latency(s): 0.020558





THIRD NODE ONLINE:





root@pve3:/# rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve3_1771977
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0   0 0 0 0 0   -   0
    1  16    39    23   91.9943    92 0.21353    0.267249
    2  16    46    30   59.9924    28 0.29527    0.268672
    3  16    53    37   49.3271    28    0.122732    0.259731
    4  16    53    37   36.9954 0   -    0.259731
    5  16    53    37   29.5963 0   -    0.259731
    6  16    87    71   47.3271   45.    0.241921 1.19831
    7  16   106    90   51.4214    76    0.124821 1.07941
    8  16   129   113    56.492    92   0.0314146    0.941378
    9  16   142   126   55.9919    52    0.285536    0.871445
   10  16   147   131   52.3925    20    0.354803    0.852074
Total time run: 10.138312
Total writes made:  148
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 58.3924
Stddev Bandwidth:   34.405
Max bandwidth (MB/sec): 92
Min bandwidth (MB/sec): 0
Average IOPS:   14
Stddev IOPS:    8
Max IOPS:   23
Min IOPS:   0
Average Latency(s): 1.08818
Stddev Latency(s):  1.55967
Max latency(s): 5.02514
Min latency(s): 0.0255947





Is here a single node faulty?





root@pve3:/# ceph status
  cluster:
    id:     138c857a-c4e6-4600-9320-9567011470d6
    health: HEALTH_WARN
            application not enabled on 1 pool(s) (that's just for benchmarking)
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3
    mgr: pve1(active), standbys: pve3, pve2
    osd: 12 osds: 12 up, 12 in
 
  data:
    pools:   2 pools, 612 pgs
    objects: 758.52k objects, 2.89TiB
    usage:   8.62TiB used, 7.75TiB / 16.4TiB avail
    pgs: 611 active+clean
 1   active+clean+scrubbing+deep
 
  io:
    client:   4.99MiB/s rd, 1.36MiB/s wr, 678op/s rd, 105op/s wr


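Side note on the HEALTH_WARN above: tagging the benchmark pool with an
application label clears it - assuming the pool is the scbench pool from the
rados bench runs, something like:

ceph osd pool application enable scbench rbd    # any application name works; this only silences the warning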



Thank you.



Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com