[ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-15 Thread Stefan Bauer
Folks,



I would like to thank you again for your help with the performance speedup of
our Ceph cluster.



Our customer just reported that the database is around 40% faster than before,
without changing any hardware.



This really kicks ass now! :)



We measured subop_latency (avgtime) on our OSDs and were able to reduce it
from 2.5 ms to 0.7 ms.
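
For reference, a quick way to read that counter on a single OSD (a sketch,
assuming jq is available and that subop_latency sits under the "osd" section
of the perf dump output):

ceph daemon osd.0 perf dump | jq '.osd.subop_latency.avgtime'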



:p




Cheers





Stefan





-Original Message-
From: Виталий Филиппов 
Sent: Tuesday, 14 January 2020 10:28
To: Wido den Hollander; Stefan Bauer 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

...disable signatures and rbd cache. I didn't mention it in the email to not 
repeat myself. But I have it in the article :-)
--
With best regards,
Vitaliy Filippov


Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Stefan Bauer
Thank you all,



Performance is indeed better now. I can go back to sleep now ;)



KR



Stefan



-Original Message-
From: Виталий Филиппов 
Sent: Tuesday, 14 January 2020 10:28
To: Wido den Hollander; Stefan Bauer 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

...disable signatures and rbd cache. I didn't mention it in the email to not 
repeat myself. But I have it in the article :-)
--
With best regards,
Vitaliy Filippov


Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Stefan Bauer
Hi Vitaliy,



thank you for your time. Do you mean



cephx sign messages = false

with "diable signatures" ?



KR

Stefan





-Original Message-
From: Виталий Филиппов 
Sent: Tuesday, 14 January 2020 10:28
To: Wido den Hollander; Stefan Bauer 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

...disable signatures and rbd cache. I didn't mention it in the email to not 
repeat myself. But I have it in the article :-)
--
With best regards,
Vitaliy Filippov


Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Stefan Bauer
Hi Stefan,



thank you for your time.



"temporary write through" does not seem to be a legit parameter.



However, "write through" is already set:



root@proxmox61:~# echo "temporary write through" > /sys/block/sdb/device/scsi_disk/*/cache_type
root@proxmox61:~# cat /sys/block/sdb/device/scsi_disk/2\:0\:0\:0/cache_type
write through



Is that what you meant?



Thank you.



KR



Stefan



-Original Message-
From: Stefan Priebe - Profihost AG

This has something to do with the firmware and how the manufacturer
handles syncs / flushes.

Intel just ignores sync / flush commands for drives which have a
capacitor. Samsung does not.

The problem is that Ceph sends a lot of flush commands, which slows down
drives without a capacitor.

You can make Linux ignore those userspace flush requests with the following
command:
echo "temporary write through" >
/sys/block/sdX/device/scsi_disk/*/cache_type
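
Note that this sysfs setting is not persistent across reboots. A sketch for
applying it to several drives at once (run as root; the device names are just
examples and must be adjusted to the OSD data disks in the host):

# adjust the device list to your OSD disks
for dev in sdb sdc sdd sde; do
    echo "temporary write through" > /sys/block/$dev/device/scsi_disk/*/cache_type
done

To keep it after a reboot, the loop has to be re-applied at boot time, e.g.
from rc.local or a udev rule.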

Greets,
Stefan Priebe
Profihost AG


> Thank you.
>
>
> Stefan
>
>


[ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread Stefan Bauer
Hi,



We're playing around with Ceph but are not quite happy with the IOPS.



3-node Ceph / Proxmox cluster, each node with:



LSI HBA 3008 controller

4 x MZILT960HAHQ/007 Samsung SSD

Transport protocol:   SAS (SPL-3)

40G fibre Intel 520 network controller on a Unifi switch

Ping round trip to the partner node is 0.040 ms on average.

fio, run inside a virtual machine with



--randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test 
--filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75



reports on average 5000 IOPS for writes

and on average 13000 IOPS for reads.





We're expecting more. :( Any ideas, or is that all we can expect?



Money is not a problem for this test bed; any ideas on how to gain more IOPS
are greatly appreciated.
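
For comparison, it might also help to measure the queue-depth-1 case on the
same test file, which shows per-operation latency rather than aggregate
throughput (a sketch; size and runtime are arbitrary choices):

# single job, queue depth 1, synced 4k random writes against the same test file
fio --name=qd1 --filename=test --ioengine=libaio --direct=1 --sync=1 \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --size=1G \
    --runtime=60 --time_based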



Thank you.



Stefan


Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-13 Thread Stefan Bauer
Paul,



I would like to take the chance to thank you, and to ask whether it could be that



subop_latency reports a high value (is avgtime reported in seconds?)

   "subop_latency": {
    "avgcount": 7782673,
    "sum": 38852.140794738,
    "avgtime": 0.004992133



because the communication partner is slow in writing / committing?
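
As far as I can tell, avgtime is simply sum / avgcount, with both in seconds,
so for the sample above:

38852.140794738 s / 7782673 ops ≈ 0.004992 s ≈ 5 ms per sub-operation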



I don't want to follow a red herring :/



We have the following times on our 11 OSDs (image attached).





-Original Message-
From: Paul Emmerich 
Sent: Thursday, 7 November 2019 19:04
To: Stefan Bauer 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]



You can have a look at subop_latency in "ceph daemon osd.XX perf dump"; it
tells you how long an OSD took to reply to another OSD. That's usually a
good indicator of whether an OSD is dragging down others.
Or have a look at "ceph osd perf", which is basically disk latency;
simpler to acquire, but with less information.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Nov 7, 2019 at 6:55 PM Stefan Bauer  wrote:
>
> Hi folks,
>
>
> we are running a 3-node Proxmox cluster with - of course - Ceph :)
>
> ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous 
> (stable)
>
>
> 10G network. iperf reports almost 10G between all nodes.
>
>
> We are using mixed standard SSDs (Crucial / Samsung). We are aware that
> these disks cannot deliver high IOPS or great throughput, but we have
> several of these clusters and this one is showing very poor performance.
>
>
> NOW the strange fact:
>
>
> When a specific node is rebooting, the throughput is acceptable.
>
>
> But when the specific node is back, the results drop dramatically (from
> roughly 160 MB/s to under 60 MB/s, see below).
>
>
> 2 NODES (one rebooting)
>
>
> # rados bench -p scbench 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_pve3_1767693
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> 0   0 0 0 0 0   -   0
>     1  16    55    39   155.992   156   0.0445665    0.257988
>     2  16   110    94    187.98   220    0.087097    0.291173
>     3  16   156   140   186.645   184    0.462171    0.286895
>     4  16   184   168    167.98   112   0.0235336    0.358085
>     5  16   210   194   155.181   104    0.112401    0.347883
>     6  16   252   236   157.314   168    0.134099    0.382159
>     7  16   287   271   154.838   140   0.0264864     0.40092
>     8  16   329   313   156.481   168   0.0609964    0.394753
>     9  16   364   348   154.649   140    0.244309    0.392331
>    10  16   416   400   159.981   208    0.277489    0.387424
> Total time run: 10.335496
> Total writes made:  417
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 161.386
> Stddev Bandwidth:   37.8065
> Max bandwidth (MB/sec): 220
> Min bandwidth (MB/sec): 104
> Average IOPS:   40
> Stddev IOPS:9
> Max IOPS:   55
> Min IOPS:   26
> Average Latency(s): 0.396434
> Stddev Latency(s):  0.428527
> Max latency(s): 1.86968
> Min latency(s): 0.020558
>
>
>
> THIRD NODE ONLINE:
>
>
>
> root@pve3:/# rados bench -p scbench 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_pve3_1771977
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
> 0   0 0 0 0 0   -   0
>     1  16    39    23   91.9943    92     0.21353    0.267249
>     2  16    46    30   59.9924    28     0.29527    0.268672
>     3  16    53    37   49.3271    28    0.122732    0.259731
>     4  16    53    37   36.9954     0           -    0.259731
>     5  16    53    37   29.5963     0           -    0.259731
>     6  16    87    71   47.3271   45.    0.241921     1.19831
>     7  16   106    90   51.4214    76    0.124821     1.07941
>     8  16   129   113    56.492    92   0.0314146    0.941378

Re: [ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-07 Thread Stefan Bauer
Thank you, Paul. I'm not sure whether these low values will be of any help:



osd commit_latency(ms) apply_latency(ms)
  0  0 0
  1  0 0
  5  0 0
  4  0 0
  3  0 0
  2  0 0
  6  0 0
  7  3 3
  8  3 3
  9  3 3
 10  3 3
 11  0 0



But still, some OSDs show higher values.



If I run a stress test on a VM, the values increase heavily, but I'm unsure
whether this is just a peak caused by the data distribution through the CRUSH
map and simply part of the game.



osd commit_latency(ms) apply_latency(ms)
  0  8 8
  1 18    18
  5  0 0
  4  0 0
  3  0 0
  2  7 7
  6  0 0
  7    100   100
  8 44    44
  9    199   199
 10    512   512
 11 15    15



osd commit_latency(ms) apply_latency(ms)
  0 30    30
  1  5 5
  5  0 0
  4  0 0
  3  0 0
  2    719   719
  6  0 0
  7    150   150
  8 22    22
  9    110   110
 10 94    94
 11 24    24
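
To see whether these spikes stay on the same OSDs or move around, it can help
to sample the counters repeatedly while the stress test runs (a sketch; the
interval is arbitrary):

# refresh the per-OSD commit/apply latencies every 2 seconds
watch -n 2 ceph osd perf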









Stefan



From: Paul Emmerich 


You can have a look at subop_latency in "ceph daemon osd.XX perf dump"; it
tells you how long an OSD took to reply to another OSD. That's usually a
good indicator of whether an OSD is dragging down others.
Or have a look at "ceph osd perf", which is basically disk latency;
simpler to acquire, but with less information.


[ceph-users] how to find the lazy egg - poor performance - interesting observations [klartext]

2019-11-07 Thread Stefan Bauer
Hi folks,



we are running a 3-node Proxmox cluster with - of course - Ceph :)

ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous 
(stable)



10G network. iperf reports almost 10G between all nodes.



We are using mixed standard SSDs (Crucial / Samsung). We are aware that these
disks cannot deliver high IOPS or great throughput, but we have several of
these clusters and this one is showing very poor performance.



NOW the strange fact:



When a specific node is rebooting, the throughput is acceptable.



But when the specific node is back, the results drop dramatically (from roughly
160 MB/s to under 60 MB/s, see below).



2 NODES (one rebooting)



# rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve3_1767693
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0   0 0 0 0 0   -   0
    1  16    55    39   155.992   156   0.0445665    0.257988
    2  16   110    94    187.98   220    0.087097    0.291173
    3  16   156   140   186.645   184    0.462171    0.286895
    4  16   184   168    167.98   112   0.0235336    0.358085
    5  16   210   194   155.181   104    0.112401    0.347883
    6  16   252   236   157.314   168    0.134099    0.382159
    7  16   287   271   154.838   140   0.0264864 0.40092
    8  16   329   313   156.481   168   0.0609964    0.394753
    9  16   364   348   154.649   140    0.244309    0.392331
   10  16   416   400   159.981   208    0.277489    0.387424
Total time run: 10.335496
Total writes made:  417
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 161.386
Stddev Bandwidth:   37.8065
Max bandwidth (MB/sec): 220
Min bandwidth (MB/sec): 104
Average IOPS:   40
Stddev IOPS:    9
Max IOPS:   55
Min IOPS:   26
Average Latency(s): 0.396434
Stddev Latency(s):  0.428527
Max latency(s): 1.86968
Min latency(s): 0.020558





THIRD NODE ONLINE:





root@pve3:/# rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 
for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve3_1771977
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0   0 0 0 0 0   -   0
    1  16    39    23   91.9943    92 0.21353    0.267249
    2  16    46    30   59.9924    28 0.29527    0.268672
    3  16    53    37   49.3271    28    0.122732    0.259731
    4  16    53    37   36.9954 0   -    0.259731
    5  16    53    37   29.5963 0   -    0.259731
    6  16    87    71   47.3271   45.    0.241921 1.19831
    7  16   106    90   51.4214    76    0.124821 1.07941
    8  16   129   113    56.492    92   0.0314146    0.941378
    9  16   142   126   55.9919    52    0.285536    0.871445
   10  16   147   131   52.3925    20    0.354803    0.852074
Total time run: 10.138312
Total writes made:  148
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 58.3924
Stddev Bandwidth:   34.405
Max bandwidth (MB/sec): 92
Min bandwidth (MB/sec): 0
Average IOPS:   14
Stddev IOPS:    8
Max IOPS:   23
Min IOPS:   0
Average Latency(s): 1.08818
Stddev Latency(s):  1.55967
Max latency(s): 5.02514
Min latency(s): 0.0255947





Is a single node faulty here?





root@pve3:/# ceph status
  cluster:
    id: 138c857a-c4e6-4600-9320-9567011470d6
    health: HEALTH_WARN
    application not enabled on 1 pool(s) (that's just for benchmarking)
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3
    mgr: pve1(active), standbys: pve3, pve2
    osd: 12 osds: 12 up, 12 in
 
  data:
    pools:   2 pools, 612 pgs
    objects: 758.52k objects, 2.89TiB
    usage:   8.62TiB used, 7.75TiB / 16.4TiB avail
    pgs: 611 active+clean
 1   active+clean+scrubbing+deep
 
  io:
    client:   4.99MiB/s rd, 1.36MiB/s wr, 678op/s rd, 105op/s wr
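
One way to narrow down a lazy OSD might be to benchmark each OSD backend
individually and compare the results (a sketch using the default bench
parameters; OSD ids 0-11 as in the status above):

# write-benchmark every OSD's backing store and print the throughput per OSD
for i in $(seq 0 11); do
    echo "osd.$i:"
    ceph tell osd.$i bench
done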





Thank you.



Stefan