Re: [ceph-users] performance in a small cluster

2019-06-07 Thread Paul Emmerich
On Tue, Jun 4, 2019 at 1:39 PM  wrote:

> >> Basically they max out at around 1000 IOPS and report 100%
> >> utilization and feel slow.
> >>
> >> Haven't seen the 5200 yet.
>
> Micron 5100s performs wonderfully!
>
> You have to just turn its write cache off:
>
> hdparm -W 0 /dev/sdX
>
> 1000 IOPS means you haven't done it. Although even with write cache
> enabled I observe like ~5000 iops, not 1000, but that delta is probably
> just eaten by Ceph :))
>

Can confirm that there are several disks where the write cache seems to
be broken and this helps a lot. Good to know that this is one of those
disks.

(The cluster with these disks that I've worked on didn't need more than
a few IOPS per disk and had other problems, so I didn't check it there)


Paul


>
> With write cache turned off 5100 is capable of up to 4 write iops.
> 5200 is slightly worse, but only slightly: it still gives ~25000 iops.
>
> Funny thing is that the same applies to a lot of server SSDs with
> supercapacitors. As I understand when their write cache is turned on
> every `fsync` is translated to SATA FLUSH CACHE, and the latter is
> interpreted by the drive as "please flush all caches, including
> capacitor-protected write cache".
>
> And when you turn it off the drive just writes at its full speed and
> doesn't flush the cache because it has capacitors to account for a
> possible power loss.
>
> The only case where you don't need to disable the cache explicitly is
> with some HBAs that do it internally.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance in a small cluster

2019-06-04 Thread vitalif

Basically they max out at around 1000 IOPS and report 100%
utilization and feel slow.

Haven't seen the 5200 yet.


Micron 5100s perform wonderfully!

You just have to turn the write cache off:

hdparm -W 0 /dev/sdX

1000 IOPS means you haven't done it. Although even with write cache 
enabled I observe like ~5000 iops, not 1000, but that delta is probably 
just eaten by Ceph :))


With write cache turned off 5100 is capable of up to 4 write iops. 
5200 is slightly worse, but only slightly: it still gives ~25000 iops.


Funny thing is that the same applies to a lot of server SSDs with 
supercapacitors. As I understand it, when their write cache is turned on, 
every `fsync` is translated to a SATA FLUSH CACHE, and the latter is 
interpreted by the drive as "please flush all caches, including the 
capacitor-protected write cache".


And when you turn it off the drive just writes at its full speed and 
doesn't flush the cache because it has capacitors to account for a 
possible power loss.


The only case where you don't need to disable the cache explicitly is 
with some HBAs that do it internally.
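A sketch of checking and disabling the volatile write cache as described above (`/dev/sdX` is a placeholder; the udev rule is an assumption to make the setting survive reboots, so adapt it to your distribution):

```
# Show the current write-cache setting:
hdparm -W /dev/sdX

# Disable the volatile write cache:
hdparm -W 0 /dev/sdX

# hdparm settings are not persistent; a udev rule such as
# /etc/udev/rules.d/99-ssd-write-cache.rules can reapply it at boot:
#   ACTION=="add", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", \
#     RUN+="/usr/sbin/hdparm -W 0 /dev/%k"
```

Re-run a small sync-write benchmark afterwards to confirm the drive now sustains the expected IOPS.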



Re: [ceph-users] performance in a small cluster

2019-05-31 Thread Reed Dier
Is there any other evidence of this?

I have 20 Micron 5100 MAX (MTFDDAK1T9TCC) drives and have not experienced any 
real issues with them.
I would pick my Samsung SM863a's or any of my Intels over the Microns, but I 
haven't seen the Microns cause any issues for me.
For what it's worth, they are all on FW D0MU027, which is likely out of date, 
but it is working for me.

However, I would steer people away from the Micron 9100 MAX 
(MTFDHAX1T2MCF-1AN1ZABYY) as an NVMe disk to use for WAL/DB, as I have seen 
performance and reliability issues with those.

Just my 2¢

Reed

> On May 29, 2019, at 12:52 PM, Paul Emmerich  wrote:
> 
> 
> 
> On Wed, May 29, 2019 at 9:36 AM Robert Sander wrote:
> On 24.05.19 at 14:43, Paul Emmerich wrote:
> > * SSD model? Lots of cheap SSDs simply can't handle more than that
> 
> The customer currently has 12 Micron 5100 1,92TB (Micron_5100_MTFDDAK1)
> SSDs and will get a batch of Micron 5200 in the next days
> 
> And there's your bottleneck ;)
> The Micron 5100 performs horribly in Ceph, I've seen similar performance in 
> another cluster with these disks.
> Basically they max out at around 1000 IOPS and report 100% utilization and 
> feel slow.
> 
> Haven't seen the 5200 yet.
> 
> 
> Paul
>  
> 
> We have identified the performance settings in the BIOS as a major
> factor. Ramping that up we got a remarkable performance increase.
> 
> Regards
> -- 
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de 
> 
> Tel: 030-405051-43
> Fax: 030-405051-19
> 
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
> 
> 


Re: [ceph-users] performance in a small cluster

2019-05-29 Thread Paul Emmerich
On Wed, May 29, 2019 at 11:37 AM Robert Sander 
wrote:

> Hi,
>
> On 29.05.19 at 11:19, Martin Verges wrote:
> >
> > We have identified the performance settings in the BIOS as a major
> > factor
> >
> > could you share your insights on what options you changed to increase
> > performance, and could you provide numbers for it?
>
> Most default performance settings nowadays seem to be geared towards
> power savings. This decreases CPU frequencies and does not play well
> with Ceph (and virtualization).
>

Agreed: disabling C-states can help, and disabling dynamic underclocking
can also help.

No need to do that in the BIOS; setting it via linux-cpupower and similar
tools is enough.

Another thing that can help is this:

net.ipv4.tcp_low_latency=1

But all of these are for getting the last drops of IOPS out when you are
already getting lots of IOPS; it's not something that helps if your disk
is only doing 1000 IOPS.
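A sketch of the non-BIOS route described above (tool and parameter names are the common ones on recent distributions; verify availability on your platform):

```
# Pin all cores to the performance governor (linux-cpupower / cpupower):
cpupower frequency-set -g performance

# Limit deep C-states; one common approach is via kernel command-line
# parameters (an assumption -- exact parameters depend on your platform):
#   intel_idle.max_cstate=1 processor.max_cstate=1

# TCP low-latency mode, as mentioned above:
sysctl -w net.ipv4.tcp_low_latency=1
```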

Paul


>
> There was just one setting in the BIOS of these machines called "Host
> Performance" that was set to "Balanced". We changed that to "Max
> Performance" and immediately the throughput doubled.
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de
>
> Tel: 030-405051-43
> Fax: 030-405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] performance in a small cluster

2019-05-29 Thread Paul Emmerich
On Wed, May 29, 2019 at 9:36 AM Robert Sander 
wrote:

> On 24.05.19 at 14:43, Paul Emmerich wrote:
> > * SSD model? Lots of cheap SSDs simply can't handle more than that
>
> The customer currently has 12 Micron 5100 1,92TB (Micron_5100_MTFDDAK1)
> SSDs and will get a batch of Micron 5200 in the next days
>

And there's your bottleneck ;)
The Micron 5100 performs horribly in Ceph, I've seen similar performance in
another cluster with these disks.
Basically they max out at around 1000 IOPS and report 100% utilization and
feel slow.

Haven't seen the 5200 yet.


Paul


>
> We have identified the performance settings in the BIOS as a major
> factor. Ramping that up we got a remarkable performance increase.
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de
>
> Tel: 030-405051-43
> Fax: 030-405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] performance in a small cluster

2019-05-29 Thread Robert Sander
Hi,

On 29.05.19 at 11:19, Martin Verges wrote:
> 
> We have identified the performance settings in the BIOS as a major
> factor
> 
> could you share your insights on what options you changed to increase
> performance, and could you provide numbers for it?

Most default performance settings nowadays seem to be geared towards
power savings. This decreases CPU frequencies and does not play well
with Ceph (and virtualization).

There was just one setting in the BIOS of these machines called "Host
Performance" that was set to "Balanced". We changed that to "Max
Performance" and immediately the throughput doubled.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin





Re: [ceph-users] performance in a small cluster

2019-05-29 Thread Andrei Mikhailovsky
It would be interesting to learn the types of improvements and the BIOS 
changes that helped you. 

Thanks 

> From: "Martin Verges" 
> To: "Robert Sander" 
> Cc: "ceph-users" 
> Sent: Wednesday, 29 May, 2019 10:19:09
> Subject: Re: [ceph-users] performance in a small cluster

> Hello Robert,

>> We have identified the performance settings in the BIOS as a major factor

> could you share your insights on what options you changed to increase
> performance, and could you provide numbers for it?

> Many thanks in advance

> --
> Martin Verges
> Managing director

> Mobile: +49 174 9335695
> E-Mail: [ mailto:martin.ver...@croit.io | martin.ver...@croit.io ]
> Chat: [ https://t.me/MartinVerges | https://t.me/MartinVerges ]

> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263

> Web: [ https://croit.io/ | https://croit.io ]
> YouTube: [ https://goo.gl/PGE1Bx | https://goo.gl/PGE1Bx ]

> On Wed., May 29, 2019 at 09:36, Robert Sander < [
> mailto:r.san...@heinlein-support.de | r.san...@heinlein-support.de ] > wrote:

>> On 24.05.19 at 14:43, Paul Emmerich wrote:
>> > * SSD model? Lots of cheap SSDs simply can't handle more than that

>> The customer currently has 12 Micron 5100 1,92TB (Micron_5100_MTFDDAK1)
>> SSDs and will get a batch of Micron 5200 in the next days

>> We have identified the performance settings in the BIOS as a major
>> factor. Ramping that up we got a remarkable performance increase.

>> Regards
>> --
>> Robert Sander
>> Heinlein Support GmbH
>> Linux: Akademie - Support - Hosting
>> [ http://www.heinlein-support.de/ | http://www.heinlein-support.de ]

>> Tel: 030-405051-43
>> Fax: 030-405051-19

>> Zwangsangaben lt. §35a GmbHG:
>> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
>> Geschäftsführer: Peer Heinlein -- Sitz: Berlin

>> ___
>> ceph-users mailing list
>> [ mailto:ceph-users@lists.ceph.com | ceph-users@lists.ceph.com ]
>> [ http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance in a small cluster

2019-05-29 Thread Martin Verges
Hello Robert,

> We have identified the performance settings in the BIOS as a major factor

could you share your insights on what options you changed to increase
performance, and could you provide numbers for it?

Many thanks in advance

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Wed., May 29, 2019 at 09:36, Robert Sander <
r.san...@heinlein-support.de> wrote:

> On 24.05.19 at 14:43, Paul Emmerich wrote:
> > * SSD model? Lots of cheap SSDs simply can't handle more than that
>
> The customer currently has 12 Micron 5100 1,92TB (Micron_5100_MTFDDAK1)
> SSDs and will get a batch of Micron 5200 in the next days
>
> We have identified the performance settings in the BIOS as a major
> factor. Ramping that up we got a remarkable performance increase.
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de
>
> Tel: 030-405051-43
> Fax: 030-405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] performance in a small cluster

2019-05-29 Thread Robert Sander
On 24.05.19 at 14:43, Paul Emmerich wrote:
> * SSD model? Lots of cheap SSDs simply can't handle more than that

The customer currently has 12 Micron 5100 1.92TB (Micron_5100_MTFDDAK1)
SSDs and will get a batch of Micron 5200s in the next few days.

We have identified the performance settings in the BIOS as a major
factor. Ramping that up, we got a remarkable performance increase.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin





Re: [ceph-users] performance in a small cluster

2019-05-27 Thread Stefan Kooman
Quoting Robert Sander (r.san...@heinlein-support.de):
> Hi,
> 
> we have a small cluster at a customer's site with three nodes and 4 SSD-OSDs
> each.
> Connected with 10G the system is supposed to perform well.
> 
> rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB objects
> but only 20MB/s write and 95MB/s read with 4KB objects.
> 
> This is a little bit disappointing as the 4K performance is also seen in KVM
> VMs using RBD.
> 
> Is there anything we can do to improve performance with small objects /
> block sizes?

Josh gave a talk about this:
https://static.sched.com/hosted_files/cephalocon2019/10/Optimizing%20Small%20Ceph%20Clusters.pdf

TL;DR: 
- For small clusters use relatively more PGs than for large clusters
- Make sure your cluster is well balanced, and this script might
be useful:
https://github.com/JoshSalomon/Cephalocon-2019/blob/master/pool_pgs_osd.sh
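The "relatively more PGs" point can be sketched with the common "~100 PGs per OSD" rule of thumb from the Ceph documentation (the function name and rounding choice here are mine, not taken from the talk):

```python
def suggested_pg_count(osds: int, replica_size: int, target_per_osd: int = 100) -> int:
    """Suggest a total PG count: osds * target_per_osd / replica_size,
    rounded to the nearest power of two (PG counts are powers of two)."""
    raw = osds * target_per_osd / replica_size
    # find the largest power of two not exceeding the raw value
    power = 1
    while power * 2 <= raw:
        power *= 2
    # pick whichever neighbouring power of two is closer
    return power * 2 if (raw - power) > (power * 2 - raw) else power

# The 3-node, 12-OSD, replica-3 cluster from this thread:
print(suggested_pg_count(osds=12, replica_size=3))  # -> 512
```

For a small cluster such as the one in this thread, the talk suggests leaning towards the higher end of that range rather than the lower.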

Josh is also tuning the objecter_* attributes (if you have plenty of
CPU/Memory):

objecter_inflight_ops = 5120
objecter_inflight_op_bytes = 524288000 (512 * 1,024,000)
## You can multiply / divide both with the same factor
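These are plain Ceph config options; a hypothetical ceph.conf placement (the section is an assumption: `[client]` covers RBD/rados clients, `[global]` would cover everything) looks like:

```ini
[client]
objecter_inflight_ops = 5120
objecter_inflight_op_bytes = 524288000
```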

Some more tuning tips in the presentation by Wido/Piotr that might be
useful:
https://static.sched.com/hosted_files/cephalocon2019/d6/ceph%20on%20nvme%20barcelona%202019.pdf

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] performance in a small cluster

2019-05-25 Thread Marc Roos
8.86, stdev=137.19
clat percentiles (usec):
 |  1.00th=[  265],  5.00th=[  310], 10.00th=[  351], 20.00th=[  445],
 | 30.00th=[  494], 40.00th=[  519], 50.00th=[  537], 60.00th=[  562],
 | 70.00th=[  594], 80.00th=[  644], 90.00th=[  701], 95.00th=[  742],
 | 99.00th=[  816], 99.50th=[  840], 99.90th=[  914], 99.95th=[ 1172],
 | 99.99th=[ 2442]
   bw (KiB/s): min= 4643, max= 7991, per=79.54%, avg=5767.26, stdev=1080.89, samples=359
   iops: min= 1160, max= 1997, avg=1441.43, stdev=270.23, samples=359
  lat (usec): 250=0.57%, 500=31.98%, 750=62.92%, 1000=4.46%
  lat (msec): 2=0.05%, 4=0.01%, 10=0.01%, 50=0.01%
  cpu: usr=1.07%, sys=2.69%, ctx=327838, majf=0, minf=76
  IO depths: 1=116.9%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=326298,0,0, short=0,0,0, dropped=0,0,0
     latency: target=0, window=0, percentile=100.00%, depth=1

[3] cephfs (rates for 4k random are kB/s, all others MB/s)

Workload         | Cephfs ssd rep. 3  | Cephfs ssd rep. 1  | Samsung MZK7KM480 480GB
                 | lat   iops   rate  | lat   iops   rate  | lat   iops   rate
4k rand read     | 2.78  1781   7297  | 0.54  1809   7412  | 0.09  10.2k  41600
4k rand write    | 1.42  700    2871  | 0.8   1238   5071  | 0.05  17.9k  73200
4k seq read      | 0.29  3314   13.6  | 0.29  3325   13.6  | 0.05  18k    77.6
4k seq write     | 0.04  889    3.64  | 0.56  1761   7.21  | 0.05  18.3k  75.1
1024k rand read  | 4.3   231    243   | 4.27  233    245   | 2.06  482    506
1024k rand write | 0.08  132    139   | 4.34  229    241   | 2.16  460    483
1024k seq read   | 4.23  235    247   | 4.21  236    248   | 1.98  502    527
1024k seq write  | 6.99  142    150   | 4.34  229    241   | 2.13  466    489



-Original Message-
From: Robert Sander [mailto:r.san...@heinlein-support.de] 
Sent: Friday, May 24, 2019 15:26
To: ceph-users
Subject: Re: [ceph-users] performance in a small cluster

On 24.05.19 at 14:43, Paul Emmerich wrote:
> 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming 

> replica 3).
> 
> What we usually check in scenarios like these:
> 
> * SSD model? Lots of cheap SSDs simply can't handle more than that

The system has been newly created and is not busy at all.

We tested a single SSD without OSD on top with fio: it can do 50K IOPS 
read and 16K IOPS write.

> * Get some proper statistics such as OSD latencies, disk IO 
> utilization, etc. A benchmark without detailed performance data 
> doesn't really help to debug such a problem

Yes, that is correct, we will try to setup a perfdata gathering system.

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support -

Re: [ceph-users] performance in a small cluster

2019-05-25 Thread Marc Schöchlin
Hello Robert,

Probably the following tool provides deeper insight into what's happening on 
your OSDs:

https://github.com/scoopex/ceph/blob/master/src/tools/histogram_dump.py
https://github.com/ceph/ceph/pull/28244
https://user-images.githubusercontent.com/288876/58368661-410afa00-7ef0-11e9-9aca-b09d974024a7.png

Monitoring virtual machine/client behavior in a comparable way would also be a 
good thing.

@All: Do you know suitable tools?

  * kernel rbd
  * rbd-nbd
  * linux native (i.e. if you want to analyze from inside a KVM or Xen VM)

(the output of "iostat -N -d -x -t -m 10" seems not to be enough for detailed 
analytics)

Regards
Marc

On 24.05.19 at 13:22, Robert Sander wrote:
> Hi,
>
> we have a small cluster at a customer's site with three nodes and 4 SSD-OSDs 
> each.
> Connected with 10G the system is supposed to perform well.
>
> rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB objects 
> but only 20MB/s write and 95MB/s read with 4KB objects.
>
> This is a little bit disappointing as the 4K performance is also seen in KVM 
> VMs using RBD.
>
> Is there anything we can do to improve performance with small objects / block 
> sizes?
>
> Jumbo frames have already been enabled.
>
> 4MB objects write:
>
> Total time run: 30.218930
> Total writes made:  3391
> Write size: 4194304
> Object size:    4194304
> Bandwidth (MB/sec): 448.858
> Stddev Bandwidth:   63.5044
> Max bandwidth (MB/sec): 552
> Min bandwidth (MB/sec): 320
> Average IOPS:   112
> Stddev IOPS:    15
> Max IOPS:   138
> Min IOPS:   80
> Average Latency(s): 0.142475
> Stddev Latency(s):  0.0990132
> Max latency(s): 0.814715
> Min latency(s): 0.0308732
>
> 4MB objects rand read:
>
> Total time run:   30.169312
> Total reads made: 7223
> Read size:    4194304
> Object size:  4194304
> Bandwidth (MB/sec):   957.662
> Average IOPS: 239
> Stddev IOPS:  23
> Max IOPS: 272
> Min IOPS: 175
> Average Latency(s):   0.0653696
> Max latency(s):   0.517275
> Min latency(s):   0.00201978
>
> 4K objects write:
>
> Total time run: 30.002628
> Total writes made:  165404
> Write size: 4096
> Object size:    4096
> Bandwidth (MB/sec): 21.5351
> Stddev Bandwidth:   2.0575
> Max bandwidth (MB/sec): 22.4727
> Min bandwidth (MB/sec): 11.0508
> Average IOPS:   5512
> Stddev IOPS:    526
> Max IOPS:   5753
> Min IOPS:   2829
> Average Latency(s): 0.00290095
> Stddev Latency(s):  0.0015036
> Max latency(s): 0.0778454
> Min latency(s): 0.00174262
>
> 4K objects read:
>
> Total time run:   30.000538
> Total reads made: 1064610
> Read size:    4096
> Object size:  4096
> Bandwidth (MB/sec):   138.619
> Average IOPS: 35486
> Stddev IOPS:  3776
> Max IOPS: 42208
> Min IOPS: 26264
> Average Latency(s):   0.000443905
> Max latency(s):   0.0123462
> Min latency(s):   0.000123081
>
>
> Regards


Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Maged Mokhtar

Hi Robert

1) Can you specify how many threads were used in the 4k write rados test? 
I suspect that only 16 threads were used, because that is the default. 
Also, the average latency was 2.9 ms, giving an average of 344 IOPS per 
thread; your average IOPS were 5512, and dividing that by 344 gives 16.02. 
If this is the case then it is too low: you have 12 OSDs, so you need to 
use 64 or 128 threads to put a couple of threads on each OSD to stress 
it. Use the -t option to specify the thread count. It is also better if 
you can run more than one client process, preferably from different 
hosts, and add up the total IOPS.
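The arithmetic above can be sanity-checked in a few lines (values taken from the rados bench output quoted below):

```python
# At ~2.9 ms average latency, one synchronous rados bench thread can do
# roughly 1/latency IOPS, so the total IOPS implies the thread count.
avg_latency_s = 0.00290095   # "Average Latency(s)" from the 4K write run
total_iops = 5512            # "Average IOPS" from the same run

iops_per_thread = 1 / avg_latency_s            # ~345 IOPS per thread
implied_threads = total_iops / iops_per_thread # ~16, the rados bench default

print(round(iops_per_thread))   # -> 345
print(round(implied_threads))   # -> 16
```

With rados bench the thread count is raised via -t, e.g. `rados bench -p <pool> 30 write -t 64` (pool name is a placeholder).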


2) The read latency you see of 0.4 ms is good. The write latency of 2.9 
ms is not very good but not terrible: a fast all-flash BlueStore system 
should give around 1 to 1.5 ms write latency (i.e. from around 600 to 1000 
IOPS per thread); some users are able to go below 1 ms, but it is not 
easy. The disk model, as well as tuning your CPU C-states and P-state 
frequency, will help reduce latency; there are several threads on this 
mailing list that go into this in great detail, and also search for a 
presentation by Nick Fisk.


3) Running a simple tool like atop while doing the tests can also reveal 
a lot about where the bottlenecks are; the % utilization of disks and 
CPUs is important. However, I expect that if you were using only 16 
threads they will not be highly utilized, as the dominant factor would 
be latency, as noted earlier.


/Maged


On 24/05/2019 13:22, Robert Sander wrote:

Hi,

we have a small cluster at a customer's site with three nodes and 4 
SSD-OSDs each.

Connected with 10G the system is supposed to perform well.

rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB 
objects but only 20MB/s write and 95MB/s read with 4KB objects.


This is a little bit disappointing as the 4K performance is also seen 
in KVM VMs using RBD.


Is there anything we can do to improve performance with small objects 
/ block sizes?


Jumbo frames have already been enabled.

4MB objects write:

Total time run: 30.218930
Total writes made:  3391
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 448.858
Stddev Bandwidth:   63.5044
Max bandwidth (MB/sec): 552
Min bandwidth (MB/sec): 320
Average IOPS:   112
Stddev IOPS:    15
Max IOPS:   138
Min IOPS:   80
Average Latency(s): 0.142475
Stddev Latency(s):  0.0990132
Max latency(s): 0.814715
Min latency(s): 0.0308732

4MB objects rand read:

Total time run:   30.169312
Total reads made: 7223
Read size:    4194304
Object size:  4194304
Bandwidth (MB/sec):   957.662
Average IOPS: 239
Stddev IOPS:  23
Max IOPS: 272
Min IOPS: 175
Average Latency(s):   0.0653696
Max latency(s):   0.517275
Min latency(s):   0.00201978

4K objects write:

Total time run: 30.002628
Total writes made:  165404
Write size: 4096
Object size:    4096
Bandwidth (MB/sec): 21.5351
Stddev Bandwidth:   2.0575
Max bandwidth (MB/sec): 22.4727
Min bandwidth (MB/sec): 11.0508
Average IOPS:   5512
Stddev IOPS:    526
Max IOPS:   5753
Min IOPS:   2829
Average Latency(s): 0.00290095
Stddev Latency(s):  0.0015036
Max latency(s): 0.0778454
Min latency(s): 0.00174262

4K objects read:

Total time run:   30.000538
Total reads made: 1064610
Read size:    4096
Object size:  4096
Bandwidth (MB/sec):   138.619
Average IOPS: 35486
Stddev IOPS:  3776
Max IOPS: 42208
Min IOPS: 26264
Average Latency(s):   0.000443905
Max latency(s):   0.0123462
Min latency(s):   0.000123081


Regards




Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Paul Emmerich
On Sat, May 25, 2019 at 12:30 AM Mark Lehrer  wrote:

> > but only 20MB/s write and 95MB/s read with 4KB objects.
>
> There is copy-on-write overhead for each block, so 4K performance is
> going to be limited no matter what.
>

No snapshots are involved, and he's using rados bench, which operates on
block sizes as specified, so no partial updates are involved.

This workload basically goes straight into the WAL for up to 512 MB, so it's
virtually identical to running the standard fio benchmark for Ceph disks.
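That fio benchmark is commonly run along these lines (a sketch; `/dev/sdX` is a placeholder for an unused disk, and note that this write test destroys data on it):

```
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=ceph-disk-test
```

A drive that sustains well under a few thousand IOPS here will bottleneck BlueStore writes regardless of any Ceph-side tuning.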


>
> However, if your system is like mine the main problem you will run
> into is that Ceph was designed for spinning disks.  Therefore, its
> main goal is to make sure that no individual OSD is doing more than
> one or two things at a time no matter what.  Unfortunately, SSDs
> typically don't show best performance until you are doing 20+
> simultaneous I/Os (especially if you use a small block size).
>

No, there are different defaults for number of threads and other tuning
parameters since Luminous.


>
> You can see this most clearly with iostat (run "iostat -mtxy 1" on one
> of your OSD nodes) and a high queue depth 4K workload.  You'll notice
> that even though the client is trying to do many things at a time, the
> OSD node is practically idle.  Especially problematic is the fact that
> iostat will stay below 1 in the "avgqu-sz" column and the utilization
> % will be very low.  This makes it look like a thread semaphore kind
> of problem to me... and increasing the number of clients doesn't seem
> to make the OSDs work any harder.
>

RocksDB WAL uses 4 threads/WALs by default IIRC, you can change that
in bluestore_rocksdb_options. Yes, that is often a bottleneck and is one of
the standard options to tune to get the most IOPS out of NVMe disks.
Well, that and creating more partitions/OSDs on a single disk.
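To see what you are starting from, the current option string can be read from a running OSD; the override below is only a sketch (the values are illustrative assumptions, not a recommendation -- verify the defaults against your release):

```
# Inspect the current value on a running OSD:
ceph daemon osd.0 config get bluestore_rocksdb_options

# Example override in ceph.conf (illustrative values only):
# [osd]
# bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=8,min_write_buffer_number_to_merge=2,recycle_log_file_num=8
```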


But the main problem is that you want to write your data for real. Many
SSDs are just bad at writing small chunks of data.
These benchmark results simply look like a case of a slow disk.


>
> I still haven't found a good solution unfortunately but definitely
> keep an eye on the queue size and util% in iostat -- SSD bandwidth &
> iops depend on maximizing the number of parallel I/O operations.  If
> anyone has hints on improving Ceph threading I would love to figure
> this one out.
>

Agreed, everyone should monitor util%



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


>
>
> On Fri, May 24, 2019 at 5:23 AM Robert Sander
>  wrote:
> >
> > Hi,
> >
> > we have a small cluster at a customer's site with three nodes and 4
> > SSD-OSDs each.
> > Connected with 10G the system is supposed to perform well.
> >
> > rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB
> > objects but only 20MB/s write and 95MB/s read with 4KB objects.
> >
> > This is a little bit disappointing as the 4K performance is also seen in
> > KVM VMs using RBD.
> >
> > Is there anything we can do to improve performance with small objects /
> > block sizes?
> >
> > Jumbo frames have already been enabled.
> >
> > 4MB objects write:
> >
> > Total time run: 30.218930
> > Total writes made:  3391
> > Write size: 4194304
> > Object size:4194304
> > Bandwidth (MB/sec): 448.858
> > Stddev Bandwidth:   63.5044
> > Max bandwidth (MB/sec): 552
> > Min bandwidth (MB/sec): 320
> > Average IOPS:   112
> > Stddev IOPS:15
> > Max IOPS:   138
> > Min IOPS:   80
> > Average Latency(s): 0.142475
> > Stddev Latency(s):  0.0990132
> > Max latency(s): 0.814715
> > Min latency(s): 0.0308732
> >
> > 4MB objects rand read:
> >
> > Total time run:   30.169312
> > Total reads made: 7223
> > Read size:4194304
> > Object size:  4194304
> > Bandwidth (MB/sec):   957.662
> > Average IOPS: 239
> > Stddev IOPS:  23
> > Max IOPS: 272
> > Min IOPS: 175
> > Average Latency(s):   0.0653696
> > Max latency(s):   0.517275
> > Min latency(s):   0.00201978
> >
> > 4K objects write:
> >
> > Total time run: 30.002628
> > Total writes made:  165404
> > Write size: 4096
> > Object size:4096
> > Bandwidth (MB/sec): 21.5351
> > Stddev Bandwidth:   2.0575
> > Max bandwidth (MB/sec): 22.4727
> > Min bandwidth (MB/sec): 11.0508
> > Average IOPS:   5512
> > Stddev IOPS:526
> > Max IOPS:   5753
> > Min IOPS:   2829
> > Average Latency(s): 0.00290095
> > Stddev Latency(s):  0.0015036
> > Max latency(s): 0.0778454
> > Min latency(s): 0.00174262
> >
> > 4K objects read:
> >
> > Total time run:   30.000538
> > Total reads made: 1064610
> > Read size:4096
> > Object size:  4096
> > Bandwidth (MB/sec):   138.619
> > Average 

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Mark Lehrer
> but only 20MB/s write and 95MB/s read with 4KB objects.

There is copy-on-write overhead for each block, so 4K performance is
going to be limited no matter what.

However, if your system is like mine the main problem you will run
into is that Ceph was designed for spinning disks.  Therefore, its
main goal is to make sure that no individual OSD is doing more than
one or two things at a time no matter what.  Unfortunately, SSDs
typically don't show best performance until you are doing 20+
simultaneous I/Os (especially if you use a small block size).

You can see this most clearly with iostat (run "iostat -mtxy 1" on one
of your OSD nodes) and a high queue depth 4K workload.  You'll notice
that even though the client is trying to do many things at a time, the
OSD node is practically idle.  Especially problematic is the fact that
iostat will stay below 1 in the "avgqu-sz" column and the utilization
% will be very low.  This makes it look like a thread semaphore kind
of problem to me... and increasing the number of clients doesn't seem
to make the OSDs work any harder.

I still haven't found a good solution unfortunately but definitely
keep an eye on the queue size and util% in iostat -- SSD bandwidth &
iops depend on maximizing the number of parallel I/O operations.  If
anyone has hints on improving Ceph threading I would love to figure
this one out.
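
The observation above can be reproduced with a quick two-terminal sketch; the
device path, RBD image, and fio flag values below are placeholders to adapt,
not a definitive recipe:

```shell
# Terminal 1 (on an OSD node): watch per-device queue depth and utilization.
# If the OSD is really driving the SSD, "avgqu-sz" should climb well above 1
# and %util should approach 100.
iostat -mtxy 1

# Terminal 2 (on a client): generate a high-queue-depth 4K random-write load.
# /dev/rbd0 is an example mapped test image -- destructive, use a scratch image.
fio --name=qd-test --filename=/dev/rbd0 \
    --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k \
    --iodepth=32 --numjobs=4 \
    --time_based --runtime=60 --group_reporting
```

If avgqu-sz on the OSD devices stays near zero while the client reports a deep
queue, the serialization is happening in the Ceph client/OSD path rather than
at the disks.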


On Fri, May 24, 2019 at 5:23 AM Robert Sander
 wrote:
>
> Hi,
>
> we have a small cluster at a customer's site with three nodes and 4
> SSD-OSDs each.
> Connected with 10G the system is supposed to perform well.
>
> rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB
> objects but only 20MB/s write and 95MB/s read with 4KB objects.
>
> This is a little bit disappointing as the 4K performance is also seen in
> KVM VMs using RBD.
>
> Is there anything we can do to improve performance with small objects /
> block sizes?
>
> Jumbo frames have already been enabled.
>
> 4MB objects write:
>
> Total time run: 30.218930
> Total writes made:  3391
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 448.858
> Stddev Bandwidth:   63.5044
> Max bandwidth (MB/sec): 552
> Min bandwidth (MB/sec): 320
> Average IOPS:   112
> Stddev IOPS:15
> Max IOPS:   138
> Min IOPS:   80
> Average Latency(s): 0.142475
> Stddev Latency(s):  0.0990132
> Max latency(s): 0.814715
> Min latency(s): 0.0308732
>
> 4MB objects rand read:
>
> Total time run:   30.169312
> Total reads made: 7223
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   957.662
> Average IOPS: 239
> Stddev IOPS:  23
> Max IOPS: 272
> Min IOPS: 175
> Average Latency(s):   0.0653696
> Max latency(s):   0.517275
> Min latency(s):   0.00201978
>
> 4K objects write:
>
> Total time run: 30.002628
> Total writes made:  165404
> Write size: 4096
> Object size:4096
> Bandwidth (MB/sec): 21.5351
> Stddev Bandwidth:   2.0575
> Max bandwidth (MB/sec): 22.4727
> Min bandwidth (MB/sec): 11.0508
> Average IOPS:   5512
> Stddev IOPS:526
> Max IOPS:   5753
> Min IOPS:   2829
> Average Latency(s): 0.00290095
> Stddev Latency(s):  0.0015036
> Max latency(s): 0.0778454
> Min latency(s): 0.00174262
>
> 4K objects read:
>
> Total time run:   30.000538
> Total reads made: 1064610
> Read size:4096
> Object size:  4096
> Bandwidth (MB/sec):   138.619
> Average IOPS: 35486
> Stddev IOPS:  3776
> Max IOPS: 42208
> Min IOPS: 26264
> Average Latency(s):   0.000443905
> Max latency(s):   0.0123462
> Min latency(s):   0.000123081
>
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de
>
> Tel: 030-405051-43
> Fax: 030-405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Paul Emmerich
On Fri, May 24, 2019 at 3:27 PM Robert Sander 
wrote:

> Am 24.05.19 um 14:43 schrieb Paul Emmerich:
> > 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming
> > replica 3).
> >
> > What we usually check in scenarios like these:
> >
> > * SSD model? Lots of cheap SSDs simply can't handle more than that
>
> The system has been newly created and is not busy at all.
>
> We tested a single SSD without OSD on top with fio: it can do 50K IOPS
> read and 16K IOPS write.
>

If you tell us the disk model someone here might be able to share their
experiences with that disk.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


>
> > * Get some proper statistics such as OSD latencies, disk IO utilization,
> > etc. A benchmark without detailed performance data doesn't really help
> > to debug such a problem
>
> Yes, that is correct, we will try to setup a perfdata gathering system.
>
> Regards


Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Robert LeBlanc
On Fri, May 24, 2019 at 6:26 AM Robert Sander 
wrote:

> Am 24.05.19 um 14:43 schrieb Paul Emmerich:
> > 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming
> > replica 3).
> >
> > What we usually check in scenarios like these:
> >
> > * SSD model? Lots of cheap SSDs simply can't handle more than that
>
> The system has been newly created and is not busy at all.
>
> We tested a single SSD without OSD on top with fio: it can do 50K IOPS
> read and 16K IOPS write.
>

You probably tested with async writes. Try passing sync to fio; that is much
closer to what Ceph will do, as it syncs every write to make sure it is
written to disk before acknowledging back to the client that the write is
done. When I did these tests, I also filled the entire drive and ran the test
for an hour. Most drives looked fine with short tests or small amounts of
data, but once the drive started getting full, the performance dropped off a
cliff. Considering that Ceph is really hard on drives, it's good to test the
extreme.
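
A sync-write fio run of the kind described above might look like the
following; the target device is a placeholder and the run is destructive, so
only point it at a disk you can wipe:

```shell
# Destructive: writes directly to the raw device.
# --sync=1 opens the device O_SYNC so every write hits stable storage before
# completing, which models Ceph's behaviour far better than buffered async
# writes (--fsync=1 is a common alternative).
# Long runtime + full-device coverage exposes the cliff once the drive fills.
fio --name=sync-4k-write --filename=/dev/sdX \
    --ioengine=libaio --direct=1 --sync=1 \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --time_based --runtime=3600 --group_reporting
```

Compare the steady-state IOPS from a run like this against the short-test
numbers; on drives without power-loss protection the difference can be an
order of magnitude.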

Robert LeBlanc


Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Robert Sander
Am 24.05.19 um 14:43 schrieb Paul Emmerich:
> 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming
> replica 3).
> 
> What we usually check in scenarios like these:
> 
> * SSD model? Lots of cheap SSDs simply can't handle more than that

The system has been newly created and is not busy at all.

We tested a single SSD without OSD on top with fio: it can do 50K IOPS
read and 16K IOPS write.

> * Get some proper statistics such as OSD latencies, disk IO utilization,
> etc. A benchmark without detailed performance data doesn't really help
> to debug such a problem

Yes, that is correct, we will try to setup a perfdata gathering system.

Regards


Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Paul Emmerich
20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming
replica 3).

What we usually check in scenarios like these:

* SSD model? Lots of cheap SSDs simply can't handle more than that
* Get some proper statistics such as OSD latencies, disk IO utilization,
etc. A benchmark without detailed performance data doesn't really help to
debug such a problem
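
For a first look at such statistics, Ceph's built-in commands are often
enough; "osd.0" below is just an example OSD ID, and the commands must run on
the host carrying that OSD's admin socket:

```shell
# Per-OSD commit/apply latency summary across the whole cluster.
ceph osd perf

# Detailed performance counters for one OSD via its admin socket.
ceph daemon osd.0 perf dump

# The slowest recent operations on that OSD, with per-stage timings --
# useful for spotting where in the write path the time goes.
ceph daemon osd.0 dump_historic_ops

# Disk-level utilization and queue depth on the OSD host (sysstat package).
iostat -mtxy 1
```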


Paul



On Fri, May 24, 2019 at 1:23 PM Robert Sander 
wrote:

> Hi,
>
> we have a small cluster at a customer's site with three nodes and 4
> SSD-OSDs each.
> Connected with 10G the system is supposed to perform well.
>
> rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB
> objects but only 20MB/s write and 95MB/s read with 4KB objects.
>
> This is a little bit disappointing as the 4K performance is also seen in
> KVM VMs using RBD.
>
> Is there anything we can do to improve performance with small objects /
> block sizes?
>
> Jumbo frames have already been enabled.
>