Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-17 Thread Menno Zonneveld
> > I’m running Proxmox VE 5.2 which includes ceph version 12.2.7 
> > (94ce186ac93bb28c3c444bccfefb8a31eb0748e4)
> > luminous (stable)
> 12.2.8 is in the repositories. ;)

I forgot to reply to this part; I did notice the update afterwards and have since 
updated, but performance stayed the same.

I redid all the steps (deleting the OSDs, zapping the disks, recreating the OSDs 
and letting the data resync, one server at a time) and performance is back again.
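
For completeness, the per-server cycle looks roughly like this; 'remove-osd.sh' is just a 
placeholder name for the removal script quoted further down in this thread, the device 
names are examples, and the zap/create tooling may differ on other setups:

./remove-osd.sh 3           # drain and remove one OSD (repeat for each OSD on the node)
ceph-disk zap /dev/sdX      # wipe the data disk
pveceph createosd /dev/sdX  # recreate the OSD on the wiped disk
ceph -s                     # wait for HEALTH_OK before moving on to the next server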

Any ideas or suggestions on how to provide useful information or how to debug 
this issue?

Thanks,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-13 Thread Menno Zonneveld
-Original message-
> From:Alwin Antreich 
> Sent: Thursday 13th September 2018 14:41
> To: Menno Zonneveld 
> Cc: ceph-users ; Marc Roos 
> 
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> > Am I doing something wrong? Did I run into some sort of bug?
> AFAIR, the object deletion is done in the background. Depending on how
> quickly you run the subsequent tests and how much the cluster is still
> working on recovery, the results may well vary.

I've noticed this as well, but the performance never recovers unless I recreate 
the OSDs. The storage machines show no activity on CPU or disk, so I assume the 
deletion and recovery are already done (ceph -w also shows everything is ok); 
this is something I made sure of while doing these tests.
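
Concretely, before each benchmark I check something along these lines to make sure 
recovery and deletion really have finished:

ceph -s            # HEALTH_OK, all PGs active+clean, no recovery I/O
ceph osd df tree   # per-OSD utilisation has settled
iostat -x 1 5      # on the storage nodes: disks essentially idle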

The moment the third machine finishes recovering after all its OSDs have been 
recreated, performance is back again.

Thanks,
Menno

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-13 Thread Alwin Antreich
On Thu, Sep 13, 2018 at 02:17:20PM +0200, Menno Zonneveld wrote:
> Update on the subject, warning, lengthy post but reproducible results and 
> workaround to get performance back to expected level.
> 
> One of the servers had a broken disk controller causing some performance 
> issues on this one host, FIO showed about half performance on some disks 
> compared to the other hosts, it’s been replaced but rados performance did not 
> improve.
> 
> - write: 313.86 MB/s / 0.203907ms
> 
> I figured it would be wise to test all servers and disks to see if they 
> deliver the expected performance.
> 
> Since I have data on the cluster that I wanted to keep online I did one 
> server at a time, delete the 3 OSDs, FIO test them, recreate the OSDs and add 
> them back to the cluster, wait till the cluster is healthy and on to the next 
> server. All disks match the expected performance now so with all servers and 
> OSDs up again I redid the rados benchmark.
> 
> Performance was almost twice as good as before recreating all OSDs and on par 
> with what I had expected for bandwidth and latency.
> 
> - write: 586.679 MB/s / 0.109085ms
> - read: 2092.27 MB/s / 0.0292913ms
> 
> As the controller was not to blame I wanted to test if having different size 
> OSDs with ‘correct’ weights assigned was causing the issue, I removed one OSD 
> on each storage node (one node at a time) and re-partitioned it and added it 
> back to the cluster with the correct weight, performance was still ok though 
> a little slower as before.
> 
> Figuring this wasn’t the cause either I took out the OSDs with partitions 
> again, wiped the disks and recreated the OSDs. Performance now was even 
> lower, almost as low as when I just swapped the controller.
> 
> Since I knew the performance could be better I decided to recreate all OSDs 
> one server at a time and performance once again was good.
> 
> Since now I was able to reproduce the issue I started once more and document 
> all the steps to see if there is any logic to the issue.
> 
> With the cluster performing well I started removing one OSD at a time, wait 
> for the cluster to become healthy again, benchmark, add it back and on to the 
> next server.
> 
> These are the results of each step.
> 
> One OSD out:
> 
> write: 528.439 / 0.121021
> read: 2022.19 / 0.03031
> 
> OSD back in again:
> 
> write: 584.14 / 0.10956
> read: 2108.06 / 0.0289867
> 
> Next server, one OSD out:
> 
> write: 482.923 / 0.132512
> read: 2008.38 / 0.0305356
> 
> OSD back in again:
> 
> write: 578.034 / 0.110686
> read: 2059.24 / 0.0297554
> 
> Next server, one OSD out:
> 
> write: 470.384 / 0.136055
> read: 2043.68 / 0.0299759
> 
> OSD back in again:
> 
> write: 424.01 / 0.150886
> read: 2086.94 / 0.0293182
> 
> Write performance now is significantly lower as when I started. When I first 
> wrote on the mailing list performance seems to go up once CEPH enters 
> 'near-full' state so I decided to test that again.
> 
> I reached full by accident and the last two write tests showed somewhat 
> better performance but not near the level I started with.
> 
> write: 468.632 / 0.136559
> write: 488.523 / 0.130999
> 
> I removed the benchmark pool and recreated it, testing a few more times, 
> performance now seems even lower again and again near the results I started 
> off with.
> 
> write: 449.034 / 0.142524
> write: 399.532 / 0.160168
> write: 366.831 / 0.174451
> 
> I know how to get the performance back to the expected level by recreating 
> all OSDs and shuffling data around the cluster but I don’t think this should 
> happen in the first place.
> 
> Just to clarify when removing an OSD I reweigh it to 0, wait for it’s safe to 
> delete the OSD, I assume this is the correct way of doing such things.
> 
> Am I doing something wrong? Did I run into some sort of bug?
AFAIR, the object deletion is done in the background. Depending on how
quickly you run the subsequent tests and how much the cluster is still
working on recovery, the results may well vary.

> 
> I’m running Proxmox VE 5.2 which includes ceph version 12.2.7 
> (94ce186ac93bb28c3c444bccfefb8a31eb0748e4) luminous (stable)
12.2.8 is in the repositories. ;)

> 
> Thanks,
> Menno
> 
> 
> My script to safely remove an OSD:
> 
> ceph osd crush reweight osd.$1 0.0
> while ! ceph osd safe-to-destroy $1; do echo "not safe to destroy, waiting.." 
> ; sleep 10 ; done
> sleep 5
> ceph osd out $1
> systemctl disable ceph-osd@$1
> systemctl stop ceph-osd@$1
> ceph osd crush remove osd.$1
> ceph auth del osd.$1
> ceph osd down $1
> ceph osd rm $1
> 
--
Cheers,
Alwin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-13 Thread Menno Zonneveld
An update on the subject. Warning: lengthy post, but it contains reproducible results 
and a workaround to get performance back to the expected level.

One of the servers had a broken disk controller causing some performance issues on 
that host; FIO showed about half the performance on some disks compared to the other 
hosts. The controller has been replaced, but rados performance did not improve.

- write: 313.86 MB/s / 0.203907 s

I figured it would be wise to test all servers and disks to see if they deliver 
the expected performance.

Since I have data on the cluster that I wanted to keep online, I did one server at a 
time: delete its 3 OSDs, FIO-test the disks, recreate the OSDs, add them back to the 
cluster, wait until the cluster is healthy and move on to the next server. All disks 
now match the expected performance, so with all servers and OSDs up again I redid the 
rados benchmark.

Performance was almost twice as good as before recreating all OSDs and on par 
with what I had expected for bandwidth and latency.

- write: 586.679 MB/s / 0.109085 s
- read: 2092.27 MB/s / 0.0292913 s

As the controller was not to blame, I wanted to test whether having different-sized 
OSDs with 'correct' weights assigned was causing the issue. I removed one OSD on each 
storage node (one node at a time), re-partitioned it and added it back to the cluster 
with the correct weight; performance was still ok, though a little slower than before.

Figuring this wasn't the cause either, I took out the OSDs with partitions again, 
wiped the disks and recreated the OSDs. Performance was now even lower, almost as low 
as right after I swapped the controller.

Since I knew the performance could be better, I decided to recreate all OSDs one 
server at a time, and performance was once again good.

Since I was now able to reproduce the issue, I started once more and documented all 
the steps to see if there is any logic to it.

With the cluster performing well, I started removing one OSD at a time: wait for the 
cluster to become healthy again, benchmark, add the OSD back, and move on to the next 
server.

These are the results of each step (write/read bandwidth in MB/s, average latency in seconds).

One OSD out:

write: 528.439 / 0.121021
read: 2022.19 / 0.03031

OSD back in again:

write: 584.14 / 0.10956
read: 2108.06 / 0.0289867

Next server, one OSD out:

write: 482.923 / 0.132512
read: 2008.38 / 0.0305356

OSD back in again:

write: 578.034 / 0.110686
read: 2059.24 / 0.0297554

Next server, one OSD out:

write: 470.384 / 0.136055
read: 2043.68 / 0.0299759

OSD back in again:

write: 424.01 / 0.150886
read: 2086.94 / 0.0293182

Write performance is now significantly lower than when I started. When I first wrote 
to the mailing list, performance seemed to go up once CEPH entered the 'near-full' 
state, so I decided to test that again.

I reached the full state by accident, and the last two write tests showed somewhat 
better performance, but nowhere near the level I started with.

write: 468.632 / 0.136559
write: 488.523 / 0.130999

I removed the benchmark pool, recreated it and tested a few more times; performance 
now seems even lower, once again close to the results I started off with.

write: 449.034 / 0.142524
write: 399.532 / 0.160168
write: 366.831 / 0.174451

I know how to get the performance back to the expected level by recreating all OSDs 
and shuffling data around the cluster, but I don't think this should be happening in 
the first place.

Just to clarify: when removing an OSD I reweight it to 0 and wait until it is safe to 
destroy before deleting it; I assume this is the correct way of doing this.

Am I doing something wrong? Did I run into some sort of bug?

I’m running Proxmox VE 5.2 which includes ceph version 12.2.7 
(94ce186ac93bb28c3c444bccfefb8a31eb0748e4) luminous (stable)

Thanks,
Menno


My script to safely remove an OSD:

# $1 = numeric OSD id (e.g. "3" for osd.3)
ceph osd crush reweight osd.$1 0.0   # drain the OSD by taking away its CRUSH weight
while ! ceph osd safe-to-destroy $1; do
    echo "not safe to destroy, waiting.."
    sleep 10
done
sleep 5
ceph osd out $1                      # mark the OSD out
systemctl disable ceph-osd@$1        # stop and disable the daemon
systemctl stop ceph-osd@$1
ceph osd crush remove osd.$1         # remove it from the CRUSH map
ceph auth del osd.$1                 # drop its auth key
ceph osd down $1
ceph osd rm $1                       # remove the OSD from the cluster

-Original message-
> From:Menno Zonneveld 
> Sent: Monday 10th September 2018 11:45
> To: Alwin Antreich ; ceph-users 
> 
> Cc: Marc Roos 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> 
> -Original message-
> > From:Alwin Antreich 
> > Sent: Thursday 6th September 2018 18:36
> > To: ceph-users 
> > Cc: Menno Zonneveld ; Marc Roos 
> > Subject: Re: [ceph-users] Rados performance inconsistencies, lower than
> expected performance
> > 
> > On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> > > 
> > > It is idle, testing still, running a backup's at night on it.
> > > How do you fill up the cluster so you can test between empty and full?
> 
> > > Do you have a "ceph df" from empty and full? 
> > > 
> > > I have done anot

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-10 Thread Menno Zonneveld


-Original message-
> From:Alwin Antreich 
> Sent: Thursday 6th September 2018 18:36
> To: ceph-users 
> Cc: Menno Zonneveld ; Marc Roos 
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> > 
> > It is idle, testing still, running a backup's at night on it.
> > How do you fill up the cluster so you can test between empty and full? 
> > Do you have a "ceph df" from empty and full? 
> > 
> > I have done another test disabling new scrubs on the rbd.ssd pool (but 
> > still 3 on hdd) with:
> > ceph tell osd.* injectargs --osd_max_backfills=0
> > Again getting slower towards the end.
> > Bandwidth (MB/sec): 395.749
> > Average Latency(s): 0.161713
> In the results you both had, the latency is twice as high as in our
> tests [1]. That can already make quite some difference. Depending on the
> actual hardware used, there may or may not be the possibility for good
> optimisation.
> 
> As a start, you could test the disks with fio, as shown in our benchmark
> paper, to get some results for comparison. The forum thread [1] has
> some benchmarks from other users for comparison.
> 
> [1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

Thanks for the suggestion. I redid the fio tests and one server seems to be 
causing trouble.

When I initially tested our SSDs according to the benchmark paper, our Intel 
SSDs performed more or less on par with the Samsung SSDs used there.

from fio.log

fio: (groupid=0, jobs=1): err= 0: pid=3606315: Mon Sep 10 11:12:36 2018
  write: io=4005.9MB, bw=68366KB/s, iops=17091, runt= 60001msec
slat (usec): min=5, max=252, avg= 5.76, stdev= 0.66
clat (usec): min=6, max=949, avg=51.72, stdev= 9.54
 lat (usec): min=54, max=955, avg=57.48, stdev= 9.56

However, one of the other machines (with identical SSDs) now performs poorly 
compared to the others, with these results:

fio: (groupid=0, jobs=1): err= 0: pid=3893600: Mon Sep 10 11:15:17 2018
  write: io=1258.8MB, bw=51801KB/s, iops=12950, runt= 24883msec
slat (usec): min=5, max=259, avg= 6.17, stdev= 0.78
clat (usec): min=53, max=857, avg=69.77, stdev=13.11
 lat (usec): min=70, max=863, avg=75.93, stdev=13.17

I'll first resolve the slower machine before doing more testing, as it surely 
won't help overall performance.
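
For reference, the fio runs above follow roughly the 4k write test from the benchmark 
paper; the device name is an example and the test overwrites the disk's contents:

fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=fio-4k-write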


> --
> Cheers,
> Alwin

Thanks!,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-10 Thread Menno Zonneveld
I filled up the cluster by accident by not supplying --no-cleanup to the write 
benchmark; I'm sure there must be a better way to do that, though.
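
A more controlled way to do the same thing would probably be to fill the pool explicitly 
and remove the benchmark objects afterwards, something like:

rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup   # leaves the objects behind
rados -p rbdbench cleanup                                    # removes them again when done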

I've run the tests again, both when the cluster is 'empty' (I have a few test 
VMs stored on CEPH) and after letting it fill up again.

Performance goes up from 276.812 to 433.859 MB/s and average latency goes down from 
0.231178 s to 0.147433 s.

I do have to mention that I found a problem with the cluster thanks to Alwin's 
suggestion to (re)do the fio benchmarks: one server with identical SSDs is 
performing poorly compared to the others. I'll resolve this first before 
continuing with other benchmarks.

When empty:

# ceph df

GLOBAL:
    SIZE  AVAIL RAW USED %RAW USED 
    3784G 2488G    1295G 34.24 
POOLS:
    NAME ID USED %USED MAX AVAIL OBJECTS 
    ssd  1  431G 37.33  723G  110984 
    rbdbench 76    0 0  723G   0 

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup

Total time run: 180.223580
Total writes made:  12472
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 276.812
Stddev Bandwidth:   66.2295
Max bandwidth (MB/sec): 524
Min bandwidth (MB/sec): 112
Average IOPS:   69
Stddev IOPS:    16
Max IOPS:   131
Min IOPS:   28
Average Latency(s): 0.231178
Stddev Latency(s):  0.19153
Max latency(s): 1.16432
Min latency(s): 0.022585

And after a few benchmarks, when I hit CEPH's near-full warning:

# ceph df

GLOBAL:
    SIZE  AVAIL RAW USED %RAW USED 
    3784G  751G    3032G 80.13 
POOLS:
    NAME ID USED %USED MAX AVAIL OBJECTS 
    ssd  1  431G 82.93    90858M  110984 
    rbdbench 76 579G 86.73    90858M  148467 

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup

Total time run: 180.233495
Total writes made:  19549
Write size: 4194304
Object size:    4194304
Bandwidth (MB/sec): 433.859
Stddev Bandwidth:   73.0601
Max bandwidth (MB/sec): 584
Min bandwidth (MB/sec): 220
Average IOPS:   108
Stddev IOPS:    18
Max IOPS:   146
Min IOPS:   55
Average Latency(s): 0.147433
Stddev Latency(s):  0.103518
Max latency(s): 1.08162
Min latency(s): 0.0218688


-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 17:15
> To: ceph-users ; Menno Zonneveld 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> 
> It is idle, testing still, running a backup's at night on it.
> How do you fill up the cluster so you can test between empty and full? 
> Do you have a "ceph df" from empty and full? 
> 
> I have done another test disabling new scrubs on the rbd.ssd pool (but 
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec): 395.749
> Average Latency(s): 0.161713
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com] 
> Sent: donderdag 6 september 2018 16:56
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> The benchmark does fluctuate quite a bit that's why I run it for 180 
> seconds now as then I do get consistent results.
> 
> Your performance seems on par with what I'm getting with 3 nodes and 9 
> OSD's, not sure what to make of that.
> 
> Are your machines actively used perhaps? Mine are mostly idle as it's 
> still a test setup.
> 
> -Original message-
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 16:23
> > To: ceph-users ; Menno Zonneveld 
> > 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> > than expected performance
> > 
> > 
> > 
> > I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB 2x E5-2660 2x 
> 
> > LSI SAS2308 1x dual port 10Gbit (one used, and shared between 
> > cluster/client vlans)
> > 
> > I have 5 pg's scrubbing, but I am not sure if there is any on the ssd 
> > pool. I am noticing a drop in the performance at the end of the test.
> > Maybe some caching on the ssd?
> > 
> > rados bench -p rbd.ssd 60 write -b 4M -t 16
> > Bandwidth (MB/sec): 448.465
> > Average Latency(s): 0.142671
> > 
> > rados bench -p rbd.ssd 180 write -b 4M -t 16
> > Bandwidth (MB/sec):     381.998
> > Average Latency(s):     0.167524
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-07 Thread Marc Roos
 

The fio results for the samsung sm863:


write-4k-seq: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1
randwrite-4k-seq: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
read-4k-seq: (g=2): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1
randread-4k-seq: (g=3): rw=randread, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
rw-4k-seq: (g=4): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1
randrw-4k-seq: (g=5): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, 
(T) 4096B-4096B, ioengine=libaio, iodepth=1
write-128k-seq: (g=6): rw=write, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randwrite-128k-seq: (g=7): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
read-128k-seq: (g=8): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, 
(T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randread-128k-seq: (g=9): rw=randread, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
rw-128k-seq: (g=10): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 
128KiB-128KiB, ioengine=libaio, iodepth=1
randrw-128k-seq: (g=11): rw=randrw, bs=(R) 128KiB-128KiB, (W) 
128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
write-1024k-seq: (g=12): rw=write, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randwrite-1024k-seq: (g=13): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
read-1024k-seq: (g=14): rw=read, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randread-1024k-seq: (g=15): rw=randread, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
rw-1024k-seq: (g=16): rw=rw, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randrw-1024k-seq: (g=17): rw=randrw, bs=(R) 1024KiB-1024KiB, (W) 
1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
fio-3.1
Starting 18 processes

write-4k-seq: (groupid=0, jobs=1): err= 0: pid=522702: Thu Sep  6 
21:04:12 2018
  write: IOPS=18.3k, BW=71.6MiB/s (75.1MB/s)(12.6GiB/180001msec)
slat (usec): min=4, max=118, avg= 9.69, stdev= 4.91
clat (nsec): min=1147, max=647409, avg=42105.25, stdev=8700.49
 lat (usec): min=34, max=662, avg=52.09, stdev= 9.30
clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   35], 10.00th=[   35], 20.00th=[   35],
 | 30.00th=[   36], 40.00th=[   36], 50.00th=[   39], 60.00th=[   43],
 | 70.00th=[   47], 80.00th=[   52], 90.00th=[   56], 95.00th=[   58],
 | 99.00th=[   60], 99.50th=[   62], 99.90th=[   67], 99.95th=[   69],
 | 99.99th=[  155]
   bw (  KiB/s): min=36464, max=92829, per=84.59%, avg=62036.62, stdev=13226.37, samples=359
   iops: min= 9116, max=23207, avg=15508.80, stdev=3306.62, samples=359
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=75.33%, 100=24.64%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  cpu  : usr=11.41%, sys=27.84%, ctx=3300541, majf=0, minf=51
  IO depths: 1=116.7%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued rwt: total=0,3300219,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1
randwrite-4k-seq: (groupid=1, jobs=1): err= 0: pid=522903: Thu Sep  6 
21:04:12 2018
  write: IOPS=17.9k, BW=69.8MiB/s (73.2MB/s)(12.3GiB/180001msec)
slat (usec): min=4, max=333, avg= 9.94, stdev= 5.00
clat (nsec): min=1141, max=1131.2k, avg=42560.69, stdev=9074.14
 lat (usec): min=35, max=1137, avg=52.80, stdev= 9.42
clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   35], 10.00th=[   35], 20.00th=[   35],
 | 30.00th=[   36], 40.00th=[   36], 50.00th=[   41], 60.00th=[   43],
 | 70.00th=[   49], 80.00th=[   54], 90.00th=[   57], 95.00th=[   58],
 | 99.00th=[   60], 99.50th=[   62], 99.90th=[   67], 99.95th=[   70],
 | 99.99th=[  174]
   bw (  KiB/s): min=34338, max=92268, per=84.26%, avg=60268.13, stdev=12283.36, samples=359
   iops: min= 8584, max=23067, avg=15066.67, stdev=3070.87, samples=359
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=71.73%, 100=28.24%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%
  cpu  : usr=12.96%, sys=26.87%, ctx=3218988, majf=0, minf=10962
  IO depths: 1=116.8%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Alwin Antreich
On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> 
> It is idle, testing still, running a backup's at night on it.
> How do you fill up the cluster so you can test between empty and full? 
> Do you have a "ceph df" from empty and full? 
> 
> I have done another test disabling new scrubs on the rbd.ssd pool (but 
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec): 395.749
> Average Latency(s): 0.161713
In the results you both had, the latency is twice as high as in our
tests [1]. That can already make quite some difference. Depending on the
actual hardware used, there may or may not be the possibility for good
optimisation.

As a start, you could test the disks with fio, as shown in our benchmark
paper, to get some results for comparison. The forum thread [1] has
some benchmarks from other users for comparison.

[1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

--
Cheers,
Alwin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Marc Roos

It is idle, still testing; just running backups on it at night.
How do you fill up the cluster so you can test between empty and full? 
Do you have a "ceph df" from empty and full? 

I have done another test disabling new scrubs on the rbd.ssd pool (but 
still 3 on hdd) with:
ceph tell osd.* injectargs --osd_max_backfills=0
Again getting slower towards the end.
Bandwidth (MB/sec): 395.749
Average Latency(s): 0.161713
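
Maybe next time I should rule out scrubbing entirely by setting the cluster-wide 
flags for the duration of the test, something like:

ceph osd set noscrub
ceph osd set nodeep-scrub
# ... run the benchmark ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub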


-Original Message-
From: Menno Zonneveld [mailto:me...@1afa.com] 
Sent: donderdag 6 september 2018 16:56
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
expected performance

The benchmark does fluctuate quite a bit that's why I run it for 180 
seconds now as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 
OSD's, not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's 
still a test setup.

-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users ; Menno Zonneveld 
> 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> than expected performance
> 
> 
> 
> I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB 2x E5-2660 2x 

> LSI SAS2308 1x dual port 10Gbit (one used, and shared between 
> cluster/client vlans)
> 
> I have 5 pg's scrubbing, but I am not sure if there is any on the ssd 
> pool. I am noticing a drop in the performance at the end of the test.
> Maybe some caching on the ssd?
> 
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec): 448.465
> Average Latency(s): 0.142671
> 
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec): 381.998
> Average Latency(s): 0.167524
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> than expected performance
> 
> ah yes, 3x replicated with minimal 2.
> 
> 
> my ceph.conf is pretty bare, just in case it might be relevant
> 
> [global]
>auth client required = cephx
>auth cluster required = cephx
>auth service required = cephx
> 
>cluster network = 172.25.42.0/24
> 
>fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> 
>keyring = /etc/pve/priv/$cluster.$name.keyring
> 
>mon allow pool delete = true
>mon osd allow primary affinity = true
> 
>osd journal size = 5120
>osd pool default min size = 2
>osd pool default size = 3
> 
> 
> -Original message-
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users ; Menno Zonneveld 
> > 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> > than expected performance
> > 
> >  
> > 
> > Test pool is 3x replicated?
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> > I've setup a CEPH cluster to test things before going into 
> > production but I've run into some performance issues that I cannot 
> > resolve or explain.
> > 
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 
> > 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> > 
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 

> > on all nodes.
> > 
> > Running rados benchmark resulted in somewhat lower than expected 
> > performance unless ceph enters the 'near-full' state. When the 
> > cluster
> 
> > is mostly empty rados bench (180 write -b 4M -t 16) results in about 

> > 330MB/s with 0.18ms latency but when hitting near-full state this 
> > goes
> 
> > up to a more expected 550MB/s and 0.11ms latency.
> > 
> > iostat on the storage machines shows the disks are hardly utilized 
> > unless the cluster hits near-full, CPU and network also aren't maxed 

> > out. I’ve also tried with NIC bonding and just one switch, without 
> > jumbo frames but nothing seem to matter in this case.
> > 
&

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
The benchmark does fluctuate quite a bit; that's why I now run it for 180 seconds, 
as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 OSD's, 
not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's still a 
test setup.

-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users ; Menno Zonneveld 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> 
> 
> I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB
> 2x E5-2660
> 2x LSI SAS2308 
> 1x dual port 10Gbit (one used, and shared between cluster/client vlans)
> 
> I have 5 pg's scrubbing, but I am not sure if there is any on the ssd 
> pool. I am noticing a drop in the performance at the end of the test. 
> Maybe some caching on the ssd?
> 
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec): 448.465
> Average Latency(s): 0.142671
> 
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec): 381.998
> Average Latency(s): 0.167524
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com] 
> Sent: donderdag 6 september 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> ah yes, 3x replicated with minimal 2.
> 
> 
> my ceph.conf is pretty bare, just in case it might be relevant
> 
> [global]
>auth client required = cephx
>auth cluster required = cephx
>auth service required = cephx
> 
>cluster network = 172.25.42.0/24
> 
>fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> 
>keyring = /etc/pve/priv/$cluster.$name.keyring
> 
>mon allow pool delete = true
>mon osd allow primary affinity = true
> 
>osd journal size = 5120
>osd pool default min size = 2
>osd pool default size = 3
> 
> 
> -Original message-----
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users ; Menno Zonneveld 
> > 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> > than expected performance
> > 
> >  
> > 
> > Test pool is 3x replicated?
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> > I've setup a CEPH cluster to test things before going into production 
> > but I've run into some performance issues that I cannot resolve or 
> > explain.
> > 
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> > 
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 
> > on all nodes.
> > 
> > Running rados benchmark resulted in somewhat lower than expected 
> > performance unless ceph enters the 'near-full' state. When the cluster 
> 
> > is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> > 330MB/s with 0.18ms latency but when hitting near-full state this goes 
> 
> > up to a more expected 550MB/s and 0.11ms latency.
> > 
> > iostat on the storage machines shows the disks are hardly utilized 
> > unless the cluster hits near-full, CPU and network also aren't maxed 
> > out. I’ve also tried with NIC bonding and just one switch, without 
> > jumbo frames but nothing seem to matter in this case.
> > 
> > Is this expected behavior or what can I try to do to pinpoint the 
> > bottleneck ?
> > 
> > The expected performance is per Proxmox's benchmark results they 
> > released this year, they have 4 OSD's per server and hit almost 
> > 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they have 
> 
> > more OSD's and somewhat different hardware I understand I won't hit 
> > the 800MB/s mark but the difference between empty and almost full 
> > cluster makes no sense to me, I'd expect it to be the other way 
> around.
> > 
> > Thanks,
> > Menno
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > 
> > 
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
-Original message-
> From:Alwin Antreich 
> Sent: Thursday 6th September 2018 16:27
> To: ceph-users 
> Cc: Menno Zonneveld 
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> Hi,

Hi!

> On Thu, Sep 06, 2018 at 03:52:21PM +0200, Menno Zonneveld wrote:
> > ah yes, 3x replicated with minimal 2.
> > 
> > 
> > my ceph.conf is pretty bare, just in case it might be relevant
> > 
> > [global]
> > auth client required = cephx
> > auth cluster required = cephx
> > auth service required = cephx
> > 
> > cluster network = 172.25.42.0/24
> > 
> > fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> > 
> > keyring = /etc/pve/priv/$cluster.$name.keyring
> > 
> > mon allow pool delete = true
> > mon osd allow primary affinity = true
> On our test cluster, we didn't set the primary affinity as all OSDs were
> SSDs of the same model. Did you do any settings other than this? How
> does your crush map look like?

I only used this option when testing with a mix of HDDs and SSDs (1 replica on SSD 
and 2 on HDD); right now the affinity for all disks is 1.

The weight of one OSD in each server is lower because I have partitioned that drive, 
to be able to test with an SSD journal for the HDDs, but this isn't active at the 
moment.

If I understand correctly, setting the weight like this should be fine. I also tested 
with weight 1 for all OSDs and I still get the same performance ('slow' when empty, 
fast when full).
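
For what it's worth, I check and adjust the weights with something along these lines 
(osd.2 and 0.366 are just the values from my tree below):

ceph osd df tree                      # shows CRUSH weight, reweight and %USE per OSD
ceph osd crush reweight osd.2 0.366   # example: set the weight of the re-partitioned disk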

Current ceph osd tree

ID  CLASS WEIGHT  TYPE NAME    STATUS REWEIGHT PRI-AFF 
 -1   3.71997 root ssd 
 -5   1.23999 host ceph01-test 
  2   ssd 0.36600 osd.2    up  1.0 1.0 
  3   ssd 0.43700 osd.3    up  1.0 1.0 
  6   ssd 0.43700 osd.6    up  1.0 1.0 
 -7   1.23999 host ceph02-test 
  4   ssd 0.36600 osd.4    up  1.0 1.0 
  5   ssd 0.43700 osd.5    up  1.0 1.0 
  7   ssd 0.43700 osd.7    up  1.0 1.0 
 -3   1.23999 host ceph03-test 
  0   ssd 0.36600 osd.0    up  1.0 1.0 
  1   ssd 0.43700 osd.1    up  1.0 1.0 
  8   ssd 0.43700 osd.8    up  1.0 1.0 

My current crush map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph03-test {
 id -3 # do not change unnecessarily
 id -4 class ssd # do not change unnecessarily
 # weight 1.240
 alg straw2
 hash 0 # rjenkins1
 item osd.1 weight 0.437
 item osd.0 weight 0.366
 item osd.8 weight 0.437
}
host ceph01-test {
 id -5 # do not change unnecessarily
 id -6 class ssd # do not change unnecessarily
 # weight 1.240
 alg straw2
 hash 0 # rjenkins1
 item osd.3 weight 0.437
 item osd.2 weight 0.366
 item osd.6 weight 0.437
}
host ceph02-test {
 id -7 # do not change unnecessarily
 id -8 class ssd # do not change unnecessarily
 # weight 1.240
 alg straw2
 hash 0 # rjenkins1
 item osd.5 weight 0.437
 item osd.4 weight 0.366
 item osd.7 weight 0.437
}
root ssd {
 id -1 # do not change unnecessarily
 id -2 class ssd # do not change unnecessarily
 # weight 3.720
 alg straw2
 hash 0 # rjenkins1
 item ceph03-test weight 1.240
 item ceph01-test weight 1.240
 item ceph02-test weight 1.240
}

# rules
rule ssd {
 id 0
 type replicated
 min_size 1
 max_size 10
 step take ssd
 step chooseleaf firstn 0 type host
 step emit
}

# end crush map
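
(For anyone who wants to compare: a decompiled map like the one above can be obtained 
along these lines.)

ceph osd getcrushmap -o crush.bin    # grab the compiled CRUSH map
crushtool -d crush.bin -o crush.txt  # decompile it into the text form shown above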

> > 
> > osd journal size = 5120
> > osd pool default min size = 2
> > osd pool default size = 3
> > 
> >

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Alwin Antreich
Hi,

On Thu, Sep 06, 2018 at 03:52:21PM +0200, Menno Zonneveld wrote:
> ah yes, 3x replicated with minimal 2.
> 
> 
> my ceph.conf is pretty bare, just in case it might be relevant
> 
> [global]
>auth client required = cephx
>auth cluster required = cephx
>auth service required = cephx
> 
>cluster network = 172.25.42.0/24
> 
>fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
> 
>keyring = /etc/pve/priv/$cluster.$name.keyring
> 
>mon allow pool delete = true
>mon osd allow primary affinity = true
On our test cluster, we didn't set the primary affinity, as all OSDs were
SSDs of the same model. Did you change any settings other than this? What
does your crush map look like?

> 
>osd journal size = 5120
>osd pool default min size = 2
>osd pool default size = 3
> 
> 
> -Original message-
> > From:Marc Roos 
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users ; Menno Zonneveld 
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> >  
> > 
> > Test pool is 3x replicated?
> > 
> > 
> > -Original Message-
> > From: Menno Zonneveld [mailto:me...@1afa.com] 
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than 
> > expected performance
> > 
> > I've setup a CEPH cluster to test things before going into production 
> > but I've run into some performance issues that I cannot resolve or 
> > explain.
> > 
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> > 
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on 
> > all nodes.
> > 
> > Running rados benchmark resulted in somewhat lower than expected 
> > performance unless ceph enters the 'near-full' state. When the cluster 
> > is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> > 330MB/s with 0.18ms latency but when hitting near-full state this goes 
> > up to a more expected 550MB/s and 0.11ms latency.
> > 
> > iostat on the storage machines shows the disks are hardly utilized 
> > unless the cluster hits near-full, CPU and network also aren't maxed 
> > out. I’ve also tried with NIC bonding and just one switch, without 
> > jumbo frames but nothing seem to matter in this case.
> > 
> > Is this expected behavior or what can I try to do to pinpoint the 
> > bottleneck ?
> > 
> > The expected performance is per Proxmox's benchmark results they 
> > released this year, they have 4 OSD's per server and hit almost 800MB/s 
> > with 0.08ms latency using 10Gbit and 3 nodes, though they have more 
> > OSD's and somewhat different hardware I understand I won't hit the 
> > 800MB/s mark but the difference between empty and almost full cluster 
> > makes no sense to me, I'd expect it to be the other way around.
> > 
> > Thanks,
> > Menno

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Marc Roos


I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB
2x E5-2660
2x LSI SAS2308 
1x dual port 10Gbit (one used, and shared between cluster/client vlans)

I have 5 PGs scrubbing, but I am not sure if any of them are on the ssd 
pool. I am noticing a drop in performance at the end of the test. 
Maybe some caching on the SSDs?
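
I guess I could check which pool they are on with something like this (the pool id is 
the part of the PG id before the dot):

ceph pg dump pgs_brief 2>/dev/null | grep -i scrub
ceph osd pool ls detail   # maps pool ids to pool names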

rados bench -p rbd.ssd 60 write -b 4M -t 16
Bandwidth (MB/sec): 448.465
Average Latency(s): 0.142671

rados bench -p rbd.ssd 180 write -b 4M -t 16
Bandwidth (MB/sec): 381.998
Average Latency(s): 0.167524


-Original Message-
From: Menno Zonneveld [mailto:me...@1afa.com] 
Sent: donderdag 6 september 2018 15:52
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
expected performance

ah yes, 3x replicated with minimal 2.


my ceph.conf is pretty bare, just in case it might be relevant

[global]
 auth client required = cephx
 auth cluster required = cephx
 auth service required = cephx

 cluster network = 172.25.42.0/24

 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e

 keyring = /etc/pve/priv/$cluster.$name.keyring

 mon allow pool delete = true
 mon osd allow primary affinity = true

 osd journal size = 5120
 osd pool default min size = 2
 osd pool default size = 3


-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 15:43
> To: ceph-users ; Menno Zonneveld 
> 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower 
> than expected performance
> 
>  
> 
> Test pool is 3x replicated?
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 15:29
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> I've setup a CEPH cluster to test things before going into production 
> but I've run into some performance issues that I cannot resolve or 
> explain.
> 
> Hardware in use in each storage machine (x3)
> - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> - dual 10Gbit EdgeSwitch 16-Port XG
> - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> - 3x Intel S4500 480GB SSD as OSD's
> - 2x SSD raid-1 boot/OS disks
> - 2x Intel(R) Xeon(R) CPU E5-2630
> - 128GB memory
> 
> Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 
> on all nodes.
> 
> Running rados benchmark resulted in somewhat lower than expected 
> performance unless ceph enters the 'near-full' state. When the cluster 

> is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> 330MB/s with 0.18ms latency but when hitting near-full state this goes 

> up to a more expected 550MB/s and 0.11ms latency.
> 
> iostat on the storage machines shows the disks are hardly utilized 
> unless the cluster hits near-full, CPU and network also aren't maxed 
> out. I’ve also tried with NIC bonding and just one switch, without 
> jumbo frames but nothing seem to matter in this case.
> 
> Is this expected behavior or what can I try to do to pinpoint the 
> bottleneck ?
> 
> The expected performance is per Proxmox's benchmark results they 
> released this year, they have 4 OSD's per server and hit almost 
> 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they have 

> more OSD's and somewhat different hardware I understand I won't hit 
> the 800MB/s mark but the difference between empty and almost full 
> cluster makes no sense to me, I'd expect it to be the other way 
around.
> 
> Thanks,
> Menno
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
ah yes, 3x replicated with min_size 2.


my ceph.conf is pretty bare, just in case it might be relevant

[global]
 auth client required = cephx
 auth cluster required = cephx
 auth service required = cephx

 cluster network = 172.25.42.0/24

 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e

 keyring = /etc/pve/priv/$cluster.$name.keyring

 mon allow pool delete = true
 mon osd allow primary affinity = true

 osd journal size = 5120
 osd pool default min size = 2
 osd pool default size = 3
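
For reference, a test pool like the one I benchmark against can be created along these 
lines; the pg count is only an example, and size/min_size come from the defaults above:

ceph osd pool create rbdbench 128 128 replicated ssd   # 'ssd' is the crush rule
ceph osd pool get rbdbench size                        # -> 3
ceph osd pool get rbdbench min_size                    # -> 2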


-Original message-
> From:Marc Roos 
> Sent: Thursday 6th September 2018 15:43
> To: ceph-users ; Menno Zonneveld 
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
>  
> 
> Test pool is 3x replicated?
> 
> 
> -Original Message-
> From: Menno Zonneveld [mailto:me...@1afa.com] 
> Sent: donderdag 6 september 2018 15:29
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Rados performance inconsistencies, lower than 
> expected performance
> 
> I've setup a CEPH cluster to test things before going into production 
> but I've run into some performance issues that I cannot resolve or 
> explain.
> 
> Hardware in use in each storage machine (x3)
> - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> - dual 10Gbit EdgeSwitch 16-Port XG
> - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> - 3x Intel S4500 480GB SSD as OSD's
> - 2x SSD raid-1 boot/OS disks
> - 2x Intel(R) Xeon(R) CPU E5-2630
> - 128GB memory
> 
> Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on 
> all nodes.
> 
> Running rados benchmark resulted in somewhat lower than expected 
> performance unless ceph enters the 'near-full' state. When the cluster 
> is mostly empty rados bench (180 write -b 4M -t 16) results in about 
> 330MB/s with 0.18ms latency but when hitting near-full state this goes 
> up to a more expected 550MB/s and 0.11ms latency.
> 
> iostat on the storage machines shows the disks are hardly utilized 
> unless the cluster hits near-full, CPU and network also aren't maxed 
> out. I’ve also tried with NIC bonding and just one switch, without 
> jumbo frames but nothing seem to matter in this case.
> 
> Is this expected behavior or what can I try to do to pinpoint the 
> bottleneck ?
> 
> The expected performance is per Proxmox's benchmark results they 
> released this year, they have 4 OSD's per server and hit almost 800MB/s 
> with 0.08ms latency using 10Gbit and 3 nodes, though they have more 
> OSD's and somewhat different hardware I understand I won't hit the 
> 800MB/s mark but the difference between empty and almost full cluster 
> makes no sense to me, I'd expect it to be the other way around.
> 
> Thanks,
> Menno
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Marc Roos
 

Test pool is 3x replicated?


-Original Message-
From: Menno Zonneveld [mailto:me...@1afa.com] 
Sent: donderdag 6 september 2018 15:29
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Rados performance inconsistencies, lower than 
expected performance

I've setup a CEPH cluster to test things before going into production 
but I've run into some performance issues that I cannot resolve or 
explain.

Hardware in use in each storage machine (x3)
- dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
- dual 10Gbit EdgeSwitch 16-Port XG
- LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
- 3x Intel S4500 480GB SSD as OSD's
- 2x SSD raid-1 boot/OS disks
- 2x Intel(R) Xeon(R) CPU E5-2630
- 128GB memory

Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on 
all nodes.

Running rados benchmark resulted in somewhat lower than expected 
performance unless ceph enters the 'near-full' state. When the cluster 
is mostly empty rados bench (180 write -b 4M -t 16) results in about 
330MB/s with 0.18ms latency but when hitting near-full state this goes 
up to a more expected 550MB/s and 0.11ms latency.

iostat on the storage machines shows the disks are hardly utilized 
unless the cluster hits near-full, CPU and network also aren't maxed 
out. I’ve also tried with NIC bonding and just one switch, without 
jumbo frames but nothing seem to matter in this case.

Is this expected behavior or what can I try to do to pinpoint the 
bottleneck ?

The expected performance is per Proxmox's benchmark results they 
released this year, they have 4 OSD's per server and hit almost 800MB/s 
with 0.08ms latency using 10Gbit and 3 nodes, though they have more 
OSD's and somewhat different hardware I understand I won't hit the 
800MB/s mark but the difference between empty and almost full cluster 
makes no sense to me, I'd expect it to be the other way around.

Thanks,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-06 Thread Menno Zonneveld
I've set up a CEPH cluster to test things before going into production, but I've 
run into some performance issues that I cannot resolve or explain.

Hardware in use in each storage machine (x3)
- dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
- dual 10Gbit EdgeSwitch 16-Port XG
- LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
- 3x Intel S4500 480GB SSD as OSD's
- 2x SSD raid-1 boot/OS disks
- 2x Intel(R) Xeon(R) CPU E5-2630
- 128GB memory

Software-wise I'm running CEPH 12.2.7-pve1, set up from Proxmox VE 5.2, on all 
nodes.

Running the rados benchmark results in lower than expected performance unless ceph 
enters the 'near-full' state. When the cluster is mostly empty, rados bench 
(180 write -b 4M -t 16) gives about 330 MB/s at roughly 0.18 s average latency, but 
when hitting the near-full state this goes up to a more expected 550 MB/s and 
0.11 s latency.
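
The full benchmark command, with a placeholder for the pool name, looks like this:

rados bench -p <testpool> 180 write -b 4M -t 16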

iostat on the storage machines shows the disks are hardly utilized unless the 
cluster hits near-full; CPU and network also aren't maxed out. I've also tried 
with NIC bonding and just one switch, without jumbo frames, but nothing seems to 
matter in this case.

Is this expected behavior, or what can I try to pinpoint the bottleneck?

The expected performance is based on the benchmark results Proxmox released this 
year: with 4 OSDs per server, 3 nodes and 10Gbit networking they hit almost 
800 MB/s at 0.08 s latency. Since they have more OSDs and somewhat different 
hardware I understand I won't hit the 800 MB/s mark, but the difference between an 
empty and an almost full cluster makes no sense to me; I'd expect it to be the 
other way around.

Thanks,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com