I would suggest caution with "filestore_odsync_write" - it's fine on good SSDs, 
but on poor SSDs or spinning disks it will kill performance.
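
If you want to sanity-check a drive first, a single-depth sync write test with 
fio gives a quick read on how it handles forced-flush writes (the device path 
below is a placeholder - point it at a scratch device, as this writes to it 
directly):

    fio --name=syncwrite --filename=/dev/sdX --rw=write --bs=4k \
        --iodepth=1 --sync=1 --direct=1 --time_based --runtime=30

Good journal-class SSDs sustain thousands of IOPS here; consumer SSDs and 
spinning disks often drop to a few hundred or less, and those are the drives 
where filestore_odsync_write hurts.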


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: Friday, 15 July 2016 3:12 AM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Terrible RBD performance with Jewel

Try increasing the following, for example to:

osd_op_num_shards = 10
filestore_fd_cache_size = 128
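
Both of those are read at OSD start, so after editing ceph.conf the OSDs need 
a restart - on a systemd-based Jewel install that would be something like:

    systemctl restart ceph-osd@<id>

on each OSD host.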

I assume you introduced the following after I suggested it, so it shouldn't be 
the cause (?):

filestore_odsync_write = true

Also, comment out the following.

filestore_wbthrottle_enable = false



From: Garg, Pankaj [mailto:pankaj.g...@cavium.com]
Sent: Thursday, July 14, 2016 10:05 AM
To: Somnath Roy; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

Something in this section is causing the 0-IOPS issue; I have not been able to 
nail it down yet. (I did comment out the filestore_max_inline_xattr_size 
entries, and the problem still exists.)
If I take out the whole [osd] section, the IOPS no longer stay at 0 for long 
periods of time. Performance is still not where I would expect it to be.
[osd]
osd_enable_op_tracker = false
osd_op_num_shards = 2
filestore_wbthrottle_enable = false
filestore_max_sync_interval = 1
filestore_odsync_write = true
#filestore_max_inline_xattr_size = 254
#filestore_max_inline_xattrs = 6
filestore_queue_committing_max_bytes = 1048576000
filestore_queue_committing_max_ops = 5000
filestore_queue_max_bytes = 1048576000
filestore_queue_max_ops = 500
journal_max_write_bytes = 1048576000
journal_max_write_entries = 1000
journal_queue_max_bytes = 1048576000
journal_queue_max_ops = 3000
filestore_fd_cache_shards = 32
filestore_fd_cache_size = 64

From: Somnath Roy [mailto:somnath....@sandisk.com]
Sent: Wednesday, July 13, 2016 7:05 PM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

I am not sure you need to set the following. What's the point of reducing the 
inline xattr settings? I forget the exact calculation, but lower values could 
redirect your xattrs to omap. Better to comment those out.

filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6

We could improve some of these parameters, but nothing here seems responsible 
for the behavior you are seeing.
Could you run iotop and see whether any process (such as xfsaild) is doing IO 
on the drives during that time?
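
For example, batch mode restricted to threads actually doing IO, with 
timestamps, is easy to capture across a stall:

    iotop -b -o -t -d 1

(-b batch output, -o only show processes doing IO, -t add timestamps, -d 1 
sample every second.)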

Thanks & Regards
Somnath

From: Garg, Pankaj [mailto:pankaj.g...@cavium.com]
Sent: Wednesday, July 13, 2016 6:40 PM
To: Somnath Roy; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

I agree, but I'm dealing with something else here with this setup.
I just ran a test, and within 3 seconds my IOPS went to 0 and stayed there for 
90 seconds... then they started, and within seconds went to 0 again.
This doesn't seem normal at all. Here is my ceph.conf:

[global]
fsid = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
public_network = xxxxxxxxxxxxxxxxxxxxxxxx
cluster_network = xxxxxxxxxxxxxxxxxxxxxxxxxxxx
mon_initial_members = ceph1
mon_host = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_mkfs_options = -f -i size=2048 -n size=64k
osd_mount_options_xfs = inode64,noatime,logbsize=256k
filestore_merge_threshold = 40
filestore_split_multiple = 8
osd_op_threads = 12
osd_pool_default_size = 2
mon_pg_warn_max_object_skew = 100000
mon_pg_warn_min_per_osd = 0
mon_pg_warn_max_per_osd = 32768
filestore_op_threads = 6

[osd]
osd_enable_op_tracker = false
osd_op_num_shards = 2
filestore_wbthrottle_enable = false
filestore_max_sync_interval = 1
filestore_odsync_write = true
filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6
filestore_queue_committing_max_bytes = 1048576000
filestore_queue_committing_max_ops = 5000
filestore_queue_max_bytes = 1048576000
filestore_queue_max_ops = 500
journal_max_write_bytes = 1048576000
journal_max_write_entries = 1000
journal_queue_max_bytes = 1048576000
journal_queue_max_ops = 3000
filestore_fd_cache_shards = 32
filestore_fd_cache_size = 64


From: Somnath Roy [mailto:somnath....@sandisk.com]
Sent: Wednesday, July 13, 2016 6:06 PM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

You should do that first to get stable performance out of filestore.
A 1M sequential write over the entire image should be sufficient to 
precondition it.
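
For example, with fio's rbd engine (the pool and image names below are 
placeholders), fio reads the image size itself, so one sequential pass covers 
the whole image:

    fio --name=precondition --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=image1 --rw=write --bs=1M --iodepth=16

Repeat for each of the 8 images before measuring.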

From: Garg, Pankaj [mailto:pankaj.g...@cavium.com]
Sent: Wednesday, July 13, 2016 6:04 PM
To: Somnath Roy; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

No, I have not.

From: Somnath Roy [mailto:somnath....@sandisk.com]
Sent: Wednesday, July 13, 2016 6:00 PM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

In fact, I was wrong - I missed that you are running with 12 OSDs (assuming 
one OSD per SSD). In that case, it will take ~250 seconds to fill up the 
journals.
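
Back-of-envelope, taking the 13-15K IOPS peak from your first mail and 
assuming 4K writes:

    15,000 IOPS x 4 KB    ~ 60 MB/s from the client
    x 3 replicas          ~ 180 MB/s of journal traffic cluster-wide
    / 12 OSDs             ~ 15 MB/s per journal
    5 GB / 15 MB/s        ~ 330 seconds

so a few hundred seconds before the journals fill, ignoring whatever the sync 
interval flushes in the meantime.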
Have you preconditioned the entire image with a bigger block size, say 1M, 
before doing any real test?

From: Garg, Pankaj [mailto:pankaj.g...@cavium.com]
Sent: Wednesday, July 13, 2016 5:55 PM
To: Somnath Roy; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

Thanks Somnath. I will try all of these, but I think there is something else 
going on too.
First, my test sometimes reaches 0 IOPS within 10 seconds.
Second, when I'm at 0 IOPS, I see NO disk activity in iostat and no CPU 
activity either. That part is strange.
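
(For reference, I'm watching with extended device stats - iostat -x 1 - and 
during the stalls the write and utilization columns are flat zero on every 
OSD drive.)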

Thanks
Pankaj

From: Somnath Roy [mailto:somnath....@sandisk.com]
Sent: Wednesday, July 13, 2016 5:49 PM
To: Somnath Roy; Garg, Pankaj; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

Also increase the following:

filestore_op_threads

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: Wednesday, July 13, 2016 5:47 PM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Terrible RBD performance with Jewel

Pankaj,

This could be related to the new throttle parameters introduced in Jewel. By 
default these throttles are off; you need to tune them for your setup.
What are your journal size and fio block size?
If the journal is the default 5GB, then at the rate you mentioned (assuming 4K 
random writes) and considering 3X replication, it can fill up and stall IO 
within ~30 seconds or so.
If you think this is what is happening in your system, you need to turn this 
throttle on (see 
https://github.com/ceph/ceph/blob/jewel/src/doc/dynamic-throttle.txt ) and also 
lower filestore_max_sync_interval to ~1 (or even lower). Since you are testing 
on SSDs, I would also recommend turning on the following parameter for stable 
performance:


filestore_odsync_write = true
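
As a sketch, turning the dynamic throttle on amounts to setting the delay 
multiples to non-zero values. The numbers below are illustrative only - 
dynamic-throttle.txt explains how to derive them from your hardware's expected 
throughput:

[osd]
filestore_expected_throughput_ops = 5000
filestore_expected_throughput_bytes = 209715200
filestore_queue_high_delay_multiple = 2
filestore_queue_max_delay_multiple = 10
journal_throttle_high_multiple = 2
journal_throttle_max_multiple = 10
filestore_max_sync_interval = 1

The multiples default to 0, which leaves the throttle off.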

Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, 
Pankaj
Sent: Wednesday, July 13, 2016 4:57 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Terrible RBD performance with Jewel

Hi,
I just installed Jewel on a small cluster of 3 machines with 4 SSDs each. I 
created 8 RBD images and used a single client, with 8 threads, to do random 
writes on the images (using fio with the RBD engine, 1 thread per image).
The cluster has 3X replication and 10G cluster and client networks.
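
For context, a fio job file along these lines drives that setup (the 
pool/image names, block size, and queue depth shown are illustrative):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=randwrite
bs=4k
iodepth=32

[job1]
rbdname=image1

[job2]
rbdname=image2

; ...one job section per image, up to image8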
fio prints the aggregate IOPS every second for the cluster. Before Jewel, I 
got roughly 10K IOPS. It was up and down, but it kept going.
Now I see IOPS climb to 13-15K, but then they drop, eventually falling to ZERO 
for several seconds, and then they start back up again.

What am I missing?

Thanks
Pankaj