Thanks Somnath. I will try all these, but I think there is something else going 
on too.
Firstly, my test sometimes reaches 0 IOPS within 10 seconds.
Secondly, when I'm at 0 IOPS, I see NO disk activity in iostat and no CPU 
activity either. This part is strange.

Thanks
Pankaj

From: Somnath Roy [mailto:somnath....@sandisk.com]
Sent: Wednesday, July 13, 2016 5:49 PM
To: Somnath Roy; Garg, Pankaj; ceph-users@lists.ceph.com
Subject: RE: Terrible RBD performance with Jewel

Also increase the following:

filestore_op_threads
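
(For concreteness, a minimal ceph.conf sketch of what that could look like; the 
default is 2, and the value below is only an illustrative starting point, not a 
tuned recommendation:)

    [osd]
    # more filestore op threads to parallelize filestore work (default: 2)
    filestore op threads = 8

It can also be injected into running OSDs with ceph tell osd.* injectargs.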

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: Wednesday, July 13, 2016 5:47 PM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Terrible RBD performance with Jewel

Pankaj,

Could be related to the new throttle parameters introduced in Jewel. By default 
these throttles are off; you need to tweak them according to your setup.
What are your journal size and fio block size?
If the journal is the default 5GB, then at the rate you mentioned (assuming 4K 
random writes) and with 3X replication, it can fill up your journal and stall 
IO within ~30 seconds or so.
If you think this is what is happening in your system, you need to turn this 
throttle on (see 
https://github.com/ceph/ceph/blob/jewel/src/doc/dynamic-throttle.txt ) and also 
lower filestore_max_sync_interval to ~1 second (or even lower). Since you are 
running on SSDs, I would also recommend turning on the following parameter for 
stable performance:


filestore_odsync_write = true
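
(Putting these together, a minimal [osd] sketch for ceph.conf; the throttle 
parameter names are the ones described in the dynamic-throttle.txt linked 
above, and every value here is an illustrative starting point rather than a 
tuned recommendation:)

    [osd]
    # Jewel's dynamic throttle is off by default (both delay multiples are 0);
    # setting them non-zero turns it on. Tune per dynamic-throttle.txt.
    filestore queue high delay multiple = 2
    filestore queue max delay multiple = 10
    # sync the filestore more often so the journal drains before it fills
    # (default: 5 seconds)
    filestore max sync interval = 1
    # open writes with O_DSYNC for steadier performance on SSDs
    filestore odsync write = true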

Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Garg, 
Pankaj
Sent: Wednesday, July 13, 2016 4:57 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Terrible RBD performance with Jewel

Hi,
I just  installed jewel on a small cluster of 3 machines with 4 SSDs each. I 
created 8 RBD images, and use a single client, with 8 threads, to do random 
writes (using FIO with RBD engine) on the images ( 1 thread per image).
The cluster has 3X replication and 10G cluster and client networks.
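
(For reference, the fio job is along these lines; the pool, client, and image 
names below are placeholders rather than my exact configuration:)

    [global]
    ; fio's librbd engine, talks to the cluster directly, no kernel mapping
    ioengine=rbd
    clientname=admin
    pool=rbd
    rw=randwrite
    bs=4k
    iodepth=16
    direct=1

    ; one job section per RBD image, eight in total
    [img0]
    rbdname=testimg0

    [img1]
    rbdname=testimg1

    ; ...and so on through [img7]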
FIO prints the aggregate IOPS for the cluster every second. Before Jewel, I was 
getting roughly 10K IOPS; it was up and down, but still kept going.
Now I see IOPS go up to 13-15K, but then they drop, eventually hitting ZERO for 
several seconds, and then starting back up again.

What am I missing?

Thanks
Pankaj
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
