Hi Nick,

Yes, our application is doing small random IO, and I did not realize that the 
snapshotting feature could degrade performance so much in that case.

We have just deactivated it and deleted all snapshots. I will let you know 
whether this drastically reduces the blocked ops and, consequently, the IO 
freezes on the client side.
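
To judge the effect, I will simply keep an eye on the blocked requests reported 
by the cluster; roughly like this (commands as I understand them on 0.94, to be 
double-checked):

        # health detail lists "N requests are blocked > 32 sec" per OSD
        ceph health detail
        # or watch the cluster log continuously
        ceph -w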

Thanks

Thomas

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Wednesday, 16 November 2016 13:25
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocked requests very frequently



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: 15 November 2016 21:14
To: Peter Maloney <peter.malo...@brockmann-consult.de>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocked requests very frequently

Very interesting ...

Any idea why the optimal tunables would help here? On our cluster we have 500 TB 
of data, and I am a bit concerned about changing them without taking a lot of 
precautions...
I am curious to know how much time it took you to change the tunables, the size 
of your cluster, and the observed impact on client IO...

Yes, we do have daily RBD snapshots from 16 different Ceph RBD clients. 
Snapshotting the RBD image is almost immediate, while we are seeing the issue 
continuously during the day...

Just to point out that when you take a snapshot, any write to the original RBD 
means the full 4 MB object is copied into the snapshot. If there is a lot of 
small random IO going to the original RBD, this can lead to massive write 
amplification across the cluster and may cause issues such as the ones you describe.
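
As a rough back-of-the-envelope illustration (assuming the default 4 MB RBD 
object size; the pool/image name below is only an example):

        # check the object size of your image ("order 22" means 4 MB objects)
        rbd info rbd/myimage
        # worst case: the first small write to each object after a snapshot
        # triggers a full object copy, e.g. a 4 KB write causing a 4 MB copy,
        # i.e. roughly 1000x write amplification for that one IO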

Also be aware that deleting large snapshots puts significant strain on the 
OSDs as they try to delete hundreds of thousands of objects.
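
If you do need to remove big snapshots, the snap trimming can at least be 
throttled; a hedged sketch (the 0.5 s value is only illustrative, and on some 
versions the option may only be read at OSD start):

        # slow down snapshot trimming on all OSDs
        ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5'
        # or persist it in ceph.conf under [osd]
        osd snap trim sleep = 0.5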


Will check all of this tomorrow...

Thanks again

Thomas



Sent from my Samsung device


-------- Original message --------
From: Peter Maloney <peter.malo...@brockmann-consult.de>
Date: 11/15/16 21:27 (GMT+01:00)
To: Thomas Danan <thomas.da...@mycom-osi.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocked requests very frequently
On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> The Ceph cluster version is 0.94.5, we are running with Firefly tunables, and 
> we have 10K PGs instead of the 30K / 40K we should have.
> The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2
>
> On our side we have the following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> which explains why the OSDs are not flapping, but they still misbehave and 
> generate the slow requests I am describing.
>
> osd_op_complaint_time is at its default value (30 sec); I am not sure I want 
> to change it based on your experience.
I wasn't saying you should set the complaint time to 5, just saying
that's why I have complaints logged with such low block times.
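
(If you ever want the lower threshold just for diagnostics, something like the 
following should do it, assuming the option is injectable at runtime on 0.94; 
5 s is simply the value I happen to run with:)

        ceph tell osd.* injectargs '--osd_op_complaint_time 5'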
> Thomas

And now I'm testing this:
        osd recovery sleep = 0.5
        osd snap trim sleep = 0.5

(or fiddling with it as low as 0.1 to make it rebalance faster)

While also changing tunables to optimal (which will rebalance 75% of the
objects). This has had very good results so far (a few <14 s blocks right at
the start, and none since, over an hour ago).

And I'm somehow hoping that will fix my rbd export-diff issue too... but at
the very least it appears to fix the blocked requests caused by the rebalance.
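
For completeness, this is roughly how I am applying it, with the flags hedged 
as I understand them on Hammer and throttling values that are just my own choice:

        # push the sleeps to the running OSDs
        ceph tell osd.* injectargs '--osd_recovery_sleep 0.5 --osd_snap_trim_sleep 0.5'
        # switch the CRUSH tunables (expect a large rebalance)
        ceph osd crush tunables optimal
        # keep backfill/recovery gentle while the data moves
        ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'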

Do you use rbd snapshots? I think that may be causing my issues, based
on things like:

>             "description": "osd_op(client.692201.0:20455419 4.1b5a5bc1
> rbd_data.94a08238e1f29.000000000000617b [] snapc 918d=[918d]
> ack+ondisk+write+known_if_redirected e40036)",
>             "initiated_at": "2016-11-15 20:57:48.313432",
>             "age": 409.634862,
>             "duration": 3.377347,
>             ...
>                     {
>                         "time": "2016-11-15 20:57:48.313767",
>                         "event": "waiting for subops from 0,1,8,22"
>                     },
>             ...
>                     {
>                         "time": "2016-11-15 20:57:51.688530",
>                         "event": "sub_op_applied_rec from 22"
>                     },


Which says "snapc" in there (CoW?), and I think shows that just one osd
is delayed a few seconds and the rest are really fast, like you said.
(and not sure why I see 4 osds here when I have size 3... node1 osd 0
and 1, and node3 osd 8 and 22)

or some (shorter ones, I think) have a description like:
> osd_repop(client.426591.0:203051290 4.1f9
> 4:9fe4c001:::rbd_data.4cf92238e1f29.00000000000014ef:head v 40047'2531604)
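
(Those entries come from the OSD admin socket, in case you want to pull the 
same thing on your side; osd.22 here is just the one from my dump above:)

        # recently completed slow ops recorded by that OSD
        ceph daemon osd.22 dump_historic_ops
        # ops currently in flight, to catch a slow replica live
        ceph daemon osd.22 dump_ops_in_flight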
