Hi,

You need to monitor latency rather than peaks. Since Ceph also writes to
two other nodes when you have 3 replicas, you end up with roughly 4x the
latency of a single round trip from the client to the primary OSD. So
smaller and more numerous IOs mean more latency pain.
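
To put rough, purely illustrative numbers on it: if one client-to-OSD
round trip costs about 0.5 ms, and the primary OSD still has to send the
write to its two replica peers and wait for their acks before it can ack
the client, a small synchronous write can easily end up at several times
that before any journal or disk latency is added. Multiply by lots of
small writes and the replication overhead dominates.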

And the worst thing is that, AFAIK, nothing actually surfaces this
directly; ceph osd perf shows some latency figures. A slow CPU can hamper
performance even when nothing else shows any sign of it.
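
If you want a quick way to keep an eye on it, something like this works
(the output is per-OSD commit and apply latency in milliseconds):

    # refresh the per-OSD latency figures every few seconds
    watch -n 5 ceph osd perf

    # one-shot, sorted by the highest commit latency
    ceph osd perf | sort -nk2 | tail

An OSD that consistently sits far above its peers here is usually worth a
closer look on its host.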

I believe you can dump the operations that are running right now and see
what they are waiting on; I think there's a thread on this ML regarding a
deadlock and slow requests that could be interesting.
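
For the in-flight ops, the admin socket on each OSD host is the place to
look, roughly like this (osd.12 is just an example id, run it on the node
that hosts that OSD):

    # ops currently being processed, with the step they are stuck on
    ceph daemon osd.12 dump_ops_in_flight

    # recently completed slow ops, with per-event timestamps
    ceph daemon osd.12 dump_historic_ops

The per-event timestamps tell you whether an op spent its time waiting for
sub-ops on the replicas, for the journal, and so on.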

I did not see Christian's responses either, so maybe it's not a problem
with your mail client.

Regards,
Josef

On Tue, 24 May 2016, 08:49 Peter Kerdisle, <peter.kerdi...@gmail.com> wrote:

> Hey Christian,
>
> I honestly haven't seen any replies to my earlier message. I will go back
> through my email and make sure I find them, my apologies.
>
> I am graphing everything with collectd and graphite, which is what makes
> this so frustrating, since I am not seeing any obvious pain points.
>
> I am basically using the pool in read-forward now so there should be
> almost no promotion from EC to the SSD pool. I will see what options I have
> for adding some SSD journals to the OSD nodes to help speed things along.
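>
> For reference, this is roughly how I double-check the tier setup
> ("cache-pool" below stands in for my actual cache pool name):
>
>     # the cache mode shows up on the cache pool's line
>     ceph osd dump | grep cache_mode
>
>     # promotion-related settings on the cache pool
>     ceph osd pool get cache-pool min_read_recency_for_promote
>     ceph osd pool get cache-pool hit_set_count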
>
> Thanks, and apologies again for missing your earlier replies.
>
> Peter
>
> On Tue, May 24, 2016 at 4:25 AM, Christian Balzer <ch...@gol.com> wrote:
>
>>
>> Hello,
>>
>> On Mon, 23 May 2016 10:45:41 +0200 Peter Kerdisle wrote:
>>
>> > Hey,
>> >
>> > Sadly I'm still battling this issue. I did notice one interesting thing.
>> >
>> > I changed the cache settings for my cache tier to add redundancy to the
>> > pool, which meant a lot of recovery activity on the cache. During all this
>> > there were absolutely no slow requests reported. Is there anything I can
>> > conclude from that information? Is it possible that not having any SSDs
>> > for journals could be the bottleneck on my erasure pool and that's
>> > generating slow requests?
>> >
>> That's what I suggested in the first of my 3 replies to your original
>> thread which seemingly got ignored judging by the lack of a reply.
>>
>> > I simply can't imagine why a request can be blocked for 30 or even 60
>> > seconds. It's getting really frustrating not being able to fix this and I
>> > simply don't know what else I can do at this point.
>> >
>> Are you monitoring your cluster, especially the HDD nodes?
>> Permanently with collectd and graphite (or similar) and topically
>> (especially during tests) with atop (or iostat, etc)?
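>>
>> For the topical part, something as simple as this on each HDD node during
>> a test run already tells you a lot (sysstat and atop assumed installed):
>>
>>     # per-disk utilisation, await and queue sizes, refreshed every 2s
>>     iostat -xmt 2
>>
>>     # CPU, memory, network and disks in one screen
>>     atop 2
>>
>> Disks sitting near 100% utilisation, or await times in the tens of
>> milliseconds, point straight at the spinners.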
>>
>> And one more time, your cache tier can only help you if it is fast and
>> large enough to sufficiently disengage you from your slow backing
>> storage.
>> And let's face it, an EC pool AND no journal SSDs will be quite slow.
>>
>> So if your cache is dirty all the time and has to flush (causing IO on the
>> backing storage) while there is also promotion from the backing storage to
>> the cache going on you're basically down to the base speed of your EC
>> pool, at least for some of your ops.
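>>
>> You can see how close the cache is to its flush threshold with something
>> like this ("cache-pool" again being a placeholder):
>>
>>     # DIRTY objects per pool
>>     ceph df detail
>>
>>     # flushing kicks in once this fraction of the cache is dirty
>>     ceph osd pool get cache-pool cache_target_dirty_ratio
>>
>> If the dirty count hovers around that ratio during your tests, the tier
>> is flushing to the EC pool at the same time as it is serving promotions,
>> which is exactly the worst case described above.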
>>
>> Christian
>>
>> > If anybody has anything I haven't tried before please let me know.
>> >
>> > Peter
>> >
>> > On Thu, May 5, 2016 at 10:30 AM, Peter Kerdisle
>> > <peter.kerdi...@gmail.com> wrote:
>> >
>> > > Hey guys,
>> > >
>> > > I'm running into an issue with my cluster during high activity.
>> > >
>> > > I have two SSD cache servers (2 SSDs for journals, 7 SSDs for data),
>> > > each with 2x10Gbit bonded, and six OSD nodes with a 10Gbit public and
>> > > a 10Gbit cluster network for the erasure pool (10x3TB without separate
>> > > journals). This is all on Jewel.
>> > >
>> > > It's working fine under normal load. However, when I force increased
>> > > activity by lowering cache_target_dirty_ratio to make sure my files
>> > > are promoted, things start to go amiss.
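>> > >
>> > > For reference, the knob I'm turning is simply this, with "cache-pool"
>> > > standing in for my actual cache pool name:
>> > >
>> > >     # lowering this makes the tier start flushing much earlier
>> > >     ceph osd pool set cache-pool cache_target_dirty_ratio 0.2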
>> > >
>> > > To give an example: http://pastie.org/private/5k5ml6a8gqkivjshgjcedg
>> > >
>> > > This is especially concerning: pgs: 9 activating+undersized+degraded,
>> > > 48 active+undersized+degraded, 1 stale+active+clean, 27 peering.
>> > >
>> > > Here is another minute or so where I grepped for warnings:
>> > > http://pastie.org/private/bfv3kxl63cfcduafoaurog
>> > >
>> > > These warnings are generated all over the OSD nodes, not specifically
>> > > one OSD or even one node.
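>> > >
>> > > To collect these I'm basically doing the following on each node (log
>> > > locations are the defaults):
>> > >
>> > >     # which OSDs are currently reporting blocked requests
>> > >     ceph health detail | grep -i blocked
>> > >
>> > >     # slow request warnings across all OSD logs on this node
>> > >     grep "slow request" /var/log/ceph/ceph-osd.*.log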
>> > >
>> > > During this time the different OSD logs show various warnings:
>> > >
>> > > 2016-05-05 10:04:10.873603 7f1afaf3b700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af3f2d700' had timed out after 15
>> > > 2016-05-05 10:04:10.873605 7f1afaf3b700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af5f31700' had timed out after 15
>> > > 2016-05-05 10:04:10.905997 7f1afc73e700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af3f2d700' had timed out after 15
>> > > 2016-05-05 10:04:10.906000 7f1afc73e700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af5f31700' had timed out after 15
>> > > 2016-05-05 10:04:10.906022 7f1afaf3b700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af3f2d700' had timed out after 15
>> > > 2016-05-05 10:04:10.906027 7f1afaf3b700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af5f31700' had timed out after 15
>> > > 2016-05-05 10:04:10.949894 7f1af3f2d700  1 heartbeat_map reset_timeout
>> > > 'OSD::osd_op_tp thread 0x7f1af3f2d700' had timed out after 15
>> > > 2016-05-05 10:04:10.956801 7f1afc73e700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af5f31700' had timed out after 15
>> > > 2016-05-05 10:04:10.956833 7f1afaf3b700  1 heartbeat_map is_healthy
>> > > 'OSD::osd_op_tp thread 0x7f1af5f31700' had timed out after 15
>> > > 2016-05-05 10:04:10.957816 7f1af5f31700  1 heartbeat_map reset_timeout
>> > > 'OSD::osd_op_tp thread 0x7f1af5f31700' had timed out after 15
>> > >
>> > > or
>> > >
>> > > 2016-05-05 10:03:27.269658 7f98cca35700 -1 osd.6 7235 heartbeat_check:
>> > > no reply from osd.25 since back 2016-05-05 10:03:06.566276 front
>> > > 2016-05-05 10:03:06.566276 (cutoff 2016-05-05 10:03:07.269651)
>> > > 2016-05-05 10:03:28.269838 7f98cca35700 -1 osd.6 7235 heartbeat_check:
>> > > no reply from osd.25 since back 2016-05-05 10:03:06.566276 front
>> > > 2016-05-05 10:03:06.566276 (cutoff 2016-05-05 10:03:08.269831)
>> > > 2016-05-05 10:03:29.269998 7f98cca35700 -1 osd.6 7235 heartbeat_check:
>> > > no reply from osd.25 since back 2016-05-05 10:03:06.566276 front
>> > > 2016-05-05 10:03:06.566276 (cutoff 2016-05-05 10:03:09.269992)
>> > > 2016-05-05 10:03:29.801145 7f98b339a700 -1 osd.6 7235 heartbeat_check:
>> > > no reply from osd.25 since back 2016-05-05 10:03:06.566276 front
>> > > 2016-05-05 10:03:06.566276 (cutoff 2016-05-05 10:03:09.801142)
>> > > 2016-05-05 10:04:06.275237 7f98cca35700  0 log_channel(cluster) log
>> > > [WRN] : 1 slow requests, 1 included below; oldest blocked for >
>> > > 30.910385 secs 2016-05-05 10:04:06.275252 7f98cca35700  0
>> > > log_channel(cluster) log [WRN] : slow request 30.910385 seconds old,
>> > > received at 2016-05-05 10:03:35.364796: osd_op(osd.70.6640:6555588
>> > > 4.1041c254 rbd_data.afd7564682858.0000000000014daa [copy-from ver
>> > > 105993] snapc 0=[]
>> > > ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e7235)
>> > > currently commit_sent
>> > >
>> > >
>> > > I've followed the instructions on
>> > > http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>> > > to hopefully find out what's happening, but as far as the hardware is
>> > > concerned everything looks fine. No SMART errors logged, iostat shows
>> > > some activity but nothing pegged at 100%, no messages in dmesg, and
>> > > the CPUs are only at around 25% usage at most.
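>> > >
>> > > Concretely, the per-node checks were along these lines (the device
>> > > name is just an example):
>> > >
>> > >     # SMART health and error counters for each data disk
>> > >     smartctl -a /dev/sda | grep -i -e health -e error
>> > >
>> > >     # recent kernel messages with readable timestamps
>> > >     dmesg -T | tail -n 50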
>> > >
>> > > This makes me think it might be network related. I have a 10Gbit
>> > > public and a 10Gbit cluster network, neither of which seems to hit any
>> > > limits. There is one thing that might be a problem: one of the cache
>> > > nodes has a bonded interface and no access to the cluster network,
>> > > while the other cache node has both a public and a cluster interface.
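>> > >
>> > > A quick way to rule the links in or out would be something like this
>> > > between the cache and OSD nodes, over both the public and the cluster
>> > > interfaces (iperf3 assumed installed, hostname is a placeholder):
>> > >
>> > >     # on one node
>> > >     iperf3 -s
>> > >
>> > >     # from another node
>> > >     iperf3 -c osd-node1
>> > >
>> > >     # retransmit counters are often more telling than raw throughput
>> > >     netstat -s | grep -i retrans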
>> > >
>> > > Could anybody give me some more steps I can take to further discover
>> > > where this bottleneck lies?
>> > >
>> > > Thanks in advance,
>> > >
>> > > Peter
>> > >
>>
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
