You may not want to set your heartbeat grace that high; it will make I/O block for a long time in the case of a real failure. You may want to look at increasing the number of down reporters instead.
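
As a rough sketch, that could look something like the following in ceph.conf. The exact option name and its default vary by release, so treat this as an assumption to verify against the docs for your version:

    [mon]
    # Require reports from more distinct OSDs before marking a peer down,
    # rather than stretching the heartbeat grace period (example value).
    mon osd min down reporters = 3

The monitors need a restart (or runtime injection) for a change like this to take effect.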
Robert LeBlanc

Sent from a mobile device, please excuse any typos.

On Jul 2, 2015 9:39 PM, "Tuomas Juntunen" <tuomas.juntu...@databasement.fi> wrote:

> Just reporting back on my findings.
>
> After making these changes the flapping occurred just once during the
> night. To fix it further I changed the heartbeat grace to 120 secs. I also
> matched osd_op_threads and filestore_op_threads to the core count.
>
> Br, T
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tuomas Juntunen
> Sent: 2 July 2015 16:23
> To: 'Somnath Roy'; 'ceph-users'
> Subject: Re: [ceph-users] One of our nodes has logs saying: wrongly marked me down
>
> Thanks
>
> I'll test these values, and also raise the osd heartbeat grace from 20 to
> 60 seconds; hopefully that will help with the latency during deep scrub.
>
> I changed shards to 6 and shard threads to 2, so it matches the physical
> cores on the server, not counting hyperthreading.
>
> Br, T
>
>
> From: Somnath Roy [mailto:somnath....@sandisk.com]
> Sent: 2 July 2015 6:29
> To: Tuomas Juntunen; 'ceph-users'
> Subject: RE: [ceph-users] One of our nodes has logs saying: wrongly marked me down
>
> Yeah, this can happen during deep scrub and also during rebalancing; I
> forgot to mention that. Generally, it is a good idea to throttle those.
> For deep scrub, you can try the following (got it from an old post, I
> never used it):
>
> osd_scrub_chunk_min = 1
> osd_scrub_chunk_max = 1
> osd_scrub_sleep = 0.1
>
> For rebalancing I think you are already using proper values.
>
> I don't think this will eliminate the scenario altogether, but it should
> alleviate it a bit.
>
> Also, why are you using so many shards? How many OSDs are you running in
> a box? 25 shards should be fine if you are running a single OSD; if you
> have a lot of OSDs in a box, try reducing it to ~5 or so.
>
> Thanks & Regards
> Somnath
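
Those scrub throttles can usually be injected at runtime as well, so they can be tried without restarting every OSD. A sketch, assuming an admin node with a working client keyring; injectargs behaviour differs somewhat between releases:

    # Apply the scrub throttles to all running OSDs without a restart
    ceph tell osd.* injectargs '--osd_scrub_chunk_min 1 --osd_scrub_chunk_max 1 --osd_scrub_sleep 0.1'

Injected values do not survive a daemon restart, so anything that helps should be persisted in ceph.conf as well.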
> From: Tuomas Juntunen [mailto:tuomas.juntu...@databasement.fi]
> Sent: Wednesday, July 01, 2015 8:18 PM
> To: Somnath Roy; 'ceph-users'
> Subject: RE: [ceph-users] One of our nodes has logs saying: wrongly marked me down
>
> I've checked the network. We use IPoIB and all nodes are connected to the
> same switch; there are no breaks in connectivity while this happens. A
> constant ping shows 0.03-0.1 ms, which I would say is fine.
>
> This happens almost every time deep scrubbing runs. Load on this
> particular server goes to 300+ and its OSDs are marked down.
>
> Any suggestions on settings? I now have the following settings that might
> affect this:
>
> [global]
> osd_op_threads = 6
> osd_op_num_threads_per_shard = 1
> osd_op_num_shards = 25
> #osd_op_num_sharded_pool_threads = 25
> filestore_op_threads = 6
> ms_nocrc = true
> filestore_fd_cache_size = 64
> filestore_fd_cache_shards = 32
> ms_dispatch_throttle_bytes = 0
> throttler_perf_counter = false
>
> [osd]
> osd scrub load threshold = 0.1
> osd max backfills = 1
> osd recovery max active = 1
> osd scrub sleep = .1
> osd disk thread ioprio class = idle
> osd disk thread ioprio priority = 7
> osd scrub chunk max = 5
> osd deep scrub stride = 1048576
> filestore queue max ops = 10000
> filestore max sync interval = 30
> filestore min sync interval = 29
> osd_client_message_size_cap = 0
> osd_client_message_cap = 0
> osd_enable_op_tracker = false
>
> Br, T
>
>
> From: Somnath Roy [mailto:somnath....@sandisk.com]
> Sent: 2 July 2015 0:30
> To: Tuomas Juntunen; 'ceph-users'
> Subject: RE: [ceph-users] One of our nodes has logs saying: wrongly marked me down
>
> This can happen if your OSDs are flapping. Hope your network is stable.
>
> Thanks & Regards
> Somnath
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tuomas Juntunen
> Sent: Wednesday, July 01, 2015 2:24 PM
> To: 'ceph-users'
> Subject: [ceph-users] One of our nodes has logs saying: wrongly marked me down
>
> Hi
>
> One of our nodes has OSD logs that say "wrongly marked me down" for every
> OSD at some point. What could be the reason for this? Does anyone have
> similar experiences?
>
> The other nodes work totally fine and they are all identical.
>
> Br, T
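
When chasing this kind of flapping, it helps to confirm what each OSD is actually running with, since injected and configured values can drift apart. A sketch via the admin socket, assuming osd.0 runs on the local host with default socket paths:

    # Show the effective values on a running OSD (run on that OSD's host)
    ceph daemon osd.0 config get osd_heartbeat_grace
    ceph daemon osd.0 config get osd_scrub_sleep

Watching the cluster log while a deep scrub runs can also confirm the correlation:

    ceph -w | grep -Ei 'scrub|wrongly marked'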