We do have our nf_conntrack_max set to 1048576, but our count is actually only between 65k and 101k, so your 256k is probably fine. You'd have noticed in dmesg anyway, I figure, but I thought I'd mention it.
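If you want to double-check rather than rely on dmesg, something like this compares the live count against the limit (a quick sanity check; the exact sysctl key names can vary slightly by kernel, so treat them as assumptions):

    # current tracked connections vs. the table limit
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
    # a full table shows up in the kernel log as dropped packets
    dmesg | grep -i 'nf_conntrack: table full'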

I am out of ideas. :)

Jordan

On Wed, Sep 20, 2017 at 1:28 PM, David Turner <drakonst...@gmail.com> wrote:

> We have been running with the following settings for months for pid_max
> and conntrack_max. Are these values you would deem too low?
>
> kernel.pid_max = 4194303
> net.nf_conntrack_max = 262144
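A rough way to rule out thread exhaustion against those limits is to compare the host's live thread count with the kernel ceilings (a sketch, assuming standard procps tools; `ps -eLf` prints one line per thread plus a header):

    # total threads on the host vs. the kernel limits
    ps -eLf | wc -l
    sysctl kernel.pid_max kernel.threads-max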
> I went back in the logs to see where the slow requests and timeouts
> started; they seem to have come out of nowhere. I included the log from 8
> minutes before the first osd op thread timeout and 8.5 minutes before the
> first slow request message. I don't see any indication at all that it was
> scrubbing or anything, and I'm also not seeing any SMART errors on the
> drives.
>
> 2017-09-19 01:14:25.539030 7f134326a700  1 leveldb: Delete type=2 #125088
>
> 2017-09-19 01:16:25.544797 7f134326a700  1 leveldb: Level-0 table #125303: started
> 2017-09-19 01:16:25.571103 7f134326a700  1 leveldb: Level-0 table #125303: 1509406 bytes OK
> 2017-09-19 01:16:25.572424 7f134326a700  1 leveldb: Delete type=0 #125277
>
> 2017-09-19 01:18:14.329887 7f134326a700  1 leveldb: Level-0 table #125305: started
> 2017-09-19 01:18:14.359580 7f134326a700  1 leveldb: Level-0 table #125305: 1503050 bytes OK
> 2017-09-19 01:18:14.361107 7f134326a700  1 leveldb: Delete type=0 #125302
>
> 2017-09-19 01:20:13.239215 7f134326a700  1 leveldb: Level-0 table #125307: started
> 2017-09-19 01:20:13.268553 7f134326a700  1 leveldb: Level-0 table #125307: 1500713 bytes OK
> 2017-09-19 01:20:13.270018 7f134326a700  1 leveldb: Delete type=0 #125304
>
> 2017-09-19 01:22:24.023066 7f13c142b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139a4e0700' had timed out after 15
> 2017-09-19 01:22:24.023081 7f13c142b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139ace1700' had timed out after 15
>
> --- these lines repeated thousands of times before we see the first slow request message ---
>
> 2017-09-19 01:22:37.770750 7f134ddac700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139d4e6700' had timed out after 15
> 2017-09-19 01:22:37.770751 7f134ddac700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139e4e8700' had timed out after 15
> 2017-09-19 01:22:37.772928 7f13bc0b6700  0 log_channel(cluster) log [WRN] : 25 slow requests, 5 included below; oldest blocked for > 30.890114 secs
> 2017-09-19 01:22:37.772946 7f13bc0b6700  0 log_channel(cluster) log [WRN] : slow request 30.726969 seconds old, received at 2017-09-19 01:22:07.045866: MOSDECSubOpWrite(97.2ebs4 48111 ECSubWrite(tid=6428263, reqid=client.12636870.0:480468942, at_version=48111'1958916, trim_to=48066'1955886, trim_rollback_to=48111'1958910)) currently started
> 2017-09-19 01:22:37.772957 7f13bc0b6700  0 log_channel(cluster) log [WRN] : slow request 30.668230 seconds old, received at 2017-09-19 01:22:07.104606: MOSDECSubOpWrite(97.2ebs4 48111 ECSubWrite(tid=6428264, reqid=client.12636870.0:480468943, at_version=48111'1958917, trim_to=48066'1955886, trim_rollback_to=48111'1958910)) currently started
> 2017-09-19 01:22:37.772984 7f13bc0b6700  0 log_channel(cluster) log [WRN] : slow request 30.057066 seconds old, received at 2017-09-19 01:22:07.715770: MOSDECSubOpWrite(97.2e6s2 48111 ECSubWrite(tid=3560867, reqid=client.18422767.0:328744332, at_version=48111'1953782, trim_to=48066'1950689, trim_rollback_to=48111'1953780)) currently started
> 2017-09-19 01:22:37.772989 7f13bc0b6700  0 log_channel(cluster) log [WRN] : slow request 30.664709 seconds old, received at 2017-09-19 01:22:07.108126: MOSDECSubOpWrite(97.2ebs4 48111 ECSubWrite(tid=6428265, reqid=client.12636870.0:480468944, at_version=48111'1958918, trim_to=48066'1955886, trim_rollback_to=48111'1958910)) currently started
> 2017-09-19 01:22:37.773000 7f13bc0b6700  0 log_channel(cluster) log [WRN] : slow request 30.657596 seconds old, received at 2017-09-19 01:22:07.115240: MOSDECSubOpWrite(97.51s3 48111 ECSubWrite(tid=5500858, reqid=client.24892528.0:35633485, at_version=48111'1958524, trim_to=48066'1955458, trim_rollback_to=48111'1958513)) currently started
> 2017-09-19 01:22:37.785581 7f1381ee9700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139a4e0700' had timed out after 15
> 2017-09-19 01:22:37.785583 7f1381ee9700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139ace1700' had timed out after 15
>
> On Tue, Sep 19, 2017 at 7:31 PM Jordan Share <jordan.sh...@gmail.com> wrote:
>
>> We had suicide timeouts, but unfortunately I can't remember the specific
>> root cause at this point. It was definitely one of two things:
>>
>> * too low a net.netfilter.nf_conntrack_max (preventing the osds from
>>   opening new connections to each other)
>> * too low a kernel.pid_max or kernel.threads-max (preventing new
>>   threads from starting)
>>
>> I am pretty sure we hit the pid_max limit early on, but the conntrack_max
>> didn't cause trouble until we had enough VMs running to push the total
>> number of connections past the default limit.
>>
>> Our cluster is approximately the same size as yours.
>>
>> Jordan
>>
>> On Tue, Sep 19, 2017 at 3:05 PM, Stanley Zhang <stanley.zh...@smxemail.com> wrote:
>>
>>> We don't use EC pools, but my experience with similar slow requests on
>>> RGW + replicated pools is that you need to find the first slow request in
>>> the logs and identify where it came from: for example, a deep-scrub, a
>>> client accessing corrupted objects, disk errors, etc.
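Finding that first message is usually a one-liner; the osd id and log path below are just example placeholders:

    # first slow request this OSD logged, with a little leading context
    grep -m1 -B5 'slow request' /var/log/ceph/ceph-osd.NN.log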
>>> On 20/09/17 8:13 AM, David Turner wrote:
>>>
>>> Starting 3 nights ago, we began seeing OSDs randomly going down in our
>>> cluster (Jewel 10.2.7). Checking each OSD recently marked down in the
>>> cluster (`ceph osd dump | grep -E '^osd\.[0-9]+\s' | sort -nrk11` sorts
>>> the OSDs by the map epoch in which they were most recently marked down),
>>> I found that all of them had been wrongly marked down. Prior to each
>>> event there is a lot of evidence of slow requests, OSD op threads timing
>>> out, FileStore op threads timing out, and other errors. At the bottom of
>>> this email is an excerpt of these errors; it is definitely not a
>>> comprehensive log.
>>>
>>> The epoch in which each OSD was last marked down in the OSD map matches
>>> the epoch it logs in its "wrongly marked me down" message. After
>>> thousands more lines of op threads timing out and slow requests, the OSD
>>> finally asserts with "hit suicide timeout". The first 2 nights this was
>>> localized to 2 OSDs on the same host, but last night it happened on an
>>> OSD on a second host. 2 of the 3 OSDs have hit this 3 times and the other
>>> has hit it twice.
>>>
>>> There are 15 OSD nodes and 224 total OSDs in the cluster. So far only
>>> these 3 OSDs have hit this FAILED assert. Does anyone have ideas on what
>>> to do next, or see any information missing here that would help
>>> understand things further? It's too intermittent to run the OSD at log
>>> level 20.
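If it recurs, one option short of running at log level 20 permanently is to raise logging on just the suspect OSDs on the fly and drop it again afterwards; a sketch, with NN as a placeholder osd id:

    # temporarily raise logging on one OSD without restarting it
    ceph tell osd.NN injectargs '--debug-osd 20 --debug-ms 1'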
>>> 2017-09-19 02:06:19.712334 7fdfeae86700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc7235700' had timed out after 15
>>> 2017-09-19 02:06:19.712358 7fdfeae86700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fdfde58f700' had timed out after 60
>>> 2017-09-19 02:06:55.974099 7fdfe5b11700  0 log_channel(cluster) log [WRN] : 891 slow requests, 5 included below; oldest blocked for > 150.987639 secs
>>> 2017-09-19 02:06:55.974114 7fdfe5b11700  0 log_channel(cluster) log [WRN] : slow request 150.985487 seconds old, received at 2017-09-19 02:04:24.987412: MOSDECSubOpWrite(97.f8s5 48121 ECSubWrite(tid=4204267, reqid=client.24351228.0:139832730, at_version=48121'1955383, trim_to=48111'1952339, trim_rollback_to=48121'1955373)) currently started
>>> 2017-09-19 02:06:55.974123 7fdfe5b11700  0 log_channel(cluster) log [WRN] : slow request 150.510718 seconds old, received at 2017-09-19 02:04:25.462180: MOSDECSubOpWrite(97.4es1 48121 ECSubWrite(tid=5407590, reqid=client.24892528.0:36410074, at_version=48121'1960640, trim_to=48111'1957612, trim_rollback_to=48121'1960635)) currently started
>>> 2017-09-19 02:06:55.974128 7fdfe5b11700  0 log_channel(cluster) log [WRN] : slow request 149.177285 seconds old, received at 2017-09-19 02:04:26.795614: MOSDECSubOpWrite(97.1f4s1 48121 ECSubWrite(tid=8793253, reqid=client.24892528.0:36410582, at_version=48121'1964351, trim_to=48111'1961282, trim_rollback_to=48121'1964344)) currently queued_for_pg
>>> 2017-09-19 02:06:55.974134 7fdfe5b11700  0 log_channel(cluster) log [WRN] : slow request 147.359320 seconds old, received at 2017-09-19 02:04:28.613578: MOSDECSubOpWrite(97.8s5 48121 ECSubWrite(tid=3228587, reqid=client.18422767.0:329511916, at_version=48121'1965073, trim_to=48111'1962055, trim_rollback_to=48121'1965066)) currently queued_for_pg
>>> 2017-09-19 02:06:55.974139 7fdfe5b11700  0 log_channel(cluster) log [WRN] : slow request 146.404241 seconds old, received at 2017-09-19 02:04:29.568657: MOSDECSubOpWrite(97.f8s5 48121 ECSubWrite(tid=4204291, reqid=client.18422767.0:329512018, at_version=48121'1955389, trim_to=48111'1952339, trim_rollback_to=48121'1955377)) currently queued_for_pg
>>> 2017-09-19 02:06:55.974276 7fdf7aa6a700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.32:6822/130263 pipe(0x7fe003b1c800 sd=67 :24892 s=1 pgs=7268 cs=1 l=0 c=0x7fe01b44be00).connect got RESETSESSION
>>> 2017-09-19 02:06:55.974626 7fdf78141700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.33:6802/2651380 pipe(0x7fe008140800 sd=70 :61127 s=1 pgs=7059 cs=1 l=0 c=0x7fe01be72600).connect got RESETSESSION
>>> 2017-09-19 02:06:55.974657 7fdf78e4e700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.35:6827/477826 pipe(0x7fe004aef400 sd=91 :19464 s=1 pgs=7316 cs=1 l=0 c=0x7fe01b179080).connect got RESETSESSION
>>> 2017-09-19 02:06:55.977120 7fdf95310700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.22:6812/2276428 pipe(0x7fe003d15400 sd=86 :58892 s=1 pgs=15908 cs=1 l=0 c=0x7fe01b2e1500).connect got RESETSESSION
>>> 2017-09-19 02:06:55.979722 7fdf9500d700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.27:6830/2018590 pipe(0x7fe003a06800 sd=98 :42191 s=1 pgs=12697 cs=1 l=0 c=0x7fe01b44d780).connect got RESETSESSION
>>> 2017-09-19 02:06:56.106436 7fdfba1dc700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.27:6811/2018593 pipe(0x7fe009e79400 sd=137 :54582 s=1 pgs=11500 cs=1 l=0 c=0x7fe005820880).connect got RESETSESSION
>>> 2017-09-19 02:06:56.107146 7fdfbbaf5700  0 -- 10.10.13.29:6826/3332721 >> 10.10.13.27:6811/2018593 pipe(0x7fe009e79400 sd=137 :54582 s=2 pgs=11602 cs=1 l=0 c=0x7fe005820880).fault, initiating reconnect
>>> ---
>>> 2017-09-19 02:06:56.213980 7fdfdd58d700  0 log_channel(cluster) log [WRN] : map e48123 wrongly marked me down
>>> ---
>>> 2017-09-19 03:06:34.778837 7fdfeae86700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fdfde58f700' had timed out after 60
>>> 2017-09-19 03:06:34.778840 7fdfeae86700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fdfddd8e700' had timed out after 60
>>> 2017-09-19 03:06:39.778908 7fdfeae86700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc422f700' had timed out after 15
>>> 2017-09-19 03:06:39.778921 7fdfeae86700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc5231700' had timed out after 15
>>> 2017-09-19 03:06:39.778930 7fdfeae86700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc5231700' had suicide timed out after 150
>>> 2017-09-19 03:06:39.782749 7fdfeae86700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fdfeae86700 time 2017-09-19 03:06:39.778940
>>> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
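For reference, the 15, 60, and 150 second values in those messages should correspond to osd_op_thread_timeout, filestore_op_thread_timeout, and osd_op_thread_suicide_timeout. A sketch of confirming the running values over the admin socket, using osd.0 as an example id:

    # live timeout settings on one OSD
    ceph daemon osd.0 config get osd_op_thread_timeout
    ceph daemon osd.0 config get filestore_op_thread_timeout
    ceph daemon osd.0 config get osd_op_thread_suicide_timeout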