We have been running with the following pid_max and conntrack_max settings for months. Are these values you would deem too low?

kernel.pid_max = 4194303
net.nf_conntrack_max = 262144
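For anyone who wants to compare those limits against actual usage on an OSD host, something like the following should work (a rough sketch; nf_conntrack_count only exists while the conntrack module is loaded, and the thread count is a whole-host total):

    # configured limits
    sysctl kernel.pid_max kernel.threads-max net.nf_conntrack_max

    # live conntrack entries vs. the configured maximum
    cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max

    # total threads on the host (each OSD can run hundreds)
    ps -eo nlwp --no-headers | awk '{sum += $1} END {print sum}'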
I went back through the logs to see where the slow requests and timeouts started; they seem to have come out of nowhere. I have included the log from 8 minutes before the OSD op thread timeouts and 8.5 minutes before the first slow request message. I don't see any indication at all that the OSD was scrubbing or anything, and I'm also not seeing any SMART errors on the drives.

2017-09-19 01:14:25.539030 7f134326a700 1 leveldb: Delete type=2 #125088
2017-09-19 01:16:25.544797 7f134326a700 1 leveldb: Level-0 table #125303: started
2017-09-19 01:16:25.571103 7f134326a700 1 leveldb: Level-0 table #125303: 1509406 bytes OK
2017-09-19 01:16:25.572424 7f134326a700 1 leveldb: Delete type=0 #125277
2017-09-19 01:18:14.329887 7f134326a700 1 leveldb: Level-0 table #125305: started
2017-09-19 01:18:14.359580 7f134326a700 1 leveldb: Level-0 table #125305: 1503050 bytes OK
2017-09-19 01:18:14.361107 7f134326a700 1 leveldb: Delete type=0 #125302
2017-09-19 01:20:13.239215 7f134326a700 1 leveldb: Level-0 table #125307: started
2017-09-19 01:20:13.268553 7f134326a700 1 leveldb: Level-0 table #125307: 1500713 bytes OK
2017-09-19 01:20:13.270018 7f134326a700 1 leveldb: Delete type=0 #125304
2017-09-19 01:22:24.023066 7f13c142b700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139a4e0700' had timed out after 15
2017-09-19 01:22:24.023081 7f13c142b700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139ace1700' had timed out after 15

--- these lines repeated thousands of times before we see the first slow request message ---

2017-09-19 01:22:37.770750 7f134ddac700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139d4e6700' had timed out after 15
2017-09-19 01:22:37.770751 7f134ddac700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139e4e8700' had timed out after 15
2017-09-19 01:22:37.772928 7f13bc0b6700 0 log_channel(cluster) log [WRN] : 25 slow requests, 5 included below; oldest blocked for > 30.890114 secs
2017-09-19 01:22:37.772946 7f13bc0b6700 0 log_channel(cluster) log [WRN] : slow request 30.726969 seconds old, received at 2017-09-19 01:22:07.045866: MOSDECSubOpWrite(97.2ebs4 48111 ECSubWrite(tid=6428263, reqid=client.12636870.0:480468942, at_version=48111'1958916, trim_to=48066'1955886, trim_rollback_to=48111'1958910)) currently started
2017-09-19 01:22:37.772957 7f13bc0b6700 0 log_channel(cluster) log [WRN] : slow request 30.668230 seconds old, received at 2017-09-19 01:22:07.104606: MOSDECSubOpWrite(97.2ebs4 48111 ECSubWrite(tid=6428264, reqid=client.12636870.0:480468943, at_version=48111'1958917, trim_to=48066'1955886, trim_rollback_to=48111'1958910)) currently started
2017-09-19 01:22:37.772984 7f13bc0b6700 0 log_channel(cluster) log [WRN] : slow request 30.057066 seconds old, received at 2017-09-19 01:22:07.715770: MOSDECSubOpWrite(97.2e6s2 48111 ECSubWrite(tid=3560867, reqid=client.18422767.0:328744332, at_version=48111'1953782, trim_to=48066'1950689, trim_rollback_to=48111'1953780)) currently started
2017-09-19 01:22:37.772989 7f13bc0b6700 0 log_channel(cluster) log [WRN] : slow request 30.664709 seconds old, received at 2017-09-19 01:22:07.108126: MOSDECSubOpWrite(97.2ebs4 48111 ECSubWrite(tid=6428265, reqid=client.12636870.0:480468944, at_version=48111'1958918, trim_to=48066'1955886, trim_rollback_to=48111'1958910)) currently started
2017-09-19 01:22:37.773000 7f13bc0b6700 0 log_channel(cluster) log [WRN] : slow request 30.657596 seconds old, received at 2017-09-19 01:22:07.115240: MOSDECSubOpWrite(97.51s3 48111 ECSubWrite(tid=5500858, reqid=client.24892528.0:35633485, at_version=48111'1958524, trim_to=48066'1955458, trim_rollback_to=48111'1958513)) currently started
2017-09-19 01:22:37.785581 7f1381ee9700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139a4e0700' had timed out after 15
2017-09-19 01:22:37.785583 7f1381ee9700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f139ace1700' had timed out after 15
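In case it helps anyone triaging something similar, I located the first slow request and gauged the volume of timeout spam with something like the following (a rough sketch assuming the default log path; substitute the OSD's id for $ID):

    # first slow request message in the log
    grep -m1 'slow request' /var/log/ceph/ceph-osd.$ID.log

    # number of op thread timeout lines in the same log
    grep -c 'had timed out after 15' /var/log/ceph/ceph-osd.$ID.log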
On Tue, Sep 19, 2017 at 7:31 PM Jordan Share <jordan.sh...@gmail.com> wrote:

> We had suicide timeouts, but unfortunately I can't remember the specific
> root cause at this point.
>
> It was definitely one of two things:
> * net.netfilter.nf_conntrack_max set too low (preventing the OSDs from
> opening new connections to each other)
> * kernel.pid_max or kernel.threads-max set too low (preventing new
> threads from starting)
>
> I am pretty sure we hit pid_max early on, but conntrack_max didn't cause
> trouble until we had enough VMs running to push the total number of
> connections past the default limit.
>
> Our cluster is approximately the same size as yours.
>
> Jordan
>
> On Tue, Sep 19, 2017 at 3:05 PM, Stanley Zhang <stanley.zh...@smxemail.com> wrote:
>
>> We don't use EC pools, but my experience with similar slow requests on
>> RGW + replicated pools is that you need to find the first slow request
>> in the logs and identify where it came from, for example a deep-scrub,
>> a client accessing corrupted objects, or disk errors.
>>
>> On 20/09/17 8:13 AM, David Turner wrote:
>>
>> Starting 3 nights ago, we began seeing OSDs randomly going down in our
>> cluster (Jewel 10.2.7). At first I checked each OSD that had recently
>> been marked down in the cluster (`ceph osd dump | grep -E '^osd\.[0-9]+\s' | sort -nrk11`
>> gives a list of OSDs sorted by the OSD map epoch in which they were
>> most recently marked down), and all of them had been wrongly marked
>> down. Leading up to each of these events there is a lot of evidence of
>> slow requests, OSD op threads timing out, FileStore op threads timing
>> out, and other errors. An excerpt of these errors is at the bottom of
>> this email; it is definitely not a comprehensive log.
>>
>> The epoch in which each OSD was last marked down in the OSD map matches
>> the point in its log where it reports being wrongly marked down. After
>> thousands more lines of op thread timeouts and slow requests, the OSD
>> finally asserts with "hit suicide timeout". The first 2 nights this was
>> localized to 2 OSDs on the same host, but last night it happened on an
>> OSD on a second host. 2 of the 3 OSDs have hit this 3 times and the
>> other has hit it twice.
>>
>> There are 15 OSD nodes and 224 total OSDs in the cluster. So far only
>> these 3 OSDs have hit this FAILED assert. Does anyone have any ideas of
>> what to do next, or is there any information missing here that would
>> help understand things further? It's too intermittent to run the OSD
>> with log level 20.
>>
>> 2017-09-19 02:06:19.712334 7fdfeae86700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc7235700' had timed out after 15
>> 2017-09-19 02:06:19.712358 7fdfeae86700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fdfde58f700' had timed out after 60
>> 2017-09-19 02:06:55.974099 7fdfe5b11700 0 log_channel(cluster) log [WRN] : 891 slow requests, 5 included below; oldest blocked for > 150.987639 secs
>> 2017-09-19 02:06:55.974114 7fdfe5b11700 0 log_channel(cluster) log [WRN] : slow request 150.985487 seconds old, received at 2017-09-19 02:04:24.987412: MOSDECSubOpWrite(97.f8s5 48121 ECSubWrite(tid=4204267, reqid=client.24351228.0:139832730, at_version=48121'1955383, trim_to=48111'1952339, trim_rollback_to=48121'1955373)) currently started
>> 2017-09-19 02:06:55.974123 7fdfe5b11700 0 log_channel(cluster) log [WRN] : slow request 150.510718 seconds old, received at 2017-09-19 02:04:25.462180: MOSDECSubOpWrite(97.4es1 48121 ECSubWrite(tid=5407590, reqid=client.24892528.0:36410074, at_version=48121'1960640, trim_to=48111'1957612, trim_rollback_to=48121'1960635)) currently started
>> 2017-09-19 02:06:55.974128 7fdfe5b11700 0 log_channel(cluster) log [WRN] : slow request 149.177285 seconds old, received at 2017-09-19 02:04:26.795614: MOSDECSubOpWrite(97.1f4s1 48121 ECSubWrite(tid=8793253, reqid=client.24892528.0:36410582, at_version=48121'1964351, trim_to=48111'1961282, trim_rollback_to=48121'1964344)) currently queued_for_pg
>> 2017-09-19 02:06:55.974134 7fdfe5b11700 0 log_channel(cluster) log [WRN] : slow request 147.359320 seconds old, received at 2017-09-19 02:04:28.613578: MOSDECSubOpWrite(97.8s5 48121 ECSubWrite(tid=3228587, reqid=client.18422767.0:329511916, at_version=48121'1965073, trim_to=48111'1962055, trim_rollback_to=48121'1965066)) currently queued_for_pg
>> 2017-09-19 02:06:55.974139 7fdfe5b11700 0 log_channel(cluster) log [WRN] : slow request 146.404241 seconds old, received at 2017-09-19 02:04:29.568657: MOSDECSubOpWrite(97.f8s5 48121 ECSubWrite(tid=4204291, reqid=client.18422767.0:329512018, at_version=48121'1955389, trim_to=48111'1952339, trim_rollback_to=48121'1955377)) currently queued_for_pg
>> 2017-09-19 02:06:55.974276 7fdf7aa6a700 0 -- 10.10.113.29:6826/3332721 >> 10.10.13.32:6822/130263 pipe(0x7fe003b1c800 sd=67 :24892 s=1 pgs=7268 cs=1 l=0 c=0x7fe01b44be00).connect got RESETSESSION
>> 2017-09-19 02:06:55.974626 7fdf78141700 0 -- 10.10.13.29:6826/3332721 >> 10.10.13.33:6802/2651380 pipe(0x7fe008140800 sd=70 :61127 s=1 pgs=7059 cs=1 l=0 c=0x7fe01be72600).connect got RESETSESSION
>> 2017-09-19 02:06:55.974657 7fdf78e4e700 0 -- 10.10.13.29:6826/3332721 >> 10.10.13.35:6827/477826 pipe(0x7fe004aef400 sd=91 :19464 s=1 pgs=7316 cs=1 l=0 c=0x7fe01b179080).connect got RESETSESSION
>> 2017-09-19 02:06:55.977120 7fdf95310700 0 -- 10.10.13.29:6826/3332721 >> 10.10.13.22:6812/2276428 pipe(0x7fe003d15400 sd=86 :58892 s=1 pgs=15908 cs=1 l=0 c=0x7fe01b2e1500).connect got RESETSESSION
>> 2017-09-19 02:06:55.979722 7fdf9500d700 0 -- 10.10.13.29:6826/3332721 >> 10.10.13.27:6830/2018590 pipe(0x7fe003a06800 sd=98 :42191 s=1 pgs=12697 cs=1 l=0 c=0x7fe01b44d780).connect got RESETSESSION
>> 2017-09-19 02:06:56.106436 7fdfba1dc700 0 -- 10.10.13.29:6826/3332721 >> 10.10.13.27:6811/2018593 pipe(0x7fe009e79400 sd=137 :54582 s=1 pgs=11500 cs=1 l=0 c=0x7fe005820880).connect got RESETSESSION
>> 2017-09-19 02:06:56.107146 7fdfbbaf5700 0 -- 10.10.13.29:6826/3332721 >> 10.10.13.27:6811/2018593 pipe(0x7fe009e79400 sd=137 :54582 s=2 pgs=11602 cs=1 l=0 c=0x7fe005820880).fault, initiating reconnect
>> ---
>> 2017-09-19 02:06:56.213980 7fdfdd58d700 0 log_channel(cluster) log [WRN] : map e48123 wrongly marked me down
>> ---
>> 2017-09-19 03:06:34.778837 7fdfeae86700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fdfde58f700' had timed out after 60
>> 2017-09-19 03:06:34.778840 7fdfeae86700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fdfddd8e700' had timed out after 60
>> 2017-09-19 03:06:39.778908 7fdfeae86700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc422f700' had timed out after 15
>> 2017-09-19 03:06:39.778921 7fdfeae86700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc5231700' had timed out after 15
>> 2017-09-19 03:06:39.778930 7fdfeae86700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fdfc5231700' had suicide timed out after 150
>> 2017-09-19 03:06:39.782749 7fdfeae86700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fdfeae86700 time 2017-09-19 03:06:39.778940
>> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>>
>> --
>> Stanley Zhang | Senior Operations Engineer, SMX Limited
>> Web: http://smxemail.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com