I found some log entries saying osd.91 is down; I think the same should apply to osd.9.
I am not sure what would cause an OSD process to be treated as down by its peers.
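
As far as I understand, OSD peers exchange heartbeats, and when an OSD stops
replying within osd_heartbeat_grace its peers send failure reports to the
monitor; once enough reporters from enough distinct hosts agree
(mon_osd_min_down_reporters, grouped by mon_osd_reporter_subtree_level), the
monitor marks the OSD down. That seems to match the prepare_failure lines in
the monitor log below. To double-check the thresholds on this cluster,
something like the following, run on the respective hosts, should show them
(mon.ubuntuser8 and osd.9 are just the daemon names from our setup):

ceph daemon mon.ubuntuser8 config show | grep -E 'mon_osd_min_down_reporters|mon_osd_reporter_subtree_level'
ceph daemon osd.9 config show | grep -E 'osd_heartbeat_grace|osd_heartbeat_interval'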

2018-01-30 06:39:33.767747 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424396 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 1 entries (version 424396)
2018-01-30 06:39:37.623969 7f2409f07700  0 log_channel(cluster) log [INF] :
Health check cleared: REQUEST_SLOW (was: 2 slow requests are blocked > 32
sec)
2018-01-30 06:39:37.751100 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424397 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 1 entries (version 424397)
2018-01-30 06:39:53.535500 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424398 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 0 entries (version 424398)
2018-01-30 06:40:09.251953 7f2409f07700  0
mon.ubuntuser8@0(leader).data_health(33)
update_stats avail 86% total 806 GB, used 70945 MB, avail 696 GB
2018-01-30 06:40:11.668097 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-30 06:40:11.794779 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424399 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 1 entries (version 424399)
2018-01-30 06:40:18.599837 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.0
10.1.248.1:6871/3727 is reporting failure:1
2018-01-30 06:40:18.599861 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.0 10.1.248.1:6871/3727
2018-01-30 06:40:18.783394 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424400 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 0 entries (version 424400)
2018-01-30 06:40:19.290525 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.61
10.1.248.3:6842/3728 is reporting failure:1
2018-01-30 06:40:19.290546 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.61 10.1.248.3:6842/3728
2018-01-30 06:40:19.316574 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.39
10.1.248.2:6822/3732 is reporting failure:1
2018-01-30 06:40:19.316592 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.39 10.1.248.2:6822/3732
2018-01-30 06:40:19.361317 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.44
10.1.248.2:6800/3683 is reporting failure:1
2018-01-30 06:40:19.361335 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.44 10.1.248.2:6800/3683
2018-01-30 06:40:19.918065 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424401 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 0 entries (version 424401)
2018-01-30 06:40:20.075119 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.41
10.1.248.2:6830/3677 is reporting failure:1
2018-01-30 06:40:20.075141 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.41 10.1.248.2:6830/3677
2018-01-30 06:40:20.493001 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.56
10.1.248.3:6800/3713 is reporting failure:1
2018-01-30 06:40:20.493023 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.56 10.1.248.3:6800/3713
2018-01-30 06:40:20.787217 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.48
10.1.248.3:6815/3670 is reporting failure:1
2018-01-30 06:40:20.787238 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.48 10.1.248.3:6815/3670
2018-01-30 06:40:20.982180 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.21
10.1.248.1:6800/3711 is reporting failure:1
2018-01-30 06:40:20.982198 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.21 10.1.248.1:6800/3711
2018-01-30 06:40:20.984336 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424402 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 0 entries (version 424402)
2018-01-30 06:40:21.152463 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.13
10.1.248.1:6842/3700 is reporting failure:1
2018-01-30 06:40:21.152480 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.13 10.1.248.1:6842/3700
2018-01-30 06:40:22.083900 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.38
10.1.248.2:6876/3764 is reporting failure:1
2018-01-30 06:40:22.083920 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.38 10.1.248.2:6876/3764
2018-01-30 06:40:22.117579 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424403 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 0 entries (version 424403)
2018-01-30 06:40:22.179713 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.89
10.1.248.3:6817/3679 is reporting failure:1
2018-01-30 06:40:22.179737 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.89 10.1.248.3:6817/3679
2018-01-30 06:40:22.318619 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.69
10.1.248.3:6812/3716 is reporting failure:1
2018-01-30 06:40:22.318644 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.69 10.1.248.3:6812/3716
2018-01-30 06:40:22.698652 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.53
10.1.248.3:6843/3701 is reporting failure:1
2018-01-30 06:40:22.698673 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.53 10.1.248.3:6843/3701
2018-01-30 06:40:22.842870 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.23
10.1.248.1:6831/3663 is reporting failure:1
2018-01-30 06:40:22.842886 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.23 10.1.248.1:6831/3663
2018-01-30 06:40:23.183964 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424404 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 0 entries (version 424404)
2018-01-30 06:40:23.259187 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.25
10.1.248.2:6825/3713 is reporting failure:1
2018-01-30 06:40:23.259203 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.25 10.1.248.2:6825/3713
2018-01-30 06:40:23.281029 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.62
10.1.248.3:6872/3735 is reporting failure:1
2018-01-30 06:40:23.281041 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.62 10.1.248.3:6872/3735
2018-01-30 06:40:23.771737 7f2407702700  1 mon.ubuntuser8@0(leader).osd
e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.92
10.1.248.4:6801/3581 is reporting failure:1
2018-01-30 06:40:23.771749 7f2407702700  0 log_channel(cluster) log [DBG] :
osd.90 10.1.248.4:6830/3552 reported failed by osd.92 10.1.248.4:6801/3581
2018-01-30 06:40:24.317177 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424405 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 2 entries (version 424405)
2018-01-30 06:40:31.694701 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check update: 3 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-30 06:40:31.830914 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424406 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 1 entries (version 424406)
2018-01-30 06:40:37.702118 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-30 06:40:37.845087 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424407 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 1 entries (version 424407)
2018-01-30 06:40:38.730131 7f2409f07700  1 mon.ubuntuser8@0(leader).osd
e3016  we have enough reporters to mark osd.90 down
2018-01-30 06:40:38.730162 7f2409f07700  0 log_channel(cluster) log [INF] :
osd.90 failed (root=default,host=ubuntuser4) (4 reporters from different
host after 34.958423 >= grace 31.094427)
2018-01-30 06:40:38.730952 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check failed: 1 osds down (OSD_DOWN)
2018-01-30 06:40:38.795202 7f2402ef9700  1 mon.ubuntuser8@0(leader).osd
e3017 e3017: 96 total, 92 up, 93 in
2018-01-30 06:40:38.830011 7f2402ef9700  0 log_channel(cluster) log [DBG] :
osdmap e3017: 96 total, 92 up, 93 in
2018-01-30 06:40:38.830254 7f2407702700  0 mon.ubuntuser8@0(leader) e1
handle_command mon_command({"prefix": "osd metadata", "id": 17} v 0) v1
2018-01-30 06:40:38.830299 7f2407702700  0 log_channel(audit) log [DBG] :
from='client.164108 10.1.248.8:0/3388257888' entity='mgr.ubuntuser8'
cmd=[{"prefix": "osd metadata", "id": 17}]: dispatch
2018-01-30 06:40:38.911919 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424408 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 2 entries (version 424408)
2018-01-30 06:40:39.861824 7f2402ef9700  1 mon.ubuntuser8@0(leader).osd
e3018 e3018: 96 total, 92 up, 93 in
2018-01-30 06:40:39.895967 7f2402ef9700  0 log_channel(cluster) log [DBG] :
osdmap e3018: 96 total, 92 up, 93 in
2018-01-30 06:40:39.896243 7f2407702700  0 mon.ubuntuser8@0(leader) e1
handle_command mon_command({"prefix": "osd metadata", "id": 17} v 0) v1
2018-01-30 06:40:39.896291 7f2407702700  0 log_channel(audit) log [DBG] :
from='client.164108 10.1.248.8:0/3388257888' entity='mgr.ubuntuser8'
cmd=[{"prefix": "osd metadata", "id": 17}]: dispatch
2018-01-30 06:40:39.911825 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check update: Reduced data availability: 7 pgs peering, 10 pgs stale
(PG_AVAILABILITY)
2018-01-30 06:40:39.911883 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check failed: Degraded data redundancy: 7 pgs unclean (PG_DEGRADED)
2018-01-30 06:40:40.928452 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424409 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 2 entries (version 424409)
2018-01-30 06:40:43.544073 7f2402ef9700  1 mon.ubuntuser8@0(leader).log
v424410 check_sub sending message to client.164108 10.1.248.8:0/3388257888
with 5 entries (version 424410)
2018-01-30 06:40:43.761429 7f2409f07700  0 log_channel(cluster) log [WRN] :
Health check update: 5 slow requests are blocked > 32 sec (REQUEST_SLOW)
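
In the excerpt above, the REQUEST_SLOW warnings show up just before the peers
start reporting osd.90, so the OSD may simply have been too stalled to answer
heartbeats rather than actually dead. If it happens again, dumping the blocked
ops on the affected OSD's host might show what it is stuck on, for example
(osd.90 is just the one from this log):

ceph health detail
ceph daemon osd.90 dump_blocked_ops
ceph daemon osd.90 dump_ops_in_flight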

2018-01-30 16:07 GMT+08:00 blackpiglet J. <blackpigletbr...@gmail.com>:

> Guys,
>
> We have set up a five-node Ceph cluster. Four nodes are OSD servers and
> the other one runs the MON and MGR.
> Recently, during an RGW stability test, the default RGW data pool
> (default.rgw.buckets.data) was accidentally written full. As a result,
> RGW got stuck. We did not know the exact steps to recover, so we deleted
> all the default RGW pools directly. After RGW was restarted, all the pools
> were back.
>
> Then I found that 4 OSD processes were down. I am not sure whether this is
> related to our RGW pool delete operation.
>
> This is the log that I think may be useful. The full version is in the
> attachment.
>
> Any help is appreciated. Thanks in advance.
>
>
> 2018-01-29 20:17:48.896544 7fc111122700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
> 2018-01-29 20:17:52.360069 7fc110183700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
> 2018-01-29 20:17:57.360192 7fc110183700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
> 2018-01-29 20:18:02.360315 7fc110183700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
> 2018-01-29 20:18:06.029867 7fc0fb912700  1 heartbeat_map reset_timeout
> 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
> 2018-01-29 20:18:06.030576 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.18( v 2276'10 (0'0,2276'10] local-lis/les=2479/2480 n=3 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2476) [9,91,51] r=0 lpr=2481 pi=[2470,2481)/2 luod=0'0 crt=2276'10 lcod 2151'9 mlcod 0'0 active] start_peering_interval up [9,51] -> [9,91,51], acting [9,51] -> [9,91,51], acting_primary 9 -> 9, up_primary 9 -> 9, role 0 -> 0, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.030774 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.18( v 2276'10 (0'0,2276'10] local-lis/les=2479/2480 n=3 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2476) [9,91,51] r=0 lpr=2481 pi=[2470,2481)/2 crt=2276'10 lcod 2151'9 mlcod 0'0 unknown] state<Start>: transitioning to Primary
> 2018-01-29 20:18:06.031236 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.1d( v 2316'568 (0'0,2316'568] local-lis/les=2479/2480 n=2 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2481) [91,79,9] r=2 lpr=2481 pi=[2470,2481)/2 luod=0'0 crt=2316'568 lcod 2312'567 active] start_peering_interval up [79,9] -> [91,79,9], acting [79,9] -> [91,79,9], acting_primary 79 -> 91, up_primary 79 -> 91, role 1 -> 2, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.031324 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.1d( v 2316'568 (0'0,2316'568] local-lis/les=2479/2480 n=2 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2481) [91,79,9] r=2 lpr=2481 pi=[2470,2481)/2 crt=2316'568 lcod 2312'567 unknown NOTIFY] state<Start>: transitioning to Stray
> 2018-01-29 20:18:06.032167 7fc112124700  0 -- 10.1.248.1:6850/6003656 >>
> 10.1.248.4:6825/1003558 conn(0x556089bf4800 :6850 
> s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
> pgs=5245 cs=1 l=0).handle_connect_reply connect got RESETSESSION
> 2018-01-29 20:18:06.032184 7fc111923700  0 -- 10.1.248.1:6850/6003656 >>
> 10.1.248.3:6827/3726 conn(0x5560c0c06000 :-1 
> s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
> pgs=6734 cs=1 l=0).handle_connect_reply connect got RESETSESSION
> 2018-01-29 20:18:06.053899 7fc103121700  0 log_channel(cluster) log [WRN]
> : Monitor daemon marked osd.9 down, but it is still running
> 2018-01-29 20:18:06.053917 7fc103121700  0 log_channel(cluster) log [DBG]
> : map e2485 wrongly marked me down at e2483
> 2018-01-29 20:18:06.053922 7fc103121700  0 osd.9 2485 _committed_osd_maps
> marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds,
> shutting down
> 2018-01-29 20:18:06.053927 7fc103121700  1 osd.9 2485
> start_waiting_for_healthy
> 2018-01-29 20:18:06.064250 7fc0fc914700  1 osd.9 pg_epoch: 2483 pg[10.53( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 active] start_peering_interval up [9] -> [], acting [9] -> [], acting_primary 9 -> -1, up_primary 9 -> -1, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.064258 7fc0fd115700  1 osd.9 pg_epoch: 2483 pg[3.16( v 2284'372 (0'0,2284'372] local-lis/les=2476/2477 n=2 ec=1758/130 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/1760) [33,85] r=-1 lpr=2483 pi=[2476,2483)/1 luod=0'0 crt=2284'372 lcod 2284'371 active] start_peering_interval up [33,9,85] -> [33,85], acting [33,9,85] -> [33,85], acting_primary 33 -> 33, up_primary 33 -> 33, role 1 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.064493 7fc0fc914700  1 osd.9 pg_epoch: 2485 pg[10.53( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 unknown NOTIFY] state<Start>: transitioning to Stray
> 2018-01-29 20:18:06.064545 7fc0fd115700  1 osd.9 pg_epoch: 2485 pg[3.16( v 2284'372 (0'0,2284'372] local-lis/les=2476/2477 n=2 ec=1758/130 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/1760) [33,85] r=-1 lpr=2483 pi=[2476,2483)/1 crt=2284'372 lcod 2284'371 unknown NOTIFY] state<Start>: transitioning to Stray
> 2018-01-29 20:18:06.064886 7fc0fc914700  1 osd.9 pg_epoch: 2483 pg[10.3f( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 active] start_peering_interval up [9] -> [], acting [9] -> [], acting_primary 9 -> -1, up_primary 9 -> -1, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.064972 7fc0fd115700  1 osd.9 pg_epoch: 2483 pg[6.42( v 2298'367 (0'0,2298'367] local-lis/les=2476/2477 n=1 ec=1753/704 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2367) [24,84] r=-1 lpr=2483 pi=[2476,2483)/1 luod=0'0 crt=2298'367 lcod 2298'366 active] start_peering_interval up [24,84,9] -> [24,84], acting [24,84,9] -> [24,84], acting_primary 24 -> 24, up_primary 24 -> 24, role 2 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.065091 7fc0fc914700  1 osd.9 pg_epoch: 2485 pg[10.3f( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 unknown NOTIFY] state<Start>: transitioning to Stray
>
> 2018-01-29 20:18:06.072202 7fc0fc914700  1 osd.9 pg_epoch: 2483 pg[5.3c( v 2312'24 (0'0,2312'24] local-lis/les=2476/2477 n=1 ec=1748/700 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [32,66] r=-1 lpr=2483 pi=[2476,2483)/1 luod=0'0 crt=2312'24 lcod 2312'23 active] start_peering_interval up [9,32,66] -> [32,66], acting [9,32,66] -> [32,66], acting_primary 9 -> 32, up_primary 9 -> 32, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
> 2018-01-29 20:18:06.072324 7fc0fc914700  1 osd.9 pg_epoch: 2485 pg[5.3c( v 2312'24 (0'0,2312'24] local-lis/les=2476/2477 n=1 ec=1748/700 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [32,66] r=-1 lpr=2483 pi=[2476,2483)/1 crt=2312'24 lcod 2312'23 unknown NOTIFY] state<Start>: transitioning to Stray
> 2018-01-29 20:18:06.072463 7fc103121700  0 osd.9 2485 _committed_osd_maps
> shutdown OSD via async signal
> 2018-01-29 20:18:06.072570 7fc0f4904700 -1 Fail to open '/proc/0/cmdline'
> error = (2) No such file or directory
> 2018-01-29 20:18:06.072597 7fc0f4904700 -1 received  signal: Interrupt
> from  PID: 0 task name: <unknown> UID: 0
> 2018-01-29 20:18:06.072603 7fc0f4904700 -1 osd.9 2485 *** Got signal
> Interrupt ***
> 2018-01-29 20:18:06.072609 7fc0f4904700  0 osd.9 2485 prepare_to_stop
> starting shutdown
> 2018-01-29 20:18:06.072614 7fc0f4904700 -1 osd.9 2485 shutdown
> 2018-01-29 20:18:06.429238 7fc10d936700  0 log_channel(cluster) log [WRN]
> : 2 slow requests, 2 included below; oldest blocked for > 55.682037 secs
> 2018-01-29 20:18:06.429251 7fc10d936700  0 log_channel(cluster) log [WRN] : slow request 55.682037 seconds old, received at 2018-01-29 20:17:10.747125: pg_notify((query:2482 sent:2482 6.18( v 2276'10 (0'0,2276'10] local-lis/les=2473/2474 n=3 ec=704/704 lis/c 2473/2470 les/c/f 2474/2471/2312 2481/2481/2476))=([2470,2480] intervals=([2473,2475] acting 51,91),([2479,2480] acting 9,51)) epoch 2482) currently wait for new map
> 2018-01-29 20:18:06.429265 7fc10d936700  0 log_channel(cluster) log [WRN] : slow request 55.130508 seconds old, received at 2018-01-29 20:17:11.298655: pg_notify((query:2482 sent:2482 6.18( v 2276'10 (0'0,2276'10] local-lis/les=2479/2480 n=3 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2476))=([2470,2480] intervals=([2473,2475] acting 51,91),([2479,2480] acting 9,51)) epoch 2482) currently wait for new map
> 2018-01-29 20:18:08.117736 7fc0f4904700  1 bluestore(/var/lib/ceph/osd/ceph-9)
> umount
> 2018-01-29 20:18:08.317882 7fc0f4904700  1 stupidalloc shutdown
> 2018-01-29 20:18:08.322660 7fc0f4904700  1 freelist shutdown
> 2018-01-29 20:18:08.322718 7fc0f4904700  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling
> all background work
> 2018-01-29 20:18:08.339096 7fc0f4904700  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/db_impl.cc:343] Shutdown complete
> 2018-01-29 20:18:08.508354 7fc0f4904700  1 bluefs umount
> 2018-01-29 20:18:08.508555 7fc0f4904700  1 stupidalloc shutdown
> 2018-01-29 20:18:08.508571 7fc0f4904700  1 stupidalloc shutdown
> 2018-01-29 20:18:08.508573 7fc0f4904700  1 stupidalloc shutdown
> 2018-01-29 20:18:08.508718 7fc0f4904700  1 bdev(0x5560701c7440
> /dev/nvme0n1p19) close
> 2018-01-29 20:18:08.786803 7fc0f4904700  1 bdev(0x5560701c6fc0
> /dev/nvme0n1p20) close
> 2018-01-29 20:18:08.874747 7fc0f4904700  1 bdev(0x5560701c7200
> /var/lib/ceph/osd/ceph-9/block) close
> 2018-01-29 20:18:09.030750 7fc0f4904700  1 bdev(0x5560701c6d80
> /var/lib/ceph/osd/ceph-9/block) close
>
> BR,
> Bruce J.
>
>
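Regarding the quoted osd.9 log: the line "marked down 6 > osd_max_markdown_count
5 in last 600.000000 seconds, shutting down" means the daemon stopped itself on
purpose after being marked down too many times within osd_max_markdown_period;
it did not crash. If the disk and host look healthy, restarting the unit should
bring it back, for example (assuming the usual systemd units on these Ubuntu
nodes):

systemctl restart ceph-osd@9
ceph osd tree | grep down

While debugging the flapping, osd_max_markdown_count could also be raised
temporarily in ceph.conf on that host so the OSD stays up long enough to
investigate.
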
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
