Is osd.0 still in the down state after the restart? If so, the problem may be on the mon side. Can you set debug_mon=20 on the leader mon, restart one of the down-state OSDs, and then attach the mon log file?
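If the leader happens to be cn1, raising the mon log level and restarting one OSD can look like this (the mon name and OSD id here are examples; adjust them for your cluster):

```shell
# Raise debug logging on the leader mon (cn1 used as an example)
ceph tell mon.cn1 config set debug_mon 20

# Restart one of the down OSDs, e.g. osd.0
systemctl restart ceph-osd@0

# The log to attach is then /var/log/ceph/ceph-mon.cn1.log.
# Remember to turn the verbosity back down afterwards:
ceph tell mon.cn1 config set debug_mon 1/5
```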
nokia ceph <nokiacephus...@gmail.com> wrote on Friday, November 8, 2019 at 6:38 PM:
>
> Hi,
>
> Below is the status of the OSD after restart.
>
> # systemctl status ceph-osd@0.service
> ● ceph-osd@0.service - Ceph object storage daemon osd.0
>    Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
>   Drop-In: /etc/systemd/system/ceph-osd@.service.d
>            └─90-ExecStart_NUMA.conf
>    Active: active (running) since Fri 2019-11-08 10:32:51 UTC; 1min 1s ago
>   Process: 219213 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 219218 (ceph-osd)
>    CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
>            └─219218 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
>
> Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Starting Ceph object storage daemon osd.0...
> Nov 08 10:32:51 cn1.chn8be1c1.cdn systemd[1]: Started Ceph object storage daemon osd.0.
> Nov 08 10:33:03 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:03.785 7f9adeed4d80 -1 osd.0 1795 log_to_monitors {default=true}
> Nov 08 10:33:05 cn1.chn8be1c1.cdn numactl[219218]: 2019-11-08 10:33:05.474 7f9ad14df700 -1 osd.0 1795 set_numa_affinity unable to identify public interface 'dss-client' numa n...r directory
> Hint: Some lines were ellipsized, use -l to show in full.
>
> And I have attached the logs taken while this restart was initiated in this mail.
>
> On Fri, Nov 8, 2019 at 3:59 PM huang jun <hjwsm1...@gmail.com> wrote:
>>
>> Try restarting some of the down OSDs from 'ceph osd tree' and see what happens.
>>
>> nokia ceph <nokiacephus...@gmail.com> wrote on Friday, November 8, 2019 at 6:24 PM:
>> >
>> > Adding my official mail id.
>> >
>> > ---------- Forwarded message ---------
>> > From: nokia ceph <nokiacephus...@gmail.com>
>> > Date: Fri, Nov 8, 2019 at 3:57 PM
>> > Subject: OSD's not coming up in Nautilus
>> > To: Ceph Users <ceph-users@lists.ceph.com>
>> >
>> >
>> > Hi Team,
>> >
>> > There is a 5-node Ceph cluster which we upgraded from Luminous to Nautilus, and everything was going well until yesterday, when we noticed that the OSDs are marked down and not recognized as running by the monitors, even though the osd processes are running.
>> >
>> > We noticed that the admin keyring and the mon keyring are missing on the nodes, so we recreated them with the commands below.
>> >
>> > ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
>> >
>> > ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
>> >
>> > In the logs we find the lines below.
>> >
>> > 2019-11-08 09:01:50.525 7ff61722b700 0 log_channel(audit) log [DBG] : from='client.?
10.50.11.44:0/2398064782' entity='client.admin' cmd=[{"prefix": "df", "format": "json"}]: dispatch
>> > 2019-11-08 09:02:37.686 7ff61722b700 0 log_channel(cluster) log [INF] : mon.cn1 calling monitor election
>> > 2019-11-08 09:02:37.686 7ff61722b700 1 mon.cn1@0(electing).elector(31157) init, last seen epoch 31157, mid-election, bumping
>> > 2019-11-08 09:02:37.688 7ff61722b700 -1 mon.cn1@0(electing) e3 failed to get devid for : udev_device_new_from_subsystem_sysname failed on ''
>> > 2019-11-08 09:02:37.770 7ff61722b700 0 log_channel(cluster) log [INF] : mon.cn1 is new leader, mons cn1,cn2,cn3,cn4,cn5 in quorum (ranks 0,1,2,3,4)
>> > 2019-11-08 09:02:37.857 7ff613a24700 0 log_channel(cluster) log [DBG] : monmap e3: 5 mons at {cn1=[v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0],cn2=[v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0],cn3=[v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0],cn4=[v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0],cn5=[v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0]}
>> >
>> >
>> > # ceph mon dump
>> > dumped monmap epoch 3
>> > epoch 3
>> > fsid 9dbf207a-561c-48ba-892d-3e79b86be12f
>> > last_changed 2019-09-03 07:53:39.031174
>> > created 2019-08-23 18:30:55.970279
>> > min_mon_release 14 (nautilus)
>> > 0: [v2:10.50.11.41:3300/0,v1:10.50.11.41:6789/0] mon.cn1
>> > 1: [v2:10.50.11.42:3300/0,v1:10.50.11.42:6789/0] mon.cn2
>> > 2: [v2:10.50.11.43:3300/0,v1:10.50.11.43:6789/0] mon.cn3
>> > 3: [v2:10.50.11.44:3300/0,v1:10.50.11.44:6789/0] mon.cn4
>> > 4: [v2:10.50.11.45:3300/0,v1:10.50.11.45:6789/0] mon.cn5
>> >
>> >
>> > # ceph -s
>> >   cluster:
>> >     id:     9dbf207a-561c-48ba-892d-3e79b86be12f
>> >     health: HEALTH_WARN
>> >             85 osds down
>> >             3 hosts (72 osds) down
>> >             1 nearfull osd(s)
>> >             1 pool(s) nearfull
>> >             Reduced data availability: 2048 pgs inactive
>> >             too few PGs per OSD (17 < min 30)
>> >             1/5 mons down, quorum cn2,cn3,cn4,cn5
>> >
>> >   services:
>> >     mon: 5 daemons, quorum cn2,cn3,cn4,cn5 (age 57s), out of quorum: cn1
>> >     mgr: cn1(active, since 73m), standbys: cn2, cn3, cn4, cn5
>> >     osd: 120 osds: 35 up, 120 in; 909 remapped pgs
>> >
>> >   data:
>> >     pools:   1 pools, 2048 pgs
>> >     objects: 0 objects, 0 B
>> >     usage:   176 TiB used, 260 TiB / 437 TiB avail
>> >     pgs:     100.000% pgs unknown
>> >              2048 unknown
>> >
>> >
>> > The osd logs show the lines below.
>> >
>> > 2019-11-08 09:05:33.332 7fd1a36eed80 0 _get_class not permitted to load kvs
>> > 2019-11-08 09:05:33.332 7fd1a36eed80 0 _get_class not permitted to load lua
>> > 2019-11-08 09:05:33.337 7fd1a36eed80 0 _get_class not permitted to load sdk
>> > 2019-11-08 09:05:33.337 7fd1a36eed80 0 osd.0 1795 crush map has features 432629308056666112, adjusting msgr requires for clients
>> > 2019-11-08 09:05:33.337 7fd1a36eed80 0 osd.0 1795 crush map has features 432629308056666112 was 8705, adjusting msgr requires for mons
>> > 2019-11-08 09:05:33.337 7fd1a36eed80 0 osd.0 1795 crush map has features 1009090060360105984, adjusting msgr requires for osds
>> >
>> > Please let us know what the issue might be. There seem to be no network issues on any of the servers' public or private interfaces.
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
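One more thing worth checking, following up on the keyring recreation quoted above: running ceph-authtool with --gen-key generates brand-new keys, which will not match the entries the monitors already hold in their auth database, and that mismatch alone can keep clients and daemons from authenticating. A quick way to compare (osd.0 is just an example id; adjust for your cluster):

```shell
# Keys the monitors hold are authoritative
ceph auth get client.admin

# Key in the locally recreated keyring -- if the two differ, replace
# the local file with the monitors' copy instead of a generated one:
cat /etc/ceph/ceph.client.admin.keyring
ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring

# Confirm an OSD's key and caps still exist in the auth database
ceph auth get osd.0
```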