> On 22 February 2017 at 14:24, george.vasilaka...@stfc.ac.uk wrote:
>
>
> Brad Hubbard pointed out on the bug tracker
> (http://tracker.ceph.com/issues/18960) that, for EC, we need to add the shard
> suffix to the PGID parameter in the command, e.g. --pgid 1.323s0
> The command now works and produces the same output as PG query.
>

Good!

> To avoid spamming the list I've put the outputs of this command for OSDs 307,
> 595 and 1391 in a Gist
> (https://gist.github.com/gvasilak/3bf155a89a4b2703e639c4326df01460)
>

So what I see there is this for osd.307:

    "empty": 1,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 0,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}

last_epoch_started is 0 and empty is 1. The other OSDs are reporting
last_epoch_started 16806 and empty 0.

My EC PG knowledge is not sufficient here to exactly tell you what is going
on, but that's the only thing I noticed so far.

If you stop osd.307 and maybe mark it as out, does that help?

Wido

> ________________________________________
> From: Wido den Hollander [w...@42on.com]
> Sent: 22 February 2017 12:18
> To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] PG stuck peering after host reboot
>
> > On 21 February 2017 at 15:35, george.vasilaka...@stfc.ac.uk wrote:
> >
> >
> > I have noticed something odd with the ceph-objectstore-tool command:
> >
> > It always reports PG X not found even on healthy OSDs/PGs. The 'list' op
> > works on both healthy and unhealthy PGs.
> >
>
> Are you sure you are supplying the correct PG ID?
>
> I just tested with (Jewel 10.2.5):
>
> $ ceph pg ls-by-osd 5
> $ systemctl stop ceph-osd@5
> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op info --pgid 10.d0
> $ systemctl start ceph-osd@5
>
> Can you double-check that?
>
> It's weird that the PG can't be found on those OSDs by the tool.
>
> Wido
>
> > ________________________________________
> > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
> > george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk]
> > Sent: 21 February 2017 10:17
> > To: w...@42on.com; ceph-users@lists.ceph.com; bhubb...@redhat.com
> > Subject: Re: [ceph-users] PG stuck peering after host reboot
> >
> > > Can you for the sake of redundancy post your sequence of commands you
> > > executed and their output?
> >
> > [root@ceph-sn852 ~]# systemctl stop ceph-osd@307
> > [root@ceph-sn852 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-307 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn852 ~]# systemctl start ceph-osd@307
> >
> > I did the same thing for 307 (new up but not acting primary) and all the
> > OSDs in the original set (including 595). The output was exactly the same. I
> > don't have the whole session log handy from all those sessions but here's a
> > sample from one that's easy to pick out:
> >
> > [root@ceph-sn832 ~]# systemctl stop ceph-osd@7
> > [root@ceph-sn832 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn832 ~]# systemctl start ceph-osd@7
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/
> > 0.18_head/       11.1c8s5_TEMP/  13.3b_head/    1.74s1_TEMP/    2.256s6_head/   2.c3s10_TEMP/  3.b9s4_head/
> > 0.18_TEMP/       1.16s1_head/    13.3b_TEMP/    1.8bs9_head/    2.256s6_TEMP/   2.c4s3_head/   3.b9s4_TEMP/
> > 1.106s10_head/   1.16s1_TEMP/    1.3a6s0_head/  1.8bs9_TEMP/    2.2d5s2_head/   2.c4s3_TEMP/   4.34s10_head/
> > 1.106s10_TEMP/   1.274s5_head/   1.3a6s0_TEMP/  2.174s10_head/  2.2d5s2_TEMP/   2.dbs7_head/   4.34s10_TEMP/
> > 11.12as10_head/  1.274s5_TEMP/   1.3e4s9_head/  2.174s10_TEMP/  2.340s8_head/   2.dbs7_TEMP/   commit_op_seq
> > 11.12as10_TEMP/  1.2ds8_head/    1.3e4s9_TEMP/  2.1c1s10_head/  2.340s8_TEMP/   3.159s3_head/  meta/
> > 11.148s2_head/   1.2ds8_TEMP/    14.1a_head/    2.1c1s10_TEMP/  2.36es10_head/  3.159s3_TEMP/  nosnap
> > 11.148s2_TEMP/   1.323s8_head/   14.1a_TEMP/    2.1d0s6_head/   2.36es10_TEMP/  3.170s1_head/  omap/
> > 11.165s6_head/   1.323s8_TEMP/   1.6fs9_head/   2.1d0s6_TEMP/   2.3d3s10_head/  3.170s1_TEMP/
> > 11.165s6_TEMP/   13.32_head/     1.6fs9_TEMP/   2.1efs2_head/   2.3d3s10_TEMP/  3.1aas5_head/
> > 11.1c8s5_head/   13.32_TEMP/     1.74s1_head/   2.1efs2_TEMP/   2.c3s10_head/   3.1aas5_TEMP/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_
> > 1.323s8_head/ 1.323s8_TEMP/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_
> > DIR_3/ DIR_7/ DIR_B/ DIR_F/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_
> > DIR_0/ DIR_1/ DIR_2/ DIR_3/ DIR_4/ DIR_5/ DIR_6/ DIR_7/ DIR_8/ DIR_9/ DIR_A/ DIR_B/ DIR_C/ DIR_D/ DIR_E/ DIR_F/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_1/
> > total 271276
> > -rw-r--r--. 1 ceph ceph 8388608 Feb 3 22:07 datadisk\srucio\sdata16\u13TeV\s11\sad\sDAOD\uTOPQ4.09383728.\u000436.pool.root.1.0000000000000001__head_2BA91323__1_ffffffffffffffff_8
> >
> > > If you run a find in the data directory of the OSD, does that PG show up?
> >
> > OSDs 595 (used to be 0), 1391(1), 240(2), 7(7, the one that started this)
> > have a 1.323sX_head directory. OSD 307 does not.
> > I have not checked the other OSDs in the PG yet.
> >
> > Wido
> >
> > >
> > > Best regards,
> > >
> > > George
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
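
A note on the shard suffix discussed in this thread: on the FileStore OSDs shown here, the PG directories under current/ already carry the suffix in their names (1.323s8_head on osd.7), so the value that --pgid needs for an EC pool can be read off the directory name. Below is a minimal shell sketch of that lookup. The function name pgids_on_osd is hypothetical and not part of Ceph, and the sketch assumes the <pgid>[s<shard>]_head directory layout seen in the listing from this thread; it is an illustration, not a tested recovery procedure.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print the PG IDs, including any
# EC shard suffix, that a FileStore OSD holds for a given base PG ID.
# Assumption: PG directories are named <pgid>_head (replicated pools) or
# <pgid>s<shard>_head (EC pools), as in the listing from this thread.
pgids_on_osd() {
    osd_path=$1    # OSD data path, e.g. /var/lib/ceph/osd/ceph-7
    base_pgid=$2   # PG ID without shard suffix, e.g. 1.323
    for dir in "$osd_path"/current/"$base_pgid"_head \
               "$osd_path"/current/"$base_pgid"s*_head; do
        [ -d "$dir" ] || continue       # skip patterns that matched nothing
        name=${dir##*/}                 # drop the leading path
        printf '%s\n' "${name%_head}"   # drop the _head suffix
    done
}
```

With the OSD stopped, as in the sessions quoted in this thread, the result could then be passed to the tool, for example: ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid "$(pgids_on_osd /var/lib/ceph/osd/ceph-7 1.323)". On osd.7 this would resolve to --pgid 1.323s8. An empty result means the OSD has no directory for that PG, which matches what was reported for osd.307.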