> On 22 February 2017 at 14:24, george.vasilaka...@stfc.ac.uk wrote:
> 
> 
> Brad Hubbard pointed out on the bug tracker 
> (http://tracker.ceph.com/issues/18960) that, for EC, we need to add the shard 
> suffix to the PGID parameter in the command, e.g. --pgid 1.323s0
> The command now works and produces the same output as PG query.
> 

Good!
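
For anyone scripting this: on a FileStore OSD the shard suffix can be read off the PG directory names under current/, which follow the <pgid>s<shard>_head pattern visible in the listing for osd.7 further down this thread. The following is only a minimal sketch under that assumption; pg_shard_ids is a hypothetical helper name, not something shipped with Ceph.

```shell
# Sketch: print the shard-qualified PG ID(s) that a FileStore OSD holds for a PG.
# pg_shard_ids is a hypothetical helper, not part of Ceph.
# Assumes the <pgid>s<shard>_head directory naming seen under
# /var/lib/ceph/osd/ceph-7/current/ in this thread.
pg_shard_ids() (
    data_path=$1
    pgid=$2
    # First pattern covers EC shards, second covers replicated PGs (no suffix).
    for dir in "$data_path/current/${pgid}"s*_head "$data_path/current/${pgid}_head"; do
        [ -d "$dir" ] || continue      # an unmatched glob stays literal; skip it
        name=${dir##*/}                # strip the leading path
        printf '%s\n' "${name%_head}"  # strip the _head suffix
    done
)
```

Going by the osd.7 listing, pg_shard_ids /var/lib/ceph/osd/ceph-7 1.323 should print 1.323s8, which is the value to pass to --pgid. It prints nothing if the OSD holds no directory for that PG.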

> To avoid spamming the list I've put the output of this command for OSDs 307, 595 
> and 1391 in a Gist 
> (https://gist.github.com/gvasilak/3bf155a89a4b2703e639c4326df01460)
> 

So what I see there is this for osd.307:

    "empty": 1,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 0,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}

last_epoch_started is 0 and empty is 1. The other OSDs are reporting 
last_epoch_started 16806 and empty 0.
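
The same comparison can be scripted once the two fields have been pulled out of each OSD's output. This is only a sketch: the one-line-per-OSD input format ("<osd-id> <empty> <last_epoch_started>") and the function name flag_stale_shards are my own assumptions, and the values in the usage note are the ones reported above.

```shell
# Sketch: flag OSDs whose copy of the PG is empty or lags the newest
# last_epoch_started seen. flag_stale_shards is a hypothetical helper.
# Input on stdin, one line per OSD: "<osd-id> <empty> <last_epoch_started>"
flag_stale_shards() {
    awk '
        {
            osd[NR]   = $1
            empty[NR] = $2 + 0          # force numeric values
            les[NR]   = $3 + 0
            if (les[NR] > newest) newest = les[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                if (empty[i] == 1 || les[i] < newest)
                    printf "osd.%s empty=%d last_epoch_started=%d (newest seen: %d)\n", osd[i], empty[i], les[i], newest
        }'
}
```

Fed the three lines "307 1 0", "595 0 16806" and "1391 0 16806", it reports only osd.307.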

I don't know enough about EC PGs to tell you exactly what is going on, 
but that is the only thing I have noticed so far.

If you stop osd.307 and maybe mark it as out, does that help?

Wido

> ________________________________________
> From: Wido den Hollander [w...@42on.com]
> Sent: 22 February 2017 12:18
> To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] PG stuck peering after host reboot
> 
> > On 21 February 2017 at 15:35, george.vasilaka...@stfc.ac.uk wrote:
> >
> >
> > I have noticed something odd with the ceph-objectstore-tool command:
> >
> > It always reports PG X not found, even on healthy OSDs/PGs. The 'list' op 
> > works on both healthy and unhealthy PGs.
> >
> 
> Are you sure you are supplying the correct PG ID?
> 
> I just tested with (Jewel 10.2.5):
> 
> $ ceph pg ls-by-osd 5
> $ systemctl stop ceph-osd@5
> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op info --pgid 10.d0
> $ systemctl start ceph-osd@5
> 
> Can you double-check that?
> 
> It's weird that the PG can't be found on those OSDs by the tool.
> 
> Wido
> 
> 
> > ________________________________________
> > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of 
> > george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk]
> > Sent: 21 February 2017 10:17
> > To: w...@42on.com; ceph-users@lists.ceph.com; bhubb...@redhat.com
> > Subject: Re: [ceph-users] PG stuck peering after host reboot
> >
> > > Can you for the sake of redundancy post your sequence of commands you 
> > > executed and their output?
> >
> > [root@ceph-sn852 ~]# systemctl stop ceph-osd@307
> > [root@ceph-sn852 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-307 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn852 ~]# systemctl start ceph-osd@307
> >
> > I did the same thing for 307 (new up but not acting primary) and all the 
> > OSDs in the original set (including 595). The output was the exact same. I 
> > don't have the whole session log handy from all those sessions but here's a 
> > sample from one that's easy to pick out:
> >
> > [root@ceph-sn832 ~]# systemctl stop ceph-osd@7
> > [root@ceph-sn832 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn832 ~]# systemctl start ceph-osd@7
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/
> > 0.18_head/      11.1c8s5_TEMP/  13.3b_head/     1.74s1_TEMP/    
> > 2.256s6_head/   2.c3s10_TEMP/   3.b9s4_head/
> > 0.18_TEMP/      1.16s1_head/    13.3b_TEMP/     1.8bs9_head/    
> > 2.256s6_TEMP/   2.c4s3_head/    3.b9s4_TEMP/
> > 1.106s10_head/  1.16s1_TEMP/    1.3a6s0_head/   1.8bs9_TEMP/    
> > 2.2d5s2_head/   2.c4s3_TEMP/    4.34s10_head/
> > 1.106s10_TEMP/  1.274s5_head/   1.3a6s0_TEMP/   2.174s10_head/  
> > 2.2d5s2_TEMP/   2.dbs7_head/    4.34s10_TEMP/
> > 11.12as10_head/ 1.274s5_TEMP/   1.3e4s9_head/   2.174s10_TEMP/  
> > 2.340s8_head/   2.dbs7_TEMP/    commit_op_seq
> > 11.12as10_TEMP/ 1.2ds8_head/    1.3e4s9_TEMP/   2.1c1s10_head/  
> > 2.340s8_TEMP/   3.159s3_head/   meta/
> > 11.148s2_head/  1.2ds8_TEMP/    14.1a_head/     2.1c1s10_TEMP/  
> > 2.36es10_head/  3.159s3_TEMP/   nosnap
> > 11.148s2_TEMP/  1.323s8_head/   14.1a_TEMP/     2.1d0s6_head/   
> > 2.36es10_TEMP/  3.170s1_head/   omap/
> > 11.165s6_head/  1.323s8_TEMP/   1.6fs9_head/    2.1d0s6_TEMP/   
> > 2.3d3s10_head/  3.170s1_TEMP/
> > 11.165s6_TEMP/  13.32_head/     1.6fs9_TEMP/    2.1efs2_head/   
> > 2.3d3s10_TEMP/  3.1aas5_head/
> > 11.1c8s5_head/  13.32_TEMP/     1.74s1_head/    2.1efs2_TEMP/   
> > 2.c3s10_head/   3.1aas5_TEMP/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_
> > 1.323s8_head/ 1.323s8_TEMP/
> > [root@ceph-sn832 ~]# ll 
> > /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_
> > DIR_3/ DIR_7/ DIR_B/ DIR_F/
> > [root@ceph-sn832 ~]# ll 
> > /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_
> > DIR_0/ DIR_1/ DIR_2/ DIR_3/ DIR_4/ DIR_5/ DIR_6/ DIR_7/ DIR_8/ DIR_9/ 
> > DIR_A/ DIR_B/ DIR_C/ DIR_D/ DIR_E/ DIR_F/
> > [root@ceph-sn832 ~]# ll 
> > /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_1/
> > total 271276
> > -rw-r--r--. 1 ceph ceph 8388608 Feb  3 22:07 
> > datadisk\srucio\sdata16\u13TeV\s11\sad\sDAOD\uTOPQ4.09383728.\u000436.pool.root.1.0000000000000001__head_2BA91323__1_ffffffffffffffff_8
> >
> > > If you run a find in the data directory of the OSD, does that PG show up?
> >
> > OSDs 595 (used to be 0), 1391(1), 240(2), 7(7, the one that started this) 
> > have a 1.323sX_head directory. OSD 307 does not.
> > I have not checked the other OSDs in the PG yet.
> >
> > Wido
> >
> > >
> > > Best regards,
> > >
> > > George
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com