Hi,

You can use 'ceph pg query' to check what's going on with the PGs that have
a problem, and "ceph-objectstore-tool" to recover those PGs.
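For the mark-complete route discussed below, a minimal sketch of the usual sequence. The OSD id, PG id, and store paths are assumptions taken from this thread, and the script only echoes each command through a dry-run wrapper so the ordering is visible; remove the wrapper to run it for real:

```shell
# Dry-run sketch only -- OSD id, PG id, and paths are assumptions from this thread.
OSD_ID=166
PGID=3.1683
DATA=/var/lib/ceph/osd/ceph-$OSD_ID
JOURNAL=$DATA/journal

# Print each step instead of executing it; drop this wrapper to run for real.
run() { echo "+ $*"; }

# ceph-objectstore-tool needs exclusive access to the store: stop the OSD first.
run service ceph stop osd.$OSD_ID

# 'info' reports the PG's last_update and log bounds; compare these across all
# replicas to find the most up-to-date copy instead of relying on file mtimes.
run ceph-objectstore-tool --data-path "$DATA" --journal-path "$JOURNAL" \
    --pgid "$PGID" --op info

# Always export the PG as a backup before any destructive operation.
run ceph-objectstore-tool --data-path "$DATA" --journal-path "$JOURNAL" \
    --pgid "$PGID" --op export --file "/root/pg-$PGID.export"

# Last resort: mark the PG complete on the replica chosen above.
run ceph-objectstore-tool --data-path "$DATA" --journal-path "$JOURNAL" \
    --pgid "$PGID" --op mark-complete

run service ceph start osd.$OSD_ID
```

Note that the mark-complete op is not present in every hammer build of the tool, so check `ceph-objectstore-tool --help` on your version first.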
2016-06-21 19:09 GMT+08:00 Paweł Sadowski <c...@sadziu.pl>:
> Already restarted those OSDs and then the whole cluster (rack by rack;
> failure domain is rack in this setup).
> We would like to try the *ceph-objectstore-tool mark-complete* operation.
> Is there any way (other than checking mtime on files and querying PGs) to
> determine which replica has the most up-to-date data?
>
> On 06/21/2016 12:37 PM, M Ranga Swami Reddy wrote:
> > Try to restart OSDs 109 and 166? Check if that helps.
> >
> > On Tue, Jun 21, 2016 at 4:05 PM, Paweł Sadowski <c...@sadziu.pl> wrote:
> >> Thanks for the response.
> >>
> >> All OSDs seem to be OK; they have been restarted and rejoined the
> >> cluster after that, and there is nothing weird in the logs.
> >>
> >> # ceph pg dump_stuck stale
> >> ok
> >>
> >> # ceph pg dump_stuck inactive
> >> ok
> >> pg_stat  state       up             up_primary  acting         acting_primary
> >> 3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
> >> 3.1683   incomplete  [166,329,281]  166         [166,329,281]  166
> >>
> >> # ceph pg dump_stuck unclean
> >> ok
> >> pg_stat  state       up             up_primary  acting         acting_primary
> >> 3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
> >> 3.1683   incomplete  [166,329,281]  166         [166,329,281]  166
> >>
> >> On OSD 166 there are 100 blocked ops (on 109 too); they all end on
> >> "event": "reached_pg".
> >>
> >> # ceph --admin-daemon /var/run/ceph/ceph-osd.166.asok dump_ops_in_flight
> >> ...
> >> {
> >>     "description": "osd_op(client.958764031.0:18137113
> >> rbd_data.392585982ae8944a.0000000000000ad4 [set-alloc-hint object_size
> >> 4194304 write_size 4194304,write 2641920~8192] 3.d6195683 RETRY=15
> >> ack+ondisk+retry+write+known_if_redirected e613241)",
> >>     "initiated_at": "2016-06-21 10:19:59.894393",
> >>     "age": 828.025527,
> >>     "duration": 600.020809,
> >>     "type_data": [
> >>         "reached pg",
> >>         {
> >>             "client": "client.958764031",
> >>             "tid": 18137113
> >>         },
> >>         [
> >>             {
> >>                 "time": "2016-06-21 10:19:59.894393",
> >>                 "event": "initiated"
> >>             },
> >>             {
> >>                 "time": "2016-06-21 10:29:59.915202",
> >>                 "event": "reached_pg"
> >>             }
> >>         ]
> >>     ]
> >> }
> >> ],
> >> "num_ops": 100
> >> }
> >>
> >> On 06/21/2016 12:27 PM, M Ranga Swami Reddy wrote:
> >>> You can use the commands below:
> >>> ===
> >>> ceph pg dump_stuck stale
> >>> ceph pg dump_stuck inactive
> >>> ceph pg dump_stuck unclean
> >>> ===
> >>>
> >>> Then query the PGs which are in the unclean or stale state, and check
> >>> for any issue with a specific OSD.
> >>>
> >>> Thanks
> >>> Swami
> >>>
> >>> On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski <c...@sadziu.pl> wrote:
> >>>> Hello,
> >>>>
> >>>> We have an issue on one of our clusters. One node with 9 OSDs was
> >>>> down for more than 12 hours. During that time the cluster recovered
> >>>> without problems. When the host came back to the cluster we got two
> >>>> PGs in incomplete state. We decided to mark the OSDs on this host as
> >>>> out, but the two PGs are still incomplete. Trying to query those PGs
> >>>> hangs forever. We already tried restarting the OSDs. Is there any way
> >>>> to solve this issue without losing data?
> >>>> Any help appreciated :)
> >>>>
> >>>> # ceph health detail | grep incomplete
> >>>> HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck
> >>>> unclean; 200 requests are blocked > 32 sec; 2 osds have slow
> >>>> requests; noscrub,nodeep-scrub flag(s) set
> >>>> pg 3.2929 is stuck inactive since forever, current state incomplete,
> >>>> last acting [109,272,83]
> >>>> pg 3.1683 is stuck inactive since forever, current state incomplete,
> >>>> last acting [166,329,281]
> >>>> pg 3.2929 is stuck unclean since forever, current state incomplete,
> >>>> last acting [109,272,83]
> >>>> pg 3.1683 is stuck unclean since forever, current state incomplete,
> >>>> last acting [166,329,281]
> >>>> pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
> >>>> min_size from 2 may help; search ceph.com/docs for 'incomplete')
> >>>> pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms
> >>>> min_size from 2 may help; search ceph.com/docs for 'incomplete')
> >>>>
> >>>> The directory for PG 3.1683 is present on OSD 166 and contains ~8 GB.
> >>>>
> >>>> We didn't try setting min_size to 1 yet (we treat it as a last resort).
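The blocked-op dump quoted above can be triaged from the shell without reading the whole JSON; a sketch, assuming the admin-socket output was first saved to a file (the file path and the sample below are mine; the "reached_pg" field name comes from the 0.94 output in this thread):

```shell
# Count ops whose event history contains "reached_pg" in a saved dump, e.g. from:
#   ceph --admin-daemon /var/run/ceph/ceph-osd.166.asok dump_ops_in_flight > /tmp/ops.json
count_reached_pg() {
    grep -c '"event": "reached_pg"' "$1"
}

# Tiny sample shaped like the quoted dump, just to demonstrate usage:
cat > /tmp/ops.json <<'EOF'
{ "ops": [
    { "description": "osd_op(client.958764031.0:18137113 ...)",
      "type_data": [ "reached pg",
        { "client": "client.958764031", "tid": 18137113 },
        [ { "event": "initiated" },
          { "event": "reached_pg" } ] ] }
  ],
  "num_ops": 1 }
EOF

count_reached_pg /tmp/ops.json   # -> 1 (one op stuck at reached_pg)
```

Ops whose last event is reached_pg entered the PG's queue and never progressed, which is consistent with the PG itself being incomplete rather than with a slow disk.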
> >>>>
> >>>> Some cluster info:
> >>>>
> >>>> # ceph --version
> >>>> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
> >>>>
> >>>> # ceph -s
> >>>>     health HEALTH_WARN
> >>>>            2 pgs incomplete
> >>>>            2 pgs stuck inactive
> >>>>            2 pgs stuck unclean
> >>>>            200 requests are blocked > 32 sec
> >>>>            noscrub,nodeep-scrub flag(s) set
> >>>>     monmap e7: 5 mons at
> >>>> {mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
> >>>>            election epoch 3250, quorum 0,1,2,3,4
> >>>> mon-06,mon-07,mon-04,mon-03,mon-05
> >>>>     osdmap e613040: 346 osds: 346 up, 337 in
> >>>>            flags noscrub,nodeep-scrub
> >>>>     pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
> >>>>            415 TB used, 186 TB / 601 TB avail
> >>>>                18622 active+clean
> >>>>                    2 incomplete
> >>>>     client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s
> >>>>
> >>>> # ceph osd pool get vms pg_num
> >>>> pg_num: 16384
> >>>>
> >>>> # ceph osd pool get vms size
> >>>> size: 3
> >>>>
> >>>> # ceph osd pool get vms min_size
> >>>> min_size: 2
> >>
> >> --
> >> PS

--
Best regards,
施柏安 Desmond Shih
Technical Development (技術研發部)
inwinstack (迎棧科技股份有限公司) │ http://www.inwinstack.com/
desmond.s@inwinstack.com │ 886-975-857-982 │ 886-2-7738-2858 #7725
Rm. C, 5F., No. 3, Yuandong Rd., Banqiao Dist., New Taipei City 220, Taiwan (R.O.C.)
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com