Thanks 潘东元 for the response. The creation of a new pool works, and all the PGs corresponding to that pool are in the active+clean state.
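For reference, a minimal sketch of the kind of test I did (the pool name 'test-pool' and the PG count of 8 are just illustrative, not the exact values I used):

-------------------------------
# create a small throwaway pool and tag it for rgw
ceph osd pool create test-pool 8
ceph osd pool application enable test-pool rgw

# list the PGs of the new pool and confirm they all report active+clean
ceph pg ls-by-pool test-pool
-------------------------------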
When I initially set up the 3-node Ceph cluster using juju charms (the replication count per object was set to 3), there were issues with the ceph-osd services, so I had to delete the units and re-add them (I did all of them together, which must have created issues with rebalancing). I assume that the PGs in the inactive state point to the 3 old OSDs which were deleted, and that I will have to create all the pools again. But my concern is about the default pools:

-------------------------------
pool 1 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 15 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 19 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 23 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 27 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 31 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 35 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 39 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 43 flags hashpspool stripe_width 0 application rgw
pool 9 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 47 flags hashpspool stripe_width 0 application rgw
pool 10 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 51 flags hashpspool stripe_width 0 application rgw
pool 11 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 55 flags hashpspool stripe_width 0 application rgw
pool 12 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 59 flags hashpspool stripe_width 0 application rgw
pool 13 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 63 flags hashpspool stripe_width 0 application rgw
pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 67 flags hashpspool stripe_width 0 application rgw
-------------------------------

Can you please confirm whether recreating them using the rados CLI will break anything? (A rough sketch of the commands I have in mind is in a P.S. at the very end of this mail, below the quoted thread.)

On Wed, Oct 30, 2019 at 4:56 PM 潘东元 <dongyuanp...@gmail.com> wrote:
> Your pg acting set is empty, and the cluster reporting "i don't have pg"
> indicates the pg does not have a primary osd.
> What was your cluster status when you created the pool?
>
> Wido den Hollander <w...@42on.com> wrote on Wed, Oct 30, 2019 at 1:30 PM:
> >
> >
> > On 10/30/19 3:04 AM, soumya tr wrote:
> > > Hi all,
> > >
> > > I have a 3 node ceph cluster setup using juju charms. ceph health shows
> > > having inactive pgs.
> > >
> > > ---------------
> > > # ceph status
> > >   cluster:
> > >     id:     0e36956e-ef64-11e9-b472-00163e6e01e8
> > >     health: HEALTH_WARN
> > >             Reduced data availability: 114 pgs inactive
> > >
> > >   services:
> > >     mon: 3 daemons, quorum juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
> > >     mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0, juju-06c3e9-2-lxd-0
> > >     osd: 3 osds: 3 up, 3 in
> > >
> > >   data:
> > >     pools:   18 pools, 114 pgs
> > >     objects: 0 objects, 0 B
> > >     usage:   3.0 GiB used, 34 TiB / 34 TiB avail
> > >     pgs:     100.000% pgs unknown
> > >              114 unknown
> > > ---------------
> > >
> > > PG health as well shows the PGs are in the inactive state:
> > >
> > > -------------------------------
> > > # ceph health detail
> > > HEALTH_WARN Reduced data availability: 114 pgs inactive
> > > PG_AVAILABILITY Reduced data availability: 114 pgs inactive
> > > pg 1.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.2 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.3 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.4 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.5 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.6 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.7 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.8 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.9 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 1.a is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 2.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 2.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 3.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 3.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 4.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 4.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 5.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 5.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 6.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 6.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 7.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 7.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 8.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 8.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 9.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 9.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 10.1 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 11.0 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.10 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.11 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.12 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.13 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.14 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.15 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.16 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.17 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.18 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.19 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 17.1a is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.10 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.11 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.12 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.13 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.14 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.15 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.16 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.17 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.19 is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.1a is stuck inactive for 1454.593774, current state unknown, last acting []
> > > pg 18.1b is stuck inactive for 1454.593774, current state unknown, last acting []
> > > --------------------------------
> > >
> > > But the weird thing is that when I query an individual pg, it's unable to find it :(
> > >
> > > --------------------------------
> > > # ceph pg 1.1 query
> > > Error ENOENT: i don't have pgid 1.1
> > >
> > > # ceph pg 18.1a query
> > > Error ENOENT: i don't have pgid 18.1a
> > >
> > > # ceph pg 18.1b query
> > > Error ENOENT: i don't have pgid 18.1b
> > > --------------------------------
> > >
> > > As per https://docs.ceph.com/docs/master/rados/operations/pg-states/,
> > >
> > > ---------------------------------
> > > unknown: The ceph-mgr hasn't yet received any information about the
> > > PG's state from an OSD since mgr started up.
> > > ---------------------------------
> > >
> > > I confirmed that all ceph OSDs are up, and the ceph-mgr service is running as well.
> >
> > Did you restart the Mgr? And are there maybe firewalls in between which
> > might be causing trouble?
> >
> > This seems like a Mgr issue.
> >
> > Wido
> >
> > > Is there anything else that I need to check to rectify the issue?
> > >
> > > --
> > > Regards,
> > > Soumya

--
Regards,
Soumya
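P.S. This is the rough sketch I referred to above of how I would recreate one of the default pools, using default.rgw.control as the example (the pg_num of 2 is taken from the listing above, and I have used the ceph CLI here rather than rados mkpool/rmpool, though the idea is the same). Please correct me if any step is wrong:

-------------------------------
# remove the inactive pool (the mons must allow pool deletion)
ceph osd pool delete default.rgw.control default.rgw.control --yes-i-really-really-mean-it

# recreate it with the same name and PG count, and tag it for rgw again
ceph osd pool create default.rgw.control 2
ceph osd pool application enable default.rgw.control rgw
-------------------------------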
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com