On 10/14/2017 12:53 AM, David Turner wrote:
What does your environment look like?  Someone recently on the mailing list had PGs stuck creating because of a networking issue.
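
A quick way to rule that out is to make sure every OSD host can reach all three mons and that each OSD daemon answers over the network. Roughly something like this (the mon IPs and the non-default port 8567 are just copied from the ceph.conf you posted, so adjust as needed):

    # from each OSD host: can we reach every monitor?
    for m in 10.242.103.139 10.242.103.140 10.242.103.141; do nc -zv $m 8567; done

    # does each OSD daemon answer over the network?
    ceph tell osd.0 version    # repeat for osd.1 and osd.2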

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen <ronny+ceph-us...@aasen.cx> wrote:

    Strange that no OSD is acting for your PGs.
    Can you show the output of:
    ceph osd tree
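
    For comparison, on a healthy 3-OSD cluster the tree should look something like this (hostnames and weights here are only an example):

    ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 2.72910 root default
    -2 0.90970     host node1
     0 0.90970         osd.0        up  1.00000          1.00000
    -3 0.90970     host node2
     1 0.90970         osd.1        up  1.00000          1.00000
    -4 0.90970     host node3
     2 0.90970         osd.2        up  1.00000          1.00000

    The important part is that every OSD sits under a host under the default root with a non-zero weight; otherwise CRUSH cannot map any PG to it.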


    best regards
    Ronny Aasen



    On 13.10.2017 18:53, dE wrote:
    > Hi,
    >
    >     I'm running ceph 10.2.5 on Debian (official package).
    >
    > It can't seem to create any functional pools --
    >
    > ceph health detail
    > HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs stuck inactive; too few PGs per OSD (21 < min 30)
    > pg 0.39 is stuck inactive for 652.741684, current state creating, last acting []
    > pg 0.38 is stuck inactive for 652.741688, current state creating, last acting []
    > pg 0.37 is stuck inactive for 652.741690, current state creating, last acting []
    > pg 0.36 is stuck inactive for 652.741692, current state creating, last acting []
    > pg 0.35 is stuck inactive for 652.741694, current state creating, last acting []
    > pg 0.34 is stuck inactive for 652.741696, current state creating, last acting []
    > pg 0.33 is stuck inactive for 652.741698, current state creating, last acting []
    > pg 0.32 is stuck inactive for 652.741701, current state creating, last acting []
    > pg 0.3 is stuck inactive for 652.741762, current state creating, last acting []
    > pg 0.2e is stuck inactive for 652.741715, current state creating, last acting []
    > pg 0.2d is stuck inactive for 652.741719, current state creating, last acting []
    > pg 0.2c is stuck inactive for 652.741721, current state creating, last acting []
    > pg 0.2b is stuck inactive for 652.741723, current state creating, last acting []
    > pg 0.2a is stuck inactive for 652.741725, current state creating, last acting []
    > pg 0.29 is stuck inactive for 652.741727, current state creating, last acting []
    > pg 0.28 is stuck inactive for 652.741730, current state creating, last acting []
    > pg 0.27 is stuck inactive for 652.741732, current state creating, last acting []
    > pg 0.26 is stuck inactive for 652.741734, current state creating, last acting []
    > pg 0.3e is stuck inactive for 652.741707, current state creating, last acting []
    > pg 0.f is stuck inactive for 652.741761, current state creating, last acting []
    > pg 0.3f is stuck inactive for 652.741708, current state creating, last acting []
    > pg 0.10 is stuck inactive for 652.741763, current state creating, last acting []
    > pg 0.4 is stuck inactive for 652.741773, current state creating, last acting []
    > pg 0.5 is stuck inactive for 652.741774, current state creating, last acting []
    > pg 0.3a is stuck inactive for 652.741717, current state creating, last acting []
    > pg 0.b is stuck inactive for 652.741771, current state creating, last acting []
    > pg 0.c is stuck inactive for 652.741772, current state creating, last acting []
    > pg 0.3b is stuck inactive for 652.741721, current state creating, last acting []
    > pg 0.d is stuck inactive for 652.741774, current state creating, last acting []
    > pg 0.3c is stuck inactive for 652.741722, current state creating, last acting []
    > pg 0.e is stuck inactive for 652.741776, current state creating, last acting []
    > pg 0.3d is stuck inactive for 652.741724, current state creating, last acting []
    > pg 0.22 is stuck inactive for 652.741756, current state creating, last acting []
    > pg 0.21 is stuck inactive for 652.741758, current state creating, last acting []
    > pg 0.a is stuck inactive for 652.741783, current state creating, last acting []
    > pg 0.20 is stuck inactive for 652.741761, current state creating, last acting []
    > pg 0.9 is stuck inactive for 652.741787, current state creating, last acting []
    > pg 0.1f is stuck inactive for 652.741764, current state creating, last acting []
    > pg 0.8 is stuck inactive for 652.741790, current state creating, last acting []
    > pg 0.7 is stuck inactive for 652.741792, current state creating, last acting []
    > pg 0.6 is stuck inactive for 652.741794, current state creating, last acting []
    > pg 0.1e is stuck inactive for 652.741770, current state creating, last acting []
    > pg 0.1d is stuck inactive for 652.741772, current state creating, last acting []
    > pg 0.1c is stuck inactive for 652.741774, current state creating, last acting []
    > pg 0.1b is stuck inactive for 652.741777, current state creating, last acting []
    > pg 0.1a is stuck inactive for 652.741784, current state creating, last acting []
    > pg 0.2 is stuck inactive for 652.741812, current state creating, last acting []
    > pg 0.31 is stuck inactive for 652.741762, current state creating, last acting []
    > pg 0.19 is stuck inactive for 652.741789, current state creating, last acting []
    > pg 0.11 is stuck inactive for 652.741797, current state creating, last acting []
    > pg 0.18 is stuck inactive for 652.741793, current state creating, last acting []
    > pg 0.1 is stuck inactive for 652.741820, current state creating, last acting []
    > pg 0.30 is stuck inactive for 652.741769, current state creating, last acting []
    > pg 0.17 is stuck inactive for 652.741797, current state creating, last acting []
    > pg 0.0 is stuck inactive for 652.741829, current state creating, last acting []
    > pg 0.2f is stuck inactive for 652.741774, current state creating, last acting []
    > pg 0.16 is stuck inactive for 652.741802, current state creating, last acting []
    > pg 0.12 is stuck inactive for 652.741807, current state creating, last acting []
    > pg 0.13 is stuck inactive for 652.741807, current state creating, last acting []
    > pg 0.14 is stuck inactive for 652.741807, current state creating, last acting []
    > pg 0.15 is stuck inactive for 652.741808, current state creating, last acting []
    > pg 0.23 is stuck inactive for 652.741792, current state creating, last acting []
    > pg 0.24 is stuck inactive for 652.741793, current state creating, last acting []
    > pg 0.25 is stuck inactive for 652.741793, current state creating, last acting []
    >
    > I got 3 OSDs --
    >
    > ceph osd stat
    >      osdmap e8: 3 osds: 3 up, 3 in
    >             flags sortbitwise,require_jewel_osds
    >
    > ceph osd pool ls detail
    > pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
    >
    > The state inactive seems odd for a brand-new pool with no data.
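    >
    > (I realise the "too few PGs per OSD (21 < min 30)" warning probably just wants pg_num raised, e.g. something like
    >
    > ceph osd pool set rbd pg_num 128
    > ceph osd pool set rbd pgp_num 128
    >
    > but as far as I can tell that warning alone shouldn't leave every PG stuck in creating.)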
    >
    > This is my ceph.conf --
    >
    > [global]
    > fsid = 8161c91e-dbd2-4491-adf8-74446bef916a
    > auth cluster required = cephx
    > auth service required = cephx
    > auth client required = cephx
    > debug = 10/10
    > mon host = 10.242.103.139:8567,10.242.103.140:8567,10.242.103.141:8567
    > [mon]
    > ms bind ipv6 = false
    > mon data = /srv/ceph/mon
    > mon addr = 0.0.0.0:8567
    > mon warn on legacy crush tunables = true
    > mon crush min required version = jewel
    > mon initial members = 0,1,2
    > keyring = /etc/ceph/mon_keyring
    > log file = /var/log/ceph/mon.log
    > [osd]
    > osd data = /srv/ceph/osd
    > osd journal = /srv/ceph/osd/osd_journal
    > osd journal size = 10240
    > osd recovery delay start = 10
    > osd recovery thread timeout = 60
    > osd recovery max active = 1
    > osd recovery max chunk = 10485760
    > osd max backfills = 2
    > osd backfill retry interval = 60
    > osd backfill scan min = 100
    > osd backfill scan max = 1000
    > keyring = /etc/ceph/osd_keyring
    >
    > The monitors run on the same hosts as the OSDs.
    >
    > Any help will be highly appreciated!
    >






These are VMs with a Linux bridge for connectivity.

VLANs have been created over teamed interfaces for the primary interface.

The OSDs can be seen as up and in and there's a quorum, so it doesn't look like a connectivity issue.
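
For completeness, the next things I will check, since "creating" with an empty acting set usually means CRUSH isn't returning any OSDs for the PG (plain ceph CLI, nothing specific to this setup):

    ceph osd tree               # every OSD should sit under a host under the default root, with non-zero weight
    ceph osd df                 # per-OSD CRUSH weight and utilisation
    ceph osd crush rule dump    # confirm ruleset 0 can be satisfied by the tree above
    ceph pg dump_stuck inactive # list the stuck PGs and their (empty) acting sets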

