I think this is resolved, and you're right that the 0 weight of the root bucket was strange. I had created the rack buckets with

# ceph osd crush add-bucket rack-0 rack

whereas I should have used something like

# ceph osd crush add-bucket rack-0 rack root=default

There's a bit in the documentation (https://docs.ceph.com/en/quincy/rados/operations/crush-map) that says "Not all keys need to be specified" (in a different context, I admit).

I might have saved a second or two by omitting "root=default" and maybe half a minute by not checking the CRUSH map carefully afterwards. It was not worth it.
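
For completeness: since the rack buckets already existed, recreating them should not be necessary. As far as I understand the CRUSH documentation, moving them under the default root with something like

# ceph osd crush move rack-0 root=default
# ceph osd crush move rack-1 root=default

would have the same effect (bucket names taken from the "ceph osd tree" output quoted below; verify against your own map before running anything).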

// J

On 2023-04-05 12:01, c...@elchaka.de wrote:
I guess this is related to your CRUSH rules...
Unfortunately, I don't know much about creating the rules...

But someone could give more insight if you also provide the output of

ceph osd crush rule dump

... your "-1 0 root default" is a bit strange.


On 1 April 2023 at 01:01:39 CEST, Johan Hattne <jo...@hattne.se> wrote:

    Here goes:

    # ceph -s
       cluster:
         id:     e1327a10-8b8c-11ed-88b9-3cecef0e3946
         health: HEALTH_OK

       services:
         mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)
         mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
         mds: 1/1 daemons up, 2 standby
         osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs

       data:
         volumes: 1/1 healthy
         pools:   3 pools, 1041 pgs
         objects: 5.42M objects, 6.5 TiB
         usage:   19 TiB used, 428 TiB / 447 TiB avail
         pgs:     27087125/16252275 objects misplaced (166.667%)
                  1039 active+clean+remapped
                  2    active+clean+remapped+scrubbing+deep

    # ceph osd tree
    ID   CLASS  WEIGHT     TYPE NAME              STATUS  REWEIGHT  PRI-AFF
    -14         149.02008  rack rack-1
      -7         149.02008      host bcgonen-r1h0
      20    hdd   14.55269          osd.20             up   1.00000  1.00000
      21    hdd   14.55269          osd.21             up   1.00000  1.00000
      22    hdd   14.55269          osd.22             up   1.00000  1.00000
      23    hdd   14.55269          osd.23             up   1.00000  1.00000
      24    hdd   14.55269          osd.24             up   1.00000  1.00000
      25    hdd   14.55269          osd.25             up   1.00000  1.00000
      26    hdd   14.55269          osd.26             up   1.00000  1.00000
      27    hdd   14.55269          osd.27             up   1.00000  1.00000
      28    hdd   14.55269          osd.28             up   1.00000  1.00000
      29    hdd   14.55269          osd.29             up   1.00000  1.00000
      34    ssd    1.74660          osd.34             up   1.00000  1.00000
      35    ssd    1.74660          osd.35             up   1.00000  1.00000
    -13         298.04016  rack rack-0
      -3         149.02008      host bcgonen-r0h0
       0    hdd   14.55269          osd.0              up   1.00000  1.00000
       1    hdd   14.55269          osd.1              up   1.00000  1.00000
       2    hdd   14.55269          osd.2              up   1.00000  1.00000
       3    hdd   14.55269          osd.3              up   1.00000  1.00000
       4    hdd   14.55269          osd.4              up   1.00000  1.00000
       5    hdd   14.55269          osd.5              up   1.00000  1.00000
       6    hdd   14.55269          osd.6              up   1.00000  1.00000
       7    hdd   14.55269          osd.7              up   1.00000  1.00000
       8    hdd   14.55269          osd.8              up   1.00000  1.00000
       9    hdd   14.55269          osd.9              up   1.00000  1.00000
      30    ssd    1.74660          osd.30             up   1.00000  1.00000
      31    ssd    1.74660          osd.31             up   1.00000  1.00000
      -5         149.02008      host bcgonen-r0h1
      10    hdd   14.55269          osd.10             up   1.00000  1.00000
      11    hdd   14.55269          osd.11             up   1.00000  1.00000
      12    hdd   14.55269          osd.12             up   1.00000  1.00000
      13    hdd   14.55269          osd.13             up   1.00000  1.00000
      14    hdd   14.55269          osd.14             up   1.00000  1.00000
      15    hdd   14.55269          osd.15             up   1.00000  1.00000
      16    hdd   14.55269          osd.16             up   1.00000  1.00000
      17    hdd   14.55269          osd.17             up   1.00000  1.00000
      18    hdd   14.55269          osd.18             up   1.00000  1.00000
      19    hdd   14.55269          osd.19             up   1.00000  1.00000
      32    ssd    1.74660          osd.32             up   1.00000  1.00000
      33    ssd    1.74660          osd.33             up   1.00000  1.00000
      -1                 0  root default

    # ceph osd pool ls detail
    pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
    pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
    pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 7630 lfor 0/1831/6544 flags hashpspool,bulk stripe_width 0 application cephfs

    CRUSH rules 1 and 2 are just used to assign the data and metadata pools to HDD and SSD, respectively (failure domain: host).
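
    In case it helps, I believe these two rules were created with the usual device-class shortcut, roughly

    # ceph osd crush rule create-replicated cephfs_hdd default host hdd
    # ceph osd crush rule create-replicated cephfs_ssd default host ssd

    where the rule names here are placeholders from memory; "ceph osd crush rule dump" shows the actual names and definitions.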

    // J

    On 2023-03-31 15:37, c...@elchaka.de wrote:

        We need to know some more about your cluster...

        ceph -s
        ceph osd df tree
        Replicated or EC pools?
        ...

        Perhaps this can give us some insight
        Mehmet

        On 31 March 2023 at 18:08:38 CEST, Johan Hattne <jo...@hattne.se> wrote:

        Dear all;

        Up until a few hours ago, I had a seemingly normally-behaving
        cluster (Quincy, 17.2.5) with 36 OSDs, evenly distributed across
        3 of its 6 nodes. The cluster is only used for CephFS and the
        only non-standard configuration I can think of is that I had 2
        active MDSs, but only 1 standby. I had also doubled
        mds_cache_memory_limit to 8 GB (all OSD hosts have 256 GB of RAM)
        at some point in the past.
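
        (For reference, the cache limit was raised with something along
        the lines of

        # ceph config set mds mds_cache_memory_limit 8589934592

        i.e. 8 GiB expressed in bytes; the exact value here is from
        memory, not from the cluster's config dump.)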

        Then I rebooted one of the OSD nodes. The rebooted node held one
        of the active MDSs. Now the node is back up: ceph -s says the
        cluster is healthy, but all PGs are in an active+clean+remapped
        state and 166.67% of the objects are misplaced (dashboard:
        -66.66% healthy).

        The data pool is a threefold replica with 5.4M objects; the
        number of misplaced objects is reported as 27087410/16252446.
        The denominator makes sense to me (16252446 / 3 = 5417482, i.e.
        the 5.4M objects), but the numerator does not. I also note that
        the ratio is *exactly* 5 / 3 (5 x 5417482 = 27087410). The
        filesystem is still mounted and appears to be usable, but df
        reports it as 100% full; I suspect it would say 167% but that
        is capped somewhere.

        Any ideas about what is going on? Any suggestions for recovery?

        // Best wishes; Johan


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
