Thanks Max, yes, the location hook is the ideal way. But as I only have a few NVMe OSDs per node,
I ended up using ceph.conf to add them to the correct location.
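
For reference, the ceph.conf entries I mean look roughly like this (the OSD ID and bucket names
are just taken from my osd tree further down the thread; the root bucket is not shown in my
partial listing and would be added the same way):

    [osd.60]
    osd crush location = rack=rack1-nvme host=OSD1-nvme

With "osd crush update on start" left at its default of true, the OSD then moves itself to that
location every time it starts.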

--
Deepak

On Jul 1, 2017, at 11:52 AM, Maxime Guyot <max...@root314.com> wrote:

Hi Deepak,

As Wido pointed out in the thread you linked, "osd crush update on start" and "osd crush location" 
are quick ways to fix this. If you are doing custom locations (like for tiering NVMe vs HDD), 
"osd crush location hook" (Doc: 
http://docs.ceph.com/docs/master/rados/operations/crush-map/#custom-location-hooks
 ) is a good option as well: it lets you set the crush location of an OSD from a script, and it 
shouldn't be too hard to detect whether the OSD is NVMe or SATA and set its location based on 
that. It's really nice to see new OSDs arrive in the right location automatically when you add them.
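A minimal sketch of such a hook, just to give the idea (the device detection and bucket names are
placeholders, adapt them to your map):

    #!/bin/sh
    # Configured in ceph.conf with something like:
    #   osd crush location hook = /usr/local/bin/crush-location-hook.sh   (path is an example)
    # Ceph calls the hook with "--cluster <name> --id <osd-id> --type osd" and expects
    # a crush location such as "host=... rack=..." on stdout.
    while [ "$#" -gt 0 ]; do
      case "$1" in
        --id) ID="$2"; shift 2 ;;
        *)    shift ;;
      esac
    done

    HOST="$(hostname -s)"
    # Find the block device backing this OSD's data directory.
    DEV="$(findmnt -n -o SOURCE "/var/lib/ceph/osd/ceph-${ID}" 2>/dev/null)"

    case "$DEV" in
      /dev/nvme*) echo "host=${HOST}-nvme rack=rack1-nvme" ;;
      *)          echo "host=${HOST} rack=rack1-sata" ;;
    esac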
Shameless plug: you can find an example in this blog post: 
http://www.root314.com/2017/01/15/Ceph-storage-tiers/#tiered-crushmap
I hope it helps.

Cheers,
Maxime

On Sat, 1 Jul 2017 at 03:28 Deepak Naidu <dna...@nvidia.com> wrote:
OK, so it looks like it's the Ceph crushmap behavior: 
http://docs.ceph.com/docs/master/rados/operations/crush-map/
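
The relevant setting from that page, as a ceph.conf sketch (this is just how I read the docs):

    [osd]
    # Defaults to true: on startup each OSD places itself at root=default host=<hostname>
    # unless an "osd crush location" or a location hook says otherwise, which is what
    # pulled my NVMe OSDs back under the real host buckets.
    osd crush update on start = true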

--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Deepak Naidu
Sent: Friday, June 30, 2017 7:06 PM
To: David Turner; ceph-users@lists.ceph.com

Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped

OK, I fixed the issue. It is very weird, but I will list the steps so it is easy for others to 
check when they hit a similar issue.


1)  I had created a rack-aware OSD tree.

2)  I have SATA OSDs and NVMe OSDs.

3)  I created a rack-aware policy for both the SATA and NVMe OSDs.

4)  The NVMe OSDs were used for the CephFS metadata pool.

5)  Recently, when I rebooted an OSD node, my journal volumes on NVMe did not start up because 
of the udev rules, and I had to create a startup script to fix them.

6)  With that, I rebooted all the OSD nodes one by one while monitoring the ceph status.

7)  I was at the third-to-last node when I noticed the pg stuck warning. Not sure when and what 
happened, but I started getting the PG stuck issue (listed in my original email).

8)  I wasted time looking at the error itself, but then I found the pool 100% used issue.

9)  When I ran ceph osd tree, my NVMe OSDs had moved back under the host-level buckets instead 
of the newly created/mapped NVMe rack-level buckets, i.e. there were no OSDs under the *-nvme 
host names. This was the issue.

10) Luckily I had kept a backup of the compiled crushmap. I imported it back (commands sketched 
after this list) and now the pool status is OK.
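
For reference, the backup/restore of the crushmap was just the usual export/edit/import cycle 
(file names here are whatever you choose):

    # back up the compiled crushmap
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt     # decompile to edit/inspect
    # ... edit crushmap.txt if needed ...
    crushtool -c crushmap.txt -o crushmap.new     # recompile
    ceph osd setcrushmap -i crushmap.new          # import it back into the cluster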

But my question is: how did Ceph re-map the CRUSH rule?

I had to create a "new host entry" for NVMe in the crushmap, i.e.:

host OSD1-nvme    -- this is just a dummy entry in the crushmap, i.e. it doesn't resolve to any hostname
host OSD1         -- this is the actual hostname and resolves to an IP
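
In the decompiled crushmap those entries look roughly like this (IDs and weights copied from my 
osd tree below; the alg/hash lines are just what crushtool normally emits):

    host OSD1-nvme {
            id -18
            alg straw
            hash 0  # rjenkins1
            item osd.60 weight 0.691
    }
    rack rack1-nvme {
            id -15
            alg straw
            hash 0  # rjenkins1
            item OSD1-nvme weight 0.691
            item OSD2-nvme weight 0.691
            item OSD3-NGN1-nvme weight 0.691
    }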

Is that the issue?

Current status

health HEALTH_OK
osdmap e5108: 610 osds: 610 up, 610 in
            flags sortbitwise,require_jewel_osds
      pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
            1155 GB used, 5462 TB / 5463 TB avail
               15450 active+clean


NAME         ID     USED       %USED     MAX AVAIL     OBJECTS
Pool1        15       233M         0         1820T        3737
Pool2        16          0         0         1820T           0
PoolMeta     17     34928k        0         2357G          28


Partial list of my osd tree

-15    2.76392     rack rack1-nvme
-18    0.69098         host OSD1-nvme
 60    0.69098             osd.60                     up  1.00000          1.00000
-21    0.69098         host OSD2-nvme
243    0.69098             osd.243                    up  1.00000          1.00000
-24    0.69098         host OSD3-NGN1-nvme
426    0.69098             osd.426                    up  1.00000          1.00000
 -1 5456.27734 root default
-12 2182.51099     rack rack1-sata
 -2  545.62775         host OSD1
  0    9.09380             osd.0                      up  1.00000          1.00000
  1    9.09380             osd.1                      up  1.00000          1.00000
  2    9.09380             osd.2                      up  1.00000          1.00000
  3    9.09380             osd.3                      up  1.00000          1.00000
 -2  545.62775         host OSD2
  0    9.09380             osd.0                      up  1.00000          1.00000
  1    9.09380             osd.1                      up  1.00000          1.00000
  2    9.09380             osd.2                      up  1.00000          1.00000
  3    9.09380             osd.3                      up  1.00000          1.00000


--
Deepak



From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, June 30, 2017 6:36 PM
To: Deepak Naidu; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped


ceph status
ceph osd tree

Is your meta pool on SSDs instead of the same root and OSDs as the rest of the 
cluster?

On Fri, Jun 30, 2017, 9:29 PM Deepak Naidu <dna...@nvidia.com> wrote:
Hello,

I am getting the errors below and I am unable to get them resolved even after starting and 
stopping the OSDs. All the OSDs seem to be up.

How do I repair the OSDs or fix them manually? I am using CephFS. Oddly, ceph df shows the 
metadata pool as 100% used (with usage reported in KB), but the pool is 1886G (with 3 copies). 
I can still write to CephFS without any issue. Not sure why Ceph is reporting the wrong info of 
100% full.


ceph version 10.2.7

     health HEALTH_WARN
            300 pgs degraded
            300 pgs stuck degraded
            300 pgs stuck unclean
            300 pgs stuck undersized
            300 pgs undersized
            recovery 28/19674 objects degraded (0.142%)
            recovery 56/19674 objects misplaced (0.285%)



GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    5463T     5462T         187G             0
POOLS:
    NAME                 ID     USED       %USED       MAX AVAIL     OBJECTS
    Pool1        15       233M          0         1820T        3737
    Pool2        16          0          0         1820T           0
    PoolMeta     17     34719k     100.00             0            28


Any help is appreciated

--
Deepak
________________________________
This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.
________________________________
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
