The space on the hosts in rack2 does not add up to cover the space in
rack1. After enough data is written to the cluster, the OSDs in rack2
will fill up and CRUSH won't be able to find room on a third host to
map the 3rd replica of new data to.
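
You can see this from the per-OSD utilisation and from the rule your
pools use. For example (assuming your release has 'ceph osd df', i.e.
Hammer or later, and that all three pools use the default ruleset 0 as
your 'ceph osd dump' output suggests):

$ ceph osd df tree           # per-OSD/per-bucket usage; rack2 is ~1.7 vs ~14.4 in rack1
$ ceph osd crush rule dump   # the chooseleaf type of each rule is the failure domain
$ ceph pg map 3.d            # up/acting set of the stuck pg; only two OSDs were found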

Bottom line: spread your big disks across all 4 hosts, or add some more
disks/OSDs to the hosts in rack2. As a last resort, you may decrease the
failure domain to 'osd' instead of the default 'host', but that is very
dangerous for a production cluster.
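
If you do go down that road, it would look roughly like this (untested
sketch; 'replicated_osd' is just an example rule name, and on your
release the pool setting should still be called 'crush_ruleset'):

# create a replicated rule that picks individual OSDs instead of distinct hosts
$ ceph osd crush rule create-simple replicated_osd default osd

# find its rule id and point the size=3 pools at it
$ ceph osd crush rule dump replicated_osd
$ ceph osd pool set imagesliberty crush_ruleset <rule_id>
$ ceph osd pool set volumesliberty crush_ruleset <rule_id>

Keep in mind that with an 'osd' failure domain two or even all three
replicas of a pg can land on the same host, so losing a single box can
take data offline or lose it for good. That is why it is a last resort.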

-K.


On 03/24/2016 04:36 PM, yang sheng wrote:
> Hi all,
> 
> I am testing ceph right now using 4 servers with 8 OSDs (all OSDs
> are up and in). I have 3 pools in my cluster (an image pool, a volume
> pool and the default rbd pool); both the image and volume pools have
> replication size = 3. Based on the pg equation, there are 448 pgs in my
> cluster.
> 
> $ ceph osd tree
> ID WEIGHT   TYPE NAME                       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 16.07797 root default
> -5 14.38599     rack rack1
> -2  7.17599         host psusnjhhdlc7iosstb001
>  0  3.53899             osd.0                    up  1.00000          1.00000
>  1  3.63699             osd.1                    up  1.00000          1.00000
> -3  7.20999         host psusnjhhdlc7iosstb002
>  2  3.63699             osd.2                    up  1.00000          1.00000
>  3  3.57300             osd.3                    up  1.00000          1.00000
> -6  1.69199     rack rack2
> -4  0.83600         host psusnjhhdlc7iosstb003
>  5  0.43500             osd.5                    up  1.00000          1.00000
>  4  0.40099             osd.4                    up  1.00000          1.00000
> -7  0.85599         host psusnjhhdlc7iosstb004
>  6  0.40099             osd.6                    up  1.00000                0
>  7  0.45499             osd.7                    up  1.00000                0
> 
> $ ceph osd dump
> pool 0 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 745 flags hashpspool
> stripe_width 0
> pool 3 'imagesliberty' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 777 flags
> hashpspool stripe_width 0
> removed_snaps [1~1,8~c]
> pool 4 'volumesliberty' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 256 pgp_num 256 last_change 776 flags
> hashpspool stripe_width 0
> removed_snaps [1~1,15~14,2a~1,2c~1,2e~24,57~2,5a~18,74~2,78~1,94~5,b7~2]
> 
> 
> Right now, the ceph health is HEALTH_WARN. I used "ceph health detail"
> to dump the details, and there is a stuck pg.
> 
> $ ceph -s
> cluster 2e906379-f211-4329-8faf-a8e7600b8418
>      health HEALTH_WARN
>             1 pgs degraded
>             1 pgs stuck degraded
>             1 pgs stuck inactive
>             1 pgs stuck unclean
>             1 pgs stuck undersized
>             1 pgs undersized
>             recovery 23/55329 objects degraded (0.042%)
>      monmap e14: 2 mons at
> {psusnjhhdlc7ioscom002=192.168.2.62:6789/0,psusnjhhdlc7ioscon002=192.168.2.12:6789/0}
>             election epoch 106, quorum 0,1
> psusnjhhdlc7ioscon002,psusnjhhdlc7ioscom002
>      osdmap e776: 8 osds: 8 up, 8 in
>             flags sortbitwise
>       pgmap v519644: 448 pgs, 3 pools, 51541 MB data, 18443 objects
>             170 GB used, 16294 GB / 16464 GB avail
>             23/55329 objects degraded (0.042%)
>                  447 active+clean
>                    1 undersized+degraded+peered
> 
> 
> $ ceph health detail
> HEALTH_WARN 1 pgs degraded; 1 pgs stuck unclean; 1 pgs undersized;
> recovery 23/55329 objects degraded (0.042%)
> pg 3.d is stuck unclean for 58161.177025, current state
> active+undersized+degraded, last acting [1,3]
> pg 3.d is active+undersized+degraded, acting [1,3]
> recovery 23/55329 objects degraded (0.042%)
> 
> If I am right, pg 3.d has only 2 replicas, the primary on OSD.1 and
> the secondary on OSD.3. There is no 3rd replica in the cluster. That's
> why it gives the health warning.
> 
> I tried decreasing the replication size to 2 for the image pool and the
> stuck pg disappeared. After I changed the size back to 3, ceph still
> didn't create the 3rd replica for pg 3.d.
> 
> I also tried shutting down Server 0, which has OSD.0 and OSD.1, leaving
> pg 3.d with only 1 replica in the cluster. It still didn't create another
> copy even though I set size=3 and min_size=2. Also, there are now more
> pgs in degraded, undersized or unclean states.
> 
> $ ceph pg map 3.d
> osdmap e796 pg 3.d (3.d) -> up [3] acting [3]
> 
> $ ceph -s
>     cluster 2e906379-f211-4329-8faf-a8e7600b8418
>      health HEALTH_WARN
>             16 pgs degraded
>             16 pgs stuck degraded
>             2 pgs stuck inactive
>             37 pgs stuck unclean
>             16 pgs stuck undersized
>             16 pgs undersized
>             recovery 1427/55329 objects degraded (2.579%)
>             recovery 780/55329 objects misplaced (1.410%)
>      monmap e14: 2 mons at
> {psusnjhhdlc7ioscom002=192.168.2.62:6789/0,psusnjhhdlc7ioscon002=192.168.2.12:6789/0}
>             election epoch 106, quorum 0,1
> psusnjhhdlc7ioscon002,psusnjhhdlc7ioscom002
>      osdmap e796: 8 osds: 6 up, 6 in; 21 remapped pgs
>             flags sortbitwise
>       pgmap v521445: 448 pgs, 3 pools, 51541 MB data, 18443 objects
>             168 GB used, 8947 GB / 9116 GB avail
>             1427/55329 objects degraded (2.579%)
>             780/55329 objects misplaced (1.410%)
>                  411 active+clean
>                   21 active+remapped
>                   14 active+undersized+degraded
>                    2 undersized+degraded+peered
> 
> Can anyone advise how to fix the pg 3.d problem, and why ceph couldn't
> recover when I shut down one server (2 OSDs)?
> 
> Thanks
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
