Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi!

@Paul
Thanks! I know, I read the whole thread about size 2 some months ago. But
this was not my decision; I had to set it up like that.

In the meantime, I rebooted node1001 and node1002 with the "noout" flag set;
peering has now finished and only 0.0x% of the objects are still being
rebalanced. IO is flowing again. This happened as soon as the OSDs went down
(not out).
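For reference, this is roughly the procedure I used (a sketch; ceph-osd@0 is
only an example unit name, and I waited until the OSDs showed "up" again in
"ceph osd tree" before unsetting the flag):

ceph osd set noout
systemctl reboot        # or: systemctl restart ceph-osd@0 for each OSD
ceph osd unset noout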

This looks very much like a bug to me, doesn't it? Restarting an OSD to
"repair" CRUSH?
I also queried the PG, but it did not show any error. It just lists stats
and that the PG has been active since 8:40 this morning.
There are rows with "blocked by" but no value; is that field supposed to be
filled with data?
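For reference, this is roughly how I inspected it (the PG id is only a
placeholder taken from the stuck list):

ceph pg dump_stuck inactive
ceph pg 1.2f query | less    # checked the "recovery_state" and "blocked by" entries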

Kind regards,
Kevin



2018-05-17 16:45 GMT+02:00 Paul Emmerich :

> Check ceph pg query, it will (usually) tell you why something is stuck
> inactive.
>
> Also: never do min_size 1.
>
>
> Paul
>
>
> 2018-05-17 15:48 GMT+02:00 Kevin Olbrich :
>
>> I was able to obtain another NVMe to get the HDDs in node1004 into the
>> cluster.
>> The number of disks (all 1TB) is now balanced between racks, still some
>> inactive PGs:
>>
>>   data:
>> pools:   2 pools, 1536 pgs
>> objects: 639k objects, 2554 GB
>> usage:   5167 GB used, 14133 GB / 19300 GB avail
>> pgs: 1.562% pgs not active
>>  1183/1309952 objects degraded (0.090%)
>>  199660/1309952 objects misplaced (15.242%)
>>  1072 active+clean
>>  405  active+remapped+backfill_wait
>>  35   active+remapped+backfilling
>>  21   activating+remapped
>>  3    activating+undersized+degraded+remapped
>>
>>
>>
>> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>>  -1   18.85289 root default
>> -16   18.85289 datacenter dc01
>> -19   18.85289 pod dc01-agg01
>> -10    8.98700 rack dc01-rack02
>>  -4    4.03899 host node1001
>>   0   hdd  0.90999 osd.0 up  1.0 1.0
>>   1   hdd  0.90999 osd.1 up  1.0 1.0
>>   5   hdd  0.90999 osd.5 up  1.0 1.0
>>   2   ssd  0.43700 osd.2 up  1.0 1.0
>>   3   ssd  0.43700 osd.3 up  1.0 1.0
>>   4   ssd  0.43700 osd.4 up  1.0 1.0
>>  -7    4.94899 host node1002
>>   9   hdd  0.90999 osd.9 up  1.0 1.0
>>  10   hdd  0.90999 osd.10 up  1.0 1.0
>>  11   hdd  0.90999 osd.11 up  1.0 1.0
>>  12   hdd  0.90999 osd.12 up  1.0 1.0
>>   6   ssd  0.43700 osd.6 up  1.0 1.0
>>   7   ssd  0.43700 osd.7 up  1.0 1.0
>>   8   ssd  0.43700 osd.8 up  1.0 1.0
>> -11    9.86589 rack dc01-rack03
>> -22    5.38794 host node1003
>>  17   hdd  0.90999 osd.17 up  1.0 1.0
>>  18   hdd  0.90999 osd.18 up  1.0 1.0
>>  24   hdd  0.90999 osd.24 up  1.0 1.0
>>  26   hdd  0.90999 osd.26 up  1.0 1.0
>>  13   ssd  0.43700 osd.13 up  1.0 1.0
>>  14   ssd  0.43700 osd.14 up  1.0 1.0
>>  15   ssd  0.43700 osd.15 up  1.0 1.0
>>  16   ssd  0.43700 osd.16 up  1.0 1.0
>> -25    4.47795 host node1004
>>  23   hdd  0.90999 osd.23 up  1.0 1.0
>>  25   hdd  0.90999 osd.25 up  1.0 1.0
>>  27   hdd  0.90999 osd.27 up  1.0 1.0
>>  19   ssd  0.43700 osd.19 up  1.0 1.0
>>  20   ssd  0.43700 osd.20 up  1.0 1.0
>>  21   ssd  0.43700 osd.21 up  1.0 1.0
>>  22   ssd  0.43700 osd.22 up  1.0 1.0
>>
>>
>> Pools are size 2, min_size 1 during setup.
>>
>> The count of PGs in the activating state is related to the OSD weights, but
>> why do they fail to proceed to active+clean or active+remapped?
>>
>> Kind regards,
>> Kevin
>>
>> 2018-05-17 14:05 GMT+02:00 Kevin Olbrich :
>>
>>> Ok, I just waited some time but I still got some "activating" issues:
>>>
>>>   data:
>>> pools:   2 pools, 1536 pgs
>>> objects: 639k objects, 2554 GB
>>> usage:   5194 GB used, 11312 GB / 16506 GB avail
>>> pgs: 7.943% pgs not active
>>>  5567/1309948 objects degraded (0.425%)
>>>  195386/1309948 objects misplaced (14.916%)
>>>  1147 

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Paul Emmerich
Check ceph pg query, it will (usually) tell you why something is stuck
inactive.

Also: never do min_size 1.
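
If the pool already exists, something along these lines should change it (the
pool name is a placeholder):

ceph osd pool set <poolname> min_size 2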


Paul


2018-05-17 15:48 GMT+02:00 Kevin Olbrich :

> I was able to obtain another NVMe to get the HDDs in node1004 into the
> cluster.
> The number of disks (all 1TB) is now balanced between racks, still some
> inactive PGs:
>
>   data:
> pools:   2 pools, 1536 pgs
> objects: 639k objects, 2554 GB
> usage:   5167 GB used, 14133 GB / 19300 GB avail
> pgs: 1.562% pgs not active
>  1183/1309952 objects degraded (0.090%)
>  199660/1309952 objects misplaced (15.242%)
>  1072 active+clean
>  405  active+remapped+backfill_wait
>  35   active+remapped+backfilling
>  21   activating+remapped
>  3    activating+undersized+degraded+remapped
>
>
>
> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>  -1   18.85289 root default
> -16   18.85289 datacenter dc01
> -19   18.85289 pod dc01-agg01
> -10    8.98700 rack dc01-rack02
>  -4    4.03899 host node1001
>   0   hdd  0.90999 osd.0 up  1.0 1.0
>   1   hdd  0.90999 osd.1 up  1.0 1.0
>   5   hdd  0.90999 osd.5 up  1.0 1.0
>   2   ssd  0.43700 osd.2 up  1.0 1.0
>   3   ssd  0.43700 osd.3 up  1.0 1.0
>   4   ssd  0.43700 osd.4 up  1.0 1.0
>  -7    4.94899 host node1002
>   9   hdd  0.90999 osd.9 up  1.0 1.0
>  10   hdd  0.90999 osd.10 up  1.0 1.0
>  11   hdd  0.90999 osd.11 up  1.0 1.0
>  12   hdd  0.90999 osd.12 up  1.0 1.0
>   6   ssd  0.43700 osd.6 up  1.0 1.0
>   7   ssd  0.43700 osd.7 up  1.0 1.0
>   8   ssd  0.43700 osd.8 up  1.0 1.0
> -11    9.86589 rack dc01-rack03
> -22    5.38794 host node1003
>  17   hdd  0.90999 osd.17 up  1.0 1.0
>  18   hdd  0.90999 osd.18 up  1.0 1.0
>  24   hdd  0.90999 osd.24 up  1.0 1.0
>  26   hdd  0.90999 osd.26 up  1.0 1.0
>  13   ssd  0.43700 osd.13 up  1.0 1.0
>  14   ssd  0.43700 osd.14 up  1.0 1.0
>  15   ssd  0.43700 osd.15 up  1.0 1.0
>  16   ssd  0.43700 osd.16 up  1.0 1.0
> -25    4.47795 host node1004
>  23   hdd  0.90999 osd.23 up  1.0 1.0
>  25   hdd  0.90999 osd.25 up  1.0 1.0
>  27   hdd  0.90999 osd.27 up  1.0 1.0
>  19   ssd  0.43700 osd.19 up  1.0 1.0
>  20   ssd  0.43700 osd.20 up  1.0 1.0
>  21   ssd  0.43700 osd.21 up  1.0 1.0
>  22   ssd  0.43700 osd.22 up  1.0 1.0
>
>
> Pools are size 2, min_size 1 during setup.
>
> The count of PGs in the activating state is related to the OSD weights, but
> why do they fail to proceed to active+clean or active+remapped?
>
> Kind regards,
> Kevin
>
> 2018-05-17 14:05 GMT+02:00 Kevin Olbrich :
>
>> Ok, I just waited some time but I still got some "activating" issues:
>>
>>   data:
>> pools:   2 pools, 1536 pgs
>> objects: 639k objects, 2554 GB
>> usage:   5194 GB used, 11312 GB / 16506 GB avail
>> pgs: 7.943% pgs not active
>>  5567/1309948 objects degraded (0.425%)
>>  195386/1309948 objects misplaced (14.916%)
>>  1147 active+clean
>>  235  active+remapped+backfill_wait
>> * 107  activating+remapped*
>>  32   active+remapped+backfilling
>> * 15   activating+undersized+degraded+remapped*
>>
>> I set these settings during runtime:
>> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
>> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>> ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
>> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>
>> Sure, mon_max_pg_per_osd is oversized but this is just temporary.
>> Calculated PGs per OSD is 200.
>>
>> I searched the net and the bug tracker; most posts suggest
>> osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I got
>> even more stuck PGs.
>>
>> Any more hints?
>>
>> Kind regards,
>> Kevin
>>
>> 2018-05-17 13:37 

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
I was able to obtain another NVMe to get the HDDs in node1004 into the
cluster.
The number of disks (all 1TB) is now balanced between racks, still some
inactive PGs:

  data:
pools:   2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage:   5167 GB used, 14133 GB / 19300 GB avail
pgs: 1.562% pgs not active
 1183/1309952 objects degraded (0.090%)
 199660/1309952 objects misplaced (15.242%)
 1072 active+clean
 405  active+remapped+backfill_wait
 35   active+remapped+backfilling
 21   activating+remapped
 3    activating+undersized+degraded+remapped



ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
 -1   18.85289 root default
-16   18.85289 datacenter dc01
-19   18.85289 pod dc01-agg01
-10    8.98700 rack dc01-rack02
 -4    4.03899 host node1001
  0   hdd  0.90999 osd.0 up  1.0 1.0
  1   hdd  0.90999 osd.1 up  1.0 1.0
  5   hdd  0.90999 osd.5 up  1.0 1.0
  2   ssd  0.43700 osd.2 up  1.0 1.0
  3   ssd  0.43700 osd.3 up  1.0 1.0
  4   ssd  0.43700 osd.4 up  1.0 1.0
 -7    4.94899 host node1002
  9   hdd  0.90999 osd.9 up  1.0 1.0
 10   hdd  0.90999 osd.10 up  1.0 1.0
 11   hdd  0.90999 osd.11 up  1.0 1.0
 12   hdd  0.90999 osd.12 up  1.0 1.0
  6   ssd  0.43700 osd.6 up  1.0 1.0
  7   ssd  0.43700 osd.7 up  1.0 1.0
  8   ssd  0.43700 osd.8 up  1.0 1.0
-11    9.86589 rack dc01-rack03
-22    5.38794 host node1003
 17   hdd  0.90999 osd.17 up  1.0 1.0
 18   hdd  0.90999 osd.18 up  1.0 1.0
 24   hdd  0.90999 osd.24 up  1.0 1.0
 26   hdd  0.90999 osd.26 up  1.0 1.0
 13   ssd  0.43700 osd.13 up  1.0 1.0
 14   ssd  0.43700 osd.14 up  1.0 1.0
 15   ssd  0.43700 osd.15 up  1.0 1.0
 16   ssd  0.43700 osd.16 up  1.0 1.0
-25    4.47795 host node1004
 23   hdd  0.90999 osd.23 up  1.0 1.0
 25   hdd  0.90999 osd.25 up  1.0 1.0
 27   hdd  0.90999 osd.27 up  1.0 1.0
 19   ssd  0.43700 osd.19 up  1.0 1.0
 20   ssd  0.43700 osd.20 up  1.0 1.0
 21   ssd  0.43700 osd.21 up  1.0 1.0
 22   ssd  0.43700 osd.22 up  1.0 1.0


Pools are size 2, min_size 1 during setup.

The count of PGs in the activating state is related to the OSD weights, but
why do they fail to proceed to active+clean or active+remapped?
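
To rule out a per-OSD PG limit, I am also looking at the current PG count per
OSD (the last column of the output):

ceph osd df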

Kind regards,
Kevin

2018-05-17 14:05 GMT+02:00 Kevin Olbrich :

> Ok, I just waited some time but I still got some "activating" issues:
>
>   data:
> pools:   2 pools, 1536 pgs
> objects: 639k objects, 2554 GB
> usage:   5194 GB used, 11312 GB / 16506 GB avail
> pgs: 7.943% pgs not active
>  5567/1309948 objects degraded (0.425%)
>  195386/1309948 objects misplaced (14.916%)
>  1147 active+clean
>  235  active+remapped+backfill_wait
> * 107  activating+remapped*
>  32   active+remapped+backfilling
> * 15   activating+undersized+degraded+remapped*
>
> I set these settings during runtime:
> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>
> Sure, mon_max_pg_per_osd is oversized but this is just temporary.
> Calculated PGs per OSD is 200.
>
> I searched the net and the bug tracker; most posts suggest
> osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I got
> even more stuck PGs.
>
> Any more hints?
>
> Kind regards,
> Kevin
>
> 2018-05-17 13:37 GMT+02:00 Kevin Olbrich :
>
>> PS: The cluster is currently size 2. I used PGCalc on the Ceph website,
>> which by default targets 200 PGs on each OSD.
>> I read about the protection in the docs and later realized that I would
>> have been better off with only 100 PGs per OSD.
>>
>>
>> 2018-05-17 13:35 GMT+02:00 Kevin Olbrich :
>>
>>> Hi!

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Ok, I just waited some time but I still got some "activating" issues:

  data:
pools:   2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage:   5194 GB used, 11312 GB / 16506 GB avail
pgs: 7.943% pgs not active
 5567/1309948 objects degraded (0.425%)
 195386/1309948 objects misplaced (14.916%)
 1147 active+clean
 235  active+remapped+backfill_wait
* 107  activating+remapped*
 32   active+remapped+backfilling
* 15   activating+undersized+degraded+remapped*

I set these settings during runtime:
ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'

Sure, mon_max_pg_per_osd is oversized but this is just temporary.
Calculated PGs per OSD is 200.
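
To double-check that the injected values actually reached the daemons, I read
them back via the admin socket on the OSD hosts (osd.0 is just an example):

ceph daemon osd.0 config get osd_max_pg_per_osd_hard_ratio
ceph daemon osd.0 config get osd_max_backfills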

I searched the net and the bug tracker; most posts suggest
osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I got
even more stuck PGs.

Any more hints?

Kind regards,
Kevin

2018-05-17 13:37 GMT+02:00 Kevin Olbrich :

> PS: The cluster is currently size 2. I used PGCalc on the Ceph website,
> which by default targets 200 PGs on each OSD.
> I read about the protection in the docs and later realized that I would
> have been better off with only 100 PGs per OSD.
>
>
> 2018-05-17 13:35 GMT+02:00 Kevin Olbrich :
>
>> Hi!
>>
>> Thanks for your quick reply.
>> Before I read your mail, I applied the following config to my OSDs:
>> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>
>> Status is now:
>>   data:
>> pools:   2 pools, 1536 pgs
>> objects: 639k objects, 2554 GB
>> usage:   5211 GB used, 11295 GB / 16506 GB avail
>> pgs: 7.943% pgs not active
>>  5567/1309948 objects degraded (0.425%)
>>  252327/1309948 objects misplaced (19.262%)
>>  1030 active+clean
>>  351  active+remapped+backfill_wait
>>  107  activating+remapped
>>  33   active+remapped+backfilling
>>  15   activating+undersized+degraded+remapped
>>
>> A little bit better but still some non-active PGs.
>> I will investigate your other hints!
>>
>> Thanks
>> Kevin
>>
>> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:
>>
>>> Hi,
>>>
>>>
>>>
>>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>>
 Hi!

 Today I added some new OSDs (nearly doubling their number) to my luminous cluster.
 I then changed pg(p)_num from 256 to 1024 for that pool because it was
 complaining about too few PGs. (I have since realized that this should have
 been done in small steps.)

 This is the current status:

  health: HEALTH_ERR
  336568/1307562 objects misplaced (25.740%)
  Reduced data availability: 128 pgs inactive, 3 pgs
 peering, 1
 pg stale
  Degraded data redundancy: 6985/1307562 objects degraded
 (0.534%), 19 pgs degraded, 19 pgs undersized
  107 slow requests are blocked > 32 sec
  218 stuck requests are blocked > 4096 sec

data:
  pools:   2 pools, 1536 pgs
  objects: 638k objects, 2549 GB
  usage:   5210 GB used, 11295 GB / 16506 GB avail
  pgs: 0.195% pgs unknown
   8.138% pgs not active
   6985/1307562 objects degraded (0.534%)
   336568/1307562 objects misplaced (25.740%)
   855 active+clean
   517 active+remapped+backfill_wait
   107 activating+remapped
   31  active+remapped+backfilling
   15  activating+undersized+degraded+remapped
   4   active+undersized+degraded+remapped+backfilling
   3   unknown
   3   peering
   1   stale+active+clean

>>>
>>> You need to resolve the unknown/peering/activating pgs first. You have
>>> 1536 PGs; assuming replication size 3, this makes 4608 PG copies. Given 25
>>> OSDs and the heterogenous host sizes, I assume that some OSDs hold more
>>> than 200 PGs. There's a threshold for the number of PGs; reaching this
>>> threshold keeps the OSDs from accepting new PGs.
>>>
>>> Try to increase the threshold  (mon_max_pg_per_osd /
>>> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about
>>> the exact one, consult the documentation) to allow more PGs on the OSDs. If
>>> this is the cause of the problem, the peering and activating states should
>>> be resolved within a short time.
>>>
>>> You can also check the number of PGs per OSD with 'ceph osd df'; the
>>> last column is the current number of PGs.
>>>
>>>

 OSD tree:

 ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
   -1   16.12177 root default
 -16   

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
PS: The cluster is currently size 2. I used PGCalc on the Ceph website, which
by default targets 200 PGs on each OSD.
I read about the protection in the docs and later realized that I would have
been better off with only 100 PGs per OSD.
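
If I redo the math the way I understand PGCalc (total PGs ~= OSD count *
target PGs per OSD / replica size, rounded to a power of two), then with my
~25 OSDs at size 2 and a target of 100 PGs per OSD that would be
25 * 100 / 2 = 1250, so roughly 1024 PGs instead of the 1536 I created.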


2018-05-17 13:35 GMT+02:00 Kevin Olbrich :

> Hi!
>
> Thanks for your quick reply.
> Before I read your mail, I applied the following config to my OSDs:
> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>
> Status is now:
>   data:
> pools:   2 pools, 1536 pgs
> objects: 639k objects, 2554 GB
> usage:   5211 GB used, 11295 GB / 16506 GB avail
> pgs: 7.943% pgs not active
>  5567/1309948 objects degraded (0.425%)
>  252327/1309948 objects misplaced (19.262%)
>  1030 active+clean
>  351  active+remapped+backfill_wait
>  107  activating+remapped
>  33   active+remapped+backfilling
>  15   activating+undersized+degraded+remapped
>
> A little bit better but still some non-active PGs.
> I will investigate your other hints!
>
> Thanks
> Kevin
>
> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:
>
>> Hi,
>>
>>
>>
>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>
>>> Hi!
>>>
>>> Today I added some new OSDs (nearly doubling their number) to my luminous
>>> cluster.
>>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>>> complaining about too few PGs. (I have since realized that this should have
>>> been done in small steps.)
>>>
>>> This is the current status:
>>>
>>>  health: HEALTH_ERR
>>>  336568/1307562 objects misplaced (25.740%)
>>>  Reduced data availability: 128 pgs inactive, 3 pgs peering,
>>> 1
>>> pg stale
>>>  Degraded data redundancy: 6985/1307562 objects degraded
>>> (0.534%), 19 pgs degraded, 19 pgs undersized
>>>  107 slow requests are blocked > 32 sec
>>>  218 stuck requests are blocked > 4096 sec
>>>
>>>data:
>>>  pools:   2 pools, 1536 pgs
>>>  objects: 638k objects, 2549 GB
>>>  usage:   5210 GB used, 11295 GB / 16506 GB avail
>>>  pgs: 0.195% pgs unknown
>>>   8.138% pgs not active
>>>   6985/1307562 objects degraded (0.534%)
>>>   336568/1307562 objects misplaced (25.740%)
>>>   855 active+clean
>>>   517 active+remapped+backfill_wait
>>>   107 activating+remapped
>>>   31  active+remapped+backfilling
>>>   15  activating+undersized+degraded+remapped
>>>   4   active+undersized+degraded+remapped+backfilling
>>>   3   unknown
>>>   3   peering
>>>   1   stale+active+clean
>>>
>>
>> You need to resolve the unknown/peering/activating pgs first. You have
>> 1536 PGs; assuming replication size 3, this makes 4608 PG copies. Given 25
>> OSDs and the heterogenous host sizes, I assume that some OSDs hold more
>> than 200 PGs. There's a threshold for the number of PGs; reaching this
>> threshold keeps the OSDs from accepting new PGs.
>>
>> Try to increase the threshold  (mon_max_pg_per_osd /
>> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about
>> the exact one, consult the documentation) to allow more PGs on the OSDs. If
>> this is the cause of the problem, the peering and activating states should
>> be resolved within a short time.
>>
>> You can also check the number of PGs per OSD with 'ceph osd df'; the last
>> column is the current number of PGs.
>>
>>
>>>
>>> OSD tree:
>>>
>>> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>>>   -1   16.12177 root default
>>> -16   16.12177 datacenter dc01
>>> -19   16.12177 pod dc01-agg01
>>> -10    8.98700 rack dc01-rack02
>>>  -4    4.03899 host node1001
>>>   0   hdd  0.90999 osd.0 up  1.0 1.0
>>>   1   hdd  0.90999 osd.1 up  1.0 1.0
>>>   5   hdd  0.90999 osd.5 up  1.0 1.0
>>>   2   ssd  0.43700 osd.2 up  1.0 1.0
>>>   3   ssd  0.43700 osd.3 up  1.0 1.0
>>>   4   ssd  0.43700 osd.4 up  1.0 1.0
>>>  -7    4.94899 host node1002
>>>   9   hdd  0.90999 osd.9 up  1.0 1.0
>>>  10   hdd  0.90999 osd.10 up  1.0 1.0
>>>  11   hdd  0.90999 osd.11 up  1.0 1.0
>>>  12   hdd  0.90999 osd.12 up  1.0 1.0
>>>   6   ssd  0.43700 osd.6 up  1.0 1.0
>>>   7   ssd  0.43700 osd.7 up  1.0 1.0
>>>   8   ssd  0.43700 osd.8 up  1.0 1.0
>>> -11    7.13477 rack 

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi!

Thanks for your quick reply.
Before I read your mail, I applied the following config to my OSDs:
ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'

Status is now:
  data:
pools:   2 pools, 1536 pgs
objects: 639k objects, 2554 GB
usage:   5211 GB used, 11295 GB / 16506 GB avail
pgs: 7.943% pgs not active
 5567/1309948 objects degraded (0.425%)
 252327/1309948 objects misplaced (19.262%)
 1030 active+clean
 351  active+remapped+backfill_wait
 107  activating+remapped
 33   active+remapped+backfilling
 15   activating+undersized+degraded+remapped

A little bit better but still some non-active PGs.
I will investigate your other hints!

Thanks
Kevin

2018-05-17 13:30 GMT+02:00 Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
>
>
> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>
>> Hi!
>>
>> Today I added some new OSDs (nearly doubling their number) to my luminous
>> cluster.
>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>> complaining about too few PGs. (I have since realized that this should have
>> been done in small steps.)
>>
>> This is the current status:
>>
>>  health: HEALTH_ERR
>>  336568/1307562 objects misplaced (25.740%)
>>  Reduced data availability: 128 pgs inactive, 3 pgs peering, 1
>> pg stale
>>  Degraded data redundancy: 6985/1307562 objects degraded
>> (0.534%), 19 pgs degraded, 19 pgs undersized
>>  107 slow requests are blocked > 32 sec
>>  218 stuck requests are blocked > 4096 sec
>>
>>data:
>>  pools:   2 pools, 1536 pgs
>>  objects: 638k objects, 2549 GB
>>  usage:   5210 GB used, 11295 GB / 16506 GB avail
>>  pgs: 0.195% pgs unknown
>>   8.138% pgs not active
>>   6985/1307562 objects degraded (0.534%)
>>   336568/1307562 objects misplaced (25.740%)
>>   855 active+clean
>>   517 active+remapped+backfill_wait
>>   107 activating+remapped
>>   31  active+remapped+backfilling
>>   15  activating+undersized+degraded+remapped
>>   4   active+undersized+degraded+remapped+backfilling
>>   3   unknown
>>   3   peering
>>   1   stale+active+clean
>>
>
> You need to resolve the unknown/peering/activating pgs first. You have
> 1536 PGs; assuming replication size 3, this makes 4608 PG copies. Given 25
> OSDs and the heterogenous host sizes, I assume that some OSDs hold more
> than 200 PGs. There's a threshold for the number of PGs; reaching this
> threshold keeps the OSDs from accepting new PGs.
>
> Try to increase the threshold  (mon_max_pg_per_osd /
> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about
> the exact one, consult the documentation) to allow more PGs on the OSDs. If
> this is the cause of the problem, the peering and activating states should
> be resolved within a short time.
>
> You can also check the number of PGs per OSD with 'ceph osd df'; the last
> column is the current number of PGs.
>
>
>>
>> OSD tree:
>>
>> ID  CLASS WEIGHT   TYPE NAME STATUS REWEIGHT PRI-AFF
>>   -1   16.12177 root default
>> -16   16.12177 datacenter dc01
>> -19   16.12177 pod dc01-agg01
>> -10    8.98700 rack dc01-rack02
>>  -4    4.03899 host node1001
>>   0   hdd  0.90999 osd.0 up  1.0 1.0
>>   1   hdd  0.90999 osd.1 up  1.0 1.0
>>   5   hdd  0.90999 osd.5 up  1.0 1.0
>>   2   ssd  0.43700 osd.2 up  1.0 1.0
>>   3   ssd  0.43700 osd.3 up  1.0 1.0
>>   4   ssd  0.43700 osd.4 up  1.0 1.0
>>  -7    4.94899 host node1002
>>   9   hdd  0.90999 osd.9 up  1.0 1.0
>>  10   hdd  0.90999 osd.10 up  1.0 1.0
>>  11   hdd  0.90999 osd.11 up  1.0 1.0
>>  12   hdd  0.90999 osd.12 up  1.0 1.0
>>   6   ssd  0.43700 osd.6 up  1.0 1.0
>>   7   ssd  0.43700 osd.7 up  1.0 1.0
>>   8   ssd  0.43700 osd.8 up  1.0 1.0
>> -11    7.13477 rack dc01-rack03
>> -22    5.38678 host node1003
>>  17   hdd  0.90970 osd.17 up  1.0 1.0
>>  18   hdd  0.90970 osd.18 up  1.0 1.0
>>  24   hdd  0.90970 osd.24 up  1.0 1.0
>>  26   hdd  0.90970 osd.26 up  1.0 1.0
>>  13   ssd  0.43700