[ceph-users] Re: unbalanced OSDs

2023-08-03 Thread Pavlo Astakhov

Take a look at https://github.com/TheJJ/ceph-balancer

We switched to it after a lot of attempts to make the internal balancer work 
as expected, and now we have roughly even OSD utilization across the cluster:


# ./placementoptimizer.py -v balance --ensure-optimal-moves --ensure-variance-decrease

[2023-08-03 23:33:27,954] gathering cluster state via ceph api...
[2023-08-03 23:33:36,081] running pg balancer
[2023-08-03 23:33:36,088] current OSD fill rate per crushclasses:
[2023-08-03 23:33:36,089]   ssd: average=49.86%, median=50.27%, without_placement_constraints=53.01%

[2023-08-03 23:33:36,090] cluster variance for crushclasses:
[2023-08-03 23:33:36,090]   ssd: 4.163
[2023-08-03 23:33:36,090] min osd.14 44.698%
[2023-08-03 23:33:36,090] max osd.22 51.897%
[2023-08-03 23:33:36,101] in descending full-order, couldn't empty osd.22, so we're done. if you want to try more often, set --max-full-move-attempts=$nr, this may unlock more balancing possibilities.
[2023-08-03 23:33:36,101] 


[2023-08-03 23:33:36,101] generated 0 remaps.
[2023-08-03 23:33:36,101] total movement size: 0.0B.
[2023-08-03 23:33:36,102] 


[2023-08-03 23:33:36,102] old cluster variance per crushclass:
[2023-08-03 23:33:36,102]   ssd: 4.163
[2023-08-03 23:33:36,102] old min osd.14 44.698%
[2023-08-03 23:33:36,102] old max osd.22 51.897%
[2023-08-03 23:33:36,102] 


[2023-08-03 23:33:36,103] new min osd.14 44.698%
[2023-08-03 23:33:36,103] new max osd.22 51.897%
[2023-08-03 23:33:36,103] new cluster variance:
[2023-08-03 23:33:36,103]   ssd: 4.163
[2023-08-03 23:33:36,103] 
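When it does find remaps, the script prints the corresponding "ceph osd 
pg-upmap-items" commands to stdout (at least as I read its README; the -v log 
lines above go to stderr), so a run can be captured and applied roughly like 
this. The flags are the ones from the run above; the attempt count is only an 
example value:

# Sketch only - verify the flags and output behaviour against your version of the tool.
./placementoptimizer.py -v balance \
    --ensure-optimal-moves \
    --ensure-variance-decrease \
    --max-full-move-attempts=5 | tee /tmp/balance-upmaps

# Review the generated upmap commands first, then apply them:
bash /tmp/balance-upmaps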




On 03.08.2023 16:38, Spiros Papageorgiou wrote:

On 03-Aug-23 12:11 PM, Eugen Block wrote:

ceph balancer status


I changed the PGs and it started rebalancing (and turned the autoscaler 
off), so now it will not report a status:


It reports: "optimize_result": "Too many objects (0.088184 > 0.05) 
are misplaced; try again later"


Let's wait a few hours to see what happens...

Thanx!

Sp

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: unbalanced OSDs

2023-08-03 Thread Spiros Papageorgiou

On 03-Aug-23 12:11 PM, Eugen Block wrote:

ceph balancer status


I changed the PGs and it started rebalancing (and turned the autoscaler off), 
so now it will not report a status:


It reports: "optimize_result": "Too many objects (0.088184 > 0.05) 
are misplaced; try again later"


Let's wait a few hours to see what happens...
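
For reference, the 0.05 in that message is the mgr option 
target_max_misplaced_ratio (default 0.05). Waiting for the backfill to finish 
is the simplest path, but the threshold can be raised if you want the balancer 
to keep optimizing while objects are misplaced. A minimal sketch, with 0.10 as 
a purely illustrative value:

# Show the current threshold (defaults to 0.05)
ceph config get mgr target_max_misplaced_ratio

# Optionally allow the balancer to act while more objects are misplaced
ceph config set mgr target_max_misplaced_ratio 0.10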

Thanx!

Sp

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: unbalanced OSDs

2023-08-03 Thread Eugen Block
Turn off the autoscaler and increase pg_num to 512 or so (a power of 2). 
The recommendation is to have between 100 and 150 PGs per OSD (including 
replicas), and then let the balancer handle the rest. What is the current 
balancer status (ceph balancer status)?
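
As a rough sanity check of that recommendation against this cluster: 3 hosts 
x 7 SSD OSDs = 21 OSDs, and 100-150 PGs per OSD divided by 3 replicas works 
out to roughly 700-1050 PGs, so 512 (or 1024) is the nearest power of two. 
A minimal sketch, assuming the pool is called "ssdpool" - substitute the real 
pool name:

# 21 OSDs * 100..150 PGs / 3 replicas ~= 700..1050 -> round to a power of two
ceph osd pool set ssdpool pg_autoscale_mode off
ceph osd pool set ssdpool pg_num 512
ceph balancer status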


Quoting Spiros Papageorgiou:


Hi all,


I have a Ceph cluster with 3 nodes, running version 16.2.9. There 
are 7 SSD OSDs on each server and one pool that resides on these OSDs.


My OSDs are terribly unbalanced:

ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-9         28.42200         -   28 TiB  9.3 TiB  9.2 TiB  161 MiB   26 GiB   19 TiB  32.56  1.09    -          root ssddisks
-2          9.47400         -  9.5 TiB  3.4 TiB  3.4 TiB   66 MiB  9.2 GiB  6.1 TiB  35.52  1.19    -              host px1-ssd
 0    ssd   1.74599   0.85004  1.7 TiB  810 GiB  807 GiB  3.2 MiB  2.3 GiB  978 GiB  45.28  1.51   26      up          osd.0
 5    ssd   0.82999   0.85004  850 GiB  581 GiB  580 GiB   22 MiB  912 MiB  269 GiB  68.38  2.29   19      up          osd.5
 6    ssd   0.82999   1.00000  850 GiB  8.2 GiB  7.8 GiB  9.5 MiB  435 MiB  842 GiB   0.97  0.03    4      up          osd.6
 7    ssd   0.82999   1.00000  850 GiB  294 GiB  293 GiB   26 MiB  591 MiB  556 GiB  34.60  1.16   11      up          osd.7
16    ssd   1.74599   0.85004  1.7 TiB  872 GiB  869 GiB  3.1 MiB  2.3 GiB  916 GiB  48.75  1.63   27      up          osd.16
23    ssd   1.74599   1.00000  1.7 TiB  438 GiB  436 GiB  1.5 MiB  1.7 GiB  1.3 TiB  24.48  0.82   14      up          osd.23
24    ssd   1.74599   1.00000  1.7 TiB  444 GiB  443 GiB  1.6 MiB  1.0 GiB  1.3 TiB  24.81  0.83   17      up          osd.24
-6          9.47400         -  9.5 TiB  2.9 TiB  2.9 TiB   46 MiB  8.1 GiB  6.6 TiB  30.39  1.02    -              host px2-ssd
12    ssd   0.82999   1.00000  850 GiB  154 GiB  154 GiB   21 MiB  368 MiB  696 GiB  18.16  0.61    9      up          osd.12
13    ssd   0.82999   1.00000  850 GiB  144 GiB  143 GiB  527 KiB  469 MiB  706 GiB  16.92  0.57    4      up          osd.13
14    ssd   0.82999   1.00000  850 GiB  149 GiB  149 GiB   16 MiB  299 MiB  700 GiB  17.58  0.59    7      up          osd.14
29    ssd   1.74599   1.00000  1.7 TiB  449 GiB  448 GiB  1.6 MiB  1.4 GiB  1.3 TiB  25.11  0.84   20      up          osd.29
30    ssd   1.74599   0.85004  1.7 TiB  885 GiB  882 GiB  3.1 MiB  2.3 GiB  903 GiB  49.48  1.65   31      up          osd.30
31    ssd   1.74599   1.00000  1.7 TiB  728 GiB  727 GiB  2.6 MiB  1.8 GiB  1.0 TiB  40.74  1.36   22      up          osd.31
32    ssd   1.74599   1.00000  1.7 TiB  438 GiB  437 GiB  1.6 MiB  1.4 GiB  1.3 TiB  24.51  0.82   15      up          osd.32
-4          9.47400         -  9.5 TiB  3.0 TiB  3.0 TiB   49 MiB  8.7 GiB  6.5 TiB  31.78  1.06    -              host px3-ssd
19    ssd   0.82999   1.00000  850 GiB  293 GiB  292 GiB   14 MiB  500 MiB  557 GiB  34.47  1.15    9      up          osd.19
20    ssd   0.82999   1.00000  850 GiB  290 GiB  290 GiB   10 MiB  482 MiB  560 GiB  34.15  1.14   10      up          osd.20
21    ssd   0.82999   1.00000  850 GiB  148 GiB  147 GiB   16 MiB  428 MiB  702 GiB  17.36  0.58    5      up          osd.21
25    ssd   1.74599   1.00000  1.7 TiB  446 GiB  445 GiB  1.8 MiB  1.6 GiB  1.3 TiB  24.96  0.83   19      up          osd.25
26    ssd   1.74599   1.00000  1.7 TiB  739 GiB  737 GiB  2.6 MiB  2.0 GiB  1.0 TiB  41.33  1.38   29      up          osd.26
27    ssd   1.74599   1.00000  1.7 TiB  725 GiB  723 GiB  2.6 MiB  2.1 GiB  1.0 TiB  40.55  1.36   21      up          osd.27
28    ssd   1.74599   1.00000  1.7 TiB  442 GiB  440 GiB  1.6 MiB  1.7 GiB  1.3 TiB  24.72  0.83   17      up          osd.28


I have done a "ceph osd reweight-by-utilization" and "ceph osd 
set-require-min-compat-client luminous". The pool has 32 PGs, which were set 
by the autoscaler (autoscale_mode is on).


Why are my OSDs so unbalanced? I have osd.5 at 68.38% and osd.6 
at 0.97%. Also, after the reweight-by-utilization, osd.5's 
utilization actually increased...



What am I missing here?
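
One thing worth checking here (a sketch, assuming the built-in balancer 
module is meant rather than any external tool): whether the balancer is 
actually enabled and in upmap mode, which is exactly what 
set-require-min-compat-client luminous makes possible:

# Upmap mode needs min-compat-client luminous, which is already set above
ceph balancer status
ceph balancer mode upmap
ceph balancer on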


Sp

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unbalanced OSDs when pg_autoscale enabled

2023-03-30 Thread 郑亮
I mistakenly set target_size_ratio on multiple pools that share the same raw
capacity. After I adjusted it, a large number of PGs went into the backfill
state, but the usage of some OSDs is still growing. How should I adjust it?
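
For what it's worth, target_size_ratio is set per pool, and the ratios of all
pools sharing the same device class are normalized against each other when the
autoscaler computes the effective ratio (visible in the output below). A
minimal sketch of inspecting and adjusting it; the pool name and ratio are
placeholders:

# See how the autoscaler currently interprets sizes, ratios and PG targets
ceph osd pool autoscale-status

# Adjust the ratio of one pool; ratios of pools on the same devices
# are weighed relative to each other
ceph osd pool set <pool> target_size_ratio 0.2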

[root@node01 smd]# ceph osd pool autoscale-status
POOL                                      SIZE  TARGET SIZE            RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
device_health_metrics                   291.9M                          3.0        48289G  0.0000                                  1.0       1              on         False
deeproute-replica-hdd-pool              12613M                          3.0        785.8T  0.0010        0.1000           0.0010   1.0      32              on         False
deeproute-replica-ssd-pool              12352G                          3.0        48289G  0.9901       50.0000           0.9901   1.0    1024              on         False
.rgw.root                                 5831                          3.0        48289G  0.0099        0.5000           0.0099   1.0       8              on         False
default.rgw.log                            182                          3.0        48289G  0.0000                                  1.0      32              on         False
default.rgw.control                          0                          3.0        48289G  0.0000                                  1.0      32              on         False
default.rgw.meta                             0                          3.0        48289G  0.0000                                  4.0       8              on         False
os-dsglczutvqsgowpz.rgw.control              0                          3.0        16096G  0.1667        0.5000           0.1667   1.0      64              on         False
os-dsglczutvqsgowpz.rgw.meta            104.2k                          3.0        16096G  0.1667        0.5000           0.1667   1.0      64              on         False
os-dsglczutvqsgowpz.rgw.buckets.index   57769M                          3.0        16096G  0.1667        0.5000           0.1667   1.0      32              on         False
os-dsglczutvqsgowpz.rgw.buckets.non-ec  496.5M                          3.0        16096G  0.1667        0.5000           0.1667   1.0      32              on         False
os-dsglczutvqsgowpz.rgw.log             309.0M                          3.0        16096G  0.1667        0.5000           0.1667   1.0      32              on         False
os-dsglczutvqsgowpz.rgw.buckets.data    147.7T               1.333730697632        785.8T  0.7992       80.0000           0.7992   1.0    1024              on         False
cephfs-metadata                          3231M                          3.0        16096G  0.0006                                  4.0      32              on         False
cephfs-replicated-pool                  23137G                          3.0        785.8T  0.1998       20.0000           0.1998   1.0     128              on         False
.nfs                                    100.1k                          3.0        48289G  0.0000                                  1.0      32              on         False
os-dsglczutvqsgowpz.rgw.otp                  0                          3.0        16096G  0.1667        0.5000           0.1667   1.0       8              on         False

[root@node01 pg]# ceph osd df | egrep -i "name|hdd"
ID   CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
 1   hdd    10.91399   1.00000  11 TiB  5.3 TiB  5.2 TiB  17 KiB   55 GiB  5.6 TiB  48.26  1.33  195      up
22   hdd    10.91399   1.00000  11 TiB  8.6 TiB  8.6 TiB   8 KiB   77 GiB  2.3 TiB  79.11  2.17  212      up
31   hdd    10.91399   1.00000  11 TiB  3.1 TiB  3.0 TiB  12 KiB   29 GiB  7.8 TiB  28.11  0.77  197      up
51   hdd    10.91399   1.00000  11 TiB  3.9 TiB  3.8 TiB  10 KiB   38 GiB  7.1 TiB  35.28  0.97  186      up
60   hdd    10.91399   1.00000  11 TiB  927 GiB  916 GiB  14 KiB   11 GiB   10 TiB   8.29  0.23  167      up
70   hdd    10.91399   1.00000  11 TiB  2.3 TiB  2.2 TiB  13 KiB  5.2 GiB  8.7 TiB  20.63  0.57  177      up
78   hdd    10.91399   1.00000  11 TiB  3.2 TiB  3.2 TiB  17 KiB   31 GiB  7.7 TiB  29.56  0.81  185      up
96   hdd    10.91399   1.00000  11 TiB  3.9 TiB  3.8 TiB  11 KiB   38 GiB  7.1 TiB  35.31  0.97  195      up
 9   hdd    10.91399   1.00000  11 TiB  3.0 TiB  2.9 TiB  17 KiB   14 GiB  8.0 TiB  27.11  0.75  183      up
19   hdd    10.91399   1.00000  11 TiB  4.0 TiB  4.0 TiB  11 KiB   47 GiB  6.9 TiB  36.66  1.01  192      up
29   hdd    10.91399   1.00000  11 TiB  5.3 TiB  5.2 TiB  14 KiB   40 GiB  5.6 TiB  48.40  1.33  202      up
47   hdd    10.91399   1.00000  11 TiB  736 GiB  734 GiB  10 KiB  2.1 GiB   10 TiB   6.59  0.18  172      up
56   hdd    10.91399   1.00000  11 TiB  3.9 TiB  3.8 TiB  10 KiB   38 GiB  7.0 TiB  35.56  0.98  184      up
65   hdd    10.91399   1.00000  11 TiB  6.1 TiB  6.1 TiB   9 KiB   57 GiB  4.8 TiB  56.22  1.55  214      up
88   hdd    10.91399   1.00000  11 TiB  4.6 TiB  4.6 TiB  13 KiB   47 GiB  6.3 TiB  42.50  1.17  194      up
98   hdd    10.91399   1.00000  11 TiB  7.6 TiB  7.5 TiB  14 KiB   60 GiB  3.3 TiB  69.31  1.90  210      up
 2   hdd    10.91399   1.00000  11 TiB  1.6 TiB  1.6 TiB  16 KiB   19 GiB  9.3 TiB  14.38  0.40  182      up
30   hdd