Hi Benjamin

This behavior was introduced in Ceph with the new mClock scheduler [1]. When the mClock scheduler is in use, the osd_max_backfills option (among others) is overridden to 1000.
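You can verify this directly on one of your OSDs (osd.1 taken from your output below; mclock_scheduler is the Quincy default, and the 1000 matches your config dump):

# ceph config show osd.1 osd_op_queue
mclock_scheduler
# ceph config show osd.1 osd_max_backfills
1000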

This is very likely what is causing the issues in your cluster during rebalancing. With the mClock scheduler, the parameters for tuning rebalancing have changed. In our wiki you can find a description of the new parameters and how to use them [2].
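As a rough sketch (see the wiki for the details): you can either switch between the predefined mClock profiles (high_client_ops, balanced, high_recovery_ops) to prioritize client I/O over recovery and backfill traffic, or go back to the previous wpq scheduler entirely:

# ceph config set osd osd_mclock_profile high_client_ops
# ceph config set osd osd_op_queue wpq

Note that a change of osd_op_queue only takes effect after the OSDs have been restarted.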

This should be fixed in the newer Ceph version 17.2.6 [3] [4], which is already available via our repositories (no-subscription as well as enterprise). It contains the fix for this issue, which overrides osd_max_backfills with a more reasonable value. Nevertheless, you should still take a look at the new mClock tuning options.
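After the upgrade you can check that the fix is in effect (again using osd.1 as an example; the exact value depends on the new defaults, but it should no longer be 1000):

# ceph versions
# ceph config show osd.1 osd_max_backfills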

Kind Regards
Stefan

[1] https://github.com/ceph/ceph/pull/38920
[2] https://pve.proxmox.com/wiki/Ceph_mclock_tuning
[3] https://github.com/ceph/ceph/pull/48226/files
[4] https://github.com/ceph/ceph/commit/89e48395f8b1329066a1d7e05a4e9e083c88c1a6

On 5/30/23 12:00, Benjamin Hofer wrote:
Dear community,

We've set up a Proxmox hyper-converged Ceph cluster in production.
After syncing in one new OSD using the "pveceph osd create" command,
we got massive network performance issues and outages. We then found
that "osd_max_backfills" is set to 1000 (Ceph default is 1) and that
this (along with some other values) have been overridden.

Does anyone know the root cause? I can't imagine that this is the
Proxmox default behaviour, and I'm very sure that we didn't change
anything (in fact, I didn't even know about this value before researching
it and talking to colleagues with deeper Ceph knowledge).

System:

PVE version output: pve-manager/7.3-6/723bb6ec (running kernel: 5.15.102-1-pve)
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)

# ceph config get osd.1
WHO    MASK  LEVEL  OPTION                            VALUE         RO
osd.1        basic  osd_mclock_max_capacity_iops_ssd  17080.220753

# ceph config show osd.1
NAME                                             VALUE                             SOURCE    OVERRIDES  IGNORES
auth_client_required                             cephx                             file
auth_cluster_required                            cephx                             file
auth_service_required                            cephx                             file
cluster_network                                  10.0.18.0/24                      file
daemonize                                        false                             override
keyring                                          $osd_data/keyring                 default
leveldb_log                                                                        default
mon_allow_pool_delete                            true                              file
mon_host                                         10.0.18.30 10.0.18.10 10.0.18.20  file
ms_bind_ipv4                                     true                              file
ms_bind_ipv6                                     false                             file
no_config_file                                   false                             override
osd_delete_sleep                                 0.000000                          override
osd_delete_sleep_hdd                             0.000000                          override
osd_delete_sleep_hybrid                          0.000000                          override
osd_delete_sleep_ssd                             0.000000                          override
osd_max_backfills                                1000                              override
osd_mclock_max_capacity_iops_ssd                 17080.220753                      mon
osd_mclock_scheduler_background_best_effort_lim  999999                            default
osd_mclock_scheduler_background_best_effort_res  534                               default
osd_mclock_scheduler_background_best_effort_wgt  2                                 default
osd_mclock_scheduler_background_recovery_lim     2135                              default
osd_mclock_scheduler_background_recovery_res     534                               default
osd_mclock_scheduler_background_recovery_wgt     1                                 default
osd_mclock_scheduler_client_lim                  999999                            default
osd_mclock_scheduler_client_res                  1068                              default
osd_mclock_scheduler_client_wgt                  2                                 default
osd_pool_default_min_size                        2                                 file
osd_pool_default_size                            3                                 file
osd_recovery_max_active                          1000                              override
osd_recovery_max_active_hdd                      1000                              override
osd_recovery_max_active_ssd                      1000                              override
osd_recovery_sleep                               0.000000                          override
osd_recovery_sleep_hdd                           0.000000                          override
osd_recovery_sleep_hybrid                        0.000000                          override
osd_recovery_sleep_ssd                           0.000000                          override
osd_scrub_sleep                                  0.000000                          override
osd_snap_trim_sleep                              0.000000                          override
osd_snap_trim_sleep_hdd                          0.000000                          override
osd_snap_trim_sleep_hybrid                       0.000000                          override
osd_snap_trim_sleep_ssd                          0.000000                          override
public_network                                   10.0.18.0/24                      file
rbd_default_features                             61                                default
rbd_qos_exclude_ops                              0                                 default
setgroup                                         ceph                              cmdline
setuser                                          ceph                              cmdline

Thanks a lot in advance.

Best
Benjamin
