Just FYI - there is no need to reboot the server after this tuning; restarting the OSDs is sufficient.
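
For example, restarting just the affected OSD daemons could look like this (a sketch only; the unit name follows the pattern shown in the journal excerpt further down, and the OSD ids are placeholders to adjust for your host):

    # restart the OSDs that should pick up the corrected cache ratio settings
    # (fsid taken from the systemd unit name in the log below; repeat per OSD)
    systemctl restart ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.12.service
    systemctl restart ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.3.service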

On 29.04.2025 14:33, Igor Fedotov wrote:

Marco,

This validation was introduced in v18.2.5, as violating the rule could result in an OSD crash in some cases.

So it's better to catch that sooner rather than later.


Thanks,

Igor

On 29.04.2025 14:27, Marco Pizzolo wrote:
Hi Igor,

Thank you so very much for responding so quickly. Interestingly, I don't remember setting these values, but I did see a global-level override of 0.8 for one and 0.2 for another, so I removed the global overrides and am rebooting the server to see what happens.
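
For reference, removing such global overrides can be done like this (a sketch, assuming they were set in the monitor config database; the option names are the ones Igor lists below):

    # list any cache-ratio overrides currently stored in the config database
    ceph config dump | grep bluestore_cache
    # drop the global overrides so the built-in defaults apply again
    ceph config rm global bluestore_cache_meta_ratio
    ceph config rm global bluestore_cache_kv_ratio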

I should know soon enough how things are looking.

I'll report back, but I don't understand why I was able to upgrade this cluster over the past 4-5 years from 14 --> 15 --> 16 --> 17 --> 18.2.4 without issues, yet now, going from 18.2.4 --> 18.2.6, I am dead in the water.

Thanks,
Marco

On Tue, Apr 29, 2025 at 1:18 PM Igor Fedotov <igor.fedo...@croit.io> wrote:

    Hi Marco,

    The following log line (unfortunately it was cut off) sheds some
    light:

    "
    Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>

    "

    It likely says that the sum of the bluestore_cache_meta_ratio +
    bluestore_cache_kv_ratio + bluestore_cache_kv_onode_ratio config
    parameters exceeds 1.0.

    So the parameters have to be tuned so that their sum is less than or
    equal to 1.0; an example is sketched after the defaults below.

    Default settings are:

    bluestore_cache_meta_ratio = 0.45

    bluestore_cache_kv_ratio = 0.45

    bluestore_cache_kv_onode_ratio = 0.04
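
    For illustration, checking and correcting the values could look like
    this (a sketch, assuming the overrides are stored in the monitor
    config database; osd.3 is simply the OSD from the log above):

        # show the effective values for one of the failing OSDs
        ceph config get osd.3 bluestore_cache_meta_ratio
        ceph config get osd.3 bluestore_cache_kv_ratio
        ceph config get osd.3 bluestore_cache_kv_onode_ratio
        # either remove the overrides to fall back to the defaults ...
        ceph config rm global bluestore_cache_meta_ratio
        ceph config rm global bluestore_cache_kv_ratio
        # ... or set values whose sum stays at or below 1.0, e.g. the defaults
        ceph config set global bluestore_cache_meta_ratio 0.45
        ceph config set global bluestore_cache_kv_ratio 0.45
        ceph config set global bluestore_cache_kv_onode_ratio 0.04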


    Thanks,

    Igor



    On 29.04.2025 13:36, Marco Pizzolo wrote:
    > Hello Everyone,
    >
    > I'm upgrading from 18.2.4 to 18.2.6 on a 4-node cluster with 8 NVMes
    > per node. Each NVMe is split into 2 OSDs. The upgrade went through
    > the mgr, mon, and crash daemons and then began upgrading OSDs.
    >
    > The OSDs it was upgrading were not coming back online.
    >
    > I tried rebooting, and no luck.
    >
    > journalctl -xe shows the following:
    >
    > ░░ The unit docker-02cb79ef9a657cdaa26b781966aa6d2f1d5e54cdc9efa6c5ff1f0e98c3a866e4.scope has successfully entered the 'dead' state.
    > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]: time="2025-04-29T06:24:09.282073583-04:00" level=info msg="ignoring event" container=76c56ddd668015de0022bfa2527060e64a9513>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.282129114-04:00" level=info msg="shim disconnected" id=76c56ddd668015de0022bfa2527060e64a95137>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.282219664-04:00" level=warning msg="cleaning up after shim disconnected" id=76c56ddd668015de00>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.282242484-04:00" level=info msg="cleaning up dead shim"
    > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 mClockScheduler: set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
    > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740  0 osd.3:0.OSDShard using op scheduler mclock_scheduler, cutoff=196
    > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000 /var/lib/ceph/osd/ceph-3/block) open path /var/lib/cep>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292047607-04:00" level=warning msg="cleanup warnings time=\"2025-04-29T06:24:09-04:00\" level=>
    > Apr 29 06:24:09 prdhcistonode01 dockerd[2967]: time="2025-04-29T06:24:09.292163618-04:00" level=info msg="ignoring event" container=02cb79ef9a657cdaa26b781966aa6d2f1d5e54>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292216428-04:00" level=info msg="shim disconnected" id=02cb79ef9a657cdaa26b781966aa6d2f1d5e54c>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292277279-04:00" level=warning msg="cleaning up after shim disconnected" id=02cb79ef9a657cdaa2>
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.292291949-04:00" level=info msg="cleaning up dead shim"
    > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000 /var/lib/ceph/osd/ceph-3/block) open size 640122932428>
    > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740 -1 bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
    > Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug 2025-04-29T10:24:09.287+0000 7f6961ae9740  1 bdev(0x56046b4c8000 /var/lib/ceph/osd/ceph-3/block) close
    > Apr 29 06:24:09 prdhcistonode01 containerd[2797]: time="2025-04-29T06:24:09.303385220-04:00" level=warning msg="cleanup warnings time=\"2025-04-29T06:24:09-04:00\" level=>
    > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740  1 mClockScheduler: set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
    > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740  0 osd.0:0.OSDShard using op scheduler mclock_scheduler, cutoff=196
    > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 osd.15 0 OSD:init: unable to mount object store
    > Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug 2025-04-29T10:24:09.307+0000 7f12f08c5740 -1  ** ERROR: osd init failed: (22) Invalid argument
    > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/cep>
    > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000 /var/lib/ceph/osd/ceph-0/block) open size 640122932428>
    > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740 -1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes bluestore_cache_meta_>
    > Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug 2025-04-29T10:24:09.307+0000 7f2c10403740  1 bdev(0x55d5e45f0000 /var/lib/ceph/osd/ceph-0/block) close
    > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 mClockScheduler: set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
    > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740  0 osd.8:0.OSDShard using op scheduler mclock_scheduler, cutoff=196
    > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000 /var/lib/ceph/osd/ceph-8/block) open path /var/lib/cep>
    > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000 /var/lib/ceph/osd/ceph-8/block) open size 640122932428>
    > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740 -1 bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes bluestore_cache_meta_>
    > Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug 2025-04-29T10:24:09.363+0000 7f30b83b1740  1 bdev(0x555f40688000 /var/lib/ceph/osd/ceph-8/block) close
    > Apr 29 06:24:09 prdhcistonode01 systemd[1]: ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.12.service: Main process exited, code=exited, status=1/FAILURE
    >
    >
    > Any help you can offer would be greatly appreciated. This is running
    > in docker:
    >
    > Client: Docker Engine - Community
    >   Version:           24.0.7
    >   API version:       1.43
    >   Go version:        go1.20.10
    >   Git commit:        afdd53b
    >   Built:             Thu Oct 26 09:08:01 2023
    >   OS/Arch:           linux/amd64
    >   Context:           default
    >
    > Server: Docker Engine - Community
    >   Engine:
    >    Version:          24.0.7
    >    API version:      1.43 (minimum version 1.12)
    >    Go version:       go1.20.10
    >    Git commit:       311b9ff
    >    Built:            Thu Oct 26 09:08:01 2023
    >    OS/Arch:          linux/amd64
    >    Experimental:     false
    >   containerd:
    >    Version:          1.6.25
    >    GitCommit:        d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
    >   runc:
    >    Version:          1.1.10
    >    GitCommit:        v1.1.10-0-g18a0cb0
    >   docker-init:
    >    Version:          0.19.0
    >    GitCommit:        de40ad0
    >
    > Thanks,
    > Marco

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
