Zakhar,

my general concern about downgrading to previous versions is that this procedure is generally neither assumed nor tested by the dev team, although it is possible most of the time. In this specific case, however, it is not doable due to (at least) https://github.com/ceph/ceph/pull/52212, which enables 4K bluefs allocation unit support - once a daemon gets it, there is no way back.

I still think that setting the "fit_to_fast" mode without enabling dynamic compaction levels is quite safe, but it is definitely better to test it in a real environment and under actual load first. You might also want to apply such a workaround gradually: one daemon first, bake it for a while, then apply it to the full node, bake a bit more, and finally update the remaining daemons. Or, even better, bake it in a test cluster first.
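
Just to illustrate the per-daemon approach, a minimal sketch (osd.N is a placeholder for whichever OSD you pick first; I assume the policy is only read at OSD start-up, so a restart is needed, and "ceph orch" applies to cephadm deployments only):

    # try the new policy on a single OSD first
    ceph config set osd.N bluestore_volume_selection_policy fit_to_fast

    # restart that OSD so the setting takes effect
    ceph orch daemon restart osd.N      # or: systemctl restart ceph-osd@N

    # once baked in, widen the scope step by step, eventually to all OSDs:
    # ceph config set osd bluestore_volume_selection_policy fit_to_fast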

Alternatively, you might consider building the updated code yourself and making patched binaries on top of .14...
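
If you go that route, a rough sketch of the build flow (the fix commit is left as a placeholder since it depends on which patch you pick; exact dependencies and cmake options vary per distro):

    git clone https://github.com/ceph/ceph.git && cd ceph
    git checkout v16.2.14
    git submodule update --init --recursive
    git cherry-pick <fix-commit>        # commit id intentionally left as a placeholder
    ./install-deps.sh
    ./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
    cd build && make -j$(nproc) ceph-osd    # or ninja, depending on the generator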


Thanks,

Igor


On 20/10/2023 15:10, Zakhar Kirpichenko wrote:
Thank you, Igor.

It is somewhat disappointing that fixing this bug in Pacific has such a low priority, considering its impact on existing clusters.

The document attached to the PR explicitly says about `level_compaction_dynamic_level_bytes` that "enabling it on an existing DB requires special caution", so we'd rather not experiment with something that has the potential to cause data corruption or loss in a production cluster. Perhaps a downgrade to the previous version, 16.2.13, which worked for us without any issues, is an option - or would you advise against such a downgrade from 16.2.14?

/Z

On Fri, 20 Oct 2023 at 14:46, Igor Fedotov <igor.fedo...@croit.io> wrote:

    Hi Zakhar,

    We definitely expect one more (and apparently the last) Pacific
    minor release. There is no specific date yet, though - the plan is
    to release Quincy and Reef minor releases prior to it, hopefully
    before Christmas/New Year.

    Meanwhile, you might want to work around the issue by tuning
    bluestore_volume_selection_policy. Unfortunately, my original
    proposal to set it to rocksdb_original most likely wouldn't work in
    this case, so you had better try the "fit_to_fast" mode. This should
    be coupled with enabling the 'level_compaction_dynamic_level_bytes'
    mode in RocksDB - there is a pretty good spec on applying this mode
    to BlueStore attached to https://github.com/ceph/ceph/pull/37156.
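
    For illustration only, a rough sketch of both knobs (note that
    bluestore_rocksdb_options_annex is assumed to be available in your
    build - it appends to the default RocksDB option string; please
    follow the spec attached to the PR above rather than this snippet
    for the actual procedure, and mind that both settings require an
    OSD restart):

        # switch the BlueFS volume selector
        ceph config set osd bluestore_volume_selection_policy fit_to_fast
        # enable dynamic level sizing in RocksDB
        ceph config set osd bluestore_rocksdb_options_annex level_compaction_dynamic_level_bytes=true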


    Thanks,

    Igor

    On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
    Igor, I noticed that there's no roadmap for the next 16.2.x
    release. May I ask what time frame we are looking at with regard
    to a possible fix?

    We're experiencing several OSD crashes caused by this issue per day.
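
    One way to enumerate these occurrences, for reference, is the
    built-in crash module (the crash id below is a placeholder):

        ceph crash ls                  # list recorded crashes
        ceph crash info <crash-id>     # full details for one of them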

    /Z

    On Mon, 16 Oct 2023 at 14:19, Igor Fedotov
    <igor.fedo...@croit.io> wrote:

        That's true.

        On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
        Many thanks, Igor. I found previously submitted bug reports
        and subscribed to them. My understanding is that the issue
        is going to be fixed in the next Pacific minor release.

        /Z

        On Mon, 16 Oct 2023 at 14:03, Igor Fedotov
        <igor.fedo...@croit.io> wrote:

            Hi Zakhar,

            please see my reply to the post on a similar issue at:
            https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/


            Thanks,

            Igor

            On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
            > Hi,
            >
            > After upgrading to Ceph 16.2.14 we had several OSD crashes in the
            > bstore_kv_sync thread:
            >
            >
            >     1. "assert_thread_name": "bstore_kv_sync",
            >     2. "backtrace": [
            >     3. "/lib64/libpthread.so.0(+0x12cf0)
            [0x7ff2f6750cf0]",
            >     4. "gsignal()",
            >     5. "abort()",
            >     6. "(ceph::__ceph_assert_fail(char const*, char
            const*, int, char
            >     const*)+0x1a9) [0x564dc5f87d0b]",
            >     7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
            >     8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*,
            bluefs_fnode_t
            >     const&)+0x15e) [0x564dc6604a9e]",
            >     9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*,
            unsigned long, unsigned
            >     long)+0x77d) [0x564dc66951cd]",
            >     10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool,
            bool*)+0x90)
            >     [0x564dc6695670]",
            >     11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b)
            [0x564dc66b1a6b]",
            >     12. "(BlueRocksWritableFile::Sync()+0x18)
            [0x564dc66c1768]",
            >     13.
            "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
            >     const&, rocksdb::IODebugContext*)+0x1f)
            [0x564dc6b6496f]",
            >     14.
            "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
            >     [0x564dc6c761c2]",
            >     15.
            "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
            [0x564dc6c77808]",
            >     16.
            "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
            >     const&, rocksdb::log::Writer*, unsigned long*,
            bool, bool, unsigned
            >     long)+0x309) [0x564dc6b780c9]",
            >     17.
            "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
            >     rocksdb::WriteBatch*, rocksdb::WriteCallback*,
            unsigned long*, unsigned
            >     long, bool, unsigned long*, unsigned long,
            >     rocksdb::PreReleaseCallback*)+0x2629)
            [0x564dc6b80c69]",
            >     18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions
            const&,
            >     rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
            >     19.
            "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
            >  std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84)
            [0x564dc6b1f644]",
            >     20.
            
"(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a)
            >     [0x564dc6b2004a]",
            >     21. "(BlueStore::_kv_sync_thread()+0x30d8)
            [0x564dc6602ec8]",
            >     22. "(BlueStore::KVSyncThread::entry()+0x11)
            [0x564dc662ab61]",
            >     23. "/lib64/libpthread.so.0(+0x81ca)
            [0x7ff2f67461ca]",
            >     24. "clone()"
            >     25. ],
            >
            >
            > I am attaching two instances of crash info for further reference:
            > https://pastebin.com/E6myaHNU
            >
            > OSD configuration is rather simple and close to default:
            >
            > osd.6   dev       bluestore_cache_size_hdd   4294967296
            > osd.6   dev       bluestore_cache_size_ssd   4294967296
            > osd     advanced  debug_rocksdb              1/5
            > osd     advanced  osd_max_backfills          2
            > osd     basic     osd_memory_target          17179869184
            > osd     advanced  osd_recovery_max_active    2
            > osd     advanced  osd_scrub_sleep            0.100000
            > osd     advanced  rbd_balance_parent_reads   false
            >
            > debug_rocksdb is a recent change; otherwise this configuration has been
            > running without issues for months. The crashes happened on two different
            > hosts with identical hardware, and the hosts and storage (NVMe DB/WAL,
            > HDD block) don't exhibit any issues. We have not experienced such
            > crashes with Ceph < 16.2.14.
            >
            > Is this a known issue, or should I open a bug report?
            >
            > Best regards,
            > Zakhar
            > _______________________________________________
            > ceph-users mailing list -- ceph-users@ceph.io
            > To unsubscribe send an email to ceph-users-le...@ceph.io

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
