Hi guys,

When we introduced `deleteRange` for ledger index deletion, we found a segmentation fault in ledger index deletion [1]. I suggest upgrading RocksDB to the latest 7.9.x release to see whether the RocksDB `deleteRange` segmentation fault has been fixed. I have tested upgrading RocksDB from 6.10.2 to 7.9.2 and then rolling back to 6.10.2. Here are the test results.
- The upgrade process works fine in the following cases:
  - A Pulsar producer keeps producing messages to a Pulsar topic
  - A Pulsar consumer consumes messages from the Pulsar topic, and the Pulsar broker fetches messages from the BookKeeper cluster
  - Compaction is triggered to clean up expired ledgers
- After rolling RocksDB back from 7.9.2 to 6.10.2, the bookie failed to start up with the following exception.

```java
2023-01-29T17:09:27,794+0800 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
java.io.IOException: Error open RocksDB database
    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:200) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:63) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:170) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:819) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.server.Main.doMain(Main.java:226) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    at org.apache.bookkeeper.server.Main.main(Main.java:208) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
Caused by: org.rocksdb.RocksDBException: unknown checksum type 4 in data/bookkeeper/ledgers/current/ledgers/000025.sst offset 1078 size 33
    at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
    at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
    at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
    ... 13 more
```

The root cause of this exception is that RocksDB 7.9.2 uses the `kXXH3` checksum type by default, and `kXXH3` is only supported since RocksDB 6.27, so RocksDB 6.10.2 cannot read the SST files written by 7.9.2:
https://github.com/facebook/rocksdb/blob/79e57a39a33dbe17c8f51167e40e66d6c91f8eb4/include/rocksdb/table.h#L56

On the BookKeeper master branch, we have already upgraded RocksDB to `6.29.4.1`, which supports upgrading to 7.9.2 and rolling back to 6.29.4.1.

For RocksDB versions < 6.27, we can push a fix to ensure that RocksDB 7.9.2 does not use the new checksum type `kXXH3`.

I suggest going ahead with this upgrade. Do you have any concerns?

[1] https://github.com/apache/bookkeeper/issues/3734

Thanks,
Hang
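
To make the rollback constraint described above concrete, here is a minimal sketch of the version check it implies. It is only an illustration and is not BookKeeper or RocksDB code: the class `ChecksumRollbackCheck` and its methods are hypothetical names, and the only facts it encodes are the ones quoted in this mail (checksum type 4 is `kXXH3`, readable from RocksDB 6.27 onwards).

```java
class ChecksumRollbackCheck {
    // Checksum type id from the error above ("unknown checksum type 4"), i.e. kXXH3.
    static final int KXXH3_ID = 4;
    // First RocksDB release line that understands kXXH3, per the table.h comment linked above.
    static final int[] KXXH3_MIN_VERSION = {6, 27};

    /** Parses the leading major.minor pair of a version string such as "6.29.4.1". */
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** Returns true if a RocksDB reader of this version is expected to open SST files written with kXXH3. */
    static boolean canReadXxh3(String readerVersion) {
        int[] v = majorMinor(readerVersion);
        if (v[0] != KXXH3_MIN_VERSION[0]) {
            return v[0] > KXXH3_MIN_VERSION[0];
        }
        return v[1] >= KXXH3_MIN_VERSION[1];
    }

    public static void main(String[] args) {
        // The three versions discussed in this mail.
        for (String v : new String[] {"6.10.2", "6.29.4.1", "7.9.2"}) {
            System.out.println(v + " can read checksum type " + KXXH3_ID + " (kXXH3): " + canReadXxh3(v));
        }
    }
}
```

Under these assumptions the check returns false only for 6.10.2, which matches the failed rollback in the test, while 6.29.4.1 on the master branch is a safe rollback target.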