Hi everyone,

I tested with this code to verify the batch's impact on memory:
https://gist.github.com/zymap/19249ab35bb0f64c55cbf7f2e8356cb3

I found that the memory keeps increasing with the batch size. And if the
batch is not flushed into an SST file, it is kept in the WAL file, and the
WAL file is not limited by `max_total_wal_size`.
If the bookie is OOM killed because of a large batch and that batch was
already saved in the WAL, the only way to reopen RocksDB is to add
more memory to the bookie.
I also discussed this issue with the RocksDB community, and they said:
> when the batch size is so large (esp if you run multiple batches together)
> the wal size may reach (limit + batch-size * number of open batches). We
> have a project opened by our friends from Kafka streams to handle
> huge batch size. In the meanwhile can you restrict the size of your batch?
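
To illustrate the pattern (a minimal sketch, not the exact gist code; the
path, value size, and key count below are just illustrative):

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.WriteBatch;
    import org.rocksdb.WriteOptions;

    public class BigBatchTest {
        public static void main(String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options()
                         .setCreateIfMissing(true)
                         // As noted above, the WAL can still grow past this
                         // limit when one huge batch is written in a single call.
                         .setMaxTotalWalSize(64L * 1024 * 1024);
                 RocksDB db = RocksDB.open(options, "/tmp/big-batch-test");
                 WriteBatch batch = new WriteBatch();
                 WriteOptions writeOpts = new WriteOptions()) {

                byte[] value = new byte[1024];
                for (long i = 0; i < 10_000_000L; i++) {
                    batch.put(("key-" + i).getBytes(), value);
                    if (i % 1_000_000 == 0) {
                        // The batch buffers everything in memory until db.write().
                        System.out.println("batch data size: " + batch.getDataSize());
                    }
                }
                // One write appends the whole batch to the WAL at once.
                db.write(writeOpts, batch);
            }
        }
    }

The memory of the single WriteBatch grows with every put, and nothing
bounds it until the batch is finally written.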

--
In Pulsar, the compacted ledger has no rollover policy or retention
policy. If a user has tons of keys in the compacted topic, the compacted
ledger grows bigger and bigger. In our environment, a compacted
ledger reached 200 GB. It contains a huge number of entries in a single
ledger, which makes the batch very large.
In release 4.14.7 and branch-4.15, we don't limit the number of delete
operations in a single batch:
https://github.com/apache/bookkeeper/blob/c22136b03489db1643521f586e9cae2c4a511e10/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/EntryLocationIndex.java#L241
As a result, when the bookie runs garbage collection and removes such a
ledger, it gets OOM killed because of the large batch.
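
For illustration, one way to bound the memory is to cap the number of
delete operations per batch and flush early. Below is a rough sketch
against the raw RocksDB Java API, not the actual EntryLocationIndex code;
the method and parameter names are hypothetical:

    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.RocksIterator;
    import org.rocksdb.WriteBatch;
    import org.rocksdb.WriteOptions;

    public class BoundedDelete {
        static void deleteLedgerEntries(RocksDB db, byte[] firstKey, byte[] lastKey,
                                        int maxDeleteOpsPerBatch) throws RocksDBException {
            try (WriteOptions writeOpts = new WriteOptions();
                 WriteBatch batch = new WriteBatch();
                 RocksIterator it = db.newIterator()) {
                int pending = 0;
                for (it.seek(firstKey); it.isValid(); it.next()) {
                    if (compare(it.key(), lastKey) > 0) {
                        break;
                    }
                    batch.delete(it.key());
                    if (++pending >= maxDeleteOpsPerBatch) {
                        // Flush early so deleting a single huge ledger never
                        // accumulates an unbounded batch in memory.
                        db.write(writeOpts, batch);
                        batch.clear();
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    db.write(writeOpts, batch);
                }
            }
        }

        // Unsigned lexicographic comparison, matching RocksDB's default comparator.
        static int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return Integer.compare(a.length, b.length);
        }
    }

With a cap like this, the peak memory of a ledger deletion is bounded by
the configured batch size instead of by the ledger size.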



Pulsar already has a proposal about configuring the compacted topic ledger
retention: https://github.com/apache/pulsar/issues/19665.
But I still think we also need a way to control the batch size, so that
the memory usage is bounded.

Here is the PR for this change:
https://github.com/apache/bookkeeper/pull/4044


Best,
Yong
