Sophie Blee-Goldman created KAFKA-8627:
------------------------------------------
Summary: Investigate batching on state restore
Key: KAFKA-8627
URL: https://issues.apache.org/jira/browse/KAFKA-8627
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Sophie Blee-Goldman
Currently, when rebuilding state from scratch, we form batches from whatever
records poll() returns and write each batch to RocksDB. Given the structure of
RocksDB (an LSM-tree store), inserting large, sorted batches gives the best
performance when writing large amounts of data at once, so we should
investigate the potential restore-time improvement of:
1) Larger batches: either by tuning the restore consumer to return more data
per poll(), by buffering records into larger batches, or both
2) Sorting each batch by key before writing it
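A minimal sketch of items 1 and 2 on the application side, assuming a
hypothetical RestoreBatcher (not existing Streams code) that sits between the
restore consumer and the store. It buffers records across poll() calls until
at least batchSize have accumulated, optionally sorts them by key, and hands
each batch to a sink that stands in for a RocksDB WriteBatch write. The class,
its parameters and the sink are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not existing Kafka Streams code: buffers restored
// records across poll() calls and emits batches of at least batchSize records,
// optionally sorted by key. The sink stands in for a RocksDB WriteBatch write.
final class RestoreBatcher {

    // Minimal key/value pair standing in for a restored changelog record.
    static final class Entry {
        final byte[] key;
        final byte[] value;

        Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int batchSize;
    private final boolean sortBatches;
    private final Consumer<List<Entry>> sink;
    private final List<Entry> buffer = new ArrayList<>();

    RestoreBatcher(int batchSize, boolean sortBatches, Consumer<List<Entry>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sortBatches = sortBatches;
        this.sink = sink;
    }

    // Call once per poll() with whatever records the restore consumer returned.
    void onPoll(List<Entry> polled) {
        buffer.addAll(polled);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Emit everything buffered so far; also call once restoration is complete.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Entry> batch = new ArrayList<>(buffer);
        buffer.clear();
        if (sortBatches) {
            // Unsigned lexicographic order matches RocksDB's default bytewise
            // comparator. List.sort is stable, so repeated updates to one key
            // keep their changelog order and the last write still wins.
            batch.sort((a, b) -> Arrays.compareUnsigned(a.key, b.key));
        }
        sink.accept(batch);
    }
}
```

The stable sort matters for correctness: a changelog can contain several
updates to the same key within one batch, and a WriteBatch applies them in
order, so the relative order of those updates must survive sorting. Tuning the
consumer side instead would go through the restore consumer's own configs
(e.g. max.poll.records, set with the restore.consumer. prefix).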
These two factors are likely to be coupled, so we should explore the
performance gains and regressions from varying both where possible (i.e. turn
sorting on and off across a range of batch sizes).
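The experiment grid above can be sketched as a small harness. This assumes a
hypothetical RestoreRun callback that restores the store from scratch with a
given batch size and sorting setting; the harness only crosses every batch size
with sorting off/on and records the wall-clock time of each run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical experiment harness: runs one restore per (batch size, sorting)
// combination and records the wall-clock time of each run.
final class RestoreExperiment {

    // One cell of the experiment grid and how long restoring took under it.
    static final class Result {
        final int batchSize;
        final boolean sorted;
        final long elapsedNanos;

        Result(int batchSize, boolean sorted, long elapsedNanos) {
            this.batchSize = batchSize;
            this.sorted = sorted;
            this.elapsedNanos = elapsedNanos;
        }

        @Override
        public String toString() {
            return "batchSize=" + batchSize + " sorted=" + sorted
                    + " ms=" + elapsedNanos / 1_000_000;
        }
    }

    // Stand-in for "restore the store from scratch with these settings"; a real
    // run would wipe the local state and replay the full changelog.
    interface RestoreRun {
        void restore(int batchSize, boolean sorted);
    }

    static List<Result> run(int[] batchSizes, RestoreRun restoreRun) {
        List<Result> results = new ArrayList<>();
        for (int batchSize : batchSizes) {
            for (boolean sorted : new boolean[] {false, true}) {
                long start = System.nanoTime();
                restoreRun.restore(batchSize, sorted);
                results.add(new Result(batchSize, sorted, System.nanoTime() - start));
            }
        }
        return results;
    }
}
```

Single timings like these are noisy; a real comparison would want warm-up and
repeated trials per cell (e.g. with JMH) before drawing conclusions.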
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)