[
https://issues.apache.org/jira/browse/KAFKA-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418705#comment-17418705
]
Guozhang Wang edited comment on KAFKA-13239 at 9/22/21, 4:39 PM:
-----------------------------------------------------------------
{quote} What does this mean/how do memtables factor into this?{quote}
This is about the {{allowBlockingFlush}} option specifically. The
{{ingestExternalFile}} function can be called in the middle of reads/writes
while the mem-table is not empty, and if the ingested SST files' key range
overlaps with the mem-table, a blocking flush of the mem-table may be triggered
(I am not 100% sure why that is the case, worth digging deeper); if that is not
allowed via the setter, the ingestion call would fail. In our case the
mem-tables "should" be empty during restoration, but we cannot really be sure
what RocksDB does upon opening, so to be on the safer side I think it is better
to just set it.
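For reference, a minimal sketch of setting that option via the RocksJava API;
the {{db}}, {{columnFamilyHandle}} and {{sstFilePaths}} variables are just
placeholders for whatever the restoration path ends up using:
{code}
try (final IngestExternalFileOptions ingestOptions = new IngestExternalFileOptions()) {
    // Allow a blocking mem-table flush if the ingested files' key range
    // overlaps with the mem-table, instead of failing the ingestion call.
    ingestOptions.setAllowBlockingFlush(true);
    db.ingestExternalFile(columnFamilyHandle, sstFilePaths, ingestOptions);
}
{code}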
{quote} Either way, I think I'm not convinced that it will be so simple to
avoid issues due to compaction/write stalls. {quote}
Yeah I agree. My understanding is that {{RocksDB.ingestExternalFile}} itself does
not block on compactions, and hence background compactions may still be running
pretty hot behind the scenes even after it returns. I also feel that it's better
to do the restoration thread
(https://issues.apache.org/jira/browse/KAFKA-10199) first, and then consider this one.
> Use RocksDB.ingestExternalFile for restoration
> ----------------------------------------------
>
> Key: KAFKA-13239
> URL: https://issues.apache.org/jira/browse/KAFKA-13239
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Priority: Major
>
> Now that we are on a newer version of RocksDB, we can consider using the new
> {code}
> ingestExternalFile(final ColumnFamilyHandle columnFamilyHandle,
> final List<String> filePathList,
> final IngestExternalFileOptions ingestExternalFileOptions)
> {code}
> for restoring changelogs into state stores. More specifically:
> 1) Use a larger default batch size in the restore consumer's polling behavior
> so that each poll returns as many records as possible.
> 2) For a single batch of records returned from a restore consumer poll call,
> first write them as a single SST file using the {{SstFileWriter}}. The
> existing {{DBOptions}} could be used to construct the {{EnvOptions}} and
> {{Options}} for the writer (see the sketch after this list).
> Do not ingest the written file into the DB within each iteration yet.
> 3) At the end of the restoration, call {{RocksDB.ingestExternalFile}} with
> all the written files' paths as the parameter. The
> {{IngestExternalFileOptions}} would be specifically configured to allow key
> range overlap with the mem-table.
> 4) A specific note: after the call in 3), heavy compaction may be executed by
> RocksDB in the background, and if normal processing starts immediately
> (before the compaction cools down), {{put}} calls writing new records into
> the store may see high write stalls. To work around this we could consider
> using {{RocksDB.compactRange()}}, which blocks until the compaction is
> completed.
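> As a rough sketch only (the class, method, and variable names here are
> illustrative, not a concrete implementation proposal), steps 2)-4) could look
> roughly like the following with the RocksJava API:
> {code}
> import java.util.ArrayList;
> import java.util.Collection;
> import java.util.List;
> import org.apache.kafka.clients.consumer.ConsumerRecord;
> import org.rocksdb.*;
>
> public class SstRestoreSketch {
>     // 1) A larger max.poll.records on the restore consumer would make each
>     //    poll hand bigger batches to writeBatchAsSstFile().
>     private final List<String> sstFilePaths = new ArrayList<>();
>
>     // 2) Write one restored batch into a standalone SST file. This assumes
>     //    the batch has already been sorted by key according to the column
>     //    family's comparator, which SstFileWriter requires; tombstones (null
>     //    values) would also need special handling and are omitted here.
>     void writeBatchAsSstFile(final Collection<ConsumerRecord<byte[], byte[]>> batch,
>                              final String filePath,
>                              final Options options) throws RocksDBException {
>         try (final EnvOptions envOptions = new EnvOptions();
>              final SstFileWriter writer = new SstFileWriter(envOptions, options)) {
>             writer.open(filePath);
>             for (final ConsumerRecord<byte[], byte[]> record : batch) {
>                 writer.put(record.key(), record.value());
>             }
>             writer.finish();
>         }
>         sstFilePaths.add(filePath);
>     }
>
>     // 3) + 4) At the end of restoration, ingest all written files in one call
>     //    and then force a blocking compaction so that normal processing does
>     //    not start while background compaction is still hot.
>     void ingestAndCompact(final RocksDB db,
>                           final ColumnFamilyHandle columnFamily) throws RocksDBException {
>         try (final IngestExternalFileOptions ingestOptions = new IngestExternalFileOptions()) {
>             // Tolerate key-range overlap with the mem-table via a blocking flush.
>             ingestOptions.setAllowBlockingFlush(true);
>             db.ingestExternalFile(columnFamily, sstFilePaths, ingestOptions);
>         }
>         db.compactRange(columnFamily); // blocks until the compaction completes
>     }
> }
> {code}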
--
This message was sent by Atlassian Jira
(v8.3.4#803005)