[ 
https://issues.apache.org/jira/browse/KAFKA-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418705#comment-17418705
 ] 

Guozhang Wang edited comment on KAFKA-13239 at 9/22/21, 4:39 PM:
-----------------------------------------------------------------

{quote} What does this mean/how do memtables factor into this?{quote}

This is for the {{allowBlockingFlush}} call specifically. The 
{{ingestExternalFile}} function can be called in the middle of reads/writes 
where the mem-table was not empty, and when ingesting the sst files, if there's 
overlap with the mem-tables blocking calls to flush the memtable may be 
triggered (? I am not 100% sure why that's the case, worth digging deeper); and 
if that is not allowed via the setter the ingestion call would fail. For our 
case, the mem-tables "should" be empty during restoration, but no one knows 
what rocksDB would be doing upon opening really, so just to be on the safer 
side while I was browsing through the interface I think it may be better to 
just set it.

{quote} Either way, I think I'm not convinced that it will be so simple to 
avoid issues due to compaction/write stalls. {quote}

Yeah I agree. My understanding is that {{RocksDB.ingestExternalFile}} itself is 
not blocking on compactions, and hence background compactions may still be 
pretty hot behind the scene even when it returns. I also feel that it's better 
to first do the restoration thread first 
(https://issues.apache.org/jira/browse/KAFKA-10199), and then consider this one.


was (Author: guozhang):
{quote} What does this mean/how do memtables factor into this?{quote}

This is for the {{allowBlockingFlush}} call specifically. The 
{{ingestExternalFile}} function can be called in the middle of reads/writes 
where the mem-table was not empty, and when ingesting the sst files, if there's 
overlap with the mem-tables blocking calls to flush the memtable may be 
triggered (? I am not 100% sure why that's the case, worth digging deeper); and 
if that is not allowed via the setter the ingestion call would fail. For our 
case, the mem-tables "should" be empty during restoration, but no one knows 
what rocksDB would be doing upon opening really, so just to be on the safer 
side while I was browsing through the interface I think it may be better to 
just set it.

{quote} Either way, I think I'm not convinced that it will be so simple to 
avoid issues due to compaction/write stalls. {quote}

Yeah I agree. My understanding is that {{RocksDB.ingestExternalFile}} itself is 
not blocking on compactions, and hence background compactions may still be 
pretty hot behind the scene even when it returns. I also feel that it's better 
to first do the restoration thread first, and then consider this one.

> Use RocksDB.ingestExternalFile for restoration
> ----------------------------------------------
>
>                 Key: KAFKA-13239
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13239
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>
> Now that we are in newer version of RocksDB, we can consider using the new
> {code}
> ingestExternalFile(final ColumnFamilyHandle columnFamilyHandle,
>       final List<String> filePathList,
>       final IngestExternalFileOptions ingestExternalFileOptions)
> {code}
> for restoring changelog into state stores. More specifically:
> 1) Use larger default batch size in restore consumer polling behavior so that 
> each poll would return more records as possible.
> 2) For a single batch of records returned from a restore consumer poll call, 
> first write them as a single SST File using the {{SstFileWriter}}. The 
> existing {{DBOptions}} could be used to construct the {{EnvOptions} and 
> {{Options}} for the writter.
> Do not yet ingest the written file to the db yet within each iteration
> 3) At the end of the restoration, call {{RocksDB.ingestExternalFile}} given 
> all the written files' path as the parameter. The 
> {{IngestExternalFileOptions}} would be specifically configured to allow key 
> range overlapping with mem-table.
> 4) A specific note is that after the call in 3), heavy compaction may be 
> executed by RocksDB in the background and before it cools down, starting 
> normal processing immediately which would try to {{put}} new records into the 
> store may see high stalls. To work around it we would consider using 
> {{RocksDB.compactRange()}} which would block until the compaction is 
> completed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to