xushiyan commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r945241280
##########
rfc/rfc-51/rfc-51.md:
##########
@@ -148,20 +152,27 @@ hudi_cdc_table/
 Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing data we have to materialize.

-There is an option to control what data is written to `CDCBlock`, that is `hoodie.table.cdc.supplemental.logging`. See the description of this config above.
+#### Write-on-indexing vs Write-on-compaction

Review Comment:
   > This could be confusing for customers who dont know the implementation details.

   This doesn't really resonate with me, as I don't see this as exposing implementation details. It's a known characteristic that RO queries on MOR tables depend on compaction. This aligns with MOR semantics: if you need to query efficiently (RO), you have to wait for compaction; if you need fresher results (RT/snapshot), you have to spend more computation power. Similarly, if you need to query CDC results efficiently, you wait for compaction; if you need fresher CDC results, you spend more computation power to merge the log files on the fly and then compute the CDC data. The logic for the second case has not been proposed in the PR, and it should be.

   I feel that having a new service for this would introduce more implementation complexity and more gaps to understand: for example, how it interplays with the type of index used and with whether CDC is enabled. We should also note that in COW, compaction happens implicitly, which would simplify the implementation of CDC write-on-compaction through code path reuse.

   As this only affects MOR tables, we should trace back to MOR semantics: what can users expect to get from MOR? My understanding is that MOR is a tradeoff between fast ingestion and query freshness. It applies to the data files, and it can apply to the CDC data as well.
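
The tradeoff described in the comment can be made concrete with a toy Python model. This is only a sketch and is not Hudi code: `ToyMorTable`, `diff`, and every method name are hypothetical, the "base file" and "log file" are plain in-memory structures, and the change rows record only the net difference between two compactions. It shows that a read-optimized read and the materialized CDC rows both lag until compaction, while a snapshot read and an on-the-fly CDC computation pay for a log merge on every call.

```python
from dataclasses import dataclass, field


def diff(before, after):
    """Net changes from `before` to `after` as (op, key, old, new) rows.

    Toy assumption: only inserts and updates exist, and intermediate
    versions written between two compactions are collapsed.
    """
    rows = []
    for key in sorted(after):
        old, new = before.get(key), after[key]
        if key not in before:
            rows.append(("insert", key, None, new))
        elif old != new:
            rows.append(("update", key, old, new))
    return rows


@dataclass
class ToyMorTable:
    """Hypothetical stand-in for a merge-on-read table."""

    base: dict = field(default_factory=dict)  # compacted data: key -> value
    log: list = field(default_factory=list)   # uncompacted (key, value) writes
    cdc: list = field(default_factory=list)   # change rows materialized so far

    def upsert(self, key, value):
        # Fast ingestion: append to the log, leave the base untouched.
        self.log.append((key, value))

    def read_optimized(self):
        # Cheap read: base only, so it is stale until the next compaction.
        return dict(self.base)

    def snapshot(self):
        # Fresh read: pays for merging the log over the base on every call.
        merged = dict(self.base)
        for key, value in self.log:
            merged[key] = value
        return merged

    def pending_cdc(self):
        # Fresher CDC: merge the log on the fly, then compute the changes.
        return diff(self.base, self.snapshot())

    def compact(self):
        # Write-on-compaction: materialize change rows while rewriting the base.
        after = self.snapshot()
        self.cdc.extend(diff(self.base, after))
        self.base, self.log = after, []
```

In this model, calling `upsert` after a `compact` changes what `snapshot()` and `pending_cdc()` return immediately, whereas `read_optimized()` and the `cdc` list stay unchanged until the next `compact()`.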