xushiyan commented on code in PR #6256:
URL: https://github.com/apache/hudi/pull/6256#discussion_r945241280


##########
rfc/rfc-51/rfc-51.md:
##########
@@ -148,20 +152,27 @@ hudi_cdc_table/
 
 Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing data we have to materialize.
 
-There is an option to control what data is written to `CDCBlock`, that is `hoodie.table.cdc.supplemental.logging`. See the description of this config above.
+#### Write-on-indexing vs Write-on-compaction

Review Comment:
   > This could be confusing for customers who don't know the implementation details.
   
   This actually doesn't resonate with me, as I don't see it as exposing implementation details. It's a known characteristic that RO queries on MOR tables depend on compaction. This aligns with MOR semantics: if you need to query efficiently (RO), you have to wait for compaction; if you need fresher results (RT/snapshot), you have to spend more compute. Similarly, if you need to query CDC results efficiently, you wait for compaction; if you want fresher CDC results, you spend more compute to merge the log files in flight and then compute the CDC. The logic for the second case has not been proposed in the PR (and it should be).
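   To make the two MOR paths concrete, here is a minimal Python sketch over a toy model, not Hudi's API: a file slice is a dict-based base file plus a list of log entries, and CDC output uses Debezium-style `i`/`u`/`d` op codes with before/after images. The op codes and every name here (`merge_slice`, `diff_images`, `cdc_on_compaction`, `cdc_in_flight`) are assumptions for illustration. Both paths yield the same change records; they differ only in when the merge cost is paid.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (hypothetical, not Hudi's API): a base file maps record key -> row,
# and each log entry is (action, key, row) with action "upsert" or "delete".
Row = Dict[str, object]
LogEntry = Tuple[str, str, Optional[Row]]
# Assumed Debezium-style CDC record: (op, key, before_image, after_image).
CdcRecord = Tuple[str, str, Optional[Row], Optional[Row]]


def merge_slice(base: Dict[str, Row], logs: List[LogEntry]) -> Dict[str, Row]:
    """Apply log entries on top of the base file, as compaction or a snapshot read would."""
    merged = dict(base)
    for action, key, row in logs:
        if action == "delete":
            merged.pop(key, None)
        else:
            merged[key] = row
    return merged


def diff_images(before: Dict[str, Row], after: Dict[str, Row]) -> List[CdcRecord]:
    """Emit 'i'/'u'/'d' records by comparing two table images key by key."""
    out: List[CdcRecord] = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old is None:
            out.append(("i", key, None, new))
        elif new is None:
            out.append(("d", key, old, None))
        elif old != new:
            out.append(("u", key, old, new))
    return out


def cdc_on_compaction(base: Dict[str, Row], logs: List[LogEntry]) -> Tuple[Dict[str, Row], List[CdcRecord]]:
    """Write-on-compaction: CDC is computed once while the base file is rewritten,
    so later CDC reads only scan the persisted records."""
    new_base = merge_slice(base, logs)
    return new_base, diff_images(base, new_base)


def cdc_in_flight(base: Dict[str, Row], logs: List[LogEntry]) -> List[CdcRecord]:
    """Fresher path: merge pending log files at query time, paying the merge on every read."""
    return diff_images(base, merge_slice(base, logs))
```

   The sketch collapses every commit since the last compaction into a single change set; a real incremental CDC query would also need per-commit granularity.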
   
   I feel that adding a new service for this would introduce more implementation complexity and more gaps in understanding, for example, how it interacts with the type of index in use and with whether CDC is enabled. Note also that in COW tables compaction happens implicitly, which would simplify the implementation of CDC write-on-compaction through code-path reuse.
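   A minimal sketch of the code-path reuse argument, again over a hypothetical toy model rather than Hudi's actual write handles: one merge routine (`rewrite_file_group`, a made-up name) rewrites a base file and records each key's before image on first touch, so a COW upsert and a MOR compaction emit CDC through the same code. The `i`/`u`/`d` op codes are an assumption.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Change = Tuple[str, str, Optional[Row]]  # (action, key, row); action is "upsert" or "delete"
CdcRecord = Tuple[str, str, Optional[Row], Optional[Row]]  # (op, key, before, after)


def rewrite_file_group(base: Dict[str, Row], changes: List[Change]) -> Tuple[Dict[str, Row], List[CdcRecord]]:
    """Hypothetical shared merge routine: rewrite the base file and log CDC for touched keys."""
    new_base = dict(base)
    first_before: Dict[str, Optional[Row]] = {}
    for action, key, row in changes:
        # Remember the image each key had before this rewrite started.
        first_before.setdefault(key, new_base.get(key))
        if action == "delete":
            new_base.pop(key, None)
        else:
            new_base[key] = row
    cdc: List[CdcRecord] = []
    for key, before in first_before.items():  # insertion order = first-touch order
        after = new_base.get(key)
        if before is None and after is not None:
            cdc.append(("i", key, None, after))
        elif before is not None and after is None:
            cdc.append(("d", key, before, None))
        elif before is not None and before != after:
            cdc.append(("u", key, before, after))
    return new_base, cdc


def cow_upsert(base: Dict[str, Row], batch: List[Change]) -> Tuple[Dict[str, Row], List[CdcRecord]]:
    """COW: every write rewrites the base file, so compaction happens implicitly."""
    return rewrite_file_group(base, batch)


def mor_compact(base: Dict[str, Row], pending_logs: List[Change]) -> Tuple[Dict[str, Row], List[CdcRecord]]:
    """MOR write-on-compaction: the same routine runs over the accumulated log entries."""
    return rewrite_file_group(base, pending_logs)
```

   Keys inserted and then deleted within the same rewrite produce no CDC record, and repeated updates to one key collapse into a single `u` record.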
   
   As this only affects MOR tables, we should go back to the MOR semantics: what can users expect from MOR? My understanding is that MOR is the tradeoff between fast ingestion and query freshness. That tradeoff applies to the data files, and it can also be applied to the CDC data.



-- 
This is an automated message from the Apache Git Service.