parisni commented on issue #8222:
URL: https://github.com/apache/hudi/issues/8222#issuecomment-1496493568

   > The solution would be to perform a base + log merge first (which will 
consider the precombine fields), then filter for the commit range (this increases 
the cost of the query, but will give you the same semantics).
   
   Indeed, that's what I would expect. I would also expect incremental queries on MoR 
to apply the same merger that is used for reading (not only the precombine field).
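To make this expectation concrete, here is a toy Python model. It is not Hudi code: `Record`, `merge`, `naive_log_incremental`, `merged_incremental` and both mergers are hypothetical names, and commit times are assumed to be fixed-width strings that sort in time order. A late-arriving update with a lower precombine value lands in a log file. A read that only filters log records by commit range returns it, while merging base + log with the same merger used for snapshot reads, and only then filtering by commit range, does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    key: str
    commit_time: str  # assumed fixed-width instant, so string order == time order
    ordering: int     # value of the precombine (ordering) field
    payload: dict


# A merger combines two versions of the same key (older first) into one.
Merger = Callable[[Record, Record], Record]


def precombine_merger(older: Record, newer: Record) -> Record:
    # Larger ordering value wins; on an ordering tie the later commit wins.
    return max(older, newer, key=lambda r: (r.ordering, r.commit_time))


def partial_update_merger(older: Record, newer: Record) -> Record:
    # Hypothetical custom merger: non-null fields of the newer version overwrite
    # the older ones, and the result carries the newer commit time.
    payload = dict(older.payload)
    payload.update({k: v for k, v in newer.payload.items() if v is not None})
    return Record(newer.key, newer.commit_time,
                  max(older.ordering, newer.ordering), payload)


def merge(versions: List[Record], merger: Merger) -> Dict[str, Record]:
    # Fold all versions of each key in commit order.
    merged: Dict[str, Record] = {}
    for rec in sorted(versions, key=lambda r: r.commit_time):
        prev = merged.get(rec.key)
        merged[rec.key] = rec if prev is None else merger(prev, rec)
    return merged


def naive_log_incremental(logs: List[Record], begin: str, end: str) -> Dict[str, Record]:
    # Serves log records by commit range only; base file and merger are ignored.
    in_range = [r for r in logs if begin < r.commit_time <= end]
    return {r.key: r for r in sorted(in_range, key=lambda r: r.commit_time)}


def merged_incremental(base: List[Record], logs: List[Record], begin: str,
                       end: str, merger: Merger) -> Dict[str, Record]:
    # Merge base + logs with the reader's merger first, then filter by commit range.
    merged = merge(base + logs, merger)
    return {k: r for k, r in merged.items() if begin < r.commit_time <= end}


base = [Record("k1", "001", 10, {"a": 1, "b": 2})]
logs = [Record("k1", "002", 5, {"a": None, "b": 3})]  # late-arriving, lower ordering

print(naive_log_incremental(logs, "001", "002")["k1"].ordering)         # 5
print(merged_incremental(base, logs, "001", "002", precombine_merger))  # {}
print(merged_incremental(base, logs, "001", "002",
                         partial_update_merger)["k1"].payload)          # {'a': 1, 'b': 3}
```

Passing the merger as a parameter keeps the incremental path and the snapshot path from drifting apart. With the hypothetical partial-update merger, the merged record takes the newer commit time and therefore does appear in the range.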
   
   > How much of a blocker is this for your project? This will help us 
prioritize this.
   
   I'm fine with the CoW implementation for now. What would have helped is a 
clear statement in the documentation that MoR and CoW handle incremental queries 
differently. 
   
   Thanks a lot.
   
   
   On April 4, 2023 3:17:00 PM UTC, vinoth chandar ***@***.***> wrote:
   >@parisni To clarify the semantics a bit: an incremental query provides all the 
records that changed within a start and end commit time range. If there are 
multiple writes (CoW) or multiple compactions (MoR) between queries, you would 
only see the latest record (per precombine logic) up to the compacted point, 
then log records after that. This is similar to the Kafka compacted topic 
[design](https://kafka.apache.org/documentation/#compaction), and it bounds the 
"catch up" time for downstream jobs. If one wants every change record, i.e. 
multiple rows per key in the incremental query output, one for each change, that's 
what the CDC feature solves (right now it's supported for CoW).
   >
   >As for this problem, the issue is that the reads are served out of the logs 
based on the commit time range, which is fine as long as we are just returning 
the latest committed records. In this case, however, there is a precombine field to 
respect, and that's not handled yet. The solution would be to perform a base + 
log merge first (which will consider the precombine fields), then filter for 
the commit range (this increases the cost of the query, but will give you the same 
semantics). 
   >
   >How much of a blocker is this for your project? This will help us 
prioritize this. 
   >https://github.com/apache/hudi/issues/8222#issuecomment-1496165517
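The difference between the compacted semantics and the CDC semantics described in the quoted reply can be sketched in Python. This is a toy model, not Hudi's implementation: `Change`, `compacted_incremental` and `cdc_incremental` are hypothetical, commit times are assumed to be fixed-width strings, and "latest" is decided by commit time alone (the precombine field is deliberately not modelled here).

```python
from typing import Dict, List, NamedTuple


class Change(NamedTuple):
    key: str
    commit_time: str  # assumed fixed-width instant, so string order == time order
    value: str


def in_range(ch: Change, begin: str, end: str) -> bool:
    # Commit range is (begin, end]: exclusive start, inclusive end.
    return begin < ch.commit_time <= end


def compacted_incremental(changes: List[Change], begin: str, end: str) -> Dict[str, Change]:
    # Like a Kafka compacted topic: only the last version of each key survives.
    # Simplification: "last" means latest commit time; precombine is not modelled.
    latest: Dict[str, Change] = {}
    for ch in sorted(changes, key=lambda c: c.commit_time):
        if in_range(ch, begin, end):
            latest[ch.key] = ch
    return latest


def cdc_incremental(changes: List[Change], begin: str, end: str) -> List[Change]:
    # CDC-style: every change in the range, so one key can appear several times.
    return [ch for ch in sorted(changes, key=lambda c: c.commit_time)
            if in_range(ch, begin, end)]


changes = [
    Change("k1", "001", "v1"),
    Change("k1", "002", "v2"),
    Change("k2", "002", "x1"),
    Change("k1", "003", "v3"),
]
print(len(cdc_incremental(changes, "000", "003")))               # 4
print(compacted_incremental(changes, "000", "003")["k1"].value)  # v3
print(sorted(compacted_incremental(changes, "001", "002")))      # ['k1', 'k2']
```

Returning one row per key is what keeps a lagging consumer's catch-up work proportional to the number of distinct keys rather than the number of writes; a consumer that needs every intermediate version has to use CDC instead.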

