shangxinli commented on issue #17512:
URL: https://github.com/apache/hudi/issues/17512#issuecomment-4529463572

   Good catch, I agree.
   
   Propagating upstream watermarks when `eventTimeFieldName` is absent on the downstream table is semantically unsafe. As you note, any filtering or projection in the pipeline means the upstream range no longer describes what is actually in the downstream partitions, and there is no reliable way to know at write time how much of the data was trimmed.
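   A small illustration of why the upstream range can't simply be inherited. This is a hypothetical sketch, not Hudi code: `Event`, the `region` column, and the "keep only `us`" filter are made up to stand in for any filter the writer can't see.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

/** Hypothetical illustration: a filter between upstream and downstream narrows the event-time range. */
public class WatermarkDriftExample {

    /** Hypothetical stand-in for a record carrying an event time (epoch millis). */
    record Event(String region, long eventTime) {}

    /** Returns {min, max} event time over a non-empty list of records. */
    static long[] eventTimeRange(List<Event> events) {
        LongSummaryStatistics stats = events.stream().mapToLong(Event::eventTime).summaryStatistics();
        return new long[] {stats.getMin(), stats.getMax()};
    }

    public static void main(String[] args) {
        List<Event> upstream = List.of(
                new Event("us", 100L), new Event("eu", 200L),
                new Event("us", 300L), new Event("eu", 900L));
        // The downstream pipeline keeps only "us" rows; the writer has no visibility into this filter.
        List<Event> downstream = upstream.stream().filter(e -> e.region().equals("us")).toList();

        long[] up = eventTimeRange(upstream);
        long[] down = eventTimeRange(downstream);
        // prints "upstream=[100, 900] downstream=[100, 300]"
        System.out.printf("upstream=[%d, %d] downstream=[%d, %d]%n", up[0], up[1], down[0], down[1]);
    }
}
```

   Inheriting the upstream max (900) would overstate the downstream watermark, whose real max is 300.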
   
   Revised rule for Phase 2: **upstream propagation is only enabled when `eventTimeFieldName` is explicitly set on the downstream table.** In that case, the write pipeline computes the downstream watermark from its own records (the existing Phase 1 path) and separately uses the upstream commit's per-partition metadata to fill in partitions that received no new writes in this commit (e.g., an incremental source with sparse updates). If `eventTimeFieldName` is unset, we no-op entirely: no inherited watermark and no fabricated value.
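   The revised rule, sketched as a pure function. This is hypothetical and not an existing Hudi API: `Phase2WatermarkRule`, `resolve`, and the plain `Map<String, Long>` of partition path to watermark are assumptions. It also assumes upstream and downstream share partition paths.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch of the Phase 2 propagation rule; none of these names are Hudi APIs. */
public class Phase2WatermarkRule {

    /**
     * @param eventTimeFieldName downstream table's configured event-time field; empty if unset
     * @param ownWatermarks      partition -> watermark computed from this commit's own records (Phase 1 path)
     * @param upstreamWatermarks partition -> watermark from the upstream commit's per-partition metadata
     * @return partition -> watermark to record for this commit; an empty map means "no-op"
     */
    static Map<String, Long> resolve(Optional<String> eventTimeFieldName,
                                     Map<String, Long> ownWatermarks,
                                     Map<String, Long> upstreamWatermarks) {
        // Without an explicit event-time field, do nothing: no inherited or fabricated value.
        if (eventTimeFieldName.isEmpty()) {
            return Map.of();
        }
        // Partitions written in this commit always use watermarks derived from their own records.
        Map<String, Long> result = new TreeMap<>(ownWatermarks);
        // Partitions with no new writes in this commit fall back to the upstream commit's metadata.
        upstreamWatermarks.forEach(result::putIfAbsent);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> own = Map.of("dt=2024-01-02", 500L);
        Map<String, Long> upstream = Map.of("dt=2024-01-01", 300L, "dt=2024-01-02", 450L);
        System.out.println(resolve(Optional.of("ts"), own, upstream)); // {dt=2024-01-01=300, dt=2024-01-02=500}
        System.out.println(resolve(Optional.empty(), own, upstream));  // {}
    }
}
```

   Locally computed values always win over upstream metadata (`putIfAbsent`), so upstream data only fills gaps and never overrides what the commit actually wrote.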
   
   This also resolves the backfill concern I flagged earlier: if the user is writing event-time-aware data, they must declare the field, so the framework has ground truth to validate propagation against rather than guessing.
   
   I'll update the RFC/plan to reflect this. Does that address your concern?

