shangxinli commented on issue #17512: URL: https://github.com/apache/hudi/issues/17512#issuecomment-4529463572
Good catch, I agree. Propagating upstream watermarks when `eventTimeFieldName` is absent on the downstream table is semantically unsafe. As you note, any filtering or projection in the pipeline means the upstream range no longer describes what's actually in the downstream partitions, and there's no reliable way to know at write time how much the data was trimmed.

Revised rule for Phase 2: **upstream propagation is only enabled when `eventTimeFieldName` is explicitly set on the downstream table.** In that case, the write pipeline computes the downstream watermark from its own records (the existing Phase 1 path), and separately uses the upstream commit's per-partition metadata to fill in partitions that received no new writes in this commit (e.g., incremental source with sparse updates). If `eventTimeFieldName` is unset, we no-op entirely: no inherited watermark, no fabricated value.

This also resolves the back-fill concern I flagged earlier: if the user is writing event-time-aware data, they must declare the field; the framework then has a ground truth to validate propagation against rather than guessing.

I'll update the RFC/plan to reflect this. Does that address your concern?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
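For readers following the thread, the revised Phase 2 rule could be sketched roughly as below. This is an illustrative Python sketch, not Hudi code: the helper name `resolve_partition_watermarks`, the dictionary shapes, the `(min, max)` watermark pair, and the assumption that upstream and downstream share partition paths are all hypothetical. Only `eventTimeFieldName` and the three behaviors (compute from own records, fill untouched partitions from upstream, no-op when unset) come from the comment.

```python
from typing import Dict, Optional, Tuple

# Assumed shape for illustration: (min_event_time, max_event_time) per partition.
Watermark = Tuple[int, int]


def resolve_partition_watermarks(
    event_time_field_name: Optional[str],
    computed: Dict[str, Watermark],
    upstream: Dict[str, Watermark],
) -> Dict[str, Watermark]:
    """Hypothetical helper illustrating the revised Phase 2 rule.

    computed: watermarks derived from the downstream commit's own records.
    upstream: per-partition watermarks taken from the upstream commit metadata.
    """
    # Field unset: no-op entirely. No inherited watermark, no fabricated value.
    if not event_time_field_name:
        return {}

    # Phase 1 path: partitions written in this commit keep the watermark
    # computed from the downstream table's own records.
    result = dict(computed)

    # Upstream metadata only fills partitions that received no new writes;
    # it never overrides a watermark computed from downstream records.
    for partition, watermark in upstream.items():
        result.setdefault(partition, watermark)
    return result
```

With `computed = {"2024-01-01": (100, 200)}` and `upstream = {"2024-01-01": (50, 250), "2024-01-02": (300, 400)}`, a set field yields the computed range for `2024-01-01` and the upstream range for `2024-01-02`, while an unset field yields an empty result.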
