shnapz commented on issue #33981: URL: https://github.com/apache/beam/issues/33981#issuecomment-3700722605
> To submit lineage data in OpenLineage format, you need to know the source-sink pairs.

True, but it all becomes simpler if you need only job-level granularity rather than transform-level granularity (treat your job as a single atomic transform that holds all sources and sinks at the same time). The class [org/apache/beam/sdk/metrics/Lineage.java](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Lineage.java) is the central interception point across all IOs, so my PR just modifies this class to **add the ability** to substitute lineage metrics with any custom implementation (sacrificing transform-level granularity).

So apparently it doesn't overlap with your ticket at all, and we are good! Indeed, metrics are critical for transform-level lineage, and they are also useful for cross-worker deduplication (although that implementation is strictly runner-specific).
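To illustrate the job-level idea, here is a minimal sketch in plain Java. It is not the PR's code and it does not use Beam's API: the class `JobLevelLineage`, its methods, and the dataset names are all hypothetical. It only shows that once the job is treated as one atomic transform, the source-sink pairs are simply every recorded source combined with every recorded sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical job-level lineage collector; NOT part of Beam's API. */
final class JobLevelLineage {
  // Sorted, thread-safe sets: duplicates reported by several IOs collapse
  // into one entry, and iteration order is deterministic.
  private final Set<String> sources = new ConcurrentSkipListSet<>();
  private final Set<String> sinks = new ConcurrentSkipListSet<>();

  /** Records a dataset the job reads from (name format is illustrative). */
  void addSource(String datasetName) {
    sources.add(datasetName);
  }

  /** Records a dataset the job writes to (name format is illustrative). */
  void addSink(String datasetName) {
    sinks.add(datasetName);
  }

  /**
   * Treats the whole job as one atomic transform: every source is paired
   * with every sink. Each pair is rendered as "source -> sink".
   */
  List<String> sourceSinkPairs() {
    List<String> pairs = new ArrayList<>();
    for (String source : sources) {
      for (String sink : sinks) {
        pairs.add(source + " -> " + sink);
      }
    }
    return pairs;
  }
}
```

With two sources and one sink recorded, `sourceSinkPairs()` returns two pairs. The trade-off is the one mentioned above: the pairs say nothing about which transform connected a given source to a given sink.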
