Hi All,

I have heard of a few use cases that ask for lineage support. On Apex, this seems to mean the ability to uniquely track each tuple as it flows through the DAG. It further boils down to being able to track every tuple going into each operator and the corresponding tuple coming out of that operator.

Here is a quick list I put together to describe some requirements for lineage support on Apex. Please feel free to improve or add to it. Also, please respond with ideas on how we can solve this on the Apex platform.

When lineage is enabled:

1. We should be able to track each tuple as it enters and exits an operator, e.g. enrichment.
2. We should be able to track all the tuples that contributed to a tuple that is emitted, e.g. dimensions computation.
3. We should be able to track all the tuples that contributed to all the tuples emitted by the operator, e.g. join?

--
Regards,
Ashwin.
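
To make the three cases above concrete for discussion, here is a minimal sketch in Python. It is not Apex code and does not use any Apex API: `LineageLog`, `new_id`, `record`, and `sources` are hypothetical names, and the in-memory map stands in for whatever store a real implementation would use. It also assumes the coarsest possible rule for case 3, namely that every tuple emitted in a batch is linked to every input tuple of that batch.

```python
from collections import defaultdict
from itertools import count


class LineageLog:
    """Hypothetical in-memory lineage store (illustration only, not an Apex API)."""

    def __init__(self):
        self._ids = count(1)
        # Maps an output tuple id to the ids of the input tuples that contributed to it.
        self.parents = defaultdict(set)

    def new_id(self):
        """Hand out a unique id for a tuple."""
        return next(self._ids)

    def record(self, input_ids, output_ids):
        """Link every output tuple to every contributing input tuple."""
        for out_id in output_ids:
            self.parents[out_id].update(input_ids)

    def sources(self, tuple_id):
        """Walk back through the recorded links to the original source tuple ids."""
        direct = self.parents.get(tuple_id)
        if not direct:
            return {tuple_id}  # no recorded parents: this is a source tuple
        found = set()
        for parent in direct:
            found |= self.sources(parent)
        return found


log = LineageLog()
a, b, c = log.new_id(), log.new_id(), log.new_id()

# Case 1 (enrichment): one tuple in, one tuple out.
enriched = log.new_id()
log.record([a], [enriched])

# Case 2 (dimensions computation): many tuples in, one tuple out.
aggregate = log.new_id()
log.record([enriched, b], [aggregate])

# Case 3 (join): many tuples in, many tuples out.
joined_1, joined_2 = log.new_id(), log.new_id()
log.record([aggregate, c], [joined_1, joined_2])

print(sorted(log.sources(joined_1)))
```

Keeping only the direct parents per tuple and resolving the full history on demand keeps the per-operator bookkeeping small; the trade-off is that a lookup has to walk the whole chain of operators.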
