This is very valuable. I have heard the following feature requests from
customers:

- Ability to spool to HDFS (or any DFS interface)
- Ability to pick and choose tuples, i.e. not every tuple may need to be
tracked
- Minimal performance hit
- The current API remains as is
- Ability to get the content based on tuple-id

Apex should enable this with minimal or no coding from users.
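
To make the list above concrete, here is a minimal sketch of a lineage
recorder in plain Java. It is not Apex API: the class LineageSketch and its
methods record, content and parents are hypothetical names, and the
in-memory maps only stand in for whatever would be spooled to a DFS. It
shows selective tracking via a predicate and lookup of content and
contributing tuples by tuple-id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory lineage recorder; an illustration, not Apex API. */
class LineageSketch<T> {
  // Decides which tuples are tracked, so not every tuple pays the cost.
  private final Predicate<T> shouldTrack;
  // Stand-ins for a DFS-backed store: content and parent ids per tuple-id.
  private final Map<Long, T> contentById = new HashMap<>();
  private final Map<Long, List<Long>> parentsById = new HashMap<>();
  private long nextId = 0;

  LineageSketch(Predicate<T> shouldTrack) {
    this.shouldTrack = shouldTrack;
  }

  /**
   * Records a tuple together with the ids of the tuples that contributed
   * to it. Returns the new tuple-id, or -1 if the tuple is not tracked.
   */
  long record(T tuple, List<Long> parentIds) {
    if (!shouldTrack.test(tuple)) {
      return -1L;
    }
    List<Long> trackedParents = new ArrayList<>();
    for (Long parentId : parentIds) {
      if (parentId >= 0) { // skip parents that were themselves untracked
        trackedParents.add(parentId);
      }
    }
    long id = nextId++;
    contentById.put(id, tuple);
    parentsById.put(id, trackedParents);
    return id;
  }

  /** Returns the content for a tuple-id, or null if the id is unknown. */
  T content(long id) {
    return contentById.get(id);
  }

  /** Returns the ids of the tuples that contributed to the given tuple. */
  List<Long> parents(long id) {
    return parentsById.getOrDefault(id, Collections.emptyList());
  }
}
```

An operator wrapper could call record once per emitted tuple, passing the
ids of the input tuples it consumed. One parent covers the enrichment case,
and several parents cover aggregations such as dimensions computation.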

Thanks,
Amol


On Mon, Apr 25, 2016 at 12:00 AM, Ashwin Chandra Putta <
[email protected]> wrote:

> Hi All,
>
> I have heard of a few use cases where lineage support is asked for. On
> Apex, it seems to be an ask for the ability to uniquely track each tuple as
> it flows through the DAG. It further boils down to being able to track
> every tuple going into each operator and the corresponding tuple going out
> of the operator. Here is a quick list I put together to describe some
> requirements for lineage support on Apex. Please feel free to improve or
> add to it. Also, please respond with ideas on how we can solve this on the
> Apex platform.
>
> When lineage is enabled,
> 1. We should be able to track each tuple as it enters and exits an
> operator, e.g. enrichment.
> 2. We should be able to track all the tuples that contributed to a tuple
> that is emitted, e.g. dimensions computation.
> 3. We should be able to track all the tuples that contributed to all the
> tuples emitted by the operator, e.g. join?
>
> --
>
> Regards,
> Ashwin.
>
