This is not quite what you are asking, but I often save intermediate
results to parquet files so I can diagnose problems and rebuild data
from a known good state without re-running every processing step.

On Fri, Apr 7, 2017 at 1:08 AM, kant kodali <kanth...@gmail.com> wrote:

> Yes, lineage that is actually replayable is what is needed for the
> validation process, so we can address questions like how a system arrived
> at a state S at a time T. I guess a good analogy is event sourcing.
>
>
> On Thu, Apr 6, 2017 at 10:30 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> I do think this is the right way: you will have to do testing with test
>> data, verifying that the actual output of the calculation matches the
>> expected output. Even if the logical plan is correct, your calculation
>> might not be. E.g. there can be bugs in Spark or in the UI, or (as very
>> often happens) the client describes a calculation, but in the end the
>> description is wrong.
>>
>> > On 4. Apr 2017, at 05:19, kant kodali <kanth...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > I am wondering if there is a way to persist the lineages generated by
>> Spark underneath. Some of our clients want us to prove that the result of
>> the computation we are showing on a dashboard is correct, and if we can
>> show the lineage of transformations that were executed to get to the
>> result, that would be the Q.E.D. moment. But I am not even sure if this is
>> possible with Spark?
>> >
>> > Thanks,
>> > kant
>>
>
>
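The replayable-lineage idea in the thread can be sketched in plain Python as a small event-sourcing log: record each transformation as it is applied, then replay a prefix of the log to reconstruct the state at any step T. The `Lineage` class and transform names below are hypothetical illustrations, not a Spark API (Spark can print its plan via `df.explain()` or `rdd.toDebugString()`, but does not persist a replayable lineage out of the box):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Lineage:
    """A replayable log of named transformations (event-sourcing style)."""
    transforms: Dict[str, Callable]   # name -> transformation function
    log: List[str] = field(default_factory=list)

    def apply(self, state, name):
        """Apply a transformation to `state` and record it in the log."""
        self.log.append(name)
        return self.transforms[name](state)

    def replay(self, initial, upto=None):
        """Rebuild the state at step `upto` by replaying the log prefix."""
        state = initial
        for name in self.log[:upto]:
            state = self.transforms[name](state)
        return state
```

Replaying the full log from the same input must reproduce the final result, which is exactly the property needed to answer "how did the system arrive at state S at time T".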
