Re: is there a way to persist the lineages generated by spark?

2017-04-07 Thread kant kodali
Yes, lineage that is actually replayable is what is needed for the validation process, so that we can answer questions like how a system arrived at state S at time T. I guess a good analogy is event sourcing.
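The event-sourcing analogy can be sketched in plain Python. It assumes lineage is kept as an append-only log of timestamped transformation events, so the state at time T is rebuilt by replaying every event recorded up to T against the original input. This is not a Spark API. `LineageLog`, `record` and `replay` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class LineageLog:
    """Hypothetical append-only log of (timestamp, step name, transformation) events."""
    initial: Any
    events: List[Tuple[int, str, Callable[[Any], Any]]] = field(default_factory=list)

    def record(self, ts: int, name: str, fn: Callable[[Any], Any]) -> None:
        # Replay only makes sense if events are appended in timestamp order.
        if self.events and ts < self.events[-1][0]:
            raise ValueError("events must be recorded in timestamp order")
        self.events.append((ts, name, fn))

    def replay(self, until_ts: int) -> Tuple[Any, List[str]]:
        # Rebuild the state at `until_ts` from the initial input and return it
        # together with the names of the steps that produced it.
        state = self.initial
        applied: List[str] = []
        for ts, name, fn in self.events:
            if ts > until_ts:
                break
            state = fn(state)
            applied.append(name)
        return state, applied


log = LineageLog(initial=[3, 1, 4, 1, 5, 9, 2, 6])
log.record(1, "filter(x > 2)", lambda xs: [x for x in xs if x > 2])
log.record(2, "map(x * 10)", lambda xs: [x * 10 for x in xs])
log.record(3, "sum", lambda xs: sum(xs))

state, steps = log.replay(until_ts=2)
print(steps, state)  # ['filter(x > 2)', 'map(x * 10)'] [30, 40, 50, 90, 60]
```

In a real system the events would have to be stored in a serializable form (for example a transformation name plus its parameters, together with a snapshot of the input), because Python lambdas cannot be pickled with the standard `pickle` module.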

Re: is there a way to persist the lineages generated by spark?

2017-04-06 Thread Jörn Franke
I do think this is the right way: you will have to do testing with test data, verifying that the output of the calculation matches the expected output. Even if the logical plan is correct, your calculation might not be, e.g. there can be bugs in Spark, in the UI or (what is very often the case) the client…
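A minimal sketch of this testing approach, assuming the dashboard computation can be run on a small, hand-checked input and its output compared with an independently worked-out expected result. `revenue_by_region` is a hypothetical stand-in written in plain Python. Against a real Spark job, the same assertion would be made on the job's output for the test input.

```python
from collections import defaultdict


def revenue_by_region(orders):
    """Hypothetical stand-in for the dashboard computation: total revenue per region."""
    totals = defaultdict(float)
    for region, amount in orders:
        totals[region] += amount
    return dict(totals)


def test_known_input_gives_expected_output():
    # Small input whose correct answer was worked out by hand, independently of the code.
    orders = [("EU", 10.0), ("US", 5.0), ("EU", 2.5), ("APAC", 0.0)]
    expected = {"EU": 12.5, "US": 5.0, "APAC": 0.0}
    assert revenue_by_region(orders) == expected


def test_empty_input():
    assert revenue_by_region([]) == {}


if __name__ == "__main__":
    # The test_* functions are also discovered automatically by pytest.
    test_known_input_gives_expected_output()
    test_empty_input()
    print("all checks passed")
```

A check like this validates the result itself, which holds even when the plan looks right but the code, the engine or the display layer has a bug.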

Re: is there a way to persist the lineages generated by spark?

2017-04-06 Thread Gourav Sengupta
Hi, I think that every client wants a validation process, but showing lineage is not an approach they are asking for, and it may not be the right way to prove correctness. Regards, Gourav

Re: is there a way to persist the lineages generated by spark?

2017-04-03 Thread ayan guha
How about storing the logical plans (or the output of toDebugString, in the case of RDDs) in an external file on the driver?

is there a way to persist the lineages generated by spark?

2017-04-03 Thread kant kodali
Hi All, I am wondering if there is a way to persist the lineages generated by Spark underneath? Some of our clients want us to prove that the result of the computation we are showing on a dashboard is correct, and for that, if we can show the lineage of transformations that are executed to get to…