This is not quite what you are asking, but I often save intermediate results to Parquet files so I can diagnose problems and rebuild data from a known-good state without having to re-run every processing step.
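In Spark itself, saving an intermediate result is just `df.write.parquet(path)` and reading it back is `spark.read.parquet(path)`. As a minimal, self-contained sketch of the same checkpoint-and-resume pattern (using plain Python and JSON files in place of Spark and Parquet, so it runs anywhere; the function names are hypothetical):

```python
import json
import os
import tempfile

def checkpoint(records, path):
    # Persist an intermediate result so later steps can resume from it.
    with open(path, "w") as f:
        json.dump(records, f)

def load_or_compute(path, compute):
    # Rebuild from the last known-good state instead of re-running
    # every upstream processing step.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    checkpoint(result, path)
    return result

tmp = tempfile.mkdtemp()
stage1 = load_or_compute(os.path.join(tmp, "stage1.json"),
                         lambda: [x * 2 for x in range(5)])
stage2 = load_or_compute(os.path.join(tmp, "stage2.json"),
                         lambda: [x + 1 for x in stage1])
print(stage2)  # [1, 3, 5, 7, 9]
```

If a downstream step misbehaves, you can inspect `stage1.json` directly and re-run only the failing step against it, rather than recomputing the whole pipeline.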
On Fri, Apr 7, 2017 at 1:08 AM, kant kodali <kanth...@gmail.com> wrote:

> Yes, lineage that is actually replayable is what is needed for the validation
> process, so we can address questions like how a system arrived at a state S
> at a time T. I guess a good analogy is event sourcing.
>
> On Thu, Apr 6, 2017 at 10:30 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> I do think this is the right way; you will have to do testing with test
>> data, verifying that the expected output of the calculation is the actual
>> output. Even if the logical plan is correct, your calculation might not be.
>> E.g. there can be bugs in Spark, in the UI, or (what is very often the
>> case) the client describes a calculation, but in the end the description
>> is wrong.
>>
>> > On 4. Apr 2017, at 05:19, kant kodali <kanth...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > I am wondering if there is a way to persist the lineages generated by
>> > Spark underneath. Some of our clients want us to prove that the result
>> > of the computation we are showing on a dashboard is correct, and for
>> > that, if we can show the lineage of transformations that were executed
>> > to get to the result, then that can be the Q.E.D. moment. But I am not
>> > even sure if this is possible with Spark?
>> >
>> > Thanks,
>> > kant
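Spark can print its logical and physical plans via `df.explain()`, but that text is not replayable on its own. The event-sourcing analogy above can be sketched in plain Python: log each transformation as a named event, then replay the log from the original input to verify that the dashboard result is reproducible. Everything here (`AuditedPipeline`, the step registry) is a hypothetical illustration, not a Spark API:

```python
class AuditedPipeline:
    """Record each transformation as an event so the result can be replayed."""

    def __init__(self, data):
        self.data = data
        self.log = []  # ordered event log of applied step names

    def apply(self, name, fn):
        # Append the step to the log before applying it.
        self.log.append(name)
        self.data = fn(self.data)
        return self

    def replay(self, initial, registry):
        # Re-run the logged steps from the original input to
        # independently reproduce the final state.
        data = initial
        for name in self.log:
            data = registry[name](data)
        return data

# Named, deterministic transformation steps.
registry = {
    "double": lambda xs: [x * 2 for x in xs],
    "positive_only": lambda xs: [x for x in xs if x > 0],
}

p = AuditedPipeline([-2, -1, 0, 1, 2])
for step in ["double", "positive_only"]:
    p.apply(step, registry[step])

print(p.log)   # ['double', 'positive_only']
print(p.data)  # [2, 4]
# Validation: replaying the persisted lineage reproduces the result.
assert p.data == p.replay([-2, -1, 0, 1, 2], registry)
```

This only demonstrates the replayable-lineage idea; it does not address Jörn's point that the logged steps themselves must be tested against known inputs and expected outputs, since a faithfully replayed wrong calculation is still wrong.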