Re: Splittable DoFN in Spark discussion

Reuven Lax Wed, 14 Mar 2018 16:44:14 -0700

Could we alternatively use a state mapping function to keep track of the
computation so far instead of outputting V each time? (Also, the progress so
far is probably of a different type R rather than V.)
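
To make the state idea concrete, here is a minimal plain-Python sketch (no Spark involved) in which the progress R is kept in a per-key state map and V is only emitted once an element is done. Everything in it is hypothetical and chosen for illustration: the vowel-counting task, the names `advance` and `run_with_state`, and the step budget that stands in for a time limit. Here R is a (position, count) pair and V is a plain int.

```python
from typing import Dict, Optional, Tuple

Progress = Tuple[int, int]  # hypothetical R: (position reached, vowels counted so far)


def advance(text: str, progress: Progress, budget: int) -> Tuple[Progress, Optional[int]]:
    """Process at most `budget` more characters; return new progress and V if finished."""
    pos, count = progress
    stop = min(len(text), pos + budget)
    count += sum(ch in "aeiou" for ch in text[pos:stop])
    finished = stop == len(text)
    return (stop, count), (count if finished else None)


def run_with_state(inputs: Dict[str, str], budget: int) -> Dict[str, int]:
    """Repeat bounded passes, keeping progress in `state` until every key has an output."""
    if budget <= 0:
        raise ValueError("budget must be positive, otherwise no pass makes progress")
    state: Dict[str, Progress] = {key: (0, 0) for key in inputs}
    outputs: Dict[str, int] = {}
    while state:
        for key in list(state):
            new_progress, out = advance(inputs[key], state[key], budget)
            if out is None:
                state[key] = new_progress  # not done: remember R, emit nothing
            else:
                outputs[key] = out  # done: emit V and drop the state entry
                del state[key]
    return outputs


result = run_with_state({"a": "banana", "b": "splittable"}, budget=4)
```

The point of the sketch is only that unfinished elements never have to produce a V; how this would map onto an actual Spark state API is left open.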



On Wed, Mar 14, 2018 at 4:28 PM Holden Karau <[email protected]> wrote:

> So we had a quick chat about what it would take to add something like
> SplittableDoFns to Spark. I'd done some sketchy thinking about this last
> year but didn't get very far.
>
> My back-of-the-envelope design was as follows:
> For input type T
> Output type V
>
> Implement a mapper which outputs type (T, V),
> and if the computation finishes, T will be populated; otherwise V will be.
>
> For determining how long to run, we'd run for up to K seconds or listen for
> a signal on a port.
>
> Once we're done running, we take the result and filter the ones with T
> and the ones with V into separate collections, re-run until finished,
> and then union the results.
>
>
> This is maybe not a great design but it was minimally complicated and I
> figured terrible was a good place to start and improve from.
>
>
> Let me know your thoughts, especially the parts where this is worse than I
> remember, because it's been a while since I thought about this.
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>
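
The run, filter, re-run, union loop in the quoted design can be sketched in plain Python (no Spark involved). This is a rough illustration under stated assumptions, not a proposed implementation: the range-summing task and the names `run_for_budget` and `run_until_finished` are hypothetical, a step budget stands in for the K-second limit, and the pair is read as carrying the leftover work item when unfinished and the output when finished, which is only one reading of the (T, V) pair.

```python
from typing import List, Optional, Tuple

WorkItem = Tuple[int, int, int]  # hypothetical T: (next index, end index, partial sum)
Pair = Tuple[Optional[WorkItem], Optional[int]]  # exactly one side is populated


def run_for_budget(item: WorkItem, budget: int) -> Pair:
    """Sum at most `budget` more integers of the range, then stop."""
    lo, hi, acc = item
    stop = min(hi, lo + budget)
    acc += sum(range(lo, stop))
    if stop == hi:
        return (None, acc)  # finished: only the output side is set
    return ((stop, hi, acc), None)  # unfinished: only the leftover work is set


def run_until_finished(items: List[WorkItem], budget: int) -> List[int]:
    """Run bounded passes, split the results, re-run leftovers, and union the outputs."""
    if budget <= 0:
        raise ValueError("budget must be positive, otherwise no pass makes progress")
    finished: List[int] = []
    pending = list(items)
    while pending:
        results = [run_for_budget(item, budget) for item in pending]
        # Filter into the two collections: completed outputs and leftover work.
        finished.extend(out for rest, out in results if rest is None)
        pending = [rest for rest, out in results if rest is not None]
    return finished


totals = run_until_finished([(0, 5, 0), (0, 12, 0)], budget=4)
```

Note that the leftover work item here has to carry its own partial sum, which is one way of seeing why the progress might want a separate type.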
