So we had a quick chat about what it would take to add something like SplittableDoFns to Spark. I'd done some sketchy thinking about this last year but didn't get very far.
My back-of-the-envelope design was as follows:

- For input type T and output type V, implement a mapper which outputs type (T, V); if the computation finishes, V will be populated, otherwise T will be (the leftover input to re-run).
- To determine how long to run, we'd go for up to K seconds or listen for a signal on a port.
- Once we're done running, we take the result and filter the records with T and the records with V into separate collections.
- Re-run on the T collection until finished, and then union the V results.

This is maybe not a great design, but it was minimally complicated, and I figured terrible was a good place to start and improve from. Let me know your thoughts, especially the parts where this is worse than I remember, because it's been a while since I thought about this.

--
Twitter: https://twitter.com/holdenkarau
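To make the loop concrete, here's a minimal sketch in plain Python, using lists to stand in for Spark collections. Everything here is hypothetical: `split_work` is a toy stand-in for the user's fn (it "finishes" an element if it fits in the budget, otherwise returns the leftover input), and `run_until_finished` is the filter/re-run/union driver described above.

```python
def split_work(t, budget):
    """Toy stand-in for a splittable fn: process up to `budget`
    units of input t, tagging the result as done or remaining."""
    if t <= budget:
        return ("done", t)            # finished: emit an output V
    return ("remaining", t - budget)  # unfinished: emit leftover input T

def run_until_finished(inputs, budget):
    """Driver loop: map, split the (T, V) records into separate
    collections, re-run on the unfinished ones, union the results."""
    results = []
    remaining = list(inputs)
    while remaining:
        mapped = [split_work(t, budget) for t in remaining]
        # Filter finished outputs (V) into the result collection...
        results += [v for tag, v in mapped if tag == "done"]
        # ...and re-run on the leftover inputs (T).
        remaining = [t for tag, t in mapped if tag == "remaining"]
    return results
```

In real Spark this would be a `mapPartitions` per pass with the time/signal budget enforced inside the mapper, and the union done across passes; the sketch only shows the control flow.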