So we had a quick chat about what it would take to add something like SplittableDoFns to Spark. I'd done some sketchy thinking about this last year but didn't get very far.
My back-of-the-envelope design was as follows:

- For input type T and output type V, implement a mapper which outputs type (T, V); if the computation finishes, V will be populated, otherwise T will be (the leftover input to re-run).
- To determine how long to run, we'd go for up to K seconds or listen for a signal on a port.
- Once we're done running, we take the result and filter the records with T and the records with V into separate collections.
- Re-run on the T collection until finished, and then union the V results.

This is maybe not a great design, but it was minimally complicated, and I figured terrible was a good place to start and improve from. Let me know your thoughts, especially the parts where this is worse than I remember, because it's been a while since I thought about this.

--
Twitter: https://twitter.com/holdenkarau
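To make the loop concrete, here's a minimal sketch in plain Python, using lists to stand in for Spark collections. Everything here is hypothetical: `split_work` is a toy stand-in for the user's fn (it "finishes" an element if it fits in the budget, otherwise returns the leftover input), and `run_until_finished` is the filter/re-run/union driver described above.

```python
def split_work(t, budget):
    """Toy stand-in for a splittable fn: process up to `budget`
    units of input t, tagging the result as done or remaining."""
    if t <= budget:
        return ("done", t)            # finished: emit an output V
    return ("remaining", t - budget)  # unfinished: emit leftover input T

def run_until_finished(inputs, budget):
    """Driver loop: map, split the (T, V) records into separate
    collections, re-run on the unfinished ones, union the results."""
    results = []
    remaining = list(inputs)
    while remaining:
        mapped = [split_work(t, budget) for t in remaining]
        # Filter finished outputs (V) into the result collection...
        results += [v for tag, v in mapped if tag == "done"]
        # ...and re-run on the leftover inputs (T).
        remaining = [t for tag, t in mapped if tag == "remaining"]
    return results
```

In real Spark this would be a `mapPartitions` per pass with the time/signal budget enforced inside the mapper, and the union done across passes; the sketch only shows the control flow.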