Reviving this thread. I think SDF is a pretty big risk for Spark runner streaming. Holden, is it correct that Spark appears to have no way at all to produce an infinite DStream from a finite RDD? Maybe we could dynamically create a new DStream for every initial restriction, with each DStream obtained from a Receiver that actually runs the SDF under the hood? (This is of course less efficient than what a timer-capable runner would do, and I have doubts about the fault tolerance.)
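To make the Receiver idea a bit more concrete, here is a rough sketch in plain Python with no Spark dependency. Everything in it is hypothetical: SdfReceiver only imitates the shape of a receiver (a background thread that buffers records for the next micro-batch), and read_range stands in for an SDF's processing of a single restriction. It is not the Spark Receiver API and not a proposed implementation.

```python
import queue
import threading


class SdfReceiver:
    """Hypothetical stand-in for a receiver: a background thread runs the
    SDF over one restriction and buffers whatever it emits."""

    def __init__(self, process_fn, restriction):
        self._process_fn = process_fn      # generator: restriction -> outputs
        self._restriction = restriction
        self._buffer = queue.Queue()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Ask the processing loop to exit, then wait for it.
        self._stopped.set()
        self._thread.join()

    def wait_until_done(self):
        # Only returns for a finite restriction.
        self._thread.join()

    def _run(self):
        for out in self._process_fn(self._restriction):
            if self._stopped.is_set():
                break
            self._buffer.put(out)

    def next_batch(self):
        """Drain everything buffered since the previous micro-batch."""
        batch = []
        while True:
            try:
                batch.append(self._buffer.get_nowait())
            except queue.Empty:
                return batch


def read_range(restriction):
    """Toy SDF body: emit every offset in [start, end)."""
    start, end = restriction
    for i in range(start, end):
        yield i


# One receiver (and hence one stream) per initial restriction.
receivers = [SdfReceiver(read_range, r) for r in [(0, 5), (5, 10)]]
for r in receivers:
    r.start()
for r in receivers:
    r.wait_until_done()
batches = [r.next_batch() for r in receivers]
```

Note that nothing here records how far into the restriction the receiver got, so a crashed receiver would have to restart its restriction from the beginning. That is the fault-tolerance doubt in miniature.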
On Wed, Mar 14, 2018 at 10:26 PM Reuven Lax <re...@google.com> wrote:

> How would timers be implemented? By outputting and reprocessing, the same
> way you proposed for SDF?
>
>
> On Wed, Mar 14, 2018 at 7:25 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> So the timers would have to be in our own code.
>>
>> On Wed, Mar 14, 2018 at 5:18 PM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Does Spark have support for timers? (I know it has support for state)
>>>
>>> On Wed, Mar 14, 2018 at 4:43 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> Could we alternatively use a state mapping function to keep track of
>>>> the computation so far instead of outputting V each time? (also the
>>>> progress so far is probably of a different type R rather than V).
>>>>
>>>>
>>>> On Wed, Mar 14, 2018 at 4:28 PM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> So we had a quick chat about what it would take to add something like
>>>>> SplittableDoFns to Spark. I'd done some sketchy thinking about this last
>>>>> year but didn't get very far.
>>>>>
>>>>> My back-of-the-envelope design was as follows:
>>>>> For input type T
>>>>> Output type V
>>>>>
>>>>> Implement a mapper which outputs type (T, V)
>>>>> and if the computation finishes T will be populated otherwise V will be
>>>>>
>>>>> For determining how long to run we'd run up to either K seconds or listen
>>>>> for a signal on a port
>>>>>
>>>>> Once we're done running we take the result and filter for the ones
>>>>> with T and V into separate collections re-run until finished
>>>>> and then union the results
>>>>>
>>>>>
>>>>> This is maybe not a great design but it was minimally complicated and
>>>>> I figured terrible was a good place to start and improve from.
>>>>>
>>>>>
>>>>> Let me know your thoughts, especially the parts where this is worse
>>>>> than I remember because it's been a while since I thought about this.
>>>>>
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
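For reference, here is how I read the run / filter / re-run / union loop in Holden's design quoted above, as a small plain-Python sketch with no Spark involved. Several things are my assumptions rather than part of the proposal: an element that finishes carries its output V and an unfinished one carries its remaining input, the budget is counted in steps rather than K seconds, results are tagged explicitly instead of using a (T, V) pair with one side empty, and the names run_bounded, run_until_finished and sum_down are made up for illustration.

```python
def run_bounded(process_step, inputs, budget):
    """One mapper pass: give every element at most `budget` steps.

    process_step(state) must return ("done", output) or ("more", new_state).
    Returns a list of ("finished", output) or ("unfinished", state) tuples.
    """
    results = []
    for state in inputs:
        tagged = ("unfinished", state)
        for _ in range(budget):
            kind, value = process_step(state)
            if kind == "done":
                tagged = ("finished", value)
                break
            state = value
            tagged = ("unfinished", state)
        results.append(tagged)
    return results


def run_until_finished(process_step, inputs, budget):
    """Re-run the unfinished elements until none remain, then return the
    union of everything that finished along the way."""
    if budget < 1:
        raise ValueError("budget must be at least 1 step")
    finished = []
    pending = list(inputs)
    while pending:
        tagged = run_bounded(process_step, pending, budget)
        # Filter the two kinds of result into separate collections.
        finished.extend(v for tag, v in tagged if tag == "finished")
        pending = [s for tag, s in tagged if tag == "unfinished"]
    return finished


def sum_down(state):
    """Toy computation: state is (n, acc); the output is 1 + 2 + ... + n."""
    n, acc = state
    if n == 0:
        return ("done", acc)
    return ("more", (n - 1, acc + n))


outputs = run_until_finished(sum_down, [(3, 0), (10, 0), (0, 0)], budget=4)
```

The state carried between passes here is the (n, acc) pair rather than the output itself, which is roughly Reuven's point that the progress so far is probably of a different type than V.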