It would be great to have the ValidatesRunner suite of tests start executing against Flink/ULR as it will make sure things don't break and are reproducible.
On Wed, Jun 27, 2018 at 12:34 PM Eugene Kirpichov <kirpic...@google.com> wrote: > Hi! > > Those instructions are not current and I think should be discarded as they > referred to a particular effort that is over - +Ankur Goenka > <goe...@google.com> is, I believe, working on the remaining finishing > touches for running from a clean clone of Beam master and documenting how > to do that; could you help Thomas so we can start looking at what the > streaming runner is missing? > > We'll need to document this in a more prominent place. When we get to a > state where we can run Python WordCount from master, we'll need to document > it somewhere on the main portability page and/or the getting started guide; > when we can run something more serious, e.g. Tensorflow pipelines, that > will be worth a Beam blog post and worth documenting in the TFX > documentation. > > On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <t...@apache.org> wrote: > >> Hi Eugene, >> >> The basic streaming translation is already in place from the prototype, >> though I have not verified it on the master branch yet. >> >> Are the user instructions for the portable Flink runner at >> https://s.apache.org/beam-portability-team-doc current? >> >> (I don't have a dependency on SDF since we are going to use custom native >> Flink sources/sinks at this time.) >> >> Thanks, >> Thomas >> >> >> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <kirpic...@google.com> >> wrote: >> >>> Hi! >>> >>> Wanted to let you know that I've just merged the PR that adds >>> checkpointable SDF support to the portable reference runner (ULR) and the >>> Java SDK harness: >>> >>> https://github.com/apache/beam/pull/5566 >>> >>> So now we have a reference implementation of SDF support in a portable >>> runner, and a reference implementation of SDF support in a portable SDK >>> harness. >>> From here on, we need to replicate this support in other portable >>> runners and other harnesses. The obvious targets are Flink and Python >>> respectively. >>> >>> Chamikara was going to work on the Python harness. +Thomas Weise >>> <t...@apache.org> Would you be interested in the Flink portable >>> streaming runner side? It is of course blocked by having the rest of that >>> runner working in streaming mode though (the batch mode is practically done >>> - will send you a separate note about the status of that). >>> >>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <kirpic...@google.com> >>> wrote: >>> >>>> Luke is right - unbounded sources should go through SDF. I am currently >>>> working on adding such support to Fn API. >>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it >>>> focuses on a much more general case, but also considers in detail the >>>> specific case of running unbounded sources on Fn API), and the first >>>> related PR is https://github.com/apache/beam/pull/4743 . >>>> >>>> Ways you can help speed up this effort: >>>> - Make necessary changes to Apex runner per se to support regular SDFs >>>> in streaming (without portability). They will likely largely carry over to >>>> portable world. I recall that the Apex runner had some level of support of >>>> SDFs, but didn't pass the ValidatesRunner tests yet. >>>> - (general to Beam, not Apex-related per se) Implement the translation >>>> of Read.from(UnboundedSource) via impulse, which will require implementing >>>> an SDF that reads from a given UnboundedSource (taking the UnboundedSource >>>> as an element). This should be fairly straightforward and will allow all >>>> portable runners to take advantage of existing UnboundedSource's. >>>> >>>> >>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote: >>>> >>>>> Using impulse is a precursor for both bounded and unbounded SDF. >>>>> >>>>> This JIRA represents the work that would be to add support for >>>>> unbounded SDF using portability APIs: >>>>> https://issues.apache.org/jira/browse/BEAM-2939 >>>>> >>>>> >>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <t...@apache.org> wrote: >>>>> >>>>>> So for streaming, we will need the Impulse translation for bounded >>>>>> input, identical with batch, and then in addition to that support for >>>>>> SDF? >>>>>> >>>>>> Any pointers what's involved in adding the SDF support? Is it runner >>>>>> specific? Does the ULR cover it? >>>>>> >>>>>> >>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com> >>>>>> wrote: >>>>>> >>>>>>> All "sources" in portability will use splittable DoFns for execution. >>>>>>> >>>>>>> Specifically, runners will need to be able to checkpoint unbounded >>>>>>> sources to get a minimum viable pipeline working. >>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded >>>>>>> source. >>>>>>> >>>>>>> >>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <t...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I'm looking at the portable pipeline translation for streaming. I >>>>>>>> understand that for batch pipelines, it is sufficient to translate >>>>>>>> Impulse. >>>>>>>> >>>>>>>> What is the intended path to support unbounded sources? >>>>>>>> >>>>>>>> The goal here is to get a minimum translation working that will >>>>>>>> allow streaming wordcount execution. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Thomas >>>>>>>> >>>>>>>> >>>>>>