On Mon, Jun 25, 2018 at 10:47 AM Harsh Vardhan <anan...@google.com> wrote:
>
> On Thu, Jun 21, 2018 at 1:41 PM Kenneth Knowles <k...@google.com> wrote:
>
>> I'm not convinced the Scio approach is hacky :-) and anyhow it is at its
>> core the same approach as this one isn't it? The difference is just making
>> run() spin up and read the temporary location instantly,
>> which this demo achieves by working in memory on one machine. No?
>>
>
> Partially -- the current demo uses the InteractiveRunner with DirectRunner
> execution. However, the InteractiveRunner should be extensible to allow
> using with other distributed runners (e.g. a Flink cluster or Cloud
> Dataflow).
>

Yes, I was digging into what "should be extensible" would mean when someone
did the work. I expect it will look like Scio.

Kenn

>> Kenn
>>
>> On Thu, Jun 21, 2018 at 1:16 PM Harsh Vardhan <anan...@google.com> wrote:
>>
>>> This is targeting Python SDK + DirectRunner to start with. We can
>>> explore ways to make this applicable to other SDKs and Runners.
>>>
>>> On Fri, Jun 15, 2018 at 1:00 PM Neville Li <nevi...@spotify.com> wrote:
>>>
>>>> Is this targeting mainly Python SDK and DirectRunner for now?
>>>>
>>>> Our interactivity solution is hacky, basically calling
>>>> `pipeline.run()`, saving PCollections to temporary locations and loading it
>>>> to memory.
>>>>
>>>> I think the most requested feature is the ability to inspect
>>>> PCollection content during execution and conditionally alter DAG, similar
>>>> to what the spark driver does. But this would require significant change to
>>>> the DataflowRunner execution model?
>>>>
>>>> On Fri, Jun 15, 2018 at 9:56 AM, Kenneth Knowles <k...@google.com>
>>>> wrote:
>>>>
>>>>> Nice! As-is, this already looks useful for making Beam accessible.
>>>>>
>>>>> Commented a bit on doc to highlight where SQL is different than
>>>>> Scio/Python style. I think notebooks are the perfect target. Specifically,
>>>>> Python and SQL on the same notebook would be amazing.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Jun 14, 2018 at 2:04 PM Sindy Li <qiny...@google.com> wrote:
>>>>>
>>>>>> Thanks Ahmet,
>>>>>>
>>>>>> We know quite a few teams in Google are interested in running
>>>>>> interactive Beam pipelines, especially in Python for Machine
>>>>>> Learning -- some are already using it interactively in their own way.
>>>>>> So instead of those teams each developing their own version of an
>>>>>> interactive solution, we want one repository that people can
>>>>>> contribute to. We could also provide better features like fast
>>>>>> re-execution, as shown in the demo.
>>>>>>
>>>>>> Thanks,
>>>>>> Sindy
>>>>>>
>>>>>> On Wed, Jun 13, 2018 at 5:48 PM, Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you Sindy.
>>>>>>>
>>>>>>> I like the demo; it looks great. This would be interesting to a lot
>>>>>>> of users. What are your plans for moving this forward? What kind of
>>>>>>> input are you looking for?
>>>>>>>
>>>>>>> Ahmet
>>>>>>>
>>>>>>> On Wed, Jun 13, 2018 at 2:32 PM, Eugene Kirpichov
>>>>>>> <kirpic...@google.com> wrote:
>>>>>>>
>>>>>>>> This is awesome, thanks Sindy! I hope that the questions related to
>>>>>>>> portability will get resolved in a way that will allow reusing some
>>>>>>>> of the work for other interactive Beam experiences, including SQL
>>>>>>>> as Andrew says, and providing a REPL e.g. for users of Scala or
>>>>>>>> other JVM-based languages.
>>>>>>>>
>>>>>>>> +Neville Li <nevi...@spotify.com> Do I remember correctly that you
>>>>>>>> guys had some sort of interactivity going in Scio but were looking
>>>>>>>> forward to Beam developing a native solution?
>>>>>>>>
>>>>>>>> On Wed, Jun 13, 2018 at 2:22 PM Sindy Li <qiny...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Andrew!
>>>>>>>>>
>>>>>>>>> Here is a link to the demo on Youtube for people interested:
>>>>>>>>> https://www.youtube.com/watch?v=c5CjA1e3Cqw&feature=youtu.be
>>>>>>>>>
>>>>>>>>> On Wed, Jun 13, 2018 at 1:23 PM, Andrew Pilloud
>>>>>>>>> <apill...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> This sounds really interesting, thanks for sharing! We've just
>>>>>>>>>> begun to explore making Beam SQL interactive. The Interactive
>>>>>>>>>> Runner you've proposed sounds like it would solve a bunch of the
>>>>>>>>>> problems SQL faces as well. SQL is written in Java right now, so
>>>>>>>>>> we can't immediately reuse any code.
>>>>>>>>>>
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 13, 2018 at 11:48 AM Sindy Li <qiny...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Resending after subscribing to dev list.
>>>>>>>>>>>
>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>> From: Sindy Li <qiny...@google.com>
>>>>>>>>>>> Date: Fri, Jun 8, 2018 at 5:57 PM
>>>>>>>>>>> Subject: Proposing interactive beam runner
>>>>>>>>>>> To: dev@beam.apache.org
>>>>>>>>>>> Cc: Harsh Vardhan <anan...@google.com>, Chamikara Jayalath
>>>>>>>>>>> <chamik...@google.com>, Anand Iyer <ian...@google.com>, Robert
>>>>>>>>>>> Bradshaw <rober...@google.com>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> We were exploring ways to provide an interactive notebook
>>>>>>>>>>> experience for writing Beam Python pipelines. The design doc
>>>>>>>>>>> <https://docs.google.com/document/d/10bTc97GN5Wk-nhwncqNq9_XkJFVVy0WLT4gPFqP6Kmw/edit?usp=sharing>
>>>>>>>>>>> provides an overview/vision of what we would like to achieve.
>>>>>>>>>>> Pull request <https://github.com/apache/beam/pull/5595> provides
>>>>>>>>>>> a prototype for the same. The document also provides demo screen
>>>>>>>>>>> shots and instructions for running a demo in Jupyter. Please
>>>>>>>>>>> take a look. We believe this would be a useful addition to Beam.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!