Re: Proposing interactive beam runner

Kenneth Knowles Mon, 25 Jun 2018 10:59:18 -0700

On Mon, Jun 25, 2018 at 10:47 AM Harsh Vardhan <anan...@google.com> wrote:


>
>
> On Thu, Jun 21, 2018 at 1:41 PM Kenneth Knowles <k...@google.com> wrote:
>
>> I'm not convinced the Scio approach is hacky :-) and anyhow it is at its
>> core the same approach as this one, isn't it? The difference is just making
>> run() spin up and read the temporary location instantly, which this demo
>> achieves by working in memory on one machine. No?
>>
>
> Partially -- the current demo uses the InteractiveRunner with DirectRunner
> execution. However, the InteractiveRunner should be extensible so that it
> can be used with other, distributed runners (e.g. a Flink cluster or Cloud
> Dataflow).
>

Yes, I was digging into what "should be extensible" would mean when someone
did the work. I expect it will look like Scio.

Kenn
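
For readers following the thread, the pattern under discussion (run the
pipeline, save intermediate collections to a temporary location, read them
back into memory, and skip recomputation on re-run) can be sketched roughly
as below. This is a minimal plain-Python illustration, not Scio or
InteractiveRunner code: `MaterializingSession`, `materialize`, and
`word_lengths` are hypothetical names, and JSON-lines files in a temporary
directory are an assumed stand-in for whatever storage a real runner would use.

```python
import json
import os
import tempfile


class MaterializingSession:
    """Toy stand-in for an interactive session (hypothetical, not a Beam or Scio API)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.executions = 0  # counts how many steps were actually computed

    def _path(self, name):
        return os.path.join(self.cache_dir, name + ".jsonl")

    def materialize(self, name, compute):
        """Return the named collection, computing and saving it only if needed."""
        path = self._path(name)
        if not os.path.exists(path):
            self.executions += 1
            with open(path, "w") as f:
                for element in compute():
                    f.write(json.dumps(element) + "\n")
        # Always read back from the temporary location, as a notebook would.
        with open(path) as f:
            return [json.loads(line) for line in f]


def word_lengths(session, words):
    """Two dependent steps; each result is saved under its own name."""
    lowered = session.materialize("lowered", lambda: [w.lower() for w in words])
    return session.materialize("lengths", lambda: [[w, len(w)] for w in lowered])


with tempfile.TemporaryDirectory() as tmp:
    session = MaterializingSession(tmp)
    first = word_lengths(session, ["Beam", "Scio"])
    runs_after_first = session.executions
    second = word_lengths(session, ["Beam", "Scio"])  # served from saved files
    runs_after_second = session.executions
```

The second call computes nothing new because both intermediate results are
already saved, which is the "fast re-execution" idea in miniature. A
distributed runner would write to shared storage instead of a local directory.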


>
>
>> Kenn
>>
>> On Thu, Jun 21, 2018 at 1:16 PM Harsh Vardhan <anan...@google.com> wrote:
>>
>>> This is targeting Python SDK + DirectRunner to start with. We can
>>> explore ways to make this applicable to other SDKs and Runners.
>>>
>>> On Fri, Jun 15, 2018 at 1:00 PM Neville Li <nevi...@spotify.com> wrote:
>>>
>>>> Is this targeting mainly Python SDK and DirectRunner for now?
>>>>
>>>> Our interactivity solution is hacky, basically calling
>>>> `pipeline.run()`, saving PCollections to temporary locations and loading
>>>> them into memory.
>>>>
>>>> I think the most requested feature is the ability to inspect
>>>> PCollection content during execution and conditionally alter the DAG,
>>>> similar to what the Spark driver does. But this would require a
>>>> significant change to the DataflowRunner execution model?
>>>>
>>>> On Fri, Jun 15, 2018 at 9:56 AM, Kenneth Knowles <k...@google.com>
>>>> wrote:
>>>>
>>>>> Nice! As-is, this already looks useful for making Beam accessible.
>>>>>
>>>>> Commented a bit on the doc to highlight where SQL is different from
>>>>> Scio/Python style. I think notebooks are the perfect target. Specifically,
>>>>> Python and SQL in the same notebook would be amazing.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Jun 14, 2018 at 2:04 PM Sindy Li <qiny...@google.com> wrote:
>>>>>
>>>>>> Thanks Ahmet,
>>>>>>
>>>>>> We know quite a few teams at Google are interested in running interactive
>>>>>> Beam pipelines, especially in Python for machine learning -- some are
>>>>>> already using it interactively in their own way. So instead of each of
>>>>>> those teams developing its own interactive solution, we want one
>>>>>> repository that people can contribute to. We could also provide better
>>>>>> features like fast re-execution, as shown in the demo.
>>>>>>
>>>>>> Thanks,
>>>>>> Sindy
>>>>>>
>>>>>> On Wed, Jun 13, 2018 at 5:48 PM, Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you Sindy.
>>>>>>>
>>>>>>> I like the demo; it looks great. This would be interesting to a lot
>>>>>>> of users. What are your plans for moving this forward? What kind of
>>>>>>> input are you looking for?
>>>>>>>
>>>>>>> Ahmet
>>>>>>>
>>>>>>> On Wed, Jun 13, 2018 at 2:32 PM, Eugene Kirpichov <
>>>>>>> kirpic...@google.com> wrote:
>>>>>>>
>>>>>>>> This is awesome, thanks Sindy! I hope that the questions related to
>>>>>>>> portability will get resolved in a way that allows reusing some of the
>>>>>>>> work for other interactive Beam experiences, including SQL as Andrew
>>>>>>>> says, and providing a REPL, e.g. for users of Scala or other JVM-based
>>>>>>>> languages.
>>>>>>>>
>>>>>>>> +Neville Li <nevi...@spotify.com> Do I remember correctly that you
>>>>>>>> guys had some sort of interactivity going in Scio but were looking
>>>>>>>> forward to Beam developing a native solution?
>>>>>>>>
>>>>>>>> On Wed, Jun 13, 2018 at 2:22 PM Sindy Li <qiny...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Andrew!
>>>>>>>>>
>>>>>>>>> Here is a link to the demo on YouTube for people interested:
>>>>>>>>> https://www.youtube.com/watch?v=c5CjA1e3Cqw&feature=youtu.be
>>>>>>>>>
>>>>>>>>> On Wed, Jun 13, 2018 at 1:23 PM, Andrew Pilloud <
>>>>>>>>> apill...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> This sounds really interesting, thanks for sharing! We've just
>>>>>>>>>> begun to explore making Beam SQL interactive. The Interactive Runner
>>>>>>>>>> you've proposed sounds like it would solve a bunch of the problems
>>>>>>>>>> SQL faces as well. SQL is written in Java right now, so we can't
>>>>>>>>>> immediately reuse any code.
>>>>>>>>>>
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 13, 2018 at 11:48 AM Sindy Li <qiny...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Resending after subscribing to dev list.
>>>>>>>>>>>
>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>> From: Sindy Li <qiny...@google.com>
>>>>>>>>>>> Date: Fri, Jun 8, 2018 at 5:57 PM
>>>>>>>>>>> Subject: Proposing interactive beam runner
>>>>>>>>>>> To: dev@beam.apache.org
>>>>>>>>>>> Cc: Harsh Vardhan <anan...@google.com>, Chamikara Jayalath <
>>>>>>>>>>> chamik...@google.com>, Anand Iyer <ian...@google.com>, Robert
>>>>>>>>>>> Bradshaw <rober...@google.com>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> We were exploring ways to provide an interactive notebook
>>>>>>>>>>> experience for writing Beam Python pipelines. The design doc
>>>>>>>>>>> <https://docs.google.com/document/d/10bTc97GN5Wk-nhwncqNq9_XkJFVVy0WLT4gPFqP6Kmw/edit?usp=sharing>
>>>>>>>>>>> provides an overview/vision of what we would like to achieve. The
>>>>>>>>>>> pull request <https://github.com/apache/beam/pull/5595> provides a
>>>>>>>>>>> prototype. The document also provides demo screenshots and
>>>>>>>>>>> instructions for running a demo in Jupyter. Please take a look.
>>>>>>>>>>> We believe this would be a useful addition to Beam.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
