Great to see this progress! :)

On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <r...@google.com> wrote:

> Awesome !
>
> On Thu, 19 Mar 2020, 05:38 Sam Rohde, <sro...@google.com> wrote:
>
>> Hi All!
>>
>>
>>
>> I am happy to announce that an improved Interactive Runner is now
>> available on master. This Python runner allows for the interactive
>> development of Beam pipelines in a notebook (and IPython) environment.
>>
>>
>>
>> The runner still has some bugs that need to be fixed as well as some
>> refactoring, but it is in a good enough shape to start using it.
>>
>>
>>
>> Here are the new things you can do with the Interactive Runner:
>>
>>    -
>>
>>    Create and execute pipelines within a REPL
>>    -
>>
>>    Visualize elements as the pipeline is running
>>    -
>>
>>    Materialize PCollections to DataFrames
>>    -
>>
>>    Record unbounded sources for deterministic replay
>>    -
>>
>>    Replay cached unbounded sources including watermark advancements
>>
>> The code lives in sdks/python/apache_beam/runners/interactive
>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
>> and example notebooks are in
>> sdks/python/apache_beam/runners/interactive/examples
>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
>> .
>>
>>
>>
>> To install, use `pip install -e .[interactive]` in your <project
>> root>/sdks/python directory.
>>
>> To run, here’s a quick example:
>>
>> ```
>>
>> import apache_beam as beam
>>
>> from apache_beam.runners.interactive.interactive_runner import
>> InteractiveRunner
>>
>> import apache_beam.runners.interactive.interactive_beam as ib
>>
>>
>>
>> p = beam.Pipeline(InteractiveRunner())
>>
>> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>>
>> counts = words | 'count' >> beam.combiners.Count.PerElement()
>>
>>
>>
>> # Shows a dynamically updating display of the PCollection elements
>>
>> ib.show(counts)
>>
>>
>>
>> # We can now visualize the data using standard pandas operations.
>>
>> df = ib.collect(counts)
>>
>> print(df.info())
>>
>> print(df.describe())
>>
>>
>>
>> # Plot the top-10 counted words
>>
>> df = df.sort_values(by=1, ascending=False)
>>
>> df.head(n=10).plot(x=0, y=1)
>>
>> ```
>>
>>
>>
>> Currently, Batch is supported on any runner. Streaming is only supported
>> on the DirectRunner (non-FnAPI).
>>
>>
>>
>> I would like to thank the great work of Sindy (@sindyli) and Harsh
>> (@ananvay) for the initial implementation,
>>
>> David Yan (@davidyan) who led the project, Ning (@ningk) and myself
>> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel
>> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
>> lot of their time to help with the design and code reviews.
>>
>>
>>
>> It was a team effort and we wouldn't have been able to complete it
>> without the help of everyone involved.
>>
>>
>>
>> Regards,
>>
>> Sam
>>
>>

Reply via email to