Awesome !

On Thu, 19 Mar 2020, 05:38 Sam Rohde, <[email protected]> wrote:

> Hi All!
>
>
>
> I am happy to announce that an improved Interactive Runner is now
> available on master. This Python runner allows for the interactive
> development of Beam pipelines in a notebook (and IPython) environment.
>
>
>
> The runner still has some bugs that need to be fixed as well as some
> refactoring, but it is in a good enough shape to start using it.
>
>
>
> Here are the new things you can do with the Interactive Runner:
>
>    -
>
>    Create and execute pipelines within a REPL
>    -
>
>    Visualize elements as the pipeline is running
>    -
>
>    Materialize PCollections to DataFrames
>    -
>
>    Record unbounded sources for deterministic replay
>    -
>
>    Replay cached unbounded sources including watermark advancements
>
> The code lives in sdks/python/apache_beam/runners/interactive
> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
> and example notebooks are in
> sdks/python/apache_beam/runners/interactive/examples
> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
> .
>
>
>
> To install, use `pip install -e .[interactive]` in your <project
> root>/sdks/python directory.
>
> To run, here’s a quick example:
>
> ```
>
> import apache_beam as beam
>
> from apache_beam.runners.interactive.interactive_runner import
> InteractiveRunner
>
> import apache_beam.runners.interactive.interactive_beam as ib
>
>
>
> p = beam.Pipeline(InteractiveRunner())
>
> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>
> counts = words | 'count' >> beam.combiners.Count.PerElement()
>
>
>
> # Shows a dynamically updating display of the PCollection elements
>
> ib.show(counts)
>
>
>
> # We can now visualize the data using standard pandas operations.
>
> df = ib.collect(counts)
>
> print(df.info())
>
> print(df.describe())
>
>
>
> # Plot the top-10 counted words
>
> df = df.sort_values(by=1, ascending=False)
>
> df.head(n=10).plot(x=0, y=1)
>
> ```
>
>
>
> Currently, Batch is supported on any runner. Streaming is only supported
> on the DirectRunner (non-FnAPI).
>
>
>
> I would like to thank the great work of Sindy (@sindyli) and Harsh
> (@ananvay) for the initial implementation,
>
> David Yan (@davidyan) who led the project, Ning (@ningk) and myself
> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel
> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
> lot of their time to help with the design and code reviews.
>
>
>
> It was a team effort and we wouldn't have been able to complete it without
> the help of everyone involved.
>
>
>
> Regards,
>
> Sam
>
>

Reply via email to