Awesome ! On Thu, 19 Mar 2020, 05:38 Sam Rohde, <[email protected]> wrote:
> Hi All! > > > > I am happy to announce that an improved Interactive Runner is now > available on master. This Python runner allows for the interactive > development of Beam pipelines in a notebook (and IPython) environment. > > > > The runner still has some bugs that need to be fixed as well as some > refactoring, but it is in a good enough shape to start using it. > > > > Here are the new things you can do with the Interactive Runner: > > - > > Create and execute pipelines within a REPL > - > > Visualize elements as the pipeline is running > - > > Materialize PCollections to DataFrames > - > > Record unbounded sources for deterministic replay > - > > Replay cached unbounded sources including watermark advancements > > The code lives in sdks/python/apache_beam/runners/interactive > <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive> > and example notebooks are in > sdks/python/apache_beam/runners/interactive/examples > <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples> > . > > > > To install, use `pip install -e .[interactive]` in your <project > root>/sdks/python directory. > > To run, here’s a quick example: > > ``` > > import apache_beam as beam > > from apache_beam.runners.interactive.interactive_runner import > InteractiveRunner > > import apache_beam.runners.interactive.interactive_beam as ib > > > > p = beam.Pipeline(InteractiveRunner()) > > words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be']) > > counts = words | 'count' >> beam.combiners.Count.PerElement() > > > > # Shows a dynamically updating display of the PCollection elements > > ib.show(counts) > > > > # We can now visualize the data using standard pandas operations. > > df = ib.collect(counts) > > print(df.info()) > > print(df.describe()) > > > > # Plot the top-10 counted words > > df = df.sort_values(by=1, ascending=False) > > df.head(n=10).plot(x=0, y=1) > > ``` > > > > Currently, Batch is supported on any runner. Streaming is only supported > on the DirectRunner (non-FnAPI). > > > > I would like to thank the great work of Sindy (@sindyli) and Harsh > (@ananvay) for the initial implementation, > > David Yan (@davidyan) who led the project, Ning (@ningk) and myself > (@srohde) for the implementation and design, and Ahmet (@altay), Daniel > (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a > lot of their time to help with the design and code reviews. > > > > It was a team effort and we wouldn't have been able to complete it without > the help of everyone involved. > > > > Regards, > > Sam > >
