Nice! On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <[email protected]> wrote:
> Great to see this progress! :) > > On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <[email protected]> wrote: > >> Awesome ! >> >> On Thu, 19 Mar 2020, 05:38 Sam Rohde, <[email protected]> wrote: >> >>> Hi All! >>> >>> >>> >>> I am happy to announce that an improved Interactive Runner is now >>> available on master. This Python runner allows for the interactive >>> development of Beam pipelines in a notebook (and IPython) environment. >>> >>> >>> >>> The runner still has some bugs that need to be fixed as well as some >>> refactoring, but it is in a good enough shape to start using it. >>> >>> >>> >>> Here are the new things you can do with the Interactive Runner: >>> >>> - >>> >>> Create and execute pipelines within a REPL >>> - >>> >>> Visualize elements as the pipeline is running >>> - >>> >>> Materialize PCollections to DataFrames >>> - >>> >>> Record unbounded sources for deterministic replay >>> - >>> >>> Replay cached unbounded sources including watermark advancements >>> >>> The code lives in sdks/python/apache_beam/runners/interactive >>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive> >>> and example notebooks are in >>> sdks/python/apache_beam/runners/interactive/examples >>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples> >>> . >>> >>> >>> >>> To install, use `pip install -e .[interactive]` in your <project >>> root>/sdks/python directory. >>> >>> To run, here’s a quick example: >>> >>> ``` >>> >>> import apache_beam as beam >>> >>> from apache_beam.runners.interactive.interactive_runner import >>> InteractiveRunner >>> >>> import apache_beam.runners.interactive.interactive_beam as ib >>> >>> >>> >>> p = beam.Pipeline(InteractiveRunner()) >>> >>> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be']) >>> >>> counts = words | 'count' >> beam.combiners.Count.PerElement() >>> >>> >>> >>> # Shows a dynamically updating display of the PCollection elements >>> >>> ib.show(counts) >>> >>> >>> >>> # We can now visualize the data using standard pandas operations. >>> >>> df = ib.collect(counts) >>> >>> print(df.info()) >>> >>> print(df.describe()) >>> >>> >>> >>> # Plot the top-10 counted words >>> >>> df = df.sort_values(by=1, ascending=False) >>> >>> df.head(n=10).plot(x=0, y=1) >>> >>> ``` >>> >>> >>> >>> Currently, Batch is supported on any runner. Streaming is only supported >>> on the DirectRunner (non-FnAPI). >>> >>> >>> >>> I would like to thank the great work of Sindy (@sindyli) and Harsh >>> (@ananvay) for the initial implementation, >>> >>> David Yan (@davidyan) who led the project, Ning (@ningk) and myself >>> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel >>> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a >>> lot of their time to help with the design and code reviews. >>> >>> >>> >>> It was a team effort and we wouldn't have been able to complete it >>> without the help of everyone involved. >>> >>> >>> >>> Regards, >>> >>> Sam >>> >>>
