Great to see this progress! :)

On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <r...@google.com> wrote:
> Awesome!
>
> On Thu, 19 Mar 2020, 05:38 Sam Rohde, <sro...@google.com> wrote:
>
>> Hi All!
>>
>> I am happy to announce that an improved Interactive Runner is now
>> available on master. This Python runner allows for the interactive
>> development of Beam pipelines in a notebook (and IPython) environment.
>>
>> The runner still has some bugs that need to be fixed as well as some
>> refactoring, but it is in a good enough shape to start using it.
>>
>> Here are the new things you can do with the Interactive Runner:
>>
>>    - Create and execute pipelines within a REPL
>>    - Visualize elements as the pipeline is running
>>    - Materialize PCollections to DataFrames
>>    - Record unbounded sources for deterministic replay
>>    - Replay cached unbounded sources including watermark advancements
>>
>> The code lives in sdks/python/apache_beam/runners/interactive
>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
>> and example notebooks are in
>> sdks/python/apache_beam/runners/interactive/examples
>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>.
>>
>> To install, use `pip install -e .[interactive]` in your
>> <project root>/sdks/python directory.
>>
>> To run, here’s a quick example:
>>
>> ```
>> import apache_beam as beam
>> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
>> import apache_beam.runners.interactive.interactive_beam as ib
>>
>> p = beam.Pipeline(InteractiveRunner())
>> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>> counts = words | 'count' >> beam.combiners.Count.PerElement()
>>
>> # Shows a dynamically updating display of the PCollection elements
>> ib.show(counts)
>>
>> # We can now visualize the data using standard pandas operations.
>> df = ib.collect(counts)
>> print(df.info())
>> print(df.describe())
>>
>> # Plot the top-10 counted words
>> df = df.sort_values(by=1, ascending=False)
>> df.head(n=10).plot(x=0, y=1)
>> ```
>>
>> Currently, Batch is supported on any runner. Streaming is only supported
>> on the DirectRunner (non-FnAPI).
>>
>> I would like to thank the great work of Sindy (@sindyli) and Harsh
>> (@ananvay) for the initial implementation, David Yan (@davidyan) who led
>> the project, Ning (@ningk) and myself (@srohde) for the implementation
>> and design, and Ahmet (@altay), Daniel (@millsd), Pablo (@pabloem), and
>> Robert (@robertwb) who all contributed a lot of their time to help with
>> the design and code reviews.
>>
>> It was a team effort and we wouldn't have been able to complete it
>> without the help of everyone involved.
>>
>> Regards,
>> Sam