Nice!

On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <[email protected]> wrote:

> Great to see this progress! :)
>
> On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <[email protected]> wrote:
>
>> Awesome !
>>
>> On Thu, 19 Mar 2020, 05:38 Sam Rohde, <[email protected]> wrote:
>>
>>> Hi All!
>>>
>>>
>>>
>>> I am happy to announce that an improved Interactive Runner is now
>>> available on master. This Python runner allows for the interactive
>>> development of Beam pipelines in a notebook (and IPython) environment.
>>>
>>>
>>>
>>> The runner still has some bugs that need to be fixed as well as some
>>> refactoring, but it is in a good enough shape to start using it.
>>>
>>>
>>>
>>> Here are the new things you can do with the Interactive Runner:
>>>
>>>    -
>>>
>>>    Create and execute pipelines within a REPL
>>>    -
>>>
>>>    Visualize elements as the pipeline is running
>>>    -
>>>
>>>    Materialize PCollections to DataFrames
>>>    -
>>>
>>>    Record unbounded sources for deterministic replay
>>>    -
>>>
>>>    Replay cached unbounded sources including watermark advancements
>>>
>>> The code lives in sdks/python/apache_beam/runners/interactive
>>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>
>>> and example notebooks are in
>>> sdks/python/apache_beam/runners/interactive/examples
>>> <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>
>>> .
>>>
>>>
>>>
>>> To install, use `pip install -e .[interactive]` in your <project
>>> root>/sdks/python directory.
>>>
>>> To run, here’s a quick example:
>>>
>>> ```
>>>
>>> import apache_beam as beam
>>>
>>> from apache_beam.runners.interactive.interactive_runner import
>>> InteractiveRunner
>>>
>>> import apache_beam.runners.interactive.interactive_beam as ib
>>>
>>>
>>>
>>> p = beam.Pipeline(InteractiveRunner())
>>>
>>> words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be'])
>>>
>>> counts = words | 'count' >> beam.combiners.Count.PerElement()
>>>
>>>
>>>
>>> # Shows a dynamically updating display of the PCollection elements
>>>
>>> ib.show(counts)
>>>
>>>
>>>
>>> # We can now visualize the data using standard pandas operations.
>>>
>>> df = ib.collect(counts)
>>>
>>> print(df.info())
>>>
>>> print(df.describe())
>>>
>>>
>>>
>>> # Plot the top-10 counted words
>>>
>>> df = df.sort_values(by=1, ascending=False)
>>>
>>> df.head(n=10).plot(x=0, y=1)
>>>
>>> ```
>>>
>>>
>>>
>>> Currently, Batch is supported on any runner. Streaming is only supported
>>> on the DirectRunner (non-FnAPI).
>>>
>>>
>>>
>>> I would like to thank the great work of Sindy (@sindyli) and Harsh
>>> (@ananvay) for the initial implementation,
>>>
>>> David Yan (@davidyan) who led the project, Ning (@ningk) and myself
>>> (@srohde) for the implementation and design, and Ahmet (@altay), Daniel
>>> (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a
>>> lot of their time to help with the design and code reviews.
>>>
>>>
>>>
>>> It was a team effort and we wouldn't have been able to complete it
>>> without the help of everyone involved.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Sam
>>>
>>>

Reply via email to