*If you are not an Interactive Beam user, you can ignore this email.*

Hi everyone,

Recently, we've been actively building more Interactive Beam features on
top of the existing InteractiveRunner:
<https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing>

One of the things we've changed is which PCollections are cached and
available through *get_result(pcoll)*.

If your unit tests or code execute a pipeline with the InteractiveRunner
and then check the data of a PCollection through *get_result(pcoll)*, that
code might now fail with
ValueError('PCollection not available, please run the pipeline.').

This is because Interactive Beam now automatically figures out which
PCollections have been assigned to variables in the user-defined pipelines
in your code, tests, or notebooks by looking at a "watched" scope of
variable definitions.
By default, everything defined in "__main__" is watched.

So if you've defined a pipeline in a local scope such as a function,
Interactive Beam will not be able to "watch" it, and so it will not cache
data for those PCollections.
The fix is a one-line change: watch your local scope.
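
To make the idea of a "watched" scope concrete, here is a toy sketch in
plain Python. It is not Interactive Beam's implementation: FakePCollection,
watch, and watched_pcolls are hypothetical stand-ins, and unlike the real
library it does not watch "__main__" by default. It only illustrates why a
variable defined inside a function stays invisible until that function's
locals() is handed over.

```python
# Toy model only: this is NOT how Interactive Beam is implemented.
class FakePCollection:
    """Hypothetical stand-in for a PCollection."""

    def __init__(self, label):
        self.label = label


_watched = []  # snapshots of scopes (name -> object) registered so far


def watch(scope):
    """Register a mapping of variable names to objects, e.g. locals()."""
    _watched.append(dict(scope))


def watched_pcolls():
    """Return the variable names of FakePCollections in watched scopes."""
    return sorted(
        name
        for scope in _watched
        for name, value in scope.items()
        if isinstance(value, FakePCollection)
    )


def build_without_watch():
    hidden = FakePCollection('hidden')  # local variable, never registered
    return watched_pcolls()


def build_with_watch():
    visible = FakePCollection('visible')
    watch(locals())  # the one-line change: register this local scope
    return watched_pcolls()
```

In this sketch, build_without_watch() finds nothing because its local
variable was never registered, while build_with_watch() finds its local
variable after the single watch(locals()) call.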

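Something like:

import apache_beam as beam
from apache_beam.runners.interactive import interactive_beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
...
def some_func(...):
    p = beam.Pipeline(InteractiveRunner())
    pcoll = p | 'SomeTransform' >> SomeTransform()
    ...
    # Watch this function's local scope so that PCollections assigned to
    # local variables (such as pcoll) are cached when the pipeline runs.
    interactive_beam.watch(locals())
    result = p.run()
    ...
...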

Thanks for using Interactive Beam!

Ning.
