Thanks for sharing, Ning! Is this update valuable to users as well? If so, consider sending a user-geared update to [email protected].
-P.

On Wed, Dec 4, 2019 at 11:14 AM Ning Kang <[email protected]> wrote:

> *If you are not an Interactive Beam user, you can ignore this email.*
>
> Hi everyone,
>
> Recently, we've been actively developing on top of the existing
> InteractiveRunner for more Interactive Beam features
> <https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing>.
>
> One of the things we've changed is which PCollections will be cached and
> available for *get_result(pcoll)*.
>
> If your unit tests or code depend on executing a pipeline with the
> InteractiveRunner and checking the data of a PCollection through
> *get_result(pcoll)*, that code might run into an error saying "raise
> ValueError('PCollection not available, please run the pipeline.')".
>
> This is because Interactive Beam now automatically figures out which
> PCollections have been assigned to variables in the user-defined pipelines
> in your code/tests/notebooks by looking at a "watched" scope of variable
> definitions.
> By default, everything defined in "__main__" is watched.
>
> So if you've defined a pipeline in a local scope such as a function,
> Interactive Beam will not be able to "watch" it and then cache data for
> those PCollections.
> Only a one-line change is needed to fix the usage: watch your local
> scope.
>
> Something like,
> from apache_beam.runners.interactive import interactive_beam
> ...
> def some_func(...):
>     p = beam.Pipeline(InteractiveRunner())
>     pcoll = p | 'SomeTransform' >> SomeTransform()
>     ...
>     interactive_beam.watch(locals())
>     result = p.run()
>     ...
> ...
>
> Thanks for using Interactive Beam!
>
> Ning.
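
For anyone reading along, here is a toy sketch of why the `watch(locals())` call matters. It is plain Python and does not use Beam at all: `FakePCollection` and `ToyWatcher` are hypothetical stand-ins made up for illustration, not Interactive Beam's actual classes or implementation. It only models the idea from the quoted message that a PCollection is cached when it is reachable through a watched scope.

```python
class FakePCollection:
    """Hypothetical stand-in for a Beam PCollection; it only carries a label."""

    def __init__(self, label):
        self.label = label


class ToyWatcher:
    """Toy model of a 'watched scope' registry (an assumption, not Beam code)."""

    def __init__(self):
        self._watched = {}  # variable name -> FakePCollection

    def watch(self, scope):
        # Record every variable in the given scope dict that holds a FakePCollection.
        for name, value in dict(scope).items():
            if isinstance(value, FakePCollection):
                self._watched[name] = value

    def is_cacheable(self, pcoll):
        # In this model, only PCollections seen through a watched scope are cached.
        return any(pcoll is watched for watched in self._watched.values())


def build_without_watch(watcher):
    # The PCollection lives only in this function's local scope and is never watched.
    pcoll = FakePCollection("SomeTransform")
    return pcoll


def build_with_watch(watcher):
    pcoll = FakePCollection("SomeTransform")
    # The one-line fix: expose this function's local variables to the watcher.
    watcher.watch(locals())
    return pcoll


watcher = ToyWatcher()
unwatched = build_without_watch(watcher)
watched = build_with_watch(watcher)

print(watcher.is_cacheable(unwatched))  # False
print(watcher.is_cacheable(watched))    # True
```

In this model, the first function corresponds to the case that would hit the "PCollection not available" error, and the second to the fixed usage with the local scope watched.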
