Sounds good.
On Thu, Jan 25, 2018 at 4:12 PM, Charles Chen <[email protected]> wrote:
> Yes, that is correct. The scope of the attached fix is for in-process
> runners. For remote runners, we should think about how to make PCollection
> contents available after pipeline execution. We may also need to better
> design eager / interactive execution for that use case, since our current
> use of eager mode is geared towards testing transforms locally.
>
> On Thu, Jan 25, 2018 at 4:07 PM Robert Bradshaw <[email protected]> wrote:
>>
>> Sounds good. I assume there will still need to be runner-specific
>> support for any runner that chooses to implement this (e.g. writing to
>> remote files then reading them in?)
>>
>> On Thu, Jan 25, 2018 at 3:25 PM, Charles Chen <[email protected]> wrote:
>> > Currently, the Python SDK supports an eager execution mode. For
>> > example, a list can be directly passed into a PTransform to obtain
>> > its result:
>> >
>> >     result = [1, 2, 3] | MyPTransform()
>> >
>> > To support this use, the Python DirectRunner has an option to cache
>> > its intermediate results into a PValueCache. The above line, when
>> > run, implicitly creates an ephemeral pipeline and runs it with the
>> > DirectRunner. This, however, adds a lot of complexity to the
>> > DirectRunner, and is not generalizable to other in-process Python
>> > runners (like the in-process Python FnApiRunner, which runs batch
>> > pipelines more efficiently than the current Python DirectRunner).
>> >
>> > To improve this, I will be removing this DirectRunner-specific
>> > implementation and add functionality that allows all in-process
>> > Python runners to be run in eager mode.
>> >
>> > Jira issue: https://issues.apache.org/jira/browse/BEAM-3537
>> > Candidate fix: https://github.com/apache/beam/pull/4492
>> >
>> > Best,
>> > Charles
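For anyone skimming the thread, here is a rough, Beam-free sketch of the mechanism being discussed: how `[1, 2, 3] | MyPTransform()` can yield a result directly, without tying the behavior to one specific runner. The names `InProcessRunner` and `EagerTransform` are made up for illustration and are not the SDK's classes or the code in the candidate fix. The sketch only assumes that Python falls back to the right operand's `__ror__` when a list does not know how to handle `|`.

```python
class InProcessRunner:
    """Hypothetical runner: evaluates each stage in order, in this process."""

    def run(self, source, transforms):
        elements = list(source)
        for transform in transforms:
            elements = transform.expand(elements)
        return elements


class EagerTransform:
    """Hypothetical stand-in for a PTransform (not Beam's actual class)."""

    # Assumption: any in-process runner exposing run() could be swapped in here.
    runner = InProcessRunner()

    def __init__(self, fn):
        self.fn = fn

    def expand(self, elements):
        # Element-wise map, standing in for the transform's real logic.
        return [self.fn(x) for x in elements]

    def __ror__(self, left):
        # Invoked for `left | self` because list and tuple do not define `|`
        # for this type. Only plain lists and tuples trigger eager execution.
        if not isinstance(left, (list, tuple)):
            return NotImplemented
        # Build a throwaway one-stage "pipeline" and run it immediately.
        return self.runner.run(left, [self])


result = [1, 2, 3] | EagerTransform(lambda x: x * 10)
print(result)  # [10, 20, 30]
```

Because the eager path only calls `runner.run(...)`, nothing in it depends on a runner-specific cache, which is the property the proposal is after for in-process runners.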
