Sounds good.

On Thu, Jan 25, 2018 at 4:12 PM, Charles Chen <[email protected]> wrote:
> Yes, that is correct.  The scope of the attached fix is for in-process
> runners.  For remote runners, we should think about how to make PCollection
> contents available after pipeline execution.  We may also need to better
> design eager / interactive execution for that use case, since our current
> use of eager mode is geared towards testing transforms locally.
>
> On Thu, Jan 25, 2018 at 4:07 PM Robert Bradshaw <[email protected]> wrote:
>>
>> Sounds good. I assume there will still need to be runner-specific
>> support for any runner that chooses to implement this (e.g. writing to
>> remote files then reading them in?)
>>
>> On Thu, Jan 25, 2018 at 3:25 PM, Charles Chen <[email protected]> wrote:
>> > Currently, the Python SDK supports an eager execution mode.  For example,
>> > a list can be directly passed into a PTransform to obtain its result:
>> >
>> > result = [1, 2, 3] | MyPTransform()
>> >
>> > To support this use, the Python DirectRunner has an option to cache its
>> > intermediate results into a PValueCache.  The above line, when run,
>> > implicitly creates an ephemeral pipeline and runs it with the
>> > DirectRunner.  This, however, adds a lot of complexity to the
>> > DirectRunner, and is not generalizable to other in-process Python runners
>> > (like the in-process Python FnApiRunner, which runs batch pipelines more
>> > efficiently than the current Python DirectRunner).
>> >
>> > To improve this, I will be removing this DirectRunner-specific
>> > implementation and adding functionality that allows all in-process
>> > Python runners to be run in eager mode.
>> >
>> > Jira issue: https://issues.apache.org/jira/browse/BEAM-3537
>> > Candidate fix: https://github.com/apache/beam/pull/4492
>> >
>> > Best,
>> > Charles
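
For readers following the thread, here is a rough sketch of the mechanism behind "result = [1, 2, 3] | MyPTransform()": Python falls back to the right-hand operand's __ror__ when a plain list does not know how to handle "|", and the transform can use that hook to run immediately and return concrete values. ToyRunner and ToyMap below are hypothetical stand-ins written for illustration only; they are not Beam classes and do not reflect the code in the candidate fix.

```python
# Toy illustration only: none of these classes are Beam APIs.


class ToyRunner:
    """Hypothetical in-process runner: applies fn to every element."""

    def run(self, elements, fn):
        return [fn(x) for x in elements]


class ToyMap:
    """Hypothetical stand-in for a PTransform that maps fn over its input."""

    def __init__(self, fn, runner=None):
        self.fn = fn
        # Any object with a compatible run() method could be swapped in here,
        # which is the sense in which eager mode need not be runner-specific.
        self.runner = runner or ToyRunner()

    def __ror__(self, left):
        # Python calls this for `left | self` when `left` (e.g. a list)
        # does not itself implement `|` for a ToyMap.
        # Eager mode: run right away and hand back concrete values.
        return self.runner.run(list(left), self.fn)


result = [1, 2, 3] | ToyMap(lambda x: x * 2)
print(result)  # [2, 4, 6]
```

Because the runner is passed in rather than hard-coded, the same "|" syntax would work with any in-process runner that can return its results directly; a remote runner would need some extra way to bring the output back.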
