(I believe you wanted to add +David Yan <david...@google.com>)

I am happy to see there are multiple related efforts. Both are introducing
new concepts. Beyond avoiding conflicts, I would hope that we are not
creating duplication and that we are building a coherent experience. Could
you point to the discussions where this was agreed upon?

On Fri, Sep 6, 2019 at 2:15 PM Ning Kang <ni...@google.com> wrote:

> Thanks Alexey! The materialization of PCollection data directly from cache
> instead of going through the pipeline result would be very helpful for what
> we want to achieve!
>
> On Fri, Sep 6, 2019 at 12:31 PM Alexey Strokach <ostrok...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I have recently finished my internship at Google, which involved doing
>> some work with Apache Beam in a Jupyter Notebook environment. Two
>> limitations that I encountered in my workflow are the lack of support for
>> introspecting the contents of a PCollection and the excessive boilerplate
>> required to move data between a Beam Pipeline and the Python interpreter.
>>
>> With guidance from Vanya Tarasonv and Harsh Vardhan, I have created a
>> design document which describes those limitations:
>> https://docs.google.com/document/d/1sISjl4Q60mR1V22R1UZd417wVEn_EmZT-SalTHXG4H0/
>>
>> I also have two PRs outstanding, which add support for materializing and
>> accessing bounded and unbounded PCollections both from a Beam Pipeline and
>> from the Python interpreter.
>> - https://github.com/apache/beam/pull/8884
>> - https://github.com/apache/beam/pull/8961
>>
>> I am aware of the work being carried out by +Ning Kang and +David Yan on
>> Interactive Beam
>> (https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/),
>> and upon discussion, it does not appear that our PRs would conflict with
>> their vision.
>>
>> Any feedback from the Apache Beam community would be very much
>> appreciated :).
>>
>> Thank you,
>> Alexey
