(I believe you wanted to add +David Yan <david...@google.com>) I am happy to see there are multiple related efforts. Both efforts are introducing new concepts. I would hope that, beyond avoiding conflicts, we are not creating duplication and are building a coherent experience. Could you point to the discussions where this was agreed upon?
On Fri, Sep 6, 2019 at 2:15 PM Ning Kang <ni...@google.com> wrote:

> Thanks Alexey! The materialization of PCollection data directly from cache
> instead of going through the pipeline result would be very helpful for what
> we want to achieve!
>
> On Fri, Sep 6, 2019 at 12:31 PM Alexey Strokach <ostrok...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I have recently finished my internship at Google, which involved doing
>> some work with Apache Beam in a Jupyter Notebook environment. One
>> limitation that I encountered with my workflow is the lack of support for
>> introspecting the contents of a PCollection and the excessive boilerplate
>> required to move data between a Beam Pipeline and the Python interpreter.
>>
>> With guidance from Vanya Tarasonv and Harsh Vardhan, I have created a
>> design document which describes those limitations:
>> https://docs.google.com/document/d/1sISjl4Q60mR1V22R1UZd417wVEn_EmZT-SalTHXG4H0/
>>
>> I also have two PRs outstanding, which add support for materializing and
>> accessing bounded and unbounded PCollections both from a Beam Pipeline and
>> from the Python interpreter:
>> - https://github.com/apache/beam/pull/8884
>> - https://github.com/apache/beam/pull/8961
>>
>> I am aware of the work being carried out by +Ning Kang and +David Yan on
>> [Interactive Beam](
>> https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/),
>> and upon discussion, it does not appear that our PRs would conflict with
>> their vision.
>>
>> Any feedback from the Apache Beam community would be very much
>> appreciated :).
>>
>> Thank you,
>> Alexey