Re: Interactive Beam Example Failing [BEAM-8451]

2019-10-24 Thread Harsh Vardhan
Thanks, +1 to adding support for streaming on Interactive Beam (+David Yan
)


On Thu, Oct 24, 2019 at 1:45 PM Hai Lu  wrote:

> Hi Robert,
>
> We're trying out iBeam at LinkedIn for Python. As Igor mentioned, there
> seems to be some inconsistency in the behavior of interactive beam. We can
> suggest some fixes from our end but we would need some support from the
> community.
>
> Also, is there a plan to support iBeam for streaming mode? We're
> interested in that use case as well.
>
> Thanks,
> Hai
>
> On Mon, Oct 21, 2019 at 4:45 PM Robert Bradshaw 
> wrote:
>
>> Thanks for trying this out. Yes, this is definitely something that
>> should be supported (and tested).
>>
>> On Mon, Oct 21, 2019 at 3:40 PM Igor Durovic  wrote:
>> >
>> > Hi everyone,
>> >
>> > The interactive beam example using the DirectRunner fails after
>> execution of the last cell. The recursion limit is exceeded during the
>> calculation of the cache label because of a circular reference in the
>> PipelineInfo object.
>> >
>> > The constructor for the PipelineInfo class creates a mapping from each
>> pcollection to the transforms that produce and consume it. The issue arises
>> when there exists a transform that is both a producer and a consumer for
>> the same pcollection. This occurs when a transform's expand method returns
>> the same pcoll object that's passed into it. The specific transform causing
>> the failure of the example is MaybeReshuffle, which is used in the Create
>> transform. Replacing "return pcoll" with "return pcoll | Map(lambda x: x)"
>> seems to fix the problem.
>> >
>> > A workaround for this issue on the interactive beam side would be
>> fairly simple, but it seems to me that there should be more validation of
>> pipelines to prevent the use of transforms that return the same pcoll
>> that's passed in, or at least a mention of this in the transform style
>> guide. My understanding is that pcollections are produced by a single
>> transform (they even have a field called "producer" that references only
>> one transform). If that's the case then that property of pcollections
>> should be enforced.
>> >
>> > I made ticket BEAM-8451 to track this issue.
>> >
>> > I'm still new to beam so I apologize if I'm fundamentally
>> misunderstanding something. I'm not exactly sure what the next step should
>> be and would appreciate some recommendations. I can submit a PR to solve
>> the immediate problem of the failing example but the underlying problem
>> should also be addressed at some point. I also apologize if people are
>> already aware of this problem.
>> >
>> > Thank You!
>> > Igor Durovic
>>
>


Re: [DISCUSS] How to stopp SdkWorker in SdkHarness

2019-10-22 Thread Harsh Vardhan
Would approach 1 be akin to abort semantics?

On Mon, Oct 21, 2019 at 8:01 PM jincheng sun 
wrote:

> Hi Luke,
>
> Thanks a lot for your reply. Since it allows to share one SDK harness
> between multiple executable stages, the control service termination may
> occur much later than the completion of an executable stage. This is the
> main reason I prefer runners to control the teardown of DoFns.
>
> Regarding to "SDK harnesses can terminate instances any time they want and
> start new instances anytime as well.", personally I think it's not conflict
> with the proposed Approach 1 as the SDK harness could decide what to do
> when receiving the teardown request. It could do nothing if the DoFns has
> already been teared down and could also tear down the DoFns if needed.
>
> What do you think?
>
> Best,
> Jincheng
>
> Luke Cwik  于2019年10月22日周二 上午2:05写道:
>
>> Approach 2 is currently the suggested approach[1] for DoFn's to shutdown.
>> Note that SDK harnesses can terminate instances any time they want and
>> start new instances anytime as well.
>>
>> Why do you want to expose this logic so that Runners could control it?
>>
>> 1:
>> https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit#
>>
>> On Mon, Oct 21, 2019 at 4:27 AM jincheng sun 
>> wrote:
>>
>>> Hi,
>>> I found that in `SdkHarness` do not  stop the `SdkWorker` when finish.
>>> We should add the logic for stop the `SdkWorker` in `SdkHarness`.  More
>>> detail can be found [1].
>>>
>>> There are two approaches to solve this issue:
>>>
>>> Approach 1:  We can add a Fn API for teardown purpose and the runner
>>> will teardown a specific bundle descriptor via this teardown Fn API during
>>> disposing.
>>> Approach 2: The control service termination could be seen as a signal
>>> and once SDK harness receives this signal, the teardown of the bundle
>>> descriptor will be performed.
>>>
>>> More detail can be found in [2].
>>>
>>> As the Approach 2, SDK harness could be shared between multiple
>>> executable stages. The control service termination only occurs when all the
>>> executable stages sharing the same SDK harness finished. This means that
>>> the teardown of DoFns may not be executed immediately after an executable
>>> stage is finished.
>>>
>>> So, I prefer Approach 1. Welcome any feedback :)
>>>
>>> Best,
>>> Jincheng
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py
>>> [2]
>>> https://docs.google.com/document/d/1sCgy9VQPf9zVXKRquK8P6N4x7aB62GEO8ozkujRSHZg/edit?usp=sharing
>>>
>> --

Got feedback? go/harsh-feedback 


Re: Proposing interactive beam runner

2018-06-28 Thread Harsh Vardhan
It seems there is support in the community -- we have a PR out for review (
https://github.com/apache/beam/pull/5818), which provides a baseline
InteractiveRunner on which we can build further. PTAL.

Thanks,
~Harsh.

On Fri, Jun 8, 2018 at 5:57 PM Sindy Li  wrote:

> Hello,
>
> We were exploring ways to provide an interactive notebook experience for
> writing Beam Python pipelines. The design doc
> 
>  provides
> an overview/vision of what we would like to achieve. Pull request
>  provides a prototype for the
> same. The document also provides demo screen shots and instructions for
> running a demo in Jupyter. Please take a look. We believe this would be a
> useful addition to Beam.
>
> Thanks!
>
>
>