This is something I've run into while working on the reference runner and
it's bugged me too. I've tried looking into what the issue was but usually
hit dead ends. Your post is really helpful; I might use it to take another
look when I have the time.

On Fri, Feb 8, 2019 at 5:26 PM Alex Amato <ajam...@google.com> wrote:

> I think graceful shutdown has been historically overlooked; it would not
> surprise me if a few things needed to gracefully shut down the runner
> harness/SDK were accidentally left out.
>
> IIRC there was also some discussion around starting up incorrectly as well
> (requiring a certain order of SDK process startup and runner harness
> startup, which may have had races as well.)
>
> On Fri, Feb 8, 2019 at 4:49 PM Brian Hulette <bhule...@google.com> wrote:
>
>> I think I've finally got a handle on this flake, and a possible solution
>> [1]. One thing that's still bothering me though is that the "CANCELLED:
>> Multiplexer hanging up" errors seem to be unavoidable.
>>
>> They occur when the GrpcDataService is closed [2] and it closes all of
>> its multiplexers, which just send an error to their outbound observers
>> [3]. It seems to me that there should be a more graceful way to shut
>> everything down, but I'm not seeing it. Am I missing something?
>>
>> grpc-java suggests using GrpcCleanupRule to gracefully shut down
>> in-process servers and clients [4]. Should we be utilizing that somehow?
>>
>> Brian
>>
>> [1] https://github.com/apache/beam/pull/7794
>> [2]
>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L117
>> [3]
>> https://github.com/apache/beam/tree/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java#L112
>> [4]
>> https://github.com/grpc/grpc-java/blob/master/examples/README.md#unit-test-examples
>>
>> On Thu, Feb 7, 2019 at 11:49 AM Brian Hulette <bhule...@google.com>
>> wrote:
>>
>>> This was already reported in BEAM-6512 [1], which Scott gave me as a
>>> starter bug. I haven't been able to reproduce locally, so I'm trying to see
>>> if I can get it to fail on Jenkins again with some additional logging [2].
>>>
>>> Definitely interested in others' thoughts on this; I only vaguely
>>> understand what's going on. So far the only headway I've made is noticing
>>> that the "CANCELLED: Multiplexer hanging up" error seems to always occur
>>> exactly three times in failing tests. Successful runs may have one or two
>>> of these messages but never three.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6512
>>> [2] https://github.com/apache/beam/pull/7767
>>>
>>> On Tue, Feb 5, 2019 at 9:50 AM Alex Amato <ajam...@google.com> wrote:
>>>
>>>>
>>>> org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients
>>>>
>>>> I keep seeing this test failing in my PRs:
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/
>>>>
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/
>>>>
>>>>
>>>> I've seen this one come and go for a few weeks or so. I am unsure
>>>> exactly when it first occurred.
>>>>
>>>
