I think graceful shutdown has historically been overlooked; it would not
surprise me if a few of the steps needed to gracefully shut down the
runner harness/SDK were accidentally left out.
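
To make the distinction concrete, here is a rough sketch (not Beam code)
of the two shutdown paths. Observer, RecordingObserver, and Multiplexer
are hypothetical stand-ins; Observer only mirrors the three callbacks of
gRPC's StreamObserver. hangUp() models the behavior described in the
quoted message below, where each outbound observer is sent an error.
complete() is one possible graceful alternative, and whether it fits the
real data plane protocol is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownSketch {
  /** Hypothetical stand-in with the same three callbacks as gRPC's StreamObserver. */
  interface Observer<T> {
    void onNext(T value);
    void onError(Throwable t);
    void onCompleted();
  }

  /** Records every event it receives so the two shutdown paths can be compared. */
  static class RecordingObserver implements Observer<String> {
    final List<String> events = new ArrayList<>();

    @Override public void onNext(String value) { events.add("next:" + value); }
    @Override public void onError(Throwable t) { events.add("error:" + t.getMessage()); }
    @Override public void onCompleted() { events.add("completed"); }
  }

  /** Hypothetical multiplexer that fans out to a set of outbound observers. */
  static class Multiplexer {
    private final List<Observer<String>> outbound = new ArrayList<>();
    private boolean closed = false;

    void register(Observer<String> observer) { outbound.add(observer); }

    /** Abrupt path: every outbound observer sees an error. */
    void hangUp() {
      if (closed) return; // only one terminal event per observer
      closed = true;
      for (Observer<String> observer : outbound) {
        observer.onError(new RuntimeException("CANCELLED: Multiplexer hanging up"));
      }
    }

    /** Graceful path (assumed alternative): every outbound observer completes normally. */
    void complete() {
      if (closed) return; // only one terminal event per observer
      closed = true;
      for (Observer<String> observer : outbound) {
        observer.onCompleted();
      }
    }
  }

  public static void main(String[] args) {
    RecordingObserver abruptObserver = new RecordingObserver();
    Multiplexer abrupt = new Multiplexer();
    abrupt.register(abruptObserver);
    abrupt.hangUp();

    RecordingObserver gracefulObserver = new RecordingObserver();
    Multiplexer graceful = new Multiplexer();
    graceful.register(gracefulObserver);
    graceful.complete();

    System.out.println(abruptObserver.events);
    System.out.println(gracefulObserver.events);
  }
}
```

The closed flag is there so that a later hangUp() after a clean
complete() does not send a second terminal event to the same observer.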

IIRC there was also some discussion around starting up incorrectly
(requiring a certain order of SDK process startup and runner harness
startup, which may have had races as well).

On Fri, Feb 8, 2019 at 4:49 PM Brian Hulette <bhule...@google.com> wrote:

> I think I've finally got a handle on this flake, and a possible solution
> [1]. One thing that's still bothering me though is that the "CANCELLED:
> Multiplexer hanging up" errors seem to be unavoidable.
>
> They occur when the GrpcDataService is closed [2] and it closes all of
> its multiplexers, which just send an error to their outbound observers
> [3]. It seems to me that there should be a more graceful way to shut
> everything down, but I'm not seeing it. Am I missing something?
>
> grpc-java suggests using GrpcCleanupRule to gracefully shut down
> in-process servers and clients [4], should we be utilizing that somehow?
>
> Brian
>
> [1] https://github.com/apache/beam/pull/7794
> [2]
> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L117
> [3]
> https://github.com/apache/beam/tree/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java#L112
> [4]
> https://github.com/grpc/grpc-java/blob/master/examples/README.md#unit-test-examples
>
> On Thu, Feb 7, 2019 at 11:49 AM Brian Hulette <bhule...@google.com> wrote:
>
>> This was already reported in BEAM-6512 [1], which Scott gave me as a
>> starter bug. I haven't been able to reproduce locally, so I'm trying to see
>> if I can get it to fail on Jenkins again with some additional logging [2].
>>
>> Definitely interested in others' thoughts on this; I only vaguely
>> understand what's going on. So far the only headway I've made is noticing
>> that the "CANCELLED: Multiplexer hanging up" error seems to always occur
>> exactly three times in failing tests. Successful runs may have one or two
>> of these messages but never three.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6512
>> [2] https://github.com/apache/beam/pull/7767
>>
>> On Tue, Feb 5, 2019 at 9:50 AM Alex Amato <ajam...@google.com> wrote:
>>
>>>
>>> org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients
>>>
>>> I keep seeing this test failing in my PRs
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/
>>>
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/
>>>
>>>
>>> I've seen this one come and go for a few weeks or so. I am unsure
>>> exactly when it first occurred.
>>>
>>
