Thank you for the advice. Yes, the problem is that the latch is never
counted down (my memo:
https://github.com/apache/beam/pull/14474#discussion_r619557479 ). I need
to figure out why withOnError is not called.

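For reference, here is a minimal sketch of the latch pattern, using only
java.util.concurrent. It is not the actual test code: FakeClient,
awaitError, and run are hypothetical names, and onError merely stands in
for the callback wired up in the real test. It starts all the waiting tasks
first and only then delivers the errors, and it uses a bounded await so that
a missing onError shows up as a false result instead of a hang.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class LatchHangSketch {

  /** Hypothetical stand-in for a stub client whose error callback releases a latch. */
  static final class FakeClient {
    private final CountDownLatch latch = new CountDownLatch(1);

    // Plays the role of the error callback in the real test (assumption).
    void onError(Throwable t) {
      latch.countDown();
    }

    // Bounded wait: returns false if onError was not called within the timeout.
    boolean awaitError(long timeout, TimeUnit unit) throws InterruptedException {
      return latch.await(timeout, unit);
    }
  }

  /**
   * Starts one waiting task per client, then optionally delivers an error to
   * each client. Returns how many tasks saw their latch released in time.
   */
  static int run(int clients, boolean deliverErrors, long timeoutMillis) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(clients);
    try {
      List<FakeClient> stubs = new ArrayList<>();
      List<Future<Boolean>> tasks = new ArrayList<>();
      // Step 1: start all the waiting tasks.
      for (int i = 0; i < clients; i++) {
        FakeClient client = new FakeClient();
        stubs.add(client);
        tasks.add(executor.submit(() -> client.awaitError(timeoutMillis, TimeUnit.MILLISECONDS)));
      }
      // Step 2: send the errors that should cause the tasks to terminate.
      if (deliverErrors) {
        for (FakeClient client : stubs) {
          client.onError(new RuntimeException("simulated failure"));
        }
      }
      // Step 3: count the tasks whose latch was released before the timeout.
      int released = 0;
      for (Future<Boolean> task : tasks) {
        if (task.get()) {
          released++;
        }
      }
      return released;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Calling run(3, true, 2000) models the healthy case where every client is
released. Calling run(3, false, 100) models the failure seen in the thread
dump, where no onError arrives, except that each task gives up after the
timeout rather than waiting forever.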

> Can you repro locally?

No, the task succeeds in my environment (./gradlew
:runners:google-cloud-dataflow-java:worker:test).


On Tue, May 11, 2021 at 12:34 PM Kenneth Knowles <k...@apache.org> wrote:

> I am not sure how much you read the code of the test. So apologies if I am
> saying things you already know. The test does something like:
>
>  - start a logging service
>  - set up some stub clients, each with onError wired up to release a
> countdown latch
>  - send error responses to all three of them (actually it sends the error
> in the same task it creates the stub)
>  - each task waits on the latch
>
> So if onError does not deliver or does not call to release the countdown
> latch, it will hang. I notice in the gist you provide that all three stub
> clients are hung awaiting the latch. That is suspicious to me. I would want
> to confirm if the flakiness always occurs in a way that hangs all three.
> Then there are gRPC workers waiting on empty queues, and the main test
> thread waiting for the hung tasks to complete.
>
> The problem could be something about the test set up. Personally I would
> add a ton of logs, or potentially use a debugger, to confirm exactly the
> state of things when it hangs. Can you repro locally? I think this same
> functionality could be tested in different ways that might remove some of
> the variables. For example starting up all the waiting tasks, then sending
> all the onError messages that should cause them to terminate.
>
> Since this is a unit test, adding a timeout to just that method should
> save time (but will make it harder to capture stack traces, etc). I've
> opened up https://github.com/apache/beam/pull/14781 for that. There may
> be a nice way to add a timeout to the executor to capture the hung stack,
> but I didn't look for it.
>
> Kenn
>
> On Tue, May 11, 2021 at 7:36 AM Tomo Suzuki <suzt...@google.com> wrote:
>
>> gRPC 1.37.0 showed the same problem:
>> BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
>> waits on its tasks forever, causing a timeout in the Java precommit.
>>
>> While I continue my investigation, I would appreciate hearing from anyone
>> who knows the cause of the problem. I pasted the thread dump of the Java
>> process, taken while the test was frozen:
>> https://github.com/apache/beam/pull/14768
>>
>> If this mystery is never solved, vendoring the (slightly older) gRPC 1.32.2
>> without the jboss dependencies is an alternative (suggested by Kenn;
>> memo
>> <https://issues.apache.org/jira/browse/BEAM-11227?focusedCommentId=17318238&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17318238>
>> )
>>
>> Regards,
>> Tomo
>>
>>
>> On Mon, May 10, 2021 at 9:40 AM Tomo Suzuki <suzt...@google.com> wrote:
>>
>>> I was investigating the strange timeout (
>>> https://github.com/apache/beam/pull/14474) but have been occupied with
>>> something else lately.
>>> Let me try the new version today to see whether it improves anything.
>>>
>>>
>>> On Mon, May 10, 2021 at 4:57 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>
>>>> I just saw that gRPC 1.37.1 is out now (with aarch64 support for
>>>> Python!), which made me wonder: what is the current status of
>>>> upgrading the vendored dependency, Tomo?
>>>>
>>>>
>>>> On Thu, Apr 8, 2021 at 4:16 PM Tomo Suzuki <suzt...@google.com> wrote:
>>>>
>>>>> We observed that the Java Precommit cron job for the master branch
>>>>> has been timing out often (not always) since the gRPC version upgrade.
>>>>> https://github.com/apache/beam/pull/14466#issuecomment-815343974
>>>>>
>>>>> After exchanging messages with Kenn, I reverted the change; the master
>>>>> branch now uses the vendored gRPC 1.26.
>>>>>
>>>>>
>>>>> On Wed, Mar 31, 2021 at 11:40 AM Kenneth Knowles <k...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Merged. Let's keep an eye out for trouble, and I will incorporate it
>>>>>> into the release branch.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Wed, Mar 31, 2021 at 6:45 AM Tomo Suzuki <suzt...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Regarding troubleshooting on build timeout, it seems that Docker
>>>>>>> cache in Jenkins machines might be playing a role. As I run more "Java
>>>>>>> Presubmit", I no longer observe timeouts in the PR.
>>>>>>>
>>>>>>> Kenn, would you merge the PR?
>>>>>>> https://github.com/apache/beam/pull/14295 (all checks green,
>>>>>>> including the new Java postcommit checks)
>>>>>>>
>>>>>>> On Thu, Mar 25, 2021 at 5:24 PM Kenneth Knowles <k...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes, I agree this might be a good idea. This is not the only major
>>>>>>>> issue on the release-2.29.0 branch.
>>>>>>>>
>>>>>>>> The counter argument is that we will be pulling in all the bugs
>>>>>>>> introduced to `master` since the branch cut.
>>>>>>>>
>>>>>>>> As far as effort goes, I have been mostly focused on burning down
>>>>>>>> the bugs so I would not lose much work in the release process.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Thu, Mar 25, 2021 at 1:42 PM Ismaël Mejía <ieme...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Precommit has been quite unstable in the last few days, so it is
>>>>>>>>> worth checking whether something is wrong in the CI.
>>>>>>>>>
>>>>>>>>> I have a question, Kenn. Given that cherry-picking this might be a
>>>>>>>>> rather big change, can we reconsider cutting the 2.29.0 branch again
>>>>>>>>> after the updated gRPC version gets merged, and move the issues
>>>>>>>>> already marked as fixed for version 2.30.0 to version 2.29.0? That
>>>>>>>>> seems like an easier upgrade path (and we would get some nice
>>>>>>>>> fixes/improvements, like official Spark 3 support, for free in the
>>>>>>>>> release).
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 24, 2021 at 8:06 PM Tomo Suzuki <suzt...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Update: I observe that the Java precommit check is unstable in
>>>>>>>>> > the PR to upgrade vendored gRPC (compared with a PR with an empty
>>>>>>>>> > change). There are no consistent failures; sometimes it succeeds,
>>>>>>>>> > and other times it hits timeouts and flaky test failures.
>>>>>>>>> >
>>>>>>>>> > https://github.com/apache/beam/pull/14295#issuecomment-806071087
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Mon, Mar 22, 2021 at 10:46 AM Tomo Suzuki <suzt...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Thank you for voting. I see the artifact is available in
>>>>>>>>> >> Maven Central. I'll work on the PR to use the published artifact
>>>>>>>>> >> today.
>>>>>>>>> >>
>>>>>>>>> https://search.maven.org/artifact/org.apache.beam/beam-vendor-grpc-1_36_0/0.1/jar
>>>>>>>>> >>
>>>>>>>>> >> On Tue, Mar 16, 2021 at 3:07 PM Kenneth Knowles <
>>>>>>>>> k...@apache.org> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Update on this: there are some minor issues and then I'll send
>>>>>>>>> out the RC.
>>>>>>>>> >>>
>>>>>>>>> >>> I think this is worth blocking 2.29.0 release on, so I will do
>>>>>>>>> this first. We are still eliminating other blockers from 2.29.0 
>>>>>>>>> anyhow.
>>>>>>>>> >>>
>>>>>>>>> >>> Kenn
>>>>>>>>> >>>
>>>>>>>>> >>> On Mon, Mar 15, 2021 at 7:17 AM Tomo Suzuki <
>>>>>>>>> suzt...@google.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> Hi Beam developers,
>>>>>>>>> >>>>
>>>>>>>>> >>>> I'm working on upgrading the vendored gRPC to 1.36.0:
>>>>>>>>> >>>> https://issues.apache.org/jira/browse/BEAM-11227 (PR:
>>>>>>>>> https://github.com/apache/beam/pull/14028)
>>>>>>>>> >>>> Let me know if you have any questions or concerns.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Background:
>>>>>>>>> >>>> After exchanging messages with Ismaël in BEAM-11227, it seems
>>>>>>>>> >>>> that the ticket, created by some automation, is a false
>>>>>>>>> >>>> positive, but it would be nice to use an artifact that is not
>>>>>>>>> >>>> flagged with a CVE.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Kenn offered to work as the release manager (as in
>>>>>>>>> https://s.apache.org/beam-release-vendored-artifacts) of the
>>>>>>>>> vendored artifact.
>>>>>>>>> >>>>
>>>>>>>>> >>>> --
>>>>>>>>> >>>> Regards,
>>>>>>>>> >>>> Tomo
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Regards,
>>>>>>>>> >> Tomo
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Regards,
>>>>>>>>> > Tomo
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Tomo
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Tomo
>>>>>
>>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>

-- 
Regards,
Tomo
