[ 
https://issues.apache.org/jira/browse/BEAM-11400?focusedWorklogId=521765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521765
 ]

ASF GitHub Bot logged work on BEAM-11400:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Dec/20 17:00
            Start Date: 08/Dec/20 17:00
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on pull request #13486:
URL: https://github.com/apache/beam/pull/13486#issuecomment-740763268


   run dataflow validatesrunner


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 521765)
    Time Spent: 50m  (was: 40m)

> StreamingDataflowWorker stuck commits logic triggers exceptions if commits 
> eventually complete
> ----------------------------------------------------------------------------------------------
>
>                 Key: BEAM-11400
>                 URL: https://issues.apache.org/jira/browse/BEAM-11400
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Commits that have not completed within a timeout are cancelled as stuck and 
> lost, which shows up in the logs as:
> Detected key with sharding key -6893288510319386341 stuck in COMMITTING 
> state, completing it with error.
> However, if the commit was not lost but just very slow, the following error 
> occurs when it eventually completes:
> Exception while processing commit response {}
> "java.lang.NullPointerException
>       at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
>       at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$ComputationState.completeWork(StreamingDataflowWorker.java:2246)
> This occurs on the commit stream, which finishes processing the current batch 
> of responses and then throws the error. This causes the stream to complete 
> with an error, resending all of the other commits. So if there were a large 
> number of commits on the stream, we make slow progress and complete only a 
> couple before retrying everything again. This slowdown can cause further 
> commits to exceed the timeout, entering a feedback loop.
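
The failure mode described above can be illustrated with a minimal sketch. It is
not taken from Beam or from pull request #13486. The class ActiveWorkSketch and
its methods are hypothetical stand-ins, assuming that work in the COMMITTING
state is tracked per sharding key. The idea is that a late commit response for
work already failed as stuck is reported as a no-op instead of tripping a
null check and failing the whole commit stream.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for per-computation work tracking; not Beam's actual class. */
final class ActiveWorkSketch {
    // Maps a sharding key to the work token currently in the COMMITTING state.
    private final Map<Long, Long> committing = new ConcurrentHashMap<>();

    /** Records that a commit for this key and token has been sent. */
    void startCommit(long shardingKey, long workToken) {
        committing.put(shardingKey, workToken);
    }

    /** Stuck-commit check: forgets the commit once the timeout has passed. */
    boolean failStuckCommit(long shardingKey, long workToken) {
        return committing.remove(shardingKey, workToken);
    }

    /**
     * Handles a commit response. Returns false, rather than throwing, when the
     * work is no longer tracked (for example because it was already failed as
     * stuck), so one late response does not fail the other commits.
     */
    boolean completeWork(long shardingKey, long workToken) {
        return committing.remove(shardingKey, workToken);
    }
}
```

With this shape, a response that arrives after failStuckCommit has removed the
entry makes completeWork return false, which the caller could log and ignore.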



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
