[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419896 ]
ASF GitHub Bot logged work on BEAM-9651:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Apr/20 23:56
            Start Date: 09/Apr/20 23:56
    Worklog Time Spent: 10m
      Work Description: scwhittle commented on issue #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock
   URL: https://github.com/apache/beam/pull/11368#issuecomment-611808055

   R: @reuvenlax
   Please trigger the tests before merging

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 419896)
    Time Spent: 2h 20m  (was: 2h 10m)

> StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-9651
>                 URL: https://issues.apache.org/jira/browse/BEAM-9651
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Major
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Operation ongoing in step <redacted> for at least 28h10m00s without
> outputting or completing in state windmill-read
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>     at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>     at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>     at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>     at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>     at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>     at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>     at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>     at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>     at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>     at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>     at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>     at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>     at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>     at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>     at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>     at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>     at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>     at <redacted>
>
> Because the stream is started inside a synchronized block on the StreamPool,
> every other thread that uses the StreamPool to get or release a stream also
> blocks. It is unclear whether the stream never became usable (and so blocked
> forever) or whether a race in the use of the Phaser causes the hang.
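The trace shows the problem pattern: a potentially blocking stream start (GrpcGetDataStream.onNewStream sending its first request through DirectStreamObserver.onNext, which waits on a Phaser) runs while the StreamPool monitor is held. Below is a minimal sketch of one general way to avoid that pattern, under stated assumptions: LazyStreamPool, its getStream/size methods and the maxStreams parameter are hypothetical names for illustration only. This is not Beam's StreamPool API and not necessarily what PR #11368 does. The sketch reserves a slot under the lock, calls the possibly blocking factory with no lock held, then takes the lock again only to publish the new stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical lazily populated stream pool (not Beam's StreamPool). Stream
 * creation may block indefinitely, so it runs with no lock held; the monitor
 * only guards the bookkeeping.
 */
final class LazyStreamPool<S> {
  private final int maxStreams;
  private final Supplier<S> factory; // assumed to possibly block, e.g. on flow control while starting
  private final List<S> streams = new ArrayList<>();
  private int pendingCreations = 0; // slots reserved by threads still inside factory.get()
  private int next = 0; // round-robin cursor over ready streams

  LazyStreamPool(int maxStreams, Supplier<S> factory) {
    this.maxStreams = maxStreams;
    this.factory = factory;
  }

  S getStream() {
    synchronized (this) {
      // Pool full (counting reservations) and at least one stream ready: reuse round-robin.
      if (streams.size() + pendingCreations >= maxStreams && !streams.isEmpty()) {
        return streams.get(Math.floorMod(next++, streams.size()));
      }
      // Otherwise reserve a slot. If every slot is still being created, this
      // creates one extra stream instead of waiting on a possibly stuck one.
      pendingCreations++;
    }
    S created = null;
    try {
      created = factory.get(); // potentially blocking; no lock is held here
    } finally {
      synchronized (this) {
        pendingCreations--; // drop the reservation whether or not creation succeeded
        if (created != null) {
          streams.add(created); // publish the new stream to other callers
        }
      }
    }
    return created;
  }

  synchronized int size() {
    return streams.size();
  }
}
```

With this shape, a thread stuck in the factory only delays its own caller: other threads keep reusing ready streams, or create another one while every slot is still pending, so the pool can briefly exceed maxStreams. The sketch does not address whether the stuck stream ever becomes usable, which is the separate Phaser question raised in the issue.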
-- This message was sent by Atlassian Jira (v8.3.4#803005)