[
https://issues.apache.org/jira/browse/FLINK-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435319#comment-15435319
]
ASF GitHub Bot commented on FLINK-4437:
---------------------------------------
Github user tedyu commented on the issue:
https://github.com/apache/flink/pull/2409
I ran the test suite with the patch, and the following test failed:
```
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 201.106 sec <<< FAILURE! - in org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase
The JobManager should handle gracefully failing task manager with slot sharing(org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase)  Time elapsed: 200.43 sec  <<< ERROR!
java.util.concurrent.TimeoutException: Futures timed out after [200000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
    at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
    at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.ready(package.scala:86)
    at org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:455)
    at org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:439)
    at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:330)
    at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:269)
    at org.apache.flink.runtime.testingUtils.TestingUtils$.startTestingCluster(TestingUtils.scala:86)
    at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(TaskManagerFailsWithSlotSharingITCase.scala:73)
    at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(TaskManagerFailsWithSlotSharingITCase.scala:53)
    at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(TaskManagerFailsWithSlotSharingITCase.scala:53)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.WordSpecLike$$anon$1.apply(WordSpecLike.scala:953)
    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
    at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase.withFixture(TaskManagerFailsWithSlotSharingITCase.scala:38)
```
The failure doesn't seem to be related to the patch.
> Lock evasion around lastTriggeredCheckpoint may lead to lost updates to
> related fields
> --------------------------------------------------------------------------------------
>
> Key: FLINK-4437
> URL: https://issues.apache.org/jira/browse/FLINK-4437
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
>
> In CheckpointCoordinator#triggerCheckpoint():
> {code}
> // make sure the minimum interval between checkpoints has passed
> if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {
> {code}
> If two threads evaluate
> 'lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp'
> in close proximity before lastTriggeredCheckpoint is updated, the two threads
> may have an inconsistent view of "lastTriggeredCheckpoint", and updates to
> fields correlated with "lastTriggeredCheckpoint" may be lost.
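
To illustrate the check-then-act race described in the issue, here is a minimal sketch of one way to close it: perform the minimum-pause check and the update of lastTriggeredCheckpoint under the same lock. The class PauseGate, its method tryTrigger, and the initial value of 0 are hypothetical names and choices made for illustration only; this is not the CheckpointCoordinator code and not necessarily what the patch in the pull request does.

```java
/**
 * Hypothetical helper (not part of Flink): makes the minimum-pause check
 * and the update of lastTriggeredCheckpoint a single atomic step.
 */
final class PauseGate {
    private final Object lock = new Object();
    private final long minPauseBetweenCheckpoints;
    // Guarded by lock. Starts at 0, an assumption made for this sketch.
    private long lastTriggeredCheckpoint;

    PauseGate(long minPauseBetweenCheckpoints) {
        this.minPauseBetweenCheckpoints = minPauseBetweenCheckpoints;
    }

    /** Returns true if a checkpoint may be triggered at the given timestamp. */
    boolean tryTrigger(long timestamp) {
        synchronized (lock) {
            // Same condition as in the issue: the minimum pause has not passed yet.
            if (lastTriggeredCheckpoint + minPauseBetweenCheckpoints > timestamp) {
                return false;
            }
            // The update happens before the lock is released, so a second
            // thread cannot pass the check against the stale value.
            lastTriggeredCheckpoint = timestamp;
            return true;
        }
    }
}
```

With this shape, two threads calling tryTrigger at nearly the same time cannot both succeed against the old value, because whichever enters the synchronized block second already sees the updated timestamp.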