[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
[ https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-11375:
----------------------------------
    Priority: Critical  (was: Major)

> Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
> -----------------------------------------------------------------------------------
>
> Key: FLINK-11375
> URL: https://issues.apache.org/jira/browse/FLINK-11375
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, JobManager
> Affects Versions: 1.7.1
> Reporter: shuai.xu
> Assignee: BoWang
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.8.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In SlotPool, AvailableSlots is lock-free, so all access to it should happen in
> the main thread of SlotPool. For this reason all public methods are called through
> SlotPoolGateway, except releaseSlot, which SlotSharingManager calls directly.
> This may cause a ConcurrentModificationException.
>
> 2019-01-16 19:50:16,184 INFO [flink-akka.actor.default-dispatcher-161] org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: BlinkStoreScanTableSource feature_memory_entity_store-entity_lsc_page_detail_feats_group_178-Batch -> SourceConversion(table:[_DataStreamTable_12, source: [BlinkStoreScanTableSource feature_memory_entity_store-entity_lsc_page_detail_feats_group_178]], fields:(f0)) -> correlate: table(ScanBlinkStore_entity_lsc_page_detail_feats_group_1786($cor6.f0)), select: item_id,mainse_searcher_rank__cart_uv,mainse_searcher_rank__cart_uv_14,mainse_searcher_rank__cart_uv_30,mainse_searcher_rank__cart_uv_7,mainse_s (433/500) (bd34af8dd7ee02d04a4a25e698495f0a) switched from RUNNING to FINISHED.
> 2019-01-16 19:50:16,187 INFO [jobmanager-future-thread-90] org.apache.flink.runtime.executiongraph.ExecutionGraph - scheduleVertices meet exception, need to fail global execution graph
> java.lang.reflect.UndeclaredThrowableException
>     at org.apache.flink.runtime.rpc.akka.$Proxy26.allocateSlots(Unknown Source)
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotPool$ProviderAndOwner.allocateSlots(SlotPool.java:1955)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.schedule(ExecutionGraph.java:965)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleVertices(ExecutionGraph.java:1503)
>     at org.apache.flink.runtime.jobmaster.GraphManager$ExecutionGraphVertexScheduler.scheduleExecutionVertices(GraphManager.java:349)
>     at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.scheduleOneByOne(StepwiseSchedulingPlugin.java:132)
>     at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.onExecutionVertexFailover(StepwiseSchedulingPlugin.java:107)
>     at org.apache.flink.runtime.jobmaster.GraphManager.notifyExecutionVertexFailover(GraphManager.java:163)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.resetExecutionVerticesAndNotify(ExecutionGraph.java:1372)
>     at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.restart(FailoverRegion.java:213)
>     at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.reset(FailoverRegion.java:198)
>     at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.allVerticesInTerminalState(FailoverRegion.java:97)
>     at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.lambda$cancel$0(FailoverRegion.java:169)
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:186)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:834)
> Caused by: java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>     at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invokeRpc(AkkaInvocationHandler.java:213)
>     at
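
The thread-confinement rule described in the report can be sketched as follows. This is only an illustration under stated assumptions, not Flink's implementation: ConfinedSlotPool, offerSlot, countSlots and the single-thread executor standing in for the SlotPool main thread are all hypothetical names. The point is that a plain HashMap stays safe only as long as every caller, including the one releasing a slot, goes through the executor instead of touching the map directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a component whose state is not thread-safe. */
public class ConfinedSlotPool {
    // Plain HashMap: safe only while every access runs on mainThread.
    private final Map<String, String> availableSlots = new HashMap<>();
    // Assumption: one executor thread plays the role of the pool's main thread.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

    /** Gateway-style call: the mutation is enqueued onto the main thread. */
    public CompletableFuture<Void> offerSlot(String slotId, String owner) {
        return CompletableFuture.runAsync(() -> availableSlots.put(slotId, owner), mainThread);
    }

    /** The release is enqueued too, rather than run on the caller's thread. */
    public CompletableFuture<Boolean> releaseSlot(String slotId) {
        return CompletableFuture.supplyAsync(() -> availableSlots.remove(slotId) != null, mainThread);
    }

    /** Reads are confined as well, so no reader overlaps with a writer. */
    public CompletableFuture<Integer> countSlots() {
        return CompletableFuture.supplyAsync(availableSlots::size, mainThread);
    }

    public void shutdown() {
        mainThread.shutdown();
    }

    /** Offers 100 slots and releases 40 from several caller threads, then counts. */
    public static int demo() {
        ConfinedSlotPool pool = new ConfinedSlotPool();
        ExecutorService callers = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                String id = "slot-" + i;
                pending.add(CompletableFuture.runAsync(() -> pool.offerSlot(id, "tm-1").join(), callers));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();

            pending.clear();
            for (int i = 0; i < 40; i++) {
                String id = "slot-" + i;
                pending.add(CompletableFuture.runAsync(() -> pool.releaseSlot(id).join(), callers));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();

            return pool.countSlots().join();
        } finally {
            callers.shutdown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 60
    }
}
```

If releaseSlot removed the entry on the caller's thread instead of enqueuing the removal, it could overlap with a traversal running on the main thread, which is the kind of interleaving the report describes.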
[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
[ https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-11375:
----------------------------------
    Component/s: Distributed Coordination
[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
[ https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-11375:
-----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
[ https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shuai.xu updated FLINK-11375:
-----------------------------
    Description:

In SlotPool, AvailableSlots is lock-free, so all access to it should happen in the main thread of SlotPool. For this reason all public methods are called through SlotPoolGateway, except releaseSlot, which SlotSharingManager calls directly. This may cause a ConcurrentModificationException.

2019-01-16 19:50:16,184 INFO [flink-akka.actor.default-dispatcher-161] org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: BlinkStoreScanTableSource feature_memory_entity_store-entity_lsc_page_detail_feats_group_178-Batch -> SourceConversion(table:[_DataStreamTable_12, source: [BlinkStoreScanTableSource feature_memory_entity_store-entity_lsc_page_detail_feats_group_178]], fields:(f0)) -> correlate: table(ScanBlinkStore_entity_lsc_page_detail_feats_group_1786($cor6.f0)), select: item_id,mainse_searcher_rank__cart_uv,mainse_searcher_rank__cart_uv_14,mainse_searcher_rank__cart_uv_30,mainse_searcher_rank__cart_uv_7,mainse_s (433/500) (bd34af8dd7ee02d04a4a25e698495f0a) switched from RUNNING to FINISHED.
2019-01-16 19:50:16,187 INFO [jobmanager-future-thread-90] org.apache.flink.runtime.executiongraph.ExecutionGraph - scheduleVertices meet exception, need to fail global execution graph
java.lang.reflect.UndeclaredThrowableException
    at org.apache.flink.runtime.rpc.akka.$Proxy26.allocateSlots(Unknown Source)
    at org.apache.flink.runtime.jobmaster.slotpool.SlotPool$ProviderAndOwner.allocateSlots(SlotPool.java:1955)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.schedule(ExecutionGraph.java:965)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleVertices(ExecutionGraph.java:1503)
    at org.apache.flink.runtime.jobmaster.GraphManager$ExecutionGraphVertexScheduler.scheduleExecutionVertices(GraphManager.java:349)
    at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.scheduleOneByOne(StepwiseSchedulingPlugin.java:132)
    at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.onExecutionVertexFailover(StepwiseSchedulingPlugin.java:107)
    at org.apache.flink.runtime.jobmaster.GraphManager.notifyExecutionVertexFailover(GraphManager.java:163)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.resetExecutionVerticesAndNotify(ExecutionGraph.java:1372)
    at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.restart(FailoverRegion.java:213)
    at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.reset(FailoverRegion.java:198)
    at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.allVerticesInTerminalState(FailoverRegion.java:97)
    at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.lambda$cancel$0(FailoverRegion.java:169)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:186)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
    at java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invokeRpc(AkkaInvocationHandler.java:213)
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invoke(AkkaInvocationHandler.java:125)
    ... 23 more
Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$ValueSpliterator.tryAdvance(HashMap.java:1643)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:464)
    at
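
The innermost frames (HashMap$ValueSpliterator.tryAdvance reached via findFirst) point at HashMap's fail-fast check. The following minimal sketch reproduces that check on a single thread, assuming an OpenJDK-style HashMap; FailFastDemo and the slot names are hypothetical. The filter mutates the map itself, standing in for the writer on another thread in the report. With a real second thread the exception is timing-dependent and not guaranteed, so this only shows the mechanism.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    /** Returns true if a structural change during findFirst() trips the fail-fast check. */
    public static boolean mutateDuringFindFirst() {
        Map<String, Integer> slots = new HashMap<>();
        slots.put("slot-a", 1);
        slots.put("slot-b", 2);
        try {
            slots.values().stream()
                    .filter(v -> {
                        // Stand-in for an unsynchronized writer: adding a new key is a
                        // structural modification while the spliterator is mid-traversal.
                        slots.put("slot-c", 3);
                        return false;
                    })
                    .findFirst();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutateDuringFindFirst());
    }
}
```

The HashMap documentation describes this detection as best-effort, which is consistent with the issue saying the direct call "may" cause the exception rather than that it always does.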
[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
[ https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-11375:
----------------------------------
    Fix Version/s: 1.8.0