[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly

2019-02-19 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-11375:
--
Priority: Critical  (was: Major)

> Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
>
>                 Key: FLINK-11375
>                 URL: https://issues.apache.org/jira/browse/FLINK-11375
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, JobManager
>    Affects Versions: 1.7.1
>            Reporter: shuai.xu
>            Assignee: BoWang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In SlotPool, AvailableSlots is not guarded by a lock, so all access to it 
> should happen in the main thread of SlotPool. For this reason, all public 
> methods are called through SlotPoolGateway, except releaseSlot, which 
> SlotSharingManager calls directly. This direct call may cause a 
> ConcurrentModificationException.
>  2019-01-16 19:50:16,184 INFO [flink-akka.actor.default-dispatcher-161] 
> org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: 
> BlinkStoreScanTableSource 
> feature_memory_entity_store-entity_lsc_page_detail_feats_group_178-Batch -> 
> SourceConversion(table:[_DataStreamTable_12, source: 
> [BlinkStoreScanTableSource 
> feature_memory_entity_store-entity_lsc_page_detail_feats_group_178]], 
> fields:(f0)) -> correlate: 
> table(ScanBlinkStore_entity_lsc_page_detail_feats_group_1786($cor6.f0)), 
> select: 
> item_id,mainse_searcher_rank__cart_uv,mainse_searcher_rank__cart_uv_14,mainse_searcher_rank__cart_uv_30,mainse_searcher_rank__cart_uv_7,mainse_s
>  (433/500) (bd34af8dd7ee02d04a4a25e698495f0a) switched from RUNNING to 
> FINISHED.
>  2019-01-16 19:50:16,187 INFO [jobmanager-future-thread-90] 
> org.apache.flink.runtime.executiongraph.ExecutionGraph - scheduleVertices 
> meet exception, need to fail global execution graph
>  java.lang.reflect.UndeclaredThrowableException
>  at org.apache.flink.runtime.rpc.akka.$Proxy26.allocateSlots(Unknown Source)
>  at org.apache.flink.runtime.jobmaster.slotpool.SlotPool$ProviderAndOwner.allocateSlots(SlotPool.java:1955)
>  at org.apache.flink.runtime.executiongraph.ExecutionGraph.schedule(ExecutionGraph.java:965)
>  at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleVertices(ExecutionGraph.java:1503)
>  at org.apache.flink.runtime.jobmaster.GraphManager$ExecutionGraphVertexScheduler.scheduleExecutionVertices(GraphManager.java:349)
>  at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.scheduleOneByOne(StepwiseSchedulingPlugin.java:132)
>  at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.onExecutionVertexFailover(StepwiseSchedulingPlugin.java:107)
>  at org.apache.flink.runtime.jobmaster.GraphManager.notifyExecutionVertexFailover(GraphManager.java:163)
>  at org.apache.flink.runtime.executiongraph.ExecutionGraph.resetExecutionVerticesAndNotify(ExecutionGraph.java:1372)
>  at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.restart(FailoverRegion.java:213)
>  at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.reset(FailoverRegion.java:198)
>  at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.allVerticesInTerminalState(FailoverRegion.java:97)
>  at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.lambda$cancel$0(FailoverRegion.java:169)
>  at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:186)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>  at java.lang.Thread.run(Thread.java:834)
>  Caused by: java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
>  at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invokeRpc(AkkaInvocationHandler.java:213)
>  at 
> 
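A minimal sketch of the thread-confinement rule described in the issue, assuming a plain single-threaded `ExecutorService` stands in for the SlotPool main thread: every operation, including the release path, is scheduled on that one thread, so the unguarded `HashMap`s are never touched concurrently. `MainThreadSlotPool`, `offerSlot`, `allocateSlot`, `availableCount` and the slot-id-to-host maps are hypothetical simplifications, not Flink's API or the actual fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical slot pool: all state is confined to one "main thread" executor.
final class MainThreadSlotPool implements AutoCloseable {
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // slotId -> host. Only ever accessed from mainThread, so no locks are needed.
    private final Map<String, String> availableSlots = new HashMap<>();
    private final Map<String, String> allocatedSlots = new HashMap<>();

    CompletableFuture<Void> offerSlot(String slotId, String host) {
        return CompletableFuture.runAsync(() -> availableSlots.put(slotId, host), mainThread);
    }

    // Searches the available slots with a stream (as in the trace) on the main thread.
    CompletableFuture<Optional<String>> allocateSlot(String host) {
        return CompletableFuture.supplyAsync(() -> {
            Optional<String> slotId = availableSlots.entrySet().stream()
                    .filter(e -> e.getValue().equals(host))
                    .map(Map.Entry::getKey)
                    .findFirst();
            // The stream has finished here, so mutating the map is safe.
            slotId.ifPresent(id -> allocatedSlots.put(id, availableSlots.remove(id)));
            return slotId;
        }, mainThread);
    }

    // Release is scheduled on the main thread like every other method,
    // instead of mutating availableSlots from the caller's thread.
    CompletableFuture<Void> releaseSlot(String slotId) {
        return CompletableFuture.runAsync(() -> {
            String host = allocatedSlots.remove(slotId);
            if (host != null) {
                availableSlots.put(slotId, host);
            }
        }, mainThread);
    }

    CompletableFuture<Integer> availableCount() {
        return CompletableFuture.supplyAsync(availableSlots::size, mainThread);
    }

    @Override
    public void close() {
        mainThread.shutdown();
    }
}

final class MainThreadSlotPoolDemo {
    public static void main(String[] args) throws Exception {
        try (MainThreadSlotPool pool = new MainThreadSlotPool()) {
            pool.offerSlot("slot-1", "host-a").join();
            Optional<String> slot = pool.allocateSlot("host-a").join();
            // Releasing from a foreign thread is safe: the mutation still runs on mainThread.
            Thread releaser = new Thread(() -> slot.ifPresent(id -> pool.releaseSlot(id).join()));
            releaser.start();
            releaser.join();
            System.out.println("available slots: " + pool.availableCount().join()); // prints "available slots: 1"
        }
    }
}
```

The design choice here is to make the release asynchronous like the gateway calls rather than adding locks; a caller that needs to wait for completion can `join()` the returned future.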

[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly

2019-02-19 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-11375:
--
Component/s: Distributed Coordination

> Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
>
>                 Key: FLINK-11375
>                 URL: https://issues.apache.org/jira/browse/FLINK-11375
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, JobManager
>    Affects Versions: 1.7.1
>            Reporter: shuai.xu
>            Assignee: BoWang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In SlotPool, AvailableSlots is not guarded by a lock, so all access to it 
> should happen in the main thread of SlotPool. For this reason, all public 
> methods are called through SlotPoolGateway, except releaseSlot, which 
> SlotSharingManager calls directly. This direct call may cause a 
> ConcurrentModificationException.
> 
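For contrast, a sketch of what happens if the map were swapped for `java.util.concurrent.ConcurrentHashMap`, which is a different technique from routing calls through the main thread: its iterators and spliterators are weakly consistent and never throw `ConcurrentModificationException`, so the same mid-scan removal simply completes. That alone would not make a find-then-remove allocation atomic, so the sketch also shows an atomic claim via `remove(key, value)`. `WeaklyConsistentScan` and `claim` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class WeaklyConsistentScan {
    // Same interleaving as a HashMap-based scan, but on a ConcurrentHashMap.
    static Optional<String> scanWhileRemoving() {
        Map<String, String> slots = new ConcurrentHashMap<>();
        slots.put("slot-1", "host-a");
        slots.put("slot-2", "host-b");
        slots.put("slot-3", "host-c");
        return slots.values().stream()
                .filter(host -> {
                    slots.remove("slot-2"); // modification during the scan
                    return host.equals("no-such-host");
                })
                .findFirst(); // completes without ConcurrentModificationException
    }

    // Claims a slot only if it is still mapped to the expected host.
    // ConcurrentHashMap.remove(key, value) is atomic, so only one caller can win.
    static boolean claim(Map<String, String> slots, String slotId, String host) {
        return slots.remove(slotId, host);
    }

    public static void main(String[] args) {
        System.out.println("scan result: " + scanWhileRemoving()); // prints "scan result: Optional.empty"
    }
}
```

Weakly consistent iteration hides the exception rather than the race, which is one reason thread confinement is the more conservative choice for a structure like AvailableSlots.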

[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly

2019-02-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-11375:
---
Labels: pull-request-available  (was: )

> Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
>
>                 Key: FLINK-11375
>                 URL: https://issues.apache.org/jira/browse/FLINK-11375
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.7.1
>            Reporter: shuai.xu
>            Assignee: BoWang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> In SlotPool, AvailableSlots is not guarded by a lock, so all access to it 
> should happen in the main thread of SlotPool. For this reason, all public 
> methods are called through SlotPoolGateway, except releaseSlot, which 
> SlotSharingManager calls directly. This direct call may cause a 
> ConcurrentModificationException.

[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly

2019-01-18 Thread shuai.xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu updated FLINK-11375:
-
Description: 
In SlotPool, AvailableSlots is not guarded by a lock, so all access to it should 
happen in the main thread of SlotPool. For this reason, all public methods are 
called through SlotPoolGateway, except releaseSlot, which SlotSharingManager 
calls directly. This direct call may cause a ConcurrentModificationException.

 2019-01-16 19:50:16,184 INFO [flink-akka.actor.default-dispatcher-161] 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: 
BlinkStoreScanTableSource 
feature_memory_entity_store-entity_lsc_page_detail_feats_group_178-Batch -> 
SourceConversion(table:[_DataStreamTable_12, source: [BlinkStoreScanTableSource 
feature_memory_entity_store-entity_lsc_page_detail_feats_group_178]], 
fields:(f0)) -> correlate: 
table(ScanBlinkStore_entity_lsc_page_detail_feats_group_1786($cor6.f0)), 
select: 
item_id,mainse_searcher_rank__cart_uv,mainse_searcher_rank__cart_uv_14,mainse_searcher_rank__cart_uv_30,mainse_searcher_rank__cart_uv_7,mainse_s
 (433/500) (bd34af8dd7ee02d04a4a25e698495f0a) switched from RUNNING to FINISHED.
 2019-01-16 19:50:16,187 INFO [jobmanager-future-thread-90] 
org.apache.flink.runtime.executiongraph.ExecutionGraph - scheduleVertices meet 
exception, need to fail global execution graph
 java.lang.reflect.UndeclaredThrowableException
 at org.apache.flink.runtime.rpc.akka.$Proxy26.allocateSlots(Unknown Source)
 at org.apache.flink.runtime.jobmaster.slotpool.SlotPool$ProviderAndOwner.allocateSlots(SlotPool.java:1955)
 at org.apache.flink.runtime.executiongraph.ExecutionGraph.schedule(ExecutionGraph.java:965)
 at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleVertices(ExecutionGraph.java:1503)
 at org.apache.flink.runtime.jobmaster.GraphManager$ExecutionGraphVertexScheduler.scheduleExecutionVertices(GraphManager.java:349)
 at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.scheduleOneByOne(StepwiseSchedulingPlugin.java:132)
 at org.apache.flink.runtime.schedule.StepwiseSchedulingPlugin.onExecutionVertexFailover(StepwiseSchedulingPlugin.java:107)
 at org.apache.flink.runtime.jobmaster.GraphManager.notifyExecutionVertexFailover(GraphManager.java:163)
 at org.apache.flink.runtime.executiongraph.ExecutionGraph.resetExecutionVerticesAndNotify(ExecutionGraph.java:1372)
 at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.restart(FailoverRegion.java:213)
 at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.reset(FailoverRegion.java:198)
 at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.allVerticesInTerminalState(FailoverRegion.java:97)
 at org.apache.flink.runtime.executiongraph.failover.FailoverRegion.lambda$cancel$0(FailoverRegion.java:169)
 at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
 at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
 at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:186)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
 at java.lang.Thread.run(Thread.java:834)
 Caused by: java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
 at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invokeRpc(AkkaInvocationHandler.java:213)
 at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invoke(AkkaInvocationHandler.java:125)
 ... 23 more
 Caused by: java.util.ConcurrentModificationException
 at java.util.HashMap$ValueSpliterator.tryAdvance(HashMap.java:1643)
 at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
 at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
 at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
 at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:464)
 at 
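The innermost frames above (`HashMap$ValueSpliterator.tryAdvance` under `ReferencePipeline.findFirst`) suggest a short-circuiting stream search over a `HashMap`'s values while the map was being modified. A minimal, single-threaded sketch that reproduces the same exception deterministically, assuming a release simply removes an entry while the search is still in progress; `UnguardedSlotMap` and `CmeRepro` are hypothetical and do not mirror Flink's actual classes. In the real race the timing is nondeterministic, which is why the sketch injects the removal from inside the filter.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an unguarded "available slots" map.
final class UnguardedSlotMap {
    private final Map<String, String> availableSlots = new HashMap<>(); // slotId -> host

    void add(String slotId, String host) {
        availableSlots.put(slotId, host);
    }

    void release(String slotId) {
        availableSlots.remove(slotId);
    }

    // Short-circuiting search over the values, like the findFirst() frames in the trace.
    // onEachVisit runs once per visited value, which lets us interleave a modification.
    Optional<String> findFirstOnHost(String host, Runnable onEachVisit) {
        return availableSlots.values().stream()
                .filter(h -> {
                    onEachVisit.run();
                    return h.equals(host);
                })
                .findFirst();
    }
}

final class CmeRepro {
    // Returns true if the interleaved release triggered a ConcurrentModificationException.
    static boolean reproduce() {
        UnguardedSlotMap pool = new UnguardedSlotMap();
        pool.add("slot-1", "host-a");
        pool.add("slot-2", "host-b");
        pool.add("slot-3", "host-c");
        try {
            // Simulates a direct release landing while the allocation search iterates.
            pool.findFirstOnHost("no-such-host", () -> pool.release("slot-2"));
            return false;
        } catch (ConcurrentModificationException e) {
            // HashMap's spliterator detects the modCount change after the element is consumed.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentModificationException thrown: " + reproduce()); // prints "... true"
    }
}
```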

[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly

2019-01-17 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-11375:
--
Fix Version/s: 1.8.0

> Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly
>
>                 Key: FLINK-11375
>                 URL: https://issues.apache.org/jira/browse/FLINK-11375
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.7.1
>            Reporter: shuai.xu
>            Priority: Major
>             Fix For: 1.8.0
>
>
> In SlotPool, AvailableSlots is not guarded by a lock, so all access to it 
> should happen in the main thread of SlotPool. For this reason, all public 
> methods are called through SlotPoolGateway, except releaseSlot, which 
> SlotSharingManager calls directly. This direct call may cause a 
> ConcurrentModificationException.
> 
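A hypothetical fail-fast variant of the same rule: instead of letting an off-thread call corrupt an iteration later, every access checks that it runs on the owning thread and throws immediately otherwise, which would point straight at a direct `releaseSlot` call. `MainThreadChecker`, `GuardedSlots` and `GuardDemo` are illustrative names, not Flink classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard: remembers the owning thread and rejects calls from any other.
final class MainThreadChecker {
    private final Thread mainThread;

    MainThreadChecker(Thread mainThread) {
        this.mainThread = mainThread;
    }

    void assertRunningInMainThread() {
        Thread current = Thread.currentThread();
        if (current != mainThread) {
            throw new IllegalStateException(
                    "Expected main thread " + mainThread.getName() + " but was " + current.getName());
        }
    }
}

final class GuardedSlots {
    private final MainThreadChecker checker;
    private final Map<String, String> availableSlots = new HashMap<>(); // slotId -> host

    GuardedSlots(Thread owner) {
        this.checker = new MainThreadChecker(owner);
    }

    void releaseSlot(String slotId, String host) {
        checker.assertRunningInMainThread(); // fails before the map is touched
        availableSlots.put(slotId, host);
    }

    int size() {
        checker.assertRunningInMainThread();
        return availableSlots.size();
    }
}

final class GuardDemo {
    // Calls releaseSlot from a foreign thread and returns the exception it raised, or null.
    static Throwable callFromOtherThread(GuardedSlots slots) throws InterruptedException {
        Throwable[] error = new Throwable[1];
        Thread other = new Thread(() -> {
            try {
                slots.releaseSlot("slot-2", "host-b");
            } catch (IllegalStateException e) {
                error[0] = e;
            }
        }, "other-thread");
        other.start();
        other.join(); // join() makes error[0] visible to this thread
        return error[0];
    }
}
```

Such a check costs one thread comparison per call and is useful in tests, where it turns an intermittent `ConcurrentModificationException` into a deterministic failure at the offending call site.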

[jira] [Updated] (FLINK-11375) Concurrent modification to slot pool due to SlotSharingManager releaseSlot directly

2019-01-16 Thread shuai.xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu updated FLINK-11375:
-
Description: 
In SlotPool, AvailableSlots is not guarded by a lock, so all access to it should 
happen in the main thread of SlotPool. For this reason, all public methods are 
called through SlotPoolGateway, except releaseSlot, which SlotSharingManager 
calls directly. This direct call may cause a ConcurrentModificationException.
