[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-07-21 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156363#comment-17156363
 ] 

Till Rohrmann edited comment on FLINK-11143 at 7/21/20, 6:49 AM:
-

[~trohrmann] I am seeing a similar problem *when trying unaligned checkpoint 
with 1.11.0*. The Flink job actually started fine. We didn't see this 
AskTimeoutException thrown during job submission in without unaligned 
checkpoint (1.10 or 1.11).

Some more context about the app
 * a large-state stream join app (a few TBs)
 * parallelism 1,440
 * number of containers: 180
 * Cores per container: 12
 * TM_TASK_SLOTS: 8
 * akka.ask.timeout: 120 s
 * heartbeat.timeout: 12
 * web.timeout: 6 (also tried larger values like 300,000 or 600,000 without 
any difference)

I will send you the log files (with DEBUG level) in an email offline. Thanks a 
lot for your help in advance!
{code:java}
\"errors\":[\"Internal server error.\",\" AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: flink-job-timeline.PNG
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-07-14 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156363#comment-17156363
 ] 

Steven Zhen Wu edited comment on FLINK-11143 at 7/14/20, 5:38 PM:
--

[~trohrmann] I am seeing a similar problem *when trying unaligned checkpoint 
with 1.11.0*. The Flink job actually started fine. We didn't see this 
AskTimeoutException thrown during job submission in without unaligned 
checkpoint (1.10 or 1.11).

Some more context about the app
 * a large-state stream join app (a few TBs)
 * parallelism 1,440
 * number of containers: 180
 * Cores per container: 12
 * TM_TASK_SLOTS: 8
 * akka.ask.timeout: 120 s
 * heartbeat.timeout: 12
 * web.timeout: 6 (also tried larger values like 300,000 or 600,000 without 
any difference)

I will send you the log files (with DEBUG level) in an email offline. Thanks a 
lot for your help in advance!
{code:java}
\"errors\":[\"Internal server error.\",\" AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: flink-job-timeline.PNG
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-07-14 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156363#comment-17156363
 ] 

Steven Zhen Wu edited comment on FLINK-11143 at 7/14/20, 5:38 PM:
--

[~trohrmann] I am seeing a similar problem *when trying unaligned checkpoint 
with 1.11.0*. The Flink job actually started fine. We didn't see this 
AskTimeoutException thrown during job submission in without unaligned 
checkpoint (1.10 or 1.11).

Some more context about the app
 * a large-state stream join app (a few TBs)
 * parallelism 1,440
 * number of containers: 180
 * Cores per container: 12
 * TM_TASK_SLOTS: 8
 * akka.ask.timeout: 120 s
 * heartbeat.timeout: 12
 * web.timeout: 6 (also tried larger values like 300,000 or 600,000 without 
any difference)

I will send you the log files (with DEBUG level) in an email offline. Thanks a 
lot for your help in advance!
{code:java}
\"errors\":[\"Internal server error.\",\" AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: flink-job-timeline.PNG
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-07-14 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156363#comment-17156363
 ] 

Steven Zhen Wu edited comment on FLINK-11143 at 7/14/20, 5:37 PM:
--

[~trohrmann] I am seeing a similar problem *when trying unaligned checkpoint 
with 1.11.0*. The Flink job actually started fine. We didn't see this 
AskTimeoutException thrown during job submission in without unaligned 
checkpoint (1.10 or 1.11).

Some more context about the app
 * a large-state stream join app (a few TBs)
 * parallelism 1,440
 * number of containers: 180
 * Cores per container: 12
 * TM_TASK_SLOTS: 8
 * akka.ask.timeout: 120 s
 * heartbeat.timeout: 12
 * web.timeout: 6 (also tried larger values like 300,000 or 600,000 without 
any difference)

I will send you the log files (with DEBUG level) in an email offline. Thanks a 
lot for your help in advance!
{code:java}
\"errors\":[\"Internal server error.\",\" AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: flink-job-timeline.PNG
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-07-12 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156363#comment-17156363
 ] 

Steven Zhen Wu edited comment on FLINK-11143 at 7/12/20, 6:41 PM:
--

[~trohrmann] I am seeing a similar problem *when trying unaligned checkpoint 
with 1.11.0*. The Flink job actually started fine. We didn't see this 
AskTimeoutException thrown during job submission in without unaligned 
checkpoint (1.10 or 1.11).

Some more context about the app
 * a large-state stream join app (a few TBs)
 * parallelism 1,440
 * number of containers: 180
 * Cores per container: 12
 * TM_TASK_SLOTS: 8
 * akka.ask.timeout: 120 s
 * heartbeat.timeout: 12
 * web.timeout: 6 (also tried larger values like 300,000 or 600,000 without 
any difference)

I will send you the log files (with DEBUG level) in an email offline. Thanks a 
lot for your help in advance!
{code:java}
\"errors\":[\"Internal server error.\",\" AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: flink-job-timeline.PNG
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-07-12 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156363#comment-17156363
 ] 

Steven Zhen Wu edited comment on FLINK-11143 at 7/12/20, 6:39 PM:
--

[~trohrmann] I am seeing a similar problem *when trying unaligned checkpoint 
with 1.11.0*. The Flink job actually started fine. We didn't see this 
AskTimeoutException thrown during job submission in without unaligned 
checkpoint (1.10 or 1.11).

Some more context about the app
 * a large-state stream join app (a few TBs)
 * parallelism 1,440
 * number of containers: 180
 * Cores per container: 12
 * TM_TASK_SLOTS: 8
 * akka.ask.timeout: 120 s
 * heartbeat.timeout: 12
 * web.timeout: 6 (also tried larger values like 300,000 or 600,000 without 
any difference)

I will send you the log files (with DEBUG level) in an email offline. Thanks a 
lot for your help in advance!
{code:java}
\"errors\":[\"Internal server error.\",\" AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: flink-job-timeline.PNG
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-03-30 Thread Nikola (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071053#comment-17071053
 ] 

Nikola edited comment on FLINK-11143 at 3/30/20, 3:08 PM:
--

Hi [~trohrmann], once I gather them I will share them with you.
 In the meanwhile I can give you some more information:

We see this happening with both 1.8.2 and 1.9.2 when we run batch jobs. We have 
set both the timeout to a high number (akka.ask.timeout to 30s and web.timeout 
to 60s). We don't see taskmanager losing connection. 

Some of our configuration:
{code:java}
akka.ask.timeout: 30s
akka.lookup.timeout: 30s
akka.server-socket-worker-pool.pool-size-max: 20
rest.port: 8081
rest.server.max-content-length: 209715200
rest.server.numThreads: 10
web.timeout: 59000{code}
Once we submit a job the REST API and the Web UI (presumably as it's using the 
REST API) stop responding. After 10s they throw `Ask timed out on [Actor[..]] 
after 1ms. Sender [null] sent message of type 
"orc.apache.flink.runtime.rpc.messages.LocalFencedMessage"`
 However, it seems that the jobs are scheduled (based on what we can see in the 
logs) and they run fine. However the UI is only accessable after some period of 
time (sometimes 20s, sometimes 60s, sometimes 90s).

The issue for us is that when we create a new batch job through the API we 
expect to get the GUID of the job. Since the REST API throws an exception we 
can never get the GUID and we need to reschedule the job. 

I have an example picture where we don't get any response back, so we try to 
schedule the job again. We do this until we get a GUID for the job. In the end, 
we end up running 5+ jobs (which all of them are same job). You can see the 
start timestamps are about 20s apart 

!image-2020-03-30-19-06-14-553.png!


was (Author: nikobearrr):
Hi [~trohrmann], once I gather them I will share them with you.
 In the meanwhile I can give you some more information:

We see this happening with both 1.8.2 and 1.9.2 when we run batch jobs. We have 
set both the timeout to a high number (akka.ask.timeout to 30s and web.timeout 
to 60s). We don't see taskmanager losing connection. 

Some of our configuration:


{code:java}
akka.ask.timeout: 30s
akka.lookup.timeout: 30s
akka.server-socket-worker-pool.pool-size-max: 20
rest.port: 8081
rest.server.max-content-length: 209715200
rest.server.numThreads: 10{code}

 Once we submit a job the REST API and the Web UI (presumably as it's using the 
REST API) stop responding. After 10s they throw `Ask timed out on [Actor[..]] 
after 1ms. Sender [null] sent message of type 
"orc.apache.flink.runtime.rpc.messages.LocalFencedMessage"`
 However, it seems that the jobs are scheduled (based on what we can see in the 
logs) and they run fine. However the UI is only accessable after some period of 
time (sometimes 20s, sometimes 60s, sometimes 90s).

The issue for us is that when we create a new batch job through the API we 
expect to get the GUID of the job. Since the REST API throws an exception we 
can never get the GUID and we need to reschedule the job. 

I have an example picture where we don't get any response back, so we try to 
schedule the job again. We do this until we get a GUID for the job. In the end, 
we end up running 5+ jobs (which all of them are same job). You can see the 
start timestamps are about 20s apart 

!image-2020-03-30-19-06-14-553.png!

> AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: image-2020-03-30-19-06-14-553.png
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-03-30 Thread Nikola (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071053#comment-17071053
 ] 

Nikola edited comment on FLINK-11143 at 3/30/20, 3:07 PM:
--

Hi [~trohrmann], once I gather them I will share them with you.
 In the meanwhile I can give you some more information:

We see this happening with both 1.8.2 and 1.9.2 when we run batch jobs. We have 
set both the timeout to a high number (akka.ask.timeout to 30s and web.timeout 
to 60s). We don't see taskmanager losing connection. 

Some of our configuration:


{code:java}
akka.ask.timeout: 30s
akka.lookup.timeout: 30s
akka.server-socket-worker-pool.pool-size-max: 20
rest.port: 8081
rest.server.max-content-length: 209715200
rest.server.numThreads: 10{code}

 Once we submit a job the REST API and the Web UI (presumably as it's using the 
REST API) stop responding. After 10s they throw `Ask timed out on [Actor[..]] 
after 1ms. Sender [null] sent message of type 
"orc.apache.flink.runtime.rpc.messages.LocalFencedMessage"`
 However, it seems that the jobs are scheduled (based on what we can see in the 
logs) and they run fine. However the UI is only accessable after some period of 
time (sometimes 20s, sometimes 60s, sometimes 90s).

The issue for us is that when we create a new batch job through the API we 
expect to get the GUID of the job. Since the REST API throws an exception we 
can never get the GUID and we need to reschedule the job. 

I have an example picture where we don't get any response back, so we try to 
schedule the job again. We do this until we get a GUID for the job. In the end, 
we end up running 5+ jobs (which all of them are same job). You can see the 
start timestamps are about 20s apart 

!image-2020-03-30-19-06-14-553.png!


was (Author: nikobearrr):
Hi [~trohrmann], once I gather them I will share them with you.
In the meanwhile I can give you some more information:

We see this happening with both 1.8.2 and 1.9.2 when we run batch jobs. We have 
set both the timeout to a high number (akka.ask.timeout to 30s and web.timeout 
to 60s). We don't see taskmanager losing connection. 
Once we submit a job the REST API and the Web UI (presumably as it's using the 
REST API) stop responding. After 10s they throw `Ask timed out on [Actor[..]] 
after 1ms. Sender [null] sent message of type 
"orc.apache.flink.runtime.rpc.messages.LocalFencedMessage"`
However, it seems that the jobs are scheduled (based on what we can see in the 
logs) and they run fine. However the UI is only accessable after some period of 
time (sometimes 20s, sometimes 60s, sometimes 90s). 

The issue for us is that when we create a new batch job through the API we 
expect to get the GUID of the job. Since the REST API throws an exception we 
can never get the GUID and we need to reschedule the job. 

I have an example picture where we don't get any response back, so we try to 
schedule the job again. We do this until we get a GUID for the job. In the end, 
we end up running 5+ jobs (which all of them are same job). You can see the 
start timestamps are about 20s apart 

!image-2020-03-30-19-06-14-553.png!

> AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
> Attachments: image-2020-03-30-19-06-14-553.png
>
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2020-03-04 Thread Bowen Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051574#comment-17051574
 ] 

Bowen Li edited comment on FLINK-11143 at 3/4/20, 8:02 PM:
---

I tried on 1.10 and 1.11, and can confirm the issue still exists by running a 
local yarn cluster.

 

[~trohrmann] [~chesnay]  can you help to fix or triage this bug?


was (Author: phoenixjiangnan):
I tried on 1.10 and 1.11, and can confirm the issue still exists.

 

[~trohrmann] [~chesnay]  can you help to fix or triage this bug?

> AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2, 1.10.0
>Reporter: Alex Vinnik
>Priority: Critical
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
> at java.lang.Thread.run(Thread.java:748)\nCaused by: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2019-07-23 Thread Akshay Iyangar (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891375#comment-16891375
 ] 

Akshay Iyangar edited comment on FLINK-11143 at 7/23/19 9:01 PM:
-

Hi 
 Is it working on flink 1.8 ? we tried the setting and still seem to hit it 
with the deafult timeout of 1ms.

 
{code:java}
level":"WARN","level_value":3,"stack_trace":"java.util.concurrent.CompletionException:
 akka.pattern.AskTimeoutException: Ask timed out on 
[Actorakka://flink/user/dispatcher#-1731728438] after [1 ms]. Sender[null] 
sent message of type 
\"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".\n\tat 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)\n\tat
 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)\n\tat
 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)\n\tat
 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)\n\tat
 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)\n\tat
 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)\n\tat
 
{code}


was (Author: aiyangar):
Hi 
Is it working on flink 1.8 ? we tried the setting and still seem to hit it with 
the deafult timeout of 1ms.

 

```

level":"WARN","level_value":3,"stack_trace":"java.util.concurrent.CompletionException:
 akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/dispatcher#-1731728438]] after [1 ms]. 
Sender[null] sent message of type 
\"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".\n\tat 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)\n\tat
 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)\n\tat
 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)\n\tat
 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)\n\tat
 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)\n\tat
 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)\n\tat
 
```

> AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2
>Reporter: Alex Vinnik
>Priority: Major
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2019-03-14 Thread Alex Vinnik (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793092#comment-16793092
 ] 

Alex Vinnik edited comment on FLINK-11143 at 3/14/19 9:51 PM:
--

[~Zentol] is there a repo with 1.8/1.9 build artifacts I can compile against? 
Having trouble building source code both master and release-1.8
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-shade-plugin:3.0.0:shade (shade-flink) on 
project flink-s3-fs-base: Error creating shaded jar: duplicate entry: 
META-INF/services/org.apache.flink.fs.s3base.shaded.com.fasterxml.jackson.core.ObjectCodec
 -> [Help 1] {noformat}


was (Author: alvinnik):
[~Zentol] is there a repo with 1.8/1.9 build artifact I can compile against? 
Having trouble building source code both master and release-1.8
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-shade-plugin:3.0.0:shade (shade-flink) on 
project flink-s3-fs-base: Error creating shaded jar: duplicate entry: 
META-INF/services/org.apache.flink.fs.s3base.shaded.com.fasterxml.jackson.core.ObjectCodec
 -> [Help 1] {noformat}

> AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.2
>Reporter: Alex Vinnik
>Priority: Major
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> 

[jira] [Comment Edited] (FLINK-11143) AskTimeoutException is thrown during job submission and completion

2019-02-27 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779070#comment-16779070
 ] 

Chesnay Schepler edited comment on FLINK-11143 at 2/27/19 9:51 AM:
---

The MiniCluster should respect the rpc timeout properly ({{akka.ask.timeout}}) 
on the master and release-1.7 branch. [~alvinnik] could you try again on either 
of these branches to check whether the problem persists/your job runs 
successfully?


was (Author: zentol):
The MiniCluster should respect the rpc timeout properly (\{{akka.ask.timeout}}) 
on the master and release-1.7 branch. [~alvinnik] could you try again on either 
of these branches to check whether the problem persists, and if not whether 
your job runs successfully?

> AskTimeoutException is thrown during job submission and completion
> --
>
> Key: FLINK-11143
> URL: https://issues.apache.org/jira/browse/FLINK-11143
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.6.2
>Reporter: Alex Vinnik
>Priority: Major
>
> For more details please see the thread
> [http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3cc2fb26f9-1410-4333-80f4-34807481b...@gmail.com%3E]
> On submission 
> 2018-12-12 02:28:31 ERROR JobsOverviewHandler:92 - Implementation error: 
> Unhandled exception.
>  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#225683351|#225683351]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>  at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>  at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>  at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>  at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>  at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>  at java.lang.Thread.run(Thread.java:748)
>  
> On completion
>  
> {"errors":["Internal server error."," side:\njava.util.concurrent.CompletionException: 
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#105638574]] after [1 ms]. 
> Sender[null] sent message of type 
> \"org.apache.flink.runtime.rpc.messages.LocalFencedMessage\".
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:772)
> at akka.dispatch.OnComplete.internal(Future.scala:258)
> at akka.dispatch.OnComplete.internal(Future.scala:256)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
> at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
> at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
> at 
> akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
> at