[jira] [Resolved] (FLINK-4261) Setup atomic deployment of snapshots
[ https://issues.apache.org/jira/browse/FLINK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-4261.
---------------------------------------
    Resolution: Unresolved

Unfortunately, we can't deploy snapshots atomically using the Nexus repository. The staging process that leads to an atomic deployment is only designed to work for releases. The best we can do is retry deploying artifacts in case of failures.

> Setup atomic deployment of snapshots
> ------------------------------------
>
>                 Key: FLINK-4261
>                 URL: https://issues.apache.org/jira/browse/FLINK-4261
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System, release
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0
>
> Users have reported that our nightly snapshots become inconsistent from time to time. This happens when the upload to the snapshot repository fails during the deployment process. Maven doesn't support atomic deployment but deploys artifacts one after another, directly after installing them in the local repository. If the build fails at any time, no changes are rolled back.
> This problem has been solved for Nexus repositories. For releases, we already take advantage of atomic deployments using staging repositories. Nexus repositories support this even without using a special Maven plugin.
> For releases, we have to use the Web UI to close and release staging repositories. For snapshots, this should be automated. Most importantly, the changes shouldn't alter anything for our release process.
> I suggest using the {{nexus-staging-maven-plugin}}, which essentially replaces the standard Maven deploy plugin. It can be set up to auto-close and auto-release snapshot staging repositories. For releases, it will be set up to never auto-close nor auto-release, which keeps our existing release process.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
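[Editor's note] As a sketch, the plugin setup proposed in the issue above might look roughly like this in the parent {{pom.xml}}. The version, {{serverId}}, and {{nexusUrl}} values are assumptions for illustration, not the configuration Flink actually committed:

```xml
<!-- Hypothetical sketch of FLINK-4261: replacing the standard
     maven-deploy-plugin with the nexus-staging-maven-plugin. -->
<plugin>
  <groupId>org.sonatype.plugins</groupId>
  <artifactId>nexus-staging-maven-plugin</artifactId>
  <version>1.6.7</version>
  <extensions>true</extensions>
  <configuration>
    <!-- serverId/nexusUrl are assumptions for the ASF repository. -->
    <serverId>apache.releases.https</serverId>
    <nexusUrl>https://repository.apache.org/</nexusUrl>
    <!-- For releases: never auto-close nor auto-release, keeping the
         manual close/release step in the Nexus Web UI. -->
    <skipStagingRepositoryClose>true</skipStagingRepositoryClose>
    <autoReleaseAfterClose>false</autoReleaseAfterClose>
  </configuration>
</plugin>
```

As the resolution notes, the staged (atomic) path only applies to releases; snapshot deployments fall through to a plain non-atomic upload, which is why retrying was the eventual workaround.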
[jira] [Commented] (FLINK-4261) Setup atomic deployment of snapshots
[ https://issues.apache.org/jira/browse/FLINK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393894#comment-15393894 ]

Maximilian Michels commented on FLINK-4261:
-------------------------------------------
Retry via cd232e683f7aa6d7660ca2d545ba1534435e1ab1
[jira] [Closed] (FLINK-4152) TaskManager registration exponential backoff doesn't work
[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels closed FLINK-4152.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

Fixed via 2648bc1a5a5faed8c2061bcab40a8949fd02751c

> TaskManager registration exponential backoff doesn't work
> ---------------------------------------------------------
>
>                 Key: FLINK-4152
>                 URL: https://issues.apache.org/jira/browse/FLINK-4152
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, TaskManager, YARN Client
>            Reporter: Robert Metzger
>            Assignee: Till Rohrmann
>             Fix For: 1.1.0
>
>         Attachments: logs.tgz
>
> While testing Flink 1.1, I've found that the TaskManagers log many messages when registering at the JobManager.
> This is the log file: https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294
> It logs more than 3000 messages in less than a minute. I don't think that this is the expected behavior.
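[Editor's note] The behavior the issue expects can be sketched as a capped exponential backoff between registration attempts. This is a minimal illustration of the pattern, not the TaskManager's actual registration code; the class name and delay constants are assumptions:

```java
// Minimal sketch of capped exponential backoff for registration
// attempts. The initial/maximum delays are illustrative values,
// not the constants used by Flink's TaskManager.
public class RegistrationBackoff {
    static final long INITIAL_MILLIS = 500;
    static final long MAX_MILLIS = 30_000;

    /** Delay before the given (0-based) registration attempt. */
    static long delayMillis(int attempt) {
        // Double the delay per attempt; clamp the shift to avoid
        // overflow, then cap the result at MAX_MILLIS.
        long delay = INITIAL_MILLIS << Math.min(attempt, 30);
        return Math.min(delay, MAX_MILLIS);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println("attempt " + i + ": " + delayMillis(i) + " ms");
        }
    }
}
```

With a working backoff like this, repeated registration messages quickly spread out to the cap instead of flooding the log with thousands of lines per minute.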
[jira] [Created] (FLINK-4261) Setup atomic deployment of snapshots
Maximilian Michels created FLINK-4261:
--------------------------------------
             Summary: Setup atomic deployment of snapshots
                 Key: FLINK-4261
                 URL: https://issues.apache.org/jira/browse/FLINK-4261
             Project: Flink
          Issue Type: Bug
          Components: Build System, release
            Reporter: Maximilian Michels
            Assignee: Maximilian Michels
             Fix For: 1.1.0

Users have reported that our nightly snapshots become inconsistent from time to time. This happens when the upload to the snapshot repository fails during the deployment process. Maven doesn't support atomic deployment but deploys artifacts one after another, directly after installing them in the local repository. If the build fails at any time, no changes are rolled back.

This problem has been solved for Nexus repositories. For releases, we already take advantage of atomic deployments using staging repositories. Nexus repositories support this even without using a special Maven plugin.

For releases, we have to use the Web UI to close and release staging repositories. For snapshots, this should be automated. Most importantly, the changes shouldn't alter anything for our release process.

I suggest using the {{nexus-staging-maven-plugin}}, which essentially replaces the standard Maven deploy plugin. It can be set up to auto-close and auto-release snapshot staging repositories. For releases, it will be set up to never auto-close nor auto-release, which keeps our existing release process.
[jira] [Updated] (FLINK-4261) Setup atomic deployment of snapshots
[ https://issues.apache.org/jira/browse/FLINK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels updated FLINK-4261:
--------------------------------------
    External issue ID:   (was: INFRA-12320)
[jira] [Updated] (FLINK-4236) Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up
[ https://issues.apache.org/jira/browse/FLINK-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels updated FLINK-4236:
--------------------------------------
    Fix Version/s: 1.1.0

> Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-4236
>                 URL: https://issues.apache.org/jira/browse/FLINK-4236
>             Project: Flink
>          Issue Type: Bug
>          Components: Job-Submission
>    Affects Versions: 1.0.3
>            Reporter: Gary Yao
>             Fix For: 1.1.0
>
> The Flink Dashboard stops showing the list of uploaded jars on the job submission page if a jar is uploaded for which the main method cannot be looked up. The HTTP call returns with code 500. See the attached stacktrace:
> {code}
> java.lang.RuntimeException: Failed to fetch jar list: Could not look up the main(String[]) method from the class de.zalando.[...]]: org/shaded/apache/flink/streaming/api/functions/source/SourceFunction
>     at org.apache.flink.runtime.webmonitor.handlers.JarListHandler.handleRequest(JarListHandler.java:122)
>     at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:135)
>     at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.channelRead0(RuntimeMonitorHandler.java:112)
>     at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.channelRead0(RuntimeMonitorHandler.java:60)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
>     at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
>     at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:104)
>     at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>     at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Could not look up the main(String[]) method from the class de.zalando.[...]]: org/shaded/apache/flink/streaming/api/functions/source/SourceFunction
>     at org.apache.flink.client.program.PackagedProgram.hasMainMethod(PackagedProgram.java:479)
>     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:216)
>     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:147)
>     at org.apache.flink.runtime.webmonitor.handlers.JarListHandler.handleRequest(JarListHandler.java:103)
>     ... 30 more
> Caused by: java.lang.NoClassDefFoundError: org/shaded/apache/flink/streaming/api/functions/source/SourceFunction
>     at java.lang.Class.ge
[jira] [Resolved] (FLINK-3753) KillerWatchDog should not use kill on toKill thread
[ https://issues.apache.org/jira/browse/FLINK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-3753.
---------------------------------------
    Resolution: Won't Fix

> KillerWatchDog should not use kill on toKill thread
> ---------------------------------------------------
>
>                 Key: FLINK-3753
>                 URL: https://issues.apache.org/jira/browse/FLINK-3753
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Minor
>
> {code}
> // this is harsh, but this watchdog is a last resort
> if (toKill.isAlive()) {
>     toKill.stop();
> }
> {code}
> stop() is deprecated.
> See: https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads
[jira] [Commented] (FLINK-3753) KillerWatchDog should not use kill on toKill thread
[ https://issues.apache.org/jira/browse/FLINK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385754#comment-15385754 ]

Maximilian Michels commented on FLINK-3753:
-------------------------------------------
Thanks for reporting! It actually states in the class why the {{stop()}} method is used. I think we have to work around some issues with the Kafka fetcher. I would close this issue for now because we want to use {{stop()}} for the KillerWatchDog (scary name!).
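[Editor's note] The "last resort" pattern under discussion can be sketched as: interrupt first, wait briefly, and only then fall back to the deprecated {{stop()}}. This illustrates the general watchdog idea, not the actual KillerWatchDog source; the class name, method name, and timeout are assumptions:

```java
// Sketch of a last-resort watchdog: request a cooperative shutdown
// via interrupt(), and only hard-kill with the deprecated stop() if
// the thread refuses to die. Names and timeouts are illustrative.
public class WatchdogSketch {

    @SuppressWarnings("deprecation")
    static void killWithGrace(Thread toKill, long graceMillis) throws InterruptedException {
        toKill.interrupt();       // cooperative shutdown request
        toKill.join(graceMillis); // give the thread time to exit
        // this is harsh, but this watchdog is a last resort
        if (toKill.isAlive()) {
            toKill.stop();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // simulates a blocked worker
            } catch (InterruptedException ignored) {
                // cooperative exit on interrupt
            }
        });
        worker.start();
        killWithGrace(worker, 2_000);
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```

The issue's concern is the {{stop()}} branch: a thread that swallows interrupts (as the Kafka fetcher apparently could) never reaches the cooperative exit, which is exactly why the class keeps the hard kill as a fallback.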
[jira] [Closed] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels closed FLINK-4166.
-------------------------------------
       Resolution: Fixed
         Assignee: Stefan Richter
    Fix Version/s: 1.1.0

Fixed via 082d87e5139a10fc34ddb3072d15c1a4747117e7

> Generate automatic different namespaces in Zookeeper for Flink applications
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-4166
>                 URL: https://issues.apache.org/jira/browse/FLINK-4166
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.0.3
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>             Fix For: 1.1.0
>
> We should automatically generate a different namespace per Flink application in Zookeeper to avoid interference between different applications that refer to the same Zookeeper entries.
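[Editor's note] The proposal amounts to deriving a distinct ZooKeeper root per application. A minimal sketch, assuming a {{/flink/<application-id>}} layout (the path scheme actually chosen by the fix may differ); with Apache Curator, the resulting path could be handed to {{CuratorFrameworkFactory.Builder#namespace(String)}}:

```java
// Sketch of FLINK-4166's idea: a distinct ZooKeeper namespace per
// Flink application, so two clusters never touch the same znodes.
// The "/flink/<application-id>" layout is an assumption.
public class ZkNamespaces {

    /** Builds a per-application root path under the given cluster root. */
    static String namespaceFor(String clusterRoot, String applicationId) {
        if (applicationId == null || applicationId.isEmpty()) {
            throw new IllegalArgumentException("application id required");
        }
        // Strip a trailing slash so we never produce "//" in the path.
        String root = clusterRoot.endsWith("/")
                ? clusterRoot.substring(0, clusterRoot.length() - 1)
                : clusterRoot;
        return root + "/" + applicationId;
    }

    public static void main(String[] args) {
        System.out.println(namespaceFor("/flink", "application_1468939140000_0001"));
    }
}
```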
[jira] [Resolved] (FLINK-4199) Misleading messages by CLI upon job submission
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-4199.
---------------------------------------
    Resolution: Fixed

Fixed via 17589d454d00efa43cdf6116ea29ff4f513b6f20

> Misleading messages by CLI upon job submission
> ----------------------------------------------
>
>                 Key: FLINK-4199
>                 URL: https://issues.apache.org/jira/browse/FLINK-4199
>             Project: Flink
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.1.0
>            Reporter: Kostas Kloudas
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0
>
> Trying to submit a job jar from the client to a non-existing cluster gives the following messages. In particular, the first and last lines ("Cluster retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" and "Job has been submitted with") are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job completion.
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}
[jira] [Closed] (FLINK-4223) Rearrange scaladoc and javadoc for Scala API
[ https://issues.apache.org/jira/browse/FLINK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels closed FLINK-4223.
-------------------------------------
    Resolution: Duplicate

> Rearrange scaladoc and javadoc for Scala API
> --------------------------------------------
>
>                 Key: FLINK-4223
>                 URL: https://issues.apache.org/jira/browse/FLINK-4223
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Chiwan Park
>            Priority: Minor
>              Labels: easyfix, newbie
>
> Currently, some documentation for the Scala APIs (Gelly Scala API, FlinkML, Streaming Scala API) is generated as javadoc rather than scaladoc.
[jira] [Commented] (FLINK-4223) Rearrange scaladoc and javadoc for Scala API
[ https://issues.apache.org/jira/browse/FLINK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383846#comment-15383846 ]

Maximilian Michels commented on FLINK-4223:
-------------------------------------------
Seems like a duplicate of FLINK-3710.
[jira] [Commented] (FLINK-4199) Misleading messages by CLI upon job submission
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382423#comment-15382423 ]

Maximilian Michels commented on FLINK-4199:
-------------------------------------------
This is how it looks now for successful executions:

{noformat}
Cluster configuration: Standalone cluster with JobManager at localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Submitting job with JobID: a56b184080d9755d3f2d40c7d9092b31. Waiting for job completion.
07/18/2016 16:42:23     Job execution switched to status RUNNING.
07/18/2016 16:42:23     Source: Collection Source -> Flat Map(1/1) switched to SCHEDULED
07/18/2016 16:42:23     Source: Collection Source -> Flat Map(1/1) switched to DEPLOYING
07/18/2016 16:42:23     Keyed Aggregation -> Sink: Unnamed(1/1) switched to SCHEDULED
07/18/2016 16:42:23     Keyed Aggregation -> Sink: Unnamed(1/1) switched to DEPLOYING
07/18/2016 16:42:23     Keyed Aggregation -> Sink: Unnamed(1/1) switched to RUNNING
07/18/2016 16:42:23     Source: Collection Source -> Flat Map(1/1) switched to RUNNING
07/18/2016 16:42:23     Source: Collection Source -> Flat Map(1/1) switched to FINISHED
07/18/2016 16:42:23     Keyed Aggregation -> Sink: Unnamed(1/1) switched to FINISHED
07/18/2016 16:42:23     Job execution switched to status FINISHED.
Program execution finished
Job with JobID a56b184080d9755d3f2d40c7d9092b31 has finished.
Job Runtime: 25 ms
{noformat}

This is how it looks when the JobManager is not available:

{noformat}
Cluster configuration: Standalone cluster with JobManager at localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Submitting job with JobID: 20d070cf6a4289df4e5045c9c52cc47b. Waiting for job completion.

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.
...
{noformat}
[jira] [Commented] (FLINK-4199) Misleading messages by CLI upon job submission
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382416#comment-15382416 ]

Maximilian Michels commented on FLINK-4199:
-------------------------------------------
Thanks for reporting!

{quote}
1) Cluster retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123
{quote}

I see how this is misleading. We don't do any communication with the cluster until we submit the job, so this should read more like "Cluster configuration used: ...". We don't want any communication with the cluster before the user jar has been executed interactively, because the jar may not even execute.

{quote}
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.
Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
{quote}

{quote}
07/12/2016 16:18:58     Job execution switched to status FINISHED.
Job has been submitted with JobID 477429b836247909dd428a6cba5b923b
{quote}

These are both artifacts of Flink not returning proper results for interactive programs. I'm changing that.
[jira] [Updated] (FLINK-4199) Misleading messages by CLI upon job submission
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels updated FLINK-4199:
--------------------------------------
    Summary: Misleading messages by CLI upon job submission  (was: Wrong client behavior when submitting job to non-existing cluster)
[jira] [Updated] (FLINK-4199) Wrong client behavior when submitting job to non-existing cluster
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels updated FLINK-4199:
--------------------------------------
    Fix Version/s: 1.1.0
[jira] [Updated] (FLINK-4199) Wrong client behavior when submitting job to non-existing cluster
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels updated FLINK-4199:
--------------------------------------
    Affects Version/s: 1.1.0
[jira] [Assigned] (FLINK-4199) Wrong client behavior when submitting job to non-existing cluster
[ https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels reassigned FLINK-4199:
-----------------------------------------
    Assignee: Maximilian Michels
[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications
[ https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382162#comment-15382162 ]

Maximilian Michels commented on FLINK-4166:
-------------------------------------------
[~srichter] Are you planning to work on a fix?
[jira] [Resolved] (FLINK-4144) Yarn properties file: replace hostname/port with Yarn application id
[ https://issues.apache.org/jira/browse/FLINK-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels resolved FLINK-4144.
---------------------------------------
    Resolution: Fixed

Resolved with 7ab6837fde3adb588273ef6bb8f4f7a215fe9c03

> Yarn properties file: replace hostname/port with Yarn application id
> --------------------------------------------------------------------
>
>                 Key: FLINK-4144
>                 URL: https://issues.apache.org/jira/browse/FLINK-4144
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN Client
>    Affects Versions: 1.0.3
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0
>
> We should use the application id instead of the host/port. The hostname and port of the JobManager can change (HA). Also, it is not unique depending on the network configuration.
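[Editor's note] The resolved change can be sketched as follows — a hypothetical illustration of writing the Yarn application id (rather than a host/port pair) into the session properties file; the {{applicationID}} key name is an assumption, not necessarily Flink's actual key:

```java
import java.io.StringWriter;
import java.util.Properties;

// Sketch of the FLINK-4144 idea: the Yarn properties file stores the
// application id, which stably identifies the cluster, instead of a
// JobManager hostname/port that can change under HA failover.
public class YarnPropertiesSketch {

    /** Renders the session properties using the application id as the handle. */
    static String render(String applicationId) throws Exception {
        Properties props = new Properties();
        // Before the change: something like jobManagerAddress=host:port,
        // which is neither stable under HA nor unique on every network.
        props.setProperty("applicationID", applicationId);
        StringWriter out = new StringWriter();
        props.store(out, "Flink Yarn session properties (sketch)");
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render("application_1468939140000_0001"));
    }
}
```

A client reading this file can then look the current JobManager address up from Yarn by application id, which survives a JobManager failover.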
[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359399#comment-15359399 ] Maximilian Michels commented on FLINK-3675: --- Additional fix with 16cdb61225d78c822566e33013162fa3e40fa279 > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-4141) TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-4141. --- Resolution: Fixed Fix Version/s: 1.1.0 Resolved with f722b73772eb66cdb79a288300e38ff7026c7e1f > TaskManager failures not always recover when killed during an > ApplicationMaster failure in HA mode on Yarn > -- > > Key: FLINK-4141 > URL: https://issues.apache.org/jira/browse/FLINK-4141 > Project: Flink > Issue Type: Bug >Affects Versions: 1.0.3 >Reporter: Stefan Richter >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > High availability on Yarn often fails to recover in the following test > scenario: > 1. Kill application master process. > 2. Then, while application master is recovering, randomly kill several task > managers (with some delay). > After the application master has recovered, not all the killed task managers are > brought back and no further attempts are made to restart them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4144) Yarn properties file: replace hostname/port with Yarn application id
Maximilian Michels created FLINK-4144: - Summary: Yarn properties file: replace hostname/port with Yarn application id Key: FLINK-4144 URL: https://issues.apache.org/jira/browse/FLINK-4144 Project: Flink Issue Type: Bug Components: YARN Client Affects Versions: 1.0.3 Reporter: Maximilian Michels Assignee: Maximilian Michels Fix For: 1.1.0 We should use the application id instead of the host/port. The hostname and port of the JobManager can change (HA). Also, it is not unique depending on the network configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-4141) TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
[ https://issues.apache.org/jira/browse/FLINK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reassigned FLINK-4141: - Assignee: Maximilian Michels > TaskManager failures not always recover when killed during an > ApplicationMaster failure in HA mode on Yarn > -- > > Key: FLINK-4141 > URL: https://issues.apache.org/jira/browse/FLINK-4141 > Project: Flink > Issue Type: Bug >Affects Versions: 1.0.3 >Reporter: Stefan Richter >Assignee: Maximilian Michels > > High availability on Yarn often fails to recover in the following test > scenario: > 1. Kill application master process. > 2. Then, while application master is recovering, randomly kill several task > managers (with some delay). > After the application master has recovered, not all the killed task managers are > brought back and no further attempts are made to restart them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-3675. --- Resolution: Fixed Fixed via 0483ba583c7790d13b8035c2916318a2b58c67d6 > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3667) Generalize client<->cluster communication
[ https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358960#comment-15358960 ] Maximilian Michels commented on FLINK-3667: --- Fixed via f9b52a3114a2114e6846091acf3abb294a49615b Additional fixes: 3b593632dd162d951281fab8a8ed8c6bc2b07b39 a3aea27983d23d48bbad92c400d4cd42f36fabd3 cfd48a6f510c937080df0918fcb05aa410885c29 8d589623d2c2d039b014bc8783bef25351ec36ce > Generalize client<->cluster communication > - > > Key: FLINK-3667 > URL: https://issues.apache.org/jira/browse/FLINK-3667 > Project: Flink > Issue Type: Improvement > Components: YARN Client >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > Here are some notes I took when inspecting the client<->cluster classes with > regard to future integration of other resource management frameworks in > addition to Yarn (e.g. Mesos). > {noformat} > 1 Cluster Client Abstraction > > 1.1 Status Quo > ── > 1.1.1 FlinkYarnClient > ╌ > • Holds the cluster configuration (Flink-specific and Yarn-specific) > • Contains the deploy() method to deploy the cluster > • Creates the Hadoop Yarn client > • Receives the initial job manager address > • Bootstraps the FlinkYarnCluster > 1.1.2 FlinkYarnCluster > ╌╌ > • Wrapper around the Hadoop Yarn client > • Queries cluster for status updates > • Life time methods to start and shutdown the cluster > • Flink specific features like shutdown after job completion > 1.1.3 ApplicationClient > ╌╌╌ > • Acts as a middle-man for asynchronous cluster communication > • Designed to communicate with Yarn, not used in Standalone mode > 1.1.4 CliFrontend > ╌ > • Deeply integrated with FlinkYarnClient and FlinkYarnCluster > • Constantly distinguishes between Yarn and Standalone mode > • Would be nice to have a general abstraction in place > 1.1.5 Client > > • Job submission and Job related actions, agnostic of resource framework > 1.2 Proposal > > 1.2.1 ClusterConfig (before: 
AbstractFlinkYarnClient) > ╌ > • Extensible cluster-agnostic config > • May be extended by specific cluster, e.g. YarnClusterConfig > 1.2.2 ClusterClient (before: AbstractFlinkYarnClient) > ╌ > • Deals with cluster (RM) specific communication > • Exposes framework agnostic information > • YarnClusterClient, MesosClusterClient, StandaloneClusterClient > 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster) > ╌ > • Basic interface to communicate with a running cluster > • Receives the ClusterClient for cluster-specific communication > • Should not have to care about the specific implementations of the > client > 1.2.4 ApplicationClient > ╌╌╌ > • Can be changed to work cluster-agnostic (first steps already in > FLINK-3543) > 1.2.5 CliFrontend > ╌ > • CliFrontend does never have to differentiate between different > cluster types after it has determined which cluster class to load. > • Base class handles framework agnostic command line arguments > • Pluggables for Yarn, Mesos handle specific commands > {noformat} > I would like to create/refactor the affected classes to set us up for a more > flexible client side resource management abstraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
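The proposed split between config, client, and running cluster can be sketched roughly as below. This is a hypothetical illustration only, not the actual refactoring: the class names follow the proposal notes above, while all fields and method signatures are invented for the example.

```java
// Hypothetical sketch of the proposed client-side abstraction. Names
// (ClusterConfig, ClusterClient, FlinkCluster, StandaloneClusterClient)
// come from the proposal; everything else is illustrative.

// Cluster-agnostic configuration; frameworks (e.g. Yarn) may subclass it.
class ClusterConfig {
    final String jobManagerAddress;
    ClusterConfig(String jobManagerAddress) { this.jobManagerAddress = jobManagerAddress; }
}

// Basic handle for a running cluster, unaware of the concrete client type.
interface FlinkCluster {
    String jobManagerAddress();
    void shutdown();
}

// Deals with framework-specific (RM) communication behind an agnostic interface.
interface ClusterClient {
    FlinkCluster deploy();
}

// Standalone variant: "deploying" merely wraps the already-running cluster,
// so the CliFrontend never needs to distinguish it from Yarn or Mesos.
class StandaloneClusterClient implements ClusterClient {
    private final ClusterConfig config;
    StandaloneClusterClient(ClusterConfig config) { this.config = config; }
    @Override public FlinkCluster deploy() {
        return new FlinkCluster() {
            @Override public String jobManagerAddress() { return config.jobManagerAddress; }
            @Override public void shutdown() { /* nothing to tear down in standalone mode */ }
        };
    }
}
```

With such interfaces, the CliFrontend would only choose which `ClusterClient` implementation to instantiate and then treat every cluster type uniformly.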
[jira] [Closed] (FLINK-4139) Yarn: Adjust parallelism and task slots correctly
[ https://issues.apache.org/jira/browse/FLINK-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4139. - Resolution: Fixed Fixed via 44b3bc45b382c1f2783e9c17dd76ea2e9bbb40ec > Yarn: Adjust parallelism and task slots correctly > - > > Key: FLINK-4139 > URL: https://issues.apache.org/jira/browse/FLINK-4139 > Project: Flink > Issue Type: Bug > Components: Client, YARN Client >Affects Versions: 1.1.0, 1.0.3 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > The Yarn CLI should handle the following situations correctly: > - The user specifies no parallelism -> parallelism is adjusted to #taskSlots > * #nodes. > - The user specifies parallelism but no #taskSlots or too few slots -> > #taskSlots are set such that they meet the parallelism. > This functionality has been present in Flink 1.0.x but there were some > glitches in the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4139) Yarn: Adjust parallelism and task slots correctly
Maximilian Michels created FLINK-4139: - Summary: Yarn: Adjust parallelism and task slots correctly Key: FLINK-4139 URL: https://issues.apache.org/jira/browse/FLINK-4139 Project: Flink Issue Type: Bug Components: Client, YARN Client Affects Versions: 1.0.3, 1.1.0 Reporter: Maximilian Michels Assignee: Maximilian Michels Priority: Minor Fix For: 1.1.0 The Yarn CLI should handle the following situations correctly: - The user specifies no parallelism -> parallelism is adjusted to #taskSlots * #nodes. - The user specifies parallelism but no #taskSlots or too few slots -> #taskSlots are set such that they meet the parallelism. This functionality has been present in Flink 1.0.x but there were some glitches in the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
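The two adjustment rules from the issue description amount to simple arithmetic, sketched below. This is not the actual Yarn CLI code, only an illustration of the intended behavior under the assumption that `#nodes` means the number of TaskManager containers:

```java
// Illustrative sketch of the two adjustment rules in FLINK-4139; the helper
// class and method are hypothetical, not the real CLI implementation.
class SlotAdjuster {
    // Returns {effectiveParallelism, slotsPerTaskManager}.
    // requestedParallelism == null models "user specified no parallelism".
    static int[] adjust(Integer requestedParallelism, int slotsPerTm, int taskManagers) {
        if (requestedParallelism == null) {
            // Rule 1: no parallelism given -> use every slot on every node.
            return new int[] { slotsPerTm * taskManagers, slotsPerTm };
        }
        int p = requestedParallelism;
        // Rule 2: raise slots per TaskManager (integer ceiling of p / taskManagers)
        // until the total slot count can carry the requested parallelism.
        int needed = Math.max(slotsPerTm, (p + taskManagers - 1) / taskManagers);
        return new int[] { p, needed };
    }
}
```

For example, 3 TaskManagers with 2 slots each and a requested parallelism of 10 would be bumped to 4 slots per TaskManager.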
[jira] [Comment Edited] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350953#comment-15350953 ] Maximilian Michels edited comment on FLINK-3675 at 6/27/16 1:41 PM: Option 2 is easiest but might be unexpected to the user. I'd rather choose option 1. was (Author: mxm): Option 1 is easiest but might be unexpected to the user. I'd rather choose option 2. > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351013#comment-15351013 ] Maximilian Michels commented on FLINK-3675: --- Woops. Not enough coffee today ;) Yes, I meant option 1! The Flink assembly has to be in the /lib folder (or specified via {{-yj}}) but it is still shipped even if the lib folder is not shipped. Otherwise, we wouldn't be able to run any Flink clusters using {{./flink -m yarn-cluster}} because the /lib folder is currently not shipped there. > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350974#comment-15350974 ] Maximilian Michels commented on FLINK-3675: --- So that would be option 2 :) Actually, the Flink assembly (fat jar) is always shipped, regardless of the ship files specified. > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350953#comment-15350953 ] Maximilian Michels commented on FLINK-3675: --- Option 1 is easiest but might be unexpected to the user. I'd rather choose option 2. > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350951#comment-15350951 ] Maximilian Michels edited comment on FLINK-3675 at 6/27/16 1:04 PM: Do we want 1) to ship the /lib folder even if the user specifies a ship folder manually (via `-yt`)? Or do we want 2) to overwrite the default lib folder with the user location (as it is now for {{yarn-session.sh}})? was (Author: mxm): Do we want to ship the /lib folder even if the user specifies a ship folder manually (via `-yt`)? Or do we want to overwrite the default lib folder with the user location (as it is now for {{yarn-session.sh}})? > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350951#comment-15350951 ] Maximilian Michels commented on FLINK-3675: --- Do we want to ship the /lib folder even if the user specifies a ship folder manually (via `-yt`)? Or do we want to overwrite the default lib folder with the user location (as it is now for {{yarn-session.sh}})? > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-3675) YARN ship folder inconsistent behavior
[ https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reassigned FLINK-3675: - Assignee: Maximilian Michels > YARN ship folder incosistent behavior > - > > Key: FLINK-3675 > URL: https://issues.apache.org/jira/browse/FLINK-3675 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.0.0 >Reporter: Stefano Baghino >Assignee: Maximilian Michels >Priority: Critical > Fix For: 1.1.0 > > > After [some discussion on the user mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html] > it came up that the {{flink/lib}} folder is always supposed to be shipped to > the YARN cluster so that all the nodes have access to its contents. > Currently however, the Flink long-running YARN session actually ships the > folder because it's explicitly specified in the {{yarn-session.sh}} script, > while running a single job on YARN does not automatically ship it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
[ https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350783#comment-15350783 ] Maximilian Michels commented on FLINK-3710: --- Thanks for reporting! We need to investigate why that is actually the case. I think [~aljoscha] had been working on this some time ago. Do you know how we could fix this [~aljoscha]? > ScalaDocs for org.apache.flink.streaming.scala are missing from the web site > > > Key: FLINK-3710 > URL: https://issues.apache.org/jira/browse/FLINK-3710 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.1 >Reporter: Elias Levy > Fix For: 1.1.0, 1.0.4 > > > The ScalaDocs only include docs for org.apache.flink.scala and sub-packages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
[ https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-3710: -- Fix Version/s: 1.1.0 > ScalaDocs for org.apache.flink.streaming.scala are missing from the web site > > > Key: FLINK-3710 > URL: https://issues.apache.org/jira/browse/FLINK-3710 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.1 >Reporter: Elias Levy > Fix For: 1.1.0, 1.0.4 > > > The ScalaDocs only include docs for org.apache.flink.scala and sub-packages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
[ https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-3710: -- Fix Version/s: 1.0.4 > ScalaDocs for org.apache.flink.streaming.scala are missing from the web site > > > Key: FLINK-3710 > URL: https://issues.apache.org/jira/browse/FLINK-3710 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.0.1 >Reporter: Elias Levy > Fix For: 1.1.0, 1.0.4 > > > The ScalaDocs only include docs for org.apache.flink.scala and sub-packages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3964) Job submission times out with recursive.file.enumeration
[ https://issues.apache.org/jira/browse/FLINK-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350759#comment-15350759 ] Maximilian Michels commented on FLINK-3964: --- I see where you're going. That's currently not possible and wouldn't help in your case. First, the client would still have to wait until all vertices have been configured. Second, it is actually a single vertex (we're at a {{JobGraph}} level at submission time). Your Hadoop input format corresponds to one JobGraph vertex. Only later, the {{ExecutionGraph}} is created where the single vertex will be split up in the number of parallelism vertices. +1 to add a notice in the timeout exception for increasing {{akka.client.timeout}}. > Job submission times out with recursive.file.enumeration > > > Key: FLINK-3964 > URL: https://issues.apache.org/jira/browse/FLINK-3964 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats, DataSet API >Affects Versions: 1.0.0 >Reporter: Juho Autio > > When using "recursive.file.enumeration" with a big enough folder structure to > list, flink batch job fails right at the beginning because of a timeout. > h2. Problem details > We get this error: {{Communication with JobManager failed: Job submission to > the JobManager timed out}}. > The code we have is basically this: > {code} > val env = ExecutionEnvironment.getExecutionEnvironment > val parameters = new Configuration > // set the recursive enumeration parameter > parameters.setBoolean("recursive.file.enumeration", true) > val parameter = ParameterTool.fromArgs(args) > val input_data_path : String = parameter.get("input_data_path", null ) > val data : DataSet[(Text,Text)] = env.readSequenceFile(classOf[Text], > classOf[Text], input_data_path) > .withParameters(parameters) > data.first(10).print > {code} > If we set {{input_data_path}} parameter to {{s3n://bucket/path/date=*/}} it > times out. 
If we use a more restrictive pattern like > {{s3n://bucket/path/date=20160523/}}, it doesn't time out. > To me it seems that time taken to list files shouldn't cause any timeouts on > job submission level. > For us this was "fixed" by adding {{akka.client.timeout: 600 s}} in > {{flink-conf.yaml}}, but I wonder if the timeout would still occur if we have > even more files to list? > > P.S. Is there any way to set {{akka.client.timeout}} when calling {{bin/flink > run}} instead of editing {{flink-conf.yaml}}. I tried to add it as a {{-yD}} > flag but couldn't get it working. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
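The workaround mentioned in the report goes into {{flink-conf.yaml}}; the 600 s value below is the one the reporter used, not a recommendation:

```yaml
# flink-conf.yaml: give the client more time to wait for the job submission
# acknowledgement while the input format enumerates many files recursively.
akka.client.timeout: 600 s
```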
[jira] [Resolved] (FLINK-4041) Failure while asking ResourceManager for RegisterResource
[ https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-4041. --- Resolution: Fixed Fix Version/s: 1.1.0 Resolved with a5aa4e115814539518fcaa88cbbca47da9d7ede5 > Failure while asking ResourceManager for RegisterResource > - > > Key: FLINK-4041 > URL: https://issues.apache.org/jira/browse/FLINK-4041 > Project: Flink > Issue Type: Bug > Components: ResourceManager >Affects Versions: 1.1.0 >Reporter: Robert Metzger >Assignee: Maximilian Michels > Labels: test-stability > Fix For: 1.1.0 > > > In this build > (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I > got the following YARN Test failure: > {code} > 2016-06-09 10:21:42,336 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down > remote daemon. > 2016-06-09 10:21:42,336 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon > shut down; proceeding with flushing remote transports. > 2016-06-09 10:21:42,355 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut > down. 
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager > - Failure while asking ResourceManager for RegisterResource > akka.pattern.AskTimeoutException: Ask timed out on > [Actor[akka://flink/user/$c#1255104255]] after [1 ms] > at > akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) > at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) > at > akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476) > at > akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282) > at > akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280) > at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617) > at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804) > at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15) > at > akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804) > at > 
akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638) > at > akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2016-06-09 10:21:42,376 INFO org.apache.flink.yarn.YarnJobManager > - Shutdown compl
[jira] [Commented] (FLINK-1946) Make yarn tests logging less verbose
[ https://issues.apache.org/jira/browse/FLINK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15349695#comment-15349695 ] Maximilian Michels commented on FLINK-1946: --- Fix with 5b2ad7f03ef67ce529551e7b464d7db94e2a1d90. Also, the log level for Hadoop classes has been adjusted previously. I think we want to keep the log level at INFO for debugging purposes. Anything else we can do? > Make yarn tests logging less verbose > > > Key: FLINK-1946 > URL: https://issues.apache.org/jira/browse/FLINK-1946 > Project: Flink > Issue Type: Improvement > Components: YARN Client >Reporter: Till Rohrmann >Priority: Minor > > Currently, the yarn tests log on the INFO level making the test outputs > confusing. Furthermore some status messages are written to stdout. I think > these messages are not necessary to be shown to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-3838. --- Resolution: Fixed master: dfecca7794f56eb2a4438e3bdca6878379929cfa release-1.0: f5a40855fbd90a375cb920358baaf176876a42f0 > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name
[ https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-3757. --- Resolution: Fixed Assignee: Maximilian Michels Fix Version/s: 1.1.0 Fixed via 9c15406400af16861be8cf08e60d94074addbba0 > addAccumulator does not throw Exception on duplicate accumulator name > - > > Key: FLINK-3757 > URL: https://issues.apache.org/jira/browse/FLINK-3757 > Project: Flink > Issue Type: Bug >Reporter: Konstantin Knauf >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > Works if tasks are chained, but in many situations it does not, see > https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112 > Is this an undocumented feature or a valid bug? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
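The later comments on this issue note that the duplicate check only applies within a single task. A toy sketch (NOT Flink's {{RuntimeContext}}; class and exception type are illustrative) of why chaining matters: each task instance keeps its own accumulator map, so the same name registered by two separate, non-chained tasks is not detected locally and is merged later on the JobManager instead.

```java
import java.util.HashMap;
import java.util.Map;

// Toy per-task accumulator registry (NOT Flink's actual implementation).
public class AccumulatorSketch {
    private final Map<String, Long> accumulators = new HashMap<>();

    /** Mirrors the addAccumulator contract: throws only on a duplicate in THIS task. */
    void add(String name, long initialValue) {
        if (accumulators.containsKey(name)) {
            throw new UnsupportedOperationException(
                    "The accumulator '" + name + "' already exists in this task");
        }
        accumulators.put(name, initialValue);
    }

    /** Same name twice in one task: the duplicate is caught. */
    static boolean duplicateInSameTaskThrows() {
        AccumulatorSketch task = new AccumulatorSketch();
        task.add("counter", 0);
        try {
            task.add("counter", 0);
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    /** Same name in two separate tasks: no local duplicate, nothing thrown. */
    static boolean duplicateAcrossTasksThrows() {
        AccumulatorSketch task1 = new AccumulatorSketch();
        AccumulatorSketch task2 = new AccumulatorSketch();
        task1.add("counter", 0);
        try {
            task2.add("counter", 0);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(duplicateInSameTaskThrows());   // true
        System.out.println(duplicateAcrossTasksThrows());  // false
    }
}
```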
[jira] [Closed] (FLINK-3864) Yarn tests don't check for prohibited strings in log output
[ https://issues.apache.org/jira/browse/FLINK-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3864. - Resolution: Fixed Fixed with d92aeb7aa63490278d9a9e36aae595e6cbe39f32 > Yarn tests don't check for prohibited strings in log output > --- > > Key: FLINK-3864 > URL: https://issues.apache.org/jira/browse/FLINK-3864 > Project: Flink > Issue Type: Bug > Components: Tests, YARN Client >Affects Versions: 1.1.0, 1.0.2 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > {{YarnTestBase.runWithArgs(...)}} provides a parameter for strings which must > not appear in the log output. {{perJobYarnCluster}} and > {{perJobYarnClusterWithParallelism}} have "System.out)" prepended to the > prohibited strings; probably an artifact of older test code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
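For context, the kind of check that {{YarnTestBase.runWithArgs(...)}} performs can be sketched as follows (a simplified stand-in, not the actual test-base code; {{ProhibitedStringsSketch}} is an invented name): the test fails if any string from the prohibited list appears anywhere in the captured log output, which is why a stray entry like "System.out)" in that list changes what the test actually checks.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the prohibited-strings scan in YARN tests.
public class ProhibitedStringsSketch {
    /** True if any prohibited string occurs in the captured output. */
    static boolean containsProhibited(String logOutput, List<String> prohibited) {
        return prohibited.stream().anyMatch(logOutput::contains);
    }

    public static void main(String[] args) {
        List<String> prohibited = Arrays.asList("Exception", "ERROR");
        System.out.println(containsProhibited("INFO all good", prohibited));        // false
        System.out.println(containsProhibited("ERROR container died", prohibited)); // true
    }
}
```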
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344402#comment-15344402 ] Maximilian Michels commented on FLINK-4084: --- Not sure if I understand correctly. I think the command-line parameter should have precedence over the environment variable. So, if {{ export FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, then mydir2 should be used as config directory. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
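The precedence proposed in the comment above can be sketched in a few lines (names are illustrative; the actual {{CliFrontend}} wiring may differ): an explicit {{--configDir}} argument wins, and {{FLINK_CONF_DIR}} is only the fallback.

```java
// Illustrative sketch of the proposed --configDir vs FLINK_CONF_DIR precedence.
public class ConfDirResolution {
    /** cliValue: the parsed --configDir argument, or null if not given. */
    static String resolveConfDir(String cliValue, String envValue) {
        if (cliValue != null) {
            return cliValue; // command-line flag has precedence
        }
        return envValue;     // fall back to FLINK_CONF_DIR
    }

    public static void main(String[] args) {
        // FLINK_CONF_DIR=mydir1 plus --configDir=mydir2 -> mydir2 is used
        System.out.println(resolveConfDir("mydir2", "mydir1")); // mydir2
        System.out.println(resolveConfDir(null, "mydir1"));     // mydir1
    }
}
```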
[jira] [Comment Edited] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344402#comment-15344402 ] Maximilian Michels edited comment on FLINK-4084 at 6/22/16 2:40 PM: Not sure if I understand correctly. The command-line parameter should have precedence over the environment variable. So, if {{export FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, then mydir2 should be used as config directory. was (Author: mxm): Not sure if I understand correctly. The command-line parameter should have precedence over the environment variable. So, if {{ export FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, then mydir2 should be used as config directory. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344402#comment-15344402 ] Maximilian Michels edited comment on FLINK-4084 at 6/22/16 2:39 PM: Not sure if I understand correctly. The command-line parameter should have precedence over the environment variable. So, if {{ export FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, then mydir2 should be used as config directory. was (Author: mxm): Not sure if I understand correctly. I think the command-line parameter should have precedence over the environment variable. So, if {{ export FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, then mydir2 should be used as config directory. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341731#comment-15341731 ] Maximilian Michels commented on FLINK-4084: --- Actually, it is a bit more than that. The idea is to add a {{configDir}} parameter to {{CliFrontend}} that users can discover when they examine the help output. Have a look at {{CliFrontendParser}} for existing parameters. The shell scripts also need to know the config directory location to load some configuration (e.g. slaves file for starting a cluster). The change you posted would only cover this part. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4079) YARN properties file used for per-job cluster
[ https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4079. - Resolution: Fixed master: ec6d97528e8b21f191b7922e4810fd60804c8365 release-1.0: 3163638a61f49dd2df55467615096918bec3a890 Fix will be included in the 1.0.4 release. > YARN properties file used for per-job cluster > - > > Key: FLINK-4079 > URL: https://issues.apache.org/jira/browse/FLINK-4079 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.3 >Reporter: Ufuk Celebi >Assignee: Maximilian Michels > Fix For: 1.1.0, 1.0.4 > > > YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN > properties file, which defines the container configuration. This can lead to > unexpected behaviour, because the per-job-cluster configuration is merged > with the YARN properties file (or used as only configuration source). > A user ran into this as follows: > - Create a long-lived YARN session with HA (creates a hidden YARN properties > file) > - Submits standalone batch jobs with a per job cluster (flink run -m > yarn-cluster). The batch jobs get submitted to the long lived HA cluster, > because of the properties file. > [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch > and will be fixed with the client refactoring you are working on? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3964) Job submission times out with recursive.file.enumeration
[ https://issues.apache.org/jira/browse/FLINK-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341654#comment-15341654 ] Maximilian Michels commented on FLINK-3964: --- You're right, Flink uses {{akka.client.timeout}} as the timeout for job submissions. The Yarn properties can only be used with the Yarn mode. Just checked that {{bin/flink run -m yarn-cluster -yn 1 -yDakka.ask.timeout=1ms}} does set the client timeout to 1 millisecond (it times out). The actual cause of the issue is that the JobManager calls {{configure()}} on all Input/Output vertices of the job. That is where the files of the input format are loaded, causing a delay in the job submission, which can lead to a timeout. IMHO the only fix is to increase the timeout. If you want, we can file an issue for changing config parameters via the CLI. > Job submission times out with recursive.file.enumeration > > > Key: FLINK-3964 > URL: https://issues.apache.org/jira/browse/FLINK-3964 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats, DataSet API >Affects Versions: 1.0.0 >Reporter: Juho Autio > > When using "recursive.file.enumeration" with a big enough folder structure to > list, flink batch job fails right at the beginning because of a timeout. > h2. Problem details > We get this error: {{Communication with JobManager failed: Job submission to > the JobManager timed out}}. 
> The code we have is basically this: > {code} > val env = ExecutionEnvironment.getExecutionEnvironment > val parameters = new Configuration > // set the recursive enumeration parameter > parameters.setBoolean("recursive.file.enumeration", true) > val parameter = ParameterTool.fromArgs(args) > val input_data_path : String = parameter.get("input_data_path", null ) > val data : DataSet[(Text,Text)] = env.readSequenceFile(classOf[Text], > classOf[Text], input_data_path) > .withParameters(parameters) > data.first(10).print > {code} > If we set {{input_data_path}} parameter to {{s3n://bucket/path/date=*/}} it > times out. If we use a more restrictive pattern like > {{s3n://bucket/path/date=20160523/}}, it doesn't time out. > To me it seems that time taken to list files shouldn't cause any timeouts on > job submission level. > For us this was "fixed" by adding {{akka.client.timeout: 600 s}} in > {{flink-conf.yaml}}, but I wonder if the timeout would still occur if we have > even more files to list? > > P.S. Is there any way to set {{akka.client.timeout}} when calling {{bin/flink > run}} instead of editing {{flink-conf.yaml}}. I tried to add it as a {{-yD}} > flag but couldn't get it working. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
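Why the wildcard pattern matters can be illustrated locally (a rough analogy only, using {{java.nio}} on a temp directory; Flink's {{FileInputFormat}} performs the equivalent listing against S3, which is far slower per entry): recursive enumeration touches every entry under the input path, so a pattern like {{date=*/}} over many partitions multiplies the listing work done during job submission.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Analogy sketch: recursive enumeration visits every entry under the root.
public class RecursiveListingSketch {
    /** Counts all entries under root, root included. */
    static long countEntries(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.count();
        }
    }

    /** Builds three date=... partitions with one file each, then counts. */
    static long demoCount() {
        try {
            Path root = Files.createTempDirectory("enum");
            for (int day = 0; day < 3; day++) {
                Path part = Files.createDirectory(root.resolve("date=2016052" + day));
                Files.createFile(part.resolve("part-0"));
            }
            return countEntries(root); // 1 root + 3 partition dirs + 3 files = 7
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoCount()); // 7
    }
}
```

With a restrictive pattern only one partition is walked; with the wildcard, every partition is, and on a remote filesystem each extra entry costs a round trip.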
[jira] [Commented] (FLINK-4051) RabbitMQ Source might not react to cancel signal
[ https://issues.apache.org/jira/browse/FLINK-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341536#comment-15341536 ] Maximilian Michels commented on FLINK-4051: --- It looks to me like the {{nextDelivery()}} method is interruptible. It throws an {{InterruptedException}} in case it is interrupted. As [~rmetzger] pointed out, Flink sets the interrupted flag on the source thread during cancellation. I've just tested this behavior using this test class: {code:java} public class RMQSourceTest { public static void main(String[] args) throws Exception { final RMQConnectionConfig config = new RMQConnectionConfig.Builder() .setHost("localhost") .setPort(5672) .setVirtualHost("/") .setUserName("guest") .setPassword("guest") .build(); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.addSource(new RMQSource<>(config, "test", new SimpleStringSchema())).print(); env.execute(); } } {code} Then I deployed a Jar on a Flink cluster with RabbitMQ running on localhost on port 5672: {code:none} flink run -c org.myorg.quickstart.RMQSourceTest quickstart-0.1.jar {code} Nothing was printed, of course, because the queue is empty :) Then I cancelled: {code:none} $ ./flink list -- Running/Restarting Jobs --- 21.06.2016 12:31:25 : 122456d90688011df6f8fa73de2d036c : Flink Streaming Job (RUNNING) -- No scheduled jobs. $ flink cancel 122456d90688011df6f8fa73de2d036c {code} Here's the output of the program: {noformat} 06/21/2016 12:31:25 Job execution switched to status RUNNING. 06/21/2016 12:31:25 Source: Custom Source -> Sink: Unnamed(1/1) switched to SCHEDULED 06/21/2016 12:31:25 Source: Custom Source -> Sink: Unnamed(1/1) switched to DEPLOYING 06/21/2016 12:31:25 Source: Custom Source -> Sink: Unnamed(1/1) switched to RUNNING 06/21/2016 12:31:50 Job execution switched to status CANCELLING. 
06/21/2016 12:31:50 Source: Custom Source -> Sink: Unnamed(1/1) switched to CANCELING 06/21/2016 12:31:50 Source: Custom Source -> Sink: Unnamed(1/1) switched to CANCELED 06/21/2016 12:31:50 Job execution switched to status CANCELED. The program finished with the following exception: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job was cancelled. at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:378) at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:91) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:355) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68) at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1381) at org.myorg.quickstart.RMQSourceTest.main(RMQSourceTest.java:41) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:509) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:297) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:740) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:965) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1008) Caused by: org.apache.flink.runtime.client.JobCancellationException: Job was cancelled. 
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:798) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:752) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:752) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Fut
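The interruptibility being discussed can also be reproduced with plain JDK classes (an analogy sketch, not the RabbitMQ client: {{BlockingQueue.take()}} stands in for {{nextDelivery()}} here, since both block until data arrives and throw {{InterruptedException}} when the thread is interrupted, which is exactly what Flink's cancellation relies on):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Analogy sketch: a blocked take() reacts to interrupt like nextDelivery().
public class InterruptibleSourceSketch {
    /** Blocks a thread on an empty queue, interrupts it, reports the outcome. */
    static boolean blockedTakeReactsToInterrupt() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread source = new Thread(() -> {
            try {
                queue.take(); // blocks indefinitely: nothing is ever offered
            } catch (InterruptedException e) {
                sawInterrupt.set(true); // the cancellation path
            }
        });
        source.start();
        try {
            Thread.sleep(200);  // give the thread time to block in take()
            source.interrupt(); // what Flink does to the source on cancel
            source.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println(blockedTakeReactsToInterrupt()); // true
    }
}
```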
[jira] [Commented] (FLINK-4079) YARN properties file used for per-job cluster
[ https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341458#comment-15341458 ] Maximilian Michels commented on FLINK-4079: --- [~ArnaudL] I've prepared a fix for the next release. > YARN properties file used for per-job cluster > - > > Key: FLINK-4079 > URL: https://issues.apache.org/jira/browse/FLINK-4079 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.3 >Reporter: Ufuk Celebi >Assignee: Maximilian Michels > Fix For: 1.1.0, 1.0.4 > > > YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN > properties file, which defines the container configuration. This can lead to > unexpected behaviour, because the per-job-cluster configuration is merged > with the YARN properties file (or used as only configuration source). > A user ran into this as follows: > - Create a long-lived YARN session with HA (creates a hidden YARN properties > file) > - Submits standalone batch jobs with a per job cluster (flink run -m > yarn-cluster). The batch jobs get submitted to the long lived HA cluster, > because of the properties file. > [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch > and will be fixed with the client refactoring you are working on? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4079) YARN properties file used for per-job cluster
[ https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341397#comment-15341397 ] Maximilian Michels commented on FLINK-4079: --- Makes sense to fix it then. Let me see. > YARN properties file used for per-job cluster > - > > Key: FLINK-4079 > URL: https://issues.apache.org/jira/browse/FLINK-4079 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.3 >Reporter: Ufuk Celebi >Assignee: Maximilian Michels > Fix For: 1.1.0, 1.0.4 > > > YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN > properties file, which defines the container configuration. This can lead to > unexpected behaviour, because the per-job-cluster configuration is merged > with the YARN properties file (or used as only configuration source). > A user ran into this as follows: > - Create a long-lived YARN session with HA (creates a hidden YARN properties > file) > - Submits standalone batch jobs with a per job cluster (flink run -m > yarn-cluster). The batch jobs get submitted to the long lived HA cluster, > because of the properties file. > [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch > and will be fixed with the client refactoring you are working on? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-3105) Submission in per job YARN cluster mode reuses properties file of long-lived session
[ https://issues.apache.org/jira/browse/FLINK-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3105. - Resolution: Fixed Fix Version/s: 1.1.0 > Submission in per job YARN cluster mode reuses properties file of long-lived > session > > > Key: FLINK-3105 > URL: https://issues.apache.org/jira/browse/FLINK-3105 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 0.10.1 >Reporter: Ufuk Celebi >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > Starting a YARN session with `bin/yarn-session.sh` creates a properties file, > which is used to parse job manager information when submitting jobs. > This properties file is also used when submitting a job with {{bin/flink run > -m yarn-cluster}}. The {{yarn-cluster}} mode should actually start a new YARN > session. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (FLINK-3105) Submission in per job YARN cluster mode reuses properties file of long-lived session
[ https://issues.apache.org/jira/browse/FLINK-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reopened FLINK-3105: --- > Submission in per job YARN cluster mode reuses properties file of long-lived > session > > > Key: FLINK-3105 > URL: https://issues.apache.org/jira/browse/FLINK-3105 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 0.10.1 >Reporter: Ufuk Celebi > > Starting a YARN session with `bin/yarn-session.sh` creates a properties file, > which is used to parse job manager information when submitting jobs. > This properties file is also used when submitting a job with {{bin/flink run > -m yarn-cluster}}. The {{yarn-cluster}} mode should actually start a new YARN > session. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name
[ https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339749#comment-15339749 ] Maximilian Michels commented on FLINK-3757: --- Does the pull request improve the documentation? > addAccumulator does not throw Exception on duplicate accumulator name > - > > Key: FLINK-3757 > URL: https://issues.apache.org/jira/browse/FLINK-3757 > Project: Flink > Issue Type: Bug >Reporter: Konstantin Knauf >Priority: Minor > > Works if tasks are chained, but in many situations it does not, see > https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112 > Is this an undocumented feature or a valid bug? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name
[ https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339744#comment-15339744 ] Maximilian Michels commented on FLINK-3757: --- It does throw an exception if you previously added it in the same Task. I agree, we should clarify that. > addAccumulator does not throw Exception on duplicate accumulator name > - > > Key: FLINK-3757 > URL: https://issues.apache.org/jira/browse/FLINK-3757 > Project: Flink > Issue Type: Bug >Reporter: Konstantin Knauf >Priority: Minor > > Works if tasks are chained, but in many situations it does not, see > https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112 > Is this an undocumented feature or a valid bug? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3380) Unstable Test: JobSubmissionFailsITCase
[ https://issues.apache.org/jira/browse/FLINK-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-3380. --- Resolution: Fixed Fix Version/s: 1.1.0 Resolving for now. > Unstable Test: JobSubmissionFailsITCase > --- > > Key: FLINK-3380 > URL: https://issues.apache.org/jira/browse/FLINK-3380 > Project: Flink > Issue Type: Bug >Reporter: Kostas Kloudas >Priority: Critical > Labels: test-stability > Fix For: 1.1.0 > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/108034635/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4095) Add configDir argument to shell scripts
[ https://issues.apache.org/jira/browse/FLINK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4095. - Resolution: Fixed Seems like custom config dirs are highly requested now :) This duplicates FLINK-4084. > Add configDir argument to shell scripts > --- > > Key: FLINK-4095 > URL: https://issues.apache.org/jira/browse/FLINK-4095 > Project: Flink > Issue Type: Improvement > Components: Startup Shell Scripts >Reporter: Ufuk Celebi > Fix For: 1.1.0 > > > Currently it's very hard to execute the shell scripts with different > configurations, because we always use the config file in the {{conf}} > directory. > Let's add a CLI argument to allow specifying a different configuration > directory (or file) when running the shell scripts like > {code} > bin/flink run --configDir > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4095) Add configDir argument to shell scripts
[ https://issues.apache.org/jira/browse/FLINK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4095. - Resolution: Duplicate > Add configDir argument to shell scripts > --- > > Key: FLINK-4095 > URL: https://issues.apache.org/jira/browse/FLINK-4095 > Project: Flink > Issue Type: Improvement > Components: Startup Shell Scripts >Reporter: Ufuk Celebi > Fix For: 1.1.0 > > > Currently it's very hard to execute the shell scripts with different > configurations, because we always use the config file in the {{conf}} > directory. > Let's add a CLI argument to allow specifying a different configuration > directory (or file) when running the shell scripts like > {code} > bin/flink run --configDir > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (FLINK-4095) Add configDir argument to shell scripts
[ https://issues.apache.org/jira/browse/FLINK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reopened FLINK-4095: --- > Add configDir argument to shell scripts > --- > > Key: FLINK-4095 > URL: https://issues.apache.org/jira/browse/FLINK-4095 > Project: Flink > Issue Type: Improvement > Components: Startup Shell Scripts >Reporter: Ufuk Celebi > Fix For: 1.1.0 > > > Currently it's very hard to execute the shell scripts with different > configurations, because we always use the config file in the {{conf}} > directory. > Let's add a CLI argument to allow specifying a different configuration > directory (or file) when running the shell scripts like > {code} > bin/flink run --configDir > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4090) Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()
[ https://issues.apache.org/jira/browse/FLINK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4090. - Resolution: Fixed Fixed with f2e9c521aa4e0cbc14b71c561b8c5dd46201a45a > Close of OutputStream should be in finally clause in > FlinkYarnSessionCli#writeYarnProperties() > -- > > Key: FLINK-4090 > URL: https://issues.apache.org/jira/browse/FLINK-4090 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > {code} > try { > OutputStream out = new FileOutputStream(propertiesFile); > properties.store(out, "Generated YARN properties file"); > out.close(); > } catch (IOException e) { > {code} > The close of out should be in a finally clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
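A sketch of the kind of fix this issue calls for (illustrative only; the committed patch may differ): try-with-resources closes the stream on both the success and the exception path, which is exactly what moving {{out.close()}} into a finally clause achieves.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Illustrative fix: the stream is closed even if store() throws.
public class WritePropertiesSketch {
    static void writeYarnProperties(Properties properties, File propertiesFile)
            throws IOException {
        try (OutputStream out = new FileOutputStream(propertiesFile)) {
            properties.store(out, "Generated YARN properties file");
        } // out.close() runs here on every path
    }

    /** Writes a sample properties file and reports whether it is non-empty. */
    static boolean demo() {
        try {
            File f = File.createTempFile("yarn", ".properties");
            Properties p = new Properties();
            p.setProperty("jobManager", "localhost");
            writeYarnProperties(p, f);
            return f.length() > 0;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```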
[jira] [Assigned] (FLINK-4090) Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()
[ https://issues.apache.org/jira/browse/FLINK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reassigned FLINK-4090: - Assignee: Maximilian Michels > Close of OutputStream should be in finally clause in > FlinkYarnSessionCli#writeYarnProperties() > -- > > Key: FLINK-4090 > URL: https://issues.apache.org/jira/browse/FLINK-4090 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > {code} > try { > OutputStream out = new FileOutputStream(propertiesFile); > properties.store(out, "Generated YARN properties file"); > out.close(); > } catch (IOException e) { > {code} > The close of out should be in a finally clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4090) Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()
[ https://issues.apache.org/jira/browse/FLINK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-4090: -- Fix Version/s: 1.1.0 > Close of OutputStream should be in finally clause in > FlinkYarnSessionCli#writeYarnProperties() > -- > > Key: FLINK-4090 > URL: https://issues.apache.org/jira/browse/FLINK-4090 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > Fix For: 1.1.0 > > > {code} > try { > OutputStream out = new FileOutputStream(propertiesFile); > properties.store(out, "Generated YARN properties file"); > out.close(); > } catch (IOException e) { > {code} > The close of out should be in a finally clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4089) Ineffective null check in YarnClusterClient#getApplicationStatus()
[ https://issues.apache.org/jira/browse/FLINK-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339677#comment-15339677 ] Maximilian Michels commented on FLINK-4089: --- Thank you for reporting! > Ineffective null check in YarnClusterClient#getApplicationStatus() > -- > > Key: FLINK-4089 > URL: https://issues.apache.org/jira/browse/FLINK-4089 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Maximilian Michels > Fix For: 1.1.0, 1.0.4 > > > Here is related code: > {code} > if(pollingRunner == null) { > LOG.warn("YarnClusterClient.getApplicationStatus() has been called on > an uninitialized cluster." + > "The system might be in an erroneous state"); > } > ApplicationReport lastReport = pollingRunner.getLastReport(); > {code} > If pollingRunner is null, NPE would result from the getLastReport() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
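A sketch of making the null check effective (illustrative only; the exception type and interface are assumptions, and the committed fix may differ): fail fast instead of logging a warning and then dereferencing the null reference anyway.

```java
// Illustrative fix: a guard that actually prevents the NPE.
public class NullCheckSketch {
    /** Stand-in for the polling runner that caches the last report. */
    interface PollingRunner {
        String getLastReport();
    }

    static String getApplicationStatus(PollingRunner pollingRunner) {
        if (pollingRunner == null) {
            // Previously only a warning was logged here and the NPE followed.
            throw new IllegalStateException(
                    "getApplicationStatus() has been called on an uninitialized cluster.");
        }
        return pollingRunner.getLastReport();
    }

    /** True if the null guard throws instead of letting an NPE escape. */
    static boolean nullGuardThrows() {
        try {
            getApplicationStatus(null);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullGuardThrows());                       // true
        System.out.println(getApplicationStatus(() -> "RUNNING"));   // RUNNING
    }
}
```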
[jira] [Closed] (FLINK-4089) Ineffective null check in YarnClusterClient#getApplicationStatus()
[ https://issues.apache.org/jira/browse/FLINK-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4089. - Resolution: Fixed Assignee: Maximilian Michels Fix Version/s: 1.0.4 master: e6320a4c3922e934698acfe274546bccf88ad53d release-1.0: e30bee839aadf9c3ac7d46bb35a87c0f130f4789 > Ineffective null check in YarnClusterClient#getApplicationStatus() > -- > > Key: FLINK-4089 > URL: https://issues.apache.org/jira/browse/FLINK-4089 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Maximilian Michels > Fix For: 1.1.0, 1.0.4 > > > Here is related code: > {code} > if(pollingRunner == null) { > LOG.warn("YarnClusterClient.getApplicationStatus() has been called on > an uninitialized cluster." + > "The system might be in an erroneous state"); > } > ApplicationReport lastReport = pollingRunner.getLastReport(); > {code} > If pollingRunner is null, NPE would result from the getLastReport() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4089) Ineffective null check in YarnClusterClient#getApplicationStatus()
[ https://issues.apache.org/jira/browse/FLINK-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-4089: -- Fix Version/s: 1.1.0 > Ineffective null check in YarnClusterClient#getApplicationStatus() > -- > > Key: FLINK-4089 > URL: https://issues.apache.org/jira/browse/FLINK-4089 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu > Fix For: 1.1.0 > > > Here is related code: > {code} > if(pollingRunner == null) { > LOG.warn("YarnClusterClient.getApplicationStatus() has been called on > an uninitialized cluster." + > "The system might be in an erroneous state"); > } > ApplicationReport lastReport = pollingRunner.getLastReport(); > {code} > If pollingRunner is null, NPE would result from the getLastReport() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339654#comment-15339654 ] Maximilian Michels commented on FLINK-4084: --- Yes, please go ahead. I'll assign the issue to you. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-4084: -- Assignee: Andrea Sella > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
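The proposal implies a simple resolution order for the configuration directory. The sketch below is hypothetical (the method name, the `configDir` key, and the `"conf"` default are illustrative, not Flink's code): an explicit `--configDir` value wins, the `FLINK_CONF_DIR` environment variable is the fallback, and a default directory is the last resort.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the resolution order the --configDir proposal implies.
public class ConfigDirSketch {

    static String resolveConfigDir(Map<String, String> cliOptions, Map<String, String> env) {
        String fromCli = cliOptions.get("configDir");
        if (fromCli != null) {
            return fromCli; // explicit CLI option shadows the environment
        }
        return env.getOrDefault("FLINK_CONF_DIR", "conf"); // env var, then default
    }

    public static void main(String[] args) {
        Map<String, String> cli = new HashMap<>();
        cli.put("configDir", "/opt/flink/conf");
        Map<String, String> env = new HashMap<>();
        env.put("FLINK_CONF_DIR", "/etc/flink");
        System.out.println(resolveConfigDir(cli, env));
    }
}
```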
[jira] [Assigned] (FLINK-4041) Failure while asking ResourceManager for RegisterResource
[ https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reassigned FLINK-4041: - Assignee: Maximilian Michels > Failure while asking ResourceManager for RegisterResource > - > > Key: FLINK-4041 > URL: https://issues.apache.org/jira/browse/FLINK-4041 > Project: Flink > Issue Type: Bug > Components: ResourceManager >Affects Versions: 1.1.0 >Reporter: Robert Metzger >Assignee: Maximilian Michels > Labels: test-stability > > In this build > (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I > got the following YARN Test failure: > {code} > 2016-06-09 10:21:42,336 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down > remote daemon. > 2016-06-09 10:21:42,336 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon > shut down; proceeding with flushing remote transports. > 2016-06-09 10:21:42,355 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut > down. 
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager > - Failure while asking ResourceManager for RegisterResource > akka.pattern.AskTimeoutException: Ask timed out on > [Actor[akka://flink/user/$c#1255104255]] after [1 ms] > at > akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) > at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) > at > akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476) > at > akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282) > at > akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280) > at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617) > at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804) > at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15) > at > akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804) > at > 
akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638) > at > akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2016-06-09 10:21:42,376 INFO org.apache.flink.yarn.YarnJobManager > - Shutdown completed. Stopping JVM. > 2016-06-09 10:21:42,377 INFO > org.apache.flink.runtime.webmonitor.StackTra
[jira] [Commented] (FLINK-4041) Failure while asking ResourceManager for RegisterResource
[ https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339620#comment-15339620 ] Maximilian Michels commented on FLINK-4041: --- This happens when a TaskManager registers during shutdown of the JobManager. > Failure while asking ResourceManager for RegisterResource > - > > Key: FLINK-4041 > URL: https://issues.apache.org/jira/browse/FLINK-4041 > Project: Flink > Issue Type: Bug > Components: ResourceManager >Affects Versions: 1.1.0 >Reporter: Robert Metzger > Labels: test-stability > > In this build > (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I > got the following YARN Test failure: > {code} > 2016-06-09 10:21:42,336 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down > remote daemon. > 2016-06-09 10:21:42,336 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon > shut down; proceeding with flushing remote transports. > 2016-06-09 10:21:42,355 INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut > down. 
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager > - Failure while asking ResourceManager for RegisterResource > akka.pattern.AskTimeoutException: Ask timed out on > [Actor[akka://flink/user/$c#1255104255]] after [1 ms] > at > akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) > at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) > at > akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476) > at > akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282) > at > akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280) > at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617) > at > akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617) > at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804) > at > akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804) > at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15) > at > akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804) > at > 
akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638) > at > akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) > at > akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) > at > scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2016-06-09 10:21:42,376 INFO org.apache.flink.yarn.YarnJobManager > - Shutdown completed. Stopping JVM. > 2016-06-09 10:21:42,377
[jira] [Commented] (FLINK-3964) Job submission times out with recursive.file.enumeration
[ https://issues.apache.org/jira/browse/FLINK-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339541#comment-15339541 ] Maximilian Michels commented on FLINK-3964: --- Thanks for reporting! Have you tried {{bin/flink run -yDakka.ask.timeout=600s ...}}? > Job submission times out with recursive.file.enumeration > > > Key: FLINK-3964 > URL: https://issues.apache.org/jira/browse/FLINK-3964 > Project: Flink > Issue Type: Bug >Reporter: Juho Autio > > When using "recursive.file.enumeration" with a big enough folder structure to > list, flink batch job fails right at the beginning because of a timeout. > h2. Problem details > We get this error: {{Communication with JobManager failed: Job submission to > the JobManager timed out}}. > The code we have is basically this: > {code} > val env = ExecutionEnvironment.getExecutionEnvironment > val parameters = new Configuration > // set the recursive enumeration parameter > parameters.setBoolean("recursive.file.enumeration", true) > val parameter = ParameterTool.fromArgs(args) > val input_data_path : String = parameter.get("input_data_path", null ) > val data : DataSet[(Text,Text)] = env.readSequenceFile(classOf[Text], > classOf[Text], input_data_path) > .withParameters(parameters) > data.first(10).print > {code} > If we set {{input_data_path}} parameter to {{s3n://bucket/path/date=*/}} it > times out. If we use a more restrictive pattern like > {{s3n://bucket/path/date=20160523/}}, it doesn't time out. > To me it seems that time taken to list files shouldn't cause any timeouts on > job submission level. > For us this was "fixed" by adding {{akka.client.timeout: 600 s}} in > {{flink-conf.yaml}}, but I wonder if the timeout would still occur if we have > even more files to list? > > P.S. Is there any way to set {{akka.client.timeout}} when calling {{bin/flink > run}} instead of editing {{flink-conf.yaml}}. I tried to add it as a {{-yD}} > flag but couldn't get it working. 
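The suggested {{-yD}} flag carries a dynamic property as a `key=value` pair appended directly to the flag. As a hedged illustration of that convention only (Flink's real parsing lives in its CLI classes; the helper name here is made up):

```java
// Hypothetical sketch of how a -yD argument splits into a dynamic property.
public class DynPropSketch {

    static String[] splitDynamicProperty(String arg) {
        // "-yDakka.ask.timeout=600s" -> {"akka.ask.timeout", "600s"}
        String kv = arg.substring("-yD".length());
        int eq = kv.indexOf('=');
        if (eq < 0) {
            return new String[] { kv, "" }; // flag given without a value
        }
        return new String[] { kv.substring(0, eq), kv.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String[] kv = splitDynamicProperty("-yDakka.ask.timeout=600s");
        System.out.println(kv[0] + " = " + kv[1]);
    }
}
```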
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339413#comment-15339413 ] Maximilian Michels commented on FLINK-3838: --- Issue is fixed in commons-cli 1.3.1 > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339411#comment-15339411 ] Maximilian Michels commented on FLINK-3838: --- A workaround is to use {{./flink run -- }} The two dashes tell the parsing library that no more options are following and it should take the rest as it is. > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
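The workaround rests on the standard "--" convention. A minimal sketch of that behavior (illustrative only, not Flink's or commons-cli's actual parser): once a bare "--" is seen, every remaining token is passed through verbatim and is never interpreted as a Flink option, so application parameters like {{-maxtasks}} survive untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "--" end-of-options convention the workaround relies on.
public class DoubleDashSketch {

    static List<String> programArgs(String[] args) {
        List<String> rest = new ArrayList<>();
        boolean seenSeparator = false;
        for (String a : args) {
            if (seenSeparator) {
                rest.add(a);          // passed through verbatim
            } else if (a.equals("--")) {
                seenSeparator = true; // stop interpreting options here
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        // "-maxtasks 5" survives because it follows "--".
        System.out.println(programArgs(
            new String[] { "run", "-m", "localhost:6123", "--", "-maxtasks", "5" }));
    }
}
```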
[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339405#comment-15339405 ] Maximilian Michels commented on FLINK-3838: --- Occurs whenever args are present before the jar path and the jar arguments. Works (Passes "-arg" and "value" as parameters): {{./flink run /path/to/jar -arg value}} Doesn't work (Passes "arg" and "value" as parameters): {{./flink run -m localhost:6123 /path/to/jar -arg value}} > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339280#comment-15339280 ] Maximilian Michels commented on FLINK-3838: --- Seems like this only occurs when running on Yarn, correct? > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels reassigned FLINK-3838: - Assignee: Maximilian Michels > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339240#comment-15339240 ] Maximilian Michels commented on FLINK-3838: --- Thanks for reporting! > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-3838: -- Affects Version/s: 1.0.3 > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3838) CLI parameter parser is munging application params
[ https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-3838: -- Fix Version/s: 1.0.4 1.1.0 > CLI parameter parser is munging application params > -- > > Key: FLINK-3838 > URL: https://issues.apache.org/jira/browse/FLINK-3838 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.2, 1.0.3 >Reporter: Ken Krugler >Priority: Minor > Fix For: 1.1.0, 1.0.4 > > > If parameters for an application use a single '-' (e.g. -maxtasks) then the > CLI argument parser will munge these, and the app gets passed either just the > parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a > Flink parameter, or you get two values, with the first value being the part > that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks'). > The parser should ignore everything after the jar path parameter. > Note that using -- does seem to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-3820) ScalaShellITCase.testPreventRecreationBatch deadlocks on Travis
[ https://issues.apache.org/jira/browse/FLINK-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-3820. --- Resolution: Duplicate Should be resolved with FLINK-4030. > ScalaShellITCase.testPreventRecreationBatch deadlocks on Travis > --- > > Key: FLINK-3820 > URL: https://issues.apache.org/jira/browse/FLINK-3820 > Project: Flink > Issue Type: Bug > Components: Scala Shell >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > > The test case {{ScalaShellITCase.testPreventRecreationBatch}} deadlocked on > Travis. > https://s3.amazonaws.com/archive.travis-ci.org/jobs/125789071/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3787) Yarn client does not report unfulfillable container constraints
[ https://issues.apache.org/jira/browse/FLINK-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339230#comment-15339230 ] Maximilian Michels commented on FLINK-3787: --- Do we want to fix this for the 1.1.0 release? > Yarn client does not report unfulfillable container constraints > --- > > Key: FLINK-3787 > URL: https://issues.apache.org/jira/browse/FLINK-3787 > Project: Flink > Issue Type: Improvement > Components: YARN Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Priority: Minor > > If the number of virtual cores for a Yarn container is not fulfillable, then > the {{TaskManager}} won't be started. This is only reported in the logs but > not in the {{FlinkYarnClient}}. Thus, the user will see a started > {{JobManager}} with no connected {{TaskManagers}}. Since the log aggregation > is only available after the Yarn job has been stopped, there is no easy way > for the user to detect what's going on. > This problem is aggravated by the fact that the number of virtual cores is > coupled to the number of slots if no explicit value has been set for the > virtual cores. Therefore, it might happen that the Yarn deployment fails > because of the virtual cores even though the user has never set a value for > them (the user might even not know about the virtual cores). > I think it would be good to check if the virtual cores constraint is > fulfillable. If not, then the user should receive a clear message that the > Flink cluster cannot be deployed (similar to the memory constraints). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
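The suggested improvement boils down to a client-side precondition. As a hedged sketch (names and message are hypothetical, not the Flink YARN client's API): compare the requested virtual cores per container against the cluster maximum and fail the deployment with a clear message, mirroring how the memory constraints are already checked.

```java
// Hypothetical sketch of the suggested fulfillability check for vcores.
public class VcoreCheckSketch {

    static void checkVcores(int requestedVcores, int maxClusterVcores) {
        if (requestedVcores > maxClusterVcores) {
            throw new IllegalArgumentException(
                "Cannot deploy Flink cluster: requested " + requestedVcores
                    + " virtual cores per container, but the YARN cluster only allows "
                    + maxClusterVcores + ".");
        }
    }

    public static void main(String[] args) {
        checkVcores(2, 8); // fulfillable: passes silently
        System.out.println("vcores request is fulfillable");
    }
}
```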
[jira] [Resolved] (FLINK-3765) ScalaShellITCase deadlocks spuriously
[ https://issues.apache.org/jira/browse/FLINK-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-3765. --- Resolution: Duplicate This should be resolved with the resolution of FLINK-4030. Sorry, didn't see the issue. > ScalaShellITCase deadlocks spuriously > - > > Key: FLINK-3765 > URL: https://issues.apache.org/jira/browse/FLINK-3765 > Project: Flink > Issue Type: Bug > Components: Scala Shell >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > > The {{ScalaShellITCase}} deadlocks spuriously. Stephan already observed it > locally and now it also happened on Travis [1]. > [1] https://s3.amazonaws.com/archive.travis-ci.org/jobs/123206554/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name
[ https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339224#comment-15339224 ] Maximilian Michels commented on FLINK-3757: --- Accumulators are first created locally without checking at the JobManager whether they exist. Only once they have been transferred to the JobManager are they merged (and errors are reported). Since accumulators can be created at arbitrary points during job execution, I think this behavior makes sense. We would have to send a message to the JobManager upon creation or do static analysis of the code to see if accumulators collide. I would say it is a feature to avoid making requests to the JobManager during execution :) > addAccumulator does not throw Exception on duplicate accumulator name > - > > Key: FLINK-3757 > URL: https://issues.apache.org/jira/browse/FLINK-3757 > Project: Flink > Issue Type: Bug >Reporter: Konstantin Knauf >Priority: Minor > > Works if tasks are chained, but in many situations it does not, see > See https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112 > Is this an undocumented feature or a valid bug? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
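The lifecycle described in the comment can be sketched as follows (illustrative only, not Flink's actual accumulator code; accumulators are reduced to plain counters here): each task fills its own local map with no global name check, and duplicate names first meet when the JobManager merges the per-task maps after execution.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: accumulators live in per-task maps; name clashes only surface
// when the JobManager merges the maps, never at creation time.
public class AccumulatorMergeSketch {

    static Map<String, Long> mergeAtJobManager(List<Map<String, Long>> perTask) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> taskAccumulators : perTask) {
            // This merge is the first point where two accumulators with the
            // same name interact (or conflict).
            taskAccumulators.forEach((name, value) -> global.merge(name, value, Long::sum));
        }
        return global;
    }

    public static void main(String[] args) {
        Map<String, Long> task1 = new HashMap<>();
        task1.put("lines", 10L);
        Map<String, Long> task2 = new HashMap<>();
        task2.put("lines", 32L); // same name, created independently without error
        System.out.println(mergeAtJobManager(List.of(task1, task2)));
    }
}
```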
[jira] [Updated] (FLINK-4009) Scala Shell fails to find library for inclusion in test
[ https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-4009: -- Issue Type: Bug (was: Sub-task) Parent: (was: FLINK-3454) > Scala Shell fails to find library for inclusion in test > --- > > Key: FLINK-4009 > URL: https://issues.apache.org/jira/browse/FLINK-4009 > Project: Flink > Issue Type: Bug > Components: Scala Shell, Tests >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Labels: test-stability > Fix For: 1.1.0 > > > The Scala Shell test fails to find the flink-ml library jar in the target > folder when executing with Intellij. This is due to its working directory > being expected in "flink-scala-shell/target" when it is in fact > "flink-scala-shell". When executed with Maven, this works fine because the > Shade plugin changes the basedir from the project root to the /target > folder*. > As per [~till.rohrmann] and [~greghogan] suggestions we could simply add > flink-ml as a test dependency and look for the jar path in the classpath. > \* Because we have the dependencyReducedPomLocation set to /target/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
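The suggestion to "look for the jar path in the classpath" can be sketched as below. This is a hypothetical illustration, not the Flink test code: scan the entries of `java.class.path` for a jar whose file name contains a fragment such as "flink-ml".

```java
import java.io.File;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical classpath lookup for a dependency jar such as flink-ml.
public class ClasspathJarLookup {

    static Optional<String> findJar(String classpath, String nameFragment) {
        return Arrays.stream(classpath.split(Pattern.quote(File.pathSeparator)))
            .filter(entry -> entry.endsWith(".jar"))
            .filter(entry -> new File(entry).getName().contains(nameFragment))
            .findFirst();
    }

    public static void main(String[] args) {
        // In a test one would pass System.getProperty("java.class.path") here.
        String cp = String.join(File.pathSeparator,
            "/repo/other-lib-1.0.jar", "/repo/flink-ml_2.10-1.1.0.jar");
        System.out.println(findJar(cp, "flink-ml").orElse("<not found>"));
    }
}
```

Adding flink-ml as a test dependency guarantees the jar is on the classpath, which makes this lookup independent of the working directory that caused the original failure.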
[jira] [Closed] (FLINK-4009) Scala Shell fails to find library for inclusion in test
[ https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-4009. - Resolution: Fixed > Scala Shell fails to find library for inclusion in test > --- > > Key: FLINK-4009 > URL: https://issues.apache.org/jira/browse/FLINK-4009 > Project: Flink > Issue Type: Sub-task > Components: Scala Shell, Tests >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Labels: test-stability > Fix For: 1.1.0 > > > The Scala Shell test fails to find the flink-ml library jar in the target > folder when executing with Intellij. This is due to its working directory > being expected in "flink-scala-shell/target" when it is in fact > "flink-scala-shell". When executed with Maven, this works fine because the > Shade plugin changes the basedir from the project root to the /target > folder*. > As per [~till.rohrmann] and [~greghogan] suggestions we could simply add > flink-ml as a test dependency and look for the jar path in the classpath. > \* Because we have the dependencyReducedPomLocation set to /target/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4009) Scala Shell fails to find library for inclusion in test
[ https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335903#comment-15335903 ] Maximilian Michels commented on FLINK-4009: --- Fixed with 4d8cbec4ffca033f5602a53205d65e535ffc5f97. Discussion about using the class path can continue in FLINK-3454. We need to build a jar file during the test to load the classes in the shell. > Scala Shell fails to find library for inclusion in test > --- > > Key: FLINK-4009 > URL: https://issues.apache.org/jira/browse/FLINK-4009 > Project: Flink > Issue Type: Sub-task > Components: Scala Shell, Tests >Affects Versions: 1.1.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Labels: test-stability > Fix For: 1.1.0 > > > The Scala Shell test fails to find the flink-ml library jar in the target > folder when executing with Intellij. This is due to its working directory > being expected in "flink-scala-shell/target" when it is in fact > "flink-scala-shell". When executed with Maven, this works fine because the > Shade plugin changes the basedir from the project root to the /target > folder*. > As per [~till.rohrmann] and [~greghogan] suggestions we could simply add > flink-ml as a test dependency and look for the jar path in the classpath. > \* Because we have the dependencyReducedPomLocation set to /target/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4009) Scala Shell fails to find library for inclusion in test
[ https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335898#comment-15335898 ] Maximilian Michels commented on FLINK-4009: --- Fixing this in the meantime with a simple lookup strategy.
[jira] [Commented] (FLINK-4009) Scala Shell fails to find library for inclusion in test
[ https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335894#comment-15335894 ] Maximilian Michels commented on FLINK-4009: --- Not pressing IMHO.
[jira] [Closed] (FLINK-3890) Deprecate streaming mode flag from Yarn CLI
[ https://issues.apache.org/jira/browse/FLINK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3890. - Resolution: Fixed Fixed with f4ac852275da36ee33aa54ae9097293ccc981afa > Deprecate streaming mode flag from Yarn CLI > --- > > Key: FLINK-3890 > URL: https://issues.apache.org/jira/browse/FLINK-3890 > Project: Flink > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > The {{-yst}} and {{-yarnstreaming}} parameters to the Yarn command line have > not been in use since FLINK-3073 and should have been removed before the 1.0.0 > release. I suggest marking the parameters as deprecated in the code and no > longer listing them in the help section. In one of the upcoming major > releases, we can remove them completely (which would give users an error if > they used the flags).
[jira] [Closed] (FLINK-3667) Generalize client<->cluster communication
[ https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3667. - Resolution: Fixed Implemented via f9b52a3114a2114e6846091acf3abb294a49615b and f4ac852275da36ee33aa54ae9097293ccc981afa > Generalize client<->cluster communication > - > > Key: FLINK-3667 > URL: https://issues.apache.org/jira/browse/FLINK-3667 > Project: Flink > Issue Type: Improvement > Components: YARN Client >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.1.0 > > > Here are some notes I took when inspecting the client<->cluster classes with > regard to future integration of other resource management frameworks in > addition to Yarn (e.g. Mesos). > {noformat} > 1 Cluster Client Abstraction > > 1.1 Status Quo > ── > 1.1.1 FlinkYarnClient > ╌ > • Holds the cluster configuration (Flink-specific and Yarn-specific) > • Contains the deploy() method to deploy the cluster > • Creates the Hadoop Yarn client > • Receives the initial job manager address > • Bootstraps the FlinkYarnCluster > 1.1.2 FlinkYarnCluster > ╌╌ > • Wrapper around the Hadoop Yarn client > • Queries cluster for status updates > • Life time methods to start and shutdown the cluster > • Flink specific features like shutdown after job completion > 1.1.3 ApplicationClient > ╌╌╌ > • Acts as a middle-man for asynchronous cluster communication > • Designed to communicate with Yarn, not used in Standalone mode > 1.1.4 CliFrontend > ╌ > • Deeply integrated with FlinkYarnClient and FlinkYarnCluster > • Constantly distinguishes between Yarn and Standalone mode > • Would be nice to have a general abstraction in place > 1.1.5 Client > > • Job submission and Job related actions, agnostic of resource framework > 1.2 Proposal > > 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient) > ╌ > • Extensible cluster-agnostic config > • May be extended by specific cluster, e.g. 
YarnClusterConfig > 1.2.2 ClusterClient (before: AbstractFlinkYarnClient) > ╌ > • Deals with cluster (RM) specific communication > • Exposes framework agnostic information > • YarnClusterClient, MesosClusterClient, StandaloneClusterClient > 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster) > ╌ > • Basic interface to communicate with a running cluster > • Receives the ClusterClient for cluster-specific communication > • Should not have to care about the specific implementations of the > client > 1.2.4 ApplicationClient > ╌╌╌ > • Can be changed to work cluster-agnostic (first steps already in > FLINK-3543) > 1.2.5 CliFrontend > ╌ > • CliFrontend does never have to differentiate between different > cluster types after it has determined which cluster class to load. > • Base class handles framework agnostic command line arguments > • Pluggables for Yarn, Mesos handle specific commands > {noformat} > I would like to create/refactor the affected classes to set us up for a more > flexible client side resource management abstraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
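The proposed abstraction above could look roughly like the following Java sketch. The type names (`ClusterConfig`, `ClusterClient`, `FlinkCluster`, `StandaloneClusterClient`) come from the proposal notes, but all signatures and the stub implementation are illustrative assumptions, not the actual Flink refactoring:

```java
public class ClusterAbstractionSketch {

    // Extensible, cluster-agnostic configuration (could be subtyped
    // by e.g. a Yarn-specific YarnClusterConfig).
    public interface ClusterConfig {}

    // Framework-agnostic view of a running cluster, as proposed.
    public interface FlinkCluster {
        String jobManagerAddress();
        void shutdown();
    }

    // Handles resource-manager-specific deployment; CliFrontend would
    // only ever talk to this interface after choosing an implementation.
    public interface ClusterClient {
        FlinkCluster deploy(ClusterConfig config);
    }

    // Stub standalone implementation, just to show that the caller
    // never needs to know which concrete cluster type is in use.
    public static class StandaloneClusterClient implements ClusterClient {
        public FlinkCluster deploy(ClusterConfig config) {
            return new FlinkCluster() {
                public String jobManagerAddress() { return "localhost:6123"; }
                public void shutdown() {}
            };
        }
    }

    public static void main(String[] args) {
        ClusterClient client = new StandaloneClusterClient();
        FlinkCluster cluster = client.deploy(new ClusterConfig() {});
        System.out.println(cluster.jobManagerAddress());
    }
}
```

The key design point mirrors the proposal: once `CliFrontend` has picked the cluster class to load, every subsequent call goes through the framework-agnostic interfaces.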
[jira] [Updated] (FLINK-3667) Generalize client<->cluster communication
[ https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-3667: -- Fix Version/s: 1.1.0
[jira] [Commented] (FLINK-4079) YARN properties file used for per-job cluster
[ https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335730#comment-15335730 ] Maximilian Michels commented on FLINK-4079: --- Fixed for 1.1 with ec6d97528e8b21f191b7922e4810fd60804c8365 Do we want to do a backport for 1.0.4? > YARN properties file used for per-job cluster > - > > Key: FLINK-4079 > URL: https://issues.apache.org/jira/browse/FLINK-4079 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.0.3 >Reporter: Ufuk Celebi >Assignee: Maximilian Michels > Fix For: 1.1.0, 1.0.4 > > > YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN > properties file, which defines the container configuration. This can lead to > unexpected behaviour, because the per-job-cluster configuration is merged > with the YARN properties file (or used as only configuration source). > A user ran into this as follows: > - Create a long-lived YARN session with HA (creates a hidden YARN properties > file) > - Submits standalone batch jobs with a per job cluster (flink run -m > yarn-cluster). The batch jobs get submitted to the long lived HA cluster, > because of the properties file. > [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch > and will be fixed with the client refactoring you are working on? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
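The intended configuration precedence for the bug above (a per-job cluster must ignore the hidden YARN properties file instead of merging it in) can be illustrated with a small sketch. `YarnConfigResolver` and `resolve` are hypothetical names, not Flink's real client code:

```java
import java.util.Properties;

public class YarnConfigResolver {
    // Hypothetical illustration of the intended precedence:
    // - per-job mode (-m yarn-cluster): the hidden YARN properties file
    //   is skipped entirely, so a stale HA session cannot leak in;
    // - session mode: the properties file seeds the config;
    // - explicit CLI arguments always override whatever was loaded.
    public static Properties resolve(boolean perJobMode,
                                     Properties hiddenPropsFile,
                                     Properties cliArgs) {
        Properties effective = new Properties();
        if (!perJobMode) {
            effective.putAll(hiddenPropsFile);
        }
        effective.putAll(cliArgs);
        return effective;
    }
}
```

Under the buggy behavior described above, the `perJobMode` guard was effectively missing, so the long-lived HA session's JobManager address from the properties file ended up in the per-job configuration.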
[jira] [Closed] (FLINK-3937) Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
[ https://issues.apache.org/jira/browse/FLINK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3937. - Resolution: Fixed Fix Version/s: 1.1.0 Fixed with 875d4d238567a0503941488eb4e7d03b38c0fc42 and f4ac852275da36ee33aa54ae9097293ccc981afa > Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters > -- > > Key: FLINK-3937 > URL: https://issues.apache.org/jira/browse/FLINK-3937 > Project: Flink > Issue Type: Improvement >Reporter: Sebastian Klemke >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > Attachments: improve_flink_cli_yarn_integration.patch > > > Currently, flink cli can't figure out JobManager RPC location for > Flink-on-YARN clusters. Therefore, list, savepoint, cancel and stop > subcommands are hard to invoke if you only know the YARN application ID. As > an improvement, I suggest adding a -yid option to the > mentioned subcommands that can be used together with -m yarn-cluster. Flink > cli would then retrieve JobManager RPC location from YARN ResourceManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3937) Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
[ https://issues.apache.org/jira/browse/FLINK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-3937: -- Priority: Minor (was: Trivial)
[jira] [Closed] (FLINK-3863) Yarn Cluster shutdown may fail if leader changed recently
[ https://issues.apache.org/jira/browse/FLINK-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-3863. - Resolution: Fixed Fixed with 9e9842410a635d183a002d1f25a6f489ce9d6a2f > Yarn Cluster shutdown may fail if leader changed recently > - > > Key: FLINK-3863 > URL: https://issues.apache.org/jira/browse/FLINK-3863 > Project: Flink > Issue Type: Bug > Components: YARN Client >Affects Versions: 1.1.0, 1.0.2 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Minor > Fix For: 1.1.0 > > > The {{ApplicationClient}} sets {{yarnJobManager}} to {{None}} until it has > connected to a newly elected JobManager. A shutdown message to the > application master is discarded while the ApplicationClient tries to > reconnect. The ApplicationClient should retry to shutdown the cluster when it > is connected to the new leader. It may also time out (which currently is > always the case). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2392) Instable test in flink-yarn-tests
[ https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333516#comment-15333516 ] Maximilian Michels commented on FLINK-2392: --- It doesn't matter if it is a naming conflict or a malformed bean. It still seems to be an instance of incorrect registration, no? :) > Instable test in flink-yarn-tests > - > > Key: FLINK-2392 > URL: https://issues.apache.org/jira/browse/FLINK-2392 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 0.10.0 >Reporter: Matthias J. Sax >Assignee: Robert Metzger >Priority: Critical > Labels: test-stability > Fix For: 1.0.0 > > > The test YARNSessionFIFOITCase fails from time to time on an irregular basis. > For example see: https://travis-ci.org/apache/flink/jobs/72019690 > {noformat} > Tests run: 12, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 205.163 sec > <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase > perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionFIFOITCase) > Time elapsed: 60.651 sec <<< FAILURE! > java.lang.AssertionError: During the timeout period of 60 seconds the > expected string did not show up > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.apache.flink.yarn.YarnTestBase.runWithArgs(YarnTestBase.java:478) > at > org.apache.flink.yarn.YARNSessionFIFOITCase.perJobYarnClusterWithParallelism(YARNSessionFIFOITCase.java:435) > Results : > Failed tests: > > YARNSessionFIFOITCase.perJobYarnClusterWithParallelism:435->YarnTestBase.runWithArgs:478 > During the timeout period of 60 seconds the expected string did not show up > {noformat} > Another error case is this (see > https://travis-ci.org/mjsax/flink/jobs/77313444) > {noformat} > Tests run: 12, Failures: 3, Errors: 0, Skipped: 2, Time elapsed: 182.008 sec > <<< FAILURE! 
- in org.apache.flink.yarn.YARNSessionFIFOITCase > testTaskManagerFailure(org.apache.flink.yarn.YARNSessionFIFOITCase) Time > elapsed: 27.356 sec <<< FAILURE! > java.lang.AssertionError: Found a file > /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log > with a prohibited string: [Exception, Started > SelectChannelConnector@0.0.0.0:8081] > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294) > at > org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94) > testNonexistingQueue(org.apache.flink.yarn.YARNSessionFIFOITCase) Time > elapsed: 17.421 sec <<< FAILURE! > java.lang.AssertionError: Found a file > /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log > with a prohibited string: [Exception, Started > SelectChannelConnector@0.0.0.0:8081] > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294) > at > org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94) > testJavaAPI(org.apache.flink.yarn.YARNSessionFIFOITCase) Time elapsed: > 11.984 sec <<< FAILURE! 
> java.lang.AssertionError: Found a file > /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log > with a prohibited string: [Exception, Started > SelectChannelConnector@0.0.0.0:8081] > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294) > at > org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94) > {noformat} > Furthermore, this build failed too: > https://travis-ci.org/apache/flink/jobs/77313450 > (no error, but Travis terminated due to no progress for 300 seconds -> > deadlock?)
[jira] [Commented] (FLINK-2392) Instable test in flink-yarn-tests
[ https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333461#comment-15333461 ] Maximilian Michels commented on FLINK-2392: --- What do you mean? That the issue is unrelated, or the Exception?
[jira] [Commented] (FLINK-3962) JMXReporter doesn't properly register/deregister metrics
[ https://issues.apache.org/jira/browse/FLINK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333440#comment-15333440 ] Maximilian Michels commented on FLINK-3962: --- Another exception from the Yarn logs: {noformat} 2016-06-16 04:58:37,218 ERROR org.apache.flink.metrics.reporter.JMXReporter - Metric did not comply with JMX MBean naming rules. javax.management.NotCompliantMBeanException: Interface is not public: org.apache.flink.metrics.reporter.JMXReporter$JmxGaugeMBean at com.sun.jmx.mbeanserver.MBeanAnalyzer.(MBeanAnalyzer.java:114) at com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:102) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.getAnalyzer(StandardMBeanIntrospector.java:67) at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:192) at com.sun.jmx.mbeanserver.MBeanSupport.(MBeanSupport.java:138) at com.sun.jmx.mbeanserver.StandardMBeanSupport.(StandardMBeanSupport.java:60) at com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:192) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:898) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) at org.apache.flink.metrics.reporter.JMXReporter.notifyOfAddedMetric(JMXReporter.java:113) at org.apache.flink.metrics.MetricRegistry.register(MetricRegistry.java:174) at org.apache.flink.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:206) at org.apache.flink.metrics.groups.AbstractMetricGroup.gauge(AbstractMetricGroup.java:162) at org.apache.flink.runtime.taskmanager.TaskManager$.instantiateClassLoaderMetrics(TaskManager.scala:2298) at org.apache.flink.runtime.taskmanager.TaskManager$.org$apache$flink$runtime$taskmanager$TaskManager$$instantiateStatusMetrics(TaskManager.scala:2287) at 
org.apache.flink.runtime.taskmanager.TaskManager.associateWithJobManager(TaskManager.scala:951) at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleRegistrationMessage(TaskManager.scala:631) at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:124) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2016-06-16 04:58:37,221 ERROR org.apache.flink.metrics.reporter.JMXReporter - Metric did not comply with JMX MBean naming rules. 
javax.management.NotCompliantMBeanException: Interface is not public: org.apache.flink.metrics.reporter.JMXReporter$JmxGaugeMBean at com.sun.jmx.mbeanserver.MBeanAnalyzer.(MBeanAnalyzer.java:114) at com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:102) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.getAnalyzer(StandardMBeanIntrospector.java:67) at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:192) at com.sun.jmx.mbeanserver.MBeanSupport.(MBeanSupport.java:138) at com.sun.jmx.mbeanserver.StandardMBeanSupport.(StandardMBeanSupport.java:60) at com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:192) at com.sun.jmx.interceptor.DefaultM
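The `NotCompliantMBeanException: Interface is not public` above follows from the JMX Standard MBean rules: the management interface must be a public interface whose name is the implementation class name plus the `MBean` suffix, and Flink's `JmxGaugeMBean` interface was not public. A minimal compliant registration, independent of Flink's actual `JMXReporter` (the `JmxDemo`/`Gauge` names and the object names are illustrative):

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxDemo {
    // The management interface MUST be public and named <Impl>MBean;
    // a package-private interface triggers NotCompliantMBeanException.
    public interface GaugeMBean {
        long getValue();
    }

    // The implementation class must be public as well.
    public static class Gauge implements GaugeMBean {
        public long getValue() { return 42L; }
    }

    // Register a Gauge, read back its "Value" attribute, then clean up.
    public static long readGauge(String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(new Gauge(), name);
            try {
                return (Long) server.getAttribute(name, "Value");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readGauge("demo:type=Gauge"));
    }
}
```

Making the interface (and the implementation class) public is what brings the registration in line with the introspection check shown in the stack trace.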
[jira] [Commented] (FLINK-2392) Instable test in flink-yarn-tests
[ https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333442#comment-15333442 ] Maximilian Michels commented on FLINK-2392: --- Should be fixed by the resolution of FLINK-3962