[jira] [Resolved] (FLINK-4261) Setup atomic deployment of snapshots

2016-07-26 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-4261.
---
Resolution: Won't Fix

Unfortunately, we can't deploy snapshots atomically using the Nexus repository. 
The staging process that provides atomic deployments is only designed to work 
for releases. The best we can do is to retry deploying artifacts in case of 
failures.
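
For illustration, such a retry could look roughly like this (a hypothetical 
sketch only; the actual change lives in the commit referenced in the follow-up 
comment):

{code}
# Hypothetical sketch: retry "mvn deploy" a few times before giving up.
for i in 1 2 3; do
    mvn clean deploy -DskipTests && exit 0
    echo "Deployment attempt ${i} failed, retrying..."
done
exit 1
{code}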

> Setup atomic deployment of snapshots
> 
>
> Key: FLINK-4261
> URL: https://issues.apache.org/jira/browse/FLINK-4261
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, release
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Users have reported that our nightly snapshots become inconsistent from time 
> to time. This happens when the upload to the snapshot repository fails during 
> the deployment process. Maven doesn't support atomic deployment but deploys 
> artifacts one after another, directly after installing them in the local 
> repository. If the build fails at any time, no changes are rolled back. 
> This problem has been solved for Nexus repositories. For releases, we already 
> take advantage of atomic deployments using staging repositories. Nexus 
> repositories support this even without using a special Maven plugin. 
> For releases, we have to use the Web UI to close and release staging 
> repositories. For snapshots, this should be automated. Most importantly, the 
> changes shouldn't alter anything about our release process.
> I suggest using the {{nexus-staging-maven-plugin}}, which essentially 
> replaces the standard Maven deploy plugin. It can be set up to auto-close and 
> auto-release snapshot staging repositories. For releases, it will be set up 
> to never auto-close or auto-release, which keeps our existing release process.





[jira] [Commented] (FLINK-4261) Setup atomic deployment of snapshots

2016-07-26 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393894#comment-15393894
 ] 

Maximilian Michels commented on FLINK-4261:
---

Retry via cd232e683f7aa6d7660ca2d545ba1534435e1ab1

> Setup atomic deployment of snapshots
> 
>
> Key: FLINK-4261
> URL: https://issues.apache.org/jira/browse/FLINK-4261
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, release
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Users have reported that our nightly snapshots become inconsistent from time 
> to time. This happens when the upload to the snapshot repository fails during 
> the deployment process. Maven doesn't support atomic deployment but deploys 
> artifacts one after another, directly after installing them in the local 
> repository. If the build fails at any time, no changes are rolled back. 
> This problem has been solved for Nexus repositories. For releases, we already 
> take advantage of atomic deployments using staging repositories. Nexus 
> repositories support this even without using a special Maven plugin. 
> For releases, we have to use the Web UI to close and release staging 
> repositories. For snapshots, this should be automated. Most importantly, the 
> changes shouldn't alter anything about our release process.
> I suggest using the {{nexus-staging-maven-plugin}}, which essentially 
> replaces the standard Maven deploy plugin. It can be set up to auto-close and 
> auto-release snapshot staging repositories. For releases, it will be set up 
> to never auto-close or auto-release, which keeps our existing release process.





[jira] [Closed] (FLINK-4152) TaskManager registration exponential backoff doesn't work

2016-07-26 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4152.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed via 2648bc1a5a5faed8c2061bcab40a8949fd02751c

> TaskManager registration exponential backoff doesn't work
> -
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination, TaskManager, YARN Client
>Reporter: Robert Metzger
>Assignee: Till Rohrmann
> Fix For: 1.1.0
>
> Attachments: logs.tgz
>
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many 
> messages when registering at the JobManager.
> This is the log file: 
> https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294
> It's logging more than 3000 messages in less than a minute. I don't think 
> this is the expected behavior.
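
For reference, the expected registration behavior is a classic exponential 
backoff; a generic sketch (not the actual TaskManager code, all names assumed):

{code}
// Generic exponential backoff sketch -- not the actual TaskManager code.
long pause = initialPauseMillis;
while (!registered) {
    sendRegistrationRequest();
    Thread.sleep(pause);
    pause = Math.min(pause * 2, maxPauseMillis);  // double, capped at a maximum
}
{code}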





[jira] [Created] (FLINK-4261) Setup atomic deployment of snapshots

2016-07-25 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4261:
-

 Summary: Setup atomic deployment of snapshots
 Key: FLINK-4261
 URL: https://issues.apache.org/jira/browse/FLINK-4261
 Project: Flink
  Issue Type: Bug
  Components: Build System, release
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 1.1.0


Users have reported that our nightly snapshots become inconsistent from time to 
time. This happens when the upload to the snapshot repository fails during the 
deployment process. Maven doesn't support atomic deployment but deploys 
artifacts one after another, directly after installing them in the local 
repository. If the build fails at any time, no changes are rolled back. 

This problem has been solved for Nexus repositories. For releases, we already 
take advantage of atomic deployments using staging repositories. Nexus 
repositories support this even without using a special Maven plugin. 

For releases, we have to use the Web UI to close and release staging 
repositories. For snapshots, this should be automated. Most importantly, the 
changes shouldn't alter anything about our release process.

I suggest using the {{nexus-staging-maven-plugin}}, which essentially replaces 
the standard Maven deploy plugin. It can be set up to auto-close and 
auto-release snapshot staging repositories. For releases, it will be set up to 
never auto-close or auto-release, which keeps our existing release process.
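
A rough sketch of what this could look like in the parent {{pom.xml}} (the 
plugin coordinates are real; the concrete values here are illustrative, not 
the final setup):

{code}
<plugin>
  <groupId>org.sonatype.plugins</groupId>
  <artifactId>nexus-staging-maven-plugin</artifactId>
  <version>1.6.7</version>
  <extensions>true</extensions>
  <configuration>
    <serverId>apache.releases.https</serverId>
    <nexusUrl>https://repository.apache.org/</nexusUrl>
    <!-- Snapshots: close and release the staging repository automatically. -->
    <!-- Releases: disable this and keep closing/releasing via the Web UI. -->
    <autoReleaseAfterClose>true</autoReleaseAfterClose>
  </configuration>
</plugin>
{code}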





[jira] [Updated] (FLINK-4261) Setup atomic deployment of snapshots

2016-07-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4261:
--
External issue ID:   (was: INFRA-12320)

> Setup atomic deployment of snapshots
> 
>
> Key: FLINK-4261
> URL: https://issues.apache.org/jira/browse/FLINK-4261
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, release
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Users have reported that our nightly snapshots become inconsistent from time 
> to time. This happens when the upload to the snapshot repository fails during 
> the deployment process. Maven doesn't support atomic deployment but deploys 
> artifacts one after another, directly after installing them in the local 
> repository. If the build fails at any time, no changes are rolled back. 
> This problem has been solved for Nexus repositories. For releases, we already 
> take advantage of atomic deployments using staging repositories. Nexus 
> repositories support this even without using a special Maven plugin. 
> For releases, we have to use the Web UI to close and release staging 
> repositories. For snapshots, this should be automated. Most importantly, the 
> changes shouldn't alter anything about our release process.
> I suggest using the {{nexus-staging-maven-plugin}}, which essentially 
> replaces the standard Maven deploy plugin. It can be set up to auto-close and 
> auto-release snapshot staging repositories. For releases, it will be set up 
> to never auto-close or auto-release, which keeps our existing release process.





[jira] [Updated] (FLINK-4236) Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up

2016-07-21 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4236:
--
Fix Version/s: 1.1.0

> Flink Dashboard stops showing list of uploaded jars if main method cannot be 
> looked up
> --
>
> Key: FLINK-4236
> URL: https://issues.apache.org/jira/browse/FLINK-4236
> Project: Flink
>  Issue Type: Bug
>  Components: Job-Submission
>Affects Versions: 1.0.3
>Reporter: Gary Yao
> Fix For: 1.1.0
>
>
> The Flink Dashboard stops showing the list of uploaded jars on the job 
> submission page if a jar is uploaded for which the main method cannot be 
> looked up. The HTTP call returns with code 500. See the attached stacktrace:
> {code}
> java.lang.RuntimeException: Failed to fetch jar list: Could not look up the 
> main(String[]) method from the class de.zalando.[...]]: 
> org/shaded/apache/flink/streaming/api/functions/source/SourceFunction
> at 
> org.apache.flink.runtime.webmonitor.handlers.JarListHandler.handleRequest(JarListHandler.java:122)
> at 
> org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:135)
> at 
> org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.channelRead0(RuntimeMonitorHandler.java:112)
> at 
> org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.channelRead0(RuntimeMonitorHandler.java:60)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
> at 
> io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
> at 
> io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:104)
> at 
> org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at 
> io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Could not look up the main(String[]) 
> method from the class de.zalando.[...]]: 
> org/shaded/apache/flink/streaming/api/functions/source/SourceFunction
> at 
> org.apache.flink.client.program.PackagedProgram.hasMainMethod(PackagedProgram.java:479)
> at 
> org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:216)
> at 
> org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:147)
> at 
> org.apache.flink.runtime.webmonitor.handlers.JarListHandler.handleRequest(JarListHandler.java:103)
> ... 30 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/shaded/apache/flink/streaming/api/functions/source/SourceFunction
> at java.lang.Class.ge

[jira] [Resolved] (FLINK-3753) KillerWatchDog should not use kill on toKill thread

2016-07-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3753.
---
Resolution: Won't Fix

> KillerWatchDog should not use kill on toKill thread
> ---
>
> Key: FLINK-3753
> URL: https://issues.apache.org/jira/browse/FLINK-3753
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> // this is harsh, but this watchdog is a last resort
> if (toKill.isAlive()) {
>   toKill.stop();
> }
> {code}
> stop() is deprecated.
> See:
> https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads





[jira] [Commented] (FLINK-3753) KillerWatchDog should not use kill on toKill thread

2016-07-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385754#comment-15385754
 ] 

Maximilian Michels commented on FLINK-3753:
---

Thanks for reporting! The class actually documents why the {{stop()}} method 
is used: I think we have to work around some issues with the Kafka fetcher.

I would close this issue for now because we want to keep using {{stop()}} in 
the KillerWatchDog (scary name!).
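
For context, the cooperative shutdown that THI05-J recommends looks roughly 
like this (a sketch only; it relies on the target thread honoring interrupts, 
which is exactly what can't be guaranteed here):

{code}
// Sketch: interrupt first, fall back to the deprecated stop() only if the
// thread refuses to terminate -- the KillerWatchDog situation.
static void kill(Thread toKill, long timeoutMillis) throws InterruptedException {
    toKill.interrupt();          // request cooperative termination
    toKill.join(timeoutMillis);  // give the thread time to exit
    if (toKill.isAlive()) {
        toKill.stop();           // harsh, but a last resort
    }
}
{code}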

> KillerWatchDog should not use kill on toKill thread
> ---
>
> Key: FLINK-3753
> URL: https://issues.apache.org/jira/browse/FLINK-3753
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> // this is harsh, but this watchdog is a last resort
> if (toKill.isAlive()) {
>   toKill.stop();
> }
> {code}
> stop() is deprecated.
> See:
> https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads





[jira] [Closed] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications

2016-07-19 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4166.
-
   Resolution: Fixed
 Assignee: Stefan Richter
Fix Version/s: 1.1.0

Fixed via 082d87e5139a10fc34ddb3072d15c1a4747117e7

> Generate automatic different namespaces in Zookeeper for Flink applications
> ---
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>Assignee: Stefan Richter
> Fix For: 1.1.0
>
>
> We should automatically generate different namespaces per Flink application 
> in Zookeeper to avoid interference between different applications that refer 
> to the same Zookeeper entries.
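
For illustration, with a per-application namespace (e.g. the Yarn application 
id) the ZooKeeper layout would look roughly like this (paths illustrative):

{noformat}
/flink/application_1467370000000_0001/leader
/flink/application_1467370000000_0001/jobgraphs
/flink/application_1467370000000_0002/leader
/flink/application_1467370000000_0002/jobgraphs
{noformat}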





[jira] [Resolved] (FLINK-4199) Misleading messages by CLI upon job submission

2016-07-19 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-4199.
---
Resolution: Fixed

Fixed via 17589d454d00efa43cdf6116ea29ff4f513b6f20

> Misleading messages by CLI upon job submission
> --
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Closed] (FLINK-4223) Rearrange scaladoc and javadoc for Scala API

2016-07-19 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4223.
-
Resolution: Duplicate

> Rearrange scaladoc and javadoc for Scala API
> 
>
> Key: FLINK-4223
> URL: https://issues.apache.org/jira/browse/FLINK-4223
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Chiwan Park
>Priority: Minor
>  Labels: easyfix, newbie
>
> Currently, some scaladocs for Scala API (Gelly Scala API, FlinkML, Streaming 
> Scala API) are not in scaladoc but in javadoc.





[jira] [Commented] (FLINK-4223) Rearrange scaladoc and javadoc for Scala API

2016-07-19 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383846#comment-15383846
 ] 

Maximilian Michels commented on FLINK-4223:
---

Seems like a duplicate of FLINK-3710.

> Rearrange scaladoc and javadoc for Scala API
> 
>
> Key: FLINK-4223
> URL: https://issues.apache.org/jira/browse/FLINK-4223
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Chiwan Park
>Priority: Minor
>  Labels: easyfix, newbie
>
> Currently, some scaladocs for Scala API (Gelly Scala API, FlinkML, Streaming 
> Scala API) are not in scaladoc but in javadoc.





[jira] [Commented] (FLINK-4199) Misleading messages by CLI upon job submission

2016-07-18 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382423#comment-15382423
 ] 

Maximilian Michels commented on FLINK-4199:
---


This is how it looks now for successful executions:

{noformat}
Cluster configuration: Standalone cluster with JobManager at 
localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Submitting job with JobID: a56b184080d9755d3f2d40c7d9092b31. Waiting for job 
completion.
07/18/2016 16:42:23 Job execution switched to status RUNNING.
07/18/2016 16:42:23 Source: Collection Source -> Flat Map(1/1) switched to 
SCHEDULED 
07/18/2016 16:42:23 Source: Collection Source -> Flat Map(1/1) switched to 
DEPLOYING 
07/18/2016 16:42:23 Keyed Aggregation -> Sink: Unnamed(1/1) switched to 
SCHEDULED 
07/18/2016 16:42:23 Keyed Aggregation -> Sink: Unnamed(1/1) switched to 
DEPLOYING 
07/18/2016 16:42:23 Keyed Aggregation -> Sink: Unnamed(1/1) switched to 
RUNNING 
07/18/2016 16:42:23 Source: Collection Source -> Flat Map(1/1) switched to 
RUNNING 
07/18/2016 16:42:23 Source: Collection Source -> Flat Map(1/1) switched to 
FINISHED 
07/18/2016 16:42:23 Keyed Aggregation -> Sink: Unnamed(1/1) switched to 
FINISHED 
07/18/2016 16:42:23 Job execution switched to status FINISHED.
Program execution finished
Job with JobID a56b184080d9755d3f2d40c7d9092b31 has finished.
Job Runtime: 25 ms
{noformat}

This is how it looks when the JobManager is not available:

{noformat}
Cluster configuration: Standalone cluster with JobManager at 
localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Submitting job with JobID: 20d070cf6a4289df4e5045c9c52cc47b. Waiting for job 
completion.


 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Communication with JobManager failed: Lost connection to the 
JobManager.
...
{noformat}



> Misleading messages by CLI upon job submission
> --
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Commented] (FLINK-4199) Misleading messages by CLI upon job submission

2016-07-18 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382416#comment-15382416
 ] 

Maximilian Michels commented on FLINK-4199:
---

Thanks for reporting!
 
{quote}
1) Cluster retrieved: Standalone cluster with JobManager at 
localhost/127.0.0.1:6123
{quote}

I see how this is misleading. We don't do any communication with the cluster 
until we submit the job. So this should be more like "Cluster configuration 
used: ...". We don't want any communication with the cluster before the user 
jar has been executed interactively, because the jar may not even execute 
successfully.


{quote}
org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Communication with JobManager failed: Lost connection to the 
JobManager.
Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
{quote}

{quote}
07/12/2016 16:18:58 Job execution switched to status FINISHED.
Job has been submitted with JobID 477429b836247909dd428a6cba5b923b
{quote}

These are both artifacts of Flink not returning proper results for interactive 
programs. I'm changing that.


> Misleading messages by CLI upon job submission
> --
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Updated] (FLINK-4199) Misleading messages by CLI upon job submission

2016-07-18 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4199:
--
Summary: Misleading messages by CLI upon job submission  (was: Wrong client 
behavior when submitting job to non-existing cluster)

> Misleading messages by CLI upon job submission
> --
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Updated] (FLINK-4199) Wrong client behavior when submitting job to non-existing cluster

2016-07-18 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4199:
--
Fix Version/s: 1.1.0

> Wrong client behavior when submitting job to non-existing cluster
> -
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Updated] (FLINK-4199) Wrong client behavior when submitting job to non-existing cluster

2016-07-18 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4199:
--
Affects Version/s: 1.1.0

> Wrong client behavior when submitting job to non-existing cluster
> -
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Assigned] (FLINK-4199) Wrong client behavior when submitting job to non-existing cluster

2016-07-18 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-4199:
-

Assignee: Maximilian Michels

> Wrong client behavior when submitting job to non-existing cluster
> -
>
> Key: FLINK-4199
> URL: https://issues.apache.org/jira/browse/FLINK-4199
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Reporter: Kostas Kloudas
>Assignee: Maximilian Michels
>
> Trying to submit a job jar from the client to a non-existing cluster gives 
> the following messages. In particular the first and last lines: "Cluster 
> retrieved: Standalone cluster with JobManager at localhost/127.0.0.1:6123" 
> and "Job has been submitted with" are totally misleading.
> {code}
> Cluster retrieved: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Submitting job with JobID: 9c7120e5cc55b2a9157a7e2bc5a12c9d. Waiting for job 
> completion.
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Communication with JobManager failed: Lost connection to 
> the JobManager.
> Job has been submitted with JobID 9c7120e5cc55b2a9157a7e2bc5a12c9d
> {code}





[jira] [Commented] (FLINK-4166) Generate automatic different namespaces in Zookeeper for Flink applications

2016-07-18 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382162#comment-15382162
 ] 

Maximilian Michels commented on FLINK-4166:
---

[~srichter] Are you planning to work on a fix?

> Generate automatic different namespaces in Zookeeper for Flink applications
> ---
>
> Key: FLINK-4166
> URL: https://issues.apache.org/jira/browse/FLINK-4166
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>
> We should automatically generate different namespaces per Flink application 
> in Zookeeper to avoid interference between different applications that refer 
> to the same Zookeeper entries.





[jira] [Resolved] (FLINK-4144) Yarn properties file: replace hostname/port with Yarn application id

2016-07-01 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-4144.
---
Resolution: Fixed

Resolved with 7ab6837fde3adb588273ef6bb8f4f7a215fe9c03

> Yarn properties file: replace hostname/port with Yarn application id
> 
>
> Key: FLINK-4144
> URL: https://issues.apache.org/jira/browse/FLINK-4144
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.3
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> We should use the application id instead of the host/port. The hostname and 
> port of the JobManager can change (HA). Also, it is not unique depending on 
> the network configuration.





[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior

2016-07-01 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359399#comment-15359399
 ] 

Maximilian Michels commented on FLINK-3675:
---

Additional fix with 16cdb61225d78c822566e33013162fa3e40fa279

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Resolved] (FLINK-4141) TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn

2016-07-01 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-4141.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved with f722b73772eb66cdb79a288300e38ff7026c7e1f

> TaskManager failures not always recover when killed during an 
> ApplicationMaster failure in HA mode on Yarn
> --
>
> Key: FLINK-4141
> URL: https://issues.apache.org/jira/browse/FLINK-4141
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> High availability on Yarn often fails to recover in the following test 
> scenario:
> 1. Kill application master process.
> 2. Then, while application master is recovering, randomly kill several task 
> managers (with some delay).
> After the application master has recovered, not all of the killed task 
> managers are brought back, and no further attempts are made to restart them.





[jira] [Created] (FLINK-4144) Yarn properties file: replace hostname/port with Yarn application id

2016-07-01 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4144:
-

 Summary: Yarn properties file: replace hostname/port with Yarn 
application id
 Key: FLINK-4144
 URL: https://issues.apache.org/jira/browse/FLINK-4144
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 1.0.3
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 1.1.0


We should use the application id instead of the host/port. The hostname and 
port of the JobManager can change (HA). Also, it is not unique depending on the 
network configuration.
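
Roughly, the Yarn properties file would change like this (key names 
illustrative, not necessarily the final ones):

{noformat}
# before: JobManager host/port, which may change under HA
jobManager=host1.example.com:6123

# after: the stable Yarn application id
applicationID=application_1467370000000_0001
{noformat}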





[jira] [Assigned] (FLINK-4141) TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn

2016-07-01 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-4141:
-

Assignee: Maximilian Michels

> TaskManager failures not always recover when killed during an 
> ApplicationMaster failure in HA mode on Yarn
> --
>
> Key: FLINK-4141
> URL: https://issues.apache.org/jira/browse/FLINK-4141
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>Assignee: Maximilian Michels
>
> High availability on Yarn often fails to recover in the following test 
> scenario:
> 1. Kill application master process.
> 2. Then, while application master is recovering, randomly kill several task 
> managers (with some delay).
> After the application master has recovered, not all of the killed task 
> managers are brought back, and no further attempts are made to restart them.





[jira] [Resolved] (FLINK-3675) YARN ship folder inconsistent behavior

2016-07-01 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3675.
---
Resolution: Fixed

Fixed via 0483ba583c7790d13b8035c2916318a2b58c67d6

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Commented] (FLINK-3667) Generalize client<->cluster communication

2016-07-01 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358960#comment-15358960
 ] 

Maximilian Michels commented on FLINK-3667:
---

Fixed via f9b52a3114a2114e6846091acf3abb294a49615b 

Additional fixes:
3b593632dd162d951281fab8a8ed8c6bc2b07b39 
a3aea27983d23d48bbad92c400d4cd42f36fabd3
cfd48a6f510c937080df0918fcb05aa410885c29
8d589623d2c2d039b014bc8783bef25351ec36ce

> Generalize client<->cluster communication
> -
>
> Key: FLINK-3667
> URL: https://issues.apache.org/jira/browse/FLINK-3667
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Here are some notes I took when inspecting the client<->cluster classes with 
> regard to future integration of other resource management frameworks in 
> addition to Yarn (e.g. Mesos).
> {noformat}
> 1 Cluster Client Abstraction
> 
> 1.1 Status Quo
> ──
> 1.1.1 FlinkYarnClient
> ╌
>   • Holds the cluster configuration (Flink-specific and Yarn-specific)
>   • Contains the deploy() method to deploy the cluster
>   • Creates the Hadoop Yarn client
>   • Receives the initial job manager address
>   • Bootstraps the FlinkYarnCluster
> 1.1.2 FlinkYarnCluster
> ╌╌
>   • Wrapper around the Hadoop Yarn client
>   • Queries cluster for status updates
>   • Life time methods to start and shutdown the cluster
>   • Flink specific features like shutdown after job completion
> 1.1.3 ApplicationClient
> ╌╌╌
>   • Acts as a middle-man for asynchronous cluster communication
>   • Designed to communicate with Yarn, not used in Standalone mode
> 1.1.4 CliFrontend
> ╌
>   • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
>   • Constantly distinguishes between Yarn and Standalone mode
>   • Would be nice to have a general abstraction in place
> 1.1.5 Client
> 
>   • Job submission and Job related actions, agnostic of resource framework
> 1.2 Proposal
> 
> 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
> ╌
>   • Extensible cluster-agnostic config
>   • May be extended by specific cluster, e.g. YarnClusterConfig
> 1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
> ╌
>   • Deals with cluster (RM) specific communication
>   • Exposes framework agnostic information
>   • YarnClusterClient, MesosClusterClient, StandaloneClusterClient
> 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
> ╌
>   • Basic interface to communicate with a running cluster
>   • Receives the ClusterClient for cluster-specific communication
>   • Should not have to care about the specific implementations of the
> client
> 1.2.4 ApplicationClient
> ╌╌╌
>   • Can be changed to work cluster-agnostic (first steps already in
> FLINK-3543)
> 1.2.5 CliFrontend
> ╌
>   • CliFrontend does never have to differentiate between different
> cluster types after it has determined which cluster class to load.
>   • Base class handles framework agnostic command line arguments
>   • Pluggables for Yarn, Mesos handle specific commands
> {noformat}
> I would like to create/refactor the affected classes to set us up for a more 
> flexible client side resource management abstraction.
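
To make the proposal concrete, the refactored classes could look roughly like 
this (names taken from the notes above; the signatures are invented for 
illustration only):

{code}
// Illustrative sketch only -- names from the proposal, signatures invented.
public abstract class ClusterConfig { /* cluster-agnostic configuration */ }

public abstract class ClusterClient {
    // hides the RM-specific deployment and communication
    public abstract FlinkCluster deploy(ClusterConfig config) throws Exception;
}

public interface FlinkCluster {
    JobSubmissionResult submitJob(JobGraph jobGraph) throws Exception;
    void shutdown();
}
{code}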





[jira] [Closed] (FLINK-4139) Yarn: Adjust parallelism and task slots correctly

2016-07-01 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4139.
-
Resolution: Fixed

Fixed via 44b3bc45b382c1f2783e9c17dd76ea2e9bbb40ec

> Yarn: Adjust parallelism and task slots correctly
> -
>
> Key: FLINK-4139
> URL: https://issues.apache.org/jira/browse/FLINK-4139
> Project: Flink
>  Issue Type: Bug
>  Components: Client, YARN Client
>Affects Versions: 1.1.0, 1.0.3
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> The Yarn CLI should handle the following situations correctly:
> - The user specifies no parallelism -> parallelism is adjusted to #taskSlots 
> * #nodes.
> - The user specifies parallelism but no #taskSlots or too few slots -> 
> #taskSlots are set such that they meet the parallelism
> This functionality was present in Flink 1.0.x, but there were some glitches 
> in the implementation.





[jira] [Created] (FLINK-4139) Yarn: Adjust parallelism and task slots correctly

2016-07-01 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4139:
-

 Summary: Yarn: Adjust parallelism and task slots correctly
 Key: FLINK-4139
 URL: https://issues.apache.org/jira/browse/FLINK-4139
 Project: Flink
  Issue Type: Bug
  Components: Client, YARN Client
Affects Versions: 1.0.3, 1.1.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor
 Fix For: 1.1.0


The Yarn CLI should handle the following situations correctly:

- The user specifies no parallelism -> parallelism is adjusted to #taskSlots * 
#nodes.

- The user specifies parallelism but no #taskSlots or too few slots -> 
#taskSlots are set such that they meet the parallelism

This functionality was present in Flink 1.0.x, but there were some glitches in 
the implementation.
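
A sketch of the intended adjustment (illustrative pseudo-logic only, not the 
actual CLI code; all variables are assumed):

{code}
// Illustrative only: how the two cases above could be handled.
if (parallelism <= 0) {
    // No parallelism specified: use all available slots.
    parallelism = slotsPerTaskManager * numTaskManagers;
} else if (slotsPerTaskManager * numTaskManagers < parallelism) {
    // Too few slots for the requested parallelism: raise the slot count.
    slotsPerTaskManager = (parallelism + numTaskManagers - 1) / numTaskManagers;
}
{code}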





[jira] [Comment Edited] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350953#comment-15350953
 ] 

Maximilian Michels edited comment on FLINK-3675 at 6/27/16 1:41 PM:


Option 2 is easiest but might be unexpected to the user. I'd rather choose 
option 1.


was (Author: mxm):
Option 1 is easiest but might be unexpected to the user. I'd rather choose 
option 2.

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351013#comment-15351013
 ] 

Maximilian Michels commented on FLINK-3675:
---

Woops. Not enough coffee today ;) Yes, I meant option 1!

The Flink assembly has to be in the /lib folder (or specified via {{-yj}}) but 
it is still shipped even if the lib folder is not shipped. Otherwise, we 
wouldn't be able to run any Flink clusters using {{./flink -m yarn-cluster}} 
because the /lib folder is currently not shipped there.
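
Usage sketch, assuming the current CLI options ({{-yj}} to point at the Flink 
jar, {{-yt}} to ship an extra folder; paths are illustrative):

{noformat}
# Sketch: run a single job on YARN, shipping an additional folder.
./bin/flink run -m yarn-cluster -yn 2 -yt extraLibs/ myJob.jar
{noformat}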

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350974#comment-15350974
 ] 

Maximilian Michels commented on FLINK-3675:
---

So that would be option 2 :) Actually, the Flink assembly (fat jar) is always 
shipped, regardless of the ship files specified.

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350953#comment-15350953
 ] 

Maximilian Michels commented on FLINK-3675:
---

Option 1 is easiest but might be unexpected to the user. I'd rather choose 
option 2.

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Comment Edited] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350951#comment-15350951
 ] 

Maximilian Michels edited comment on FLINK-3675 at 6/27/16 1:04 PM:


Do we want 1) to ship the /lib folder even if the user specifies a ship folder 
manually (via `-yt`)? Or do we want 2) to overwrite the default lib folder with 
the user location (as it is now for {{yarn-session.sh}})?


was (Author: mxm):
Do we want to ship the /lib folder even if the user specifies a ship folder 
manually (via `-yt`)? Or do we want to overwrite the default lib folder with 
the user location (as it is now for {{yarn-session.sh}})?

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Commented] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350951#comment-15350951
 ] 

Maximilian Michels commented on FLINK-3675:
---

Do we want to ship the /lib folder even if the user specifies a ship folder 
manually (via `-yt`)? Or do we want to overwrite the default lib folder with 
the user location (as it is now for {{yarn-session.sh}})?

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently, however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Assigned] (FLINK-3675) YARN ship folder inconsistent behavior

2016-06-27 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-3675:
-

Assignee: Maximilian Michels

> YARN ship folder inconsistent behavior
> -
>
> Key: FLINK-3675
> URL: https://issues.apache.org/jira/browse/FLINK-3675
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 1.1.0
>
>
> After [some discussion on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
>  it came up that the {{flink/lib}} folder is always supposed to be shipped to 
> the YARN cluster so that all the nodes have access to its contents.
> Currently however, the Flink long-running YARN session actually ships the 
> folder because it's explicitly specified in the {{yarn-session.sh}} script, 
> while running a single job on YARN does not automatically ship it.





[jira] [Commented] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350783#comment-15350783
 ] 

Maximilian Michels commented on FLINK-3710:
---

Thanks for reporting! We need to investigate why that is actually the case.

I think [~aljoscha] had been working on this some time ago. Do you know how we 
could fix this [~aljoscha]?

> ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
> 
>
> Key: FLINK-3710
> URL: https://issues.apache.org/jira/browse/FLINK-3710
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.1
>Reporter: Elias Levy
> Fix For: 1.1.0, 1.0.4
>
>
> The ScalaDocs only include docs for org.apache.flink.scala and sub-packages.





[jira] [Updated] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site

2016-06-27 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3710:
--
Fix Version/s: 1.1.0

> ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
> 
>
> Key: FLINK-3710
> URL: https://issues.apache.org/jira/browse/FLINK-3710
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.1
>Reporter: Elias Levy
> Fix For: 1.1.0, 1.0.4
>
>
> The ScalaDocs only include docs for org.apache.flink.scala and sub-packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3710) ScalaDocs for org.apache.flink.streaming.scala are missing from the web site

2016-06-27 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3710:
--
Fix Version/s: 1.0.4

> ScalaDocs for org.apache.flink.streaming.scala are missing from the web site
> 
>
> Key: FLINK-3710
> URL: https://issues.apache.org/jira/browse/FLINK-3710
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.1
>Reporter: Elias Levy
> Fix For: 1.1.0, 1.0.4
>
>
> The ScalaDocs only include docs for org.apache.flink.scala and sub-packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3964) Job submission times out with recursive.file.enumeration

2016-06-27 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350759#comment-15350759
 ] 

Maximilian Michels commented on FLINK-3964:
---

I see where you're going. That's currently not possible and wouldn't help in 
your case. First, the client would still have to wait until all vertices have 
been configured. Second, it is actually a single vertex (we're at the 
{{JobGraph}} level at submission time). Your Hadoop input format corresponds to 
one JobGraph vertex. Only later, when the {{ExecutionGraph}} is created, is 
that single vertex split into as many parallel subtasks as the configured 
parallelism.

+1 to adding a notice to the timeout exception that suggests increasing 
{{akka.client.timeout}}.

> Job submission times out with recursive.file.enumeration
> 
>
> Key: FLINK-3964
> URL: https://issues.apache.org/jira/browse/FLINK-3964
> Project: Flink
>  Issue Type: Bug
>  Components: Batch Connectors and Input/Output Formats, DataSet API
>Affects Versions: 1.0.0
>Reporter: Juho Autio
>
> When using "recursive.file.enumeration" with a big enough folder structure to 
> list, flink batch job fails right at the beginning because of a timeout.
> h2. Problem details
> We get this error: {{Communication with JobManager failed: Job submission to 
> the JobManager timed out}}.
> The code we have is basically this:
> {code}
> val env = ExecutionEnvironment.getExecutionEnvironment
> val parameters = new Configuration
> // set the recursive enumeration parameter
> parameters.setBoolean("recursive.file.enumeration", true)
> val parameter = ParameterTool.fromArgs(args)
> val input_data_path : String = parameter.get("input_data_path", null )
> val data : DataSet[(Text,Text)] = env.readSequenceFile(classOf[Text], 
> classOf[Text], input_data_path)
> .withParameters(parameters)
> data.first(10).print
> {code}
> If we set {{input_data_path}} parameter to {{s3n://bucket/path/date=*/}} it 
> times out. If we use a more restrictive pattern like 
> {{s3n://bucket/path/date=20160523/}}, it doesn't time out.
> To me it seems that time taken to list files shouldn't cause any timeouts on 
> job submission level.
> For us this was "fixed" by adding {{akka.client.timeout: 600 s}} in 
> {{flink-conf.yaml}}, but I wonder if the timeout would still occur if we have 
> even more files to list?
> 
> P.S. Is there any way to set {{akka.client.timeout}} when calling {{bin/flink 
> run}} instead of editing {{flink-conf.yaml}}. I tried to add it as a {{-yD}} 
> flag but couldn't get it working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-4041) Failure while asking ResourceManager for RegisterResource

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-4041.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved with a5aa4e115814539518fcaa88cbbca47da9d7ede5

> Failure while asking ResourceManager for RegisterResource
> -
>
> Key: FLINK-4041
> URL: https://issues.apache.org/jira/browse/FLINK-4041
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> In this build 
> (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I 
> got the following YARN Test failure:
> {code}
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down 
> remote daemon.
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon 
> shut down; proceeding with flushing remote transports.
> 2016-06-09 10:21:42,355 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut 
> down.
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager
>   - Failure while asking ResourceManager for RegisterResource
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/$c#1255104255]] after [1 ms]
>   at 
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>   at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>   at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280)
>   at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2016-06-09 10:21:42,376 INFO  org.apache.flink.yarn.YarnJobManager
>   - Shutdown compl

[jira] [Commented] (FLINK-1946) Make yarn tests logging less verbose

2016-06-25 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15349695#comment-15349695
 ] 

Maximilian Michels commented on FLINK-1946:
---

Fixed with 5b2ad7f03ef67ce529551e7b464d7db94e2a1d90. Also, the log level for 
Hadoop classes had already been adjusted in an earlier change. I think we want 
to keep the log level at INFO for debugging purposes.

Anything else we can do?
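
If more is needed, the Hadoop noise in test runs could also be lowered via the 
test logging config. A sketch, assuming a log4j 1.x style 
{{log4j-test.properties}} in the test resources (file name and exact loggers 
are illustrative):

{code:none}
log4j.rootLogger=INFO, console
# keep Flink at INFO for debugging, but quiet down Hadoop/YARN internals
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.apache.hadoop.yarn=WARN
{code}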

> Make yarn tests logging less verbose
> 
>
> Key: FLINK-1946
> URL: https://issues.apache.org/jira/browse/FLINK-1946
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Till Rohrmann
>Priority: Minor
>
> Currently, the yarn tests log on the INFO level making the test outputs 
> confusing. Furthermore some status messages are written to stdout. I think 
> these messages are not necessary to be shown to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3838) CLI parameter parser is munging application params

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3838.
---
Resolution: Fixed

master: dfecca7794f56eb2a4438e3bdca6878379929cfa
release-1.0: f5a40855fbd90a375cb920358baaf176876a42f0

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtasks') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3757.
---
   Resolution: Fixed
 Assignee: Maximilian Michels
Fix Version/s: 1.1.0

Fixed via 9c15406400af16861be8cf08e60d94074addbba0

> addAccumulator does not throw Exception on duplicate accumulator name
> -
>
> Key: FLINK-3757
> URL: https://issues.apache.org/jira/browse/FLINK-3757
> Project: Flink
>  Issue Type: Bug
>Reporter: Konstantin Knauf
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> Works if tasks are chained, but in many situations it does not; see
> https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112
> Is this an undocumented feature or a valid bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3864) Yarn tests don't check for prohibited strings in log output

2016-06-25 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3864.
-
Resolution: Fixed

Fixed with d92aeb7aa63490278d9a9e36aae595e6cbe39f32

> Yarn tests don't check for prohibited strings in log output
> ---
>
> Key: FLINK-3864
> URL: https://issues.apache.org/jira/browse/FLINK-3864
> Project: Flink
>  Issue Type: Bug
>  Components: Tests, YARN Client
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> {{YarnTestBase.runWithArgs(...)}} provides a parameter for strings which must 
> not appear in the log output. {{perJobYarnCluster}} and 
> {{perJobYarnClusterWithParallelism}} have "System.out)" prepended to the 
> prohibited strings; probably an artifact of older test code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-22 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344402#comment-15344402
 ] 

Maximilian Michels commented on FLINK-4084:
---

Not sure if I understand correctly. I think the command-line parameter should 
have precedence over the environment variable. So, if {{ export 
FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, 
then mydir2 should be used as config directory.
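
In code, the intended precedence could look like this minimal sketch 
(hypothetical variable names, not the actual {{CliFrontend}} implementation):

{code:java}
// the CLI flag wins; the environment variable is only the fallback
String configDir = commandLine.getOptionValue("configDir"); // e.g. mydir2
if (configDir == null) {
    configDir = System.getenv("FLINK_CONF_DIR"); // e.g. mydir1
}
{code}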

> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-22 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344402#comment-15344402
 ] 

Maximilian Michels edited comment on FLINK-4084 at 6/22/16 2:40 PM:


Not sure if I understand correctly. The command-line parameter should have 
precedence over the environment variable. So, if {{export 
FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, 
then mydir2 should be used as config directory.


was (Author: mxm):
Not sure if I understand correctly. The command-line parameter should have 
precedence over the environment variable. So, if {{ export 
FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, 
then mydir2 should be used as config directory.

> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-22 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344402#comment-15344402
 ] 

Maximilian Michels edited comment on FLINK-4084 at 6/22/16 2:39 PM:


Not sure if I understand correctly. The command-line parameter should have 
precedence over the environment variable. So, if {{ export 
FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, 
then mydir2 should be used as config directory.


was (Author: mxm):
Not sure if I understand correctly. I think the command-line parameter should 
have precedence over the environment variable. So, if {{ export 
FLINK_CONF_DIR=mydir1}} is set and you also specify {{--configDir=mydir2}}, 
then mydir2 should be used as config directory.

> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-21 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341731#comment-15341731
 ] 

Maximilian Michels commented on FLINK-4084:
---

Actually, it is a bit more than that. The idea is to add a {{configDir}} 
parameter to {{CliFrontend}} that users can discover when they examine the help 
output. Have a look at {{CliFrontendParser}} for existing parameters.

The shell scripts also need to know the config directory location to load some 
configuration (e.g. the slaves file for starting a cluster). The change you 
posted would only cover that shell-script part.
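
For illustration, the parser-side addition might look roughly like this, using 
commons-cli as {{CliFrontendParser}} does (option wiring and description text 
are assumptions, not the final change):

{code:java}
import org.apache.commons.cli.Option;

// a long-only --configDir option that takes one argument,
// discoverable via the help output
static final Option CONFIG_DIR_OPTION = new Option(null, "configDir", true,
        "Directory containing the Flink configuration (overrides FLINK_CONF_DIR)");
{code}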

> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> exprience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4079) YARN properties file used for per-job cluster

2016-06-21 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4079.
-
Resolution: Fixed

master: ec6d97528e8b21f191b7922e4810fd60804c8365
release-1.0: 3163638a61f49dd2df55467615096918bec3a890

Fix will be included in the 1.0.4 release.

> YARN properties file used for per-job cluster
> -
>
> Key: FLINK-4079
> URL: https://issues.apache.org/jira/browse/FLINK-4079
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.3
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.4
>
>
> YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN 
> properties file, which defines the container configuration. This can lead to 
> unexpected behaviour, because the per-job-cluster configuration is merged 
> with the YARN properties file (or used as the only configuration source).
> A user ran into this as follows:
> - Create a long-lived YARN session with HA (creates a hidden YARN properties 
> file)
> - Submits standalone batch jobs with a per job cluster (flink run -m 
> yarn-cluster). The batch jobs get submitted to the long lived HA cluster, 
> because of the properties file.
> [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch 
> and will be fixed with the client refactoring you are working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3964) Job submission times out with recursive.file.enumeration

2016-06-21 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341654#comment-15341654
 ] 

Maximilian Michels commented on FLINK-3964:
---

You're right, Flink uses {{akka.client.timeout}} as the timeout for job 
submissions. The {{-yD}} Yarn properties can only be used in Yarn mode. Just 
checked that {{bin/flink run -m yarn-cluster -yn 1 -yDakka.ask.timeout=1ms}} 
does set the client timeout to 1 millisecond (it times out).

The actual cause of the issue is that the JobManager calls {{configure()}} on 
all Input/Output vertices of the job. That is where the files of the input 
format are loaded, causing a delay in the job submission which can lead to a 
timeout. IMHO the only fix is to increase the timeout.

If you want, we can file an issue for changing config parameters via the CLI.
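
Side by side, the two ways to raise the timeout (values are just examples):

{code:none}
# permanently, in flink-conf.yaml:
akka.client.timeout: 600 s

# per submission, Yarn mode only:
bin/flink run -m yarn-cluster -yn 1 -yDakka.ask.timeout=600s ...
{code}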

> Job submission times out with recursive.file.enumeration
> 
>
> Key: FLINK-3964
> URL: https://issues.apache.org/jira/browse/FLINK-3964
> Project: Flink
>  Issue Type: Bug
>  Components: Batch Connectors and Input/Output Formats, DataSet API
>Affects Versions: 1.0.0
>Reporter: Juho Autio
>
> When using "recursive.file.enumeration" with a big enough folder structure to 
> list, flink batch job fails right at the beginning because of a timeout.
> h2. Problem details
> We get this error: {{Communication with JobManager failed: Job submission to 
> the JobManager timed out}}.
> The code we have is basically this:
> {code}
> val env = ExecutionEnvironment.getExecutionEnvironment
> val parameters = new Configuration
> // set the recursive enumeration parameter
> parameters.setBoolean("recursive.file.enumeration", true)
> val parameter = ParameterTool.fromArgs(args)
> val input_data_path : String = parameter.get("input_data_path", null )
> val data : DataSet[(Text,Text)] = env.readSequenceFile(classOf[Text], 
> classOf[Text], input_data_path)
> .withParameters(parameters)
> data.first(10).print
> {code}
> If we set {{input_data_path}} parameter to {{s3n://bucket/path/date=*/}} it 
> times out. If we use a more restrictive pattern like 
> {{s3n://bucket/path/date=20160523/}}, it doesn't time out.
> To me it seems that time taken to list files shouldn't cause any timeouts on 
> job submission level.
> For us this was "fixed" by adding {{akka.client.timeout: 600 s}} in 
> {{flink-conf.yaml}}, but I wonder if the timeout would still occur if we have 
> even more files to list?
> 
> P.S. Is there any way to set {{akka.client.timeout}} when calling {{bin/flink 
> run}} instead of editing {{flink-conf.yaml}}. I tried to add it as a {{-yD}} 
> flag but couldn't get it working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4051) RabbitMQ Source might not react to cancel signal

2016-06-21 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341536#comment-15341536
 ] 

Maximilian Michels commented on FLINK-4051:
---

It looks to me like the {{nextDelivery()}} method is interruptible: it throws 
an {{InterruptedException}} in case it is interrupted. As [~rmetzger] pointed 
out, Flink sets the interrupted flag on the source thread during cancellation.

I've just tested this behavior using this test class:

{code:java}
// imports added for completeness (package locations as of Flink 1.0.x)
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.rabbitmq.RMQSource;
import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class RMQSourceTest {
    public static void main(String[] args) throws Exception {

        // connection settings for a local RabbitMQ broker
        final RMQConnectionConfig config = new RMQConnectionConfig.Builder()
                .setHost("localhost")
                .setPort(5672)
                .setVirtualHost("/")
                .setUserName("guest")
                .setPassword("guest")
                .build();

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // consume from the "test" queue and print every message
        env.addSource(new RMQSource<>(config, "test", new SimpleStringSchema()))
                .print();

        env.execute();
    }
}
{code}

Then I deployed a Jar on a Flink cluster with RabbitMQ running on localhost on 
port 5672:

{code:none}
flink run -c org.myorg.quickstart.RMQSourceTest quickstart-0.1.jar
{code}

Nothing was printed, of course, because the queue was empty :) Then I cancelled:

{code:none}
$ ./flink list
-- Running/Restarting Jobs ---
21.06.2016 12:31:25 : 122456d90688011df6f8fa73de2d036c : Flink Streaming Job 
(RUNNING)
--
No scheduled jobs.
$ flink cancel 122456d90688011df6f8fa73de2d036c
{code}

Here's the output of the program:

{noformat}
06/21/2016 12:31:25 Job execution switched to status RUNNING.
06/21/2016 12:31:25 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
SCHEDULED
06/21/2016 12:31:25 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
DEPLOYING
06/21/2016 12:31:25 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
RUNNING
06/21/2016 12:31:50 Job execution switched to status CANCELLING.
06/21/2016 12:31:50 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
CANCELING
06/21/2016 12:31:50 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
CANCELED
06/21/2016 12:31:50 Job execution switched to status CANCELED.


 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Job was cancelled.
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:378)
at 
org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:91)
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:355)
at 
org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:68)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1381)
at org.myorg.quickstart.RMQSourceTest.main(RMQSourceTest.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:509)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
at 
org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:297)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:740)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:253)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:965)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1008)
Caused by: org.apache.flink.runtime.client.JobCancellationException: Job was 
cancelled.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:798)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:752)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:752)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Fut

[jira] [Commented] (FLINK-4079) YARN properties file used for per-job cluster

2016-06-21 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341458#comment-15341458
 ] 

Maximilian Michels commented on FLINK-4079:
---

[~ArnaudL] I've prepared a fix for the next release.

> YARN properties file used for per-job cluster
> -
>
> Key: FLINK-4079
> URL: https://issues.apache.org/jira/browse/FLINK-4079
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.3
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.4
>
>
> YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN 
> properties file, which defines the container configuration. This can lead to 
> unexpected behaviour, because the per-job-cluster configuration is merged 
> with the YARN properties file (or used as the only configuration source).
> A user ran into this as follows:
> - Create a long-lived YARN session with HA (creates a hidden YARN properties 
> file)
> - Submits standalone batch jobs with a per job cluster (flink run -m 
> yarn-cluster). The batch jobs get submitted to the long lived HA cluster, 
> because of the properties file.
> [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch 
> and will be fixed with the client refactoring you are working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4079) YARN properties file used for per-job cluster

2016-06-21 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341397#comment-15341397
 ] 

Maximilian Michels commented on FLINK-4079:
---

Makes sense to fix it then. Let me see.

> YARN properties file used for per-job cluster
> -
>
> Key: FLINK-4079
> URL: https://issues.apache.org/jira/browse/FLINK-4079
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.3
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.4
>
>
> YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN 
> properties file, which defines the container configuration. This can lead to 
> unexpected behaviour, because the per-job-cluster configuration is merged 
> with the YARN properties file (or used as the only configuration source).
> A user ran into this as follows:
> - Create a long-lived YARN session with HA (creates a hidden YARN properties 
> file)
> - Submits standalone batch jobs with a per job cluster (flink run -m 
> yarn-cluster). The batch jobs get submitted to the long lived HA cluster, 
> because of the properties file.
> [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch 
> and will be fixed with the client refactoring you are working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3105) Submission in per job YARN cluster mode reuses properties file of long-lived session

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3105.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

> Submission in per job YARN cluster mode reuses properties file of long-lived 
> session
> 
>
> Key: FLINK-3105
> URL: https://issues.apache.org/jira/browse/FLINK-3105
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 0.10.1
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Starting a YARN session with `bin/yarn-session.sh` creates a properties file, 
> which is used to parse job manager information when submitting jobs.
> This properties file is also used when submitting a job with {{bin/flink run 
> -m yarn-cluster}}. The {{yarn-cluster}} mode should actually start a new YARN 
> session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (FLINK-3105) Submission in per job YARN cluster mode reuses properties file of long-lived session

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened FLINK-3105:
---

> Submission in per job YARN cluster mode reuses properties file of long-lived 
> session
> 
>
> Key: FLINK-3105
> URL: https://issues.apache.org/jira/browse/FLINK-3105
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 0.10.1
>Reporter: Ufuk Celebi
>
> Starting a YARN session with `bin/yarn-session.sh` creates a properties file, 
> which is used to parse job manager information when submitting jobs.
> This properties file is also used when submitting a job with {{bin/flink run 
> -m yarn-cluster}}. The {{yarn-cluster}} mode should actually start a new YARN 
> session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339749#comment-15339749
 ] 

Maximilian Michels commented on FLINK-3757:
---

Does the pull request improve the documentation?

> addAccumulator does not throw Exception on duplicate accumulator name
> -
>
> Key: FLINK-3757
> URL: https://issues.apache.org/jira/browse/FLINK-3757
> Project: Flink
>  Issue Type: Bug
>Reporter: Konstantin Knauf
>Priority: Minor
>
> Works if tasks are chained, but in many situations it does not; see
> https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112
> Is this an undocumented feature or a valid bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339744#comment-15339744
 ] 

Maximilian Michels commented on FLINK-3757:
---

It does throw an exception if you previously added it in the same Task. I 
agree, we should clarify that.
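
A minimal sketch of the behavior (hypothetical job code, using Flink's 
{{IntCounter}}; the duplicate registration from a second, unchained task is 
the case that currently slips through):

{code:java}
import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class DupAccumulatorMap extends RichMapFunction<String, String> {
    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("counter", new IntCounter());
        // registering "counter" again within the same Task throws;
        // a second, unchained operator doing the same is not detected
    }

    @Override
    public String map(String value) {
        return value;
    }
}
{code}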

> addAccumulator does not throw Exception on duplicate accumulator name
> -
>
> Key: FLINK-3757
> URL: https://issues.apache.org/jira/browse/FLINK-3757
> Project: Flink
>  Issue Type: Bug
>Reporter: Konstantin Knauf
>Priority: Minor
>
> Works if tasks are chained, but in many situations it does not; see
> https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112
> Is this an undocumented feature or a valid bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3380) Unstable Test: JobSubmissionFailsITCase

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3380.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolving for now.

> Unstable Test: JobSubmissionFailsITCase
> ---
>
> Key: FLINK-3380
> URL: https://issues.apache.org/jira/browse/FLINK-3380
> Project: Flink
>  Issue Type: Bug
>Reporter: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/108034635/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4095) Add configDir argument to shell scripts

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4095.
-
Resolution: Fixed

Seems like custom config dirs are in high demand now :) This duplicates 
FLINK-4084.

> Add configDir argument to shell scripts
> ---
>
> Key: FLINK-4095
> URL: https://issues.apache.org/jira/browse/FLINK-4095
> Project: Flink
>  Issue Type: Improvement
>  Components: Startup Shell Scripts
>Reporter: Ufuk Celebi
> Fix For: 1.1.0
>
>
> Currently it's very hard to execute the shell scripts with different 
> configurations, because we always use the config file in the {{conf}} 
> directory.
> Let's add a CLI argument to allow specifying a different configuration 
> directory (or file) when running the shell scripts like 
> {code}
> bin/flink run --configDir
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4095) Add configDir argument to shell scripts

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4095.
-
Resolution: Duplicate

> Add configDir argument to shell scripts
> ---
>
> Key: FLINK-4095
> URL: https://issues.apache.org/jira/browse/FLINK-4095
> Project: Flink
>  Issue Type: Improvement
>  Components: Startup Shell Scripts
>Reporter: Ufuk Celebi
> Fix For: 1.1.0
>
>
> Currently it's very hard to execute the shell scripts with different 
> configurations, because we always use the config file in the {{conf}} 
> directory.
> Let's add a CLI argument to allow specifying a different configuration 
> directory (or file) when running the shell scripts like 
> {code}
> bin/flink run --configDir
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (FLINK-4095) Add configDir argument to shell scripts

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened FLINK-4095:
---

> Add configDir argument to shell scripts
> ---
>
> Key: FLINK-4095
> URL: https://issues.apache.org/jira/browse/FLINK-4095
> Project: Flink
>  Issue Type: Improvement
>  Components: Startup Shell Scripts
>Reporter: Ufuk Celebi
> Fix For: 1.1.0
>
>
> Currently it's very hard to execute the shell scripts with different 
> configurations, because we always use the config file in the {{conf}} 
> directory.
> Let's add a CLI argument to allow specifying a different configuration 
> directory (or file) when running the shell scripts like 
> {code}
> bin/flink run --configDir
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4090) Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4090.
-
Resolution: Fixed

Fixed with f2e9c521aa4e0cbc14b71c561b8c5dd46201a45a
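
The corrected pattern, sketched with try-with-resources (illustrative; not 
necessarily the literal shape of the commit):

{code:java}
// the stream is closed even if properties.store(...) throws
try (OutputStream out = new FileOutputStream(propertiesFile)) {
    properties.store(out, "Generated YARN properties file");
} catch (IOException e) {
    // error handling as before
}
{code}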

> Close of OutputStream should be in finally clause in 
> FlinkYarnSessionCli#writeYarnProperties()
> --
>
> Key: FLINK-4090
> URL: https://issues.apache.org/jira/browse/FLINK-4090
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> {code}
> try {
>   OutputStream out = new FileOutputStream(propertiesFile);
>   properties.store(out, "Generated YARN properties file");
>   out.close();
> } catch (IOException e) {
> {code}
> The close of {{out}} should be in a finally clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-4090) Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-4090:
-

Assignee: Maximilian Michels

> Close of OutputStream should be in finally clause in 
> FlinkYarnSessionCli#writeYarnProperties()
> --
>
> Key: FLINK-4090
> URL: https://issues.apache.org/jira/browse/FLINK-4090
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> {code}
> try {
>   OutputStream out = new FileOutputStream(propertiesFile);
>   properties.store(out, "Generated YARN properties file");
>   out.close();
> } catch (IOException e) {
> {code}
> The close of {{out}} should be in a finally clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4090) Close of OutputStream should be in finally clause in FlinkYarnSessionCli#writeYarnProperties()

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4090:
--
Fix Version/s: 1.1.0

> Close of OutputStream should be in finally clause in 
> FlinkYarnSessionCli#writeYarnProperties()
> --
>
> Key: FLINK-4090
> URL: https://issues.apache.org/jira/browse/FLINK-4090
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Fix For: 1.1.0
>
>
> {code}
> try {
>   OutputStream out = new FileOutputStream(propertiesFile);
>   properties.store(out, "Generated YARN properties file");
>   out.close();
> } catch (IOException e) {
> {code}
> The close of {{out}} should be in a finally clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4089) Ineffective null check in YarnClusterClient#getApplicationStatus()

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339677#comment-15339677
 ] 

Maximilian Michels commented on FLINK-4089:
---

Thank you for reporting!

> Ineffective null check in YarnClusterClient#getApplicationStatus()
> --
>
> Key: FLINK-4089
> URL: https://issues.apache.org/jira/browse/FLINK-4089
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.4
>
>
> Here is related code:
> {code}
> if(pollingRunner == null) {
>   LOG.warn("YarnClusterClient.getApplicationStatus() has been called on 
> an uninitialized cluster." +
>   "The system might be in an erroneous state");
> }
> ApplicationReport lastReport = pollingRunner.getLastReport();
> {code}
> If pollingRunner is null, NPE would result from the getLastReport() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4089) Ineffective null check in YarnClusterClient#getApplicationStatus()

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4089.
-
   Resolution: Fixed
 Assignee: Maximilian Michels
Fix Version/s: 1.0.4

master: e6320a4c3922e934698acfe274546bccf88ad53d
release-1.0: e30bee839aadf9c3ac7d46bb35a87c0f130f4789
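
A sketch of the intended behavior, failing fast instead of logging a warning 
and then dereferencing the null anyway (illustrative, not the literal patch):

{code:java}
if (pollingRunner == null) {
    throw new IllegalStateException(
        "YarnClusterClient.getApplicationStatus() has been called on an uninitialized cluster.");
}
ApplicationReport lastReport = pollingRunner.getLastReport();
{code}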

> Ineffective null check in YarnClusterClient#getApplicationStatus()
> --
>
> Key: FLINK-4089
> URL: https://issues.apache.org/jira/browse/FLINK-4089
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.4
>
>
> Here is related code:
> {code}
> if(pollingRunner == null) {
>   LOG.warn("YarnClusterClient.getApplicationStatus() has been called on 
> an uninitialized cluster." +
>   "The system might be in an erroneous state");
> }
> ApplicationReport lastReport = pollingRunner.getLastReport();
> {code}
> If pollingRunner is null, NPE would result from the getLastReport() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4089) Ineffective null check in YarnClusterClient#getApplicationStatus()

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4089:
--
Fix Version/s: 1.1.0

> Ineffective null check in YarnClusterClient#getApplicationStatus()
> --
>
> Key: FLINK-4089
> URL: https://issues.apache.org/jira/browse/FLINK-4089
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
> Fix For: 1.1.0
>
>
> Here is related code:
> {code}
> if(pollingRunner == null) {
>   LOG.warn("YarnClusterClient.getApplicationStatus() has been called on 
> an uninitialized cluster." +
>   "The system might be in an erroneous state");
> }
> ApplicationReport lastReport = pollingRunner.getLastReport();
> {code}
> If pollingRunner is null, NPE would result from the getLastReport() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339654#comment-15339654
 ] 

Maximilian Michels commented on FLINK-4084:
---

Yes, please go ahead. I'll assign the issue to you.

> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4084:
--
Assignee: Andrea Sella

> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-4041) Failure while asking ResourceManager for RegisterResource

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-4041:
-

Assignee: Maximilian Michels

> Failure while asking ResourceManager for RegisterResource
> -
>
> Key: FLINK-4041
> URL: https://issues.apache.org/jira/browse/FLINK-4041
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Maximilian Michels
>  Labels: test-stability
>
> In this build 
> (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I 
> got the following YARN Test failure:
> {code}
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down 
> remote daemon.
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon 
> shut down; proceeding with flushing remote transports.
> 2016-06-09 10:21:42,355 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut 
> down.
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager
>   - Failure while asking ResourceManager for RegisterResource
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/$c#1255104255]] after [1 ms]
>   at 
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>   at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>   at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280)
>   at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2016-06-09 10:21:42,376 INFO  org.apache.flink.yarn.YarnJobManager
>   - Shutdown completed. Stopping JVM.
> 2016-06-09 10:21:42,377 INFO  
> org.apache.flink.runtime.webmonitor.StackTra

[jira] [Commented] (FLINK-4041) Failure while asking ResourceManager for RegisterResource

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339620#comment-15339620
 ] 

Maximilian Michels commented on FLINK-4041:
---

This happens when a TaskManager registers during shutdown of the JobManager.

> Failure while asking ResourceManager for RegisterResource
> -
>
> Key: FLINK-4041
> URL: https://issues.apache.org/jira/browse/FLINK-4041
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>  Labels: test-stability
>
> In this build 
> (https://s3.amazonaws.com/archive.travis-ci.org/jobs/136372462/log.txt), I 
> got the following YARN Test failure:
> {code}
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down 
> remote daemon.
> 2016-06-09 10:21:42,336 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon 
> shut down; proceeding with flushing remote transports.
> 2016-06-09 10:21:42,355 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut 
> down.
> 2016-06-09 10:21:42,376 ERROR org.apache.flink.yarn.YarnJobManager
>   - Failure while asking ResourceManager for RegisterResource
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/$c#1255104255]] after [1 ms]
>   at 
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>   at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>   at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282)
>   at 
> akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280)
>   at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
>   at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
>   at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15)
>   at 
> akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at 
> akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2016-06-09 10:21:42,376 INFO  org.apache.flink.yarn.YarnJobManager
>   - Shutdown completed. Stopping JVM.
> 2016-06-09 10:21:42,377 

[jira] [Commented] (FLINK-3964) Job submission times out with recursive.file.enumeration

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339541#comment-15339541
 ] 

Maximilian Michels commented on FLINK-3964:
---

Thanks for reporting! Have you tried {{bin/flink run -yDakka.ask.timeout=600s 
...}}?

> Job submission times out with recursive.file.enumeration
> 
>
> Key: FLINK-3964
> URL: https://issues.apache.org/jira/browse/FLINK-3964
> Project: Flink
>  Issue Type: Bug
>Reporter: Juho Autio
>
> When using "recursive.file.enumeration" with a big enough folder structure to 
> list, a Flink batch job fails right at the beginning because of a timeout.
> h2. Problem details
> We get this error: {{Communication with JobManager failed: Job submission to 
> the JobManager timed out}}.
> The code we have is basically this:
> {code}
> import org.apache.flink.api.scala._
> import org.apache.flink.api.java.utils.ParameterTool
> import org.apache.flink.configuration.Configuration
> import org.apache.hadoop.io.Text
>
> val env = ExecutionEnvironment.getExecutionEnvironment
> // set the recursive enumeration parameter
> val parameters = new Configuration
> parameters.setBoolean("recursive.file.enumeration", true)
> val parameter = ParameterTool.fromArgs(args)
> val input_data_path: String = parameter.get("input_data_path", null)
> val data: DataSet[(Text, Text)] =
>   env.readSequenceFile(classOf[Text], classOf[Text], input_data_path)
>     .withParameters(parameters)
> data.first(10).print
> {code}
> If we set {{input_data_path}} parameter to {{s3n://bucket/path/date=*/}} it 
> times out. If we use a more restrictive pattern like 
> {{s3n://bucket/path/date=20160523/}}, it doesn't time out.
> To me it seems that the time taken to list files shouldn't cause any timeouts 
> at the job submission level.
> For us this was "fixed" by adding {{akka.client.timeout: 600 s}} in 
> {{flink-conf.yaml}}, but I wonder if the timeout would still occur if we have 
> even more files to list?
> 
> P.S. Is there any way to set {{akka.client.timeout}} when calling {{bin/flink 
> run}} instead of editing {{flink-conf.yaml}}? I tried to add it as a {{-yD}} 
> flag but couldn't get it working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339413#comment-15339413
 ] 

Maximilian Michels commented on FLINK-3838:
---

The issue is fixed in commons-cli 1.3.1.

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339411#comment-15339411
 ] 

Maximilian Michels commented on FLINK-3838:
---

A workaround is to use {{./flink run ... -- ...}}. The two dashes tell the parsing 
library that no more options follow and that it should take the rest as-is.
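
A sketch of the workaround; the jar path and program arguments are made up, and 
standard commons-cli semantics are assumed (everything after {{--}} is passed 
through unparsed):

{noformat}
# without the separator, "-maxtasks" is munged by the option parser
./flink run -m localhost:6123 /path/to/app.jar -maxtasks 10

# with the separator, everything after "--" reaches the program untouched
./flink run -m localhost:6123 -- /path/to/app.jar -maxtasks 10
{noformat}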

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339405#comment-15339405
 ] 

Maximilian Michels commented on FLINK-3838:
---

Occurs whenever args are present before the jar path and the jar arguments.

Works (Passes "-arg" and "value" as parameters): {{./flink run /path/to/jar 
-arg value}}
Doesn't work (Passes "arg" and "value" as parameters): {{./flink run -m 
localhost:6123 /path/to/jar -arg value}}

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339280#comment-15339280
 ] 

Maximilian Michels commented on FLINK-3838:
---

Seems like this only occurs when running on Yarn, correct?

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-3838:
-

Assignee: Maximilian Michels

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339240#comment-15339240
 ] 

Maximilian Michels commented on FLINK-3838:
---

Thanks for reporting!

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3838:
--
Affects Version/s: 1.0.3

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3838) CLI parameter parser is munging application params

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3838:
--
Fix Version/s: 1.0.4
   1.1.0

> CLI parameter parser is munging application params
> --
>
> Key: FLINK-3838
> URL: https://issues.apache.org/jira/browse/FLINK-3838
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2, 1.0.3
>Reporter: Ken Krugler
>Priority: Minor
> Fix For: 1.1.0, 1.0.4
>
>
> If parameters for an application use a single '-' (e.g. -maxtasks) then the 
> CLI argument parser will munge these, and the app gets passed either just the 
> parameter name (e.g. 'maxtask') if the start of the parameter doesn't match a 
> Flink parameter, or you get two values, with the first value being the part 
> that matched (e.g. '-m') and the second value being the rest (e.g. 'axtasks').
> The parser should ignore everything after the jar path parameter.
> Note that using -- does seem to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3820) ScalaShellITCase.testPreventRecreationBatch deadlocks on Travis

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3820.
---
Resolution: Duplicate

Should be resolved with FLINK-4030.

> ScalaShellITCase.testPreventRecreationBatch deadlocks on Travis
> ---
>
> Key: FLINK-3820
> URL: https://issues.apache.org/jira/browse/FLINK-3820
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ScalaShellITCase.testPreventRecreationBatch}} deadlocked on 
> Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/125789071/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3787) Yarn client does not report unfulfillable container constraints

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339230#comment-15339230
 ] 

Maximilian Michels commented on FLINK-3787:
---

Do we want to fix this for the 1.1.0 release?

> Yarn client does not report unfulfillable container constraints
> ---
>
> Key: FLINK-3787
> URL: https://issues.apache.org/jira/browse/FLINK-3787
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Minor
>
> If the number of virtual cores for a Yarn container is not fulfillable, then 
> the {{TaskManager}} won't be started. This is only reported in the logs but 
> not in the {{FlinkYarnClient}}. Thus, the user will see a started 
> {{JobManager}} with no connected {{TaskManagers}}. Since the log aggregation 
> is only available after the Yarn job has been stopped, there is no easy way 
> for the user to detect what's going on.
> This problem is aggravated by the fact that the number of virtual cores is 
> coupled to the number of slots if no explicit value has been set for the 
> virtual cores. Therefore, it might happen that the Yarn deployment fails 
> because of the virtual cores even though the user has never set a value for 
> them (the user might even not know about the virtual cores).
> I think it would be good to check if the virtual cores constraint is 
> fulfillable. If not, then the user should receive a clear message that the 
> Flink cluster cannot be deployed (similar to the memory constraints).  
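
A minimal Scala sketch of such a check using the Hadoop Yarn client API; this is 
not Flink's actual code, and the error handling is illustrative:

{code}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

def checkVirtualCores(requestedVcores: Int): Unit = {
  val yarnClient = YarnClient.createYarnClient()
  yarnClient.init(new YarnConfiguration())
  yarnClient.start()
  try {
    // maximum container resources the scheduler will grant
    val maxVcores = yarnClient.createApplication()
      .getNewApplicationResponse
      .getMaximumResourceCapability
      .getVirtualCores
    if (requestedVcores > maxVcores) {
      throw new IllegalArgumentException(
        s"Requested $requestedVcores vcores, but the cluster only grants " +
          s"$maxVcores vcores per container.")
    }
  } finally {
    yarnClient.stop()
  }
}
{code}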



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3765) ScalaShellITCase deadlocks spuriously

2016-06-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3765.
---
Resolution: Duplicate

This should be resolved by FLINK-4030. Sorry, I didn't see this issue earlier.

> ScalaShellITCase deadlocks spuriously
> -
>
> Key: FLINK-3765
> URL: https://issues.apache.org/jira/browse/FLINK-3765
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The {{ScalaShellITCase}} deadlocks spuriously. Stephan already observed it 
> locally and now it also happened on Travis [1].
> [1] https://s3.amazonaws.com/archive.travis-ci.org/jobs/123206554/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3757) addAccumulator does not throw Exception on duplicate accumulator name

2016-06-20 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339224#comment-15339224
 ] 

Maximilian Michels commented on FLINK-3757:
---

Accumulators are first created locally without checking at the JobManager 
whether they exist. Only once they have been transferred to the JobManager are 
they merged (and errors reported). Since accumulators can be created at 
arbitrary points during job execution, I think this behavior makes sense. We 
would have to send a message to the JobManager upon creation or do static 
analysis of the code to see if accumulators collide.

I would say it is a feature to avoid making requests to the JobManager during 
execution :)
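
To make the lifecycle concrete, here is a minimal Scala sketch of local 
accumulator registration; the names and the surrounding job are made up:

{code}
import org.apache.flink.api.common.accumulators.IntCounter
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

class CountingMapper extends RichMapFunction[String, String] {
  private val counter = new IntCounter()

  override def open(parameters: Configuration): Unit = {
    // Registration happens locally in each task; a clash with another
    // accumulator of the same name only surfaces when the JobManager
    // merges the accumulators.
    getRuntimeContext.addAccumulator("num-records", counter)
  }

  override def map(value: String): String = {
    counter.add(1)
    value
  }
}
{code}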

> addAccumulator does not throw Exception on duplicate accumulator name
> -
>
> Key: FLINK-3757
> URL: https://issues.apache.org/jira/browse/FLINK-3757
> Project: Flink
>  Issue Type: Bug
>Reporter: Konstantin Knauf
>Priority: Minor
>
> Works if tasks are chained, but in many situations it does not. See
> https://gist.github.com/knaufk/a0026eff9792aa905a2f586f3b0cb112
> Is this an undocumented feature or a valid bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4009) Scala Shell fails to find library for inclusion in test

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-4009:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: FLINK-3454)

> Scala Shell fails to find library for inclusion in test
> ---
>
> Key: FLINK-4009
> URL: https://issues.apache.org/jira/browse/FLINK-4009
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> The Scala Shell test fails to find the flink-ml library jar in the target 
> folder when executing with Intellij. This is due to its working directory 
> being expected in "flink-scala-shell/target" when it is in fact 
> "flink-scala-shell". When executed with Maven, this works fine because the 
> Shade plugin changes the basedir from the project root to the /target 
> folder*. 
> As per [~till.rohrmann] and [~greghogan] suggestions we could simply add 
> flink-ml as a test dependency and look for the jar path in the classpath.
> \* Because we have the dependencyReducedPomLocation set to /target/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4009) Scala Shell fails to find library for inclusion in test

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-4009.
-
Resolution: Fixed

> Scala Shell fails to find library for inclusion in test
> ---
>
> Key: FLINK-4009
> URL: https://issues.apache.org/jira/browse/FLINK-4009
> Project: Flink
>  Issue Type: Sub-task
>  Components: Scala Shell, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> The Scala Shell test fails to find the flink-ml library jar in the target 
> folder when executing with Intellij. This is due to its working directory 
> being expected in "flink-scala-shell/target" when it is in fact 
> "flink-scala-shell". When executed with Maven, this works fine because the 
> Shade plugin changes the basedir from the project root to the /target 
> folder*. 
> As per [~till.rohrmann] and [~greghogan] suggestions we could simply add 
> flink-ml as a test dependency and look for the jar path in the classpath.
> \* Because we have the dependencyReducedPomLocation set to /target/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4009) Scala Shell fails to find library for inclusion in test

2016-06-17 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335903#comment-15335903
 ] 

Maximilian Michels commented on FLINK-4009:
---

Fixed with 4d8cbec4ffca033f5602a53205d65e535ffc5f97. Discussion about using the 
class path can continue in FLINK-3454. We need to build a jar file during the 
test to load the classes into the shell.

> Scala Shell fails to find library for inclusion in test
> ---
>
> Key: FLINK-4009
> URL: https://issues.apache.org/jira/browse/FLINK-4009
> Project: Flink
>  Issue Type: Sub-task
>  Components: Scala Shell, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> The Scala Shell test fails to find the flink-ml library jar in the target 
> folder when executing with Intellij. This is due to its working directory 
> being expected in "flink-scala-shell/target" when it is in fact 
> "flink-scala-shell". When executed with Maven, this works fine because the 
> Shade plugin changes the basedir from the project root to the /target 
> folder*. 
> As per [~till.rohrmann] and [~greghogan] suggestions we could simply add 
> flink-ml as a test dependency and look for the jar path in the classpath.
> \* Because we have the dependencyReducedPomLocation set to /target/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4009) Scala Shell fails to find library for inclusion in test

2016-06-17 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335898#comment-15335898
 ] 

Maximilian Michels commented on FLINK-4009:
---

Fixing this in the meantime with a simple lookup strategy.
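
A sketch of what such a lookup could look like; the directory names and the 
matching rule are assumptions, not the actual implementation:

{code}
import java.io.File

// Try both the module root and its target/ folder, since the working
// directory differs between IDE runs and Maven runs.
def findLibraryJar(prefix: String): Option[File] = {
  val candidates = Seq(new File("target"), new File("."))
  candidates
    .filter(_.isDirectory)
    .flatMap(dir => Option(dir.listFiles()).getOrElse(Array.empty[File]))
    .find(f => f.getName.startsWith(prefix) && f.getName.endsWith(".jar"))
}
{code}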

> Scala Shell fails to find library for inclusion in test
> ---
>
> Key: FLINK-4009
> URL: https://issues.apache.org/jira/browse/FLINK-4009
> Project: Flink
>  Issue Type: Sub-task
>  Components: Scala Shell, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> The Scala Shell test fails to find the flink-ml library jar in the target 
> folder when executing with Intellij. This is due to its working directory 
> being expected in "flink-scala-shell/target" when it is in fact 
> "flink-scala-shell". When executed with Maven, this works fine because the 
> Shade plugin changes the basedir from the project root to the /target 
> folder*. 
> As per [~till.rohrmann] and [~greghogan] suggestions we could simply add 
> flink-ml as a test dependency and look for the jar path in the classpath.
> \* Because we have the dependencyReducedPomLocation set to /target/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4009) Scala Shell fails to find library for inclusion in test

2016-06-17 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335894#comment-15335894
 ] 

Maximilian Michels commented on FLINK-4009:
---

Not pressing IMHO.

> Scala Shell fails to find library for inclusion in test
> ---
>
> Key: FLINK-4009
> URL: https://issues.apache.org/jira/browse/FLINK-4009
> Project: Flink
>  Issue Type: Sub-task
>  Components: Scala Shell, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> The Scala Shell test fails to find the flink-ml library jar in the target 
> folder when executing with Intellij. This is due to its working directory 
> being expected in "flink-scala-shell/target" when it is in fact 
> "flink-scala-shell". When executed with Maven, this works fine because the 
> Shade plugin changes the basedir from the project root to the /target 
> folder*. 
> As per [~till.rohrmann] and [~greghogan] suggestions we could simply add 
> flink-ml as a test dependency and look for the jar path in the classpath.
> \* Because we have the dependencyReducedPomLocation set to /target/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3890) Deprecate streaming mode flag from Yarn CLI

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3890.
-
Resolution: Fixed

Fixed with f4ac852275da36ee33aa54ae9097293ccc981afa

> Deprecate streaming mode flag from Yarn CLI
> ---
>
> Key: FLINK-3890
> URL: https://issues.apache.org/jira/browse/FLINK-3890
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> The {{-yst}} and {{-yarnstreaming}} parameter to the Yarn command-line is not 
> in use anymore since FLINK-3073 and should have been removed before the 1.0.0 
> release. I would suggest marking the parameter as deprecated in the code and 
> no longer listing it in the help section. In one of the upcoming major 
> releases, we can remove it completely (which would give users an error if 
> they used the flag).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3667) Generalize client<->cluster communication

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3667.
-
Resolution: Fixed

Implemented via f9b52a3114a2114e6846091acf3abb294a49615b and 
f4ac852275da36ee33aa54ae9097293ccc981afa

> Generalize client<->cluster communication
> -
>
> Key: FLINK-3667
> URL: https://issues.apache.org/jira/browse/FLINK-3667
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Here are some notes I took when inspecting the client<->cluster classes with 
> regard to future integration of other resource management frameworks in 
> addition to Yarn (e.g. Mesos).
> {noformat}
> 1 Cluster Client Abstraction
> 
> 1.1 Status Quo
> ──
> 1.1.1 FlinkYarnClient
> ╌
>   • Holds the cluster configuration (Flink-specific and Yarn-specific)
>   • Contains the deploy() method to deploy the cluster
>   • Creates the Hadoop Yarn client
>   • Receives the initial job manager address
>   • Bootstraps the FlinkYarnCluster
> 1.1.2 FlinkYarnCluster
> ╌╌
>   • Wrapper around the Hadoop Yarn client
>   • Queries cluster for status updates
>   • Life time methods to start and shutdown the cluster
>   • Flink specific features like shutdown after job completion
> 1.1.3 ApplicationClient
> ╌╌╌
>   • Acts as a middle-man for asynchronous cluster communication
>   • Designed to communicate with Yarn, not used in Standalone mode
> 1.1.4 CliFrontend
> ╌
>   • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
>   • Constantly distinguishes between Yarn and Standalone mode
>   • Would be nice to have a general abstraction in place
> 1.1.5 Client
> 
>   • Job submission and Job related actions, agnostic of resource framework
> 1.2 Proposal
> 
> 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
> ╌
>   • Extensible cluster-agnostic config
>   • May be extended by specific cluster, e.g. YarnClusterConfig
> 1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
> ╌
>   • Deals with cluster (RM) specific communication
>   • Exposes framework agnostic information
>   • YarnClusterClient, MesosClusterClient, StandaloneClusterClient
> 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
> ╌
>   • Basic interface to communicate with a running cluster
>   • Receives the ClusterClient for cluster-specific communication
>   • Should not have to care about the specific implementations of the
> client
> 1.2.4 ApplicationClient
> ╌╌╌
>   • Can be changed to work cluster-agnostic (first steps already in
> FLINK-3543)
> 1.2.5 CliFrontend
> ╌
>   • CliFrontend never has to differentiate between different
> cluster types after it has determined which cluster class to load.
>   • Base class handles framework agnostic command line arguments
>   • Pluggables for Yarn, Mesos handle specific commands
> {noformat}
> I would like to create/refactor the affected classes to set us up for a more 
> flexible client side resource management abstraction.
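
A minimal Scala sketch of the proposed shape; the names follow the notes above, 
but the signatures are assumptions rather than a final API:

{code}
// Cluster-agnostic configuration, extensible per framework (1.2.1).
trait ClusterConfig {
  def flinkConfiguration: Map[String, String]
}

case class YarnClusterConfig(
    flinkConfiguration: Map[String, String],
    numTaskManagers: Int) extends ClusterConfig

// Handles resource-manager-specific communication and deployment (1.2.2).
trait ClusterClient[C <: ClusterConfig] {
  def deploy(config: C): FlinkCluster
}

// Framework-agnostic handle to a running cluster (1.2.3).
trait FlinkCluster {
  def jobManagerAddress: String
  def shutdown(): Unit
}
{code}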



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3667) Generalize client<->cluster communication

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3667:
--
Fix Version/s: 1.1.0

> Generalize client<->cluster communication
> -
>
> Key: FLINK-3667
> URL: https://issues.apache.org/jira/browse/FLINK-3667
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Here are some notes I took when inspecting the client<->cluster classes with 
> regard to future integration of other resource management frameworks in 
> addition to Yarn (e.g. Mesos).
> {noformat}
> 1 Cluster Client Abstraction
> 
> 1.1 Status Quo
> ──
> 1.1.1 FlinkYarnClient
> ╌
>   • Holds the cluster configuration (Flink-specific and Yarn-specific)
>   • Contains the deploy() method to deploy the cluster
>   • Creates the Hadoop Yarn client
>   • Receives the initial job manager address
>   • Bootstraps the FlinkYarnCluster
> 1.1.2 FlinkYarnCluster
> ╌╌
>   • Wrapper around the Hadoop Yarn client
>   • Queries cluster for status updates
>   • Life time methods to start and shutdown the cluster
>   • Flink specific features like shutdown after job completion
> 1.1.3 ApplicationClient
> ╌╌╌
>   • Acts as a middle-man for asynchronous cluster communication
>   • Designed to communicate with Yarn, not used in Standalone mode
> 1.1.4 CliFrontend
> ╌
>   • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
>   • Constantly distinguishes between Yarn and Standalone mode
>   • Would be nice to have a general abstraction in place
> 1.1.5 Client
> 
>   • Job submission and Job related actions, agnostic of resource framework
> 1.2 Proposal
> 
> 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
> ╌
>   • Extensible cluster-agnostic config
>   • May be extended by specific cluster, e.g. YarnClusterConfig
> 1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
> ╌
>   • Deals with cluster (RM) specific communication
>   • Exposes framework agnostic information
>   • YarnClusterClient, MesosClusterClient, StandaloneClusterClient
> 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
> ╌
>   • Basic interface to communicate with a running cluster
>   • Receives the ClusterClient for cluster-specific communication
>   • Should not have to care about the specific implementations of the
> client
> 1.2.4 ApplicationClient
> ╌╌╌
>   • Can be changed to work cluster-agnostic (first steps already in
> FLINK-3543)
> 1.2.5 CliFrontend
> ╌
>   • CliFrontend never has to differentiate between different
> cluster types after it has determined which cluster class to load.
>   • Base class handles framework agnostic command line arguments
>   • Pluggables for Yarn, Mesos handle specific commands
> {noformat}
> I would like to create/refactor the affected classes to set us up for a more 
> flexible client side resource management abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4079) YARN properties file used for per-job cluster

2016-06-17 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335730#comment-15335730
 ] 

Maximilian Michels commented on FLINK-4079:
---

Fixed for 1.1 with ec6d97528e8b21f191b7922e4810fd60804c8365

Do we want to do a backport for 1.0.4?

> YARN properties file used for per-job cluster
> -
>
> Key: FLINK-4079
> URL: https://issues.apache.org/jira/browse/FLINK-4079
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.3
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.4
>
>
> YARN per job clusters (flink run -m yarn-cluster) rely on the hidden YARN 
> properties file, which defines the container configuration. This can lead to 
> unexpected behaviour, because the per-job-cluster configuration is merged 
> with the YARN properties file (or the properties file is used as the only 
> configuration source).
> A user ran into this as follows:
> - Create a long-lived YARN session with HA (creates a hidden YARN properties 
> file)
> - Submit standalone batch jobs with a per-job cluster (flink run -m 
> yarn-cluster). The batch jobs get submitted to the long-lived HA cluster 
> because of the properties file.
> [~mxm] Am I correct in assuming that this is only relevant for the 1.0 branch 
> and will be fixed with the client refactoring you are working on?
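
A sketch of the reported scenario; the session parameters, paths, and the 
properties-file location are illustrative:

{noformat}
# 1) long-lived HA session; writes a hidden properties file,
#    typically under ${java.io.tmpdir}/.yarn-properties-<user>
./bin/yarn-session.sh -n 4

# 2) a supposedly independent per-job cluster later picks up that
#    file and submits to the HA session instead
./bin/flink run -m yarn-cluster -yn 2 /path/to/batch-job.jar
{noformat}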



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3937) Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3937.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed with 875d4d238567a0503941488eb4e7d03b38c0fc42 and 
f4ac852275da36ee33aa54ae9097293ccc981afa

> Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
> --
>
> Key: FLINK-3937
> URL: https://issues.apache.org/jira/browse/FLINK-3937
> Project: Flink
>  Issue Type: Improvement
>Reporter: Sebastian Klemke
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: improve_flink_cli_yarn_integration.patch
>
>
> Currently, the flink cli can't figure out the JobManager RPC location for 
> Flink-on-YARN clusters. Therefore, the list, savepoint, cancel and stop 
> subcommands are hard to invoke if you only know the YARN application ID. As 
> an improvement, I suggest adding a {{-yid <application id>}} option to the 
> mentioned subcommands that can be used together with -m yarn-cluster. The 
> flink cli would then retrieve the JobManager RPC location from the YARN 
> ResourceManager.
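
A hedged example of the suggested usage; the application id and the job id are 
made up:

{noformat}
./bin/flink list -m yarn-cluster -yid application_1466543478989_0001
./bin/flink cancel -m yarn-cluster -yid application_1466543478989_0001 <jobID>
{noformat}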



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3937) Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3937:
--
Priority: Minor  (was: Trivial)

> Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
> --
>
> Key: FLINK-3937
> URL: https://issues.apache.org/jira/browse/FLINK-3937
> Project: Flink
>  Issue Type: Improvement
>Reporter: Sebastian Klemke
>Assignee: Maximilian Michels
>Priority: Minor
> Attachments: improve_flink_cli_yarn_integration.patch
>
>
> Currently, the flink cli can't figure out the JobManager RPC location for 
> Flink-on-YARN clusters. Therefore, the list, savepoint, cancel and stop 
> subcommands are hard to invoke if you only know the YARN application ID. As 
> an improvement, I suggest adding a {{-yid <application id>}} option to the 
> mentioned subcommands that can be used together with -m yarn-cluster. The 
> flink cli would then retrieve the JobManager RPC location from the YARN 
> ResourceManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3863) Yarn Cluster shutdown may fail if leader changed recently

2016-06-17 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3863.
-
Resolution: Fixed

Fixed with 9e9842410a635d183a002d1f25a6f489ce9d6a2f

> Yarn Cluster shutdown may fail if leader changed recently
> -
>
> Key: FLINK-3863
> URL: https://issues.apache.org/jira/browse/FLINK-3863
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> The {{ApplicationClient}} sets {{yarnJobManager}} to {{None}} until it has 
> connected to a newly elected JobManager. A shutdown message to the 
> application master is discarded while the ApplicationClient tries to 
> reconnect. The ApplicationClient should retry shutting down the cluster once it 
> has connected to the new leader. It may also time out (which currently is 
> always the case).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2392) Instable test in flink-yarn-tests

2016-06-16 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333516#comment-15333516
 ] 

Maximilian Michels commented on FLINK-2392:
---

It doesn't matter if it is a naming conflict or a malformed bean. It still 
seems to be an instance of incorrect registration, no? :)

> Instable test in flink-yarn-tests
> -
>
> Key: FLINK-2392
> URL: https://issues.apache.org/jira/browse/FLINK-2392
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0
>Reporter: Matthias J. Sax
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.0.0
>
>
> The test YARNSessionFIFOITCase fails from time to time on an irregular basis. 
> For example see: https://travis-ci.org/apache/flink/jobs/72019690
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 205.163 sec 
> <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionFIFOITCase) 
>  Time elapsed: 60.651 sec  <<< FAILURE!
> java.lang.AssertionError: During the timeout period of 60 seconds the 
> expected string did not show up
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.apache.flink.yarn.YarnTestBase.runWithArgs(YarnTestBase.java:478)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.perJobYarnClusterWithParallelism(YARNSessionFIFOITCase.java:435)
> Results :
> Failed tests: 
>   
> YARNSessionFIFOITCase.perJobYarnClusterWithParallelism:435->YarnTestBase.runWithArgs:478
>  During the timeout period of 60 seconds the expected string did not show up
> {noformat}
> Another error case is this (see 
> https://travis-ci.org/mjsax/flink/jobs/77313444)
> {noformat}
> Tests run: 12, Failures: 3, Errors: 0, Skipped: 2, Time elapsed: 182.008 sec 
> <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> testTaskManagerFailure(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time 
> elapsed: 27.356 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testNonexistingQueue(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time 
> elapsed: 17.421 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testJavaAPI(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time elapsed: 
> 11.984 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> {noformat}
> Furthermore, this build failed too: 
> https://travis-ci.org/apache/flink/jobs/77313450
> (no error, but Travis terminated due to no progress for 300 seconds -> 
> deadlock?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2392) Instable test in flink-yarn-tests

2016-06-16 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333461#comment-15333461
 ] 

Maximilian Michels commented on FLINK-2392:
---

What do you mean? The issue is unrelated or the Exception?

> Instable test in flink-yarn-tests
> -
>
> Key: FLINK-2392
> URL: https://issues.apache.org/jira/browse/FLINK-2392
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0
>Reporter: Matthias J. Sax
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.0.0
>
>
> The test YARNSessionFIFOITCase fails from time to time on an irregular basis. 
> For example see: https://travis-ci.org/apache/flink/jobs/72019690
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 205.163 sec 
> <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionFIFOITCase) 
>  Time elapsed: 60.651 sec  <<< FAILURE!
> java.lang.AssertionError: During the timeout period of 60 seconds the 
> expected string did not show up
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.apache.flink.yarn.YarnTestBase.runWithArgs(YarnTestBase.java:478)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.perJobYarnClusterWithParallelism(YARNSessionFIFOITCase.java:435)
> Results :
> Failed tests: 
>   
> YARNSessionFIFOITCase.perJobYarnClusterWithParallelism:435->YarnTestBase.runWithArgs:478
>  During the timeout period of 60 seconds the expected string did not show up
> {noformat}
> Another error case is this (see 
> https://travis-ci.org/mjsax/flink/jobs/77313444)
> {noformat}
> Tests run: 12, Failures: 3, Errors: 0, Skipped: 2, Time elapsed: 182.008 sec 
> <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> testTaskManagerFailure(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time 
> elapsed: 27.356 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testNonexistingQueue(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time 
> elapsed: 17.421 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testJavaAPI(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time elapsed: 
> 11.984 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> {noformat}
> Furthermore, this build failed too: 
> https://travis-ci.org/apache/flink/jobs/77313450
> (no error, but Travis terminated due to no progress for 300 seconds -> 
> deadlock?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3962) JMXReporter doesn't properly register/deregister metrics

2016-06-16 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333440#comment-15333440
 ] 

Maximilian Michels commented on FLINK-3962:
---

Another exception from the Yarn logs:

{noformat}
2016-06-16 04:58:37,218 ERROR org.apache.flink.metrics.reporter.JMXReporter 
- Metric did not comply with JMX MBean naming rules.
javax.management.NotCompliantMBeanException: Interface is not public: 
org.apache.flink.metrics.reporter.JMXReporter$JmxGaugeMBean
at com.sun.jmx.mbeanserver.MBeanAnalyzer.<init>(MBeanAnalyzer.java:114)
at 
com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:102)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.getAnalyzer(StandardMBeanIntrospector.java:67)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:192)
at com.sun.jmx.mbeanserver.MBeanSupport.<init>(MBeanSupport.java:138)
at 
com.sun.jmx.mbeanserver.StandardMBeanSupport.<init>(StandardMBeanSupport.java:60)
at 
com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:192)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:898)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at 
org.apache.flink.metrics.reporter.JMXReporter.notifyOfAddedMetric(JMXReporter.java:113)
at 
org.apache.flink.metrics.MetricRegistry.register(MetricRegistry.java:174)
at 
org.apache.flink.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:206)
at 
org.apache.flink.metrics.groups.AbstractMetricGroup.gauge(AbstractMetricGroup.java:162)
at 
org.apache.flink.runtime.taskmanager.TaskManager$.instantiateClassLoaderMetrics(TaskManager.scala:2298)
at 
org.apache.flink.runtime.taskmanager.TaskManager$.org$apache$flink$runtime$taskmanager$TaskManager$$instantiateStatusMetrics(TaskManager.scala:2287)
at 
org.apache.flink.runtime.taskmanager.TaskManager.associateWithJobManager(TaskManager.scala:951)
at 
org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleRegistrationMessage(TaskManager.scala:631)
at 
org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:124)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2016-06-16 04:58:37,221 ERROR org.apache.flink.metrics.reporter.JMXReporter 
- Metric did not comply with JMX MBean naming rules.
javax.management.NotCompliantMBeanException: Interface is not public: 
org.apache.flink.metrics.reporter.JMXReporter$JmxGaugeMBean
at com.sun.jmx.mbeanserver.MBeanAnalyzer.<init>(MBeanAnalyzer.java:114)
at 
com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:102)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.getAnalyzer(StandardMBeanIntrospector.java:67)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:192)
at com.sun.jmx.mbeanserver.MBeanSupport.<init>(MBeanSupport.java:138)
at 
com.sun.jmx.mbeanserver.StandardMBeanSupport.<init>(StandardMBeanSupport.java:60)
at 
com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:192)
at 
com.sun.jmx.interceptor.DefaultM

[jira] [Commented] (FLINK-2392) Instable test in flink-yarn-tests

2016-06-16 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333442#comment-15333442
 ] 

Maximilian Michels commented on FLINK-2392:
---

Should be fixed by the resolution of FLINK-3962.

> Instable test in flink-yarn-tests
> -
>
> Key: FLINK-2392
> URL: https://issues.apache.org/jira/browse/FLINK-2392
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0
>Reporter: Matthias J. Sax
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.0.0
>
>
> The test YARNSessionFIFOITCase fails from time to time on an irregular basis. 
> For example see: https://travis-ci.org/apache/flink/jobs/72019690
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 205.163 sec 
> <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionFIFOITCase) 
>  Time elapsed: 60.651 sec  <<< FAILURE!
> java.lang.AssertionError: During the timeout period of 60 seconds the 
> expected string did not show up
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.apache.flink.yarn.YarnTestBase.runWithArgs(YarnTestBase.java:478)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.perJobYarnClusterWithParallelism(YARNSessionFIFOITCase.java:435)
> Results :
> Failed tests: 
>   
> YARNSessionFIFOITCase.perJobYarnClusterWithParallelism:435->YarnTestBase.runWithArgs:478
>  During the timeout period of 60 seconds the expected string did not show up
> {noformat}
> Another error case is this (see 
> https://travis-ci.org/mjsax/flink/jobs/77313444)
> {noformat}
> Tests run: 12, Failures: 3, Errors: 0, Skipped: 2, Time elapsed: 182.008 sec 
> <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> testTaskManagerFailure(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time 
> elapsed: 27.356 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testNonexistingQueue(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time 
> elapsed: 17.421 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testJavaAPI(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time elapsed: 
> 11.984 sec  <<< FAILURE!
> java.lang.AssertionError: Found a file 
> /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_03/taskmanager.log
>  with a prohibited string: [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
>   at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> {noformat}
> Furthermore, this build failed too: 
> https://travis-ci.org/apache/flink/jobs/77313450
> (no error, but Travis terminated to due no progress for 300 seconds -> 
> deadlock?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

