[jira] [Comment Edited] (FLINK-2066) Make delay between execution retries configurable

2015-10-02 Thread Chengxuan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940794#comment-14940794
 ] 

Chengxuan Wang edited comment on FLINK-2066 at 10/2/15 6:00 AM:


May I ask how long the delay is? I am very new to this. Sorry to disturb.


was (Author: wangchx):
May I ask how long is the delay? I am very new to this. Sorry to disturb.

> Make delay between execution retries configurable
> -
>
> Key: FLINK-2066
> URL: https://issues.apache.org/jira/browse/FLINK-2066
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Nuno Miguel Marques dos Santos
>  Labels: starter
>
> Flink allows users to specify a delay between execution retries. This helps to 
> let some external failure causes fully manifest themselves before the restart 
> is attempted.
> The delay is currently defined only system-wide.
> We should add it to the {{ExecutionConfig}} of a job to allow per-job 
> specification.
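The per-job override the issue asks for boils down to a small resolution rule: use the job's own delay when one is set, otherwise fall back to the system-wide value. A minimal sketch with hypothetical names (not Flink's actual API):

```java
// Sketch of a per-job retry delay overriding a system-wide default.
// RetryDelayResolution and resolveRetryDelay are illustrative names only.
public class RetryDelayResolution {
    static final long SYSTEM_WIDE_DELAY_MS = 10_000L;

    // A negative value means "not set on the job", mirroring how
    // ExecutionConfig typically encodes unset numeric options.
    static long resolveRetryDelay(long perJobDelayMs) {
        return perJobDelayMs >= 0 ? perJobDelayMs : SYSTEM_WIDE_DELAY_MS;
    }

    public static void main(String[] args) {
        System.out.println(resolveRetryDelay(-1));  // falls back to system-wide
        System.out.println(resolveRetryDelay(500)); // per-job value wins
    }
}
```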



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941136#comment-14941136
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145021034
  
There should be some streaming specific stuff in the `Execution Config` 
section. For example, there is `enableTimestamps()` and 
`setAutoWatermarkInterval()`.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941137#comment-14941137
 ] 

ASF GitHub Bot commented on FLINK-2797:
---

GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/1214

[FLINK-2797][cli] Add support for running jobs in detached mode from CLI

Adds an option `-d` to run jobs in detached mode.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachingoel0101/flink detached_cli

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1214


commit f75467cddbc182fdc42061c9ca602e20690a4ef5
Author: Sachin Goel 
Date:   2015-10-02T12:41:09Z

[FLINK-2797][cli] Add support for running jobs in detached mode from CLI




> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.
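The requested workflow (submit, get the job id immediately, query status later) contrasts with blocking submission as sketched below; `submitDetached` and `submitBlocking` are illustrative names, not Flink's CLI internals:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// Sketch of detached vs. blocking job submission.
public class DetachedSubmission {
    // Detached: return an id immediately; the job runs in the background
    // and the caller can poll its status later using the id.
    static String submitDetached(Runnable job) {
        String jobId = UUID.randomUUID().toString();
        CompletableFuture.runAsync(job); // fire and forget
        return jobId;
    }

    // Blocking (the current default): wait for the job to finish.
    static String submitBlocking(Runnable job) {
        String jobId = UUID.randomUUID().toString();
        job.run();
        return jobId;
    }
}
```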





[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941150#comment-14941150
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145025577
  
Nice doc so far ^^

One tiny fix on the  *Advanced window constructs* subsection:

[0,1000], [100,1100], **[200,1200]**, ..., [1000,2000]
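The sequence above corresponds to a sliding window of size 1000 with slide 100; a tiny sketch that generates it:

```java
import java.util.ArrayList;
import java.util.List;

// Generates sliding-window ranges [start, start + size] with the given
// slide, up to and including the window starting at lastStart.
public class SlidingWindows {
    static List<long[]> windows(long size, long slide, long lastStart) {
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start <= lastStart; start += slide) {
            result.add(new long[]{start, start + size});
        }
        return result;
    }
}
```

With size 1000, slide 100, and last start 1000, this yields exactly the eleven windows [0,1000] through [1000,2000] from the comment.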


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2354) Recover running jobs on JobManager failure

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941173#comment-14941173
 ] 

ASF GitHub Bot commented on FLINK-2354:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1153#discussion_r41024851
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
 ---
@@ -110,61 +120,68 @@
 
public CheckpointCoordinator(
JobID job,
-   int numSuccessfulCheckpointsToRetain,
long checkpointTimeout,
ExecutionVertex[] tasksToTrigger,
ExecutionVertex[] tasksToWaitFor,
ExecutionVertex[] tasksToCommitTo,
-   ClassLoader userClassLoader) {
+   ClassLoader userClassLoader,
+   CheckpointIDCounter checkpointIDCounter,
+   CompletedCheckpoints completedCheckpoints,
+   RecoveryMode recoveryMode) throws Exception {

// some sanity checks
if (job == null || tasksToTrigger == null ||
tasksToWaitFor == null || tasksToCommitTo == 
null) {
throw new NullPointerException();
}
-   if (numSuccessfulCheckpointsToRetain < 1) {
-   throw new IllegalArgumentException("Must retain at 
least one successful checkpoint");
-   }
if (checkpointTimeout < 1) {
throw new IllegalArgumentException("Checkpoint timeout 
must be larger than zero");
}

this.job = job;
--- End diff --

Maybe we could harmonize the null checking as you've done it.
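One way to harmonize the null checks is to name each argument in the error instead of throwing one anonymous NullPointerException for all of them. A sketch, assuming `Objects.requireNonNull` is acceptable style here; `Coordinator` is a stand-in, not the real class:

```java
import java.util.Objects;

// Sketch: per-argument null checks with descriptive messages.
public class Coordinator {
    private final String job;
    private final String[] tasksToTrigger;

    Coordinator(String job, String[] tasksToTrigger) {
        // Each check names the offending parameter in the exception message.
        this.job = Objects.requireNonNull(job, "job");
        this.tasksToTrigger = Objects.requireNonNull(tasksToTrigger, "tasksToTrigger");
    }
}
```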


> Recover running jobs on JobManager failure
> --
>
> Key: FLINK-2354
> URL: https://issues.apache.org/jira/browse/FLINK-2354
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Affects Versions: master
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 0.10
>
>
> tl;dr Persist JobGraphs in state backend and coordinate reference to state 
> handle via ZooKeeper.
> Problem: When running multiple JobManagers in high availability mode, the 
> leading job manager loses all running jobs when it fails. After a new 
> leading job manager is elected, it is not possible to recover any previously 
> running jobs.
> Solution: The leading job manager, which receives the job graph, writes 1) the 
> job graph to a state backend, and 2) a reference to the respective state 
> handle to ZooKeeper. In general, job graphs can become large (multiple MBs, 
> because they include closures etc.). ZooKeeper is not designed for data of 
> this size. The level of indirection via the reference to the state backend 
> keeps the data in ZooKeeper small.
> Proposed ZooKeeper layout:
> /flink (default)
>   +- currentJobs
>+- job id i
> +- state handle reference of job graph i
> The 'currentJobs' node needs to be persistent to allow recovery of jobs 
> between job managers. The currentJobs node needs to satisfy the following 
> invariant: There is a reference to a job graph with id i IFF the respective 
> job graph needs to be recovered by a newly elected job manager leader.
> With this in place, jobs will be recovered from their initial state (as if 
> resubmitted). The next step is to backup the runtime state handles of 
> checkpoints in a similar manner.
> ---
> This work will be based on [~trohrm...@apache.org]'s implementation of 
> FLINK-2291. The leader election service notifies the job manager about 
> granted/revoked leadership. This notification happens via Akka and thus 
> serially *per* job manager, but results in eventually consistent state 
> between job managers. For some snapshots of time it is possible to have a new 
> leader granted leadership, before the old one has been revoked its leadership.
> [~trohrm...@apache.org], can you confirm that leadership does not guarantee 
> mutually exclusive access to the shared 'currentJobs' state?
> For example, the following can happen:
> - JM 1 is leader, JM 2 is standby
> - JOB i is running (and hence /flink/currentJobs/i exists)
> - ZK notifies leader election service (LES) of JM 1 and JM 2
> - LES 2 immediately notifies JM 2 about granted leadership, but LES 1 
> notification revoking leadership takes longer
> - JOB i finishes (TMs don't notice leadership change yet) and JM 1 receives 
> final JobStatusChange
> - JM 2 resubmits the job /flink/currentJobs/i
> - JM 1 removes /flink/currentJobs/i, because it is now finished
> => inconsistent state (wrt the specified invariant above)

[jira] [Updated] (FLINK-2799) Yarn tests cannot be executed with DEBUG log level

2015-10-02 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-2799:
--
Fix Version/s: 0.10

> Yarn tests cannot be executed with DEBUG log level
> --
>
> Key: FLINK-2799
> URL: https://issues.apache.org/jira/browse/FLINK-2799
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.9, 0.10
>Reporter: Till Rohrmann
> Fix For: 0.10
>
>
> The problem is that on debug log level the {{org.apache.hadoop.util.Shell}} 
> logs {{java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.}} if 
> the {{HADOOP_HOME}} variable is not set. This is the case on Travis. 
> Unfortunately, the Yarn tests fail as soon as they find the word {{Exception}} 
> in the logs. Thus, they fail for no reason.
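A sketch of the kind of log check the description implies: tolerate the known benign HADOOP_HOME message while still flagging real exceptions. The whitelist approach is an assumption, not necessarily the fix the issue settled on:

```java
// Sketch of an exception scan that whitelists one known harmless line.
public class LogScan {
    static boolean containsRealException(String line) {
        // The benign message itself contains "IOException", so it must be
        // excluded before the generic check.
        if (line.contains("HADOOP_HOME or hadoop.home.dir are not set")) {
            return false;
        }
        return line.contains("Exception");
    }
}
```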





[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941090#comment-14941090
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145013573
  
Two comments:
  - I would rather call the section "Buffer Timeout" something like 
"Controlling Latency". That helps people who are interested in latency find 
the section relevant for them.
  - The code example in "Making local variables consistent" is wrong; the 
types of the interfaces/casts are not correct.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>







[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941113#comment-14941113
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145017010
  
Sometimes you write DataSet in the stream guide where it should be 
DataStream.



> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>







[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941127#comment-14941127
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145019903
  
The WindowFunction has a different signature for regular windows and all 
windows. This should maybe be visible in the transformations section.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145024860
  
That is what I mean, global time reduce.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145024435
  
Global windows are not parallel in any system; that is inherent in the 
operation.

You can pre-aggregate in parallel if the windows are time windows. These 
are probably the most common windows. Other kinds of windows are not even 
supported in any other system that I have worked with...




[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...

2015-10-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1214#issuecomment-145023847
  
I tested the normal execution mode. However, I don't have a YARN setup. Can 
someone give it a quick check with YARN?
Quick question though: in detached mode, print or collect throw exceptions. 
Is that intended?




[jira] [Commented] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941172#comment-14941172
 ] 

Greg Hogan commented on FLINK-2810:
---

Bash arithmetic evaluation is limited to integers.

> Warn user if bc not installed
> -
>
> Key: FLINK-2810
> URL: https://issues.apache.org/jira/browse/FLINK-2810
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: master
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> taskmanager.sh will print the following message when starting the cluster if 
> bc is not installed and off-heap memory is enabled and configured as a ratio. 
> The script should first check that bc is installed and otherwise print a 
> specific message.
> {noformat}
> [ERROR] Configured TaskManager managed memory fraction is not a valid value. 
> Please set taskmanager.memory.fraction in flink-conf.yaml
> {noformat}
> Examples of distributions where bc is not installed by default are the 
> Debian images for Google Compute Engine.
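The validation the error message refers to is simple in any language; a sketch assuming the memory fraction must lie strictly between 0 and 1:

```java
// Sketch of the check behind "Configured TaskManager managed memory
// fraction is not a valid value": a float strictly between 0 and 1.
public class FractionCheck {
    static boolean isValidFraction(String value) {
        try {
            double f = Double.parseDouble(value);
            return f > 0.0 && f < 1.0;
        } catch (NumberFormatException | NullPointerException e) {
            return false; // unparsable or missing value
        }
    }
}
```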






[jira] [Updated] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-2810:
--
Fix Version/s: 0.10

> Warn user if bc not installed
> -
>
> Key: FLINK-2810
> URL: https://issues.apache.org/jira/browse/FLINK-2810
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: 0.10
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 0.10
>
>
> taskmanager.sh will print the following message when starting the cluster if 
> bc is not installed and off-heap memory is enabled and configured as a ratio. 
> The script should first check that bc is installed and otherwise print a 
> specific message.
> {noformat}
> [ERROR] Configured TaskManager managed memory fraction is not a valid value. 
> Please set taskmanager.memory.fraction in flink-conf.yaml
> {noformat}
> Examples of distributions where bc is not installed by default are the 
> Debian images for Google Compute Engine.





[jira] [Updated] (FLINK-2799) Yarn tests cannot be executed with DEBUG log level

2015-10-02 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-2799:
--
Affects Version/s: 0.10
   0.9

> Yarn tests cannot be executed with DEBUG log level
> --
>
> Key: FLINK-2799
> URL: https://issues.apache.org/jira/browse/FLINK-2799
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.9, 0.10
>Reporter: Till Rohrmann
>
> The problem is that on debug log level the {{org.apache.hadoop.util.Shell}} 
> logs {{java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.}} if 
> the {{HADOOP_HOME}} variable is not set. This is the case on Travis. 
> Unfortunately, the Yarn tests fail as soon as they find the word {{Exception}} 
> in the logs. Thus, they fail for no reason.





[GitHub] flink pull request: [hotfix] Execute YARN integration tests only u...

2015-10-02 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1210#issuecomment-145036429
  
LGTM




[GitHub] flink pull request: [FLINK-1599] TypeComperator with no keys and c...

2015-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1207




[jira] [Commented] (FLINK-2800) kryo serialization problem

2015-10-02 Thread Stefano Bortoli (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940984#comment-14940984
 ] 

Stefano Bortoli commented on FLINK-2800:


I don't know whether it is the same issue, but after switching from my POJOs to 
BSONObject I hit a race condition with Kryo serialization:

2015-10-02 11:55:26 INFO  JobClient:161 - 10/02/2015 11:55:26Cross(Cross at 
main(FlinkMongoHadoop2LinkPOI2CDA.java:138))(4/4) switched to FAILED
java.lang.IndexOutOfBoundsException: Index: 112, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:759)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:210)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:127)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at 
org.apache.flink.runtime.operators.resettable.AbstractBlockResettableIterator.getNextRecord(AbstractBlockResettableIterator.java:180)
at 
org.apache.flink.runtime.operators.resettable.BlockResettableMutableObjectIterator.next(BlockResettableMutableObjectIterator.java:111)
at 
org.apache.flink.runtime.operators.CrossDriver.runBlockedOuterSecond(CrossDriver.java:309)
at org.apache.flink.runtime.operators.CrossDriver.run(CrossDriver.java:162)
at 
org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:489)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:354)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
at java.lang.Thread.run(Thread.java:745)

> kryo serialization problem
> --
>
> Key: FLINK-2800
> URL: https://issues.apache.org/jira/browse/FLINK-2800
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: master
> Environment: linux ubuntu 12.04 LTS, Java 7
>Reporter: Stefano Bortoli
>
> Performing a cross of two datasets of POJOs, I got the exception below. 
> The first time I run the process, there was no problem. When I run it the 
> second time, I have got the exception. My guess is that it could be a race 
> condition related to the reuse of the Kryo serializer object. However, it 
> could also be "a bug where type registrations are not properly forwarded to 
> all Serializers", as suggested by Stephan.
> 
> 2015-10-01 18:18:21 INFO  JobClient:161 - 10/01/2015 18:18:21 Cross(Cross at 
> main(FlinkMongoHadoop2LinkPOI2CDA.java:160))(3/4) switched to FAILED 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 114
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
>   at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
>   at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:210)
>   at 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:127)
>   at 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
>   at 
> org.apache.flink.runtime.operators.resettable.AbstractBlockResettableIterator.getNextRecord(AbstractBlockResettableIterator.java:180)
>   at 
> org.apache.flink.runtime.operators.resettable.BlockResettableMutableObjectIterator.next(BlockResettableMutableObjectIterator.java:111)
>   at 
> org.apache.flink.runtime.operators.CrossDriver.runBlockedOuterSecond(CrossDriver.java:309)
>   at 
> org.apache.flink.runtime.operators.CrossDriver.run(CrossDriver.java:162)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:489)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:354)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
>   at java.lang.Thread.run(Thread.java:745)
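If the failures do stem from a serializer instance shared across threads (one of the guesses above), per-thread instances are a common remedy; `KryoLikeSerializer` below is a stand-in, not the real Kryo class:

```java
// Sketch: give each thread its own serializer instance so that mutable
// per-serialization state (e.g. a reference-resolver table) is never shared.
public class PerThreadSerializer {
    // Stand-in for a stateful, non-thread-safe serializer such as Kryo.
    static class KryoLikeSerializer { }

    static final ThreadLocal<KryoLikeSerializer> SERIALIZER =
            ThreadLocal.withInitial(KryoLikeSerializer::new);
}
```

Each thread that calls `SERIALIZER.get()` receives its own instance, so concurrent tasks never corrupt one another's serializer state.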





[jira] [Assigned] (FLINK-2796) CLI -q flag to supress the output does not work

2015-10-02 Thread Sachin Goel (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goel reassigned FLINK-2796:
--

Assignee: Sachin Goel

> CLI -q flag to supress the output does not work
> ---
>
> Key: FLINK-2796
> URL: https://issues.apache.org/jira/browse/FLINK-2796
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: starter
> Fix For: 0.10
>
>
> The log output is shown regardless of whether -q is specified:
> {noformat}
> /bin/flink run -q examples/WordCount.jar
> {noformat}
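What the -q flag should do can be sketched as a simple gate on log output; the names below are hypothetical, not the CLI's actual structure:

```java
// Sketch: the quiet flag suppresses log output entirely; output is
// collected in a buffer here so the behavior is observable.
public class QuietFlag {
    private final boolean quiet;
    private final StringBuilder out = new StringBuilder();

    QuietFlag(boolean quiet) { this.quiet = quiet; }

    void log(String msg) {
        if (!quiet) {
            out.append(msg).append('\n');
        }
    }

    String output() { return out.toString(); }
}
```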






[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-10-02 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-145026274
  
This looks good to merge. However, there are a lot of merge conflicts. 
Could you rebase on master again? We're thinking about creating a 
release candidate for 0.10 by the end of next week.




[jira] [Commented] (FLINK-2354) Recover running jobs on JobManager failure

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941177#comment-14941177
 ] 

ASF GitHub Bot commented on FLINK-2354:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1153#discussion_r41025127
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
 ---
@@ -178,9 +195,9 @@ public void run() {
 * After this method has been called, the coordinator does not accept 
any further
 * messages and cannot trigger any further checkpoints.
 */
-   public void shutdown() {
+   public void shutdown() throws Exception {
synchronized (lock) {
-   try {   
+   try {
if (shutdown) {
return;
}
--- End diff --

Maybe we can replace this return with simply `if (!shutdown) { the shutdown 
logic }`
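The suggested restructuring, guarding the whole body with the flag instead of returning early, looks roughly like this (a sketch only; the real coordinator performs actual cleanup inside the guard):

```java
// Sketch: idempotent shutdown guarded by a flag under a lock, with the
// shutdown logic inside `if (!shutdown)` instead of behind an early return.
public class ShutdownGuard {
    private final Object lock = new Object();
    private boolean shutdown;
    private int shutdownCount; // stands in for observable cleanup work

    public void shutdown() {
        synchronized (lock) {
            if (!shutdown) {
                shutdown = true;
                shutdownCount++; // the actual shutdown logic would go here
            }
        }
    }

    int count() { return shutdownCount; }
}
```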


> Recover running jobs on JobManager failure
> --
>
> Key: FLINK-2354
> URL: https://issues.apache.org/jira/browse/FLINK-2354
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Affects Versions: master
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 0.10
>
>
> tl;dr Persist JobGraphs in state backend and coordinate reference to state 
> handle via ZooKeeper.
> Problem: When running multiple JobManagers in high availability mode, the 
> leading job manager loses all running jobs when it fails. After a new 
> leading job manager is elected, it is not possible to recover any previously 
> running jobs.
> Solution: The leading job manager, which receives the job graph, writes 1) the 
> job graph to a state backend, and 2) a reference to the respective state 
> handle to ZooKeeper. In general, job graphs can become large (multiple MBs, 
> because they include closures etc.). ZooKeeper is not designed for data of 
> this size. The level of indirection via the reference to the state backend 
> keeps the data in ZooKeeper small.
> Proposed ZooKeeper layout:
> /flink (default)
>   +- currentJobs
>+- job id i
> +- state handle reference of job graph i
> The 'currentJobs' node needs to be persistent to allow recovery of jobs 
> between job managers. The currentJobs node needs to satisfy the following 
> invariant: There is a reference to a job graph with id i IFF the respective 
> job graph needs to be recovered by a newly elected job manager leader.
> With this in place, jobs will be recovered from their initial state (as if 
> resubmitted). The next step is to backup the runtime state handles of 
> checkpoints in a similar manner.
> ---
> This work will be based on [~trohrm...@apache.org]'s implementation of 
> FLINK-2291. The leader election service notifies the job manager about 
> granted/revoked leadership. This notification happens via Akka and thus 
> serially *per* job manager, but results in eventually consistent state 
> between job managers. For brief periods of time it is possible for a new 
> leader to be granted leadership before the old one has had its leadership revoked.
> [~trohrm...@apache.org], can you confirm that leadership does not guarantee 
> mutually exclusive access to the shared 'currentJobs' state?
> For example, the following can happen:
> - JM 1 is leader, JM 2 is standby
> - JOB i is running (and hence /flink/currentJobs/i exists)
> - ZK notifies leader election service (LES) of JM 1 and JM 2
> - LES 2 immediately notifies JM 2 about granted leadership, but LES 1 
> notification revoking leadership takes longer
> - JOB i finishes (TMs don't notice leadership change yet) and JM 1 receives 
> final JobStatusChange
> - JM 2 resubmits the job /flink/currentJobs/i
> - JM 1 removes /flink/currentJobs/i, because it is now finished
> => inconsistent state (wrt the specified invariant above)
> If it is indeed a problem, we can circumvent this with a Curator recipe for 
> [shared locks|http://curator.apache.org/curator-recipes/shared-lock.html] to 
> coordinate the access to currentJobs. The lock needs to be acquired on 
> leadership.
> ---
> Minimum required tests:
> - Unit tests for job graph serialization and writing to state backend and 
> ZooKeeper with expected nodes
> - Unit tests for job submission to job manager in leader/non-leader state
> - Unit tests for leadership granting/revoking and job submission/restarting 
> interleavings
> - Process failure integration tests with single and multiple running jobs
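The indirection described above (the large serialized job graph in the state backend, only a small handle reference in ZooKeeper) can be sketched with plain JDK types. All names here are illustrative, and a `HashMap` stands in for the `/flink/currentJobs` znodes; this is not Flink's actual API:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/** Sketch: large data goes to a file-based "state backend", ZooKeeper
 *  (here a plain Map) only stores the small path reference. */
public class StateHandleIndirection {
    // stand-in for /flink/currentJobs/<jobId> -> state handle reference
    static final Map<String, String> zooKeeper = new HashMap<>();

    static void submit(String jobId, byte[] serializedJobGraph) throws Exception {
        Path handle = Files.createTempFile("jobgraph-" + jobId, ".bin");
        Files.write(handle, serializedJobGraph);  // large data -> state backend
        zooKeeper.put(jobId, handle.toString());  // small reference -> ZooKeeper
    }

    static byte[] recover(String jobId) throws Exception {
        // a newly elected leader resolves the reference and reads the backend
        return Files.readAllBytes(Paths.get(zooKeeper.get(jobId)));
    }

    public static void main(String[] args) throws Exception {
        submit("job-1", "closure-heavy job graph".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(recover("job-1"), StandardCharsets.UTF_8));
    }
}
```

This keeps the data stored in ZooKeeper small regardless of how large the job graph grows.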



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2354] Add job graph and checkpoint reco...

2015-10-02 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1153#discussion_r41025127
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
 ---
@@ -178,9 +195,9 @@ public void run() {
 * After this method has been called, the coordinator does not accept 
any further
 * messages and cannot trigger any further checkpoints.
 */
-   public void shutdown() {
+   public void shutdown() throws Exception {
synchronized (lock) {
-   try {   
+   try {
if (shutdown) {
return;
}
--- End diff --

Maybe we can replace this return with simply `if (!shutdown) { the shutdown 
logic }`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941185#comment-14941185
 ] 

Maximilian Michels commented on FLINK-2810:
---

Yes, bash cannot perform floating point operations. {{bc}} is part of POSIX: 
http://www.unix.com/man-page/posix/1p/bc/ but not included in every 
distribution.

We only need {{bc}} in off-heap mode. Throwing a proper error when {{bc}} does 
not exist is a good suggestion.
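As a sketch of that idea (the variable names are illustrative stand-ins for the ones in taskmanager.sh), the fraction computation could fall back to awk, which POSIX also specifies, when bc is missing:

```shell
#!/bin/sh
# Compute managed memory from a fraction without a hard dependency on bc.
# TM_MAX_MB and FRACTION are hypothetical stand-ins for the real variables.
TM_MAX_MB=4096
FRACTION=0.7

if command -v bc >/dev/null 2>&1; then
    # bc's default scale is 0, so the division truncates to an integer
    MANAGED_MB=$(echo "$TM_MAX_MB * $FRACTION / 1" | bc)
else
    # awk fallback: printf "%d" truncates the same way
    MANAGED_MB=$(awk -v t="$TM_MAX_MB" -v f="$FRACTION" 'BEGIN { printf "%d", t * f }')
fi

echo "$MANAGED_MB"
```

Either branch yields the same truncated result, so a missing bc would no longer surface as a misleading configuration error.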

> Warn user if bc not installed
> -
>
> Key: FLINK-2810
> URL: https://issues.apache.org/jira/browse/FLINK-2810
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: master
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> taskmanager.sh will print the following message when starting the cluster if 
> bc is not installed and off-heap memory is enabled and configured as a ratio. 
> The script should first check that bc is installed and otherwise print a 
> specific message.
> {noformat}
> [ERROR] Configured TaskManager managed memory fraction is not a valid value. 
> Please set taskmanager.memory.fraction in flink-conf.yaml
> {noformat}
> An example of a distribution where bc is not installed by default are the 
> Debian images for Google Compute Engine.





[GitHub] flink pull request: [hotfix] Execute YARN integration tests only u...

2015-10-02 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1210#issuecomment-145042315
  
Let's hope the world moves beyond Hadoop 2.3 quickly so we can activate it 
again for the default version

+1




[jira] [Created] (FLINK-2808) Rework / Extend the StatehandleProvider

2015-10-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2808:
---

 Summary: Rework / Extend the StatehandleProvider
 Key: FLINK-2808
 URL: https://issues.apache.org/jira/browse/FLINK-2808
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


I would like to make some changes (mostly additions) to the 
{{StateHandleProvider}}. Ideally for the upcoming release, as it is somewhat 
part of the public API.

The rationale behind this is to handle, in a nice and extensible way, the creation 
of key/value state backed by various implementations (FS, distributed KV store, 
local KV store with FS backup, ...) and various checkpointing ways (full dump, 
append, incremental keys, ...)

The changes would concretely be:

1.  There should be a default {{StateHandleProvider}} set on the execution 
environment. Functions can later specify the {{StateHandleProvider}} when 
grabbing the {{StreamOperatorState}} from the runtime context (plus optionally 
a {{Checkpointer}})

2.  The {{StreamOperatorState}} is created from the {{StateHandleProvider}}. 
That way, a KeyValueStore state backend can create a {{StreamOperatorState}} 
that directly updates data in the KV store on every access, if that is desired 
(and filter accesses by timestamps to only show committed data)

3.  The StateHandleProvider should have methods to get an output stream that 
writes to the state checkpoint directly (and returns a StateHandle upon 
closing). That way we can convert and dump large state into the checkpoint 
without creating a full copy in memory first.
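Point 3 could be sketched with plain JDK streams as follows; the class and method names are made up for illustration and are not the actual Flink interfaces:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: a backend-provided stream that writes state directly into the
 *  checkpoint location and yields a state handle (here: the file path) on
 *  close, so large state never needs a full in-memory copy first. */
class CheckpointStream extends FilterOutputStream {
    private final Path target;

    CheckpointStream(Path target) throws IOException {
        super(Files.newOutputStream(target));
        this.target = target;
    }

    Path closeAndGetHandle() throws IOException {
        close();        // flush and close the underlying file
        return target;  // the handle that a later recovery can read from
    }

    public static void main(String[] args) throws IOException {
        CheckpointStream out = new CheckpointStream(Files.createTempFile("ckpt", ".bin"));
        out.write("some large state".getBytes());
        Path handle = out.closeAndGetHandle();
        System.out.println(new String(Files.readAllBytes(handle)));
    }
}
```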


Lastly, I would like to change some names
  - {{StateHandleProvider}} to either {{StateBackend}}, {{StateStore}}, or 
{{StateProvider}} (simpler name).
  - {{StreamOperatorState}} to either {{State}} or {{KVState}}.





[jira] [Created] (FLINK-2809) DataSet[Unit] doesn't work

2015-10-02 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-2809:
--

 Summary: DataSet[Unit] doesn't work
 Key: FLINK-2809
 URL: https://issues.apache.org/jira/browse/FLINK-2809
 Project: Flink
  Issue Type: Bug
  Components: Scala API
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor


The following code creates a DataSet\[Unit\]:

val env = ExecutionEnvironment.createLocalEnvironment()
val a = env.fromElements(1,2,3)
val b = a.map (_ => ())
b.writeAsText("/tmp/xxx")
env.execute()

This doesn't work, because a VoidSerializer is created, which can't cope with a 
BoxedUnit. See exception below.

I'm now thinking about creating a UnitSerializer class.
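A minimal sketch of that idea, assuming nothing about Flink's actual TypeSerializer interface: a serializer for a single-valued type writes zero bytes and always returns the same singleton (standing in here for scala.runtime.BoxedUnit.UNIT):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

/** Sketch of a UnitSerializer: the unit type has exactly one value, so
 *  serialization writes no bytes and deserialization returns a singleton. */
public final class UnitSerializer {
    /** Stand-in for scala.runtime.BoxedUnit.UNIT. */
    public static final Object UNIT = new Object();

    public void serialize(Object record, DataOutputStream out) {
        // a unit value carries no information -> zero bytes on the wire
    }

    public Object deserialize(DataInputStream in) {
        // nothing to read; the value is statically known
        return UNIT;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        UnitSerializer s = new UnitSerializer();
        s.serialize(UNIT, new DataOutputStream(buf));
        System.out.println(buf.size());  // nothing was written
        System.out.println(s.deserialize(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) == UNIT);
    }
}
```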



org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast 
to java.lang.Void
at 
org.apache.flink.api.common.typeutils.base.VoidSerializer.serialize(VoidSerializer.java:26)
at 
org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:51)
at 
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:76)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:83)
at 
org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
at 
org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:177)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:564)
at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-02 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941108#comment-14941108
 ] 

Sachin Goel commented on FLINK-2797:


Hi [~mxm], yes. I'm almost done. Just building and testing locally right now.

> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.





[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941119#comment-14941119
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145017991
  
The description of Evictors for the keyed windows is a little bit messy 
regarding what gets evicted from where. There is also some conflict between the 
example and the description: "Retain 1000 elements from the end of the window 
backwards, evicting all others." vs "and every time execution is triggered, 10 
elements are removed from the window".

As far as I understand, triggers and evictors work with the same semantics 
as now, but there is always a fixed time trigger/eviction determined by the 
window assigner.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145017991
  
The description of Evictors for the keyed windows is a little bit messy 
regarding what gets evicted from where. There is also some conflict between the 
example and the description: "Retain 1000 elements from the end of the window 
backwards, evicting all others." vs "and every time execution is triggered, 10 
elements are removed from the window".

As far as I understand, triggers and evictors work with the same semantics 
as now, but there is always a fixed time trigger/eviction determined by the 
window assigner.




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941118#comment-14941118
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145017812
  
In the section "Transformations" you sometimes use the `` syntax to 
highlight words, such as DataStream. Inside the table they don't get properly 
translated, however. (At least on my machine.)


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941151#comment-14941151
 ] 

ASF GitHub Bot commented on FLINK-1789:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-145026274
  
This looks good to merge. However, there are a lot of merge conflicts. 
Could you rebase onto the master again? We're thinking about creating a 
release candidate for 0.10 by the end of next week.


> Allow adding of URLs to the usercode class loader
> -
>
> Key: FLINK-1789
> URL: https://issues.apache.org/jira/browse/FLINK-1789
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Minor
>
> Currently, there is no option to add customs classpath URLs to the 
> FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even 
> if they are already present on all nodes.
> It would be great if RemoteEnvironment also accepts valid classpaths URLs and 
> forwards them to BlobLibraryCacheManager.
> The problem with the current approach is that the code loaded by the regular 
> JVM class loader cannot refer to job specific types (which can be accessed 
> only at the UserCodeClassLoader level). Unfortunately, this is the case if we 
> use the classpath entry to generate the dataflows dynamically at runtime.
> Currently this functionality needs to be done by "hacks" (hardcode a 
> filesystem path next to the list of jars when initializing the BlobManager 
> entry). It makes sense to open an issue which makes this list parameterizable 
> via an additional ExecutionEnvironment argument (this is basically the only 
> main feature which prohibits the use of Emma project with "off-the-shelf" 
> Flink).
> This, of course, would require that the folders are shared (e.g. via NFS) 
> between client, master and workers. I think what made Stephan so excited is 
> the idea of using the same URL mechanism in order to ship the code to all 
> dependent parties (most probably by running a dedicated HTTP or FTP server on 
> the client).
> We make the following assumptions for the use case where we need the global 
> class path:
> - The URL is either a file path that points to a directory accessible to all 
> nodes (NFS or so) and the client runs in the cluster as well.
> - The URL is an HTTP URL or so that points to a file server that serves the 
> classes to work in non-shared directory settings.





[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread senorcarbone
Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145032778
  
Ah, indeed. I just reviewed this with the prospect of custom 
pre-aggregations in mind, and it seems like pre-aggregation strategies operate 
on bucket granularity. On windowing semantics alone, what you suggest covers my 
question though. Thanks ^^




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941179#comment-14941179
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145032778
  
Ah, indeed. I just reviewed this with the prospect of custom 
pre-aggregations in mind, and it seems like pre-aggregation strategies operate 
on bucket granularity. On windowing semantics alone, what you suggest covers my 
question though. Thanks ^^


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Updated] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-2810:
--
Affects Version/s: (was: master)
   0.10

> Warn user if bc not installed
> -
>
> Key: FLINK-2810
> URL: https://issues.apache.org/jira/browse/FLINK-2810
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: 0.10
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 0.10
>
>
> taskmanager.sh will print the following message when starting the cluster if 
> bc is not installed and off-heap memory is enabled and configured as a ratio. 
> The script should first check that bc is installed and otherwise print a 
> specific message.
> {noformat}
> [ERROR] Configured TaskManager managed memory fraction is not a valid value. 
> Please set taskmanager.memory.fraction in flink-conf.yaml
> {noformat}
> An example of a distribution where bc is not installed by default are the 
> Debian images for Google Compute Engine.





[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145017812
  
In the section "Transformations" you sometimes use the `` syntax to 
highlight words, such as DataStream. Inside the table they don't get properly 
translated, however. (At least on my machine.)




[jira] [Commented] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941138#comment-14941138
 ] 

Stephan Ewen commented on FLINK-2810:
-

Do the scripts strictly need {{bc}}? You can do quite a bit of math in 
bash as well...

> Warn user if bc not installed
> -
>
> Key: FLINK-2810
> URL: https://issues.apache.org/jira/browse/FLINK-2810
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: master
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> taskmanager.sh will print the following message when starting the cluster if 
> bc is not installed and off-heap memory is enabled and configured as a ratio. 
> The script should first check that bc is installed and otherwise print a 
> specific message.
> {noformat}
> [ERROR] Configured TaskManager managed memory fraction is not a valid value. 
> Please set taskmanager.memory.fraction in flink-conf.yaml
> {noformat}
> An example of a distribution where bc is not installed by default are the 
> Debian images for Google Compute Engine.





[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941142#comment-14941142
 ] 

ASF GitHub Bot commented on FLINK-2797:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1214#issuecomment-145023847
  
I tested the normal execution mode. However, I don't have a yarn setup. Can 
someone give it a quick check with yarn?
Quick question though: In detached mode, print or collect throw exceptions. 
Is that intended?


> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.





[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread senorcarbone
Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145025577
  
Nice doc so far ^^

One tiny fix in the *Advanced window constructs* subsection:

[0,1000], [100,1100], **[200,1200]**, ..., [1000, 2000]




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941148#comment-14941148
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145025216
  
It would be a nice addition to the docs to state that a global time reduce (and 
other aggregations) is pre-aggregated in parallel.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145025216
  
It would be a nice addition to the docs to state that a global time reduce (and 
other aggregations) is pre-aggregated in parallel.




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941144#comment-14941144
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145024435
  
Global windows are not parallel in any system; it is inherent in the 
operation.

You can pre-aggregate in parallel if the windows are time windows. These 
are probably the most common windows. Any other windows are not even supported 
in any other system that I have worked with...
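The two-phase idea can be sketched with a plain reduce (illustrative code, not Flink's operator implementation): each parallel instance pre-reduces its own elements for the window, and only the per-instance partials reach the single global reducer:

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of parallel pre-aggregation for a global time-window reduce. */
public class PreAggregation {
    static int reduceWindow(List<List<Integer>> perInstanceElements) {
        // phase 1: parallel partial reduce (one partial per parallel instance)
        int[] partials = perInstanceElements.stream()
                .mapToInt(inst -> inst.stream().mapToInt(Integer::intValue).sum())
                .toArray();
        // phase 2: the non-parallel final reduce only sees the few partials
        return Arrays.stream(partials).sum();
    }

    public static void main(String[] args) {
        System.out.println(reduceWindow(Arrays.asList(
                Arrays.asList(1, 2, 3), Arrays.asList(4, 5))));
    }
}
```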


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941163#comment-14941163
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145029475
  
@senorcarbone It doesn't restrict anything. You can use an assigner that assigns to 
one single window. Then, using a trigger and an evictor, you can implement everything 
that was possible before.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...

2015-10-02 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/1214

[FLINK-2797][cli] Add support for running jobs in detached mode from CLI

Adds an option `-d` to run jobs in detached mode.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachingoel0101/flink detached_cli

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1214


commit f75467cddbc182fdc42061c9ca602e20690a4ef5
Author: Sachin Goel 
Date:   2015-10-02T12:41:09Z

[FLINK-2797][cli] Add support for running jobs in detached mode from CLI






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145021034
  
There should be some streaming specific stuff in the `Execution Config` 
section. For example, there is `enableTimestamps()` and 
`setAutoWatermarkInterval()`.




[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145029475
  
@senorcarbone It doesn't restrict anything. You can use an assigner that assigns to 
one single window. Then, using a trigger and an evictor, you can implement everything 
that was possible before.




[jira] [Updated] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan updated FLINK-2810:
--
Description: 
taskmanager.sh will print the following message when starting the cluster if bc 
is not installed and off-heap memory is enabled and configured as a ratio. The 
script should first check that bc is installed and otherwise print a specific 
message.

{noformat}
[ERROR] Configured TaskManager managed memory fraction is not a valid value. 
Please set 'taskmanager.memory.fraction' in flink-conf.yaml
{noformat}

An example of a distribution where bc is not installed by default are the 
Debian images for Google Compute Engine.

  was:
taskmanager.sh will print the following message when starting the cluster if bc 
is not installed and off-heap memory is enabled and configured as a ratio. The 
script should first check that bc is installed and otherwise print a specific 
message.

{noformat}
[ERROR] Configured TaskManager managed memory fraction is not a valid value. 
Please set taskmanager.memory.fraction in flink-conf.yaml
{noformat}

An example of a distribution where bc is not installed by default are the 
Debian images for Google Compute Engine.


> Warn user if bc not installed
> -
>
> Key: FLINK-2810
> URL: https://issues.apache.org/jira/browse/FLINK-2810
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: 0.10
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 0.10
>
>
> taskmanager.sh will print the following message when starting the cluster if 
> bc is not installed and off-heap memory is enabled and configured as a ratio. 
> The script should first check that bc is installed and otherwise print a 
> specific message.
> {noformat}
> [ERROR] Configured TaskManager managed memory fraction is not a valid value. 
> Please set 'taskmanager.memory.fraction' in flink-conf.yaml
> {noformat}
> The Debian images for Google Compute Engine are an example of a distribution 
> where bc is not installed by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2790] [yarn] [ha] Adds high availabilit...

2015-10-02 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1213

[FLINK-2790] [yarn] [ha] Adds high availability support for Yarn

Adds high availability support for Yarn by exploiting Yarn's functionality 
to restart a failed application master. Depending on the Hadoop version, the 
behaviour is an increasing superset of the preceding version's functionality.

### 2.3.0 <= version < 2.4.0

* Sets the number of application attempts to the configuration value 
`yarn.application-attempts`. This means that the application can be restarted 
`yarn.application-attempts` times before Yarn fails the application. In case of 
an application master failure, all other TaskManager containers will also be killed.

### 2.4.0 <= version < 2.6.0

* Additionally, enables keeping containers across application attempts. This 
avoids killing TaskManager containers in the case of an application master 
failure. This has the advantage that the startup time is faster and that the 
user does not have to wait to obtain the container resources again.

### 2.6.0 <= version

* Sets the attempts failure validity interval to the Akka timeout value. 
The attempts failure validity interval means that an application is only killed 
after the system has seen the maximum number of application attempts during one 
such interval. This prevents a long-running job from eventually depleting its 
application attempts.
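The number of restarts referenced above is controlled by a single entry in flink-conf.yaml; a sketch (the value 10 is only an example, not a recommended default):

```yaml
# Number of ApplicationMaster (re)starts Yarn performs before failing the job.
yarn.application-attempts: 10
```

On Hadoop >= 2.6.0 the attempts-failure validity interval means these attempts only count against the job when they occur within one interval.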

This PR also refactors the different Yarn components to allow the start-up 
of testing actors within Yarn. Furthermore, the `JobManager` start-up logic is 
slightly extended to allow code reuse in the `ApplicationMasterBase`.

The HA functionality is tested via the `YARNHighAvailabilityITCase`, where 
the application master is killed multiple times. Each time, it is checked that the 
single TaskManager successfully reconnects to the newly started 
`YarnJobManager`. In case of version `2.3.0`, the `TaskManager` is restarted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink yarnHA

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1213


commit 1a18172ae69eb576638704f8e143a921aa8630d5
Author: Till Rohrmann 
Date:   2015-09-01T14:35:48Z

[FLINK-2790] [yarn] [ha] Adds high availability support for Yarn

commit 5359676556d16610303d4f36fcbe5320ef4e6643
Author: Till Rohrmann 
Date:   2015-09-23T15:42:57Z

Refactors JobManager's start actors method to be reusable

commit d6a47cd8ad265c5122d1a67c09773cbc5a491261
Author: Till Rohrmann 
Date:   2015-09-24T12:55:18Z

Yarn refactoring to introduce yarn testing functionality

commit f9578f136dd41cd9829d712f7c62a59c9ea8e194
Author: Till Rohrmann 
Date:   2015-09-28T16:21:30Z

Added support for testing yarn cluster. Extracted JobManager's and 
TaskManager's testing messages into stackable traits.

commit dbfa16438ad9d7d61e8d1a582c8cd1de9352078e
Author: Till Rohrmann 
Date:   2015-09-29T15:05:01Z

Implemented YarnHighAvailabilityITCase using Akka messages for 
synchronization.






[jira] [Commented] (FLINK-2790) Add high availability support for Yarn

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941081#comment-14941081
 ] 

ASF GitHub Bot commented on FLINK-2790:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1213

[FLINK-2790] [yarn] [ha] Adds high availability support for Yarn

Adds high availability support for Yarn by exploiting Yarn's functionality 
to restart a failed application master. Depending on the Hadoop version, the 
behaviour is an increasing superset of the preceding version's functionality.

### 2.3.0 <= version < 2.4.0

* Sets the number of application attempts to the configuration value 
`yarn.application-attempts`. This means that the application can be restarted 
`yarn.application-attempts` times before Yarn fails the application. In case of 
an application master failure, all other TaskManager containers will also be killed.

### 2.4.0 <= version < 2.6.0

* Additionally, enables keeping containers across application attempts. This 
avoids killing TaskManager containers in the case of an application master 
failure. This has the advantage that the startup time is faster and that the 
user does not have to wait to obtain the container resources again.

### 2.6.0 <= version

* Sets the attempts failure validity interval to the Akka timeout value. 
The attempts failure validity interval means that an application is only killed 
after the system has seen the maximum number of application attempts during one 
such interval. This prevents a long-running job from eventually depleting its 
application attempts.

This PR also refactors the different Yarn components to allow the start-up 
of testing actors within Yarn. Furthermore, the `JobManager` start-up logic is 
slightly extended to allow code reuse in the `ApplicationMasterBase`.

The HA functionality is tested via the `YARNHighAvailabilityITCase`, where 
the application master is killed multiple times. Each time, it is checked that the 
single TaskManager successfully reconnects to the newly started 
`YarnJobManager`. In case of version `2.3.0`, the `TaskManager` is restarted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink yarnHA

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1213


commit 1a18172ae69eb576638704f8e143a921aa8630d5
Author: Till Rohrmann 
Date:   2015-09-01T14:35:48Z

[FLINK-2790] [yarn] [ha] Adds high availability support for Yarn

commit 5359676556d16610303d4f36fcbe5320ef4e6643
Author: Till Rohrmann 
Date:   2015-09-23T15:42:57Z

Refactors JobManager's start actors method to be reusable

commit d6a47cd8ad265c5122d1a67c09773cbc5a491261
Author: Till Rohrmann 
Date:   2015-09-24T12:55:18Z

Yarn refactoring to introduce yarn testing functionality

commit f9578f136dd41cd9829d712f7c62a59c9ea8e194
Author: Till Rohrmann 
Date:   2015-09-28T16:21:30Z

Added support for testing yarn cluster. Extracted JobManager's and 
TaskManager's testing messages into stackable traits.

commit dbfa16438ad9d7d61e8d1a582c8cd1de9352078e
Author: Till Rohrmann 
Date:   2015-09-29T15:05:01Z

Implemented YarnHighAvailabilityITCase using Akka messages for 
synchronization.




> Add high availability support for Yarn
> --
>
> Key: FLINK-2790
> URL: https://issues.apache.org/jira/browse/FLINK-2790
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Reporter: Till Rohrmann
> Fix For: 0.10
>
>
> Add master high availability support for Yarn. The idea is to let Yarn 
> restart a failed application master in a new container. For that, we set the 
> number of application retries to something greater than 1. 
> From version 2.4.0 onwards, it is possible to reuse already started 
> containers for the TaskManagers, thus, avoiding unnecessary restart delays.
> From version 2.6.0 onwards, it is possible to specify an interval in which 
> the number of application attempts has to be exceeded in order to fail the 
> job. This will prevent long-running jobs from eventually depleting all 
> available application attempts.





[jira] [Created] (FLINK-2810) Warn user if bc not installed

2015-10-02 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-2810:
-

 Summary: Warn user if bc not installed
 Key: FLINK-2810
 URL: https://issues.apache.org/jira/browse/FLINK-2810
 Project: Flink
  Issue Type: Improvement
  Components: Command-line client
Affects Versions: master
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor


taskmanager.sh will print the following message when starting the cluster if bc 
is not installed and off-heap memory is enabled and configured as a ratio. The 
script should first check that bc is installed and otherwise print a specific 
message.

{noformat}
[ERROR] Configured TaskManager managed memory fraction is not a valid value. 
Please set taskmanager.memory.fraction in flink-conf.yaml
{noformat}

The Debian images for Google Compute Engine are an example of a distribution 
where bc is not installed by default.





[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941128#comment-14941128
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020080
  
@gyfora Yes, they work with the same semantics. What do you mean by "fix 
time trigger/eviction determined by the window assigner"?


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020080
  
@gyfora Yes, they work with the same semantics. What do you mean by "fix 
time trigger/eviction determined by the window assigner"?




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941129#comment-14941129
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020261
  
@aljoscha : what I meant was just that we define some time semantics with 
the window assigner


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020330
  
@gyfora, yes, that's right





[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020261
  
@aljoscha : what I meant was just that we define some time semantics with 
the window assigner




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941130#comment-14941130
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020330
  
@gyfora, yes, that's right



> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020475
  
The ConnectedStreams example formatting is off (in the Transformations 
section); also, there should be something like:
```java
ConnectedStreams<> connectedStreams = someStream.connect(otherStream)
```




[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020604
  
"Windows on unkeyed data streams (non-parallel windows)"
I still think that this gives the false impression that all global windows, 
including time windows, are executed in a non-parallel way.

I can already hear people say that Spark can do global time windows in 
parallel while Flink can't :P




[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020534
  
If split is mentioned select should also be mentioned.




[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941134#comment-14941134
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020604
  
"Windows on unkeyed data streams (non-parallel windows)"
I still think that this gives the false impression that all global windows, 
including time windows, are executed in a non-parallel way.

I can already hear people say that Spark can do global time windows in 
parallel while Flink can't :P


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941132#comment-14941132
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020475
  
The ConnectedStreams example formatting is off (in the Transformations 
section); also, there should be something like:
```java
ConnectedStreams<> connectedStreams = someStream.connect(otherStream)
```


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941133#comment-14941133
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145020534
  
If split is mentioned select should also be mentioned.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941146#comment-14941146
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145024860
  
That is what I mean, global time reduce.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[jira] [Commented] (FLINK-2779) Update documentation to reflect new Stream/Window API

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941160#comment-14941160
 ] 

ASF GitHub Bot commented on FLINK-2779:
---

Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145028978
  
From a quick read of the documentation (and prior knowledge from Google 
Dataflow) it is easy to get a full picture of the new semantics. Even though I 
like it from an engineering perspective, it needs to be pointed out that the 
addition of the *Assigner* restricts our previous windowing semantics to 
periodic windows, i.e. at the moment you receive a record you have to know all 
the possible buckets this record will fall into, whereas before we did not have 
such a constraint. In other words, we no longer support dynamic windows whose 
range and slide can change dynamically.


> Update documentation to reflect new Stream/Window API
> -
>
> Key: FLINK-2779
> URL: https://issues.apache.org/jira/browse/FLINK-2779
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Tzoumas
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-2779][FLINK-2794][streaming][docs] New ...

2015-10-02 Thread senorcarbone
Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1208#issuecomment-145028978
  
From a quick read of the documentation (and prior knowledge from Google 
Dataflow) it is easy to get a full picture of the new semantics. Even though I 
like it from an engineering perspective, it needs to be pointed out that the 
addition of the *Assigner* restricts our previous windowing semantics to 
periodic windows, i.e. at the moment you receive a record you have to know all 
the possible buckets this record will fall into, whereas before we did not have 
such a constraint. In other words, we no longer support dynamic windows whose 
range and slide can change dynamically.




[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941161#comment-14941161
 ] 

ASF GitHub Bot commented on FLINK-2797:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1214#issuecomment-145029090
  
Looks good but the Maven Checkstyle check doesn't pass...


> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.





[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...

2015-10-02 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1214#issuecomment-145029090
  
Looks good but the Maven Checkstyle check doesn't pass...




[jira] [Created] (FLINK-2802) Watermark triggered operators cannot progress with cyclic flows

2015-10-02 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2802:
-

 Summary: Watermark triggered operators cannot progress with cyclic 
flows
 Key: FLINK-2802
 URL: https://issues.apache.org/jira/browse/FLINK-2802
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Gyula Fora
Priority: Blocker


The problem is that we can easily create a cyclic watermark (time) dependency 
in the stream graph, which will result in a deadlock for watermark-triggered 
operators such as the `WindowOperator`.

A solution to this could be to emit a Long.MAX_VALUE watermark from the 
iteration sources.
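To illustrate why a Long.MAX_VALUE watermark from the iteration source unblocks the cycle: an operator's event time advances with the minimum watermark over its input channels, so a channel pinned at MAX_VALUE never holds the minimum back. A self-contained sketch of that min computation (plain Java, not Flink code; names are illustrative):

```java
import java.util.Arrays;

// Toy model of watermark propagation: an operator's current watermark is the
// minimum over all input channels. A feedback edge that never emits a
// watermark stays at Long.MIN_VALUE and blocks everything; emitting
// Long.MAX_VALUE on that edge makes it transparent to the min computation.
public class WatermarkSketch {
    static long operatorWatermark(long[] channelWatermarks) {
        return Arrays.stream(channelWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Silent feedback edge -> watermark stuck at MIN_VALUE -> deadlock.
        System.out.println(operatorWatermark(new long[]{42L, Long.MIN_VALUE}));
        // Feedback edge emits MAX_VALUE -> the regular input drives progress.
        System.out.println(operatorWatermark(new long[]{42L, Long.MAX_VALUE}));
    }
}
```

The second call shows the proposed fix: the operator's watermark follows the regular input (42) instead of being stuck.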





[jira] [Commented] (FLINK-2525) Add configuration support in Storm-compatibility

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940918#comment-14940918
 ] 

ASF GitHub Bot commented on FLINK-2525:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1046


> Add configuration support in Storm-compatibility
> 
>
> Key: FLINK-2525
> URL: https://issues.apache.org/jira/browse/FLINK-2525
> Project: Flink
>  Issue Type: New Feature
>  Components: Storm Compatibility
>Reporter: fangfengbin
>Assignee: fangfengbin
>
> Spouts and Bolts are initialized by a call to `Spout.open(...)` and 
> `Bolt.prepare()`, respectively. Both methods have a config `Map` as first 
> parameter. This map is currently not populated, so Spouts and Bolts cannot 
> be configured with user-defined parameters. In order to support this feature, 
> the spout and bolt wrapper classes need to be extended to create a proper `Map` 
> object. Furthermore, the clients need to be extended to take a `Map` and 
> translate it into a Flink `Configuration` that is forwarded to the wrappers 
> for proper initialization of the map.





[GitHub] flink pull request: [FLINK-2525]Add configuration support in Storm...

2015-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1046




[jira] [Commented] (FLINK-2561) Sync Gelly Java and Scala APIs

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940970#comment-14940970
 ] 

ASF GitHub Bot commented on FLINK-2561:
---

GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/1211

[FLINK-2561] Adds gelly-scala examples

This PR addresses the remaining issues of FLINK-2561.
It adds 3 gelly-scala examples: one vertex-centric SSSP, one GSA SSSP, and 
one showing how to use library methods (connected components).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink gelly-scala-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1211


commit 836e3959107c0bebe45888fccdb27c1e87d1011e
Author: vasia 
Date:   2015-10-01T20:26:25Z

[FLINK-2561] [gelly] add gelly-scala examples: vertex-centric SSSP, GSA SSSP
and how to use a library method (connected components).

commit 9d3ca468609228c4b0764058e4164995610b0d9c
Author: vasia 
Date:   2015-10-02T08:29:04Z

[gelly] fix parameters order in creation methods to be consistent with the 
Java API

commit 2b4063bad223b2aefd0134f4dfbba3d83fd08933
Author: vasia 
Date:   2015-10-02T09:08:34Z

[gelly] style corrections




> Sync Gelly Java and Scala APIs
> --
>
> Key: FLINK-2561
> URL: https://issues.apache.org/jira/browse/FLINK-2561
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 0.10
>
>
> There is some functionality and tests missing from the Gelly Scala API. This 
> should be added, together with documentation, a completeness test and some 
> usage examples.





[GitHub] flink pull request: [FLINK-2167] [table] Add fromHCat() to TableEn...

2015-10-02 Thread twalthr
Github user twalthr commented on the pull request:

https://github.com/apache/flink/pull/1127#issuecomment-145001030
  
Again, thanks for the feedback.
I will close this PR and open 2 separate PRs for TableSources and 
HCatInputFormat.




[jira] [Commented] (FLINK-2167) Add fromHCat() to TableEnvironment

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941063#comment-14941063
 ] 

ASF GitHub Bot commented on FLINK-2167:
---

Github user twalthr commented on the pull request:

https://github.com/apache/flink/pull/1127#issuecomment-145001030
  
Again, thanks for the feedback.
I will close this PR and open 2 separate PRs for TableSources and 
HCatInputFormat.


> Add fromHCat() to TableEnvironment
> --
>
> Key: FLINK-2167
> URL: https://issues.apache.org/jira/browse/FLINK-2167
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Timo Walther
>Priority: Minor
>  Labels: starter
>
> Add a {{fromHCat()}} method to the {{TableEnvironment}} to read a {{Table}} 
> from an HCatalog table.
> The implementation could reuse Flink's HCatInputFormat.





[GitHub] flink pull request: [FLINK-2167] [table] Add fromHCat() to TableEn...

2015-10-02 Thread twalthr
Github user twalthr closed the pull request at:

https://github.com/apache/flink/pull/1127




[jira] [Commented] (FLINK-2167) Add fromHCat() to TableEnvironment

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941064#comment-14941064
 ] 

ASF GitHub Bot commented on FLINK-2167:
---

Github user twalthr closed the pull request at:

https://github.com/apache/flink/pull/1127


> Add fromHCat() to TableEnvironment
> --
>
> Key: FLINK-2167
> URL: https://issues.apache.org/jira/browse/FLINK-2167
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Timo Walther
>Priority: Minor
>  Labels: starter
>
> Add a {{fromHCat()}} method to the {{TableEnvironment}} to read a {{Table}} 
> from an HCatalog table.
> The implementation could reuse Flink's HCatInputFormat.





[GitHub] flink pull request: [FLINK-2561] Adds gelly-scala examples

2015-10-02 Thread vasia
GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/1211

[FLINK-2561] Adds gelly-scala examples

This PR addresses the remaining issues of FLINK-2561.
It adds three gelly-scala examples: one vertex-centric SSSP, one GSA SSSP, and 
one showing how to use library methods (connected components).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink gelly-scala-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1211


commit 836e3959107c0bebe45888fccdb27c1e87d1011e
Author: vasia 
Date:   2015-10-01T20:26:25Z

[FLINK-2561] [gelly] add gelly-scala examples: vertex-centric SSSP, GSA SSSP
and how to use a library method (connected components).

commit 9d3ca468609228c4b0764058e4164995610b0d9c
Author: vasia 
Date:   2015-10-02T08:29:04Z

[gelly] fix parameters order in creation methods to be consistent with the 
Java API

commit 2b4063bad223b2aefd0134f4dfbba3d83fd08933
Author: vasia 
Date:   2015-10-02T09:08:34Z

[gelly] style corrections




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2642) Scala Table API crashes when executing word count example

2015-10-02 Thread Jonas Traub (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941000#comment-14941000
 ] 

Jonas Traub commented on FLINK-2642:


Thank you very much Timo for fixing this!

> Scala Table API crashes when executing word count example
> -
>
> Key: FLINK-2642
> URL: https://issues.apache.org/jira/browse/FLINK-2642
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
> Environment: current master (0.10)
>Reporter: Jonas Traub
>Assignee: Timo Walther
>
> I tried to run the examples provided in the documentation of Flink's Table 
> API. Unfortunately, the Scala word count example provided in the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/libs/table.html]
>  doesn't work and does not give a meaningful exception.
> (Other examples work fine)
> Here is my code:
> {code:java}
> package org.apache.flink.examples.scala
> import org.apache.flink.api.scala._
> import org.apache.flink.api.scala.table._
> object WordCount {
>   def main(args: Array[String]): Unit = {
> // set up execution environment
> val env = ExecutionEnvironment.getExecutionEnvironment
> case class WC(word: String, count: Int)
> val input = env.fromElements(WC("hello", 1), WC("hello", 1), WC("ciao", 1))
> val expr = input.toTable
> val result = expr.groupBy('word).select('word, 'count.sum as 'count).toDataSet[WC]
> result.print()
>   }
> }
> {code}
> Here is the thrown exception:
> {code}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:414)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:104)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: The user defined 'open(Configuration)' method 
> in class org.apache.flink.api.table.runtime.ExpressionSelectFunction caused 
> an exception: null
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.openUserCode(RegularPactTask.java:1368)
>   at 
> org.apache.flink.runtime.operators.chaining.ChainedMapDriver.openTask(ChainedMapDriver.java:47)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.openChainedTasks(RegularPactTask.java:1408)
>   at 
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:142)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.api.table.codegen.IndentStringContext$$anonfun$j$2.apply(Indenter.scala:30)
>   at 
> org.apache.flink.api.table.codegen.IndentStringContext$$anonfun$j$2.apply(Indenter.scala:23)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> 

[jira] [Commented] (FLINK-2576) Add outer joins to API and Optimizer

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941053#comment-14941053
 ] 

ASF GitHub Bot commented on FLINK-2576:
---

Github user jkovacs commented on the pull request:

https://github.com/apache/flink/pull/1138#issuecomment-144997953
  
Thanks @fhueske and @StephanEwen for the comprehensive review and 
additional details on Flink internals! I agree that we should rather wait and 
implement the projection join correctly at a later point.
I'll append a few commits addressing the review comments and squash them 
later into the appropriate commits when you feel it's ready to merge.


> Add outer joins to API and Optimizer
> 
>
> Key: FLINK-2576
> URL: https://issues.apache.org/jira/browse/FLINK-2576
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API, Optimizer, Scala API
>Reporter: Ricky Pogalz
>Priority: Minor
> Fix For: pre-apache
>
>
> Add left/right/full outer join methods to the DataSet APIs (Java, Scala) and 
> to the optimizer of Flink.
> Initially, the execution strategy should be a sort-merge outer join 
> (FLINK-2105) but can later be extended to hash joins for left/right outer 
> joins.
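For intuition, the sort-merge strategy mentioned above (FLINK-2105) can be sketched on plain sorted lists. This is a minimal, Flink-free illustration under simplifying assumptions — the class and method names are invented, keys are bare integers, and duplicate keys are ignored for brevity: walk both sorted inputs in lockstep, emit matched pairs on equal keys, and pad the non-matching side with null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OuterJoinSketch {

    // Full outer join of two key-sorted lists: advance the side with the
    // smaller key and emit a null on the other side; emit matched pairs
    // when the keys are equal. (Duplicate keys are ignored for brevity.)
    static List<Integer[]> fullOuterJoin(List<Integer> left, List<Integer> right) {
        List<Integer[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int l = left.get(i), r = right.get(j);
            if (l == r) {
                out.add(new Integer[]{l, r}); i++; j++;
            } else if (l < r) {
                out.add(new Integer[]{l, null}); i++;   // no right match
            } else {
                out.add(new Integer[]{null, r}); j++;   // no left match
            }
        }
        // Drain whichever side still has unmatched elements.
        while (i < left.size()) out.add(new Integer[]{left.get(i++), null});
        while (j < right.size()) out.add(new Integer[]{null, right.get(j++)});
        return out;
    }

    public static void main(String[] args) {
        for (Integer[] p : fullOuterJoin(Arrays.asList(1, 2, 4), Arrays.asList(2, 3, 4))) {
            System.out.println(p[0] + "," + p[1]);  // prints 1,null / 2,2 / null,3 / 4,4
        }
    }
}
```

A left or right outer join falls out of the same loop by padding only one side; the hash-based variants the issue mentions for left/right outer joins differ only in how matches are found, not in the padding semantics.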







[jira] [Commented] (FLINK-2801) Rework Storm Compatibility Tests

2015-10-02 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940923#comment-14940923
 ] 

Matthias J. Sax commented on FLINK-2801:


We need the STOP signal (FLINK-2111) to make this work smoothly. 

> Rework Storm Compatibility Tests
> 
>
> Key: FLINK-2801
> URL: https://issues.apache.org/jira/browse/FLINK-2801
> Project: Flink
>  Issue Type: Bug
>  Components: Storm Compatibility
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>
> The tests for the storm compatibility layer are all working with timeouts 
> (running the program for 10 seconds) and then checking whether the expected 
> result has been written.
> That is inherently unstable and slow (long delays). They should be rewritten 
> in a similar manner like for example the KafkaITCase tests, where the 
> streaming jobs terminate themselves with a "SuccessException", which can be 
> recognized as successful completion when thrown by the job client.
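The KafkaITCase-style pattern described above can be sketched without any Flink dependencies. This is a hypothetical illustration with invented names: the job signals completion by throwing a marker exception, and the test walks the cause chain of the exception surfaced by the job client to distinguish success from a genuine failure.

```java
public class SuccessExceptionSketch {

    // Marker exception thrown by the job itself once the expected result is observed.
    static class SuccessException extends RuntimeException {}

    // Stand-in for a streaming job: user code throws the marker, and the
    // job client reports it wrapped as a generic execution failure.
    static void runJob() throws Exception {
        try {
            throw new SuccessException();
        } catch (SuccessException e) {
            throw new Exception("Job execution failed.", e);
        }
    }

    // A SuccessException anywhere in the cause chain means the test passed.
    static boolean isSuccessful(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SuccessException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            runJob();
        } catch (Exception e) {
            System.out.println(isSuccessful(e) ? "test passed" : "test failed");
        }
    }
}
```

The point of the pattern is that the test terminates itself deterministically instead of waiting out a fixed timeout, which removes both the flakiness and the built-in delay.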





[jira] [Commented] (FLINK-2801) Rework Storm Compatibility Tests

2015-10-02 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940936#comment-14940936
 ] 

Stephan Ewen commented on FLINK-2801:
-

I don't think we do. We don't have that signal for other jobs either, and it 
works pretty well.

> Rework Storm Compatibility Tests
> 
>
> Key: FLINK-2801
> URL: https://issues.apache.org/jira/browse/FLINK-2801
> Project: Flink
>  Issue Type: Bug
>  Components: Storm Compatibility
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>
> The tests for the storm compatibility layer are all working with timeouts 
> (running the program for 10 seconds) and then checking whether the expected 
> result has been written.
> That is inherently unstable and slow (long delays). They should be rewritten 
> in a similar manner like for example the KafkaITCase tests, where the 
> streaming jobs terminate themselves with a "SuccessException", which can be 
> recognized as successful completion when thrown by the job client.





[jira] [Commented] (FLINK-2642) Scala Table API crashes when executing word count example

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940935#comment-14940935
 ] 

ASF GitHub Bot commented on FLINK-2642:
---

GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/1209

[FLINK-2642] [table] Scala Table API crashes when executing word count 
example

This PR improves the type checking during table translation. The 
error message for [FLINK-2642] is now correct.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink TableApiWCFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1209


commit 75b2a111729e5a97573e86b64bacf9496bde8a4c
Author: twalthr 
Date:   2015-10-02T09:12:01Z

[FLINK-2642] [table] Scala Table API crashes when executing word count 
example




> Scala Table API crashes when executing word count example
> -
>
> Key: FLINK-2642
> URL: https://issues.apache.org/jira/browse/FLINK-2642
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
> Environment: current master (0.10)
>Reporter: Jonas Traub
>Assignee: Timo Walther
>
> I tried to run the examples provided in the documentation of Flink's Table 
> API. Unfortunately, the Scala word count example provided in the 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-master/libs/table.html]
>  doesn't work and does not give a meaningful exception.
> (Other examples work fine)
> Here is my code:
> {code:java}
> package org.apache.flink.examples.scala
> import org.apache.flink.api.scala._
> import org.apache.flink.api.scala.table._
> object WordCount {
>   def main(args: Array[String]): Unit = {
> // set up execution environment
> val env = ExecutionEnvironment.getExecutionEnvironment
> case class WC(word: String, count: Int)
> val input = env.fromElements(WC("hello", 1), WC("hello", 1), WC("ciao", 1))
> val expr = input.toTable
> val result = expr.groupBy('word).select('word, 'count.sum as 'count).toDataSet[WC]
> result.print()
>   }
> }
> {code}
> Here is the thrown exception:
> {code}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:414)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:104)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: The user defined 'open(Configuration)' method 
> in class org.apache.flink.api.table.runtime.ExpressionSelectFunction caused 
> an exception: null
>   at 
> 

[jira] [Commented] (FLINK-2806) No TypeInfo for Scala's Nothing type

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940999#comment-14940999
 ] 

ASF GitHub Bot commented on FLINK-2806:
---

GitHub user ggevay opened a pull request:

https://github.com/apache/flink/pull/1212

[FLINK-2806] [scala-api] Add a TypeInformation[Nothing].

I added a TypeInfo[Nothing] class, and added a line to work around the 
compiler bug. I also added a test for this, which checks at compile time: if the 
"implicit val scalaNothingTypeInfo" is not there, then it does not compile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ggevay/flink typeInfo-nothing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1212


commit a73a96a5de5a271ed4a3ca55aea14c34072aa7e7
Author: Gabor Gevay 
Date:   2015-10-02T08:14:01Z

[FLINK-2806] [scala-api] Add a TypeInformation[Nothing].




> No TypeInfo for Scala's Nothing type
> 
>
> Key: FLINK-2806
> URL: https://issues.apache.org/jira/browse/FLINK-2806
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Minor
>
> When writing some generic code, I encountered a situation where I needed a 
> TypeInformation[Nothing]. Two problems prevent me from getting it:
> 1. TypeInformationGen.mkTypeInfo doesn't return a real 
> TypeInformation[Nothing]. (It actually returns a cast null in that case.)
> 2. The line
> implicit def createTypeInformation[T]: TypeInformation[T] = macro 
> TypeUtils.createTypeInfo[T]
> does not fire in some situations when it should, when T = Nothing. (I guess 
> this is a compiler bug.)
> I will open a PR shortly.







[jira] [Commented] (FLINK-2807) Add javadocs/comments to new windowing mechanics

2015-10-02 Thread Kostas Tzoumas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941024#comment-14941024
 ] 

Kostas Tzoumas commented on FLINK-2807:
---

If it helps temporarily, I have a draft of the user-facing documentation, which 
tracks the new window mechanics implementation here: 
https://github.com/apache/flink/pull/1208 

We should have javadocs before merging.

> Add javadocs/comments to new windowing mechanics
> 
>
> Key: FLINK-2807
> URL: https://issues.apache.org/jira/browse/FLINK-2807
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Gyula Fora
>Priority: Critical
> Fix For: 0.10
>
>
> The newly introduced windowing mechanics (operators and interfaces) currently 
> have almost no javadocs or other comments about the working mechanisms.
> This makes the review process practically impossible.





[jira] [Commented] (FLINK-2576) Add outer joins to API and Optimizer

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941060#comment-14941060
 ] 

ASF GitHub Bot commented on FLINK-2576:
---

Github user jkovacs commented on a diff in the pull request:

https://github.com/apache/flink/pull/1138#discussion_r41014523
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/operators/base/OuterJoinOperatorBase.java
 ---
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.common.operators.base;
+
+import org.apache.commons.collections.ResettableIterator;
+import org.apache.commons.collections.iterators.ListIteratorWrapper;
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.functions.FlatJoinFunction;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.functions.util.CopyingListCollector;
+import org.apache.flink.api.common.functions.util.FunctionUtils;
+import org.apache.flink.api.common.operators.BinaryOperatorInformation;
+import org.apache.flink.api.common.operators.util.ListKeyGroupedIterator;
+import org.apache.flink.api.common.operators.util.UserCodeClassWrapper;
+import org.apache.flink.api.common.operators.util.UserCodeObjectWrapper;
+import org.apache.flink.api.common.operators.util.UserCodeWrapper;
+import org.apache.flink.api.common.typeinfo.AtomicType;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.common.typeutils.CompositeType;
+import org.apache.flink.api.common.typeutils.GenericPairComparator;
+import org.apache.flink.api.common.typeutils.TypeComparator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.util.Collector;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+
+public class OuterJoinOperatorBase<IN1, IN2, OUT, FT extends FlatJoinFunction<IN1, IN2, OUT>> extends AbstractJoinOperatorBase<IN1, IN2, OUT, FT> {
+
+   public static enum OuterJoinType {LEFT, RIGHT, FULL}
+
+   private OuterJoinType outerJoinType;
+
+   public OuterJoinOperatorBase(UserCodeWrapper<FT> udf, BinaryOperatorInformation<IN1, IN2, OUT> operatorInfo,
+           int[] keyPositions1, int[] keyPositions2, String name, OuterJoinType outerJoinType) {
+       super(udf, operatorInfo, keyPositions1, keyPositions2, name);
+       this.outerJoinType = outerJoinType;
+   }
+
+   public OuterJoinOperatorBase(FT udf, BinaryOperatorInformation<IN1, IN2, OUT> operatorInfo,
+           int[] keyPositions1, int[] keyPositions2, String name, OuterJoinType outerJoinType) {
+       super(new UserCodeObjectWrapper<FT>(udf), operatorInfo, keyPositions1, keyPositions2, name);
+       this.outerJoinType = outerJoinType;
+   }
+
+   public OuterJoinOperatorBase(Class<? extends FT> udf, BinaryOperatorInformation<IN1, IN2, OUT> operatorInfo,
+           int[] keyPositions1, int[] keyPositions2, String name, OuterJoinType outerJoinType) {
+       super(new UserCodeClassWrapper<FT>(udf), operatorInfo, keyPositions1, keyPositions2, name);
+       this.outerJoinType = outerJoinType;
+   }
+
+   public void setOuterJoinType(OuterJoinType outerJoinType) {
+       this.outerJoinType = outerJoinType;
+   }
+
+   public OuterJoinType getOuterJoinType() {
+       return outerJoinType;
+   }
+
+   @Override
+   protected List<OUT> executeOnCollections(List<IN1> leftInput, List<IN2> rightInput, RuntimeContext runtimeContext, ExecutionConfig executionConfig) throws Exception {
+       TypeInformation<IN1> leftInformation = getOperatorInfo().getFirstInputType();
+       TypeInformation<IN2> rightInformation = getOperatorInfo().getSecondInputType();

[GitHub] flink pull request: [FLINK-2576] Add Outer Join operator to Optimi...

2015-10-02 Thread jkovacs
Github user jkovacs commented on a diff in the pull request:

https://github.com/apache/flink/pull/1138#discussion_r41014523
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/operators/base/OuterJoinOperatorBase.java
 ---
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.common.operators.base;
+
+import org.apache.commons.collections.ResettableIterator;
+import org.apache.commons.collections.iterators.ListIteratorWrapper;
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.functions.FlatJoinFunction;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.functions.util.CopyingListCollector;
+import org.apache.flink.api.common.functions.util.FunctionUtils;
+import org.apache.flink.api.common.operators.BinaryOperatorInformation;
+import org.apache.flink.api.common.operators.util.ListKeyGroupedIterator;
+import org.apache.flink.api.common.operators.util.UserCodeClassWrapper;
+import org.apache.flink.api.common.operators.util.UserCodeObjectWrapper;
+import org.apache.flink.api.common.operators.util.UserCodeWrapper;
+import org.apache.flink.api.common.typeinfo.AtomicType;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.common.typeutils.CompositeType;
+import org.apache.flink.api.common.typeutils.GenericPairComparator;
+import org.apache.flink.api.common.typeutils.TypeComparator;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.util.Collector;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+
+public class OuterJoinOperatorBase<IN1, IN2, OUT, FT extends FlatJoinFunction<IN1, IN2, OUT>> extends AbstractJoinOperatorBase<IN1, IN2, OUT, FT> {
+
+   public static enum OuterJoinType {LEFT, RIGHT, FULL}
+
+   private OuterJoinType outerJoinType;
+
+   public OuterJoinOperatorBase(UserCodeWrapper<FT> udf, BinaryOperatorInformation<IN1, IN2, OUT> operatorInfo,
+           int[] keyPositions1, int[] keyPositions2, String name, OuterJoinType outerJoinType) {
+       super(udf, operatorInfo, keyPositions1, keyPositions2, name);
+       this.outerJoinType = outerJoinType;
+   }
+
+   public OuterJoinOperatorBase(FT udf, BinaryOperatorInformation<IN1, IN2, OUT> operatorInfo,
+           int[] keyPositions1, int[] keyPositions2, String name, OuterJoinType outerJoinType) {
+       super(new UserCodeObjectWrapper<FT>(udf), operatorInfo, keyPositions1, keyPositions2, name);
+       this.outerJoinType = outerJoinType;
+   }
+
+   public OuterJoinOperatorBase(Class<? extends FT> udf, BinaryOperatorInformation<IN1, IN2, OUT> operatorInfo,
+           int[] keyPositions1, int[] keyPositions2, String name, OuterJoinType outerJoinType) {
+       super(new UserCodeClassWrapper<FT>(udf), operatorInfo, keyPositions1, keyPositions2, name);
+       this.outerJoinType = outerJoinType;
+   }
+
+   public void setOuterJoinType(OuterJoinType outerJoinType) {
+       this.outerJoinType = outerJoinType;
+   }
+
+   public OuterJoinType getOuterJoinType() {
+       return outerJoinType;
+   }
+
+   @Override
+   protected List<OUT> executeOnCollections(List<IN1> leftInput, List<IN2> rightInput, RuntimeContext runtimeContext, ExecutionConfig executionConfig) throws Exception {
+       TypeInformation<IN1> leftInformation = getOperatorInfo().getFirstInputType();
+       TypeInformation<IN2> rightInformation = getOperatorInfo().getSecondInputType();
+       TypeInformation<OUT> outInformation = getOperatorInfo().getOutputType();
+
+       TypeComparator<IN1> leftComparator = buildComparatorFor(0, executionConfig, leftInformation);
+       TypeComparator 



[jira] [Created] (FLINK-2806) No TypeInfo for Scala's Nothing type

2015-10-02 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-2806:
--

 Summary: No TypeInfo for Scala's Nothing type
 Key: FLINK-2806
 URL: https://issues.apache.org/jira/browse/FLINK-2806
 Project: Flink
  Issue Type: Bug
  Components: Scala API
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Minor


When writing some generic code, I encountered a situation where I needed a 
TypeInformation[Nothing]. Two problems prevent me from getting it:
1. TypeInformationGen.mkTypeInfo doesn't return a real 
TypeInformation[Nothing]. (It actually returns a cast null in that case.)
2. The line
implicit def createTypeInformation[T]: TypeInformation[T] = macro 
TypeUtils.createTypeInfo[T]
does not fire in some situations when it should, when T = Nothing. (I guess 
this is a compiler bug.)

I will open a PR shortly.





[jira] [Commented] (FLINK-2796) CLI -q flag to suppress the output does not work

2015-10-02 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941046#comment-14941046
 ] 

Maximilian Michels commented on FLINK-2796:
---

Sorry, I was about to open a pull request.

> CLI -q flag to suppress the output does not work
> ---
>
> Key: FLINK-2796
> URL: https://issues.apache.org/jira/browse/FLINK-2796
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: starter
> Fix For: 0.10
>
>
> The log output is shown regardless of whether -q is specified:
> {noformat}
> /bin/flink run -q examples/WordCount.jar
> {noformat}





[jira] [Commented] (FLINK-2796) CLI -q flag to suppress the output does not work

2015-10-02 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941055#comment-14941055
 ] 

Sachin Goel commented on FLINK-2796:


No problem. It was a pretty minor change anyway. :)

> CLI -q flag to suppress the output does not work
> ---
>
> Key: FLINK-2796
> URL: https://issues.apache.org/jira/browse/FLINK-2796
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.10
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Labels: starter
> Fix For: 0.10
>
>
> The log output is shown regardless of whether -q is specified:
> {noformat}
> /bin/flink run -q examples/WordCount.jar
> {noformat}





[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-02 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941061#comment-14941061
 ] 

Maximilian Michels commented on FLINK-2797:
---

I would propose {{-d}} as a flag for submitting detached jobs. The CLI should 
exit after the job submission. It would be great if {{-d}} also triggered 
detached execution for YARN jobs. However, the {{-yd}} YARN flag has to stay 
for backwards-compatibility.
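The proposed semantics can be sketched roughly as follows. This is a hypothetical illustration of the flag handling described above, not Flink's actual CLI code; all class and method names here are invented.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed flag semantics: either the generic -d
// flag or the legacy YARN-specific -yd flag (kept for backwards
// compatibility) requests detached execution.
public class DetachedFlagSketch {

    static boolean isDetached(String[] args) {
        return Arrays.asList(args).contains("-d")
            || Arrays.asList(args).contains("-yd");
    }

    public static void main(String[] args) {
        String[] cli = {"run", "-d", "examples/WordCount.jar"};
        if (isDetached(cli)) {
            // Detached mode: the CLI would print the job id and exit
            // immediately instead of blocking until the job finishes.
            System.out.println("Submitted job; not waiting for completion.");
        }
    }
}
```

In detached mode the client exits right after submission, which is what lets users query the job's status later by id.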

> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.





[jira] [Commented] (FLINK-2740) Create data consumer for Apache NiFi

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940908#comment-14940908
 ] 

ASF GitHub Bot commented on FLINK-2740:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1198#issuecomment-144960807
  
Great. Thank you for your answers. I think it's fine to let users deal with 
cases where multiple systems want to consume data from a NiFi flow.

Once my in-code comments are addressed, the code is good to merge in my 
opinion.



> Create data consumer for Apache NiFi
> 
>
> Key: FLINK-2740
> URL: https://issues.apache.org/jira/browse/FLINK-2740
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Kostas Tzoumas
>Assignee: Joseph Witt
>
> Create a connector to Apache NiFi to create Flink DataStreams from NiFi flows





[GitHub] flink pull request: FLINK-2740 Adding flink-connector-nifi module ...

2015-10-02 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1198#issuecomment-144960807
  
Great. Thank you for your answers. I think it's fine to let users deal with 
cases where multiple systems want to consume data from a NiFi flow.

Once my in-code comments are addressed, the code is good to merge in my 
opinion.





[GitHub] flink pull request: [hotfix] Execute YARN integration tests only u...

2015-10-02 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/1210

[hotfix] Execute YARN integration tests only upon request 

...by activating the 'include-yarn-tests' profile

The default Hadoop version set in Flink (2.3.0) causes the MiniYarnCluster 
to fail on some machines with IP/hostname resolution issues.

The YARN tests are all executed in Travis profiles with Hadoop versions 
above 2.3.0.

**I'm opening this quickly as a PR to see if somebody disagrees or has 
valid concerns. I'm probably going to merge it in the next 8 hours**.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink yarn230local

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1210


commit 49d3364d29569f5a9bd85fb4a4714295fae3117b
Author: Robert Metzger 
Date:   2015-10-02T09:38:40Z

[hotfix] Execute YARN integration tests only upon request (by activating 
the 'include-yarn-tests' profile)

The default Hadoop version set in Flink (2.3.0) causes the MiniYarnCluster 
to fail on some machines with IP/hostname resolution issues.

The YARN tests are all executed in Travis profiles with Hadoop versions 
above 2.3.0.






[jira] [Closed] (FLINK-2762) Job Runtime: -1 ms

2015-10-02 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-2762.
-
   Resolution: Fixed
Fix Version/s: 0.10

Fixed via {{142a8440f3e31be3c5c83f87a36cf468c103a9e2}}.

> Job Runtime: -1 ms
> --
>
> Key: FLINK-2762
> URL: https://issues.apache.org/jira/browse/FLINK-2762
> Project: Flink
>  Issue Type: Bug
>  Components: other
>Affects Versions: master
>Reporter: Greg Hogan
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 0.10
>
>
> At the conclusion of my job the following is written to the console.
> {noformat}
> 09/24/2015 13:40:48   Job execution switched to status FINISHED.
> Job Runtime: -1 ms
> {noformat}
> {{org.apache.flink.client.program.Client.runBlocking(PackagedProgram, int)}} 
> calls {{JobExecutionResult.fromJobSubmissionResult(JobSubmissionResult)}} 
> which creates a "dummy object for wrapping a JobSubmissionResult" with 
> {{netRuntime}} set to -1 and {{accumulators}} set to {{null}}.
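The wrapping chain described above can be sketched with simplified stand-ins for the classes named in the report (not the real Flink API), under the assumption that only the {{netRuntime}} field matters here:

```java
// Simplified stand-in: a bare submission result carries only a job id.
class JobSubmissionResult {
    final String jobId;

    JobSubmissionResult(String jobId) {
        this.jobId = jobId;
    }
}

// Simplified stand-in: an execution result adds the net runtime.
class JobExecutionResult extends JobSubmissionResult {
    final long netRuntime;

    JobExecutionResult(String jobId, long netRuntime) {
        super(jobId);
        this.netRuntime = netRuntime;
    }

    // The "dummy object for wrapping a JobSubmissionResult": the real
    // runtime is unknown at this point, so the sentinel -1 is stored.
    static JobExecutionResult fromJobSubmissionResult(JobSubmissionResult r) {
        return new JobExecutionResult(r.jobId, -1L);
    }
}

public class RuntimeSentinelDemo {
    public static void main(String[] args) {
        JobExecutionResult result =
            JobExecutionResult.fromJobSubmissionResult(new JobSubmissionResult("job-1"));
        // Prints "Job Runtime: -1 ms", matching the console output reported above.
        System.out.println("Job Runtime: " + result.netRuntime + " ms");
    }
}
```

Any code path that prints the runtime from such a dummy wrapper will report -1 ms, which is exactly the symptom observed at the console.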





[jira] [Commented] (FLINK-1599) TypeComparator with no keys and comparators matches some elements

2015-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940973#comment-14940973
 ] 

ASF GitHub Bot commented on FLINK-1599:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1207


> TypeComparator with no keys and comparators matches some elements
> -
>
> Key: FLINK-1599
> URL: https://issues.apache.org/jira/browse/FLINK-1599
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 0.8.0
>Reporter: Maximilian Michels
>Priority: Minor
>
> If you create a custom type comparator by subclassing {{TypeComparator}} and 
> implement {{int extractKeys(Object record, Object[] target, int index)}} and 
> {{TypeComparator[] getFlatComparators()}} to return 0 and no type comparators 
> respectively, the {{coGroup}} operator (and possibly others) finds matching 
> elements even though no comparators have been specified.
> In this case, the expected behavior for a CoGroup would be that only elements 
> from one side are supplied in the CoGroup method.
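Why zero flat comparators make everything "match" can be seen in a toy model. This is an assumed structure for illustration only, not Flink's actual TypeComparator contract: a record comparator built from per-key comparators, where an empty key list makes every pair compare equal.

```java
import java.util.List;
import java.util.function.BiFunction;

// Toy illustration: compose a record comparison from per-key comparators.
// With zero key comparators the composite result is always 0 ("equal"), so
// an operator that pairs records by key equality, like the coGroup described
// above, would treat every pair of records as a match.
public class ZeroKeyComparison {

    static <T> int compareKeys(List<BiFunction<T, T, Integer>> keyComparators, T a, T b) {
        for (BiFunction<T, T, Integer> cmp : keyComparators) {
            int c = cmp.apply(a, b);
            if (c != 0) {
                return c; // the first differing key decides the ordering
            }
        }
        return 0; // no key comparators: every pair of records compares equal
    }

    public static void main(String[] args) {
        // Zero keys: "foo" and "bar" are considered equal, so they would match.
        System.out.println(compareKeys(List.of(), "foo", "bar"));
        // One key comparator: ordinary string ordering applies again.
        System.out.println(compareKeys(List.of(String::compareTo), "foo", "bar"));
    }
}
```

This is why the expected behavior for the no-key case has to be defined explicitly (e.g. supplying elements from only one side) rather than falling through to key comparison.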




