[GitHub] flink pull request: Enforce import restriction on usage of Flink s...

2015-06-04 Thread lokeshrajaram
GitHub user lokeshrajaram opened a pull request:

https://github.com/apache/flink/pull/775

Enforce import restriction on usage of Flink shaded package and Commons 
Validate

https://issues.apache.org/jira/browse/FLINK-2155

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink enforce_import_restriction

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #775


commit 733c704b094c2d7a19351e43d6338ccc774d02df
Author: Lokesh Rajaram 
Date:   2015-05-28T04:29:33Z

Merge pull request #1 from apache/master

update from original

commit 3b1c2eb8aa98f9176007f7e7a9cb3e973fdc4a02
Author: Lokesh Rajaram 
Date:   2015-06-03T06:22:12Z

getting latest
Merge branch 'master' of https://github.com/apache/flink

commit f14a0626092e68a2670cfe242bb44dd877d005cb
Author: Lokesh Rajaram 
Date:   2015-06-04T04:12:55Z

added illegal import restrictions module

commit f33a0f33a770b5a8fe40e35451a3e1058b990c42
Author: Lokesh Rajaram 
Date:   2015-06-04T06:41:21Z

added package import restriction check and restriction check for using 
Commons Validate

commit 9188d804e84a350848fed7d01c1a698d0bcfbaea
Author: Lokesh Rajaram 
Date:   2015-06-04T06:45:44Z

Merge branch 'master' of https://github.com/apache/flink

commit 110d1dfe579284c8cc9fe4851aa3ddeaf8cdbea2
Author: Lokesh Rajaram 
Date:   2015-06-04T04:12:55Z

added illegal import restrictions module

commit 536ba194af1888a2d5bff91070bbb4fa14f450da
Author: Lokesh Rajaram 
Date:   2015-06-04T06:41:21Z

added package import restriction check and restriction check for using 
Commons Validate

commit a600dc5a35bb5b60cacb70019c49b987b0896e11
Author: Lokesh Rajaram 
Date:   2015-06-04T06:48:27Z

Merge branch 'enforce_import_restriction' of 
https://github.com/lokeshrajaram/flink into enforce_import_restriction






[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572285#comment-14572285
 ] 

ASF GitHub Bot commented on FLINK-2155:
---

GitHub user lokeshrajaram opened a pull request:

https://github.com/apache/flink/pull/775

Enforce import restriction on usage of Flink shaded package and Commons 
Validate

https://issues.apache.org/jira/browse/FLINK-2155

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink enforce_import_restriction

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #775


commit 733c704b094c2d7a19351e43d6338ccc774d02df
Author: Lokesh Rajaram 
Date:   2015-05-28T04:29:33Z

Merge pull request #1 from apache/master

update from original

commit 3b1c2eb8aa98f9176007f7e7a9cb3e973fdc4a02
Author: Lokesh Rajaram 
Date:   2015-06-03T06:22:12Z

getting latest
Merge branch 'master' of https://github.com/apache/flink

commit f14a0626092e68a2670cfe242bb44dd877d005cb
Author: Lokesh Rajaram 
Date:   2015-06-04T04:12:55Z

added illegal import restrictions module

commit f33a0f33a770b5a8fe40e35451a3e1058b990c42
Author: Lokesh Rajaram 
Date:   2015-06-04T06:41:21Z

added package import restriction check and restriction check for using 
Commons Validate

commit 9188d804e84a350848fed7d01c1a698d0bcfbaea
Author: Lokesh Rajaram 
Date:   2015-06-04T06:45:44Z

Merge branch 'master' of https://github.com/apache/flink

commit 110d1dfe579284c8cc9fe4851aa3ddeaf8cdbea2
Author: Lokesh Rajaram 
Date:   2015-06-04T04:12:55Z

added illegal import restrictions module

commit 536ba194af1888a2d5bff91070bbb4fa14f450da
Author: Lokesh Rajaram 
Date:   2015-06-04T06:41:21Z

added package import restriction check and restriction check for using 
Commons Validate

commit a600dc5a35bb5b60cacb70019c49b987b0896e11
Author: Lokesh Rajaram 
Date:   2015-06-04T06:48:27Z

Merge branch 'enforce_import_restriction' of 
https://github.com/lokeshrajaram/flink into enforce_import_restriction




> Add an additional checkstyle validation for illegal imports
> ---
>
> Key: FLINK-2155
> URL: https://issues.apache.org/jira/browse/FLINK-2155
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Lokesh Rajaram
>Assignee: Lokesh Rajaram
>
> Add an additional checkstyle validation for illegal imports.
> To begin with, the following two package imports are marked as illegal:
>  1. org.apache.commons.lang3.Validate
>  2. org.apache.flink.shaded.*
> Implementation based on: 
> http://checkstyle.sourceforge.net/config_imports.html#IllegalImport
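For illustration, such a rule can be configured roughly as follows (a sketch
based on the IllegalImport documentation linked above; the exact configuration
in the pull request may differ):

    <module name="TreeWalker">
      <module name="IllegalImport">
        <!-- illustrative value: reject imports from the shaded namespace -->
        <property name="illegalPkgs" value="org.apache.flink.shaded"/>
      </module>
    </module>

The single class org.apache.commons.lang3.Validate can be restricted along the
same lines; newer Checkstyle releases also offer a dedicated illegalClasses
property for that case.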





[GitHub] flink pull request: Enforce import restriction on usage of Flink s...

2015-06-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/775#issuecomment-108758033
  
+1, the change looks good.
Can you squash your changes into one commit named 
`[FLINK-2155] Enforce import restriction on usage of Flink shaded package 
and Commons Validate`?
That makes merging easier for us.
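
One way to do the squash (a sketch; it assumes the PR branch is checked out
and the Apache repository is available as a remote named `apache`):

    $ git fetch apache
    $ git rebase -i apache/master   # mark all but the first commit as "squash"
    $ git commit --amend            # set the [FLINK-2155] commit message
    $ git push -f origin enforce_import_restriction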




[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572290#comment-14572290
 ] 

ASF GitHub Bot commented on FLINK-2155:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/775#issuecomment-108758033
  
+1, the change looks good.
Can you squash your changes into one commit named 
`[FLINK-2155] Enforce import restriction on usage of Flink shaded package 
and Commons Validate`?
That makes merging easier for us.


> Add an additional checkstyle validation for illegal imports
> ---
>
> Key: FLINK-2155
> URL: https://issues.apache.org/jira/browse/FLINK-2155
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Lokesh Rajaram
>Assignee: Lokesh Rajaram
>
> Add an additional checkstyle validation for illegal imports.
> To begin with, the following two package imports are marked as illegal:
>  1. org.apache.commons.lang3.Validate
>  2. org.apache.flink.shaded.*
> Implementation based on: 
> http://checkstyle.sourceforge.net/config_imports.html#IllegalImport





[GitHub] flink pull request: Enforce import restriction on usage of Flink s...

2015-06-04 Thread lokeshrajaram
Github user lokeshrajaram commented on the pull request:

https://github.com/apache/flink/pull/775#issuecomment-108760193
  
Thanks @rmetzger. will do that




[GitHub] flink pull request: Enforce import restriction on usage of Flink s...

2015-06-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/775#issuecomment-108760386
  
Cool, thank you.
You can update the pull request by force-pushing into the branch it is 
based on.
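
For example (a sketch; it assumes the fork is the `origin` remote and the PR
branch is `enforce_import_restriction`):

    $ git push -f origin enforce_import_restriction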




[jira] [Resolved] (FLINK-2048) Enhance Twitter Stream support

2015-06-04 Thread Hilmi Yildirim (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hilmi Yildirim resolved FLINK-2048.
---
Resolution: Fixed

TwitterSource was adapted and TwitterFilterSource was implemented.

TwitterFilterSource makes it possible to create a filtered Twitter stream.

> Enhance Twitter Stream support
> --
>
> Key: FLINK-2048
> URL: https://issues.apache.org/jira/browse/FLINK-2048
> Project: Flink
>  Issue Type: Task
>  Components: Streaming
>Affects Versions: master
>Reporter: Hilmi Yildirim
>Assignee: Hilmi Yildirim
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Flink does not have real Twitter support. It only has a TwitterSource which 
> uses a sample stream that cannot be used properly for analysis. It is 
> possible to use external tools to create streams (e.g. Kafka), but it is 
> beneficial to create a proper Twitter stream in Flink.





[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...

2015-06-04 Thread mbalassi
Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-108765546
  
Big +1 for the `run(SourceContext)` interface and experimenting with 
`Thread.holdsLock(obj)`.
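
For context, a minimal sketch of how such a check could look inside a
source's run method (the `getCheckpointLock` accessor, the element type, and
the imports are assumptions about the interface under discussion in this PR):

    // emit records under the checkpoint lock and assert ownership
    override def run(ctx: SourceContext[Int]): Unit = {
      val lock = ctx.getCheckpointLock  // assumed accessor on the context
      lock.synchronized {
        assert(Thread.holdsLock(lock))  // emission aligned with snapshotting
        ctx.collect(42)
      }
    }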




[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572312#comment-14572312
 ] 

ASF GitHub Bot commented on FLINK-2098:
---

Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/742#issuecomment-108765546
  
Big +1 for the `run(SourceContext)` interface and experimenting with 
`Thread.holdsLock(obj)`.


> Checkpoint barrier initiation at source is not aligned with snapshotting
> 
>
> Key: FLINK-2098
> URL: https://issues.apache.org/jira/browse/FLINK-2098
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 0.9
>
>
> The stream source does not properly align the emission of checkpoint barriers 
> with the drawing of snapshots.





[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-04 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572329#comment-14572329
 ] 

Till Rohrmann commented on FLINK-1731:
--

You can enable Travis support [1] for your repository. Then whenever you push 
something to your repo, Travis will trigger a new build.

[1] https://travis-ci.org/
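
For reference, a minimal `.travis.yml` for a Maven-built project looks 
roughly like this (illustrative only; Flink's actual Travis configuration 
is more involved):

    language: java
    jdk: oraclejdk7
    script: mvn -B verify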

> Add kMeans clustering algorithm to machine learning library
> ---
>
> Key: FLINK-1731
> URL: https://issues.apache.org/jira/browse/FLINK-1731
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Peter Schrott
>  Labels: ML
>
> The Flink repository already contains a kMeans implementation, but it is not 
> yet ported to the machine learning library. I assume that only the data 
> types used have to be adapted, and then it can be more or less directly 
> moved to flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
> implementation because they improve the initial seeding phase to achieve 
> near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf





[GitHub] flink pull request: [FLINK-2126] Fixed Flink scala shell ITSuite s...

2015-06-04 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108772594
  
We haven't created an issue for the logging file yet. Will do. The
flink-scala-shell and flink-ml modules use the scalatest maven plugin.

The fix should be easy, though: simply give these two modules a different
Travis log4j properties file that does not depend on the fork number.

On Wed, Jun 3, 2015 at 7:13 PM, Ufuk Celebi 
wrote:

> Can you just remove the commented lines? And I think that we should also
> remove the check from the other tests.
>
> From @tillrohrmann : the issue with the
> LOG file not being created is a separate issue. Till, do we have an issue
> for that one? Is this the only test that runs with scalatest, or why does
> it not affect other tests?
>
> —
> Reply to this email directly or view it on GitHub
> .
>





[jira] [Commented] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572332#comment-14572332
 ] 

ASF GitHub Bot commented on FLINK-2126:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108772594
  
We haven't created an issue for the logging file yet. Will do. The
flink-scala-shell and flink-ml modules use the scalatest maven plugin.

The fix should be easy, though: simply give these two modules a different
Travis log4j properties file that does not depend on the fork number.

On Wed, Jun 3, 2015 at 7:13 PM, Ufuk Celebi 
wrote:

> Can you just remove the commented lines? And I think that we should also
> remove the check from the other tests.
>
> From @tillrohrmann : the issue with the
> LOG file not being created is a separate issue. Till, do we have an issue
> for that one? Is this the only test that runs with scalatest, or why does
> it not affect other tests?
>
> —
> Reply to this email directly or view it on GitHub
> .
>



> Scala shell tests sporadically fail on travis
> -
>
> Key: FLINK-2126
> URL: https://issues.apache.org/jira/browse/FLINK-2126
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> See https://travis-ci.org/rmetzger/flink/jobs/64893149





[jira] [Created] (FLINK-2156) Scala modules cannot create logging file

2015-06-04 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2156:


 Summary: Scala modules cannot create logging file
 Key: FLINK-2156
 URL: https://issues.apache.org/jira/browse/FLINK-2156
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Priority: Minor


The Scala-only modules `flink-scala-shell` and `flink-ml` use Maven's scalatest 
plugin to run their tests. The scalatest plugin has no `forkNumber`, though. 
Therefore, logging fails to create the log file specified in the 
`log4j-travis.properties` file.

We can fix this issue by giving these two modules different `log4j.properties` 
files which don't require a `forkNumber`, or by fixing the `forkNumber` to 1.
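
A sketch of the failing pattern (the property names are illustrative; the 
actual `log4j-travis.properties` may differ):

    # surefire sets mvn.forkNumber per fork; scalatest never sets it, so the
    # placeholder stays unresolved and the log file cannot be created:
    log4j.appender.file.file=target/flink-tests-${mvn.forkNumber}.log

    # possible fix for the scalatest modules: a fixed file name instead
    log4j.appender.file.file=target/flink-tests.log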





[jira] [Updated] (FLINK-2156) Scala modules cannot create logging file

2015-06-04 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-2156:
-
Description: 
The Scala-only modules {{flink-scala-shell}} and {{flink-ml}} use Maven's 
scalatest plugin to run their tests. The scalatest plugin has no 
{{forkNumber}}, though. Therefore, logging fails to create the log file 
specified in the {{log4j-travis.properties}} file.

We can fix this issue by giving these two modules different 
{{log4j.properties}} files which don't require a {{forkNumber}}, or by fixing 
the {{forkNumber}} to 1.

  was:
The Scala-only modules `flink-scala-shell` and `flink-ml` use Maven's scalatest 
plugin to run their tests. The scalatest plugin has no `forkNumber`, though. 
Therefore, logging fails to create the log file specified in the 
`log4j-travis.properties` file.

We can fix this issue by giving these two modules different `log4j.properties` 
files which don't require a `forkNumber`, or by fixing the `forkNumber` to 1.


> Scala modules cannot create logging file
> 
>
> Key: FLINK-2156
> URL: https://issues.apache.org/jira/browse/FLINK-2156
> Project: Flink
>  Issue Type: Bug
>Reporter: Till Rohrmann
>Priority: Minor
>
> The Scala-only modules {{flink-scala-shell}} and {{flink-ml}} use Maven's 
> scalatest plugin to run their tests. The scalatest plugin has no 
> {{forkNumber}}, though. Therefore, logging fails to create the log file 
> specified in the {{log4j-travis.properties}} file.
> We can fix this issue by giving these two modules different 
> {{log4j.properties}} files which don't require a {{forkNumber}}, or by 
> fixing the {{forkNumber}} to 1.





[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572350#comment-14572350
 ] 

ASF GitHub Bot commented on FLINK-2155:
---

GitHub user lokeshrajaram opened a pull request:

https://github.com/apache/flink/pull/776

[FLINK-2155] Enforce import restriction on usage of Flink shaded package 
and Commons Validate

@rmetzger, sorry, I had issues squashing commits, hence this new pull request. 
Not sure if I am doing it right. Sorry for the trouble.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink checks_for_import

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #776


commit 733c704b094c2d7a19351e43d6338ccc774d02df
Author: Lokesh Rajaram 
Date:   2015-05-28T04:29:33Z

Merge pull request #1 from apache/master

update from original

commit 3b1c2eb8aa98f9176007f7e7a9cb3e973fdc4a02
Author: Lokesh Rajaram 
Date:   2015-06-03T06:22:12Z

getting latest
Merge branch 'master' of https://github.com/apache/flink

commit 9188d804e84a350848fed7d01c1a698d0bcfbaea
Author: Lokesh Rajaram 
Date:   2015-06-04T06:45:44Z

Merge branch 'master' of https://github.com/apache/flink

commit 3f07464ce937296c0409d561bf8548e5403c4346
Author: Lokesh Rajaram 
Date:   2015-06-04T08:13:27Z

[FLINK-2155] Enforce import restriction on usage of Flink shaded package 
and Commons Validate




> Add an additional checkstyle validation for illegal imports
> ---
>
> Key: FLINK-2155
> URL: https://issues.apache.org/jira/browse/FLINK-2155
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Lokesh Rajaram
>Assignee: Lokesh Rajaram
>
> Add an additional checkstyle validation for illegal imports.
> To begin with, the following two package imports are marked as illegal:
>  1. org.apache.commons.lang3.Validate
>  2. org.apache.flink.shaded.*
> Implementation based on: 
> http://checkstyle.sourceforge.net/config_imports.html#IllegalImport





[GitHub] flink pull request: [FLINK-2155] Enforce import restriction on usa...

2015-06-04 Thread lokeshrajaram
GitHub user lokeshrajaram opened a pull request:

https://github.com/apache/flink/pull/776

[FLINK-2155] Enforce import restriction on usage of Flink shaded package 
and Commons Validate

@rmetzger, sorry, I had issues squashing commits, hence this new pull request. 
Not sure if I am doing it right. Sorry for the trouble.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink checks_for_import

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/776.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #776


commit 733c704b094c2d7a19351e43d6338ccc774d02df
Author: Lokesh Rajaram 
Date:   2015-05-28T04:29:33Z

Merge pull request #1 from apache/master

update from original

commit 3b1c2eb8aa98f9176007f7e7a9cb3e973fdc4a02
Author: Lokesh Rajaram 
Date:   2015-06-03T06:22:12Z

getting latest
Merge branch 'master' of https://github.com/apache/flink

commit 9188d804e84a350848fed7d01c1a698d0bcfbaea
Author: Lokesh Rajaram 
Date:   2015-06-04T06:45:44Z

Merge branch 'master' of https://github.com/apache/flink

commit 3f07464ce937296c0409d561bf8548e5403c4346
Author: Lokesh Rajaram 
Date:   2015-06-04T08:13:27Z

[FLINK-2155] Enforce import restriction on usage of Flink shaded package 
and Commons Validate






[GitHub] flink pull request: Enforce import restriction on usage of Flink s...

2015-06-04 Thread lokeshrajaram
Github user lokeshrajaram closed the pull request at:

https://github.com/apache/flink/pull/775




[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572351#comment-14572351
 ] 

ASF GitHub Bot commented on FLINK-2155:
---

Github user lokeshrajaram closed the pull request at:

https://github.com/apache/flink/pull/775


> Add an additional checkstyle validation for illegal imports
> ---
>
> Key: FLINK-2155
> URL: https://issues.apache.org/jira/browse/FLINK-2155
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Lokesh Rajaram
>Assignee: Lokesh Rajaram
>
> Add an additional checkstyle validation for illegal imports.
> To begin with, the following two package imports are marked as illegal:
>  1. org.apache.commons.lang3.Validate
>  2. org.apache.flink.shaded.*
> Implementation based on: 
> http://checkstyle.sourceforge.net/config_imports.html#IllegalImport





[jira] [Commented] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572355#comment-14572355
 ] 

ASF GitHub Bot commented on FLINK-2126:
---

Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108780565
  
OK, removed the comments and the "Job execution switched to status 
FINISHED." check for all tests now (although, until now, it had only happened 
in the "add numbers" tests).


> Scala shell tests sporadically fail on travis
> -
>
> Key: FLINK-2126
> URL: https://issues.apache.org/jira/browse/FLINK-2126
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> See https://travis-ci.org/rmetzger/flink/jobs/64893149





[GitHub] flink pull request: [FLINK-2126] Fixed Flink scala shell ITSuite s...

2015-06-04 Thread nikste
Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108780565
  
OK, removed the comments and the "Job execution switched to status 
FINISHED." check for all tests now (although, until now, it had only happened 
in the "add numbers" tests).




[GitHub] flink pull request: [streaming] ITCases for streaming scala exampl...

2015-06-04 Thread mbalassi
GitHub user mbalassi opened a pull request:

https://github.com/apache/flink/pull/777

[streaming] ITCases for streaming scala examples



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mbalassi/flink scala-it

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #777


commit 3182b3ca024ad511e69074e6cf14bd851b93528b
Author: mbalassi 
Date:   2015-06-02T17:12:02Z

[streaming] [scala] ITCases for streaming scala examples

commit d25b975d912f14fc6804925a8263446554a5dac0
Author: mbalassi 
Date:   2015-06-04T08:11:42Z

[streaming] Removed StockPrices example

The purpose of this was to serve as an example for a blogpost.
Moved it to another branch and removed it from the master.






[jira] [Commented] (FLINK-2116) Make pipeline extension require less coding

2015-06-04 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572386#comment-14572386
 ] 

Till Rohrmann commented on FLINK-2116:
--

At the moment, the corresponding PR only contains the {{evaluate}} method which 
gives you a {{DataSet}} of tuples {{(true label, predicted label)}}. This can 
then be used to calculate some accuracy scores. But this has not been done yet. 

With this PR I wanted to get some feedback on the general design of the 
pipelines with the {{evaluate}} method and whether it makes sense to use 
{{Tuples}} as input instead of {{LabeledVector}}. Maybe there is also some 
other way to automatically extract a label value from some type which is 
parameterized to make the default {{EvaluateDataSetOperation}} work on 
{{LabeledVector}} if you only specify a {{PredictOperation}}.

My gut feeling is also that we should keep the calculation of the evaluation 
score separate from the actual {{Predictor}}, because if you have a pipeline, 
then it's no longer easy to access the members of the {{Predictor}} which are 
only defined in the corresponding subclass. Moreover, maybe sometimes you want 
to apply different scores to your method depending on the use case.

We should definitely open a new JIRA issue for the implementation of an 
evaluation framework.
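
To make the intended use concrete, an accuracy score on top of the 
{{evaluate}} output could look roughly like this (a sketch against the 
DataSet API of that time; the {{accuracy}} helper and the {{predictions}} 
name are assumptions):

    import org.apache.flink.api.scala._

    // predictions: (true label, predicted label) pairs from evaluate()
    def accuracy(predictions: DataSet[(Double, Double)]): Double = {
      val matches = predictions
        .map { case (truth, predicted) => if (truth == predicted) 1.0 else 0.0 }
        .reduce(_ + _)               // single element: the number of matches
      matches.collect().head / predictions.count()  // both trigger execution
    }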

> Make pipeline extension require less coding
> ---
>
> Key: FLINK-2116
> URL: https://issues.apache.org/jira/browse/FLINK-2116
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Mikio Braun
>Assignee: Till Rohrmann
>Priority: Minor
>
> Right now, implementing methods from the pipelines for new types, or even 
> adding new methods to pipelines requires many steps:
> 1) implementing methods for new types
>   - implementing an implicit of the corresponding class encapsulating the 
> operation in the companion object
> 2) adding methods to the pipeline
>   - adding a method
>   - adding a trait for the operation
>   - implementing an implicit in the companion object
> These are all objects which contain many generic parameters, so reducing the 
> work would be great.
> The goal should be that you can really focus on the code to add, and have as 
> little boilerplate code as possible.





[GitHub] flink pull request: [FLINK-2155] Enforce import restriction on usa...

2015-06-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/776#issuecomment-108791701
  
Hey @lokeshrajaram. This is not right, but it's not a problem. We can 
easily fix it. :-) I suggest that you do the following:

1. If you don't have the Flink repository as a remote, add it: `git remote 
add flink https://git-wip-us.apache.org/repos/asf/flink.git` and fetch it: 
`git fetch flink`. Whether you need this depends on whether you cloned your 
fork or the main repo.

2. Check out a new branch from flink/master: `git checkout -b YOUR_BRANCH 
flink/master`.

3. Cherry-pick your commit: `git cherry-pick 3f07464`. This will add your 
commit to the new branch. There should be no conflicts.

4. Now force-push this to the branch of *this* pull request: `git push -f 
origin YOUR_BRANCH:checks_for_import`, assuming that `origin` is your forked 
repository. You have to force-push because you are changing the history of 
this branch.
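
Put together (the same commands as above; `YOUR_BRANCH` is a placeholder):

    $ git remote add flink https://git-wip-us.apache.org/repos/asf/flink.git
    $ git fetch flink
    $ git checkout -b YOUR_BRANCH flink/master
    $ git cherry-pick 3f07464
    $ git push -f origin YOUR_BRANCH:checks_for_import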




[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572392#comment-14572392
 ] 

ASF GitHub Bot commented on FLINK-2155:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/776#issuecomment-108791701
  
Hey @lokeshrajaram. This is not right, but it's not a problem. We can 
easily fix it. :-) I suggest that you do the following:

1. If you don't have the Flink repository as a remote, add it: `git remote 
add flink https://git-wip-us.apache.org/repos/asf/flink.git` and fetch it: 
`git fetch flink`. Whether you need this depends on whether you cloned your 
fork or the main repo.

2. Check out a new branch from flink/master: `git checkout -b YOUR_BRANCH 
flink/master`.

3. Cherry-pick your commit: `git cherry-pick 3f07464`. This will add your 
commit to the new branch. There should be no conflicts.

4. Now force-push this to the branch of *this* pull request: `git push -f 
origin YOUR_BRANCH:checks_for_import`, assuming that `origin` is your forked 
repository. You have to force-push because you are changing the history of 
this branch.


> Add an additional checkstyle validation for illegal imports
> ---
>
> Key: FLINK-2155
> URL: https://issues.apache.org/jira/browse/FLINK-2155
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Lokesh Rajaram
>Assignee: Lokesh Rajaram
>
> Add an additional checkstyle validation for illegal imports.
> To begin with, the following two package imports are marked as illegal:
>  1. org.apache.commons.lang3.Validate
>  2. org.apache.flink.shaded.*
> Implementation based on: 
> http://checkstyle.sourceforge.net/config_imports.html#IllegalImport





[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572393#comment-14572393
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703115
  
--- Diff: docs/apis/programming_guide.md ---
@@ -189,6 +187,10 @@ that creates the type information for Flink operations.
 
 
 
+
+
+ Hadoop Dependency Versions
+
 If you are using Flink together with Hadoop, the version of the dependency 
may vary depending on the
 version of Hadoop (or more specifically, HDFS) that you want to use Flink 
with. Please refer to the
 [downloads page]({{site.baseurl}}/downloads.html) for a list of available 
versions, and instructions
--- End diff --

this link is broken... it should refer to the project webpage.


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>






[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703115
  
--- Diff: docs/apis/programming_guide.md ---
@@ -189,6 +187,10 @@ that creates the type information for Flink operations.
 
 
 
+
+
+ Hadoop Dependency Versions
+
 If you are using Flink together with Hadoop, the version of the dependency 
may vary depending on the
 version of Hadoop (or more specifically, HDFS) that you want to use Flink 
with. Please refer to the
 [downloads page]({{site.baseurl}}/downloads.html) for a list of available 
versions, and instructions
--- End diff --

this link is broken... it should refer to the project webpage.




[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703262
  
--- Diff: docs/apis/programming_guide.md ---
@@ -283,7 +284,7 @@ This will create a new DataSet by converting every 
String in the original
 set to an Integer. For more information and a list of all the 
transformations,
 please refer to [Transformations](#transformations).
 
-Once you have a DataSet that needs to be written to disk you call one
+Once you have a DataSet that needs to be processed further you can call one
--- End diff --

I think I understand why you changed this to "processed further", but it 
does not make sense with print(). ;)

Maybe: once you want to write or read a result?




[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572396#comment-14572396
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703262
  
--- Diff: docs/apis/programming_guide.md ---
@@ -283,7 +284,7 @@ This will create a new DataSet by converting every 
String in the original
 set to an Integer. For more information and a list of all the 
transformations,
 please refer to [Transformations](#transformations).
 
-Once you have a DataSet that needs to be written to disk you call one
+Once you have a DataSet that needs to be processed further you can call one
--- End diff --

I think I understand why you changed this to "processed further", but it 
does not make sense with print(). ;)

Maybe: once you want to write or read a result?


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>






[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703387
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
--- End diff --

Stephan added `printOnTaskManager`... can you add this here as well?




[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572397#comment-14572397
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703387
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
--- End diff --

Stephan added `printOnTaskManager`... can you add this here as well?


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>






[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703441
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before it was printing to the log file of 
the workers, now it's 
+sending the DataSet results to the client and printing the results there.
+
+`collect()` allows to retrieve the DataSet from the cluster to the local 
JVM. The `collect()` method 
--- End diff --

I don't know if this is a German thing. I also catch myself doing this a lot... 
but it does not *allow* you to retrieve it... it just retrieves it. ;)




[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703506
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before it was printing to the log file of 
the workers, now it's 
+sending the DataSet results to the client and printing the results there.
+
+`collect()` allows to retrieve the DataSet from the cluster to the local 
JVM. The `collect()` method 
+will return a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
--- End diff --

Let's be explicit. Something along the lines of: ...trigger execution of 
the program. You don't need a further call to `execute()`.
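
As a minimal illustration of that point (a sketch; names follow the Scala 
DataSet API):

    import org.apache.flink.api.scala._

    object PrintCollectDemo {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val doubled = env.fromElements(1, 2, 3).map(_ * 2)

        doubled.print()                  // triggers execution by itself
        // val local = doubled.collect() // alternative: fetch as a local Seq
        // no further env.execute() call is needed here
      }
    }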




[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572399#comment-14572399
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703441
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before it was printing to the log file of 
the workers, now it's 
+sending the DataSet results to the client and printing the results there.
+
+`collect()` allows to retrieve the DataSet from the cluster to the local 
JVM. The `collect()` method 
--- End diff --

I don't know if this is a German thing. I also catch myself doing this a lot... 
but it does not *allow* you to retrieve it... it just retrieves it. ;)


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>






[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572401#comment-14572401
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703506
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before it was printing to the log file of 
the workers, now it's 
+sending the DataSet results to the client and printing the results there.
+
+`collect()` allows to retrieve the DataSet from the cluster to the local 
JVM. The `collect()` method 
+will return a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
--- End diff --

Let's be explicit. Something along the lines of: ...trigger execution of 
the program. You don't need a further call to `execute()`.


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>






[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572402#comment-14572402
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703637
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name 
suggests, the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before it was printing to the log file of 
the workers, now it's 
+sending the DataSet results to the client and printing the results there.
+
+`collect()` allows to retrieve the DataSet from the cluster to the local 
JVM. The `collect()` method 
+will return a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
+
+
+**NOTE** `print()` and `collect()` retrieve the data from the cluster to 
the client. Currently,
+the data sizes you can retrieve with `collect()` are limited due to our 
RPC system. It is not advised
+to collect DataSets larger than 10MBs.
+
+
+Once you have specified the complete program, you need to **trigger the program 
execution**. You can call
+`execute()` directly on the `ExecutionEnvironment`, or you implicitly 
trigger the execution with
+`collect()` or `print()`.
+Depending on the type of the `ExecutionEnvironment` the execution will be 
triggered on your local 
+machine or submit your program for execution on a cluster.
+
+Note that you cannot call both `print()` (or `collect()`) and `execute()` 
at the end of the program.
+
+The `execute()` method returns the `JobExecutionResult`, including 
execution times and
+accumulator results. `print()` and `collect()` do not return the 
result, but it can be
+accessed from the `getLastJobExecutionResult()` method.
+
+
+[Back to top](#top)
+
+
+DataSet abstraction
+---
+
+The batch processing APIs of Flink are centered around the `DataSet` 
abstraction. A `DataSet` is only
+an abstract representation of a set of data that can contain duplicates.
+
+Also note that Flink is not always physically creating (materializing) 
each DataSet at runtime. This 
+depends on the used runtime, the configuration and optimizer decisions.
+
+The Flink runtime is usually not materializing the DataSets because it is 
using a streaming runtime model.
--- End diff --

We could formulate this more positively: The Flink runtime does not need to 
always materialize... 
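
Putting the pieces of the new behavior together, a minimal sketch (Flink 0.9 Java API as described in the diff; the data values are illustrative and the calls are assumed to sit in a `main(String[] args) throws Exception`):

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> words = env.fromElements("a", "b", "c");

// print() triggers execution and writes the result to the client's stdout.
words.print();

// The result of the implicitly triggered job is still accessible afterwards.
JobExecutionResult result = env.getLastJobExecutionResult();
System.out.println("Net runtime: " + result.getNetRuntime() + " ms");

// collect() also triggers execution and ships the DataSet into the local JVM.
List<String> local = words.collect();
System.out.println(local);

// Do NOT additionally call env.execute() here: print()/collect() already
// ran the job, so there is no unexecuted sink left.
{code}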


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703637
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as their names 
suggest; the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before, it printed to the log files of 
the workers; now it 
+sends the DataSet results to the client and prints them there.
+
+`collect()` lets you retrieve the DataSet from the cluster into the local 
JVM. The `collect()` method 
+returns a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
+
+
+**NOTE** `print()` and `collect()` retrieve the data from the cluster to 
the client. Currently,
+the data sizes you can retrieve with `collect()` are limited by our 
RPC system. It is not advised
+to collect DataSets larger than 10 MB.
+
+
+Once you have specified the complete program you need to **trigger the program 
execution**. You can call
+`execute()` directly on the `ExecutionEnvironment`, or trigger the execution 
implicitly with
+`collect()` or `print()`.
+Depending on the type of the `ExecutionEnvironment`, the execution will be 
triggered on your local 
+machine or your program will be submitted for execution on a cluster.
+
+Note that you cannot call both `print()` (or `collect()`) and `execute()` 
at the end of a program.
+
+The `execute()` method returns the `JobExecutionResult`, including 
execution times and
+accumulator results. `print()` and `collect()` do not return the 
result, but it can be
+accessed via the `getLastJobExecutionResult()` method.
+
+
+[Back to top](#top)
+
+
+DataSet abstraction
+---
+
+The batch processing APIs of Flink are centered around the `DataSet` 
abstraction. A `DataSet` is only
+an abstract representation of a set of data that can contain duplicates.
+
+Also note that Flink does not always physically create (materialize) 
each DataSet at runtime. This 
+depends on the runtime used, the configuration, and optimizer decisions.
+
+The Flink runtime is usually not materializing the DataSets because it is 
using a streaming runtime model.
--- End diff --

We could formulate this more positively: The Flink runtime does not need to 
always materialize... 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703697
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as their names 
suggest; the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before, it printed to the log files of 
the workers; now it 
+sends the DataSet results to the client and prints them there.
+
+`collect()` lets you retrieve the DataSet from the cluster into the local 
JVM. The `collect()` method 
+returns a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
+
+
+**NOTE** `print()` and `collect()` retrieve the data from the cluster to 
the client. Currently,
+the data sizes you can retrieve with `collect()` are limited by our 
RPC system. It is not advised
+to collect DataSets larger than 10 MB.
+
+
+Once you have specified the complete program you need to **trigger the program 
execution**. You can call
+`execute()` directly on the `ExecutionEnvironment`, or trigger the execution 
implicitly with
+`collect()` or `print()`.
+Depending on the type of the `ExecutionEnvironment`, the execution will be 
triggered on your local 
+machine or your program will be submitted for execution on a cluster.
+
+Note that you cannot call both `print()` (or `collect()`) and `execute()` 
at the end of a program.
+
+The `execute()` method returns the `JobExecutionResult`, including 
execution times and
+accumulator results. `print()` and `collect()` do not return the 
result, but it can be
+accessed via the `getLastJobExecutionResult()` method.
+
+
+[Back to top](#top)
+
+
+DataSet abstraction
+---
+
+The batch processing APIs of Flink are centered around the `DataSet` 
abstraction. A `DataSet` is only
+an abstract representation of a set of data that can contain duplicates.
+
+Also note that Flink does not always physically create (materialize) 
each DataSet at runtime. This 
+depends on the runtime used, the configuration, and optimizer decisions.
+
+The Flink runtime is usually not materializing the DataSets because it is 
using a streaming runtime model.
+DataSets are only materialized to avoid distributed deadlocks (usually at 
points where the data flow 
--- End diff --

Can skip the `usually`... and I would say: `at points where the data flow 
graph branches out and joins again later` instead of `out to join`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572404#comment-14572404
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/774#discussion_r31703697
  
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
 writeMode: WriteMode = WriteMode.NO_OVERWRITE)
 
 def print()
-{% endhighlight %}
 
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that 
in
-a cluster, the result goes to the standard out stream of the cluster nodes 
and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to 
specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and 
also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
 
 
 
 
 
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as their names 
suggest; the third one 
+can be used to specify a custom data output format. Please refer to [Data 
Sinks](#data-sinks) for 
+more information on writing to files and also about custom data output 
formats.
+
+The `print()` method is useful for developing/debugging. It will output 
the contents of the DataSet 
+to standard output (on the JVM starting the Flink execution). **NOTE** The 
behavior of the `print()`
+method changed with Flink 0.9.x. Before, it printed to the log files of 
the workers; now it 
+sends the DataSet results to the client and prints them there.
+
+`collect()` lets you retrieve the DataSet from the cluster into the local 
JVM. The `collect()` method 
+returns a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
+
+
+**NOTE** `print()` and `collect()` retrieve the data from the cluster to 
the client. Currently,
+the data sizes you can retrieve with `collect()` are limited by our 
RPC system. It is not advised
+to collect DataSets larger than 10 MB.
+
+
+Once you have specified the complete program you need to **trigger the program 
execution**. You can call
+`execute()` directly on the `ExecutionEnvironment`, or trigger the execution 
implicitly with
+`collect()` or `print()`.
+Depending on the type of the `ExecutionEnvironment`, the execution will be 
triggered on your local 
+machine or your program will be submitted for execution on a cluster.
+
+Note that you cannot call both `print()` (or `collect()`) and `execute()` 
at the end of a program.
+
+The `execute()` method returns the `JobExecutionResult`, including 
execution times and
+accumulator results. `print()` and `collect()` do not return the 
result, but it can be
+accessed via the `getLastJobExecutionResult()` method.
+
+
+[Back to top](#top)
+
+
+DataSet abstraction
+---
+
+The batch processing APIs of Flink are centered around the `DataSet` 
abstraction. A `DataSet` is only
+an abstract representation of a set of data that can contain duplicates.
+
+Also note that Flink does not always physically create (materialize) 
each DataSet at runtime. This 
+depends on the runtime used, the configuration, and optimizer decisions.
+
+The Flink runtime is usually not materializing the DataSets because it is 
using a streaming runtime model.
+DataSets are only materialized to avoid distributed deadlocks (usually at 
points where the data flow 
--- End diff --

Can skip the `usually`... and I would say: `at points where the data flow 
graph branches out and joins again later` instead of `out to join`.


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572411#comment-14572411
 ] 

ASF GitHub Bot commented on FLINK-2092:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/774#issuecomment-108795427
  
Thanks so much for updating the docs and starting the discussion on the ML. 
If you don't have time to address my comments, just give your feedback and I can 
take care of it. :-)


> Document (new) behavior of print() and execute()
> 
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2092] Adjust documentation for print()/...

2015-06-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/774#issuecomment-108795427
  
Thanks so much for updating the docs and starting the discussion on the ML. 
If you don't have time to address my comments, just give your feedback and I can 
take care of it. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...

2015-06-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/773#issuecomment-108795770
  
OK, thanks for the review. I'm merging this now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [docs/javadoc][hotfix] Corrected Join hint and...

2015-06-04 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/763#issuecomment-108795853
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572413#comment-14572413
 ] 

ASF GitHub Bot commented on FLINK-2134:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/773#issuecomment-108795770
  
OK, thanks for the review. I'm merging this now.


> Deadlock in SuccessAfterNetworkBuffersFailureITCase
> ---
>
> Key: FLINK-2134
> URL: https://issues.apache.org/jira/browse/FLINK-2134
> Project: Flink
>  Issue Type: Bug
>Affects Versions: master
>Reporter: Ufuk Celebi
>
> I ran into the issue in a Travis run for a PR: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt
> I can reproduce this locally by running 
> SuccessAfterNetworkBuffersFailureITCase multiple times:
> {code}
> cluster = new ForkableFlinkMiniCluster(config, false);
> for (int i = 0; i < 100; i++) {
>// run test programs CC, KMeans, CC
> }
> {code}
> The iteration tasks wait for superstep notifications like this:
> {code}
> "Join (Join at 
> runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) 
> (8/6)" daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() 
> [0x000123f2a000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007f89e3440> (a java.lang.Object)
>   at 
> org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
>   - locked <0x0007f89e3440> (a java.lang.Object)
>   at 
> org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. 
> The system needs to be under some load for this to occur after multiple runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2157) Create evaluation framework for ML library

2015-06-04 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2157:


 Summary: Create evaluation framework for ML library
 Key: FLINK-2157
 URL: https://issues.apache.org/jira/browse/FLINK-2157
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann


Currently, FlinkML lacks the means to evaluate the performance of trained models. 
It would be great to add some {{Evaluators}} which can calculate a score 
from the true and predicted labels. This could also be 
used for cross validation to choose the right hyperparameters.

Possible scores could be the F1 score [1], the zero-one loss, etc.

Resources
[1] [http://en.wikipedia.org/wiki/F1_score]
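
As a point of reference, a framework-agnostic sketch of the two scores mentioned above (plain Java; the names and the array-based input are illustrative, not a proposed FlinkML API):

{code}
// Zero-one loss: fraction of predictions that differ from the true labels.
static double zeroOneLoss(int[] truth, int[] predicted) {
    int errors = 0;
    for (int i = 0; i < truth.length; i++) {
        if (truth[i] != predicted[i]) {
            errors++;
        }
    }
    return (double) errors / truth.length;
}

// F1 score for a binary problem where label 1 is the positive class.
static double f1Score(int[] truth, int[] predicted) {
    int tp = 0, fp = 0, fn = 0;
    for (int i = 0; i < truth.length; i++) {
        if (predicted[i] == 1 && truth[i] == 1) tp++;
        if (predicted[i] == 1 && truth[i] == 0) fp++;
        if (predicted[i] == 0 && truth[i] == 1) fn++;
    }
    double precision = tp / (double) (tp + fp);
    double recall = tp / (double) (tp + fn);
    return 2 * precision * recall / (precision + recall);
}

// Example: truth = {1, 0, 1, 1, 0}, predicted = {1, 0, 0, 1, 1}
// gives zeroOneLoss = 0.4 and f1Score = 2/3.
{code}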



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2158) NullPointerException in DateSerializer.

2015-06-04 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-2158:
-

 Summary: NullPointerException in DateSerializer.
 Key: FLINK-2158
 URL: https://issues.apache.org/jira/browse/FLINK-2158
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger


A user reported the following issue to me.
{code}
The data preparation for task 'GroupReduce (GroupReduce at main(XXX.java:96))' 
, caused an error: Error obtaining the sorted input: Thread 'SortMerger Reading 
Thread' terminated due to an exception: null
at 
org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:472)
at 
org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
at 
org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 
'SortMerger Reading Thread' terminated due to an exception: null
at 
org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:607)
at 
org.apache.flink.runtime.operators.RegularPactTask.getInput(RegularPactTask.java:1134)
at 
org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:94)
at 
org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:466)
... 3 more
Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated 
due to an exception: null
at 
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:785)
Caused by: java.lang.NullPointerException
at 
org.apache.flink.api.common.typeutils.base.DateSerializer.deserialize(DateSerializer.java:72)
at 
org.apache.flink.api.common.typeutils.base.DateSerializer.deserialize(DateSerializer.java:28)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:487)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at 
org.apache.flink.runtime.plugable.ReusingDeserializationDelegate.read(ReusingDeserializationDelegate.java:57)
at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:111)
at 
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:64)
at 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
at 
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:1020)
at 
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:781)
{code}

I'm investigating it ...
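
A hedged guess at a reproducer, based on the trace above (the POJO and field names are made up; the pattern suggests a null {{java.util.Date}} field being pushed through the PojoSerializer):

{code}
// Hypothetical POJO; the nullable Date field is the suspected trigger.
public static class Event {
    public java.util.Date timestamp; // deliberately left null
    public String name = "example";
}

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// Spilling/deserializing such records should hit
// DateSerializer.deserialize(...) with a null value.
env.fromElements(new Event(), new Event())
   .groupBy("name")
   .first(1)
   .print();
{code}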



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2134] Close Netty channel via CloseRequ...

2015-06-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/773


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572424#comment-14572424
 ] 

ASF GitHub Bot commented on FLINK-2134:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/773


> Deadlock in SuccessAfterNetworkBuffersFailureITCase
> ---
>
> Key: FLINK-2134
> URL: https://issues.apache.org/jira/browse/FLINK-2134
> Project: Flink
>  Issue Type: Bug
>Affects Versions: master
>Reporter: Ufuk Celebi
>
> I ran into the issue in a Travis run for a PR: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt
> I can reproduce this locally by running 
> SuccessAfterNetworkBuffersFailureITCase multiple times:
> {code}
> cluster = new ForkableFlinkMiniCluster(config, false);
> for (int i = 0; i < 100; i++) {
>// run test programs CC, KMeans, CC
> }
> {code}
> The iteration tasks wait for superstep notifications like this:
> {code}
> "Join (Join at 
> runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) 
> (8/6)" daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() 
> [0x000123f2a000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007f89e3440> (a java.lang.Object)
>   at 
> org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
>   - locked <0x0007f89e3440> (a java.lang.Object)
>   at 
> org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. 
> The system needs to be under some load for this to occur after multiple runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2134) Deadlock in SuccessAfterNetworkBuffersFailureITCase

2015-06-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi resolved FLINK-2134.

   Resolution: Fixed
Fix Version/s: 0.9

Fixed via 0dea359.

> Deadlock in SuccessAfterNetworkBuffersFailureITCase
> ---
>
> Key: FLINK-2134
> URL: https://issues.apache.org/jira/browse/FLINK-2134
> Project: Flink
>  Issue Type: Bug
>Affects Versions: master
>Reporter: Ufuk Celebi
> Fix For: 0.9
>
>
> I ran into the issue in a Travis run for a PR: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/64994288/log.txt
> I can reproduce this locally by running 
> SuccessAfterNetworkBuffersFailureITCase multiple times:
> {code}
> cluster = new ForkableFlinkMiniCluster(config, false);
> for (int i = 0; i < 100; i++) {
>// run test programs CC, KMeans, CC
> }
> {code}
> The iteration tasks wait for superstep notifications like this:
> {code}
> "Join (Join at 
> runConnectedComponents(SuccessAfterNetworkBuffersFailureITCase.java:128)) 
> (8/6)" daemon prio=5 tid=0x7f95f374f800 nid=0x138a7 in Object.wait() 
> [0x000123f2a000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007f89e3440> (a java.lang.Object)
>   at 
> org.apache.flink.runtime.iterative.concurrent.SuperstepKickoffLatch.awaitStartOfSuperstepOrTermination(SuperstepKickoffLatch.java:57)
>   - locked <0x0007f89e3440> (a java.lang.Object)
>   at 
> org.apache.flink.runtime.iterative.task.IterationTailPactTask.run(IterationTailPactTask.java:131)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I've asked [~rmetzger] to reproduce this and it deadlocks for him as well. 
> The system needs to be under some load for this to occur after multiple runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2159) SimpleRecoveryITCase fails

2015-06-04 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2159:
--

 Summary: SimpleRecoveryITCase fails
 Key: FLINK-2159
 URL: https://issues.apache.org/jira/browse/FLINK-2159
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: master
Reporter: Ufuk Celebi


https://s3.amazonaws.com/archive.travis-ci.org/jobs/65257722/log.txt

{code}
Caused by: 
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not 
enough free slots available to run the job. You can decrease the operator 
parallelism or increase the number of slots per TaskManager in the 
configuration. Task to schedule: < Attempt #5 (DataSource (at 
testRestartMultipleTimes(SimpleRecoveryITCase.java:208) 
(org.apache.flink.api.java.io.ParallelIteratorInputFormat)) (3/4)) @ 
(unassigned) - [SCHEDULED] > with groupID < 20c83c1167381bfe663661879f72d07f > 
in sharing group < SlotSharingGroup [e9782d396600feda909e1fe6b16d3aaf, 
22ffd0dc2c2845bcb0a8d0ba53f68672, 61a229789b0edde1e0acc83e6d2a996b, 
20c83c1167381bfe663661879f72d07f] >. Resources available to scheduler: Number 
of instances=1, total number of slots=2, available slots=0
{code}
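
For anyone hitting this outside of the flaky test, the two remedies named in the message map to a config entry and an API call (illustrative values, not a fix for the test itself):

{code}
// flink-conf.yaml: raise the slots per TaskManager, e.g.
//     taskmanager.numberOfTaskSlots: 4
// ...or lower the program's parallelism below the available slot count:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);
{code}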



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2135) Java plan translation fails with ClassCastException (probably in first())

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572448#comment-14572448
 ] 

ASF GitHub Bot commented on FLINK-2135:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/769#issuecomment-108808078
  
+1

Get this in soon, please!


> Java plan translation fails with ClassCastException (probably in first())
> -
>
> Key: FLINK-2135
> URL: https://issues.apache.org/jira/browse/FLINK-2135
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> A user reported the following error
> {code}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.flink.api.java.functions.FirstReducer cannot be cast to 
> org.apache.flink.api.common.functions.RichGroupReduceFunction
>   at 
> org.apache.flink.api.java.operators.translation.PlanUnwrappingSortedReduceGroupOperator.<init>(PlanUnwrappingSortedReduceGroupOperator.java:40)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateSelectorFunctionSortedReducer(GroupReduceOperator.java:278)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:177)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:50)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:124)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:61)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:925)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:893)
>   at 
> org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:50)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:411)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1346)
>   at com.dataartisans.GroupReduceBug.main(GroupReduceBug.java:43)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> {code}
> It is reproducible with the following code
> {code}
>   ExecutionEnvironment ee = 
> ExecutionEnvironment.getExecutionEnvironment();
>   DataSet<String> b = ee.fromElements("a", "b");
>   GroupReduceOperator<String, String> a = b.groupBy(new 
> KeySelector<String, Long>() {
>   @Override
>   public Long getKey(String value) throws Exception {
>   return 1L;
>   }
>   }).sortGroup(new KeySelector<String, Double>() {
>   @Override
>   public Double getKey(String value) throws Exception {
>   return 1.0;
>   }
>   }, Order.DESCENDING).first(10);
>   a.print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2135] Fix faulty cast to GroupReduceFun...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/769#issuecomment-108808078
  
+1

Get this in soon, please!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2126] Fixed Flink scala shell ITSuite s...

2015-06-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108808536
  
...merge race :D Can you change the commit msg to include the issue ID 
please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2126] Fixed Flink scala shell ITSuite s...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108808452
  
+1

Having the correct result is the main validation we need. 

Merging this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572450#comment-14572450
 ] 

ASF GitHub Bot commented on FLINK-2126:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108808452
  
+1

Having the correct result is the main validation we need. 

Merging this...


> Scala shell tests sporadically fail on travis
> -
>
> Key: FLINK-2126
> URL: https://issues.apache.org/jira/browse/FLINK-2126
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> See https://travis-ci.org/rmetzger/flink/jobs/64893149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572451#comment-14572451
 ] 

ASF GitHub Bot commented on FLINK-2126:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108808536
  
...merge race :D Can you change the commit msg to include the issue ID 
please?


> Scala shell tests sporadically fail on travis
> -
>
> Key: FLINK-2126
> URL: https://issues.apache.org/jira/browse/FLINK-2126
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> See https://travis-ci.org/rmetzger/flink/jobs/64893149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2160) Change Streaming Source Interface to run(Context)/cancel()

2015-06-04 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2160:
---

 Summary: Change Streaming Source Interface to run(Context)/cancel()
 Key: FLINK-2160
 URL: https://issues.apache.org/jira/browse/FLINK-2160
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


This will make the source interface more extensible in the future without 
breaking existing sources (after the change). Right now, the context would have 
methods for element emission and for retrieving the checkpoint lock for 
checkpointed sources. In the future this can be extended to allow emission of 
elements with timestamps and to deal with watermark emission.
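
A rough sketch of the proposed shape, inferred from the description above (all names are assumptions, not a final API):

{code}
public interface SourceFunction<T> {

    // Runs the source, emitting elements through the context until the
    // source is exhausted or cancel() is called.
    void run(SourceContext<T> ctx) throws Exception;

    // Asks the source to break out of its run() loop.
    void cancel();

    // Extensible context: timestamps, watermarks, etc. can be added here
    // later without breaking existing sources.
    interface SourceContext<T> {
        void collect(T element);
        Object getCheckpointLock();
    }
}
{code}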



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-04 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/729#discussion_r31706246
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java
 ---
@@ -54,8 +54,11 @@
 
private Map> broadcastVariables;
 
+   // NOTE: only set this variable via setSemanticProperties()
--- End diff --

Just a quick question (haven't checked the code). Does the analyzer also 
respect semantic information provided via the Operator API 
(withForwardedFields()), i.e., not via function annotations?
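
For reference, the Operator API route looks like this (a sketch; the mapper is made up):

{code}
env.fromElements(new Tuple2<>(1L, "a"), new Tuple2<>(2L, "b"))
   .map(new MapFunction<Tuple2<Long, String>, Tuple2<Long, String>>() {
       @Override
       public Tuple2<Long, String> map(Tuple2<Long, String> v) {
           return new Tuple2<>(v.f0, v.f1.toUpperCase());
       }
   })
   .withForwardedFields("f0") // f0 passes through unchanged, declared without an annotation
   .print();
{code}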


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-108808990
  
The build succeeds. :) I will have a look at the changes. Thanks for not 
force updating this PR.

I will test it in a distributed setup and if everything runs fine, we can 
merge this. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572455#comment-14572455
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/729#issuecomment-108808990
  
The build succeeds. :) I will have a look at the changes. Thanks for not 
force updating this PR.

I will test it in a distributed setup and if everything runs fine, we can 
merge this. :-)


> Add static code analysis for UDFs
> -
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
>  Issue Type: New Feature
>  Components: Java API, Scala API
>Reporter: Stephan Ewen
>Assignee: Timo Walther
>Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of 
> the input elements are accessed, modified, or forwarded/copied. This 
> information frequently helps to reuse partitionings, sorts, etc. It may speed 
> up programs significantly, as it can frequently eliminate sorts and shuffles, 
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this 
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this 
> information automatically. This is an incredible feature, as it "magically" 
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
> works surprisingly well in many cases. We used the "Soot" toolkit for the 
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in 
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
> simply download a special "flink-code-analysis.jar" and drop it into the 
> "lib" folder to enable this functionality. We may even add a script to 
> "tools" that downloads that library automatically into the lib folder. This 
> should be legally fine, since we do not redistribute LGPL code and only 
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could 
> provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: 
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: 
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
> (Section 6)
> Optimizer for Iterations: 
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
> (Sections 4.3 and 5.3)
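
For context, this is roughly what such an annotation looks like on a UDF in the current Java API, where it is called {{ForwardedFields}} (a sketch with made-up types, not code from the PR):

{code}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.FunctionAnnotation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;

// Input f0 is forwarded unchanged; input f2 moves to output position f1.
// With this, the optimizer can reuse existing partitionings and sort orders.
@FunctionAnnotation.ForwardedFields({"f0", "f2->f1"})
public class ProjectingMapper
        implements MapFunction<Tuple3<Integer, String, Long>, Tuple2<Integer, Long>> {
    @Override
    public Tuple2<Integer, Long> map(Tuple3<Integer, String, Long> in) {
        return new Tuple2<>(in.f0, in.f2);
    }
}
{code}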



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572454#comment-14572454
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/729#discussion_r31706246
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java
 ---
@@ -54,8 +54,11 @@
 
private Map> broadcastVariables;
 
+   // NOTE: only set this variable via setSemanticProperties()
--- End diff --

Just a quick question (haven't checked the code). Does the analyzer also 
respect semantic information provided via the Operator API 
(withForwardedFields()), i.e., not via function annotations?


> Add static code analysis for UDFs
> -
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
>  Issue Type: New Feature
>  Components: Java API, Scala API
>Reporter: Stephan Ewen
>Assignee: Timo Walther
>Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of 
> the input elements are accessed, modified, or forwarded/copied. This 
> information frequently helps to reuse partitionings, sorts, etc. It may speed 
> up programs significantly, as it can frequently eliminate sorts and shuffles, 
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this 
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this 
> information automatically. This is an incredible feature, as it "magically" 
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
> works surprisingly well in many cases. We used the "Soot" toolkit for the 
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in 
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
> simply download a special "flink-code-analysis.jar" and drop it into the 
> "lib" folder to enable this functionality. We may even add a script to 
> "tools" that downloads that library automatically into the lib folder. This 
> should be legally fine, since we do not redistribute LGPL code and only 
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could 
> provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: 
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: 
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
> (Section 6)
> Optimizer for Iterations: 
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
> (Sections 4.3 and 5.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Remove extra HTML tags in TypeInformation Java...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/766#issuecomment-108809645
  
Merging...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [docs/javadoc][hotfix] Corrected Join hint and...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/763#issuecomment-108810129
  
Merging...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2135] Fix faulty cast to GroupReduceFun...

2015-06-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/769#issuecomment-108810422
  
Okay, will merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2135) Java plan translation fails with ClassCastException (probably in first())

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572461#comment-14572461
 ] 

ASF GitHub Bot commented on FLINK-2135:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/769#issuecomment-108810422
  
Okay, will merge


> Java plan translation fails with ClassCastException (probably in first())
> -
>
> Key: FLINK-2135
> URL: https://issues.apache.org/jira/browse/FLINK-2135
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> A user reported the following error
> {code}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.flink.api.java.functions.FirstReducer cannot be cast to 
> org.apache.flink.api.common.functions.RichGroupReduceFunction
>   at 
> org.apache.flink.api.java.operators.translation.PlanUnwrappingSortedReduceGroupOperator.<init>(PlanUnwrappingSortedReduceGroupOperator.java:40)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateSelectorFunctionSortedReducer(GroupReduceOperator.java:278)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:177)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:50)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:124)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:61)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:925)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:893)
>   at 
> org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:50)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:411)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1346)
>   at com.dataartisans.GroupReduceBug.main(GroupReduceBug.java:43)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> {code}
> It is reproducible with the following code
> {code}
>   ExecutionEnvironment ee = 
> ExecutionEnvironment.getExecutionEnvironment();
>   DataSet<String> b = ee.fromElements("a", "b");
>   GroupReduceOperator<String, String> a = b.groupBy(new 
> KeySelector<String, Long>() {
>   @Override
>   public Long getKey(String value) throws Exception {
>   return 1L;
>   }
>   }).sortGroup(new KeySelector<String, Double>() {
>   @Override
>   public Double getKey(String value) throws Exception {
>   return 1.0;
>   }
>   }, Order.DESCENDING).first(10);
>   a.print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2160) Change Streaming Source Interface to run(Context)/cancel()

2015-06-04 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572462#comment-14572462
 ] 

Gyula Fora commented on FLINK-2160:
---

The context is accessible from the open() method before run() is called. Why do 
we need the context in run()?

> Change Streaming Source Interface to run(Context)/cancel()
> --
>
> Key: FLINK-2160
> URL: https://issues.apache.org/jira/browse/FLINK-2160
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This will make the source interface more extensible in the future without 
> breaking existing sources (after the change). Right now, the context would 
> have methods for element emission and for retrieving the checkpoint lock for 
> checkpointed sources. In the future this can be extended to allow emission of 
> elements with timestamps and to deal with watermark emission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108812916
  
:+1: This has been requested multiple times now. I would merge your pull 
request. Can you add some documentation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-04 Thread twalthr
Github user twalthr commented on a diff in the pull request:

https://github.com/apache/flink/pull/729#discussion_r31707016
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java
 ---
@@ -54,8 +54,11 @@
 
private Map> broadcastVariables;
 
+   // NOTE: only set this variable via setSemanticProperties()
--- End diff --

Yes, I have also added a test for that (see 
SemanticPropertiesPrecendenceTest).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572472#comment-14572472
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user twalthr commented on a diff in the pull request:

https://github.com/apache/flink/pull/729#discussion_r31707016
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java
 ---
@@ -54,8 +54,11 @@
 
private Map> broadcastVariables;
 
+   // NOTE: only set this variable via setSemanticProperties()
--- End diff --

Yes, I have also added a test for that (see 
SemanticPropertiesPrecendenceTest).


> Add static code analysis for UDFs
> -
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
>  Issue Type: New Feature
>  Components: Java API, Scala API
>Reporter: Stephan Ewen
>Assignee: Timo Walther
>Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of 
> the input elements are accessed, modified, or forwarded/copied. This 
> information frequently helps to reuse partitionings, sorts, etc. It may speed 
> up programs significantly, as it can frequently eliminate sorts and shuffles, 
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this 
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this 
> information automatically. This is an incredible feature, as it "magically" 
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
> works surprisingly well in many cases. We used the "Soot" toolkit for the 
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in 
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
> simply download a special "flink-code-analysis.jar" and drop it into the 
> "lib" folder to enable this functionality. We may even add a script to 
> "tools" that downloads that library automatically into the lib folder. This 
> should be legally fine, since we do not redistribute LGPL code and only 
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could 
> provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: 
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: 
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
> (Section 6)
> Optimizer for Iterations: 
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
> (Sections 4.3 and 5.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
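
For illustration, the kind of semantic annotation the proposed analysis would derive automatically: a minimal sketch using the @ForwardedFields annotation of the Java API, with a made-up UDF. The annotation tells the optimizer that field 0 survives the map unchanged, so an existing partitioning or sort order on it need not be re-established.

{code}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
import org.apache.flink.api.java.tuple.Tuple2;

// Field 0 is copied unchanged from input to output; field 1 is modified.
@ForwardedFields("f0")
public class ScaleSecondField
        implements MapFunction<Tuple2<Long, Double>, Tuple2<Long, Double>> {

    @Override
    public Tuple2<Long, Double> map(Tuple2<Long, Double> value) {
        return new Tuple2<>(value.f0, value.f1 * 2.0);
    }
}
{code}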


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572471#comment-14572471
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108812916
  
:+1: This has been requested multiple times now. I would merge your pull 
request. Can you add some documentation?


> Add GZip support
> 
>
> Key: FLINK-1981
> URL: https://issues.apache.org/jira/browse/FLINK-1981
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> GZip, as a commonly used compression format, should be supported in the same 
> way as the already supported deflate files. This allows to use GZip files 
> with any subclass of FileInputFormat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
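
What GZip support looks like from the user side, as a minimal sketch. It assumes the decompressor is picked by file extension, as is already the case for .deflate files; the path is a placeholder.

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ReadGzippedText {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // With GZip support in FileInputFormat, a .gz file is read
        // transparently, just like an uncompressed text file.
        DataSet<String> lines = env.readTextFile("hdfs:///logs/events.txt.gz");

        lines.first(10).print();
    }
}
{code}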


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-06-04 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572475#comment-14572475
 ] 

Faye Beligianni commented on FLINK-1731:


Hello [~peedeeX21], 

I am interested in using your implementation of the k-Means algorithm for some 
experiments that I am going to run for my master's thesis.
If that's not a problem, I could pull it from your repository. 

Thank you, 
Faye
 

> Add kMeans clustering algorithm to machine learning library
> ---
>
> Key: FLINK-1731
> URL: https://issues.apache.org/jira/browse/FLINK-1731
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Peter Schrott
>  Labels: ML
>
> The Flink repository already contains a kMeans implementation but it is not 
> yet ported to the machine learning library. I assume that only the used data 
> types have to be adapted and then it can be more or less directly moved to 
> flink-ml.
> The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
> implementation because they improve the initial seeding phase to achieve 
> near-optimal clustering. It might be worthwhile to implement kMeans||.
> Resources:
> [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
> [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
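
On the seeding idea referenced above, a plain-Java sketch of kMeans++ initialization (not FlinkML code; all names are made up): after a uniformly random first center, each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class KMeansPlusPlusSeeding {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static List<double[]> chooseCenters(double[][] points, int k, Random rnd) {
        List<double[]> centers = new ArrayList<>();
        // First center: picked uniformly at random.
        centers.add(points[rnd.nextInt(points.length)]);

        while (centers.size() < k) {
            // Cost of each point = squared distance to its closest center.
            double[] cost = new double[points.length];
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                double best = Double.MAX_VALUE;
                for (double[] c : centers) {
                    best = Math.min(best, squaredDistance(points[i], c));
                }
                cost[i] = best;
                total += best;
            }
            // Sample the next center with probability proportional to its cost.
            double r = rnd.nextDouble() * total;
            int next = 0;
            double acc = cost[0];
            while (acc < r && next < points.length - 1) {
                next++;
                acc += cost[next];
            }
            centers.add(points[next]);
        }
        return centers;
    }
}
{code}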


[jira] [Commented] (FLINK-2160) Change Streaming Source Interface to run(Context)/cancel()

2015-06-04 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572477#comment-14572477
 ] 

Aljoscha Krettek commented on FLINK-2160:
-

As I said in the description, this is a special context that allows the sources 
to communicate with the outside world in an extensible way. This is not related 
to the RuntimeContext that is available to RichFunctions.

> Change Streaming Source Interface to run(Context)/cancel()
> --
>
> Key: FLINK-2160
> URL: https://issues.apache.org/jira/browse/FLINK-2160
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This will make the source interface more extensible in the future without 
> breaking existing sources (after the change). Right now, the context would 
> have methods for element emission and for retrieving the checkpoint lock for 
> checkpointed sources. In the future this can be extended to allow emission of 
> elements with timestamp and for dealing with watermark emission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
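
A rough sketch of what such a source interface could look like, inferred only from the issue description. The names SourceContext, collect() and getCheckpointLock() are assumptions, not a final API.

{code}
public interface SourceFunction<T> {

    // Runs the source; all interaction with the runtime goes through ctx,
    // which keeps this interface stable when capabilities are added later
    // (e.g. emitting elements with timestamps, or watermarks).
    void run(SourceContext<T> ctx) throws Exception;

    // Signals the source to leave its run() loop.
    void cancel();

    interface SourceContext<T> {
        // Element emission goes through the context ...
        void collect(T element);
        // ... as does coordination with checkpointing.
        Object getCheckpointLock();
    }
}
{code}

Routing emission through the context, rather than through the source itself, is what makes the design extensible: new context methods can be added without breaking existing sources.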


[jira] [Commented] (FLINK-2160) Change Streaming Source Interface to run(Context)/cancel()

2015-06-04 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572478#comment-14572478
 ] 

Gyula Fora commented on FLINK-2160:
---

Ah, sorry :)

> Change Streaming Source Interface to run(Context)/cancel()
> --
>
> Key: FLINK-2160
> URL: https://issues.apache.org/jira/browse/FLINK-2160
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This will make the source interface more extensible in the future without 
> breaking existing sources (after the change). Right now, the context would 
> have methods for element emission and for retrieving the checkpoint lock for 
> checkpointed sources. In the future this can be extended to allow emission of 
> elements with timestamp and for dealing with watermark emission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2116) Make pipeline extension require less coding

2015-06-04 Thread Mikio Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572481#comment-14572481
 ] 

Mikio Braun commented on FLINK-2116:


I think if you look at MultipleLinearRegression, the code is now much better 
because you can focus on the kind of prediction you want to compute. One comment 
here: allowing predictions for only one element at a time might be less efficient 
than computing the prediction for a whole set of elements, which can often be 
written down in matrix algebra. But that probably doesn't fit into the Flink 
framework as well.

On the other hand, the whole Predictor/implicit business is still incredibly 
noisy in terms of boilerplate code. That's probably a personal preference, but 
when you spend 10 out of 15 lines on setting up stuff for the compiler's type 
system, I get the feeling that this is not the right way to do it. I understand 
how it works, of course, but still... Maybe using shorter variable names for the 
type parameters would help, I don't know.


> Make pipeline extension require less coding
> ---
>
> Key: FLINK-2116
> URL: https://issues.apache.org/jira/browse/FLINK-2116
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Mikio Braun
>Assignee: Till Rohrmann
>Priority: Minor
>
> Right now, implementing methods from the pipelines for new types, or even 
> adding new methods to pipelines requires many steps:
> 1) implementing methods for new types:
>   - implement an implicit of the corresponding class encapsulating the operation 
> in the companion object
> 2) adding methods to the pipeline:
>   - add a method
>   - add a trait for the operation
>   - implement an implicit in the companion object
> These are all objects which contain many generic parameters, so reducing the 
> work would be great.
> The goal should be that you can really focus on the code to add, and have as 
> little boilerplate code as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2161) Flink Scala Shell does not support external jars (e.g. Gelly, FlinkML)

2015-06-04 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2161:


 Summary: Flink Scala Shell does not support external jars (e.g. 
Gelly, FlinkML)
 Key: FLINK-2161
 URL: https://issues.apache.org/jira/browse/FLINK-2161
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann


Currently, there is no easy way to load and ship external libraries/jars with 
Flink's Scala shell. Assume that you want to run some Gelly graph algorithms 
from within the Scala shell; then you have to put the Gelly jar manually into the 
lib directory and make sure that this jar is also available on your cluster, 
because it is not shipped with the user code. 

It would be good to have a simple mechanism for specifying additional jars upon 
startup of the Scala shell. These jars should then also be shipped to the 
cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2160) Change Streaming Source Interface to run(Context)/cancel()

2015-06-04 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572483#comment-14572483
 ] 

Aljoscha Krettek commented on FLINK-2160:
-

No problemo :)

> Change Streaming Source Interface to run(Context)/cancel()
> --
>
> Key: FLINK-2160
> URL: https://issues.apache.org/jira/browse/FLINK-2160
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This will make the source interface more extensible in the future without 
> breaking existing sources (after the change). Right now, the context would 
> have methods for element emission and for retrieving the checkpoint lock for 
> checkpointed sources. In the future this can be extended to allow emission of 
> elements with timestamp and for dealing with watermark emission.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2126] Fixed Flink scala shell ITSuite s...

2015-06-04 Thread nikste
Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108823811
  
Changed to include the issue ID, 3..2..1 GO !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572496#comment-14572496
 ] 

ASF GitHub Bot commented on FLINK-2126:
---

Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/768#issuecomment-108823811
  
Changed to include the issue ID, 3..2..1 GO !


> Scala shell tests sporadically fail on travis
> -
>
> Key: FLINK-2126
> URL: https://issues.apache.org/jira/browse/FLINK-2126
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> See https://travis-ci.org/rmetzger/flink/jobs/64893149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2076) Bug in re-openable hash join

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572509#comment-14572509
 ] 

ASF GitHub Bot commented on FLINK-2076:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-108830359
  
Looks good, thanks a lot!

Will merge this...


> Bug in re-openable hash join
> 
>
> Key: FLINK-2076
> URL: https://issues.apache.org/jira/browse/FLINK-2076
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Chiwan Park
>
> It happens deterministically in my machine with the following setup:
> TaskManager:
>   - heap size: 512m
>   - network buffers: 4096
>   - slots: 32
> Job:
>   - ConnectedComponents
>   - 100k vertices
>   - 1.2m edges
> --> this gives around 260 MB of Flink managed memory; across 32 slots that is 
> 8 MB per slot, which, with several memory consumers in the job, makes the 
> iterative hash join go out-of-core
> {code}
> java.lang.RuntimeException: Hash Join bug in memory management: 
> Memory buffers leaked.
>   at 
> org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733)
>   at 
> org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
>   at 
> org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
>   at 
> org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
>   at 
> org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102)
>   at 
> org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
>   at 
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
>   at 
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-108830359
  
Looks good, thanks a lot!

Will merge this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: From stream to Hbase

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/706#issuecomment-108836071
  
Will merge this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-108839757
  
I was originally thinking of merging this after #681, but I may have to 
merge it by itself to get it into the release.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572532#comment-14572532
 ] 

ASF GitHub Bot commented on FLINK-1789:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-108839757
  
I was originally thinking of merging this after #681, but I may have to 
merge it by itself to get it into the release.


> Allow adding of URLs to the usercode class loader
> -
>
> Key: FLINK-1789
> URL: https://issues.apache.org/jira/browse/FLINK-1789
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Minor
>
> Currently, there is no option to add custom classpath URLs to the 
> FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even 
> if they are already present on all nodes.
> It would be great if RemoteEnvironment also accepted valid classpath URLs and 
> forwarded them to BlobLibraryCacheManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
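
For reference, a minimal sketch of the status quo that the issue wants to relax (host, port and jar path are placeholders): every jar passed to createRemoteEnvironment is shipped to the cluster, even if it already sits on every node.

{code}
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteJob {
    public static void main(String[] args) throws Exception {
        // The jar is always uploaded and shipped with the job; the proposal
        // is to additionally accept classpath URLs for jars that are already
        // present on all nodes, so no shipping is needed.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 6123, "/path/to/udfs.jar");

        env.fromElements(1, 2, 3).print();
    }
}
{code}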


[jira] [Commented] (FLINK-2119) Add ExecutionGraph support for leg-wise scheduling

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572542#comment-14572542
 ] 

ASF GitHub Bot commented on FLINK-2119:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/754#issuecomment-108843486
  
The hook that starts the next batch source is at the level of the 
`Execution`. That means that as soon as one execution is finished, the next leg 
would start.

I think that would only work if the leg does not by itself trigger 
multiple successors (because it has an intermediate shuffle). Otherwise, we 
would again immediately exceed the number of slots.

Can we put the hook onto the intermediate result, and have it trigger an 
entire "slot sharing group" as soon as it is fully available? My initial 
thought is that this would support the above case better.


> Add ExecutionGraph support for leg-wise scheduling
> --
>
> Key: FLINK-2119
> URL: https://issues.apache.org/jira/browse/FLINK-2119
> Project: Flink
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: master
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Scheduling currently happens by lazily unrolling the ExecutionGraph from the 
> sources.
> 1. All sources are scheduled for execution.
> 2. Their results trigger scheduling and deployment of the receiving tasks 
> (either on the first available buffer or when all are produced [pipelined vs. 
> blocking exchange]).
> For certain batch jobs this can be problematic, as many tasks will be running 
> at the same time and consume task manager resources like execution slots and 
> memory. For these jobs, it is desirable to schedule the ExecutionGraph 
> with different strategies.
> With respect to the ExecutionGraph, the current limitation is that data 
> availability for a result always triggers scheduling of the consuming tasks. 
> This needs to be more general to allow different scheduling strategies.
> Consider the following example:
> {code}
>   [ union ]
>  / \
> /   \
>   [ source 1 ]  [ source 2 ]
> {code}
> Currently, both sources are scheduled concurrently and the "faster" one 
> triggers scheduling of the union. It is desirable to first allow source 1 to 
> completely produce its result, then trigger scheduling of source 2, and only 
> then schedule the union.
> The required changes in the ExecutionGraph are conceptually straightforward: 
> instead of going through the list of result consumers and scheduling them, we 
> need to be able to run a more general action. For normal operation, this will 
> still schedule the consumer task, but we can also configure it to kick off the 
> next source etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2119] Add ExecutionGraph support for ba...

2015-06-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/754#issuecomment-108843486
  
The hook that starts the next batch source is at the level of the 
`Execution`. That means that as soon as one execution is finished, the next leg 
would start.

I think that would only work if the leg does not by itself trigger 
multiple successors (because it has an intermediate shuffle). Otherwise, we 
would again immediately exceed the number of slots.

Can we put the hook onto the intermediate result, and have it trigger an 
entire "slot sharing group" as soon as it is fully available? My initial 
thought is that this would support the above case better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread sekruse
Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108844255
  
Sure, I can do that. Are you talking about user documentation or rather 
Javadocs? And if the former, where would I preferably put that 
documentation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572545#comment-14572545
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108844255
  
Sure, I can do that. Are you talking about user documentation or rather 
Javadocs? And if the former, where would I preferably put that 
documentation?


> Add GZip support
> 
>
> Key: FLINK-1981
> URL: https://issues.apache.org/jira/browse/FLINK-1981
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> GZip, as a commonly used compression format, should be supported in the same 
> way as the already supported deflate files. This allows to use GZip files 
> with any subclass of FileInputFormat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845395
  
I'm talking about the user documentation. You could mention support for 
gzip and add an example here: 
http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sources


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572550#comment-14572550
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845395
  
I'm talking about the user documentation. You could mention support for 
gzip and add an example here: 
http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sources


> Add GZip support
> 
>
> Key: FLINK-1981
> URL: https://issues.apache.org/jira/browse/FLINK-1981
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> GZip, as a commonly used compression format, should be supported in the same 
> way as the already supported deflate files. This allows to use GZip files 
> with any subclass of FileInputFormat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2135] Fix faulty cast to GroupReduceFun...

2015-06-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/769


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845535
  
You can modify the documentation in the `docs/apis/programming_guide.md` 
file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2135) Java plan translation fails with ClassCastException (probably in first())

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572551#comment-14572551
 ] 

ASF GitHub Bot commented on FLINK-2135:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/769


> Java plan translation fails with ClassCastException (probably in first())
> -
>
> Key: FLINK-2135
> URL: https://issues.apache.org/jira/browse/FLINK-2135
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> A user reported the following error
> {code}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.flink.api.java.functions.FirstReducer cannot be cast to 
> org.apache.flink.api.common.functions.RichGroupReduceFunction
>   at 
> org.apache.flink.api.java.operators.translation.PlanUnwrappingSortedReduceGroupOperator.<init>(PlanUnwrappingSortedReduceGroupOperator.java:40)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateSelectorFunctionSortedReducer(GroupReduceOperator.java:278)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:177)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:50)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:124)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:61)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:925)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:893)
>   at 
> org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:50)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:411)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1346)
>   at com.dataartisans.GroupReduceBug.main(GroupReduceBug.java:43)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> {code}
> It is reproducible with the following code
> {code}
>   ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();
>   DataSet<String> b = ee.fromElements("a", "b");
>   GroupReduceOperator<String, String> a = b.groupBy(new KeySelector<String, Long>() {
>           @Override
>           public Long getKey(String value) throws Exception {
>                   return 1L;
>           }
>   }).sortGroup(new KeySelector<String, Double>() {
>           @Override
>           public Double getKey(String value) throws Exception {
>                   return 1.0;
>           }
>   }, Order.DESCENDING).first(10);
>   a.print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2135) Java plan translation fails with ClassCastException (probably in first())

2015-06-04 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-2135.
---
   Resolution: Fixed
Fix Version/s: 0.9

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/0081fb2e

> Java plan translation fails with ClassCastException (probably in first())
> -
>
> Key: FLINK-2135
> URL: https://issues.apache.org/jira/browse/FLINK-2135
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9
>
>
> A user reported the following error
> {code}
> Exception in thread "main" java.lang.ClassCastException: 
> org.apache.flink.api.java.functions.FirstReducer cannot be cast to 
> org.apache.flink.api.common.functions.RichGroupReduceFunction
>   at 
> org.apache.flink.api.java.operators.translation.PlanUnwrappingSortedReduceGroupOperator.<init>(PlanUnwrappingSortedReduceGroupOperator.java:40)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateSelectorFunctionSortedReducer(GroupReduceOperator.java:278)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:177)
>   at 
> org.apache.flink.api.java.operators.GroupReduceOperator.translateToDataFlow(GroupReduceOperator.java:50)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:124)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateSingleInputOperator(OperatorTranslation.java:122)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:86)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translate(OperatorTranslation.java:61)
>   at 
> org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:925)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:893)
>   at 
> org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:50)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:411)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1346)
>   at com.dataartisans.GroupReduceBug.main(GroupReduceBug.java:43)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> {code}
> It is reproducible with the following code
> {code}
>   ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();
>   DataSet<String> b = ee.fromElements("a", "b");
>   GroupReduceOperator<String, String> a = b.groupBy(new KeySelector<String, Long>() {
>           @Override
>           public Long getKey(String value) throws Exception {
>                   return 1L;
>           }
>   }).sortGroup(new KeySelector<String, Double>() {
>           @Override
>           public Double getKey(String value) throws Exception {
>                   return 1.0;
>           }
>   }, Order.DESCENDING).first(10);
>   a.print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1981) Add GZip support

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572553#comment-14572553
 ] 

ASF GitHub Bot commented on FLINK-1981:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845535
  
You can modify the documentation in the `docs/apis/programming_guide.md` 
file.


> Add GZip support
> 
>
> Key: FLINK-1981
> URL: https://issues.apache.org/jira/browse/FLINK-1981
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> GZip, as a commonly used compression format, should be supported in the same 
> way as the already supported deflate files. This allows to use GZip files 
> with any subclass of FileInputFormat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-06-04 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/729#discussion_r31711766
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java
 ---
@@ -309,15 +308,20 @@ public static DualInputSemanticProperties getSemanticPropsDual(
 		getSemanticPropsDualFromString(result, forwardedFirst, forwardedSecond,
 				nonForwardedFirst, nonForwardedSecond, readFirst, readSecond, inType1, inType2, outType);
 		return result;
-	} else {
-		return new DualInputSemanticProperties();
 	}
+	return null;
+}
+
+public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result,
+		String[] forwarded, String[] nonForwarded, String[] readSet,
+		TypeInformation<?> inType, TypeInformation<?> outType) {
+	getSemanticPropsSingleFromString(result, forwarded, nonForwarded, readSet, inType, outType, false);
+}
 
 public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result,
 		String[] forwarded, String[] nonForwarded, String[] readSet,
-		TypeInformation<?> inType, TypeInformation<?> outType)
-	{
+		TypeInformation<?> inType, TypeInformation<?> outType,
+		boolean skipIncompatibleTypes) {
--- End diff --

Can you explain a bit why you introduced this flag? 
Why should it be possible to skip the compatibility checks?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572561#comment-14572561
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/729#discussion_r31711766
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java
 ---
@@ -309,15 +308,20 @@ public static DualInputSemanticProperties getSemanticPropsDual(
 		getSemanticPropsDualFromString(result, forwardedFirst, forwardedSecond,
 				nonForwardedFirst, nonForwardedSecond, readFirst, readSecond, inType1, inType2, outType);
 		return result;
-	} else {
-		return new DualInputSemanticProperties();
 	}
+	return null;
+}
+
+public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result,
+		String[] forwarded, String[] nonForwarded, String[] readSet,
+		TypeInformation<?> inType, TypeInformation<?> outType) {
+	getSemanticPropsSingleFromString(result, forwarded, nonForwarded, readSet, inType, outType, false);
+}
 
 public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result,
 		String[] forwarded, String[] nonForwarded, String[] readSet,
-		TypeInformation<?> inType, TypeInformation<?> outType)
-	{
+		TypeInformation<?> inType, TypeInformation<?> outType,
+		boolean skipIncompatibleTypes) {
--- End diff --

Can you explain a bit why you introduced this flag? 
Why should it be possible to skip the compatibility checks?


> Add static code analysis for UDFs
> -
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
>  Issue Type: New Feature
>  Components: Java API, Scala API
>Reporter: Stephan Ewen
>Assignee: Timo Walther
>Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of 
> the input elements are accessed, modified, or forwarded/copied. This 
> information frequently helps to reuse partitionings, sorts, etc. It may speed 
> up programs significantly, as it can frequently eliminate sorts and shuffles, 
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this 
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this 
> information automatically. This is an incredible feature, as it "magically" 
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
> works surprisingly well in many cases. We used the "Soot" toolkit for the 
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in 
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
> simply download a special "flink-code-analysis.jar" and drop it into the 
> "lib" folder to enable this functionality. We may even add a script to 
> "tools" that downloads that library automatically into the lib folder. This 
> should be legally fine, since we do not redistribute LGPL code and only 
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could 
> provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: 
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: 
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
> (Section 6)
> Optimizer for Iterations: 
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
> (Sections 4.3 and 5.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2119) Add ExecutionGraph support for leg-wise scheduling

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572564#comment-14572564
 ] 

ASF GitHub Bot commented on FLINK-2119:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/754#issuecomment-108847772
  
I'm not sure if this is what you mean: the triggering is happening on the 
level of the intermediate result and not the execution. The code just happens 
to be in the execution. (This is what I meant when we had an offline chat about 
the way that intermediate result notifications are handled. It is somewhat of 
a mess.)

On the second point: yes, I think it is possible to trigger a 
SlotSharingGroup instead of a JobExecutionVertex. But would it make a 
difference for the result? The JobExecutionVertex would grab the slot and the 
successors would simply grab their subslots when they are in the same slot 
sharing group, right?




> Add ExecutionGraph support for leg-wise scheduling
> --
>
> Key: FLINK-2119
> URL: https://issues.apache.org/jira/browse/FLINK-2119
> Project: Flink
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: master
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Scheduling currently happens by lazily unrolling the ExecutionGraph from the 
> sources.
> 1. All sources are scheduled for execution.
> 2. Their results trigger scheduling and deployment of the receiving tasks 
> (either on the first available buffer or when all are produced [pipelined vs. 
> blocking exchange]).
> For certain batch jobs this can be problematic, as many tasks will be running 
> at the same time and consume task manager resources like execution slots and 
> memory. For these jobs, it is desirable to schedule the ExecutionGraph 
> with different strategies.
> With respect to the ExecutionGraph, the current limitation is that data 
> availability for a result always triggers scheduling of the consuming tasks. 
> This needs to be more general to allow different scheduling strategies.
> Consider the following example:
> {code}
>   [ union ]
>  / \
> /   \
>   [ source 1 ]  [ source 2 ]
> {code}
> Currently, both sources are scheduled concurrently and the "faster" one 
> triggers scheduling of the union. It is desirable to first allow source 1 to 
> completely produce its result, then trigger scheduling of source 2, and only 
> then schedule the union.
> The required changes in the ExecutionGraph are conceptually straightforward: 
> instead of going through the list of result consumers and scheduling them, we 
> need to be able to run a more general action. For normal operation, this will 
> still schedule the consumer task, but we can also configure it to kick off the 
> next source etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2119] Add ExecutionGraph support for ba...

2015-06-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/754#issuecomment-108847772
  
I'm not sure if this is what you mean: the triggering is happening on the 
level of the intermediate result and not the execution. The code just happens 
to be in the execution. (This is what I meant when we had an offline chat about 
the way that intermediate result notifications are handled. It is somewhat of 
a mess.)

On the second point: yes, I think it is possible to trigger a 
SlotSharingGroup instead of a JobExecutionVertex. But would it make a 
difference for the result? The JobExecutionVertex would grab the slot and the 
successors would simply grab their subslots when they are in the same slot 
sharing group, right?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2076) Bug in re-openable hash join

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572572#comment-14572572
 ] 

ASF GitHub Bot commented on FLINK-2076:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/751


> Bug in re-openable hash join
> 
>
> Key: FLINK-2076
> URL: https://issues.apache.org/jira/browse/FLINK-2076
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Chiwan Park
>
> It happens deterministically in my machine with the following setup:
> TaskManager:
>   - heap size: 512m
>   - network buffers: 4096
>   - slots: 32
> Job:
>   - ConnectedComponents
>   - 100k vertices
>   - 1.2m edges
> --> this gives around 260 MB of Flink managed memory; across 32 slots that is 
> 8 MB per slot, which, with several memory consumers in the job, makes the 
> iterative hash join go out-of-core
> {code}
> java.lang.RuntimeException: Hash Join bug in memory management: 
> Memory buffers leaked.
>   at 
> org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733)
>   at 
> org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
>   at 
> org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
>   at 
> org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
>   at 
> org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102)
>   at 
> org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
>   at 
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
>   at 
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2095) Screenshots missing in webclient documentation

2015-06-04 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-2095:
--
Priority: Trivial  (was: Major)
Assignee: (was: Robert Metzger)
  Labels: starter  (was: )

> Screenshots missing in webclient documentation
> -
>
> Key: FLINK-2095
> URL: https://issues.apache.org/jira/browse/FLINK-2095
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, website
>Reporter: Fabian Hueske
>Priority: Trivial
>  Labels: starter
>
> The screenshots for the documentation of the web submission client are 
> missing since the redesign of the website.
> http://ci.apache.org/projects/flink/flink-docs-master/apis/web_client.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2126) Scala shell tests sporadically fail on travis

2015-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572573#comment-14572573
 ] 

ASF GitHub Bot commented on FLINK-2126:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/768


> Scala shell tests sporadically fail on travis
> -
>
> Key: FLINK-2126
> URL: https://issues.apache.org/jira/browse/FLINK-2126
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> See https://travis-ci.org/rmetzger/flink/jobs/64893149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

