[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...

2015-05-24 Thread mbalassi
Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/720#issuecomment-104989667
  
Thanks for picking up the issue. Generally looks good to me. One minor 
comment: please do not merge my commit on the delta policies as it is already 
in. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557672#comment-14557672
 ] 

ASF GitHub Bot commented on FLINK-2018:
---

Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/720#issuecomment-104989667
  
Thanks for picking up the issue. Generally looks good to me. One minor 
comment: please do not merge my commit on the delta policies as it is already 
in. :)


> Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's 
> argument parser
> --
>
> Key: FLINK-2018
> URL: https://issues.apache.org/jira/browse/FLINK-2018
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
>Priority: Minor
>  Labels: starter
>
> In FLINK-1525 we've added the {{ParameterTool}}.
> For users used to Hadoop's {{GenericOptionsParser}} it would be great to 
> provide a compatible parser.
> See: 
> https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html
> {code}
> @Test
> public void testFromGenericOptionsParser() {
>   ParameterUtil parameter = ParameterUtil.fromGenericOptionsParser(
>       new String[]{"-D", "input=myinput", "-DexpectedCount=15"});
>   validate(parameter);
> }
> {code}
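
As a rough illustration of the argument style the issue refers to, the sketch below collects `-D key=value` and `-Dkey=value` pairs from a `String[]` into a map. It is a minimal sketch under stated assumptions, not the Flink or Hadoop implementation: the class `GenericOptionsSketch` and the method `parseDefinitions` are hypothetical names, and the other generic Hadoop options (such as `-conf` or `-fs`) are simply skipped here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Flink or Hadoop. */
final class GenericOptionsSketch {

    /** Collects "-D key=value" and "-Dkey=value" pairs; all other arguments are skipped. */
    static Map<String, String> parseDefinitions(String[] args) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-D")) {
                continue; // not a property definition
            }
            String pair = arg.substring(2);
            if (pair.isEmpty()) {
                // "-D" followed by a separate "key=value" token
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-D without key=value");
                }
                pair = args[++i];
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            result.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params =
            parseDefinitions(new String[]{"-D", "input=myinput", "-DexpectedCount=15"});
        System.out.println(params); // prints {input=myinput, expectedCount=15}
    }
}
```

Note that the values come only from the argument array; nothing is read from JVM system properties in this sketch.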



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2043] Change the KMeansDataGenerator to...

2015-05-24 Thread pp86
GitHub user pp86 opened a pull request:

https://github.com/apache/flink/pull/721

[FLINK-2043] Change the KMeansDataGenerator to allow passing a custom path

https://issues.apache.org/jira/browse/FLINK-2043

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pp86/flink myBranch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 05c5987c69ff37b74a705163b198801b93fe299b
Author: pp86 
Date:   2015-05-22T21:01:48Z

Merge pull request #1 from apache/master

test update

commit 00c6542c7681bd7099ded31985382c6cd1aa43b5
Author: Pietro Pinoli 
Date:   2015-05-24T11:53:15Z

merging

commit 90fa66e7356817b8d0e4190c024f914ee5d70e5b
Author: Pietro Pinoli 
Date:   2015-05-24T11:35:35Z

[FLINK-2043] Change the KMeansDataGenerator to allow passing a custom path






[jira] [Commented] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557717#comment-14557717
 ] 

ASF GitHub Bot commented on FLINK-2043:
---

GitHub user pp86 opened a pull request:

https://github.com/apache/flink/pull/721

[FLINK-2043] Change the KMeansDataGenerator to allow passing a custom path

https://issues.apache.org/jira/browse/FLINK-2043

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pp86/flink myBranch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 05c5987c69ff37b74a705163b198801b93fe299b
Author: pp86 
Date:   2015-05-22T21:01:48Z

Merge pull request #1 from apache/master

test update

commit 00c6542c7681bd7099ded31985382c6cd1aa43b5
Author: Pietro Pinoli 
Date:   2015-05-24T11:53:15Z

merging

commit 90fa66e7356817b8d0e4190c024f914ee5d70e5b
Author: Pietro Pinoli 
Date:   2015-05-24T11:35:35Z

[FLINK-2043] Change the KMeansDataGenerator to allow passing a custom path




> Change the KMeansDataGenerator to allow passing a custom path
> -
>
> Key: FLINK-2043
> URL: https://issues.apache.org/jira/browse/FLINK-2043
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Robert Metzger
>Assignee: pietro pinoli
>Priority: Trivial
>  Labels: starter
>
> It would be nice to allow the user to specify a target path for the generated 
> data.
> Right now, one has to pass the path by changing Java's tmp directory:
> {code}
> java -Djava.io.tmpdir=`pwd` -cp 
> /home/robert/flink/build-target/examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar
>  org.apache.flink.examples.java.clustering.util.KMeansDataGenerator
> {code}
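
One way the requested behaviour could look is sketched below: use an explicit target directory when one is given, and fall back to `java.io.tmpdir` otherwise. This is only a sketch under assumptions; `OutputPathSketch`, `resolveOutputDir`, the position of the path argument, and the file name `points` are hypothetical and are not taken from the actual `KMeansDataGenerator` or from pull request #721.

```java
import java.io.File;

/** Hypothetical helper; the real KMeansDataGenerator may resolve its output differently. */
final class OutputPathSketch {

    /** Returns the explicit target directory if given, otherwise the JVM temp directory. */
    static File resolveOutputDir(String customPath) {
        if (customPath != null && !customPath.isEmpty()) {
            return new File(customPath);
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        // Assumption for illustration: the custom path is the first program argument.
        String customPath = args.length > 0 ? args[0] : null;
        File pointsFile = new File(resolveOutputDir(customPath), "points");
        System.out.println("Would write generated points to " + pointsFile.getPath());
    }
}
```

With such a fallback, the existing `-Djava.io.tmpdir=...` invocation would keep working while new callers could pass the directory directly.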





[GitHub] flink pull request: [FLINK-2012][gelly] Added methods to remove/ad...

2015-05-24 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/678#issuecomment-105014574
  
Hi @andralungu,

I think that mixing DataSet with List is not the way to go. When would one 
end up with a DataSet of vertices but only a List of edges, given that edges 
usually outnumber vertices? On the other hand, we have to deal with the 
empty-edge case, as you said. Actually, I think that `addVertex` / `addVertices` 
shouldn't deal with edges at all: having an argument for edges implies that 
we check that these edges belong to the vertex/vertices being added, which 
we don't.

Let me propose the following:

- regarding additions: when we have DataSets, simply use `union`.  
`addVertex` adds a single vertex and `addVertices` adds a list of vertices to 
the graph (no edges as an argument). `addEdge` and `addEdges` work the same 
way: single edge and list of edges. The only thing we need to decide here is 
the behavior when adding an edge for a non-existing vertex. We can either 
ignore the edge or create it with some default value.
- regarding removals: I propose we add a `difference` method, corresponding 
to `union`, which will work on DataSets. Methods for removing vertices and 
edges will work on single objects and lists.
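
The semantics proposed above can be sketched with plain in-memory collections. This is an illustration only and not the Gelly API: `GraphSketch` is a hypothetical class, edges with an unknown endpoint are ignored (one of the two options mentioned), and dropping the edges incident to removed vertices is an assumption that the proposal does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-memory illustration only; this is not the Gelly API. */
final class GraphSketch {
    final Set<Long> vertices = new HashSet<>();
    final List<long[]> edges = new ArrayList<>(); // each edge is {source, target}

    /** Adds vertices only; there is no edge argument, as proposed. */
    void addVertices(List<Long> newVertices) {
        vertices.addAll(newVertices);
    }

    /** Adds edges; an edge with an unknown endpoint is ignored. */
    void addEdges(List<long[]> newEdges) {
        for (long[] e : newEdges) {
            if (vertices.contains(e[0]) && vertices.contains(e[1])) {
                edges.add(e);
            }
        }
    }

    /** Removes vertices and (by assumption) every edge that touches one of them. */
    void removeVertices(List<Long> toRemove) {
        vertices.removeAll(toRemove);
        edges.removeIf(e -> toRemove.contains(e[0]) || toRemove.contains(e[1]));
    }

    public static void main(String[] args) {
        GraphSketch g = new GraphSketch();
        g.addVertices(Arrays.asList(1L, 2L, 3L));
        // The edge (2, 4) is dropped because vertex 4 does not exist.
        g.addEdges(Arrays.asList(new long[]{1L, 2L}, new long[]{2L, 4L}));
        System.out.println(g.vertices.size() + " vertices, " + g.edges.size() + " edges");
    }
}
```

The alternative option, creating the missing vertex with a default value, would replace the `contains` check with an insertion of the missing endpoint.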




[jira] [Commented] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557729#comment-14557729
 ] 

ASF GitHub Bot commented on FLINK-2012:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/678#issuecomment-105014574
  
Hi @andralungu,

I think that mixing DataSet with List is not the way to go. When would one 
end up with a DataSet of vertices but only a List of edges, given that edges 
usually outnumber vertices? On the other hand, we have to deal with the 
empty-edge case, as you said. Actually, I think that `addVertex` / `addVertices` 
shouldn't deal with edges at all: having an argument for edges implies that 
we check that these edges belong to the vertex/vertices being added, which 
we don't.

Let me propose the following:

- regarding additions: when we have DataSets, simply use `union`.  
`addVertex` adds a single vertex and `addVertices` adds a list of vertices to 
the graph (no edges as an argument). `addEdge` and `addEdges` work the same 
way: single edge and list of edges. The only thing we need to decide here is 
the behavior when adding an edge for a non-existing vertex. We can either 
ignore the edge or create it with some default value.
- regarding removals: I propose we add a `difference` method, corresponding 
to `union`, which will work on DataSets. Methods for removing vertices and 
edges will work on single objects and lists.


> addVertices, addEdges, removeVertices, removeEdges methods
> --
>
> Key: FLINK-2012
> URL: https://issues.apache.org/jira/browse/FLINK-2012
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> Currently, Gelly only allows the addition/deletion of one vertex/edge at a 
> time. To add two (or more) vertices, a user has to add one vertex, which 
> creates a new graph, then add another vertex, which creates yet another 
> graph, and so on.
> It would be nice to also have addVertices, addEdges, removeVertices, 
> removeEdges methods. 





[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...

2015-05-24 Thread ajaybhat
Github user ajaybhat commented on the pull request:

https://github.com/apache/flink/pull/720#issuecomment-105028293
  
> please do not merge my commit on the delta policies as it is already in.

I don't know how that happened. As usual I just committed, rebased from 
master and pushed to my fork. Is there a way to get rid of that commit?




[jira] [Commented] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557761#comment-14557761
 ] 

ASF GitHub Bot commented on FLINK-2018:
---

Github user ajaybhat commented on the pull request:

https://github.com/apache/flink/pull/720#issuecomment-105028293
  
> please do not merge my commit on the delta policies as it is already in.

I don't know how that happened. As usual I just committed, rebased from 
master and pushed to my fork. Is there a way to get rid of that commit?




[jira] [Commented] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557767#comment-14557767
 ] 

ASF GitHub Bot commented on FLINK-2043:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/721#issuecomment-105028826
  
Thank you for taking care of this.
The change looks good, +1 to merge





[GitHub] flink pull request: [FLINK-2043] Change the KMeansDataGenerator to...

2015-05-24 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/721#issuecomment-105028826
  
Thank you for taking care of this.
The change looks good, +1 to merge





[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...

2015-05-24 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/720#discussion_r30953426
  
--- Diff: flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java ---
@@ -150,6 +150,14 @@ public void testMerged() {
         validate(parameter);
     }
 
+    @Test
+    public void testFromGenericOptionsParser() throws IOException {
+        System.setProperty("input", "myInput");
+        System.setProperty("expectedCount", "15");
+        ParameterTool parameter = ParameterTool.fromGenericOptionsParser(new String[]{"-D", "input=myInput", "-DexpectedCount=15"});
--- End diff --

Is the genericOptionsParser getting the settings from the system properties 
or the command line arguments?




[jira] [Commented] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557769#comment-14557769
 ] 

ASF GitHub Bot commented on FLINK-2018:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/720#discussion_r30953426
  
--- Diff: flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java ---
@@ -150,6 +150,14 @@ public void testMerged() {
         validate(parameter);
     }
 
+    @Test
+    public void testFromGenericOptionsParser() throws IOException {
+        System.setProperty("input", "myInput");
+        System.setProperty("expectedCount", "15");
+        ParameterTool parameter = ParameterTool.fromGenericOptionsParser(new String[]{"-D", "input=myInput", "-DexpectedCount=15"});
--- End diff --

Is the genericOptionsParser getting the settings from the system properties 
or the command line arguments?




[jira] [Commented] (FLINK-1759) Execution statistics for vertex-centric iterations

2015-05-24 Thread Andra Lungu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557798#comment-14557798
 ] 

Andra Lungu commented on FLINK-1759:


Hey, 

We would like to know what the status of Flink profiling is at the moment and 
whether we can help you with anything :)

> Execution statistics for vertex-centric iterations
> --
>
> Key: FLINK-1759
> URL: https://issues.apache.org/jira/browse/FLINK-1759
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Priority: Minor
>
> It would be nice to add an option for gathering execution statistics from 
> VertexCentricIteration.
> In particular, the following metrics could be useful:
> - total number of supersteps
> - number of messages sent (total / per superstep)
> - bytes of messages exchanged (total / per superstep)
> - execution time (total / per superstep)
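
The metrics listed above could be collected in a small holder object along the following lines. This is a sketch under assumptions: `IterationStatsSketch` and its methods are hypothetical names, and how `VertexCentricIteration` would feed such an object (for example through aggregators or accumulators) is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical statistics holder; not an existing Gelly class. */
final class IterationStatsSketch {
    // One entry per superstep: {messages sent, message bytes, execution millis}
    private final List<long[]> supersteps = new ArrayList<>();

    void recordSuperstep(long messages, long bytes, long millis) {
        supersteps.add(new long[]{messages, bytes, millis});
    }

    int totalSupersteps() { return supersteps.size(); }
    long totalMessages() { return total(0); }
    long totalBytes() { return total(1); }
    long totalMillis() { return total(2); }

    /** Per-superstep value: field 0 = messages, 1 = bytes, 2 = millis. */
    long perSuperstep(int superstep, int field) {
        return supersteps.get(superstep)[field];
    }

    private long total(int field) {
        long sum = 0;
        for (long[] s : supersteps) {
            sum += s[field];
        }
        return sum;
    }
}
```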





[jira] [Assigned] (FLINK-1941) Add documentation for Gelly-GSA

2015-05-24 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri reassigned FLINK-1941:


Assignee: Vasia Kalavri

> Add documentation for Gelly-GSA
> ---
>
> Key: FLINK-1941
> URL: https://issues.apache.org/jira/browse/FLINK-1941
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>  Labels: docs, gelly
>
> Add a section in the Gelly guide to describe the newly introduced 
> Gather-Sum-Apply iteration method. Show how GSA uses delta iterations 
> internally and explain the differences of this model as compared to 
> vertex-centric.





[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-24 Thread Filip Perisic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557823#comment-14557823
 ] 

Filip Perisic commented on FLINK-1999:
--

We want to compare the results of our implementation with those of Python's 
scikit tf-idf implementation. Should we add test data to the project, or 
should we hard-code the test data into strings?
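
For reference, a check against hard-coded strings could look roughly like the sketch below. It uses the textbook weighting (raw term count times `ln(N / df)`); `TfIdfSketch` and `tfIdf` are hypothetical names, and the whitespace tokenizer is an assumption. scikit-learn's `TfidfTransformer` by default applies a smoothed idf, `ln((1 + N) / (1 + df)) + 1`, and L2-normalizes each document vector, so expected values taken from it would need the same variant on the Flink side.

```java
import java.util.Arrays;
import java.util.List;

/** Textbook tf-idf on hard-coded documents; a sketch, not the planned Flink transformer. */
final class TfIdfSketch {

    /** tf-idf of term in docs.get(docIndex), with tf = raw count and idf = ln(N / df). */
    static double tfIdf(List<String> docs, int docIndex, String term) {
        int n = docs.size();
        int df = 0; // number of documents containing the term
        int tf = 0; // occurrences of the term in the selected document
        for (int i = 0; i < n; i++) {
            int count = 0;
            for (String token : docs.get(i).toLowerCase().split("\\s+")) {
                if (token.equals(term)) {
                    count++;
                }
            }
            if (count > 0) {
                df++;
            }
            if (i == docIndex) {
                tf = count;
            }
        }
        if (df == 0) {
            return 0.0; // term does not occur anywhere
        }
        return tf * Math.log((double) n / df);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "the cat ran");
        System.out.println(tfIdf(docs, 0, "the")); // "the" is in every document, so idf is 0
        System.out.println(tfIdf(docs, 1, "dog"));
    }
}
```

Small hard-coded inputs like these keep the test self-contained; larger reference corpora would be a better fit for a resource file.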

> TF-IDF transformer
> --
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Ronny Bräunlich
>Assignee: Alexander Alexandrov
>Priority: Minor
>  Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first 
> group creating an issue) and we want to/have to implement a tf-idf transformer 
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that 
> you could point us to an old version of a similar transformer.





[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557827#comment-14557827
 ] 

ASF GitHub Bot commented on FLINK-1941:
---

GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/722

[FLINK-1941] documentation for the Gather-Sum-Apply iterations in Gelly

This PR adds a high-level description of the GSA iterative model and an 
example of implementing SSSP with GSA in Gelly, plus a section on the 
available configuration options for the GSAIteration.
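
To make the three phases concrete, here is a plain-Java sketch of single-source shortest paths in the Gather-Sum-Apply style. It is not the Gelly API and not the example from the pull request: `GsaSsspSketch`, `Edge`, and `shortestPaths` are hypothetical names, and the loop recomputes every vertex in each round instead of processing only changed vertices as a delta iteration would.

```java
import java.util.Arrays;

/** Plain-Java illustration of Gather-Sum-Apply SSSP; not the Gelly implementation. */
final class GsaSsspSketch {

    static final class Edge {
        final int source;
        final int target;
        final double weight;

        Edge(int source, int target, double weight) {
            this.source = source;
            this.target = target;
            this.weight = weight;
        }
    }

    /** Returns the shortest distance from sourceVertex to every vertex (infinity if unreachable). */
    static double[] shortestPaths(int numVertices, Edge[] edges, int sourceVertex, int maxIterations) {
        double[] dist = new double[numVertices];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[sourceVertex] = 0.0;

        for (int iter = 0; iter < maxIterations; iter++) {
            double[] candidate = new double[numVertices];
            Arrays.fill(candidate, Double.POSITIVE_INFINITY);
            for (Edge e : edges) {
                // Gather: distance of the neighbor plus the weight of the connecting edge
                double gathered = dist[e.source] + e.weight;
                // Sum: combine the gathered values of a vertex by taking the minimum
                candidate[e.target] = Math.min(candidate[e.target], gathered);
            }
            boolean changed = false;
            for (int v = 0; v < numVertices; v++) {
                // Apply: update the vertex value only if the new distance is smaller
                if (candidate[v] < dist[v]) {
                    dist[v] = candidate[v];
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no vertex value changed in this round
            }
        }
        return dist;
    }
}
```

In a delta-iteration formulation, only the vertices updated in the Apply step would take part in the next round, which is where the savings over this naive loop come from.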

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink gsa-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #722


commit 7e1ad7f76e6a57a69d6a2de7e76e49a4d87d
Author: vasia 
Date:   2015-05-24T19:31:37Z

[FLINK-1941] [gelly] [docs] added documentation for the Gather-Sum-Apply 
iterations in Gelly






[GitHub] flink pull request: [FLINK-1941] documentation for the Gather-Sum-...

2015-05-24 Thread vasia
GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/722

[FLINK-1941] documentation for the Gather-Sum-Apply iterations in Gelly

This PR adds a high-level description of the GSA iterative model and an 
example of implementing SSSP with GSA in Gelly, plus a section on the 
available configuration options for the GSAIteration.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink gsa-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #722


commit 7e1ad7f76e6a57a69d6a2de7e76e49a4d87d
Author: vasia 
Date:   2015-05-24T19:31:37Z

[FLINK-1941] [gelly] [docs] added documentation for the Gather-Sum-Apply 
iterations in Gelly






[jira] [Assigned] (FLINK-1963) Improve distinct() transformation

2015-05-24 Thread pietro pinoli (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pietro pinoli reassigned FLINK-1963:


Assignee: pietro pinoli

> Improve distinct() transformation
> -
>
> Key: FLINK-1963
> URL: https://issues.apache.org/jira/browse/FLINK-1963
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: pietro pinoli
>Priority: Minor
>  Labels: starter
> Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to 
> processing atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple), 
> but wildcard expression should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for 
> atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.
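
The relationship between the variants can be illustrated on plain Java lists: a `distinct()` without arguments on atomic types behaves like a key-selector-based `distinct` with the identity function as selector. This is a sketch only; `DistinctSketch`, `distinctBy`, and `distinct` are hypothetical names and say nothing about how the DataSet operators are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** List-based illustration of distinct semantics; not the Flink DataSet API. */
final class DistinctSketch {

    /** Keeps the first element seen for each key produced by the selector. */
    static <T, K> List<T> distinctBy(List<T> input, Function<T, K> keySelector) {
        Set<K> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (seen.add(keySelector.apply(element))) {
                out.add(element);
            }
        }
        return out;
    }

    /** For atomic types the element itself is the key. */
    static <T> List<T> distinct(List<T> input) {
        return distinctBy(input, Function.identity());
    }

    public static void main(String[] args) {
        System.out.println(distinct(Arrays.asList(3, 1, 3, 2, 1))); // prints [3, 1, 2]
        System.out.println(distinctBy(Arrays.asList("apple", "avocado", "banana"), s -> s.charAt(0)));
    }
}
```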





[jira] [Commented] (FLINK-1963) Improve distinct() transformation

2015-05-24 Thread pietro pinoli (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557843#comment-14557843
 ] 

pietro pinoli commented on FLINK-1963:
--

Hello, if you have no objections, I'd like to take care of this.



[jira] [Commented] (FLINK-1727) Add decision tree to machine learning library

2015-05-24 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557846#comment-14557846
 ] 

Sachin Goel commented on FLINK-1727:


I've implemented a partial decision tree that works with continuous fields and 
the Gini gain function. 
The pull request is here: https://github.com/apache/flink/pull/710
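
For readers unfamiliar with the criterion, the sketch below computes the Gini impurity of a node and the gain of a binary split. It assumes that "gini gain" means the reduction in Gini impurity, that is, the parent impurity minus the size-weighted impurity of the two children; `GiniSketch` and its methods are hypothetical names, and the code in pull request #710 may be organized differently.

```java
/** Gini impurity and split gain from per-class counts; a sketch, not the code from the PR. */
final class GiniSketch {

    /** Gini impurity 1 - sum(p_i^2), where p_i is the fraction of samples in class i. */
    static double impurity(int[] classCounts) {
        long total = 0;
        for (int c : classCounts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** Gain of a binary split; both arrays must use the same class order and length. */
    static double gain(int[] leftCounts, int[] rightCounts) {
        int[] parent = new int[leftCounts.length];
        long nLeft = 0;
        long nRight = 0;
        for (int i = 0; i < parent.length; i++) {
            parent[i] = leftCounts[i] + rightCounts[i];
            nLeft += leftCounts[i];
            nRight += rightCounts[i];
        }
        long n = nLeft + nRight;
        if (n == 0) {
            return 0.0;
        }
        double weighted = (nLeft * impurity(leftCounts) + nRight * impurity(rightCounts)) / n;
        return impurity(parent) - weighted;
    }
}
```

For a continuous field, each candidate threshold induces one such left/right partition, and the threshold with the largest gain would be chosen.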

> Add decision tree to machine learning library
> -
>
> Key: FLINK-1727
> URL: https://issues.apache.org/jira/browse/FLINK-1727
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Mikio Braun
>  Labels: ML
>
> Decision trees are widely used for classification and regression tasks. Thus, 
> it would be worthwhile to add support for them to Flink's machine learning 
> library. 
> A streaming parallel decision tree learning algorithm has been proposed by 
> Ben-Haim and Tom-Tov [1]. This could perhaps be adapted to a batch use case as well. 
> [2] contains an overview of different techniques for scaling up inductive 
> learning algorithms. A presentation of Spark's MLlib decision tree 
> implementation can be found in [3].
> Resources:
> [1] [http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]
> [2] 
> [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.8226&rep=rep1&type=pdf]
> [3] 
> [http://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf]





[jira] [Assigned] (FLINK-1727) Add decision tree to machine learning library

2015-05-24 Thread Sachin Goel (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goel reassigned FLINK-1727:
--

Assignee: Sachin Goel  (was: Mikio Braun)

> Add decision tree to machine learning library
> -
>
> Key: FLINK-1727
> URL: https://issues.apache.org/jira/browse/FLINK-1727
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Sachin Goel
>  Labels: ML
>
> Decision trees are widely used for classification and regression tasks. Thus, 
> it would be worthwhile to add support for them to Flink's machine learning 
> library. 
> A streaming parallel decision tree learning algorithm has been proposed by 
> Ben-Haim and Tom-Tov [1]. This could perhaps be adapted to a batch use case as well. 
> [2] contains an overview of different techniques for scaling up inductive 
> learning algorithms. A presentation of Spark's MLlib decision tree 
> implementation can be found in [3].
> Resources:
> [1] [http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]
> [2] 
> [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.8226&rep=rep1&type=pdf]
> [3] 
> [http://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf]





[GitHub] flink pull request: [FLINK-1941] documentation for the Gather-Sum-...

2015-05-24 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/722#issuecomment-105064088
  
Thanks for the review @andralungu!
You're right, I'll add a comparison paragraph. I guess I should probably 
also add the same SSSP example with the corresponding drawings for the 
vertex-centric case as well, to better show how computation is distributed :)




[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557856#comment-14557856
 ] 

ASF GitHub Bot commented on FLINK-1941:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/722#issuecomment-105064088
  
Thanks for the review @andralungu!
You're right, I'll add a comparison paragraph. I guess I should probably 
also add the same SSSP example with the corresponding drawings for the 
vertex-centric case as well, to better show how computation is distributed :)




[GitHub] flink pull request: FLINK-2018 Add ParameterUtil.fromGenericOption...

2015-05-24 Thread ajaybhat
Github user ajaybhat commented on a diff in the pull request:

https://github.com/apache/flink/pull/720#discussion_r30963563
  
--- Diff: flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java ---
@@ -150,6 +150,14 @@ public void testMerged() {
         validate(parameter);
     }
 
+    @Test
+    public void testFromGenericOptionsParser() throws IOException {
+        System.setProperty("input", "myInput");
+        System.setProperty("expectedCount", "15");
+        ParameterTool parameter = ParameterTool.fromGenericOptionsParser(new String[]{"-D", "input=myInput", "-DexpectedCount=15"});
--- End diff --

It's from the command line arguments.




[jira] [Commented] (FLINK-2018) Add ParameterUtil.fromGenericOptionsParser() for compatibility to Hadoop's argument parser

2015-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557997#comment-14557997
 ] 

ASF GitHub Bot commented on FLINK-2018:
---

Github user ajaybhat commented on a diff in the pull request:

https://github.com/apache/flink/pull/720#discussion_r30963563
  
--- Diff: flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java ---
@@ -150,6 +150,14 @@ public void testMerged() {
         validate(parameter);
     }
 
+    @Test
+    public void testFromGenericOptionsParser() throws IOException {
+        System.setProperty("input", "myInput");
+        System.setProperty("expectedCount", "15");
+        ParameterTool parameter = ParameterTool.fromGenericOptionsParser(new String[]{"-D", "input=myInput", "-DexpectedCount=15"});
--- End diff --

It's from the command line arguments.

