[jira] [Commented] (FLINK-2548) In a VertexCentricIteration, the run time of one iteration should be proportional to the size of the workset

2015-08-22 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707946#comment-14707946
 ] 

Vasia Kalavri commented on FLINK-2548:
--

[~ggevay], an approach similar to what you are describing is implemented by the 
GatherSumApplyIteration, i.e. join + reduce + join.
In my experiments I have also found that this implementation is faster than the 
vertex-centric one in many cases, but it can also be slower, depending on the 
dataset and algorithm. That's why we support both models and let users decide 
which one to use based on their needs.
I don't think changing the vertex-centric implementation to switch between 
coGroup and join is a good idea.
The two iteration models that Gelly supports cover the majority of common 
use cases. For more custom solutions, one can always write their own delta 
iteration.


> In a VertexCentricIteration, the run time of one iteration should be 
> proportional to the size of the workset
> 
>
> Key: FLINK-2548
> URL: https://issues.apache.org/jira/browse/FLINK-2548
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.9, 0.10
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>
> Currently, the performance of a vertex-centric iteration is suboptimal in 
> iterations where the workset is small, because the cost of one iteration is 
> proportional to the number of edges and vertices of the graph, due to the 
> coGroups:
> VertexCentricIteration.buildMessagingFunction does a coGroup between the 
> edges and the workset, to provide the neighbors to the messaging UDF. This is 
> problematic from a performance point of view, because the coGroup UDF gets 
> called on all edge groups, including those that are not receiving any 
> messages.
> An analogous problem is present in 
> VertexCentricIteration.createResultSimpleVertex at the creation of the 
> updates: a coGroup happens between the messages and the solution set, whose 
> cost includes the number of vertices of the graph.
> Both of these coGroups could be avoided by doing a join instead (with the 
> same keys that the coGroup uses), followed by a groupBy. The cost of these 
> operations would be dominated by the size of the workset, as opposed to the 
> number of edges or vertices of the graph. To achieve this, the joins should 
> have the edges and the solution set on the build side. (These will not be 
> rebuilt at every iteration.)
> I made some experiments with this, and the initial results seem promising. On 
> some workloads this achieves a 2x speedup, because later iterations often 
> have quite small worksets, and those get a huge speedup from this change.
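The complexity difference described above can be illustrated with a small plain-Java simulation (a sketch of the access pattern only, not Flink's runtime; all names are made up): a coGroup-style pass invokes its UDF once per edge group, while a join-style probe of a prebuilt edge table performs work proportional to the workset.

```java
import java.util.*;

public class CoGroupVsJoin {

    // coGroup-style pass: the UDF fires once per edge group, even for
    // groups whose key has no matching element in the workset.
    static int coGroupStyleCalls(Map<Integer, List<Integer>> edgesBySource,
                                 Set<Integer> workset) {
        int calls = 0;
        for (Integer source : edgesBySource.keySet()) {
            calls++; // UDF invoked, possibly with an empty workset side
        }
        return calls;
    }

    // join-style pass: probe the prebuilt edge table only with workset
    // keys, so the number of UDF calls is bounded by the workset size.
    static int joinStyleCalls(Map<Integer, List<Integer>> edgesBySource,
                              Set<Integer> workset) {
        int calls = 0;
        for (Integer source : workset) {
            if (edgesBySource.containsKey(source)) {
                calls++; // UDF invoked only for matching keys
            }
        }
        return calls;
    }
}
```

On a 1000-vertex ring with a 3-element workset, the coGroup-style pass makes 1000 UDF calls versus 3 for the join-style pass, which is the gap the issue describes for late, small-workset iterations.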



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707972#comment-14707972
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-133675023
  
It's great to have this in, I'll try to update the cross-validation and SGD 
to use this.


> Create sample operator for Dataset
> --
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Theodore Vasiloudis
>Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative or exact size of the sample, set a seed for 
> reproducibility, and support sampling within iterations.
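The requested semantics might be sketched in plain Java like this (hypothetical names; a sketch, not the eventual Flink operator): seeded Bernoulli sampling for the without-replacement case, and seeded uniform draws for sampling with replacement.

```java
import java.util.*;

public class SampleSketch {

    // Without replacement: keep each element independently with
    // probability `fraction` (relative sample size).
    static <T> List<T> sampleWithoutReplacement(List<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : data) {
            if (rnd.nextDouble() < fraction) {
                out.add(element);
            }
        }
        return out;
    }

    // With replacement: draw exactly `size` elements uniformly,
    // duplicates allowed (exact sample size).
    static <T> List<T> sampleWithReplacement(List<T> data, int size, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            out.add(data.get(rnd.nextInt(data.size())));
        }
        return out;
    }
}
```

Reusing the same seed reproduces the sample exactly; inside an iteration, deriving a per-superstep seed (e.g. seed + superstep number) would keep per-iteration samples independent yet still reproducible.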





[jira] [Commented] (FLINK-2445) Add tests for HadoopOutputFormats

2015-08-22 Thread Martin Liesenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707979#comment-14707979
 ] 

Martin Liesenberg commented on FLINK-2445:
--

Hi,

I have a question concerning the scope of the task. As I understand it, the 
HadoopOutputFormat* classes wrap access to the actual Hadoop OutputFormat 
implementations, and the tests should ensure that the methods of those 
OutputFormat classes are called correctly - so far so good, and stated in the 
description as well. 

Is it sufficient to create a mock object and assert that the methods are called 
correctly? Judging from the HadoopOutputFormatBase class, the Hadoop 
OutputFormat can implement (Job)Configurable, so I guess those cases would need 
to be covered as well?

Best regards, Martin

> Add tests for HadoopOutputFormats
> -
>
> Key: FLINK-2445
> URL: https://issues.apache.org/jira/browse/FLINK-2445
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.10, 0.9.1
>Reporter: Fabian Hueske
>  Labels: starter
>
> The HadoopOutputFormats and HadoopOutputFormatBase classes are not 
> sufficiently covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop 
> OutputFormats are correctly called. 
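The mock-based approach discussed above can be sketched without Hadoop or a mocking library (all types below are simplified stand-ins, not the real Hadoop or Flink classes): a recording fake verifies that the wrapper forwards each call to the wrapped format, in order.

```java
import java.util.*;

public class ForwardingTestSketch {

    // Simplified stand-in for the wrapped Hadoop OutputFormat.
    interface OutputFormat {
        void open();
        void write(String record);
        void close();
    }

    // Simplified stand-in for the Flink-side wrapper under test:
    // it should do nothing but delegate.
    static class HadoopOutputFormatWrapper {
        private final OutputFormat wrapped;
        HadoopOutputFormatWrapper(OutputFormat wrapped) { this.wrapped = wrapped; }
        void open() { wrapped.open(); }
        void writeRecord(String record) { wrapped.write(record); }
        void close() { wrapped.close(); }
    }

    // Recording fake: remembers every call made to it, so the test can
    // assert both which methods were called and in which order.
    static class RecordingOutputFormat implements OutputFormat {
        final List<String> calls = new ArrayList<>();
        public void open() { calls.add("open"); }
        public void write(String record) { calls.add("write:" + record); }
        public void close() { calls.add("close"); }
    }
}
```

For the (Job)Configurable case Martin mentions, the same pattern extends naturally: the fake would additionally record a setConf-style call, and the test would assert it happens during configuration.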





[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool

2015-08-22 Thread Martin Liesenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708000#comment-14708000
 ] 

Martin Liesenberg commented on FLINK-2017:
--

Hi, 

I'd like to work on this ticket, but before I start, I thought it best to 
check my understanding against the expectations, just to make sure we are on 
the same page. 

So the user can define a list of required parameters, which are encapsulated 
in a new class. Once the parameters are defined, they can be checked against a 
list of provided parameters. Should check return a boolean, or return void 
and throw an Exception if not all are provided?

I imagine the printHelp method producing output similar to what "grep --help" 
prints on the command line in Linux. Does that sound sensible?

The checkAndPopulate method updates the parameter object and returns void, 
right?

One last thing: I couldn't find the ParameterUtil class used after the 
definition of the required parameters in the above test case. Is it meant to 
be ParameterTool?

Looking forward to your comments.
Best regards, Martin

> Add predefined required parameters to ParameterTool
> ---
>
> Key: FLINK-2017
> URL: https://issues.apache.org/jira/browse/FLINK-2017
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 0.9
>Reporter: Robert Metzger
>  Labels: starter
>
> In FLINK-1525 we've added the {{ParameterTool}}.
> During the PR review, there was a request for required parameters.
> This issue is about implementing a facility to define required parameters. 
> The tool should also be able to print a help menu with a list of all 
> parameters.
> This test case shows my initial ideas of how to design the API:
> {code}
> @Test
> public void requiredParameters() {
>     RequiredParameters required = new RequiredParameters();
>     Option input = required.add("input").alt("i").help("Path to input file or directory"); // parameter with long and short variant
>     required.add("output"); // parameter only with long variant
>     Option parallelism = required.add("parallelism").alt("p").type(Integer.class); // parameter with type
>     Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12).help("Number specifying the number of parallel data source instances"); // parameter with default value, specifying the type.
>     Option executionType = required.add("executionType").alt("et").defaultValue("pipelined").choices("pipelined", "batch");
>     ParameterUtil parameter = ParameterUtil.fromArgs(new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
>     required.check(parameter);
>     required.printHelp();
>     required.checkAndPopulate(parameter);
>     String inputString = input.get();
>     int par = parallelism.getInteger();
>     String output = parameter.get("output");
>     int sourcePar = parameter.getInteger(spOption.getName());
> }
> {code}
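A minimal sketch of the check/printHelp semantics discussed above (plain Java; the class name follows the test case, but the implementation is entirely hypothetical): check throws when a required parameter is neither provided nor backed by a default value.

```java
import java.util.*;

public class RequiredParametersSketch {

    private final Map<String, String> defaults = new HashMap<>();
    private final Set<String> required = new LinkedHashSet<>();

    // Register a required parameter without a default value.
    void add(String name) {
        required.add(name);
    }

    // Register a required parameter that is satisfied by a default.
    void addWithDefault(String name, String defaultValue) {
        required.add(name);
        defaults.put(name, defaultValue);
    }

    // Throw if any required parameter is missing and has no default.
    void check(Map<String, String> provided) {
        for (String name : required) {
            if (!provided.containsKey(name) && !defaults.containsKey(name)) {
                throw new IllegalArgumentException("Missing required parameter: " + name);
            }
        }
    }

    // Print one grep-style help line per registered parameter.
    void printHelp() {
        for (String name : required) {
            String def = defaults.containsKey(name)
                    ? " (default: " + defaults.get(name) + ")" : "";
            System.out.println("  --" + name + def);
        }
    }
}
```

Throwing from check (rather than returning a boolean) matches Martin's question above; either contract works, the exception variant just makes misuse fail loudly.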









[jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708043#comment-14708043
 ] 

ASF GitHub Bot commented on FLINK-2030:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-133707118
  
@tillrohrmann, thanks for the brilliant suggestions. Using a `TreeMap` and a 
`PriorityQueue` with invalidation, I've managed to bring the complexity of the 
`add` and `merge` operations down to logarithmic time. Further, `quantile` and 
`count` are linear, as they should be.
I've also decided to put both histograms in the `accumulator` package, since 
they're supposed to work like accumulators anyway. There already was a 
*discrete* histogram in the `accumulator` package; the *continuous* one now 
resides in the same place.
Also, the `DataSetUtils` class now contains functions to create histograms, 
providing access to these classes from the Java API itself instead of the ML 
library. That needed to be done sooner or later; FLINK-2274 actually asks 
for it. 
@thvasilo @chiwanpark  


> Implement an online histogram with Merging and equalization features
> 
>
> Key: FLINK-2030
> URL: https://issues.apache.org/jira/browse/FLINK-2030
> Project: Flink
>  Issue Type: Sub-task
>  Components: Machine Learning Library
>Reporter: Sachin Goel
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> For the implementation of the decision tree in 
> https://issues.apache.org/jira/browse/FLINK-1727, we need to implement an 
> histogram with online updates, merging and equalization features. A reference 
> implementation is provided in [1]
> [1].http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
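The core of the referenced histogram [1] can be sketched with a `TreeMap` (a simplified illustration, not the Flink implementation): each point is added as a unit-count bin, and once the bin budget is exceeded, the two closest adjacent bins are merged into their count-weighted centroid.

```java
import java.util.*;

public class StreamingHistogramSketch {

    // bin centroid -> count of points represented by that bin
    private final TreeMap<Double, Long> bins = new TreeMap<>();
    private final int maxBins;

    StreamingHistogramSketch(int maxBins) {
        this.maxBins = maxBins;
    }

    // Add a point as a unit-count bin; merge if over the bin budget.
    void add(double p) {
        bins.merge(p, 1L, Long::sum);
        if (bins.size() > maxBins) {
            mergeClosestPair();
        }
    }

    // Find the two adjacent bins with the smallest gap and merge them
    // into their count-weighted centroid.
    private void mergeClosestPair() {
        double bestLeft = 0, bestRight = 0, bestGap = Double.POSITIVE_INFINITY;
        Double prev = null;
        for (double key : bins.keySet()) {
            if (prev != null && key - prev < bestGap) {
                bestGap = key - prev;
                bestLeft = prev;
                bestRight = key;
            }
            prev = key;
        }
        long l = bins.remove(bestLeft);
        long r = bins.remove(bestRight);
        bins.put((bestLeft * l + bestRight * r) / (l + r), l + r);
    }

    long totalCount() {
        long sum = 0;
        for (long c : bins.values()) sum += c;
        return sum;
    }

    int numBins() {
        return bins.size();
    }
}
```

The comment above mentions keeping a `PriorityQueue` of adjacent-bin gaps with invalidation so the closest pair is found in logarithmic rather than linear time; the linear scan here just keeps the sketch short.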





[jira] [Commented] (FLINK-2558) Add Streaming Connector for Elasticsearch

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708074#comment-14708074
 ] 

ASF GitHub Bot commented on FLINK-2558:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/1040#discussion_r37697052
  
--- Diff: docs/apis/streaming_guide.md ---
@@ -1517,11 +1517,24 @@ Stream connectors
 
 
 
-Connectors provide an interface for accessing data from various third 
party sources (message queues). Currently three connectors are natively 
supported, namely [Apache Kafka](https://kafka.apache.org/),  
[RabbitMQ](http://www.rabbitmq.com/) and the [Twitter Streaming 
API](https://dev.twitter.com/docs/streaming-apis).
+Connectors provide code for interfacing with various third-party systems.
+Currently these systems are supported:
 
-Typically the connector packages consist of a source and sink class (with 
the exception of Twitter where only a source is provided). To use these sources 
the user needs to pass Serialization/Deserialization schemas for the connectors 
for the desired types. (Or use some predefined ones)
+ * [Apache Kafka](https://kafka.apache.org/)
--- End diff --

Should we add source/sink, to show which types of connectors are 
implemented for which systems?

We can also add Java/Scala collections and file systems (HDFS) to that list.


> Add Streaming Connector for Elasticsearch
> -
>
> Key: FLINK-2558
> URL: https://issues.apache.org/jira/browse/FLINK-2558
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.10
>
>
> We should add a sink that can write to Elasticsearch. A source does not seem 
> necessary because Elasticsearch would mostly be used for accessing results, 
> for example using a dashboard.





[jira] [Commented] (FLINK-2558) Add Streaming Connector for Elasticsearch

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708075#comment-14708075
 ] 

ASF GitHub Bot commented on FLINK-2558:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/1040#discussion_r37697064
  
--- Diff: docs/apis/streaming_guide.md ---
@@ -1661,6 +1674,165 @@ More about Kafka can be found 
[here](https://kafka.apache.org/documentation.html
 
 [Back to top](#top)
 
+### Elasticsearch
+
+This connector provides a Sink that can write to an
+[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
+following dependency to your project:
+
+{% highlight xml %}
+<dependency>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>flink-connector-elasticsearch</artifactId>
+  <version>{{site.version }}</version>
+</dependency>
+{% endhighlight %}
+
+Note that the streaming connectors are currently not part of the binary
+distribution. See

+[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution)
+for information about how to package the program with the libraries for
+cluster execution.
+
+#### Installing Elasticsearch
+
+Instructions for setting up an Elasticsearch cluster can be found

+[here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
+Make sure to set and remember a cluster name. This must be set when
+creating a Sink for writing to your cluster.
+
+#### Elasticsearch Sink
+The connector provides a Sink that can send data to an Elasticsearch Index.
+
+The sink can use two different methods for communicating with 
Elasticsearch:
+
+1. An embedded Node
+2. The TransportClient
+
+See 
[here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
+for information about the differences between the two modes.
+
+This code shows how to create a sink that uses an embedded Node for
+communication:
+
+
+
+{% highlight java %}
+DataStream<String> input = ...;
+
+Map<String, String> config = Maps.newHashMap();
+// This instructs the sink to emit after every element, otherwise they would be buffered
+config.put("bulk.flush.max.actions", "1");
+config.put("cluster.name", "my-cluster-name");
+
+input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
+    @Override
+    public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
--- End diff --

Can we make the ElasticsearchSink rich? Then it has access to the 
RuntimeContext anyway, and it need not be passed in every call.
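Stephan's suggestion in abstract form (all types below are simplified stand-ins, not the Flink or Elasticsearch API): a "rich" function receives the context once through a lifecycle method and caches it, so per-element calls no longer need a context parameter.

```java
public class RichFunctionSketch {

    // Simplified stand-in for Flink's RuntimeContext.
    static class RuntimeContext {
        final int subtaskIndex;
        RuntimeContext(int subtaskIndex) { this.subtaskIndex = subtaskIndex; }
    }

    // Non-rich style: the context must be threaded through every call.
    interface IndexRequestBuilder {
        String createIndexRequest(String element, RuntimeContext ctx);
    }

    // Rich style: the context is handed over once in open() and cached,
    // so createIndexRequest needs only the element.
    static abstract class RichIndexRequestBuilder {
        private RuntimeContext ctx;

        void open(RuntimeContext ctx) { this.ctx = ctx; }

        RuntimeContext getRuntimeContext() { return ctx; }

        abstract String createIndexRequest(String element);
    }
}
```

The trade-off is the usual one for rich functions: a slightly heavier interface in exchange for lifecycle hooks and context access without widening every method signature.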









[jira] [Commented] (FLINK-2558) Add Streaming Connector for Elasticsearch

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708076#comment-14708076
 ] 

ASF GitHub Bot commented on FLINK-2558:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1040#issuecomment-133718276
  
This looks pretty nice :-) Docs, tests, good comments, cluster tests!

+1 to merge this!









[jira] [Commented] (FLINK-2558) Add Streaming Connector for Elasticsearch

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708077#comment-14708077
 ] 

ASF GitHub Bot commented on FLINK-2558:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1040#issuecomment-133718325
  
Added two minor comments on the docs/interface, but treat those as 
optional; they are a matter of taste anyway.









[jira] [Commented] (FLINK-2560) Flink-Avro Plugin cannot be handled by Eclipse

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708078#comment-14708078
 ] 

ASF GitHub Bot commented on FLINK-2560:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1041#issuecomment-133718527
  
Looks good, +1 to merge


> Flink-Avro Plugin cannot be handled by Eclipse
> --
>
> Key: FLINK-2560
> URL: https://issues.apache.org/jira/browse/FLINK-2560
> Project: Flink
>  Issue Type: Improvement
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Trivial
>
> Eclipse always shows the following error:
> {noformat}
> Description    Resource    Path    Location    Type
> Plugin execution not covered by lifecycle configuration: 
> org.apache.avro:avro-maven-plugin:1.7.7:schema (execution: default, phase: 
> generate-sources)    pom.xml    /flink-avro    line 134    Maven Project 
> Build Lifecycle Mapping problem
> {noformat}
> This can be fixed by disabling the plugin within Eclipse via  ... 
> 











[jira] [Commented] (FLINK-2451) Cleanup Gelly examples

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708089#comment-14708089
 ] 

ASF GitHub Bot commented on FLINK-2451:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1000#issuecomment-133725849
  
Thank you for the review @andralungu. I will rebase and merge this.


> Cleanup Gelly examples
> --
>
> Key: FLINK-2451
> URL: https://issues.apache.org/jira/browse/FLINK-2451
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>Priority: Minor
>
> As per discussion in the dev@ mailing list, this issue proposes the following 
> changes to the Gelly examples and library:
> 1. Keep the following examples as they are:
> EuclideanGraphWeighing, GraphMetrics, IncrementalSSSP, JaccardSimilarity,  
> MusicProfiles.
> 2. Keep only 1 example to show how to use library methods.
> 3. Add 1 example for vertex-centric iterations.
> 4. Keep 1 example for GSA iterations and move the redundant GSA 
> implementations to the library.
> 5. Improve the examples documentation and refer to the functionality that 
> each of them demonstrates.
> 6. Port and modify existing example tests accordingly.







[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708092#comment-14708092
 ] 

ASF GitHub Bot commented on FLINK-1520:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/847#issuecomment-133727559
  
Thanks for the comments @andralungu!
@shghatge, can you please close this PR? I will make the docs update and 
open a new one, which will include your work and my changes if that's OK with 
you. Thank you!


> Read edges and vertices from CSV files
> --
>
> Key: FLINK-1520
> URL: https://issues.apache.org/jira/browse/FLINK-1520
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Shivani Ghatge
>Priority: Minor
>  Labels: easyfix, newbie
>
> Add methods to create Vertex and Edge Datasets directly from CSV file inputs.







[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708097#comment-14708097
 ] 

ASF GitHub Bot commented on FLINK-2310:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/892#issuecomment-133728081
  
Any progress on this and #923 @shghatge?


> Add an Adamic-Adar Similarity example
> -
>
> Key: FLINK-2310
> URL: https://issues.apache.org/jira/browse/FLINK-2310
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Reporter: Andra Lungu
>Assignee: Shivani Ghatge
>Priority: Minor
>
> Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> set of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees. In particular, it's equal to log(1/numberOfEdges).
> The Adamic-Adar algorithm can be broken into three steps: 
> 1). For each vertex, compute the log of its inverse degree (with the formula 
> above) and set it as the vertex value. 
> 2). Each vertex will then send this new computed value along with a list of 
> neighbors to the targets of its out-edges
> 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of 
> log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is 
> the degree of node n). See [2]
> Prerequisites: 
> - Full understanding of the Jaccard Similarity Measure algorithm
> - Reading the associated literature: 
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar
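For concreteness, the weighting step described above can be sketched in a few lines of framework-free Java: for a vertex pair (x, y), sum 1/log(k_n) over their common neighbors n, the standard form of the index from reference [2]. The adjacency-map representation and vertex ids below are illustrative only, not part of the Gelly API.

```java
import java.util.*;

// Minimal sketch of the Adamic-Adar index: for a pair (x, y), sum
// 1/log(k_n) over all common neighbors n, where k_n is n's degree.
// The adjacency representation here is illustrative only.
public class AdamicAdarSketch {

    static double adamicAdar(Map<Integer, Set<Integer>> adj, int x, int y) {
        Set<Integer> common = new HashSet<>(adj.get(x));
        common.retainAll(adj.get(y));          // common neighbors of x and y
        double score = 0.0;
        for (int n : common) {
            score += 1.0 / Math.log(adj.get(n).size());  // 1/log(k_n)
        }
        return score;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        adj.put(1, new HashSet<>(Arrays.asList(2, 3)));
        adj.put(2, new HashSet<>(Arrays.asList(1, 3)));
        adj.put(3, new HashSet<>(Arrays.asList(1, 2, 4)));
        adj.put(4, new HashSet<>(Arrays.asList(3)));
        // vertices 1 and 2 share one common neighbor (3), which has degree 3
        System.out.println(adamicAdar(adj, 1, 2));
    }
}
```

In a Gelly implementation the same sum would be produced per edge, in parallel, rather than per queried pair.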





[jira] [Commented] (FLINK-2033) Add overloaded methods with explicit TypeInformation parameters to Gelly

2015-08-22 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708099#comment-14708099
 ] 

Vasia Kalavri commented on FLINK-2033:
--

I can close this one now, right [~vanaepi]?

> Add overloaded methods with explicit TypeInformation parameters to Gelly
> 
>
> Key: FLINK-2033
> URL: https://issues.apache.org/jira/browse/FLINK-2033
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: PJ Van Aeken
>Assignee: PJ Van Aeken
>
> For the implementation of the Scala API for Gelly (FLINK-1962), we need to 
> pass explicit TypeInformation since the Java TypeExtractor does not work for 
> all Scala Types (see FLINK-2023).
> To do this, the java Gelly API needs to be expanded with methods that allow 
> for explicit passing of TypeInformation.
> An example with mapVertices:
> {code}
> public <NV> Graph<K, NV, EV> mapVertices(final MapFunction<Vertex<K, VV>, NV> mapper) {
>     TypeInformation<K> keyType = ((TupleTypeInfo<?>) vertices.getType()).getTypeAt(0);
>     String callLocation = Utils.getCallLocationName();
>     TypeInformation<NV> valueType =
>         TypeExtractor.getMapReturnTypes(mapper, vertices.getType(), callLocation, false);
>     TypeInformation<Vertex<K, NV>> returnType =
>         (TypeInformation<Vertex<K, NV>>) new TupleTypeInfo<>(Vertex.class, keyType, valueType);
>     return mapVertices(mapper, returnType);
> }
>
> public <NV> Graph<K, NV, EV> mapVertices(final MapFunction<Vertex<K, VV>, NV> mapper,
>         TypeInformation<Vertex<K, NV>> returnType) {
>     DataSet<Vertex<K, NV>> mappedVertices = vertices.map(
>         new MapFunction<Vertex<K, VV>, Vertex<K, NV>>() {
>             public Vertex<K, NV> map(Vertex<K, VV> value) throws Exception {
>                 return new Vertex<>(value.f0, mapper.map(value));
>             }
>         }).returns(returnType);
>     return new Graph<>(mappedVertices, this.edges, this.context);
> }
> {code}
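As a hedged, Flink-free illustration of why such overloads are needed: Java's type erasure means a framework cannot always recover a function's result type at runtime, so a second overload that accepts the result type explicitly lets the caller supply it. In the sketch below, a `Class<R>` token stands in for Flink's `TypeInformation`; all names are invented for illustration.

```java
import java.util.*;
import java.util.function.Function;

// Flink-free sketch of the overload pattern above. Due to type erasure the
// result type R of `f` is not recoverable at runtime, so a second parameter
// carries it explicitly (Class<R> stands in for Flink's TypeInformation).
public class ExplicitTypeSketch {

    static <T, R> List<R> mapAll(List<T> in, Function<T, R> f, Class<R> resultType) {
        List<R> out = new ArrayList<>();
        for (T t : in) {
            out.add(resultType.cast(f.apply(t)));  // result type supplied by the caller
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> lengths =
                mapAll(Arrays.asList("a", "bb", "ccc"), String::length, Integer.class);
        System.out.println(lengths);
    }
}
```

The Scala API wrapper can then obtain the `TypeInformation` via implicits and delegate to the explicit overload, bypassing the Java `TypeExtractor` entirely.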





[jira] [Created] (FLINK-2561) Sync Gelly Java and Scala APIs

2015-08-22 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-2561:


 Summary: Sync Gelly Java and Scala APIs
 Key: FLINK-2561
 URL: https://issues.apache.org/jira/browse/FLINK-2561
 Project: Flink
  Issue Type: Task
  Components: Gelly
Reporter: Vasia Kalavri


Some functionality and tests are missing from the Gelly Scala API. These 
should be added, together with documentation, a completeness test and some 
usage examples.





[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708102#comment-14708102
 ] 

ASF GitHub Bot commented on FLINK-1962:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-133730158
  
I have created 
[FLINK-2561](https://issues.apache.org/jira/browse/FLINK-2561) for the sync.


> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: PJ Van Aeken
> Fix For: 0.10
>
>






[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2

2015-08-22 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-133730158
  
I have created 
[FLINK-2561](https://issues.apache.org/jira/browse/FLINK-2561) for the sync.




[GitHub] flink pull request: [FLINK-2558] Add Streaming Connector for Elast...

2015-08-22 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1040#discussion_r37697987
  
--- Diff: docs/apis/streaming_guide.md ---
@@ -1517,11 +1517,24 @@ Stream connectors
 
 
 
-Connectors provide an interface for accessing data from various third 
party sources (message queues). Currently three connectors are natively 
supported, namely [Apache Kafka](https://kafka.apache.org/),  
[RabbitMQ](http://www.rabbitmq.com/) and the [Twitter Streaming 
API](https://dev.twitter.com/docs/streaming-apis).
+Connectors provide code for interfacing with various third-party systems.
+Currently these systems are supported:
 
-Typically the connector packages consist of a source and sink class (with 
the exception of Twitter where only a source is provided). To use these sources 
the user needs to pass Serialization/Deserialization schemas for the connectors 
for the desired types. (Or use some predefined ones)
+ * [Apache Kafka](https://kafka.apache.org/)
--- End diff --

The source/sink would be helpful. The documentation might need a 
restructure anyways. The information about sources/sinks is scattered across 
the "Connecting to the Outside World" and "Connectors" sections.




[jira] [Commented] (FLINK-2558) Add Streaming Connector for Elasticsearch

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708115#comment-14708115
 ] 

ASF GitHub Bot commented on FLINK-2558:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1040#discussion_r37698000
  
--- Diff: docs/apis/streaming_guide.md ---
@@ -1661,6 +1674,165 @@ More about Kafka can be found 
[here](https://kafka.apache.org/documentation.html
 
 [Back to top](#top)
 
+### Elasticsearch
+
+This connector provides a Sink that can write to an
+[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
+following dependency to your project:
+
+{% highlight xml %}
+<dependency>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>flink-connector-elasticsearch</artifactId>
+  <version>{{site.version }}</version>
+</dependency>
+{% endhighlight %}
+
+Note that the streaming connectors are currently not part of the binary
+distribution. See

+[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution)
+for information about how to package the program with the libraries for
+cluster execution.
+
+#### Installing Elasticsearch
+
+Instructions for setting up an Elasticsearch cluster can be found

+[here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
+Make sure to set and remember a cluster name. This must be set when
+creating a Sink for writing to your cluster.
+
+#### Elasticsearch Sink
+The connector provides a Sink that can send data to an Elasticsearch Index.
+
+The sink can use two different methods for communicating with 
Elasticsearch:
+
+1. An embedded Node
+2. The TransportClient
+
+See 
[here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
+for information about the differences between the two modes.
+
+This code shows how to create a sink that uses an embedded Node for
+communication:
+
+
+
+{% highlight java %}
+DataStream<String> input = ...;
+
+Map<String, String> config = Maps.newHashMap();
+// This instructs the sink to emit after every element, otherwise they would be buffered
+config.put("bulk.flush.max.actions", "1");
+config.put("cluster.name", "my-cluster-name");
+
+input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
+    @Override
+    public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
--- End diff --

The `ElasticsearchSink` is rich already. The `IndexRequestBuilder` is more 
like a souped up key selector that gives the user the power to specify in great 
detail how they want their element added to Elasticsearch. I admit the function 
signature is a bit strange but I didn't want to go full-blown RichFunction for 
the `IndexRequestBuilder`.

Should we change it? Because then users would also think that they could 
make it stateful and all the other things that come with rich functions.
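To make the "souped up key selector" framing concrete, here is a hedged, self-contained sketch of the trade-off: a per-element builder interface has a single method and no lifecycle hooks, so users are not tempted to keep state in it, whereas a rich-function variant would add `open()`/`close()` and invite exactly that. The interface and all names below are illustrative, not the actual connector API.

```java
import java.util.*;

// Illustrative contrast (not the actual connector API): a plain per-element
// builder exposes one method and no lifecycle, much like a key selector;
// a "rich" variant would add open()/close() hooks and encourage state.
interface PlainRequestBuilder<T> {
    String buildRequest(T element);  // invoked once per element
}

public class SinkSketch {

    // Stand-in for the sink's emit loop: one request per incoming element.
    static <T> List<String> drain(List<T> elements, PlainRequestBuilder<T> builder) {
        List<String> requests = new ArrayList<>();
        for (T e : elements) {
            requests.add(builder.buildRequest(e));
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> reqs = drain(Arrays.asList("a", "b"),
                e -> "index-request(" + e + ")");
        System.out.println(reqs);
    }
}
```

Because the interface is stateless by construction, the framework is free to call it from any thread or after a restore without extra contract.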


> Add Streaming Connector for Elasticsearch
> -
>
> Key: FLINK-2558
> URL: https://issues.apache.org/jira/browse/FLINK-2558
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.10
>
>
> We should add a sink that can write to Elasticsearch. A source does not seem 
> necessary because Elasticsearch would mostly be used for accessing results, 
> for example using a dashboard.





[GitHub] flink pull request: [FLINK-2558] Add Streaming Connector for Elast...

2015-08-22 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1040#discussion_r37698000
  
--- Diff: docs/apis/streaming_guide.md ---
@@ -1661,6 +1674,165 @@ More about Kafka can be found 
[here](https://kafka.apache.org/documentation.html
 
 [Back to top](#top)
 
+### Elasticsearch
+
+This connector provides a Sink that can write to an
+[Elasticsearch](https://elastic.co/) Index. To use this connector, add the
+following dependency to your project:
+
+{% highlight xml %}
+<dependency>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>flink-connector-elasticsearch</artifactId>
+  <version>{{site.version }}</version>
+</dependency>
+{% endhighlight %}
+
+Note that the streaming connectors are currently not part of the binary
+distribution. See

+[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution)
+for information about how to package the program with the libraries for
+cluster execution.
+
+#### Installing Elasticsearch
+
+Instructions for setting up an Elasticsearch cluster can be found

+[here](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html).
+Make sure to set and remember a cluster name. This must be set when
+creating a Sink for writing to your cluster.
+
+#### Elasticsearch Sink
+The connector provides a Sink that can send data to an Elasticsearch Index.
+
+The sink can use two different methods for communicating with 
Elasticsearch:
+
+1. An embedded Node
+2. The TransportClient
+
+See 
[here](https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html)
+for information about the differences between the two modes.
+
+This code shows how to create a sink that uses an embedded Node for
+communication:
+
+
+
+{% highlight java %}
+DataStream<String> input = ...;
+
+Map<String, String> config = Maps.newHashMap();
+// This instructs the sink to emit after every element, otherwise they would be buffered
+config.put("bulk.flush.max.actions", "1");
+config.put("cluster.name", "my-cluster-name");
+
+input.addSink(new ElasticsearchSink<>(config, new IndexRequestBuilder<String>() {
+    @Override
+    public IndexRequest createIndexRequest(String element, RuntimeContext ctx) {
--- End diff --

The `ElasticsearchSink` is rich already. The `IndexRequestBuilder` is more 
like a souped up key selector that gives the user the power to specify in great 
detail how they want their element added to Elasticsearch. I admit the function 
signature is a bit strange but I didn't want to go full-blown RichFunction for 
the `IndexRequestBuilder`.

Should we change it? Because then users would also think that they could 
make it stateful and all the other things that come with rich functions.




[jira] [Commented] (FLINK-2558) Add Streaming Connector for Elasticsearch

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708114#comment-14708114
 ] 

ASF GitHub Bot commented on FLINK-2558:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1040#discussion_r37697987
  
--- Diff: docs/apis/streaming_guide.md ---
@@ -1517,11 +1517,24 @@ Stream connectors
 
 
 
-Connectors provide an interface for accessing data from various third 
party sources (message queues). Currently three connectors are natively 
supported, namely [Apache Kafka](https://kafka.apache.org/),  
[RabbitMQ](http://www.rabbitmq.com/) and the [Twitter Streaming 
API](https://dev.twitter.com/docs/streaming-apis).
+Connectors provide code for interfacing with various third-party systems.
+Currently these systems are supported:
 
-Typically the connector packages consist of a source and sink class (with 
the exception of Twitter where only a source is provided). To use these sources 
the user needs to pass Serialization/Deserialization schemas for the connectors 
for the desired types. (Or use some predefined ones)
+ * [Apache Kafka](https://kafka.apache.org/)
--- End diff --

The source/sink would be helpful. The documentation might need a 
restructure anyways. The information about sources/sinks is scattered across 
the "Connecting to the Outside World" and "Connectors" sections.


> Add Streaming Connector for Elasticsearch
> -
>
> Key: FLINK-2558
> URL: https://issues.apache.org/jira/browse/FLINK-2558
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.10
>
>
> We should add a sink that can write to Elasticsearch. A source does not seem 
> necessary because Elasticsearch would mostly be used for accessing results, 
> for example using a dashboard.





[jira] [Updated] (FLINK-2439) [py] Expand DataSet feature coverage

2015-08-22 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-2439:

Description: 
An upcoming commit of mine will add the following methods to the Python API's 
DataSet class:

first
distinct
partitionByHash
rebalance


  was:
An upcoming commit of mine will add the following methods to the Python API's 
DataSet class:

first
distinct
partitionByHash
rebalance
aggregates(min, max, sum)



> [py] Expand DataSet feature coverage
> 
>
> Key: FLINK-2439
> URL: https://issues.apache.org/jira/browse/FLINK-2439
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> An upcoming commit of mine will add the following methods to the Python API's 
> DataSet class:
> first
> distinct
> partitionByHash
> rebalance





[jira] [Updated] (FLINK-2439) [py] Expand DataSet feature coverage

2015-08-22 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-2439:

Issue Type: Sub-task  (was: Improvement)
Parent: FLINK-1926

> [py] Expand DataSet feature coverage
> 
>
> Key: FLINK-2439
> URL: https://issues.apache.org/jira/browse/FLINK-2439
> Project: Flink
>  Issue Type: Sub-task
>  Components: Python API
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> An upcoming commit of mine will add the following methods to the Python API's 
> DataSet class:
> first
> distinct
> partitionByHash
> rebalance





[jira] [Updated] (FLINK-2440) [py] Expand Environment feature coverage

2015-08-22 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-2440:

Issue Type: Sub-task  (was: Improvement)
Parent: FLINK-1926

> [py] Expand Environment feature coverage
> 
>
> Key: FLINK-2440
> URL: https://issues.apache.org/jira/browse/FLINK-2440
> Project: Flink
>  Issue Type: Sub-task
>  Components: Python API
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> An upcoming commit of mine will add the following methods to the Python API's 
> Environment class:
> getParallelism
> set-/getNumberOfExecutionRetries
> Additionally, calls to the now-deprecated Java getDegreeOfParallelism were 
> changed, and the equivalent Python method was renamed.





[jira] [Updated] (FLINK-2432) [py] Provide support for custom serialization

2015-08-22 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-2432:

Issue Type: Sub-task  (was: New Feature)
Parent: FLINK-1926

> [py] Provide support for custom serialization
> -
>
> Key: FLINK-2432
> URL: https://issues.apache.org/jira/browse/FLINK-2432
> Project: Flink
>  Issue Type: Sub-task
>  Components: Python API
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>






[jira] [Updated] (FLINK-1945) Make python tests less verbose

2015-08-22 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-1945:

Component/s: Tests
 Python API

> Make python tests less verbose
> --
>
> Key: FLINK-1945
> URL: https://issues.apache.org/jira/browse/FLINK-1945
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API, Tests
>Reporter: Till Rohrmann
>Assignee: Chesnay Schepler
>Priority: Minor
>
> Currently, the python tests print a lot of log messages to stdout. 
> Furthermore there seems to be some println statements which clutter the 
> console output. I think that these log messages are not required for the 
> tests and thus should be suppressed. 





[jira] [Created] (FLINK-2562) [py] Implement KeySelectors

2015-08-22 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-2562:
---

 Summary: [py] Implement KeySelectors
 Key: FLINK-2562
 URL: https://issues.apache.org/jira/browse/FLINK-2562
 Project: Flink
  Issue Type: Sub-task
  Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler








[GitHub] flink pull request: [FLINK-2451] [gelly] examples and library clea...

2015-08-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1000




[jira] [Commented] (FLINK-2451) Cleanup Gelly examples

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708142#comment-14708142
 ] 

ASF GitHub Bot commented on FLINK-2451:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1000


> Cleanup Gelly examples
> --
>
> Key: FLINK-2451
> URL: https://issues.apache.org/jira/browse/FLINK-2451
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>Priority: Minor
>
> As per discussion in the dev@ mailing list, this issue proposes the following 
> changes to the Gelly examples and library:
> 1. Keep the following examples as they are:
> EuclideanGraphWeighing, GraphMetrics, IncrementalSSSP, JaccardSimilarity,  
> MusicProfiles.
> 2. Keep only 1 example to show how to use library methods.
> 3. Add 1 example for vertex-centric iterations.
> 4. Keep 1 example for GSA iterations and move the redundant GSA 
> implementations to the library.
> 5. Improve the examples documentation and refer to the functionality that 
> each of them demonstrates.
> 6. Port and modify existing example tests accordingly.





[jira] [Closed] (FLINK-2451) Cleanup Gelly examples

2015-08-22 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri closed FLINK-2451.

   Resolution: Fixed
Fix Version/s: 0.10

> Cleanup Gelly examples
> --
>
> Key: FLINK-2451
> URL: https://issues.apache.org/jira/browse/FLINK-2451
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>Priority: Minor
> Fix For: 0.10
>
>
> As per discussion in the dev@ mailing list, this issue proposes the following 
> changes to the Gelly examples and library:
> 1. Keep the following examples as they are:
> EuclideanGraphWeighing, GraphMetrics, IncrementalSSSP, JaccardSimilarity,  
> MusicProfiles.
> 2. Keep only 1 example to show how to use library methods.
> 3. Add 1 example for vertex-centric iterations.
> 4. Keep 1 example for GSA iterations and move the redundant GSA 
> implementations to the library.
> 5. Improve the examples documentation and refer to the functionality that 
> each of them demonstrates.
> 6. Port and modify existing example tests accordingly.





[GitHub] flink pull request: [FLINK-2448]Clear cache file list in Execution...

2015-08-22 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1031#issuecomment-133788218
  
Travis passes. :)
I asked on the dev list about the two small changes in flink-gelly. @vasia 




[jira] [Commented] (FLINK-2448) registerCacheFile fails with MultipleProgramsTestbase

2015-08-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708305#comment-14708305
 ] 

ASF GitHub Bot commented on FLINK-2448:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1031#issuecomment-133788218
  
Travis passes. :)
I asked on the dev list about the two small changes in flink-gelly. @vasia 


> registerCacheFile fails with MultipleProgramsTestbase
> -
>
> Key: FLINK-2448
> URL: https://issues.apache.org/jira/browse/FLINK-2448
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Chesnay Schepler
>Priority: Minor
>
> When trying to register a file using a constant name, an exception is thrown 
> saying the file was already cached.
> This is probably because the same environment is reused, and the cacheFile 
> entries are not cleared between runs.


