[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268142#comment-15268142
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on the pull request:

https://github.com/apache/flink/pull/1856#issuecomment-216442490
  
Removed the unused import. That was causing a build error.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.
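For reference, a minimal sketch of the existing Java API whose semantics the new Scala methods would mirror (data and field indices are illustrative):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MaxByExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, Integer>> input = env.fromElements(
                new Tuple2<>("a", 3), new Tuple2<>("b", 1), new Tuple2<>("a", 7));
        // maxBy(1) selects, per group, the tuple whose field 1 is maximal;
        // minBy(...) works analogously with the minimum.
        input.groupBy(0).maxBy(1).print();
    }
}
```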







[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268085#comment-15268085
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-216433392
  
@greghogan @vasia thanks a lot for your review; the code has been 
modified. :)


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)





[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267946#comment-15267946
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61832489
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is
+ * determined by two scores, hubs and authorities. A good hub represents a page
+ * that points to many other pages, and a good authority represents a page that
+ * is linked by many different hubs. The implementation assumes that the two
+ * values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided
+ * as a parameter to speed up computation. Otherwise, the algorithm will first
+ * execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
--- End diff --

maybe `this(maxIterations, hitsParameter);` :)
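For clarity, a sketch of the suggested change: the three-argument constructor chains to the two-argument one instead of duplicating the iteration logic (field names as in the diff above):

```java
public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) {
    // delegate the maxIterations/hitsParameter handling to the two-argument constructor
    this(maxIterations, hitsParameter);
    this.numberOfVertices = numberOfVertices;
}
```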


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>


[GitHub] flink pull request: Jackson version upgrade: default from 2.4.2 to...

2016-05-02 Thread cowtowncoder
Github user cowtowncoder commented on the pull request:

https://github.com/apache/flink/pull/1952#issuecomment-216415361
  
@smarthi Updated as suggested.




[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61831147
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/* ASF license header omitted; identical to the first diff above */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is
+ * determined by two scores, hubs and authorities. A good hub represents a page
+ * that points to many other pages, and a good authority represents a page that
+ * is linked by many different hubs. The implementation assumes that the two
+ * values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided
+ * as a parameter to speed up computation. Otherwise, the algorithm will first
+ * execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   ScatterGatherConfiguration parameter = new 
ScatterGatherConfiguration();
+   parameter.setDirection(EdgeDirection.ALL);
+   parameter.registerAggregator("sumAllValue", new 
DoubleSumAggregator());
+
+ 
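The quoted diff is truncated above. For context, a hedged sketch of how a registered aggregator such as `sumAllValue` is typically read back on the gather side; the class below is illustrative, not the PR's code, and assumes the imports shown in the diff (`getPreviousIterationAggregate` and `setNewVertexValue` are helpers of Gelly's `VertexUpdateFunction`):

```java
public static final class ValueUpdater<K> extends VertexUpdateFunction<K, DoubleValue, DoubleValue> {
    @Override
    public void updateVertex(Vertex<K, DoubleValue> vertex, MessageIterator<DoubleValue> inMessages) {
        double sum = 0.0;
        for (DoubleValue msg : inMessages) {
            sum += msg.getValue();
        }
        // normalize by the aggregate computed during the previous superstep
        DoubleValue total = getPreviousIterationAggregate("sumAllValue");
        setNewVertexValue(new DoubleValue(total.getValue() == 0.0 ? sum : sum / total.getValue()));
    }
}
```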

[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267897#comment-15267897
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61831064
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/* ASF license header omitted; identical to the first diff above */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is
+ * determined by two scores, hubs and authorities. A good hub represents a page
+ * that points to many other pages, and a good authority represents a page that
+ * is linked by many different hubs. The implementation assumes that the two
+ * values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided
+ * as a parameter to speed up computation. Otherwise, the algorithm will first
+ * execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

not used, may be good to set to `NullValue`
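For illustration, a graph whose unused value type is declared as `NullValue` (a sketch; `edges` and `env` are assumed to exist):

```java
// Gelly types a graph built from plain edges as Graph<K, NullValue, EV>,
// so no unused per-vertex payload is serialized along with the data.
Graph<Long, NullValue, Double> graph = Graph.fromDataSet(edges, env);
```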


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)






[GitHub] flink pull request: Jackson version upgrade: default from 2.4.2 to...

2016-05-02 Thread smarthi
Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1952#issuecomment-216409045
  
ElasticSearch 2.x requires Jackson >= 2.6.2, hence the Elastic 2.x 
connector's Jackson version is set at 2.7.x. Locally, I changed the Jackson 
version to 2.7.x in the parent pom and didn't see any issues or test 
failures. I think it's safe to change the jackson.version to 2.7.4 in the parent 
pom. If so, please remove the `<jackson.version>` override in ElasticSearch2/pom.xml




[GitHub] flink pull request: [Flink-3691] extend avroinputformat to support...

2016-05-02 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1920#issuecomment-216390708
  
Thanks for the PR @gna-phetsarath! I had one suggestion and one comment 
about passing null in an objectReuse test.
Thanks, Fabian




[GitHub] flink pull request: [Flink-3691] extend avroinputformat to support...

2016-05-02 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1920#discussion_r61819570
  
--- Diff: 
flink-batch-connectors/flink-avro/src/main/java/org/apache/flink/api/java/io/AvroInputFormat.java
 ---
@@ -119,12 +138,14 @@ public E nextRecord(E reuseValue) throws IOException {
if (reachedEnd()) {
return null;
}
-   
-   if (!reuseAvroValue) {
-   reuseValue = InstantiationUtil.instantiate(avroValueType, Object.class);
+   if (org.apache.avro.generic.GenericRecord.class == avroValueType) {
--- End diff --

I think we can save one comparison for the `reuseAvroValue == true` case if 
we move that check up:
```
if (reuseAvroValue) {
  return dataFileReader.next(reuseValue);
} else {
  if (GenericRecord.class == avroValueType) {
return dataFileReader.next();
  } else {
    return dataFileReader.next(InstantiationUtil.instantiate(avroValueType, Object.class));
  }
}
```




[GitHub] flink pull request: [Flink-3691] extend avroinputformat to support...

2016-05-02 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1920#discussion_r61819193
  
--- Diff: 
flink-batch-connectors/flink-avro/src/test/java/org/apache/flink/api/io/avro/AvroRecordInputFormatTest.java
 ---
@@ -289,6 +290,119 @@ public void testDeserializeToSpecificType() throws 
IOException {
}
}
 
+   /**
+* Test if the AvroInputFormat is able to properly read data from an 
Avro
+* file as a GenericRecord.
+* 
+* @throws IOException,
+* if there is an exception
+*/
+   @SuppressWarnings("unchecked")
+   @Test
+   public void testDeserialisationGenericRecord() throws IOException {
+   Configuration parameters = new Configuration();
+
+   AvroInputFormat<GenericRecord> format = new AvroInputFormat<GenericRecord>(new Path(testFile.getAbsolutePath()),
+   GenericRecord.class);
+   try {
+   format.configure(parameters);
+   FileInputSplit[] splits = format.createInputSplits(1);
+   assertEquals(splits.length, 1);
+   format.open(splits[0]);
+
+   GenericRecord u = format.nextRecord(null);
--- End diff --

But this test tests the `GenericRecord` mode with object reuse. Is it valid 
to give `null` as a reuse object?




[jira] [Commented] (FLINK-3519) Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267587#comment-15267587
 ] 

ASF GitHub Bot commented on FLINK-3519:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1724#issuecomment-216378193
  
+1 to merge


> Subclasses of Tuples don't work if the declared type of a DataSet is not the 
> descendant
> ---
>
> Key: FLINK-3519
> URL: https://issues.apache.org/jira/browse/FLINK-3519
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Minor
>
> If I have a subclass of TupleN, then objects of this type will turn into 
> TupleNs when I try to use them in a DataSet.
> For example, if I have a class like this:
> {code}
> public static class Foo extends Tuple1<Integer> {
>   public short a;
>   public Foo() {}
>   public Foo(int f0, int a) {
>   this.f0 = f0;
>   this.a = (short)a;
>   }
>   @Override
>   public String toString() {
>   return "(" + f0 + ", " + a + ")";
>   }
> }
> {code}
> And then I do this:
> {code}
> env.fromElements(0,0,0).map(new MapFunction<Integer, Tuple1<Integer>>() {
>   @Override
>   public Tuple1<Integer> map(Integer value) throws Exception {
>   return new Foo(5, 6);
>   }
> }).print();
> {code}
> Then I don't have Foos in the output, but only Tuples:
> {code}
> (5)
> (5)
> (5)
> {code}
> The problem is caused by the TupleSerializer not caring about subclasses at 
> all. I guess the reason for this is performance: we don't want to deal with 
> writing and reading subclass tags when we have Tuples.
> I see three options for solving this:
> 1. Add subclass tags to the TupleSerializer: This is not really an option, 
> because we don't want to lose performance.
> 2. Document this behavior in the javadoc of the Tuple classes.
> 3. Make the Tuple types final: this would be the clean solution, but it is 
> API breaking, and the first victim would be Gelly: the Vertex and Edge types 
> extend from tuples. (Note that the issue doesn't appear there, because the 
> DataSets there always have the type of the descendant class.)
> When deciding between 2. and 3., an important point to note is that if you 
> have your class extend from a Tuple type instead of just adding the f0, f1, 
> ... fields manually in the hopes of getting the performance boost associated 
> with Tuples, then you are out of luck: the PojoSerializer will kick in anyway 
> when the declared types of your DataSets are the descendant type.
> If someone knows about a good reason to extend from a Tuple class, then 
> please comment.
> For 2., this is a suggested wording for the javadoc of the Tuple classes:
> Warning: Please don't subclass Tuple classes, but if you do, then be sure to 
> always declare the element type of your DataSets to your descendant type. 
> (That is, if you have a "class A extends Tuple2", then don't use instances of 
> A in a DataSet<Tuple2>, but use DataSet<A>.)
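To make the suggested wording concrete, a sketch of the safe usage it describes, reusing `Foo` and `env` from the snippets above: declaring the descendant type makes the PojoSerializer kick in, so the extra field survives.

```java
DataSet<Foo> foos = env.fromElements(0, 0, 0)
        .map(new MapFunction<Integer, Foo>() {
            @Override
            public Foo map(Integer value) {
                return new Foo(5, 6);
            }
        });
foos.print(); // prints "(5, 6)" three times; the subclass field is kept
```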






[jira] [Commented] (FLINK-2220) Join on Pojo without hashCode() silently fails

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267556#comment-15267556
 ] 

ASF GitHub Bot commented on FLINK-2220:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1940#issuecomment-216372861
  
I added a comment to FLINK-2220. I think the original analysis of the 
problem was not correct. It is not necessary to check for POJOs whether they 
override `equals()` and `hashCode()`. Details are in FLINK-2220.


> Join on Pojo without hashCode() silently fails
> --
>
> Key: FLINK-2220
> URL: https://issues.apache.org/jira/browse/FLINK-2220
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.9, 0.8.1
>Reporter: Marcus Leich
>
> I need to perform a join using a complete Pojo as join key.
> With DOP > 1 this only works if the Pojo comes with a meaningful hashCode() 
> implementation, as otherwise equal objects will get hashed to different 
> partitions based on their memory address and not on the content.
> I guess it's fine if users are required to implement hashCode() themselves, 
> but it would be nice if the documentation, or better yet Flink itself, could alert 
> users that this is a requirement, similar to how Comparable is required for 
> keys.
> Use the following code to reproduce the issue:
> public class Pojo implements Comparable<Pojo> {
> public byte[] data;
> public Pojo () {
> }
> public Pojo (byte[] data) {
> this.data = data;
> }
> @Override
> public int compareTo(Pojo o) {
> return UnsignedBytes.lexicographicalComparator().compare(data, 
> o.data);
> }
> // uncomment me for making the join work
> /* @Override
> public int hashCode() {
> return Arrays.hashCode(data);
> }*/
> }
> public void testJoin () throws Exception {
> final ExecutionEnvironment env = 
> ExecutionEnvironment.createLocalEnvironment();
> env.setParallelism(4);
> DataSet<Tuple2<Pojo, String>> left = env.fromElements(
> new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "black"),
> new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "red"),
> new Tuple2<>(new Pojo(new byte[] {1}), "Spark"),
> new Tuple2<>(new Pojo(new byte[] {2}), "good"),
> new Tuple2<>(new Pojo(new byte[] {5}), "bug"));
> DataSet<Tuple2<Pojo, String>> right = env.fromElements(
> new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "white"),
> new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), 
> "green"),
> new Tuple2<>(new Pojo(new byte[] {1}), "Flink"),
> new Tuple2<>(new Pojo(new byte[] {2}), "evil"),
> new Tuple2<>(new Pojo(new byte[] {5}), "fix"));
> // will not print anything unless Pojo has a real hashCode() 
> implementation
> 
> left.join(right).where(0).equalTo(0).projectFirst(1).projectSecond(1).print();
> }






[jira] [Commented] (FLINK-2220) Join on Pojo without hashCode() silently fails

2016-05-02 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267555#comment-15267555
 ] 

Fabian Hueske commented on FLINK-2220:
--

I think the cause of the original problem was not correctly identified before.
The problem is not that the POJO does not override {{hashCode()}} and 
{{equals()}}. The {{PojoComparator}} delegates {{equals()}} and {{hashCode()}} 
to the {{TypeComparator}}s of its fields. In the code snippet above, this 
would be the {{TypeComparator}} for the {{byte[]}} field. 

If I remember correctly, byte arrays were handled as {{GenericType}} in version 
0.9.0. In contrast to the {{PojoComparator}}, the {{GenericTypeComparator}} 
uses the {{equals()}} and {{hashCode()}} of the objects.
So instead of checking for overridden {{equals()}} and {{hashCode()}} methods 
in {{TypeExtractor.analyzePojo()}}, we need to extend the check in 
{{GenericTypeInfo.isKeyType()}} to verify that both methods have been 
overridden.
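A hedged sketch of that extended check (illustrative only, not Flink's actual code; {{typeClass}} stands for the class wrapped by the {{GenericTypeInfo}}):

```java
public boolean isKeyType() {
    try {
        // a generic type is only usable as a key if the class (or a superclass
        // below Object) overrides both hashCode() and equals()
        return typeClass.getMethod("hashCode").getDeclaringClass() != Object.class
                && typeClass.getMethod("equals", Object.class).getDeclaringClass() != Object.class;
    } catch (NoSuchMethodException e) {
        return false;
    }
}
```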

> Join on Pojo without hashCode() silently fails
> --
>
> Key: FLINK-2220
> URL: https://issues.apache.org/jira/browse/FLINK-2220
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.9, 0.8.1
>Reporter: Marcus Leich
>





[jira] [Commented] (FLINK-3581) Add Special Aligned Event-Time WindowOperator

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267450#comment-15267450
 ] 

ASF GitHub Bot commented on FLINK-3581:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1957#issuecomment-216354689
  
Is that a problem? Maybe we could do some periodic garbage collection on 
the empty column families.


> Add Special Aligned Event-Time WindowOperator
> -
>
> Key: FLINK-3581
> URL: https://issues.apache.org/jira/browse/FLINK-3581
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The current Window Trigger is per key, meaning every window has a (logical) 
> Trigger for every key in the window, i.e. there will be state and time 
> triggers per key per window.
> For some types of windows, i.e. those based on time, it is possible to use a single 
> Trigger to fire for all keys at the same time. In that case we would save a 
> lot of space on state and timers, which makes state snapshots a lot smaller.






[jira] [Assigned] (FLINK-1996) Add output methods to Table API

2016-05-02 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske reassigned FLINK-1996:


Assignee: Fabian Hueske

> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[GitHub] flink pull request: Jackson version upgrade: default from 2.4.2 to...

2016-05-02 Thread cowtowncoder
Github user cowtowncoder commented on the pull request:

https://github.com/apache/flink/pull/1952#issuecomment-216337481
  
@aljoscha @fhueske Nothing special, just thought I'd start with the smallest 
step, given that this is my first contribution here.
But given that 2.7.4 is out now, I agree that going right there makes the 
most sense and should be safe.
It also looks like most usage is via the Tree API (JsonNodes), some streaming; 
most changes are typically in databinding, and very few compatibility issues 
occur outside databinding. So to me the upgrade seems safe either way. And since 
the Elastic client already uses 2.7, that would allow unification of versions as 
well.
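For reference, the tree-model usage pattern referred to above looks like this (a minimal sketch; values are illustrative):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TreeApiExample {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree("{\"name\":\"flink\",\"version\":2}");
        // Tree-model reads like this have been stable across Jackson 2.x
        // minor releases, which underpins the low-risk assessment above.
        System.out.println(root.get("name").asText());
    }
}
```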





[jira] [Commented] (FLINK-3854) Support Avro key-value rolling sink writer

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267240#comment-15267240
 ] 

ASF GitHub Bot commented on FLINK-3854:
---

Github user IgorBerman commented on the pull request:

https://github.com/apache/flink/pull/1953#issuecomment-216324472
  
@aljoscha, @rmetzger thanks for the review. I've updated the PR: moved 
the verifications to the constructor.


> Support Avro key-value rolling sink writer
> --
>
> Key: FLINK-3854
> URL: https://issues.apache.org/jira/browse/FLINK-3854
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Igor Berman
>
> Support a rolling sink writer in Avro key-value format,
> preferably without additional classpath dependencies,
> and preferably in the same format as M/R jobs for backward compatibility.






[jira] [Commented] (FLINK-2926) Add a Strongly Connected Components Library Method

2016-05-02 Thread Martin Liesenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267231#comment-15267231
 ] 

Martin Liesenberg commented on FLINK-2926:
--

Yes, will do. Thanks for the feedback. :)

> Add a Strongly Connected Components Library Method
> --
>
> Key: FLINK-2926
> URL: https://issues.apache.org/jira/browse/FLINK-2926
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Martin Liesenberg
>Priority: Minor
>  Labels: requires-design-doc
>
> This algorithm operates in four main steps: 
> 1). Form the transposed graph (each vertex sends its id to its out neighbors 
> which form a transposedNeighbors set)
> 2). Trimming: every vertex which has only incoming or outgoing edges sets 
> colorID to its own value and becomes inactive. 
> 3). Forward traversal: 
>Start phase: propagate id to out neighbors 
>Rest phase: update the colorID with the minimum value seen 
> until convergence
> 4). Backward traversal: 
>  Start: if the vertex id is equal to its color id 
> propagate the value to transposedNeighbors
>  Rest: each vertex that receives a message equal to its 
> colorId will propagate its colorId to the transposed graph and becomes 
> inactive. 
> More info in section 3.1 of this paper: 
> http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf
> or in section 6 of this paper: http://www.vldb.org/pvldb/vol7/p1821-yan.pdf  





[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267211#comment-15267211
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61780511
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/* ASF license header omitted; identical to the first diff above */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is
+ * determined by two scores, hubs and authorities. A good hub represents a page
+ * that points to many other pages, and a good authority represents a page that
+ * is linked by many different hubs. The implementation assumes that the two
+ * values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided
+ * as a parameter to speed up computation. Otherwise, the algorithm will first
+ * execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61780580
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/* ASF license header omitted; identical to the first diff above */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is
+ * determined by two scores, hubs and authorities. A good hub represents a page
+ * that points to many other pages, and a good authority represents a page that
+ * is linked by many different hubs. The implementation assumes that the two
+ * values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided
+ * as a parameter to speed up computation. Otherwise, the algorithm will first
+ * execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

Is the edge value used in this algorithm?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
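
For orientation between the review comments, here is a minimal usage sketch of the algorithm under review. The generic signature HITSAlgorithm<K, VV, EV> and the DoubleValue result type are assumptions reconstructed from the quoted diff, not the merged API.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;
import org.apache.flink.graph.library.HITSAlgorithm;
import org.apache.flink.types.DoubleValue;
import org.apache.flink.types.NullValue;

public class HITSUsageSketch {
	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		// A tiny directed graph: 1 -> 2, 1 -> 3, 2 -> 3.
		DataSet<Edge<Long, NullValue>> edges = env.fromElements(
				new Edge<>(1L, 2L, NullValue.getInstance()),
				new Edge<>(1L, 3L, NullValue.getInstance()),
				new Edge<>(2L, 3L, NullValue.getInstance()));

		// Every vertex starts with the same value, as the Javadoc above assumes.
		Graph<Long, DoubleValue, NullValue> graph = Graph.fromDataSet(edges,
				new MapFunction<Long, DoubleValue>() {
					@Override
					public DoubleValue map(Long id) {
						return new DoubleValue(1.0);
					}
				}, env);

		// Ten user-visible iterations; the constructor doubles this internally
		// because each HITS round is split into a hub and an authority superstep.
		DataSet<Vertex<Long, DoubleValue>> authorities = graph.run(
				new HITSAlgorithm<Long, DoubleValue, NullValue>(10, HITSAlgorithm.HITSParameter.AUTHORITY));

		authorities.print();
	}
}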


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61780511
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   ScatterGatherConfiguration parameter = new 
ScatterGatherConfiguration();
+   parameter.setDirection(EdgeDirection.ALL);
+   parameter.registerAggregator("sumAllValue", new 
DoubleSumAggregator());
+
+  

[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267206#comment-15267206
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61780238
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61780238
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   ScatterGatherConfiguration parameter = new 
ScatterGatherConfiguration();
+   parameter.setDirection(EdgeDirection.ALL);
+   parameter.registerAggregator("sumAllValue", new 
DoubleSumAggregator());
+
+  
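
As context for the configuration lines quoted above, this is the usual pattern for reading a registered aggregator back in the gather phase of the following superstep. It is a generic scatter-gather sketch under assumed names (NormalizingUpdater is hypothetical), not the pull request's code.

import org.apache.flink.graph.Vertex;
import org.apache.flink.graph.spargel.MessageIterator;
import org.apache.flink.graph.spargel.VertexUpdateFunction;
import org.apache.flink.types.DoubleValue;

// Hypothetical updater showing how a value aggregated under "sumAllValue"
// can be consumed one superstep later, e.g. for normalization.
class NormalizingUpdater<K> extends VertexUpdateFunction<K, DoubleValue, DoubleValue> {
	@Override
	public void updateVertex(Vertex<K, DoubleValue> vertex, MessageIterator<DoubleValue> messages) {
		// Aggregates become visible in the superstep after they were filled.
		DoubleValue norm = getPreviousIterationAggregate("sumAllValue");

		double sum = 0.0;
		for (DoubleValue message : messages) {
			sum += message.getValue();
		}

		// Guard against the first superstep, where no aggregate exists yet.
		double divisor = (norm == null || norm.getValue() == 0.0) ? 1.0 : norm.getValue();
		setNewVertexValue(new DoubleValue(sum / divisor));
	}
}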

[jira] [Commented] (FLINK-3772) Graph algorithms for vertex and edge degree

2016-05-02 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267132#comment-15267132
 ] 

Greg Hogan commented on FLINK-3772:
---

I added documentation and removed algorithm caching, the latter because I think 
we do better by merging configurations.

> Graph algorithms for vertex and edge degree
> ---
>
> Key: FLINK-3772
> URL: https://issues.apache.org/jira/browse/FLINK-3772
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Many graph algorithms require vertices or edges to be marked with the degree. 
> This ticket provides algorithms for annotating
> * vertex degree for undirected graphs
> * vertex out-, in-, and out- and in-degree for directed graphs
> * edge source, target, and source and target degree for undirected graphs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
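
For readers comparing the ticket above with what Gelly already offers: the Graph class has long exposed simple degree helpers that return one tuple per vertex id, while the new algorithms annotate the vertices and edges themselves. A quick sketch of the existing helpers (exact value types vary across versions, so treat the details as assumptions):

import org.apache.flink.graph.Graph;
import org.apache.flink.types.NullValue;

public class DegreeSketch {
	// Prints per-vertex degrees using the pre-existing Graph helpers.
	static void printDegrees(Graph<Long, NullValue, NullValue> graph) throws Exception {
		graph.getDegrees().print();  // combined in- plus out-degree per vertex id
		graph.inDegrees().print();
		graph.outDegrees().print();
	}
}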


[jira] [Updated] (FLINK-1984) Integrate Flink with Apache Mesos

2016-05-02 Thread Eron Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eron Wright  updated FLINK-1984:

Description: 
There are some users asking for an integration of Flink into Mesos.

-There also is a pending pull request for adding Mesos support for Flink-: 
https://github.com/apache/flink/pull/251

Update (May '16):  a new effort is now underway, building on the recent 
ResourceManager work.



  was:
There are some users asking for an integration of Flink into Mesos.

There also is a pending pull request for adding Mesos support for Flink: 
https://github.com/apache/flink/pull/251
But the PR is insufficiently tested. I'll add the code of the pull request to 
this JIRA in case somebody wants to pick it up in the future.


> Integrate Flink with Apache Mesos
> -
>
> Key: FLINK-1984
> URL: https://issues.apache.org/jira/browse/FLINK-1984
> Project: Flink
>  Issue Type: New Feature
>  Components: New Components
>Reporter: Robert Metzger
>Assignee: Eron Wright 
>Priority: Minor
> Attachments: 251.patch
>
>
> There are some users asking for an integration of Flink into Mesos.
> -There also is a pending pull request for adding Mesos support for Flink-: 
> https://github.com/apache/flink/pull/251
> Update (May '16):  a new effort is now underway, building on the recent 
> ResourceManager work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267049#comment-15267049
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61772427
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
--- End diff --

The two phases depend on each other: hub scores cannot be updated until the authority 
scores have been updated and normalized, and the same holds for authority scores. So 

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61772427
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
--- End diff --

The two phases depend on each other: hub scores cannot be updated until the authority 
scores have been updated and normalized, and the same holds for authority scores. So the 
two update steps run back to back, i.e. they belong to different iteration supersteps. 
Returning a `Tuple2` value would mean that hub and authority are available in the same 
`superstep`, which is not the case. That is why the `HITSParameter` is needed.


---
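
Gallenvara's point about alternating supersteps also explains the iteration-count arithmetic in the quoted constructors. The schedule below is an assumption inferred from that arithmetic, not quoted code: if odd supersteps update hubs and even supersteps update authorities, then 2 * n supersteps end on an authority update and 2 * n + 1 end on a hub update.

public class SuperstepSchedule {
	// Hypothetical driver loop illustrating the parity argument above.
	static void run(int userIterations, boolean wantHub) {
		int supersteps = wantHub ? userIterations * 2 + 1 : userIterations * 2;
		for (int superstep = 1; superstep <= supersteps; superstep++) {
			if (superstep % 2 == 1) {
				updateHubScores();        // odd supersteps: hub phase (assumed order)
			} else {
				updateAuthorityScores();  // even supersteps: authority phase
			}
		}
	}

	static void updateHubScores() { /* placeholder */ }
	static void updateAuthorityScores() { /* placeholder */ }
}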

[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267033#comment-15267033
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61770727
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
--- End diff --

Thanks for the contribution :) Pull requests are always asynchronous.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
>  

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61770727
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
--- End diff --

Thanks for the contribution :) Pull requests are always asynchronous.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267014#comment-15267014
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on the pull request:

https://github.com/apache/flink/pull/1856#issuecomment-216300472
  
Added maxBy and minBy to GroupedDataSet too, so it should be fine now.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-05-02 Thread ramkrish86
Github user ramkrish86 commented on the pull request:

https://github.com/apache/flink/pull/1856#issuecomment-216300472
  
Added maxBy and minBy to GroupedDataSet too, so it should be fine now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
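
Since the Scala methods mirror the stable Java API, a short Java example shows the semantics being ported: maxBy/minBy select a single tuple by comparing the given field positions, both on a whole DataSet and per group.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MaxByMinByExample {
	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		DataSet<Tuple2<String, Integer>> counts = env.fromElements(
				Tuple2.of("a", 3), Tuple2.of("a", 5), Tuple2.of("b", 2));

		// Per-group selection, the behavior being added to GroupedDataSet in Scala.
		counts.groupBy(0).maxBy(1).print();  // (a,5) and (b,2)

		// Whole-dataset selection.
		counts.minBy(1).print();             // (b,2)
	}
}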


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267012#comment-15267012
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61769569
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
--- End diff --

Could we return both the hub score and authority score in a `Tuple2` rather 
than having the user choose between the two scores?


> 

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61769569
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param numberOfVertices the number of vertices in the input graph
+* @param hitsParameter the type of score (hub or authority) the algorithm should output
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
--- End diff --

Could we return both the hub score and authority score in a `Tuple2` rather 
than having the user choose between the two scores?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
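
For concreteness, the signature that this suggestion implies might look like the skeleton below. It is only an illustration of the proposal (the class name is hypothetical); gallenvara's reply elsewhere in this thread explains why the two scores are not available in the same superstep.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.GraphAlgorithm;
import org.apache.flink.graph.Vertex;
import org.apache.flink.types.DoubleValue;

// Skeleton of the suggested alternative: one result set carrying a
// (hub, authority) pair per vertex, making HITSParameter unnecessary.
public class HITSHubAndAuthority<K, VV, EV>
		implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Tuple2<DoubleValue, DoubleValue>>>> {

	@Override
	public DataSet<Vertex<K, Tuple2<DoubleValue, DoubleValue>>> run(Graph<K, VV, EV> input) throws Exception {
		// The scatter-gather iteration would go here; left out in this sketch.
		throw new UnsupportedOperationException("illustration only");
	}
}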

[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267010#comment-15267010
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r61769507
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/MaxByOperatorTest.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.InvalidProgramException
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.tuple.{Tuple, Tuple5}
+import org.apache.flink.api.java.typeutils.TupleTypeInfo
+import org.apache.flink.api.scala._
+import org.junit.Test
+import org.junit.Assert;
+
+class MaxByOperatorTest {
+  private val emptyTupleData = List[Tuple5[Integer, Long, String, Long, 
Integer]]()
+  private val customTypeData = List[CustomType](new CustomType())
+  private val tupleTypeInfo: TupleTypeInfo[Tuple5[Integer, Long, String, 
Long, Integer]] =
--- End diff --

Sorry, I am not sure about this. Can you give some examples?


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-05-02 Thread ramkrish86
Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r61769507
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/MaxByOperatorTest.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.InvalidProgramException
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.tuple.{Tuple, Tuple5}
+import org.apache.flink.api.java.typeutils.TupleTypeInfo
+import org.apache.flink.api.scala._
+import org.junit.Test
+import org.junit.Assert;
+
+class MaxByOperatorTest {
+  private val emptyTupleData = List[Tuple5[Integer, Long, String, Long, 
Integer]]()
+  private val customTypeData = List[CustomType](new CustomType())
+  private val tupleTypeInfo: TupleTypeInfo[Tuple5[Integer, Long, String, 
Long, Integer]] =
--- End diff --

Sorry, I am not sure about this. Can you give some examples?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267009#comment-15267009
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r61769474
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception {
for (int position : fields) {
// Save position of compared key
// Get both values - both implement comparable
-   Comparable comparable1 = 
value1.getFieldNotNull(position);
-   Comparable comparable2 = 
value2.getFieldNotNull(position);
+   Comparable comparable1 = 
((Tuple)value1).getFieldNotNull(position);
--- End diff --

I have a doubt here. Previously this SelectByMaxFunction was also bounded by 
T extends Tuple, so TupleTypeInfo was effectively what was allowed here. Even now 
we are allowing TypeInformation, which is the superclass of TupleTypeInfo, so it 
should work the same way as it did earlier?


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-05-02 Thread ramkrish86
Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r61769474
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception {
for (int position : fields) {
// Save position of compared key
// Get both values - both implement comparable
-   Comparable comparable1 = 
value1.getFieldNotNull(position);
-   Comparable comparable2 = 
value2.getFieldNotNull(position);
+   Comparable comparable1 = 
((Tuple)value1).getFieldNotNull(position);
--- End diff --

I have a doubt here. Previously this SelectByMaxFunction was also bounded by 
T extends Tuple, so TupleTypeInfo was effectively what was allowed here. Even now 
we are allowing TypeInformation, which is the superclass of TupleTypeInfo, so it 
should work the same way as it did earlier?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
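
To make the cast under discussion concrete, here is a condensed sketch of the reduce logic in the quoted diff, with the comparison loop spelled out. The class name and field array are assumptions drawn from the surrounding diff, not the exact PR code.

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple;

// Keeps, for each pair, the tuple that is smaller on the first differing key field.
class SelectByMinSketch<T> implements ReduceFunction<T> {
	private final int[] fields;

	SelectByMinSketch(int... fields) {
		this.fields = fields;
	}

	@Override
	@SuppressWarnings({"unchecked", "rawtypes"})
	public T reduce(T value1, T value2) throws Exception {
		for (int position : fields) {
			// The cast is what the diff above introduces once T is no longer
			// bounded by Tuple; both field values implement Comparable.
			Comparable comparable1 = ((Tuple) value1).getFieldNotNull(position);
			Comparable comparable2 = ((Tuple) value2).getFieldNotNull(position);
			int cmp = comparable1.compareTo(comparable2);
			if (cmp < 0) {
				return value1;  // strictly smaller on this key field
			} else if (cmp > 0) {
				return value2;
			}
			// Equal on this field: fall through to the next key position.
		}
		return value1;  // all compared fields are equal
	}
}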


[jira] [Resolved] (FLINK-3845) Gelly allows duplicate vertices in Graph.addVertices

2016-05-02 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan resolved FLINK-3845.
---
Resolution: Fixed

Fixed in 9c7a6598dd84e6739b50cfc8f6c5194e3a3ace9f

> Gelly allows duplicate vertices in Graph.addVertices
> 
>
> Key: FLINK-3845
> URL: https://issues.apache.org/jira/browse/FLINK-3845
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Gelly performs a {{DataSet}} union then calls {{distinct()}} which keeps 
> vertices with the same label but different values. This should be replaced 
> with one of the join operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
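
The fix described above can be pictured as follows: instead of union().distinct(), only add vertices whose ids are not already present. This is a sketch of the idea with assumed inputs (existingVertices, verticesToAdd), not the committed Gelly code.

import org.apache.flink.api.common.functions.FlatJoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.graph.Vertex;
import org.apache.flink.types.NullValue;
import org.apache.flink.util.Collector;

public class AddVerticesSketch {
	static DataSet<Vertex<Long, NullValue>> addVertices(
			DataSet<Vertex<Long, NullValue>> existingVertices,
			DataSet<Vertex<Long, NullValue>> verticesToAdd) {

		// Keep an added vertex only when no existing vertex has the same id,
		// so duplicate ids with different values can no longer slip through.
		DataSet<Vertex<Long, NullValue>> newOnly = verticesToAdd
				.leftOuterJoin(existingVertices).where(0).equalTo(0)
				.with(new FlatJoinFunction<Vertex<Long, NullValue>, Vertex<Long, NullValue>, Vertex<Long, NullValue>>() {
					@Override
					public void join(Vertex<Long, NullValue> added, Vertex<Long, NullValue> existing,
							Collector<Vertex<Long, NullValue>> out) {
						if (existing == null) {
							out.collect(added);
						}
					}
				});

		return existingVertices.union(newOnly);
	}
}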


[jira] [Commented] (FLINK-3845) Gelly allows duplicate vertices in Graph.addVertices

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266995#comment-15266995
 ] 

ASF GitHub Bot commented on FLINK-3845:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1949


> Gelly allows duplicate vertices in Graph.addVertices
> 
>
> Key: FLINK-3845
> URL: https://issues.apache.org/jira/browse/FLINK-3845
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Gelly performs a {{DataSet}} union then calls {{distinct()}} which keeps 
> vertices with the same label but different values. This should be replaced 
> with one of the join operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3845] [gelly] Gelly allows duplicate ve...

2016-05-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1949


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266984#comment-15266984
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61767543
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm computes two scores,
+ * hub and authority. A good hub represents a page that points to many other pages, and a good
+ * authority represents a page that is linked by many different hubs. The implementation assumes
+ * that the two values on every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   
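
(The quoted diff is truncated by the archive.) For readers following the review: the constructors double {{maxIterations}} because one HITS round appears to take two scatter-gather supersteps, a hub update and an authority update. A single-machine sketch of one round, for orientation only and not the PR's code; it assumes {{hub}} and {{auth}} are initialized to 1.0 for every vertex appearing in {{outEdges}}:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HitsSketch {

    static void hitsIteration(Map<Integer, List<Integer>> outEdges,
                              Map<Integer, Double> hub,
                              Map<Integer, Double> auth) {
        // authority(v) = sum of the hub scores of the pages linking to v
        Map<Integer, Double> newAuth = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : outEdges.entrySet()) {
            for (int target : e.getValue()) {
                newAuth.merge(target, hub.get(e.getKey()), Double::sum);
            }
        }
        normalize(newAuth);
        auth.clear();
        auth.putAll(newAuth);

        // hub(v) = sum of the authority scores of the pages v links to
        Map<Integer, Double> newHub = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : outEdges.entrySet()) {
            double sum = 0.0;
            for (int target : e.getValue()) {
                sum += auth.getOrDefault(target, 0.0);
            }
            newHub.put(e.getKey(), sum);
        }
        normalize(newHub);
        hub.clear();
        hub.putAll(newHub);
    }

    // Scale scores so their L2 norm is 1, the usual HITS normalization.
    static void normalize(Map<Integer, Double> scores) {
        double norm = 0.0;
        for (double v : scores.values()) {
            norm += v * v;
        }
        norm = Math.sqrt(norm);
        if (norm > 0.0) {
            for (Map.Entry<Integer, Double> e : scores.entrySet()) {
                e.setValue(e.getValue() / norm);
            }
        }
    }
}
{code}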

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61767543
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> netGraph) throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
+   }
+
+   ScatterGatherConfiguration parameter = new 
ScatterGatherConfiguration();
+   parameter.setDirection(EdgeDirection.ALL);
+   parameter.registerAggregator("sumAllValue", new 
DoubleSumAggregator());
+
+  

[jira] [Commented] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-05-02 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266976#comment-15266976
 ] 

Konstantin Knauf commented on FLINK-3669:
-

The problem in the gist, yes. As I said, I could not test it with the original 
application.

> WindowOperator registers a lot of timers at StreamTask
> --
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Assignee: Konstantin Knauf
>Priority: Blocker
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should combine several 
> registered trigger timers to only register one low-level timer (timer 
> coalescing).
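
A minimal sketch of such coalescing, with hypothetical names ({{registerLowLevelTimer}} and {{fireTrigger}} are stand-ins, not Flink's actual API): register one low-level timer per distinct timestamp and fan out to every interested key when it fires.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TimerCoalescer<K> {

    private final Map<Long, Set<K>> timers = new HashMap<>();

    public void registerTimer(long timestamp, K key) {
        Set<K> keys = timers.get(timestamp);
        if (keys == null) {
            keys = new HashSet<>();
            timers.put(timestamp, keys);
            registerLowLevelTimer(timestamp);  // only once per distinct timestamp
        }
        keys.add(key);                         // further registrations are coalesced
    }

    public void onTimer(long timestamp) {
        Set<K> keys = timers.remove(timestamp);
        if (keys != null) {
            for (K key : keys) {
                fireTrigger(key, timestamp);   // fan out to every registered key
            }
        }
    }

    private void registerLowLevelTimer(long timestamp) { /* stand-in */ }

    private void fireTrigger(K key, long timestamp) { /* stand-in */ }
}
{code}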



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266974#comment-15266974
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61766847
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
--- End diff --

Thanks for the review :) I will modify the code tomorrow because I don't have 
my computer with me right now.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: 

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61766847
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
--- End diff --

Thanks for the review :) I will modify the code tomorrow because I don't have 
my computer with me right now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266947#comment-15266947
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61764570
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
--- End diff --

Verify positive `numberOfVertices` with `Preconditions.checkArgument`. Same 
for `maxIterations` in the other constructor.
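
A sketch of the suggested checks, assuming {{org.apache.flink.util.Preconditions}} is imported and combining them with the constructor delegation proposed in a later review comment:

{code}
public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
    Preconditions.checkArgument(maxIterations > 0,
        "The maximum number of iterations must be greater than zero.");
    if (hitsParameter == HITSParameter.AUTHORITY) {
        this.maxIterations = maxIterations * 2;
    } else {
        this.maxIterations = maxIterations * 2 + 1;
    }
}

public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) {
    this(maxIterations, hitsParameter);
    Preconditions.checkArgument(numberOfVertices > 0,
        "The number of vertices must be greater than zero.");
    this.numberOfVertices = numberOfVertices;
}
{code}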


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: 

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61764570
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   this.numberOfVertices = numberOfVertices;
--- End diff --

Verify positive `numberOfVertices` with `Preconditions.checkArgument`. Same 
for `maxIterations` in the other constructor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266941#comment-15266941
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61764033
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
--- End diff --

Can replace 82:87 with constructor delegation: `this(maxIterations, hitsParameter);`.
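
That is, the duplicated if/else can collapse into a delegating constructor (a sketch):

{code}
public HITSAlgorithm(int maxIterations, long numberOfVertices, HITSParameter hitsParameter) {
    this(maxIterations, hitsParameter);        // reuse the iteration-count logic
    this.numberOfVertices = numberOfVertices;
}
{code}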


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun

[jira] [Commented] (FLINK-3845) Gelly allows duplicate vertices in Graph.addVertices

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266939#comment-15266939
 ] 

ASF GitHub Bot commented on FLINK-3845:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1949#issuecomment-216287144
  
Merging ...


> Gelly allows duplicate vertices in Graph.addVertices
> 
>
> Key: FLINK-3845
> URL: https://issues.apache.org/jira/browse/FLINK-3845
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Gelly performs a {{DataSet}} union and then calls {{distinct()}}, which keeps 
> vertices with the same label but different values. This should be replaced 
> with one of the join operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-02 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61764033
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two scores,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * <p>
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter the type of final web pages users want to get 
by this algorithm
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
--- End diff --

Can replace 82:87 with constructor delegation: `this(maxIterations, hitsParameter);`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3845] [gelly] Gelly allows duplicate ve...

2016-05-02 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1949#issuecomment-216287144
  
Merging ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266918#comment-15266918
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61762758
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

👍


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it to under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61762758
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

👍


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266915#comment-15266915
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61762024
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

No, I mean we should move the subsection about accessing tables to the 
respective sections and rename this section to "Registering Tables".


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it to under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61762024
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

No, I mean we should move the subsection about accessing tables to the 
respective sections and rename this section to "Registering Tables".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266909#comment-15266909
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61761448
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

I put this here since this is common to both Table API and SQL. Do you mean 
having a "Registering Tables" section both under Table API and under SQL?


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it to under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61761448
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

I put this here since this is common to both Table API and SQL. Do you mean 
having a "Registering Tables" section both under Table API and under SQL?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-02 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266907#comment-15266907
 ] 

Fabian Hueske commented on FLINK-3856:
--

I think it makes sense to add these time types. 
We can add them to BasicTypes, but we should mark them as {{@PublicEvolving}}, 
IMO.

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment there is only the {{Date}} type, which is not sufficient for 
> most time-related use cases.
> The Table API would also benefit from having different types as output results.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.
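
For reference, the three proposed types and what each carries (plain JDK usage):

{code}
java.sql.Date d = java.sql.Date.valueOf("2016-05-02");            // date only
java.sql.Time t = java.sql.Time.valueOf("12:30:00");              // time of day only
java.sql.Timestamp ts =
    java.sql.Timestamp.valueOf("2016-05-02 12:30:00.0");          // date + time + fractional seconds
{code}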



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3771) Methods for translating Graphs

2016-05-02 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3771.
-
Resolution: Implemented

> Methods for translating Graphs
> --
>
> Key: FLINK-3771
> URL: https://issues.apache.org/jira/browse/FLINK-3771
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Provide methods for translation of the type or value of graph labels, vertex 
> values, and edge values.
> Sample use cases:
> * shifting graph labels in order to union generated graphs or graphs read 
> from multiple sources
> * downsizing labels or values since algorithms prefer to generate wide types 
> which may be expensive for further computation
> * changing label type for testing or benchmarking alternative code paths
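
For example, the label-shifting use case could look like this with the new methods (class names from the merged PR are assumed, and {{first}}, {{second}}, {{firstVertexCount}} are hypothetical; treat this as a sketch):

{code}
// Shift all ids of the second graph past those of the first,
// then union without label collisions.
Graph<LongValue, NullValue, NullValue> shifted =
    second.run(new TranslateGraphIds<>(new LongValueAddOffset(firstVertexCount)));

Graph<LongValue, NullValue, NullValue> combined = first.union(shifted);
{code}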



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3771) Methods for translating Graphs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266901#comment-15266901
 ] 

ASF GitHub Bot commented on FLINK-3771:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1900#issuecomment-216280065
  
Implemented in 08e4bf9440a566c874c2b8e74ae2127ff264c672


> Methods for translating Graphs
> --
>
> Key: FLINK-3771
> URL: https://issues.apache.org/jira/browse/FLINK-3771
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Provide methods for translation of the type or value of graph labels, vertex 
> values, and edge values.
> Sample use cases:
> * shifting graph labels in order to union generated graphs or graphs read 
> from multiple sources
> * downsizing labels or values since algorithms prefer to generate wide types 
> which may be expensive for further computation
> * changing label type for testing or benchmarking alternative code paths



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3771] [gelly] Methods for translating G...

2016-05-02 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1900#issuecomment-216280065
  
Implemented in 08e4bf9440a566c874c2b8e74ae2127ff264c672


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3771) Methods for translating Graphs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266887#comment-15266887
 ] 

ASF GitHub Bot commented on FLINK-3771:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1900


> Methods for translating Graphs
> --
>
> Key: FLINK-3771
> URL: https://issues.apache.org/jira/browse/FLINK-3771
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Provide methods for translation of the type or value of graph labels, vertex 
> values, and edge values.
> Sample use cases:
> * shifting graph labels in order to union generated graphs or graphs read 
> from multiple sources
> * downsizing labels or values since algorithms prefer to generate wide types 
> which may be expensive for further computation
> * changing label type for testing or benchmarking alternative code paths



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3771] [gelly] Methods for translating G...

2016-05-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1900


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3771) Methods for translating Graphs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266882#comment-15266882
 ] 

ASF GitHub Bot commented on FLINK-3771:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1900#issuecomment-216277509
  
Updated docs. Merging ...


> Methods for translating Graphs
> --
>
> Key: FLINK-3771
> URL: https://issues.apache.org/jira/browse/FLINK-3771
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Provide methods for translation of the type or value of graph labels, vertex 
> values, and edge values.
> Sample use cases:
> * shifting graph labels in order to union generated graphs or graphs read 
> from multiple sources
> * downsizing labels or values since algorithms prefer to generate wide types 
> which may be expensive for further computation
> * changing label type for testing or benchmarking alternative code paths



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3771] [gelly] Methods for translating G...

2016-05-02 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1900#issuecomment-216277509
  
Updated docs. Merging ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3222) Incorrect shift amount in OperatorCheckpointStats#hashCode()

2016-05-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3222:
--
Description: 
Here is related code:
{code}
result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
>>> 32));
{code}
subTaskStats.length is an int.

The shift amount is greater than 31 bits.

  was:
Here is related code:
{code}
result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
>>> 32));
{code}
subTaskStats.length is an int.


The shift amount is greater than 31 bits.


> Incorrect shift amount in OperatorCheckpointStats#hashCode()
> 
>
> Key: FLINK-3222
> URL: https://issues.apache.org/jira/browse/FLINK-3222
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
> >>> 32));
> {code}
> subTaskStats.length is an int.
> The shift amount is greater than 31 bits.
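
Since {{subTaskStats.length}} is an {{int}}, the long-folding idiom adds nothing here; a sketch of the fix:

{code}
// subTaskStats.length is an int, so it can be mixed in directly:
result = 31 * result + subTaskStats.length;

// The xor-shift idiom is only needed for long fields (someLongField is hypothetical):
result = 31 * result + (int) (someLongField ^ (someLongField >>> 32));
{code}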



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3733) registeredTypesWithKryoSerializers is not assigned in ExecutionConfig#deserializeUserCode()

2016-05-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3733:
--
Description: 
{code}
if (serializedRegisteredTypesWithKryoSerializers != null) {
  registeredTypesWithKryoSerializers = 
serializedRegisteredTypesWithKryoSerializers.deserializeValue(userCodeClassLoader);
} else {
  registeredTypesWithKryoSerializerClasses = new LinkedHashMap<>();
}
{code}

When serializedRegisteredTypesWithKryoSerializers is null, 
registeredTypesWithKryoSerializers is not assigned.

  was:
{code}
if (serializedRegisteredTypesWithKryoSerializers != null) {
  registeredTypesWithKryoSerializers = 
serializedRegisteredTypesWithKryoSerializers.deserializeValue(userCodeClassLoader);
} else {
  registeredTypesWithKryoSerializerClasses = new LinkedHashMap<>();
}
{code}
When serializedRegisteredTypesWithKryoSerializers is null, 
registeredTypesWithKryoSerializers is not assigned.


> registeredTypesWithKryoSerializers is not assigned in 
> ExecutionConfig#deserializeUserCode()
> ---
>
> Key: FLINK-3733
> URL: https://issues.apache.org/jira/browse/FLINK-3733
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> if (serializedRegisteredTypesWithKryoSerializers != null) {
>   registeredTypesWithKryoSerializers = 
> serializedRegisteredTypesWithKryoSerializers.deserializeValue(userCodeClassLoader);
> } else {
>   registeredTypesWithKryoSerializerClasses = new LinkedHashMap<>();
> }
> {code}
> When serializedRegisteredTypesWithKryoSerializers is null, 
> registeredTypesWithKryoSerializers is not assigned.
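
A sketch of the straightforward fix, assigning the field the else-branch was presumably meant to initialize:

{code}
if (serializedRegisteredTypesWithKryoSerializers != null) {
    registeredTypesWithKryoSerializers =
        serializedRegisteredTypesWithKryoSerializers.deserializeValue(userCodeClassLoader);
} else {
    registeredTypesWithKryoSerializers = new LinkedHashMap<>();
}
{code}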



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1984) Integrate Flink with Apache Mesos

2016-05-02 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-1984:

Assignee: Eron Wright 

> Integrate Flink with Apache Mesos
> -
>
> Key: FLINK-1984
> URL: https://issues.apache.org/jira/browse/FLINK-1984
> Project: Flink
>  Issue Type: New Feature
>  Components: New Components
>Reporter: Robert Metzger
>Assignee: Eron Wright 
>Priority: Minor
> Attachments: 251.patch
>
>
> There are some users asking for an integration of Flink into Mesos.
> There also is a pending pull request for adding Mesos support for Flink: 
> https://github.com/apache/flink/pull/251
> But the PR is insufficiently tested. I'll add the code of the pull request to 
> this JIRA in case somebody wants to pick it up in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3734) Unclosed DataInputView in AbstractAlignedProcessingTimeWindowOperator#restoreState()

2016-05-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3734:
--
Description: 
{code}
DataInputView in = inputState.getState(getUserCodeClassloader());

final long nextEvaluationTime = in.readLong();
final long nextSlideTime = in.readLong();

AbstractKeyedTimePanes<IN, KEY, STATE, OUT> panes = 
createPanes(keySelector, function);
panes.readFromInput(in, keySerializer, stateTypeSerializer);

restoredState = new RestoredState<>(panes, nextEvaluationTime, 
nextSlideTime);
  }
{code}

DataInputView in is not closed upon return.

  was:
{code}
DataInputView in = inputState.getState(getUserCodeClassloader());

final long nextEvaluationTime = in.readLong();
final long nextSlideTime = in.readLong();

AbstractKeyedTimePanes<IN, KEY, STATE, OUT> panes = 
createPanes(keySelector, function);
panes.readFromInput(in, keySerializer, stateTypeSerializer);

restoredState = new RestoredState<>(panes, nextEvaluationTime, 
nextSlideTime);
  }
{code}
DataInputView in is not closed upon return.


> Unclosed DataInputView in 
> AbstractAlignedProcessingTimeWindowOperator#restoreState()
> 
>
> Key: FLINK-3734
> URL: https://issues.apache.org/jira/browse/FLINK-3734
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> DataInputView in = inputState.getState(getUserCodeClassloader());
> final long nextEvaluationTime = in.readLong();
> final long nextSlideTime = in.readLong();
> AbstractKeyedTimePanes panes = 
> createPanes(keySelector, function);
> panes.readFromInput(in, keySerializer, stateTypeSerializer);
> restoredState = new RestoredState<>(panes, nextEvaluationTime, 
> nextSlideTime);
>   }
> {code}
> DataInputView in is not closed upon return.
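For reference, the standard remedy for this pattern is try-with-resources; a self-contained sketch with a plain `DataInputStream` standing in for Flink's view type (an assumption — the actual fix may need a closeable wrapper around the view):

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class RestoreSketch {
  // reads the two timestamps the operator restores; the stream is closed on
  // every exit path, including when readLong() throws
  static long[] readTimes(byte[] snapshot) throws IOException {
    try (DataInputStream in =
             new DataInputStream(new ByteArrayInputStream(snapshot))) {
      final long nextEvaluationTime = in.readLong();
      final long nextSlideTime = in.readLong();
      return new long[] { nextEvaluationTime, nextSlideTime };
    }
  }
}
{code}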





[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-05-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266863#comment-15266863
 ] 

Josep Rubió commented on FLINK-1707:


Hi Vasia,

I could not work on it last weekend, and I don't think I'm going to make any 
progress this week either.
I'll keep you updated.

Thanks

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing





[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-02 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1916#issuecomment-216270686
  
Great, thanks!




[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266825#comment-15266825
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1916#issuecomment-216270686
  
Great, thanks!


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike SQL string execution, which has a separate validation phase before 
> RelNode construction, the Table API lacks a counterpart and its validation is 
> scattered across many places.
> I suggest adding a single validation phase to detect problems as early as 
> possible.
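As an illustration of the proposal, a toy sketch of eager validation (all names hypothetical, not the actual Table API internals): field references are checked when an operator is built, instead of failing later during RelNode construction.

{code}
import java.util.Arrays;
import java.util.List;

public class ValidationSketch {
  private static final List<String> SCHEMA =
      Arrays.asList("user", "product", "amount");

  // hypothetical eager check: reject unknown fields at call time
  static void select(String... fields) {
    for (String f : fields) {
      if (!SCHEMA.contains(f)) {
        throw new IllegalArgumentException("Unknown field: " + f);
      }
    }
  }

  public static void main(String[] args) {
    select("user", "amount"); // passes
    select("usr");            // fails immediately with a clear message
  }
}
{code}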





[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266821#comment-15266821
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1955#issuecomment-216270394
  
Thanks for the quick update. A few minor suggestions, otherwise good to 
merge.


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.





[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1955#issuecomment-216270394
  
Thanks for the quick update. A few minor suggestions, otherwise good to 
merge.




[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266816#comment-15266816
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61755610
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

Rename if we move the access section to the corresponding sections.


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.





[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61755610
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
--- End diff --

Rename if we move the access section to the corresponding sections.




[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266812#comment-15266812
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61755531
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
+
+
+`TableEnvironment`s have an internal table catalog to which tables can be 
registered with a unique name. After registration, a table can be accessed from 
the `TableEnvironment` by its name. Tables can be registered in different ways.
+
+*Note that it is not required to register a `DataSet` or `DataStream` as a 
table in a `TableEnvironment` in order to process it with the Table API.* 
+
+### Register a DataSet
+
+A `DataSet` is registered as a `Table` in a `BatchTableEnvironment` as 
follows:
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
+
+// register the DataSet cust as table "Customers" with fields derived from 
the dataset
+tableEnv.registerDataSet("Customers", cust)
+
+// register the DataSet ord as table "Orders" with fields user, product, 
and amount
+tableEnv.registerDataSet("Orders", ord, "user, product, amount");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+// register the DataSet cust as table "Customers" with fields derived from 
the dataset
+tableEnv.registerDataSet("Customers", cust)
+
+// register the DataSet ord as table "Orders" with fields user, product, 
and amount
+tableEnv.registerDataSet("Orders", ord, 'user, 'product, 'amount)
+{% endhighlight %}
+
+
+
+*Note: DataSet table names are not allowed to follow the 
`^_DataSetTable_[0-9]+` pattern, as these are reserved for internal use only.*
+
+### Register a DataStream
+
+A `DataStream` is registered as a `Table` in a `StreamTableEnvironment` as 
follows:
+
+
+
+{% highlight java %}
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
+StreamTableEnvironment tableEnv = 
TableEnvironment.getTableEnvironment(env);
+
+// register the DataStream cust as table "Customers" with fields derived 
from the datastream
+tableEnv.registerDataStream("Customers", cust);
+
+// register the DataStream ord as table "Orders" with fields user, 
product, and amount
+tableEnv.registerDataStream("Orders", ord, "user, product, amount");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+// register the DataStream cust as table "Customers" with fields derived 
from the datastream
+tableEnv.registerDataStream("Customers", cust)
+
+// register the DataStream ord as table "Orders" with fields user, 
product, and amount
+tableEnv.registerDataStream("Orders", ord, 'user, 'product, 'amount)
+{% endhighlight %}
+
+
+
+*Note: DataStream table names are not allowed to follow the 
`^_DataStreamTable_[0-9]+` pattern, as these are reserved for internal use 
only.*
+
+### Register a Table
+
+A `Table` that originates from a Table API operation or a SQL query is 
registered in a `TableEnvironment` as follows:
+
+
+
+{% highlight java %}
+// works for StreamExecutionEnvironment identically
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
+
+// convert a DataSet into a Table
+Table custT = tableEnv
+  .toTable(custDs, "name, zipcode")
+  .where("zipcode = '12345'")
+  .select("name")
+
+// register the Table custT as table "custNames"
+tableEnv.registerTable("custNames", custT)
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+// works for StreamExecutionEnvironment identically
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+// convert a DataSet into a Table

[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61755531
  
--- Diff: docs/apis/table.md ---
@@ -57,6 +52,170 @@ The following dependency must be added to your project 
in order to use the Table
 
 Note that the Table API is currently not part of the binary distribution. 
See linking with it for cluster execution [here]({{ site.baseurl 
}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
 
+
+Registering and Accessing Tables
+
+
+`TableEnvironment`s have an internal table catalog to which tables can be 
registered with a unique name. After registration, a table can be accessed from 
the `TableEnvironment` by its name. Tables can be registered in different ways.
+
+*Note that it is not required to register a `DataSet` or `DataStream` as a 
table in a `TableEnvironment` in order to process it with the Table API.* 
+
+### Register a DataSet
+
+A `DataSet` is registered as a `Table` in a `BatchTableEnvironment` as 
follows:
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
+
+// register the DataSet cust as table "Customers" with fields derived from 
the dataset
+tableEnv.registerDataSet("Customers", cust)
+
+// register the DataSet ord as table "Orders" with fields user, product, 
and amount
+tableEnv.registerDataSet("Orders", ord, "user, product, amount");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+// register the DataSet cust as table "Customers" with fields derived from 
the dataset
+tableEnv.registerDataSet("Customers", cust)
+
+// register the DataSet ord as table "Orders" with fields user, product, 
and amount
+tableEnv.registerDataSet("Orders", ord, 'user, 'product, 'amount)
+{% endhighlight %}
+
+
+
+*Note: DataSet table names are not allowed to follow the 
`^_DataSetTable_[0-9]+` pattern, as these are reserved for internal use only.*
+
+### Register a DataStream
+
+A `DataStream` is registered as a `Table` in a `StreamTableEnvironment` as 
follows:
+
+
+
+{% highlight java %}
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
+StreamTableEnvironment tableEnv = 
TableEnvironment.getTableEnvironment(env);
+
+// register the DataStream cust as table "Customers" with fields derived 
from the datastream
+tableEnv.registerDataStream("Customers", cust);
+
+// register the DataStream ord as table "Orders" with fields user, 
product, and amount
+tableEnv.registerDataStream("Orders", ord, "user, product, amount");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+// register the DataStream cust as table "Customers" with fields derived 
from the datastream
+tableEnv.registerDataStream("Customers", cust)
+
+// register the DataStream ord as table "Orders" with fields user, 
product, and amount
+tableEnv.registerDataStream("Orders", ord, 'user, 'product, 'amount)
+{% endhighlight %}
+
+
+
+*Note: DataStream table names are not allowed to follow the 
`^_DataStreamTable_[0-9]+` pattern, as these are reserved for internal use 
only.*
+
+### Register a Table
+
+A `Table` that originates from a Table API operation or a SQL query is 
registered in a `TableEnvironment` as follows:
+
+
+
+{% highlight java %}
+// works for StreamExecutionEnvironment identically
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
+
+// convert a DataSet into a Table
+Table custT = tableEnv
+  .toTable(custDs, "name, zipcode")
+  .where("zipcode = '12345'")
+  .select("name")
+
+// register the Table custT as table "custNames"
+tableEnv.registerTable("custNames", custT)
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+// works for StreamExecutionEnvironment identically
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+// convert a DataSet into a Table
+val custT = custDs
+  .toTable(tableEnv, 'name, 'zipcode)
+  .where('zipcode === "12345")
+  .select('name)
+
+// register the Table custT as table "custNames"
+tableEnv.registerTable("custNames", custT)
+{% 

[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266803#comment-15266803
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61755204
  
--- Diff: docs/apis/table.md ---
@@ -33,10 +28,10 @@ under the License.
 
 **Table API and SQL are experimental features**
 
-The Table API is a SQL-like expression language that can be embedded in 
Flink's DataSet and DataStream APIs (Java and Scala).
-A `DataSet` or `DataStream` can be converted into a relational `Table` 
abstraction. You can apply relational operators such as selection, aggregation, 
and joins on `Table`s or query them with regular SQL queries.
-When a `Table` is converted back into a `DataSet` or `DataStream`, the 
logical plan, which was defined by relational operators and SQL queries, is 
optimized using [Apache Calcite](https://calcite.apache.org/)
-and transformed into a `DataSet` or `DataStream` execution plan.
+The Table API is a SQL-like expression language for relational stream and 
batch processing that can be easily embedded in Flink's DataSet and DataStream 
APIs (Java and Scala).
+The Table API operates on a relational `Table` abstraction, which can be 
created from external data sources, or existing DataSets and DataStreams. With 
the Table API, you can apply relational operators such as selection, 
aggregation, and joins on `Table`s.
--- End diff --

"The Table API *and SQL interface* operates..."


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.





[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-02 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1916#issuecomment-216269369
  
Will start to work on eager validation now. :)




[GitHub] flink pull request: [FLINK-3793][docs] re-organize table API and S...

2016-05-02 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1955#discussion_r61755204
  
--- Diff: docs/apis/table.md ---
@@ -33,10 +28,10 @@ under the License.
 
 **Table API and SQL are experimental features**
 
-The Table API is a SQL-like expression language that can be embedded in 
Flink's DataSet and DataStream APIs (Java and Scala).
-A `DataSet` or `DataStream` can be converted into a relational `Table` 
abstraction. You can apply relational operators such as selection, aggregation, 
and joins on `Table`s or query them with regular SQL queries.
-When a `Table` is converted back into a `DataSet` or `DataStream`, the 
logical plan, which was defined by relational operators and SQL queries, is 
optimized using [Apache Calcite](https://calcite.apache.org/)
-and transformed into a `DataSet` or `DataStream` execution plan.
+The Table API is a SQL-like expression language for relational stream and 
batch processing that can be easily embedded in Flink's DataSet and DataStream 
APIs (Java and Scala).
+The Table API operates on a relational `Table` abstraction, which can be 
created from external data sources, or existing DataSets and DataStreams. With 
the Table API, you can apply relational operators such as selection, 
aggregation, and joins on `Table`s.
--- End diff --

"The Table API *and SQL interface* operates..."




[jira] [Resolved] (FLINK-1991) Return Table as DataSet

2016-05-02 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-1991.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Has been implemented as part of FLINK-3226.

> Return Table as DataSet
> --
>
> Key: FLINK-1991
> URL: https://issues.apache.org/jira/browse/FLINK-1991
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> With a {{TableEnvironment}} a {{Table}} can be converted into a Pojo or Row 
> data set. Both types have their issues. A Pojo needs to be explicitly defined 
> and a Row cannot be easily processed by a user-function.
> It would be good to have a method to convert a {{Table}} into a 
> {{DataSet}}.
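For reference, a hedged sketch of how this looks after FLINK-3226 (method names recalled from the 1.1-era API, not quoted from the patch):

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

DataSet<Tuple2<String, Integer>> input = env.fromElements(Tuple2.of("a", 1));

// convert a DataSet to a Table and back; Row avoids defining a dedicated Pojo
Table t = tableEnv.fromDataSet(input);
DataSet<Row> rows = tableEnv.toDataSet(t, Row.class);
{code}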





[jira] [Created] (FLINK-3859) Add BigDecimal/BigInteger support to Table API

2016-05-02 Thread Timo Walther (JIRA)
Timo Walther created FLINK-3859:
---

 Summary: Add BigDecimal/BigInteger support to Table API
 Key: FLINK-3859
 URL: https://issues.apache.org/jira/browse/FLINK-3859
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Reporter: Timo Walther
Assignee: Timo Walther


Since FLINK-3786 has been solved, we can now start integrating 
BigDecimal/BigInteger into the Table API.





[GitHub] flink pull request: [FLINK-3581] [FLINK-3582] State Iterator and A...

2016-05-02 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1957#issuecomment-216257051
  
Hi,
So just a quick question regarding the namespace dropping in RocksDB. I 
thought you said it would be possible to do this by using prefixes. Are 
there any limitations to this approach?




[GitHub] flink pull request: [FLINK-3581] [FLINK-3582] State Iterator and A...

2016-05-02 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1957#issuecomment-216261831
  
I had a version that did this. The problem is that you don't know when you 
can drop column families: to know when a column family is empty, you would 
have to keep an in-memory Set of keys for which you have state, which 
somewhat defeats the purpose of the RocksDB backend.
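For context, the prefix idea discussed here folds the namespace into the key bytes so that all entries of one namespace share a common, scannable prefix; a minimal sketch of such a composite key (plain byte handling — the actual backend layout is an assumption):

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompositeKeySketch {
  // composite key = [namespace length][namespace bytes][user key bytes];
  // a range scan over the namespace prefix then covers exactly one namespace,
  // and dropping it becomes a prefix delete instead of a column-family drop
  static byte[] compositeKey(String namespace, String key) {
    byte[] ns = namespace.getBytes(StandardCharsets.UTF_8);
    byte[] k = key.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(4 + ns.length + k.length)
        .putInt(ns.length)
        .put(ns)
        .put(k)
        .array();
  }
}
{code}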




[jira] [Commented] (FLINK-3581) Add Special Aligned Event-Time WindowOperator

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266754#comment-15266754
 ] 

ASF GitHub Bot commented on FLINK-3581:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1957#issuecomment-216261831
  
I had a version that did this. The problem is that you don't know when you 
can drop column families: to know when a column family is empty, you would 
have to keep an in-memory Set of keys for which you have state, which 
somewhat defeats the purpose of the RocksDB backend.


> Add Special Aligned Event-Time WindowOperator
> -
>
> Key: FLINK-3581
> URL: https://issues.apache.org/jira/browse/FLINK-3581
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The current window Trigger is per key, meaning every window has a (logical) 
> Trigger for every key in the window, i.e. there will be state and time 
> triggers per key per window.
> For some types of windows, e.g. those based on time, it is possible to use a 
> single Trigger that fires for all keys at the same time. In that case we would 
> save a lot of space on state and timers, which makes state snapshots a lot smaller.
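A toy sketch of the saving described above (names hypothetical, not the actual operator code): the window contents are stored either way, but only one timer per window end is registered instead of one per (key, window) pair.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class AlignedTimerSketch {
  // window contents: windowEnd -> key -> accumulated value
  private final Map<Long, Map<Long, Long>> panes = new HashMap<>();
  // ONE pending timer per window end, regardless of how many keys it holds
  private final TreeSet<Long> pendingWindowEnds = new TreeSet<>();

  void add(long key, long windowEnd, long value) {
    panes.computeIfAbsent(windowEnd, w -> new HashMap<>())
         .merge(key, value, Long::sum);
    pendingWindowEnds.add(windowEnd);
  }

  // a single firing per window end evaluates every key at once
  void onTime(long time) {
    while (!pendingWindowEnds.isEmpty() && pendingWindowEnds.first() <= time) {
      long end = pendingWindowEnds.pollFirst();
      panes.remove(end).forEach((key, sum) ->
          System.out.println("window " + end + ", key " + key + " -> " + sum));
    }
  }
}
{code}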





[jira] [Resolved] (FLINK-3837) Create FLOOR/CEIL function

2016-05-02 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-3837.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in a0f5397.

> Create FLOOR/CEIL function
> --
>
> Key: FLINK-3837
> URL: https://issues.apache.org/jira/browse/FLINK-3837
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Minor
> Fix For: 1.1.0
>
>
> Create the FLOOR/CEIL function for Table API and SQL. They will later be 
> extended in FLINK-3580 to support date and time.
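A hedged usage sketch once this is merged (the setup mirrors the docs examples above; `tableEnv.sql(...)` as the 1.1-era SQL entry point is an assumption):

{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

DataSet<Tuple2<String, Double>> ord = env.fromElements(
    Tuple2.of("beer", 2.4), Tuple2.of("bread", 3.6));
tableEnv.registerDataSet("Orders", ord, "product, amount");

// FLOOR/CEIL through the SQL interface, per this issue
Table result = tableEnv.sql(
    "SELECT product, FLOOR(amount) AS amtFloor, CEIL(amount) AS amtCeil FROM Orders");
{code}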





[jira] [Commented] (FLINK-3793) Re-organize the Table API and SQL docs

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266743#comment-15266743
 ] 

ASF GitHub Bot commented on FLINK-3793:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1955#issuecomment-216260839
  
Looks any better now? :)


> Re-organize the Table API and SQL docs
> --
>
> Key: FLINK-3793
> URL: https://issues.apache.org/jira/browse/FLINK-3793
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Now that we have added SQL and soon streaming SQL support, we need to 
> reorganize the Table API documentation. 
> - The current guide is under "apis/batch/libs". We should either split it 
> into a streaming and a batch part or move it under "apis". The second 
> option might be preferable, as the batch and stream APIs have a lot in common.
> - The current guide has separate sections for Java and Scala APIs. These can 
> be merged and organized with tabs, like other parts of the docs.
> - Mentions of "Table API" can be renamed to "Table API and SQL", e.g. in the 
> software stack figure and homepage.





[GitHub] flink pull request: [FLINK-3581] [FLINK-3582] State Iterator and A...

2016-05-02 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1957#issuecomment-216259593
  
The other possibility would be to store them in different column families. 
Not sure about the performance there, though.




[jira] [Commented] (FLINK-3837) Create FLOOR/CEIL function

2016-05-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266735#comment-15266735
 ] 

ASF GitHub Bot commented on FLINK-3837:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1943


> Create FLOOR/CEIL function
> --
>
> Key: FLINK-3837
> URL: https://issues.apache.org/jira/browse/FLINK-3837
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Minor
>
> Create the FLOOR/CEIL function for Table API and SQL. They will later be 
> extended in FLINK-3580 to support date and time.





[GitHub] flink pull request: [FLINK-3837] [table] Create FLOOR/CEIL functio...

2016-05-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1943



