[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11268


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-20 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11268#issuecomment-186676145
  
Thanks - merging in master.






[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11268#issuecomment-186675185
  
**[Test build #2552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2552/consoleFull)** for PR 11268 at commit [`774f0fc`](https://github.com/apache/spark/commit/774f0fcc9bcb2d5c468a7d56c4144453d31258a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11268#issuecomment-186670441
  
**[Test build #2552 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2552/consoleFull)** for PR 11268 at commit [`774f0fc`](https://github.com/apache/spark/commit/774f0fcc9bcb2d5c468a7d56c4144453d31258a5).





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-20 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53549798
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
@@ -29,13 +29,14 @@ object ConnectedComponents {
*
* @tparam VD the vertex attribute type (discarded in the computation)
* @tparam ED the edge attribute type (preserved in the computation)
-   *
* @param graph the graph for which to compute the connected components
-   *
+   * @param maxIterations the maximum number of iterations to run for
* @return a graph with vertex attributes containing the smallest vertex in each
* connected component
*/
-  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
+  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
+  maxIterations: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

OK, I have added another run API without the maxIterations option.





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53544911
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
@@ -29,13 +29,14 @@ object ConnectedComponents {
*
* @tparam VD the vertex attribute type (discarded in the computation)
* @tparam ED the edge attribute type (preserved in the computation)
-   *
* @param graph the graph for which to compute the connected components
-   *
+   * @param maxIterations the maximum number of iterations to run for
* @return a graph with vertex attributes containing the smallest vertex in each
* connected component
*/
-  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
+  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
+  maxIterations: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

For this one, we should also just overload it so we don't break binary compatibility.
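
To illustrate the overloading approach requested here, the following is a minimal, Spark-free sketch, not the code from this pull request. `ConnectedComponentsSketch`, its edge-list input, and the label-propagation loop are hypothetical stand-ins for the GraphX API; only the idea of keeping the original one-argument `run` and adding a separate overload that takes `maxIterations` comes from the review. Because the one-argument method keeps its original signature, callers compiled against it should continue to link, whereas replacing it with a single method that has a default argument changes the compiled signature.

```scala
// Hypothetical stand-in for the GraphX object; it works on a plain edge list.
object ConnectedComponentsSketch {
  type VertexId = Long

  // New overload: run at most maxIterations rounds of min-label propagation.
  def run(edges: Seq[(VertexId, VertexId)], maxIterations: Int): Map[VertexId, VertexId] = {
    // Every vertex starts labelled with its own id.
    var labels: Map[VertexId, VertexId] =
      edges.flatMap { case (a, b) => Seq(a, b) }.distinct.map(v => v -> v).toMap
    var changed = true
    var i = 0
    while (changed && i < maxIterations) {
      // One synchronous round: both endpoints of an edge adopt the smaller
      // of the two labels they held at the start of the round.
      val next = edges.foldLeft(labels) { case (acc, (a, b)) =>
        val m = math.min(labels(a), labels(b))
        acc.updated(a, math.min(acc(a), m)).updated(b, math.min(acc(b), m))
      }
      changed = next != labels
      labels = next
      i += 1
    }
    labels
  }

  // Original signature, kept as its own method and delegating to the new one.
  def run(edges: Seq[(VertexId, VertexId)]): Map[VertexId, VertexId] =
    run(edges, Int.MaxValue)
}
```

With the edges (1,2), (2,3) and (4,5), the unbounded call labels vertices 1 to 3 with 1 and vertices 4 and 5 with 4. Capping the run at one iteration leaves vertex 3 labelled 2, because the smallest id has only travelled one hop.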





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53544885
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
@@ -29,13 +29,14 @@ object ConnectedComponents {
*
* @tparam VD the vertex attribute type (discarded in the computation)
* @tparam ED the edge attribute type (preserved in the computation)
-   *
* @param graph the graph for which to compute the connected components
-   *
+   * @param numIter the maximum number of iterations to run for
* @return a graph with vertex attributes containing the smallest vertex in each
* connected component
*/
-  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
+  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
+  numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

OK, all of those options have been renamed.





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53544614
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
@@ -29,13 +29,14 @@ object ConnectedComponents {
*
* @tparam VD the vertex attribute type (discarded in the computation)
* @tparam ED the edge attribute type (preserved in the computation)
-   *
* @param graph the graph for which to compute the connected components
-   *
+   * @param numIter the maximum number of iterations to run for
* @return a graph with vertex attributes containing the smallest vertex in each
* connected component
*/
-  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
+  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
+  numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

and call it maxIterations





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53544613
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
@@ -29,13 +29,14 @@ object ConnectedComponents {
*
* @tparam VD the vertex attribute type (discarded in the computation)
* @tparam ED the edge attribute type (preserved in the computation)
-   *
* @param graph the graph for which to compute the connected components
-   *
+   * @param numIter the maximum number of iterations to run for
* @return a graph with vertex attributes containing the smallest vertex in each
* connected component
*/
-  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
+  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
+  numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

We should do the same thing here too, to avoid breaking APIs.






[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53544610
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
@@ -406,13 +406,23 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
   }
 
   /**
+* Compute the connected component membership of each vertex and return a graph with the vertex
+* value containing the lowest vertex id in the connected component containing that vertex.
+*
+* @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
+*/
+  def connectedComponents(): Graph[VertexId, ED] = {
+ConnectedComponents.run(graph, Int.MaxValue)
+  }
+
+  /**
* Compute the connected component membership of each vertex and return a graph with the vertex
* value containing the lowest vertex id in the connected component containing that vertex.
*
* @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
*/
-  def connectedComponents(): Graph[VertexId, ED] = {
-ConnectedComponents.run(graph)
+  def connectedComponents(numIter: Int): Graph[VertexId, ED] = {
--- End diff --

maybe name this maxIterations to be consistent with Pregel's parameter.






[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53542779
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
@@ -411,8 +411,8 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
*
* @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
*/
-  def connectedComponents(): Graph[VertexId, ED] = {
-ConnectedComponents.run(graph)
+  def connectedComponents(numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

You are right, I have fixed it. There is now another ConnectedComponents API that takes numIter.





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11268#discussion_r53530296
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
@@ -411,8 +411,8 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
*
* @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
*/
-  def connectedComponents(): Graph[VertexId, ED] = {
-ConnectedComponents.run(graph)
+  def connectedComponents(numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
--- End diff --

we should just add a new method to this.






[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11268#issuecomment-186142745
  
Although that's probably fine, it does change a public API. I'd prefer a 
maintainer look at it, but I'm not sure who is still available.





[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...

2016-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11268#issuecomment-186134977
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13386][Graphx] ConnectedComponents shou...

2016-02-19 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/11268

[SPARK-13386][Graphx] ConnectedComponents should support maxIteration option

## What changes were proposed in this pull request?

Add a maxIteration option to the ConnectedComponents algorithm.


## How was this patch tested?

unit tests passed




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark ccwithmax

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11268


commit ad428d564ae6b831f0dc44df4582f10c3c11b5e4
Author: Zheng RuiFeng 
Date:   2016-02-19T08:12:53Z

add numIter

commit 918e7812db5efa22329c0c40b99c234c2ebc7c3b
Author: Zheng RuiFeng 
Date:   2016-02-19T09:24:08Z

fix example



