[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11268 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11268#issuecomment-186676145 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11268#issuecomment-186675185

    **[Test build #2552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2552/consoleFull)** for PR 11268 at commit [`774f0fc`](https://github.com/apache/spark/commit/774f0fcc9bcb2d5c468a7d56c4144453d31258a5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11268#issuecomment-186670441 **[Test build #2552 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2552/consoleFull)** for PR 11268 at commit [`774f0fc`](https://github.com/apache/spark/commit/774f0fcc9bcb2d5c468a7d56c4144453d31258a5).
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53549798

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
    @@ -29,13 +29,14 @@ object ConnectedComponents {
        *
        * @tparam VD the vertex attribute type (discarded in the computation)
        * @tparam ED the edge attribute type (preserved in the computation)
    -   *
        * @param graph the graph for which to compute the connected components
    -   *
    +   * @param maxIterations the maximum number of iterations to run for
        * @return a graph with vertex attributes containing the smallest vertex in each
        *         connected component
        */
    -  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
    +  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
    +      maxIterations: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    OK, I added another run API without the maxIterations option.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53544911

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
    @@ -29,13 +29,14 @@ object ConnectedComponents {
        *
        * @tparam VD the vertex attribute type (discarded in the computation)
        * @tparam ED the edge attribute type (preserved in the computation)
    -   *
        * @param graph the graph for which to compute the connected components
    -   *
    +   * @param maxIterations the maximum number of iterations to run for
        * @return a graph with vertex attributes containing the smallest vertex in each
        *         connected component
        */
    -  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
    +  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
    +      maxIterations: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    for this one, we should also just overload it so we don't break binary compatibility.
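
The overload requested in this comment can be sketched without Spark. The snippet below is a minimal, hypothetical illustration and not the GraphX implementation: `ToyConnectedComponents`, its `Set`/`Seq` graph representation, and the synchronous min-label loop are all assumptions made for the example. It keeps the original argument list as its own method and adds a second `run` that takes `maxIterations`, so code compiled against the old signature still finds it.

```scala
// Hypothetical stand-in for GraphX: vertices are Long ids, edges are undirected pairs.
object ToyConnectedComponents {
  type VertexId = Long

  /** Original entry point. Its signature is unchanged; it delegates with no cap. */
  def run(vertices: Set[VertexId],
          edges: Seq[(VertexId, VertexId)]): Map[VertexId, VertexId] =
    run(vertices, edges, Int.MaxValue)

  /** New overload: stop after at most `maxIterations` rounds of label propagation. */
  def run(vertices: Set[VertexId],
          edges: Seq[(VertexId, VertexId)],
          maxIterations: Int): Map[VertexId, VertexId] = {
    require(maxIterations > 0, s"maxIterations must be positive, got $maxIterations")

    // Every vertex starts labelled with its own id.
    var labels: Map[VertexId, VertexId] = vertices.map(v => v -> v).toMap
    var changed = true
    var round = 0

    while (changed && round < maxIterations) {
      // Synchronous round: read labels from the previous round only,
      // and keep the smallest label seen at each endpoint.
      val next = edges.foldLeft(labels) { case (acc, (a, b)) =>
        val m = math.min(labels(a), labels(b))
        acc.updated(a, math.min(acc(a), m)).updated(b, math.min(acc(b), m))
      }
      changed = next != labels
      labels = next
      round += 1
    }
    labels
  }
}
```

With the uncapped overload every vertex ends up labelled with the smallest id in its component; with a small cap, labels have only travelled that many hops, so distant vertices may still carry a larger id.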
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53544885

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
    @@ -29,13 +29,14 @@ object ConnectedComponents {
        *
        * @tparam VD the vertex attribute type (discarded in the computation)
        * @tparam ED the edge attribute type (preserved in the computation)
    -   *
        * @param graph the graph for which to compute the connected components
    -   *
    +   * @param numIter the maximum number of iterations to run for
        * @return a graph with vertex attributes containing the smallest vertex in each
        *         connected component
        */
    -  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
    +  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
    +      numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    ok, all those options are renamed
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53544614

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
    @@ -29,13 +29,14 @@ object ConnectedComponents {
        *
        * @tparam VD the vertex attribute type (discarded in the computation)
        * @tparam ED the edge attribute type (preserved in the computation)
    -   *
        * @param graph the graph for which to compute the connected components
    -   *
    +   * @param numIter the maximum number of iterations to run for
        * @return a graph with vertex attributes containing the smallest vertex in each
        *         connected component
        */
    -  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
    +  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
    +      numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    and call it maxIterations
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53544613

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/ConnectedComponents.scala ---
    @@ -29,13 +29,14 @@ object ConnectedComponents {
        *
        * @tparam VD the vertex attribute type (discarded in the computation)
        * @tparam ED the edge attribute type (preserved in the computation)
    -   *
        * @param graph the graph for which to compute the connected components
    -   *
    +   * @param numIter the maximum number of iterations to run for
        * @return a graph with vertex attributes containing the smallest vertex in each
        *         connected component
        */
    -  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VertexId, ED] = {
    +  def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED],
    +      numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    we should do the same thing here too to avoid breaking apis.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53544610

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
    @@ -406,13 +406,23 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
       }

       /**
    +   * Compute the connected component membership of each vertex and return a graph with the vertex
    +   * value containing the lowest vertex id in the connected component containing that vertex.
    +   *
    +   * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
    +   */
    +  def connectedComponents(): Graph[VertexId, ED] = {
    +    ConnectedComponents.run(graph, Int.MaxValue)
    +  }
    +
    +  /**
        * Compute the connected component membership of each vertex and return a graph with the vertex
        * value containing the lowest vertex id in the connected component containing that vertex.
        *
        * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
        */
    -  def connectedComponents(): Graph[VertexId, ED] = {
    -    ConnectedComponents.run(graph)
    +  def connectedComponents(numIter: Int): Graph[VertexId, ED] = {
    --- End diff --

    maybe name this maxIterations to be consistent with Pregel's parameter.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53542779

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
    @@ -411,8 +411,8 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
        *
        * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
        */
    -  def connectedComponents(): Graph[VertexId, ED] = {
    -    ConnectedComponents.run(graph)
    +  def connectedComponents(numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    You are right, I have fixed it. There is now another connectedComponents API that takes numIter.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11268#discussion_r53530296

    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
    @@ -411,8 +411,8 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
        *
        * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
        */
    -  def connectedComponents(): Graph[VertexId, ED] = {
    -    ConnectedComponents.run(graph)
    +  def connectedComponents(numIter: Int = Int.MaxValue): Graph[VertexId, ED] = {
    --- End diff --

    we should just add a new method to this.
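
Why a default argument is not a drop-in replacement for the zero-argument method can be checked with reflection. The sketch below is an illustration under stated assumptions: `WithDefault`, `Overloaded`, and `SignatureDemo` are hypothetical names, and a `String` stands in for the graph. With a default argument the Scala compiler emits only the two-parameter method plus a synthetic getter for the default value, so the old one-parameter signature that already-compiled callers link against is gone. With overloads, both signatures exist.

```scala
// Hypothetical stand-ins; neither object is part of Spark.
object WithDefault {
  // Single method with a default value, as in the first revision of the patch.
  def run(graph: String, maxIterations: Int = Int.MaxValue): Int = maxIterations
}

object Overloaded {
  // Old signature kept as a real method that delegates to the new one.
  def run(graph: String): Int = run(graph, Int.MaxValue)

  // New signature added alongside it.
  def run(graph: String, maxIterations: Int): Int = maxIterations
}

object SignatureDemo {
  /** Parameter counts of every public method named `run` on the given object. */
  def arities(o: AnyRef): Set[Int] =
    o.getClass.getMethods
      .filter(_.getName == "run")
      .map(_.getParameterCount)
      .toSet

  /** True if the compiler emitted the synthetic getter for the second parameter's default. */
  def hasDefaultGetter(o: AnyRef): Boolean =
    o.getClass.getMethods.exists(_.getName == "run$default$2")
}
```

Source code that calls `WithDefault.run("g")` still compiles, because the compiler rewrites the call to pass the default. The difference only shows up for callers that were compiled earlier and are not rebuilt, which is the binary-compatibility concern raised in the review.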
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11268#issuecomment-186142745 Although that's probably fine, it does change a public API. I'd prefer a maintainer look at it, but I'm not sure who is still available.
[GitHub] spark pull request: [SPARK-13386][GraphX] ConnectedComponents shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11268#issuecomment-186134977 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13386][Graphx] ConnectedComponents shou...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/11268

    [SPARK-13386][Graphx] ConnectedComponents should support maxIteration option

    ## What changes were proposed in this pull request?

    Add a maxIteration option to the ConnectedComponents algorithm.

    ## How was this patch tested?

    Unit tests passed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark ccwithmax

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11268

----
commit ad428d564ae6b831f0dc44df4582f10c3c11b5e4
Author: Zheng RuiFeng
Date:   2016-02-19T08:12:53Z

    add numIter

commit 918e7812db5efa22329c0c40b99c234c2ebc7c3b
Author: Zheng RuiFeng
Date:   2016-02-19T09:24:08Z

    fix example

----