Repository: spark
Updated Branches:
  refs/heads/master bad0f7dbb -> 192d1f9cf

[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document

## What changes were proposed in this pull request?

There are two test data files used by the graphx examples in the directory "graphx/data". I moved them into the "data/" directory because the "graphx" directory is meant for code files, and the other test data files (such as the mllib and streaming test data) are all there. I also updated the graphx document where it references the data files I moved.

## How was this patch tested?

N/A

Author: WeichenXu <weichenxu...@outlook.com>

Closes #14010 from WeichenXu123/move_graphx_data_dir.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/192d1f9c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/192d1f9c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/192d1f9c

Branch: refs/heads/master
Commit: 192d1f9cf3463d050b87422939448f2acf86acc9
Parents: bad0f7d
Author: WeichenXu <weichenxu...@outlook.com>
Authored: Sat Jul 2 08:40:23 2016 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Sat Jul 2 08:40:23 2016 +0100

----------------------------------------------------------------------
 data/graphx/followers.txt        |  8 ++++++++
 data/graphx/users.txt            |  7 +++++++
 docs/graphx-programming-guide.md | 18 +++++++++---------
 graphx/data/followers.txt        |  8 --------
 graphx/data/users.txt            |  7 -------
 5 files changed, 24 insertions(+), 24 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/followers.txt
----------------------------------------------------------------------
diff --git a/data/graphx/followers.txt b/data/graphx/followers.txt
new file mode 100644
index 0000000..7bb8e90
--- /dev/null
+++ b/data/graphx/followers.txt
@@ -0,0 +1,8 @@
+2 1
+4 1
+1 2
+6 3
+7 3
+7 6
+6 7
+3 7

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/users.txt
----------------------------------------------------------------------
diff --git a/data/graphx/users.txt b/data/graphx/users.txt
new file mode 100644
index 0000000..982d19d
--- /dev/null
+++ b/data/graphx/users.txt
@@ -0,0 +1,7 @@
+1,BarackObama,Barack Obama
+2,ladygaga,Goddess of Love
+3,jeresig,John Resig
+4,justinbieber,Justin Bieber
+6,matei_zaharia,Matei Zaharia
+7,odersky,Martin Odersky
+8,anonsys

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 81cf174..e376b66 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a graph, assuming an edge fro
 
 GraphX comes with static and dynamic implementations of PageRank as methods on the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows calling these algorithms directly as methods on `Graph`.
 
-GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows:
+GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of relationships between users is given in `data/graphx/followers.txt`. We compute the PageRank of each user as follows:
 
 {% highlight scala %}
 // Load the edges as a graph
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Run PageRank
 val ranks = graph.pageRank(0.0001).vertices
 // Join the ranks with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1032,11 +1032,11 @@ The connected components algorithm labels each connected component of the graph
 
 {% highlight scala %}
 // Load the graph as in the PageRank example
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Find the connected components
 val cc = graph.connectedComponents().vertices
 // Join the connected components with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1053,11 +1053,11 @@ A vertex is part of a triangle when it has two adjacent vertices with an edge be
 
 {% highlight scala %}
 // Load the edges in canonical order and partition the graph for triangle count
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
 // Find the triangle count for each vertex
 val triCounts = graph.triangleCount().vertices
 // Join the triangle counts with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1081,11 +1081,11 @@ all of this in just a few lines with GraphX:
 val sc = new SparkContext("spark://master.amplab.org", "research")
 
 // Load my user data and parse into tuples of user id and attribute list
-val users = (sc.textFile("graphx/data/users.txt")
+val users = (sc.textFile("data/graphx/users.txt")
   .map(line => line.split(",")).map( parts => (parts.head.toLong, parts.tail) ))
 
 // Parse the edge data which is already in userId -> userId format
-val followerGraph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val followerGraph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 
 // Attach the user attributes
 val graph = followerGraph.outerJoinVertices(users) {

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/graphx/data/followers.txt
----------------------------------------------------------------------
diff --git a/graphx/data/followers.txt b/graphx/data/followers.txt
deleted file mode 100644
index 7bb8e90..0000000
--- a/graphx/data/followers.txt
+++ /dev/null
@@ -1,8 +0,0 @@
-2 1
-4 1
-1 2
-6 3
-7 3
-7 6
-6 7
-3 7

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/graphx/data/users.txt
----------------------------------------------------------------------
diff --git a/graphx/data/users.txt b/graphx/data/users.txt
deleted file mode 100644
index 982d19d..0000000
--- a/graphx/data/users.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-1,BarackObama,Barack Obama
-2,ladygaga,Goddess of Love
-3,jeresig,John Resig
-4,justinbieber,Justin Bieber
-6,matei_zaharia,Matei Zaharia
-7,odersky,Martin Odersky
-8,anonsys