[ https://issues.apache.org/jira/browse/SPARK-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978592#comment-13978592 ]
Reynold Xin commented on SPARK-1188:
------------------------------------

I added you to the contributor list, so you should be able to edit in the future. Cheers.

> GraphX triplets not working properly
> ------------------------------------
>
> Key: SPARK-1188
> URL: https://issues.apache.org/jira/browse/SPARK-1188
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 0.9.0
> Reporter: Kev Alan
> Fix For: 1.0.0
>
> I followed the GraphX tutorial at
> http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html
> on a local stand-alone cluster (Spark version 0.9.0) with two workers.
> Somehow, graph.triplets is not returning what it should -- every element of
> the collected result is the same (Ed, Fran) triplet.
>
> ```
> scala> graph.edges.toArray
> 14/03/04 16:15:57 INFO SparkContext: Starting job: collect at EdgeRDD.scala:51
> 14/03/04 16:15:57 INFO DAGScheduler: Got job 5 (collect at EdgeRDD.scala:51) with 1 output partitions (allowLocal=false)
> 14/03/04 16:15:57 INFO DAGScheduler: Final stage: Stage 27 (collect at EdgeRDD.scala:51)
> 14/03/04 16:15:57 INFO DAGScheduler: Parents of final stage: List()
> 14/03/04 16:15:57 INFO DAGScheduler: Missing parents: List()
> 14/03/04 16:15:57 INFO DAGScheduler: Submitting Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51), which has no missing parents
> 14/03/04 16:15:57 INFO DAGScheduler: Submitting 1 missing tasks from Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51)
> 14/03/04 16:15:57 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
> 14/03/04 16:15:57 INFO TaskSetManager: Starting task 27.0:0 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
> 14/03/04 16:15:57 INFO TaskSetManager: Serialized task 27.0:0 as 2068 bytes in 1 ms
> 14/03/04 16:15:57 INFO Executor: Running task ID 11
> 14/03/04 16:15:57 INFO BlockManager: Found block rdd_2_0 locally
> 14/03/04 16:15:57 INFO Executor: Serialized size of result for 11 is 936
> 14/03/04 16:15:57 INFO Executor: Sending result for 11 directly to driver
> 14/03/04 16:15:57 INFO Executor: Finished task ID 11
> 14/03/04 16:15:57 INFO TaskSetManager: Finished TID 11 in 13 ms on localhost (progress: 0/1)
> 14/03/04 16:15:57 INFO DAGScheduler: Completed ResultTask(27, 0)
> 14/03/04 16:15:57 INFO TaskSchedulerImpl: Remove TaskSet 27.0 from pool
> 14/03/04 16:15:57 INFO DAGScheduler: Stage 27 (collect at EdgeRDD.scala:51) finished in 0.015 s
> 14/03/04 16:15:57 INFO SparkContext: Job finished: collect at EdgeRDD.scala:51, took 0.023602266 s
> res7: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(2,1,7), Edge(2,4,2), Edge(3,2,4), Edge(3,6,3), Edge(4,1,1), Edge(5,2,2), Edge(5,3,8), Edge(5,6,3))
>
> scala> graph.vertices.toArray
> 14/03/04 16:16:18 INFO SparkContext: Starting job: toArray at <console>:27
> 14/03/04 16:16:18 INFO DAGScheduler: Got job 6 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
> 14/03/04 16:16:18 INFO DAGScheduler: Final stage: Stage 28 (toArray at <console>:27)
> 14/03/04 16:16:18 INFO DAGScheduler: Parents of final stage: List(Stage 32, Stage 29)
> 14/03/04 16:16:18 INFO DAGScheduler: Missing parents: List()
> 14/03/04 16:16:18 INFO DAGScheduler: Submitting Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52), which has no missing parents
> 14/03/04 16:16:18 INFO DAGScheduler: Submitting 1 missing tasks from Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52)
> 14/03/04 16:16:18 INFO TaskSchedulerImpl: Adding task set 28.0 with 1 tasks
> 14/03/04 16:16:18 INFO TaskSetManager: Starting task 28.0:0 as TID 12 on executor localhost: localhost (PROCESS_LOCAL)
> 14/03/04 16:16:18 INFO TaskSetManager: Serialized task 28.0:0 as 2426 bytes in 0 ms
> 14/03/04 16:16:18 INFO Executor: Running task ID 12
> 14/03/04 16:16:18 INFO BlockManager: Found block rdd_14_0 locally
> 14/03/04 16:16:18 INFO Executor: Serialized size of result for 12 is 947
> 14/03/04 16:16:18 INFO Executor: Sending result for 12 directly to driver
> 14/03/04 16:16:18 INFO Executor: Finished task ID 12
> 14/03/04 16:16:18 INFO TaskSetManager: Finished TID 12 in 13 ms on localhost (progress: 0/1)
> 14/03/04 16:16:18 INFO DAGScheduler: Completed ResultTask(28, 0)
> 14/03/04 16:16:18 INFO TaskSchedulerImpl: Remove TaskSet 28.0 from pool
> 14/03/04 16:16:18 INFO DAGScheduler: Stage 28 (toArray at <console>:27) finished in 0.015 s
> 14/03/04 16:16:18 INFO SparkContext: Job finished: toArray at <console>:27, took 0.027839851 s
> res9: Array[(org.apache.spark.graphx.VertexId, (String, Int))] = Array((4,(David,42)), (2,(Bob,27)), (6,(Fran,50)), (5,(Ed,55)), (3,(Charlie,65)), (1,(Alice,28)))
>
> scala> graph.triplets.toArray
> 14/03/04 16:16:30 INFO SparkContext: Starting job: toArray at <console>:27
> 14/03/04 16:16:30 INFO DAGScheduler: Got job 7 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
> 14/03/04 16:16:31 INFO DAGScheduler: Final stage: Stage 33 (toArray at <console>:27)
> 14/03/04 16:16:31 INFO DAGScheduler: Parents of final stage: List(Stage 34)
> 14/03/04 16:16:31 INFO DAGScheduler: Missing parents: List()
> 14/03/04 16:16:31 INFO DAGScheduler: Submitting Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60), which has no missing parents
> 14/03/04 16:16:31 INFO DAGScheduler: Submitting 1 missing tasks from Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60)
> 14/03/04 16:16:31 INFO TaskSchedulerImpl: Adding task set 33.0 with 1 tasks
> 14/03/04 16:16:31 INFO TaskSetManager: Starting task 33.0:0 as TID 13 on executor localhost: localhost (PROCESS_LOCAL)
> 14/03/04 16:16:31 INFO TaskSetManager: Serialized task 33.0:0 as 3322 bytes in 1 ms
> 14/03/04 16:16:31 INFO Executor: Running task ID 13
> 14/03/04 16:16:31 INFO BlockManager: Found block rdd_2_0 locally
> 14/03/04 16:16:31 INFO BlockManager: Found block rdd_31_0 locally
> 14/03/04 16:16:31 INFO Executor: Serialized size of result for 13 is 931
> 14/03/04 16:16:31 INFO Executor: Sending result for 13 directly to driver
> 14/03/04 16:16:31 INFO Executor: Finished task ID 13
> 14/03/04 16:16:31 INFO TaskSetManager: Finished TID 13 in 17 ms on localhost (progress: 0/1)
> 14/03/04 16:16:31 INFO DAGScheduler: Completed ResultTask(33, 0)
> 14/03/04 16:16:31 INFO TaskSchedulerImpl: Remove TaskSet 33.0 from pool
> 14/03/04 16:16:31 INFO DAGScheduler: Stage 33 (toArray at <console>:27) finished in 0.019 s
> 14/03/04 16:16:31 INFO SparkContext: Job finished: toArray at <console>:27, took 0.037909394 s
> res10: Array[org.apache.spark.graphx.EdgeTriplet[(String, Int),Int]] = Array(((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3))
> ```

--
This message was sent by Atlassian JIRA
(v6.2#6252)
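
For readers hitting the same symptom: an array whose every slot holds an identical element is the classic signature of an iterator that reuses one mutable object per partition, so that collecting references (rather than copies) leaves every slot pointing at the last element produced. A minimal workaround sketch, assuming the tutorial's `graph: Graph[(String, Int), Int]` is already built in the spark-shell (the tuple-copy idea is a generic fix, not the project's official patch, which ships in 1.0.0):

```
// Copy each triplet's fields into a fresh immutable tuple *before* collecting,
// so the driver receives independent values instead of references to one
// shared mutable EdgeTriplet object.
val materialized = graph.triplets
  .map(t => (t.srcId, t.srcAttr, t.dstId, t.dstAttr, t.attr))
  .collect()
materialized.foreach(println)
```

With this mapping in place each of the eight edges should print with its own source and destination attributes, matching `graph.edges.toArray` above.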