Brennon York created SPARK-5790:
-----------------------------------

Summary: VertexRDD's won't zip properly for `diff` capability
Key: SPARK-5790
URL: https://issues.apache.org/jira/browse/SPARK-5790
Project: Spark
Issue Type: Bug
Components: GraphX
Reporter: Brennon York

For VertexRDDs with differing numbers of partitions, one cannot run commands like `diff`, because the call throws an IllegalArgumentException. The code below provides an example:

{code:scala}
import org.apache.spark.graphx._
import org.apache.spark.rdd._

// setA and setB are each built from a single parallelized range.
val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt+1)))
setA.collect.foreach(println(_))
val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)))
setB.collect.foreach(println(_))

// This diff works.
val diff = setA.diff(setB)
diff.collect.foreach(println(_))

// setC is built from the union (++) of two RDDs.
val setC: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)) ++ sc.parallelize(6L until 8L).map(id => (id, id.toInt+2)))
setA.diff(setC).collect
// java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
{code}
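
A possible workaround, sketched here under assumptions rather than taken from the report: repartition the raw vertex pairs with setA's partitioner before wrapping them in a VertexRDD, so that both sides have the same number of partitions. The names `rawC`, `targetPartitioner` and `setCAligned` are hypothetical, `sc` is assumed to come from a spark-shell session, and the sketch has not been checked against a specific Spark release.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on older releases
import org.apache.spark.graphx._
import org.apache.spark.rdd._

// Assumes a spark-shell session in which `sc` is already defined.
val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt + 1)))

// Same data as setC in the report, kept as a plain pair RDD for now (hypothetical name).
val rawC: RDD[(VertexId, Int)] =
  sc.parallelize(2L until 4L).map(id => (id, id.toInt + 2)) ++
  sc.parallelize(6L until 8L).map(id => (id, id.toInt + 2))

// Reuse setA's partitioner if it has one; otherwise fall back to a hash
// partitioner with the same partition count (an assumption of this sketch).
val targetPartitioner = setA.partitioner.getOrElse(new HashPartitioner(setA.partitions.length))

// Repartition first, then build the VertexRDD, so partition counts line up.
val setCAligned: VertexRDD[Int] = VertexRDD(rawC.partitionBy(targetPartitioner))

println(s"setA partitions: ${setA.partitions.length}, setCAligned partitions: ${setCAligned.partitions.length}")
setA.diff(setCAligned).collect.foreach(println(_))
```

This only sidesteps the zip failure on the caller's side; it does not change how `diff` itself handles VertexRDDs that are partitioned differently.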