[jira] [Commented] (SPARK-5790) VertexRDD's won't zip properly for `diff` capability
[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361319#comment-14361319 ]

Apache Spark commented on SPARK-5790:
-------------------------------------

User 'brennonyork' has created a pull request for this issue:
https://github.com/apache/spark/pull/5023

> VertexRDD's won't zip properly for `diff` capability
> ----------------------------------------------------
>
>                 Key: SPARK-5790
>                 URL: https://issues.apache.org/jira/browse/SPARK-5790
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Brennon York
>            Assignee: Brennon York
>
> For VertexRDD's with differing partition sizes one cannot run commands like
> `diff` as it will throw an IllegalArgumentException. The code below provides
> an example:
> {code}
> import org.apache.spark.graphx._
> import org.apache.spark.rdd._
> val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt+1)))
> setA.collect.foreach(println(_))
> val setB: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)))
> setB.collect.foreach(println(_))
> val diff = setA.diff(setB)
> diff.collect.foreach(println(_))
> val setC: VertexRDD[Int] = VertexRDD(sc.parallelize(2L until 4L).map(id => (id, id.toInt+2)) ++ sc.parallelize(6L until 8L).map(id => (id, id.toInt+2)))
> setA.diff(setC).collect
> // java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-5790) VertexRDD's won't zip properly for `diff` capability
[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360613#comment-14360613 ]

Brennon York commented on SPARK-5790:
-------------------------------------

[~maropu] did you get those tests in a PR or into the master branch for Spark? I was going to close this issue, but wanted to make sure we didn't lose those tests! :)

[jira] [Commented] (SPARK-5790) VertexRDD's won't zip properly for `diff` capability
[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334292#comment-14334292 ]

Takeshi Yamamuro commented on SPARK-5790:
-----------------------------------------

Thanks for your work :)

[jira] [Commented] (SPARK-5790) VertexRDD's won't zip properly for `diff` capability
[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329252#comment-14329252 ]

Brennon York commented on SPARK-5790:
-------------------------------------

[~maropu] this looks very similar to the work I just pushed up for [SPARK-1955|https://github.com/apache/spark/pull/4705], which was acting as the overarching issue for this ticket. I didn't write tests, though, and having them would be a major benefit. Would you be willing to refactor and include only the tests to close this issue out? That would help out tremendously, and I wouldn't want to lose that effort!

[jira] [Commented] (SPARK-5790) VertexRDD's won't zip properly for `diff` capability
[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328778#comment-14328778 ]

Takeshi Yamamuro commented on SPARK-5790:
-----------------------------------------

Hi,

What's the status of your work? I fixed this bug, so if you haven't finished yet, please refer to my patch:
https://github.com/maropu/spark/commit/1f64794b2ce33e64f340e383d4e8a60639a7eb4b

Thanks.

[jira] [Commented] (SPARK-5790) VertexRDD's won't zip properly for `diff` capability
[ https://issues.apache.org/jira/browse/SPARK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319316#comment-14319316 ]

Brennon York commented on SPARK-5790:
-------------------------------------

FWIW, this issue is a blocker for [SPARK-4600|https://issues.apache.org/jira/browse/SPARK-4600], which I'm working on: `diff` relies on `zipPartitions`, and that is what causes this failure. If someone could assign this to me, I'll continue working on this issue.
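
Since the thread attributes the failure to `diff` zipping partitions of two VertexRDDs with unequal partition counts, one possible user-side workaround is to repartition the second VertexRDD with the first one's partitioner before calling `diff`. The sketch below is only an illustration under that assumption; it is not the fix from the linked pull requests, and the names `DiffWorkaround` and `alignTo` are hypothetical. It assumes a Spark build with GraphX on the classpath and a local master.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Hypothetical helper object, not part of Spark or of the linked patches.
object DiffWorkaround {

  // Rebuild `other` so that it uses the same partitioner (and therefore the
  // same number of partitions) as `reference`.
  // Note: on Spark 1.2 and earlier, `partitionBy` may additionally need
  // `import org.apache.spark.SparkContext._` for the pair-RDD implicits.
  def alignTo[VD: ClassTag](reference: VertexRDD[_], other: VertexRDD[VD]): VertexRDD[VD] =
    VertexRDD(other.partitionBy(reference.partitioner.get))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("diff-workaround").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Same data as setA and setC in the issue description.
      val setA: VertexRDD[Int] =
        VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt + 1)))
      val setC: VertexRDD[Int] = VertexRDD(
        sc.parallelize(2L until 4L).map(id => (id, id.toInt + 2)) ++
          sc.parallelize(6L until 8L).map(id => (id, id.toInt + 2)))

      // The union doubles the partition count, so align setC to setA first.
      val setCAligned = alignTo(setA, setC)
      assert(setCAligned.partitions.length == setA.partitions.length)

      // With equal partition counts the zip inside `diff` can proceed.
      setA.diff(setCAligned).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The repartitioning costs an extra shuffle of the second VertexRDD, so this is best treated as a stopgap on affected versions.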