Shawn Guo created SPARK-4533:
--------------------------------

             Summary: Can only subtract another SchemaRDD
                 Key: SPARK-4533
                 URL: https://issues.apache.org/jira/browse/SPARK-4533
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.1.0
         Environment: JDK6/7
            Reporter: Shawn Guo
            Priority: Minor

Two SchemaRDD APIs perform an unexpected type validation on their argument:

    subtract(self, other, numPartitions=None)
        "Can only subtract another SchemaRDD"
    intersection(self, other)
        "Can only intersect with another SchemaRDD"

"Can only subtract another SchemaRDD" is raised whenever a SchemaRDD subtracts any other type of RDD. In the steps below, CROSSED is the result of keyBy/join/map/filter and is therefore not a SchemaRDD, so the final subtract fails.

Steps to reproduce:

    A = SchemaRDD
    B = SchemaRDD
    A_APX = A.keyBy(lambda line: None)
    B_APX = B.keyBy(lambda line: None)
    CROSSED = A_APX.join(B_APX).map(lambda line: line[1]).filter(filter condition).map(lambda line: line[0])
    C = A.subtract(CROSSED)    # ERROR: Can only subtract another SchemaRDD
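
The behaviour reported above can be illustrated without a Spark cluster. The following is only a toy model in plain Python, not the PySpark source: the classes RDD and SchemaRDD here are hypothetical stand-ins that wrap a list, and the strict class check is an assumption about how the validation behaves, inferred from the error message. It shows why a transformed RDD is rejected, and one possible workaround (again an assumption, not a confirmed fix): turn the SchemaRDD into a plain RDD with an identity map before subtracting.

```python
class RDD(object):
    """Toy stand-in for a plain RDD: wraps a list of hashable rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, f):
        # In this model, a transformation always yields a plain RDD,
        # even when called on a SchemaRDD.
        return RDD(f(r) for r in self.rows)

    def subtract(self, other, numPartitions=None):
        # Generic subtract: keep the rows that do not appear in `other`.
        # numPartitions is accepted for signature parity and ignored here.
        exclude = set(other.rows)
        return RDD(r for r in self.rows if r not in exclude)

    def collect(self):
        return list(self.rows)


class SchemaRDD(RDD):
    """Toy stand-in for a SchemaRDD with the strict type validation."""

    def subtract(self, other, numPartitions=None):
        # Assumed validation: anything that is not exactly a SchemaRDD
        # is rejected, which is the behaviour this issue reports.
        if other.__class__ is not SchemaRDD:
            raise ValueError("Can only subtract another SchemaRDD")
        kept = RDD.subtract(self, other, numPartitions)
        return SchemaRDD(kept.rows)


A = SchemaRDD([("a", 1), ("b", 2), ("c", 3)])
CROSSED = RDD([("b", 2)])  # plain RDD, like the result of join/map/filter

# Direct subtract fails with the message from the report.
try:
    A.subtract(CROSSED)
    message = None
except ValueError as exc:
    message = str(exc)

# Possible workaround: drop to a plain RDD first, then subtract.
result = A.map(lambda row: row).subtract(CROSSED).collect()
```

In this sketch the workaround loses the SchemaRDD type of the result, which may or may not be acceptable depending on what is done with C afterwards.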