Dan Osipov created SPARK-26366:
----------------------------------

             Summary: Except with transform regression
                 Key: SPARK-26366
                 URL: https://issues.apache.org/jira/browse/SPARK-26366
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.3.2
            Reporter: Dan Osipov
There appears to be a regression between Spark 2.2 and 2.3. Below is the code to reproduce it:

{code:java}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val inputDF = spark.sqlContext.createDataFrame(
  spark.sparkContext.parallelize(Seq(
    Row("0", "john", "smith", "j...@smith.com"),
    Row("1", "jane", "doe", "j...@doe.com"),
    Row("2", "apache", "spark", "sp...@apache.org"),
    Row("3", "foo", "bar", null)
  )),
  StructType(List(
    StructField("id", StringType, nullable = true),
    StructField("first_name", StringType, nullable = true),
    StructField("last_name", StringType, nullable = true),
    StructField("email", StringType, nullable = true)
  ))
)

val exceptDF = inputDF.transform( toProcessDF =>
  toProcessDF.filter(
    (
      col("first_name").isin(Seq("john", "jane"): _*)
      and col("last_name").isin(Seq("smith", "doe"): _*)
    )
    or col("email").isin(List(): _*)
  )
)

inputDF.except(exceptDF).show()
{code}

Output with Spark 2.2:
{noformat}
+---+----------+---------+----------------+
| id|first_name|last_name|           email|
+---+----------+---------+----------------+
|  2|    apache|    spark|sp...@apache.org|
|  3|       foo|      bar|            null|
+---+----------+---------+----------------+
{noformat}

Output with Spark 2.3:
{noformat}
+---+----------+---------+----------------+
| id|first_name|last_name|           email|
+---+----------+---------+----------------+
|  2|    apache|    spark|sp...@apache.org|
+---+----------+---------+----------------+
{noformat}

Note: changing the last line to
{code:java}
inputDF.except(exceptDF.cache()).show()
{code}
produces identical output for both Spark 2.3 and 2.2.
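One plausible reading of the repro (an assumption on my part, not a confirmed diagnosis) is that the null {{email}} on row 3 makes the filter predicate evaluate to SQL UNKNOWN rather than FALSE, so the row is dropped by {{exceptDF}}'s filter and correctly survives the {{except}} in 2.2, while an optimizer rewrite of EXCEPT into a negated filter would also drop it, matching the 2.3 output. The plain-Scala sketch below (all names hypothetical, no Spark involved) models three-valued logic with {{Option[Boolean]}} to show why negating an UNKNOWN predicate does not recover the row:

```scala
// A minimal sketch of SQL three-valued logic, modelling TRUE/FALSE/UNKNOWN
// as Option[Boolean] (None = SQL NULL / UNKNOWN). Hypothetical names; this
// only illustrates the suspected null-handling issue, not Spark internals.
object ThreeValuedLogic {
  type TV = Option[Boolean]

  def and(a: TV, b: TV): TV = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false) // FALSE dominates AND
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None        // UNKNOWN otherwise
  }

  def or(a: TV, b: TV): TV = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)    // TRUE dominates OR
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None          // UNKNOWN otherwise
  }

  def not(a: TV): TV = a.map(!_)                            // NOT UNKNOWN = UNKNOWN

  def main(args: Array[String]): Unit = {
    // Row 3: both name predicates are FALSE; email is null, so (assuming a
    // null-propagating IN) the email predicate is UNKNOWN.
    val cond = or(and(Some(false), Some(false)), None)
    println(cond)      // None -- a filter keeps only TRUE, so row 3 is
                       // excluded from exceptDF and survives EXCEPT (2.2)
    println(not(cond)) // None -- rewriting EXCEPT as filter(NOT cond) also
                       // drops row 3, matching the 2.3 output
  }
}
```

If this reading is right, it would also explain why {{cache()}} restores the 2.2 behavior: materializing {{exceptDF}} prevents the rewrite from seeing the original filter condition.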