[ https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-18753:
------------------------------------

    Assignee: Apache Spark

> Inconsistent behavior after writing to parquet files
> ----------------------------------------------------
>
>                 Key: SPARK-18753
>                 URL: https://issues.apache.org/jira/browse/SPARK-18753
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.0
>            Reporter: Shixiong Zhu
>            Assignee: Apache Spark
>
> Found inconsistent behavior when using parquet.
> {code}
> scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
> ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
> scala> ds.filter('value === "true").show
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> In the above example, `ds.filter('value === "true")` returns nothing because
> "true" is converted to null, so the filter expression is always null and
> every row is dropped.
> However, if I write `ds` to a parquet file and read it back, `filter('value
> === "true")` returns every non-null row, including the `false` one.
> {code}
> scala> ds.write.parquet("testfile")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> scala> val ds2 = spark.read.parquet("testfile")
> ds2: org.apache.spark.sql.DataFrame = [value: boolean]
> scala> ds2.filter('value === "true").show
> +-----+
> |value|
> +-----+
> | true|
> |false|
> +-----+
> {code}
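
The reasoning in the description (a predicate that is always null drops every row) can be sketched without Spark. The snippet below is a plain-Scala model of SQL three-valued logic, not Spark's implementation: `NullFilterSketch`, `sqlEq` and `sqlFilter` are hypothetical names, `None` stands in for SQL NULL, and the null literal is an assumption taken from the report's statement that "true" is converted to null.

```scala
// Hypothetical, Spark-free model of SQL three-valued logic.
// None stands for SQL NULL; these helpers are illustrative, not Spark APIs.
object NullFilterSketch {
  // SQL equality: the result is NULL if either operand is NULL.
  def sqlEq(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // A SQL filter keeps a row only when the predicate is exactly TRUE;
  // rows whose predicate is FALSE or NULL are dropped.
  def sqlFilter(rows: Seq[Option[Boolean]])(
      pred: Option[Boolean] => Option[Boolean]): Seq[Option[Boolean]] =
    rows.filter(r => pred(r).contains(true))

  // Same three values as `ds` in the report: true, null, false.
  val rows: Seq[Option[Boolean]] = Seq(Some(true), None, Some(false))

  // Assumption from the report: the literal "true" ends up as NULL.
  val literal: Option[Boolean] = None

  // Every comparison against NULL is NULL, so nothing survives the filter.
  val kept: Seq[Option[Boolean]] = sqlFilter(rows)(sqlEq(_, literal))

  def main(args: Array[String]): Unit =
    println(kept) // prints List()
}
```

Under this model the empty result of the in-memory `ds` is the expected one, and the parquet round trip returning both `true` and `false` is the inconsistent case.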