[ https://issues.apache.org/jira/browse/PARQUET-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752641#comment-16752641 ]
Ryan Blue commented on PARQUET-1510:
------------------------------------

Fixed metadata.

> Dictionary filter skips null values when evaluating not-equals.
> ---------------------------------------------------------------
>
>                 Key: PARQUET-1510
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1510
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1
>            Reporter: Ryan Blue
>            Priority: Blocker
>              Labels: correctness, pull-request-available
>             Fix For: 1.11.0, 1.10.1
>
>
> This was discovered in Spark, see SPARK-26677. From the Spark PR:
> {code}
> // Repeat the values to get dictionary encoding.
> Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
> spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> {code}
> // Use plain encoding.
> Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
> spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> | null|
> +-----+
> {code}
> This is a correctness issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
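A column chunk's dictionary stores only non-null values. In the dictionary-encoded repro, the dictionary is just {"A"}, so a not-equals check against 'A' looks as if no row can match and the whole row group is skipped, even though the null row satisfies NOT (value <=> 'A'). The plain-encoded file is not dictionary-filtered, so its null row survives. The Scala sketch below models that decision, assuming the predicate reaches Parquet as a not-equals filter. `ChunkInfo`, `NotEqDictionaryCheck`, `canDropBuggy` and `canDropFixed` are hypothetical names, not parquet-mr's `DictionaryFilter` API. The null-aware variant also assumes a null count from column statistics is available and treats a missing count as "may contain nulls". It is written as a Scala script, so it can be pasted into the REPL.

```scala
// Hypothetical model of the "can this row group be dropped?" decision a
// dictionary filter makes for NOT (value <=> v). These names are illustrative
// and are NOT parquet-mr's DictionaryFilter API.

// What the filter can see about one column chunk.
final case class ChunkInfo(
    dictionary: Set[String], // distinct non-null values; nulls never appear here
    nullCount: Option[Long]  // null count from statistics; None = unknown
)

object NotEqDictionaryCheck {
  // Buggy decision: if the only dictionary value is v, assume no row can
  // satisfy "not equal to v" and drop the row group. Null rows are ignored,
  // even though NOT (null <=> v) is true.
  def canDropBuggy(chunk: ChunkInfo, v: String): Boolean =
    chunk.dictionary.size == 1 && chunk.dictionary.contains(v)

  // Null-aware decision: additionally require proof that the chunk has no
  // nulls. An unknown null count is treated as "may contain nulls".
  def canDropFixed(chunk: ChunkInfo, v: String): Boolean =
    canDropBuggy(chunk, v) && chunk.nullCount.contains(0L)
}

// Mirrors the dictionary-encoded repro: values "A", "A", null.
val chunk = ChunkInfo(dictionary = Set("A"), nullCount = Some(1L))
println(NotEqDictionaryCheck.canDropBuggy(chunk, "A")) // true: the null row is lost
println(NotEqDictionaryCheck.canDropFixed(chunk, "A")) // false: the row group is read
```

Treating an unknown null count as "may contain nulls" is the conservative choice. A dictionary filter may only skip a row group when it can prove that no row matches, so missing statistics must fall back to reading the data.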