HyukjinKwon commented on a change in pull request #23622: [SPARK-26677][SQL] Disable dictionary filtering by default at Parquet
URL: https://github.com/apache/spark/pull/23622#discussion_r251267116
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ##########
 @@ -314,6 +314,17 @@ class ParquetFileFormat
       SQLConf.CASE_SENSITIVE.key,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    // There are two things to note here.
+    //
+    // 1. Dictionary filtering has an issue about the predication on null. For this reason,
+    //   This filtering is disabled. See SPARK-26677.
+    //
+    // 2. We should disable 'parquet.filter.dictionary.enabled' but
+    //   the 'parquet.filter.stats.enabled' and 'parquet.filter.dictionary.enabled' were
+    //   swapped mistakenly in Parquet side. It should use 'parquet.filter.dictionary.enabled'
+    //   when Spark upgrades Parquet. See PARQUET-1309.
+    hadoopConf.setIfUnset(ParquetInputFormat.STATS_FILTERING_ENABLED, "false")
 
 Review comment:
   Thanks for all the details. +1 for going to 1.10.1. If that's the plan here, I will
close this PR in a couple of days.
