[ 
https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689543#comment-17689543
 ] 

ASF GitHub Bot commented on PARQUET-2245:
-----------------------------------------

wgtmac commented on code in PR #1029:
URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382


##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {
 
     try {
       Set<T> dictSet = expandDictionary(meta);
-      boolean mayContainNull = (meta.getStatistics() == null
-          || !meta.getStatistics().isNumNullsSet()
-          || meta.getStatistics().getNumNulls() > 0);
-      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value) && !mayContainNull) {
+      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value)) {

Review Comment:
   I just noticed that `FilterPredicate` does not provide an entry for `IS NULL` or `IS NOT NULL`. This confuses me because `col IS NOT NULL` is not equivalent to `col != NULL`.
   
   Correct me if I'm wrong, but `col NOT EQ A` has two cases:
   - If A is NULL, it should return an empty list, because NULL cannot be compared to any value, including another NULL.
   - Otherwise, it should return a list of values excluding A and NULL.
   
   cc @huaxingao @gszadovszky @shangxinli 
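
For reference, the two readings of `col NOT EQ A` described in the comment can be sketched in plain Java. This is only an illustration of SQL-style three-valued comparison over an in-memory list; `SqlNotEqSketch`, `sqlNotEq`, and `isNotNull` are hypothetical names and are not part of parquet-mr. Whether Parquet's `notEq` is meant to follow this reading is exactly the question raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in parquet-mr.
final class SqlNotEqSketch {

    // SQL-style `col <> a`: a comparison involving NULL is UNKNOWN,
    // and rows whose predicate is UNKNOWN are not returned.
    static <T> List<T> sqlNotEq(List<T> column, T a) {
        List<T> kept = new ArrayList<>();
        if (a == null) {
            return kept; // `col <> NULL` matches no row
        }
        for (T v : column) {
            if (v != null && !v.equals(a)) {
                kept.add(v); // excludes both A and NULL
            }
        }
        return kept;
    }

    // `col IS NOT NULL`: a separate predicate that keeps every non-null row.
    static <T> List<T> isNotNull(List<T> column) {
        List<T> kept = new ArrayList<>();
        for (T v : column) {
            if (v != null) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> col = Arrays.asList("a", "b", null, "a");
        System.out.println(sqlNotEq(col, "a"));  // [b]
        System.out.println(sqlNotEq(col, null)); // []
        System.out.println(isNotNull(col));      // [a, b, a]
    }
}
```

Under this reading, `col IS NOT NULL` and `col != NULL` return different results for the same column, which is the distinction the comment is pointing at.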





> Improve dictionary filter evaluating notEq
> ------------------------------------------
>
>                 Key: PARQUET-2245
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2245
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Yujiang Zhong
>            Priority: Minor
>
> When evaluating `notEq`, if the column may contain nulls and the `notEq` 
> value is non-null, the row group must not be skipped. In such a scenario, 
> reading the dictionary and comparing values is unnecessary.
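
The short-circuit described in the issue can be sketched as follows. This is a minimal, self-contained sketch and not the actual `DictionaryFilter` code: `NotEqDropSketch`, `canDrop`, and the `dictionaryLoader` supplier are hypothetical stand-ins, and the `mayContainNull` flag is assumed to be derived from column statistics in the way the removed lines of the diff did. A null `notEq` value is out of scope here, so the sketch conservatively keeps the row group in that case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; not the parquet-mr implementation.
final class NotEqDropSketch {

    // Returns true when the row group can be skipped for `col notEq value`.
    static <T> boolean canDrop(T value, boolean mayContainNull,
                               Supplier<Set<T>> dictionaryLoader) {
        if (value == null) {
            return false; // assumption: not modelled here, so keep the row group
        }
        if (mayContainNull) {
            // Per the issue text, the row group must not be skipped,
            // so the dictionary is never read on this path.
            return false;
        }
        Set<T> dictSet = dictionaryLoader.get(); // potentially expensive read
        // Every row holds the single dictionary value, and it equals `value`.
        return dictSet != null && dictSet.size() == 1 && dictSet.contains(value);
    }

    public static void main(String[] args) {
        Supplier<Set<String>> loader = () -> {
            System.out.println("reading dictionary");
            return new HashSet<>(Arrays.asList("a"));
        };
        System.out.println(canDrop("a", true, loader));  // false, loader not invoked
        System.out.println(canDrop("a", false, loader)); // prints "reading dictionary", then true
    }
}
```

Passing the dictionary as a `Supplier` is a design choice made for the sketch so that the skipped read is observable; in the real filter the equivalent would be returning before the dictionary is expanded.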



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
