[ https://issues.apache.org/jira/browse/PARQUET-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689543#comment-17689543 ]
ASF GitHub Bot commented on PARQUET-2245:
-----------------------------------------

wgtmac commented on code in PR #1029:
URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382


##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {
     try {
       Set<T> dictSet = expandDictionary(meta);
-      boolean mayContainNull = (meta.getStatistics() == null
-          || !meta.getStatistics().isNumNullsSet()
-          || meta.getStatistics().getNumNulls() > 0);
-      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value) && !mayContainNull) {
+      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value)) {

Review Comment:
   I just noticed that `FilterPredicate` does not provide an entry for `IS NULL` or `IS NOT NULL`. This confuses me because `col IS NOT NULL` is not equivalent to `col != NULL`. Correct me if I'm wrong, but `col NOT EQ A` has two meanings:
   - If A is NULL, it should return an empty list, because NULL cannot be compared to any value, including another NULL.
   - Otherwise, it should return a list of values excluding A and NULL.

   cc @huaxingao @gszadovszky @shangxinli


> Improve dictionary filter evaluating notEq
> ------------------------------------------
>
>                 Key: PARQUET-2245
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2245
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Yujiang Zhong
>            Priority: Minor
>
> When evaluating `notEq`, if the column may contain nulls and the `notEq`
> value is non-null, the row-group must not be skipped. In such a scenario,
> reading the dictionary and comparing values is not necessary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
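
The rule stated in the issue description can be sketched in isolation as follows. This is a minimal illustration, not the code in `DictionaryFilter`: the class `NotEqDictionarySketch`, the method `canDrop`, and the `mayContainNull` parameter are hypothetical names, and the sketch assumes a non-null `notEq` value and the semantics the issue describes, where a null cell is kept by `notEq` against a non-null value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, standalone sketch; not part of parquet-mr.
final class NotEqDictionarySketch {

    /**
     * Decides whether a row group can be skipped for "column != value".
     * Assumes value is non-null and that null cells count as matches,
     * as described in the issue text.
     *
     * @param dictSet        distinct dictionary values, or null if unavailable
     * @param value          the non-null notEq operand
     * @param mayContainNull true when statistics cannot rule out nulls
     */
    static <T> boolean canDrop(Set<T> dictSet, T value, boolean mayContainNull) {
        if (mayContainNull) {
            // A null cell would satisfy the predicate, so the row group
            // must be kept; the dictionary does not need to be consulted.
            return false;
        }
        // Without nulls, the row group holds no match only when its single
        // distinct value is exactly the excluded one.
        return dictSet != null && dictSet.size() == 1 && dictSet.contains(value);
    }

    public static void main(String[] args) {
        Set<String> onlyA = new HashSet<>(Arrays.asList("a"));
        Set<String> aAndB = new HashSet<>(Arrays.asList("a", "b"));

        System.out.println(canDrop(onlyA, "a", false)); // true
        System.out.println(canDrop(onlyA, "a", true));  // false
        System.out.println(canDrop(aAndB, "a", false)); // false
    }
}
```

Checking `mayContainNull` first is what makes the dictionary read avoidable: when nulls are possible the answer is already known, so a real implementation could return before expanding the dictionary at all.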