[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415321#comment-17415321 ]

ASF GitHub Bot commented on PARQUET-1968:
-----------------------------------------

huaxingao commented on a change in pull request #923:
URL: https://github.com/apache/parquet-mr/pull/923#discussion_r708849684



##########
File path: parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
##########
@@ -250,27 +250,16 @@ public Eq(Column<T> column, T value) {
     }
   }
 
-  // base class for In and NotIn
+  // base class for In and NotIn. In is used to filter data based on a list of values. NotIn is used to filter data that
+  // are not in the list of values.

Review comment:
       Changed.

##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/filter2/bloomfilterlevel/BloomFilterImpl.java
##########
@@ -118,6 +120,45 @@ private ColumnChunkMetaData getColumnChunk(ColumnPath columnPath) {
     return BLOCK_MIGHT_MATCH;
   }
 
+  @Override
+  public <T extends Comparable<T>> Boolean visit(Operators.In<T> in) {
+    Set<T> values = in.getValues();
+
+    if (values.contains(null)) {
+      // the bloom filter bitset contains only non-null values so isn't helpful. this
+      // could check the column stats, but the StatisticsFilter is responsible
+      return BLOCK_MIGHT_MATCH;
+    }
+
+    Operators.Column<T> filterColumn = in.getColumn();
+    ColumnChunkMetaData meta = getColumnChunk(filterColumn.getColumnPath());
+    if (meta == null) {
+      // the column isn't in this file so all values are null, but the value
+      // must be non-null because of the above check.
+      return BLOCK_CANNOT_MATCH;
+    }
+
+    try {
+      BloomFilter bloomFilter = bloomFilterReader.readBloomFilter(meta);
+      if (bloomFilter != null) {
+        for (T value : values) {
+          if (bloomFilter.findHash(bloomFilter.hash(value))) {
+            return BLOCK_MIGHT_MATCH;
+          }
+        }
+        return BLOCK_CANNOT_MATCH;
+      }
+    } catch (RuntimeException e) {
+      LOG.warn(e.getMessage());
+    }

Review comment:
       Removed.
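
Editorial illustration (not part of the pull request): the In check in the hunk above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the parquet-mr implementation. `ToyBloomFilter`, `blockCannotMatch`, and the hash mixing are hypothetical stand-ins for the real `BloomFilter` and visitor, and the sketch assumes a block can be skipped only when the filter reports every value in the list as definitely absent.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class InPredicateBloomSketch {

  /** Toy Bloom filter for illustration only; NOT parquet-mr's BloomFilter. */
  static final class ToyBloomFilter {
    private final BitSet bits;
    private final int size;

    ToyBloomFilter(int size) {
      this.size = size;
      this.bits = new BitSet(size);
    }

    // Hypothetical hash mixing: derives a bit position from hashCode() and a seed.
    private int slot(Object value, int seed) {
      return Math.floorMod(value.hashCode() * 31 + seed * 0x9E3779B9, size);
    }

    void insert(Object value) {
      bits.set(slot(value, 1));
      bits.set(slot(value, 2));
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(Object value) {
      return bits.get(slot(value, 1)) && bits.get(slot(value, 2));
    }
  }

  /**
   * Returns true only when no value of the In list can be in the block.
   * A missing filter or a null in the list gives no information, so the
   * block is kept (the method returns false).
   */
  static <T> boolean blockCannotMatch(ToyBloomFilter filter, Set<T> values) {
    if (filter == null || values.contains(null)) {
      return false;
    }
    for (T value : values) {
      if (filter.mightContain(value)) {
        return false; // one possible hit is enough to keep the block
      }
    }
    return true;
  }

  public static void main(String[] args) {
    ToyBloomFilter filter = new ToyBloomFilter(1024);
    for (int v : new int[] {1, 2, 3}) {
      filter.insert(v);
    }
    Set<Integer> absent = new HashSet<>(Arrays.asList(100, 200));
    Set<Integer> mixed = new HashSet<>(Arrays.asList(100, 2));
    System.out.println("absent values, block can be skipped: " + blockCannotMatch(filter, absent));
    System.out.println("mixed values, block can be skipped: " + blockCannotMatch(filter, mixed));
  }
}
```

The loop exits on the first possible hit because a Bloom filter can report false positives but never false negatives, so a single "might contain" answer already prevents skipping the block.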




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FilterApi support In predicate
> ------------------------------
>
>                 Key: PARQUET-1968
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1968
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> FilterApi should support native In predicate.
> Spark:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L600-L605
> Impala:
> https://issues.apache.org/jira/browse/IMPALA-3654



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
