[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687119#comment-17687119
]
ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------
yabola commented on code in PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102881433
##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.filter2.compat;
+
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+
+/**
+ * Used in Filters to mark whether the block data matches the condition.
+ * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH.
+ *
+ * We use Boolean Object here to distinguish the value type, please do not modify it.
+ */
+public class PredicateEvaluation {
+ /* The block might match, but we cannot decide yet, will check in the other filters. */
+ public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false);
+ /* The block can match for sure. */
+ public static final Boolean BLOCK_MUST_MATCH = new Boolean(false);
+ /* The block can't match for sure */
+ public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true);
+
+ public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) {
+ Boolean left = and.getLeft().accept(predicate);
Review Comment:
Yes, thanks
> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
> Key: PARQUET-2237
> URL: https://issues.apache.org/jira/browse/PARQUET-2237
> Project: Parquet
> Issue Type: Improvement
> Reporter: Mars
> Priority: Major
>
> If the min/max statistics alone can decide the result exactly, we no longer
> need to load the dictionary from the filesystem and compare values one by one.
> Similarly, the Bloom filter has to be loaded from the filesystem, which may
> cost time and memory. If we can determine exactly whether the value exists
> from the min/max or dictionary filters, then we can skip the Bloom filter and
> improve performance.
> For example,
> # When reading data greater than {{x1}} in the block, if the min/max
> statistics show that all values are greater than {{x1}}, then we don't need
> to read the dictionary and compare one by one.
> # If we already have the page dictionaries and have compared one by one, we
> don't need to read the Bloom filter and compare.
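
The short-circuit idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in PR #1023: the class `RowGroupFilterSketch`, the `Verdict` enum, and the helpers `named` and `keepRowGroup` are hypothetical names, and each filter is reduced to a `Supplier` that returns a fixed verdict. Filters are tried cheapest first (min/max statistics, then dictionary, then Bloom filter), and evaluation stops as soon as one of them gives a definite answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from parquet-mr.
public class RowGroupFilterSketch {
  // Three-state result, mirroring MIGHT / MUST / CANNOT match.
  enum Verdict { MIGHT_MATCH, MUST_MATCH, CANNOT_MATCH }

  // Records which filters were actually evaluated, for illustration only.
  static final List<String> evaluated = new ArrayList<>();

  // Wraps a fixed verdict in a filter that logs its own evaluation.
  static Supplier<Verdict> named(String name, Verdict result) {
    return () -> {
      evaluated.add(name);
      return result;
    };
  }

  // Returns true if the row group has to be read, false if it can be dropped.
  @SafeVarargs
  static boolean keepRowGroup(Supplier<Verdict>... filtersCheapestFirst) {
    for (Supplier<Verdict> filter : filtersCheapestFirst) {
      Verdict v = filter.get();
      if (v == Verdict.CANNOT_MATCH) {
        return false; // definite miss: drop the row group, skip later filters
      }
      if (v == Verdict.MUST_MATCH) {
        return true; // definite hit: keep the row group, skip later filters
      }
      // MIGHT_MATCH: undecided, fall through to the next (more expensive) filter
    }
    return true; // nobody could decide, so keeping the row group is the safe choice
  }

  public static void main(String[] args) {
    boolean keep = keepRowGroup(
        named("minMax", Verdict.MUST_MATCH),
        named("dictionary", Verdict.MIGHT_MATCH),
        named("bloomFilter", Verdict.MIGHT_MATCH));
    System.out.println(keep + " " + evaluated);
  }
}
```

In this sketch the dictionary and Bloom filter suppliers are never invoked once the min/max step returns a definite verdict, which is where the saved I/O would come from.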