sunchao commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r658990194



##########
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##########
@@ -17,13 +17,31 @@
 
 package org.apache.spark.sql.execution.datasources.parquet;
 
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.PrimitiveIterator;
+
 /**
  * Helper class to store intermediate state while reading a Parquet column chunk.
  */
 final class ParquetReadState {
-  /** Maximum definition level */
+  private static final RowRange MAX_ROW_RANGE = new RowRange(Long.MIN_VALUE, Long.MAX_VALUE);
+  private static final RowRange MIN_ROW_RANGE = new RowRange(Long.MAX_VALUE, Long.MIN_VALUE);
+
+  /** Iterator over all row ranges, only not-null if column index is present */
+  private final Iterator<RowRange> rowRanges;

Review comment:
       No. The list of row ranges is associated with a Parquet row group. For 
example, let's say you have two columns `c1:int` and `c2:bigint`, and the 
following pages:
   
   ```
     row index   0        500       1000      1500
                 -------------------------------
     c1 (int)    |         |         |         |
                 -------------------------------
     c2 (bigint) |    |    |    |    |    |    |
                 -------------------------------
                 0   250  500  750  1000 1250 1500
   ```
   
   Suppose the query is `SELECT * FROM tbl WHERE c1 = 750 OR c2 = 1100`.
   Applied to `c1`, the filter produces the row range `[500, 1000)`. Applied to `c2`, it produces the row range `[1000, 1250)`. Because the two predicates are combined with `OR`, these two ranges are unioned into `[500, 1250)`, and that is the row range for the whole row group.
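
The union step in the example above can be sketched roughly as follows. This is only an illustration: `RowRangeSketch` and its `union` method are hypothetical names, not the `RowRange` class from this PR, and the ranges are treated as half-open `[start, end)` to match the notation used in the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical half-open row range [start, end); not the RowRange class from the PR. */
final class RowRangeSketch {
  final long start;
  final long end;

  RowRangeSketch(long start, long end) {
    this.start = start;
    this.end = end;
  }

  /** Merges the per-column ranges into a sorted list of disjoint ranges. */
  static List<RowRangeSketch> union(List<RowRangeSketch> ranges) {
    List<RowRangeSketch> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((RowRangeSketch r) -> r.start));

    List<RowRangeSketch> merged = new ArrayList<>();
    for (RowRangeSketch r : sorted) {
      RowRangeSketch last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r.start <= last.end) {
        // Overlapping or adjacent ranges: extend the previous range.
        merged.set(merged.size() - 1,
            new RowRangeSketch(last.start, Math.max(last.end, r.end)));
      } else {
        merged.add(r);
      }
    }
    return merged;
  }

  @Override
  public String toString() {
    return "[" + start + ", " + end + ")";
  }
}
```

Passing the two ranges from the example, `[500, 1000)` from `c1` and `[1000, 1250)` from `c2`, to `union` yields the single range `[500, 1250)`, which would then apply to every column in the row group.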
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org