[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695692#comment-17695692 ]
ASF GitHub Bot commented on PARQUET-2252:
-----------------------------------------

zhongyujiang commented on code in PR #1038:
URL: https://github.com/apache/parquet-mr/pull/1038#discussion_r1123028615

##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -1011,6 +1012,35 @@ public PageReadStore readFilteredRowGroup(int blockIndex) throws IOException {
     }
 
     RowRanges rowRanges = getRowRanges(blockIndex);
+    return readFilteredRowGroup(blockIndex, rowRanges);
+  }
+
+  /**
+   * Reads all the columns requested from the specified row group. It may skip specific pages based on the
+   * {@code rowRanges} passed in. As the rows are not aligned among the pages of the different columns row
+   * synchronization might be required. See the documentation of the class SynchronizingColumnReader for details.
+   *
+   * @param blockIndex the index of the requested block
+   * @param rowRanges the row ranges to be read from the requested block
+   * @return the PageReadStore which can provide PageReaders for each column or null if there are no rows in this block
+   * @throws IOException if an error occurs while reading
+   * @throws IllegalArgumentException if the {@code blockIndex} is invalid or the {@code rowRanges} is null
+   */
+  public ColumnChunkPageReadStore readFilteredRowGroup(int blockIndex, RowRanges rowRanges) throws IOException {
+    if (blockIndex < 0 || blockIndex >= blocks.size()) {
+      throw new IllegalArgumentException(String.format("Invalid block index %s, the valid block index range are: " +
+        "[%s, %s]", blockIndex, 0, blocks.size() - 1));
+    }
+
+    if (Objects.isNull(rowRanges)) {
+      throw new IllegalArgumentException("RowRanges must not be null");
+    }
+
+    BlockMetaData block = blocks.get(blockIndex);
+    if (block.getRowCount() == 0L) {
+      throw new ParquetEmptyBlockException("Illegal row group of 0 rows");

Review Comment:
   I checked PARQUET-2291; it seems we only skip empty row groups when using the reader as an iterator, right?
   We skip empty row groups in `readNextRowGroup()`, but not when the user passes in a `blockIndex`, and this newly introduced method also requires the user to pass in a `blockIndex`.


> Make some methods public to allow external projects to implement page skipping
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-2252
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2252
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Yujiang Zhong
>            Priority: Major
>
> Iceberg hopes to implement the column index filter based on Iceberg's own
> expressions. We would like to be able to use some of the methods in the
> Parquet repo, for example methods in `RowRanges` and `IndexIterator`;
> however, these are currently not public, so we can only rely on reflection
> to use them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
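
The distinction raised in the review comment (iterator-style reads skip empty row groups, index-based reads do not) can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical and is not the parquet-mr API: `RowGroupSkipSketch` is a stand-in that stores only a row count per row group, and `IllegalStateException` stands in for the `ParquetEmptyBlockException` thrown in the diff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a file reader; NOT the parquet-mr API.
final class RowGroupSkipSketch {
    private final List<Long> rowCounts; // row count of each row group (block)
    private int current = 0;            // cursor used only by iterator-style reads

    RowGroupSkipSketch(List<Long> rowCounts) {
        this.rowCounts = rowCounts;
    }

    /** Iterator-style read: advances past empty row groups; returns the index read, or -1 at the end. */
    int readNextRowGroup() {
        while (current < rowCounts.size() && rowCounts.get(current) == 0L) {
            current++; // the reader controls the cursor, so skipping is safe
        }
        return current < rowCounts.size() ? current++ : -1;
    }

    /** Index-based read: the caller picked the block, so an empty one is reported rather than skipped. */
    long readRowGroup(int blockIndex) {
        if (blockIndex < 0 || blockIndex >= rowCounts.size()) {
            throw new IllegalArgumentException("Invalid block index " + blockIndex);
        }
        long rows = rowCounts.get(blockIndex);
        if (rows == 0L) {
            // Assumption: placeholder for the ParquetEmptyBlockException in the diff.
            throw new IllegalStateException("Row group " + blockIndex + " has 0 rows");
        }
        return rows;
    }

    public static void main(String[] args) {
        RowGroupSkipSketch reader = new RowGroupSkipSketch(Arrays.asList(3L, 0L, 5L));
        System.out.println(reader.readNextRowGroup()); // 0
        System.out.println(reader.readNextRowGroup()); // 2 (row group 1 is empty and skipped)
        System.out.println(reader.readRowGroup(2));    // 5
    }
}
```

In this sketch the index-based path cannot skip silently, because the caller would otherwise receive data for a different block than the one requested.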