[ https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183312#comment-17183312 ]
Gabor Szadovszky commented on PARQUET-1901:
-------------------------------------------

It is clear we shall handle this case properly. I've quickly checked the other filters ({{DictionaryFilter}}, {{StatisticsFilter}} and {{BloomFilterImpl}}) and none of them handles the case of the filter being {{null}} (meaning they all throw NPE). So, I would vote for not checking for the filter being {{null}} in {{ColumnIndexFilter}}. Instead, the places where it is invoked shall handle the case of a {{null}} filter like [here|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L870-L872].

> Add filter null check for ColumnIndex
> ---------------------------------------
>
>                 Key: PARQUET-1901
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1901
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> This Jira is opened to discuss whether we should add a null check for the
> filter when ColumnIndex is enabled.
> In the ColumnIndexFilter#calculateRowRanges() method, the input parameter
> 'filter' is assumed to be non-null without checking. It throws NPE when
> ColumnIndex is enabled (by default) but there is no filter set in the
> ParquetReadOptions. The call stack is as below.
> java.lang.NullPointerException
>   at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
>   at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
>   at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:891)
> If we don't add it, the user might need to choose between calling
> readNextRowGroup() and readNextFilteredRowGroup() based on whether a filter
> exists.
> Thoughts?
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
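
The approach suggested in the comment (leave the filter implementation without a null check and guard at the call site instead) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in and not parquet-mr code: {{NullFilterSketch}}, {{countMatchingRows}} and the use of {{LongPredicate}} as the "filter" are assumptions made for illustration, and the two read methods only borrow their names from the issue text.

```java
import java.util.function.LongPredicate;

// Hypothetical stand-ins for illustration only; none of this is parquet-mr code.
final class NullFilterSketch {

    // Plays the role of a filter implementation that, like the filters named
    // in the comment, uses its 'filter' argument without a null check.
    static long countMatchingRows(LongPredicate filter, long rowCount) {
        long matches = 0;
        for (long row = 0; row < rowCount; row++) {
            if (filter.test(row)) { // NullPointerException here if filter is null
                matches++;
            }
        }
        return matches;
    }

    // Unfiltered path: every row in the row group is read.
    static long readNextRowGroup(long rowCount) {
        return rowCount;
    }

    // Call site that handles the null filter itself, so the filter
    // implementation above does not need its own null check.
    static long readNextFilteredRowGroup(LongPredicate filter, long rowCount) {
        if (filter == null) {
            return readNextRowGroup(rowCount); // no filter set: fall back
        }
        return countMatchingRows(filter, rowCount);
    }

    public static void main(String[] args) {
        // No filter configured: the caller falls back to the unfiltered read.
        System.out.println(readNextFilteredRowGroup(null, 100));
        // Filter configured: only even row indexes are counted.
        System.out.println(readNextFilteredRowGroup(row -> row % 2 == 0, 100));
    }
}
```

With this shape the user can always call the filtered read method, whether or not a filter was set, which addresses the concern raised at the end of the issue description.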