[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex

Ryan Blue (Jira) Mon, 24 Aug 2020 10:22:43 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183481#comment-17183481
 ]


Ryan Blue commented on PARQUET-1901:
------------------------------------

It isn't clear to me how a filter implementation would handle the filter itself 
being null. It could return a default value to accept/read, but that runs into 
issues when filters like {{not(null)}} are passed in. So I agree with Gabor 
that it makes sense for a null filter to be an exceptional case in the filter 
implementations themselves.
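
A minimal sketch of why a "null means accept" default is awkward, using hypothetical stand-in types ({{Pred}}, {{evalLenient}}, {{evalStrict}}) rather than the parquet-mr filter API. Under the lenient rule a null filter accepts every value, but wrapping that same null in a negation silently rejects every value, which is probably not what the caller meant:

```java
import java.util.Objects;

// Hypothetical sketch; none of these types are the parquet-mr filter2 API.
final class NullFilterAmbiguity {
    // Lenient evaluation: a null filter is treated as "accept every value".
    static boolean evalLenient(Pred filter, int value) {
        return filter == null || filter.keep(value);
    }

    // Negation built on top of the lenient rule.
    static Pred not(Pred inner) {
        return value -> !evalLenient(inner, value);
    }

    // Strict evaluation: a null filter is an error, not a default.
    static boolean evalStrict(Pred filter, int value) {
        Objects.requireNonNull(filter, "filter must not be null");
        return filter.keep(value);
    }

    public static void main(String[] args) {
        System.out.println(evalLenient(null, 42));      // null filter accepts the value
        System.out.println(evalLenient(not(null), 42)); // negated null filter rejects it
    }
}

// Hypothetical single-value predicate standing in for a filter.
interface Pred {
    boolean keep(int value);
}
```

Failing fast, as {{evalStrict}} does, avoids having to pick a default that flips meaning under negation.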

But I would expect a method like {{calculateRowRanges}} to correctly return the 
default {{RowRanges.createSingle(rowCount)}} if that method were passed a null 
value, since it is not actually processing the filter.
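
The behavior described above can be sketched as follows. This is an illustration under assumptions, not the parquet-mr code: {{RowRangesSketch}}, {{FilterSketch}}, and the two-argument {{calculateRowRanges}} are hypothetical stand-ins, and only the idea of returning a single range covering all rows comes from the comment.

```java
// Hypothetical sketch; these are NOT the parquet-mr classes or signatures.
final class NullSafeRowRanges {
    // With no filter there is nothing to evaluate, so every row is selected.
    static RowRangesSketch calculateRowRanges(FilterSketch filter, long rowCount) {
        if (filter == null) {
            return RowRangesSketch.createSingle(rowCount);
        }
        return filter.apply(rowCount);
    }

    public static void main(String[] args) {
        RowRangesSketch all = calculateRowRanges(null, 100);
        System.out.println(all.rowCount()); // all 100 rows are kept
    }
}

// Stand-in for a row-range result: one contiguous, inclusive range.
final class RowRangesSketch {
    final long from;
    final long to;

    RowRangesSketch(long from, long to) {
        this.from = from;
        this.to = to;
    }

    // A single range covering rows 0 .. rowCount - 1.
    static RowRangesSketch createSingle(long rowCount) {
        return new RowRangesSketch(0, rowCount - 1);
    }

    long rowCount() {
        return to - from + 1;
    }
}

// Stand-in for a filter that narrows the rows of a row group.
interface FilterSketch {
    RowRangesSketch apply(long rowCount);
}
```

The null check sits in the entry point only; the filter implementations themselves never see a null.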

For Iceberg, I'm wondering if it wouldn't be easier to write our own filter 
implementation that produces row ranges and passes them in. That's how we 
filter row groups, and I think it has been much easier not needing to convert 
to Parquet filters, which are difficult to work with.

> Add filter null check for ColumnIndex  
> ---------------------------------------
>
>                 Key: PARQUET-1901
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1901
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> This Jira is opened to discuss whether we should add a null check for the 
> filter when ColumnIndex is enabled. 
> In the ColumnIndexFilter#calculateRowRanges() method, the input parameter 
> 'filter' is assumed to be non-null and is not checked. It throws an NPE when 
> ColumnIndex is enabled (the default) but no filter is set in the 
> ParquetReadOptions. The call stack is below. 
>     java.lang.NullPointerException
>         at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
>         at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
>         at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:891)
> If we don't add the check, users have to choose between readNextRowGroup() 
> and readNextFilteredRowGroup() depending on whether a filter is set. 
> Thoughts?  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
