Matt DePero created ARROW-16638:
-----------------------------------

             Summary: [Go][Parquet] Boolean column reader fails to skip rows
                 Key: ARROW-16638
                 URL: https://issues.apache.org/jira/browse/ARROW-16638
             Project: Apache Arrow
          Issue Type: Bug
          Components: Go
            Reporter: Matt DePero
             Fix For: 9.0.0


Skipping values in the Go Parquet column reader is effectively implemented by 
reading the target number of rows into scratch space which is then discarded. 
In the boolean case, 
[BytesRequired|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader.go#L439]
 sizes the scratch buffer at one bit per row; however, that 
[same scratch 
space|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213]
 is also used for `defLvls` and `repLvls` (both int16), which 
require two bytes per row. Since the boolean `values` buffer is not large 
enough to hold the same number of rows' worth of def and rep levels, skipping 
too many rows results in an index-out-of-bounds panic.
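
The size mismatch can be illustrated with a small standalone sketch. It does not use the Arrow packages: `boolScratchBytes` and `levelCapacity` are hypothetical helpers that only mirror the arithmetic described above (one bit per row for boolean values, two bytes per row for int16 levels), so the real sizing code may differ in detail.

```go
package main

import "fmt"

// boolScratchBytes is a hypothetical stand-in for the boolean reader's
// scratch sizing: one bit per row, rounded up to whole bytes.
func boolScratchBytes(rows int) int {
	return (rows + 7) / 8
}

// levelCapacity reports how many int16 levels fit in a scratch buffer of
// the given size, at two bytes per level.
func levelCapacity(scratchBytes int) int {
	return scratchBytes / 2
}

func main() {
	for _, rows := range []int{1, 16, 1024} {
		scratch := boolScratchBytes(rows)
		fmt.Printf("rows=%d scratch=%dB levels needed=%d levels that fit=%d\n",
			rows, scratch, rows, levelCapacity(scratch))
	}
}
```

Under these assumptions, a scratch buffer sized for 1024 boolean rows is 128 bytes, which holds only 64 int16 levels, so indexing levels for the remaining rows would run past the end of the buffer.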

 

Note that for other column types, this does not seem to be an issue, since the 
buffer needed for `values` is always larger than the buffer needed for def and 
rep levels. However, there still seems to be no reason to pass any non-nil 
value to `cr.ReadBatch(...)` for [rep and def 
lvls|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213]
 when skipping any column in the reader.
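
A rough sketch of that suggestion follows. Everything here is hypothetical and simplified for illustration: `batchReader` and `fakeReader` are not the real Arrow types, the `ReadBatch` signature is reduced, and the sketch assumes the reader tolerates nil level slices. The point is only that a skip loop needs a values-only scratch buffer.

```go
package main

import "fmt"

// batchReader is a hypothetical, simplified stand-in for a typed column
// reader; it is not the real Arrow interface.
type batchReader interface {
	ReadBatch(batchSize int64, values []bool, defLvls, repLvls []int16) (int64, error)
}

// fakeReader pretends to hold a number of rows and writes levels only
// when the caller supplies non-nil level slices.
type fakeReader struct{ remaining int64 }

func (f *fakeReader) ReadBatch(batchSize int64, values []bool, defLvls, repLvls []int16) (int64, error) {
	n := batchSize
	if n > f.remaining {
		n = f.remaining
	}
	if n > int64(len(values)) {
		n = int64(len(values))
	}
	for i := int64(0); i < n; i++ {
		if defLvls != nil {
			defLvls[i] = 0 // would panic if defLvls were shorter than n
		}
		if repLvls != nil {
			repLvls[i] = 0
		}
	}
	f.remaining -= n
	return n, nil
}

// skip discards up to nvalues rows by reading them into a values-only
// scratch buffer and passing nil for both level slices.
func skip(r batchReader, nvalues int64) (int64, error) {
	const chunk = 1024
	scratch := make([]bool, chunk)
	var skipped int64
	for skipped < nvalues {
		want := nvalues - skipped
		if want > chunk {
			want = chunk
		}
		n, err := r.ReadBatch(want, scratch, nil, nil)
		if err != nil {
			return skipped, err
		}
		if n == 0 {
			break // reader exhausted
		}
		skipped += n
	}
	return skipped, nil
}

func main() {
	r := &fakeReader{remaining: 5000}
	n, err := skip(r, 3000)
	fmt.Println(n, err, r.remaining)
}
```

Because no level slices are passed, the scratch buffer only has to be large enough for the values themselves, regardless of column type.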



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
