[ https://issues.apache.org/jira/browse/ARROW-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-11518:
------------------------------------
    Fix Version/s:     (was: 5.0.0)
                   6.0.0

> [C++] [Parquet] Parquet reader crashes when reading boolean columns
> -------------------------------------------------------------------
>
>                 Key: ARROW-11518
>                 URL: https://issues.apache.org/jira/browse/ARROW-11518
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 3.0.0
>            Reporter: Andrey Klochkov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Parquet file reader crashes while reading boolean columns in
> {{TypedColumnReaderImpl<DType>::Skip}}.
> The calculation of the buffer size in the code below is not correct as
> {{value_byte_size}} is 1 for booleans, and the same buffer is used for
> definition and repetition levels data which requires 2 bytes per value.
> {code}
> // This will be enough scratch space to accommodate 16-bit levels or any
> // value type
> std::shared_ptr<ResizableBuffer> scratch = AllocateBuffer(
>     this->pool_, batch_size * type_traits<DType::type_num>::value_byte_size);
> do {
>   batch_size = std::min(batch_size, rows_to_skip);
>   values_read =
>       ReadBatch(static_cast<int>(batch_size),
>                 reinterpret_cast<int16_t*>(scratch->mutable_data()),
>                 reinterpret_cast<int16_t*>(scratch->mutable_data()),
>                 reinterpret_cast<T*>(scratch->mutable_data()),
>                 &values_read);
>   rows_to_skip -= values_read;
> } while (values_read > 0 && rows_to_skip > 0);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
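
To make the size mismatch in the description concrete, here is a minimal standalone sketch of the arithmetic. It is not the Arrow source and not necessarily the fix that was merged. The helper names {{ReportedScratchBytes}} and {{ScratchBytes}} are hypothetical, and the sketch only assumes what the report states: levels are read as 16-bit integers into the same buffer as the values, and {{value_byte_size}} is 1 for booleans.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the sizing described in the report, which only
// accounts for the width of the value type.
constexpr std::size_t ReportedScratchBytes(std::size_t batch_size,
                                           std::size_t value_byte_size) {
  return batch_size * value_byte_size;
}

// Hypothetical helper: size the shared scratch buffer by the wider of the
// 16-bit level type and the value type, so either can be written into it.
constexpr std::size_t ScratchBytes(std::size_t batch_size,
                                   std::size_t value_byte_size) {
  return batch_size * std::max(sizeof(std::int16_t), value_byte_size);
}

// Bytes that batch_size 16-bit definition or repetition levels occupy.
constexpr std::size_t LevelBytes(std::size_t batch_size) {
  return batch_size * sizeof(std::int16_t);
}

// Booleans (value_byte_size == 1): the reported sizing is half of what the
// levels need, while the max-based sizing covers them.
static_assert(ReportedScratchBytes(1024, 1) < LevelBytes(1024),
              "1-byte values leave the level writes short of space");
static_assert(ScratchBytes(1024, 1) >= LevelBytes(1024),
              "max-based sizing fits 16-bit levels");
// Wider types (for example 8-byte values) are unaffected by the change.
static_assert(ScratchBytes(1024, 8) == ReportedScratchBytes(1024, 8),
              "sizing is unchanged when values are wider than levels");
```

With a batch of 1024 boolean values, the reported sizing allocates 1024 bytes, while writing 1024 16-bit levels touches 2048 bytes, which is consistent with the crash in {{Skip}} described above.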