Andrey Klochkov created ARROW-11518:
---------------------------------------

             Summary: [C++] [Parquet] Parquet reader crashes when reading boolean columns
                 Key: ARROW-11518
                 URL: https://issues.apache.org/jira/browse/ARROW-11518
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 3.0.0
            Reporter: Andrey Klochkov

The Parquet file reader crashes while reading boolean columns in {{TypedColumnReaderImpl<DType>::Skip}}. The scratch buffer allocated in the code below is too small for booleans. Its size is {{batch_size * value_byte_size}}, and {{value_byte_size}} is 1 for booleans. However, the same buffer also receives the definition and repetition levels, which are 16-bit ({{int16_t}}) and need 2 bytes per value. As a result, {{ReadBatch}} can write up to twice as many bytes as were allocated.

{code}
// This will be enough scratch space to accommodate 16-bit levels or any
// value type
std::shared_ptr<ResizableBuffer> scratch = AllocateBuffer(
    this->pool_, batch_size * type_traits<DType::type_num>::value_byte_size);

do {
  batch_size = std::min(batch_size, rows_to_skip);
  values_read = ReadBatch(static_cast<int>(batch_size),
                          reinterpret_cast<int16_t*>(scratch->mutable_data()),
                          reinterpret_cast<int16_t*>(scratch->mutable_data()),
                          reinterpret_cast<T*>(scratch->mutable_data()), &values_read);
  rows_to_skip -= values_read;
} while (values_read > 0 && rows_to_skip > 0);
{code}
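The sizing problem can be illustrated with a small standalone sketch that does not depend on Arrow. The helpers {{BuggyScratchBytes}}, {{ScratchBytes}} and {{LevelBytes}} are hypothetical names used only for this example. The sketch assumes one possible remedy: size the buffer for the widest element it will hold, i.e. the larger of {{value_byte_size}} and {{sizeof(int16_t)}}. This is not necessarily the patch that was applied upstream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Arrow API): the size the quoted code allocates.
// Only the value width is considered, so for booleans (1 byte per value)
// the buffer holds just batch_size bytes.
constexpr std::int64_t BuggyScratchBytes(std::int64_t batch_size,
                                         std::int64_t value_byte_size) {
  return batch_size * value_byte_size;
}

// Hypothetical helper: bytes written when the buffer is filled with
// definition or repetition levels, one int16_t per value.
constexpr std::int64_t LevelBytes(std::int64_t batch_size) {
  return batch_size * static_cast<std::int64_t>(sizeof(std::int16_t));
}

// Hypothetical helper (assumed fix): the buffer is shared by levels and
// values, so size it for whichever element type is wider.
constexpr std::int64_t ScratchBytes(std::int64_t batch_size,
                                    std::int64_t value_byte_size) {
  return batch_size *
         std::max<std::int64_t>(value_byte_size, sizeof(std::int16_t));
}
```

For example, with a batch of 1024 booleans the original formula allocates 1024 bytes, while the levels alone need 2048. Taking the maximum allocates 2048 bytes and leaves wider types unchanged: an 8-byte value type still gets {{batch_size * 8}} bytes.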