wgtmac commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1212894156
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1126,6 +1126,39 @@ inline int DecodePlain<ByteArray>(const uint8_t* data, int64_t data_size, int nu
   return bytes_decoded;
 }
 
+static inline int64_t ReadLargeByteArray(const uint8_t* data, int64_t data_size,
+                                         LargeByteArray* out) {
+  if (ARROW_PREDICT_FALSE(data_size < 4)) {
+    ParquetException::EofException();
+  }
+  const int32_t len = SafeLoadAs<int32_t>(data);
+  if (len < 0) {
+    throw ParquetException("Invalid BYTE_ARRAY value");
+  }
+  const int64_t consumed_length = static_cast<int64_t>(len) + 4;
+  if (ARROW_PREDICT_FALSE(data_size < consumed_length)) {
+    ParquetException::EofException();
+  }
+  *out = LargeByteArray{static_cast<uint32_t>(len), data + 4};
+  return consumed_length;
+}
+
+template <>
+inline int DecodePlain<LargeByteArray>(const uint8_t* data, int64_t data_size, int num_values,

Review Comment:
IMO, decoding plain binaries to ByteArray* is sufficient. It is the job of LargeBinaryBuilder to convert ByteArray to arrow::LargeBinary.

-- 
This is an automated message from the Apache Git Service.
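A minimal sketch of the split suggested in the comment, assuming PLAIN-encoded BYTE_ARRAY values (a 4-byte little-endian length prefix followed by the bytes). The decoder only produces non-owning (len, ptr) views, and a separate builder with 64-bit offsets copies them into the large-binary layout. `ByteArrayView`, `DecodePlainByteArrays` and `ToyLargeBinaryBuilder` are hypothetical stand-ins for illustration only, not the parquet or Arrow API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for parquet::ByteArray: a non-owning view into the page buffer.
struct ByteArrayView {
  uint32_t len;
  const uint8_t* ptr;
};

// Hypothetical PLAIN decoder: reads num_values BYTE_ARRAY values, each a 4-byte
// little-endian length followed by that many bytes. Returns bytes consumed.
int64_t DecodePlainByteArrays(const uint8_t* data, int64_t data_size, int num_values,
                              std::vector<ByteArrayView>* out) {
  int64_t pos = 0;
  for (int i = 0; i < num_values; ++i) {
    if (data_size - pos < 4) throw std::runtime_error("unexpected end of data");
    // Assemble the little-endian length explicitly so host byte order does not matter.
    uint32_t raw = 0;
    for (int b = 3; b >= 0; --b) raw = (raw << 8) | data[pos + b];
    // Reject lengths that would be negative when read as int32, as the PR's check does.
    if (raw > static_cast<uint32_t>(std::numeric_limits<int32_t>::max())) {
      throw std::runtime_error("invalid BYTE_ARRAY length");
    }
    const int64_t consumed = 4 + static_cast<int64_t>(raw);
    if (data_size - pos < consumed) throw std::runtime_error("unexpected end of data");
    out->push_back(ByteArrayView{raw, data + pos + 4});
    pos += consumed;
  }
  return pos;
}

// Hypothetical builder with 64-bit offsets, standing in for arrow::LargeBinaryBuilder.
// It owns the copied bytes, so the decoder never needs to know the output type.
class ToyLargeBinaryBuilder {
 public:
  void Append(const ByteArrayView& v) {
    data_.insert(data_.end(), v.ptr, v.ptr + v.len);
    offsets_.push_back(static_cast<int64_t>(data_.size()));
  }
  int64_t length() const { return static_cast<int64_t>(offsets_.size()) - 1; }
  std::string Value(int64_t i) const {
    assert(i >= 0 && i < length());
    return std::string(data_.begin() + offsets_[i], data_.begin() + offsets_[i + 1]);
  }

 private:
  std::vector<int64_t> offsets_{0};  // offsets_[i]..offsets_[i + 1] delimit value i
  std::vector<uint8_t> data_;
};
```

Decoding the page bytes `{2,0,0,0,'a','b',3,0,0,0,'x','y','z'}` with `num_values = 2` consumes 13 bytes and yields views of lengths 2 and 3, which the builder stores as "ab" and "xyz". The point of the split: a single PLAIN value's length is bounded by its 4-byte prefix, so a 32-bit length in the view is always enough; only the cumulative offsets need 64 bits, and those live in the builder.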