wgtmac commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1212894156


##########
cpp/src/parquet/encoding.cc:
##########
@@ -1126,6 +1126,39 @@ inline int DecodePlain<ByteArray>(const uint8_t* data, int64_t data_size, int nu
   return bytes_decoded;
 }
 
+static inline int64_t ReadLargeByteArray(const uint8_t* data, int64_t data_size,
+                                    LargeByteArray* out) {
+  if (ARROW_PREDICT_FALSE(data_size < 4)) {
+    ParquetException::EofException();
+  }
+  const int32_t len = SafeLoadAs<int32_t>(data);
+  if (len < 0) {
+    throw ParquetException("Invalid BYTE_ARRAY value");
+  }
+  const int64_t consumed_length = static_cast<int64_t>(len) + 4;
+  if (ARROW_PREDICT_FALSE(data_size < consumed_length)) {
+    ParquetException::EofException();
+  }
+  *out = LargeByteArray{static_cast<uint32_t>(len), data + 4};
+  return consumed_length;
+}
+
+template <>
+inline int DecodePlain<LargeByteArray>(const uint8_t* data, int64_t data_size, int num_values,

Review Comment:
   IMO, decoding plain binaries to ByteArray* is sufficient. It is the job of LargeBinaryBuilder to convert ByteArray to arrow::LargeBinary.
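A minimal, dependency-free sketch of the split suggested above, under stated assumptions. Each PLAIN `BYTE_ARRAY` value carries its own 4-byte length prefix (read as `int32_t` in the diff), so a single value always fits in a 32-bit-length view; only the *cumulative* offsets can outgrow 32 bits, and that widening can live in the builder. `ByteArrayView`, `ReadByteArray`, `LargeBinaryBuilderSketch`, and `DecodePlainInto` are hypothetical names, not Arrow's API, and the `memcpy` length load assumes a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for parquet::ByteArray: 32-bit length + pointer
// into the page buffer (no copy).
struct ByteArrayView {
  uint32_t len;
  const uint8_t* ptr;
};

// Decode one PLAIN BYTE_ARRAY value: 4-byte length prefix, then `len` bytes.
// Returns bytes consumed. Assumes a little-endian host for the length load.
int64_t ReadByteArray(const uint8_t* data, int64_t data_size, ByteArrayView* out) {
  if (data_size < 4) throw std::runtime_error("unexpected end of page");
  int32_t len;
  std::memcpy(&len, data, sizeof(len));
  if (len < 0) throw std::runtime_error("Invalid BYTE_ARRAY value");
  const int64_t consumed = static_cast<int64_t>(len) + 4;
  if (data_size < consumed) throw std::runtime_error("unexpected end of page");
  *out = ByteArrayView{static_cast<uint32_t>(len), data + 4};
  return consumed;
}

// Hypothetical builder with 64-bit offsets: the total payload may exceed
// 2 GiB even though every individual value fits in a 32-bit length.
struct LargeBinaryBuilderSketch {
  std::vector<int64_t> offsets{0};
  std::vector<uint8_t> data;
  void Append(const ByteArrayView& v) {
    data.insert(data.end(), v.ptr, v.ptr + v.len);
    offsets.push_back(static_cast<int64_t>(data.size()));
  }
};

// The decoder only produces 32-bit views; the builder does the widening.
int DecodePlainInto(const uint8_t* data, int64_t data_size, int num_values,
                    LargeBinaryBuilderSketch* builder) {
  int64_t pos = 0;
  for (int i = 0; i < num_values; ++i) {
    ByteArrayView v;
    pos += ReadByteArray(data + pos, data_size - pos, &v);
    builder->Append(v);
  }
  return static_cast<int>(pos);  // bytes decoded, mirroring DecodePlain
}
```

With this split, no `LargeByteArray` type or `DecodePlain<LargeByteArray>` specialization is needed: the same 32-bit views feed either a 32-bit-offset or a 64-bit-offset builder.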



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
