pitrou commented on code in PR #48215:
URL: https://github.com/apache/arrow/pull/48215#discussion_r2556081992


##########
cpp/src/parquet/arrow/reader_internal.cc:
##########
@@ -851,7 +851,41 @@ Status TransferHalfFloat(RecordReader* reader, MemoryPool* pool,
   std::shared_ptr<ChunkedArray> chunked_array;
   RETURN_NOT_OK(
       TransferBinary(reader, pool, field->WithType(binary_type), &chunked_array));
+#if ARROW_LITTLE_ENDIAN
   ARROW_ASSIGN_OR_RAISE(*out, chunked_array->View(field->type()));
+#else
+  // Convert little-endian bytes from Parquet to native-endian HalfFloat

Review Comment:
   I would favor a different approach: turn `TransferBinary` into:
   ```c++
   Status TransferBinary(RecordReader* reader, MemoryPool* pool,
                         const std::shared_ptr<Field>& logical_type_field,
                         std::function<Result<std::shared_ptr<Array>>(std::shared_ptr<Array>)> array_process,
                         std::shared_ptr<ChunkedArray>* out) {
   ```
   
   such that the optional `array_process` is called for each chunk. If carefully coded,
   this will help limit memory consumption by disposing of old chunks while creating the new ones.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.