jonded94 commented on code in PR #8790:
URL: https://github.com/apache/arrow-rs/pull/8790#discussion_r2582588902


##########
arrow-array/src/ffi_stream.rs:
##########
@@ -364,7 +364,13 @@ impl Iterator for ArrowArrayStreamReader {
             let result = unsafe {
                 from_ffi_and_data_type(array, 
DataType::Struct(self.schema().fields().clone()))
             };
-            Some(result.map(|data| RecordBatch::from(StructArray::from(data))))
+            Some(result.map(|data| {
+                let struct_array = StructArray::from(data);

Review Comment:
   >> What is current on main is pretty elegant
   >
   > `RecordBatch::from(StructArray::from(data)))`
   
   I personally wouldn't call this elegant, as it's actually *currently* 
leading to unsoundless problems which I already stated a few times in this 
discussion :grimacing: 
   
   1. Since `StructArray` is metadata-less, the `RecordBatch` constructed from 
it is metadata-less by definition too. This means, the returned `RecordBatch` 
can't ever have the schema advertised by `ArrowArrayStreamReader`, at least if 
it's supposed to have metadata.
   2. Much more importantly: There currently are no checks whether the 
`RecordBatch` constructed from `StructArray` and returned by the stream reader 
even corresponds to the schema which `ArrowArrayStreamReader` advertises. They 
are just returned as-is, and in prinicple it's possible to make this interface 
return an entirely different `RecordBatch`. This was the cause of this PR, 
namely that it's currently returning unsound `RecordBatch` without advertising 
this as an error somewhere.
   
   The current state with `unsafe` in this PR more or less leaves problem 2 
intact (with arguably introducing a chance of an additional invariant 
incorrectness in the `RecordBatch` itself), but at least fixes problem 1.
   
   Nevertheless, I will proceed with introducing schema checking now and remove 
`unsafe`, which would also fix problem 2 in general, with a tiny additional 
compute cost.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to