jleibs commented on issue #7628:
URL: https://github.com/apache/arrow-rs/issues/7628#issuecomment-2970823649

   It's a bit orthogonal to the specific request, but I believe the motivating 
factors for @emilk's request here bring up a deeper source of confusion.
    
   When we talk about attaching metadata to a `RecordBatch` I believe there are 
two separate things we could be talking about:
   - Message metadata:
     - https://github.com/apache/arrow/blob/2ba455f17e7cdbfe2b2f1aa3dfb2bf00878a17e1/format/Message.fbs#L154
   - Schema metadata:
     - https://github.com/apache/arrow/blob/2ba455f17e7cdbfe2b2f1aa3dfb2bf00878a17e1/format/Schema.fbs#L565
   
   The proposed solution reinforces this confusion. I would expect 
`RecordBatch::metadata_mut()` to modify the former, but the proposed solution 
would in fact modify the latter.
   
   It's not even clear to me whether arrow-rs tracks or exposes the former (message metadata) anywhere.
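
To make the distinction concrete, here is a minimal sketch using hypothetical stand-in types. `SchemaModel`, `RecordBatchMessage`, `kv`, and `example_stream` are names invented for illustration and are not part of arrow-rs. The sketch only assumes that both the `Schema` and `Message` flatbuffer tables carry their own `custom_metadata` key/value list.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `Schema` table in Schema.fbs.
#[derive(Debug, Clone, PartialEq)]
struct SchemaModel {
    field_names: Vec<String>,
    /// Schema-level `custom_metadata`: describes the stream as a whole.
    custom_metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for a `Message` in Message.fbs that carries one record batch.
#[derive(Debug, Clone, PartialEq)]
struct RecordBatchMessage {
    /// Message-level `custom_metadata`: independent of the schema.
    custom_metadata: HashMap<String, String>,
    num_rows: usize,
}

/// Helper that builds a single-entry metadata map.
fn kv(key: &str, value: &str) -> HashMap<String, String> {
    HashMap::from([(key.to_string(), value.to_string())])
}

/// One schema followed by two batch messages. The schema metadata is shared,
/// while each message carries its own, different metadata.
fn example_stream() -> (SchemaModel, Vec<RecordBatchMessage>) {
    let schema = SchemaModel {
        field_names: vec!["x".to_string()],
        custom_metadata: kv("origin", "sensor-a"),
    };
    let messages = vec![
        RecordBatchMessage { custom_metadata: kv("chunk_id", "0"), num_rows: 3 },
        RecordBatchMessage { custom_metadata: kv("chunk_id", "1"), num_rows: 5 },
    ];
    (schema, messages)
}
```

Under this model, a method named `metadata_mut()` on a batch could plausibly refer to either map, which is the ambiguity described above.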
   
   
   A corollary to this is that, even after reading the docs, I can't say 
whether it's considered an error for several RecordBatches in the same logical 
stream to contain different schemas / metadata values. 
   
   The existence of `schema()` on interfaces like `RecordBatchReader`, to me, 
suggests many implementers would likely expect some guarantee along the lines 
of:
   ```
   // Capture the schema first: iterating consumes `reader`.
   let schema = reader.schema();
   for batch in reader {
     assert_eq!(schema, batch?.schema());
   }
   ```
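
As a rough illustration of what such a guarantee would have to check, here is a self-contained sketch with a hypothetical `StreamSchema` type. It is not the arrow-rs `Schema`, and `schema_with` and `first_schema_mismatch` are invented helper names. It assumes that schema equality covers both fields and metadata, so a batch that differs only in a metadata value counts as a mismatch.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a schema: field names plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
struct StreamSchema {
    field_names: Vec<String>,
    metadata: BTreeMap<String, String>,
}

/// Helper that builds a schema from string slices.
fn schema_with(fields: &[&str], metadata: &[(&str, &str)]) -> StreamSchema {
    StreamSchema {
        field_names: fields.iter().map(|f| f.to_string()).collect(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// Returns the index of the first batch whose schema (fields or metadata)
/// differs from the schema declared by the stream, or `None` if all match.
fn first_schema_mismatch(
    declared: &StreamSchema,
    batch_schemas: &[StreamSchema],
) -> Option<usize> {
    batch_schemas.iter().position(|s| s != declared)
}
```

Whether a mismatch like this should be treated as an error, or silently tolerated, is exactly the question the docs leave open.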
   
   Some of the validation questions have been touched on here:
   - https://github.com/apache/arrow-rs/pull/4800
   - https://github.com/apache/arrow-rs/issues/4801
   
   But those don't seem to bring up the topic of "message metadata" either.

