jleibs commented on issue #7628: URL: https://github.com/apache/arrow-rs/issues/7628#issuecomment-2970823649
It's a bit orthogonal to the specific request, but I believe the motivating factors for @emilk's request here bring up a deeper, confusing issue.

When we talk about attaching metadata to a `RecordBatch`, I believe there are two separate things we could be talking about:
- Message metadata:
  - https://github.com/apache/arrow/blob/2ba455f17e7cdbfe2b2f1aa3dfb2bf00878a17e1/format/Message.fbs#L154
- Schema metadata:
  - https://github.com/apache/arrow/blob/2ba455f17e7cdbfe2b2f1aa3dfb2bf00878a17e1/format/Schema.fbs#L565

The proposed solution reinforces this confusion. I would expect `RecordBatch::metadata_mut()` to modify the former, but the proposed solution would in fact modify the latter. It's not even clear to me whether arrow-rs tracks or exposes the former (message metadata) anywhere.

A corollary to this is that, even after reading the docs, I can't say whether it's considered an error for several RecordBatches in the same logical stream to contain different schemas / metadata values. The existence of `schema()` on interfaces like `RecordBatchReader` suggests, to me, that many implementers would likely expect some guarantee along the lines of:

```
let schema = reader.schema();
for batch in reader {
    // each item is a Result<RecordBatch, ArrowError>
    assert_eq!(schema, batch?.schema());
}
```

Some of the validation questions have been touched on here:
- https://github.com/apache/arrow-rs/pull/4800
- https://github.com/apache/arrow-rs/issues/4801

But those don't seem to bring up the topic of "message metadata" either.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org