> I think in this particular case, we should consider the C ABI /
> in-memory representation and IPC format as separate beasts. If an
> implementation of Arrow does not want to use this string-view array
> type at all (for example, if it created memory safety issues in Rust),
> then it can choose to convert to the existing string array
> representation when receiving a C ABI payload. Whether or not there is
> an alternate IPC format for this data type seems like a separate
> question -- my preference actually would be to support this for
> in-memory / C ABI use but not to alter the IPC format.
>

I think this idea deserves some clarification, or at least more exposition.
On first reading, it was not clear to me that adding things to the
in-memory Arrow format but not to IPC was even an option. I'm
guessing I'm not the only one who missed that.

If these new types are only part of the Arrow in-memory format, then we can
no longer say that reading/writing IPC files involves no serialization
overhead: data in the new types would have to be converted to a type that
IPC supports. I recognize that that's technically already the case since IPC
supports compression now, but it's not generally how we talk about the
relationship between the IPC and in-memory formats (see our own FAQ [1],
for example). If we go forward with these changes, it would be a good
opportunity for us to clarify in our docs/website that the "Arrow format"
is not a single thing.
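
To make the conversion cost concrete, here is a rough sketch in plain Python.
It assumes a deliberately simplified model of a string-view array (a list of
(buffer index, offset, length) entries), not the actual proposed layout, and
the function name views_to_offsets is hypothetical. The point is only that
producing the existing offsets-plus-data string representation means copying
every value:

```python
# Simplified model, for illustration only: each view is a
# (buffer_index, offset, length) triple pointing into one of several
# data buffers. The real proposal differs in detail.

def views_to_offsets(views, buffers):
    """Convert view entries to the (offsets, data) layout of the existing string array."""
    offsets = [0]
    data = bytearray()
    for buf_idx, start, length in views:
        # Each value's bytes are copied into one contiguous data buffer.
        data += buffers[buf_idx][start:start + length]
        offsets.append(len(data))
    return offsets, bytes(data)


buffers = [b"hello world", b"arrow"]
views = [(0, 6, 5), (1, 0, 5), (0, 0, 5)]  # "world", "arrow", "hello"
offsets, data = views_to_offsets(views, buffers)
```

In this toy model the work is proportional to the total string bytes, which
is the kind of overhead we usually say IPC avoids.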

Neal

[1]: https://arrow.apache.org/faq/
