I recently noticed that the Java implementation exposes set/setSafe
functions that allow mutating Arrow Arrays [1].

This seems to be at odds with the general design of the C++ library (and,
consequently, the Python and R libraries), where Arrays are immutable and
can be modified only through compute functions that return copies.

The Arrow Format documentation [2] seems to suggest that mutation of data
structures is possible and left as an implementation detail. Some users
might want to mutate existing structures (for example, to avoid the memory
cost of copies when dealing with big arrays), so I think there are reasons
both for allowing mutation of Arrays and for disallowing it. Still, it
probably makes sense to ensure that all the implementations agree on such a
fundamental choice, to avoid setting expectations on the users' side that
might not hold when they cross language barriers.

[1] https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/SmallIntVector.html#setSafe-int-int-
[2] https://arrow.apache.org/docs/format/Columnar.html
