I think the C data interface requires arrays to be immutable. Otherwise, two implementations will race when one mutates and the other reads the shared memory regions, since we have no mechanism to synchronize read/write access across the boundary.
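
To illustrate the concern, here is a minimal single-process sketch using only the Python standard library. It does not use Arrow or the C data interface itself: the `array` object is a hypothetical stand-in for an exported values buffer, and the names `producer_view` and `consumer_view` are made up for the example. It only shows that a zero-copy consumer observes the producer's writes with nothing coordinating the two sides.

```python
from array import array

# Hypothetical stand-in for a values buffer shared across the boundary:
# four int16 values living in a single memory region.
values = array("h", [1, 2, 3, 4])

# The "producer" keeps a writable view; the "consumer" receives a
# zero-copy, read-only view over the same memory (no bytes are copied).
producer_view = memoryview(values)
consumer_view = memoryview(values).toreadonly()

# The consumer reads the data once.
before = sum(consumer_view)

# The producer mutates the buffer in place, as a set/setSafe-style call would.
producer_view[0] = 100

# The consumer's view now reflects the write, although the consumer
# never requested new data and cannot tell when the change happened.
after = sum(consumer_view)

print(before, after)
```

With real threads or two runtimes on either side of the boundary, the write could land in the middle of the consumer's read, which is the race I mean.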

Best,
Jorge

On Wed, Nov 3, 2021 at 1:50 PM Alessandro Molina <alessan...@ursacomputing.com> wrote:

> I recently noticed that in the Java implementation we expose a set/setSafe
> function that allows to mutate Arrow Arrays [1]
>
> This seems to be at odds with the general design of the C++ (and by
> consequence Python and R) library where Arrays are immutable and can be
> modified only through compute functions returning copies.
>
> The Arrow Format documentation [2] seems to suggest that mutation of data
> structures is possible and left as an implementation detail, but given that
> some users might be willing to mutate existing structures (for example to
> avoid incurring in the memory cost of copies when dealing with big arrays)
> I think there might be reasons for both allowing mutation of Arrays and
> disallowing it. It probably makes sense to ensure that all the
> implementations agree on such a fundamental choice to avoid setting
> expectations on users' side that might not apply when they cross language
> barriers.
>
> [1] https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/SmallIntVector.html#setSafe-int-int-
> [2] https://arrow.apache.org/docs/format/Columnar.html