I think the C Data Interface requires arrays to be immutable. Otherwise, two
implementations sharing the same buffers will race when one mutates a region
the other is reading, since we have no mechanism to synchronize read/write
access across the boundary.
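A minimal sketch of the hazard. It uses Python's `memoryview` as a stand-in for a buffer shared zero-copy across the C Data Interface; this is an assumption for illustration, since the real interface passes ArrowArray structs holding raw buffer pointers. The variable names are hypothetical, and the example is single-threaded, so it only shows that a producer-side write is silently visible to the consumer. With concurrent threads, that same write could interleave with the consumer's read.

```python
import struct

# The "producer" owns a buffer of native-order int16 values
# (a stand-in for an Arrow data buffer; this is not the Arrow API).
producer_buffer = bytearray(struct.pack("4h", 1, 2, 3, 4))

# The "consumer" receives a zero-copy view, as it would across the
# C Data Interface: no bytes are copied, both sides see the same memory.
consumer_view = memoryview(producer_buffer).cast("h")


def consumer_sum(view):
    """Consumer-side read of every value in the shared buffer."""
    return sum(view)


before = consumer_sum(consumer_view)

# The producer mutates in place (analogous to a setSafe-style call).
# Nothing notifies the consumer; with threads, this write could land
# in the middle of the consumer's read, which is the race.
struct.pack_into("h", producer_buffer, 0, 100)

after = consumer_sum(consumer_view)
```

The consumer computes 10 before the write and 109 after it, without ever being told the data changed.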

Best,
Jorge


On Wed, Nov 3, 2021 at 1:50 PM Alessandro Molina <
alessan...@ursacomputing.com> wrote:

> I recently noticed that the Java implementation exposes set/setSafe
> methods that allow mutating Arrow Arrays [1].
>
> This seems to be at odds with the general design of the C++ library (and,
> by extension, the Python and R libraries), where Arrays are immutable and
> can only be "modified" through compute functions that return copies.
>
> The Arrow Format documentation [2] seems to suggest that mutating data
> structures is possible and is left as an implementation detail. Since some
> users may want to mutate existing structures (for example, to avoid the
> memory cost of copying big arrays), I think there are reasons both to allow
> and to disallow mutation of Arrays. Either way, it probably makes sense for
> all implementations to agree on such a fundamental choice, so that users
> don't form expectations that no longer hold when they cross language
> barriers.
>
> [1] https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/SmallIntVector.html#setSafe-int-int-
> [2] https://arrow.apache.org/docs/format/Columnar.html
>
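To make the two designs in the quoted message concrete, here is a toy Python sketch. It is not Arrow code: `MutableInt16Vector`, `set_safe` and `with_value` are hypothetical names. `set` and `set_safe` loosely mirror the Java behaviour linked in [1], where setSafe handles an index at or beyond the current capacity by growing the buffer first. The doubling growth policy here is an assumption. `with_value` follows the immutable, copy-returning style of the C++ library.

```python
from array import array


class MutableInt16Vector:
    """Toy mutable int16 vector (hypothetical; not Arrow Java's SmallIntVector)."""

    def __init__(self, capacity):
        self.data = array("h", [0] * capacity)

    def set(self, index, value):
        # Like set(): assumes index < capacity; raises IndexError otherwise.
        self.data[index] = value

    def set_safe(self, index, value):
        # Like setSafe(): grow the buffer first if index is past capacity.
        # Doubling (or growing to index + 1, whichever is larger) is an assumption.
        if index >= len(self.data):
            new_capacity = max(index + 1, 2 * len(self.data))
            self.data.extend([0] * (new_capacity - len(self.data)))
        self.data[index] = value


def with_value(values, index, value):
    """Immutable style: return a modified copy and leave the input untouched."""
    copy = array("h", values)  # allocates a full new buffer
    copy[index] = value
    return copy


vec = MutableInt16Vector(2)
vec.set(0, 7)
vec.set_safe(5, 9)  # index 5 is past capacity 2, so the buffer grows to 6

original = array("h", [1, 2, 3])
updated = with_value(original, 0, 42)
```

Every `with_value` call copies the whole buffer. For big arrays, that is the memory cost the quoted message says some users want to avoid. In-place mutation avoids the copy, but any consumer holding the same buffer sees the change.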
