amol- commented on a change in pull request #10199:
URL: https://github.com/apache/arrow/pull/10199#discussion_r632411238
##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -2666,6 +2666,38 @@ def test_array_masked():
assert arr.type == pa.int64()
+def test_binary_array_masked():
+ # ARROW-12431
+ masked_basic = pa.array([b'\x05'], type=pa.binary(1),
+ mask=np.array([False]))
+ assert [b'\x05'] == masked_basic.to_pylist()
+
+ # Fixed Length Binary
+ masked = pa.array(np.array([b'\x05']), type=pa.binary(1),
+ mask=np.array([False]))
+ assert [b'\x05'] == masked.to_pylist()
+
+ masked_nulls = pa.array(np.array([b'\x05']), type=pa.binary(1),
+ mask=np.array([True]))
+ assert [None] == masked_nulls.to_pylist()
+
+ # Variable Length Binary
+ masked = pa.array(np.array([b'\x05']), type=pa.binary(),
+ mask=np.array([False]))
+ assert [b'\x05'] == masked.to_pylist()
+
+ masked_nulls = pa.array(np.array([b'\x05']), type=pa.binary(),
+ mask=np.array([True]))
+ assert [None] == masked_nulls.to_pylist()
+
+ # Fixed Length Binary, copy
Review comment:
It verifies that the behaviour is the same that we get from variable
length binary arrays, which do not reuse the numpy array memory. I don't think
it's an implementation detail because it changes the user experience.
The fact that the underlying numpy array is shared or not changes the user
experience as it means the user can't modify the original numpy array without
indirectly modifying (probably unexpectedly) the Arrow array too.
which lead me to create https://issues.apache.org/jira/browse/ARROW-12666
because in some cases we reuse the numpy memory (all basic types) and in other
cases we don't (the string, binary etc... types). The follow up ticket suggests
to make that behaviour clear as numpy does by adding a `copy=True/False`
argument to the `pyarrow.array` function.
We can discuss further what's the best way to go in that dedicated ticket,
here I wanted to make sure we were consistent with that happens when
`pa.binary()` and `pa.binary(N)` are used. So the test verifies that if you
modify the numpy array the arraw array doesn't change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]