[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447725#comment-16447725
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183307671
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
 
 Review comment:
   Why do you loop 10 times?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>
> At the moment, {{pyarrow.Array}} instances have a method called 
> {{to_pandas}}. Although this method returns NumPy arrays, it returns them in 
> the form that Pandas uses inside a {{Series}}. The difference is visible, for 
> example, with integers that contain null values. For Pandas, we convert the 
> data into a float array and set every entry that is null in the Arrow array 
> to NaN. For vanilla NumPy arrays, we would instead return a tuple of a 
> validity bytemap (not bitmap!) and a values array. The values array in this 
> case should simply be a view on the underlying Arrow buffer.
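
To make the "bytemap (not bitmap!)" distinction concrete, here is a minimal sketch that uses NumPy alone. It assumes a validity bitmap packed least-significant-bit first; the helper name `bitmap_to_bytemap` and the sample buffers are hypothetical stand-ins, not pyarrow API.

```python
import numpy as np


def bitmap_to_bytemap(bitmap: bytes, length: int) -> np.ndarray:
    """Expand a packed validity bitmap (LSB first) into one bool per element."""
    bits = np.unpackbits(np.frombuffer(bitmap, dtype=np.uint8), bitorder="little")
    return bits[:length].astype(bool)


# Stand-in for an int64 values buffer holding [1, <null>, 3, 4].
# The slot behind a null holds an arbitrary value (0 here).
values_buffer = np.array([1, 0, 3, 4], dtype=np.int64).tobytes()
validity_bitmap = bytes([0b00001101])  # bits 0, 2 and 3 set: elements 0, 2, 3 valid

values = np.frombuffer(values_buffer, dtype=np.int64)  # a view, no copy
valid = bitmap_to_bytemap(validity_bitmap, len(values))

# What a pandas-style conversion does instead: copy to float64 and use NaN.
as_float = np.where(valid, values.astype(np.float64), np.nan)
```

The `(valid, values)` pair keeps the integer dtype and leaves the values untouched, whereas the float conversion has to copy the data and changes the dtype.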



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447729#comment-16447729
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183307323
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
 
 Review comment:
   I'm not sure what this line is meant to test?



[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447728#comment-16447728
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183308083
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
+        np_arr = None  # noqa
+
+    assert sys.getrefcount(arr) == 2
+
+    for i in range(10):
+        arr = pa.array(range(10))
+        np_arr = arr.to_numpy()
+        arr = None
+        gc.collect()
+
+        # Ensure base is still valid
 
 Review comment:
   I'm not sure that's the right way of looking at it. Just check that 
`np_arr.base` is not None...
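
A minimal sketch of what that check could look like. A plain `bytearray` stands in for the memory owned by the Arrow array (an assumption, so that the example runs with NumPy alone). A NumPy array that views foreign memory keeps the owner reachable through `.base`, so the data stays valid after the original name is dropped.

```python
import gc

import numpy as np

# Hypothetical stand-in for the memory owned by an Arrow array.
owner = bytearray(np.arange(10, dtype=np.int64).tobytes())

np_arr = np.frombuffer(owner, dtype=np.int64)  # zero-copy view over `owner`
owner = None  # drop our own reference, as the test does with `arr`
gc.collect()

# The view still holds on to the memory through its base object.
assert np_arr.base is not None
total = int(np_arr.sum())
```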



[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447731#comment-16447731
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183309840
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
 
 Review comment:
   This function isn't actually testing the zero-copy part. You should mutate 
the resulting NumPy array and check that the original Arrow array is mutated 
too (of course, the fact that we're able to get a mutable NumPy array from an 
Arrow array could be seen as a bug).
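
The mutation-based check can be sketched as follows. A `bytearray` stands in for the Arrow buffer (an assumption made so that the example runs with NumPy alone); whether pyarrow should hand back a writable array at all is the open question raised above, and this sketch makes no claim about that.

```python
import numpy as np

# Hypothetical stand-in for an Arrow values buffer holding int64 0..9.
buffer = bytearray(np.arange(10, dtype=np.int64).tobytes())

view = np.frombuffer(buffer, dtype=np.int64)  # shares memory with `buffer`
copy = np.array(view)                         # independent copy, for contrast

view[0] = 42  # write through the view
copy[1] = 99  # write to the copy only

# Re-read the underlying buffer: only the write through the view is visible.
reread = np.frombuffer(buffer, dtype=np.int64)
shares = np.shares_memory(view, reread)
```

If the conversion had copied the data, the write to `view[0]` would not show up in `reread`, which is what makes mutation a direct test of the zero-copy property.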



[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447727#comment-16447727
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183308880
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -577,6 +604,31 @@ def test_simple_type_construction():
 str(result)
 
 
+@pytest.mark.parametrize(
+    'narr',
+    [
+        np.arange(10, dtype=np.int64),
+        np.arange(10, dtype=np.int32),
+        np.arange(10, dtype=np.int16),
+        np.arange(10, dtype=np.int8),
+        np.arange(10, dtype=np.uint64),
+        np.arange(10, dtype=np.uint32),
+        np.arange(10, dtype=np.uint16),
+        np.arange(10, dtype=np.uint8),
+        np.arange(10, dtype=np.float64),
+        np.arange(10, dtype=np.float32),
+        np.arange(10, dtype=np.float16),
+    ]
+)
+def test_to_numpy_roundtrip(narr):
+    arr = pa.array(narr)
+    assert narr.dtype == arr.to_numpy().dtype
+    assert np.array_equal(narr, arr.to_numpy())
 
 Review comment:
   Use `np.testing.assert_array_equal`.
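
A short sketch of the practical difference, using made-up arrays rather than the test's parametrized inputs. `np.array_equal` only yields a boolean, while `np.testing.assert_array_equal` raises an `AssertionError` whose message describes the mismatch.

```python
import numpy as np

expected = np.arange(10, dtype=np.int32)
actual = expected.copy()
actual[3] = -1  # introduce a single mismatch

# Boolean check: tells us the arrays differ, but not where.
plain_result = np.array_equal(expected, actual)

# np.testing: raises AssertionError with a report of the mismatch.
try:
    np.testing.assert_array_equal(actual, expected)
except AssertionError as exc:
    message = str(exc)
else:
    message = ""
```

By default `assert_array_equal` compares values and shapes but not dtypes, so the separate `dtype` assertion in the test is still needed.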



[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447730#comment-16447730
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183308153
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
+        np_arr = None  # noqa
+
+    assert sys.getrefcount(arr) == 2
+
+    for i in range(10):
+        arr = pa.array(range(10))
+        np_arr = arr.to_numpy()
+        arr = None
+        gc.collect()
+
+        # Ensure base is still valid
+
+        # Because of py.test's assert inspection magic, if you put getrefcount
+        # on the line being examined, it will be 1 higher than you expect
+        base_refcount = sys.getrefcount(np_arr.base)
+        assert base_refcount == 2
+        np_arr.sum()
 
 Review comment:
   You should check the result value.



[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447726#comment-16447726
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183307516
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
+        np_arr = None  # noqa
+
+    assert sys.getrefcount(arr) == 2
 
 Review comment:
   Instead of hardcoding this, you should check that the original value hasn't 
changed:
   ```
   old_refcount = sys.getrefcount(arr)
   # ... do something
   assert sys.getrefcount(arr) == old_refcount
   ```
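
A self-contained sketch of the same pattern, runnable with the standard library alone. The `Owner` class and the `borrow` and `leak` helpers are hypothetical stand-ins for the Arrow array and the operation under test. Comparing against a recorded baseline avoids depending on the absolute count, which may differ between interpreter versions or test runners.

```python
import sys


class Owner:
    """Hypothetical object standing in for the Arrow array."""


def borrow(obj):
    """Use obj without keeping a reference to it."""
    return id(obj)


def leak(obj, cache=[]):
    """Deliberately keep a reference (the mutable default acts as a cache)."""
    cache.append(obj)


arr = Owner()
old_refcount = sys.getrefcount(arr)

borrow(arr)
after_borrow = sys.getrefcount(arr)  # unchanged: nothing was retained

leak(arr)
after_leak = sys.getrefcount(arr)    # one higher: the cache holds a reference
```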



[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447212#comment-16447212
 ] 

ASF GitHub Bot commented on ARROW-564:
--

kynan opened a new pull request #1931: ARROW-564 [Python] Add support for 
return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931
 
 
   Depends on the in-flight pull request for ARROW-2491

