[ https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382853#comment-16382853 ]

ASF GitHub Bot commented on ARROW-2142:
---------------------------------------

wesm commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r171726050
 
 

 ##########
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##########
 @@ -1371,6 +1371,69 @@ def test_structarray(self):
         series = pd.Series(arr.to_pandas())
         tm.assert_series_equal(series, expected)
 
+    def test_from_numpy(self):
+        dt = np.dtype([('x', np.int32),
+                       (('y_title', 'y'), np.bool_)])
+        ty = pa.struct([pa.field('x', pa.int32()),
+                        pa.field('y', pa.bool_())])
+
+        data = np.array([], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == []
+
+        data = np.array([(42, True), (43, False)], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == [{'x': 42, 'y': True},
+                                   {'x': 43, 'y': False}]
+
+        # With mask
+        arr = pa.array(data, mask=np.bool_([False, True]), type=ty)
+        assert arr.to_pylist() == [{'x': 42, 'y': True}, None]
+
+        # Trivial struct type
+        dt = np.dtype([])
+        ty = pa.struct([])
+
+        data = np.array([], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == []
+
+        data = np.array([(), ()], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == [{}, {}]
+
+    def test_from_numpy_nested(self):
+        dt = np.dtype([('x', np.dtype([('xx', np.int8),
+                                       ('yy', np.bool_)])),
+                       ('y', np.int16)])
+        ty = pa.struct([pa.field('x', pa.struct([pa.field('xx', pa.int8()),
+                                                 pa.field('yy', pa.bool_())])),
+                        pa.field('y', pa.int16())])
+
+        data = np.array([], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == []
+
+        data = np.array([((1, True), 2), ((3, False), 4)], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == [{'x': {'xx': 1, 'yy': True}, 'y': 2},
+                                   {'x': {'xx': 3, 'yy': False}, 'y': 4}]
+
+    def test_from_numpy_bad_input(self):
+        ty = pa.struct([pa.field('x', pa.int32()),
+                        pa.field('y', pa.bool_())])
+        dt = np.dtype([('x', np.int32),
+                       ('z', np.bool_)])
+
+        data = np.array([], dtype=dt)
+        with pytest.raises(TypeError,
+                           match="Missing field 'y'"):
+            pa.array(data, type=ty)
+        data = np.int32([])
+        with pytest.raises(TypeError,
+                           match="Expected struct array"):
+            pa.array(data, type=ty)
 
 Review comment:
  Per above, it may be worth writing a "large memory" test with the
`large_memory` pytest mark (which we can run locally, but not in Travis CI)
where a field overflows the 2 GB limit of a BinaryArray, so we can test the
rechunking / splitting of the null bitmap. I guess you'll have to pass a mask
to get some nulls to make sure the logic is correct.



> [Python] Conversion from Numpy struct array unimplemented
> ---------------------------------------------------------
>
>                 Key: ARROW-2142
>                 URL: https://issues.apache.org/jira/browse/ARROW-2142
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<ipython-input-18-27a52820b7d8>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: converter.Convert()
> NumPyConverter doesn't implement <struct<x: float>> conversion.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
