[ https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382849#comment-16382849 ]

ASF GitHub Bot commented on ARROW-2142:
---------------------------------------

wesm commented on a change in pull request #1635: ARROW-2142: [Python] Allow conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r171722310
 
 

 ##########
 File path: cpp/src/arrow/array.cc
 ##########
 @@ -772,6 +773,105 @@ std::shared_ptr<Array> MakeArray(const std::shared_ptr<ArrayData>& data) {
   return out;
 }
 
+// ----------------------------------------------------------------------
+// Misc APIs
+
+namespace internal {
+
+std::vector<ArrayVector> RechunkArraysConsistently(
+    const std::vector<ArrayVector>& groups) {
+  if (groups.size() <= 1) {
+    return groups;
+  }
+  // Adjacent slices defining the desired rechunking
+  std::vector<std::pair<int64_t, int64_t>> slices;
+  // Total number of elements common to all array groups
+  int64_t total_length = -1;
+
+  {
+    // Compute a vector of slices such that each array spans
+    // one or more *entire* slices only
+    // e.g. if group #1 has bounds {0, 2, 4, 5, 10}
+    //     and group #2 has bounds {0, 5, 7, 10}
+    // then the computed slices are
+    //     {(0, 2), (2, 4), (4, 5), (5, 7), (7, 10)}
+    std::set<int64_t> bounds;
+    for (auto& group : groups) {
+      int64_t cur = 0;
+      bounds.insert(cur);
+      for (auto& array : group) {
+        cur += array->length();
+        bounds.insert(cur);
 
 Review comment:
   The complexity of this code is roughly O(ncolumns * log(num chunks)). The 
algorithm in `TableBatchReader::ReadNext` is linear-time -- whether it's more 
complex than what's below may be a matter of opinion.
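
To make the comparison concrete, here is a minimal sketch of a set-free scan that computes the same slices as the example in the diff's comment. It is not the code in `TableBatchReader::ReadNext`; the helper name `CommonSlices` and its input (one vector of chunk lengths per group) are hypothetical and chosen only for illustration. It assumes every group has the same total length.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Arrow: given the chunk lengths of each
// group, return the adjacent (start, end) slices such that every chunk spans
// one or more entire slices. Assumes all groups have the same total length.
std::vector<std::pair<int64_t, int64_t>> CommonSlices(
    const std::vector<std::vector<int64_t>>& chunk_lengths) {
  std::vector<std::pair<int64_t, int64_t>> slices;
  if (chunk_lengths.empty()) return slices;

  const std::size_t ngroups = chunk_lengths.size();
  std::vector<std::size_t> chunk_index(ngroups, 0);  // next chunk to open
  std::vector<int64_t> remaining(ngroups, 0);  // elements left in open chunk
  int64_t pos = 0;

  while (true) {
    int64_t step = -1;
    for (std::size_t i = 0; i < ngroups; ++i) {
      // Open the next chunk when the current one is used up (skips empty chunks)
      while (remaining[i] == 0 && chunk_index[i] < chunk_lengths[i].size()) {
        remaining[i] = chunk_lengths[i][chunk_index[i]++];
      }
      if (remaining[i] == 0) return slices;  // this group is exhausted
      step = (step < 0) ? remaining[i] : std::min(step, remaining[i]);
    }
    // Advance every group by the shortest remaining chunk
    slices.emplace_back(pos, pos + step);
    pos += step;
    for (auto& r : remaining) r -= step;
  }
}
```

With chunk lengths `{2, 2, 1, 5}` and `{5, 2, 3}` (bounds `{0, 2, 4, 5, 10}` and `{0, 5, 7, 10}`), this yields the slices listed in the diff's comment. Each emitted slice costs one pass over the groups, so the work is proportional to ngroups times the number of slices, with no ordered-set insertions.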



> [Python] Conversion from Numpy struct array unimplemented
> ---------------------------------------------------------
>
>                 Key: ARROW-2142
>                 URL: https://issues.apache.org/jira/browse/ARROW-2142
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<ipython-input-18-27a52820b7d8>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: converter.Convert()
> NumPyConverter doesn't implement <struct<x: float>> conversion.
> {code}


