[ https://issues.apache.org/jira/browse/ARROW-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243424#comment-17243424 ]

David Li commented on ARROW-10799:
----------------------------------

There's no simple workaround that I know of, unfortunately. I think you could 
do it by translating each index into an offset within an individual chunk 
(record batch), calling take on each chunk separately, and building up a new 
chunked array from the results. That way the chunks are never concatenated, so 
the 32-bit string offsets can't overflow. I had started implementing this in 
C++ but haven't had the time to finish it.
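A rough Python sketch of that idea, assuming pyarrow's `ChunkedArray.chunk()`, 
`Array.take()` and `pa.chunked_array()`. The helper names `split_indices` and 
`take_chunked` are hypothetical and not part of pyarrow. Consecutive indices 
that land in the same chunk are grouped into one take call, so the result keeps 
the requested order and every output chunk comes from a single input chunk. 
Negative or null indices are not handled.

```python
import bisect
from itertools import accumulate


def split_indices(chunk_lengths, indices):
    """Hypothetical helper: map global indices to (chunk_number, local_offsets)
    runs, grouping consecutive indices that fall in the same chunk."""
    # starts[i] is the global position of the first element of chunk i;
    # the final entry is the total length of the chunked array.
    starts = [0, *accumulate(chunk_lengths)]
    total = starts[-1]
    runs = []
    for idx in indices:
        if not 0 <= idx < total:
            raise IndexError(f"index {idx} out of bounds for length {total}")
        # bisect_right skips over empty chunks that share the same start
        chunk = bisect.bisect_right(starts, idx) - 1
        local = idx - starts[chunk]
        if runs and runs[-1][0] == chunk:
            runs[-1][1].append(local)  # extend the current run
        else:
            runs.append((chunk, [local]))  # start a new run in another chunk
    return runs


def take_chunked(chunked, indices):
    """Hypothetical helper: take() applied chunk by chunk, never concatenating."""
    # Imported here so split_indices can be used without pyarrow installed.
    import pyarrow as pa

    runs = split_indices([len(c) for c in chunked.chunks], indices)
    pieces = [
        chunked.chunk(i).take(pa.array(local, type=pa.int64()))
        for i, local in runs
    ]
    # Passing the type keeps this working when indices is empty.
    return pa.chunked_array(pieces, type=chunked.type)
```

With the chunked array from the report, `take_chunked(c, [0, 1])` only touches 
the first chunk and yields a one-chunk result. Random access across many chunks 
produces many small output chunks, which you may want to combine afterwards if 
they are small enough.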

> [C++] Take on string chunked arrays slow and fails
> --------------------------------------------------
>
>                 Key: ARROW-10799
>                 URL: https://issues.apache.org/jira/browse/ARROW-10799
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Maarten Breddels
>            Priority: Major
>
>  
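> {code:python}
> import pyarrow as pa
> # 2**26 one-byte strings per chunk
> a = pa.array(['a'] * 2**26)
> # [a] * 2*18 gives 36 chunks, ~2.4e9 bytes of string data in total, which is
> # more than a single array's 32-bit offsets can address (2**31 - 1)
> c = pa.chunked_array([a] * 2*18)
> c.take([0, 1])
> {code}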
> Gives
> {noformat}
> ----------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-4-57099ee02815> in <module>
> ----> 1 c.take([0, 1])
> ~/github/apache/arrow/python/pyarrow/table.pxi in pyarrow.lib.ChunkedArray.take()
> ~/github/apache/arrow/python/pyarrow/compute.py in take(data, indices, boundscheck, memory_pool)
>     421     """
>     422     options = TakeOptions(boundscheck=boundscheck)
> --> 423     return call_function('take', [data, indices], options, memory_pool)
>     424 
>     425 
> ~/github/apache/arrow/python/pyarrow/_compute.pyx in pyarrow._compute.call_function()
> ~/github/apache/arrow/python/pyarrow/_compute.pyx in pyarrow._compute.Function.call()
> ~/github/apache/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
> ~/github/apache/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: offset overflow while concatenating arrays
> {noformat}
>  
> PS: I did not check master, but this happens on 3.0.0.dev238+gb0bc9f8d.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
