jorisvandenbossche commented on issue #38137: URL: https://github.com/apache/arrow/issues/38137#issuecomment-1757264027
The `pa.array(..)` function indeed currently has dedicated support only for numpy.ndarray and pandas array-likes (which is essentially duck typing on the `.dtype` attribute, followed by conversion to a numpy ndarray). Other objects are treated as general sequences and converted as such, by iterating over the elements.

We should probably expand that to support any buffer-like object as well, and in that case take the faster code path used for ndarrays.

A manual workaround for now is to create a Buffer object (which does support objects that implement the buffer protocol, and thus supports array.array objects), and then create an Array from that Buffer with the low-level `from_buffers`:

```
In [12]: %timeit pa.array(x)
2.53 ms ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [13]: %timeit pa.Array.from_buffers(pa.uint8(), len(x), [None, pa.py_buffer(x)])
1.54 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
