Young-Jun Ko created ARROW-1925:
-----------------------------------
Summary: Wrapping PyArrow Table with Numpy without copy
Key: ARROW-1925
URL: https://issues.apache.org/jira/browse/ARROW-1925
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 0.7.1
Reporter: Young-Jun Ko
Priority: Minor
The scenario is the following:
I have a Parquet file with a column in which every row holds a float array of
the same constant size, so the column can be thought of as a matrix.
After reading the Parquet file, I currently convert it to pandas and extract
the column's values, which gives me a list of np.array, and then call
np.vstack to get the matrix.
This involves a copy that would be nice to avoid.
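To make the copy concrete, here is a minimal NumPy-only sketch of the last step of that workflow. The `rows` list is a hypothetical stand-in for the values extracted from the pandas column; the names and sizes are made up for illustration and nothing here touches Parquet or Arrow itself.

```python
import numpy as np

# Hypothetical stand-in for the values extracted from the pandas column:
# one small float array per row, all with the same length.
rows = [np.arange(4, dtype=np.float64) + 4 * i for i in range(3)]

# np.vstack allocates a new (3, 4) array and copies every row into it.
stacked = np.vstack(rows)

# The stacked matrix shares no memory with any of the input rows.
copied = not any(np.shares_memory(stacked, r) for r in rows)
print(stacked.shape, copied)
```

Because each row is its own separately allocated array, np.vstack has no choice but to copy them into one new block.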
When a parquet file (or more generally a parquet dataset) is read, would the
values of the array column be contiguous in memory, so that a view on the data
could be created without having to copy? That would be neat.
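What I have in mind would look roughly like the sketch below. It assumes the column's values are available as one flat, contiguous float buffer and that the per-row array size is known; `flat` and `row_len` are hypothetical names, and the buffer is plain NumPy here, not something read from Parquet.

```python
import numpy as np

row_len = 4  # constant array size per row (assumed to be known)

# Hypothetical stand-in for a flat, contiguous buffer of column values.
flat = np.arange(12, dtype=np.float64)

# Reshaping a contiguous array returns a view on the same memory, not a copy.
view = flat.reshape(-1, row_len)

is_view = np.shares_memory(view, flat)
view[0, 0] = 42.0  # writes through to the underlying buffer
print(view.shape, is_view, flat[0])
```

If the data were split across several chunks, I assume this would only work per chunk, since one view cannot span separate allocations.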
Thanks!