Anish Biswas created ARROW-8257:
-----------------------------------

             Summary: Clarification regarding the `CDataInterface.rst`
                 Key: ARROW-8257
                 URL: https://issues.apache.org/jira/browse/ARROW-8257
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Anish Biswas

I have been trying to wrap my head around the [CDataInterface.rst|https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst] document for a few days now. What I am trying to do is use the C interface with minimal dependencies to produce blocks of bytes that pyarrow can reconstruct and work with as a normal pyarrow array, and vice versa.

Here's what I have tried so far:

* Created a C library that contains the two structs {{ArrowSchema}} and {{ArrowArray}}, plus some functions to export an int64_t array as an Arrow array. This is very similar to what the document does with int32_t arrays.
* Imported the C library in Python, created an int64 pyarrow.array, serialized it so that I could read the bytes via NumPy, and populated the C struct using the C library function.

I expected that the bytes would resemble each other and that pyarrow would have some utility to pick up the {{ArrowArray}} struct and treat it as an Arrow array. But I couldn't get it to work.

I am also confused about how to use {{ArrowSchema}} properly. The {{ArrowSchema}} is the only structure that distinguishes one {{ArrowArray}} format from another. Yet I am not using it anywhere together with the {{ArrowArray}} struct, nor in any kind of initialization that tells the Arrow library "the next structure you will encounter is of the kind this {{ArrowSchema}} describes". That doesn't seem correct to me.

It would really help if you could tell me whether I have misinterpreted the doc or am doing something wrong.

Thanks!
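
For reference, the export step described above can be sketched roughly as follows. This is a hypothetical illustration using Python's ctypes in place of a compiled C library, not the code from the report. The field layout follows the struct definitions in the CDataInterface.rst document; the helper names (`export_int64`, `read_int64`, `release_schema`, `release_array`) are made up for this sketch, and memory is simply kept alive by Python rather than freed in the release callbacks.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    pass


class ArrowArray(ctypes.Structure):
    pass


# Release callbacks take a pointer to their own struct and return nothing.
ReleaseSchema = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ReleaseArray = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

# Field order mirrors the struct definitions in the C data interface document.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ReleaseSchema),
    ("private_data", ctypes.c_void_p),
]
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ReleaseArray),
    ("private_data", ctypes.c_void_p),
]


@ReleaseSchema
def release_schema(schema_ptr):
    # Mark the struct as released by setting `release` to NULL.
    # (Assumption for this sketch: nothing to free, Python owns the memory.)
    schema_ptr.contents.release = ReleaseSchema()


@ReleaseArray
def release_array(array_ptr):
    array_ptr.contents.release = ReleaseArray()


def export_int64(values):
    """Hypothetical producer: describe an int64 array with no nulls."""
    data = (ctypes.c_int64 * len(values))(*values)
    # Two buffers for a primitive array: validity bitmap (NULL is allowed
    # when null_count is 0) and the data buffer.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))
    schema = ArrowSchema(format=b"l", name=b"", metadata=None, flags=0,
                         n_children=0, release=release_schema)
    array = ArrowArray(length=len(values), null_count=0, offset=0,
                       n_buffers=2, n_children=0,
                       buffers=ctypes.cast(buffers,
                                           ctypes.POINTER(ctypes.c_void_p)),
                       release=release_array)
    # The third item keeps the underlying memory alive for the caller.
    return schema, array, (data, buffers)


def read_int64(schema, array):
    """Hypothetical consumer: the schema tells it how to read the array."""
    if schema.format != b"l":
        raise TypeError("expected an int64 array (format 'l')")
    data = ctypes.cast(array.buffers[1], ctypes.POINTER(ctypes.c_int64))
    return [data[i] for i in range(array.offset, array.offset + array.length)]
```

In this sketch the {{ArrowArray}} carries only lengths and buffer pointers, so a consumer such as `read_int64` needs the matching {{ArrowSchema}} passed alongside it to know that the data buffer holds 64-bit integers.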