I am extremely sorry for the late reply; I didn't get an email notification about your response. Thanks for the links! This is exactly what I wanted. I tried calling `_import_from_c` the same way in my code, but it raises an error saying that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow 0.16.0. Could this be a version mismatch?
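A quick way to check whether an installed pyarrow exposes these hooks at all is sketched below. It relies only on `hasattr`, so it makes no claim about which release first shipped them. `c_interface_status` and `installed_pyarrow_status` are hypothetical helper names, and `_import_from_c` is a private method (the name is taken from the linked `types.pxi` and `array.pxi`), so it may change between versions.

```python
import importlib


def c_interface_status(pa):
    """Report whether a pyarrow-like module exposes the private C-interface hooks.

    `pa` is any object shaped like the pyarrow module; passing it in keeps the
    check usable (and testable) even when pyarrow itself is not installed.
    """
    return {
        "version": getattr(pa, "__version__", "unknown"),
        "DataType._import_from_c": hasattr(
            getattr(pa, "DataType", None), "_import_from_c"
        ),
        "Array._import_from_c": hasattr(
            getattr(pa, "Array", None), "_import_from_c"
        ),
    }


def installed_pyarrow_status():
    """Return the status for the installed pyarrow, or None if it is missing."""
    try:
        pa = importlib.import_module("pyarrow")
    except ImportError:
        return None
    return c_interface_status(pa)


if __name__ == "__main__":
    print(installed_pyarrow_status())
```

If both flags come back `False` on the 0.16.0 install, that would suggest the methods postdate that release (the links are pinned to a development commit) rather than a mistake in how they are called.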
On 2020/03/29 20:46:32, Wes McKinney <wesmck...@gmail.com> wrote:
> To add to this, take a look at the C interface functions in pyarrow
>
> Reconstruct pyarrow.DataType from C ArrowSchema
>
> https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203
>
> Reconstruct pyarrow.Array from C ArrowArray
>
> https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176
>
> The idea is that a single ArrowSchema may correspond to a sequence of
> ArrowArray, so the data type (equivalently schema) is represented
> separately from the array data.
>
> You can see examples of both of these in the unit tests (which use
> cffi to create the C structs)
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py
>
> If you're having trouble getting things to work, it would be helpful
> if you could show what data exactly you are putting into the C
> structures and how it is not returning the expected result when
> imported into pyarrow.
>
> On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson
> <neal.p.richard...@gmail.com> wrote:
> >
> > Hi Anish,
> > You may be interested in how the Arrow R package uses the C interface to
> > pass data to/from pyarrow. Both sides use the Arrow C++ library's
> > implementation of the C interface. See
> > https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
> > https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
> > implementation is in
> > https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.
> >
> > Neal
> >
> > On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas <anishbiswas...@gmail.com>
> > wrote:
> >
> > > I have been trying to wrap my head around the CDataInterface.rst
> > > document
> > > (https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst)
> > > for a few days now. So what I am trying is basically to use the C
> > > interface with minimal dependencies to produce blocks of bytes that
> > > pyarrow can reconstruct and work on as a normal pyarrow array (and
> > > vice-versa: both directions).
> > >
> > > Here's what I already tried doing.
> > >
> > > - Created a C library that contains the two structs ArrowSchema and
> > >   ArrowArray and some functions to export an int64_t array as an Arrow
> > >   Array. This is very similar to what the document did with int32_t
> > >   arrays.
> > > - Imported the C library in Python. Created an int64_t pyarrow.array.
> > >   Serialized it to read the bytes via Numpy and populated the C struct I
> > >   created using the C library function.
> > >
> > > What I expected was that the bytes would have some resemblance to each
> > > other and that pyarrow would have some utility to pick up the ArrowArray
> > > struct and treat it as an Arrow Array. But I couldn't get it to work.
> > >
> > > I am also confused as to how to use ArrowSchema properly. The
> > > ArrowSchema is the only structure that differentiates different
> > > ArrowArray formats. However, the fact that I am not using it anywhere
> > > with the ArrowArray struct, or for that matter for any kind of
> > > initialization which tells the Arrow library that "The next structure
> > > you will encounter would be of the kind that the ArrowSchema has
> > > provided you", doesn't seem correct to me.
> > >
> > > It would really help me out if you could tell me whether I actually
> > > misinterpreted the doc, or whether I am doing something wrong. Thanks!
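
For reference, the relationship described in the quoted thread (one ArrowSchema describing the type, any number of ArrowArray structs carrying data) can be sketched with the standard library's `ctypes` alone. The field order follows the struct definitions in CDataInterface.rst. The helper names (`make_int64_schema`, `make_int64_array`, `read_int64`) are hypothetical, the release callbacks are simplified to only mark the struct as released, and nothing here calls pyarrow, so treat it as an illustration of the layout rather than a working exporter.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    pass


class ArrowArray(ctypes.Structure):
    pass


# Release callbacks receive a pointer to the struct being released.
SchemaRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ArrayRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", SchemaRelease),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ArrayRelease),
    ("private_data", ctypes.c_void_p),
]


@SchemaRelease
def _release_schema(schema_ptr):
    # Simplified: a real producer would also free anything it allocated.
    schema_ptr.contents.release = SchemaRelease()  # NULL marks "released"


@ArrayRelease
def _release_array(array_ptr):
    array_ptr.contents.release = ArrayRelease()  # NULL marks "released"


def make_int64_schema(name=b""):
    """Describe the type once; the same schema applies to every int64 array."""
    schema = ArrowSchema()
    schema.format = b"l"  # "l" is the format string for int64
    schema.name = name
    schema.metadata = None
    schema.flags = 0  # no flags set; the array below carries no nulls
    schema.n_children = 0
    schema.release = _release_schema
    return schema


def make_int64_array(values):
    """Fill an ArrowArray for a primitive int64 column without nulls."""
    n = len(values)
    data = (ctypes.c_int64 * n)(*values)
    # Primitive layout: buffers[0] is the validity bitmap (NULL here because
    # null_count is 0), buffers[1] is the data buffer.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))
    array = ArrowArray()
    array.length = n
    array.null_count = 0
    array.offset = 0
    array.n_buffers = 2
    array.n_children = 0
    array.buffers = ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p))
    array.release = _release_array
    # Keep the Python-owned memory alive for as long as the struct object is.
    array._keepalive = (data, buffers)
    return array


def read_int64(array):
    """Read the values back through the struct, as a consumer would."""
    ptr = ctypes.cast(array.buffers[1], ctypes.POINTER(ctypes.c_int64))
    return [ptr[i] for i in range(array.offset, array.offset + array.length)]
```

Calling `make_int64_schema()` once and `make_int64_array(...)` several times gives one type description shared by several data structs. My understanding from `test_cffi.py` is that the integer addresses of such structs (here, `ctypes.addressof(...)`) are what gets passed to `_import_from_c`, but that step is an assumption and is not exercised by this sketch.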