Re: Clarification regarding the `CDataInterface.rst`
Upgrading the pip installer worked perfectly. Thanks!

Regards,
Anish Biswas

On 2020/04/02 09:35:50, Antoine Pitrou wrote:
> Hi Anish,
>
> It looks like a bug with old pip versions. You can first upgrade pip using:
>
>   $ pip install -U pip
>
> Then redo the "pip install" command for pyarrow.
>
> If you can't upgrade pip, you can install Numpy separately first (using
> "pip install numpy").
>
> Regards
>
> Antoine.
Re: Clarification regarding the `CDataInterface.rst`
Hi Anish,

It looks like a bug with old pip versions. You can first upgrade pip using:

  $ pip install -U pip

Then redo the "pip install" command for pyarrow.

If you can't upgrade pip, you can install NumPy separately first (using
"pip install numpy").

Regards

Antoine.

On 02/04/2020 at 06:07, Anish Biswas wrote:
> Hey Antoine,
>
> I am getting a few complications by using what you said. It's attempting to
> collect numpy>=1.14.0 (from pyarrow) and I cross-checked it and there isn't
> any .whl file for numpy hosted there. The same case persists for six. Can
> you please look into it?
>
> Thanks,
> Anish Biswas
Re: Clarification regarding the `CDataInterface.rst`
Hey Antoine,

I ran into a few complications using what you suggested. pip attempts to
collect numpy>=1.14.0 (from pyarrow), and when I cross-checked, there isn't
any .whl file for numpy hosted there. The same is true for six. Can you
please look into it?

Thanks,
Anish Biswas

On 2020/03/30 16:15:53, Antoine Pitrou wrote:
> You should be able to get a nightly build using:
>
>   $ pip install -U --extra-index-url \
>       https://pypi.fury.io/arrow-nightlies/ --pre pyarrow
>
> Regards
>
> Antoine.
Re: Clarification regarding the `CDataInterface.rst`
On Mon, 30 Mar 2020 15:17:02, Anish Biswas wrote:
> Thanks! I'll probably build the Arrow Library from source. Thanks again!

You should be able to get a nightly build using:

  $ pip install -U --extra-index-url \
      https://pypi.fury.io/arrow-nightlies/ --pre pyarrow

Regards

Antoine.
Re: Clarification regarding the `CDataInterface.rst`
Thanks! I'll probably build the Arrow Library from source. Thanks again!

On 2020/03/30 14:49:35, Wes McKinney wrote:
> The first release containing this functionality is the upcoming one, 0.17.0.
> In the meantime you can build from source or use the wheel build scripts in
> python/manylinux1. We are working on nightlies for development and testing,
> so someone may be able to point you to a nightly package.
Re: Clarification regarding the `CDataInterface.rst`
The first release containing this functionality is the upcoming one, 0.17.0.
In the meantime you can build from source or use the wheel build scripts in
python/manylinux1. We are working on nightlies for development and testing,
so someone may be able to point you to a nightly package.

On Mon, Mar 30, 2020, 9:28 AM Anish Biswas wrote:
> I am extremely sorry for the late reply, I didn't get an email regarding
> your reply. Thanks for the links! This is exactly what I wanted. I tried
> doing the same `_import_from_c` in my code but it throws an error stating
> that `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow
> 0.16.0. Is there a case of version mismatch here?
Re: Clarification regarding the `CDataInterface.rst`
Hi Neal Richardson,

I apologize for the late reply. The links are pretty helpful, thanks a ton!
I went through them, and they are a very good starting point for a larger
project I am working on, where my task is exactly this: conversions
"to Arrow" and "from Arrow".

On 2020/03/29 20:40:59, Neal Richardson wrote:
> Hi Anish,
> You may be interested in how the Arrow R package uses the C interface to
> pass data to/from pyarrow. Both sides use the Arrow C++ library's
> implementation of the C interface. See
> https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
> https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
> implementation is in
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.
>
> Neal
Re: Clarification regarding the `CDataInterface.rst`
I am extremely sorry for the late reply; I didn't get an email notification
for your reply. Thanks for the links! This is exactly what I wanted. I tried
calling `_import_from_c` the same way in my code, but it raises an error
stating that `pyarrow.DataType._import_from_c` doesn't exist. I am running
pyarrow 0.16.0. Is this a version mismatch?

On 2020/03/29 20:46:32, Wes McKinney wrote:
> To add to this, take a look at the C interface functions in pyarrow.
>
> Reconstruct pyarrow.DataType from C ArrowSchema:
>
> https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203
>
> Reconstruct pyarrow.Array from C ArrowArray:
>
> https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176
>
> The idea is that a single ArrowSchema may correspond to a sequence of
> ArrowArray, so the data type (equivalently schema) is represented
> separately from the array data.
>
> You can see examples of both of these in the unit tests (which use
> cffi to create the C structs):
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py
>
> If you're having trouble getting things to work, it would be helpful
> if you could show what data exactly you are putting into the C
> structures and how it is not returning the expected result when
> imported into pyarrow.
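The question above comes down to whether the installed pyarrow already exposes the C import hooks. A minimal sketch of such a check follows; it assumes only that the hooks are the `_import_from_c` attributes on `DataType` and `Array` (the two entry points linked in this thread). The helper name `has_c_data_interface` and the stand-in classes are hypothetical and exist only so the sketch runs without pyarrow installed.

```python
from types import SimpleNamespace


def has_c_data_interface(pa):
    """Return True if a pyarrow-like module exposes the C import hooks.

    Only looks for an `_import_from_c` attribute on `DataType` and `Array`.
    """
    return all(
        hasattr(getattr(pa, cls_name, None), "_import_from_c")
        for cls_name in ("DataType", "Array")
    )


# Hypothetical stand-ins so the sketch runs without pyarrow installed.
class _NoHook:
    pass


class _WithHook:
    @staticmethod
    def _import_from_c(ptr):
        raise NotImplementedError("stand-in only")


old_like = SimpleNamespace(__version__="0.16.0", DataType=_NoHook, Array=_NoHook)
new_like = SimpleNamespace(__version__="0.17.0", DataType=_WithHook, Array=_WithHook)

for mod in (old_like, new_like):
    print(mod.__version__, has_c_data_interface(mod))
```

With a real installation, the same call would be `import pyarrow as pa` followed by `has_c_data_interface(pa)`.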
Re: Clarification regarding the `CDataInterface.rst`
To add to this, take a look at the C interface functions in pyarrow.

Reconstruct pyarrow.DataType from C ArrowSchema:

https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203

Reconstruct pyarrow.Array from C ArrowArray:

https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176

The idea is that a single ArrowSchema may correspond to a sequence of
ArrowArray, so the data type (equivalently schema) is represented
separately from the array data.

You can see examples of both of these in the unit tests (which use
cffi to create the C structs):

https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py

If you're having trouble getting things to work, it would be helpful
if you could show what data exactly you are putting into the C
structures and how it is not returning the expected result when
imported into pyarrow.

On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson wrote:
> Hi Anish,
> You may be interested in how the Arrow R package uses the C interface to
> pass data to/from pyarrow. Both sides use the Arrow C++ library's
> implementation of the C interface. See
> https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
> https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
> implementation is in
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.
>
> Neal
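The split described in this message (one ArrowSchema for the type, one ArrowArray per chunk of data) can be sketched with nothing but ctypes. This is a layout illustration under assumptions: the field order is copied from the struct definitions in CDataInterface.rst, the helper names (`describe_int64`, `wrap_int64`, `read_back`) are made up for this example, and the `release` callbacks are left NULL, so a real consumer would treat these structs as already released. A real producer has to install release callbacks before handing the pointers over.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    pass


class ArrowArray(ctypes.Structure):
    pass


# Field order follows the struct definitions in CDataInterface.rst.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.c_void_p),  # function pointer, left NULL in this sketch
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # function pointer, left NULL in this sketch
    ("private_data", ctypes.c_void_p),
]


def describe_int64():
    """Fill an ArrowSchema describing a non-nested int64 column."""
    schema = ArrowSchema()  # ctypes zero-initialises every field
    schema.format = b"l"    # "l" is the format string for int64
    schema.name = b""
    return schema


def wrap_int64(values):
    """Fill an ArrowArray whose data buffer holds `values` (no nulls)."""
    data = (ctypes.c_int64 * len(values))(*values)
    # Primitive layout: buffers[0] is the validity bitmap (NULL because
    # there are no nulls), buffers[1] is the data buffer.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))
    array = ArrowArray()
    array.length = len(values)
    array.null_count = 0
    array.n_buffers = 2
    array.buffers = buffers
    # Keep the Python-owned memory alive for as long as the struct object.
    array._keepalive = (data, buffers)
    return array


def read_back(array):
    """Read the int64 values out of the data buffer, as a consumer would."""
    address = array.buffers[1]
    return list((ctypes.c_int64 * array.length).from_address(address))


# One schema describes the type of every chunk; the chunks carry only data.
schema = describe_int64()
chunks = [wrap_int64([1, 2, 3]), wrap_int64([40, 50])]
for chunk in chunks:
    print(schema.format, chunk.length, read_back(chunk))
```

Note that nothing in an ArrowArray names its type: the consumer needs the accompanying ArrowSchema (or an already known data type) to interpret the buffers, which is why the import functions linked above take both.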
Re: Clarification regarding the `CDataInterface.rst`
Hi Anish,

You may be interested in how the Arrow R package uses the C interface to
pass data to/from pyarrow. Both sides use the Arrow C++ library's
implementation of the C interface. See
https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
implementation is in
https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.

Neal

On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas wrote:
> I have been trying to wrap my head around the CDataInterface.rst document
> (https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst)
> for a few days now. What I am trying to do is basically use the C
> interface with minimal dependencies to produce blocks of bytes that
> pyarrow can reconstruct and work on as a normal pyarrow array (and
> vice versa: both directions).
>
> Here's what I already tried doing.
>
>    - Created a C library that contains the two structs ArrowSchema and
>      ArrowArray and some functions to export an int64_t array as an Arrow
>      Array. This is very similar to what the document did with int32_t
>      arrays.
>    - Imported the C library in Python. Created an int64_t pyarrow.array.
>      Serialized it to read the bytes via Numpy and populated the C struct
>      I created using the C library function.
>
> What I expected was that the bytes would have some resemblance to each
> other and that pyarrow would have some utility to pick up the ArrowArray
> struct and treat it as an Arrow Array. But I couldn't get it to work.
>
> I am also confused as to how to use ArrowSchema properly. The ArrowSchema
> is the only structure that differentiates different ArrowArray formats.
> However, the fact that I am not using it anywhere with the ArrowArray
> struct, or for that matter for any kind of initialization which tells the
> Arrow library that "The next structure you will encounter would be of the
> kind that the ArrowSchema has provided you", doesn't seem correct to me.
>
> It would really help me out if you could tell me whether I misinterpreted
> the doc or am doing something wrong. Thanks!