Re: Clarification regarding the `CDataInterface.rst`

2020-04-02 Thread Anish Biswas
Upgrading the pip installer worked perfectly. Thanks!

Regards,
Anish Biswas

On 2020/04/02 09:35:50, Antoine Pitrou  wrote: 
> 
> Hi Anish,
> 
> It looks like a bug with old pip versions.  You can first upgrade pip using:
> 
> $ pip install -U pip
> 
> Then redo the "pip install" command for pyarrow.
> 
> If you can't upgrade pip, you can install NumPy separately first (using
> "pip install numpy").
> 
> Regards
> 
> Antoine.


Re: Clarification regarding the `CDataInterface.rst`

2020-04-02 Thread Antoine Pitrou


Hi Anish,

It looks like a bug with old pip versions.  You can first upgrade pip using:

$ pip install -U pip

Then redo the "pip install" command for pyarrow.

If you can't upgrade pip, you can install NumPy separately first (using
"pip install numpy").

Regards

Antoine.


On 02/04/2020 at 06:07, Anish Biswas wrote:
> Hey Antoine,
> 
> I am running into a few complications with what you suggested. It's attempting
> to collect numpy>=1.14.0 (from pyarrow), and I cross-checked: there isn't any
> .whl file for numpy hosted there. The same is true for six. Can you please
> look into it?
> 
> Thanks,
> Anish Biswas


Re: Clarification regarding the `CDataInterface.rst`

2020-04-01 Thread Anish Biswas
Hey Antoine,

I am running into a few complications with what you suggested. It's attempting
to collect numpy>=1.14.0 (from pyarrow), and I cross-checked: there isn't any
.whl file for numpy hosted there. The same is true for six. Can you please
look into it?

Thanks,
Anish Biswas

On 2020/03/30 16:15:53, Antoine Pitrou  wrote: 
> On Mon, 30 Mar 2020 15:17:02 -
> Anish Biswas  wrote:
> > Thanks! I'll probably build the Arrow Library from source. Thanks again!
> 
> You should be able to get a nightly build using:
> 
> $ pip install -U --extra-index-url \
> https://pypi.fury.io/arrow-nightlies/ --pre pyarrow
> 
> Regards
> 
> Antoine.
> 
> 
> 


Re: Clarification regarding the `CDataInterface.rst`

2020-03-30 Thread Antoine Pitrou
On Mon, 30 Mar 2020 15:17:02 -
Anish Biswas  wrote:
> Thanks! I'll probably build the Arrow Library from source. Thanks again!

You should be able to get a nightly build using:

$ pip install -U --extra-index-url \
https://pypi.fury.io/arrow-nightlies/ --pre pyarrow

Regards

Antoine.




Re: Clarification regarding the `CDataInterface.rst`

2020-03-30 Thread Anish Biswas
Thanks! I'll probably build the Arrow Library from source. Thanks again!

On 2020/03/30 14:49:35, Wes McKinney  wrote: 
> The first release containing this functionality will be the upcoming 0.17.0.
> In the meantime you can build from source or use the wheel build scripts in
> python/manylinux1. We are working on nightlies for development and testing,
> so someone may be able to point you to a nightly package.


Re: Clarification regarding the `CDataInterface.rst`

2020-03-30 Thread Wes McKinney
The first release containing this functionality will be the upcoming 0.17.0.
In the meantime you can build from source or use the wheel build scripts in
python/manylinux1. We are working on nightlies for development and testing,
so someone may be able to point you to a nightly package.

On Mon, Mar 30, 2020, 9:28 AM Anish Biswas  wrote:

> I am extremely sorry for the late reply; I didn't get an email notification
> about your reply. Thanks for the links! This is exactly what I wanted. I tried
> calling `_import_from_c` in my code, but it throws an error stating that
> `pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow
> 0.16.0. Is this a version mismatch?


Re: Clarification regarding the `CDataInterface.rst`

2020-03-30 Thread Anish Biswas
Hi Neal Richardson,
I apologize for the late reply. The links are very helpful, thanks a ton! I
went through them, and they are a very good starting point for a larger
project I am working on, where my task is exactly this: conversions "to
Arrow" and "from Arrow".

On 2020/03/29 20:40:59, Neal Richardson  wrote: 
> Hi Anish,
> You may be interested in how the Arrow R package uses the C interface to
> pass data to/from pyarrow. Both sides use the Arrow C++ library's
> implementation of the C interface. See
> https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
> https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
> implementation is in
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.
> 
> Neal


Re: Clarification regarding the `CDataInterface.rst`

2020-03-30 Thread Anish Biswas
I am extremely sorry for the late reply; I didn't get an email notification
about your reply. Thanks for the links! This is exactly what I wanted. I tried
calling `_import_from_c` in my code, but it throws an error stating that
`pyarrow.DataType._import_from_c` doesn't exist. I am running pyarrow 0.16.0.
Is this a version mismatch?

On 2020/03/29 20:46:32, Wes McKinney  wrote: 
> To add to this, take a look at the C interface functions in pyarrow
> 
> Reconstruct pyarrow.DataType from C ArrowSchema
> 
> https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203
> 
> Reconstruct pyarrow.Array from C ArrowArray
> 
> https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176
> 
> The idea is that a single ArrowSchema may correspond to a sequence of
> ArrowArray, so the data type (equivalently schema) is represented
> separately from the array data.
> 
> You can see examples of both of these in the unit tests (which use
> cffi to create the C structs)
> 
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py
> 
> If you're having trouble getting things to work, it would be helpful
> if you could show what data exactly you are putting into the C
> structures and how it is not returning the expected result when
> imported into pyarrow.


Re: Clarification regarding the `CDataInterface.rst`

2020-03-29 Thread Wes McKinney
To add to this, take a look at the C interface functions in pyarrow:

Reconstruct pyarrow.DataType from C ArrowSchema

https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/types.pxi#L203

Reconstruct pyarrow.Array from C ArrowArray

https://github.com/apache/arrow/blob/b07c2626cb3cdd3498b41da9feedf7c8319baa27/python/pyarrow/array.pxi#L1176

The idea is that a single ArrowSchema may correspond to a sequence of
ArrowArray, so the data type (equivalently schema) is represented
separately from the array data.
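To make this concrete, here is a minimal C sketch, assuming the ArrowSchema/ArrowArray layouts from CDataInterface.rst and an int64 column (format string "l"). The consumer validates a single schema once and then uses it to interpret every ArrowArray chunk in a sequence. The function name `sum_int64_chunks` is hypothetical, and chunks containing nulls are rejected rather than handled, to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Struct layouts as given in CDataInterface.rst. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical consumer: one schema describes all chunks, so it is checked
   once and then used to interpret each ArrowArray. Returns 0 and writes the
   total, or -1 if the schema is not int64 or a chunk may contain nulls. */
int sum_int64_chunks(const struct ArrowSchema* schema,
                     const struct ArrowArray* chunks, size_t n_chunks,
                     int64_t* out) {
  if (strcmp(schema->format, "l") != 0) return -1;
  int64_t total = 0;
  for (size_t c = 0; c < n_chunks; c++) {
    const struct ArrowArray* chunk = &chunks[c];
    /* null_count may be -1 (unknown); only accept chunks known to be null-free. */
    if (chunk->n_buffers != 2 || chunk->null_count != 0) return -1;
    const int64_t* values = chunk->buffers[1];  /* buffer 1 holds the values */
    for (int64_t i = 0; i < chunk->length; i++) total += values[chunk->offset + i];
  }
  *out = total;
  return 0;
}
```

The same idea underlies importing in pyarrow: the type comes from the schema, and each array only carries lengths and buffer pointers.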

You can see examples of both of these in the unit tests (which use
cffi to create the C structs):

https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_cffi.py

If you're having trouble getting things to work, it would be helpful
if you could show what data exactly you are putting into the C
structures and how it is not returning the expected result when
imported into pyarrow.

On Sun, Mar 29, 2020 at 3:41 PM Neal Richardson
 wrote:
>
> Hi Anish,
> You may be interested in how the Arrow R package uses the C interface to
> pass data to/from pyarrow. Both sides use the Arrow C++ library's
> implementation of the C interface. See
> https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
> https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
> implementation is in
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.
>
> Neal


Re: Clarification regarding the `CDataInterface.rst`

2020-03-29 Thread Neal Richardson
Hi Anish,
You may be interested in how the Arrow R package uses the C interface to
pass data to/from pyarrow. Both sides use the Arrow C++ library's
implementation of the C interface. See
https://github.com/apache/arrow/blob/master/r/src/py-to-r.cpp and
https://github.com/apache/arrow/blob/master/r/R/py-to-r.R. The Arrow C++
implementation is in
https://github.com/apache/arrow/tree/master/cpp/src/arrow/c.

Neal

On Sun, Mar 29, 2020 at 12:14 PM Anish Biswas 
wrote:

> I have been trying to wrap my head around the CDataInterface.rst document
> (https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst)
> for a few days now. What I am trying to do is use the C interface, with
> minimal dependencies, to produce blocks of bytes that pyarrow can
> reconstruct and work on as a normal pyarrow array (and vice versa: both
> directions).
>
> Here's what I already tried doing.
>
>    - Created a C library that contains the two structs ArrowSchema and
>    ArrowArray and some functions to export an int64_t array as an Arrow
>    Array. This is very similar to what the document does with int32_t arrays.
>    - Imported the C library in Python. Created an int64_t pyarrow.array.
>    Serialized it to read the bytes via NumPy and populated the C struct I
>    created using the C library function.
>
>
> What I expected was that the bytes would have some resemblance to each
> other and that pyarrow would have some utility to pick up the ArrowArray
> struct and treat it as an Arrow Array. But I couldn't get it to work.
>
> I am also confused about how to use ArrowSchema properly. ArrowSchema is
> the only structure that differentiates between ArrowArray formats.
> However, I am not using it anywhere together with the ArrowArray struct,
> nor in any kind of initialization that tells the Arrow library "the next
> structure you will encounter is of the kind that this ArrowSchema
> describes", and that doesn't seem correct to me.
>
> It would really help me out if you could tell me whether I misinterpreted
> the doc or am doing something wrong. Thanks!
>


Clarification regarding the `CDataInterface.rst`

2020-03-29 Thread Anish Biswas
I have been trying to wrap my head around the CDataInterface.rst document
(https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst)
for a few days now. What I am trying to do is use the C interface, with
minimal dependencies, to produce blocks of bytes that pyarrow can
reconstruct and work on as a normal pyarrow array (and vice versa: both
directions).

Here's what I already tried doing.

   - Created a C library that contains the two structs ArrowSchema and
   ArrowArray and some functions to export an int64_t array as an Arrow Array.
   This is very similar to what the document does with int32_t arrays.
   - Imported the C library in Python. Created an int64_t pyarrow.array.
   Serialized it to read the bytes via NumPy and populated the C struct I
   created using the C library function.
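A minimal sketch of such an int64 producer in C, adapted from the int32 example in CDataInterface.rst: int64 uses the format string "l", and a primitive array carries two buffers (validity bitmap, then values). The helper names `export_int64_type` and `export_int64_array` are hypothetical, and this version copies the input so that its release callback can free it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARROW_FLAG_NULLABLE 2

/* Struct layouts as given in CDataInterface.rst. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

static void release_int64_type(struct ArrowSchema* schema) {
  /* Nothing was allocated: just mark the schema as released. */
  schema->release = NULL;
}

/* Hypothetical helper: describe the type ("l" = int64). */
void export_int64_type(struct ArrowSchema* schema) {
  *schema = (struct ArrowSchema){
      .format = "l", .name = "", .metadata = NULL,
      .flags = ARROW_FLAG_NULLABLE, .n_children = 0,
      .children = NULL, .dictionary = NULL,
      .release = &release_int64_type,
  };
}

static void release_int64_array(struct ArrowArray* array) {
  assert(array->n_buffers == 2);
  /* Free the values buffer and the buffers array, then mark released. */
  free((void*)array->buffers[1]);
  free(array->buffers);
  array->release = NULL;
}

/* Hypothetical helper: export a copy of `data` (no nulls) as an ArrowArray.
   Returns 0 on success, -1 on allocation failure. */
int export_int64_array(const int64_t* data, int64_t nitems, struct ArrowArray* array) {
  int64_t* copy = malloc(sizeof(int64_t) * (size_t)(nitems > 0 ? nitems : 1));
  const void** buffers = malloc(sizeof(void*) * 2);
  if (copy == NULL || buffers == NULL) {
    free(copy);
    free(buffers);
    return -1;
  }
  if (nitems > 0) memcpy(copy, data, sizeof(int64_t) * (size_t)nitems);
  buffers[0] = NULL;  /* validity bitmap may be NULL when null_count == 0 */
  buffers[1] = copy;  /* values buffer */
  *array = (struct ArrowArray){
      .length = nitems, .null_count = 0, .offset = 0,
      .n_buffers = 2, .n_children = 0, .buffers = buffers,
      .children = NULL, .dictionary = NULL,
      .release = &release_int64_array,
  };
  return 0;
}
```

The schema and the array are filled in as two separate structs; a consumer receives both and uses the schema to interpret the array, then calls each `release` when done.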

What I expected was that the bytes would have some resemblance to each
other and that pyarrow would have some utility to pick up the ArrowArray
struct and treat it as an Arrow Array. But I couldn't get it to work.
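A possible source of confusion here: an ArrowArray does not contain the array's bytes. It holds pointers to buffers that already live in memory, so the struct is not expected to resemble a serialized form of the data. The sketch below exports existing int64 values without copying them; `wrap_int64_values` is a hypothetical name, and it assumes the caller keeps the data alive until `release` has been called.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Struct layout as given in CDataInterface.rst. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Frees only the buffers array we allocated; the values belong to the caller. */
static void release_borrowed_int64(struct ArrowArray* array) {
  assert(array->release != NULL);
  free(array->buffers);
  array->release = NULL;
}

/* Hypothetical helper: wrap caller-owned values without copying.
   The caller must keep `data` alive until array->release has been called. */
int wrap_int64_values(const int64_t* data, int64_t nitems, struct ArrowArray* array) {
  const void** buffers = malloc(sizeof(void*) * 2);
  if (buffers == NULL) return -1;
  buffers[0] = NULL;  /* no validity bitmap: all values are valid */
  buffers[1] = data;  /* a pointer to the caller's memory, not a copy */
  *array = (struct ArrowArray){
      .length = nitems, .null_count = 0, .offset = 0,
      .n_buffers = 2, .n_children = 0, .buffers = buffers,
      .children = NULL, .dictionary = NULL,
      .release = &release_borrowed_int64,
  };
  return 0;
}
```

Because only pointers are exchanged, changes to the original memory are visible through the exported struct.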

I am also confused about how to use ArrowSchema properly. ArrowSchema is
the only structure that differentiates between ArrowArray formats.
However, I am not using it anywhere together with the ArrowArray struct,
nor in any kind of initialization that tells the Arrow library "the next
structure you will encounter is of the kind that this ArrowSchema
describes", and that doesn't seem correct to me.
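In the usage described in CDataInterface.rst, the schema is not registered ahead of time: the producer hands over both structs, and the consumer reads `ArrowSchema.format` to decide how to interpret the ArrowArray's buffers. A minimal C sketch of such a consumer, assuming an int64 column (format "l"); `sum_int64` is a hypothetical name. It also shows that `offset` applies to both the validity bitmap and the values buffer, and that validity bits are numbered least-significant bit first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Struct layouts as given in CDataInterface.rst. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical consumer: the schema says how to read the array's buffers.
   Returns 0 and writes the sum of non-null values, or -1 if the schema
   does not describe an int64 array. */
int sum_int64(const struct ArrowSchema* schema, const struct ArrowArray* array,
              int64_t* out) {
  if (strcmp(schema->format, "l") != 0 || array->n_buffers != 2) return -1;
  const uint8_t* validity = array->buffers[0];  /* may be NULL: all valid */
  const int64_t* values = array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) {
    int64_t j = array->offset + i;  /* offset applies to every buffer */
    if (validity != NULL && !((validity[j / 8] >> (j % 8)) & 1)) continue;  /* null */
    total += values[j];
  }
  *out = total;
  return 0;
}
```

The same ArrowArray would be read incorrectly (or rejected) if paired with a schema of a different format, which is why the two are always passed together.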

It would really help me out if you could tell me whether I misinterpreted
the doc or am doing something wrong. Thanks!


[jira] [Created] (ARROW-8257) Clarification regarding the `CDataInterface.rst`

2020-03-29 Thread Anish Biswas (Jira)
Anish Biswas created ARROW-8257:
---

 Summary: Clarification regarding the `CDataInterface.rst`
 Key: ARROW-8257
 URL: https://issues.apache.org/jira/browse/ARROW-8257
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Anish Biswas


I have been trying to wrap my head around the
[CDataInterface.rst|https://github.com/apache/arrow/blob/master/docs/source/format/CDataInterface.rst]
document for a few days now. What I am trying to do is use the C interface,
with minimal dependencies, to produce blocks of bytes that pyarrow can
reconstruct and work on as a normal pyarrow array (and vice versa: both
directions).

Here's what I already tried doing. 
 * Created a C library that contains the two structs ArrowSchema and ArrowArray
and some functions to export an int64_t array as an Arrow Array. This is very
similar to what the document does with int32_t arrays.
 * Imported the C library in Python. Created an int64_t pyarrow.array.
Serialized it to read the bytes via NumPy and populated the C struct I created
using the C library function.

What I expected was that the bytes would have some resemblance to each other 
and that pyarrow would have some utility to pick up the ArrowArray struct and 
treat it as an Arrow Array. But I couldn't get it to work. 

I am also confused about how to use {{ArrowSchema}} properly. {{ArrowSchema}}
is the only structure that differentiates between {{ArrowArray}} formats.
However, I am not using it anywhere together with the {{ArrowArray}} struct,
nor in any kind of initialization that tells the Arrow library "the next
structure you will encounter is of the kind that this {{ArrowSchema}}
describes", and that doesn't seem correct to me.

It would really help me out if you could tell me whether I misinterpreted the
doc or am doing something wrong. Thanks!


