@Wes McKinney <wesmck...@gmail.com> On a separate note, why would there be
2 pointers in the Buffer class, const uint8_t* data_ &  uint8_t*
mutable_data_? Can't we have 1 pointer and then use const_cast<> in
mutable_data() method (or vise versa)?

On Wed, Apr 21, 2021 at 10:09 AM Niranda Perera <niranda.per...@gmail.com>
wrote:

> Sorry, that's exactly what you've mentioned in the jira. :-) Please
> ignore!
>
> On Wed, Apr 21, 2021 at 10:07 AM Niranda Perera <niranda.per...@gmail.com>
> wrote:
>
>> @Wes, @Antoine,
>> As @Weston pointed out, it seems like the issue is here.
>>
>> https://github.com/apache/arrow/blob/37c27d1eaf0fa61281ad103c08a0251bb6883ec4/cpp/src/arrow/python/numpy_convert.cc#L51
>> When the numpy buffer's is_mutable_ marked as true, ideally,
>> *mutable_data_ should have also been set IMO.
>>
>> On Wed, Apr 21, 2021 at 10:00 AM Wes McKinney <wesmck...@gmail.com>
>> wrote:
>>
>>> Definitely a bug. I just opened
>>>
>>> https://issues.apache.org/jira/browse/ARROW-12495
>>>
>>> If there's a chance to get this into 4.0.0 this would be a nice one
>>> but I suspect the next RC is already under way (it need not block
>>> since this bug has been present a long time)
>>>
>>> On Wed, Apr 21, 2021 at 3:31 AM Antoine Pitrou <anto...@python.org>
>>> wrote:
>>> >
>>> >
>>> > It sounds like a bug if is_mutable_ is true but mutable_data_ is
>>> nullptr.
>>> >
>>> > Regards
>>> >
>>> > Antoine.
>>> >
>>> >
>>> > Le 21/04/2021 à 03:17, Weston Pace a écrit :
>>> > > If it comes from pandas (and is eligible for zero-copy) then the
>>> > > buffer implementation will be `NumPyBuffer`.  Printing one in GDB
>>> > > yields...
>>> > >
>>> > > ```
>>> > > $12 = {_vptr.Buffer = 0x7f0b66e147f8 <vtable for
>>> > > arrow::py::NumPyBuffer+16>, is_mutable_ = true, is_cpu_ = true, data_
>>> > > = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16, capacity_ =
>>> > > 16,
>>> > >    parent_ = {<std::__shared_ptr<arrow::Buffer,
>>> > > (__gnu_cxx::_Lock_policy)2>> =
>>> > > {<std::__shared_ptr_access<arrow::Buffer, (__gnu_cxx::_Lock_policy)2,
>>> > > false, false>> = {<No data fields>}, _M_ptr = 0x0,
>>> > >        _M_refcount = {_M_pi = 0x0}}, <No data fields>},
>>> > >    memory_manager_ = {<std::__shared_ptr<arrow::MemoryManager,
>>> > > (__gnu_cxx::_Lock_policy)2>> =
>>> > > {<std::__shared_ptr_access<arrow::MemoryManager,
>>> > > (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>},
>>> > >        _M_ptr = 0x55b71fdca4e0, _M_refcount = {_M_pi =
>>> > > 0x55b71fb90640}}, <No data fields>}}
>>> > > ```
>>> > >
>>> > > Notice that `is_cpu_` and `is_mutable_` are both `true`.  It's maybe
>>> a
>>> > > bug that `is_mutable_` is true.  Although maybe not as it appears to
>>> > > be telling whether the underlying numpy buffer itself is mutable or
>>> > > not...
>>> > >
>>> > > ```
>>> > > if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) {
>>> > >      is_mutable_ = true;
>>> > > }
>>> > > ```
>>> > >
>>> > >
>>> > > On Tue, Apr 20, 2021 at 2:15 PM Niranda Perera <
>>> niranda.per...@gmail.com> wrote:
>>> > >>
>>> > >> Hi all,
>>> > >>
>>> > >> We have been using Arrow v2.0.0 and we encountered the following
>>> issue.
>>> > >>
>>> > >> I was reading a table with numeric data using pandas.read_csv and
>>> then
>>> > >> converting it into pyarrow table. In our application (Cylon
>>> > >> <https://github.com/cylondata/cylon>), we are accessing this
>>> pyarrow table
>>> > >> from c++. We want to access the mutable data of the arrays in the
>>> pyarrow
>>> > >> table.
>>> > >>
>>> > >> But the following returns a nullptr.
>>> > >> T *mutable_data = array->data()->GetMutableValues<T>(1); // returns
>>> nullptr
>>> > >>
>>> > >> Interestingly,
>>> > >> array->data()->buffers[1]->IsMutable(); // returns true
>>> > >> array->data()->buffers[1]->IsCpu(); // returns true
>>> > >>
>>> > >> This only happens when I use pandas df to create a pyarrow table. It
>>> > >> wouldn't happen when I use pyarrow.read_csv. So, I am guessing
>>> there's some
>>> > >> issue in the buffer creation from pandas df.
>>> > >>
>>> > >> Is this an expected behavior? or has this been resolved in v2.0<
>>> releases?
>>> > >>
>>> > >> Best
>>> > >> --
>>> > >> Niranda Perera
>>> > >> https://niranda.dev/
>>> > >> @n1r44 <https://twitter.com/N1R44>
>>>
>>
>>
>> --
>> Niranda Perera
>> https://niranda.dev/
>> @n1r44 <https://twitter.com/N1R44>
>>
>>
>
> --
> Niranda Perera
> https://niranda.dev/
> @n1r44 <https://twitter.com/N1R44>
>
>

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>

Reply via email to