@Wes McKinney <wesmck...@gmail.com> On a separate note, why would there be 2 pointers in the Buffer class, const uint8_t* data_ & uint8_t* mutable_data_? Can't we have 1 pointer and then use const_cast<> in mutable_data() method (or vise versa)?
On Wed, Apr 21, 2021 at 10:09 AM Niranda Perera <niranda.per...@gmail.com> wrote: > Sorry, that's exactly what you've mentioned in the jira. :-) Please > ignore! > > On Wed, Apr 21, 2021 at 10:07 AM Niranda Perera <niranda.per...@gmail.com> > wrote: > >> @Wes, @Antoine, >> As @Weston pointed out, it seems like the issue is here. >> >> https://github.com/apache/arrow/blob/37c27d1eaf0fa61281ad103c08a0251bb6883ec4/cpp/src/arrow/python/numpy_convert.cc#L51 >> When the numpy buffer's is_mutable_ marked as true, ideally, >> *mutable_data_ should have also been set IMO. >> >> On Wed, Apr 21, 2021 at 10:00 AM Wes McKinney <wesmck...@gmail.com> >> wrote: >> >>> Definitely a bug. I just opened >>> >>> https://issues.apache.org/jira/browse/ARROW-12495 >>> >>> If there's a chance to get this into 4.0.0 this would be a nice one >>> but I suspect the next RC is already under way (it need not block >>> since this bug has been present a long time) >>> >>> On Wed, Apr 21, 2021 at 3:31 AM Antoine Pitrou <anto...@python.org> >>> wrote: >>> > >>> > >>> > It sounds like a bug if is_mutable_ is true but mutable_data_ is >>> nullptr. >>> > >>> > Regards >>> > >>> > Antoine. >>> > >>> > >>> > Le 21/04/2021 à 03:17, Weston Pace a écrit : >>> > > If it comes from pandas (and is eligible for zero-copy) then the >>> > > buffer implementation will be `NumPyBuffer`. Printing one in GDB >>> > > yields... >>> > > >>> > > ``` >>> > > $12 = {_vptr.Buffer = 0x7f0b66e147f8 <vtable for >>> > > arrow::py::NumPyBuffer+16>, is_mutable_ = true, is_cpu_ = true, data_ >>> > > = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16, capacity_ = >>> > > 16, >>> > > parent_ = {<std::__shared_ptr<arrow::Buffer, >>> > > (__gnu_cxx::_Lock_policy)2>> = >>> > > {<std::__shared_ptr_access<arrow::Buffer, (__gnu_cxx::_Lock_policy)2, >>> > > false, false>> = {<No data fields>}, _M_ptr = 0x0, >>> > > _M_refcount = {_M_pi = 0x0}}, <No data fields>}, >>> > > memory_manager_ = {<std::__shared_ptr<arrow::MemoryManager, >>> > > (__gnu_cxx::_Lock_policy)2>> = >>> > > {<std::__shared_ptr_access<arrow::MemoryManager, >>> > > (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, >>> > > _M_ptr = 0x55b71fdca4e0, _M_refcount = {_M_pi = >>> > > 0x55b71fb90640}}, <No data fields>}} >>> > > ``` >>> > > >>> > > Notice that `is_cpu_` and `is_mutable_` are both `true`. It's maybe >>> a >>> > > bug that `is_mutable_` is true. Although maybe not as it appears to >>> > > be telling whether the underlying numpy buffer itself is mutable or >>> > > not... >>> > > >>> > > ``` >>> > > if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) { >>> > > is_mutable_ = true; >>> > > } >>> > > ``` >>> > > >>> > > >>> > > On Tue, Apr 20, 2021 at 2:15 PM Niranda Perera < >>> niranda.per...@gmail.com> wrote: >>> > >> >>> > >> Hi all, >>> > >> >>> > >> We have been using Arrow v2.0.0 and we encountered the following >>> issue. >>> > >> >>> > >> I was reading a table with numeric data using pandas.read_csv and >>> then >>> > >> converting it into pyarrow table. In our application (Cylon >>> > >> <https://github.com/cylondata/cylon>), we are accessing this >>> pyarrow table >>> > >> from c++. We want to access the mutable data of the arrays in the >>> pyarrow >>> > >> table. >>> > >> >>> > >> But the following returns a nullptr. >>> > >> T *mutable_data = array->data()->GetMutableValues<T>(1); // returns >>> nullptr >>> > >> >>> > >> Interestingly, >>> > >> array->data()->buffers[1]->IsMutable(); // returns true >>> > >> array->data()->buffers[1]->IsCpu(); // returns true >>> > >> >>> > >> This only happens when I use pandas df to create a pyarrow table. It >>> > >> wouldn't happen when I use pyarrow.read_csv. So, I am guessing >>> there's some >>> > >> issue in the buffer creation from pandas df. >>> > >> >>> > >> Is this an expected behavior? or has this been resolved in v2.0< >>> releases? >>> > >> >>> > >> Best >>> > >> -- >>> > >> Niranda Perera >>> > >> https://niranda.dev/ >>> > >> @n1r44 <https://twitter.com/N1R44> >>> >> >> >> -- >> Niranda Perera >> https://niranda.dev/ >> @n1r44 <https://twitter.com/N1R44> >> >> > > -- > Niranda Perera > https://niranda.dev/ > @n1r44 <https://twitter.com/N1R44> > > -- Niranda Perera https://niranda.dev/ @n1r44 <https://twitter.com/N1R44>