amol- commented on a change in pull request #10101:
URL: https://github.com/apache/arrow/pull/10101#discussion_r618353079
##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -2229,6 +2229,11 @@ Status ConvertChunkedArrayToPandas(const PandasOptions&
options,
checked_cast<const DictionaryType&>(*arr->type()).value_type();
RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &arr));
DCHECK_NE(arr->type()->id(), Type::DICTIONARY);
+
+ // The original Python DictionaryArray won't own the memory anymore
+ // as we actually built a new array when we decoded the DictionaryArray
+ // thus let the final resulting numpy array own the memory through a
Capsule
+ py_ref = nullptr;
Review comment:
@pitrou would you be able to confirm that this is the right fix?
You can see the problem it deals with by checking out this branch,
commenting out that line and running the `test_dictionary_to_numpy` test.
You will see that in such case we produce corrupted data as the subsequent
`MakeNumPyView` used the original `DictionaryArray` as the "base" of the new
`numpy` array and the newly built `Array` resulting by dictionary decoding (who
really owns the memory for the numpy array) probably goes away immediately at
the end of this method. With this change instead a capsule is made and the
returned numpy array becomes the owner of that capsule.
Does that sound like the right solution?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]