pitrou commented on a change in pull request #9626:
URL: https://github.com/apache/arrow/pull/9626#discussion_r587322650

##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -2272,6 +2272,30 @@ def test_to_pandas(self):
         series = pd.Series(arr.to_pandas())
         tm.assert_series_equal(series, expected)
 
+    def test_to_pandas_multiple_chunks(self):
+        # ARROW-11855
+        bytes_start = pa.total_allocated_bytes()

Review comment:
Probably want to call `gc.collect()` just before this, to avoid false positives.

##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -689,7 +691,8 @@ Status ConvertStruct(PandasOptions options, const ChunkedArray& data,
         auto name = array_type->field(static_cast<int>(field_idx))->name();
         if (!arr->field(static_cast<int>(field_idx))->IsNull(i)) {
           // Value exists in child array, obtain it
-          auto array = reinterpret_cast<PyArrayObject*>(fields_data[field_idx].obj());
+          auto array = reinterpret_cast<PyArrayObject*>(
+              fields_data[field_idx + fields_data_offset].obj());

Review comment:
Does this mean that conversion could give the wrong results (in addition to being leaky)? If so, can you add a test showcasing that? (I believe you need the different chunks to be unequal...).
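
To illustrate why the second comment asks for unequal chunks, here is a minimal pure-Python sketch of the indexing pattern in the diff. It is a simplified model, not the Arrow implementation: the function `convert_struct`, its `use_offset` flag, and the assumption that `fields_data` is a flat list holding one child array per (chunk, field) pair, laid out chunk by chunk, are all hypothetical and introduced only for illustration.

```python
# Simplified pure-Python model of the indexing shown in the diff.
# Assumption (not confirmed by the source): fields_data is a flat list with
# one converted child array per (chunk, field) pair, laid out chunk by chunk.


def convert_struct(chunks, field_names, use_offset):
    """Convert chunked struct data to a list of row dicts.

    chunks: list of chunks; each chunk is a list of per-field value lists.
    use_offset: True models the patched indexing, False the original one.
    """
    num_fields = len(field_names)
    # Flatten child arrays: chunk 0 fields first, then chunk 1 fields, ...
    fields_data = [child for chunk in chunks for child in chunk]

    rows = []
    fields_data_offset = 0
    for chunk in chunks:
        chunk_length = len(chunk[0])
        for i in range(chunk_length):
            row = {}
            for field_idx, name in enumerate(field_names):
                offset = fields_data_offset if use_offset else 0
                row[name] = fields_data[field_idx + offset][i]
            rows.append(row)
        # Advance to the child arrays belonging to the next chunk.
        fields_data_offset += num_fields
    return rows


names = ["a", "b"]
equal_chunks = [[[1, 2], ["x", "y"]], [[1, 2], ["x", "y"]]]
unequal_chunks = [[[1, 2], ["x", "y"]], [[3, 4], ["z", "w"]]]

# Identical chunks hide the difference: both indexing schemes agree.
equal_match = (
    convert_struct(equal_chunks, names, use_offset=False)
    == convert_struct(equal_chunks, names, use_offset=True)
)

# Differing chunks expose it: without the offset, chunk 0 is read twice.
buggy = convert_struct(unequal_chunks, names, use_offset=False)
fixed = convert_struct(unequal_chunks, names, use_offset=True)
```

Under this model, a test whose chunks hold identical values cannot distinguish the two indexing schemes, which is presumably why the reviewer suggests unequal chunks. Whether the real conversion returned wrong values before the patch is the open question in the comment; the sketch only shows that it would if the layout matches the assumption above.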