[jira] [Commented] (ARROW-1973) [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.

ASF GitHub Bot (JIRA) Thu, 08 Feb 2018 10:58:11 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357411#comment-16357411 ]


ASF GitHub Bot commented on ARROW-1973:
---------------------------------------

cpcloud commented on a change in pull request #1578: ARROW-1973: [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.
URL: https://github.com/apache/arrow/pull/1578#discussion_r167034173
 
 

 ##########
 File path: cpp/src/arrow/python/arrow_to_pandas.cc
 ##########
 @@ -502,18 +502,20 @@ template <typename ArrowType>
 inline Status ConvertListsLike(PandasOptions options, const std::shared_ptr<Column>& col,
                                PyObject** out_values) {
   const ChunkedArray& data = *col->data().get();
-  auto list_type = std::static_pointer_cast<ListType>(col->type());
+  const auto& list_type = static_cast<const ListType&>(*col->type());
 
   // Get column of underlying value arrays
   std::vector<std::shared_ptr<Array>> value_arrays;
   for (int c = 0; c < data.num_chunks(); c++) {
-    auto arr = std::static_pointer_cast<ListArray>(data.chunk(c));
-    value_arrays.emplace_back(arr->values());
+    const auto& arr = static_cast<const ListArray&>(*data.chunk(c));
+    value_arrays.emplace_back(arr.values());
   }
-  auto flat_column = std::make_shared<Column>(list_type->value_field(), value_arrays);
+  auto flat_column = std::make_shared<Column>(list_type.value_field(), value_arrays);
   // TODO(ARROW-489): Currently we don't have a Python reference for single columns.
   //    Storing a reference to the whole Array would be too expensive.
-  PyObject* numpy_array;
+  OwnedRef owned_numpy_array;
 
 Review comment:
   Yep, thank you.
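The hunk makes two changes. It replaces the raw `PyObject* numpy_array` with an `OwnedRef`, an owning wrapper, which is the part relevant to the leak. It also replaces `std::static_pointer_cast` with casts on references, which avoids copying `shared_ptr`s. The sketch below illustrates both ideas in plain C++, without Python or Arrow. Every name in it (`FakePyObject`, `FakeDecRef`, `OwnedFakeRef`, `ConvertRaw`, `ConvertOwned`, `CheckExamples`) is hypothetical and invented for illustration. None of them is Arrow's `OwnedRef` or the CPython API. The early-return leak is also an assumed failure mode, because the excerpt does not show the rest of `ConvertListsLike`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a reference-counted Python object (NOT CPython's
// PyObject). It only tracks a manual refcount so that leaks are observable.
struct FakePyObject {
  long refcnt = 1;  // a freshly created object hands one reference to the caller
};

// Plays the role Py_DECREF plays in CPython, minus deallocation.
inline void FakeDecRef(FakePyObject* obj) {
  if (obj != nullptr) {
    --obj->refcnt;
  }
}

// Minimal RAII owner in the spirit of Arrow's OwnedRef (a sketch, not Arrow's
// class): it releases the reference it holds when it goes out of scope.
class OwnedFakeRef {
 public:
  OwnedFakeRef() = default;
  explicit OwnedFakeRef(FakePyObject* obj) : obj_(obj) {}
  OwnedFakeRef(const OwnedFakeRef&) = delete;
  OwnedFakeRef& operator=(const OwnedFakeRef&) = delete;
  ~OwnedFakeRef() { reset(nullptr); }

  // Drops the current reference (if any) and takes ownership of obj.
  void reset(FakePyObject* obj) {
    FakeDecRef(obj_);
    obj_ = obj;
  }
  FakePyObject* obj() const { return obj_; }

 private:
  FakePyObject* obj_ = nullptr;
};

// Raw-pointer style: any exit path that skips the decref leaks a reference.
bool ConvertRaw(FakePyObject* created, bool fail) {
  FakePyObject* numpy_array = created;
  if (fail) {
    return false;  // early return: numpy_array is never released
  }
  FakeDecRef(numpy_array);
  return true;
}

// RAII style: the destructor releases the reference on every exit path.
bool ConvertOwned(FakePyObject* created, bool fail) {
  OwnedFakeRef owned_numpy_array(created);
  if (fail) {
    return false;
  }
  return true;
}

struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {
  int value = 7;
};

// std::static_pointer_cast returns a new shared_ptr that shares ownership,
// so the (atomic) use count goes up while that copy is alive.
long UseCountWithPointerCast() {
  std::shared_ptr<Base> p = std::make_shared<Derived>();
  std::shared_ptr<Derived> derived = std::static_pointer_cast<Derived>(p);
  return p.use_count();  // p and derived both own the object
}

// Casting the pointee to a reference only borrows it; ownership is unchanged.
long UseCountWithReferenceCast() {
  std::shared_ptr<Base> p = std::make_shared<Derived>();
  const auto& derived = static_cast<const Derived&>(*p);
  (void)derived.value;
  return p.use_count();
}

// Checks the behaviour described in the comments above.
void CheckExamples() {
  FakePyObject leaked;
  ConvertRaw(&leaked, /*fail=*/true);
  assert(leaked.refcnt == 1);  // reference still outstanding: a leak

  FakePyObject released;
  ConvertOwned(&released, /*fail=*/true);
  assert(released.refcnt == 0);  // released despite the early return

  assert(UseCountWithPointerCast() == 2);
  assert(UseCountWithReferenceCast() == 1);
}
```

In the real extension, the owned object would be a NumPy array created through the CPython API. The sketch makes one point: with an owner object, releasing the reference is tied to scope instead of relying on a decref being written on every path. The cast change does not affect leaks. It only avoids temporary reference-count increments.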

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1973
>                 URL: https://issues.apache.org/jira/browse/ARROW-1973
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.8.0
>         Environment: Linux Mint 18.2
> Anaconda Python distribution + pyarrow installed from the conda-forge channel
>            Reporter: Alexey Strokach
>            Assignee: Phillip Cloud
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> There appears to be a memory leak when using PyArrow to convert tables containing array columns to Pandas DataFrames.
> See the `test_memory_leak.py` example here: https://gitlab.com/ostrokach/pyarrow_duplicate_column_errors



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
