[jira] [Commented] (ARROW-1754) [Python] Fix buggy Parquet roundtrip when an index name is the same as a column name

ASF GitHub Bot (JIRA) Wed, 17 Jan 2018 16:04:48 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329725#comment-16329725 ]


ASF GitHub Bot commented on ARROW-1754:
---------------------------------------

cpcloud commented on a change in pull request #1408: ARROW-1754: [Python] 
alternative fix for duplicate index/column name that preserves index name if 
available
URL: https://github.com/apache/arrow/pull/1408#discussion_r162216610
 
 

 ##########
 File path: python/pyarrow/pandas_compat.py
 ##########
 @@ -294,9 +288,29 @@ def _column_name_to_strings(name):
     return str(name)
 
 
+def _index_level_name(index, i, column_names):
+    """Return the name of an index level or a default name if `index.name` is
+    None or is already a column name.
+
+    Parameters
+    ----------
+    index : pandas.Index
+    i : int
+
+    Returns
+    -------
+    name : str
+    """
+    if index.name is not None and index.name not in column_names:
 
 Review comment:
   Should we be concerned about the linear search in `index.name not in column_names`?
If so, let's build a set of the column names once, outside the loop below, and
check against that, so that we don't do a full scan of the column names for
every index level.
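
A minimal sketch of the suggestion, not the code that was merged. The names `FakeIndex`, `index_level_name`, and `index_level_names` are hypothetical, and the `__index_level_{i}__` default is an assumption, since the quoted hunk stops before the fallback branch. It also assumes column names are hashable.

```python
from collections import namedtuple

# Stand-in for a pandas.Index level; only the `.name` attribute matters here.
FakeIndex = namedtuple("FakeIndex", ["name"])


def index_level_name(index, i, column_names):
    """Return index.name, or a default if it is None or already a column name.

    `column_names` is expected to be a set, so the membership test does not
    scan every column name.
    """
    if index.name is not None and index.name not in column_names:
        return index.name
    # Assumed default-name pattern; the hunk above does not show the real one.
    return "__index_level_{:d}__".format(i)


def index_level_names(index_levels, columns):
    # Build the set once, outside the loop, rather than searching the
    # list of column names again for every index level.
    column_name_set = set(columns)
    return [
        index_level_name(level, i, column_name_set)
        for i, level in enumerate(index_levels)
    ]


names = index_level_names(
    [FakeIndex("a"), FakeIndex(None), FakeIndex("key")],
    ["a", "b", "c"],
)
print(names)
```

With a list, the total cost is roughly (number of index levels) x (number of columns); with the set it is one pass to build the set plus one hash lookup per level. The difference only matters for wide frames with many index levels.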

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Fix buggy Parquet roundtrip when an index name is the same as a 
> column name
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-1754
>                 URL: https://issues.apache.org/jira/browse/ARROW-1754
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Wes McKinney
>            Assignee: Phillip Cloud
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> See upstream report 
> https://stackoverflow.com/questions/47013052/issue-with-pyarrow-when-loading-parquet-file-where-index-has-redundant-column



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
