[ https://issues.apache.org/jira/browse/SPARK-31441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-31441.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28210
[https://github.com/apache/spark/pull/28210]

> Support duplicated column names for toPandas with Arrow execution.
> ------------------------------------------------------------------
>
>                 Key: SPARK-31441
>                 URL: https://issues.apache.org/jira/browse/SPARK-31441
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>             Fix For: 3.0.0
>
> When we execute {{toPandas()}} with Arrow execution, it fails if the column
> names have duplicates.
> {code:python}
> >>> spark.sql("select 1 v, 1 v").toPandas()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/path/to/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 2132, in toPandas
>     pdf = table.to_pandas()
>   File "pyarrow/array.pxi", line 441, in pyarrow.lib._PandasConvertible.to_pandas
>   File "pyarrow/table.pxi", line 1367, in pyarrow.lib.Table._to_pandas
>   File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.7/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 653, in table_to_blockmanager
>     columns = _deserialize_column_index(table, all_columns, column_indexes)
>   File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 704, in _deserialize_column_index
>     columns = _flatten_single_level_multiindex(columns)
>   File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 937, in _flatten_single_level_multiindex
>     raise ValueError('Found non-unique column index')
> ValueError: Found non-unique column index
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
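On the affected versions (before the fix in 3.0.0), a common workaround is to rename the columns to unique names before calling {{toPandas()}}, so Arrow never sees a duplicated column index. The {{dedupe_names}} helper below is an illustrative sketch, not part of the Spark or pyarrow API; the renamed frame can then be converted and, if desired, the original names restored on the pandas side.

```python
def dedupe_names(names):
    """Return a copy of `names` where every repeated name gets a
    positional suffix, so the result contains only unique names.

    e.g. ["v", "v"] -> ["v", "v_1"]
    """
    seen = {}  # name -> how many times we have emitted it so far
    out = []
    for name in names:
        count = seen.get(name, 0)
        out.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return out
```

With a PySpark DataFrame {{df}}, the workaround would look like {{df.toDF(*dedupe_names(df.columns)).toPandas()}}; {{DataFrame.toDF}} takes the new column names positionally, so the schema order is preserved.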