[ 
https://issues.apache.org/jira/browse/SPARK-44545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Dmitriev updated SPARK-44545:
------------------------------------
    Description: 
I have a DataFrame with non-unique column names:

 
{code:python}
df = spark_session.createDataFrame([[1, 2, 3], [4, 5, 6]], ['a', 'a', 'c'])
{code}
 

It works fine.

 

Now I try to get a column with a non-unique name by index:
{code:python}
df[0]
{code}
Expectation: it returns the first of the columns.

Note that the
[doc|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.__getitem__.html]
does not list unique column names as a precondition.

 

Actual result:

It throws an exception:

 
{noformat}
AnalysisException                         Traceback (most recent call last)
Cell In[71], line 1
----> 1 df[0]

File /usr/local/spark/python/pyspark/sql/dataframe.py:2935, in DataFrame.__getitem__(self, item)
   2933     return self.select(*item)
   2934 elif isinstance(item, int):
-> 2935     jc = self._jdf.apply(self.columns[item])
   2936     return Column(jc)
   2937 else:

File /usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /usr/local/spark/python/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [AMBIGUOUS_REFERENCE] Reference `a` is ambiguous, could be: [`a`, `a`].
{noformat}
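
The traceback suggests that the integer index is first translated to a column name (`self.columns[item]`) and only then resolved by name, which would explain why a duplicated name fails. A possible workaround, sketched under assumptions rather than verified against every Spark version, is to give the columns unique names positionally before indexing. The helper `make_unique` below is hypothetical (it is not part of PySpark); `DataFrame.toDF` is the existing PySpark method that renames columns by position.

```python
from collections import Counter


def make_unique(names):
    """Return a copy of `names` where repeated names get a numeric suffix.

    Hypothetical helper for illustration; not part of PySpark.
    """
    totals = Counter(names)   # how often each name occurs
    used = set(names)         # names that are already taken
    seen = Counter()          # next suffix to try for each repeated name
    result = []
    for name in names:
        if totals[name] == 1:
            # Unique names are kept unchanged.
            result.append(name)
            continue
        candidate = f"{name}_{seen[name]}"
        seen[name] += 1
        # Skip suffixes that would collide with an existing name.
        while candidate in used:
            candidate = f"{name}_{seen[name]}"
            seen[name] += 1
        used.add(candidate)
        result.append(candidate)
    return result


# Usage with the DataFrame from this report (needs a running Spark session,
# so it is left commented out here):
# renamed = df.toDF(*make_unique(df.columns))
# first = renamed[0]  # expected to resolve, since the names no longer clash
```

This only sidesteps the ambiguity by renaming; it does not change how `df[0]` itself resolves columns.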
 


> It's impossible to get column by index if names are not unique
> --------------------------------------------------------------
>
>                 Key: SPARK-44545
>                 URL: https://issues.apache.org/jira/browse/SPARK-44545
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>         Environment: I have python 3.11, pyspark 3.4.0
>            Reporter: Alexey Dmitriev
>            Priority: Major


