[ https://issues.apache.org/jira/browse/SPARK-44545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Dmitriev updated SPARK-44545:
------------------------------------
Description:

So, I have a DataFrame with non-unique column names:

{code:java}
df = spark_session.createDataFrame([[1, 2, 3], [4, 5, 6]], ['a', 'a', 'c'])
{code}

Creating it works fine. Now I try to get a column with a non-unique name by index:

{code:java}
df[0]
{code}

Expectation: it returns the first of the columns. Note that the [doc|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.__getitem__.html] does not mention unique names as a precondition.

Actual result: it throws an exception:

{noformat}
AnalysisException                          Traceback (most recent call last)
Cell In[71], line 1
----> 1 df[0]

File /usr/local/spark/python/pyspark/sql/dataframe.py:2935, in DataFrame.__getitem__(self, item)
   2933     return self.select(*item)
   2934 elif isinstance(item, int):
-> 2935     jc = self._jdf.apply(self.columns[item])
   2936     return Column(jc)
   2937 else:

File /usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /usr/local/spark/python/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [AMBIGUOUS_REFERENCE] Reference `a` is ambiguous, could be: [`a`, `a`].
{noformat}

> It's impossible to get column by index if names are not unique
> --------------------------------------------------------------
>
>                 Key: SPARK-44545
>                 URL: https://issues.apache.org/jira/browse/SPARK-44545
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>         Environment: I have python 3.11, pyspark 3.4.0
>            Reporter: Alexey Dmitriev
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
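As the traceback shows, `df[0]` translates the integer index into a column *name* (`self.columns[item]`) before resolving it, so any duplicate name becomes ambiguous. Until that is changed, one possible workaround is to rename the columns to unique names with `DataFrame.toDF` before indexing. The `dedupe_columns` helper below is a hypothetical sketch, not part of PySpark:

```python
def dedupe_columns(names):
    """Return a copy of `names` where repeated names get a numeric suffix,
    e.g. ['a', 'a', 'c'] -> ['a', 'a_1', 'c'].

    Note: a sketch only; it does not guard against an input that already
    contains a suffixed name such as 'a_1'."""
    seen = {}
    out = []
    for name in names:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones get _1, _2, ...
        out.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return out


# With a live Spark session, integer indexing then resolves unambiguously:
# df = spark_session.createDataFrame([[1, 2, 3], [4, 5, 6]], ['a', 'a', 'c'])
# unique_df = df.toDF(*dedupe_columns(df.columns))
# unique_df[0]  # no AMBIGUOUS_REFERENCE, since every name is now unique
```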