[ https://issues.apache.org/jira/browse/SPARK-42444 ]

Sean R. Owen deleted comment on SPARK-42444:
--------------------------------------

    was (Author: JIRAUSER295111):
Thank you for sharing. [Azure Solution Architect Training|https://www.igmguru.com/cloud-computing/microsoft-azure-solution-architect-az-300-training/] has been designed for software developers who are keen on developing best-in-class applications using this open and advanced platform of Windows Azure.

> DataFrame.drop should handle multi columns properly
> ---------------------------------------------------
>
>                 Key: SPARK-42444
>                 URL: https://issues.apache.org/jira/browse/SPARK-42444
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---------------------------------------------------------------------------
> AnalysisException                         Traceback (most recent call last)
> Cell In[1], line 4
>       2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>       3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
> ----> 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
>
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in DataFrame.drop(self, *cols)
>    4911 jcols = [_to_java_column(c) for c in cols]
>    4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>    4915 return DataFrame(jdf, self.sparkSession)
>
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
>
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in capture_sql_exception.<locals>.deco(*a, **kw)
>     155 converted = convert_exception(e.java_exception)
>     156 if not isinstance(converted, UnknownException):
>     157     # Hide where the exception came from that shows a non-Pythonic
>     158     # JVM exception message.
> --> 159     raise converted from None
>     160 else:
>     161     raise
>
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
> {code}
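
For readers following the report, the difference between the 3.3 result and the 3.4 failure can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's implementation: the `Schema` type and the functions `resolve`, `drop_by_name`, and `drop_by_resolved_column` are hypothetical names invented here. It assumes, based on the traceback line `jcols = [_to_java_column(c) for c in cols]`, that the failure comes from turning each string into a column reference that must resolve to exactly one column, whereas the 3.3 output suggests that dropping by name removes every column carrying that name.

```python
from typing import List, Tuple

# Toy schema of the joined frame: (source frame, column name) in output order.
# This is an illustration only; Spark tracks columns very differently.
Schema = List[Tuple[str, str]]


class AmbiguousReference(Exception):
    """Hypothetical stand-in for the AMBIGUOUS_REFERENCE analysis error."""


def resolve(schema: Schema, name: str) -> Tuple[str, str]:
    """Resolve a name to exactly one column, as a column reference must."""
    matches = [col for col in schema if col[1] == name]
    if len(matches) > 1:
        raise AmbiguousReference(f"Reference `{name}` is ambiguous")
    if not matches:
        raise KeyError(name)
    return matches[0]


def drop_by_name(schema: Schema, *names: str) -> Schema:
    """Drop every column whose name is listed (the behaviour seen in 3.3)."""
    return [col for col in schema if col[1] not in names]


def drop_by_resolved_column(schema: Schema, *names: str) -> Schema:
    """Resolve each name to a single column first, then drop those columns."""
    targets = [resolve(schema, name) for name in names]
    return [col for col in schema if col not in targets]


# Schema after df1.join(df2, df1.name == df2.name, 'inner'): `name` appears twice.
joined: Schema = [
    ("df1", "age"),
    ("df1", "name"),
    ("df2", "height"),
    ("df2", "name"),
]

remaining = drop_by_name(joined, "name", "age")

try:
    drop_by_resolved_column(joined, "name", "age")
    resolved_drop_failed = False
except AmbiguousReference:
    resolved_drop_failed = True
```

In this model `drop_by_name` leaves only the `height` column, matching the 3.3 output quoted in the issue, while `drop_by_resolved_column` fails on the duplicated `name`, mirroring the 3.4 error.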