[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692467#comment-17692467 ]

Apache Spark commented on SPARK-42444:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40135

> DataFrame.drop should handle multi columns properly
> ---------------------------------------------------
>
>                 Key: SPARK-42444
>                 URL: https://issues.apache.org/jira/browse/SPARK-42444
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---------------------------------------------------------------------------
> AnalysisException                         Traceback (most recent call last)
> Cell In[1], line 4
>       2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>       3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
> ----> 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
>
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in DataFrame.drop(self, *cols)
>    4911 jcols = [_to_java_column(c) for c in cols]
>    4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>    4915 return DataFrame(jdf, self.sparkSession)
>
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
>
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in capture_sql_exception.<locals>.deco(*a, **kw)
>     155 converted = convert_exception(e.java_exception)
>     156 if not isinstance(converted, UnknownException):
>     157     # Hide where the exception came from that shows a non-Pythonic
>     158     # JVM exception message.
> --> 159     raise converted from None
>     160 else:
>     161     raise
>
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
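The regression in the report above appears to come down to how a bare column name is resolved once a join has produced duplicate names: in 3.3, `drop('name')` removes every column called `name`, while in 3.4 the Python side first converts each argument to a Column (`_to_java_column` in the traceback), and resolving the string `name` against a schema holding two `name` columns raises `AMBIGUOUS_REFERENCE`. A minimal pure-Python sketch of the two behaviours (a toy model, not Spark internals; the `(origin, name)` schema representation is invented for illustration):

```python
# Toy model of the two drop behaviours, NOT Spark internals.
# After a join the schema can hold duplicate column names; a column is
# modelled here as an (origin, name) pair.

def drop_by_string(schema, *names):
    """3.3-style drop: silently remove every column whose name matches."""
    return [col for col in schema if col[1] not in names]

def drop_via_resolution(schema, *names):
    """3.4-style drop: each name is first resolved to a single column,
    which fails as soon as a name occurs more than once."""
    resolved = []
    for name in names:
        matches = [col for col in schema if col[1] == name]
        if len(matches) > 1:
            raise ValueError(
                f"[AMBIGUOUS_REFERENCE] Reference `{name}` is ambiguous"
            )
        resolved.extend(matches)
    return [col for col in schema if col not in resolved]

# Schema of df1.join(df2, ...): two columns named "name".
joined = [("df1", "age"), ("df1", "name"), ("df2", "height"), ("df2", "name")]

print(drop_by_string(joined, "name", "age"))   # -> [('df2', 'height')]
# drop_via_resolution(joined, "name", "age")   # raises AMBIGUOUS_REFERENCE
```

Until the fix lands, a workaround in real PySpark is to pass Column objects (e.g. `df1.name`, `df2.name`) to `drop`, which sidesteps string resolution entirely.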
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692466#comment-17692466 ]

Apache Spark commented on SPARK-42444:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40135
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692442#comment-17692442 ]

Ruifeng Zheng commented on SPARK-42444:
---------------------------------------

I am going to fix this one.
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691489#comment-17691489 ]

Pankaj Nagla commented on SPARK-42444:
--------------------------------------

Thank you for sharing. [Azure Solution Architect Training|https://www.igmguru.com/cloud-computing/microsoft-azure-solution-architect-az-300-training/] has been designed for software developers who are keen on developing best-in-class applications using this open and advanced platform of Windows Azure.