[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-23 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42444:
-

Assignee: Ruifeng Zheng

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Blocker
>
> {code:java}
> from pyspark.sql import Row
> df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
> df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
> {code}
> This works in 3.3
> {code:java}
> +------+
> |height|
> +------+
> |    85|
> |    80|
> +------+
> {code}
> but fails in 3.4
> {code:java}
> ---------------------------------------------------------------------------
> AnalysisException                         Traceback (most recent call last)
> Cell In[1], line 4
>       2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>       3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
> ----> 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
>
> File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in DataFrame.drop(self, *cols)
>    4911 jcols = [_to_java_column(c) for c in cols]
>    4912 first_column, *remaining_columns = jcols
> -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
>    4915 return DataFrame(jdf, self.sparkSession)
>
> File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
>    1316 command = proto.CALL_COMMAND_NAME +\
>    1317     self.command_header +\
>    1318     args_command +\
>    1319     proto.END_COMMAND_PART
>    1321 answer = self.gateway_client.send_command(command)
> -> 1322 return_value = get_return_value(
>    1323     answer, self.gateway_client, self.target_id, self.name)
>    1325 for temp_arg in temp_args:
>    1326     if hasattr(temp_arg, "_detach"):
>
> File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in capture_sql_exception.<locals>.deco(*a, **kw)
>     155 converted = convert_exception(e.java_exception)
>     156 if not isinstance(converted, UnknownException):
>     157     # Hide where the exception came from that shows a non-Pythonic
>     158     # JVM exception message.
> --> 159     raise converted from None
>     160 else:
>     161     raise
>
> AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
> {code}
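The traceback hints at why the two versions behave differently: in 3.4, `DataFrame.drop` turns every string into a column with `_to_java_column` and calls the JVM `drop(first_column, ...)` overload, which must resolve each reference to a single column. Dropping by plain name, which appears to be what 3.3 did given the output above, simply removes every column carrying that name. The toy model below is a pure-Python sketch of that difference under those assumptions, not Spark code; `Attr`, `JOINED`, `drop_by_name`, `drop_by_reference` and `AmbiguousReference` are hypothetical names, and the join output order (`age`, `name`, `height`, `name`) is assumed.

```python
from dataclasses import dataclass
from typing import List, Sequence


class AmbiguousReference(Exception):
    """Hypothetical stand-in for AnalysisException [AMBIGUOUS_REFERENCE]."""


@dataclass(frozen=True)
class Attr:
    """A column in a DataFrame's output: its name and the input it came from."""
    name: str
    source: str


# Assumed output of df1.join(df2, df1.name == df2.name): df1's columns, then df2's.
JOINED = [
    Attr("age", "df1"),
    Attr("name", "df1"),
    Attr("height", "df2"),
    Attr("name", "df2"),
]


def drop_by_name(output: Sequence[Attr], names: Sequence[str]) -> List[Attr]:
    """Name matching: remove every column whose name is listed, duplicates included."""
    wanted = set(names)
    return [a for a in output if a.name not in wanted]


def drop_by_reference(output: Sequence[Attr], names: Sequence[str]) -> List[Attr]:
    """Reference resolution: each name must identify at most one column."""
    to_remove = set()
    for name in names:
        matches = [a for a in output if a.name == name]
        if len(matches) > 1:
            candidates = ", ".join(f"`{m.name}`" for m in matches)
            raise AmbiguousReference(
                f"Reference `{name}` is ambiguous, could be: [{candidates}]."
            )
        # Zero matches is treated as a no-op in this model.
        to_remove.update(matches)
    return [a for a in output if a not in to_remove]


if __name__ == "__main__":
    print([a.name for a in drop_by_name(JOINED, ["name", "age"])])  # ['height']
    try:
        drop_by_reference(JOINED, ["name", "age"])
    except AmbiguousReference as err:
        print(err)  # Reference `name` is ambiguous, could be: [`name`, `name`].
```

The model keys columns only by name and origin; it does not try to reproduce Spark's attribute resolution or the eventual fix for SPARK-42444.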



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42444:


Assignee: (was: Apache Spark)

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Blocker
>






[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42444:


Assignee: Apache Spark

> DataFrame.drop should handle multi columns properly
> ---
>
> Key: SPARK-42444
> URL: https://issues.apache.org/jira/browse/SPARK-42444
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Blocker
>


