[jira] [Assigned] (SPARK-39942) The input parameter of nsmallest should be validated as Integer

Apache Spark (Jira) Mon, 01 Aug 2022 19:35:05 -0700


     [ https://issues.apache.org/jira/browse/SPARK-39942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Apache Spark reassigned SPARK-39942:
------------------------------------

    Assignee:     (was: Apache Spark)

> The input parameter of nsmallest should be validated as Integer
> ---------------------------------------------------------------
>
>                 Key: SPARK-39942
>                 URL: https://issues.apache.org/jira/browse/SPARK-39942
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.2.2
>         Environment: PySpark: Master
>            Reporter: bo zhao
>            Priority: Minor
>
> The input parameter of nsmallest should be validated as an integer, but this
> validation appears to be missing.
> As a result, PySpark raises an error from inside query analysis when an
> unexpected type (such as a boolean) is passed. For example:
>  
> PySpark:
> {code:python}
> >>> import pyspark.pandas as ps
> >>> df = ps.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
> >>> df.groupby(['A'])['B'].nsmallest(1)
>  A    
> 1  0    3 
> 2  1    4 
> 3  2    5 
> 4  3    6 
> Name: B, dtype: int64
> >>> df.groupby(['A'])['B'].nsmallest(True)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/spark/spark/python/pyspark/pandas/groupby.py", line 3598, in nsmallest
>     sdf.withColumn(temp_rank_column, F.row_number().over(window))
>   File "/home/spark/spark/python/pyspark/sql/dataframe.py", line 2129, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File "/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
>     return_value = get_return_value(
>   File "/home/spark/spark/python/pyspark/sql/utils.py", line 196, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: cannot resolve '(__rank__ <= true)' due to data type mismatch: differing types in '(__rank__ <= true)' (int and boolean).;
> 'Filter (__rank__#4995 <= true)
> +- Project [__index_level_0__#4988L, __index_level_1__#4989L, B#4979L, __natural_order__#4983L, __rank__#4995]
>    +- Project [__index_level_0__#4988L, __index_level_1__#4989L, B#4979L, __natural_order__#4983L, __rank__#4995, __rank__#4995]
>       +- Window [row_number() windowspecdefinition(__index_level_0__#4988L, B#4979L ASC NULLS FIRST, __natural_order__#4983L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS __rank__#4995], [__index_level_0__#4988L], [B#4979L ASC NULLS FIRST, __natural_order__#4983L ASC NULLS FIRST]
>          +- Project [__index_level_0__#4988L, __index_level_1__#4989L, B#4979L, __natural_order__#4983L]
>             +- Project [A#4978L AS __index_level_0__#4988L, __index_level_0__#4977L AS __index_level_1__#4989L, B#4979L, __natural_order__#4983L]
>                +- Project [__index_level_0__#4977L, A#4978L, B#4979L, monotonically_increasing_id() AS __natural_order__#4983L]
>                   +- LogicalRDD [__index_level_0__#4977L, A#4978L, B#4979L], false
> {code}
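
One way the requested validation could look is sketched below. This is only an illustration, not the actual Spark patch: the helper name validate_n and its error message are hypothetical. It shows why a plain isinstance(n, int) check would not be enough for the case reported above, since bool is a subclass of int in Python and True would pass such a check.

```python
import numbers


def validate_n(n):
    """Hypothetical helper: accept only true integers for `n`.

    bool is a subclass of int in Python, so it must be rejected
    explicitly; otherwise True would be accepted as 1.
    """
    if isinstance(n, bool) or not isinstance(n, numbers.Integral):
        raise TypeError(
            "n must be an integer, but got {}".format(type(n).__name__)
        )
    return int(n)


if __name__ == "__main__":
    print(validate_n(1))
    try:
        validate_n(True)
    except TypeError as exc:
        print("rejected:", exc)
```

Calling such a check at the start of nsmallest would fail early with a TypeError, before any Spark plan is built, rather than surfacing the AnalysisException shown in the traceback.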



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
