[jira] [Commented] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

Apache Spark (Jira) Tue, 21 Jun 2022 16:48:06 -0700


    [ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557160#comment-17557160 ]


Apache Spark commented on SPARK-38614:
--------------------------------------

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/36951

> After Spark update, df.show() shows incorrect F.percent_rank results
> --------------------------------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0
>            Reporter: ZygD
>            Priority: Major
>              Labels: correctness
>
> The expected result is obtained with Spark 3.1.2, but not with 3.2.0, 3.2.1, or 3.3.0.
> *Minimal reproducible example*
> {code:python}
> from pyspark.sql import SparkSession, functions as F, Window as W
>
> spark = SparkSession.builder.getOrCreate()
>
> # 101 rows (id 0..100), so percent_rank over the whole frame should step by 0.01.
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5)
> {code}
> *Expected result*
> {code}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code}
> +---+------------------+
> | id|                pr|
> +---+------------------+
> |  0|               0.0|
> |  1|0.3333333333333333|
> |  2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}
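
The reported numbers can be checked against the standard percent_rank definition, (rank - 1) / (rows in partition - 1). The sketch below is plain Python with no Spark dependency, and its helper names are hypothetical. It suggests that the expected output corresponds to a 101-row window, while the actual output corresponds to windows of only 4 and 6 rows, one more than the count passed to show(). This is an inference from the output in the report, not a cause stated in the issue.

```python
def percent_rank(rank, n_rows):
    """Standard definition: (rank - 1) / (n_rows - 1); 0.0 for a single-row partition."""
    if n_rows <= 1:
        return 0.0
    return (rank - 1) / (n_rows - 1)


def first_values(n_rows, shown):
    """percent_rank of the first `shown` rows in an ordered window of `n_rows` rows.

    Values are rounded to 4 decimals for readability only.
    """
    return [round(percent_rank(rank, n_rows), 4) for rank in range(1, shown + 1)]


# Window over all 101 rows: matches the expected result.
print(first_values(101, 3))  # [0.0, 0.01, 0.02]

# Windows over only 4 and 6 rows: match the actual result of show(3) and show(5).
print(first_values(4, 3))    # [0.0, 0.3333, 0.6667]
print(first_values(6, 5))    # [0.0, 0.2, 0.4, 0.6, 0.8]
```

If this reading is right, the window function saw a row-limited input rather than the full DataFrame, which would explain why the values change with the argument to show().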



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

