[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557160#comment-17557160 ]
Apache Spark commented on SPARK-38614:
--------------------------------------

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/36951

> After Spark update, df.show() shows incorrect F.percent_rank results
> --------------------------------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0
>            Reporter: ZygD
>            Priority: Major
>              Labels: correctness
>
> Expected result is obtained using Spark 3.1.2, but not 3.2.0, 3.2.1 or 3.3.0.
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5) {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+------------------+
> | id|                pr|
> +---+------------------+
> |  0|               0.0|
> |  1|0.3333333333333333|
> |  2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
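
For readers checking the numbers in the quoted report, here is a minimal plain-Python sketch of the percent_rank arithmetic, (rank - 1) / (n - 1), over one ordered partition. It is not Spark code. The helper name `percent_rank` is hypothetical and assumes the input values are distinct and already sorted. Over all 101 rows it gives the expected 0.0, 0.01, 0.02. The reported wrong values happen to match the same formula applied to only 4 rows (for show(3)) and 6 rows (for show(5)). That is an arithmetic observation only; the report itself does not state a cause.

```python
def percent_rank(values):
    """Return (rank - 1) / (n - 1) for each row of one ordered partition.

    Assumption: `values` are distinct and already sorted ascending, so the
    rank of the element at index i is i + 1. A partition with a single row
    gets 0.0.
    """
    n = len(values)
    if n <= 1:
        return [0.0] * n
    return [i / (n - 1) for i in range(n)]


# Window evaluated over all 101 rows, then truncated for display.
full = percent_rank(list(range(101)))
print(full[:3])   # values shown under "Expected result" for df.show(3)
print(full[:5])   # values shown under "Expected result" for df.show(5)

# Same formula over only 4 and 6 rows; these match the "Actual result" tables.
print(percent_rank(list(range(4)))[:3])
print(percent_rank(list(range(6)))[:5])
```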