[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZygD updated SPARK-38614:
-------------------------
    Description:
*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows
+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows{code}

  was:
*Minimal reproducible example*
```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```
*Expected result*
```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```
*Actual result*
```none
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows
+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```

> df.show(3) does not equal df.show() first rows
> ----------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: ZygD
>            Priority: Major
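
A possible reading of the numbers in this report, offered as an assumption rather than a confirmed diagnosis: `percent_rank` is defined as (rank - 1) / (rows in the window partition - 1), and the actual values match what that formula gives if the window saw only 4 rows for `df.show(3)` and 6 rows for `df.show(5)`, instead of all 101. The sketch below is plain Python, not Spark; `percent_rank_sorted` is a hypothetical helper written only to check the arithmetic, and the "numRows + 1 rows" interpretation is not stated anywhere in the report.

```python
def percent_rank_sorted(n_rows):
    """percent_rank for n_rows distinct, already sorted values.

    Uses the standard definition (rank - 1) / (n_rows - 1).
    Hypothetical helper for illustration; this is not Spark code.
    """
    if n_rows < 2:
        return [0.0] * n_rows
    return [(rank - 1) / (n_rows - 1) for rank in range(1, n_rows + 1)]


# Expected: the window covers all 101 rows produced by spark.range(101).
expected_first5 = percent_rank_sorted(101)[:5]

# Assumption: the window is evaluated over only numRows + 1 rows,
# i.e. 4 rows for show(3) and 6 rows for show(5).
observed_show3 = percent_rank_sorted(4)[:3]
observed_show5 = percent_rank_sorted(6)[:5]

print(expected_first5)
print(observed_show3)
print(observed_show5)
```

If this reading is right, the row limit is taking effect before the window function is evaluated rather than after it, which would explain why the two `show` calls disagree with each other as well as with the expected result.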