[ https://issues.apache.org/jira/browse/SPARK-33057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215468#comment-17215468 ]
Li Jin commented on SPARK-33057:
--------------------------------

I agree this is an improvement rather than a bug, although I am not sure why this is marked "Won't fix"/"Invalid". Yes, the second snippet adds a projection and the first one doesn't. However, I think they are logically doing the same thing: filtering based on the output of a window operation. Whether the output of the window operation is assigned to a new column or not doesn't seem to change the logical meaning.

> Cannot use filter with window operations
> ----------------------------------------
>
>                 Key: SPARK-33057
>                 URL: https://issues.apache.org/jira/browse/SPARK-33057
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Li Jin
>            Priority: Major
>
> Currently, trying to use filter with a window operation fails:
>
> {code:java}
> df = spark.range(100)
> win = Window.partitionBy().orderBy('id')
> df.filter(F.rank().over(win) > 10).show()
> {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/sql/dataframe.py", line 1461, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
>   File "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/sql/utils.py", line 134, in deco
>     raise_from(converted)
>   File "<string>", line 3, in raise_from
> pyspark.sql.utils.AnalysisException: It is not allowed to use window functions inside WHERE clause;
> {code}
> Yet it is logically the same as the code below, which works:
> {code:java}
> df = spark.range(100)
> win = Window.partitionBy().orderBy('id')
> df = df.withColumn('rank', F.rank().over(win))
> df = df[df['rank'] > 10]
> df = df.drop('rank')
> {code}
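For illustration, here is a minimal sketch of the workaround above written as a single chained expression. The SparkSession bootstrap and the `result` name are assumptions added only to make the snippet self-contained; they are not part of the original report. Spark accepts this form because the window function is evaluated in a projection before the filter is applied.

{code:python}
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

# Assumed bootstrap so the example runs on its own.
spark = SparkSession.builder.getOrCreate()

df = spark.range(100)
win = Window.partitionBy().orderBy('id')

# Same logical plan as the multi-step workaround: project the window result
# into a helper column, filter on that column, then drop the helper column.
result = (df
          .withColumn('rank', F.rank().over(win))
          .filter(F.col('rank') > 10)
          .drop('rank'))
result.show()
{code}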