This is a known bug in 1.4.1, fixed in 1.4.2 and 1.5 (neither has been released yet).
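For reference, here is a minimal sketch of what the pySpark example should produce once the fix lands. This assumes the fixed 1.4.2/1.5 behavior; the expected rows in the comments follow standard lag/lead semantics and are not output captured from a released build:

from pyspark.sql.window import Window
import pyspark.sql.functions as func

l = [(1,101),(2,202),(3,303),(4,404),(5,505)]
df = sqlContext.createDataFrame(l, ["a","b"])

# lag/lead take no frame specification, so only order the window
wSpec = Window.orderBy(df.a)

df.select(df.a,
          func.lag(df.b, 1).over(wSpec).alias("prev"),
          df.b,
          func.lead(df.b, 1).over(wSpec).alias("next")).collect()
# Expected (standard lag/lead semantics):
# [Row(a=1, prev=None, b=101, next=202),
#  Row(a=2, prev=101, b=202, next=303),
#  Row(a=3, prev=202, b=303, next=404),
#  Row(a=4, prev=303, b=404, next=505),
#  Row(a=5, prev=404, b=505, next=None)]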
On Thu, Sep 3, 2015 at 7:41 AM, Sergey Shcherbakov <sergey.shcherba...@gmail.com> wrote:
> Hello all,
>
> I'm experimenting with Spark 1.4.1 window functions and have run into a
> problem in pySpark that I've described in a Stack Overflow question.
>
> In essence, the
>
> wSpec = Window.orderBy(df.a)
> df.select(df.a, func.rank().over(wSpec).alias("rank")).collect()
> df.select(df.a, func.lag(df.b,1).over(wSpec).alias("prev"), df.b,
>           func.lead(df.b,1).over(wSpec).alias("next")).collect()
>
> does not work in pySpark: the first collect() raises an exception, and the
> window functions in the second collect() return None.
>
> The same example in Spark/Scala works fine:
>
> val wSpec = Window.orderBy("a")
> df.select(df("a"), rank().over(wSpec).alias("rank")).collect()
> df.select(df("a"), lag(df("b"),1).over(wSpec).alias("prev"), df("b"),
>           lead(df("b"),1).over(wSpec).alias("next"))
>
> Am I doing anything wrong, or is this indeed a pySpark issue?
>
> Best Regards,
> Sergey
>
> PS: Here is the full pySpark shell example:
>
> from pyspark.sql.window import Window
> import pyspark.sql.functions as func
>
> l = [(1,101),(2,202),(3,303),(4,404),(5,505)]
> df = sqlContext.createDataFrame(l, ["a","b"])
>
> wSpec = Window.orderBy(df.a).rowsBetween(-1,1)
>
> df.select(df.a, func.rank().over(wSpec).alias("rank"))
> # ==> Failure: org.apache.spark.sql.AnalysisException: Window function rank
> # does not take a frame specification.
>
> df.select(df.a, func.lag(df.b,1).over(wSpec).alias("prev"), df.b,
>           func.lead(df.b,1).over(wSpec).alias("next"))
> # ==> org.apache.spark.sql.AnalysisException: Window function lag does not
> # take a frame specification.
>
> wSpec = Window.orderBy(df.a)
>
> df.select(df.a, func.rank().over(wSpec).alias("rank"))
> # ==> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
> # arguments are expected.
>
> df.select(df.a, func.lag(df.b,1).over(wSpec).alias("prev"), df.b,
>           func.lead(df.b,1).over(wSpec).alias("next")).collect()
> # [Row(a=1, prev=None, b=101, next=None), Row(a=2, prev=None, b=202,
> #  next=None), Row(a=3, prev=None, b=303, next=None)]
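Until 1.4.2/1.5 are out, one possible workaround on 1.4.1 may be to express the query in SQL rather than through the Python DataFrame API, since the Scala path works. This is an untested suggestion: it assumes sqlContext is a HiveContext (window functions are not available through a plain SQLContext in these releases), and the temp table name "t" is arbitrary:

# Untested workaround sketch for 1.4.1; assumes sqlContext is a HiveContext
df.registerTempTable("t")
sqlContext.sql("""
    SELECT a,
           LAG(b, 1)  OVER (ORDER BY a) AS prev,
           b,
           LEAD(b, 1) OVER (ORDER BY a) AS next
    FROM t
""").collect()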