Weiluo Ren created SPARK-17895:
----------------------------------

             Summary: Improve documentation of "rowsBetween" and "rangeBetween"
                 Key: SPARK-17895
                 URL: https://issues.apache.org/jira/browse/SPARK-17895
             Project: Spark
          Issue Type: Documentation
          Components: PySpark, SparkR, SQL
            Reporter: Weiluo Ren
            Priority: Minor


This is an issue found by [~junyangq] when he was fixing SparkR docs.

WindowSpec has two methods, "rangeBetween" and "rowsBetween" (see 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala#L82]).
 However, the description of "rangeBetween" does not clearly differentiate it 
from "rowsBetween". 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L109]
 has a pretty nice description of "RangeFrame" and "RowFrame", which are used 
by "rangeBetween" and "rowsBetween", but I cannot find those descriptions in 
the online Spark Scala API docs. 

We could add small examples to the descriptions of "rangeBetween" and 
"rowsBetween", such as:
{code}
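// Assumes a spark-shell session, where `spark` is the active SparkSession.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum
import spark.implicits._

val df = Seq(1,1,2).toDF("id")
// rangeBetween(0, 1): the frame holds every row whose id lies in [id, id + 1].
df.withColumn("sum", sum('id) over Window.orderBy('id).rangeBetween(0,1)).show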
/**
 * It shows
 * +---+---+
 * | id|sum|
 * +---+---+
 * |  1|  4|
 * |  1|  4|
 * |  2|  2|
 * +---+---+
*/

// rowsBetween(0, 1): the frame holds the current row and the next physical row.
df.withColumn("sum", sum('id) over Window.orderBy('id).rowsBetween(0,1)).show
/**
 * It shows
 * +---+---+
 * | id|sum|
 * +---+---+
 * |  1|  2|
 * |  1|  3|
 * |  2|  2|
 * +---+---+
*/
{code}
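
To make the difference concrete without a Spark session, here is a minimal plain-Scala sketch of the two frame semantics. It is only a model under stated assumptions, not Spark's implementation: the names FrameModel, rowsSum, and rangeSum are hypothetical, the input is assumed to be already sorted ascending by the ordering column, and an empty frame sums to 0 here rather than to null.

```scala
// Plain-Scala model of the two frame types; no Spark dependency.
// FrameModel, rowsSum, and rangeSum are hypothetical names for illustration.
object FrameModel {
  // ROWS frame: offsets count physical rows relative to the current row's position.
  def rowsSum(sorted: Vector[Long], start: Int, end: Int): Vector[Long] =
    sorted.indices.toVector.map { i =>
      val lo = math.max(0, i + start)
      val hi = math.min(sorted.length - 1, i + end)
      (lo to hi).map(sorted).sum // empty when lo > hi, so the sum is 0
    }

  // RANGE frame: offsets are added to the current row's ordering value, and every
  // row whose value falls in [v + start, v + end] is included, ties included.
  def rangeSum(sorted: Vector[Long], start: Long, end: Long): Vector[Long] =
    sorted.map { v =>
      sorted.filter(x => x >= v + start && x <= v + end).sum
    }

  def main(args: Array[String]): Unit = {
    val ids = Vector(1L, 1L, 2L) // already sorted by the ordering column
    println(rangeSum(ids, 0, 1)) // Vector(4, 4, 2)
    println(rowsSum(ids, 0, 1))  // Vector(2, 3, 2)
  }
}
```

The two results match the "sum" columns in the tables above: with a range frame both rows with id 1 see the same frame (1, 1, 2), while with a row frame each row only sees itself and its immediate successor.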



