[ 
https://issues.apache.org/jira/browse/SPARK-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jo Desmet updated SPARK-10893:
------------------------------
    Description: 
Trying to aggregate with the LAG analytic function gives the wrong result: in my 
test case it always returned the fixed value '103079215105' when applied to an 
integer column.
Note that this happens only on Spark 1.5.0, and only when running in cluster 
mode; it works fine on Spark 1.4.1, or when running in local mode.
I did not test on a YARN cluster, nor any other analytic aggregates.

Input JSON:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:borderStyle=solid}
    SparkContext sc = new SparkContext(conf);
    HiveContext sqlContext = new HiveContext(sc);
    DataFrame df = sqlContext.read().json(getInputPath("input.json"));

    // "previous" should hold the prior row's VBB when rows are ordered by VAA
    df = df.withColumn(
      "previous",
      lag(df.col("VBB"), 1)
        .over(Window.orderBy(df.col("VAA")))
      );
{code}
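For reference, here is a plain-Java sketch (no Spark involved) of what lag(col, 1) is expected to produce: after ordering by VAA (nulls first, so null, "A", "B", "C", "d"), each row's "previous" value is the prior row's VBB, and the first row gets null. The class and helper names are illustrative, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LagExpectation {
    // Expected lag-by-1 semantics: out[i] = in[i-1], out[0] = null.
    static List<Integer> lag1(List<Integer> ordered) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < ordered.size(); i++) {
            out.add(i == 0 ? null : ordered.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // VBB values after ordering by VAA: null, 1, -1, 2, 3
        List<Integer> vbb = Arrays.asList(null, 1, -1, 2, 3);
        // Expected "previous" column: [null, null, 1, -1, 2] -- never a
        // constant like 103079215105.
        System.out.println(lag1(vbb));
    }
}
```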

  was:
Trying to aggregate with the LAG Analytic function gives the wrong result. In 
my testcase it was always giving the fixed value '103079215105' when I tried to 
run on an integer.
Note that this only happens on Spark 1.5.0, and only when running in cluster 
mode.
It works fine when running on Spark 1.4.1, or when running in local mode. 
I did not test on a yarn cluster.
I did not test other analytic aggregates.

Input Jason:
{"VAA":"A", "VBB":1}
{"VAA":"B", "VBB":-1}
{"VAA":"C", "VBB":2}
{"VAA":"d", "VBB":3}
{"VAA":null, "VBB":null}

Java:
{code:title=Bar.java|borderStyle=solid}
    SparkContext sc = new SparkContext(conf);
    HiveContext sqlContext = new HiveContext(sc);
    DataFrame df = sqlContext.read().json(getInputPath("input.json"));
    
    df = df.withColumn(
      "previous",
      lag(dataFrame.col("VBB"), 1)
        .over(Window.orderBy(dataFrame.col("VAA")))
      );
{code}


> Lag Analytic function broken
> ----------------------------
>
>                 Key: SPARK-10893
>                 URL: https://issues.apache.org/jira/browse/SPARK-10893
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.0
>         Environment: Spark Standalone Cluster on Linux
>            Reporter: Jo Desmet
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
