[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-02-16 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148472#comment-15148472
 ] 

Emlyn Corrin commented on SPARK-9740:
-

Thanks, the {{registerTempTable}} + {{sql}} workaround is fine for now. I guess 
I'll just wait for 2.0 to clean that up.

> first/last aggregate NULL behavior
> --
>
> Key: SPARK-9740
> URL: https://issues.apache.org/jira/browse/SPARK-9740
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Herman van Hovell
> Assignee: Yin Huai
> Labels: releasenotes
> Fix For: 1.6.0
>
>
> The FIRST/LAST aggregates implemented as part of the new UDAF interface 
> return the first or last non-null value (if any) found. This is a departure 
> from the behavior of the old FIRST/LAST aggregates and from the 
> FIRST_VALUE/LAST_VALUE aggregates in Hive. These would return a null value 
> if that happened to be the first/last value seen. SPARK-9592 tries to 'fix' 
> this behavior for the old UDAF interface.
> Hive makes this behavior configurable by adding a skipNulls flag. I would 
> suggest doing the same, and making the default behavior compatible with Hive.
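
To make the two behaviors in the description concrete, here is a minimal plain-Java sketch of FIRST/LAST with an ignore-nulls flag. It is not Spark's or Hive's implementation: the class name {{FirstLastSketch}} and the {{first}}/{{last}} helpers are hypothetical and only model the semantics on an in-memory list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the semantics only; this is not Spark code.
final class FirstLastSketch {

    /** Returns the first value seen, or the first non-null value when ignoreNulls is true. */
    static <T> T first(List<T> values, boolean ignoreNulls) {
        for (T v : values) {
            if (v != null || !ignoreNulls) {
                return v;
            }
        }
        return null; // empty input, or only nulls while ignoring nulls
    }

    /** Returns the last value seen, or the last non-null value when ignoreNulls is true. */
    static <T> T last(List<T> values, boolean ignoreNulls) {
        T result = null;
        for (T v : values) {
            if (v != null || !ignoreNulls) {
                result = v;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(null, 1, 2, null);
        System.out.println(first(values, false)); // null (the null is the first value seen)
        System.out.println(first(values, true));  // 1
        System.out.println(last(values, false));  // null (the null is the last value seen)
        System.out.println(last(values, true));   // 2
    }
}
```

With the flag off, the result is whatever value happens to come first or last, including null, which is the Hive-compatible default the description asks for.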



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-02-16 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148463#comment-15148463
 ] 

Herman van Hovell commented on SPARK-9740:
--

[~emlyn] We have merged a patch for this; see 
https://issues.apache.org/jira/browse/SPARK-13049. Things like this typically 
don't get backported, since it is more a feature than a bug.

You could still use the SQL workaround. You could also create some glue code, 
i.e. put the contents of the merged PR in a different object 
(ExtendedFunctions?) in the org.apache.spark.sql package. I sometimes do this 
when I need such functionality.





[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-02-16 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148458#comment-15148458
 ] 

Emlyn Corrin commented on SPARK-9740:
-

Any update on this? Should I open a new issue for it so it doesn't fall through 
the cracks?




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-27 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119673#comment-15119673
 ] 

Yin Huai commented on SPARK-9740:
-

Thank you!




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-27 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119240#comment-15119240
 ] 

Herman van Hovell commented on SPARK-9740:
--

[~yhuai] I'll have a look.




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-26 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116977#comment-15116977
 ] 

Emlyn Corrin commented on SPARK-9740:
-

I've put together a minimal example to demonstrate the problem:
{code}
package spark_test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.hive.HiveContext;

public class SparkTestMain {
    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: SparkTest ");
            System.exit(1);
        }
        SparkConf sparkConf = new SparkConf().setAppName("SparkTest");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        SQLContext sqlCtx = new HiveContext(ctx); // I tried both SQLContext and HiveContext here
        DataFrame df = sqlCtx.read().json(args[0]);
        System.out.println(df.schema().simpleString());
        DataFrame df2 = df.select(functions.expr("FIRST(value,true)")
                .over(Window.partitionBy(df.col("id"))
                        .orderBy(df.col("time"))
                        .rowsBetween(Long.MIN_VALUE, 0)));
        System.out.println(df2.take(5));
    }
}
{code}

I ran that with:
{code}
spark-submit --master local[*] spark-test.jar test.json
{code}
And it fails with:
{code}
struct
Exception in thread "main" java.lang.UnsupportedOperationException: 'FIRST('value,true) is not supported in a window operation.
    at org.apache.spark.sql.expressions.WindowSpec.withAggregate(WindowSpec.scala:191)
    at org.apache.spark.sql.Column.over(Column.scala:1049)
    at spark_test.SparkTestMain.main(SparkTestMain.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
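
For reference, the result the window expression in the example is meant to produce can be modeled in plain Java. This is only a sketch under assumptions: {{RunningFirstSketch}}, its {{Row}} class, and the sample data are hypothetical, and nothing here touches Spark. It treats {{rowsBetween(Long.MIN_VALUE, 0)}} as a frame running from the start of the partition to the current row, partitions by {{id}}, orders by {{time}}, and returns the first non-null {{value}} seen so far for each row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the intended window result; not Spark code.
final class RunningFirstSketch {

    static final class Row {
        final String id;
        final long time;
        final Integer value; // may be null

        Row(String id, long time, Integer value) {
            this.id = id;
            this.time = time;
            this.value = value;
        }
    }

    /**
     * Sorts rows by (id, time) and returns, for each row in that order, the first
     * non-null value seen so far within the row's id partition (null if none yet).
     */
    static List<Integer> runningFirstIgnoringNulls(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.id).thenComparingLong(r -> r.time));

        Map<String, Integer> firstSeen = new HashMap<>(); // first non-null value per id
        List<Integer> out = new ArrayList<>();
        for (Row r : sorted) {
            if (r.value != null && !firstSeen.containsKey(r.id)) {
                firstSeen.put(r.id, r.value);
            }
            out.add(firstSeen.get(r.id)); // null until a non-null value has appeared
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("b", 2, null),
                new Row("a", 3, 20),
                new Row("a", 1, null),
                new Row("b", 1, 5),
                new Row("a", 2, 10));
        System.out.println(runningFirstIgnoringNulls(rows)); // [null, 10, 10, 5, 5]
    }
}
```

The first row of partition {{a}} yields null because no non-null value has been seen yet; every later row in that partition yields 10.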




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-26 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117651#comment-15117651
 ] 

Yin Huai commented on SPARK-9740:
-

[~hvanhovell] Will you have time to take a look? Thanks :)




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115904#comment-15115904
 ] 

Emlyn Corrin commented on SPARK-9740:
-

Thanks for the help. I've tried {{callUDF}}, and that gives me the same 
error as when I use {{expr}}. For now I've managed to work around it by calling 
{{registerTempTable("tempTable")}} on the DataFrame, and then 
{{SQLContext.sql("SELECT LAST(colName,true) OVER(...) FROM tempTable")}}, which 
works but feels a bit hacky.
I'll try to put together a minimal example that demonstrates this, as the code 
is currently in the middle of a fairly big Clojure application that calls Spark 
through Java interop.




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115907#comment-15115907
 ] 

Yin Huai commented on SPARK-9740:
-

Can you provide the full stack trace?




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115480#comment-15115480
 ] 

Yin Huai commented on SPARK-9740:
-

Can you attach your code? Also, can you try to use {{functions.callUDF("last", 
col, functions.lit(true))}}?




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115650#comment-15115650
 ] 

Herman van Hovell commented on SPARK-9740:
--

We are probably resolving the Hive function by accident. The First/Last 
functions probably don't have an expression-only constructor.




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115659#comment-15115659
 ] 

Herman van Hovell commented on SPARK-9740:
--

Hmmm... It does have a suitable constructor. Please attach an example.




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116297#comment-15116297
 ] 

Emlyn Corrin commented on SPARK-9740:
-

Here's the stack trace; it fails during compilation:
{code}
java.lang.UnsupportedOperationException: 'last('timestamp,true) is not supported in a window operation., compiling:(test.clj:8:1)
Exception in thread "main" java.lang.UnsupportedOperationException: 'last('timestamp,true) is not supported in a window operation., compiling:(test.clj:8:1)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3657)
at clojure.lang.Compiler$IfExpr.eval(Compiler.java:2695)
at clojure.lang.Compiler.compile1(Compiler.java:7474)
at clojure.lang.Compiler.compile(Compiler.java:7541)
at clojure.lang.RT.compile(RT.java:406)
at clojure.lang.RT.load(RT.java:451)
at clojure.lang.RT.load(RT.java:419)
at clojure.core$load$fn__5677.invoke(core.clj:5893)
at clojure.core$load.invokeStatic(core.clj:5892)
at clojure.core$load.doInvoke(core.clj:5876)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invokeStatic(core.clj:5697)
at clojure.core$load_one.invoke(core.clj:5692)
at clojure.core$load_lib$fn__5626.invoke(core.clj:5737)
at clojure.core$load_lib.invokeStatic(core.clj:5736)
at clojure.core$load_lib.doInvoke(core.clj:5717)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invokeStatic(core.clj:648)
at clojure.core$load_libs.invokeStatic(core.clj:5774)
at clojure.core$load_libs.doInvoke(core.clj:5758)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invokeStatic(core.clj:648)
at clojure.core$require.invokeStatic(core.clj:5796)
at sparkalytics.core$loading__5569__auto7689.invoke(core.clj:1)
at clojure.lang.AFn.applyToHelper(AFn.java:152)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3652)
at clojure.lang.Compiler.compile1(Compiler.java:7474)
at clojure.lang.Compiler.compile1(Compiler.java:7464)
at clojure.lang.Compiler.compile(Compiler.java:7541)
at clojure.lang.RT.compile(RT.java:406)
at clojure.lang.RT.load(RT.java:451)
at clojure.lang.RT.load(RT.java:419)
at clojure.core$load$fn__5677.invoke(core.clj:5893)
at clojure.core$load.invokeStatic(core.clj:5892)
at clojure.core$load.doInvoke(core.clj:5876)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invokeStatic(core.clj:5697)
at clojure.core$compile$fn__5682.invoke(core.clj:5903)
at clojure.core$compile.invokeStatic(core.clj:5903)
at user$eval13$fn__22.invoke(form-init8921303069087101413.clj:1)
at user$eval13.invokeStatic(form-init8921303069087101413.clj:1)
at user$eval13.invoke(form-init8921303069087101413.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:6927)
at clojure.lang.Compiler.eval(Compiler.java:6917)
at clojure.lang.Compiler.load(Compiler.java:7379)
at clojure.lang.Compiler.loadFile(Compiler.java:7317)
at clojure.main$load_script.invokeStatic(main.clj:275)
at clojure.main$init_opt.invokeStatic(main.clj:277)
at clojure.main$init_opt.invoke(main.clj:277)
at clojure.main$initialize.invokeStatic(main.clj:308)
at clojure.main$null_opt.invokeStatic(main.clj:342)
at clojure.main$null_opt.invoke(main.clj:339)
at clojure.main$main.invokeStatic(main.clj:421)
at clojure.main$main.doInvoke(main.clj:384)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.lang.Var.invoke(Var.java:383)
at clojure.lang.AFn.applyToHelper(AFn.java:156)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.main.main(main.java:37)
Caused by: java.lang.UnsupportedOperationException: 'last('timestamp,true) is not supported in a window operation.
at org.apache.spark.sql.expressions.WindowSpec.withAggregate(WindowSpec.scala:191)
at org.apache.spark.sql.Column.over(Column.scala:1049)
at sparkalytics.jobs.test$fn__8460.invokeStatic(test.clj:10)
at sparkalytics.jobs.test$fn__8460.invoke(test.clj:8)
at clojure.lang.AFn.applyToHelper(AFn.java:152)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3652)
... 59 more
Compilation failed: Subprocess failed
{code}
(I'm not sure how to get the 59 missing lines from the end, but hopefully this 
is enough)


[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116317#comment-15116317
 ] 

Herman van Hovell commented on SPARK-9740:
--

[~emlyn] What version of Spark are you using?




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116329#comment-15116329
 ] 

Emlyn Corrin commented on SPARK-9740:
-

Version 1.6.0




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114935#comment-15114935
 ] 

Emlyn Corrin commented on SPARK-9740:
-

Thanks [~yhuai], that's just what I was looking for!




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-25 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115115#comment-15115115
 ] 

Emlyn Corrin commented on SPARK-9740:
-

[~yhuai] Actually, that doesn't seem to work. When I try it, I get:
{code}
java.lang.UnsupportedOperationException: 'LAST('column,true) is not supported in a window operation.
{code}

{{functions.last}}, on the other hand, seems to work fine in window operations.




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-24 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114721#comment-15114721
 ] 

Yin Huai commented on SPARK-9740:
-

[~emlyn] You can use the {{expr}} function provided by 
{{org.apache.spark.sql.functions}} to do that. For example, 
{{expr("first(colName, true)")}}.




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2016-01-21 Thread Emlyn Corrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110688#comment-15110688
 ] 

Emlyn Corrin commented on SPARK-9740:
-

How do you use FIRST/LAST from the Java API with ignoreNulls now? I can't find 
a way to specify ignoreNulls=true.

> first/last aggregate NULL behavior
> --
>
> Key: SPARK-9740
> URL: https://issues.apache.org/jira/browse/SPARK-9740
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Herman van Hovell
>Assignee: Yin Huai
>  Labels: releasenotes
> Fix For: 1.6.0
>
>



[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-11 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692293#comment-14692293
 ] 

Yin Huai commented on SPARK-9740:
-

Actually, it seems our old first/last functions do not respect nulls.

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Herman van Hovell
Assignee: Yin Huai




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692392#comment-14692392
 ] 

Apache Spark commented on SPARK-9740:
-

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/8113

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Herman van Hovell
Assignee: Yin Huai




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-11 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682381#comment-14682381
 ] 

Yin Huai commented on SPARK-9740:
-

[~hvanhovell] Does Hive have a flag for it? Or does it take an {{IGNORE NULLS}} 
clause like Oracle 
(http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions059.htm)?

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Herman van Hovell
Priority: Minor




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-11 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682390#comment-14682390
 ] 

Yin Huai commented on SPARK-9740:
-

I think we should use {{RESPECT NULLS}} as the default, since that is the 
default in the standard.

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Herman van Hovell
Priority: Minor




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-11 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682405#comment-14682405
 ] 

Herman van Hovell commented on SPARK-9740:
--

I think Hive uses a flag, {{skipNulls}}, for this:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFFirstValue.java#L62
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java#L52

It is not very well documented.

Hive defaults to including (respecting) nulls.
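
To make the two modes concrete, here is a minimal plain-Java sketch of what a {{skipNulls}}-style flag changes for first/last over an already ordered list of values. It is not Hive's or Spark's actual implementation; the class name {{FirstLastNulls}} and the helper methods are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not the Hive or Spark implementation.
public class FirstLastNulls {

    // Returns the first value seen. With skipNulls = false (respect nulls),
    // a leading null is returned as-is; with skipNulls = true it is skipped.
    static <T> T first(List<T> values, boolean skipNulls) {
        for (T v : values) {
            if (v != null || !skipNulls) {
                return v;
            }
        }
        return null; // empty input, or only nulls while skipping nulls
    }

    // Same idea as first(), but scanning from the end of the list.
    static <T> T last(List<T> values, boolean skipNulls) {
        for (int i = values.size() - 1; i >= 0; i--) {
            T v = values.get(i);
            if (v != null || !skipNulls) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(null, 1, 2, null);

        Integer firstRespect = first(rows, false); // null
        Integer firstSkip = first(rows, true);     // 1
        Integer lastRespect = last(rows, false);   // null
        Integer lastSkip = last(rows, true);       // 2

        System.out.println(firstRespect + " " + firstSkip + " "
                + lastRespect + " " + lastSkip);
    }
}
```

With the flag off, the result matches the "respect nulls" default described above; with it on, the result matches the non-null behavior the issue reports for the new aggregates.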

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Herman van Hovell
Assignee: Yin Huai




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-11 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682460#comment-14682460
 ] 

Yin Huai commented on SPARK-9740:
-

Thanks. I will change the behavior to respecting nulls.

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Herman van Hovell
Assignee: Yin Huai




[jira] [Commented] (SPARK-9740) first/last aggregate NULL behavior

2015-08-07 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661859#comment-14661859
 ] 

Herman van Hovell commented on SPARK-9740:
--

BTW: I encountered this while writing tests for SPARK-8641. Unfortunately, it 
is kind of a PITA to create a proper test using an Aggregate: aggregates do not 
enforce sorting, so the result of FIRST/LAST is nondeterministic.
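
The ordering problem can be sketched in plain Java, independent of Spark. The class {{FirstOrderDependence}} and its methods are hypothetical names used only for illustration: the same set of rows arriving in two different orders gives two different "first" values, while sorting before picking gives a stable answer that a test can assert on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: shows why "first" depends on arrival order.
public class FirstOrderDependence {

    // "First" as an order-dependent pick: whatever row happens to arrive first.
    static Integer first(List<Integer> rows) {
        return rows.isEmpty() ? null : rows.get(0);
    }

    // Sorting a copy before picking makes the result independent of arrival order.
    static Integer firstAfterSort(List<Integer> rows) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return first(sorted);
    }

    public static void main(String[] args) {
        // The same three rows, as they might arrive in two different runs.
        List<Integer> orderA = Arrays.asList(3, 1, 2);
        List<Integer> orderB = Arrays.asList(2, 3, 1);

        System.out.println(first(orderA));          // 3
        System.out.println(first(orderB));          // 2
        System.out.println(firstAfterSort(orderA)); // 1
        System.out.println(firstAfterSort(orderB)); // 1
    }
}
```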

 first/last aggregate NULL behavior
 --

 Key: SPARK-9740
 URL: https://issues.apache.org/jira/browse/SPARK-9740
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.6.0
Reporter: Herman van Hovell
Priority: Minor
