[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double

2016-11-29 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704937#comment-15704937
 ] 

Herman van Hovell commented on SPARK-18527:
---

We did not merge my PR (which was a hack, TBH); we merged the native Percentile 
implementation instead. That is why I closed this.

> UDAFPercentile (bigint, array) needs explicity cast to double
> -
>
> Key: SPARK-18527
> URL: https://issues.apache.org/jira/browse/SPARK-18527
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1
> Environment: spark-2.0.1-bin-hadoop2.7/bin/spark-shell
> Reporter: Fabian Boehnlein
> Assignee: Jiang Xingbo
> Fix For: 2.1.0
>
>
> Same bug as SPARK-16228, but for
> {code}_FUNC_(bigint, array){code}
> instead of
> {code}_FUNC_(bigint, double){code}
> The fix for SPARK-16228 only covers the non-array case that was hit.
> {code}
> sql("select percentile(value, array(0.5,0.99)) from values 1,2,3 T(value)")
> {code}
> fails in the Spark 2 shell.
> Longer example:
> {code}
> case class Record(key: Long, value: String)
> val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i.toLong, s"val_$i")))
> recordsDF.createOrReplaceTempView("records")
> sql("SELECT percentile(key, Array(0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)) AS test FROM records")
> org.apache.spark.sql.AnalysisException: No handler for Hive UDF 'org.apache.hadoop.hive.ql.udf.UDAFPercentile': org.apache.hadoop.hive.ql.exec.NoMatchingMethodException: No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (bigint, array). Possible choices: _FUNC_(bigint, array)  _FUNC_(bigint, double)  ; line 1 pos 7
>   at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getMethodInternal(FunctionRegistry.java:1164)
>   at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>   at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:56)
>   at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47)
> {code}
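
The issue title suggests a workaround on the affected versions: cast each percentage to double explicitly, so that the array argument has the element type the Hive UDAF expects. The following is a minimal sketch of that idea, not something taken from the thread. The object and method names are hypothetical, and whether the cast resolves the error on 2.0.0 or 2.0.1 is an assumption that was not verified here.

```scala
// Hypothetical helper (not part of Spark): builds a percentile query in which
// every percentage is wrapped in an explicit CAST(... AS DOUBLE).
object PercentileQuery {
  def withDoubleCasts(column: String, percentages: Seq[Double], table: String): String = {
    // Assumption: explicit casts yield an array of doubles that matches
    // the _FUNC_(bigint, array) signature named in the error message.
    val casts = percentages.map(p => s"CAST($p AS DOUBLE)").mkString(", ")
    s"SELECT percentile($column, array($casts)) AS test FROM $table"
  }

  def main(args: Array[String]): Unit = {
    // Prints the generated SQL text; no Spark session is needed for this step.
    println(withDoubleCasts("key", Seq(0.95, 0.5), "records"))
  }
}
```

In spark-shell, the generated string could then be passed to sql(...) against the "records" view from the longer example.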



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double

2016-11-29 Thread Fabian Boehnlein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704934#comment-15704934
 ] 

Fabian Boehnlein commented on SPARK-18527:
--

Thanks, [~hvanhovell]. It will be interesting to compare this fix with the native 
implementation, which was also merged into branch-2.1.




[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double

2016-11-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701806#comment-15701806
 ] 

Apache Spark commented on SPARK-18527:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/16034




[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double

2016-11-28 Thread Fabian Boehnlein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701661#comment-15701661
 ] 

Fabian Boehnlein commented on SPARK-18527:
--

Interesting, [~hvanhovell]. Good to see that 
{code}percentile(a, array){code}
is meant to be 
[covered|https://github.com/apache/spark/pull/14136/files#diff-a15a6f87f9676612c69435953a13ddd3R127]
 by Spark's own implementation. Indeed, that PR seems quite big for an imminent release.

Maybe [~dongjoon] could suggest starting points for this one, based on the closely 
related PR: https://github.com/apache/spark/pull/13930

Thanks!




[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double

2016-11-26 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697972#comment-15697972
 ] 

Herman van Hovell commented on SPARK-18527:
---

The implementation of our own percentile function is tracked in SPARK-16282. I 
don't think that will make 2.1, so it would be great if you can create a PR.




[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double

2016-11-25 Thread Thomas Sebastian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696598#comment-15696598
 ] 

Thomas Sebastian commented on SPARK-18527:
--

I am interested in working on this.
