[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double
[ https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704937#comment-15704937 ]

Herman van Hovell commented on SPARK-18527:
-------------------------------------------

We did not merge my PR (which was a hack TBH), but the Percentile implementation. That is the reason why I closed this.

> UDAFPercentile (bigint, array) needs explicity cast to double
> -------------------------------------------------------------
>
>                 Key: SPARK-18527
>                 URL: https://issues.apache.org/jira/browse/SPARK-18527
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1
>         Environment: spark-2.0.1-bin-hadoop2.7/bin/spark-shell
>            Reporter: Fabian Boehnlein
>            Assignee: Jiang Xingbo
>             Fix For: 2.1.0
>
>
> Same bug as SPARK-16228 but
> {code}_FUNC_(bigint, array) {code}
> instead of
> {code}_FUNC_(bigint, double){code}
> Fix of SPARK-16228 only fixes the non-array case that was hit.
> {code}
> sql("select percentile(value, array(0.5,0.99)) from values 1,2,3 T(value)")
> {code}
> fails in Spark 2 shell.
> Longer example
> {code}
> case class Record(key: Long, value: String)
> val recordsDF = spark.createDataFrame((1 to 100).map(i => Record(i.toLong, s"val_$i")))
> recordsDF.createOrReplaceTempView("records")
> sql("SELECT percentile(key, Array(0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)) AS test FROM records")
> org.apache.spark.sql.AnalysisException: No handler for Hive UDF 'org.apache.hadoop.hive.ql.udf.UDAFPercentile': org.apache.hadoop.hive.ql.exec.NoMatchingMethodException: No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (bigint, array). Possible choices: _FUNC_(bigint, array) _FUNC_(bigint, double) ; line 1 pos 7
>   at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getMethodInternal(FunctionRegistry.java:1164)
>   at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>   at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:56)
>   at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
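The issue title suggests that, on the affected versions, the array form of percentile only resolves when the percentile values are explicitly cast to double. A minimal sketch of that workaround follows, as a plain Scala helper that builds the query string with every array element wrapped in a cast. The object PercentileQuery and its methods are hypothetical names introduced here for illustration; they are not part of Spark, and it is an untested assumption that the cast form resolves against UDAFPercentile on 2.0.x.

```scala
// Hypothetical helper (not part of Spark): builds a percentile query in which
// every percentile value is explicitly cast to double.
object PercentileQuery {

  /** Renders one percentile as an explicit cast, e.g. CAST(0.5 AS DOUBLE). */
  def asDouble(p: Double): String = {
    require(p >= 0.0 && p <= 1.0, s"percentile must be in [0, 1], got $p")
    s"CAST($p AS DOUBLE)"
  }

  /** Builds the SELECT statement for the given column, percentiles and table. */
  def build(column: String, percentiles: Seq[Double], table: String): String = {
    val args = percentiles.map(asDouble).mkString(", ")
    s"SELECT percentile($column, array($args)) AS test FROM $table"
  }
}

// In spark-shell the generated string would be passed to sql(...), for example:
// sql(PercentileQuery.build("key", Seq(0.5, 0.99), "records"))
```

Building the string separately keeps the sketch runnable without a Spark session; the casts only matter on versions that still route percentile through the Hive UDAF.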
[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double
[ https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704934#comment-15704934 ]

Fabian Boehnlein commented on SPARK-18527:
------------------------------------------

Thanks [~hvanhovell]. Will be interesting to compare this fix to the native implementation which was also merged into branch-2.1.
[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double
[ https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701806#comment-15701806 ]

Apache Spark commented on SPARK-18527:
--------------------------------------

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/16034
[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double
[ https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701661#comment-15701661 ]

Fabian Boehnlein commented on SPARK-18527:
------------------------------------------

Interesting [~hvanhovell], good to see that {code}percentile(a, array){code} is aimed to be [covered|https://github.com/apache/spark/pull/14136/files#diff-a15a6f87f9676612c69435953a13ddd3R127] in the own implementation. Indeed that PR seems quite big for a soon release.
Maybe [~dongjoon] could give starting points for this one, related to the very close PR: https://github.com/apache/spark/pull/13930
Thanks!
[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double
[ https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15697972#comment-15697972 ]

Herman van Hovell commented on SPARK-18527:
-------------------------------------------

The implementation of our own percentile function is tracked in SPARK-16282. I don't think that will make 2.1, so it would be great if you can create a PR.
[jira] [Commented] (SPARK-18527) UDAFPercentile (bigint, array) needs explicity cast to double
[ https://issues.apache.org/jira/browse/SPARK-18527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696598#comment-15696598 ]

Thomas Sebastian commented on SPARK-18527:
------------------------------------------

I am interested to work on this.