Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin

2015-11-02 Thread Umesh Kacha
Hi Ted, I checked that hive-exec-1.2.1.spark.jar contains the following
required classes, but the code still does not compile. I don't understand why
this jar is getting overridden in scope.

org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class

Please guide.
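
One way to confirm which jar actually supplies the Hive class at runtime is to
resolve it and print its code source; a minimal sketch, meant to be pasted into
spark-shell or into the application itself (mvn dependency:tree also shows
which hive-exec artifact the build resolves):

    // resolve the Hive UDAF class and print the jar it was loaded from
    val clazz = Class.forName(
      "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox")
    println(clazz.getProtectionDomain.getCodeSource.getLocation)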

On Mon, Oct 19, 2015 at 4:30 PM, Umesh Kacha  wrote:

> Hi Ted, thanks much for your help, I really appreciate it. I tried the Maven
> dependencies you mentioned, but callUdf still does not compile; please find a
> snapshot of my IntelliJ editor attached. I am sorry, you may have to zoom into
> the pictures as I can't share code. Thanks again.
> On Oct 19, 2015 8:32 AM, "Ted Yu"  wrote:
>
>> Umesh:
>>
>> $ jar tvf
>> /home/hbase/.m2/repository/org/spark-project/hive/hive-exec/1.2.1.spark/hive-exec-1.2.1.spark.jar
>> | grep GenericUDAFPercentile
>>   2143 Fri Jul 31 23:51:48 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>>   4602 Fri Jul 31 23:51:48 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>>
>> As long as the following dependency is in your pom.xml:
>> [INFO] +- org.spark-project.hive:hive-exec:jar:1.2.1.spark:compile
>>
>> You should be able to invoke percentile_approx
>>
>> Cheers
>>
>> On Sun, Oct 18, 2015 at 8:58 AM, Umesh Kacha 
>> wrote:
>>
>>> Thanks much, Ted. So when do we get to use this Spark UDF call in Java
>>> code using Maven dependencies? You said SPARK-10671 was not pushed as
>>> part of 1.5.1, so it should be released in 1.6.0 as mentioned in the JIRA,
>>> right?
>>>
>>> On Sun, Oct 18, 2015 at 9:20 PM, Ted Yu  wrote:
>>>
 The UDF is defined in Hive's GenericUDAFPercentileApprox.

 When spark-shell runs, it has access to the above class which is
 packaged
 in assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar
 :

   2143 Fri Oct 16 15:02:26 PDT 2015
 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
   4602 Fri Oct 16 15:02:26 PDT 2015
 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
   1697 Fri Oct 16 15:02:26 PDT 2015
 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
   6570 Fri Oct 16 15:02:26 PDT 2015
 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
   4334 Fri Oct 16 15:02:26 PDT 2015
 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
   6293 Fri Oct 16 15:02:26 PDT 2015
 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class

 That was the cause for different behavior.

 FYI

 On Sun, Oct 18, 2015 at 12:10 AM, unk1102 
 wrote:

> Hi, starting a new thread following the old one. It looks like the code for
> compiling callUdf("percentile_approx",col("mycol"),lit(0.25)) is not merged
> into the Spark 1.5.1 source, but I don't understand why this function call
> works in the Spark 1.5.1 spark-shell/bin. Please guide.
>
> -- Forwarded message --
> From: "Ted Yu" 
> Date: Oct 14, 2015 3:26 AM
> Subject: Re: How to calculate percentile of a column of DataFrame?
> To: "Umesh Kacha" 
> Cc: "Michael Armbrust" ,
> "saif.a.ell...@wellsfargo.com" ,
> "user" 
>
> I modified DataFrameSuite in the master branch to call percentile_approx
> instead of simpleUDF:
>
> - deprecated callUdf in SQLContext
> - callUDF in SQLContext *** FAILED ***
>   org.apache.spark.sql.AnalysisException: undefined function
> percentile_approx;
>   at
>
> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>   at
>
> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>   at scala.Option.getOrElse(Option.scala:120)
>   at
>
> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>   at
>
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>   at
>
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>   at
>
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>   at
>
> 
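
For reference, the call under discussion as it is typically exercised from a
Hive-enabled 1.5.1 spark-shell: a minimal sketch, where the DataFrame and
column names are illustrative, sqlContext is assumed to be a HiveContext, and
callUDF is the non-deprecated spelling of callUdf in 1.5.x.

    import org.apache.spark.sql.functions.{callUDF, col, lit}

    // toy DataFrame with a numeric column named "mycol"
    val df = sqlContext.range(0, 100)
      .select(col("id").cast("double").as("mycol"))

    // approximate 25th percentile via the Hive UDAF percentile_approx
    df.agg(callUDF("percentile_approx", col("mycol"), lit(0.25))).show()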

Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin

2015-10-18 Thread Ted Yu
Umesh:

$ jar tvf
/home/hbase/.m2/repository/org/spark-project/hive/hive-exec/1.2.1.spark/hive-exec-1.2.1.spark.jar
| grep GenericUDAFPercentile
  2143 Fri Jul 31 23:51:48 PDT 2015
org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
  4602 Fri Jul 31 23:51:48 PDT 2015
org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class

As long as the following dependency is in your pom.xml:
[INFO] +- org.spark-project.hive:hive-exec:jar:1.2.1.spark:compile

You should be able to invoke percentile_approx

Cheers
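
The dependency in question, written out as a pom.xml entry (a sketch derived
from the [INFO] line above; it is worth double-checking with mvn
dependency:tree that no other hive-exec artifact shadows it):

    <dependency>
      <groupId>org.spark-project.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.2.1.spark</version>
    </dependency>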

On Sun, Oct 18, 2015 at 8:58 AM, Umesh Kacha  wrote:

> Thanks much, Ted. So when do we get to use this Spark UDF call in Java code
> using Maven dependencies? You said SPARK-10671 was not pushed as part of
> 1.5.1, so it should be released in 1.6.0 as mentioned in the JIRA, right?
>
> On Sun, Oct 18, 2015 at 9:20 PM, Ted Yu  wrote:
>
>> The UDF is defined in Hive's GenericUDAFPercentileApprox.
>>
>> When spark-shell runs, it has access to the above class which is packaged
>> in assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar
>> :
>>
>>   2143 Fri Oct 16 15:02:26 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>>   4602 Fri Oct 16 15:02:26 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>>   1697 Fri Oct 16 15:02:26 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
>>   6570 Fri Oct 16 15:02:26 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
>>   4334 Fri Oct 16 15:02:26 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
>>   6293 Fri Oct 16 15:02:26 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class
>>
>> That was the cause for different behavior.
>>
>> FYI
>>
>> On Sun, Oct 18, 2015 at 12:10 AM, unk1102  wrote:
>>
>>> Hi, starting a new thread following the old one. It looks like the code for
>>> compiling callUdf("percentile_approx",col("mycol"),lit(0.25)) is not merged
>>> into the Spark 1.5.1 source, but I don't understand why this function call
>>> works in the Spark 1.5.1 spark-shell/bin. Please guide.
>>>
>>> -- Forwarded message --
>>> From: "Ted Yu" 
>>> Date: Oct 14, 2015 3:26 AM
>>> Subject: Re: How to calculate percentile of a column of DataFrame?
>>> To: "Umesh Kacha" 
>>> Cc: "Michael Armbrust" ,
>>> "saif.a.ell...@wellsfargo.com" ,
>>> "user" 
>>>
>>> I modified DataFrameSuite in the master branch to call percentile_approx
>>> instead of simpleUDF:
>>>
>>> - deprecated callUdf in SQLContext
>>> - callUDF in SQLContext *** FAILED ***
>>>   org.apache.spark.sql.AnalysisException: undefined function
>>> percentile_approx;
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>   at scala.Option.getOrElse(Option.scala:120)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>>>   at
>>>
>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>>>
>>> SPARK-10671 is included in master.
>>> For 1.5.1, I guess the absence of SPARK-10671 means that Spark SQL treats
>>> percentile_approx as a normal UDF.
>>>
>>> Experts can correct me if there is any misunderstanding.
>>>
>>> Cheers
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/callUdf-percentile-approx-col-mycol-lit-0-25-does-not-compile-spark-1-5-1-source-but-it-does-work-inn-tp25111.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> 
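
On 1.5.1, where SPARK-10671 is absent, the same aggregate should also be
reachable through HiveContext SQL, which looks percentile_approx up through
Hive's function registry; a minimal sketch, where the table and column names
are illustrative and sqlContext is assumed to be a HiveContext as in a
Hive-enabled spark-shell:

    import org.apache.spark.sql.functions.col

    // toy DataFrame registered as a temporary table
    val df = sqlContext.range(0, 100)
      .select(col("id").cast("double").as("mycol"))
    df.registerTempTable("mytable")

    // the Hive UDAF is looked up when the SQL statement is analyzed
    sqlContext.sql(
      "SELECT percentile_approx(mycol, 0.25) FROM mytable").show()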
