Re: How to calculate percentile of a column of DataFrame?

2015-10-14 Thread Umesh Kacha
Hi Ted, thanks much for your help. So the fix is in JIRA SPARK-10671 and it is supposed to be released in Spark 1.6.0, right? Until 1.6.0 is released I won't be able to invoke callUdf using a string and percentile_approx with lit as an argument, right? On Oct 14, 2015 03:26, "Ted Yu" wrote: > I

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Umesh Kacha
Hi Ted, if the fix went in after the 1.5.1 release, then how come it's working with the 1.5.1 binary in spark-shell? On Oct 13, 2015 1:32 PM, "Ted Yu" wrote: > Looks like the fix went in after 1.5.1 was released. > > You may verify using master branch build. > > Cheers > > On Oct 13, 2015,

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Umesh Kacha
Hi Ted, thanks much. I tried using percentile_approx in spark-shell like you mentioned. It works using 1.5.1, but it doesn't compile in Java using the 1.5.1 Maven libraries; it still gives the same complaint that callUdf can take string and column types only. Please guide. On Oct 13, 2015 12:34 AM, "Ted Yu"

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Ted Yu
Pardon me. I didn't read your previous response clearly. I will try to reproduce the compilation error on master branch. Right now, I have some other high priority task on hand. BTW I was looking at SPARK-10671 FYI On Tue, Oct 13, 2015 at 1:42 AM, Umesh Kacha wrote: >

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Ted Yu
Looks like the fix went in after 1.5.1 was released. You may verify using master branch build. Cheers > On Oct 13, 2015, at 12:21 AM, Umesh Kacha wrote: > > Hi Ted, thanks much I tried using percentile_approx in Spark-shell like you > mentioned it works using 1.5.1

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Ted Yu
Can you pastebin your Java code and the command you used to compile? Thanks > On Oct 13, 2015, at 1:42 AM, Umesh Kacha wrote: > > Hi Ted if fix went after 1.5.1 release then how come it's working with 1.5.1 > binary in spark-shell. > >> On Oct 13, 2015 1:32 PM, "Ted

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Umesh Kacha
OK, thanks much Ted. Looks like there is some issue with using the Maven dependencies in Java code for 1.5.1. I still don't understand: if the Spark 1.5.1 binary in spark-shell can recognize callUdf, then why does callUdf not compile when using a Maven build? On Oct 13, 2015 2:20 PM, "Ted Yu"

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Umesh Kacha
Hi Ted, I am using the following line of code. I can't paste the entire code, sorry, but the following is the only line that doesn't compile in my Spark job: sourceframe.select(callUDF("percentile_approx",col("mycol"), lit(0.25))) I am using the IntelliJ editor, Java, and Maven dependencies of spark core spark sql spark

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Ted Yu
I am currently dealing with a high priority bug in another project. Hope to get back to this soon. On Tue, Oct 13, 2015 at 11:56 AM, Umesh Kacha wrote: > Hi Ted sorry for asking again. Did you get chance to look at compilation > issue? Thanks much. > > Regards. > On Oct

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Umesh Kacha
Hi Ted, sorry for asking again. Did you get a chance to look at the compilation issue? Thanks much. Regards. On Oct 13, 2015 18:39, "Umesh Kacha" wrote: > Hi Ted I am using the following line of code I can't paste entire code > sorry but the following only line doesn't compile

Re: How to calculate percentile of a column of DataFrame?

2015-10-13 Thread Ted Yu
I modified DataFrameSuite, in master branch, to call percentile_approx instead of simpleUDF : - deprecated callUdf in SQLContext - callUDF in SQLContext *** FAILED *** org.apache.spark.sql.AnalysisException: undefined function percentile_approx; at

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Umesh Kacha
Hi, if you can help it would be great, as I am stuck. I don't know how to remove the compilation error in callUdf when we pass three parameters: the function name as a string, the column name as col, and the lit function. Please guide. On Oct 11, 2015 1:05 AM, "Umesh Kacha" wrote: > Hi any idea? how do

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Richard Eggert
I think the problem may be that callUDF takes a DataType indicating the return type of the UDF as its second argument. On Oct 12, 2015 9:27 AM, "Umesh Kacha" wrote: > Hi if you can help it would be great as I am stuck don't know how to > remove compilation error in callUdf

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Ted Yu
Umesh: Have you tried calling callUdf without the lit() parameter? Cheers On Mon, Oct 12, 2015 at 6:27 AM, Umesh Kacha wrote: > Hi if you can help it would be great as I am stuck don't know how to > remove compilation error in callUdf when we pass three parameters

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Umesh Kacha
Hi Ted, thanks. If I don't pass the lit function, then how can I tell the percentile_approx function to give me 25% or 50%, like we do in Hive with percentile_approx(mycol,0.25)? Regards On Mon, Oct 12, 2015 at 7:20 PM, Ted Yu wrote: > Umesh: > Have you tried calling callUdf without the

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Ted Yu
Using spark-shell, I did the following exercise (master branch) : SQL context available as sqlContext. scala> val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value") df: org.apache.spark.sql.DataFrame = [id: string, value: int] scala> sqlContext.udf.register("simpleUDF", (v: Int,

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Umesh Kacha
Sorry, I forgot to mention that I am using Spark 1.4.1, as callUdf is available in Spark 1.4.0 as per the Javadoc. On Tue, Oct 13, 2015 at 12:22 AM, Umesh Kacha wrote: > Hi Ted thanks much for the detailed answer and appreciate your efforts. Do > we need to register Hive UDFs? > >

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Umesh Kacha
Hi Ted, thanks much for the detailed answer; I appreciate your efforts. Do we need to register Hive UDFs? sqlContext.udf.register("percentile_approx");???//is it valid? I am calling the Hive UDF percentile_approx in the following manner, which gives a compilation error

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Ted Yu
SQL context available as sqlContext. scala> val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value") df: org.apache.spark.sql.DataFrame = [id: string, value: int] scala> df.select(callUDF("percentile_approx",col("value"), lit(0.25))).show() +--+

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Umesh Kacha
Hi Ted, thanks much. Are you saying the above code will work only in 1.5.1? I tried upgrading to 1.5.1 but I have found a potential bug: my Spark job creates Hive partitions using hiveContext.sql("insert into partitions"). When I use Spark 1.5.1 I can't see any partition files, ORC files, getting created in

Re: How to calculate percentile of a column of DataFrame?

2015-10-12 Thread Ted Yu
I would suggest using http://search-hadoop.com/ to find literature on the empty partitions directory problem. If there is no answer there, please start a new thread with the following information: release of Spark, release of Hadoop, code snippet, symptom. Cheers On Mon, Oct 12, 2015 at 12:08 PM,

Re: How to calculate percentile of a column of DataFrame?

2015-10-10 Thread Umesh Kacha
Hi, any idea? How do I call percentile_approx using callUdf()? Please guide. On Sat, Oct 10, 2015 at 1:39 AM, Umesh Kacha wrote: > I have a doubt Michael I tried to use callUDF in the following code it > does not work. > >

How to calculate percentile of a column of DataFrame?

2015-10-09 Thread unk1102
Hi, how do I calculate the percentile of a column in a DataFrame? I can't find any percentile_approx function in the Spark aggregation functions. E.g., in Hive we have percentile_approx and we can use it in the following way: hiveContext.sql("select percentile_approx(mycol, 0.25) from myTable"); I can see
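For readers unsure what the 0.25 argument in the question above asks for, here is a minimal plain-Java sketch (not Spark or Hive code) that computes an exact percentile of a small in-memory array by linear interpolation between the two closest ranks. The class name PercentileSketch and the method percentile are hypothetical names chosen for illustration. Hive's percentile_approx returns an approximation, so its result on the same data may differ from this exact calculation.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark or Hive.
final class PercentileSketch {

    // Exact percentile using linear interpolation between the closest ranks.
    // p is a fraction in [0, 1], e.g. 0.25 for the 25th percentile.
    static double percentile(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        double position = p * (sorted.length - 1);  // fractional index into sorted
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        double[] column = {1, 4, 5};
        System.out.println(percentile(column, 0.25)); // prints 2.5
        System.out.println(percentile(column, 0.50)); // prints 4.0
    }
}
```

This sorts the whole array in memory, so it only suits small data; for a large DataFrame column an approximate aggregate such as percentile_approx is the usual choice.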

Re: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Michael Armbrust
You can use callUDF(col("mycol"), lit(0.25)) to call hive UDFs from dataframes. On Fri, Oct 9, 2015 at 12:01 PM, unk1102 wrote: > Hi how to calculate percentile of a column in a DataFrame? I cant find any > percentile_approx function in Spark aggregation functions. For

RE: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Saif.A.Ellafi
Where can we find other available functions such as lit() ? I can’t find lit in the api. Thanks From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Friday, October 09, 2015 4:04 PM To: unk1102 Cc: user Subject: Re: How to calculate percentile of a column of DataFrame? You can use

Re: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Umesh Kacha
I found it in the 1.3 documentation. lit says something else, not percent: public static Column lit(Object literal) Creates a Column of literal

RE: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Saif.A.Ellafi
Yes, but I mean, this is rather curious. How does def lit(literal:Any) become a percentile function lit(25)? Thanks for clarification Saif From: Umesh Kacha [mailto:umesh.ka...@gmail.com] Sent: Friday, October 09, 2015 4:10 PM To: Ellafi, Saif A. Cc: Michael Armbrust; user Subject: Re: How to

Re: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Michael Armbrust
This is confusing because I made a typo... callUDF("percentile_approx", col("mycol"), lit(0.25)) The first argument is the name of the UDF, all other arguments need to be columns that are passed in as arguments. lit is just saying to make a literal column that always has the value 0.25. On

Re: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Umesh Kacha
thanks much Michael let me try. On Sat, Oct 10, 2015 at 1:20 AM, Michael Armbrust wrote: > This is confusing because I made a typo... > > callUDF("percentile_approx", col("mycol"), lit(0.25)) > > The first argument is the name of the UDF, all other arguments need to be >

Re: How to calculate percentile of a column of DataFrame?

2015-10-09 Thread Umesh Kacha
I have a doubt, Michael. I tried to use callUDF in the following code, but it does not work: sourceFrame.agg(callUdf("percentile_approx",col("myCol"),lit(0.25))) The above code does not compile because callUdf() takes only two arguments: the function name as a String and a Column. Please guide. On Sat,