Spark DataFrame callUdf does not compile?

2015-12-28 Thread unk1102
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n25821/Screen_Shot_2015-12-28_at_8.jpg>
 

Hi, I am trying to invoke a Hive UDF using
dataframe.select(callUdf("percentile_approx", col("C1"), lit(0.25))), but it
does not compile, even though the same call works in the Spark Scala console. I don't
understand why. I am using Spark 1.5.2 from Maven in my Java code. I have
also explicitly added the Maven dependency hive-exec-1.2.1.spark.jar, where
percentile_approx is located, but the code still does not compile. Please check the
attached code image. Please guide. Thanks in advance.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-DataFrame-callUdf-does-not-compile-tp25821.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark DataFrame callUdf does not compile?

2015-12-28 Thread Umesh Kacha
Hi, thanks, but I think you understood the question incorrectly. First of all, I am passing
the UDF name as a String, and if you look at the callUDF arguments, it does not take
a String as the first argument; and if I use callUDF, it throws an exception
saying the percentile_approx function is not found. The other thing I mentioned
is that it works in the Spark Scala console, so I am not calling it in an
unexpected way. I hope the question is clear now.

On Mon, Dec 28, 2015 at 9:21 PM, Hamel Kothari <hamelkoth...@gmail.com>
wrote:



Re: Spark DataFrame callUdf does not compile?

2015-12-28 Thread Umesh Kacha
Thanks, but I have tried everything. To confirm, I am writing the code below. If
you can compile the following in Java with Spark 1.5.2, then great; otherwise
nothing here is helpful, as I have been stumbling over this for the last few days.

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import static org.apache.spark.sql.functions.*;

public class PercentileHiveApproxTestMain {

    public static void main(String[] args) {
        SparkConf sparkconf = new SparkConf()
                .setAppName("PercentileHiveApproxTestMain")
                .setMaster("local[*]");
        SparkContext sc = new SparkContext(sparkconf);
        SQLContext sqlContext = new SQLContext(sc);
        // load two-column data from CSV and create a DataFrame with columns
        // C1 (int), C0 (string)
        DataFrame df = sqlContext.read().format("com.databricks.spark.csv").load("/tmp/df.csv");
        df.select(callUdf("percentile_approx", col("C1"), lit(0.25))).show(); // does not compile
    }

}

On Mon, Dec 28, 2015 at 9:56 PM, Hamel Kothari <hamelkoth...@gmail.com>
wrote:



Re: Spark DataFrame callUdf does not compile?

2015-12-28 Thread Hamel Kothari
Would you mind sharing more of your code? I can't really see the code that
well in the attached screenshot, but it appears that "Lit" is capitalized.
I'm not sure what that method refers to, but the definition in
functions.scala is lowercase (lit).

Even if that's not it, some more code would help in solving this.
Also, since it's a compilation error, it would be very useful if you could
share the compiler error message.

-Hamel

On Mon, Dec 28, 2015 at 10:26 AM unk1102 <umesh.ka...@gmail.com> wrote:



Re: Spark DataFrame callUdf does not compile?

2015-12-28 Thread Hamel Kothari
Also, if I'm reading correctly, it looks like you're calling "callUdf" when
what you probably want is "callUDF" (notice the subtle capitalization
difference). Docs:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#callUDF(java.lang.String,%20org.apache.spark.sql.Column...)

On Mon, Dec 28, 2015 at 10:48 AM Hamel Kothari <hamelkoth...@gmail.com>
wrote:



Re: Spark DataFrame callUdf does not compile?

2015-12-28 Thread Hamel Kothari
If you scroll further down in the documentation, you will see that callUDF
does have a version which takes (String, Column...) as arguments:

  callUDF(java.lang.String udfName, Column... cols)

https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#callUDF(java.lang.String,%20org.apache.spark.sql.Column...)
(Column: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html)

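Unfortunately, the link I posted earlier doesn't seem to work because of the
punctuation in the URL, but the method is there. If you use "callUdf" from Java
with a string argument, which is what you seem to be doing, it expects a Scala
Seq of columns as its second argument because of the way it is defined in Scala
(a Scala varargs parameter is seen from Java as a single Seq unless the method is
annotated for Java varargs). It is also a deprecated method anyway.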
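The compile failure described above can be illustrated without Spark at all. The sketch below is a plain-Java analogy, not Spark code: FakeColumn, callUdfSeqStyle and callUdfVarargsStyle are hypothetical stand-ins, and java.util.List stands in for Scala's Seq. The first method is shaped the way Java sees a Scala Column* parameter without Java varargs support (one collection argument); the second is shaped like callUDF(String, Column...).

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy, no Spark involved. FakeColumn and both callUdf* methods
// are hypothetical stand-ins, not Spark APIs.
public class VarargsVsSeqDemo {

    // Hypothetical stand-in for org.apache.spark.sql.Column.
    static final class FakeColumn {
        final String expr;

        FakeColumn(String expr) {
            this.expr = expr;
        }
    }

    // Shaped like a Scala "cols: Column*" parameter seen from Java without
    // Java varargs support: a single collection parameter (List stands in for Seq).
    static String callUdfSeqStyle(String name, List<FakeColumn> cols) {
        return render(name, cols);
    }

    // Shaped like callUDF(String, Column...): a real Java varargs method.
    static String callUdfVarargsStyle(String name, FakeColumn... cols) {
        return render(name, Arrays.asList(cols));
    }

    // Builds a readable "name(arg1, arg2)" string for printing.
    private static String render(String name, List<FakeColumn> cols) {
        StringBuilder sb = new StringBuilder(name).append('(');
        for (int i = 0; i < cols.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(cols.get(i).expr);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        FakeColumn c1 = new FakeColumn("C1");
        FakeColumn p = new FakeColumn("0.25");

        // Compiles: the varargs-shaped method accepts loose columns directly.
        System.out.println(callUdfVarargsStyle("percentile_approx", c1, p));

        // callUdfSeqStyle("percentile_approx", c1, p); // would NOT compile
        // The collection-shaped method needs the columns wrapped explicitly.
        System.out.println(callUdfSeqStyle("percentile_approx", Arrays.asList(c1, p)));
    }
}
```

Uncommenting the callUdfSeqStyle line with two loose columns gives the same kind of compile error as the callUdf call in the question; in the Spark case the practical way out is to call the varargs-shaped callUDF directly.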

The reason you're getting the exception is not that callUDF is the wrong
method to call; it's that the percentile_approx UDF is never registered.
If you pass in a UDF by name, you must register it with your SQL context,
as follows (example taken from the documentation of the method referenced
above):

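  import org.apache.spark.sql._
  import org.apache.spark.sql.functions._  // provides callUDF

  // Meant for spark-shell (Spark 1.5.x), where sqlContext.implicits._
  // (needed for toDF and $"...") is already imported.
  val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
  val sqlContext = df.sqlContext
  sqlContext.udf.register("simpleUDF", (v: Int) => v * v)
  df.select($"id", callUDF("simpleUDF", $"value"))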
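The point above is that callUDF looks a function up by name only when the query is analyzed, so the compiler cannot catch a missing function. The plain-Java sketch below is only an analogy of that name-based lookup; it is not how Spark implements its registry, and NamedUdfRegistryDemo, register and call are hypothetical names. One hedged aside for percentile_approx specifically: in Spark 1.x, Hive built-ins such as percentile_approx are, as far as I know, resolved only when the DataFrame comes from a HiveContext (org.apache.spark.sql.hive.HiveContext, from the spark-hive artifact, e.g. new HiveContext(sc) in place of new SQLContext(sc)) rather than a plain SQLContext, so there the Hive function registry plays the role of register().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java analogy of a name-keyed function registry, no Spark involved.
// All class and method names here are hypothetical.
public class NamedUdfRegistryDemo {

    private final Map<String, Function<Integer, Integer>> udfs = new HashMap<>();

    // Analogue of sqlContext.udf.register("simpleUDF", ...).
    void register(String name, Function<Integer, Integer> f) {
        udfs.put(name, f);
    }

    // Analogue of callUDF("simpleUDF", ...): the name is only checked here,
    // so an unknown name compiles fine and fails when it is looked up.
    int call(String name, int value) {
        Function<Integer, Integer> f = udfs.get(name);
        if (f == null) {
            throw new IllegalArgumentException("undefined function " + name);
        }
        return f.apply(value);
    }

    public static void main(String[] args) {
        NamedUdfRegistryDemo registry = new NamedUdfRegistryDemo();
        registry.register("simpleUDF", v -> v * v);

        System.out.println(registry.call("simpleUDF", 4)); // prints 16

        try {
            registry.call("percentile_approx", 4); // never registered
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "undefined function percentile_approx"
        }
    }
}
```

The error text in this sketch is its own and is not meant to match Spark's exact message; what carries over is that a by-name call only fails once the name is resolved at run time.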




On Mon, Dec 28, 2015 at 11:08 AM Umesh Kacha <umesh.ka...@gmail.com> wrote:
