Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Mich Talebzadeh
Yes, pretty straightforward: define, register and use.

def cleanupCurrency(word: String): Double = {
  // drop the leading currency symbol and thousands separators, e.g. "$1,234.56" -> 1234.56
  word.substring(1).replace(",", "").toDouble
}
sqlContext.udf.register("cleanupCurrency", cleanupCurrency(_: String))


val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString,
  p(1).toString, cleanupCurrency(p(2).toString),
  cleanupCurrency(p(3).toString), cleanupCurrency(p(4).toString)))
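For the pyspark side of this thread, the same cleanup logic is easy to mirror in plain Python before registering it as a UDF (a minimal sketch; the snake_case name is mine, not from the thread):

```python
def cleanup_currency(word):
    # drop the leading currency symbol (e.g. "$") and thousands separators
    return float(word[1:].replace(",", ""))

print(cleanup_currency("$1,234.56"))  # 1234.56
```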

HTH


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 August 2016 at 17:09, Nicholas Chammas wrote:

> No, SQLContext is not disappearing. The top-level class is replaced by
> SparkSession, but you can always get the underlying context from the
> session.
>
> You can also use SparkSession.udf.register(), which is just a wrapper for
> sqlContext.registerFunction.
>
> On Thu, Aug 4, 2016 at 12:04 PM Ben Teeuwen  wrote:
>
>> Yes, but I don’t want to use it in a select() call.
>> Either selectExpr() or spark.sql(), with the udf being called inside a
>> string.
>>
>> Now I got it to work using
>> "sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT())".
>> But this sqlContext approach will disappear, right? So I’m curious what
>> to use instead.
>>
>> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas 
>> wrote:
>>
>> Have you looked at pyspark.sql.functions.udf and the associated examples?
>> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote:
>>
>>> Hi,
>>>
>>> I’d like to use a UDF in pyspark 2.0. As in ..
>>> 
>>>
>>> def squareIt(x):
>>>   return x * x
>>>
>>> # register the function and define return type
>>> ….
>>>
>>> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
>>> function_result from df""")
>>>
>>> _
>>>
>>> How can I register the function? I only see registerFunction in the
>>> deprecated sqlContext at
>>> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
>>> As the ‘spark’ object unifies hiveContext and sqlContext, what is the
>>> new way to go?
>>>
>>> Ben
>>>
>>
>>


Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Nicholas Chammas
No, SQLContext is not disappearing. The top-level class is replaced by
SparkSession, but you can always get the underlying context from the
session.

You can also use SparkSession.udf.register(), which is just a wrapper for
sqlContext.registerFunction.

On Thu, Aug 4, 2016 at 12:04 PM Ben Teeuwen  wrote:

> Yes, but I don’t want to use it in a select() call.
> Either selectExpr() or spark.sql(), with the udf being called inside a
> string.
>
> Now I got it to work using
> "sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT())".
> But this sqlContext approach will disappear, right? So I’m curious what to
> use instead.
>
> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas 
> wrote:
>
> Have you looked at pyspark.sql.functions.udf and the associated examples?
> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote:
>
>> Hi,
>>
>> I’d like to use a UDF in pyspark 2.0. As in ..
>> 
>>
>> def squareIt(x):
>>   return x * x
>>
>> # register the function and define return type
>> ….
>>
>> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
>> function_result from df""")
>>
>> _
>>
>> How can I register the function? I only see registerFunction in the
>> deprecated sqlContext at
>> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
>> As the ‘spark’ object unifies hiveContext and sqlContext, what is the new
>> way to go?
>>
>> Ben
>>
>
>


Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Ben Teeuwen
Yes, but I don’t want to use it in a select() call; I want either selectExpr() or
spark.sql(), with the udf being called inside a string.

Now I got it to work using
"sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT())".
But this sqlContext approach will disappear, right? So I’m curious what to use 
instead.

> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas  
> wrote:
> 
> Have you looked at pyspark.sql.functions.udf and the associated examples?
> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote:
> Hi,
> 
> I’d like to use a UDF in pyspark 2.0. As in ..
>  
> 
> def squareIt(x):
>   return x * x
> 
> # register the function and define return type
> ….
> 
> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
> function_result from df""")
> 
> _
> 
> How can I register the function? I only see registerFunction in the 
> deprecated sqlContext at 
> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
> As the ‘spark’ object unifies hiveContext and sqlContext, what is the new way 
> to go?
> 
> Ben



Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Nicholas Chammas
Have you looked at pyspark.sql.functions.udf and the associated examples?
On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote:

> Hi,
>
> I’d like to use a UDF in pyspark 2.0. As in ..
> 
>
> def squareIt(x):
>   return x * x
>
> # register the function and define return type
> ….
>
> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
> function_result from df""")
>
> _
>
> How can I register the function? I only see registerFunction in the
> deprecated sqlContext at
> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
> As the ‘spark’ object unifies hiveContext and sqlContext, what is the new
> way to go?
>
> Ben
>