import org.apache.spark.sql.functions.{udf, lit}

val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))
data.select(getlength(lit(1), lit(2), data("col1"))).collect()
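For the ssplit2 case quoted below the pattern is the same: wrap every non-column argument in lit() so it becomes a constant Column. A self-contained sketch follows; note the whitespace tokenizer is only a stand-in for the HanLP segmentation elided in your mail, and the sample input DataFrame is made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{udf, lit}

val spark = SparkSession.builder().appName("udf-lit-example").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the HanLP-based body from the original mail: split on
// whitespace, optionally drop purely numeric or purely English tokens,
// and keep only terms of at least minTermLen characters.
val ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
  sentence.split("\\s+").toSeq
    .filterNot(t => delNum && t.forall(_.isDigit))
    .filterNot(t => delEn && t.matches("[A-Za-z]+"))
    .filter(_.length >= minTermLen)
}

// Hypothetical sample data standing in for the real input DataFrame.
val input = Seq((1, "spark2 42 udf_demo ok")).toDF("id", "text")

// Only $"text" is a real column; the other three arguments become
// constant Columns via lit(), which is what the compiler was asking for.
val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as("words"))
output.show(false)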
On Fri, Jun 16, 2017 at 10:22 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Use lit. Give me some time and I'll provide an example.
>
> On 16-Jun-2017 10:15 AM, "lk_spark" <lk_sp...@163.com> wrote:
>
>> Thanks Kumar. I want to know how to call a udf with multiple parameters,
>> for example a udf that acts like a substr function: how can I pass the
>> begin and end index as parameters? I tried it and got errors. Can udf
>> parameters only be of Column type?
>>
>> 2017-06-16
>> ------------------------------
>> lk_spark
>> ------------------------------
>>
>> *From:* Pralabh Kumar <pralabhku...@gmail.com>
>> *Sent:* 2017-06-16 17:49
>> *Subject:* Re: how to call udf with parameters
>> *To:* "lk_spark" <lk_sp...@163.com>
>> *Cc:* "user.spark" <user@spark.apache.org>
>>
>> Sample UDF:
>> val getlength = udf((data: String) => data.length())
>> data.select(getlength(data("col1")))
>>
>> On Fri, Jun 16, 2017 at 9:21 AM, lk_spark <lk_sp...@163.com> wrote:
>>
>>> hi, all
>>> I defined a udf with multiple parameters, but I don't know how to
>>> call it with a DataFrame.
>>>
>>> UDF:
>>>
>>> def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean,
>>> minTermLen: Int) =>
>>>   val terms = HanLP.segment(sentence).asScala
>>>   .....
>>>
>>> Call:
>>>
>>> scala> val output = input.select(ssplit2($"text", true, true, 2).as('words))
>>> <console>:40: error: type mismatch;
>>>  found   : Boolean(true)
>>>  required: org.apache.spark.sql.Column
>>>        val output = input.select(ssplit2($"text", true, true, 2).as('words))
>>>                                                   ^
>>> <console>:40: error: type mismatch;
>>>  found   : Boolean(true)
>>>  required: org.apache.spark.sql.Column
>>>        val output = input.select(ssplit2($"text", true, true, 2).as('words))
>>>                                                         ^
>>> <console>:40: error: type mismatch;
>>>  found   : Int(2)
>>>  required: org.apache.spark.sql.Column
>>>        val output = input.select(ssplit2($"text", true, true, 2).as('words))
>>>                                                               ^
>>>
>>> scala> val output = input.select(ssplit2($"text", $"true", $"true", $"2").as('words))
>>> org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given
>>> input columns: [id, text];;
>>> 'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
>>> +- Project [_1#2 AS id#5, _2#3 AS text#6]
>>>    +- LocalRelation [_1#2, _2#3]
>>>
>>> I need help!!
>>>
>>> 2017-06-16
>>> ------------------------------
>>> lk_spark