What version of Spark are you using? I cannot reproduce your error:

scala> spark.version
res9: String = 2.1.1
scala> val dataset = Seq((0, "hello"), (1, "world")).toDF("id", "text")
dataset: org.apache.spark.sql.DataFrame = [id: int, text: string]
scala> import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.udf

// define a method in a similar way to yours
scala> def len = udf { (data: String) => data.length > 0 }
len: org.apache.spark.sql.expressions.UserDefinedFunction

// use it
scala> dataset.select(len($"text").as('length)).show
+------+
|length|
+------+
|  true|
|  true|
+------+
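As for the multi-parameter case in your original mail: every argument passed to a UDF must be a Column, so plain scalars like true or 2 have to be wrapped with lit(...) from org.apache.spark.sql.functions. A sketch below, with a simplified body standing in for the elided HanLP.segment(...) logic (splitting on whitespace is only an illustration, not your actual implementation):

```scala
import org.apache.spark.sql.functions.{lit, udf}

// Plain Scala function first, so the logic is easy to inspect on its own.
// The body is a stand-in for the HanLP-based code that was elided.
def splitTerms(sentence: String, delNum: Boolean, delEn: Boolean,
               minTermLen: Int): Seq[String] =
  sentence.split("\\s+").toSeq.filter(_.length >= minTermLen)

def ssplit2 = udf(splitTerms _)

// Wrap each scalar in lit(...) so that all four arguments are Columns:
val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as('words))
```

Here `input` is the DataFrame from your mail; the $"..." syntax assumes spark.implicits._ is imported, as in the shell.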


Yong



________________________________
From: Pralabh Kumar <pralabhku...@gmail.com>
Sent: Friday, June 16, 2017 12:19 AM
To: lk_spark
Cc: user.spark
Subject: Re: how to call udf with parameters

Sample UDF (df is the DataFrame; renamed so it doesn't shadow the UDF's data parameter):
val getlength = udf((data: String) => data.length())
df.select(getlength(df("col1")))

On Fri, Jun 16, 2017 at 9:21 AM, lk_spark 
<lk_sp...@163.com<mailto:lk_sp...@163.com>> wrote:
hi, all
     I defined a UDF with multiple parameters, but I don't know how to call it
with a DataFrame.

UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
    val terms = HanLP.segment(sentence).asScala
.....

Call :

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                 ^
<console>:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                      ^
<console>:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
       val output = input.select(ssplit2($"text",true,true,2).as('words))
                                                           ^

scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input 
columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]

I need help!!


2017-06-16
________________________________
lk_spark
