[ https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003600#comment-15003600 ]
Jakob Odersky commented on SPARK-11688:
---------------------------------------

Registering a UDF requires a function value (an instance of FunctionX), but only methods declared with def support default parameters. To illustrate: {{hasSubstring _}} is equivalent to {{(x: String, y: String, z: Int) => hasSubstring(x, y, z)}}, which is itself only syntactic sugar for

{code}
new Function3[String, String, Int, Long] {
  def apply(x: String, y: String, z: Int): Long = hasSubstring(x, y, z)
}
{code}

Therefore, the error is expected, since you are trying to call a Function3 with only two arguments.

With the current API, and without trying some macro magic, I see no way of enabling default parameters for UDFs. Maybe changing the register API to something like register((X, Y, Z) => R, defaults) could work, where defaults would supply the arguments for any parameters not specified when the UDF is called. However, this could also lead to some very subtle errors: any substituted default would have the value given at registration, which may differ from the default declared in the corresponding def.
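The point about eta-expansion can be checked in plain Scala, without Spark. The following is a minimal sketch, not a proposed patch: {{DefaultArgsSketch}}, {{withDefaultLast}} and the val names are hypothetical and exist only for illustration; only {{hasSubstring}} comes from the issue description.

```scala
object DefaultArgsSketch {
  // Same definition as in the issue description.
  def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
    string.indexOf(subString, frmIndex)

  // Eta-expansion yields a Function3; the default for frmIndex is not part of it,
  // so this value can only be called with all three arguments.
  val threeArgs: (String, String, Int) => Long = hasSubstring _

  // Explicit two-argument lambda: the body calls the def directly, so the
  // compiler fills in the declared default for frmIndex.
  val twoArgs: (String, String) => Long = (s, sub) => hasSubstring(s, sub)

  // Hypothetical helper (not part of Spark): fixes the last argument at wrap time,
  // roughly what a register(f, defaults) style API would have to do.
  def withDefaultLast[A, B, C, R](f: (A, B, C) => R, default: C): (A, B) => R =
    (a, b) => f(a, b, default)

  // Default supplied at "registration" matches the one declared on the def.
  val viaHelper: (String, String) => Long = withDefaultLast(threeArgs, 0)

  // Default supplied at "registration" differs from the one declared on the def;
  // nothing flags the mismatch, which is the subtle error described above.
  val driftedDefault: (String, String) => Long = withDefaultLast(threeArgs, 4)
}
```

With the current API, the same idea would presumably mean either registering the two-argument lambda under its own name via {{sqlContext.udf.register}}, or passing the third argument explicitly, e.g. {{lit(0)}}, when calling the three-argument UDF.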
> UDF's doesn't work when it has a default arguments
> --------------------------------------------------
>
>                 Key: SPARK-11688
>                 URL: https://issues.apache.org/jira/browse/SPARK-11688
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: M Bharat lal
>            Priority: Minor
>
> Use case:
> ========
> Suppose we have a function which accepts three parameters (string, subString
> and frmIndex which has 0 default value )
> def hasSubstring(string:String, subString:String, frmIndex:Int = 0): Long =
> string.indexOf(subString, frmIndex)
> above function works perfectly if I dont pass frmIndex parameter
> scala> hasSubstring("Scala", "la")
> res0: Long = 3
> But, when I register the above function as UDF (successfully registered) and
> call the same without passing frmIndex parameter got the below exception
> scala> val df =
> sqlContext.createDataFrame(Seq(("scala","Spark","MLlib"),("abc", "def",
> "gfh"))).toDF("c1", "c2", "c3")
> df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]
> scala> df.show
> +-----+-----+-----+
> |   c1|   c2|   c3|
> +-----+-----+-----+
> |scala|Spark|MLlib|
> |  abc|  def|  gfh|
> +-----+-----+-----+
> scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
> res3: org.apache.spark.sql.UserDefinedFunction =
> UserDefinedFunction(<function3>,LongType,List())
> scala> val result = df.as("i0").withColumn("subStringIndex",
> callUDF("hasSubstring", $"i0.c1", lit("la")))
> org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
>     at scala.util.Try.getOrElse(Try.scala:77)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>     at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)