M Bharat lal created SPARK-11688:
------------------------------------

             Summary: UDF's doesn't work when it has a default arguments
                 Key: SPARK-11688
                 URL: https://issues.apache.org/jira/browse/SPARK-11688
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: M Bharat lal
            Priority: Minor

Use case:
========

Suppose we have a function that accepts three parameters: string, subString, and frmIndex, which defaults to 0.

def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long = string.indexOf(subString, frmIndex)

The function works as expected when I don't pass the frmIndex parameter:

scala> hasSubstring("Scala", "la")
res0: Long = 3

But when I register the same function as a UDF (registration succeeds) and call it without passing the frmIndex parameter, I get the exception below:

scala> val df = sqlContext.createDataFrame(Seq(("scala","Spark","MLlib"),("abc", "def", "gfh"))).toDF("c1", "c2", "c3")
df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]

scala> df.show
+-----+-----+-----+
|   c1|   c2|   c3|
+-----+-----+-----+
|scala|Spark|MLlib|
|  abc|  def|  gfh|
+-----+-----+-----+

scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
res3: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function3>,LongType,List())

scala> val result = df.as("i0").withColumn("subStringIndex", callUDF("hasSubstring", $"i0.c1", lit("la")))
org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
	at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
	at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
	at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
	at scala.util.Try.getOrElse(Try.scala:77)
	at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
	at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
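
A possible explanation for the behaviour reported above, sketched in plain Scala with no Spark dependency: `hasSubstring _` eta-expands the method into a three-argument function value (the `<function3>` in the register output), and function values carry no default arguments, so a two-argument call has nothing to bind to. Whether this is exactly what leads to the "undefined function" message is an assumption, not something the report confirms. The names `DefaultArgDemo`, `asFunction`, `hasSubstring2` and `hasSubstring3` are hypothetical and used only for illustration.

```scala
object DefaultArgDemo {
  // Same method as in the report: frmIndex defaults to 0.
  def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
    string.indexOf(subString, frmIndex)

  // Eta-expansion yields a Function3; the default for frmIndex is not part of it.
  val asFunction: (String, String, Int) => Long = hasSubstring _

  // Hypothetical wrappers, one per arity. The two-argument wrapper applies
  // the default at the call site inside its body, where it is still a method call.
  val hasSubstring2: (String, String) => Long =
    (s, sub) => hasSubstring(s, sub)
  val hasSubstring3: (String, String, Int) => Long =
    (s, sub, from) => hasSubstring(s, sub, from)

  def main(args: Array[String]): Unit = {
    println(hasSubstring("Scala", "la"))     // method call, default applied: 3
    println(asFunction("Scala", "la", 0))    // function value needs all three arguments: 3
    // asFunction("Scala", "la")             // would not compile: not enough arguments
    println(hasSubstring2("scala", "la"))    // 3
    println(hasSubstring3("scala", "la", 4)) // search starts past the match: -1
  }
}
```

With wrappers like these, each arity could be registered separately as a workaround, for example `sqlContext.udf.register("hasSubstring2", DefaultArgDemo.hasSubstring2)`, and then called as `callUDF("hasSubstring2", $"i0.c1", lit("la"))`. Distinct names are used on the assumption that registering a second function under an existing name replaces the first one rather than adding an overload.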