M Bharat lal created SPARK-11688:
------------------------------------

             Summary: UDFs don't work when they have default arguments
                 Key: SPARK-11688
                 URL: https://issues.apache.org/jira/browse/SPARK-11688
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: M Bharat lal
            Priority: Minor


Use case:
========
Suppose we have a function that accepts three parameters (string, subString, 
and frmIndex, which has a default value of 0):

def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
  string.indexOf(subString, frmIndex)

The above function works perfectly if I don't pass the frmIndex parameter:

scala> hasSubstring("Scala", "la")
res0: Long = 3

But when I register the above function as a UDF (registration succeeds) and 
call it without passing the frmIndex parameter, I get the exception below.

scala> val df = sqlContext.createDataFrame(Seq(("scala", "Spark", "MLlib"), ("abc", "def", "gfh"))).toDF("c1", "c2", "c3")
df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]

scala> df.show
+-----+-----+-----+
|   c1|   c2|   c3|
+-----+-----+-----+
|scala|Spark|MLlib|
|  abc|  def|  gfh|
+-----+-----+-----+

scala> sqlContext.udf.register("hasSubstring", hasSubstring _)
res3: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function3>,LongType,List())
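Note that the registered value wraps a plain <function3>: eta-expanding 
hasSubstring _ produces a Function3 that no longer carries the default for 
frmIndex, which is presumably why the two-argument call below fails. A minimal 
sketch of that eta-expansion (the value name f is only for illustration):

val f = hasSubstring _   // eta-expansion: Function3[String, String, Int, Long]
// f("Scala", "la")      // does not compile: the default for frmIndex is not carried over
f("Scala", "la", 0)      // compiles: all three arguments must be supplied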

scala> val result = df.as("i0").withColumn("subStringIndex", callUDF("hasSubstring", $"i0.c1", lit("la")))

org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
        at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
        at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
        at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
        at scala.util.Try.getOrElse(Try.scala:77)
        at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
        at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
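
A workaround sketch, assuming the df and hasSubstring definitions above (the 
names result1, hasSubstring2 and result2 are only illustrative): either pass 
the default explicitly at the call site, or register a two-argument wrapper 
that fills it in.

// Option 1: supply the default explicitly
val result1 = df.as("i0").withColumn("subStringIndex",
  callUDF("hasSubstring", $"i0.c1", lit("la"), lit(0)))

// Option 2: register a wrapper that bakes in the default
sqlContext.udf.register("hasSubstring2",
  (string: String, subString: String) => hasSubstring(string, subString))
val result2 = df.as("i0").withColumn("subStringIndex",
  callUDF("hasSubstring2", $"i0.c1", lit("la")))

Neither of these picks up the default transparently; the improvement requested 
here is for the registered UDF to honour frmIndex's default when the argument 
is omitted.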



