[jira] [Commented] (SPARK-11688) UDF's doesn't work when it has a default arguments

Jakob Odersky (JIRA) Thu, 12 Nov 2015 22:11:35 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003600#comment-15003600
 ]


Jakob Odersky commented on SPARK-11688:
---------------------------------------

Registering a UDF requires a function (an instance of FunctionX), but only 
methods (defs) support default parameters. Let me illustrate:

{{hasSubstring _}}  is equivalent to {{(x: String, y: String, z: Int) => 
hasSubstring(x, y, z)}}, which is only syntactic sugar for

{code}
new Function3[String, String, Int, Long] {
  def apply(x: String, y: String, z: Int): Long = hasSubstring(x, y, z)
}
{code}

Therefore, the error is expected, since you are trying to call a Function3 with 
only two arguments.
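
To make this concrete without involving Spark, here is a minimal plain-Scala sketch. The object name EtaExpansionDemo and the values asFunction and withDefault are made up for illustration; only hasSubstring comes from the issue description.

```scala
object EtaExpansionDemo {
  // Same shape as the method in the issue: the last parameter has a default.
  def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
    string.indexOf(subString, frmIndex).toLong

  // Eta-expansion yields a Function3; the default for frmIndex is not carried over.
  val asFunction: (String, String, Int) => Long = hasSubstring _

  // A two-argument wrapper: the default is applied inside a normal call to the def.
  val withDefault: (String, String) => Long = (s, sub) => hasSubstring(s, sub)

  def main(args: Array[String]): Unit = {
    println(hasSubstring("Scala", "la"))   // method call, default applied
    println(asFunction("Scala", "la", 0))  // all three arguments are required
    println(withDefault("Scala", "la"))    // wrapper supplies the default
    // asFunction("Scala", "la")           // does not compile: apply takes three arguments
  }
}
```

A two-argument function like withDefault is probably the simplest workaround for now: register it under its own name and the default is fixed by the def at the point where the wrapper calls it.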

With the current API, and without resorting to macro magic, I see no way of 
enabling default parameters for UDFs. Maybe changing the register API to 
something like
register((X, Y, Z) => R, defaults) could work, where defaults would supply the 
arguments for any unspecified parameters when the UDF is called. However, this 
could also lead to some very subtle errors, since any substituted default 
would have the value specified at registration, potentially 
different from the default specified in the corresponding def declaration.
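
A rough sketch of that pitfall, again in plain Scala. UdfWithDefault is a hypothetical stand-in for a registered UDF that carries a default for its last argument; it is not part of Spark's API, and the value 4 is chosen only to make the mismatch visible.

```scala
object DefaultsSketch {
  // Hypothetical stand-in for a registered UDF: a Function3 plus a default
  // for its last argument, captured at "registration" time.
  final case class UdfWithDefault[A, B, C, R](f: (A, B, C) => R, lastDefault: C) {
    def call(a: A, b: B): R = f(a, b, lastDefault) // missing argument filled in
    def call(a: A, b: B, c: C): R = f(a, b, c)     // all arguments given
  }

  // The def from the issue; its own default for frmIndex is 0.
  def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
    string.indexOf(subString, frmIndex).toLong

  // The default is restated here and silently differs from the def's default.
  val udf: UdfWithDefault[String, String, Int, Long] =
    UdfWithDefault[String, String, Int, Long](hasSubstring _, 4)

  def main(args: Array[String]): Unit = {
    println(hasSubstring("Scala", "la")) // uses the def's default (0)
    println(udf.call("Scala", "la"))     // uses the registration default (4)
  }
}
```

Both calls look identical to the caller, yet they search from different start indices, which is the kind of mismatch I would worry about.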

> UDF's doesn't work when it has a default arguments
> --------------------------------------------------
>
>                 Key: SPARK-11688
>                 URL: https://issues.apache.org/jira/browse/SPARK-11688
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: M Bharat lal
>            Priority: Minor
>
> Use case:
> ========
> Suppose we have a function that accepts three parameters: string, subString, 
> and frmIndex, which has a default value of 0.
> def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long = 
> string.indexOf(subString, frmIndex)
> The function works as expected when I don't pass the frmIndex parameter:
> scala> hasSubstring("Scala", "la")
> res0: Long = 3
> But when I register the function as a UDF (registration succeeds) and 
> call it without passing frmIndex, I get the exception below:
> scala> val df  = 
> sqlContext.createDataFrame(Seq(("scala","Spark","MLlib"),("abc", "def", 
> "gfh"))).toDF("c1", "c2", "c3")
> df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]
> scala> df.show
> +-----+-----+-----+
> |   c1|   c2|   c3|
> +-----+-----+-----+
> |scala|Spark|MLlib|
> |  abc|  def|  gfh|
> +-----+-----+-----+
> scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
> res3: org.apache.spark.sql.UserDefinedFunction = 
> UserDefinedFunction(<function3>,LongType,List())
> scala> val result = df.as("i0").withColumn("subStringIndex", 
> callUDF("hasSubstring", $"i0.c1", lit("la")))
> org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
>       at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>       at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>       at scala.Option.getOrElse(Option.scala:120)
>       at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
>       at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
>       at scala.util.Try.getOrElse(Try.scala:77)
>       at 
> org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
>       at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>       at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>       at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)



