[ https://issues.apache.org/jira/browse/SPARK-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-11688:
---------------------------------
Labels: bulk-closed (was: )
> UDF's doesn't work when it has a default arguments
> --------------------------------------------------
>
> Key: SPARK-11688
> URL: https://issues.apache.org/jira/browse/SPARK-11688
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: M Bharat lal
> Priority: Minor
> Labels: bulk-closed
>
> Use case:
> ========
> Suppose we have a function that accepts three parameters: string, subString,
> and frmIndex, which defaults to 0:
> def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
>   string.indexOf(subString, frmIndex)
> The function works as expected when I do not pass the frmIndex parameter:
> scala> hasSubstring("Scala", "la")
> res0: Long = 3
> But when I register the function as a UDF (registration succeeds) and call it
> without passing the frmIndex parameter, I get the exception below:
> scala> val df = sqlContext.createDataFrame(Seq(("scala", "Spark", "MLlib"), ("abc", "def", "gfh"))).toDF("c1", "c2", "c3")
> df: org.apache.spark.sql.DataFrame = [c1: string, c2: string, c3: string]
> scala> df.show
> +-----+-----+-----+
> |   c1|   c2|   c3|
> +-----+-----+-----+
> |scala|Spark|MLlib|
> |  abc|  def|  gfh|
> +-----+-----+-----+
> scala> sqlContext.udf.register("hasSubstring", hasSubstring _ )
> res3: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function3>,LongType,List())
> scala> val result = df.as("i0").withColumn("subStringIndex", callUDF("hasSubstring", $"i0.c1", lit("la")))
> org.apache.spark.sql.AnalysisException: undefined function hasSubstring;
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
>     at scala.util.Try.getOrElse(Try.scala:77)
>     at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490)
>     at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
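The REPL output in the report shows the registered UDF as <function3>, which suggests that "hasSubstring _" is eta-expanded into a plain three-argument function value, and Scala function values do not carry default arguments. One possible workaround, sketched below in plain Scala without Spark on the classpath, is to wrap the method in an explicit two-argument function so the default is applied at compile time. The names DefaultArgUdfSketch, threeArg, twoArg, hasSubstring2 and hasSubstring3 are hypothetical and not part of the original report, and the registration calls in the comments assume a SQLContext named sqlContext is in scope.

```scala
object DefaultArgUdfSketch {
  // Same function as in the report: frmIndex defaults to 0.
  def hasSubstring(string: String, subString: String, frmIndex: Int = 0): Long =
    string.indexOf(subString, frmIndex)

  // Eta-expansion gives a Function3; the default value of frmIndex is lost,
  // so every caller of threeArg must supply all three arguments.
  val threeArg: (String, String, Int) => Long = hasSubstring _

  // Hypothetical two-argument wrapper: the compiler fills in frmIndex = 0 here.
  val twoArg: (String, String) => Long = (s, sub) => hasSubstring(s, sub)

  def main(args: Array[String]): Unit = {
    println(threeArg("Scala", "la", 0)) // prints 3
    println(twoArg("Scala", "la"))      // prints 3

    // Assumption: with a SQLContext in scope, each arity could be registered
    // under its own name, for example:
    //   sqlContext.udf.register("hasSubstring2", twoArg)
    //   sqlContext.udf.register("hasSubstring3", threeArg)
  }
}
```

With such a wrapper registered, the two-argument call from the report would be made against the two-argument name. Whether a single UDF name could accept both arities is not something this sketch addresses.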