I think you are right that there is no way to call a Java UDF without
registering it right now. Adding another 20 overloads to functions would
be scary. Maybe the best way is to have a companion object
for UserDefinedFunction, and define UDFs there?

e.g.

object UserDefinedFunction {

  def define[R](
      f: org.apache.spark.api.java.function.Function0[R],
      returnType: Class[R]): UserDefinedFunction

  // ... define a few more overloads - maybe up to 5 arguments?
}
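
From the Java side, usage might then look something like this
(hypothetical sketch - define doesn't exist yet, and this assumes a
one-argument overload and a Java-friendly apply(Column...), given a
DataFrame df):

// Hypothetical API - none of this exists yet.
UserDefinedFunction addFive = UserDefinedFunction.define(
    // org.apache.spark.api.java.function.Function
    new Function<Integer, Integer>() {
      public Integer call(Integer age) { return age + 5; }
    },
    Integer.class);

df.select(addFive.apply(df.col("age")));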

Ideally, we should ask for both the argument classes and the return class,
so we can do proper type conversion (e.g. if the UDF expects a string but
the input expression is an int, Catalyst can automatically add a cast).
However, we haven't implemented that in UserDefinedFunction yet.
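
For context, with a registered Java UDF today only the return type is
declared, so a type mismatch on the input fails at runtime (typically a
ClassCastException) instead of getting a cast:

sqlContext.udf().register("strlen",
    // org.apache.spark.sql.api.java.UDF1
    new UDF1<String, Integer>() {
      public Integer call(String s) { return s.length(); }
    },
    DataTypes.IntegerType);

// Calling strlen on an int column then fails at runtime, where Catalyst
// could have added cast(col as string) instead.
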
On Fri, May 29, 2015 at 12:54 PM, Justin Uang <justin.u...@gmail.com> wrote:

> I would like to define a UDF in Java via a closure and then use it without
> registration. In Scala, I believe there are two ways to do this:
>
>     val myUdf = functions.udf((x: Int) => x + 5)
>     myDf.select(myUdf(myDf("age")))
>
> or
>
>     myDf.select(functions.callUDF((x: Int) => x + 5, DataTypes.IntegerType,
>       myDf("age")))
>
> However, neither of these works for a Java UDF. The first one requires
> TypeTags. For the second one, I was able to hack it by creating a scala
> AbstractFunction1 and using callUDF, which requires declaring the catalyst
> DataType instead of using TypeTags. However, it was still nasty because I
> had to return a scala map instead of a java map.
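>
> Roughly, the hack looks like this (sketch; the function also needs to
> be Serializable so Spark can ship it to the executors):
>
>     // extends scala.runtime.AbstractFunction1,
>     // implements java.io.Serializable
>     class PlusFive extends AbstractFunction1<Integer, Integer>
>         implements Serializable {
>       public Integer apply(Integer age) { return age + 5; }
>     }
>
>     myDf.select(functions.callUDF(
>         new PlusFive(), DataTypes.IntegerType, myDf.col("age")));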
>
> Is there first-class support for creating
> an org.apache.spark.sql.UserDefinedFunction that works with
> org.apache.spark.sql.api.java.UDF1<T1, R>? I'm fine with having to
> declare the catalyst type when creating it.
>
> If it doesn't exist, I would be happy to work on it =)
>
> Justin
>
