Hi Kant,

udfDeterministic should be set to false when your UDF's results are non-deterministic (for example, when they depend on random numbers), so that the Catalyst optimizer will not cache and reuse its results.
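To see why this flag matters, here is a minimal plain-Java sketch (no Spark dependency; `memoize` is a hypothetical stand-in for an optimizer caching UDF results by input). For a deterministic function, caching is invisible; for a random one, caching silently changes its semantics, which is exactly what `udfDeterministic = false` is meant to prevent:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DeterminismDemo {
    // Memoizing wrapper: a rough analogue of an optimizer
    // caching and reusing a UDF's result for a given input.
    static <T, R> Function<T, R> memoize(Function<T, R> f) {
        Map<T, R> cache = new HashMap<>();
        return x -> cache.computeIfAbsent(x, f);
    }

    public static void main(String[] args) {
        // Deterministic function: caching is safe and unobservable.
        Function<Integer, Integer> square = memoize(x -> x * x);
        System.out.println(square.apply(3).equals(square.apply(3)));

        // Non-deterministic function: the cached wrapper keeps
        // returning the first (stale) random value for input 3,
        // while fresh calls almost certainly differ.
        Function<Integer, Double> noisy = x -> x + Math.random();
        Function<Integer, Double> cachedNoisy = memoize(noisy);
        System.out.println(cachedNoisy.apply(3).equals(cachedNoisy.apply(3)));
        System.out.println(noisy.apply(3).equals(noisy.apply(3)));
    }
}
```

On the real API side (an assumption about your setup, not something from this thread): Java/Scala UDFs registered via `spark.udf()` are treated as deterministic unless you call `asNondeterministic()` on the `UserDefinedFunction` before registering it.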
On Mon, Apr 2, 2018 at 12:11 PM, kant kodali <kanth...@gmail.com> wrote:
> Looks like there is spark.udf().registerPython(), like below:
>
> public void registerPython(java.lang.String name,
>     org.apache.spark.sql.execution.python.UserDefinedPythonFunction udf)
>
> Can anyone describe what the *udfDeterministic* parameter does in the
> method signature below?
>
> public UserDefinedPythonFunction(java.lang.String name,
>     org.apache.spark.api.python.PythonFunction func,
>     org.apache.spark.sql.types.DataType dataType, int pythonEvalType,
>     boolean udfDeterministic) { /* compiled code */ }
>
> On Sun, Apr 1, 2018 at 3:46 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi All,
>>
>> All of our Spark code is in Java. I am wondering if there is a way to
>> register Python UDFs using the Java API such that the registered UDFs
>> can be used from raw Spark SQL.
>> If there is any other way to achieve this goal, please suggest!
>>
>> Thanks