I have a question about permanent UDFs in Spark with Hive support enabled.
When we run a CREATE FUNCTION statement, the function is registered with Hive:
spark-sql>create function customfun as
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDay' using jar
'hdfs:///tmp/hive-exec.jar';
call stack:
org.apache.spark.sql.hive.client.HiveClientImpl#createFunction
org.apache.spark.sql.hive.HiveExternalCatalog#createFunction
org.apache.spark.sql.catalyst.catalog.SessionCatalog#createFunction
org.apache.spark.sql.execution.command.CreateFunctionCommand#run
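(For reference, the registration itself can be checked from spark-sql; the commands below are standard Spark SQL, though the exact output varies by version:)
spark-sql> show user functions;                      -- lists permanent functions from the metastore
spark-sql> describe function extended customfun;     -- shows the registered class name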
But when we invoke an already registered UDF, Spark also issues an ADD JAR call to Hive:
spark-sql> select customfun('2015-08-22');
call stack:
org.apache.spark.sql.hive.client.HiveClientImpl#addJar
org.apache.spark.sql.hive.HiveSessionResourceLoader#addJar
org.apache.spark.sql.internal.SessionResourceLoader#loadResource
org.apache.spark.sql.catalyst.catalog.SessionCatalog#loadFunctionResources
org.apache.spark.sql.catalyst.catalog.SessionCatalog#lookupFunction
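One way to observe this side effect from a fresh session (illustrative; LIST JARS is standard Spark SQL and shows the jars added to the current session):
spark-sql> list jars;                       -- fresh session: hdfs:///tmp/hive-exec.jar is not listed
spark-sql> select customfun('2015-08-22');  -- first invocation loads the function's resources
spark-sql> list jars;                       -- the jar now appears, added implicitly by lookupFunction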
So, is the ADD JAR call to Hive necessary when we invoke an already registered UDF? As I see it, following the current code:
1. Hive can look up already registered UDFs without an explicit ADD JAR call from Spark. Refer to https://cwiki.apache.org/confluence/display/Hive/HivePlugins, added via https://issues.apache.org/jira/browse/HIVE-6380 ("When the function is referenced for the first time by a Hive session, these resources will be added to the environment."). See the first sketch after this list.
2. The UDF cannot be used across sessions, because each new session has to run the internal ADD JAR again on the UDF call, and that fails unless the caller has the admin role set (Hive allows ADD JAR to be run only under the admin role). See the second sketch after this list.
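To illustrate point 1 (hedged; this is just the HIVE-6380 behaviour quoted above): in a fresh Hive session the same query works with no explicit ADD JAR, because Hive localizes the jar recorded with the function on first reference:
hive> select customfun('2015-08-22');   -- no prior ADD JAR; Hive fetches hdfs:///tmp/hive-exec.jar itself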
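And to illustrate point 2 (assuming Hive's SQL standard based authorization is enabled, which is what restricts ADD JAR to the admin role):
-- new spark-sql session started by a user without the admin role:
spark-sql> select customfun('2015-08-22');
-- lookupFunction triggers the internal ADD JAR to Hive, which Hive's
-- authorization rejects, so the query fails even though the function
-- is already registered in the metastore.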
Please correct me if I am wrong: can we avoid the ADD JAR when we invoke a registered UDF? Are there any side effects if I modify this flow?