[ https://issues.apache.org/jira/browse/SPARK-35321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339906#comment-17339906 ]
Chao Sun commented on SPARK-35321:
----------------------------------

[~xkrogen] yes, that can help solve the issue, but users need to specify both {{spark.sql.hive.metastore.version}} and {{spark.sql.hive.metastore.jars}}. The latter is not so easy to set up: the {{maven}} option usually takes a very long time to download all the jars, while the {{path}} option requires users to download all the relevant Hive jars for the specific version, which is tedious. I think this specific issue is worth fixing in Spark itself regardless, since from what I can see Spark doesn't really need to load all the permanent functions when starting up the Hive client. That process could also be pretty expensive if there are many UDFs registered in HMS.

> Spark 3.x can't talk to HMS 1.2.x and lower due to get_all_functions Thrift API missing
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-35321
>                 URL: https://issues.apache.org/jira/browse/SPARK-35321
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.2, 3.1.1, 3.2.0
>            Reporter: Chao Sun
>            Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-10319 introduced a new API {{get_all_functions}} which is only supported in Hive 1.3.0/2.0.0 and up. This is called when creating a new {{Hive}} object:
> {code}
> private Hive(HiveConf c, boolean doRegisterAllFns) throws HiveException {
>   conf = c;
>   if (doRegisterAllFns) {
>     registerAllFunctionsOnce();
>   }
> }
> {code}
> {{registerAllFunctionsOnce}} will reload all the permanent functions by calling the {{get_all_functions}} API from the metastore. In Spark, we always pass {{doRegisterAllFns}} as true, and this causes a failure:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
>         at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
>         at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>         at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
>         ... 96 more
> Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)
> {code}
> It looks like Spark doesn't really need to call {{registerAllFunctionsOnce}} since it loads Hive permanent functions directly via the HMS API. The Hive {{FunctionRegistry}} is only used for loading Hive built-in functions.
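To make the workaround discussed in the comment concrete, here is a minimal sketch of what [~xkrogen]'s suggestion amounts to: pinning the metastore client to the HMS version and pointing Spark at matching jars. The version string {{1.2.2}} and the local jar directory below are assumptions to adapt to your deployment; the {{path}} option with {{spark.sql.hive.metastore.jars.path}} requires Spark 3.1+, and these configs must be set before the first {{SparkSession}} is created:

{code:java}
import org.apache.spark.sql.SparkSession;

public class Hms12CompatExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hms-1.2-compat")
        // Pin the metastore client to the HMS version (assumed 1.2.2 here).
        .config("spark.sql.hive.metastore.version", "1.2.2")
        // Load matching Hive client jars from a local path instead of maven,
        // avoiding the slow per-startup download the comment mentions.
        .config("spark.sql.hive.metastore.jars", "path")
        // Assumed location of the pre-downloaded Hive 1.2.2 jars; wildcards
        // are supported for file:// paths.
        .config("spark.sql.hive.metastore.jars.path", "file:///opt/hive-1.2.2/lib/*")
        .enableHiveSupport()
        .getOrCreate();

    // Any metastore access exercises the isolated client.
    spark.sql("SHOW DATABASES").show();
    spark.stop();
  }
}
{code}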
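And for the Spark-side fix the comment argues for, a hedged sketch of one possible shape: obtain the {{Hive}} client with function registration skipped, so the {{get_all_functions}} Thrift call is never issued. {{Hive.getWithoutRegisterFns}} only exists in newer Hive 2.3.x client releases, so the factory method and fallback below are illustrative assumptions, not the actual patch:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;

public final class HiveClientFactory {
  private HiveClientFactory() {}

  /**
   * Obtain a Hive client without triggering registerAllFunctionsOnce(),
   * and therefore without the get_all_functions call that HMS 1.2.x and
   * lower do not implement.
   */
  public static Hive getClient(HiveConf conf) throws HiveException {
    try {
      // Assumed available on the classpath (newer Hive 2.3.x releases);
      // skips loading permanent functions entirely.
      return Hive.getWithoutRegisterFns(conf);
    } catch (NoSuchMethodError e) {
      // Older Hive client jars: fall back to the default path, which still
      // loads all permanent functions from the metastore at startup.
      return Hive.get(conf);
    }
  }
}
{code}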