Adam Szita created HIVE-21096:
---------------------------------
Summary: Remove unnecessary Spark dependency from HS2 process
Key: HIVE-21096
URL: https://issues.apache.org/jira/browse/HIVE-21096
Project: Hive
Issue Type: Improvement
Components: HiveServer2, Spark
Reporter: Adam Szita
Assignee: Adam Szita
When a HiveOnSpark job is kicked off, most of the work is done by the
RemoteDriver, which is a separate process. There are a couple of smaller parts
of the code where the HS2 process depends on Spark jars. Examples include
receiving stats from the driver and putting together a Spark conf object, both
used mostly during communication with the RemoteDriver.
We can limit the data types used for such communication so that we don't use
(and serialize) types that are in the Spark codebase, and hence we can refactor
our code to only use Spark jars in the RemoteDriver process.
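
As a rough illustration of the idea, the sketch below passes configuration and
stats as plain JDK types instead of Spark classes. Everything in it is
hypothetical: the class names (SparkFreeMessages, JobConf, TaskStats) and the
stat fields are invented for this example and are not taken from the Hive
codebase, and Java's built-in serialization is only a stand-in for whatever
wire format the HS2 to RemoteDriver communication actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical message types that rely only on JDK classes, so the sending
// process needs no Spark jars on its classpath to build or read them.
final class SparkFreeMessages {
  private SparkFreeMessages() {}

  // Assumption: settings travel as plain key/value strings; only the
  // receiving driver process would turn them into a real Spark conf object.
  static final class JobConf implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, String> entries;

    JobConf(Map<String, String> entries) {
      this.entries = new HashMap<>(entries); // defensive copy
    }

    String get(String key) {
      return entries.get(key);
    }

    Map<String, String> asMap() {
      return Collections.unmodifiableMap(entries);
    }
  }

  // Assumption: stats are flattened into primitives on the driver side
  // instead of shipping Spark's own metrics classes back to the caller.
  static final class TaskStats implements Serializable {
    private static final long serialVersionUID = 1L;
    final long executorRunTimeMs;
    final long bytesRead;

    TaskStats(long executorRunTimeMs, long bytesRead) {
      this.executorRunTimeMs = executorRunTimeMs;
      this.bytesRead = bytesRead;
    }
  }

  // Stand-in for the real wire format: plain Java serialization.
  static byte[] serialize(Serializable message) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(message);
    }
    return buffer.toByteArray();
  }

  static Object deserialize(byte[] bytes)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> settings = new HashMap<>();
    settings.put("spark.master", "yarn");

    // Round-trip both messages without touching any Spark class.
    JobConf conf = (JobConf) deserialize(serialize(new JobConf(settings)));
    TaskStats stats =
        (TaskStats) deserialize(serialize(new TaskStats(1200L, 4096L)));

    System.out.println(conf.get("spark.master"));
    System.out.println(stats.executorRunTimeMs + " ms, "
        + stats.bytesRead + " bytes");
  }
}
```

With messages shaped like this, only the RemoteDriver side would need to
convert the key/value map into an actual Spark conf, which keeps the Spark
dependency in the process that already has the Spark jars.
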
I think this would be cleaner from a dependency point of view, and also less
error-prone when users have to assemble the classpath for their HS2 processes.
(E.g. due to a change between Spark 2.2 and 2.4 we also had to include
spark-unsafe*.jar, even though that is a change internal to Spark.)