-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/
-----------------------------------------------------------
(Updated Jan. 22, 2015, 9:23 a.m.)


Review request for hive and Xuefu Zhang.


Changes
-------

The Spark driver may need to load extra added classes in two places: first, while executing GetJobStatusJob, it needs to deserialize SparkWork; second, while HiveInputFormat gets splits, it needs to deserialize MapWork. The RemoteDriver executes AddJarJob directly in the netty RPC thread, since it is a SyncJobRequest, and executes GetJobStatusJob (which wraps the Spark job) with its thread pool. HiveInputFormat.getSplits may run in the akka thread pool, because Spark sends messages through akka between the SparkContext and the DAGScheduler. So we need to reset the classloaders of these two threads to enable dynamic add jar in the remote Spark context (RSC). A rough sketch of that classloader reset is appended after the sign-off below.


Bugs: HIVE-9410
    https://issues.apache.org/jira/browse/HIVE-9410


Repository: hive-git


Description
-------

The RemoteDriver does not contain added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds it to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d7cb111 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7 
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec 
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65 
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION 

Diff: https://reviews.apache.org/r/30107/diff/


Testing
-------


Thanks,

chengxiang li
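
P.S. For reviewers reading this without the diff open: the heart of the change is to rebuild the context classloader of the affected driver-side threads so that jars shipped via ADD JAR are visible before SparkWork/MapWork deserialization. The snippet below is only a minimal sketch of that idea; AddedJarClassLoaderUtil, addToClassPath, and the jar URL list are assumed names for illustration, not the actual code in SparkClientUtilities.java or the other files in the diff.

import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical helper, for illustration only; the real logic lives in the patch files.
public final class AddedJarClassLoaderUtil {

  private AddedJarClassLoaderUtil() {}

  // Wraps the current thread's context classloader in a URLClassLoader that also
  // contains the added jar URLs, then resets the thread's context classloader to it,
  // so classes from ADD JAR can be resolved while deserializing SparkWork/MapWork.
  public static void addToClassPath(List<URL> jarUrls) {
    ClassLoader parent = Thread.currentThread().getContextClassLoader();
    URLClassLoader newLoader = new URLClassLoader(jarUrls.toArray(new URL[0]), parent);
    Thread.currentThread().setContextClassLoader(newLoader);
  }
}

As described in the Changes section above, this reset would have to happen on both the job thread pool thread (GetJobStatusJob) and the akka thread that ends up in HiveInputFormat.getSplits; how the patch wires that up is visible in the diff.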