Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/ ---

(Updated Jan. 22, 2015, 9:23 a.m.)

Review request for hive and Xuefu Zhang.

Changes
---
The Spark driver may need to load classes from added jars in two places. First, when it executes GetJobStatusJob, it needs to deserialize SparkWork. Second, when HiveInputFormat gets splits, it needs to deserialize MapWork. The RemoteDriver executes AddJarJob directly in the netty RPC thread, because it is a SyncJobRequest, and executes GetJobStatusJob (which wraps the Spark job) in its thread pool. HiveInputFormat split computation may happen in the akka thread pool, because Spark sends messages between SparkContext and DAGScheduler through akka. So we may need to reset the classloader of two threads to enable dynamic add jar in the RSC.

Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410

Repository: hive-git

Description
---
The RemoteDriver does not contain added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds it to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d7cb111
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/

Testing
---

Thanks,
chengxiang li
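The change summary above relies on resetting a thread's context classloader so that it can see added jars. A minimal sketch of that technique follows, assuming a hypothetical helper class `ContextClassLoaderJars` and a made-up jar path; it is not the SparkClientUtilities code from the patch, only an illustration of wrapping the current loader in a URLClassLoader.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not the class added by the patch.
final class ContextClassLoaderJars {
    private ContextClassLoaderJars() {}

    /**
     * Wraps the calling thread's context classloader in a URLClassLoader that
     * can also see the given jar paths, installs it on the calling thread,
     * and returns it so the caller can inspect it.
     */
    static URLClassLoader addToClassPath(List<String> jarPaths) {
        List<URL> urls = new ArrayList<>();
        for (String path : jarPaths) {
            try {
                urls.add(Paths.get(path).toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + path, e);
            }
        }
        Thread current = Thread.currentThread();
        // The old loader becomes the parent, so existing classes stay visible.
        ClassLoader parent = current.getContextClassLoader();
        URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]), parent);
        current.setContextClassLoader(loader);
        return loader;
    }

    public static void main(String[] args) {
        // "/tmp/my-udf.jar" is a made-up path; the file does not need to exist
        // for the loader to be created.
        URLClassLoader loader = addToClassPath(Arrays.asList("/tmp/my-udf.jar"));
        System.out.println("Jar URLs visible to this thread: " + Arrays.toString(loader.getURLs()));
    }
}
```

The new loader only affects the thread it is installed on, which is why the summary talks about two separate threads needing the same treatment.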
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69329 ---

I'm wondering what's the story for Hive CLI. Hive CLI can add jars from the local file system. Would this work for Hive on Spark?

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114004
Callers of getBaseWork() will add the jars to the classpath. Why is this necessary? Who are the callers? Any side effect?

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
https://reviews.apache.org/r/30107/#comment114005
So, this is the code that adds the jars to the classpath of the remote driver? I'm wondering why these jars are necessary in order to deserialize SparkWork.

- Xuefu Zhang
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
    I'm wondering what's the story for Hive CLI. Hive CLI can add jars from the local file system. Would this work for Hive on Spark?

The Hive CLI adds jars to its classpath dynamically, the same way this patch does for the RemoteDriver: it updates the thread context classloader to include the added jar paths. For Hive on Spark, the Hive CLI behaves the same. The issue is that the RemoteDriver does not add these jars to its classpath, so a ClassNotFoundException occurs when the RemoteDriver side needs a class from them.

On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
    ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 367
    https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line367
    Callers of getBaseWork() will add the jars to the classpath. Why is this necessary? Who are the callers? Any side effect?

We need this because getBaseWork() generates the MapWork/ReduceWork, which contains Hive operators, and a UDTFOperator that references a class from an added jar needs that class to be loaded. To load added jars dynamically, we need to reset the thread context classloader. As mentioned in the previous change summary, unlike the Hive CLI, two threads on the RemoteDriver side may need to load added jars, and for the akka thread there is no proper cut-in point for adding jars to the classpath. The side effect is that many Hive CLI threads may have to check and update their classloader unnecessarily. Another possible solution is to update the system classloader of the RemoteDriver dynamically, which must be done in a quite hacky way, such as:

    // needs java.io.IOException, java.lang.reflect.Method, java.net.URL, java.net.URLClassLoader
    URLClassLoader sysloader = (URLClassLoader) ClassLoader.getSystemClassLoader();
    Class<URLClassLoader> sysclass = URLClassLoader.class;
    try {
      // addURL is protected, so it has to be opened up through reflection
      Method method = sysclass.getDeclaredMethod("addURL", URL.class);
      method.setAccessible(true);
      method.invoke(sysloader, new Object[] {u}); // u is the URL of the added jar
    } catch (Throwable t) {
      t.printStackTrace();
      throw new IOException("Error, could not add URL to system classloader");
    }

Which one do you prefer?

On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
    ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java, line 220
    https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220
    So, this is the code that adds the jars to the classpath of the remote driver? I'm wondering why these jars are necessary in order to deserialize SparkWork.

Same as the previous comment: SparkWork contains MapWork/ReduceWork, which contains the operator tree, and UTFFOperator needs to load a class from the added jar.

- chengxiang

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69329 ---
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69336 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114014
#3 This would be executed in the akka thread: it gets the extra jar paths from JobConf and adds them to the current thread's classloader.

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
https://reviews.apache.org/r/30107/#comment114013
#2 This job is executed in the RemoteDriver thread pool: it gets the extra jar paths from JobContext, adds them to the current thread's classloader, and sets them in JobConf.

spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java
https://reviews.apache.org/r/30107/#comment114012
#1 This adds the extra jar paths to JobContext; this job is executed in the netty connection thread.

- chengxiang li
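The three numbered comments in this message describe how the jar paths travel from the netty thread to the job thread and on to the akka thread. The hand-off can be sketched as follows, assuming stand-ins throughout: `FakeJobContext` replaces the real JobContext, a plain `Map` replaces JobConf, and the key `example.added.jars` is invented for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration of the three-step hand-off; all names here are hypothetical.
final class AddedJarFlowSketch {
    static final String ADDED_JARS_KEY = "example.added.jars"; // invented key

    // Stand-in for JobContext: remembers jar paths registered by add-jar requests.
    static final class FakeJobContext {
        private final List<String> addedJars = new CopyOnWriteArrayList<>();
        List<String> getAddedJars() { return addedJars; }
    }

    // #1 (netty connection thread): only record the jar path.
    static void onAddJar(FakeJobContext jc, String jarPath) {
        jc.getAddedJars().add(jarPath);
    }

    // #2 (driver thread pool): extend this thread's loader, then publish the
    // paths in the job configuration so later threads can find them.
    static void beforeJob(FakeJobContext jc, Map<String, String> jobConf) {
        List<String> jars = jc.getAddedJars();
        if (jars.isEmpty()) return;
        extendContextClassLoader(jars);
        jobConf.put(ADDED_JARS_KEY, String.join(",", jars));
    }

    // #3 (akka thread): read the paths back from the configuration and extend
    // this thread's loader before any plan is deserialized.
    static void beforeGetBaseWork(Map<String, String> jobConf) {
        String value = jobConf.get(ADDED_JARS_KEY);
        if (value == null || value.isEmpty()) return;
        extendContextClassLoader(Arrays.asList(value.split(",")));
    }

    // Wraps the current thread's loader; a real implementation would likely
    // avoid stacking a new loader on every call.
    static void extendContextClassLoader(List<String> jars) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = Paths.get(jars.get(i)).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        Thread t = Thread.currentThread();
        t.setContextClassLoader(new URLClassLoader(urls, t.getContextClassLoader()));
    }

    public static void main(String[] args) throws InterruptedException {
        FakeJobContext jc = new FakeJobContext();
        Map<String, String> jobConf = new ConcurrentHashMap<>();
        onAddJar(jc, "/tmp/my-udf.jar"); // made-up path
        Thread job = new Thread(() -> beforeJob(jc, jobConf));
        job.start();
        job.join();
        Thread akka = new Thread(() -> {
            beforeGetBaseWork(jobConf);
            System.out.println(Thread.currentThread().getContextClassLoader());
        });
        akka.start();
        akka.join();
    }
}
```

Passing the paths through the configuration is what lets step #3 work without any direct reference to the JobContext from the akka thread.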
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
    ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java, line 220
    https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220
    So, this is the code that adds the jars to the classpath of the remote driver? I'm wondering why these jars are necessary in order to deserialize SparkWork.

chengxiang li wrote:
    Same as the previous comment: SparkWork contains MapWork/ReduceWork, which contains the operator tree, and UTFFOperator needs to load a class from the added jar.

Xuefu Zhang wrote:
    Sorry, but which operator? UTFFOperator? I couldn't find it in the Hive source.

Sorry, that was a typo for UDTFOperator. As you can see from the error log in the JIRA, the extra class from the added jar is contained in UDTFOperator:

    org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
    Serialization trace:
    genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
    conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
    childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
    childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
    childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
    childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)

- chengxiang

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69329 ---
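The "Unable to find class" error in this message comes down to a by-name class lookup that fails on the deserializing thread. A small sketch of that lookup follows, under the assumption that the deserializer resolves names through the thread context classloader; `ClassLookupSketch` is a hypothetical name, and the UDF class name is the one from the error log.

```java
// Illustration only: mimics a by-name class lookup against the context classloader.
final class ClassLookupSketch {
    private ClassLookupSketch() {}

    /** Returns true if the calling thread's context classloader can find the class. */
    static boolean canResolve(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            // 'false' means the class is located but not initialized.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canResolve("java.lang.String"));
        // Expected to be unresolvable unless the BigBench UDF jar is on the classpath.
        System.out.println(canResolve("de.bankmark.bigbench.queries.q10.SentimentUDF"));
    }
}
```

Once the jar holding the UDF is added to the loader of the thread that deserializes the plan, the same lookup succeeds.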
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
    ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java, line 220
    https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220
    So, this is the code that adds the jars to the classpath of the remote driver? I'm wondering why these jars are necessary in order to deserialize SparkWork.

chengxiang li wrote:
    Same as the previous comment: SparkWork contains MapWork/ReduceWork, which contains the operator tree, and UTFFOperator needs to load a class from the added jar.

Sorry, but which operator? UTFFOperator? I couldn't find it in the Hive source.

- Xuefu

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69329 ---
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
On Jan. 23, 2015, 3:02 a.m., chengxiang li wrote:
    ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 371
    https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line371
    #3 This would be executed in the akka thread: it gets the extra jar paths from JobConf and adds them to the current thread's classloader.

Which thread is referred to as the akka thread?

- Xuefu

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69336 ---
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
On Jan. 23, 2015, 3:02 a.m., chengxiang li wrote:
    ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 371
    https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line371
    #3 This would be executed in the akka thread: it gets the extra jar paths from JobConf and adds them to the current thread's classloader.

Xuefu Zhang wrote:
    Which thread is referred to as the akka thread?

Inside the Spark driver, SparkContext submits the Spark job to DAGScheduler through an akka message instead of invoking it directly, and akka holds a thread pool to handle messages.

- chengxiang

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69336 ---
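The reply above explains why a separate pool thread is involved. The sketch below shows why that matters for classloading: a worker thread that already exists keeps the context classloader it inherited when it was created, so a loader installed later on the submitting thread is not visible to it. A plain `ExecutorService` stands in for the akka dispatcher here, and `PoolThreadClassLoaderDemo` is a hypothetical name.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: a plain executor stands in for the akka thread pool.
final class PoolThreadClassLoaderDemo {
    private PoolThreadClassLoaderDemo() {}

    // Runs a task on the pool and waits for its result.
    static <T> T runOn(ExecutorService pool, Callable<T> task) {
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** True if an existing worker sees a loader the caller installed afterwards. */
    static boolean workerSeesCallerLoader() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Thread caller = Thread.currentThread();
        ClassLoader original = caller.getContextClassLoader();
        try {
            // Warm-up task: the worker is created now and inherits 'original'.
            runOn(pool, () -> Thread.currentThread().getContextClassLoader());
            ClassLoader extended = new URLClassLoader(new URL[0], original);
            caller.setContextClassLoader(extended);
            ClassLoader seen = runOn(pool, () -> Thread.currentThread().getContextClassLoader());
            return seen == extended;
        } finally {
            caller.setContextClassLoader(original);
            pool.shutdown();
        }
    }

    /** True if a loader installed inside the task itself is visible to that task. */
    static boolean workerSeesLoaderSetInTask() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            ClassLoader base = Thread.currentThread().getContextClassLoader();
            ClassLoader extended = new URLClassLoader(new URL[0], base);
            Boolean result = runOn(pool, () -> {
                Thread.currentThread().setContextClassLoader(extended);
                return Thread.currentThread().getContextClassLoader() == extended;
            });
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker sees caller's new loader: " + workerSeesCallerLoader());
        System.out.println("worker sees loader set in task:  " + workerSeesLoaderSetInTask());
    }
}
```

This is the reason the patch hooks into code that runs on the pool thread itself rather than relying on the loader of the thread that submitted the job.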
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69341 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114022
Should we also check addedJars.isEmpty(), to be consistent with other places?

- Xuefu Zhang
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69344 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/30107/#comment114027
Could we add a check, something like:

    if (hive.execution.engine==spark) { try { ... } }

The code as it is might make other people frown.

- Xuefu Zhang
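The two review suggestions on Utilities.java (skip the work when the added-jar list is empty, and only do it when the execution engine is Spark) can be combined in one guard. A minimal sketch follows, assuming a plain `Map` in place of the Hive configuration and an invented key `example.added.jars`; only the property name `hive.execution.engine` comes from the review comment.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustration of the suggested guard; the class and jar key are hypothetical.
final class EngineGuardSketch {
    static final String ENGINE_KEY = "hive.execution.engine"; // named in the review comment
    static final String ADDED_JARS_KEY = "example.added.jars"; // invented key

    private EngineGuardSketch() {}

    /**
     * Returns the jar paths that should be added to the classloader, or an
     * empty list when the engine is not Spark or no jars were added.
     */
    static List<String> jarsToLoad(Map<String, String> conf) {
        if (!"spark".equals(conf.get(ENGINE_KEY))) {
            return Collections.emptyList(); // other engines are left untouched
        }
        String value = conf.get(ADDED_JARS_KEY);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList(); // nothing was added, skip the work
        }
        return Arrays.asList(value.split(","));
    }
}
```

With this shape, callers on other engines pay only for one configuration lookup, which addresses the side effect raised earlier in the thread.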
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/#review69357 ---

Ship it!

Ship It!

- Xuefu Zhang
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/ ---

(Updated Jan. 23, 2015, 6:44 a.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410

Repository: hive-git

Description
---
The RemoteDriver does not contain added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds it to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.

Diffs (updated)
---
  itests/src/test/resources/testconfiguration.properties 6340d1c
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9d9f4e6
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java a4a166a
  ql/src/test/queries/clientpositive/lateral_view_explode2.q PRE-CREATION
  ql/src/test/results/clientpositive/lateral_view_explode2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out PRE-CREATION
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/

Testing
---

Thanks,
chengxiang li
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/ ---

(Updated Jan. 23, 2015, 6:37 a.m.)

Review request for hive and Xuefu Zhang.

Changes
---
Added more comments and fixed what Xuefu mentioned before.

Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410

Repository: hive-git

Description
---
The RemoteDriver does not contain added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds it to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.

Diffs (updated)
---
  itests/src/test/resources/testconfiguration.properties 6340d1c
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9d9f4e6
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java a4a166a
  ql/src/test/queries/clientpositive/lateral_view_explode2.q PRE-CREATION
  ql/src/test/results/clientpositive/lateral_view_explode2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out PRE-CREATION
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/

Testing
---

Thanks,
chengxiang li
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/ ---

(Updated Jan. 22, 2015, 3:53 a.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410

Repository: hive-git

Description
---
The RemoteDriver does not contain added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds it to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/

Testing
---

Thanks,
chengxiang li
Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30107/ ---

(Updated Jan. 22, 2015, 3:54 a.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-9410
https://issues.apache.org/jira/browse/HIVE-9410

Repository: hive-git

Description
---
The RemoteDriver does not contain added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds it to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7
  spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec
  spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION

Diff: https://reviews.apache.org/r/30107/diff/

Testing
---

Thanks,
chengxiang li