[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082049#comment-15082049 ] Jason Dere commented on HIVE-11878: --- Committed to branch-1 > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.4.patch.branch-1, HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051359#comment-15051359 ] Jason Dere commented on HIVE-11878: --- Test failiures are not related > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051361#comment-15051361 ] Ashutosh Chauhan commented on HIVE-11878: - Since this has been seen at other sites also, it will be good to land this in 2.0 branch as well. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050350#comment-15050350 ] Hive QA commented on HIVE-11878: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776512/HIVE-11878.4.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9874 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6299/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6299/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6299/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776512 - PreCommit-HIVE-TRUNK-Build > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049815#comment-15049815 ] Jason Dere commented on HIVE-11878: --- +1 if the test look good > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048290#comment-15048290 ] Ratandeep Ratti commented on HIVE-11878: [~jdere] . I've fixed both the failing tests. Please have a look. Thanks > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034590#comment-15034590 ] Jason Dere commented on HIVE-11878: --- I think this is happening because the changes in conf/ivysettings.xml assume the local maven repository is in file:${user.home}/.m2/repository. I set my MAVEN_OPTS to specify a non-default directory using -Dmaven.repo.local and also hit this error. In the pom file, we use the maven.repo.local setting, would there be a way to specify this in the ivy settings? {noformat} ${maven.repo.local} {noformat} > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.patch, > HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033460#comment-15033460 ] Ratandeep Ratti commented on HIVE-11878: Any thoughts on what could be happening with {{udf_classloader_dynamic_dependency_resolution}} ? Seems like it is failing when registering the jar {code} running ADD JAR ivy://org.apache.hive.hive-it-custom-udfs:udf-classloader-udf1:+ {code} > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.patch, > HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033441#comment-15033441 ] Ratandeep Ratti commented on HIVE-11878: [~jdere] I'm unable to reproduce {{org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_classloader_dynamic_dependency_resolution}} I'll check why {{org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters.org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters}} is failing. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.patch, > HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033426#comment-15033426 ] Jason Dere commented on HIVE-11878: --- [~rdsr] would you be able to look at these failures? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.patch, > HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033412#comment-15033412 ] Jason Dere commented on HIVE-11878: --- The following test failures look like real errors and need to be fixed: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_classloader_dynamic_dependency_resolution org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters.org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters TestHiveAuthorizerShowFilters is failing with the following error in hive-unit/target/tmp/log/hive.log: {noformat} 2015-12-01T01:05:03,174 ERROR [main]: ql.Driver (SessionState.java:printError(1010)) - FAILED: MockitoException Mockito cannot mock this class: class org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters$MockedHiveAuthorizerFactory$1AuthorizerWithFilterCmdImpl Mockito can only mock visible & non-final classes. If you're not sure why you're getting this error, please report to the mailing list. org.mockito.exceptions.base.MockitoException: Mockito cannot mock this class: class org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters$MockedHiveAuthorizerFactory$1AuthorizerWithFilterCmdImpl Mockito can only mock visible & non-final classes. If you're not sure why you're getting this error, please report to the mailing list. at org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters$MockedHiveAuthorizerFactory.createHiveAuthorizer(TestHiveAuthorizerShowFilters.java:98) at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:773) at org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1436) at org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:1034) at org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:179) at org.apache.hadoop.hive.ql.metadata.Table.(Table.java:121) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.addDbAndTabToOutputs(SemanticAnalyzer.java:11016) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10871) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:9960) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10060) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:222) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:462) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1227) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1276) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1152) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1140) at org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters.runCmd(TestHiveAuthorizerShowFilters.java:265) at org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters.beforeTest(TestHiveAuthorizerShowFilters.java:127) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: org.mockito.cglib.core.CodeGenerationException:
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031621#comment-15031621 ] Hive QA commented on HIVE-11878: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12774703/HIVE-11878.3.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9826 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestMiniLlapCliDriver - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_classloader_dynamic_dependency_resolution org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters.org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6169/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6169/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6169/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12774703 - PreCommit-HIVE-TRUNK-Build > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.patch, > HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see t
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030036#comment-15030036 ] Hive QA commented on HIVE-11878: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12774196/HIVE-11878.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6145/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6145/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6145/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6145/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 7984738 HIVE-12465: Hive might produce wrong results when (outer) joins are merged (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 7984738 HIVE-12465: Hive might produce wrong results when (outer) joins are merged (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12774196 - PreCommit-HIVE-TRUNK-Build > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See deta
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028192#comment-15028192 ] Ratandeep Ratti commented on HIVE-11878: That's correct [~jdere] > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025688#comment-15025688 ] Jason Dere commented on HIVE-11878: --- So removing JARs from the session will still require closing the existing classloader and creating a new one (with the specified JARs omitted from the list of URIs), correct? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010274#comment-15010274 ] Ratandeep Ratti commented on HIVE-11878: Incorporated [~jdere] comments. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997102#comment-14997102 ] Jason Dere commented on HIVE-11878: --- Hi [~rdsr], I think that sounds good. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995540#comment-14995540 ] Ratandeep Ratti commented on HIVE-11878: Hi [~jdere] Thanks for the comments. We can change the parent classloader when creating the {{UDFClassLoader}} to SessionState.class.getClassLoader() which will be the system classloader. What do you think? About the second point. I think you are right. I was mistaken. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978962#comment-14978962 ] Jason Dere commented on HIVE-11878: --- Apologies for not getting back to you on this before now. As to your comments: {quote} I have some doubts about using the thread context classloader as the parent. This does not seem to provide clean isolation between jars/resources between different sessions. Case in point: a thread context classloader could be a previous session's classloader .This can happen when the same thread was used to work on a previous session, and is now being used to work on the newer current session. The thread context classloaer could contain a different implementation of the same class also present in the session classloader. Do you see this a a problem? {quote} I think you are right, that the new session's classloader would be polluted with the loaded JARs from the thread's previous SessionState. What would be a better parent class loader to use here - the system class loader? Will it have any JARs added from the AUX_JARS options? Checking some of the various AUX_JARS related options, not sure if I missed any others: - HIVE_AUX_JARS_PATH environment variable: This gets added to the CLASSPATH, so the System class loader will have these - hive.aux.jars: As far as I can tell this is used when shipping JARs to the Map/Reduce tasks, so this does not seem to get used by the SessionState. Someone correct me if I am wrong. - hive.reloadable.aux.jars.path: These jars are added at the start of a HiveServer2 session, so it may not matter that the System class loader does not have these jars. {quote} Another potential problem I'm thinking about – which is present in the proposed approach (see RB) is – in HiveServer2 any worker thread can serve any request by mapping it to a persistent session. Couldn't this lead to a situation where for a specific session the session specific classloader (conf.getClassLoader()) and the thread context classloader end up being different? Say we have two worker thread t1 and t2 .The very first query is handled by t1 where a fresh session s1 is created along with a fresh classloader c1, which is set as the session specific classloader and the thread context classloader. The next query for the same session is handled by t2. I guess since it is the same session s1, we do not create a fresh classloader. The session specific classloader is c1, but since it is a different thread and no classloader has been set on it, the thread will have the system classloader as its context classloader. Couldn't this cause potential CNF exceptions? If I understood correctly this problem also exists in the current implementation, doesn't it? {quote} I'm not sure if I understand the problem here - I believe that if a new thread handles an existing session, the thread calls Session.setCurrentSessionState() which should set the thread's context class loader to the SessionState's class loader. Or did I not understand the issue? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextCl
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940705#comment-14940705 ] Ratandeep Ratti commented on HIVE-11878: s/above to/above two/ > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940701#comment-14940701 ] Ratandeep Ratti commented on HIVE-11878: Also note: The above to problems, I think, should also exist in Hive currently. Am I missing something here? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940183#comment-14940183 ] Ratandeep Ratti commented on HIVE-11878: Hi [~jdere] I got some time to look into this today. I incorporated your suggestion where I create a fresh classloader when a new session is created. I use, as parent, the thread context classloader for the freshly created session classloader (See RB: https://reviews.apache.org/r/38663/) . I have some doubts about using the thread context classloader as the parent. This does not seem to provide clean isolation between jars/resources between different sessions. Case in point: a thread context classloader could be a previous session's classloader .This can happen when the same thread was used to work on a previous session, and is now being used to work on the newer current session. The thread context classloaer could contain a different implementation of the same class also present in the session classloader. Do you see this a a problem? Another potential problem I'm thinking about -- which is present in the proposed approach (see RB) is -- in HiveServer2 any worker thread can serve any request by mapping it to a persistent session. Couldn't this lead to a situation where for a specific session the session specific classloader (conf.getClassLoader()) and the thread context classloader end up being different? Say we have two worker thread t1 and t2 .The very first query is handled by t1 where a fresh session s1 is created along with a fresh classloader c1, which is set as the session specific classloader and the thread context classloader. The next query for the same session is handled by t2. I guess since it is the same session s1, we do not create a fresh classloader. The session specific classloader is c1, but since it is a different thread and no classloader has been set on it, the thread will have the system classloader as its context classloader. Couldn't this cause potential CNF exceptions? If I understood correctly this problem also exists in the current implementation, doesn't it? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906613#comment-14906613 ] Ratandeep Ratti commented on HIVE-11878: Hi [~cwsteinbach] . Thanks for the comments, I'll update the patch with your comments. [~jdere] That's a very good point. I'll incorporate your suggestions > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905569#comment-14905569 ] Jason Dere commented on HIVE-11878: --- For HiveServer2 there may potentially be multiple Sessions from different users, each of which may be issuing their own ADD/REMOVE JAR commands. Since this patch is now re-using/updating the class loader rather than creating a new one when adding JARs, it would be good to make sure that the class loader used by a one session does not get affected by ADD/REMOVE JAR commands happening from other/older sessions. Can you make sure that when a new session is started, that a new class loader is created for that session, with an appropriate parent class loader (is it the context class loader?)? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905124#comment-14905124 ] Carl Steinbach commented on HIVE-11878: --- Hi [~rdsr], I left some comments on RB related to the testing approach. Everything else looks good. Thanks. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904005#comment-14904005 ] Ratandeep Ratti commented on HIVE-11878: Hi Folks, I've implemented Approach 3 outlined above. Some points I find in favour of this approach is . * We skip creating needless classloaders * This is very close to the old implementation. Think of it like we are registering all the jars at once (See: https://issues.apache.org/jira/browse/HIVE-3907) > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903992#comment-14903992 ] Ratandeep Ratti commented on HIVE-11878: [~jdere], [~ashutoshc], I'd love to hear your thoughts on this. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903990#comment-14903990 ] Ratandeep Ratti commented on HIVE-11878: RB for approach 3: https://reviews.apache.org/r/38663/ > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876858#comment-14876858 ] Ratandeep Ratti commented on HIVE-11878: bq. if we had previously loaded a class with the previous classloader, and now load the class again with the current classloader, would there be any potential effects here? The two class objects will definitely be different. I'll try to look if we compare class-objects in the code. Some effects that come to mind are 1. o instanceof c . If c is loaded by a classloader u1 and o is also an object of c, but the object's class was loaded by another classloader u2. 2. casting may not work. (similar reasoning as above) [~jdere], [~ashutoshc] . I'd also like to get your opinion on approach 3, mentioned above, which is we do not create new classloaders for every jar, but add jars to the same classloader using the {{addURL}} method. We basically extend the URLClassLoader and change scope of the method addURL from protected to public. This can side step the potential problems that we are discussing here. As for deleting jars in {{org.apache.hadoop.hive.ql.exec.Utilities#removeFromClassPath}}, it can be exactly as before, except that it will not create an instance of URLClassloader but a subclass of it (with scope of addURL changed) and set that as the currentThreadContext classloader and the Hadoop Configuration classloader. One way to think about approach 3 is that it is exactly like what is currently being done, except that we register all the jars at once. I haven't implemented approach 3 yet, wanted to get some opinion on it before I proceeded further. > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876793#comment-14876793 ] Jason Dere commented on HIVE-11878: --- Thanks for looking into this. Would there have been any good reason why Hive was originally keeping the old classloader around when resolving classes? One thing I can think of is that since it checks its parent classloader, when it loads a class it will end up using the oldest version of the classloader that is able to load this class. So with this patch, if we had previously loaded a class with the previous classloader, and now load the class again with the current classloader, would there be any potential effects here? The 2 Class objects would not be considered the same, do we ever compare Class objects? Are there any other behavior differences between classes/objects from different class loaders? Does this only apply for classes from JARs added using ADD JAR, since approach 1 still has the System class loader as its parent classloader? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876084#comment-14876084 ] Ashutosh Chauhan commented on HIVE-11878: - Thanks, [~rdsr] for detailed report and digging into this. I can repro this even with just create function (no need of select) if I add following in TestClUDF1.java {code} public TestClUDF1 () { TestClClassA testClClassA = new TestClClassA(); } {code} This generates following stack trace (which is different then above), but essentially same root cause : {code} java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:144) at org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:105) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerTemporaryUDF(FunctionRegistry.java:1536) at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:166) at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:72) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1747) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1506) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1263) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1079) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1069) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1033) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1007) at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:146) at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_test_classloader(TestCliDriver.java:130) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 39 more Caused by: java.lang.NoClassDefFoundError: TestClClassA at TestClUDF1.(TestClUDF1.java:29) ... 44 more Caused by: java.lang.ClassNotFoundException: TestClClassA at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876061#comment-14876061 ] Ratandeep Ratti commented on HIVE-11878: [~ashutoshc] Note that {{TestClClassA}} class is present in {{utility.jar}} which is registered after registering the udf jar as shown in the qfile {{test_classloader.q}} > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876037#comment-14876037 ] Ratandeep Ratti commented on HIVE-11878: In this specific scenario, the exception occurs in the select statement, when the initialize method of the UDF is called which then tries to create an object of the class {{TestClClassA}} > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876034#comment-14876034 ] Ratandeep Ratti commented on HIVE-11878: Hi [~ashutoshc] Here's the complete stacktrace (You can generate this by applying patch HIVE-11878_qtest.patch ) {noformat} Tests run: 2, Failures: 1, Errors: 0, Skipped: 0 at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:882) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:617) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:252) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10137) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:212) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1150) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1203) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1079) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1069) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1033) at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1007) at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:146) at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_test_classloader(TestCliDriver.java:130) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: java.lang.ClassNotFoundException: TestClClassA at java.net.URLClassLoader$1.run(URLClassLoader.java:372) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:360) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 60 more {noformat} > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader wh
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875993#comment-14875993 ] Ashutosh Chauhan commented on HIVE-11878: - Also, this happens when doing create function or in select query which uses that created function ? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875985#comment-14875985 ] Ashutosh Chauhan commented on HIVE-11878: - [~rdsr] Can you post the stack trace you get when bug happens ? > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console. Hive creates a fresh URL > classloader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classlaoder is the > current ThreadContextClassLoader. Once the URLClassloader is created Hive > sets that as the current ThreadContextClassloader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. > We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, > the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next, the new URLClassLoader > created will be *u2* with *u1* as parent and *u2* becomes the new > ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* > whereas *u1* only has paths to *j1* (For details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* under a temporary function in Hive, we load > the class using {code} class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code} . The > currentThreadContext class-loader is *u2*, and it has the path to the class > *c1*, but note that Class-loaders work by delegating to parent class-loader > first. In this case class *c1* will be found and *defined* by class-loader > *u1*. > Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say > initialize) is called in *c1*, which references the class *c2*, *c2* will not > be found since the class-loader used to search for *c2* will be *u1* (Since > the caller's class-loader is used to load a class) > I've added a qtest to explain the problem. Please see the attached patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)