[jira] [Comment Edited] (TEZ-4557) Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar
[ https://issues.apache.org/jira/browse/TEZ-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859483#comment-17859483 ]

László Bodor edited comment on TEZ-4557 at 6/23/24 4:58 PM:

I was thinking about this, and I feel that we might want to remove this exclusion, i.e. add the httpclient transitive dependency back.

I remember that back when TEZ-4303 was merged, we tended to follow a pattern of removing every transitive dependency from Tez that caused CVE scan issues, because we, from the Tez side, just wanted to get rid of them. At the same time, we violated a more important rule: to assemble and deploy a standalone, OOTB tez.tar.gz that works in most cases. Here, the stacktrace clearly shows that Hadoop's KMSClientProvider needs this library, so excluding it is a hack and doesn't improve the stack (Hadoop will have the same "bad" dependency).

Today I would solve CVE issues like this:
1. If something is required by Hadoop, let's try not to mess with it.
2. If a transitive dependency causes a CVE warning, let's push the solution to the Hadoop folks and upgrade the Hadoop dependency when it's available -> this way the whole Hadoop stack will benefit from our efforts.

I'm open to any objections here: [~ayushtkn], [~jeagles]

was (Author: abstractdog):

I was thinking about this, and I feel that we might want to remove this exclusion, i.e. add the httpclient transitive dependency back.

I remember that back when TEZ-4303 was merged, we tended to follow a pattern of removing every transitive dependency from Tez that caused CVE scan issues, because we, from the Tez side, just wanted to get rid of them. At the same time, we violated a more important rule: to have a standalone, OOTB tez.tar.gz that works in most cases. Here, the stacktrace clearly shows that Hadoop's KMSClientProvider needs this library, so excluding it is a hack again and doesn't improve the stack (Hadoop will have the same "bad" dependency).

Today I would solve CVE issues like this:
1. If something is required by Hadoop, let's try not to mess with it.
2. If a transitive dependency causes a CVE warning, let's push the solution to the Hadoop folks and upgrade the Hadoop dependency when it's available -> this way the whole Hadoop stack will benefit from our efforts.

I'm open to any objections here: [~ayushtkn], [~jeagles]

> Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar
> ---
>
>                 Key: TEZ-4557
>                 URL: https://issues.apache.org/jira/browse/TEZ-4557
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Inserting data into a table located in an encryption zone using Hive with Tez
> fails because the httpclient jar has been excluded from the Hadoop transitive
> dependencies. The same query passes with MR.
>
> Tez: 0.10.2, 0.10.3
> Hadoop: 3.3.6
> Hive: 3.1.2
>
> Steps to reproduce the issue:
> 1. Create an encryption key using the Ranger keyadmin user.
> 2. hdfs crypto -createZone -keyName test_key -path /user/raghav/encrypt_zone
> 3. create table tbl(id int) location '/user/raghav/encrypt_zone';
> 4. insert into tbl values(1);
>
> Stacktrace:
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: org/apache/http/client/utils/URIBuilder
>     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createURL(KMSClientProvider.java:468)
>     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:823)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:354)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:350)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:175)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:350)
>     at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:535)
>     at org.apache.hadoop.hdfs.HdfsKMSUtil.decryptEncryptedDataEncryptionKey(HdfsKMSUtil.java:216)
>     at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1002)
>     at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:983)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.safelyCreateWrappedOutputStream(DistributedFileSystem.java:734)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$300(DistributedFileSystem.java:149)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(Distribute
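
The missing class named in the stacktrace above can be checked for directly. The following is a minimal sketch, not part of the original report: the class name HttpClientClasspathCheck and the helper isClassPresent are hypothetical, and it only assumes that it is run with the same classpath the failing Tez task would see. It tries to locate org.apache.http.client.utils.URIBuilder without initializing it and reports whether the lookup succeeded.

```java
public final class HttpClientClasspathCheck {

    // Class named in the NoClassDefFoundError from the stacktrace.
    static final String URI_BUILDER = "org.apache.http.client.utils.URIBuilder";

    /**
     * Returns true if the named class can be located by this class's
     * class loader. The class is not initialized (second argument is false).
     */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false,
                    HttpClientClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but not loadable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Optional first argument: a different class name to probe.
        String name = args.length > 0 ? args[0] : URI_BUILDER;
        System.out.println(name + (isClassPresent(name) ? " is present" : " is MISSING"));
    }
}
```

Running it with the jars that Tez ships on the classpath (for example the contents of tez/lib; the exact path depends on the deployment) should show whether the httpclient exclusion is the cause before a full Hive and Ranger setup is attempted.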
[jira] [Comment Edited] (TEZ-4557) Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar
[ https://issues.apache.org/jira/browse/TEZ-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842941#comment-17842941 ]

Raghav Aggarwal edited comment on TEZ-4557 at 5/2/24 11:38 AM:
---

[~ayushtkn], I am using Hive 3.1.2, Hadoop 3.3.6 and Tez 0.10.2. The issue should also happen in Hive 4 with Tez 0.10.3, as the httpclient jar is missing from tez/lib. I haven't tested it explicitly with those versions, as Ranger integration would be required.

was (Author: JIRAUSER295901):

I am using Hive 3.1.2, Hadoop 3.3.6 and Tez 0.10.2. The issue should also happen in Hive 4 with Tez 0.10.3, as the httpclient jar is missing from tez/lib.

> Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar
> ---
>
>                 Key: TEZ-4557
>                 URL: https://issues.apache.org/jira/browse/TEZ-4557
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>
> Inserting data into a table located in an encryption zone using Hive with Tez
> fails because the httpclient jar has been excluded from the Hadoop transitive
> dependencies. The same query passes with MR.
>
> Tez: 0.10.2, 0.10.3
> Hadoop: 3.3.6
> Hive: 3.1.2
>
> Steps to reproduce the issue:
> 1. Create an encryption key using the Ranger keyadmin user.
> 2. hdfs crypto -createZone -keyName test_key -path /user/raghav/encrypt_zone
> 3. create table tbl(id int) location '/user/raghav/encrypt_zone';
> 4. insert into tbl values(1);
>
> Stacktrace:
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: org/apache/http/client/utils/URIBuilder
>     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createURL(KMSClientProvider.java:468)
>     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:823)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:354)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:350)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:175)
>     at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:350)
>     at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:535)
>     at org.apache.hadoop.hdfs.HdfsKMSUtil.decryptEncryptedDataEncryptionKey(HdfsKMSUtil.java:216)
>     at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1002)
>     at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:983)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.safelyCreateWrappedOutputStream(DistributedFileSystem.java:734)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$300(DistributedFileSystem.java:149)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:572)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:566)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:580)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:507)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1233)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1109)
>     at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:81)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:297)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:282)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:801)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:752)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:922)
>     at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:993)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:926)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:993)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:926)
>     at org.apache.h