[jira] [Comment Edited] (TEZ-4557) Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar

2024-06-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859483#comment-17859483
 ] 

László Bodor edited comment on TEZ-4557 at 6/23/24 4:58 PM:


I was thinking about this, and I feel that we might want to remove this 
exclusion, i.e. add the httpclient transitive dependency back.

I remember that back when TEZ-4303 was merged, we tended to follow a pattern of 
removing every transitive dependency from Tez that caused CVE scan issues, 
because we, from the Tez side, just wanted to get rid of them. At the same time, 
we violated a more important rule: to assemble and deploy a standalone, OOTB 
tez.tar.gz that works in most cases. Here, the stacktrace clearly shows that 
Hadoop's KMSClientProvider needs this library, so excluding it is a hack and 
doesn't improve the stack (Hadoop will still have the same "bad" dependency).

Today I would solve CVE issues like this:

1. If something is required by Hadoop, let's try not to mess with it.
2. If a transitive dependency causes a CVE warning, let's push the fix to the 
Hadoop folks and upgrade the Hadoop dependency once it's available -> this way 
the whole Hadoop stack benefits from our efforts (see the sketch after this list).
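A minimal sketch of what option 2 could look like on the Tez side, assuming the 
usual hadoop.version property in the root pom and an illustrative version number:

{code:xml}
<!-- Sketch only: the property name and version are illustrative. The idea is to
     pick up a Hadoop release that already ships the fixed transitive dependency,
     instead of excluding it from the Tez assembly. -->
<properties>
  <hadoop.version>3.4.0</hadoop.version>
</properties>
{code}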

I'm open to any objections here: [~ayushtkn], [~jeagles]








[jira] [Comment Edited] (TEZ-4557) Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar

2024-05-02 Thread Raghav Aggarwal (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842941#comment-17842941
 ] 

Raghav Aggarwal edited comment on TEZ-4557 at 5/2/24 11:38 AM:
---

[~ayushtkn], I am using Hive 3.1.2, Hadoop 3.3.6, and Tez 0.10.2.

The issue should also happen in Hive 4 with Tez 0.10.3, as the httpclient jar is 
missing from tez/lib. I haven't tested it explicitly with those versions, as 
Ranger integration would be required.



> Revert TEZ-4303, NoClassDefFoundError because of missing httpclient jar
> ---
>
> Key: TEZ-4557
> URL: https://issues.apache.org/jira/browse/TEZ-4557
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Raghav Aggarwal
> Assignee: Raghav Aggarwal
> Priority: Major
>
> Inserting data into a table located in an encryption zone using Hive with Tez 
> fails, as the httpclient jar has been excluded from the Hadoop transitive 
> dependencies. The same query passes with MR.
> Tez: 0.10.2, 0.10.3
> Hadoop: 3.3.6
> Hive: 3.1.2
>  
> Steps to reproduce the issue:
> 1. Create an encryption key using the Ranger keyadmin user.
> 2. hdfs crypto -createZone -keyName test_key -path /user/raghav/encrypt_zone
> 3. create table tbl(id int) location '/user/raghav/encrypt_zone';
> 4. insert into tbl values(1);
>  
> Stacktrace:
> {code:java}
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/http/client/utils/URIBuilder
>     at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.createURL(KMSClientProvider.java:468)
>     at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:823)
>     at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:354)
>     at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:350)
>     at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:175)
>     at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:350)
>     at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:535)
>     at 
> org.apache.hadoop.hdfs.HdfsKMSUtil.decryptEncryptedDataEncryptionKey(HdfsKMSUtil.java:216)
>     at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1002)
>     at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:983)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.safelyCreateWrappedOutputStream(DistributedFileSystem.java:734)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$300(DistributedFileSystem.java:149)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:572)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:566)
>     at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:580)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:507)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1233)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1109)
>     at 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:81)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:297)
>     at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:282)
>     at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:801)
>     at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:752)
>     at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:922)
>     at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:993)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:926)
>     at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:993)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:926)
>     at 
> org.apache.h