[jira] [Created] (HADOOP-19195) Upgrade aws sdk v2 to 2.25.53
Harshit Gupta created HADOOP-19195:

Summary: Upgrade aws sdk v2 to 2.25.53
Key: HADOOP-19195
URL: https://issues.apache.org/jira/browse/HADOOP-19195
Project: Hadoop Common
Issue Type: Improvement
Components: fs/s3
Affects Versions: 3.5.0, 3.4.1
Reporter: Harshit Gupta
Assignee: Harshit Gupta
Fix For: 3.5.0

Upgrade aws sdk v2 to 2.25.53

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19194) Add test to find unshaded dependencies in the aws sdk
Harshit Gupta created HADOOP-19194:

Summary: Add test to find unshaded dependencies in the aws sdk
Key: HADOOP-19194
URL: https://issues.apache.org/jira/browse/HADOOP-19194
Project: Hadoop Common
Issue Type: Improvement
Components: fs/s3
Affects Versions: 3.4.0
Reporter: Harshit Gupta
Assignee: Harshit Gupta
Fix For: 3.4.1

Write a test that checks the AWS SDK for unshaded artifacts on the classpath, which might cause deployment failures.
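A minimal sketch of such a test, assuming the SDK ships as a single bundle JAR and that every class inside it should live under an allowed (shaded) package prefix. The prefix list, the class name `UnshadedClassScanner` and its methods are hypothetical illustrations, not the test Hadoop actually added.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical scanner that flags classes outside the allowed package prefixes. */
public class UnshadedClassScanner {

  // Assumption: legitimately bundled classes (and the SDK's own shaded copies) live here.
  static final List<String> ALLOWED_PREFIXES = Arrays.asList("software/amazon/");

  /** Returns every .class entry name that is not under one of the allowed prefixes. */
  static List<String> findUnshaded(Iterable<String> entryNames, List<String> allowedPrefixes) {
    List<String> unshaded = new ArrayList<>();
    for (String name : entryNames) {
      // Only compiled classes matter; skip resources, META-INF entries and module-info.
      if (!name.endsWith(".class") || name.equals("module-info.class")
          || name.startsWith("META-INF/")) {
        continue;
      }
      boolean allowed = false;
      for (String prefix : allowedPrefixes) {
        if (name.startsWith(prefix)) {
          allowed = true;
          break;
        }
      }
      if (!allowed) {
        unshaded.add(name);
      }
    }
    return unshaded;
  }

  /** Reads all entry names from a JAR on disk, e.g. the SDK bundle. */
  static List<String> entryNames(String jarPath) throws IOException {
    List<String> names = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        names.add(entries.nextElement().getName());
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: UnshadedClassScanner <path-to-bundle.jar>");
      return;
    }
    List<String> unshaded = findUnshaded(entryNames(args[0]), ALLOWED_PREFIXES);
    unshaded.forEach(System.out::println);
    if (!unshaded.isEmpty()) {
      throw new IllegalStateException(unshaded.size() + " unshaded classes found");
    }
  }
}
```

Keeping the entry-name check separate from the JAR reading lets the rule be unit-tested without a real bundle on disk.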
[jira] [Commented] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation
[ https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824079#comment-17824079 ]

Harshit Gupta commented on HADOOP-19101:

[~ste...@apache.org] how did you discover this issue? I thought we had tests that varied the offsets of the ranges being read?

> Vectored Read into off-heap buffer broken in fallback implementation
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/azure
> Affects Versions: 3.4.0, 3.3.6
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts reading at
> position zero, even when the range is at a different offset. As a result,
> you can get incorrect data.
> The fix for this is straightforward: we pass in a FileRange and use its
> offset as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely
> read vectored IO into direct buffers through HDFS, ABFS or GCS. Note that we
> have never seen this in production because the Parquet and ORC libraries both
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to
> read into off-heap DirectBuffers. This is a bit trickier than you would think
> because an allocator is passed in. For PARQUET-2171 we will
> * only invoke the API on streams which explicitly declare their support for
>   the API (so fall back in Parquet itself)
> * not invoke it when direct buffer allocation is in use.
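The fix described above (start reading at the range's offset rather than at zero) can be sketched as follows. This is not Hadoop's `VectoredReadUtils` code: `PositionedSource` is a hypothetical stand-in for a positioned-read API such as Hadoop's `PositionedReadable.readFully(long, byte[], int, int)`, and copying through a small heap buffer is an assumption about how a fallback might fill a direct `ByteBuffer`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectBufferRangeRead {

  /** Hypothetical stand-in for a positioned-read API. */
  interface PositionedSource {
    void readFully(long position, byte[] buf, int off, int len) throws IOException;
  }

  /**
   * Fills a new direct buffer with `length` bytes starting at file offset `rangeOffset`,
   * copying through a heap buffer of at most `chunkSize` bytes.
   */
  static ByteBuffer readIntoDirectBuffer(PositionedSource in, long rangeOffset, int length,
      int chunkSize) throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    ByteBuffer direct = ByteBuffer.allocateDirect(length);
    byte[] tmp = new byte[Math.min(chunkSize, length)];
    int done = 0;
    while (done < length) {
      int n = Math.min(tmp.length, length - done);
      // The bug was equivalent to reading from `done`; the range offset must be added.
      in.readFully(rangeOffset + done, tmp, 0, n);
      direct.put(tmp, 0, n);
      done += n;
    }
    direct.flip(); // position 0, limit == length, ready for the caller
    return direct;
  }

  public static void main(String[] args) throws IOException {
    byte[] file = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    PositionedSource src = (pos, buf, off, len) -> System.arraycopy(file, (int) pos, buf, off, len);
    ByteBuffer bb = readIntoDirectBuffer(src, 10, 4, 3);
    byte[] out = new byte[bb.remaining()];
    bb.get(out);
    System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints "abcd"
  }
}
```

With a range at offset 10, the buggy behaviour would have returned "0123" instead of "abcd", which is exactly the kind of silent data error the issue describes.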
[jira] [Updated] (HADOOP-19082) S3A: Update AWS SDK V2 to 2.24.6
[ https://issues.apache.org/jira/browse/HADOOP-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-19082:

Summary: S3A: Update AWS SDK V2 to 2.24.6 (was: S3A: Update AWS SDK V2 to 2.24.1)

> S3A: Update AWS SDK V2 to 2.24.6
>
> Key: HADOOP-19082
> URL: https://issues.apache.org/jira/browse/HADOOP-19082
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
>
> Update the AWS SDK to 2.24.1 from 2.23.5 for latest updates in packaging
> w.r.t. imds module.
[jira] [Updated] (HADOOP-19082) S3A: Update AWS SDK V2 to 2.24.6
[ https://issues.apache.org/jira/browse/HADOOP-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-19082:

Description: Update the AWS SDK to 2.24.6 from 2.23.5 for latest updates in packaging w.r.t. imds module.
(was: Update the AWS SDK to 2.24.1 from 2.23.5 for latest updates in packaging w.r.t. imds module.)

> S3A: Update AWS SDK V2 to 2.24.6
>
> Key: HADOOP-19082
> URL: https://issues.apache.org/jira/browse/HADOOP-19082
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
>
> Update the AWS SDK to 2.24.6 from 2.23.5 for latest updates in packaging
> w.r.t. imds module.
[jira] [Created] (HADOOP-19082) Update AWS SDK V2 to 2.24.1
Harshit Gupta created HADOOP-19082:

Summary: Update AWS SDK V2 to 2.24.1
Key: HADOOP-19082
URL: https://issues.apache.org/jira/browse/HADOOP-19082
Project: Hadoop Common
Issue Type: Improvement
Components: fs/s3
Affects Versions: 3.4.0
Reporter: Harshit Gupta
Assignee: Harshit Gupta

Update the AWS SDK to 2.24.1 from 2.23.5 for latest updates in packaging w.r.t. imds module.
[jira] [Created] (HADOOP-18828) Hadoop UGI doesn't work with KCM based kerberos tickets
Harshit Gupta created HADOOP-18828:

Summary: Hadoop UGI doesn't work with KCM based kerberos tickets
Key: HADOOP-18828
URL: https://issues.apache.org/jira/browse/HADOOP-18828
Project: Hadoop Common
Issue Type: Improvement
Reporter: Harshit Gupta
[jira] [Commented] (HADOOP-18637) S3A to support upload of files greater than 2 GB using DiskBlocks
[ https://issues.apache.org/jira/browse/HADOOP-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711434#comment-17711434 ]

Harshit Gupta commented on HADOOP-18637:

Sure, I will get on it.

> S3A to support upload of files greater than 2 GB using DiskBlocks
>
> Key: HADOOP-18637
> URL: https://issues.apache.org/jira/browse/HADOOP-18637
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
>
> Use S3A DiskBlocks to support uploading files larger than 2 GB. Currently,
> the maximum upload size of a single block is ~2 GB.
>
> cc: [~mthakur] [~ste...@apache.org] [~mehakmeet]
[jira] [Updated] (HADOOP-18684) Fix S3A filesystem such that the scheme matches the URI scheme
[ https://issues.apache.org/jira/browse/HADOOP-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-18684:

Affects Version/s: 3.3.5

> Fix S3A filesystem such that the scheme matches the URI scheme
>
> Key: HADOOP-18684
> URL: https://issues.apache.org/jira/browse/HADOOP-18684
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 3.3.5
> Reporter: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
>
> Certain code paths, such as YARN log aggregation, use the FileContext APIs
> to perform filesystem operations. While trying to reuse the S3A connector for
> GCS-based workloads, YARN log aggregation was not happening. On further
> investigation, we found that the FileContext APIs have hardcoded URI scheme
> checks that need to be disabled or updated to make S3A compatible with
> non-AWS stores.
[jira] [Created] (HADOOP-18684) Fix S3A filesystem such that the scheme matches the URI scheme
Harshit Gupta created HADOOP-18684:

Summary: Fix S3A filesystem such that the scheme matches the URI scheme
Key: HADOOP-18684
URL: https://issues.apache.org/jira/browse/HADOOP-18684
Project: Hadoop Common
Issue Type: Improvement
Reporter: Harshit Gupta

Certain code paths, such as YARN log aggregation, use the FileContext APIs to perform filesystem operations. While trying to reuse the S3A connector for GCS-based workloads, YARN log aggregation was not happening. On further investigation, we found that the FileContext APIs have hardcoded URI scheme checks that need to be disabled or updated to make S3A compatible with non-AWS stores.
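To illustrate the kind of mismatch involved, here is a hedged sketch using only `java.net.URI`: a binding that hardcodes its scheme rejects paths from a store mounted under another scheme (here `gs`), whereas a binding that takes the scheme from the filesystem URI it was created for accepts them. The class, the `Binding` type and both factory methods are hypothetical; the real checks live in Hadoop's FileContext code and are not reproduced here.

```java
import java.net.URI;

public class SchemeCheckSketch {

  /** Hypothetical filesystem binding that records the scheme it serves. */
  static final class Binding {
    final String scheme;

    Binding(String scheme) {
      this.scheme = scheme;
    }

    /** Accepts relative paths and paths whose scheme matches (case-insensitive). */
    boolean accepts(URI path) {
      String s = path.getScheme();
      return s == null || s.equalsIgnoreCase(scheme);
    }
  }

  /** Before: the scheme is fixed, whatever URI the binding was created for (fsUri is ignored). */
  static Binding hardcoded(URI fsUri) {
    return new Binding("s3a");
  }

  /** After: the scheme is taken from the filesystem URI itself. */
  static Binding fromUri(URI fsUri) {
    return new Binding(fsUri.getScheme());
  }

  public static void main(String[] args) {
    URI fsUri = URI.create("gs://bucket/");
    URI logPath = URI.create("gs://bucket/app-logs/app_1");
    System.out.println(hardcoded(fsUri).accepts(logPath)); // false
    System.out.println(fromUri(fsUri).accepts(logPath));   // true
  }
}
```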
[jira] [Commented] (HADOOP-18637) S3A to support upload of files greater than 2 GB using DiskBlocks
[ https://issues.apache.org/jira/browse/HADOOP-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691517#comment-17691517 ]

Harshit Gupta commented on HADOOP-18637:

S3ABlockOutputStream creates one of three types of DataBlock depending on {{fs.s3a.fast.upload.buffer}}, which defaults to disk. For the default disk buffer, we can create an empty file of the same size and limit the buffer size to {{Integer.MAX_VALUE}}.

*For the other buffer types, should we reject uploads larger than 2 GB, or add support there as well?* For example, {{ByteArrayBlock}} writes directly to an {{S3AByteArrayOutputStream}}, which would again be initialized with {{Integer.MAX_VALUE}}. The same applies to {{ByteBufferBlock}}.

One thing to note here is that a single write call can never pass more than {{Integer.MAX_VALUE}} bytes, because the write method in S3ABlockOutputStream has the signature {{public synchronized void write(byte[] source, int offset, int len)}}.

*This is only for compatibility with non-AWS S3 stores.*

> S3A to support upload of files greater than 2 GB using DiskBlocks
>
> Key: HADOOP-18637
> URL: https://issues.apache.org/jira/browse/HADOOP-18637
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Major
>
> Use S3A DiskBlocks to support uploading files larger than 2 GB. Currently,
> the maximum upload size of a single block is ~2 GB.
>
> cc: [~mthakur] [~ste...@apache.org] [~mehakmeet]
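The constraint discussed in the comment can be sketched as a validation step. This is a hypothetical helper, not S3A code, and it takes one side of the open question (reject rather than support): it assumes disk-backed blocks may grow beyond 2 GB because they accumulate many bounded `write(byte[], int, int)` calls into a file, while array- and ByteBuffer-backed blocks are capped at `Integer.MAX_VALUE` bytes because Java arrays and buffers are int-indexed. The enum constants mirror the `fs.s3a.fast.upload.buffer` values `disk`, `array` and `bytebuffer`.

```java
public class BlockSizeLimits {

  /** Mirrors the fs.s3a.fast.upload.buffer options. */
  enum BufferType { DISK, ARRAY, BYTEBUFFER }

  /**
   * Returns the requested block size if the buffer type can hold it, otherwise throws.
   * Hypothetical policy: reject oversized memory-backed blocks instead of silently shrinking them.
   */
  static long validateBlockSize(BufferType type, long requested) {
    if (requested <= 0) {
      throw new IllegalArgumentException("block size must be positive: " + requested);
    }
    if (type != BufferType.DISK && requested > Integer.MAX_VALUE) {
      // byte[] and ByteBuffer capacities are ints, so these blocks cannot exceed ~2 GB.
      throw new IllegalArgumentException(type + " blocks are limited to "
          + Integer.MAX_VALUE + " bytes, requested " + requested);
    }
    return requested;
  }

  public static void main(String[] args) {
    long threeGiB = 3L * 1024 * 1024 * 1024;
    System.out.println(validateBlockSize(BufferType.DISK, threeGiB)); // 3221225472
    try {
      validateBlockSize(BufferType.ARRAY, threeGiB);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```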
[jira] [Created] (HADOOP-18637) S3A to support upload of files greater than 2 GB using DiskBlocks
Harshit Gupta created HADOOP-18637:

Summary: S3A to support upload of files greater than 2 GB using DiskBlocks
Key: HADOOP-18637
URL: https://issues.apache.org/jira/browse/HADOOP-18637
Project: Hadoop Common
Issue Type: Improvement
Reporter: Harshit Gupta
Assignee: Harshit Gupta

Use S3A DiskBlocks to support uploading files larger than 2 GB. Currently, the maximum upload size of a single block is ~2 GB.

cc: [~mthakur] [~ste...@apache.org] [~mehakmeet]
[jira] (HADOOP-18589) Create a new config for ABFS read-ahead
[ https://issues.apache.org/jira/browse/HADOOP-18589 ]

Harshit Gupta deleted comment on HADOOP-18589:

was (Author: harshit.gupta):
CC [~snvijaya] [~mthakur] [~ste...@apache.org]

> Create a new config for ABFS read-ahead
>
> Key: HADOOP-18589
> URL: https://issues.apache.org/jira/browse/HADOOP-18589
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Harshit Gupta
> Priority: Major
>
> Create a new config for ABFS read-ahead to make it easier to enable it and
> deprecate the existing one simultaneously.
[jira] [Updated] (HADOOP-18589) Create a new config for ABFS read-ahead
[ https://issues.apache.org/jira/browse/HADOOP-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-18589:

Description:
Create a new config for ABFS read-ahead to make it easier to enable it and deprecate the existing one simultaneously.

CC [~snvijaya] [~mthakur] [~ste...@apache.org]

was:
Create a new config for ABFS read-ahead to make it easier to enable it and deprecate the existing one simultaneously.

> Create a new config for ABFS read-ahead
>
> Key: HADOOP-18589
> URL: https://issues.apache.org/jira/browse/HADOOP-18589
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Harshit Gupta
> Priority: Major
>
> Create a new config for ABFS read-ahead to make it easier to enable it and
> deprecate the existing one simultaneously.
>
> CC [~snvijaya] [~mthakur] [~ste...@apache.org]
[jira] [Updated] (HADOOP-18589) Create a new config for ABFS read-ahead
[ https://issues.apache.org/jira/browse/HADOOP-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-18589:

Description:
Create a new config for ABFS read-ahead to make it easier to enable it and deprecate the existing one simultaneously.

was: Create a new config for ABFS read-ahead to make it easier to enable it and deprecate the existing one simultaneously.

> Create a new config for ABFS read-ahead
>
> Key: HADOOP-18589
> URL: https://issues.apache.org/jira/browse/HADOOP-18589
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Harshit Gupta
> Priority: Major
>
> Create a new config for ABFS read-ahead to make it easier to enable it and
> deprecate the existing one simultaneously.
[jira] [Commented] (HADOOP-18589) Create a new config for ABFS read-ahead
[ https://issues.apache.org/jira/browse/HADOOP-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654872#comment-17654872 ]

Harshit Gupta commented on HADOOP-18589:

CC [~snvijaya] [~mthakur] [~ste...@apache.org]

> Create a new config for ABFS read-ahead
>
> Key: HADOOP-18589
> URL: https://issues.apache.org/jira/browse/HADOOP-18589
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Harshit Gupta
> Priority: Major
>
> Create a new config for ABFS read-ahead to make it easier to enable it and
> deprecate the existing one simultaneously.
[jira] [Updated] (HADOOP-18589) Create a new config for ABFS read-ahead
[ https://issues.apache.org/jira/browse/HADOOP-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-18589:

Component/s: fs/azure

> Create a new config for ABFS read-ahead
>
> Key: HADOOP-18589
> URL: https://issues.apache.org/jira/browse/HADOOP-18589
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Harshit Gupta
> Priority: Major
>
> Create a new config for ABFS read-ahead to make it easier to enable it and
> deprecate the existing one simultaneously.
[jira] [Created] (HADOOP-18589) Create a new config for ABFS read-ahead
Harshit Gupta created HADOOP-18589:

Summary: Create a new config for ABFS read-ahead
Key: HADOOP-18589
URL: https://issues.apache.org/jira/browse/HADOOP-18589
Project: Hadoop Common
Issue Type: Improvement
Reporter: Harshit Gupta

Create a new config for ABFS read-ahead to make it easier to enable it and deprecate the existing one simultaneously.
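One common way to introduce a new key while deprecating an old one is to read the new key first and fall back to the old one with a warning; Hadoop's `Configuration.addDeprecation(String, String)` provides a built-in mapping for this. The plain-Java sketch below only shows the lookup order. Both key names are hypothetical placeholders, not the actual ABFS property names.

```java
import java.util.Map;
import java.util.logging.Logger;

public class ReadAheadConfig {
  private static final Logger LOG = Logger.getLogger(ReadAheadConfig.class.getName());

  // Hypothetical key names, for illustration only.
  static final String NEW_KEY = "fs.azure.example.readahead.enabled";
  static final String OLD_KEY = "fs.azure.example.legacy.readahead";

  /** Resolves the read-ahead flag: the new key wins; the old key is honoured with a warning. */
  static boolean readAheadEnabled(Map<String, String> conf, boolean defaultValue) {
    String value = conf.get(NEW_KEY);
    if (value == null) {
      value = conf.get(OLD_KEY);
      if (value != null) {
        LOG.warning(OLD_KEY + " is deprecated; use " + NEW_KEY + " instead");
      }
    }
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }
}
```

Letting the new key take precedence means a deployment can set both during migration without ambiguity, while existing configs that only set the old key keep working.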
[jira] [Assigned] (HADOOP-18530) ChecksumFileSystem::readVectored might return byte buffers not positioned at 0
[ https://issues.apache.org/jira/browse/HADOOP-18530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta reassigned HADOOP-18530:

Assignee: Harshit Gupta

> ChecksumFileSystem::readVectored might return byte buffers not positioned at 0
>
> Key: HADOOP-18530
> URL: https://issues.apache.org/jira/browse/HADOOP-18530
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 3.3.5
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Blocker
> Labels: pull-request-available
>
> The ChecksumFileSystem::readVectored method returns byte buffers that are not
> positioned at 0, although readers such as ORC may assume that they are.
>
> cc: [~mthakur] [~rbalamohan] [~ste...@apache.org]
[jira] [Updated] (HADOOP-18530) ChecksumFileSystem::readVectored might return byte buffers not positioned at 0
[ https://issues.apache.org/jira/browse/HADOOP-18530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HADOOP-18530:

Description:
The ChecksumFileSystem::readVectored method returns byte buffers that are not positioned at 0, although readers such as ORC may assume that they are.

cc: [~mthakur] [~rbalamohan] [~ste...@apache.org]

was: The ChecksumFileSystem::readVectored method returns byte buffers that are not positioned at 0, although readers such as ORC may assume that they are.

> ChecksumFileSystem::readVectored might return byte buffers not positioned at 0
>
> Key: HADOOP-18530
> URL: https://issues.apache.org/jira/browse/HADOOP-18530
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Harshit Gupta
> Priority: Major
>
> The ChecksumFileSystem::readVectored method returns byte buffers that are not
> positioned at 0, although readers such as ORC may assume that they are.
>
> cc: [~mthakur] [~rbalamohan] [~ste...@apache.org]
[jira] [Created] (HADOOP-18530) ChecksumFileSystem::readVectored might return byte buffers not positioned at 0
Harshit Gupta created HADOOP-18530:

Summary: ChecksumFileSystem::readVectored might return byte buffers not positioned at 0
Key: HADOOP-18530
URL: https://issues.apache.org/jira/browse/HADOOP-18530
Project: Hadoop Common
Issue Type: Bug
Reporter: Harshit Gupta

The ChecksumFileSystem::readVectored method returns byte buffers that are not positioned at 0, although readers such as ORC may assume that they are.
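A sketch of how a returned buffer can be normalized so that it starts at position 0. It assumes, as a checksummed read plausibly might, that data is first read into a larger, chunk-aligned buffer and the requested range is then carved out of it. The method name and the alignment model are illustrative assumptions, not ChecksumFileSystem's code; `ByteBuffer.duplicate()` and `ByteBuffer.slice()` are standard JDK calls, and `slice()` produces a view whose position is 0.

```java
import java.nio.ByteBuffer;

public class SliceToRange {

  /**
   * Returns a view of `aligned` covering [rangeOffset, rangeOffset + rangeLength), positioned at 0.
   * `alignedStart` is the file offset of index 0 of `aligned`; the range must lie inside it.
   */
  static ByteBuffer sliceRange(ByteBuffer aligned, long alignedStart, long rangeOffset,
      int rangeLength) {
    int start = Math.toIntExact(rangeOffset - alignedStart);
    if (start < 0 || rangeLength < 0 || start + rangeLength > aligned.limit()) {
      throw new IllegalArgumentException("range outside the aligned buffer");
    }
    ByteBuffer view = aligned.duplicate(); // shares content, independent position/limit
    view.limit(start + rangeLength);
    view.position(start);
    return view.slice(); // new buffer: position 0, limit == rangeLength
  }
}
```

Because `slice()` shares content with the aligned buffer, no bytes are copied, and the caller's original buffer keeps its own position and limit.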