[jira] [Commented] (HADOOP-18759) [ABFS][Backoff-Optimization] Have a Static retry policy for connection timeout failures
[ https://issues.apache.org/jira/browse/HADOOP-18759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825140#comment-17825140 ] Anoop Sam John commented on HADOOP-18759: - PR link: https://github.com/apache/hadoop/pull/5881/ > [ABFS][Backoff-Optimization] Have a Static retry policy for connection > timeout failures > --- > > Key: HADOOP-18759 > URL: https://issues.apache.org/jira/browse/HADOOP-18759 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.4 >Reporter: Anuj Modi >Assignee: Anuj Modi >Priority: Major > Fix For: 3.5.0 > > > Today, when a request fails with a connection timeout, it falls back into the > loop for exponential retry. Unlike for Azure Storage failures, there are no guarantees of > success for an exponentially retried request, nor recommendations for ideal retry > policies, for Azure network or any other general failures. Faster failure and > retry might be more beneficial for such generic connection timeout failures. > This PR introduces a new Static Retry Policy which will currently be used > only for Connection Timeout failures. This means all requests failing with > Connection Timeout errors will be retried after a constant retry (sleep) > interval, independent of how many times that request has failed. The Max Retry > Count check will still be in place. > The following configurations will be introduced in the change: > # "fs.azure.static.retry.for.connection.timeout.enabled" - default: true. > true: static retry will be used for CT; false: exponential retry will be used. > # "fs.azure.static.retry.interval" - default: 1000 ms. > This also introduces a new field in x-ms-client-request-id, only for the > requests that are being retried after a connection timeout failure. The new field > will tell which retry policy was used to get the sleep interval before making > this request. > The header "x-ms-client-request-id" right now has only the retryCount and > retryReason for this particular API call.
For example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT. > Going forward, for retryReason "CT" it will carry the retry policy abbreviation as > well. > For example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT_E. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
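The static-versus-exponential choice described in the issue can be sketched as follows. This is an illustrative sketch, not the actual ABFS implementation: the class name, field names, and the exponential base delay are assumptions; only the two configuration keys and the 1000 ms default come from the issue description.

```java
// Illustrative sketch of the retry-interval selection the change describes.
// Class and field names are hypothetical; only the config semantics
// (static retry for connection timeouts, 1000 ms default) are from the JIRA.
public class RetryPolicyDemo {

    // fs.azure.static.retry.for.connection.timeout.enabled (default: true)
    static final boolean STATIC_RETRY_FOR_CT_ENABLED = true;
    // fs.azure.static.retry.interval (default: 1000 ms)
    static final long STATIC_RETRY_INTERVAL_MS = 1000L;
    // Assumed base delay for the exponential policy, for illustration only.
    static final long EXP_BASE_DELAY_MS = 500L;

    /** Sleep interval in ms before the next attempt of a failed request. */
    static long retryIntervalMs(int retryCount, boolean connectionTimeout) {
        if (connectionTimeout && STATIC_RETRY_FOR_CT_ENABLED) {
            // Static policy: constant interval, independent of retry count.
            return STATIC_RETRY_INTERVAL_MS;
        }
        // Exponential policy: delay grows with each failed attempt.
        return EXP_BASE_DELAY_MS * (1L << retryCount);
    }

    public static void main(String[] args) {
        System.out.println(retryIntervalMs(5, true));  // static: 1000
        System.out.println(retryIntervalMs(2, false)); // exponential: 2000
    }
}
```

The max-retry-count check stays outside this interval computation, exactly as the description notes.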
[jira] [Commented] (HADOOP-17643) WASB : Make metadata checks case insensitive
[ https://issues.apache.org/jira/browse/HADOOP-17643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17455104#comment-17455104 ] Anoop Sam John commented on HADOOP-17643: - I put up PRs so that the merge is easier. Good to go in. I thought these had already been merged; sorry for not checking. > WASB : Make metadata checks case insensitive > > > Key: HADOOP-17643 > URL: https://issues.apache.org/jira/browse/HADOOP-17643 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > The WASB driver uses metadata on blobs to denote permissions, whether a blob is a > placeholder zero-sized blob for a directory, etc. > For storage migration, users use AzCopy; it copies the blobs but causes the > metadata keys to be changed to camel case. As per discussion with the MSFT > AzCopy team, this is a known issue and a technical limitation. This is what the > AzCopy team explained: > "For context, blob metadata is implemented with HTTP headers. They are case > insensitive but case preserving. > There is a known issue with the Go language. The HTTP client that it provides > does this case modification to the response headers before we can read the > raw values, so the destination metadata keys have a different casing than the > source. We’ve reached out to the Go Team in the past but weren’t successful > in convincing them to change the behaviour. We don’t have a short term > solution right now" > So the proposal is to change the metadata key checks to be case insensitive. > Maybe make the case-insensitive check configurable, with a default of false for > compatibility. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
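A minimal sketch of the case-insensitive lookup the issue proposes, assuming a configurable flag as suggested. The method and parameter names are hypothetical, not from the WASB driver; "hdi_isfolder" is used only as an example of a metadata key whose casing AzCopy might rewrite.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of a configurable case-insensitive metadata check, as the
// JIRA proposes. Method and parameter names are hypothetical, not from the
// WASB driver; "hdi_isfolder" below is just an example key.
public class MetadataDemo {

    static String getMetadata(Map<String, String> blobMetadata, String key,
                              boolean caseInsensitive) {
        if (!caseInsensitive) {
            // Exact-match behaviour: the compatible default (false).
            return blobMetadata.get(key);
        }
        // CASE_INSENSITIVE_ORDER matches keys regardless of the casing that
        // AzCopy (via Go's HTTP client) may have applied to them.
        TreeMap<String, String> ci = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        ci.putAll(blobMetadata);
        return ci.get(key);
    }

    public static void main(String[] args) {
        // AzCopy rewrote the key "hdi_isfolder" to camel case during copy.
        Map<String, String> copied = Map.of("Hdi_isfolder", "true");
        System.out.println(getMetadata(copied, "hdi_isfolder", false)); // null: check fails
        System.out.println(getMetadata(copied, "hdi_isfolder", true));  // true: check passes
    }
}
```

Copying into a fresh `TreeMap` on each lookup is for clarity only; a real driver would build the case-insensitive view once per blob.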
[jira] [Commented] (HADOOP-17848) Hadoop NativeAzureFileSystem append removes ownership set on the file
[ https://issues.apache.org/jira/browse/HADOOP-17848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402737#comment-17402737 ] Anoop Sam John commented on HADOOP-17848: - Is it the call fs.setPermission(filePath, new FsPermission(FILE_LOG_PERMISSIONS)); that removes the owner/group details? What happens if that call is not there, and there is only a create and then a later append call on the file? > Hadoop NativeAzureFileSystem append removes ownership set on the file > - > > Key: HADOOP-17848 > URL: https://issues.apache.org/jira/browse/HADOOP-17848 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.3.1 >Reporter: Prabhu Joseph >Priority: Major > > *Repro:* Create Operation sets ownership whereas append operation removes the > same. > Create: > *// -rw-r--r-- 1 root supergroup 1 2021-08-15 11:02 /tmp/dummyfile* > Append: > *// -rwxrwxrwx 1 2 2021-08-15 11:04 /tmp/dummyfile* > {code:java} > import org.apache.hadoop.conf.Configuration; > import org.apache.hadoop.fs.FSDataOutputStream; > import org.apache.hadoop.fs.FileSystem; > import org.apache.hadoop.fs.Path; > import org.apache.hadoop.fs.permission.FsPermission; > public class Wasb { > private static final short FILE_LOG_PERMISSIONS = 0640; > > public static void main(String[] args) throws Exception { > > Configuration fsConf = new Configuration(); > fsConf.set("fs.azure.enable.append.support", "true"); > Path filePath = new Path("/tmp/dummyfile"); > FileSystem fs = FileSystem.newInstance(filePath.toUri(), fsConf); > FSDataOutputStream stream = fs.create(filePath, false); > stream.write(12345); > stream.close(); > stream = fs.append(filePath); > stream.write(888); > stream.close(); > fs.setPermission(filePath, new FsPermission(FILE_LOG_PERMISSIONS)); > fs.close(); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385411#comment-17385411 ] Anoop Sam John commented on HADOOP-17770: - There are still many existing customers on WASB-based storage. Creating a new cluster WITHOUT a storage copy is an easy job, but a new cluster with copied data (to an ADLS Gen2 account) is not that easy, so those customers would benefit from this. Anyway, with the other patch the basic infra was put in place; this patch for WASB was much easier to do. :-) > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: AvgLatency_0 cache hit.png, AvgLatency_50% cache > hit.png, Throughput_0 cache hit.png, Throughput_50% cache hit.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
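The quoted description's "seek to the position, read, then seek back" default versus a pure positional read can be contrasted with a simplified stand-in. This is a sketch over an in-memory byte array, not the Hadoop stream classes; all names here are illustrative.

```java
// Simplified stand-in contrasting the default positional-read path (seek,
// read, seek back, as the quoted description says the superclass does) with
// a direct pread that never moves the stream position. Not Hadoop code.
public class PreadDemo {
    private final byte[] data;
    private int pos; // current stream position, as a seekable stream tracks it

    PreadDemo(byte[] data) { this.data = data; }

    int position() { return pos; }

    /** Default-style pread: seek to position, read, seek back. */
    int readViaSeek(long position, byte[] buf, int off, int len) {
        int saved = pos;
        pos = (int) position;                      // seek forward
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);  // read
        pos = saved;                               // seek back
        return n;
    }

    /** Pure pread: reads only what was asked for; stream position untouched. */
    int readDirect(long position, byte[] buf, int off, int len) {
        int n = Math.min(len, data.length - (int) position);
        System.arraycopy(data, (int) position, buf, off, n);
        return n;
    }

    public static void main(String[] args) {
        PreadDemo d = new PreadDemo(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        byte[] buf = new byte[3];
        System.out.println(d.readDirect(4, buf, 0, 3)); // reads 3 bytes: 5, 6, 7
        System.out.println(d.position());               // still 0
    }
}
```

On a remote store, the seek-based path may additionally trigger read-ahead buffering around the new position, which is exactly what HBase-style short preads want to avoid.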
[jira] [Comment Edited] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375234#comment-17375234 ] Anoop Sam John edited comment on HADOOP-17770 at 7/6/21, 5:38 AM: -- Ran the below perf test with an HBase usage. Have a 3 RS node cluster. Region Server node SKU is DS13 v2. Storage is a premium block blob storage account. Region Server Xmx = 32 GB. Created one table with 15 presplits (so that there are 5 regions/RS). Pumped 100 GB of data using the HBase PE tool. Single column/row with 1 KB row size. The table is NOT major compacted; every region has 2 or 3 HFiles under it. Did random read PE tests (every read request is a multi-get, getting 300 rows), run in two cases. The 1st case is where we make sure of a 0% cache hit always (skipping the data caching). The 2nd case is when the RS has a file-mode cache with cache size ~50% of the data size; the cache hit ratio is ~50%. Case #1 !Throughput_0 cache hit.png! !AvgLatency_0 cache hit.png! Case #2 !Throughput_50% cache hit.png! !AvgLatency_50% cache hit.png! was (Author: anoop.hbase): Ran below perf test with an HBase usage Have a 3 RS node cluster. Region Server node SKU is DS13 v2. Storage is premium Block blob storage account. Region Server Xmx = 32 GB Created one table with 15 presplits ( So that 5 regions/RS) Pumped 100 GB of data using HBase PE tool. Single column/row with 1 KB row size. Doing random read PE tests (Every read req is a multi get, getting 300 rows). Ran with different number of tests. 1st case is where we make sure 0% cache hit always (skipping the data caching) 2nd case is when RS is having file mode cache with cache size ~50% of data size. The cache hit ration ~50% Case #1 !Throughput_0 cache hit.png! !AvgLatency_0 cache hit.png! Case #2 !Throughput_50% cache hit.png! !AvgLatency_50% cache hit.png! 
> WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: AvgLatency_0 cache hit.png, AvgLatency_50% cache > hit.png, Throughput_0 cache hit.png, Throughput_50% cache hit.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375234#comment-17375234 ] Anoop Sam John commented on HADOOP-17770: - Ran the below perf test with an HBase usage. Have a 3 RS node cluster. Region Server node SKU is DS13 v2. Storage is a premium block blob storage account. Region Server Xmx = 32 GB. Created one table with 15 presplits (so that there are 5 regions/RS). Pumped 100 GB of data using the HBase PE tool. Single column/row with 1 KB row size. Did random read PE tests (every read request is a multi-get, getting 300 rows), run in two cases. The 1st case is where we make sure of a 0% cache hit always (skipping the data caching). The 2nd case is when the RS has a file-mode cache with cache size ~50% of the data size; the cache hit ratio is ~50%. Case #1 !Throughput_0 cache hit.png! !AvgLatency_0 cache hit.png! Case #2 !Throughput_50% cache hit.png! !AvgLatency_50% cache hit.png! > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: AvgLatency_0 cache hit.png, AvgLatency_50% cache > hit.png, Throughput_0 cache hit.png, Throughput_50% cache hit.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. 
(Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17770: Attachment: AvgLatency_50% cache hit.png > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: AvgLatency_0 cache hit.png, AvgLatency_50% cache > hit.png, Throughput_0 cache hit.png, Throughput_50% cache hit.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17770: Attachment: Throughput_50% cache hit.png > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: AvgLatency_0 cache hit.png, AvgLatency_50% cache > hit.png, Throughput_0 cache hit.png, Throughput_50% cache hit.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17770: Attachment: AvgLatency_0 cache hit.png > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: AvgLatency_0 cache hit.png, Throughput_0 cache hit.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17770: Attachment: Throughput_0 cache hit.png > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Throughput_0 cache hit.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17770: Fix Version/s: 3.4.0 > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Fix For: 3.4.0 > > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17770: Description: This is just like HADOOP-17038 Right now it will do a seek to the position , read and then seek back to the old position. (As per the impl in the super class) In HBase kind of workloads we rely mostly on short preads. (like 64 KB size by default). So would be ideal to support a pure pos read API which will not even keep the data in a buffer but will only read the required data as what is asked for by the caller. (Not reading ahead more data as per the read size config) Allow an optional boolean config to be specified while opening file for read using which buffered pread can be disabled. FutureDataInputStreamBuilder openFile(Path path) was:This is just like HADOOP-17038 > WASB : Support disabling buffered reads in positional reads > --- > > Key: HADOOP-17770 > URL: https://issues.apache.org/jira/browse/HADOOP-17770 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > > This is just like HADOOP-17038 > Right now it will do a seek to the position , read and then seek back to the > old position. (As per the impl in the super class) > In HBase kind of workloads we rely mostly on short preads. (like 64 KB size > by default). So would be ideal to support a pure pos read API which will not > even keep the data in a buffer but will only read the required data as what > is asked for by the caller. (Not reading ahead more data as per the read size > config) > Allow an optional boolean config to be specified while opening file for read > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-17770) WASB : Support disabling buffered reads in positional reads
Anoop Sam John created HADOOP-17770: --- Summary: WASB : Support disabling buffered reads in positional reads Key: HADOOP-17770 URL: https://issues.apache.org/jira/browse/HADOOP-17770 Project: Hadoop Common Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John This is just like HADOOP-17038 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17691) Abfs directory delete times out on large directory tree: OperationTimedOut
[ https://issues.apache.org/jira/browse/HADOOP-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345735#comment-17345735 ] Anoop Sam John commented on HADOOP-17691: - Ping [~sneha_varma] > Abfs directory delete times out on large directory tree: OperationTimedOut > -- > > Key: HADOOP-17691 > URL: https://issues.apache.org/jira/browse/HADOOP-17691 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.0 >Reporter: Steve Loughran >Priority: Major > > Timeouts surfacing on abfs when a delete of a large directory tree is invoked. > {code} > StatusDescription=Operation could not be completed within the specified time. > ErrorCode=OperationTimedOut > ErrorMessage=Operation could not be completed within the specified time. > {code} > This has surfaced in v1 FileOutputCommitter cleanups, implying the > directories created there (many many dirs, no files remaining after the job > commit) is sufficient to create the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-17687) ABFS: delete call sets Socket timeout lesser than query timeout leading to failures
[ https://issues.apache.org/jira/browse/HADOOP-17687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343031#comment-17343031 ] Anoop Sam John edited comment on HADOOP-17687 at 5/12/21, 7:43 AM: --- From the driver, the operation timeout is hard-coded to 90 sec, which is passed in the HTTP request. But the server is not honoring this at all, as the cap timeout at the server end is 30 sec for the DELETE op. Also, in our driver we set the read timeout on the socket (with which we try to read the op response) to 30 sec. So ideally, by all means, 30 sec is the max time for a delete today. was (Author: anoop.hbase): >From drive the operation time out been hard coded to 90sec which is passed in >the HTTP req. But server is not honoring this at all as cap timeout at server >end is 30 sec for DELETE op. Also in our driver we set the read timeout on the socket (we try read the op response ) to be 30 sec. So ya ideally by all means 30 sec is the max time for delete today. > ABFS: delete call sets Socket timeout lesser than query timeout leading to > failures > --- > > Key: HADOOP-17687 > URL: https://issues.apache.org/jira/browse/HADOOP-17687 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Priority: Minor > > ABFS Driver sets Socket timeout to 30 seconds and query timeout to 90 > seconds. The client will fail with SocketTimeoutException when the delete > path has huge number of dirs/files before the actual query timeout. The > socket timeout has to be greater than query timeout value. And it is good to > have this timeout configurable to avoid failures when delete call takes more > than the hardcoded configuration. > {code} > 21/03/26 09:24:00 DEBUG services.AbfsClient: First execution of REST > operation - DeletePath > . 
> 21/03/26 09:24:30 DEBUG services.AbfsClient: HttpRequestFailure: > 0,,cid=bf4e4d0b,rid=,sent=0,recv=0,DELETE,https://prabhuAbfs.dfs.core.windows.net/general/output/_temporary?timeout=90&recursive=true > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at org.wildfly.openssl.OpenSSLSocket.read(OpenSSLSocket.java:423) > at > org.wildfly.openssl.OpenSSLInputStream.read(OpenSSLInputStream.java:41) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:743) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352) > at > org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:303) > at > org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:192) > at > org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134) > at > org.apache.hadoop.fs.azurebfs.services.AbfsClient.deletePath(AbfsClient.java:462) > at > org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.delete(AzureBlobFileSystemStore.java:558) > at > org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.delete(AzureBlobFileSystem.java:339) > at 
org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:121) > at > org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304) > at > org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286) > at > org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120) > at org.apache.hadoop.fs.shell.Command.run(Command.java:177) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) >
[jira] [Commented] (HADOOP-17691) Abfs directory delete times out on large directory tree: OperationTimedOut
[ https://issues.apache.org/jira/browse/HADOOP-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343086#comment-17343086 ] Anoop Sam John commented on HADOOP-17691: - From the driver, the operation timeout is hard-coded to 90 sec, which is passed in the HTTP request. But the server is not honoring this at all, as the cap timeout at the server end is 30 sec for the DELETE op. Also, in our driver we set the read timeout on the socket (with which we try to read the op response) to 30 sec. So ideally, by all means, 30 sec is the max time for a delete today. > Abfs directory delete times out on large directory tree: OperationTimedOut > -- > > Key: HADOOP-17691 > URL: https://issues.apache.org/jira/browse/HADOOP-17691 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.0 >Reporter: Steve Loughran >Priority: Major > > Timeouts surfacing on abfs when a delete of a large directory tree is invoked. > {code} > StatusDescription=Operation could not be completed within the specified time. > ErrorCode=OperationTimedOut > ErrorMessage=Operation could not be completed within the specified time. > {code} > This has surfaced in v1 FileOutputCommitter cleanups, implying the > directories created there (many many dirs, no files remaining after the job > commit) is sufficient to create the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17691) Abfs directory delete times out on large directory tree: OperationTimedOut
[ https://issues.apache.org/jira/browse/HADOOP-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343032#comment-17343032 ] Anoop Sam John commented on HADOOP-17691: - cc [~snvijaya] > Abfs directory delete times out on large directory tree: OperationTimedOut > -- > > Key: HADOOP-17691 > URL: https://issues.apache.org/jira/browse/HADOOP-17691 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.0 >Reporter: Steve Loughran >Priority: Major > > Timeouts surfacing on abfs when a delete of a large directory tree is invoked. > {code} > StatusDescription=Operation could not be completed within the specified time. > ErrorCode=OperationTimedOut > ErrorMessage=Operation could not be completed within the specified time. > {code} > This has surfaced in v1 FileOutputCommitter cleanups, implying the > directories created there (many many dirs, no files remaining after the job > commit) is sufficient to create the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17687) ABFS: delete call sets Socket timeout lesser than query timeout leading to failures
[ https://issues.apache.org/jira/browse/HADOOP-17687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343031#comment-17343031 ] Anoop Sam John commented on HADOOP-17687: - From the driver side, the operation timeout is hard-coded to 90 sec and passed in the HTTP request. But the server is not honoring this at all, as the timeout is capped at the server end at 30 sec for the DELETE op. Also, in our driver we set the read timeout on the socket (where we try to read the op response) to 30 sec. So ideally, by all means, 30 sec is the max time for a delete today. > ABFS: delete call sets Socket timeout lesser than query timeout leading to > failures > --- > > Key: HADOOP-17687 > URL: https://issues.apache.org/jira/browse/HADOOP-17687 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure > Affects Versions: 3.3.0 > Reporter: Prabhu Joseph > Priority: Minor > > The ABFS driver sets the socket timeout to 30 seconds and the query timeout to 90 > seconds. The client will fail with a SocketTimeoutException before the actual query timeout when the delete > path has a huge number of dirs/files. The socket timeout has to be greater than the query > timeout value. It would also be good to make this timeout configurable, to avoid failures when the > delete call takes longer than the hardcoded value. > {code} > 21/03/26 09:24:00 DEBUG services.AbfsClient: First execution of REST > operation - DeletePath > .
> 21/03/26 09:24:30 DEBUG services.AbfsClient: HttpRequestFailure: > 0,,cid=bf4e4d0b,rid=,sent=0,recv=0,DELETE,https://prabhuAbfs.dfs.core.windows.net/general/output/_temporary?timeout=90&recursive=true > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at org.wildfly.openssl.OpenSSLSocket.read(OpenSSLSocket.java:423) > at > org.wildfly.openssl.OpenSSLInputStream.read(OpenSSLInputStream.java:41) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:743) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) > at > java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at > sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352) > at > org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:303) > at > org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:192) > at > org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134) > at > org.apache.hadoop.fs.azurebfs.services.AbfsClient.deletePath(AbfsClient.java:462) > at > org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.delete(AzureBlobFileSystemStore.java:558) > at > org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.delete(AzureBlobFileSystem.java:339) > at 
org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:121) > at > org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304) > at > org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286) > at > org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120) > at org.apache.hadoop.fs.shell.Command.run(Command.java:177) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:391) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: c
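The invariant discussed in HADOOP-17687 above - the socket read timeout must exceed the server-side query timeout, or the client gives up with a SocketTimeoutException while the server is still working - can be sketched as a small helper. This is an illustrative snippet with hypothetical names and margin, not the actual ABFS driver code:

```java
// Sketch: derive a socket read timeout that is strictly greater than the
// query timeout, so the client outlives the server-side deadline.
class TimeoutCheck {
    /** Returns a read timeout safely above the query timeout (illustrative). */
    static int readTimeoutMillis(int queryTimeoutMillis, int marginMillis) {
        if (queryTimeoutMillis <= 0 || marginMillis < 0) {
            throw new IllegalArgumentException("timeouts must be positive");
        }
        return queryTimeoutMillis + marginMillis;
    }

    public static void main(String[] args) {
        // The pre-fix ABFS values from this report: 30s socket vs 90s query.
        int queryTimeout = 90_000;
        int brokenReadTimeout = 30_000;
        int fixedReadTimeout = readTimeoutMillis(queryTimeout, 10_000);
        System.out.println(brokenReadTimeout > queryTimeout); // false: client times out first
        System.out.println(fixedReadTimeout > queryTimeout);  // true: server deadline wins
    }
}
```

With the broken configuration the read timeout fires a full minute before the query timeout, which is exactly the failure mode in the stack trace above.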
[jira] [Assigned] (HADOOP-17640) ABFS: transient failure of TestAzureBlobFileSystemFileStatus.testLastModifiedTime
[ https://issues.apache.org/jira/browse/HADOOP-17640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John reassigned HADOOP-17640: --- Assignee: Anoop Sam John > ABFS: transient failure of > TestAzureBlobFileSystemFileStatus.testLastModifiedTime > - > > Key: HADOOP-17640 > URL: https://issues.apache.org/jira/browse/HADOOP-17640 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure, test >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Assignee: Anoop Sam John >Priority: Minor > > saw a transient failure of > TestAzureBlobFileSystemFileStatus.testLastModifiedTime during a parallel > (threads=8) test run against UK-west > {code} > java.lang.AssertionError: lastModifiedTime should be before createEndTime > {code} > assumption: the times are in fact equal, though the fact the assert doesn't > include the values makes this hard to guarantee -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-17637) ABFS: rename ListResultSchemaTest to TestListResultSchema so maven tests run it
[ https://issues.apache.org/jira/browse/HADOOP-17637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John reassigned HADOOP-17637: --- Assignee: Anoop Sam John > ABFS: rename ListResultSchemaTest to TestListResultSchema so maven tests run > it > --- > > Key: HADOOP-17637 > URL: https://issues.apache.org/jira/browse/HADOOP-17637 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure, test >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Assignee: Anoop Sam John >Priority: Minor > > The test {{ListResultSchemaTest}} from HADOOP-17086 won't actually be run in > maven test runs because the Test- needs to come as a prefix. > Fix: rename -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17645) Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics
[ https://issues.apache.org/jira/browse/HADOOP-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333135#comment-17333135 ] Anoop Sam John commented on HADOOP-17645: - I have been looking more at the WASB tests these days, [~ste...@apache.org], while working on HADOOP-17643. I see you mentioned 4 Jiras, of which 2 are in. May I track the other 2 as a separate item, as those are ABFS related? For this test, I am dealing with my Blob storage alone. Will you be able to pull this small test fix in, so that I can raise the PR for HADOOP-17643? > Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics > -- > > Key: HADOOP-17645 > URL: https://issues.apache.org/jira/browse/HADOOP-17645 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 3.3.1, 3.4.0 > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Failures after HADOOP-13327: > PageBlob and compacting BlockBlob have only the hflush and hsync capabilities. > The test wrongly asserts the DROPBEHIND, READAHEAD, and UNBUFFER capabilities.
[jira] [Commented] (HADOOP-17645) Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics
[ https://issues.apache.org/jira/browse/HADOOP-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324977#comment-17324977 ] Anoop Sam John commented on HADOOP-17645: - Thanks. Moved the discussion on ITestWasbUriAndConfiguration to the other Jira. This one will be kept for the original issue, as in the title. > Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics > -- > > Key: HADOOP-17645 > URL: https://issues.apache.org/jira/browse/HADOOP-17645 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 3.3.1, 3.4.0 > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Failures after HADOOP-13327: > PageBlob and compacting BlockBlob have only the hflush and hsync capabilities. > The test wrongly asserts the DROPBEHIND, READAHEAD, and UNBUFFER capabilities.
[jira] [Commented] (HADOOP-17641) ITestWasbUriAndConfiguration.testCanonicalServiceName() failing now mockaccount exists
[ https://issues.apache.org/jira/browse/HADOOP-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324976#comment-17324976 ] Anoop Sam John commented on HADOOP-17641: - Should we remove this part of the test? {code} intercept(IllegalArgumentException.class, "java.net.UnknownHostException", () -> fs0.getCanonicalServiceName()); {code} We cannot guarantee that a mock-named account will never be present. Otherwise it would have to be a completely invalid account name, e.g. with a wrong domain suffix or so. I do not have a full picture of all parts of the WASB tests, so I am not able to take a call here. > ITestWasbUriAndConfiguration.testCanonicalServiceName() failing now > mockaccount exists > -- > > Key: HADOOP-17641 > URL: https://issues.apache.org/jira/browse/HADOOP-17641 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test > Affects Versions: 3.3.0, 3.2.1 > Reporter: Steve Loughran > Priority: Minor > > The test ITestWasbUriAndConfiguration.testCanonicalServiceName() is failing > in its intercept: > [ERROR] ITestWasbUriAndConfiguration.testCanonicalServiceName:656 Expected > a java.lang.IllegalArgumentException to be thrown, but got the result: > "20.38.122.132:0" > Root cause: the mock account in > AzureBlobStorageTestAccount.MOCK_ACCOUNT_NAME is > "mockAccount.blob.core.windows.net" and *someone has created that account*. > This means it resolves: > nslookup mockAccount.blob.core.windows.net > Server: 172.18.64.15 > Address: 172.18.64.15#53 > Non-authoritative answer: > mockAccount.blob.core.windows.net canonical name = > blob.dsm08prdstr02a.store.core.windows.net. > Name: blob.dsm08prdstr02a.store.core.windows.net > Address: 20.38.122.132
[jira] [Commented] (HADOOP-17645) Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics
[ https://issues.apache.org/jira/browse/HADOOP-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324683#comment-17324683 ] Anoop Sam John commented on HADOOP-17645: - Noticed one more consistent failure, in ITestWasbUriAndConfiguration#testCanonicalServiceName {code} // Default getCanonicalServiceName() will try to resolve the host to IP, // because the mock container does not exist, this call is expected to fail. intercept(IllegalArgumentException.class, "java.net.UnknownHostException", () -> fs0.getCanonicalServiceName()); {code} Here the container name is a mock one, 'mockContainer', and the account name is mockAccount.blob.core.windows.net. When resolving to an IP, the container does not come into the picture at all, and it seems an account 'mockAccount.blob.core.windows.net' is present now, as the name resolves to an IP. So this test always fails now. I feel it is a flaky test: we rely on a storage account name and assume it will not be present, which we cannot guarantee. Better to remove this part from the test. The test already checks getCanonicalServiceName() with the conf 'fs.azure.override.canonical.service.name' set to true. Thoughts, [~ste...@apache.org]? May I fix this also as part of this PR? > Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics > -- > > Key: HADOOP-17645 > URL: https://issues.apache.org/jira/browse/HADOOP-17645 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 3.3.1, 3.4.0 > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Failures after HADOOP-13327: > PageBlob and compacting BlockBlob have only the hflush and hsync capabilities. > The test wrongly asserts the DROPBEHIND, READAHEAD, and UNBUFFER capabilities.
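The comment above argues the intercept is flaky because a plausible account name under a real domain can come into existence. One alternative (a sketch, not the actual test code) is to use a name under the `.invalid` TLD, which RFC 2606/RFC 6761 reserve so that it can never resolve; the hostname below is made up for illustration:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class InvalidHostDemo {
    /** Returns true only if the name resolves to an address. */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ".invalid" is reserved and guaranteed never to resolve, unlike a
        // made-up account name under blob.core.windows.net.
        String host = "mockaccount.blob.core.windows-example.invalid";
        System.out.println("resolved=" + resolves(host));
    }
}
```

A test built on such a name keeps the UnknownHostException path deterministic without depending on what storage accounts happen to exist.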
[jira] [Commented] (HADOOP-17645) Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics
[ https://issues.apache.org/jira/browse/HADOOP-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324481#comment-17324481 ] Anoop Sam John commented on HADOOP-17645: - Ping [~ste...@apache.org]. A simple test fix, noticed while working on another issue. > Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics > -- > > Key: HADOOP-17645 > URL: https://issues.apache.org/jira/browse/HADOOP-17645 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 3.3.1, 3.4.0 > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Failures after HADOOP-13327: > PageBlob and compacting BlockBlob have only the hflush and hsync capabilities. > The test wrongly asserts the DROPBEHIND, READAHEAD, and UNBUFFER capabilities.
[jira] [Created] (HADOOP-17645) Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics
Anoop Sam John created HADOOP-17645: --- Summary: Fix test failures in org.apache.hadoop.fs.azure.ITestOutputStreamSemantics Key: HADOOP-17645 URL: https://issues.apache.org/jira/browse/HADOOP-17645 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.3.1, 3.4.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Failures after HADOOP-13327: PageBlob and compacting BlockBlob have only the hflush and hsync capabilities. The test wrongly asserts the DROPBEHIND, READAHEAD, and UNBUFFER capabilities.
[jira] [Updated] (HADOOP-17643) WASB : Make metadata checks case insensitive
[ https://issues.apache.org/jira/browse/HADOOP-17643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17643: Affects Version/s: 2.7.0 > WASB : Make metadata checks case insensitive > > > Key: HADOOP-17643 > URL: https://issues.apache.org/jira/browse/HADOOP-17643 > Project: Hadoop Common > Issue Type: Improvement > Affects Versions: 2.7.0 > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Major > > The WASB driver uses metadata on blobs to denote permissions, whether a blob is a > placeholder 0-sized blob for a dir, etc. > For storage migration, users use AzCopy; it copies the blobs, but it causes > the metadata keys to be changed to camel case. Per discussion with the MSFT > AzCopy team, this is a known issue and a technical limitation. This is what the > AzCopy team explained: > "For context, blob metadata is implemented with HTTP headers. They are case > insensitive but case preserving. > There is a known issue with the Go language. The HTTP client that it provides > does this case modification to the response headers before we can read the > raw values, so the destination metadata keys have a different casing than the > source. We’ve reached out to the Go Team in the past but weren’t successful > in convincing them to change the behaviour. We don’t have a short term > solution right now" > So I propose changing the metadata key checks to be case insensitive. > Maybe make the case-insensitive check configurable, defaulting to false for > compatibility.
[jira] [Created] (HADOOP-17643) WASB : Make metadata checks case insensitive
Anoop Sam John created HADOOP-17643: --- Summary: WASB : Make metadata checks case insensitive Key: HADOOP-17643 URL: https://issues.apache.org/jira/browse/HADOOP-17643 Project: Hadoop Common Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John The WASB driver uses metadata on blobs to denote permissions, whether a blob is a placeholder 0-sized blob for a dir, etc. For storage migration, users use AzCopy; it copies the blobs, but it causes the metadata keys to be changed to camel case. Per discussion with the MSFT AzCopy team, this is a known issue and a technical limitation. This is what the AzCopy team explained: "For context, blob metadata is implemented with HTTP headers. They are case insensitive but case preserving. There is a known issue with the Go language. The HTTP client that it provides does this case modification to the response headers before we can read the raw values, so the destination metadata keys have a different casing than the source. We’ve reached out to the Go Team in the past but weren’t successful in convincing them to change the behaviour. We don’t have a short term solution right now" So I propose changing the metadata key checks to be case insensitive. Maybe make the case-insensitive check configurable, defaulting to false for compatibility.
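The case-insensitive check proposed above can be sketched with a case-insensitive map view. This is an illustration of the technique, not the WASB driver's actual code; the `hdi_isfolder` key is used here as an example of a directory-placeholder metadata key:

```java
import java.util.Map;
import java.util.TreeMap;

class MetadataLookup {
    // Wrap blob metadata in a case-insensitive view so keys rewritten to
    // camel case by a copy tool (e.g. "hdi_isfolder" -> "Hdi_isfolder")
    // still match the driver's lookups.
    static Map<String, String> caseInsensitive(Map<String, String> metadata) {
        Map<String, String> view = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        view.putAll(metadata);
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new TreeMap<>();
        raw.put("Hdi_isfolder", "true"); // camel-cased by the copy tool
        Map<String, String> meta = caseInsensitive(raw);
        System.out.println(meta.containsKey("hdi_isfolder")); // true
    }
}
```

Gating this behind a config flag, as the issue suggests, would preserve today's exact-match behavior by default.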
[jira] [Commented] (HADOOP-14748) Wasb input streams to implement CanUnbuffer
[ https://issues.apache.org/jira/browse/HADOOP-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296779#comment-17296779 ] Anoop Sam John commented on HADOOP-14748: - [~ste...@apache.org] This was closed only because of lack of bandwidth, right? This is still a valid case for WASB, and specifically for HBase. I will possibly start working on this. > Wasb input streams to implement CanUnbuffer > --- > > Key: HADOOP-14748 > URL: https://issues.apache.org/jira/browse/HADOOP-14748 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure > Affects Versions: 2.9.0 > Reporter: Steve Loughran > Priority: Minor > > HBase relies on FileSystems implementing {{CanUnbuffer.unbuffer()}} to force > input streams to free up remote connections (HBASE-9393). This works for > HDFS, but not elsewhere. > WASB {{BlockBlobInputStream}} can implement this by closing the stream in > {{closeBlobInputStream}}, so it will be re-opened elsewhere.
[jira] [Commented] (HADOOP-17038) Support disabling buffered reads in ABFS positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224696#comment-17224696 ] Anoop Sam John commented on HADOOP-17038: - [~ste...@apache.org] Can you please help review the PR? Thanks. > Support disabling buffered reads in ABFS positional reads > - > > Key: HADOOP-17038 > URL: https://issues.apache.org/jira/browse/HADOOP-17038 > Project: Hadoop Common > Issue Type: Sub-task > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Major > Labels: HBase, abfsactive, pull-request-available > Attachments: HBase Perf Test Report.xlsx, screenshot-1.png > > Time Spent: 1h > Remaining Estimate: 0h > > Right now it will do a seek to the position, read, and then seek back to the > old position (as per the impl in the superclass). > In HBase kind of workloads we rely mostly on short preads (like 64 KB in size > by default). So it would be ideal to support a pure positional read API that does > not even keep the data in a buffer, but only reads the required data as asked > for by the caller (not reading ahead more data per the read-size config). > Allow an optional boolean config to be specified while opening a file for read, > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path)
[jira] [Updated] (HADOOP-17308) WASB : PageBlobOutputStream succeeding flush even when underlying flush to storage failed
[ https://issues.apache.org/jira/browse/HADOOP-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17308: Affects Version/s: 2.7.0 > WASB : PageBlobOutputStream succeeding flush even when underlying flush to > storage failed > -- > > Key: HADOOP-17308 > URL: https://issues.apache.org/jira/browse/HADOOP-17308 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Critical > Labels: HBASE > > In PageBlobOutputStream, write() APIs will fill the buffer and > hflush/hsync/flush call will flush the buffer to underlying storage. Here the > Azure calls are handled in another thread > {code} > private synchronized void flushIOBuffers() { > ... > lastQueuedTask = new WriteRequest(outBuffer.toByteArray()); > ioThreadPool.execute(lastQueuedTask); > > } > private class WriteRequest implements Runnable { > private final byte[] dataPayload; > private final CountDownLatch doneSignal = new CountDownLatch(1); > public WriteRequest(byte[] dataPayload) { > this.dataPayload = dataPayload; > } > public void waitTillDone() throws InterruptedException { > doneSignal.await(); > } > @Override > public void run() { > try { > LOG.debug("before runInternal()"); > runInternal(); > LOG.debug("after runInternal()"); > } finally { > doneSignal.countDown(); > } > } > private void runInternal() { > .. > writePayloadToServer(rawPayload); > ... > } > private void writePayloadToServer(byte[] rawPayload) { > .. 
> try { > blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length, > withMD5Checking(), PageBlobOutputStream.this.opContext); > } catch (IOException ex) { > lastError = ex; > } catch (StorageException ex) { > lastError = new IOException(ex); > } > if (lastError != null) { > LOG.debug("Caught error in > PageBlobOutputStream#writePayloadToServer()"); > } > } > } > {code} > The flushing thread waits for the other thread to complete the Runnable > WriteRequest. That is fine. But when some exception happens in > blob.uploadPages, we just set it on the lastError state variable. This > variable is checked in all subsequent ops like write, flush, etc. But > what about the current flush call? It silently succeeds! > In standard Azure-backed HBase clusters, the WAL is on a page blob. This > causes a serious issue in HBase: data loss! HBase thinks a WAL write > was hflushed and makes the row write successful, when in fact the row never > reached storage. > Checking the lastError variable at the end of the flush op will solve the > issue: we will then throw an IOE from this flush() itself.
[jira] [Updated] (HADOOP-17308) WASB : PageBlobOutputStream succeeding flush even when underlying flush to storage failed
[ https://issues.apache.org/jira/browse/HADOOP-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17308: Labels: HBASE (was: ) > WASB : PageBlobOutputStream succeeding flush even when underlying flush to > storage failed > -- > > Key: HADOOP-17308 > URL: https://issues.apache.org/jira/browse/HADOOP-17308 > Project: Hadoop Common > Issue Type: Bug >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Critical > Labels: HBASE > > In PageBlobOutputStream, write() APIs will fill the buffer and > hflush/hsync/flush call will flush the buffer to underlying storage. Here the > Azure calls are handled in another thread > {code} > private synchronized void flushIOBuffers() { > ... > lastQueuedTask = new WriteRequest(outBuffer.toByteArray()); > ioThreadPool.execute(lastQueuedTask); > > } > private class WriteRequest implements Runnable { > private final byte[] dataPayload; > private final CountDownLatch doneSignal = new CountDownLatch(1); > public WriteRequest(byte[] dataPayload) { > this.dataPayload = dataPayload; > } > public void waitTillDone() throws InterruptedException { > doneSignal.await(); > } > @Override > public void run() { > try { > LOG.debug("before runInternal()"); > runInternal(); > LOG.debug("after runInternal()"); > } finally { > doneSignal.countDown(); > } > } > private void runInternal() { > .. > writePayloadToServer(rawPayload); > ... > } > private void writePayloadToServer(byte[] rawPayload) { > .. > try { > blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length, > withMD5Checking(), PageBlobOutputStream.this.opContext); > } catch (IOException ex) { > lastError = ex; > } catch (StorageException ex) { > lastError = new IOException(ex); > } > if (lastError != null) { > LOG.debug("Caught error in > PageBlobOutputStream#writePayloadToServer()"); > } > } > } > {code} > The flushing thread will wait for the other thread to complete the Runnable > WriteRequest. Thats fine. 
But when some exception happens in > blob.uploadPages, we just set it on the lastError state variable. This > variable is checked in all subsequent ops like write, flush, etc. But > what about the current flush call? It silently succeeds! > In standard Azure-backed HBase clusters, the WAL is on a page blob. This > causes a serious issue in HBase: data loss! HBase thinks a WAL write > was hflushed and makes the row write successful, when in fact the row never > reached storage. > Checking the lastError variable at the end of the flush op will solve the > issue: we will then throw an IOE from this flush() itself.
[jira] [Created] (HADOOP-17308) WASB : PageBlobOutputStream succeeding flush even when underlying flush to storage failed
Anoop Sam John created HADOOP-17308: --- Summary: WASB : PageBlobOutputStream succeeding flush even when underlying flush to storage failed Key: HADOOP-17308 URL: https://issues.apache.org/jira/browse/HADOOP-17308 Project: Hadoop Common Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John In PageBlobOutputStream, write() APIs will fill the buffer and hflush/hsync/flush call will flush the buffer to underlying storage. Here the Azure calls are handled in another thread {code} private synchronized void flushIOBuffers() { ... lastQueuedTask = new WriteRequest(outBuffer.toByteArray()); ioThreadPool.execute(lastQueuedTask); } private class WriteRequest implements Runnable { private final byte[] dataPayload; private final CountDownLatch doneSignal = new CountDownLatch(1); public WriteRequest(byte[] dataPayload) { this.dataPayload = dataPayload; } public void waitTillDone() throws InterruptedException { doneSignal.await(); } @Override public void run() { try { LOG.debug("before runInternal()"); runInternal(); LOG.debug("after runInternal()"); } finally { doneSignal.countDown(); } } private void runInternal() { .. writePayloadToServer(rawPayload); ... } private void writePayloadToServer(byte[] rawPayload) { .. try { blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length, withMD5Checking(), PageBlobOutputStream.this.opContext); } catch (IOException ex) { lastError = ex; } catch (StorageException ex) { lastError = new IOException(ex); } if (lastError != null) { LOG.debug("Caught error in PageBlobOutputStream#writePayloadToServer()"); } } } {code} The flushing thread will wait for the other thread to complete the Runnable WriteRequest. Thats fine. But when some exception happened while blob.uploadPages, we just set that to lastError state variable. This variable is been checked for all subsequent ops like write, flush etc. But what about the current flush call? that is silently being succeeded.!! 
In standard Azure-backed HBase clusters, the WAL is on a page blob. This causes a serious issue in HBase: data loss! HBase thinks a WAL write was hflushed and makes the row write successful, when in fact the row never reached storage. Checking the lastError variable at the end of the flush op will solve the issue: we will then throw an IOE from this flush() itself.
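The fix described in HADOOP-17308 above - wait for the background writer, then check lastError and rethrow from flush() itself - can be sketched as follows. This is a simplified stand-in for PageBlobOutputStream, with the upload replaced by a stub that always fails; all names are illustrative:

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

class FlushErrorDemo {
    private volatile IOException lastError;

    /** Simulates the background WriteRequest whose upload fails. */
    private void writePayloadToServer() {
        lastError = new IOException("uploadPages failed"); // simulated failure
    }

    public void flush() throws IOException {
        CountDownLatch done = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            try {
                writePayloadToServer();
            } finally {
                done.countDown();
            }
        });
        writer.start();
        try {
            done.await(); // like waitTillDone(): flush waits for the writer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
        // The fix: surface the background failure from this flush() call
        // instead of only poisoning later write/flush operations.
        if (lastError != null) {
            throw lastError;
        }
    }

    public static void main(String[] args) {
        try {
            new FlushErrorDemo().flush();
            System.out.println("flush succeeded (bug)");
        } catch (IOException e) {
            System.out.println("flush threw: " + e.getMessage());
        }
    }
}
```

With the check in place, a caller such as an HBase WAL hflush sees the IOException instead of a false success.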
[jira] [Commented] (HADOOP-17038) Support disabling buffered reads in ABFS positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213859#comment-17213859 ] Anoop Sam John commented on HADOOP-17038: - As of now, disabling buffered pread alone is possible, using openFile() and setting the option on the builder. My plan is a follow-up jira so that overriding other read-related configs (like buffer size, readahead queue depth, etc.) while opening a file for read will also be possible. > Support disabling buffered reads in ABFS positional reads > - > > Key: HADOOP-17038 > URL: https://issues.apache.org/jira/browse/HADOOP-17038 > Project: Hadoop Common > Issue Type: Sub-task > Reporter: Anoop Sam John > Assignee: Anoop Sam John > Priority: Major > Labels: HBase, abfsactive, pull-request-available > Attachments: HBase Perf Test Report.xlsx, screenshot-1.png > > Time Spent: 1h > Remaining Estimate: 0h > > Right now it will do a seek to the position, read, and then seek back to the > old position (as per the impl in the superclass). > In HBase kind of workloads we rely mostly on short preads (like 64 KB in size > by default). So it would be ideal to support a pure positional read API that does > not even keep the data in a buffer, but only reads the required data as asked > for by the caller (not reading ahead more data per the read-size config). > Allow an optional boolean config to be specified while opening a file for read, > using which buffered pread can be disabled. > FutureDataInputStreamBuilder openFile(Path path)
[jira] [Commented] (HADOOP-17038) Support disabling buffered reads in ABFS positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212916#comment-17212916 ] Anoop Sam John commented on HADOOP-17038: - New PR based on the suggestion from [~ste...@apache.org], using the new openFile API to disable buffered reads for preads. The API is marked InterfaceStability.Unstable as of now; will this be changed? Thanks for the suggestions. Tests passed against an Azure ADLS Gen2 premium storage account in East US. I have HBase PE test results from a 3-node cluster and will post the charts shortly; we see 2x gains. I will also give cluster details and HBase file details.
[jira] [Updated] (HADOOP-17038) Support disabling buffered reads in ABFS positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17038: Description: Right now it will do a seek to the position , read and then seek back to the old position. (As per the impl in the super class) In HBase kind of workloads we rely mostly on short preads. (like 64 KB size by default). So would be ideal to support a pure pos read API which will not even keep the data in a buffer but will only read the required data as what is asked for by the caller. (Not reading ahead more data as per the read size config) Allow an optional boolean config to be specified while opening file for read using which buffered pread can be disabled. FutureDataInputStreamBuilder openFile(Path path) was: Right now it will do a seek to the position , read and then seek back to the old position. (As per the impl in the super class) In HBase kind of workloads we rely mostly on short preads. (like 64 KB size by default). So would be ideal to support a pure pos read API which will not even keep the data in a buffer but will only read the required data as what is asked for by the caller. (Not reading ahead more data as per the read size config)
[jira] [Updated] (HADOOP-17038) Support disabling buffered reads in ABFS positional reads
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17038: Summary: Support disabling buffered reads in ABFS positional reads (was: Support positional read in AbfsInputStream)
[jira] [Commented] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177430#comment-17177430 ] Anoop Sam John commented on HADOOP-17038: - bq. The option should be going into hbase-site, not hadoop-site, but I fear people doing the tuning may miss that Oh yes, you said that; that is exactly what is needed. Even when reducing the value of fs.azure.read.request.size, we make sure it is added to hbase-site.xml and not core-site.xml. For Hive-like workloads this read ahead (just calling it that) is really helpful, so I fully agree with your concern. In fact it was because of this concern, i.e. that reducing fs.azure.read.request.size to something like 512 KB for HBase gets hurts HBase long range scans and especially compactions by issuing many, many reads to the Azure FS, that I came up with this patch in the first place.
[jira] [Commented] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177422#comment-17177422 ] Anoop Sam John commented on HADOOP-17038: - Thanks [~ste...@apache.org] for those questions. I agree that it should not be enabled just for HBase, as others may suffer. Even within HBase, for long range scans (compaction needs them anyway) this is not good, and that is exactly why this config was added and defaults to false. The config can be turned on only for HBase via hbase-site.xml on the RegionServer side. HBase as such uses both types of APIs, i.e. Hadoop's pread API as well as normal read() after a seek. When we do a long range scan, we make sure to use the seek+read mode; HBase has the intelligence to do this switch. The per-file open option is interesting, Steve. In the old versions I have seen, it was not there; in fact, controlling this per opened InputStream is what I checked first. Let me read that and understand how to leverage it if possible. If that is not really what is needed, I will come back and address your comments. Appreciate it, Steve. Thanks.
[jira] [Updated] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17038: Attachment: screenshot-1.png
[jira] [Commented] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174117#comment-17174117 ] Anoop Sam John commented on HADOOP-17038: - !screenshot-1.png!
[jira] [Updated] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17038: Attachment: HBase Perf Test Report.xlsx
[jira] [Commented] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174116#comment-17174116 ] Anoop Sam John commented on HADOOP-17038: - Attached is a perf test done on an HBase cluster. This is a single-RegionServer cluster with 15 regions; total data size is 100 GB. A single client app with 10 or 100 threads does random row get ops (using HBase's native PerformanceEvaluation tool).
[jira] [Commented] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174098#comment-17174098 ] Anoop Sam John commented on HADOOP-17038: - As mentioned in the description, its main advantage is with HBase, where reads are mostly random short reads. HBase by default does only positional reads for gets/scans. We have a tracking mechanism in scan: if consecutive blocks are read by a scanner, we switch back to stream-based reads (the seek+read model). We also do stream reads (seek+read) when scanning during compaction. For these long reads (especially compaction, where only the compaction thread works on that dedicated FileInputStream), reading 4 MB per remote read is very useful, so it is not good to reduce fs.azure.read.request.size. That reduction would help the normal random row get case, but compactions would put more pressure on the FS, and if the same cluster is doing range scans those might suffer too. This is where real pos reads give the advantage. In this patch the pos read API is extended in AbfsInputStream and does not rely on the buffer at all, so the API is no longer synchronized, and it reads only the exact number of bytes being requested.
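The semantics being argued for, i.e. a positional read that hands back exactly the requested bytes without touching the stream position or a shared buffer, can be illustrated with a plain java.nio stand-in (not the ABFS client):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative stand-in for a "pure" pread: FileChannel.read(buf, position)
// reads only the requested bytes at the given offset and leaves the channel
// position untouched -- no seek, no shared buffer, so no synchronization needed.
public class PositionalReadDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread", ".bin");
        try {
            Files.write(tmp, "0123456789ABCDEF".getBytes("US-ASCII"));
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(4);    // read only what was asked for
                int n = ch.read(buf, 8);                    // pread at offset 8
                System.out.println(n);                      // 4
                System.out.println(new String(buf.array(), "US-ASCII")); // 89AB
                System.out.println(ch.position());          // 0: position unchanged
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because the channel position never moves, concurrent preads on the same handle do not interfere, which is the property the unsynchronized AbfsInputStream pread aims for.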
[jira] [Updated] (HADOOP-17038) Support positional read in AbfsInputStream
[ https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-17038: Labels: HBase abfsactive (was: abfsactive)
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141305#comment-17141305 ] Anoop Sam John commented on HADOOP-16998: - Thanks all for the reviews. Fixed the comments in the latest PR. > WASB : NativeAzureFsOutputStream#close() throwing > java.lang.IllegalArgumentException instead of IOE which causes HBase RS to > get aborted > > > Key: HADOOP-16998 > URL: https://issues.apache.org/jira/browse/HADOOP-16998 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Major > Attachments: HADOOP-16998.patch > > > During HFile create, at the end when close() is called on the OutputStream, > there is some pending data to get flushed. When this flush happens, an > Exception is thrown back from Storage. The Azure-storage SDK layer will throw > back IOE. (Even if it is a StorageException thrown from the Storage, the SDK > converts it to IOE.) But at HBase, we end up getting IllegalArgumentException > which causes the RS to get aborted. If we get back IOE, the flush will get > retried instead of aborting RS. > The reason is this: > NativeAzureFsOutputStream uses Azure-storage SDK's BlobOutputStreamInternal. > But the BlobOutputStreamInternal is wrapped within a SyncableDataOutputStream > which is a FilterOutputStream. During the close op, NativeAzureFsOutputStream > calls close on SyncableDataOutputStream and it uses the below method from > FilterOutputStream > {code} > public void close() throws IOException { > try (OutputStream ostream = out) { > flush(); > } > } > {code} > Here the flush call caused an IOE to be thrown to here.
The finally will > issue close call on ostream (Which is an instance of BlobOutputStreamInternal) > When BlobOutputStreamInternal#close() is been called, if there was any > exception already occured on that Stream, it will throw back the same > Exception > {code} > public synchronized void close() throws IOException { > try { > // if the user has already closed the stream, this will throw a > STREAM_CLOSED exception > // if an exception was thrown by any thread in the > threadExecutor, realize it now > this.checkStreamState(); > ... > } > private void checkStreamState() throws IOException { > if (this.lastError != null) { > throw this.lastError; > } > } > {code} > So here both try and finally block getting Exceptions and Java uses > Throwable#addSuppressed() > Within this method if both Exceptions are same objects, it throws back > IllegalArgumentException > {code} > public final synchronized void addSuppressed(Throwable exception) { > if (exception == this) > throw new > IllegalArgumentException(SELF_SUPPRESSION_MESSAGE, exception); > > } > {code}
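The failure mode in the description can be reproduced with plain Java. The FailingStream below is a hypothetical stand-in (not the WASB/SDK classes): it rethrows the same exception object from both flush() and close(), so try-with-resources ends up calling primary.addSuppressed(primary) and Java raises IllegalArgumentException instead of the original IOE.

```java
import java.io.IOException;
import java.io.OutputStream;

// Minimal reproduction sketch of the self-suppression problem described above.
public class SelfSuppressionDemo {
    static class FailingStream extends OutputStream {
        // Like BlobOutputStreamInternal's lastError: one exception instance,
        // rethrown from every subsequent call on the stream.
        private final IOException lastError = new IOException("upload failed");
        @Override public void write(int b) throws IOException { throw lastError; }
        @Override public void flush() throws IOException { throw lastError; }
        @Override public void close() throws IOException { throw lastError; }
    }

    static Throwable provoke() {
        try {
            // Mirrors FilterOutputStream.close(): flush inside try-with-resources.
            try (OutputStream ostream = new FailingStream()) {
                ostream.flush();   // primary exception: lastError
            }                      // implicit close() throws the SAME object,
                                   // so addSuppressed(self) throws IllegalArgumentException
            return null;
        } catch (Throwable t) {
            return t;              // IllegalArgumentException, not IOException
        }
    }

    public static void main(String[] args) {
        System.out.println(provoke().getClass().getName());
        // prints java.lang.IllegalArgumentException
    }
}
```

The caller (here, HBase) therefore sees IllegalArgumentException rather than the retryable IOException, which is exactly why the RS aborts.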
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134913#comment-17134913 ] Anoop Sam John commented on HADOOP-16998: - PR raised: https://github.com/apache/hadoop/pull/2073 Please help with the review. Ping [~ste...@apache.org]
[jira] [Created] (HADOOP-17038) Support positional read in AbfsInputStream
Anoop Sam John created HADOOP-17038: --- Summary: Support positional read in AbfsInputStream Key: HADOOP-17038 URL: https://issues.apache.org/jira/browse/HADOOP-17038 Project: Hadoop Common Issue Type: Sub-task Reporter: Anoop Sam John Assignee: Anoop Sam John
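The default behaviour the issue description complains about, i.e. seek to the position, read, then seek back, as in the superclass implementation, can be sketched with a self-contained in-memory stand-in (not AbfsInputStream):

```java
// Self-contained sketch (in-memory stand-in, not AbfsInputStream) of the
// superclass-style positional read: seek to the position, read, seek back.
// Being synchronized, concurrent preads serialize on the stream, which is
// one of the costs a pure pread API avoids.
class DemoSeekableStream {
    private final byte[] data;
    private long pos;

    DemoSeekableStream(byte[] data) { this.data = data; }

    void seek(long p) { pos = p; }
    long getPos() { return pos; }

    // Sequential read from the current position.
    int read(byte[] buf, int off, int len) {
        int n = Math.min(len, data.length - (int) pos);
        if (n <= 0) return -1;
        System.arraycopy(data, (int) pos, buf, off, n);
        pos += n;
        return n;
    }

    // The seek + read + seek-back pattern from the issue description.
    synchronized int read(long position, byte[] buf, int off, int len) {
        long oldPos = getPos();
        try {
            seek(position);
            return read(buf, off, len);
        } finally {
            seek(oldPos);   // restore, so sequential readers are unaffected
        }
    }
}
```

The proposed pure pread instead reads the requested range directly, with no seek, no restore, and no need for the synchronized block.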
[jira] [Comment Edited] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087871#comment-17087871 ] Anoop Sam John edited comment on HADOOP-16998 at 4/20/20, 5:30 PM: --- Thanks Steve. The version on which this was observed was 2.7.3, but I believe it should be present in all versions, even master. HADOOP-16785 handles cases where writes are called after close(). Here it is different: when close() is called there is still data pending flush. That write fails with an IOE from the Azure Storage SDK, and then the finally block of close() tries to close the Azure Storage SDK level OutputStream, which throws back the same IOE. This is the stack trace of the exception we see at the HBase level.
{code}
Caused by: java.lang.IllegalArgumentException: ...
 at java.lang.Throwable.addSuppressed(Throwable.java:1072)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
 at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:1055)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
 at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:248)
 at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.finishClose(HFileWriterV3.java:133)
 at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:368)
 at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1080)
 at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:67)
 at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:80)
 at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:960)
 at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2411)
 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2511)
 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
 at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
 at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: ...
 at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
 at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:462)
 at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:47)
 at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:406)
 at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:403)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: ..
 at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
 at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
 at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:185)
 at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1097)
 at com.microsoft.azure.
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087873#comment-17087873 ] Anoop Sam John commented on HADOOP-16998: - bq. Patches up on github as PRs for review
A bit busy with some other stuff. Will surely do after that. Thanks.
> WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
>
> Key: HADOOP-16998
> URL: https://issues.apache.org/jira/browse/HADOOP-16998
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Priority: Major
> Attachments: HADOOP-16998.patch
>
> During HFile creation, when close() is called on the OutputStream at the end, there is still some pending data to be flushed. When this flush happens, an exception is thrown back from Storage. The Azure-storage SDK layer throws back an IOE. (Even if the Storage throws a StorageException, the SDK converts it to an IOE.) But at the HBase level we end up getting an IllegalArgumentException, which causes the RS to get aborted. If we got back an IOE instead, the flush would be retried rather than aborting the RS.
> The reason is this: NativeAzureFsOutputStream uses the Azure-storage SDK's BlobOutputStreamInternal, but BlobOutputStreamInternal is wrapped within a SyncableDataOutputStream, which is a FilterOutputStream. During the close op, NativeAzureFsOutputStream calls close on SyncableDataOutputStream, which uses the below method from FilterOutputStream:
> {code}
> public void close() throws IOException {
>     try (OutputStream ostream = out) {
>         flush();
>     }
> }
> {code}
> Here the flush call causes an IOE to be thrown. The implicit finally of the try-with-resources then calls close on ostream (which is an instance of BlobOutputStreamInternal). When BlobOutputStreamInternal#close() is called, if an exception has already occurred on that stream, it throws back the very same exception object:
> {code}
> public synchronized void close() throws IOException {
>     try {
>         // if the user has already closed the stream, this will throw a STREAM_CLOSED exception
>         // if an exception was thrown by any thread in the threadExecutor, realize it now
>         this.checkStreamState();
>         ...
> }
>
> private void checkStreamState() throws IOException {
>     if (this.lastError != null) {
>         throw this.lastError;
>     }
> }
> {code}
> So both the try and the (implicit) finally block throw exceptions, and Java calls Throwable#addSuppressed(). Within this method, if both exceptions are the same object, it throws back an IllegalArgumentException:
> {code}
> public final synchronized void addSuppressed(Throwable exception) {
>     if (exception == this)
>         throw new IllegalArgumentException(SELF_SUPPRESSION_MESSAGE, exception);
>     ...
> }
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
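The failure described above can be reproduced outside WASB with a minimal sketch. FailingStream below is a hypothetical stand-in for BlobOutputStreamInternal (not a real SDK class): it remembers the first error and rethrows the very same exception object from both flush() and close(), so a FilterOutputStream-style try-with-resources close() ends up calling addSuppressed with the exception itself:

```java
import java.io.IOException;
import java.io.OutputStream;

// Minimal sketch of the self-suppression failure. FailingStream is a
// hypothetical stand-in for BlobOutputStreamInternal: it throws the same
// IOException object from both flush() and close(). The try-with-resources
// in reproduce() mirrors FilterOutputStream#close(), so Java ends up calling
// primary.addSuppressed(primary) and raises IllegalArgumentException
// instead of the original IOException.
public class SelfSuppressionDemo {

    static class FailingStream extends OutputStream {
        private IOException lastError;

        @Override
        public void write(int b) {
            // no-op; the failure is injected in flush()/close()
        }

        @Override
        public void flush() throws IOException {
            if (lastError == null) {
                lastError = new IOException("upload failed");
            }
            throw lastError;
        }

        @Override
        public void close() throws IOException {
            // mirrors checkStreamState(): rethrow the identical object
            if (lastError != null) {
                throw lastError;
            }
        }
    }

    // Mirrors FilterOutputStream#close(): flush() inside try-with-resources,
    // close() invoked on exit by the compiler-generated cleanup code.
    static String reproduce() {
        try {
            try (OutputStream ostream = new FailingStream()) {
                ostream.flush();
            }
            return "no exception";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Prints IllegalArgumentException, not IOException
        System.out.println(reproduce());
    }
}
```

This is exactly why HBase sees an IllegalArgumentException: the original IOException is lost inside the compiler-generated addSuppressed call.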
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087871#comment-17087871 ] Anoop Sam John commented on HADOOP-16998: - Thanks Steve. The version on which this was observed was 2.7.3, but I believe this should be present in all versions, even in master. HADOOP-16785 handles cases where writes are called after close(). Here it is different: when close() is called, there is still data pending for flush. That write fails with an IOE from the Azure Storage SDK, and then the finally block of close() tries to close the Azure Storage SDK level OutputStream, which throws back the same IOE. This is the stack trace of the exception we see at the HBase level:
{code}
Caused by: java.lang.IllegalArgumentException: ...
    at java.lang.Throwable.addSuppressed(Throwable.java:1072)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:1055)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:248)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.finishClose(HFileWriterV3.java:133)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:368)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1080)
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:67)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:80)
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:960)
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2411)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2511)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2256)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2218)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2110)
    at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2036)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: ...
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:462)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:47)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:406)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:403)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: ..
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:185)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1097)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(Clo
{code}
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087375#comment-17087375 ] Anoop Sam John commented on HADOOP-16998: - Thanks for having a look. Sure, I will work on it; just attached for reference.
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087354#comment-17087354 ] Anoop Sam John commented on HADOOP-16998: - Just attached the patch. I need to work on the PR, but some time later.
[jira] [Updated] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HADOOP-16998: Attachment: HADOOP-16998.patch
[jira] [Commented] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
[ https://issues.apache.org/jira/browse/HADOOP-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087352#comment-17087352 ] Anoop Sam John commented on HADOOP-16998: - Yes, same way. It seems that in some versions of the JDK, FilterOutputStream was changed to handle this possible issue in close(). But we might have to fix this at the WASB layer itself as long as we support JDK 1.8+, because some JDK 1.8 versions can still hit this issue.
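One possible shape of such a WASB-layer fix is sketched below. This is a hedged illustration, not necessarily the committed HADOOP-16998.patch: the idea is to stop relying on FilterOutputStream's try-with-resources close() and to guard addSuppressed against the identical exception object, so that an IOException (which HBase can retry) always propagates:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a wrapper-level close() that avoids the
// self-suppression: track the flush failure explicitly, still close the
// inner stream, and only addSuppressed the close() failure when it is a
// DIFFERENT object from the flush failure. The net result is that the
// caller sees the original IOException rather than IllegalArgumentException.
public class SafeCloseWrapper extends OutputStream {
    private final OutputStream out;

    public SafeCloseWrapper(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        IOException primary = null;
        try {
            out.flush();
        } catch (IOException e) {
            primary = e;
        }
        try {
            out.close();
        } catch (IOException e) {
            if (primary == null) {
                primary = e;
            } else if (e != primary) {
                // only suppress a distinct exception; the identical object
                // would make addSuppressed throw IllegalArgumentException
                primary.addSuppressed(e);
            }
        }
        if (primary != null) {
            throw primary; // an IOException, so the caller can retry the flush
        }
    }
}
```

With a stream that rethrows the same IOException from flush() and close() (as BlobOutputStreamInternal does per the description above), closing this wrapper surfaces the IOException instead of an IllegalArgumentException.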
[jira] [Created] (HADOOP-16998) WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted
Anoop Sam John created HADOOP-16998: --- Summary: WASB : NativeAzureFsOutputStream#close() throwing java.lang.IllegalArgumentException instead of IOE which causes HBase RS to get aborted Key: HADOOP-16998 URL: https://issues.apache.org/jira/browse/HADOOP-16998 Project: Hadoop Common Issue Type: Bug Reporter: Anoop Sam John
[jira] [Commented] (HADOOP-15643) Review/implement ABFS support for the extra fs ops which some apps (HBase) expects
[ https://issues.apache.org/jira/browse/HADOOP-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029085#comment-17029085 ] Anoop Sam John commented on HADOOP-15643: - HBase now has ByteBufferReadable usage as well.
> Review/implement ABFS support for the extra fs ops which some apps (HBase) expect
>
> Key: HADOOP-15643
> URL: https://issues.apache.org/jira/browse/HADOOP-15643
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: HADOOP-15407
> Reporter: Steve Loughran
> Priority: Major
>
> One trouble spot with storage connectors is apps which expect rarer APIs, e.g. Beam and ByteBufferReadable (BEAM-2790), HBase and CanUnbuffer (HADOOP-14748).
> Review ABFS support for these, decide which to implement, and if not, make sure that the callers can handle that.