RE: [VOTE] Release Apache Hadoop 3.2.1 - RC0
I built release-3.2.1-RC0 and verified that the hadoop-azure tests are passing, so WASB and ABFS look great.

+1

-----Original Message-----
From: runlin zhang
Sent: Thursday, September 12, 2019 1:06 AM
To: Rohith Sharma K S
Cc: Hdfs-dev; yarn-dev; mapreduce-dev; Hadoop Common
Subject: Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0

+1

> On September 11, 2019, at 3:26 PM, Rohith Sharma K S wrote:
>
> Hi folks,
>
> I have put together a release candidate (RC0) for Apache Hadoop 3.2.1.
>
> The RC is available at:
> http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/
>
> The RC tag in git is release-3.2.1-RC0:
> https://github.com/apache/hadoop/tree/release-3.2.1-RC0
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1226/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> This vote will run for 7 days (5 weekdays), ending on 18th Sept at 11:59 pm PST.
>
> I have done testing with a pseudo cluster and distributed shell job.
> My +1 to start.
>
> Thanks & Regards
> Rohith Sharma K S

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-16405) Upgrade Wildfly Openssl version to 1.0.7.Final
[ https://issues.apache.org/jira/browse/HADOOP-16405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt resolved HADOOP-16405.
Resolution: Fixed
Fix Version/s: 3.3.0

Duplicate of HADOOP-16460.

> Upgrade Wildfly Openssl version to 1.0.7.Final
>
> Key: HADOOP-16405
> URL: https://issues.apache.org/jira/browse/HADOOP-16405
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: build, fs/azure
> Affects Versions: 3.2.0
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Priority: Major
> Fix For: 3.3.0
>
> Upgrade Wildfly Openssl version to 1.0.7.Final. This version has SNI support,
> which is essential for firewall-enabled clusters, along with many
> stability-related fixes.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HADOOP-16460) ABFS: fix for Server Name Indication (SNI)
Thomas Marquardt created HADOOP-16460:

Summary: ABFS: fix for Server Name Indication (SNI)
Key: HADOOP-16460
URL: https://issues.apache.org/jira/browse/HADOOP-16460
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Affects Versions: 3.1.2
Reporter: Thomas Marquardt
Assignee: Vishwajeet Dusane

We need to update wildfly-openssl to 1.0.7.Final in ./hadoop-project/pom.xml.

ABFS depends on wildfly-openssl for secure sockets because of its performance improvements. The current wildfly-openssl does not support Server Name Indication (SNI). A fix was made in https://github.com/wildfly/wildfly-openssl/issues/59 and there is an official release of wildfly-openssl with the fix (https://github.com/wildfly/wildfly-openssl/releases/tag/1.0.7.Final). The fix has been validated.
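For readers less familiar with SNI: the client names the host it wants in the TLS handshake, so that a firewall or server can route on that name. The following is a minimal sketch using the standard JSSE API, not wildfly-openssl's internals. The class name and the host name are placeholders chosen for illustration.

```java
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;

class SniParametersSketch {
    /** Builds TLS parameters that carry the given host name in the SNI extension. */
    static SSLParameters withSni(String hostName) {
        SSLParameters params = new SSLParameters();
        // SNIHostName validates its argument and rejects empty or malformed names.
        List<SNIServerName> names = Collections.singletonList(new SNIHostName(hostName));
        params.setServerNames(names);
        return params;
    }

    public static void main(String[] args) {
        // "account.example.net" is a placeholder, not a real storage endpoint.
        SSLParameters params = withSni("account.example.net");
        SNIHostName first = (SNIHostName) params.getServerNames().get(0);
        System.out.println(first.getAsciiName());
    }
}
```

A client would typically pass such parameters to an SSLSocket or SSLEngine before the handshake; how wildfly-openssl wires this up internally is not covered by the sketch.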
[jira] [Created] (HADOOP-15872) ABFS: getFileStatus should only require execute permission on the parent folders
Thomas Marquardt created HADOOP-15872:

Summary: ABFS: getFileStatus should only require execute permission on the parent folders
Key: HADOOP-15872
URL: https://issues.apache.org/jira/browse/HADOOP-15872
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Affects Versions: 3.2.0
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

The ABFS implementation of getFileStatus currently requires read permission. According to the HDFS permissions guide, it should only require execute permission on the parent folders (traversal access).
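The traversal rule can be illustrated with a small in-memory model. This is only a sketch: the permission table, the class, and its method names are hypothetical, and paths are assumed to be absolute and normalized. It checks execute on every ancestor and ignores the bits on the target itself.

```java
import java.util.HashMap;
import java.util.Map;

class TraversalCheckSketch {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    // Hypothetical permission table: path -> rwx bits that apply to the caller.
    private final Map<String, Integer> bits = new HashMap<>();

    void setBits(String path, int rwx) { bits.put(path, rwx); }

    /** True if the caller has execute (traversal) on every ancestor directory of path. */
    boolean canStat(String path) {
        String parent = parentOf(path);
        while (parent != null) {
            int rwx = bits.getOrDefault(parent, 0);
            if ((rwx & EXECUTE) == 0) return false;
            parent = parentOf(parent);
        }
        return true; // the permission bits of the target itself are never consulted
    }

    private static String parentOf(String path) {
        if (path.equals("/")) return null;
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }
}
```

With execute set on "/" and "/data", canStat("/data/file") succeeds even when the file grants no read access, which is the behavior the issue asks for.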
[jira] [Reopened] (HADOOP-15839) Review + update cloud store sensitive keys in hadoop.security.sensitive-config-keys
[ https://issues.apache.org/jira/browse/HADOOP-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt reopened HADOOP-15839:

Looks like we missed things like "fs.azure.account.oauth2.client.secret". See my earlier comment in this JIRA.

> Review + update cloud store sensitive keys in hadoop.security.sensitive-config-keys
>
> Key: HADOOP-15839
> URL: https://issues.apache.org/jira/browse/HADOOP-15839
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: conf
> Affects Versions: 3.2.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Fix For: 3.2.0, 3.1.2
> Attachments: HADOOP-15839-001.patch
>
> Make sure that {{hadoop.security.sensitive-config-keys}} is up to date with
> all cloud store options, including
> h3. s3a:
> * s3a per-bucket secrets
> * s3a session tokens
> h3. abfs
> * {{fs.azure.account.oauth2.client.secret}}
> h3. adls
> fs.adl.oauth2.credential
> fs.adl.oauth2.refresh.token
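To show why a missing pattern leaks a secret, here is a rough sketch of pattern-based redaction. It assumes the property holds a comma-separated list of regular expressions matched against key names; the class and the patterns used in the usage note are illustrative, not Hadoop's actual defaults or implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SensitiveKeySketch {
    private final List<Pattern> patterns = new ArrayList<>();

    /** Takes a comma-separated list of regular expressions (an assumed format). */
    SensitiveKeySketch(String commaSeparatedRegexes) {
        for (String regex : commaSeparatedRegexes.split(",")) {
            String trimmed = regex.trim();
            if (!trimmed.isEmpty()) patterns.add(Pattern.compile(trimmed));
        }
    }

    /** Returns a placeholder instead of the value when the key matches any pattern. */
    String redact(String key, String value) {
        for (Pattern p : patterns) {
            if (p.matcher(key).find()) return "<redacted>";
        }
        return value; // no pattern matched, so the value is printed as-is
    }
}
```

With a pattern list such as "password$" alone, a key like fs.azure.account.oauth2.client.secret falls through and its value is shown; adding "secret$" covers it.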
[jira] [Created] (HADOOP-15723) ABFS: Ranger Support
Thomas Marquardt created HADOOP-15723:

Summary: ABFS: Ranger Support
Key: HADOOP-15723
URL: https://issues.apache.org/jira/browse/HADOOP-15723
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

Add support for Ranger
[jira] [Reopened] (HADOOP-15703) ABFS - Implement client-side throttling
[ https://issues.apache.org/jira/browse/HADOOP-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt reopened HADOOP-15703:
Assignee: Thomas Marquardt (was: Sneha Varma)

I'll provide a patch to fix the Yetus issues. I could not get Yetus to run previously, so let's see if it will run on the patch to fix this.

> ABFS - Implement client-side throttling
>
> Key: HADOOP-15703
> URL: https://issues.apache.org/jira/browse/HADOOP-15703
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Sneha Varma
> Assignee: Thomas Marquardt
> Priority: Major
> Attachments: HADOOP-15703-HADOOP-15407-001.patch, HADOOP-15703-HADOOP-15407-002.patch
>
> Big data workloads frequently exceed the AzureBlobFS max ingress and egress
> limits
> (https://docs.microsoft.com/en-us/azure/storage/common/storage-scalability-targets).
> For example, the max ingress limit for a GRS account in the United States is
> currently 10 Gbps. When the limit is exceeded, the AzureBlobFS service fails
> a percentage of incoming requests, and this causes the client to initiate the
> retry policy. The retry policy delays requests by sleeping, but the sleep
> duration is independent of the client throughput and account limit. This
> results in low throughput, due to the high number of failed requests and
> thrashing caused by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed
> requests and maximizes throughput.
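The idea of tying the delay to observed failures, rather than to a fixed retry sleep, might be sketched as below. This is not the algorithm from the attached patches; the class, the constants, and the halving rule are all assumptions made for illustration.

```java
class ThrottleSketch {
    private static final long MAX_SLEEP_MILLIS = 5_000; // arbitrary cap for the sketch
    private long sleepMillis = 0;

    /** Feeds the outcome of one measurement window into the throttle. */
    void onWindow(int succeeded, int throttled) {
        int total = succeeded + throttled;
        if (total == 0) return; // nothing observed, keep the current delay
        if (throttled > 0) {
            // Back off in proportion to the fraction of requests the service rejected.
            double failureRatio = (double) throttled / total;
            sleepMillis = Math.min(MAX_SLEEP_MILLIS, sleepMillis + Math.round(100 * failureRatio) + 1);
        } else {
            // No rejections in this window: halve the delay so throughput can recover.
            sleepMillis = sleepMillis / 2;
        }
    }

    /** Delay the caller should apply before issuing the next request. */
    long currentSleepMillis() { return sleepMillis; }
}
```

The point of the design is that the delay grows only while the service is rejecting requests and shrinks again once it stops, so the client settles near the account limit instead of oscillating.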
[jira] [Reopened] (HADOOP-15547) WASB: improve listStatus performance
[ https://issues.apache.org/jira/browse/HADOOP-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt reopened HADOOP-15547:

Reactivating for branch-2 backport.

> WASB: improve listStatus performance
>
> Key: HADOOP-15547
> URL: https://issues.apache.org/jira/browse/HADOOP-15547
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Affects Versions: 2.9.1, 3.0.2
> Reporter: Thomas Marquardt
> Assignee: Thomas Marquardt
> Priority: Major
> Fix For: 3.1.1
> Attachments: HADOOP-15547-004.patch, HADOOP-15547-004.patch,
> HADOOP-15547.001.patch, HADOOP-15547.002.patch, HADOOP-15547.003.patch
>
> The WASB implementation of Filesystem.listStatus is very slow due to O(n!)
> algorithm to remove duplicates and uses too much memory due to the extra
> conversion from BlobListItem to FileMetadata to FileStatus. It takes over 30
> minutes to list 700,000 files.
[jira] [Created] (HADOOP-15704) ABFS: Consider passing FS URI to CustomDelegationTokenManager
Thomas Marquardt created HADOOP-15704:

Summary: ABFS: Consider passing FS URI to CustomDelegationTokenManager
Key: HADOOP-15704
URL: https://issues.apache.org/jira/browse/HADOOP-15704
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

Refer to Steve's comments in HADOOP-15692. Passing the FS or FS URI to the CustomDelegationTokenManager would allow it to provide per-filesystem tokens. We currently have implementations of CustomDelegationTokenManager, and need to do a little leg work here, but it may be possible to update before GA.
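What "per-filesystem tokens" could look like once a manager knows the filesystem URI is sketched below. The class and its methods are hypothetical and do not reflect the CustomDelegationTokenManager interface; tokens are plain strings and the URIs are placeholders.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class PerFilesystemTokenSketch {
    // Hypothetical token store keyed by filesystem identity (scheme plus authority).
    private final Map<String, String> tokens = new HashMap<>();

    private static String keyOf(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    void register(URI fsUri, String token) { tokens.put(keyOf(fsUri), token); }

    /** Returns the token of the filesystem that owns the given URI, or null if none is known. */
    String tokenFor(URI uri) { return tokens.get(keyOf(uri)); }
}
```

Without the URI, a manager can only hand out one token for every filesystem instance in the process; keying on scheme and authority is one possible way to tell them apart.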
[jira] [Created] (HADOOP-15692) ABFS: extensible support for custom oauth
Thomas Marquardt created HADOOP-15692:

Summary: ABFS: extensible support for custom oauth
Key: HADOOP-15692
URL: https://issues.apache.org/jira/browse/HADOOP-15692
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

ABFS supports oauth in various forms and needs to export interfaces for customization of FileSystem.getDelegationToken and getAccessToken.
[jira] [Created] (HADOOP-15682) ABFS: Add support for StreamCapabilities
Thomas Marquardt created HADOOP-15682:

Summary: ABFS: Add support for StreamCapabilities
Key: HADOOP-15682
URL: https://issues.apache.org/jira/browse/HADOOP-15682
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.
Reporter: Thomas Marquardt

Add support for the new StreamCapabilities interface. This work is similar to what was done for WASB [HADOOP-15677|https://jira.apache.org/jira/browse/HADOOP-15677].
[jira] [Created] (HADOOP-15677) WASB: Add support for StreamCapabilities
Thomas Marquardt created HADOOP-15677:

Summary: WASB: Add support for StreamCapabilities
Key: HADOOP-15677
URL: https://issues.apache.org/jira/browse/HADOOP-15677
Project: Hadoop Common
Issue Type: New Feature
Components: fs/azure
Affects Versions: 3.0.3, 3.1.0
Reporter: Thomas Marquardt

StreamCapabilities is a new interface in branch-3, and was partially added to WASB. Let's complete the implementation and add test coverage for block blobs, block blobs with compaction, and page blobs.
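A capability probe lets a caller ask a stream what it really supports before relying on it. The sketch below uses a local stand-in interface so that it compiles on its own; it mirrors the general shape of such a probe but is not the Hadoop interface, and the capability names "hflush" and "hsync" are assumed here.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Locale;

// Local stand-in for a capability probe; not the Hadoop StreamCapabilities interface itself.
interface CapabilityProbe {
    boolean hasCapability(String capability);
}

class FlushableStreamSketch extends OutputStream implements CapabilityProbe {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final boolean supportsFlush;

    FlushableStreamSketch(boolean supportsFlush) { this.supportsFlush = supportsFlush; }

    @Override public void write(int b) { sink.write(b); }

    @Override public boolean hasCapability(String capability) {
        // Advertise durability-related capabilities only when flush is really implemented.
        switch (capability.toLowerCase(Locale.ROOT)) {
            case "hflush":
            case "hsync":
                return supportsFlush;
            default:
                return false;
        }
    }
}
```

Test coverage for the different blob types would then amount to constructing each stream variant and asserting what the probe reports.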
[jira] [Created] (HADOOP-15669) ABFS: Improve HTTPS Performance
Thomas Marquardt created HADOOP-15669:

Summary: ABFS: Improve HTTPS Performance
Key: HADOOP-15669
URL: https://issues.apache.org/jira/browse/HADOOP-15669
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

We see approximately 50% worse throughput for ABFS over HTTPS vs HTTP. Let's perform a detailed measurement and see what can be done to improve throughput.
[jira] [Created] (HADOOP-15664) ABFS: Reduce test run time via parallelization and grouping
Thomas Marquardt created HADOOP-15664:

Summary: ABFS: Reduce test run time via parallelization and grouping
Key: HADOOP-15664
URL: https://issues.apache.org/jira/browse/HADOOP-15664
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

1) Let's reduce the total test runtime by improving parallelization of the tests.
2) Let's make it possible to select WASB tests, ABFS tests, or both so developers can run only the tests appropriate for the change they've made.
3) Update the testing-azure.md accordingly
[jira] [Created] (HADOOP-15663) ABFS: Simplify configuration
Thomas Marquardt created HADOOP-15663:

Summary: ABFS: Simplify configuration
Key: HADOOP-15663
URL: https://issues.apache.org/jira/browse/HADOOP-15663
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

Configuration for WASB and ABFS is too complex. The current approach is to use four files for test configuration.

Both WASB and ABFS have basic test configuration which is committed to the repo (azure-test.xml and azure-bfs-test.xml). Currently these contain the fs.AbstractFileSystem.[scheme].impl configuration, but otherwise are empty except for an include reference to a file containing the endpoint credentials.

Both WASB and ABFS have endpoint credential configuration files (azure-auth-keys.xml and azure-bfs-auth-keys.xml). These have been added to .gitignore to prevent them from accidentally being submitted in a patch, which would leak the developer's storage account credentials. These files contain account names, storage account keys, and service endpoints.

There is some overlap of the configuration for WASB and ABFS, where they use the same property name but use different values.

1) Let's reduce the number of test configuration files to one, if possible.
2) Let's simplify the account name, key, and endpoint configuration for WASB and ABFS if possible, but still support the legacy way of doing it, which is very error prone.
3) Let's improve error handling, so that typos or misconfiguration are not so difficult to troubleshoot.
[jira] [Created] (HADOOP-15662) ABFS: Better exception handling of DNS errors
Thomas Marquardt created HADOOP-15662:

Summary: ABFS: Better exception handling of DNS errors
Key: HADOOP-15662
URL: https://issues.apache.org/jira/browse/HADOOP-15662
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Thomas Marquardt

DNS errors are common during testing due to typos or misconfiguration. They can also occur in production, as some transient DNS issues occur from time to time.

1) Let's investigate if we can distinguish between the two and fail fast for the test issues, but continue to have retry logic for the transient DNS issues in production.
2) Let's improve the error handling of DNS failures, so the user has an actionable error message.
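One possible shape for the second point is sketched here: retry a bounded number of times in case the failure is transient, then fail with a message that points at the likely configuration mistake. The Resolver interface, the class name, and the message text are hypothetical, and backoff between attempts is left out to keep the example short.

```java
import java.io.IOException;
import java.net.UnknownHostException;

class DnsFailureSketch {
    /** Hypothetical hook so the lookup can be faked in tests. */
    interface Resolver { void resolve(String host) throws IOException; }

    /** Retries lookups up to maxAttempts, then fails with an actionable message. */
    static void resolveWithRetry(Resolver resolver, String host, int maxAttempts) throws IOException {
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                resolver.resolve(host);
                return; // lookup succeeded
            } catch (UnknownHostException e) {
                last = e; // possibly transient, so try again
            }
        }
        // Keep the original exception as the cause so no diagnostic detail is lost.
        throw new IOException("Cannot resolve host '" + host + "' after " + maxAttempts
            + " attempts; check the account name and endpoint in the configuration", last);
    }
}
```

Failing fast for test runs could then be as simple as passing a small maxAttempts, while production callers keep a larger retry budget.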
[jira] [Created] (HADOOP-15547) WASB: listStatus performance
Thomas Marquardt created HADOOP-15547:

Summary: WASB: listStatus performance
Key: HADOOP-15547
URL: https://issues.apache.org/jira/browse/HADOOP-15547
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Affects Versions: 3.0.2, 2.9.1
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

The WASB implementation of Filesystem.listStatus is very slow due to O(n!) algorithm to remove duplicates and uses too much memory due to the extra conversion from BlobListItem to FileMetadata to FileStatus. It takes over 30 minutes to list 700,000 files.
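The issue attributes the slowness to the duplicate-removal step. As a point of comparison, duplicates can be removed in a single pass with a hash-based set. The sketch below works on plain string keys and is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class ListingDedupSketch {
    /** Removes duplicate keys in one pass, keeping the first-seen order. */
    static List<String> dedupe(List<String> keys) {
        // LinkedHashSet gives expected constant-time membership checks and preserves insertion order.
        return new ArrayList<>(new LinkedHashSet<>(keys));
    }
}
```

For a listing of n entries this does roughly n hash lookups, instead of comparing every entry against every entry already kept.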
[jira] [Created] (HADOOP-15478) WASB: hflush() and hsync() regression
Thomas Marquardt created HADOOP-15478:

Summary: WASB: hflush() and hsync() regression
Key: HADOOP-15478
URL: https://issues.apache.org/jira/browse/HADOOP-15478
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Affects Versions: 3.0.2, 2.9.0
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

HADOOP-14520 introduced a regression in hflush() and hsync(). Previously, for the default case where users upload data as block blobs, these were no-ops. Unfortunately, HADOOP-14520 accidentally implemented hflush() and hsync() by default, so any data buffered in the stream is immediately uploaded to storage. This new behavior is undesirable, because block blobs have a limit of 50,000 blocks. Spark users are now seeing failures due to exceeding the block limit, since Spark frequently invokes hflush().
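The arithmetic behind the failure can be modeled with a toy stream that counts blocks. This is a simplified sketch, not the WASB output stream: a real stream would also upload when its buffer fills, and every name here is hypothetical apart from the 50,000 limit quoted in the issue.

```java
class BlockCountSketch {
    static final int MAX_BLOCKS = 50_000; // block blob limit quoted in the issue

    private final boolean flushUploads;
    private int buffered = 0; // bytes written but not yet uploaded
    private int blocks = 0;   // blocks uploaded so far

    BlockCountSketch(boolean flushUploads) { this.flushUploads = flushUploads; }

    void write(int bytes) { buffered += bytes; }

    /** With flushUploads=true every flush uploads a block; otherwise it is a no-op. */
    void hflush() {
        if (flushUploads && buffered > 0) uploadBlock();
    }

    void close() {
        if (buffered > 0) uploadBlock();
    }

    private void uploadBlock() {
        if (blocks == MAX_BLOCKS) throw new IllegalStateException("block limit exceeded");
        blocks++;
        buffered = 0;
    }

    int blockCount() { return blocks; }
}
```

A writer that calls hflush() after each of 60,000 small writes ends with one block in no-op mode, but hits the limit on the 50,001st flush when every flush uploads.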
Re: [DISCUSS] Branch Proposal: HADOOP 15407: ABFS
A feature branch seems reasonable to me too.

Note that the WASB connector will continue to exist, and live side-by-side with the new Azure Blob Filesystem (ABFS) connector. We will encourage users to move to the new ABFS connector, and all of our new feature and performance improvements will target the ABFS connector. ABFS will perform better at no additional cost, so I expect current users to migrate in time. The two connectors are compatible for mainline scenarios, but there are some uncommon features in WASB that we chose not to carry over in the initial implementation. So we hope ABFS will replace the usage of WASB, but the WASB connector itself will continue to exist. Maybe we can remove WASB some day in the future, if nobody is using it.

I can confirm that nobody ever gets seek() right. :)

Thanks,
Thomas

From: larry mccay <lmc...@apache.org>
Sent: Tuesday, May 15, 2018 8:44 AM
To: Steve Loughran
Cc: Hadoop Common
Subject: Re: [DISCUSS] Branch Proposal: HADOOP 15407: ABFS

This seems like a reasonable and effective use of a feature branch and branch committers to me.

On Tue, May 15, 2018 at 11:34 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> Hi
>
> Chris Douglas and I have a proposal for a short-lived feature branch
> for the Azure ABFS connector to go into the hadoop-azure package. This will
> connect to the new azure storage service, which will ultimately replace the
> one used by wasb. It's a big patch and, like all storage connectors, will
> inevitably take time to stabilize (i.e. nobody ever gets seek() right, even
> when we think we have).
>
> Thomas & Esfandiar will do the coding: they've already done the paperwork.
> Chris, myself & anyone else interested can be involved in the review and
> testing.
>
> Comments?
>
> -
>
> The initial HADOOP-15407 patch contains a new filesystem client for the
> forthcoming Azure ABFS, which is intended to replace Azure WASB as the
> Azure storage layer. The patch is large, as it contains the replacement
> client, tests, and generated code.
>
> We propose a feature branch, so the module can be broken into salient,
> reviewable chunks. Internal constraints prevented this feature from being
> developed in Apache, so we want to ensure that all the code is discussed,
> maintainable, and documented by the community before it merges.
>
> To effect this, we also propose adding two developers as branch
> committers:
> Thomas Marquardt tm...@microsoft.com
> Esfandiar Manii esma...@microsoft.com
>
> Beyond normal feature branch activity and merge criteria for FS modules,
> we want to add another merge criterion for ABFS. Some of the client APIs
> are not GA. It seems reasonable to require that this client works with
> public endpoints before it merges to trunk.
>
> To test the Blob FS driver, the Blob FS team (including Esfandiar Manii and
> Thomas Marquardt) in Azure Storage will need the MSDN subscription ID(s)
> for all reviewers who want to run the tests. The ABFS team will then
> whitelist the subscription ID(s) for the Blob FS Preview. At that time,
> future storage accounts created will have the Blob FS endpoint,
> .dfs.core.windows.net, which the Blob FS driver relies on.
>
> This is a temporary state during the (current) Private Preview and the
> early phases of Public Preview. In a few months, the whitelisting will not
> be required and anyone will be able to create a storage account with access
> to the Blob FS endpoint.
>
> Thomas and Esfandiar have been active in the Hadoop project working on the
> WASB connector (see https://issues.apache.org/jira/browse/HADOOP-14552).
> They understand the processes and requirements of the software. Working on
> the branch directly will let them bring this significant feature into the
> hadoop-azure module without disrupting existing users.
[jira] [Created] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
Thomas Marquardt created HADOOP-15446:

Summary: WASB: PageBlobInputStream.skip breaks HBASE replication
Key: HADOOP-15446
URL: https://issues.apache.org/jira/browse/HADOOP-15446
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Affects Versions: 3.0.2, 2.9.0
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

Page Blobs are primarily used by HBASE. HBASE replication, which apparently has not been used with WASB until recently, performs non-sequential reads on log files using PageBlobInputStream. There are bugs in this stream implementation which prevent skip and seek from working properly, and eventually the stream state becomes corrupt and unusable.

I believe this bug affects all releases of WASB/HADOOP. It appears to be a day-0 bug in PageBlobInputStream. There were similar bugs opened in the past (HADOOP-15042) but the issue was not properly fixed, and no test coverage was added.
[jira] [Created] (HADOOP-15156) backport HADOOP-15086 rename fix to branch-2
Thomas Marquardt created HADOOP-15156:

Summary: backport HADOOP-15086 rename fix to branch-2
Key: HADOOP-15156
URL: https://issues.apache.org/jira/browse/HADOOP-15156
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

backport HADOOP-15086 (rename fix) to branch-2
[jira] [Created] (HADOOP-14769) WASB: delete recursive should not fail if a file is deleted
Thomas Marquardt created HADOOP-14769:

Summary: WASB: delete recursive should not fail if a file is deleted
Key: HADOOP-14769
URL: https://issues.apache.org/jira/browse/HADOOP-14769
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

FileSystem.delete(Path path) and delete(Path path, boolean recursive) return false if the path does not exist. The WASB implementation of recursive delete currently fails if one of the entries is deleted by an external agent while a recursive delete is in progress. For example, if you try to delete all of the files in a directory, which can be a very long process, and one of the files contained within is deleted by an external agent, the recursive directory delete operation will fail if it tries to delete that file and discovers that it does not exist. This is not desirable. A recursive directory delete operation should succeed if the directory initially exists and when the operation completes, the directory and all of its entries do not exist.
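The desired semantics can be shown on a local directory tree with java.nio, used here as a stand-in for the blob store. This is a sketch of the contract described above, not the WASB fix; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class TolerantDeleteSketch {
    /** Deletes root recursively; entries removed concurrently by another agent are not errors. */
    static boolean deleteRecursive(Path root) throws IOException {
        if (!Files.exists(root)) return false; // path does not exist, nothing to delete
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.deleteIfExists(file); // already gone is acceptable
                return FileVisitResult.CONTINUE;
            }
            @Override public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
                if (exc instanceof NoSuchFileException) return FileVisitResult.CONTINUE;
                throw exc; // any other failure is still reported
            }
            @Override public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null && !(exc instanceof NoSuchFileException)) throw exc;
                Files.deleteIfExists(dir);
                return FileVisitResult.CONTINUE;
            }
        });
        return true;
    }
}
```

The key choice is to treat "not found" as success for individual entries once the operation has started, while still returning false when the root itself was absent at the outset.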
[jira] [Created] (HADOOP-14722) Azure: BlockBlobInputStream position incorrect after seek
Thomas Marquardt created HADOOP-14722:

Summary: Azure: BlockBlobInputStream position incorrect after seek
Key: HADOOP-14722
URL: https://issues.apache.org/jira/browse/HADOOP-14722
Project: Hadoop Common
Issue Type: Bug
Components: fs/azure
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt

The seek, skip, and getPos methods of BlockBlobInputStream do not correctly account for the stream's internal buffer. This results in invalid stream positions.
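To make the buffer accounting concrete, here is a small sketch of a buffered reader over a byte array that keeps getPos, seek, and skip consistent. It is an illustration of the invariant, not BlockBlobInputStream itself; the class and field names are hypothetical and the byte array stands in for the remote blob.

```java
class BufferedPositionSketch {
    private final byte[] source;  // stand-in for the remote blob
    private final byte[] buffer;
    private long bufferStart = 0; // blob offset of buffer[0]
    private int bufferPos = 0;    // next unread index within buffer
    private int bufferLen = 0;    // number of valid bytes in buffer

    BufferedPositionSketch(byte[] source, int bufferSize) {
        this.source = source;
        this.buffer = new byte[bufferSize];
    }

    /** Logical position: where the buffer starts plus how much of it was consumed. */
    long getPos() { return bufferStart + bufferPos; }

    void seek(long pos) {
        if (pos < 0 || pos > source.length) throw new IllegalArgumentException("bad position " + pos);
        if (pos >= bufferStart && pos <= bufferStart + bufferLen) {
            bufferPos = (int) (pos - bufferStart); // target is already buffered
        } else {
            bufferStart = pos; // discard the buffer; the next read refills from pos
            bufferPos = 0;
            bufferLen = 0;
        }
    }

    /** Skips forward by up to n bytes and returns how many were actually skipped. */
    long skip(long n) {
        long target = Math.max(getPos(), Math.min(source.length, getPos() + n));
        long skipped = target - getPos();
        seek(target);
        return skipped;
    }

    /** Returns the next byte as 0..255, or -1 at end of data. */
    int read() {
        if (bufferPos == bufferLen) {
            bufferStart = getPos();
            bufferLen = (int) Math.min(buffer.length, source.length - bufferStart);
            bufferPos = 0;
            if (bufferLen <= 0) { bufferLen = 0; return -1; }
            System.arraycopy(source, (int) bufferStart, buffer, 0, bufferLen);
        }
        return buffer[bufferPos++] & 0xFF;
    }
}
```

The invariant worth testing is that getPos() always equals the offset of the next byte read() would return, regardless of how much data happens to be buffered.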