RE: [VOTE] Release Apache Hadoop 3.2.1 - RC0

2019-09-13 Thread Thomas Marquardt
I built release-3.2.1-RC0 and verified that the hadoop-azure tests are passing, 
so WASB and ABFS look great.  +1

-----Original Message-----
From: runlin zhang
Sent: Thursday, September 12, 2019 1:06 AM
To: Rohith Sharma K S
Cc: Hdfs-dev; yarn-dev; mapreduce-dev; Hadoop Common
Subject: Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0

+1

> On September 11, 2019, at 3:26 PM, Rohith Sharma K S wrote:
> 
> Hi folks,
> 
> I have put together a release candidate (RC0) for Apache Hadoop 3.2.1.
> 
> The RC is available at:
> http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/
> 
> The RC tag in git is release-3.2.1-RC0:
> https://github.com/apache/hadoop/tree/release-3.2.1-RC0
> 
> 
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1226/
> 
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> 
> This vote will run for 7 days (5 weekdays), ending on 18th Sept at 
> 11:59 pm PST.
> 
> I have done testing with a pseudo cluster and distributed shell job. 
> My +1 to start.
> 
> Thanks & Regards
> Rohith Sharma K S


-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org





[jira] [Resolved] (HADOOP-16405) Upgrade Wildfly Openssl version to 1.0.7.Final

2019-07-30 Thread Thomas Marquardt (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Marquardt resolved HADOOP-16405.
---
   Resolution: Fixed
Fix Version/s: 3.3.0

Duplicate of HADOOP-16460.

> Upgrade Wildfly Openssl version to 1.0.7.Final
> --
>
> Key: HADOOP-16405
> URL: https://issues.apache.org/jira/browse/HADOOP-16405
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/azure
>Affects Versions: 3.2.0
>Reporter: Vishwajeet Dusane
>Assignee: Vishwajeet Dusane
>Priority: Major
> Fix For: 3.3.0
>
>
> Upgrade Wildfly Openssl version to 1.0.7.Final. This version has SNI support 
> which is essential for firewall enabled clusters along with many stability 
> related fixes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)




[jira] [Created] (HADOOP-16460) ABFS: fix for Server Name Indication (SNI)

2019-07-24 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-16460:
-

 Summary: ABFS: fix for Server Name Indication (SNI)
 Key: HADOOP-16460
 URL: https://issues.apache.org/jira/browse/HADOOP-16460
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.1.2
Reporter: Thomas Marquardt
Assignee: Vishwajeet Dusane


We need to update wildfly-openssl to 1.0.7.Final in ./hadoop-project/pom.xml.

ABFS depends on wildfly-openssl for secure sockets due to the performance 
improvements. The current wildfly-openssl does not support Server Name 
Indication (SNI). A fix was made in 
https://github.com/wildfly/wildfly-openssl/issues/59 and there is an official 
release of wildfly-openssl with the fix 
(https://github.com/wildfly/wildfly-openssl/releases/tag/1.0.7.Final).
The fix has been validated.






[jira] [Created] (HADOOP-15872) ABFS: getFileStatus should only require execute permission on the parent folders

2018-10-22 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15872:
-

 Summary: ABFS: getFileStatus should only require execute 
permission on the parent folders
 Key: HADOOP-15872
 URL: https://issues.apache.org/jira/browse/HADOOP-15872
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.2.0
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


The ABFS implementation of getFileStatus currently requires read permission.  
According to the HDFS permissions guide, it should only require execute 
permission on the parent folders (traversal access).
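
To make the traversal rule concrete, here is a minimal sketch in plain Java. 
It is not the ABFS or HDFS code: the class TraversalCheck, its in-memory map, 
and the method names are all hypothetical. It only illustrates the rule that 
a status lookup needs execute on every ancestor directory and nothing on the 
final path component.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model; not the ABFS or HDFS implementation. */
class TraversalCheck {
    // Maps a directory path to whether the caller holds execute (x) on it.
    private final Map<String, Boolean> canExecute = new HashMap<>();

    void setExecute(String dir, boolean allowed) {
        canExecute.put(dir, allowed);
    }

    /**
     * Returns true when the root and every intermediate directory of an
     * absolute path grant execute. The final component is not checked.
     */
    boolean mayGetFileStatus(String path) {
        if (!canExecute.getOrDefault("/", false)) {
            return false;
        }
        int idx = path.indexOf('/', 1);
        while (idx > 0) {
            String ancestor = path.substring(0, idx);
            if (!canExecute.getOrDefault(ancestor, false)) {
                return false;
            }
            idx = path.indexOf('/', idx + 1);
        }
        return true;
    }
}
```

With execute granted on "/", "/a" and "/a/b", a lookup of "/a/b/file" is 
allowed regardless of the permissions on the file itself; revoking execute on 
"/a" denies it.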






[jira] [Reopened] (HADOOP-15839) Review + update cloud store sensitive keys in hadoop.security.sensitive-config-keys

2018-10-10 Thread Thomas Marquardt (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Marquardt reopened HADOOP-15839:
---

Looks like we missed things like "fs.azure.account.oauth2.client.secret".  See 
my earlier comment in this JIRA.

> Review + update cloud store sensitive keys in 
> hadoop.security.sensitive-config-keys
> ---
>
> Key: HADOOP-15839
> URL: https://issues.apache.org/jira/browse/HADOOP-15839
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: conf
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: HADOOP-15839-001.patch
>
>
> Make sure that {{hadoop.security.sensitive-config-keys}} is up to date with 
> all cloud store options, including
> h3. s3a:
> * s3a per-bucket secrets
> * s3a session tokens
> h3. abfs
> * {{fs.azure.account.oauth2.client.secret}}
> h3. adls
> * {{fs.adl.oauth2.credential}}
> * {{fs.adl.oauth2.refresh.token}}
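
For readers unfamiliar with how a sensitive-key list is typically applied, 
the following is a simplified sketch, assuming the property holds a 
comma-separated list of regular expressions and that a matching key has its 
value replaced by a placeholder. KeyRedactor is a hypothetical stand-in 
written for this note, not Hadoop's own redaction class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; a simplified stand-in for config redaction. */
class KeyRedactor {
    private final List<Pattern> patterns = new ArrayList<>();

    /** sensitiveKeys is a comma-separated list of regular expressions. */
    KeyRedactor(String sensitiveKeys) {
        for (String regex : sensitiveKeys.split(",")) {
            String trimmed = regex.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
    }

    /** Returns a placeholder when the key matches any sensitive pattern. */
    String redact(String key, String value) {
        for (Pattern p : patterns) {
            if (p.matcher(key).find()) {
                return "<redacted>";
            }
        }
        return value;
    }
}
```

A key that is missing from the list, such as the ABFS client secret mentioned 
above, would pass through redact() unchanged and could end up in logs, which 
is why the list needs to cover every cloud store secret.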






[jira] [Created] (HADOOP-15723) ABFS: Ranger Support

2018-09-05 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15723:
-

 Summary: ABFS: Ranger Support
 Key: HADOOP-15723
 URL: https://issues.apache.org/jira/browse/HADOOP-15723
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


Add support for Ranger






[jira] [Reopened] (HADOOP-15703) ABFS - Implement client-side throttling

2018-09-05 Thread Thomas Marquardt (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Marquardt reopened HADOOP-15703:
---
  Assignee: Thomas Marquardt  (was: Sneha Varma)

I'll provide a patch to fix the Yetus issues.  I could not get Yetus to run 
previously, so let's see if it will run on the patch to fix this.

> ABFS - Implement client-side throttling 
> 
>
> Key: HADOOP-15703
> URL: https://issues.apache.org/jira/browse/HADOOP-15703
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sneha Varma
>    Assignee: Thomas Marquardt
>Priority: Major
> Attachments: HADOOP-15703-HADOOP-15407-001.patch, 
> HADOOP-15703-HADOOP-15407-002.patch
>
>
> Big data workloads frequently exceed the AzureBlobFS max ingress and egress 
> limits 
> (https://docs.microsoft.com/en-us/azure/storage/common/storage-scalability-targets).
>  For example, the max ingress limit for a GRS account in the United States is 
> currently 10 Gbps. When the limit is exceeded, the AzureBlobFS service fails 
> a percentage of incoming requests, and this causes the client to initiate the 
> retry policy. The retry policy delays requests by sleeping, but the sleep 
> duration is independent of the client throughput and account limit. This 
> results in low throughput, due to the high number of failed requests and 
> thrashing caused by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed 
> requests and maximizes throughput. 
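
As a rough illustration of the idea only, the sketch below adapts a 
per-request delay to the failures observed in each sampling period, instead 
of sleeping for a duration unrelated to throughput. SimpleThrottle, its 
constants, and the doubling/halving policy are assumptions made for this 
example; the attached patches may use a different algorithm.

```java
/** Hypothetical throttle; the patch's actual algorithm may differ. */
class SimpleThrottle {
    private static final long MAX_SLEEP_MS = 1000;
    private long sleepMs = 0;

    /**
     * Updates the per-request delay from one sampling period.
     * Any failed request lengthens the delay; a clean period shortens it.
     */
    long update(long succeeded, long failed) {
        if (failed > 0) {
            // Back off: at least 10 ms, otherwise double the current delay.
            sleepMs = Math.min(MAX_SLEEP_MS, Math.max(10, sleepMs * 2));
        } else if (succeeded > 0) {
            // Recover gradually so throughput climbs back toward the limit.
            sleepMs = sleepMs / 2;
        }
        return sleepMs;
    }

    long currentSleepMs() {
        return sleepMs;
    }
}
```

The caller would sleep for currentSleepMs() before each request, so the 
client slows down before the service starts rejecting a large share of 
requests.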






[jira] [Reopened] (HADOOP-15547) WASB: improve listStatus performance

2018-08-31 Thread Thomas Marquardt (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Marquardt reopened HADOOP-15547:
---

Reactivating for branch-2 backport.

> WASB: improve listStatus performance
> 
>
> Key: HADOOP-15547
> URL: https://issues.apache.org/jira/browse/HADOOP-15547
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 2.9.1, 3.0.2
>Reporter: Thomas Marquardt
>    Assignee: Thomas Marquardt
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: HADOOP-15547-004.patch, HADOOP-15547-004.patch, 
> HADOOP-15547.001.patch, HADOOP-15547.002.patch, HADOOP-15547.003.patch
>
>
> The WASB implementation of Filesystem.listStatus is very slow due to O(n!) 
> algorithm to remove duplicates and uses too much memory due to the extra 
> conversion from BlobListItem to FileMetadata to FileStatus.  It takes over 30 
> minutes to list 700,000 files.  






[jira] [Created] (HADOOP-15704) ABFS: Consider passing FS URI to CustomDelegationTokenManager

2018-08-28 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15704:
-

 Summary: ABFS: Consider passing FS URI to 
CustomDelegationTokenManager
 Key: HADOOP-15704
 URL: https://issues.apache.org/jira/browse/HADOOP-15704
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


Refer to Steve's comments in HADOOP-15692.  Passing the FS or FS URI to the 
CustomDelegationTokenManager would allow it to provide per-filesystem tokens.  
We currently have implementations of CustomDelegationTokenManager, and need to 
do a little leg work here, but it may be possible to update before GA.






[jira] [Created] (HADOOP-15692) ABFS: extensible support for custom oauth

2018-08-23 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15692:
-

 Summary: ABFS: extensible support for custom oauth
 Key: HADOOP-15692
 URL: https://issues.apache.org/jira/browse/HADOOP-15692
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


ABFS supports oauth in various forms and needs to export interfaces for 
customization of FileSystem.getDelegationToken and getAccessToken.






[jira] [Created] (HADOOP-15682) ABFS: Add support for StreamCapabilities

2018-08-16 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15682:
-

 Summary: ABFS: Add support for StreamCapabilities
 Key: HADOOP-15682
 URL: https://issues.apache.org/jira/browse/HADOOP-15682
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.
Reporter: Thomas Marquardt


Add support for the new StreamCapabilities interface.  This work is similar to 
what was done for WASB 
[HADOOP-15677|https://jira.apache.org/jira/browse/HADOOP-15677].






[jira] [Created] (HADOOP-15677) WASB: Add support for StreamCapabilities

2018-08-15 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15677:
-

 Summary: WASB: Add support for StreamCapabilities
 Key: HADOOP-15677
 URL: https://issues.apache.org/jira/browse/HADOOP-15677
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs/azure
Affects Versions: 3.0.3, 3.1.0
Reporter: Thomas Marquardt


StreamCapabilities is a new interface in branch-3, and was partially added to 
WASB.  Let's complete the implementation and add test coverage for block blobs, 
block blobs with compaction, and page blobs.






[jira] [Created] (HADOOP-15669) ABFS: Improve HTTPS Performance

2018-08-10 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15669:
-

 Summary: ABFS: Improve HTTPS Performance
 Key: HADOOP-15669
 URL: https://issues.apache.org/jira/browse/HADOOP-15669
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


We see approximately 50% worse throughput for ABFS over HTTPS vs. HTTP.  Let's 
perform a detailed measurement and see what can be done to improve throughput.






[jira] [Created] (HADOOP-15664) ABFS: Reduce test run time via parallelization and grouping

2018-08-09 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15664:
-

 Summary: ABFS: Reduce test run time via parallelization and 
grouping
 Key: HADOOP-15664
 URL: https://issues.apache.org/jira/browse/HADOOP-15664
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


1) Let's reduce the total test runtime by improving parallelization of the 
tests.

2) Let's make it possible to select WASB tests, ABFS tests, or both so 
developers can run only the tests appropriate for the change they've made.

3) Update the testing-azure.md accordingly






[jira] [Created] (HADOOP-15663) ABFS: Simplify configuration

2018-08-09 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15663:
-

 Summary: ABFS: Simplify configuration
 Key: HADOOP-15663
 URL: https://issues.apache.org/jira/browse/HADOOP-15663
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


Configuration for WASB and ABFS is too complex.  The current approach is to use 
four files for test configuration. 

Both WASB and ABFS have basic test configuration which is committed to the repo 
(azure-test.xml and azure-bfs-test.xml).  Currently these contain the 
fs.AbstractFileSystem.[scheme].impl configuration, but otherwise are empty 
except for an include reference to a file containing the endpoint credentials. 

Both WASB and ABFS have endpoint credential configuration files 
(azure-auth-keys.xml and azure-bfs-auth-keys.xml).  These have been added to 
.gitignore to prevent them from accidentally being submitted in a patch, which 
would leak the developer's storage account credentials.  These files contain 
account names, storage account keys, and service endpoints.

There is some overlap of the configuration for WASB and ABFS, where they use 
the same property name but use different values.  

1) Let's reduce the number of test configuration files to one, if possible.

2) Let's simplify the account name, key, and endpoint configuration for WASB 
and ABFS if possible, but still support the legacy way of doing it, which is 
very error-prone.

3) Let's improve error handling, so that typos or misconfiguration are not so 
difficult to troubleshoot.






[jira] [Created] (HADOOP-15662) ABFS: Better exception handling of DNS errors

2018-08-09 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15662:
-

 Summary: ABFS: Better exception handling of DNS errors
 Key: HADOOP-15662
 URL: https://issues.apache.org/jira/browse/HADOOP-15662
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Thomas Marquardt


DNS errors are common during testing due to typos or misconfiguration.  They 
can also occur in production, as some transient DNS issues occur from time to 
time. 

1) Let's investigate if we can distinguish between the two and fail fast for 
the test issues, but continue to have retry logic for the transient DNS issues 
in production.

2) Let's improve the error handling of DNS failures, so the user has an 
actionable error message.






[jira] [Created] (HADOOP-15547) WASB: listStatus performance

2018-06-17 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15547:
-

 Summary: WASB: listStatus performance
 Key: HADOOP-15547
 URL: https://issues.apache.org/jira/browse/HADOOP-15547
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.0.2, 2.9.1
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


The WASB implementation of Filesystem.listStatus is very slow due to O(n!) 
algorithm to remove duplicates and uses too much memory due to the extra 
conversion from BlobListItem to FileMetadata to FileStatus.  It takes over 30 
minutes to list 700,000 files.  
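
For comparison, duplicate removal can be done in a single pass with a hash 
set. The sketch below is a generic illustration, not the WASB fix: 
ListingDedup is a hypothetical name and plain strings stand in for the 
listing entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper showing single-pass duplicate removal. */
class ListingDedup {
    /** Keeps the first occurrence of each key, preserving listing order. */
    static List<String> dedupe(List<String> keys) {
        Set<String> seen = new LinkedHashSet<>();
        for (String key : keys) {
            seen.add(key); // add() is a no-op for keys already present
        }
        return new ArrayList<>(seen);
    }
}
```

Each key is hashed once, so the cost grows roughly linearly with the number 
of entries instead of comparing every entry against the others.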






[jira] [Created] (HADOOP-15478) WASB: hflush() and hsync() regression

2018-05-18 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15478:
-

 Summary: WASB: hflush() and hsync() regression
 Key: HADOOP-15478
 URL: https://issues.apache.org/jira/browse/HADOOP-15478
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.0.2, 2.9.0
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


HADOOP-14520 introduced a regression in hflush() and hsync().  Previously, for 
the default case where users upload data as block blobs, these were no-ops.  
Unfortunately, HADOOP-14520 accidentally implemented hflush() and hsync() by 
default, so any data buffered in the stream is immediately uploaded to storage. 
 This new behavior is undesirable, because block blobs have a limit of 50,000 
blocks.  Spark users are now seeing failures due to exceeding the block limit, 
since Spark frequently invokes hflush().
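
A back-of-the-envelope sketch of why this matters, under the simplifying 
assumption that every hflush() call commits exactly one new block (real 
behavior depends on how much data is buffered). BlockBudget is a hypothetical 
helper written for this note.

```java
/** Hypothetical helper; assumes one new block per hflush() call. */
class BlockBudget {
    static final int MAX_BLOCKS = 50_000; // block blob limit quoted above

    /** Whole seconds until the limit is reached at a steady flush rate. */
    static long secondsUntilLimit(double flushesPerSecond) {
        if (flushesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return (long) Math.floor(MAX_BLOCKS / flushesPerSecond);
    }
}
```

Under that assumption, a writer that flushes ten times per second would 
exhaust the block budget of a single blob in 5,000 seconds, well under two 
hours.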






Re: [DISCUSS] Branch Proposal: HADOOP 15407: ABFS

2018-05-15 Thread Thomas Marquardt
A feature branch seems reasonable to me too.  Note that the WASB connector will 
continue to exist, and live side-by-side with the new Azure Blob Filesystem 
(ABFS) connector.  We will encourage users to move to the new ABFS connector, 
and all of our new features and performance improvements will target the ABFS 
connector.  ABFS will perform better at no additional cost, so I expect current 
users to migrate in time.  The two connectors are compatible for mainline 
scenarios, but there are some uncommon features in WASB that we chose not to 
carry over in the initial implementation.


So we hope ABFS will replace the usage of WASB, but the WASB connector itself 
will continue to exist.  Maybe we can remove WASB in the future some day, if 
nobody is using it.


I can confirm that nobody ever gets seek() right. :)


Thanks,

Thomas


From: larry mccay <lmc...@apache.org>
Sent: Tuesday, May 15, 2018 8:44 AM
To: Steve Loughran
Cc: Hadoop Common
Subject: Re: [DISCUSS] Branch Proposal: HADOOP 15407: ABFS

This seems like a reasonable and effective use of a feature branch and
branch committers to me.


On Tue, May 15, 2018 at 11:34 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

> Hi
>
> Chris Douglas and I have a proposal for a short-lived feature branch
> for the Azure ABFS connector to go into the hadoop-azure package. This will
> connect to the new azure storage service, which will ultimately replace the
> one used by wasb. It's a big patch and, like all storage connectors, will
> inevitably take time to stabilize (i.e., nobody ever gets seek() right, even
> when we think we have).
>
> Thomas & Esfandiar will do the coding: they've already done the paperwork.
> Chris, myself & anyone else interested can be involved in the review and
> testing.
>
> Comments?
>
> -
>
> The initial HADOOP-15407 patch contains a new filesystem client for the
> forthcoming Azure ABFS, which is intended to replace Azure WASB as the
> Azure storage layer. The patch is large, as it contains the replacement
> client, tests, and generated code.
>
> We propose a feature branch, so the module can be broken into salient,
> reviewable chunks. Internal constraints prevented this feature from being
> developed in Apache, so we want to ensure that all the code is discussed,
> maintainable, and documented by the community before it merges.
>
> To effect this, we also propose adding two developers as branch
> committers: Thomas Marquardt <tm...@microsoft.com> and
> Esfandiar Manii <esma...@microsoft.com>
>
> Beyond normal feature branch activity and merge criteria for FS modules,
> we want to add another merge criterion for ABFS. Some of the client APIs
> are not GA. It seems reasonable to require that this client works with
> public endpoints before it merges to trunk.
>
> To test the Blob FS driver, Blob FS team (including Esfandiar Manii and
> Thomas Marquardt) in Azure Storage will need the MSDN subscription ID(s)
> for all reviewers who want to run the tests. The ABFS team will then
> whitelist the subscription ID(s) for the Blob FS Preview. At that time,
> future storage accounts created will have the Blob FS endpoint,
> .dfs.core.windows.net, which the Blob FS driver relies on.
>
> This is a temporary state during the (current) Private Preview and the
> early phases of Public Preview. In a few months, the whitelisting will not
> be required and anyone will be able to create a storage account with access
> to the Blob FS endpoint.
>
> Thomas and Esfandiar have been active in the Hadoop project working on the
> WASB connector (see 
> https://issues.apache.org/jira/browse/HADOOP-14552).
> They understand the processes and requirements of the software. Working on
> the branch directly will let them bring this significant feature into the
> hadoop-azure module without disrupting existing users.
>


[jira] [Created] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication

2018-05-03 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15446:
-

 Summary: WASB: PageBlobInputStream.skip breaks HBASE replication
 Key: HADOOP-15446
 URL: https://issues.apache.org/jira/browse/HADOOP-15446
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.0.2, 2.9.0
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


Page Blobs are primarily used by HBASE.  HBASE replication, which apparently 
has not been used with WASB until recently, performs non-sequential reads on 
log files using PageBlobInputStream.  There are bugs in this stream 
implementation which prevent skip and seek from working properly, and 
eventually the stream state becomes corrupt and unusable.

I believe this bug affects all releases of WASB/HADOOP.  It appears to be a 
day-0 bug in PageBlobInputStream.  There were similar bugs opened in the past 
(HADOOP-15042) but the issue was not properly fixed, and no test coverage was 
added.






[jira] [Created] (HADOOP-15156) backport HADOOP-15086 rename fix to branch-2

2018-01-03 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-15156:
-

 Summary: backport HADOOP-15086 rename fix to branch-2
 Key: HADOOP-15156
 URL: https://issues.apache.org/jira/browse/HADOOP-15156
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


backport HADOOP-15086 (rename fix) to branch-2






[jira] [Created] (HADOOP-14769) WASB: delete recursive should not fail if a file is deleted

2017-08-13 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-14769:
-

 Summary: WASB: delete recursive should not fail if a file is 
deleted
 Key: HADOOP-14769
 URL: https://issues.apache.org/jira/browse/HADOOP-14769
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


FileSystem.delete(Path path) and delete(Path path, boolean recursive) return 
false if the path does not exist.  The WASB implementation of recursive delete 
currently fails if one of the entries is deleted by an external agent while a 
recursive delete is in progress.  For example, if you try to delete all of the 
files in a directory, which can be a very long process, and one of the files 
contained within is deleted by an external agent, the recursive directory 
delete operation will fail if it tries to delete that file and discovers that 
it does not exist.  This is not desirable.  A recursive directory delete 
operation should succeed if the directory initially exists and, when the 
operation completes, the directory and all of its entries do not exist.
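
The desired semantics can be sketched with a small in-memory model. This is 
not the WASB implementation: TolerantDelete and its sorted set of paths are 
hypothetical stand-ins for the blob store. The point is that a "not found" 
result for an individual entry is ignored rather than treated as a failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory store standing in for blob storage. */
class TolerantDelete {
    final Set<String> paths = new TreeSet<>();

    /** Deletes one entry; returns false when it is already gone. */
    boolean deleteEntry(String path) {
        return paths.remove(path);
    }

    /**
     * Recursively deletes dir. Returns false only if dir did not exist at
     * the start; entries that vanish mid-operation count as deleted.
     */
    boolean deleteRecursive(String dir) {
        if (!paths.contains(dir)) {
            return false;
        }
        // Snapshot the listing first, as a real listing call would.
        List<String> snapshot = new ArrayList<>();
        for (String p : paths) {
            if (p.startsWith(dir + "/")) {
                snapshot.add(p);
            }
        }
        for (String p : snapshot) {
            // A false result means someone else removed it: not an error.
            deleteEntry(p);
        }
        deleteEntry(dir);
        return true;
    }
}
```

The return value of deleteEntry() is deliberately discarded inside the loop, 
so an entry removed by an external agent between the listing and the delete 
does not abort the operation.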






[jira] [Created] (HADOOP-14722) Azure: BlockBlobInputStream position incorrect after seek

2017-08-01 Thread Thomas Marquardt (JIRA)
Thomas Marquardt created HADOOP-14722:
-

 Summary: Azure: BlockBlobInputStream position incorrect after seek
 Key: HADOOP-14722
 URL: https://issues.apache.org/jira/browse/HADOOP-14722
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Thomas Marquardt
Assignee: Thomas Marquardt


The seek, skip, and getPos methods of BlockBlobInputStream do not correctly 
account for the stream's internal buffer.  This results in invalid stream 
positions.
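
To show what accounting for the internal buffer means, here is a minimal 
sketch of a buffered stream over a byte array. BufferedPositionStream and its 
fields are hypothetical and this is not the BlockBlobInputStream code. The 
logical position is the offset of the bytes already loaded minus the bytes 
still unread in the buffer.

```java
/** Hypothetical buffered reader over a byte array; not the WASB class. */
class BufferedPositionStream {
    private final byte[] source;
    private final byte[] buffer;
    private int sourceOffset = 0; // next source byte to load into the buffer
    private int bufferPos = 0;    // next buffered byte to hand out
    private int bufferLen = 0;    // number of valid bytes in the buffer

    BufferedPositionStream(byte[] source, int bufferSize) {
        this.source = source;
        this.buffer = new byte[bufferSize];
    }

    /** Logical position: bytes loaded minus bytes still unread in the buffer. */
    long getPos() {
        return sourceOffset - (bufferLen - bufferPos);
    }

    /** Returns the next byte as 0..255, or -1 at end of stream. */
    int read() {
        if (bufferPos == bufferLen) {
            if (sourceOffset >= source.length) {
                return -1;
            }
            bufferLen = Math.min(buffer.length, source.length - sourceOffset);
            System.arraycopy(source, sourceOffset, buffer, 0, bufferLen);
            sourceOffset += bufferLen;
            bufferPos = 0;
        }
        return buffer[bufferPos++] & 0xff;
    }

    /** Seeks inside the buffer when possible; otherwise discards it. */
    void seek(long pos) {
        if (pos < 0 || pos > source.length) {
            throw new IllegalArgumentException("invalid position: " + pos);
        }
        long bufferStart = sourceOffset - bufferLen;
        if (pos >= bufferStart && pos < sourceOffset) {
            bufferPos = (int) (pos - bufferStart);
        } else {
            sourceOffset = (int) pos;
            bufferPos = 0;
            bufferLen = 0;
        }
    }
}
```

Reporting sourceOffset alone as the position would overstate it by the number 
of buffered but unread bytes, which is the kind of error described above.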


