bhattmanish98 commented on code in PR #7540:
URL: https://github.com/apache/hadoop/pull/7540#discussion_r2026607555


##########
hadoop-tools/hadoop-azure/src/site/markdown/index.md:
##########
@@ -12,553 +12,1479 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# Hadoop Azure Support: Azure Blob Storage
+# Hadoop Azure Support: ABFS - Azure Data Lake Storage Gen2
 
 <!-- MACRO{toc|fromDepth=1|toDepth=3} -->
 
-See also:
-
-* [WASB](./wasb.html)
-* [ABFS](./abfs.html)
-* [Namespace Disabled Accounts on ABFS](./fns_blob.html)
-* [Testing](./testing_azure.html)
-
-## Introduction
+## <a name="introduction"></a> Introduction
 
-The `hadoop-azure` module provides support for integration with
-[Azure Blob Storage](http://azure.microsoft.com/en-us/documentation/services/storage/).
-The built jar file, named `hadoop-azure.jar`, also declares transitive dependencies
-on the additional artifacts it requires, notably the
-[Azure Storage SDK for Java](https://github.com/Azure/azure-storage-java).
+The `hadoop-azure` module provides support for the Azure Data Lake Storage Gen2
+storage layer through the "abfs" connector.
 
-To make it part of Apache Hadoop's default classpath, simply make sure that
-`HADOOP_OPTIONAL_TOOLS`in `hadoop-env.sh` has `'hadoop-azure` in the list.
-Example:
+To make it part of Apache Hadoop's default classpath, make sure that
+the `HADOOP_OPTIONAL_TOOLS` environment variable has `hadoop-azure` in the list,
+*on every machine in the cluster*.
 
 ```bash
-export HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-azure-datalake"
+export HADOOP_OPTIONAL_TOOLS=hadoop-azure
 ```
-## Features
 
-* Read and write data stored in an Azure Blob Storage account.
-* Present a hierarchical file system view by implementing the standard Hadoop
+You can set this locally in your `.profile`/`.bashrc`, but note it won't
+propagate to jobs running in-cluster.
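+
+For a persistent cluster-wide setting, a line such as the following can be
+added to `hadoop-env.sh` instead (a sketch; the exact path depends on the
+installation, commonly `$HADOOP_HOME/etc/hadoop/hadoop-env.sh`):
+
+```bash
+# hadoop-env.sh: picked up by the Hadoop daemons and clients on this host
+export HADOOP_OPTIONAL_TOOLS=hadoop-azure
+```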
+
+See also:
+* [FNS (non-HNS)](./fns_blob.html)
+* [WASB (legacy, deprecated)](./wasb.html)
+* [Testing](./testing_azure.html)
+
+## <a name="features"></a> Features of the ABFS connector.
+
+* Supports reading and writing data stored in an Azure Blob Storage account.
+* *Fully Consistent* view of the storage across all clients.
+* Can read data written through the deprecated `wasb:` connector.
+* Presents a hierarchical file system view by implementing the standard Hadoop
   [`FileSystem`](../api/org/apache/hadoop/fs/FileSystem.html) interface.
 * Supports configuration of multiple Azure Blob Storage accounts.
-* Supports both block blobs (suitable for most use cases, such as MapReduce) and
-  page blobs (suitable for continuous write use cases, such as an HBase
-  write-ahead log).
-* Reference file system paths using URLs using the `wasb` scheme.
-* Also reference file system paths using URLs with the `wasbs` scheme for SSL
-  encrypted access.
-* Can act as a source of data in a MapReduce job, or a sink.
-* Tested on both Linux and Windows.
-* Tested at scale.
-
-## Limitations
-
-* File owner and group are persisted, but the permissions model is not enforced.
-  Authorization occurs at the level of the entire Azure Blob Storage account.
-* File last access time is not tracked.
+* Can act as a source or destination of data in Hadoop MapReduce, Apache Hive, Apache Spark.
+* Tested at scale on both Linux and Windows by Microsoft themselves.
+* Can be used as a replacement for HDFS on Hadoop clusters deployed in Azure infrastructure.
+
+For details on ABFS, consult the following documents:
+
+* [A closer look at Azure Data Lake Storage Gen2](https://azure.microsoft.com/en-gb/blog/a-closer-look-at-azure-data-lake-storage-gen2/);
+MSDN Article from June 28, 2018.
+* [Storage Tiers](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers)
 
-## Usage
+## Getting started
 
 ### Concepts
 
-The Azure Blob Storage data model presents 3 core concepts:
+The Azure Storage data model presents 3 core concepts:
 
 * **Storage Account**: All access is done through a storage account.
 * **Container**: A container is a grouping of multiple blobs.  A storage account
   may have multiple containers.  In Hadoop, an entire file system hierarchy is
-  stored in a single container.  It is also possible to configure multiple
-  containers, effectively presenting multiple file systems that can be referenced
-  using distinct URLs.
-* **Blob**: A file of any type and size.  In Hadoop, files are stored in blobs.
-  The internal implementation also uses blobs to persist the file system
-  hierarchy and other metadata.
+  stored in a single container.
+* **Blob**: A file of any type and size stored with the existing wasb connector.
 
-### Configuring Credentials
+The ABFS connector connects to classic containers, or those created
+with Hierarchical Namespaces.
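+
+Paths on either kind of container are referenced through URLs of the same
+form; for example (the container and account names here are illustrative):
+
+```bash
+# list the root of an ABFS container; use abfss:// for TLS-encrypted access
+hadoop fs -ls abfs://mycontainer@myaccount.dfs.core.windows.net/
+```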
 
-Usage of Azure Blob Storage requires configuration of credentials.  Typically
-this is set in core-site.xml.  The configuration property name is of the form
-`fs.azure.account.key.<account name>.blob.core.windows.net` and the value is the
-access key.  **The access key is a secret that protects access to your storage
-account.  Do not share the access key (or the core-site.xml file) with an
-untrusted party.**
+## <a name="namespaces"></a> Hierarchical Namespaces (and WASB Compatibility)
 
-For example:
+A key aspect of ADLS Gen 2 is its support for
+[hierarchical namespaces](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace).
+These are effectively directories and offer high performance rename and delete
+operations, something which significantly improves performance for query engines
+writing data to it, including MapReduce, Spark, Hive, as well as DistCp.
 
-```xml
-<property>
-  <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
-  <value>YOUR ACCESS KEY</value>
-</property>
-```
-In many Hadoop clusters, the core-site.xml file is world-readable. It is possible to
-protect the access key within a credential provider as well. This provides an encrypted
-file format along with protection with file permissions.
+This feature is only available if the container was created with "namespace"
+support.
 
-#### Protecting the Azure Credentials for WASB with Credential Providers
+You enable namespace support when creating a new Storage Account,
+by checking the "Hierarchical Namespace" option in the Portal UI, or, when
+creating through the command line, using the option `--hierarchical-namespace true`.
 
-To protect these credentials from prying eyes, it is recommended that you use
-the credential provider framework to securely store them and access them
-through configuration. The following describes its use for Azure credentials
-in WASB FileSystem.
+_You cannot enable Hierarchical Namespaces on an existing storage account_
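+
+As a sketch, creating such an account from the command line could look like
+this (the resource group, account name and location are illustrative):
+
+```bash
+az storage account create --name abfswales1 \
+    --resource-group myresourcegroup \
+    --location westeurope \
+    --kind StorageV2 \
+    --hierarchical-namespace true
+```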
 
-For additional reading on the credential provider API see:
-[Credential Provider API](../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
+_**Containers in a storage account with Hierarchical Namespaces are
+not (currently) readable through the deprecated `wasb:` connector.**_
 
-##### End to End Steps for Distcp and WASB with Credential Providers
-
-###### provision
+Some of the `az storage` command line commands fail too, for example:
 
 ```bash
-% hadoop credential create fs.azure.account.key.youraccount.blob.core.windows.net -value 123
-    -provider localjceks://file/home/lmccay/wasb.jceks
+$ az storage container list --account-name abfswales1
+Blob API is not yet supported for hierarchical namespace accounts. ErrorCode: BlobApiNotYetSupportedForHierarchicalNamespaceAccounts
 ```
 
-###### configure core-site.xml or command line system property
+### <a name="creating"></a> Creating an Azure Storage Account
 
-```xml
-<property>
-  <name>hadoop.security.credential.provider.path</name>
-  <value>localjceks://file/home/lmccay/wasb.jceks</value>
-  <description>Path to interrogate for protected credentials.</description>
-</property>
-```
+The best documentation on getting started with Azure Datalake Gen2 with the
+abfs connector is [Using Azure Data Lake Storage Gen2 with Azure HDInsight clusters](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-hdi-cluster).
+
+It includes instructions to create it from [the Azure command line tool](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest),
+which can be installed on Windows, MacOS (via Homebrew) and Linux (apt or yum).
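+
+For instance, on those platforms the installation could look like this
+(the exact package names and install script may change; check the page above):
+
+```bash
+# MacOS, via Homebrew
+brew install azure-cli
+# Debian/Ubuntu, via Microsoft's documented install script
+curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
+```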
 
-###### distcp
+The [az storage](https://docs.microsoft.com/en-us/cli/azure/storage?view=azure-cli-latest) subcommand
+handles all storage commands, [`az storage account create`](https://docs.microsoft.com/en-us/cli/azure/storage/account?view=azure-cli-latest#az-storage-account-create)
+does the creation.
 
+Until the ADLS gen2 API support is finalized, you need to add an extension
+to the ADLS command.
 ```bash
-% hadoop distcp
-    [-D hadoop.security.credential.provider.path=localjceks://file/home/lmccay/wasb.jceks]
-    hdfs://hostname:9001/user/lmccay/007020615 wasb://yourcontai...@youraccount.blob.core.windows.net/testDir/
+az extension add --name storage-preview
 ```
 
-NOTE: You may optionally add the provider path property to the distcp command line instead of
-added job specific configuration to a generic core-site.xml. The square brackets above illustrate
-this capability.
+Check that all is well by verifying that the usage command includes `--hierarchical-namespace`:
+```
+$ az storage account
+usage: az storage account create [-h] [--verbose] [--debug]
+     [--output {json,jsonc,table,tsv,yaml,none}]
+     [--query JMESPATH] --resource-group
+     RESOURCE_GROUP_NAME --name ACCOUNT_NAME
+     [--sku {Standard_LRS,Standard_GRS,Standard_RAGRS,Standard_ZRS,Premium_LRS,Premium_ZRS}]
+     [--location LOCATION]
+     [--kind {Storage,StorageV2,BlobStorage,FileStorage,BlockBlobStorage}]
+     [--tags [TAGS [TAGS ...]]]
+     [--custom-domain CUSTOM_DOMAIN]
+     [--encryption-services {blob,file,table,queue} [{blob,file,table,queue} ...]]
+     [--access-tier {Hot,Cool}]
+     [--https-only [{true,false}]]
+     [--file-aad [{true,false}]]
+     [--hierarchical-namespace [{true,false}]]
+     [--bypass {None,Logging,Metrics,AzureServices} [{None,Logging,Metrics,AzureServices} ...]]
+     [--default-action {Allow,Deny}]
+     [--assign-identity]
+     [--subscription _SUBSCRIPTION]
+```
 
-#### Protecting the Azure Credentials for WASB within an Encrypted File
+You can list locations from `az account list-locations`, which lists the

Review Comment:
   In future, locations may be added or removed. Do you want to hardcode all the locations here, or would it be better to list two or three and mark the rest with ...? What do you think?


