[ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vishwajeet Dusane updated HADOOP-12666:
---------------------------------------
    Status: In Progress  (was: Patch Available)

> Support Windows Azure Data Lake - as a file system in Hadoop
> ------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch,
>                      HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Windows
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc. to use the ADL store as
> input or output.
>
> ADL is an ultra-high-capacity store, optimized for massive throughput, with
> rich management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/
> h2. High level design
> The ADL file system exposes RESTful interfaces compatible with the WebHdfs
> specification 2.7.1.
> At a high level, the code here extends the SWebHdfsFileSystem class to
> provide an implementation for accessing ADL storage; the scheme adl is used
> for accessing it over HTTPS. We use the URI scheme:
> {code}adl://<URI to account>/path/to/file{code}
> to address individual files and folders. Tests are implemented mostly using a
> Contract implementation for the ADL functionality, with an option to test
> against real ADL storage if configured.
> h2. Credits and history
> This has been ongoing work for a while, and the early version of this work
> can be seen in. Credit for this work goes to the team: [~vishwajeet.dusane],
> [~snayak], [~srevanka], [~kiranch], [~chakrab], [~omkarksa], [~snvijaya],
> [~ansaiprasanna], [~jsangwan]
> h2. Test
> Besides Contract tests, we have used ADL as the additional file system in the
> current public preview release. Various customer and test workloads have been
> run against clusters with such configurations for quite some time. The
> current version reflects the version of the code tested and used in our
> production environment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)