[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Chris Nauroth (JIRA) Fri, 10 Jun 2016 09:57:37 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324806#comment-15324806 ]


Chris Nauroth commented on HADOOP-12666:
----------------------------------------

bq. This is the kind of thing best done as feature branch: it can be stabilised 
before being merged with anything.

[~ste...@apache.org], please allow me to take the responsibility for this, and 
I'll learn from it.  I was working based on established precedent for prior 
file system implementations.  The prior history is that s3n, s3a, swift and 
wasb all entered trunk without development on feature branches.  I considered 
feature branches unnecessary for those, because they were new, isolated modules 
with no impact on the rest of the tree, so I followed the same path here for 
ADL.

Established precedent doesn't always mean good practice, though.  My thinking 
on this has evolved recently, and I'm leaning towards using feature branches 
more often.

bq. Certainly I'd like to see backports to branch-2 holding back...

Yes, this is the plan.  I may have caused confusion earlier with my accidental 
commit to branch-2.  That has been reverted.

I'd like to recap the plan that Chris D and I landed on.  The comments on this 
JIRA are lengthy, so it would have been easy to miss this:

{quote}
HADOOP-13037 will remove the dependency on WebHDFS, largely rewriting this 
client. The buffering in PrivateAzureDataLakeFileSystem should also be 
rewritten. It's implementing something like demand-paging, but some of the 
control flow would be more powerful, and more understandable, if it were 
layered more conventionally. Configuring the client is also very complex. I 
tried the directions, but only arrived at a working client with Vishwajeet's 
help.
The target version is 2.9, but we should hold off on backporting this before 
it's easier to use and maintain. I would like to commit the result of review 
from Chris Nauroth, Lei (Eddy) Xu, Tony Wu, Aaron Fabbri, and Sean Mackrory to 
trunk. It'll be easier to fixup the patch in targeted JIRAs. Committing the 
contract tests in HADOOP-12875 would also be helpful. This would be with the 
caveats from HDFS-9938: this module may be removed if it impedes WebHDFS 
development. Further, it should be easier to configure before we include it in 
a release. Is this an acceptable path forward?
{quote}

To summarize, the plan was to commit HADOOP-12666 and HADOOP-12875 close 
together.  The contract tests would then serve as an effective check against 
regressions while the work on HADOOP-13037 is done.  You have -1'd the 
current revision of HADOOP-12875, so the contributors will need to work through 
your feedback before HADOOP-12875 can be committed.

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>             Fix For: 3.0.0-alpha1
>
>         Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf, 
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, 
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, 
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-010.patch, 
> HADOOP-12666-011.patch, HADOOP-12666-012.patch, HADOOP-12666-013.patch, 
> HADOOP-12666-014.patch, HADOOP-12666-015.patch, HADOOP-12666-016.patch, 
> HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc. to use ADL store as 
> input or output.
>  
> ADL is ultra-high capacity, optimized for massive throughput, with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/


