[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Steve Loughran (JIRA) Fri, 10 Jun 2016 03:14:04 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324175#comment-15324175
 ]


Steve Loughran commented on HADOOP-12666:
-----------------------------------------

I've only just caught up on this ... I do have some issues which will need 
correcting in the contract test suite. I suspect the size of the patch meant 
everyone was overwhelmed by the project. This is the kind of thing best done 
as a feature branch: it can be stabilised before being merged with anything. 
 
 
The reason for doing it in a feature branch is that the feature could have been 
kept out until the contract tests were working and all those areas where there 
is a divergence had been addressed. That can take a few iterations; doing it in 
a branch allows for them. Certainly I'd like to see backports to branch-2 held 
back (briefly) until those tests are passing, where passing is "don't skip 
things that aren't working unless it's a fundamental feature of the FS". Some 
of the tests are being skipped with a "BUG" message. Those test failures are, 
to me, a sign the FS isn't ready for use yet. 
 
Put differently: FS contract tests should not have been postponed until after 
this went in. Those tests are the closest we have to verifying compliance with 
our (reverse-engineered) specification of how a filesystem must work. If the 
tests don't pass, it's not a filesystem according to the Hadoop specification. 
Sorry.
 
At a quick glance at the code, I'm disappointed that a loose "throw IOE" policy 
has been adopted rather than the strict "throw the same exceptions as HDFS" 
one. It's not that much harder to type {{throw new EOFException}} in a seek, is 
it? Speaking of which: seek() is broken. I'll leave it to the reader to guess 
why.
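
For illustration only, here is a minimal sketch of what the strict policy looks 
like in a seek. It assumes a hypothetical in-memory stream ({{SeekSketch}} is 
made up for this example; it is not the ADL code, nor a Hadoop class), and 
failing fast on a seek past the end of the data is a choice made for the 
sketch. It makes no claim about what is actually wrong with the patch's seek().

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream; illustrative only, not the ADL or HDFS code. */
class SeekSketch {
    private final byte[] data;
    private long pos;

    SeekSketch(byte[] data) {
        this.data = data;
    }

    /** Moves the read position, rejecting bad offsets with EOFException. */
    void seek(long target) throws IOException {
        if (target < 0) {
            // Strict policy: a specific exception type, not a bare IOException.
            throw new EOFException("Cannot seek to a negative offset: " + target);
        }
        if (target > data.length) {
            // Assumption: this sketch fails fast on a seek past the end of the data.
            throw new EOFException("Cannot seek past end of data: " + target);
        }
        pos = target;
    }

    /** Returns the current read position. */
    long getPos() {
        return pos;
    }

    /** Returns the next byte as 0..255, or -1 at end of data. */
    int read() {
        return pos < data.length ? (data[(int) pos++] & 0xff) : -1;
    }
}
```

Callers that catch {{IOException}} keep working, since {{EOFException}} is a 
subclass of it; the narrower type simply lets a test tell a bad offset apart 
from any other I/O failure.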
  

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>             Fix For: 3.0.0-alpha1
>
>         Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf, 
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, 
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, 
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-010.patch, 
> HADOOP-12666-011.patch, HADOOP-12666-012.patch, HADOOP-12666-013.patch, 
> HADOOP-12666-014.patch, HADOOP-12666-015.patch, HADOOP-12666-016.patch, 
> HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc., to use the ADL store as 
> input or output.
>  
> ADL is ultra-high capacity, optimized for massive throughput, with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
