[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Vishwajeet Dusane (JIRA) Tue, 09 Feb 2016 05:10:29 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138915#comment-15138915
 ]


Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------

Thank you [~cnauroth] for the comments. 

1. Yes, I will upload the respective document.
2. We do have extended contract test cases that integrate with the back-end 
service; however, those tests are not pushed as part of this check-in. I will 
create a separate JIRA for the Live mode test cases.
3. We do run contract test cases.

For the code:
1. I have refactored the namespaces as suggested by [~fabbri] and [~cnauroth], 
as follows:

||Namespace||Purpose||
|org.apache.hadoop.fs.adl|Public interface exposed for Hadoop applications to 
integrate with. For long-term support, this namespace will stay even if we 
remove or refactor the dependency on org.apache.hadoop.hdfs.web|
|org.apache.hadoop.hdfs.web|Extension of WebHdfsFileSystem to override 
protected functionality, for example ConnectionFactory access, overriding the 
redirection operation, etc.|

2. {panel}PrivateAzureDataLakeFileSystem{panel} is exposed for 
{panel}AdlFileSystem{panel} to inherit from. I will add documentation for 
it.
3. Not adding a lock is intentional. Even if multiple instances are created, 
the last instance would be the one used to refresh the token.
4. Yes, I got a similar comment from [~chris.douglas] as well. The reason for 
hiding the logging was to switch quickly between {panel}Log{panel} and 
{panel}System.out.println{panel} during debugging; the quickest way is to 
change the code rather than a configuration file. We will migrate to SLF4J, 
but not as part of this patch release. Is that fine? 
5. Explained above
6. Agreed; I have incorporated the code change.
7. The {panel}FileStatus{panel} cache management feature is configurable. If 
some scenarios break for a customer, they can turn off the local cache. The 
cache scope is within the process. I will document the behavior and the 
tuning flags, such as the duration of the cache. We do see a great 
performance improvement; however, we do not wish to compromise on correctness. 
8. {panel}ADLConfKeys#LOG_VERSION{panel} captures the code instrumentation 
version. This information is used only during debugging sessions.
9. Excellent point. The bug was that the mock server hung when 
{panel}TestAAAAAUploadBenchmark/TestAAAAADownloadBenchmark{panel} were not 
executed before the other tests. I will investigate the root cause of this 
issue, since you pointed out that the execution order is not guaranteed.
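
To make point 7 concrete, here is a minimal sketch of a process-local cache 
with an on/off switch and a configurable duration. It is not the code from the 
patch: the class name TtlStatusCache, the constructor parameters, and the 
injected clock are all hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a process-local, time-bounded status cache.
 * None of these names come from the HADOOP-12666 patch.
 */
final class TtlStatusCache<V> {

  /** A cached value together with the time at which it stops being valid. */
  private static final class Entry<V> {
    final V value;
    final long expiresAtMillis;

    Entry(V value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final boolean enabled;    // assumed "turn off the local cache" flag
  private final long ttlMillis;     // assumed "duration of the cache" flag
  private final LongSupplier clock; // injected so expiry can be tested
  private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();

  TtlStatusCache(boolean enabled, long ttlMillis, LongSupplier clock) {
    this.enabled = enabled;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Returns the cached value for path, or loads and caches a fresh one. */
  V get(String path, Function<String, V> loader) {
    if (!enabled) {
      // Cache switched off: always go to the back end.
      return loader.apply(path);
    }
    long now = clock.getAsLong();
    Entry<V> entry = entries.get(path);
    if (entry != null && now < entry.expiresAtMillis) {
      return entry.value;
    }
    V fresh = loader.apply(path);
    entries.put(path, new Entry<>(fresh, now + ttlMillis));
    return fresh;
  }

  /** Drops the entry for path, for example after a local write or delete. */
  void invalidate(String path) {
    entries.remove(path);
  }
}
```

With the flag off, every call reaches the loader, which is the behavior a 
customer would fall back to if a scenario breaks. A shorter duration narrows 
the window in which a stale entry can be returned, at the cost of more 
back-end calls.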



> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, 
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc., to use the ADL store as 
> input or output.
>  
> ADL is an ultra-high-capacity store, optimized for massive throughput, with 
> rich management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
