[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Aaron Fabbri (JIRA) Wed, 24 Feb 2016 16:49:37 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166449#comment-15166449
 ]


Aaron Fabbri commented on HADOOP-12666:
---------------------------------------

[~vishwajeet.dusane] thanks for the responses. It seems you are not convinced 
about some of the synchronization bugs I pointed out.

Two hints:

- Understand why 
[ConcurrentHashMap|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html]
 adds functions like putIfAbsent().  Just because get() and put() are 
individually synchronized does not make this check-then-act sequence safe. 
Two threads can both see null and both call put(): 

{noformat}
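if (map.get(key) == null) {
    // Another thread can insert the same key here, after the null check.
    Thing t = new Thing();
    map.put(key, t);  // may overwrite the other thread's Thing
}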
{noformat}
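
For comparison, here is a minimal sketch of the same lookup written with putIfAbsent(). The class name ThingCache, the String key type, and the getOrCreate() method are hypothetical and not taken from the patch; only ConcurrentHashMap and putIfAbsent() are real JDK APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical class for illustration only; not code from the patch.
class ThingCache {
    /** Placeholder value type standing in for "Thing" in the snippet above. */
    static final class Thing { }

    private final ConcurrentMap<String, Thing> map =
        new ConcurrentHashMap<String, Thing>();

    /** Returns the one Thing registered for key, creating it if absent. */
    Thing getOrCreate(String key) {
        Thing existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        Thing created = new Thing();
        // putIfAbsent() does the check and the insert as one atomic step.
        // It returns the value already present, or null if ours was stored.
        Thing raced = map.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }
}
```

If two threads race here, one of them may construct a Thing that is thrown away, but both callers get the same instance back and nothing in the map is overwritten.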


- Use of volatile.  Given:
{noformat}
    volatile byte[] data = null;
    volatile int bufferOffset = 0;
{noformat}

This does not make code like this thread safe:

{noformat}
int read() {
    return data[bufferOffset++] & 0xff;
}
{noformat}
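
The reason is that bufferOffset++ is a separate read and write, and data and bufferOffset are read at different moments, so two threads can return the same byte or index into a buffer that was just swapped. One possible fix, sketched below, is to guard both fields with a single lock. BufferReader, its constructor, and the -1 end-of-buffer convention are assumptions made for illustration, not code from the patch.

```java
// Hypothetical class for illustration only; not code from the patch.
class BufferReader {
    // Both fields are guarded by "this". volatile adds nothing once every
    // access happens inside a synchronized method.
    private final byte[] data;
    private int bufferOffset = 0;

    BufferReader(byte[] data) {
        this.data = data;
    }

    /** Returns the next byte as a value in 0..255, or -1 at end of buffer. */
    synchronized int read() {
        if (bufferOffset >= data.length) {
            return -1;
        }
        // The index read, the increment, and the array access now happen
        // as one unit with respect to other callers of read().
        return data[bufferOffset++] & 0xff;
    }
}
```

If the stream is only ever used by one thread, the alternative is to drop both synchronized and volatile and document the class as not thread safe.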

The argument "we probably don't need thread safety anyways" implies you should 
just remove all synchronization.  If it is not needed, it only hurts performance.

If I'm wrong on any of this please call it out.  Thank you.




> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, 
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch, 
> HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc. to use ADL store as 
> input or output.
>  
> ADL is ultra-high capacity, optimized for massive throughput with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
