[ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-12666:
-----------------------------------
    Attachment: HADOOP-12666-012.patch

CachedRefreshTokenBasedAccessTokenProvider
- Since the AccessTokenProvider is only created by reflection, the Timer constructor is for testing and does not require an override in this subclass
- The static instance should be final and created during class initialization, but...
- {{ConfRefreshTokenBasedAccessTokenProvider}} is not threadsafe. {{setConf}} will update the static instance, which is shared by every instance of {{CachedRTBATP}}, without synchronization. This could cause undefined behavior. Is the intent to pool clients with the same parameters? Would it make sense to add a small cache (v12)?

PrivateCachedRefreshTokenBasedAccessTokenProvider
- The override doesn't seem to serve a purpose. Since it's a workaround, adding audience/visibility annotations (HADOOP-5073) would emphasize that this is temporary.

PrivateAzureDataLakeFileSystem
- catching {{ArrayIndexOutOfBoundsException}} instead of performing proper bounds checking in {{BufferManager::get}} is not efficient:
{code:title=PrivateAzureDataLakeFileSystem.java}
synchronized (BufferManager.getLock()) {
  if (bm.hasData(fsPath.toString(), fileOffset, len)) {
    try {
      bm.get(data, fileOffset);
      validDataHoldingSize = data.length;
      currentFileOffset = fileOffset;
    } catch (ArrayIndexOutOfBoundsException e) {
      fetchDataOverNetwork = true;
    }
  } else {
    fetchDataOverNetwork = true;
  }
}
{code}
{code:title=BufferManager.java}
void get(byte[] data, long offset) {
  System.arraycopy(buffer.data, (int) (offset - buffer.offset), data, 0, data.length);
}
{code}

The BufferManager/PrivateAzureDataLakeFileSystem synchronization is unorthodox, and verifying its correctness is not straightforward. Layering that complexity on top of the readahead logic without simplifying abstractions makes it very difficult to review.
I hope subsequent revisions will replace this code with a clearer model, because the current code will be very difficult to maintain.

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf,
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch,
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch,
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-010.patch,
> HADOOP-12666-011.patch, HADOOP-12666-012.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as
> input or output.
>
> ADL is ultra-high capacity, optimized for massive throughput, with rich
> management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/
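
To make the bounds-checking point from the review comment concrete, here is a minimal sketch of a check that could replace the {{ArrayIndexOutOfBoundsException}} catch. It is not taken from any HADOOP-12666 patch: the class {{ReadBuffer}} and the methods {{covers}} and {{tryGet}} are hypothetical names, and only the {{data}}/{{offset}} fields and the {{System.arraycopy}} call mirror the quoted {{BufferManager::get}}. It assumes file offsets are non-negative.

```java
// Hypothetical stand-in for the buffer held by BufferManager. Only the
// data/offset fields come from the quoted snippet; the rest is assumed.
final class ReadBuffer {
    final byte[] data;
    final long offset; // file offset of data[0], assumed non-negative

    ReadBuffer(byte[] data, long offset) {
        this.data = data;
        this.offset = offset;
    }

    /** True only if [fileOffset, fileOffset + len) lies entirely inside the buffer. */
    boolean covers(long fileOffset, int len) {
        if (len < 0 || fileOffset < offset) {
            return false;
        }
        long start = fileOffset - offset;
        // Compare in long arithmetic so a large offset cannot wrap an int.
        return start <= data.length && len <= data.length - start;
    }

    /**
     * Copies dest.length bytes starting at fileOffset into dest.
     * Returns false, leaving dest untouched, when the range is not buffered,
     * so the caller can fall back to a network read without an exception.
     */
    boolean tryGet(byte[] dest, long fileOffset) {
        if (!covers(fileOffset, dest.length)) {
            return false;
        }
        // Safe cast: covers() guarantees the difference is at most data.length.
        System.arraycopy(data, (int) (fileOffset - offset), dest, 0, dest.length);
        return true;
    }
}
```

With a method of this shape, the caller in the quoted {{PrivateAzureDataLakeFileSystem}} block could set {{fetchDataOverNetwork}} from the boolean result instead of using a try/catch, which keeps the miss path out of exception handling.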