[ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162960#comment-15162960 ]
Vishwajeet Dusane commented on HADOOP-12666: -------------------------------------------- Extracted comments out * AF>This class seems to have serious issues that need addressing: 1. Local race conditions in caller PrivateAzureDataLakeFileSystem [Vishwajeet] - Already explained in the previous replies to comments from [~cnauroth] and [~eddyxu]. In the worst case, there would be a cache miss and this would not break any functionality. 2. No mechanism for cache invalidation across nodes in the cluster. [Vishwajeet] - Scope of cache is per process. Invalidation across nodes would be taken care as part of the distributed cache mechanism in the future. * AF>Update comment? This uses "adl" scheme, right? [Vishwajeet] - Yes * AF>What is this class used for? I didn't see any uses. [Vishwajeet] - No longer required. This will is removed. * AF>Care to comment why this is in the ..hdfs.web package instead of fs.adl? [Vishwajeet] Already explained in the previous comments. In short, requirement to use protected functionality within `WebHdfsFileSystem` implementation to disable redirect operation, Access Runner class for configuration.. etc. * AF>? [Vishwajeet] Comment was trimmed, updated again. * AF>Due to the bug or due to the fix? The fix was merged in 2.8.0, right? * AF>I'm not understanding this last sentence, can you explain? [Vishwajeet] Not needed anymore since homedirectory would constructed locally and no back-end call would be necessary. Updated comment accordingly. * AF>Is this a race condition? thread 1> getFileStatus(), cache miss super.getStatus -> s1 cache.get() -> null thread 2> delete() cache.clear() thread 1> cache.put(s1) Maybe provide an atomic putIfAbsent() for FileStatusCacheManager. You can synchronize on the underlying map object I believe (see Collections.synchronizedMap()). [Vishwajeet] - Already using `syncMap = Collections.synchronizedMap(map);` We are aware of the limitation of the current implementation. Cache is short lived and would not be persistent effect. Is there a real user scenario where such issue can surface frequently ? We have recommended to turn off the `FileStatus` cache feature in case of misbehavior. * AF> Seems like there is a less-likely race condition here. (f is replaced by a directory after checking fs.isFile()) [Vishwajeet] Why there would be race condition? * AF>Similar pattern of get/mutate non-atomically repeats here and below. [Vishwajeet] Could you please elaborate the issue ? * AF>typo [Vishwajeet] Thanks, corrected. * AF>Did you guys consider handling this as transparently reconnecting, instead of doing separate connections for each op? Seems like performance would be a lot better? [Vishwajeet] Exactly the same. Since transparent re-connection would require to pass new offset and length value again which is separate op. HTTP persistent connection ensures the same socket is reused so no impact on the performance as well. We have done enough perf test to ensure. * AF>I'd expect you to want a connection per-thread, instead of per-op. [Vishwajeet] Connection per-op * AF> This case could use some perf optimization. e.g. Three calls to get system time. + return fin; + } [Vishwajeet] Agree. Corrected. * AF> How about adding ADLLogger.logWithTimestamp(). That way, if the logger is disabled, you don't keep getting system time. [Vishwajeet] Removed system.currentTimeinMs in case debug or perf flags are OFF. * AF> Redundant check of isLogEnabled() + ADLLogger.log("getFileBlockLocations } [Vishwajeet] Corrected * AF>Just use "name" twice instead of defining host? [Vishwajeet] Mainly for readability on block location computation so would not change that. * AF> Why the runtime check of a compile-time constant? How about just add a comment near the definition "must be non-zero" + throw new IllegalArgumentException( + "The block size for the given file is not a positive number} [Vishwajeet] Removed. * AF> Redundant check of isLogEnabled() + ADLLogger.log("getFileBlockLocations } [Vishwajeet] Removed * AF>Formatting. Missing newline above. [Vishwajeet] Using Apache.xml in intellij for formatting per apache guideline. Will look at it later in rulset defined in Apache.xml and raise JIRA accordingly. * AF>Why volatile here? Needs comments. I have a feeling this is wrong and you need some synchronized blocks below instead. [Vishwajeet] - We removed synchronization block for performance reason and used Volatile for specific variables which require synchronization. Problem could be when same FsInputSteam instance being used across threads to read from the same file. Couple of variable does not require to be volatile though, updated code. * AF>Probably unique. What is impact if this collides? Why not use a random java UUID? [Vishwajeet] - For performance reason we have removed UUID usage. Usage is for telemetry purpose. It is used with other parameters to identity the request. For the overlapped value, impact is none. * AF>this is not thread safe, but looking at FSInputStream, it appears read() is supposed to be. Seems this class is racy in general? [Vishwajeet] - Assumption is FsInputSteam instance API would not be invoked from multiple threads. Are you aware of such condition which we might encounter? * AF>I didn't have time to review this part thoroughly.. Can take better look next round. [Vishwajeet] Looking forward for your comments. * AF>Is this shutdown() needed? [Vishwajeet] - Do you see issue any issue here ? * AF> Curious, why don't you care about timeout expiring? Can't you just remove this whole try-catch block? The futures are what are ensuring tasks finish below, no? + } [Vishwajeet] - Decision is to have time bound operation and if not complete in the duration then fail fast so that application can retry. * AF>Add function comment w/ param and @return explanation? [Vishwajeet] - Done * AF>Can you explain what this openid does? - [Vishwajeet] To be used by ADL back-end for telemetry. * AF> humm.. we expect in.read() to return 0, and then return -1? Also, you can remove the string concatenation in the exception message. + } [Vishwajeet] We encountered this condition when requested n number of bytes match the available n number of bytes however available bytes are (n+1), where +1 is end of network steam marker. Mainly for reusing socket connection. if end of network steam marker is not read then next urlConnection instance re initiate new socket connection to back-end server. * AF> As noted before, using filesystem types to achieve logging tags is awkward. [Vishwajeet] Additional instrumentation is done in this class for capture perf data hence the separate filesystem exposed. We need to rework on that, i will file separate JIRA for the same. * AF>Again, need to put these in core-default.xml with documentation and [Vishwajeet] Yes, Will add it as part of the patch. All the configuration defined part of this patch are already lowered case in Path 06, do let me know if you have found any discrepancy? > Support Microsoft Azure Data Lake - as a file system in Hadoop > -------------------------------------------------------------- > > Key: HADOOP-12666 > URL: https://issues.apache.org/jira/browse/HADOOP-12666 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, tools > Reporter: Vishwajeet Dusane > Assignee: Vishwajeet Dusane > Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, > HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-006.patch, > HADOOP-12666-1.patch > > Original Estimate: 336h > Time Spent: 336h > Remaining Estimate: 0h > > h2. Description > This JIRA describes a new file system implementation for accessing Microsoft > Azure Data Lake Store (ADL) from within Hadoop. This would enable existing > Hadoop applications such has MR, HIVE, Hbase etc.., to use ADL store as > input or output. > > ADL is ultra-high capacity, Optimized for massive throughput with rich > management and security features. More details available at > https://azure.microsoft.com/en-us/services/data-lake-store/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)