[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906663#comment-15906663
 ] 

Manjunath Anand commented on HADOOP-13726:
------------------------------------------

Thanks [~cnauroth] for your inputs. After looking at the points you raised, I 
tried a few code samples to understand more about {{computeIfAbsent}} and 
made the observations below (please refer to the code I shared in my comment 
above):
1) If two equal keys (and therefore the same hash code) are passed to 
computeIfAbsent concurrently, one call waits for the other to complete. If the 
keys have different hash codes, the calls do not block each other in my tests. 
2) From the test results in point 1, I infer that computeIfAbsent should 
handle the concern you raised below reasonably well:
{quote} then only threads attempting to access s3a://my-bucket get blocked. 
Threads accessing a different FileSystem, such as hdfs://mylocalcluster can 
still make progress.{quote}
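
A minimal sketch of the kind of test described in point 1, using only 
java.util.concurrent. The class name, the slowInit helper and the string 
values are hypothetical stand-ins for illustration, not code from the 
FileSystem cache. Note that ConcurrentHashMap locks per hash bin rather than 
per hash code, so two keys with different hash codes can still occasionally 
contend if they land in the same bin; the sketch therefore only checks that the 
initializer runs once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfAbsentDemo {

  /** Simulates a slow FileSystem initialization (hypothetical stand-in). */
  static String slowInit(String key, AtomicInteger initCount) {
    initCount.incrementAndGet();
    try {
      Thread.sleep(200); // pretend to open connections, authenticate, etc.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "fs-for-" + key;
  }

  /**
   * Starts the given number of threads, all requesting the same key, and
   * returns how many times the initializer actually ran.
   */
  static int countInitializations(String key, int threads) {
    ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    AtomicInteger initCount = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        // Callers for the same key wait here until the first one finishes.
        cache.computeIfAbsent(key, k -> slowInit(k, initCount));
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("worker threads did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return initCount.get();
  }

  public static void main(String[] args) {
    System.out.println("initializations for s3a://my-bucket: "
        + countInitializations("s3a://my-bucket", 4));
  }
}
```

Because computeIfAbsent applies the mapping function at most once per key, 
the count stays at one no matter how many threads race on the same key.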

I also observed one major benefit of using {{computeIfAbsent}} in connection 
with your point, which I am quoting below:
{quote}FileSystem initialization is neither short nor simple, involving things 
like network connections and authentication, all of which can suffer 
problematic failure modes like timeouts. {quote}
What I observed during testing is that if multiple threads try to create a 
FileSystem for the same key and one thread fails, then with computeIfAbsent 
the next waiting thread runs the computation, and this continues until the key 
has a non-null value in the map. So if the first thread's attempt to create a 
filesystem times out, either the second concurrent thread or any subsequent 
thread can retry and see whether it succeeds.
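
The retry behaviour can be illustrated with a small sketch. It runs the two 
attempts sequentially rather than concurrently so that the outcome is 
deterministic, and flakyInit is a hypothetical initializer that simulates a 
timeout on the first attempt. It relies on the documented behaviour of 
ConcurrentHashMap.computeIfAbsent: if the mapping function throws, the 
exception is rethrown and no mapping is recorded.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryAfterFailureDemo {

  /** Hypothetical initializer: fails on the first attempt, succeeds afterwards. */
  static String flakyInit(String key, AtomicInteger attempts) {
    if (attempts.incrementAndGet() == 1) {
      throw new IllegalStateException("simulated timeout for " + key);
    }
    return "fs-for-" + key;
  }

  /** Returns the cached value, running the initializer only if the key is absent. */
  static String getOrCreate(ConcurrentMap<String, String> cache, String key,
      AtomicInteger attempts) {
    return cache.computeIfAbsent(key, k -> flakyInit(k, attempts));
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    AtomicInteger attempts = new AtomicInteger();
    String key = "s3a://my-bucket";
    try {
      getOrCreate(cache, key, attempts);
    } catch (IllegalStateException e) {
      // The failure reaches the caller and leaves no entry behind.
      System.out.println("first attempt failed: " + e.getMessage());
    }
    System.out.println("cached after failure? " + cache.containsKey(key));
    // The next caller finds the key absent and runs the initializer again.
    System.out.println("second attempt: " + getOrCreate(cache, key, attempts));
  }
}
```

Once an attempt succeeds, later callers get the cached value and the 
initializer is not run again.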

I tried evaluating the suggestion of using {{LoadingCache}}/{{CacheLoader}}, 
but I ran into a problem: with the overridden {{load(K key)}}, how do we pass 
the URI and conf from the getInternal method? The value for the map in 
getInternal is computed not from the Key but from the URI and conf. There may 
be a workaround for this, but my understanding is that it may involve more 
code changes and redesign. I hope I did not miss anything here, but if I did, 
please suggest.
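
For contrast, here is a rough sketch of why computeIfAbsent does not hit this 
problem: the mapping function is a lambda created inside the method, so it can 
capture the URI and conf arguments directly. Everything in it is a hypothetical 
stand-in (the Key class, the createFileSystem helper, a Map used in place of 
the configuration object); it is not the actual FileSystem cache code.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CaptureArgsDemo {

  /** Hypothetical cache key built from the scheme and authority only. */
  static final class Key {
    final String scheme;
    final String authority;

    Key(URI uri) {
      this.scheme = uri.getScheme();
      this.authority = uri.getAuthority();
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return Objects.equals(scheme, other.scheme)
          && Objects.equals(authority, other.authority);
    }

    @Override
    public int hashCode() {
      return Objects.hash(scheme, authority);
    }
  }

  static final ConcurrentMap<Key, String> CACHE = new ConcurrentHashMap<>();

  /** Hypothetical factory: the value depends on uri and conf, not on the key. */
  static String createFileSystem(URI uri, Map<String, String> conf) {
    return uri + " user=" + conf.getOrDefault("user", "unknown");
  }

  static String getInternal(URI uri, Map<String, String> conf) {
    Key key = new Key(uri);
    // The lambda ignores the key it is given and uses the captured uri and conf.
    return CACHE.computeIfAbsent(key, k -> createFileSystem(uri, conf));
  }

  public static void main(String[] args) {
    System.out.println(getInternal(URI.create("s3a://my-bucket/a"),
        Collections.singletonMap("user", "alice")));
    // Same scheme and authority, so the cached value from the first call is reused.
    System.out.println(getInternal(URI.create("s3a://my-bucket/b"),
        Collections.singletonMap("user", "bob")));
  }
}
```

In this sketch the URI and conf of whichever caller creates the entry are the 
ones used; later callers with an equal key get the cached value regardless of 
the arguments they pass.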

I would like to work on this (if you don't mind), but only after you are fine 
with the above observations and suggest which approach I should go ahead with, 
or point out anything further that I should explore.

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13726
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13726
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Nauroth
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.


