[ 
https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827072#comment-16827072
 ] 

Ayush Saxena commented on HDFS-14440:
-------------------------------------

I did some analysis in my test setup for just this part of the code to confirm 
the number of RPCs and the time taken:
 * For a successful write, the number of RPCs stayed the same for both 
approaches; only the time improved, for the obvious reasons we discussed above.
 * For an unsuccessful write:
 ** For empty files: the minimum RPC count I got was 2 with order HASH and 4 
with the new approach, and the checking time was halved with the new approach. 
But for order RANDOM, the optimization you talked about (the first location 
always being hit) didn't hold up for me; with the old approach the RPC count 
and time varied randomly, while with the new approach they stayed constant and 
the same as in the other cases.
 ** For non-empty files: the time was the same with HASH order; for the other 
orders it was higher and varied.

The time difference for the method execution depends on the network state, and 
I can't post the numbers from prod here. It is fairly predictable 
mathematically anyway.

 

Let me know if any doubts remain. Moreover, I don't see any threat to 
functionality from this change.
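To illustrate the sequential-vs-concurrent check under discussion, here is a 
minimal sketch. Everything in it is illustrative, not the actual RBF code: 
existsIn() stands in for the per-subcluster getBlockLocation() RPC, the 
subcluster names are made up, and the thread-pool wiring is just one way to 
fan the calls out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentExistenceCheck {

  // Hypothetical stand-in for the per-subcluster getBlockLocation() RPC.
  // In this sketch only "ns1" holds the file.
  static boolean existsIn(String subcluster, String path) {
    return subcluster.equals("ns1") && path.equals("/dir/file");
  }

  // Old approach: one RPC at a time, so total latency grows with the
  // number of subclusters (worst when the file exists nowhere).
  static String findSequential(List<String> subclusters, String path) {
    for (String ns : subclusters) {
      if (existsIn(ns, path)) {
        return ns;
      }
    }
    return null;
  }

  // New approach: fan the RPCs out concurrently and remember which
  // remote location answered non-null, so no follow-up getFileInfo()
  // round to all locations is needed to find where the file lives.
  static String findConcurrent(List<String> subclusters, String path)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(subclusters.size());
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String ns : subclusters) {
        futures.add(pool.submit(() -> existsIn(ns, path) ? ns : null));
      }
      for (Future<String> f : futures) {
        String ns = f.get();
        if (ns != null) {
          return ns;
        }
      }
      return null;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> subclusters = Arrays.asList("ns0", "ns1", "ns2");
    System.out.println(findSequential(subclusters, "/dir/file"));
    System.out.println(findConcurrent(subclusters, "/dir/file"));
  }
}
```

This also matches the RPC-count observation above: the concurrent version 
always issues one call per subcluster (constant count and time), while the 
sequential version's count depends on where in the order the file is found.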

> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14440
>                 URL: https://issues.apache.org/jira/browse/HDFS-14440
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already 
> exists in one of the subclusters, for which we use the existing 
> getBlockLocation() API, which by default is a sequential call.
> In the common scenario where the file needs to be created, each subcluster is 
> checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found but its last block is null, we 
> currently need to call getFileInfo on all the locations to find where the 
> file exists. This too can be avoided by using a concurrent call, since we 
> already have the remoteLocation for which getBlockLocation returned a 
> non-null entry.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
