[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.

JIRA Fri, 26 Apr 2019 09:58:24 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827121#comment-16827121
 ]


Íñigo Goiri commented on HDFS-14440:
------------------------------------

Do you mind putting the results in a table? It is hard for me to tell which 
result corresponds to which case.
I guess one compromise would be to use the old approach for HASH based mount 
points and the new one for SPACE?
BTW, the use case for RANDOM is basically read load balancing.
We have files that are read from thousands of containers and we put those files 
in all subclusters and read from a random one.
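
A minimal sketch of the compromise suggested above, assuming "the old approach" 
means the sequential existence check and "the new one" means the concurrent 
check from the issue description. The class, enums, and method names are 
hypothetical and not part of the RBF code base; the RANDOM branch is an 
assumption, since the thread does not settle it.

```java
class ExistenceCheckPolicy {

  /** Hypothetical stand-in for the mount point orders named in this thread. */
  enum Order { HASH, RANDOM, SPACE }

  /** How the Router would probe the subclusters before a create. */
  enum Strategy { SEQUENTIAL, CONCURRENT }

  /** Picks the probing strategy from the mount point order. */
  static Strategy strategyFor(Order order) {
    switch (order) {
      case SPACE:
        // Proposed: use the new concurrent check for SPACE mount points.
        return Strategy.CONCURRENT;
      case HASH:
        // Proposed: keep the old sequential check for HASH mount points.
        return Strategy.SEQUENTIAL;
      default:
        // Assumption: anything not discussed keeps the old behaviour.
        return Strategy.SEQUENTIAL;
    }
  }

  public static void main(String[] args) {
    for (Order order : Order.values()) {
      System.out.println(order + " -> " + strategyFor(order));
    }
  }
}
```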

Interesting observation on {{getBlockLocations()}} versus {{getFileInfo()}}.
I guess it makes sense, as the first one actually requires going through the 
block manager while the other only touches the namespace.
We should consider this.


> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14440
>                 URL: https://issues.apache.org/jira/browse/HDFS-14440
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already 
> exists in one of the subclusters. For this we use the existing 
> getBlockLocations() API, which by default is a sequential call.
> In the ideal scenario, where the file needs to be created, each subcluster is 
> checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found and the last block is null, we 
> need to call getFileInfo on all the locations to find the one where the 
> file exists. This can also be avoided by using ConcurrentCall, since we 
> would already have the remoteLocation for which getBlockLocations returned a 
> non-null entry.
>  
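
The idea in the description can be sketched roughly as follows. This is not the 
patch and does not use the Router's RPC client: the class, the method 
`findExisting`, and the `hasFile` predicate (standing in for a per-subcluster 
existence check) are hypothetical names chosen for illustration. It only shows 
that probing all subclusters concurrently also tells us which location 
answered, so no second pass over the locations is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class ConcurrentExistenceCheck {

  /**
   * Probes every subcluster concurrently and returns the first one, in list
   * order, that reports the file as present. Empty if none has it.
   */
  static Optional<String> findExisting(
      List<String> subclusters, Predicate<String> hasFile) throws Exception {
    if (subclusters.isEmpty()) {
      return Optional.empty();
    }
    ExecutorService pool = Executors.newFixedThreadPool(subclusters.size());
    try {
      List<Callable<Boolean>> calls = new ArrayList<>();
      for (String ns : subclusters) {
        calls.add(() -> hasFile.test(ns));
      }
      // invokeAll blocks until all probes finish and keeps the input order,
      // so results.get(i) belongs to subclusters.get(i).
      List<Future<Boolean>> results = pool.invokeAll(calls);
      for (int i = 0; i < results.size(); i++) {
        if (results.get(i).get()) {
          return Optional.of(subclusters.get(i));
        }
      }
      return Optional.empty();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> subclusters = List.of("ns0", "ns1", "ns2");
    // Pretend only ns1 already holds the file.
    System.out.println(findExisting(subclusters, ns -> ns.equals("ns1")));
  }
}
```

The total wait is bounded by the slowest subcluster rather than the sum of all 
of them, at the cost of always contacting every subcluster.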



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
