[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.

JIRA Fri, 19 Apr 2019 09:30:25 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822031#comment-16822031
 ]


Íñigo Goiri commented on HDFS-14440:
------------------------------------

I think we should try to exploit hashing instead of invoking every subcluster.
There are two cases here:
* Old approach:
** The file already exists: we check the subcluster where we expect it to be. 
This is 1 RPC call.
** The file is new: we go over all the subclusters one by one. This is N RPC 
calls taking N round trips.
* The approach in [^HDFS-14440-HDFS-13891-01.patch]:
** The file already exists: we check everywhere. This is N RPC calls taking 1 
round trip.
** The file is new: we check everywhere. This is N RPC calls taking 1 round 
trip.

I personally prefer the old approach.
The new one reduces the time for an operation that is not that critical.
There could even be a hybrid where we first check the subcluster we expect and 
then, on a miss, check the rest concurrently.
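
A minimal sketch of that hybrid, not taken from the patch: the class name 
HybridLookup, the locate() method and the exists predicate are all 
hypothetical, and the predicate merely stands in for one existence-check RPC 
to a subcluster. It assumes the hashed subcluster has already been resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

/** Hypothetical illustration only; not the Router code. */
public class HybridLookup {

  /**
   * Returns the subcluster that holds the file, or empty if none does.
   *
   * @param hashedFirst subcluster where hashing says the file should be
   * @param subclusters all candidate subclusters
   * @param exists stand-in for one existence-check RPC to a subcluster
   */
  public static Optional<String> locate(String hashedFirst,
      List<String> subclusters, Predicate<String> exists) {
    // Step 1: a single RPC to the subcluster where we expect the file.
    if (exists.test(hashedFirst)) {
      return Optional.of(hashedFirst);
    }
    // Step 2: on a miss, query the remaining subclusters concurrently.
    List<String> rest = new ArrayList<>();
    List<CompletableFuture<Boolean>> futures = new ArrayList<>();
    for (String ns : subclusters) {
      if (!ns.equals(hashedFirst)) {
        rest.add(ns);
        futures.add(CompletableFuture.supplyAsync(() -> exists.test(ns)));
      }
    }
    // Wait for the answers and return the first subcluster that has the file.
    for (int i = 0; i < rest.size(); i++) {
      if (futures.get(i).join()) {
        return Optional.of(rest.get(i));
      }
    }
    return Optional.empty();
  }
}
```

With this shape an existing file in the expected subcluster still costs 1 RPC, 
while a new file costs N RPCs in roughly 2 round trips instead of N.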

Are you seeing this as a bottleneck?


> RBF: Optimize the file write process in case of multiple destinations.
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14440
>                 URL: https://issues.apache.org/jira/browse/HDFS-14440
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already 
> exists in one of the subclusters. For this we use the existing 
> getBlockLocation() API, which is by default a sequential call.
> In the ideal scenario, where the file needs to be created, each subcluster is 
> checked sequentially; this can be done concurrently to save time.
> In the other case, where the file is found and the last block is null, we 
> need to call getFileInfo on all the locations to find the location where the 
> file exists. This can also be avoided by using ConcurrentCall, since we 
> would then have the remoteLocation for which getBlockLocation returned a 
> non-null entry.
>  



