[ https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rakesh R updated HDFS-13166:
----------------------------
Description:

Presently, {{#getLiveDatanodeStorageReport()}} is invoked for every file, repeating the same expensive computation each time. This Jira sub-task is to discuss and implement a caching mechanism that reduces the number of these calls. We could also define a configurable refresh interval and periodically refresh the DN cache by fetching the latest {{#getLiveDatanodeStorageReport}} on that interval.

The following comment is taken from HDFS-10285 (Comment-7):

{quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport is already a very bad method that should be avoided for anything but jmx – even then it’s a concern. I eliminated calls to it years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty large and tagging all the storage reports is not going to be cheap.

verifyTargetDatanodeHasSpaceForScheduling – does it really need the namesystem lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap?

Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned earlier, this is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world. Then it gets another datanode report to determine the number of live nodes to decide if it should sleep before processing the next path. The number of nodes from the prior cached view of the world should suffice.{quote}

was:
Presently {{#getLiveDatanodeStorageReport}} is fetched for every file and does the computation. This task is to discuss and implement a cache mechanism to minimize the number of function calls. Probably, we could define a configurable refresh interval and periodically refresh the DN cache by fetching latest {{#getLiveDatanodeStorageReport}}.
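The refresh-interval idea above can be sketched as a small wrapper that holds the last fetched report and renews it on a timer, so per-file processing reads a possibly-stale ("fuzzy") snapshot instead of calling into the NameNode. This is a minimal illustration, not the eventual SPS implementation; the class name, the `Supplier`-based fetcher, and the interval parameter are all hypothetical stand-ins for the real Hadoop types:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: caches the result of an expensive "live datanode
 * report" fetch and refreshes it on a configurable interval, so callers
 * read the cached snapshot instead of issuing an RPC per file.
 */
class CachedDatanodeReport<T> {
    private final AtomicReference<T> snapshot = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "dn-report-refresher");
                t.setDaemon(true);  // do not keep the JVM alive
                return t;
            });

    CachedDatanodeReport(Supplier<T> fetcher, long refreshIntervalMs) {
        snapshot.set(fetcher.get());  // initial synchronous fill
        scheduler.scheduleWithFixedDelay(
                () -> snapshot.set(fetcher.get()),  // periodic refresh
                refreshIntervalMs, refreshIntervalMs, TimeUnit.MILLISECONDS);
    }

    /** Possibly-stale view of the world; never triggers a fetch. */
    T get() {
        return snapshot.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

A caller would construct this once with the refresh interval read from configuration, then call `get()` for every file; staleness is bounded by the interval, which matches the "fuzzy/cached state of the world" the review comment asks for.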
> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly
> getLiveDatanodeStorageReport() calls
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13166
>                 URL: https://issues.apache.org/jira/browse/HDFS-13166
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
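The lock-narrowing suggestion in the quoted review comment ("Can't DatanodeDescriptor#chooseStorage4Block synchronize on its storageMap?") can be illustrated with a toy class that guards only its own storage map, so a space check never needs a global lock. This is a sketch of the idea only; the class and method names are illustrative and do not reflect the real DatanodeDescriptor API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of fine-grained locking: the per-datanode storage
 * map is guarded by its own monitor, so checking for free space does not
 * require holding any wider (namesystem-style) lock.
 */
class DatanodeInfoSketch {
    // storageId -> remaining bytes; guarded by its own monitor below
    private final Map<String, Long> storageRemaining = new HashMap<>();

    void updateStorage(String storageId, long remainingBytes) {
        synchronized (storageRemaining) {  // lock only this map
            storageRemaining.put(storageId, remainingBytes);
        }
    }

    /** True if any storage can hold blockSize bytes; only the map is locked. */
    boolean hasSpaceFor(long blockSize) {
        synchronized (storageRemaining) {
            for (long remaining : storageRemaining.values()) {
                if (remaining >= blockSize) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

The design point is that heartbeat updates and scheduling checks contend only on this one map's monitor, so a slow check (or a DNS hiccup elsewhere) cannot stall unrelated namesystem operations.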