[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039018#comment-17039018 ]
Stephen O'Donnell commented on HDFS-15171:
------------------------------------------

[~zhuqi] There are a few parts to this.

First, the disk usage is refreshed every 10 minutes by a thread in CachingGetSpaceUsed; however, I had not realised it does not persist the newly calculated value to the cache file. You are correct: it does not, and only the shutdown hook does. The shutdown-time save can also take some time to complete if the disk is large. I wonder if we could use the existing refresh thread in CachingGetSpaceUsed, perhaps by passing a callback (or something like the existing shutdown hook) so it can save the cache file each time it runs?

The next problem is that BlockPoolSlice.loadDfsUsed() will only load the cache file if the mtime on the file is less than "dfs.datanode.cached-dfsused.check.interval.ms" old. This defaults to 10 minutes. This causes another problem: if you shut down a DN for more than 10 minutes, even if it shut down cleanly and saved the cache file, it will not read the cache file on startup, as the file is over 10 minutes old. Seeing as the disk usage will be refreshed within 10 minutes of the DN starting anyway, I think 10 minutes for dfs.datanode.cached-dfsused.check.interval.ms is probably too small, and that default could do with being a bit higher. If we could ensure the cache file was saved approximately every 10 minutes by the refresh thread, you could argue that the DN should always use the cache file if it is there, as it should be reasonably up to date anyway.

I encountered a problem like this recently with a few DNs which were taking a very long time to start up, and we worked around it by adjusting the mtime on the cache file to get them started. That was OK for a one-off, but perhaps we can do better in this Jira.

> Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15171
>                 URL: https://issues.apache.org/jira/browse/HDFS-15171
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>
> There are 30 storage dirs per datanode in our production cluster, so a restart can take a very long time, because sometimes the datanode does not shut down gracefully. Currently only the datanode graceful-shutdown hook and the BlockPoolSlice shutdown call the saveDfsUsed function, so a restarted datanode sometimes cannot reuse the dfsUsed cache. I think we could add a thread to periodically call the saveDfsUsed function.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
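To make the two ideas above concrete, here is a minimal sketch (these are not the real Hadoop classes; DfsUsedCacheSketch, SaveCallback, refreshOnce and loadDfsUsed are hypothetical names for illustration): the refresh cycle persists the freshly computed usage through a callback each time it runs, and the loader only trusts the cache file when its mtime is younger than the configured interval, mirroring dfs.datanode.cached-dfsused.check.interval.ms, otherwise falling back to a full scan.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the actual CachingGetSpaceUsed/BlockPoolSlice code.
public class DfsUsedCacheSketch {

    // Mirrors dfs.datanode.cached-dfsused.check.interval.ms (default: 10 minutes).
    static final long CACHED_DFSUSED_CHECK_INTERVAL_MS = 600_000L;

    // Callback the refresh thread would invoke to persist the value,
    // analogous to what the shutdown hook does today.
    interface SaveCallback {
        void save(long dfsUsed) throws IOException;
    }

    // One refresh cycle: recompute usage, then persist it immediately, so an
    // unclean shutdown loses at most one refresh interval of accuracy.
    static long refreshOnce(long computedUsage, SaveCallback saver) throws IOException {
        saver.save(computedUsage); // previously only done in the shutdown hook
        return computedUsage;
    }

    // Load the cached value only if the file was written recently enough;
    // otherwise fall back to the slow full disk scan.
    static long loadDfsUsed(Path cacheFile, long fallbackScanResult) throws IOException {
        if (Files.exists(cacheFile)) {
            long mtime = Files.getLastModifiedTime(cacheFile).toMillis();
            long age = System.currentTimeMillis() - mtime;
            if (age < CACHED_DFSUSED_CHECK_INTERVAL_MS) {
                return Long.parseLong(Files.readString(cacheFile).trim());
            }
        }
        return fallbackScanResult; // cache missing or stale: rescan the disk
    }
}
```

With a save on every refresh cycle, the cache file's mtime would normally be under 10 minutes old even after an unclean shutdown, which is what makes the argument that the DN could always trust an existing cache file on startup.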