[ 
https://issues.apache.org/jira/browse/HDFS-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun reassigned HDFS-14627:
-------------------------------

    Assignee: Yang Yun

> Improvements to make slow archive storage works on HDFS
> -------------------------------------------------------
>
>                 Key: HDFS-14627
>                 URL: https://issues.apache.org/jira/browse/HDFS-14627
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Yang Yun
>            Assignee: Yang Yun
>            Priority: Minor
>         Attachments: HDFS-14627.patch, 
> data_flow_between_datanode_and_aws_s3.jpg
>
>
> In our setup, we mount archival storage from a remote system. The write
> speed is about 20M/sec, the read speed is about 40M/sec, and ordinary file
> operations such as 'ls' are time-consuming.
> We added some improvements to make this kind of archive storage work in the
> current HDFS system.
> 1. Add a multiplier to the read/write timeout if the block is saved on
> archive storage.
> 2. Save the replica cache file of archive storage to another, faster disk so
> the datanode can restart quickly; the shutdownHook may not execute if the
> save takes too long.
> 3. Check the mounted file system before using mounted archive storage.
> 4. Reduce or avoid calls to DF while generating the heartbeat report for
> archive storage.
> 5. Add an option to skip archive blocks during decommission.
> 6. Use multiple threads to scan archive storage.
> 7. Check archive storage errors with a retry count.
> 8. Add an option to disable block scanning on archive storage.
> 9. Sleep for one heartbeat interval if there are too many differences when
> calling checkAndUpdate in DirectoryScanner.
> 10. An auto-service to scan the fsimage and set the storage policy of files
> according to policy.
> 11. An auto-service to call the mover to move blocks to the right storage.
> 12. Dedup files on remote storage if the storage is reliable.
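
Items 1 and 7 in the list above could look roughly like the following Java
sketch. It is an illustration only, not code from HDFS-14627.patch: the class
ArchiveStorageSketch, the enum StorageKind, and the methods
effectiveTimeoutMs and checkWithRetries are hypothetical names, and the
multiplier and retry count are assumed to come from configuration.

```java
import java.util.concurrent.Callable;

/** Hypothetical helpers; none of these names come from the actual patch. */
final class ArchiveStorageSketch {

    /** Simplified storage kinds for the sketch (HDFS defines more types). */
    enum StorageKind { DISK, ARCHIVE }

    /**
     * Item 1: scale the base read/write timeout when the block lives on
     * archive storage; other storage kinds keep the base timeout.
     */
    static long effectiveTimeoutMs(long baseTimeoutMs, StorageKind kind,
                                   int archiveMultiplier) {
        if (archiveMultiplier < 1) {
            throw new IllegalArgumentException("multiplier must be >= 1");
        }
        return kind == StorageKind.ARCHIVE
            ? Math.multiplyExact(baseTimeoutMs, (long) archiveMultiplier)
            : baseTimeoutMs;
    }

    /**
     * Item 7: run a health probe up to maxAttempts times and report the
     * storage as healthy as soon as one attempt succeeds. An exception
     * thrown by the probe counts as a failed attempt.
     */
    static boolean checkWithRetries(Callable<Boolean> probe, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(probe.call())) {
                    return true;
                }
            } catch (Exception e) {
                // Slow remote mounts can fail transiently; try again.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed values: 60 s base timeout, multiplier of 3 for archive.
        System.out.println(
            effectiveTimeoutMs(60_000L, StorageKind.ARCHIVE, 3));
        System.out.println(
            effectiveTimeoutMs(60_000L, StorageKind.DISK, 3));
    }
}
```

Retrying before declaring a failure means a single slow or failed probe on
the remote mount does not immediately mark the whole volume as bad. A real
implementation would probably also wait between attempts.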



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
