[ 
https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Molkov updated HDFS-854:
-------------------------------

    Attachment: HDFS-854.patch

Please have a look at the patch.

The problem we are trying to solve here is generating the first block report 
more quickly after a restart by scanning the volumes in parallel. Instead of 
scanning 12 TB of data sequentially, we scan 12 volumes of 1 TB each in 
parallel. Since disk I/O is dominated by latency, this makes generating the 
block report several times faster.
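
To illustrate the idea (this is only a minimal sketch, not the code in the 
attached patch), each volume can be handed to its own task in a fixed-size 
thread pool and the per-volume results merged afterwards. The class name 
ParallelVolumeScan, the methods scanVolume and scanAll, and the assumption 
that block files are named "blk_*" with ".meta" companions are all 
hypothetical and chosen for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not the DataNode code from HDFS-854.patch.
final class ParallelVolumeScan {

    // Recursively collects block files under one volume root.
    // Assumption: block files start with "blk_" and metadata files end in ".meta".
    static List<File> scanVolume(File dir) {
        List<File> found = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return found; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            String name = entry.getName();
            if (entry.isDirectory()) {
                found.addAll(scanVolume(entry));
            } else if (name.startsWith("blk_") && !name.endsWith(".meta")) {
                found.add(entry);
            }
        }
        return found;
    }

    // Scans every volume as a separate task; threads == 1 gives a serial scan.
    static List<File> scanAll(List<File> volumes, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<List<File>>> pending = new ArrayList<>();
            for (File volume : volumes) {
                Callable<List<File>> task = () -> scanVolume(volume);
                pending.add(pool.submit(task));
            }
            // Merge results in volume order so the output is deterministic.
            List<File> all = new ArrayList<>();
            for (Future<List<File>> result : pending) {
                all.addAll(result.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<File> volumes = new ArrayList<>();
        for (String path : args) {
            volumes.add(new File(path));
        }
        List<File> blocks = scanAll(volumes, Math.max(1, volumes.size()));
        System.out.println("Found " + blocks.size() + " block files");
    }
}
```

Calling scanAll with one thread and again with one thread per volume should 
return the same list, which makes it easy to compare the two modes.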

The test for this simply runs the directory scanner test twice: once with 
parallel execution and once without it.

> Datanode should scan devices in parallel to generate block report
> -----------------------------------------------------------------
>
>                 Key: HDFS-854
>                 URL: https://issues.apache.org/jira/browse/HDFS-854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-854.patch
>
>
> A Datanode should scan its disk devices in parallel so that the time to 
> generate a block report is reduced. This will reduce the startup time of a 
> cluster.
> A datanode has 12 disks (each of 1 TB) to store HDFS blocks. There is a total 
> of 150K blocks on these 12 disks. It takes the datanode up to 20 minutes to 
> scan these devices to generate the first block report.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
