[ 
https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803142#action_12803142
 ] 

Konstantin Shvachko commented on HDFS-898:
------------------------------------------

I ran experiments on several large and small images as a proof of concept. 
Here is the table. 
- The first row is the number of blocks in the file system. The largest I had 
is 40 million blocks. 
- The second row is the size of the largest segment free of block ids.
- The third row is the minimum segment size that we expect to find, which is 
calculated as the ratio 2 ^64^ / num_blocks.

I don't know how to right align numbers, so I used leading zeroes, hope it is 
not confusing.

| Number of blocks     | 40,509,569 | 31,959,139 | 241,777 | 178,278 | 148,035 |
| Largest segment size | 8,623,203,281,141 | 10,662,709,581,709 | 889,137,135,725,504 | 1,324,814,576,358,595 | 1,849,602,429,191,491 |
| Expected minimum     | 0,455,367,560,644 | 00,577,197,761,694 | 076,296,205,914,968 | 0,103,471,211,268,346 | 0,124,609,852,155,620 |
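
As a rough illustration of how the two computed rows could be obtained from a list of block ids, here is a minimal Python sketch. It is not the code used for the experiment: the function names are hypothetical, ids are treated as plain integers, and only holes between two adjacent used ids are considered (the ends of the id space are ignored).

```python
def expected_minimum(num_blocks: int, id_bits: int = 64) -> int:
    """Average spacing if num_blocks ids were spread evenly over the id space.

    With a 64-bit id space this is the ratio 2^64 / num_blocks.
    """
    return (1 << id_bits) // num_blocks


def largest_free_segment(block_ids):
    """Return (first_free_id, length) of the longest run of unused ids
    lying strictly between two adjacent used ids.

    Returns (None, 0) when there is no free id between any two used ids.
    """
    ids = sorted(set(block_ids))
    best_start, best_len = None, 0
    for lo, hi in zip(ids, ids[1:]):
        free = hi - lo - 1  # number of unused ids strictly between lo and hi
        if free > best_len:
            best_start, best_len = lo + 1, free
    return best_start, best_len
```

For 40,509,569 blocks, `expected_minimum` gives a value on the order of 4.55 x 10^11, the same magnitude as the first column of the table.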

We see that the selected segments are larger than the expected minimums and 
larger than 2 ^38^ = 274,877,906,944.
This speaks to the quality of the random generator, and it also projects a life 
span of more than 43 years for the first segment we choose.
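
The comparison can be checked mechanically against the numbers in the table. The sketch below is only a sanity check with hypothetical names. The allocation rate behind the 43-year figure is not stated in this comment, so `implied_rate` merely back-computes the average rate at which 2^38 ids would last 43 years, assuming 365-day years.

```python
# (number of blocks, largest segment size, expected minimum), copied from the table
TABLE = [
    (40_509_569, 8_623_203_281_141, 455_367_560_644),
    (31_959_139, 10_662_709_581_709, 577_197_761_694),
    (241_777, 889_137_135_725_504, 76_296_205_914_968),
    (178_278, 1_324_814_576_358_595, 103_471_211_268_346),
    (148_035, 1_849_602_429_191_491, 124_609_852_155_620),
]

THRESHOLD = 2 ** 38  # 274,877,906,944


def segments_large_enough(rows=TABLE, threshold=THRESHOLD) -> bool:
    """True if every largest segment exceeds its expected minimum and the threshold."""
    return all(
        largest > expected and largest > threshold
        for _, largest, expected in rows
    )


def implied_rate(ids: int = THRESHOLD, years: int = 43) -> float:
    """Average ids per second that would exhaust `ids` ids in `years` years.

    Assumption: 365-day years; the actual rate used for the estimate is not given here.
    """
    seconds = years * 365 * 24 * 3600
    return ids / seconds


if __name__ == "__main__":
    print(segments_large_enough())
    print(round(implied_rate(), 1))
```

Under these assumptions, a segment of 2^38 ids lasting 43 years corresponds to an average of roughly 200 new block ids per second.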

> Sequential generation of block ids
> ----------------------------------
>
>                 Key: HDFS-898
>                 URL: https://issues.apache.org/jira/browse/HDFS-898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.22.0
>
>
> This is a proposal to replace random generation of block ids with a 
> sequential generator in order to avoid block id reuse in the future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
