[
https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803142#action_12803142
]
Konstantin Shvachko commented on HDFS-898:
------------------------------------------
As a proof of concept, I ran some experiments on several large and small images.
The results are in the table below.
- The first line is the number of blocks in the file system. The largest I had
is 40 million blocks.
- The second line is the size of the largest hole free of block ids.
- The third line is the minimum segment size that we expect to find, which is
calculated as the ratio 2 ^64^ / num_blocks.
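As a rough illustration of how the second and third lines can be computed, here is a minimal sketch. It is not the code used for the experiment: the function names are hypothetical, block ids are treated as unsigned values in [0, 2^64), and holes are not allowed to wrap around the ends of the id space.

```python
ID_SPACE = 2 ** 64  # number of distinct 64-bit block ids


def largest_free_segment(block_ids, lo=0, hi=ID_SPACE):
    """Return the size of the largest run of unused ids in [lo, hi)."""
    best = 0
    prev = lo - 1  # sentinel just below the range
    for bid in sorted(set(block_ids)):
        # Unused ids strictly between the previous id and this one.
        best = max(best, bid - prev - 1)
        prev = bid
    # Unused ids after the last allocated id, up to hi - 1.
    return max(best, hi - prev - 1)


def expected_minimum(num_blocks):
    """Ratio 2^64 / num_blocks, rounded down to an integer."""
    return ID_SPACE // num_blocks
```

For example, `largest_free_segment([2, 10, 11], lo=0, hi=16)` returns 7, for the free ids 3 through 9. `expected_minimum` follows the ratio as worded above; the figures in the table agree with it in the leading digits but not exactly, so the original calculation may have differed in some detail.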
I don't know how to right-align numbers, so I used leading zeroes; I hope it is
not confusing.
| Number of blocks | 40,509,569 | 31,959,139 | 241,777 | 178,278 | 148,035 |
| Largest segment size | 8,623,203,281,141 | 10,662,709,581,709 | 889,137,135,725,504 | 1,324,814,576,358,595 | 1,849,602,429,191,491 |
| Expected minimum | 0,455,367,560,644 | 00,577,197,761,694 | 076,296,205,914,968 | 0,103,471,211,268,346 | 0,124,609,852,155,620 |
We see that the largest segments found are larger than the expected minimums and
larger than 2 ^38^ = 274,877,906,944.
This speaks to the quality of the random generator, and it also projects a life
span of more than 43 years for the first segment we choose.
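The comparison with 2^38 can be checked mechanically. The sketch below copies the segment sizes from the table and estimates how long each segment would last at a constant allocation rate. The rate of 200 ids per second is purely hypothetical: it is picked only because it makes 2^38 ids last roughly 43 years, and this comment does not state which rate was actually assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
THRESHOLD = 2 ** 38  # 274,877,906,944

# (number of blocks, largest segment size) pairs copied from the table.
MEASURED = [
    (40_509_569, 8_623_203_281_141),
    (31_959_139, 10_662_709_581_709),
    (241_777, 889_137_135_725_504),
    (178_278, 1_324_814_576_358_595),
    (148_035, 1_849_602_429_191_491),
]

# Hypothetical allocation rate in ids per second (an assumption, see above).
HYPOTHETICAL_RATE = 200


def years_to_exhaust(segment_size, ids_per_second):
    """Years until a segment is used up at a constant allocation rate."""
    return segment_size / (ids_per_second * SECONDS_PER_YEAR)


for num_blocks, segment in MEASURED:
    above = segment > THRESHOLD
    years = years_to_exhaust(segment, HYPOTHETICAL_RATE)
    print(f"{num_blocks:>12,}  above 2^38: {above}  ~{years:,.0f} years")
```

Since every measured segment is larger than 2^38, each one lasts longer than 2^38 ids would at the same rate, whatever that rate is.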
> Sequential generation of block ids
> ----------------------------------
>
> Key: HDFS-898
> URL: https://issues.apache.org/jira/browse/HDFS-898
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.20.1
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
>
> This is a proposal to replace random generation of block ids with a
> sequential generator in order to avoid block id reuse in the future.