[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803142#action_12803142 ]
Konstantin Shvachko commented on HDFS-898:
------------------------------------------

I ran some experiments on some large and small images as a proof of concept. Here is the table.
- First row is the number of blocks in the file system. The largest I had is 40 million blocks.
- Second row is the largest hole, i.e. the longest run of the id space that is free of block ids.
- Third row is the minimum segment size that we expect to find, which is calculated as the ratio 2 ^64^ / num_blocks.

I don't know how to right-align numbers, so I used leading zeroes; I hope it is not confusing.

| Number of blocks | 40,509,569 | 31,959,139 | 241,777 | 178,278 | 148,035 |
| Largest segment size | 8,623,203,281,141 | 10,662,709,581,709 | 889,137,135,725,504 | 1,324,814,576,358,595 | 1,849,602,429,191,491 |
| Expected minimum | 0,455,367,560,644 | 00,577,197,761,694 | 076,296,205,914,968 | 0,103,471,211,268,346 | 0,124,609,852,155,620 |

We see that the selected segments are larger than the expected minimums and larger than 2 ^38^ = 274,877,906,944. This speaks to the quality of the random generator, and it also projects a lifespan of more than 43 years for the first segment we choose.

> Sequential generation of block ids
> ----------------------------------
>
> Key: HDFS-898
> URL: https://issues.apache.org/jira/browse/HDFS-898
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.20.1
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Fix For: 0.22.0
>
>
> This is a proposal to replace random generation of block ids with a
> sequential generator in order to avoid block id reuse in the future.
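A minimal Python sketch of the measurements in the comment above, assuming block ids are signed 64-bit values and that a "hole" is a run of consecutive unused ids. The names `largest_free_segment`, `expected_minimum`, `years_until_exhausted` and `sequential_block_ids` are hypothetical, not HDFS code. Treating the id range as a line, n ids cut it into n + 1 segments, so by the pigeonhole principle the largest segment is at least the average (2^64 - n) / (n + 1), which is essentially the 2^64 / num_blocks ratio; the Expected minimum row agrees with this rounded-down average to within one id. The rate passed to `years_until_exhausted` is a free parameter: the comment does not say which allocation rate the 43-year figure assumes.

```python
import random

ID_SPACE = 2 ** 64  # block ids are 64-bit values
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def largest_free_segment(block_ids):
    """Return (start, length) of the longest run of unused ids.

    Signed 64-bit ids are mapped to unsigned values in [0, 2^64), and the
    range is treated as a line (not a ring): n distinct ids cut it into
    n + 1 segments, counting the ones before the smallest id and after the
    largest id.
    """
    ids = sorted({b % ID_SPACE for b in block_ids})
    best_start, best_len = 0, 0
    prev = -1  # virtual id just below the start of the range
    for cur in ids + [ID_SPACE]:  # virtual id just past the end
        length = cur - prev - 1  # unused ids strictly between prev and cur
        if length > best_len:
            best_start, best_len = prev + 1, length
        prev = cur
    return best_start, best_len


def expected_minimum(num_blocks):
    """Average segment size, rounded down: (2^64 - n) free ids over n + 1 segments.

    By the pigeonhole principle the largest segment is at least this large.
    """
    return (ID_SPACE - num_blocks) // (num_blocks + 1)


def years_until_exhausted(segment_len, ids_per_second):
    """Years a sequential generator lasts inside one segment at a given rate."""
    return segment_len / ids_per_second / SECONDS_PER_YEAR


def sequential_block_ids(segment_start, segment_len):
    """Hand out ids in order from the start of a free segment.

    Hypothetical helper; values are converted back to signed 64-bit form.
    """
    for offset in range(segment_len):
        unsigned = segment_start + offset
        yield unsigned - ID_SPACE if unsigned >= 2 ** 63 else unsigned


if __name__ == "__main__":
    rng = random.Random(898)  # fixed seed so runs are repeatable
    # Simulate randomly generated signed 64-bit block ids.
    ids = [rng.getrandbits(64) - 2 ** 63 for _ in range(100_000)]
    start, length = largest_free_segment(ids)
    print(f"largest segment: {length:,} ids starting at {start:,}")
    print(f"expected minimum: {expected_minimum(len(set(ids))):,}")
    # 200 ids/s is a hypothetical rate chosen only for illustration.
    print(f"years at 200 ids/s: {years_until_exhausted(length, 200):,.1f}")
    first = next(sequential_block_ids(start, length))
    print(f"first id handed out: {first}")
```

Allocating ids in order from the start of the chosen segment means no id is handed out twice until the whole segment is used up, which is the goal of the proposal quoted above.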