[jira] Commented: (HDFS-898) Sequential generation of block ids
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976379#action_12976379 ] dhruba borthakur commented on HDFS-898: I think we should do sequential ids only for newly formatted clusters and not require that existing clusters upgrade their disk formats.

> Sequential generation of block ids
> ----------------------------------
>
>                 Key: HDFS-898
>                 URL: https://issues.apache.org/jira/browse/HDFS-898
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Konstantin Shvachko
>             Fix For: 0.23.0
>         Attachments: DuplicateBlockIds.patch, FreeBlockIds.pdf, HighBitProjection.pdf
>
> This is a proposal to replace random generation of block ids with a sequential generator in order to avoid block id reuse in the future.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839870#action_12839870 ] Konstantin Shvachko commented on HDFS-898: Great! Yet another cluster could have its block ids converted collision-free using the 8-bit projection. Thanks Dmytro.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839805#action_12839805 ] Dmytro Molkov commented on HDFS-898: I just ran the tool on an FB cluster. Here is the output:
{noformat}
Bit mask applied:       ff00000000000000
Number of collisions  = 0
Number of blocks      = 57909756
Number of negative ids = 57909756
Number of positive ids = 0
Largest segment = (-277768208, 9223372036854775807)
Segment size = 9.223372037132544E18
Expected max = 318542942464
{noformat}
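Dmytro's output suggests what such a check does: OR the mask onto every id, count how many distinct ids survive, and find the largest gap between consecutive ids over the signed-long range. A minimal sketch of that logic (function names are hypothetical; this is not the actual OIV-based tool):

```python
MASK = 0xFF00000000000000  # force the 8 high bits to 1

def to_signed(x):
    """Interpret a 64-bit pattern as a signed (Java long) value."""
    return x - (1 << 64) if x >= (1 << 63) else x

def analyze(block_ids, mask=MASK):
    projected = {to_signed((b & 0xFFFFFFFFFFFFFFFF) | mask) for b in block_ids}
    collisions = len(set(block_ids)) - len(projected)
    negative = sum(1 for b in projected if b < 0)
    # Largest segment free of block ids, over the whole signed-long range.
    bounds = [-(1 << 63)] + sorted(projected) + [(1 << 63) - 1]
    segment = max(zip(bounds, bounds[1:]), key=lambda p: p[1] - p[0])
    return collisions, negative, segment

collisions, negative, segment = analyze([1001, 2002, 3003])
# With the 8 high bits set, every projected id is negative, and the largest
# free segment runs up to Long.MAX_VALUE, matching the shape of the report.
```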
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829998#action_12829998 ] Konstantin Shvachko commented on HDFS-898: Yes, the birthday problem is similar, but not the same. In the birthday problem you project yyyy/mm/dd to mm/dd. But there the years are arbitrary, while in our case the number of ids that can project to the same value is limited, as Nicholas mentioned. Also, birthdays do not uniquely identify a person, while in our case all blocks have unique ids.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829897#action_12829897 ] Todd Lipcon commented on HDFS-898: You're completely right, now that I think more about it. However, it's a good sanity check since we know the probability of duplicates to be relatively small. Using the correct numbers, the birthday problem approximation is 0.0307, which lines up with yours very closely. Thanks, and apologies for the junk on the jira.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829894#action_12829894 ] Tsz Wo (Nicholas), SZE commented on HDFS-898: This is not a birthday problem. There is no limitation on the number of people having the same birthday but there is a limitation on block ids. For 365 days and 2^26 people, all 2^26 people could have the same birthday. However, for 2^(64-b) id space and B=2^26 blocks, at most two blocks could share the same block id after removing 1 bit.
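Nicholas's pigeonhole point can be illustrated directly: forcing b high bits to 1 merges only those ids that differ solely within the b masked bits, so at most 2^b distinct ids can share one projected image. A small sketch (the base value is an arbitrary example):

```python
# Forcing b high bits to 1 merges ids that differ only in those bits,
# so at most 2**b distinct ids can share one projected image.
b = 8
MASK = ((1 << b) - 1) << (64 - b)

def image(x):
    return x | MASK

base = 0x3456789ABCDEF0  # any 56-bit value; the high byte must be clear
# All 2**b ids that differ from base only in the masked bits:
same_image = {base | (bits << (64 - b)) for bits in range(1 << b)}

assert len(same_image) == 1 << b                 # exactly 2**b distinct ids...
assert len({image(x) for x in same_image}) == 1  # ...all collapse to one image
```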
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829893#action_12829893 ] Todd Lipcon commented on HDFS-898: Wait, I just realized my mistake... I had used ^ instead of **. Please ignore!
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829890#action_12829890 ] Todd Lipcon commented on HDFS-898: I get slightly different figures than you guys... I am looking at this as identical to the well-known Birthday Problem: http://en.wikipedia.org/wiki/Birthday_problem In our case we have 2^(64-b) "days" and B=2^26 "people". Following the formula on Wikipedia:
{noformat}
In [21]: n = 2^26
In [22]: d = 2^(64-8)
In [23]: reduce(operator.mul, [(1 - float(i)/d) for i in xrange(0, n)])
Out[23]: 0.0037908372356959502
{noformat}
whereas you've calculated 0.03065 for this case. The python above agrees with Wikipedia for the birthday example, so I think the code is correct:
{noformat}
In [25]: d = 365
In [26]: n = 23
In [27]: reduce(operator.mul, [(1 - float(i)/d) for i in xrange(0, n)])
Out[27]: 0.49270276567601451
{noformat}
Wary of floating point math, I also checked using int math to calculate the numerator and denominator, then int division to make them smaller, then float division to get a fraction:
{noformat}
In [70]: num,denom = (reduce(operator.mul, [d - i for i in xrange(0, n)])), (d**(n))
In [71]: float(num/1)/float(denom/1)
Out[71]: 0.0037908372356959502
{noformat}
So where are our numbers diverging?
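In hindsight the divergence is the ^-versus-** slip acknowledged in the comments above: in Python, 2^26 is XOR, not exponentiation. Redone with integer exponentiation, and using the standard exponential approximation to avoid the 2**26-term product, the figure lands near 0.0307 (a sketch, not Todd's original session):

```python
import math

n = 2 ** 26        # blocks ("people")
d = 2 ** (64 - 8)  # id space after dropping 8 bits ("days")

# prod_{i<n}(1 - i/d) ~= exp(-n*(n-1)/(2*d)); the collision
# probability is the complement.
p_collision = 1 - math.exp(-n * (n - 1) / (2 * d))
print(p_collision)  # ~0.0308
```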
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829860#action_12829860 ] Konstantin Shvachko commented on HDFS-898:

h2. Approach #2

Here is another idea, initiated by Rob Chansler, which will potentially let us free all positive longs for future block id generation. Let's set 8 or fewer high bits of existing block ids to 1. There is a high probability there will be no id collisions in the resulting set.

More formally: we have a collection of blocks {b ~1~ , ..., b ~N~}, and we build a new collection {b ~1~ | Mask, ..., b ~N~ | Mask}, where Mask = ff00000000000000. The original collection does not have repetitions, and there is a very good chance that the new collection also does not have them. The intuition here is that if you have randomly distributed points in n-dimensional space and you project them into an (n-1)-dimensional sub-space, say along one of the axes, the probability that two points will be projected onto the same image is low. I am attaching a document which derives a formula for the probability. The formula was independently confirmed by Nicholas. The bottom line is this:
- The probability of getting a collision when setting the first 8 bits to 1 is < 0.03065
- The probability of getting a collision when setting only one high bit to 1 is < 0.0001221

From a practical point of view it is enough to set at least the one highest bit. Then we'll free the entire segment of positive longs for new block id generation.

I implemented a block id analyzing routine, which combines the two approaches described in the issue. The routine reads the image using OIV and first tries to reset bits starting from the high 8, going down to the 1 highest bit whenever resetting more bits leads to collisions. If the routine fails to reset any bits collision-free, then it uses the initial approach, that is, just finds the biggest free gap in the existing block ids. If the bit reset approach succeeds, then block replicas need to be renamed on data-nodes.
This will be done via an upgrade. During the upgrade data-nodes hard-link replicas into the new directory (handled automatically by the upgrade framework) and then rename the newly created links to reflect the new block ids (the part specific to this upgrade). In order to avoid any mishaps I also propose to assign a new generation stamp to all renamed blocks, so that data-nodes that missed the upgrade will not report old blocks when they wake up. The latter is impossible, because data-nodes cannot join the cluster until they upgrade, but we don't want to take any chances. I tested the routine on 6 images so far, containing from 150,000 to 40,000,000 blocks. All of them tolerated the reset of 8 bits without ANY collisions. I would like to ask the community for some help: if people could run the tool on their images and post some stats, or just send the results to me, I'll take care of summarizing them.
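The two probabilities quoted in this comment can be reproduced with a simple pairwise model: a given pair of distinct ids collides after the projection only if the two differ solely within the b masked bits. Whether this is the exact derivation in HighBitProjection.pdf is an assumption; the sketch below merely reproduces both numbers:

```python
import math

B = 2 ** 26                # number of blocks
PAIRS = B * (B - 1) // 2   # unordered pairs of distinct block ids

def collision_bound(b):
    """P(some pair collides) when b high bits are forced to 1.

    Two ids collide only if they differ solely within the b masked
    bits: (2**b - 1) of the 2**64 patterns for the second id.
    Treating pairs as independent gives 1 - exp(-expected_collisions).
    """
    expected = PAIRS * (2 ** b - 1) / 2 ** 64
    return 1 - math.exp(-expected)

print(collision_bound(8))  # ~0.03065
print(collision_bound(1))  # ~0.0001221
```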
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803142#action_12803142 ] Konstantin Shvachko commented on HDFS-898: I ran some experiments on several large and small images as a proof of concept. Here is the table:
- First line is the number of blocks in the file system. The largest I had is 40 million blocks.
- Second line is the largest segment free of block ids.
- Third line is the minimum segment size that we expect to find, calculated as the ratio 2 ^64^ / num_blocks.

I don't know how to right-align numbers, so I used leading zeroes; hope it is not confusing.
| Number of blocks | 40,509,569 | 31,959,139 | 241,777 | 178,278 | 148,035 |
| Largest segment size | 8,623,203,281,141 | 10,662,709,581,709 | 889,137,135,725,504 | 1,324,814,576,358,595 | 1,849,602,429,191,491 |
| Expected minimum | 0,455,367,560,644 | 00,577,197,761,694 | 076,296,205,914,968 | 0,103,471,211,268,346 | 0,124,609,852,155,620 |
We see that the selected segments are larger than the expected minimums and larger than 2 ^38^ = 274,877,906,944. This speaks to the quality of the random generator, but also projects a longer-than-43-years life span for the first segment we choose.
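The pigeonhole reasoning behind the "expected minimum" row can be checked with synthetic ids (simulated random 64-bit values here, not a real image): N ids cut the 2^64 space into N+1 segments, so the largest one is at least 2^64 / (N+1), and for random ids it is typically several times larger.

```python
import random

SPACE = 1 << 64
random.seed(898)

n = 250_000
ids = {random.getrandbits(64) for _ in range(n)}  # duplicates essentially impossible

bounds = [0] + sorted(ids) + [SPACE - 1]
largest = max(b - a for a, b in zip(bounds, bounds[1:]))
expected_min = SPACE // (len(ids) + 1)

# Pigeonhole guarantees the largest free segment meets the expected minimum.
assert largest >= expected_min
```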
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803126#action_12803126 ] Konstantin Shvachko commented on HDFS-898: My main motivation is (3): multiple name-nodes should have disjoint segments of ids. It would be good to do (2), reduce the size of the generation stamp. Many people will be happy to know that weird scenarios like the (1.1) I described do not apply to HDFS. We cannot do anything about pre-generation-stamp blocks; I am just saying the problem exists. In a sense the current generation stamps play the role of sequential ids; wouldn't it be good to finally turn it right?
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803092#action_12803092 ] dhruba borthakur commented on HDFS-898: Thanks Konstantin for the explanation. I understand the scenario much better now. My view is that we do not need to solve either of the problems (1) or (2) that you have listed below. Is it not? Is there a use case for (1)?
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803044#action_12803044 ] Konstantin Shvachko commented on HDFS-898:
(1) I have a block (id = 1001, gs = ), which belongs to an existing file with replication factor 3. I have two replicas of this block, (id = 1001, gs = 7775) and (id = 1001, gs = 7770). This makes the block corrupt, and we do not remove replicas of a corrupt block. Then a third data-node starts, which was down for a long time; it has replica (id = 1001, gs = 7), which at some point belonged to a different file that has since been removed from the system. So I have three replicas, and the name-node does not have the information to determine which replica is stale and which one belonged to the removed file. Does that make sense?
(2) Yes, all old blocks have gs=0. So if I have an old block with three valid replicas, and a DN with a prehistoric replica comes up, then you will not be able to distinguish between them. I mean that the introduction of generation stamps does not solve the prehistoric block problem for blocks that existed before blocks had generation stamps.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803028#action_12803028 ] dhruba borthakur commented on HDFS-898:
> 1. In case of corrupt blocks, when all generation stamps of its replicas are less than the block's gen stamp
I think the Namenode can handle both stale replicas and prehistoric blocks the same way. It will assume that the block does not belong to any file in the namespace and can safely be deleted?
> 2. The really old blocks, created before introduction of generation stamps, can still cause the problem
Those datanodes will go through the upgrade process, which will convert all those old blocks to have a generation stamp of 0, isn't it?
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803024#action_12803024 ] Konstantin Shvachko commented on HDFS-898: You are absolutely right: if everything goes well, generation stamps help recognize prehistoric replicas of newly created blocks. When I say the problem is relevant I mean to emphasize two cases:
# In case of corrupt blocks, when all generation stamps of its replicas are less than the block's gen. stamp on the NN, we do not have the means to recognize whether a replica is just a stale replica of this particular block or of some other (prehistoric) block. This happens when you don't have up-to-date replicas of the block and a data-node starts up with a prehistoric replica of a previously removed block with the same id.
# The really old blocks, created before the introduction of generation stamps, can still cause the problem.
I agree this is not exactly the problem as you defined it, but these are still related problems.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800978#action_12800978 ] dhruba borthakur commented on HDFS-898:
> The prehistoric block problem is still relevant, since the name-node's block-map keys blocks by their ids, see HDFS-512. Although now there is less chance to corrupt a block with a stale replica, because stale replicas will have older generation stamps.
Can you please explain why this is so? I thought that the list of blocks that hang off an Inode still has the generation stamp in it. If a datanode sends a block report with a prehistoric blockId, it can be detected at the NN because the genstamp of that block will not match the generation stamp of the blocks that hang off the inode.
[ https://issues.apache.org/jira/browse/HDFS-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800083#action_12800083 ] Konstantin Shvachko commented on HDFS-898:

h3. Problem

Currently HDFS generates ids for new blocks by randomly selecting a 64-bit number and verifying that this id is not in the system yet. If the id is already assigned to one of the blocks, the procedure repeats. This was discussed in several issues before: HADOOP-2656, HADOOP-146, HADOOP-158. The problem with this approach is that data-nodes that had been offline for a long time may, on getting back online, bring in blocks whose ids have already been released by the system and then regenerated and reused for different blocks in different files. These are so-called prehistoric blocks, see [here|https://issues.apache.org/jira/browse/HADOOP-2656?focusedCommentId=12590551&comment-tabpanel#action_12590551]. Bringing up a data-node with a prehistoric replica can potentially corrupt the block.

h3. Motivation

# The prehistoric block problem is still relevant, since the name-node's block-map keys blocks by their ids, see HDFS-512. Although now there is less chance to corrupt a block with a stale replica, because stale replicas will have older generation stamps.
# Non-reusable block ids may let us convert generation stamps to 32-bit numbers. Currently the generation stamp is 64-bit and is a global file system variable. If we are sure block ids are not reusable, then we can implement per-file generation stamps, and 32 bits will be enough for that.
# A forward-looking reason for turning to sequential ids is that if we ever introduce multiple or distributed name-nodes, it will not be feasible to verify the existence of a particular block id across these name-nodes.

h3. Solution

This simple solution was born some time ago in a discussion with Nicholas and Rob. Suppose that you have a cluster with 64 million blocks (N = 2 ^26^).
Block ids are 64-bit numbers, so there are 2 ^64^ of them. If we order the existing block ids b ~1~ , ..., b ~N~ , where b ~i~ < b ~i+1~ , then each i defines a contiguous segment (b ~i~ , b ~i+1~ ) of numbers, which does not contain other block ids inside it. It is easy to see that at least one segment will be of size 2 ^38^ = 2 ^64^ / 2 ^26^ . So the proposal is to find such a segment and use it for generating block ids starting from b ~i~ + 1, incrementing the previous id until it reaches b ~i+1~ . Currently we increment the generation stamp every time a file is created or a block write fails. Suppose we do 200 new-generation-stamps per second. This is rather optimistic: I think in practice the number is lower. At that pace we can keep generating new ids for 43 years. Then we will find another large gap in what remains of those 43-year-old blocks and start using that new segment. I don't think anybody in this community will care what may happen after the third segment is consumed, but HDFS will keep looking for new segments, and the expectation is that very few blocks will outlive their creators. As the next step I'll look at some real HDFS images and verify that practice confirms the theory.
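The 43-year life span quoted above is straightforward arithmetic on the guaranteed segment size:

```python
# Lifetime of one guaranteed segment at 200 new ids per second.
segment = 2 ** 38            # smallest segment guaranteed with 2**26 blocks
ids_per_second = 200         # assumed new-generation-stamp rate from the comment
seconds_per_year = 365 * 24 * 3600

years = segment / ids_per_second / seconds_per_year
print(round(years, 1))  # ~43.6
```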