[ https://issues.apache.org/jira/browse/HDFS-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Walter Su updated HDFS-8501:
----------------------------
    Description:
Erasure Coding: Improve memory efficiency of BlockInfoStriped

Assume we have a BlockInfoStriped:
{noformat}
triplets[] = {s0, s1, s2, s3}
indices[] = {0, 1, 2, 3}
{noformat}
When we run balancer/mover to relocate the replica on s2, it first becomes:
{noformat}
triplets[] = {s0, s1, s2, s3, s4}
indices[] = {0, 1, 2, 3, 2}
{noformat}
Then the replica on s2 is removed, and it finally becomes:
{noformat}
triplets[] = {s0, s1, null, s3, s4}
indices[] = {0, 1, -1, 3, 2}
{noformat}
The worst case is:
{noformat}
triplets[] = {null, null, null, null, s4, s5, s6, s7}
indices[] = {-1, -1, -1, -1, 0, 1, 2, 3}
{noformat}
We should learn from {{BlockInfoContiguous.removeStorage(..)}}. When a storage is removed, we move the last item forward to fill the gap.
With the improvement, the worst case becomes:
{noformat}
triplets[] = {s4, s5, s6, s7, null}
indices[] = {0, 1, 2, 3, -1}
{noformat}
We have an empty slot.
Notes:
Assume we copy 4 storages first, then delete 4. Even with the improvement, the worst case could be:
{noformat}
triplets[] = {s4, s5, s6, s7, null, null, null, null}
indices[] = {0, 1, 2, 3, -1, -1, -1, -1}
{noformat}
But the Balancer strategy won't move the same block/blockGroup twice in a row, so this case is very rare.

  was:
Erasure Coding: Improve memory efficiency of BlockInfoStriped

Assume we have a BlockInfoStriped:
{noformat}
triplets[] = {s0, s1, s2, s3}
indices[] = {0, 1, 2, 3}
{noformat}
When we run balancer/mover to relocate the replica on s2, it first becomes:
{noformat}
triplets[] = {s0, s1, s2, s3, s2}
indices[] = {0, 1, 2, 3, 2}
{noformat}
Then the replica on s2 is removed, and it finally becomes:
{noformat}
triplets[] = {s0, s1, null, s3, s2}
indices[] = {0, 1, -1, 3, 2}
{noformat}
The worst case is:
{noformat}
triplets[] = {null, null, null, null, s0, s1, s2, s3}
indices[] = {-1, -1, -1, -1, 0, 1, 2, 3}
{noformat}
We should learn from {{BlockInfoContiguous.removeStorage(..)}}.
When a storage is removed, we move the last item forward to fill the gap.
With the improvement, the worst case becomes:
{noformat}
triplets[] = {s0, s1, s2, s3, null}
indices[] = {0, 1, 2, 3, -1}
{noformat}
We have an empty slot.
Notes:
Assume we copy 4 storages first, then delete 4. Even with the improvement, the worst case could be:
{noformat}
triplets[] = {s0, s1, s2, s3, null, null, null, null}
indices[] = {0, 1, 2, 3, -1, -1, -1, -1}
{noformat}
But the Balancer strategy won't move the same block/blockGroup twice in a row, so this case is very rare.


> Erasure Coding: Improve memory efficiency of BlockInfoStriped
> -------------------------------------------------------------
>
>                 Key: HDFS-8501
>                 URL: https://issues.apache.org/jira/browse/HDFS-8501
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>
> Erasure Coding: Improve memory efficiency of BlockInfoStriped
> Assume we have a BlockInfoStriped:
> {noformat}
> triplets[] = {s0, s1, s2, s3}
> indices[] = {0, 1, 2, 3}
> {noformat}
> When we run balancer/mover to relocate the replica on s2, it first becomes:
> {noformat}
> triplets[] = {s0, s1, s2, s3, s4}
> indices[] = {0, 1, 2, 3, 2}
> {noformat}
> Then the replica on s2 is removed, and it finally becomes:
> {noformat}
> triplets[] = {s0, s1, null, s3, s4}
> indices[] = {0, 1, -1, 3, 2}
> {noformat}
> The worst case is:
> {noformat}
> triplets[] = {null, null, null, null, s4, s5, s6, s7}
> indices[] = {-1, -1, -1, -1, 0, 1, 2, 3}
> {noformat}
> We should learn from {{BlockInfoContiguous.removeStorage(..)}}. When a
> storage is removed, we move the last item forward to fill the gap.
> With the improvement, the worst case becomes:
> {noformat}
> triplets[] = {s4, s5, s6, s7, null}
> indices[] = {0, 1, 2, 3, -1}
> {noformat}
> We have an empty slot.
> Notes:
> Assume we copy 4 storages first, then delete 4.
> Even with the improvement, the worst case could be:
> {noformat}
> triplets[] = {s4, s5, s6, s7, null, null, null, null}
> indices[] = {0, 1, 2, 3, -1, -1, -1, -1}
> {noformat}
> But the Balancer strategy won't move the same block/blockGroup twice in a row,
> so this case is very rare.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
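
The removal strategy proposed in the description (fill the vacated slot with the last occupied entry, as {{BlockInfoContiguous.removeStorage(..)}} is said to do) can be sketched roughly as below. This is only an illustration under stated assumptions: `StripedSlots`, its `String` storage names, and its method signatures are hypothetical stand-ins, not the real `BlockInfoStriped` code or the actual patch. It also assumes that adding a storage reuses an empty slot before growing the arrays.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for BlockInfoStriped (not the HDFS class). */
final class StripedSlots {
    private String[] storages; // plays the role of triplets[]
    private int[] indices;     // block index per slot, -1 when the slot is empty

    StripedSlots(String[] storages, int[] indices) {
        this.storages = storages.clone();
        this.indices = indices.clone();
    }

    /** Adds a storage, reusing the first empty slot before growing the arrays. */
    void addStorage(String storage, int blockIndex) {
        int slot = 0;
        while (slot < storages.length && storages[slot] != null) {
            slot++;
        }
        if (slot == storages.length) { // no empty slot: grow both arrays by one
            storages = Arrays.copyOf(storages, slot + 1);
            indices = Arrays.copyOf(indices, slot + 1);
        }
        storages[slot] = storage;
        indices[slot] = blockIndex;
    }

    /** Removes a storage by moving the last occupied entry into its slot. */
    boolean removeStorage(String storage) {
        int target = -1;
        int last = -1;
        for (int i = 0; i < storages.length; i++) {
            if (storages[i] == null) {
                continue;
            }
            if (target < 0 && storages[i].equals(storage)) {
                target = i;
            }
            last = i; // highest occupied slot seen so far
        }
        if (target < 0) {
            return false; // storage not present
        }
        storages[target] = storages[last];
        indices[target] = indices[last];
        storages[last] = null;
        indices[last] = -1;
        return true;
    }

    String[] storages() { return storages.clone(); }
    int[] indices() { return indices.clone(); }

    public static void main(String[] args) {
        StripedSlots b = new StripedSlots(
            new String[] {"s0", "s1", "s2", "s3"}, new int[] {0, 1, 2, 3});
        // Relocate every replica once: copy to the new storage, then delete the old one.
        for (int i = 0; i < 4; i++) {
            b.addStorage("s" + (i + 4), i);
            b.removeStorage("s" + i);
        }
        System.out.println(Arrays.toString(b.storages())); // [s4, s5, s6, s7, null]
        System.out.println(Arrays.toString(b.indices()));  // [0, 1, 2, 3, -1]
    }
}
```

In this sketch, relocating one replica at a time never needs more than one spare slot, which matches the improved worst case in the description. Copying all four replicas before deleting any would still grow the arrays to eight slots, as the Notes point out.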