[ https://issues.apache.org/jira/browse/HDFS-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951360#comment-14951360 ]
Walter Su commented on HDFS-9173:
---------------------------------

{quote}
Great work Walter! It's probably simpler to treat data and parity blocks uniformly here. We can calculate "safe length" as the smallest length that covers at least 6 internal blocks. In your example, that would be the length of blk_5. We can then leave all data/parity blocks smaller than this length as stale replicas. ErasureCodingWorker will take care of replacing them.
{quote}

Zhe, that was my first thought too. But this method likely leaves only 6 blocks, and we can simply delete the partial "target last stripe". What if an append appends to the last blockGroup rather than starting a new one? The next time we would only have 6 streamers, and if it fails again we probably have to cut off more stripes. I don't quite agree that starting a new block group is better than appending to the last blockGroup; we can discuss that at HDFS-7663.

Step #3 (recover partial blocks) is meant to recover more blocks. If the client got killed, it's very likely that all block lengths differ. They are not as "stale" as you think; they are all healthy. I agree that ErasureCodingWorker can recover them, but recovering a whole block has a cost.

Step #4 is optional. If we simply delete the partial "target last stripe", it's easier to append to the last blockGroup. If cellSize is 1MB, we would dispose of at most 6MB of data.

Recovering more blocks is the same idea as BlockRecovery for a non-EC file: it syncs the 3 replicas to the same minimal length, and all 3 replicas are kept. It could simply have picked the longest replica and discarded the others to save more data, since Block Replication can recover the other 2 anyway. But it doesn't do that. Well, if starting a new blockGroup is the plan, it doesn't matter how much time/resources Block Replication spends on the previous block, does it?
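To make the numbers concrete, here is a minimal sketch (not Hadoop's actual implementation; the class and method names are hypothetical) of the "safe length" idea for an RS-6-3 block group: the safe length is the smallest length covered by at least 6 internal blocks, i.e. the 6th-largest internal block length (blk_5 when sorted descending), optionally rounded down to a full cell so the partial "target last stripe" is disposed of, as in step #4.

```java
import java.util.Arrays;

// Sketch only: computes a per-internal-block "safe length" for an RS-6-3
// striped block group whose 9 internal blocks report differing lengths.
public class SafeLengthSketch {
    static final int DATA_BLOCKS = 6;        // RS-6-3 schema
    static final long CELL_SIZE = 1L << 20;  // 1 MB cells, as in the example

    static long safeLength(long[] internalBlockLengths) {
        long[] sorted = internalBlockLengths.clone();
        Arrays.sort(sorted); // ascending
        // Indices (length-6)..(length-1) hold the 6 longest blocks, so any
        // prefix up to sorted[length-6] exists on at least 6 internal blocks.
        long coveredBySix = sorted[sorted.length - DATA_BLOCKS];
        // Drop the partial last stripe: at most one cell per data block,
        // i.e. at most 6 MB of data disposed with 1 MB cells.
        return (coveredBySix / CELL_SIZE) * CELL_SIZE;
    }

    public static void main(String[] args) {
        long c = CELL_SIZE;
        // 9 internal blocks whose lengths all differ (e.g. client was killed)
        long[] lens = {64 * c, 60 * c, 58 * c, 55 * c + 123,
                       55 * c, 50 * c, 48 * c, 64 * c, 60 * c};
        System.out.println(safeLength(lens)); // 6th-largest, cell-aligned
    }
}
```

With these lengths the 6th-largest is 55 cells plus 123 bytes, so the safe length rounds down to 55 full cells; the 123-byte partial cell is the data that step #4 would discard.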
> Erasure Coding: Lease recovery for striped file
> -----------------------------------------------
>
>                 Key: HDFS-9173
>                 URL: https://issues.apache.org/jira/browse/HDFS-9173
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-9173.00.wip.patch, HDFS-9173.01.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)