On 2014-10-23 05:19, Miao Xie wrote:
On Wed, 22 Oct 2014 14:40:47 +0200, Piotr Pawłow wrote:
On 22.10.2014 03:43, Chris Murphy wrote:
On Oct 21, 2014, at 4:14 PM, Piotr Pawłow <p...@siedziba.pl> wrote:
Looks normal to me. Last time I started a balance after adding a 6th device to my
FS, it took 4 days to move 25 GB of data.
It's untenable long term. At some point it must be fixed. It's way, way slower
than md raid.
At a certain point it needs to fall back to block-level copying, with a ~32 KB
block. It can't keep treating things as if they were 1 KB files, doing file-level
copying that takes forever. It's just too risky that another device fails in
the meantime.

There's "device replace" for restoring redundancy, which is fast, but not 
implemented yet for RAID5/6.

My colleague and I are now implementing scrub/replace for RAID5/6,
and I plan to reimplement balance and split it off from the
metadata/file-data relocation process. The main idea is:
- Allocate a new chunk with the same size as the relocated one, but don't
insert it into the block group list, so we don't
   allocate free space from it.
- Set the source chunk read-only.
- Copy the data from the source chunk to the new chunk.
- Replace the extent map of the source chunk with that of the new chunk (the
new chunk has
   the same logical address and length as the old one).
- Release the source chunk.

This way, we don't need to process the data extent by extent, and we don't need
any space reservation, so it will be very fast even when there are lots of
snapshots. (A rough sketch of these steps follows below.)
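To make the control flow concrete, here is a minimal userspace model of the
five steps in C. Every type and helper in it (struct chunk,
alloc_private_chunk, relocate_chunk) is a made-up illustration of the plan
described above, not actual btrfs kernel code or its real interfaces:

/* Hypothetical model of the proposed chunk-level balance.
 * None of these names exist in btrfs; they only mirror the steps. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct chunk {
	unsigned long long logical; /* logical address of the chunk */
	unsigned long long length;  /* chunk size in bytes */
	int read_only;              /* set while the copy is in flight */
	char *data;                 /* stand-in for the on-disk stripes */
};

/* Step 1: allocate a new chunk of the same size, but do NOT publish it
 * in the block group list, so no free-space allocation can land in it
 * while we copy. */
static struct chunk *alloc_private_chunk(const struct chunk *src)
{
	struct chunk *c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->logical = src->logical; /* same logical address as the source */
	c->length = src->length;   /* same length as the source */
	c->read_only = 0;
	c->data = malloc(src->length);
	if (!c->data) {
		free(c);
		return NULL;
	}
	return c;
}

static void relocate_chunk(struct chunk *src, struct chunk **mapped)
{
	struct chunk *dst = alloc_private_chunk(src);
	if (!dst)
		return;

	/* Step 2: freeze the source so no new writes race the copy. */
	src->read_only = 1;

	/* Step 3: raw copy of the whole chunk; no per-extent walking
	 * and no space reservation. */
	memcpy(dst->data, src->data, src->length);

	/* Step 4: swap in the new mapping; because the logical address
	 * and length are unchanged, extent references held by files and
	 * snapshots stay valid without being touched. */
	*mapped = dst;

	/* Step 5: release the source chunk. */
	free(src->data);
	free(src);
}

int main(void)
{
	struct chunk *src = malloc(sizeof(*src));
	src->logical = 0x100000;
	src->length = 16;
	src->read_only = 0;
	src->data = malloc(src->length);
	memcpy(src->data, "chunk payload!!", 16);

	struct chunk *mapped = src;
	relocate_chunk(src, &mapped);
	printf("relocated: %s\n", mapped->data);
	free(mapped->data);
	free(mapped);
	return 0;
}

The point the model tries to capture is that the cost is proportional to the
chunk size, not to the number of extents or snapshot references, which is why
the copy stays fast no matter how many snapshots share the data.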

Even if balance gets reimplemented this way, we should still provide some way to consolidate the data from multiple partially full chunks. Maybe keep the old balance path and add an option (call it "aggressive"?) that turns it on instead of the new code.

