On Wed, Jun 21, 2017 at 12:51 AM, Marat Khalili <m...@rqc.ru> wrote:
> On 21/06/17 06:48, Chris Murphy wrote:
>>
>> Another possibility is to ensure a new write is written to a new *not*
>> full stripe, i.e. dynamic stripe size. So if the modification is a 50K
>> file on a 4 disk raid5; instead of writing 3 64K data strips + 1 64K
>> parity strip (a full stripe write); write out 1 64K data strip + 1 64K
>> parity strip. In effect, a 4 disk raid5 would quickly get not just 3
>> data + 1 parity strip Btrfs block groups; but 1 data + 1 parity, and 2
>> data + 1 parity chunks, and direct those writes to the proper chunk
>> based on size. Anyway that's beyond my ability to assess how much
>> allocator work that is. Balance I'd expect to rewrite everything to
>> max data strips possible; the optimization would only apply to normal
>> operation COW.

> This will make some filesystems mostly RAID1, negating all space savings of
> RAID5, won't it?

No. It'd only apply to partial stripe writes, typically small files.
But small-file, metadata-centric workloads suck for raid5 anyway, and
should use raid1. So I think making the implementation more like raid1
than raid5 for the RMW case is still better than Btrfs raid56 RMW
writes being, in effect, no-COW.
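
To put rough numbers on that, here is a small Python sketch of the strip
arithmetic from the 4 disk, 64K strip example. It is only an illustration:
the function names are made up, and it says nothing about how the Btrfs
allocator would actually pick or create chunks.

```python
import math

STRIP_SIZE = 64 * 1024  # 64K strip, as in the 4 disk raid5 example


def data_strips_for_write(size_bytes, num_disks=4, strip_size=STRIP_SIZE):
    """Hypothetical helper: data strips needed for a write of this size,
    capped at the full-stripe width (num_disks - 1 data strips for raid5)."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    max_data = num_disks - 1
    return min(max_data, math.ceil(size_bytes / strip_size))


def parity_overhead(data_strips):
    """Fraction of the written stripe taken up by the single parity strip."""
    return 1 / (data_strips + 1)


# A 50K write fits in 1 data strip, so it would go to a 1 data + 1 parity
# chunk. That stripe has the same 50% overhead as raid1.
small = data_strips_for_write(50 * 1024)
print(small, parity_overhead(small))

# Anything of 192K or more fills all 3 data strips: the normal raid5 case.
full = data_strips_for_write(192 * 1024)
print(full, parity_overhead(full))
```

Only the writes that land in the narrow chunks pay the raid1-like overhead;
full stripe writes keep the usual 3 data + 1 parity ratio.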


> Isn't it easier to recalculate the parity block using the previous state of
> two rewritten strips, parity and data? I don't understand all performance
> implications, but it might scale better with number of devices.

The problem is atomicity. Either the data strip or the parity strip is
overwritten first, and until the other is committed, the file system
is not merely inconsistent, it's basically lying: there's no way to
know for sure after the fact whether the data or the parity was
properly written. The metadata is inconsistent too, because it can
only describe the unmodified state and the successfully modified
state, whereas a 3rd state, "partially modified", is possible and
there's no way to really fix it.
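
A toy Python sketch of that 3rd state, assuming plain XOR parity over a
3 data + 1 parity stripe with 4-byte strips. The helper names are made up
for illustration and this is not how the Btrfs raid56 code is structured.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_new_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update: P' = P xor D_old xor D_new."""
    return xor(xor(old_parity, old_data), new_data)


def stripe_is_consistent(data_strips, parity: bytes) -> bool:
    """True if parity equals the XOR of all data strips."""
    acc = bytes(len(parity))
    for strip in data_strips:
        acc = xor(acc, strip)
    return acc == parity


# Unmodified state: 3 data strips and their parity.
d = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
p = xor(xor(d[0], d[1]), d[2])

# Successfully modified state: strip 0 and parity both rewritten.
new_d0 = b"\xff\xee\xdd\xcc"
new_p = rmw_new_parity(p, d[0], new_d0)

# Partially modified state: new data hit the disk, parity did not.
torn = [new_d0, d[1], d[2]]

print(stripe_is_consistent(d, p))             # unmodified
print(stripe_is_consistent(torn, new_p))      # fully modified
print(stripe_is_consistent(torn, p))          # torn: data new, parity old

# If the disk holding d[1] now fails, rebuilding it from the torn stripe
# silently produces wrong bytes for a strip that was never written to.
rebuilt_d1 = xor(xor(new_d0, d[2]), p)
print(rebuilt_d1 == d[1])
```

The delta update itself is cheap and scales fine with the number of
devices; the trouble is only that the two overwrites are not atomic.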


-- 
Chris Murphy