Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot

Chris Murphy Thu, 31 Jan 2013 11:08:45 -0800

On Jan 31, 2013, at 2:45 AM, Adam Ryczkowski <adam.ryczkow...@statystyka.net> 
wrote:
> Yes, you are right. It is an important contributing factor in why the
> relatime mount option killed my performance so badly.


So is this what was causing the problem?

> The dedup chunk size isn't clearly stated, but from the README I infer it 
> deduplicates files as a whole; here is an excerpt from the README 
> (https://github.com/g2p/bedup/blob/master/README.rst)

I wouldn't expect reading file metadata, 

> This is a summary of the granularity of the allocation pieces in the storage 
> hierarchy.
> On mdadm I have chunk size of 512K,

It's quite large for your use case. It's large for most any use case, actually.

> I couldn't find any command that tells me the leaf size of an already created 
> btrfs file system. Maybe you can tell me?

I don't know that it's easily determined after mkfs time; maybe someone else 
can answer. The default is 4KB. Otherwise you use flags to set it.


> I will also check whether there is an alignment problem as well. When I was 
> reading the manual for each of the layers, I came to the conclusion that each 
> layer is supposed to align to the underlying one automatically. But I will 
> try to check it.

I'm not thinking of an alignment problem, but of a chunk size poorly chosen for 
the usage. Changing 50 bytes (could be metadata or data) means, in your case, 
at least 2MB of RMW (4 disks x 512KB chunk). And this gets worse with more 
disks, because you have more chunks to read. The whole stripe is currently 
read, modified, and written on md raid6. You're planning to add four more 
disks, so that's 8 disks, and a 4MB full-stripe RMW for 50 bytes of changed 
data.
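
The arithmetic above can be sketched roughly as follows. This follows the same 
accounting as in the paragraph (full stripe = chunk size times the number of 
member disks, parity included) and assumes the change fits in one stripe; the 
function names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def full_stripe_bytes(num_disks, chunk_kib):
    """Size of one full stripe, counting every member disk (data and parity)."""
    return num_disks * chunk_kib * KIB

def rmw_amplification(changed_bytes, num_disks, chunk_kib):
    """Bytes read-modified-written per byte actually changed, assuming the
    whole stripe is rewritten for a change that fits in a single stripe."""
    return full_stripe_bytes(num_disks, chunk_kib) / changed_bytes

# 4 disks today, 8 after the planned expansion, 512K mdadm chunk.
for disks in (4, 8):
    stripe = full_stripe_bytes(disks, 512)
    factor = rmw_amplification(50, disks, 512)
    print(f"{disks} disks: {stripe // MIB} MiB full stripe, "
          f"about {factor:,.0f}x amplification for a 50 byte change")
```

A smaller chunk shrinks the stripe proportionally, which is why the chunk size 
matters more here than alignment does.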

Depending on which tool created the GPT partitions on these 3TB disks, it's 
remotely possible that they aren't aligned to 4K sectors, however. gdisk should 
do this correctly by starting the first partition at LBA 2048, and aligning to 
16 sector boundaries. Recent versions of parted do something similar, but I 
forget the details. Older versions can misalign by starting at LBA 63, as can 
other older non-Linux tools. OS X's Disk Utility starts the first partition at 
LBA 40, which is OK.
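
As a quick sanity check of those start sectors, here is a minimal sketch. It 
assumes 512-byte logical sectors on top of 4096-byte physical sectors (the 
usual 512e arrangement; that is my assumption, not something reported for 
these disks), and the function name is made up.

```python
LOGICAL_SECTOR = 512    # bytes per LBA (assumed: 512e drive)
PHYSICAL_SECTOR = 4096  # bytes per physical sector on a 4K drive

def is_4k_aligned(start_lba):
    """True if a partition starting at this LBA begins on a 4K boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0

# Start sectors mentioned above: gdisk default, old-style DOS offset, OS X.
for lba in (2048, 63, 40):
    state = "aligned" if is_4k_aligned(lba) else "misaligned"
    print(f"LBA {lba}: {state}")
```

The start sector of an existing partition has to come from the partitioning 
tool's own listing; the check itself is just this modulo.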

> Can you tell me more? Because I have only learned, that btrfs multi-device 
> support cannot join two volumes without striping. And striping in this case 
> is equivalent to fragmentation, which we want to avoid. In contrast to what 
> LVM can do. LVM can concatenate the underlying storage together, without 
> striping.

When you create a btrfs file system, by default the data profile is single, and 
the metadata profile is dup. When you add another device to the volume, it 
stays this way. The single data profile behaves similarly to LVM linear, except 
that btrfs will alternate chunk allocations between devices, so that one isn't 
just sitting there spinning for a month and not being used at all. 
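
A toy model of that alternation, for illustration only. It assumes each new 
data chunk goes to whichever device has the most unallocated space, and uses 
1 GiB chunks and hypothetical device names; it is not the kernel's allocator, 
just a sketch of why two equal devices end up taking turns.

```python
def allocate_chunks(free_gib, n_chunks, chunk_gib=1):
    """Return the device picked for each successive data chunk.

    free_gib maps a device name to its unallocated space in GiB.
    """
    free = dict(free_gib)  # work on a copy; leave the caller's dict alone
    placement = []
    for _ in range(n_chunks):
        # Assumed rule: pick the device with the most unallocated space.
        # On a tie, max() returns the first such device in insertion order.
        dev = max(free, key=free.get)
        if free[dev] < chunk_gib:
            raise RuntimeError("no device has room for another chunk")
        free[dev] -= chunk_gib
        placement.append(dev)
    return placement

# Two equal-sized devices (hypothetical names) take turns.
print(allocate_chunks({"sda": 100, "sdb": 100}, 6))
```

Each file extent still lives inside one chunk on one device, so this is 
concatenation-like placement rather than striping.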

So it's not striping. But even if it were striping, that would help you on 
write performance in particular because now it's effectively RAID 60. I don't 
see why striping is considered fragmentation.

To change the profile for the volume, you use -dconvert and/or -mconvert with a 
rebalance operation.
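
For example, a small helper that only assembles the command line (it does not 
run anything). The mount point and the raid0/raid1 profiles below are 
placeholders I picked for illustration, and the helper name is made up; check 
the balance documentation for your btrfs-progs version before running the 
result.

```python
def balance_convert_cmd(mountpoint, data_profile=None, metadata_profile=None):
    """Build the argv for a balance that converts data and/or metadata profiles."""
    cmd = ["btrfs", "balance", "start"]
    if data_profile:
        cmd.append(f"-dconvert={data_profile}")
    if metadata_profile:
        cmd.append(f"-mconvert={metadata_profile}")
    cmd.append(mountpoint)
    return cmd

# Hypothetical mount point and target profiles.
print(" ".join(balance_convert_cmd("/mnt/pool", "raid0", "raid1")))
```

The conversion rewrites existing chunks, so expect it to take a while on a 
large volume.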

Chris Murphy