On Tue, Oct 30, 2012 at 5:47 PM, ching <lschin...@gmail.com> wrote:
> On 10/31/2012 06:19 AM, Hugo Mills wrote:
>> On Tue, Oct 30, 2012 at 10:14:12PM +0000, Hugo Mills wrote:
>>> On Wed, Oct 31, 2012 at 05:40:25AM +0800, ching wrote:
>>>> On 10/30/2012 08:17 PM, cwillu wrote:
>>>>>>> If there are a lot of small files, then the size of metadata will be
>>>>>>> undesirable due to duplication
>>>>>> Yes, that is a fact, but whether that really matters depends on the
>>>>>> use case (e.g., the ratio of small files to large files, ...). But as
>>>>>> btrfs is explicitly designed as a general-purpose file system, you
>>>>>> usually want the good performance rather than the better disk usage
>>>>>> (especially as disk space isn't expensive any more).
>>>>> As I understand it, in basically all cases the total storage used by
>>>>> inlining will be _smaller_, as the allocation doesn't need to be
>>>>> aligned to the sector size.
>>>>>
>>>> If I have 10G of small files in total, then they will consume 20G by default.
>>>    If those small files are each 128 bytes in size, then you have
>>> approximately 80 million of them, and they'd take up 80 million pages,
>>> or 320 GiB of total disk space.
>>    Sorry, to make that clear -- I meant if they were stored in Data.
>> If they're inlined in metadata, then they'll take approximately 20 GiB
>> as you claim, which is a lot less than the 320 GiB they'd be if
>> they're not.
>>
>>    Hugo.
>>
>
>
> Is it the same for:
> 1. 3k per file with leaf size=4K
> 2. 60k per file with leaf size=64k
>
>

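Hugo's arithmetic above checks out, by the way; a quick sanity check
(assuming 4 KiB pages, which is what the one-page-per-file figure implies):

```python
GiB = 2**30

total_data = 10 * GiB   # "10G small files in total"
file_size = 128         # bytes per file, Hugo's example
page_size = 4096        # each non-inlined file still occupies a full page

n_files = total_data // file_size
on_disk = n_files * page_size   # stored in Data, i.e. not inlined

print(n_files)           # 83886080, i.e. ~80 million files
print(on_disk // GiB)    # 320 GiB of total disk space
```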
import sys

# Write 100,000 small files into the directory given on the command line.
data = "1" * 1024 * 3  # 3 KiB of payload per file

for x in xrange(100 * 1000):  # Python 2; use range() under Python 3
    with open('%s/%s' % (sys.argv[1], x), 'a') as f:
        f.write(data)

root@repository:~$ mount -o loop ~/inline /mnt
root@repository:~$ mount -o loop,max_inline=0 ~/noninline /mnt2

root@repository:~$ time python test.py /mnt
real    0m11.105s
user    0m1.328s
sys     0m5.416s
root@repository:~$ time python test.py /mnt2
real    0m21.905s
user    0m1.292s
sys     0m5.460s

root@repository:/$ btrfs fi df /mnt
Data: total=1.01GB, used=256.00KB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=652.70MB
Metadata: total=8.00MB, used=0.00

root@repository:/$ btrfs fi df /mnt2
Data: total=1.01GB, used=391.12MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=60.98MB
Metadata: total=8.00MB, used=0.00

3k data, 4k leaf: inline is twice the speed, but ~1.4x bigger (652.70MB
of metadata vs 391.12MB data + 60.98MB metadata).
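The non-inline Data figure is just page rounding at work; each 3 KiB file
occupies a full 4 KiB page. A quick check against the df output above
(assuming 4 KiB pages):

```python
MiB = 2**20

n_files = 100 * 1000
file_size = 3 * 1024   # 3 KiB written per file
page_size = 4096       # each non-inlined file is rounded up to a page

raw = n_files * file_size
aligned = n_files * page_size

print(raw // MiB)      # 292 (~293 MiB of raw file data)
print(aligned // MiB)  # 390 (~391 MiB page-aligned; matches the Data figure)
```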

----

root@repository:~$ mkfs.btrfs inline -l 64k
root@repository:~$ mkfs.btrfs noninline -l 64k
...
root@repository:~$ time python test.py /mnt
real    0m12.244s
user    0m1.396s
sys     0m8.101s
root@repository:~$ time python test.py /mnt2
real    0m13.047s
user    0m1.436s
sys     0m7.772s

root@repository:/$ btrfs fi df /mnt
Data: total=8.00MB, used=256.00KB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=342.06MB
Metadata: total=8.00MB, used=0.00

root@repository:/$ btrfs fi df /mnt2
Data: total=1.01GB, used=391.10MB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=50.06MB
Metadata: total=8.00MB, used=0.00

3k data, 64k leaf: inline is still faster (~6%), and is now ~22% smaller.
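The space saving works out from the df output above (taking Data used plus
Metadata used as the fair comparison):

```python
inline_used = 342.06             # MB: metadata only, the data is inlined
noninline_used = 391.10 + 50.06  # MB: data + metadata

saving = 1 - inline_used / noninline_used
print(round(saving * 100, 1))    # 22.5, i.e. inline is ~22% smaller here
```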

----

data = "1" * 1024 * 32

... (mkfs, mount, etc)

root@repository:~$ time python test.py /mnt
real    0m17.834s
user    0m1.224s
sys     0m4.772s
root@repository:~$ time python test.py /mnt2
real    0m20.521s
user    0m1.304s
sys     0m6.344s

root@repository:/$ btrfs fi df /mnt
Data: total=4.01GB, used=3.05GB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=54.00MB
Metadata: total=8.00MB, used=0.00

root@repository:/$ btrfs fi df /mnt2
Data: total=4.01GB, used=3.05GB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=53.56MB
Metadata: total=8.00MB, used=0.00

32k data, 64k leaf: inline is still ~13% faster, and is now the same
size (not dead sure why; probably some interaction with the size of
the actual write that happens).

----

data = "1" * 1024 * 7

... etc


root@repository:~$ time python test.py /mnt
real    0m9.628s
user    0m1.368s
sys     0m4.188s
root@repository:~$ time python test.py /mnt2
real    0m13.455s
user    0m1.608s
sys     0m7.884s

root@repository:/$ btrfs fi df /mnt
Data: total=3.01GB, used=1.91GB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=74.69MB
Metadata: total=8.00MB, used=0.00

root@repository:/$ btrfs fi df /mnt2
Data: total=3.01GB, used=1.91GB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=74.69MB
Metadata: total=8.00MB, used=0.00

7k data, 64k leaf: 30% faster, same data usage.
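The speedup works out from the wall-clock ('real') times above:

```python
inline_t, noninline_t = 9.628, 13.455  # seconds, from the two runs above

speedup = 1 - inline_t / noninline_t
print(round(speedup * 100, 1))         # 28.4, i.e. ~30% faster
```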

----

Are we done yet?  Can I go home now? ;p