Re: [arch-general] Btrfs more than twice as fast compared to ext4
On 16/03/10 00:48, Shridhar Daithankar wrote:
> [...]
> But as far as file system performance goes, the overhead should be identical for both the runs, no?

I'm not too sure about that. I'm guessing there is less seeking going on with Btrfs. Some file systems (reiserfs + reiser4, IIRC) are very good with many small files, better than the ext* file systems; this may be another case of that.

> Besides, I need to run the comparison (rather, verification of file contents) many times over during the application life-cycle and I cannot afford to bring in another copy from disk. The working set is expected to be 30-40GB at a time, 3GB is just test setup. With md5sum, I can store it in database and verify it on one copy only.

Fair enough.

> And finally, it is terrible on timings. Running md5sum is lot faster, about 3 times in the best case.
> [...]

Wow, that's slow!

> So when the source file system is btrfs, it is still couple of times faster at least.

I still think you could achieve better times by not calling the external command that many times. Since you're already gonna store the checksums in a database, I'd just write a proper program in python or something. Or even just a shell script, but you might wanna refrain from for .. in `find ..`; it's the slowest, and it relies on your filenames not having spaces in them.

[[ky] ~]# }} time find /usr/bin -type f -print0 | xargs -0 md5sum > /tmp/1
real    0m3.633s
[[ky] ~]# }} time find /usr/bin -type f -exec md5sum {} \; > /tmp/2
real    0m10.196s
[[ky] ~]# }} time for i in `find /usr/bin -type f`;do md5sum $i;done > /tmp/3
real    0m11.245s

This last version missed a file because it has spaces in its name, and as a result file 3 was inconsistent with files 1 and 2:

[[ky] ~]# }} diff /tmp/{1,2}
[[ky] ~]# }} diff /tmp/{3,2}
3054a3055
> 0c5d8f10aa0731671a00961f059dc46e  /usr/bin/New SMB and DCERPC features in Impacket.pdf

That was a test against just 4008, so you can imagine the time savings with 5+ files.
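The "proper program in python" idea could look roughly like the sketch below. It is only an illustration under assumptions: the function names `md5_of_file` and `md5_tree` are made up for this note and do not come from the thread, and it hashes in-process with the standard `hashlib` module instead of spawning `md5sum` once per file. Because paths are never split on whitespace, a file name containing spaces is handled like any other.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_tree(root):
    """Map every regular file under root (path relative to root) to its MD5.

    Symbolic links are skipped, roughly matching `find root -type f`.
    """
    checksums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                checksums[os.path.relpath(full, root)] = md5_of_file(full)
    return checksums
```

Calling `md5_tree("/usr/bin")` would return a dictionary keyed by relative path; whether it beats the `xargs` pipeline above depends on the machine and was not measured here.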
Re: [arch-general] Btrfs more than twice as fast compared to ext4
On Tuesday 16 March 2010 14:41:41 Nathan Wayde wrote:
> On 16/03/10 00:48, Shridhar Daithankar wrote:
> > [...]
> > But as far as file system performance goes, the overhead should be identical for both the runs, no?
>
> I'm not too sure about that. I'm guessing there is less seeking going on with Btrfs. Some file systems (reiserfs + reiser4, IIRC) are very good with many small files, better than the ext* file systems; this may be another case of that.

Yes, btrfs does have tail packing, i.e. storing the inode and the file together in a single block. However, all the files I had in the tree were 50-55K in size, and that definitely does not fit in a block.

> I still think you could achieve better times by not calling the external command that many times. Since you're already gonna store the checksums in a database, I'd just write a proper program in python or something.

The application I am developing already has copy/copytree and md5sum built in. I mmap the whole file and do memcpy/memcmp/md5sum in a single pass. That is already a bit faster than native cp, which uses write and buffer management.

I changed/refactored the tree copy code and created a new tree, and I wanted to verify outside the application that the tree copy had gone well. Hence the find/md5sum run. This was a one-time exercise only, but the results were drastic enough to be published.

-- 
Regards
Shridhar
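For readers who want to picture the mmap approach described above, here is a rough Python approximation. It is not the application's code, which is not shown in the thread: the function name `checksum_and_compare` is hypothetical, only the checksum and compare steps are sketched (no copy), and unlike a true single pass it walks the source mapping once for the digest and once more for the comparison.

```python
import hashlib
import mmap
import os

CHUNK = 1 << 20  # compare 1 MiB slices so whole files are never copied at once


def checksum_and_compare(src, dst):
    """Return (hex MD5 of src, True if src and dst are byte-identical)."""
    with open(src, "rb") as fs, open(dst, "rb") as fd:
        size_src = os.fstat(fs.fileno()).st_size
        size_dst = os.fstat(fd.fileno()).st_size
        if size_src == 0:
            # mmap cannot map an empty file, so handle that case directly.
            return hashlib.md5(b"").hexdigest(), size_dst == 0
        with mmap.mmap(fs.fileno(), 0, access=mmap.ACCESS_READ) as ms:
            # hashlib accepts any bytes-like object, including a mapping.
            digest = hashlib.md5(ms).hexdigest()
            if size_dst != size_src:
                return digest, False
            with mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ) as md:
                identical = all(
                    ms[i:i + CHUNK] == md[i:i + CHUNK]
                    for i in range(0, size_src, CHUNK)
                )
                return digest, identical
```

The size check comes first so that files of different lengths are rejected without mapping the second file at all.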
Re: [arch-general] Btrfs more than twice as fast compared to ext4
On 03/13/2010 08:35 AM, Shridhar Daithankar wrote:
> Hi, Just wanted to share an interesting experience I had today. Check http://ghodechhap.net/btrfs.performance.txt

Great. Has a stable version been released?

-- 
Nilesh Govindarajan
Site Server Administrator
www.itech7.com
Re: [arch-general] Btrfs more than twice as fast compared to ext4
On Monday 15 March 2010 15:44:35 Nathan Wayde wrote:
> On 13/03/10 03:05, Shridhar Daithankar wrote:
> > Hi, Just wanted to share an interesting experience I had today. Check http://ghodechhap.net/btrfs.performance.txt
>
> Maybe you're looking for http://docs.python.org/library/filecmp.html
>
> One cannot help but think that you took a disk-bound process and turned it into a cpu-bound one. Since you're just interested in which files are different you should have just used `cmp` instead of `md5sum`; the latter is just overkill, and I'd assume calling an external command that many times can't be very nice either.
>
> here are some comparisons, they use /usr/lib - i figured 75000 files should be a good test... I made this as deliberately unfair/in-comparable as possible, I wanted to show the potential overhead of calling md5sum that many times.

I didn't know of cmp, thanks. I tried the same thing with cmp in loops, and it agrees with your comment that it is totally I/O bound, not CPU bound at all.

However, even in the md5sum case, I/O was high too; the disk light was on all the time. Maybe it was a case of CPU speed difference. But as far as file system performance goes, the overhead should be identical for both the runs, no?

Besides, I need to run the comparison (rather, verification of file contents) many times over during the application life-cycle, and I cannot afford to bring in another copy from disk. The working set is expected to be 30-40GB at a time; 3GB is just the test setup. With md5sum, I can store it in a database and verify it on one copy only.

And finally, it is terrible on timings. Running md5sum is a lot faster, about 3 times in the best case.

shrid...@bheem /mnt1/shridhar/tmp/importtest.big$ time for i in `find . -type f`;do cmp $i /data/shridhar/tmp/4/$i;done
real    21m30.137s
user    0m27.665s
sys     1m21.581s

shrid...@bheem /data/shridhar/tmp/4$ time for i in `find . -type f`;do cmp $i /mnt1/shridhar/tmp/importtest.big/$i;done
real    6m26.988s
user    0m40.721s
sys     1m28.371s

shrid...@bheem /mnt1/shridhar/tmp/importtest.big$ time for i in `find . -type f`;do cmp $i /data/shridhar/tmp/4/$i;done
real    16m27.541s
user    0m37.281s
sys     1m23.995s

So when the source file system is btrfs, it is still a couple of times faster at least.

-- 
Regards
Shridhar
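The "store it in a database and verify it on one copy only" workflow mentioned above might be sketched as follows. This is an assumption-laden illustration, not the application's design: the thread does not say which database is used, so SQLite via the standard `sqlite3` module is swapped in, and the table name `checksums` and the functions `file_md5`, `record_tree`, and `verify_tree` are all hypothetical.

```python
import hashlib
import os
import sqlite3


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_tree(conn, root):
    """Store one (relative path, md5) row per file found under root."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums "
        "(path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            conn.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?)",
                (os.path.relpath(full, root), file_md5(full)),
            )
    conn.commit()


def verify_tree(conn, root):
    """Return the sorted relative paths that are missing or whose MD5 changed."""
    bad = []
    for rel, expected in conn.execute("SELECT path, md5 FROM checksums"):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_md5(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Verification then reads only the one tree on disk plus the stored digests, which is the property the message above is after; a second copy of the 30-40GB working set is never read.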
[arch-general] Btrfs more than twice as fast compared to ext4
Hi,

Just wanted to share an interesting experience I had today. Check http://ghodechhap.net/btrfs.performance.txt

-- 
Regards
Shridhar