Re: 2.6.16-rc6-mm2: slow writes on reiser4.
On Tue, 21 Mar 2006 23:41:22 -0800, Hans Reiser <[EMAIL PROTECTED]> wrote:

> It may be that we need to port some of
> the block allocation optimizations from V3 to V4 (Jeff's work) to help
> with 90% full filesystems.

Speaking of that, I read on the backuppc mailing list about a localized performance problem with reiserfs 3 (which otherwise performs about as well as xfs for that task). I wonder whether it was ever reported to you, as suggested on that list:

http://sourceforge.net/mailarchive/message.php?msg_id=8646808

My understanding is that backuppc hits the worst case for reiserfs3 hard links.

Backuppc creates a huge pool of all versions of all files from all backups, compressed, organized by MD5 hash (handling collisions, of course), and hardlinked from their different backup views. [Some metadata is stored separately, so that several files with the same content but different metadata can still be shared on disk. But I digress.]

At night, a sweeping process removes backups that are too old (according to user policy), and perhaps checks whether more background sharing or compression can be done.

If I remember correctly, v3 puts directory entries and their corresponding inodes next to each other on disk. When hard links are created, new directory entries are created that point to the same inode. If the first directory entry is removed, the inode may no longer be stored near any of the entries pointing to it. Since backuppc routinely removes directory entries in FIFO order, this is almost guaranteed to happen every time. Hence a very bad distribution of inodes on disk after some time...

I don't know exactly what xfs does (blocks of preallocated inodes?), but it does better in this case.

Hope it helps,
Pierre.
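The hard-link behaviour described in this message can be observed from user space. The following is a minimal sketch, not backuppc code: the names `pool_entry`, `backup_1` and `backup_2` are hypothetical, and the script only demonstrates the POSIX semantics (several directory entries sharing one inode, which survives removal of the oldest entry). It says nothing about where reiserfs actually places the inode on disk.

```python
import os
import tempfile


def hardlink_fifo_demo(root):
    """Create one file, hard-link it twice, then remove entries oldest first.

    Returns a list of (surviving_entries, inode_number, link_count) tuples
    taken after each removal, so the caller can see that the inode stays
    the same while the directory entries pointing to it change.
    """
    names = ["pool_entry", "backup_1", "backup_2"]  # hypothetical names
    paths = [os.path.join(root, n) for n in names]

    # First directory entry: this is where the inode is created.
    with open(paths[0], "wb") as f:
        f.write(b"file content shared by all backups\n")
    # Later entries are plain hard links to the same inode.
    for p in paths[1:]:
        os.link(paths[0], p)

    history = []
    remaining = list(paths)
    while len(remaining) > 1:
        os.unlink(remaining.pop(0))  # FIFO: the oldest entry goes first
        st = os.stat(remaining[0])
        history.append(
            ([os.path.basename(p) for p in remaining], st.st_ino, st.st_nlink)
        )
    return history


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for entries, inode, nlink in hardlink_fifo_demo(tmp):
            print(entries, "inode", inode, "links", nlink)
```

After the first entry is gone, the inode number is unchanged and only the link count drops, which is the situation in which the inode may end up far from every entry that still refers to it.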
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
Hello Laurent,

On Wed, 29 Mar 2006 08:16:55 +0200
Laurent Riffard <[EMAIL PROTECTED]> wrote:

| So I found it more conclusive to write 150M and thus fill up both FS.

Thanks for the explanations.

Truly yours,

Philippe
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
On 29.03.2006 00:49, Philippe Gramoullé wrote:
> Hello Laurent,
>
> On Tue, 28 Mar 2006 22:19:01 +0200
> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>
> | These FS are quite similar. Now guess what? I filled these FS with
> | dd.
> |
> | Original FS
> | ===
> | # sync
> | # time dd if=/dev/zero of=toto bs=1M count=150
> | 103+0 enregistrements lus.
> | 102+0 enregistrements écrits.
> | Command exited with non-zero status 1
>
> Well, at least on my system, such a command exits with a 0 status.

Oops! I trimmed a line when I cut and pasted. dd exits with the message "Aucun espace disponible sur le périphérique", which means "No space left on device".

> Also, not a single one of your posts in this thread has this error except this one
> and the one below.

Yes, I changed my test somewhat. In the previous test, I dd'd 100M to the FS. As the original FS and its copy have different amounts of free space, writing 100M to each leaves 3M free on one versus 30M free on the other. I ran that test and it took about 2'20" versus 15". But I feared someone would object, "It's because you have less free space on the first FS". So I found it more conclusive to write 150M and thus fill up both FS.

> | 0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | # time sync
> | 0inputs+0outputs (0major+279minor)pagefaults 0swaps
> | 0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (0major+191minor)pagefaults 0swaps
> |
> | Copy FS
> | ===
> | # sync
> | # time dd if=/dev/zero of=toto bs=1M count=150
> | dd: écriture de `toto': Aucun espace disponible sur le périphérique
> | 132+0 enregistrements lus.
> | 131+0 enregistrements écrits.
> | Command exited with non-zero status 1
>
> Here, I can understand the "exited with non-zero status 1", as
> "Aucun espace disponible sur le périphérique" is French for
> "No space left on device".

Yes, see above.

> | 0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (1major+279minor)pagefaults 0swaps
> | # time sync
> | 0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (0major+190minor)pagefaults 0swaps
> | disk$
> |
> | See ? 3'30" versus 16".
>
> Are the 16" due to the fact that the above command exited earlier than it
> should have?

No (see above), both FS were filled up to 0M free space.

> Thanks,
>
> Philippe
>

Thanks for your comments. I hope this made it clear.

To be fair, there are some differences between the two FS:
- the copy is larger than the original: 1003520 vs 995998 blocks (as listed in /proc/partitions), which is about 0.75% larger.
- the original FS resides on an extended partition (/dev/hda8) while the copy is on a logical volume (/dev/vglinux1/test). This LV is hosted on /dev/hda4.

I hope these differences do not have a high impact on the results. I'll try to dd of=/dev/hda8 if=/dev/vglinux1/test, and see whether it makes a difference when I dd a 100M file to the FS.
~~
laurent
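The comparison made in this message comes down to two small calculations: the ratio of the two dd elapsed times and the relative size difference of the two devices. A minimal sketch of that arithmetic follows; the helper names `elapsed_seconds` and `percent_larger` are made up for this illustration, and the input values are copied from the `time` output and the /proc/partitions listing quoted in the thread.

```python
def elapsed_seconds(field):
    """Convert a GNU time 'elapsed' field such as '3:32.18elapsed' to seconds.

    Handles both 'minutes:seconds' and 'hours:minutes:seconds'.
    """
    value = field.replace("elapsed", "")
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def percent_larger(new, old):
    """Relative size difference of `new` compared with `old`, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    original = elapsed_seconds("3:32.18elapsed")  # dd on the original FS
    copy = elapsed_seconds("0:15.95elapsed")      # dd on the fresh copy
    print(f"original FS: {original:.2f} s, copy: {copy:.2f} s")
    print(f"slowdown factor: {original / copy:.1f}")
    print(f"size difference: {percent_larger(1003520, 995998):.2f} %")
```

The slowdown factor is more than an order of magnitude, while the two devices differ in size by well under one percent, so the size difference alone is an unlikely explanation.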
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
Hello Laurent,

On Tue, 28 Mar 2006 22:19:01 +0200
Laurent Riffard <[EMAIL PROTECTED]> wrote:

| These FS are quite similar. Now guess what? I filled these FS with
| dd.
|
| Original FS
| ===
| # sync
| # time dd if=/dev/zero of=toto bs=1M count=150
| 103+0 enregistrements lus.
| 102+0 enregistrements écrits.
| Command exited with non-zero status 1

Well, at least on my system, such a command exits with a 0 status.
Also, not a single one of your posts in this thread has this error except this one and the one below.

| 0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata
| 0maxresident)k
| # time sync
| 0inputs+0outputs (0major+279minor)pagefaults 0swaps
| 0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (0major+191minor)pagefaults 0swaps
|
| Copy FS
| ===
| # sync
| # time dd if=/dev/zero of=toto bs=1M count=150
| dd: écriture de `toto': Aucun espace disponible sur le périphérique
| 132+0 enregistrements lus.
| 131+0 enregistrements écrits.
| Command exited with non-zero status 1

Here, I can understand the "exited with non-zero status 1", as "Aucun espace disponible sur le périphérique" is French for "No space left on device".

| 0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (1major+279minor)pagefaults 0swaps
| # time sync
| 0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (0major+190minor)pagefaults 0swaps
| disk$
|
| See ? 3'30" versus 16".

Are the 16" due to the fact that the above command exited earlier than it should have?

Thanks,

Philippe
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
I think what this means is that after we have a repacker, we should gain performance advantages over our competition as a result. It is far easier for us to code an online repacker than it is for them. Hans
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
Laurent Riffard wrote:

>See ? 3'30" versus 16".
>
>I packed the metadata of my original FS to a file, you can grab it
>from http://laurent.riffard.free.fr/kernel.reiser4.bz2 (6.7M).

Wow. We need to do the repacker. We might also need to examine whether there are optimizations in V3 block allocation we should apply to V4, but mostly we need the repacker.

Ok, well, right after we go into the kernel it will be done.

Thanks much Laurent, you did a great job of analyzing this for us.

>Note I was unable to unpack it :
>
>># bunzip2 -c /tmp/kernel.reiser4.bz2 | debugfs.reiser4 -U /dev/vglinux1/test
>>debugfs.reiser4 1.0.5
>>Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by
>>reiser4progs/COPYING.
>>
>>Info : The metadata were packed with the reiser4progs 1.0.5.
>>
>>
>>Error: Can't unpack filesystem.
>
>~~
>laurent
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
On 22.03.2006 20:04, Hans Reiser wrote:
> Instead of using sync, could you increase the size of the files you
> write so that they are 10x ram size?
>
> I have a suspicion we are slow at sync. I am not sure why, but I
> have seen other data where sync was slow for us, and maybe we need to
> optimize that code path.
>
> Hans
>

Hello Hans,

Sorry for the long delay in replying.

I'm not sure this is a problem with _sync_. I had concerns about sync on reiser4, but I thought they were related to the FS policy, which tries to do a lot of work in memory, so that when syncing time comes there is a huge amount of data to write back to disk. Well, I'm not a file systems expert, this is a wild guess...

Anyway, I didn't try to "write a file of size 10x ram size". My test case is a 925M FS with 100M free, and I have 512M of RAM.

And I guess there is a problem with the Reiser4 internal data. It's an old FS; I made thousands of kernel builds on it. I allocated a new logical volume (about the same size, same HD), made it a reiser4 FS and copied all my data onto it.

> [EMAIL PROTECTED] ~]# grep reiser4 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4
> rw,nosuid,nodev,atom_max_size=0x7e22,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10
> 0 0
> /dev/vglinux1/test /mnt/disk reiser4
> rw,atom_max_size=0x7e22,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10
> 0 0
> [EMAIL PROTECTED] ~]# grep -e hda8 -e dm-5 /proc/partitions
>    3     8     995998 hda8
>  254     5    1003520 dm-5
> [EMAIL PROTECTED] ~]# cp -pRL /home/laurent/kernel/. /mnt/disk
[cut errors with symbolic links]
> [EMAIL PROTECTED] ~]# df /home/laurent/kernel /mnt/disk
> Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
> /dev/hda8             925M  822M  103M  89% /home/laurent/kernel
> /dev/mapper/vglinux1-test
>                       932M  800M  132M  86% /mnt/disk

These FS are quite similar. Now guess what? I filled these FS with dd.

Original FS
===
# sync
# time dd if=/dev/zero of=toto bs=1M count=150
103+0 enregistrements lus.
102+0 enregistrements écrits.
Command exited with non-zero status 1
0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
# time sync
0inputs+0outputs (0major+279minor)pagefaults 0swaps
0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps

Copy FS
===
# sync
# time dd if=/dev/zero of=toto bs=1M count=150
dd: écriture de `toto': Aucun espace disponible sur le périphérique
132+0 enregistrements lus.
131+0 enregistrements écrits.
Command exited with non-zero status 1
0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+279minor)pagefaults 0swaps
# time sync
0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
disk$

See ? 3'30" versus 16".

I packed the metadata of my original FS to a file, you can grab it from http://laurent.riffard.free.fr/kernel.reiser4.bz2 (6.7M).

Note I was unable to unpack it :

> # bunzip2 -c /tmp/kernel.reiser4.bz2 | debugfs.reiser4 -U /dev/vglinux1/test
> debugfs.reiser4 1.0.5
> Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by
> reiser4progs/COPYING.
>
> Info : The metadata were packed with the reiser4progs 1.0.5.
>
>
> Error: Can't unpack filesystem.

~~
laurent
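The reiser4 mount options in the /proc/mounts output quoted in this message are printed in hexadecimal, which makes the two filesystems harder to compare at a glance. Below is a small sketch that splits such an option string and converts the numeric values to decimal. The function name `parse_mount_options` is made up for this example, and the units of the individual options are not stated in the thread, so the script does not interpret them.

```python
def parse_mount_options(options):
    """Split a /proc/mounts option string into flags and key=value pairs.

    Values that look like integers (decimal or 0x-prefixed hex) are
    converted to int; anything else is kept as a string.
    """
    flags, values = [], {}
    for item in options.split(","):
        if "=" not in item:
            flags.append(item)
            continue
        key, raw = item.split("=", 1)
        try:
            values[key] = int(raw, 0)  # base 0 accepts "0x..." and decimal
        except ValueError:
            values[key] = raw
    return flags, values


# Option string of /dev/hda8 as quoted in the message.
OPTIONS = (
    "rw,nosuid,nodev,atom_max_size=0x7e22,atom_max_age=0x249f0,"
    "atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10"
)

if __name__ == "__main__":
    flags, values = parse_mount_options(OPTIONS)
    print("flags:", ", ".join(flags))
    for key, value in values.items():
        print(f"{key} = {value}")
```

Apart from the `nosuid,nodev` flags, both filesystems in the test are mounted with the same reiser4 parameters, so the mount options do not account for the timing difference.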
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
On 3/23/06, Jindrich Makovicka <[EMAIL PROTECTED]> wrote:
> Hans Reiser wrote:
> > Instead of using sync, could you increase the size of the files you
> > write so that they are 10x ram size?
> >
> > I have a suspicion we are slow at sync. I am not sure why, but I
> > have seen other data where sync was slow for us, and maybe we need to
> > optimize that code path.
>
> My impression is rather that the bottleneck is the amount of seeking the
> sync causes - would it be possible to reorder the write operations
> somehow, still preserving atomicity?

Yeah, the kernel is not good at ordering flushes during sync. It would work much better if Reiser4 could just be told to do a full sync, and then have only one thread that climbs through the fake inode and squallocs everything.

> Also, a comparison of Reiser4 performance on NCQ vs. non-NCQ drive could
> be interesting (I don't have NCQ, maybe that's the problem).

The scheduler could make a difference too, most likely in the area of the 'congestion' threshold and handling.

NATE
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
Hans Reiser wrote:
> Instead of using sync, could you increase the size of the files you
> write so that they are 10x ram size?
>
> I have a suspicion we are slow at sync. I am not sure why, but I
> have seen other data where sync was slow for us, and maybe we need to
> optimize that code path.

My impression is rather that the bottleneck is the amount of seeking the sync causes. Would it be possible to reorder the write operations somehow, while still preserving atomicity?

Also, a comparison of Reiser4 performance on an NCQ vs. a non-NCQ drive could be interesting (I don't have NCQ, maybe that's the problem).

Regards,
--
Jindrich Makovicka
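To illustrate why the order of write operations matters for seeking, here is a toy model. It is only a sketch under strong assumptions: the disk is reduced to a line of block numbers, the cost of a request is the distance the head travels, the block numbers are invented, and atomicity constraints are ignored entirely. It is not how Reiser4 or the kernel flushes data.

```python
def head_travel(blocks, start=0):
    """Total distance the head moves when visiting blocks in the given order."""
    position, distance = start, 0
    for block in blocks:
        distance += abs(block - position)
        position = block
    return distance


def one_sweep_order(blocks):
    """Reorder pending writes into a single ascending sweep over the disk."""
    return sorted(blocks)


if __name__ == "__main__":
    # Invented example: writes alternating between two distant disk regions.
    pending = [900, 10, 850, 20, 800, 30]
    print("submission order:", head_travel(pending))
    print("single sweep:    ", head_travel(one_sweep_order(pending)))
```

In this model the alternating submission order travels several times farther than the single sweep over the same blocks. Whether such a reordering is possible in practice depends on the ordering guarantees the filesystem has to keep, which is exactly the question raised above.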
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
Instead of using sync, could you increase the size of the files you write so that they are 10x RAM size?

I have a suspicion we are slow at sync. I am not sure why, but I have seen other data where sync was slow for us, and maybe we need to optimize that code path.

Hans

Laurent Riffard wrote:

>On 22.03.2006 08:41, Hans Reiser wrote:
>
>>Laurent Riffard wrote:
>>
>>>Hello,
>>>
>>>Writing big files is very slow on reiser4 now.
>>>
>>>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>>>
>>try bs=4M, and tell me what happens. also try an empty fs, and an fs
>>that is equally full to reiserfs. Note that reiserfs in your test is
>>68% full vs. 90% full for V4. It may be that we need to port some of
>>the block allocation optimizations from V3 to V4 (Jeff's work) to help
>>with 90% full filesystems. Thanks for doing this. Real users always
>>teach me a lot when they test things differently from how I did.
>>
>>Hans
>>
>
>Hello Hans,
>
>Yesterday, I realized that my tests were not fair. So I did some
>further tests trying to have the same situation for 3 different FS
>(reiserfs/ext2/reiser4) and I sent the result to the list, but this
>mail never reached the list. I have resent it.
>
>As per your request, I tried to replay my dd test on my 90% full
>reiser4 FS, using a 4M block size. Here are the results:
>
>-
>
>>Desktop$ cd ~/kernel
>>
>>kernel$ rm toto
>>rm: détruire fichier régulier `toto'? o
>>
>>kernel$ df .
>>Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
>>/dev/hda8             925M  748M  177M  81% /home/laurent/kernel
>>
>>kernel$ grep /dev/hda8 /rpoc/mounts
>>grep: /rpoc/mounts: Aucun fichier ou répertoire de ce type
>>
>>kernel$ grep /dev/hda8 /proc/mounts
>>/dev/hda8 /home/laurent/kernel reiser4
>>rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10
>> 0 0
>>
>>kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
>>25+0 enregistrements lus.
>>25+0 enregistrements écrits.
>>0.00user 2.89system 0:17.18elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+252minor)pagefaults 0swaps
>>0.00user 0.00system 2:19.91elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+191minor)pagefaults 0swaps
>>
>>kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
>>25+0 enregistrements lus.
>>25+0 enregistrements écrits.
>>0.00user 2.96system 1:16.42elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+252minor)pagefaults 0swaps
>>0.00user 0.00system 0:08.70elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+190minor)pagefaults 0swaps
>>
>-
>
>I tried to run an "iostat 10" simultaneously with dd+sync. I
>attached the output. Hope this helps.
>~~
>laurent
>
>Le script a débuté sur mer 22 mar 2006 19:12:56 CET
>Desktop$ cd ~/kernel
>kernel$
>kernel$ sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero
>of=toto bs=4M count=25 && echo SYNC && time sync && echo END &
>[1] 4657
>kernel$ iostat -t 10 /dev/hda8
>Linux 2.6.16-rc6-mm2 (antares.localdomain) 22.03.2006
>
>Heure: 19:13:32
>avg-cpu:  %user   %nice %system %iowait   %idle
>           5,01    0,02   11,07    4,45   79,46
>
>Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>hda8              5,34         0,27       217,58       1297    1026592
>
>Heure: 19:13:42
>avg-cpu:  %user   %nice %system %iowait   %idle
>           0,10    0,00    0,20    0,20   99,50
>
>Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>hda8              0,00         0,00         0,00          0          0
>
>SYNC
>DD
>Heure: 19:13:52
>avg-cpu:  %user   %nice %system %iowait   %idle
>           1,50    0,00   79,32    8,29   10,89
>
>Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>hda8             20,38         3,20      1202,00         32      12032
>
>Heure: 19:14:02
>avg-cpu:  %user   %nice %system %iowait   %idle
>           2,30    0,00   81,08   16,62    0,00
>
>Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>hda8             33,53         0,00      1398,20          0      13968
>
>Heure: 19:14:12
>avg-cpu:  %user   %nice %system %iowait   %idle
>           1,90    0,00   88,51    9,59    0,00
>
>Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>hda8             25,27         0,00       893,51          0       8944
>
>Heure: 19:14:22
>avg-cpu:  %user   %nice %system %iowait   %idle
>           3,19    0,00   85,63   11,18    0,00
>
>Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>hda8             27,35         0,00      1288,62          0      12912
>
>Heure: 19:14:32
>avg-cpu:  %user   %nice %system %iowait   %idle
>           0,80    0,00   90,01    9,19
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
On 22.03.2006 08:41, Hans Reiser wrote:
> Laurent Riffard wrote:
>
>>Hello,
>>
>>Writing big files is very slow on reiser4 now.
>>
>>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>>
>
> try bs=4M, and tell me what happens. also try an empty fs, and an fs
> that is equally full to reiserfs. Note that reiserfs in your test is
> 68% full vs. 90% full for V4. It may be that we need to port some of
> the block allocation optimizations from V3 to V4 (Jeff's work) to help
> with 90% full filesystems. Thanks for doing this. Real users always
> teach me a lot when they test things differently from how I did.
>
> Hans

Hello Hans,

Yesterday, I realized that my tests were not fair. So I ran some further tests, trying to set up the same situation for 3 different FS (reiserfs/ext2/reiser4), and I sent the results to the list, but that mail never reached the list. I have resent it.

As per your request, I replayed my dd test on my 90% full reiser4 FS, using a 4M block size. Here are the results:

-
> Desktop$ cd ~/kernel
>
> kernel$ rm toto
> rm: détruire fichier régulier `toto'? o
>
> kernel$ df .
> Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
> /dev/hda8             925M  748M  177M  81% /home/laurent/kernel
>
> kernel$ grep /dev/hda8 /rpoc/mounts
> grep: /rpoc/mounts: Aucun fichier ou répertoire de ce type
>
> kernel$ grep /dev/hda8 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4
> rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10
> 0 0
>
> kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
> 25+0 enregistrements lus.
> 25+0 enregistrements écrits.
> 0.00user 2.89system 0:17.18elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+252minor)pagefaults 0swaps
> 0.00user 0.00system 2:19.91elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
> kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
> 25+0 enregistrements lus.
> 25+0 enregistrements écrits.
> 0.00user 2.96system 1:16.42elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+252minor)pagefaults 0swaps
> 0.00user 0.00system 0:08.70elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+190minor)pagefaults 0swaps
-

I ran an "iostat 10" simultaneously with dd+sync. The output is attached. Hope this helps.
~~
laurent

Le script a débuté sur mer 22 mar 2006 19:12:56 CET
Desktop$ cd ~/kernel
kernel$
kernel$ sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END &
[1] 4657
kernel$ iostat -t 10 /dev/hda8
Linux 2.6.16-rc6-mm2 (antares.localdomain) 22.03.2006

Heure: 19:13:32
avg-cpu:  %user   %nice %system %iowait   %idle
           5,01    0,02   11,07    4,45   79,46

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8              5,34         0,27       217,58       1297    1026592

Heure: 19:13:42
avg-cpu:  %user   %nice %system %iowait   %idle
           0,10    0,00    0,20    0,20   99,50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8              0,00         0,00         0,00          0          0

SYNC
DD
Heure: 19:13:52
avg-cpu:  %user   %nice %system %iowait   %idle
           1,50    0,00   79,32    8,29   10,89

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8             20,38         3,20      1202,00         32      12032

Heure: 19:14:02
avg-cpu:  %user   %nice %system %iowait   %idle
           2,30    0,00   81,08   16,62    0,00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8             33,53         0,00      1398,20          0      13968

Heure: 19:14:12
avg-cpu:  %user   %nice %system %iowait   %idle
           1,90    0,00   88,51    9,59    0,00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8             25,27         0,00       893,51          0       8944

Heure: 19:14:22
avg-cpu:  %user   %nice %system %iowait   %idle
           3,19    0,00   85,63   11,18    0,00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8             27,35         0,00      1288,62          0      12912

Heure: 19:14:32
avg-cpu:  %user   %nice %system %iowait   %idle
           0,80    0,00   90,01    9,19    0,00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8             25,17         0,00       800,00          0       8008

Heure: 19:14:42
avg-cpu:  %user   %nice %system %iowait   %idle
           0,30    0,00   74,93   24,78    0,00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda8             54,35         0,00      3138,46          0      31416

Heure: 19:14:52
avg-cpu:  %user   %nice %system %iowait   %idle
           0,20    0,00   81,62   18,18    0,00
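The Blk_wrtn/s figures in the iostat log above are easier to judge once converted to a data rate. The sketch below does that conversion under one assumption that the thread does not state: that an iostat "block" is a 512-byte sector, as the sysstat documentation describes for 2.4 and later kernels. The helper names are invented for the example, and the decimal commas come from the French locale of the log.

```python
SECTOR_BYTES = 512  # assumption: iostat "blocks" are 512-byte sectors


def parse_decimal_comma(text):
    """Parse a number printed with a decimal comma, e.g. '1202,00'."""
    return float(text.replace(",", "."))


def write_rate_kib(blk_wrtn_per_s):
    """Convert an iostat Blk_wrtn/s figure to KiB per second."""
    return parse_decimal_comma(blk_wrtn_per_s) * SECTOR_BYTES / 1024


# Blk_wrtn/s samples for hda8 copied from the iostat log above,
# from the first interval after "DD" onwards.
SAMPLES = ["1202,00", "1398,20", "893,51", "1288,62", "800,00", "3138,46"]

if __name__ == "__main__":
    rates = [write_rate_kib(s) for s in SAMPLES]
    for sample, rate in zip(SAMPLES, rates):
        print(f"{sample:>8} blk/s = {rate:7.1f} KiB/s")
    print(f"mean: {sum(rates) / len(rates):.1f} KiB/s")
```

If the 512-byte assumption holds, the device is writing well under 2 MiB/s during these intervals even though %system stays high, which fits the impression that the time is not spent streaming data to disk.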
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
[this is a second post, the first post seemed to never reach the list]

On 21.03.2006 22:16, Laurent Riffard wrote:
> Hello,
>
> Writing big files is very slow on reiser4 now.
>
> "dd if=/dev/zero of=toto bs=1k count=102400; sync" takes more than 2 minutes on
> reiser4 fs, but only 15 seconds on reiserfs fs.

Oops! My tests were not fair: my reiser4 FS was almost full while my reiserfs FS had plenty of free space.

> kernel$ df .
> Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
> /dev/hda8             925M  825M  101M  90% /home/laurent/kernel
> kernel$ grep hda8 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4
> rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10
> 0 0
[snip]
> ~$ df .
> Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
> /dev/mapper/vglinux1-lvhome
>                       7,0G  4,8G  2,3G  68% /home
> ~$ grep lvhome /proc/mounts
> /dev/vglinux1/lvhome /home reiserfs rw 0 0

So I did some tests with a 2GB logical volume. I formatted it (reiserfs/ext2/reiser4), untarred a copy of a kernel tree onto this FS and wrote a 100 MB file 3 times.

FS          Elapsed time for dd + sync
reiserfs:   14.22s
ext2:       11.12s
reiser4:    19.71s

I won't discuss why reiser4 is slow here. Maybe my tests are not so good. The interesting point of this thread is that reiser4 does not seem to like situations with little space available. I should replay these tests with a 90% full FS (but it's time to go to bed now...).

The full logs of my tests are attached below.
~~
laurent

Le script a débuté sur mar 21 mar 2006 22:40:11 CET
[EMAIL PROTECTED] ~]# lvdisplay /dev/vglinux1/test
  --- Logical volume ---
  LV Name                /dev/vglinux1/test
  VG Name                vglinux1
  LV UUID                1IdmIn-9Ne8-IZDS-PUYF-IyLP-Xz54-c50H2E
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2,00 GB
  Current LE             512
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           254:5

[EMAIL PROTECTED] ~]# mkfs.reiserfs /dev/vglinux1/test
mkfs.reiserfs 3.6.19 (2003 www.namesys.com)

A pair of credits:
Yury Umanets (aka Umka) developed libreiser4, userspace plugins, and all
userspace tools (reiser4progs) except of fsck.

Hans Reiser was the project initiator, source of all funding for the first 5.5
years. He is the architect and official maintainer.

Guessing about desired format.. Kernel 2.6.16-rc6-mm2 is running.
Format 3.6 with standard journal
Count of blocks on the device: 524288
Number of blocks consumed by mkreiserfs formatting process: 8227
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 9f9b271b-1ed6-4ffb-9cde-243d3859b221
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
        ALL DATA WILL BE LOST ON '/dev/vglinux1/test'!
Continue (y/n):y
Initializing journal - 0%20%40%60%80%100%
Syncing..ok

Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.

ReiserFS is successfully created on /dev/vglinux1/test.
[EMAIL PROTECTED] ~]# mount /dev/vglinux1/test /mnt/disk
[EMAIL PROTECTED] ~]# cd /mnt/disk
[EMAIL PROTECTED] disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[EMAIL PROTECTED] disk]# df .
Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
                      2,0G  260M  1,8G  13% /mnt/disk
[EMAIL PROTECTED] disk]# ls
linux-2.6.15
[EMAIL PROTECTED] disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.60system 0:01.73elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.06system 0:15.53elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[EMAIL PROTECTED] disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.02user 1.60system 0:01.65elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.04system 0:09.72elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[EMAIL PROTECTED] disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.63system 0:01.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.06system 0:15.58elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+192minor)pagefaults 0swaps
[EMAIL PROTECTED] disk]# sync; time dd if=/dev/zero of=toto bs=1k count=1024
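The elapsed times in the summary table of this message can be turned into throughput figures for easier comparison. The sketch below uses only the three numbers from that table and the dd parameters (bs=1k count=102400, i.e. 100 MiB); the function and variable names are made up for the example, and how the table values were derived from the individual runs is not stated in the message.

```python
FILE_MIB = 102400 / 1024  # dd bs=1k count=102400 writes 100 MiB

# Elapsed time for dd + sync, in seconds, as given in the summary table.
ELAPSED = {"reiserfs": 14.22, "ext2": 11.12, "reiser4": 19.71}


def throughput_mib_per_s(elapsed_s, size_mib=FILE_MIB):
    """Average write throughput for one dd + sync run."""
    return size_mib / elapsed_s


def slowdown_vs(baseline, results):
    """Elapsed time of every filesystem relative to the baseline one."""
    return {fs: t / results[baseline] for fs, t in results.items()}


if __name__ == "__main__":
    for fs, seconds in ELAPSED.items():
        print(f"{fs:9s} {throughput_mib_per_s(seconds):5.2f} MiB/s")
    for fs, factor in slowdown_vs("ext2", ELAPSED).items():
        print(f"{fs:9s} {factor:4.2f}x the ext2 time")
```

On this mostly empty 2GB volume reiser4 takes less than twice as long as ext2, which is far from the gap of several minutes seen on the 90% full filesystem.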
Re: 2.6.16-rc6-mm2: slow writes on reiser4.
Laurent Riffard wrote:

>Hello,
>
>Writing big files is very slow on reiser4 now.
>
>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>

Try bs=4M, and tell me what happens. Also try an empty fs, and an fs that is as full as the reiserfs one. Note that reiserfs in your test is 68% full vs. 90% full for V4. It may be that we need to port some of the block allocation optimizations from V3 to V4 (Jeff's work) to help with 90% full filesystems.

Thanks for doing this. Real users always teach me a lot when they test things differently from how I did.

Hans

> takes more than 2 minutes on
>reiser4 fs, but only 15 seconds on reiserfs fs.
>
>Actually, writing on reiser4 is not uniformly slow, it seems to be blocked for
>ages from time to time. I monitored the number of dirty pages from
>/proc/meminfo
>and I hit sysrq-T when the system was stalling:
>
>dd D 17DE 0 21930 21929 (NOTLB)
> d7169c74 e0c98b05 0246 17de f396aa00 003d1249 d0b68140
> d0b68030 f396aa00 003d1249 6d519e00 0002 c0396434 d8bf8e30 d8bf8e38
> 0246 d7169ca0 c0270f08 d0b68030 0001 d0b68030 c0113b25 d8bf8e38
>Call Trace:
> [] __down+0x81/0xdc
> [] __down_failed+0xa/0x10
> [] .text.lock.lock+0x15/0x1b [reiser4]
> [] longterm_lock_znode+0x5b4/0x7b0 [reiser4]
> [] cbk_level_lookup+0x8a/0x954 [reiser4]
> [] traverse_tree+0x752/0xa0d [reiser4]
> [] coord_by_handle+0x781/0x789 [reiser4]
> [] object_lookup+0x1eb/0x230 [reiser4]
> [] find_file_item+0x18d/0x1b7 [reiser4]
> [] write_flow+0x208/0x6e1 [reiser4]
> [] write_unix_file+0x3d9/0x5b0 [reiser4]
> [] vfs_write+0x8a/0x133
> [] sys_write+0x3b/0x60
> [] sysenter_past_esp+0x54/0x75
>
>Below are the detailed test I ran. Feel free to ask for more information.
>
>Reiser4 FS
>==
>
>Desktop$ cd ~/kernel
>
>kernel$ df .
>Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
>/dev/hda8             925M  825M  101M  90% /home/laurent/kernel
>
>kernel$ grep hda8 /proc/mounts
>/dev/hda8 /home/laurent/kernel reiser4
>rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10
> 0 0
>
>kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0 enregistrements écrits.
>0.06user 13.95system 1:42.09elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+250minor)pagefaults 0swaps
>0.00user 0.00system 1:22.90elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
>kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0 enregistrements écrits.
>0.08user 14.01system 1:45.57elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+249minor)pagefaults 0swaps
>0.00user 0.00system 0:09.78elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
>kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0 enregistrements écrits.
>0.06user 14.13system 2:18.27elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+251minor)pagefaults 0swaps
>0.00user 0.00system 0:08.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+190minor)pagefaults 0swaps
>
>kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0 enregistrements écrits.
>0.06user 14.27system 1:56.34elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+251minor)pagefaults 0swaps
>0.00user 0.00system 0:10.46elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+190minor)pagefaults 0swaps
>
>
>Reiserfs FS
>===
>kernel$ cd
>
>~$ df .
>Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
>/dev/mapper/vglinux1-lvhome
>                      7,0G  4,8G  2,3G  68% /home
>[/dev/mapper/vglinux1-lvhome resides on /dev/hda4]
>
>~$ grep lvhome /proc/mounts
>/dev/vglinux1/lvhome /home reiserfs rw 0 0
>
>~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0 enregistrements écrits.
>0.04user 1.75system 0:02.05elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+249minor)pagefaults 0swaps
>0.00user 0.10system 0:12.93elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
>~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0 enregistrements écrits.
>0.04user 1.83system 0:01.98elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+250minor)pagefaults 0swaps
>0.00user 0.16system 0:14.45elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
>~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
>102400+0 enregistrements lus.
>102400+0