Re: [Bug 9546] New: Huge latency in concurrent I/O when using data=ordered
(switching to email - please respond via emailed reply-to-all, not via the
bugzilla web interface)

On Tue, 11 Dec 2007 11:36:39 -0800 (PST) [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9546
>
>            Summary: Huge latency in concurrent I/O when using data=ordered
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.24-rc4
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: ext3
>         AssignedTo: [EMAIL PROTECTED]
>         ReportedBy: [EMAIL PROTECTED]
>
> Most recent kernel where this bug did not occur: Unknown, certainly not a
> regression, but something specific to the ext3 algorithm
> Distribution: Bluewhite64 12 (Slackware 12 64-bit port) and Slackware 12
> Hardware Environment:
>   Athlon 64 3000+ laptop IDE 5400 80GB + 1.2GB RAM
>   Athlon 64X2 4200+ SATA 7200 200GB drive + 1GB
>   Athlon 2800+ IDE 7200 40GB drive + 512MB
> Software Environment: dd, cp, konqueror/KDE, mount/tune2fs
>
> Problem Description:
> When the system does heavy input/output operations on big files, accesses
> to small files from other applications are not served for a very long
> time. This can cause huge latencies. The system is really not usable at
> all, even with all the recent improvements made to increase interactivity
> on the desktop.
>
> This behaviour is very visible with the following simple test case:
> 1. Build a DVD structure from big MPEG+PS files with dvdauthor (it copies
>    the files into the DVD structure, then makes a pass over them to fix
>    VOBUs, but this part is not very long so it is not the main problem).
> 2. While the computer is doing this, try to open a web browser such as
>    konqueror. Then open a page from a bookmark. Then open a new tab, then
>    open another page from a bookmark. Switch back to the first page.
>
> What I get is:
>   35 seconds to open Konqueror.
>    8 seconds to open the bookmark menu. Incredible.
>   30 seconds to open the web page (DSL/10GBits).
>    5 seconds to open the second tab.
>    6 seconds to reopen the menu.
>   36 seconds to open the second page.
>   14 seconds to come back to the first tab.
>
> This is unbelievable! The system is completely trashed, with more than
> 1GB RAM, whatever hardware configuration is used.
>
> Of course, I investigated the problem... First, DMA is OK. Second, I
> thought the cache would cause memory to be swapped out, so I used
> echo 0 > swappiness. Then (of course, the system was not swapping at all)
> I thought the TEXT sections of programs were being discarded (that would
> be simply stupid, but who knows?). I then tried to throttle the writing
> process with dirty_background_ratio (say 10%) while reserving a greater
> portion of RAM for the rest of the system with dirty_ratio (say 70%). It
> made no difference.
>
> Then I launched top and looked at the WCHAN column to see what the frozen
> process (i.e. konqueror) was blocked on. Then I saw the faulty guy:
> log_wait_commit!
>
> So I concluded there is unfair access to the filesystem journal, and I
> tried journaling options other than the default ordered data mode. The
> results were really different: 5s, 2s, 4s, etc., with both journal and
> writeback mode!
>
> I therefore think there is a big lock, and maybe even a priority
> inversion, in log_wait_commit of the ext3 filesystem. I think that, even
> if it is throttled, the writing process always gets access to the journal
> in ordered mode, simply because it writes many pages at a time and because
> ordered mode indeed implies... ordering of requests (as I understand it).
>
> It's sad that the default option is the one that gives the worst
> interactivity problems. Indeed, I think this undoes all the previous work
> done to enhance the desktop experience. Too bad!
>
> Btw, I've also seen on the Internet that some people reported that
> journal data mode gives better performance. I think the problem there was
> really latency rather than performance (timing the writing process does
> show the output rate halved with journal data mode, and twice the time to
> complete).
>
> Steps to reproduce:
> I wrote a simple script:
>
> #!/bin/bash
>
> SRC1=src1.bin
> SRC2=src2.bin
> DEST_DIR=tmpdir
> DST1=dst.bin
>
> # First, create the source files:
> if [ ! -e $SRC1 ] ; then
>         dd if=/dev/zero of=$SRC1 bs=10k count=15
> fi
> if [ ! -e $SRC2 ] ; then
>         dd if=/dev/zero of=$SRC2 bs=10k count=15
> fi
> mkdir $DEST_DIR > /dev/null 2>&1
> sync
>
> # Do the test:
> echo Trashing the system...
> rm $DEST_DIR/$DST1 > /dev/null 2>&1
> cp $SRC1 $DEST_DIR/$DST1
> cat $SRC2 >> $DEST_DIR/$DST1
> echo Done!
>
> #rm -rf $DEST_DIR $SRC1 $SRC2
>
> While it is running, try to use interactive programs normally, such as
> konqueror (the program should have to access files, such as cookies, the
> cache and so on for konqueror).
>
> Then remount/tune the filesystem to use another data mode for ext3.
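For anyone repeating the comparison, here is a rough sketch of that last step. It only parses a mount option string and prints the commands; it does not run them. The function names, /dev/sdXN and /mnt/test are placeholders of mine, not taken from the report. It prints an umount/mount pair rather than a remount on the assumption that ext3 will not change the data mode of a live filesystem on remount.

```shell
#!/bin/sh
# Sketch only: function names, /dev/sdXN and /mnt/test are illustrative
# assumptions, not taken from the bug report.

# Print the ext3 data mode named in a comma-separated mount option string
# (the 4th field of a /proc/mounts line). Falls back to "ordered", the
# default mode named in the report, when no data= option is listed.
# The body runs in a subshell so IFS and the variables do not leak.
data_mode_of() (
    set -f                      # no pathname expansion while splitting
    mode=ordered
    IFS=,
    for opt in $1; do
        case $opt in
            data=*) mode=${opt#data=} ;;
        esac
    done
    echo "$mode"
)

# Print (do not run) the commands that would switch a filesystem to the
# requested data mode. Arguments: device, mount point, mode.
print_switch_cmds() {
    case $3 in
        ordered|writeback|journal) ;;
        *) echo "unknown data mode: $3" >&2; return 1 ;;
    esac
    echo "umount $2"
    echo "mount -t ext3 -o data=$3 $1 $2"
}

data_mode_of "rw,noatime,data=writeback"
print_switch_cmds /dev/sdXN /mnt/test journal
```

Run the test script once per mode and compare the interactive timings; the root filesystem cannot be unmounted this way, so a scratch partition is the easier target.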
+ ext3-ext4-avoid-divide-by-zero.patch added to -mm tree
The patch titled
     ext3, ext4: avoid divide by zero
has been added to the -mm tree.  Its filename is
     ext3-ext4-avoid-divide-by-zero.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: ext3, ext4: avoid divide by zero
From: Andries E. Brouwer [EMAIL PROTECTED]

As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when
mounting an ext3 filesystem.  If that number is zero, a crash follows.
Below a patch.

This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers.

Cc: linux-ext4@vger.kernel.org
Acked-by: Alan Cox [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/ext3/super.c |    2 +-
 fs/ext4/super.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff -puN fs/ext3/super.c~ext3-ext4-avoid-divide-by-zero fs/ext3/super.c
--- a/fs/ext3/super.c~ext3-ext4-avoid-divide-by-zero
+++ a/fs/ext3/super.c
@@ -1676,7 +1676,7 @@ static int ext3_fill_super (struct super
 	sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
 	sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group);
 	sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
-	if (EXT3_INODE_SIZE(sb) == 0)
+	if (EXT3_INODE_SIZE(sb) == 0 || EXT3_INODES_PER_GROUP(sb) == 0)
 		goto cantfind_ext3;
 	sbi->s_inodes_per_block = blocksize / EXT3_INODE_SIZE(sb);
 	if (sbi->s_inodes_per_block == 0)
diff -puN fs/ext4/super.c~ext3-ext4-avoid-divide-by-zero fs/ext4/super.c
--- a/fs/ext4/super.c~ext3-ext4-avoid-divide-by-zero
+++ a/fs/ext4/super.c
@@ -1797,7 +1797,7 @@ static int ext4_fill_super (struct super
 	sbi->s_desc_size = EXT4_MIN_DESC_SIZE;
 	sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
 	sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
-	if (EXT4_INODE_SIZE(sb) == 0)
+	if (EXT4_INODE_SIZE(sb) == 0 || EXT4_INODES_PER_GROUP(sb) == 0)
 		goto cantfind_ext4;
 	sbi->s_inodes_per_block = blocksize / EXT4_INODE_SIZE(sb);
 	if (sbi->s_inodes_per_block == 0)
_

Patches currently in -mm which might be from [EMAIL PROTECTED] are

ext3-ext4-avoid-divide-by-zero.patch
mnt_unbindable-fix.patch

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
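To see from userspace whether a given image carries the field value this patch guards against, something along these lines should do. It is a sketch under stated assumptions: the function names and the default image name fs.img are mine, and the offsets assume the usual ext2/ext3 on-disk layout (superblock 1024 bytes into the device, s_inodes_per_group as a little-endian 32-bit field 40 bytes into the superblock). It only reads the file and does not check the magic number or any other field.

```shell
#!/bin/sh
# Sketch only: function names and the default image path are illustrative.
# Assumed layout: superblock at byte 1024, s_inodes_per_group at
# superblock offset 40, stored little-endian, so absolute offset 1064.

# Print the little-endian u32 found at byte offset $2 of file $1.
# Fails if four bytes cannot be read there.
read_le32() {
    # Replace the positional parameters with the four byte values.
    set -- $(od -An -tu1 -j "$2" -N 4 "$1" 2>/dev/null)
    [ $# -eq 4 ] || return 1
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# Succeed only if s_inodes_per_group is readable and non-zero, i.e. the
# image would get past the check added by the patch.
inodes_per_group_ok() {
    ipg=$(read_le32 "$1" 1064) || return 1
    [ "$ipg" -ne 0 ]
}

img=${1:-fs.img}
if [ -r "$img" ]; then
    if inodes_per_group_ok "$img"; then
        echo "$img: s_inodes_per_group is non-zero"
    else
        echo "$img: s_inodes_per_group is zero or unreadable"
    fi
fi
```

The bytes are assembled by hand rather than read with a 4-byte od format so that the result does not depend on the byte order of the machine running the script.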
[PATCH -mm] ext3: remove unused code from ext3_find_entry()
Hello,

	This patch removes unused code from ext3_find_entry().
Compile and boot tested.

Signed-off-by: Mariusz Kozlowski [EMAIL PROTECTED]

 fs/ext3/namei.c | 67174 -> 67077 (-97 bytes)
 fs/ext3/namei.o | 157944 -> 157896 (-48 bytes)

 fs/ext3/namei.c |    4 ----
 1 file changed, 4 deletions(-)

--- linux-2.6.24-rc4-mm1-a/fs/ext3/namei.c	2007-12-06 09:27:07.0 +0100
+++ linux-2.6.24-rc4-mm1-b/fs/ext3/namei.c	2007-12-12 21:14:07.0 +0100
@@ -860,14 +860,10 @@ static struct buffer_head * ext3_find_en
 	int nblocks, i, err;
 	struct inode *dir = dentry->d_parent->d_inode;
 	int namelen;
-	const u8 *name;
-	unsigned blocksize;
 
 	*res_dir = NULL;
 	sb = dir->i_sb;
-	blocksize = sb->s_blocksize;
 	namelen = dentry->d_name.len;
-	name = dentry->d_name.name;
 	if (namelen > EXT3_NAME_LEN)
 		return NULL;
 	if (is_dx(dir)) {
[PATCH -mm] ext4: remove unused code from ext4_find_entry()
Hello,

	The unused code found in ext3_find_entry() is also present
(and still unused) in the ext4_find_entry() code. This patch removes it.
Compile tested only.

Signed-off-by: Mariusz Kozlowski [EMAIL PROTECTED]

 fs/ext4/namei.c | 68044 -> 67947 (-97 bytes)
 fs/ext4/namei.o | 183840 -> 183792 (-48 bytes)

 fs/ext4/namei.c |    4 ----
 1 file changed, 4 deletions(-)

--- linux-2.6.24-rc4-mm1-a/fs/ext4/namei.c	2007-12-06 09:27:07.0 +0100
+++ linux-2.6.24-rc4-mm1-b/fs/ext4/namei.c	2007-12-12 22:32:45.0 +0100
@@ -861,14 +861,10 @@ static struct buffer_head * ext4_find_en
 	int i, err;
 	struct inode *dir = dentry->d_parent->d_inode;
 	int namelen;
-	const u8 *name;
-	unsigned blocksize;
 
 	*res_dir = NULL;
 	sb = dir->i_sb;
-	blocksize = sb->s_blocksize;
 	namelen = dentry->d_name.len;
-	name = dentry->d_name.name;
 	if (namelen > EXT4_NAME_LEN)
 		return NULL;
 	if (is_dx(dir)) {
Re: ext3 SMP bug ? PANIC in __d_find_alias
It is on:

$ uname -a
Linux home 2.6.23 #5 SMP PREEMPT Sun Oct 21 23:08:50 GST 2007 i686 unknown unknown GNU/Linux

And yes, it happened on previous kernels also, at least since .21.

I've had 6 panics so far, at random, but generally when doing an updatedb
(from find(1)), which seems to trigger it ever so often if there is other
activity also going on.

M

-------- Original Message --------
Subject: Re: ext3 SMP bug ? PANIC in __d_find_alias
Date: Wed, 12 Dec 2007 20:36:40 +0100
From: Rafael J. Wysocki [EMAIL PROTECTED]
To: Mitch [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], linux-ext4@vger.kernel.org
References: [EMAIL PROTECTED]

[Added CC to [EMAIL PROTECTED]]

On Wednesday, 12 of December 2007, Mitch wrote:
> Can anyone help with this?
>
> This seems to be a true SMP bug - the same kernel on another UP machine
> is working fine (although different h/w). Seems like stress (find for
> example) can easily trigger this.
>
> Does it look like I have a bad filesystem? Can anyone help me figure out
> which one?
>
> The fact that this is tainted (due to nvidia) is a red herring, I think,
> because both my machines (the SMP and UP one) are using the same nvidia
> module and the panic is in ext3 code.

Which kernel is this? Did it happen with any previous kernel?

> Dec 10 03:02:43 home kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address
> Dec 10 03:02:43 home kernel: printing eip:
> Dec 10 03:02:43 home kernel: c01761fc
> Dec 10 03:02:43 home kernel: *pdpt = 198a6001
> Dec 10 03:02:43 home kernel: *pde =
> Dec 10 03:02:43 home kernel: Oops: [#1]
> Dec 10 03:02:43 home kernel: PREEMPT SMP
> Dec 10 03:02:43 home kernel: Modules linked in: loop nls_iso8859_1 nls_cp437 vfat fat tun iptable_nat nvidia(P) appletalk psnap llc nfsd exportfs lockd sunrpc xt_limit xt_tcpudp iptable_mangle ipt_LOG ipt_MASQUERADE nf_nat ipt_TOS ipt_REJECT nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ftdi_sio usbserial forcedeth snd_hda_intel snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd usb_storage ehci_hcd ohci_hcd it87 hwmon_vid i2c_dev i2c_core
> Dec 10 03:02:43 home kernel: CPU:    1
> Dec 10 03:02:43 home kernel: EIP:    0060:[__d_find_alias+44/192] Tainted: P VLI
> Dec 10 03:02:43 home kernel: EFLAGS: 00010282 (2.6.23 #5)
> Dec 10 03:02:43 home kernel: EIP is at __d_find_alias+0x2c/0xc0
> Dec 10 03:02:43 home kernel: eax: ebx: c03579bc ecx: edx: 4000
> Dec 10 03:02:44 home kernel: esi: f55d58bc edi: ebp: 0001 esp: d479dda4
> Dec 10 03:02:44 home kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
> Dec 10 03:02:44 home kernel: Process find (pid: 8233, ti=d479c000 task=f6d35ab0 task.ti=d479c000)
> Dec 10 03:02:44 home kernel: Stack: f55d58a4 ebf42f00 f6735800 ebf42f00 c017832f f55d58a4 ebf42f00 f6735800
> Dec 10 03:02:44 home kernel:        c01ad386 c0177755 ebf42f60 d479de38 ebf42f00 e85bf2fc c0357e80 ebf42f00
> Dec 10 03:02:44 home kernel:        d479df04 c016d242 d479de44 f7c04740 f1352a98 f1352b0c d479de38 00034c98
> Dec 10 03:02:44 home kernel: Call Trace:
> Dec 10 03:02:44 home kernel: [d_splice_alias+95/208] d_splice_alias+0x5f/0xd0
> Dec 10 03:02:44 home kernel: [ext3_lookup+230/288] ext3_lookup+0xe6/0x120
> Dec 10 03:02:44 home kernel: [d_alloc+309/416] d_alloc+0x135/0x1a0
> Dec 10 03:02:44 home kernel: [do_lookup+290/416] do_lookup+0x122/0x1a0
> Dec 10 03:02:44 home kernel: [__link_path_walk+1873/3408] __link_path_walk+0x751/0xd50
> Dec 10 03:02:44 home kernel: [link_path_walk+101/192] link_path_walk+0x65/0xc0
> Dec 10 03:02:44 home kernel: [link_path_walk+69/192] link_path_walk+0x45/0xc0
> Dec 10 03:02:44 home kernel: [nameidata_to_filp+53/64] nameidata_to_filp+0x35/0x40
> Dec 10 03:02:44 home kernel: [do_filp_open+75/96] do_filp_open+0x4b/0x60
> Dec 10 03:02:44 home kernel: [do_path_lookup+120/448] do_path_lookup+0x78/0x1c0
> Dec 10 03:02:44 home kernel: [getname+160/192] getname+0xa0/0xc0
> Dec 10 03:02:44 home kernel: [__user_walk_fd+59/96] __user_walk_fd+0x3b/0x60
> Dec 10 03:02:44 home kernel: [vfs_lstat_fd+31/80] vfs_lstat_fd+0x1f/0x50
> Dec 10 03:02:44 home kernel: [nameidata_to_filp+53/64] nameidata_to_filp+0x35/0x40
> Dec 10 03:02:44 home kernel: [do_filp_open+75/96] do_filp_open+0x4b/0x60
> Dec 10 03:02:44 home kernel: [sys_lstat64+15/48] sys_lstat64+0xf/0x30
> Dec 10 03:02:44 home kernel: [__fput+257/352] __fput+0x101/0x160
> Dec 10 03:02:44 home kernel: [mntput_no_expire+19/96] mntput_no_expire+0x13/0x60
> Dec 10 03:02:44 home kernel: [filp_close+71/128] filp_close+0x47/0x80
> Dec 10 03:02:44 home kernel: [sys_close+102/208] sys_close+0x66/0xd0
> Dec 10 03:02:44 home kernel: [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
> Dec 10 03:02:44 home kernel: ===
> Dec 10 03:02:44 home kernel: Code: 89 c1 89 d5 57 56 8d 70 18 53 8b 40 18 31 db 39 c6 74 6c 0f b7 51 6a 31 ff
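When comparing several of these panics it can help to reduce each oops to the list of functions in its call trace. A rough sketch follows, assuming the syslog line format shown above ("kernel: [sym+dec/len] sym+0xoff/0xlen"); the function name trace_symbols is mine, and the two sample lines are copied from the log in this message. Entries are printed in log order, innermost call first.

```shell
#!/bin/sh
# Sketch only: trace_symbols is an illustrative name, and the pattern
# assumes the klogd-style "kernel: [sym+dec/len] sym+0xoff/0xlen" format
# seen in the log above.

# Read an oops log on stdin and print the function name of each call
# trace entry, one per line. Lines that are not trace entries (registers,
# EIP, Stack, Code) do not match and are dropped. A leading "> " quote
# prefix is tolerated.
trace_symbols() {
    sed -n 's/^.*kernel: *\[[^]]*\] *\([A-Za-z_][A-Za-z0-9_]*\)+0x.*$/\1/p'
}

trace_symbols <<'EOF'
Dec 10 03:02:43 home kernel: EIP is at __d_find_alias+0x2c/0xc0
Dec 10 03:02:44 home kernel: [d_splice_alias+95/208] d_splice_alias+0x5f/0xd0
Dec 10 03:02:44 home kernel: [ext3_lookup+230/288] ext3_lookup+0xe6/0x120
EOF
```

Not every entry in such a trace is necessarily a live caller, so a list like this is better suited to spotting what the panics have in common than to reading as an exact call chain.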
ext4 still broken on multiple architectures
fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
fs/ext4/mballoc.c:836: error: implicit declaration of function 'ext2_find_next_bit'

Can someone please get this fixed?