Re: [Bug 9546] New: Huge latency in concurrent I/O when using data=ordered

2007-12-12 Thread Jan Kara
 
 (switching to email - please respond via emailed reply-to-all, not via the
 bugzilla web interface)
 
 On Tue, 11 Dec 2007 11:36:39 -0800 (PST)
 [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=9546
  
          Summary: Huge latency in concurrent I/O when using data=ordered
          Product: File System
          Version: 2.5
    KernelVersion: 2.6.24-rc4
         Platform: All
       OS/Version: Linux
             Tree: Mainline
           Status: NEW
         Severity: normal
         Priority: P1
        Component: ext3
       AssignedTo: [EMAIL PROTECTED]
       ReportedBy: [EMAIL PROTECTED]
  
  
  Most recent kernel where this bug did not occur:
  Unknown; certainly not a regression, but something specific to the ext3
  algorithm
  
  Distribution:
  Bluewhite64 12 (Slackware 12 64 bits port) and Slackware 12
  
  Hardware Environment:
  Athlon 64 3000+ laptop, IDE 5400 80GB drive, 1.2GB RAM
  Athlon 64 X2 4200+, SATA 7200 200GB drive, 1GB RAM
  Athlon 2800+, IDE 7200 40GB drive, 512MB RAM
  
  Software Environment:
  dd, cp, konqueror/KDE, mount/tune2fs
  
  Problem Description:
  When the system does heavy input/output on big files, small-file accesses
  from other applications are not served for a very long time. This can cause
  huge latencies. The system is really not usable at all, even with all the
  recent improvements made to increase interactivity on the desktop.
  
  This behaviour is very visible with the following simple test case:
  1. Build a DVD structure from big MPEG+PS files with dvdauthor (it copies the
  files into the DVD structure, then passes over them to fix VOBUs, but this
  part is not very long so it is not the main problem).
  2. While the computer is doing this, try to open a web browser such as
  konqueror. Then open a page from a bookmark. Then open a new tab, then open
  another page from a bookmark. Switch back to the first page.
  
  What I get is:
  35 seconds to open Konqueror.
  8 seconds to open the bookmark menu. Incredible.
  30 seconds to open the web page (DSL/10GBits).
  5 seconds to open the second tab.
  6 seconds to reopen the menu.
  36 seconds to open the second page.
  14 seconds to come back to first tab.
  This is unbelievable! The system is completely trashed, with more than 1GB
  RAM, whatever hardware configuration is used.
  
  Of course, I investigated the problem... First, DMA is OK. Second, I thought
  the cache would cause memory to be swapped out, so I used
  echo 0 > swappiness. Then (of course, the system was not swapping at all), I
  thought the TEXT sections of programs were being discarded (that would be
  simply stupid, but who knows?). I then tried to throttle the writing process
  with dirty_background_ratio (say 10%) while reserving a greater portion of
  RAM for the rest of the system with dirty_ratio (say 70%). No way. Then I
  launched top and looked at WCHAN to see what the frozen process (i.e.
  konqueror) was waiting on. Then I saw the faulty guy:
  log_wait_commit!
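
For readers reproducing this: the WCHAN value that top displays is also available per process under /proc. Below is a minimal sketch in C, assuming a Linux procfs that provides /proc/<pid>/wchan. The helper name read_wchan is hypothetical and not part of the report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: copy the wait channel of `pid` into `buf`.
 * Assumes Linux procfs exposes /proc/<pid>/wchan.
 * Returns 0 on success, -1 if the file cannot be opened.
 */
static int read_wchan(pid_t pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;
    size_t n;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    snprintf(path, sizeof(path), "/proc/%ld/wchan", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    n = fread(buf, 1, len - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Drop a trailing newline, if the kernel emitted one. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Polling read_wchan() on the pid of the stalled browser while the big copy runs should show the symbol the task is sleeping in. The file may simply read 0 when the task is not blocked.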
  
  So I concluded there is unfair access to the filesystem journal, and tried
  journaling options other than the default ordered data mode. The results
  were really different: 5s, 2s, 4s, etc., with both journal and writeback
  mode!
  
  I therefore think there is a big lock, and maybe even a priority inversion,
  in log_wait_commit of the ext3 filesystem. I think that, even if it is
  throttled, the writing process always gets access to the journal in ordered
  mode, simply because it writes many pages at a time and because ordered mode
  indeed implies... ordering of requests (as I understand it).
  
  It's sad that the default option is the one that gives the worst
  interactivity problems. Indeed, I think this undoes all the previous work
  done to enhance the desktop experience. Too bad!
  
  Btw, I've also seen on the Internet that some people reported that journal
  data mode gives better performance. I think the problem was in fact related
  to latency rather than performance (timing the writing process shows that
  the output rate is halved with journal data mode, and it takes twice the
  time to complete).
  
  Steps to reproduce:
  I wrote a simple script:
  #!/bin/bash
  
  SRC1=src1.bin
  SRC2=src2.bin
  DEST_DIR=tmpdir
  DST1=dst.bin
  
  # First, create the source files:
  if [ ! -e $SRC1 ] ; then
      dd if=/dev/zero of=$SRC1 bs=10k count=15
  fi
  if [ ! -e $SRC2 ] ; then
      dd if=/dev/zero of=$SRC2 bs=10k count=15
  fi
  mkdir $DEST_DIR > /dev/null 2>&1
  sync
  
  # Do the test:
  echo Trashing the system...
  rm $DEST_DIR/$DST1 > /dev/null 2>&1
  cp $SRC1 $DEST_DIR/$DST1
  cat $SRC2 >> $DEST_DIR/$DST1
  echo Done!
  
  #rm -rf $DEST_DIR $SRC1 $SRC2
  
  While running it, try to use interactive programs normally, such as
  konqueror (the program should need to access files, such as cookies, the
  cache and so on for konqueror). Then remount/tune the filesystem to use
  another data mode for ext3.
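
The stall described in the report can be measured rather than judged by feel. Below is a minimal sketch in C, assuming the delay is visible to a process that writes a small file and calls fsync() while the big copy runs. The function names and the probe path are hypothetical and not part of the report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical probe: create `path`, write a few bytes, fsync() it, and
 * return the elapsed wall-clock time in seconds (negative on error).
 */
static double timed_small_write(const char *path)
{
    static const char payload[] = "latency probe\n";
    struct timespec start, end;
    int fd;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;
    if (write(fd, payload, sizeof(payload) - 1) < 0 || fsync(fd) < 0) {
        close(fd);
        return -1.0;
    }
    close(fd);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/*
 * Run the probe `rounds` times, print each latency, remove the probe
 * file, and return the worst latency seen (negative on error).
 */
static double worst_small_write_latency(const char *path, int rounds)
{
    double worst = 0.0;
    int i;

    assert(path != NULL && rounds > 0);
    for (i = 0; i < rounds; i++) {
        double t = timed_small_write(path);
        if (t < 0.0)
            return -1.0;
        printf("round %d: %.3f s\n", i, t);
        if (t > worst)
            worst = t;
    }
    unlink(path);
    return worst;
}
```

Running worst_small_write_latency() on the filesystem under test while the script above copies its files, once per data mode, gives numbers that can be compared directly across ordered, journal and writeback.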
  

+ ext3-ext4-avoid-divide-by-zero.patch added to -mm tree

2007-12-12 Thread akpm

The patch titled
 ext3, ext4: avoid divide by zero
has been added to the -mm tree.  Its filename is
 ext3-ext4-avoid-divide-by-zero.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: ext3, ext4: avoid divide by zero
From: Andries E. Brouwer [EMAIL PROTECTED]

As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when
mounting an ext3 filesystem.  If that number is zero, a crash follows. 
Below a patch.

This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers.

Cc: linux-ext4@vger.kernel.org
Acked-by: Alan Cox [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/ext3/super.c |    2 +-
 fs/ext4/super.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff -puN fs/ext3/super.c~ext3-ext4-avoid-divide-by-zero fs/ext3/super.c
--- a/fs/ext3/super.c~ext3-ext4-avoid-divide-by-zero
+++ a/fs/ext3/super.c
@@ -1676,7 +1676,7 @@ static int ext3_fill_super (struct super
 	sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
 	sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group);
 	sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
-	if (EXT3_INODE_SIZE(sb) == 0)
+	if (EXT3_INODE_SIZE(sb) == 0 || EXT3_INODES_PER_GROUP(sb) == 0)
 		goto cantfind_ext3;
 	sbi->s_inodes_per_block = blocksize / EXT3_INODE_SIZE(sb);
 	if (sbi->s_inodes_per_block == 0)
diff -puN fs/ext4/super.c~ext3-ext4-avoid-divide-by-zero fs/ext4/super.c
--- a/fs/ext4/super.c~ext3-ext4-avoid-divide-by-zero
+++ a/fs/ext4/super.c
@@ -1797,7 +1797,7 @@ static int ext4_fill_super (struct super
 	sbi->s_desc_size = EXT4_MIN_DESC_SIZE;
 	sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
 	sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
-	if (EXT4_INODE_SIZE(sb) == 0)
+	if (EXT4_INODE_SIZE(sb) == 0 || EXT4_INODES_PER_GROUP(sb) == 0)
 		goto cantfind_ext4;
 	sbi->s_inodes_per_block = blocksize / EXT4_INODE_SIZE(sb);
 	if (sbi->s_inodes_per_block == 0)
_
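
The guard added in the hunks above can be illustrated outside the kernel. Below is a minimal user-space sketch in C, assuming a simplified superblock with just two fields. struct toy_super and toy_inode_group() are hypothetical names, not kernel code. It rejects a zero inodes-per-group value before using it as a divisor to map an inode number to its block group.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the fields read from disk. */
struct toy_super {
    uint32_t inode_size;
    uint32_t inodes_per_group;
};

/*
 * Map inode number `ino` (1-based) to its block group.
 * Returns 0 and stores the group on success, or -1 when the superblock
 * values would cause a division by zero or `ino` is invalid.
 */
static int toy_inode_group(const struct toy_super *sb, uint32_t ino,
                           uint32_t *group)
{
    assert(sb != NULL && group != NULL);

    /* Same idea as the patched check: refuse bad values up front. */
    if (sb->inode_size == 0 || sb->inodes_per_group == 0) {
        fprintf(stderr, "toy: bad superblock, refusing to mount\n");
        return -1;
    }
    if (ino == 0)
        return -1;

    *group = (ino - 1) / sb->inodes_per_group;
    return 0;
}
```

With inodes_per_group set to zero the function fails cleanly instead of faulting, which is the behaviour the patch aims for at mount time.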

Patches currently in -mm which might be from [EMAIL PROTECTED] are

ext3-ext4-avoid-divide-by-zero.patch
mnt_unbindable-fix.patch

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm] ext3: remove unused code from ext3_find_entry()

2007-12-12 Thread Mariusz Kozlowski
Hello,

This patch removes unused code from ext3_find_entry().
Compile and boot tested.

Signed-off-by: Mariusz Kozlowski [EMAIL PROTECTED]

 fs/ext3/namei.c | 67174 -> 67077 (-97 bytes)
 fs/ext3/namei.o | 157944 -> 157896 (-48 bytes)

 fs/ext3/namei.c |    4 ----
 1 file changed, 4 deletions(-)

--- linux-2.6.24-rc4-mm1-a/fs/ext3/namei.c	2007-12-06 09:27:07.0 +0100
+++ linux-2.6.24-rc4-mm1-b/fs/ext3/namei.c	2007-12-12 21:14:07.0 +0100
@@ -860,14 +860,10 @@ static struct buffer_head * ext3_find_en
 	int nblocks, i, err;
 	struct inode *dir = dentry->d_parent->d_inode;
 	int namelen;
-	const u8 *name;
-	unsigned blocksize;
 
 	*res_dir = NULL;
 	sb = dir->i_sb;
-	blocksize = sb->s_blocksize;
 	namelen = dentry->d_name.len;
-	name = dentry->d_name.name;
 	if (namelen > EXT3_NAME_LEN)
 		return NULL;
 	if (is_dx(dir)) {


[PATCH -mm] ext4: remove unused code from ext4_find_entry()

2007-12-12 Thread Mariusz Kozlowski
Hello,

The unused code found in ext3_find_entry() is also present (and still unused)
in the ext4_find_entry() code. This patch removes it. Compile tested only.

Signed-off-by: Mariusz Kozlowski [EMAIL PROTECTED]

 fs/ext4/namei.c | 68044 -> 67947 (-97 bytes)
 fs/ext4/namei.o | 183840 -> 183792 (-48 bytes)

 fs/ext4/namei.c |    4 ----
 1 file changed, 4 deletions(-)

--- linux-2.6.24-rc4-mm1-a/fs/ext4/namei.c	2007-12-06 09:27:07.0 +0100
+++ linux-2.6.24-rc4-mm1-b/fs/ext4/namei.c	2007-12-12 22:32:45.0 +0100
@@ -861,14 +861,10 @@ static struct buffer_head * ext4_find_en
 	int i, err;
 	struct inode *dir = dentry->d_parent->d_inode;
 	int namelen;
-	const u8 *name;
-	unsigned blocksize;
 
 	*res_dir = NULL;
 	sb = dir->i_sb;
-	blocksize = sb->s_blocksize;
 	namelen = dentry->d_name.len;
-	name = dentry->d_name.name;
 	if (namelen > EXT4_NAME_LEN)
 		return NULL;
 	if (is_dx(dir)) {


Re: ext3 SMP bug ? PANIC in __d_find_alias

2007-12-12 Thread Mitch

It is on:

$ uname -a
Linux home 2.6.23 #5 SMP PREEMPT Sun Oct 21 23:08:50 GST 2007 i686 unknown unknown GNU/Linux


And yes, it happened on previous kernels too, at least since .21.
I've had 6 panics so far, at random, but generally while running updatedb
(from find(1)), which seems to trigger it every so often if other activity
is also going on.


M

-------- Original Message --------
Subject: Re: ext3 SMP bug  ?  PANIC in __d_find_alias
Date: Wed, 12 Dec 2007 20:36:40 +0100
From: Rafael J. Wysocki [EMAIL PROTECTED]
To: Mitch [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], linux-ext4@vger.kernel.org
References: [EMAIL PROTECTED]

[Added CC to [EMAIL PROTECTED]]

On Wednesday, 12 of December 2007, Mitch wrote:
Can anyone help with this? This seems to be a true SMP bug - the same
kernel on another UP machine is working fine (although different h/w).
Stress (find, for example) seems to trigger this easily. Does it look
like I have a bad filesystem? Can anyone help me figure out which one?
The fact that this is tainted (due to nvidia) is a red herring, I think,
because both my machines (the SMP and the UP one) are using the same
nvidia module and the panic is in ext3 code.


Which kernel is this?

Did it happen with any previous kernel?


Dec 10 03:02:43 home kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address
Dec 10 03:02:43 home kernel:  printing eip:
Dec 10 03:02:43 home kernel: c01761fc
Dec 10 03:02:43 home kernel: *pdpt = 198a6001
Dec 10 03:02:43 home kernel: *pde =
Dec 10 03:02:43 home kernel: Oops:  [#1]
Dec 10 03:02:43 home kernel: PREEMPT SMP
Dec 10 03:02:43 home kernel: Modules linked in: loop nls_iso8859_1 nls_cp437 vfat fat tun iptable_nat nvidia(P) appletalk psnap llc nfsd exportfs lockd sunrpc xt_limit xt_tcpudp iptable_mangle ipt_LOG ipt_MASQUERADE nf_nat ipt_TOS ipt_REJECT nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ftdi_sio usbserial forcedeth snd_hda_intel snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd usb_storage ehci_hcd ohci_hcd it87 hwmon_vid i2c_dev i2c_core
Dec 10 03:02:43 home kernel: CPU:1
Dec 10 03:02:43 home kernel: EIP:0060:[__d_find_alias+44/192] Tainted: PVLI
Dec 10 03:02:43 home kernel: EFLAGS: 00010282   (2.6.23 #5)
Dec 10 03:02:43 home kernel: EIP is at __d_find_alias+0x2c/0xc0
Dec 10 03:02:43 home kernel: eax:    ebx: c03579bc   ecx:    edx: 4000
Dec 10 03:02:44 home kernel: esi: f55d58bc   edi:    ebp: 0001   esp: d479dda4
Dec 10 03:02:44 home kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Dec 10 03:02:44 home kernel: Process find (pid: 8233, ti=d479c000 task=f6d35ab0 task.ti=d479c000)
Dec 10 03:02:44 home kernel: Stack: f55d58a4 ebf42f00 f6735800 ebf42f00 c017832f f55d58a4 ebf42f00 f6735800
Dec 10 03:02:44 home kernel:        c01ad386 c0177755 ebf42f60 d479de38 ebf42f00 e85bf2fc c0357e80 ebf42f00
Dec 10 03:02:44 home kernel:        d479df04 c016d242 d479de44 f7c04740 f1352a98 f1352b0c d479de38 00034c98
Dec 10 03:02:44 home kernel: Call Trace:
Dec 10 03:02:44 home kernel:  [d_splice_alias+95/208] d_splice_alias+0x5f/0xd0
Dec 10 03:02:44 home kernel:  [ext3_lookup+230/288] ext3_lookup+0xe6/0x120
Dec 10 03:02:44 home kernel:  [d_alloc+309/416] d_alloc+0x135/0x1a0
Dec 10 03:02:44 home kernel:  [do_lookup+290/416] do_lookup+0x122/0x1a0
Dec 10 03:02:44 home kernel:  [__link_path_walk+1873/3408] __link_path_walk+0x751/0xd50
Dec 10 03:02:44 home kernel:  [link_path_walk+101/192] link_path_walk+0x65/0xc0
Dec 10 03:02:44 home kernel:  [link_path_walk+69/192] link_path_walk+0x45/0xc0
Dec 10 03:02:44 home kernel:  [nameidata_to_filp+53/64] nameidata_to_filp+0x35/0x40
Dec 10 03:02:44 home kernel:  [do_filp_open+75/96] do_filp_open+0x4b/0x60
Dec 10 03:02:44 home kernel:  [do_path_lookup+120/448] do_path_lookup+0x78/0x1c0
Dec 10 03:02:44 home kernel:  [getname+160/192] getname+0xa0/0xc0
Dec 10 03:02:44 home kernel:  [__user_walk_fd+59/96] __user_walk_fd+0x3b/0x60
Dec 10 03:02:44 home kernel:  [vfs_lstat_fd+31/80] vfs_lstat_fd+0x1f/0x50
Dec 10 03:02:44 home kernel:  [nameidata_to_filp+53/64] nameidata_to_filp+0x35/0x40
Dec 10 03:02:44 home kernel:  [do_filp_open+75/96] do_filp_open+0x4b/0x60
Dec 10 03:02:44 home kernel:  [sys_lstat64+15/48] sys_lstat64+0xf/0x30
Dec 10 03:02:44 home kernel:  [__fput+257/352] __fput+0x101/0x160
Dec 10 03:02:44 home kernel:  [mntput_no_expire+19/96] mntput_no_expire+0x13/0x60
Dec 10 03:02:44 home kernel:  [filp_close+71/128] filp_close+0x47/0x80
Dec 10 03:02:44 home kernel:  [sys_close+102/208] sys_close+0x66/0xd0
Dec 10 03:02:44 home kernel:  [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
Dec 10 03:02:44 home kernel:  ===
Dec 10 03:02:44 home kernel: Code: 89 c1 89 d5 57 56 8d 70 18 53 8b 40 18 31 db 39 c6 74 6c 0f b7 51 6a 31 ff

ext4 still broken on multiple architectures

2007-12-12 Thread Andrew Morton
fs/ext4/mballoc.c: In function 'ext4_mb_generate_buddy':
fs/ext4/mballoc.c:836: error: implicit declaration of function 'ext2_find_next_bit'

Can someone please get this fixed?