Re: [Bug 9546] New: Huge latency in concurrent I/O when using data=ordered

2007-12-12 Thread Jan Kara
 
 (switching to email - please respond via emailed reply-to-all, not via the
 bugzilla web interface)
 
 On Tue, 11 Dec 2007 11:36:39 -0800 (PST)
 [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=9546
  
           Summary: Huge latency in concurrent I/O when using data=ordered
           Product: File System
           Version: 2.5
     KernelVersion: 2.6.24-rc4
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext3
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]
  
  
  Most recent kernel where this bug did not occur:
  Unknown; certainly not a regression, but something specific to the ext3
  algorithm.
  
  Distribution:
  Bluewhite64 12 (the 64-bit port of Slackware 12) and Slackware 12
  
  Hardware Environment:
  Athlon 64 3000+ laptop, IDE 5400rpm 80GB drive, 1.2GB RAM
  Athlon 64 X2 4200+, SATA 7200rpm 200GB drive, 1GB RAM
  Athlon 2800+, IDE 7200rpm 40GB drive, 512MB RAM
  
  Software Environment:
  dd, cp, konqueror/KDE, mount/tune2fs
  
  Problem Description:
  When the system does heavy input/output operations on big files, accesses to
  small files from other applications go unserved for very long stretches. This
  can cause huge latencies. The system is really not usable at all, even with
  all the recent improvements made to increase desktop interactivity.
  
  This behaviour is very visible with the following simple test case:
  1. Build a DVD structure from big MPEG+PS files with dvdauthor (it copies the
  files into the DVD structure, then passes over them to fix VOBUs, but this
  part is not very long, so it is not the main problem).
  2. While the computer is doing this, try to open a web browser such as
  konqueror. Then open a page from a bookmark. Then open a new tab, then open
  another page from a bookmark. Switch back to the first page.
  
  What I get is:
  35 seconds to open Konqueror.
  8 seconds to open the bookmark menu. Incredible.
  30 seconds to open the web page (DSL/10GBits).
  5 seconds to open the second tab.
  6 seconds to reopen the menu.
  36 seconds to open the second page.
  14 seconds to come back to first tab.
  This is unbelievable! The system is completely thrashed, with more than 1GB
  of RAM, whatever hardware configuration is used.
  
  Of course, I investigated the problem... First, DMA is OK. Second, I thought
  caching would make memory get swapped, so I used echo 0 >
  /proc/sys/vm/swappiness. Then (of course, the system was not swapping at
  all), I thought TEXT sections of programs were being discarded (that would be
  simply stupid, but who knows?). I then tried to make the writing process
  throttled with dirty_background_ratio (say 10%) while reserving a greater
  portion of RAM for the rest of the system with dirty_ratio (say 70%). No way.
  Then I launched top and looked at the WCHAN column to see what the problem
  was for the frozen process (i.e. konqueror). Then I saw the faulty guy:
  log_wait_commit!
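  (For reference, a sketch of the tuning and inspection steps described above;
  these use the standard /proc interfaces, and the values are the ones from
  this report:)

  # rule out swapping entirely
  echo 0 > /proc/sys/vm/swappiness
  # start background writeback early (10%), throttle writers late (70%)
  echo 10 > /proc/sys/vm/dirty_background_ratio
  echo 70 > /proc/sys/vm/dirty_ratio
  # see what a stuck process is blocked on (the WCHAN column)
  ps -o pid,stat,wchan:32,comm -C konqueror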
  
  So I concluded there is unfair access to the filesystem journal. So I tried
  journaling options other than the default ordered data mode. The results were
  really different: 5s, 2s, 4s, etc., with both journal and writeback modes!
  
  I therefore think there is a big lock, and maybe even a priority inversion,
  in log_wait_commit of the ext3 filesystem. I think that, even if it is
  throttled, the writing process always gets access to the journal in ordered
  mode, simply because it writes many pages at a time and because ordered mode
  indeed implies... ordering of requests (as I understand it).
  
  It's sad that this default option is the one that gives the worst
  interactivity problems. Indeed, I think this undoes all the previous work
  done to enhance the desktop experience. Too bad!
  
  Btw, I've also seen on the Internet that some people reported that journal
  data mode gives better performance. I think the problem was indeed related to
  latency rather than throughput (timing the writing process effectively shows
  an output rate halved with journal data mode, and twice the time to process).
  
  Steps to reproduce:
  I did a simple script:
  #!/bin/bash

  SRC1=src1.bin
  SRC2=src2.bin
  DEST_DIR=tmpdir
  DST1=dst.bin

  # First, create the source files:
  if [ ! -e "$SRC1" ] ; then
      dd if=/dev/zero of="$SRC1" bs=10k count=15
  fi
  if [ ! -e "$SRC2" ] ; then
      dd if=/dev/zero of="$SRC2" bs=10k count=15
  fi
  mkdir "$DEST_DIR" > /dev/null 2>&1
  sync

  # Do the test:
  echo "Trashing the system..."
  rm "$DEST_DIR/$DST1" > /dev/null 2>&1
  cp "$SRC1" "$DEST_DIR/$DST1"
  cat "$SRC2" >> "$DEST_DIR/$DST1"
  echo "Done!"

  #rm -rf "$DEST_DIR" "$SRC1" "$SRC2"
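  (A refinement, not in the original script: dropping the page cache between
  runs makes successive timings more comparable. This assumes the drop_caches
  interface, available since kernel 2.6.16.)

  sync
  echo 3 > /proc/sys/vm/drop_caches   # frees page cache plus dentries and inodes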
  
  While running it, try to use the interactive programs normally, for example
  konqueror (the program should have to access files; cookies, cache and so on
  in konqueror's case). Then remount/tune the filesystem to use another data
  mode for ext3.

  I didn't try with other journaling filesystems. I guess ext2 also doesn't
  exhibit the problem.
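  (A sketch of that remount/tune step; /dev/hda3 and /mnt/test are placeholder
  names. Note that the kernel refuses to change data= on a live remount, so
  unmount first:)

  umount /mnt/test
  mount -o data=writeback /dev/hda3 /mnt/test

  # or store the default data mode in the superblock, then remount:
  tune2fs -o journal_data_writeback /dev/hda3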
  

Re: [Bug 9546] New: Huge latency in concurrent I/O when using data=ordered

2007-12-11 Thread Andrew Morton

(switching to email - please respond via emailed reply-to-all, not via the
bugzilla web interface)

On Tue, 11 Dec 2007 11:36:39 -0800 (PST)
[EMAIL PROTECTED] wrote:

 [The full bug report, quoted above, is snipped here.]

Interesting that data=writeback helped this.  You don't give a lot of
details, but I assume that data=writeback made a large difference here?

Re: [Bug 9546] New: Huge latency in concurrent I/O when using data=ordered

2007-12-11 Thread Christian Casteyde
OK for reply to all. I'm using the first mail received.

> Interesting that data=writeback helped this.  You don't give a lot of
> details, but I assume that data=writeback made a large difference here?

Well, my measurements were taken by counting seconds on the desktop clock,
so the precision is roughly 1s. Here are the actual results for the three
journaling modes, measured yesterday:

Times in seconds:
ordered mode:     17   5   38   19   33
journal mode:      6   1    5    6    5
writeback mode:    4   1    7    6    8

I do not remember exactly the operation sequence, but the beginning was 
opening Konqueror, opening the bookmarks menu, loading a page, and opening 
another tab.

As you can see anyway, ordered mode is completely broken. Opening the menu
takes about 1s in journal and writeback modes, which is perfectly acceptable
for a user (after all, the computer is loaded, and interactivity does not
simply mean running interactive tasks at real-time priority, so the user can
understand that it reacts more slowly). Launching Konqueror in 4 to 6 seconds
is a good time, given that the cache is poisoned by the writing process.
Opening another tab or a web page is tolerable within 5-6s, even if this is
still a little too long, especially for opening another tab of an already
running process (2-3s would be a perfect value I think, but getting there is
another story). The remaining 5-6s of latency is certainly due to the block
size used by the writing process, and tuning the kernel to push it lower
would be dangerous for overall performance. I could test this idea by asking
dd to write more but smaller blocks.

As far as writeback mode is concerned, the latency reduction was not
clear-cut, as you can see, at least with my measurement procedure. The
performance of the **writing process** was nearly the same as in ordered
mode. I haven't written down the timing results for the writing process,
since I was still investigating at that time. But I used:

time dowrite.sh

to time it, and I got the real time as well as dd's output (records written,
MB/s; journal mode's rate was half that of ordered mode). So writeback and
ordered seemed the same at this level, and journal mode was effectively twice
as slow. It was clear there were twice as many write operations, which was
the expected behaviour.

I could redo the tests with a more formal approach (that is: fixing the
order of operations done in Konqueror, and noting the times for the writing
process); it's easy. In fact, I'm considering writing another script that
simulates Konqueror / user activity and times it automatically, just to get
a reproducible benchmark suite; see the sketch below.
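(A sketch of what such a probe could look like; this is hypothetical, not a
script I have run. It times a small synchronous write once per second, the
kind of operation an interactive program performs, while dowrite.sh runs in
another terminal:)

#!/bin/bash
# probe.sh - measure small-I/O latency under write load (sketch)
while true; do
    start=$(date +%s.%N)
    echo x > probe.tmp        # tiny file write, like a browser cookie
    sync                      # waits for the journal commit: this is
                              # exactly where log_wait_commit stalls
    end=$(date +%s.%N)
    echo "small sync write took $(echo "$end - $start" | bc) s"
    sleep 1
done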

Please note that not all programs are affected (for instance kmail was quite
quick), so the kernel does a good job of running programs that do few I/Os
while the fs is stressed by another process. The difference with konqueror is
that it opens several files (so block_dump says), so my guess is that
processes doing small I/O operations are not given fair enough access to the
journal in ordered mode.
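(block_dump here is the standard /proc/sys/vm/block_dump switch: when set,
the kernel logs every block read/write together with the issuing process
name, so one can see which files a process touches. A sketch:)

echo 1 > /proc/sys/vm/block_dump
dmesg | tail            # lines like: konqueror(1234): READ block ... on hda3
echo 0 > /proc/sys/vm/block_dump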

Indeed, I imagine (without knowing any internals, of course) that the data
writing itself is scheduled correctly by the disk scheduler, and that the
writing process is throttled, but that the priority information is lost when
metadata are written in ordered mode, because all I/Os get mixed together for
the metadata. Therefore, the writing process manages to fill the journal
quite easily. With journal and writeback modes, all data may be written
sequentially, in the context of the issuing process, so priorities may be
tracked across the whole writing process. Then the writing process is
preempted correctly to the benefit of Konqueror. This is of course just a
black-box hypothesis...

CC