Re: [reiserfs-list] fsync() Performance Issue
On Mon, 2002-05-06 at 21:17, Manuel Krause wrote:
> On 05/07/2002 12:57 AM, Chris Mason wrote:
>
> Hi, Chris & Hans!
>
> I don't think this kind of destructive discussion will lead to anything
> useful for now. Can you post a diff for
> 2.4.19-pre7 + latest-related-pending + compound-patch-from-ftp?
>
> I'll try it and report whether it leads to more safety and/or less
> performance in my everyday use with NS6 and so on, if there is any.

The current data logging patches are at:

ftp.suse.com/pub/people/mason/patches/data-logging

They are against 2.4.19-pre7, and contain versions of the major (stable)
speedups.

The patch is pretty big, so I'm not likely to merge it with the namesys
pending directories. The namesys guys add things frequently, and I think
it would get confusing for people trying to figure out which patches to
apply.

The data logging stuff is beta code. If you have a good test bed where
it's ok if things go wrong, I can make you a special patch with the
pending stuff merged.

-chris
Re: [reiserfs-list] fsync() Performance Issue
On 05/07/2002 12:57 AM, Chris Mason wrote:
> On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
>
>>> I'd rather not put it back in because it adds yet another corner case to
>>> maintain for all time. Most of the fsync/O_SYNC bound applications are
>>> just given their own partition anyway, so most users that need data
>>> logging need it for every write.
>>
>> Does mozilla's mail user agent use fsync? Should I give it its own
>> partition? I bet it is fsync bound ;-)
>
> [ I took Wayne off the cc list, he's probably not horribly interested ]
>
> Perhaps, but I'll also bet the fsync performance hit doesn't affect the
> performance of the system as a whole. Remember that data=journal
> doesn't make the fsyncs fast, it just makes them faster.
>
>> Most persons using small fsyncs are using it because the person who
>> wrote their application wrote it wrong. What's more, many of the
>> persons who wrote those applications cannot understand that they did it
>> wrong even if you tell them (e.g. the qmail author reportedly cannot
>> understand; the sendmail guys now understand, but had Kirk McKusick on
>> their staff and attending the meeting when I explained it to them, so
>> they are not very typical).
>>
>> In other words, handling stupidity is an important life skill, and we
>> all need to excel at it. ;-)
>
> A real strength of linux is that application designers can talk directly
> to their own personal bottlenecks. Hopefully we reward those that hunt
> us down and spend the time convincing us their applications are worth
> tuning for. They then proceed to beat the pants off their competition.
>
>> Tell me what your thoughts are on the following:
>>
>> If you ask randomly selected ReiserFS users (not the reiserfs-list, but
>> the ones who would never send you an email) the following questions,
>> what percentage will answer which choice?
>>
>> The filesystem you are using is named:
>>
>> a) the Performance Optimized SuSE FS
>> b) NTFS
>> c) FAT
>> d) ext2
>> e) ReiserFS
>
> I believe the ones that know what a filesystem is will answer ReiserFS.
> You might get a lot of ext2 answers, just because that's what a lot of
> people think the linux filesystem is.
>
>> If you want to change reiserfs to use data journaling you must do which:
>>
>> a) reinstall the reiserfs package using rpm
>> b) modify /etc/fs.conf
>> c) reinstall the operating system from scratch, and select different
>>    options during the install this time
>> d) reformat your reiserfs partition using mkreiserfs
>> e) none of the above
>> f) all of the above except e)
>
> These people won't be admins of systems big enough for the difference to
> matter. Data journaling is targeted at people with so much load they
> would have to buy more hardware to make up for it. The new option
> lowers the price to performance ratio, which is exactly what we want to
> do for the sendmails, egeneras, lycos, etc. If it takes my laptop 20ms
> to deliver a mail message, cutting the time down to 10ms just won't
> matter.
>
>> What do you think the chances are that you can convince Hubert that
>> every SuSE Enterprise Edition user should be asked at install time if
>> they are going to use fsync a lot on each partition, and to use a
>> different fstab setting if yes?
>
> Very little. I might tell them to buy the suse email server instead,
> since that would have the settings done right. data=journal is just a
> small part of mail server tuning.
>
>> I know that you are an experienced sysadmin who was good at it. Your
>> intuition tells you that most sysadmins are like the ones you were
>> willing to hire into your group at the university. They aren't.
>>
>> Linux needs to be like a telephone. You plug it in, push buttons, and
>> talk. It works well, but most folks don't know why.
>
> Exactly. I think there are 3 classes of users at play here.
>
> 1) Those who don't understand and don't have enough load to notice.
> 2) Those who don't understand and do have enough load to notice.
> 3) Those who do understand and do have enough load to notice.
>
> #2 will buy support from someone, and they should be able to configure
> the thing right.
>
> #3 will find the docs and do it right themselves.
>
>> A moderate number of programs are small-fsync bound for the simple
>> reason that it is simpler to write them that way. We need to cover
>> over their simplistic designs.
>>
>> So, you have my sympathies Chris, because I believe you that it makes
>> the code uglier and it won't be a joy to code and test. I hope you also
>> see that it should be done.
>
> Mostly, I feel this kind of tuning is a mistake right now. The patch is
> young and there are so many places left to tweak... I'm still at the
> stage where much larger improvements are possible, and a better use of
> coding time. Plus, it's monday and it's always more fun to debate than
> give in on mondays.
>
> -chris

Hi, Chris & Hans!

Don't think this kind of destructive discussion would lead to anything
useful for now. Can you post a diff for
2.4.19-pre7 + latest-related-pending + compound-patch-from-ftp?

I'll try it and report whether it leads to more safety and/or less
performance in my everyday use with NS6 and so on, if there is any.
Re: [reiserfs-list] fsync() Performance Issue
Chris Mason wrote:
> On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
>
>>> I'd rather not put it back in because it adds yet another corner case to
>>> maintain for all time. Most of the fsync/O_SYNC bound applications are
>>> just given their own partition anyway, so most users that need data
>>> logging need it for every write.
>>
>> Does mozilla's mail user agent use fsync? Should I give it its own
>> partition? I bet it is fsync bound ;-)
>
> [ I took Wayne off the cc list, he's probably not horribly interested ]
>
> Perhaps, but I'll also bet the fsync performance hit doesn't affect the
> performance of the system as a whole.

I suspect that on my laptop, downloading emails is disk bound due to
fsync(). I haven't measured it, but it "feels" that way.

> Mostly, I feel this kind of tuning is a mistake right now. The patch is
> young and there are so many places left to tweak... I'm still at the
> stage where much larger improvements are possible, and a better use of
> coding time. Plus, it's monday and it's always more fun to debate than
> give in on mondays.
>
> -chris

Needing more time to finish analyzing what is going on and what fixes it
best is always a good reason to defer things.

Hans
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
>> I'd rather not put it back in because it adds yet another corner case to
>> maintain for all time. Most of the fsync/O_SYNC bound applications are
>> just given their own partition anyway, so most users that need data
>> logging need it for every write.
>
> Does mozilla's mail user agent use fsync? Should I give it its own
> partition? I bet it is fsync bound ;-)

[ I took Wayne off the cc list, he's probably not horribly interested ]

Perhaps, but I'll also bet the fsync performance hit doesn't affect the
performance of the system as a whole. Remember that data=journal doesn't
make the fsyncs fast, it just makes them faster.

> Most persons using small fsyncs are using it because the person who
> wrote their application wrote it wrong. What's more, many of the
> persons who wrote those applications cannot understand that they did it
> wrong even if you tell them (e.g. the qmail author reportedly cannot
> understand; the sendmail guys now understand, but had Kirk McKusick on
> their staff and attending the meeting when I explained it to them, so
> they are not very typical).
>
> In other words, handling stupidity is an important life skill, and we
> all need to excel at it. ;-)

A real strength of linux is that application designers can talk directly
to their own personal bottlenecks. Hopefully we reward those that hunt
us down and spend the time convincing us their applications are worth
tuning for. They then proceed to beat the pants off their competition.

> Tell me what your thoughts are on the following:
>
> If you ask randomly selected ReiserFS users (not the reiserfs-list, but
> the ones who would never send you an email) the following questions,
> what percentage will answer which choice?
>
> The filesystem you are using is named:
>
> a) the Performance Optimized SuSE FS
> b) NTFS
> c) FAT
> d) ext2
> e) ReiserFS

I believe the ones that know what a filesystem is will answer ReiserFS.
You might get a lot of ext2 answers, just because that's what a lot of
people think the linux filesystem is.

> If you want to change reiserfs to use data journaling you must do which:
>
> a) reinstall the reiserfs package using rpm
> b) modify /etc/fs.conf
> c) reinstall the operating system from scratch, and select different
>    options during the install this time
> d) reformat your reiserfs partition using mkreiserfs
> e) none of the above
> f) all of the above except e)

These people won't be admins of systems big enough for the difference to
matter. Data journaling is targeted at people with so much load they
would have to buy more hardware to make up for it. The new option lowers
the price to performance ratio, which is exactly what we want to do for
the sendmails, egeneras, lycos, etc. If it takes my laptop 20ms to
deliver a mail message, cutting the time down to 10ms just won't matter.

> What do you think the chances are that you can convince Hubert that
> every SuSE Enterprise Edition user should be asked at install time if
> they are going to use fsync a lot on each partition, and to use a
> different fstab setting if yes?

Very little. I might tell them to buy the suse email server instead,
since that would have the settings done right. data=journal is just a
small part of mail server tuning.

> I know that you are an experienced sysadmin who was good at it. Your
> intuition tells you that most sysadmins are like the ones you were
> willing to hire into your group at the university. They aren't.
>
> Linux needs to be like a telephone. You plug it in, push buttons, and
> talk. It works well, but most folks don't know why.

Exactly. I think there are 3 classes of users at play here.

1) Those who don't understand and don't have enough load to notice.
2) Those who don't understand and do have enough load to notice.
3) Those who do understand and do have enough load to notice.

#2 will buy support from someone, and they should be able to configure
the thing right.

#3 will find the docs and do it right themselves.

> A moderate number of programs are small-fsync bound for the simple
> reason that it is simpler to write them that way. We need to cover over
> their simplistic designs.
>
> So, you have my sympathies Chris, because I believe you that it makes
> the code uglier and it won't be a joy to code and test. I hope you also
> see that it should be done.

Mostly, I feel this kind of tuning is a mistake right now. The patch is
young and there are so many places left to tweak... I'm still at the
stage where much larger improvements are possible, and a better use of
coding time. Plus, it's monday and it's always more fun to debate than
give in on mondays.

-chris
Re: [reiserfs-list] fsync() Performance Issue
Chris Mason wrote:
> On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
>
>> So how about if you revise fsync so that it always sends data blocks to
>> the journal, not to the main disk?
>
> This gets a little sticky.
>
> Once you log a block, it might be replayed after a crash. So, you have
> to protect against corner cases like this:
>
> write(file)
> fsync(file) ; /* logs modified data blocks */
> write(file) ; /* write the same blocks without fsync */
> sync ;        /* user expects new version of the blocks on disk */
>
> During replay, the logged data blocks overwrite the blocks sent to disk
> via sync().
>
> This isn't hard to correct for: every time a buffer is marked dirty, you
> check the journal hash tables to see if it is replayable, and if so you
> log it instead (the 2.2.x code did this due to tails). This translates
> to increased CPU usage for every write.
>
> I'd rather not put it back in because it adds yet another corner case to
> maintain for all time. Most of the fsync/O_SYNC bound applications are
> just given their own partition anyway, so most users that need data
> logging need it for every write.

Does mozilla's mail user agent use fsync? Should I give it its own
partition? I bet it is fsync bound ;-)

Also, I don't think you can reasonably expect most persons to know that
they should turn data logging on for high fsync performance, even if you
document it.

Most persons using small fsyncs are using it because the person who
wrote their application wrote it wrong. What's more, many of the persons
who wrote those applications cannot understand that they did it wrong
even if you tell them (e.g. the qmail author reportedly cannot
understand; the sendmail guys now understand, but had Kirk McKusick on
their staff and attending the meeting when I explained it to them, so
they are not very typical).

In other words, handling stupidity is an important life skill, and we
all need to excel at it. ;-)

Tell me what your thoughts are on the following:

If you ask randomly selected ReiserFS users (not the reiserfs-list, but
the ones who would never send you an email) the following questions,
what percentage will answer which choice?

The filesystem you are using is named:

a) the Performance Optimized SuSE FS
b) NTFS
c) FAT
d) ext2
e) ReiserFS

If you want to change reiserfs to use data journaling you must do which:

a) reinstall the reiserfs package using rpm
b) modify /etc/fs.conf
c) reinstall the operating system from scratch, and select different
   options during the install this time
d) reformat your reiserfs partition using mkreiserfs
e) none of the above
f) all of the above except e)

What do you think the chances are that you can convince Hubert that
every SuSE Enterprise Edition user should be asked at install time if
they are going to use fsync a lot on each partition, and to use a
different fstab setting if yes?

I know that you are an experienced sysadmin who was good at it. Your
intuition tells you that most sysadmins are like the ones you were
willing to hire into your group at the university. They aren't.

Linux needs to be like a telephone. You plug it in, push buttons, and
talk. It works well, but most folks don't know why.

A moderate number of programs are small-fsync bound for the simple
reason that it is simpler to write them that way. We need to cover over
their simplistic designs.

So, you have my sympathies Chris, because I believe you that it makes
the code uglier and it won't be a joy to code and test. I hope you also
see that it should be done.

Hans
Re: [reiserfs-list] fsync() Performance Issue
Chris Mason wrote:
> On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
>
>> So how about if you revise fsync so that it always sends data blocks to
>> the journal, not to the main disk?
>
> This gets a little sticky.
>
> Once you log a block, it might be replayed after a crash. So, you have
> to protect against corner cases like this:
>
> write(file)
> fsync(file) ; /* logs modified data blocks */
> write(file) ; /* write the same blocks without fsync */
> sync ;        /* user expects new version of the blocks on disk */
>
> During replay, the logged data blocks overwrite the blocks sent to disk
> via sync().
>
> This isn't hard to correct for: every time a buffer is marked dirty, you
> check the journal hash tables to see if it is replayable, and if so you
> log it instead (the 2.2.x code did this due to tails). This translates
> to increased CPU usage for every write.

Significant increased CPU usage?

> I'd rather not put it back in because it adds yet another corner case to
> maintain for all time. Most of the fsync/O_SYNC bound applications are
> just given their own partition anyway, so most users that need data
> logging need it for every write.

Most users don't know enough to turn it on ;-)

> -chris
RE: [reiserfs-list] fsync() Performance Issue
I'll add the write caching into the test just for info. Until there is a
way to guarantee the data is safe, I'll have to go with no write caching
though. I should have all this testing done by the end of the week.

-----Original Message-----
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 03, 2002 6:00 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [reiserfs-list] fsync() Performance Issue

On Fri, 2002-05-03 at 16:35, [EMAIL PROTECTED] wrote:
> Chris, I have some quick preliminary results for you. I have additional
> testing to perform and haven't run debugreiserfs() yet. If you have a
> preference for which tests to run debugreiserfs() on, let me know.
>
> Base testing was done against 2.4.13 built on RH 7.1 using the
> test_writes.c code I forwarded to you. The system is a Tyan with a
> single PIII, IDE Promise 20269, Maxtor 160GB drive - write cache
> disabled. All numbers are with fsync() and 1KB files. As I said, more
> testing, i.e. different file sizes, needs to be performed.
>
> 2.4.19-pre7 speedup, data logging, write barrier / no options
>   => 47.1ms/file

Hi Wayne, thanks for sending these along. I expected a slight
improvement over the 2.4.13 code even with the data logging turned off.
I'm curious to see how it does with the IDE cache turned on. With scsi,
I see 10-15% better without any options than an unpatched kernel.

> 2.4.19-pre7 speedup, data logging, write barrier / data=journal
>   => 25.2ms/file
> 2.4.19-pre7 speedup, data logging, write barrier / data=journal,barrier=none
>   => 27.8ms/file

The barrier option doesn't make much difference because the write cache
is off. With write cache on, the barrier code should allow you to be
faster than with the caching off, but without risking the data (Jens and
I are working on final fsync safety issues though).

Hans, data=journal turns on the data journaling.

The data journaling patches also include optimizations to write metadata
back to disk in bigger chunks for tiny transactions (the current method
is to write one transaction's worth back; when a transaction has 3
blocks, this is pretty slow). I've put these patches up on:

ftp.suse.com/pub/people/mason/patches/data-logging

> One question is: will these patches be going into the 2.4 tree, and
> when?

The data logging patches are a huge change, but the good news is they
are based on the nesting patches that have been stable for a long time
in the quota code. I'll probably want a month or more of heavy testing
before I think about submitting them.

-chris
Re: [reiserfs-list] fsync() Performance Issue
On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
> So how about if you revise fsync so that it always sends data blocks to
> the journal, not to the main disk?

This gets a little sticky.

Once you log a block, it might be replayed after a crash. So, you have
to protect against corner cases like this:

write(file)
fsync(file) ; /* logs modified data blocks */
write(file) ; /* write the same blocks without fsync */
sync ;        /* user expects new version of the blocks on disk */

During replay, the logged data blocks overwrite the blocks sent to disk
via sync().

This isn't hard to correct for: every time a buffer is marked dirty, you
check the journal hash tables to see if it is replayable, and if so you
log it instead (the 2.2.x code did this due to tails). This translates
to increased CPU usage for every write.

I'd rather not put it back in because it adds yet another corner case to
maintain for all time. Most of the fsync/O_SYNC bound applications are
just given their own partition anyway, so most users that need data
logging need it for every write.

-chris
Re: [reiserfs-list] fsync() Performance Issue
Hello!

On Thu, May 02, 2002 at 07:07:18AM +0200, Christian Stuke wrote:
> Could we have this for 2.4.18+ pending also please?

This patch should apply to 2.4.18 + pending patches, I believe. As for
including these patches into the pending queue for 2.4.18, that is
impossible now; it is too big a change, unfortunately. We hope to get
something like this into 2.4.19-pre1+.

Bye,
    Oleg
Re: [reiserfs-list] fsync() Performance Issue
Could we have this for 2.4.18+ pending also please?

Chris

----- Original Message -----
From: "Oleg Drokin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 30, 2002 4:20 PM
Subject: Re: [reiserfs-list] fsync() Performance Issue

> Hello!
>
> On Fri, Apr 26, 2002 at 04:28:26PM -0400, [EMAIL PROTECTED] wrote:
>> I'm wondering if anyone out there may have some suggestions on how to
>> improve the performance of a system employing fsync(). I have to be
>> able to guarantee that every write to my fileserver is on disk when
>> the client has passed it to the server. Therefore, I have disabled
>> write cache on the disk and issue an fsync() per file. I'm running
>> 2.4.19-pre7, reiserfs 3.6.25, without additional patches. I have seen
>> some discussions out here about various other "speed-up" patches and
>> am wondering if I need to add these to 2.4.19-pre7? What are they,
>> and where can I obtain said patches? Also, I'm wondering if there is
>> another solution to syncing the data that is faster than fsync().
>> Testing, thus far, has shown a large disparity between running with
>> and without sync. Another idea is to explore another filesystem, but
>> I'm not exactly excited by the other journaling filesystems out there
>> at this time. All ideas will be greatly appreciated.
>
> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason.)
> The filesystem cannot do very much at this point, unfortunately; it
> ends up waiting for the disk to finish write operations.
>
> Also, we are working on other speedup patches that would cover
> different areas of write performance itself.
>
> Bye,
>     Oleg
RE: [reiserfs-list] fsync() Performance Issue
Thanks. I'll start putting this one into test.

Wayne.

-----Original Message-----
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 30, 2002 10:28 AM
To: Oleg Drokin
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [reiserfs-list] fsync() Performance Issue

On Tue, 2002-04-30 at 10:20, Oleg Drokin wrote:
> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason.)
> The filesystem cannot do very much at this point, unfortunately; it
> ends up waiting for the disk to finish write operations.
>
> Also, we are working on other speedup patches that would cover
> different areas of write performance itself.

A newer one (against 2.4.19-pre7) is below. It has not been through as
much testing on the namesys side, which is why Oleg sent the older one.

Wayne and I have been talking in private mail; he's getting a bunch of
beta patches later today (this speedup, data logging, updated barrier
code), along with instructions for testing.

-chris

# Veritas (Hugh Dickins supplied the patch) sent the bits in
# fs/super.c that allow the FS to leave super->s_dirt set after a
# write_super call.
#
diff -urN --exclude *.orig parent/fs/buffer.c comp/fs/buffer.c
--- parent/fs/buffer.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/buffer.c	Mon Apr 29 10:20:22 2002
@@ -325,6 +325,8 @@
 	lock_super(sb);
 	if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
 		sb->s_op->write_super(sb);
+	if (sb->s_op && sb->s_op->commit_super)
+		sb->s_op->commit_super(sb);
 	unlock_super(sb);
 	unlock_kernel();
@@ -344,7 +346,7 @@
 	lock_kernel();
 	sync_inodes(dev);
 	DQUOT_SYNC(dev);
-	sync_supers(dev);
+	commit_supers(dev);
 	unlock_kernel();
 	return sync_buffers(dev, 1);
diff -urN --exclude *.orig parent/fs/reiserfs/bitmap.c comp/fs/reiserfs/bitmap.c
--- parent/fs/reiserfs/bitmap.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/bitmap.c	Mon Apr 29 10:20:19 2002
@@ -122,7 +122,6 @@
     set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
     journal_mark_dirty (th, s, sbh);
-    s->s_dirt = 1;
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th,
@@ -433,7 +432,6 @@
     /* update free block count in super block */
     PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
     journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-    s->s_dirt = 1;
     return CARRY_ON;
 }
diff -urN --exclude *.orig parent/fs/reiserfs/ibalance.c comp/fs/reiserfs/ibalance.c
--- parent/fs/reiserfs/ibalance.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/ibalance.c	Mon Apr 29 10:20:19 2002
@@ -632,7 +632,6 @@
     /* use check_internal if new root is an internal node */
     check_internal (new_root); /*&&&&&&&&&&&&&&&&&&&&&&*/
-    tb->tb_sb->s_dirt = 1;
 
     /* do what is needed for buffer thrown from tree */
     reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 	PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 	PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
 	do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-	tb->tb_sb->s_dirt = 1;
     }
 
     if ( tb->blknum[h] == 2 ) {
diff -urN --exclude *.orig parent/fs/reiserfs/journal.c comp/fs/reiserfs/journal.c
--- parent/fs/reiserfs/journal.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/journal.c	Mon Apr 29 10:20:21 2002
@@ -64,12 +64,15 @@
  */
 static int reiserfs_mounted_fs_count = 0 ;
 
+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;
 
 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;
 
 #define JOURNAL_TRANS_HALF 1018 /* must be correct to keep the desc and commit structs at 4k */
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
   PROC_INFO_INC( p_s_sb, journal.lock_journal );
-  while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-    PROC_INFO_INC( p_s_sb, journal.lock_journal_wait );
-    sleep_on(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
-  }
-  atomic_set(&(SB_JOURNAL(p_s_sb)->j_wlock), 1) ;
+  down(&SB_JOURNAL(p_s_sb)->j_lock);
 }
 
 /* unlock the current transaction */
 inline static void unlock_journal(struct super_block *p_s_sb) {
-  atomic_dec(&(SB_JOURNAL(p_s_sb)
Re: [reiserfs-list] fsync() Performance Issue
On Tue, 2002-04-30 at 10:20, Oleg Drokin wrote:
> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason.)
> The filesystem cannot do very much at this point, unfortunately; it
> ends up waiting for the disk to finish write operations.
>
> Also, we are working on other speedup patches that would cover
> different areas of write performance itself.

A newer one (against 2.4.19-pre7) is below. It has not been through as
much testing on the namesys side, which is why Oleg sent the older one.

Wayne and I have been talking in private mail; he's getting a bunch of
beta patches later today (this speedup, data logging, updated barrier
code), along with instructions for testing.

-chris

# Veritas (Hugh Dickins supplied the patch) sent the bits in
# fs/super.c that allow the FS to leave super->s_dirt set after a
# write_super call.
#
diff -urN --exclude *.orig parent/fs/buffer.c comp/fs/buffer.c
--- parent/fs/buffer.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/buffer.c	Mon Apr 29 10:20:22 2002
@@ -325,6 +325,8 @@
 	lock_super(sb);
 	if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
 		sb->s_op->write_super(sb);
+	if (sb->s_op && sb->s_op->commit_super)
+		sb->s_op->commit_super(sb);
 	unlock_super(sb);
 	unlock_kernel();
@@ -344,7 +346,7 @@
 	lock_kernel();
 	sync_inodes(dev);
 	DQUOT_SYNC(dev);
-	sync_supers(dev);
+	commit_supers(dev);
 	unlock_kernel();
 	return sync_buffers(dev, 1);
diff -urN --exclude *.orig parent/fs/reiserfs/bitmap.c comp/fs/reiserfs/bitmap.c
--- parent/fs/reiserfs/bitmap.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/bitmap.c	Mon Apr 29 10:20:19 2002
@@ -122,7 +122,6 @@
     set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
     journal_mark_dirty (th, s, sbh);
-    s->s_dirt = 1;
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th,
@@ -433,7 +432,6 @@
     /* update free block count in super block */
     PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
     journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-    s->s_dirt = 1;
     return CARRY_ON;
 }
diff -urN --exclude *.orig parent/fs/reiserfs/ibalance.c comp/fs/reiserfs/ibalance.c
--- parent/fs/reiserfs/ibalance.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/ibalance.c	Mon Apr 29 10:20:19 2002
@@ -632,7 +632,6 @@
     /* use check_internal if new root is an internal node */
     check_internal (new_root); /*&&&&&&&&&&&&&&&&&&&&&&*/
-    tb->tb_sb->s_dirt = 1;
 
     /* do what is needed for buffer thrown from tree */
     reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 	PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 	PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
 	do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-	tb->tb_sb->s_dirt = 1;
     }
 
     if ( tb->blknum[h] == 2 ) {
diff -urN --exclude *.orig parent/fs/reiserfs/journal.c comp/fs/reiserfs/journal.c
--- parent/fs/reiserfs/journal.c	Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/journal.c	Mon Apr 29 10:20:21 2002
@@ -64,12 +64,15 @@
  */
 static int reiserfs_mounted_fs_count = 0 ;
 
+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;
 
 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;
 
 #define JOURNAL_TRANS_HALF 1018 /* must be correct to keep the desc and commit structs at 4k */
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
   PROC_INFO_INC( p_s_sb, journal.lock_journal );
-  while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-    PROC_INFO_INC( p_s_sb, journal.lock_journal_wait );
-    sleep_on(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
-  }
-  atomic_set(&(SB_JOURNAL(p_s_sb)->j_wlock), 1) ;
+  down(&SB_JOURNAL(p_s_sb)->j_lock);
 }
 
 /* unlock the current transaction */
 inline static void unlock_journal(struct super_block *p_s_sb) {
-  atomic_dec(&(SB_JOURNAL(p_s_sb)->j_wlock)) ;
-  wake_up(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
+  up(&SB_JOURNAL(p_s_sb)->j_lock);
 }
 
 /*
@@ -756,7 +754,6 @@
   atomic_set(&(jl->j_commit_flushing), 0) ;
   wake_up(&(jl->j_commit_wait)) ;
-  s->s_dirt = 1 ;
   return 0 ;
 }
@@ -1220,7 +1217,6 @@
   if (run++ == 0) {
     goto loop_start ;
   }
-  atomic_set(&(jl->j_flushing), 0) ;
   wake_up(&(jl->j_flush_wait)) ;
   return ret ;
@@ -1250,7 +1246,7 @@
   while(i != start) {
     jl = SB_JOURNAL_LIST(s) + i ;
     age = CURRENT_TIME - jl->j_time
Re: [reiserfs-list] fsync() Performance Issue
Hello!

On Fri, Apr 26, 2002 at 04:28:26PM -0400, [EMAIL PROTECTED] wrote:
> I'm wondering if anyone out there may have some suggestions on how to
> improve the performance of a system employing fsync(). I have to be
> able to guarantee that every write to my fileserver is on disk when the
> client has passed it to the server. Therefore, I have disabled write
> cache on the disk and issue an fsync() per file. I'm running
> 2.4.19-pre7, reiserfs 3.6.25, without additional patches. I have seen
> some discussions out here about various other "speed-up" patches and am
> wondering if I need to add these to 2.4.19-pre7? What are they, and
> where can I obtain said patches? Also, I'm wondering if there is
> another solution to syncing the data that is faster than fsync().
> Testing, thus far, has shown a large disparity between running with and
> without sync. Another idea is to explore another filesystem, but I'm
> not exactly excited by the other journaling filesystems out there at
> this time. All ideas will be greatly appreciated.

Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
operations a little. (From Chris Mason.)

The filesystem cannot do very much at this point, unfortunately; it ends
up waiting for the disk to finish write operations.

Also, we are working on other speedup patches that would cover different
areas of write performance itself.
Bye,
    Oleg

diff -uNr linux-2.4.19-pre6.o/fs/buffer.c linux-2.4.19-pre6.speedup/fs/buffer.c
--- linux-2.4.19-pre6.o/fs/buffer.c	Mon Apr 8 14:53:24 2002
+++ linux-2.4.19-pre6.speedup/fs/buffer.c	Wed Apr 10 10:43:46 2002
@@ -325,6 +325,8 @@
 	lock_super(sb);
 	if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
 		sb->s_op->write_super(sb);
+	if (sb->s_op && sb->s_op->commit_super)
+		sb->s_op->commit_super(sb);
 	unlock_super(sb);
 	unlock_kernel();
@@ -344,7 +346,7 @@
 	lock_kernel();
 	sync_inodes(dev);
 	DQUOT_SYNC(dev);
-	sync_supers(dev);
+	commit_supers(dev);
 	unlock_kernel();
 	return sync_buffers(dev, 1);
Binary files linux-2.4.19-pre6.o/fs/reiserfs/.journal.c.rej.swp and linux-2.4.19-pre6.speedup/fs/reiserfs/.journal.c.rej.swp differ
diff -uNr linux-2.4.19-pre6.o/fs/reiserfs/bitmap.c linux-2.4.19-pre6.speedup/fs/reiserfs/bitmap.c
--- linux-2.4.19-pre6.o/fs/reiserfs/bitmap.c	Mon Apr 8 14:53:24 2002
+++ linux-2.4.19-pre6.speedup/fs/reiserfs/bitmap.c	Wed Apr 10 10:43:46 2002
@@ -122,7 +122,6 @@
   set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
 
   journal_mark_dirty (th, s, sbh);
-  s->s_dirt = 1;
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th,
@@ -433,7 +432,6 @@
   /* update free block count in super block */
   PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
   journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-  s->s_dirt = 1;
 
   return CARRY_ON;
 }
diff -uNr linux-2.4.19-pre6.o/fs/reiserfs/ibalance.c linux-2.4.19-pre6.speedup/fs/reiserfs/ibalance.c
--- linux-2.4.19-pre6.o/fs/reiserfs/ibalance.c	Sat Nov 10 01:18:25 2001
+++ linux-2.4.19-pre6.speedup/fs/reiserfs/ibalance.c	Wed Apr 10 10:43:46 2002
@@ -632,7 +632,6 @@
     /* use check_internal if new root is an internal node */
     check_internal (new_root);/*&&*/
-    tb->tb_sb->s_dirt = 1;
 
     /* do what is needed for buffer thrown from tree */
     reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 	PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 	PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
 	do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-	tb->tb_sb->s_dirt = 1;
     }
 
     if ( tb->blknum[h] == 2 ) {
diff -uNr linux-2.4.19-pre6.o/fs/reiserfs/journal.c linux-2.4.19-pre6.speedup/fs/reiserfs/journal.c
--- linux-2.4.19-pre6.o/fs/reiserfs/journal.c	Mon Apr 8 14:53:24 2002
+++ linux-2.4.19-pre6.speedup/fs/reiserfs/journal.c	Wed Apr 10 10:44:32 2002
@@ -64,12 +64,15 @@
 */
 static int reiserfs_mounted_fs_count = 0 ;
 
+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;
 
 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;
 
 #define JOURNAL_TRANS_HALF 1018 /* must be correct to keep the desc and commit structs at 4k */
 
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
   PROC_INFO_INC( p_s_sb, journal.lock_journal );
-  while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-    PROC_INFO_INC( p_s_
Re: [reiserfs-list] fsync() Performance Issue
[EMAIL PROTECTED] wrote: >On Mon, 29 Apr 2002 19:56:59 +0200, Matthias Andree <[EMAIL PROTECTED]> > said: > > > >>Barring write cache effects, fsync() only returns after all blocks are >>on disk. While I'm not sure if and if yes, which, Linux file systems are >>affected, but for portable applications, be aware that sync() may return >>prematurely (and is allowed to!). >> >> > >And in fact is the reason for the old "recipe": > # sync > # sync > # sync > # reboot > >On the older Vax 750-class machines, sync could return LONG before the blocks >were all flushed - the second 2 sync's were so you were busy typing for >several seconds while the disks whirred. Failure to understand the typing >speed issue has led at least one otherwise-clued author to recommend: > # sync;sync;sync > # reboot > >(the distinction being obvious if you think about when the shell reads the >commands, and when it does the fork/exec for each case) > > > Finally I understand this. Doing more than one sync always seemed mysterious to me. ;-) Thanks Matthias. Hans
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 29 Apr 2002 19:56:59 +0200, Matthias Andree <[EMAIL PROTECTED]> said:

> Barring write cache effects, fsync() only returns after all blocks are
> on disk. While I'm not sure if and if yes, which, Linux file systems are
> affected, but for portable applications, be aware that sync() may return
> prematurely (and is allowed to!).

And in fact is the reason for the old "recipe":

    # sync
    # sync
    # sync
    # reboot

On the older Vax 750-class machines, sync could return LONG before the blocks
were all flushed - the second 2 sync's were so you were busy typing for
several seconds while the disks whirred. Failure to understand the typing
speed issue has led at least one otherwise-clued author to recommend:

    # sync;sync;sync
    # reboot

(the distinction being obvious if you think about when the shell reads the
commands, and when it does the fork/exec for each case)

--
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech
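The parsing distinction Valdis describes can be sketched directly. This is purely illustrative: the timing effect only mattered on hardware where sync could return before the flush finished; on a modern system both forms behave identically.

```shell
# One line: the shell reads all three commands before running any of
# them, so they execute back-to-back with no typing delay in between --
# defeating the purpose of the original recipe.
sync; sync; sync

# Typed as separate commands at the prompt, each "sync" is only read and
# executed after the previous one returns, so the seconds spent typing
# the next command gave the disks time to finish flushing.
sync
sync
sync
echo "flushed"
```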
RE: [reiserfs-list] fsync() Performance Issue
Agreed, it would be better to sync to disk after multiple files rather than serially; however, to avoid any concern about a power outage during the process (one of the reasons the disk cache is disabled in the first place), the choice was to fsync() each write. -Original Message- From: Chris Mason [mailto:[EMAIL PROTECTED]] Sent: Monday, April 29, 2002 12:46 PM To: [EMAIL PROTECTED] Cc: Russell Coker; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: [reiserfs-list] fsync() Performance Issue On Mon, 2002-04-29 at 12:32, Toby Dickenson wrote: > >One thing that has occurred to me (which has not been previously discussed as > >far as I recall) is the possibility for using sync() instead of fsync() if > >you can accumulate a number of files (and therefore replace many fsync()'s > >with one sync() ). > > I can see > > write to file A > write to file B > write to file C > sync > > might be faster than > > write to file A > fsync A > write to file B > fsync B > write to file C > fsync C Correct. > > but is it possible for it to be faster than > > write to file A > write to file B > write to file C > fsync A > fsync B > fsync C It depends on the rest of the system. sync() goes through the big lru list for the whole box, and fsync() goes through the private list for just that inode. If you've got other devices or files with dirty data, case C that you presented will always be the fastest. For general use, I like this one the best, it is what the journal code is optimized for. If files A, B, and C are the only dirty things on the whole box, a single sync() will be slightly better, mostly due to reduced cpu time. -chris
Re: [reiserfs-list] fsync() Performance Issue
Toby Dickenson <[EMAIL PROTECTED]> writes: > write to file A > write to file B > write to file C > sync Be careful with this approach. Apart from syncing other processes' dirty data, sync() does not make the same guarantees as fsync() does. Barring write cache effects, fsync() only returns after all blocks are on disk. I'm not sure whether any Linux file systems are affected (and, if so, which), but for portable applications, be aware that sync() may return prematurely (and is allowed to!). -- Matthias Andree
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 2002-04-29 at 12:32, Toby Dickenson wrote: > >One thing that has occurred to me (which has not been previously discussed as > >far as I recall) is the possibility for using sync() instead of fsync() if > >you can accumulate a number of files (and therefore replace many fsync()'s > >with one sync() ). > > I can see > > write to file A > write to file B > write to file C > sync > > might be faster than > > write to file A > fsync A > write to file B > fsync B > write to file C > fsync C Correct. > > but is it possible for it to be faster than > > write to file A > write to file B > write to file C > fsync A > fsync B > fsync C It depends on the rest of the system. sync() goes through the big lru list for the whole box, and fsync() goes through the private list for just that inode. If you've got other devices or files with dirty data, case C that you presented will always be the fastest. For general use, I like this one the best, it is what the journal code is optimized for. If files A, B, and C are the only dirty things on the whole box, a single sync() will be slightly better, mostly due to reduced cpu time. -chris
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 2002-04-29 at 12:20, Russell Coker wrote: > On Fri, 26 Apr 2002 22:28, [EMAIL PROTECTED] wrote: > > It's interesting to note your email address and what it implies... > > > I'm wondering if anyone out there may have some suggestions on how > > to improve the performance of a system employing fsync(). I have to be able > > to guaranty that every write to my fileserver is on disk when the client > > has passed it to the server. Therefore, I have disabled write cache on the > > disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs > > 3.6.25, without additional patches. I have seen some discussions out here > > about various other "speed-up" patches and am wondering if I need to add > > these to 2.4.19-pre7? And what they are and where can I obtain said > > patches? Also, I'm wondering if there is another solution to syncing the > > data that is faster than fsync(). Testing, thusfar, has shown a large > > disparity between running with and without sync.Another idea is to explore > > another filesystem, but I'm not exactly excited by the other journaling > > filesystems out there at this time. All ideas will be greatly appreciated. > > These issues have been discussed a few times, but not with any results as > exciting as you might hope for. One which was mentioned was using > fdatasync() instead of fsync(). The speedup patches should help fsync some, since they make it much more likely a commit will be done without the journal lock held. If all the writes on the FS end up being done through fsync, the data logging patches might help a lot. These should be ready for broader testing this week. If you are using IDE drives, the write barrier patches are almost enough to allow you to turn on write caching safely. They make sure metadata triggers proper drive cache flushes, I can try to rig up something that will also trigger a cache flush on data syncs. -chris
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 29 Apr 2002 18:20:18 +0200, Russell Coker <[EMAIL PROTECTED]> wrote: >On Fri, 26 Apr 2002 22:28, [EMAIL PROTECTED] wrote: > >It's interesting to note your email address and what it implies... > >> I'm wondering if anyone out there may have some suggestions on how >> to improve the performance of a system employing fsync(). I have to be able >> to guaranty that every write to my fileserver is on disk when the client >> has passed it to the server. Therefore, I have disabled write cache on the >> disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs >> 3.6.25, without additional patches. I have seen some discussions out here >> about various other "speed-up" patches and am wondering if I need to add >> these to 2.4.19-pre7? And what they are and where can I obtain said >> patches? Also, I'm wondering if there is another solution to syncing the >> data that is faster than fsync(). Testing, thusfar, has shown a large >> disparity between running with and without sync.Another idea is to explore >> another filesystem, but I'm not exactly excited by the other journaling >> filesystems out there at this time. All ideas will be greatly appreciated. > >These issues have been discussed a few times, but not with any results as >exciting as you might hope for. One which was mentioned was using >fdatasync() instead of fsync(). > >One thing that has occurred to me (which has not been previously discussed as >far as I recall) is the possibility for using sync() instead of fsync() if >you can accumulate a number of files (and therefore replace many fsync()'s >with one sync() ). I can see write to file A write to file B write to file C sync might be faster than write to file A fsync A write to file B fsync B write to file C fsync C but is it possible for it to be faster than write to file A write to file B write to file C fsync A fsync B fsync C ? Toby Dickenson [EMAIL PROTECTED]
Re: [reiserfs-list] fsync() Performance Issue
On Fri, 26 Apr 2002 22:28, [EMAIL PROTECTED] wrote: It's interesting to note your email address and what it implies... > I'm wondering if anyone out there may have some suggestions on how > to improve the performance of a system employing fsync(). I have to be able > to guaranty that every write to my fileserver is on disk when the client > has passed it to the server. Therefore, I have disabled write cache on the > disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs > 3.6.25, without additional patches. I have seen some discussions out here > about various other "speed-up" patches and am wondering if I need to add > these to 2.4.19-pre7? And what they are and where can I obtain said > patches? Also, I'm wondering if there is another solution to syncing the > data that is faster than fsync(). Testing, thusfar, has shown a large > disparity between running with and without sync.Another idea is to explore > another filesystem, but I'm not exactly excited by the other journaling > filesystems out there at this time. All ideas will be greatly appreciated. These issues have been discussed a few times, but not with any results as exciting as you might hope for. One which was mentioned was using fdatasync() instead of fsync(). One thing that has occurred to me (which has not been previously discussed as far as I recall) is the possibility for using sync() instead of fsync() if you can accumulate a number of files (and therefore replace many fsync()'s with one sync() ). -- If you send email to me or to a mailing list that I use which has >4 lines of legalistic junk at the end then you are specifically authorizing me to do whatever I wish with the message and all other messages from your domain, by posting the message you agree that your long legalistic sig is void.
[reiserfs-list] fsync() Performance Issue
I'm wondering if anyone out there may have some suggestions on how to improve the performance of a system employing fsync(). I have to be able to guarantee that every write to my fileserver is on disk when the client has passed it to the server. Therefore, I have disabled write cache on the disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs 3.6.25, without additional patches. I have seen some discussions out here about various other "speed-up" patches and am wondering if I need to add these to 2.4.19-pre7? What are they, and where can I obtain said patches? Also, I'm wondering if there is another solution to syncing the data that is faster than fsync(). Testing, thus far, has shown a large disparity between running with and without sync. Another idea is to explore another filesystem, but I'm not exactly excited by the other journaling filesystems out there at this time. All ideas will be greatly appreciated. Wayne EMC Corp Centera Engineering 4400 Computer Drive M/S F213 Westboro, MA 01580 email: [EMAIL PROTECTED] voice: (508) 898-6564 pager: (888) 769-4578 (numeric) [EMAIL PROTECTED] (alpha) fax: (508) 898-6388 "One man can make a difference, and every man should try." - JFK