Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot
Dirk Mueller wrote: Hi, I've seen HEAVY file corruption on unwanted reboots (like pressing the reset button accidentally) on reiserfs with this kernel on 3 machines now. The symptom is that it finds a LOT of files to unlink on journal replay, which I find suspicious as those machines are lightly loaded. I didn't follow the development too closely the last few weeks, but I believe that something has gotten worse in this respect lately. Note that reiserfsck doesn't find any error in the file system structure before and after the journal replay on reboot; still, many files (especially those that were not touched for several hours before the reboot) contain complete garbage after the journal replay. Dirk Were these files being written to near the time of the reboot? hans
Re: [reiserfs-list] Performance question
glob is implemented by the shell, not the filesystem. This is not for any good reason, it just is. We could write something for you to do it in the filesystem and it would be faster. Is your need for speed critical enough to justify writing something special for it? Hans Oleg Drokin wrote: Hello! On Sun, May 05, 2002 at 04:20:13PM +0200, Philipp Gühring wrote: Let's say I have a directory with 100.000 files in it. The filenames look like name1_name2_name3_id So I have 001_41052_50125_1 001_63216_1212_1 I have to create a search engine that serves, for example, the 4th block of 10 files that match the query 001_*_1212_1. The whole query would result in 100 files that are spread across the directory. Now my question: Is it faster with ReiserFS to do a bsd_glob(001_*_1212_1) first, which should result in about 100 entries, and then take the entries 40 to 49 from the resulting array? (Is ReiserFS able to directly return 100 files out of 100.000 with the globbing function, or is it an iteration over all files in the directory?) *glob functions are implemented by various library functions that do full readdir scans at least once, I believe. Or should I do 2 opendir-readdir loops, one to read over the first 39 results that I do not need, and the second one to get the results 40 to 49? In fact I do not see why you need to do 2 opendir-readdir loops. One loop should be enough. You just compare each filename returned against your query and if it matches, remember it in a separate list. So at the end of the readdir loop you have a list of all names in a directory that match your query. And you can apply any additional check in place, just so as not to remember unnecessary files. The problem here is that I have to readdir about 50.000 files (40.000 to get through the unneeded results, and 10.000 more to get the 10 results I need). But on the other hand, I do not have to remember 100 files, from which I only need 10. I am completely missing the idea of where these numbers are from. Can you explain in more detail?
If ReiserFS has to iterate over 100.000 files (the whole directory) to do a 001_*_1212_1 glob, because the binary tree only speeds up known files, but not patterns, then opendir-readdir should be faster, I guess. The binary tree only helps when you know the filename, I believe. You calculate a hash, and from that hash you can quickly find the desired location. If you come up with a hash that places all filenames like yours near one another, then this will help. Another option would be to use subdirectories like name1/name2/name3/id So the glob would be 001/*/1212/1, which should be faster anyway. But on the other hand, I would have to do a lot more directory management, creating and deleting directories ... And implementing an opendir-readdir search through 001/*/1212/1 will be more work too. Readdir would require fewer iterations through 001/*, because the number of entries will be only 100 as you described above. You get all these 100 entries and then loop 100 times trying to open 001/${next_name}/1212/1 and deciding whether you need this file or not. (If it exists, of course, or you might get -ENOENT and proceed to the next directory.) Also, deleting directories would be overkill. I think this might be faster in many circumstances. Also, what you've described looks very much like what squid does. And the squid people went to the reiserfs-raw interface and are quite happy with it. Bye, Oleg
Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot
On Sat, 04 May 2002, Chris Mason wrote: Hmmm, not good at all. Are these 3 systems IDE or SCSI? Do they run additional patches on top of pre7? Which kernels before pre7 have you tried that didn't show this problem? All IDE. The kernel that didn't show this problem was 2.4.16 (plain). No additional patches on 2.4.19-pre7. Dirk
Re: [reiserfs-list] fsync() Performance Issue
On Sat, 2002-05-04 at 10:59, Hans Reiser wrote: So how about if you revise fsync so that it always sends data blocks to the journal, not to the main disk? This gets a little sticky. Once you log a block, it might be replayed after a crash. So, you have to protect against corner cases like this:

write(file)
fsync(file)   /* logs modified data blocks */
write(file)   /* write the same blocks without fsync */
sync          /* user expects new version of the blocks on disk */
crash

During replay, the logged data blocks overwrite the blocks sent to disk via sync(). This isn't hard to correct for: every time a buffer is marked dirty, you check the journal hash tables to see if it is replayable, and if so you log it instead (the 2.2.x code did this due to tails). This translates to increased CPU usage for every write. I'd rather not put it back in because it adds yet another corner case to maintain for all time. Most of the fsync/O_SYNC bound applications are just given their own partition anyway, so most users that need data logging need it for every write. -chris
Re: [reiserfs-list] Performance question
Hello! Thank you Oleg for your answers. *glob functions are implemented by various library functions that do full readdir scans at least once, I believe. I thought I heard about a syscall that makes it possible to pass the glob to the filesystem, so that the filesystem can optimize globbings as it likes and pass the result back to the application, but OK. Or should I do 2 opendir-readdir loops, one to read over the first 39 results that I do not need, and the second one to get the results 40 to 49? In fact I do not see why you need to do 2 opendir-readdir loops. One loop should be enough. Yeah. Sure. My mistake. One opendir, and 2 readdir loops. The first one skips over unneeded results and the second one serves the data. You just compare each filename returned against your query and if it matches, remember it in a separate list. So at the end of the readdir loop you have a list of all names in a directory that match your query. And you can apply any additional check in place, just so as not to remember unnecessary files. The problem here is that I have to readdir about 50.000 files (40.000 to get through the unneeded results, and 10.000 more to get the 10 results I need). But on the other hand, I do not have to remember 100 files, from which I only need 10. I am completely missing the idea of where these numbers are from. Can you explain in more detail? I will try to. I have a table with 100.000 files. A complete search would return, for example, 100 files, which are spread across the whole directory. About every thousand files, there is one file that matches the query. Since the client does not want to get 100 files at once, at first I return only 10 results for the first page, and the user can navigate page-wise. So I built up the scenario where the user now wants to see results 40-49 from the query 001_*_1212_1, which I assume is normal behaviour for my application. The binary tree only helps when you know the filename, I believe. Ok.
Readdir would require fewer iterations through 001/*, because the number of entries will be only 100 as you described above. You get all these 100 entries and then loop 100 times trying to open 001/${next_name}/1212/1 and deciding whether you need this file or not. (If it exists, of course, or you might get -ENOENT and proceed to the next directory.) Also, deleting directories would be overkill. So the question is, how big that overkill is. Is there perhaps a benchmark that tested it already? I think this might be faster in many circumstances. Also, what you've described looks very much like what squid does. And the squid people went to the reiserfs-raw interface and are quite happy with it. I think the difference to squid is that they only need one result, not a part of a search with more than one result. But I am thinking about using reiserfs-raw too ... (At the moment flexibility still has more priority for me than raw performance.) Many greetings, -- ~ Philipp Gühring [EMAIL PROTECTED] ~ http://www.livingxml.net/ ICQ UIN: 6588261
Re: [reiserfs-list] 33 bad sectors kills 20GB
On Saturday, 04 May 2002, 10:58 am, Dax Kelson wrote: This is a 20GB filesystem; I used dd_rescue to make an image of the drive: Summary for /dev/hda3 - hda3.img: dd_rescue: (info): ipos: 19711944.0k opos: 19711944.0k xferd: 19711944.0k errs: 33 errxfer: 16.5k succxfer: 19711927.5k avg.rate: 11743kB/s I then ran reiserfsck on the image file: look_for_lost: 6 files seem to left not linked to lost+found Objects without names 1377 Empty lost dirs removed 10 Dirs linked to /lost+found: 149 Dirs without stat data found 5 Files linked to /lost+found 1228 Pass 4 - done left 298, 0 /sec Deleted unreachable items 117 Syncing..done Done (That took 3+ hours on a PIII 900, 512MB RAM box.) I mounted the image, and pretty much everything is gone. Is it normal for 33 bad sectors (16.5k) of data (out of 19711944k) to completely kill the fs? Would I have better luck if I used the -A option on dd_rescue? -A Always write blocks, zeroed if err (def=no); What you did without using -A is the following:

fs with errors: xpbbxxx
after dd:       xpxxx--

Now assume that p is a pointer in the fs data structures that pointed to the last block (the last x). Now it points nowhere. You have reorganized your filesystem without updating all the pointers there are in the metadata. Nothing is going to rescue that wreck, short of some by-hand hex editing. You *absolutely* need to use -A. Or at least that's how I understand things. You may also need to do reiserfsck with --rebuild-tree, although try without it first. I hope you have the latest version of reiserfsprogs -- if not, you *have* to get the latest one. Cheers, Kuba
RE: [reiserfs-list] fsync() Performance Issue
I'll add the write caching into the test just for info. Until there is a way to guarantee the data is safe, I'll have to go with no write caching though. I should have all this testing done by the end of the week. -Original Message- From: Chris Mason [mailto:[EMAIL PROTECTED]] Sent: Friday, May 03, 2002 6:00 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: [reiserfs-list] fsync() Performance Issue On Fri, 2002-05-03 at 16:35, [EMAIL PROTECTED] wrote: Chris, I have some quick preliminary results for you. I have additional testing to perform and haven't run debugreiserfs yet. If you have a preference for which tests to run debugreiserfs on, let me know. Base testing was done against 2.4.13 built on RH 7.1 using the test_writes.c code I forwarded to you. The system is a Tyan with a single PIII, IDE Promise 20269, Maxtor 160GB drive - write cache disabled. All numbers are with fsync() and 1KB files. As I said, more testing, i.e. other file sizes, needs to be performed. 2.4.19-pre7 speedup, data logging, write barrier / no options = 47.1ms/file Hi Wayne, thanks for sending these along. I expected a slight improvement over the 2.4.13 code even with the data logging turned off. I'm curious to see how it does with the IDE cache turned on. With SCSI, I see 10-15% better without any options than an unpatched kernel. 2.4.19-pre7 speedup, data logging, write barrier / data=journal = 25.2ms/file 2.4.19-pre7 speedup, data logging, write barrier / data=journal,barrier=none = 27.8ms/file The barrier option doesn't make much difference because the write cache is off. With write cache on, the barrier code should allow you to be faster than with the caching off, but without risking the data (Jens and I are working on final fsync safety issues though). Hans, data=journal turns on the data journaling.
The data journaling patches also include optimizations to write metadata back to disk in bigger chunks for tiny transactions (the current method is to write one transaction's worth back, when a transaction has 3 blocks, this is pretty slow). I've put these patches up on: ftp.suse.com/pub/people/mason/patches/data-logging One question is will these patches be going into the 2.4 tree and when? The data logging patches are a huge change, but the good news is they are based on the nesting patches that have been stable for a long time in the quota code. I'll probably want a month or more of heavy testing before I think about submitting them. -chris
Re: [reiserfs-list] Performance question
Hello! On Sun, May 05, 2002 at 06:43:45PM +0200, Philipp Gühring wrote: *glob functions are implemented by various library functions that do full readdir scans at least once, I believe. I thought I heard about a syscall that makes it possible to pass the glob to the filesystem, so that the filesystem can optimize globbings as it likes and pass the result back to the application, but OK. I do not think something like that exists in Linux. But if you can come up with a man page from section 2... Or should I do 2 opendir-readdir loops, one to read over the first 39 results that I do not need, and the second one to get the results 40 to 49? In fact I do not see why you need to do 2 opendir-readdir loops. One loop should be enough. Yeah. Sure. My mistake. One opendir, and 2 readdir loops. The first one skips over unneeded results and the second one serves the data. No. I still think you need only one loop anyway, like this (pseudocode):

DIR = opendir(name);
while ((result = readdir(DIR)) != NULL) {
    if (check_filename_criteria(result->filename)) {
        add_to_list_of_files_to_process(result->filename);
    }
}
for i in list_of_files_to_process {
    process_file(i);
}

So only one loop, and the second one does not count because it serves actual data. The problem here is that I have to readdir about 50.000 files (40.000 to get through the unneeded results, and 10.000 more to get the 10 results I need). But on the other hand, I do not have to remember 100 files, from which I only need 10. I am completely missing the idea of where these numbers are from. Can you explain in more detail? I will try to. I have a table with 100.000 files. A complete search would return, for example, 100 files, which are spread across the whole directory. About every thousand files, there is one file that matches the query. Since the client does not want to get 100 files at once, at first I return only 10 results for the first page, and the user can navigate page-wise.
So I built up the scenario where the user now wants to see results 40-49 from the query 001_*_1212_1, which I assume is normal behaviour for my application. Ah, I see what you mean. If you have a lot of resources, you can set up a session and store all the search results for that session at server side. So when the second request comes in, you just read the search results from the session. Also you kill the session after 5 minutes of inactivity on it, or so. Hm... This requires cookies to be enabled, though. ;) Readdir would require fewer iterations through 001/*, because the number of entries will be only 100 as you described above. You get all these 100 entries and then loop 100 times trying to open 001/${next_name}/1212/1 and deciding whether you need this file or not. (If it exists, of course, or you might get -ENOENT and proceed to the next directory.) Also, deleting directories would be overkill. So the question is, how big that overkill is. I mean that you do not need to delete directories when they are empty. You only need to create the directory structure once. Is there perhaps a benchmark that tested it already? No, I do not think so, but feel free to compose and run your own benchmark. I think this might be faster in many circumstances. Also, what you've described looks very much like what squid does. And the squid people went to the reiserfs-raw interface and are quite happy with it. I think the difference to squid is that they only need one result, not a part of a search with more than one result. Hm. This is true. Bye, Oleg
Re: [reiserfs-list] fsync() Performance Issue
Chris Mason wrote: On Sat, 2002-05-04 at 10:59, Hans Reiser wrote: So how about if you revise fsync so that it always sends data blocks to the journal, not to the main disk? This gets a little sticky. Once you log a block, it might be replayed after a crash. So, you have to protect against corner cases like this:

write(file)
fsync(file)   /* logs modified data blocks */
write(file)   /* write the same blocks without fsync */
sync          /* user expects new version of the blocks on disk */
crash

During replay, the logged data blocks overwrite the blocks sent to disk via sync(). This isn't hard to correct for: every time a buffer is marked dirty, you check the journal hash tables to see if it is replayable, and if so you log it instead (the 2.2.x code did this due to tails). This translates to increased CPU usage for every write. Significantly increased CPU usage? I'd rather not put it back in because it adds yet another corner case to maintain for all time. Most of the fsync/O_SYNC bound applications are just given their own partition anyway, so most users that need data logging need it for every write. most users don't know enough to turn it on ;-) -chris
Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot
On Mon, 06 May 2002, Chris Mason wrote: Please tell us everything about your IDE config. Jens and I are already trying to track down some odd reiserfs + ide problems on 2.4.19pre7, but so far that was only with our barrier write patches applied. There is not much in common. Two of them are VIA 686 southbridges (KT133A, KT333); one is something older, a Pentium chipset. DMA 100 and DMA 66. We all use those Maxtor 80GB EIDE disks. Dirk
Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot
On Mon, 2002-05-06 at 09:59, Dirk Mueller wrote: On Mon, 06 May 2002, Chris Mason wrote: Please tell us everything about your IDE config. Jens and I are already trying to track down some odd reiserfs + ide problems on 2.4.19pre7, but so far that was only with our barrier write patches applied. There is not much in common. Two of them are VIA 686 southbridges (KT133A, KT333); one is something older, a Pentium chipset. DMA 100 and DMA 66. We all use those Maxtor 80GB EIDE disks. Any suggestions on how I might reproduce locally? -chris
Re: [reiserfs-list] Error Code 255 and Permission Denied
Oleg Drokin [EMAIL PROTECTED] 05/06/02 07:21 AM Make sure to get the latest reiserfsprogs (v3.x.1b) from the namesys ftp site. I did this and installed it. But, of course, my rescue disk has the old 3.x.0j version, not the new 3.x.1b version. How can I use the new version? Other indications of a problem: I ran the dmesg program from /bin and got "Warning log replay starting on readonly filesystem" and lots of "i/o failure trying to find stat data" messages. Your filesystem was corrupted by something. Do you have Windows on that box, too? I have Windows on the first drive, which I rarely use, and a vfat Windows partition on the second drive so that I could transfer files. I'm not sure, but I think my problems started after a power outage. kernel: is_tree_node: node level 0 does not match to the expected one 1 kernel: vs-5150: search_by_key: invalid format found in block 8272. Fsck? This message confirms the damaged-blocks theory. I would appreciate any suggestions as to how to fix my problem. Get the latest reiserfsprogs package and run reiserfsck --rebuild-tree. I presume I'm supposed to remount my / drive as read only. I'm not sure how to do that. I'll try to find out. Thanks for the help. Dan Bye, Oleg
Re: [reiserfs-list] Error Code 255 and Permission Denied
Hello! On Mon, May 06, 2002 at 12:07:26PM -0400, Daniel Christiansen wrote: Make sure to get the latest reiserfsprogs (v3.x.1b) from the namesys ftp site. I did this and installed it. But, of course, my rescue disk has the old 3.x.0j version, not the new 3.x.1b version. How can I use the new version? Copy the new version to your rescue floppy. Other indications of a problem: I ran the dmesg program from /bin and got "Warning log replay starting on readonly filesystem" and lots of "i/o failure trying to find stat data" messages. Your filesystem was corrupted by something. Do you have Windows on that box, too? I have Windows on the first drive, which I rarely use, and a vfat Windows partition on the second drive so that I could transfer files. I'm not sure, but I think my problems started after a power outage. Hm. Do you have write cache enabled on your hard drive? That may explain your problems (and yes, most drive manufacturers do enable write caching by default). Get the latest reiserfsprogs package and run reiserfsck --rebuild-tree. I presume I'm supposed to remount my / drive as read only. I'm not sure how to do that. I'll try to find out. No, simple remounting won't help; you need to boot off some rescue media and run reiserfsck on a completely unmounted partition. Bye, Oleg
Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot
On Mon, 06 May 2002, Chris Mason wrote: Any suggestions on how I might reproduce locally? Not much. Maybe try a lot of open, unlinked files when pressing reset, and then check the md5sums of all files.. Dirk
Re: [reiserfs-list] Error Code 255 and Permission Denied
Everything seems to work by using the 3.x.1b version of reiserfsck with --rebuild-tree. Thank you very much, Oleg, for taking the time to solve my problem. I don't know anything about the write cache enabled issue below. Is this something I have to change with a jumper, a BIOS setting, or a software configuration? Thanks again. Other indications of a problem: I ran the dmesg program from /bin and got "Warning log replay starting on readonly filesystem" and lots of "i/o failure trying to find stat data" messages. Your filesystem was corrupted by something. Do you have Windows on that box, too? I have Windows on the first drive, which I rarely use, and a vfat Windows partition on the second drive so that I could transfer files. I'm not sure, but I think my problems started after a power outage. Hm. Do you have write cache enabled on your hard drive? That may explain your problems (and yes, most drive manufacturers do enable write caching by default).
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 2002-05-06 at 17:21, Hans Reiser wrote: I'd rather not put it back in because it adds yet another corner case to maintain for all time. Most of the fsync/O_SYNC bound applications are just given their own partition anyway, so most users that need data logging need it for every write. Does mozilla's mail user agent use fsync? Should I give it its own partition? I bet it is fsync bound ;-) [ I took Wayne off the cc list, he's probably not horribly interested ] Perhaps, but I'll also bet the fsync performance hit doesn't affect the performance of the system as a whole. Remember that data=journal doesn't make the fsyncs fast, it just makes them faster. Most persons using small fsyncs are using them because the person who wrote their application wrote it wrong. What's more, many of the persons who wrote those applications cannot understand that they did it wrong even if you tell them (e.g. the qmail author reportedly cannot understand; the sendmail guys now understand, but they had Kirk McKusick on their staff and attending the meeting when I explained it to them, so they are not very typical). In other words, handling stupidity is an important life skill, and we all need to excel at it. ;-) A real strength of linux is that application designers can talk directly to their own personal bottlenecks. Hopefully we reward those that hunt us down and spend the time convincing us their applications are worth tuning for. They then proceed to beat the pants off their competition. Tell me what your thoughts are on the following: If you ask randomly selected ReiserFS users (not the reiserfs-list, but the ones who would never send you an email) the following questions, what percentage will answer which choice?
The filesystem you are using is named:
a) the Performance Optimized SuSE FS
b) NTFS
c) FAT
d) ext2
e) ReiserFS

I believe the ones that know what a filesystem is will answer ReiserFS. You might get a lot of ext2 answers, just because that's what a lot of people think the linux filesystem is.

If you want to change reiserfs to use data journaling you must do which:
a) reinstall the reiserfs package using rpm
b) modify /etc/fs.conf
c) reinstall the operating system from scratch, and select different options during the install this time
d) reformat your reiserfs partition using mkreiserfs
e) none of the above
f) all of the above except e)

These people won't be admins of systems big enough for the difference to matter. data journaling is targeted at people with so much load they would have to buy more hardware to make up for it. The new option lowers the price to performance ratio, which is exactly what we want to do for sendmails, egeneras, lycos, etc. If it takes my laptop 20ms to deliver a mail message, cutting the time down to 10ms just won't matter. What do you think the chances are that you can convince Hubert that every SuSE Enterprise Edition user should be asked at install time if they are going to use fsync a lot on each partition, and to use a different fstab setting if yes? Very little. I might tell them to buy the suse email server instead, since that would have the settings done right. data=journal is just a small part of mail server tuning. I know that you are an experienced sysadmin who was good at it. Your intuition tells you that most sysadmins are like the ones you were willing to hire into your group at the university. They aren't. Linux needs to be like a telephone. You plug it in, push buttons, and talk. It works well, but most folks don't know why. Exactly. I think there are 3 classes of users at play here. 1) Those who don't understand and don't have enough load to notice. 2) Those who don't understand and do have enough load to notice.
3) Those who do understand and do have enough load to notice. #2 will buy support from someone, and they should be able to configure the thing right. #3 will find the docs and do it right themselves. A moderate number of programs are small-fsync bound for the simple reason that it is simpler to write them that way. We need to cover over their simplistic designs. So, you have my sympathies, Chris, because I believe you that it makes the code uglier and it won't be a joy to code and test. I hope you also see that it should be done. Mostly, I feel this kind of tuning is a mistake right now. The patch is young and there are so many places left to tweak... I'm still at the stage where much larger improvements are possible, and a better use of coding time. Plus, it's Monday and it's always more fun to debate than give in on Mondays. -chris
[reiserfs-list] BTW: 2.4.19-patches-to-come?
Hi! BTW, for 2.4.19-final it would be very nice to have...

1.) the deleted/truncated/completed-files-on-mount at least printed in the kernel logs, at best with the real filename -- as afterwards they are not retrievable. That's a security reason -- whoever can trigger a crash with various methods (I know the admin should take care against this case... but on my home system I'd like to know that info, too), but to get back the file from backups in case... who knows it before a crash...? Am I missing something?

2.) a disk/drive/partition distinction in reiserfs related messages -- Oleg, you promised it would get real, and best would be a real patch!

3.) a hint on how to turn on/off data-journaling for some of our existing reiserfs partitions, if it exists at all for now, and why it could be needed in some cases?!

4.) a hint why there is iicache code in the latest speedup-compound-patch (so that the latest iicache patch would not apply)

Best regards for your stable ReiserFS, at all, even under settings with 2.4.19-pre7 +reiserfs.pending +latest reiserfs.compound-speedup +aa.vm-for-2.4.19-pre7 +akpm.read-latency-2 +rml.preempt-kernel +rml.lock-break +some-more nice aa.patches... That's a valuably fast interactive experience! Manuel
Re: [reiserfs-list] fsync() Performance Issue
On 05/07/2002 12:57 AM, Chris Mason wrote: [...] Hi, Chris & Hans! I don't think this kind of destructive discussion will lead to anything useful for now; can you post a diff for 2.4.19-pre7+latest-related-pending +compound-patch-from-ftp? I'll try it and report if that leads to more security and/or less performance in my everyday use with NS6 and so on, if there is any.
Re: [reiserfs-list] Error Code 255 and Permission Denied
On Mon, 2002-05-06 at 20:54, Manuel Krause wrote: On 05/06/2002 09:12 PM, Daniel Christiansen wrote: Everything seems to work by using the 3.x.1b version of reiserfsck with --rebuild-tree. Thank you very much, Oleg, for taking the time to solve my problem. I don't know anything about the write cache enabled issue below. Is this something I have to change with a jumper, a BIOS setting, or a software configuration? Most new IDE drives have this on by default. You can turn it off with hdparm -W 0, which will make you better able to withstand power failures. Or is it somehow connected to Chris Mason's thread "2.4.19-pre7 / corruption on unwanted reboot"??? If Chris and Jens found bugs in IDE interaction with ReiserFS they should really put out a patch soon... ;-) Chris M.? Is that related eventually? Just a doubt! I haven't been able to reproduce problems after a crash with pre7, but Dirk is not often wrong when he reports a bug. If anyone can reliably reproduce it, I'd be grateful. -chris
Re: [reiserfs-list] fsync() Performance Issue
On Mon, 2002-05-06 at 21:17, Manuel Krause wrote: On 05/07/2002 12:57 AM, Chris Mason wrote: Hi, Chris & Hans! I don't think this kind of destructive discussion will lead to anything useful for now; can you post a diff for 2.4.19-pre7+latest-related-pending +compound-patch-from-ftp? I'll try it and report if that leads to more security and/or less performance in my everyday use with NS6 and so on, if there is any. The current data logging patches are at: ftp.suse.com/pub/people/mason/patches/data-logging They are against 2.4.19-pre7, and contain versions of the major (stable) speedups. The patch is pretty big, so I'm not likely to merge with the namesys pending directories. The namesys guys add things frequently, and I think it would get confusing for people trying to figure out which patches to apply. The data logging stuff is beta code; if you have a good test bed where it's ok if things go wrong, I can make you a special patch with the pending stuff merged. -chris