Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot

2002-05-06 Thread Hans Reiser

Dirk Mueller wrote:

Hi, 

I've seen HEAVY file corruption on unwanted reboots (like pressing the reset 
button accidentally) on reiserfs with this kernel on 3 machines now. 

The symptom is that it finds a LOT of files to unlink on journal replay, 
which I find suspicious as those machines are lightly loaded. 

I didn't follow the development too closely the last few weeks, but I 
believe that something turned worse in this respect lately. 

Note that reiserfsck doesn't find any error in the file system structure 
before or after the journal replay on reboot; 
still, many files (especially those that were not touched for several hours 
before the reboot) contain complete garbage after the journal replay. 


Dirk


  

Were these files being written to near the time of the reboot?

hans





Re: [reiserfs-list] Performance question

2002-05-06 Thread Hans Reiser

glob is implemented by the shell, not the filesystem.  This is not for 
good reason, it just is.  We could write something for you to do it in 
the filesystem, and it would be faster.  Is your need for speed critical 
enough to justify writing something special for it?

Hans


Oleg Drokin wrote:

Hello!

On Sun, May 05, 2002 at 04:20:13PM +0200, Philipp Gühring wrote:

  

Let's say I have a directory with 100.000 files in it.
The filenames look like
name1_name2_name3_id
So I have
001_41052_50125_1
001_63216_1212_1
I have to create a search engine that serves, for example, the 4th block of 10 
files that match the query 001_*_1212_1. The whole query would result in 100 
files that are spread across the directory.
Now my question:
Is it faster with ReiserFS to do a bsd_glob(001_*_1212_1) first, which 
should result to about 100 entries, and then take the entries 40 to 49 from 
the resulting array? 
(Is ReiserFS able to directly return 100 files out of 100.000 with the 
globbing function, or is it an iteration over all files in the directory?)



*glob functions are implemented by various library functions that do full
readdir scans at least once, I believe.

  

Or should I do 2 opendir-readdir loops, one to read over the first 39 
results, that I do not need, and the second one to get the results 40 to 49?



In fact I do not see why you need to do 2 opendir-readdir loops.
One loop should be enough.
You just compare each filename returned against your query, and if it matches
you remember it in a separate list. So at the end of the readdir loop you have a list of
all names in the directory that match your query. And you can apply any additional
check in place so as not to remember unnecessary files.

  

The problem here is that I have to readdir about 50.000 files (40.000 to get 
through the unneeded results, and 10.000 to get the 10 results I need)
But on the other hand, I do not have to remember 100 files, from which I only 
need 10.



I am completely missing the idea of where these numbers are from. Can you
explain in more detail?

  

If ReiserFS has to iterate over 100.000 files (the whole directory) to do a 
001_*_1212_1 glob, because the binary tree only speeds up known files, but 
not patterns, then opendir-readdir should be faster, I guess.



The binary tree only helps when you know the filename, I believe. You calculate
a hash, and from that hash you can quickly find the desired location.
If you come up with a hash that places all filenames like yours near one
another, then that will help.

  

Another option would be to use subdirectories like
name1/name2/name3/id
So the glob would be 001/*/1212/1, which should be faster, anyway.
But on the other hand, I would have to do a lot more directory management, 
creating and deleting directories ...
And implementing an opendir-readdir search through 001/*/1212/1 will be 
more work too.



Readdir would require fewer iterations through 001/*, because the number of
entries will be only 100, as you described above.
You get all these 100 entries and then loop 100 times trying to open
001/${next_name}/1212/1 and deciding whether you need this file or not.
(If it exists, of course, or you might get -ENOENT and proceed to the next
directory).
Also, deleting directories would be overkill.
I think this might be faster in many circumstances.
Also, what you've described looks very similar to what squid does. And the squid
people went to the reiserfs-raw interface and are quite happy with it.


Bye,
Oleg


  







Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot

2002-05-06 Thread Dirk Mueller

On Sat, 04 May 2002, Chris Mason wrote:

 Hmmm, not good at all.  Are these 3 systems IDE or scsi?  Do they run
 additional patches on top of pre7?  What kernels before pre7 have you tried
 that didn't show this problem?

All IDE. The kernel that didn't show this problem was 2.4.16 (plain). No 
additional patches on 2.4.19-pre7. 


Dirk



Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Chris Mason

On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:

 So how about if you revise fsync so that it always sends data blocks to 
 the journal not to the main disk?

This gets a little sticky.

Once you log a block, it might be replayed after a crash.  So, you have
to protect against corner cases like this:

write(file)
fsync(file) ; /* logs modified data blocks */
write(file) ; /* write the same blocks without fsync */
sync ;    /* user expects new version of the blocks on disk */
crash

During replay, the logged data blocks overwrite the blocks sent to disk
via sync().

This isn't hard to correct for: every time a buffer is marked dirty, you
check the journal hash tables to see if it is replayable, and if so you
log it instead (the 2.2.x code did this due to tails).  This translates
to increased CPU usage for every write.

I'd rather not put it back in because it adds yet another corner case to
maintain for all time.  Most of the fsync/O_SYNC bound applications are
just given their own partition anyway, so most users that need data
logging need it for every write.

-chris







Re: [reiserfs-list] Performance question

2002-05-06 Thread Philipp Gühring

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello!

Thank you Oleg for your answers.

 *glob functions are implemented by various library functions, that do full
 readdir scans at least once, I believe.

I thought I heard about a syscall that makes it possible to pass the glob to 
the filesystem, so that the filesystem can optimize globbing as it likes, 
and pass the result back to the application, but ok.

  Or should I do 2 opendir-readdir loops, one to read over the first 39
  results, that I do not need, and the second one to get the results 40 to
  49?

 In fact I do not see why do you need to do 2 opendir-readdir loops.
 One loop should be enough.

Yeah. Sure. My mistake. One opendir, and 2 readdir loops. The first one skips 
over unneeded results and the second one serves the data.

 You just compare each filename returned against your query and and if it
 matched remember it in separate list. So at the end of readdir loop you
 have a list of all names in a directory that match your query. And you can
 apply any additional check in place just not to remember unnecesary files.

  The problem here is that I have to readdir about 50.000 files (40.000 to
  get through the unneeded results, and 10.000 to get the 10 results I need)
  But on the other hand, I do not have to remember 100 files, from which I
  only need 10.

 I am completely missing the idea on where these numbers are from. Can you
 explain in more details.

I will try.
I have a table with 100.000 files. A complete search would result in, for example, 
100 files, which are spread across the whole directory.
About every thousand files, there is one file that matches the query.
Since the client does not want to get 100 files at once, at first I return 
only 10 results for the first page, and the user can navigate page-wise.

So I built up the scenario where the user now wants to see results 40-49 
from the query 001_*_1212_1, 
which I assume is normal behaviour for my application.

 The binary tree only helps when you know the filename, I believe. 

Ok.

 Readdir would require fewer iterations through 001/*, because the number of
 entries will be only 100, as you described above.
 You get all these 100 entries and then loop 100 times trying to open
 001/${next_name}/1212/1 and deciding whether you need this file or not.
 (If it exists, of course, or you might get -ENOENT and proceed to the next
 directory).
 Also, deleting directories would be overkill.

So the question is, how big that overkill is.
Is there perhaps a benchmark that tested it already?

 I think this might be faster in many circumstances.
 Also what you've described looks very similar to what squid does. And the squid
 people went to the reiserfs-raw interface and are quite happy with it.

I think the difference to squid is that they only need one result, not a part 
of a search with more than one result.
But I am thinking about using reiserfs-raw too ...
(At the moment flexibility still has a higher priority for me than raw 
performance)

Many greetings,
- -- 
~ Philipp Gühring  [EMAIL PROTECTED]
~ http://www.livingxml.net/   ICQ UIN: 6588261
~ <xsl:value-of select="file:/home/philipp/.sig"/>
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE81WFGlqQ+F+0wB3oRAtYSAJsGgaHnsohasbrjnJEQWAhi4tatSwCfQXDB
dGlKoxKq0vcB0jHMOV6AEWQ=
=heIa
-END PGP SIGNATURE-



Re: [reiserfs-list] 33 bad sectors kills 20GB

2002-05-06 Thread Kuba Ober

On Saturday, 04 May 2002 10:58 am, Dax Kelson wrote:
 This is a 20GB filesystem, used dd_rescue to make an image of the drive:

 Summary for /dev/hda3 - hda3.img:
 dd_rescue: (info):
 ipos:  19711944.0k
 opos:  19711944.0k
 xferd:  19711944.0k
 errs: 33
 errxfer:16.5k
 succxfer:  19711927.5k
 avg.rate:11743kB/s

 I then ran reiserfsck on the image file:

 look_for_lost: 6 files seem to left not linked to lost+found
 Objects without names 1377
 Empty lost dirs removed 10
 Dirs linked to /lost+found: 149
 Dirs without stat data found 5
 Files linked to /lost+found 1228
 Pass 4 - done left 298, 0 /sec
 Deleted unreachable items 117
 Syncing..done
 Done

 (That took 3+ hours on a PIII 900, 512MB ram box)

 I mounted the image, and pretty much everything is gone.

 Is it normal for 33 bad sectors (16.5k) of data (out of 19711944k) to
 completely kill the fs?

 Would I have better luck if I used the -A option on dd_rescue?

 -A Always write blocks, zeroed if err (def=no);

What you did without using -A is the following:

fs with errors:

xpbbxxx

after dd:

xpxxx--

Now assume that p is a pointer in the fs data structures that pointed to the 
last block (the last x). Now it points nowhere.

You have reorganized your filesystem without updating all the pointers there 
are in the metadata. Nothing is going to rescue that wreck, short of some 
by-hand hex editing.

You *absolutely* need to use -A. Or at least that's how I understand things.

You may also need to do reiserfsck with rebuild-tree, although try without it 
first. I hope you have the latest version of reiserfstools -- if not, you 
*have* to get the latest one.

Cheers, Kuba



RE: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread berthiaume_wayne

I'll add the write caching into the test just for info. Until there
is a way to guarantee the data is safe I'll have to go with no write caching
though. I should have all this testing done by the end of the week.

-Original Message-
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 03, 2002 6:00 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [reiserfs-list] fsync() Performance Issue


On Fri, 2002-05-03 at 16:35, [EMAIL PROTECTED] wrote:
   Chris, I have some quick preliminary results for you. I have
 additional testing to perform and haven't run debugreiserfs() yet. If you
 have a preference for which tests to run debugreiserfs() let me know.
   Base testing was done against 2.4.13 built on RH 7.1 using the
 test_writes.c code I forwarded to you. The system is a Tyan with single
 PIII, IDE Promise 20269, Maxtor 160GB drive - write cache disabled. All
 numbers are with fsync() and 1KB files. As I said, more testing, i.e.
 filesizes, need to be performed.

 2.4.19-pre7 speedup, data logging, write barrier / no options
   = 47.1ms/file

Hi Wayne, thanks for sending these along.

I expected a slight improvement over the 2.4.13 code even with the data
logging turned off.  I'm curious to see how it does with the IDE cache
turned on.  With scsi, I see 10-15% better without any options than an
unpatched kernel.

 2.4.19-pre7 speedup, data logging, write barrier / data=journal
   = 25.2ms/file
 2.4.19-pre7 speedup, data logging, write barrier / data=journal,barrier=none
   = 27.8ms/file

The barrier option doesn't make much difference because the write cache
is off.  With write cache on, the barrier code should allow you to be
faster than with the caching off, but without risking the data (Jens and
I are working on final fsync safety issues though).

Hans, data=journal turns on the data journaling.  The data journaling
patches also include optimizations to write metadata back to disk in
bigger chunks for tiny transactions (the current method is to write one
transaction's worth back, when a transaction has 3 blocks, this is
pretty slow).

I've put these patches up on:

ftp.suse.com/pub/people/mason/patches/data-logging

   One question is will these patches be going into the 2.4 tree and
 when?

The data logging patches are a huge change, but the good news is they
are based on the nesting patches that have been stable for a long time
in the quota code.  I'll probably want a month or more of heavy testing
before I think about submitting them.

-chris




Re: [reiserfs-list] Performance question

2002-05-06 Thread Oleg Drokin

Hello!

On Sun, May 05, 2002 at 06:43:45PM +0200, Philipp Gühring wrote:

  *glob functions are implemented by various library functions, that do full
  readdir scans at least once, I believe.
 I thought I heard about a syscall, that makes it possible to pass the glob to 
 the filesystem, so that the filesystem can optimize globbings as it likes, 
 and pass the result back to the application, but ok.

I do not think something like that exists in Linux. But if you
can come up with a man page from section 2...

   Or should I do 2 opendir-readdir loops, one to read over the first 39
   results, that I do not need, and the second one to get the results 40 to
   49?
  In fact I do not see why do you need to do 2 opendir-readdir loops.
  One loop should be enough.
 Yeah. Sure. My mistake. One opendir, and 2 readdir loops. The first one skips 
 over unneeded results and the second one serves the data.

No. Still I think you need only one loop anyway, like this:
pseudocode:
DIR = opendir(name);
while ((result = readdir(DIR)) != NULL) {
    if (check_filename_criteria(result->filename)) {
        add_to_list_of_files_to_process(result->filename);
    }
}
for i in list_of_files_to_process {
    process_file(i);
}

So only one loop, and the second one does not count because it serves
actual data.

   The problem here is that I have to readdir about 50.000 files (40.000 to
   get through the unneeded results, and 10.000 to get the 10 results I need)
   But on the other hand, I do not have to remember 100 files, from which I
   only need 10.
  I am completely missing the idea on where these numbers are from. Can you
  explain in more details.
 I will try.
 I have a table with 100.000 files. A complete search would result in, for example, 
 100 files, which are spread across the whole directory.
 About every thousand files, there is one file that matches the query.
 Since the client does not want to get 100 files at once, at first I return 
 only 10 results for the first page, and the user can navigate page-wise.
 So I built up the scenario where the user now wants to see results 40-49 
 from the query 001_*_1212_1, 
 which I assume is normal behaviour for my application.

Ah, I see what you mean. If you have a lot of resources, you can set up a session
and store all the search results for that session at the server side.
So when the second request comes in, you just read the search results from the session.
Also you kill the session after 5 minutes of inactivity or so.
Hm... This requires cookies to be enabled, though. ;)

  Readdir would require fewer iterations through 001/*, because the number of
  entries will be only 100, as you described above.
  You get all these 100 entries and then loop 100 times trying to open
  001/${next_name}/1212/1 and deciding whether you need this file or not.
  (If it exists, of course, or you might get -ENOENT and proceed to the next
  directory).
  Also, deleting directories would be overkill.
 So the question is, how big that overkill is.

I mean that you do not need to delete directories, when they are empty.
You only need to create the directory structure once.

 Is there perhaps a benchmark that tested it already?

No, I do not think so, but feel free to compose and run your own benchmark.

  I think this might be faster in many circumstances.
  Also what you've described looks very similar to what squid does. And the squid
  people went to the reiserfs-raw interface and are quite happy with it.
 I think the difference to squid is that they only need one result, not a part 
 of a search, with more than one result.

Hm. This is true.

Bye,
Oleg



Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Hans Reiser

Chris Mason wrote:

On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
  

So how about if you revise fsync so that it always sends data blocks to 
the journal not to the main disk?



This gets a little sticky.

Once you log a block, it might be replayed after a crash.  So, you have
to protect against corner cases like this:

write(file)
fsync(file) ; /* logs modified data blocks */
write(file) ; /* write the same blocks without fsync */
sync ;    /* user expects new version of the blocks on disk */
crash

During replay, the logged data blocks overwrite the blocks sent to disk
via sync().

This isn't hard to correct for: every time a buffer is marked dirty, you
check the journal hash tables to see if it is replayable, and if so you
log it instead (the 2.2.x code did this due to tails).  This translates
to increased CPU usage for every write.

Significantly increased CPU usage?


I'd rather not put it back in because it adds yet another corner case to
maintain for all time.  Most of the fsync/O_SYNC bound applications are
just given their own partition anyway, so most users that need data
logging need it for every write.

most users don't know enough to turn it on;-)


-chris






  







Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot

2002-05-06 Thread Dirk Mueller

On Mon, 06 May 2002, Chris Mason wrote:

 Please tell us everything about your IDE config.  Jens and I are already
 trying to track down some odd reiserfs + ide problems on 2.4.19pre7, but
 so far that was only with our barrier write patches applied.

There is not much in common. Two of them are VIA 686 southbridge (KT133A, 
KT333), one is something older, a Pentium chipset. 

DMA 100 and DMA 66. They all use those Maxtor 80GB EIDE disks. 


Dirk



Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot

2002-05-06 Thread Chris Mason

On Mon, 2002-05-06 at 09:59, Dirk Mueller wrote:
 On Mon, 06 May 2002, Chris Mason wrote:
 
  Please tell us everything about your IDE config.  Jens and I are already
  trying to track down some odd reiserfs + ide problems on 2.4.19pre7, but
  so far that was only with our barrier write patches applied.
 
 There is not much in common. Two of them are VIA 686 southbridge (KT133A, 
 KT333), one is something older, a Pentium chipset. 
 
 DMA 100 and DMA 66. They all use those Maxtor 80GB EIDE disks. 

Any suggestions on how I might reproduce locally?

-chris





Re: [reiserfs-list] Error Code 255 and Permission Denied

2002-05-06 Thread Daniel Christiansen

 Oleg Drokin [EMAIL PROTECTED] 05/06/02 07:21 AM 

Make sure to get latest reiserfsprogs (v3.x.1b) from namesys ftp site.

I did this and installed it.  But, of course, my rescue disk has the old
3.x.0j version not the new 3.x.1b version.  How can I use the new
version?

 Other indications of a problem:  I ran the dmesg program from /bin and
 got "Warning log replay starting on readonly filesystem" and lots of
 "i/o failure trying to find stat data" messages.

Your filesystem was corrupted by something. Do you have Windows on that box,
too?

I have windows on the first drive, which I rarely use, and a vfat
windows partition on the second drive so that I could transfer files.

I'm not sure, but I think my problems started after a power outage.


 kernel: is_tree_node: node level 0 does not match to the expected one
1
 kernel: vs-5150: search_by_key: invalid format found in block 8272.
 Fsck?

This message confirms the damaged-blocks theory.

 I would appreciate any suggestions as to how fix my problem.

Get latest reiserfsprogs package and run reiserfsck --rebuild-tree.

I presume I'm supposed to remount my / drive as read only.  I'm not sure
how to do that.  I'll try to find out.


Thanks for the help.

Dan


Bye,
Oleg




Re: [reiserfs-list] Error Code 255 and Permission Denied

2002-05-06 Thread Oleg Drokin

Hello!

On Mon, May 06, 2002 at 12:07:26PM -0400, Daniel Christiansen wrote:

 Make sure to get latest reiserfsprogs (v3.x.1b) from namesys ftp site.
 I did this and installed it.  But, of course, my rescue disk has the old
 3.x.0j version not the new 3.x.1b version.  How can I use the new
 version?

Copy the new version to your rescue floppy.

  Other indications of a problem:  I ran the dmesg program from /bin and
  got "Warning log replay starting on readonly filesystem" and lots of
  "i/o failure trying to find stat data" messages.
 Your filesystem was corrupted by something. Do you have Windows on that
 box,
 too?
 I have windows on the first drive, which I rarely use, and a vfat
 windows partition on the second drive so that I could transfer files.
 I'm not sure, but I think my problems started after a power outage.

Hm. Do you have write cache enabled on your hard drive? That may explain
your problems (and yes, most drive manufacturers do enable write caching
by default).

 Get latest reiserfsprogs package and run reiserfsck --rebuild-tree.
 I presume I'm supposed to remount my / drive as read only.  I'm not sure
 how to do that.  I'll try to find out.

No, simple remounting won't help; you need to boot off some rescue media and run
reiserfsck on a completely unmounted partition.

Bye,
Oleg



Re: [reiserfs-list] 2.4.19-pre7 / corruption on unwanted reboot

2002-05-06 Thread Dirk Mueller

On Mon, 06 May 2002, Chris Mason wrote:

 Any suggestions on how I might reproduce locally?

not much. maybe try a lot of open, unlinked files when pressing reset and 
then check the md5sum's of all files..


Dirk



Re: [reiserfs-list] Error Code 255 and Permission Denied

2002-05-06 Thread Daniel Christiansen

Everything seems to work by using the 3.x.1b version of reiserfsck with
--rebuild-tree.  Thank you very much, Oleg, for taking the time to solve
my problem.

I don't know anything about the write cache enabled issue below.  Is
this something I have to change with a jumper, a bios setting, or a
software configuration?

Thanks again.

  Other indications of a problem:  I ran the dmesg program from /bin and
  got "Warning log replay starting on readonly filesystem" and lots of
  "i/o failure trying to find stat data" messages.
 Your filesystem was corrupted by something. Do you have Windows on that box,
 too?
 I have windows on the first drive, which I rarely use, and a vfat
 windows partition on the second drive so that I could transfer files.
 I'm not sure, but I think my problems started after a power outage.

Hm. Do you have write cache enabled on your hard drive? That may explain
your problems (and yes, most drive manufacturers do enable write caching
by default).








Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Chris Mason

On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:

 I'd rather not put it back in because it adds yet another corner case to
 maintain for all time.  Most of the fsync/O_SYNC bound applications are
 just given their own partition anyway, so most users that need data
 logging need it for every write.
 
 Does mozilla's mail user agent use fsync?  Should I give it its own 
 partition?  I bet it is fsync bound;-)

[ I took Wayne off the cc list, he's probably not horribly interested ]

Perhaps, but I'll also bet the fsync performance hit doesn't affect the
performance of the system as a whole.  Remember that data=journal
doesn't make the fsyncs fast, it just makes them faster.

 
 Most persons using small fsyncs are using it because the person who 
 wrote their application wrote it wrong.  What's more, many of the 
 persons who wrote those applications cannot understand that they did it 
 wrong even if you tell them (e.g. qmail author reportedly cannot 
 understand, sendmail guys now understand but had Kirk McKusick on their 
 staff and attending the meeting when I explained it to them so they are 
 not very typical).  
 
 In other words, handling stupidity is an important life skill, and we 
 all need to excel at it. ;-)

A real strength of linux is that the application designers can talk directly
to their own personal bottlenecks.  Hopefully we reward those that hunt
us down and spend the time convincing us their applications are worth
tuning for.  They then proceed to beat the pants off their competition.

 
 Tell me what your thoughts are on the following:
 
 If you ask randomly selected ReiserFS users (not the reiserfs-list, but 
 the ones who would never send you an email)  the following 
 questions, what percentage will answer which choice?
 
 The filesystem you are using is named:
 
 a) the Performance Optimized SuSE FS
 
 b) NTFS
 
 c) FAT
 
 d) ext2
 
 e) ReiserFS

I believe the ones that know what a filesystem is will answer ReiserFS.
You might get a lot of ext2 answers, just because that's what a lot of
people think the linux filesystem is.

 
 If you want to change reiserfs to use data journaling you must do which:
 
 a) reinstall the reiserfs package using rpm
 
 b) modify /etc/fs.conf
 
 c) reinstall the operating system from scratch, and select different 
 options during the install this time
 
 d) reformat your reiserfs partition using mkreiserfs
 
 e) none of the above
 
 f) all of the above except e)

These people won't be admins of systems big enough for the difference to
matter.  data journaling is targeted at people with so much load they
would have to buy more hardware to make up for it.  The new option
lowers the price to performance ratio, which is exactly what we want to
do for sendmails, egeneras, lycos, etc.  If it takes my laptop 20ms to
deliver a mail message, cutting the time down to 10ms just won't matter.

 
 
 What do you think the chances are that you can convince Hubert that 
 every SuSE Enterprise Edition user should be asked at install time if 
 they are going to use fsync a lot on each partition, and to use a 
 different fstab setting if yes?

Very little, I might tell them to buy the suse email server instead,
since that would have the settings done right.  data=journal is just a
small part of mail server tuning.

 
 I know that you are an experienced sysadmin who was good at it.  Your 
 intuition tells you that most sysadmins are like the ones you were 
 willing to hire into your group at the university.  They aren't.
 
 Linux needs to be like a telephone.  You plug it in, push buttons, and 
 talk.  It works well, but most folks don't know why.
 

Exactly.  I think there are 3 classes of users at play here.

1) Those who don't understand and don't have enough load to notice.
2) Those who don't understand and do have enough load to notice.
3) Those who do understand and do have enough load to notice.

#2 will buy support from someone, and they should be able to configure
the thing right.

#3 will find the docs and do it right themselves.

 A moderate number of programs are small-fsync bound for the simple 
 reason that it is simpler to write them that way.  We need to cover 
 over their simplistic designs.
 
 So, you have my sympathies Chris, because I believe you that it makes 
 the code uglier and it won't be a joy to code and test.  I hope you also 
 see that it should be done.

Mostly, I feel this kind of tuning is a mistake right now.  The patch is
young and there are so many places left to tweak... I'm still at the
stage where much larger improvements are possible, and a better use of
coding time.  Plus, it's Monday and it's always more fun to debate than
give in on Mondays.

-chris





[reiserfs-list] BTW: 2.4.19-patches-to-come?

2002-05-06 Thread Manuel Krause

Hi!

BTW, for 2.4.19-final it would be very nice to have...

1.) the deleted/truncated/completed-files-on-mount at least
 printed in the kernel logs, at best with the real filename
 -- as afterwards they are not retrievable --
 That's a security reason -- whoever can trigger a crash
 with various methods (I know the admin should take care
 against this case... but on my home system I'd like to know
 that info, too) but to get back the file from backups in
 case... who knows it before a
 crash... ? Am I missing something?

2.) a disk/drive/partition distinction in reiserfs related
 messages -- Oleg, you promised to make it real, and best
 would be a real patch!

3.) a hint on how to turn on/off data-journaling for some
 of our existing reiserfs partitions if it exists at all
 for now and why it could be needed in some cases?!.

4.) a hint why there is iicache code in the latest
 speedup-compound-patch (so that the latest iicache patch
 would not apply)


Best regards for your stable ReiserFS, at all,

even under settings with 2.4.19-pre7 +reiserfs.pending +latest 
reiserfs.compound-speedup +aa.vm-for-2.4.19-pre7 +akpm.read-latency-2 
+rml.preempt-kernel + rml.lock-break +some-more nice aa.patches... 
That's a valuably fast & interactive experience!


Manuel





Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Manuel Krause

On 05/07/2002 12:57 AM, Chris Mason wrote:

 On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
 
I'd rather not put it back in because it adds yet another corner case to
maintain for all time.  Most of the fsync/O_SYNC bound applications are
just given their own partition anyway, so most users that need data
logging need it for every write.


Does mozilla's mail user agent use fsync?  Should I give it its own 
partition?  I bet it is fsync bound;-)

 
 [ I took Wayne off the cc list, he's probably not horribly interested ]
 
 Perhaps, but I'll also bet the fsync performance hit doesn't affect the
 performance of the system as a whole.  Remember that data=journal
 doesn't make the fsyncs fast, it just makes them faster.
 
 
Most persons using small fsyncs are using it because the person who 
wrote their application wrote it wrong.  What's more, many of the 
persons who wrote those applications cannot understand that they did it 
wrong even if you tell them (e.g. qmail author reportedly cannot 
understand, sendmail guys now understand but had Kirk McKusick on their 
staff and attending the meeting when I explained it to them so they are 
not very typical).  

In other words, handling stupidity is an important life skill, and we 
all need to excel at it. ;-)

 
 A real strength of linux is that the application designers can talk directly
 to their own personal bottlenecks.  Hopefully we reward those that hunt
 us down and spend the time convincing us their applications are worth
 tuning for.  They then proceed to beat the pants off their competition.
 
 
Tell me what your thoughts are on the following:

If you ask randomly selected ReiserFS users (not the reiserfs-list, but 
the ones who would never send you an email) the following questions, 
what percentage will answer which choice?

The filesystem you are using is named:

a) the Performance Optimized SuSE FS

b) NTFS

c) FAT

d) ext2

e) ReiserFS

 
 I believe the ones that know what a filesystem is will answer ReiserFS.
 You might get a lot of ext2 answers, just because that's what a lot of
 people think the Linux filesystem is.
 
 
If you want to change reiserfs to use data journaling you must do which:

a) reinstall the reiserfs package using rpm

b) modify /etc/fs.conf

c) reinstall the operating system from scratch, and select different 
options during the install this time

d) reformat your reiserfs partition using mkreiserfs

e) none of the above

f) all of the above except e)

 
 These people won't be admins of systems big enough for the difference to
 matter.  data journaling is targeted at people with so much load they
 would have to buy more hardware to make up for it.  The new option
 lowers the price to performance ratio, which is exactly what we want to
 do for sendmails, egeneras, lycos, etc.  If it takes my laptop 20ms to
 deliver a mail message, cutting the time down to 10ms just won't matter.
 
 

What do you think the chances are that you can convince Hubert that 
every SuSE Enterprise Edition user should be asked at install time if 
they are going to use fsync a lot on each partition, and to use a 
different fstab setting if yes?

 
 Very little.  I might tell them to buy the SuSE email server instead,
 since that would have the settings done right.  data=journal is just a
 small part of mail server tuning.
 
 
I know that you are an experienced sysadmin who was good at the job.  
Your intuition tells you that most sysadmins are like the ones you were 
willing to hire into your group at the university.  They aren't.

Linux needs to be like a telephone.  You plug it in, push buttons, and 
talk.  It works well, but most folks don't know why.


 
 Exactly.  I think there are 3 classes of users at play here.
 
 1) Those who don't understand and don't have enough load to notice.
 2) Those who don't understand and do have enough load to notice.
 3) Those who do understand and do have enough load to notice.
 
 #2 will buy support from someone, and they should be able to configure
 the thing right.
 
 #3 will find the docs and do it right themselves.
 
 
A moderate number of programs are fsync-bound on small writes for the 
simple reason that it is simpler to write them that way.  We need to 
cover over their simplistic designs.

So, you have my sympathies Chris, because I believe you that it makes 
the code uglier and it won't be a joy to code and test.  I hope you also 
see that it should be done.

 
 Mostly, I feel this kind of tuning is a mistake right now.  The patch is
 young and there are so many places left to tweak... I'm still at the
 stage where much larger improvements are possible, and a better use of
 coding time.  Plus, it's Monday, and it's always more fun to debate than
 give in on Mondays.
 
 -chris
 


Hi, Chris & Hans!

I don't think this somewhat destructive discussion will lead to 
anything useful for now.  Can you post a diff for 
2.4.19-pre7+latest-related-pending+compound-patch-from-ftp?

I'll try it and report whether it leads to more security and/or less 
performance in my everyday use with NS6 and so on, if there is any.

Re: [reiserfs-list] Error Code 255 and Permission Denied

2002-05-06 Thread Chris Mason

On Mon, 2002-05-06 at 20:54, Manuel Krause wrote:
 On 05/06/2002 09:12 PM, Daniel Christiansen wrote:
 
  Everything seems to work by using the 3.x.1b version of reiserfsck with
  --rebuild-tree.  Thank you very much, Oleg, for taking the time to solve
  my problem.
  
  I don't know anything about the write cache enabled issue below.  Is
  this something I have to change with a jumper, a bios setting, or a
  software configuration?
  

Most new IDE drives ship with the write cache enabled by default.  You can
turn it off with hdparm -W 0, which makes you better able to withstand
power failures.

 
 Or is it somehow connected to Chris Mason's thread "2.4.19-pre7 / 
 corruption on unwanted reboot"?  If Chris and Jens found bugs in the IDE 
 interaction with ReiserFS, they should really put out a patch soon... ;-)
 
 Chris M.? Is that related, eventually?  Just a doubt!

I haven't been able to reproduce problems after a crash with pre7, but
Dirk is not often wrong when he reports a bug.  If anyone can reproduce
this reliably, I'd be grateful.

-chris





Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Chris Mason

On Mon, 2002-05-06 at 21:17, Manuel Krause wrote:
 On 05/07/2002 12:57 AM, Chris Mason wrote:

 
 Hi, Chris & Hans!
 
 I don't think this somewhat destructive discussion will lead to 
 anything useful for now.  Can you post a diff for 
 2.4.19-pre7+latest-related-pending+compound-patch-from-ftp?
 
 I'll try it and report whether it leads to more security and/or less 
 performance in my everyday use with NS6 and so on, if there is any.

The current data logging patches are at:

ftp.suse.com/pub/people/mason/patches/data-logging

They are against 2.4.19-pre7, and contain versions of the major (stable)
speedups.  The patch is pretty big, so I'm not likely to merge with the
namesys pending directories.  The namesys guys add things frequently,
and I think it would get confusing for people trying to figure out which
patches to apply.

The data logging stuff is beta code.  If you have a good test bed where
it's OK if things go wrong, I can make you a special patch with the
pending stuff merged.

-chris