Question on Reiser4 regarding power failures

2004-11-25 Thread Bernhard Prell
Hello everybody,

I have a simple question about Reiser4. I already tried to find the answer in
http://www.namesys.com/v4/v4.html but I'm not completely sure if I understood
everything :-)

*Background:
I administer about 20 computers. The installed Linux/Reiser is quite
old: SuSE Linux with kernel 2.4.7 (SMP); dmesg tells me reiser 3.5.x disk
format, ReiserFS version 3.6.25.

People obviously often pull the plug without cleanly shutting down the
systems, because about twice a month I have to set up a system again because
it won't boot anymore. The cause is always a corrupted root filesystem.
Sometimes reiserfsck can repair it, sometimes not because of I/O read
errors. In that case I also check the disk with
"smartctl -t short /dev/hda", which reports read errors as well. Then I usually
do a "dd if=/dev/zero of=/dev/hda", repartition, reformat, and set up the system
from a tar file, and everything is fine again (smartctl no longer reports
read errors).

*Now my question, you may have guessed it:
Will corruption caused by partially written sectors, or other problems
caused by _power failures_, still be a problem under reiser4, or are they gone
for good because of the "atomic transactions" (at least in theory)?
(That would mean that reiserfsck exists only to repair problems caused by
bugs in reiser4...)

Sorry if this has been asked before, I tried to find an answer in the 
list-archive.

Thanks in advance

Bernhard


Re: Question on Reiser4 regarding power failures

2004-11-25 Thread Kerin Millar
On Thu, 2004-11-25 at 16:37 +0100, Bernhard Prell wrote:

[snip]

> *Background:
> I 'm administrating about 20 computers. The installed Linux/Reiser is quite 
> old: SuSE Linux with kernel 2.4.7(SMP) (dmesg tells me reiser 3.5.x disk 
> format, ReiserFS version 3.6.25).
> 
> Obviously people often pull the plug without cleanly shutting down the 
> systems, because maybe twice per month I have to setup a system again because 
> it doesn't want to boot anymore. The reason is always a somehow corrupted 
> root-filesystem. Sometimes reiserfsck can repair it, sometimes not because of 
> I/O-read-errors. If that's the case I also have a look at the disk with 
> "smartctl -t short /dev/hda" and it also reports read-errors. Then I usually 
> do a "dd if=/dev/zero of=/dev/hda", repartition, reformat, setup the system 
> from a tar file and everything is fine again (smartctl doesn't report 
> read-errors anymore).

[snip]

I can't comment on reiser4's characteristics but thought you might be
interested in my experiences. I ran a busy server using reiserfs for
over two years, from around the days of kernel 2.4.18. During that time,
there were several occasions on which power was lost (I had no UPS at
that time) and on no occasion did I experience filesystem corruption of
that kind, nor did I have to go through any special recovery procedures
to get things up and running again.

For this reason, and because I believe that the stability of reiserfs
was improved drastically in later revisions of the 2.4 kernel, I would
urge that you consider using a modern 2.4 kernel and the latest reiserfs
tools if possible!

Another factor is the type of journalling mode that is used. Later in
2.6, options were made available to select the journalling mode (see
http://www.namesys.com/mount-options.html) which reflect the hitherto
more flexible ext3 options. By mounting with data=journal, one can
ensure to a reasonable extent the integrity of both data and metadata,
probably at some expense in performance. Personally I never had any
problems with data=ordered which is the default.

I don't know what the situation is in 2.4 these days, but I am assuming
that this work was never backported i.e. only metadata is journalled.
However, even if that is the case I would still suggest that it is much
more robust in later kernels (both 2.4 and 2.6).

Regards,

--Kerin Francis Millar



Re: Question on Reiser4 regarding power failures

2004-11-25 Thread Christian Mayrhuber
On Thursday 25 November 2004 16:37, Bernhard Prell wrote:
> Hello everybody,
> 
> I have a simple question about Reiser4. I already tried to find the answer in
> http://www.namesys.com/v4/v4.html but I'm not completly sure if I understood
> everything :-)
> 
> *Background:
> I 'm administrating about 20 computers. The installed Linux/Reiser is quite 
> old: SuSE Linux with kernel 2.4.7(SMP) (dmesg tells me reiser 3.5.x disk 
> format, ReiserFS version 3.6.25).
> 
> Obviously people often pull the plug without cleanly shutting down the 
> systems, because maybe twice per month I have to setup a system again because
> it doesn't want to boot anymore. The reason is always a somehow corrupted
> root-filesystem. Sometimes reiserfsck can repair it, sometimes not because of
> I/O-read-errors. If that's the case I also have a look at the disk with 
> "smartctl -t short /dev/hda" and it also reports read-errors. Then I usually 
> do a "dd if=/dev/zero of=/dev/hda", repartition, reformat, setup the system 
> from a tar file and everything is fine again (smartctl doesn't report 
> read-errors anymore).
> 
> *Now my question, you may have guessed it:
> Will the corruption caused by partially written sectors or other problems 
> caused by _power failures_ be still a problem under reiser4 or are they gone 
> forever because of the "atomic transactions" (at least in theory)?
> (That would mean that reiserfsck is only existing to repair problems caused by
> bugs in reiser4...)
> 
> Sorry if this has been asked before, I tried to find an answer in the 
> list-archive.

I'd suggest the following for pull-the-plug scenarios on production
systems with reiserfs:

1) Disable write caching for ide drives with "hdparm -W 0 /dev/hdX"
   This is the most important thing to do.
2) Always try to use the newest version of reiserfsck. The older, the more
   bugs.
3) Convert your filesystem to the newer 3.6 disk format using
   the "conv" mount option once. The 3.5 disk format is very old and
   has some limitations.
4) If you want to prevent data corruption and not just filesystem
   corruption, use a recent 2.6.x kernel, which incorporates Chris Mason's
   data logging patches.
   Alternatively, you could use a newer SuSE, which has the data logging
   patches in the 2.4 kernel series, too. Installing a newer 2.4.x
   kernel rpm from ftp.suse.com may work for you.

As reiser4 is atomic in its operations, it should protect your data even more
than ext3/reiserfs with the data=journal mount option, provided the write
cache on your ide drives is disabled. Reiser4 is still beta.

Since 2.6.9 the write barrier patches are in the kernel. These should give
better protection against data loss on disks that have write caching enabled.
The only two filesystems supporting write barriers are ext3 with the mount
option "barrier=1" and reiserfs with the mount option "barrier=flush".

-- 
lg, Chris



Re: Question on Reiser4 regarding power failures

2004-11-29 Thread Bernhard Prell

Thank you very much for your feedback so far!

Kerin Millar wrote:
> For this reason, and because I believe that the stability of reiserfs
> was improved drastically in later revisions of the 2.4 kernel, I would
> urge that you consider using a modern 2.4 kernel and the latest reiserfs
> tools if possible!

I don't want to describe why we are still using such an outdated system :-) 
but we will move to Gentoo with a current 2.6.x kernel and reiser4 soon. I 
just wanted to know if this will eliminate the problem once and for all - at 
least in theory - because of the new concepts in reiser4 (atomic 
transactions).  

--

Christian Mayrhuber wrote:
> I'd suggest to do the following for pull the plug scenarios on productive
> systems with reiserfs:

> 1) Disable write caching for ide drives with "hdparm -W 0 /dev/hdX"
>    This is the most important thing to do.

Will this really help protect against partially written sectors and the
read errors that result from them? (If a disk loses power while writing a
sector, the CRC check will fail and the disk reports a read error that is not
caused by a real hardware defect.) Changing the write cache strategy just
"moves" the problem "in time". Maybe the probability that something happens is
lower, because the amount of data being written at any given moment is
smaller.
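
To make the failure mode I have in mind concrete, here is a toy model in
Python. It is only a sketch under assumptions: real drives use ECC inside the
firmware, and zlib.crc32 is just a stand-in for that; the function names are
made up for illustration. It shows a half-written sector failing its check on
read, and a full rewrite (as "dd if=/dev/zero" does) making it readable again.

```python
import zlib

SECTOR_SIZE = 512

def write_sector(payload):
    """Store a full sector together with a checksum (stand-in for drive ECC)."""
    assert len(payload) == SECTOR_SIZE
    return {"data": payload, "crc": zlib.crc32(payload)}

def torn_write(old, new_payload, bytes_written):
    """Model power loss mid-write: new head, stale tail, stale checksum."""
    data = new_payload[:bytes_written] + old["data"][bytes_written:]
    return {"data": data, "crc": old["crc"]}

def read_sector(sector):
    """Return the data, or fail like a drive reporting a read error."""
    if zlib.crc32(sector["data"]) != sector["crc"]:
        raise IOError("read error: checksum mismatch")
    return sector["data"]

old = write_sector(b"A" * SECTOR_SIZE)
torn = torn_write(old, b"B" * SECTOR_SIZE, bytes_written=100)

try:
    read_sector(torn)
    torn_readable = True
except IOError:
    torn_readable = False

# Rewriting the whole sector recomputes the checksum, so the "defect" is gone.
rewritten = write_sector(b"\x00" * SECTOR_SIZE)
rewritten_readable = read_sector(rewritten) == b"\x00" * SECTOR_SIZE

print(torn_readable, rewritten_readable)
```

Whether real drives can actually end up in the torn state is exactly what I
am asking; the model only assumes that they can.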

> 4) If you want to prevent data corruption and not just filesystem
>    corruption, use a recent 2.6.x kernel which incoporates Chris Mason's
>    data logging patches.
>    Alternatively you could use a newer SuSE which has the data logging
>    patches in the 2.4 kernel series, too. Maybe the installation of a
>    a newer 2.4.x kernel rpm from ftp.suse.com will work for you.

> As reiser4 is atomic in it's operations it should protect your data even more
> than ext3/reiserfs with the data=journal mount option, unless you don't have
> write cache for ide drives enabled. Reiser4 is still beta.

I am not concerned about the current beta quality; we will migrate early next
year, and someday the bugs will be fixed. In my experience, updating a SuSE
system without reinstalling is quite a pain; it will be easier to keep a
reasonably current system with Gentoo.

> Since 2.6.9 the write barrier patches are in the kernel. These should
> protect better from data loss with disks that have write caching enabled.
> The only two filesystems supporting write barrier are ext3 with the mount
> option "barrier=1" and reiserfs with the mount option "barrier=flush".

Our problem is not loss of (user) data. If someone pulls the plug before all 
unsaved data is written or before the write cache is flushed - I don't care. 
But it would be nice if the system would boot again... and all system files 
are sane. As I understand it, reiser4 recognizes if a certain transaction was 
completed successfully. But will it fix a partially written sector?
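
My understanding of "recognizing a completed transaction" can be sketched as
a generic journal replay. This is an assumption-laden illustration, not
reiser4's actual on-disk mechanism: the record layout and the replay function
are invented for the example. Only transactions whose commit marker reached
the journal are applied; anything else is dropped.

```python
def replay(journal, disk):
    """Apply only transactions whose commit marker reached the journal."""
    pending = {}
    for record in journal:
        kind, txid = record[0], record[1]
        if kind == "write":
            _, _, block, value = record
            pending.setdefault(txid, []).append((block, value))
        elif kind == "commit":
            for block, value in pending.pop(txid, []):
                disk[block] = value
    # Whatever is still in `pending` was never committed and is ignored.
    return disk

journal = [
    ("write", 1, "inode-7", "v2"),
    ("write", 1, "bitmap", "v2"),
    ("commit", 1),
    ("write", 2, "inode-9", "v2"),  # power lost before ("commit", 2)
]
disk = replay(journal, {"inode-7": "v1", "bitmap": "v1", "inode-9": "v1"})
print(disk)
```

This shows the all-or-nothing part only. It says nothing about what happens
if a journal block itself is unreadable, which is the part I am unsure about.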

Bye,

Bernhard



Re: Question on Reiser4 regarding power failures

2004-11-29 Thread Christian Mayrhuber
On Monday 29 November 2004 10:25, Bernhard Prell wrote:
> Christian Mayrhuber wrote:
> > I'd suggest to do the following for pull the plug scenarios on productive
> > systems with reiserfs:
> 
> > 1) Disable write caching for ide drives with "hdparm -W 0 /dev/hdX"
> >    This is the most important thing to do.
> 
> Will this really help to protect against partially written sectors and from
> there resulting read-errors (If a disk loses power while writing a sector the
> CRC-Check will fail and the disk reports an read-error that's not caused by a
> real hardware defect)? Changing the write cache strategy just "moves" the
> problem "in time" - maybe the propability that something happens is lower,
> because the amount of data that gets written at a certain point of time is
> smaller.

I guess it's all about probability. If you can reduce the risk of a
non-bootable system from, say, 50% to 5%, it'll often save your day.

If the harddisk is not able to finish writing a sector during a power failure 
I guess nothing can help you. Maybe some reiserfs guy (lady?) knows how
nasty the behavior of ide drives during a power failure still is if the
write cache is already disabled.

My experience with disabled write caches is very positive. I haven't had a
corrupted, unbootable reiserfs since I disabled the write cache on the
hard disk. I must admit I'm not running kernels that old, though.
The 2.4.7 kernel is a rather old beast with reiserfs bugs. You should
be using a newer one.

I did some tests with kernel 2.6.9, the "barrier=flush" mount option and an
enabled write cache. I switched the power off while cp'ing a kernel tree, and
the system survived every time.
I'll stick to "barrier=flush" as it seems to be safe enough for me.

-- 
lg, Chris


Re: Question on Reiser4 regarding power failures

2004-11-29 Thread Dieter Nützel
On Monday, 29 November 2004 10:25, Bernhard Prell wrote:
> Thank you very much for your feedback so far!
>
> Kerin Millar wrote:
> > For this reason, and because I believe that the stability of reiserfs
> > was improved drastically in later revisions of the 2.4 kernel, I would
> > urge that you consider using a modern 2.4 kernel and the latest reiserfs
> > tools if possible!
>
> I don't want to describe why we are still using such an outdated system :-)
> but we will move to Gentoo with a current 2.6.x kernel and reiser4 soon. I
> just wanted to know if this will eliminate the problem once and for all -
> at least in theory - because of the new concepts in reiser4 (atomic
> transactions).
>
> --
>
> Christian Mayrhuber wrote:
> > I'd suggest to do the following for pull the plug scenarios on productive
> > systems with reiserfs:
> >
> > 1) Disable write caching for ide drives with "hdparm -W 0 /dev/hdX"
> >    This is the most important thing to do.
>
> Will this really help to protect against partially written sectors and from
> there resulting read-errors (If a disk loses power while writing a sector
> the CRC-Check will fail and the disk reports an read-error that's not
> caused by a real hardware defect)? Changing the write cache strategy just
> "moves" the problem "in time" - maybe the propability that something
> happens is lower, because the amount of data that gets written at a certain
> point of time is smaller.
>
> > 4) If you want to prevent data corruption and not just filesystem
> >    corruption, use a recent 2.6.x kernel which incoporates Chris Mason's
> >    data logging patches.
> >    Alternatively you could use a newer SuSE which has the data logging
> >    patches in the 2.4 kernel series, too. Maybe the installation of a
> >    a newer 2.4.x kernel rpm from ftp.suse.com will work for you.
> >
> > As reiser4 is atomic in it's operations it should protect your data even
> > more than ext3/reiserfs with the data=journal mount option, unless you
> > don't have write cache for ide drives enabled. Reiser4 is still beta.
>
> I am not concerned about the current beta quality, we will migrate early
> next year and someday the bugs will be fixed. Updating a SuSE system is
> quite a pain I experienced (without reinstalling), it will be easier with
> Gentoo to keep a reasonable current system.

So why don't you use a current 2.4 kernel with Chris Mason's (SuSE) latest
ReiserFS 3.6.xx patches? Or maybe a newer SuSE 2.4 kernel (with Chris's patches)?

ftp://ftp.suse.com/pub/people/mason/patches/data-logging
2.4.25 is "current" for data=ordered|journal

Greetings,
Dieter


Re: Question on Reiser4 regarding power failures

2004-11-29 Thread Christian Mayrhuber
On Monday 29 November 2004 14:31, Dieter Nützel wrote:

> ftp://ftp.suse.com/pub/people/mason/patches/data-logging
> 2.4.25 is "current" for data=ordered|journal
> 
> Greetings,
>  Dieter
> 
I had these patches running on 2.4.26 and 2.4.27 kernels. Both worked
fine. I don't know how these patches behave with 2.4.28 because I have
completely switched over to the 2.6.x kernel series.

-- 
lg, Chris


Re: Question on Reiser4 regarding power failures

2004-11-29 Thread Matt Stegman

> Will this really help to protect against partially written sectors and from
> there resulting read-errors (If a disk loses power while writing a sector the
> CRC-Check will fail and the disk reports an read-error that's not caused by a
> real hardware defect)? Changing the write cache strategy just "moves" the
> problem "in time" - maybe the propability that something happens is lower,
> because the amount of data that gets written at a certain point of time is
> smaller.

Aren't sector writes (512-byte sectors) on hard drives atomic?  I believe
they are: either all 512 bytes are written or none are.  The problem with
write caching is that when the kernel asks the drive to write sectors,
those sectors end up in the write cache instead of going to disk, but
the drive reports them as written.  This is fast, but if power is
lost before the cache is written to disk, you've just lost that data
(up to the size of your drive's cache, anywhere from 256kB or less up to
8MB on late model drives), and the write cache makes no ordering
guarantees at all.  This pretty much negates the safety of a journalling
filesystem, since it relies on knowing whether certain data was written to
the disk or not.
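
The ordering problem can be illustrated with a toy model in Python. It is a
sketch under simplifying assumptions, not a description of any real drive or
of reiserfs internals: the block names and both functions are invented, the
uncached drive is assumed to persist writes strictly in issue order, and the
cached drive (with no barriers) is assumed to flush any subset in any order.

```python
import itertools

def crash_states(writes, cache_enabled):
    """Yield every set of blocks that could be on the platter at power loss.

    writes: block names in the order the filesystem issued them.
    Cache off -> only prefixes of that order are possible.
    Cache on  -> any subset is possible, since ordering is not guaranteed.
    """
    if not cache_enabled:
        for n in range(len(writes) + 1):
            yield frozenset(writes[:n])
    else:
        for n in range(len(writes) + 1):
            for subset in itertools.combinations(writes, n):
                yield frozenset(subset)

def journal_is_consistent(on_disk):
    """A commit record can only be trusted if every journal block made it."""
    if "commit" not in on_disk:
        return True  # the transaction is simply ignored at replay
    return {"journal-1", "journal-2"} <= on_disk

writes = ["journal-1", "journal-2", "commit"]
bad_without_cache = [s for s in crash_states(writes, False)
                     if not journal_is_consistent(s)]
bad_with_cache = [s for s in crash_states(writes, True)
                  if not journal_is_consistent(s)]

print(len(bad_without_cache), len(bad_with_cache))
```

In this model the uncached drive never leaves a commit record without its
journal blocks, while the cached drive has three crash states where the
commit record is on disk but part of the transaction is not.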

-- 
Matt Stegman