Re: Serious bug in vinum?

2004-03-30 Thread Greg 'groggy' Lehey
On Tuesday, 30 March 2004 at 14:37:00 +0200, Lukas Ertl wrote:
 On Fri, 26 Mar 2004, Joao Carlos Mendes Luis wrote:

 I think this should be like:

 if (plex->state > plex_corrupt) {  /* something accessible, */

 Or, in other words, volume state is up only if plex state is degraded
 or better.

 You are right, this is a bug,

No, see my reply.

 The correct solution, of course, is to check if the data is valid
 before changing the volume state, but that might turn out to be a
 very complex check.

Well, the minimum correct solution is to return an error if somebody
tries to access the inaccessible part of the volume.  That should
happen, and I'm confused that it doesn't appear to be doing so in this
case.
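To illustrate the "minimum correct solution" above, here is a minimal sketch in C, not actual Vinum code.  It assumes a hypothetical model in which the volume is split into fixed-size regions and a `readable` array records whether any plex can still serve each region; the function name `sketch_check_request` and its parameters are invented for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not Vinum code: readable[r] is true if at least one
 * plex can still serve region r of the volume.  Returns 0 if the whole
 * request can be served, EIO if it touches an inaccessible region, and
 * EINVAL if it runs past the end of the volume.
 */
static int
sketch_check_request(const bool *readable, size_t nregions,
    size_t region_size, size_t offset, size_t length)
{
    size_t volume_size = nregions * region_size;

    assert(region_size > 0);
    if (length == 0)
        return 0;                       /* nothing to transfer */
    if (offset > volume_size || length > volume_size - offset)
        return EINVAL;                  /* request leaves the volume */

    size_t first = offset / region_size;
    size_t last = (offset + length - 1) / region_size;

    for (size_t r = first; r <= last; r++)
        if (!readable[r])
            return EIO;                 /* part of the request is gone */
    return 0;
}
```

Under this model a request fails only when it actually overlaps a lost region, so the accessible parts of the volume stay usable.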

On Tuesday, 30 March 2004 at 11:07:55 -0300, João Carlos Mendes Luís wrote:
 Greg 'groggy' Lehey wrote:
 On Tuesday, 30 March 2004 at  0:32:38 -0300, João Carlos Mendes Luís wrote:

 Basically, this is a feature and not a bug.  A plex that is corrupt is
 still partially accessible, so we should allow access to it.  If you
 have two striped plexes both striped between two disks, with the same
 stripe size, and one plex starts on the first drive, and the other on
 the second, and one drive dies, then each plex will lose half of its
 data, every second stripe.  But the volume will be completely
 accessible.
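The two-plex example above can be checked with a small sketch in C.  This is not Vinum code: it assumes a simplified model in which stripe s of a plex that starts on drive `first` lives on drive (first + s) mod ndrives, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not Vinum code: a striped plex over `ndrives` drives
 * that puts its first stripe on drive `first`.  Returns the drive that
 * holds stripe number `stripe` of that plex.
 */
static size_t
sketch_stripe_drive(size_t first, size_t stripe, size_t ndrives)
{
    assert(ndrives > 0);
    return (first + stripe) % ndrives;
}

/* True if this plex can still read the given stripe. */
static bool
sketch_plex_readable(size_t first, size_t stripe,
    const bool *drive_up, size_t ndrives)
{
    return drive_up[sketch_stripe_drive(first, stripe, ndrives)];
}

/* True if at least one plex of the volume can still read the stripe. */
static bool
sketch_volume_readable(const size_t *plex_first, size_t nplexes,
    size_t stripe, const bool *drive_up, size_t ndrives)
{
    for (size_t p = 0; p < nplexes; p++)
        if (sketch_plex_readable(plex_first[p], stripe, drive_up, ndrives))
            return true;
    return false;
}
```

With two drives, plexes starting on drives 0 and 1, and drive 1 marked down, each plex on its own can read only every second stripe, while `sketch_volume_readable` is true for every stripe.  If both plexes start on the same drive, the odd stripes are lost from the volume as well.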

 A good idea if you have both stripe and mirror, to avoid discarding the
 whole disk.  But, IMHO, if some part of the disk is inaccessible, the volume
 should go down, and IFF the operator wants to try recovery, they should use the
 setstate command.  This is the safe state.

setstate is not safe.  It bypasses a lot of consistency checking.

One possibility would be: 

1.  Based on the plex states, check if all of the volume is still
accessible.
2.  If not, take the volume into a flaky state.  
3.  *Somehow* ensure that the volume can't be accessed again as a file
system until it has been remounted.
4.  Refuse to remount the file system without the -f option.

The last two are outside the scope of Vinum, of course.
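Steps 1 and 2 could look roughly like the following sketch in C.  It is not Vinum code: it assumes a hypothetical per-plex map of which regions are still accessible, and the type names, state names and the fixed region count are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NREGIONS 8       /* hypothetical fixed number of regions */

/* Hypothetical volume states, not the real Vinum enumeration. */
enum sketch_volstate { SKETCH_VOL_DOWN, SKETCH_VOL_FLAKY, SKETCH_VOL_UP };

/* Hypothetical plex: which regions of the volume it can still serve. */
struct sketch_plex {
    bool region_ok[SKETCH_NREGIONS];
};

/*
 * Step 1: count the regions that at least one plex can still serve.
 * Step 2: report "flaky" if only part of the volume is covered.
 */
static enum sketch_volstate
sketch_volume_state(const struct sketch_plex *plexes, size_t nplexes)
{
    size_t covered = 0;

    for (size_t r = 0; r < SKETCH_NREGIONS; r++) {
        for (size_t p = 0; p < nplexes; p++) {
            if (plexes[p].region_ok[r]) {
                covered++;
                break;          /* one good plex is enough for this region */
            }
        }
    }
    if (covered == SKETCH_NREGIONS)
        return SKETCH_VOL_UP;
    if (covered == 0)
        return SKETCH_VOL_DOWN;
    return SKETCH_VOL_FLAKY;
}
```

In this model a single plex that has lost every second region yields the flaky state, while two plexes whose losses do not overlap still yield an up volume.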

Discussion?


Re: Serious bug in vinum?

2004-03-30 Thread João Carlos Mendes Luís


Greg 'groggy' Lehey wrote:

On Tuesday, 30 March 2004 at 14:37:00 +0200, Lukas Ertl wrote:

On Fri, 26 Mar 2004, Joao Carlos Mendes Luis wrote:


   I think this should be like:

   if (plex->state > plex_corrupt) {  /* something accessible, */

   Or, in other words, volume state is up only if plex state is degraded
or better.
You are right, this is a bug,

No, see my reply.

I think "maybe" is the best answer here.

The correct solution, of course, is to check if the data is valid
before changing the volume state, but that might turn out to be a
very complex check.


Well, the minimum correct solution is to return an error if somebody
tries to access the inaccessible part of the volume.  That should
happen, and I'm confused that it doesn't appear to be doing so in this
case.
On Tuesday, 30 March 2004 at 11:07:55 -0300, João Carlos Mendes Luís wrote:

Greg 'groggy' Lehey wrote:

On Tuesday, 30 March 2004 at  0:32:38 -0300, João Carlos Mendes Luís wrote:

Basically, this is a feature and not a bug.  A plex that is corrupt is
still partially accessible, so we should allow access to it.  If you
have two striped plexes both striped between two disks, with the same
stripe size, and one plex starts on the first drive, and the other on
the second, and one drive dies, then each plex will lose half of its
data, every second stripe.  But the volume will be completely
accessible.
   A good idea if you have both stripe and mirror, to avoid discarding the
whole disk.  But, IMHO, if some part of the disk is inaccessible, the volume
should go down, and IFF the operator wants to try recovery, they should use the
setstate command.  This is the safe state.

setstate is not safe.  It bypasses a lot of consistency checking.

That's why it should be done only by a human operator, and only after
checking the physical disk.  I use setstate frequently, when I have my wizard
hat on, but I know the consequences of doing that.  If I have someone watching, I
carefully explain to them that they should *not* repeat that.   ;-)

One possibility would be: 

1.  Based on the plex states, check if all of the volume is still
accessible.
2.  If not, take the volume into a flaky state.

This is easy if the volume is composed of a single plex (my case, and the
case of most people who need only a big and unsafe disk), where unsafe means
a disk that is either available or not available, and not half a disk.  At least for me.

If the volume has more than one plex, then you could think of an algorithm
that exploits this redundancy.

But, IMO, a disk with half of it unavailable is hardly an up and ok one.

Also note that, instead of turning the whole subdisk stale when a single
I/O fails, the error could be passed up to the layer above.  But, also, this
only works with single plex stripe or concat configurations.


3.  *Somehow* ensure that the volume can't be accessed again as a file
system until it has been remounted.
4.  Refuse to remount the file system without the -f option.
The last two are outside the scope of Vinum, of course.

And again this violates the layering approach.  I thought newfs -v had been enough...

The first time I used vinum I was happily thinking that I would combine 4
whole disks (except for boot and swap partitions, of course) into a new
pseudo disk, which I would then disklabel again and repartition for its expected
use.  Say, for example, that I want to have /var and /usr on different
partitions, but I want both with mirroring.  With real world vinum I need to
create 2 vinum partitions on the real disks, and have 2 vinum volumes.

AFAIK, -current and GEOM fix this, right?  My last experience with
RaidFrame was one of panics, right from disk creation.  But I must confess I did
not try that hard, since vinum and -stable were working for me.  I have not been a
-current hacker for a long time now.

Greg, I like vinum, and I have used it since its release in FreeBSD.  Before that
I used ccd(4).  When 5.x is stable, I will use GEOM, vinum or RaidFrame.
But I really think *ix is great for its reusability, recursivity and modularity,
and vinum breaks this.  If vinum creates a virtual disk, it should behave like a
real disk.

Jonny

--
João Carlos Mendes Luís - Networking Engineer - [EMAIL PROTECTED]