Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-14 Thread Greg 'groggy' Lehey
On Friday, 14 March 2003 at 10:05:28 +0200, Vallo Kallaste wrote:
> On Fri, Mar 14, 2003 at 01:16:02PM +1030, Greg 'groggy' Lehey
> <[EMAIL PROTECTED]> wrote:
>
>>> So I did. I borrowed two SCSI disks and a 50-pin cable. Things haven't
>>> improved a bit, I'm very sorry to say.
>>
>> Sorry for the slow reply to this.  I thought it would make sense to
>> try things out here, and so I kept trying to find time, but I have to
>> admit I just won't have it for a while.  I haven't forgotten, and
>> I hope that in a few weeks' time I can spend some time chasing down a
>> whole lot of Vinum issues.  This is definitely the worst I have seen,
>> and I'm really puzzled why it always happens to you.
>>
>>> # simulate disk crash by forcing one arbitrary subdisk down
>>> # seems that vinum doesn't return values for command completion status
>>> # checking?
>>> echo "Stopping subdisk.. degraded mode"
>>> vinum stop -f r5.p0.s3  # assume it was successful
>>
>> I wonder if there's something relating to stop -f that doesn't happen
>> during a normal failure.  But this was exactly the way I tested it in
>> the first place.
>
> Thank you, Greg, I really appreciate your ongoing effort to make
> vinum a stable, trusted volume manager.
> I have to add some facts to the mix. RAIDframe on the same hardware
> does not have any problems. The later tests I conducted were done
> under -stable, because I couldn't get RAIDframe to work under
> -current: the system panicked every time at the end of parity
> initialisation (raidctl -iv raid?). So I used the RAIDframe patch for
> -stable at
> http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
> I had to do some patching by hand, but otherwise it works well.

I don't think that problems with RAIDFrame are related to these
problems with Vinum.  I seem to remember a commit to the head branch
recently (in the last 12 months) relating to the problem you've seen.
I forget exactly where it went (it wasn't from me), and in cursory
searching I couldn't find it.  It's possible that it hasn't been
MFC'd, which would explain your problem.  If you have a 5.0 machine,
it would be interesting to see if you can reproduce it there.

> Will it suffice to switch off power to one disk to simulate a "more"
> real-world disk failure? Are there any hidden pitfalls in failing
> and restoring non-hotswap disks?

I don't think so.  It was more thinking aloud than anything else.  As
I said above, this is the way I tested things in the first place.

Greg
--
See complete headers for address and phone numbers




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-14 Thread Vallo Kallaste
On Fri, Mar 14, 2003 at 01:16:02PM +1030, Greg 'groggy' Lehey
<[EMAIL PROTECTED]> wrote:

> > So I did. I borrowed two SCSI disks and a 50-pin cable. Things haven't
> > improved a bit, I'm very sorry to say.
> 
> Sorry for the slow reply to this.  I thought it would make sense to
> try things out here, and so I kept trying to find time, but I have to
> admit I just won't have it for a while.  I haven't forgotten, and
> I hope that in a few weeks' time I can spend some time chasing down a
> whole lot of Vinum issues.  This is definitely the worst I have seen,
> and I'm really puzzled why it always happens to you.
> 
> > # simulate disk crash by forcing one arbitrary subdisk down
> > # seems that vinum doesn't return values for command completion status
> > # checking?
> > echo "Stopping subdisk.. degraded mode"
> > vinum stop -f r5.p0.s3  # assume it was successful
> 
> I wonder if there's something relating to stop -f that doesn't happen
> during a normal failure.  But this was exactly the way I tested it in
> the first place.

Thank you, Greg, I really appreciate your ongoing effort to make
vinum a stable, trusted volume manager.
I have to add some facts to the mix. RAIDframe on the same hardware
does not have any problems. The later tests I conducted were done
under -stable, because I couldn't get RAIDframe to work under
-current: the system panicked every time at the end of parity
initialisation (raidctl -iv raid?). So I used the RAIDframe patch for
-stable at
http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
I had to do some patching by hand, but otherwise it works well.
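
For anyone repeating this, a minimal sketch of fetching and applying
that patch (the patch -p level and the source tree location are
assumptions, not from the original report; "patching by hand" means
merging the rejected hunks):

  cd /usr/src
  fetch http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
  # hunks that fail to apply leave *.rej files to merge by hand
  gunzip -c 2001-08-28-RAIDframe-stable.diff.gz | patch -p0
  find . -name '*.rej'    # the hunks that still need manual attention
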
Will it suffice to switch off power to one disk to simulate a "more"
real-world disk failure? Are there any hidden pitfalls in failing
and restoring non-hotswap disks?
-- 

Vallo Kallaste



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-13 Thread Greg 'groggy' Lehey
On Saturday,  1 March 2003 at 20:43:10 +0200, Vallo Kallaste wrote:
> On Thu, Feb 27, 2003 at 11:53:02AM +0200, Vallo Kallaste  wrote:
>
>>>> The vinum R5 and the system as a whole were stable without
>>>> softupdates. Only one problem remained after disabling softupdates:
>>>> while the volume was online with user I/O going on, rebuilding a
>>>> failed disk corrupted the R5 volume completely.
>>>
>>> Yes, we've fixed a bug in that area.  It had nothing to do with soft
>>> updates, though.
>>
>> Oh, that's very good news, thank you! Yes, it had nothing to do with
>> soft updates at all and that's why I had the "remained after" in the
>> sentence.
>>
>>>> I don't know whether it is fixed or not, as I don't have the
>>>> necessary hardware at the moment. The only way around it was to
>>>> quiesce the volume before rebuilding: umount it, and wait until the
>>>> rebuild finished. I'd suggest an extensive testing cycle for everyone
>>>> who's going to work with vinum R5. Concat, striping and mirroring
>>>> have been a breeze, but not so with R5.
>>>
>>> IIRC the rebuild bug bit any striped configuration.
>>
>> Ok, I definitely had problems only with R5, but you certainly know
>> much better what it was exactly. I'll need to borrow a 50-pin SCSI
>> cable and test vinum again. Will it matter which version of FreeBSD
>> I try it on? My home system runs -current of Feb 5, but if you
>> suggest -stable for consistent results, I'll do that.
>
> So I did. I borrowed two SCSI disks and a 50-pin cable. Things haven't
> improved a bit, I'm very sorry to say.

Sorry for the slow reply to this.  I thought it would make sense to
try things out here, and so I kept trying to find time, but I have to
admit I just won't have it for a while.  I haven't forgotten, and
I hope that in a few weeks' time I can spend some time chasing down a
whole lot of Vinum issues.  This is definitely the worst I have seen,
and I'm really puzzled why it always happens to you.

> # simulate disk crash by forcing one arbitrary subdisk down
> # seems that vinum doesn't return values for command completion status
> # checking?
> echo "Stopping subdisk.. degraded mode"
> vinum stop -f r5.p0.s3  # assume it was successful

I wonder if there's something relating to stop -f that doesn't happen
during a normal failure.  But this was exactly the way I tested it in
the first place.
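
Since vinum's exit status apparently can't be checked here, one
workaround in a test script is to re-read the object state after
issuing the command. A minimal sketch; the exact wording of the list
output ("State: down") is an assumption from memory and may need
adjusting:

  vinum stop -f r5.p0.s3
  # don't trust the exit status; ask vinum for the subdisk state instead
  if vinum ls r5.p0.s3 | grep -q 'State: down'; then
      echo "subdisk down, volume now degraded"
  else
      echo "stop apparently did not take effect" >&2
  fi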

Greg
--
See complete headers for address and phone numbers




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-01 Thread Vallo Kallaste
On Thu, Feb 27, 2003 at 11:53:02AM +0200, Vallo Kallaste  wrote:

> > > The vinum R5 and the system as a whole were stable without
> > > softupdates. Only one problem remained after disabling softupdates:
> > > while the volume was online with user I/O going on, rebuilding a
> > > failed disk corrupted the R5 volume completely.
> > 
> > Yes, we've fixed a bug in that area.  It had nothing to do with soft
> > updates, though.
> 
> Oh, that's very good news, thank you! Yes, it had nothing to do with
> soft updates at all and that's why I had the "remained after" in the
> sentence.
> 
> > > I don't know whether it is fixed or not, as I don't have the
> > > necessary hardware at the moment. The only way around it was to
> > > quiesce the volume before rebuilding: umount it, and wait until the
> > > rebuild finished. I'd suggest an extensive testing cycle for everyone
> > > who's going to work with vinum R5. Concat, striping and mirroring
> > > have been a breeze, but not so with R5.
> > 
> > IIRC the rebuild bug bit any striped configuration.
> 
> Ok, I definitely had problems only with R5, but you certainly know
> much better what it was exactly. I'll need to borrow a 50-pin SCSI
> cable and test vinum again. Will it matter which version of FreeBSD
> I try it on? My home system runs -current of Feb 5, but if you
> suggest -stable for consistent results, I'll do that.

So I did. I borrowed two SCSI disks and a 50-pin cable. Things haven't
improved a bit, I'm very sorry to say.
The entire test session (script below) was done in single-user mode. To
be fair, I did tens of them, and the mode doesn't matter.
Complete script:

Script started on Sat Mar  1 19:54:45 2003
# pwd
/root
# dmesg
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Sun Feb  2 16:16:49 EET 2003
[EMAIL PROTECTED]:/usr/home/vallo/Kevad-5.0
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0516000.
Preloaded elf module "/boot/kernel/vinum.ko" at 0xc05160b4.
Preloaded elf module "/boot/kernel/ahc_pci.ko" at 0xc0516160.
Preloaded elf module "/boot/kernel/ahc.ko" at 0xc051620c.
Preloaded elf module "/boot/kernel/cam.ko" at 0xc05162b4.
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 132955356 Hz
CPU: Pentium/P54C (132.96-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x526  Stepping = 6
  Features=0x1bf
real memory  = 67108864 (64 MB)
avail memory = 59682816 (56 MB)
Intel Pentium detected, installing workaround for F00F bug
Initializing GEOMetry subsystem
VESA: v2.0, 4096k memory, flags:0x0, mode table:0xc037dec2 (122)
VESA: ATI MACH64
npx0:  on motherboard
npx0: INT 16 interface
pcib0:  at pcibus 0 on motherboard
pci0:  on pcib0
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0xff90-0xff9f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ahc0:  port 0xf800-0xf8ff mem 0xffbee000-0xffbeefff 
irq 10 at device 13.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
pci0:  at device 14.0 (no driver attached)
atapci1:  port 
0xff00-0xff3f,0xffe0-0xffe3,0xffa8-0xffaf,0xffe4-0xffe7,0xfff0-0xfff7 mem 
0xffbc-0xffbd irq 11 at device 15.0 on pci0
ata2: at 0xfff0 on atapci1
ata3: at 0xffa8 on atapci1
orm0:  at iomem 
0xed000-0xedfff,0xca000-0xca7ff,0xc8000-0xc9fff,0xc-0xc7fff on isa0
atkbdc0:  at port 0x64,0x60 on isa0
atkbd0:  flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
ed0 at port 0x300-0x31f iomem 0xd8000 irq 5 on isa0
ed0: address 00:80:c8:37:e2:a6, type NE2000 (16 bit) 
fdc0:  at port 
0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0:  at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
sc0:  at flags 0x100 on isa0
sc0: VGA <5 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
unknown:  can't assign resources (port)
Timecounters tick every 1.000 msec
ata0-slave: ATAPI identify retries exceeded
ad4: 2445MB  [5300/15/63] at ata2-master UDMA33
ad6: 2423MB  [4924/16/63] at ata3-master UDMA33
acd0: CDROM  at ata0-master PIO3
Waiting 15 seconds for SCSI devices to settle
da0 at ahc0 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-2 device 
da0: 10.000MB/s transfers (10.000MHz, offset 15), Tagged Queueing Enabled
da0: 8682MB (17781520 512 byte sectors: 255H 63S/T 1106C)
da1 at ahc0 bus 0 target 1 lun 0
da1:  Fixed Direct Access SCSI-2 device 
da1: 10.000MB/s transfers (10.000MHz, offset 8), Tagged Queueing Enabled
da1: 2033MB (4165272 512 byte sectors: 255H 6

Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-27 Thread Vallo Kallaste
On Thu, Feb 27, 2003 at 11:59:59AM +1030, Greg 'groggy' Lehey
<[EMAIL PROTECTED]> wrote:

> > The crashes and anomalies with the filesystem residing on the R5 volume
> > were related to the vinum(R5)/softupdates combination.
> 
> Well, at one point we suspected that.  But the cases I have seen were
> based on a misassumption.  Do you have any concrete evidence that
> points to that particular combination?

I don't have any other evidence than the case I was describing. After
changing employers I haven't had much time or motivation to try
again.

> The vinum R5 and the system as a whole were stable without
> softupdates. Only one problem remained after disabling softupdates:
> while the volume was online with user I/O going on, rebuilding a
> failed disk corrupted the R5 volume completely.
> 
> Yes, we've fixed a bug in that area.  It had nothing to do with soft
> updates, though.

Oh, that's very good news, thank you! Yes, it had nothing to do with
soft updates at all and that's why I had the "remained after" in the
sentence.

> I don't know whether it is fixed or not, as I don't have the
> necessary hardware at the moment. The only way around it was to
> quiesce the volume before rebuilding: umount it, and wait until the
> rebuild finished. I'd suggest an extensive testing cycle for everyone
> who's going to work with vinum R5. Concat, striping and mirroring
> have been a breeze, but not so with R5.
> 
> IIRC the rebuild bug bit any striped configuration.

Ok, I definitely had problems only with R5, but you certainly know
much better what it was exactly. I'll need to borrow a 50-pin SCSI
cable and test vinum again. Will it matter which version of FreeBSD
I try it on? My home system runs -current of Feb 5, but if you
suggest -stable for consistent results, I'll do that.

Thanks
-- 
Vallo Kallaste



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-26 Thread Greg 'groggy' Lehey
On Friday, 21 February 2003 at  1:56:56 -0800, Terry Lambert wrote:
> Vallo Kallaste wrote:
>> The crashes and anomalies with the filesystem residing on the R5 volume
>> were related to the vinum(R5)/softupdates combination. The vinum R5 and
>> the system as a whole were stable without softupdates. Only one problem
>> remained after disabling softupdates: while the volume was online with
>> user I/O going on, rebuilding a failed disk corrupted the R5 volume
>> completely. I don't know whether it is fixed or not, as I don't have
>> the necessary hardware at the moment. The only way around it was to
>> quiesce the volume before rebuilding: umount it, and wait until the
>> rebuild finished. I'd suggest an extensive testing cycle for everyone
>> who's going to work with vinum R5. Concat, striping and mirroring have
>> been a breeze, but not so with R5.
>
> I think this is an expected problem with a lot of concatenation,
> whether through Vinum, GEOM, RAIDFrame, or whatever.

Can you be more specific?  What you say below doesn't address any
basic difference between virtual and real disks.

> This comes about for the same reason that you can't "mount -u"
> to turn Soft Updates from "off" to "on": Soft Updates does not
> tolerate dirty buffers for which a dependency does not exist, and
> will crap out when a pending dirty buffer causes a write.

I don't understand what this has to do with virtual disks.

> This could be fixed in the "mount -u" case for Soft Updates, and it
> can also be fixed for Vinum (et. al.).
>
> The key is the difference between a "mount -u" vs. a "umount ; mount",
> which comes down to flushing and invalidating all buffers on the
> underlying device, e.g.:
>
>   vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, p);
>   vinvalbuf(devvp, V_SAVE, NOCRED, p, 0, 0);
>   error = VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
>   error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
>   VOP_UNLOCK(devvp, 0, p);
>
> ... Basically, after rebuilding, before allowing the mount to proceed,
> the Vinum (and GEOM and RAIDFRame, etc.) code needs to cause all the
> pending dirty buffers to be written.  This will guarantee that there
> are no outstanding dirty buffers at mount time, which in turn guarantees
> that there will be no dirty buffers that the dependency tracking in
> Soft Updates does not know about.

I don't understand what you're assuming here.  Certainly I can't see
any relevance to Vinum, RAIDframe or any other virtual disk system.

Greg
--
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com.
See http://www.lemis.com/yahoospam.html for further details.




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-26 Thread Greg 'groggy' Lehey
On Friday, 21 February 2003 at 10:00:46 +0200, Vallo Kallaste wrote:
> On Thu, Feb 20, 2003 at 02:28:45PM -0800, Darryl Okahata
> <[EMAIL PROTECTED]> wrote:
>
>> Vallo Kallaste <[EMAIL PROTECTED]> wrote:
>>
>>> I'll second Brad's statement about vinum and softupdates
>>> interactions. My last experiments with vinum were more than half a
>>> year ago, but I guess it still holds. BTW, the interactions showed
>>> up _only_ on R5 volumes. I had a 6-disk (SCSI) R5 volume in a Compaq
>>> Proliant 3000 and the system was very stable before I enabled
>>> softupdates... and of course after I disabled softupdates. In between
>>> there were crashes and nasty problems with the filesystem. Unfortunately
>>> it was a production system and I didn't have a chance to play.
>>
>>  Did you believe that the crashes were caused by enabling softupdates on
>> an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
>> I can see how crashes unrelated to vinum/softupdates might trash vinum
>> filesystems.
>
> The crashes and anomalies with the filesystem residing on the R5 volume
> were related to the vinum(R5)/softupdates combination.

Well, at one point we suspected that.  But the cases I have seen were
based on a misassumption.  Do you have any concrete evidence that
points to that particular combination?

> The vinum R5 and the system as a whole were stable without
> softupdates. Only one problem remained after disabling softupdates:
> while the volume was online with user I/O going on, rebuilding a
> failed disk corrupted the R5 volume completely.

Yes, we've fixed a bug in that area.  It had nothing to do with soft
updates, though.

> I don't know whether it is fixed or not, as I don't have the
> necessary hardware at the moment. The only way around it was to
> quiesce the volume before rebuilding: umount it, and wait until the
> rebuild finished. I'd suggest an extensive testing cycle for everyone
> who's going to work with vinum R5. Concat, striping and mirroring
> have been a breeze, but not so with R5.

IIRC the rebuild bug bit any striped configuration.

Greg
--
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com.
See http://www.lemis.com/yahoospam.html for further details.




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-24 Thread Terry Lambert
Darryl Okahata wrote:
> Terry Lambert <[EMAIL PROTECTED]> wrote:
> > I think this is an expected problem with a lot of concatenation,
> > whether through Vinum, GEOM, RAIDFrame, or whatever.
> >
> > This comes about for the same reason that you can't "mount -u"
> > to turn Soft Updates from "off" to "on": Soft Updates does not
> > tolerate dirty buffers for which a dependency does not exist, and
> > will crap out when a pending dirty buffer causes a write.
> 
>  Does this affect background fsck, too (on regular, non-vinum
> filesystems)?  From what little I know of bg fsck, I'm guessing not, but
> I'd like to be sure.  Thanks.

No, it doesn't.  Background fsck works by assuming that the only
thing that could contain bad data is the cylinder group bitmaps,
which means the worst-case failure is that some blocks are not
available for reallocation.  It works by taking a snapshot, a feature
that allows modification of the FS while the bgfsck's idea of the FS
remains unchanged.  Then it goes through the bitmaps, verifying that
the blocks it thinks are allocated are in fact allocated by files
within the snapshot.  Basically, its only job is to clear bits in
the bitmap that represent blocks which no files reference.
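
The same snapshot-based check can be run by hand. A sketch, assuming
a FreeBSD 5.x fsck and an illustrative device and mount point (neither
is from this thread):

  # -B takes a snapshot of the mounted filesystem and checks that,
  # so user I/O can continue while the check runs
  fsck -t ufs -B /dev/da0s1e
  # foreground equivalent, run with the filesystem unmounted
  umount /home && fsck -t ufs /dev/da0s1e && mount /home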

There are situations where bgfsck can fail, sometimes catastrophically,
but they are unrelated to having dirty blocks in memory for which no
updates have been created.

-- Terry



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-24 Thread Darryl Okahata
Terry Lambert <[EMAIL PROTECTED]> wrote:

> I think this is an expected problem with a lot of concatenation,
> whether through Vinum, GEOM, RAIDFrame, or whatever.
> 
> This comes about for the same reason that you can't "mount -u"
> to turn Soft Updates from "off" to "on": Soft Updates does not
> tolerate dirty buffers for which a dependency does not exist, and
> will crap out when a pending dirty buffer causes a write.

 Does this affect background fsck, too (on regular, non-vinum
filesystems)?  From what little I know of bg fsck, I'm guessing not, but
I'd like to be sure.  Thanks.

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-21 Thread Terry Lambert
Vallo Kallaste wrote:
> The crashes and anomalies with the filesystem residing on the R5 volume
> were related to the vinum(R5)/softupdates combination. The vinum R5 and
> the system as a whole were stable without softupdates. Only one problem
> remained after disabling softupdates: while the volume was online with
> user I/O going on, rebuilding a failed disk corrupted the R5 volume
> completely. I don't know whether it is fixed or not, as I don't have
> the necessary hardware at the moment. The only way around it was to
> quiesce the volume before rebuilding: umount it, and wait until the
> rebuild finished. I'd suggest an extensive testing cycle for everyone
> who's going to work with vinum R5. Concat, striping and mirroring have
> been a breeze, but not so with R5.

I think this is an expected problem with a lot of concatenation,
whether through Vinum, GEOM, RAIDFrame, or whatever.

This comes about for the same reason that you can't "mount -u"
to turn Soft Updates from "off" to "on": Soft Updates does not
tolerate dirty buffers for which a dependency does not exist, and
will crap out when a pending dirty buffer causes a write.

This could be fixed in the "mount -u" case for Soft Updates, and it
can also be fixed for Vinum (et al.).

The key is the difference between a "mount -u" vs. a "umount ; mount",
which comes down to flushing and invalidating all buffers on the
underlying device, e.g.:

/* Lock the device vnode so nothing races the flush. */
vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, p);
/* Write out (V_SAVE) and invalidate all buffers on the device. */
vinvalbuf(devvp, V_SAVE, NOCRED, p, 0, 0);
/* Cycle the device closed and open again, as a umount/mount would. */
error = VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
VOP_UNLOCK(devvp, 0, p);

... Basically, after rebuilding, before allowing the mount to proceed,
the Vinum (and GEOM and RAIDframe, etc.) code needs to cause all the
pending dirty buffers to be written.  This will guarantee that there
are no outstanding dirty buffers at mount time, which in turn guarantees
that there will be no dirty buffers that the dependency tracking in
Soft Updates does not know about.

FWIW: I've maintained for over 6 years now that the mount update
code should be modified to do this automatically (and provided
patches; see early 1997 mailing list archives), essentially turning
a "mount -u" into a "umount ; mount", without invalidating outstanding
vnodes and in-core inodes or their references (so that open files do
not break... they just get all their buffers taken away from them).

Of course, the only open files that matter for device layering are the
device exporting the layered block store, and the underlying component
block stores that make it up (i.e. no open files there).


-- Terry




Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-21 Thread Vallo Kallaste
On Thu, Feb 20, 2003 at 02:28:45PM -0800, Darryl Okahata
<[EMAIL PROTECTED]> wrote:

> Vallo Kallaste <[EMAIL PROTECTED]> wrote:
> 
> > I'll second Brad's statement about vinum and softupdates
> > interactions. My last experiments with vinum were more than half a
> > year ago, but I guess it still holds. BTW, the interactions showed
> > up _only_ on R5 volumes. I had a 6-disk (SCSI) R5 volume in a Compaq
> > Proliant 3000 and the system was very stable before I enabled
> > softupdates... and of course after I disabled softupdates. In between
> > there were crashes and nasty problems with the filesystem. Unfortunately
> > it was a production system and I didn't have a chance to play.
> 
>  Did you believe that the crashes were caused by enabling softupdates on
> an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
> I can see how crashes unrelated to vinum/softupdates might trash vinum
> filesystems.

The crashes and anomalies with the filesystem residing on the R5 volume
were related to the vinum(R5)/softupdates combination. The vinum R5 and
the system as a whole were stable without softupdates. Only one problem
remained after disabling softupdates: while the volume was online with
user I/O going on, rebuilding a failed disk corrupted the R5 volume
completely. I don't know whether it is fixed or not, as I don't have
the necessary hardware at the moment. The only way around it was to
quiesce the volume before rebuilding: umount it, and wait until the
rebuild finished (roughly as sketched below). I'd suggest an extensive
testing cycle for everyone who's going to work with vinum R5. Concat,
striping and mirroring have been a breeze, but not so with R5.
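
Spelled out as commands, that workaround looks roughly like this. This
is a sketch reconstructed from the description above: the volume, plex
and subdisk names are examples, and the "State: up" string polled for
in the list output is an assumption:

  umount /raid5                  # quiesce the volume: no user I/O
  vinum start r5.p0.s3           # begin reviving the failed subdisk
  # poll until the subdisk is back up before letting users at it again
  until vinum ls r5.p0.s3 | grep -q 'State: up'; do sleep 60; done
  fsck -t ufs /dev/vinum/r5      # belt and braces before remounting
  mount /raid5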
-- 

Vallo Kallaste
[EMAIL PROTECTED]




Re: background fsck deadlocks with ufs2 and big disk

2003-02-20 Thread Brad Knowles
At 2:28 PM -0800 2003/02/20, Darryl Okahata wrote:

>  Did you believe that the crashes were caused by enabling softupdates on
> an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
> I can see how crashes unrelated to vinum/softupdates might trash vinum
> filesystems.


	Using RAID-5 under vinum was always a somewhat tricky business 
for me, but in many cases I could get it to work reasonably well most 
of the time.  But if I enabled softupdates on that filesystem, I was 
toast.  Softupdates enabled on filesystems that were not on top of 
vinum RAID-5 logical devices seemed to be fine.

	So, the interaction that I personally witnessed was specifically 
between vinum RAID-5 and softupdates.

--
Brad Knowles, <[EMAIL PROTECTED]>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+() DI+() D+(++) G+() e++> h--- r---(+++)* z(+++)



Re: background fsck deadlocks with ufs2 and big disk

2003-02-20 Thread Darryl Okahata
Vallo Kallaste <[EMAIL PROTECTED]> wrote:

> I'll second Brad's statement about vinum and softupdates
> interactions. My last experiments with vinum were more than half a
> year ago, but I guess it still holds. BTW, the interactions showed
> up _only_ on R5 volumes. I had a 6-disk (SCSI) R5 volume in a Compaq
> Proliant 3000 and the system was very stable before I enabled
> softupdates... and of course after I disabled softupdates. In between
> there were crashes and nasty problems with the filesystem. Unfortunately
> it was a production system and I didn't have a chance to play.

 Did you believe that the crashes were caused by enabling softupdates on
an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
I can see how crashes unrelated to vinum/softupdates might trash vinum
filesystems.

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.




Re: background fsck deadlocks with ufs2 and big disk

2003-02-19 Thread Darryl Okahata
Brad Knowles <[EMAIL PROTECTED]> wrote:

>   You know, vinum & softupdates have had bad interactions with each 
> other for as long as I can remember.  Has this truly been a 
> consistent thing (as I seem to recall), or has this been an 
> on-again/off-again situation?

 Ah, yaaah.  Hmm ...

 This is the first I've heard of that, but I can see how that could
be.  Could vinum be considered a form of (unintentional)
write-caching?

 That might explain how the filesystem got terribly hosed, but it
doesn't help with the panic.  Foo.

[ This is on a system that's been running in the current state for
  around a month.  So far, it's panic'd once (a week or so ago), and so
  I don't have any feel for long-term stability.  We'll see how it
  goes.  ]

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.




Re: background fsck deadlocks with ufs2 and big disk

2003-02-19 Thread Brad Knowles
At 9:15 AM -0800 2003/02/19, Darryl Okahata wrote:

> * The UFS1 filesystem in question (and I assume that it was UFS1, as I
>   did not specify a filesystem type to newfs) is located on a RAID5
>   vinum volume, consisting of five 80GB disks.
>
> * Softupdates is enabled.

	You know, vinum & softupdates have had bad interactions with each 
other for as long as I can remember.  Has this truly been a 
consistent thing (as I seem to recall), or has this been an 
on-again/off-again situation?

--
Brad Knowles, <[EMAIL PROTECTED]>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+() DI+() D+(++) G+() e++> h--- r---(+++)* z(+++)



Re: background fsck deadlocks with ufs2 and big disk

2003-02-19 Thread Darryl Okahata
David Schultz <[EMAIL PROTECTED]> wrote:

> IIRC, Kirk was trying to reproduce this a little while ago in
> response to similar reports.  He would probably be interested
> in any new information.

 I don't have any useful information, but I do have a data point:

My 5.0-RELEASE system recently mysteriously panic'd, which
resulted in a partially trashed UFS1 filesystem, which caused bg
fsck to hang.

Details:

* The panic was weird, in that only the first 4-6 characters of the
  first function (in the panic stacktrace) were displayed on the console
  (sorry, forgot what it was).  Nothing else past that point was shown,
  and the console was locked up.  Ddb was compiled into the kernel, but
  ctrl-esc did nothing.

* The UFS1 filesystem in question (and I assume that it was UFS1, as I
  did not specify a filesystem type to newfs) is located on a RAID5
  vinum volume, consisting of five 80GB disks.

* Softupdates is enabled.

* When bg fsck hung (w/no disk activity), I could break into ddb.
  Unfortunately, I don't know how to use ddb, aside from "ps" (a few
  commands worth trying are sketched after this list).

* Disabling bg fsck allowed the system to boot.  However, fg fsck
  failed, and I had to do a manual fsck, which spewed lots of nasty
  "SOFTUPDATE INCONSISTENCY" errors.

* Disturbingly (but fortunately), I then unmounted the filesystem (in
  multi-user mode) and re-ran fsck, and fsck still found errors.  There
  should not have been any errors, as fg fsck just finished running.

  [ Unfortunately, I've forgotten what they were, and an umount/fsck
done right now shows no problems.  I think the errors were one of
the "incorrect block count" errors.  ]

* After the fsck, some files were partially truncated (& corrupted?).
  After investigating, I believe these truncated files (which were NOT
  recently modified) were in a directory in which other files were being
  created/written at the time of the panic.
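
For reference, a few ddb commands beyond "ps" that are usually asked
for when diagnosing a hang like this (on a FreeBSD 5.x kernel with DDB
compiled in; whether each "show" subcommand is present depends on
kernel options):

  db> trace               # stack trace of the current thread
  db> ps                  # what each process is sleeping on (wchan)
  db> show lockedvnods    # vnodes with held locks; useful for fs hangs
  db> panic               # force a panic and crash dump for post-mortem work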

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.




Re: background fsck deadlocks with ufs2 and big disk

2003-02-18 Thread David Schultz
Thus spake Martin Blapp <[EMAIL PROTECTED]>:
> I just wanted to say that I can deadlock one of my -current boxes
> with a ufs2 filesystem on a 120GB ATA disk. I can reproduce the
> problem. The background fsck process hangs for some time, always at
> the same place, and sometimes the box freezes after a while.
> 
> The same box works fine with ufs1.

IIRC, Kirk was trying to reproduce this a little while ago in
response to similar reports.  He would probably be interested
in any new information.




background fsck deadlocks with ufs2 and big disk

2003-02-18 Thread Martin Blapp

Hi all,

I just wanted to say that I can deadlock one of my -current boxes
with a ufs2 filesystem on a 120GB ATA disk. I can reproduce the
problem. The background fsck process hangs for some time, always at
the same place, and sometimes the box freezes after a while.

The same box works fine with ufs1.

Martin

Martin Blapp, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
--
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: 
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
--
