Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-19 Thread Anthony Naggs


In article <[EMAIL PROTECTED]>, Sergey Babkin
<[EMAIL PROTECTED]> writes
>
>By the way the journaling filesystems don't necessarily guarantee that
>you won't need fsck: for example, if VXFS crashes at a particularly
>bad moment, it will require you to do "fsck -o full" which is as slow
>as the fsck on traditional UFS.

JFS still scores over traditional Unix file systems on large volumes
(e.g. terabytes), as it requires only a very small amount of virtual
memory during a full fsck.


ttfn,
Tony

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-18 Thread Robert Watson

On Tue, 18 Dec 2001, Brandon D. Valentine wrote:

> On Tue, 18 Dec 2001, Mike Bristow wrote:
> 
> >I suspect that the background fsck[1] that's available in FreeBSD-current
> >fits the bill just as well as JFS or XFS - and I'll also bet that it'll
> >be available in a FreeBSD-release before I'd trust data to a port of
> >JFS or XFS.
> 
> This is a killer feature.  Has anyone decided whether snapshots and
> background fsck will ever be backported to the RELENG_4 branch or are
> they destined for 5.0? 

In a word (or two): highly unlikely.  This code has been considered
experimental for a while now, and I expect that it will remain so.  While
its stability has been gradually improving (it no longer toasts your system
when you send a kill signal to fsck_ffs in the background), a number of
usability factors are still being addressed.  Kirk recently committed
several performance improvements that (apparently) result in a far more
usable system during the background fsck.  Previously, my system was
available, but largely unusable, during the background fsck.  This code
relies on the FFS snapshot feature, which is also not as widely tested,
and has some compatibility concerns.  If fsck support for snapshots hasn't
yet been MFC'd to -STABLE, we may want to consider doing so; last time I
checked, if RELENG_4's fsck found a snapshot, it would rather sadly remove
it, with some unhappiness.  As such, I'd probably resist efforts to MFC
this code, and just go for inclusion in 5.0-RELEASE.  We'll need to give
it a lot of testing, however. :-)

Robert N M Watson FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]  NAI Labs, Safeport Network Services

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-18 Thread Terry Lambert

Alfred Perlstein wrote:
> > By the way the journaling filesystems don't necessarily guarantee that
> > you won't need fsck: for example, if VXFS crashes at a particularly
> > bad moment, it will require you to do "fsck -o full" which is as slow
> > as the fsck on traditional UFS.
> 
> Yeah, but that's not mentioned in the whitepaper! :)

Your insane humor quotient is very high today...

Actually, this is mentioned in the white papers of all journalling
FSs, but it is generally glossed over by relying on application-specific
hardware that is missing on PCs: hardware that records the cause of the
failure across a reboot, and throws a chock in front of the wheels
before a bad write on a power failure... something IDE drives fail to
do, but SCSI drives manage (or managed, until recently).

Of course, you can't just use PC CMOS for this because of the lack
of DC hold up time and AC fail notification in standard PC power
supplies.

You owe the Oracle your firstborn child, and, because of the GPL,
anyone who marries your first born child owes the Oracle their first
born child, and so on, recursively and eternally, forever after.

-- Terry

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-18 Thread Alfred Perlstein


* Sergey Babkin <[EMAIL PROTECTED]> [011218 19:45] wrote:
> Dan Nelson wrote:
> > 
> > In the last episode (Dec 18), Mike Bristow said:
> > > I suspect that the background fsck[1] that's available in FreeBSD-current
> > > fits the bill just as well as JFS or XFS - and I'll also bet that it'll
> > > be available in a FreeBSD-release before I'd trust data to a port of
> > > JFS or XFS.
> > 
> > The problem with a background fsck is that you still have to run fsck,
> > which can take 10 minutes on a large volume when it's idle, and who
> 
> By the way the journaling filesystems don't necessarily guarantee that
> you won't need fsck: for example, if VXFS crashes at a particularly
> bad moment, it will require you to do "fsck -o full" which is as slow
> as the fsck on traditional UFS.

Yeah, but that's not mentioned in the whitepaper! :)

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using "1970s technology,"
 start asking why software is ignoring 30 years of accumulated wisdom.'
   http://www.morons.org/rants/gpl-harmful.php3

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-18 Thread Sergey Babkin


Dan Nelson wrote:
> 
> In the last episode (Dec 18), Mike Bristow said:
> > I suspect that the background fsck[1] that's available in FreeBSD-current
> > fits the bill just as well as JFS or XFS - and I'll also bet that it'll
> > be available in a FreeBSD-release before I'd trust data to a port of
> > JFS or XFS.
> 
> The problem with a background fsck is that you still have to run fsck,
> which can take 10 minutes on a large volume when it's idle, and who

By the way the journaling filesystems don't necessarily guarantee that
you won't need fsck: for example, if VXFS crashes at a particularly
bad moment, it will require you to do "fsck -o full" which is as slow
as the fsck on traditional UFS.

-SB

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-18 Thread Dan Nelson


In the last episode (Dec 18), Mike Bristow said:
> I suspect that the background fsck[1] that's available in FreeBSD-current
> fits the bill just as well as JFS or XFS - and I'll also bet that it'll
> be available in a FreeBSD-release before I'd trust data to a port of
> JFS or XFS.

The problem with a background fsck is that you still have to run fsck,
which can take 10 minutes on a large volume when it's idle, and who
knows how long as a background process when the system's up.  It might
not even finish at all if a user starts modifying a large file, causing
the snapshot file that the background fsck is using to grow and fill up
the filesystem.  Unlikely, but possible if your disk is almost full
already.
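
Dan's failure mode can be illustrated with a toy model. This is a minimal sketch, assuming a snapshot that preserves each block the first time it is overwritten (copy-on-write) and takes that copy from the same filesystem's free space; struct toy_fs and toy_write_block() are hypothetical names, not the FFS snapshot code.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16  /* hypothetical filesystem size, in blocks */

/* Toy copy-on-write snapshot: the first write to a block after the
 * snapshot is taken must first preserve the old contents in a block
 * taken from the filesystem's free space. */
struct toy_fs {
    size_t free_blocks;        /* blocks still available */
    bool   copied[NBLOCKS];    /* already preserved in the snapshot? */
};

/* Overwrite block blk (blk < NBLOCKS).
 * Returns 0 on success, -1 if the snapshot cannot grow (disk full). */
static int toy_write_block(struct toy_fs *fs, size_t blk)
{
    if (!fs->copied[blk]) {
        if (fs->free_blocks == 0)
            return -1;         /* the snapshot copy needs space that is gone */
        fs->free_blocks--;     /* the snapshot file grows by one block */
        fs->copied[blk] = true;
    }
    return 0;                  /* later writes to this block cost nothing */
}
```

With three free blocks, overwriting three distinct blocks succeeds and a fourth fails, while rewriting an already-copied block costs nothing. A user modifying a large file touches many distinct blocks, which is why an almost-full disk is the risky case.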

-- 
Dan Nelson
[EMAIL PROTECTED]

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-18 Thread Mike Bristow


On Thu, Dec 13, 2001 at 04:39:58AM -0500, Brandon D. Valentine wrote:
> On Wed, 12 Dec 2001, Matthew Dillon wrote:
> 
> >All I can say is... holy shit!
> 
> Dude, you kick ass.  At work I've been dealing with Linux's crappy NFS
> implementation for years, while FreeBSD has always been pretty damn good
> by comparison.  Linux finally got a decent amount of performance under
> 2.4 (which finally does NFSv3 to hosts other than other Linux boxen),
> but it still can't touch the FreeBSD NFS implementation.  The more
> robust you make it the easier it is for me to argue for deployment of
> more FreeBSD systems in NFS server roles.  The only advantage Linux has
> got right now is XFS, which is admittedly a pretty large advantage on
> multi-terabyte filesystems where fsck is impossible.

I'm guessing that the real requirement here is "when the system
is turned on after an unclean shutdown (e.g., a power failure), it should
be able to export its NFS filesystems quickly".

I suspect that the background fsck[1] that's available in FreeBSD-current
fits the bill just as well as JFS or XFS - and I'll also bet that it'll
be available in a FreeBSD-release before I'd trust data to a port of
JFS or XFS.


[1] If you've missed it, the basic idea is:

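# $all_filesystems and is_a_softupdate_filesystem are stand-ins
for fs in $all_filesystems; do
    if is_a_softupdate_filesystem "$fs"; then
        fsck "$fs" &    # soft updates: check in the background
    else
        fsck "$fs"      # otherwise: check in the foreground
    fi
done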

except it happens in fsck itself, rather than a shell script.

-- 
Mike Bristow, embonpointful, but not managerial, damnit.

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-17 Thread Jordan Hubbard


I'm trying to get the license issue clarified; once that's done, it can
go in /usr/src/tools/regression.

- Jordan

> Jordan Hubbard <[EMAIL PROTECTED]> writes:
> 
> > Guy Harris of NetApp sent me a whole mess-o-changes to it and when I
> > went to forward them to you, I found that I must have been in
> > delete-o-matic mode at some point earlier in my inbox since it was
> > gone.  I've requested that he send them to me again and will forward
> > them to you once I get a copy again.  Whoops!
> 
> Would it be worth making a port for this tool?  It sounds like it's
> too important to get lost in a mailing list archive.  There's a
> precedent set by having /usr/ports/sysutils/crashme.  :-)
> 
> -Dom
> 
> -- 
> | Semantico: creators of major online resources  |
> |   URL: http://www.semantico.com/   |
> |   Tel: +44 (1273) 72   |
> |   Address: 33 Bond St., Brighton, Sussex, BN1 1RD, UK. |


Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-16 Thread Tony


In article <[EMAIL PROTECTED]>, Brandon
D. Valentine <[EMAIL PROTECTED]> writes
>
[snip]
>but it still can't touch the FreeBSD NFS implementation.  The more
>robust you make it the easier it is for me to argue for deployment of
>more FreeBSD systems in NFS server roles.  The only advantage Linux has
>got right now is XFS, which is admittedly a pretty large advantage on
>multi-terabyte filesystems where fsck is impossible.

That is what I wanted to hear, an unambiguous argument that a solid
implementation of JFS would be useful to some user segment.


Tony

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-16 Thread Mike Smith



JFWIW, you can build fsx with minimal or no changes on Windows with David 
Korn's UWIN kit.  All of the other posix-y kits have internal problems 
that will cause spurious failures.

If you want to use Windows boxes as test clients (probably a good idea) 
this is fairly important...

> > I gave out fsx source code at the recent CIFS (SMB) plugfest.  If I make
> > the 2002 Connectathon I'll give it out there too.  I don't test it on
> > Windows so those defines may be in need of repair.  Please send me any
> > patches or cool additions.
> 
> Guy Harris of NetApp sent me a whole mess-o-changes to it and when I
> went to forward them to you, I found that I must have been in
> delete-o-matic mode at some point earlier in my inbox since it was
> gone.  I've requested that he send them to me again and will forward
> them to you once I get a copy again.  Whoops!
> 
> - Jordan

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E



Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-16 Thread Jordan Hubbard


> I gave out fsx source code at the recent CIFS (SMB) plugfest.  If I make
> the 2002 Connectathon I'll give it out there too.  I don't test it on
> Windows so those defines may be in need of repair.  Please send me any
> patches or cool additions.

Guy Harris of NetApp sent me a whole mess-o-changes to it and when I
went to forward them to you, I found that I must have been in
delete-o-matic mode at some point earlier in my inbox since it was
gone.  I've requested that he send them to me again and will forward
them to you once I get a copy again.  Whoops!

- Jordan

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-13 Thread Thomas Zenker


On Thu, Dec 13, 2001 at 01:40:46PM -0800, Matthew Dillon wrote:
> 
> :Matt,
> :
> :what the hell, this seems very close to a problem I have been meaning
> :to report for a week:
> :
> :In a data acquisition application I have a writer process writing to a
> :file-backed, shared, mmapped ring buffer. There can be several reader
> :processes on this ring buffer. Once I killed the writer to resize the
> :ring buffer and forgot about the readers. The writer truncated the
> :database without unlinking it first. This led the readers to run
> :forever, or so it seemed at least.  After attaching with gdb I saw
> :that they were only page faulting, nothing more, forever.
> :
> :I saw something similar with netscape going mad.
> :
> :cheers, Thomas
> 
> That's something else.  There's no OS bug there.   When you mmap()
> a file, only those pages that are within the file's boundaries are
> valid.  So if you ftruncate() the file then all the pages occurring
> after the (new) file EOF will become invalid and BUSfault if the 
> process touches them.
> 
> You touched upon the correct solution... remove() the file instead
> of ftruncate()ing it.  The file's data then remains intact for the
> processes still referencing it.
> 
> The readers must be catching SIGBUS and retrying (not exiting),
> causing them to run in a signal loop forever.  This is a case of
> bad programming.  I've seen it before... there was a popular IRC
> bot back in my BEST days which constantly got itself into infinite
> loops because the guy who wrote it installed a signal handler for
> SIGBUS.
> 
>   -Matt
>   Matthew Dillon 
>   <[EMAIL PROTECTED]>


Well, I know that this was a bug in my software: not unlinking the
file before truncating it :-). But SIGBUS was not caught in
the readers.  I will try to reproduce it.

Thomas


Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-13 Thread Matthew Dillon



:Matt,
:
:what the hell, this seems very close to a problem I have been meaning
:to report for a week:
:
:In a data acquisition application I have a writer process writing to a
:file-backed, shared, mmapped ring buffer. There can be several reader
:processes on this ring buffer. Once I killed the writer to resize the
:ring buffer and forgot about the readers. The writer truncated the
:database without unlinking it first. This led the readers to run
:forever, or so it seemed at least.  After attaching with gdb I saw
:that they were only page faulting, nothing more, forever.
:
:I saw something similar with netscape going mad.
:
:cheers, Thomas

That's something else.  There's no OS bug there.   When you mmap()
a file, only those pages that are within the file's boundaries are
valid.  So if you ftruncate() the file then all the pages occurring
after the (new) file EOF will become invalid and BUSfault if the 
process touches them.

You touched upon the correct solution... remove() the file instead
of ftruncate()ing it.  The file's data then remains intact for the
processes still referencing it.
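
A minimal user-space sketch of the difference, assuming a POSIX system where touching a mapped page that lies wholly past EOF raises SIGBUS (as on FreeBSD and Linux). A forked child stands in for one of the reader processes; the helper name reader_signal_after() and the /tmp path template are illustrative, not taken from Thomas's program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>     /* SIGBUS, for callers comparing the result */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a two-page file shared, dirty the second page, then either truncate
 * the file to zero length or unlink it, and let a child read that page.
 * Returns the signal that killed the child, 0 if it read the data intact,
 * or -1 for any other outcome. */
static int reader_signal_after(int truncate_file)
{
    char path[] = "/tmp/ringbufXXXXXX";
    long pg = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0 || ftruncate(fd, 2 * pg) != 0) { perror("setup"); exit(1); }

    char *p = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); exit(1); }
    p[pg] = 'x';                          /* the "writer" dirties page 2 */

    if (truncate_file) {
        if (ftruncate(fd, 0) != 0)        /* page 2 is now past EOF */
            perror("ftruncate");
    } else {
        unlink(path);                     /* name gone, data still mapped */
    }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {                       /* the "reader" */
        volatile char c = p[pg];          /* SIGBUS here if past EOF */
        _exit(c == 'x' ? 0 : 2);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    munmap(p, 2 * pg);
    close(fd);
    if (truncate_file)
        unlink(path);

    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Truncating leaves the reader holding a mapping it can no longer use, whereas unlinking only removes the name: existing readers keep seeing the old data until they unmap it, and the writer is free to create a fresh file for the resized buffer.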

The readers must be catching SIGBUS and retrying (not exiting),
causing them to run in a signal loop forever.  This is a case of
bad programming.  I've seen it before... there was a popular IRC
bot back in my BEST days which constantly got itself into infinite
loops because the guy who wrote it installed a signal handler for
SIGBUS.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-13 Thread Thomas Zenker

On Thu, Dec 13, 2001 at 02:58:28AM -0800, Matthew Dillon wrote:
> 
> @#$@#$ crap.  I think I found a dirty-mmap edge case with truncation.
> It requires a change to vm_page_set_validclean(), which of course is
> one of the core routines in the VM system.
> 
> Basically what happens is that ftruncate() calls vnode_pager_setsize() 
> which eventually calls vm_page_set_validclean().
> 
> If you happened to mmap() the truncation point shared R+W and
> dirty it, then truncate to something that isn't a multiple of DEV_BSIZE...
> for example, if you were to truncate to an offset of '10', and a buffer

Matt,

what the hell, this seems very close to a problem I have been meaning
to report for a week:

In a data acquisition application I have a writer process writing to a
file-backed, shared, mmapped ring buffer. There can be several reader
processes on this ring buffer. Once I killed the writer to resize the
ring buffer and forgot about the readers. The writer truncated the
database without unlinking it first. This led the readers to run
forever, or so it seemed at least.  After attaching with gdb I saw
that they were only page faulting, nothing more, forever.

I saw something similar with netscape going mad.

cheers, Thomas

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-13 Thread Matthew Dillon



@#$@#$ crap.  I think I found a dirty-mmap edge case with truncation.
It requires a change to vm_page_set_validclean(), which of course is
one of the core routines in the VM system.

Basically what happens is that ftruncate() calls vnode_pager_setsize() 
which eventually calls vm_page_set_validclean().

If you happened to mmap() the truncation point shared R+W and
dirty it, then truncate to something that isn't a multiple of DEV_BSIZE...
for example, if you were to truncate to an offset of '10', and a buffer
has not been instantiated or marked dirty for the block yet, then the
truncate operation will clear the dirty bit on the page and your 10
bytes of dirty data will never get synced and will disappear if the page
is freed.

vm_page_set_validclean() needs to set the valid bits and clear
the dirty bits associated with (base,size) within the page.  If base and/or
size is unaligned then the valid and dirty bits encompass the bits
associated with any overlapping DEV_BSIZEd chunks.  This is fine for
setting valid, but not correct when clearing dirty.  Only dirty bits for
DEV_BSIZE chunks that are fully enclosed in the range can be cleared.
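
To make the rounding concrete, here is a toy model rather than the kernel routine. It assumes a 4096-byte page split into eight 512-byte chunks (512 standing in for DEV_BSIZE), one valid bit and one dirty bit per chunk, and that truncating to offset 10 passes the tail of the page, (10, 4086), as (base,size). struct toy_page and the functions are hypothetical names. Valid bits round outward; dirty bits are cleared only for chunks wholly inside the range, i.e. rounding inward.

```c
#include <stdint.h>

#define TOY_BSIZE 512u  /* stands in for DEV_BSIZE; a 4096-byte page has 8 chunks */

/* One bit per TOY_BSIZE chunk of the page. */
struct toy_page {
    uint8_t valid;
    uint8_t dirty;
};

/* Chunks that (base,size) touches at all: rounds outward.
 * Assumes base + size <= 4096. */
static uint8_t overlap_mask(unsigned base, unsigned size)
{
    uint8_t m = 0;
    if (size == 0)
        return 0;
    for (unsigned i = base / TOY_BSIZE; i <= (base + size - 1) / TOY_BSIZE; i++)
        m |= (uint8_t)(1u << i);
    return m;
}

/* Chunks lying entirely inside (base,size): rounds inward. */
static uint8_t enclosed_mask(unsigned base, unsigned size)
{
    uint8_t m = 0;
    for (unsigned i = (base + TOY_BSIZE - 1) / TOY_BSIZE; i < (base + size) / TOY_BSIZE; i++)
        m |= (uint8_t)(1u << i);
    return m;
}

/* Mark (base,size) valid, but clear dirty bits only where a whole chunk
 * is covered, so a partially covered chunk keeps its dirty data. */
static void toy_set_validclean(struct toy_page *pg, unsigned base, unsigned size)
{
    pg->valid |= overlap_mask(base, size);
    pg->dirty &= (uint8_t)~enclosed_mask(base, size);
}
```

Clearing the dirty bits with overlap_mask() instead would reproduce the bug described above: chunk 0, which still holds the 10 bytes written through the mapping, would be marked clean and never written back.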

The fix is easy, but a little scary due to being right smack in the
middle of the VM system.

--

In any case, I think I got it licked.  I'm going to run this nfs tester
program overnight on a local filesystem, an NFSv2 mount, and an NFSv3
mount.  Cross your fingers!  If it survives I'll start committing to
-current tomorrow.
I give it about a 70% chance of surviving.

-Matt


Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-13 Thread Jordan Hubbard


> Thanks!  I'm slowly whacking the bugs.  I just fixed another one...

That's awesome...  I'd hoped this program might help you find a few
things, but I never expected you to find so many bugs in NFS
so... quickly!  I certainly didn't expect you to tickle any local
filesystem problems either. :)

> I think I can make it perfect.  I'll post another patch tomorrow.

Thanks.  With 4.5 imminent these improvements are, to state the
flagrantly obvious, very timely indeed.

- Jordan

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-13 Thread Matthew Dillon



:
:   Very cool. Good job!
:
:-DG
:
:David Greenman
:Co-founder, The FreeBSD Project - http://www.freebsd.org

Thanks!  I'm slowly whacking the bugs.  I just fixed another one...
vtruncbuf() handles the buffers beyond the file EOF but doesn't handle
the buffer straddling the truncation point, so I had to augment the
NFS client's truncation code to deal with that.  With that fixed the
tester program got to 34483 operations before finding a problem.  
Hopefully I'm in the home stretch now :-)
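
A sketch of the three cases a truncation has to distinguish, assuming each buffer covers the byte range [off, off+len) of the file. The enum and classify_buf() are hypothetical, not vtruncbuf() or the NFS client code: buffers wholly past the new EOF can be thrown away, but the one straddling it must be kept and cut down to the bytes that remain, which is the case vtruncbuf() leaves to the caller.

```c
#include <stdint.h>

enum trunc_action { TRUNC_KEEP, TRUNC_DISCARD, TRUNC_TRIM };

/* Classify a buffer covering [off, off + len) against a new file size.
 * For TRUNC_TRIM, *keep_len receives how many leading bytes survive. */
static enum trunc_action classify_buf(uint64_t off, uint64_t len,
                                      uint64_t new_size, uint64_t *keep_len)
{
    if (off >= new_size)
        return TRUNC_DISCARD;          /* entirely past the new EOF */
    if (off + len <= new_size)
        return TRUNC_KEEP;             /* entirely before it */
    *keep_len = new_size - off;        /* straddles the truncation point */
    return TRUNC_TRIM;
}
```

In the TRUNC_TRIM case the bytes past keep_len must also be dropped or zeroed, so that they cannot reappear if the file is later extended.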

What I really love about this program is that the problems are so
repeatable.  So far the same failure occurs at exactly the same place,
every time.  It makes it unbelievably easy to track the bugs down.
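
That kind of repeatability is what one gets when a tester derives its whole operation sequence from a seeded pseudo-random generator. A sketch of the idea, assuming a fixed seed and a small xorshift generator; the operation names and gen_ops() are hypothetical and not taken from the tester's source.

```c
#include <stdint.h>

enum op { OP_READ, OP_WRITE, OP_TRUNCATE, OP_MAPWRITE };

/* xorshift32: a tiny PRNG whose output depends only on its state. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill ops[0..n) with the operation sequence for a nonzero seed.
 * The same seed always yields the same sequence, so a failure at
 * operation N recurs at operation N on every run. */
static void gen_ops(uint32_t seed, enum op *ops, int n)
{
    uint32_t state = seed;
    for (int i = 0; i < n; i++)
        ops[i] = (enum op)(xorshift32(&state) % 4);
}
```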

I think I can make it perfect.  I'll post another patch tomorrow.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>

Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )

2001-12-12 Thread David Greenman


   Very cool. Good job!

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.
