On Thu, Mar 31, 2005 at 11:06:08AM -0600, Karl Denninger wrote:
> On Thu, Mar 31, 2005 at 12:02:20PM -0500, Matthew N. Dodd wrote:
> > On Wed, 30 Mar 2005, Karl Denninger wrote:
> > > Removing the FIRST delta, which is:
> > >
> > > 218a219,221
> > >       if (!dumping)
> > >           callout_reset(&request->callout, request->timeout * hz,
> > >                         (timeout_t*)ata_timeout, request);
> > >
> > > appears to get rid of the crashes while not harming data integrity OR the
> > > requeueing.
> > 
> > I'd be interested to know if the attached patch does anything.
> > 
> > -- 
> > 10 40 80 C0 00 FF FF FF FF C0 00 00 00 00 10 AA AA 03 00 00 00 08 00
> > Index: ata-queue.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/ata/ata-queue.c,v
> > retrieving revision 1.32.2.6
> > diff -u -u -r1.32.2.6 ata-queue.c
> > --- ata-queue.c     23 Mar 2005 04:50:26 -0000      1.32.2.6
> > +++ ata-queue.c     31 Mar 2005 17:00:46 -0000
> > @@ -217,8 +217,7 @@
> >      }
> >      else {
> >     if (!dumping)
> > -       callout_reset(&request->callout, request->timeout * hz,
> > -                     (timeout_t*)ata_timeout, request);
> > +            callout_drain(&request->callout);
> >     if (request->bio && !(request->flags & ATA_R_TIMEOUT)) {
> >         ATA_DEBUG_RQ(request, "finish bio_taskqueue");
> >         bio_taskqueue(request->bio, (bio_task_t *)ata_completed, request);
> > 
> 
> It'll be a few hours before I will know on the production machine - the RAID
> array has to rebuild before I can trigger the problem, and we're scheduled
> for some power work here in an hour or so - which I suspect will get in the
> way.
> 
> What do you expect the patch to do, given that removing the delta appears to
> fix the instability problem?

This patch appears to be "safe".

The production machine has now run for about 2 hours post-rebuild (which had
to complete first) with the added callout_drain() call in place. It has taken
two DMA WRITE retries, and I have not yet seen any evidence of destabilization.

This is good evidence but not proof: before I took out the original
callout_reset() call, the FIRST write retry would immediately cause the
system to become unstable.
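
To illustrate the difference between re-arming the callout on the finish path
and draining it, here is a minimal user-space sketch, not the driver code.
Every name in it (mock_callout, mock_request, finish_rearm, finish_drain,
timeout_hits_finished_request) is a hypothetical stand-in. The idea that the
instability comes from a timeout firing against an already-finished request
is an assumption, not something established in this thread.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel callout (NOT the FreeBSD API). */
struct mock_callout {
    bool armed;      /* true while a timeout is pending */
    int  ticks_left; /* ticks until the timeout would fire */
};

/* Hypothetical stand-in for an ATA request that owns a callout. */
struct mock_request {
    struct mock_callout callout;
    int  timeout;   /* timeout length in ticks */
    bool finished;  /* set once the finish path has run */
};

/* Arm (or re-arm) the timer, loosely mimicking callout_reset(). */
static void mock_callout_reset(struct mock_callout *c, int ticks)
{
    c->armed = true;
    c->ticks_left = ticks;
}

/*
 * Cancel any pending timer, loosely mimicking callout_drain().
 * The real call can also wait for a running handler; this mock only disarms.
 */
static void mock_callout_drain(struct mock_callout *c)
{
    c->armed = false;
    c->ticks_left = 0;
}

/* Advance time by `ticks`; return true if the timer fired. */
static bool mock_tick(struct mock_callout *c, int ticks)
{
    if (!c->armed)
        return false;
    c->ticks_left -= ticks;
    if (c->ticks_left > 0)
        return false;
    c->armed = false;
    return true;
}

/* Finish path that re-arms the timer, as in the removed delta. */
static void finish_rearm(struct mock_request *r)
{
    mock_callout_reset(&r->callout, r->timeout);
    r->finished = true;
}

/* Finish path that drains the timer, as in the proposed patch. */
static void finish_drain(struct mock_request *r)
{
    mock_callout_drain(&r->callout);
    r->finished = true;
}

/* True if a timeout fires after the request has already finished. */
static bool timeout_hits_finished_request(struct mock_request *r, int elapsed)
{
    return mock_tick(&r->callout, elapsed) && r->finished;
}
```

In this model a request finished via finish_rearm() still has a timer pending
that can fire later, while one finished via finish_drain() does not. Whether
that is what actually happens in ata-queue.c is for someone who knows the
driver to confirm.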

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net        My home on the net - links to everything I do!
http://scubaforum.org           Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net         SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com    Musings Of A Sentient Mind


_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable