I've tried again to fix my RAID array with raidctl -R. This time I did it on the console port so I could capture the output from the ddb> prompt.

Here is some output:

# raidctl -s raid0
raid0 Components:
           /dev/wd0d: failed
           /dev/wd1d: optimal
No spares.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
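
For context: since parity shows DIRTY here, my understanding from raidctl(8) is that parity will still need a check/rewrite pass once the set is healthy again. Roughly what I'd expect to run after a successful rebuild (whether this is the right point to do it is my assumption):

# raidctl -p raid0    # report the current parity status
# raidctl -P raid0    # check the parity and rewrite it if necessary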

So I attempt an in-place reconstruction of wd0d.

#
# raidctl -R /dev/wd0d raid0
Closing the opened device: /dev/wd0d
About to (re-)open the device for rebuilding: /dev/wd0d
RECON: Initiating in-place reconstruction on
       row 0 col 0 -> spare at row 0 col 0.
Quiescence reached...
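
To keep an eye on how far the rebuild has gotten, I just poll raidctl -S from another shell; a trivial way to do that (the 60-second interval is arbitrary):

# while true; do raidctl -S raid0; sleep 60; done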

I then use raidctl -S to monitor the reconstruction. Things go well until the 48% mark. Then I get:

wd1d: uncorrectable data error reading fsbn 111722176 of 111722176-111722303 (wd1 bn 114343984; cn 113436 tn 7 sn 55), retrying
wd1: transfer error, downgrading to Ultra-DMA mode 4
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 4
wd1d: uncorrectable data error reading fsbn 111722176 of 111722176-111722303 (wd1 bn 114343984; cn 113436 tn 7 sn 55), retrying
wd1d: uncorrectable data error reading fsbn 111722248 of 111722176-111722303 (wd1 bn 114344056; cn 113436 tn 9 sn 1), retrying
wd1d: uncorrectable data error reading fsbn 111722248 of 111722176-111722303 (wd1 bn 114344056; cn 113436 tn 9 sn 1)
raid0: IO Error.  Marking /dev/wd1d as failed.
Recon read failed !
panic: RAIDframe error at line 1518 file /usr/src/sys/dev/raidframe/rf_reconstruct.c
Stopped at      Debugger+0x4:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!

DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!

This concerns me because I need wd1d to rebuild my failed wd0d. Any ideas? Drive cables maybe? Any help is greatly appreciated.
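
One thing I'm considering, to see whether the bad sectors on wd1 are reproducible before blaming the cable: reading the block range from the error messages straight off the raw partition with dd. The skip/count values below are just the fsbn range from the console messages, and I'm assuming it's sane to poke at the raw device while the array is in this state, so treat it as a sketch:

# dd if=/dev/rwd1d of=/dev/null bs=512 skip=111722176 count=128

If that reliably reproduces the uncorrectable read errors, the drive itself would look more suspect than the cabling.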

Anyway, below are the trace and ps output from ddb.


ddb> trace

Debugger(d0642f0d,d0ba1000,d0774c0c,d0ba1000,d0cb4d2c) at Debugger+0x4
panic(d0642ec0,40,d0774c0c,d026e457,d05fde44) at panic+0x63
rf_ReconReadDoneProc(d0cb4d2c,1,1,0,44ba5557) at rf_ReconReadDoneProc+0x1b6
rf_KernelWakeupFunc(d57e8200,ad6ea,787,6befb) at rf_KernelWakeupFunc+0xe2
biodone(d57e8200,9000,100000,d05fde44,636e7520) at biodone+0x88
wddone(d0b82000,d5808000,d0774ddc,d03ed2e0) at wddone+0x7e
wdc_ata_bio_done(d0b7c5a4,d5808000,0,0,a0) at wdc_ata_bio_done+0x5e
wdc_ata_bio_intr(d0b7c5a4,d5808000,1,1) at wdc_ata_bio_intr+0x1a9
wdcintr(d0b7c5a4) at wdcintr+0x8b
Xrecurse_legacy15() at Xrecurse_legacy15+0xb6
apm_cpu_idle(b0,d0644240,d06440a0,7fffffff,d02692e7) at apm_cpu_idle+0x80
idle_loop(d06f28c0,28,0,0,80000000) at idle_loop+0x5
bpendtsleep(d06440a0,4,d0587231,0,0,ffffffff,d05337d7,0) at bpendtsleep
uvm_scheduler(d064409c,3,0,d05337d7,dff0000) at uvm_scheduler+0x6b
check_console(0,0,0,0,0) at check_console

ddb> ps

   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
 29631   9527   9527      0  3     0x44184  select     sendmail
  9527  30988   9527      0  3      0x4084  pause      sh
 30988   2589   2589      0  3        0x84  piperd     cron
  5360  19845   5360      0  3      0x4086  nanosleep  raidctl
 13182      0      0      0  3    0x100204  RAIDframe  raid_reconip
 19845      1  19845      0  3      0x4086  pause      ksh
  6511      1   6511      0  3      0x4086  ttyin      getty
 31213      1  31213      0  3      0x4086  ttyin      getty
 18636      1  18636      0  3      0x4086  ttyin      getty
 16125      1  16125      0  3      0x4086  ttyin      getty
  5862      1   5862      0  3      0x4086  ttyin      getty
  2589      1   2589      0  3        0x84  select     cron
 19058      1  19058      0  3        0x84  select     sshd
  7847      1   7847     77  3       0x184  poll       dhcpd
  3822  19592  19592     83  3       0x184  poll       ntpd
 19592      1  19592      0  3        0x84  select     ntpd
  9741   1703   1703     74  3       0x184  bpf        pflogd
  1703      1   1703      0  3        0x84  netio      pflogd
  8078   6713   6713     73  2       0x184             syslogd
  6713      1   6713      0  3        0x84  netio      syslogd
  4673      1   4673      0  3        0x84  mfsidl     mount_mfs
 28531      0      0      0  3    0x100204  rfwcond    raid0
    15      0      0      0  3    0x100204  crypto_wa  crypto
    14      0      0      0  3    0x100204  aiodoned   aiodoned
    13      0      0      0  3    0x100204  syncer     update
    12      0      0      0  3    0x100204  cleaner    cleaner
    11      0      0      0  3    0x100204  reaper     reaper
    10      0      0      0  3    0x100204  pgdaemon   pagedaemon
     9      0      0      0  3    0x100204  pftm       pfpurge
     8      0      0      0  3    0x100204  usbevt     usb3
     7      0      0      0  3    0x100204  usbevt     usb2
     6      0      0      0  3    0x100204  usbevt     usb1
     5      0      0      0  3    0x100204  usbtsk     usbtask
     4      0      0      0  3    0x100204  usbevt     usb0
     3      0      0      0  3    0x100204  apmev      apm0
     2      0      0      0  3    0x100204  kmalloc    kmthread
     1      0      1      0  3      0x4084  wait       init
     0     -1      0      0  3     0x80204  scheduler  swapper
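
For completeness: if I recall ddb(4) correctly, I could also have forced a crash dump after collecting the above, so the panic could be examined later with a debugger; I haven't verified this on this box, so it's just a note:

ddb> boot dump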

Greg Oster wrote:
"Jeff Quast" writes:

> > My first few months with raidframe caused many kernel panics. With 30
> > minutes of parity checking, this was a difficult learning experience.
> > I was initially led to believe that raidframe was hardly stable (and
> > therefore disabled in GENERIC).

> > However, as I gained experience with raidctl and raidframe, and traced
> > the panics to the code level, I almost always found the panics were caused
> > by my misuse or misinterpretation of raidctl(8). A small book could
> > probably be written on the many different situations you can find
> > yourself in with raidframe.

> > I haven't had a kernel panic for a long time, and have had 3 disks fail
> > since on a level 5 RAID without issue reconstructing, changing
> > geometry, etc. If memory serves me, I may have reconstructed a mounted
> > RAID set, though given the choice, I certainly wouldn't.


> RAIDframe was built to allow reconstructing a mounted RAID set... in fact,
> it goes to a lot of trouble to allow that to happen properly... The only
> 'problem' you might notice would be a performance degradation for both the
> rebuild and any user IO taking place...

> > All in all, I find kernel panics with raidframe are just its way of
> > saying "Bad choice of arguments" :)


> RAIDframe in OpenBSD is somewhat lax about checking the input provided by
> raidctl... It works quite well if you don't tell it to do anything it's not
> expecting :-} (most (all?) of those problems have long since been cleaned
> up -- unfortunately not in the code base that's in OpenBSD though :( )

> Later...

> Greg Oster
