Pozdrav, pred kakim mesecem sem pisal na tale seznam, ker sem imel težave z nekim md arrayjem. Če vas zanimajo podrobnosti, poiščite po arhivu. Šlo je za to, da se mi je na SATA krmilniku vedno "zaskočil" en port, na katerega je bil zapet eden od diskov v md arrayju (RAID5). Nekateri ste bili mnenja, da disk na tem portu krepuje, in mi svetovali, naj ga zamenjam.
Pa ga nisem; raje sem zamenjal ves ostali hardver (matično, procesor, pomnilnik, grafično, napajalnik, SATA krmilnik etc). Po rošadi je strežnik deloval brez prej omenjenih težav ves mesec, vsak dan, 24 ur na dan. Do včeraj. Strežnik je na omrežju opravljal tudi funckijo routerja (dve mrežni kartici, PPPoe, NAT, dnsmasq, firewall ipd), sedaj pa sem dobil "dedicated" router in na strežniku to funkcionalnost upokojil. Edina hardverska sprememba med prej in sedaj je bila, da sem iz mrežne kartice za WAN izpulil ethernet kabel, katerega drugi konec je bil prej zapičen v ADSL modem. Prižgem, preizkusim, vse deluje kot mora. Dve uri pozneje se je začelo dogajati tole (se opravičujem, ampak v takih primerih se mi zdi smiselno priložit ves dnevnik): Oct 24 20:03:42 floki kernel: irq 18: nobody cared (try booting with the "irqpoll" option) Oct 24 20:03:42 floki kernel: [<c014da8e>] __report_bad_irq+0x36/0x75 Oct 24 20:03:42 floki kernel: [<c014dc86>] note_interrupt+0x1b9/0x1f7 Oct 24 20:03:42 floki kernel: [<c014d1cf>] handle_IRQ_event+0x23/0x51 Oct 24 20:03:42 floki kernel: [<c014e3fe>] handle_fasteoi_irq+0x85/0xa4 Oct 24 20:03:42 floki kernel: [<c010639e>] do_IRQ+0x57/0x71 Oct 24 20:03:42 floki kernel: [<c0135f8d>] hrtimer_start+0xe7/0xf1 Oct 24 20:03:42 floki kernel: [<c0102c2c>] default_idle+0x0/0x55 Oct 24 20:03:42 floki kernel: [<c010476b>] common_interrupt+0x23/0x28 Oct 24 20:03:42 floki kernel: [<c0102c2c>] default_idle+0x0/0x55 Oct 24 20:03:42 floki kernel: [<c0119b92>] native_safe_halt+0x2/0x3 Oct 24 20:03:42 floki kernel: [<c0102c65>] default_idle+0x39/0x55 Oct 24 20:03:42 floki kernel: [<c01023a6>] cpu_idle+0xab/0xcc Oct 24 20:03:42 floki kernel: [<c035599a>] start_kernel+0x350/0x358 Oct 24 20:03:42 floki kernel: [<c035517b>] unknown_bootoption+0x0/0x20c Oct 24 20:03:42 floki kernel: ======================= Oct 24 20:03:42 floki kernel: handlers: Oct 24 20:03:42 floki kernel: [<d08497f3>] (pdc_interrupt+0x0/0x267 [sata_promise]) Oct 24 20:03:42 floki kernel: Disabling IRQ #18 Oct 24 22:57:42 floki kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Oct 24 22:57:42 floki kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 Oct 24 22:57:42 floki kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Oct 24 22:57:42 floki kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Oct 24 22:57:42 floki kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 Oct 24 22:57:42 floki kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Oct 24 22:57:42 floki kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x4190002 action 0x2 frozen Oct 24 22:57:42 floki kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 Oct 24 22:57:42 floki kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error) Oct 24 22:57:43 floki kernel: ata4: soft resetting port Oct 24 22:57:43 floki kernel: ata3: soft resetting port Oct 24 22:57:43 floki kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:57:43 floki kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:57:43 floki kernel: ata2: soft resetting port Oct 24 22:57:43 floki kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:58:13 floki kernel: ata4.00: qc timeout (cmd 0x27) Oct 24 22:58:13 floki kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:58:13 floki kernel: ata4.00: failed to set xfermode (err_mask=0x40) Oct 24 22:58:13 floki kernel: ata4: failed to recover some devices, retrying in 5 secs Oct 24 22:58:13 floki kernel: ata3.00: qc timeout (cmd 0x27) Oct 24 22:58:13 floki kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:58:13 floki kernel: ata3.00: failed to set xfermode (err_mask=0x40) Oct 24 22:58:13 floki kernel: ata3: failed to recover some devices, retrying in 5 secs Oct 24 22:58:13 floki kernel: ata2.00: qc timeout (cmd 0x27) Oct 24 22:58:13 floki kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:58:13 floki kernel: ata2.00: failed to set xfermode (err_mask=0x40) Oct 24 22:58:13 floki kernel: ata2: failed to recover some devices, retrying in 5 secs Oct 24 22:58:18 floki kernel: ata4: hard resetting port Oct 24 22:58:18 floki kernel: ata3: hard resetting port Oct 24 22:58:18 floki kernel: ata2: hard resetting port Oct 24 22:58:18 floki kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:58:18 floki kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:58:19 floki kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:58:48 floki kernel: ata4.00: qc timeout (cmd 0x27) Oct 24 22:58:48 floki kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:58:48 floki kernel: ata4.00: failed to set xfermode (err_mask=0x40) Oct 24 22:58:48 floki kernel: ata4.00: limiting speed to UDMA/133:PIO3 Oct 24 22:58:48 floki kernel: ata4: failed to recover some devices, retrying in 5 secs Oct 24 22:58:48 floki kernel: ata3.00: qc timeout (cmd 0x27) Oct 24 22:58:48 floki kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:58:48 floki kernel: ata3.00: failed to set xfermode (err_mask=0x40) Oct 24 22:58:48 floki kernel: ata3.00: limiting speed to UDMA/133:PIO3 Oct 24 22:58:48 floki kernel: ata3: failed to recover some devices, retrying in 5 secs Oct 24 22:58:49 floki kernel: ata2.00: qc timeout (cmd 0x27) Oct 24 22:58:49 floki kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:58:49 floki kernel: ata2.00: failed to set xfermode (err_mask=0x40) Oct 24 22:58:49 floki kernel: ata2.00: limiting speed to UDMA/133:PIO3 Oct 24 22:58:49 floki kernel: ata2: failed to recover some devices, retrying in 5 secs Oct 24 22:58:53 floki kernel: ata4: hard resetting port Oct 24 22:58:53 floki kernel: ata3: hard resetting port Oct 24 22:58:54 floki kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:58:54 floki kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:58:54 floki kernel: ata2: hard resetting port Oct 24 22:58:55 floki kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:59:24 floki kernel: ata4.00: qc timeout (cmd 0x27) Oct 24 22:59:24 floki kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:59:24 floki kernel: ata4.00: failed to set xfermode (err_mask=0x40) Oct 24 22:59:24 floki kernel: ata4.00: disabled Oct 24 22:59:24 floki kernel: ata3.00: qc timeout (cmd 0x27) Oct 24 22:59:24 floki kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:59:24 floki kernel: ata3.00: failed to set xfermode (err_mask=0x40) Oct 24 22:59:24 floki kernel: ata3.00: disabled Oct 24 22:59:24 floki kernel: ata4: EH complete Oct 24 22:59:24 floki kernel: ata3: EH complete Oct 24 22:59:24 floki kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Oct 24 22:59:24 floki kernel: end_request: I/O error, dev sdb, sector 398283327 Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Oct 24 22:59:24 floki kernel: end_request: I/O error, dev sdc, sector 398283327 Oct 24 22:59:24 floki kernel: md: super_written gets error=-5, uptodate=0 Oct 24 22:59:24 floki kernel: raid5: Disk failure on sdb1, disabling device. Operation continuing on 2 devices Oct 24 22:59:24 floki kernel: md: super_written gets error=-5, uptodate=0 Oct 24 22:59:24 floki kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 1 devices Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] READ CAPACITY failed Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Sense not available. Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Write Protect is off Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Mode Sense: 00 00 00 00 Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Asking for cache data failed Oct 24 22:59:24 floki kernel: sd 3:0:0:0: [sdc] Assuming drive cache: write through Oct 24 22:59:25 floki kernel: ata2.00: qc timeout (cmd 0x27) Oct 24 22:59:25 floki kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:59:25 floki kernel: ata2.00: failed to set xfermode (err_mask=0x40) Oct 24 22:59:25 floki kernel: ata2.00: disabled Oct 24 22:59:26 floki kernel: ata2: soft resetting port Oct 24 22:59:26 floki kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 22:59:56 floki kernel: ata2.00: qc timeout (cmd 0x27) Oct 24 22:59:56 floki kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 22:59:56 floki kernel: ata2.00: ATA-7: Maxtor 6L200S0, BANC1G10, max UDMA/133 Oct 24 22:59:56 floki kernel: ata2.00: 398297088 sectors, multi 0: LBA48 NCQ (depth 0/32) Oct 24 22:59:56 floki kernel: ata2.00: failed to set xfermode (err_mask=0x40) Oct 24 22:59:56 floki kernel: ata2: failed to recover some devices, retrying in 5 secs Oct 24 23:00:01 floki kernel: ata2: hard resetting port Oct 24 23:00:02 floki kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 23:00:32 floki kernel: ata2.00: qc timeout (cmd 0x27) Oct 24 23:00:32 floki kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 23:00:32 floki kernel: ata2.00: failed to set xfermode (err_mask=0x40) Oct 24 23:00:32 floki kernel: ata2.00: limiting speed to UDMA/133:PIO3 Oct 24 23:00:32 floki kernel: ata2: failed to recover some devices, retrying in 5 secs Oct 24 23:00:37 floki kernel: ata2: hard resetting port Oct 24 23:00:38 floki kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Oct 24 23:01:08 floki kernel: ata2.00: qc timeout (cmd 0x27) Oct 24 23:01:08 floki kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (398297088) Oct 24 23:01:08 floki kernel: ata2.00: failed to set xfermode (err_mask=0x40) Oct 24 23:01:08 floki kernel: ata2.00: disabled Oct 24 23:01:09 floki kernel: sd 1:0:0:0: rejecting I/O to offline device Oct 24 23:01:09 floki last message repeated 2 times Oct 24 23:01:09 floki kernel: md: super_written gets error=-5, uptodate=0 Oct 24 23:01:09 floki kernel: raid5: Disk failure on sda1, disabling device. Operation continuing on 0 devices Oct 24 23:01:09 floki kernel: ata2: EH complete Oct 24 23:01:09 floki kernel: RAID5 conf printout: Oct 24 23:01:09 floki kernel: --- rd:3 wd:0 Oct 24 23:01:09 floki kernel: disk 0, o:0, dev:sdc1 Oct 24 23:01:09 floki kernel: disk 1, o:0, dev:sda1 Oct 24 23:01:09 floki kernel: disk 2, o:0, dev:sdb1 Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 620 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: Aborting journal on device md0. Oct 24 23:01:09 floki kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 0 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 0 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 0 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: sd 1:0:0:0: rejecting I/O to offline device Oct 24 23:01:09 floki last message repeated 2 times Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] READ CAPACITY failed Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Sense not available. Oct 24 23:01:09 floki kernel: sd 1:0:0:0: rejecting I/O to offline device Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Write Protect is off Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Mode Sense: 00 00 00 00 Oct 24 23:01:09 floki kernel: sd 1:0:0:0: rejecting I/O to offline device Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Asking for cache data failed Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Assuming drive cache: write through Oct 24 23:01:09 floki kernel: ata2.00: detaching (SCSI 1:0:0:0) Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Stopping disk Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] START_STOP FAILED Oct 24 23:01:09 floki kernel: sd 1:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Oct 24 23:01:09 floki kernel: journal commit I/O error Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 27 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 10256386 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: Buffer I/O error on device md0, logical block 59506690 Oct 24 23:01:09 floki kernel: lost page write due to I/O error on md0 Oct 24 23:01:09 floki kernel: RAID5 conf printout: Oct 24 23:01:09 floki kernel: --- rd:3 wd:0 Oct 24 23:01:09 floki kernel: disk 1, o:0, dev:sda1 Oct 24 23:01:09 floki kernel: disk 2, o:0, dev:sdb1 Oct 24 23:01:09 floki kernel: RAID5 conf printout: Oct 24 23:01:09 floki kernel: --- rd:3 wd:0 Oct 24 23:01:09 floki kernel: disk 1, o:0, dev:sda1 Oct 24 23:01:09 floki kernel: disk 2, o:0, dev:sdb1 Oct 24 23:01:09 floki kernel: RAID5 conf printout: Oct 24 23:01:09 floki kernel: --- rd:3 wd:0 Oct 24 23:01:09 floki kernel: disk 1, o:0, dev:sda1 Oct 24 23:01:09 floki kernel: RAID5 conf printout: Oct 24 23:01:09 floki kernel: --- rd:3 wd:0 Oct 24 23:01:09 floki kernel: disk 1, o:0, dev:sda1 Oct 24 23:01:09 floki kernel: RAID5 conf printout: Oct 24 23:01:09 floki kernel: --- rd:3 wd:0 Skrajšana verzija, kakor si jo predstavljam: od nekod se pojavi interrupt, ki gonilnik za SATA krmilnik noče. Krmilnik, ki dejansko tudi je na IRQ18, se ustavi. Ko nekdo dostopi do datotečnega sistema na arrayu /dev/md0, kernel ugotovi, da nobena komponenta (sda1, sdb1, sdc1) arrayja ni več dosegljiva, zato ga ustavi. En reboot kasneje je vse zopet v delujočem stanju. Sreča v nesreči je, da vse komponente arrayja (sda, sdb in sdc) pocrkajo hkrati, zato array ostane v konsistentnem stanju (ni ga treba rebuildat). Mislim, da gre za neko obskurno hardversko težavo; niti sanja pa se mi ne, kako bi jo izsledil, kaj šele odpravil. Fantje, ima kdo kakšno -- pa čeprav obskurno -- idejo? LP, U. _______________________________________________ lugos-list mailing list lugos-list@lugos.si http://liste2.lugos.si/cgi-bin/mailman/listinfo/lugos-list