Re: Disk Errors during boot and run time.
Bryn M. Reeves wrote:
> On Fri, 2009-03-06 at 12:22 +1300, Paul Ward wrote:
> [...]
> So it's an IBM FAStT SAN? These are active/passive storage arrays that
> require use of a multipath hardware handler to properly manage switching
> between the active and passive paths and preventing I/O being sent to a
> controller that cannot handle it.
>
> The I/O errors that you see are a result of things trying to access the
> passive paths (e.g. partition scanning, lvm label scanning, udev/hal
> probes etc.).
>
> [...]

I don't think this is hardware-specific. I've seen this problem on desktop-grade hardware, using either IDE or SATA drives (a single 300GB Seagate).
Mine happened while I was using the cloning software CloneZilla (I don't remember which version right now). I'll post more details if/when I run into the problem again...

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines
Re: Disk Errors during boot and run time.
On Fri, 2009-03-06 at 12:22 +1300, Paul Ward wrote:
> # ls /boot
> ls: reading directory /boot: Input/output error

What's in dmesg at this time?

> I have been told that the disks use multipath but I have no experience
> of this to date.
> I know the disks are on a SAN but as yet have not been able to locate
> them using the IBM SAN manager.
>
> Linux version 2.6.18-53.1.21.el5PAE

So, RHEL5.1?

> (brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20070626
> (Red Hat 4.1.2-14)) #1 SMP Wed May 7 08:56:33 EDT 2008
>
> Vendor: IBM Model: 1814 FAStT Rev: 0916
> Type: Direct-Access ANSI SCSI revision: 05

So it's an IBM FAStT SAN? These are active/passive storage arrays that require a multipath hardware handler to manage switching between the active and passive paths properly and to prevent I/O from being sent to a controller that cannot handle it.

The I/O errors that you see are a result of things trying to access the passive paths (e.g. partition scanning, LVM label scanning, udev/HAL probes, etc.).

RHEL5.1 included the old device-mapper hardware handlers. These only take effect once multipath has configured the devices, and they only handle path switching in the event of a path failure (i.e. you'll still see I/O errors if something tries to access one of the underlying paths directly rather than via the multipath device map).

RHEL5.3 introduces the SCSI device handler framework as a replacement for the device-mapper hardware handlers (this appeared upstream in 2.6.26).

Whether or not you decide to update, it's probably worth carefully checking the current multipath configuration on the system, as this is a very common area for configuration mistakes.

Regards,
Bryn.
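For the configuration check Bryn recommends, a minimal Python 3 sketch like the one below can flag whether a multipath.conf file has an explicit device section for the IBM 1814 array with an rdac hardware handler. The helper names (`find_device_sections`, `check_fastt`) are hypothetical and not part of device-mapper-multipath, and the parser is deliberately naive: it assumes the opening brace sits on the same line as `device`, comments are whole lines, and device blocks have no nested sub-blocks. The expected handler string `"1 rdac"` is an assumption based on RHEL5-era configurations for FAStT/DS4000 arrays; confirm it against the IBM and Red Hat documentation for your release. An empty result does not by itself mean the handler is missing, because the multipath tools also carry built-in defaults for many arrays; `multipath -ll` shows the maps that are actually configured.

```python
import re
import shlex
import sys

# Hypothetical helper, not part of device-mapper-multipath.
DEVICE_OPEN = re.compile(r"^device\s*\{$")


def find_device_sections(conf_text):
    """Return one dict of keyword -> value per `device { ... }` block.

    Simplifying assumptions: the opening brace is on the same line as
    `device`, comments are whole lines starting with '#' or '!', and a
    device block contains no nested sub-blocks.
    """
    sections = []
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or whole-line comment
        if current is None:
            if DEVICE_OPEN.match(line):
                current = {}
            continue  # outside a device block (e.g. "devices {" or its "}")
        if line == "}":
            sections.append(current)
            current = None
            continue
        # "keyword value" pairs; shlex strips the double quotes around values.
        parts = shlex.split(line)
        if len(parts) >= 2:
            current[parts[0]] = " ".join(parts[1:])
    return sections


def check_fastt(conf_text, product_prefix="1814"):
    """List (product, hardware_handler, has_rdac) for IBM device sections
    whose product string starts with product_prefix (regex anchors ignored)."""
    report = []
    for dev in find_device_sections(conf_text):
        vendor = dev.get("vendor", "").strip("^$")
        product = dev.get("product", "").lstrip("^")
        if vendor == "IBM" and product.startswith(product_prefix):
            handler = dev.get("hardware_handler", "")
            report.append((product, handler, "rdac" in handler))
    return report


def main(paths):
    """Print the check result for each multipath.conf-style file given."""
    for path in paths:
        with open(path) as handle:
            report = check_fastt(handle.read())
        if not report:
            print(f"{path}: no explicit IBM 1814 device section (built-in defaults may apply)")
        for product, handler, has_rdac in report:
            status = "ok" if has_rdac else "no rdac hardware_handler"
            print(f"{path}: product {product!r}, hardware_handler {handler!r}: {status}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 check_multipath_conf.py /etc/multipath.conf` (the script name is hypothetical). Parsing text rather than calling the multipath tools keeps the sketch runnable on a copy of the file taken off the server.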
Re: Disk Errors during boot and run time.
On Tue, 2009-03-10 at 14:57 -0700, Konstantin Svist wrote:
> Bump for this thread. I'd very much like to know the answer, too.

You didn't need to quote all 1142 lines of context to say "me too".

poc
Disk Errors during boot and run time.
Hi,

I have been asked to look at a server that has some disk issues.

If I try to run ls /boot I get the following:

# ls /boot
ls: reading directory /boot: Input/output error

I have been told that the disks use multipath, but I have no experience of this to date. I know the disks are on a SAN, but as yet I have not been able to locate them using the IBM SAN manager.

Can someone help me work out what I can do to get this server healthy again?

This is the dmesg output during boot:

Linux version 2.6.18-53.1.21.el5PAE (brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Wed May 7 08:56:33 EDT 2008
BIOS-provided physical RAM map:
BIOS-e820: - 0009d800 (usable)
BIOS-e820: 0009d800 - 000a (reserved)
BIOS-e820: 000e - 0010 (reserved)
BIOS-e820: 0010 - cffbce40 (usable)
BIOS-e820: cffbce40 - cffd (ACPI data)
BIOS-e820: cffd - d000 (reserved)
BIOS-e820: e000 - f000 (reserved)
BIOS-e820: fec0 - 0001 (reserved)
BIOS-e820: 0001 - 00013000 (usable)
3968MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 0009d940
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 1245184
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 1015808 pages, LIFO batch:31
DMI 2.4 present.
Using APIC driver default
ACPI: RSDP (v002 IBM ) @ 0x000fdfd0
ACPI: XSDT (v001 IBMSERBLADE 0x1001 IBM 0x45444f43) @ 0xcffcff00
ACPI: FADT (v002 IBMSERBLADE 0x1001 IBM 0x45444f43) @ 0xcffcfe40
ACPI: MADT (v001 IBMSERBLADE 0x1001 IBM 0x45444f43) @ 0xcffcfdc0
ACPI: MCFG (v001 IBMSERBLADE 0x1001 IBM 0x45444f43) @ 0xcffcfd80
ACPI: DSDT (v002 IBMSERBLADE 0x1000 INTL 0x20060707) @ 0x
ACPI: PM-Timer IO Port: 0x588
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:15 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 14, version 32, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x0d] address[0xfec8] gsi_base[24])
IOAPIC[1]: apic_id 13, version 32, address 0xfec8, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at d100 (gap: d000:1000)
Detected 3000.364 MHz processor.
Built 1 zonelists. Total pages: 1245184
Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=1...@16m
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
mapped IOAPIC to b000 (fec8)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c074 soft=c072
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4014432k/4980736k available (2078k kernel code, 178348k reserved, 859k data, 220k init, 3276528k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6002.55 BogoMIPS (lpj=3001278)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 2000 0004e3bd 0001
CPU: After vendor identify, caps: bfebfbff 2000 0004e3bd 0001
monitor/mwait feature present.
using mwait in idle threads.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU: After all inits, caps: bfebf3ff 2000 0940 0004e3bd 0001
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
CPU0: Intel(R) Xeon(R) CPU 5160 @ 3.00GHz stepping 0b
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c0741000 soft=c0721000
Initializing CPU#1
Calibrating delay using timer specific routine.. 599
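Bryn's first question was what dmesg shows at the moment the error occurs, and the part of the boot log reproduced here ends before any disk messages. As a hedged helper for that step, the Python 3 sketch below tallies kernel I/O error lines per device in a saved dmesg capture, so you can see whether the errors cluster on particular /dev/sdX paths (for example, the passive controller paths) rather than on the multipath maps. The two message formats are assumptions based on 2.6.18-era kernels, and the function and script names are hypothetical. RHEL5's system Python is 2.4, so run this on another machine against a copy of the output.

```python
import re
import sys
from collections import Counter

# Message formats assumed from 2.6.18-era kernels; other kernels may word
# these differently, so adjust the patterns to match your own dmesg.
PATTERNS = [
    re.compile(r"end_request: I/O error, dev ([\w-]+), sector \d+"),
    re.compile(r"Buffer I/O error on device ([\w-]+), logical block \d+"),
]


def count_io_errors(dmesg_text):
    """Count kernel I/O error messages per block device name."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each log line at most once
    return counts


def main(paths):
    """Print per-device error counts for each saved dmesg capture."""
    for path in paths:
        with open(path) as handle:
            counts = count_io_errors(handle.read())
        for device, n in counts.most_common():
            print(f"{path}: {device}: {n}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, save the output with `dmesg > dmesg.txt` right after reproducing the `ls /boot` error, then run `python3 io_errors.py dmesg.txt`. Matching devices against the paths listed by `multipath -ll` then shows which errors come from direct access to underlying paths.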