Re: PROBLEM: sata timeouts with intel 82801HB on amd64
On Wed, 07 Feb 2007 22:56:45 -0600
Robert Hancock <[EMAIL PROTECTED]> wrote:

> Paolo Ornati wrote:
> > If mounting XFS with "nobarrier" fixes the problem it seems that more
> > than one Seagate disk cannot handle the Cache Flush command while other
> > commands are in flight...
>
> It's not allowed to overlap NCQ (FPDMA read/write) commands with any
> other commands such as cache flushes. libata core guarantees that this
> doesn't happen by deferring such requests until the FPDMA commands are
> complete. (At least, it's supposed to..)

I didn't know that.

Anyway, just mounting XFS with "nobarrier" fixes the problem for me... so
either libata is buggy or I don't know!

--
Paolo Ornati
Linux 2.6.20 on x86_64
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: PROBLEM: sata timeouts with intel 82801HB on amd64
Paolo Ornati wrote:
> If mounting XFS with "nobarrier" fixes the problem it seems that more
> than one Seagate disk cannot handle the Cache Flush command while other
> commands are in flight...

It's not allowed to overlap NCQ (FPDMA read/write) commands with any
other commands such as cache flushes. libata core guarantees that this
doesn't happen by deferring such requests until the FPDMA commands are
complete. (At least, it's supposed to..)

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
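[Editor's sketch] The deferral rule Robert describes can be pictured with a
small toy model. This is not libata's code: the class name ToyPort, its
methods, and the strict first-in-first-out ordering are all assumptions made
for illustration. It only shows the invariant that a non-NCQ command (such as
a cache flush) is never issued while NCQ tags are outstanding, and that NCQ
commands wait while a non-NCQ command is running.

```python
from collections import deque


class ToyPort:
    """Toy model of 'never overlap NCQ and non-NCQ commands' (hypothetical)."""

    def __init__(self):
        self.ncq_in_flight = set()    # tags of outstanding NCQ commands
        self.non_ncq_active = False   # True while a non-NCQ command runs
        self.deferred = deque()       # commands waiting for their turn
        self.issued = []              # names of commands, in issue order

    def submit(self, name, ncq, tag=None):
        """Queue a command and issue it right away if the rule allows."""
        self.deferred.append((name, ncq, tag))
        self._drain()

    def complete_ncq(self, tag):
        """Mark one NCQ tag as finished, then retry deferred commands."""
        self.ncq_in_flight.discard(tag)
        self._drain()

    def complete_non_ncq(self):
        """Mark the running non-NCQ command as finished, then retry."""
        self.non_ncq_active = False
        self._drain()

    def _can_issue(self, ncq):
        if ncq:
            # NCQ commands may overlap each other, but not a non-NCQ command.
            return not self.non_ncq_active
        # A non-NCQ command needs the port completely idle.
        return not self.ncq_in_flight and not self.non_ncq_active

    def _drain(self):
        # Issue strictly in submission order; stop at the first command
        # that has to wait (an assumption of this sketch).
        while self.deferred and self._can_issue(self.deferred[0][1]):
            name, ncq, tag = self.deferred.popleft()
            if ncq:
                self.ncq_in_flight.add(tag)
            else:
                self.non_ncq_active = True
            self.issued.append(name)
```

In this model, a flush submitted after two queued reads stays in `deferred`
until both reads have completed. If a real drive received a flush while reads
were still outstanding, that would point at a violation of this rule somewhere
below the filesystem.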
Re: PROBLEM: sata timeouts with intel 82801HB on amd64
"Trevor Offner Caira" <[EMAIL PROTECTED]> writes:

> (3) Keywords: SATA, AHCI, modules, kernel, Intel.

Is your system running with the ata_piix or the ahci driver?

--
O T A V I O    S A L V A D O R
-
E-mail: [EMAIL PROTECTED]      UIN: 5906116
GNU/Linux User: 239058     GPG ID: 49A5F855
Home Page: http://otavio.ossystems.com.br
-
"Microsoft sells you Windows ... Linux gives you the whole house."
Re: PROBLEM: sata timeouts with intel 82801HB on amd64
On Wed, 7 Feb 2007 07:59:46 -0500 (EST)
"Trevor Offner Caira" <[EMAIL PROTECTED]> wrote:

> > 1) disabling NCQ ("echo 1 > /sys/block/sda/device/queue_depth" in a
> > boot script)
>
> No, this does not fix it.
>
> > OR
> >
> > 2) mounting XFS filesystem(s) with "nobarrier" option
>
> Neither does this.

ok, so it's a different problem (I've tried :)

--
Paolo Ornati
Linux 2.6.20 on x86_64
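[Editor's sketch] Before concluding that disabling NCQ makes no difference, it
helps to confirm that the boot-script write actually took effect. The sketch
below reads back the same sysfs file the thread writes to. The function names
and the sysfs_root parameter (there only so the code can be pointed at a test
directory) are assumptions, not part of any kernel tool.

```python
from pathlib import Path


def read_queue_depth(disk, sysfs_root="/sys"):
    """Return the current queue depth of a disk such as 'sda'."""
    path = Path(sysfs_root) / "block" / disk / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_effectively_disabled(disk, sysfs_root="/sys"):
    """A depth of 1 means only one command is outstanding at a time."""
    return read_queue_depth(disk, sysfs_root) <= 1


if __name__ == "__main__":
    # 'sda' matches the device named in the thread; adjust as needed.
    print("queue_depth:", read_queue_depth("sda"))
    print("NCQ effectively disabled:", ncq_effectively_disabled("sda"))
```

If this reports a depth greater than 1 after boot, the script ran too early or
wrote to the wrong device, and the test result quoted above would not be
conclusive.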
Re: PROBLEM: sata timeouts with intel 82801HB on amd64
> You are using XFS, right?

For /usr, /var and /home, yes. For /, no, my root partition is ext3.

> Can you see if the problem goes away with either:
>
> 1) disabling NCQ ("echo 1 > /sys/block/sda/device/queue_depth" in a
> boot script)

No, this does not fix it.

> OR
>
> 2) mounting XFS filesystem(s) with "nobarrier" option

Neither does this.

Thanks,
Trevor Caira
Re: PROBLEM: sata timeouts with intel 82801HB on amd64
On Mon, 5 Feb 2007 21:08:33 -0500 (EST)
"Trevor Offner Caira" <[EMAIL PROTECTED]> wrote:

> (1) One-line summary: I'm getting SATA timeouts with Intel 82801HB on amd64.
>
> (2) Full description: Unless CONFIG_RCU_TORTURE_TEST is set, I get SATA
> timeouts of this form periodically:
>
> ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
> ata1.00: cmd 60/18:00:b3:22:0a/00:00:00:00:00/40 tag 0 cdb 0x0 data 12288 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
> SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
> sda: Write Protect is off
> SCSI device sda: write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
>
> This entails complete blocking of all disk i/o (I only have one disk) for
> about 45 seconds. The kernel then negotiates the next lowest transfer
> speed (UDMA/133 all the way down to PIO0, when it errors saying it cannot
> go slower). I get this issue on amd64 kernels only. The issue is only
> present in 2.6.18+, since earlier kernels do not support my chipset at all
> (intel 82801HB).
>
> Knoppix 5.1.1 does not show this issue (i.e., no disk i/o issues even
> without rcutorture running). However, a native amd64 build of exactly the
> same kernel config shows the issue.
>
> (3) Keywords: SATA, AHCI, modules, kernel, Intel.
>
[CUT]
> (8.7) Other information: There's nothing in the system except for the
> DG965WH motherboard, E6600 processor, 1GB of Kingston RAM, the ST3320620AS
> hard drive and 430 W PSU.
>
> Thanks for reading this far! :)

You are using XFS, right?

Can you see if the problem goes away with either:

1) disabling NCQ ("echo 1 > /sys/block/sda/device/queue_depth" in a
boot script)

OR

2) mounting XFS filesystem(s) with "nobarrier" option

?

I've seen this problem with very similar hardware (and so I've added
Tejun to CC :).

If mounting XFS with "nobarrier" fixes the problem it seems that more
than one Seagate disk cannot handle the Cache Flush command while other
commands are in flight...

--
Paolo Ornati
Linux 2.6.20 on x86_64
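[Editor's sketch] The "cmd" line in the quoted log can be decoded to see which
command timed out. The sketch below assumes the field order libata uses in
these reports (command/feature:nsect:lbal:lbam:lbah, then the five "hob"
high-order bytes, then device) and 512-byte sectors as stated in the log. The
function name decode_cmd and the returned dictionary keys are made up for
illustration. It handles only the 48-bit/NCQ layout and does not special-case
a zero sector count.

```python
import re

# Field order assumed from libata's error report format.
CMD_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/"
    r"(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?P<hob_feature>[0-9a-f]{2}):(?P<hob_nsect>[0-9a-f]{2}):"
    r"(?P<hob_lbal>[0-9a-f]{2}):(?P<hob_lbam>[0-9a-f]{2}):"
    r"(?P<hob_lbah>[0-9a-f]{2})/"
    r"(?P<device>[0-9a-f]{2}) tag (?P<tag>\d+)",
    re.IGNORECASE,
)

# The two NCQ opcodes defined by the ATA command set.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}

SECTOR_SIZE = 512  # bytes, per "512-byte hdwr sectors" in the log


def decode_cmd(line):
    """Decode a libata 'cmd' report line; return None if it does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    f = {k: int(v, 16) for k, v in m.groupdict().items() if k != "tag"}

    # 48-bit LBA: high-order bytes first, then the low-order bytes.
    lba = 0
    for name in ("hob_lbah", "hob_lbam", "hob_lbal", "lbah", "lbam", "lbal"):
        lba = (lba << 8) | f[name]

    is_ncq = f["command"] in NCQ_OPCODES
    if is_ncq:
        # NCQ commands carry the sector count in the feature registers.
        sectors = (f["hob_feature"] << 8) | f["feature"]
    else:
        sectors = (f["hob_nsect"] << 8) | f["nsect"]

    return {
        "opcode": f["command"],
        "name": NCQ_OPCODES.get(f["command"], "unknown"),
        "ncq": is_ncq,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "tag": int(m.group("tag")),
    }
```

Applied to the line in the report, this gives opcode 0x60 (an NCQ read) of 24
sectors at LBA 0x0a22b3 with tag 0. 24 sectors of 512 bytes is 12288 bytes,
which agrees with "data 12288 in" on the same log line, so the timed-out
command is an ordinary queued read rather than a cache flush.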
PROBLEM: sata timeouts with intel 82801HB on amd64
(1) One-line summary: I'm getting SATA timeouts with Intel 82801HB on amd64.

(2) Full description: Unless CONFIG_RCU_TORTURE_TEST is set, I get SATA
timeouts of this form periodically:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata1.00: cmd 60/18:00:b3:22:0a/00:00:00:00:00/40 tag 0 cdb 0x0 data 12288 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
sda: Write Protect is off
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This entails complete blocking of all disk i/o (I only have one disk) for
about 45 seconds. The kernel then negotiates the next lowest transfer
speed (UDMA/133 all the way down to PIO0, when it errors saying it cannot
go slower). I get this issue on amd64 kernels only. The issue is only
present in 2.6.18+, since earlier kernels do not support my chipset at all
(intel 82801HB).

Knoppix 5.1.1 does not show this issue (i.e., no disk i/o issues even
without rcutorture running). However, a native amd64 build of exactly the
same kernel config shows the issue.

(3) Keywords: SATA, AHCI, modules, kernel, Intel.

(4) /proc/version: Linux delta 2.6.20 #15 SMP PREEMPT Sun Feb 4 20:25:02 EST
2007 x86_64 GNU/Linux.

(5) Most recent kernel version which did not have the bug: N/A

(6) Output of Oops.. message: N/A

(7) A small shell script or example program which triggers the problem:
simply logging in or accessing the disk in any way.

(8) Environment: Note: the following information was collected on a minimal
kernel with RCU_TORTURE_TEST enabled.
(8.1) Software

Gnu C                  4.1.2
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.3-pre2
e2fsprogs              1.40-WIP
xfsprogs               2.8.18
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.7
Net-tools              1.60
Kbd                    85:
Sh-utils               5.97
udev                   103

(8.2) Processor information:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping        : 6
cpu MHz         : 2397.649
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 4797.17
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping        : 6
cpu MHz         : 2397.649
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 4794.47
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(8.3) Module information: N/A

(8.4) /proc/ioports:

-0009ebff : System RAM
- : Crash kernel
0009ec00-0009 : reserved
000a-000b : Video RAM area
000c-000c7fff : Video ROM
000cd000-000cdfff : Adapter ROM
000ce000-000cefff : Adapter ROM
000f-000f : System ROM
0010-3ed98fff : System RAM
0020-005ccb4d : Kernel code
005ccb4e-0075d32f : Kernel data
3ed99000-3eda5fff : reserved
3eda6000-3ee3bfff : System RAM
3ee3c000-3eea8fff : ACPI Non-volatile Storage
3eea9000-3eeabfff : ACPI Tables
3eeac000-3eef1fff : ACPI Non-volatile Storage
3eef2000-3eefefff : ACPI Tables
3eeff000-3eef : System RAM
3ef0-3f7f : reserved
4000-4fff : :00:02.0
4000-4076 : vesafb
5000-500f : PCI Bus #06
5000-50003fff : :06:03.0
50004000-500047ff : :06:03.0
5010-501f : PCI Bus #02
5010-501001ff : :02:00.0
5020-502f : :00:02.0
5030-5031 : :00:19.0
5030-5031 : e1000
5032-50323fff : :00:1b.0
5032-50323fff : ICH HD audio
50324000-50324fff : :00:19.0
50324000-50324fff : e1000
50325000-503257ff : :00:1f.2
50325000-503257ff : ahci
50325800-50325bff : :00:1d.7
50325800-50325bff :
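[Editor's sketch] The listing above shows a memory region claimed by "ahci",
which bears on the ata_piix-versus-ahci question raised in the thread. On a
running system the same fact can be read from sysfs, where each PCI device
directory has a "driver" symlink when a driver is bound. The function name
bound_driver and the sysfs_root parameter (present only so the code can be
tested against a scratch directory) are assumptions. The PCI address in the
usage lines is a placeholder, since the addresses in the listing are truncated.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        # No 'driver' symlink: the device does not exist or nothing is bound.
        return None
    # The symlink points at .../drivers/<name>; the last component is the name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Placeholder address; take the real one from lspci output.
    addr = "0000:00:00.0"
    print(addr, "->", bound_driver(addr))
```

A result of "ahci" means the controller is running in AHCI mode, while
"ata_piix" means it is in legacy/IDE mode. The two drivers follow quite
different code paths, which is why the question matters when comparing this
report with similar ones.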