Bug#323452: "hda: dma_timer_expiry: dma status == 0x21" errors + freeze (VIA VT82C686 chipset)

2005-08-16 Thread Vincent Lefevre
Package: kernel-image-2.4.27-2-386
Version: 2.4.27-10
Severity: important

On this machine, I get the following error several times per hour:

hda: dma_timer_expiry: dma status == 0x21
hda: error waiting for DMA
hda: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

and the whole machine freezes for several seconds.
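
For readers unfamiliar with the two hex values in these messages, here is a minimal sketch that decodes them. It assumes the conventional ATA status register layout and the standard bus-master IDE status register layout. Only the three names printed in the log line are taken from the kernel output; the other labels, and the `decode` helper itself, are illustrative names of my own.

```python
# Bit masks for the ATA status register. The three names that appear in the
# log line above are copied from it; the remaining names are my own labels.
ATA_STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bit masks for the bus-master IDE status register (all labels are mine).
BM_DMA_STATUS_BITS = {
    0x80: "SimplexOnly",
    0x40: "Drive1DmaCapable",
    0x20: "Drive0DmaCapable",
    0x04: "Interrupt",
    0x02: "Error",
    0x01: "Active",
}


def decode(value, bits):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(bits.items(), reverse=True) if value & mask]


if __name__ == "__main__":
    print("status=0x58     ->", " ".join(decode(0x58, ATA_STATUS_BITS)))
    print("dma status 0x21 ->", " ".join(decode(0x21, BM_DMA_STATUS_BITS)))
```

Under these assumptions, 0x58 decodes to exactly the three flags the kernel prints. 0x21 reads as "drive 0 is DMA capable and the transfer is still marked active", with neither the interrupt bit nor the error bit set when the timer expired.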

You'll find below the output of dmesg and "lspci -v".

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.4.27-2-386
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)

Versions of packages kernel-image-2.4.27-2-386 depends on:
ii  coreutils [fileutils]  5.2.1-2     The GNU core utilities
ii  initrd-tools           0.1.81.1    tools to create initrd image for p
ii  modutils               2.4.26-1.2  Linux module utilities

-- no debconf information

Linux version 2.4.27-2-386 ([EMAIL PROTECTED]) (gcc version 3.3.5 (Debian 1:3.3.5-12)) #1 Mon May 16 16:47:51 JST 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 07ff (usable)
 BIOS-e820: 07ff - 07ff8000 (ACPI data)
 BIOS-e820: 07ff8000 - 0800 (ACPI NVS)
 BIOS-e820:  - 0001 (reserved)
127MB LOWMEM available.
On node 0 totalpages: 32752
zone(0): 4096 pages.
zone(1): 28656 pages.
zone(2): 0 pages.
ACPI disabled because your bios is from 97 and too old
You can enable it with acpi=force
Kernel command line: root=/dev/hda1 ro 
No local APIC present or hardware disabled
Initializing CPU#0
Detected 499.049 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 992.87 BogoMIPS
Memory: 123592k/131008k available (1069k kernel code, 7028k reserved, 459k data, 96k init, 0k highmem)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After generic, caps: 0081f9ff c0c1f9ff  
CPU: Common caps: 0081f9ff c0c1f9ff  
CPU: AMD-K7(tm) Processor stepping 02
Checking 'hlt' instruction... OK.
Checking for popad bug... OK.
POSIX conformance testing by UNIFIX
ACPI: Subsystem revision 20040326
ACPI: Interpreter disabled.
PCI: PCI BIOS revision 2.10 entry at 0xfdb01, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router VIA [1106/0686] at 00:07.0
PCI: Disabling Via external APIC routing
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
devfs: v1.12c (20020818) Richard Gooch ([EMAIL PROTECTED])
devfs: boot_options: 0x0
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with HUB-6 MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
COMX: driver version 0.85 (C) 1995-1999 ITConsult-Pro Co. <[EMAIL PROTECTED]>
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 3692 blocks [1 disk] into ram disk... done.
Freeing initrd memory: 3692k freed
VFS: Mounted root (cramfs filesystem).
Freeing unused kernel memory: 96k freed
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ide: late registration of driver.
VP_IDE: IDE controller at PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686a (rev 14) IDE UDMA66 controller on pci00:07.1
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf,

2005-08-16 Thread maximilian attems
On Tue, 16 Aug 2005, Vincent Lefevre wrote:

> On this machine, I get the following error several times per hour:
> 
> hda: dma_timer_expiry: dma status == 0x21
> hda: error waiting for DMA
> hda: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

your drive appears busted.
i would back up any important data it holds.
 

2005-08-16 Thread Vincent Lefevre
On 2005-08-17 00:34:15 +0200, maximilian attems wrote:
> On Tue, 16 Aug 2005, Vincent Lefevre wrote:
> 
> > On this machine, I get the following error several times per hour:
> > 
> > hda: dma_timer_expiry: dma status == 0x21
> > hda: error waiting for DMA
> > hda: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }
> 
> your drive appears busted.
> i would back up any important data it holds.

When searching for "dma_timer_expiry: dma status == 0x21" on Google,
I get about 787 pages. It could be a kernel problem as well. I've
installed another kernel image (the 2.6.8-2-k7) and will see if I
get the same errors. After 20 minutes, I haven't got any.

I've just installed smartmontools, and there are several errors
at disk power-on, but the short test completes without error.

-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / SPACES project at LORIA


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



2005-08-16 Thread Horms
On Wed, Aug 17, 2005 at 01:53:47AM +0200, Vincent Lefevre wrote:
> On 2005-08-17 00:34:15 +0200, maximilian attems wrote:
> > On Tue, 16 Aug 2005, Vincent Lefevre wrote:
> > 
> > > On this machine, I get the following error several times per hour:
> > > 
> > > hda: dma_timer_expiry: dma status == 0x21
> > > hda: error waiting for DMA
> > > hda: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }
> > 
> > your drive appears busted.
> > i would back up any important data it holds.
> 
> When searching for "dma_timer_expiry: dma status == 0x21" on Google,
> I get about 787 pages. It could be a kernel problem as well. I've
> installed another kernel image (the 2.6.8-2-k7) and will see if I
> get the same errors. After 20 minutes, I haven't got any.
> 
> I've just installed smartmontools, and there are several errors
> at disk power-on, but the short test completes without error.

I agree with Maximilian: errors like this typically indicate that a
disk is failing. You should back up your data and replace the disk.

-- 
Horms





2005-08-17 Thread Vincent Lefevre
On 2005-08-17 13:25:07 +0900, Horms wrote:
> I agree with Maximilian: errors like this typically indicate that a
> disk is failing. You should back up your data and replace the disk.

Then is there any reason why SMART doesn't detect any problem
except some errors at power-on time? And why don't I get such
errors with a 2.6 kernel (though I haven't done much testing)
if it is not a problem with the 2.4 kernel?

The whole machine will be replaced in the near future.

-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / SPACES project at LORIA





2005-08-17 Thread Andres Salomon
On Wed, Aug 17, 2005 at 12:02:31PM +0200, Vincent Lefevre wrote:
[...]
> 
> Then is there any reason why SMART doesn't detect any problem
> except some errors at power-on time? And why don't I get such
> errors with a 2.6 kernel (though I haven't done much testing)
> if it is not a problem with the 2.4 kernel?
> 

What exactly are the SMART errors?  If you force a smart short or long test, 
do you get errors?







2005-08-17 Thread Vincent Lefevre
On 2005-08-17 12:11:24 -0400, Andres Salomon wrote:
> What exactly are the SMART errors?  If you force a smart short or long test, 
> do you get errors?

No errors for short and long tests.
I've attached the output of "smartctl -a".
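
As a rough cross-check of the attribute table in that output, here is a small sketch that applies the usual smartctl rule: an attribute counts as failed when its normalized VALUE is at or below a non-zero THRESH. The sample rows are retyped from the attachment with each wrapped row joined onto one line. `Attr`, `parse_row` and `has_failed` are hypothetical helper names, not part of smartmontools.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    ident: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # failure threshold (0 means the attribute never fails)
    kind: str    # "Pre-fail" or "Old_age"
    raw: int     # vendor-specific raw counter


# A few rows retyped from the attached output, one attribute per line.
ROWS = [
    "3 Spin_Up_Time 0x0027 234 234 063 Pre-fail Always - 50",
    "5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 0",
    "10 Spin_Retry_Count 0x002b 253 252 223 Pre-fail Always - 89",
    "199 UDMA_CRC_Error_Count 0x0008 199 199 000 Old_age Offline - 4920",
]


def parse_row(row: str) -> Attr:
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    f = row.split()
    return Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[6], int(f[9]))


def has_failed(attr: Attr) -> bool:
    """True when the normalized value is at or below a non-zero threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


attrs = [parse_row(r) for r in ROWS]
failing = [a.name for a in attrs if has_failed(a)]
nonzero_raw = [(a.name, a.raw) for a in attrs if a.raw != 0]

if __name__ == "__main__":
    print("failing attributes:", failing or "none")
    for name, raw in nonzero_raw:
        print(f"non-zero raw counter: {name} = {raw}")
```

None of the sampled rows crosses its threshold, which matches the PASSED overall-health result. The raw counters are vendor specific, so the sketch only lists the non-zero ones and does not try to interpret them.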

-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 91360U4
Serial Number:    C40EF2RC
Firmware Version: MA540RR0
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   4
ATA Standard is:  ATA/ATAPI-4 T13 1153D revision 17
Local Time is:    Wed Aug 17 19:06:54 2005 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  14) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   253   252   000    Old_age   Always       -       4295265052
  3 Spin_Up_Time            0x0027   234   234   063    Pre-fail  Always       -       50
  4 Start_Stop_Count        0x0032   252   252   000    Old_age   Always       -       3252
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   251   000    Old_age   Always       -       51826
  8 Seek_Time_Performance   0x0027   250   237   187    Pre-fail  Always       -       163419211169726
  9 Power_On_Hours          0x0032   237   237   000    Old_age   Always       -       346205
 10 Spin_Retry_Count        0x002b   253   252   223    Pre-fail  Always       -       89
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       77
 12 Power_Cycle_Count       0x0032   249   249   000    Old_age   Always       -       1884
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       4920
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       279723
201 Soft_Read_Error_Rate    0x000a   253   251   000    Old_age   Always       -       25770101532
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       297756
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       297756
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       297756
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       297756
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       89
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always