Hitachi 7K1000 1tb drives and sata_sil 3114 chipset
I've been encountering many, many problems with Hitachi 1TB drives on a SiI 3114 chipset, and I'm wondering whether something could be wrong with the driver/chipset in relation to these drives. So far, 6 out of 7 drives have exhibited very strange failure conditions while attached to this controller. Some symptoms:

- Clicking noises while in operation
- What sounds/feels like the drive spinning down quickly and back up again, with no console output
- SMART reporting seek read errors that mysteriously appear and then disappear completely
- Failed I/O requests

I most recently swapped out the 1TB drives for 500GB Hitachi models and have not experienced any of the problems above.

The most recent failed I/O requests produced lots of messages, which I've pasted below. I triggered the I/O errors by running many simultaneous copies of large files between two drives to test the configuration.

If anyone has any idea whether this could be some kind of incompatibility or bug, let me know. Positive or negative experiences with these drives on this controller would also help.

Thanks,
-Andrew

sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1559281795
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1563089767
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 1563115585
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdb, sector 7775
printk: 124 messages suppressed.
Buffer I/O error on device sdb1, logical block 3856
Buffer I/O error on device sdb1, logical block 3857
Buffer I/O error on device sdb1, logical block 3858
Buffer I/O error on device sdb1, logical block 3859
EXT3-fs error (device sdb1): ext3_readdir: directory #2 contains a hole at offset 0

# lspci -vvnnxxx -s 00:11.0
00:11.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
	Subsystem: Silicon Image, Inc. SiI 3114 SATARaid Controller [1095:6114]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at fd00 [size=8]
	Region 1: I/O ports at fc00 [size=4]
	Region 2: I/O ports at fb00 [size=8]
	Region 3: I/O ports at fa00 [size=4]
	Region 4: I/O ports at f900 [size=16]
	Region 5: Memory at efffb000 (32-bit, non-prefetchable) [size=1K]
	[virtual] Expansion ROM at 1000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 95 10 14 31 07 00 b0 02 02 00 04 01 08 40 00 00
10: 01 fd 00 00 01 fc 00 00 01 fb 00 00 01 fa 00 00
20: 01 f9 00 00 00 b0 ff ef 00 00 00 00 95 10 14 61
30: 00 00 00 00 60 00 00 00 00 00 00 00 0b 01 00 00
40: 02 00 00 00 02 c0 81 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
70: 00 00 60 00 00 f0 0a 0f 00 00 60 00 00 b0 15 0f
80: 03 00 00 00 03 00 00 00 00 00 00 00 3f 5b ca 53
90: 00 00 00 08 ff ff 00 00 00 00 00 19 00 00 00 00
a0: 01 31 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
b0: 01 31 15 65 dd 62 dd 62 92 43 92 43 09 40 09 40
c0: 84 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
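The raw config-space dump above can be checked against the decoded lspci lines. The following is a minimal sketch. It assumes the standard PCI type-0 header layout, which is little-endian, with the fields at these offsets:

- vendor ID at 0x00, device ID at 0x02
- revision at 0x08
- class code at 0x09-0x0b
- BAR5 at 0x24
- subsystem IDs at 0x2c/0x2e

The helper names are hypothetical and belong to no existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCI configuration space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg_read16(cfg, off) |
	       ((uint32_t)cfg_read16(cfg, off + 2) << 16);
}

struct pci_ids {
	uint16_t vendor, device;         /* shown as [1095:3114] by lspci -nn */
	uint16_t sub_vendor, sub_device; /* shown as [1095:6114] */
	uint16_t class_code;             /* base class << 8 | subclass */
	uint8_t revision;
	uint32_t bar5;                   /* MMIO base with flag bits masked off */
};

/* Decode the first 64 bytes (the standard header) of a type-0 device. */
static struct pci_ids decode_header(const uint8_t cfg[64])
{
	struct pci_ids ids;

	assert(cfg != NULL);
	ids.vendor     = cfg_read16(cfg, 0x00);
	ids.device     = cfg_read16(cfg, 0x02);
	ids.revision   = cfg[0x08];
	ids.class_code = (uint16_t)((cfg[0x0b] << 8) | cfg[0x0a]);
	/* the low 4 bits of a memory BAR are type/prefetch flags */
	ids.bar5       = cfg_read32(cfg, 0x24) & ~0xfu;
	ids.sub_vendor = cfg_read16(cfg, 0x2c);
	ids.sub_device = cfg_read16(cfg, 0x2e);
	return ids;
}
```

Fed rows 00 through 30 of the dump, this yields 1095:3114 rev 02, class 0x0104 (RAID), subsystem 1095:6114 and BAR5 = 0xefffb000. These match the "Region 5: Memory at efffb000" line.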
Re: [PATCHSET libata-dev#upstream] clean up scsi_host_templates and ata_port_operations
Tejun Heo wrote:
The following drivers need a specific platform to build, so they need verification. If you work on one of the following drivers, please verify that the driver builds and works fine. It would be best if you could verify that the sht and ops don't change by the fifth patch, using the method I'll write up in another message.
..
* pata_scc

I checked pata_scc and it works fine. Your verification method detects one difference in ops (.thaw: NULL -> ata_bmdma_thaw), but it causes no problem.

Cheers
RE: About forcing 32bit DMA patch for AMD690G(SB600)
Hi Andrew:

Thanks for your help on your platform. Is there any update on your side regarding SB600 64-bit DMA capability? As Tejun mentioned, the test result on my SB600 engineering board (RS690 A12 + SB600 A21) is a little different from Srihari's result. But I do not have other SB600 boards, especially the ASUS M2A-VM, to debug further. So if you can provide us your test result, that would be really good.

Thanks
Shane

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Paprocki
Sent: Saturday, January 26, 2008 9:08 AM
To: Tejun Heo
Cc: Shane Huang; [EMAIL PROTECTED]; linux-ide@vger.kernel.org
Subject: Re: About forcing 32bit DMA patch for AMD690G(SB600)

I'll try to get that configuration together. Right now I only have two 1GB sticks installed on the board, so I would need to track down 2GB ones. If I can find some lying around, I'll let you know.

Thanks,
-Andrew

On Jan 25, 2008 12:50 AM, Tejun Heo [EMAIL PROTECTED] wrote:
Andrew Paprocki wrote:
I have an SB600/RS690 here with SATA drives connected. I haven't been following this thread, but I can help test something if it would help.

We're trying to determine whether the SB600 AHCI controller can do 64-bit DMA or not. Srihari's couldn't, but Shane's test result tells a different story. Do you have memory mapped over 4G? (If you have 4G, some of it will be above 4G; you can tell by looking at the e820 map printed during boot.)
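Tejun's question ("do you have memory mapped over 4G?") comes down to scanning the boot-time e820 map for usable ranges that end past 0x100000000. The following is a minimal sketch. The struct and helper names are hypothetical, and ranges are treated as [start, end), the way they are printed. Type 1 stands for "(usable)". The values in the usage check are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define E820_USABLE 1               /* "(usable)" in the boot log */
#define FOUR_GIB    0x100000000ULL

struct e820_range {
	uint64_t start;  /* first byte of the range */
	uint64_t end;    /* one past the last byte */
	int type;
};

/* Number of usable bytes that lie at or above the 4 GiB boundary. */
static uint64_t usable_bytes_above_4g(const struct e820_range *map, size_t n)
{
	uint64_t total = 0;

	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* clip the range so only the part above 4 GiB is counted */
		uint64_t lo = map[i].start > FOUR_GIB ? map[i].start : FOUR_GIB;

		if (map[i].type == E820_USABLE && map[i].end > lo)
			total += map[i].end - lo;
	}
	return total;
}

/* True means buffers can land above 4 GiB, so 64-bit DMA is really exercised. */
static bool has_ram_above_4g(const struct e820_range *map, size_t n)
{
	return usable_bytes_above_4g(map, n) != 0;
}
```

A machine with 2GB of RAM and a normal PCI hole usually has nothing above 4 GiB. On such a machine, a "64-bit DMA works" result says little about the controller.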
Re: [PATCH 09/13] sata_mv ncq Use DMA memory pools for hardware memory tables
Mark Lord writes:
Tejun Heo wrote:
..
I'm skeptical about the benefit of IRQ coalescing on storage controllers. Coalescing improves performance when there are many small requests to complete, but if you put a lot of small non-consecutive requests to a disk, it gets really, really, really slow and IRQ coalescing just doesn't matter at all. The only way to achieve a high number of completions is to issue small commands to consecutive addresses, which is just silly. In storage, high-volume transfer is achieved through request coalescing, not completion coalescing, and this is true even for SSDs.
..
One cool thing with the Marvell cores is that they actually implement transaction-based IRQ coalescing, whereby a number of related I/O commands (say, all the RAID5 member commands generated by a single R/W request) can be tagged together, generating an interrupt only when they all complete (or after a timeout if something goes wrong).

We don't have anything resembling an appropriate abstraction for that yet, so I doubt that we could really take advantage of it.

Promise SATA controllers have this feature too, though sata_promise doesn't make use of it.
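The transaction-based coalescing Mark describes can be pictured as a shared countdown. Every member command of one transaction decrements it, and only the last completion interrupts the host. This is a hypothetical model of the idea. It is not the Marvell or Promise register interface, and it leaves out the timeout path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: all member commands of one transaction (e.g. the
 * per-disk commands generated by a single RAID5 write) share a counter.
 * The host is interrupted once, when the last member completes.
 * The "interrupt anyway after a timeout" path is not modelled.
 */
struct txn {
	unsigned int outstanding; /* member commands still in flight */
	unsigned int irqs;        /* host interrupts raised for this transaction */
};

static void txn_start(struct txn *t, unsigned int members)
{
	assert(members > 0);
	t->outstanding = members;
	t->irqs = 0;
}

/* Called once per member completion; returns true if the host gets an IRQ now. */
static bool txn_member_done(struct txn *t)
{
	if (t->outstanding == 0)
		return false;     /* stray completion: ignore */
	if (--t->outstanding > 0)
		return false;     /* other members still pending: stay quiet */
	t->irqs++;
	return true;
}
```

The abstraction Mark says is missing is the grouping step. The block layer would have to tell the driver which commands belong to the same transaction before they are issued.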
RE: About forcing 32bit DMA patch for AMD690G(SB600)
Dear Tejun:

The test results point to varied kinds and degrees of problems at the moment. To avoid turning off anything fancy on systems involving SB600/700, we definitely need more info. Shane, can you please summarize the chipset product lines and revisions and how they're configured together (e.g. SB600 Axx goes together with RSxxx, that kind of thing)?

I'll have to ask others for help summarizing them, and will post it here once I have it.

Currently the following issues have been discovered, and we need to find out what's caused by which.
..
..
* Shane's test with RS690 + SB600 triggered a weird SERR_INTERNAL error condition if pci=nomsi is used instead of quirk_disable_all_msi. This is super-weird. Maybe it's the difference in memory layout, and 64-bit DMA actually didn't work? Shane, can you please do some data write/read/verify test on the setup?

I will debug these issues further later, because I'm busy with other issues right now and my SB600 board is being used by someone else.. :-(

Thanks
Shane
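A data write/read/verify test of the kind Tejun asks for can be very simple: write a known pattern, read it back and compare. The following is a minimal user-space sketch using stdio. The function names and the pattern are arbitrary choices. Each block carries its own block number in its first 8 bytes, so data that lands at the wrong address is caught, not just flipped bits. Against a real disk, the read-back must bypass the page cache (for example by dropping caches first, or by opening with O_DIRECT). Otherwise it only verifies RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pattern depends on block number and offset; the first 8 bytes
 * carry the block number itself so misplaced blocks are detected. */
static void fill_block(uint8_t *buf, size_t len, uint64_t block)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(block * 131u + i * 7u);
	if (len >= sizeof block)
		memcpy(buf, &block, sizeof block);
}

/* Writes nblocks patterned blocks from the start of f.
 * Returns 0 on success, -1 on allocation or I/O failure. */
static int write_blocks(FILE *f, size_t nblocks, size_t blksz)
{
	uint8_t *buf;
	int ret = 0;

	assert(f != NULL && blksz > 0);
	buf = malloc(blksz);
	if (!buf)
		return -1;
	rewind(f);
	for (size_t b = 0; b < nblocks && ret == 0; b++) {
		fill_block(buf, blksz, b);
		if (fwrite(buf, 1, blksz, f) != blksz)
			ret = -1;
	}
	if (ret == 0 && fflush(f) != 0)
		ret = -1;
	free(buf);
	return ret;
}

/* Re-reads the blocks; returns the number that mismatch, or -1 on failure. */
static long verify_blocks(FILE *f, size_t nblocks, size_t blksz)
{
	uint8_t *want = malloc(blksz), *got = malloc(blksz);
	long bad = 0;

	if (!want || !got) {
		free(want);
		free(got);
		return -1;
	}
	rewind(f);
	for (size_t b = 0; b < nblocks; b++) {
		if (fread(got, 1, blksz, f) != blksz) {
			bad = -1;
			break;
		}
		fill_block(want, blksz, b);
		if (memcmp(want, got, blksz) != 0)
			bad++;
	}
	free(want);
	free(got);
	return bad;
}
```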
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31 2008, Nai Xia wrote:
My relevant dmesg output is quite similar:
[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data , len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158 bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data , len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158 bi_size=158
[ 14.851101] eth0: link up
[ 27.121883] eth0: no IPv6 routers present
And by the way, Kiyoshi, this can be reproduced in a typical setup: VMware Workstation 6.02 with a virtual IDE CD-ROM, in case you want to catch it with your own eyes. :-)
Thanks for trying so hard to correct this annoying bug.

The fix below should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	 */
 	if ((stat & DRQ_STAT) == 0) {
 		spin_lock_irqsave(&ide_lock, flags);
-		if (__blk_end_request(rq, 0, 0))
+		if (__blk_end_request(rq, 0, rq->data_len))
 			BUG();
 		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);

--
Jens Axboe
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31 2008, Florian Lohoff wrote:
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
The fix below should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	 */
 	if ((stat & DRQ_STAT) == 0) {
 		spin_lock_irqsave(&ide_lock, flags);
-		if (__blk_end_request(rq, 0, 0))
+		if (__blk_end_request(rq, 0, rq->data_len))
 			BUG();
 		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);

Fixes the crash on boot for me ...

Great, thanks for confirming that. I'll make sure the patch goes upstream today, if Linus is available.

--
Jens Axboe
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
The fix below should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	 */
 	if ((stat & DRQ_STAT) == 0) {
 		spin_lock_irqsave(&ide_lock, flags);
-		if (__blk_end_request(rq, 0, 0))
+		if (__blk_end_request(rq, 0, rq->data_len))
 			BUG();
 		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);

Fixes the crash on boot for me ...

Flo
--
Florian Lohoff [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little security shall soon have neither - Benjamin Franklin
Add one more HITACHI SATA disk to NCQ blacklist
Hi,

The hard disk with model num: HITACHI HTS541616J9SA00, model rev: SB4IC7UP is causing NCQ errors and should be blacklisted.

Currently the blacklist for Hitachi hard disks includes

	{ "HITACHI HDS7250SASUN500G*", NULL, ATA_HORKAGE_NONCQ },
	{ "HITACHI HDS7225SBSUN250G*", NULL, ATA_HORKAGE_NONCQ },
	...
	/* Blacklist entries taken from Silicon Image 3124/3132
	   Windows driver .inf file - also several Linux problem reports */
	{ "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
	{ "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
	{ "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=drivers/ata/libata-core.c;hb=HEAD

The same hard disk is causing NCQ errors for other users:
http://www.nabble.com/hdparm--B-1-and-Load_Cycle-on-Hitachi-HTS541616J9SA00-td14702758.html
https://bugs.edge.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/137470

Could you please add the entry

	{ "HITACHI HTS541616J9SA00", NULL, ATA_HORKAGE_NONCQ, },

Simos
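The entries above pair a model string, which may end in a `*` prefix wildcard, with an optional firmware revision, where NULL means "any revision". Below is a hypothetical sketch of that lookup, written for illustration only. It is not libata's implementation, and MODEL_HORKAGE_NONCQ is an illustrative flag value, not the kernel's constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_HORKAGE_NONCQ (1UL << 2)   /* illustrative value only */

struct blacklist_entry {
	const char *model;     /* exact string, or a prefix ending in '*' */
	const char *rev;       /* NULL matches any firmware revision */
	unsigned long horkage;
};

/* Exact match, or prefix match when the pattern ends in '*'. */
static bool pattern_match(const char *patt, const char *name)
{
	size_t n = strlen(patt);

	if (n > 0 && patt[n - 1] == '*')
		return strncmp(patt, name, n - 1) == 0;
	return strcmp(patt, name) == 0;
}

/* Returns the horkage flags of the first matching entry, or 0. */
static unsigned long lookup_horkage(const struct blacklist_entry *tbl, size_t n,
				    const char *model, const char *rev)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!pattern_match(tbl[i].model, model))
			continue;
		if (tbl[i].rev && strcmp(tbl[i].rev, rev) != 0)
			continue;
		return tbl[i].horkage;
	}
	return 0;
}
```

A NULL revision, as in the requested entry, disables NCQ for every firmware of that model. Listing a specific revision instead limits the blacklist to the firmware actually known to misbehave.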
Re: Add one more HITACHI SATA disk to NCQ blacklist
Mark Lord wrote:
Simos Xenitellis wrote:
Hi,
The hard disk with model num: HITACHI HTS541616J9SA00, model rev: SB4IC7UP is causing NCQ errors and should be blacklisted.
..
https://bugs.edge.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/137470
..
That one is for 2.6.22. We need to know if the problem exists with 2.6.24, as the NCQ code has had a number of fixes since July 2007.
..

Answering my own question here: if one actually reads through the thread linked to above, it says that the 2.6.24 kernel seems to have resolved all issues.

..
Could you please add the entry
{ "HITACHI HTS541616J9SA00", NULL, ATA_HORKAGE_NONCQ, },
..

There doesn't appear to be any reason to do this, unless you have more information?
Re: Add one more HITACHI SATA disk to NCQ blacklist
Simos Xenitellis wrote:
Hi,
The hard disk with model num: HITACHI HTS541616J9SA00, model rev: SB4IC7UP is causing NCQ errors and should be blacklisted.
Currently the blacklist for Hitachi hard disks includes
	{ "HITACHI HDS7250SASUN500G*", NULL, ATA_HORKAGE_NONCQ },
	{ "HITACHI HDS7225SBSUN250G*", NULL, ATA_HORKAGE_NONCQ },
	...
	/* Blacklist entries taken from Silicon Image 3124/3132
	   Windows driver .inf file - also several Linux problem reports */
	{ "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
	{ "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
	{ "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=drivers/ata/libata-core.c;hb=HEAD
The same hard disk is causing NCQ errors for other users:
http://www.nabble.com/hdparm--B-1-and-Load_Cycle-on-Hitachi-HTS541616J9SA00-td14702758.html
..

Those first two references above look totally bogus -- they have nothing to do with NCQ.

https://bugs.edge.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/137470
..

That one is for 2.6.22. We need to know if the problem exists with 2.6.24, as the NCQ code has had a number of fixes since July 2007.

Also, more information about the hardware involved would be useful. An NCQ bug could be due to the SATA chipset more than the drive, or perhaps only to the combination of the two.

Cheers

Could you please add the entry
{ "HITACHI HTS541616J9SA00", NULL, ATA_HORKAGE_NONCQ, },
Simos
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Jens,

On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
On Thu, Jan 31 2008, Nai Xia wrote:
My relevant dmesg output is quite similar:
[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data , len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158 bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data , len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158 bi_size=158
[ 14.851101] eth0: link up
[ 27.121883] eth0: no IPv6 routers present
And by the way, Kiyoshi, this can be reproduced in a typical setup: VMware Workstation 6.02 with a virtual IDE CD-ROM, in case you want to catch it with your own eyes. :-)
Thanks for trying so hard to correct this annoying bug.

The fix below should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	 */
 	if ((stat & DRQ_STAT) == 0) {
 		spin_lock_irqsave(&ide_lock, flags);
-		if (__blk_end_request(rq, 0, 0))
+		if (__blk_end_request(rq, 0, rq->data_len))
 			BUG();
 		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);

OK, I understand the leftover is legal. By the way, is it safe to always return success if there is a leftover? I thought we might have to complete the rq with -EIO in such a case.

Thanks,
Kiyoshi Ueda
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On 31/01/2008, at 18.04, Kiyoshi Ueda [EMAIL PROTECTED] wrote:
Hi Jens,
On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
On Thu, Jan 31 2008, Nai Xia wrote:
My relevant dmesg output is quite similar:
[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data , len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158 bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data , len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158 bi_size=158
[ 14.851101] eth0: link up
[ 27.121883] eth0: no IPv6 routers present
And by the way, Kiyoshi, this can be reproduced in a typical setup: VMware Workstation 6.02 with a virtual IDE CD-ROM, in case you want to catch it with your own eyes. :-)
Thanks for trying so hard to correct this annoying bug.

The fix below should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	 */
 	if ((stat & DRQ_STAT) == 0) {
 		spin_lock_irqsave(&ide_lock, flags);
-		if (__blk_end_request(rq, 0, 0))
+		if (__blk_end_request(rq, 0, rq->data_len))
 			BUG();
 		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);

OK, I understand the leftover is legal. By the way, is it safe to always return success if there is a leftover? I thought we might have to complete the rq with -EIO in such a case.

data_len being non-zero should pass the residual count back to the issuer.
Re: Bug: get EXT3-fs error Allocating block in system zone
Robert Hancock wrote:
Linus Torvalds wrote:
On Mon, 10 Dec 2007, Marco Gatti wrote:
It didn't compile completely.
drivers/scsi/scsi_lib.c:1565:1: error: unterminated #else

Heh. That #else should be an #endif, of course.

It is a bit strange that it still tries to do IO to high memory. Either the whole 64-bit capability thing in AHCI is broken, or the bounce buffering doesn't work right. Or maybe you tried the iommu=off without the original patch that tried to turn off 64-bit DMA?

Linus

From what I can see, it appears that iommu=off disables the IOMMU but doesn't actually do anything to prevent attempts to DMA above 4GB. If you try to map something over 4GB it just chokes with that mask overflow (in arch/x86/kernel/pci-nommu_64.c). The iommu=off option actually seems rather useless, as it's the default in the only case where it will actually work (no memory above 4GB)..

Hi,
I finally got a BIOS update from Fujitsu-Siemens-Computers that solved the problem. Now it works with 2.6.24. In case it's of interest, I've added the dmesg here:

Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #2 SMP Thu Jan 31 19:38:52 CET 2008
Command line: root=/dev/sda3 udev
BIOS-provided physical RAM map:
 BIOS-e820: - 0009c800 (usable)
 BIOS-e820: 0009c800 - 000a (reserved)
 BIOS-e820: 000ce000 - 000d (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - df5b (usable)
 BIOS-e820: df5b - df5c4000 (ACPI data)
 BIOS-e820: df5c4000 - df5c7000 (ACPI NVS)
 BIOS-e820: df5c7000 - e000 (reserved)
 BIOS-e820: f800 - fc00 (reserved)
 BIOS-e820: fec0 - fec1 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
 BIOS-e820: 0001 - 00020e00 (usable)
 BIOS-e820: 00020e00 - 00021000 (reserved)
Entering add_active_range(0, 0, 156) 0 entries of 3200 used
Entering add_active_range(0, 256, 914864) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 2154496) 2 entries of 3200 used
end_pfn_map = 2162688
DMI present.
ACPI: RSDP 000F7350, 0014 (r0 PTLTD )
ACPI: RSDT DF5BEDF9, 0058 (r1 PTLTDRSDT 6 LTP0)
ACPI: FACP DF5C3AF3, 0074 (r1 FSC6 F4240)
ACPI: DSDT DF5BEE51, 4CA2 (r1 FSCD2587/A16 MSFT 301)
ACPI: FACS DF5C6FC0, 0040
ACPI: TCPA DF5C3B67, 0032 (r1 Phoeni x 6 TL 0)
ACPI: _MAR DF5C3B99, 0030 (r1 Intel OEMDMAR 6 LOHR1)
ACPI: SSDT DF5C3BC9, 007A (r1 FSCCST_CPU06 CSF1)
ACPI: SSDT DF5C3C43, 007A (r1 FSCCST_CPU16 CSF1)
ACPI: SSDT DF5C3CBD, 00B6 (r1 FSCPST_CPU06 CSF1)
ACPI: SSDT DF5C3D73, 00B6 (r1 FSCPST_CPU16 CSF1)
ACPI: SPCR DF5C3E29, 0050 (r1 PTLTD $UCRTBL$6 PTL 1)
ACPI: MCFG DF5C3E79, 003C (r1 PTLTDMCFG 6 LTP0)
ACPI: HPET DF5C3EB5, 0038 (r1 PTLTD HPETTBL 6 LTP1)
ACPI: APIC DF5C3EED, 0068 (r1 PTLTD APIC 6 LTP0)
ACPI: BOOT DF5C3F55, 0028 (r1 PTLTD $SBFTBL$6 LTP1)
ACPI: ASF! DF5C3F7D, 0083 (r16 CETP CETP6 PTL 1)
ACPI: DMI detected: Fujitsu Siemens
No NUMA configuration found
Faking a node at -00020e00
Entering add_active_range(0, 0, 156) 0 entries of 3200 used
Entering add_active_range(0, 256, 914864) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 2154496) 2 entries of 3200 used
Bootmem setup node 0 -00020e00
 [e200-e21f] PMD -81000120 on node 0
 [e220-e23f] PMD -81000160 on node 0
 [e240-e25f] PMD -810001a0 on node 0
 [e260-e27f] PMD -810001e0 on node 0
 [e280-e29f] PMD -81000220 on node 0
 [e2a0-e2bf] PMD -81000260 on node 0
 [e2c0-e2df] PMD -810002a0 on node 0
 [e2e0-e2ff] PMD -810002e0 on node 0
 [e2000100-e200011f] PMD -81000320 on node 0
 [e2000120-e200013f] PMD -81000360 on node 0
 [e2000140-e200015f] PMD -810003a0 on node 0
 [e2000160-e200017f] PMD -810003e0 on node 0
 [e2000180-e200019f] PMD -81000420 on node 0
 [e20001a0-e20001bf] PMD -81000460 on node 0
 [e20001c0-e20001df] PMD -810004a0 on node 0
 [e20001e0-e20001ff] PMD -810004e0 on node 0
 [e2000200-e200021f] PMD -81000520 on node 0
 [e2000220-e200023f]
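Robert's explanation comes down to a simple range check. Without an IOMMU remapping or bounce buffering, a buffer is only usable for DMA if its whole bus-address range fits under the device's DMA mask. The following is a minimal sketch of that check. The names are hypothetical, and dma_mask_bits() just spells out the usual (1 << n) - 1 construction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering an n-bit bus address, e.g. 32 -> 0xffffffff. */
static uint64_t dma_mask_bits(unsigned int n)
{
	assert(n > 0 && n <= 64);
	return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* True if every byte of [addr, addr + size) is reachable under mask. */
static bool dma_range_ok(uint64_t addr, uint64_t size, uint64_t mask)
{
	uint64_t last;

	if (size == 0)
		return true;
	last = addr + size - 1;
	if (last < addr)            /* range wraps past 2^64 */
		return false;
	return last <= mask;
}
```

The add_active_range() lines show usable RAM starting at PFN 1048576, which is 4 GiB. With a 32-bit mask, a page there fails this check. So with iommu=off, something else has to keep such pages away from the device or bounce them.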
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Jens,

On Thu, 31 Jan 2008 19:16:54 +0100, Jens Axboe wrote:
On 31/01/2008, at 18.04, Kiyoshi Ueda [EMAIL PROTECTED] wrote:
On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
On Thu, Jan 31 2008, Nai Xia wrote:
My relevant dmesg output is quite similar:
[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data , len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158 bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data , len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158 bi_size=158
[ 14.851101] eth0: link up
[ 27.121883] eth0: no IPv6 routers present
And by the way, Kiyoshi, this can be reproduced in a typical setup: VMware Workstation 6.02 with a virtual IDE CD-ROM, in case you want to catch it with your own eyes. :-)
Thanks for trying so hard to correct this annoying bug.

The fix below should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	 */
 	if ((stat & DRQ_STAT) == 0) {
 		spin_lock_irqsave(&ide_lock, flags);
-		if (__blk_end_request(rq, 0, 0))
+		if (__blk_end_request(rq, 0, rq->data_len))
 			BUG();
 		HWGROUP(drive)->rq = NULL;
 		spin_unlock_irqrestore(&ide_lock, flags);

OK, I understand the leftover is legal. By the way, is it safe to always return success if there is a leftover? I thought we might have to complete the rq with -EIO in such a case.

data_len being non-zero should pass the residual count back to the issuer.

Ah, so the issuer can tell how many bytes of the I/O were not done, and the error status of the bio completed by end_that_request_first() in __blk_end_request() doesn't matter to the issuer. OK, thanks. I think the patch is fine.

Thanks,
Kiyoshi Ueda
Re: Add one more HITACHI SATA disk to NCQ blacklist
On Thu, 2008-01-31 at 10:07 -0700, Eric D. Mudama wrote:
I think the spurious completions issue was addressed in the 2.6.24-rc development series and is no longer an issue. Of the two links you report as NCQ errors, the first is from 2.6.22 (predating the fix) and the second is a SMART issue, not an NCQ issue.

Thanks Mark, Eric for the replies. My original post with the error messages and the details of the disk is at
http://www.mail-archive.com/linux-ide@vger.kernel.org/msg12243.html
I'll have to test the latest version of the kernel to see if the problem still persists.

Cheers,
Simos

--eric

On Jan 31, 2008 9:33 AM, Simos Xenitellis [EMAIL PROTECTED] wrote:
Hi,
The hard disk with model num: HITACHI HTS541616J9SA00, model rev: SB4IC7UP is causing NCQ errors and should be blacklisted.
Currently the blacklist for Hitachi hard disks includes
	{ "HITACHI HDS7250SASUN500G*", NULL, ATA_HORKAGE_NONCQ },
	{ "HITACHI HDS7225SBSUN250G*", NULL, ATA_HORKAGE_NONCQ },
	...
	/* Blacklist entries taken from Silicon Image 3124/3132
	   Windows driver .inf file - also several Linux problem reports */
	{ "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
	{ "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
	{ "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=drivers/ata/libata-core.c;hb=HEAD
The same hard disk is causing NCQ errors for other users:
http://www.nabble.com/hdparm--B-1-and-Load_Cycle-on-Hitachi-HTS541616J9SA00-td14702758.html
https://bugs.edge.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/137470
Could you please add the entry
{ "HITACHI HTS541616J9SA00", NULL, ATA_HORKAGE_NONCQ, },
Simos
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
On Thu, Jan 31 2008, Nai Xia wrote:
The relevant dmesg output on my machine is quite similar:

[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data , len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158 bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data , len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158 bi_size=158
[ 14.851101] eth0: link up
[ 27.121883] eth0: no IPv6 routers present

And by the way, Kiyoshi, this can be reproduced in a typical setup: VMware Workstation 6.02 with a virtual IDE CD-ROM, in case you want to catch it with your own eyes. :-)
Thanks for trying hard to correct this annoying bug.

The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion; it happens all the time for, e.g., user-issued commands where you don't know the exact byte count.

Actually, this behavior has been the case even before the __blk_end_request() changes. I did test plain 2.6.24 with the following

--- linux-2.6/drivers/ide/ide-cd.c	2008-01-31 22:18:59.0 +0100
+++ linux-2.6/drivers/ide/ide-cd.c-new	2008-01-31 22:18:50.0 +0100
@@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
 	/*
 	 * If DRQ is clear, the command has completed.
 	 */
-	if ((stat & DRQ_STAT) == 0)
+	if ((stat & DRQ_STAT) == 0) {
+		blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
+		printk("backup: data_len=%u bi_size=%u\n",
+			rq->data_len, rq->bio->bi_size);
 		goto end_request;
+	}
 
 	/*
 	 * check which way to transfer data

to see whether we've been getting residual byte counts:

Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
Jan 31 22:10:06 gollum kernel: [ 26.702945]
Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0
Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer , data , len 158
Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158
...

so we've simply been silently ignoring this until now, and I guess we don't need to BUG() for something that's totally benign.

--
Regards/Gruß,
    Boris.
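Every `cdb:` line in these dumps starts with 0x12, the SCSI INQUIRY opcode, with an allocation length of 0x00fe (254 bytes). The second dump has bit 0 of byte 1 (EVPD) set, asking for a vital product data page. The sketch below decodes that CDB and computes how many bytes the device returned. It assumes that `data_len=158` in the debug output counts bytes still outstanding when DRQ dropped. That reading is an assumption, not something stated in the thread, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INQUIRY_OPCODE 0x12

/* Allocation length of a 6-byte INQUIRY CDB, read big-endian from
 * bytes 3-4. Older SCSI revisions use only byte 4, which yields the
 * same value whenever byte 3 is zero, as it is in the dumps above. */
static unsigned int inquiry_alloc_len(const uint8_t cdb[6])
{
	assert(cdb[0] == SKETCH_INQUIRY_OPCODE);
	return ((unsigned int)cdb[3] << 8) | cdb[4];
}

/* Bytes the device actually returned, ASSUMING 'remaining' is the
 * count still outstanding when the drive signalled completion. */
static unsigned int bytes_transferred(const uint8_t cdb[6], unsigned int remaining)
{
	unsigned int alloc = inquiry_alloc_len(cdb);

	assert(remaining <= alloc);
	return alloc - remaining;
}
```

Under that assumption, the device sent 254 - 158 = 96 bytes and stopped. This is the kind of short transfer Jens calls perfectly legal: the caller asked for "up to 254 bytes" without knowing the exact response length.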
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Boris,

Thank you for the confirmation of the original behavior.

On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote:
[...]
... so we've simply been silently ignoring this until now, and I guess we don't need to BUG() for something that's totally benign.

end_that_request_last() is not called when __blk_end_request() returns 1. Then, the issuer isn't woken up. So I think the BUG() or error messages should be there.

And fortunately, the issuer seems not to mind whether end_that_request_first() is called for the remaining bio or not. So I think Jens' patch is fine.

Thanks,
Kiyoshi Ueda
Re: PROBLEM REMAINS: [sata_nv ADMA breaks ATAPI] Crash on accessing DVD-RAM
Alexander wrote:
Hello!
The problem described at https://bugzilla.redhat.com/show_bug.cgi?id=351451 and at http://ubuntuforums.org/showthread.php?t=655772, and supposedly fixed by the patch http://kerneltrap.org/mailarchive/linux-kernel/2007/11/25/445094, is still there. I compiled a 2.6.24-rc7 kernel and booted my PC with it, only to find that my SATA DVD-RW still shows up as "sr0: scsi3-mmc drive: 0x/0x caddy", as it did with 2.6.23.12 and earlier 2.6 kernels compiled for x86_64. Trying to use sr0 after this results in a dead hang or a reboot. When I boot with sata_nv.adma=0 or mem=4096M, it's all OK:

Can you (or others experiencing this problem) test the latest patch attached to the RH Bugzilla entry here:
https://bugzilla.redhat.com/show_bug.cgi?id=351451
and see if it resolves the problem? I have one report of success so far.
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31, 2008 at 05:35:56PM -0500, Kiyoshi Ueda wrote:
Hi Boris,
Thank you for the confirmation of the original behavior.
On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote:
[...]
... so we've simply been silently ignoring this until now, and I guess we don't need to BUG() for something that's totally benign.

Hi Kiyoshi,

end_that_request_last() is not called when __blk_end_request() returns 1. Then, the issuer isn't woken up. So I think the BUG() or error messages should be there.

You mean end_that_request_last() isn't called when __end_that_request_first() returns an error, and this is the case only for fs and pc requests. Otherwise it _is_ called, thus somewhat simulating the previous behavior. However, we never BUG()'ed on residual byte counts before, and this driver has been in the kernel tree for ages. So what puzzles me now is how BUG()'ing here is better than before. Shouldn't we simply issue a warning instead of killing the interrupt handler... or am I missing something?

--
Regards/Gruß,
    Boris.
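Boris's alternative is to warn instead of calling BUG(): log the residual, finish the request anyway, and keep the interrupt handler alive. The following is one way to express that idea as a user-space sketch. `pc_req`, `on_drq_clear` and `residual_events` are hypothetical names, and `fprintf()` stands in for what in the kernel would more likely be a printk() or WARN_ON(). It is not the patch that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct pc_req {
	unsigned int remaining;	/* bytes not transferred when DRQ dropped */
	bool done;		/* request has been completed */
};

/* How many requests finished with a residual byte count. */
static unsigned int residual_events;

/* Called when the device signals completion (DRQ clear). Rather than
 * halting, as the BUG() path would, log the first residual, count the
 * rest, and complete the request together with its leftover bytes so
 * the issuer is not left waiting. */
static void on_drq_clear(struct pc_req *rq)
{
	assert(rq != NULL);
	if (rq->remaining != 0) {
		if (residual_events++ == 0)
			fprintf(stderr, "warning: request completed with %u residual bytes\n",
				rq->remaining);
		rq->remaining = 0;
	}
	rq->done = true;
}
```

Logging only the first occurrence keeps a drive that routinely returns short INQUIRY data from flooding the log. Counting every occurrence still leaves a way to tell whether residuals are rare or constant.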