Sg/libata hotplug kernel error
Hi All, Has anyone seen this kernel error before? The controller is Sil3124 and kernel version 2.6.18-rc2. This seems to happen very rarely since this is the first time I 've seen this error message. Thanks for your feedback! Fajun ~ $ dmesg 0608 394833] SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [1198330608 395174] sda: Write Protect is off [1198330608 395226] sda: Mode Sense: 00 3a 00 00 [1198330608 395654] SCSI device sda: drive cache: write back [1198330608 396194] SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [1198330608 398800] sda: Write Protect is off [1198330608 398862] sda: Mode Sense: 00 3a 00 00 [1198330608 399293] SCSI device sda: drive cache: write back [1198330608 399391] sda: unknown partition table [1198330608 415346] sd 0:0:0:0: Attached scsi disk sda [1198330608 415668] sd 0:0:0:0: Attached scsi generic sg0 type 0 [1198339284 728601] ata1 port frozen [1198339284 728681] ata1: exception Emask 0x10 SAct 0x0 SErr 0x8 action 0x2 frozen [1198339284 728749] ata1: (irq_stat 0x01100010, PHY RDY changed) [1198339284 728802] ata1 port frozen [1198339284 728841] ata1: hard resetting port [1198339285 277716] ata1: SATA link down (SStatus 0 SControl 300) [1198339285 29] ata1 port thawed [1198339285 277818] ata1.00: disabled [1198339285 787776] ata1: EH complete [1198339285 787854] ata1.00: detaching (SCSI 0:0:0:0) [1198339314 961020] ata1 port frozen [1198339314 961097] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen [1198339314 961164] ata1: (irq_stat 0x00b40090, PHY RDY changed) [1198339314 961219] ata1 port frozen [1198339314 961258] ata1: hard resetting port [1198339318 701017] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [1198339318 701090] ata1 port thawed [1198339318 702326] ata1.00: ATA-7, max UDMA/133, 312581808 sectors: LBA48 NCQ (depth 31/32) [1198339318 703916] ata1.00: configured for UDMA/100 [1198339318 703980] ata1: EH complete [1198339318 704735] Vendor: ATA Model: ST9160823AS Rev: AAZ2 [1198339318 704850] Type: Direct-Access ANSI SCSI revision: 05 [1198339318 705847] SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [1198339318 706165] sda: Write Protect is off [1198339318 706230] sda: Mode Sense: 00 3a 00 00 [1198339318 706657] SCSI device sda: drive cache: write back [1198339318 707194] SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [1198339318 707480] sda: Write Protect is off [1198339318 707541] sda: Mode Sense: 00 3a 00 00 [1198339318 707964] SCSI device sda: drive cache: write back [1198339318 708031] sda: unknown partition table [1198339318 727956] sd 0:0:0:0: Attached scsi disk sda [1198339318 728265] sd 0:0:0:0: Attached scsi generic sg0 type 0 [1198350170 540587] ata1 port frozen [1198350170 540662] ata1: exception Emask 0x10 SAct 0x0 SErr 0x8 action 0x2 frozen [1198350170 540730] ata1: (irq_stat 0x01100010, PHY RDY changed) [1198350170 540780] ata1 port frozen [1198350170 540819] ata1: hard resetting port [1198350171 109527] ata1: SATA link down (SStatus 0 SControl 300) [1198350171 109592] ata1 port thawed [1198350171 109631] ata1.00: disabled [1198350171 619603] ata1: EH complete [1198350171 619683] ata1.00: detaching (SCSI 0:0:0:0) [1198350200 751481] ata1 port frozen [1198350200 751563] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen [1198350200 751630] ata1: (irq_stat 0x00b40090, PHY RDY changed) [1198350200 751685] ata1 port frozen [1198350200 751725] ata1: hard resetting port [1198350204 482807] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [1198350204 482878] ata1 port thawed [1198350204 496761] ata1.00: ATA-7, max UDMA/133, 312581808 sectors: LBA48 NCQ (depth 31/32) [1198350204 498302] ata1.00: configured for UDMA/100 [1198350204 498370] ata1: EH complete [1198350204 499202] Vendor: ATA Model: ST9160823AS Rev: AAZ2 [1198350204 499320] Type: Direct-Access ANSI SCSI revision: 05 [1198350204 500328] SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [1198350204 500664] sda: Write Protect is off [1198350204 500716] sda: Mode Sense: 00 3a 00 00 [1198350204 501161] SCSI device sda: drive cache: write back [1198350204 501697] SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) [1198350204 501987] sda: Write Protect is off [1198350204 502049] sda: Mode Sense: 00 3a 00 00 [1198350204 502472] SCSI device sda: drive cache: write back [1198350204 502540] sda: unknown partition table [1198350204 513485] sd 0:0:0:0: Attached scsi disk sda [1198350204 513870] sd 0:0:0:0: Attached scsi generic sg0 type 0 [1198358880 722104] ata1 port frozen [1198358880 722185] ata1: exception Emask 0x10 SAct 0x0 SErr 0x8 action 0x2 frozen [1198358880 722254] ata1: (irq_stat 0x01100010, PHY RDY changed) [1198358880 722304] ata1 port frozen [1198358880 722343] ata1: hard resetting port [1198358881 291114] ata1: SATA link down (SStatus 0 SControl 300) [1198358881 291176] ata1
Re: Process Scheduling Issue using sg/libata
On 11/18/07, Mark Lord [EMAIL PROTECTED] wrote: Fajun Chen wrote: On 11/17/07, Mark Lord [EMAIL PROTECTED] wrote: .. What you probably intended to do instead, was to use mmap to just allocate some page-aligned RAM, not to actually mmap'd any on-disk data. Right? Here's how that's done: read_buffer = (U8 *)mmap(NULL, buf_sz, PROT_READ | PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0); What I intended to do is to write data into disc or read data from disc via SG_IO as requested by my user-space application. I don't want any automatically scheduled kernel task to sync data with disc. .. Right. Then you definitely do NOT want to mmap your device, because that's exactly what would otherwise happen, by design! I've experimented with memory mapping using MAP_ANONYMOUS as you suggested, the good news is that it does free up the cpu load and my system is much more responsive with the change. .. Yes, that's what we expected to see. The bad news is that the data read back from disc (PIO or DMA read) seems to be invisible to user-space application. For instance, read buffer is all zeros after Identify Device command. Is this expected side effect of MAP_ANONYMOUS option? .. No, that would be a side effect of some other bug in the code. Here (attached) is a working program that performs (PACKET)IDENTIFY DEVICE commands, using a mmap() buffer to receive the data. I verified your program works in my system and my application works as well if changed accordingly. However, this change (indirect IO in sg term) may come at a performance cost for IO intensive applications since it does NOT utilize mmaped buffer managed by sg driver. Please see relevant sg document below: http://sg.torque.net/sg/p/sg_v3_ho.html#id2495330 http://sg.torque.net/sg/p/sg_v3_ho.html#dmmio As an example, sg_rbuf.c in sg3_util package uses SG_FLAG_MMAP_IO flag in SG_IO. Please see source code attached. I also noticed that MAP_ANONYMOUS is NOT used in mmap() call in sg_rbuf.c, which may not be desirable as you pointed out in previous emails. So this brings up an interesting sg usage issue: can we use MAP_ANONYMOUS with SG_FLAG_MMAP_IO flag in SG_IO? Thanks, Fajun sg_rbuf.c Description: Binary data
Re: Process Scheduling Issue using sg/libata
On 11/18/07, Mark Lord [EMAIL PROTECTED] wrote: Fajun Chen wrote: .. I verified your program works in my system and my application works as well if changed accordingly. However, this change (indirect IO in sg term) may come at a performance cost for IO intensive applications since it does NOT utilize mmaped buffer managed by sg driver. Please see relevant sg document below: http://sg.torque.net/sg/p/sg_v3_ho.html#id2495330 http://sg.torque.net/sg/p/sg_v3_ho.html#dmmio As an example, sg_rbuf.c in sg3_util package uses SG_FLAG_MMAP_IO flag in SG_IO. Please see source code attached. I also noticed that MAP_ANONYMOUS is NOT used in mmap() call in sg_rbuf.c, which may not be desirable as you pointed out in previous emails. So this brings up an interesting sg usage issue: can we use MAP_ANONYMOUS with SG_FLAG_MMAP_IO flag in SG_IO? .. The SG_FLAG_MMAP works only with /dev/sg* devices, not /dev/sd* devices. I don't know which kind you were trying to use, since you still have not provided your source code for examination. If you are using /dev/sg*, then you should be able to get your original mmap() code to work. But the behaviour described thus far seems to indicate that your secret program must have been using /dev/sd* instead. As a matter of fact, I'm using /dev/sg*. Due to the size of my test application, I have not be able to compress it into a small and publishable form. However, this issue can be easily reproduced on my ARM XScale target using sg3_util code as follows: 1. Run printtime.c attached, which prints message to console in a loop. 2. Run sgm_dd (part of sg3_util package, source code attached) on the same system as follows: sgm_dd if=/dev/sg0 of=/dev/null count=10M bpt=1 The print task can be delayed for as many as 25 seconds. Surprisingly, I can't reproduce the problem in an i386 test system with a more powerful processor. Some clarification to MAP_ANONYMOUS option in mmap(). After fixing a bug and more testing, this option seems to make no difference to cpu load. Sorry about previous report. Back to the drawing board now :-) Thanks, Fajun printtime.c Description: Binary data sgm_dd.c Description: Binary data
Re: Process Scheduling Issue using sg/libata
On 11/17/07, James Chapman [EMAIL PROTECTED] wrote: Fajun Chen wrote: On 11/16/07, Tejun Heo [EMAIL PROTECTED] wrote: Fajun Chen wrote: I use sg/libata and ata pass through for read/writes. Linux 2.6.18-rc2 and libata version 2.00 are loaded on ARM XScale board. Under heavy cpu load (e.g. when blocks per transfer/sector count is set to 1), I've observed that the test application can suck cpu away for long time (more than 20 seconds) and other processes including high priority shell can not get the time slice to run. What's interesting is that if the application is under heavy IO load (e.g. when blocks per transfer/sector count is set to 256), the problem goes away. I also tested with open source code sg_utils and got the same result, so this is not a problem specific to my user-space application. Since user preemption is checked when the kernel is about to return to user-space from a system call, process scheduler should be invoked after each system call. Something seems to be broken here. I found a similar issue below: http://marc.info/?l=linux-arm-kernelm=103121214521819w=2 But that turns out to be an issue with MTD/JFFS2 drivers, which are not used in my system. Has anyone experienced similar issues with sg/libata? Any information would be greatly appreciated. That's one weird story. Does kernel say anything during that 20 seconds? No. Nothing in kernel log. Fajun Have you considered using oprofile to find out what the CPU is doing during the 20 seconds? Haven't tried oprofile yet, not sure if it will get the time slice to run though. During this 20 seconds, I've verified that my application is still busy with R/W ops. Does the problem occur when you put it under load using another method? What are the ATA and network drivers here? I've seen some awful out-of-tree device drivers hog the CPU with busy-waits and other crap. Oprofile results should show the culprit. If blocks per transfer/sector count is set to 256, which means cpu has less load (any other implications?), this problem no longer occurs. Our target system uses libata sil24/pata680 drivers, has a customized FIFO driver but no network driver. The relevant variable here is blocks per transfer/sector count, which seems to matter only to sg/libata. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Process Scheduling Issue using sg/libata
On 11/17/07, Mark Lord [EMAIL PROTECTED] wrote: Fajun Chen wrote: On 11/17/07, Mark Lord [EMAIL PROTECTED] wrote: Fajun Chen wrote: On 11/16/07, Mark Lord [EMAIL PROTECTED] wrote: Fajun Chen wrote: .. This problem also happens with R/W DMA ops. Below are simplified code snippets: // Open one sg device for read if ((sg_fd = open(dev_name, O_RDWR))0) { ... } read_buffer = (U8 *)mmap(NULL, buf_sz, PROT_READ | PROT_WRITE, MAP_SHARED, sg_fd, 0); // Open the same sg device for write if ((sg_fd_wr = open(dev_name, O_RDWR))0) { ... } write_buffer = (U8 *)mmap(NULL, buf_sz, PROT_READ | PROT_WRITE, MAP_SHARED, sg_fd_wr, 0); .. Mmmm.. what is the purpose of those two mmap'd areas ? I think this is important and relevant here: what are they used for? As coded above, these are memory mapped areas taht (1) overlap, and (2) will be demand paged automatically to/from the disk as they are accessed/modified. This *will* conflict with any SG_IO operations happening at the same time on the same device. .. The purpose of using two memory mapped areas is to meet our requirement that certain data patterns for writing need to be kept across commands. For instance, if one buffer is used for both reads and writes, then this buffer will need to be re-populated with certain write data after each read command, which would be very costly for write-read mixed type of ops. This separate R/W buffer setting also facilitates data comparison. These buffers are not used at the same time (one will be used only after the command on the other is completed). My application is the only program accessing disk using sg/libata and the rest of the programs run from ramdisk. Also, each buffer is only about 0.5MB and we have 64MB RAM on the target board. With this setup, these two buffers should be pretty much independent and free from block layer/file system, correct? .. No. Those buffers as coded above are actually mmap'ed representations of portions of the device (disk drive). So any write into one of those buffers will trigger disk writes, and just accessing (read) the buffers may trigger disk reads. So what could be happening here, is when you trigger manual disk accesses via SG_IO, that result in data being copied into those buffers, the kernel then automatically schedules disk writes to update the on-disk copies of those mmap'd regions. What you probably intended to do instead, was to use mmap to just allocate some page-aligned RAM, not to actually mmap'd any on-disk data. Right? Here's how that's done: read_buffer = (U8 *)mmap(NULL, buf_sz, PROT_READ | PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0); What I intended to do is to write data into disc or read data from disc via SG_IO as requested by my user-space application. I don't want any automatically scheduled kernel task to sync data with disc. I've experimented with memory mapping using MAP_ANONYMOUS as you suggested, the good news is that it does free up the cpu load and my system is much more responsive with the change. The bad news is that the data read back from disc (PIO or DMA read) seems to be invisible to user-space application. For instance, read buffer is all zeros after Identify Device command. Is this expected side effect of MAP_ANONYMOUS option? Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Process Scheduling Issue using sg/libata
Hi All, I use sg/libata and ata pass through for read/writes. Linux 2.6.18-rc2 and libata version 2.00 are loaded on ARM XScale board. Under heavy cpu load (e.g. when blocks per transfer/sector count is set to 1), I've observed that the test application can suck cpu away for long time (more than 20 seconds) and other processes including high priority shell can not get the time slice to run. What's interesting is that if the application is under heavy IO load (e.g. when blocks per transfer/sector count is set to 256), the problem goes away. I also tested with open source code sg_utils and got the same result, so this is not a problem specific to my user-space application. Since user preemption is checked when the kernel is about to return to user-space from a system call, process scheduler should be invoked after each system call. Something seems to be broken here. I found a similar issue below: http://marc.info/?l=linux-arm-kernelm=103121214521819w=2 But that turns out to be an issue with MTD/JFFS2 drivers, which are not used in my system. Has anyone experienced similar issues with sg/libata? Any information would be greatly appreciated. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Process Scheduling Issue using sg/libata
On 11/16/07, Tejun Heo [EMAIL PROTECTED] wrote: Fajun Chen wrote: I use sg/libata and ata pass through for read/writes. Linux 2.6.18-rc2 and libata version 2.00 are loaded on ARM XScale board. Under heavy cpu load (e.g. when blocks per transfer/sector count is set to 1), I've observed that the test application can suck cpu away for long time (more than 20 seconds) and other processes including high priority shell can not get the time slice to run. What's interesting is that if the application is under heavy IO load (e.g. when blocks per transfer/sector count is set to 256), the problem goes away. I also tested with open source code sg_utils and got the same result, so this is not a problem specific to my user-space application. Since user preemption is checked when the kernel is about to return to user-space from a system call, process scheduler should be invoked after each system call. Something seems to be broken here. I found a similar issue below: http://marc.info/?l=linux-arm-kernelm=103121214521819w=2 But that turns out to be an issue with MTD/JFFS2 drivers, which are not used in my system. Has anyone experienced similar issues with sg/libata? Any information would be greatly appreciated. That's one weird story. Does kernel say anything during that 20 seconds? No. Nothing in kernel log. Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Process Scheduling Issue using sg/libata
On 11/16/07, Mark Lord [EMAIL PROTECTED] wrote: Fajun Chen wrote: Hi All, I use sg/libata and ata pass through for read/writes. Linux 2.6.18-rc2 and libata version 2.00 are loaded on ARM XScale board. Under heavy cpu load (e.g. when blocks per transfer/sector count is set to 1), I've observed that the test application can suck cpu away for long time (more than 20 seconds) and other processes including high priority shell can not get the time slice to run. What's interesting is that if the application is under heavy IO load (e.g. when blocks per transfer/sector count is set to 256), the problem goes away. I also tested with open source code sg_utils and got the same result, so this is not a problem specific to my user-space application. .. Post the relevant code here, and then we'll be able to better understand and explain it to you. For example, if the code is using ATA opcodes 0x20, 0x21, 0x24, 0x30, 0x31, 0x34, 0x29, 0x39, 0xc4 or 0xc5 (any of the R/W PIO ops), then this behaviour does not surprise me in the least. Fully expected and difficult to avoid. This problem also happens with R/W DMA ops. Below are simplified code snippets: // Open one sg device for read if ((sg_fd = open(dev_name, O_RDWR))0) { ... } read_buffer = (U8 *)mmap(NULL, buf_sz, PROT_READ | PROT_WRITE, MAP_SHARED, sg_fd, 0); // Open the same sg device for write if ((sg_fd_wr = open(dev_name, O_RDWR))0) { ... } write_buffer = (U8 *)mmap(NULL, buf_sz, PROT_READ | PROT_WRITE, MAP_SHARED, sg_fd_wr, 0); sg_io_hdr_t io_hdr; memset(io_hdr, 0, sizeof(sg_io_hdr_t)); io_hdr.interface_id = 'S'; io_hdr.mx_sb_len= sizeof(sense_buffer); io_hdr.sbp = sense_buffer; io_hdr.dxfer_len= dxfer_len; io_hdr.cmd_len = cmd_len; io_hdr.cmdp = cmdp;// ATA pass through command block io_hdr.timeout = cmd_tmo * 1000; // In millisecs io_hdr.pack_id = id; // Read/write counter for now io_hdr.iovec_count=0; // scatter gather elements, 0=not being used if (direction == 1) { io_hdr.dxfer_direction = SG_DXFER_TO_DEV; io_hdr.flags |= SG_FLAG_MMAP_IO; status = ioctl(sg_fd_wr, SG_IO, io_hdr); } else { io_hdr.dxfer_direction = SG_DXFER_FROM_DEV; io_hdr.flags |= SG_FLAG_MMAP_IO; status = ioctl(sg_fd, SG_IO, io_hdr); } ... Mmaped IO is a moot point here since this problem is also observed when using direct IO. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Sil3124: IRQ loss errata on PCI-X
Hi, In sata_sil24.c, there's a special flag as shown below: SIL24_FLAG_PCIX_IRQ_WOC = (1 24), /* IRQ loss errata on PCI-X */ Could someone explain the error condition and the rational behind the fix? Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Libata Sil3124 hybrid drive failure
Hi, Does anyone know if libata supports hybrid drive on Sil3124? I tried to play with Seagate hybrid drive and got protocol mismatch error when adding LBAs to pinned set or querying pinned set? Sil24 driver code does not set Protocol Override field in PRB, just wonder if we need to set protocol explicitly to get this work because the controller may not decode the protocol correctly for nv cache commands. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NV Cache Feature Support in libata
Did you refer to HDIO_DRIVE_CMD ioctl in old IDE code? I couldn't get it working on libata. Based on libata-scsi code, only PIO data in and Non-data protocol are supported for HDIO_DRIVE_CMD while NV Cache commands requires DMA based on ATA8 spec. I tried to use ATA pass through command for NV Cache commands. For instance, I implemented Add LBA to Pinned Set command just like Write DMA Extended with the differences in command, feature and LBA register bit 0, but I got protocol mismatch error from device. I would appreciate any information in regard to libata support on NV Cache commands. Thanks, Fajun On 2/9/07, Timothy Bisson [EMAIL PROTECTED] wrote: On Feb 9, 2007, at 10:25 AM, Fajun Chen wrote: I'm not aware of any NV cache (vista feature documented in ata-atapi-8) support in libata. Is this supported or any plan to support it in the future? It is already supported through the HDIO_DRIVE_CMD ioctl - pass 0xb6 as the command register value. I think the interesting question is, how can we leverage the NV Cache in the block and file system layers to improve power and performance. One probably obvious option is to enter NV Cache Power Mode when laptop mode is enabled. Tim - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Sata_Sil24: how to disable hotplug
Hi, I wonder if there's an official way to disable hotplug for SATA Sil24? Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
How to synchronize SCSI user scan with SCSI hotplug/rescan
Hi Folks, In libata EH code, both SCSI hotplug and rescan are executed from a separate workqueue after EH completes as shown below: void ata_scsi_error(struct Scsi_Host *host) { ... if (ap-pflags ATA_PFLAG_LOADING) ap-pflags = ~ATA_PFLAG_LOADING; else if (ap-pflags ATA_PFLAG_SCSI_HOTPLUG) queue_work(ata_aux_wq, ap-hotplug_task); } static int ata_eh_revalidate_and_attach() { ... queue_work(ata_aux_wq, (ap-scsi_rescan_task)); } However SCSI user scan code is synchronized to EH only as shown below: static int ata_scsi_user_scan() { ... if (rc == 0) { ata_port_schedule_eh(ap); spin_unlock_irqrestore(ap-lock, flags); ata_port_wait_eh(ap); } ... } So SCSI user scan could race with SCSI hotplug/rescan and the consequence is that we may not be able to access device right after user scan. We can work around the issue by adding timeout in user space code but this is not a real fix. Can we synchronize SCSI user scan with SCSI hotplug/rescan? Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Change Libata Error Handling for Drive Testing
Hi Tejun, JFYI, it turns out that spurious interrupts was caused by User Scan before drive is ready. I wait for 2 seconds after drive is powered on which is not sufficient for some drives. Alt status should be checked first but there's no good way to check it in user space. Does User Scan related code check alt status before drive is touched? Thanks, Fajun On 3/19/07, Fajun Chen [EMAIL PROTECTED] wrote: On 3/19/07, Tejun Heo [EMAIL PROTECTED] wrote: Fajun Chen wrote: Please ignore the changes to pata_sil680.c. The same failure happened to standard sil680 driver without my change as well. Does it also happen when the second port is empty? Yes, it happens even when one of the port (either one) is powered off. It used to happen in the middle of our IO test application, now it happened much early in our test spinup process with debugging version of ata_host_intr() function. We boot up the target (ARM XScale processor) with hard drive powered off, then power up the drive and do test spinup. What test spinup does is to issue sysfs user scan on the port followed by Identify Device. Below is my debugging version of ata_host_intr() function with ATA_IRQ_TRAP enabled and hacked. What puzzled me is that none of the sub-counters (initial value is 1) get incremented in most failures? Please see dmesg log for details. I did see one failure (out of many) where idle_irq_hsm_state is incremented and matches idle_irq counter though. Thanks, Fajun inline unsigned int ata_host_intr (struct ata_port *ap, struct ata_queued_cmd *qc) { u8 status, host_stat = 0; /* VPRINTK(ata%u: protocol %d task_state %d\n, */ /* ap-id, qc-tf.protocol, ap-hsm_task_state); */ /* printk(KERN_INFO ata%u: protocol %d task_state %d\n, */ /* ap-id, qc-tf.protocol, ap-hsm_task_state); */ /* Check whether we are expecting interrupt in this state */ switch (ap-hsm_task_state) { case HSM_ST_FIRST: /* Some pre-ATAPI-4 devices assert INTRQ * at this state when ready to receive CDB. */ /* Check the ATA_DFLAG_CDB_INTR flag is enough here. * The flag was turned on only for atapi devices. * No need to check is_atapi_taskfile(qc-tf) again. */ if (!(qc-dev-flags ATA_DFLAG_CDB_INTR)) { /* printk(KERN_INFO ata%u: flags %lu\n, */ /* ap-id, qc-dev-flags); */ ap-stats.idle_irq_non_atapi++; goto idle_irq; } break; case HSM_ST_LAST: if (qc-tf.protocol == ATA_PROT_DMA || qc-tf.protocol == ATA_PROT_ATAPI_DMA) { /* check status of DMA engine */ host_stat = ap-ops-bmdma_status(ap); VPRINTK(ata%u: host_stat 0x%X\n, ap-id, host_stat); /* if it's not our irq... */ if (!(host_stat ATA_DMA_INTR)) /* printk(KERN_INFO ata%u: host_stat %d\n, */ /* ap-id, host_stat); */ ap-stats.idle_irq_host_state++; goto idle_irq; /* before we do anything else, clear DMA-Start bit */ ap-ops-bmdma_stop(qc); if (unlikely(host_stat ATA_DMA_ERR)) { /* error when transfering data to/from memory */ qc-err_mask |= AC_ERR_HOST_BUS; ap-hsm_task_state = HSM_ST_ERR; } } break; case HSM_ST: break; default: /* printk(KERN_INFO ata%u: hsm_state %d\n, */ /* ap-id, ap-hsm_task_state); */ ap-stats.idle_irq_hsm_state++; goto idle_irq; } /* check altstatus */ status = ata_altstatus(ap); if (status ATA_BUSY) { /* printk(KERN_INFO ata%u: altstatus %d\n, */ /* ap-id, status); */ ap-stats.idle_irq_altstatus++; goto idle_irq; } /* check main status, clearing INTRQ */ status = ata_chk_status(ap); if (unlikely(status ATA_BUSY)) { /* printk(KERN_INFO ata%u: status %d\n, */ /* ap-id, status); */ ap-stats.idle_irq_status++; goto idle_irq; } /* ack bmdma irq events */ ap-ops-irq_clear(ap); ata_hsm_move(ap, qc, status, 0); return 1; /* irq handled */ idle_irq: ap-stats.idle_irq++; #ifdef ATA_IRQ_TRAP if ((ap-stats.idle_irq % 1000
Re: Change Libata Error Handling for Drive Testing
Another error we're analyzing is disabled IRQ due to screaming unhandled interrupts from ata_interrupt(). Since this happened on different drives, it doesn't seem to be a drive issue. Any known problem on this one? Could long ISR locking by some rogue application cause this problem? To detect the problem early, it probably makes sense to add error handling code to check for screeming unhandled ISR in ata_interrupt() function? Which driver are you using? Please report with lspci -nn and full dmesg with linux-ide cc'd. We use PATA Sil680. I don't have lspci ported yet on the target test hardware. I'll collect the kernel log and report it to you later. Kernel revision is 2.6.18-rc2 and libata 2.00. The following changes were made to pata_sil680.c: 1.For PCI IO performance: /* FIXME: double check */ //pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, (class_rev) ? 1 : 255); pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, 0); pci_write_config_byte(pdev, PCI_LATENCY_TIMER, 0x40); 2. Applied one of the patches to change TF1 timing register to 0x2283 Below are the kernel logs at the time of the failure: call 0xc0015f90: init_ramfs_fs+0x0/0x1c() [42949375.53] Calling initcall 0xc001618c: noop_init+0x0/0x1c() [42949375.53] io scheduler noop registered [42949375.53] Calling initcall 0xc00161a8: as_init+0x0/0x80() [42949375.53] io scheduler anticipatory registered [42949375.53] Calling initcall 0xc0016228: deadline_init+0x0/0x78() [42949375.53] io scheduler deadline registered [42949375.53] Calling initcall 0xc00162a0: cfq_init+0x0/0x10c() [42949375.53] io scheduler cfq registered (default) [42949375.53] Calling initcall 0xc00175b8: pci_init+0x0/0x3c() [42949375.53] PCI: Calling quirk c0018048 for :00:01.0 [42949375.53] PCI: Calling quirk c001cb64 for :00:01.0 [42949375.53] Calling initcall 0xc0019044: pci_sysfs_init+0x0/0x44() [42949375.53] Calling initcall 0xc0019298: pci_proc_init+0x0/0x90() [42949375.53] Calling initcall 0xc0019e9c: rand_initialize+0x0/0x38() [42949375.53] Calling initcall 0xc0019f70: tty_init+0x0/0x1f8() [42949375.53] Calling initcall 0xc001a168: pty_init+0x0/0x3fc() [42949375.72] Calling initcall 0xc00db7cc: aauc_init+0x0/0x14c() [42949375.72] Calling initcall 0xc00e8b00: usbfifoInit+0x0/0x1c() [42949375.72] Calling initcall 0xc001b184: serial8250_init+0x0/0x130() [42949375.72] Serial: 8250/16550 driver $Revision: 1.4 $ 2 ports, IRQ sharing disabled [42949375.72] serial8250: ttyS0 at MMIO 0xfe80 (irq = 29) is a SC16C752 [42949375.72] serial8250: ttyS1 at MMIO 0xfe88 (irq = 30) is a SC16C752 [42949375.72] Calling initcall 0xc001b578: serial8250_pci_init+0x0/0x20() [42949375.72] Calling initcall 0xc001bc80: rd_init+0x0/0x194() [42949375.73] RAMDISK driver initialized: 16 RAM disks of 10485760K size 1024 blocksize [42949375.73] Calling initcall 0xc001bef4: net_olddevs_init+0x0/0xdc() [42949375.73] Calling initcall 0xc001c024: tun_init+0x0/0x64() [42949375.73] tun: Universal TUN/TAP device driver, 1.6 [42949375.73] tun: (C) 1999-2004 Max Krasnyansky [EMAIL PROTECTED] [42949375.73] Calling initcall 0xc001c3ac: spi_transport_init+0x0/0x44() [42949375.73] Calling initcall 0xc001c3f0: fc_transport_init+0x0/0x4c() [42949375.73] Calling initcall 0xc001c43c: iscsi_transport_init+0x0/0x108() [42949375.73] Calling initcall 0xc001c544: ata_init+0x0/0xa0() [42949375.73] libata version 2.00 loaded. [42949375.73] Calling initcall 0xc001c5e4: sil_init+0x0/0x20() [42949375.73] Calling initcall 0xc001c604: sil24_init+0x0/0x20() [42949375.73] Calling initcall 0xc001c624: sil680_init+0x0/0x20() [42949375.73] pata_sil680 :00:01.0: version 0.3.2 [42949375.73] sil680: BA5_EN = 1 clock = 00 [42949375.73] sil680: BA5_EN = 1 clock = 10 [42949375.73] sil680: 133MHz clock. [42949375.73] PCI: enabling device :00:01.0 (0140 - 0143) [42949375.73] PCI: Enabling bus mastering for device :00:01.0 [42949375.73] ata1: PATA max UDMA/133 cmd 0xFE10 ctl 0xFE22 bmdma 0xFE00 irq 27 [42949375.73] ata2: PATA max UDMA/133 cmd 0xFE18 ctl 0xFE26 bmdma 0xFE08 irq 27 [42949375.73] scsi0 : pata_sil680 [42949375.75] scsi1 : pata_sil680 [42949375.77] Calling initcall 0xc001c910: vsc_sata_init+0x0/0x20() [42949375.77] Calling initcall 0xc001c930: init_sd+0x0/0x8c() [42949375.77] Calling initcall 0xc001c9bc: init_sg+0x0/0x14c() [42949375.77] Calling initcall 0xc001d03c: i2c_dev_init+0x0/0xa8() [42949375.77] i2c /dev entries driver [42949375.77] Calling initcall 0xc001d0e4: i2c_iop3xx_init+0x0/0x1c() [42949375.77] Calling initcall 0xc001d100: hwmon_init+0x0/0x5c() [42949375.77] Calling initcall 0xc001d74c: flow_cache_init+0x0/0x174() [42949375.77] Calling initcall 0xc001eb74: xfrm4_transport_init+0x0/0x20()
Re: Change Libata Error Handling for Drive Testing
On 3/19/07, Fajun Chen [EMAIL PROTECTED] wrote: Another error we're analyzing is disabled IRQ due to screaming unhandled interrupts from ata_interrupt(). Since this happened on different drives, it doesn't seem to be a drive issue. Any known problem on this one? Could long ISR locking by some rogue application cause this problem? To detect the problem early, it probably makes sense to add error handling code to check for screeming unhandled ISR in ata_interrupt() function? Which driver are you using? Please report with lspci -nn and full dmesg with linux-ide cc'd. We use PATA Sil680. I don't have lspci ported yet on the target test hardware. I'll collect the kernel log and report it to you later. Kernel revision is 2.6.18-rc2 and libata 2.00. The following changes were made to pata_sil680.c: 1.For PCI IO performance: /* FIXME: double check */ //pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, (class_rev) ? 1 : 255); pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, 0); pci_write_config_byte(pdev, PCI_LATENCY_TIMER, 0x40); 2. Applied one of the patches to change TF1 timing register to 0x2283 Below are the kernel logs at the time of the failure: call 0xc0015f90: init_ramfs_fs+0x0/0x1c() [42949375.53] Calling initcall 0xc001618c: noop_init+0x0/0x1c() [42949375.53] io scheduler noop registered [42949375.53] Calling initcall 0xc00161a8: as_init+0x0/0x80() [42949375.53] io scheduler anticipatory registered [42949375.53] Calling initcall 0xc0016228: deadline_init+0x0/0x78() [42949375.53] io scheduler deadline registered [42949375.53] Calling initcall 0xc00162a0: cfq_init+0x0/0x10c() [42949375.53] io scheduler cfq registered (default) [42949375.53] Calling initcall 0xc00175b8: pci_init+0x0/0x3c() [42949375.53] PCI: Calling quirk c0018048 for :00:01.0 [42949375.53] PCI: Calling quirk c001cb64 for :00:01.0 [42949375.53] Calling initcall 0xc0019044: pci_sysfs_init+0x0/0x44() [42949375.53] Calling initcall 0xc0019298: pci_proc_init+0x0/0x90() [42949375.53] Calling initcall 0xc0019e9c: rand_initialize+0x0/0x38() [42949375.53] Calling initcall 0xc0019f70: tty_init+0x0/0x1f8() [42949375.53] Calling initcall 0xc001a168: pty_init+0x0/0x3fc() [42949375.72] Calling initcall 0xc00db7cc: aauc_init+0x0/0x14c() [42949375.72] Calling initcall 0xc00e8b00: usbfifoInit+0x0/0x1c() [42949375.72] Calling initcall 0xc001b184: serial8250_init+0x0/0x130() [42949375.72] Serial: 8250/16550 driver $Revision: 1.4 $ 2 ports, IRQ sharing disabled [42949375.72] serial8250: ttyS0 at MMIO 0xfe80 (irq = 29) is a SC16C752 [42949375.72] serial8250: ttyS1 at MMIO 0xfe88 (irq = 30) is a SC16C752 [42949375.72] Calling initcall 0xc001b578: serial8250_pci_init+0x0/0x20() [42949375.72] Calling initcall 0xc001bc80: rd_init+0x0/0x194() [42949375.73] RAMDISK driver initialized: 16 RAM disks of 10485760K size 1024 blocksize [42949375.73] Calling initcall 0xc001bef4: net_olddevs_init+0x0/0xdc() [42949375.73] Calling initcall 0xc001c024: tun_init+0x0/0x64() [42949375.73] tun: Universal TUN/TAP device driver, 1.6 [42949375.73] tun: (C) 1999-2004 Max Krasnyansky [EMAIL PROTECTED] [42949375.73] Calling initcall 0xc001c3ac: spi_transport_init+0x0/0x44() [42949375.73] Calling initcall 0xc001c3f0: fc_transport_init+0x0/0x4c() [42949375.73] Calling initcall 0xc001c43c: iscsi_transport_init+0x0/0x108() [42949375.73] Calling initcall 0xc001c544: ata_init+0x0/0xa0() [42949375.73] libata version 2.00 loaded. [42949375.73] Calling initcall 0xc001c5e4: sil_init+0x0/0x20() [42949375.73] Calling initcall 0xc001c604: sil24_init+0x0/0x20() [42949375.73] Calling initcall 0xc001c624: sil680_init+0x0/0x20() [42949375.73] pata_sil680 :00:01.0: version 0.3.2 [42949375.73] sil680: BA5_EN = 1 clock = 00 [42949375.73] sil680: BA5_EN = 1 clock = 10 [42949375.73] sil680: 133MHz clock. [42949375.73] PCI: enabling device :00:01.0 (0140 - 0143) [42949375.73] PCI: Enabling bus mastering for device :00:01.0 [42949375.73] ata1: PATA max UDMA/133 cmd 0xFE10 ctl 0xFE22 bmdma 0xFE00 irq 27 [42949375.73] ata2: PATA max UDMA/133 cmd 0xFE18 ctl 0xFE26 bmdma 0xFE08 irq 27 [42949375.73] scsi0 : pata_sil680 [42949375.75] scsi1 : pata_sil680 [42949375.77] Calling initcall 0xc001c910: vsc_sata_init+0x0/0x20() [42949375.77] Calling initcall 0xc001c930: init_sd+0x0/0x8c() [42949375.77] Calling initcall 0xc001c9bc: init_sg+0x0/0x14c() [42949375.77] Calling initcall 0xc001d03c: i2c_dev_init+0x0/0xa8() [42949375.77] i2c /dev entries driver [42949375.77] Calling initcall 0xc001d0e4: i2c_iop3xx_init+0x0/0x1c() [42949375.77] Calling initcall 0xc001d100: hwmon_init+0x0/0x5c() [42949375.77] Calling initcall 0xc001d74c: flow_cache_init+0x0/0x174() [42949375.77] Calling
Re: Change Libata Error Handling for Drive Testing
On 3/19/07, Tejun Heo [EMAIL PROTECTED] wrote: Fajun Chen wrote: Please ignore the changes to pata_sil680.c. The same failure happened to standard sil680 driver without my change as well. Does it also happen when the second port is empty? Yes, it happens even when one of the port (either one) is powered off. It used to happen in the middle of our IO test application, now it happened much early in our test spinup process with debugging version of ata_host_intr() function. We boot up the target (ARM XScale processor) with hard drive powered off, then power up the drive and do test spinup. What test spinup does is to issue sysfs user scan on the port followed by Identify Device. Below is my debugging version of ata_host_intr() function with ATA_IRQ_TRAP enabled and hacked. What puzzled me is that none of the sub-counters (initial value is 1) get incremented in most failures? Please see dmesg log for details. I did see one failure (out of many) where idle_irq_hsm_state is incremented and matches idle_irq counter though. Thanks, Fajun inline unsigned int ata_host_intr (struct ata_port *ap, struct ata_queued_cmd *qc) { u8 status, host_stat = 0; /* VPRINTK(ata%u: protocol %d task_state %d\n, */ /* ap-id, qc-tf.protocol, ap-hsm_task_state); */ /* printk(KERN_INFO ata%u: protocol %d task_state %d\n, */ /* ap-id, qc-tf.protocol, ap-hsm_task_state); */ /* Check whether we are expecting interrupt in this state */ switch (ap-hsm_task_state) { case HSM_ST_FIRST: /* Some pre-ATAPI-4 devices assert INTRQ * at this state when ready to receive CDB. */ /* Check the ATA_DFLAG_CDB_INTR flag is enough here. * The flag was turned on only for atapi devices. * No need to check is_atapi_taskfile(qc-tf) again. */ if (!(qc-dev-flags ATA_DFLAG_CDB_INTR)) { /* printk(KERN_INFO ata%u: flags %lu\n, */ /* ap-id, qc-dev-flags); */ ap-stats.idle_irq_non_atapi++; goto idle_irq; } break; case HSM_ST_LAST: if (qc-tf.protocol == ATA_PROT_DMA || qc-tf.protocol == ATA_PROT_ATAPI_DMA) { /* check status of DMA engine */ host_stat = ap-ops-bmdma_status(ap); VPRINTK(ata%u: host_stat 0x%X\n, ap-id, host_stat); /* if it's not our irq... */ if (!(host_stat ATA_DMA_INTR)) /* printk(KERN_INFO ata%u: host_stat %d\n, */ /* ap-id, host_stat); */ ap-stats.idle_irq_host_state++; goto idle_irq; /* before we do anything else, clear DMA-Start bit */ ap-ops-bmdma_stop(qc); if (unlikely(host_stat ATA_DMA_ERR)) { /* error when transfering data to/from memory */ qc-err_mask |= AC_ERR_HOST_BUS; ap-hsm_task_state = HSM_ST_ERR; } } break; case HSM_ST: break; default: /* printk(KERN_INFO ata%u: hsm_state %d\n, */ /* ap-id, ap-hsm_task_state); */ ap-stats.idle_irq_hsm_state++; goto idle_irq; } /* check altstatus */ status = ata_altstatus(ap); if (status ATA_BUSY) { /* printk(KERN_INFO ata%u: altstatus %d\n, */ /* ap-id, status); */ ap-stats.idle_irq_altstatus++; goto idle_irq; } /* check main status, clearing INTRQ */ status = ata_chk_status(ap); if (unlikely(status ATA_BUSY)) { /* printk(KERN_INFO ata%u: status %d\n, */ /* ap-id, status); */ ap-stats.idle_irq_status++; goto idle_irq; } /* ack bmdma irq events */ ap-ops-irq_clear(ap); ata_hsm_move(ap, qc, status, 0); return 1; /* irq handled */ idle_irq: ap-stats.idle_irq++; #ifdef ATA_IRQ_TRAP if ((ap-stats.idle_irq % 1000) == 0) { /* ata_irq_ack(ap, 0); /\* debug trap *\/ */ ata_port_printk(ap, KERN_WARNING, irq trap: %lu - %lu %lu %lu %lu %lu\n, ap-stats.idle_irq, ap-stats.idle_irq_non_atapi, ap-stats.idle_irq_hsm_state, ap-stats.idle_irq_host_state, ap-stats.idle_irq_altstatus, ap-stats.idle_irq_status); return 0
Re: PATA Sil680 Command Timeout on ARM XScale
Since primary channel and secondary channel share the same IRQ, the ISR could be called to service one or both channels. So I would think it's normal to see irq trap traces when both channels are in IO operation, correct? I have another question in regard to ata_host_intr() function in libata-core.c. For PIO read/write, the status of interrupt pin was not checked before moving the host state machine. Sil680 spec. recommend checking IDE channel interupt (bit 11 in the IDEx Task File Timing and Config + Status register) though. Could someone explain why interrupt status does not need to be checked for PIO? I'm still troubleshooting the command timeout issue on my test hardware, I will repeat the same test on i386 hardware. In the mean time, I would appreciate any suggestions or known information to isolate the issue. Thanks, Fajun On 3/13/07, Fajun Chen [EMAIL PROTECTED] wrote: On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote: above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is A lot of drive commands seem to be set up on a seven second worst case always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment ? I enabled ata_irq_trap and did the same test again. The kernel timer caught the timeout (10 seconds) this time along with the irq trap traces below. What's the cause of these idle irqs? [42949560.15] SCSI device sdb: drive cache: write back [ 85.57] ata1: irq trap [ 85.82] ata2: irq trap [ 92.12] abnormal status 0xD0 [ 92.12] ata1: irq trap [ 92.92] ata2: irq trap [ 98.75] ata1: irq trap [ 100.26] abnormal status 0xD0 [ 100.26] ata2: irq trap [ 105.54] ata1: irq trap [ 108.05] ata1: irq trap [ 110.62] ata1: irq trap [ 113.13] ata1: irq trap [ 115.53] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 115.53] ata1.00: (BMDMA stat 0x0) [ 115.53] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout) [ 115.53] ata1: soft resetting port [ 115.57] ata1.00: configured for UDMA/100 [ 115.57] sg_cmd_done: sg0, pack_id=2706, res=0x802, dur=10040 ms [ 115.57] ata1: EH complete [ 115.58] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB) [ 115.58] sda: Write Protect is off [ 115.58] sda: Mode Sense: 00 3a 00 00 [ 115.58] SCSI device sda: drive cache: write back ... Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
PATA Sil680 Command Timeout on ARM XScale
Hi Folks, We have a command timeout with Sil680 controller on ARM XScale. The kernel is 2.6.18-rc2 and libata 2.00 with preemptive enabled. Similar problem observed as well with kernel preemptive disabled. ATA pass through and sg are used. Heavy IO test was ran on both channels of Sil680 and the system was pretty loaded where the load average was above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Thanks, Fajun User space trace: Cmd 4276 timed out after 7.260137 secs: start time 1173775439.409099 secs, timed out at 1173775446.669236 secs [Tue Mar 13 08:44:06 2007]: Test: Random Write Sectors Extended LBA Low:0 LBA High: 10 ... Num Cmds: 4277 Num_Failed_Cmds:1 ... Status: Fail [Error 401: Command timeout] Dmesg log ~ $ dmesg .77] Calling initcall 0xc001ebb4: inet_diag_init+0x0/0x80() [42949375.77] Calling initcall 0xc001ec34: tcp_diag_init+0x0/0x1c() [42949375.77] Calling initcall 0xc001ec50: bictcp_register+0x0/0x1c() [42949375.77] TCP bic registered [42949375.77] Calling initcall 0xc001ee2c: af_unix_init+0x0/0x80() [42949375.77] NET: Registered protocol family 1 [42949375.77] Calling initcall 0xc001eeac: packet_init+0x0/0x70() [42949375.77] NET: Registered protocol family 17 [42949375.77] Calling initcall 0xc0012a88: clocksource_done_booting+0x0/0x24() [42949375.77] Calling initcall 0xc0019ed4: seqgen_init+0x0/0x1c() [42949375.77] Calling initcall 0xc001ba44: early_uart_console_switch+0x0/0x90() [42949375.77] Calling initcall 0xc013a150: net_random_reseed+0x0/0x38() [42949375.77] RAMDISK: Compressed image found at block 0 [42949378.95] VFS: Mounted root (ext2 filesystem). [42949378.96] Freeing init memory: 104K [42949549.17] ata1: soft resetting port [42949549.25] ata1.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48 [42949549.25] ata1.00: configured for UDMA/100 [42949549.25] ata1: EH complete [42949549.25] Vendor: ATA Model: ST940813AMRev: 3.02 [42949549.25] Type: Direct-Access ANSI SCSI revision: 05 [42949549.26] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB) [42949549.26] sda: Write Protect is off [42949549.26] sda: Mode Sense: 00 3a 00 00 [42949549.26] SCSI device sda: drive cache: write back [42949549.27] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB) [42949549.27] sda: Write Protect is off [42949549.27] sda: Mode Sense: 00 3a 00 00 [42949549.27] SCSI device sda: drive cache: write back [42949549.27] sda: unknown partition table [42949549.29] sd 0:0:0:0: Attached scsi disk sda [42949549.29] sd 0:0:0:0: Attached scsi generic sg0 type 0 [42949549.32] ata1: soft resetting port [42949549.38] ata1.00: configured for UDMA/100 [42949549.38] ata1: EH complete [42949549.38] SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB) [42949549.38] sda: Write Protect is off [42949549.38] sda: Mode Sense: 00 3a 00 00 [42949549.39] SCSI device sda: drive cache: write back [42949559.28] ata2: soft resetting port [42949559.42] ata2.00: ATA-6, max UDMA/100, 78140160 sectors: LBA48 [42949559.42] ata2.00: configured for UDMA/100 [42949559.42] ata2: EH complete [42949559.42] Vendor: ATA Model: ST94811A Rev: 3.07 [42949559.42] Type: Direct-Access ANSI SCSI revision: 05 [42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB) [42949559.43] sdb: Write Protect is off [42949559.43] sdb: Mode Sense: 00 3a 00 00 [42949559.43] SCSI device sdb: drive cache: write back [42949559.43] SCSI device sdb: 78140160 512-byte hdwr sectors (40008 MB) [42949559.44] sdb: Write Protect is off [42949559.44] sdb: Mode Sense: 00 3a 00 00 [42949559.44] SCSI device sdb: drive cache: write back [42949559.44] sdb: unknown partition table [42949559.46] sd 1:0:0:0: Attached scsi disk sdb [42949559.46] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 643.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308 [ 711.23] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308 [ 777.22] NWFPE: ntpd[38] takes exception 0001 at c002d514 from 0001d308 [ 841.30] NWFPE: ntpd[38] takes exception 0001 at c002d514 from
Re: PATA Sil680 Command Timeout on ARM XScale
On 3/13/07, Alan Cox [EMAIL PROTECTED] wrote: above 1.5. Two timers are used to track command timeout in our test software. The one in the user space is set to 6 seconds using alarm() call while the one in the kernel (scsi timer) is set to 5 seconds. These timeout values are probably too low to be realistic, but the issue here is not about the timeout itself but to understand why it is A lot of drive commands seem to be set up on a seven second worst case always user space timer expired before kernel timer. Since kernel timer uses jiffies to track time, does this imply a kernel bug where the time interrupts were lost or delay somehow? Do you know any know problems related to command timeout in PATA Sil680? Alarm() is also handled by the same jiffies logic, so I suspect a bug in your test environment ? I enabled ata_irq_trap and did the same test again. The kernel timer caught the timeout (10 seconds) this time along with the irq trap traces below. What's the cause of these idle irqs? [42949560.15] SCSI device sdb: drive cache: write back [ 85.57] ata1: irq trap [ 85.82] ata2: irq trap [ 92.12] abnormal status 0xD0 [ 92.12] ata1: irq trap [ 92.92] ata2: irq trap [ 98.75] ata1: irq trap [ 100.26] abnormal status 0xD0 [ 100.26] ata2: irq trap [ 105.54] ata1: irq trap [ 108.05] ata1: irq trap [ 110.62] ata1: irq trap [ 113.13] ata1: irq trap [ 115.53] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 115.53] ata1.00: (BMDMA stat 0x0) [ 115.53] ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout) [ 115.53] ata1: soft resetting port [ 115.57] ata1.00: configured for UDMA/100 [ 115.57] sg_cmd_done: sg0, pack_id=2706, res=0x802, dur=10040 ms [ 115.57] ata1: EH complete [ 115.58] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB) [ 115.58] sda: Write Protect is off [ 115.58] sda: Mode Sense: 00 3a 00 00 [ 115.58] SCSI device sda: drive cache: write back ... Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Libata Pass Through PIO Multi Support
On 3/7/07, Jeff Garzik [EMAIL PROTECTED] wrote: Fajun Chen wrote: Hi Folks, In ata_scsi_pass_thru() function in libata-scsi.c, the following code is in place: if (scsicmd[1] 0xe0) /* PIO multi not supported yet */ goto invalid_fld; Could someone explain why the PIO multi is not supported in libata pass through? That's just code left over from before we merged Albert Lee's patches adding PIO-multi support. That check should probably just be removed. Could you please provide a pointer to Albert Lee's PIO multi patch? I'm not sure if 2.6.18-rc2 with libata 2.00 has already got the patch. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
How to Change SATA Speed from User Space
Hi Folks, Does any one know a way to change SATA speed (1.5Gb/s vs 3.0Gb/s) from user space? I use 2.6.18-rc2 and libata version 2.00. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
On 2/21/07, Alan [EMAIL PROTECTED] wrote: On Wed, 21 Feb 2007 15:56:28 -0700 Fajun Chen [EMAIL PROTECTED] wrote: Hi Folks, I've noticed the following code in both pata_sil680.c and IDE code siimage.c /* FIXME: double check */ pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, (class_rev) ? 1 : 255); I was unable to find the recommended setting in Sil680 document. Could someone explain the rational behind the code above? Does it need to be adjusted on different processors for PCI read/write performance? The code is inherited from the original bits by Andre Hedrick who had access to the chip errata documents, which afaik have never been published. Thanks for the update, Alan. I did another experiments by changing Cache Line Size to 0x02 and Latency Timer to 0x40 and the performance of PCI Read (Read DMA) has been almost doubled. So this seems there may be room for further performance enhancement by tweaking PCI configuration. One concern about the latency timer change is increased bus hold time, could this delay other applications which shares the PCI bus? Since Sil3124 has better PCI read/write performance, as a reference, could someone explain or point me to the PCI configuration code for Sil3124? I couldn't find it in sata_sil24.c. The kernel version used is 2.6.18-rc2 and libata version is 2.00. Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
NV Cache Feature Support in libata
Hi All, I'm not aware of any NV cache (vista feature documented in ata-atapi-8) support in libata. Is this supported or any plan to support it in the future? Thanks, Fajun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html