Re: libata: CD and dvd devices not recognized

2007-03-22 Thread Tejun Heo
Albert Lee wrote:
 Hi Yarema,
 
 Thanks for the detailed log.
 It looks like the bad INQUIRY command
   CDB (4:0,1,0) 12 01 00 00 fe 00 00 00 00 (INQUIRY, length=254, EVPD=1)
 is coming from the user space, not the SCSI mid-layer.
 
 I guess two problems together caused this bug:
 1. Ubuntu Linux issues an incorrect INQUIRY command to the drive.
(Other distros seem to have the INQUIRY correct.)
 2. The incorrect INQUIRY happens to cause the AOpen drive to freeze.
(The HP drive is immune to the incorrect INQUIRY command;
 it returns a check condition for the bad INQUIRY.)
 
 We have two possible solutions here:
 a. Patch Ubuntu, such that the incorrect INQUIRY is fixed.
 b. Patch kernel, such that the AOpen drives are blacklisted.
Each INQUIRY is inspected for the blacklisted drives.
If the INQUIRY looks wrong, the INQUIRY is rejected.
 
 I guess a. is the preferred solution...

I second Albert's opinion.  Please report this to ubuntu people so that
the origin of the problem can be fixed.

Thanks a lot.  I admire your ability and patience in tracking these
difficult issues.  :-)

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
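
For readers following the thread, here is a minimal sketch of how the
CDB quoted above breaks down. It assumes the 6-byte INQUIRY layout
(opcode 0x12, EVPD in bit 0 of byte 1, page code in byte 2, allocation
length in bytes 3-4, big-endian). The struct and function names are
made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCSI_OP_INQUIRY 0x12	/* INQUIRY operation code */

/* Hypothetical helper type, not a kernel structure. */
struct inquiry_fields {
	int evpd;		/* 1 if vital product data was requested */
	uint8_t page_code;	/* VPD page, meaningful only when evpd == 1 */
	uint16_t alloc_len;	/* bytes the initiator is prepared to receive */
};

/* Returns 0 on success, -1 if the CDB is not an INQUIRY. */
int decode_inquiry(const uint8_t *cdb, size_t len, struct inquiry_fields *out)
{
	assert(cdb != NULL && out != NULL);

	if (len < 6 || cdb[0] != SCSI_OP_INQUIRY)
		return -1;

	out->evpd = cdb[1] & 0x01;
	out->page_code = cdb[2];
	out->alloc_len = (uint16_t)((cdb[3] << 8) | cdb[4]);
	return 0;
}
```

Fed the bytes 12 01 00 00 fe 00, this yields EVPD=1, page code 0 and an
allocation length of 254, matching the annotation in the log.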


Re: problem detecting ATAPI device with ahci on 2.6.21-rcx

2007-03-22 Thread Tejun Heo
Kristen Carlson Accardi wrote:
 Hi, I upgraded a machine from 2.6.20 to 2.6.21-rc4 and am now having problems
 with my ATAPI device getting detected properly.  Backtracking to 2.6.21-rc1,
 I find the problem existed there too, but not in 2.6.20.

Please post the boot dmesg of 2.6.20.  Is your ATAPI device connected using
an 80c cable?

-- 
tejun


Re: libata: CD and dvd devices not recognized

2007-03-22 Thread Alan Cox
 We have two possible solutions here:
 a. Patch Ubuntu, such that the incorrect INQUIRY is fixed.
 b. Patch kernel, such that the AOpen drives are blacklisted.
Each INQUIRY is inspected for the blacklisted drives.
If the INQUIRY looks wrong, the INQUIRY is rejected.
 
 I guess a. is the preferred solution...

We have two problems here

#1 Ubuntu got the inquiry command wrong

#2 Until now we considered INQUIRY a safe command for SG_IO passthrough.

We can't really take INQUIRY out of SG_IO so do we decide it's the
hardware vendor's problem or do something cleverer in the filters?

Alan


sata dvd not detected on jmicron with some kernel-configurations.

2007-03-22 Thread Fredrik Rinnestam
When booting a GNU/Linux distribution's install CD I noticed that the included
[1]kernel (2.6.20.3) would not probe/find my JMicron-connected SATA DVD drive.

My own [2]kernel configuration finds the drive just fine, but the distributor's
kernel, with all sorts of drivers built in, won't. If I connect the DVD-ROM to my
ICH8R controller, both kernels find it just fine.

The JMicron controller is configured as AHCI in the system's BIOS, and AHCI is
built into both kernels along with SCSI CD-ROM support.

[1] http://fredrik.obra.se/linux-2.6.20.3.config
[2] http://fredrik.obra.se/2.6.20.3.config

Any suggestions on what breaks it?

Cheers.
-- 
Fredrik Rinnestam


[PATCH libata-dev#upstream-fixes] libata: IDENTIFY backwards for drive side cable detection

2007-03-22 Thread Tejun Heo
For drive side cable detection to work correctly, drives need to be
identified backwards such that the slave device releases PDIAG- before
the master drive tries to detect cable type.  ata_bus_probe() was fixed
by commit f31f0cc2f0b7527072d94d02da332d9bb8d7d94c but the new EH path
wasn't fixed.  This patch makes new EH path do IDENTIFY backwards.

ata_dev_configure() for new devices is still performed master first.
This is to keep the detection messages in forward order.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]
---
Jeff, this one should go into #upstream-fixes.  The following
regression is fixed by this.

  http://thread.gmane.org/gmane.linux.ide/17433

Thanks.

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 361953a..c89664a 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1743,12 +1743,17 @@ static int ata_eh_revalidate_and_attach(struct ata_port *ap,
 {
 	struct ata_eh_context *ehc = &ap->eh_context;
 	struct ata_device *dev;
+	unsigned int new_mask = 0;
 	unsigned long flags;
 	int i, rc = 0;
 
 	DPRINTK("ENTER\n");
 
-	for (i = 0; i < ATA_MAX_DEVICES; i++) {
+	/* For PATA drive side cable detection to work, IDENTIFY must
+	 * be done backwards such that PDIAG- is released by the slave
+	 * device before the master device is identified.
+	 */
+	for (i = ATA_MAX_DEVICES - 1; i >= 0; i--) {
 		unsigned int action, readid_flags = 0;
 
 		dev = &ap->device[i];
@@ -1760,13 +1765,13 @@ static int ata_eh_revalidate_and_attach(struct ata_port *ap,
 		if (action & ATA_EH_REVALIDATE && ata_dev_ready(dev)) {
 			if (ata_port_offline(ap)) {
 				rc = -EIO;
-				break;
+				goto err;
 			}
 
 			ata_eh_about_to_do(ap, dev, ATA_EH_REVALIDATE);
 			rc = ata_dev_revalidate(dev, readid_flags);
 			if (rc)
-				break;
+				goto err;
 
 			ata_eh_done(ap, dev, ATA_EH_REVALIDATE);
 
@@ -1784,40 +1789,53 @@ static int ata_eh_revalidate_and_attach(struct ata_port *ap,
 
 			rc = ata_dev_read_id(dev, &dev->class, readid_flags,
 					     dev->id);
-			if (rc == 0) {
-				ehc->i.flags |= ATA_EHI_PRINTINFO;
-				rc = ata_dev_configure(dev);
-				ehc->i.flags &= ~ATA_EHI_PRINTINFO;
-			} else if (rc == -ENOENT) {
+			switch (rc) {
+			case 0:
+				new_mask |= 1 << i;
+				break;
+			case -ENOENT:
 				/* IDENTIFY was issued to non-existent
 				 * device.  No need to reset.  Just
 				 * thaw and kill the device.
 				 */
 				ata_eh_thaw_port(ap);
 				dev->class = ATA_DEV_UNKNOWN;
-				rc = 0;
-			}
-
-			if (rc) {
-				dev->class = ATA_DEV_UNKNOWN;
 				break;
+			default:
+				dev->class = ATA_DEV_UNKNOWN;
+				goto err;
 			}
+		}
+	}
 
-			if (ata_dev_enabled(dev)) {
-				spin_lock_irqsave(ap->lock, flags);
-				ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
-				spin_unlock_irqrestore(ap->lock, flags);
+	/* Configure new devices forward such that user doesn't see
+	 * device detection messages backwards.
+	 */
+	for (i = 0; i < ATA_MAX_DEVICES; i++) {
+		dev = &ap->device[i];
 
-				/* new device discovered, configure xfermode */
-				ehc->i.flags |= ATA_EHI_SETMODE;
-			}
-		}
+		if (!(new_mask & (1 << i)))
+			continue;
+
+		ehc->i.flags |= ATA_EHI_PRINTINFO;
+		rc = ata_dev_configure(dev);
+		ehc->i.flags &= ~ATA_EHI_PRINTINFO;
+		if (rc)
+			goto err;
+
+		spin_lock_irqsave(ap->lock, flags);
+		ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
+		spin_unlock_irqrestore(ap->lock, flags);
+
+		/* new device discovered, configure xfermode */
+		ehc->i.flags |= ATA_EHI_SETMODE;
 	}
 
-	if (rc)
-		*r_failed_dev = dev;
+	return 0;
 
-	DPRINTK("EXIT\n");
+ err:
+   
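
The ordering the patch description asks for (IDENTIFY from slave to
master, then configure from master to slave) can be sketched outside
the kernel as below. This is only an illustration under simplifying
assumptions: the trace structure, probe_channel() and the "present"
bitmask are hypothetical, and no real IDENTIFY or error handling is
modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 2	/* master (0) and slave (1) on one PATA channel */

/* Hypothetical trace of which device each step touched. */
struct probe_trace {
	int identify_order[MAX_DEVICES];
	int configure_order[MAX_DEVICES];
	int n_identified;
	int n_configured;
};

/* present: bitmask of devices that answer IDENTIFY. */
void probe_channel(unsigned int present, struct probe_trace *t)
{
	unsigned int new_mask = 0;
	int i;

	assert(t != NULL);
	t->n_identified = 0;
	t->n_configured = 0;

	/* Pass 1: IDENTIFY from the last device down to the first. */
	for (i = MAX_DEVICES - 1; i >= 0; i--) {
		if (!(present & (1u << i)))
			continue;
		t->identify_order[t->n_identified++] = i;
		new_mask |= 1u << i;
	}

	/* Pass 2: configure in ascending order so messages read naturally. */
	for (i = 0; i < MAX_DEVICES; i++) {
		if (!(new_mask & (1u << i)))
			continue;
		t->configure_order[t->n_configured++] = i;
	}
}
```

Remembering the newly identified devices in a bitmask is what lets the
two loops run in opposite directions without re-probing anything.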

Re: [PATCH libata-dev#upstream-fixes] libata: IDENTIFY backwards for drive side cable detection

2007-03-22 Thread Alan Cox
On Thu, 22 Mar 2007 22:24:19 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 For drive side cable detection to work correctly, drives need to be
 identified backwards such that the slave device releases PDIAG- before
 the master drive tries to detect cable type.  ata_bus_probe() was fixed
 by commit f31f0cc2f0b7527072d94d02da332d9bb8d7d94c but the new EH path
 wasn't fixed.  This patch makes new EH path do IDENTIFY backwards.
 
 ata_dev_configure() for new devices is still performed master first.
 This is to keep the detection messages in forward order.
 
 Signed-off-by: Tejun Heo [EMAIL PROTECTED]

Acked-by: Alan Cox [EMAIL PROTECTED]

Why do we have two implementations of the same code ?


Re: libata: CD and dvd devices not recognized

2007-03-22 Thread Sergei Shtylyov

Hello.

Albert Lee wrote:


Thanks for the detailed log.
It looks like the bad INQUIRY command
  CDB (4:0,1,0) 12 01 00 00 fe 00 00 00 00 (INQUIRY, length=254, EVPD=1)
is coming from the user space, not the SCSI mid-layer.



I guess two problems together caused this bug:
1. Ubuntu Linux issues an incorrect INQUIRY command to the drive.
   (Other distros seem to have the INQUIRY correct.)


   But what is incorrect about sending INQUIRY with EVPD bit?

MBR, Sergei


Re: [PATCH libata-dev#upstream-fixes] libata: IDENTIFY backwards for drive side cable detection

2007-03-22 Thread Tejun Heo
Alan Cox wrote:
 On Thu, 22 Mar 2007 22:24:19 +0900
 Tejun Heo [EMAIL PROTECTED] wrote:
 
 For drive side cable detection to work correctly, drives need to be
 identified backwards such that the slave device releases PDIAG- before
 the master drive tries to detect cable type.  ata_bus_probe() was fixed
 by commit f31f0cc2f0b7527072d94d02da332d9bb8d7d94c but the new EH path
 wasn't fixed.  This patch makes new EH path do IDENTIFY backwards.

 ata_dev_configure() for new devices is still performed master first.
 This is to keep the detection messages in forward order.

 Signed-off-by: Tejun Heo [EMAIL PROTECTED]
 
 Acked-by: Alan Cox [EMAIL PROTECTED]
 
 Why do we have two implementations of the same code ?

ata_bus_probe() is scheduled to be killed once all old-EH code is
removed.  That's the old probe path.

-- 
tejun


Re: Change Libata Error Handling for Drive Testing

2007-03-22 Thread Fajun Chen

Hi Tejun,

JFYI, it turns out that the spurious interrupts were caused by User Scan
running before the drive is ready.  I wait for 2 seconds after the drive is
powered on, which is not sufficient for some drives.  Alt status should be
checked first, but there's no good way to check it in user space. Does the
User Scan related code check alt status before the drive is touched?

Thanks,
Fajun
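
The readiness check described above (poll alternate status until BSY
clears instead of sleeping a fixed 2 seconds) might look roughly like
this. It is a sketch only: how the register would actually be read is
left open, so the accessor is a caller-supplied callback, and
wait_not_busy(), fake_drive and fake_altstatus() are hypothetical names
with the latter two simulating a drive that is still spinning up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATA_BUSY 0x80	/* BSY, bit 7 of the (alternate) status register */

/* Caller-supplied accessor; in a real setup this would read the
 * alternate status register, which does not clear a pending interrupt. */
typedef uint8_t (*read_altstatus_fn)(void *ctx);

/* Polls until BSY clears or max_polls reads have been made.
 * Returns the number of reads taken, or -1 on timeout. */
int wait_not_busy(read_altstatus_fn read_altstatus, void *ctx, int max_polls)
{
	int n;

	assert(read_altstatus != NULL);

	for (n = 1; n <= max_polls; n++) {
		if (!(read_altstatus(ctx) & ATA_BUSY))
			return n;
		/* a real caller would sleep between polls */
	}
	return -1;
}

/* Simulated drive for demonstration: reports BSY for busy_reads reads. */
struct fake_drive {
	int busy_reads;
};

uint8_t fake_altstatus(void *ctx)
{
	struct fake_drive *d = ctx;

	if (d->busy_reads > 0) {
		d->busy_reads--;
		return ATA_BUSY;
	}
	return 0x50;	/* DRDY | DSC: drive is idle and ready */
}
```

A bounded poll like this adapts to slow drives while still giving up
on one that never becomes ready.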

On 3/19/07, Fajun Chen [EMAIL PROTECTED] wrote:

On 3/19/07, Tejun Heo [EMAIL PROTECTED] wrote:
 Fajun Chen wrote:
  Please ignore the changes to pata_sil680.c. The same failure happened
  to standard sil680 driver without my change as well.

 Does it also happen when the second port is empty?


Yes, it happens even when one of the ports (either one) is powered off.
It used to happen in the middle of our IO test application; now it
happens much earlier, in our test spinup process, with the debugging version
of the ata_host_intr() function. We boot up the target (ARM XScale
processor) with the hard drive powered off, then power up the drive and do
test spinup. What test spinup does is issue a sysfs user scan on the
port followed by Identify Device.

Below is my debugging version of the ata_host_intr() function with
ATA_IRQ_TRAP enabled and hacked. What puzzles me is that none of the
sub-counters (initial value is 1) gets incremented in most failures.
Please see the dmesg log for details. I did see one failure (out of many)
where idle_irq_hsm_state is incremented and matches the idle_irq
counter though.

Thanks,
Fajun

inline unsigned int ata_host_intr (struct ata_port *ap,
				   struct ata_queued_cmd *qc)
{
	u8 status, host_stat = 0;

/*	VPRINTK("ata%u: protocol %d task_state %d\n", */
/*		ap->id, qc->tf.protocol, ap->hsm_task_state); */
/*	printk(KERN_INFO "ata%u: protocol %d task_state %d\n", */
/*	       ap->id, qc->tf.protocol, ap->hsm_task_state); */

	/* Check whether we are expecting interrupt in this state */
	switch (ap->hsm_task_state) {
	case HSM_ST_FIRST:
		/* Some pre-ATAPI-4 devices assert INTRQ
		 * at this state when ready to receive CDB.
		 */

		/* Check the ATA_DFLAG_CDB_INTR flag is enough here.
		 * The flag was turned on only for atapi devices.
		 * No need to check is_atapi_taskfile(&qc->tf) again.
		 */
		if (!(qc->dev->flags & ATA_DFLAG_CDB_INTR))
		{
			/* printk(KERN_INFO "ata%u: flags %lu\n", */
			/*        ap->id, qc->dev->flags); */
			ap->stats.idle_irq_non_atapi++;
			goto idle_irq;
		}
		break;
	case HSM_ST_LAST:
		if (qc->tf.protocol == ATA_PROT_DMA ||
		    qc->tf.protocol == ATA_PROT_ATAPI_DMA) {
			/* check status of DMA engine */
			host_stat = ap->ops->bmdma_status(ap);
			VPRINTK("ata%u: host_stat 0x%X\n", ap->id, host_stat);

			/* if it's not our irq...
			 * (braces are required here: without them the
			 * goto below would run unconditionally)
			 */
			if (!(host_stat & ATA_DMA_INTR))
			{
				/* printk(KERN_INFO "ata%u: host_stat %d\n", */
				/*        ap->id, host_stat); */
				ap->stats.idle_irq_host_state++;
				goto idle_irq;
			}

			/* before we do anything else, clear DMA-Start bit */
			ap->ops->bmdma_stop(qc);

			if (unlikely(host_stat & ATA_DMA_ERR)) {
				/* error when transferring data to/from memory */
				qc->err_mask |= AC_ERR_HOST_BUS;
				ap->hsm_task_state = HSM_ST_ERR;
			}
		}
		break;
	case HSM_ST:
		break;
	default:
		/* printk(KERN_INFO "ata%u: hsm_state %d\n", */
		/*        ap->id, ap->hsm_task_state); */
		ap->stats.idle_irq_hsm_state++;
		goto idle_irq;
	}

	/* check altstatus */
	status = ata_altstatus(ap);
	if (status & ATA_BUSY)
	{
		/* printk(KERN_INFO "ata%u: altstatus %d\n", */
		/*        ap->id, status); */
		ap->stats.idle_irq_altstatus++;
		goto idle_irq;
	}

	/* check main status, clearing INTRQ */
	status = ata_chk_status(ap);
	if (unlikely(status & ATA_BUSY))
	{
		/* printk(KERN_INFO "ata%u: status %d\n", */
		/*        ap->id, status); */
		ap->stats.idle_irq_status++;
		goto idle_irq;
	}

	/* ack bmdma irq events */
	ap->ops->irq_clear(ap);

	ata_hsm_move(ap, qc, status, 0);
	return 1;	/* irq handled */

idle_irq:
	ap->stats.idle_irq++;

#ifdef ATA_IRQ_TRAP
	if ((ap->stats.idle_irq % 

Re: libata - 2.6.21-rc4-git5, ata channel still badly configured

2007-03-22 Thread Lukas Hejtmanek
On Thu, Mar 22, 2007 at 03:44:58PM +0900, Tejun Heo wrote:
 Lukas Hejtmanek wrote:
  Subject: ata_piix: PATA UDMA/100 configured as UDMA/33
  References : http://lkml.org/lkml/2007/2/20/294
  Submitter  : Fabio Comolli [EMAIL PROTECTED]
  Status : patch exists
  
 
 Does this fix your problem?

Yes, it does. Thank you.

-- 
Lukáš Hejtmánek


2.6.20.3 AMD64 oops in CFQ code

2007-03-22 Thread linux
This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
stable for a long time, but has been acting up of late since I upgraded
to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
errors.

Kernel 2.6.20.3 + linuxpps patches (confined to drivers/serial, and not
actually in use as I stole the serial port for a console).

It takes half a day to reproduce the problem, so bisecting would be painful.

BackupPC_dump mostly writes to a large (1.7 TB) ext3 RAID5 partition.


Here are two oopses, a few minutes (16:31, to be precise) apart.
Unusually, it oopsed twice *without* locking up the system.  Usually,
I see this followed by an error from drivers/input/keyboard/atkbd.c:
printk(KERN_WARNING "atkbd.c: Spurious %s on %s. "
       "Some program might be trying access hardware directly.\n",
emitted at 1 Hz with the keyboard LEDs flashing and the system
unresponsive to keyboard or pings.
(I think it was spurious ACK on serio/input0, but my memory may be faulty.)


If anyone has any suggestions, they'd be gratefully received.


Unable to handle kernel NULL pointer dereference at 0098 RIP: 
 [8031504a] cfq_dispatch_insert+0x18/0x68
PGD 777e9067 PUD 78774067 PMD 0 
Oops:  [1] 
CPU 0 
Modules linked in: ecb
Pid: 2837, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
RIP: 0010:[8031504a]  [8031504a] cfq_dispatch_insert+0x18/0x68
RSP: 0018:8100770bbaf8  EFLAGS: 00010092
RAX: 81007fb36c80 RBX:  RCX: 0001
RDX: 00010003e4e7 RSI:  RDI: 
RBP: 81007fb37a00 R08:  R09: 81005d390298
R10: 81007fcb4f80 R11: 81007fcb4f80 R12: 81007facd280
R13: 0004 R14: 0001 R15: 
FS:  2b322d120d30() GS:805de000() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0098 CR3: 7bcf CR4: 06e0
Process BackupPC_dump (pid: 2837, threadinfo 8100770ba000, task 81007fc5d8e0)
Stack:   8100770f39f0  0004
 0001 80315253 803b2607 81005da2bc40
 81007fac3800 81007facd280 81007facd280 81005d390298
Call Trace:
 [80315253] cfq_dispatch_requests+0x152/0x512
 [803b2607] scsi_done+0x0/0x18
 [8030d9f1] elv_next_request+0x137/0x147
 [803b7ce0] scsi_request_fn+0x6a/0x33a
 [8024d407] generic_unplug_device+0xa/0xe
 [80407ced] unplug_slaves+0x5b/0x94
 [80223d65] sync_page+0x0/0x40
 [80223d9b] sync_page+0x36/0x40
 [80256d45] __wait_on_bit_lock+0x36/0x65
 [80237496] __lock_page+0x5e/0x64
 [8028061d] wake_bit_function+0x0/0x23
 [802074de] find_get_page+0xe/0x2d
 [8020b38e] do_generic_mapping_read+0x1c2/0x40d
 [8020bd80] file_read_actor+0x0/0x118
 [8021422e] generic_file_aio_read+0x15c/0x19e
 [8020bafa] do_sync_read+0xc9/0x10c
 [80210342] may_open+0x5b/0x1c6
 [802805ef] autoremove_wake_function+0x0/0x2e
 [8020a857] vfs_read+0xaa/0x152
 [8020faf3] sys_read+0x45/0x6e
 [8025041e] system_call+0x7e/0x83


Code: 4c 8b ae 98 00 00 00 4c 8b 70 08 e8 63 fe ff ff 8b 43 28 4c 
RIP  [8031504a] cfq_dispatch_insert+0x18/0x68
 RSP 8100770bbaf8
CR2: 0098
 1Unable to handle kernel NULL pointer dereference at 0098 RIP: 
 [8031504a] cfq_dispatch_insert+0x18/0x68
PGD 79bd2067 PUD 789f9067 PMD 0 
Oops:  [2] 
CPU 0 
Modules linked in: ecb
Pid: 2834, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
RIP: 0010:[8031504a]  [8031504a] cfq_dispatch_insert+0x18/0x68
RSP: 0018:8100789b5af8  EFLAGS: 00010092
RAX: 81007fb36c80 RBX:  RCX: 0001
RDX: 00010007ac16 RSI:  RDI: 
RBP: 81007fb37a00 R08:  R09: 810064dd45e0
R10: 81007fcb4f80 R11: 81007fcb4f80 R12: 81007facd280
R13: 0004 R14: 0001 R15: 
FS:  2b0a7c680d30() GS:805de000() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0098 CR3: 79d36000 CR4: 06e0
Process BackupPC_dump (pid: 2834, threadinfo 8100789b4000, task 81007a235140)
Stack:   81007b9ebbd0  0004
 0001 80315253 803b2607 81000e67ba00
 81007fac3800 81007facd280 81007facd280 810064dd45e0
Call Trace:
 [80315253] cfq_dispatch_requests+0x152/0x512
 [803b2607] scsi_done+0x0/0x18
 [8030d9f1] elv_next_request+0x137/0x147
 [803b7ce0] scsi_request_fn+0x6a/0x33a
 [8024d407] 
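
One way to read the oops: the Code: line begins with the bytes
4c 8b ae 98 00 00 00. Assuming those bytes start at the faulting RIP
(which fits CR2 ending in 98), they encode a 64-bit load from
0x98(%rsi) into %r13, i.e. a structure member at offset 0x98 fetched
through a NULL pointer. The small decoder below is a sketch covering
only this one instruction form; mov_load and decode_mov_load() are
illustrative names, not an existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this one instruction form. */
struct mov_load {
	int dst_reg;	/* 0-15, x86-64 register number (13 = r13) */
	int base_reg;	/* 0-15, x86-64 register number (6 = rsi) */
	uint32_t disp;	/* displacement added to the base register */
};

/* Decodes "REX.W 8B /r" with mod=10 (disp32) and no SIB byte.
 * Returns 0 on success, -1 for any other encoding. */
int decode_mov_load(const uint8_t *code, size_t len, struct mov_load *out)
{
	uint8_t rex, modrm;

	assert(code != NULL && out != NULL);
	if (len < 7)
		return -1;

	rex = code[0];
	modrm = code[2];
	if ((rex & 0xf8) != 0x48 || code[1] != 0x8b)	/* REX.W + MOV r64, r/m64 */
		return -1;
	if ((modrm >> 6) != 2 || (modrm & 7) == 4)	/* need disp32, no SIB */
		return -1;

	out->dst_reg = ((modrm >> 3) & 7) | ((rex & 0x04) ? 8 : 0);	/* REX.R */
	out->base_reg = (modrm & 7) | ((rex & 0x01) ? 8 : 0);		/* REX.B */
	out->disp = (uint32_t)code[3] | ((uint32_t)code[4] << 8) |
		    ((uint32_t)code[5] << 16) | ((uint32_t)code[6] << 24);
	return 0;
}
```

For the bytes above this gives destination register 13, base register
6 and displacement 0x98, so whatever pointer cfq_dispatch_insert()
held in %rsi was NULL at that point.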

Re: 2.6.20.3 AMD64 oops in CFQ code

2007-03-22 Thread Aristeu Sergio Rozanski Filho
 This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
 over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
 stable for a long time, but has been acting up of late since I upgraded
 to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
 errors.
Tried checking the memory with memtest86?

Do you have the k8_edac module loaded?
If you don't, I'd recommend using it to get reports of
recoverable/unrecoverable memory errors; check http://bluesmoke.sf.net/ for
the latest version.

-- 
Aristeu



Re: 2.6.20.3 AMD64 oops in CFQ code

2007-03-22 Thread Jens Axboe
On Thu, Mar 22 2007, [EMAIL PROTECTED] wrote:
 This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
 over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
 stable for a long time, but has been acting up of late since I upgraded
 to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
 errors.
 
 Kernel 2.6.20.3 + linuxpps patches (confined to drivers/serial, and not
 actually in use as I stole the serial port for a console).
 
 It takes half a day to reproduce the problem, so bisecting would be painful.
 
 BackupPC_dump mostly writes to a large (1.7 TB) ext3 RAID5 partition.
 
 
 Here are two oopses, a few minutes (16:31, to be precise) apart.
 Unusually, it oopsed twice *without* locking up the system.  Usually,
 I see this followed by an error from drivers/input/keyboard/atkbd.c:
 printk(KERN_WARNING "atkbd.c: Spurious %s on %s. "
        "Some program might be trying access hardware directly.\n",
 emitted at 1 Hz with the keyboard LEDs flashing and the system
 unresponsive to keyboard or pings.
 (I think it was spurious ACK on serio/input0, but my memory may be faulty.)
 
 
 If anyone has any suggestions, they'd be gratefully received.
 
 
 Unable to handle kernel NULL pointer dereference at 0098 RIP: 
  [8031504a] cfq_dispatch_insert+0x18/0x68
 PGD 777e9067 PUD 78774067 PMD 0 
 Oops:  [1] 
 CPU 0 
 Modules linked in: ecb
 Pid: 2837, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
 RIP: 0010:[8031504a]  [8031504a] cfq_dispatch_insert+0x18/0x68
 RSP: 0018:8100770bbaf8  EFLAGS: 00010092
 RAX: 81007fb36c80 RBX:  RCX: 0001
 RDX: 00010003e4e7 RSI:  RDI: 
 RBP: 81007fb37a00 R08:  R09: 81005d390298
 R10: 81007fcb4f80 R11: 81007fcb4f80 R12: 81007facd280
 R13: 0004 R14: 0001 R15: 
 FS:  2b322d120d30() GS:805de000() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0098 CR3: 7bcf CR4: 06e0
 Process BackupPC_dump (pid: 2837, threadinfo 8100770ba000, task 81007fc5d8e0)
 Stack:   8100770f39f0  0004
  0001 80315253 803b2607 81005da2bc40
  81007fac3800 81007facd280 81007facd280 81005d390298
 Call Trace:
  [80315253] cfq_dispatch_requests+0x152/0x512
  [803b2607] scsi_done+0x0/0x18
  [8030d9f1] elv_next_request+0x137/0x147
  [803b7ce0] scsi_request_fn+0x6a/0x33a
  [8024d407] generic_unplug_device+0xa/0xe
  [80407ced] unplug_slaves+0x5b/0x94
  [80223d65] sync_page+0x0/0x40
  [80223d9b] sync_page+0x36/0x40
  [80256d45] __wait_on_bit_lock+0x36/0x65
  [80237496] __lock_page+0x5e/0x64
  [8028061d] wake_bit_function+0x0/0x23
  [802074de] find_get_page+0xe/0x2d
  [8020b38e] do_generic_mapping_read+0x1c2/0x40d
  [8020bd80] file_read_actor+0x0/0x118
  [8021422e] generic_file_aio_read+0x15c/0x19e
  [8020bafa] do_sync_read+0xc9/0x10c
  [80210342] may_open+0x5b/0x1c6
  [802805ef] autoremove_wake_function+0x0/0x2e
  [8020a857] vfs_read+0xaa/0x152
  [8020faf3] sys_read+0x45/0x6e
  [8025041e] system_call+0x7e/0x83

3 (I think) separate instances of this, each involving raid5. Is your
array degraded or fully operational?


-- 
Jens Axboe



request_queue_t depends on CONFIG_BLOCK

2007-03-22 Thread Olaf Hering
How can this compile error be fixed properly?
request_queue_t is inside CONFIG_BLOCK,
ide_drive_s (and likely others) use it unconditionally.


  CC  arch/powerpc/kernel/setup_64.o
In file included from linux-2.6.21-rc4/arch/powerpc/kernel/setup_64.c:23:
linux-2.6.21-rc4/include/linux/ide.h:556: error: expected specifier-qualifier-list before 'request_queue_t'
linux-2.6.21-rc4/include/linux/ide.h:695: warning: 'struct request' declared inside parameter list
linux-2.6.21-rc4/include/linux/ide.h:695: warning: its scope is only this definition or declaration, which is probably not what you want
linux-2.6.21-rc4/include/linux/ide.h:823: warning: 'struct request' declared inside parameter list
linux-2.6.21-rc4/include/linux/ide.h:856: error: field 'wrq' has incomplete type
linux-2.6.21-rc4/include/linux/ide.h:1199: error: expected ')' before '*' token
make[2]: *** [arch/powerpc/kernel/setup_64.o] Error 1
  CC  arch/powerpc/kernel/setup-common.o
In file included from linux-2.6.21-rc4/arch/powerpc/kernel/setup-common.c:24:
linux-2.6.21-rc4/include/linux/ide.h:556: error: expected specifier-qualifier-list before 'request_queue_t'
linux-2.6.21-rc4/include/linux/ide.h:695: warning: 'struct request' declared inside parameter list
linux-2.6.21-rc4/include/linux/ide.h:695: warning: its scope is only this definition or declaration, which is probably not what you want
linux-2.6.21-rc4/include/linux/ide.h:823: warning: 'struct request' declared inside parameter list
linux-2.6.21-rc4/include/linux/ide.h:856: error: field 'wrq' has incomplete type
linux-2.6.21-rc4/include/linux/ide.h:1199: error: expected ')' before '*' token
make[2]: *** [arch/powerpc/kernel/setup-common.o] Error 1
make[2]: Target `__build' not remade because of errors.
make[1]: *** [arch/powerpc/kernel] Error 2





Re: request_queue_t depends on CONFIG_BLOCK

2007-03-22 Thread Christoph Hellwig
On Thu, Mar 22, 2007 at 10:52:34PM +0100, Olaf Hering wrote:
 How can this compile error be fixed properly?
 request_queue_t is inside CONFIG_BLOCK,
 ide_drive_s (and likely others) use it unconditionally.
 
 
   CC  arch/powerpc/kernel/setup_64.o
 In file included from linux-2.6.21-rc4/arch/powerpc/kernel/setup_64.c:23:

start looking for the problem here.  Why does your arch code include ide.h?



Re: [PATCH 2/3] sd: implement START/STOP management

2007-03-22 Thread Vladislav Bolkhovitin

Tejun Heo wrote:

Hello, Douglas.

Douglas Gilbert wrote:


Tejun,
I note at this point that the IMMED bit in the
START STOP UNIT cdb is clear. [The code might
note that as well.] All SCSI disks that I have
seen, implement the IMMED bit and according to
the SAT standard, so should SAT layers like the
one in libata.

With the IMMED bit clear:
 - on spin up, it will wait until disk is ready.
   Okay unless there are a lot of disks, in
   which case we could ask Matthew Wilcox for help
 - on spin down, will wait until media is
   stopped. That could be 20 seconds, and if there
   were multiple disks 

I guess the question is do we need to wait until a
disk is spun down before dropping power to it
and suspending.



I think we do.  As we're issuing SYNCHRONIZE CACHE prior to spinning
down disks, it's probably okay to drop power early data-integrity-wise
but still...

We can definitely use IMMED=1 during resume (needs to be throttled
somehow tho).  This helps even when there is only one disk.  We can let
the disk spin up in the background and proceed with the rest of resuming
process.  Unfortunately, libata SAT layer doesn't do IMMED and even if
it does (I've tried and have a patch available) it doesn't really work
because during host resume each port enters EH and resets and
revalidates each device.  Many if not most ATA harddisks don't respond
to reset or IDENTIFY till it's fully spun up meaning libata EH has to
wait for all drives to spin up.  libata EH runs inside SCSI EH thread
meaning SCSI command issue blocks till libata EH finishes resetting the
port.  So, IMMED or not, sd gotta wait for libata disks.

If we want to do parallel spin down, PM core needs to be updated such
that there are two events - issue and done - somewhat similar to what
SCSI is doing to probe devices parallelly.  If we're gonna do that, we
maybe can apply the same mechanism to resume path so that we can do
things parallelly IMMED or not.


It seems there is another way of doing a bank spin up / spin down: doing
it in two passes. On the first pass START_STOP will be issued with
IMMED=1 on all devices, then on the second pass START_STOP will be
issued with IMMED=0. So the devices will spin up / spin down in
parallel, but synchronously; hence the needed result will be achieved
with minimal code changes, although it will indeed need upper layer
changes in the callers of struct device_driver's suspend(), resume(), etc.


Vlad
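
A rough sketch of the two-pass idea, assuming the standard 6-byte
START STOP UNIT CDB (opcode 0x1b, IMMED in bit 0 of byte 1, START in
bit 0 of byte 4). The transport is abstracted as a callback: issue_fn,
spin_all(), issue_log and log_issue() are hypothetical and only record
what would be sent. Nothing here throttles how many drives start at
once, which a real implementation would have to consider.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCSI_OP_START_STOP 0x1b

/* Fills a 6-byte START STOP UNIT CDB.
 * start: 1 to spin up, 0 to spin down.
 * immed: 1 to return as soon as the command is accepted. */
void build_start_stop(uint8_t cdb[6], int start, int immed)
{
	assert(cdb != NULL);
	memset(cdb, 0, 6);
	cdb[0] = SCSI_OP_START_STOP;
	cdb[1] = immed ? 0x01 : 0x00;	/* IMMED, bit 0 of byte 1 */
	cdb[4] = start ? 0x01 : 0x00;	/* START, bit 0 of byte 4 */
}

/* Hypothetical transport hook: sends one CDB to device `dev`. */
typedef int (*issue_fn)(int dev, const uint8_t cdb[6], void *ctx);

/* Pass 1 kicks every device with IMMED=1; pass 2 waits on each
 * device in turn with IMMED=0. Returns the first non-zero error. */
int spin_all(int ndev, int start, issue_fn issue, void *ctx)
{
	uint8_t cdb[6];
	int pass, dev, rc;

	assert(issue != NULL);
	for (pass = 0; pass < 2; pass++) {
		build_start_stop(cdb, start, pass == 0);
		for (dev = 0; dev < ndev; dev++) {
			rc = issue(dev, cdb, ctx);
			if (rc)
				return rc;
		}
	}
	return 0;
}

/* Demo hook: records the IMMED bit of each command in issue order. */
struct issue_log {
	int immed[16];
	int n;
};

int log_issue(int dev, const uint8_t cdb[6], void *ctx)
{
	struct issue_log *log = ctx;

	(void)dev;
	if (log->n < 16)
		log->immed[log->n++] = cdb[1] & 0x01;
	return 0;
}
```

With three devices the log shows three IMMED=1 commands followed by
three IMMED=0 commands, so the total wait is bounded by the slowest
drive rather than the sum of all of them.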


Re: [PATCH 2/3] sd: implement START/STOP management

2007-03-22 Thread Henrique de Moraes Holschuh
On Thu, 22 Mar 2007, Vladislav Bolkhovitin wrote:
 It seems there is another way of doing a bank spin up / spin down: doing
 it in two passes. On the first pass START_STOP will be issued with
 IMMED=1 on all devices, then on the second pass START_STOP will be
 issued with IMMED=0. So the devices will spin up / spin down in
 parallel, but synchronously; hence the needed result will be achieved

And maybe trip the PSU's overcurrent defenses?  There is a reason to default
to sequential spin-up for disks...  Of course, it can be user-selectable.
But should it be the default?

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [PATCH 2/3] sd: implement START/STOP management

2007-03-22 Thread Vladislav Bolkhovitin

Henrique de Moraes Holschuh wrote:

On Thu, 22 Mar 2007, Vladislav Bolkhovitin wrote:

It seems there is another way of doing a bank spin up / spin down: doing
it in two passes. On the first pass START_STOP will be issued with
IMMED=1 on all devices, then on the second pass START_STOP will be
issued with IMMED=0. So the devices will spin up / spin down in
parallel, but synchronously; hence the needed result will be achieved



And maybe trip the PSU's overcurrent defenses?  There is a reason to default
to sequential spin-up for disks... 


But on spin down there is no such problem.


Of course, it can be user-selectable. But should it be the default?





Re: libata: CD and dvd devices not recognized

2007-03-22 Thread Albert Lee
Sergei Shtylyov wrote:
 Hello.
 
 Albert Lee wrote:
 
 Thanks for the detailed log.
 It looks like the bad INQUIRY command
   CDB (4:0,1,0) 12 01 00 00 fe 00 00 00 00 (INQUIRY, length=254,
 EVPD=1)
 is coming from the user space, not the SCSI mid-layer.
 
 
 I guess two problems together caused this bug:
 1. Ubuntu Linux issues an incorrect INQUIRY command to the drive.
(Other distros seem to have the INQUIRY correct.)
 
 
But what is incorrect about sending INQUIRY with EVPD bit?

Nothing wrong from the SCSI point of view.
However, in the early ATAPI spec (SFF-8020i), this EVPD bit is
reserved. And apparently some imperfect ATAPI CD-ROM drives don't
handle it well when EVPD = 1. :(

Hmm, how about the revised version:
1. Ubuntu Linux issues a correct INQUIRY command to the drive
   which sets EVPD = 1. However, EVPD is reserved per the early ATAPI
   spec and the AOpen 56X/AKH drive times out in this case.
   ...

--
albert
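
To make option b. concrete, here is a sketch of what a per-drive check
on passthrough commands could look like: INQUIRY with EVPD=1 is
refused for drives on a blacklist and everything else passes. All
names (evpd_blacklist, model_is_blacklisted(), passthrough_allowed())
are hypothetical, and the model string is a placeholder taken from this
message rather than a verified IDENTIFY string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder model prefixes; real IDENTIFY strings would have to be
 * taken from the affected drives. */
static const char *const evpd_blacklist[] = {
	"AOpen 56X/AKH",
};

static int model_is_blacklisted(const char *model)
{
	size_t i;

	for (i = 0; i < sizeof(evpd_blacklist) / sizeof(evpd_blacklist[0]); i++) {
		if (strncmp(model, evpd_blacklist[i],
			    strlen(evpd_blacklist[i])) == 0)
			return 1;
	}
	return 0;
}

/* Returns 1 if the command may be passed to the device, 0 if it should
 * be rejected before it reaches the drive. */
int passthrough_allowed(const char *model, const uint8_t *cdb, size_t len)
{
	assert(model != NULL && cdb != NULL);

	if (len < 2 || cdb[0] != 0x12)	/* only INQUIRY is inspected */
		return 1;
	if ((cdb[1] & 0x01) && model_is_blacklisted(model))
		return 0;		/* EVPD=1 to a drive known to hang */
	return 1;
}
```

A plain INQUIRY (EVPD=0) still reaches the blacklisted drive, so device
detection itself would be unaffected by such a check.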





Re: libata: CD and dvd devices not recognized

2007-03-22 Thread Albert Lee
Alan Cox wrote:
We have two possible solutions here:
a. Patch Ubuntu, such that the incorrect INQUIRY is fixed.
b. Patch kernel, such that the AOpen drives are blacklisted.
   Each INQUIRY is inspected for the blacklisted drives.
   If the INQUIRY looks wrong, the INQUIRY is rejected.

I guess a. is the preferred solution...
 
 
 We have two problems here
 
 #1 Ubuntu got the inquiry command wrong
 
 #2 Until now we considered INQUIRY a safe command for SG_IO passthrough.
 
 We can't really take INQUIRY out of SG_IO so do we decide it's the
 hardware vendor's problem or do something cleverer in the filters?
 

Maybe the SG_IO author has a better idea (CCing Doug)?

BTW, in addition to the AOpen INQUIRY with EVPD problem, we have
another imperfect ATAPI drive (TORiSAN) that freezes when READ >= 128KB.
(http://bugzilla.kernel.org/show_bug.cgi?id=6710)

We can limit dev->max_sectors to work around the TORiSAN problem.
But I don't know whether dev->max_sectors also works for SG_IO.
If not, some user space application, unaware of the problem,
might send a correct READ that locks the drive completely.

--
albert
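
If the size limit also had to be enforced on passthrough commands, the
check might look like the sketch below. It assumes the standard
READ(10) layout (opcode 0x28, transfer length in bytes 7-8,
big-endian) and takes the 128 KB figure from this message as the
threshold; read10_bytes() and read_is_safe() are hypothetical helpers,
and the block size is passed in by the caller (2048 bytes for a
typical data CD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCSI_OP_READ_10 0x28
#define TORISAN_LIMIT_BYTES (128u * 1024u)	/* threshold taken from the report */

/* Returns the byte count requested by a READ(10), or 0 for other CDBs. */
uint32_t read10_bytes(const uint8_t *cdb, size_t len, uint32_t block_size)
{
	uint32_t blocks;

	assert(cdb != NULL);
	if (len < 10 || cdb[0] != SCSI_OP_READ_10)
		return 0;
	blocks = ((uint32_t)cdb[7] << 8) | cdb[8];	/* transfer length */
	return blocks * block_size;
}

/* 1 if the command stays below the limit, 0 if it should be refused
 * or split into smaller reads. */
int read_is_safe(const uint8_t *cdb, size_t len, uint32_t block_size)
{
	return read10_bytes(cdb, len, block_size) < TORISAN_LIMIT_BYTES;
}
```

With 2048-byte blocks, a transfer length of 64 blocks is exactly 128 KB
and is flagged, while 63 blocks passes.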








Re: request_queue_t depends on CONFIG_BLOCK

2007-03-22 Thread Olaf Hering
On Thu, Mar 22, Christoph Hellwig wrote:

 On Thu, Mar 22, 2007 at 10:52:34PM +0100, Olaf Hering wrote:
  How can this compile error be fixed properly?
  request_queue_t is inside CONFIG_BLOCK,
  ide_drive_s (and likely others) use it unconditionally.
  
  
CC  arch/powerpc/kernel/setup_64.o
  In file included from linux-2.6.21-rc4/arch/powerpc/kernel/setup_64.c:23:
 
 start looking for the problem here.  Why does your arch code include ide.h?

Because it is needed in a few places.
Better hide everything in ide.h inside #ifdef CONFIG_IDE
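
A minimal sketch of the guarding idea, reduced to a single file. It
assumes request_queue_t is only typedef'd when CONFIG_BLOCK is set, as
the error output suggests; ide_drive_sketch, IDE_SKETCH_ENABLED and
ide_sketch_enabled() are invented names, and neither CONFIG_IDE nor
CONFIG_BLOCK is defined here so that the failing configuration is
mimicked.

```c
/* CONFIG_IDE and CONFIG_BLOCK are deliberately left undefined. */

#ifdef CONFIG_BLOCK
typedef struct request_queue request_queue_t;
#endif

#if defined(CONFIG_IDE) && defined(CONFIG_BLOCK)
/* Full definition: only visible when the type it needs exists. */
struct ide_drive_sketch {
	request_queue_t *queue;
};
#define IDE_SKETCH_ENABLED 1
#else
/* Opaque declaration: arch code can still pass pointers around. */
struct ide_drive_sketch;
#define IDE_SKETCH_ENABLED 0
#endif

/* Lets callers find out at run time which variant was compiled in. */
int ide_sketch_enabled(void)
{
	return IDE_SKETCH_ENABLED;
}
```

The file compiles either way, because the member that needs
request_queue_t is never seen by the compiler unless both options are
enabled.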