Re: [PATCH 2.6.13-rc1 05/10] IOCHK interface for I/O error handling/detecting

2005-07-18 Thread Grant Grundler
On Wed, Jul 06, 2005 at 02:11:42PM +0900, Hidetoshi Seto wrote:
> [This is 5 of 10 patches, "iochk-05-check_bridge.patch"]
...
>   It means that A or B hits a bus error, but there is no data
>   which one actually hits the error. So, C should notify the
>   error to both of A and B, and clear the H's status to start
>   its own I/Os.
> 
>   If there are only two devices, it become more simple. It is
>   clear if one find a bridge error while another is check-in,
>   the error is nothing except for another's.

Sorry, I don't understand this last paragraph.
I don't see how it's more simple with two devices (vs three) if
we don't exactly know which device caused the error. I thought
one still needed to reset/restart both devices. Is that correct?

The devices operate asynchronously from the drivers.
Only the driver can tell us for sure if IO was in flight for a
particular device and decide that a device could NOT have generated
an error.
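
To illustrate the point above, here is a minimal sketch, assuming each driver keeps a per-device count of outstanding I/O. The names dev_state, io_start, io_done and mark_suspects are hypothetical and are not part of the posted patches. Only devices with I/O in flight are treated as possible sources of a bridge error; idle devices are ruled out.

```c
#include <stddef.h>

/* Hypothetical per-device bookkeeping kept by each driver. */
struct dev_state {
	int in_flight;	/* I/O requests issued but not yet completed */
	int error;	/* set when the device is a suspect */
};

/* Driver calls this when it issues an I/O to its device. */
void io_start(struct dev_state *d)
{
	d->in_flight++;
}

/* Driver calls this when an I/O completes. */
void io_done(struct dev_state *d)
{
	if (d->in_flight > 0)
		d->in_flight--;
}

/*
 * On a bridge error, flag only the devices that had I/O in flight.
 * Returns the number of suspects.
 */
int mark_suspects(struct dev_state *devs, size_t n)
{
	int suspects = 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].in_flight > 0) {
			devs[i].error = 1;
			suspects++;
		}
	}
	return suspects;
}
```

If more than one device had I/O in flight, this still cannot say which one caused the error, so all of them would have to be flagged.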


Otherwise, so far, the patches look fine to me.

thanks,
grant
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2.6.13-rc1 05/10] IOCHK interface for I/O error handling/detecting

2005-07-06 Thread Hidetoshi Seto

[This is 5 of 10 patches, "iochk-05-check_bridge.patch"]

- Consider three devices, A, B, and C, placed under the same
  host bridge H. A and B have already checked in (that is, they
  have passed iochk_clear and are doing I/O, but have not yet
  called iochk_read). Now C checks in: it has just entered
  iochk_clear and finds that H indicates an error.

  This means that A or B hit a bus error, but there is no data
  telling which one actually hit it. So C should notify both A
  and B of the error, and clear H's status before starting its
  own I/O.

  With only two devices it becomes simpler: if one device finds
  a bridge error while the other is checked in, the error can
  only belong to the other device.
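
The scenario above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct bridge, struct cookie, the fixed-size checked_in array and iochk_clear_sim are hypothetical stand-ins for the kernel structures, the list and the locking used in the real patch.

```c
#include <stdbool.h>

#define MAX_COOKIES 8

/* Hypothetical stand-in for a host bridge with a sticky error status. */
struct bridge {
	bool error_status;
};

/* Hypothetical stand-in for iocookie. */
struct cookie {
	struct bridge *host;
	int error;
};

/* Cookies that have checked in and not yet checked out. */
struct cookie *checked_in[MAX_COOKIES];
int nr_checked_in;

/* Flag every checked-in cookie that sits under the given bridge. */
void notify_bridge_error(struct bridge *b)
{
	for (int i = 0; i < nr_checked_in; i++)
		if (checked_in[i]->host == b)
			checked_in[i]->error = 1;
}

/*
 * Check in: if the bridge already shows an error, it belongs to
 * someone who checked in earlier, so notify them and clear the
 * status before this cookie starts its own I/O.
 */
void iochk_clear_sim(struct cookie *c, struct bridge *host)
{
	c->host = host;
	c->error = 0;

	if (host->error_status) {
		notify_bridge_error(host);
		host->error_status = false;
	}
	if (nr_checked_in < MAX_COOKIES)
		checked_in[nr_checked_in++] = c;
}
```

Checking in A and B, setting the bridge error, and then checking in C leaves A and B flagged and C clean, with the bridge status cleared.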

Well, the work concerning registers (devices and bridges) is
almost in shape. So, from the next patch, I'll move on to the
deeper phase of implementing more arch-specific code... see next
(6 of 10).

Changes from the previous version for 2.6.11.11:
  - (none)

Signed-off-by: Hidetoshi Seto <[EMAIL PROTECTED]>

---

 arch/ia64/lib/iomap_check.c |   45 
 1 files changed, 45 insertions(+)

Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
===
--- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c
+++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
@@ -17,6 +17,9 @@ DEFINE_SPINLOCK(iochk_lock);  /* all work
 static struct pci_dev *search_host_bridge(struct pci_dev *dev);
 static int have_error(struct pci_dev *dev);

+void notify_bridge_error(struct pci_dev *bridge);
+void clear_bridge_error(struct pci_dev *bridge);
+
 void iochk_init(void)
 {
 	/* setup */
@@ -33,6 +36,11 @@ void iochk_clear(iocookie *cookie, struc
 	cookie->host = search_host_bridge(dev);

 	spin_lock_irqsave(&iochk_lock, flag);
+	if (cookie->host && have_error(cookie->host)) {
+		/* someone under my bridge causes error... */
+		notify_bridge_error(cookie->host);
+		clear_bridge_error(cookie->host);
+	}
 	list_add(&cookie->list, &iochk_devices);
 	spin_unlock_irqrestore(&iochk_lock, flag);

@@ -95,5 +103,42 @@ static int have_error(struct pci_dev *de
 	return 0;
 }

+void notify_bridge_error(struct pci_dev *bridge)
+{
+	iocookie *cookie;
+
+	if (list_empty(&iochk_devices))
+		return;
+
+	/* notify error to all transactions using this host bridge */
+	if (bridge) {
+		/* local notify, ex. Parity, Abort etc. */
+		list_for_each_entry(cookie, &iochk_devices, list) {
+			if (cookie->host == bridge)
+				cookie->error = 1;
+		}
+	}
+}
+
+void clear_bridge_error(struct pci_dev *bridge)
+{
+	u16 status = ( PCI_STATUS_REC_TARGET_ABORT
+		     | PCI_STATUS_REC_MASTER_ABORT
+		     | PCI_STATUS_DETECTED_PARITY );
+
+	/* clear bridge status */
+	switch (bridge->hdr_type) {
+		case PCI_HEADER_TYPE_NORMAL: /* 0 */
+			pci_write_config_word(bridge, PCI_STATUS, status);
+			break;
+		case PCI_HEADER_TYPE_BRIDGE: /* 1 */
+			pci_write_config_word(bridge, PCI_SEC_STATUS, status);
+			break;
+		case PCI_HEADER_TYPE_CARDBUS: /* 2 */
+		default:
+			BUG();
+	}
+}
+
 EXPORT_SYMBOL(iochk_read);
 EXPORT_SYMBOL(iochk_clear);


