Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-03 Thread Hidetoshi Seto

Matthew Wilcox wrote:

> On Thu, Sep 01, 2005 at 05:45:54PM -0500, Brent Casavant wrote:
> > I am extremely concerned about the performance implications of this
> > implementation.  These changes have several deleterious effects on I/O
> > performance.
> 
> I agree.  I think the iochk patches should be abandoned and the feature
> reimplemented on top of the asynchronous PCI error notification patches
> from Linas Vepstas.


Do you mean that all architectures, especially those other than PPC64,
already have enough synchronization points or enough infrastructure to
invoke those notifiers effectively?  I don't think so.

As far as I know, PPC64 is in a favorable situation.
At least for readX and inX on PPC64, there is already an error check
point, and the EEH technology and the firmware make error recovery
easier.
Because the PPC64 firmware (or hardware? - I'm not sure) automatically
detects errors, isolates the erroneous device and returns "(type)~0" to
the kernel, readX/inX doesn't need to do anything expensive unless it
gets "(type)~0".
Therefore PPC64 has a good opportunity to invoke notifiers by extending
the code at that error check point.
It is clear that doing the same thing on other architectures would be
quite painful and expensive.
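
To illustrate the idea (only a sketch of that check-point concept, not
the actual PPC64/EEH code; handle_possible_io_error() is a hypothetical
slow-path hook):

    /*
     * The fast path pays only a compare against the all-ones pattern
     * that the firmware substitutes after isolating a failed device;
     * the expensive notification work stays off the fast path.
     * (A legitimate 0xff value would also take the slow path, which
     * then has to check whether the device really failed.)
     */
    static inline unsigned char checked_readb(const volatile void __iomem *addr)
    {
            unsigned char val = *(const volatile unsigned char __iomem *)addr;

            if (unlikely(val == (unsigned char)~0))
                    handle_possible_io_error(addr);  /* hypothetical */
            return val;
    }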

Don't you think it would be useful if the error notifier could be
invoked from iochk_read()?  I believe the iochk patches will help
implement PCI error notification not only on IA64 but also on i386 and
so on.
Or do you mean we would all be happy if we each had a PPC64 box?
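
What I have in mind is roughly the following (a sketch only;
pci_error_chain and the event code are assumptions, not existing
symbols):

    /*
     * Sketch: extending iochk_read() from the iochk patch so that a
     * detected error also fires a notifier chain that RAS-aware code
     * can subscribe to.
     */
    int iochk_read(iocookie *cookie)
    {
            int ret = 0;

            /* ... the existing error check under iochk_lock sets ret ... */
            if (ret)
                    notifier_call_chain(&pci_error_chain, 0, cookie->dev);
            return ret;
    }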


Thanks,
H.Seto



Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-02 Thread Grant Grundler
On Fri, Sep 02, 2005 at 11:16:10AM -0700, david mosberger wrote:
> > Sorry - I think this is BS.
> > 
> > Please run mmio_test on your box and share the results.
> > mmio_test is available here:
> > svn co http://svn.gnumonks.org/trunk/mmio_test/
> 
> Reads are slow, sure, but writes are not (or should not).

Sure, MMIO writes are generally posted.  But those aren't always "free".
At some point an MMIO read will typically flush those writes and thus
stall until two (or more) PCI bus transactions complete.
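
In driver terms the pattern looks like this (a generic sketch; the
register names are made up):

    /*
     * The write is posted and returns almost immediately; the read-back
     * cannot complete until the bridge has flushed the write, so the
     * CPU stalls for the full round trip.
     */
    writel(START_DMA, regs + REG_CTRL);   /* posted, nearly free */
    (void) readl(regs + REG_CTRL);        /* flushes the write, stalls */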

ISTR locking around MMIO writes was necessary to enforce
synchronization of the error with the driver.  Was that wrong?

Complicating the MMIO perf picture are fabrics connecting the NUMA cells
which do NOT enforce MMIO ordering (e.g. Altix).
In that case, arch code will sometimes need to enforce the write ordering
by flushing MMIO writes before a driver releases a spinlock or other
synchronization primitive.  This was discussed before and is archived in
the dialog between Jesse Barnes and myself in late 2004 (IIRC).
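
That is the case mmiowb() was introduced for; the usual pattern is
roughly this (a sketch, with a hypothetical device register):

    /*
     * On fabrics that don't order MMIO between nodes, a CPU acquiring
     * the lock later could have its writes reach the device first.
     * mmiowb() pushes the posted writes out before the lock is
     * released.
     */
    spin_lock_irqsave(&dev->lock, flags);
    writel(val, dev->regs + REG_DATA);
    mmiowb();
    spin_unlock_irqrestore(&dev->lock, flags);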

In any case, mmio_test currently only tests MMIO read perf.
I need to think about how we might also test MMIO write perf,
i.e. how much more expensive an MMIO read is when it follows an MMIO
write.
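
One crude way to measure that (a sketch only, not part of mmio_test;
it assumes addr points at a harmless scratch register and ignores
cycle-counter serialization details):

    /*
     * Time a plain MMIO read vs. a read that follows a posted write;
     * the difference approximates the cost of flushing the write.
     */
    cycles_t t0, plain, combined;

    t0 = get_cycles();
    (void) readl(addr);                   /* plain read */
    plain = get_cycles() - t0;

    t0 = get_cycles();
    writel(0, addr);                      /* posted write ... */
    (void) readl(addr);                   /* ... the read that flushes it */
    combined = get_cycles() - t0;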


grant


Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-02 Thread Matthew Wilcox
On Thu, Sep 01, 2005 at 05:45:54PM -0500, Brent Casavant wrote:
> I am extremely concerned about the performance implications of this
> implementation.  These changes have several deleterious effects on I/O
> performance.

I agree.  I think the iochk patches should be abandoned and the feature
reimplemented on top of the asynchronous PCI error notification patches
from Linas Vepstas.


Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-02 Thread david mosberger
On 9/2/05, Grant Grundler <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 01, 2005 at 05:45:54PM -0500, Brent Casavant wrote:
> ...
> > The first is serialization of all I/O reads and writes.  This will
> > be a severe problem on systems with large numbers of PCI buses, the
> > very type of system that stands the most to gain in reliability from
> > these efforts.  At a minimum any locking should be done on a per-bus
> > basis.
> 
> The lock could be per "error domain" - that would require some
> arch specific support though to define the scope of the "error domain".

I do not think the basic inX/outX and readX/writeX operations should
involve spinlocks.  That would be really nasty if an MCA/INIT handler
had to call them, for example...
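
A concrete way to see the problem, assuming the rwlock from the posted
patch:

    /*
     * Sketch of a self-deadlock: iochk_clear() takes the write lock
     * with interrupts disabled, but an MCA is not maskable and can
     * still be delivered on this CPU.  If the MCA handler then does
     * port I/O through the locked accessors, it spins on a lock that
     * its own CPU already holds.
     */
    write_lock_irqsave(&iochk_lock, flags);
    /* <-- MCA delivered here; its handler calls inb(), which does
     *     read_lock_irqsave(&iochk_lock, ...) on this CPU -> deadlock */
    write_unlock_irqrestore(&iochk_lock, flags);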

> > The second is the raw performance penalty from acquiring and dropping
> > a lock with every read and write.  This will be a substantial amount
> > of activity for any I/O-intensive system, heck even for moderate I/O
> > levels.
> 
> Sorry - I think this is BS.
> 
> Please run mmio_test on your box and share the results.
> mmio_test is available here:
> svn co http://svn.gnumonks.org/trunk/mmio_test/

Reads are slow, sure, but writes are not (or should not).

  --david
-- 
Mosberger Consulting LLC, voice/fax: 510-744-9372,
http://www.mosberger-consulting.com/
35706 Runckel Lane, Fremont, CA 94536


Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-02 Thread Grant Grundler
On Thu, Sep 01, 2005 at 05:45:54PM -0500, Brent Casavant wrote:
...
> The first is serialization of all I/O reads and writes.  This will
> be a severe problem on systems with large numbers of PCI buses, the
> very type of system that stands the most to gain in reliability from
> these efforts.  At a minimum any locking should be done on a per-bus
> basis.

The lock could be per "error domain" - though that would require some
arch-specific support to define the scope of the "error domain".
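
Something like the following could express that (purely illustrative;
iochk_domain_of() is an assumed arch-specific hook, not an existing
interface):

    /*
     * One lock per "error domain" (e.g. per host bridge) instead of
     * one global rwlock; accessors then only contend with I/O in the
     * same domain.
     */
    struct iochk_domain {
            rwlock_t         lock;
            struct list_head devices;
    };

    static unsigned int domain_checked_inb(struct pci_dev *dev,
                                           unsigned long port)
    {
            struct iochk_domain *d = iochk_domain_of(dev);  /* assumed */
            unsigned long flags;
            unsigned int ret;

            read_lock_irqsave(&d->lock, flags);
            ret = inb(port);
            read_unlock_irqrestore(&d->lock, flags);
            return ret;
    }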

> The second is the raw performance penalty from acquiring and dropping
> a lock with every read and write.  This will be a substantial amount
> of activity for any I/O-intensive system, heck even for moderate I/O
> levels.

Sorry - I think this is BS.

Please run mmio_test on your box and share the results.
mmio_test is available here:
svn co http://svn.gnumonks.org/trunk/mmio_test/

Then we can talk about the cost of spinlocks vs cost of MMIO access.

thanks,
grant


Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-02 Thread Hidetoshi Seto

Thank you for your comment, Brent.

Brent Casavant wrote:

> On Thu, 1 Sep 2005, Hidetoshi Seto wrote:
> 
> > static inline unsigned int
> > ___ia64_inb (unsigned long port)
> > {
> >   volatile unsigned char *addr = __ia64_mk_io_addr(port);
> >   unsigned char ret;
> > + unsigned long flags;
> > 
> > + read_lock_irqsave(&iochk_lock,flags);
> >   ret = *addr;
> >   __ia64_mf_a();
> > + ia64_mca_barrier(ret);
> > + read_unlock_irqrestore(&iochk_lock,flags);
> > +
> >   return ret;
> > }
> 
> I am extremely concerned about the performance implications of this
> implementation.  These changes have several deleterious effects on I/O
> performance.


There is always a trade-off between safety and performance.
However, I know these are interfaces for the paranoid.

It would help if I divided this patch into two parts, an MCA-related
part and a CPE-related part.  I'd appreciate it if you could see why I
wrote such a crazy rwlock in the latter part, and if you could help me
with your insight.  I'll divide them; please wait for my next mails.


> The first is serialization of all I/O reads and writes.  This will
> be a severe problem on systems with large numbers of PCI buses, the
> very type of system that stands the most to gain in reliability from
> these efforts.  At a minimum any locking should be done on a per-bus
> basis.


Yes, there is room for improvement in the lock granularity.
Maybe it should be done not per-bus but per-host-bridge, I think.


> The second is the raw performance penalty from acquiring and dropping
> a lock with every read and write.  This will be a substantial amount
> of activity for any I/O-intensive system, heck even for moderate I/O
> levels.


Yes, but probably some of the paranoid will accept such unpleasantness
without complaining...  One could equally complain about the
performance of RAID-5 disks.


> The third is lock contention for this single lock -- I would fully expect
> many dozens of processors to be performing I/O at any given time on
> systems of interest, causing this to be a heavily contended lock.
> This will be even more severe on NUMA systems, as the lock cacheline
> bounces across the memory fabric.  A per-bus lock would again be much
> more appropriate.


Yes.  This implementation (at least the rwlock) wouldn't fit such
monster boxes.  However, the goal of this interface definitely lies in
that high-end world, so these improvements should be made in the near
future.


> The final problem is that these performance penalties are paid even
> by drivers which are not IOCHK aware, which for the time being is
> all of them.  A reasonable solution must pay these penalties only
> for drivers which are IOCHK aware.  Reinstating the readX_check()
> interface is the obvious solution here.  It's simply too heavy a
> performance burden to pay when almost no drivers currently benefit
> from it.


Mixing aware and non-aware drivers would make both unhappy.
At least we need to make an effort to separate them and place them
under different host bridges.
* readX_check(): That interface was kicked out by Linus.


> Otherwise, I also wonder if you have any plans to handle similar
> errors experienced under device-initiated DMA, or asynchronous I/O.
> It's not clear that there's sufficient infrastructure in the current
> patches to adequately handle those situations.
> 
> Thank you,
> Brent Casavant


All such improvements are possible.

Requiring data integrity for device-initiated DMA or asynchronous I/O
isn't a wrong thing.  But I don't think that all errors should be
handled by one infrastructure.  There is another approach to PCI error
handling, the asynchronous recovery that Linas Vepstas (IBM) is working
on, so maybe both will be required to handle the various situations.

Thanks,
H.Seto



Re: [PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-09-01 Thread Brent Casavant
On Thu, 1 Sep 2005, Hidetoshi Seto wrote:

> Index: linux-2.6.13/include/asm-ia64/io.h
> ===
> --- linux-2.6.13.orig/include/asm-ia64/io.h
> +++ linux-2.6.13/include/asm-ia64/io.h
> @@ -70,6 +70,26 @@ extern unsigned int num_io_spaces;
>  #include 
>  #include 
>  #include 
> +
> +#ifdef CONFIG_IOMAP_CHECK
> +#include 
> +#include 
> +
> +/* ia64 iocookie */
> +typedef struct {
> + struct list_headlist;
> + struct pci_dev  *dev;   /* target device */
> + struct pci_dev  *host;  /* hosting bridge */
> + unsigned long   error;  /* error flag */
> +} iocookie;
> +
> +extern rwlock_t iochk_lock;  /* see arch/ia64/lib/iomap_check.c */
> +
> +/* Enable ia64 iochk - See arch/ia64/lib/iomap_check.c */
> +#define HAVE_ARCH_IOMAP_CHECK
> +
> +#endif /* CONFIG_IOMAP_CHECK  */
> +
>  #include 
> 
>  /*
> @@ -164,14 +184,23 @@ __ia64_mk_io_addr (unsigned long port)
>   * during optimization, which is why we use "volatile" pointers.
>   */
> 
> +#ifdef CONFIG_IOMAP_CHECK
> +
> +extern void ia64_mca_barrier(unsigned long value);
> +
>  static inline unsigned int
>  ___ia64_inb (unsigned long port)
>  {
>   volatile unsigned char *addr = __ia64_mk_io_addr(port);
>   unsigned char ret;
> + unsigned long flags;
> 
> + read_lock_irqsave(&iochk_lock,flags);
>   ret = *addr;
>   __ia64_mf_a();
> + ia64_mca_barrier(ret);
> + read_unlock_irqrestore(&iochk_lock,flags);
> +
>   return ret;
>  }
> 
> @@ -180,9 +209,14 @@ ___ia64_inw (unsigned long port)
>  {
>   volatile unsigned short *addr = __ia64_mk_io_addr(port);
>   unsigned short ret;
> + unsigned long flags;
> 
> + read_lock_irqsave(&iochk_lock,flags);
>   ret = *addr;
>   __ia64_mf_a();
> + ia64_mca_barrier(ret);
> + read_unlock_irqrestore(&iochk_lock,flags);
> +
>   return ret;
>  }
> 
> @@ -191,12 +225,54 @@ ___ia64_inl (unsigned long port)
>  {
>   volatile unsigned int *addr = __ia64_mk_io_addr(port);
>   unsigned int ret;
> + unsigned long flags;
> 
> + read_lock_irqsave(&iochk_lock,flags);
>   ret = *addr;
>   __ia64_mf_a();
> + ia64_mca_barrier(ret);
> + read_unlock_irqrestore(&iochk_lock,flags);
> +
>   return ret;
>  }

I am extremely concerned about the performance implications of this
implementation.  These changes have several deleterious effects on I/O
performance.

The first is serialization of all I/O reads and writes.  This will
be a severe problem on systems with large numbers of PCI buses, the
very type of system that stands the most to gain in reliability from
these efforts.  At a minimum any locking should be done on a per-bus
basis.

The second is the raw performance penalty from acquiring and dropping
a lock with every read and write.  This will be a substantial amount
of activity for any I/O-intensive system, heck even for moderate I/O
levels.

The third is lock contention for this single lock -- I would fully expect
many dozens of processors to be performing I/O at any given time on
systems of interest, causing this to be a heavily contended lock.
This will be even more severe on NUMA systems, as the lock cacheline
bounces across the memory fabric.  A per-bus lock would again be much
more appropriate.

The final problem is that these performance penalties are paid even
by drivers which are not IOCHK aware, which for the time being is
all of them.  A reasonable solution must pay these penalties only
for drivers which are IOCHK aware.  Reinstating the readX_check()
interface is the obvious solution here.  It's simply too heavy a
performance burden to pay when almost no drivers currently benefit
from it.
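
To illustrate the shape of such an opt-in interface (a reconstruction
for discussion, not the exact readX_check() that was posted before;
arch_io_error_pending() is an assumed arch hook):

    /*
     * Only drivers that opt in pay the checking cost; plain readl()
     * keeps its fast path.
     */
    static inline unsigned int readl_check(void __iomem *addr, int *err)
    {
            unsigned int val = readl(addr);

            *err = arch_io_error_pending(addr);
            return val;
    }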

Otherwise, I also wonder if you have any plans to handle similar
errors experienced under device-initiated DMA, or asynchronous I/O.
It's not clear that there's sufficient infrastructure in the current
patches to adequately handle those situations.

Thank you,
Brent Casavant

-- 
Brent Casavant  If you had nothing to fear,
[EMAIL PROTECTED]how then could you be brave?
Silicon Graphics, Inc.-- Queen Dama, Source Wars


[PATCH 2.6.13] IOCHK interface for I/O error handling/detecting (for ia64)

2005-08-31 Thread Hidetoshi Seto

This patch implements ia64-specific IOCHK interfaces that enable
PCI drivers to detect errors and make their error handling easier.

Please refer to the archives if needed, e.g. http://lwn.net/Articles/139240/
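
For reference, a RAS-aware driver would use these interfaces roughly as
follows (a sketch based on the interfaces in this patch; the BAR
mapping, register offset and recovery action are made up):

    iocookie cookie;
    unsigned int val;

    iochk_clear(&cookie, pdev);         /* begin a checked I/O section */
    val = readl(bar + SOME_REG);        /* hypothetical register */
    if (iochk_read(&cookie)) {
            /* an error was seen under this device or its host bridge;
             * val is untrustworthy - retry or reset the device */
    }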

Thanks,
H.Seto

Signed-off-by: Hidetoshi Seto <[EMAIL PROTECTED]>

---
 arch/ia64/Kconfig   |   13 +++
 arch/ia64/kernel/mca.c  |   34 
 arch/ia64/kernel/mca_asm.S  |   32 
 arch/ia64/kernel/mca_drv.c  |   85 ++
 arch/ia64/lib/Makefile  |1
 arch/ia64/lib/iomap_check.c |  168 
 include/asm-ia64/io.h   |  139 
 7 files changed, 472 insertions(+)

Index: linux-2.6.13/arch/ia64/lib/Makefile
===
--- linux-2.6.13.orig/arch/ia64/lib/Makefile
+++ linux-2.6.13/arch/ia64/lib/Makefile
@@ -16,6 +16,7 @@ lib-$(CONFIG_MCKINLEY)+= copy_page_mck.
 lib-$(CONFIG_PERFMON)  += carta_random.o
 lib-$(CONFIG_MD_RAID5) += xor.o
 lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o
+lib-$(CONFIG_IOMAP_CHECK) += iomap_check.o

 AFLAGS___divdi3.o  =
 AFLAGS___udivdi3.o = -DUNSIGNED
Index: linux-2.6.13/arch/ia64/Kconfig
===
--- linux-2.6.13.orig/arch/ia64/Kconfig
+++ linux-2.6.13/arch/ia64/Kconfig
@@ -399,6 +399,19 @@ config PCI_DOMAINS
bool
default PCI

+config IOMAP_CHECK
+   bool "Support iochk interfaces for IO error detection."
+   depends on PCI && EXPERIMENTAL
+   ---help---
+ Saying Y provides the iochk infrastructure for "RAS-aware" drivers
+ to detect and recover from some IO errors, which is strongly required
+ by some highly reliable systems.
+ The implementation of this infrastructure is highly dependent on the
+ arch, bus system, chipset and so on.
+ Currently, very few drivers or architectures implement this support.
+
+ If you don't know what to do here, say N.
+
 source "drivers/pci/Kconfig"

 source "drivers/pci/hotplug/Kconfig"
Index: linux-2.6.13/arch/ia64/lib/iomap_check.c
===
--- /dev/null
+++ linux-2.6.13/arch/ia64/lib/iomap_check.c
@@ -0,0 +1,168 @@
+/*
+ * File:iomap_check.c
+ * Purpose: Implement the IA64 specific iomap recovery interfaces
+ */
+
+#include 
+#include 
+#include 
+
+void iochk_init(void);
+void iochk_clear(iocookie *cookie, struct pci_dev *dev);
+int  iochk_read(iocookie *cookie);
+
+struct list_head iochk_devices;
+DEFINE_RWLOCK(iochk_lock); /* all iochk work is serialized by this lock */
+
+static struct pci_dev *search_host_bridge(struct pci_dev *dev);
+static int have_error(struct pci_dev *dev);
+
+void notify_bridge_error(struct pci_dev *bridge);
+void clear_bridge_error(struct pci_dev *bridge);
+void save_bridge_error(void);
+
+void iochk_init(void)
+{
+   /* setup */
+   INIT_LIST_HEAD(&iochk_devices);
+}
+
+void iochk_clear(iocookie *cookie, struct pci_dev *dev)
+{
+   unsigned long flag;
+
+   INIT_LIST_HEAD(&(cookie->list));
+
+   cookie->dev = dev;
+   cookie->host = search_host_bridge(dev);
+
+   write_lock_irqsave(&iochk_lock, flag);
+   if (cookie->host && have_error(cookie->host)) {
+   /* someone under this bridge caused an error... */
+   notify_bridge_error(cookie->host);
+   clear_bridge_error(cookie->host);
+   }
+   list_add(&cookie->list, &iochk_devices);
+   write_unlock_irqrestore(&iochk_lock, flag);
+
+   cookie->error = 0;
+}
+
+int iochk_read(iocookie *cookie)
+{
+   unsigned long flag;
+   int ret = 0;
+
+   write_lock_irqsave(&iochk_lock, flag);
+   if ( cookie->error || have_error(cookie->dev)
+   || (cookie->host && have_error(cookie->host)) )
+   ret = 1;
+   list_del(&cookie->list);
+   write_unlock_irqrestore(&iochk_lock, flag);
+
+   return ret;
+}
+
+struct pci_dev *search_host_bridge(struct pci_dev *dev)
+{
+   struct pci_bus *pbus;
+
+   /* there is no bridge */
+   if (!dev->bus->self)
+   return NULL;
+
+   /* find root bus bridge */
+   for (pbus = dev->bus; pbus->parent && pbus->parent->self;
+   pbus = pbus->parent);
+
+   return pbus->self;
+}
+
+static int have_error(struct pci_dev *dev)
+{
+   u16 status;
+
+   /* check status */
+   switch (dev->hdr_type) {
+   case PCI_HEADER_TYPE_NORMAL: /* 0 */
+   pci_read_config_word(dev, PCI_STATUS, &status);
+   break;
+   case PCI_HEADER_TYPE_BRIDGE: /* 1 */
+   pci_read_config_word(dev, PCI_SEC_STATUS, &status);
+   break;
+   case PCI_HEADER_TYPE_CARDBUS: /* 2 */
+   return 0; /* FIX ME */
+   default:
+   BUG();
+   }
+
+   if ( (status & PCI_STATUS_REC_TARGET_ABORT)
+   || (status & PCI_STATUS_REC_MA