Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Eric W. Biederman
"Williams, Mitch A" <[EMAIL PROTECTED]> writes:

> Chuck Ebbert wrote:
>>Are you going to post one for 2.6.20 as well? Some people might be
>>interested...
>
> The first time I posted this patch, Greg KH indicated that he thought
> it was too intrusive to add to -stable, especially considering that
> our MSI-X capable hardware isn't in the field yet.

There is MSI-X capable hardware supported by the kernel in 2.6.20.

The last version of your patch with the code removed from the set_affinity
path is not very intrusive at all, as it just touches msi.c

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Eric W. Biederman
"Williams, Mitch A" <[EMAIL PROTECTED]> writes:

> Chuck Ebbert wrote:
>>Are you going to post one for 2.6.20 as well? Some people might be
>>interested...
>
> The first time I posted this patch, Greg KH indicated that he thought
> it was too intrusive to add to -stable, especially considering that
> our MSI-X capable hardware isn't in the field yet.

There is MSI-X capable hardware supported by the kernel in 2.6.20.

The last version of your patch with the code removed from the set_affinity
path is not very intrusive at all, as it just touches msi.c

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Chuck Ebbert
Williams, Mitch A wrote:
> Chuck Ebbert wrote:
>> Are you going to post one for 2.6.20 as well? Some people might be
>> interested...
> 
> The first time I posted this patch, Greg KH indicated that he thought
> it was too intrusive to add to -stable, especially considering that
> our MSI-X capable hardware isn't in the field yet.
> 
> So the answer to your question is, "probably not".

So it's not useful in 2.6.20 for anything that's available today, from
any vendor? We can put it in Fedora if you think it is.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Williams, Mitch A
Chuck Ebbert wrote:
>Are you going to post one for 2.6.20 as well? Some people might be
>interested...

The first time I posted this patch, Greg KH indicated that he thought
it was too intrusive to add to -stable, especially considering that
our MSI-X capable hardware isn't in the field yet.

So the answer to your question is, "probably not".
-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Williams, Mitch A
Eric W. Biederman wrote:

>Do we still need the flush the set affinity routines?
>Shouldn't flush in mask and unmask should now be enough?

Yeah, I think you're right.  I've removed that call, and 
we're running some basic validation on the change.  I'll
post a new patch tomorrow AM.
-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Randy Dunlap
On Thu, 29 Mar 2007 13:25:34 -0400 Chuck Ebbert wrote:

> 
> Are you going to post one for 2.6.20 as well? Some people might be
> interested...

Please don't drop cc:s.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Chuck Ebbert
Mitch Williams wrote:
> This patch fixes a kernel bug which is triggered when using the
> irqbalance daemon with MSI-X hardware.
> 
> Because both MSI-X interrupt messages and MSI-X table writes are posted,
> it's possible for them to cross while in-flight.  This results in
> interrupts being received long after the kernel thinks they're disabled,
> and in interrupts being sent to stale vectors after rebalancing.
> 
> This patch performs a read flush after writes to the MSI-X table for
> mask/unmask and rebalancing operations.
> 
> This patch has been validated with (unreleased) network hardware which
> uses MSI-X.
> 
> Revised with input from Eric Biederman.
> 
> Signed-off-by: Mitch Williams <[EMAIL PROTECTED]>
> 

Are you going to post one for 2.6.20 as well? Some people might be
interested...


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Eric W. Biederman
Mitch Williams <[EMAIL PROTECTED]> writes:

> This patch fixes a kernel bug which is triggered when using the
> irqbalance daemon with MSI-X hardware.
>
> Because both MSI-X interrupt messages and MSI-X table writes are posted,
> it's possible for them to cross while in-flight.  This results in
> interrupts being received long after the kernel thinks they're disabled,
> and in interrupts being sent to stale vectors after rebalancing.
>
> This patch performs a read flush after writes to the MSI-X table for
> mask/unmask and rebalancing operations.
>
> This patch has been validated with (unreleased) network hardware which
> uses MSI-X.
>
> Revised with input from Eric Biederman.

Do we still need the flush the set affinity routines?
Shouldn't flush in mask and unmask should now be enough?


>
> Signed-off-by: Mitch Williams <[EMAIL PROTECTED]>
>
> diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/i386/kernel/io_apic.c
> linux-2.6.21-rc5/arch/i386/kernel/io_apic.c
> --- linux-2.6.21-rc5-clean/arch/i386/kernel/io_apic.c 2007-03-28
> 10:05:22.0 -0700
> +++ linux-2.6.21-rc5/arch/i386/kernel/io_apic.c 2007-03-28 10:19:14.0
> -0700
> @@ -2584,6 +2584,7 @@ static void set_msi_irq_affinity(unsigne
>   msg.address_lo |= MSI_ADDR_DEST_ID(dest);
>  
>   write_msi_msg(irq, &msg);
> + msix_flush_writes(irq);
>   irq_desc[irq].affinity = mask;
>  }
>  #endif /* CONFIG_SMP */
> diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/ia64/kernel/msi_ia64.c
> linux-2.6.21-rc5/arch/ia64/kernel/msi_ia64.c
> --- linux-2.6.21-rc5-clean/arch/ia64/kernel/msi_ia64.c 2007-03-28
> 10:05:22.0 -0700
> +++ linux-2.6.21-rc5/arch/ia64/kernel/msi_ia64.c 2007-03-28 09:34:27.0
> -0700
> @@ -60,6 +60,7 @@ static void ia64_set_msi_irq_affinity(un
>   msg.address_lo = addr;
>  
>   write_msi_msg(irq, &msg);
> + msix_flush_writes(irq);
>   irq_desc[irq].affinity = cpu_mask;
>  }
>  #endif /* CONFIG_SMP */
> diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/ia64/sn/kernel/msi_sn.c
> linux-2.6.21-rc5/arch/ia64/sn/kernel/msi_sn.c
> --- linux-2.6.21-rc5-clean/arch/ia64/sn/kernel/msi_sn.c 2007-03-28
> 10:05:22.0 -0700
> +++ linux-2.6.21-rc5/arch/ia64/sn/kernel/msi_sn.c 2007-03-28 
> 09:34:06.0
> -0700
> @@ -204,6 +204,7 @@ static void sn_set_msi_irq_affinity(unsi
>   msg.address_lo = (u32)(bus_addr & 0x);
>  
>   write_msi_msg(irq, &msg);
> + msix_flush_writes(irq);
>   irq_desc[irq].affinity = cpu_mask;
>  }
>  #endif /* CONFIG_SMP */
> diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/x86_64/kernel/io_apic.c
> linux-2.6.21-rc5/arch/x86_64/kernel/io_apic.c
> --- linux-2.6.21-rc5-clean/arch/x86_64/kernel/io_apic.c 2007-03-28
> 10:05:22.0 -0700
> +++ linux-2.6.21-rc5/arch/x86_64/kernel/io_apic.c 2007-03-28 
> 10:18:52.0
> -0700
> @@ -1956,6 +1956,7 @@ static void set_msi_irq_affinity(unsigne
>   msg.address_lo |= MSI_ADDR_DEST_ID(dest);
>  
>   write_msi_msg(irq, &msg);
> + msix_flush_writes(irq);
>   irq_desc[irq].affinity = mask;
>  }
>  #endif /* CONFIG_SMP */
> diff -urpN -X dontdiff linux-2.6.21-rc5-clean/drivers/pci/msi.c
> linux-2.6.21-rc5/drivers/pci/msi.c
> --- linux-2.6.21-rc5-clean/drivers/pci/msi.c 2007-03-28 10:05:24.0 
> -0700
> +++ linux-2.6.21-rc5/drivers/pci/msi.c2007-03-28 09:21:34.0 
> -0700
> @@ -68,6 +68,29 @@ static void msix_set_enable(struct pci_d
>   }
>  }
>  
> +void msix_flush_writes(unsigned int irq)
> +{
> + struct msi_desc *entry;
> +
> + entry = get_irq_msi(irq);
> + BUG_ON(!entry || !entry->dev);
> + switch (entry->msi_attrib.type) {
> + case PCI_CAP_ID_MSI:
> + /* nothing to do */
> + break;
> + case PCI_CAP_ID_MSIX:
> + {
> + int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
> + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
> + readl(entry->mask_base + offset);
> + break;
> + }
> + default:
> + BUG();
> + break;
> + }
> +}
> +
>  static void msi_set_mask_bit(unsigned int irq, int flag)
>  {
>   struct msi_desc *entry;
> @@ -186,11 +209,13 @@ void write_msi_msg(unsigned int irq, str
>  void mask_msi_irq(unsigned int irq)
>  {
>   msi_set_mask_bit(irq, 1);
> + msix_flush_writes(irq);
>  }
>  
>  void unmask_msi_irq(unsigned int irq)
>  {
>   msi_set_mask_bit(irq, 0);
> + msix_flush_writes(irq);
>  }
>  
>  static int msi_free_irq(struct pci_dev* dev, int irq);
> diff -urpN -X dontdiff linux-2.6.21-rc5-clean/include/linux/msi.h
> linux-2.6.21-rc5/include/linux/msi.h
> --- linux-2.6.21-rc5-clean/include/linux/msi.h 2007-03-28 10:05:25.0
> -0700
> +++ linux-2.6.21-rc5/include/linux/msi.h 2007-03-28 09:21:51.0 -0700
> @@ -12,6 +12,7 @@ extern void mask_msi_irq(unsigned int ir
>  extern void unmask_msi_irq(unsigned int irq);
>  extern void read_msi_msg(unsigned int irq, struct 

[PATCH 2.6.21-rc5] MSI: read-flush MSI-X table (rev 2)

2007-03-29 Thread Mitch Williams
This patch fixes a kernel bug which is triggered when using the
irqbalance daemon with MSI-X hardware.

Because both MSI-X interrupt messages and MSI-X table writes are posted,
it's possible for them to cross while in-flight.  This results in
interrupts being received long after the kernel thinks they're disabled,
and in interrupts being sent to stale vectors after rebalancing.

This patch performs a read flush after writes to the MSI-X table for
mask/unmask and rebalancing operations.

This patch has been validated with (unreleased) network hardware which
uses MSI-X.

Revised with input from Eric Biederman.

Signed-off-by: Mitch Williams <[EMAIL PROTECTED]>

diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/i386/kernel/io_apic.c 
linux-2.6.21-rc5/arch/i386/kernel/io_apic.c
--- linux-2.6.21-rc5-clean/arch/i386/kernel/io_apic.c   2007-03-28 
10:05:22.0 -0700
+++ linux-2.6.21-rc5/arch/i386/kernel/io_apic.c 2007-03-28 10:19:14.0 
-0700
@@ -2584,6 +2584,7 @@ static void set_msi_irq_affinity(unsigne
msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
write_msi_msg(irq, &msg);
+   msix_flush_writes(irq);
irq_desc[irq].affinity = mask;
 }
 #endif /* CONFIG_SMP */
diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/ia64/kernel/msi_ia64.c 
linux-2.6.21-rc5/arch/ia64/kernel/msi_ia64.c
--- linux-2.6.21-rc5-clean/arch/ia64/kernel/msi_ia64.c  2007-03-28 
10:05:22.0 -0700
+++ linux-2.6.21-rc5/arch/ia64/kernel/msi_ia64.c2007-03-28 
09:34:27.0 -0700
@@ -60,6 +60,7 @@ static void ia64_set_msi_irq_affinity(un
msg.address_lo = addr;
 
write_msi_msg(irq, &msg);
+   msix_flush_writes(irq);
irq_desc[irq].affinity = cpu_mask;
 }
 #endif /* CONFIG_SMP */
diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/ia64/sn/kernel/msi_sn.c 
linux-2.6.21-rc5/arch/ia64/sn/kernel/msi_sn.c
--- linux-2.6.21-rc5-clean/arch/ia64/sn/kernel/msi_sn.c 2007-03-28 
10:05:22.0 -0700
+++ linux-2.6.21-rc5/arch/ia64/sn/kernel/msi_sn.c   2007-03-28 
09:34:06.0 -0700
@@ -204,6 +204,7 @@ static void sn_set_msi_irq_affinity(unsi
msg.address_lo = (u32)(bus_addr & 0x);
 
write_msi_msg(irq, &msg);
+   msix_flush_writes(irq);
irq_desc[irq].affinity = cpu_mask;
 }
 #endif /* CONFIG_SMP */
diff -urpN -X dontdiff linux-2.6.21-rc5-clean/arch/x86_64/kernel/io_apic.c 
linux-2.6.21-rc5/arch/x86_64/kernel/io_apic.c
--- linux-2.6.21-rc5-clean/arch/x86_64/kernel/io_apic.c 2007-03-28 
10:05:22.0 -0700
+++ linux-2.6.21-rc5/arch/x86_64/kernel/io_apic.c   2007-03-28 
10:18:52.0 -0700
@@ -1956,6 +1956,7 @@ static void set_msi_irq_affinity(unsigne
msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
write_msi_msg(irq, &msg);
+   msix_flush_writes(irq);
irq_desc[irq].affinity = mask;
 }
 #endif /* CONFIG_SMP */
diff -urpN -X dontdiff linux-2.6.21-rc5-clean/drivers/pci/msi.c 
linux-2.6.21-rc5/drivers/pci/msi.c
--- linux-2.6.21-rc5-clean/drivers/pci/msi.c2007-03-28 10:05:24.0 
-0700
+++ linux-2.6.21-rc5/drivers/pci/msi.c  2007-03-28 09:21:34.0 -0700
@@ -68,6 +68,29 @@ static void msix_set_enable(struct pci_d
}
 }
 
+void msix_flush_writes(unsigned int irq)
+{
+   struct msi_desc *entry;
+
+   entry = get_irq_msi(irq);
+   BUG_ON(!entry || !entry->dev);
+   switch (entry->msi_attrib.type) {
+   case PCI_CAP_ID_MSI:
+   /* nothing to do */
+   break;
+   case PCI_CAP_ID_MSIX:
+   {
+   int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+   PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
+   readl(entry->mask_base + offset);
+   break;
+   }
+   default:
+   BUG();
+   break;
+   }
+}
+
 static void msi_set_mask_bit(unsigned int irq, int flag)
 {
struct msi_desc *entry;
@@ -186,11 +209,13 @@ void write_msi_msg(unsigned int irq, str
 void mask_msi_irq(unsigned int irq)
 {
msi_set_mask_bit(irq, 1);
+   msix_flush_writes(irq);
 }
 
 void unmask_msi_irq(unsigned int irq)
 {
msi_set_mask_bit(irq, 0);
+   msix_flush_writes(irq);
 }
 
 static int msi_free_irq(struct pci_dev* dev, int irq);
diff -urpN -X dontdiff linux-2.6.21-rc5-clean/include/linux/msi.h 
linux-2.6.21-rc5/include/linux/msi.h
--- linux-2.6.21-rc5-clean/include/linux/msi.h  2007-03-28 10:05:25.0 
-0700
+++ linux-2.6.21-rc5/include/linux/msi.h2007-03-28 09:21:51.0 
-0700
@@ -12,6 +12,7 @@ extern void mask_msi_irq(unsigned int ir
 extern void unmask_msi_irq(unsigned int irq);
 extern void read_msi_msg(unsigned int irq, struct msi_msg *msg);
 extern void write_msi_msg(unsigned int irq, struct msi_msg *msg);
+extern void msix_flush_writes(unsigned int irq);
 
 struct msi_desc {
struct {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More major

RE: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-28 Thread Williams, Mitch A
Eric W. Biederman wrote:
>The practical question in my book is do we set the enable/disable
>methods to the same functions as the mask/unmask methods or
>do we let them default to the crazy delayed disable scenario.
>
>Given that we do have a tiny race where we need to ensure the
>MSI is disabled before we unregister it, we don't know of any
>MSI implementation problems that will result in a screaming IRQ.
>I would say set enable/disable to the mask/unmask methods.
>

OK, that's easy.  I'll whip up a patch today, test it overnight,
and post it tomorrow.

Thanks for looking at this.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Eric W. Biederman
"Williams, Mitch A" <[EMAIL PROTECTED]> writes:

> Doh!  I was reading the code wrong.  We only mask if we're still
> handling a previous interrupt on the same vector.  My bad.
>
> However, I can't really see where mask() is used outside of that
> instance.  Which then leads us back to the question:  do we need
> a read flush on mask/unmask or just enable/disable?

I'm not even certain we need the read flush in the enable.
However having it in there makes the code easier to reason
about.  Which is a big plus.

Generally if the interrupt controller hardware is sane 
mask/unmask and enable/disable should be the same function.

If we need to work around something in the hardware enable/disable
should do that and mask/unmask should poke the hardware.

Since MSI is specified as properly handle pending interrupts
I would put the write flush in mask.  It makes the code easier
to understand and comprehend.

The practical question in my book is do we set the enable/disable
methods to the same functions as the mask/unmask methods or
do we let them default to the crazy delayed disable scenario.

Given that we do have a tiny race where we need to ensure the
MSI is disabled before we unregister it, we don't know of any
MSI implementation problems that will result in a screaming IRQ.
I would say set enable/disable to the mask/unmask methods.

This will fix the tiny freeing bug mentioned above, and not play
games with drivers that are using MSI irqs.

If at some point we need a lesser form someone can change the msi
enable/disable methods to something else.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Williams, Mitch A
Eric W. Biederman wrote:

>> However the mask function is called at EVERY interrupt,
>> so this change would be VERY expensive.
>
>If true I think that would be bad.  However I don't see it.
>Where in handle_edge_irq do we mask the interrupt?
>The only place I see us calling ->mask is from move_native_irq
>and that only if IRQ_MOVE_PENDING is true.
>
>All I can see is us routinely calling is ack_APIC_edge.

Doh!  I was reading the code wrong.  We only mask if we're still
handling a previous interrupt on the same vector.  My bad.

However, I can't really see where mask() is used outside of that
instance.  Which then leads us back to the question:  do we need
a read flush on mask/unmask or just enable/disable?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Eric W. Biederman
"Williams, Mitch A" <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote: 
>>
>>> Because enabling and disabling the MSI interrupt is done through
>>> config space, and config space writes are not posted.  So we won't
>>> see the problem that we do with MSI-X.
>>
>>Normally this is true.  However when we have  memory mapped pci config
>>space the writes could very easily be posted.  Even if there aren't
>>any Intel platforms that do it today other architectures might.
>
>>From what I read in the spec, configuration writes are not ever posted.
> They are a completely separate transaction type from memory writes, and
> only memory writes are posted.  So there's no need to read-flush these.

You may have a point there.  It isn't important at this point, and
it is easy enough to fix at this point.

>>The more I think about this the more I think we should make mask and
>>unmask do the read and set enable/disable to mask/unmask in the msi
>>case.
>>
>>I think being a little more paranoid will result in slightly simpler
>>more maintainable code, and not measurably affect performance.  I
>>don't expect you are migrating irqs in a fast path.
>
> Migrating IRQs happens in the fast path, but infrequently (every 10 
> seconds or so).  

Migrating IRQS gets called from the irq handler.  So I guess in that
sense it qualifies as a fast path operation.   The actual irq migration
itself is time consuming even if the individual operations aren't.  In
large part because ioapics don't appear to obey pci ordering rules so
flushing pending irqs is black magic :(

The comparative low frequency is important.

I guess what I meant is as soon as the test to for IRQ_MOVE_PENDING
is true we need to migrate an irq so we branch off of the fast path,
and irq handling path, where we do what we have to make things
work and the cost is what it takes to be reliable.

> However the mask function is called at EVERY interrupt,
> so this change would be VERY expensive.

If true I think that would be bad.  However I don't see it.
Where in handle_edge_irq do we mask the interrupt?
The only place I see us calling ->mask is from move_native_irq
and that only if IRQ_MOVE_PENDING is true.

All I can see is us routinely calling is ack_APIC_edge.

> If the driver really needs to disable the interrupt, then it can call
> irq_disable().

Agreed.  However the driver should not have any access to the mask method
in any shape or form.  That is an internal irq implementation helper.

>  Otherwise, mask (as-is) should be fine -- it masks
> the interrupt at the APIC, and the device's interrupt moderation
> should take care of keeping it from generating interrupts right away.

Please point at code because you are reading the code quite differently
from myself, and I think I have been over that code enough times to see
how it operates.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Williams, Mitch A
Eric W. Biederman wrote: 
>
>> Because enabling and disabling the MSI interrupt is done through
>> config space, and config space writes are not posted.  So we won't
>> see the problem that we do with MSI-X.
>
>Normally this is true.  However when we have  memory mapped pci config
>space the writes could very easily be posted.  Even if there aren't
>any Intel platforms that do it today other architectures might.

>From what I read in the spec, configuration writes are not ever posted.
They are a completely separate transaction type from memory writes, and
only memory writes are posted.  So there's no need to read-flush these.

>The more I think about this the more I think we should make mask and
>unmask do the read and set enable/disable to mask/unmask in the msi
>case.
>
>I think being a little more paranoid will result in slightly simpler
>more maintainable code, and not measurably affect performance.  I
>don't expect you are migrating irqs in a fast path.

Migrating IRQs happens in the fast path, but infrequently (every 10 
seconds or so).  However the mask function is called at EVERY interrupt,
so this change would be VERY expensive.

If the driver really needs to disable the interrupt, then it can call
irq_disable().  Otherwise, mask (as-is) should be fine -- it masks
the interrupt at the APIC, and the device's interrupt moderation
should take care of keeping it from generating interrupts right away.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Eric W. Biederman
"Williams, Mitch A" <[EMAIL PROTECTED]> writes:

> Grant Grundler wrote: 
>
>>Why wouldn't MSI have the same problems as MSI-X?
>>
>
> Because enabling and disabling the MSI interrupt is done through
> config space, and config space writes are not posted.  So we won't
> see the problem that we do with MSI-X.

Normally this is true.  However when we have  memory mapped pci config
space the writes could very easily be posted.  Even if there aren't
any Intel platforms that do it today other architectures might.

The more I think about this the more I think we should make mask and
unmask do the read and set enable/disable to mask/unmask in the msi
case.

Ugh.  Looking closer the function should really be called msi_flush
or msi_flush_writes not msix_flush_writes.

William do you think you can fix any of that up?
That is name the function msi_flush_writes and call it from
mask_msi_irq/unmask_msi_irq.

I think being a little more paranoid will result in slightly simpler
more maintainable code, and not measurably affect performance.  I
don't expect you are migrating irqs in a fast path.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Williams, Mitch A
Grant Grundler wrote: 

>Why wouldn't MSI have the same problems as MSI-X?
>

Because enabling and disabling the MSI interrupt is done through
config space, and config space writes are not posted.  So we won't
see the problem that we do with MSI-X.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Grant Grundler
On Mon, Mar 26, 2007 at 04:18:22PM -0700, Mitch Williams wrote:
> This patch fixes a kernel bug which is triggered when using the
> irqbalance daemon with MSI-X hardware.
> 
> Because both MSI-X interrupt messages and MSI-X table writes are posted,
> it's possible for them to cross while in-flight.  This results in
> interrupts being received long after the kernel thinks they're disabled,
> and in interrupts being sent to stale vectors after rebalancing.
>
> This patch performs a read flush after writes to the MSI-X table for
> enable/disable and rebalancing operations.

Why wouldn't MSI have the same problems as MSI-X?

...
> diff -urpN -X dontdiff linux-2.6.21-rc4-clean/drivers/pci/msi.c 
> linux-2.6.21-rc4/drivers/pci/msi.c
> --- linux-2.6.21-rc4-clean/drivers/pci/msi.c  2007-03-19 16:16:32.0 
> -0700
> +++ linux-2.6.21-rc4/drivers/pci/msi.c2007-03-21 12:44:51.0 
> -0700
> @@ -68,6 +68,29 @@ static void msix_set_enable(struct pci_d
>   }
>  }
>  
> +void msix_flush_writes(unsigned int irq)
> +{
> + struct msi_desc *entry;
> +
> + entry = get_irq_msi(irq);
> + BUG_ON(!entry || !entry->dev);
> + switch (entry->msi_attrib.type) {
> + case PCI_CAP_ID_MSI:
> + /* nothing to do */
> + break;
> + case PCI_CAP_ID_MSIX:
> + {
> + int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
> + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
> + readl(entry->mask_base + offset);
> + break;
> + }
> + default:
> + BUG();
> + break;
> + }
> +}

PCI_CAP_ID_MSI case seems wrong to me.
I was expecting a readl() in that case as well.

thanks,
grant
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Eric W. Biederman
Grant Grundler <[EMAIL PROTECTED]> writes:

> On Tue, Mar 27, 2007 at 07:23:16AM -0600, Eric W. Biederman wrote:
>> I guess I should add that I'm not certain that the code is exactly correct
>> there are weird differences between enable/disable and mask.
>
> My understanding was "enable" would clear (or ignore) pending interrupts
> and "unmask" would deliver pending interrupts. Disable and mask could
> in many implementations be the same thing as long as the enable/unmask
> difference was supported.

enable/disable are what are available to drivers.  When the call
enable_irq or  disable_irq.  Frequently we have code to make up
for deficiencies in the hardware irq controller implementations
here.

mask/unmask are helper functions only used by the internals of the
irq implementation, and actually are required to touch the hardware.

The problem on x86 is that an ioapic will drop interrupts that come
in while it is masked.Thus we have to play software games not
to loose pending interrupts (i.e. leave the irq enabled until we
get a pending interrupt).

So I think you had the general drift although you had a couple of
details confused.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Grant Grundler
On Tue, Mar 27, 2007 at 07:23:16AM -0600, Eric W. Biederman wrote:
> I guess I should add that I'm not certain that the code is exactly correct
> there are weird differences between enable/disable and mask.

My understanding was "enable" would clear (or ignore) pending interrupts
and "unmask" would deliver pending interrupts. Disable and mask could
in many implementations be the same thing as long as the enable/unmask
difference was supported.

thanks,
grnat

>   Where generally
> the mask/unmask methods do the work and enable/disable do some weird software
> thing.  Having them different and enable/disable not doing some software
> thing concerns me a little.  I think mask/unmask may been overoptimized
> in this case.
> 
> So I expect someone will wind up refactor this code at some point.
> 
> However the code is clearly better than what we have now, and it can't
> affect anything else.
> 
> Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-27 Thread Eric W. Biederman
Greg KH <[EMAIL PROTECTED]> writes:

> On Mon, Mar 26, 2007 at 04:18:22PM -0700, Mitch Williams wrote:
>> This patch fixes a kernel bug which is triggered when using the
>> irqbalance daemon with MSI-X hardware.
>> 
>> Because both MSI-X interrupt messages and MSI-X table writes are posted,
>> it's possible for them to cross while in-flight.  This results in
>> interrupts being received long after the kernel thinks they're disabled,
>> and in interrupts being sent to stale vectors after rebalancing.
>> 
>> This patch performs a read flush after writes to the MSI-X table for
>> enable/disable and rebalancing operations.  Because this is an expensive
>> operation, we do not perform the read flush after mask/unmask
>> operations.  Hardware which supports MSI-X typically also supports some
>> sort of interrupt moderation, so a read-flush is not necessary for
>> mask/unmask operations.
>> 
>> This patch has been validated with (unreleased) network hardware which
>> uses MSI-X.
>
> How well does this play with the MSI core changes that Michael Ellerman
> has proposed on the linux-pci mailing list?

I guess I should add that I'm not certain that the code is exactly correct
there are weird differences between enable/disable and mask.  Where generally
the mask/unmask methods do the work and enable/disable do some weird software
thing.  Having them different and enable/disable not doing some software
thing concerns me a little.  I think mask/unmask may been overoptimized
in this case.

So I expect someone will wind up refactor this code at some point.

However the code is clearly better than what we have now, and it can't
affect anything else.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-26 Thread Eric W. Biederman
Greg KH <[EMAIL PROTECTED]> writes:

> How well does this play with the MSI core changes that Michael Ellerman
> has proposed on the linux-pci mailing list?

The patch looks reasonable and it is independent of those changes.

This just modifies the helper code for using the msi capability itself as
an interrupt controller.   Other architectures that mask/unmask the interrupt
at an architecture specific interrupt controller  won't be affected.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-26 Thread Greg KH
On Mon, Mar 26, 2007 at 04:18:22PM -0700, Mitch Williams wrote:
> This patch fixes a kernel bug which is triggered when using the
> irqbalance daemon with MSI-X hardware.
> 
> Because both MSI-X interrupt messages and MSI-X table writes are posted,
> it's possible for them to cross while in-flight.  This results in
> interrupts being received long after the kernel thinks they're disabled,
> and in interrupts being sent to stale vectors after rebalancing.
> 
> This patch performs a read flush after writes to the MSI-X table for
> enable/disable and rebalancing operations.  Because this is an expensive
> operation, we do not perform the read flush after mask/unmask
> operations.  Hardware which supports MSI-X typically also supports some
> sort of interrupt moderation, so a read-flush is not necessary for
> mask/unmask operations.
> 
> This patch has been validated with (unreleased) network hardware which
> uses MSI-X.

How well does this play with the MSI core changes that Michael Ellerman
has proposed on the linux-pci mailing list?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2.6.21-rc5] MSI: read-flush MSI-X table

2007-03-26 Thread Mitch Williams
This patch fixes a kernel bug which is triggered when using the
irqbalance daemon with MSI-X hardware.

Because both MSI-X interrupt messages and MSI-X table writes are posted,
it's possible for them to cross while in-flight.  This results in
interrupts being received long after the kernel thinks they're disabled,
and in interrupts being sent to stale vectors after rebalancing.

This patch performs a read flush after writes to the MSI-X table for
enable/disable and rebalancing operations.  Because this is an expensive
operation, we do not perform the read flush after mask/unmask
operations.  Hardware which supports MSI-X typically also supports some
sort of interrupt moderation, so a read-flush is not necessary for
mask/unmask operations.

This patch has been validated with (unreleased) network hardware which
uses MSI-X.

Generated from 2.6.21-rc4; applies cleanly to 2.6.21-rc5.

Signed-off-by: Mitch Williams <[EMAIL PROTECTED]>

diff -urpN -X dontdiff linux-2.6.21-rc4-clean/arch/i386/kernel/io_apic.c 
linux-2.6.21-rc4/arch/i386/kernel/io_apic.c
--- linux-2.6.21-rc4-clean/arch/i386/kernel/io_apic.c   2007-03-19 
16:16:30.0 -0700
+++ linux-2.6.21-rc4/arch/i386/kernel/io_apic.c 2007-03-19 16:24:05.0 
-0700
@@ -2594,6 +2594,8 @@ static void set_msi_irq_affinity(unsigne
  */
 static struct irq_chip msi_chip = {
.name   = "PCI-MSI",
+   .enable = enable_msi_irq,
+   .disable= disable_msi_irq,
.unmask = unmask_msi_irq,
.mask   = mask_msi_irq,
.ack= ack_ioapic_irq,
diff -urpN -X dontdiff linux-2.6.21-rc4-clean/arch/ia64/kernel/msi_ia64.c 
linux-2.6.21-rc4/arch/ia64/kernel/msi_ia64.c
--- linux-2.6.21-rc4-clean/arch/ia64/kernel/msi_ia64.c  2007-03-19 
16:16:30.0 -0700
+++ linux-2.6.21-rc4/arch/ia64/kernel/msi_ia64.c2007-03-19 
16:24:05.0 -0700
@@ -121,6 +121,8 @@ static int ia64_msi_retrigger_irq(unsign
  */
 static struct irq_chip ia64_msi_chip = {
.name   = "PCI-MSI",
+   .enable = enable_msi_irq,
+   .disable= disable_msi_irq,
.mask   = mask_msi_irq,
.unmask = unmask_msi_irq,
.ack= ia64_ack_msi_irq,
diff -urpN -X dontdiff linux-2.6.21-rc4-clean/arch/ia64/sn/kernel/msi_sn.c 
linux-2.6.21-rc4/arch/ia64/sn/kernel/msi_sn.c
--- linux-2.6.21-rc4-clean/arch/ia64/sn/kernel/msi_sn.c 2007-03-19 
16:16:30.0 -0700
+++ linux-2.6.21-rc4/arch/ia64/sn/kernel/msi_sn.c   2007-03-19 
16:24:05.0 -0700
@@ -224,6 +224,8 @@ static int sn_msi_retrigger_irq(unsigned
 
 static struct irq_chip sn_msi_chip = {
.name   = "PCI-MSI",
+   .enable = enable_msi_irq,
+   .disable= disable_msi_irq,
.mask   = mask_msi_irq,
.unmask = unmask_msi_irq,
.ack= sn_ack_msi_irq,
diff -urpN -X dontdiff linux-2.6.21-rc4-clean/arch/x86_64/kernel/io_apic.c 
linux-2.6.21-rc4/arch/x86_64/kernel/io_apic.c
--- linux-2.6.21-rc4-clean/arch/x86_64/kernel/io_apic.c 2007-03-19 
16:16:31.0 -0700
+++ linux-2.6.21-rc4/arch/x86_64/kernel/io_apic.c   2007-03-21 
12:44:51.0 -0700
@@ -1942,6 +1942,7 @@ static void set_msi_irq_affinity(unsigne
if (cpus_empty(tmp))
return;
 
+   msix_flush_writes(irq);
if (assign_irq_vector(irq, mask))
return;
 
@@ -1956,6 +1957,7 @@ static void set_msi_irq_affinity(unsigne
msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
write_msi_msg(irq, &msg);
+   msix_flush_writes(irq);
irq_desc[irq].affinity = mask;
 }
 #endif /* CONFIG_SMP */
@@ -1966,6 +1968,8 @@ static void set_msi_irq_affinity(unsigne
  */
 static struct irq_chip msi_chip = {
.name   = "PCI-MSI",
+   .enable = enable_msi_irq,
+   .disable= disable_msi_irq,
.unmask = unmask_msi_irq,
.mask   = mask_msi_irq,
.ack= ack_apic_edge,
diff -urpN -X dontdiff linux-2.6.21-rc4-clean/drivers/pci/msi.c 
linux-2.6.21-rc4/drivers/pci/msi.c
--- linux-2.6.21-rc4-clean/drivers/pci/msi.c2007-03-19 16:16:32.0 
-0700
+++ linux-2.6.21-rc4/drivers/pci/msi.c  2007-03-21 12:44:51.0 -0700
@@ -68,6 +68,29 @@ static void msix_set_enable(struct pci_d
}
 }
 
+void msix_flush_writes(unsigned int irq)
+{
+   struct msi_desc *entry;
+
+   entry = get_irq_msi(irq);
+   BUG_ON(!entry || !entry->dev);
+   switch (entry->msi_attrib.type) {
+   case PCI_CAP_ID_MSI:
+   /* nothing to do */
+   break;
+   case PCI_CAP_ID_MSIX:
+   {
+   int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+   PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
+   readl(entry->mask_base + offset);
+   break;
+   }
+   default:
+   BUG();
+   break;
+   }
+}
+
 static