Re: MSI interrupts and disable_irq

2007-10-17 Thread Manfred Spraul

Yinghai Lu wrote:


Correct, but the overall point was that MSI-X conceptually conflicts
with the existing "lockless" disable_irq() schedule, which was written
when there was a one-one relationship between irq, PCI device, and work
to be done.



at this point, nic in mcp55 is using msi or INTx.

  

Correct.
For msi-x, the driver does three disable_irq() calls to the correct 
vectors. ugly, but nevertheless correct.
The bug only affected msi: The driver did disable_irq(num>) instead of disable_irq().
The patch that I've attached to the bugzilla report 9047 seems to fix 
the crash, thus I would propose to apply it to 2.6.23 and 2.6.24. I'll 
send a seperate mail.


All other problem [reduce code duplication, less ugly locking, ...] 
should be fixed with independant patches.


--
   Manfred



YH
  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-17 Thread Manfred Spraul

Yinghai Lu wrote:


Correct, but the overall point was that MSI-X conceptually conflicts
with the existing lockless disable_irq() schedule, which was written
when there was a one-one relationship between irq, PCI device, and work
to be done.



at this point, nic in mcp55 is using msi or INTx.

  

Correct.
For msi-x, the driver does three disable_irq() calls to the correct 
vectors. ugly, but nevertheless correct.
The bug only affected msi: The driver did disable_irq(old INTx irq 
num) instead of disable_irq(new msi-x irq num).
The patch that I've attached to the bugzilla report 9047 seems to fix 
the crash, thus I would propose to apply it to 2.6.23 and 2.6.24. I'll 
send a seperate mail.


All other problem [reduce code duplication, less ugly locking, ...] 
should be fixed with independant patches.


--
   Manfred



YH
  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Jeff Garzik

Yinghai Lu wrote:

On 10/16/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:

Yinghai Lu wrote:

On 10/15/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:

Manfred Spraul wrote:

Jeff Garzik wrote:

I think the scenario you outline is an illustration of the approach's
fragility:  disable_irq() is a heavy hammer that originated with INTx,
and it relies on a chip-specific disable method (kernel/irq/manage.c)
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e.
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
handler] if IRQ_DISABLE is not set.
OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev->irq? The irq is requested with
   request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
and disabled with
   disable_irq_lockdep(dev->irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev->irq = pci_dev->irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev->irq? Then we would disable the
wrong interrupt

Remember, fundamentally MSI-X is a one-to-many relationship, when you
consider a single PCI device might have multiple vectors.

msi-x is using other entry

   if (np->msi_flags & NV_MSI_X_ENABLED)

enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

Correct, but the overall point was that MSI-X conceptually conflicts
with the existing "lockless" disable_irq() schedule, which was written
when there was a one-one relationship between irq, PCI device, and work
to be done.


Can I use your new driver with RHEL 5 or RHEL 5.1?


Not without modification, since it depends on the napi_struct work 
currently in torvalds/linux-2.6.git.


But I am currently rewriting the fe-lock yet again, and most of those 
changes can be applied to pre-napi_struct forcedeth.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Yinghai Lu
On 10/16/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Yinghai Lu wrote:
> > On 10/15/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> >> Manfred Spraul wrote:
> >>> Jeff Garzik wrote:
>  I think the scenario you outline is an illustration of the approach's
>  fragility:  disable_irq() is a heavy hammer that originated with INTx,
>  and it relies on a chip-specific disable method (kernel/irq/manage.c)
>  that practically guarantees behavior will vary across MSI/INTx/etc.
> 
> >>> I checked the code: IRQ_DISABLE is implemented in software, i.e.
> >>> handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
> >>> handler] if IRQ_DISABLE is not set.
> >>> OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an 
> >>> irq.
> >>>
> >>> Perhaps something corrupts dev->irq? The irq is requested with
> >>>request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
> >>> and disabled with
> >>>disable_irq_lockdep(dev->irq);
> >>>
> >>> Someone around with a MSI capable board? The forcedeth driver does
> >>>dev->irq = pci_dev->irq
> >>> in nv_probe(), especially before pci_enable_msi().
> >>> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> >>> wrong interrupt
> >> Remember, fundamentally MSI-X is a one-to-many relationship, when you
> >> consider a single PCI device might have multiple vectors.
> >
> > msi-x is using other entry
> >
> >if (np->msi_flags & NV_MSI_X_ENABLED)
> >
> > enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
>
> Correct, but the overall point was that MSI-X conceptually conflicts
> with the existing "lockless" disable_irq() schedule, which was written
> when there was a one-one relationship between irq, PCI device, and work
> to be done.

at this point, nic in mcp55 is using msi or INTx.

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Yinghai Lu
On 10/16/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Yinghai Lu wrote:
> > On 10/15/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> >> Manfred Spraul wrote:
> >>> Jeff Garzik wrote:
>  I think the scenario you outline is an illustration of the approach's
>  fragility:  disable_irq() is a heavy hammer that originated with INTx,
>  and it relies on a chip-specific disable method (kernel/irq/manage.c)
>  that practically guarantees behavior will vary across MSI/INTx/etc.
> 
> >>> I checked the code: IRQ_DISABLE is implemented in software, i.e.
> >>> handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
> >>> handler] if IRQ_DISABLE is not set.
> >>> OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an 
> >>> irq.
> >>>
> >>> Perhaps something corrupts dev->irq? The irq is requested with
> >>>request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
> >>> and disabled with
> >>>disable_irq_lockdep(dev->irq);
> >>>
> >>> Someone around with a MSI capable board? The forcedeth driver does
> >>>dev->irq = pci_dev->irq
> >>> in nv_probe(), especially before pci_enable_msi().
> >>> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> >>> wrong interrupt
> >> Remember, fundamentally MSI-X is a one-to-many relationship, when you
> >> consider a single PCI device might have multiple vectors.
> >
> > msi-x is using other entry
> >
> >if (np->msi_flags & NV_MSI_X_ENABLED)
> >
> > enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
>
> Correct, but the overall point was that MSI-X conceptually conflicts
> with the existing "lockless" disable_irq() schedule, which was written
> when there was a one-one relationship between irq, PCI device, and work
> to be done.

Can I use your new driver with RHEL 5 or RHEL 5.1?

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Jeff Garzik

Yinghai Lu wrote:

On 10/15/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:

Manfred Spraul wrote:

Jeff Garzik wrote:

I think the scenario you outline is an illustration of the approach's
fragility:  disable_irq() is a heavy hammer that originated with INTx,
and it relies on a chip-specific disable method (kernel/irq/manage.c)
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e.
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
handler] if IRQ_DISABLE is not set.
OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev->irq? The irq is requested with
   request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
and disabled with
   disable_irq_lockdep(dev->irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev->irq = pci_dev->irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev->irq? Then we would disable the
wrong interrupt

Remember, fundamentally MSI-X is a one-to-many relationship, when you
consider a single PCI device might have multiple vectors.


msi-x is using other entry

   if (np->msi_flags & NV_MSI_X_ENABLED)

enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);


Correct, but the overall point was that MSI-X conceptually conflicts 
with the existing "lockless" disable_irq() schedule, which was written 
when there was a one-one relationship between irq, PCI device, and work 
to be done.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Yinghai Lu
On 10/15/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Manfred Spraul wrote:
> > Jeff Garzik wrote:
> >>
> >> I think the scenario you outline is an illustration of the approach's
> >> fragility:  disable_irq() is a heavy hammer that originated with INTx,
> >> and it relies on a chip-specific disable method (kernel/irq/manage.c)
> >> that practically guarantees behavior will vary across MSI/INTx/etc.
> >>
> > I checked the code: IRQ_DISABLE is implemented in software, i.e.
> > handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
> > handler] if IRQ_DISABLE is not set.
> > OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.
> >
> > Perhaps something corrupts dev->irq? The irq is requested with
> >request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
> > and disabled with
> >disable_irq_lockdep(dev->irq);
> >
> > Someone around with a MSI capable board? The forcedeth driver does
> >dev->irq = pci_dev->irq
> > in nv_probe(), especially before pci_enable_msi().
> > Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> > wrong interrupt
>
> Remember, fundamentally MSI-X is a one-to-many relationship, when you
> consider a single PCI device might have multiple vectors.

msi-x is using other entry

   if (np->msi_flags & NV_MSI_X_ENABLED)

enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Yinghai Lu
On 10/15/07, Jeff Garzik [EMAIL PROTECTED] wrote:
 Manfred Spraul wrote:
  Jeff Garzik wrote:
 
  I think the scenario you outline is an illustration of the approach's
  fragility:  disable_irq() is a heavy hammer that originated with INTx,
  and it relies on a chip-specific disable method (kernel/irq/manage.c)
  that practically guarantees behavior will vary across MSI/INTx/etc.
 
  I checked the code: IRQ_DISABLE is implemented in software, i.e.
  handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
  handler] if IRQ_DISABLE is not set.
  OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.
 
  Perhaps something corrupts dev-irq? The irq is requested with
 request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
  and disabled with
 disable_irq_lockdep(dev-irq);
 
  Someone around with a MSI capable board? The forcedeth driver does
 dev-irq = pci_dev-irq
  in nv_probe(), especially before pci_enable_msi().
  Does pci_enable_msi() change pci_dev-irq? Then we would disable the
  wrong interrupt

 Remember, fundamentally MSI-X is a one-to-many relationship, when you
 consider a single PCI device might have multiple vectors.

msi-x is using other entry

   if (np-msi_flags  NV_MSI_X_ENABLED)

enable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Jeff Garzik

Yinghai Lu wrote:

On 10/15/07, Jeff Garzik [EMAIL PROTECTED] wrote:

Manfred Spraul wrote:

Jeff Garzik wrote:

I think the scenario you outline is an illustration of the approach's
fragility:  disable_irq() is a heavy hammer that originated with INTx,
and it relies on a chip-specific disable method (kernel/irq/manage.c)
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e.
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
handler] if IRQ_DISABLE is not set.
OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev-irq? The irq is requested with
   request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
and disabled with
   disable_irq_lockdep(dev-irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the
wrong interrupt

Remember, fundamentally MSI-X is a one-to-many relationship, when you
consider a single PCI device might have multiple vectors.


msi-x is using other entry

   if (np-msi_flags  NV_MSI_X_ENABLED)

enable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);


Correct, but the overall point was that MSI-X conceptually conflicts 
with the existing lockless disable_irq() schedule, which was written 
when there was a one-one relationship between irq, PCI device, and work 
to be done.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Yinghai Lu
On 10/16/07, Jeff Garzik [EMAIL PROTECTED] wrote:
 Yinghai Lu wrote:
  On 10/15/07, Jeff Garzik [EMAIL PROTECTED] wrote:
  Manfred Spraul wrote:
  Jeff Garzik wrote:
  I think the scenario you outline is an illustration of the approach's
  fragility:  disable_irq() is a heavy hammer that originated with INTx,
  and it relies on a chip-specific disable method (kernel/irq/manage.c)
  that practically guarantees behavior will vary across MSI/INTx/etc.
 
  I checked the code: IRQ_DISABLE is implemented in software, i.e.
  handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
  handler] if IRQ_DISABLE is not set.
  OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an 
  irq.
 
  Perhaps something corrupts dev-irq? The irq is requested with
 request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
  and disabled with
 disable_irq_lockdep(dev-irq);
 
  Someone around with a MSI capable board? The forcedeth driver does
 dev-irq = pci_dev-irq
  in nv_probe(), especially before pci_enable_msi().
  Does pci_enable_msi() change pci_dev-irq? Then we would disable the
  wrong interrupt
  Remember, fundamentally MSI-X is a one-to-many relationship, when you
  consider a single PCI device might have multiple vectors.
 
  msi-x is using other entry
 
 if (np-msi_flags  NV_MSI_X_ENABLED)
 
  enable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

 Correct, but the overall point was that MSI-X conceptually conflicts
 with the existing lockless disable_irq() schedule, which was written
 when there was a one-one relationship between irq, PCI device, and work
 to be done.

Can I use your new driver with RHEL 5 or RHEL 5.1?

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Yinghai Lu
On 10/16/07, Jeff Garzik [EMAIL PROTECTED] wrote:
 Yinghai Lu wrote:
  On 10/15/07, Jeff Garzik [EMAIL PROTECTED] wrote:
  Manfred Spraul wrote:
  Jeff Garzik wrote:
  I think the scenario you outline is an illustration of the approach's
  fragility:  disable_irq() is a heavy hammer that originated with INTx,
  and it relies on a chip-specific disable method (kernel/irq/manage.c)
  that practically guarantees behavior will vary across MSI/INTx/etc.
 
  I checked the code: IRQ_DISABLE is implemented in software, i.e.
  handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
  handler] if IRQ_DISABLE is not set.
  OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an 
  irq.
 
  Perhaps something corrupts dev-irq? The irq is requested with
 request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
  and disabled with
 disable_irq_lockdep(dev-irq);
 
  Someone around with a MSI capable board? The forcedeth driver does
 dev-irq = pci_dev-irq
  in nv_probe(), especially before pci_enable_msi().
  Does pci_enable_msi() change pci_dev-irq? Then we would disable the
  wrong interrupt
  Remember, fundamentally MSI-X is a one-to-many relationship, when you
  consider a single PCI device might have multiple vectors.
 
  msi-x is using other entry
 
 if (np-msi_flags  NV_MSI_X_ENABLED)
 
  enable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

 Correct, but the overall point was that MSI-X conceptually conflicts
 with the existing lockless disable_irq() schedule, which was written
 when there was a one-one relationship between irq, PCI device, and work
 to be done.

at this point, nic in mcp55 is using msi or INTx.

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-16 Thread Jeff Garzik

Yinghai Lu wrote:

On 10/16/07, Jeff Garzik [EMAIL PROTECTED] wrote:

Yinghai Lu wrote:

On 10/15/07, Jeff Garzik [EMAIL PROTECTED] wrote:

Manfred Spraul wrote:

Jeff Garzik wrote:

I think the scenario you outline is an illustration of the approach's
fragility:  disable_irq() is a heavy hammer that originated with INTx,
and it relies on a chip-specific disable method (kernel/irq/manage.c)
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e.
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
handler] if IRQ_DISABLE is not set.
OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev-irq? The irq is requested with
   request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
and disabled with
   disable_irq_lockdep(dev-irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the
wrong interrupt

Remember, fundamentally MSI-X is a one-to-many relationship, when you
consider a single PCI device might have multiple vectors.

msi-x is using other entry

   if (np-msi_flags  NV_MSI_X_ENABLED)

enable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);

Correct, but the overall point was that MSI-X conceptually conflicts
with the existing lockless disable_irq() schedule, which was written
when there was a one-one relationship between irq, PCI device, and work
to be done.


Can I use your new driver with RHEL 5 or RHEL 5.1?


Not without modification, since it depends on the napi_struct work 
currently in torvalds/linux-2.6.git.


But I am currently rewriting the fe-lock yet again, and most of those 
changes can be applied to pre-napi_struct forcedeth.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-15 Thread Jeff Garzik

Manfred Spraul wrote:

Jeff Garzik wrote:


I think the scenario you outline is an illustration of the approach's 
fragility:  disable_irq() is a heavy hammer that originated with INTx, 
and it relies on a chip-specific disable method (kernel/irq/manage.c) 
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e. 
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq 
handler] if IRQ_DISABLE is not set.

OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev->irq? The irq is requested with
   request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
and disabled with
   disable_irq_lockdep(dev->irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev->irq = pci_dev->irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev->irq? Then we would disable the 
wrong interrupt


Remember, fundamentally MSI-X is a one-to-many relationship, when you 
consider a single PCI device might have multiple vectors.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-15 Thread Jeff Garzik

Manfred Spraul wrote:

Jeff Garzik wrote:


I think the scenario you outline is an illustration of the approach's 
fragility:  disable_irq() is a heavy hammer that originated with INTx, 
and it relies on a chip-specific disable method (kernel/irq/manage.c) 
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e. 
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq 
handler] if IRQ_DISABLE is not set.

OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev-irq? The irq is requested with
   request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
and disabled with
   disable_irq_lockdep(dev-irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the 
wrong interrupt


Remember, fundamentally MSI-X is a one-to-many relationship, when you 
consider a single PCI device might have multiple vectors.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Benjamin Herrenschmidt

On Sun, 2007-10-14 at 16:15 -0700, Yinghai Lu wrote:
> On 10/14/07, Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> >
> > On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
> > > Yinghai Lu wrote:
> > > > On 10/13/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Someone around with a MSI capable board? The forcedeth driver does
> > > >> dev->irq = pci_dev->irq
> > > >> in nv_probe(), especially before pci_enable_msi().
> > > >> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> > > >> wrong interrupt
> > > >>
> > > >
> > > > the request_irq==>setup_irq will make dev->irq = pci_dev->irq.
> > > >
> > > >
> > > Where is that?
> > > Otherwise I would propose the attached patch. My board is not
> > > MSI-capable, thus I can't test it myself.
> >
> > Why not just copy pcidev->irq to dev->irq once ?
> 
> it seems e1000 is using np->pci_dev->irq directly too.

Heh, allright, doesn't matter, I was just proposing to avoid one more
indirection :-)

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Yinghai Lu
On 10/14/07, Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
>
> On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
> > Yinghai Lu wrote:
> > > On 10/13/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
> > >
> > >> Someone around with a MSI capable board? The forcedeth driver does
> > >> dev->irq = pci_dev->irq
> > >> in nv_probe(), especially before pci_enable_msi().
> > >> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> > >> wrong interrupt
> > >>
> > >
> > > the request_irq==>setup_irq will make dev->irq = pci_dev->irq.
> > >
> > >
> > Where is that?
> > Otherwise I would propose the attached patch. My board is not
> > MSI-capable, thus I can't test it myself.
>
> Why not just copy pcidev->irq to dev->irq once ?

it seems e1000 is using np->pci_dev->irq directly too.

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Benjamin Herrenschmidt

On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
> Yinghai Lu wrote:
> > On 10/13/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
> >   
> >> Someone around with a MSI capable board? The forcedeth driver does
> >> dev->irq = pci_dev->irq
> >> in nv_probe(), especially before pci_enable_msi().
> >> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> >> wrong interrupt
> >> 
> >
> > the request_irq==>setup_irq will make dev->irq = pci_dev->irq.
> >
> >   
> Where is that?
> Otherwise I would propose the attached patch. My board is not 
> MSI-capable, thus I can't test it myself.

Why not just copy pcidev->irq to dev->irq once ?

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Yinghai Lu
On 10/14/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
> Yinghai Lu wrote:
> > On 10/13/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
> >
> >> Someone around with a MSI capable board? The forcedeth driver does
> >> dev->irq = pci_dev->irq
> >> in nv_probe(), especially before pci_enable_msi().
> >> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> >> wrong interrupt
> >>
> >
> > the request_irq==>setup_irq will make dev->irq = pci_dev->irq.
> >
> >
> Where is that?

in nv_request_irq
if ((ret = pci_enable_msi(np->pci_dev)) == 0) {
np->msi_flags |= NV_MSI_ENABLED;
if (request_irq(np->pci_dev->irq, handler,
IRQF_SHARED, dev->name, dev) != 0) {

in request_irq

int request_irq(unsigned int irq, irq_handler_t handler,
unsigned long irqflags, const char *devname, void *dev_id)
...
action->dev_id = dev_id;
...
retval = setup_irq(irq, action);

in setup_irq
int setup_irq(unsigned int irq, struct irqaction *new)

new->irq = irq;

it seems I missed that here new is irqaction instead of net_dev.

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Manfred Spraul

Yinghai Lu wrote:

On 10/13/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
  

Someone around with a MSI capable board? The forcedeth driver does
dev->irq = pci_dev->irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev->irq? Then we would disable the
wrong interrupt



the request_irq==>setup_irq will make dev->irq = pci_dev->irq.

  

Where is that?
Otherwise I would propose the attached patch. My board is not 
MSI-capable, thus I can't test it myself.


--
   Manfred
--- 2.6/drivers/net/forcedeth.c 2007-10-07 17:33:18.0 +0200
+++ build-2.6/drivers/net/forcedeth.c   2007-10-13 11:34:20.0 +0200
@@ -988,7 +988,7 @@
if (np->msi_flags & NV_MSI_X_ENABLED)
enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   enable_irq(dev->irq);
+   enable_irq(np->pci_dev->irq);
} else {
enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector);
@@ -1004,7 +1004,7 @@
if (np->msi_flags & NV_MSI_X_ENABLED)

disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   disable_irq(dev->irq);
+   disable_irq(np->pci_dev->irq);
} else {
disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector);
@@ -1601,7 +1601,7 @@
if (np->msi_flags & NV_MSI_X_ENABLED)

disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   disable_irq(dev->irq);
+   disable_irq(np->pci_dev->irq);
} else {
disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
}
@@ -1619,7 +1619,7 @@
if (np->msi_flags & NV_MSI_X_ENABLED)
enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   enable_irq(dev->irq);
+   enable_irq(np->pci_dev->irq);
} else {
enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
}
@@ -3557,10 +3557,12 @@
if (ret != 0 && np->msi_flags & NV_MSI_CAPABLE) {
if ((ret = pci_enable_msi(np->pci_dev)) == 0) {
np->msi_flags |= NV_MSI_ENABLED;
+   dev->irq = np->pci_dev->irq;
if (request_irq(np->pci_dev->irq, handler, IRQF_SHARED, 
dev->name, dev) != 0) {
printk(KERN_INFO "forcedeth: request_irq failed 
%d\n", ret);
pci_disable_msi(np->pci_dev);
np->msi_flags &= ~NV_MSI_ENABLED;
+   dev->irq = np->pci_dev->irq;
goto out_err;
}
 
@@ -3623,7 +3625,7 @@
if (np->msi_flags & NV_MSI_X_ENABLED)

disable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   disable_irq_lockdep(dev->irq);
+   disable_irq_lockdep(np->pci_dev->irq);
mask = np->irqmask;
} else {
if (np->nic_poll_irq & NVREG_IRQ_RX_ALL) {
@@ -3641,6 +3643,8 @@
}
np->nic_poll_irq = 0;
 
+   /* disable_irq() contains synchronize_irq, thus no irq handler can run 
now */
+
if (np->recover_error) {
np->recover_error = 0;
printk(KERN_INFO "forcedeth: MAC in recoverable error state\n");
@@ -3677,7 +3681,6 @@
}
}
 
-   /* FIXME: Do we need synchronize_irq(dev->irq) here? */
 
writel(mask, base + NvRegIrqMask);
pci_push(base);
@@ -3690,7 +3693,7 @@
if (np->msi_flags & NV_MSI_X_ENABLED)

enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   enable_irq_lockdep(dev->irq);
+   enable_irq_lockdep(np->pci_dev->irq);
} else {
if (np->nic_poll_irq & NVREG_IRQ_RX_ALL) {
nv_nic_irq_rx(0, dev);
@@ -4943,7 +4946,7 @@
np->in_shutdown = 1;
spin_unlock_irq(>lock);
netif_poll_disable(dev);
-   synchronize_irq(dev->irq);
+   synchronize_irq(np->pci_dev->irq);
 
del_timer_sync(>oom_kick);
del_timer_sync(>nic_poll);


Re: MSI interrupts and disable_irq

2007-10-14 Thread Yinghai Lu
On 10/13/07, Manfred Spraul <[EMAIL PROTECTED]> wrote:
> Jeff Garzik wrote:
> >
> > I think the scenario you outline is an illustration of the approach's
> > fragility:  disable_irq() is a heavy hammer that originated with INTx,
> > and it relies on a chip-specific disable method (kernel/irq/manage.c)
> > that practically guarantees behavior will vary across MSI/INTx/etc.
> >
> I checked the code: IRQ_DISABLE is implemented in software, i.e.
> handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
> handler] if IRQ_DISABLE is not set.
> OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.
>
> Perhaps something corrupts dev->irq? The irq is requested with
> request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
> and disabled with
> disable_irq_lockdep(dev->irq);
>
> Someone around with a MSI capable board? The forcedeth driver does
> dev->irq = pci_dev->irq
> in nv_probe(), especially before pci_enable_msi().
> Does pci_enable_msi() change pci_dev->irq? Then we would disable the
> wrong interrupt

the request_irq==>setup_irq will make dev->irq = pci_dev->irq.

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Yinghai Lu
On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:
 Jeff Garzik wrote:
 
  I think the scenario you outline is an illustration of the approach's
  fragility:  disable_irq() is a heavy hammer that originated with INTx,
  and it relies on a chip-specific disable method (kernel/irq/manage.c)
  that practically guarantees behavior will vary across MSI/INTx/etc.
 
 I checked the code: IRQ_DISABLE is implemented in software, i.e.
 handle_level_irq() only calls handle_IRQ_event() [and then the nic irq
 handler] if IRQ_DISABLE is not set.
 OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

 Perhaps something corrupts dev-irq? The irq is requested with
 request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
 and disabled with
 disable_irq_lockdep(dev-irq);

 Someone around with a MSI capable board? The forcedeth driver does
 dev-irq = pci_dev-irq
 in nv_probe(), especially before pci_enable_msi().
 Does pci_enable_msi() change pci_dev-irq? Then we would disable the
 wrong interrupt

the request_irq==setup_irq will make dev-irq = pci_dev-irq.

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Manfred Spraul

Yinghai Lu wrote:

On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:
  

Someone around with a MSI capable board? The forcedeth driver does
dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the
wrong interrupt



the request_irq==setup_irq will make dev-irq = pci_dev-irq.

  

Where is that?
Otherwise I would propose the attached patch. My board is not 
MSI-capable, thus I can't test it myself.


--
   Manfred
--- 2.6/drivers/net/forcedeth.c 2007-10-07 17:33:18.0 +0200
+++ build-2.6/drivers/net/forcedeth.c   2007-10-13 11:34:20.0 +0200
@@ -988,7 +988,7 @@
if (np-msi_flags  NV_MSI_X_ENABLED)
enable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   enable_irq(dev-irq);
+   enable_irq(np-pci_dev-irq);
} else {
enable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
enable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_TX].vector);
@@ -1004,7 +1004,7 @@
if (np-msi_flags  NV_MSI_X_ENABLED)

disable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   disable_irq(dev-irq);
+   disable_irq(np-pci_dev-irq);
} else {
disable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
disable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_TX].vector);
@@ -1601,7 +1601,7 @@
if (np-msi_flags  NV_MSI_X_ENABLED)

disable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   disable_irq(dev-irq);
+   disable_irq(np-pci_dev-irq);
} else {
disable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
}
@@ -1619,7 +1619,7 @@
if (np-msi_flags  NV_MSI_X_ENABLED)
enable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   enable_irq(dev-irq);
+   enable_irq(np-pci_dev-irq);
} else {
enable_irq(np-msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
}
@@ -3557,10 +3557,12 @@
if (ret != 0  np-msi_flags  NV_MSI_CAPABLE) {
if ((ret = pci_enable_msi(np-pci_dev)) == 0) {
np-msi_flags |= NV_MSI_ENABLED;
+   dev-irq = np-pci_dev-irq;
if (request_irq(np-pci_dev-irq, handler, IRQF_SHARED, 
dev-name, dev) != 0) {
printk(KERN_INFO forcedeth: request_irq failed 
%d\n, ret);
pci_disable_msi(np-pci_dev);
np-msi_flags = ~NV_MSI_ENABLED;
+   dev-irq = np-pci_dev-irq;
goto out_err;
}
 
@@ -3623,7 +3625,7 @@
if (np-msi_flags  NV_MSI_X_ENABLED)

disable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   disable_irq_lockdep(dev-irq);
+   disable_irq_lockdep(np-pci_dev-irq);
mask = np-irqmask;
} else {
if (np-nic_poll_irq  NVREG_IRQ_RX_ALL) {
@@ -3641,6 +3643,8 @@
}
np-nic_poll_irq = 0;
 
+   /* disable_irq() contains synchronize_irq, thus no irq handler can run 
now */
+
if (np-recover_error) {
np-recover_error = 0;
printk(KERN_INFO forcedeth: MAC in recoverable error state\n);
@@ -3677,7 +3681,6 @@
}
}
 
-   /* FIXME: Do we need synchronize_irq(dev-irq) here? */
 
writel(mask, base + NvRegIrqMask);
pci_push(base);
@@ -3690,7 +3693,7 @@
if (np-msi_flags  NV_MSI_X_ENABLED)

enable_irq_lockdep(np-msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
else
-   enable_irq_lockdep(dev-irq);
+   enable_irq_lockdep(np-pci_dev-irq);
} else {
if (np-nic_poll_irq  NVREG_IRQ_RX_ALL) {
nv_nic_irq_rx(0, dev);
@@ -4943,7 +4946,7 @@
np-in_shutdown = 1;
spin_unlock_irq(np-lock);
netif_poll_disable(dev);
-   synchronize_irq(dev-irq);
+   synchronize_irq(np-pci_dev-irq);
 
del_timer_sync(np-oom_kick);
del_timer_sync(np-nic_poll);


Re: MSI interrupts and disable_irq

2007-10-14 Thread Yinghai Lu
On 10/14/07, Manfred Spraul [EMAIL PROTECTED] wrote:
 Yinghai Lu wrote:
  On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:
 
  Someone around with a MSI capable board? The forcedeth driver does
  dev-irq = pci_dev-irq
  in nv_probe(), especially before pci_enable_msi().
  Does pci_enable_msi() change pci_dev-irq? Then we would disable the
  wrong interrupt
 
 
  the request_irq==setup_irq will make dev-irq = pci_dev-irq.
 
 
 Where is that?

in nv_request_irq
if ((ret = pci_enable_msi(np-pci_dev)) == 0) {
np-msi_flags |= NV_MSI_ENABLED;
if (request_irq(np-pci_dev-irq, handler,
IRQF_SHARED, dev-name, dev) != 0) {

in request_irq

int request_irq(unsigned int irq, irq_handler_t handler,
unsigned long irqflags, const char *devname, void *dev_id)
...
action-dev_id = dev_id;
...
retval = setup_irq(irq, action);

in setup_irq
int setup_irq(unsigned int irq, struct irqaction *new)

new-irq = irq;

it seems I missed that here new is irqaction instead of net_dev.

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Benjamin Herrenschmidt

On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
 Yinghai Lu wrote:
  On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:

  Someone around with a MSI capable board? The forcedeth driver does
  dev-irq = pci_dev-irq
  in nv_probe(), especially before pci_enable_msi().
  Does pci_enable_msi() change pci_dev-irq? Then we would disable the
  wrong interrupt
  
 
  the request_irq==setup_irq will make dev-irq = pci_dev-irq.
 

 Where is that?
 Otherwise I would propose the attached patch. My board is not 
 MSI-capable, thus I can't test it myself.

Why not just copy pcidev-irq to dev-irq once ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Yinghai Lu
On 10/14/07, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

 On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
  Yinghai Lu wrote:
   On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:
  
   Someone around with a MSI capable board? The forcedeth driver does
   dev-irq = pci_dev-irq
   in nv_probe(), especially before pci_enable_msi().
   Does pci_enable_msi() change pci_dev-irq? Then we would disable the
   wrong interrupt
  
  
   the request_irq==setup_irq will make dev-irq = pci_dev-irq.
  
  
  Where is that?
  Otherwise I would propose the attached patch. My board is not
  MSI-capable, thus I can't test it myself.

 Why not just copy pcidev-irq to dev-irq once ?

it seems e1000 is using np-pci_dev-irq directly too.

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-14 Thread Benjamin Herrenschmidt

On Sun, 2007-10-14 at 16:15 -0700, Yinghai Lu wrote:
 On 10/14/07, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
 
  On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
   Yinghai Lu wrote:
On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:
   
Someone around with a MSI capable board? The forcedeth driver does
dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the
wrong interrupt
   
   
the request_irq==setup_irq will make dev-irq = pci_dev-irq.
   
   
   Where is that?
   Otherwise I would propose the attached patch. My board is not
   MSI-capable, thus I can't test it myself.
 
  Why not just copy pcidev-irq to dev-irq once ?
 
 it seems e1000 is using np-pci_dev-irq directly too.

Heh, allright, doesn't matter, I was just proposing to avoid one more
indirection :-)

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-13 Thread Manfred Spraul

Jeff Garzik wrote:


I think the scenario you outline is an illustration of the approach's 
fragility:  disable_irq() is a heavy hammer that originated with INTx, 
and it relies on a chip-specific disable method (kernel/irq/manage.c) 
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e. 
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq 
handler] if IRQ_DISABLE is not set.

OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev->irq? The irq is requested with
   request_irq(np->pci_dev->irq, handler, IRQF_SHARED, dev->name, dev)
and disabled with
   disable_irq_lockdep(dev->irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev->irq = pci_dev->irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev->irq? Then we would disable the 
wrong interrupt


--
   Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-13 Thread Manfred Spraul

Jeff Garzik wrote:


I think the scenario you outline is an illustration of the approach's 
fragility:  disable_irq() is a heavy hammer that originated with INTx, 
and it relies on a chip-specific disable method (kernel/irq/manage.c) 
that practically guarantees behavior will vary across MSI/INTx/etc.


I checked the code: IRQ_DISABLE is implemented in software, i.e. 
handle_level_irq() only calls handle_IRQ_event() [and then the nic irq 
handler] if IRQ_DISABLE is not set.

OTHO: The last trace looks as if nv_do_nic_poll() is interrupted by an irq.

Perhaps something corrupts dev-irq? The irq is requested with
   request_irq(np-pci_dev-irq, handler, IRQF_SHARED, dev-name, dev)
and disabled with
   disable_irq_lockdep(dev-irq);

Someone around with a MSI capable board? The forcedeth driver does
   dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the 
wrong interrupt


--
   Manfred
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-07 Thread Manfred Spraul

Jeff Garzik wrote:


Interested parties should try the forcedeth patches I just posted :)

The patches look good, but I would still prefer to understand why the 
current implementation causes crashes before rewriting everything.


I've just noticed that netpoll_send_skb() calls ->hard_start_xmit() and 
->poll() within local_irq_save().
Thus it seems that  spin_lock_irqsave() must be used in the poll() and 
hard_start_xmit() callbacks, at least in nics that support 
POLL_CONTROLLER :-(


--
   Manfred

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-07 Thread Manfred Spraul

Jeff Garzik wrote:


Interested parties should try the forcedeth patches I just posted :)

The patches look good, but I would still prefer to understand why the 
current implementation causes crashes before rewriting everything.


I've just noticed that netpoll_send_skb() calls -hard_start_xmit() and 
-poll() within local_irq_save().
Thus it seems that  spin_lock_irqsave() must be used in the poll() and 
hard_start_xmit() callbacks, at least in nics that support 
POLL_CONTROLLER :-(


--
   Manfred

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-06 Thread Jeff Garzik

Yinghai Lu wrote:

On 9/28/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:

Ayaz Abdulla wrote:

I am trying to track down a forcedeth driver issue described by bug 9047
in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
I added a patch to synchronize the timer handlers so that one handler
doesn't accidently enable the IRQ while another timer handler is running
(see attachment 'Add timer lock' in bug report) and for other processing
protection.

However, the system still had an Oops. So I added a lock around the
nv_rx_process_optimized() and the Oops has not happened (see attachment
'New patch for locking' in bug report). This would imply a
synchronization issue. However, the only callers of that function are
the IRQ handler and the timer handlers (in non-NAPI case). The timer
handlers  use disable_irq so that the IRQ handler does not contend with
them. It looks as if disable_irq is not working properly.

This issue repros only with MSI interrupt and not legacy INTx
interrupts. Any ideas?

(added linux-kernel to CC, since I think it's more of a general kernel
issue)


I wonder if the race is between soft_timer for nv_do_nic_poll from
different CPUs


Interested parties should try the forcedeth patches I just posted :)

Jeff



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-06 Thread Yinghai Lu
On 9/28/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Ayaz Abdulla wrote:
> > I am trying to track down a forcedeth driver issue described by bug 9047
> > in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
> > I added a patch to synchronize the timer handlers so that one handler
> > doesn't accidently enable the IRQ while another timer handler is running
> > (see attachment 'Add timer lock' in bug report) and for other processing
> > protection.
> >
> > However, the system still had an Oops. So I added a lock around the
> > nv_rx_process_optimized() and the Oops has not happened (see attachment
> > 'New patch for locking' in bug report). This would imply a
> > synchronization issue. However, the only callers of that function are
> > the IRQ handler and the timer handlers (in non-NAPI case). The timer
> > handlers  use disable_irq so that the IRQ handler does not contend with
> > them. It looks as if disable_irq is not working properly.
> >
> > This issue repros only with MSI interrupt and not legacy INTx
> > interrupts. Any ideas?
>
> (added linux-kernel to CC, since I think it's more of a general kernel
> issue)
>
I wonder if the race is between soft_timer for nv_do_nic_poll from
different CPUs

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-06 Thread Yinghai Lu
On 10/5/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
> Stephen Hemminger <[EMAIL PROTECTED]> writes:
>
> > On Fri, 28 Sep 2007 22:47:16 -0400
> > Jeff Garzik <[EMAIL PROTECTED]> wrote:
> >
> >> Ayaz Abdulla wrote:
> >> > I am trying to track down a forcedeth driver issue described by bug 9047
> >> > in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
> >> > I added a patch to synchronize the timer handlers so that one handler
> >> > doesn't accidently enable the IRQ while another timer handler is running
> >> > (see attachment 'Add timer lock' in bug report) and for other processing
> >> > protection.
> >> >
> >> > However, the system still had an Oops. So I added a lock around the
> >> > nv_rx_process_optimized() and the Oops has not happened (see attachment
> >> > 'New patch for locking' in bug report). This would imply a
> >> > synchronization issue. However, the only callers of that function are
> >> > the IRQ handler and the timer handlers (in non-NAPI case). The timer
> >> > handlers  use disable_irq so that the IRQ handler does not contend with
> >> > them. It looks as if disable_irq is not working properly.
> >> >
> >> > This issue repros only with MSI interrupt and not legacy INTx
> >> > interrupts. Any ideas?
> >>
> >> (added linux-kernel to CC, since I think it's more of a general kernel
> >> issue)
>
> I didn't see anything in disable_irq that would cause it to fail in
> the suggested way.  But I couldn't quite convince myself we were
> race free either.  I didn't see anything that was specific to MSI
> that would cause something.  But switching from level to edge
> triggered, and to a lower latency delivery path may have caused
> some behavior changes.
>
> >> To be brutally frank, I always thought this disable_irq() mess was a
> >> hack both ugly and fragile.  This disable_irq() work that appeared in a
> >> couple net drivers was correct at the time, so I didn't feel I had the
> >> justification to reject it, but it still gave me a bad feeling.
> >>
> >> I think the scenario you outline is an illustration of the approach's
> >> fragility:  disable_irq() is a heavy hammer that originated with INTx,
> >> and it relies on a chip-specific disable method (kernel/irq/manage.c)
> >> that practically guarantees behavior will vary across MSI/INTx/etc.
> >>
> >>
> >> Based on your report, it is certainly possible that there is a problem
> >> with MSI's desc->chip->disable() method...  but I would actually
> >> recommend working around the problem by making the forcedeth locking
> >> more standardized by removing all those disable_irq() hacks.
> >>
> >
> > I'll try and clean it up if the author doesn't get to it first.
>
> I took a look at the underlying side of this.
>
> I don't know if the MSI capability for the forcedeth supports a mask
> bit or not.  Mine doesn't even have a msi capability.  If it doesn't
> support a mask bit the pci spec provides not valid way to mask the
> interrupt, so what we do is actually disable the msi capability.
> At which point we might get weird INTx interactions.
>
> We have a similar case with ioapics and INTx that also turns
> a hardware level disable into a reroute to another irq command.
> So I'm going to take a look and see how infrequently we can use
> hardware level disabled.

also the driver need to have seperate handlers for ioapic, msi, msi-x...
current nv_nic_irq_optimized keep checking msi_flag...

also nv_enable_hw_interrupts and nv_disable_hw_interrupts is not
corresponding in MSI side..

YH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-06 Thread Yinghai Lu
On 10/5/07, Eric W. Biederman [EMAIL PROTECTED] wrote:
 Stephen Hemminger [EMAIL PROTECTED] writes:

  On Fri, 28 Sep 2007 22:47:16 -0400
  Jeff Garzik [EMAIL PROTECTED] wrote:
 
  Ayaz Abdulla wrote:
   I am trying to track down a forcedeth driver issue described by bug 9047
   in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
   I added a patch to synchronize the timer handlers so that one handler
   doesn't accidently enable the IRQ while another timer handler is running
   (see attachment 'Add timer lock' in bug report) and for other processing
   protection.
  
   However, the system still had an Oops. So I added a lock around the
   nv_rx_process_optimized() and the Oops has not happened (see attachment
   'New patch for locking' in bug report). This would imply a
   synchronization issue. However, the only callers of that function are
   the IRQ handler and the timer handlers (in non-NAPI case). The timer
   handlers  use disable_irq so that the IRQ handler does not contend with
   them. It looks as if disable_irq is not working properly.
  
   This issue repros only with MSI interrupt and not legacy INTx
   interrupts. Any ideas?
 
  (added linux-kernel to CC, since I think it's more of a general kernel
  issue)

 I didn't see anything in disable_irq that would cause it to fail in
 the suggested way.  But I couldn't quite convince myself we were
 race free either.  I didn't see anything that was specific to MSI
 that would cause something.  But switching from level to edge
 triggered, and to a lower latency delivery path may have caused
 some behavior changes.

  To be brutally frank, I always thought this disable_irq() mess was a
  hack both ugly and fragile.  This disable_irq() work that appeared in a
  couple net drivers was correct at the time, so I didn't feel I had the
  justification to reject it, but it still gave me a bad feeling.
 
  I think the scenario you outline is an illustration of the approach's
  fragility:  disable_irq() is a heavy hammer that originated with INTx,
  and it relies on a chip-specific disable method (kernel/irq/manage.c)
  that practically guarantees behavior will vary across MSI/INTx/etc.
 
 
  Based on your report, it is certainly possible that there is a problem
  with MSI's desc-chip-disable() method...  but I would actually
  recommend working around the problem by making the forcedeth locking
  more standardized by removing all those disable_irq() hacks.
 
 
  I'll try and clean it up if the author doesn't get to it first.

 I took a look at the underlying side of this.

 I don't know if the MSI capability for the forcedeth supports a mask
 bit or not.  Mine doesn't even have a msi capability.  If it doesn't
 support a mask bit the pci spec provides not valid way to mask the
 interrupt, so what we do is actually disable the msi capability.
 At which point we might get weird INTx interactions.

 We have a similar case with ioapics and INTx that also turns
 a hardware level disable into a reroute to another irq command.
 So I'm going to take a look and see how infrequently we can use
 hardware level disabled.

also the driver need to have seperate handlers for ioapic, msi, msi-x...
current nv_nic_irq_optimized keep checking msi_flag...

also nv_enable_hw_interrupts and nv_disable_hw_interrupts is not
corresponding in MSI side..

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-06 Thread Yinghai Lu
On 9/28/07, Jeff Garzik [EMAIL PROTECTED] wrote:
 Ayaz Abdulla wrote:
  I am trying to track down a forcedeth driver issue described by bug 9047
  in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
  I added a patch to synchronize the timer handlers so that one handler
  doesn't accidently enable the IRQ while another timer handler is running
  (see attachment 'Add timer lock' in bug report) and for other processing
  protection.
 
  However, the system still had an Oops. So I added a lock around the
  nv_rx_process_optimized() and the Oops has not happened (see attachment
  'New patch for locking' in bug report). This would imply a
  synchronization issue. However, the only callers of that function are
  the IRQ handler and the timer handlers (in non-NAPI case). The timer
  handlers  use disable_irq so that the IRQ handler does not contend with
  them. It looks as if disable_irq is not working properly.
 
  This issue repros only with MSI interrupt and not legacy INTx
  interrupts. Any ideas?

 (added linux-kernel to CC, since I think it's more of a general kernel
 issue)

I wonder if the race is between soft_timer for nv_do_nic_poll from
different CPUs

YH
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-06 Thread Jeff Garzik

Yinghai Lu wrote:

On 9/28/07, Jeff Garzik [EMAIL PROTECTED] wrote:

Ayaz Abdulla wrote:

I am trying to track down a forcedeth driver issue described by bug 9047
in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load).
I added a patch to synchronize the timer handlers so that one handler
doesn't accidently enable the IRQ while another timer handler is running
(see attachment 'Add timer lock' in bug report) and for other processing
protection.

However, the system still had an Oops. So I added a lock around the
nv_rx_process_optimized() and the Oops has not happened (see attachment
'New patch for locking' in bug report). This would imply a
synchronization issue. However, the only callers of that function are
the IRQ handler and the timer handlers (in non-NAPI case). The timer
handlers  use disable_irq so that the IRQ handler does not contend with
them. It looks as if disable_irq is not working properly.

This issue repros only with MSI interrupt and not legacy INTx
interrupts. Any ideas?

(added linux-kernel to CC, since I think it's more of a general kernel
issue)


I wonder if the race is between soft_timer for nv_do_nic_poll from
different CPUs


Interested parties should try the forcedeth patches I just posted :)

Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-05 Thread Eric W. Biederman
Stephen Hemminger <[EMAIL PROTECTED]> writes:

> On Fri, 28 Sep 2007 22:47:16 -0400
> Jeff Garzik <[EMAIL PROTECTED]> wrote:
>
>> Ayaz Abdulla wrote:
>> > I am trying to track down a forcedeth driver issue described by bug 9047 
>> > in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). 
>> > I added a patch to synchronize the timer handlers so that one handler 
>> > doesn't accidently enable the IRQ while another timer handler is running 
>> > (see attachment 'Add timer lock' in bug report) and for other processing 
>> > protection.
>> > 
>> > However, the system still had an Oops. So I added a lock around the 
>> > nv_rx_process_optimized() and the Oops has not happened (see attachment 
>> > 'New patch for locking' in bug report). This would imply a 
>> > synchronization issue. However, the only callers of that function are 
>> > the IRQ handler and the timer handlers (in non-NAPI case). The timer 
>> > handlers  use disable_irq so that the IRQ handler does not contend with 
>> > them. It looks as if disable_irq is not working properly.
>> > 
>> > This issue repros only with MSI interrupt and not legacy INTx 
>> > interrupts. Any ideas?
>> 
>> (added linux-kernel to CC, since I think it's more of a general kernel 
>> issue)

I didn't see anything in disable_irq that would cause it to fail in
the suggested way.  But I couldn't quite convince myself we were
race free either.  I didn't see anything that was specific to MSI
that would cause something.  But switching from level to edge
triggered, and to a lower latency delivery path may have caused
some behavior changes.

>> To be brutally frank, I always thought this disable_irq() mess was a 
>> hack both ugly and fragile.  This disable_irq() work that appeared in a 
>> couple net drivers was correct at the time, so I didn't feel I had the 
>> justification to reject it, but it still gave me a bad feeling.
>> 
>> I think the scenario you outline is an illustration of the approach's 
>> fragility:  disable_irq() is a heavy hammer that originated with INTx, 
>> and it relies on a chip-specific disable method (kernel/irq/manage.c) 
>> that practically guarantees behavior will vary across MSI/INTx/etc.
>> 
>> 
>> Based on your report, it is certainly possible that there is a problem 
>> with MSI's desc->chip->disable() method...  but I would actually 
>> recommend working around the problem by making the forcedeth locking 
>> more standardized by removing all those disable_irq() hacks.
>> 
>
> I'll try and clean it up if the author doesn't get to it first.

I took a look at the underlying side of this.

I don't know if the MSI capability for the forcedeth supports a mask
bit or not.  Mine doesn't even have a msi capability.  If it doesn't
support a mask bit the pci spec provides not valid way to mask the
interrupt, so what we do is actually disable the msi capability.
At which point we might get weird INTx interactions.

We have a similar case with ioapics and INTx that also turns
a hardware level disable into a reroute to another irq command.
So I'm going to take a look and see how infrequently we can use
hardware level disabled.

Since it looks like hardware level disables tend to be creatively
implemented I recommend using disable_irq as little as possible.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-05 Thread Eric W. Biederman
Stephen Hemminger [EMAIL PROTECTED] writes:

 On Fri, 28 Sep 2007 22:47:16 -0400
 Jeff Garzik [EMAIL PROTECTED] wrote:

 Ayaz Abdulla wrote:
  I am trying to track down a forcedeth driver issue described by bug 9047 
  in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). 
  I added a patch to synchronize the timer handlers so that one handler 
  doesn't accidently enable the IRQ while another timer handler is running 
  (see attachment 'Add timer lock' in bug report) and for other processing 
  protection.
  
  However, the system still had an Oops. So I added a lock around the 
  nv_rx_process_optimized() and the Oops has not happened (see attachment 
  'New patch for locking' in bug report). This would imply a 
  synchronization issue. However, the only callers of that function are 
  the IRQ handler and the timer handlers (in non-NAPI case). The timer 
  handlers  use disable_irq so that the IRQ handler does not contend with 
  them. It looks as if disable_irq is not working properly.
  
  This issue repros only with MSI interrupt and not legacy INTx 
  interrupts. Any ideas?
 
 (added linux-kernel to CC, since I think it's more of a general kernel 
 issue)

I didn't see anything in disable_irq that would cause it to fail in
the suggested way.  But I couldn't quite convince myself we were
race free either.  I didn't see anything that was specific to MSI
that would cause something.  But switching from level to edge
triggered, and to a lower latency delivery path may have caused
some behavior changes.

 To be brutally frank, I always thought this disable_irq() mess was a 
 hack both ugly and fragile.  This disable_irq() work that appeared in a 
 couple net drivers was correct at the time, so I didn't feel I had the 
 justification to reject it, but it still gave me a bad feeling.
 
 I think the scenario you outline is an illustration of the approach's 
 fragility:  disable_irq() is a heavy hammer that originated with INTx, 
 and it relies on a chip-specific disable method (kernel/irq/manage.c) 
 that practically guarantees behavior will vary across MSI/INTx/etc.
 
 
 Based on your report, it is certainly possible that there is a problem 
 with MSI's desc-chip-disable() method...  but I would actually 
 recommend working around the problem by making the forcedeth locking 
 more standardized by removing all those disable_irq() hacks.
 

 I'll try and clean it up if the author doesn't get to it first.

I took a look at the underlying side of this.

I don't know if the MSI capability for the forcedeth supports a mask
bit or not.  Mine doesn't even have a msi capability.  If it doesn't
support a mask bit the pci spec provides not valid way to mask the
interrupt, so what we do is actually disable the msi capability.
At which point we might get weird INTx interactions.

We have a similar case with ioapics and INTx that also turns
a hardware level disable into a reroute to another irq command.
So I'm going to take a look and see how infrequently we can use
hardware level disabled.

Since it looks like hardware level disables tend to be creatively
implemented I recommend using disable_irq as little as possible.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-02 Thread Manfred Spraul

Ayaz Abdulla wrote:
I am trying to track down a forcedeth driver issue described by bug 
9047 in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy 
load). I added a patch to synchronize the timer handlers so that one 
handler doesn't accidently enable the IRQ while another timer handler 
is running (see attachment 'Add timer lock' in bug report) and for 
other processing protection.


However, the system still had an Oops. So I added a lock around the 
nv_rx_process_optimized() and the Oops has not happened (see 
attachment 'New patch for locking' in bug report). This would imply a 
synchronization issue. However, the only callers of that function are 
the IRQ handler and the timer handlers (in non-NAPI case). The timer 
handlers  use disable_irq so that the IRQ handler does not contend 
with them. It looks as if disable_irq is not working properly.
Either disable_irq() is not working properly or interrupts are nested, 
i.e. the irq handler is called again while running.
Which timer handler do you mean? I only see disable_irq() in the 
configuration paths (set mtu, change ring size, ...) and in the tx 
timeout case.

Neither one should happen during normal operation.

--
   Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-10-02 Thread Manfred Spraul

Ayaz Abdulla wrote:
I am trying to track down a forcedeth driver issue described by bug 
9047 in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy 
load). I added a patch to synchronize the timer handlers so that one 
handler doesn't accidently enable the IRQ while another timer handler 
is running (see attachment 'Add timer lock' in bug report) and for 
other processing protection.


However, the system still had an Oops. So I added a lock around the 
nv_rx_process_optimized() and the Oops has not happened (see 
attachment 'New patch for locking' in bug report). This would imply a 
synchronization issue. However, the only callers of that function are 
the IRQ handler and the timer handlers (in non-NAPI case). The timer 
handlers  use disable_irq so that the IRQ handler does not contend 
with them. It looks as if disable_irq is not working properly.
Either disable_irq() is not working properly or interrupts are nested, 
i.e. the irq handler is called again while running.
Which timer handler do you mean? I only see disable_irq() in the 
configuration paths (set mtu, change ring size, ...) and in the tx 
timeout case.

Neither one should happen during normal operation.

--
   Manfred
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-09-28 Thread Stephen Hemminger
On Fri, 28 Sep 2007 22:47:16 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Ayaz Abdulla wrote:
> > I am trying to track down a forcedeth driver issue described by bug 9047 
> > in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). 
> > I added a patch to synchronize the timer handlers so that one handler 
> > doesn't accidently enable the IRQ while another timer handler is running 
> > (see attachment 'Add timer lock' in bug report) and for other processing 
> > protection.
> > 
> > However, the system still had an Oops. So I added a lock around the 
> > nv_rx_process_optimized() and the Oops has not happened (see attachment 
> > 'New patch for locking' in bug report). This would imply a 
> > synchronization issue. However, the only callers of that function are 
> > the IRQ handler and the timer handlers (in non-NAPI case). The timer 
> > handlers  use disable_irq so that the IRQ handler does not contend with 
> > them. It looks as if disable_irq is not working properly.
> > 
> > This issue repros only with MSI interrupt and not legacy INTx 
> > interrupts. Any ideas?
> 
> (added linux-kernel to CC, since I think it's more of a general kernel 
> issue)
> 
> To be brutally frank, I always thought this disable_irq() mess was a 
> hack both ugly and fragile.  This disable_irq() work that appeared in a 
> couple net drivers was correct at the time, so I didn't feel I had the 
> justification to reject it, but it still gave me a bad feeling.
> 
> I think the scenario you outline is an illustration of the approach's 
> fragility:  disable_irq() is a heavy hammer that originated with INTx, 
> and it relies on a chip-specific disable method (kernel/irq/manage.c) 
> that practically guarantees behavior will vary across MSI/INTx/etc.
> 
> Practices like forcedeth's unique locking work for a time, but it should 
> be a warning sign any time you stray from the normal spin_lock_irqsave() 
> method of synchronization.
> 
> Based on your report, it is certainly possible that there is a problem 
> with MSI's desc->chip->disable() method...  but I would actually 
> recommend working around the problem by making the forcedeth locking 
> more standardized by removing all those disable_irq() hacks.
> 
> Using spinlocks like other net drivers (note: avoid NETIF_F_LLTX 
> drivers) has a high probability of both fixing your current problem, and 
> giving forcedeth a more stable foundation for the long term.  In my 
> humble opinion :)
> 

I'll try and clean it up if the author doesn't get to it first.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-09-28 Thread Jeff Garzik

Ayaz Abdulla wrote:
I am trying to track down a forcedeth driver issue described by bug 9047 
in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). 
I added a patch to synchronize the timer handlers so that one handler 
doesn't accidently enable the IRQ while another timer handler is running 
(see attachment 'Add timer lock' in bug report) and for other processing 
protection.


However, the system still had an Oops. So I added a lock around the 
nv_rx_process_optimized() and the Oops has not happened (see attachment 
'New patch for locking' in bug report). This would imply a 
synchronization issue. However, the only callers of that function are 
the IRQ handler and the timer handlers (in non-NAPI case). The timer 
handlers  use disable_irq so that the IRQ handler does not contend with 
them. It looks as if disable_irq is not working properly.


This issue repros only with MSI interrupt and not legacy INTx 
interrupts. Any ideas?


(added linux-kernel to CC, since I think it's more of a general kernel 
issue)


To be brutally frank, I always thought this disable_irq() mess was a 
hack both ugly and fragile.  This disable_irq() work that appeared in a 
couple net drivers was correct at the time, so I didn't feel I had the 
justification to reject it, but it still gave me a bad feeling.


I think the scenario you outline is an illustration of the approach's 
fragility:  disable_irq() is a heavy hammer that originated with INTx, 
and it relies on a chip-specific disable method (kernel/irq/manage.c) 
that practically guarantees behavior will vary across MSI/INTx/etc.


Practices like forcedeth's unique locking work for a time, but it should 
be a warning sign any time you stray from the normal spin_lock_irqsave() 
method of synchronization.


Based on your report, it is certainly possible that there is a problem 
with MSI's desc->chip->disable() method...  but I would actually 
recommend working around the problem by making the forcedeth locking 
more standardized by removing all those disable_irq() hacks.


Using spinlocks like other net drivers (note: avoid NETIF_F_LLTX 
drivers) has a high probability of both fixing your current problem, and 
giving forcedeth a more stable foundation for the long term.  In my 
humble opinion :)


Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-09-28 Thread Jeff Garzik

Ayaz Abdulla wrote:
I am trying to track down a forcedeth driver issue described by bug 9047 
in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). 
I added a patch to synchronize the timer handlers so that one handler 
doesn't accidently enable the IRQ while another timer handler is running 
(see attachment 'Add timer lock' in bug report) and for other processing 
protection.


However, the system still had an Oops. So I added a lock around the 
nv_rx_process_optimized() and the Oops has not happened (see attachment 
'New patch for locking' in bug report). This would imply a 
synchronization issue. However, the only callers of that function are 
the IRQ handler and the timer handlers (in non-NAPI case). The timer 
handlers  use disable_irq so that the IRQ handler does not contend with 
them. It looks as if disable_irq is not working properly.


This issue repros only with MSI interrupt and not legacy INTx 
interrupts. Any ideas?


(added linux-kernel to CC, since I think it's more of a general kernel 
issue)


To be brutally frank, I always thought this disable_irq() mess was a 
hack both ugly and fragile.  This disable_irq() work that appeared in a 
couple net drivers was correct at the time, so I didn't feel I had the 
justification to reject it, but it still gave me a bad feeling.


I think the scenario you outline is an illustration of the approach's 
fragility:  disable_irq() is a heavy hammer that originated with INTx, 
and it relies on a chip-specific disable method (kernel/irq/manage.c) 
that practically guarantees behavior will vary across MSI/INTx/etc.


Practices like forcedeth's unique locking work for a time, but it should 
be a warning sign any time you stray from the normal spin_lock_irqsave() 
method of synchronization.


Based on your report, it is certainly possible that there is a problem 
with MSI's desc-chip-disable() method...  but I would actually 
recommend working around the problem by making the forcedeth locking 
more standardized by removing all those disable_irq() hacks.


Using spinlocks like other net drivers (note: avoid NETIF_F_LLTX 
drivers) has a high probability of both fixing your current problem, and 
giving forcedeth a more stable foundation for the long term.  In my 
humble opinion :)


Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MSI interrupts and disable_irq

2007-09-28 Thread Stephen Hemminger
On Fri, 28 Sep 2007 22:47:16 -0400
Jeff Garzik [EMAIL PROTECTED] wrote:

 Ayaz Abdulla wrote:
  I am trying to track down a forcedeth driver issue described by bug 9047 
  in bugzilla (2.6.23-rc7-git1 forcedeth w/ MCP55 oops under heavy load). 
  I added a patch to synchronize the timer handlers so that one handler 
  doesn't accidently enable the IRQ while another timer handler is running 
  (see attachment 'Add timer lock' in bug report) and for other processing 
  protection.
  
  However, the system still had an Oops. So I added a lock around the 
  nv_rx_process_optimized() and the Oops has not happened (see attachment 
  'New patch for locking' in bug report). This would imply a 
  synchronization issue. However, the only callers of that function are 
  the IRQ handler and the timer handlers (in non-NAPI case). The timer 
  handlers  use disable_irq so that the IRQ handler does not contend with 
  them. It looks as if disable_irq is not working properly.
  
  This issue repros only with MSI interrupt and not legacy INTx 
  interrupts. Any ideas?
 
 (added linux-kernel to CC, since I think it's more of a general kernel 
 issue)
 
 To be brutally frank, I always thought this disable_irq() mess was a 
 hack both ugly and fragile.  This disable_irq() work that appeared in a 
 couple net drivers was correct at the time, so I didn't feel I had the 
 justification to reject it, but it still gave me a bad feeling.
 
 I think the scenario you outline is an illustration of the approach's 
 fragility:  disable_irq() is a heavy hammer that originated with INTx, 
 and it relies on a chip-specific disable method (kernel/irq/manage.c) 
 that practically guarantees behavior will vary across MSI/INTx/etc.
 
 Practices like forcedeth's unique locking work for a time, but it should 
 be a warning sign any time you stray from the normal spin_lock_irqsave() 
 method of synchronization.
 
 Based on your report, it is certainly possible that there is a problem 
 with MSI's desc-chip-disable() method...  but I would actually 
 recommend working around the problem by making the forcedeth locking 
 more standardized by removing all those disable_irq() hacks.
 
 Using spinlocks like other net drivers (note: avoid NETIF_F_LLTX 
 drivers) has a high probability of both fixing your current problem, and 
 giving forcedeth a more stable foundation for the long term.  In my 
 humble opinion :)
 

I'll try and clean it up if the author doesn't get to it first.

-- 
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/