Re: NAPI and shared interrupt control

2006-12-07 Thread David Miller
From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Date: Thu, 07 Dec 2006 21:56:02 +1100

> Dave, the main point of my initial email was: should we provide a
> routine from the net core to initialize such dummy devices properly ?
> I'm a little worried that some day, the NAPI code will change and the
> initialisations done by the driver will not be enough anymore...

That sounds like an excellent idea.

Once you guys have your implementation sorted out, submit
a patch proposing the generic bits and we'll review it.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Benjamin Herrenschmidt
On Thu, 2006-12-07 at 02:22 -0800, Eugene Surovegin wrote:
> On Thu, Dec 07, 2006 at 02:20:10AM -0800, David Miller wrote:
> > It also just occured to me that even if you use the dummy device
> > approach, it's the dummy device's quota that will be used by the
> > generic ->poll() downcall into the driver.
> 
> Yes, that's true. That's why I made this parameter 
> Konfig-configurable. Yes, this is not as flexible as run-time 
> configuration available for a real netdev, but better than nothing and 
> nobody has complained yet :)

Can we maybe change the dummy device weight as emacs get added to a
MAL ?

Dave, the main point of my initial email was: should we provide a
routine from the net core to initialize such dummy devices properly ?
I'm a little worried that some day, the NAPI code will change and the
initialisations done by the driver will not be enough anymore...

Cheers,
Ben.



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Eugene Surovegin
On Thu, Dec 07, 2006 at 02:20:10AM -0800, David Miller wrote:
> It also just occured to me that even if you use the dummy device
> approach, it's the dummy device's quota that will be used by the
> generic ->poll() downcall into the driver.

Yes, that's true. That's why I made this parameter 
Konfig-configurable. Yes, this is not as flexible as run-time 
configuration available for a real netdev, but better than nothing and 
nobody has complained yet :)

-- 
Eugene

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread David Miller
From: Eugene Surovegin <[EMAIL PROTECTED]>
Date: Thu, 7 Dec 2006 02:15:55 -0800

> On Thu, Dec 07, 2006 at 01:59:54AM -0800, David Miller wrote:
> > From: Eugene Surovegin <[EMAIL PROTECTED]>
> > Date: Thu, 7 Dec 2006 01:45:02 -0800
> > 
> > > I fail to see how this is not even more ugly and more complex than the 
> > > solution we have right now. Instead of trivial "orthogonal" polling 
> > > code you are suggesting adding additional complexity - handle 
> > > dynamic selection of that "master" EMAC and also handling situation 
> > > when this master device goes down and you have to switch to 
> > > another one without disturbing polling for other active devices. Why 
> > > all this? This hw is ugly enough as it is.
> > 
> > Don't do dynamic selection, that indeed would be dumb.
> > 
> > Instead, just pick one of them to act as the polling master.
> > Each EMAC has a backpointer to the master EMAC, and trigger
> > the poll via that indirection.
> 
> Well, dev_close() explicitly checks and modifies state bits related 
> to NAPI polling. From quick look I don't think it's safe to take down 
> "master" device and expect NAPI polling to still work.

Good point.

It also just occured to me that even if you use the dummy device
approach, it's the dummy device's quota that will be used by the
generic ->poll() downcall into the driver.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Eugene Surovegin
On Thu, Dec 07, 2006 at 01:59:54AM -0800, David Miller wrote:
> From: Eugene Surovegin <[EMAIL PROTECTED]>
> Date: Thu, 7 Dec 2006 01:45:02 -0800
> 
> > I fail to see how this is not even more ugly and more complex than the 
> > solution we have right now. Instead of trivial "orthogonal" polling 
> > code you are suggesting adding additional complexity - handle 
> > dynamic selection of that "master" EMAC and also handling situation 
> > when this master device goes down and you have to switch to 
> > another one without disturbing polling for other active devices. Why 
> > all this? This hw is ugly enough as it is.
> 
> Don't do dynamic selection, that indeed would be dumb.
> 
> Instead, just pick one of them to act as the polling master.
> Each EMAC has a backpointer to the master EMAC, and trigger
> the poll via that indirection.

Well, dev_close() explicitly checks and modifies state bits related 
to NAPI polling. From quick look I don't think it's safe to take down 
"master" device and expect NAPI polling to still work.

> 
> With this shared interrupt scheme and lack of hw interrupt
> mitigation, how did the designers of this chip expect people
> to do interrupt mitigation?  I suppose they expected you to
> do it by standing on your head :-)

There are more fun stuff in this hw - max buffer size is around 4080 
bytes so we have to split packets on TX and assemble on RX (meaning 
doing memcpy) for jumbo frames. You have to reset MAC on almost every 
register change, etc

-- 
Eugene

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread David Miller
From: Eugene Surovegin <[EMAIL PROTECTED]>
Date: Thu, 7 Dec 2006 01:45:02 -0800

> I fail to see how this is not even more ugly and more complex than the 
> solution we have right now. Instead of trivial "orthogonal" polling 
> code you are suggesting adding additional complexity - handle 
> dynamic selection of that "master" EMAC and also handling situation 
> when this master device goes down and you have to switch to 
> another one without disturbing polling for other active devices. Why 
> all this? This hw is ugly enough as it is.

Don't do dynamic selection, that indeed would be dumb.

Instead, just pick one of them to act as the polling master.
Each EMAC has a backpointer to the master EMAC, and trigger
the poll via that indirection.

With this shared interrupt scheme and lack of hw interrupt
mitigation, how did the designers of this chip expect people
to do interrupt mitigation?  I suppose they expected you to
do it by standing on your head :-)

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Eugene Surovegin
On Thu, Dec 07, 2006 at 01:16:27AM -0800, David Miller wrote:
> From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> Date: Thu, 07 Dec 2006 15:24:06 +1100
> 
> > 
> > > What Eugene does currently, which seems to me like it's actually the
> > > only proper solution, is to create a separate net_device structure for
> > > the DMA engine and thus have a single NAPI poll & weighting for all the
> > > EMACs sharing a given MAL (MAL is the name of that DMA engine). This
> > > means that Rx from any of the channels schedules the poll, and
> > > interrupts can be properly masked/unmasked globally based on the
> > > presence/absence of work on all the channels.
> > 
> > Actually, another solution would be to have one of the instances do the
> > NAPI poll for all of them instead of creating a separate net_device for
> > the DMA engine...
> 
> I think this idea would work the best.

I fail to see how this is not even more ugly and more complex than the 
solution we have right now. Instead of trivial "orthogonal" polling 
code you are suggesting adding additional complexity - handle 
dynamic selection of that "master" EMAC and also handling situation 
when this master device goes down and you have to switch to 
another one without disturbing polling for other active devices. Why 
all this? This hw is ugly enough as it is.

> Just link the other related devices into a list in the driver private
> struct.  Or a simple "work pending" bitmask for each of the EMACs,
> which tell the primary EMAC netdev which devices should be polled.
> 
> This architecture doesn't make things easy, that's for sure :-)
> 
> If things get too hairy, don't feel too bad about ideas like forgoing
> NAPI altogether and just using the interrupt mitigation features of
> the chip.  Does it have any?

No, it does not.

-- 
Eugene

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread David Miller
From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Date: Thu, 07 Dec 2006 15:24:06 +1100

> 
> > What Eugene does currently, which seems to me like it's actually the
> > only proper solution, is to create a separate net_device structure for
> > the DMA engine and thus have a single NAPI poll & weighting for all the
> > EMACs sharing a given MAL (MAL is the name of that DMA engine). This
> > means that Rx from any of the channels schedules the poll, and
> > interrupts can be properly masked/unmasked globally based on the
> > presence/absence of work on all the channels.
> 
> Actually, another solution would be to have one of the instances do the
> NAPI poll for all of them instead of creating a separate net_device for
> the DMA engine...

I think this idea would work the best.

Just link the other related devices into a list in the driver private
struct.  Or a simple "work pending" bitmask for each of the EMACs,
which tell the primary EMAC netdev which devices should be polled.

This architecture doesn't make things easy, that's for sure :-)

If things get too hairy, don't feel too bad about ideas like forgoing
NAPI altogether and just using the interrupt mitigation features of
the chip.  Does it have any?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-06 Thread Benjamin Herrenschmidt

> What Eugene does currently, which seems to me like it's actually the
> only proper solution, is to create a separate net_device structure for
> the DMA engine and thus have a single NAPI poll & weighting for all the
> EMACs sharing a given MAL (MAL is the name of that DMA engine). This
> means that Rx from any of the channels schedules the poll, and
> interrupts can be properly masked/unmasked globally based on the
> presence/absence of work on all the channels.

Actually, another solution would be to have one of the instances do the
NAPI poll for all of them instead of creating a separate net_device for
the DMA engine...

Ben.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NAPI and shared interrupt control

2006-12-06 Thread Benjamin Herrenschmidt
Hi Dave !

I'd like your advice on something we need to deal with in the EMAC
ethernet driver (an IBM part). The driver is maintainedby Eugene (CC'd),
I'm mostly adding support for some new hardware at this point, which
involves making it SMP safe among other things ;-)

So the problem this driver has is with NAPI polling and its shared DMA
engine. Basically, the DMA engine can be shared between several EMAC
ethernet cells. It has separate "channels" (and thus separate rings) so
in that area, all is fine... except the interrupt control. The is only
one global interrupt enable/disable bit for packet interrupts, shared by
all the EMACs using that DMA engine cell.

That means complications for NAPI polling as we can't just
disable/enable Rx interrupts as NAPI would normally expect on a
per-device basis.

What Eugene does currently, which seems to me like it's actually the
only proper solution, is to create a separate net_device structure for
the DMA engine and thus have a single NAPI poll & weighting for all the
EMACs sharing a given MAL (MAL is the name of that DMA engine). This
means that Rx from any of the channels schedules the poll, and
interrupts can be properly masked/unmasked globally based on the
presence/absence of work on all the channels.

I'd still like to run it through you, make sure you are ok or see if you
have a better idea as you are way more familiar with the network stack
than I am :-)

My main concern with the approach is purely due to how the additional
net_device is initialized. It's currently allocated as part of some
private data structure in the driver which then manually initializes
some fields that are used by NAPI polling. While it certainly works, I
find the approach a bit fragile (what if the NAPI code internals change
and some initialisations have to be done differently ?)

Thus I was wondering, if you think the approach is sane, wether it would
make sense to provide a low level netif_init_device() thingy that makes
sure a net_device data structure is fully initialized and suitable for
use ? (make sure spinlocks are initialized etc...). Something similar to
the code used to initialize the backlog "fake" device ?

The driver currently does:

set_bit(__LINK_STATE_START, &mal->poll_dev.state);
mal->poll_dev.weight = CONFIG_IBM_EMAC_POLL_WEIGHT;
mal->poll_dev.poll = mal_poll;
mal->poll_dev.priv = mal;
atomic_set(&mal->poll_dev.refcnt, 1);

(None of the spinlocks are initialized, but then, the driver was only
ever used on UP only embedded CPUs so far and it's possible that none of
the NAPI code path for which this net_device structure is used are
hitting any spinlock). 

Or do you think that net_device should be fully registered with
register_netdev ? (that would involve a lot more than what is currently
needed/used though).

Cheers,
Ben.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html