Re: NAPI and shared interrupt control

2006-12-07 Thread David Miller
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
Date: Thu, 07 Dec 2006 15:24:06 +1100

 
  What Eugene does currently, which seems to me like it's actually the
  only proper solution, is to create a separate net_device structure for
  the DMA engine and thus have a single NAPI poll  weighting for all the
  EMACs sharing a given MAL (MAL is the name of that DMA engine). This
  means that Rx from any of the channels schedules the poll, and
  interrupts can be properly masked/unmasked globally based on the
  presence/absence of work on all the channels.
 
 Actually, another solution would be to have one of the instances do the
 NAPI poll for all of them instead of creating a separate net_device for
 the DMA engine...

I think this idea would work the best.

Just link the other related devices into a list in the driver private
struct.  Or a simple work pending bitmask for each of the EMACs,
which tell the primary EMAC netdev which devices should be polled.

This architecture doesn't make things easy, that's for sure :-)

If things get too hairy, don't feel too bad about ideas like forgoing
NAPI altogether and just using the interrupt mitigation features of
the chip.  Does it have any?

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Eugene Surovegin
On Thu, Dec 07, 2006 at 01:16:27AM -0800, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Thu, 07 Dec 2006 15:24:06 +1100
 
  
   What Eugene does currently, which seems to me like it's actually the
   only proper solution, is to create a separate net_device structure for
   the DMA engine and thus have a single NAPI poll  weighting for all the
   EMACs sharing a given MAL (MAL is the name of that DMA engine). This
   means that Rx from any of the channels schedules the poll, and
   interrupts can be properly masked/unmasked globally based on the
   presence/absence of work on all the channels.
  
  Actually, another solution would be to have one of the instances do the
  NAPI poll for all of them instead of creating a separate net_device for
  the DMA engine...
 
 I think this idea would work the best.

I fail to see how this is not even more ugly and more complex than the 
solution we have right now. Instead of trivial orthogonal polling 
code you are suggesting adding additional complexity - handle 
dynamic selection of that master EMAC and also handling situation 
when this master device goes down and you have to switch to 
another one without disturbing polling for other active devices. Why 
all this? This hw is ugly enough as it is.

 Just link the other related devices into a list in the driver private
 struct.  Or a simple work pending bitmask for each of the EMACs,
 which tell the primary EMAC netdev which devices should be polled.
 
 This architecture doesn't make things easy, that's for sure :-)
 
 If things get too hairy, don't feel too bad about ideas like forgoing
 NAPI altogether and just using the interrupt mitigation features of
 the chip.  Does it have any?

No, it does not.

-- 
Eugene

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread David Miller
From: Eugene Surovegin [EMAIL PROTECTED]
Date: Thu, 7 Dec 2006 01:45:02 -0800

 I fail to see how this is not even more ugly and more complex than the 
 solution we have right now. Instead of trivial orthogonal polling 
 code you are suggesting adding additional complexity - handle 
 dynamic selection of that master EMAC and also handling situation 
 when this master device goes down and you have to switch to 
 another one without disturbing polling for other active devices. Why 
 all this? This hw is ugly enough as it is.

Don't do dynamic selection, that indeed would be dumb.

Instead, just pick one of them to act as the polling master.
Each EMAC has a backpointer to the master EMAC, and trigger
the poll via that indirection.

With this shared interrupt scheme and lack of hw interrupt
mitigation, how did the designers of this chip expect people
to do interrupt mitigation?  I suppose they expected you to
do it by standing on your head :-)

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Eugene Surovegin
On Thu, Dec 07, 2006 at 01:59:54AM -0800, David Miller wrote:
 From: Eugene Surovegin [EMAIL PROTECTED]
 Date: Thu, 7 Dec 2006 01:45:02 -0800
 
  I fail to see how this is not even more ugly and more complex than the 
  solution we have right now. Instead of trivial orthogonal polling 
  code you are suggesting adding additional complexity - handle 
  dynamic selection of that master EMAC and also handling situation 
  when this master device goes down and you have to switch to 
  another one without disturbing polling for other active devices. Why 
  all this? This hw is ugly enough as it is.
 
 Don't do dynamic selection, that indeed would be dumb.
 
 Instead, just pick one of them to act as the polling master.
 Each EMAC has a backpointer to the master EMAC, and trigger
 the poll via that indirection.

Well, dev_close() explicitly checks and modifies state bits related 
to NAPI polling. From quick look I don't think it's safe to take down 
master device and expect NAPI polling to still work.

 
 With this shared interrupt scheme and lack of hw interrupt
 mitigation, how did the designers of this chip expect people
 to do interrupt mitigation?  I suppose they expected you to
 do it by standing on your head :-)

There are more fun stuff in this hw - max buffer size is around 4080 
bytes so we have to split packets on TX and assemble on RX (meaning 
doing memcpy) for jumbo frames. You have to reset MAC on almost every 
register change, etc

-- 
Eugene

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Eugene Surovegin
On Thu, Dec 07, 2006 at 02:20:10AM -0800, David Miller wrote:
 It also just occured to me that even if you use the dummy device
 approach, it's the dummy device's quota that will be used by the
 generic -poll() downcall into the driver.

Yes, that's true. That's why I made this parameter 
Konfig-configurable. Yes, this is not as flexible as run-time 
configuration available for a real netdev, but better than nothing and 
nobody has complained yet :)

-- 
Eugene

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Benjamin Herrenschmidt
On Thu, 2006-12-07 at 02:22 -0800, Eugene Surovegin wrote:
 On Thu, Dec 07, 2006 at 02:20:10AM -0800, David Miller wrote:
  It also just occured to me that even if you use the dummy device
  approach, it's the dummy device's quota that will be used by the
  generic -poll() downcall into the driver.
 
 Yes, that's true. That's why I made this parameter 
 Konfig-configurable. Yes, this is not as flexible as run-time 
 configuration available for a real netdev, but better than nothing and 
 nobody has complained yet :)

Can we maybe change the dummy device weight as emacs get added to a
MAL ?

Dave, the main point of my initial email was: should we provide a
routine from the net core to initialize such dummy devices properly ?
I'm a little worried that some day, the NAPI code will change and the
initialisations done by the driver will not be enough anymore...

Cheers,
Ben.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NAPI and shared interrupt control

2006-12-06 Thread Benjamin Herrenschmidt
Hi Dave !

I'd like your advice on something we need to deal with in the EMAC
ethernet driver (an IBM part). The driver is maintainedby Eugene (CC'd),
I'm mostly adding support for some new hardware at this point, which
involves making it SMP safe among other things ;-)

So the problem this driver has is with NAPI polling and its shared DMA
engine. Basically, the DMA engine can be shared between several EMAC
ethernet cells. It has separate channels (and thus separate rings) so
in that area, all is fine... except the interrupt control. The is only
one global interrupt enable/disable bit for packet interrupts, shared by
all the EMACs using that DMA engine cell.

That means complications for NAPI polling as we can't just
disable/enable Rx interrupts as NAPI would normally expect on a
per-device basis.

What Eugene does currently, which seems to me like it's actually the
only proper solution, is to create a separate net_device structure for
the DMA engine and thus have a single NAPI poll  weighting for all the
EMACs sharing a given MAL (MAL is the name of that DMA engine). This
means that Rx from any of the channels schedules the poll, and
interrupts can be properly masked/unmasked globally based on the
presence/absence of work on all the channels.

I'd still like to run it through you, make sure you are ok or see if you
have a better idea as you are way more familiar with the network stack
than I am :-)

My main concern with the approach is purely due to how the additional
net_device is initialized. It's currently allocated as part of some
private data structure in the driver which then manually initializes
some fields that are used by NAPI polling. While it certainly works, I
find the approach a bit fragile (what if the NAPI code internals change
and some initialisations have to be done differently ?)

Thus I was wondering, if you think the approach is sane, wether it would
make sense to provide a low level netif_init_device() thingy that makes
sure a net_device data structure is fully initialized and suitable for
use ? (make sure spinlocks are initialized etc...). Something similar to
the code used to initialize the backlog fake device ?

The driver currently does:

set_bit(__LINK_STATE_START, mal-poll_dev.state);
mal-poll_dev.weight = CONFIG_IBM_EMAC_POLL_WEIGHT;
mal-poll_dev.poll = mal_poll;
mal-poll_dev.priv = mal;
atomic_set(mal-poll_dev.refcnt, 1);

(None of the spinlocks are initialized, but then, the driver was only
ever used on UP only embedded CPUs so far and it's possible that none of
the NAPI code path for which this net_device structure is used are
hitting any spinlock). 

Or do you think that net_device should be fully registered with
register_netdev ? (that would involve a lot more than what is currently
needed/used though).

Cheers,
Ben.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-06 Thread Benjamin Herrenschmidt

 What Eugene does currently, which seems to me like it's actually the
 only proper solution, is to create a separate net_device structure for
 the DMA engine and thus have a single NAPI poll  weighting for all the
 EMACs sharing a given MAL (MAL is the name of that DMA engine). This
 means that Rx from any of the channels schedules the poll, and
 interrupts can be properly masked/unmasked globally based on the
 presence/absence of work on all the channels.

Actually, another solution would be to have one of the instances do the
NAPI poll for all of them instead of creating a separate net_device for
the DMA engine...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html