Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-09 Thread Neil Horman
On Thu, Jan 09, 2014 at 04:28:49PM +0800, Jason Wang wrote:
> On 01/08/2014 10:40 PM, Neil Horman wrote:
> > On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
> >> On 01/07/2014 09:17 PM, Neil Horman wrote:
> >>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
>  On 01/06/2014 08:42 PM, Neil Horman wrote:
> > On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
> >> Currently, the tx queue were selected implicitly in 
> >> ndo_dfwd_start_xmit(). The
> >> will cause several issues:
> >>
> >> - NETIF_F_LLTX was forced for macvlan device in this case which lead 
> >> extra lock
> >>   contention.
> >> - dev_hard_start_xmit() was called with NULL txq which bypasses the 
> >> net device
> >>   watchdog
> >> - dev_hard_start_xmit() does not check txq everywhere which will lead 
> >> a crash
> >>   when tso is disabled for lower device.
> >>
> >> Fix this by explicitly introducing a select queue method just for l2 
> >> forwarding
> >> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to 
> >> do the
> >> queue selecting and transmitting for l2 forwarding.
> >>
> >> With this fixes, NETIF_F_LLTX could be preserved for macvlan and 
> >> there's no need
> >> to check txq against NULL in dev_hard_start_xmit().
> >>
> >> In the future, it was also required for macvtap l2 forwarding support 
> >> since it
> >> provides a necessary synchronization method.
> >>
> >> Cc: John Fastabend 
> >> Cc: Neil Horman 
> >> Cc: e1000-de...@lists.sourceforge.net
> >> Signed-off-by: Jason Wang 
> > Instead of creating another operation here to do special queue 
> > selection, why
> > not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
> > argument
> > list, so it can pass the txq it used back to the caller 
> > (dev_hard_start_xmit)?
> > ndo_dfwd_start_xmit already knows which queue set to pick from (since 
> > their
> > reserved for the device doing the transmitting).  It seems more clear 
> > to me than
> > creating a new netdevice operation.  
>  See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
>  tx path"). The point is keep the tx path lockless to be efficient and
>  simplicity for management. And macvtap multiqueue was also implemented
>  with this assumption. The real contention should be done in the txq of
>  lower device instead of macvlan itself. This is also needed for
>  multiqueue macvtap.
> >>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
> >>> really buy us anything that I can see.  If a macvlan is using hardware
> >>> acceleration, it needs to arbitrate access to that hardware.  Weather 
> >>> thats done
> >>> by locking the lowerdev's tx queue lock or by enforcing locking on the 
> >>> macvlan
> >>> itself is equivalent.  The decision to use dfwd hardware acceleration is 
> >>> made on
> >>> open, so its not like theres any traffic that can avoid the lock, as it 
> >>> all goes
> >>> through the hardware.  All I see that this has bought us is an extra 
> >>> net_device
> >>> method (which isn't a big deal, but not necessecary as I see it).
> >> As I replied to patch 1/2, looking at the code itself again. The locking
> >> on the lowerdev's tx queue is really need since we need synchronize with
> >> other control path. Two examples are dev watchdog and ixgbe_down() both
> >> of which will try to hold tx lock to synchronize the with transmission.
> >> Without holding the lowerdev tx lock, we may have more serious issues.
> >> Also, it's a little strange for a net device has two modes. Future
> >> developers need to care about two different tx lock paths which is sub
> >> optimal.
> >>
> > Ok, having looked at this for a few hours, I agree, locking in the lowerdev 
> > has
> > some definiate advantages in plugging the holes you've pointed out.
> >
> >> For the issue of an extra net_device method,  if you don't like we can
> >> reuse the ndo_select_queue by also passing the accel_priv to that method.
> > I do, that actually simplifies things, since it lets us use the entire
> > dev_hard_start_xmit path unmodified, which gives us the locking your 
> > looking for
> > without having to create a new slimmed down variant of dev_hard_start_xmit.
> >
> > Regards
> > Neil
> 
> Right, will post V2.
> 
Thanks
Neil

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-09 Thread Jason Wang
On 01/08/2014 10:40 PM, Neil Horman wrote:
> On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
>> On 01/07/2014 09:17 PM, Neil Horman wrote:
>>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
 On 01/06/2014 08:42 PM, Neil Horman wrote:
> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
>> Currently, the tx queue were selected implicitly in 
>> ndo_dfwd_start_xmit(). The
>> will cause several issues:
>>
>> - NETIF_F_LLTX was forced for macvlan device in this case which lead 
>> extra lock
>>   contention.
>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
>> device
>>   watchdog
>> - dev_hard_start_xmit() does not check txq everywhere which will lead a 
>> crash
>>   when tso is disabled for lower device.
>>
>> Fix this by explicitly introducing a select queue method just for l2 
>> forwarding
>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to 
>> do the
>> queue selecting and transmitting for l2 forwarding.
>>
>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
>> no need
>> to check txq against NULL in dev_hard_start_xmit().
>>
>> In the future, it was also required for macvtap l2 forwarding support 
>> since it
>> provides a necessary synchronization method.
>>
>> Cc: John Fastabend 
>> Cc: Neil Horman 
>> Cc: e1000-de...@lists.sourceforge.net
>> Signed-off-by: Jason Wang 
> Instead of creating another operation here to do special queue selection, 
> why
> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
> argument
> list, so it can pass the txq it used back to the caller 
> (dev_hard_start_xmit)?
> ndo_dfwd_start_xmit already knows which queue set to pick from (since 
> their
> reserved for the device doing the transmitting).  It seems more clear to 
> me than
> creating a new netdevice operation.  
 See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
 tx path"). The point is keep the tx path lockless to be efficient and
 simplicity for management. And macvtap multiqueue was also implemented
 with this assumption. The real contention should be done in the txq of
 lower device instead of macvlan itself. This is also needed for
 multiqueue macvtap.
>>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
>>> really buy us anything that I can see.  If a macvlan is using hardware
>>> acceleration, it needs to arbitrate access to that hardware.  Weather thats 
>>> done
>>> by locking the lowerdev's tx queue lock or by enforcing locking on the 
>>> macvlan
>>> itself is equivalent.  The decision to use dfwd hardware acceleration is 
>>> made on
>>> open, so its not like theres any traffic that can avoid the lock, as it all 
>>> goes
>>> through the hardware.  All I see that this has bought us is an extra 
>>> net_device
>>> method (which isn't a big deal, but not necessecary as I see it).
>> As I replied to patch 1/2, looking at the code itself again. The locking
>> on the lowerdev's tx queue is really need since we need synchronize with
>> other control path. Two examples are dev watchdog and ixgbe_down() both
>> of which will try to hold tx lock to synchronize the with transmission.
>> Without holding the lowerdev tx lock, we may have more serious issues.
>> Also, it's a little strange for a net device has two modes. Future
>> developers need to care about two different tx lock paths which is sub
>> optimal.
>>
> Ok, having looked at this for a few hours, I agree, locking in the lowerdev 
> has
> some definiate advantages in plugging the holes you've pointed out.
>
>> For the issue of an extra net_device method,  if you don't like we can
>> reuse the ndo_select_queue by also passing the accel_priv to that method.
> I do, that actually simplifies things, since it lets us use the entire
> dev_hard_start_xmit path unmodified, which gives us the locking your looking 
> for
> without having to create a new slimmed down variant of dev_hard_start_xmit.
>
> Regards
> Neil

Right, will post V2.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-09 Thread Jason Wang
On 01/08/2014 10:40 PM, Neil Horman wrote:
 On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
 On 01/07/2014 09:17 PM, Neil Horman wrote:
 On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
 On 01/06/2014 08:42 PM, Neil Horman wrote:
 On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
 Currently, the tx queue were selected implicitly in 
 ndo_dfwd_start_xmit(). The
 will cause several issues:

 - NETIF_F_LLTX was forced for macvlan device in this case which lead 
 extra lock
   contention.
 - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
 device
   watchdog
 - dev_hard_start_xmit() does not check txq everywhere which will lead a 
 crash
   when tso is disabled for lower device.

 Fix this by explicitly introducing a select queue method just for l2 
 forwarding
 offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to 
 do the
 queue selecting and transmitting for l2 forwarding.

 With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
 no need
 to check txq against NULL in dev_hard_start_xmit().

 In the future, it was also required for macvtap l2 forwarding support 
 since it
 provides a necessary synchronization method.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Cc: e1000-de...@lists.sourceforge.net
 Signed-off-by: Jason Wang jasow...@redhat.com
 Instead of creating another operation here to do special queue selection, 
 why
 not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
 argument
 list, so it can pass the txq it used back to the caller 
 (dev_hard_start_xmit)?
 ndo_dfwd_start_xmit already knows which queue set to pick from (since 
 their
 reserved for the device doing the transmitting).  It seems more clear to 
 me than
 creating a new netdevice operation.  
 See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
 tx path). The point is keep the tx path lockless to be efficient and
 simplicity for management. And macvtap multiqueue was also implemented
 with this assumption. The real contention should be done in the txq of
 lower device instead of macvlan itself. This is also needed for
 multiqueue macvtap.
 Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
 really buy us anything that I can see.  If a macvlan is using hardware
 acceleration, it needs to arbitrate access to that hardware.  Weather thats 
 done
 by locking the lowerdev's tx queue lock or by enforcing locking on the 
 macvlan
 itself is equivalent.  The decision to use dfwd hardware acceleration is 
 made on
 open, so its not like theres any traffic that can avoid the lock, as it all 
 goes
 through the hardware.  All I see that this has bought us is an extra 
 net_device
 method (which isn't a big deal, but not necessecary as I see it).
 As I replied to patch 1/2, looking at the code itself again. The locking
 on the lowerdev's tx queue is really need since we need synchronize with
 other control path. Two examples are dev watchdog and ixgbe_down() both
 of which will try to hold tx lock to synchronize the with transmission.
 Without holding the lowerdev tx lock, we may have more serious issues.
 Also, it's a little strange for a net device has two modes. Future
 developers need to care about two different tx lock paths which is sub
 optimal.

 Ok, having looked at this for a few hours, I agree, locking in the lowerdev 
 has
 some definiate advantages in plugging the holes you've pointed out.

 For the issue of an extra net_device method,  if you don't like we can
 reuse the ndo_select_queue by also passing the accel_priv to that method.
 I do, that actually simplifies things, since it lets us use the entire
 dev_hard_start_xmit path unmodified, which gives us the locking your looking 
 for
 without having to create a new slimmed down variant of dev_hard_start_xmit.

 Regards
 Neil

Right, will post V2.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-09 Thread Neil Horman
On Thu, Jan 09, 2014 at 04:28:49PM +0800, Jason Wang wrote:
 On 01/08/2014 10:40 PM, Neil Horman wrote:
  On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
  On 01/07/2014 09:17 PM, Neil Horman wrote:
  On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
  On 01/06/2014 08:42 PM, Neil Horman wrote:
  On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
  Currently, the tx queue were selected implicitly in 
  ndo_dfwd_start_xmit(). The
  will cause several issues:
 
  - NETIF_F_LLTX was forced for macvlan device in this case which lead 
  extra lock
contention.
  - dev_hard_start_xmit() was called with NULL txq which bypasses the 
  net device
watchdog
  - dev_hard_start_xmit() does not check txq everywhere which will lead 
  a crash
when tso is disabled for lower device.
 
  Fix this by explicitly introducing a select queue method just for l2 
  forwarding
  offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to 
  do the
  queue selecting and transmitting for l2 forwarding.
 
  With this fixes, NETIF_F_LLTX could be preserved for macvlan and 
  there's no need
  to check txq against NULL in dev_hard_start_xmit().
 
  In the future, it was also required for macvtap l2 forwarding support 
  since it
  provides a necessary synchronization method.
 
  Cc: John Fastabend john.r.fastab...@intel.com
  Cc: Neil Horman nhor...@tuxdriver.com
  Cc: e1000-de...@lists.sourceforge.net
  Signed-off-by: Jason Wang jasow...@redhat.com
  Instead of creating another operation here to do special queue 
  selection, why
  not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
  argument
  list, so it can pass the txq it used back to the caller 
  (dev_hard_start_xmit)?
  ndo_dfwd_start_xmit already knows which queue set to pick from (since 
  their
  reserved for the device doing the transmitting).  It seems more clear 
  to me than
  creating a new netdevice operation.  
  See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
  tx path). The point is keep the tx path lockless to be efficient and
  simplicity for management. And macvtap multiqueue was also implemented
  with this assumption. The real contention should be done in the txq of
  lower device instead of macvlan itself. This is also needed for
  multiqueue macvtap.
  Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
  really buy us anything that I can see.  If a macvlan is using hardware
  acceleration, it needs to arbitrate access to that hardware.  Weather 
  thats done
  by locking the lowerdev's tx queue lock or by enforcing locking on the 
  macvlan
  itself is equivalent.  The decision to use dfwd hardware acceleration is 
  made on
  open, so its not like theres any traffic that can avoid the lock, as it 
  all goes
  through the hardware.  All I see that this has bought us is an extra 
  net_device
  method (which isn't a big deal, but not necessecary as I see it).
  As I replied to patch 1/2, looking at the code itself again. The locking
  on the lowerdev's tx queue is really need since we need synchronize with
  other control path. Two examples are dev watchdog and ixgbe_down() both
  of which will try to hold tx lock to synchronize the with transmission.
  Without holding the lowerdev tx lock, we may have more serious issues.
  Also, it's a little strange for a net device has two modes. Future
  developers need to care about two different tx lock paths which is sub
  optimal.
 
  Ok, having looked at this for a few hours, I agree, locking in the lowerdev 
  has
  some definiate advantages in plugging the holes you've pointed out.
 
  For the issue of an extra net_device method,  if you don't like we can
  reuse the ndo_select_queue by also passing the accel_priv to that method.
  I do, that actually simplifies things, since it lets us use the entire
  dev_hard_start_xmit path unmodified, which gives us the locking your 
  looking for
  without having to create a new slimmed down variant of dev_hard_start_xmit.
 
  Regards
  Neil
 
 Right, will post V2.
 
Thanks
Neil

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-08 Thread Neil Horman
On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
> On 01/07/2014 09:17 PM, Neil Horman wrote:
> > On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
> >> On 01/06/2014 08:42 PM, Neil Horman wrote:
> >>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
>  Currently, the tx queue were selected implicitly in 
>  ndo_dfwd_start_xmit(). The
>  will cause several issues:
> 
>  - NETIF_F_LLTX was forced for macvlan device in this case which lead 
>  extra lock
>    contention.
>  - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
>  device
>    watchdog
>  - dev_hard_start_xmit() does not check txq everywhere which will lead a 
>  crash
>    when tso is disabled for lower device.
> 
>  Fix this by explicitly introducing a select queue method just for l2 
>  forwarding
>  offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to 
>  do the
>  queue selecting and transmitting for l2 forwarding.
> 
>  With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
>  no need
>  to check txq against NULL in dev_hard_start_xmit().
> 
>  In the future, it was also required for macvtap l2 forwarding support 
>  since it
>  provides a necessary synchronization method.
> 
>  Cc: John Fastabend 
>  Cc: Neil Horman 
>  Cc: e1000-de...@lists.sourceforge.net
>  Signed-off-by: Jason Wang 
> >>> Instead of creating another operation here to do special queue selection, 
> >>> why
> >>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
> >>> argument
> >>> list, so it can pass the txq it used back to the caller 
> >>> (dev_hard_start_xmit)?
> >>> ndo_dfwd_start_xmit already knows which queue set to pick from (since 
> >>> their
> >>> reserved for the device doing the transmitting).  It seems more clear to 
> >>> me than
> >>> creating a new netdevice operation.  
> >> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
> >> tx path"). The point is keep the tx path lockless to be efficient and
> >> simplicity for management. And macvtap multiqueue was also implemented
> >> with this assumption. The real contention should be done in the txq of
> >> lower device instead of macvlan itself. This is also needed for
> >> multiqueue macvtap.
> > Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
> > really buy us anything that I can see.  If a macvlan is using hardware
> > acceleration, it needs to arbitrate access to that hardware.  Weather thats 
> > done
> > by locking the lowerdev's tx queue lock or by enforcing locking on the 
> > macvlan
> > itself is equivalent.  The decision to use dfwd hardware acceleration is 
> > made on
> > open, so its not like theres any traffic that can avoid the lock, as it all 
> > goes
> > through the hardware.  All I see that this has bought us is an extra 
> > net_device
> > method (which isn't a big deal, but not necessecary as I see it).
> 
> As I replied to patch 1/2, looking at the code itself again. The locking
> on the lowerdev's tx queue is really need since we need synchronize with
> other control path. Two examples are dev watchdog and ixgbe_down() both
> of which will try to hold tx lock to synchronize the with transmission.
> Without holding the lowerdev tx lock, we may have more serious issues.
> Also, it's a little strange for a net device has two modes. Future
> developers need to care about two different tx lock paths which is sub
> optimal.
> 

Ok, having looked at this for a few hours, I agree, locking in the lowerdev has
some definiate advantages in plugging the holes you've pointed out.

> For the issue of an extra net_device method,  if you don't like we can
> reuse the ndo_select_queue by also passing the accel_priv to that method.
I do, that actually simplifies things, since it lets us use the entire
dev_hard_start_xmit path unmodified, which gives us the locking your looking for
without having to create a new slimmed down variant of dev_hard_start_xmit.

Regards
Neil
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-08 Thread Neil Horman
On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote:
 On 01/07/2014 09:17 PM, Neil Horman wrote:
  On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
  On 01/06/2014 08:42 PM, Neil Horman wrote:
  On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
  Currently, the tx queue were selected implicitly in 
  ndo_dfwd_start_xmit(). The
  will cause several issues:
 
  - NETIF_F_LLTX was forced for macvlan device in this case which lead 
  extra lock
contention.
  - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
  device
watchdog
  - dev_hard_start_xmit() does not check txq everywhere which will lead a 
  crash
when tso is disabled for lower device.
 
  Fix this by explicitly introducing a select queue method just for l2 
  forwarding
  offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to 
  do the
  queue selecting and transmitting for l2 forwarding.
 
  With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
  no need
  to check txq against NULL in dev_hard_start_xmit().
 
  In the future, it was also required for macvtap l2 forwarding support 
  since it
  provides a necessary synchronization method.
 
  Cc: John Fastabend john.r.fastab...@intel.com
  Cc: Neil Horman nhor...@tuxdriver.com
  Cc: e1000-de...@lists.sourceforge.net
  Signed-off-by: Jason Wang jasow...@redhat.com
  Instead of creating another operation here to do special queue selection, 
  why
  not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
  argument
  list, so it can pass the txq it used back to the caller 
  (dev_hard_start_xmit)?
  ndo_dfwd_start_xmit already knows which queue set to pick from (since 
  their
  reserved for the device doing the transmitting).  It seems more clear to 
  me than
  creating a new netdevice operation.  
  See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
  tx path). The point is keep the tx path lockless to be efficient and
  simplicity for management. And macvtap multiqueue was also implemented
  with this assumption. The real contention should be done in the txq of
  lower device instead of macvlan itself. This is also needed for
  multiqueue macvtap.
  Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
  really buy us anything that I can see.  If a macvlan is using hardware
  acceleration, it needs to arbitrate access to that hardware.  Weather thats 
  done
  by locking the lowerdev's tx queue lock or by enforcing locking on the 
  macvlan
  itself is equivalent.  The decision to use dfwd hardware acceleration is 
  made on
  open, so its not like theres any traffic that can avoid the lock, as it all 
  goes
  through the hardware.  All I see that this has bought us is an extra 
  net_device
  method (which isn't a big deal, but not necessecary as I see it).
 
 As I replied to patch 1/2, looking at the code itself again. The locking
 on the lowerdev's tx queue is really need since we need synchronize with
 other control path. Two examples are dev watchdog and ixgbe_down() both
 of which will try to hold tx lock to synchronize the with transmission.
 Without holding the lowerdev tx lock, we may have more serious issues.
 Also, it's a little strange for a net device has two modes. Future
 developers need to care about two different tx lock paths which is sub
 optimal.
 

Ok, having looked at this for a few hours, I agree, locking in the lowerdev has
some definiate advantages in plugging the holes you've pointed out.

 For the issue of an extra net_device method,  if you don't like we can
 reuse the ndo_select_queue by also passing the accel_priv to that method.
I do, that actually simplifies things, since it lets us use the entire
dev_hard_start_xmit path unmodified, which gives us the locking your looking for
without having to create a new slimmed down variant of dev_hard_start_xmit.

Regards
Neil
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread Jason Wang
On 01/07/2014 09:17 PM, Neil Horman wrote:
> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
>> On 01/06/2014 08:42 PM, Neil Horman wrote:
>>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
 Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
 The
 will cause several issues:

 - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
 lock
   contention.
 - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
 device
   watchdog
 - dev_hard_start_xmit() does not check txq everywhere which will lead a 
 crash
   when tso is disabled for lower device.

 Fix this by explicitly introducing a select queue method just for l2 
 forwarding
 offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do 
 the
 queue selecting and transmitting for l2 forwarding.

 With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
 no need
 to check txq against NULL in dev_hard_start_xmit().

 In the future, it was also required for macvtap l2 forwarding support 
 since it
 provides a necessary synchronization method.

 Cc: John Fastabend 
 Cc: Neil Horman 
 Cc: e1000-de...@lists.sourceforge.net
 Signed-off-by: Jason Wang 
>>> Instead of creating another operation here to do special queue selection, 
>>> why
>>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
>>> argument
>>> list, so it can pass the txq it used back to the caller 
>>> (dev_hard_start_xmit)?
>>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their
>>> reserved for the device doing the transmitting).  It seems more clear to me 
>>> than
>>> creating a new netdevice operation.  
>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
>> tx path"). The point is keep the tx path lockless to be efficient and
>> simplicity for management. And macvtap multiqueue was also implemented
>> with this assumption. The real contention should be done in the txq of
>> lower device instead of macvlan itself. This is also needed for
>> multiqueue macvtap.
> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
> really buy us anything that I can see.  If a macvlan is using hardware
> acceleration, it needs to arbitrate access to that hardware.  Weather thats 
> done
> by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
> itself is equivalent.  The decision to use dfwd hardware acceleration is made 
> on
> open, so its not like theres any traffic that can avoid the lock, as it all 
> goes
> through the hardware.  All I see that this has bought us is an extra 
> net_device
> method (which isn't a big deal, but not necessecary as I see it).

As I replied to patch 1/2, looking at the code itself again. The locking
on the lowerdev's tx queue is really need since we need synchronize with
other control path. Two examples are dev watchdog and ixgbe_down() both
of which will try to hold tx lock to synchronize the with transmission.
Without holding the lowerdev tx lock, we may have more serious issues.
Also, it's a little strange for a net device has two modes. Future
developers need to care about two different tx lock paths which is sub
optimal.

For the issue of an extra net_device method,  if you don't like we can
reuse the ndo_select_queue by also passing the accel_priv to that method.
>
> It seems like the right solution here is to use ethtool to disable the dfwd
> acceleration feature on the hardware if you don't want to incur the additional
> locking.
>
>>> As for the crash issue, I'm not sure what you mean.  Where in
>>> dev_hard_start_xmit would we need to check txq that we're not currently, and
>>> what crash results?
>> Well, see current dev_hard_start_xmit(), if lower device does not
>> support tso or tso is disabled, in gso path:
>>
>> gso:
>> ...
>> txq_trans_update(txq);
>> if (unlikely(netif_xmit_stopped(txq) && skb->next))
>>
>> There's an obvious NULL pointer dereference.
> Yeah, I saw that after I wrote my note, sorry about that.  However, it doesn't
> change what I said above.  i don't think theres a need to add an additional
> select_queue method here, just add a pointer to a pointer arg to the dfwd xmit
> routine to fill in the queue that was selected on transmit.  That should 
> resolve
> all the potential NULL pointers without the need to ad additional methods to
> net_device_ops.
>
>>> Also, can you elaborate on what you mean by additional lock contention? 
>> If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the
>> lock of device itself must be held before doing transmission. In the
>> case, the macvlan txq lock contention is obvious unnecessary.
> Not in its current implementation.  The lowerdevice's NETIF_F_LLTX are ignored
> during transmit because we're using a privately 

Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread Neil Horman
On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
> On 01/06/2014 08:42 PM, Neil Horman wrote:
> > On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
> >> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
> >> The
> >> will cause several issues:
> >>
> >> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
> >> lock
> >>   contention.
> >> - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
> >> device
> >>   watchdog
> >> - dev_hard_start_xmit() does not check txq everywhere which will lead a 
> >> crash
> >>   when tso is disabled for lower device.
> >>
> >> Fix this by explicitly introducing a select queue method just for l2 
> >> forwarding
> >> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do 
> >> the
> >> queue selecting and transmitting for l2 forwarding.
> >>
> >> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
> >> no need
> >> to check txq against NULL in dev_hard_start_xmit().
> >>
> >> In the future, it was also required for macvtap l2 forwarding support 
> >> since it
> >> provides a necessary synchronization method.
> >>
> >> Cc: John Fastabend 
> >> Cc: Neil Horman 
> >> Cc: e1000-de...@lists.sourceforge.net
> >> Signed-off-by: Jason Wang 
> > Instead of creating another operation here to do special queue selection, 
> > why
> > not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
> > argument
> > list, so it can pass the txq it used back to the caller 
> > (dev_hard_start_xmit)?
> > ndo_dfwd_start_xmit already knows which queue set to pick from (since their
> > reserved for the device doing the transmitting).  It seems more clear to me 
> > than
> > creating a new netdevice operation.  
> 
> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
> tx path"). The point is keep the tx path lockless to be efficient and
> simplicity for management. And macvtap multiqueue was also implemented
> with this assumption. The real contention should be done in the txq of
> lower device instead of macvlan itself. This is also needed for
> multiqueue macvtap.
> >
Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
really buy us anything that I can see.  If a macvlan is using hardware
acceleration, it needs to arbitrate access to that hardware.  Weather thats done
by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
itself is equivalent.  The decision to use dfwd hardware acceleration is made on
open, so its not like theres any traffic that can avoid the lock, as it all goes
through the hardware.  All I see that this has bought us is an extra net_device
method (which isn't a big deal, but not necessecary as I see it).

It seems like the right solution here is to use ethtool to disable the dfwd
acceleration feature on the hardware if you don't want to incur the additional
locking.

> > As for the crash issue, I'm not sure what you mean.  Where in
> > dev_hard_start_xmit would we need to check txq that we're not currently, and
> > what crash results?
> 
> Well, see current dev_hard_start_xmit(), if lower device does not
> support tso or tso is disabled, in gso path:
> 
> gso:
> ...
> txq_trans_update(txq);
> if (unlikely(netif_xmit_stopped(txq) && skb->next))
> 
> There's an obvious NULL pointer dereference.
Yeah, I saw that after I wrote my note, sorry about that.  However, it doesn't
change what I said above.  i don't think theres a need to add an additional
select_queue method here, just add a pointer to a pointer arg to the dfwd xmit
routine to fill in the queue that was selected on transmit.  That should resolve
all the potential NULL pointers without the need to ad additional methods to
net_device_ops.

> >
> > Also, can you elaborate on what you mean by additional lock contention? 
> 
> If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the
> lock of device itself must be held before doing transmission. In the
> case, the macvlan txq lock contention is obvious unnecessary.
Not in its current implementation.  The lowerdevice's NETIF_F_LLTX are ignored
during transmit because we're using a privately allocated set of queues assigned
to the transmitting macvlan (i.e. there is not contention on the lowerdevice for
any single given macvlan).  Thats why we have to clear the NETIF_F_LLTX field on
the macvlan itself.  using dfwd acceleration is effectively giving a macvlan its
own hardware, that needs arbitration between every context thats trying to
transmit over it.


That said, I did just notice that if the macvlan has multiple queues we will
still use a single tx queue on transmit, we should map those queues to multiple
hardware queues.  I'll fix that shortly.

> >  What
> > contention do you see that goes above and beyond the normal locking 
> > required by
> > txq access?  
> 
> As I said above, the point is keeping the lockess tx path and make 

Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread John Fastabend

[...]



+int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev,
+ void *accel_priv)
+{
+struct netdev_queue *txq;
+int ret = NETDEV_TX_BUSY;
+int index;
+
+BUG_ON(!dev->netdev_ops->ndo_dfwd_select_queue);
+index =dev->netdev_ops->ndo_dfwd_select_queue(dev, skb,
+   accel_priv);
+
+local_bh_disable();
+
+skb_set_queue_mapping(skb, index);


How about replacing the index calculation and skb_set_queue_mapping with
netdev_pick_tx(). Then we don't need to add a new op and the existing
XPS, tx hash and select_queue() op works.



Sorry for the noise that was dumb it wouldn't work.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread John Fastabend

On 1/5/2014 7:21 PM, Jason Wang wrote:

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
   contention.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
   watchdog
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
   when tso is disabled for lower device.

Fix this by explicitly introducing a select queue method just for l2 forwarding
offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
queue selecting and transmitting for l2 forwarding.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit().

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend 
Cc: Neil Horman 
Cc: e1000-de...@lists.sourceforge.net
Signed-off-by: Jason Wang 
---


[...]


index 4fc1722..bc2b03f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2538,6 +2538,32 @@ static inline int skb_needs_linearize(struct sk_buff 
*skb,
!(features & NETIF_F_SG)));
  }

+int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev,
+void *accel_priv)
+{
+   struct netdev_queue *txq;
+   int ret = NETDEV_TX_BUSY;
+   int index;
+
+   BUG_ON(!dev->netdev_ops->ndo_dfwd_select_queue);
+   index = dev->netdev_ops->ndo_dfwd_select_queue(dev, skb,
+  accel_priv);
+
+   local_bh_disable();
+
+   skb_set_queue_mapping(skb, index);


How about replacing the index calculation and skb_set_queue_mapping with
netdev_pick_tx(). Then we don't need to add a new op and the existing
XPS, tx hash and select_queue() op works.


+   txq = netdev_get_tx_queue(dev, index);
+
+   HARD_TX_LOCK(dev, txq, smp_processor_id());
+   if (!netif_xmit_frozen_or_stopped(txq))
+   ret = dev_hard_start_xmit(skb, dev, txq, accel_priv);
+   HARD_TX_UNLOCK(dev, txq);
+
+   local_bh_enable();
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dfwd_direct_xmit);
+
  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq, void *accel_priv)
  {
@@ -2611,7 +2637,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct 
net_device *dev,
rc = ops->ndo_start_xmit(skb, dev);

trace_net_dev_xmit(skb, rc, dev, skb_len);
-   if (rc == NETDEV_TX_OK && txq)
+   if (rc == NETDEV_TX_OK)
txq_trans_update(txq);


Removing the check here rather than adding more checks in the gso case
as I suggested in the other thread seems cleaner.

Thanks!
John



return rc;
}



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread John Fastabend

On 1/5/2014 7:21 PM, Jason Wang wrote:

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
   contention.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
   watchdog
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
   when tso is disabled for lower device.

Fix this by explicitly introducing a select queue method just for l2 forwarding
offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
queue selecting and transmitting for l2 forwarding.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit().

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend john.r.fastab...@intel.com
Cc: Neil Horman nhor...@tuxdriver.com
Cc: e1000-de...@lists.sourceforge.net
Signed-off-by: Jason Wang jasow...@redhat.com
---


[...]


index 4fc1722..bc2b03f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2538,6 +2538,32 @@ static inline int skb_needs_linearize(struct sk_buff 
*skb,
!(features  NETIF_F_SG)));
  }

+int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev,
+void *accel_priv)
+{
+   struct netdev_queue *txq;
+   int ret = NETDEV_TX_BUSY;
+   int index;
+
+   BUG_ON(!dev-netdev_ops-ndo_dfwd_select_queue);
+   index = dev-netdev_ops-ndo_dfwd_select_queue(dev, skb,
+  accel_priv);
+
+   local_bh_disable();
+
+   skb_set_queue_mapping(skb, index);


How about replacing the index calculation and skb_set_queue_mapping with
netdev_pick_tx(). Then we don't need to add a new op and the existing
XPS, tx hash and select_queue() op works.


+   txq = netdev_get_tx_queue(dev, index);
+
+   HARD_TX_LOCK(dev, txq, smp_processor_id());
+   if (!netif_xmit_frozen_or_stopped(txq))
+   ret = dev_hard_start_xmit(skb, dev, txq, accel_priv);
+   HARD_TX_UNLOCK(dev, txq);
+
+   local_bh_enable();
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dfwd_direct_xmit);
+
  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq, void *accel_priv)
  {
@@ -2611,7 +2637,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct 
net_device *dev,
rc = ops-ndo_start_xmit(skb, dev);

trace_net_dev_xmit(skb, rc, dev, skb_len);
-   if (rc == NETDEV_TX_OK  txq)
+   if (rc == NETDEV_TX_OK)
txq_trans_update(txq);


Removing the check here rather than adding more checks in the gso case
as I suggested in the other thread seems cleaner.

Thanks!
John



return rc;
}



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread John Fastabend

[...]



+int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev,
+ void *accel_priv)
+{
+struct netdev_queue *txq;
+int ret = NETDEV_TX_BUSY;
+int index;
+
+BUG_ON(!dev-netdev_ops-ndo_dfwd_select_queue);
+index =dev-netdev_ops-ndo_dfwd_select_queue(dev, skb,
+   accel_priv);
+
+local_bh_disable();
+
+skb_set_queue_mapping(skb, index);


How about replacing the index calculation and skb_set_queue_mapping with
netdev_pick_tx(). Then we don't need to add a new op and the existing
XPS, tx hash and select_queue() op works.



Sorry for the noise that was dumb it wouldn't work.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread Neil Horman
On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
 On 01/06/2014 08:42 PM, Neil Horman wrote:
  On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
  Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
  The
  will cause several issues:
 
  - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
  lock
contention.
  - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
  device
watchdog
  - dev_hard_start_xmit() does not check txq everywhere which will lead a 
  crash
when tso is disabled for lower device.
 
  Fix this by explicitly introducing a select queue method just for l2 
  forwarding
  offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do 
  the
  queue selecting and transmitting for l2 forwarding.
 
  With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
  no need
  to check txq against NULL in dev_hard_start_xmit().
 
  In the future, it was also required for macvtap l2 forwarding support 
  since it
  provides a necessary synchronization method.
 
  Cc: John Fastabend john.r.fastab...@intel.com
  Cc: Neil Horman nhor...@tuxdriver.com
  Cc: e1000-de...@lists.sourceforge.net
  Signed-off-by: Jason Wang jasow...@redhat.com
  Instead of creating another operation here to do special queue selection, 
  why
  not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
  argument
  list, so it can pass the txq it used back to the caller 
  (dev_hard_start_xmit)?
  ndo_dfwd_start_xmit already knows which queue set to pick from (since their
  reserved for the device doing the transmitting).  It seems more clear to me 
  than
  creating a new netdevice operation.  
 
 See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
 tx path). The point is keep the tx path lockless to be efficient and
 simplicity for management. And macvtap multiqueue was also implemented
 with this assumption. The real contention should be done in the txq of
 lower device instead of macvlan itself. This is also needed for
 multiqueue macvtap.
 
Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
really buy us anything that I can see.  If a macvlan is using hardware
acceleration, it needs to arbitrate access to that hardware.  Weather thats done
by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
itself is equivalent.  The decision to use dfwd hardware acceleration is made on
open, so its not like theres any traffic that can avoid the lock, as it all goes
through the hardware.  All I see that this has bought us is an extra net_device
method (which isn't a big deal, but not necessecary as I see it).

It seems like the right solution here is to use ethtool to disable the dfwd
acceleration feature on the hardware if you don't want to incur the additional
locking.

  As for the crash issue, I'm not sure what you mean.  Where in
  dev_hard_start_xmit would we need to check txq that we're not currently, and
  what crash results?
 
 Well, see current dev_hard_start_xmit(), if lower device does not
 support tso or tso is disabled, in gso path:
 
 gso:
 ...
 txq_trans_update(txq);
 if (unlikely(netif_xmit_stopped(txq)  skb-next))
 
 There's an obvious NULL pointer dereference.
Yeah, I saw that after I wrote my note, sorry about that.  However, it doesn't
change what I said above.  i don't think theres a need to add an additional
select_queue method here, just add a pointer to a pointer arg to the dfwd xmit
routine to fill in the queue that was selected on transmit.  That should resolve
all the potential NULL pointers without the need to ad additional methods to
net_device_ops.

 
  Also, can you elaborate on what you mean by additional lock contention? 
 
 If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the
 lock of device itself must be held before doing transmission. In the
 case, the macvlan txq lock contention is obvious unnecessary.
Not in its current implementation.  The lowerdevice's NETIF_F_LLTX are ignored
during transmit because we're using a privately allocated set of queues assigned
to the transmitting macvlan (i.e. there is not contention on the lowerdevice for
any single given macvlan).  Thats why we have to clear the NETIF_F_LLTX field on
the macvlan itself.  using dfwd acceleration is effectively giving a macvlan its
own hardware, that needs arbitration between every context thats trying to
transmit over it.


That said, I did just notice that if the macvlan has multiple queues we will
still use a single tx queue on transmit, we should map those queues to multiple
hardware queues.  I'll fix that shortly.

   What
  contention do you see that goes above and beyond the normal locking 
  required by
  txq access?  
 
 As I said above, the point is keeping the lockess tx path and make the
 contention of txq for real device instead of macvlan itself.
But you've not done that.  You've 

Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-07 Thread Jason Wang
On 01/07/2014 09:17 PM, Neil Horman wrote:
 On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote:
 On 01/06/2014 08:42 PM, Neil Horman wrote:
 On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
 Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
 The
 will cause several issues:

 - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
 lock
   contention.
 - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
 device
   watchdog
 - dev_hard_start_xmit() does not check txq everywhere which will lead a 
 crash
   when tso is disabled for lower device.

 Fix this by explicitly introducing a select queue method just for l2 
 forwarding
 offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do 
 the
 queue selecting and transmitting for l2 forwarding.

 With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's 
 no need
 to check txq against NULL in dev_hard_start_xmit().

 In the future, it was also required for macvtap l2 forwarding support 
 since it
 provides a necessary synchronization method.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Cc: e1000-de...@lists.sourceforge.net
 Signed-off-by: Jason Wang jasow...@redhat.com
 Instead of creating another operation here to do special queue selection, 
 why
 not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
 argument
 list, so it can pass the txq it used back to the caller 
 (dev_hard_start_xmit)?
 ndo_dfwd_start_xmit already knows which queue set to pick from (since their
 reserved for the device doing the transmitting).  It seems more clear to me 
 than
 creating a new netdevice operation.  
 See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
 tx path). The point is keep the tx path lockless to be efficient and
 simplicity for management. And macvtap multiqueue was also implemented
 with this assumption. The real contention should be done in the txq of
 lower device instead of macvlan itself. This is also needed for
 multiqueue macvtap.
 Ok, I see how you're preserving LLTX here, and thats great, but it doesn't
 really buy us anything that I can see.  If a macvlan is using hardware
 acceleration, it needs to arbitrate access to that hardware.  Weather thats 
 done
 by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan
 itself is equivalent.  The decision to use dfwd hardware acceleration is made 
 on
 open, so its not like theres any traffic that can avoid the lock, as it all 
 goes
 through the hardware.  All I see that this has bought us is an extra 
 net_device
 method (which isn't a big deal, but not necessecary as I see it).

As I replied to patch 1/2, looking at the code itself again. The locking
on the lowerdev's tx queue is really need since we need synchronize with
other control path. Two examples are dev watchdog and ixgbe_down() both
of which will try to hold tx lock to synchronize the with transmission.
Without holding the lowerdev tx lock, we may have more serious issues.
Also, it's a little strange for a net device has two modes. Future
developers need to care about two different tx lock paths which is sub
optimal.

For the issue of an extra net_device method,  if you don't like we can
reuse the ndo_select_queue by also passing the accel_priv to that method.

 It seems like the right solution here is to use ethtool to disable the dfwd
 acceleration feature on the hardware if you don't want to incur the additional
 locking.

 As for the crash issue, I'm not sure what you mean.  Where in
 dev_hard_start_xmit would we need to check txq that we're not currently, and
 what crash results?
 Well, see current dev_hard_start_xmit(), if lower device does not
 support tso or tso is disabled, in gso path:

 gso:
 ...
 txq_trans_update(txq);
 if (unlikely(netif_xmit_stopped(txq)  skb-next))

 There's an obvious NULL pointer dereference.
 Yeah, I saw that after I wrote my note, sorry about that.  However, it doesn't
 change what I said above.  i don't think theres a need to add an additional
 select_queue method here, just add a pointer to a pointer arg to the dfwd xmit
 routine to fill in the queue that was selected on transmit.  That should 
 resolve
 all the potential NULL pointers without the need to ad additional methods to
 net_device_ops.

 Also, can you elaborate on what you mean by additional lock contention? 
 If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the
 lock of device itself must be held before doing transmission. In the
 case, the macvlan txq lock contention is obvious unnecessary.
 Not in its current implementation.  The lowerdevice's NETIF_F_LLTX are ignored
 during transmit because we're using a privately allocated set of queues 
 assigned
 to the transmitting macvlan (i.e. there is not contention on the lowerdevice 
 for
 any single given macvlan).  Thats why we have to clear the 

Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread Jason Wang
On 01/06/2014 08:42 PM, Neil Horman wrote:
> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
>> The
>> will cause several issues:
>>
>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
>> lock
>>   contention.
>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
>> device
>>   watchdog
>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
>>   when tso is disabled for lower device.
>>
>> Fix this by explicitly introducing a select queue method just for l2 
>> forwarding
>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
>> queue selecting and transmitting for l2 forwarding.
>>
>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no 
>> need
>> to check txq against NULL in dev_hard_start_xmit().
>>
>> In the future, it was also required for macvtap l2 forwarding support since 
>> it
>> provides a necessary synchronization method.
>>
>> Cc: John Fastabend 
>> Cc: Neil Horman 
>> Cc: e1000-de...@lists.sourceforge.net
>> Signed-off-by: Jason Wang 
> Instead of creating another operation here to do special queue selection, why
> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
> argument
> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
> ndo_dfwd_start_xmit already knows which queue set to pick from (since their
> reserved for the device doing the transmitting).  It seems more clear to me 
> than
> creating a new netdevice operation.  

See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
tx path"). The point is keep the tx path lockless to be efficient and
simplicity for management. And macvtap multiqueue was also implemented
with this assumption. The real contention should be done in the txq of
lower device instead of macvlan itself. This is also needed for
multiqueue macvtap.
>
> As for the crash issue, I'm not sure what you mean.  Where in
> dev_hard_start_xmit would we need to check txq that we're not currently, and
> what crash results?

Well, see current dev_hard_start_xmit(), if lower device does not
support tso or tso is disabled, in gso path:

gso:
...
txq_trans_update(txq);
if (unlikely(netif_xmit_stopped(txq) && skb->next))

There's an obvious NULL pointer dereference.
>
> Also, can you elaborate on what you mean by additional lock contention? 

If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the
lock of device itself must be held before doing transmission. In the
case, the macvlan txq lock contention is obvious unnecessary.
>  What
> contention do you see that goes above and beyond the normal locking required 
> by
> txq access?  

As I said above, the point is keeping the lockess tx path and make the
contention of txq for real device instead of macvlan itself.
> I suppose its extra locking above and beyond in the macvtap case,
> where you would otherwise never hit hardware, but that not the only use case,
> and I think the solution there is likely to add some code in the macvlan 
> feature
> set handler so that NETIF_F_LLTX is cleared if you disable the hardware
> forwarding acceleration via ethtool.
>
> Regards
> Neil
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread Neil Horman
On Mon, Jan 06, 2014 at 07:06:25AM -0800, John Fastabend wrote:
> On 01/06/2014 04:42 AM, Neil Horman wrote:
> >On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
> >>Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
> >>The
> >>will cause several issues:
> >>
> >>- NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
> >>lock
> >>   contention.
> >>- dev_hard_start_xmit() was called with NULL txq which bypasses the net 
> >>device
> >>   watchdog
> >>- dev_hard_start_xmit() does not check txq everywhere which will lead a 
> >>crash
> >>   when tso is disabled for lower device.
> >>
> >>Fix this by explicitly introducing a select queue method just for l2 
> >>forwarding
> >>offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do 
> >>the
> >>queue selecting and transmitting for l2 forwarding.
> >>
> >>With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no 
> >>need
> >>to check txq against NULL in dev_hard_start_xmit().
> >>
> >>In the future, it was also required for macvtap l2 forwarding support since 
> >>it
> >>provides a necessary synchronization method.
> >>
> >>Cc: John Fastabend 
> >>Cc: Neil Horman 
> >>Cc: e1000-de...@lists.sourceforge.net
> >>Signed-off-by: Jason Wang 
> >
> >Instead of creating another operation here to do special queue selection, why
> >not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
> >argument
> >list, so it can pass the txq it used back to the caller 
> >(dev_hard_start_xmit)?
> >ndo_dfwd_start_xmit already knows which queue set to pick from (since their
> >reserved for the device doing the transmitting).  It seems more clear to me 
> >than
> >creating a new netdevice operation.
> >
> >As for the crash issue, I'm not sure what you mean.  Where in
> >dev_hard_start_xmit would we need to check txq that we're not currently, and
> >what crash results?
> >
> >Also, can you elaborate on what you mean by additional lock contention?  What
> >contention do you see that goes above and beyond the normal locking required 
> >by
> >txq access?  I suppose its extra locking above and beyond in the macvtap 
> >case,
> >where you would otherwise never hit hardware, but that not the only use case,
> >and I think the solution there is likely to add some code in the macvlan 
> >feature
> >set handler so that NETIF_F_LLTX is cleared if you disable the hardware
> >forwarding acceleration via ethtool.
> >
> 
> NETIF_F_LLTX is cleared in macvlan_open() which should be used in the
> macvtap case.
> 
Thats right, since accelerated hardware tx queue doesn't participate in the
network stack queue locking, the upper device needs to do it.

Thanks!
Neil

>   if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
>   vlan->fwd_priv =
> 
> lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
> 
>   /* If we get a NULL pointer back, or if we get an error
>* then we should just fall through to the non
> accelerated path
>*/
>   if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
>   vlan->fwd_priv = NULL;
>   } else {
>   dev->features &= ~NETIF_F_LLTX;
>   return 0;
>   }
>   }
> 
> Thanks,
> John
> 
> 
> -- 
> John Fastabend Intel Corporation
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread John Fastabend

On 01/06/2014 04:42 AM, Neil Horman wrote:

On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
   contention.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
   watchdog
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
   when tso is disabled for lower device.

Fix this by explicitly introducing a select queue method just for l2 forwarding
offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
queue selecting and transmitting for l2 forwarding.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit().

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend 
Cc: Neil Horman 
Cc: e1000-de...@lists.sourceforge.net
Signed-off-by: Jason Wang 


Instead of creating another operation here to do special queue selection, why
not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
ndo_dfwd_start_xmit already knows which queue set to pick from (since their
reserved for the device doing the transmitting).  It seems more clear to me than
creating a new netdevice operation.

As for the crash issue, I'm not sure what you mean.  Where in
dev_hard_start_xmit would we need to check txq that we're not currently, and
what crash results?

Also, can you elaborate on what you mean by additional lock contention?  What
contention do you see that goes above and beyond the normal locking required by
txq access?  I suppose its extra locking above and beyond in the macvtap case,
where you would otherwise never hit hardware, but that not the only use case,
and I think the solution there is likely to add some code in the macvlan feature
set handler so that NETIF_F_LLTX is cleared if you disable the hardware
forwarding acceleration via ethtool.



NETIF_F_LLTX is cleared in macvlan_open() which should be used in the
macvtap case.

  if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
  vlan->fwd_priv =

lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);

  /* If we get a NULL pointer back, or if we get an error
   * then we should just fall through to the non 
accelerated path

   */
  if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
  vlan->fwd_priv = NULL;
  } else {
  dev->features &= ~NETIF_F_LLTX;
  return 0;
  }
  }

Thanks,
John


--
John Fastabend Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread Neil Horman
On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
> will cause several issues:
> 
> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
> lock
>   contention.
> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
>   watchdog
> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
>   when tso is disabled for lower device.
> 
> Fix this by explicitly introducing a select queue method just for l2 
> forwarding
> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
> queue selecting and transmitting for l2 forwarding.
> 
> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no 
> need
> to check txq against NULL in dev_hard_start_xmit().
> 
> In the future, it was also required for macvtap l2 forwarding support since it
> provides a necessary synchronization method.
> 
> Cc: John Fastabend 
> Cc: Neil Horman 
> Cc: e1000-de...@lists.sourceforge.net
> Signed-off-by: Jason Wang 

Instead of creating another operation here to do special queue selection, why
not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
ndo_dfwd_start_xmit already knows which queue set to pick from (since their
reserved for the device doing the transmitting).  It seems more clear to me than
creating a new netdevice operation.  

As for the crash issue, I'm not sure what you mean.  Where in
dev_hard_start_xmit would we need to check txq that we're not currently, and
what crash results?

Also, can you elaborate on what you mean by additional lock contention?  What
contention do you see that goes above and beyond the normal locking required by
txq access?  I suppose its extra locking above and beyond in the macvtap case,
where you would otherwise never hit hardware, but that not the only use case,
and I think the solution there is likely to add some code in the macvlan feature
set handler so that NETIF_F_LLTX is cleared if you disable the hardware
forwarding acceleration via ethtool.

Regards
Neil

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread Neil Horman
On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
 Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
 will cause several issues:
 
 - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
 lock
   contention.
 - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
   watchdog
 - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
   when tso is disabled for lower device.
 
 Fix this by explicitly introducing a select queue method just for l2 
 forwarding
 offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
 queue selecting and transmitting for l2 forwarding.
 
 With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no 
 need
 to check txq against NULL in dev_hard_start_xmit().
 
 In the future, it was also required for macvtap l2 forwarding support since it
 provides a necessary synchronization method.
 
 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Cc: e1000-de...@lists.sourceforge.net
 Signed-off-by: Jason Wang jasow...@redhat.com

Instead of creating another operation here to do special queue selection, why
not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
ndo_dfwd_start_xmit already knows which queue set to pick from (since their
reserved for the device doing the transmitting).  It seems more clear to me than
creating a new netdevice operation.  

As for the crash issue, I'm not sure what you mean.  Where in
dev_hard_start_xmit would we need to check txq that we're not currently, and
what crash results?

Also, can you elaborate on what you mean by additional lock contention?  What
contention do you see that goes above and beyond the normal locking required by
txq access?  I suppose its extra locking above and beyond in the macvtap case,
where you would otherwise never hit hardware, but that not the only use case,
and I think the solution there is likely to add some code in the macvlan feature
set handler so that NETIF_F_LLTX is cleared if you disable the hardware
forwarding acceleration via ethtool.

Regards
Neil

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread John Fastabend

On 01/06/2014 04:42 AM, Neil Horman wrote:

On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock
   contention.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
   watchdog
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
   when tso is disabled for lower device.

Fix this by explicitly introducing a select queue method just for l2 forwarding
offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
queue selecting and transmitting for l2 forwarding.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit().

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend john.r.fastab...@intel.com
Cc: Neil Horman nhor...@tuxdriver.com
Cc: e1000-de...@lists.sourceforge.net
Signed-off-by: Jason Wang jasow...@redhat.com


Instead of creating another operation here to do special queue selection, why
not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument
list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
ndo_dfwd_start_xmit already knows which queue set to pick from (since their
reserved for the device doing the transmitting).  It seems more clear to me than
creating a new netdevice operation.

As for the crash issue, I'm not sure what you mean.  Where in
dev_hard_start_xmit would we need to check txq that we're not currently, and
what crash results?

Also, can you elaborate on what you mean by additional lock contention?  What
contention do you see that goes above and beyond the normal locking required by
txq access?  I suppose its extra locking above and beyond in the macvtap case,
where you would otherwise never hit hardware, but that not the only use case,
and I think the solution there is likely to add some code in the macvlan feature
set handler so that NETIF_F_LLTX is cleared if you disable the hardware
forwarding acceleration via ethtool.



NETIF_F_LLTX is cleared in macvlan_open() which should be used in the
macvtap case.

  if (lowerdev-features  NETIF_F_HW_L2FW_DOFFLOAD) {
  vlan-fwd_priv =

lowerdev-netdev_ops-ndo_dfwd_add_station(lowerdev, dev);

  /* If we get a NULL pointer back, or if we get an error
   * then we should just fall through to the non 
accelerated path

   */
  if (IS_ERR_OR_NULL(vlan-fwd_priv)) {
  vlan-fwd_priv = NULL;
  } else {
  dev-features = ~NETIF_F_LLTX;
  return 0;
  }
  }

Thanks,
John


--
John Fastabend Intel Corporation
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread Neil Horman
On Mon, Jan 06, 2014 at 07:06:25AM -0800, John Fastabend wrote:
 On 01/06/2014 04:42 AM, Neil Horman wrote:
 On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
 Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
 The
 will cause several issues:
 
 - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
 lock
contention.
 - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
 device
watchdog
 - dev_hard_start_xmit() does not check txq everywhere which will lead a 
 crash
when tso is disabled for lower device.
 
 Fix this by explicitly introducing a select queue method just for l2 
 forwarding
 offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do 
 the
 queue selecting and transmitting for l2 forwarding.
 
 With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no 
 need
 to check txq against NULL in dev_hard_start_xmit().
 
 In the future, it was also required for macvtap l2 forwarding support since 
 it
 provides a necessary synchronization method.
 
 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Cc: e1000-de...@lists.sourceforge.net
 Signed-off-by: Jason Wang jasow...@redhat.com
 
 Instead of creating another operation here to do special queue selection, why
 not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
 argument
 list, so it can pass the txq it used back to the caller 
 (dev_hard_start_xmit)?
 ndo_dfwd_start_xmit already knows which queue set to pick from (since their
 reserved for the device doing the transmitting).  It seems more clear to me 
 than
 creating a new netdevice operation.
 
 As for the crash issue, I'm not sure what you mean.  Where in
 dev_hard_start_xmit would we need to check txq that we're not currently, and
 what crash results?
 
 Also, can you elaborate on what you mean by additional lock contention?  What
 contention do you see that goes above and beyond the normal locking required 
 by
 txq access?  I suppose its extra locking above and beyond in the macvtap 
 case,
 where you would otherwise never hit hardware, but that not the only use case,
 and I think the solution there is likely to add some code in the macvlan 
 feature
 set handler so that NETIF_F_LLTX is cleared if you disable the hardware
 forwarding acceleration via ethtool.
 
 
 NETIF_F_LLTX is cleared in macvlan_open() which should be used in the
 macvtap case.
 
Thats right, since accelerated hardware tx queue doesn't participate in the
network stack queue locking, the upper device needs to do it.

Thanks!
Neil

   if (lowerdev-features  NETIF_F_HW_L2FW_DOFFLOAD) {
   vlan-fwd_priv =
 
 lowerdev-netdev_ops-ndo_dfwd_add_station(lowerdev, dev);
 
   /* If we get a NULL pointer back, or if we get an error
* then we should just fall through to the non
 accelerated path
*/
   if (IS_ERR_OR_NULL(vlan-fwd_priv)) {
   vlan-fwd_priv = NULL;
   } else {
   dev-features = ~NETIF_F_LLTX;
   return 0;
   }
   }
 
 Thanks,
 John
 
 
 -- 
 John Fastabend Intel Corporation
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 2/2] net: core: explicitly select a txq before doing l2 forwarding

2014-01-06 Thread Jason Wang
On 01/06/2014 08:42 PM, Neil Horman wrote:
 On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote:
 Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). 
 The
 will cause several issues:

 - NETIF_F_LLTX was forced for macvlan device in this case which lead extra 
 lock
   contention.
 - dev_hard_start_xmit() was called with NULL txq which bypasses the net 
 device
   watchdog
 - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
   when tso is disabled for lower device.

 Fix this by explicitly introducing a select queue method just for l2 
 forwarding
 offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the
 queue selecting and transmitting for l2 forwarding.

 With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no 
 need
 to check txq against NULL in dev_hard_start_xmit().

 In the future, it was also required for macvtap l2 forwarding support since 
 it
 provides a necessary synchronization method.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Cc: e1000-de...@lists.sourceforge.net
 Signed-off-by: Jason Wang jasow...@redhat.com
 Instead of creating another operation here to do special queue selection, why
 not just have ndo_dfwd_start_xmit include a pointer to a pointer in its 
 argument
 list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)?
 ndo_dfwd_start_xmit already knows which queue set to pick from (since their
 reserved for the device doing the transmitting).  It seems more clear to me 
 than
 creating a new netdevice operation.  

See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
tx path). The point is keep the tx path lockless to be efficient and
simplicity for management. And macvtap multiqueue was also implemented
with this assumption. The real contention should be done in the txq of
lower device instead of macvlan itself. This is also needed for
multiqueue macvtap.

 As for the crash issue, I'm not sure what you mean.  Where in
 dev_hard_start_xmit would we need to check txq that we're not currently, and
 what crash results?

Well, see current dev_hard_start_xmit(), if lower device does not
support tso or tso is disabled, in gso path:

gso:
...
txq_trans_update(txq);
if (unlikely(netif_xmit_stopped(txq)  skb-next))

There's an obvious NULL pointer dereference.

 Also, can you elaborate on what you mean by additional lock contention? 

If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the
lock of device itself must be held before doing transmission. In the
case, the macvlan txq lock contention is obvious unnecessary.
  What
 contention do you see that goes above and beyond the normal locking required 
 by
 txq access?  

As I said above, the point is keeping the lockess tx path and make the
contention of txq for real device instead of macvlan itself.
 I suppose its extra locking above and beyond in the macvtap case,
 where you would otherwise never hit hardware, but that not the only use case,
 and I think the solution there is likely to add some code in the macvlan 
 feature
 set handler so that NETIF_F_LLTX is cleared if you disable the hardware
 forwarding acceleration via ethtool.

 Regards
 Neil


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/