Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-10 Thread Vlad Yasevich
On 01/10/2014 02:06 AM, Jason Wang wrote:
> On 01/10/2014 05:39 AM, Stephen Hemminger wrote:
>> On Thu, 09 Jan 2014 16:55:07 +0800
>> Jason Wang  wrote:
>>
>>> What if use do want a qdisc and want to change the its queue length for
>>> tun/macvlan? And the the name tx_queue_length is misleading. For tun it
>>> may make sense since it was used in transmission path. For macvtap it
>>> was not. So maybe what we need is just a new ioctl for both tun/macvtap
>>> and a new feature flag. If user create the device with new feature flag,
>>> the socket receive queue length could be changed by ioctl instead of
>>> dev->tx_queue_length. If not, the old behaviour could be kept.
>> The overloading of tx_queue_len in macvtap was the original design mistake.
>> Can't this just be undone and expose rx_queue_len as sysfs attribute?
> 
> That works. But we current allow user to change the socket sndbuf
> through TUNSNDBUF. Maybe we need a similar one for receive.
> 

That would make sense.  Since the user interacts with tun fd almost as a
socket and there is actually a socket hiding in the kernel, it almost
begs for actual SO_SNDBUF/SO_RCVBUF support :)

-vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-10 Thread Vlad Yasevich
On 01/10/2014 02:06 AM, Jason Wang wrote:
 On 01/10/2014 05:39 AM, Stephen Hemminger wrote:
 On Thu, 09 Jan 2014 16:55:07 +0800
 Jason Wang jasow...@redhat.com wrote:

 What if use do want a qdisc and want to change the its queue length for
 tun/macvlan? And the the name tx_queue_length is misleading. For tun it
 may make sense since it was used in transmission path. For macvtap it
 was not. So maybe what we need is just a new ioctl for both tun/macvtap
 and a new feature flag. If user create the device with new feature flag,
 the socket receive queue length could be changed by ioctl instead of
 dev-tx_queue_length. If not, the old behaviour could be kept.
 The overloading of tx_queue_len in macvtap was the original design mistake.
 Can't this just be undone and expose rx_queue_len as sysfs attribute?
 
 That works. But we current allow user to change the socket sndbuf
 through TUNSNDBUF. Maybe we need a similar one for receive.
 

That would make sense.  Since the user interacts with tun fd almost as a
socket and there is actually a socket hiding in the kernel, it almost
begs for actual SO_SNDBUF/SO_RCVBUF support :)

-vlad
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Jason Wang
On 01/10/2014 05:39 AM, Stephen Hemminger wrote:
> On Thu, 09 Jan 2014 16:55:07 +0800
> Jason Wang  wrote:
>
>> What if use do want a qdisc and want to change the its queue length for
>> tun/macvlan? And the the name tx_queue_length is misleading. For tun it
>> may make sense since it was used in transmission path. For macvtap it
>> was not. So maybe what we need is just a new ioctl for both tun/macvtap
>> and a new feature flag. If user create the device with new feature flag,
>> the socket receive queue length could be changed by ioctl instead of
>> dev->tx_queue_length. If not, the old behaviour could be kept.
> The overloading of tx_queue_len in macvtap was the original design mistake.
> Can't this just be undone and expose rx_queue_len as sysfs attribute?

That works. But we current allow user to change the socket sndbuf
through TUNSNDBUF. Maybe we need a similar one for receive.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Stephen Hemminger
On Fri, 10 Jan 2014 00:03:23 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Jan 09, 2014 at 01:39:08PM -0800, Stephen Hemminger wrote:
> > On Thu, 09 Jan 2014 16:55:07 +0800
> > Jason Wang  wrote:
> > 
> > > What if use do want a qdisc and want to change the its queue length for
> > > tun/macvlan? And the the name tx_queue_length is misleading. For tun it
> > > may make sense since it was used in transmission path. For macvtap it
> > > was not. So maybe what we need is just a new ioctl for both tun/macvtap
> > > and a new feature flag. If user create the device with new feature flag,
> > > the socket receive queue length could be changed by ioctl instead of
> > > dev->tx_queue_length. If not, the old behaviour could be kept.
> > 
> > The overloading of tx_queue_len in macvtap was the original design mistake.
> > Can't this just be undone and expose rx_queue_len as sysfs attribute?
> 
> Yes but we need to avoid breaking user-visible ABI.

I think in this case, it was a mistake and hasn't been around long enough
to cause serious damage.

> So I think we'll need to catch any access attempts and redirect them to
> the new rx_queue_len.  I posted a patch like this using new
> ndo_set_tx_queue_len/ndo_get_tx_queue_len.  Have you seen it? What do
> you think?

It encourages others to do/make the same mistake so I don't like it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Michael S. Tsirkin
On Thu, Jan 09, 2014 at 01:39:08PM -0800, Stephen Hemminger wrote:
> On Thu, 09 Jan 2014 16:55:07 +0800
> Jason Wang  wrote:
> 
> > What if use do want a qdisc and want to change the its queue length for
> > tun/macvlan? And the the name tx_queue_length is misleading. For tun it
> > may make sense since it was used in transmission path. For macvtap it
> > was not. So maybe what we need is just a new ioctl for both tun/macvtap
> > and a new feature flag. If user create the device with new feature flag,
> > the socket receive queue length could be changed by ioctl instead of
> > dev->tx_queue_length. If not, the old behaviour could be kept.
> 
> The overloading of tx_queue_len in macvtap was the original design mistake.
> Can't this just be undone and expose rx_queue_len as sysfs attribute?

Yes but we need to avoid breaking user-visible ABI.
So I think we'll need to catch any access attempts and redirect them to
the new rx_queue_len.  I posted a patch like this using new
ndo_set_tx_queue_len/ndo_get_tx_queue_len.  Have you seen it? What do
you think?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Stephen Hemminger
On Thu, 09 Jan 2014 16:55:07 +0800
Jason Wang  wrote:

> What if use do want a qdisc and want to change the its queue length for
> tun/macvlan? And the the name tx_queue_length is misleading. For tun it
> may make sense since it was used in transmission path. For macvtap it
> was not. So maybe what we need is just a new ioctl for both tun/macvtap
> and a new feature flag. If user create the device with new feature flag,
> the socket receive queue length could be changed by ioctl instead of
> dev->tx_queue_length. If not, the old behaviour could be kept.

The overloading of tx_queue_len in macvtap was the original design mistake.
Can't this just be undone and expose rx_queue_len as sysfs attribute?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Jason Wang
On 01/09/2014 03:17 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 08, 2014 at 11:05:42AM -0800, John Fastabend wrote:
>> [...]
>>
> OK I think I'm finally putting all the pieces together thanks.
>
> Do you know why macvtap is setting dev->tx_queue_len by default? If you
> zero this then the noqueue_qdisc is used and the q->enqueue check in
> dev_queue_xmit will fail.
 It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
 ("macvtap: Limit packet queue length") to limit the length of socket
 receive queue of macvtap. But I'm not sure whether the qdisc is a
 byproduct of this commit, maybe we can switch to use another name
 instead of just reuse dev->tx_queue_length.
>>> You mean tx_queue_len really, right?
>>>
>>> Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
>>> so if someone uses these to control or check the # of packets that
>>> can be queued by device, this will break.
>>>
>>> How about adding ndo_set_tx_queue_len then?
>>>
>>> At some point we wanted to decouple queue length from tx_queue_length
>>> for tun as well, so that would be benefitial there as well.
>> On the receive side we need to limit the receive queue and the
>> dev->tx_queue_len value was used to do this.
>>
>> However on the tx side when dev->tx_queue_len is set the default
>> qdisc pfifo_fast or mq is used depending on if there is multiqueue
>> or not. Unless the user specifies some numtxqueues when creating
>> the macvtap device then it will be a single queue device and use
>> the pfifo_fast qdisc.
>>
>> This is the default behaviour users could zero txqueuelen when
>> they create the device because it is a stacked device with
>> NETIF_F_LLTX using the lower devices qdisc makes sense but this
>> would appear (from code inspection) to break macvtap_forward()?
>>
>> if (skb_queue_len(>sk.sk_receive_queue) >= dev->tx_queue_len)
>> goto drop;
>>
>> I'm not sure any of this is a problem other than its a bit
>> confusing to overload tx_queue_len for the rx_queue_len but the
>> precedent has been there for sometime.
> So how about ndo ops to access tx_queue_len then?
> This way we can set tx_queue_len to 0 and use some
> other field to store the rx_queue_len without users noticing.
> Along the lines of the below (it's a partial patch just
> to give you the idea):
>
> diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
> index 5b7d0e1..e526b46 100644
> --- a/net/core/dev_ioctl.c
> +++ b/net/core/dev_ioctl.c
> @@ -167,7 +167,10 @@ static int dev_ifsioc_locked(struct net *net, struct 
> ifreq *ifr, unsigned int cm
>   return 0;
>  
>   case SIOCGIFTXQLEN:
> - ifr->ifr_qlen = dev->tx_queue_len;
> + if (dev->netdev_ops->ndo_get_tx_queue_len)
> + ifr->ifr_qlen = 
> dev->netdev_ops->ndo_get_tx_queue_len(dev);
> + else
> + ifr->ifr_qlen = dev->tx_queue_len;
>   return 0;
>  
>   default:
> @@ -296,7 +299,10 @@ static int dev_ifsioc(struct net *net, struct ifreq 
> *ifr, unsigned int cmd)
>   case SIOCSIFTXQLEN:
>   if (ifr->ifr_qlen < 0)
>   return -EINVAL;
> - dev->tx_queue_len = ifr->ifr_qlen;
> + if (dev->netdev_ops->ndo_set_tx_queue_len)
> + dev->netdev_ops->ndo_set_tx_queue_len(dev, 
> ifr->ifr_qlen);
> + else
> + dev->tx_queue_len = ifr->ifr_qlen;
>   return 0;
>  
>   case SIOCSIFNAME:
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index d954b56..f2fd9d5 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -280,10 +280,31 @@ NETDEVICE_SHOW_RW(flags, fmt_hex);
>  
>  static int change_tx_queue_len(struct net_device *net, unsigned long new_len)
>  {
> - net->tx_queue_len = new_len;
> + if (dev->netdev_ops->ndo_set_tx_queue_len)
> + dev->netdev_ops->ndo_set_tx_queue_len(dev, new_len);
> + else
> + dev->tx_queue_len = new_len;
>   return 0;
>  }
>  
> +static ssize_t format_tx_queue_len(const struct net_device *net, char *buf)
> +{
> + unsigned long tx_queue_len;
> +
> + if (dev->netdev_ops->ndo_get_tx_queue_len)
> + tx_queue_len = dev->netdev_ops->ndo_get_tx_queue_len(dev);
> + else
> + tx_queue_len = dev->tx_queue_len;
> +
> + return sprintf(buf, fmt_ulong, tx_queue_len);
> +}
> +
> +static ssize_t tx_queue_len_show(struct device *dev,
> +  struct device_attribute *attr, char *buf)
> +{
> + return netdev_show(dev, attr, buf, format_tx_queue_len);
> +}
> +
>  static ssize_t tx_queue_len_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> @@ -293,7 +314,7 @@ static ssize_t tx_queue_len_store(struct device *dev,
>  
>   return netdev_store(dev, attr, buf, len, 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Jason Wang
On 01/10/2014 05:39 AM, Stephen Hemminger wrote:
 On Thu, 09 Jan 2014 16:55:07 +0800
 Jason Wang jasow...@redhat.com wrote:

 What if use do want a qdisc and want to change the its queue length for
 tun/macvlan? And the the name tx_queue_length is misleading. For tun it
 may make sense since it was used in transmission path. For macvtap it
 was not. So maybe what we need is just a new ioctl for both tun/macvtap
 and a new feature flag. If user create the device with new feature flag,
 the socket receive queue length could be changed by ioctl instead of
 dev-tx_queue_length. If not, the old behaviour could be kept.
 The overloading of tx_queue_len in macvtap was the original design mistake.
 Can't this just be undone and expose rx_queue_len as sysfs attribute?

That works. But we current allow user to change the socket sndbuf
through TUNSNDBUF. Maybe we need a similar one for receive.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Jason Wang
On 01/09/2014 03:17 PM, Michael S. Tsirkin wrote:
 On Wed, Jan 08, 2014 at 11:05:42AM -0800, John Fastabend wrote:
 [...]

 OK I think I'm finally putting all the pieces together thanks.

 Do you know why macvtap is setting dev-tx_queue_len by default? If you
 zero this then the noqueue_qdisc is used and the q-enqueue check in
 dev_queue_xmit will fail.
 It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
 (macvtap: Limit packet queue length) to limit the length of socket
 receive queue of macvtap. But I'm not sure whether the qdisc is a
 byproduct of this commit, maybe we can switch to use another name
 instead of just reuse dev-tx_queue_length.
 You mean tx_queue_len really, right?

 Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
 so if someone uses these to control or check the # of packets that
 can be queued by device, this will break.

 How about adding ndo_set_tx_queue_len then?

 At some point we wanted to decouple queue length from tx_queue_length
 for tun as well, so that would be benefitial there as well.
 On the receive side we need to limit the receive queue and the
 dev-tx_queue_len value was used to do this.

 However on the tx side when dev-tx_queue_len is set the default
 qdisc pfifo_fast or mq is used depending on if there is multiqueue
 or not. Unless the user specifies some numtxqueues when creating
 the macvtap device then it will be a single queue device and use
 the pfifo_fast qdisc.

 This is the default behaviour users could zero txqueuelen when
 they create the device because it is a stacked device with
 NETIF_F_LLTX using the lower devices qdisc makes sense but this
 would appear (from code inspection) to break macvtap_forward()?

 if (skb_queue_len(q-sk.sk_receive_queue) = dev-tx_queue_len)
 goto drop;

 I'm not sure any of this is a problem other than its a bit
 confusing to overload tx_queue_len for the rx_queue_len but the
 precedent has been there for sometime.
 So how about ndo ops to access tx_queue_len then?
 This way we can set tx_queue_len to 0 and use some
 other field to store the rx_queue_len without users noticing.
 Along the lines of the below (it's a partial patch just
 to give you the idea):

 diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
 index 5b7d0e1..e526b46 100644
 --- a/net/core/dev_ioctl.c
 +++ b/net/core/dev_ioctl.c
 @@ -167,7 +167,10 @@ static int dev_ifsioc_locked(struct net *net, struct 
 ifreq *ifr, unsigned int cm
   return 0;
  
   case SIOCGIFTXQLEN:
 - ifr-ifr_qlen = dev-tx_queue_len;
 + if (dev-netdev_ops-ndo_get_tx_queue_len)
 + ifr-ifr_qlen = 
 dev-netdev_ops-ndo_get_tx_queue_len(dev);
 + else
 + ifr-ifr_qlen = dev-tx_queue_len;
   return 0;
  
   default:
 @@ -296,7 +299,10 @@ static int dev_ifsioc(struct net *net, struct ifreq 
 *ifr, unsigned int cmd)
   case SIOCSIFTXQLEN:
   if (ifr-ifr_qlen  0)
   return -EINVAL;
 - dev-tx_queue_len = ifr-ifr_qlen;
 + if (dev-netdev_ops-ndo_set_tx_queue_len)
 + dev-netdev_ops-ndo_set_tx_queue_len(dev, 
 ifr-ifr_qlen);
 + else
 + dev-tx_queue_len = ifr-ifr_qlen;
   return 0;
  
   case SIOCSIFNAME:
 diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
 index d954b56..f2fd9d5 100644
 --- a/net/core/net-sysfs.c
 +++ b/net/core/net-sysfs.c
 @@ -280,10 +280,31 @@ NETDEVICE_SHOW_RW(flags, fmt_hex);
  
  static int change_tx_queue_len(struct net_device *net, unsigned long new_len)
  {
 - net-tx_queue_len = new_len;
 + if (dev-netdev_ops-ndo_set_tx_queue_len)
 + dev-netdev_ops-ndo_set_tx_queue_len(dev, new_len);
 + else
 + dev-tx_queue_len = new_len;
   return 0;
  }
  
 +static ssize_t format_tx_queue_len(const struct net_device *net, char *buf)
 +{
 + unsigned long tx_queue_len;
 +
 + if (dev-netdev_ops-ndo_get_tx_queue_len)
 + tx_queue_len = dev-netdev_ops-ndo_get_tx_queue_len(dev);
 + else
 + tx_queue_len = dev-tx_queue_len;
 +
 + return sprintf(buf, fmt_ulong, tx_queue_len);
 +}
 +
 +static ssize_t tx_queue_len_show(struct device *dev,
 +  struct device_attribute *attr, char *buf)
 +{
 + return netdev_show(dev, attr, buf, format_tx_queue_len);
 +}
 +
  static ssize_t tx_queue_len_store(struct device *dev,
 struct device_attribute *attr,
 const char *buf, size_t len)
 @@ -293,7 +314,7 @@ static ssize_t tx_queue_len_store(struct device *dev,
  
   return netdev_store(dev, attr, buf, len, change_tx_queue_len);
  }
 -NETDEVICE_SHOW_RW(tx_queue_len, fmt_ulong);
 +DEVICE_ATTR_RW(tx_queue_len);
  
  static ssize_t ifalias_store(struct device *dev, struct device_attribute 
 *attr,
const char *buf, 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Stephen Hemminger
On Thu, 09 Jan 2014 16:55:07 +0800
Jason Wang jasow...@redhat.com wrote:

 What if use do want a qdisc and want to change the its queue length for
 tun/macvlan? And the the name tx_queue_length is misleading. For tun it
 may make sense since it was used in transmission path. For macvtap it
 was not. So maybe what we need is just a new ioctl for both tun/macvtap
 and a new feature flag. If user create the device with new feature flag,
 the socket receive queue length could be changed by ioctl instead of
 dev-tx_queue_length. If not, the old behaviour could be kept.

The overloading of tx_queue_len in macvtap was the original design mistake.
Can't this just be undone and expose rx_queue_len as sysfs attribute?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Michael S. Tsirkin
On Thu, Jan 09, 2014 at 01:39:08PM -0800, Stephen Hemminger wrote:
 On Thu, 09 Jan 2014 16:55:07 +0800
 Jason Wang jasow...@redhat.com wrote:
 
  What if use do want a qdisc and want to change the its queue length for
  tun/macvlan? And the the name tx_queue_length is misleading. For tun it
  may make sense since it was used in transmission path. For macvtap it
  was not. So maybe what we need is just a new ioctl for both tun/macvtap
  and a new feature flag. If user create the device with new feature flag,
  the socket receive queue length could be changed by ioctl instead of
  dev-tx_queue_length. If not, the old behaviour could be kept.
 
 The overloading of tx_queue_len in macvtap was the original design mistake.
 Can't this just be undone and expose rx_queue_len as sysfs attribute?

Yes but we need to avoid breaking user-visible ABI.
So I think we'll need to catch any access attempts and redirect them to
the new rx_queue_len.  I posted a patch like this using new
ndo_set_tx_queue_len/ndo_get_tx_queue_len.  Have you seen it? What do
you think?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-09 Thread Stephen Hemminger
On Fri, 10 Jan 2014 00:03:23 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Jan 09, 2014 at 01:39:08PM -0800, Stephen Hemminger wrote:
  On Thu, 09 Jan 2014 16:55:07 +0800
  Jason Wang jasow...@redhat.com wrote:
  
   What if use do want a qdisc and want to change the its queue length for
   tun/macvlan? And the the name tx_queue_length is misleading. For tun it
   may make sense since it was used in transmission path. For macvtap it
   was not. So maybe what we need is just a new ioctl for both tun/macvtap
   and a new feature flag. If user create the device with new feature flag,
   the socket receive queue length could be changed by ioctl instead of
   dev-tx_queue_length. If not, the old behaviour could be kept.
  
  The overloading of tx_queue_len in macvtap was the original design mistake.
  Can't this just be undone and expose rx_queue_len as sysfs attribute?
 
 Yes but we need to avoid breaking user-visible ABI.

I think in this case, it was a mistake and hasn't been around long enough
to cause serious damage.

 So I think we'll need to catch any access attempts and redirect them to
 the new rx_queue_len.  I posted a patch like this using new
 ndo_set_tx_queue_len/ndo_get_tx_queue_len.  Have you seen it? What do
 you think?

It encourages others to do/make the same mistake so I don't like it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-08 Thread Michael S. Tsirkin
On Wed, Jan 08, 2014 at 11:05:42AM -0800, John Fastabend wrote:
> [...]
> 
> >>>OK I think I'm finally putting all the pieces together thanks.
> >>>
> >>>Do you know why macvtap is setting dev->tx_queue_len by default? If you
> >>>zero this then the noqueue_qdisc is used and the q->enqueue check in
> >>>dev_queue_xmit will fail.
> >>
> >>It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
> >>("macvtap: Limit packet queue length") to limit the length of socket
> >>receive queue of macvtap. But I'm not sure whether the qdisc is a
> >>byproduct of this commit, maybe we can switch to use another name
> >>instead of just reuse dev->tx_queue_length.
> >
> >You mean tx_queue_len really, right?
> >
> >Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
> >so if someone uses these to control or check the # of packets that
> >can be queued by device, this will break.
> >
> >How about adding ndo_set_tx_queue_len then?
> >
> >At some point we wanted to decouple queue length from tx_queue_length
> >for tun as well, so that would be benefitial there as well.
> 
> On the receive side we need to limit the receive queue and the
> dev->tx_queue_len value was used to do this.
> 
> However on the tx side when dev->tx_queue_len is set the default
> qdisc pfifo_fast or mq is used depending on if there is multiqueue
> or not. Unless the user specifies some numtxqueues when creating
> the macvtap device then it will be a single queue device and use
> the pfifo_fast qdisc.
> 
> This is the default behaviour users could zero txqueuelen when
> they create the device because it is a stacked device with
> NETIF_F_LLTX using the lower devices qdisc makes sense but this
> would appear (from code inspection) to break macvtap_forward()?
> 
> if (skb_queue_len(>sk.sk_receive_queue) >= dev->tx_queue_len)
> goto drop;
> 
> I'm not sure any of this is a problem other than its a bit
> confusing to overload tx_queue_len for the rx_queue_len but the
> precedent has been there for sometime.

So how about ndo ops to access tx_queue_len then?
This way we can set tx_queue_len to 0 and use some
other field to store the rx_queue_len without users noticing.
Along the lines of the below (it's a partial patch just
to give you the idea):

diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 5b7d0e1..e526b46 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -167,7 +167,10 @@ static int dev_ifsioc_locked(struct net *net, struct ifreq 
*ifr, unsigned int cm
return 0;
 
case SIOCGIFTXQLEN:
-   ifr->ifr_qlen = dev->tx_queue_len;
+   if (dev->netdev_ops->ndo_get_tx_queue_len)
+   ifr->ifr_qlen = 
dev->netdev_ops->ndo_get_tx_queue_len(dev);
+   else
+   ifr->ifr_qlen = dev->tx_queue_len;
return 0;
 
default:
@@ -296,7 +299,10 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, 
unsigned int cmd)
case SIOCSIFTXQLEN:
if (ifr->ifr_qlen < 0)
return -EINVAL;
-   dev->tx_queue_len = ifr->ifr_qlen;
+   if (dev->netdev_ops->ndo_set_tx_queue_len)
+   dev->netdev_ops->ndo_set_tx_queue_len(dev, 
ifr->ifr_qlen);
+   else
+   dev->tx_queue_len = ifr->ifr_qlen;
return 0;
 
case SIOCSIFNAME:
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index d954b56..f2fd9d5 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -280,10 +280,31 @@ NETDEVICE_SHOW_RW(flags, fmt_hex);
 
 static int change_tx_queue_len(struct net_device *net, unsigned long new_len)
 {
-   net->tx_queue_len = new_len;
+   if (dev->netdev_ops->ndo_set_tx_queue_len)
+   dev->netdev_ops->ndo_set_tx_queue_len(dev, new_len);
+   else
+   dev->tx_queue_len = new_len;
return 0;
 }
 
+static ssize_t format_tx_queue_len(const struct net_device *net, char *buf)
+{
+   unsigned long tx_queue_len;
+
+   if (dev->netdev_ops->ndo_get_tx_queue_len)
+   tx_queue_len = dev->netdev_ops->ndo_get_tx_queue_len(dev);
+   else
+   tx_queue_len = dev->tx_queue_len;
+
+   return sprintf(buf, fmt_ulong, tx_queue_len);
+}
+
+static ssize_t tx_queue_len_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   return netdev_show(dev, attr, buf, format_tx_queue_len);
+}
+
 static ssize_t tx_queue_len_store(struct device *dev,
  struct device_attribute *attr,
  const char *buf, size_t len)
@@ -293,7 +314,7 @@ static ssize_t tx_queue_len_store(struct device *dev,
 
return netdev_store(dev, attr, buf, len, change_tx_queue_len);
 }
-NETDEVICE_SHOW_RW(tx_queue_len, fmt_ulong);
+DEVICE_ATTR_RW(tx_queue_len);
 
 static ssize_t ifalias_store(struct device *dev, 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-08 Thread John Fastabend

[...]


OK I think I'm finally putting all the pieces together thanks.

Do you know why macvtap is setting dev->tx_queue_len by default? If you
zero this then the noqueue_qdisc is used and the q->enqueue check in
dev_queue_xmit will fail.


It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
("macvtap: Limit packet queue length") to limit the length of socket
receive queue of macvtap. But I'm not sure whether the qdisc is a
byproduct of this commit, maybe we can switch to use another name
instead of just reuse dev->tx_queue_length.


You mean tx_queue_len really, right?

Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
so if someone uses these to control or check the # of packets that
can be queued by device, this will break.

How about adding ndo_set_tx_queue_len then?

At some point we wanted to decouple queue length from tx_queue_length
for tun as well, so that would be benefitial there as well.


On the receive side we need to limit the receive queue and the
dev->tx_queue_len value was used to do this.

However on the tx side when dev->tx_queue_len is set the default
qdisc pfifo_fast or mq is used depending on if there is multiqueue
or not. Unless the user specifies some numtxqueues when creating
the macvtap device then it will be a single queue device and use
the pfifo_fast qdisc.

This is the default behaviour users could zero txqueuelen when
they create the device because it is a stacked device with
NETIF_F_LLTX using the lower devices qdisc makes sense but this
would appear (from code inspection) to break macvtap_forward()?

if (skb_queue_len(>sk.sk_receive_queue) >= dev->tx_queue_len)
goto drop;

I'm not sure any of this is a problem other than its a bit
confusing to overload tx_queue_len for the rx_queue_len but the
precedent has been there for sometime. It is a change in behaviour
though in net-next. Previously dev_queue_xmit() was not used so
the qdisc layer was never hit on the macvtap device. Now with
dev_queue_xmit() if the user attaches multiple macvlan queues to a
single qdisc queue this should still work but wont be optimal. Ideally
the user should create as many qdisc queues at creation time via
numtxqueues as macvtap queues will be used during runtime so that there
is a 1:1 mapping or use some heuristic to avoid cases where there
is a many to 1 mapping.

From my perspective it would be OK to push this configuration/policy
to the management layer. After all it is the entity that knows how
to distribute system resources amongst the objects running over the
macvtap devices. The relevance for me is I defaulted the l2 offloaded
macvlans to single queue devices and wanted to note in net-next this
is the same policy used in the non-offloaded case.

Bit long-winded there.

Thanks,
John
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-08 Thread Michael S. Tsirkin
On Tue, Jan 07, 2014 at 05:00:29PM +0800, Jason Wang wrote:
> On 01/07/2014 03:26 PM, John Fastabend wrote:
> > [...]
> >
>  Unfortunately not. This commit has a side effect that it in fact
>  disables the multiqueue macvtap transmission. Since all macvtap queues
>  will contend on a single qdisc lock.
> 
> >>>
> >>> They will only contend on a single qdisc lock if the lower device has
> >>> 1 queue.
> >>
> >> I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
> >
> > Yes.
> >
> >>
> >> The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
> >> was called for macvlan device itself. So even if lower device has
> >> multiple txqs, if you just create a one queue macvlan device, you will
> >> get lock contention on macvlan device. And even if you explicitly
> >> specifying the txq numbers ( though I don't believe most management
> >> software will do this) when creating the macvlan/macvtap device, you
> >> must also configure the XPS for macvlan to make sure it has the
> >> possibility of using multiple transmit queues.
> >>
> >
> > OK I think I'm finally putting all the pieces together thanks.
> >
> > Do you know why macvtap is setting dev->tx_queue_len by default? If you
> > zero this then the noqueue_qdisc is used and the q->enqueue check in
> > dev_queue_xmit will fail.
> 
> It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
> ("macvtap: Limit packet queue length") to limit the length of socket
> receive queue of macvtap. But I'm not sure whether the qdisc is a
> byproduct of this commit, maybe we can switch to use another name
> instead of just reuse dev->tx_queue_length.

You mean tx_queue_len really, right?

Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
so if someone uses these to control or check the # of packets that
can be queued by device, this will break.

How about adding ndo_set_tx_queue_len then?

At some point we wanted to decouple queue length from tx_queue_length
for tun as well, so that would be benefitial there as well.


> >
> > Also if XPS is not configured then skb_tx_hash is used so multiple
> > transmit queues will still be used.
> >
> 
> True.
> >>> Perhaps defaulting the L2 forwarding devices to 1queue was a
> >>> mistake. But the same issue arises when running macvtap over a
> >>> non-multiqueue nic. Or even if you have a multiqueue device and create
> >>> many more macvtap queues than the lower device has queues.
> >>>
> >>> Shouldn't the macvtap configuration take into account the lowest level
> >>> devices queues?
> >>
> >> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
> >> tx path"). It allows the management to create a device without worrying
> >> the underlying device.
> >
> > OK.
> >
> >>> How does using the L2 forwarding device change the
> >>> contention issues? Without the L2 forwarding LLTX is enabled but the
> >>> qdisc lock, etc is still acquired on the device below the macvlan.
> >>>
> >>
> >> That's the point. We need make sure the txq selection and qdisc lock
> >> were done for the lower device not for the macvlan device itself. Then
> >> macvlan can automatically benefit from the multi-queue capable lower
> >> devices. But L2 forwarding needs to contend on the txq lock on macvlan
> >> device itself, which is unnecessary and can complex the management.
> >
> > If I make the l2 forwarding defaults a bit better then using the L2
> > forwarding case should not be any more complex. And because the queues
> > are dedicated to the macvtap device any contention from qdisc lock, etc
> > comes from the upper device only. 
> 
> At very least the txq of lower device should be held in order to be
> synchronized with management path. Consider txq lock were often held by
> netif_tx_disable() before trying to down the card. Current cold does not
> hold txq lock, so it loses the synchronization which may cause issues.
> And the code also does not check whether the txq has been stopped before
> trying to start the transmission.
> 
> 
> > Also if I get the bandwidth controls
> > in we can set the max/min bandwidth per macvtap device this way. That
> > is future work though.
> >
> 
> That will be a nice feature.
> >>> The ixgbe driver as it is currently written can be configured for up to
> >>> 4 queues by setting numtxqueues when the device is created. I assume
> >>> when creating macvtap queues the user needs to account for the number
> >>> of queues supported by the lower device.
> >>>
> >>
> >> We'd better not complicate the task of management, lockless tx path work
> >> very well so we can just keep it. Btw, there's no way for the user to
> >> know the maximum number of queues that L2 forwarding supports.
> >
> > Good point I'll add an attribute to query it.
> >
>  For L2 forwarding offload itself, more issues need to be addressed for
>  multiqueue macvtap:
> 
>  - ndo_dfwd_add_station() can only create queues per device at
>  ndo_open,
> 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-08 Thread Michael S. Tsirkin
On Tue, Jan 07, 2014 at 05:00:29PM +0800, Jason Wang wrote:
 On 01/07/2014 03:26 PM, John Fastabend wrote:
  [...]
 
  Unfortunately not. This commit has a side effect that it in fact
  disables the multiqueue macvtap transmission. Since all macvtap queues
  will contend on a single qdisc lock.
 
 
  They will only contend on a single qdisc lock if the lower device has
  1 queue.
 
  I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
 
  Yes.
 
 
  The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
  was called for macvlan device itself. So even if lower device has
  multiple txqs, if you just create a one queue macvlan device, you will
  get lock contention on macvlan device. And even if you explicitly
  specifying the txq numbers ( though I don't believe most management
  software will do this) when creating the macvlan/macvtap device, you
  must also configure the XPS for macvlan to make sure it has the
  possibility of using multiple transmit queues.
 
 
  OK I think I'm finally putting all the pieces together thanks.
 
  Do you know why macvtap is setting dev-tx_queue_len by default? If you
  zero this then the noqueue_qdisc is used and the q-enqueue check in
  dev_queue_xmit will fail.
 
 It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
 (macvtap: Limit packet queue length) to limit the length of socket
 receive queue of macvtap. But I'm not sure whether the qdisc is a
 byproduct of this commit, maybe we can switch to use another name
 instead of just reuse dev-tx_queue_length.

You mean tx_queue_len really, right?

Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
so if someone uses these to control or check the # of packets that
can be queued by device, this will break.

How about adding ndo_set_tx_queue_len then?

At some point we wanted to decouple queue length from tx_queue_length
for tun as well, so that would be benefitial there as well.


 
  Also if XPS is not configured then skb_tx_hash is used so multiple
  transmit queues will still be used.
 
 
 True.
  Perhaps defaulting the L2 forwarding devices to 1queue was a
  mistake. But the same issue arises when running macvtap over a
  non-multiqueue nic. Or even if you have a multiqueue device and create
  many more macvtap queues than the lower device has queues.
 
  Shouldn't the macvtap configuration take into account the lowest level
  devices queues?
 
  See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
  tx path). It allows the management to create a device without worrying
  the underlying device.
 
  OK.
 
  How does using the L2 forwarding device change the
  contention issues? Without the L2 forwarding LLTX is enabled but the
  qdisc lock, etc is still acquired on the device below the macvlan.
 
 
  That's the point. We need make sure the txq selection and qdisc lock
  were done for the lower device not for the macvlan device itself. Then
  macvlan can automatically benefit from the multi-queue capable lower
  devices. But L2 forwarding needs to contend on the txq lock on macvlan
  device itself, which is unnecessary and can complex the management.
 
  If I make the l2 forwarding defaults a bit better then using the L2
  forwarding case should not be any more complex. And because the queues
  are dedicated to the macvtap device any contention from qdisc lock, etc
  comes from the upper device only. 
 
 At very least the txq of lower device should be held in order to be
 synchronized with management path. Consider txq lock were often held by
 netif_tx_disable() before trying to down the card. Current cold does not
 hold txq lock, so it loses the synchronization which may cause issues.
 And the code also does not check whether the txq has been stopped before
 trying to start the transmission.
 
 
  Also if I get the bandwidth controls
  in we can set the max/min bandwidth per macvtap device this way. That
  is future work though.
 
 
 That will be a nice feature.
  The ixgbe driver as it is currently written can be configured for up to
  4 queues by setting numtxqueues when the device is created. I assume
  when creating macvtap queues the user needs to account for the number
  of queues supported by the lower device.
 
 
  We'd better not complicate the task of management, lockless tx path work
  very well so we can just keep it. Btw, there's no way for the user to
  know the maximum number of queues that L2 forwarding supports.
 
  Good point I'll add an attribute to query it.
 
  For L2 forwarding offload itself, more issues need to be addressed for
  multiqueue macvtap:
 
  - ndo_dfwd_add_station() can only create queues per device at
  ndo_open,
  but multiqueue macvtap allows user to create and destroy queues at
  their
  will and at any time.
 
  same argument as above, isn't this the same when running macvtap
  without
  the l2 offloads over a real device? I expect you hit the same
  contention
  points when running over a 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-08 Thread John Fastabend

[...]


OK I think I'm finally putting all the pieces together thanks.

Do you know why macvtap is setting dev-tx_queue_len by default? If you
zero this then the noqueue_qdisc is used and the q-enqueue check in
dev_queue_xmit will fail.


It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
(macvtap: Limit packet queue length) to limit the length of socket
receive queue of macvtap. But I'm not sure whether the qdisc is a
byproduct of this commit, maybe we can switch to use another name
instead of just reuse dev-tx_queue_length.


You mean tx_queue_len really, right?

Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
so if someone uses these to control or check the # of packets that
can be queued by device, this will break.

How about adding ndo_set_tx_queue_len then?

At some point we wanted to decouple queue length from tx_queue_length
for tun as well, so that would be benefitial there as well.


On the receive side we need to limit the receive queue and the
dev-tx_queue_len value was used to do this.

However on the tx side when dev-tx_queue_len is set the default
qdisc pfifo_fast or mq is used depending on if there is multiqueue
or not. Unless the user specifies some numtxqueues when creating
the macvtap device then it will be a single queue device and use
the pfifo_fast qdisc.

This is the default behaviour users could zero txqueuelen when
they create the device because it is a stacked device with
NETIF_F_LLTX using the lower devices qdisc makes sense but this
would appear (from code inspection) to break macvtap_forward()?

if (skb_queue_len(q-sk.sk_receive_queue) = dev-tx_queue_len)
goto drop;

I'm not sure any of this is a problem other than its a bit
confusing to overload tx_queue_len for the rx_queue_len but the
precedent has been there for sometime. It is a change in behaviour
though in net-next. Previously dev_queue_xmit() was not used so
the qdisc layer was never hit on the macvtap device. Now with
dev_queue_xmit() if the user attaches multiple macvlan queues to a
single qdisc queue this should still work but wont be optimal. Ideally
the user should create as many qdisc queues at creation time via
numtxqueues as macvtap queues will be used during runtime so that there
is a 1:1 mapping or use some heuristic to avoid cases where there
is a many to 1 mapping.

From my perspective it would be OK to push this configuration/policy
to the management layer. After all it is the entity that knows how
to distribute system resources amongst the objects running over the
macvtap devices. The relevance for me is I defaulted the l2 offloaded
macvlans to single queue devices and wanted to note in net-next this
is the same policy used in the non-offloaded case.

Bit long-winded there.

Thanks,
John
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-08 Thread Michael S. Tsirkin
On Wed, Jan 08, 2014 at 11:05:42AM -0800, John Fastabend wrote:
 [...]
 
 OK I think I'm finally putting all the pieces together thanks.
 
 Do you know why macvtap is setting dev-tx_queue_len by default? If you
 zero this then the noqueue_qdisc is used and the q-enqueue check in
 dev_queue_xmit will fail.
 
 It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
 (macvtap: Limit packet queue length) to limit the length of socket
 receive queue of macvtap. But I'm not sure whether the qdisc is a
 byproduct of this commit, maybe we can switch to use another name
 instead of just reuse dev-tx_queue_length.
 
 You mean tx_queue_len really, right?
 
 Problem is tx_queue_len can be accessed using netlink sysfs or ioctl,
 so if someone uses these to control or check the # of packets that
 can be queued by device, this will break.
 
 How about adding ndo_set_tx_queue_len then?
 
 At some point we wanted to decouple queue length from tx_queue_length
 for tun as well, so that would be benefitial there as well.
 
 On the receive side we need to limit the receive queue and the
 dev-tx_queue_len value was used to do this.
 
 However on the tx side when dev-tx_queue_len is set the default
 qdisc pfifo_fast or mq is used depending on if there is multiqueue
 or not. Unless the user specifies some numtxqueues when creating
 the macvtap device then it will be a single queue device and use
 the pfifo_fast qdisc.
 
 This is the default behaviour users could zero txqueuelen when
 they create the device because it is a stacked device with
 NETIF_F_LLTX using the lower devices qdisc makes sense but this
 would appear (from code inspection) to break macvtap_forward()?
 
 if (skb_queue_len(q-sk.sk_receive_queue) = dev-tx_queue_len)
 goto drop;
 
 I'm not sure any of this is a problem other than its a bit
 confusing to overload tx_queue_len for the rx_queue_len but the
 precedent has been there for sometime.

So how about ndo ops to access tx_queue_len then?
This way we can set tx_queue_len to 0 and use some
other field to store the rx_queue_len without users noticing.
Along the lines of the below (it's a partial patch just
to give you the idea):

diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 5b7d0e1..e526b46 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -167,7 +167,10 @@ static int dev_ifsioc_locked(struct net *net, struct ifreq 
*ifr, unsigned int cm
return 0;
 
case SIOCGIFTXQLEN:
-   ifr-ifr_qlen = dev-tx_queue_len;
+   if (dev-netdev_ops-ndo_get_tx_queue_len)
+   ifr-ifr_qlen = 
dev-netdev_ops-ndo_get_tx_queue_len(dev);
+   else
+   ifr-ifr_qlen = dev-tx_queue_len;
return 0;
 
default:
@@ -296,7 +299,10 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, 
unsigned int cmd)
case SIOCSIFTXQLEN:
if (ifr-ifr_qlen  0)
return -EINVAL;
-   dev-tx_queue_len = ifr-ifr_qlen;
+   if (dev-netdev_ops-ndo_set_tx_queue_len)
+   dev-netdev_ops-ndo_set_tx_queue_len(dev, 
ifr-ifr_qlen);
+   else
+   dev-tx_queue_len = ifr-ifr_qlen;
return 0;
 
case SIOCSIFNAME:
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index d954b56..f2fd9d5 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -280,10 +280,31 @@ NETDEVICE_SHOW_RW(flags, fmt_hex);
 
 static int change_tx_queue_len(struct net_device *net, unsigned long new_len)
 {
-   net-tx_queue_len = new_len;
+   if (dev-netdev_ops-ndo_set_tx_queue_len)
+   dev-netdev_ops-ndo_set_tx_queue_len(dev, new_len);
+   else
+   dev-tx_queue_len = new_len;
return 0;
 }
 
+static ssize_t format_tx_queue_len(const struct net_device *net, char *buf)
+{
+   unsigned long tx_queue_len;
+
+   if (dev-netdev_ops-ndo_get_tx_queue_len)
+   tx_queue_len = dev-netdev_ops-ndo_get_tx_queue_len(dev);
+   else
+   tx_queue_len = dev-tx_queue_len;
+
+   return sprintf(buf, fmt_ulong, tx_queue_len);
+}
+
+static ssize_t tx_queue_len_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   return netdev_show(dev, attr, buf, format_tx_queue_len);
+}
+
 static ssize_t tx_queue_len_store(struct device *dev,
  struct device_attribute *attr,
  const char *buf, size_t len)
@@ -293,7 +314,7 @@ static ssize_t tx_queue_len_store(struct device *dev,
 
return netdev_store(dev, attr, buf, len, change_tx_queue_len);
 }
-NETDEVICE_SHOW_RW(tx_queue_len, fmt_ulong);
+DEVICE_ATTR_RW(tx_queue_len);
 
 static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 const char *buf, size_t len)
diff --git a/net/core/rtnetlink.c 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-07 Thread Jason Wang
On 01/07/2014 03:26 PM, John Fastabend wrote:
> [...]
>
 Unfortunately not. This commit has a side effect that it in fact
 disables the multiqueue macvtap transmission. Since all macvtap queues
 will contend on a single qdisc lock.

>>>
>>> They will only contend on a single qdisc lock if the lower device has
>>> 1 queue.
>>
>> I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
>
> Yes.
>
>>
>> The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
>> was called for macvlan device itself. So even if lower device has
>> multiple txqs, if you just create a one queue macvlan device, you will
>> get lock contention on macvlan device. And even if you explicitly
>> specifying the txq numbers ( though I don't believe most management
>> software will do this) when creating the macvlan/macvtap device, you
>> must also configure the XPS for macvlan to make sure it has the
>> possibility of using multiple transmit queues.
>>
>
> OK I think I'm finally putting all the pieces together thanks.
>
> Do you know why macvtap is setting dev->tx_queue_len by default? If you
> zero this then the noqueue_qdisc is used and the q->enqueue check in
> dev_queue_xmit will fail.

It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
("macvtap: Limit packet queue length") to limit the length of socket
receive queue of macvtap. But I'm not sure whether the qdisc is a
byproduct of this commit, maybe we can switch to use another name
instead of just reuse dev->tx_queue_length.
>
> Also if XPS is not configured then skb_tx_hash is used so multiple
> transmit queues will still be used.
>

True.
>>> Perhaps defaulting the L2 forwarding devices to 1queue was a
>>> mistake. But the same issue arises when running macvtap over a
>>> non-multiqueue nic. Or even if you have a multiqueue device and create
>>> many more macvtap queues than the lower device has queues.
>>>
>>> Shouldn't the macvtap configuration take into account the lowest level
>>> devices queues?
>>
>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
>> tx path"). It allows the management to create a device without worrying
>> the underlying device.
>
> OK.
>
>>> How does using the L2 forwarding device change the
>>> contention issues? Without the L2 forwarding LLTX is enabled but the
>>> qdisc lock, etc is still acquired on the device below the macvlan.
>>>
>>
>> That's the point. We need make sure the txq selection and qdisc lock
>> were done for the lower device not for the macvlan device itself. Then
>> macvlan can automatically benefit from the multi-queue capable lower
>> devices. But L2 forwarding needs to contend on the txq lock on macvlan
>> device itself, which is unnecessary and can complex the management.
>
> If I make the l2 forwarding defaults a bit better then using the L2
> forwarding case should not be any more complex. And because the queues
> are dedicated to the macvtap device any contention from qdisc lock, etc
> comes from the upper device only. 

At very least the txq of lower device should be held in order to be
synchronized with management path. Consider txq lock were often held by
netif_tx_disable() before trying to down the card. Current cold does not
hold txq lock, so it loses the synchronization which may cause issues.
And the code also does not check whether the txq has been stopped before
trying to start the transmission.


> Also if I get the bandwidth controls
> in we can set the max/min bandwidth per macvtap device this way. That
> is future work though.
>

That will be a nice feature.
>>> The ixgbe driver as it is currently written can be configured for up to
>>> 4 queues by setting numtxqueues when the device is created. I assume
>>> when creating macvtap queues the user needs to account for the number
>>> of queues supported by the lower device.
>>>
>>
>> We'd better not complicate the task of management, lockless tx path work
>> very well so we can just keep it. Btw, there's no way for the user to
>> know the maximum number of queues that L2 forwarding supports.
>
> Good point I'll add an attribute to query it.
>
 For L2 forwarding offload itself, more issues need to be addressed for
 multiqueue macvtap:

 - ndo_dfwd_add_station() can only create queues per device at
 ndo_open,
 but multiqueue macvtap allows user to create and destroy queues at
 their
 will and at any time.
>>>
>>> same argument as above, isn't this the same when running macvtap
>>> without
>>> the l2 offloads over a real device? I expect you hit the same
>>> contention
>>> points when running over a real device.
>>
>> Not true and not only for contention.
>>
>> Macvtap allows user to create or destroy a queue by simply open or close
>> to character device /dev/tapX. But currently, we do nothing when a new
>> queue was created or destroyed for L2 forwarding offload.
>>
>> For contention, lockless tx path make the contention only happens for
>> the txq or 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-07 Thread Jason Wang
On 01/07/2014 03:26 PM, John Fastabend wrote:
 [...]

 Unfortunately not. This commit has a side effect that it in fact
 disables the multiqueue macvtap transmission. Since all macvtap queues
 will contend on a single qdisc lock.


 They will only contend on a single qdisc lock if the lower device has
 1 queue.

 I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.

 Yes.


 The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
 was called for macvlan device itself. So even if lower device has
 multiple txqs, if you just create a one queue macvlan device, you will
 get lock contention on macvlan device. And even if you explicitly
 specifying the txq numbers ( though I don't believe most management
 software will do this) when creating the macvlan/macvtap device, you
 must also configure the XPS for macvlan to make sure it has the
 possibility of using multiple transmit queues.


 OK I think I'm finally putting all the pieces together thanks.

 Do you know why macvtap is setting dev-tx_queue_len by default? If you
 zero this then the noqueue_qdisc is used and the q-enqueue check in
 dev_queue_xmit will fail.

It was introduced in commit 8a35747a5d13b99e076b0222729e0caa48cb69b6
(macvtap: Limit packet queue length) to limit the length of socket
receive queue of macvtap. But I'm not sure whether the qdisc is a
byproduct of this commit, maybe we can switch to use another name
instead of just reuse dev-tx_queue_length.

 Also if XPS is not configured then skb_tx_hash is used so multiple
 transmit queues will still be used.


True.
 Perhaps defaulting the L2 forwarding devices to 1queue was a
 mistake. But the same issue arises when running macvtap over a
 non-multiqueue nic. Or even if you have a multiqueue device and create
 many more macvtap queues than the lower device has queues.

 Shouldn't the macvtap configuration take into account the lowest level
 devices queues?

 See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
 tx path). It allows the management to create a device without worrying
 the underlying device.

 OK.

 How does using the L2 forwarding device change the
 contention issues? Without the L2 forwarding LLTX is enabled but the
 qdisc lock, etc is still acquired on the device below the macvlan.


 That's the point. We need make sure the txq selection and qdisc lock
 were done for the lower device not for the macvlan device itself. Then
 macvlan can automatically benefit from the multi-queue capable lower
 devices. But L2 forwarding needs to contend on the txq lock on macvlan
 device itself, which is unnecessary and can complex the management.

 If I make the l2 forwarding defaults a bit better then using the L2
 forwarding case should not be any more complex. And because the queues
 are dedicated to the macvtap device any contention from qdisc lock, etc
 comes from the upper device only. 

At very least the txq of lower device should be held in order to be
synchronized with management path. Consider txq lock were often held by
netif_tx_disable() before trying to down the card. Current cold does not
hold txq lock, so it loses the synchronization which may cause issues.
And the code also does not check whether the txq has been stopped before
trying to start the transmission.


 Also if I get the bandwidth controls
 in we can set the max/min bandwidth per macvtap device this way. That
 is future work though.


That will be a nice feature.
 The ixgbe driver as it is currently written can be configured for up to
 4 queues by setting numtxqueues when the device is created. I assume
 when creating macvtap queues the user needs to account for the number
 of queues supported by the lower device.


 We'd better not complicate the task of management, lockless tx path work
 very well so we can just keep it. Btw, there's no way for the user to
 know the maximum number of queues that L2 forwarding supports.

 Good point I'll add an attribute to query it.

 For L2 forwarding offload itself, more issues need to be addressed for
 multiqueue macvtap:

 - ndo_dfwd_add_station() can only create queues per device at
 ndo_open,
 but multiqueue macvtap allows user to create and destroy queues at
 their
 will and at any time.

 same argument as above, isn't this the same when running macvtap
 without
 the l2 offloads over a real device? I expect you hit the same
 contention
 points when running over a real device.

 Not true and not only for contention.

 Macvtap allows user to create or destroy a queue by simply open or close
 to character device /dev/tapX. But currently, we do nothing when a new
 queue was created or destroyed for L2 forwarding offload.

 For contention, lockless tx path make the contention only happens for
 the txq or qdisc for the lower device, but L2 forwarding offload make
 contention also happen for the macvlan device itself.

 Right, but there will be less contention there because those queues
 are a dedicated resource for the upper 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread John Fastabend

[...]


Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.



They will only contend on a single qdisc lock if the lower device has
1 queue.


I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.


Yes.



The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
was called for macvlan device itself. So even if lower device has
multiple txqs, if you just create a one queue macvlan device, you will
get lock contention on macvlan device. And even if you explicitly
specifying the txq numbers ( though I don't believe most management
software will do this) when creating the macvlan/macvtap device, you
must also configure the XPS for macvlan to make sure it has the
possibility of using multiple transmit queues.



OK I think I'm finally putting all the pieces together thanks.

Do you know why macvtap is setting dev->tx_queue_len by default? If you
zero this then the noqueue_qdisc is used and the q->enqueue check in
dev_queue_xmit will fail.

Also if XPS is not configured then skb_tx_hash is used so multiple
transmit queues will still be used.


Perhaps defaulting the L2 forwarding devices to 1queue was a
mistake. But the same issue arises when running macvtap over a
non-multiqueue nic. Or even if you have a multiqueue device and create
many more macvtap queues than the lower device has queues.

Shouldn't the macvtap configuration take into account the lowest level
devices queues?


See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
tx path"). It allows the management to create a device without worrying
the underlying device.


OK.


How does using the L2 forwarding device change the
contention issues? Without the L2 forwarding LLTX is enabled but the
qdisc lock, etc is still acquired on the device below the macvlan.



That's the point. We need make sure the txq selection and qdisc lock
were done for the lower device not for the macvlan device itself. Then
macvlan can automatically benefit from the multi-queue capable lower
devices. But L2 forwarding needs to contend on the txq lock on macvlan
device itself, which is unnecessary and can complex the management.


If I make the l2 forwarding defaults a bit better then using the L2
forwarding case should not be any more complex. And because the queues
are dedicated to the macvtap device any contention from qdisc lock, etc
comes from the upper device only. Also if I get the bandwidth controls
in we can set the max/min bandwidth per macvtap device this way. That
is future work though.


The ixgbe driver as it is currently written can be configured for up to
4 queues by setting numtxqueues when the device is created. I assume
when creating macvtap queues the user needs to account for the number
of queues supported by the lower device.



We'd better not complicate the task of management, lockless tx path work
very well so we can just keep it. Btw, there's no way for the user to
know the maximum number of queues that L2 forwarding supports.


Good point I'll add an attribute to query it.


For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.


same argument as above, isn't this the same when running macvtap without
the l2 offloads over a real device? I expect you hit the same contention
points when running over a real device.


Not true and not only for contention.

Macvtap allows user to create or destroy a queue by simply open or close
to character device /dev/tapX. But currently, we do nothing when a new
queue was created or destroyed for L2 forwarding offload.

For contention, lockless tx path make the contention only happens for
the txq or qdisc for the lower device, but L2 forwarding offload make
contention also happen for the macvlan device itself.


Right, but there will be less contention there because those queues
are a dedicated resource for the upper device.

At this point I think I need to put together a real testbed and
benchmark some of this with netperf and perf running to get real
numbers. When I originally did the l2 forwarding I did not do any
testing with multiple macvtap queues and only very limited work with
macvtap.







- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.



The 4 limit was to simplify the code because the queue mapping in the
driver gets complicated if it is greater than 4. We can probably
increase this latter. But sorry reiterating how is this different than
a macvtap on a real device that supports a max of 4 queues?


Well, it maybe easy. I just point out possible issues we may meet currently.


Right.




So more works need to be done and unless those above 3 issues were

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Jason Wang
On 01/07/2014 01:15 PM, John Fastabend wrote:
> On 01/06/2014 07:10 PM, Jason Wang wrote:
>> On 01/06/2014 08:26 PM, Neil Horman wrote:
>>> On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:
 On 01/06/2014 03:35 PM, John Fastabend wrote:
> On 01/05/2014 07:21 PM, Jason Wang wrote:
>> L2 fowarding offload will bypass the rx handler of real device. This
>> will make
>> the packet could not be forwarded to macvtap device. Another problem
>> is the
>> dev_hard_start_xmit() called for macvtap does not have any
>> synchronization.
>>
>> Fix this by forbidding L2 forwarding for macvtap.
>>
>> Cc: John Fastabend 
>> Cc: Neil Horman 
>> Signed-off-by: Jason Wang 
>> ---
>>drivers/net/macvlan.c |5 -
>>1 files changed, 4 insertions(+), 1 deletions(-)
>>
> I must be missing something.
>
> The lower layer device should set skb->dev to the correct macvtap
> device on receive so that in netif_receive_skb_core() the macvtap
> handler is hit. Skipping the macvlan receive handler should be OK
> because the switching was done by the hardware. If I read macvtap.c
> correctly macvlan_common_newlink() is called with 'dev' where 'dev'
> is the macvtap device. Any idea what I'm missing? I guess I'll need
> to setup a macvtap test case.
 Unlike macvlan, macvtap depends on rx handler on the lower device to
 work. In this case macvlan_handle_frame() will call macvtap_receive().
 So doing netif_receive_skb_core() for macvtap device directly won't
 work
 since we need to forward the packet to userspace instead of kernel.

 For net-next.git, it may work since commit
 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device
 register an
 rx handler for itself.
>>> I agree, this seems like it should already be fixed with the above
>>> commit.  With
>>> this the macvlan rx handler should effectively be a no-op as far as the
>>> reception of frames is concerned.  As long as the driver sets the
>>> dev correctly
>>> to the macvtap device (and it appears to), macvtap will get frames
>>> to user
>>> space, regardless of weather the software or hardware did the
>>> switching.  If
>>> thats the case though, I think the solution is moving that fix to
>>> -stable
>>> (pending testing of course), rather than comming up with a new fix.
>>>
> And what synchronization are you worried about on
> dev_hard_start_xmit()?
> In the L2 forwarding offload case macvlan_open() clears the
> NETIF_F_LLTX
> flag so HARD_TX_LOCK protects the driver txq. We might hit this
> warning
> in dev_queue_xmit() though,
>
>net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
>
> Perhaps we can remove it.
 The problem is macvtap does not call dev_queue_xmit() for macvlan
 device. It calls macvlan_start_xmit() directly from
 macvtap_get_user().
 So HARD_TX_LOCK was not done for the txq.
>>> This seems to also be fixed by
>>> 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
>>> Macvtap does, as of that commit use dev_queue_xmit for the
>>> transmission of
>>> frames to the lowerdevice.
>>
>> Unfortunately not. This commit has a side effect that it in fact
>> disables the multiqueue macvtap transmission. Since all macvtap queues
>> will contend on a single qdisc lock.
>>
>
> They will only contend on a single qdisc lock if the lower device has
> 1 queue. 

I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.

The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
was called for macvlan device itself. So even if lower device has
multiple txqs, if you just create a one queue macvlan device, you will
get lock contention on macvlan device. And even if you explicitly
specifying the txq numbers ( though I don't believe most management
software will do this) when creating the macvlan/macvtap device, you
must also configure the XPS for macvlan to make sure it has the
possibility of using multiple transmit queues.

> Perhaps defaulting the L2 forwarding devices to 1queue was a
> mistake. But the same issue arises when running macvtap over a
> non-multiqueue nic. Or even if you have a multiqueue device and create
> many more macvtap queues than the lower device has queues.
>
> Shouldn't the macvtap configuration take into account the lowest level
> devices queues? 

See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless
tx path"). It allows the management to create a device without worrying
the underlying device.
> How does using the L2 forwarding device change the
> contention issues? Without the L2 forwarding LLTX is enabled but the
> qdisc lock, etc is still acquired on the device below the macvlan.
>

That's the point. We need make sure the txq selection and qdisc lock
were done for the lower device not for the macvlan device itself. Then
macvlan can automatically benefit from the 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread David Miller
From: Jason Wang 
Date: Tue, 07 Jan 2014 11:17:06 +0800

> On 01/07/2014 04:47 AM, David Miller wrote:
>> From: Jason Wang 
>> Date: Mon,  6 Jan 2014 11:21:06 +0800
>>
>>> L2 fowarding offload will bypass the rx handler of real device. This will 
>>> make
>>> the packet could not be forwarded to macvtap device. Another problem is the
>>> dev_hard_start_xmit() called for macvtap does not have any synchronization.
>>>
>>> Fix this by forbidding L2 forwarding for macvtap.
>>>
>>> Cc: John Fastabend 
>>> Cc: Neil Horman 
>>> Signed-off-by: Jason Wang 
>> I think I agree with Neil that the rx_handler change might be the best
>> way to fix this.  That change seems to have a lot of nice unintended
>> side effects, no?
> 
> Not all sides effects are nice.
> 
> One obvious issue is it disables the multiqueue macvtap transmission,
> since all queues will contend on a single qdisc lock of macvlan. And
> even more, multiqueue macvtap support creating and destroying a queue on
> demand which is not supported by L2 forwarding offload.
> 
> So L2 forwarding offload needs more fixes to let the multiqueue macvtap
> works. Currently, we really need this patch to make sure macvtap works
> as expected.

Ok I moved these two patches back to "Under Review".

These are pretty last minute and we'll need to make a decision on
what to do before Friday if you want these changes to really make
it into 3.13
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread John Fastabend

On 01/06/2014 07:10 PM, Jason Wang wrote:

On 01/06/2014 08:26 PM, Neil Horman wrote:

On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:

On 01/06/2014 03:35 PM, John Fastabend wrote:

On 01/05/2014 07:21 PM, Jason Wang wrote:

L2 fowarding offload will bypass the rx handler of real device. This
will make
the packet could not be forwarded to macvtap device. Another problem
is the
dev_hard_start_xmit() called for macvtap does not have any
synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend 
Cc: Neil Horman 
Signed-off-by: Jason Wang 
---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)


I must be missing something.

The lower layer device should set skb->dev to the correct macvtap
device on receive so that in netif_receive_skb_core() the macvtap
handler is hit. Skipping the macvlan receive handler should be OK
because the switching was done by the hardware. If I read macvtap.c
correctly macvlan_common_newlink() is called with 'dev' where 'dev'
is the macvtap device. Any idea what I'm missing? I guess I'll need
to setup a macvtap test case.

Unlike macvlan, macvtap depends on rx handler on the lower device to
work. In this case macvlan_handle_frame() will call macvtap_receive().
So doing netif_receive_skb_core() for macvtap device directly won't work
since we need to forward the packet to userspace instead of kernel.

For net-next.git, it may work since commit
6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
rx handler for itself.

I agree, this seems like it should already be fixed with the above commit.  With
this the macvlan rx handler should effectively be a no-op as far as the
reception of frames is concerned.  As long as the driver sets the dev correctly
to the macvtap device (and it appears to), macvtap will get frames to user
space, regardless of weather the software or hardware did the switching.  If
thats the case though, I think the solution is moving that fix to -stable
(pending testing of course), rather than comming up with a new fix.


And what synchronization are you worried about on dev_hard_start_xmit()?
In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
in dev_queue_xmit() though,

   net_crit_ratelimited("Virtual device %s asks to queue packet!\n",

Perhaps we can remove it.

The problem is macvtap does not call dev_queue_xmit() for macvlan
device. It calls macvlan_start_xmit() directly from macvtap_get_user().
So HARD_TX_LOCK was not done for the txq.

This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
Macvtap does, as of that commit use dev_queue_xmit for the transmission of
frames to the lowerdevice.


Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.



They will only contend on a single qdisc lock if the lower device has
1 queue. Perhaps defaulting the L2 forwarding devices to 1queue was a
mistake. But the same issue arises when running macvtap over a
non-multiqueue nic. Or even if you have a multiqueue device and create
many more macvtap queues than the lower device has queues.

Shouldn't the macvtap configuration take into account the lowest level
devices queues? How does using the L2 forwarding device change the
contention issues? Without the L2 forwarding LLTX is enabled but the
qdisc lock, etc is still acquired on the device below the macvlan.

The ixgbe driver as it is currently written can be configured for up to
4 queues by setting numtxqueues when the device is created. I assume
when creating macvtap queues the user needs to account for the number
of queues supported by the lower device.


For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.


same argument as above, isn't this the same when running macvtap without
the l2 offloads over a real device? I expect you hit the same contention
points when running over a real device.



- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.



The 4 limit was to simplify the code because the queue mapping in the
driver gets complicated if it is greater than 4. We can probably
increase this latter. But sorry reiterating how is this different than
a macvtap on a real device that supports a max of 4 queues?


So more works need to be done and unless those above 3 issues were
addressed, this patch is really needed to make sure macvtap works.



Agreed there is a lot more work here to improve things I'm just not
sure we need to disable this now. Also note its the l2 forwarding
should be disabled by default so a user would 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread John Fastabend

On 01/06/2014 07:10 PM, Jason Wang wrote:

On 01/06/2014 08:26 PM, Neil Horman wrote:

On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:

On 01/06/2014 03:35 PM, John Fastabend wrote:

On 01/05/2014 07:21 PM, Jason Wang wrote:

L2 fowarding offload will bypass the rx handler of real device. This
will make
the packet could not be forwarded to macvtap device. Another problem
is the
dev_hard_start_xmit() called for macvtap does not have any
synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend 
Cc: Neil Horman 
Signed-off-by: Jason Wang 
---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)


I must be missing something.

The lower layer device should set skb->dev to the correct macvtap
device on receive so that in netif_receive_skb_core() the macvtap
handler is hit. Skipping the macvlan receive handler should be OK
because the switching was done by the hardware. If I read macvtap.c
correctly macvlan_common_newlink() is called with 'dev' where 'dev'
is the macvtap device. Any idea what I'm missing? I guess I'll need
to setup a macvtap test case.

Unlike macvlan, macvtap depends on rx handler on the lower device to
work. In this case macvlan_handle_frame() will call macvtap_receive().
So doing netif_receive_skb_core() for macvtap device directly won't work
since we need to forward the packet to userspace instead of kernel.

For net-next.git, it may work since commit
6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
rx handler for itself.

I agree, this seems like it should already be fixed with the above commit.  With
this the macvlan rx handler should effectively be a no-op as far as the
reception of frames is concerned.  As long as the driver sets the dev correctly
to the macvtap device (and it appears to), macvtap will get frames to user
space, regardless of weather the software or hardware did the switching.  If
thats the case though, I think the solution is moving that fix to -stable
(pending testing of course), rather than comming up with a new fix.


And what synchronization are you worried about on dev_hard_start_xmit()?
In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
in dev_queue_xmit() though,

   net_crit_ratelimited("Virtual device %s asks to queue packet!\n",

Perhaps we can remove it.

The problem is macvtap does not call dev_queue_xmit() for macvlan
device. It calls macvlan_start_xmit() directly from macvtap_get_user().
So HARD_TX_LOCK was not done for the txq.

This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
Macvtap does, as of that commit use dev_queue_xmit for the transmission of
frames to the lowerdevice.


Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.



They will only contend on a single qdisc lock if the lower device has
1 queue. Perhaps defaulting the L2 forwarding devices to 1queue was a
mistake. But the same issue arises when running macvtap over a
non-multiqueue nic. Or even if you have a multiqueue device and create
many more macvtap queues than the lower device has queues.

Shouldn't the macvtap configuration take into account the lowest level
devices queues? How does using the L2 forwarding device change the
contention issues? Without the L2 forwarding LLTX is enabled but the
qdisc lock, etc is still acquired on the device below the macvlan.

The ixgbe driver as it is currently written can be configured for up to
4 queues by setting numtxqueues when the device is created. I assume
when creating macvtap queues the user needs to account for the number
of queues supported by the lower device.


For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.


same argument as above, isn't this the same when running macvtap without
the l2 offloads over a real device? I expect you hit the same contention
points when running over a real device.



- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.



The 4 limit was to simplify the code because the queue mapping in the
driver gets complicated if it is greater than 4. We can probably
increase this latter. But sorry reiterating how is this different than
a macvtap on a real device that supports a max of 4 queues?


So more works need to be done and unless those above 3 issues were
addressed, this patch is really needed to make sure macvtap works.



Agreed there is a lot more work here to improve things I'm just not
sure we need to disable this now. Also note its the l2 forwarding
should be disabled by default so a user would 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Jason Wang
On 01/07/2014 04:47 AM, David Miller wrote:
> From: Jason Wang 
> Date: Mon,  6 Jan 2014 11:21:06 +0800
>
>> L2 fowarding offload will bypass the rx handler of real device. This will 
>> make
>> the packet could not be forwarded to macvtap device. Another problem is the
>> dev_hard_start_xmit() called for macvtap does not have any synchronization.
>>
>> Fix this by forbidding L2 forwarding for macvtap.
>>
>> Cc: John Fastabend 
>> Cc: Neil Horman 
>> Signed-off-by: Jason Wang 
> I think I agree with Neil that the rx_handler change might be the best
> way to fix this.  That change seems to have a lot of nice unintended
> side effects, no?

Not all sides effects are nice.

One obvious issue is it disables the multiqueue macvtap transmission,
since all queues will contend on a single qdisc lock of macvlan. And
even more, multiqueue macvtap support creating and destroying a queue on
demand which is not supported by L2 forwarding offload.

So L2 forwarding offload needs more fixes to let the multiqueue macvtap
works. Currently, we really need this patch to make sure macvtap works
as expected.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Jason Wang
On 01/06/2014 08:26 PM, Neil Horman wrote:
> On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:
>> On 01/06/2014 03:35 PM, John Fastabend wrote:
>>> On 01/05/2014 07:21 PM, Jason Wang wrote:
 L2 fowarding offload will bypass the rx handler of real device. This
 will make
 the packet could not be forwarded to macvtap device. Another problem
 is the
 dev_hard_start_xmit() called for macvtap does not have any
 synchronization.

 Fix this by forbidding L2 forwarding for macvtap.

 Cc: John Fastabend 
 Cc: Neil Horman 
 Signed-off-by: Jason Wang 
 ---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)

>>> I must be missing something.
>>>
>>> The lower layer device should set skb->dev to the correct macvtap
>>> device on receive so that in netif_receive_skb_core() the macvtap
>>> handler is hit. Skipping the macvlan receive handler should be OK
>>> because the switching was done by the hardware. If I read macvtap.c
>>> correctly macvlan_common_newlink() is called with 'dev' where 'dev'
>>> is the macvtap device. Any idea what I'm missing? I guess I'll need
>>> to setup a macvtap test case.
>> Unlike macvlan, macvtap depends on rx handler on the lower device to
>> work. In this case macvlan_handle_frame() will call macvtap_receive().
>> So doing netif_receive_skb_core() for macvtap device directly won't work
>> since we need to forward the packet to userspace instead of kernel.
>>
>> For net-next.git, it may work since commit
>> 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
>> rx handler for itself.
> I agree, this seems like it should already be fixed with the above commit.  
> With
> this the macvlan rx handler should effectively be a no-op as far as the
> reception of frames is concerned.  As long as the driver sets the dev 
> correctly
> to the macvtap device (and it appears to), macvtap will get frames to user
> space, regardless of weather the software or hardware did the switching.  If
> thats the case though, I think the solution is moving that fix to -stable
> (pending testing of course), rather than comming up with a new fix.
>
>>> And what synchronization are you worried about on dev_hard_start_xmit()?
>>> In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
>>> flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
>>> in dev_queue_xmit() though,
>>>
>>>   net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
>>>
>>> Perhaps we can remove it.
>> The problem is macvtap does not call dev_queue_xmit() for macvlan
>> device. It calls macvlan_start_xmit() directly from macvtap_get_user().
>> So HARD_TX_LOCK was not done for the txq.
> This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
> Macvtap does, as of that commit use dev_queue_xmit for the transmission of
> frames to the lowerdevice.

Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.

For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.
- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.

So more works need to be done and unless those above 3 issues were
addressed, this patch is really needed to make sure macvtap works.

>
> Regards
> Neil
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread David Miller
From: Jason Wang 
Date: Mon,  6 Jan 2014 11:21:06 +0800

> L2 fowarding offload will bypass the rx handler of real device. This will make
> the packet could not be forwarded to macvtap device. Another problem is the
> dev_hard_start_xmit() called for macvtap does not have any synchronization.
> 
> Fix this by forbidding L2 forwarding for macvtap.
> 
> Cc: John Fastabend 
> Cc: Neil Horman 
> Signed-off-by: Jason Wang 

I think I agree with Neil that the rx_handler change might be the best
way to fix this.  That change seems to have a lot of nice unintended
side effects, no?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Neil Horman
On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:
> On 01/06/2014 03:35 PM, John Fastabend wrote:
> > On 01/05/2014 07:21 PM, Jason Wang wrote:
> >> L2 fowarding offload will bypass the rx handler of real device. This
> >> will make
> >> the packet could not be forwarded to macvtap device. Another problem
> >> is the
> >> dev_hard_start_xmit() called for macvtap does not have any
> >> synchronization.
> >>
> >> Fix this by forbidding L2 forwarding for macvtap.
> >>
> >> Cc: John Fastabend 
> >> Cc: Neil Horman 
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   drivers/net/macvlan.c |5 -
> >>   1 files changed, 4 insertions(+), 1 deletions(-)
> >>
> >
> > I must be missing something.
> >
> > The lower layer device should set skb->dev to the correct macvtap
> > device on receive so that in netif_receive_skb_core() the macvtap
> > handler is hit. Skipping the macvlan receive handler should be OK
> > because the switching was done by the hardware. If I read macvtap.c
> > correctly macvlan_common_newlink() is called with 'dev' where 'dev'
> > is the macvtap device. Any idea what I'm missing? I guess I'll need
> > to setup a macvtap test case.
> 
> Unlike macvlan, macvtap depends on rx handler on the lower device to
> work. In this case macvlan_handle_frame() will call macvtap_receive().
> So doing netif_receive_skb_core() for macvtap device directly won't work
> since we need to forward the packet to userspace instead of kernel.
> 
> For net-next.git, it may work since commit
> 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
> rx handler for itself.

I agree, this seems like it should already be fixed with the above commit.  With
this the macvlan rx handler should effectively be a no-op as far as the
reception of frames is concerned.  As long as the driver sets the dev correctly
to the macvtap device (and it appears to), macvtap will get frames to user
space, regardless of weather the software or hardware did the switching.  If
thats the case though, I think the solution is moving that fix to -stable
(pending testing of course), rather than comming up with a new fix.

> >
> > And what synchronization are you worried about on dev_hard_start_xmit()?
> > In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
> > flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
> > in dev_queue_xmit() though,
> >
> >   net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
> >
> > Perhaps we can remove it.
> 
> The problem is macvtap does not call dev_queue_xmit() for macvlan
> device. It calls macvlan_start_xmit() directly from macvtap_get_user().
> So HARD_TX_LOCK was not done for the txq.
This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
Macvtap does, as of that commit use dev_queue_xmit for the transmission of
frames to the lowerdevice.

Regards
Neil

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Neil Horman
On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:
 On 01/06/2014 03:35 PM, John Fastabend wrote:
  On 01/05/2014 07:21 PM, Jason Wang wrote:
  L2 fowarding offload will bypass the rx handler of real device. This
  will make
  the packet could not be forwarded to macvtap device. Another problem
  is the
  dev_hard_start_xmit() called for macvtap does not have any
  synchronization.
 
  Fix this by forbidding L2 forwarding for macvtap.
 
  Cc: John Fastabend john.r.fastab...@intel.com
  Cc: Neil Horman nhor...@tuxdriver.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
drivers/net/macvlan.c |5 -
1 files changed, 4 insertions(+), 1 deletions(-)
 
 
  I must be missing something.
 
  The lower layer device should set skb-dev to the correct macvtap
  device on receive so that in netif_receive_skb_core() the macvtap
  handler is hit. Skipping the macvlan receive handler should be OK
  because the switching was done by the hardware. If I read macvtap.c
  correctly macvlan_common_newlink() is called with 'dev' where 'dev'
  is the macvtap device. Any idea what I'm missing? I guess I'll need
  to setup a macvtap test case.
 
 Unlike macvlan, macvtap depends on rx handler on the lower device to
 work. In this case macvlan_handle_frame() will call macvtap_receive().
 So doing netif_receive_skb_core() for macvtap device directly won't work
 since we need to forward the packet to userspace instead of kernel.
 
 For net-next.git, it may work since commit
 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
 rx handler for itself.

I agree, this seems like it should already be fixed with the above commit.  With
this the macvlan rx handler should effectively be a no-op as far as the
reception of frames is concerned.  As long as the driver sets the dev correctly
to the macvtap device (and it appears to), macvtap will get frames to user
space, regardless of weather the software or hardware did the switching.  If
thats the case though, I think the solution is moving that fix to -stable
(pending testing of course), rather than comming up with a new fix.

 
  And what synchronization are you worried about on dev_hard_start_xmit()?
  In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
  flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
  in dev_queue_xmit() though,
 
net_crit_ratelimited(Virtual device %s asks to queue packet!\n,
 
  Perhaps we can remove it.
 
 The problem is macvtap does not call dev_queue_xmit() for macvlan
 device. It calls macvlan_start_xmit() directly from macvtap_get_user().
 So HARD_TX_LOCK was not done for the txq.
This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
Macvtap does, as of that commit use dev_queue_xmit for the transmission of
frames to the lowerdevice.

Regards
Neil

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread David Miller
From: Jason Wang jasow...@redhat.com
Date: Mon,  6 Jan 2014 11:21:06 +0800

 L2 fowarding offload will bypass the rx handler of real device. This will make
 the packet could not be forwarded to macvtap device. Another problem is the
 dev_hard_start_xmit() called for macvtap does not have any synchronization.
 
 Fix this by forbidding L2 forwarding for macvtap.
 
 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Signed-off-by: Jason Wang jasow...@redhat.com

I think I agree with Neil that the rx_handler change might be the best
way to fix this.  That change seems to have a lot of nice unintended
side effects, no?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Jason Wang
On 01/06/2014 08:26 PM, Neil Horman wrote:
 On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:
 On 01/06/2014 03:35 PM, John Fastabend wrote:
 On 01/05/2014 07:21 PM, Jason Wang wrote:
 L2 fowarding offload will bypass the rx handler of real device. This
 will make
 the packet could not be forwarded to macvtap device. Another problem
 is the
 dev_hard_start_xmit() called for macvtap does not have any
 synchronization.

 Fix this by forbidding L2 forwarding for macvtap.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)

 I must be missing something.

 The lower layer device should set skb-dev to the correct macvtap
 device on receive so that in netif_receive_skb_core() the macvtap
 handler is hit. Skipping the macvlan receive handler should be OK
 because the switching was done by the hardware. If I read macvtap.c
 correctly macvlan_common_newlink() is called with 'dev' where 'dev'
 is the macvtap device. Any idea what I'm missing? I guess I'll need
 to setup a macvtap test case.
 Unlike macvlan, macvtap depends on rx handler on the lower device to
 work. In this case macvlan_handle_frame() will call macvtap_receive().
 So doing netif_receive_skb_core() for macvtap device directly won't work
 since we need to forward the packet to userspace instead of kernel.

 For net-next.git, it may work since commit
 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
 rx handler for itself.
 I agree, this seems like it should already be fixed with the above commit.  
 With
 this the macvlan rx handler should effectively be a no-op as far as the
 reception of frames is concerned.  As long as the driver sets the dev 
 correctly
 to the macvtap device (and it appears to), macvtap will get frames to user
 space, regardless of weather the software or hardware did the switching.  If
 thats the case though, I think the solution is moving that fix to -stable
 (pending testing of course), rather than comming up with a new fix.

 And what synchronization are you worried about on dev_hard_start_xmit()?
 In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
 flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
 in dev_queue_xmit() though,

   net_crit_ratelimited(Virtual device %s asks to queue packet!\n,

 Perhaps we can remove it.
 The problem is macvtap does not call dev_queue_xmit() for macvlan
 device. It calls macvlan_start_xmit() directly from macvtap_get_user().
 So HARD_TX_LOCK was not done for the txq.
 This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
 Macvtap does, as of that commit use dev_queue_xmit for the transmission of
 frames to the lowerdevice.

Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.

For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.
- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.

So more works need to be done and unless those above 3 issues were
addressed, this patch is really needed to make sure macvtap works.


 Regards
 Neil

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Jason Wang
On 01/07/2014 04:47 AM, David Miller wrote:
 From: Jason Wang jasow...@redhat.com
 Date: Mon,  6 Jan 2014 11:21:06 +0800

 L2 fowarding offload will bypass the rx handler of real device. This will 
 make
 the packet could not be forwarded to macvtap device. Another problem is the
 dev_hard_start_xmit() called for macvtap does not have any synchronization.

 Fix this by forbidding L2 forwarding for macvtap.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 I think I agree with Neil that the rx_handler change might be the best
 way to fix this.  That change seems to have a lot of nice unintended
 side effects, no?

Not all sides effects are nice.

One obvious issue is it disables the multiqueue macvtap transmission,
since all queues will contend on a single qdisc lock of macvlan. And
even more, multiqueue macvtap support creating and destroying a queue on
demand which is not supported by L2 forwarding offload.

So L2 forwarding offload needs more fixes to let the multiqueue macvtap
works. Currently, we really need this patch to make sure macvtap works
as expected.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread John Fastabend

On 01/06/2014 07:10 PM, Jason Wang wrote:

On 01/06/2014 08:26 PM, Neil Horman wrote:

On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:

On 01/06/2014 03:35 PM, John Fastabend wrote:

On 01/05/2014 07:21 PM, Jason Wang wrote:

L2 fowarding offload will bypass the rx handler of real device. This
will make
the packet could not be forwarded to macvtap device. Another problem
is the
dev_hard_start_xmit() called for macvtap does not have any
synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend john.r.fastab...@intel.com
Cc: Neil Horman nhor...@tuxdriver.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)


I must be missing something.

The lower layer device should set skb-dev to the correct macvtap
device on receive so that in netif_receive_skb_core() the macvtap
handler is hit. Skipping the macvlan receive handler should be OK
because the switching was done by the hardware. If I read macvtap.c
correctly macvlan_common_newlink() is called with 'dev' where 'dev'
is the macvtap device. Any idea what I'm missing? I guess I'll need
to setup a macvtap test case.

Unlike macvlan, macvtap depends on rx handler on the lower device to
work. In this case macvlan_handle_frame() will call macvtap_receive().
So doing netif_receive_skb_core() for macvtap device directly won't work
since we need to forward the packet to userspace instead of kernel.

For net-next.git, it may work since commit
6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
rx handler for itself.

I agree, this seems like it should already be fixed with the above commit.  With
this the macvlan rx handler should effectively be a no-op as far as the
reception of frames is concerned.  As long as the driver sets the dev correctly
to the macvtap device (and it appears to), macvtap will get frames to user
space, regardless of weather the software or hardware did the switching.  If
thats the case though, I think the solution is moving that fix to -stable
(pending testing of course), rather than comming up with a new fix.


And what synchronization are you worried about on dev_hard_start_xmit()?
In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
in dev_queue_xmit() though,

   net_crit_ratelimited(Virtual device %s asks to queue packet!\n,

Perhaps we can remove it.

The problem is macvtap does not call dev_queue_xmit() for macvlan
device. It calls macvlan_start_xmit() directly from macvtap_get_user().
So HARD_TX_LOCK was not done for the txq.

This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
Macvtap does, as of that commit use dev_queue_xmit for the transmission of
frames to the lowerdevice.


Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.



They will only contend on a single qdisc lock if the lower device has
1 queue. Perhaps defaulting the L2 forwarding devices to 1queue was a
mistake. But the same issue arises when running macvtap over a
non-multiqueue nic. Or even if you have a multiqueue device and create
many more macvtap queues than the lower device has queues.

Shouldn't the macvtap configuration take into account the lowest level
devices queues? How does using the L2 forwarding device change the
contention issues? Without the L2 forwarding LLTX is enabled but the
qdisc lock, etc is still acquired on the device below the macvlan.

The ixgbe driver as it is currently written can be configured for up to
4 queues by setting numtxqueues when the device is created. I assume
when creating macvtap queues the user needs to account for the number
of queues supported by the lower device.


For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.


same argument as above, isn't this the same when running macvtap without
the l2 offloads over a real device? I expect you hit the same contention
points when running over a real device.



- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.



The 4 limit was to simplify the code because the queue mapping in the
driver gets complicated if it is greater than 4. We can probably
increase this latter. But sorry reiterating how is this different than
a macvtap on a real device that supports a max of 4 queues?


So more works need to be done and unless those above 3 issues were
addressed, this patch is really needed to make sure macvtap works.



Agreed there is a lot more work here to improve things I'm just not
sure we need to disable this now. Also note its the 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread John Fastabend

On 01/06/2014 07:10 PM, Jason Wang wrote:

On 01/06/2014 08:26 PM, Neil Horman wrote:

On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:

On 01/06/2014 03:35 PM, John Fastabend wrote:

On 01/05/2014 07:21 PM, Jason Wang wrote:

L2 fowarding offload will bypass the rx handler of real device. This
will make
the packet could not be forwarded to macvtap device. Another problem
is the
dev_hard_start_xmit() called for macvtap does not have any
synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend john.r.fastab...@intel.com
Cc: Neil Horman nhor...@tuxdriver.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)


I must be missing something.

The lower layer device should set skb-dev to the correct macvtap
device on receive so that in netif_receive_skb_core() the macvtap
handler is hit. Skipping the macvlan receive handler should be OK
because the switching was done by the hardware. If I read macvtap.c
correctly macvlan_common_newlink() is called with 'dev' where 'dev'
is the macvtap device. Any idea what I'm missing? I guess I'll need
to setup a macvtap test case.

Unlike macvlan, macvtap depends on rx handler on the lower device to
work. In this case macvlan_handle_frame() will call macvtap_receive().
So doing netif_receive_skb_core() for macvtap device directly won't work
since we need to forward the packet to userspace instead of kernel.

For net-next.git, it may work since commit
6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
rx handler for itself.

I agree, this seems like it should already be fixed with the above commit.  With
this the macvlan rx handler should effectively be a no-op as far as the
reception of frames is concerned.  As long as the driver sets the dev correctly
to the macvtap device (and it appears to), macvtap will get frames to user
space, regardless of weather the software or hardware did the switching.  If
thats the case though, I think the solution is moving that fix to -stable
(pending testing of course), rather than comming up with a new fix.


And what synchronization are you worried about on dev_hard_start_xmit()?
In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
in dev_queue_xmit() though,

   net_crit_ratelimited(Virtual device %s asks to queue packet!\n,

Perhaps we can remove it.

The problem is macvtap does not call dev_queue_xmit() for macvlan
device. It calls macvlan_start_xmit() directly from macvtap_get_user().
So HARD_TX_LOCK was not done for the txq.

This seems to also be fixed by 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
Macvtap does, as of that commit use dev_queue_xmit for the transmission of
frames to the lowerdevice.


Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.



They will only contend on a single qdisc lock if the lower device has
1 queue. Perhaps defaulting the L2 forwarding devices to 1queue was a
mistake. But the same issue arises when running macvtap over a
non-multiqueue nic. Or even if you have a multiqueue device and create
many more macvtap queues than the lower device has queues.

Shouldn't the macvtap configuration take into account the lowest level
devices queues? How does using the L2 forwarding device change the
contention issues? Without the L2 forwarding LLTX is enabled but the
qdisc lock, etc is still acquired on the device below the macvlan.

The ixgbe driver as it is currently written can be configured for up to
4 queues by setting numtxqueues when the device is created. I assume
when creating macvtap queues the user needs to account for the number
of queues supported by the lower device.


For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.


same argument as above, isn't this the same when running macvtap without
the l2 offloads over a real device? I expect you hit the same contention
points when running over a real device.



- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.



The 4 limit was to simplify the code because the queue mapping in the
driver gets complicated if it is greater than 4. We can probably
increase this latter. But sorry reiterating how is this different than
a macvtap on a real device that supports a max of 4 queues?


So more works need to be done and unless those above 3 issues were
addressed, this patch is really needed to make sure macvtap works.



Agreed there is a lot more work here to improve things I'm just not
sure we need to disable this now. Also note its the 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread David Miller
From: Jason Wang jasow...@redhat.com
Date: Tue, 07 Jan 2014 11:17:06 +0800

 On 01/07/2014 04:47 AM, David Miller wrote:
 From: Jason Wang jasow...@redhat.com
 Date: Mon,  6 Jan 2014 11:21:06 +0800

 L2 fowarding offload will bypass the rx handler of real device. This will 
 make
 the packet could not be forwarded to macvtap device. Another problem is the
 dev_hard_start_xmit() called for macvtap does not have any synchronization.

 Fix this by forbidding L2 forwarding for macvtap.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 I think I agree with Neil that the rx_handler change might be the best
 way to fix this.  That change seems to have a lot of nice unintended
 side effects, no?
 
 Not all sides effects are nice.
 
 One obvious issue is it disables the multiqueue macvtap transmission,
 since all queues will contend on a single qdisc lock of macvlan. And
 even more, multiqueue macvtap support creating and destroying a queue on
 demand which is not supported by L2 forwarding offload.
 
 So L2 forwarding offload needs more fixes to let the multiqueue macvtap
 works. Currently, we really need this patch to make sure macvtap works
 as expected.

Ok I moved these two patches back to Under Review.

These are pretty last minute and we'll need to make a decision on
what to do before Friday if you want these changes to really make
it into 3.13
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread Jason Wang
On 01/07/2014 01:15 PM, John Fastabend wrote:
 On 01/06/2014 07:10 PM, Jason Wang wrote:
 On 01/06/2014 08:26 PM, Neil Horman wrote:
 On Mon, Jan 06, 2014 at 03:54:21PM +0800, Jason Wang wrote:
 On 01/06/2014 03:35 PM, John Fastabend wrote:
 On 01/05/2014 07:21 PM, Jason Wang wrote:
 L2 fowarding offload will bypass the rx handler of real device. This
 will make
 the packet could not be forwarded to macvtap device. Another problem
 is the
 dev_hard_start_xmit() called for macvtap does not have any
 synchronization.

 Fix this by forbidding L2 forwarding for macvtap.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
drivers/net/macvlan.c |5 -
1 files changed, 4 insertions(+), 1 deletions(-)

 I must be missing something.

 The lower layer device should set skb-dev to the correct macvtap
 device on receive so that in netif_receive_skb_core() the macvtap
 handler is hit. Skipping the macvlan receive handler should be OK
 because the switching was done by the hardware. If I read macvtap.c
 correctly macvlan_common_newlink() is called with 'dev' where 'dev'
 is the macvtap device. Any idea what I'm missing? I guess I'll need
 to setup a macvtap test case.
 Unlike macvlan, macvtap depends on rx handler on the lower device to
 work. In this case macvlan_handle_frame() will call macvtap_receive().
 So doing netif_receive_skb_core() for macvtap device directly won't
 work
 since we need to forward the packet to userspace instead of kernel.

 For net-next.git, it may work since commit
 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device
 register an
 rx handler for itself.
 I agree, this seems like it should already be fixed with the above
 commit.  With
 this the macvlan rx handler should effectively be a no-op as far as the
 reception of frames is concerned.  As long as the driver sets the
 dev correctly
 to the macvtap device (and it appears to), macvtap will get frames
 to user
 space, regardless of weather the software or hardware did the
 switching.  If
 thats the case though, I think the solution is moving that fix to
 -stable
 (pending testing of course), rather than comming up with a new fix.

 And what synchronization are you worried about on
 dev_hard_start_xmit()?
 In the L2 forwarding offload case macvlan_open() clears the
 NETIF_F_LLTX
 flag so HARD_TX_LOCK protects the driver txq. We might hit this
 warning
 in dev_queue_xmit() though,

net_crit_ratelimited(Virtual device %s asks to queue packet!\n,

 Perhaps we can remove it.
 The problem is macvtap does not call dev_queue_xmit() for macvlan
 device. It calls macvlan_start_xmit() directly from
 macvtap_get_user().
 So HARD_TX_LOCK was not done for the txq.
 This seems to also be fixed by
 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.
 Macvtap does, as of that commit use dev_queue_xmit for the
 transmission of
 frames to the lowerdevice.

 Unfortunately not. This commit has a side effect that it in fact
 disables the multiqueue macvtap transmission. Since all macvtap queues
 will contend on a single qdisc lock.


 They will only contend on a single qdisc lock if the lower device has
 1 queue. 

I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.

The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
was called for macvlan device itself. So even if lower device has
multiple txqs, if you just create a one queue macvlan device, you will
get lock contention on macvlan device. And even if you explicitly
specifying the txq numbers ( though I don't believe most management
software will do this) when creating the macvlan/macvtap device, you
must also configure the XPS for macvlan to make sure it has the
possibility of using multiple transmit queues.

 Perhaps defaulting the L2 forwarding devices to 1queue was a
 mistake. But the same issue arises when running macvtap over a
 non-multiqueue nic. Or even if you have a multiqueue device and create
 many more macvtap queues than the lower device has queues.

 Shouldn't the macvtap configuration take into account the lowest level
 devices queues? 

See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
tx path). It allows the management to create a device without worrying
the underlying device.
 How does using the L2 forwarding device change the
 contention issues? Without the L2 forwarding LLTX is enabled but the
 qdisc lock, etc is still acquired on the device below the macvlan.


That's the point. We need make sure the txq selection and qdisc lock
were done for the lower device not for the macvlan device itself. Then
macvlan can automatically benefit from the multi-queue capable lower
devices. But L2 forwarding needs to contend on the txq lock on macvlan
device itself, which is unnecessary and can complex the management.
 The ixgbe driver as it is currently written can be configured for up to
 4 queues by setting numtxqueues when the device is 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-06 Thread John Fastabend

[...]


Unfortunately not. This commit has a side effect that it in fact
disables the multiqueue macvtap transmission. Since all macvtap queues
will contend on a single qdisc lock.



They will only contend on a single qdisc lock if the lower device has
1 queue.


I think we are talking about 6acf54f1cf0a6747bac9fea26f34cfc5a9029523.


Yes.



The qdisc or txq lock were macvlan device itself since dev_queue_xmit()
was called for macvlan device itself. So even if lower device has
multiple txqs, if you just create a one queue macvlan device, you will
get lock contention on macvlan device. And even if you explicitly
specifying the txq numbers ( though I don't believe most management
software will do this) when creating the macvlan/macvtap device, you
must also configure the XPS for macvlan to make sure it has the
possibility of using multiple transmit queues.



OK I think I'm finally putting all the pieces together thanks.

Do you know why macvtap is setting dev-tx_queue_len by default? If you
zero this then the noqueue_qdisc is used and the q-enqueue check in
dev_queue_xmit will fail.

Also if XPS is not configured then skb_tx_hash is used so multiple
transmit queues will still be used.


Perhaps defaulting the L2 forwarding devices to 1queue was a
mistake. But the same issue arises when running macvtap over a
non-multiqueue nic. Or even if you have a multiqueue device and create
many more macvtap queues than the lower device has queues.

Shouldn't the macvtap configuration take into account the lowest level
devices queues?


See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless
tx path). It allows the management to create a device without worrying
the underlying device.


OK.


How does using the L2 forwarding device change the
contention issues? Without the L2 forwarding LLTX is enabled but the
qdisc lock, etc is still acquired on the device below the macvlan.



That's the point. We need make sure the txq selection and qdisc lock
were done for the lower device not for the macvlan device itself. Then
macvlan can automatically benefit from the multi-queue capable lower
devices. But L2 forwarding needs to contend on the txq lock on macvlan
device itself, which is unnecessary and can complex the management.


If I make the l2 forwarding defaults a bit better then using the L2
forwarding case should not be any more complex. And because the queues
are dedicated to the macvtap device any contention from qdisc lock, etc
comes from the upper device only. Also if I get the bandwidth controls
in we can set the max/min bandwidth per macvtap device this way. That
is future work though.


The ixgbe driver as it is currently written can be configured for up to
4 queues by setting numtxqueues when the device is created. I assume
when creating macvtap queues the user needs to account for the number
of queues supported by the lower device.



We'd better not complicate the task of management, lockless tx path work
very well so we can just keep it. Btw, there's no way for the user to
know the maximum number of queues that L2 forwarding supports.


Good point I'll add an attribute to query it.


For L2 forwarding offload itself, more issues need to be addressed for
multiqueue macvtap:

- ndo_dfwd_add_station() can only create queues per device at ndo_open,
but multiqueue macvtap allows user to create and destroy queues at their
will and at any time.


same argument as above, isn't this the same when running macvtap without
the l2 offloads over a real device? I expect you hit the same contention
points when running over a real device.


Not true and not only for contention.

Macvtap allows user to create or destroy a queue by simply open or close
to character device /dev/tapX. But currently, we do nothing when a new
queue was created or destroyed for L2 forwarding offload.

For contention, lockless tx path make the contention only happens for
the txq or qdisc for the lower device, but L2 forwarding offload make
contention also happen for the macvlan device itself.


Right, but there will be less contention there because those queues
are a dedicated resource for the upper device.

At this point I think I need to put together a real testbed and
benchmark some of this with netperf and perf running to get real
numbers. When I originally did the l2 forwarding I did not do any
testing with multiple macvtap queues and only very limited work with
macvtap.







- it looks that ixgbe has a upper limit of 4 queues per station, but
macvtap currently allows up to 16 queues per device.



The 4 limit was to simplify the code because the queue mapping in the
driver gets complicated if it is greater than 4. We can probably
increase this latter. But sorry reiterating how is this different than
a macvtap on a real device that supports a max of 4 queues?


Well, it maybe easy. I just point out possible issues we may meet currently.


Right.




So more works need to be done and unless those above 3 issues were
addressed, 

Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-05 Thread Jason Wang
On 01/06/2014 03:35 PM, John Fastabend wrote:
> On 01/05/2014 07:21 PM, Jason Wang wrote:
>> L2 fowarding offload will bypass the rx handler of real device. This
>> will make
>> the packet could not be forwarded to macvtap device. Another problem
>> is the
>> dev_hard_start_xmit() called for macvtap does not have any
>> synchronization.
>>
>> Fix this by forbidding L2 forwarding for macvtap.
>>
>> Cc: John Fastabend 
>> Cc: Neil Horman 
>> Signed-off-by: Jason Wang 
>> ---
>>   drivers/net/macvlan.c |5 -
>>   1 files changed, 4 insertions(+), 1 deletions(-)
>>
>
> I must be missing something.
>
> The lower layer device should set skb->dev to the correct macvtap
> device on receive so that in netif_receive_skb_core() the macvtap
> handler is hit. Skipping the macvlan receive handler should be OK
> because the switching was done by the hardware. If I read macvtap.c
> correctly macvlan_common_newlink() is called with 'dev' where 'dev'
> is the macvtap device. Any idea what I'm missing? I guess I'll need
> to setup a macvtap test case.

Unlike macvlan, macvtap depends on rx handler on the lower device to
work. In this case macvlan_handle_frame() will call macvtap_receive().
So doing netif_receive_skb_core() for macvtap device directly won't work
since we need to forward the packet to userspace instead of kernel.

For net-next.git, it may work since commit
6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
rx handler for itself.
>
> And what synchronization are you worried about on dev_hard_start_xmit()?
> In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
> flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
> in dev_queue_xmit() though,
>
>   net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
>
> Perhaps we can remove it.

The problem is macvtap does not call dev_queue_xmit() for macvlan
device. It calls macvlan_start_xmit() directly from macvtap_get_user().
So HARD_TX_LOCK was not done for the txq.
>
>> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
>> index 60406b0..5360f73 100644
>> --- a/drivers/net/macvlan.c
>> +++ b/drivers/net/macvlan.c
>> @@ -338,6 +338,8 @@ static const struct header_ops
>> macvlan_hard_header_ops = {
>>   .cache_update= eth_header_cache_update,
>>   };
>>
>> +static struct rtnl_link_ops macvlan_link_ops;
>> +
>>   static int macvlan_open(struct net_device *dev)
>>   {
>>   struct macvlan_dev *vlan = netdev_priv(dev);
>> @@ -353,7 +355,8 @@ static int macvlan_open(struct net_device *dev)
>>   goto hash_add;
>>   }
>>
>> -if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
>> +if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD &&
>> +dev->rtnl_link_ops == _link_ops) {
>>   vlan->fwd_priv =
>> lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev,
>> dev);
>>
>>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-05 Thread John Fastabend

On 01/05/2014 07:21 PM, Jason Wang wrote:

L2 fowarding offload will bypass the rx handler of real device. This will make
the packet could not be forwarded to macvtap device. Another problem is the
dev_hard_start_xmit() called for macvtap does not have any synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend 
Cc: Neil Horman 
Signed-off-by: Jason Wang 
---
  drivers/net/macvlan.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)



I must be missing something.

The lower layer device should set skb->dev to the correct macvtap
device on receive so that in netif_receive_skb_core() the macvtap
handler is hit. Skipping the macvlan receive handler should be OK
because the switching was done by the hardware. If I read macvtap.c
correctly macvlan_common_newlink() is called with 'dev' where 'dev'
is the macvtap device. Any idea what I'm missing? I guess I'll need
to setup a macvtap test case.

And what synchronization are you worried about on dev_hard_start_xmit()?
In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
in dev_queue_xmit() though,

  net_crit_ratelimited("Virtual device %s asks to queue packet!\n",

Perhaps we can remove it.


diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 60406b0..5360f73 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -338,6 +338,8 @@ static const struct header_ops macvlan_hard_header_ops = {
.cache_update   = eth_header_cache_update,
  };

+static struct rtnl_link_ops macvlan_link_ops;
+
  static int macvlan_open(struct net_device *dev)
  {
struct macvlan_dev *vlan = netdev_priv(dev);
@@ -353,7 +355,8 @@ static int macvlan_open(struct net_device *dev)
goto hash_add;
}

-   if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
+   if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD &&
+   dev->rtnl_link_ops == _link_ops) {
vlan->fwd_priv =
  lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);





--
John Fastabend Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-05 Thread John Fastabend

On 01/05/2014 07:21 PM, Jason Wang wrote:

L2 fowarding offload will bypass the rx handler of real device. This will make
the packet could not be forwarded to macvtap device. Another problem is the
dev_hard_start_xmit() called for macvtap does not have any synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend john.r.fastab...@intel.com
Cc: Neil Horman nhor...@tuxdriver.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
  drivers/net/macvlan.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)



I must be missing something.

The lower layer device should set skb-dev to the correct macvtap
device on receive so that in netif_receive_skb_core() the macvtap
handler is hit. Skipping the macvlan receive handler should be OK
because the switching was done by the hardware. If I read macvtap.c
correctly macvlan_common_newlink() is called with 'dev' where 'dev'
is the macvtap device. Any idea what I'm missing? I guess I'll need
to setup a macvtap test case.

And what synchronization are you worried about on dev_hard_start_xmit()?
In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
in dev_queue_xmit() though,

  net_crit_ratelimited(Virtual device %s asks to queue packet!\n,

Perhaps we can remove it.


diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 60406b0..5360f73 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -338,6 +338,8 @@ static const struct header_ops macvlan_hard_header_ops = {
.cache_update   = eth_header_cache_update,
  };

+static struct rtnl_link_ops macvlan_link_ops;
+
  static int macvlan_open(struct net_device *dev)
  {
struct macvlan_dev *vlan = netdev_priv(dev);
@@ -353,7 +355,8 @@ static int macvlan_open(struct net_device *dev)
goto hash_add;
}

-   if (lowerdev-features  NETIF_F_HW_L2FW_DOFFLOAD) {
+   if (lowerdev-features  NETIF_F_HW_L2FW_DOFFLOAD 
+   dev-rtnl_link_ops == macvlan_link_ops) {
vlan-fwd_priv =
  lowerdev-netdev_ops-ndo_dfwd_add_station(lowerdev, dev);





--
John Fastabend Intel Corporation
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net 1/2] macvlan: forbid L2 fowarding offload for macvtap

2014-01-05 Thread Jason Wang
On 01/06/2014 03:35 PM, John Fastabend wrote:
 On 01/05/2014 07:21 PM, Jason Wang wrote:
 L2 fowarding offload will bypass the rx handler of real device. This
 will make
 the packet could not be forwarded to macvtap device. Another problem
 is the
 dev_hard_start_xmit() called for macvtap does not have any
 synchronization.

 Fix this by forbidding L2 forwarding for macvtap.

 Cc: John Fastabend john.r.fastab...@intel.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
   drivers/net/macvlan.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)


 I must be missing something.

 The lower layer device should set skb-dev to the correct macvtap
 device on receive so that in netif_receive_skb_core() the macvtap
 handler is hit. Skipping the macvlan receive handler should be OK
 because the switching was done by the hardware. If I read macvtap.c
 correctly macvlan_common_newlink() is called with 'dev' where 'dev'
 is the macvtap device. Any idea what I'm missing? I guess I'll need
 to setup a macvtap test case.

Unlike macvlan, macvtap depends on rx handler on the lower device to
work. In this case macvlan_handle_frame() will call macvtap_receive().
So doing netif_receive_skb_core() for macvtap device directly won't work
since we need to forward the packet to userspace instead of kernel.

For net-next.git, it may work since commit
6acf54f1cf0a6747bac9fea26f34cfc5a9029523 let macvtap device register an
rx handler for itself.

 And what synchronization are you worried about on dev_hard_start_xmit()?
 In the L2 forwarding offload case macvlan_open() clears the NETIF_F_LLTX
 flag so HARD_TX_LOCK protects the driver txq. We might hit this warning
 in dev_queue_xmit() though,

   net_crit_ratelimited(Virtual device %s asks to queue packet!\n,

 Perhaps we can remove it.

The problem is macvtap does not call dev_queue_xmit() for macvlan
device. It calls macvlan_start_xmit() directly from macvtap_get_user().
So HARD_TX_LOCK was not done for the txq.

 diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
 index 60406b0..5360f73 100644
 --- a/drivers/net/macvlan.c
 +++ b/drivers/net/macvlan.c
 @@ -338,6 +338,8 @@ static const struct header_ops
 macvlan_hard_header_ops = {
   .cache_update= eth_header_cache_update,
   };

 +static struct rtnl_link_ops macvlan_link_ops;
 +
   static int macvlan_open(struct net_device *dev)
   {
   struct macvlan_dev *vlan = netdev_priv(dev);
 @@ -353,7 +355,8 @@ static int macvlan_open(struct net_device *dev)
   goto hash_add;
   }

 -if (lowerdev-features  NETIF_F_HW_L2FW_DOFFLOAD) {
 +if (lowerdev-features  NETIF_F_HW_L2FW_DOFFLOAD 
 +dev-rtnl_link_ops == macvlan_link_ops) {
   vlan-fwd_priv =
 lowerdev-netdev_ops-ndo_dfwd_add_station(lowerdev,
 dev);





--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/