On Thu, Jan 16, 2014 at 12:29:35PM +0800, Jason Wang wrote:
> On 01/15/2014 03:21 PM, Michael S. Tsirkin wrote:
> >On Wed, Jan 15, 2014 at 11:36:01AM +0800, Jason Wang wrote:
> >>On 01/14/2014 05:52 PM, Michael S. Tsirkin wrote:
> >>>On Tue, Jan 14, 2014 at 04:45:24PM +0800, Jason Wang wrote:
> >>>>>On 01/14/2014 04:25 PM, Michael S. Tsirkin wrote:
> >>>>>>>On Tue, Jan 14, 2014 at 02:53:07PM +0800, Jason Wang wrote:
> >>>>>>>>>We used to limit the number of packets queued through
> >>>>>>>>>tx_queue_length. This has several issues:
> >>>>>>>>>
> >>>>>>>>>- tx_queue_length is the control of qdisc queue length, simply
> >>>>>>>>>  reusing it to control the packets queued by device may cause
> >>>>>>>>>  confusion.
> >>>>>>>>>- After commit 6acf54f1cf0a6747bac9fea26f34cfc5a9029523
> >>>>>>>>>  ("macvtap: Add support of packet capture on macvtap device."),
> >>>>>>>>>  an unexpected qdisc caused by non-zero tx_queue_length will
> >>>>>>>>>  lead qdisc lock contention for multiqueue device.
> >>>>>>>>>- What we really want is to limit the total amount of memory
> >>>>>>>>>  occupied not the number of packets.
> >>>>>>>>>
> >>>>>>>>>So this patch tries to solve the above issues by using socket
> >>>>>>>>>rcvbuf to limit the packets could be queued for tun/macvtap.
> >>>>>>>>>This was done by using sock_queue_rcv_skb() instead of a direct
> >>>>>>>>>call to skb_queue_tail(). Also two new ioctl() were introduced
> >>>>>>>>>for userspace to change the rcvbuf like what we have done for
> >>>>>>>>>sndbuf.
> >>>>>>>>>
> >>>>>>>>>With this fix, we can safely change the tx_queue_len of macvtap
> >>>>>>>>>to zero. This will make multiqueue works without extra lock
> >>>>>>>>>contention.
> >>>>>>>>>
> >>>>>>>>>Cc: Vlad Yasevich<vyase...@redhat.com>
> >>>>>>>>>Cc: Michael S. Tsirkin<m...@redhat.com>
> >>>>>>>>>Cc: John Fastabend<john.r.fastab...@intel.com>
> >>>>>>>>>Cc: Stephen Hemminger<step...@networkplumber.org>
> >>>>>>>>>Cc: Herbert Xu<herb...@gondor.apana.org.au>
> >>>>>>>>>Signed-off-by: Jason Wang<jasow...@redhat.com>
> >>>>>>>No, I don't think we can change userspace-visible behaviour like that.
> >>>>>>>
> >>>>>>>This will break any existing user that tries to control
> >>>>>>>queue length through sysfs, netlink or device ioctl.
> >>>>>But it looks like a buggy API, since tx_queue_len should be for qdisc
> >>>>>queue length instead of device itself.
> >>>Probably, but it's been like this since 2.6.x time.
> >>>Also, qdisc queue is unused for tun so it seemed kind of
> >>>reasonable to override tx_queue_len.
> >>>
> >>>>>If we really want to preserve the
> >>>>>behaviour, how about using a new feature flag and change the behaviour
> >>>>>only when the device is created (TUNSETIFF) with the new flag?
> >>>OK this addresses the issue partially, but there's also an issue
> >>>of permissions: tx_queue_len can only be changed if
> >>>capable(CAP_NET_ADMIN). OTOH in your patch a regular user
> >>>can change the amount of memory consumed per queue
> >>>by calling TUNSETRCVBUF.
> >>Yes, but we have the same issue for TUNSETSNDBUF.
> >To an extent, but TUNSETSNDBUF is different. It limits how much device
> >can queue *in the networking stack* but each queue in the stack is also
> >limited, when we exceed that we start dropping packets.
> >So while with infinite value (which is the default btw)
> >you can keep host pretty busy, you will not be able to run
> >it out of memory.
> >
> >The proposed TUNSETRCVBUF would keep configured amount
> >of memory around indefinitely so you can run host out of memory.
> >
> >So assuming all this
> >How about an ethtool or netlink command to configure this
> >instead?
> >
>
> Ok, so we can add net admin check for before trying to set rcvbuf.

No, in practice I think using ioctl for sndbuf was also a mistake.
Applications have no idea what to set it to: you need to know
what else is running on the system. After a while QEMU ended up
setting it back to infinity, because otherwise things kept breaking.

ethtool or netlink would not have this problem.
Which of the two is preferable I'm not sure.
I wonder what management tools such as libvirt prefer.

> I think it's better to use ioctl since we've already use it for
> sndbuf. Using ethtool means you need a dedicated new ethtool method
> just for tuntap which seems sub-optimal.

> Netlink looks better, but
> we should also implement other ioctl also.

I'm not sure what this last phrase means. Can you clarify pls?

> >>>>>>>Take a look at my patch in msg ID 20140109071721.gd19...@redhat.com
> >>>>>>>which gives one way to set tx_queue_len to zero without
> >>>>>>>breaking userspace.
> >>>>>If I read the patch correctly, it will make no way for the user who
> >>>>>really want to change the qdisc queue length for tun.
> >>>Why would this matter? As far as I can see qdisc queue is currently
> >>>unused.
> >>>
> >>User may use qdisc to do port mirroring, bandwidth limitation, traffic
> >>prioritization or more for a VM. So we do have users and maybe more
> >>consider the case of vpn.
> >Well it's not used by default at least.
> >I remember that we discussed this previously actually.
> >
> >If all we want to do actually is utilize no_qdisc by default,
> >we can simply use Eric's patch:
> >
> >http://article.gmane.org/gmane.linux.kernel/1279597
> >
> >and a similar patch for macvtap.
> >I tried it at the time and it didn't seem to help performance
> >at all, but a lot has changed since, in particular I didn't
> >test mq.
> >
> >If you now have results showing how it's beneficial, pls post them.
> >
>
> I will have a test to see the difference.
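
To make the distinction argued over in this thread concrete, here is a minimal sketch contrasting a queue capped by packet count with one capped by total bytes. It is a toy model in Python, not kernel code. The names `PacketCountQueue`, `ByteBudgetQueue` and `fill`, and all numeric limits, are hypothetical and chosen only for illustration; the byte-budget check is a simplification and is not claimed to match what sock_queue_rcv_skb() does.

```python
from collections import deque


class PacketCountQueue:
    """Toy queue bounded by number of packets (a tx_queue_len-style cap)."""

    def __init__(self, max_packets):
        self.max_packets = max_packets
        self.packets = deque()

    def enqueue(self, size):
        # Reject only when the packet count is at the cap; size is ignored.
        if len(self.packets) >= self.max_packets:
            return False
        self.packets.append(size)
        return True

    def bytes_queued(self):
        return sum(self.packets)


class ByteBudgetQueue:
    """Toy queue bounded by total bytes (an rcvbuf-style cap)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.packets = deque()
        self.used = 0

    def enqueue(self, size):
        # Reject when accepting this packet would exceed the byte budget.
        if self.used + size > self.max_bytes:
            return False
        self.packets.append(size)
        self.used += size
        return True

    def bytes_queued(self):
        return self.used


def fill(queue, packet_size, attempts):
    """Offer `attempts` packets of `packet_size` bytes; return how many were accepted."""
    return sum(1 for _ in range(attempts) if queue.enqueue(packet_size))


if __name__ == "__main__":
    for size in (64, 65536):
        by_count = PacketCountQueue(max_packets=500)      # hypothetical cap
        by_bytes = ByteBudgetQueue(max_bytes=1_000_000)   # hypothetical cap
        fill(by_count, size, 1000)
        fill(by_bytes, size, 1000)
        print(size, by_count.bytes_queued(), by_bytes.bytes_queued())
```

In this model the count-capped queue's memory use scales with packet size, while the byte-capped queue never holds more than its configured budget. That bounded-but-persistent budget is also what the permissions concern is about: whoever can raise `max_bytes` controls how much memory stays pinned per queue.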