Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Thu, 2013-06-20 at 13:14 +0800, Jason Wang wrote:

> A drawback of NETIF_F_LLTX is that we may contend on the qdisc lock,
> especially when we have a huge number of tx queues.

For your information, the loopback driver is LLTX, and there is no qdisc
on it, unless you specifically add one.

Once you add a qdisc on a device, you hit the typical contention on a
spinlock.

But it's hard to design a parallel qdisc. So far nobody did that.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On 06/19/2013 05:56 PM, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote:
>
>> Well KVM supports up to 160 VCPUs on x86.
>>
>> Creating a queue per CPU is very reasonable, and
>> assuming a cache line size of 64 bytes, netdev_queue seems to be 320
>> bytes, that's 320*160 = 51200 bytes. So 12.5 pages, an order-4 allocation.
>> I agree most people don't have such systems yet, but
>> they do exist.
>
> Even so, it will just work, like a fork() is likely to work, even if a
> process needs an order-1 allocation for its kernel stack.
>
> Some drivers still use order-10 allocations with kmalloc(), and nobody
> has complained yet.
>
> We only had complaints with the mlx4 driver lately because kmalloc() now
> gives a warning if allocations above MAX_ORDER are attempted.
>
> Having a single pointer means that we can:
>
> - Attempt a regular kmalloc() call; it will work most of the time.
> - Fall back to vmalloc() _if_ kmalloc() failed.
>
> Frankly, if you want one tx queue per cpu, I would rather use
> NETIF_F_LLTX, like some other virtual devices.

A drawback of NETIF_F_LLTX is that we may contend on the qdisc lock,
especially when we have a huge number of tx queues.

> This way, you can have real per cpu memory, with proper NUMA affinity.
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, Jun 19, 2013 at 08:58:31AM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 18:40 +0300, Michael S. Tsirkin wrote:
>
> > That's a good trick too - vmalloc memory is a bit slower
> > on x86 since it's not using a huge page, but that's only
> > when we have lots of CPUs/queues...
> >
> > Short term - how about switching to vmalloc if > 32 queues?
>
> You do not have to switch "if > 32 queues"
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fa007db..473cc9e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -130,6 +130,7 @@
>  #include <linux/cpu_rmap.h>
>  #include <linux/static_key.h>
>  #include <linux/hashtable.h>
> +#include <linux/vmalloc.h>
>
>  #include "net-sysfs.h"
>
> @@ -5253,6 +5254,14 @@ static void netdev_init_one_queue(struct net_device *dev,
>  #endif
>  }
>
> +static void netif_free_tx_queues(struct net_device *dev)
> +{
> +	if (is_vmalloc_addr(dev->_tx))
> +		vfree(dev->_tx);
> +	else
> +		kfree(dev->_tx);
> +}
> +
>  static int netif_alloc_netdev_queues(struct net_device *dev)
>  {
>  	unsigned int count = dev->num_tx_queues;
> @@ -5261,9 +5270,11 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
>  	BUG_ON(count < 1);
>
>  	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);

As we have explicit code to recover, maybe set __GFP_NOWARN?
If we are out of memory, vzalloc will warn too; we don't need two
warnings.

> -	if (!tx)
> -		return -ENOMEM;
> -
> +	if (!tx) {
> +		tx = vzalloc(count * sizeof(struct netdev_queue));
> +		if (!tx)
> +			return -ENOMEM;
> +	}
>  	dev->_tx = tx;
>
>  	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> @@ -5811,7 +5822,7 @@ free_all:
>
>  free_pcpu:
>  	free_percpu(dev->pcpu_refcnt);
> -	kfree(dev->_tx);
> +	netif_free_tx_queues(dev);
>  #ifdef CONFIG_RPS
>  	kfree(dev->_rx);
>  #endif
> @@ -5836,7 +5847,7 @@ void free_netdev(struct net_device *dev)
>
>  	release_net(dev_net(dev));
>
> -	kfree(dev->_tx);
> +	netif_free_tx_queues(dev);
>  #ifdef CONFIG_RPS
>  	kfree(dev->_rx);
>  #endif
RE: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 17:06 +0100, David Laight wrote:

> Given the number of places I've seen this code added, why
> not put it in a general helper?

Good luck with that. We had many attempts in the past.

> I also thought that kmalloc() with GFP_KERNEL would sleep.
> Under what conditions does it fail instead?

For example on a 32 bit kernel: LOW memory exhaustion/fragmentation.

vmalloc() has an extra 128MB of virtual space (on i386 at least, with the
standard 3G/1G split).
RE: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
> +static void netif_free_tx_queues(struct net_device *dev)
> +{
> +	if (is_vmalloc_addr(dev->_tx))
> +		vfree(dev->_tx);
> +	else
> +		kfree(dev->_tx);
> +}
> +
>  static int netif_alloc_netdev_queues(struct net_device *dev)
>  {
>  	unsigned int count = dev->num_tx_queues;
> @@ -5261,9 +5270,11 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
>  	BUG_ON(count < 1);
>
>  	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);
> -	if (!tx)
> -		return -ENOMEM;
> -
> +	if (!tx) {
> +		tx = vzalloc(count * sizeof(struct netdev_queue));
> +		if (!tx)
> +			return -ENOMEM;
> +	}
>  	dev->_tx = tx;

Given the number of places I've seen this code added, why
not put it in a general helper?

I also thought that kmalloc() with GFP_KERNEL would sleep.
Under what conditions does it fail instead?

	David
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 18:40 +0300, Michael S. Tsirkin wrote:

> That's a good trick too - vmalloc memory is a bit slower
> on x86 since it's not using a huge page, but that's only
> when we have lots of CPUs/queues...
>
> Short term - how about switching to vmalloc if > 32 queues?

You do not have to switch "if > 32 queues"

diff --git a/net/core/dev.c b/net/core/dev.c
index fa007db..473cc9e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
 #include <linux/cpu_rmap.h>
 #include <linux/static_key.h>
 #include <linux/hashtable.h>
+#include <linux/vmalloc.h>
 
 #include "net-sysfs.h"
 
@@ -5253,6 +5254,14 @@ static void netdev_init_one_queue(struct net_device *dev,
 #endif
 }
 
+static void netif_free_tx_queues(struct net_device *dev)
+{
+	if (is_vmalloc_addr(dev->_tx))
+		vfree(dev->_tx);
+	else
+		kfree(dev->_tx);
+}
+
 static int netif_alloc_netdev_queues(struct net_device *dev)
 {
 	unsigned int count = dev->num_tx_queues;
@@ -5261,9 +5270,11 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
 	BUG_ON(count < 1);
 
 	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);
-	if (!tx)
-		return -ENOMEM;
-
+	if (!tx) {
+		tx = vzalloc(count * sizeof(struct netdev_queue));
+		if (!tx)
+			return -ENOMEM;
+	}
 	dev->_tx = tx;
 
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
@@ -5811,7 +5822,7 @@ free_all:
 
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
-	kfree(dev->_tx);
+	netif_free_tx_queues(dev);
 #ifdef CONFIG_RPS
 	kfree(dev->_rx);
 #endif
@@ -5836,7 +5847,7 @@ void free_netdev(struct net_device *dev)
 
 	release_net(dev_net(dev));
 
-	kfree(dev->_tx);
+	netif_free_tx_queues(dev);
 #ifdef CONFIG_RPS
 	kfree(dev->_rx);
 #endif
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, Jun 19, 2013 at 02:56:03AM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote:
>
> > Well KVM supports up to 160 VCPUs on x86.
> >
> > Creating a queue per CPU is very reasonable, and
> > assuming a cache line size of 64 bytes, netdev_queue seems to be 320
> > bytes, that's 320*160 = 51200 bytes. So 12.5 pages, an order-4 allocation.
> > I agree most people don't have such systems yet, but
> > they do exist.
>
> Even so, it will just work, like a fork() is likely to work, even if a
> process needs an order-1 allocation for its kernel stack.
>
> Some drivers still use order-10 allocations with kmalloc(), and nobody
> has complained yet.
>
> We only had complaints with the mlx4 driver lately because kmalloc() now
> gives a warning if allocations above MAX_ORDER are attempted.
>
> Having a single pointer means that we can:
>
> - Attempt a regular kmalloc() call; it will work most of the time.
> - Fall back to vmalloc() _if_ kmalloc() failed.

That's a good trick too - vmalloc memory is a bit slower on x86 since
it's not using a huge page, but that's only when we have lots of
CPUs/queues...

Short term - how about switching to vmalloc if > 32 queues?

> Frankly, if you want one tx queue per cpu, I would rather use
> NETIF_F_LLTX, like some other virtual devices.
>
> This way, you can have real per cpu memory, with proper NUMA affinity.
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, Jun 19, 2013 at 02:56:03AM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote:
>
> > Well KVM supports up to 160 VCPUs on x86.
> >
> > Creating a queue per CPU is very reasonable, and
> > assuming a cache line size of 64 bytes, netdev_queue seems to be 320
> > bytes, that's 320*160 = 51200 bytes. So 12.5 pages, an order-4 allocation.
> > I agree most people don't have such systems yet, but
> > they do exist.
>
> Even so, it will just work, like a fork() is likely to work, even if a
> process needs an order-1 allocation for its kernel stack.
>
> Some drivers still use order-10 allocations with kmalloc(), and nobody
> has complained yet.
>
> We only had complaints with the mlx4 driver lately because kmalloc() now
> gives a warning if allocations above MAX_ORDER are attempted.
>
> Having a single pointer means that we can:
>
> - Attempt a regular kmalloc() call; it will work most of the time.
> - Fall back to vmalloc() _if_ kmalloc() failed.

Most drivers create devices at boot time, when this is more likely to
work. What makes tun (and macvlan) a bit special is that the device is
created from userspace. Virt setups create/destroy them all the time.

> Frankly, if you want one tx queue per cpu, I would rather use
> NETIF_F_LLTX, like some other virtual devices.
>
> This way, you can have real per cpu memory, with proper NUMA affinity.

Hmm, good point, worth looking at. Thanks,

-- 
MST
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote:

> Well KVM supports up to 160 VCPUs on x86.
>
> Creating a queue per CPU is very reasonable, and
> assuming a cache line size of 64 bytes, netdev_queue seems to be 320
> bytes, that's 320*160 = 51200 bytes. So 12.5 pages, an order-4 allocation.
> I agree most people don't have such systems yet, but
> they do exist.

Even so, it will just work, like a fork() is likely to work, even if a
process needs an order-1 allocation for its kernel stack.

Some drivers still use order-10 allocations with kmalloc(), and nobody
has complained yet.

We only had complaints with the mlx4 driver lately because kmalloc() now
gives a warning if allocations above MAX_ORDER are attempted.

Having a single pointer means that we can:

- Attempt a regular kmalloc() call; it will work most of the time.
- Fall back to vmalloc() _if_ kmalloc() failed.

Frankly, if you want one tx queue per cpu, I would rather use
NETIF_F_LLTX, like some other virtual devices.

This way, you can have real per cpu memory, with proper NUMA affinity.
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Tue, Jun 18, 2013 at 11:31:58PM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote:
> > Currently, we use kcalloc to allocate rx/tx queues for a net device,
> > which can easily lead to a high order memory allocation request when
> > initializing a multiqueue net device. We can simply avoid this by
> > switching to a flex array, which always allocates at order zero.
> >
> > Signed-off-by: Jason Wang
> > ---
> >  include/linux/netdevice.h |   13 ++
> >  net/core/dev.c            |   57
> >  net/core/net-sysfs.c      |   15 +++
> >  3 files changed, 58 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 09b4188..c0b5d04 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -32,6 +32,7 @@
> >  #include <linux/atomic.h>
> >  #include <asm/cache.h>
> >  #include <asm/byteorder.h>
> > +#include <linux/flex_array.h>
> >
> >  #include <linux/percpu.h>
> >  #include <linux/rculist.h>
> > @@ -1230,7 +1231,7 @@ struct net_device {
> >
> >  #ifdef CONFIG_RPS
> > -	struct netdev_rx_queue *_rx;
> > +	struct flex_array *_rx;
> >
> >  	/* Number of RX queues allocated at register_netdev() time */
> >  	unsigned int num_rx_queues;
> > @@ -1250,7 +1251,7 @@ struct net_device {
> >  	/*
> >  	 * Cache lines mostly used on transmit path
> >  	 */
> > -	struct netdev_queue *_tx ____cacheline_aligned_in_smp;
> > +	struct flex_array *_tx ____cacheline_aligned_in_smp;
>
> Using flex_array and adding overhead in this super critical part of the
> network stack, only to avoid order-1 allocations done in GFP_KERNEL
> context, is simply insane.
>
> We can revisit this in 2050 if we ever need order-4 allocations or so,
> and still use 4K pages.

Well, KVM supports up to 160 VCPUs on x86.

Creating a queue per CPU is very reasonable, and assuming a cache line
size of 64 bytes, netdev_queue seems to be 320 bytes; that's 320*160 =
51200 bytes. So 12.5 pages, an order-4 allocation. I agree most people
don't have such systems yet, but they do exist.

We could cut the size of netdev_queue by moving out kobj - which does not
seem to be used on the data path - into a separate structure. kobj is 64
bytes in size, so netdev_queue would be exactly 256 bytes. That will get
us a smaller-order allocation, and there's some padding there, so we
won't immediately increase it the moment we add some fields. Comments on
this idea?

Instead of always using a flex array, we could have

+	struct netdev_queue *_tx; /* Used with small # of queues */
+#ifdef CONFIG_NETDEV_HUGE_NUMBER_OF_QUEUES
+	struct flex_array *_tx_large; /* Used with large # of queues */
+#endif

and fix the wrappers to use _tx if not NULL, otherwise _tx_large. If
configured in, it's an extra branch on the data path, but probably less
costly than the extra indirection.

-- 
MST
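[Editor's note: the dual-pointer scheme proposed above can be prototyped in
plain userspace C. Everything below is illustrative, not code from the
thread: the names (dev_alloc_queues, dev_get_queue), the chunk size, and the
32-queue threshold are all made up. Small devices keep a flat contiguous
array in `_tx`; large ones leave `_tx` NULL and index through a table of
fixed-size chunks, so only the large case pays the extra pointer chase.]

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct netdev_queue (the real one was ~320 bytes). */
struct queue { int id; char pad[316]; };

#define CHUNK_QUEUES 12  /* queues per roughly page-sized chunk */
#define SMALL_LIMIT  32  /* flat contiguous array below this    */

struct dev {
	struct queue  *_tx;       /* used with a small # of queues  */
	struct queue **_tx_large; /* chunk table for many queues    */
	unsigned int   num;
};

int dev_alloc_queues(struct dev *d, unsigned int num)
{
	unsigned int chunks, i;

	d->num = num;
	d->_tx = NULL;
	d->_tx_large = NULL;
	if (num <= SMALL_LIMIT) {
		d->_tx = calloc(num, sizeof(struct queue));
		return d->_tx ? 0 : -1;
	}
	chunks = (num + CHUNK_QUEUES - 1) / CHUNK_QUEUES;
	d->_tx_large = calloc(chunks, sizeof(struct queue *));
	if (!d->_tx_large)
		return -1;
	for (i = 0; i < chunks; i++) {
		d->_tx_large[i] = calloc(CHUNK_QUEUES, sizeof(struct queue));
		if (!d->_tx_large[i])
			return -1; /* teardown omitted in this sketch */
	}
	return 0;
}

/* The fast-path accessor: one extra branch for everyone, and the
 * extra indirection only when the chunked layout is in use - the
 * trade-off being weighed in the message above. */
struct queue *dev_get_queue(struct dev *d, unsigned int i)
{
	if (d->_tx)
		return &d->_tx[i];
	return &d->_tx_large[i / CHUNK_QUEUES][i % CHUNK_QUEUES];
}
```

With a power-of-two CHUNK_QUEUES the divide and modulo in the large path
would become shifts and masks, which is presumably how a real implementation
would lay it out.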
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On 06/19/2013 02:31 PM, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote:
>> Currently, we use kcalloc to allocate rx/tx queues for a net device,
>> which can easily lead to a high order memory allocation request when
>> initializing a multiqueue net device. We can simply avoid this by
>> switching to a flex array, which always allocates at order zero.
>>
>> Signed-off-by: Jason Wang
>> ---
>>  include/linux/netdevice.h |   13 ++
>>  net/core/dev.c            |   57
>>  net/core/net-sysfs.c      |   15 +++
>>  3 files changed, 58 insertions(+), 27 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 09b4188..c0b5d04 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -32,6 +32,7 @@
>>  #include <linux/atomic.h>
>>  #include <asm/cache.h>
>>  #include <asm/byteorder.h>
>> +#include <linux/flex_array.h>
>>
>>  #include <linux/percpu.h>
>>  #include <linux/rculist.h>
>> @@ -1230,7 +1231,7 @@ struct net_device {
>>
>>  #ifdef CONFIG_RPS
>> -	struct netdev_rx_queue *_rx;
>> +	struct flex_array *_rx;
>>
>>  	/* Number of RX queues allocated at register_netdev() time */
>>  	unsigned int num_rx_queues;
>> @@ -1250,7 +1251,7 @@ struct net_device {
>>  	/*
>>  	 * Cache lines mostly used on transmit path
>>  	 */
>> -	struct netdev_queue *_tx ____cacheline_aligned_in_smp;
>> +	struct flex_array *_tx ____cacheline_aligned_in_smp;
>
> Using flex_array and adding overhead in this super critical part of the
> network stack, only to avoid order-1 allocations done in GFP_KERNEL
> context, is simply insane.

Yes, and I also missed the fact that this is a GFP_KERNEL allocation.

> We can revisit this in 2050 if we ever need order-4 allocations or so,
> and still use 4K pages.

Will drop this patch, thanks.
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote:
> Currently, we use kcalloc to allocate rx/tx queues for a net device,
> which can easily lead to a high order memory allocation request when
> initializing a multiqueue net device. We can simply avoid this by
> switching to a flex array, which always allocates at order zero.
>
> Signed-off-by: Jason Wang
> ---
>  include/linux/netdevice.h |   13 ++
>  net/core/dev.c            |   57
>  net/core/net-sysfs.c      |   15 +++
>  3 files changed, 58 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 09b4188..c0b5d04 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -32,6 +32,7 @@
>  #include <linux/atomic.h>
>  #include <asm/cache.h>
>  #include <asm/byteorder.h>
> +#include <linux/flex_array.h>
>
>  #include <linux/percpu.h>
>  #include <linux/rculist.h>
> @@ -1230,7 +1231,7 @@ struct net_device {
>
>  #ifdef CONFIG_RPS
> -	struct netdev_rx_queue *_rx;
> +	struct flex_array *_rx;
>
>  	/* Number of RX queues allocated at register_netdev() time */
>  	unsigned int num_rx_queues;
> @@ -1250,7 +1251,7 @@ struct net_device {
>  	/*
>  	 * Cache lines mostly used on transmit path
>  	 */
> -	struct netdev_queue *_tx ____cacheline_aligned_in_smp;
> +	struct flex_array *_tx ____cacheline_aligned_in_smp;

Using flex_array and adding overhead in this super critical part of the
network stack, only to avoid order-1 allocations done in GFP_KERNEL
context, is simply insane.

We can revisit this in 2050 if we ever need order-4 allocations or so,
and still use 4K pages.
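[Editor's note: the "overhead" objected to here is structural. A flex_array
keeps its elements in separately allocated page-sized parts, so each lookup
costs a divide/modulo plus two dependent loads (the part pointer, then the
element), where a flat array needs a single base+offset. The minimal
userspace emulation below illustrates that lookup shape; it is not the
kernel's actual flex_array implementation, and the names flex_alloc/flex_get
are invented.]

```c
#include <assert.h>
#include <stdlib.h>

#define PART_SIZE 4096 /* one page-sized, i.e. order-0, piece per part */

struct flex {
	size_t esize;    /* element size                    */
	size_t per_part; /* elements that fit in one part   */
	void **parts;    /* table of order-0 allocations    */
};

struct flex *flex_alloc(size_t esize, size_t total)
{
	size_t per = PART_SIZE / esize;
	size_t nparts = (total + per - 1) / per;
	size_t i;
	struct flex *fa = malloc(sizeof(*fa));

	if (!fa)
		return NULL;
	fa->esize = esize;
	fa->per_part = per;
	fa->parts = calloc(nparts, sizeof(void *));
	for (i = 0; fa->parts && i < nparts; i++)
		fa->parts[i] = calloc(per, esize); /* error path elided */
	return fa;
}

/* Per-access cost: i / per_part, i % per_part, a load of parts[],
 * then the element itself - versus `&base[i]` for a flat array.
 * On a per-packet tx path that difference is what the objection
 * above is about. */
void *flex_get(struct flex *fa, size_t i)
{
	return (char *)fa->parts[i / fa->per_part]
	       + (i % fa->per_part) * fa->esize;
}
```

For the numbers in this thread (160 queues of 320 bytes, 12 per page), the
layout spreads the array over 14 order-0 parts instead of one order-4 block,
which is exactly the trade the patch proposed.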
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote: Currently, we use kcalloc to allocate rx/tx queues for a net device which could be easily lead to a high order memory allocation request when initializing a multiqueue net device. We can simply avoid this by switching to use flex array which always allocate at order zero. Signed-off-by: Jason Wang jasow...@redhat.com --- include/linux/netdevice.h | 13 ++ net/core/dev.c| 57 net/core/net-sysfs.c | 15 +++ 3 files changed, 58 insertions(+), 27 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 09b4188..c0b5d04 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -32,6 +32,7 @@ #include linux/atomic.h #include asm/cache.h #include asm/byteorder.h +#include linux/flex_array.h #include linux/percpu.h #include linux/rculist.h @@ -1230,7 +1231,7 @@ struct net_device { #ifdef CONFIG_RPS - struct netdev_rx_queue *_rx; + struct flex_array *_rx; /* Number of RX queues allocated at register_netdev() time */ unsigned intnum_rx_queues; @@ -1250,7 +1251,7 @@ struct net_device { /* * Cache lines mostly used on transmit path */ - struct netdev_queue *_tx cacheline_aligned_in_smp; + struct flex_array *_tx cacheline_aligned_in_smp; Using flex_array and adding overhead in this super critical part of network stack, only to avoid order-1 allocations done in GFP_KERNEL context is simply insane. We can revisit this in 2050 if we ever need order-4 allocations or so, and still use 4K pages. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On 06/19/2013 02:31 PM, Eric Dumazet wrote: On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote: Currently, we use kcalloc to allocate rx/tx queues for a net device which could be easily lead to a high order memory allocation request when initializing a multiqueue net device. We can simply avoid this by switching to use flex array which always allocate at order zero. Signed-off-by: Jason Wang jasow...@redhat.com --- include/linux/netdevice.h | 13 ++ net/core/dev.c| 57 net/core/net-sysfs.c | 15 +++ 3 files changed, 58 insertions(+), 27 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 09b4188..c0b5d04 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -32,6 +32,7 @@ #include linux/atomic.h #include asm/cache.h #include asm/byteorder.h +#include linux/flex_array.h #include linux/percpu.h #include linux/rculist.h @@ -1230,7 +1231,7 @@ struct net_device { #ifdef CONFIG_RPS -struct netdev_rx_queue *_rx; +struct flex_array *_rx; /* Number of RX queues allocated at register_netdev() time */ unsigned intnum_rx_queues; @@ -1250,7 +1251,7 @@ struct net_device { /* * Cache lines mostly used on transmit path */ -struct netdev_queue *_tx cacheline_aligned_in_smp; +struct flex_array *_tx cacheline_aligned_in_smp; Using flex_array and adding overhead in this super critical part of network stack, only to avoid order-1 allocations done in GFP_KERNEL context is simply insane. Yes, and I also miss the fact of GFP_KERNEL allocation. We can revisit this in 2050 if we ever need order-4 allocations or so, and still use 4K pages. Will drop this patch, thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Tue, Jun 18, 2013 at 11:31:58PM -0700, Eric Dumazet wrote: On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote: Currently, we use kcalloc to allocate rx/tx queues for a net device which could be easily lead to a high order memory allocation request when initializing a multiqueue net device. We can simply avoid this by switching to use flex array which always allocate at order zero. Signed-off-by: Jason Wang jasow...@redhat.com --- include/linux/netdevice.h | 13 ++ net/core/dev.c| 57 net/core/net-sysfs.c | 15 +++ 3 files changed, 58 insertions(+), 27 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 09b4188..c0b5d04 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -32,6 +32,7 @@ #include linux/atomic.h #include asm/cache.h #include asm/byteorder.h +#include linux/flex_array.h #include linux/percpu.h #include linux/rculist.h @@ -1230,7 +1231,7 @@ struct net_device { #ifdef CONFIG_RPS - struct netdev_rx_queue *_rx; + struct flex_array *_rx; /* Number of RX queues allocated at register_netdev() time */ unsigned intnum_rx_queues; @@ -1250,7 +1251,7 @@ struct net_device { /* * Cache lines mostly used on transmit path */ - struct netdev_queue *_tx cacheline_aligned_in_smp; + struct flex_array *_tx cacheline_aligned_in_smp; Using flex_array and adding overhead in this super critical part of network stack, only to avoid order-1 allocations done in GFP_KERNEL context is simply insane. We can revisit this in 2050 if we ever need order-4 allocations or so, and still use 4K pages. Well KVM supports up to 160 VCPUs on x86. Creating a queue per CPU is very reasonable, and assuming cache line size of 64 bytes, netdev_queue seems to be 320 bytes, that's 320*160 = 51200. So 12.5 pages, order-4 allocation. I agree most people don't have such systems yet, but they do exist. We can cut the size of netdev_queue, moving out kobj - which does not seem to be used on data path to a separate structure. 
It's 64 byte in size so exactly 256 bytes. That will get us an order-3 allocation, and there's some padding there so we won't immediately increase it the moment we add some fields. Comments on this idea? Instead of always using a flex array, we could have + struct netdev_queue *_tx; /* Used with small # of queues */ +#ifdef CONFIG_NETDEV_HUGE_NUMBER_OR_QUEUES + struct flex_array *_tx_large; /* Used with large # of queues */ +#endif And fix wrappers to use _tx if not NULL, otherwise _tx_large. If configured in, it's an extra branch on data path but probably less costly than the extra indirection. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote: Well KVM supports up to 160 VCPUs on x86. Creating a queue per CPU is very reasonable, and assuming cache line size of 64 bytes, netdev_queue seems to be 320 bytes, that's 320*160 = 51200. So 12.5 pages, order-4 allocation. I agree most people don't have such systems yet, but they do exist. Even so, it will just work, like a fork() is likely to work, even if a process needs order-1 allocation for kernel stack. Some drivers still use order-10 allocations with kmalloc(), and nobody complained yet. We had complains with mlx4 driver lately only bcause kmalloc() now gives a warning if allocations above MAX_ORDER are attempted. Having a single pointer means that we can : - Attempts a regular kmalloc() call, it will work most of the time. - fallback to vmalloc() _if_ kmalloc() failed. Frankly, if you want one tx queue per cpu, I would rather use NETIF_F_LLTX, like some other virtual devices. This way, you can have real per cpu memory, with proper NUMA affinity. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, Jun 19, 2013 at 02:56:03AM -0700, Eric Dumazet wrote: On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote: Well KVM supports up to 160 VCPUs on x86. Creating a queue per CPU is very reasonable, and assuming cache line size of 64 bytes, netdev_queue seems to be 320 bytes, that's 320*160 = 51200. So 12.5 pages, order-4 allocation. I agree most people don't have such systems yet, but they do exist. Even so, it will just work, like a fork() is likely to work, even if a process needs order-1 allocation for kernel stack. Some drivers still use order-10 allocations with kmalloc(), and nobody complained yet. We had complains with mlx4 driver lately only bcause kmalloc() now gives a warning if allocations above MAX_ORDER are attempted. Having a single pointer means that we can : - Attempts a regular kmalloc() call, it will work most of the time. - fallback to vmalloc() _if_ kmalloc() failed. Most drivers create devices at boot time, when this is more likely to work. What makes tun (and macvlan) a bit special is that the device is created from userspace. Virt setups create/destroy them all the time. Frankly, if you want one tx queue per cpu, I would rather use NETIF_F_LLTX, like some other virtual devices. This way, you can have real per cpu memory, with proper NUMA affinity. Hmm good point, worth looking at. Thanks, -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, Jun 19, 2013 at 02:56:03AM -0700, Eric Dumazet wrote: On Wed, 2013-06-19 at 12:11 +0300, Michael S. Tsirkin wrote: Well KVM supports up to 160 VCPUs on x86. Creating a queue per CPU is very reasonable, and assuming cache line size of 64 bytes, netdev_queue seems to be 320 bytes, that's 320*160 = 51200. So 12.5 pages, order-4 allocation. I agree most people don't have such systems yet, but they do exist. Even so, it will just work, like a fork() is likely to work, even if a process needs order-1 allocation for kernel stack. Some drivers still use order-10 allocations with kmalloc(), and nobody complained yet. We had complains with mlx4 driver lately only bcause kmalloc() now gives a warning if allocations above MAX_ORDER are attempted. Having a single pointer means that we can : - Attempts a regular kmalloc() call, it will work most of the time. - fallback to vmalloc() _if_ kmalloc() failed. That's a good trick too - vmalloc memory is a bit slower on x86 since it's not using a huge page, but that's only when we have lots of CPUs/queues... Short term - how about switching to vmalloc if 32 queues? Frankly, if you want one tx queue per cpu, I would rather use NETIF_F_LLTX, like some other virtual devices. This way, you can have real per cpu memory, with proper NUMA affinity. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 18:40 +0300, Michael S. Tsirkin wrote:

> That's a good trick too - vmalloc memory is a bit slower on x86 since
> it's not using a huge page, but that's only when we have lots of
> CPUs/queues...
>
> Short term - how about switching to vmalloc if > 32 queues?

You do not have to switch if > 32 queues

diff --git a/net/core/dev.c b/net/core/dev.c
index fa007db..473cc9e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
 #include <linux/cpu_rmap.h>
 #include <linux/static_key.h>
 #include <linux/hashtable.h>
+#include <linux/vmalloc.h>
 
 #include "net-sysfs.h"
 
@@ -5253,6 +5254,14 @@ static void netdev_init_one_queue(struct net_device *dev,
 #endif
 }
 
+static void netif_free_tx_queues(struct net_device *dev)
+{
+	if (is_vmalloc_addr(dev->_tx))
+		vfree(dev->_tx);
+	else
+		kfree(dev->_tx);
+}
+
 static int netif_alloc_netdev_queues(struct net_device *dev)
 {
 	unsigned int count = dev->num_tx_queues;
@@ -5261,9 +5270,11 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
 
 	BUG_ON(count < 1);
 	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);
-	if (!tx)
-		return -ENOMEM;
-
+	if (!tx) {
+		tx = vzalloc(count * sizeof(struct netdev_queue));
+		if (!tx)
+			return -ENOMEM;
+	}
 	dev->_tx = tx;
 
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
@@ -5811,7 +5822,7 @@ free_all:
 
 free_pcpu:
 	free_percpu(dev->pcpu_refcnt);
-	kfree(dev->_tx);
+	netif_free_tx_queues(dev);
 #ifdef CONFIG_RPS
 	kfree(dev->_rx);
 #endif
@@ -5836,7 +5847,7 @@ void free_netdev(struct net_device *dev)
 
 	release_net(dev_net(dev));
 
-	kfree(dev->_tx);
+	netif_free_tx_queues(dev);
 #ifdef CONFIG_RPS
 	kfree(dev->_rx);
 #endif
RE: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, 2013-06-19 at 17:06 +0100, David Laight wrote:

> Given the number of places I've seen this code added, why not
> put it in a general helper?

Good luck with that. We had many attempts in the past.

> I also thought that kmalloc() with GFP_KERNEL would sleep.
> Under what conditions does it fail instead?

For example on a 32-bit kernel: LOW memory exhaustion/fragmentation.

vmalloc() has an extra 128MB of virtual space (on i386 at least, with
the standard 3G/1G split)
RE: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
> +static void netif_free_tx_queues(struct net_device *dev)
> +{
> +	if (is_vmalloc_addr(dev->_tx))
> +		vfree(dev->_tx);
> +	else
> +		kfree(dev->_tx);
> +}
> +
>  static int netif_alloc_netdev_queues(struct net_device *dev)
>  {
>  	unsigned int count = dev->num_tx_queues;
> @@ -5261,9 +5270,11 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
> 
>  	BUG_ON(count < 1);
>  	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);
> -	if (!tx)
> -		return -ENOMEM;
> -
> +	if (!tx) {
> +		tx = vzalloc(count * sizeof(struct netdev_queue));
> +		if (!tx)
> +			return -ENOMEM;
> +	}
>  	dev->_tx = tx;

Given the number of places I've seen this code added, why not
put it in a general helper?

I also thought that kmalloc() with GFP_KERNEL would sleep.
Under what conditions does it fail instead?

	David
Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
On Wed, Jun 19, 2013 at 08:58:31AM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 18:40 +0300, Michael S. Tsirkin wrote:
>
> > That's a good trick too - vmalloc memory is a bit slower on x86 since
> > it's not using a huge page, but that's only when we have lots of
> > CPUs/queues...
> >
> > Short term - how about switching to vmalloc if > 32 queues?
>
> You do not have to switch if > 32 queues
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fa007db..473cc9e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -130,6 +130,7 @@
>  #include <linux/cpu_rmap.h>
>  #include <linux/static_key.h>
>  #include <linux/hashtable.h>
> +#include <linux/vmalloc.h>
>  
>  #include "net-sysfs.h"
>  
> @@ -5253,6 +5254,14 @@ static void netdev_init_one_queue(struct net_device *dev,
>  #endif
>  }
>  
> +static void netif_free_tx_queues(struct net_device *dev)
> +{
> +	if (is_vmalloc_addr(dev->_tx))
> +		vfree(dev->_tx);
> +	else
> +		kfree(dev->_tx);
> +}
> +
>  static int netif_alloc_netdev_queues(struct net_device *dev)
>  {
>  	unsigned int count = dev->num_tx_queues;
> @@ -5261,9 +5270,11 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
>  
>  	BUG_ON(count < 1);
>  	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);

As we have explicit code to recover, maybe set __GFP_NOWARN?
If we are out of memory, vzalloc will warn too, we don't need two
warnings.

> -	if (!tx)
> -		return -ENOMEM;
> -
> +	if (!tx) {
> +		tx = vzalloc(count * sizeof(struct netdev_queue));
> +		if (!tx)
> +			return -ENOMEM;
> +	}
>  	dev->_tx = tx;
>  
>  	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
>  
> @@ -5811,7 +5822,7 @@ free_all:
>  
>  free_pcpu:
>  	free_percpu(dev->pcpu_refcnt);
> -	kfree(dev->_tx);
> +	netif_free_tx_queues(dev);
>  #ifdef CONFIG_RPS
>  	kfree(dev->_rx);
>  #endif
> 
> @@ -5836,7 +5847,7 @@ void free_netdev(struct net_device *dev)
>  
>  	release_net(dev_net(dev));
>  
> -	kfree(dev->_tx);
> +	netif_free_tx_queues(dev);
>  #ifdef CONFIG_RPS
>  	kfree(dev->_rx);
>  #endif