Re: [PATCHv2 10/14] virtio_net: limit xmit polling
Michael S. Tsirkin m...@redhat.com wrote on 05/23/2011 04:49:00 PM:

> > To do this properly, we should really be using the actual number of sg
> > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > know how many.
> >
> > Cheers,
> > Rusty.
>
> Maybe I'm confused here.  The problem isn't the failing add_buf for the
> given skb IIUC.  What we are trying to do here is stop the queue *before
> xmit_skb fails*.  We can't look at the number of fragments in the
> current skb - the next one can be much larger.  That's why we check
> capacity after xmit_skb, not before it, right?

Maybe Rusty means it is a simpler model to free the amount of space that
this xmit needs.  We will still fail anyway at some time, but it is
unlikely, since the earlier iteration freed up at least the space that it
was going to use.  The code could become much simpler:

start_xmit()
{
	num_sgs = get num_sgs for this skb;

	/* Free enough pending old buffers to enable queueing this one */
	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */

	if (virtqueue_get_capacity() < num_sgs) {
		netif_stop_queue(dev);
		if (virtqueue_enable_cb_delayed(vi->svq) ||
		    free_old_xmit_skbs(vi, num_sgs)) {
			/* Nothing freed up, or not enough freed up */
			kfree_skb(skb);
			return NETDEV_TX_OK;
		}
		netif_start_queue(dev);
		virtqueue_disable_cb(vi->svq);
	}

	/* xmit_skb cannot fail now, also pass 'num_sgs' */
	xmit_skb(vi, skb, num_sgs);
	virtqueue_kick(vi->svq);

	skb_orphan(skb);
	nf_reset(skb);

	return NETDEV_TX_OK;
}

We could even return TX_BUSY, since that makes the dequeue code more
efficient.  See dev_dequeue_skb() - you can skip a lot of code (and avoid
taking locks) that checks whether the queue is already stopped, but that
code runs only if you returned TX_BUSY in the earlier iteration.

BTW, shouldn't the check in start_xmit be:

	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
		...
	}

Thanks,

- KK
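The pseudo-code above calls free_old_xmit_skbs(vi, n) with a descriptor
target but leaves the helper undefined.  Below is a minimal sketch of one
possible reading, written against the 2.6.39-era drivers/net/virtio_net.c;
the 'target' parameter, the return convention, and passing the virtqueue
to virtqueue_get_capacity() (an API from the patch series under
discussion) are assumptions made here for illustration, not KK's actual
code:

static int free_old_xmit_skbs(struct virtnet_info *vi, unsigned int target)
{
	struct sk_buff *skb;
	unsigned int len;

	/* Reclaim completed tx buffers until 'target' descriptors are
	 * available or the used ring runs dry. */
	while (virtqueue_get_capacity(vi->svq) < target &&
	       (skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
		dev_kfree_skb_any(skb);
	}

	/* Non-zero means "nothing freed up, or not enough freed up",
	 * matching the checks in the pseudo-code above. */
	return virtqueue_get_capacity(vi->svq) < target ? -ENOSPC : 0;
}

With indirect descriptors each completed skb hands back a single
descriptor-table entry, so several get_buf() calls may be needed to reach
the target; with directly laid-out buffers one completion can free many
entries at once.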
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 01:24:15PM +0530, Krishna Kumar2 wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 05/23/2011 04:49:00 PM:
>
> > > To do this properly, we should really be using the actual number of
> > > sg elements needed, but we'd have to do most of xmit_skb beforehand
> > > so we know how many.
> > >
> > > Cheers,
> > > Rusty.
> >
> > Maybe I'm confused here.  The problem isn't the failing add_buf for
> > the given skb IIUC.  What we are trying to do here is stop the queue
> > *before xmit_skb fails*.  We can't look at the number of fragments in
> > the current skb - the next one can be much larger.  That's why we
> > check capacity after xmit_skb, not before it, right?
>
> Maybe Rusty means it is a simpler model to free the amount of space that
> this xmit needs.  We will still fail anyway at some time, but it is
> unlikely, since the earlier iteration freed up at least the space that
> it was going to use.

Not sure I understand.  We can't know space is freed in the previous
iteration as buffers might not have been used by then.

> The code could become much simpler:
>
> start_xmit()
> {
> 	num_sgs = get num_sgs for this skb;
>
> 	/* Free enough pending old buffers to enable queueing this one */
> 	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */
>
> 	if (virtqueue_get_capacity() < num_sgs) {
> 		netif_stop_queue(dev);
> 		if (virtqueue_enable_cb_delayed(vi->svq) ||
> 		    free_old_xmit_skbs(vi, num_sgs)) {
> 			/* Nothing freed up, or not enough freed up */
> 			kfree_skb(skb);
> 			return NETDEV_TX_OK;

This packet drop is what we wanted to avoid.

> 		}
> 		netif_start_queue(dev);
> 		virtqueue_disable_cb(vi->svq);
> 	}
>
> 	/* xmit_skb cannot fail now, also pass 'num_sgs' */
> 	xmit_skb(vi, skb, num_sgs);
> 	virtqueue_kick(vi->svq);
>
> 	skb_orphan(skb);
> 	nf_reset(skb);
>
> 	return NETDEV_TX_OK;
> }
>
> We could even return TX_BUSY, since that makes the dequeue code more
> efficient.  See dev_dequeue_skb() - you can skip a lot of code (and
> avoid taking locks) that checks whether the queue is already stopped,
> but that code runs only if you returned TX_BUSY in the earlier
> iteration.
>
> BTW, shouldn't the check in start_xmit be:
>
> 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> 		...
> 	}
>
> Thanks,
>
> - KK

I thought we used to do basically this but other devices moved to a model
where they stop *before* queueing fails, so we did too.

-- 
MST
2.6.40 event idx patches
Just checking: were you going to send the following to Linus for 2.6.40?

virtio:event_index_interface.patch
virtio:ring_inline_function_to_check_for_events.patch
virtio:ring_support_event_idx_feature.patch
misc:vhost_support_event_index.patch
virtio:test_support_event_index.patch
virtio:add_api_for_delayed_callbacks.patch
virtio:net_delay_tx_callbacks.patch

I certainly hope so as these modify both host and guest which makes
testing them out of tree hard for people.  Once stuff lands on Linus'
tree we can patch qemu and get more people to try out the patches.

Also, Linus said it's going to be a short window so ...

-- 
MST
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 02:42:55 PM:

> > > > To do this properly, we should really be using the actual number
> > > > of sg elements needed, but we'd have to do most of xmit_skb
> > > > beforehand so we know how many.
> > > >
> > > > Cheers,
> > > > Rusty.
> > >
> > > Maybe I'm confused here.  The problem isn't the failing add_buf for
> > > the given skb IIUC.  What we are trying to do here is stop the queue
> > > *before xmit_skb fails*.  We can't look at the number of fragments
> > > in the current skb - the next one can be much larger.  That's why we
> > > check capacity after xmit_skb, not before it, right?
> >
> > Maybe Rusty means it is a simpler model to free the amount of space
> > that this xmit needs.  We will still fail anyway at some time, but it
> > is unlikely, since the earlier iteration freed up at least the space
> > that it was going to use.
>
> Not sure I understand.  We can't know space is freed in the previous
> iteration as buffers might not have been used by then.

Yes, the first few iterations may not have freed up space, but later ones
should.  The amount of free space should increase from then on,
especially since we try to free double of what we consume.

> > The code could become much simpler:
> >
> > start_xmit()
> > {
> > 	num_sgs = get num_sgs for this skb;
> >
> > 	/* Free enough pending old buffers to enable queueing this one */
> > 	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */
> >
> > 	if (virtqueue_get_capacity() < num_sgs) {
> > 		netif_stop_queue(dev);
> > 		if (virtqueue_enable_cb_delayed(vi->svq) ||
> > 		    free_old_xmit_skbs(vi, num_sgs)) {
> > 			/* Nothing freed up, or not enough freed up */
> > 			kfree_skb(skb);
> > 			return NETDEV_TX_OK;
>
> This packet drop is what we wanted to avoid.

Please see below on returning NETDEV_TX_BUSY.

> > 		}
> > 		netif_start_queue(dev);
> > 		virtqueue_disable_cb(vi->svq);
> > 	}
> >
> > 	/* xmit_skb cannot fail now, also pass 'num_sgs' */
> > 	xmit_skb(vi, skb, num_sgs);
> > 	virtqueue_kick(vi->svq);
> >
> > 	skb_orphan(skb);
> > 	nf_reset(skb);
> >
> > 	return NETDEV_TX_OK;
> > }
> >
> > We could even return TX_BUSY, since that makes the dequeue code more
> > efficient.  See dev_dequeue_skb() - you can skip a lot of code (and
> > avoid taking locks) that checks whether the queue is already stopped,
> > but that code runs only if you returned TX_BUSY in the earlier
> > iteration.
> >
> > BTW, shouldn't the check in start_xmit be:
> >
> > 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> > 		...
> > 	}
> >
> > Thanks,
> >
> > - KK
>
> I thought we used to do basically this but other devices moved to a
> model where they stop *before* queueing fails, so we did too.

I am not sure why it was changed, since returning TX_BUSY seems more
efficient IMHO.  qdisc_restart() handles requeue'd packets much better
than a stopped queue, as a significant part of this code is skipped if
gso_skb is present (qdisc will eventually start dropping packets when
tx_queue_len is exceeded anyway).

Thanks,

- KK
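For comparison, here is how the stop path of the hypothetical start_xmit()
sketched earlier in the thread could return NETDEV_TX_BUSY instead of
dropping the skb.  It is a fragment reusing the names and conventions of
that pseudo-code (free_old_xmit_skbs() returning non-zero when not enough
space could be reclaimed), an illustration of the argument rather than a
submitted patch:

	if (virtqueue_get_capacity(vi->svq) < num_sgs) {
		netif_stop_queue(dev);
		if (virtqueue_enable_cb_delayed(vi->svq) ||
		    free_old_xmit_skbs(vi, num_sgs)) {
			/*
			 * Still not enough room: keep the queue stopped and
			 * hand the skb back to the qdisc instead of dropping
			 * it.  sch_direct_xmit() requeues it (q->gso_skb) and
			 * qdisc_restart() retries it once the queue wakes.
			 */
			return NETDEV_TX_BUSY;
		}
		/* Enough room was reclaimed after all. */
		netif_start_queue(dev);
		virtqueue_disable_cb(vi->svq);
	}

This is exactly the trade-off debated in the following messages: the
requeued skb is retried ahead of possibly higher-priority traffic, but
while the queue is stopped the qdisc dequeue path is cheaper.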
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 02:57:43PM +0530, Krishna Kumar2 wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 02:42:55 PM:
>
> > > > > To do this properly, we should really be using the actual number
> > > > > of sg elements needed, but we'd have to do most of xmit_skb
> > > > > beforehand so we know how many.
> > > > >
> > > > > Cheers,
> > > > > Rusty.
> > > >
> > > > Maybe I'm confused here.  The problem isn't the failing add_buf
> > > > for the given skb IIUC.  What we are trying to do here is stop the
> > > > queue *before xmit_skb fails*.  We can't look at the number of
> > > > fragments in the current skb - the next one can be much larger.
> > > > That's why we check capacity after xmit_skb, not before it, right?
> > >
> > > Maybe Rusty means it is a simpler model to free the amount of space
> > > that this xmit needs.  We will still fail anyway at some time, but
> > > it is unlikely, since the earlier iteration freed up at least the
> > > space that it was going to use.
> >
> > Not sure I understand.  We can't know space is freed in the previous
> > iteration as buffers might not have been used by then.
>
> Yes, the first few iterations may not have freed up space, but later
> ones should.  The amount of free space should increase from then on,
> especially since we try to free double of what we consume.

Hmm.  This is only an upper limit on the # of entries in the queue.
Assume that vq size is 4 and we transmit 4 entries without getting
anything in the used ring.  The next transmit will fail.  So I don't
really see why it's unlikely that we reach the packet drop code with your
patch.

> > > The code could become much simpler:
> > >
> > > start_xmit()
> > > {
> > > 	num_sgs = get num_sgs for this skb;
> > >
> > > 	/* Free enough pending old buffers to enable queueing this one */
> > > 	free_old_xmit_skbs(vi, num_sgs * 2);	/* ?? */
> > >
> > > 	if (virtqueue_get_capacity() < num_sgs) {
> > > 		netif_stop_queue(dev);
> > > 		if (virtqueue_enable_cb_delayed(vi->svq) ||
> > > 		    free_old_xmit_skbs(vi, num_sgs)) {
> > > 			/* Nothing freed up, or not enough freed up */
> > > 			kfree_skb(skb);
> > > 			return NETDEV_TX_OK;
> >
> > This packet drop is what we wanted to avoid.
>
> Please see below on returning NETDEV_TX_BUSY.
>
> > > 		}
> > > 		netif_start_queue(dev);
> > > 		virtqueue_disable_cb(vi->svq);
> > > 	}
> > >
> > > 	/* xmit_skb cannot fail now, also pass 'num_sgs' */
> > > 	xmit_skb(vi, skb, num_sgs);
> > > 	virtqueue_kick(vi->svq);
> > >
> > > 	skb_orphan(skb);
> > > 	nf_reset(skb);
> > >
> > > 	return NETDEV_TX_OK;
> > > }
> > >
> > > We could even return TX_BUSY, since that makes the dequeue code more
> > > efficient.  See dev_dequeue_skb() - you can skip a lot of code (and
> > > avoid taking locks) that checks whether the queue is already
> > > stopped, but that code runs only if you returned TX_BUSY in the
> > > earlier iteration.
> > >
> > > BTW, shouldn't the check in start_xmit be:
> > >
> > > 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> > > 		...
> > > 	}
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > I thought we used to do basically this but other devices moved to a
> > model where they stop *before* queueing fails, so we did too.
>
> I am not sure why it was changed, since returning TX_BUSY seems more
> efficient IMHO.  qdisc_restart() handles requeue'd packets much better
> than a stopped queue, as a significant part of this code is skipped if
> gso_skb is present

I think this is the argument:
http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html

> (qdisc will eventually start dropping packets when tx_queue_len is
> exceeded anyway).
>
> Thanks,
>
> - KK

tx_queue_len is a pretty large buffer so maybe no.

I think the packet drops from the scheduler queue can also be done
intelligently (e.g. with CHOKe) which should work better than dropping a
random packet?

-- 
MST
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 04:59:39 PM:

> > > > Maybe Rusty means it is a simpler model to free the amount of
> > > > space that this xmit needs.  We will still fail anyway at some
> > > > time, but it is unlikely, since the earlier iteration freed up at
> > > > least the space that it was going to use.
> > >
> > > Not sure I understand.  We can't know space is freed in the previous
> > > iteration as buffers might not have been used by then.
> >
> > Yes, the first few iterations may not have freed up space, but later
> > ones should.  The amount of free space should increase from then on,
> > especially since we try to free double of what we consume.
>
> Hmm.  This is only an upper limit on the # of entries in the queue.
> Assume that vq size is 4 and we transmit 4 entries without getting
> anything in the used ring.  The next transmit will fail.  So I don't
> really see why it's unlikely that we reach the packet drop code with
> your patch.

I was assuming 256 entries :)  I will try to get some numbers to see how
often it is true tomorrow.

> > I am not sure why it was changed, since returning TX_BUSY seems more
> > efficient IMHO.  qdisc_restart() handles requeue'd packets much better
> > than a stopped queue, as a significant part of this code is skipped if
> > gso_skb is present
>
> I think this is the argument:
> http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html

Thanks for digging up that thread!  Yes, that one skb would get sent
first ahead of possibly higher priority skbs.  However, from a
performance point of view, the TX_BUSY path skips a lot of checks and
code for all subsequent packets till the device is restarted.  I can test
performance with both cases and report what I find (the requeue code has
become very simple and clean from horribly complex, thanks to Herbert and
Dave).

> > (qdisc will eventually start dropping packets when tx_queue_len is
>
> tx_queue_len is a pretty large buffer so maybe no.

I remember seeing tons of drops (pfifo_fast_enqueue) when xmit returns
TX_BUSY.

> I think the packet drops from the scheduler queue can also be done
> intelligently (e.g. with CHOKe) which should work better than dropping a
> random packet?

I am not sure of that - choke_enqueue checks against a random skb to drop
the current skb, and also during congestion.  But for my sample driver
xmit, returning TX_BUSY could still allow it to be used with CHOKe.

thanks,

- KK
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Tue, May 24, 2011 at 06:20:35PM +0530, Krishna Kumar2 wrote:
> Michael S. Tsirkin m...@redhat.com wrote on 05/24/2011 04:59:39 PM:
>
> > > > > Maybe Rusty means it is a simpler model to free the amount of
> > > > > space that this xmit needs.  We will still fail anyway at some
> > > > > time, but it is unlikely, since the earlier iteration freed up
> > > > > at least the space that it was going to use.
> > > >
> > > > Not sure I understand.  We can't know space is freed in the
> > > > previous iteration as buffers might not have been used by then.
> > >
> > > Yes, the first few iterations may not have freed up space, but later
> > > ones should.  The amount of free space should increase from then on,
> > > especially since we try to free double of what we consume.
> >
> > Hmm.  This is only an upper limit on the # of entries in the queue.
> > Assume that vq size is 4 and we transmit 4 entries without getting
> > anything in the used ring.  The next transmit will fail.  So I don't
> > really see why it's unlikely that we reach the packet drop code with
> > your patch.
>
> I was assuming 256 entries :)  I will try to get some numbers to see how
> often it is true tomorrow.

That would depend on how fast the hypervisor is.  Try doing something to
make the hypervisor slower than the guest.  I don't think we need
measurements to realize that with the host being slower than the guest
that would happen a lot, though.

> > > I am not sure why it was changed, since returning TX_BUSY seems more
> > > efficient IMHO.  qdisc_restart() handles requeue'd packets much
> > > better than a stopped queue, as a significant part of this code is
> > > skipped if gso_skb is present
> >
> > I think this is the argument:
> > http://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg06364.html
>
> Thanks for digging up that thread!  Yes, that one skb would get sent
> first ahead of possibly higher priority skbs.  However, from a
> performance point of view, the TX_BUSY path skips a lot of checks and
> code for all subsequent packets till the device is restarted.  I can
> test performance with both cases and report what I find (the requeue
> code has become very simple and clean from horribly complex, thanks to
> Herbert and Dave).

Cc Herbert, and try to convince him :)

> > > (qdisc will eventually start dropping packets when tx_queue_len is
> >
> > tx_queue_len is a pretty large buffer so maybe no.
>
> I remember seeing tons of drops (pfifo_fast_enqueue) when xmit returns
> TX_BUSY.
>
> > I think the packet drops from the scheduler queue can also be done
> > intelligently (e.g. with CHOKe) which should work better than dropping
> > a random packet?
>
> I am not sure of that - choke_enqueue checks against a random skb to
> drop the current skb, and also during congestion.  But for my sample
> driver xmit, returning TX_BUSY could still allow it to be used with
> CHOKe.
>
> thanks,
>
> - KK
Re: [PATCH] arch/tile: add /proc/tile, /proc/sys/tile, and a sysfs cpu attribute
On Thursday 19 May 2011, Arnd Bergmann wrote:
> > # cat /proc/tile/board
> > board_part: 402-2-05
> > board_serial: NBS-5002-00012
> > chip_serial: P62338.01.110
> > chip_revision: A0
> > board_revision: 2.2
> > board_description: Tilera TILExpressPro-64, TILEPro64 processor (866 MHz-capable), 1 10GbE, 6 1GbE
> > # cat /proc/tile/switch
> > control: mdio gbe/0
>
> I think it's ok to have it below /sys/hypervisor, because the
> information is provided through a hypervisor ABI, even though it
> describes something else.  This is more like /sys/firmware, but the
> boundaries between that and /sys/hypervisor are not clearly defined
> when running virtualized anyway.

A minor point that I meant to bring up but had not gotten to: when you do
a /sys/hypervisor/ interface, put everything into a subdirectory under
/sys/hypervisor with the name of your hypervisor, to avoid naming
conflicts, e.g. /sys/hypervisor/tilera-hv/board/board_serial

	Arnd
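A minimal sketch of the layout being suggested, assuming
CONFIG_SYS_HYPERVISOR is enabled so that hypervisor_kobj exists.  The
directory name, the init function, the attribute group, and the
hard-coded value are illustrative assumptions only, not the actual
arch/tile patch (which would read the values from the hypervisor):

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/init.h>
#include <linux/errno.h>

/* Illustrative attribute; real values would come from the hypervisor. */
static ssize_t board_serial_show(struct kobject *kobj,
				 struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%s\n", "NBS-5002-00012");
}

static struct kobj_attribute board_serial_attr = __ATTR_RO(board_serial);

static struct attribute *board_attrs[] = {
	&board_serial_attr.attr,
	NULL,
};

static struct attribute_group board_attr_group = {
	.name	= "board",	/* named group -> .../tilera-hv/board/ */
	.attrs	= board_attrs,
};

static struct kobject *tile_hv_kobj;

static int __init tile_hv_sysfs_init(void)
{
	/* Creates /sys/hypervisor/tilera-hv/ */
	tile_hv_kobj = kobject_create_and_add("tilera-hv", hypervisor_kobj);
	if (!tile_hv_kobj)
		return -ENOMEM;

	/* Exposes /sys/hypervisor/tilera-hv/board/board_serial */
	return sysfs_create_group(tile_hv_kobj, &board_attr_group);
}
late_initcall(tile_hv_sysfs_init);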
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > Can we hit problems with OOM?  Sure, but no worse than now...
> > The problem is that this virtqueue_get_capacity() returns the worst
> > case, not the normal case.  So using it is deceptive.
>
> Maybe just document this?

Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3 to
a 6 on the API hard-to-misuse scale.

How about, virtqueue_min_capacity()?  Makes the reader realize something
weird is going on.

> I still believe capacity really needs to be decided at the virtqueue
> level, not in the driver.  E.g. with indirect each skb uses a single
> entry: freeing 1 small skb is always enough to have space for a large
> one.
>
> I do understand how it seems a waste to leave direct space in the ring
> while we might in practice have space due to indirect.  Didn't come up
> with a nice way to solve this yet - but 'no worse than now :)'

Agreed.  I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries
to make sure we have enough space in the buffer.  Another way to do that
is with a define :).

> > To do this properly, we should really be using the actual number of
> > sg elements needed, but we'd have to do most of xmit_skb beforehand
> > so we know how many.
> >
> > Cheers,
> > Rusty.
>
> Maybe I'm confused here.  The problem isn't the failing add_buf for the
> given skb IIUC.  What we are trying to do here is stop the queue
> *before xmit_skb fails*.  We can't look at the number of fragments in
> the current skb - the next one can be much larger.  That's why we check
> capacity after xmit_skb, not before it, right?

No, I was confused... More coffee!

Thanks,
Rusty.
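For concreteness, a sketch of what the renamed helper might look like on
top of the 2.6.39-era drivers/virtio/virtio_ring.c (an illustration, not
the patch under review).  It simply reports the worst-case room, i.e. the
number of free descriptor-table entries, which is why "min_capacity" is a
less misleading name than "get_capacity" when indirect descriptors let
the ring hold many more buffers in practice:

/* to_vvq() and num_free are existing internals of virtio_ring.c. */
unsigned int virtqueue_min_capacity(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	/* Worst case: every buffer added from here on is laid out
	 * directly and consumes its full count of descriptors. */
	return vq->num_free;
}
EXPORT_SYMBOL_GPL(virtqueue_min_capacity);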
Re: 2.6.40 event idx patches
On Tue, 24 May 2011 12:23:45 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> Just checking: were you going to send the following to Linus for 2.6.40?
>
> virtio:event_index_interface.patch
> virtio:ring_inline_function_to_check_for_events.patch
> virtio:ring_support_event_idx_feature.patch
> misc:vhost_support_event_index.patch
> virtio:test_support_event_index.patch
> virtio:add_api_for_delayed_callbacks.patch
> virtio:net_delay_tx_callbacks.patch
>
> I certainly hope so as these modify both host and guest which makes
> testing them out of tree hard for people.  Once stuff lands on Linus'
> tree we can patch qemu and get more people to try out the patches.
>
> Also, Linus said it's going to be a short window so ...

Yes.  They've been in linux-next since the weekend (ie. today will be the
third linux-next build), and my plan was to push them Friday.  Nothing
goes into linux-next which is not for Linus until after Linus closes the
merge window.

I also want to find a few cycles to adapt lguest; it's *always* good to
have a second implementation.

Cheers,
Rusty.
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> I do understand how it seems a waste to leave direct space in the ring
> while we might in practice have space due to indirect.  Didn't come up
> with a nice way to solve this yet - but 'no worse than now :)'

Let's just make it bool free_old_xmit_skbs(unsigned int max).

max == 2 for the normal xmit path, so we're low latency but we keep ahead
on average.  max == -1 for the "we're out of capacity, we may have to
stop the queue" case.

That keeps it simple and probably the right thing...

Thanks,
Rusty.
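A sketch of the bounded helper described above, written against the
2.6.39-era driver.  The virtnet_info argument, the stats updates, and the
meaning of the boolean return ("did we free anything") are assumptions
for illustration; Rusty's one-liner only fixes the signature:

static bool free_old_xmit_skbs(struct virtnet_info *vi, unsigned int max)
{
	bool freed = false;
	struct sk_buff *skb;
	unsigned int len;

	/* Reclaim at most 'max' completed tx buffers.  max == 2 on the
	 * normal xmit path keeps latency low while staying ahead of the
	 * producer on average; max == -1 (i.e. UINT_MAX) drains everything
	 * when we are out of capacity and may have to stop the queue. */
	while (max-- && (skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
		vi->dev->stats.tx_bytes += skb->len;
		vi->dev->stats.tx_packets++;
		dev_kfree_skb_any(skb);
		freed = true;
	}

	return freed;
}

The fast path would then call free_old_xmit_skbs(vi, 2) after queueing
each skb, and free_old_xmit_skbs(vi, -1) just before deciding whether the
queue really has to be stopped.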
Re: [PATCHv2 10/14] virtio_net: limit xmit polling
On Wed, May 25, 2011 at 10:58:26AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, Michael S. Tsirkin m...@redhat.com wrote:
> > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > > Can we hit problems with OOM?  Sure, but no worse than now...
> > > The problem is that this virtqueue_get_capacity() returns the worst
> > > case, not the normal case.  So using it is deceptive.
> >
> > Maybe just document this?
>
> Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3
> to a 6 on the API hard-to-misuse scale.
>
> How about, virtqueue_min_capacity()?  Makes the reader realize something
> weird is going on.

Absolutely.  Great idea.

> > I still believe capacity really needs to be decided at the virtqueue
> > level, not in the driver.  E.g. with indirect each skb uses a single
> > entry: freeing 1 small skb is always enough to have space for a large
> > one.
> >
> > I do understand how it seems a waste to leave direct space in the ring
> > while we might in practice have space due to indirect.  Didn't come up
> > with a nice way to solve this yet - but 'no worse than now :)'
>
> Agreed.  I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries
> to make sure we have enough space in the buffer.  Another way to do that
> is with a define :).
>
> > > To do this properly, we should really be using the actual number of
> > > sg elements needed, but we'd have to do most of xmit_skb beforehand
> > > so we know how many.
> > >
> > > Cheers,
> > > Rusty.
> >
> > Maybe I'm confused here.  The problem isn't the failing add_buf for
> > the given skb IIUC.  What we are trying to do here is stop the queue
> > *before xmit_skb fails*.  We can't look at the number of fragments in
> > the current skb - the next one can be much larger.  That's why we
> > check capacity after xmit_skb, not before it, right?
>
> No, I was confused... More coffee!
>
> Thanks,
> Rusty.