Re: [PATCH] stopmachine: add stopmachine_timeout
> >> So I think monotonic wallclock time actually makes the most sense here.
> >
> > This is asking for trouble... a config option to disable this would be
> > nice. But as I don't know which problem this patch originally addresses
> > it might be that this is needed anyway. So lets see why we need it first.
>
> How about this. We'll make this a sysctl, as Rusty already did, and set the
> default to 0 which means "never timeout". That way crazy people like me who
> care about this scenario can enable this feature.

Yes, sounds like that should work for everyone and no additional config
option as well. Sounds good to me.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: Max Krasnyansky <[EMAIL PROTECTED]>
Date: Sat, 12 Jul 2008 01:52:54 -0700

> This is on top of the latest and greatest :). Assuming virt folks are ok with
> the API this should go into 2.6.27.

Really? :-)

It doesn't apply cleanly to net-next-2.6, as I just tried to stick this
into my tree.
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: David Miller <[EMAIL PROTECTED]>
Date: Mon, 14 Jul 2008 22:16:02 -0700 (PDT)

> It doesn't apply cleanly to net-next-2.6, as I just tried to
> stick this into my tree.

Ignore this, I did something stupid.
Re: [PATCH 4/5] virtio net: Allow receiving SG packets
On Mon, Jul 14, 2008 at 10:41:38PM -0500, Rusty Russell wrote:
>
> +	/* If we can receive ANY GSO packets, we must allocate large ones. */
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)
> +	    || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)
> +	    || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
> +		dev->features |= NETIF_F_LRO;

We don't want to do this, because LRO implies joining packets together
in an irreversible way, which is not what this is about. Case in point:
we're now disabling LRO for devices that are forwarding packets, which
is totally unnecessary for virtio.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 3/5] virtio net: Add ethtool ops for SG/GSO
On Mon, Jul 14, 2008 at 10:40:49PM -0500, Rusty Russell wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
>
> This patch adds some basic ethtool operations to virtio_net so
> I could test SG without GSO (which was really useful because TSO
> turned out to be buggy :)
>
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (remove MTU setting)

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Thanks,
[PATCH 5/5] virtio_net: Recycle unused recv buffer pages for large skbs in net driver
If we hack the virtio_net driver to always allocate full-sized (64k+)
skbuffs, the driver slows down (lguest numbers):

  Time to receive 1GB (small buffers): 10.85 seconds
  Time to receive 1GB (64k+ buffers): 24.75 seconds

Of course, large buffers use up more space in the ring, so we increase
that from 128 to 2048:

  Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds

If we recycle pages rather than using alloc_page/free_page:

  Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds

This demonstrates that with efficient allocation, we don't need to have
a separate "small buffer" queue.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/virtio_net.c |   36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff -r 4cef5ad6fd51 drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c	Tue May 27 16:07:56 2008 +1000
+++ b/drivers/net/virtio_net.c	Tue May 27 16:08:23 2008 +1000
@@ -58,6 +58,9 @@ struct virtnet_info
 	/* Receive & send queues. */
 	struct sk_buff_head recv;
 	struct sk_buff_head send;
+
+	/* Chain pages by the private ptr. */
+	struct page *pages;
 };
 
 static inline struct virtio_net_hdr *skb_vnet_hdr(struct sk_buff *skb)
@@ -68,6 +71,23 @@ static inline void vnet_hdr_to_sg(struct
 static inline void vnet_hdr_to_sg(struct scatterlist *sg, struct sk_buff *skb)
 {
 	sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr));
+}
+
+static void give_a_page(struct virtnet_info *vi, struct page *page)
+{
+	page->private = (unsigned long)vi->pages;
+	vi->pages = page;
+}
+
+static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+{
+	struct page *p = vi->pages;
+
+	if (p)
+		vi->pages = (struct page *)p->private;
+	else
+		p = alloc_page(gfp_mask);
+	return p;
 }
 
 static void skb_xmit_done(struct virtqueue *svq)
@@ -97,6 +117,15 @@ static void receive_skb(struct net_devic
 		goto drop;
 	}
 	len -= sizeof(struct virtio_net_hdr);
+
+	if (len <= MAX_PACKET_LEN) {
+		unsigned int i;
+
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+			give_a_page(dev->priv, skb_shinfo(skb)->frags[i].page);
+		skb->data_len = 0;
+		skb_shinfo(skb)->nr_frags = 0;
+	}
 
 	err = pskb_trim(skb, len);
 	if (err) {
@@ -180,7 +209,7 @@ static void try_fill_recv(struct virtnet
 	if (vi->dev->features & NETIF_F_LRO) {
 		for (i = 0; i < MAX_SKB_FRAGS; i++) {
 			skb_frag_t *f = &skb_shinfo(skb)->frags[i];
-			f->page = alloc_page(GFP_ATOMIC);
+			f->page = get_a_page(vi, GFP_ATOMIC);
 			if (!f->page)
 				break;
 
@@ -509,6 +538,7 @@ static int virtnet_probe(struct virtio_d
 	vi->dev = dev;
 	vi->vdev = vdev;
 	vdev->priv = vi;
+	vi->pages = NULL;
 
 	/* If they give us a callback when all buffers are done, we don't need
 	 * the timer. */
@@ -588,6 +618,10 @@ static void virtnet_remove(struct virtio
 	vdev->config->del_vq(vi->svq);
 	vdev->config->del_vq(vi->rvq);
 	unregister_netdev(vi->dev);
+
+	while (vi->pages)
+		__free_pages(get_a_page(vi, GFP_KERNEL), 0);
+
 	free_netdev(vi->dev);
 }
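The patch's give_a_page()/get_a_page() pair is a classic intrusive free list: free pages are chained through an otherwise-unused pointer-sized field, so "allocation" on the hot path is just a pointer pop. A minimal userspace sketch of the same pattern (hypothetical names; malloc/free stand in for alloc_page/__free_pages):

```c
#include <stdlib.h>

/* A buffer with a spare pointer field, playing the role of page->private. */
struct buf {
	struct buf *next_free;
	char data[4096];
};

/* Plays the role of vi->pages: head of a singly linked free list. */
struct pool {
	struct buf *free_list;
};

/* give_a_page() analogue: push the buffer onto the free list. */
static void pool_put(struct pool *p, struct buf *b)
{
	b->next_free = p->free_list;
	p->free_list = b;
}

/* get_a_page() analogue: pop from the list, falling back to the allocator. */
static struct buf *pool_get(struct pool *p)
{
	struct buf *b = p->free_list;

	if (b)
		p->free_list = b->next_free;
	else
		b = malloc(sizeof(*b));
	return b;
}

/* virtnet_remove() analogue: drain the list back to the allocator. */
static void pool_drain(struct pool *p)
{
	while (p->free_list)
		free(pool_get(p));
}
```

The benchmark win quoted above comes from the common path being two pointer assignments instead of a round trip through the page allocator.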
[PATCH 4/5] virtio net: Allow receiving SG packets
From: Herbert Xu <[EMAIL PROTECTED]>

Finally this patch lets virtio_net receive GSO packets in addition to
sending them. This can definitely be optimised for the non-GSO case.
For comparison the Xen approach stores one page in each skb and uses
subsequent skb's pages to construct an SG skb instead of preallocating
the maximum amount of pages per skb.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (added feature bits)
---
 drivers/net/virtio_net.c |   41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -89,6 +89,7 @@ static void receive_skb(struct net_devic
 			unsigned len)
 {
 	struct virtio_net_hdr *hdr = skb_vnet_hdr(skb);
+	int err;
 
 	if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
 		pr_debug("%s: short packet %i\n", dev->name, len);
@@ -96,10 +97,14 @@ static void receive_skb(struct net_devic
 		goto drop;
 	}
 	len -= sizeof(struct virtio_net_hdr);
-	BUG_ON(len > MAX_PACKET_LEN);
-	skb_trim(skb, len);
-
+	err = pskb_trim(skb, len);
+	if (err) {
+		pr_debug("%s: pskb_trim failed %i %d\n", dev->name, len, err);
+		dev->stats.rx_dropped++;
+		goto drop;
+	}
+	skb->truesize += skb->data_len;
 	dev->stats.rx_bytes += skb->len;
 	dev->stats.rx_packets++;
 
@@ -161,7 +166,7 @@ static void try_fill_recv(struct virtnet
 {
 	struct sk_buff *skb;
 	struct scatterlist sg[2+MAX_SKB_FRAGS];
-	int num, err;
+	int num, err, i;
 
 	sg_init_table(sg, 2+MAX_SKB_FRAGS);
 	for (;;) {
@@ -171,6 +176,24 @@ static void try_fill_recv(struct virtnet
 		skb_put(skb, MAX_PACKET_LEN);
 		vnet_hdr_to_sg(sg, skb);
+
+		if (vi->dev->features & NETIF_F_LRO) {
+			for (i = 0; i < MAX_SKB_FRAGS; i++) {
+				skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+				f->page = alloc_page(GFP_ATOMIC);
+				if (!f->page)
+					break;
+
+				f->page_offset = 0;
+				f->size = PAGE_SIZE;
+
+				skb->data_len += PAGE_SIZE;
+				skb->len += PAGE_SIZE;
+
+				skb_shinfo(skb)->nr_frags++;
+			}
+		}
+
 		num = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1;
 
 		skb_queue_head(&vi->recv, skb);
@@ -466,6 +489,12 @@ static int virtnet_probe(struct virtio_d
 		dev->features |= NETIF_F_UFO;
 	}
 
+	/* If we can receive ANY GSO packets, we must allocate large ones. */
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)
+	    || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)
+	    || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+		dev->features |= NETIF_F_LRO;
+
 	/* Configuration may specify what MAC to use.  Otherwise random. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
 		vdev->config->get(vdev,
@@ -571,7 +600,9 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
 	VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
 	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
-	VIRTIO_NET_F_HOST_ECN, VIRTIO_F_NOTIFY_ON_EMPTY,
+	VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
+	VIRTIO_NET_F_GUEST_ECN,	/* We don't yet handle UFO input. */
+	VIRTIO_F_NOTIFY_ON_EMPTY,
 };
 
 static struct virtio_driver virtio_net = {
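The probe logic above ORs together three negotiated guest-GSO feature bits to decide whether large receive buffers must be preallocated. A generic sketch of that decision (all names here are hypothetical stand-ins, not the virtio API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for VIRTIO_NET_F_GUEST_* and
 * the driver-side NETIF_F_LRO flag. */
enum {
	F_GUEST_TSO4 = 1u << 0,
	F_GUEST_TSO6 = 1u << 1,
	F_GUEST_ECN  = 1u << 2,
	F_BIG_RXBUF  = 1u << 31,	/* stand-in for NETIF_F_LRO */
};

static bool has_feature(uint32_t negotiated, uint32_t bit)
{
	return (negotiated & bit) != 0;
}

/* Mirror of the probe logic: if we can receive ANY GSO packet, the
 * device must allocate large receive buffers. */
static uint32_t probe_features(uint32_t negotiated)
{
	uint32_t dev_features = 0;

	if (has_feature(negotiated, F_GUEST_TSO4) ||
	    has_feature(negotiated, F_GUEST_TSO6) ||
	    has_feature(negotiated, F_GUEST_ECN))
		dev_features |= F_BIG_RXBUF;
	return dev_features;
}
```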
[PATCH 3/5] virtio net: Add ethtool ops for SG/GSO
From: Herbert Xu <[EMAIL PROTECTED]>

This patch adds some basic ethtool operations to virtio_net so
I could test SG without GSO (which was really useful because TSO
turned out to be buggy :)

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (remove MTU setting)
---
 drivers/net/virtio_net.c |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -19,6 +19,7 @@
 //#define DEBUG
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
+#include <linux/ethtool.h>
 #include <linux/module.h>
 #include <linux/virtio.h>
 #include <linux/virtio_net.h>
@@ -352,6 +353,22 @@ static int virtnet_close(struct net_devi
 	return 0;
 }
 
+static int virtnet_set_tx_csum(struct net_device *dev, u32 data)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (data && !virtio_has_feature(vdev, VIRTIO_NET_F_CSUM))
+		return -ENOSYS;
+
+	return ethtool_op_set_tx_hw_csum(dev, data);
+}
+
+static struct ethtool_ops virtnet_ethtool_ops = {
+	.set_tx_csum = virtnet_set_tx_csum,
+	.set_sg = ethtool_op_set_sg,
+};
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int err;
@@ -371,6 +389,7 @@ static int virtnet_probe(struct virtio_d
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	dev->poll_controller = virtnet_netpoll;
 #endif
+	SET_ETHTOOL_OPS(dev, &virtnet_ethtool_ops);
 	SET_NETDEV_DEV(dev, &vdev->dev);
 
 	/* Do we support "hardware" checksums? */
[PATCH 1/5] virtio_net: Set VIRTIO_NET_F_GUEST_CSUM feature
(I know you already have this, but included for completeness)

From: Mark McLoughlin <[EMAIL PROTECTED]>

We can handle receiving partial csums, so set the appropriate feature
bit.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/virtio_net.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -550,7 +550,8 @@ static struct virtio_device_id id_table[
 };
 
 static unsigned int features[] = {
-	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
+	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
+	VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
 	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
 	VIRTIO_NET_F_HOST_ECN, VIRTIO_F_NOTIFY_ON_EMPTY,
 };
Re: [PATCH] stopmachine: add stopmachine_timeout
Hidetoshi Seto wrote:
> Heiko Carstens wrote:
>> Hmm.. probably a stupid question: but what could happen that a real cpu
>> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
>> prioritized task for 5 seconds?
>
> The original problem (once I heard and easily reproduced) was there was
> another MAX_RT_PRIO-1 task and the task was spinning in itself by a bug.
> (Now this would not be a problem since RLIMIT_RTTIME will work for it, but
> I cannot deny that there are some situations which cannot set the limit.)

Yep. As I described in the previous email, in my case it's a legitimate
thing. Some of the CPU cores are running wireless basestation schedulers,
and the deadlines are way too tight for them to sleep (it's a "cpu as a
dedicated engine" kind of thing; they are properly isolated and stuff).
In this case the RT limit is actually the first thing that I disable :).

I'd rather have stop_machine fail and tell the user that something is
wrong, in which case they can simply stop the basestation app that is
running when convenient. i.e. it gives control back to the user rather
than wedging the box or killing the app.

Max
Re: [PATCH] stopmachine: add stopmachine_timeout
Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
>> Rusty Russell wrote:
>>> On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
>>>> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
>>>>> +	/* Wait all others come to life */
>>>>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
>>>>> +		if (time_is_before_jiffies(limit))
>>>>> +			goto timeout;
>>>>> +		cpu_relax();
>>>>> +	}
>>>>> +
>>>>
>>>> Hmm. I think this could become interesting on virtual machines. The
>>>> hypervisor might be too busy to schedule a specific cpu at certain load
>>>> scenarios. This would cause a failure even if the cpu is not really
>>>> locked up. We had similar problems with the soft lockup daemon on s390.
>>>
>>> 5 seconds is a fairly long time. If all else fails we could have a config
>>> option to simply disable this code.
>
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
> prioritized task for 5 seconds?

I have a workload where a MAX_PRIO RT thread runs and never yields.
That's what my cpu isolation patches/tree address. Stopmachine is the
only thing (that I know of) that really breaks in that case. btw, in
case you're wondering: yes, we've discussed workqueue thread starvation
and stuff in the other threads. So yes, it can happen.

>>>> It would be good to not-use wall-clock time, but really used cpu time
>>>> instead. Unfortunately I have no idea, if that is possible in a generic
>>>> way. Heiko, any ideas?
>>>
>>> Ah, cpu time comes up again. Perhaps we should actually dig that up
>>> again; Zach and Jeremy CC'd.
>>
>> Hm, yeah. But in this case, it's tricky. CPU time is an inherently
>> per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
>> timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
>> is waiting on B,C,D,E,F... it needs to measure separate timeouts with
>> separate timebases for each other CPU). It also means that if B is
>> unresponsive but also not consuming any time (blocked in IO,
>> administratively paused, etc), then the timeout will never trigger.
>>
>> So I think monotonic wallclock time actually makes the most sense here.
>
> This is asking for trouble... a config option to disable this would be
> nice. But as I don't know which problem this patch originally addresses
> it might be that this is needed anyway. So lets see why we need it first.

How about this. We'll make this a sysctl, as Rusty already did, and set
the default to 0 which means "never timeout". That way crazy people like
me who care about this scenario can enable this feature.

btw Rusty, I just had one of those "why didn't I think of that" moments.
This is actually another way of handling my workload. I mean, it
certainly does not fix the root cause of the problems, and we still need
the other things that we talked about (non-blocking module delete,
lock-free module insertion, etc) but at least in the meantime it avoids
wedging the machines for good.

btw I'd like that timeout in milliseconds. I think 5 seconds is way too
long :).

Max
Re: [PATCH] stopmachine: add stopmachine_timeout
On Tuesday 15 July 2008 07:20:26 Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > Rusty Russell wrote:
> > > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> > >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > >>> +	/* Wait all others come to life */
> > >>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > >>> +		if (time_is_before_jiffies(limit))
> > >>> +			goto timeout;
> > >>> +		cpu_relax();
> > >>> +	}
> > >>> +
> > >>
> > >> Hmm. I think this could become interesting on virtual machines. The
> > >> hypervisor might be too busy to schedule a specific cpu at certain load
> > >> scenarios. This would cause a failure even if the cpu is not really
> > >> locked up. We had similar problems with the soft lockup daemon on
> > >> s390.
> > >
> > > 5 seconds is a fairly long time. If all else fails we could have a
> > > config option to simply disable this code.
>
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a
> MAX_RT_PRIO-1 prioritized task for 5 seconds?

Yes. That's exactly what we're trying to detect. Currently the entire
machine will wedge; with this patch we can often limp along. Hidetoshi's
original problem was a client whose machine had one CPU die, then got
wedged as the emergency backup tried to load a module.

Along these lines, I found VMware's relaxed co-scheduling interesting,
BTW: http://communities.vmware.com/docs/DOC-4960

> cpu_relax() translates to a hypervisor yield on s390. Probably makes
> sense if other architectures would do the same.

Yes, I think so too. Actually, doing a random yield-to-other-VCPU on
cpu_relax is arguably the right semantic (in Linux it's used for
spinning, almost exclusively to wait for other cpus).

Cheers,
Rusty.
Re: [PATCH] stopmachine: add stopmachine_timeout
On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> >>
> >>> +	/* Wait all others come to life */
> >>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> >>> +		if (time_is_before_jiffies(limit))
> >>> +			goto timeout;
> >>> +		cpu_relax();
> >>> +	}
> >>> +
> >>>
> >> Hmm. I think this could become interesting on virtual machines. The
> >> hypervisor might be too busy to schedule a specific cpu at certain load
> >> scenarios. This would cause a failure even if the cpu is not really locked
> >> up. We had similar problems with the soft lockup daemon on s390.
> >
> > 5 seconds is a fairly long time. If all else fails we could have a config
> > option to simply disable this code.

Hmm.. probably a stupid question: but what could happen that a real cpu
(not virtual) becomes unresponsive so that it won't schedule a
MAX_RT_PRIO-1 prioritized task for 5 seconds?

> >> It would be good to not-use wall-clock time, but really used cpu time
> >> instead. Unfortunately I have no idea, if that is possible in a generic
> >> way. Heiko, any ideas?
> >
> > Ah, cpu time comes up again. Perhaps we should actually dig that up
> > again; Zach and Jeremy CC'd.
>
> Hm, yeah. But in this case, it's tricky. CPU time is an inherently
> per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
> timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
> is waiting on B,C,D,E,F... it needs to measure separate timeouts with
> separate timebases for each other CPU). It also means that if B is
> unresponsive but also not consuming any time (blocked in IO,
> administratively paused, etc), then the timeout will never trigger.
>
> So I think monotonic wallclock time actually makes the most sense here.

This is asking for trouble... a config option to disable this would be
nice. But as I don't know which problem this patch originally addresses
it might be that this is needed anyway. So let's see why we need it first.

> The other issue is whether cpu_relax() is the right thing to put in the
> busywait. We don't hook it in pvops, so it's just an x86 "pause"
> instruction, so from the hypervisor's perspective it just looks like a
> spinning CPU. We could either hook cpu_relax() into a hypervisor yield,
> or come up with a heavier-weight cpu_snooze() (cpu_relax() is often used
> in loops which are expected to have a short duration, where doing a
> hypercall+yield would be overkill).

cpu_relax() translates to a hypervisor yield on s390. Probably makes
sense if other architectures would do the same.
Re: [PATCH] stopmachine: add stopmachine_timeout
Rusty Russell wrote:
> On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
>> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
>>> +	/* Wait all others come to life */
>>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
>>> +		if (time_is_before_jiffies(limit))
>>> +			goto timeout;
>>> +		cpu_relax();
>>> +	}
>>> +
>>
>> Hmm. I think this could become interesting on virtual machines. The
>> hypervisor might be too busy to schedule a specific cpu at certain load
>> scenarios. This would cause a failure even if the cpu is not really locked
>> up. We had similar problems with the soft lockup daemon on s390.
>
> 5 seconds is a fairly long time. If all else fails we could have a config
> option to simply disable this code.
>
>> It would be good to not-use wall-clock time, but really used cpu time
>> instead. Unfortunately I have no idea, if that is possible in a generic
>> way. Heiko, any ideas?
>
> Ah, cpu time comes up again. Perhaps we should actually dig that up again;
> Zach and Jeremy CC'd.

Hm, yeah. But in this case, it's tricky. CPU time is an inherently
per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
timeout in cpu-seconds, then it has to be in *B*'s cpu-seconds (and if A
is waiting on B, C, D, E, F... it needs to measure separate timeouts with
separate timebases for each other CPU). It also means that if B is
unresponsive but also not consuming any time (blocked in IO,
administratively paused, etc.), then the timeout will never trigger.

So I think monotonic wallclock time actually makes the most sense here.

The other issue is whether cpu_relax() is the right thing to put in the
busywait. We don't hook it in pvops, so it's just an x86 "pause"
instruction, so from the hypervisor's perspective it just looks like a
spinning CPU. We could either hook cpu_relax() into a hypervisor yield,
or come up with a heavier-weight cpu_snooze() (cpu_relax() is often used
in loops which are expected to have a short duration, where doing a
hypercall+yield would be overkill).

J
Re: [PATCH] stopmachine: add stopmachine_timeout
On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > +	/* Wait all others come to life */
> > +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > +		if (time_is_before_jiffies(limit))
> > +			goto timeout;
> > +		cpu_relax();
> > +	}
> > +
>
> Hmm. I think this could become interesting on virtual machines. The
> hypervisor might be too busy to schedule a specific cpu at certain load
> scenarios. This would cause a failure even if the cpu is not really locked
> up. We had similar problems with the soft lockup daemon on s390.

5 seconds is a fairly long time. If all else fails we could have a config
option to simply disable this code.

> It would be good to not-use wall-clock time, but really used cpu time
> instead. Unfortunately I have no idea, if that is possible in a generic
> way. Heiko, any ideas?

Ah, cpu time comes up again. Perhaps we should actually dig that up again;
Zach and Jeremy CC'd.

Thanks,
Rusty.