Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Heiko Carstens
> >> So I think monotonic wallclock time actually makes the most sense here.
> > 
> > This is asking for trouble... a config option to disable this would be
> > nice. But as I don't know which problem this patch originally addresses
> it might be that this is needed anyway. So let's see why we need it first.
> How about this. We'll make this a sysctl, as Rusty already did, and set the
> default to 0 which means "never timeout". That way crazy people like me who
> care about this scenario can enable this feature.

Yes, sounds like that should work for everyone, and no additional config
option is needed either. Sounds good to me.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] tun: Fix/rewrite packet filtering logic

2008-07-14 Thread David Miller
From: Max Krasnyansky <[EMAIL PROTECTED]>
Date: Sat, 12 Jul 2008 01:52:54 -0700

> This is on top of the latest and greatest :). Assuming virt folks are ok with
> the API this should go into 2.6.27.

Really? :-)

It doesn't apply cleanly to net-next-2.6, as I just tried to
stick this into my tree.


Re: [PATCH] tun: Fix/rewrite packet filtering logic

2008-07-14 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Mon, 14 Jul 2008 22:16:02 -0700 (PDT)

> It doesn't apply cleanly to net-next-2.6, as I just tried to
> stick this into my tree.

Ignore this, I did something stupid.


Re: [PATCH 4/5] virtio net: Allow receiving SG packets

2008-07-14 Thread Herbert Xu
On Mon, Jul 14, 2008 at 10:41:38PM -0500, Rusty Russell wrote:
>
> + /* If we can receive ANY GSO packets, we must allocate large ones. */
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)
> + || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)
> + || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
> + dev->features |= NETIF_F_LRO;

We don't want to do this, because LRO implies joining packets
together in an irreversible way, which is not what this is about.

Case in point: we're now disabling LRO for devices that are
forwarding packets, which is totally unnecessary for virtio.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 3/5] virtio net: Add ethtool ops for SG/GSO

2008-07-14 Thread Herbert Xu
On Mon, Jul 14, 2008 at 10:40:49PM -0500, Rusty Russell wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> 
> This patch adds some basic ethtool operations to virtio_net so
> I could test SG without GSO (which was really useful because TSO
> turned out to be buggy :)
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (remove MTU setting)

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH 5/5] virtio_net: Recycle unused recv buffer pages for large skbs in net driver

2008-07-14 Thread Rusty Russell

If we hack the virtio_net driver to always allocate full-sized (64k+)
skbuffs, the driver slows down (lguest numbers):

  Time to receive 1GB (small buffers): 10.85 seconds
  Time to receive 1GB (64k+ buffers): 24.75 seconds

Of course, large buffers use up more space in the ring, so we increase
that from 128 to 2048:

  Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds

If we recycle pages rather than using alloc_page/free_page:

  Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds

This demonstrates that with efficient allocation, we don't need to
have a separate "small buffer" queue.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/virtio_net.c |   36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff -r 4cef5ad6fd51 drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c  Tue May 27 16:07:56 2008 +1000
+++ b/drivers/net/virtio_net.c  Tue May 27 16:08:23 2008 +1000
@@ -58,6 +58,9 @@ struct virtnet_info
/* Receive & send queues. */
struct sk_buff_head recv;
struct sk_buff_head send;
+
+   /* Chain pages by the private ptr. */
+   struct page *pages;
 };
 
 static inline struct virtio_net_hdr *skb_vnet_hdr(struct sk_buff *skb)
@@ -68,6 +71,23 @@ static inline void vnet_hdr_to_sg(struct
static inline void vnet_hdr_to_sg(struct scatterlist *sg, struct sk_buff *skb)
 {
sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr));
+}
+
+static void give_a_page(struct virtnet_info *vi, struct page *page)
+{
+   page->private = (unsigned long)vi->pages;
+   vi->pages = page;
+}
+
+static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+{
+   struct page *p = vi->pages;
+
+   if (p)
+   vi->pages = (struct page *)p->private;
+   else
+   p = alloc_page(gfp_mask);
+   return p;
 }
 
 static void skb_xmit_done(struct virtqueue *svq)
@@ -97,6 +117,15 @@ static void receive_skb(struct net_devic
goto drop;
}
len -= sizeof(struct virtio_net_hdr);
+
+   if (len <= MAX_PACKET_LEN) {
+   unsigned int i;
+
+   for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+   give_a_page(dev->priv, skb_shinfo(skb)->frags[i].page);
+   skb->data_len = 0;
+   skb_shinfo(skb)->nr_frags = 0;
+   }
 
err = pskb_trim(skb, len);
if (err) {
@@ -180,7 +209,7 @@ static void try_fill_recv(struct virtnet
if (vi->dev->features & NETIF_F_LRO) {
for (i = 0; i < MAX_SKB_FRAGS; i++) {
skb_frag_t *f = &skb_shinfo(skb)->frags[i];
-   f->page = alloc_page(GFP_ATOMIC);
+   f->page = get_a_page(vi, GFP_ATOMIC);
if (!f->page)
break;
 
@@ -509,6 +538,7 @@ static int virtnet_probe(struct virtio_d
vi->dev = dev;
vi->vdev = vdev;
vdev->priv = vi;
+   vi->pages = NULL;
 
/* If they give us a callback when all buffers are done, we don't need
 * the timer. */
@@ -588,6 +618,10 @@ static void virtnet_remove(struct virtio
vdev->config->del_vq(vi->svq);
vdev->config->del_vq(vi->rvq);
unregister_netdev(vi->dev);
+
+   while (vi->pages)
+   __free_pages(get_a_page(vi, GFP_KERNEL), 0);
+
free_netdev(vi->dev);
 }
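As an aside, the recycling scheme above can be illustrated with a self-contained userspace sketch (the struct and field names here are stand-ins; the actual patch chains real struct page objects through page->private):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for a page; 'next' plays the role of page->private. */
struct page_stub {
	struct page_stub *next;
	char data[4096];
};

struct pool {
	struct page_stub *free_list;	/* singly linked recycle list */
};

/* Return a page to the pool instead of freeing it. */
static void give_a_page(struct pool *p, struct page_stub *pg)
{
	pg->next = p->free_list;
	p->free_list = pg;
}

/* Prefer a recycled page; fall back to a fresh allocation. */
static struct page_stub *get_a_page(struct pool *p)
{
	struct page_stub *pg = p->free_list;

	if (pg)
		p->free_list = pg->next;
	else
		pg = malloc(sizeof(*pg));
	return pg;
}
```

The benchmark numbers above come from avoiding exactly the alloc_page/free_page round trip this list skips: a page handed back with give_a_page() is returned by the very next get_a_page() call.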
 



[PATCH 4/5] virtio net: Allow receiving SG packets

2008-07-14 Thread Rusty Russell
From: Herbert Xu <[EMAIL PROTECTED]>

Finally this patch lets virtio_net receive GSO packets in addition
to sending them.  This can definitely be optimised for the non-GSO
case.  For comparison, the Xen approach stores one page in each skb
and uses subsequent skbs' pages to construct an SG skb, instead of
preallocating the maximum number of pages per skb.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (added feature bits)
---
 drivers/net/virtio_net.c |   41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -89,6 +89,7 @@ static void receive_skb(struct net_devic
unsigned len)
 {
struct virtio_net_hdr *hdr = skb_vnet_hdr(skb);
+   int err;
 
if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
@@ -96,10 +97,14 @@ static void receive_skb(struct net_devic
goto drop;
}
len -= sizeof(struct virtio_net_hdr);
-   BUG_ON(len > MAX_PACKET_LEN);
 
-   skb_trim(skb, len);
-
+   err = pskb_trim(skb, len);
+   if (err) {
+   pr_debug("%s: pskb_trim failed %i %d\n", dev->name, len, err);
+   dev->stats.rx_dropped++;
+   goto drop;
+   }
+   skb->truesize += skb->data_len;
dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;
 
@@ -161,7 +166,7 @@ static void try_fill_recv(struct virtnet
 {
struct sk_buff *skb;
struct scatterlist sg[2+MAX_SKB_FRAGS];
-   int num, err;
+   int num, err, i;
 
sg_init_table(sg, 2+MAX_SKB_FRAGS);
for (;;) {
@@ -171,6 +176,24 @@ static void try_fill_recv(struct virtnet
 
skb_put(skb, MAX_PACKET_LEN);
vnet_hdr_to_sg(sg, skb);
+
+   if (vi->dev->features & NETIF_F_LRO) {
+   for (i = 0; i < MAX_SKB_FRAGS; i++) {
+   skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+   f->page = alloc_page(GFP_ATOMIC);
+   if (!f->page)
+   break;
+
+   f->page_offset = 0;
+   f->size = PAGE_SIZE;
+
+   skb->data_len += PAGE_SIZE;
+   skb->len += PAGE_SIZE;
+
+   skb_shinfo(skb)->nr_frags++;
+   }
+   }
+
num = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1;
skb_queue_head(&vi->recv, skb);
 
@@ -466,6 +489,12 @@ static int virtnet_probe(struct virtio_d
dev->features |= NETIF_F_UFO;
}
 
+   /* If we can receive ANY GSO packets, we must allocate large ones. */
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)
+   || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)
+   || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+   dev->features |= NETIF_F_LRO;
+
/* Configuration may specify what MAC to use.  Otherwise random. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
vdev->config->get(vdev,
@@ -571,7 +600,9 @@ static unsigned int features[] = {
VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
-   VIRTIO_NET_F_HOST_ECN, VIRTIO_F_NOTIFY_ON_EMPTY,
+   VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
+   VIRTIO_NET_F_GUEST_ECN, /* We don't yet handle UFO input. */
+   VIRTIO_F_NOTIFY_ON_EMPTY,
 };
 
 static struct virtio_driver virtio_net = {



[PATCH 3/5] virtio net: Add ethtool ops for SG/GSO

2008-07-14 Thread Rusty Russell
From: Herbert Xu <[EMAIL PROTECTED]>

This patch adds some basic ethtool operations to virtio_net so
I could test SG without GSO (which was really useful because TSO
turned out to be buggy :)

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (remove MTU setting)
---
 drivers/net/virtio_net.c |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -19,6 +19,7 @@
 //#define DEBUG
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -352,6 +353,22 @@ static int virtnet_close(struct net_devi
return 0;
 }
 
+static int virtnet_set_tx_csum(struct net_device *dev, u32 data)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   struct virtio_device *vdev = vi->vdev;
+
+   if (data && !virtio_has_feature(vdev, VIRTIO_NET_F_CSUM))
+   return -ENOSYS;
+
+   return ethtool_op_set_tx_hw_csum(dev, data);
+}
+
+static struct ethtool_ops virtnet_ethtool_ops = {
+   .set_tx_csum = virtnet_set_tx_csum,
+   .set_sg = ethtool_op_set_sg,
+};
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
int err;
@@ -371,6 +389,7 @@ static int virtnet_probe(struct virtio_d
 #ifdef CONFIG_NET_POLL_CONTROLLER
dev->poll_controller = virtnet_netpoll;
 #endif
+   SET_ETHTOOL_OPS(dev, &virtnet_ethtool_ops);
SET_NETDEV_DEV(dev, &vdev->dev);
 
/* Do we support "hardware" checksums? */



[PATCH 1/5] virtio_net: Set VIRTIO_NET_F_GUEST_CSUM feature

2008-07-14 Thread Rusty Russell
(I know you already have this, but included for completeness)

From: Mark McLoughlin <[EMAIL PROTECTED]>

We can handle receiving partial csums, so set the
appropriate feature bit.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/virtio_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -550,7 +550,8 @@ static struct virtio_device_id id_table[
 };
 
 static unsigned int features[] = {
-   VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
+   VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
+   VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
VIRTIO_NET_F_HOST_ECN, VIRTIO_F_NOTIFY_ON_EMPTY,
 };


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Max Krasnyansky


Hidetoshi Seto wrote:
> Heiko Carstens wrote:
>> Hmm.. probably a stupid question: but what could happen that a real cpu
>> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
>> prioritized task for 5 seconds?
> 
> The original problem (which I heard of and easily reproduced) was that there
> was another MAX_RT_PRIO-1 task, and that task was spinning in a loop due to a
> bug. (Now this would not be a problem since RLIMIT_RTTIME will handle it, but
> I cannot deny that there are some situations where the limit cannot be set.)
Yep. As I described in the previous email, in my case it's a legitimate thing.
Some of the CPU cores are running wireless basestation schedulers, and the
deadlines are way too tight for them to sleep (it's a "cpu as a dedicated
engine" kind of thing; they are properly isolated and so on).

In this case the RT limit is actually the first thing that I disable :).
I'd rather have stop_machine fail and tell the user that something is wrong,
in which case they can simply stop the basestation app when convenient.
I.e. it gives control back to the user rather than wedging the box or
killing the app.

Max


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Max Krasnyansky


Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
>> Rusty Russell wrote:
>>> On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
 Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
 
> + /* Wait all others come to life */
> + while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> + if (time_is_before_jiffies(limit))
> + goto timeout;
> + cpu_relax();
> + }
> +
>   
 Hmm. I think this could become interesting on virtual machines. The
 hypervisor might be too busy to schedule a specific cpu under certain load
 scenarios. This would cause a failure even if the cpu is not really locked
 up. We had similar problems with the soft lockup daemon on s390.
>>> 5 seconds is a fairly long time.  If all else fails we could have a config 
>>> option to simply disable this code.
> 
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
> prioritized task for 5 seconds?
I have a workload where a MAX_PRIO RT thread runs and never yields. That's what
my cpu isolation patches/tree address. Stopmachine is the only thing (that I
know of) that really breaks in that case. Btw, in case you're wondering: yes,
we've discussed workqueue thread starvation and related issues in the other
threads. So yes, it can happen.

 It would be good to not-use wall-clock time, but really used cpu time
 instead. Unfortunately I have no idea, if that is possible in a generic
 way. Heiko, any ideas?
>>> Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
>>> Zach and Jeremy CC'd.
>> Hm, yeah. But in this case, it's tricky. CPU time is an inherently
>> per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
>> timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
>> is waiting on B,C,D,E,F... it needs to measure separate timeouts with
>> separate timebases for each other CPU). It also means that if B is
>> unresponsive but also not consuming any time (blocked in IO,
>> administratively paused, etc), then the timeout will never trigger.
>>
>> So I think monotonic wallclock time actually makes the most sense here.
> 
> This is asking for trouble... a config option to disable this would be
> nice. But as I don't know which problem this patch originally addresses
> it might be that this is needed anyway. So lets see why we need it first.
How about this: we'll make this a sysctl, as Rusty already did, and set the
default to 0, which means "never timeout". That way crazy people like me who
care about this scenario can enable this feature.
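A minimal userspace sketch of those semantics (illustrative only; the names stopmachine_timeout_ms and stopmachine_timed_out are hypothetical stand-ins, not actual kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed sysctl value; 0 = never timeout. */
static unsigned long stopmachine_timeout_ms;

/* Returns true once the wait has exceeded the configured timeout.
 * A value of 0 disables the check entirely, so the wait never times out. */
static bool stopmachine_timed_out(unsigned long waited_ms)
{
	if (stopmachine_timeout_ms == 0)
		return false;
	return waited_ms >= stopmachine_timeout_ms;
}
```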

Btw Rusty, I just had one of those "why didn't I think of that" moments. This is
actually another way of handling my workload. I mean, it certainly does not fix
the root cause of the problems, and we still need the other things that we talked
about (non-blocking module delete, lock-free module insertion, etc), but at
least in the meantime it avoids wedging the machines for good.
Btw I'd like that timeout in milliseconds. I think 5 seconds is way too
long :).

Max


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Rusty Russell
On Tuesday 15 July 2008 07:20:26 Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > Rusty Russell wrote:
> > > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> > >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > >>> +   /* Wait all others come to life */
> > >>> +   while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > >>> +   if (time_is_before_jiffies(limit))
> > >>> +   goto timeout;
> > >>> +   cpu_relax();
> > >>> +   }
> > >>> +
> > >>
> > >> Hmm. I think this could become interesting on virtual machines. The
> > >> hypervisor might be too busy to schedule a specific cpu under certain load
> > >> scenarios. This would cause a failure even if the cpu is not really
> > >> locked up. We had similar problems with the soft lockup daemon on
> > >> s390.
> > >
> > > 5 seconds is a fairly long time.  If all else fails we could have a
> > > config option to simply disable this code.
>
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a
> MAX_RT_PRIO-1 prioritized task for 5 seconds?

Yes.  That's exactly what we're trying to detect.  Currently the entire 
machine will wedge.  With this patch we can often limp along.

Hidetoshi's original problem was a client whose machine had one CPU die, then 
got wedged as the emergency backup tried to load a module.

Along these lines, I found VMWare's relaxed co-scheduling interesting, BTW:
http://communities.vmware.com/docs/DOC-4960

> cpu_relax() translates to a hypervisor yield on s390. It would probably make
> sense for other architectures to do the same.

Yes, I think so too.  Actually, doing a random yield-to-other-VCPU on 
cpu_relax is arguably the right semantics (in Linux it's used for spinning, 
almost exclusively to wait for other cpus).

Cheers,
Rusty.



Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Heiko Carstens
On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> >> 
> >>> + /* Wait all others come to life */
> >>> + while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> >>> + if (time_is_before_jiffies(limit))
> >>> + goto timeout;
> >>> + cpu_relax();
> >>> + }
> >>> +
> >>>   
> >> Hmm. I think this could become interesting on virtual machines. The
> >> hypervisor might be too busy to schedule a specific cpu under certain load
> >> scenarios. This would cause a failure even if the cpu is not really locked
> >> up. We had similar problems with the soft lockup daemon on s390.
> > 5 seconds is a fairly long time.  If all else fails we could have a config 
> > option to simply disable this code.

Hmm.. probably a stupid question: but what could happen that a real cpu
(not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
prioritized task for 5 seconds?

> >> It would be good to not-use wall-clock time, but really used cpu time
> >> instead. Unfortunately I have no idea, if that is possible in a generic
> >> way. Heiko, any ideas?
> >
> > Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
> > Zach and Jeremy CC'd.
> 
> Hm, yeah. But in this case, it's tricky. CPU time is an inherently
> per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
> timeout in cpu-seconds, then it has to be in *B*s cpu-seconds (and if A
> is waiting on B,C,D,E,F... it needs to measure separate timeouts with
> separate timebases for each other CPU). It also means that if B is
> unresponsive but also not consuming any time (blocked in IO,
> administratively paused, etc), then the timeout will never trigger.
> 
> So I think monotonic wallclock time actually makes the most sense here.

This is asking for trouble... a config option to disable this would be
nice. But as I don't know which problem this patch originally addresses,
it might be that this is needed anyway. So let's see why we need it first.

> The other issue is whether cpu_relax() is the right thing to put in the
> busywait. We don't hook it in pvops, so it's just an x86 "pause"
> instruction, so from the hypervisor's perspective it just looks like a
> spinning CPU. We could either hook cpu_relax() into a hypervisor yield,
> or come up with a heavier-weight cpu_snooze() (cpu_relax() is often used
> in loops which are expected to have a short duration, where doing a
> hypercall+yield would be overkill).

cpu_relax() translates to a hypervisor yield on s390. It would probably make
sense for other architectures to do the same.


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Jeremy Fitzhardinge
Rusty Russell wrote:
> On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
>   
>> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
>> 
>>> +   /* Wait all others come to life */
>>> +   while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
>>> +   if (time_is_before_jiffies(limit))
>>> +   goto timeout;
>>> +   cpu_relax();
>>> +   }
>>> +
>>>   
>> Hmm. I think this could become interesting on virtual machines. The
>> hypervisor might be too busy to schedule a specific cpu under certain load
>> scenarios. This would cause a failure even if the cpu is not really locked
>> up. We had similar problems with the soft lockup daemon on s390.
>> 
>
> 5 seconds is a fairly long time.  If all else fails we could have a config 
> option to simply disable this code.
>
>   
>> It would be good to not-use wall-clock time, but really used cpu time
>> instead. Unfortunately I have no idea, if that is possible in a generic
>> way. Heiko, any ideas?
>> 
>
> Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
> Zach and Jeremy CC'd.

Hm, yeah. But in this case, it's tricky. CPU time is an inherently
per-cpu quantity. If cpu A is waiting for cpu B, and wants to do the
timeout in cpu-seconds, then it has to be in *B*'s cpu-seconds (and if A
is waiting on B,C,D,E,F... it needs to measure separate timeouts with
separate timebases for each other CPU). It also means that if B is
unresponsive but also not consuming any time (blocked in IO,
administratively paused, etc), then the timeout will never trigger.

So I think monotonic wallclock time actually makes the most sense here.

The other issue is whether cpu_relax() is the right thing to put in the
busywait. We don't hook it in pvops, so it's just an x86 "pause"
instruction, so from the hypervisor's perspective it just looks like a
spinning CPU. We could either hook cpu_relax() into a hypervisor yield,
or come up with a heavier-weight cpu_snooze() (cpu_relax() is often used
in loops which are expected to have a short duration, where doing a
hypercall+yield would be overkill).

J


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Rusty Russell
On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > +   /* Wait all others come to life */
> > +   while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > +   if (time_is_before_jiffies(limit))
> > +   goto timeout;
> > +   cpu_relax();
> > +   }
> > +
>
> Hmm. I think this could become interesting on virtual machines. The
> hypervisor might be too busy to schedule a specific cpu under certain load
> scenarios. This would cause a failure even if the cpu is not really locked
> up. We had similar problems with the soft lockup daemon on s390.

5 seconds is a fairly long time.  If all else fails we could have a config 
option to simply disable this code.

> It would be good to not use wall-clock time, but actually-used cpu time
> instead. Unfortunately I have no idea, if that is possible in a generic
> way. Heiko, any ideas?

Ah, cpu time comes up again.  Perhaps we should actually dig that up again; 
Zach and Jeremy CC'd.

Thanks,
Rusty.