Re: Hypervisor detection from within a Linux VM

2010-06-29 Thread Anthony Liguori
On 06/29/2010 04:25 PM, Chetan Loke wrote:
> Hello,
>
>
> Requirement:
> I have the need to support my apps(running on a Linux VM) on different
> *nix hypervisors(ESX/Xen etc). I need to know on which hypervisor my
> app is running. I read the CPUID usage thread -
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/22643 but to be
> honest in the end I looked at
> http://lxr.linux.no/#linux+v2.6.34/arch/x86/kernel/cpu/vmware.c#L88
> The vmware_platform() detection code is straight forward.
>
> Current-hack:
> As a quick hack we just grep lspci for VMware's pci-ids.
>
> Solution:
> I can write a bare minimal driver, check the cpu-id as VMware's
> balloon driver does and then emit a proc/sysfs node. The setup
> packages and the apps can then check for this node-string.I'm
> currently working on ESX and I am hoping that this thin-driver will
> work.
>

It can be done entirely in userspace.  Take a look at virt-what:

http://people.redhat.com/~rjones/virt-what/

> Question:
> Q1)Is it possible to get this functionality as part of the stock
> kernel or is that a bad idea? I suspect there could be other
> users/apps who would need to know what *nix hypervisor(or a
> non-virtualized environment) they are
> running on?
> Q2)If this is not the right approach then can someone please suggest
> another approach?
>

It might be reasonable to list the hypervisor signature as a field in 
/proc/cpuinfo.  There's also a /sys/hypervisor where such information 
could go.

Regards,

Anthony Liguori

> Regards
> Chetan Loke
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv2] vhost-net: add dhclient work-around from userspace

2010-06-29 Thread Michael S. Tsirkin
On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" 
> Date: Mon, 28 Jun 2010 13:08:07 +0300
> 
> > Userspace virtio server has the following hack
> > so guests rely on it, and we have to replicate it, too:
> > 
> > Use port number to detect incoming IPv4 DHCP response packets,
> > and fill in the checksum for these.
> > 
> > The issue we are solving is that on linux guests, some apps
> > that use recvmsg with AF_PACKET sockets, don't know how to
> > handle CHECKSUM_PARTIAL;
> > The interface to return the relevant information was added
> > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > and older userspace does not use it.
> > One important user of recvmsg with AF_PACKET is dhclient,
> > so we add a work-around just for DHCP.
> > 
> > Don't bother applying the hack to IPv6 as userspace virtio does not
> > have a work-around for that - let's hope guests will do the right
> > thing wrt IPv6.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Yikes, this is awful too.
> 
> Nothing in the kernel should be mucking around with procotol packets
> like this by default.  In particular, what the heck does port 67 mean?
> Locally I can use it for whatever I want for my own purposes, I don't
> have to follow the conventions for service ports as specified by the
> IETF.
> 
> But I can't have the packet checksum state be left alone for port 67
> traffic on a box using virtio because you have this hack there.
> 
> And yes it's broken on machines using the qemu thing, but at least the
> hack there is restricted to userspace.

Yes, and I think it was a mistake to add the hack there. This is what
prevented applications from using the new interface in the 3 years
since it was first introduced.

> I really don't want anything in the kernel that looks like this.
> 
> These applications are broken, and we've provided a way for them to
> work properly.  What's the point of having fixed applications if
> all of these hacks grow like fungus over every virtualization transport?
> 
> It just means that people won't fix the apps, since they don't have
> to.  There is no incentive, and the mechanism we created to properly
> handle this loses it's value.
> 
> At best, you can write a netfilter module that mucks up the packet
> checksum state in these situations.  At least in that case, you can
> make it generic (it mangles iff a packet matches a certain rule,
> so for your virtio guests you'd make it match for DHCP frames) instead
> of being some hard-coded DHCP thing by design.

Nod.
One question on implementation:
why does skb_checksum_help set the checksum state to
CHECKSUM_NONE? Shouldn't it be CHECKSUM_COMPLETE?



> And since this is so cleanly seperated and portable you don't even
> need to push it upstream.  It's a temporary workaround for a temporary
> problem.  You can just delete it as soon as the majority of guests
> have the fixed dhcp.  The qemu crap should disappear similarly.

Since using the module involves updating the management tools
as well, if we go down this route it will be much less painful
for everyone to do push it upstream.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv2] vhost-net: add dhclient work-around from userspace

2010-06-29 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Mon, 28 Jun 2010 13:08:07 +0300

> Userspace virtio server has the following hack
> so guests rely on it, and we have to replicate it, too:
> 
> Use port number to detect incoming IPv4 DHCP response packets,
> and fill in the checksum for these.
> 
> The issue we are solving is that on linux guests, some apps
> that use recvmsg with AF_PACKET sockets, don't know how to
> handle CHECKSUM_PARTIAL;
> The interface to return the relevant information was added
> in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> and older userspace does not use it.
> One important user of recvmsg with AF_PACKET is dhclient,
> so we add a work-around just for DHCP.
> 
> Don't bother applying the hack to IPv6 as userspace virtio does not
> have a work-around for that - let's hope guests will do the right
> thing wrt IPv6.
> 
> Signed-off-by: Michael S. Tsirkin 

Yikes, this is awful too.

Nothing in the kernel should be mucking around with procotol packets
like this by default.  In particular, what the heck does port 67 mean?
Locally I can use it for whatever I want for my own purposes, I don't
have to follow the conventions for service ports as specified by the
IETF.

But I can't have the packet checksum state be left alone for port 67
traffic on a box using virtio because you have this hack there.

And yes it's broken on machines using the qemu thing, but at least the
hack there is restricted to userspace.

I really don't want anything in the kernel that looks like this.

These applications are broken, and we've provided a way for them to
work properly.  What's the point of having fixed applications if
all of these hacks grow like fungus over every virtualization transport?

It just means that people won't fix the apps, since they don't have
to.  There is no incentive, and the mechanism we created to properly
handle this loses it's value.

At best, you can write a netfilter module that mucks up the packet
checksum state in these situations.  At least in that case, you can
make it generic (it mangles iff a packet matches a certain rule,
so for your virtio guests you'd make it match for DHCP frames) instead
of being some hard-coded DHCP thing by design.

And since this is so cleanly seperated and portable you don't even
need to push it upstream.  It's a temporary workaround for a temporary
problem.  You can just delete it as soon as the majority of guests
have the fixed dhcp.  The qemu crap should disappear similarly.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio: Support releasing lock during kick

2010-06-29 Thread Avi Kivity
On 06/29/2010 10:08 AM, Stefan Hajnoczi wrote:
>
> Is it incorrect to have the following pattern?
> spin_lock_irqsave(q->queue_lock);
> spin_unlock(q->queue_lock);
> spin_lock(q->queue_lock);
> spin_unlock_irqrestore(q->queue_lock);
>

Perfectly legitimate.  spin_lock_irqsave() is equivalent to 
local_irq_save() followed by spin_lock() (with the potential 
optimization that we can service interrupts while spinning).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio: Support releasing lock during kick

2010-06-29 Thread Stefan Hajnoczi
On Mon, Jun 28, 2010 at 4:55 PM, Marcelo Tosatti  wrote:
> On Wed, Jun 23, 2010 at 10:24:02PM +0100, Stefan Hajnoczi wrote:
>> The virtio block device holds a lock during I/O request processing.
>> Kicking the virtqueue while the lock is held results in long lock hold
>> times and increases contention for the lock.
>>
>> This patch modifies virtqueue_kick() to optionally release a lock while
>> notifying the host.  Virtio block is modified to pass in its lock.  This
>> allows other vcpus to queue I/O requests during the time spent servicing
>> the virtqueue notify in the host.
>>
>> The virtqueue_kick() function is modified to know about locking because
>> it changes the state of the virtqueue and should execute with the lock
>> held (it would not be correct for virtio block to release the lock
>> before calling virtqueue_kick()).
>>
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>> I am not yet 100% happy with this patch which aims to reduce guest CPU
>> consumption related to vblk->lock contention.  Although this patch reduces
>> wait/hold times it does not affect I/O throughput or guest CPU utilization.
>> More investigation is required to get to the bottom of why guest CPU
>> utilization does not decrease when a lock bottleneck has been removed.
>>
>> Performance figures:
>>
>> Host: 2.6.34 upstream kernel, qemu-kvm-0.12.4 if=virtio,cache=none
>> Guest: 2.6.35-rc3-kvm.git upstream kernel
>> Storage: 12 disks as striped LVM volume
>> Benchmark: 4 concurrent dd bs=4k iflag=direct
>>
>> Lockstat data for &vblk->lock:
>>
>> test       con-bounces contentions  waittime-min waittime-max waittime-total
>> unmodified 7097        7108         0.31         956.09       161165.4
>> patched    11484       11550        0.30         411.80       50245.83
>>
>> The maximum wait time went down by 544.29 us (-57%) and the total wait time
>> decreased by 69%.  This shows that the virtqueue kick is indeed hogging the
>> lock.
>>
>> The patched version actually has higher contention than the unmodified 
>> version.
>> I think the reason for this is that each virtqueue kick now includes a short
>> release and reacquire.  This short release gives other vcpus a chance to
>> acquire the lock and progress, hence more contention but overall better wait
>> time numbers.
>>
>> name       acq-bounces acquisitions holdtime-min holdtime-max holdtime-total
>> unmodified 10771       5038346      0.00         3271.81      59016905.47
>> patched    31594       5857813      0.00         219.76       24104915.55
>>
>> Here we see the full impact of this patch: hold time reduced to 219.76 us
>> (-93%).
>>
>> Again the acquisitions have increased since we're now doing an extra
>> unlock+lock per virtqueue kick.
>>
>> Testing, ideas, and comments appreciated.
>>
>>  drivers/block/virtio_blk.c          |    2 +-
>>  drivers/char/hw_random/virtio-rng.c |    2 +-
>>  drivers/char/virtio_console.c       |    6 +++---
>>  drivers/net/virtio_net.c            |    6 +++---
>>  drivers/virtio/virtio_balloon.c     |    6 +++---
>>  drivers/virtio/virtio_ring.c        |   13 +++--
>>  include/linux/virtio.h              |    3 ++-
>>  net/9p/trans_virtio.c               |    2 +-
>>  8 files changed, 25 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 258bc2a..de033bf 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -187,7 +187,7 @@ static void do_virtblk_request(struct request_queue *q)
>>       }
>>
>>       if (issued)
>> -             virtqueue_kick(vblk->vq);
>> +             virtqueue_kick(vblk->vq, &vblk->lock);
>>  }
>>
>>  static void virtblk_prepare_flush(struct request_queue *q, struct request 
>> *req)
>> diff --git a/drivers/char/hw_random/virtio-rng.c 
>> b/drivers/char/hw_random/virtio-rng.c
>> index 75f1cbd..852d563 100644
>> --- a/drivers/char/hw_random/virtio-rng.c
>> +++ b/drivers/char/hw_random/virtio-rng.c
>> @@ -49,7 +49,7 @@ static void register_buffer(u8 *buf, size_t size)
>>       if (virtqueue_add_buf(vq, &sg, 0, 1, buf) < 0)
>>               BUG();
>>
>> -     virtqueue_kick(vq);
>> +     virtqueue_kick(vq, NULL);
>>  }
>>
>>  static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait)
>> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
>> index 942a982..677714d 100644
>> --- a/drivers/char/virtio_console.c
>> +++ b/drivers/char/virtio_console.c
>> @@ -328,7 +328,7 @@ static int add_inbuf(struct virtqueue *vq, struct 
>> port_buffer *buf)
>>       sg_init_one(sg, buf->buf, buf->size);
>>
>>       ret = virtqueue_add_buf(vq, sg, 0, 1, buf);
>> -     virtqueue_kick(vq);
>> +     virtqueue_kick(vq, NULL);
>>       return ret;
>>  }
>>
>> @@ -400,7 +400,7 @@ static ssize_t __send_control_msg(struct ports_device 
>> *portdev, u32 port_id,
>>
>>       sg_init_one(sg, &cpkt, sizeof(cpkt));
>>       if (virtqueue_add_buf(vq, sg, 1, 0, &cpkt) >= 0) {
>> -             virtqueue_kick(vq);
>

Re: [PATCHv2] vhost-net: add dhclient work-around from userspace

2010-06-29 Thread Michael S. Tsirkin
On Mon, Jun 28, 2010 at 03:19:41PM -0700, Sridhar Samudrala wrote:
> On Mon, 2010-06-28 at 13:08 +0300, Michael S. Tsirkin wrote:
> > Userspace virtio server has the following hack
> > so guests rely on it, and we have to replicate it, too:
> > 
> > Use port number to detect incoming IPv4 DHCP response packets,
> > and fill in the checksum for these.
> > 
> > The issue we are solving is that on linux guests, some apps
> > that use recvmsg with AF_PACKET sockets, don't know how to
> > handle CHECKSUM_PARTIAL;
> > The interface to return the relevant information was added
> > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > and older userspace does not use it.
> > One important user of recvmsg with AF_PACKET is dhclient,
> > so we add a work-around just for DHCP.
> > 
> > Don't bother applying the hack to IPv6 as userspace virtio does not
> > have a work-around for that - let's hope guests will do the right
> > thing wrt IPv6.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> > 
> > Dave, I'm going to put this patch on the vhost tree,
> > no need for you to bother merging it - you'll get
> > it with a pull request.
> > 
> > 
> >  drivers/vhost/net.c |   44 +++-
> >  1 files changed, 43 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index cc19595..03bba6a 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -24,6 +24,10 @@
> >  #include 
> >  #include 
> > 
> > +#include 
> > +#include 
> > +#include 
> > +
> >  #include 
> > 
> >  #include "vhost.h"
> > @@ -186,6 +190,44 @@ static void handle_tx(struct vhost_net *net)
> > unuse_mm(net->dev.mm);
> >  }
> > 
> > +static int peek_head(struct sock *sk)
> 
> This routine is doing more than just peeking the head of sk's receive
> queue. May be this should be named similar to what qemu calls
> 'work_around_broken_dhclient()'
> > +{
> > +   struct sk_buff *skb;
> > +
> > +   lock_sock(sk);
> > +   skb = skb_peek(&sk->sk_receive_queue);
> > +   if (unlikely(!skb)) {
> > +   release_sock(sk);
> > +   return 0;
> > +   }
> > +   /* Userspace virtio server has the following hack so
> > +* guests rely on it, and we have to replicate it, too: */
> > +   /* Use port number to detect incoming IPv4 DHCP response packets,
> > +* and fill in the checksum. */
> > +
> > +   /* The issue we are solving is that on linux guests, some apps
> > +* that use recvmsg with AF_PACKET sockets, don't know how to
> > +* handle CHECKSUM_PARTIAL;
> > +* The interface to return the relevant information was added in
> > +* 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > +* and older userspace does not use it.
> > +* One important user of recvmsg with AF_PACKET is dhclient,
> > +* so we add a work-around just for DHCP. */
> > +   if (skb->ip_summed == CHECKSUM_PARTIAL &&
> > +   skb_headlen(skb) >= skb_transport_offset(skb) +
> > +   sizeof(struct udphdr) &&
> > +   udp_hdr(skb)->dest == htons(68) &&
> > +   skb_network_header_len(skb) >= sizeof(struct iphdr) &&
> > +   ip_hdr(skb)->protocol == IPPROTO_UDP &&
> > +   skb->protocol == htons(ETH_P_IP)) {
> 
> Isn't it more logical to check for skb->protocol, followed by ip_hdr and
> then udp_hdr?


Yes, but then we'll only exit after checking them all.
My way we'll almost always exit after port check.

> > +   skb_checksum_help(skb);
> > +   /* Restore ip_summed value: tun passes it to user. */
> > +   skb->ip_summed = CHECKSUM_PARTIAL;
> > +   }
> > +   release_sock(sk);
> > +   return 1;
> > +}
> > +
> >  /* Expects to be always run from workqueue - which acts as
> >   * read-size critical section for our kind of RCU. */
> >  static void handle_rx(struct vhost_net *net)
> > @@ -222,7 +264,7 @@ static void handle_rx(struct vhost_net *net)
> > vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
> > vq->log : NULL;
> > 
> > -   for (;;) {
> > +   while (peek_head(sock->sk)) {
> > head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> >  ARRAY_SIZE(vq->iov),
> >  &out, &in,
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization