Re: [PATCH] kvm: log directly from the guest to the host kvm buffer

2011-05-12 Thread Dhaval Giani
On Thu, May 12, 2011 at 5:13 PM, Avi Kivity  wrote:
> On 05/12/2011 04:36 PM, Dhaval Giani wrote:
>>
>> Hi,
>>
>> As part of some of the work for my project, I have been looking at
>> tracing some of the events in the guest from inside the host. In
>> my use case, I have been looking to correlate the time of a network
>> packet's arrival in the guest with the time in the host. ftrace makes
>> such ad-hoc use quite simple, so I went ahead and extended this
>> functionality in terms of a hypercall. There are still a few issues with this patch.
>>
>> 1. For some reason, the first time the hypercall is called, it works
>> just fine, but the second invocation refuses to happen. I am still
>> clueless about it. (and am looking for hints :-) )
>> 2. I am not very sure if I got the demarcation between the guest and
>> the host code fine or not. Someone more experienced than me should take
>> a look at the code as well :-)
>> 3. This adds a new paravirt call.
>> 4. This has been implemented just for x86 as of now. If there is enough
>> interest, I will look to make it more generic to be used across other
>> architectures. However, it is quite easy to do the same.
>> 5. It still does not have all the fancy ftrace features, but again,
>> depending on the interest, I can add all those in.
>> 6. A config option still needs to be created for this feature.
>>
>> I think such a feature is useful for debugging purposes and might make
>> sense to carry upstream.
>
> I guess it could help things like virtio/vhost development and profiling.
>

Exactly what I am using it for.

>
> I think that one hypercall per trace is too expensive.  Tracing is meant to
> be lightweight!  I think the guest can log to a buffer, which is flushed on
> overflow or when a vmexit occurs.  That gives us automatic serialization
> between a vcpu and the cpu it runs on, but not between a vcpu and a
> different host cpu.
>

Hmm. So, basically, log all of these events, and then send them to the
host either on an exit, or when the buffer fills up. There is one
problem with this approach though. One of the reasons I wanted a
hypercall per event was because I wanted to correlate the guest and the
host times (which is why I kept it synchronous). I lose that
information with what you suggest. However, I see your point about the
overhead. I will think about this a bit more.

>>
>> +int kvm_pv_ftrace(struct kvm_vcpu *vcpu, unsigned long ip, gpa_t addr)
>> +{
>> +       int ret;
>> +       char *fmt = (char *) kzalloc(PAGE_SIZE, GFP_KERNEL);
>> +
>> +       ret = kvm_read_guest(vcpu->kvm, addr, fmt, PAGE_SIZE);
>> +
>> +       trace_printk("KVM instance %p: VCPU %d, IP %lu: %s",
>> +                               vcpu->kvm, vcpu->vcpu_id, ip, fmt);
>> +
>> +       kfree(fmt);
>> +
>> +       return 0;
>> +}
>
> A kmalloc and printf seem expensive here.  I'd prefer to log the arguments
> and format descriptor instead.  Similarly the guest should pass unformatted
> parameters.
>
>> +int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
>>

trace_printk() is actually quite cheap (IIRC), but I guess Steve is
the best person to confirm that. We can avoid the kzalloc
overhead, though.

Dhaval
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: log directly from the guest to the host kvm buffer

2011-05-12 Thread Dhaval Giani
Hi,

As part of some of the work for my project, I have been looking at
tracing some of the events in the guest from inside the host. In
my use case, I have been looking to correlate the time of a network
packet's arrival in the guest with the time in the host. ftrace makes
such ad-hoc use quite simple, so I went ahead and extended this
functionality in terms of a hypercall. There are still a few issues with this patch.

1. For some reason, the first time the hypercall is called, it works
just fine, but the second invocation refuses to happen. I am still
clueless about it. (and am looking for hints :-) )
2. I am not very sure if I got the demarcation between the guest and
the host code fine or not. Someone more experienced than me should take
a look at the code as well :-)
3. This adds a new paravirt call.
4. This has been implemented just for x86 as of now. If there is enough
interest, I will look to make it more generic to be used across other
architectures. However, it is quite easy to do the same.
5. It still does not have all the fancy ftrace features, but again,
depending on the interest, I can add all those in.
6. A config option still needs to be created for this feature.

I think such a feature is useful for debugging purposes and might make
sense to carry upstream.

Thanks,
Dhaval

---
kvm: log directly from the guest to the host kvm buffer

Add a new hypercall, kvm_pv_ftrace(), which logs to the host's
ftrace buffer. To use it, the caller should use the kvm_ftrace
macro in the guest. This is still very early code and does not fully
work.

Signed-off-by: Dhaval Giani 

---
 arch/x86/include/asm/kvm_para.h |   11 +++
 arch/x86/kernel/kvm.c   |   22 ++
 arch/x86/kvm/x86.c  |   18 ++
 include/linux/kvm_para.h|1 +
 4 files changed, 52 insertions(+)

Index: linux-2.6/arch/x86/kvm/x86.c
===
--- linux-2.6.orig/arch/x86/kvm/x86.c
+++ linux-2.6/arch/x86/kvm/x86.c
@@ -4832,6 +4832,21 @@ int kvm_hv_hypercall(struct kvm_vcpu *vc
return 1;
 }
 
+int kvm_pv_ftrace(struct kvm_vcpu *vcpu, unsigned long ip, gpa_t addr)
+{
+   int ret;
+   char *fmt = (char *) kzalloc(PAGE_SIZE, GFP_KERNEL);
+
+   ret = kvm_read_guest(vcpu->kvm, addr, fmt, PAGE_SIZE);
+
+   trace_printk("KVM instance %p: VCPU %d, IP %lu: %s",
+   vcpu->kvm, vcpu->vcpu_id, ip, fmt);
+
+   kfree(fmt);
+
+   return 0;
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
unsigned long nr, a0, a1, a2, a3, ret;
@@ -4868,6 +4883,9 @@ int kvm_emulate_hypercall(struct kvm_vcp
case KVM_HC_MMU_OP:
r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
break;
+   case KVM_HC_FTRACE:
+   ret = kvm_pv_ftrace(vcpu, a0, hc_gpa(vcpu, a1, a2));
+   break;
default:
ret = -KVM_ENOSYS;
break;
Index: linux-2.6/include/linux/kvm_para.h
===
--- linux-2.6.orig/include/linux/kvm_para.h
+++ linux-2.6/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_FTRACE  5
 
 /*
  * hypercalls use architecture specific
Index: linux-2.6/arch/x86/kernel/kvm.c
===
--- linux-2.6.orig/arch/x86/kernel/kvm.c
+++ linux-2.6/arch/x86/kernel/kvm.c
@@ -274,6 +274,28 @@ static void kvm_mmu_op(void *buffer, uns
} while (len);
 }
 
+int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
+{
+   char *buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
+   int ret;
+   unsigned long a1, a2;
+   va_list args;
+   int i;
+
+   va_start(args, fmt);
+   i = vsnprintf(buffer, PAGE_SIZE, fmt, args);
+   va_end(args);
+
+   a1 = __pa(buffer);
+   a2 = 0;
+
+   ret = kvm_hypercall3(KVM_HC_FTRACE, ip, a1, a2);
+
+   kfree(buffer);
+   return ret;
+}
+EXPORT_SYMBOL(kvm_ftrace_printk);
+
 static void mmu_queue_flush(struct kvm_para_state *state)
 {
if (state->mmu_queue_len) {
Index: linux-2.6/arch/x86/include/asm/kvm_para.h
===
--- linux-2.6.orig/arch/x86/include/asm/kvm_para.h
+++ linux-2.6/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,10 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE23
 #define KVM_FEATURE_ASYNC_PF   4
+/*
+ * Right now an experiment, hopefully it works
+ */
+#define KVM_FEATURE_FTRACE 5
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -188,6 +192,13 @@ static inline u32 kvm_read_and_reset_pf_
 }
 #endif
 
+int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...);

Re: vhost, something changed between 2.6.35 and 2.6.36 ?

2010-09-13 Thread Dhaval Giani
(BTW, this is a regression from 2.6.35 at least. I will try to figure
out the last working version if you would like a bisect!)

On Sun, Sep 12, 2010 at 4:40 PM, Michael S. Tsirkin  wrote:
> On Sun, Sep 12, 2010 at 04:39:29PM +0200, Dhaval Giani wrote:
>> On Sun, Sep 12, 2010 at 2:05 PM, Michael S. Tsirkin  wrote:
>> > On Fri, Sep 10, 2010 at 03:37:36PM +0200, Dhaval Giani wrote:
>> >> Hi,
>> >>
>> >> I have been trying to get vhost+macvtap to work for me. I run it as
>> >>
>> >> /root/qemu-kvm-vhost-net/bin/qemu-system-x86_64 -hda $IMAGE  -serial
>> >> stdio -monitor telnet::,server,nowait -vnc :4: -m 3G -net
>> >> nic,model=virtio,macaddr=$MACADDR,netdev=macvtap0 -netdev
>> >> tap,id=macvtap0,vhost=on,fd=3 3<> /dev/tap5
>> >>
>> >> in 2.6.35, which worked just fine. On the other hand, with 2.6.36, I
>> >> don't have working networking. I am using the same image and the same
>> >> MAC address. The qemu is the version from
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost .
>> >
>> > BTW, by now, all these patches are merged so upstream qemu-kvm should work
>> > just fine for you as well.
>> >
>> >> Any suggestions will be welcome!
>> >>
>> >> Thanks,
>> >> Dhaval
>> >
>> > You are running this as non-root user, correct?
>>
>> Nope, as root.
>>
>> > This could be the permission issue that got fixed
>> > by 87d6a412bd1ed82c14cabd4b408003b23bbd2880.
>> > Could you please check the latest master from Linus,
>> > and let me and the list know? Thanks!
>> >
>>
>> This is with git as of Friday evening CEST.
>>
>> > Another thing to try if this does *not* help:
>> >
>> > enable CONFIG_DYNAMIC_DEBUG in kernel,
>> > rebuild the kernel,
>> > mount debugfs:
>> >
>> >        mount -t debugfs none /sys/kernel/debug
>> >
>> > and then enable debug for vhost_net as described in
>> > Documentation/dynamic-debug-howto.txt:
>> >
>>
>> I will give this a run on Monday morning when I am at the lab again.
>>

So nothing comes out with this.

>> >        echo 'module vhost_net +p' > /sys/kernel/debug/dynamic_debug/control
>> >
>> > Then start qemu, and after running a test, run dmesg and see if there
>> > are any messages from vhost_net. If yes please send them to
>> > me and to the list.
>> >
>> > Thanks!
>> >
>>
>>
>> thanks!
>> Dhaval
>
> Another thing to check is generic net core issues.
>
> For this, try running tcpdump on both the tap device in the host
> and the virtio net device in the guest. Then
> send packets from the guest to the host and back, and check whether
> they appear on both virtio and tap.
>

tcpdump -i macvtap0 on the host leads to nothing.

tcpdump -i eth0 on the guest leads to ARP requests, with no responses.

Anything more I can try?

Thanks!
Dhaval


Re: vhost, something changed between 2.6.35 and 2.6.36 ?

2010-09-12 Thread Dhaval Giani
On Sun, Sep 12, 2010 at 2:05 PM, Michael S. Tsirkin  wrote:
> On Fri, Sep 10, 2010 at 03:37:36PM +0200, Dhaval Giani wrote:
>> Hi,
>>
>> I have been trying to get vhost+macvtap to work for me. I run it as
>>
>> /root/qemu-kvm-vhost-net/bin/qemu-system-x86_64 -hda $IMAGE  -serial
>> stdio -monitor telnet::,server,nowait -vnc :4: -m 3G -net
>> nic,model=virtio,macaddr=$MACADDR,netdev=macvtap0 -netdev
>> tap,id=macvtap0,vhost=on,fd=3 3<> /dev/tap5
>>
>> in 2.6.35, which worked just fine. On the other hand, with 2.6.36, I
>> don't have working networking. I am using the same image and the same
>> MAC address. The qemu is the version from
>> git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost .
>
> BTW, by now, all these patches are merged so upstream qemu-kvm should work
> just fine for you as well.
>
>> Any suggestions will be welcome!
>>
>> Thanks,
>> Dhaval
>
> You are running this as non-root user, correct?

Nope, as root.

> This could be the permission issue that got fixed
> by 87d6a412bd1ed82c14cabd4b408003b23bbd2880.
> Could you please check the latest master from Linus,
> and let me and the list know? Thanks!
>

This is with git as of Friday evening CEST.

> Another thing to try if this does *not* help:
>
> enable CONFIG_DYNAMIC_DEBUG in kernel,
> rebuild the kernel,
> mount debugfs:
>
>        mount -t debugfs none /sys/kernel/debug
>
> and then enable debug for vhost_net as described in
> Documentation/dynamic-debug-howto.txt:
>

I will give this a run on Monday morning when I am at the lab again.

>        echo 'module vhost_net +p' > /sys/kernel/debug/dynamic_debug/control
>
> Then start qemu, and after running a test, run dmesg and see if there
> are any messages from vhost_net. If yes please send them to
> me and to the list.
>
> Thanks!
>


thanks!
Dhaval


vhost, something changed between 2.6.35 and 2.6.36 ?

2010-09-10 Thread Dhaval Giani
Hi,

I have been trying to get vhost+macvtap to work for me. I run it as

/root/qemu-kvm-vhost-net/bin/qemu-system-x86_64 -hda $IMAGE  -serial
stdio -monitor telnet::,server,nowait -vnc :4: -m 3G -net
nic,model=virtio,macaddr=$MACADDR,netdev=macvtap0 -netdev
tap,id=macvtap0,vhost=on,fd=3 3<> /dev/tap5

in 2.6.35, which worked just fine. On the other hand, with 2.6.36, I
don't have working networking. I am using the same image and the same
MAC address. The qemu is the version from
git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git vhost .

Any suggestions will be welcome!

Thanks,
Dhaval


Re: [RFC] CPU hard limits

2009-06-05 Thread Dhaval Giani
On Fri, Jun 05, 2009 at 04:02:11PM +0300, Avi Kivity wrote:
> Paul Menage wrote:
>> On Wed, Jun 3, 2009 at 10:36 PM, Bharata B
>> Rao wrote:
>>   
>>> - Hard limits can be used to provide guarantees.
>>>
>>> 
>>
>> This claim (and the subsequent long thread it generated on how limits
>> can provide guarantees) confused me a bit.
>>
>> Why do we need limits to provide guarantees when we can already
>> provide guarantees via shares?
>>
>> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
>> give each cgroup an equal share, and they're guaranteed 10% if they
>> try to use it; if they don't use it, other cgroups can get access to
>> the idle cycles.
>>
>> Suppose cgroup A wants a guarantee of 50% and two others, B and C,
>> want guarantees of 15% each; give A 50 shares and B and C 15 shares
>> each. In this case, if they all run flat out they'll get 62%/19%/19%,
>> which is within their SLA.
>>
>> That's not to say that hard limits can't be useful in their own right
>> - e.g. for providing reproducible loadtesting conditions by
>> controlling how much CPU a service can use during the load test. But I
>> don't see why using them to implement guarantees is either necessary
>> or desirable.
>>
>> (Unless I'm missing some crucial point ...)
>>   
>
> How many shares does a cgroup with a 0% guarantee get?
>

Shares cannot be used to provide guarantees. All they decide is the
proportion of CPU time that groups get. (Yes, "shares" is a bad name;
"weight" shows the intent better.)

thanks,
-- 
regards,
Dhaval


Re: [RFC] CPU hard limits

2009-06-05 Thread Dhaval Giani
On Fri, Jun 05, 2009 at 02:51:18AM -0700, Paul Menage wrote:
> On Fri, Jun 5, 2009 at 2:48 AM, Dhaval Giani wrote:
> >> > Now if 11th group with same shares comes in, then each group will now
> >> > get 9% of CPU and that 10% guarantee breaks.
> >>
> >> So you're trying to guarantee 11 cgroups that they can each get 10% of
> >> the CPU? That's called over-committing, and while there's nothing
> >> wrong with doing that if you're confident that they'll not all need
> >> their 10% at the same time, there's no way to *guarantee* them all
> >> 10%. You can guarantee them all 9% and hope the extra 1% is spare for
> >> those that need it (over-committing), or you can guarantee 10 of them
> >> 10% and give the last one 0 shares.
> >>
> >> How would you propose to guarantee 11 cgroups each 10% of the CPU
> >> using hard limits?
> >>
> >
> > You cannot guarantee 10% to 11 groups on any system (unless I am missing
> > something). The sum of guarantees cannot exceed 100%.
> 
> That's exactly my point. I was trying to counter Bharata's statement, which 
> was:
> 
> > > Now if 11th group with same shares comes in, then each group will now
> > > get 9% of CPU and that 10% guarantee breaks.
> 
> which seemed to be implying that this was a drawback of using shares
> to implement guarantees.
> 

OK :). Glad to see I did not get it wrong.

I think we are focusing on the wrong use case here. Guarantees are just a
useful side effect we get by using hard limits. I think the more
important use case is one where the provider wants to limit the amount of
CPU time a user gets (such as in a cloud).

Maybe we should direct our attention to solving that problem? :)

thanks,
-- 
regards,
Dhaval


Re: [RFC] CPU hard limits

2009-06-05 Thread Dhaval Giani
On Fri, Jun 05, 2009 at 02:32:51AM -0700, Paul Menage wrote:
> On Fri, Jun 5, 2009 at 2:27 AM, Bharata B Rao 
> wrote:
> >>
> >> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
> >> give each cgroup an equal share, and they're guaranteed 10% if they
> >> try to use it; if they don't use it, other cgroups can get access to
> >> the idle cycles.
> >
> > Now if 11th group with same shares comes in, then each group will now
> > get 9% of CPU and that 10% guarantee breaks.
> 
> So you're trying to guarantee 11 cgroups that they can each get 10% of
> the CPU? That's called over-committing, and while there's nothing
> wrong with doing that if you're confident that they'll not all need
> their 10% at the same time, there's no way to *guarantee* them all
> 10%. You can guarantee them all 9% and hope the extra 1% is spare for
> those that need it (over-committing), or you can guarantee 10 of them
> 10% and give the last one 0 shares.
> 
> How would you propose to guarantee 11 cgroups each 10% of the CPU
> using hard limits?
> 

You cannot guarantee 10% to 11 groups on any system (unless I am missing
something). The sum of guarantees cannot exceed 100%.

How would you be able to do that with any other mechanism?

Thanks,
-- 
regards,
Dhaval