Re: [kvm-devel] [Qemu-devel] [PATCH 3/4] Add support for HPET periodic timer.

2007-08-21 Thread Dan Kenigsberg
On Tue, Aug 21, 2007 at 01:15:22PM -0700, Matthew Kent wrote:
> On Tue, 2007-21-08 at 21:40 +0200, Luca wrote:
> > On 8/21/07, Matthew Kent <[EMAIL PROTECTED]> wrote:
> > > On Sat, 2007-18-08 at 01:11 +0200, Luca Tettamanti wrote:
> > > > plain text document attachment (clock-hpet)
> > > > Linux operates the HPET timer in legacy replacement mode, which means 
> > > > that
> > > > the periodic interrupt of the CMOS RTC is not delivered (qemu won't be 
> > > > able
> > > > to use /dev/rtc). Add support for HPET (/dev/hpet) as a replacement for 
> > > > the
> > > > RTC; the periodic interrupt is delivered via SIGIO and is handled in the
> > > > same way as the RTC timer.
> > > >
> > > > Signed-off-by: Luca Tettamanti <[EMAIL PROTECTED]>
> > >
> > > I must be missing something silly here.. should I be able to open more
> > > than one instance of qemu with -clock hpet? Because upon invoking a
> > > second instance of qemu HPET_IE_ON fails.
> > 
> > It depends on your hardware. Theoretically it's possible, but I've yet
> > to see a motherboard with more than one periodic timer.
> 
> Ah thank you, after re-reading the docs I think I better understand
> this.

At the risk of being off-topic, maybe you can help me try the hpet support.
When I try the hpet Documentation demo I get

# ./hpet poll /dev/hpet 1 1000
-hpet: executing poll
hpet_poll: info.hi_flags 0x0
hpet_poll, HPET_IE_ON failed

while I have

$ dmesg|grep -i HPET
ACPI: HPET 7D5B6AE0, 0038 (r1 A M I  OEMHPET   5000708 MSFT   97)
ACPI: HPET id: 0x8086a301 base: 0xfed0
hpet0: at MMIO 0xfed0, IRQs 2, 8, 0, 0
hpet0: 4 64-bit timers, 14318180 Hz
hpet_resources: 0xfed0 is busy
Time: hpet clocksource has been installed.

Any idea what I am misconfiguring?

Thanks,

Dan.



Re: [kvm-devel] [PATCH 0/4] Rework alarm timer infrastrucure - take2

2007-08-21 Thread Avi Kivity
Luca Tettamanti wrote:

> Actually I'm having trouble with cyclesoak (probably its calibration);
> numbers are not very stable across multiple runs...
>   

I've had good results with cyclesoak; maybe you need to run it in
runlevel 3 so the load generated by moving the mouse or breathing
doesn't affect measurements.

> I've also tried APC which was suggested by malc[1] and:
> - readings are far more stable
> - the gap between dynticks and non-dynticks seems not significant
>
>   
>> Can you verify this by running
>>
>>strace -c -p `pgrep qemu` & sleep 10; pkill strace
>>
>> for all 4 cases, and posting the results?
>> 
>
> Plain QEMU:
>
> With dynticks:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  57.97    0.000469           0     13795           clock_gettime
>  32.88    0.000266           0      1350           gettimeofday
>   7.42    0.000060           0      1423      1072 sigreturn
>   1.73    0.000014           0      5049           timer_gettime
>   0.00    0.000000           0      1683      1072 select
>   0.00    0.000000           0      2978           timer_settime
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.000809                 26278      2144 total
>   

The 1072 select() errors are the delivered ticks (EINTR).  But why only
1000?  I would have expected 10000 for a 1000Hz guest in a 10 sec period.
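
For readers following along, here is a minimal, self-contained sketch (not the
actual qemu main loop) of why interrupted select() calls track alarm ticks: the
periodic signal interrupts the blocking select(), which fails with EINTR, and
the loop then runs its timer handlers.

/* Minimal sketch: counting EINTR-interrupted select() calls approximates
 * counting delivered timer ticks. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;

static void on_alarm(int sig) { (void)sig; ticks++; }

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it = { { 0, 1000 }, { 0, 1000 } };  /* 1 kHz alarm */
    setitimer(ITIMER_REAL, &it, NULL);

    for (;;) {
        struct timeval tv = { 1, 0 };
        int r = select(0, NULL, NULL, NULL, &tv);
        if (r < 0 && errno == EINTR)           /* one EINTR per delivered tick */
            printf("tick %d\n", (int)ticks);   /* qemu would run its timers here */
    }
}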

> HPET:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  87.48    0.010459           1     10381     10050 select
>   8.45    0.001010           0     40736           clock_gettime
>   2.73    0.000326           0     10049           gettimeofday
>   1.35    0.000161           0     10086     10064 sigreturn
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.011956                 71252     20114 total
>   

This is expected.  1 tick per millisecond.

> Unix (SIGALRM):
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  90.36    0.011663           1     10291      9959 select
>   7.38    0.000953           0     40355           clock_gettime
>   2.05    0.000264           0      9960           gettimeofday
>   0.21    0.000027           0      9985      9969 sigreturn
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.012907                 70591     19928 total
>   

Same here.

> And KVM:
>
> dynticks:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  78.90    0.004001           1      6681      5088 rt_sigtimedwait
>  10.87    0.000551           0     27901           clock_gettime
>   4.93    0.000250           0      7622           timer_settime
>   4.30    0.000218           0     10078           timer_gettime
>   0.39    0.000020           0      3863           gettimeofday
>   0.35    0.000018           0      6054           ioctl
>   0.26    0.000013           0      4196           select
>   0.00    0.000000           0      1593           rt_sigaction
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.005071                 67988      5088 total
>   

kvm uses sigtimedwait() to wait for signals.  Here, an error (ETIMEDOUT)
indicates we did _not_ get a wakeup, so there are 1500 wakeups in a 10
second period.  Strange.  Some calibration error?
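
For reference, a small standalone sketch of the sigtimedwait() pattern
described here (illustrative only, not the qemu-kvm source); at the C library
level the timeout is reported as EAGAIN, which strace tallies in the errors
column.

/* Sketch: block SIGALRM and consume it with sigtimedwait(); a timeout means
 * no tick was delivered within that interval. */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_BLOCK, &set, NULL);   /* no handler; consumed by sigtimedwait */

    struct itimerval it = { { 0, 1000 }, { 0, 1000 } };  /* 1 kHz alarm */
    setitimer(ITIMER_REAL, &it, NULL);

    for (int i = 0; i < 20; i++) {
        struct timespec ts = { 0, 250000 };              /* 250 us timeout */
        int sig = sigtimedwait(&set, NULL, &ts);
        if (sig == SIGALRM)
            puts("wakeup: tick delivered");
        else if (sig < 0 && errno == EAGAIN)
            puts("timed out: no tick");   /* strace counts this as an error */
    }
    return 0;
}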

> HPET:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  90.20    0.011029           0     32437     22244 rt_sigtimedwait
>   4.46    0.000545           0     44164           clock_gettime
>   2.59    0.000317           0     12128           gettimeofday
>   1.50    0.000184           0     10193           rt_sigaction
>   1.10    0.000134           0     12461           select
>   0.15    0.000018           0      6060           ioctl
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.012227                117443     22244 total
>   

10K wakeups per second.  The code is not particularly efficient (11
syscalls per tick), but overhead is still low.

> Unix:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  83.29    0.012522           0     31652     21709 rt_sigtimedwait
>   6.91    0.001039           0     43125           clock_gettime
>   3.50    0.000526           0      6042           ioctl
>   2.74    0.000412           0      9943           rt_sigaction
>   1.98    0.000298           0     12183           select
>   1.58    0.000238           0     11850           gettimeofday
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 

Re: [kvm-devel] linux verify_pmtmr_rate() issue

2007-08-21 Thread Avi Kivity
Matthew Kent wrote:
> Issue here that's beyond my skill set to resolve:
>
> I've been starting multiple linux 2.6.23-rc3 x86 guests up in parallel
> with qemu/kvm and noticed pm-timer is being disabled in some of them
> with
>
> PM-Timer running at invalid rate: 126% of normal - aborting.
>
> in dmesg when I start about 6 at a time. Unfortunately without the timer
> a tickless kernel in my guests is disabled. 
>
> I also replicated the issue by starting a single vm when the host system
> was busy enough.
>
> After some amateurish debugging added to verify_pmtmr_rate() in the
> kernel acpi_pm driver and get_pmtmr() in qemu acpi I can indeed see it
> returning just slowly enough to throw off the sanity check. 
>
> [   10.264772] DEBUG: PM-Timer running value1: 2925874 value2: 3058371
> expected_rate: 107385 delta: 132497 count: 2269
> [   10.270766] PM-Timer running at invalid rate: 123% of normal -
> aborting.
>
> For now I've just disabled verify_pmtmr_rate() in the kernel for my
> guests and they seem to be keeping time just fine. 
>
> Not sure if a patch for the linux kernel making the sanity check
> optional with a kernel parameter would make sense or there's something
> else that can be done at the qemu level.
>   

You can try implementing qemu's cpu_get_real_ticks() using
gettimeofday() instead of using the time stamp counter (which can go
back or jump forward if the time stamp counter is not synced across
cpus).  Not sure if that's the problem though.
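
A rough sketch of that suggestion, assuming a gettimeofday()-backed tick
counter; the function name and microsecond scale are illustrative assumptions,
not qemu's actual cpu_get_real_ticks() implementation.

/* Hedged sketch: a tick counter based on gettimeofday() instead of rdtsc. */
#include <stdint.h>
#include <sys/time.h>

static inline int64_t cpu_get_real_ticks_gtod(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    /* Unlike the TSC, this is consistent across CPUs; it only misbehaves if
     * the host clock is stepped. */
    return (int64_t)tv.tv_sec * 1000000LL + tv.tv_usec;  /* microseconds */
}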


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Rusty Russell
On Tue, 2007-08-21 at 12:47 -0400, Gregory Haskins wrote:
> On Tue, 2007-08-21 at 10:06 -0400, Gregory Haskins wrote:
> > On Tue, 2007-08-21 at 23:47 +1000, Rusty Russell wrote:
> > > 
> > >   In the guest -> host direction, an interface like virtio is designed
> > > for batching, with the explicit distinction between add_buf & sync.
> > 
> > Right.  IOQ has "iter_push()" and "signal()" as synonymous operations.
> 
> Hi Rusty,
>   This reminded me of an area that I thought might have been missing in
> virtio compared to IOQ.  That is, flexibility in the io-completion via
> the distinction between "signal" and "sync".  sync() implies that it's a
> blocking call based on the full drain of the queue, correct?  The
> ioq_signal() operation is purely a "kick".  You can, of course, still
> implement synchronous functions with a higher layer construct such as
> the ioq->wq.

Hi Gregory,

You raise a good point.  We should rename "sync" to "kick".  Clear
names are very important.

> Is there a way to do something similar in virtio? (and forgive me if
> there is..I still haven't seen the code).  And if not and people like
> that idea, what would be a good way to add it to the interface?

I had two implementations, an efficient descriptor based one and a dumb
dumb dumb 1-char copying-based one.  I let the latter one rot; it was
sufficient for me to convince myself that it was possible to create an
implementation which uses such a transport.

(Nonetheless, it's kinda boring to maintain, so it wasn't updated for the
latest draft of the virtio API.)

Here's the lguest "efficient" implementation, which could still use some
love:

===
More efficient lguest implementation of virtio, using descriptors.

This allows zero-copy from guest <-> host.  It uses a page of
descriptors, a page to say what descriptors to use, and a page to say
what's been used: one each set for inbufs and one for outbufs.

TODO:
1) More polishing
2) Get rid of old I/O
3) Inter-guest I/O implementation

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 Documentation/lguest/lguest.c   |  412 +
 drivers/lguest/Makefile |2 
 drivers/lguest/hypercalls.c |4 
 drivers/lguest/lguest_virtio.c  |  476 +++
 include/asm-i386/lguest_hcall.h |3 
 include/linux/lguest_launcher.h |   26 ++
 6 files changed, 914 insertions(+), 9 deletions(-)

===
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -5,6 +5,8 @@
 #define _LARGEFILE64_SOURCE
 #define _GNU_SOURCE
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +45,7 @@ typedef uint16_t u16;
 typedef uint16_t u16;
 typedef uint8_t u8;
 #include "../../include/linux/lguest_launcher.h"
+#include "../../include/linux/virtio_blk.h"
 #include "../../include/asm/e820.h"
 /*:*/
 
@@ -55,6 +58,8 @@ typedef uint8_t u8;
 /* We can have up to 256 pages for devices. */
 #define DEVICE_PAGES 256
 
+#define descs_per_page() (getpagesize() / sizeof(struct lguest_desc))
+
 /*L:120 verbose is both a global flag and a macro.  The C preprocessor allows
  * this, and although I wouldn't recommend it, it works quite nicely here. */
 static bool verbose;
@@ -106,6 +111,8 @@ struct device
unsigned long watch_key;
u32 (*handle_output)(int fd, const struct iovec *iov,
 unsigned int num, struct device *me);
+   /* Alternative to handle_output */
+   void (*handle_notify)(int fd, struct device *me);
 
/* Device-specific data. */
void *priv;
@@ -956,17 +963,21 @@ static void handle_output(int fd, unsign
struct iovec iov[LGUEST_MAX_DMA_SECTIONS];
unsigned num = 0;
 
-   /* Convert the "struct lguest_dma" they're sending to a "struct
-* iovec". */
-   lenp = dma2iov(dma, iov, &num);
-
/* Check each device: if they expect output to this key, tell them to
 * handle it. */
for (i = devices->dev; i; i = i->next) {
-   if (i->handle_output && key == i->watch_key) {
-   /* We write the result straight into the used_len field
-* for them. */
+   if (key != i->watch_key)
+   continue;
+
+   if (i->handle_output) {
+   /* Convert the "struct lguest_dma" they're sending to a
+* "struct iovec". */
+   lenp = dma2iov(dma, iov, &num);
*lenp = i->handle_output(fd, iov, num, i);
+   return;
+   } else if (i->handle_notify) {
+   /* virtio-style notify. */
+   i->handle_notify(fd, i);
return;
}
}
@@ -1079,6 +1090,7 @@ static struct device *new_device(struct 
dev->handle_input = handle_input;
dev->watch_key = to_gue

[kvm-devel] linux verify_pmtmr_rate() issue

2007-08-21 Thread Matthew Kent
Issue here that's beyond my skill set to resolve:

I've been starting multiple linux 2.6.23-rc3 x86 guests up in parallel
with qemu/kvm and noticed pm-timer is being disabled in some of them
with

PM-Timer running at invalid rate: 126% of normal - aborting.

in dmesg when I start about 6 at a time. Unfortunately without the timer
a tickless kernel in my guests is disabled. 

I also replicated the issue by starting a single vm when the host system
was busy enough.

After some amateurish debugging added to verify_pmtmr_rate() in the
kernel acpi_pm driver and get_pmtmr() in qemu acpi I can indeed see it
returning just slowly enough to throw off the sanity check. 

[   10.264772] DEBUG: PM-Timer running value1: 2925874 value2: 3058371
expected_rate: 107385 delta: 132497 count: 2269
[   10.270766] PM-Timer running at invalid rate: 123% of normal -
aborting.

For now I've just disabled verify_pmtmr_rate() in the kernel for my
guests and they seem to be keeping time just fine. 

Not sure if a patch for the linux kernel making the sanity check
optional with a kernel parameter would make sense or there's something
else that can be done at the qemu level.

Thanks.
-- 
Matthew Kent <[EMAIL PROTECTED]>
http://magoazul.com




Re: [kvm-devel] Another "unhandled vm exit: 0x9"

2007-08-21 Thread Cam Macdonell
Avi Kivity wrote:
> Weiyang Chen wrote:
> 
> A hardware task switch is sometimes used when a guest detects a serious 
> error and wants to switch to a known condition.  Sometimes it happens 
> accidentally due to a previous error.
> 
> Was this image installed using kvm?  What HAL does it use?
> 

Hi,

I had this same problem.  I have an old XP image that would not run 
under kvm-35.  It was installed with the kvm that is packaged with 
Ubuntu Feisty (kvm-27, I believe).  After I upgraded, it crashes during 
boot.

How can I check the HAL it uses?

I still have it around if you would like me to do some debugging with it.

Cam



Re: [kvm-devel] [Qemu-devel] [PATCH 3/4] Add support for HPET periodic timer.

2007-08-21 Thread Matthew Kent
On Tue, 2007-21-08 at 21:40 +0200, Luca wrote:
> On 8/21/07, Matthew Kent <[EMAIL PROTECTED]> wrote:
> > On Sat, 2007-18-08 at 01:11 +0200, Luca Tettamanti wrote:
> > > plain text document attachment (clock-hpet)
> > > Linux operates the HPET timer in legacy replacement mode, which means that
> > > the periodic interrupt of the CMOS RTC is not delivered (qemu won't be 
> > > able
> > > to use /dev/rtc). Add support for HPET (/dev/hpet) as a replacement for 
> > > the
> > > RTC; the periodic interrupt is delivered via SIGIO and is handled in the
> > > same way as the RTC timer.
> > >
> > > Signed-off-by: Luca Tettamanti <[EMAIL PROTECTED]>
> >
> > I must be missing something silly here.. should I be able to open more
> > than one instance of qemu with -clock hpet? Because upon invoking a
> > second instance of qemu HPET_IE_ON fails.
> 
> It depends on your hardware. Theoretically it's possible, but I've yet
> to see a motherboard with more than one periodic timer.

Ah thank you, after re-reading the docs I think I better understand
this.
-- 
Matthew Kent <[EMAIL PROTECTED]>
http://magoazul.com




Re: [kvm-devel] [PATCH 0/4] Rework alarm timer infrastrucure - take2

2007-08-21 Thread malc
On Tue, 21 Aug 2007, Luca Tettamanti wrote:

> Avi Kivity ha scritto:
>> Luca Tettamanti wrote:
>>> At 1000Hz:
>>>
>>> QEMU
>>> hpet5.5%
>>> dynticks   11.7%
>>>
>>> KVM
>>> hpet3.4%
>>> dynticks7.3%
>>>
>>> No surprises here, you can see the additional 1k syscalls per second.
>>
>> This is very surprising to me.  The 6.2% difference for the qemu case
>> translates to 62ms per second, or 62us per tick at 1000Hz.  That's more
>> than a hundred simple syscalls on modern processors.  We shouldn't have to
>> issue a hundred syscalls per guest clock tick.
>
[..snip prelude..]

> I've also tried APC which was suggested by malc[1] and:
> - readings are far more stable
> - the gap between dynticks and non-dynticks seems not significant

[..dont snip the obvious fact and snip the numbers..]

>
> Luca
> [1] copy_to_user inside spinlock is a big no-no ;)
>

[..notice a projectile targeting at you and rush to see the code..]

Mixed feelings about this... But in principle the code of course is
dangerous; thank you kindly for pointing this out.

I see two ways out of this:

a. moving the lock/unlock inside the loop, with the unlock preceding the
potentially sleeping copy_to_user

b. fill temporaries and, after the loop is done, copy them in one go

Too late, too hot; I wouldn't mind being on the receiving side of
some good advice.

-- 
vale



Re: [kvm-devel] [Qemu-devel] [PATCH 3/4] Add support for HPET periodic timer.

2007-08-21 Thread Luca
On 8/21/07, Matthew Kent <[EMAIL PROTECTED]> wrote:
> On Sat, 2007-18-08 at 01:11 +0200, Luca Tettamanti wrote:
> > plain text document attachment (clock-hpet)
> > Linux operates the HPET timer in legacy replacement mode, which means that
> > the periodic interrupt of the CMOS RTC is not delivered (qemu won't be able
> > to use /dev/rtc). Add support for HPET (/dev/hpet) as a replacement for the
> > RTC; the periodic interrupt is delivered via SIGIO and is handled in the
> > same way as the RTC timer.
> >
> > Signed-off-by: Luca Tettamanti <[EMAIL PROTECTED]>
>
> I must be missing something silly here.. should I be able to open more
> than one instance of qemu with -clock hpet? Because upon invoking a
> second instance of qemu HPET_IE_ON fails.

It depends on your hardware. Theoretically it's possible, but I've yet
to see a motherboard with more than one periodic timer.

"dmesg | grep hpet" should tell you something like:

hpet0: 3 64-bit timers, 14318180 Hz
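
For anyone reproducing this, below is a minimal sketch of the /dev/hpet usage
pattern under discussion, modeled on Documentation/hpet.txt rather than on the
qemu patch; the frequency value is an arbitrary example. Each successful
HPET_IE_ON claims a periodic-capable comparator, so with only one such timer a
second process fails exactly as reported above.

/* Sketch: claim an HPET timer, enable periodic interrupts, read the count. */
#include <fcntl.h>
#include <linux/hpet.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/hpet", O_RDONLY);
    if (fd < 0) { perror("open /dev/hpet"); return 1; }

    if (ioctl(fd, HPET_IRQFREQ, 1024UL) < 0)  /* request 1024 Hz */
        perror("HPET_IRQFREQ");

    if (ioctl(fd, HPET_IE_ON, 0) < 0) {       /* enable periodic interrupts */
        perror("HPET_IE_ON");                 /* fails if no free periodic timer */
        return 1;
    }

    unsigned long data = 0;
    read(fd, &data, sizeof(data));            /* blocks until interrupt(s) */
    printf("interrupts since last read: %lu\n", data);

    ioctl(fd, HPET_IE_OFF, 0);
    close(fd);
    return 0;
}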

Luca



Re: [kvm-devel] [PATCH 0/4] Rework alarm timer infrastrucure - take2

2007-08-21 Thread Luca Tettamanti
Avi Kivity ha scritto: 
> Luca Tettamanti wrote:
>> At 1000Hz:
>>
>> QEMU
>> hpet5.5%
>> dynticks   11.7%
>>
>> KVM
>> hpet3.4%
>> dynticks7.3%
>>
>> No surprises here, you can see the additional 1k syscalls per second. 
>
> This is very surprising to me.  The 6.2% difference for the qemu case 
> translates to 62ms per second, or 62us per tick at 1000Hz.  That's more 
> than a hundred simple syscalls on modern processors.  We shouldn't have to 
> issue a hundred syscalls per guest clock tick.

APIC or PIT interrupts are delivered using the timer, which will be
re-armed after each tick, so I'd expect 1k timer_settime per second. But
according to strace that's not happening; maybe I'm misreading the code?
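
A small sketch of the rearm-per-tick pattern described above, using a one-shot
POSIX timer; the structure and names are illustrative, not qemu's dynticks
code. At a 1000Hz guest rate this pattern would produce roughly 1k
timer_settime calls per second.

/* Sketch: a one-shot timer that is reprogrammed after every delivered tick. */
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static timer_t host_timer;
static volatile sig_atomic_t ticks;

static void on_tick(int sig) { (void)sig; ticks++; }

static void rearm(long delta_ns)
{
    struct itimerspec its = {
        .it_value    = { .tv_sec = 0, .tv_nsec = delta_ns },
        .it_interval = { 0, 0 },              /* one-shot: no automatic reload */
    };
    timer_settime(host_timer, 0, &its, NULL);
}

int main(void)
{
    struct sigaction sa = { .sa_handler = on_tick };
    struct sigevent ev = { .sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGALRM };

    sigaction(SIGALRM, &sa, NULL);
    timer_create(CLOCK_MONOTONIC, &ev, &host_timer);

    for (int i = 0; i < 5; i++) {
        rearm(1000000);                       /* next tick in 1 ms */
        pause();                              /* wait for the signal */
        printf("tick %d delivered, rearming\n", (int)ticks);
    }
    return 0;                                 /* link with -lrt on older glibc */
}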

> The difference with kvm is smaller (just 3.9%), which is not easily 
> explained as the time for the extra syscalls should be about the same.  My 
> guess is that guest behavior is different; with dynticks the guest does 
> about twice as much work as with hpet.

Actually I'm having trouble with cyclesoak (probably its calibration);
numbers are not very stable across multiple runs...

I've also tried APC which was suggested by malc[1] and:
- readings are far more stable
- the gap between dynticks and non-dynticks seems not significant

> Can you verify this by running
>
>strace -c -p `pgrep qemu` & sleep 10; pkill strace
>
> for all 4 cases, and posting the results?

Plain QEMU:

With dynticks:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.97    0.000469           0     13795           clock_gettime
 32.88    0.000266           0      1350           gettimeofday
  7.42    0.000060           0      1423      1072 sigreturn
  1.73    0.000014           0      5049           timer_gettime
  0.00    0.000000           0      1683      1072 select
  0.00    0.000000           0      2978           timer_settime
------ ----------- ----------- --------- --------- ----------------
100.00    0.000809                 26278      2144 total

HPET:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.48    0.010459           1     10381     10050 select
  8.45    0.001010           0     40736           clock_gettime
  2.73    0.000326           0     10049           gettimeofday
  1.35    0.000161           0     10086     10064 sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    0.011956                 71252     20114 total

Unix (SIGALRM):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.36    0.011663           1     10291      9959 select
  7.38    0.000953           0     40355           clock_gettime
  2.05    0.000264           0      9960           gettimeofday
  0.21    0.000027           0      9985      9969 sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    0.012907                 70591     19928 total

And KVM:

dynticks:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.90    0.004001           1      6681      5088 rt_sigtimedwait
 10.87    0.000551           0     27901           clock_gettime
  4.93    0.000250           0      7622           timer_settime
  4.30    0.000218           0     10078           timer_gettime
  0.39    0.000020           0      3863           gettimeofday
  0.35    0.000018           0      6054           ioctl
  0.26    0.000013           0      4196           select
  0.00    0.000000           0      1593           rt_sigaction
------ ----------- ----------- --------- --------- ----------------
100.00    0.005071                 67988      5088 total

HPET:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.20    0.011029           0     32437     22244 rt_sigtimedwait
  4.46    0.000545           0     44164           clock_gettime
  2.59    0.000317           0     12128           gettimeofday
  1.50    0.000184           0     10193           rt_sigaction
  1.10    0.000134           0     12461           select
  0.15    0.000018           0      6060           ioctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.012227                117443     22244 total

Unix:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.29    0.012522           0     31652     21709 rt_sigtimedwait
  6.91    0.001039           0     43125           clock_gettime
  3.50    0.000526           0      6042           ioctl
  2.74    0.000412           0      9943           rt_sigaction
  1.98    0.000298           0     12183           select
  1.58    0.000238           0     11850           gettimeofday

Re: [kvm-devel] [Qemu-devel] [PATCH 3/4] Add support for HPET periodic timer.

2007-08-21 Thread Matthew Kent
On Sat, 2007-18-08 at 01:11 +0200, Luca Tettamanti wrote:
> plain text document attachment (clock-hpet)
> Linux operates the HPET timer in legacy replacement mode, which means that
> the periodic interrupt of the CMOS RTC is not delivered (qemu won't be able
> to use /dev/rtc). Add support for HPET (/dev/hpet) as a replacement for the
> RTC; the periodic interrupt is delivered via SIGIO and is handled in the
> same way as the RTC timer.
> 
> Signed-off-by: Luca Tettamanti <[EMAIL PROTECTED]>

I must be missing something silly here.. should I be able to open more
than one instance of qemu with -clock hpet? Because upon invoking a
second instance of qemu HPET_IE_ON fails.

I also tried running the example in the kernel docs under
Documentation/hpet.txt

[EMAIL PROTECTED] [/home/mkent]# ./demo poll /dev/hpet 1 1000
-hpet: executing poll
hpet_poll: info.hi_flags 0x0
hpet_poll: expired time = 0x8
hpet_poll: revents = 0x1
hpet_poll: data 0x1


[EMAIL PROTECTED] [/home/mkent]# ./demo poll /dev/hpet 1 1000
-hpet: executing poll
hpet_poll: info.hi_flags 0x0
hpet_poll, HPET_IE_ON failed

This is on 2.6.23-rc3 x86_64 with the patch-2.6.23-rc3-hrt2.patch
hrtimers patch.
-- 
Matthew Kent <[EMAIL PROTECTED]>
http://magoazul.com




Re: [kvm-devel] Réf. : Re: [PATCH 0/4] Virtual Machine Time Accounting

2007-08-21 Thread Glauber de Oliveira Costa
On 8/21/07, Christian Borntraeger <[EMAIL PROTECTED]> wrote:
> Am Montag, 20. August 2007 schrieb Glauber de Oliveira Costa:
> > Although I don't know KVM at that deep a level, I think it should be
> > possible to keep the virtual cpus in different processes (or threads),
> > and take the accounting time from there. Perfectly possible to know
> > the time we spent running (user time), and the time the hypervisor
> > spent doing things on our behalf (system time).
>
> I disagree here. First thing, you don't want to have the virtual cpu in a
> different process than the hypervisor control code for that cpu. Otherwise
> communication has to be made via IPC.
> Secondly, it's not qemu/kvm that does the accounting. It's existing userspace
> code like top, snmp agents and clients, etc. that would require additional
> knowledge of which thread is running guest code.

Yes, the second argument kills me, and I think it leaves no further
room for discussion on my side. Thanks for the enlightenment.

> I personally like the approach Laurent has taken. Maybe it needs some polish
> and maybe we want an account_guest_time function, but in general I think he
> is doing the right thing.
>
Now, me too.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."



Re: [kvm-devel] Réf. : Re: [PATCH 0/4] Virtual Machine Time Accounting

2007-08-21 Thread Glauber de Oliveira Costa
On 8/21/07, Laurent Vivier <[EMAIL PROTECTED]> wrote:
> Glauber de Oliveira Costa wrote:
> >> by doing this at kernel level, we can:
> >> - measure exactly the guest time,
> >> - move this part of system time to user time (as you think it should be
> >> user time),
> >> - have consistency between system, user and guest time,
>> - report values in /proc/state and /proc/<pid>/state, at a system-wide level
> >>
> >> I'm not sure we can measure the guest time at the qemu user level.
> >>
> >> Perhaps Rusty can say what he thinks about this ?
> >>
> > Even if we cannot _now_, isn't that an easier, and safer change? (and
> > I don't think we lose anything by design).
>
> Could you explain ? How should I do this ?
> I'm _sure_ it is not easier to do that at qemu level.
>
> I don't like patching the kernel (it is the last thing I do to solve a problem: I
> know there is always at least one guy who won't agree with the patch :-P), but in
> this case I think it is the best way to do it.
>
> I think the virtualization notion should be introduced at the kernel level, at
> least in the kernel statistics: it is generic, it can be used by other
> virtualization tools. As I said, until now CPUs have had only two states
> reflected in statistics, "user time" and "system time". Recently a third
> state has been introduced, the virtual CPU, which, in my opinion, should
> also be reflected in the CPU statistics as "guest time".

After a second thought on this, you seem to be right.

> >
> > Although I don't know KVM at that deep a level, I think it should be
> > possible to keep the virtual cpus in different processes (or threads),
> > and take the accounting time from there. Perfectly possible to know
> > the time we spent running (user time), and the time the hypervisor
> > spent doing things on our behalf (system time).
>
> But we have always user time accounted as system time. CPU stats are wrong if 
> we
> do not modify the kernel. Can you live with wrong statistics ? (yes, I think,
> you can, but perhaps someone else not)
>
No, I can't. Just because I suggested an alternate way (even if it was
wrong), it does not mean I want things to be done wrongly.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."



Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins
On Tue, 2007-08-21 at 20:12 +0300, Avi Kivity wrote:

> No, sync() means "make the other side aware that there's work to be done".
> 

Ok, but still the important thing isn't the kick per se, but the
resulting completion.  Can we do interrupt-driven reclamation?  Some
of those virtio_net emails I saw kicking around earlier today implied
buffers are reclaimed on the next xmit (e.g. polling) which violates the
netif rules for avoiding deadlock.  I suppose that could have just been
an implementation decision, but I remember wondering how reaping would
work when virtio first came out.

Regards,
-Greg




Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Avi Kivity
Gregory Haskins wrote:
> On Tue, 2007-08-21 at 10:06 -0400, Gregory Haskins wrote:
>   
>> On Tue, 2007-08-21 at 23:47 +1000, Rusty Russell wrote:
>> 
>>> In the guest -> host direction, an interface like virtio is designed
>>> for batching, with the explicit distinction between add_buf & sync.
>>>   
>> Right.  IOQ has "iter_push()" and "signal()" as synonymous operations.
>> 
>
> Hi Rusty,
>   This reminded me of an area that I thought might have been missing in
> virtio compared to IOQ.  That is, flexibility in the io-completion via
> the distinction between "signal" and "sync".  sync() implies that it's a
> blocking call based on the full drain of the queue, correct?

No, sync() means "make the other side aware that there's work to be done".


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins
On Tue, 2007-08-21 at 10:06 -0400, Gregory Haskins wrote:
> On Tue, 2007-08-21 at 23:47 +1000, Rusty Russell wrote:
> > 
> > In the guest -> host direction, an interface like virtio is designed
> > for batching, with the explicit distinction between add_buf & sync.
> 
> Right.  IOQ has "iter_push()" and "signal()" as synonymous operations.

Hi Rusty,
  This reminded me of an area that I thought might have been missing in
virtio compared to IOQ.  That is, flexibility in the io-completion via
the distinction between "signal" and "sync".  sync() implies that it's a
blocking call based on the full drain of the queue, correct?  The
ioq_signal() operation is purely a "kick".  You can, of course, still
implement synchronous functions with a higher layer construct such as
the ioq->wq.  For example:

void send_sync(struct ioq *ioq, struct sk_buff *skb)
{
        DECLARE_WAITQUEUE(wait, current);
        struct ioq_iterator iter;

        ioq_iter_init(ioq, &iter, ioq_idxtype_inuse, IOQ_ITER_AUTOUPDATE);

        ioq_iter_seek(&iter, ioq_seek_head, 0, 0);

        /* Update the iter.desc->ptr with skb details */

        mb();
        iter.desc->valid = 1;
        iter.desc->sown  = 1; /* give ownership to the south */
        mb();

        ioq_iter_push(&iter, 0);

        add_wait_queue(&ioq->wq, &wait);
        set_current_state(TASK_UNINTERRUPTIBLE);

        /* Wait until we own it again */
        while (!iter.desc->sown)
                schedule();

        set_current_state(TASK_RUNNING);
        remove_wait_queue(&ioq->wq, &wait);
}

But really the goal behind this design was to allow for fine-grained
selection of how io-completion is notified.  E.g.  callback (e.g.
interrupt-driven) deferred reclaimation/reaping (see
ioqnet_tx_complete), sleeping-wait via ioq->wq, busy-wait, etc.

Is there a way to do something similar in virtio? (and forgive me if
there is..I still haven't seen the code).  And if not and people like
that idea, what would be a good way to add it to the interface?

Regards,
-Greg







Re: [kvm-devel] bug in virtio network driver?

2007-08-21 Thread Christian Borntraeger
Am Dienstag, 21. August 2007 schrieb Rusty Russell:
> The only reason that we don't do it in skb_xmit_done() is because
> kfree_skb() isn't supposed to be called from an interrupt.  But there's
> dev_kfree_skb_any() which can be used.

Ok, I now hacked something that works, but I really don't like the
local_irq_disable bits. I sent this patch nevertheless, but I will look into
Arnd's suggestion.

--- linux-2.6.22.orig/drivers/net/virtio_net.c
+++ linux-2.6.22/drivers/net/virtio_net.c
@@ -53,12 +52,31 @@ static void vnet_hdr_to_sg(struct scatte
sg->length = sizeof(struct virtio_net_hdr);
 }
 
+static void free_old_xmit_skbs(struct virtnet_info *vi)
+{
+   struct sk_buff *skb;
+   unsigned int len;
+
+   netif_tx_lock(vi->ndev);
+   while ((skb = vi->vq_send->ops->get_buf(vi->vq_send, &len)) != NULL) {
+   /* They cannot have written to the packet. */
+   BUG_ON(len != 0);
+   pr_debug("Sent skb %p\n", skb);
+   __skb_unlink(skb, &vi->send);
+   vi->ndev->stats.tx_bytes += skb->len;
+   vi->ndev->stats.tx_packets++;
+   dev_kfree_skb_irq(skb);
+   }
+   netif_tx_unlock(vi->ndev);
+}
+
 static bool skb_xmit_done(struct virtqueue *vq)
 {
struct virtnet_info *vi = vq->priv;
 
/* In case we were waiting for output buffers. */
netif_wake_queue(vi->ndev);
+   free_old_xmit_skbs(vi);
return true;
 }
 
@@ -214,22 +232,6 @@ again:
return 0;
 }
 
-static void free_old_xmit_skbs(struct virtnet_info *vi)
-{
-   struct sk_buff *skb;
-   unsigned int len;
-
-   while ((skb = vi->vq_send->ops->get_buf(vi->vq_send, &len)) != NULL) {
-   /* They cannot have written to the packet. */
-   BUG_ON(len != 0);
-   pr_debug("Sent skb %p\n", skb);
-   __skb_unlink(skb, &vi->send);
-   vi->ndev->stats.tx_bytes += skb->len;
-   vi->ndev->stats.tx_packets++;
-   kfree_skb(skb);
-   }
-}
-
 static int start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -238,12 +240,12 @@ static int start_xmit(struct sk_buff *sk
struct virtio_net_hdr *hdr;
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 
+   local_irq_disable();
+   netif_tx_lock(vi->ndev);
pr_debug("%s: xmit %p %02x:%02x:%02x:%02x:%02x:%02x\n",
 dev->name, skb,
 dest[0], dest[1], dest[2], dest[3], dest[4], dest[5]);
 
-   free_old_xmit_skbs(vi);
-
/* Encode metadata header at front. */
hdr = skb_vnet_hdr(skb);
if (skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -280,10 +282,13 @@ static int start_xmit(struct sk_buff *sk
pr_debug("%s: virtio not prepared to send\n", dev->name);
skb_unlink(skb, &vi->send);
netif_stop_queue(dev);
+   netif_tx_unlock(vi->ndev);
+   local_irq_enable();
return NETDEV_TX_BUSY;
}
vi->vq_send->ops->sync(vi->vq_send);
-
+   netif_tx_unlock(vi->ndev);
+   local_irq_enable();
return 0;
 }
 
@@ -343,7 +348,7 @@ struct net_device *virtnet_probe(struct 
dev->poll = virtnet_poll;
dev->hard_start_xmit = start_xmit;
dev->weight = 16;
-   dev->features = features;
+   dev->features = features | NETIF_F_LLTX;
SET_NETDEV_DEV(dev, device);
 
vi = netdev_priv(dev);



Re: [kvm-devel] Running KVM without root privileges

2007-08-21 Thread Eugene Coetzee
Luca wrote:

>>Thanks for the reply. I'm a little confused about the interaction
>>between KVM and qemu. Which binary requires CAP_NET_ADMIN capability -
>>KVM or qemu ?
>>
>>
>
>In the upstream package 'kvm' is just a script wrapper that invokes
>the right qemu executable (the userspace component of KVM is a
>modified QEMU). The exact naming depends on your distro, e.g. Debian
>package puts the executable in /usr/bin/kvm.
>
>Luca
>
>
>  
>
Thanks for the advice. I have posted the solution to the problem on 
Ubuntu Feisty at:

http://www.linuxforums.org/forum/ubuntu-help/101274-running-kvm-without-root-privileges.html#post499980
 


kind regards,

Eugene Coetzee

-- 
--
===
Reedflute Software Solutions

Telephone   -> +27 18 293 3236
General information -> [EMAIL PROTECTED]
Project information -> [EMAIL PROTECTED]
Web -> www.reedflute.com
=== 




Re: [kvm-devel] KVM Test result, kernel 46a948d8.. , userspace fc50790c..

2007-08-21 Thread Avi Kivity
Zhao, Yunfeng wrote:
> Before, all Windows guests were slow to boot up, and sometimes Windows test
> cases failed because of timeouts.
> These days this kind of timeout failure only happens while booting the
> win2k guest.
>
> Any idea why win2k with ACPI HAL is slow?
>   

It may be due to different Windows versions looking at different parts 
of the ACPI tables; maybe we're better with the newer sections and worse 
with the older sections.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] KVM Test result, kernel 46a948d8.. , userspace fc50790c..

2007-08-21 Thread Zhao, Yunfeng
Before, all Windows guests were slow to boot up, and sometimes Windows test
cases failed because of timeouts.
These days this kind of timeout failure only happens while booting the
win2k guest.

Any idea why win2k with ACPI HAL is slow?

Thanks
Yunfeng

>
>Ah, I thought this was the Windows XP with ACPI MP HAL which started
>consuming lots of cpu issue.  Seems not.
>
>For Windows 2000, it's best to use the Standard PC HAL, it then boots in
>a few seconds.
>
>
>--
>error compiling committee.c: too many arguments to function



Re: [kvm-devel] KVM Test result, kernel 46a948d8.. , userspace fc50790c..

2007-08-21 Thread Avi Kivity
Zhao, Yunfeng wrote:
>> This should have been fixed in the commit you tested.  Does your test do
>> a 'make install'?  It needs an updated bios.
>>
> [Yunfeng] I build kvm rpm packages for the testing. Every time before
> starting a new test, the system will run "rpm -e" to remove all old
> packages. 
> In our testing, the time of booting win2k guest is about 3 minutes. 
> The test machine is a Harwich/Paxvile with 16LPs.
>   

Ah, I thought this was the Windows XP with ACPI MP HAL which started 
consuming lots of cpu issue.  Seems not.

For Windows 2000, it's best to use the Standard PC HAL, it then boots in 
a few seconds.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] Another "unhandled vm exit: 0x9"

2007-08-21 Thread Weiyang Chen
Hi,

I rebuilt an image with the new kvm, and now it works!
Thanks a lot for your kind help!


Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins
On Tue, 2007-08-21 at 23:47 +1000, Rusty Russell wrote:

> Hi Gregory,
> 
>   The main current use is disk drivers: they process out-of-order.

Maybe for you ;)  I am working on the networking/IVMC side.

> 
> >   I think the use of rings for the tx-path in and of
> > itself is questionable unless you can implement something like the bidir
> > NAPI that I demonstrated in ioqnet.  Otherwise, you end up having to
> > hypercall on each update to the ring anyway and you might as well
> > hypercall directly w/o using a ring.
> 
>   In the guest -> host direction, an interface like virtio is designed
> for batching, with the explicit distinction between add_buf & sync.

Right.  IOQ has "iter_push()" and "signal()" as synonymous operations.
But note that batching via deferred synchronization does not implicitly
require a shared queue. E.g. you could batch internally and then
hypercall at the "sync" point.  However, batching via a queue is still
nice because at least you give the host side a chance to independently
"notice" the changes concurrently before the sync.  But I digress...

>   On
> the receive side, you can have explicit interrupt suppression or
> implicit mitigation caused by scheduling effects.

Agreed.  This is precisely what the bidir NAPI stuff is doing and I
didn't mean to imply that virtio wasn't capable of it too.  All I meant
is that if you *don't* take advantage of it, the guest->host path via a
queue is likely overkill.  E.g. you might as well hypercall instead.


>   But in fact as we can see, two rings need less from each ring than one
> ring.  One ring must have producer and consumer indices, so the producer
> doesn't overrun the consumer.  But if the second ring is used to feed
> consumption, the consumer index isn't required any more: in fact, it's
> just confusing to have.

Don't get me wrong.  I am totally in favor of the two ring approach.
You have enlightened me on that front. :)  I was under the impression
that making the two-ring approach support out-of-order added
significantly more complexity.  Did I understand that wrong?

> 
>   I really think that a table of descriptors, a ring for produced
> descriptors and a ring for used descriptors is the most cache-friendly,
> bidir-non-trusting simple implementation possible.  Of course, the
> produced and used rings might be the same format, which allows code
> sharing and if you squint a little, that's your "lowest level" simple
> ringbuffer.

Sounds reasonable to me.

> 
> Thanks for the discussion,

Ditto!
-Greg





Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Rusty Russell
On Tue, 2007-08-21 at 08:00 -0400, Gregory Haskins wrote:
> On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:
> 
> > Partly the horror of the code, but mainly because it is an in-order
> > ring.  You'll note that we use a reply ring, so we don't need to know
> > how much the other side has consumed (and it needn't do so in order).
> > 
> 
> I have certainly been known to take a similar stance when looking at Xen
> code ;) (recall the lapic work I did).  However, that said I am not yet
> convinced that an out-of-order ring (at least as a fundamental
> primitive) buys us much.

Hi Gregory,

The main current use is disk drivers: they process out-of-order.

>   I think the use of rings for the tx-path in and of
> itself is questionable unless you can implement something like the bidir
> NAPI that I demonstrated in ioqnet.  Otherwise, you end up having to
> hypercall on each update to the ring anyway and you might as well
> hypercall directly w/o using a ring.

In the guest -> host direction, an interface like virtio is designed
for batching, with the explicit distinction between add_buf & sync.  On
the receive side, you can have explicit interrupt suppression or
implicit mitigation caused by scheduling effects.
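
As an illustration of that batching idea, here is a sketch against an assumed
minimal interface; the add_buf/kick names and the ops table are placeholders,
and the real virtio API of this period differed in detail. Several buffers are
queued without any exit, then one kick ("sync") notifies the host.

/* Sketch of guest->host batching under an assumed, illustrative interface. */
struct virtqueue;          /* opaque handle provided by the transport */

struct virtqueue_ops {
    int  (*add_buf)(struct virtqueue *vq, void *buf, unsigned len, void *cookie);
    void (*kick)(struct virtqueue *vq);   /* "sync": tell the other side */
};

static void send_batch(struct virtqueue *vq, const struct virtqueue_ops *ops,
                       void **bufs, unsigned *lens, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        ops->add_buf(vq, bufs[i], lens[i], bufs[i]);  /* no exit per buffer */
    ops->kick(vq);                                    /* one notification */
}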

> OTOH, its possible that its redundant to have a simple low-level
> infrastructure and then build a more complex ring for out-of-order
> processing on top of it.  I'm not sure.  My gut feeling is that it will
> probably result in a cleaner implementation: The higher-layered ring can
> stop worrying about the interrupt/hypercall details (it would use the
> simple ring as its transport)and implementations that don't need
> out-of-order (e.g. networks) don't have to deal with the associated
> complexity.

But in fact as we can see, two rings need less from each ring than one
ring.  One ring must have producer and consumer indices, so the producer
doesn't overrun the consumer.  But if the second ring is used to feed
consumption, the consumer index isn't required any more: in fact, it's
just confusing to have.

I really think that a table of descriptors, a ring for produced
descriptors and a ring for used descriptors is the most cache-friendly,
bidir-non-trusting simple implementation possible.  Of course, the
produced and used rings might be the same format, which allows code
sharing and if you squint a little, that's your "lowest level" simple
ringbuffer.
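
A sketch of the shape being described, with illustrative field names rather
than a defined wire format: a table of descriptors plus a "produced"
(available) ring and a "used" ring, each written by only one side.

/* Illustrative layout only; field names and flags are assumptions. */
#include <stdint.h>

struct desc {                 /* descriptor table entry */
    uint64_t addr;            /* guest-physical address of the buffer */
    uint32_t len;
    uint16_t flags;           /* e.g. readable vs writable by the other side */
    uint16_t next;            /* chaining for scatter-gather */
};

struct produced_ring {        /* written by the producer only */
    uint16_t idx;             /* free-running producer index */
    uint16_t ring[];          /* descriptor-table indices, in order */
};

struct used_ring {            /* written by the consumer only */
    uint16_t idx;             /* free-running consumer index */
    struct { uint32_t id; uint32_t len; } ring[];  /* completed descriptors */
};

Keeping a single writer per ring is also what gives the cache friendliness
mentioned elsewhere in this thread.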

Thanks for the discussion,
Rusty.




Re: [kvm-devel] KVM Test result, kernel 46a948d8.. , userspace fc50790c..

2007-08-21 Thread Zhao, Yunfeng



>> 1. Could not create kvm guest with memory >=2040
>>
>>
>> https://sourceforge.net/tracker/index.php?func=detail&aid=1736307&group_id=180599&atid=893831
>>
>
>This ought to be fixed for the next run (only on 64-bit hosts; 32-bit
>hosts will remain with a 2GB limit).
[Yunfeng] Ok, I will check the test on 64bit.
>
>>
>>
>> 5. Booting windows guest is very slow
>> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1768187&group_id=180599
>>
>This should have been fixed in the commit you tested.  Does your test do
>a 'make install'?  It needs an updated bios.
[Yunfeng] I build kvm rpm packages for the testing. Every time before
starting a new test, the system will run "rpm -e" to remove all old
packages. 
In our testing, the time of booting win2k guest is about 3 minutes. 
The test machine is a Harwich/Paxvile with 16LPs.


>
>--
>error compiling committee.c: too many arguments to function



Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins
On Tue, 2007-08-21 at 15:25 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:
> >
> >   
> >> Partly the horror of the code, but mainly because it is an in-order
> >> ring.  You'll note that we use a reply ring, so we don't need to know
> >> how much the other side has consumed (and it needn't do so in order).
> >>
> >> 
> >
> > I have certainly been known to take a similar stance when looking at Xen
> > code ;) (recall the lapic work I did).  However, that said I am not yet
> > convinced that an out-of-order ring (at least as a fundamental
> > primitive) buys us much.  
> 
> It's pretty much required for block I/O into disk arrays.

You are misunderstanding me.  I totally agree that block io is
inherently out-of-order.  What I am trying to convey is that at a
fundamental level *everything* (including block-io) can be viewed as an
ordered sequence of events.

For instance, consider that a block-io driver is making requests like
"perform read transaction X", and "perform write transaction Y".
Likewise, the host side can pass events like "completed transaction Y"
and "completed transaction X".  At this level, everything is *always*
ordered, regardless of the fact that X and Y were temporally rearranged
by the host.

This is what the ioq/pvbus series is trying to address:  These low-level
primitives for moving events in and out of the guest in a VMM agnostic
way.  From there, you could apply higher level constructs such as an
out-of-order sg descriptor ring to represent your block-io data.  The
low-level primitives simply become a way to convey changes to that
construct.

In a nutshell, IOQ provides a simple bi-directional ordered event
channel and a context associated hypercall mechanism (see
pvbus_device->call()) to accomplish these low-level chores.

I am also advocating caution on the tx path, as I think indirection
(e.g. queuing) as opposed to direct access (e.g. contextual hypercall)
has limited applicability.  Trying to come up with a complex
"one-size-fits-all" queue for the tx path may be not worthwhile since in
the end there is still a 1:1 with queue-insert:hypercall.  You might as
well just pass the descriptor directly via the contextual hypercall.
Where this ends up being a win is where you can do the bi-dir NAPI-like
tricks like IOQNET and have the queue-insert to hypercall ratio become >
1.  

> 
> Xen does out-of-order, btw, on its single ring, but at the cost of some 
> complexity.  I don't believe it is worthwhile and prefer split 
> request/reply rings.

I am not against the split rings either.  The article that Rusty
forwarded was very interesting indeed.  But if I understood the article
and Rusty, there are kind of two aspects to it.  A) Using two rings to
make an cache-thrash friendly ordered ring, or B) adding out-of-order
capability to these two rings.  I am certainly in favor of (A) for use
as the low-level event transport.  I just question whether the
complexity of (B) is justified as the one and only queuing mechanism
when there are plenty of patterns that simply cannot take advantage of
it.

What I am wondering is if we should have a set of low-level primitives
that deal primarily with ordered event sequencing and VMM abstraction,
and a higher set of code expressed in terms of these primitives for
implementing the constructs such as (B) for block-io.

> 
> With my VJ T-shirt on, I can even say it's more efficient, as each side 
> of the ring will have a single writer and a single reader, reducing 
> ping-pong effects if the interrupt completions happens to land on the 
> wrong cpu.

Agreed.


> 
> Network tx can be out of order too (with some traffic destined to other 
> guests, some to the host, and some to external interfaces, completions 
> will be out of order).

Well, not with respect to the 1:1 event delivery channel as I envision
it (unless I am misunderstanding you?)

Regards,
-Greg






Re: [kvm-devel] [PATCH 0/4] Virtual Machine Time Accounting

2007-08-21 Thread Avi Kivity
Jeremy Fitzhardinge wrote:
> Laurent Vivier wrote:
>   
>> functionalities:
>>
>> - allow to measure the time spent by a CPU in a virtual CPU.
>> - allow to display this value by CPU in /proc/state
>> - allow to display this value by process in /proc/<pid>/state
>> - allow KVM to use these 3 previous functionalities
>>   
>> 
>
> So, currently time spent in a kvm guest is accumulated as qemu-kvm
> usertime, right?  Given that qemu knows when its running in qemu vs
> guest context, couldn't it provide the breakdown between user and guest
> time (ditto lguest)?
>   

qemu doesn't (and shouldn't) do accounting; that's best done by 
interrupt driven code.

The patches do account for guest time in a separate counter; guest time 
is added to both user time and the new counter.  This allows an old 
'top' to see guest time (accounted as user time), and a new 'top' to 
separate guest time and user time by performing the appropriate 
mathematical operation.
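
A sketch of the arithmetic this implies for a guest-aware tool; the field
names are assumptions for illustration, not the actual /proc format.

/* Guest time is folded into user time, so a new tool subtracts it back out. */
struct cpu_times {
    unsigned long long utime;   /* user time, includes guest time */
    unsigned long long stime;   /* system time */
    unsigned long long gtime;   /* new counter: time spent in guest mode */
};

static unsigned long long pure_user_time(const struct cpu_times *t)
{
    /* Old tools just report utime; new tools can show the breakdown. */
    return t->utime - t->gtime;
}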


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] KVM Test result, kernel 46a948d8.. , userspace fc50790c..

2007-08-21 Thread Avi Kivity
Zhao, Yunfeng wrote:
>
> Old Issue list:
>
> 
>
> 1. Could not create kvm guest with memory >=2040
>
> https://sourceforge.net/tracker/index.php?func=detail&aid=1736307&group_id=180599&atid=893831
>  
> 
>

This ought to be fixed for the next run (only on 64-bit hosts; 32-bit 
hosts will remain with a 2GB limit).

> 
>
> 5. Booting windows guest is very slow
>
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1768187&group_id=180599
>  
> 
>

This should have been fixed in the commit you tested.  Does your test do 
a 'make install'?  It needs an updated bios.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] Add support for passing extra LDFLAGS to qemu's configure

2007-08-21 Thread Avi Kivity
Jeremy Katz wrote:
> There are cases[1] where you want to be able to pass more ldflags to
> qemu's configure.  This lets you set LDFLAGS to accomplish that
>
> Signed-off-by: Jeremy Katz <[EMAIL PROTECTED]>
>
> Jeremy
>
> [1] Such as with the new build-id support in binutils so that you can
> pass --build-id to the linker while still building with the old compiler
> as needed for qemu
>   
> diff -up kvm-35/configure.ldflags kvm-35/configure
> --- kvm-35/configure.ldflags  2007-08-20 17:40:39.0 -0400
> +++ kvm-35/configure  2007-08-20 17:40:50.0 -0400
> @@ -83,7 +83,7 @@ target_cpu() {
>  (cd user; ./configure --prefix="$prefix" --kerneldir="$libkvm_kerneldir")
>  (cd qemu; ./configure --target-list=$(target_cpu)-softmmu --cc="$qemu_cc" \
>  --disable-kqemu --extra-cflags="-I $PWD/../user" \
> ---extra-ldflags="-L $PWD/../user" \
> +--extra-ldflags="-L $PWD/../user $LDFLAGS" \
>  --enable-kvm --kernel-path="$libkvm_kerneldir" \
>  --enable-alsa \
>  ${disable_gcc_check:+"--disable-gcc-check"} \
>   

I utterly dislike slurping build options from the environment.  Please 
provide an explicit option.  Counterexamples in current code will not help.



-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH][Rebased LAPIC5] Reading PPR directly from function rather than apic page

2007-08-21 Thread Avi Kivity
Yang, Sheng wrote:
> After introducing the TPR shadow, many TPR changes won't cause a vmexit,
> so the present method of updating the PPR can't catch them. Though we
> could also update the PPR every time we want to read it, that is somewhat
> ugly.
>
> Because there are only two places that need to read the PPR, and the PPR
> is read-only, we do it in a clearer way. Now apic_update_PPR() has been
> replaced by apic_get_PPR(), which returns the current PPR, and we read
> the PPR directly from that function when we need it, rather than reading
> it from the apic page.
>
> Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
>
> Notice: this patch is based on the rebased lapic5, since the current
> lapic5 is broken...
>   

Patch is okay; pending only on the previous patch.
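
As a hedged sketch of the idea in the patch description (not Sheng's actual
code): since the PPR is read-only, derived state, it can be computed on
demand from the TPR and the highest in-service vector instead of being
cached in the apic page, e.g.:

#include <stdint.h>

static uint32_t apic_get_ppr(uint32_t tpr, int highest_isr_vector)
{
	uint32_t isrv_class = highest_isr_vector & 0xf0;
	uint32_t tpr_class  = tpr & 0xf0;

	/* Architecturally, PPR is the TPR when its priority class is at
	 * least that of the highest in-service interrupt, otherwise the
	 * in-service class with the low nibble cleared. */
	return (tpr_class >= isrv_class) ? tpr : isrv_class;
}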

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH][Rebased LAPIC5] Enable TPR shadow on CR8 access

2007-08-21 Thread Avi Kivity
Yang, Sheng wrote:
> This patch enables the VMX TPR shadow on CR8 accesses. 64-bit Windows
> accesses the TPR via CR8 frequently. The TPR shadow can improve the
> performance of TPR accesses by not causing a vmexit.
>
> Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
>
> Notice: this patch is based on the rebased lapic5, since the current
> lapic5 is broken...
>   

Wait, which lapic5 is broken and which is okay?!

Can you explain?  maybe post sha1 hashes?

> +#define cpu_has_vmx_tpr_shadow \
> + (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW)
> +
>   

Inline function please.

Why hide code in something that looks like a variable?
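
A minimal sketch of the inline form being requested, assuming the same
vmcs_config and CPU_BASED_TPR_SHADOW definitions as in the quoted patch:

static inline int cpu_has_vmx_tpr_shadow(void)
{
	/* Same check as the macro above, but visibly a function call. */
	return vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW;
}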

> @@ -2128,6 +2165,7 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
>   vmx_inject_irq(vcpu, kvm_cpu_get_interrupt(vcpu));
>   else
>   enable_irq_window(vcpu);
> +
>  }
>   

Superfluous empty line, please remove.


Apart from this, code looks fine.  But please explain what happened to 
the lapic5 branch?

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Avi Kivity
Rusty Russell wrote:
> Partly the horror of the code, but mainly because it is an in-order
> ring.  You'll note that we use a reply ring, so we don't need to know
> how much the other side has consumed (and it needn't do so in order).
>   

Yes, it's quite nice: by using two in-order rings, you get out-of-order 
completions.  Simple _and_ efficient.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Avi Kivity
Gregory Haskins wrote:
> On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:
>
>   
>> Partly the horror of the code, but mainly because it is an in-order
>> ring.  You'll note that we use a reply ring, so we don't need to know
>> how much the other side has consumed (and it needn't do so in order).
>>
>> 
>
> I have certainly been known to take a similar stance when looking at Xen
> code ;) (recall the lapic work I did).  However, that said I am not yet
> convinced that an out-of-order ring (at least as a fundamental
> primitive) buys us much.  

It's pretty much required for block I/O into disk arrays.

Xen does out-of-order, btw, on its single ring, but at the cost of some 
complexity.  I don't believe it is worthwhile and prefer split 
request/reply rings.

With my VJ T-shirt on, I can even say it's more efficient, as each side 
of the ring will have a single writer and a single reader, reducing 
ping-pong effects if the interrupt completions happen to land on the 
wrong cpu.
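
A hedged sketch of that split request/reply layout (not Xen's or virtio's
actual structures; all names and sizes are illustrative):

#include <stdint.h>

#define RING_ENTRIES 256                   /* power of two for cheap wrap */

struct req_ring {                          /* written by the guest only */
	uint32_t prod;                     /* request producer index */
	struct { uint64_t addr, len, id; } desc[RING_ENTRIES];
};

struct rsp_ring {                          /* written by the host only */
	uint32_t prod;                     /* reply producer index */
	struct { uint64_t id, status; } desc[RING_ENTRIES];
};

/* Each side keeps its consumer index privately, so there is no shared
 * "consumed" field and neither side dirties the other's cache lines;
 * the host may post completions to the reply ring in any order. */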

> I think the use of rings for the tx-path in and of itself is
> questionable unless you can implement something like the bidir
> NAPI that I demonstrated in ioqnet.  Otherwise, you end up having to
> hypercall on each update to the ring anyway and you might as well
> hypercall directly w/o using a ring.
>   

Network tx can be out of order too (with some traffic destined to other 
guests, some to the host, and some to external interfaces, completions 
will be out of order).

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 0/4] Rework alarm timer infrastrucure - take2

2007-08-21 Thread Avi Kivity
Luca Tettamanti wrote:
>>>   
>>>   
>> Run a 100Hz guest, measure cpu usage using something accurate like
>> cyclesoak, with and without dynticks, with and without kvm.
>> 
>
> Ok, here I've measured the CPU usage on the host when running an idle
> guest.
>
> At 100Hz
>
> QEMU
> hpet        4.8%
> dynticks    5.1%
>
> Note: I've taken the mean over a period of 20 secs, but the difference
> between hpet and dynticks is well inside the variability of the test.
>
> KVM
> hpet        2.2%
> dynticks    1.0%
>
> Hum... here the numbers jump a bit, but dynticks is always below hpet.
>   

The differences here are small, so I'll focus on the 1000Hz case.

> At 1000Hz:
>
> QEMU
> hpet        5.5%
> dynticks   11.7%
>
> KVM
> hpet        3.4%
> dynticks    7.3%
>
> No surprises here, you can see the additional 1k syscalls per second. 

This is very surprising to me.  The 6.2% difference for the qemu case 
translates to 62ms per second, or 62us per tick at 1000Hz.  That's more 
than a hundred simple syscalls on modern processors.  We shouldn't have 
to issue a hundred syscalls per guest clock tick.

The difference with kvm is smaller (just 3.9%), which is not easily 
explained as the time for the extra syscalls should be about the same.  
My guess is that guest behavior is different; with dynticks the guest 
does about twice as much work as with hpet.

Can you verify this by running

strace -c -p `pgrep qemu` & sleep 10; pkill strace

for all 4 cases, and posting the results?


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Gregory Haskins
On Tue, 2007-08-21 at 17:58 +1000, Rusty Russell wrote:

> Partly the horror of the code, but mainly because it is an in-order
> ring.  You'll note that we use a reply ring, so we don't need to know
> how much the other side has consumed (and it needn't do so in order).
> 

I have certainly been known to take a similar stance when looking at Xen
code ;) (recall the lapic work I did).  However, that said I am not yet
convinced that an out-of-order ring (at least as a fundamental
primitive) buys us much.  I think the use of rings for the tx-path in and
of itself is questionable unless you can implement something like the bidir
NAPI that I demonstrated in ioqnet.  Otherwise, you end up having to
hypercall on each update to the ring anyway and you might as well
hypercall directly w/o using a ring.

At a fundamental level, I think we simply need an efficient and in-order
(read: simple) ring to move data in, and a context-associated hypercall
to get out.  We can also use that simple ring to move data out if it's
advantageous to do so (read: tx NAPI can be used).  From there, we can
build more complex constructs from these primitives, like out-of-order
sg block-io.
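
A hedged sketch of such a primitive, with illustrative names only
(hypercall_notify() is an assumed kick, not an existing API):

#include <stdint.h>

#define QLEN 128

struct simple_ring {
	uint32_t prod;               /* written by the producer only */
	uint32_t cons;               /* written by the consumer only */
	uint64_t slot[QLEN];         /* in-order payload descriptors */
};

extern void hypercall_notify(int channel);   /* assumed exit-to-host kick */

static void ring_send(struct simple_ring *r, int channel, uint64_t desc)
{
	uint32_t was_idle = (r->prod == r->cons);

	r->slot[r->prod % QLEN] = desc;
	r->prod++;                   /* publish after the slot is filled
	                              * (memory barriers omitted in sketch) */

	/* Kick only when the consumer may be idle; while it is still
	 * draining the ring no hypercall is needed (the tx-NAPI idea). */
	if (was_idle)
		hypercall_notify(channel);
}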

OTOH, it's possible that it's redundant to have a simple low-level
infrastructure and then build a more complex ring for out-of-order
processing on top of it.  I'm not sure.  My gut feeling is that it will
probably result in a cleaner implementation: the higher-layered ring can
stop worrying about the interrupt/hypercall details (it would use the
simple ring as its transport) and implementations that don't need
out-of-order (e.g. networks) don't have to deal with the associated
complexity.

What are your thoughts on this layering approach?

Regards,
-Greg






Re: [kvm-devel] bug in virtio network driver?

2007-08-21 Thread Rusty Russell
On Tue, 2007-08-21 at 10:48 +0200, Christian Borntraeger wrote:
> Hello Rusty,
> 
> I think I have found a problem in the virtio network driver. virtio_net 
> reclaims sent skbs on xmit. That means that there is always one skb 
> outstanding and the netdev packet statistic is always one packet too low.

Hi Christian,

Good catch!

> One solution would be to use the xmit_done interrupt. Unfortunately this
> would require additional locking as multiple interrupts can happen on two
> or more cpus. Do you have any better ideas?

The only reason that we don't do it in skb_xmit_done() is because
kfree_skb() isn't supposed to be called from an interrupt.  But there's
dev_kfree_skb_any() which can be used.
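
A minimal sketch of that, using a hypothetical helper rather than the
actual virtio_net code:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* If sent skbs are reclaimed from the xmit-done interrupt instead of on
 * the next xmit, the free must use dev_kfree_skb_any(), which is safe in
 * interrupt context, rather than plain kfree_skb(). */
static void reclaim_sent_skb(struct sk_buff *skb)
{
	dev_kfree_skb_any(skb);   /* picks the irq-safe free path as needed */
}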

Cheers,
Rusty.





[kvm-devel] bug in virtio network driver?

2007-08-21 Thread Christian Borntraeger
Hello Rusty,

I think I have found a problem in the virtio network driver. virtio_net 
reclaims sent skbs on xmit. That means that there is always one skb 
outstanding and the netdev packet statistic is always one packet too low.

Documentation/networking/drivers.txt says

3) Do not forget that once you return 0 from your hard_start_xmit
   method, it is your driver's responsibility to free up the SKB
   and in some finite amount of time.

   For example, this means that it is not allowed for your TX
   mitigation scheme to let TX packets "hang out" in the TX
   ring unreclaimed forever if no new TX packets are sent.
   This error can deadlock sockets waiting for send buffer room
   to be freed up.

One solution would be to use the xmit_done interrupt. Unfortunately this would 
require additional locking as multiple interrupts can happen on two or more 
cpus. Do you have any better ideas?

Christian



Re: [kvm-devel] Réf. : Re: [PATCH 0/4] Virtual Machine Time Accounting

2007-08-21 Thread Christian Borntraeger
Am Montag, 20. August 2007 schrieb Glauber de Oliveira Costa:
> Although I don't know KVM to a that deep level, I think it should be
> possible to keep the virtual cpus in different process (or threads),
> and take the accounting time from there. Perfectly possible to know
> the time we spent running (user time), and the time the hypervisor
> spent doing things on our behalf (system time).

I disagree here. First, you don't want to have the virtual cpu in a
different process than the hypervisor control code for that cpu; otherwise
communication has to go via IPC.
Secondly, it's not qemu/kvm that does the accounting. It's existing userspace
code like top, snmp agents and clients, etc. that would require additional
knowledge of which thread is running guest code.

I personally like the approach Laurent has taken. Maybe it needs some polish 
and maybe we want an account_guest_time function, but in general I think he 
is doing the right thing. 

Christian



Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Rusty Russell
On Tue, 2007-08-21 at 00:33 -0700, Dor Laor wrote:
> >> > Well, for cache reasons you should really try to avoid having both
> >> > sides write to the same data.  Hence two separate cache-aligned
> >> > regions is better than one region and a flip bit.
> >>
> >> While I certainly can see what you mean about the cache implications
> >> for a bit-flip design, I don't see how you can get away with not
> >> having both sides write to the same memory in other designs either.
> >> Wouldn't you still have to adjust descriptors from one ring to the
> >> other?  E.g. wouldn't both sides be writing descriptor pointer data
> >> in this case, or am I missing something?
> >
> > Hi Gregory,
> >
> > You can have separate produced and consumed counters: see for example
> > Van Jacobson's Netchannels presentation
> > http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf page 23.
> >
> > This single consumed count isn't sufficient if you can consume
> > out-of-order: for that you really want a second "reply" ringbuffer
> > indicating what buffers are consumed.
> >
> 
> Rusty, although your code works pretty nicely (I'll send some raw patches
> later on today with kvm support for virtio), I was wondering why you
> didn't use Xen's ring implementation?  They have separate counters and
> also a union for the request/response structure in the same descriptor.

Partly the horror of the code, but mainly because it is an in-order
ring.  You'll note that we use a reply ring, so we don't need to know
how much the other side has consumed (and it needn't do so in order).

Cheers,
Rusty.





Re: [kvm-devel] Réf. : Re: [PATCH 0/4] Virtual Machine Time Accounting

2007-08-21 Thread Laurent Vivier
Glauber de Oliveira Costa wrote:
>> by doing this at kernel level, we can:
>> - measure exactly the guest time,
>> - move this part of system time to user time (as you think it should be
>> user time),
>> - have consistency between system, user and guest time,
>> - report values in /proc/state and /proc//state, at system wide level
>>
>> I'm not sure we can measure the guest time at the qemu user level.
>>
>> Perhaps Rusty can say what he thinks about this ?
>>
> Even if we cannot _now_, isn't that an easier, and safer change? (and
> I don't think we lose anything by design).

Could you explain? How should I do this?
I'm _sure_ it is not easier to do that at the qemu level.

I don't like patching the kernel (it is the last thing I do to solve a
problem: I know there is always at least one guy who won't agree with the
patch :-P ), but in this case I think it is the best way to do it.

I think the virtualization notion should be introduced at the kernel level,
at least in the kernel statistics: it is generic, and it can be used by other
virtualization tools. As I said, until now CPUs have had only two states,
reflected in the statistics as "user time" and "system time". Recently a
third state has been introduced, the virtual CPU, which, in my opinion,
should also be reflected in the CPU statistics as "guest time".

> 
> Although I don't know KVM at that deep a level, I think it should be
> possible to keep the virtual cpus in different processes (or threads),
> and take the accounting time from there. Perfectly possible to know
> the time we spent running (user time), and the time the hypervisor
> spent doing things on our behalf (system time).

But then we always have user time accounted as system time. CPU stats are
wrong if we do not modify the kernel. Can you live with wrong statistics?
(Yes, I think you can, but perhaps someone else cannot.)

Laurent
-- 
- [EMAIL PROTECTED]  --
  "Software is hard" - Donald Knuth





Re: [kvm-devel] [PATCH 00/10] PV-IO v3

2007-08-21 Thread Dor Laor
>> > Well, for cache reasons you should really try to avoid having both
>> > sides write to the same data.  Hence two separate cache-aligned
>> > regions is better than one region and a flip bit.
>>
>> While I certainly can see what you mean about the cache implications
>> for a bit-flip design, I don't see how you can get away with not
>> having both sides write to the same memory in other designs either.
>> Wouldn't you still have to adjust descriptors from one ring to the
>> other?  E.g. wouldn't both sides be writing descriptor pointer data
>> in this case, or am I missing something?
>
> Hi Gregory,
>
> You can have separate produced and consumed counters: see for example
> Van Jacobson's Netchannels presentation
> http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf page 23.
>
> This single consumed count isn't sufficient if you can consume
> out-of-order: for that you really want a second "reply" ringbuffer
> indicating what buffers are consumed.
>

Rusty, although your code works pretty nicely (I'll send some raw patches
later on today with kvm support for virtio), I was wondering why you didn't
use Xen's ring implementation?  They have separate counters and also a union
for the request/response structure in the same descriptor.
Here is some of it + lxr link:

http://81.161.245.2/lxr/http/source/xen/include/public/io/ring.h?v=xen-3.1.0-src;a=m68k

#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t)                     \
                                                                        \
/* Shared ring entry */                                                 \
union __name##_sring_entry {                                            \
    __req_t req;                                                        \
    __rsp_t rsp;                                                        \
};                                                                      \
                                                                        \
/* Shared ring page */                                                  \
struct __name##_sring {                                                 \
    RING_IDX req_prod, req_event;                                       \
    RING_IDX rsp_prod, rsp_event;                                       \
    uint8_t  pad[48];                                                   \
    union __name##_sring_entry ring[1]; /* variable-length */           \
};                                                                      \
                                                                        \
/* "Front" end's private variables */                                   \
struct __name##_front_ring {                                            \
    RING_IDX req_prod_pvt;                                              \
    RING_IDX rsp_cons;                                                  \
    unsigned int nr_ents;                                               \
    struct __name##_sring *sring;                                       \
};                                                                      \
                                                                        \
/* "Back" end's private variables */                                    \
struct __name##_back_ring {                                             \
    RING_IDX rsp_prod_pvt;                                              \
    RING_IDX req_cons;                                                  \
    unsigned int nr_ents;                                               \
    struct __name##_sring *sring;                                       \
};                                                                      \
                                                                        \
/* Syntactic sugar */                                                   \
typedef struct __name##_sring __name##_sring_t;                         \
typedef struct __name##_front_ring __name##_front_ring_t;               \
typedef struct __name##_back_ring __name##_back_ring_t
