date:20110428

Re: [PATCH 3/3] kvm tools: Make virtio-blk kvm__irq_line thread safe

2011-04-28 Thread Pekka Enberg

On Fri, Apr 29, 2011 at 9:44 AM, Ingo Molnar  wrote:
> Hm, this looks a bit strange (the mutex here protects only a kernel call - 
> that
> cannot be right) and there's no explanation why it's needed. Why do
> VIRTIO_BLK_IRQ (== KVM_IRQ_LINE ioctl()) calls have to be covered by the 
> mutex?
>
> A short blurb about expected behavior on SMP and locking rules at the top of
> virtio-blk.c would be nice.

Yes, looks strange. Asias, did you see some bad behavior that this
fixes? The per-device mutexes are there to protect device state. The
assumption here is that KVM handles KVM_IRQ_LINE ioctl() serialization
by titself.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] kvm tools: Make virtio-net kvm__irq_line thread safe

2011-04-28 Thread Pekka Enberg

On Fri, Apr 29, 2011 at 9:36 AM, Asias He  wrote:
> This patch fixes virtio-net randmom stall
>
> host $ scp guest:/root/big.guest .
> big.guest 42%  440MB  67.7KB/s - stalled -
>
> Signed-off-by: Asias He 
> ---
>  tools/kvm/virtio-net.c |    9 +++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
> index 58b3de4..efe06cb 100644
> --- a/tools/kvm/virtio-net.c
> +++ b/tools/kvm/virtio-net.c
> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void 
> *param)
>                head = virt_queue__get_iov(vq, iov, &out, &in, self);
>                len = readv(net_device.tap_fd, iov, in);
>                virt_queue__set_used_elem(vq, head, len);
> -       }
>
> -       kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +               /* We should interrupt guest right now, otherwise latency is 
> huge. */
> +               mutex_lock(&net_device.mutex);
> +               kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> +               mutex_unlock(&net_device.mutex);
> +       }
>  }
>
>  static void virtio_net_tx_callback(struct kvm *self, void *param)

Please make that IRQ latency fix a separate patch. Don't we need to do
it for TX path as well, btw?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] kvm tools: Make virtio-net kvm__irq_line thread safe

2011-04-28 Thread Ingo Molnar


* Asias He  wrote:

> This patch fixes virtio-net randmom stall
> 
> host $ scp guest:/root/big.guest .
> big.guest 42%  440MB  67.7KB/s - stalled -
> 
> Signed-off-by: Asias He 
> ---
>  tools/kvm/virtio-net.c |9 +++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
> index 58b3de4..efe06cb 100644
> --- a/tools/kvm/virtio-net.c
> +++ b/tools/kvm/virtio-net.c
> @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void 
> *param)
>   head = virt_queue__get_iov(vq, iov, &out, &in, self);
>   len = readv(net_device.tap_fd, iov, in);
>   virt_queue__set_used_elem(vq, head, len);
> - }
>  
> - kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> + /* We should interrupt guest right now, otherwise latency is 
> huge. */
> + mutex_lock(&net_device.mutex);
> + kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> + mutex_unlock(&net_device.mutex);
> + }
>  }
>  
>  static void virtio_net_tx_callback(struct kvm *self, void *param)
> @@ -98,7 +101,9 @@ static void virtio_net_tx_callback(struct kvm *self, void 
> *param)
>   virt_queue__set_used_elem(vq, head, len);
>   }
>  
> + mutex_lock(&net_device.mutex);
>   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
> + mutex_unlock(&net_device.mutex);

I do not find this explanation adequate either. This file too could use some 
comments about how the SMP behavior looks like.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] kvm tools: Make virtio-console kvm__irq_line thread safe

2011-04-28 Thread Ingo Molnar


* Asias He  wrote:

> Signed-off-by: Asias He 
> ---
>  tools/kvm/virtio-console.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/kvm/virtio-console.c b/tools/kvm/virtio-console.c
> index e66d198..492c859 100644
> --- a/tools/kvm/virtio-console.c
> +++ b/tools/kvm/virtio-console.c
> @@ -162,7 +162,9 @@ static void virtio_console_handle_callback(struct kvm 
> *self, void *param)
>   virt_queue__set_used_elem(vq, head, len);
>   }
>  
> + mutex_lock(&console_device.mutex);
>   kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1);
> + mutex_unlock(&console_device.mutex);
>  }

This looks wrong for similar reasons as for virtio-blk.c.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] kvm tools: Make virtio-blk kvm__irq_line thread safe

2011-04-28 Thread Ingo Molnar


* Asias He  wrote:

> Signed-off-by: Asias He 
> ---
>  tools/kvm/virtio-blk.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/kvm/virtio-blk.c b/tools/kvm/virtio-blk.c
> index 3feabd0..ade6335 100644
> --- a/tools/kvm/virtio-blk.c
> +++ b/tools/kvm/virtio-blk.c
> @@ -159,7 +159,9 @@ static void virtio_blk_do_io(struct kvm *kvm, void *param)
>   while (virt_queue__available(vq))
>   virtio_blk_do_io_request(kvm, vq);
>  
> + mutex_lock(&blk_device.mutex);
>   kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1);
> + mutex_unlock(&blk_device.mutex);

Hm, this looks a bit strange (the mutex here protects only a kernel call - that 
cannot be right) and there's no explanation why it's needed. Why do 
VIRTIO_BLK_IRQ (== KVM_IRQ_LINE ioctl()) calls have to be covered by the mutex?

A short blurb about expected behavior on SMP and locking rules at the top of 
virtio-blk.c would be nice.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] kvm tools: Make virtio-blk kvm__irq_line thread safe

2011-04-28 Thread Asias He

Signed-off-by: Asias He 
---
 tools/kvm/virtio-blk.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/virtio-blk.c b/tools/kvm/virtio-blk.c
index 3feabd0..ade6335 100644
--- a/tools/kvm/virtio-blk.c
+++ b/tools/kvm/virtio-blk.c
@@ -159,7 +159,9 @@ static void virtio_blk_do_io(struct kvm *kvm, void *param)
while (virt_queue__available(vq))
virtio_blk_do_io_request(kvm, vq);
 
+   mutex_lock(&blk_device.mutex);
kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1);
+   mutex_unlock(&blk_device.mutex);
 }
 
 static bool virtio_blk_pci_io_out(struct kvm *self, uint16_t port, void *data, 
int size, uint32_t count)
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] kvm tools: Make virtio-console kvm__irq_line thread safe

2011-04-28 Thread Asias He

Signed-off-by: Asias He 
---
 tools/kvm/virtio-console.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/virtio-console.c b/tools/kvm/virtio-console.c
index e66d198..492c859 100644
--- a/tools/kvm/virtio-console.c
+++ b/tools/kvm/virtio-console.c
@@ -162,7 +162,9 @@ static void virtio_console_handle_callback(struct kvm 
*self, void *param)
virt_queue__set_used_elem(vq, head, len);
}
 
+   mutex_lock(&console_device.mutex);
kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1);
+   mutex_unlock(&console_device.mutex);
 }
 
 static bool virtio_console_pci_io_out(struct kvm *self, uint16_t port, void 
*data, int size, uint32_t count)
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] kvm tools: Make virtio-net kvm__irq_line thread safe

2011-04-28 Thread Asias He

This patch fixes virtio-net randmom stall

host $ scp guest:/root/big.guest .
big.guest 42%  440MB  67.7KB/s - stalled -

Signed-off-by: Asias He 
---
 tools/kvm/virtio-net.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
index 58b3de4..efe06cb 100644
--- a/tools/kvm/virtio-net.c
+++ b/tools/kvm/virtio-net.c
@@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void 
*param)
head = virt_queue__get_iov(vq, iov, &out, &in, self);
len = readv(net_device.tap_fd, iov, in);
virt_queue__set_used_elem(vq, head, len);
-   }
 
-   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+   /* We should interrupt guest right now, otherwise latency is 
huge. */
+   mutex_lock(&net_device.mutex);
+   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+   mutex_unlock(&net_device.mutex);
+   }
 }
 
 static void virtio_net_tx_callback(struct kvm *self, void *param)
@@ -98,7 +101,9 @@ static void virtio_net_tx_callback(struct kvm *self, void 
*param)
virt_queue__set_used_elem(vq, head, len);
}
 
+   mutex_lock(&net_device.mutex);
kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+   mutex_unlock(&net_device.mutex);
 }
 
 static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long 
offset, int size, uint32_t count)
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk

2011-04-28 Thread Takuya Yoshikawa

On Fri, 29 Apr 2011 14:38:08 +0900
Takuya Yoshikawa  wrote:

> On Thu, 28 Apr 2011 19:46:00 -0700
> Andi Kleen  wrote:
> 
> > Avi Kivity  writes:
> > >
> > > Good optimization.  copy_from_user() really isn't optimized for short
> > > buffers, I expect much of the improvement comes from that.
> > 
> > Actually it is equivalent to get_user for the lenghts supported by
> > get_user, assuming you pass in a constant length. You probably do not.
> > 
> > -Andi
> 
> 
> Weird, I actually measured some changes even after dropping other parts
> than get_user() usage.
> 
> Only I can guess for that reason is the reduction of some function calls
> by inlining some functions.

A bit to clarify.

Original path:

  kvm_read_guest_page_mmu()
kvm_read_guest_page()
  copy_from_user()

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [Autotest PATCH] KVM-test: TSC drift test

2011-04-28 Thread Lucas Meneghel Rodrigues

On Thu, Apr 21, 2011 at 4:33 AM, Amos Kong  wrote:
> This case is used to test the drift between host and guest.
> Use taskset to make tsc program execute in a single cpu.
> If the drift ratio bigger than 10%, then fail this case.

The calculations of the tsc frequency looks wrong...  See comments below.

Also, when Glauber or Zach could take a look into this test it'd be great!

> Signed-off-by: Amos Kong 
> ---
>  client/tests/kvm/deps/get_tsc.c        |   27 ++
>  client/tests/kvm/tests/tsc_drift.py    |   88 
> 
>  client/tests/kvm/tests_base.cfg.sample |    5 ++
>  3 files changed, 120 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/deps/get_tsc.c
>  create mode 100644 client/tests/kvm/tests/tsc_drift.py
>
> diff --git a/client/tests/kvm/deps/get_tsc.c b/client/tests/kvm/deps/get_tsc.c
> new file mode 100644
> index 000..e91a41f
> --- /dev/null
> +++ b/client/tests/kvm/deps/get_tsc.c
> @@ -0,0 +1,27 @@
> +/*
> + * Programme to get cpu's TSC(time stamp counter)
> + * Copyright(C) 2009 Redhat, Inc.
> + * Amos Kong 
> + * Dec 9, 2009
> + *
> + */
> +
> +#define _GNU_SOURCE
> +#include 
> +#include 
> +
> +typedef unsigned long long u64;
> +
> +u64 rdtsc(void)
> +{
> +       unsigned tsc_lo, tsc_hi;
> +
> +       asm volatile("rdtsc" : "=a"(tsc_lo), "=d"(tsc_hi));
> +       return tsc_lo | (u64)tsc_hi << 32;
> +}
> +
> +int main(void)
> +{
> +       printf("%lld\n", rdtsc());
> +       return 0;
> +}
> diff --git a/client/tests/kvm/tests/tsc_drift.py 
> b/client/tests/kvm/tests/tsc_drift.py
> new file mode 100644
> index 000..de2fb76
> --- /dev/null
> +++ b/client/tests/kvm/tests/tsc_drift.py
> @@ -0,0 +1,88 @@
> +import time, os, logging, commands, re
> +from autotest_lib.client.common_lib import error
> +from autotest_lib.client.bin import local_host
> +import kvm_test_utils
> +
> +
> +def run_tsc_drift(test, params, env):
> +    """
> +    Check the TSC(time stamp counter) frequency of guest and host whether 
> match
> +    or not
> +
> +    1) Computer average tsc frequency of host's cpus by C the program
> +    2) Copy the C code to the guest, complie and run it to get tsc
> +       frequency of guest's vcpus
> +    3) Sleep sometimes and get the TSC of host and guest again
> +    4) Compute the TSC frequency of host and guest
> +    5) Compare the frequency deviation between host and guest with standard
> +
> +    @param test: Kvm test object
> +    @param params: Dictionary with the test parameters.
> +    @param env: Dictionary with test environment.
> +    """
> +    drift_threshold = float(params.get("drift_threshold"))
> +    interval = float(params.get("interval"))
> +    cpu_chk_cmd = params.get("cpu_chk_cmd")
> +    tsc_freq_path = os.path.join(test.bindir, 'deps/get_tsc.c')
> +    host_freq = 0
> +
> +    def get_tsc(machine="host", i=0):
> +        cmd = "taskset -c %s /tmp/get_tsc" % i
> +        if machine == "host":
> +            s, o = commands.getstatusoutput(cmd)
> +        else:
> +            s, o = session.cmd_status_output(cmd)
> +        if s != 0:
> +            logging.debug(o)
> +            raise error.TestError("Fail to get tsc of host, ncpu: %d" % i)
> +        return float(re.findall("(\d+)",o)[0])
> +
> +    vm = env.get_vm(params["main_vm"])
> +    vm.verify_alive()
> +    timeout = float(params.get("login_timeout", 240))
> +    session = vm.wait_for_login(timeout=timeout)
> +
> +    commands.getoutput("gcc %s -o /tmp/get_tsc" % tsc_freq_path)
> +    ncpu = local_host.LocalHost().get_num_cpu()
> +
> +    logging.info("Interval is %s" % interval)
> +    logging.info("Determine the TSC frequency in the host")
> +    for i in range(ncpu):
> +        tsc1 = get_tsc("host", i)
> +        time.sleep(interval)
> +        tsc2 = get_tsc("host", i)
> +
> +        delta = tsc2 - tsc1
> +        logging.info("Host TSC delta for cpu %s is %s" % (i, delta))
> +        if delta < 0:
> +            raise error.TestError("Host TSC for cpu %s warps %s" % (i, 
> delta))

^ Yeah, I don't think this is expected to warp, but yet, good to check.

> +        host_freq += delta / ncpu

Now, i really didn't understand the concept behind the tsc frequency.
So we have a difference between 2 timestamps taken over an arbitrary
period of time (in this case, looks 30s by default) and divide by the
number of cpus, however we will repeat this procedure by the same
amount so:

N * ( d_tsc1/N + d_tsc2/N + d_tsc3/N + ... + d_tscn/N)

This could be simplified to

(d_tsc1 + d_tsc2 + d_tsc3 + + d_tscn)

Unless I'm missing something... so

host_freq = sum(d_tsci)

The calculation could be simplified then... And by definition, isn't
frequency how many times a phenomena occurs (in this case, change of
timestamp counter) per time? So, don't we have to divide this sum by
the time the program slept? Also, it looks like the whole logic to get
the frequencies can be factored to a single function and just call
that function for guest and host.

Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk

2011-04-28 Thread Takuya Yoshikawa

On Thu, 28 Apr 2011 19:46:00 -0700
Andi Kleen  wrote:

> Avi Kivity  writes:
> >
> > Good optimization.  copy_from_user() really isn't optimized for short
> > buffers, I expect much of the improvement comes from that.
> 
> Actually it is equivalent to get_user for the lenghts supported by
> get_user, assuming you pass in a constant length. You probably do not.
> 
> -Andi


Weird, I actually measured some changes even after dropping other parts
than get_user() usage.

Only I can guess for that reason is the reduction of some function calls
by inlining some functions.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [Autotest PATCH] KVM-test: Simple stop/continue test

2011-04-28 Thread Lucas Meneghel Rodrigues

On Thu, Apr 21, 2011 at 3:21 AM, Amos Kong  wrote:
> Change guest state by monitor cmd, verify guest status,
> and try to login guest by network.

I don't like the way you're handling human monitor and QMP monitors in
this tests...  comments below:

> Signed-off-by: Jason Wang 
> Signed-off-by: Amos Kong 
> ---
>  client/tests/kvm/tests/stop_continue.py |   52 
> +++
>  client/tests/kvm/tests_base.cfg.sample  |    4 ++
>  2 files changed, 56 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/stop_continue.py
>
> diff --git a/client/tests/kvm/tests/stop_continue.py 
> b/client/tests/kvm/tests/stop_continue.py
> new file mode 100644
> index 000..c7d8025
> --- /dev/null
> +++ b/client/tests/kvm/tests/stop_continue.py
> @@ -0,0 +1,52 @@
> +import logging
> +from autotest_lib.client.common_lib import error
> +
> +
> +def run_stop_continue(test, params, env):
> +    """
> +    Suspend a running Virtual Machine and verify its state.
> +
> +    1) Boot the vm
> +    2) Suspend the vm through stop command
> +    3) Verify the state through info status command
> +    4) Check is the ssh session to guest is still responsive,
> +       if succeed, fail the test.
> +
> +    @param test: Kvm test object
> +    @param params: Dictionary with the test parameters
> +    @param env: Dictionary with test environment.
> +    """
> +    vm = env.get_vm(params["main_vm"])
> +    vm.verify_alive()
> +    timeout = float(params.get("login_timeout", 240))
> +    session = vm.wait_for_login(timeout=timeout)
> +
> +    try:
> +        logging.info("Suspend the virtual machine")
> +        vm.monitor.cmd("stop")
> +
> +        logging.info("Verifying the status of virtual machine through 
> monitor")
> +        o = vm.monitor.info("status")
> +        if 'paused' not in o and ( "u'running': False" not in str(o)):

^ Here, it's not clear what means a paused qmp monitor or a hmp
monitor. this statement is unnecessarily confusing. Here

'paused' not in o -> Is what would define a 'not paused hmp monitor'

"u'running': False" not in str(o) -> This defines a 'not paused qmp monitor'

why we are checking one _and_ the other, as one monitor can't be hmp
and qmp at the same time? It would be at least _or_. And like I said,
it's non trivial to flow this assertion made.

It seems to me that a better (although involving more code changes) approach is:

1) Introduce VM methods is_paused and verify_paused, which would
internally for the kvm vm class, call monitor methods also called
is_paused and verify_paused, with implementations for both hmp and
qmp. verify_paused would throw an exception in case of a failure,
while is_paused would return a boolean.

> +            logging.error(o)
> +            raise error.TestFail("Fail to suspend through monitor command 
> line")
> +
> +        logging.info("Check the session responsiveness")
> +        if session.is_responsive():
> +            raise error.TestFail("Session is still responsive after stop")
> +
> +        logging.info("Try to resume the guest")
> +        vm.monitor.cmd("cont")
> +
> +        o = vm.monitor.info("status")
> +        m_type = params.get("monitor_type", "human")
> +        if ('human' in m_type and 'running' not in o) or\
> +           ('qmp' in m_type and "u'running': True" not in str(o)):

^ same here, we should have is_running and verify_running methods on
VM that would call monitor methods with the same names.

Now, of course I might be really mistaken here, would like to hear
your opinion on that subject.

-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [Autotest PATCH] KVM-test: Check if guest bootable after reseting several times

2011-04-28 Thread Lucas Meneghel Rodrigues

On Thu, Apr 21, 2011 at 3:47 AM, Amos Kong  wrote:
> This test comes from a regression bug:
> Guest can not found bootable device after reseting several times by
> monitor command.

Can you point out the bug number? I really don't expect that we keep
integrity of the disk after several resets, at least I wouldn't if it
was a bare metal machine... Anyway, I've made some changes to the
code, mostly removing unnecessary imports and having some messages
explaining what the test is doing... I am just not convinced that this
test should pass in all circumstances (I tried here and it does pass,
by the way).

> Signed-off-by: Amos Kong 
> ---
>  client/tests/kvm/tests/system_reset_bootable.py |   29 
> +++
>  client/tests/kvm/tests_base.cfg.sample          |    7 ++
>  2 files changed, 36 insertions(+), 0 deletions(-)
>  create mode 100755 client/tests/kvm/tests/system_reset_bootable.py
>
> diff --git a/client/tests/kvm/tests/system_reset_bootable.py 
> b/client/tests/kvm/tests/system_reset_bootable.py
> new file mode 100755
> index 000..ca9fb70
> --- /dev/null
> +++ b/client/tests/kvm/tests/system_reset_bootable.py
> @@ -0,0 +1,29 @@
> +import logging, time
> +from autotest_lib.client.common_lib import error
> +import kvm_test_utils
> +
> +
> +def run_system_reset_bootable(test, params, env):
> +    """
> +    KVM reset test:
> +    1) Boot guest.
> +    2) Send some times system_reset monitor command.
> +    3) Log into the guest to verify it could normally boot.
> +
> +    @param test: kvm test object
> +    @param params: Dictionary with the test parameters
> +    @param env: Dictionary with test environment.
> +    """
> +    vm = env.get_vm(params["main_vm"])
> +    vm.verify_alive()
> +    timeout = float(params.get("login_timeout", 240))
> +    reset_times = int(params.get("reset_times",20))
> +    interval = int(params.get("reset_interval",10))
> +    wait_time = int(params.get("wait_time_for_reset",60))
> +    time.sleep(wait_time)
> +
> +    for i in range(reset_times):
> +        vm.monitor.cmd("system_reset")
> +        time.sleep(interval)
> +
> +    session = vm.wait_for_login(timeout=timeout)
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index 7333ed0..ceafebe 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -961,6 +961,13 @@ variants:
>         sleep_before_reset = 20
>         kill_vm_on_error = yes
>
> +    - system_reset_bootable:
> +        type = system_reset_bootable
> +        interval = 1
> +        reset_times = 20
> +        wait_time_for_reset = 120
> +        kill_vm_on_error = yes
> +
>     - shutdown:     install setup unattended_install.cdrom
>         type = shutdown
>         shutdown_method = shell
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM test: Fix problem when unexpected strings appear on SSH prompt

2011-04-28 Thread Lucas Meneghel Rodrigues

In some conditions, ssh may respond things like:

Warning: Permanently added localhost' (RSA) to the list of known hosts.

Take that into account on the remote login function.

Signed-off-by: Jason D. Gaston 
---
 client/virt/virt_utils.py |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py
index 8fa64ca..398ecd4 100644
--- a/client/virt/virt_utils.py
+++ b/client/virt/virt_utils.py
@@ -557,7 +557,7 @@ def _remote_login(session, username, password, prompt, 
timeout=10):
 match, text = session.read_until_last_line_matches(
 [r"[Aa]re you sure", r"[Pp]assword:\s*$", r"[Ll]ogin:\s*$",
  r"[Cc]onnection.*closed", r"[Cc]onnection.*refused",
- r"[Pp]lease wait", prompt],
+ r"[Pp]lease wait", r"[Ww]arning", prompt],
 timeout=timeout, internal_timeout=0.5)
 if match == 0:  # "Are you sure you want to continue connecting"
 logging.debug("Got 'Are you sure...'; sending 'yes'")
@@ -592,7 +592,10 @@ def _remote_login(session, username, password, prompt, 
timeout=10):
 logging.debug("Got 'Please wait'")
 timeout = 30
 continue
-elif match == 6:  # prompt
+elif match == 6:  # "Warning added RSA"
+logging.debug("Got 'Warning added RSA to known host list")
+continue
+elif match == 7:  # prompt
 logging.debug("Got shell prompt -- logged in")
 break
 except aexpect.ExpectTimeoutError, e:
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/4] virt.virt_utils: Get rid of create_report function

2011-04-28 Thread Lucas Meneghel Rodrigues

As it has been transfered to the html_report module itself.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/virt/virt_utils.py |   12 
 1 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py
index 97a652d..8fa64ca 100644
--- a/client/virt/virt_utils.py
+++ b/client/virt/virt_utils.py
@@ -1187,18 +1187,6 @@ def run_tests(parser, job):
 return not failed
 
 
-def create_report(report_dir, results_dir):
-"""
-Creates a neatly arranged HTML results report in the results dir.
-
-@param report_dir: Directory where the report script is located.
-@param results_dir: Directory where the results will be output.
-"""
-reporter = os.path.join(report_dir, 'html_report.py')
-html_file = os.path.join(results_dir, 'results.html')
-os.system('%s -r %s -f %s -R' % (reporter, results_dir, html_file))
-
-
 def display_attributes(instance):
 """
 Inspects a given class instance attributes and displays them, convenient
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/4] KVM test: control: Get rid of report generation code

2011-04-28 Thread Lucas Meneghel Rodrigues

As this is now handled by autotest and control file
writers don't have to worry about it anymore.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/control |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/control b/client/tests/kvm/control
index 6437d88..c887a3e 100644
--- a/client/tests/kvm/control
+++ b/client/tests/kvm/control
@@ -67,6 +67,3 @@ if args:
 parser.parse_string(str)
 
 virt_utils.run_tests(parser, job)
-
-# Generate a nice HTML report inside the job's results dir
-virt_utils.create_report(kvm_test_dir, job.resultdir)
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/4] job: Write a job HTML report for every autotest client job

2011-04-28 Thread Lucas Meneghel Rodrigues

We have a tool that can generate such a file and it makes it
easier for people who don't have access to the autotest web
interface to analyze job results. With this, all client jobs
write such a file, so test writers don't have to worry about
it.

This change does not regress the job unittests.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/bin/job.py |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/client/bin/job.py b/client/bin/job.py
index 9effcdb..a792309 100644
--- a/client/bin/job.py
+++ b/client/bin/job.py
@@ -17,6 +17,7 @@ from autotest_lib.client.common_lib import base_job
 from autotest_lib.client.common_lib import error, barrier, log, logging_manager
 from autotest_lib.client.common_lib import base_packages, packages
 from autotest_lib.client.common_lib import global_config
+from autotest_lib.client.tools import html_report
 
 
 LAST_BOOT_TAG = object()
@@ -950,6 +951,9 @@ class base_client_job(base_job.base_job):
 self._tap.write()
 self._tap._write_tap_archive()
 
+# write out a job HTML report
+html_report.create_report(self.resultdir)
+
 # We are about to exit 'complete' so clean up the control file.
 dest = os.path.join(self.resultdir, os.path.basename(self._state_file))
 shutil.move(self._state_file, dest)
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/4] tools/html_report: Make html report generation autotest generic

2011-04-28 Thread Lucas Meneghel Rodrigues

The html report tool was a bit too tied to KVM autotest, and that
was not necessary. Turn it into a generic module that can be used
by any autotest job, encapsulate the main worker function into
autotest API so we don't need to exec it in a subshell, fix naming
of the report and CSS details.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tools/html_report.py |  158 ---
 1 files changed, 104 insertions(+), 54 deletions(-)

diff --git a/client/tools/html_report.py b/client/tools/html_report.py
index 8b4b109..7b17a75 100755
--- a/client/tools/html_report.py
+++ b/client/tools/html_report.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python
 """
-Script used to parse the test results and generate an HTML report.
+Module used to parse the autotest job results and generate an HTML report.
 
 @copyright: (c)2005-2007 Matt Kruse (javascripttoolbox.com)
 @copyright: Red Hat 2008-2009
@@ -27,7 +27,6 @@ body {
 text-decoration:none;
 font:bold 2em/2em Arial, Helvetica, sans-serif;
 text-transform:none;
-text-shadow: 2px 2px 2px #555;
 text-align: left;
 color:#55;
 border-bottom: 1px solid #55;
@@ -37,7 +36,6 @@ body {
 text-decoration:none;
 font:bold 16px Arial, Helvetica, sans-serif;
 text-transform:uppercase;
-text-shadow: 2px 2px 2px #555;
 text-align: left;
 color:#55;
 margin-bottom:0;
@@ -1374,21 +1372,27 @@ function processList(ul) {
 }
 """
 
-
-#
-##  This script gets kvm autotest results directory path as an ##
-##  input and create a single html formatted result page.  ##
-#
-
 stimelist = []
 
 
 def make_html_file(metadata, results, tag, host, output_file_name, dirname):
+"""
+Create HTML file contents for the job report, to stdout or filesystem.
+
+@param metadata: Dictionary with Job metadata (tests, exec time, etc).
+@param results: List with testcase results.
+@param tag: Job tag.
+@param host: Client hostname.
+@param output_file_name: Output file name. If empty string, prints to
+stdout.
+@param dirname: Prefix for HTML links. If empty string, the HTML links
+will be relative to the results dir.
+"""
 html_prefix = """
 http://www.w3.org/TR/html4/strict.dtd";>
 
 
-KVM Autotest Results
+Autotest job execution results
 
 %s
 
@@ -1407,14 +1411,13 @@ return true;
 
 """ % (format_css, table_js, maketree_js)
 
-
 if output_file_name:
 output = open(output_file_name, "w")
 else:   #if no output file defined, print html file to console
 output = sys.stdout
 # create html page
 print >> output, html_prefix
-print >> output, 'KVM Autotest Execution Report'
+print >> output, 'Autotest job execution report'
 
 # formating date and time to print
 t = datetime.datetime.now()
@@ -1438,7 +1441,7 @@ return true;
 stat_str = ('From %d tests executed, %d have passed (%d%% failures)' %
 (total_executed, total_passed, failed_perct))
 
-kvm_ver_str = metadata['kvmver']
+kvm_ver_str = metadata.get('kvmver', None)
 
 print >> output, ''
 print >> output, 'HOST:%s' % host
@@ -1446,10 +1449,10 @@ return true;
 print >> output, 'DATE:%s' % 
now.ctime()
 print >> output, 'STATS:%s'% stat_str
 print >> output, ''
-print >> output, 'KVM VERSION:%s' % 
kvm_ver_str
+if kvm_ver_str is not None:
+print >> output, 'KVM VERSION:%s' 
% kvm_ver_str
 print >> output, ''
 
-
 ## print test results
 print >> output, ''
 print >> output, 'Test Results'
@@ -1529,6 +1532,12 @@ id="t1" class="stats table-autosort:4 table-autofilter 
table-stripeclass:alterna
 
 
 def parse_result(dirname, line):
+"""
+Parse job status log line.
+
+@param dirname: Job results dir
+@param line: Status log line.
+"""
 parts = line.split()
 if len(parts) < 4:
 return None
@@ -1556,7 +1565,7 @@ def parse_result(dirname, line):
 # assign actual values
 rx = re.compile('^(\w+)\.(.*)$')
 m1 = rx.findall(parts[3])
-result['testcase'] = m1[0][1]
+result['testcase'] = str(tag)
 result['title'] = str(tag)
 result['status'] = parts[1]
 if result['status'] != 'GOOD':
@@ -1572,10 +1581,16 @@ def parse_result(dirname, line):
 
 
 def get_exec_log(resdir, tag):
-stdout_file = os.path.join(resdir, tag) + '/debug/stdout'
-stderr_file = os.path.join(resdir, tag) + '/debug/stderr'
-status_file = os.path.join(resdir, tag) + '/status'
-dmesg_file = os.path.join(resdir, tag) + '/sysinfo/dmesg'
+"""
+Get job execution summary.
+
+@param resdir: Job results dir.
+@param tag: Job tag.
+"""
+stdout_file = os.path.join(resdir, tag, 'debug', 'stdout')
+stderr_file = os.path.join(resdir, tag, 'de

[PATCH 0/4] Fix HTML report generation v2

2011-04-28 Thread Lucas Meneghel Rodrigues

Made HTML report generation generic and useable by any autotest
client job, fixed up some bugs in HTML report code. Now it'll
be easier to browse and visualize results without the autotest web
interface.

Lucas Meneghel Rodrigues (4):
  tools/html_report: Make html report generation autotest generic
  job: Write a job HTML report for every autotest client job
  virt.virt_utils: Get rid of create_report function
  KVM test: control: Get rid of report generation code

 client/bin/job.py   |4 +
 client/tests/kvm/control|3 -
 client/tools/html_report.py |  158 ---
 client/virt/virt_utils.py   |   12 ---
 4 files changed, 108 insertions(+), 69 deletions(-)

-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Zachary Amsden


On 04/28/2011 01:20 PM, Joerg Roedel wrote:

On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote:
   

On 04/28/2011 12:13 AM, Roedel, Joerg wrote:
 
   

I see it different. This code wants to check if the _guest_ tsc moves
forwared (or at least not backwards). So it is fully legitimate to just
do this by reading the guest-tsc and compare it to the last one the
guest had.
   

That wasn't the intention when I wrote that code.  It's simply there to
detect backwards motion of the host TSC.  The guest TSC can legally go
backwards whenever the guest decides to change it, so checking the guest
TSC doesn't make sense here.
 

This code checks how many guest tsc cycles have passed since this vCPU
was de-scheduled last time (and before it is running again). So since
the vCPU hasn't run in the meantime it had no chance to change its TSC.
Further, the other parameters like the TSC offset and the scaling
multiplier havn't changed too, so the only variable in the guest-tsc
calculation is the host-tsc.
So this calculation using the guest-tsc can detect backwards going
host-tsc as good as the old one. The benefit here is that we can feed
consistent values into adjust_tsc_offset().
   


While true, this is more complex than the original code.  The original 
code here doesn't try to actually compensate for the guest TSC 
difference - instead what it does is NULL any discovered host TSC delta:


if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
}
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

Erasing that delta also erases elapsed time since the VCPU has last been 
run, which isn't desirable, so it then sets tsc_catchup mode, which will 
restore the proper TSC.  The request here triggers code which later 
updates the TSC offset again.


To avoid complexity, I think it's simplest to do the first computation 
in terms of the host TSC.



Yes, with tsc-scaling, the machines already have stable TSCs - the above
test is for older hardware which could have problems, and can be
reverted back to the original code without worrying about switching
units.
 

This is the case pratically. But architecturally the tsc-scaling feature
does not depend on a constant tsc, so we can make no such assumtions.
Additionally, it may happen that Linux mis-detects an unstable tsc for
some reason (broken BIOS, bug in the code, ...).  Therefore I think it
is dangerous to assume that this code will never run on tsc-scaling
capable hosts. And if it does and we don't manage the tsc-offset units
right, we may see very weird behavior.
   


I agree, it is best to handle this case - hardware can and will change - 
but the TSC adjustment in terms of guest rate should be done under the 
atomic protection right before entering hardware virtualized mode - here:


I left compute_guest_tsc in place to recompute time in guest units here, 
even if the underlying hardware rate changes.


/*
 * We may have to catch up the TSC to match elapsed wall clock
 * time for two reasons, even if kvmclock is used.
 *   1) CPU could have been running below the maximum TSC rate
 *   2) Broken TSC compensation resets the base at each VCPU
 *  entry to avoid unknown leaps of TSC even when running
 *  again on the same CPU.  This may cause apparent elapsed
 *  time to disappear, and the guest to stand still or run
 *  very slowly.
 */
if (vcpu->tsc_catchup) {
u64 tsc = compute_guest_tsc(v, kernel_ns);
if (tsc > tsc_timestamp) {
kvm_x86_ops->adjust_tsc_offset(v, tsc - 
tsc_timestamp);

tsc_timestamp = tsc;
}
}

So yeah, the code is getting pretty complex but we'd like to avoid that 
as much as possible - so I would prefer to have the hardware backwards 
compensation separate from the guest rate computation by doing this:


step 1) remove any backwards hardware TSC delta (in hardware units)
step 2) recompute guest TSC from a stable clock (gotten from kernel_ns) 
and apply adjustment (in guest units)


So it appears you can just share most of the logic of guest TSC catchup 
mode.


What do you think?

Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk

2011-04-28 Thread Andi Kleen

Avi Kivity  writes:
>
> Good optimization.  copy_from_user() really isn't optimized for short
> buffers, I expect much of the improvement comes from that.

Actually it is equivalent to get_user for the lenghts supported by
get_user, assuming you pass in a constant length. You probably do not.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address

2011-04-28 Thread Huang Ying

On 04/28/2011 10:04 PM, Marcelo Tosatti wrote:
> On Thu, Apr 28, 2011 at 08:00:19AM -0500, Anthony Liguori wrote:
>> On 04/27/2011 06:06 PM, Marcelo Tosatti wrote:
>>> On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote:
 On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote:
> Author: Max Asbock
>
> Add command x-gpa2hva to translate guest physical address to host
> virtual address. Because gpa to hva translation is not consistent, so
> this command is only used for debugging.
>
> The x-gpa2hva command provides one step in a chain of translations from
> guest virtual to guest physical to host virtual to host physical. Host
> physical is then used to inject a machine check error. As a
> consequence the HWPOISON code on the host and the MCE injection code
> in qemu-kvm are exercised.
>
> v3:
>
> - Rename to x-gpa2hva
> - Remove QMP support, because gpa2hva is not consistent

 Is this patch an acceptable solution for now? This command is useful for
 our testing.
>>>
>>> Anthony?
>>
>> Yeah, but it should come through qemu-devel, no?
> 
> Yes, Huang Ying, can you please resend?

Via QEMU git or uq/master branch of KVM git?

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Guest virtual CPUs limited to only 50% of host CPU

2011-04-28 Thread Drew Johnson

Thanks for the reply. Turns out it was htop reporting it wrong. The
utime in /proc/stat already includes guest time.

On Tue, Apr 5, 2011 at 2:03 AM, Avi Kivity  wrote:
> On 03/31/2011 11:03 PM, Drew Johnson wrote:
>>
>> Hi,
>>
>> I am using Qemu-KVM-0.12.5
>
> This is somewhat old.  Try upgrading.
>
>>  on Intel Xeon (Vt-x enabled) processors and
>> monitoring the system using htop on the host. On the processors that
>> are running Qemu-KVM I am seeing a 50/50 split between userspace and
>> guest ("gu:" in htop). I have pinned the vCPU qemu-kvm threads to
>> specific host CPUs using taskset. In the guest the CPUs are nearly
>> 100% userspace in htop.
>>
>> Does anyone have ideas on why this is? Is there a way I can get much
>> higher utilization for the guest virtual CPUs wrt the host?
>>
>
> Please build qemu with --disable-cpu-strip and run 'perf top' to see what's
> going on.
>
> --
> error compiling committee.c: too many arguments to function
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)

2011-04-28 Thread Jan Kiszka

On 2011-04-28 20:51, Blue Swirl wrote:
> On Thu, Apr 28, 2011 at 5:24 PM, Ulrich Obergfell  wrote:
>> 'target_get_irq_delivered' and 'target_reset_irq_delivered' contain
>> entry addresses of functions that are utilized by update_irq() to
>> detect coalesced interrupts. apic code loads these pointers during
>> initialization.
> 
> I'm not so happy with this approach, but probably then the i386
> dependencies can be removed from RTC and it can be compiled only once
> for all targets.

This whole series is really the minimalistic approach. The callbacks
defined here must remain a temporary "shortcut". Just like proper
abstraction of periodic tick compensation for reuse in other timers has
to be added later on. And the limitation to edge-triggered legacy HPET
INTs has to be removed.

Jan

signature.asc
Description: OpenPGP digital signature

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Jan Kiszka

On 2011-04-28 21:06, Zachary Amsden wrote:
> On 04/28/2011 12:22 AM, Roedel, Joerg wrote:
>> On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote:
>>   
>>> And /me still wonders (like I did when this first popped up) if the
>>> proper place of determining TSC stability really have to be KVM.
>>>
>>> If the Linux core fails to detect some instability and KVM has to jump
>>> in, shouldn't we better improve the core's detection abilities and make
>>> use of them in KVM? Conceptually this looks like we are currently just
>>> working around a core deficit in KVM.
>>>  
>> Yes, good question. Has this ever triggered on a real machine (not
>> counting the suspend/resume issue in)?
>>
> 
> Yes... some platforms don't go haywire until you start using power
> mangement, TSC is stable before that, but not afterwards, and depending
> on the version of the kernel, KVM might detect this before the kernel does.
> 
> Honestly, the code is obsolete, but still useful for those who build KVM
> as an external module on older kernels using the kvm-kmod system.

I'll happily accept patches that migrate any logic to kvm-kmod that the
current kernel does not need it anymore.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] KVM call minutes for Apr 26

2011-04-28 Thread Lucas Meneghel Rodrigues

On Tue, 2011-04-26 at 16:29 -0500, Anthony Liguori wrote:
> On 04/26/2011 11:47 AM, Lucas Meneghel Rodrigues wrote:
> > On Tue, 2011-04-26 at 17:58 +0300, Avi Kivity wrote:
> >> On 04/26/2011 05:41 PM, Chris Wright wrote:
> >>> - having basic common config could be useful
> 
> Hi Lucas,
> 
> Could you send your suggested config as a patch to qemu.git?  Even 
> better if it was automatically invoked via a make autotest target 
> although if you supply the config, I'll happily patch the Makefile.
> 
> Regards,
> 
> Anthony Liguori

So Anthony, after hearing quite a lot of feedback from our colleagues,
I'll start working on a 'make autotest' target for qemu and qemu-kvm.

A lot of compromises will have to be made here, I was thinking on a very
minimum setup where folks can get unittests, followed by booting,
rebooting, and single host migration, using a minimal fedora guest
image, that 'make autotest' would download.

Meanwhile, here is Avi's config file translated to a more current
autotest state. I don't think we should rush into putting this into
qemu's repo, as I am aiming for having this more transparent to the
user. Just publishing here FYI.

include tests_base.cfg
include cdkeys.cfg

image_name(_.*)? ?<= /tmp/kvm_autotest_root/images/
cdrom(_.*)? ?<= /tmp/kvm_autotest_root/
floppy ?<= /tmp/kvm_autotest_root/
Linux..unattended_install:
kernel ?<= /tmp/kvm_autotest_root/
initrd ?<= /tmp/kvm_autotest_root/

variants:
# The variant names are the testset names
- @regression:
# We want qemu-kvm for this run
qemu_binary = /usr/bin/qemu-kvm
qemu_img_binary = /usr/bin/qemu-img
only qcow2
only rtl8139
only ide
only smp2
only no_pci_assignable
only smallpages
only Fedora.14 Win7
only unattended_install.cdrom, boot, reboot, migrate, shutdown

abort_on_error = yes
kill_vm.* ?= no
kill_unresponsive_vms.* ?= no

# Put the testset you want here
only regression

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Emmanuel Noobadmin

On 4/28/11, Gleb Natapov  wrote:
> of virt stack. You should use libvirt or virt-manager instead. Especially
> if you are concerned about security. I think libvirt can start guest on
> headless server.
>
> If this still fails for you you need to complain to libvirt developers
> (not in a rant mode, but providing details of what exact version of
> software you have problem with and what are you trying to do). And
> libvirt developers will not be shy to complain to qemu developers if the
> problem turned to be on their side.

I've finally got an installation working, not using virt-install or
virt-manager. After reading through the libvirt site, I started
writing the domain definition manually.

Through trial and error, comparison with what virt-install generated
and the online examples, I got a working xml. Just for the record,
virsh --version reports 0.8.1

For the benefits of other newbies, my discovery so far is that

1. No-activity after the guest VM started
Originally, when I specified the CentOS DVD ISO, the guest will load
and then do nothing but chew up 100% CPU cycle on the allocated 1 vcpu
for quite some time. Subsequently, it appeared that mounting the ISO
as loop back is the solution. This seemed to imply that libvirt or KVM
couldn't boot a guest from ISO... which didn't quite make sense.

I ran into the issue when using my manually generated XML, it turned
out that the reason was the permissions (644) on vmlinuz and
initrd.img on the DVD. By copying the two files to local disk,
changing the permissions and using the  and  options,
I was able to boot the guest.

I was curious how virt-install got around this and learnt that I could
dump the config from a running machine. It turns out that virt-install
didn't exactly use the .xml it created, it added stuff to the running
version. Importantly making a temporary copy of initrd.img and
vmlinuz. I think the ISO problem with virt-install may be that it was
unable to mount the ISO to copy these files despite me running it as
root.

2. Anaconda couldn't see the DVD
Which was my rant earlier, since it sounded really stupid that the
installer couldn't see the disc it just booted off. Now, with #1
solved, it seems that anaconda wasn't booting off the disc after all.

However, the interesting thing here is that once I got past #1, My
guest could install from the DVD.

After comparing the xml files, it seems the problem is virt-install
did not save the path to the ISO/mounted DVD. Under the 
element, there wasn't a source. With my manually generated xml,
specifying the ISO as the source worked.

But the virt-installed anaconda was complaining I don't have any hard
disks or cdroms. Not that there was no disk in the drive. Everytime I
picked an option like install media in HDD or CDROM, it prompted me no
device, do I want to install a driver. Since the hard disk definition
appears to be the same, I'm not sure why that happened with
virt-install's xml but not mine.

So right now, I managed to get the OS installed, rebooting it required
removing the initrd and kernel entry as well as the source so that it
would boot from the image disk.

Only problem is... networking still isn't working although brctl show
on the host shows that a vnet0 had been created and attached to the
bridge. Any pointers would be appreciated!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm tools: Emulate RTC to fix system time in guests

2011-04-28 Thread Ingo Molnar


* Pekka Enberg  wrote:

> This patch fixes system time in guests by implementing proper CMOS RTC clock
> support.
> 
>   # Before:
> 
>   sh-2.05b# date
>   Fri Aug  7 04:02:01 UTC 2009
> 
>   # After:
> 
>   sh-2.05b# date
>   Thu Apr 28 19:12:21 UTC 2011

Very nice!

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Joerg Roedel

On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote:
> On 04/28/2011 12:13 AM, Roedel, Joerg wrote:

>> I see it different. This code wants to check if the _guest_ tsc moves
>> forwared (or at least not backwards). So it is fully legitimate to just
>> do this by reading the guest-tsc and compare it to the last one the
>> guest had.
>
> That wasn't the intention when I wrote that code.  It's simply there to  
> detect backwards motion of the host TSC.  The guest TSC can legally go  
> backwards whenever the guest decides to change it, so checking the guest  
> TSC doesn't make sense here.

This code checks how many guest tsc cycles have passed since this vCPU
was de-scheduled last time (and before it is running again). So since
the vCPU hasn't run in the meantime it had no chance to change its TSC.
Further, the other parameters like the TSC offset and the scaling
multiplier havn't changed too, so the only variable in the guest-tsc
calculation is the host-tsc.
So this calculation using the guest-tsc can detect backwards going
host-tsc as good as the old one. The benefit here is that we can feed
consistent values into adjust_tsc_offset().

> Yes, with tsc-scaling, the machines already have stable TSCs - the above  
> test is for older hardware which could have problems, and can be  
> reverted back to the original code without worrying about switching 
> units.

This is the case pratically. But architecturally the tsc-scaling feature
does not depend on a constant tsc, so we can make no such assumtions.
Additionally, it may happen that Linux mis-detects an unstable tsc for
some reason (broken BIOS, bug in the code, ...).  Therefore I think it
is dangerous to assume that this code will never run on tsc-scaling
capable hosts. And if it does and we don't manage the tsc-offset units
right, we may see very weird behavior.

Regards,

Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm tools: Emulate RTC to fix system time in guests

2011-04-28 Thread Pekka Enberg

This patch fixes system time in guests by implementing proper CMOS RTC clock
support.

  # Before:

  sh-2.05b# date
  Fri Aug  7 04:02:01 UTC 2009

  # After:

  sh-2.05b# date
  Thu Apr 28 19:12:21 UTC 2011

Cc: Asias He 
Cc: Cyrill Gorcunov 
Cc: Ingo Molnar 
Cc: Prasad Joshi 
Cc: Sasha Levin 
Signed-off-by: Pekka Enberg 
---
 tools/kvm/Makefile  |3 +-
 tools/kvm/include/kvm/rtc.h |6 +++
 tools/kvm/ioport.c  |   26 -
 tools/kvm/kvm-run.c |3 +
 tools/kvm/rtc.c |   87 +++
 5 files changed, 98 insertions(+), 27 deletions(-)
 create mode 100644 tools/kvm/include/kvm/rtc.h
 create mode 100644 tools/kvm/rtc.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index fbce14d..659bc35 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -26,8 +26,9 @@ OBJS  += kvm-cpu.o
 OBJS   += main.o
 OBJS   += mmio.o
 OBJS   += pci.o
-OBJS   += util.o
+OBJS   += rtc.o
 OBJS   += term.o
+OBJS   += util.o
 OBJS   += virtio.o
 OBJS+= util/parse-options.o
 OBJS+= util/strbuf.o
diff --git a/tools/kvm/include/kvm/rtc.h b/tools/kvm/include/kvm/rtc.h
new file mode 100644
index 000..0b8d9f9
--- /dev/null
+++ b/tools/kvm/include/kvm/rtc.h
@@ -0,0 +1,6 @@
+#ifndef KVM__RTC_H
+#define KVM__RTC_H
+
+void rtc__init(void);
+
+#endif /* KVM__RTC_H */
diff --git a/tools/kvm/ioport.c b/tools/kvm/ioport.c
index e3f67fc..a38d2d1 100644
--- a/tools/kvm/ioport.c
+++ b/tools/kvm/ioport.c
@@ -12,28 +12,6 @@
 
 bool ioport_debug;
 
-static uint8_t ioport_to_uint8(void *data)
-{
-   uint8_t *p = data;
-
-   return *p;
-}
-
-static bool cmos_ram_rtc_io_out(struct kvm *self, uint16_t port, void *data, 
int size, uint32_t count)
-{
-   uint8_t value;
-
-   value   = ioport_to_uint8(data);
-
-   self->nmi_disabled  = value & (1UL << 7);
-
-   return true;
-}
-
-static struct ioport_operations cmos_ram_rtc_ops = {
-   .io_out = cmos_ram_rtc_io_out,
-};
-
 static bool debug_io_out(struct kvm *self, uint16_t port, void *data, int 
size, uint32_t count)
 {
exit(EXIT_SUCCESS);
@@ -128,10 +106,6 @@ void ioport__setup_legacy(void)
ioport__register(0x0060, &dummy_read_write_ioport_ops, 2);
ioport__register(0x0064, &dummy_read_write_ioport_ops, 1);
 
-   /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
-   ioport__register(0x0070, &cmos_ram_rtc_ops, 1);
-   ioport__register(0x0071, &dummy_read_write_ioport_ops, 1);
-
/* 0x00A0 - 0x00AF - 8259A PIC 2 */
ioport__register(0x00A0, &dummy_read_write_ioport_ops, 2);
 
diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
index b21b4a2..64f3409 100644
--- a/tools/kvm/kvm-run.c
+++ b/tools/kvm/kvm-run.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -416,6 +417,8 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
ioport__setup_legacy();
 
+   rtc__init();
+
kvm__setup_bios(kvm);
 
serial8250__init(kvm);
diff --git a/tools/kvm/rtc.c b/tools/kvm/rtc.c
new file mode 100644
index 000..b9c09b9
--- /dev/null
+++ b/tools/kvm/rtc.c
@@ -0,0 +1,87 @@
+#include "kvm/rtc.h"
+
+#include "kvm/ioport.h"
+#include "kvm/kvm.h"
+
+#include 
+
+static uint8_t cmos_index;
+
+#define CMOS_RTC_SECONDS   0x00
+#define CMOS_RTC_MINUTES   0x02
+#define CMOS_RTC_HOURS 0x04
+#define CMOS_RTC_DATE_OF_MONTH 0x07
+#define CMOS_RTC_MONTH 0x08
+#define CMOS_RTC_YEAR  0x09
+
+static inline unsigned char bin2bcd(unsigned val)
+{
+   return ((val / 10) << 4) + val % 10;
+}
+
+static bool cmos_ram_data_in(struct kvm *self, uint16_t port, void *data, int 
size, uint32_t count)
+{
+   struct tm *tm;
+   time_t ti;
+
+   time(&ti);
+
+   tm = gmtime(&ti);
+
+   switch (cmos_index) {
+   case CMOS_RTC_SECONDS:
+   ioport__write8(data, bin2bcd(tm->tm_sec));
+   break;
+   case CMOS_RTC_MINUTES:
+   ioport__write8(data, bin2bcd(tm->tm_min));
+   break;
+   case CMOS_RTC_HOURS:
+   ioport__write8(data, bin2bcd(tm->tm_hour));
+   break;
+   case CMOS_RTC_DATE_OF_MONTH:
+   ioport__write8(data, bin2bcd(tm->tm_mday));
+   break;
+   case CMOS_RTC_MONTH:
+   ioport__write8(data, bin2bcd(tm->tm_mon + 1));
+   break;
+   case CMOS_RTC_YEAR:
+   ioport__write8(data, bin2bcd(tm->tm_year));
+   break;
+   }
+
+   return true;
+}
+
+static bool cmos_ram_data_out(struct kvm *self, uint16_t port, void *data, int 
size, uint32_t count)
+{
+   return true;
+}
+
+static struct ioport_operations cmos_ram_data_ioport_ops = {
+   .io_out = cmos_ram_data_out,
+   .io_in  = cmos_ram_data_in,
+};
+
+static bool cmos_ram_index_out(struct kvm *self, uint16_t port, vo

[PATCH 3/3] KVM test: control: Replace create_report function call

2011-04-28 Thread Lucas Meneghel Rodrigues

Following up with the previous changes.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/control |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/control b/client/tests/kvm/control
index 6437d88..eb95a33 100644
--- a/client/tests/kvm/control
+++ b/client/tests/kvm/control
@@ -23,6 +23,7 @@ For online docs, please refer to 
http://www.linux-kvm.org/page/KVM-Autotest
 import sys, os, logging
 from autotest_lib.client.common_lib import cartesian_config
 from autotest_lib.client.virt import virt_utils
+from autotest_lib.client.tools import html_report
 
 # set English environment (command output might be localized, need to be safe)
 os.environ['LANG'] = 'en_US.UTF-8'
@@ -69,4 +70,4 @@ parser.parse_string(str)
 virt_utils.run_tests(parser, job)
 
 # Generate a nice HTML report inside the job's results dir
-virt_utils.create_report(kvm_test_dir, job.resultdir)
+html_report.create_report(job.resultdir)
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] virt.virt_utils: Get rid of create_report function

2011-04-28 Thread Lucas Meneghel Rodrigues

As it has been transfered to the html_report module itself.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/virt/virt_utils.py |   12 
 1 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py
index 97a652d..8fa64ca 100644
--- a/client/virt/virt_utils.py
+++ b/client/virt/virt_utils.py
@@ -1187,18 +1187,6 @@ def run_tests(parser, job):
 return not failed
 
 
-def create_report(report_dir, results_dir):
-"""
-Creates a neatly arranged HTML results report in the results dir.
-
-@param report_dir: Directory where the report script is located.
-@param results_dir: Directory where the results will be output.
-"""
-reporter = os.path.join(report_dir, 'html_report.py')
-html_file = os.path.join(results_dir, 'results.html')
-os.system('%s -r %s -f %s -R' % (reporter, results_dir, html_file))
-
-
 def display_attributes(instance):
 """
 Inspects a given class instance attributes and displays them, convenient
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] tools/html_report: Make html report generation autotest generic

2011-04-28 Thread Lucas Meneghel Rodrigues

The html report tool was a bit too tied to KVM autotest, and that
was not necessary. Turn it into a generic module that can be used
by any autotest job, encapsulate the main worker function into
autotest API so we don't need to exec it in a subshell, fix naming
of the report and CSS details.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tools/html_report.py |   85 ---
 1 files changed, 47 insertions(+), 38 deletions(-)

diff --git a/client/tools/html_report.py b/client/tools/html_report.py
index 8b4b109..1b1584c 100755
--- a/client/tools/html_report.py
+++ b/client/tools/html_report.py
@@ -27,7 +27,6 @@ body {
 text-decoration:none;
 font:bold 2em/2em Arial, Helvetica, sans-serif;
 text-transform:none;
-text-shadow: 2px 2px 2px #555;
 text-align: left;
 color:#55;
 border-bottom: 1px solid #55;
@@ -37,7 +36,6 @@ body {
 text-decoration:none;
 font:bold 16px Arial, Helvetica, sans-serif;
 text-transform:uppercase;
-text-shadow: 2px 2px 2px #555;
 text-align: left;
 color:#55;
 margin-bottom:0;
@@ -1388,7 +1386,7 @@ def make_html_file(metadata, results, tag, host, 
output_file_name, dirname):
 http://www.w3.org/TR/html4/strict.dtd";>
 
 
-KVM Autotest Results
+Autotest job execution results
 
 %s
 
@@ -1414,7 +1412,7 @@ return true;
 output = sys.stdout
 # create html page
 print >> output, html_prefix
-print >> output, 'KVM Autotest Execution Report'
+print >> output, 'Autotest job execution report'
 
 # formating date and time to print
 t = datetime.datetime.now()
@@ -1446,7 +1444,8 @@ return true;
 print >> output, 'DATE:%s' % 
now.ctime()
 print >> output, 'STATS:%s'% stat_str
 print >> output, ''
-print >> output, 'KVM VERSION:%s' % 
kvm_ver_str
+if kvm_ver_str:
+print >> output, 'KVM VERSION:%s' 
% kvm_ver_str
 print >> output, ''
 
 
@@ -1556,7 +1555,7 @@ def parse_result(dirname, line):
 # assign actual values
 rx = re.compile('^(\w+)\.(.*)$')
 m1 = rx.findall(parts[3])
-result['testcase'] = m1[0][1]
+result['testcase'] = str(tag)
 result['title'] = str(tag)
 result['status'] = parts[1]
 if result['status'] != 'GOOD':
@@ -1648,9 +1647,50 @@ def get_kvm_version(result_dir):
 kvm_version = get_keyval_value(result_dir, "kvm_version")
 kvm_userspace_version = get_keyval_value(result_dir,
  "kvm_userspace_version")
+if kvm_version == "Unknown" or kvm_userspace_version == "Unknown":
+return None
 return "Kernel: %sUserspace: %s" % (kvm_version, kvm_userspace_version)
 
 
+def create_report(dirname, html_path=None, output_file_name=None):
+res_dir = os.path.abspath(dirname)
+tag = res_dir
+status_file_name = dirname + '/status'
+sysinfo_dir = dirname + '/sysinfo'
+host = get_info_file('%s/hostname' % sysinfo_dir)
+rx = re.compile('^\s+[END|START].*$')
+# create the results set dict
+results_data = []
+if os.path.exists(status_file_name):
+f = open(status_file_name, "r")
+lines = f.readlines()
+f.close()
+for line in lines:
+if rx.match(line):
+result_dict = parse_result(dirname, line)
+if result_dict:
+results_data.append(result_dict)
+# create the meta info dict
+metalist = {
+'uname': get_info_file('%s/uname' % sysinfo_dir),
+'cpuinfo':get_info_file('%s/cpuinfo' % sysinfo_dir),
+'meminfo':get_info_file('%s/meminfo' % sysinfo_dir),
+'df':get_info_file('%s/df' % sysinfo_dir),
+'modules':get_info_file('%s/modules' % sysinfo_dir),
+'gcc':get_info_file('%s/gcc_--version' % sysinfo_dir),
+'dmidecode':get_info_file('%s/dmidecode' % sysinfo_dir),
+'dmesg':get_info_file('%s/dmesg' % sysinfo_dir),
+'kvmver':get_kvm_version(dirname)
+}
+
+if html_path is None:
+html_path = dirname
+if output_file_name is None:
+output_file_name = os.path.join(dirname, 'job_report.html')
+make_html_file(metalist, results_data, tag, host, output_file_name,
+   html_path)
+
+
 def main(argv):
 dirname = None
 output_file_name = None
@@ -1682,38 +1722,7 @@ def main(argv):
 if dirname:
 if os.path.isdir(dirname): # TBD: replace it with a validation of
# autotest result dir
-res_dir = os.path.abspath(dirname)
-tag = res_dir
-status_file_name = dirname + '/status'
-sysinfo_dir = dirname + '/sysinfo'
-host = get_info_file('%s/hostname' % sysinfo_dir)
-rx = re.compile('^\s+[END|START].*$')
-# create the results set dict
-results_data = [

[PATCH 0/3] Fix HTML report generation

2011-04-28 Thread Lucas Meneghel Rodrigues

Made HTML report generation generic and useable by any autotest
client job, fixed up some bugs in HTML report code. Now it'll
be easier to browse and visualize results without the autotest web
interface.

Lucas Meneghel Rodrigues (3):
  tools/html_report: Make html report generation autotest generic
  virt.virt_utils: Get rid of create_report function
  KVM test: control: Replace create_report function call

 client/tests/kvm/control|3 +-
 client/tools/html_report.py |   85 ---
 client/virt/virt_utils.py   |   12 --
 3 files changed, 49 insertions(+), 51 deletions(-)

-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Zachary Amsden


On 04/28/2011 12:22 AM, Roedel, Joerg wrote:

On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote:
   

And /me still wonders (like I did when this first popped up) if the
proper place of determining TSC stability really have to be KVM.

If the Linux core fails to detect some instability and KVM has to jump
in, shouldn't we better improve the core's detection abilities and make
use of them in KVM? Conceptually this looks like we are currently just
working around a core deficit in KVM.
 

Yes, good question. Has this ever triggered on a real machine (not
counting the suspend/resume issue in)?
   


Yes... some platforms don't go haywire until you start using power 
mangement, TSC is stable before that, but not afterwards, and depending 
on the version of the kernel, KVM might detect this before the kernel does.


Honestly, the code is obsolete, but still useful for those who build KVM 
as an external module on older kernels using the kvm-kmod system.


Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)

2011-04-28 Thread Blue Swirl

On Thu, Apr 28, 2011 at 5:24 PM, Ulrich Obergfell  wrote:
> 'target_get_irq_delivered' and 'target_reset_irq_delivered' contain
> entry addresses of functions that are utilized by update_irq() to
> detect coalesced interrupts. apic code loads these pointers during
> initialization.

I'm not so happy with this approach, but probably then the i386
dependencies can be removed from RTC and it can be compiled only once
for all targets.

> This change can be replaced if a generic feedback infrastructure to
> track coalesced IRQs for periodic, clock providing devices becomes
> available.
>
> Signed-off-by: Ulrich Obergfell 
> ---
>  hw/apic.c |    4 
>  sysemu.h  |    3 +++
>  vl.c      |    3 +++
>  3 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/hw/apic.c b/hw/apic.c
> index a45b57f..eb0f6d9 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -24,6 +24,7 @@
>  #include "sysbus.h"
>  #include "trace.h"
>  #include "kvm.h"
> +#include "sysemu.h"
>
>  /* APIC Local Vector Table */
>  #define APIC_LVT_TIMER   0
> @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = {
>
>  static void apic_register_devices(void)
>  {
> +    target_get_irq_delivered = apic_get_irq_delivered;
> +    target_reset_irq_delivered = apic_reset_irq_delivered;
> +
>     sysbus_register_withprop(&apic_info);
>  }
>
> diff --git a/sysemu.h b/sysemu.h
> index 07d85cd..75b0139 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -98,6 +98,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
>  void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
>  int qemu_loadvm_state(QEMUFile *f);
>
> +extern int (*target_get_irq_delivered)(void);
> +extern void (*target_reset_irq_delivered)(void);
> +
>  /* SLIRP */
>  void do_info_slirp(Monitor *mon);
>
> diff --git a/vl.c b/vl.c
> index 0c24e07..7bab315 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -233,6 +233,9 @@ const char *prom_envs[MAX_PROM_ENVS];
>  const char *nvram = NULL;
>  int boot_menu;
>
> +int (*target_get_irq_delivered)(void) = 0;
> +void (*target_reset_irq_delivered)(void) = 0;

Instead of initializing with 0 (should be actually NULL in C), please
define stub functions, which are used by default. Then the users don't
need to check for NULL pointers.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Zachary Amsden


On 04/28/2011 12:13 AM, Roedel, Joerg wrote:

On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote:
   

So I've been going over the new code changes to the TSC related code and
I don't like one particular set of changes.  In particular, here:

  kvm_x86_ops->vcpu_load(vcpu, cpu);
  if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
  /* Make sure TSC doesn't go backwards */
  s64 tsc_delta;
  u64 tsc;

  kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc);
  tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
   tsc - vcpu->arch.last_guest_tsc;

  if (tsc_delta<  0)
  mark_tsc_unstable("KVM discovered backwards TSC");
  if (check_tsc_unstable()) {
  kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
  vcpu->arch.tsc_catchup = 1;
  }


The point of this code fragment is to test the host clock to see if it
is stable, because we may have just come back from an idle phase which
stopped the TSC, switched CPUs, or come back from a deep sleep state
which reset the host TSC.
 

I see it different. This code wants to check if the _guest_ tsc moves
forwared (or at least not backwards). So it is fully legitimate to just
do this by reading the guest-tsc and compare it to the last one the
guest had.
   


That wasn't the intention when I wrote that code.  It's simply there to 
detect backwards motion of the host TSC.  The guest TSC can legally go 
backwards whenever the guest decides to change it, so checking the guest 
TSC doesn't make sense here.



I saw a patch floating around that touched this code recently, but I
think there's a definite issue here that needs addressing.
 

In fact, this change was done to address one of your concerns. You
mentioned that the values passed to adjust_tsc_offset() were in
unconsistent units in my first version of tsc-scaling. This was a right
objection because one call-site used guest-tsc-units while the other
used host-tsc-units. This change intended to fix that by using
guest-tsc-units always for adjust_tsc_offset().

Not that the guest and the host tsc have the same units on current
machines. But with tsc-scaling these units are different.
   


Yes, with tsc-scaling, the machines already have stable TSCs - the above 
test is for older hardware which could have problems, and can be 
reverted back to the original code without worrying about switching units.


Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Zachary Amsden


On 04/28/2011 12:06 AM, Jan Kiszka wrote:

On 2011-04-28 08:59, Zachary Amsden wrote:
   

So I've been going over the new code changes to the TSC related code and
I don't like one particular set of changes.  In particular, here:

 kvm_x86_ops->vcpu_load(vcpu, cpu);
 if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
 /* Make sure TSC doesn't go backwards */
 s64 tsc_delta;
 u64 tsc;

 kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc);
 tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
  tsc - vcpu->arch.last_guest_tsc;

 if (tsc_delta<  0)
 mark_tsc_unstable("KVM discovered backwards TSC");
 if (check_tsc_unstable()) {
 kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
 vcpu->arch.tsc_catchup = 1;
 }


The point of this code fragment is to test the host clock to see if it
is stable, because we may have just come back from an idle phase which
stopped the TSC, switched CPUs, or come back from a deep sleep state
which reset the host TSC.

However, the above code is fetching the guest TSC instead of the host
TSC, which isn't the way it is supposed to work.

I saw a patch floating around that touched this code recently, but I
think there's a definite issue here that needs addressing.
 

And /me still wonders (like I did when this first popped up) if the
proper place of determining TSC stability really have to be KVM.
   


No, it's not.  I have a patch which fixes that.  Was in the process of 
merging it into my local tree when I found this.



If the Linux core fails to detect some instability and KVM has to jump
in, shouldn't we better improve the core's detection abilities and make
use of them in KVM? Conceptually this looks like we are currently just
working around a core deficit in KVM.
   


100% correct you are.

Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Emmanuel Noobadmin

On 4/28/11, Gleb Natapov  wrote:
> Qemu is not intended to be used directly by end user. It is too complex as
> you already found out. VMware don't even give you access to such low parts
> of virt stack. You should use libvirt or virt-manager instead. Especially
> if you are concerned about security. I think libvirt can start guest on
> headless server.

Sorry for the confusion, I was using libvirtd in CLI, i.e.
virt-install and virsh, not qemu directly.

> If this still fails for you you need to complain to libvirt developers
> (not in a rant mode, but providing details of what exact version of
> software you have problem with and what are you trying to do). And
> libvirt developers will not be shy to complain to qemu developers if the
> problem turned to be on their side.

Apologies about the rant mode as well. Before that, I tried sending
two emails (25th and 26th Apr) to the KVM list with some details,
hoping to get some advice. But each of these failed to materialize on
the kvm list for unknown reasons.

So I resorted to posting to the CentOS list, where I did get some
response for which I'm very thankful. The rant post came when despite
the additional advice which helped me get a little further, I keep
running into unexpected brickwalls like anaconda not seeing the "dvd"
(mounted ISO specified using --location) that it just booted from.

Out of frustration, I CC'd that particular email to the kvm list,
figuring that since it's likely to get me flamed, the imp of
perversity would probably let it through... and it did.

> As far as I know libvirt has no problem using bridged networking and
> virt-manager use libvirt so it should work if you use new enough virt
> stack, but you should ask on libvirt mailing list instead.

I guess those were outdated warnings on older versions. I'll give it
another spin given some of the new suggestions like using virt-install
to create the disk file. If it still doesn't work, I'll go check the
libvirt ML (I'm belatedly getting the idea that libvirt is not part of
kvm).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Virtualize BTS?

2011-04-28 Thread Avi Kivity


On 04/28/2011 06:03 PM, Jun Koi wrote:

hi,

i am wondering if current KVM version virtualizes the BTS (Branch
Trace Store) facility? do we have this facility inside VM?



No.  On AMD we do virtualize a similar feature (LBR).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config

2011-04-28 Thread Jan Kiszka

On 2011-04-28 16:54, Alex Williamson wrote:
> On Thu, 2011-04-28 at 16:46 +0200, Jan Kiszka wrote:
>> On 2011-04-28 16:29, Alex Williamson wrote:
>>> On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
 Use rages_overlap and proper constants to match the access range against
>>>   ^ typo - only if you resend
>>>
 regions that need special handling. This also fixes yet uncaught
 high-byte write access to the command register. Moreover, use more
 constants instead of magic numbers.

 Signed-off-by: Jan Kiszka 
 ---
  hw/device-assignment.c |   39 +--
  1 files changed, 29 insertions(+), 10 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 606d725..3481c93 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice 
 *d, uint32_t address,
  return assigned_device_pci_cap_write_config(d, address, val, len);
  }
  
 -if (address == 0x4) {
 +if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
  pci_default_write_config(d, address, val, len);
  /* Continue to program the card */
  }
  
 -if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
 -address == 0x34 || address == 0x3c || address == 0x3d) {
 +/*
 + * Catch access to
 + *  - base address registers
 + *  - ROM base address & capability pointer
 + *  - interrupt line & pin
 + */
 +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
 +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
>>>
>>> Should this be 5 bytes instead of 8?  I'm not sure why we'd add catching
>>> these reserved fields, but not those immediately after this range.
>>
>> Yes, that's asking for clarification: Should we allow direct access to
>> the complete reserved space or virtualize it? Depending on this, the
>> proper value should be 5 or 14 (the latter would also save one
>> ranges_overlap).
> 
> I vote for 5 here since a cleanup patch shouldn't have behavior changes
> hidden in it.  I don't see any great value in virtualizing reserved
> bits.  It seems like it could only make things not work if a vendor was
> stupid enough to hide something in there.  Thanks,

Thinking about this and the other sub-dword matches again, there is a
problem in my approach:

I redirect the complete write to the virtualized space once there is an
overlap. So a dword write at PCI_INTERRUPT_LINE would not touch the real
PCI_MIN_GNT & PCI_MAX_LAT. But writing directly to those without overlap
would. The same applies to our reserved space, though this inconsistency
is not critical here - but still undesirable.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/5] pci-assign: Fix dword read at PCI_COMMAND

2011-04-28 Thread Jan Kiszka

On 2011-04-28 16:51, Alex Williamson wrote:
> On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
>> If we emulate the command register, we must only read its content from
>> the shadow config space. For dword read of both PCI_COMMAND and
>> PCI_STATUS, at least the latter must be read from the device.
>>
>> For simplicity reasons and as the code path is not considered
>> performance critical for the affected SRIOV devices, the fix performes
>> device access to the command word unconditionally, even if emulation is
>> enabled and only that word is read.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  hw/device-assignment.c |8 +---
>>  1 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>> index f37f108..ee81434 100644
>> --- a/hw/device-assignment.c
>> +++ b/hw/device-assignment.c
>> @@ -482,14 +482,11 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
>> *d, uint32_t address,
>>  /*
>>   * Catch access to
>>   *  - vendor & device ID
>> - *  - command register (if emulation needed)
>>   *  - base address registers
>>   *  - ROM base address & capability pointer
>>   *  - interrupt line & pin
>>   */
>>  if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
>> -(pci_dev->need_emulate_cmd &&
>> - ranges_overlap(address, len, PCI_COMMAND, 2)) ||
>>  ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
>>  ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
>>  ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
>> @@ -521,6 +518,11 @@ do_log:
>>  DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>>(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
>>  
>> +if (pci_dev->need_emulate_cmd) {
>> +val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2),
>> + address, len, PCI_COMMAND, 0x);
> 
> Shouldn't this be pci_default_read_config(d, address, len)?  I think
> what you have works since PCI_COMMAND is at the start of a dword, but it
> violates the merge_bits() assumption that val and mval are from the same
> address with the same length.  Thanks,

This is actually a real issue, reading a byte from PCI_COMMAND+1 is broken.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Virtualize BTS?

2011-04-28 Thread Jun Koi

hi,

i am wondering if current KVM version virtualizes the BTS (Branch
Trace Store) facility? do we have this facility inside VM?

thanks,
Jun
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config

2011-04-28 Thread Jan Kiszka

On 2011-04-28 16:54, Alex Williamson wrote:
> On Thu, 2011-04-28 at 16:46 +0200, Jan Kiszka wrote:
>> On 2011-04-28 16:29, Alex Williamson wrote:
>>> On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
 Use rages_overlap and proper constants to match the access range against
>>>   ^ typo - only if you resend
>>>
 regions that need special handling. This also fixes yet uncaught
 high-byte write access to the command register. Moreover, use more
 constants instead of magic numbers.

 Signed-off-by: Jan Kiszka 
 ---
  hw/device-assignment.c |   39 +--
  1 files changed, 29 insertions(+), 10 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 606d725..3481c93 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice 
 *d, uint32_t address,
  return assigned_device_pci_cap_write_config(d, address, val, len);
  }
  
 -if (address == 0x4) {
 +if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
  pci_default_write_config(d, address, val, len);
  /* Continue to program the card */
  }
  
 -if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
 -address == 0x34 || address == 0x3c || address == 0x3d) {
 +/*
 + * Catch access to
 + *  - base address registers
 + *  - ROM base address & capability pointer
 + *  - interrupt line & pin
 + */
 +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
 +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
>>>
>>> Should this be 5 bytes instead of 8?  I'm not sure why we'd add catching
>>> these reserved fields, but not those immediately after this range.
>>
>> Yes, that's asking for clarification: Should we allow direct access to
>> the complete reserved space or virtualize it? Depending on this, the
>> proper value should be 5 or 14 (the latter would also save one
>> ranges_overlap).
> 
> I vote for 5 here since a cleanup patch shouldn't have behavior changes
> hidden in it.  I don't see any great value in virtualizing reserved
> bits.  It seems like it could only make things not work if a vendor was
> stupid enough to hide something in there.  Thanks,

Yeah, and as we properly restore the config space now, it should be
Mostly Harmless. Will update.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address

2011-04-28 Thread Marcelo Tosatti

On Thu, Apr 28, 2011 at 08:00:19AM -0500, Anthony Liguori wrote:
> On 04/27/2011 06:06 PM, Marcelo Tosatti wrote:
> >On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote:
> >>On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote:
> >>>Author: Max Asbock
> >>>
> >>>Add command x-gpa2hva to translate guest physical address to host
> >>>virtual address. Because gpa to hva translation is not consistent, so
> >>>this command is only used for debugging.
> >>>
> >>>The x-gpa2hva command provides one step in a chain of translations from
> >>>guest virtual to guest physical to host virtual to host physical. Host
> >>>physical is then used to inject a machine check error. As a
> >>>consequence the HWPOISON code on the host and the MCE injection code
> >>>in qemu-kvm are exercised.
> >>>
> >>>v3:
> >>>
> >>>- Rename to x-gpa2hva
> >>>- Remove QMP support, because gpa2hva is not consistent
> >>
> >>Is this patch an acceptable solution for now? This command is useful for
> >>our testing.
> >
> >Anthony?
> 
> Yeah, but it should come through qemu-devel, no?

Yes, Huang Ying, can you please resend?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config

2011-04-28 Thread Alex Williamson

On Thu, 2011-04-28 at 16:46 +0200, Jan Kiszka wrote:
> On 2011-04-28 16:29, Alex Williamson wrote:
> > On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
> >> Use rages_overlap and proper constants to match the access range against
> >   ^ typo - only if you resend
> > 
> >> regions that need special handling. This also fixes yet uncaught
> >> high-byte write access to the command register. Moreover, use more
> >> constants instead of magic numbers.
> >>
> >> Signed-off-by: Jan Kiszka 
> >> ---
> >>  hw/device-assignment.c |   39 +--
> >>  1 files changed, 29 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> >> index 606d725..3481c93 100644
> >> --- a/hw/device-assignment.c
> >> +++ b/hw/device-assignment.c
> >> @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice 
> >> *d, uint32_t address,
> >>  return assigned_device_pci_cap_write_config(d, address, val, len);
> >>  }
> >>  
> >> -if (address == 0x4) {
> >> +if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
> >>  pci_default_write_config(d, address, val, len);
> >>  /* Continue to program the card */
> >>  }
> >>  
> >> -if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
> >> -address == 0x34 || address == 0x3c || address == 0x3d) {
> >> +/*
> >> + * Catch access to
> >> + *  - base address registers
> >> + *  - ROM base address & capability pointer
> >> + *  - interrupt line & pin
> >> + */
> >> +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> >> +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
> > 
> > Should this be 5 bytes instead of 8?  I'm not sure why we'd add catching
> > these reserved fields, but not those immediately after this range.
> 
> Yes, that's asking for clarification: Should we allow direct access to
> the complete reserved space or virtualize it? Depending on this, the
> proper value should be 5 or 14 (the latter would also save one
> ranges_overlap).

I vote for 5 here since a cleanup patch shouldn't have behavior changes
hidden in it.  I don't see any great value in virtualizing reserved
bits.  It seems like it could only make things not work if a vendor was
stupid enough to hide something in there.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/5] pci-assign: Fix dword read at PCI_COMMAND

2011-04-28 Thread Alex Williamson

On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
> If we emulate the command register, we must only read its content from
> the shadow config space. For dword read of both PCI_COMMAND and
> PCI_STATUS, at least the latter must be read from the device.
> 
> For simplicity reasons and as the code path is not considered
> performance critical for the affected SRIOV devices, the fix performes
> device access to the command word unconditionally, even if emulation is
> enabled and only that word is read.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  hw/device-assignment.c |8 +---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index f37f108..ee81434 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -482,14 +482,11 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
> *d, uint32_t address,
>  /*
>   * Catch access to
>   *  - vendor & device ID
> - *  - command register (if emulation needed)
>   *  - base address registers
>   *  - ROM base address & capability pointer
>   *  - interrupt line & pin
>   */
>  if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
> -(pci_dev->need_emulate_cmd &&
> - ranges_overlap(address, len, PCI_COMMAND, 2)) ||
>  ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
>  ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
>  ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
> @@ -521,6 +518,11 @@ do_log:
>  DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
>  
> +if (pci_dev->need_emulate_cmd) {
> +val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2),
> + address, len, PCI_COMMAND, 0x);

Shouldn't this be pci_default_read_config(d, address, len)?  I think
what you have works since PCI_COMMAND is at the start of a dword, but it
violates the merge_bits() assumption that val and mval are from the same
address with the same length.  Thanks,

Alex

> +}
> +
>  if (!pci_dev->cap.available) {
>  /* kill the special capabilities */
>  if (address == PCI_COMMAND && len == 4) {



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config

2011-04-28 Thread Jan Kiszka

On 2011-04-28 16:29, Alex Williamson wrote:
> On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
>> Use rages_overlap and proper constants to match the access range against
>   ^ typo - only if you resend
> 
>> regions that need special handling. This also fixes yet uncaught
>> high-byte write access to the command register. Moreover, use more
>> constants instead of magic numbers.
>>
>> Signed-off-by: Jan Kiszka 
>> ---
>>  hw/device-assignment.c |   39 +--
>>  1 files changed, 29 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>> index 606d725..3481c93 100644
>> --- a/hw/device-assignment.c
>> +++ b/hw/device-assignment.c
>> @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice 
>> *d, uint32_t address,
>>  return assigned_device_pci_cap_write_config(d, address, val, len);
>>  }
>>  
>> -if (address == 0x4) {
>> +if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
>>  pci_default_write_config(d, address, val, len);
>>  /* Continue to program the card */
>>  }
>>  
>> -if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
>> -address == 0x34 || address == 0x3c || address == 0x3d) {
>> +/*
>> + * Catch access to
>> + *  - base address registers
>> + *  - ROM base address & capability pointer
>> + *  - interrupt line & pin
>> + */
>> +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
>> +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
> 
> Should this be 5 bytes instead of 8?  I'm not sure why we'd add catching
> these reserved fields, but not those immediately after this range.

Yes, that's asking for clarification: Should we allow direct access to
the complete reserved space or virtualize it? Depending on this, the
proper value should be 5 or 14 (the latter would also save one
ranges_overlap).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config

2011-04-28 Thread Alex Williamson

On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote:
> Use rages_overlap and proper constants to match the access range against
  ^ typo - only if you resend

> regions that need special handling. This also fixes yet uncaught
> high-byte write access to the command register. Moreover, use more
> constants instead of magic numbers.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  hw/device-assignment.c |   39 +--
>  1 files changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 606d725..3481c93 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice *d, 
> uint32_t address,
>  return assigned_device_pci_cap_write_config(d, address, val, len);
>  }
>  
> -if (address == 0x4) {
> +if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
>  pci_default_write_config(d, address, val, len);
>  /* Continue to program the card */
>  }
>  
> -if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
> -address == 0x34 || address == 0x3c || address == 0x3d) {
> +/*
> + * Catch access to
> + *  - base address registers
> + *  - ROM base address & capability pointer
> + *  - interrupt line & pin
> + */
> +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||

Should this be 5 bytes instead of 8?  I'm not sure why we'd add catching
these reserved fields, but not those immediately after this range.

> +ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
>  /* used for update-mappings (BAR emulation) */
>  pci_default_write_config(d, address, val, len);
>  return;
> @@ -450,9 +457,20 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
> *d, uint32_t address,
>  return val;
>  }
>  
> -if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
> - (address >= 0x10 && address <= 0x24) || address == 0x30 ||
> -address == 0x34 || address == 0x3c || address == 0x3d) {
> +/*
> + * Catch access to
> + *  - vendor & device ID
> + *  - command register (if emulation needed)
> + *  - base address registers
> + *  - ROM base address & capability pointer
> + *  - interrupt line & pin
> + */
> +if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
> +(pci_dev->need_emulate_cmd &&
> + ranges_overlap(address, len, PCI_COMMAND, 2)) ||
> +ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
> +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||

Same here.

> +ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
>  val = pci_default_read_config(d, address, len);
>  DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
>(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> @@ -483,10 +501,11 @@ do_log:
>  
>  if (!pci_dev->cap.available) {
>  /* kill the special capabilities */
> -if (address == 4 && len == 4)
> -val &= ~0x10;
> -else if (address == 6)
> -val &= ~0x10;
> +if (address == PCI_COMMAND && len == 4) {
> +val &= ~(PCI_STATUS_CAP_LIST << 16);
> +} else if (address == PCI_STATUS) {
> +val &= ~PCI_STATUS_CAP_LIST;
> +}
>  }
>  
>  return val;

Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo

2011-04-28 Thread Ulrich Obergfell

driftfix is a 'bit type' property. Compensation of delayed callbacks
and coalesced interrupts can be enabled with the command line option

-global hpet.driftfix=on

driftfix is 'off' (disabled) by default.

Signed-off-by: Ulrich Obergfell 
---
 hw/hpet.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 6ce07bc..7513065 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -72,6 +72,8 @@ typedef struct HPETState {
 uint64_t isr;   /* interrupt status reg */
 uint64_t hpet_counter;  /* main counter */
 uint8_t  hpet_id;   /* instance id */
+
+uint32_t driftfix;
 } HPETState;
 
 static uint32_t hpet_in_legacy_mode(HPETState *s)
@@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = {
 .qdev.props = (Property[]) {
 DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS),
 DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false),
+DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false),
 DEFINE_PROP_END_OF_LIST(),
 },
 };
-- 
1.6.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)

2011-04-28 Thread Ulrich Obergfell

update_irq() uses a similar method as in 'rtc_td_hack' to detect
coalesced interrupts. The function entry addresses are retrieved
from 'target_get_irq_delivered' and 'target_reset_irq_delivered'.

This change can be replaced if a generic feedback infrastructure to
track coalesced IRQs for periodic, clock providing devices becomes
available.

Signed-off-by: Ulrich Obergfell 
---
 hw/hpet.c |   15 +--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 7ab6e62..35466ae 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -31,6 +31,7 @@
 #include "hpet_emul.h"
 #include "sysbus.h"
 #include "mc146818rtc.h"
+#include "sysemu.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -175,11 +176,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, 
uint64_t current)
 }
 }
 
-static void update_irq(struct HPETTimer *timer, int set)
+static int update_irq(struct HPETTimer *timer, int set)
 {
 uint64_t mask;
 HPETState *s;
 int route;
+int irq_delivered = 1;
 
 if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) {
 /* if LegacyReplacementRoute bit is set, HPET specification requires
@@ -204,8 +206,17 @@ static void update_irq(struct HPETTimer *timer, int set)
 qemu_irq_raise(s->irqs[route]);
 } else {
 s->isr &= ~mask;
-qemu_irq_pulse(s->irqs[route]);
+if (s->driftfix && target_get_irq_delivered
+&& target_reset_irq_delivered) {
+target_reset_irq_delivered();
+qemu_irq_raise(s->irqs[route]);
+irq_delivered = target_get_irq_delivered();
+qemu_irq_lower(s->irqs[route]);
+} else {
+qemu_irq_pulse(s->irqs[route]);
+}
 }
+return irq_delivered;
 }
 
 static void hpet_pre_save(void *opaque)
-- 
1.6.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription

2011-04-28 Thread Ulrich Obergfell

The new fields in HPETTimer are covered by a separate VMStateDescription
which is a subsection of 'vmstate_hpet_timer'. They are only migrated if

-global hpet.driftfix=on

Signed-off-by: Ulrich Obergfell 
---
 hw/hpet.c |   33 +
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 7513065..7ab6e62 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -55,6 +55,10 @@ typedef struct HPETTimer {  /* timers */
 uint8_t wrap_flag;  /* timer pop will indicate wrap for one-shot 32-bit
  * mode. Next pop will be actual timer expiration.
  */
+uint64_t prev_period;
+uint64_t ticks_not_accounted;
+uint32_t irq_rate;
+uint32_t divisor;
 } HPETTimer;
 
 typedef struct HPETState {
@@ -246,6 +250,27 @@ static int hpet_post_load(void *opaque, int version_id)
 return 0;
 }
 
+static bool hpet_timer_driftfix_vmstate_needed(void *opaque)
+{
+HPETTimer *t = opaque;
+
+return (t->state->driftfix != 0);
+}
+
+static const VMStateDescription vmstate_hpet_timer_driftfix = {
+.name = "hpet_timer_driftfix",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64(prev_period, HPETTimer),
+VMSTATE_UINT64(ticks_not_accounted, HPETTimer),
+VMSTATE_UINT32(irq_rate, HPETTimer),
+VMSTATE_UINT32(divisor, HPETTimer),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_hpet_timer = {
 .name = "hpet_timer",
 .version_id = 1,
@@ -260,6 +285,14 @@ static const VMStateDescription vmstate_hpet_timer = {
 VMSTATE_UINT8(wrap_flag, HPETTimer),
 VMSTATE_TIMER(qemu_timer, HPETTimer),
 VMSTATE_END_OF_LIST()
+},
+.subsections = (VMStateSubsection []) {
+{
+.vmsd = &vmstate_hpet_timer_driftfix,
+.needed = hpet_timer_driftfix_vmstate_needed,
+}, {
+/* empty */
+}
 }
 };
 
-- 
1.6.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers

2011-04-28 Thread Ulrich Obergfell

Hi,

This is version 3 of a series of patches that I originally posted in:

http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html
http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html

http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327
http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328

Changes since version 2:

 -  The vmstate related to 'driftfix' is now in a separate subsection
that can be migrated conditionally.

 -  The new fields of the HPETTimer structure are no longer initialized
in hpet_init(). This is now done in hpet_reset() only.

 -  Compensation of lost timer interrupts is now based on a backlog of
unaccounted HPET clock periods instead of 'irqs_to_inject'. This
eliminates the need to scale 'irqs_to_inject' when a guest o/s
modifies the comparator register value.

Please review and please comment.

Regards,

Uli


Ulrich Obergfell (5):
  hpet 'driftfix': add hooks required to detect coalesced interrupts
(x86 apic only)
  hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
  hpet 'driftfix': add fields to HPETTimer and VMStateDescription
  hpet 'driftfix': add code in update_irq() to detect coalesced
interrupts (x86 apic only)
  hpet 'driftfix': add code in hpet_timer() to compensate delayed
callbacks and coalesced interrupts

 hw/apic.c |4 ++
 hw/hpet.c |  114 ++--
 sysemu.h  |3 ++
 vl.c  |3 ++
 4 files changed, 120 insertions(+), 4 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)

2011-04-28 Thread Ulrich Obergfell

'target_get_irq_delivered' and 'target_reset_irq_delivered' contain
entry addresses of functions that are utilized by update_irq() to
detect coalesced interrupts. apic code loads these pointers during
initialization.

This change can be replaced if a generic feedback infrastructure to
track coalesced IRQs for periodic, clock providing devices becomes
available.

Signed-off-by: Ulrich Obergfell 
---
 hw/apic.c |4 
 sysemu.h  |3 +++
 vl.c  |3 +++
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index a45b57f..eb0f6d9 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -24,6 +24,7 @@
 #include "sysbus.h"
 #include "trace.h"
 #include "kvm.h"
+#include "sysemu.h"
 
 /* APIC Local Vector Table */
 #define APIC_LVT_TIMER   0
@@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = {
 
 static void apic_register_devices(void)
 {
+target_get_irq_delivered = apic_get_irq_delivered;
+target_reset_irq_delivered = apic_reset_irq_delivered;
+
 sysbus_register_withprop(&apic_info);
 }
 
diff --git a/sysemu.h b/sysemu.h
index 07d85cd..75b0139 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -98,6 +98,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
+extern int (*target_get_irq_delivered)(void);
+extern void (*target_reset_irq_delivered)(void);
+
 /* SLIRP */
 void do_info_slirp(Monitor *mon);
 
diff --git a/vl.c b/vl.c
index 0c24e07..7bab315 100644
--- a/vl.c
+++ b/vl.c
@@ -233,6 +233,9 @@ const char *prom_envs[MAX_PROM_ENVS];
 const char *nvram = NULL;
 int boot_menu;
 
+int (*target_get_irq_delivered)(void) = 0;
+void (*target_reset_irq_delivered)(void) = 0;
+
 typedef struct FWBootEntry FWBootEntry;
 
 struct FWBootEntry {
-- 
1.6.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts

2011-04-28 Thread Ulrich Obergfell

Loss of periodic timer interrupts caused by delayed callbacks and by
interrupt coalescing is compensated by gradually injecting additional
interrupts during subsequent timer intervals, starting at a rate of
one additional interrupt per interval. The injection of additional
interrupts is based on a backlog of unaccounted HPET clock periods
(new HPETTimer field 'ticks_not_accounted'). The backlog increases
due to delayed callbacks and coalesced interrupts, and it decreases
if an interrupt was injected successfully. If the backlog increases
while compensation is still in progress, the rate at which additional
interrupts are injected is increased too. A limit is imposed on the
backlog and on the rate.

Injecting additional timer interrupts to compensate lost interrupts
can alleviate long term time drift. However, on a short time scale,
this method can have the side effect of making virtual machine time
intermittently pass slower and faster than real time (depending on
the guest's time keeping algorithm). Compensation is disabled by
default and can be enabled for guests where this behaviour may be
acceptable.

Signed-off-by: Ulrich Obergfell 
---
 hw/hpet.c |   63 +++-
 1 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 35466ae..92d5f58 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -32,6 +32,7 @@
 #include "sysbus.h"
 #include "mc146818rtc.h"
 #include "sysemu.h"
+#include 
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -42,6 +43,9 @@
 
 #define HPET_MSI_SUPPORT0
 
+#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */
+#define MAX_IRQ_RATE(uint32_t)10
+
 struct HPETState;
 typedef struct HPETTimer {  /* timers */
 uint8_t tn; /*timer number*/
@@ -326,28 +330,63 @@ static const VMStateDescription vmstate_hpet = {
 }
 };
 
+static bool hpet_timer_has_tick_backlog(HPETTimer *t)
+{
+uint64_t backlog = t->ticks_not_accounted - (t->period + t->prev_period);
+return (backlog >= t->period);
+}
+
 /*
  * timer expiration callback
  */
 static void hpet_timer(void *opaque)
 {
 HPETTimer *t = opaque;
+HPETState *s = t->state;
 uint64_t diff;
-
+int irq_delivered = 0;
+uint32_t irq_count = 0;
 uint64_t period = t->period;
 uint64_t cur_tick = hpet_get_ticks(t->state);
 
+if (s->driftfix && !t->ticks_not_accounted) {
+t->ticks_not_accounted = t->prev_period = t->period;
+}
 if (timer_is_periodic(t) && period != 0) {
 if (t->config & HPET_TN_32BIT) {
 while (hpet_time_after(cur_tick, t->cmp)) {
 t->cmp = (uint32_t)(t->cmp + t->period);
+t->ticks_not_accounted += t->period;
+irq_count++;
 }
 } else {
 while (hpet_time_after64(cur_tick, t->cmp)) {
 t->cmp += period;
+t->ticks_not_accounted += period;
+irq_count++;
 }
 }
 diff = hpet_calculate_diff(t, cur_tick);
+if (s->driftfix) {
+if (t->ticks_not_accounted > MAX_TICKS_NOT_ACCOUNTED) {
+t->ticks_not_accounted = t->period + t->prev_period;
+}
+if (hpet_timer_has_tick_backlog(t)) {
+if (t->irq_rate == 1 || irq_count > 1) {
+t->irq_rate++;
+t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE);
+}
+if (t->divisor == 0) {
+assert(irq_count);
+}
+if (irq_count) {
+t->divisor = t->irq_rate;
+}
+diff /= t->divisor--;
+} else {
+t->irq_rate = 1;
+}
+}
 qemu_mod_timer(t->qemu_timer,
qemu_get_clock_ns(vm_clock) + 
(int64_t)ticks_to_ns(diff));
 } else if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) {
@@ -358,7 +397,22 @@ static void hpet_timer(void *opaque)
 t->wrap_flag = 0;
 }
 }
-update_irq(t, 1);
+if (s->driftfix && timer_is_periodic(t) && period != 0) {
+if (t->ticks_not_accounted >= t->period + t->prev_period) {
+irq_delivered = update_irq(t, 1);
+if (irq_delivered) {
+t->ticks_not_accounted -= t->prev_period;
+t->prev_period = t->period;
+} else {
+if (irq_count) {
+t->irq_rate++;
+t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE);
+}
+}
+}
+} else {
+update_irq(t, 1);
+}
 }
 
 static void hpet_set_timer(HPETTimer *t)
@@ -697,6 +751,11 @@ static void hpet_reset(DeviceState *d)
 timer->config |=  0x0004ULL << 32;
 timer->period = 0ULL;
 timer->wrap_flag = 0;
+
+timer->prev_period = 0;
+timer->ticks_not_accounted = 0;
+

Re: nmi is broken?

2011-04-28 Thread OGAWA Hirofumi

OGAWA Hirofumi  writes:

> Avi Kivity  writes:
>
>>> This seems to be complex stuff depending on hardware configurations. I'm
>>> not fully understanding though, current state of it is,
>>>
>>> Yes, PIC is in AEOI mode if linux is using IO-APIC. Um.., kvm says
>>> irq == 0 is mp_INT mode in MADT, not mp_ExtINT.
>>
>> That is correct, kvm doesn't connect the master 8259 output to the 
>> IOAPIC.  Instead the 8259 is connected to LINT0 (which is configured for 
>> ExtInt when the IOAPIC is disabled, or for NMI which the NMI watchdog is 
>> enabled).
>>
>> However, now I can't see how it would work. auto EOI works on the INTA 
>> cycle, which would only occur if LINT0 is configured for ExtInt.  If it 
>> is configured for NMI, I don't think it would issue the INTA cycle.  So 
>> the NMI watchdog not working is actually correct for our hardware 
>> configuration!
>>
>> But I may be misunderstanding something here.
>
> I see. If the physical machine was configured as above, I guess (not
> pretty sure, I don't have this configuration machine), IOAPIC test
> (check_timer() in io_apic.c) should fail, and IOAPIC wouldn't have any
> effect. And I think MADT should tell mp_ExtINT.
>
> Yes, I also guess the above configuration wouldn't work NMI watchdog of
> IOAPIC mode, and linux will report as NMI watchdog can't work in
> check_timer().

Hm.., if smp was enabled, what configuration model is used by kvm? I
think this configuration model can't work on smp.

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: nmi is broken?

2011-04-28 Thread OGAWA Hirofumi

Avi Kivity  writes:

>> This seems to be complex stuff depending on hardware configurations. I'm
>> not fully understanding though, current state of it is,
>>
>> Yes, PIC is in AEOI mode if linux is using IO-APIC. Um.., kvm says
>> irq == 0 is mp_INT mode in MADT, not mp_ExtINT.
>
> That is correct, kvm doesn't connect the master 8259 output to the 
> IOAPIC.  Instead the 8259 is connected to LINT0 (which is configured for 
> ExtInt when the IOAPIC is disabled, or for NMI which the NMI watchdog is 
> enabled).
>
> However, now I can't see how it would work. auto EOI works on the INTA 
> cycle, which would only occur if LINT0 is configured for ExtInt.  If it 
> is configured for NMI, I don't think it would issue the INTA cycle.  So 
> the NMI watchdog not working is actually correct for our hardware 
> configuration!
>
> But I may be misunderstanding something here.

I see. If the physical machine was configured as above, I guess (not
pretty sure, I don't have this configuration machine), IOAPIC test
(check_timer() in io_apic.c) should fail, and IOAPIC wouldn't have any
effect. And I think MADT should tell mp_ExtINT.

Yes, I also guess the above configuration wouldn't work NMI watchdog of
IOAPIC mode, and linux will report as NMI watchdog can't work in
check_timer().

>> So I guess system does
>> automatically INTA cycle (and AEOI because of PIC config), or not
>> connected via 8259A? (like in mpspec figure 5-2.)
>>
>> To checking it, I've tested in check_timer() of linux on the physical
>> machine (irq==0 and mp_INT). The test is something like,
>>
>>  if (pin1 != -1) {
>>  /*
>>   * Ok, does IRQ0 through the IOAPIC work?
>>   */
>>  unmask_IO_APIC_irq(0);
>>  disable_8259_irq(0);
>>  if (timer_irq_works()) {
>>
>> even if I called disable_8259_irq(0), timer was still working via
>> IO-APIC.
>>
>> Would this explain why?
>
> Sorry, I got lost - what does this explain?

I think this explains irq == 0 and mp_INT configuration tell PIT is
connected to both of IOAPIC pin2 and PIC pin0, and it is why timer
interrupt is working even when PIC pin0 was disabled.

Thanks.
-- 
OGAWA Hirofumi 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/6] kvm tools: Use threadpool for virtio-net

2011-04-28 Thread Sasha Levin

virtio-net has been converted to use the threadpool.
This is very similar to the change done in virtio-blk, only here we had 2 
queues to handle.

Signed-off-by: Sasha Levin 
---
 tools/kvm/virtio-net.c |  101 
 1 files changed, 25 insertions(+), 76 deletions(-)

diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c
index 3e13429..58b3de4 100644
--- a/tools/kvm/virtio-net.c
+++ b/tools/kvm/virtio-net.c
@@ -7,6 +7,7 @@
 #include "kvm/util.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include "kvm/threadpool.h"
 
 #include 
 #include 
@@ -40,16 +41,9 @@ struct net_device {
uint8_t status;
uint16_tqueue_selector;
 
-   pthread_t   io_rx_thread;
-   pthread_mutex_t io_rx_mutex;
-   pthread_cond_t  io_rx_cond;
-
-   pthread_t   io_tx_thread;
-   pthread_mutex_t io_tx_mutex;
-   pthread_cond_t  io_tx_cond;
-
int tap_fd;
chartap_name[IFNAMSIZ];
+   void*jobs[VIRTIO_NET_NUM_QUEUES];
 };
 
 static struct net_device net_device = {
@@ -69,70 +63,44 @@ static struct net_device net_device = {
  1UL << VIRTIO_NET_F_GUEST_TSO6,
 };
 
-static void *virtio_net_rx_thread(void *p)
+static void virtio_net_rx_callback(struct kvm *self, void *param)
 {
struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
struct virt_queue *vq;
-   struct kvm *self;
uint16_t out, in;
uint16_t head;
int len;
 
-   self = p;
-   vq = &net_device.vqs[VIRTIO_NET_RX_QUEUE];
-
-   while (1) {
-   mutex_lock(&net_device.io_rx_mutex);
-   if (!virt_queue__available(vq))
-   pthread_cond_wait(&net_device.io_rx_cond, 
&net_device.io_rx_mutex);
-   mutex_unlock(&net_device.io_rx_mutex);
-
-   while (virt_queue__available(vq)) {
-   head = virt_queue__get_iov(vq, iov, &out, &in, self);
-   len = readv(net_device.tap_fd, iov, in);
-   virt_queue__set_used_elem(vq, head, len);
-   /* We should interrupt guest right now, otherwise 
latency is huge. */
-   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
-   }
+   vq = param;
 
+   while (virt_queue__available(vq)) {
+   head = virt_queue__get_iov(vq, iov, &out, &in, self);
+   len = readv(net_device.tap_fd, iov, in);
+   virt_queue__set_used_elem(vq, head, len);
}
 
-   pthread_exit(NULL);
-   return NULL;
-
+   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
 }
 
-static void *virtio_net_tx_thread(void *p)
+static void virtio_net_tx_callback(struct kvm *self, void *param)
 {
struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
struct virt_queue *vq;
-   struct kvm *self;
uint16_t out, in;
uint16_t head;
int len;
 
-   self = p;
-   vq = &net_device.vqs[VIRTIO_NET_TX_QUEUE];
-
-   while (1) {
-   mutex_lock(&net_device.io_tx_mutex);
-   if (!virt_queue__available(vq))
-   pthread_cond_wait(&net_device.io_tx_cond, 
&net_device.io_tx_mutex);
-   mutex_unlock(&net_device.io_tx_mutex);
+   vq = param;
 
-   while (virt_queue__available(vq)) {
-   head = virt_queue__get_iov(vq, iov, &out, &in, self);
-   len = writev(net_device.tap_fd, iov, out);
-   virt_queue__set_used_elem(vq, head, len);
-   }
-
-   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
+   while (virt_queue__available(vq)) {
+   head = virt_queue__get_iov(vq, iov, &out, &in, self);
+   len = writev(net_device.tap_fd, iov, out);
+   virt_queue__set_used_elem(vq, head, len);
}
 
-   pthread_exit(NULL);
-   return NULL;
-
+   kvm__irq_line(self, VIRTIO_NET_IRQ, 1);
 }
+
 static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long 
offset, int size, uint32_t count)
 {
uint8_t *config_space = (uint8_t *) &net_device.net_config;
@@ -193,19 +161,7 @@ static bool virtio_net_pci_io_in(struct kvm *self, 
uint16_t port, void *data, in
 
 static void virtio_net_handle_callback(struct kvm *self, uint16_t queue_index)
 {
-   if (queue_index == VIRTIO_NET_TX_QUEUE) {
-
-   mutex_lock(&net_device.io_tx_mutex);
-   pthread_cond_signal(&net_device.io_tx_cond);
-   mutex_unlock(&net_device.io_tx_mutex);
-
-   } else if (queue_index == VIRTIO_NET_RX_QUEUE) {
-
-   mutex_lock(&net_device.io_rx_mutex);
-   pthread_cond_signal(&net_device.io_rx_cond);
-   mutex_unlock(&net_device.

[PATCH 3/6] kvm tools: Introduce generic IO threadpool

2011-04-28 Thread Sasha Levin

This patch adds a generic pool to create a common interface for working with 
threads within the kvm tool.
Main idea here is using this threadpool for all I/O threads instead of having 
every I/O module write it's own thread code.

The process of working with the thread pool is supposed to be very simple.
During initialization, Each module which is interested in working with the 
threadpool will call threadpool__add_jobtype with the callback function and a 
void* parameter. For example, virtio modules will register every virt_queue as 
a new job type.
During operation, When theres work to do for a specific job, the module will 
signal it to the queue and would expect the callback to be called with proper 
parameters. It is assured that the callback will be called once for every 
signal action and each callback will be called only once at a time (i.e. 
callback functions themselves don't need to handle threading).

Signed-off-by: Sasha Levin 
---
 tools/kvm/Makefile |1 +
 tools/kvm/include/kvm/threadpool.h |   16 
 tools/kvm/kvm-run.c|5 +
 tools/kvm/threadpool.c |  171 
 4 files changed, 193 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/include/kvm/threadpool.h
 create mode 100644 tools/kvm/threadpool.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index 1b0c76e..fbce14d 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -36,6 +36,7 @@ OBJS+= kvm-cmd.o
 OBJS+= kvm-run.o
 OBJS+= qcow.o
 OBJS+= mptable.o
+OBJS+= threadpool.o
 
 DEPS   := $(patsubst %.o,%.d,$(OBJS))
 
diff --git a/tools/kvm/include/kvm/threadpool.h 
b/tools/kvm/include/kvm/threadpool.h
new file mode 100644
index 000..25b5eb8
--- /dev/null
+++ b/tools/kvm/include/kvm/threadpool.h
@@ -0,0 +1,16 @@
+#ifndef KVM__THREADPOOL_H
+#define KVM__THREADPOOL_H
+
+#include 
+
+struct kvm;
+
+typedef void (*kvm_thread_callback_fn_t)(struct kvm *kvm, void *data);
+
+int thread_pool__init(unsigned long thread_count);
+
+void *thread_pool__add_jobtype(struct kvm *kvm, kvm_thread_callback_fn_t 
callback, void *data);
+
+void thread_pool__signal_work(void *job);
+
+#endif
diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
index 071157a..97a17dd 100644
--- a/tools/kvm/kvm-run.c
+++ b/tools/kvm/kvm-run.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* header files for gitish interface  */
 #include 
@@ -312,6 +313,7 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
int i;
struct virtio_net_parameters net_params;
char *hi;
+   unsigned int nr_online_cpus;
 
signal(SIGALRM, handle_sigalrm);
signal(SIGQUIT, handle_sigquit);
@@ -457,6 +459,9 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
 
kvm__init_ram(kvm);
 
+   nr_online_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+   thread_pool__init(nr_online_cpus);
+
for (i = 0; i < nrcpus; i++) {
if (pthread_create(&kvm_cpus[i]->thread, NULL, kvm_cpu_thread, 
kvm_cpus[i]) != 0)
die("unable to create KVM VCPU thread");
diff --git a/tools/kvm/threadpool.c b/tools/kvm/threadpool.c
new file mode 100644
index 000..e78db3a
--- /dev/null
+++ b/tools/kvm/threadpool.c
@@ -0,0 +1,171 @@
+#include "kvm/threadpool.h"
+#include "kvm/mutex.h"
+
+#include 
+#include 
+#include 
+#include 
+
+struct thread_pool__job_info {
+   kvm_thread_callback_fn_t callback;
+   struct kvm *kvm;
+   void *data;
+
+   int signalcount;
+   pthread_mutex_t mutex;
+
+   struct list_head queue;
+};
+
+static pthread_mutex_t job_mutex   = PTHREAD_MUTEX_INITIALIZER;
+static pthread_mutex_t thread_mutex= PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t  job_cond= PTHREAD_COND_INITIALIZER;
+
+static LIST_HEAD(head);
+
+static pthread_t   *threads;
+static longthreadcount;
+
+static struct thread_pool__job_info *thread_pool__job_info_pop(void)
+{
+   struct thread_pool__job_info *job;
+
+   if (list_empty(&head))
+   return NULL;
+
+   job = list_first_entry(&head, struct thread_pool__job_info, queue);
+   list_del(&job->queue);
+
+   return job;
+}
+
+static void thread_pool__job_info_push(struct thread_pool__job_info *job)
+{
+   list_add_tail(&job->queue, &head);
+}
+
+static struct thread_pool__job_info *thread_pool__job_info_pop_locked(void)
+{
+   struct thread_pool__job_info *job;
+
+   mutex_lock(&job_mutex);
+   job = thread_pool__job_info_pop();
+   mutex_unlock(&job_mutex);
+   return job;
+}
+
+static void thread_pool__job_info_push_locked(struct thread_pool__job_info 
*job)
+{
+   mutex_lock(&job_mutex);
+   thread_pool__job_info_push(job);
+   mutex_unlock(&job_mutex);
+}
+
+static void thread_pool__handle_job(struct thread_pool__job_info *job)
+{
+   while (job) {
+   job->cal

[PATCH 5/6] kvm tools: Use threadpool for virtio-console.

2011-04-28 Thread Sasha Levin

This is very similar to the change done in virtio-net.

Notice that one signal here comes from outside the module (actual terminal) 
while the other one is generated by the virtio module.

Signed-off-by: Sasha Levin 
---
 tools/kvm/virtio-console.c |   40 ++--
 1 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/tools/kvm/virtio-console.c b/tools/kvm/virtio-console.c
index f11ce4e..e66d198 100644
--- a/tools/kvm/virtio-console.c
+++ b/tools/kvm/virtio-console.c
@@ -8,6 +8,7 @@
 #include "kvm/mutex.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include "kvm/threadpool.h"
 
 #include 
 #include 
@@ -41,6 +42,8 @@ struct console_device {
uint16_tconfig_vector;
uint8_t status;
uint16_tqueue_selector;
+
+   void*jobs[VIRTIO_CONSOLE_NUM_QUEUES];
 };
 
 static struct console_device console_device = {
@@ -58,7 +61,7 @@ static struct console_device console_device = {
 /*
  * Interrupts are injected for hvc0 only.
  */
-void virtio_console__inject_interrupt(struct kvm *self)
+static void virtio_console__inject_interrupt_callback(struct kvm *self, void 
*param)
 {
struct iovec iov[VIRTIO_CONSOLE_QUEUE_SIZE];
struct virt_queue *vq;
@@ -68,7 +71,7 @@ void virtio_console__inject_interrupt(struct kvm *self)
 
mutex_lock(&console_device.mutex);
 
-   vq = &console_device.vqs[VIRTIO_CONSOLE_RX_QUEUE];
+   vq = param;
 
if (term_readable(CONSOLE_VIRTIO) && virt_queue__available(vq)) {
head = virt_queue__get_iov(vq, iov, &out, &in, self);
@@ -80,6 +83,11 @@ void virtio_console__inject_interrupt(struct kvm *self)
mutex_unlock(&console_device.mutex);
 }
 
+void virtio_console__inject_interrupt(struct kvm *self)
+{
+   thread_pool__signal_work(console_device.jobs[VIRTIO_CONSOLE_RX_QUEUE]);
+}
+
 static bool virtio_console_pci_io_device_specific_in(void *data, unsigned long 
offset, int size, uint32_t count)
 {
uint8_t *config_space = (uint8_t *) &console_device.console_config;
@@ -138,7 +146,7 @@ static bool virtio_console_pci_io_in(struct kvm *self, 
uint16_t port, void *data
return ret;
 }
 
-static void virtio_console_handle_callback(struct kvm *self, uint16_t 
queue_index)
+static void virtio_console_handle_callback(struct kvm *self, void *param)
 {
struct iovec iov[VIRTIO_CONSOLE_QUEUE_SIZE];
struct virt_queue *vq;
@@ -146,18 +154,15 @@ static void virtio_console_handle_callback(struct kvm 
*self, uint16_t queue_inde
uint16_t head;
uint32_t len;
 
-   vq = &console_device.vqs[queue_index];
-
-   if (queue_index == VIRTIO_CONSOLE_TX_QUEUE) {
+   vq = param;
 
-   while (virt_queue__available(vq)) {
-   head = virt_queue__get_iov(vq, iov, &out, &in, self);
-   len = term_putc_iov(CONSOLE_VIRTIO, iov, out);
-   virt_queue__set_used_elem(vq, head, len);
-   }
-
-   kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1);
+   while (virt_queue__available(vq)) {
+   head = virt_queue__get_iov(vq, iov, &out, &in, self);
+   len = term_putc_iov(CONSOLE_VIRTIO, iov, out);
+   virt_queue__set_used_elem(vq, head, len);
}
+
+   kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1);
 }
 
 static bool virtio_console_pci_io_out(struct kvm *self, uint16_t port, void 
*data, int size, uint32_t count)
@@ -183,6 +188,13 @@ static bool virtio_console_pci_io_out(struct kvm *self, 
uint16_t port, void *dat
 
vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, 4096);
 
+   if (console_device.queue_selector == VIRTIO_CONSOLE_TX_QUEUE)
+   console_device.jobs[console_device.queue_selector] =
+   thread_pool__add_jobtype(self, 
virtio_console_handle_callback, queue);
+   else if (console_device.queue_selector == 
VIRTIO_CONSOLE_RX_QUEUE)
+   console_device.jobs[console_device.queue_selector] =
+   thread_pool__add_jobtype(self, 
virtio_console__inject_interrupt_callback, queue);
+
break;
}
case VIRTIO_PCI_QUEUE_SEL:
@@ -191,7 +203,7 @@ static bool virtio_console_pci_io_out(struct kvm *self, 
uint16_t port, void *dat
case VIRTIO_PCI_QUEUE_NOTIFY: {
uint16_t queue_index;
queue_index = ioport__read16(data);
-   virtio_console_handle_callback(self, queue_index);
+   thread_pool__signal_work(console_device.jobs[queue_index]);
break;
}
case VIRTIO_PCI_STATUS:
-- 
1.7.5.rc3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/6] kvm tools: Use threadpool for virtio-blk

2011-04-28 Thread Sasha Levin

virtio-blk has been converted to use the threadpool.
All the threading code has been removed, which left only simple callback 
handling code.

New threadpool job types are created within VIRTIO_PCI_QUEUE_PFN for every 
queue (just one in the case of virtio-blk).
The module signals for work after receiving VIRTIO_PCI_QUEUE_NOTIFY and expects 
the threadpool to call virtio_blk_do_io to handle the I/O.
It is possible that the module will signal work several times while 
virtio_blk_do_io is already working, but there is no need to handle 
multithreading there since the threadpool will call each job in linear and not 
in parallel.

Signed-off-by: Sasha Levin 
---
 tools/kvm/virtio-blk.c |   86 +++-
 1 files changed, 12 insertions(+), 74 deletions(-)

diff --git a/tools/kvm/virtio-blk.c b/tools/kvm/virtio-blk.c
index 3516b1c..3feabd0 100644
--- a/tools/kvm/virtio-blk.c
+++ b/tools/kvm/virtio-blk.c
@@ -9,6 +9,7 @@
 #include "kvm/util.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include "kvm/threadpool.h"
 
 #include 
 #include 
@@ -31,15 +32,13 @@ struct blk_device {
uint32_tguest_features;
uint16_tconfig_vector;
uint8_t status;
-   pthread_t   io_thread;
-   pthread_mutex_t io_mutex;
-   pthread_cond_t  io_cond;
 
/* virtio queue */
uint16_tqueue_selector;
-   uint64_tvirtio_blk_queue_set_flags;
 
struct virt_queue   vqs[NUM_VIRT_QUEUES];
+
+   void*jobs[NUM_VIRT_QUEUES];
 };
 
 #define DISK_SEG_MAX   126
@@ -57,9 +56,6 @@ static struct blk_device blk_device = {
 * same applies to VIRTIO_BLK_F_BLK_SIZE
 */
.host_features  = (1UL << VIRTIO_BLK_F_SEG_MAX),
-
-   .io_mutex   = PTHREAD_MUTEX_INITIALIZER,
-   .io_cond= PTHREAD_COND_INITIALIZER
 };
 
 static bool virtio_blk_pci_io_device_specific_in(void *data, unsigned long 
offset, int size, uint32_t count)
@@ -156,73 +152,14 @@ static bool virtio_blk_do_io_request(struct kvm *self, 
struct virt_queue *queue)
return true;
 }
 
-
-
-static int virtio_blk_get_selected_queue(struct blk_device *dev)
-{
-   int i;
-
-   for (i = 0 ; i < NUM_VIRT_QUEUES ; i++) {
-   if (dev->virtio_blk_queue_set_flags & (1 << i)) {
-   dev->virtio_blk_queue_set_flags &= ~(1 << i);
-   return i;
-   }
-   }
-
-   return -1;
-}
-
-static void virtio_blk_do_io(struct kvm *kvm, struct blk_device *dev)
+static void virtio_blk_do_io(struct kvm *kvm, void *param)
 {
-   for (;;) {
-   struct virt_queue *vq;
-   int queue_index;
-
-   mutex_lock(&dev->io_mutex);
-   queue_index = virtio_blk_get_selected_queue(dev);
-   mutex_unlock(&dev->io_mutex);
-
-   if (queue_index < 0)
-   break;
+   struct virt_queue *vq = param;
 
-   vq = &dev->vqs[queue_index];
+   while (virt_queue__available(vq))
+   virtio_blk_do_io_request(kvm, vq);
 
-   while (virt_queue__available(vq))
-   virtio_blk_do_io_request(kvm, vq);
-
-   kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1);
-   }
-}
-
-static void *virtio_blk_io_thread(void *ptr)
-{
-   struct kvm *self = ptr;
-
-   for (;;) {
-   int ret;
-
-   mutex_lock(&blk_device.io_mutex);
-   ret = pthread_cond_wait(&blk_device.io_cond, 
&blk_device.io_mutex);
-   mutex_unlock(&blk_device.io_mutex);
-
-   if (ret != 0)
-   break;
-
-   virtio_blk_do_io(self, &blk_device);
-   }
-
-   return NULL;
-}
-
-static void virtio_blk_handle_callback(struct blk_device *dev, uint16_t 
queue_index)
-{
-   mutex_lock(&dev->io_mutex);
-
-   dev->virtio_blk_queue_set_flags |= (1 << queue_index);
-
-   mutex_unlock(&dev->io_mutex);
-
-   pthread_cond_signal(&dev->io_cond);
+   kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1);
 }
 
 static bool virtio_blk_pci_io_out(struct kvm *self, uint16_t port, void *data, 
int size, uint32_t count)
@@ -250,6 +187,9 @@ static bool virtio_blk_pci_io_out(struct kvm *self, 
uint16_t port, void *data, i
 
vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, 4096);
 
+   blk_device.jobs[blk_device.queue_selector] =
+   thread_pool__add_jobtype(self, virtio_blk_do_io, queue);
+
break;
}
case VIRTIO_PCI_QUEUE_SEL:
@@ -258,7 +198,7 @@ static bool virtio_blk_pci_io_out(struct kvm *self, 
uint16_t port, void *data, i
case VIRTIO_PCI_QUEUE_NOTIFY: {
uint16_t queue_index;
queue_index = i

[PATCH 2/6] kvm tools: Add kernel headers required for using list

2011-04-28 Thread Sasha Levin

Adds kernel headers so that  (and others) could be included 
directly.

Signed-off-by: Sasha Levin 
---
 tools/kvm/include/linux/kernel.h   |   26 ++
 tools/kvm/include/linux/prefetch.h |6 ++
 tools/kvm/include/linux/types.h|   12 
 3 files changed, 44 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/include/linux/kernel.h
 create mode 100644 tools/kvm/include/linux/prefetch.h

diff --git a/tools/kvm/include/linux/kernel.h b/tools/kvm/include/linux/kernel.h
new file mode 100644
index 000..8d83037
--- /dev/null
+++ b/tools/kvm/include/linux/kernel.h
@@ -0,0 +1,26 @@
+#ifndef KVM__LINUX_KERNEL_H_
+#define KVM__LINUX_KERNEL_H_
+
+#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
+
+#define ALIGN(x,a) __ALIGN_MASK(x,(typeof(x))(a)-1)
+#define __ALIGN_MASK(x,mask)   (((x)+(mask))&~(mask))
+
+#ifndef offsetof
+#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+#endif
+
+#ifndef container_of
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ * @ptr:   the pointer to the member.
+ * @type:  the type of the container struct this is embedded in.
+ * @member:the name of the member within the struct.
+ *
+ */
+#define container_of(ptr, type, member) ({ \
+   const typeof(((type *)0)->member) * __mptr = (ptr); \
+   (type *)((char *)__mptr - offsetof(type, member)); })
+#endif
+
+#endif
diff --git a/tools/kvm/include/linux/prefetch.h 
b/tools/kvm/include/linux/prefetch.h
new file mode 100644
index 000..62f6788
--- /dev/null
+++ b/tools/kvm/include/linux/prefetch.h
@@ -0,0 +1,6 @@
+#ifndef KVM__LINUX_PREFETCH_H
+#define KVM__LINUX_PREFETCH_H
+
+static inline void prefetch(void *a __attribute__((unused))) { }
+
+#endif
diff --git a/tools/kvm/include/linux/types.h b/tools/kvm/include/linux/types.h
index efd8519..c7c444e 100644
--- a/tools/kvm/include/linux/types.h
+++ b/tools/kvm/include/linux/types.h
@@ -46,4 +46,16 @@ typedef __u32 __bitwise __be32;
 typedef __u64 __bitwise __le64;
 typedef __u64 __bitwise __be64;
 
+struct list_head {
+   struct list_head *next, *prev;
+};
+
+struct hlist_head {
+   struct hlist_node *first;
+};
+
+struct hlist_node {
+   struct hlist_node *next, **pprev;
+};
+
 #endif /* LINUX_TYPES_H */
-- 
1.7.5.rc3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/6] kvm tools: Prevent duplicate definitions of ALIGN

2011-04-28 Thread Sasha Levin

Signed-off-by: Sasha Levin 
---
 tools/kvm/include/kvm/bios.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/include/kvm/bios.h b/tools/kvm/include/kvm/bios.h
index dd70c44..914720b 100644
--- a/tools/kvm/include/kvm/bios.h
+++ b/tools/kvm/include/kvm/bios.h
@@ -51,8 +51,10 @@
 #define MB_BIOS_SS 0xfff7
 #define MB_BIOS_SP 0x40
 
+#ifndef ALIGN
 #define ALIGN(x, a)\
(((x) + ((a) - 1)) & ~((a) - 1))
+#endif
 
 /*
  * note we use 16 bytes alignment which makes segment based
-- 
1.7.5.rc3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions

2011-04-28 Thread Avi Kivity

Avoids a VMREAD.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3f6e9bf..139a5cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3170,7 +3170,6 @@ static int handle_exception(struct kvm_vcpu *vcpu)
}
 
error_code = 0;
-   rip = kvm_rip_read(vcpu);
if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
if (is_page_fault(intr_info)) {
@@ -3217,6 +3216,7 @@ static int handle_exception(struct kvm_vcpu *vcpu)
vmx->vcpu.arch.event_exit_inst_len =
vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
kvm_run->exit_reason = KVM_EXIT_DEBUG;
+   rip = kvm_rip_read(vcpu);
kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
kvm_run->debug.arch.exception = ex_no;
break;
-- 
1.7.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address

2011-04-28 Thread Anthony Liguori


On 04/27/2011 06:06 PM, Marcelo Tosatti wrote:

On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote:

On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote:

Author: Max Asbock

Add command x-gpa2hva to translate guest physical address to host
virtual address. Because gpa to hva translation is not consistent, so
this command is only used for debugging.

The x-gpa2hva command provides one step in a chain of translations from
guest virtual to guest physical to host virtual to host physical. Host
physical is then used to inject a machine check error. As a
consequence the HWPOISON code on the host and the MCE injection code
in qemu-kvm are exercised.

v3:

- Rename to x-gpa2hva
- Remove QMP support, because gpa2hva is not consistent


Is this patch an acceptable solution for now? This command is useful for
our testing.


Anthony?


Yeah, but it should come through qemu-devel, no?

Regards,

Anthony Liguori





Best Regards,
Huang Ying


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Gleb Natapov

On Thu, Apr 28, 2011 at 07:46:28PM +0800, Emmanuel Noobadmin wrote:
> On 4/28/11, Gleb Natapov  wrote:
> > So why don't you use virt-manager?
> 
> The original intention was to run the host without any graphical
> desktop or anything not necessary to host the guests. That was based
> on reading and such which recommends not having anything beyond the
> necessary to minimize potential security problems and maximize
> resources available.
> 
Qemu is not intended to be used directly by end user. It is too complex as
you already found out. VMware don't even give you access to such low parts
of virt stack. You should use libvirt or virt-manager instead. Especially
if you are concerned about security. I think libvirt can start guest on
headless server.

If this still fails for you you need to complain to libvirt developers
(not in a rant mode, but providing details of what exact version of
software you have problem with and what are you trying to do). And
libvirt developers will not be shy to complain to qemu developers if the
problem turned to be on their side.

> Then there were those pages which warn that virt-manager didn't work
> too well if bridged networking was required.
As far as I know libvirt has no problem using bridged networking and
virt-manager use libvirt so it should work if you use new enough virt
stack, but you should ask on libvirt mailing list instead.

> 
> Last but not least, when I finally gave up and installed the desktop,
> virt-manager couldn't find the hypervisor. Checking up, it appears
> that the user needed additional permissions to certain files, which
> after given and tested via CLI, I still get errors.
What distribution are you using? I didn't saw recent Fedoras having such
problems (admittedly I do not use libvirt/virt-manager much).

> 
> Starting up X as root gave me this ominous warning that I really
> shouldn't be doing this and since I didn't think it was wise in the
> first place to have the desktop with root access running on what's
> supposed to be a production machine, I stopped trying that route and
> went back to figuring how to get virt-install to work.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address

2011-04-28 Thread Marcelo Tosatti

On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote:
> On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote:
> > Author: Max Asbock 
> > 
> > Add command x-gpa2hva to translate guest physical address to host
> > virtual address. Because gpa to hva translation is not consistent, so
> > this command is only used for debugging.
> > 
> > The x-gpa2hva command provides one step in a chain of translations from
> > guest virtual to guest physical to host virtual to host physical. Host
> > physical is then used to inject a machine check error. As a
> > consequence the HWPOISON code on the host and the MCE injection code
> > in qemu-kvm are exercised.
> > 
> > v3:
> > 
> > - Rename to x-gpa2hva
> > - Remove QMP support, because gpa2hva is not consistent
> 
> Is this patch an acceptable solution for now? This command is useful for
> our testing.

Anthony?

> 
> Best Regards,
> Huang Ying
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Emmanuel Noobadmin

On 4/28/11, Gleb Natapov  wrote:
> So why don't you use virt-manager?

The original intention was to run the host without any graphical
desktop or anything not necessary to host the guests. That was based
on reading and such which recommends not having anything beyond the
necessary to minimize potential security problems and maximize
resources available.

Then there were those pages which warn that virt-manager didn't work
too well if bridged networking was required.

Last but not least, when I finally gave up and installed the desktop,
virt-manager couldn't find the hypervisor. Checking up, it appears
that the user needed additional permissions to certain files, which
after given and tested via CLI, I still get errors.

Starting up X as root gave me this ominous warning that I really
shouldn't be doing this and since I didn't think it was wise in the
first place to have the desktop with root access running on what's
supposed to be a production machine, I stopped trying that route and
went back to figuring how to get virt-install to work.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2998292 ] XP Guest Crash on Ubuntu 10.04

2011-04-28 Thread SourceForge.net

Bugs item #2998292, was opened at 2010-05-07 20:28
Message generated for change (Settings changed) made by jessorensen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 7
Private: No
Submitted By: cheesewright ()
Assigned to: Nobody/Anonymous (nobody)
Summary: XP Guest Crash on Ubuntu 10.04

Initial Comment:
OS: Ubuntu Server 64-bit 10.04, kernel 2.6.32
Server: Supermicro X8DTN+ 2x Xeon E5530 2.4GHz Nehalem quad core cpu, 12GB RAM
Guest: Windows XP SP3 32-bit

Below is my kvm command line:

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 2 -name data2 -uuid 
9a85ee7a-4db9-9108-df3c-61d04d3c943d -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/data2.monitor,server,nowait 
-monitor chardev:monitor -localtime -boot c -drive 
file=/san/data2/winxp.img,if=ide,index=0,boot=on -drive 
file=/dev/cdrom,if=ide,media=cdrom,index=2 -net 
nic,macaddr=52:54:00:38:a8:af,vlan=0,name=nic.0 -net 
tap,fd=62,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-parallel none -usb -usbdevice tablet -vnc 0.0.0.0:2,password -k en-us -vga 
cirrus

I have two XP SP3 guests which crash frequently on random events when 
connecting via remote desktop. The Windows Event Log reports Event ID 1003. 
These errors result in either a system freeze or full reboot.

There are several types of errors recorded:

Event Source:   System Error
Event Category: (102)
Event ID:   1003
Error code 0005, parameter1 04c0, parameter2 8653a6c8, parameter3 
, parameter4 .

Event Source:   System Error
Event Category: (102)
Event ID:   1003
Error code 000a, parameter1 3008, parameter2 0002, parameter3 
, parameter4 804ff806.

Event Source:   System Error
Event Category: (102)
Event ID:   1003
Error code 100a, parameter1 3008, parameter2 0002, parameter3 
, parameter4 804ff806.


--

>Comment By: Jes Sorensen (jessorensen)
Date: 2011-04-28 11:52

Message:
This bug tracker isn't taking any new bugs, please submit your bug in
https://bugs.launchpad.net/qemu/

In addition this is against a vendor kernel (Ubuntu), so please start by
filing the bug with them.

Thanks,
Jes


--

Comment By: J Snabb (epipe)
Date: 2011-04-28 11:48

Message:
I am seeing this problem also. Using Debian packaged qemu-kvm 0.12.5+dfsg-5
with Debian Linux kernel 2.6.32-5-amd64 on similar hardware.

kvm command line is:

/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp
2,sockets=2,cores=1,threads=1 -name  -uuid
---- -nodefaults -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive
file=/dev//XXX,if=none,id=drive-ide0-0-0,boot=on,format=raw -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device
rtl8139,vlan=0,id=net0,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x3 -net
tap,fd=70,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device
isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:6 -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6


--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-11-26 12:01

Message:
Hi,

Are you still able to reproduce this with uptodate kvm/qemu? If you are,
would you mind opening a bug in
https://bugs.launchpad.net/qemu and close this one ?

Does this only happen when you access the systems via remote desktop? (I
presume this is some windows thing?)

Thanks,
Jes


--

Comment By: cheesewright ()
Date: 2010-05-07 20:32

Message:
The crash also happens when using kvm -M pc-0.12. It is easily reproducible
after clicking through windows such control panel apps, or web browsers,
for a few minutes.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Gleb Natapov

On Wed, Apr 27, 2011 at 04:57:18AM +0800, Emmanuel Noobadmin wrote:
> Unfortunately, things still don't work.
> 
> 
> It's just ridiculous that the installer under KVM does not detect the
> cdrom drive it was booted from. Trying to do a net-install doesn't
> work, maybe I messed up the networking even though br0 and eth0 is
> working on the host.
> 
> Nevermind, let's install apache and use the mounted ISO. Verified
> apache is working and the CentOS folder is accessible via web browser.
> But, still the guest installer cannot seem to access the installation
> files.
> 
> OK, so maybe I messed up the networking, after all I am the noob...
> maybe specifying --network=bridge:br0 isn't enough. There was
So why don't you use virt-manager?

> something about a tap or tunnel when initially reading up on bridged
> networking. Looking up more on this, there are several resources
> (sorry KVM FAQ leads to a page that no longer exist) which like many
> other instructions, give the commands without really explaining
> what/why.
> 
> So I have to run some tunctl command and scripts to add a bridge and
> tunnel/tap... but wait, I already made my bridge, will running the
> script kill my networking by creating a second bridge? Especially the
> warning about temporarily loosing connectivity due to ifcfg1 being
> reset.
> 
> And if I need to run this script everytime in order to activate the
> bridge and tunnel, doesn't that mean all my guest OS are screwed if
> the host reboots while I'm not around? Shouldn't these things go into
> permanent files like if-tun0 or something?
> 
> Every year, I get a little closer to not using VMWare but it seems
> like this year is going to be victory for VMWare again.
> 
> CC to kvm mailing list but I expect, like my previous request for help
> to the list, it will be rejected by mailman or a moderator.
> 
> 
> Just damn frustrated, even if it's probably just me being too stupid
> to know how to use KVM.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] rcu: export rcu_note_context_switch() function

2011-04-28 Thread Gleb Natapov

On Thu, Apr 28, 2011 at 01:01:04PM +0300, Avi Kivity wrote:
> On 04/28/2011 12:52 PM, Gleb Natapov wrote:
> >Signed-off-by: Gleb Natapov
> >@@ -93,11 +93,7 @@ static inline int rcu_needs_cpu(int cpu)
> >
> >  #endif /* #else #ifdef CONFIG_TINY_RCU */
> >
> >-static inline void rcu_note_context_switch(int cpu)
> >-{
> >-rcu_sched_qs(cpu);
> >-rcu_preempt_note_context_switch();
> >-}
> >+extern void rcu_note_context_switch(int cpu);
> >
> 
> Why are you uninlining this function?  Why not export the two
> functions it calls instead?
> 
To keep exported interface uniformal between both rcu implementations.
I do not think that rcu_note_context_switch() is inlined for
performance reason since two functions it calls are quite big in rcutiny
implementation. Paul what do you think?
 
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: SVM: Make dump_vmcb static, reduce text

2011-04-28 Thread Avi Kivity


On 04/25/2011 08:00 AM, Joe Perches wrote:

dump_vmcb isn't used outside this module, make it static.
Shrink text and object by ~1% by standardizing formats.

$ size arch/x86/kvm/svm.o*
text   data bss dec hex filename
   52910580   10072   63562f84a arch/x86/kvm/svm.o.new
   53563580   10072   64215fad7 arch/x86/kvm/svm.o.old



Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2998292 ] XP Guest Crash on Ubuntu 10.04

2011-04-28 Thread SourceForge.net

Bugs item #2998292, was opened at 2010-05-08 01:28
Message generated for change (Comment added) made by epipe
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: cheesewright ()
Assigned to: Nobody/Anonymous (nobody)
Summary: XP Guest Crash on Ubuntu 10.04

Initial Comment:
OS: Ubuntu Server 64-bit 10.04, kernel 2.6.32
Server: Supermicro X8DTN+ 2x Xeon E5530 2.4GHz Nehalem quad core cpu, 12GB RAM
Guest: Windows XP SP3 32-bit

Below is my kvm command line:

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 2 -name data2 -uuid 
9a85ee7a-4db9-9108-df3c-61d04d3c943d -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/data2.monitor,server,nowait 
-monitor chardev:monitor -localtime -boot c -drive 
file=/san/data2/winxp.img,if=ide,index=0,boot=on -drive 
file=/dev/cdrom,if=ide,media=cdrom,index=2 -net 
nic,macaddr=52:54:00:38:a8:af,vlan=0,name=nic.0 -net 
tap,fd=62,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-parallel none -usb -usbdevice tablet -vnc 0.0.0.0:2,password -k en-us -vga 
cirrus

I have two XP SP3 guests which crash frequently on random events when 
connecting via remote desktop. The Windows Event Log reports Event ID 1003. 
These errors result in either a system freeze or full reboot.

There are several types of errors recorded:

Event Source:   System Error
Event Category: (102)
Event ID:   1003
Error code 0005, parameter1 04c0, parameter2 8653a6c8, parameter3 
, parameter4 .

Event Source:   System Error
Event Category: (102)
Event ID:   1003
Error code 000a, parameter1 3008, parameter2 0002, parameter3 
, parameter4 804ff806.

Event Source:   System Error
Event Category: (102)
Event ID:   1003
Error code 100a, parameter1 3008, parameter2 0002, parameter3 
, parameter4 804ff806.


--

Comment By: J Snabb (epipe)
Date: 2011-04-28 16:48

Message:
I am seeing this problem also. Using Debian packaged qemu-kvm 0.12.5+dfsg-5
with Debian Linux kernel 2.6.32-5-amd64 on similar hardware.

kvm command line is:

/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp
2,sockets=2,cores=1,threads=1 -name  -uuid
---- -nodefaults -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive
file=/dev//XXX,if=none,id=drive-ide0-0-0,boot=on,format=raw -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device
rtl8139,vlan=0,id=net0,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x3 -net
tap,fd=70,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device
isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:6 -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6


--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-11-26 18:01

Message:
Hi,

Are you still able to reproduce this with uptodate kvm/qemu? If you are,
would you mind opening a bug in
https://bugs.launchpad.net/qemu and close this one ?

Does this only happen when you access the systems via remote desktop? (I
presume this is some windows thing?)

Thanks,
Jes


--

Comment By: cheesewright ()
Date: 2010-05-08 01:32

Message:
The crash also happens when using kvm -M pc-0.12. It is easily reproducible
after clicking through windows such control panel apps, or web browsers,
for a few minutes.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] KVM: MMU: Fix 64-bit paging breakage on x86_32

2011-04-28 Thread Avi Kivity


On 04/28/2011 01:08 AM, Takuya Yoshikawa wrote:

From: Takuya Yoshikawa

Fix regression introduced by
   commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51
   KVM: MMU: Optimize guest page table walk

On x86_32, get_user() does not support 64-bit values and we fail to
build KVM at the point of 64-bit paging.

This patch fixes this by using get_user() twice for that condition.



Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] rcu: export rcu_note_context_switch() function

2011-04-28 Thread Avi Kivity


On 04/28/2011 12:52 PM, Gleb Natapov wrote:

Signed-off-by: Gleb Natapov
@@ -93,11 +93,7 @@ static inline int rcu_needs_cpu(int cpu)

  #endif /* #else #ifdef CONFIG_TINY_RCU */

-static inline void rcu_note_context_switch(int cpu)
-{
-   rcu_sched_qs(cpu);
-   rcu_preempt_note_context_switch();
-}
+extern void rcu_note_context_switch(int cpu);



Why are you uninlining this function?  Why not export the two functions 
it calls instead?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: nmi is broken?

2011-04-28 Thread Avi Kivity

On 04/28/2011 04:28 AM, OGAWA Hirofumi wrote:

Avi Kivity  writes:

>  On 04/24/2011 03:24 PM, Jan Kiszka wrote:
>>  >
>>  >   This would cause IRQs to be delivered even if the PIT is masked, no?
>>
>>  I checked the patch and our code again: NMI watchdog masking is managed
>>  via arch.vapics_in_nmi_mode and by re-checking the per-APIC mask
>>  situation in kvm_apic_local_deliver when delivering the NMI.
>>
>>  So the patch looks correct - NMIs aren't acked like timer IRQs, the
>>  current logic is definitely wrong.
>
>  Can you elaborate?  Why aren't NMIs acked (if delivered via the PIC)?
>  Is the PIC programmed into auto-EOI mode or something?

This seems to be complex stuff depending on hardware configurations. I'm
not fully understanding though, current state of it is,

Yes, PIC is in AEOI mode if linux is using IO-APIC. Um.., kvm says
irq == 0 is mp_INT mode in MADT, not mp_ExtINT.

That is correct, kvm doesn't connect the master 8259 output to the 
IOAPIC.  Instead the 8259 is connected to LINT0 (which is configured for 
ExtInt when the IOAPIC is disabled, or for NMI which the NMI watchdog is 
enabled).

However, now I can't see how it would work. auto EOI works on the INTA 
cycle, which would only occur if LINT0 is configured for ExtInt.  If it 
is configured for NMI, I don't think it would issue the INTA cycle.  So 
the NMI watchdog not working is actually correct for our hardware 
configuration!

But I may be misunderstanding something here.

So I guess system does
automatically INTA cycle (and AEOI because of PIC config), or not
connected via 8259A? (like in mpspec figure 5-2.)

To checking it, I've tested in check_timer() of linux on the physical
machine (irq==0 and mp_INT). The test is something like,

if (pin1 != -1) {
/*
 * Ok, does IRQ0 through the IOAPIC work?
 */
unmask_IO_APIC_irq(0);
 disable_8259_irq(0);
if (timer_irq_works()) {

even if I called disable_8259_irq(0), timer was still working via
IO-APIC.

Would this explain why?

Sorry, I got lost - what does this explain?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] make kvm guest mode to be rcu quiescent state

2011-04-28 Thread Gleb Natapov

CPU may spend a lot of time in a guest mode while other CPUs wait for
rcu grace period to elapse. This patch series makes guest mode into
quiescent state to shorten wait time.

Gleb Natapov (2):
  rcu: export rcu_note_context_switch() function
  KVM: make guest mode entry to be rcu quiescent state

 include/linux/kvm_host.h |1 +
 include/linux/rcutiny.h  |6 +-
 kernel/rcutiny.c |7 +++
 kernel/rcutree.c |1 +
 4 files changed, 10 insertions(+), 5 deletions(-)

-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] rcu: export rcu_note_context_switch() function

2011-04-28 Thread Gleb Natapov


Signed-off-by: Gleb Natapov 
---
 include/linux/rcutiny.h |6 +-
 kernel/rcutiny.c|7 +++
 kernel/rcutree.c|1 +
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 30ebd7c..8e5f7cf 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -93,11 +93,7 @@ static inline int rcu_needs_cpu(int cpu)
 
 #endif /* #else #ifdef CONFIG_TINY_RCU */
 
-static inline void rcu_note_context_switch(int cpu)
-{
-   rcu_sched_qs(cpu);
-   rcu_preempt_note_context_switch();
-}
+extern void rcu_note_context_switch(int cpu);
 
 /*
  * Return the number of grace periods.
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 0c343b9..3d715a4 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -78,6 +78,13 @@ void rcu_exit_nohz(void)
 
 #endif /* #ifdef CONFIG_NO_HZ */
 
+void rcu_note_context_switch(int cpu)
+{
+   rcu_sched_qs(cpu);
+   rcu_preempt_note_context_switch();
+}
+EXPORT_SYMBOL_GPL(rcu_note_context_switch);
+
 /*
  * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc().
  * Also disable irqs to avoid confusion due to interrupt handlers
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index dd4aea8..0837d63 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -124,6 +124,7 @@ void rcu_note_context_switch(int cpu)
rcu_sched_qs(cpu);
rcu_preempt_note_context_switch(cpu);
 }
+EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 
 #ifdef CONFIG_NO_HZ
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] KVM: make guest mode entry to be rcu quiescent state

2011-04-28 Thread Gleb Natapov

KVM does not hold any references to rcu protected data when it switches
CPU into a guest mode. In fact switching to a guest mode is very similar
to exiting to userspase from rcu point of view. In addition CPU may stay
in a guest mode for quite a long time (up to one time slice). Lets treat
guest mode as quiescent state, just like we do with user-mode execution.

Signed-off-by: Gleb Natapov 
---
 include/linux/kvm_host.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0bc3d37..a347bce 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -593,6 +593,7 @@ static inline void kvm_guest_enter(void)
 {
account_system_vtime(current);
current->flags |= PF_VCPU;
+   rcu_note_context_switch(smp_processor_id());
 }
 
 static inline void kvm_guest_exit(void)
-- 
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] qemu-kvm: pci-assign: Some more cleanups

2011-04-28 Thread Avi Kivity


On 04/28/2011 11:59 AM, Jan Kiszka wrote:

I found these patches in my shared-INTx queue. They do not depend on
that topic (which is on my to do list, not on the top, but close), so I
decided to repost them.



Looks good, will apply after Alex reviews.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] KVM: VMX: Cache vmcs segment fields

2011-04-28 Thread Avi Kivity

Since the emulator now checks segment limits and access rights, it
generates a lot more accesses to the vmcs segment fields.  Undo some
of the performance hit by cacheing those fields in a read-only cache
(the entire cache is invalidated on any write, or on guest exit).

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/vmx.c  |  102 +++
 2 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index afb0e69..d2ac8e2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -136,6 +136,7 @@ enum kvm_reg_ex {
VCPU_EXREG_CR3,
VCPU_EXREG_RFLAGS,
VCPU_EXREG_CPL,
+   VCPU_EXREG_SEGMENTS,
 };
 
 enum {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3f6e9bf..d7463c3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -162,6 +162,10 @@ struct vcpu_vmx {
u32 ar;
} tr, es, ds, fs, gs;
} rmode;
+   struct {
+   u32 bitmask; /* 4 bits per segment (1 bit per field) */
+   struct kvm_save_segment seg[8];
+   } segment_cache;
int vpid;
bool emulation_required;
 
@@ -174,6 +178,15 @@ struct vcpu_vmx {
bool rdtscp_enabled;
 };
 
+enum segment_cache_field {
+   SEG_FIELD_SEL = 0,
+   SEG_FIELD_BASE = 1,
+   SEG_FIELD_LIMIT = 2,
+   SEG_FIELD_AR = 3,
+
+   SEG_FIELD_NR = 4
+};
+
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 {
return container_of(vcpu, struct vcpu_vmx, vcpu);
@@ -646,6 +659,62 @@ static void vmcs_set_bits(unsigned long field, u32 mask)
vmcs_writel(field, vmcs_readl(field) | mask);
 }
 
+static void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
+{
+   vmx->segment_cache.bitmask = 0;
+}
+
+static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
+  unsigned field)
+{
+   bool ret;
+   u32 mask = 1 << (seg * SEG_FIELD_NR + field);
+
+   if (!(vmx->vcpu.arch.regs_avail & (1 << VCPU_EXREG_SEGMENTS))) {
+   vmx->vcpu.arch.regs_avail |= (1 << VCPU_EXREG_SEGMENTS);
+   vmx->segment_cache.bitmask = 0;
+   }
+   ret = vmx->segment_cache.bitmask & mask;
+   vmx->segment_cache.bitmask |= mask;
+   return ret;
+}
+
+static u16 vmx_read_guest_seg_selector(struct vcpu_vmx *vmx, unsigned seg)
+{
+   u16 *p = &vmx->segment_cache.seg[seg].selector;
+
+   if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_SEL))
+   *p = vmcs_read16(kvm_vmx_segment_fields[seg].selector);
+   return *p;
+}
+
+static ulong vmx_read_guest_seg_base(struct vcpu_vmx *vmx, unsigned seg)
+{
+   ulong *p = &vmx->segment_cache.seg[seg].base;
+
+   if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_BASE))
+   *p = vmcs_readl(kvm_vmx_segment_fields[seg].base);
+   return *p;
+}
+
+static u32 vmx_read_guest_seg_limit(struct vcpu_vmx *vmx, unsigned seg)
+{
+   u32 *p = &vmx->segment_cache.seg[seg].limit;
+
+   if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_LIMIT))
+   *p = vmcs_read32(kvm_vmx_segment_fields[seg].limit);
+   return *p;
+}
+
+static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg)
+{
+   u32 *p = &vmx->segment_cache.seg[seg].ar;
+
+   if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_AR))
+   *p = vmcs_read32(kvm_vmx_segment_fields[seg].ar_bytes);
+   return *p;
+}
+
 static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
u32 eb;
@@ -1271,9 +1340,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 data)
break;
 #ifdef CONFIG_X86_64
case MSR_FS_BASE:
+   vmx_segment_cache_clear(vmx);
vmcs_writel(GUEST_FS_BASE, data);
break;
case MSR_GS_BASE:
+   vmx_segment_cache_clear(vmx);
vmcs_writel(GUEST_GS_BASE, data);
break;
case MSR_KERNEL_GS_BASE:
@@ -1717,6 +1788,8 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
vmx->emulation_required = 1;
vmx->rmode.vm86_active = 0;
 
+   vmx_segment_cache_clear(vmx);
+
vmcs_write16(GUEST_TR_SELECTOR, vmx->rmode.tr.selector);
vmcs_writel(GUEST_TR_BASE, vmx->rmode.tr.base);
vmcs_write32(GUEST_TR_LIMIT, vmx->rmode.tr.limit);
@@ -1740,6 +1813,8 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
fix_pmode_dataseg(VCPU_SREG_GS, &vmx->rmode.gs);
fix_pmode_dataseg(VCPU_SREG_FS, &vmx->rmode.fs);
 
+   vmx_segment_cache_clear(vmx);
+
vmcs_write16(GUEST_SS_SELECTOR, 0);
vmcs_write32(GUEST_SS_AR_BYTES, 0x93);
 
@@ -1803,6 +1878,8 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
}
 
+   vmx_segment_cach

[PATCH 1/2] KVM: x86 emulator: consolidate segment accessors

2011-04-28 Thread Avi Kivity

Instead of separate accessors for the segment selector and cached descriptor,
use one accessor for both.  This simplifies the code somewhat.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |   13 +---
 arch/x86/kvm/emulate.c |  122 
 arch/x86/kvm/x86.c |   41 +++-
 3 files changed, 83 insertions(+), 93 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 28114f5..0049211 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -164,15 +164,10 @@ struct x86_emulate_ops {
int size, unsigned short port, const void *val,
unsigned int count);
 
-   bool (*get_cached_descriptor)(struct x86_emulate_ctxt *ctxt,
- struct desc_struct *desc, u32 *base3,
- int seg);
-   void (*set_cached_descriptor)(struct x86_emulate_ctxt *ctxt,
- struct desc_struct *desc, u32 base3,
- int seg);
-   u16 (*get_segment_selector)(struct x86_emulate_ctxt *ctxt, int seg);
-   void (*set_segment_selector)(struct x86_emulate_ctxt *ctxt,
-u16 sel, int seg);
+   bool (*get_segment)(struct x86_emulate_ctxt *ctxt, u16 *selector,
+   struct desc_struct *desc, u32 *base3, int seg);
+   void (*set_segment)(struct x86_emulate_ctxt *ctxt, u16 selector,
+   struct desc_struct *desc, u32 base3, int seg);
unsigned long (*get_cached_segment_base)(struct x86_emulate_ctxt *ctxt,
 int seg);
void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1568c49..a8faf8d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -553,6 +553,26 @@ static int emulate_nm(struct x86_emulate_ctxt *ctxt)
return emulate_exception(ctxt, NM_VECTOR, 0, false);
 }
 
+static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
+{
+   u16 selector;
+   struct desc_struct desc;
+
+   ctxt->ops->get_segment(ctxt, &selector, &desc, NULL, seg);
+   return selector;
+}
+
+static void set_segment_selector(struct x86_emulate_ctxt *ctxt, u16 selector,
+unsigned seg)
+{
+   u16 dummy;
+   u32 base3;
+   struct desc_struct desc;
+
+   ctxt->ops->get_segment(ctxt, &dummy, &desc, &base3, seg);
+   ctxt->ops->set_segment(ctxt, selector, &desc, base3, seg);
+}
+
 static int __linearize(struct x86_emulate_ctxt *ctxt,
 struct segmented_address addr,
 unsigned size, bool write, bool fetch,
@@ -563,6 +583,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
bool usable;
ulong la;
u32 lim;
+   u16 sel;
unsigned cpl, rpl;
 
la = seg_base(ctxt, ctxt->ops, addr.seg) + addr.ea;
@@ -574,8 +595,8 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
return emulate_gp(ctxt, 0);
break;
default:
-   usable = ctxt->ops->get_cached_descriptor(ctxt, &desc, NULL,
- addr.seg);
+   usable = ctxt->ops->get_segment(ctxt, &sel, &desc, NULL,
+   addr.seg);
if (!usable)
goto bad;
/* code segment or read-only data segment */
@@ -598,7 +619,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
goto bad;
}
cpl = ctxt->ops->cpl(ctxt);
-   rpl = ctxt->ops->get_segment_selector(ctxt, addr.seg) & 3;
+   rpl = sel & 3;
cpl = max(cpl, rpl);
if (!(desc.type & 8)) {
/* data segment */
@@ -1142,9 +1163,10 @@ static void get_descriptor_table_ptr(struct 
x86_emulate_ctxt *ctxt,
 {
if (selector & 1 << 2) {
struct desc_struct desc;
+   u16 sel;
+
memset (dt, 0, sizeof *dt);
-   if (!ops->get_cached_descriptor(ctxt, &desc, NULL,
-   VCPU_SREG_LDTR));
+   if (!ops->get_segment(ctxt, &sel, &desc, NULL, VCPU_SREG_LDTR));
return;
 
dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? 
*/
@@ -1305,8 +1327,7 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
return ret;
}
 load:
-   ops->set_segment_selector(ctxt, selector, seg);
-   ops->set_cached_descriptor(ctxt, &seg_desc, 0, seg);
+   ops->set_segment(ctxt, selector,

[PATCH 0/2] Segment cleanups and optimizations

2011-04-28 Thread Avi Kivity

The first patch in this segment-themed patchset cleans up segment handling
in the emulator; while the second patch undoes some of the performance hit
taken by the emulator segment limit checks.

Avi Kivity (2):
  KVM: x86 emulator: consolidate segment accessors
  KVM: VMX: Cache vmcs segment fields

 arch/x86/include/asm/kvm_emulate.h |   13 +---
 arch/x86/include/asm/kvm_host.h|1 +
 arch/x86/kvm/emulate.c |  122 
 arch/x86/kvm/vmx.c |  102 +++---
 arch/x86/kvm/x86.c |   41 +++-
 5 files changed, 176 insertions(+), 103 deletions(-)

-- 
1.7.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/5] pci-assign: Convert need_emulate_cmd into a bitmask

2011-04-28 Thread Jan Kiszka

Define a mask of PCI command register bits that need to be emulated,
i.e. read back from their shadow state. We will need this for
selectively emulating the INTx mask bit.

Note: No initialization of emulate_cmd_mask to zero needed, the device
state is already zero-initialized.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |   11 +--
 hw/device-assignment.h |2 +-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 7ff1320..56b4832 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -513,9 +513,9 @@ again:
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
 
-if (pci_dev->need_emulate_cmd) {
+if (pci_dev->emulate_cmd_mask) {
 val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2),
- address, len, PCI_COMMAND, 0x);
+ address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask);
 }
 
 if (!pci_dev->cap.available) {
@@ -778,10 +778,9 @@ again:
 
 /* dealing with virtual function device */
 snprintf(name, sizeof(name), "%sphysfn/", dir);
-if (!stat(name, &statbuf))
-   pci_dev->need_emulate_cmd = 1;
-else
-   pci_dev->need_emulate_cmd = 0;
+if (!stat(name, &statbuf)) {
+pci_dev->emulate_cmd_mask = 0x;
+}
 
 dev->region_number = r;
 return 0;
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 86af0a9..ae1bc58 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -109,7 +109,7 @@ typedef struct AssignedDevice {
 void *msix_table_page;
 target_phys_addr_t msix_table_addr;
 int mmio_index;
-int need_emulate_cmd;
+uint32_t emulate_cmd_mask;
 char *configfd_name;
 int32_t bootindex;
 QLIST_ENTRY(AssignedDevice) next;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config

2011-04-28 Thread Jan Kiszka

Use rages_overlap and proper constants to match the access range against
regions that need special handling. This also fixes yet uncaught
high-byte write access to the command register. Moreover, use more
constants instead of magic numbers.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |   39 +--
 1 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 606d725..3481c93 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice *d, 
uint32_t address,
 return assigned_device_pci_cap_write_config(d, address, val, len);
 }
 
-if (address == 0x4) {
+if (ranges_overlap(address, len, PCI_COMMAND, 2)) {
 pci_default_write_config(d, address, val, len);
 /* Continue to program the card */
 }
 
-if ((address >= 0x10 && address <= 0x24) || address == 0x30 ||
-address == 0x34 || address == 0x3c || address == 0x3d) {
+/*
+ * Catch access to
+ *  - base address registers
+ *  - ROM base address & capability pointer
+ *  - interrupt line & pin
+ */
+if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
+ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
+ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
 /* used for update-mappings (BAR emulation) */
 pci_default_write_config(d, address, val, len);
 return;
@@ -450,9 +457,20 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, 
uint32_t address,
 return val;
 }
 
-if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
-   (address >= 0x10 && address <= 0x24) || address == 0x30 ||
-address == 0x34 || address == 0x3c || address == 0x3d) {
+/*
+ * Catch access to
+ *  - vendor & device ID
+ *  - command register (if emulation needed)
+ *  - base address registers
+ *  - ROM base address & capability pointer
+ *  - interrupt line & pin
+ */
+if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
+(pci_dev->need_emulate_cmd &&
+ ranges_overlap(address, len, PCI_COMMAND, 2)) ||
+ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
+ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
+ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
 val = pci_default_read_config(d, address, len);
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
@@ -483,10 +501,11 @@ do_log:
 
 if (!pci_dev->cap.available) {
 /* kill the special capabilities */
-if (address == 4 && len == 4)
-val &= ~0x10;
-else if (address == 6)
-val &= ~0x10;
+if (address == PCI_COMMAND && len == 4) {
+val &= ~(PCI_STATUS_CAP_LIST << 16);
+} else if (address == PCI_STATUS) {
+val &= ~PCI_STATUS_CAP_LIST;
+}
 }
 
 return val;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] pci-assign: Fix dword read at PCI_COMMAND

2011-04-28 Thread Jan Kiszka

If we emulate the command register, we must only read its content from
the shadow config space. For dword read of both PCI_COMMAND and
PCI_STATUS, at least the latter must be read from the device.

For simplicity reasons and as the code path is not considered
performance critical for the affected SRIOV devices, the fix performes
device access to the command word unconditionally, even if emulation is
enabled and only that word is read.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index f37f108..ee81434 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -482,14 +482,11 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
*d, uint32_t address,
 /*
  * Catch access to
  *  - vendor & device ID
- *  - command register (if emulation needed)
  *  - base address registers
  *  - ROM base address & capability pointer
  *  - interrupt line & pin
  */
 if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) ||
-(pci_dev->need_emulate_cmd &&
- ranges_overlap(address, len, PCI_COMMAND, 2)) ||
 ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) ||
 ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) ||
 ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) {
@@ -521,6 +518,11 @@ do_log:
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
 
+if (pci_dev->need_emulate_cmd) {
+val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2),
+ address, len, PCI_COMMAND, 0x);
+}
+
 if (!pci_dev->cap.available) {
 /* kill the special capabilities */
 if (address == PCI_COMMAND && len == 4) {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/5] pci-assign: Move merge_bits

2011-04-28 Thread Jan Kiszka

We will need it earlier in the file, so move it unmodified to the top.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |   44 ++--
 1 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 3481c93..f37f108 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -71,6 +71,28 @@ static void assigned_device_pci_cap_write_config(PCIDevice 
*pci_dev,
 static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
 uint32_t address, int len);
 
+/* Merge the bits set in mask from mval into val.  Both val and mval are
+ * at the same addr offset, pos is the starting offset of the mask. */
+static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
+   int len, uint8_t pos, uint32_t mask)
+{
+if (!ranges_overlap(addr, len, pos, 4)) {
+return val;
+}
+
+if (addr >= pos) {
+mask >>= (addr - pos) * 8;
+} else {
+mask <<= (pos - addr) * 8;
+}
+mask &= 0xU >> (4 - len) * 8;
+
+val &= ~mask;
+val |= (mval & mask);
+
+return val;
+}
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
uint32_t addr, int len, uint32_t *val)
 {
@@ -1278,28 +1300,6 @@ static uint8_t find_vndr_start(PCIDevice *pci_dev, 
uint32_t address)
 return cap;
 }
 
-/* Merge the bits set in mask from mval into val.  Both val and mval are
- * at the same addr offset, pos is the starting offset of the mask. */
-static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr,
-   int len, uint8_t pos, uint32_t mask)
-{
-if (!ranges_overlap(addr, len, pos, 4)) {
-return val;
-}
-
-if (addr >= pos) {
-mask >>= (addr - pos) * 8;
-} else {
-mask <<= (pos - addr) * 8;
-}
-mask &= 0xU >> (4 - len) * 8;
-
-val &= ~mask;
-val |= (mval & mask);
-
-return val;
-}
-
 static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
 uint32_t address, int len)
 {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/5] pci-assign: Remove suspicious hunk from assigned_dev_pci_read_config

2011-04-28 Thread Jan Kiszka

No one can remember where this came from, and it looks very hacky
anyway (we return 0 for config space address 0xfc of _every_ assigned
device, not only vga as the comment claims). So better remove it and
wait for the underlying issue to reappear.

Signed-off-by: Jan Kiszka 
---
 hw/device-assignment.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index ee81434..7ff1320 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -496,10 +496,6 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, 
uint32_t address,
 return val;
 }
 
-/* vga specific, remove later */
-if (address == 0xFC)
-goto do_log;
-
 fd = pci_dev->real_device.config_fd;
 
 again:
@@ -514,7 +510,6 @@ again:
exit(1);
 }
 
-do_log:
 DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
   (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/5] qemu-kvm: pci-assign: Some more cleanups

2011-04-28 Thread Jan Kiszka

I found these patches in my shared-INTx queue. They do not depend on
that topic (which is on my to do list, not on the top, but close), so I
decided to repost them.



Jan Kiszka (5):
  pci-assign: Clean up assigned_dev_pci_read/write_config
  pci-assign: Move merge_bits
  pci-assign: Fix dword read at PCI_COMMAND
  pci-assign: Remove suspicious hunk from assigned_dev_pci_read_config
  pci-assign: Convert need_emulate_cmd into a bitmask

 hw/device-assignment.c |   97 +++
 hw/device-assignment.h |2 +-
 2 files changed, 57 insertions(+), 42 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Emmanuel Noobadmin

On 4/28/11, Simon Grinberg  wrote:
> What version of VMWare are you using?

Currently, I'm not using VMWare yet on this new server as I really do
hope to be able to use an "unified" solution. But so far, it's just
one brickwall after another. I've given myself until this weekend to
get things working or just go the easy way.

Previously, I've used VMServer 2 as well as VMPlayer 3. All running
off CentOS 5.x host.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: faking vendor-id to guest for driver-install-triggering?

2011-04-28 Thread Avi Kivity


On 04/27/2011 04:15 PM, Oliver Rath wrote:

Hi there,

is it possible to announce the kvm-guest (i.e. winxp)  some arbitrary
vendor-id's in a way, that the win-client starts to install the driver
for this "card"?

I.e. the RTL8111/8168B PCI Express Gigabit Ethernet Card has the
vendor-id 10ec:8168 (taken from lspci -nn), so if i give this ID to the
kvm-guest, he should install the driver for this card.


It isn't possible without some hacking.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: performance of virtual functions compared to virtio

2011-04-28 Thread Avi Kivity


On 04/28/2011 12:13 AM, David Ahern wrote:

Is the following depict where copies are done for virtio-net?



Yes.  You should have been an artist.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [CentOS] Install CentOS as KVM guest

2011-04-28 Thread Simon Grinberg



- Original Message -
> From: "Emmanuel Noobadmin" 
> To: "CentOS mailing list" , "kvm" 
> Sent: Tuesday, April 26, 2011 11:57:18 PM
> Subject: Re: [CentOS] Install CentOS as KVM guest
> Unfortunately, things still don't work.
> 
> 
> It's just ridiculous that the installer under KVM does not detect the
> cdrom drive it was booted from. Trying to do a net-install doesn't
> work, maybe I messed up the networking even though br0 and eth0 is
> working on the host.
> 
> Nevermind, let's install apache and use the mounted ISO. Verified
> apache is working and the CentOS folder is accessible via web browser.
> But, still the guest installer cannot seem to access the installation
> files.
> 
> OK, so maybe I messed up the networking, after all I am the noob...
> maybe specifying --network=bridge:br0 isn't enough. There was
> something about a tap or tunnel when initially reading up on bridged
> networking. Looking up more on this, there are several resources
> (sorry KVM FAQ leads to a page that no longer exist) which like many
> other instructions, give the commands without really explaining
> what/why.
> 
> So I have to run some tunctl command and scripts to add a bridge and
> tunnel/tap... but wait, I already made my bridge, will running the
> script kill my networking by creating a second bridge? Especially the
> warning about temporarily loosing connectivity due to ifcfg1 being
> reset.
> 
> And if I need to run this script everytime in order to activate the
> bridge and tunnel, doesn't that mean all my guest OS are screwed if
> the host reboots while I'm not around? Shouldn't these things go into
> permanent files like if-tun0 or something?
> 
> Every year, I get a little closer to not using VMWare but it seems
> like this year is going to be victory for VMWare again.

What version of VMWare are you using?

> 
> CC to kvm mailing list but I expect, like my previous request for help
> to the list, it will be rejected by mailman or a moderator.
> 
> 
> Just damn frustrated, even if it's probably just me being too stupid
> to know how to use KVM.
> ___
> CentOS mailing list
> cen...@centos.org
> http://lists.centos.org/mailman/listinfo/centos
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Roedel, Joerg

On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote:
> And /me still wonders (like I did when this first popped up) if the
> proper place of determining TSC stability really have to be KVM.
> 
> If the Linux core fails to detect some instability and KVM has to jump
> in, shouldn't we better improve the core's detection abilities and make
> use of them in KVM? Conceptually this looks like we are currently just
> working around a core deficit in KVM.

Yes, good question. Has this ever triggered on a real machine (not
counting the suspend/resume issue in)?

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Roedel, Joerg

On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote:
> So I've been going over the new code changes to the TSC related code and 
> I don't like one particular set of changes.  In particular, here:
> 
>  kvm_x86_ops->vcpu_load(vcpu, cpu);
>  if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
>  /* Make sure TSC doesn't go backwards */
>  s64 tsc_delta;
>  u64 tsc;
> 
>  kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
>  tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>   tsc - vcpu->arch.last_guest_tsc;
> 
>  if (tsc_delta < 0)
>  mark_tsc_unstable("KVM discovered backwards TSC");
>  if (check_tsc_unstable()) {
>  kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>  vcpu->arch.tsc_catchup = 1;
>  }
> 
> 
> The point of this code fragment is to test the host clock to see if it 
> is stable, because we may have just come back from an idle phase which 
> stopped the TSC, switched CPUs, or come back from a deep sleep state 
> which reset the host TSC.

I see it different. This code wants to check if the _guest_ tsc moves
forwared (or at least not backwards). So it is fully legitimate to just
do this by reading the guest-tsc and compare it to the last one the
guest had.

> I saw a patch floating around that touched this code recently, but I 
> think there's a definite issue here that needs addressing.

In fact, this change was done to address one of your concerns. You
mentioned that the values passed to adjust_tsc_offset() were in
unconsistent units in my first version of tsc-scaling. This was a right
objection because one call-site used guest-tsc-units while the other
used host-tsc-units. This change intended to fix that by using
guest-tsc-units always for adjust_tsc_offset().

Not that the guest and the host tsc have the same units on current
machines. But with tsc-scaling these units are different.

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug in KVM clock backwards compensation

2011-04-28 Thread Jan Kiszka

On 2011-04-28 08:59, Zachary Amsden wrote:
> So I've been going over the new code changes to the TSC related code and
> I don't like one particular set of changes.  In particular, here:
> 
> kvm_x86_ops->vcpu_load(vcpu, cpu);
> if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
> /* Make sure TSC doesn't go backwards */
> s64 tsc_delta;
> u64 tsc;
> 
> kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
> tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
>  tsc - vcpu->arch.last_guest_tsc;
> 
> if (tsc_delta < 0)
> mark_tsc_unstable("KVM discovered backwards TSC");
> if (check_tsc_unstable()) {
> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
> vcpu->arch.tsc_catchup = 1;
> }
> 
> 
> The point of this code fragment is to test the host clock to see if it
> is stable, because we may have just come back from an idle phase which
> stopped the TSC, switched CPUs, or come back from a deep sleep state
> which reset the host TSC.
> 
> However, the above code is fetching the guest TSC instead of the host
> TSC, which isn't the way it is supposed to work.
> 
> I saw a patch floating around that touched this code recently, but I
> think there's a definite issue here that needs addressing.

And /me still wonders (like I did when this first popped up) if the
proper place of determining TSC stability really have to be KVM.

If the Linux core fails to detect some instability and KVM has to jump
in, shouldn't we better improve the core's detection abilities and make
use of them in KVM? Conceptually this looks like we are currently just
working around a core deficit in KVM.

Jan



signature.asc
Description: OpenPGP digital signature

Bug in KVM clock backwards compensation

2011-04-28 Thread Zachary Amsden

So I've been going over the new code changes to the TSC related code and 
I don't like one particular set of changes.  In particular, here:


kvm_x86_ops->vcpu_load(vcpu, cpu);
if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
/* Make sure TSC doesn't go backwards */
s64 tsc_delta;
u64 tsc;

kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
 tsc - vcpu->arch.last_guest_tsc;

if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
}


The point of this code fragment is to test the host clock to see if it 
is stable, because we may have just come back from an idle phase which 
stopped the TSC, switched CPUs, or come back from a deep sleep state 
which reset the host TSC.


However, the above code is fetching the guest TSC instead of the host 
TSC, which isn't the way it is supposed to work.


I saw a patch floating around that touched this code recently, but I 
think there's a definite issue here that needs addressing.


Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

97 matches

Mail list logo