[Qemu-devel] Tracing concerns for concurrent execution

2010-05-25 Thread Prerna Saxena

Hi,
I am writing a tracing back-end that logs tracepoint-based traces to 
qemu-internal global buffers.
I had zeroed in on a lockless buffer for logging traces(to prevent 
slowdowns 'cos of locks held while tracing). However, I'm not sure if 
qemu threads might need some synchronisation to access the global trace 
buffer.
I think the vcpu threads run lock-step, but this lock-step behaviour is 
not guaranteed for some other types of threads, such as the aio helper 
threads ? Also, can there be concurrency issues while tracing dynamic 
translator ?
While I'm doing my study, it would be good to have some pointers on what 
assumptions can / not be made for concurrent thread execution in qemu, 
and what could be done to safeguard tracing in such hot-paths ?


Regards,
--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread MORITA Kazutaka
At Tue, 25 May 2010 10:12:53 -0700 (PDT),
Sage Weil wrote:
> 
> On Tue, 25 May 2010, Avi Kivity wrote:
> > > What's the reason for not having these drivers upstream? Do we gain
> > > anything by hiding them from our users and requiring them to install the
> > > drivers separately from somewhere else?
> > >
> > 
> > Six months.
> 
> FWIW, we (Ceph) aren't complaining about the 6 month lag time (and I don't 
> think the Sheepdog guys are either).
> 
I agree.  We aren't complaining about it.

> From our perspective, the current BlockDriver abstraction is ideal, as it 
> represents the reality of qemu's interaction with storage.  Any 'external' 
> interface will be inferior to that in one way or another.  But either way, 
> we are perfectly willing to work with you to all to keep in sync with any 
> future BlockDriver API improvements.  It is worth our time investment even 
> if the API is less stable.
> 
I agree.

> The ability to dynamically load a shared object using the existing api 
> would make development a bit easier, but I'm not convinced it's better for 
> for users.  I think having ceph and sheepdog upstream with qemu will serve 
> end users best, and we at least are willing to spend the time to help 
> maintain that code in qemu.git.
> 
I agree.

Regards,

Kazutaka



[Qemu-devel] migrating guest with msi-x interrupts

2010-05-25 Thread Cam Macdonell
Hi,

I'm trying to migrate a guest device with MSI-X interrupts.  However,
the interrupts are not injected into the guest.  I've added some
tracing to msix.c and it seems that the MSI-X vectors are masked when
the guest is resumed (I'm testing with static migration).

In particular, in msix.c, msix_is_masked(...) is returning true when
the guest resumes which causes msix_set_pending() to be called instead
of msix_set_irq().

/* Send an MSI-X message */
void msix_notify(PCIDevice *dev, unsigned vector)
{
uint8_t *table_entry = dev->msix_table_page + vector * MSIX_ENTRY_SIZE;
uint64_t address;
uint32_t data;

if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
return;

if (msix_is_masked(dev, vector)) {
msix_set_pending(dev, vector);
return;
}

...

Does migrating a guest device that uses MSI-X require
msix_load()/save() to be called explicity in a pre/post_save/load
function?

Any pointers or comments would be helpful,
Cam



Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive

2010-05-25 Thread Anthony Liguori

On 05/25/2010 04:01 PM, Aurelien Jarno wrote:


I really think this patch can be useful, in my own case when testing
debian-installer (I already cache=writeback). In short all that is about
developing and testing, as opposed to run a VM in production, can
benefit about that. This was one of the original use case of QEMU before
KVM arrived.

Unless someone can convince me not to do it, I seriously considering
applying this patch.
   


There really needs to be an indication in the --help output of what the 
ramifications of this option are, in the very least.  It should also be 
removable via a ./configure option because no sane distribution should 
enable this for end users.


Regards,

Anthony Liguori





[Qemu-devel] Re: [PATCH] add support for protocol driver create_options

2010-05-25 Thread MORITA Kazutaka
At Tue, 25 May 2010 15:43:17 +0200,
Kevin Wolf wrote:
> 
> Am 24.05.2010 08:34, schrieb MORITA Kazutaka:
> > At Fri, 21 May 2010 18:57:36 +0200,
> > Kevin Wolf wrote:
> >>
> >> Am 20.05.2010 07:36, schrieb MORITA Kazutaka:
> >>> +
> >>> +/*
> >>> + * Append an option list (list) to an option list (dest).
> >>> + *
> >>> + * If dest is NULL, a new copy of list is created.
> >>> + *
> >>> + * Returns a pointer to the first element of dest (or the newly 
> >>> allocated copy)
> >>> + */
> >>> +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest,
> >>> +QEMUOptionParameter *list)
> >>> +{
> >>> +size_t num_options, num_dest_options;
> >>> +
> >>> +num_options = count_option_parameters(dest);
> >>> +num_dest_options = num_options;
> >>> +
> >>> +num_options += count_option_parameters(list);
> >>> +
> >>> +dest = qemu_realloc(dest, (num_options + 1) * 
> >>> sizeof(QEMUOptionParameter));
> >>> +
> >>> +while (list && list->name) {
> >>> +if (get_option_parameter(dest, list->name) == NULL) {
> >>> +dest[num_dest_options++] = *list;
> >>
> >> You need to add a dest[num_dest_options].name = NULL; here. Otherwise
> >> the next loop iteration works on uninitialized memory and possibly an
> >> unterminated list. I got a segfault for that reason.
> >>
> > 
> > I forgot to add it, sorry.
> > Fixed version is below.
> > 
> > Thanks,
> > 
> > Kazutaka
> > 
> > ==
> > This patch enables protocol drivers to use their create options which
> > are not supported by the format.  For example, protcol drivers can use
> > a backing_file option with raw format.
> > 
> > Signed-off-by: MORITA Kazutaka 
> 
> $ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G
> Unknown option 'cluster_size'
> qemu-img: Invalid options for file format 'qcow2'.
> 
> I think you added another num_dest_options++ which shouldn't be there.
> 

Sorry again.  I wrongly added `dest[num_dest_options++].name = NULL;'
instead of `dest[num_dest_options].name = NULL;'.

Thanks,

Kazutaka

==
This patch enables protocol drivers to use their create options which
are not supported by the format.  For example, protcol drivers can use
a backing_file option with raw format.

Signed-off-by: MORITA Kazutaka 
---
 block.c   |7 +++
 block.h   |1 +
 qemu-img.c|   49 ++---
 qemu-option.c |   53 ++---
 qemu-option.h |2 ++
 5 files changed, 86 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 6e7766a..f881f10 100644
--- a/block.c
+++ b/block.c
@@ -56,7 +56,6 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
sector_num,
 uint8_t *buf, int nb_sectors);
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors);
-static BlockDriver *find_protocol(const char *filename);
 
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
@@ -210,7 +209,7 @@ int bdrv_create_file(const char* filename, 
QEMUOptionParameter *options)
 {
 BlockDriver *drv;
 
-drv = find_protocol(filename);
+drv = bdrv_find_protocol(filename);
 if (drv == NULL) {
 drv = bdrv_find_format("file");
 }
@@ -283,7 +282,7 @@ static BlockDriver *find_hdev_driver(const char *filename)
 return drv;
 }
 
-static BlockDriver *find_protocol(const char *filename)
+BlockDriver *bdrv_find_protocol(const char *filename)
 {
 BlockDriver *drv1;
 char protocol[128];
@@ -478,7 +477,7 @@ int bdrv_file_open(BlockDriverState **pbs, const char 
*filename, int flags)
 BlockDriver *drv;
 int ret;
 
-drv = find_protocol(filename);
+drv = bdrv_find_protocol(filename);
 if (!drv) {
 return -ENOENT;
 }
diff --git a/block.h b/block.h
index 24efeb6..9034ebb 100644
--- a/block.h
+++ b/block.h
@@ -54,6 +54,7 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
+BlockDriver *bdrv_find_protocol(const char *filename);
 BlockDriver *bdrv_find_format(const char *format_name);
 BlockDriver *bdrv_find_whitelisted_format(const char *format_name);
 int bdrv_create(BlockDriver *drv, const char* filename,
diff --git a/qemu-img.c b/qemu-img.c
index cb007b7..ea091f0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -252,8 +252,8 @@ static int img_create(int argc, char **argv)
 const char *base_fmt = NULL;
 const char *filename;
 const char *base_filename = NULL;
-BlockDriver *drv;
-QEMUOptionParameter *param = NULL;
+BlockDriver *drv, *proto_drv;
+QEMUOptionParameter *param = NULL, *create_options = NULL;
 char *options = NULL;
 
 flags = 0;
@@ -286,33 +286,42 @@ static int img_create(int argc, char **argv)
 }
 }
 
+/* Get the filename */
+if (optind >= argc)
+help();
+filename = argv[optind++];
+
 /

Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-25 Thread Jan Kiszka
Anthony Liguori wrote:
> On 05/25/2010 02:09 PM, Blue Swirl wrote:
>> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka  wrote:
>>   
>>> From: Jan Kiszka
>>>
>>> This allows to communicate potential IRQ coalescing during delivery from
>>> the sink back to the source. Targets that support IRQ coalescing
>>> workarounds need to register handlers that return the appropriate
>>> QEMU_IRQ_* code, and they have to propergate the code across all IRQ
>>> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can
>>> apply its workaround. If multiple sinks exist, the source may only
>>> consider an IRQ coalesced if all other sinks either report
>>> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED.
>>>  
>> No real devices are interested whether any of their output lines are
>> even connected. This would introduce a new signal type, bidirectional
>> multi-level, which is not correct.
>>
> 
> I don't think it's really an issue of correct, but I wouldn't disagree
> to a suggestion that we ought to introduce a new signal type for this
> type of bidirectional feedback.  Maybe it's qemu_coalesced_irq and has a
> similar interface as qemu_irq.

A separate type would complicate the delivery of the feedback value
across GPIO pins (as Paul requested for the RTC->HPET routing).

> 
>> I think the real solution to coalescing is put the logic inside one
>> device, in this case APIC because it has the information about irq
>> delivery. APIC could monitor incoming RTC irqs for frequency
>> information and whether they get delivered or not. If not, an internal
>> timer is installed which injects the lost irqs.

That won't fly as the IRQs will already arrive at the APIC with a
sufficiently high jitter. At the bare minimum, you need to tell the
interrupt controller about the fact that a particular IRQ should be
delivered at a specific regular rate. For this, you also need a generic
interface - nothing really "won".

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive

2010-05-25 Thread Aurelien Jarno
On Tue, May 25, 2010 at 07:59:18PM +0200, Alexander Graf wrote:
> Anthony Liguori wrote:
> > On 05/17/2010 11:23 AM, Paul Brook wrote:
>  I don't see a difference between the results. Apparently the barrier
>  option doesn't change a thing.
> 
> >>> Ok.  I don't like it, but I can see how it's compelling.  I'd like to
> >>> see the documentation improved though.  I also think a warning printed
> >>> on stdio about the safety of the option would be appropriate.
> >>>  
> >> I disagree with this last bit.
> >>
> >> Errors should be issued if the user did something wrong.
> >> Warnings should be issued if qemu did (or will soon do) something
> >> other than
> >> what the user requested, or otherwise made questionable decisions on the
> >> user's behalf.
> >>
> >> In this case we're doing exactly what the user requested. The only
> >> plausible
> >> failure case is where a user is blindly trying options that they
> >> clearly don't
> >> understand or read the documentation for. I have zero sympathy for
> >> complaints
> >> like "Someone on the Internet told me to use --breakme, and broke
> >> thinks".
> >>
> >
> > I see it as the equivalent to the Taint bit in Linux.  I want to make
> > it clear to users up front that if you use this option, and you have
> > data loss issues, don't complain.
> >
> > Just putting something in qemu-doc.texi is not enough IMHO.  Few
> > people actually read it.
> 
> So what exactly is the conclusion here? I really want to see this
> getting merged
> 

I really think this patch can be useful, in my own case when testing
debian-installer (I already cache=writeback). In short all that is about
developing and testing, as opposed to run a VM in production, can
benefit about that. This was one of the original use case of QEMU before
KVM arrived.

Unless someone can convince me not to do it, I seriously considering
applying this patch.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-25 Thread Anthony Liguori

On 05/25/2010 02:09 PM, Blue Swirl wrote:

On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka  wrote:
   

From: Jan Kiszka

This allows to communicate potential IRQ coalescing during delivery from
the sink back to the source. Targets that support IRQ coalescing
workarounds need to register handlers that return the appropriate
QEMU_IRQ_* code, and they have to propergate the code across all IRQ
redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can
apply its workaround. If multiple sinks exist, the source may only
consider an IRQ coalesced if all other sinks either report
QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED.
 

No real devices are interested whether any of their output lines are
even connected. This would introduce a new signal type, bidirectional
multi-level, which is not correct.
   


I don't think it's really an issue of correct, but I wouldn't disagree 
to a suggestion that we ought to introduce a new signal type for this 
type of bidirectional feedback.  Maybe it's qemu_coalesced_irq and has a 
similar interface as qemu_irq.



I think the real solution to coalescing is put the logic inside one
device, in this case APIC because it has the information about irq
delivery. APIC could monitor incoming RTC irqs for frequency
information and whether they get delivered or not. If not, an internal
timer is installed which injects the lost irqs.

Of course, no real device could do such de-coalescing, but with this
approach, the voodoo is contained to insides of one device, APIC.

We should also take a step back to think what was the cause of lost
irqs, IIRC uneven execution rate in QEMU.


Not only that.  The pathological case is where a host is limited to a 
1khz timer frequency and the guest requests a 1khz timer frequency.  
Practically speaking, there is no way we'll ever be able to adjust 
timers to reinject lost interrupts because of the host timer limitation.



  Could this be fixed or taken
into account in timer handling? For example, CPU loop could analyze
the wall clock time between CPU exits and use that to offset the
timers. Thus the timer frequency (in wall clock time) could be made to
correspond a bit more to VCPU execution rate.
   


A lot of what motivates the timer reinjection work is very old linux 
kernels that had fixed userspace timer frequencies.  On newer host 
kernels, it's probably not nearly as important except when you get into 
pathological cases like exposing a high frequency HPET timer to the 
guest where you cannot keep up with the host.


Regards,

Anthony Liguori


Signed-off-by: Jan Kiszka
---
  hw/irq.c |   38 +-
  hw/irq.h |   22 +++---
  2 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/hw/irq.c b/hw/irq.c
index 7703f62..db2cce6 100644
--- a/hw/irq.c
+++ b/hw/irq.c
@@ -26,19 +26,27 @@

  struct IRQState {
 qemu_irq_handler handler;
+qemu_irq_fb_handler feedback_handler;
 void *opaque;
 int n;
  };

-void qemu_set_irq(qemu_irq irq, int level)
+int qemu_set_irq(qemu_irq irq, int level)
  {
-if (!irq)
-return;
-
-irq->handler(irq->opaque, irq->n, level);
+if (!irq) {
+return 0;
+}
+if (irq->feedback_handler) {
+return irq->feedback_handler(irq->opaque, irq->n, level);
+} else {
+irq->handler(irq->opaque, irq->n, level);
+return QEMU_IRQ_DELIVERED;
+}
  }

-qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n)
+static qemu_irq *allocate_irqs(qemu_irq_handler handler,
+   qemu_irq_fb_handler feedback_handler,
+   void *opaque, int n)
  {
 qemu_irq *s;
 struct IRQState *p;
@@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void 
*opaque, int n)
 p = (struct IRQState *)qemu_mallocz(sizeof(struct IRQState) * n);
 for (i = 0; i<  n; i++) {
 p->handler = handler;
+p->feedback_handler = feedback_handler;
 p->opaque = opaque;
 p->n = i;
 s[i] = p;
@@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void 
*opaque, int n)
 return s;
  }

+qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n)
+{
+return allocate_irqs(handler, NULL, opaque, n);
+}
+
+qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler,
+  void *opaque, int n)
+{
+return allocate_irqs(NULL, handler, opaque, n);
+}
+
  void qemu_free_irqs(qemu_irq *s)
  {
 qemu_free(s[0]);
 qemu_free(s);
  }

-static void qemu_notirq(void *opaque, int line, int level)
+static int qemu_notirq(void *opaque, int line, int level)
  {
 struct IRQState *irq = opaque;

-irq->handler(irq->opaque, irq->n, !level);
+return qemu_set_irq(irq, !level);
  }

  qemu_irq qemu_irq_invert(qemu_irq irq)
  {
 /* The default state for IRQs is low, so raise the output now.  */
 qemu_irq_raise(irq);
-

Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Blue Swirl
On Mon, May 24, 2010 at 2:17 AM, Yehuda Sadeh Weinraub
 wrote:
> On Sun, May 23, 2010 at 12:59 AM, Blue Swirl  wrote:
>> On Thu, May 20, 2010 at 11:02 PM, Yehuda Sadeh Weinraub
>>  wrote:
>>> On Thu, May 20, 2010 at 1:31 PM, Blue Swirl  wrote:
 On Wed, May 19, 2010 at 7:22 PM, Christian Brunner  wrote:
> The attached patch is a block driver for the distributed file system
> Ceph (http://ceph.newdream.net/). This driver uses librados (which
> is part of the Ceph server) for direct access to the Ceph object
> store and is running entirely in userspace. Therefore it is
> called "rbd" - rados block device.
>>> ...

 IIRC underscores here may conflict with system header use. Please use
 something like QEMU_BLOCK_RADOS_H.
>>>
>>> This header is shared between the linux kernel client and the ceph
>>> userspace servers and client. We can actually get rid of it, as we
>>> only need it to define CEPH_OSD_TMAP_SET. We can move this definition
>>> to librados.h.
>>>
> diff --git a/block/rbd_types.h b/block/rbd_types.h
> new file mode 100644
> index 000..dfd5aa0
> --- /dev/null
> +++ b/block/rbd_types.h
> @@ -0,0 +1,48 @@
> +#ifndef _FS_CEPH_RBD
> +#define _FS_CEPH_RBD

 QEMU_BLOCK_RBD?
>>>
>>> This header is shared between the ceph kernel client, between the qemu
>>> rbd module (and between other ceph utilities). It'd be much easier
>>> maintaining it without having to have a different implementation for
>>> each. The same goes to the use of __le32/64 and __u32/64 within these
>>> headers.
>>
>> This is user space, so identifiers must conform to C standards. The
>> identifiers beginning with underscores are reserved.
>>
>> Doesn't __le32/64 also depend on some GCC extension? Or sparse magic?
> It depends on gcc extension. If needed we can probably have a separate
> header for the qemu block device that uses alternative types. Though
> looking at the qemu code I see use of other gcc extensions so I'm not
> sure this is a real issue.

We use some (contained with for example macros if possible), but in
earlier discussions, __le32 etc. were considered problematic. IIRC
it's hard to provide alternate versions for other compilers (or older
versions of gcc).

>
>>
>>>

> +
> +#include 

 Can you use standard includes, like  or ? Are
 Ceph libraries used in other systems than Linux?
>>>
>>> Not at the moment. I guess that we can take this include out.
>>>

> +
> +/*
> + * rbd image 'foo' consists of objects
> + *   foo.rbd      - image metadata
> + *   foo.
> + *   foo.0001
> + *   ...          - data
> + */
> +
> +#define RBD_SUFFIX             ".rbd"
> +#define RBD_DIRECTORY           "rbd_directory"
> +
> +#define RBD_DEFAULT_OBJ_ORDER  22   /* 4MB */
> +
> +#define RBD_MAX_OBJ_NAME_SIZE  96
> +#define RBD_MAX_SEG_NAME_SIZE  128
> +
> +#define RBD_COMP_NONE          0
> +#define RBD_CRYPT_NONE         0
> +
> +static const char rbd_text[] = "<<< Rados Block Device Image >>>\n";
> +static const char rbd_signature[] = "RBD";
> +static const char rbd_version[] = "001.001";
> +
> +struct rbd_obj_snap_ondisk {
> +       __le64 id;
> +       __le64 image_size;
> +} __attribute__((packed));
> +
> +struct rbd_obj_header_ondisk {
> +       char text[64];
> +       char signature[4];
> +       char version[8];
> +       __le64 image_size;

 Unaligned? Is the disk format fixed?
>>>
>>> This is a packed structure that represents the on disk format.
>>> Operations on it are being done only to read from the disk header or
>>> to write to the disk header.
>>
>> That's clear. But what exactly is the alignment of field 'image_size'?
>> Could there be implicit padding to mod 8 between 'version' and
>> 'image_size' with some compilers?
>
> Obviously it's not 64 bit aligned. As it's an on-disk header, I don't
> see alignment a real issue. As was said before, any operation on these
> fields have to go through endianity conversion anyway, and this
> structure should not be used directly. For such datastructures I'd
> rather have the fields ordered in some logical order than maintaining
> the alignment by ourselves. That's why we have that __attribute__
> packed in the end to let the compiler deal with those issues. Other
> compilers though have their own syntax for packed structures (but I do
> see other uses of this packed syntax in the qemu code).

Packed structures are OK, but the padding should be explicit to avoid
compiler problems.

Eventually the disk format is read into memory buffer and then aligned
fields should be also faster on all architectures, even on x86.

>>
>> If there were no other constraints, I'd either make the padding
>> explicit, or rearrange/resize fields so that the field alignment is
>> natural. Thus my question, can you change the disk format or are there
>>

[Qemu-devel] [Bug 585529] Re: Documentation link is broken

2010-05-25 Thread Anthony Liguori
Fixed now.  In the future, please use the discussion page on the wiki to
report bad links

** Changed in: qemu
   Status: New => Fix Released

-- 
Documentation link is broken
https://bugs.launchpad.net/bugs/585529
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: Fix Released

Bug description:
Sorry, did not know where else to send this. I could not find a "contact us" 
page for QEMU.

The link to "QEMU Documentation" on the page http://wiki.qemu.org/Manual is 
broken. It points to "http://wiki.qemu.org/download/qemu-doc.html"; which does 
not currently exist.





Re: [Qemu-devel] [PATCH 2/3] Tracepoint, buffer & monitor framework

2010-05-25 Thread Stefan Hajnoczi
On Tue, May 25, 2010 at 7:20 PM, Prerna Saxena
 wrote:
>> Some added lines of code use tabs for indentation, 4 space indentation
>> should
>> be used.
>>
>> +struct tracepoint {
>> +       char *name;                     /* Tracepoint name */
>> +       uint8_t  trace_id;              /* numerical ID */
>>
>> Maximum 256 tracepoints in QEMU?  A limit of 65536 is less likely to
>> be an issue in the future.
>>
>
> No, this field describes the maximum tracepoints for a given hash queue.

I see now, thanks.

> I'll work on merging this circular buffer + monitor-based reader as a
> backend for your proposed tracer. Would it be a good idea to have two trace
> buffers -- when one is full, it gets written to disk ; while the second is
> used to log traces.

In a double-buffering approach there are finite resources.  There
needs to be a case for when the write-out buffer hasn't been written
yet and the active trace buffer becomes full.  I think in that case
the active buffer should start overwriting the oldest entry.

> I think the monitor interface for reading traces can be retained as is.
> Also, I'd implemented the monitor interface for enabling/disabling data
> logging for a given tracepoint (for a running guest) Not sure if this is
> supported in the set of patches you've posted ? It might be a good to have
> feature.

The "disable" trace event feature in my tracing backends patchset
allows statically disabling a trace event.  It doesn't support
enabling/disabling trace events at runtime, which is left up to the
backend.

The motivation for the "disable" attribute in the trace-events file is
to allow completely disabling a trace event without having to remove
it from trace-events *and* removing trace_*() calls in QEMU source
code.  It's a handy way of completely knocking out a trace event.

Thanks for your patches,
Stefan



[Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)

2010-05-25 Thread Blue Swirl
On Tue, May 25, 2010 at 5:00 PM, Artyom Tarasenko
 wrote:
> 2010/5/21 Blue Swirl :
>> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
>>  wrote:
>>> 2010/5/10 Blue Swirl :
 On 5/10/10, Artyom Tarasenko  wrote:
> 2010/5/10 Blue Swirl :
>
> > On 5/10/10, Artyom Tarasenko  wrote:
>  >> 2010/5/9 Blue Swirl :
>  >>  > On 5/9/10, Artyom Tarasenko  wrote:
>  >>  >> 2010/5/9 Blue Swirl :
>  >>  >>
>  >>  >> > On 5/8/10, Artyom Tarasenko  wrote:
>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but 
> aliased.
>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it 
> crash
>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using 
> empty_slot
>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>  >>  >>  >
>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>  >>  >>  > additional pages after the main IOMMU for AFX registers. So 
> this is
>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>  >>  >>
>  >>  >>
>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU 
> addresses.
>  >>  >>  SS-20 doesn't have any aliasing.
>  >>  >
>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>  >>
>  >>
>  >> Good point, I must confess, I missed the word "Turbo" in your first
>  >>  answer. LX and SS-20 don't.
>  >>  But SS-5 must have a TurboSPARC CPU:
>  >>
>  >>  ok cd /FMI,MB86904
>  >>  ok .attributes
>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>  >>  psr-implementation       
>  >>  psr-version              0004
>  >>  implementation           
>  >>  version                  0004
>  >>  cache-line-size          0020
>  >>  cache-nlines             0200
>  >>  page-size                1000
>  >>  dcache-line-size         0010
>  >>  dcache-nlines            0200
>  >>  dcache-associativity     0001
>  >>  icache-line-size         0020
>  >>  icache-nlines            0200
>  >>  icache-associativity     0001
>  >>  ncaches                  0002
>  >>  mmu-nctx                 0100
>  >>  sparc-version            0008
>  >>  mask_rev                 0026
>  >>  device_type              cpu
>  >>  name                     FMI,MB86904
>  >>
>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is 
> done on SS-5:
>  >>
>  >>  ok 1000 20 spacel@ .
>  >>  409
>  >>  ok 1400 20 spacel@ .
>  >>  409
>  >>  ok 1404 20 spacel@ .
>  >>  23000
>  >>  ok 1f04 20 spacel@ .
>  >>  23000
>  >>  ok 1008 20 spacel@ .
>  >>  409
>  >>  ok 1428 20 spacel@ .
>  >>  409
>  >>  ok 100c 20 spacel@ .
>  >>  23000
>  >>  ok 1010 20 spacel@ .
>  >>  409
>  >>
>  >>
>  >>  LX is the same except for the IOMMU-version:
>  >>
>  >>  ok 1000 20 spacel@ .
>  >>  405
>  >>  ok 1400 20 spacel@ .
>  >>  405
>  >>  ok 1800 20 spacel@ .
>  >>  405
>  >>  ok 1f00 20 spacel@ .
>  >>  405
>  >>  ok 1ff0 20 spacel@ .
>  >>  405
>  >>  ok 1fff0004 20 spacel@ .
>  >>  1fe000
>  >>  ok 1004 20 spacel@ .
>  >>  1fe000
>  >>  ok 1108 20 spacel@ .
>  >>  4105
>  >>  ok 1040 20 spacel@ .
>  >>  4105
>  >>  ok 1fff0040 20 spacel@ .
>  >>  4105
>  >>  ok 1fff0044 20 spacel@ .
>  >>  1fe000
>  >>  ok 1fff0024 20 spacel@ .
>  >>  1fe000
>  >>
>  >>
>  >>  >>  At what address the additional AFX registers are located?
>  >>  >
>  >>  > Here's complete TurboSPARC IOMMU address map:
>  >>  >  PA[30:0]          Register          Access
>  >>  > 1000_       IOMMU Control         R/W
>  >>  > 1000_0004    IOMMU Base Address       R/W
>  >>  > 1000_0014   Flush All IOTLB Entries    W
>  >>  > 1000_0018        Address Flush         W
>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>  >>  > 1000_1050     Memory Fault Status     R/W
>  >>  > 1000_1054    Memory Fault Address     R/W
>  >>  > 1000_2000     Module Identification    R/W
>  >>  > 1000_3018      Mask Identification      R
>  >>  > 1000_4000      AFX Queue Level         W
>  >>  > 1000_6000      AFX Queue Level         R
>  >>  > 1

Re: [Qemu-devel] [PATCH, RFC 3/4] Implement byte swapped MMIO type

2010-05-25 Thread Igor Kovalenko
On Mon, May 24, 2010 at 12:34 AM, Blue Swirl  wrote:
> BROKEN
>
> Signed-off-by: Blue Swirl 
> ---
>  cpu-common.h       |    3 +-
>  softmmu_template.h |   69 
> ++--
>  2 files changed, 63 insertions(+), 9 deletions(-)

Changes to io_read and io_write seem to have issue with considering
wrong value bits, you should check physaddr instead since index is cut
off physaddr.

It's probably a good idea to instead do a byteswap at __ld*/__st*
level (a bit higher level than in your change.)
These two sets are directly entered from runtime, and this way you
need only two templates changed.
Not sure if it is a good idea to handle unaligned accesses.

-- 
Kind regards,
Igor V. Kovalenko



Re: [Qemu-devel] [PATCH 2/2] sparc64: clean up pci bridge map

2010-05-25 Thread Igor Kovalenko
On Tue, May 25, 2010 at 11:24 PM, Blue Swirl  wrote:
> On Tue, May 25, 2010 at 12:09 PM, Igor V. Kovalenko
>  wrote:
>> From: Igor V. Kovalenko 
>>
>> - remove unused host state and store pci bus pointer only
>> - do not map host state access into unused 1fe.1000 range
>> - reorder pci region registration
>> - assign pci i/o region to isa_mem_base
>
> Looks good. Could you make a separate patch from the part that depends
> on OpenBIOS update and another for the cleanups? I think the cleanups
> could be applied quickly, but the OpenBIOS PCI changes may need more
> consideration.

The only real cleanup is removal of host state which becomes unusable
after mapping change. I think these changes may go in as is along with
OpenBIOS set "sparc64 cleanups v1" which supports changed address
ranges.

The PCI changes to OpenBIOS in set "encode-int related changes and pci
bus scan amendment" are separate but there is a simple dependency on
these cleanups in register mapping area.

It was probably bad idea to split OpenBIOS changes to 2 sets instead
of sending those as one series.

-- 
Kind regards,
Igor V. Kovalenko



Re: [Qemu-devel] Re: irq problems after live migration with 0.12.4

2010-05-25 Thread Michael Tokarev

25.05.2010 15:03, Peter Lieven wrote:

Michael Tokarev wrote:

23.05.2010 13:55, Peter Lieven wrote:

[]

[64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option)

[]

[64442.299433] handlers:
[64442.299840] [] (e1000_intr+0x0/0x190 [e1000])
[64442.300046] Disabling IRQ #10


Apparently, for some reason, e1000_intr decided it's not
interesting IRQ or somehow wrong or not for that NIC.  I
dunno.  But something fishy is going on with IRQs here.


See also LP bug #584131 (https://bugs.launchpad.net/bugs/584131)
and original Debian bug#580649 (http://bugs.debian.org/580649)



Not sure if they're related...


It looks they are actually the same thing, but happens with
different devices and/or IRQs.  Either spurious, or unwanted,
or unrecognized or somesuch IRQ which is not recognized by
the irq handler, which results in disabling that IRQ by the
kernel, which is a bad thing (In your case it works because
e1000 works in 2 modes, interrupts and polling).


michael, do you have any ideas what i got do to debug whats happening?


Unfortunately, no idea.  I don't know neither kernel nor kvm
internals.


looking at launchpad and debian bug tracker i found other bugs also
with a maybe related problem. so this issue might be greater...


Can you share your findings?  I don't know other debian bugs which
are similar to this one.

Thanks!

/mjt



[Qemu-devel] [Bug 585529] [NEW] Documentation link is broken

2010-05-25 Thread Paul Rensing
Public bug reported:

Sorry, did not know where else to send this. I could not find a "contact
us" page for QEMU.

The link to "QEMU Documentation" on the page http://wiki.qemu.org/Manual
is broken. It points to "http://wiki.qemu.org/download/qemu-doc.html";
which does not currently exist.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
Documentation link is broken
https://bugs.launchpad.net/bugs/585529
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
Sorry, did not know where else to send this. I could not find a "contact us" 
page for QEMU.

The link to "QEMU Documentation" on the page http://wiki.qemu.org/Manual is 
broken. It points to "http://wiki.qemu.org/download/qemu-doc.html"; which does 
not currently exist.





Re: [Qemu-devel] [PATCH 2/2] sparc64: clean up pci bridge map

2010-05-25 Thread Blue Swirl
On Tue, May 25, 2010 at 12:09 PM, Igor V. Kovalenko
 wrote:
> From: Igor V. Kovalenko 
>
> - remove unused host state and store pci bus pointer only
> - do not map host state access into unused 1fe.1000 range
> - reorder pci region registration
> - assign pci i/o region to isa_mem_base

Looks good. Could you make a separate patch from the part that depends
on OpenBIOS update and another for the cleanups? I think the cleanups
could be applied quickly, but the OpenBIOS PCI changes may need more
consideration.

> Signed-off-by: Igor V. Kovalenko 
> ---
>  hw/apb_pci.c |   49 ++---
>  hw/sun4u.c   |    4 ++--
>  2 files changed, 28 insertions(+), 25 deletions(-)
>
> diff --git a/hw/apb_pci.c b/hw/apb_pci.c
> index 65d8ba6..b53e3c3 100644
> --- a/hw/apb_pci.c
> +++ b/hw/apb_pci.c
> @@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0)
>
>  typedef struct APBState {
>     SysBusDevice busdev;
> -    PCIHostState host_state;
> +    PCIBus      *bus;
>     ReadWriteHandler pci_config_handler;
>     uint32_t iommu[4];
>     uint32_t pci_control[16];
> @@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, 
> pcibus_t addr,
>
>     val = qemu_bswap_len(val, size);
>     APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val);
> -    pci_data_write(s->host_state.bus, addr, val, size);
> +    pci_data_write(s->bus, addr, val, size);
>  }
>
>  static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr,
> @@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, 
> pcibus_t addr,
>     uint32_t ret;
>     APBState *s = container_of(h, APBState, pci_config_handler);
>
> -    ret = pci_data_read(s->host_state.bus, addr, size);
> +    ret = pci_data_read(s->bus, addr, size);
>     ret = qemu_bswap_len(ret, size);
>     APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret);
>     return ret;
> @@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
>     s = sysbus_from_qdev(dev);
>     /* apb_config */
>     sysbus_mmio_map(s, 0, special_base);
> +    /* PCI configuration space */
> +    sysbus_mmio_map(s, 1, special_base + 0x100ULL);
>     /* pci_ioport */
> -    sysbus_mmio_map(s, 1, special_base + 0x200ULL);
> -    /* pci_config */
> -    sysbus_mmio_map(s, 2, special_base + 0x100ULL);
> -    /* mem_data */
> -    sysbus_mmio_map(s, 3, mem_base);
> +    sysbus_mmio_map(s, 2, special_base + 0x200ULL);
>     d = FROM_SYSBUS(APBState, s);
> -    d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
> +
> +    d->bus = pci_register_bus(&d->busdev.qdev, "pci",
>                                          pci_apb_set_irq, pci_pbm_map_irq, d,
>                                          0, 32);
> -    pci_bus_set_mem_base(d->host_state.bus, mem_base);
> +    pci_bus_set_mem_base(d->bus, mem_base);
>
>     for (i = 0; i < 32; i++) {
>         sysbus_connect_irq(s, i, pic[i]);
>     }
>
> -    pci_create_simple(d->host_state.bus, 0, "pbm");
> +    pci_create_simple(d->bus, 0, "pbm");
> +
>     /* APB secondary busses */
> -    *bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0),
> +    *bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0),
>                             PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
>                             pci_apb_map_irq,
>                             "Advanced PCI Bus secondary bridge 1");
>     apb_pci_bridge_init(*bus2);
>
> -    *bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1),
> +    *bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1),
>                             PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA,
>                             pci_apb_map_irq,
>                             "Advanced PCI Bus secondary bridge 2");
>     apb_pci_bridge_init(*bus3);
>
> -    return d->host_state.bus;
> +    return d->bus;
>  }
>
>  static void pci_pbm_reset(DeviceState *d)
> @@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d)
>  static int pci_pbm_init_device(SysBusDevice *dev)
>  {
>     APBState *s;
> -    int pci_mem_data, apb_config, pci_ioport, pci_config;
> +    int pci_config, apb_config, pci_ioport;
>     unsigned int i;
>
>     s = FROM_SYSBUS(APBState, dev);
> @@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev)
>     /* apb_config */
>     apb_config = cpu_register_io_memory(apb_config_read,
>                                         apb_config_write, s);
> +    /* at region 0 */
>     sysbus_init_mmio(dev, 0x1ULL, apb_config);
> -    /* pci_ioport */
> -    pci_ioport = cpu_register_io_memory(pci_apb_ioread,
> -                                          pci_apb_iowrite, s);
> -    sysbus_init_mmio(dev, 0x1ULL, pci_ioport);
> -    /* pci_config */
> +
> +    /* PCI configuration space */
>     s->pci_config_handler.read = apb_pci_config_read;
>     s->pci_config_handler.write = apb_pci_config_write;
>     pci_config = cpu_register_io_memory_simple(&s->pci_conf

Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation

2010-05-25 Thread Blue Swirl
On Tue, May 25, 2010 at 8:39 AM, Joerg Roedel  wrote:
> On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote:
>> On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel  wrote:
>> >> +
>> >> +#define MMIO_SIZE               0x2028
>> >
>> > This size should be a power-of-two value. In this case probably 0x4000.
>>
>> Not really, the devices can reserve regions of any size. There were
>> some implementation deficiencies in earlier versions of QEMU, where
>> the whole page would be reserved anyway, but this limitation has been
>> removed long time ago.
>
> The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
> driver maps the MMIO region with this size. So the emulation should
> reserve this amount of MMIO space too.

Well, Linux drivers may take a conservative approach so I'd check
what's the value in the device specs. In practice, on x86 hardware the
size doesn't matter too much, for example on Sparc an access beyond
the end of the device region would trap.



[Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

2010-05-25 Thread Blue Swirl
On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka  wrote:
> From: Jan Kiszka 
>
> This allows to communicate potential IRQ coalescing during delivery from
> the sink back to the source. Targets that support IRQ coalescing
> workarounds need to register handlers that return the appropriate
> QEMU_IRQ_* code, and they have to propergate the code across all IRQ
> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can
> apply its workaround. If multiple sinks exist, the source may only
> consider an IRQ coalesced if all other sinks either report
> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED.

No real devices are interested whether any of their output lines are
even connected. This would introduce a new signal type, bidirectional
multi-level, which is not correct.

I think the real solution to coalescing is put the logic inside one
device, in this case APIC because it has the information about irq
delivery. APIC could monitor incoming RTC irqs for frequency
information and whether they get delivered or not. If not, an internal
timer is installed which injects the lost irqs.

Of course, no real device could do such de-coalescing, but with this
approach, the voodoo is contained to insides of one device, APIC.

We should also take a step back to think what was the cause of lost
irqs, IIRC uneven execution rate in QEMU. Could this be fixed or taken
into account in timer handling? For example, CPU loop could analyze
the wall clock time between CPU exits and use that to offset the
timers. Thus the timer frequency (in wall clock time) could be made to
correspond a bit more to VCPU execution rate.

>
> Signed-off-by: Jan Kiszka 
> ---
>  hw/irq.c |   38 +-
>  hw/irq.h |   22 +++---
>  2 files changed, 44 insertions(+), 16 deletions(-)
>
> diff --git a/hw/irq.c b/hw/irq.c
> index 7703f62..db2cce6 100644
> --- a/hw/irq.c
> +++ b/hw/irq.c
> @@ -26,19 +26,27 @@
>
>  struct IRQState {
>     qemu_irq_handler handler;
> +    qemu_irq_fb_handler feedback_handler;
>     void *opaque;
>     int n;
>  };
>
> -void qemu_set_irq(qemu_irq irq, int level)
> +int qemu_set_irq(qemu_irq irq, int level)
>  {
> -    if (!irq)
> -        return;
> -
> -    irq->handler(irq->opaque, irq->n, level);
> +    if (!irq) {
> +        return 0;
> +    }
> +    if (irq->feedback_handler) {
> +        return irq->feedback_handler(irq->opaque, irq->n, level);
> +    } else {
> +        irq->handler(irq->opaque, irq->n, level);
> +        return QEMU_IRQ_DELIVERED;
> +    }
>  }
>
> -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n)
> +static qemu_irq *allocate_irqs(qemu_irq_handler handler,
> +                               qemu_irq_fb_handler feedback_handler,
> +                               void *opaque, int n)
>  {
>     qemu_irq *s;
>     struct IRQState *p;
> @@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void 
> *opaque, int n)
>     p = (struct IRQState *)qemu_mallocz(sizeof(struct IRQState) * n);
>     for (i = 0; i < n; i++) {
>         p->handler = handler;
> +        p->feedback_handler = feedback_handler;
>         p->opaque = opaque;
>         p->n = i;
>         s[i] = p;
> @@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, 
> void *opaque, int n)
>     return s;
>  }
>
> +qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n)
> +{
> +    return allocate_irqs(handler, NULL, opaque, n);
> +}
> +
> +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler,
> +                                      void *opaque, int n)
> +{
> +    return allocate_irqs(NULL, handler, opaque, n);
> +}
> +
>  void qemu_free_irqs(qemu_irq *s)
>  {
>     qemu_free(s[0]);
>     qemu_free(s);
>  }
>
> -static void qemu_notirq(void *opaque, int line, int level)
> +static int qemu_notirq(void *opaque, int line, int level)
>  {
>     struct IRQState *irq = opaque;
>
> -    irq->handler(irq->opaque, irq->n, !level);
> +    return qemu_set_irq(irq, !level);
>  }
>
>  qemu_irq qemu_irq_invert(qemu_irq irq)
>  {
>     /* The default state for IRQs is low, so raise the output now.  */
>     qemu_irq_raise(irq);
> -    return qemu_allocate_irqs(qemu_notirq, irq, 1)[0];
> +    return allocate_irqs(NULL, qemu_notirq, irq, 1)[0];
>  }
> diff --git a/hw/irq.h b/hw/irq.h
> index 5daae44..eee03e6 100644
> --- a/hw/irq.h
> +++ b/hw/irq.h
> @@ -3,15 +3,18 @@
>
>  /* Generic IRQ/GPIO pin infrastructure.  */
>
> -/* FIXME: Rmove one of these.  */
> +#define QEMU_IRQ_DELIVERED      0
> +#define QEMU_IRQ_COALESCED      (-1)
> +#define QEMU_IRQ_MASKED         (-2)
> +
>  typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
> -typedef void SetIRQFunc(void *opaque, int irq_num, int level);
> +typedef int (*qemu_irq_fb_handler)(void *opaque, int n, int level);
>
> -void qemu_set_irq(qemu_irq irq, int level);
> +int qemu_set_irq(qemu_irq irq, int level);
>
> -static inline void qemu_irq_raise(qemu_irq irq)
> 

Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive

2010-05-25 Thread Alexander Graf
Anthony Liguori wrote:
> On 05/25/2010 12:59 PM, Alexander Graf wrote:
>>> I see it as the equivalent to the Taint bit in Linux.  I want to make
>>> it clear to users up front that if you use this option, and you have
>>> data loss issues, don't complain.
>>>
>>> Just putting something in qemu-doc.texi is not enough IMHO.  Few
>>> people actually read it.
>>>  
>> So what exactly is the conclusion here? I really want to see this
>> getting merged.
>>
>
> Make it more scary and try again.

I don't see how to make it more scary while still considering users sane
human beings. You don't print out a big fat warning on -drive if=scsi
either, right?

Alex




Re: [Qemu-devel] Re: [PATCH 1/5] Exit if incoming migration fails

2010-05-25 Thread Anthony Liguori

On 05/25/2010 01:37 PM, Juan Quintela wrote:

Luiz Capitulino  wrote:
   

On Tue, 25 May 2010 16:21:01 +0200
Juan Quintela  wrote:

 

Signed-off-by: Juan Quintela
---
  migration.c |   16 ++--
  migration.h |2 +-
  vl.c|7 ++-
  3 files changed, 17 insertions(+), 8 deletions(-)

   
   

  While I agree on the change, I have two comments:

1. By taking a look at the code I have the impression that most of the
fun failures will happen on the handler passed to qemu_set_fd_handler2(),
do you agree? Any plan to address that?
 

That is outgoing migration, not incoming migration.
Incoming migration in synchronous..


   

1. Is exit()ing the best thing to be done? I understand it's the easiest
and maybe better than nothing, but wouldn't it be better to enter in
paused-forever state so that clients can query and decide what to do?
 

For incoming migration, if it fails in the middle, every bet is off.
You are in a really inconsistent state, not sure which one, and if
migration was live, with the other host possibly retaking the disks to
continue.
   


I agree that exiting is the only sane behavior for the destination.

Regards,

Anthony Liguori


In some cases, you can't do anything:
- you got passed an fd, and fd got closed/image corrupted/...
- you got passed an exec command like "exec: gzip -d<  foo.gz"
   If gzip failed once, it will fail forever.

If you are running it by hand, cursor up + enter, and you are back
If you are using a management application, it is going to be easier to
restart the process that trying to cleanup everything.

Experience shows that people really tries to do weird things when
machine is in this state.

Later, Juan.

   





Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Anthony Liguori

On 05/25/2010 01:31 PM, Luiz Capitulino wrote:

On Tue, 25 May 2010 16:21:03 +0200
Juan Quintela  wrote:

   

They are emitted when migration starts, ends, has a failure or is canceled.

Signed-off-by: Juan Quintela
---
  QMP/qmp-events.txt |   50 ++
  monitor.c  |   12 
  monitor.h  |4 
  3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 01ec85f..93caa4d 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -26,6 +26,56 @@ Example:
  Note: If action is "stop", a STOP event will eventually follow the
  BLOCK_IO_ERROR event.

+MIGRATION_CANCELED
+--
+
+Emitted when migration is canceled.  This is emitted in the source.
 

  Shouldn't this one be emitted in the destination?
   


Destination can't distinguish a cancelled from a closed pipe.  But the 
idea is that a third party is talking to both source and destination so 
it knows if it's cancelled the migration.



+Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
+and CANCELED migration for target).
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_CANCELED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_ENDED
+---
+
+Emitted when migration ends (both in source and target)
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_ENDED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_FAILED
+
+
+Emitted when migration fails (both is source and target).
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_FAILED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
 

  What about a MIGRATION_FINISHED event, which contains a 'success'
key which is a bool?

  The only disadvantage of this is if we decide to add more information
to the event (say, stats) then it'd get ugly. Otherwise, one event is enough.

  Anyway, the counterpart of MIGRATION_FAILED is MIGRATION_SUCCEEDED.
   


I see MIGRATION_FAILED as being very similar to block I/O error events.  
I think we'll need a very similar solution for both.  It boils down to, 
how do we raise asynchronous events when something fails?


Regards,

Anthony Liguori




Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive

2010-05-25 Thread Anthony Liguori

On 05/25/2010 12:59 PM, Alexander Graf wrote:

I see it as the equivalent to the Taint bit in Linux.  I want to make
it clear to users up front that if you use this option, and you have
data loss issues, don't complain.

Just putting something in qemu-doc.texi is not enough IMHO.  Few
people actually read it.
 

So what exactly is the conclusion here? I really want to see this
getting merged.
   


Make it more scary and try again.

Regards,

Anthony Liguori



Alex

   





[Qemu-devel] Re: [PATCH 1/5] Exit if incoming migration fails

2010-05-25 Thread Juan Quintela
Luiz Capitulino  wrote:
> On Tue, 25 May 2010 16:21:01 +0200
> Juan Quintela  wrote:
>
>> Signed-off-by: Juan Quintela 
>> ---
>>  migration.c |   16 ++--
>>  migration.h |2 +-
>>  vl.c|7 ++-
>>  3 files changed, 17 insertions(+), 8 deletions(-)
>> 

>  While I agree on the change, I have two comments:
>
> 1. By taking a look at the code I have the impression that most of the
>fun failures will happen on the handler passed to qemu_set_fd_handler2(),
>do you agree? Any plan to address that?

That is outgoing migration, not incoming migration.
Incoming migration in synchronous..


> 1. Is exit()ing the best thing to be done? I understand it's the easiest
>and maybe better than nothing, but wouldn't it be better to enter in
>paused-forever state so that clients can query and decide what to do?

For incoming migration, if it fails in the middle, every bet is off.
You are in a really inconsistent state, not sure which one, and if
migration was live, with the other host possibly retaking the disks to
continue.

In some cases, you can't do anything:
- you got passed an fd, and fd got closed/image corrupted/...
- you got passed an exec command like "exec: gzip -d < foo.gz"
  If gzip failed once, it will fail forever.

If you are running it by hand, cursor up + enter, and you are back
If you are using a management application, it is going to be easier to
restart the process that trying to cleanup everything.

Experience shows that people really tries to do weird things when
machine is in this state.

Later, Juan.



[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Juan Quintela
Luiz Capitulino  wrote:
> On Tue, 25 May 2010 17:35:53 +0200
> Juan Quintela  wrote:
>
>> Anthony Liguori  wrote:
>> > On 05/25/2010 09:21 AM, Juan Quintela wrote:
>> 
>> >> +MIGRATION_CANCELED
>> >> +--
>> >> +
>> >> +Emitted when migration is canceled.  This is emitted in the source.
>> >> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
>> >> +and CANCELED migration for target).
>> >>
>> >
>> > But the management tool is the one that cancels so surely, it knows
>> > why already.
>> 
>> ok, then that one is ok.
>
>  Isn't this one important for the destination instead?

Destination don't know what happened, only that conection/data is not
coming anymore.

>From management prespective, management application knows that it has
canceled the migration, so no need to be told.

Later, Juan.



Re: [Qemu-devel] [PATCH 1/2] virtio-9p: make virtio-9p available to all POSIX systems

2010-05-25 Thread Blue Swirl
On Mon, May 24, 2010 at 8:46 PM, Venkateswararao Jujjuri (JV)
 wrote:
> Blue Swirl wrote:
>> Field d_off in struct dirent is Linux specific.
>>
>> Signed-off-by: Blue Swirl 
>> ---
>>  Makefile.objs   |    8 
>>  Makefile.target |    2 +-
>>  hw/virtio-9p.c  |    2 +-
>>  hw/virtio-pci.c |    6 +++---
>>  hw/virtio.h     |    4 ++--
>>  qemu-config.c   |    4 ++--
>>  qemu-config.h   |    2 +-
>>  qemu-options.hx |    8 
>>  vl.c            |    8 
>>  9 files changed, 22 insertions(+), 22 deletions(-)
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 1585101..b1a6e01 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -35,8 +35,8 @@ net-nested-$(CONFIG_SLIRP) += slirp.o
>>  net-nested-$(CONFIG_VDE) += vde.o
>>  net-obj-y += $(addprefix net/, $(net-nested-y))
>>
>> -fsdev-nested-$(CONFIG_LINUX) = qemu-fsdev.o
>> -fsdev-obj-$(CONFIG_LINUX) += $(addprefix fsdev/, $(fsdev-nested-y))
>> +fsdev-nested-$(CONFIG_POSIX) = qemu-fsdev.o
>> +fsdev-obj-$(CONFIG_POSIX) += $(addprefix fsdev/, $(fsdev-nested-y))
>>
>>  ##
>>  # libqemu_common.a: Target independent part of system emulation. The
>> @@ -47,7 +47,7 @@ fsdev-obj-$(CONFIG_LINUX) += $(addprefix fsdev/,
>> $(fsdev-nested-y))
>>  common-obj-y = $(block-obj-y)
>>  common-obj-y += $(net-obj-y)
>>  common-obj-y += $(qobject-obj-y)
>> -common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
>> +common-obj-$(CONFIG_POSIX) += $(fsdev-obj-$(CONFIG_POSIX))
>>  common-obj-y += readline.o console.o async.o qemu-error.o
>>  common-obj-y += tcg-runtime.o host-utils.o
>>  common-obj-y += irq.o ioport.o input.o
>> @@ -229,7 +229,7 @@ sound-obj-$(CONFIG_CS4231A) += cs4231a.o
>>  adlib.o fmopl.o: QEMU_CFLAGS += -DBUILD_Y8950=0
>>  hw-obj-$(CONFIG_SOUND) += $(sound-obj-y)
>>
>> -hw-obj-$(CONFIG_LINUX) += virtio-9p-debug.o virtio-9p-local.o
>> +hw-obj-$(CONFIG_POSIX) += virtio-9p-debug.o virtio-9p-local.o
>>
>>  ##
>>  # libdis
>> diff --git a/Makefile.target b/Makefile.target
>> index fda5bf3..00e140f 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -168,7 +168,7 @@ obj-y += virtio-blk.o virtio-balloon.o
>> virtio-net.o virtio-serial-bus.o
>>  obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>>  obj-y += vhost_net.o
>>  obj-$(CONFIG_VHOST_NET) += vhost.o
>> -obj-$(CONFIG_LINUX) += virtio-9p.o
>> +obj-$(CONFIG_POSIX) += virtio-9p.o
>>  obj-y += rwhandler.o
>>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
>> diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c
>> index e5d0112..68b0696 100644
>> --- a/hw/virtio-9p.c
>> +++ b/hw/virtio-9p.c
>> @@ -1447,8 +1447,8 @@ static void v9fs_read_post_dir_lstat(V9fsState
>> *s, V9fsReadState *vs,
>>      vs->count += vs->len;
>>      v9fs_stat_free(&vs->v9stat);
>>      v9fs_string_free(&vs->name);
>> -    vs->dir_pos = vs->dent->d_off;
>>      vs->dent = v9fs_do_readdir(s, vs->fidp->dir);
>> +    vs->dir_pos = v9fs_do_telldir(s, vs->fidp->dir);
>
>
> We need to save the the current dir position before making next readdir
> We need to seek back if we can't fit it into PDU.
> Hence moving the dir_pos after readdir is not a good idea.

Hmm, the manual page for readdir says:
   On Linux, the dirent structure is defined as follows:

   struct dirent {
   ino_t  d_ino;   /* inode number */
   off_t  d_off;   /* offset to the next dirent */
   unsigned short d_reclen;/* length of this record */
   unsigned char  d_type;  /* type of file */
   char   d_name[256]; /* filename */
   };

My change was based on the comment for d_off.

But when I run this program:

$ cat dirent.c
#include 
#include 
#include 

int main(int argc, const char **argv)
{
DIR *d;
struct dirent *entry;
off_t pos;

d = opendir(argv[1]);
entry = readdir(d);
do {
pos = telldir(d);
printf("name %s d_off %ld ino %ld pos %ld\n", entry->d_name,
entry->d_off, entry->d_ino, pos);
entry = readdir(d);
} while (entry);
closedir(d);

return 0;
}

d_off is equal to telldir value:

$ ./dirent /
name tmp d_off 206002973 ino 56 pos 206002973
name root d_off 224116791 ino 521217 pos 224116791
name vmlinuz.old d_off 255549115 ino 17 pos 255549115
name dev d_off 378658993 ino 374625 pos 378658993

I'll send an updated patch.

> BTW, Thanks for making VirtFS generic to all POSIX systems.
>
> Thanks,
> JV.
>
>>      v9fs_read_post_readdir(s, vs, err);
>>      return;
>>  out:
>> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>> index 7ddf612..0a74781 100644
>> --- a/hw/virtio-pci.c
>> +++ b/hw/virtio-pci.c
>> @@ -102,7 +102,7 @@ typedef struct {
>>      BlockConf block;
>>      NICConf nic;
>>      uint32_t host_features;
>> -#ifdef CONFIG_LINUX
>> +#ifdef CONFIG_POSIX
>>      V9fsConf fsconf;
>>  #endif
>>      /* Max.

Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Luiz Capitulino
On Tue, 25 May 2010 16:21:03 +0200
Juan Quintela  wrote:

> They are emitted when migration starts, ends, has a failure or is canceled.
> 
> Signed-off-by: Juan Quintela 
> ---
>  QMP/qmp-events.txt |   50 ++
>  monitor.c  |   12 
>  monitor.h  |4 
>  3 files changed, 66 insertions(+), 0 deletions(-)
> 
> diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
> index 01ec85f..93caa4d 100644
> --- a/QMP/qmp-events.txt
> +++ b/QMP/qmp-events.txt
> @@ -26,6 +26,56 @@ Example:
>  Note: If action is "stop", a STOP event will eventually follow the
>  BLOCK_IO_ERROR event.
> 
> +MIGRATION_CANCELED
> +--
> +
> +Emitted when migration is canceled.  This is emitted in the source.

 Shouldn't this one be emitted in the destination?

> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
> +and CANCELED migration for target).
> +
> +Data: None
> +
> +Example:
> +
> +{ "event": "MIGRATION_CANCELED",
> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> +
> +MIGRATION_ENDED
> +---
> +
> +Emitted when migration ends (both in source and target)
> +
> +Data: None
> +
> +Example:
> +
> +{ "event": "MIGRATION_ENDED",
> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> +
> +MIGRATION_FAILED
> +
> +
> +Emitted when migration fails (both is source and target).
> +
> +Data: None
> +
> +Example:
> +
> +{ "event": "MIGRATION_FAILED",
> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }

 What about a MIGRATION_FINISHED event, which contains a 'success'
key which is a bool?

 The only disadvantage of this is if we decide to add more information
to the event (say, stats) then it'd get ugly. Otherwise, one event is enough.

 Anyway, the counterpart of MIGRATION_FAILED is MIGRATION_SUCCEEDED.

> +
> +MIGRATION_STARTED
> +-
> +
> +Emitted when migration starts (both in source and target).

 Don't you need this only on the destination?



Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Luiz Capitulino
On Tue, 25 May 2010 17:35:53 +0200
Juan Quintela  wrote:

> Anthony Liguori  wrote:
> > On 05/25/2010 09:21 AM, Juan Quintela wrote:
> 
> >> +MIGRATION_CANCELED
> >> +--
> >> +
> >> +Emitted when migration is canceled.  This is emitted in the source.
> >> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
> >> +and CANCELED migration for target).
> >>
> >
> > But the management tool is the one that cancels so surely, it knows
> > why already.
> 
> ok, then that one is ok.

 Isn't this one important for the destination instead?



Re: [Qemu-devel] [PATCH 2/3] Tracepoint, buffer & monitor framework

2010-05-25 Thread Prerna Saxena

Hi Stefan,
Thanks for having a look.
As I'd mentioned, this patchset is *work in progress*, which explains 
the dummy comments and coding style violations at places :) I was merely 
sharing a draft of what my approach is -- so that we can work together 
on how much of it can add to the trace framework you've proposed.


On 05/25/2010 05:10 PM, Stefan Hajnoczi wrote:


I think this is too much work.  Let each tracepoint have its own global struct
tracepoint so it can directly reference it using tracepoint_##name - no hash
lookup needed.  Add the QLIST_ENTRY directly to struct tracepoint so the
tracepoint register/unregister code can assign ids and look up tracepoints by
name.  No critical path code needs to do name lookups and the hash table can
disappear.


I had employed a combination of hash (derived from name) and an ID 
(which is the offset within a hash bucket where the tracepoint details 
are stored) to determine tracepoint information for a given name. Your 
suggestion to eliminate name queries is good, let me see how much of 
this can be scaled down.




+#define DECLARE_TRACE(name, tproto, tstruct)   \
+   struct __trace_struct_##name {  \
+   tstruct \
+   };  \

Should this struct be packed so more fields can fit?



Yes, indeed. Thanks for reminding !


+trace_queue->trace_buffer[tmp].metadata.write_complete = 0;

This is not guaranteed to work without memory barriers.  There is no way for
the trace consumer to block until there is more data available.  The
synchronization needs to consider writing traces to a file, which has different
constraints than dumping the current contents of the trace buffer.

We're missing a way to trace to a file.  That could be done in binary or text.
It would be easier in text because we already have the format strings and don't
need a unique ID mapping in an external binary parsing tool.



OK, at the time of working on this I hadnt really thought of dumping 
traces in a file. It meant too much of IO latency that such tracing 
would bring in. My idea of a tracer entailed buffer based logging with a 
simple reader(see last)



Making data available after crash is also useful.  The easiest way is to dump
the trace buffer from the core dump using gdb.  However, we'd need some way of
making sense of the bytes.  That could be done by reading the tracepoint_lib
structures from the core dump.



Agree.


(The way I do trace recovery from a core dump in my simple tracer is to binary
dump the trace buffer from the core dump.  Since the trace buffer contents are
normally written out to file unchanged anyway, the simpletrace.py script can
read the dumped trace buffer like a normal trace file.)

Nitpicks:



As I mentioned, this is work in progress so you'd have seen quite a lot 
of violations. Thanks for pointing those out, I'll clean those up for 
whatever approach we choose to use :)



Some added lines of code use tabs for indentation, 4 space indentation should
be used.

+struct tracepoint {
+   char *name; /* Tracepoint name */
+   uint8_t  trace_id;  /* numerical ID */

Maximum 256 tracepoints in QEMU?  A limit of 65536 is less likely to
be an issue in the future.



No, this field describes the maximum tracepoints for a given hash queue.


+   void __trace_print_##name(Monitor *mon, void *data) \
+   {   \
+ struct __trace_struct_##name *entry;  \
+   \
+ if(!entry)\
+   return; \

This does not work, entry is not initialized.


Typo ! should've been : if(!data)



+#define DO_TRACE(name, args...)\
+   trace_##name(args);

This macro is unused?


A relic that needs to be cleaned :)



+/* In essence, the structure becomes :
+ * struct tracepoint_lib {

This comment will get out of sync easily.

+   qemu_malloc(sizeof(struct tracepoint_lib));
+
+if (!new_entry)
+   return NULL;/* No memory */

qemu_malloc() does not return NULL on out-of-memory, it aborts the program.
Same for allocating new_entry->entry.name.



Wondering how I forgot that ! thanks for reminding.


+new_entry->entry.name = (char*)qemu_malloc(strlen(name)+1);
+if(!new_entry->entry.name)
+   return NULL;
+
+strncpy(new_entry->entry.name, name, strlen(name)+1);

Perhaps just strdup() instead of manual qemu_malloc()/strncpy().

Stefan



I'll work on merging this circular buffer + monitor-based reader as a 
backend for your proposed tracer. Would it be a good idea to have two 
trace buffers -- when one is full, it gets written to disk ; while the 
second is used to log traces.

I think 

[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Luiz Capitulino
On Tue, 25 May 2010 11:10:23 -0500
Anthony Liguori  wrote:

>  There should be some information about why it failed, no? Preferrably
>  in a QError format.
> 
>   
> >>> At this point, we have basically -1 :(
> >>>
> >>> I can add a field with an error number, but we are very bad at the
> >>> moment about moving errno's upstack.
> >>>
> >>>
> >> We need a better solution for reporting errors via notifications.
> >>  
> > Suggestions?
> >
> > Notice that what we need now is a way to know if migration ended with
> > success or in any other way, as soon as possible.
> >
> 
> Markus/Luiz?

 We need to redesign QError. I could give it a try, but quite frankly, I
don't know how do it good enough..

 Markus has worked more on error handling than me though and I think he's
the best person to do it, but he's busy at other things atm.

 Note that major work is design, not code churn (could turn out into an issue,
but I doubt it).



Re: [Qemu-devel] [PATCH -V3 2/7] virtio-9p: Rearrange fileop structures

2010-05-25 Thread Sripathi Kodi
On Fri, 21 May 2010 14:26:05 -0700
"Venkateswararao Jujjuri (JV)"  wrote:

Hi JV,

While I agree that this patch is nice to have, why is this part of the
security model patchset? Is it required to implement the models?

Thanks,
Sripathi.

> Signed-off-by: Venkateswararao Jujjuri 
> ---
>  hw/virtio-9p.c |  185 
> ++--
>  hw/virtio-9p.h |   92 
>  2 files changed, 138 insertions(+), 139 deletions(-)
> 
> diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c
> index 8ecd39c..fda3c4a 100644
> --- a/hw/virtio-9p.c
> +++ b/hw/virtio-9p.c
> @@ -21,6 +21,52 @@
>  int dotu = 1;
>  int debug_9p_pdu;
> 
> +enum {
> +Oread   = 0x00,
> +Owrite  = 0x01,
> +Ordwr   = 0x02,
> +Oexec   = 0x03,
> +Oexcl   = 0x04,
> +Otrunc  = 0x10,
> +Orexec  = 0x20,
> +Orclose = 0x40,
> +Oappend = 0x80,
> +};
> +
> +static int omode_to_uflags(int8_t mode)
> +{
> +int ret = 0;
> +
> +switch (mode & 3) {
> +case Oread:
> +ret = O_RDONLY;
> +break;
> +case Ordwr:
> +ret = O_RDWR;
> +break;
> +case Owrite:
> +ret = O_WRONLY;
> +break;
> +case Oexec:
> +ret = O_RDONLY;
> +break;
> +}
> +
> +if (mode & Otrunc) {
> +ret |= O_TRUNC;
> +}
> +
> +if (mode & Oappend) {
> +ret |= O_APPEND;
> +}
> +
> +if (mode & Oexcl) {
> +ret |= O_EXCL;
> +}
> +
> +return ret;
> +}
> +
>  static int v9fs_do_lstat(V9fsState *s, V9fsString *path, struct stat *stbuf)
>  {
>  return s->ops->lstat(&s->ctx, path->data, stbuf);
> @@ -999,14 +1045,6 @@ out:
>  v9fs_string_free(&aname);
>  }
> 
> -typedef struct V9fsStatState {
> -V9fsPDU *pdu;
> -size_t offset;
> -V9fsStat v9stat;
> -V9fsFidState *fidp;
> -struct stat stbuf;
> -} V9fsStatState;
> -
>  static void v9fs_stat_post_lstat(V9fsState *s, V9fsStatState *vs, int err)
>  {
>  if (err == -1) {
> @@ -1057,19 +1095,6 @@ out:
>  qemu_free(vs);
>  }
> 
> -typedef struct V9fsWalkState {
> -V9fsPDU *pdu;
> -size_t offset;
> -int16_t nwnames;
> -int name_idx;
> -V9fsQID *qids;
> -V9fsFidState *fidp;
> -V9fsFidState *newfidp;
> -V9fsString path;
> -V9fsString *wnames;
> -struct stat stbuf;
> -} V9fsWalkState;
> -
>  static void v9fs_walk_complete(V9fsState *s, V9fsWalkState *vs, int err)
>  {
>  complete_pdu(s, vs->pdu, err);
> @@ -1233,62 +1258,6 @@ out:
>  v9fs_walk_complete(s, vs, err);
>  }
> 
> -typedef struct V9fsOpenState {
> -V9fsPDU *pdu;
> -size_t offset;
> -int8_t mode;
> -V9fsFidState *fidp;
> -V9fsQID qid;
> -struct stat stbuf;
> -
> -} V9fsOpenState;
> -
> -enum {
> -Oread   = 0x00,
> -Owrite  = 0x01,
> -Ordwr   = 0x02,
> -Oexec   = 0x03,
> -Oexcl   = 0x04,
> -Otrunc  = 0x10,
> -Orexec  = 0x20,
> -Orclose = 0x40,
> -Oappend = 0x80,
> -};
> -
> -static int omode_to_uflags(int8_t mode)
> -{
> -int ret = 0;
> -
> -switch (mode & 3) {
> -case Oread:
> -ret = O_RDONLY;
> -break;
> -case Ordwr:
> -ret = O_RDWR;
> -break;
> -case Owrite:
> -ret = O_WRONLY;
> -break;
> -case Oexec:
> -ret = O_RDONLY;
> -break;
> -}
> -
> -if (mode & Otrunc) {
> -ret |= O_TRUNC;
> -}
> -
> -if (mode & Oappend) {
> -ret |= O_APPEND;
> -}
> -
> -if (mode & Oexcl) {
> -ret |= O_EXCL;
> -}
> -
> -return ret;
> -}
> -
>  static void v9fs_open_post_opendir(V9fsState *s, V9fsOpenState *vs, int err)
>  {
>  if (vs->fidp->dir == NULL) {
> @@ -1391,25 +1360,6 @@ out:
>  complete_pdu(s, pdu, err);
>  }
> 
> -typedef struct V9fsReadState {
> -V9fsPDU *pdu;
> -size_t offset;
> -int32_t count;
> -int32_t total;
> -int64_t off;
> -V9fsFidState *fidp;
> -struct iovec iov[128]; /* FIXME: bad, bad, bad */
> -struct iovec *sg;
> -off_t dir_pos;
> -struct dirent *dent;
> -struct stat stbuf;
> -V9fsString name;
> -V9fsStat v9stat;
> -int32_t len;
> -int32_t cnt;
> -int32_t max_count;
> -} V9fsReadState;
> -
>  static void v9fs_read_post_readdir(V9fsState *, V9fsReadState *, ssize_t);
> 
>  static void v9fs_read_post_seekdir(V9fsState *s, V9fsReadState *vs, ssize_t 
> err)
> @@ -1597,19 +1547,6 @@ out:
>  qemu_free(vs);
>  }
> 
> -typedef struct V9fsWriteState {
> -V9fsPDU *pdu;
> -size_t offset;
> -int32_t len;
> -int32_t count;
> -int32_t total;
> -int64_t off;
> -V9fsFidState *fidp;
> -struct iovec iov[128]; /* FIXME: bad, bad, bad */
> -struct iovec *sg;
> -int cnt;
> -} V9fsWriteState;
> -
>  static void v9fs_write_post_writev(V9fsState *s, V9fsWriteState *vs,
> ssize_t err)
>  {
> @@ -1706,19 +1643,6 @@ out:
>  qemu_free(vs);
>  }
> 
> -typedef struct V9fsCreateState {
>

Re: [Qemu-devel] [PATCH 1/5] Exit if incoming migration fails

2010-05-25 Thread Luiz Capitulino
On Tue, 25 May 2010 16:21:01 +0200
Juan Quintela  wrote:

> Signed-off-by: Juan Quintela 
> ---
>  migration.c |   16 ++--
>  migration.h |2 +-
>  vl.c|7 ++-
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/migration.c b/migration.c
> index 05f6cc5..9c1d4b6 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -36,22 +36,26 @@ static uint32_t max_throttle = (32 << 20);
> 
>  static MigrationState *current_migration;
> 
> -void qemu_start_incoming_migration(const char *uri)
> +int qemu_start_incoming_migration(const char *uri)
>  {
>  const char *p;
> +int ret;
> 
>  if (strstart(uri, "tcp:", &p))
> -tcp_start_incoming_migration(p);
> +ret = tcp_start_incoming_migration(p);
>  #if !defined(WIN32)
>  else if (strstart(uri, "exec:", &p))
> -exec_start_incoming_migration(p);
> +ret =  exec_start_incoming_migration(p);
>  else if (strstart(uri, "unix:", &p))
> -unix_start_incoming_migration(p);
> +ret = unix_start_incoming_migration(p);
>  else if (strstart(uri, "fd:", &p))
> -fd_start_incoming_migration(p);
> +ret = fd_start_incoming_migration(p);
>  #endif
> -else
> +else {
>  fprintf(stderr, "unknown migration protocol: %s\n", uri);
> +ret = -EPROTONOSUPPORT;
> +}
> +return ret;
>  }
> 
>  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
> diff --git a/migration.h b/migration.h
> index 385423f..dd423a1 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -50,7 +50,7 @@ struct FdMigrationState
>  void *opaque;
>  };
> 
> -void qemu_start_incoming_migration(const char *uri);
> +int qemu_start_incoming_migration(const char *uri);
> 
>  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data);
> 
> diff --git a/vl.c b/vl.c
> index 328395e..d13440d 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -3823,7 +3823,12 @@ int main(int argc, char **argv, char **envp)
>  }
> 
>  if (incoming) {
> -qemu_start_incoming_migration(incoming);
> +int ret = qemu_start_incoming_migration(incoming);
> +if (ret < 0) {
> +fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
> +incoming, ret);
> +exit(ret);

 While I agree on the change, I have two comments:

1. By taking a look at the code I have the impression that most of the
   fun failures will happen on the handler passed to qemu_set_fd_handler2(),
   do you agree? Any plan to address that?

1. Is exit()ing the best thing to be done? I understand it's the easiest
   and maybe better than nothing, but wouldn't it be better to enter in
   paused-forever state so that clients can query and decide what to do?

> +}
>  } else if (autostart) {
>  vm_start();
>  }




Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive

2010-05-25 Thread Alexander Graf
Anthony Liguori wrote:
> On 05/17/2010 11:23 AM, Paul Brook wrote:
 I don't see a difference between the results. Apparently the barrier
 option doesn't change a thing.

>>> Ok.  I don't like it, but I can see how it's compelling.  I'd like to
>>> see the documentation improved though.  I also think a warning printed
>>> on stdio about the safety of the option would be appropriate.
>>>  
>> I disagree with this last bit.
>>
>> Errors should be issued if the user did something wrong.
>> Warnings should be issued if qemu did (or will soon do) something
>> other than
>> what the user requested, or otherwise made questionable decisions on the
>> user's behalf.
>>
>> In this case we're doing exactly what the user requested. The only
>> plausible
>> failure case is where a user is blindly trying options that they
>> clearly don't
>> understand or read the documentation for. I have zero sympathy for
>> complaints
>> like "Someone on the Internet told me to use --breakme, and broke
>> thinks".
>>
>
> I see it as the equivalent to the Taint bit in Linux.  I want to make
> it clear to users up front that if you use this option, and you have
> data loss issues, don't complain.
>
> Just putting something in qemu-doc.texi is not enough IMHO.  Few
> people actually read it.

So what exactly is the conclusion here? I really want to see this
getting merged.


Alex




[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Juan Quintela
Anthony Liguori  wrote:
> On 05/25/2010 11:25 AM, Daniel P. Berrange wrote:
>> On Tue, May 25, 2010 at 06:04:17PM +0200, Juan Quintela wrote:
>>
>>> Anthony Liguori  wrote:

> I'm not sure why you would need a notification of when migration
> starts (since you know when you've started migration).

But you don't know if the other end "knows" that it has also started.

started is needed only in incoming part, because  we don't have a
monitor to ask if migration has started.

Later, Juan.



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Sage Weil
On Tue, 25 May 2010, Avi Kivity wrote:
> > What's the reason for not having these drivers upstream? Do we gain
> > anything by hiding them from our users and requiring them to install the
> > drivers separately from somewhere else?
> >
> 
> Six months.

FWIW, we (Ceph) aren't complaining about the 6 month lag time (and I don't 
think the Sheepdog guys are either).

>From our perspective, the current BlockDriver abstraction is ideal, as it 
represents the reality of qemu's interaction with storage.  Any 'external' 
interface will be inferior to that in one way or another.  But either way, 
we are perfectly willing to work with you to all to keep in sync with any 
future BlockDriver API improvements.  It is worth our time investment even 
if the API is less stable.

The ability to dynamically load a shared object using the existing api 
would make development a bit easier, but I'm not convinced it's better for 
for users.  I think having ceph and sheepdog upstream with qemu will serve 
end users best, and we at least are willing to spend the time to help 
maintain that code in qemu.git.

sage



[Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)

2010-05-25 Thread Artyom Tarasenko
2010/5/21 Blue Swirl :
> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
>  wrote:
>> 2010/5/10 Blue Swirl :
>>> On 5/10/10, Artyom Tarasenko  wrote:
 2010/5/10 Blue Swirl :

 > On 5/10/10, Artyom Tarasenko  wrote:
  >> 2010/5/9 Blue Swirl :
  >>  > On 5/9/10, Artyom Tarasenko  wrote:
  >>  >> 2010/5/9 Blue Swirl :
  >>  >>
  >>  >> > On 5/8/10, Artyom Tarasenko  wrote:
  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but 
 aliased.
  >>  >>  >>  Software shouldn't use aliased addresses, neither should it 
 crash
  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using 
 empty_slot
  >>  >>  >>  instead of aliasing can help with debugging such accesses.
  >>  >>  >
  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
  >>  >>  > additional pages after the main IOMMU for AFX registers. So 
 this is
  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
  >>  >>
  >>  >>
  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU 
 addresses.
  >>  >>  SS-20 doesn't have any aliasing.
  >>  >
  >>  > But are your machines equipped with TurboSPARC or some other CPU?
  >>
  >>
  >> Good point, I must confess, I missed the word "Turbo" in your first
  >>  answer. LX and SS-20 don't.
  >>  But SS-5 must have a TurboSPARC CPU:
  >>
  >>  ok cd /FMI,MB86904
  >>  ok .attributes
  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
  >>  psr-implementation       
  >>  psr-version              0004
  >>  implementation           
  >>  version                  0004
  >>  cache-line-size          0020
  >>  cache-nlines             0200
  >>  page-size                1000
  >>  dcache-line-size         0010
  >>  dcache-nlines            0200
  >>  dcache-associativity     0001
  >>  icache-line-size         0020
  >>  icache-nlines            0200
  >>  icache-associativity     0001
  >>  ncaches                  0002
  >>  mmu-nctx                 0100
  >>  sparc-version            0008
  >>  mask_rev                 0026
  >>  device_type              cpu
  >>  name                     FMI,MB86904
  >>
  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is 
 done on SS-5:
  >>
  >>  ok 1000 20 spacel@ .
  >>  409
  >>  ok 1400 20 spacel@ .
  >>  409
  >>  ok 1404 20 spacel@ .
  >>  23000
  >>  ok 1f04 20 spacel@ .
  >>  23000
  >>  ok 1008 20 spacel@ .
  >>  409
  >>  ok 1428 20 spacel@ .
  >>  409
  >>  ok 100c 20 spacel@ .
  >>  23000
  >>  ok 1010 20 spacel@ .
  >>  409
  >>
  >>
  >>  LX is the same except for the IOMMU-version:
  >>
  >>  ok 1000 20 spacel@ .
  >>  405
  >>  ok 1400 20 spacel@ .
  >>  405
  >>  ok 1800 20 spacel@ .
  >>  405
  >>  ok 1f00 20 spacel@ .
  >>  405
  >>  ok 1ff0 20 spacel@ .
  >>  405
  >>  ok 1fff0004 20 spacel@ .
  >>  1fe000
  >>  ok 1004 20 spacel@ .
  >>  1fe000
  >>  ok 1108 20 spacel@ .
  >>  4105
  >>  ok 1040 20 spacel@ .
  >>  4105
  >>  ok 1fff0040 20 spacel@ .
  >>  4105
  >>  ok 1fff0044 20 spacel@ .
  >>  1fe000
  >>  ok 1fff0024 20 spacel@ .
  >>  1fe000
  >>
  >>
  >>  >>  At what address the additional AFX registers are located?
  >>  >
  >>  > Here's complete TurboSPARC IOMMU address map:
  >>  >  PA[30:0]          Register          Access
  >>  > 1000_       IOMMU Control         R/W
  >>  > 1000_0004    IOMMU Base Address       R/W
  >>  > 1000_0014   Flush All IOTLB Entries    W
  >>  > 1000_0018        Address Flush         W
  >>  > 1000_1000  Asynchronous Fault Status  R/W
  >>  > 1000_1004 Asynchronous Fault Address  R/W
  >>  > 1000_1010  SBus Slot Configuration 0   R/W
  >>  > 1000_1014  SBus Slot Configuration 1   R/W
  >>  > 1000_1018  SBus Slot Configuration 2   R/W
  >>  > 1000_101C  SBus Slot Configuration 3   R/W
  >>  > 1000_1020  SBus Slot Configuration 4   R/W
  >>  > 1000_1050     Memory Fault Status     R/W
  >>  > 1000_1054    Memory Fault Address     R/W
  >>  > 1000_2000     Module Identification    R/W
  >>  > 1000_3018      Mask Identification      R
  >>  > 1000_4000      AFX Queue Level         W
  >>  > 1000_6000      AFX Queue Level         R
  >>  > 1000_7000      AFX Queue Status        R
  >>
  >>
  >>
  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 
 32) is
  >>  well above this limit

Re: [Qemu-devel] [PATCH 2/5] vnc: send desktopresize event as reply to set encodings

2010-05-25 Thread Corentin Chary
On Tue, May 25, 2010 at 6:25 PM, Gerd Hoffmann  wrote:
> In case the desktop did resize while the vnc connection setup was still
> in progress the client isn't informed about it.  Send a desktop resize
> event as soon as the client told us it can handle deskop resize via set
> encodings message to make sure the client us up to date.

I had a similar patch on my queue but yours is probably cleaner :).

-- 
Corentin Chary
http://xf.iksaif.net



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 07:21 PM, Anthony Liguori wrote:

On 05/25/2010 11:16 AM, Avi Kivity wrote:

On 05/25/2010 06:01 PM, Anthony Liguori wrote:

On 05/25/2010 10:00 AM, Avi Kivity wrote:
The latter.  Why is it less important?  If you don't inherit the 
memory, you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


What's wrong with /dev/shm memory?


The kernel treats anonymous and nonymous memory differently for 
swapping (see /proc/sys/vm/swappiness); transparent hugepages won't 
work for /dev/shm (though it may be argued that that's a problem with 
thp); setup (/dev/shm defaults to half memory IIRC, we want 
mem+swap); different cgroup handling; somewhat clunky (a minor 
concern to be sure).


Surely, with mmu notifiers, it wouldn't be that hard to share 
anonymous memory via an fd though, no?


That's what I suggested with processfd().  I wouldn't call it easy but 
it's likely doable.  Whether it's mergable is a different issue.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH 3/5] vnc: keep track of client desktop size

2010-05-25 Thread Gerd Hoffmann
Add two new variables to keep track of the vnc clients desktop size.

Signed-off-by: Gerd Hoffmann 
---
 vnc.c |   10 +++---
 vnc.h |2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/vnc.c b/vnc.c
index 0e0e566..30e0bed 100644
--- a/vnc.c
+++ b/vnc.c
@@ -521,10 +521,12 @@ static void vnc_desktop_resize(VncState *vs)
 if (vs->csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
 return;
 }
+vs->client_width = ds_get_width(ds);
+vs->client_height = ds_get_height(ds);
 vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
 vnc_write_u8(vs, 0);
 vnc_write_u16(vs, 1); /* number of rects */
-vnc_framebuffer_update(vs, 0, 0, ds_get_width(ds), ds_get_height(ds),
+vnc_framebuffer_update(vs, 0, 0, vs->client_width, vs->client_height,
VNC_ENCODING_DESKTOPRESIZE);
 vnc_flush(vs);
 }
@@ -1958,8 +1960,10 @@ static int protocol_client_init(VncState *vs, uint8_t 
*data, size_t len)
 char buf[1024];
 int size;
 
-vnc_write_u16(vs, ds_get_width(vs->ds));
-vnc_write_u16(vs, ds_get_height(vs->ds));
+vs->client_width = ds_get_width(vs->ds);
+vs->client_height = ds_get_height(vs->ds);
+vnc_write_u16(vs, vs->client_width);
+vnc_write_u16(vs, vs->client_height);
 
 pixel_format_message(vs);
 
diff --git a/vnc.h b/vnc.h
index 0d39897..d648832 100644
--- a/vnc.h
+++ b/vnc.h
@@ -134,6 +134,8 @@ struct VncState
 int absolute;
 int last_x;
 int last_y;
+int client_width;
+int client_height;
 
 uint32_t vnc_encoding;
 
-- 
1.6.6.1




[Qemu-devel] [PATCH 0/5] vnc: desktop size patches.

2010-05-25 Thread Gerd Hoffmann
  Hi,

This series brings a bunch of vnc desktop size patches, fixing the
issues discussed in the "Possible race condition in VNC display
resizing" thread.  Check list archive here:

http://lists.gnu.org/archive/html/qemu-devel/2010-04/msg01778.html

cheers,
  Gerd

Gerd Hoffmann (5):
  vnc: factor out vnc_desktop_resize()
  vnc: send desktopresize event as reply to set encodings
  vnc: keep track of client desktop size
  vnc: don't send invalid screen updates.
  vnc: move size-changed check into the vnc_desktop_resize function.

 vnc.c |   50 +-
 vnc.h |2 ++
 2 files changed, 35 insertions(+), 17 deletions(-)




[Qemu-devel] [PATCH 4/5] vnc: don't send invalid screen updates.

2010-05-25 Thread Gerd Hoffmann
Don't send updates for screen areas which are outside the clients
desktop.  May happed with vnc clients which don't support the desktop
resize message.

Signed-off-by: Gerd Hoffmann 
---
 vnc.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/vnc.c b/vnc.c
index 30e0bed..119ffe8 100644
--- a/vnc.c
+++ b/vnc.c
@@ -836,6 +836,7 @@ static int vnc_update_client(VncState *vs, int has_dirty)
 int y;
 int n_rectangles;
 int saved_offset;
+int width, height;
 
 if (vs->output.offset && !vs->audio_cap && !vs->force_update)
 /* kernel send buffers are full -> drop frames to throttle */
@@ -856,10 +857,13 @@ static int vnc_update_client(VncState *vs, int has_dirty)
 saved_offset = vs->output.offset;
 vnc_write_u16(vs, 0);
 
-for (y = 0; y < vd->server->height; y++) {
+width = MIN(vd->server->width, vs->client_width);
+height = MIN(vd->server->height, vs->client_height);
+
+for (y = 0; y < height; y++) {
 int x;
 int last_x = -1;
-for (x = 0; x < vd->server->width / 16; x++) {
+for (x = 0; x < width / 16; x++) {
 if (vnc_get_bit(vs->dirty[y], x)) {
 if (last_x == -1) {
 last_x = x;
-- 
1.6.6.1




Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Daniel P. Berrange
On Tue, May 25, 2010 at 06:04:17PM +0200, Juan Quintela wrote:
> Anthony Liguori  wrote:
> > On 05/25/2010 10:35 AM, Juan Quintela wrote:
> 
> >> problem here is that libvirt start target with -S, and waits to do the
> >> "cont" as soon as possible.  As of know, only way to do it is to poll
> >> info migrate on source faster.
> >>
> >
> > Why does it do that??
> >
> > That sound like a terrible idea.
> 
> Becaues migration is not reliable, and they don't have a way to issue
> cont only in one of the sides :(
> 
> We make migration protocol reliable, or management application have to
> decide when migration suceeded or not.
> 
> This new events help then a lot.  But they issue the cont really fast
> (before migration ends).  I don't remember why they did that.

The use of '-S / cont' isn't really because of reliability. There
are several scenarios though. There's a migrate API option to leave
the guest paused upon completion, hence we need to start it with -S
to stop it auto-running upon completion. With some disk locking 
approaches we need todo a lock transfer before allowing the dest
to continue running. It could be optimized to avoid the -S /cont
in cases where those two scenarios aren't relevant, but only if
we can get a separate async notification of when migration starts
and completes on the destination, so we can notify mgmt apps that
need this lifecycle event.

So in summary these lifecycle events on source + dest for start,
complete, fail, cancel are all focused on allowing libvirt to 
remove its existing hacks in migration support for current QEMU.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Anthony Liguori

On 05/25/2010 11:25 AM, Daniel P. Berrange wrote:

On Tue, May 25, 2010 at 06:04:17PM +0200, Juan Quintela wrote:
   

Anthony Liguori  wrote:
 

On 05/25/2010 10:35 AM, Juan Quintela wrote:
   
 

problem here is that libvirt start target with -S, and waits to do the
"cont" as soon as possible.  As of know, only way to do it is to poll
info migrate on source faster.

 

Why does it do that??

That sound like a terrible idea.
   

Becaues migration is not reliable, and they don't have a way to issue
cont only in one of the sides :(

We make migration protocol reliable, or management application have to
decide when migration suceeded or not.

This new events help then a lot.  But they issue the cont really fast
(before migration ends).  I don't remember why they did that.
 

The use of '-S / cont' isn't really because of reliability. There
are several scenarios though. There's a migrate API option to leave
the guest paused upon completion, hence we need to start it with -S
to stop it auto-running upon completion.


That's a strange API.  Why would you want to do that?  Why not just stop 
and then migrate?  You're just wasting bandwidth doing a live migration 
and then leaving it stopped.  This is a critical period of time for the 
guest and generally speaking, you don't want to involve many layers of 
management tooling in these decisions because the result is going to be 
that you break the migration downtime contract.



  With some disk locking
approaches we need todo a lock transfer before allowing the dest
to continue running.


QEMU is going to read the disk before the migration completes so the 
lock transfer is not going to work with the current implementation (it 
needs to read the disk to do probing).  I assume this is not something 
that's actually been implemented.



  It could be optimized to avoid the -S /cont
in cases where those two scenarios aren't relevant, but only if
we can get a separate async notification of when migration starts
and completes on the destination, so we can notify mgmt apps that
need this lifecycle event.
   


Migration completes == guest starts running.  You'll get a notification 
of that but you're not getting that now because you're doing -S which 
I'd argue is a functional problem on the part of libvirt (you're 
breaking the downtime contract).


I'm not sure why you would need a notification of when migration starts 
(since you know when you've started migration).


Regards,

Anthony Liguori


So in summary these lifecycle events on source + dest for start,
complete, fail, cancel are all focused on allowing libvirt to
remove its existing hacks in migration support for current QEMU.

Regards,
Daniel
   





[Qemu-devel] [PATCH 2/5] vnc: send desktopresize event as reply to set encodings

2010-05-25 Thread Gerd Hoffmann
In case the desktop did resize while the vnc connection setup was still
in progress the client isn't informed about it.  Send a desktop resize
event as soon as the client told us it can handle deskop resize via set
encodings message to make sure the client us up to date.

Signed-off-by: Gerd Hoffmann 
---
 vnc.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vnc.c b/vnc.c
index aaebe24..0e0e566 100644
--- a/vnc.c
+++ b/vnc.c
@@ -1709,6 +1709,7 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 break;
 }
 }
+vnc_desktop_resize(vs);
 check_pointer_type_change(&vs->mouse_mode_notifier);
 }
 
-- 
1.6.6.1




[Qemu-devel] [PATCH 5/5] vnc: move size-changed check into the vnc_desktop_resize function.

2010-05-25 Thread Gerd Hoffmann
This make sure we send a desktop resize message only in case we actually
have to, using the new variables which track the clients desktop size.

Signed-off-by: Gerd Hoffmann 
---
 vnc.c |   11 +--
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/vnc.c b/vnc.c
index 119ffe8..5715006 100644
--- a/vnc.c
+++ b/vnc.c
@@ -521,6 +521,10 @@ static void vnc_desktop_resize(VncState *vs)
 if (vs->csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
 return;
 }
+if (vs->client_width == ds_get_width(ds) &&
+vs->client_height == ds_get_height(ds)) {
+return;
+}
 vs->client_width = ds_get_width(ds);
 vs->client_height = ds_get_height(ds);
 vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
@@ -533,7 +537,6 @@ static void vnc_desktop_resize(VncState *vs)
 
 static void vnc_dpy_resize(DisplayState *ds)
 {
-int size_changed;
 VncDisplay *vd = ds->opaque;
 VncState *vs;
 
@@ -551,16 +554,12 @@ static void vnc_dpy_resize(DisplayState *ds)
 vd->guest.ds = qemu_mallocz(sizeof(*vd->guest.ds));
 if (ds_get_bytes_per_pixel(ds) != vd->guest.ds->pf.bytes_per_pixel)
 console_color_init(ds);
-size_changed = ds_get_width(ds) != vd->guest.ds->width ||
-   ds_get_height(ds) != vd->guest.ds->height;
 *(vd->guest.ds) = *(ds->surface);
 memset(vd->guest.dirty, 0xFF, sizeof(vd->guest.dirty));
 
 QTAILQ_FOREACH(vs, &vd->clients, next) {
 vnc_colordepth(vs);
-if (size_changed) {
-vnc_desktop_resize(vs);
-}
+vnc_desktop_resize(vs);
 if (vs->vd->cursor) {
 vnc_cursor_define(vs);
 }
-- 
1.6.6.1




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 05:01 PM, Kevin Wolf wrote:



The current situation is that those block format drivers only exist in
qemu.git or as patches.  Surely that's even more unhappiness.
 

The difference is that in the current situation these drivers will be
part of the next qemu release, so the patch may be obsolete, but you
don't even need it any more.
   


The next qemu release may be six months in the future.  So if you're not 
happy with running qemu.git master or with patching a stable release, 
you have to wait.



If you start keeping block drivers outside qemu and not even try
integrating them, they'll stay external.
   


Which may or may not be a problem.


Confusion could be mitigated:

$ qemu -module my-fancy-block-format-driver.so
my-fancy-block-format-driver.so does not support this version of qemu
(0.19.2).  Please contact my-fancy-block-format-driver-de...@example.org.

The question is how many such block format drivers we expect.  We now
have two in the pipeline (ceph, sheepdog), it's reasonable to assume
we'll want an lvm2 driver and btrfs driver.  This is an area with a lot
of activity and a relatively simply interface.
 

What's the reason for not having these drivers upstream? Do we gain
anything by hiding them from our users and requiring them to install the
drivers separately from somewhere else?
   


Six months.

--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 11:16 AM, Avi Kivity wrote:

On 05/25/2010 06:01 PM, Anthony Liguori wrote:

On 05/25/2010 10:00 AM, Avi Kivity wrote:
The latter.  Why is it less important?  If you don't inherit the 
memory, you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


What's wrong with /dev/shm memory?


The kernel treats anonymous and nonymous memory differently for 
swapping (see /proc/sys/vm/swappiness); transparent hugepages won't 
work for /dev/shm (though it may be argued that that's a problem with 
thp); setup (/dev/shm defaults to half memory IIRC, we want mem+swap); 
different cgroup handling; somewhat clunky (a minor concern to be sure).


Surely, with mmu notifiers, it wouldn't be that hard to share anonymous 
memory via an fd though, no?


Regards,

Anthony Liguori



Nothing is a killer, but we should prefer anonymous memory.






[Qemu-devel] [PATCH 1/5] vnc: factor out vnc_desktop_resize()

2010-05-25 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 vnc.c |   24 
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/vnc.c b/vnc.c
index 11ae3e5..aaebe24 100644
--- a/vnc.c
+++ b/vnc.c
@@ -514,6 +514,21 @@ void buffer_append(Buffer *buffer, const void *data, 
size_t len)
 buffer->offset += len;
 }
 
+static void vnc_desktop_resize(VncState *vs)
+{
+DisplayState *ds = vs->ds;
+
+if (vs->csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
+return;
+}
+vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
+vnc_write_u8(vs, 0);
+vnc_write_u16(vs, 1); /* number of rects */
+vnc_framebuffer_update(vs, 0, 0, ds_get_width(ds), ds_get_height(ds),
+   VNC_ENCODING_DESKTOPRESIZE);
+vnc_flush(vs);
+}
+
 static void vnc_dpy_resize(DisplayState *ds)
 {
 int size_changed;
@@ -542,14 +557,7 @@ static void vnc_dpy_resize(DisplayState *ds)
 QTAILQ_FOREACH(vs, &vd->clients, next) {
 vnc_colordepth(vs);
 if (size_changed) {
-if (vs->csock != -1 && vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
-vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
-vnc_write_u8(vs, 0);
-vnc_write_u16(vs, 1); /* number of rects */
-vnc_framebuffer_update(vs, 0, 0, ds_get_width(ds), 
ds_get_height(ds),
-VNC_ENCODING_DESKTOPRESIZE);
-vnc_flush(vs);
-}
+vnc_desktop_resize(vs);
 }
 if (vs->vd->cursor) {
 vnc_cursor_define(vs);
-- 
1.6.6.1




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 06:01 PM, Anthony Liguori wrote:

On 05/25/2010 10:00 AM, Avi Kivity wrote:
The latter.  Why is it less important?  If you don't inherit the 
memory, you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


What's wrong with /dev/shm memory?


The kernel treats anonymous and nonymous memory differently for swapping 
(see /proc/sys/vm/swappiness); transparent hugepages won't work for 
/dev/shm (though it may be argued that that's a problem with thp); setup 
(/dev/shm defaults to half memory IIRC, we want mem+swap); different 
cgroup handling; somewhat clunky (a minor concern to be sure).


Nothing is a killer, but we should prefer anonymous memory.

--
error compiling committee.c: too many arguments to function




[Qemu-devel] [Bug 585449] Re: 'Broken pipe' error when starting a vm

2010-05-25 Thread Anthony Liguori
Please report this bug against libvirt as this does not appear to be an
upstream qemu issue.

** Changed in: qemu
   Status: New => Invalid

-- 
'Broken pipe' error when starting a vm
https://bugs.launchpad.net/bugs/585449
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in libvirt virtualization API: Invalid

Bug description:
Occasionally get an error 'libvir: Remote error : cannot send data: Broken 
pipe' scrolling down the screen when starting a VM.  VM does not start and end 
up having to reboot server to be able to get vm going again

virsh -q -c qemu:///system  start dev32
libvir: QEMU error : server closed connection
libvir: Remote error : cannot send data: Broken pipe
libvir: Remote error : cannot send data: Broken pipe
...last line keep being printed every second or so...

Trying to then use virt-manager produced the attached log file

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 10.04 LTS
Release:10.04
Codename:   lucid

$ dpkg -l | grep kvm
ii  kvm
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from 
kvm to qemu-
ii  kvm-pxe5.4.4-1ubuntu1   
   PXE ROM's for KVM
ii  qemu-kvm   0.12.3+noroms-0ubuntu9   
   Full virtualization on i386 and amd64 hardwa

# cat/var/log/libvirt/qemu/dev32.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus 
char device redirected to /dev/pts/5
pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus 
char device redirected to /dev/pts/10

# kvm-img info /srv/vm_disks/dev32/disk0.qcow2
image: /srv/vm_disks/dev32/disk0.qcow2
file format: qcow2
virtual size: 16G (17179869184 bytes)
disk size: 8.2G
cluster_size: 65536


What other information would be helpful?





[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Anthony Liguori

On 05/25/2010 11:04 AM, Juan Quintela wrote:

Anthony Liguori  wrote:
   

On 05/25/2010 10:35 AM, Juan Quintela wrote:
 
   

problem here is that libvirt start target with -S, and waits to do the
"cont" as soon as possible.  As of know, only way to do it is to poll
info migrate on source faster.

   

Why does it do that??

That sound like a terrible idea.
 

Becaues migration is not reliable, and they don't have a way to issue
cont only in one of the sides :(
   


I don't know what you mean by reliable.

When the migration completes on the destination, it will start 
automatically.


The source will not start unless explicitly invoked.  If you 
successfully cancel a migration on the source, it's guaranteed that it 
won't start on the destination.  So the sequence looks like:


src) // decide we want to give up migration
src) migrate_cancel
src) // check migration status
src) cont // if migration cancelled
src) //if migration succeeded, check destination for completion
dst) // if not responsive and not completed in appropriate amount of 
time, kill guest

src) cont // if killed destination

I don't see what the problem is.


We make migration protocol reliable, or management application have to
decide when migration suceeded or not.
   


Reliability has nothing to do with the protocol and everything to do 
with the presence of the third node.



This new events help then a lot.  But they issue the cont really fast
(before migration ends).  I don't remember why they did that.
   


If libvirt is launching the destination with -S, it's doing the wrong 
thing and we ought make sure the proper fix gets implemented.



danp?

   

There should be some information about why it failed, no? Preferrably
in a QError format.

 

At this point, we have basically -1 :(

I can add a field with an error number, but we are very bad at the
moment about moving errno's upstack.

   

We need a better solution for reporting errors via notifications.
 

Suggestions?

Notice that what we need now is a way to know if migration ended with
success or in any other way, as soon as possible.
   


Markus/Luiz?


I think this makes more sense as a MIGRATION_CONNECTED event.  It
probably should carry peer information too.

 

What kind of peer information?

We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
don't care.  But adding information?  Notice that the management
application knows what it did, I can put the:

   "exec: gzip -d<   /tmp/foo"

string, but not much more that I can put here.

   

Basically, do we have any useful information in info migrate that we
can include?
 

(qemu) info migrate
Migration status: active
transferred ram: 874808 kbytes
remaining ram: 227912 kbytes
total ram: 1065344 kbytes
(qemu)

I can't see anything interesting to put here :(
   


Ugh.


About the CONNECTED/STARTED distintion, I fully agree with danp.  We
just want STARTED event for migration, CONNECTION should be generated
(or not) for all sockets/char devices.  it don't make sense for fd/exec
for instance.
   


That makes sense to me.

Regards,

Anthony Liguori


Later, Juan.
   





[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Juan Quintela
Anthony Liguori  wrote:
> On 05/25/2010 10:35 AM, Juan Quintela wrote:

>> problem here is that libvirt start target with -S, and waits to do the
>> "cont" as soon as possible.  As of know, only way to do it is to poll
>> info migrate on source faster.
>>
>
> Why does it do that??
>
> That sound like a terrible idea.

Becaues migration is not reliable, and they don't have a way to issue
cont only in one of the sides :(

We make migration protocol reliable, or management application have to
decide when migration suceeded or not.

This new events help then a lot.  But they issue the cont really fast
(before migration ends).  I don't remember why they did that.

danp?

>>> There should be some information about why it failed, no? Preferrably
>>> in a QError format.
>>>  
>> At this point, we have basically -1 :(
>>
>> I can add a field with an error number, but we are very bad at the
>> moment about moving errno's upstack.
>>
>
> We need a better solution for reporting errors via notifications.

Suggestions?

Notice that what we need now is a way to know if migration ended with
success or in any other way, as soon as possible.

>>> I think this makes more sense as a MIGRATION_CONNECTED event.  It
>>> probably should carry peer information too.
>>>  
>> What kind of peer information?
>>
>> We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
>> don't care.  But adding information?  Notice that the management
>> application knows what it did, I can put the:
>>
>>   "exec: gzip -d<  /tmp/foo"
>>
>> string, but not much more that I can put here.
>>
>
> Basically, do we have any useful information in info migrate that we
> can include?

(qemu) info migrate
Migration status: active
transferred ram: 874808 kbytes
remaining ram: 227912 kbytes
total ram: 1065344 kbytes
(qemu) 

I can't see anything interesting to put here :(

About the CONNECTED/STARTED distintion, I fully agree with danp.  We
just want STARTED event for migration, CONNECTION should be generated
(or not) for all sockets/char devices.  it don't make sense for fd/exec
for instance.

Later, Juan.



Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Daniel P. Berrange
On Tue, May 25, 2010 at 10:57:33AM -0500, Anthony Liguori wrote:
> On 05/25/2010 10:35 AM, Juan Quintela wrote:
> >Anthony Liguori  wrote:
> >   
> >   
> >> 
> >>>+Data: None
> >>>+
> >>>+Example:
> >>>+
> >>>+{ "event": "MIGRATION_CANCELED",
> >>>+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> >>>+
> >>>+MIGRATION_ENDED
> >>>+---
> >>>+
> >>>+Emitted when migration ends (both in source and target)
> >>>
> >>>   
> >>A start event is going to be generated already, no?
> >> 
> >problem here is that libvirt start target with -S, and waits to do the
> >"cont" as soon as possible.  As of know, only way to do it is to poll
> >info migrate on source faster.
> >   
> 
> Why does it do that??
> 
> That sound like a terrible idea.

Historically QEMU gave no alternative. Adding these STARTED/ENDED 
events is to allow libvirt to detect start + end of migration 
reliably, avoiding the previous hacks QEMU forced us todo on the
dest, and avoid the high rate polling we had no choice but todo
on the source.

> >>I think this makes more sense as a MIGRATION_CONNECTED event.  It
> >>probably should carry peer information too.
> >> 
> >What kind of peer information?
> >
> >We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
> >don't care.  But adding information?  Notice that the management
> >application knows what it did, I can put the:
> >
> >  "exec: gzip -d<  /tmp/foo"
> >
> >string, but not much more that I can put here.
> 
> Basically, do we have any useful information in info migrate that we can 
> include?

info migrate just includes the progress info + state (running, finished, 
cancelled, failed). The event itself replicates state. I don't see a hugely
compelling need to include the progress info in the FINISHED/CANCELLED
events. If really needed, the app can still call 'info migrate' to get it.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Juan Quintela
Anthony Liguori  wrote:
> On 05/25/2010 10:35 AM, Juan Quintela wrote:

>> problem here is that libvirt start target with -S, and waits to do the
>> "cont" as soon as possible.  As of know, only way to do it is to poll
>> info migrate on source faster.
>>
>
> Why does it do that??
>
> That sound like a terrible idea.

Becaues migration is not reliable, and they don't have a way to issue
cont only in one of the sides :(

We make migration protocol reliable, or management application have to
decide when migration suceeded or not.

This new events help then a lot.  But they issue the cont really fast
(before migration ends).  I don't remember why they did that.

danp?

>>> There should be some information about why it failed, no? Preferrably
>>> in a QError format.
>>>  
>> At this point, we have basically -1 :(
>>
>> I can add a field with an error number, but we are very bad at the
>> moment about moving errno's upstack.
>>
>
> We need a better solution for reporting errors via notifications.

Suggestions?

Notice that what we need now is a way to know if migration ended with
success or in any other way, as soon as possible.

>>> I think this makes more sense as a MIGRATION_CONNECTED event.  It
>>> probably should carry peer information too.
>>>  
>> What kind of peer information?
>>
>> We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
>> don't care.  But adding information?  Notice that the management
>> application knows what it did, I can put the:
>>
>>   "exec: gzip -d<  /tmp/foo"
>>
>> string, but not much more that I can put here.
>>
>
> Basically, do we have any useful information in info migrate that we
> can include?

(qemu) info migrate
Migration status: active
transferred ram: 874808 kbytes
remaining ram: 227912 kbytes
total ram: 1065344 kbytes
(qemu) 

I can't see anything interesting to put here :(

About the CONNECTED/STARTED distintion, I fully agree with danp.  We
just want STARTED event for migration, CONNECTION should be generated
(or not) for all sockets/char devices.  it don't make sense for fd/exec
for instance.

Later, Juan.



Re: [Qemu-devel] SDL fullscreen window dislikes being resized past the screen size

2010-05-25 Thread Anthony Liguori

On 05/25/2010 10:53 AM, Michael Tokarev wrote:

Initially it were a bugreport on #kvm IRC, someone
asked why his kvm exits when entering fullscreen mode,
saying the famous
 "Could not open SDL display"
and nothing more.

I added a bit of debug output and here's what I see:

...
resizing to 1440x900 0 0x115
resizing to 1440x900 32 0x8115
Could not open SDL display for 1440x900, bpp=32, flags=0x8115

flag=0x8000 means fullscreen.  My screen size is
1280x1024, -- 1440 is more than 1280.

It works just fine if I choose resolution less or equal
to my screen size.


Depending on how SDL is configured, it uses DGA or some other relic to 
actually implement full screen mode.  You cannot get a DGA screen that's 
larger than the physical monitor since it's often backed by video 
memory.  SDL isn't smart enough to degrade into a scaled mode either.


Honestly, SDL full screen mode is a bad idea.  We offer no indication 
that the guest is actually running which is potentially very 
confusing/dangerous.  We need a better backend to really implement a 
functioning full screen mode.


Regards,

Anthony Liguori


And it works just fine (as seen in the example output
above) that it works with larger resolutions but not
fullscreen (in that case SDL window will be scaled to
fit the actual desktop size).

It looks to me like an incorrect usage or assumptions
about SDL window, or maybe SDL bug.  Mine is
libsdl1.2debian 1.2.13-2.

And the error message is in usual qemu style - not at
all useful :)  But it's at least better than pure
exit without any messages at all, like was in hugetlbfs
code ;)

Thanks!

/mjt






[Qemu-devel] [Bug 585449] Re: 'Broken pipe' error when starting a vm

2010-05-25 Thread Cole Robinson
** Project changed: qemu => libvirt

-- 
'Broken pipe' error when starting a vm
https://bugs.launchpad.net/bugs/585449
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in libvirt virtualization API: Invalid

Bug description:
Occasionally get an error 'libvir: Remote error : cannot send data: Broken 
pipe' scrolling down the screen when starting a VM.  VM does not start and end 
up having to reboot server to be able to get vm going again

virsh -q -c qemu:///system  start dev32
libvir: QEMU error : server closed connection
libvir: Remote error : cannot send data: Broken pipe
libvir: Remote error : cannot send data: Broken pipe
...last line keep being printed every second or so...

Trying to then use virt-manager produced the attached log file

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 10.04 LTS
Release:10.04
Codename:   lucid

$ dpkg -l | grep kvm
ii  kvm
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from 
kvm to qemu-
ii  kvm-pxe5.4.4-1ubuntu1   
   PXE ROM's for KVM
ii  qemu-kvm   0.12.3+noroms-0ubuntu9   
   Full virtualization on i386 and amd64 hardwa

# cat/var/log/libvirt/qemu/dev32.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus 
char device redirected to /dev/pts/5
pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus 
char device redirected to /dev/pts/10

# kvm-img info /srv/vm_disks/dev32/disk0.qcow2
image: /srv/vm_disks/dev32/disk0.qcow2
file format: qcow2
virtual size: 16G (17179869184 bytes)
disk size: 8.2G
cluster_size: 65536


What other information would be helpful?





[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Anthony Liguori

On 05/25/2010 10:35 AM, Juan Quintela wrote:

Anthony Liguori  wrote:
   

On 05/25/2010 09:21 AM, Juan Quintela wrote:
 
   

+MIGRATION_CANCELED
+--
+
+Emitted when migration is canceled.  This is emitted in the source.
+Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
+and CANCELED migration for target).

   

But the management tool is the one that cancels so surely, it knows
why already.
 

ok, then that one is ok.

   
 

+Data: None
+
+Example:
+
+{ "event": "MIGRATION_CANCELED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_ENDED
+---
+
+Emitted when migration ends (both in source and target)

   

A start event is going to be generated already, no?
 

problem here is that libvirt start target with -S, and waits to do the
"cont" as soon as possible.  As of know, only way to do it is to poll
info migrate on source faster.
   


Why does it do that??

That sound like a terrible idea.


+Data: None
+
+Example:
+
+{ "event": "MIGRATION_ENDED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_FAILED
+
+
+Emitted when migration fails (both is source and target).
+
+Data: None

   

There should be some information about why it failed, no? Preferrably
in a QError format.
 

At this point, we have basically -1 :(

I can add a field with an error number, but we are very bad at the
moment about moving errno's upstack.
   


We need a better solution for reporting errors via notifications.


I think this makes more sense as a MIGRATION_CONNECTED event.  It
probably should carry peer information too.
 

What kind of peer information?

We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
don't care.  But adding information?  Notice that the management
application knows what it did, I can put the:

  "exec: gzip -d<  /tmp/foo"

string, but not much more that I can put here.
   


Basically, do we have any useful information in info migrate that we can 
include?


Regards,

Anthony Liguori


Later, Juan.
   





[Qemu-devel] SDL fullscreen window dislikes being resized past the screen size

2010-05-25 Thread Michael Tokarev

Initially it were a bugreport on #kvm IRC, someone
asked why his kvm exits when entering fullscreen mode,
saying the famous
 "Could not open SDL display"
and nothing more.

I added a bit of debug output and here's what I see:

...
resizing to 1440x900 0 0x115
resizing to 1440x900 32 0x8115
Could not open SDL display for 1440x900, bpp=32, flags=0x8115

flag=0x8000 means fullscreen.  My screen size is
1280x1024, -- 1440 is more than 1280.

It works just fine if I choose resolution less or equal
to my screen size.

And it works just fine (as seen in the example output
above) that it works with larger resolutions but not
fullscreen (in that case SDL window will be scaled to
fit the actual desktop size).

It looks to me like an incorrect usage or assumptions
about SDL window, or maybe SDL bug.  Mine is
libsdl1.2debian 1.2.13-2.

And the error message is in usual qemu style - not at
all useful :)  But it's at least better than pure
exit without any messages at all, like was in hugetlbfs
code ;)

Thanks!

/mjt



Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Daniel P. Berrange
On Tue, May 25, 2010 at 05:35:53PM +0200, Juan Quintela wrote:
> Anthony Liguori  wrote:
> 
> >> +Example:
> >> +
> >> +{ "event": "MIGRATION_FAILED",
> >> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> >> +
> >> +MIGRATION_STARTED
> >> +-
> >> +
> >> +Emitted when migration starts (both in source and target).
> >>
> >
> > I think this makes more sense as a MIGRATION_CONNECTED event.  It
> > probably should carry peer information too.
> 
> What kind of peer information?
> 
> We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
> don't care.  But adding information?  Notice that the management
> application knows what it did, I can put the:
> 
>  "exec: gzip -d < /tmp/foo"
> 
> string, but not much more that I can put here.

This is why I think network client CONNECT/DISCONNECT events should be
separate from MIGRATION START/END events. They happen to occur at roughly
the same time if using a TCP / UNIX socket based migration transport,
but CONNECT/DISCONNECT + peer info is meaningless for exec or fd based
migration.


Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Daniel P. Berrange
On Tue, May 25, 2010 at 10:09:55AM -0500, Anthony Liguori wrote:
> On 05/25/2010 09:21 AM, Juan Quintela wrote:
> >They are emitted when migration starts, ends, has a failure or is canceled.
> >
> >+Data: None
> >+
> >+Example:
> >+
> >+{ "event": "MIGRATION_CANCELED",
> >+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> >+
> >+MIGRATION_ENDED
> >+---
> >+
> >+Emitted when migration ends (both in source and target)
> >   
> 
> A start event is going to be generated already, no?
> 
> >+Data: None
> >+
> >+Example:
> >+
> >+{ "event": "MIGRATION_ENDED",
> >+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> >+
> >+MIGRATION_FAILED
> >+
> >+
> >+Emitted when migration fails (both is source and target).
> >+
> >+Data: None
> >   
> 
> There should be some information about why it failed, no? Preferrably in 
> a QError format.
> 
> >+Example:
> >+
> >+{ "event": "MIGRATION_FAILED",
> >+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
> >+
> >+MIGRATION_STARTED
> >+-
> >+
> >+Emitted when migration starts (both in source and target).
> >   
> 
> I think this makes more sense as a MIGRATION_CONNECTED event.  It 
> probably should carry peer information too.

FYI the original request for these events from a libvirt POV
for in terms of identifying the lifecycle transitions.

Currently we issue a migration commend on source, and then have
to poll very frequently on 'info migrate' to get progress stats,
and to determine completion. We want to poll much less frequently
for stats, and get async notification of completion/errors on the
source. 

Simiarly on the destination, we need to know when any migration
operation is taking place, so we can avoid issuing monitor
commands to the QEMU process during that time, and also track
success/failure + eventually get progress information via an
equivalent of 'info migrate' on destination.

So this is really focused on lifecycle transitions, rather than
network client connections. I'm not convinced that we should mix
up the two sorts of data. If we want to track network client
connections IMHO they ought to be separate events. Perhaps
there should be a generic QEMU  network CONNECT/DISCONNECT
event that works for all QEMU network sockets (migration, chardevs,
netdev sockets, vnc, spice, and whatever we invent in future using
sockets).


Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Juan Quintela
Anthony Liguori  wrote:
> On 05/25/2010 09:21 AM, Juan Quintela wrote:

>> +MIGRATION_CANCELED
>> +--
>> +
>> +Emitted when migration is canceled.  This is emitted in the source.
>> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
>> +and CANCELED migration for target).
>>
>
> But the management tool is the one that cancels so surely, it knows
> why already.

ok, then that one is ok.

>
>> +Data: None
>> +
>> +Example:
>> +
>> +{ "event": "MIGRATION_CANCELED",
>> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
>> +
>> +MIGRATION_ENDED
>> +---
>> +
>> +Emitted when migration ends (both in source and target)
>>
>
> A start event is going to be generated already, no?

problem here is that libvirt start target with -S, and waits to do the
"cont" as soon as possible.  As of know, only way to do it is to poll
info migrate on source faster.

>> +Data: None
>> +
>> +Example:
>> +
>> +{ "event": "MIGRATION_ENDED",
>> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
>> +
>> +MIGRATION_FAILED
>> +
>> +
>> +Emitted when migration fails (both is source and target).
>> +
>> +Data: None
>>
>
> There should be some information about why it failed, no? Preferrably
> in a QError format.

At this point, we have basically -1 :(

I can add a field with an error number, but we are very bad at the
moment about moving errno's upstack.

>> +Example:
>> +
>> +{ "event": "MIGRATION_FAILED",
>> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
>> +
>> +MIGRATION_STARTED
>> +-
>> +
>> +Emitted when migration starts (both in source and target).
>>
>
> I think this makes more sense as a MIGRATION_CONNECTED event.  It
> probably should carry peer information too.

What kind of peer information?

We have tcp/fd/exec/unix migrations.  calling it CONNECTED vs STARTED, I
don't care.  But adding information?  Notice that the management
application knows what it did, I can put the:

 "exec: gzip -d < /tmp/foo"

string, but not much more that I can put here.

Later, Juan.



[Qemu-devel] [Bug 585449] Re: 'Broken pipe' error when starting a vm

2010-05-25 Thread Duncan Ferguson

** Attachment added: "virt-manager.log"
   http://launchpadlibrarian.net/49083168/virt-manager.log

-- 
'Broken pipe' error when starting a vm
https://bugs.launchpad.net/bugs/585449
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
Occasionally get an error 'libvir: Remote error : cannot send data: Broken 
pipe' scrolling down the screen when starting a VM.  VM does not start and end 
up having to reboot server to be able to get vm going again

virsh -q -c qemu:///system  start dev32
libvir: QEMU error : server closed connection
libvir: Remote error : cannot send data: Broken pipe
libvir: Remote error : cannot send data: Broken pipe
...last line keep being printed every second or so...

Trying to then use virt-manager produced the attached log file

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 10.04 LTS
Release:10.04
Codename:   lucid

$ dpkg -l | grep kvm
ii  kvm
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from 
kvm to qemu-
ii  kvm-pxe5.4.4-1ubuntu1   
   PXE ROM's for KVM
ii  qemu-kvm   0.12.3+noroms-0ubuntu9   
   Full virtualization on i386 and amd64 hardwa

# cat/var/log/libvirt/qemu/dev32.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus 
char device redirected to /dev/pts/5
pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus 
char device redirected to /dev/pts/10

# kvm-img info /srv/vm_disks/dev32/disk0.qcow2
image: /srv/vm_disks/dev32/disk0.qcow2
file format: qcow2
virtual size: 16G (17179869184 bytes)
disk size: 8.2G
cluster_size: 65536


What other information would be helpful?





[Qemu-devel] [Bug 585449] [NEW] 'Broken pipe' error when starting a vm

2010-05-25 Thread Duncan Ferguson
Public bug reported:

Occasionally get an error 'libvir: Remote error : cannot send data:
Broken pipe' scrolling down the screen when starting a VM.  VM does not
start and end up having to reboot server to be able to get vm going
again

virsh -q -c qemu:///system  start dev32
libvir: QEMU error : server closed connection
libvir: Remote error : cannot send data: Broken pipe
libvir: Remote error : cannot send data: Broken pipe
...last line keep being printed every second or so...

Trying to then use virt-manager produced the attached log file

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 10.04 LTS
Release:10.04
Codename:   lucid

$ dpkg -l | grep kvm
ii  kvm
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from 
kvm to qemu-
ii  kvm-pxe5.4.4-1ubuntu1   
   PXE ROM's for KVM
ii  qemu-kvm   0.12.3+noroms-0ubuntu9   
   Full virtualization on i386 and amd64 hardwa

# cat/var/log/libvirt/qemu/dev32.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus 
char device redirected to /dev/pts/5
pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus 
char device redirected to /dev/pts/10

# kvm-img info /srv/vm_disks/dev32/disk0.qcow2
image: /srv/vm_disks/dev32/disk0.qcow2
file format: qcow2
virtual size: 16G (17179869184 bytes)
disk size: 8.2G
cluster_size: 65536


What other information would be helpful?

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
'Broken pipe' error when starting a vm
https://bugs.launchpad.net/bugs/585449
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
Occasionally get an error 'libvir: Remote error : cannot send data: Broken 
pipe' scrolling down the screen when starting a VM.  VM does not start and end 
up having to reboot server to be able to get vm going again

virsh -q -c qemu:///system  start dev32
libvir: QEMU error : server closed connection
libvir: Remote error : cannot send data: Broken pipe
libvir: Remote error : cannot send data: Broken pipe
...last line keep being printed every second or so...

Trying to then use virt-manager produced the attached log file

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 10.04 LTS
Release:10.04
Codename:   lucid

$ dpkg -l | grep kvm
ii  kvm
1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9  dummy transitional pacakge from 
kvm to qemu-
ii  kvm-pxe5.4.4-1ubuntu1   
   PXE ROM's for KVM
ii  qemu-kvm   0.12.3+noroms-0ubuntu9   
   Full virtualization on i386 and amd64 hardwa

# cat/var/log/libvirt/qemu/dev32.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d
ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait 
-monitor chardev:monitor 
-localtime -boot c -drive 
file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive 
file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c
drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net 
tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
-paralle
l none -usb -usbdevice tablet

Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Anthony Liguori

On 05/25/2010 09:21 AM, Juan Quintela wrote:

They are emitted when migration starts, ends, has a failure or is canceled.

Signed-off-by: Juan Quintela
---
  QMP/qmp-events.txt |   50 ++
  monitor.c  |   12 
  monitor.h  |4 
  3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 01ec85f..93caa4d 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -26,6 +26,56 @@ Example:
  Note: If action is "stop", a STOP event will eventually follow the
  BLOCK_IO_ERROR event.
   




+MIGRATION_CANCELED
+--
+
+Emitted when migration is canceled.  This is emitted in the source.
+Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
+and CANCELED migration for target).
   


But the management tool is the one that cancels so surely, it knows why 
already.



+Data: None
+
+Example:
+
+{ "event": "MIGRATION_CANCELED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_ENDED
+---
+
+Emitted when migration ends (both in source and target)
   


A start event is going to be generated already, no?


+Data: None
+
+Example:
+
+{ "event": "MIGRATION_ENDED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_FAILED
+
+
+Emitted when migration fails (both is source and target).
+
+Data: None
   


There should be some information about why it failed, no? Preferrably in 
a QError format.



+Example:
+
+{ "event": "MIGRATION_FAILED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_STARTED
+-
+
+Emitted when migration starts (both in source and target).
   


I think this makes more sense as a MIGRATION_CONNECTED event.  It 
probably should carry peer information too.


Regards,

Anthony Liguori


+Data: None
+
+Example:
+
+{ "event": "MIGRATION_STARTED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
  RESET
  -

diff --git a/monitor.c b/monitor.c
index ad50f12..5158780 100644
--- a/monitor.c
+++ b/monitor.c
@@ -444,6 +444,18 @@ void monitor_protocol_event(MonitorEvent event, QObject 
*data)
  case QEVENT_WATCHDOG:
  event_name = "WATCHDOG";
  break;
+case QEVENT_MIGRATION_STARTED:
+event_name = "MIGRATION_STARTED";
+break;
+case QEVENT_MIGRATION_ENDED:
+event_name = "MIGRATION_ENDED";
+break;
+case QEVENT_MIGRATION_FAILED:
+event_name = "MIGRATION_FAILED";
+break;
+case QEVENT_MIGRATION_CANCELED:
+event_name = "MIGRATION_CANCELED";
+break;
  default:
  abort();
  break;
diff --git a/monitor.h b/monitor.h
index ea15469..34bcd38 100644
--- a/monitor.h
+++ b/monitor.h
@@ -28,6 +28,10 @@ typedef enum MonitorEvent {
  QEVENT_BLOCK_IO_ERROR,
  QEVENT_RTC_CHANGE,
  QEVENT_WATCHDOG,
+QEVENT_MIGRATION_STARTED,
+QEVENT_MIGRATION_ENDED,
+QEVENT_MIGRATION_FAILED,
+QEVENT_MIGRATION_CANCELED,
  QEVENT_MAX,
  } MonitorEvent;

   





Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:21 AM, Andre Przywara wrote:

What's the benefit of exposing this information to the guest?


That is mostly to propagate the cache size and organization parameters 
to the guest:

>> +/* safe CPUID leafs to propagate to guest if -cpu host is specified
>> + * Intel defined leafs:
>> + * Cache descriptors (0x02)
>> + * Deterministic cache parameters (0x04)
>> + * Monitor/MWAIT parameters (0x05)
>> + *
>> + * AMD defined leafs:
>> + * L1 Cache and TLB (0x05)
>> + * L2+L3 TLB (0x06)
>> + * LongMode address size (0x08)
>> + * 1GB page TLB (0x19)
>> + * Performance optimization (0x1A)
>> + */
Since at least L1 and L2 caches are mostly private to vCPUs, I see no 
reason to disguise them.


But in practice, what is it useful for?  Just because we can expose it 
doesn't mean we should.


Regards,

Anthony Liguori


Regards,
Andre.






Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 05:09 PM, Kevin Wolf wrote:



The first part of your argument may be true, but the second isn't.  No
user can run upstream qemu.git.  It's not tested or supported, and has
no backwards compatibility guarantees.
 

The second part was basically meant to say "developers don't count here".
   


Agreed.

--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 05:03 PM, Anthony Liguori wrote:

On 05/25/2010 08:55 AM, Avi Kivity wrote:

On 05/25/2010 04:53 PM, Kevin Wolf wrote:


I'm still not convinced that we need either. I share Christoph's 
concern

that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.



The first part of your argument may be true, but the second isn't.  
No user can run upstream qemu.git.  It's not tested or supported, and 
has no backwards compatibility guarantees.


Yes, it does have backwards compatibility guarantees.


I meant a random untagged qemu.git snapshot.  Do we guarantee anything 
about it, except that it's likely to be broken?


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 05:05 PM, Anthony Liguori wrote:

On 05/25/2010 09:01 AM, Avi Kivity wrote:

On 05/25/2010 04:55 PM, Anthony Liguori wrote:

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more 
robust in the face of poorly written block backends so if it's 
possible to make it perform as well as a plugin, it's a 
preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very 
strong argument as to why a shared memory mechanism was not 
possible or at least plausible.


I'm not sure I understand why shared memory is such a bad thing 
wrt KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the with of memory hotplug.  What else is 
there?




Is it that fork() doesn't work or is it that fork() is very expensive?


It doesn't work, fork() is done at block device creation time, which 
freezes the child memory map, while guest memory is allocated at 
hotplug time.


Now I'm confused.  I thought you were saying shared memory somehow 
affects fork().  If you're talking about shared memory inheritance via 
fork(), that's less important. 


The latter.  Why is it less important?  If you don't inherit the memory, 
you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 10:00 AM, Avi Kivity wrote:
The latter.  Why is it less important?  If you don't inherit the 
memory, you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


What's wrong with /dev/shm memory?

Regards,

Anthony Liguori





[Qemu-devel] KVM call minutes for May 25

2010-05-25 Thread Chris Wright
Generic Asynchronous task offloading
- keep vcpu thread from blocking
- generic approach is useful, comes down to specifics
  - e.g. what is done in worker threads, how locking is handled
- offload blocking work to worker threads
- need to make device model reentrant 
  - can be simple w/ lock per device, but needs finer grained to be perfromant
  - but needs
- thread pool should be relatively easy to merge
- two alternatives for virtfs
  - one full request in thread, one that does syscalls asynchronously
  - need to have something demonstrably better to make threading acceptable
- complexity of state machine vs. locking
  - performance difference at the end (e.g. coarse grained locking
doesn't fully allow for parallel execution)
- lapic needs to be reentrant
  - in qemu lapic should show off the benefit very well
- hpet would also benefit from being threaded
- multiple threads per device (needs device specific locking)
  - qxl, scsi, virtfs...
- push global lock down to pio/mmio dispatch
  - audit for global qemu state changes done w/in device
- anthony will set up wiki page to describe making device models reentrant
  - looking for volunteers!



[Qemu-devel] [PATCH 5/5] QMP: Emit migration events on outgoing migration

2010-05-25 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration-exec.c |3 ++-
 migration-fd.c   |1 +
 migration-tcp.c  |2 ++
 migration-unix.c |2 ++
 migration.c  |5 +
 5 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 07af11a..ebc9256 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -20,6 +20,7 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "monitor.h"

 //#define DEBUG_MIGRATION_EXEC

@@ -91,9 +92,9 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon,
 s->mig_state.shared = inc;

 s->state = MIG_STATE_ACTIVE;
+monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL);
 s->mon = NULL;
 s->bandwidth_limit = bandwidth_limit;
-
 if (!detach) {
 migrate_fd_monitor_suspend(s, mon);
 }
diff --git a/migration-fd.c b/migration-fd.c
index 6d14505..9c4c7ae 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -83,6 +83,7 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,
 s->mig_state.blk = blk;
 s->mig_state.shared = inc;

+monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL);
 s->state = MIG_STATE_ACTIVE;
 s->mon = NULL;
 s->bandwidth_limit = bandwidth_limit;
diff --git a/migration-tcp.c b/migration-tcp.c
index 20f2e37..11a1203 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,7 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "monitor.h"

 //#define DEBUG_MIGRATION_TCP

@@ -102,6 +103,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 s->mig_state.blk = blk;
 s->mig_state.shared = inc;

+monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL);
 s->state = MIG_STATE_ACTIVE;
 s->mon = NULL;
 s->bandwidth_limit = bandwidth_limit;
diff --git a/migration-unix.c b/migration-unix.c
index 57232c0..08f29a3 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -18,6 +18,7 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "monitor.h"

 //#define DEBUG_MIGRATION_UNIX

@@ -101,6 +102,7 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon,
 s->mig_state.blk = blk;
 s->mig_state.shared = inc;

+monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL);
 s->state = MIG_STATE_ACTIVE;
 s->mon = NULL;
 s->bandwidth_limit = bandwidth_limit;
diff --git a/migration.c b/migration.c
index 32470d5..86535a9 100644
--- a/migration.c
+++ b/migration.c
@@ -306,6 +306,7 @@ void migrate_fd_monitor_suspend(FdMigrationState *s, 
Monitor *mon)
 void migrate_fd_error(FdMigrationState *s)
 {
 DPRINTF("setting error state\n");
+monitor_protocol_event(QEVENT_MIGRATION_FAILED, NULL);
 s->state = MIG_STATE_ERROR;
 migrate_fd_cleanup(s);
 }
@@ -403,8 +404,10 @@ void migrate_fd_put_ready(void *opaque)
 if (old_vm_running) {
 vm_start();
 }
+monitor_protocol_event(QEVENT_MIGRATION_FAILED, NULL);
 state = MIG_STATE_ERROR;
 } else {
+monitor_protocol_event(QEVENT_MIGRATION_ENDED, NULL);
 state = MIG_STATE_COMPLETED;
 }
 migrate_fd_cleanup(s);
@@ -427,6 +430,7 @@ void migrate_fd_cancel(MigrationState *mig_state)

 DPRINTF("cancelling migration\n");

+monitor_protocol_event(QEVENT_MIGRATION_CANCELED, NULL);
 s->state = MIG_STATE_CANCELLED;
 qemu_savevm_state_cancel(s->mon, s->file);

@@ -440,6 +444,7 @@ void migrate_fd_release(MigrationState *mig_state)
 DPRINTF("releasing state\n");

 if (s->state == MIG_STATE_ACTIVE) {
+monitor_protocol_event(QEVENT_MIGRATION_CANCELED, NULL);
 s->state = MIG_STATE_CANCELLED;
 migrate_fd_cleanup(s);
 }
-- 
1.6.6.1




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread MORITA Kazutaka
At Mon, 24 May 2010 14:16:32 -0500,
Anthony Liguori wrote:
> 
> On 05/24/2010 06:56 AM, Avi Kivity wrote:
> > On 05/24/2010 02:42 PM, MORITA Kazutaka wrote:
> >>
> >>> The server would be local and talk over a unix domain socket, perhaps
> >>> anonymous.
> >>>
> >>> nbd has other issues though, such as requiring a copy and no support 
> >>> for
> >>> metadata operations such as snapshot and file size extension.
> >>>
> >> Sorry, my explanation was unclear.  I'm not sure how running servers
> >> on localhost can solve the problem.
> >
> > The local server can convert from the local (nbd) protocol to the 
> > remote (sheepdog, ceph) protocol.
> >
> >> What I wanted to say was that we cannot specify the image of VM. With
> >> nbd protocol, command line arguments are as follows:
> >>
> >>   $ qemu nbd:hostname:port
> >>
> >> As this syntax shows, with nbd protocol the client cannot pass the VM
> >> image name to the server.
> >
> > We would extend it to allow it to connect to a unix domain socket:
> >
> >   qemu nbd:unix:/path/to/socket
> 
> nbd is a no-go because it only supports a single, synchronous I/O 
> operation at a time and has no mechanism for extensibility.
> 
> If we go this route, I think two options are worth considering.  The 
> first would be a purely socket based approach where we just accepted the 
> extra copy.
> 
> The other potential approach would be shared memory based.  We export 
> all guest ram as shared memory along with a small bounce buffer pool.  
> We would then use a ring queue (potentially even using virtio-blk) and 
> an eventfd for notification.
> 

The shared memory approach assumes that there is a local server who
can talk with the storage system.  But Ceph doesn't require the local
server, and Sheepdog would be extended to support VMs running outside
the storage system.  We could run a local daemon who can only work as
proxy, but I don't think it looks a clean approach.  So I think a
socket based approach is the right way to go.

BTW, is it required to design a common interface?  The way Sheepdog
replicates data is different from Ceph, so I think it is not possible
to define a common protocol as Christian says.

Regards,

Kazutaka

> > The server at the other end would associate the socket with a filename 
> > and forward it to the server using the remote protocol.
> >
> > However, I don't think nbd would be a good protocol.  My preference 
> > would be for a plugin API, or for a new local protocol that uses 
> > splice() to avoid copies.
> 
> I think a good shared memory implementation would be preferable to 
> plugins.  I think it's worth attempting to do a plugin interface for the 
> block layer but I strongly suspect it would not be sufficient.
> 
> I would not want to see plugins that interacted with BlockDriverState 
> directly, for instance.  We change it far too often.  Our main loop 
> functions are also not terribly stable so I'm not sure how we would 
> handle that (unless we forced all block plugins to be in a separate thread).
> 



[Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events

2010-05-25 Thread Juan Quintela
They are emitted when migration starts, ends, has a failure or is canceled.

Signed-off-by: Juan Quintela 
---
 QMP/qmp-events.txt |   50 ++
 monitor.c  |   12 
 monitor.h  |4 
 3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 01ec85f..93caa4d 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -26,6 +26,56 @@ Example:
 Note: If action is "stop", a STOP event will eventually follow the
 BLOCK_IO_ERROR event.

+MIGRATION_CANCELED
+--
+
+Emitted when migration is canceled.  This is emitted in the source.
+Target will emit MIGRATION_FAILED (no way to differentiate a FAILED
+and CANCELED migration for target).
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_CANCELED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_ENDED
+---
+
+Emitted when migration ends (both in source and target)
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_ENDED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_FAILED
+
+
+Emitted when migration fails (both is source and target).
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_FAILED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
+MIGRATION_STARTED
+-
+
+Emitted when migration starts (both in source and target).
+
+Data: None
+
+Example:
+
+{ "event": "MIGRATION_STARTED",
+"timestamp": {"seconds": 1274687575, "microseconds": 592483} }
+
 RESET
 -

diff --git a/monitor.c b/monitor.c
index ad50f12..5158780 100644
--- a/monitor.c
+++ b/monitor.c
@@ -444,6 +444,18 @@ void monitor_protocol_event(MonitorEvent event, QObject 
*data)
 case QEVENT_WATCHDOG:
 event_name = "WATCHDOG";
 break;
+case QEVENT_MIGRATION_STARTED:
+event_name = "MIGRATION_STARTED";
+break;
+case QEVENT_MIGRATION_ENDED:
+event_name = "MIGRATION_ENDED";
+break;
+case QEVENT_MIGRATION_FAILED:
+event_name = "MIGRATION_FAILED";
+break;
+case QEVENT_MIGRATION_CANCELED:
+event_name = "MIGRATION_CANCELED";
+break;
 default:
 abort();
 break;
diff --git a/monitor.h b/monitor.h
index ea15469..34bcd38 100644
--- a/monitor.h
+++ b/monitor.h
@@ -28,6 +28,10 @@ typedef enum MonitorEvent {
 QEVENT_BLOCK_IO_ERROR,
 QEVENT_RTC_CHANGE,
 QEVENT_WATCHDOG,
+QEVENT_MIGRATION_STARTED,
+QEVENT_MIGRATION_ENDED,
+QEVENT_MIGRATION_FAILED,
+QEVENT_MIGRATION_CANCELED,
 QEVENT_MAX,
 } MonitorEvent;

-- 
1.6.6.1




[Qemu-devel] [Bug 585113] Re: e1000 irq problems after live migration with qemu-kvm 0.12.4

2010-05-25 Thread Peter Lieven
Additional Info:

1) If I use rtl8139 instead of e1000 NIC driver. The VM freezes at 100% CPU 
after migration
2) Ubuntu Lucid LTS 64-bit Server is also affected and shows same symtomps

-- 
e1000 irq problems after live migration with qemu-kvm 0.12.4 
https://bugs.launchpad.net/bugs/585113
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
sorry for resubmitting. i accidently moved this bug to qemu-kvm at launchpad 
where it is stuck...

After live migrating ubuntu 9.10 server (2.6.31-14-server) and suse linux 10.1 
(2.6.16.13-4-smp)
it happens sometimes that the guest runs into irq problems. i mention these 2 
guest oss
since i have seen the error there. there are likely others around with the same 
problem.

on the host i run 2.6.33.3 (kernel+mod) and qemu-kvm 0.12.4.

i started a vm with:
/usr/bin/qemu-kvm-0.12.4  -net tap,vlan=141,script=no,downscript=no,ifname=tap0 
-net nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72   -drive 
file=/dev/sdb,if=ide,boot=on,cache=none,aio=native  -m 1024 -cpu 
qemu64,model_id='Intel(R) Xeon(R) CPU   E5430  @ 2.66GHz'  -monitor 
tcp:0:4001,server,nowait -vnc :1 -name 'migration-test-9-10'  -boot 
order=dc,menu=on  -k de  -incoming tcp:172.21.55.22:5001  -pidfile 
/var/run/qemu/vm-155.pid  -mem-path /hugepages -mem-prealloc  -rtc 
base=utc,clock=host -usb -usbdevice tablet 

for testing i have a clean ubuntu 9.10 server 64-bit install and created a 
small script with fetches a dvd iso from a local server and checking md5sum in 
an endless loop.

the download performance is approx. 50MB/s on that vm.

to trigger the error i did several migrations of the vm throughout the last 
days. finally I ended up in the following oops in the guest:

[64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option)
[64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server #48-Ubuntu
[64442.299179] Call Trace:
[64442.299185][] __report_bad_irq+0x26/0xa0
[64442.299227]  [] note_interrupt+0x18c/0x1d0
[64442.299232]  [] handle_fasteoi_irq+0xd5/0x100
[64442.299244]  [] handle_irq+0x1d/0x30
[64442.299246]  [] do_IRQ+0x67/0xe0
[64442.299249]  [] ret_from_intr+0x0/0x11
[64442.299266]  [] ? handle_IRQ_event+0x24/0x160
[64442.299269]  [] ? handle_edge_irq+0xcf/0x170
[64442.299271]  [] ? handle_irq+0x1d/0x30
[64442.299273]  [] ? do_IRQ+0x67/0xe0
[64442.299275]  [] ? ret_from_intr+0x0/0x11
[64442.299290]  [] ? _spin_unlock_irqrestore+0x14/0x20
[64442.299302]  [] ? scsi_dispatch_cmd+0x16c/0x2d0
[64442.299307]  [] ? scsi_request_fn+0x3aa/0x500
[64442.299322]  [] ? __blk_run_queue+0x6c/0x150
[64442.299324]  [] ? blk_run_queue+0x2b/0x50
[64442.299327]  [] ? scsi_run_queue+0xcf/0x2a0
[64442.299336]  [] ? scsi_next_command+0x3d/0x60
[64442.299338]  [] ? scsi_end_request+0xab/0xb0
[64442.299340]  [] ? scsi_io_completion+0x9e/0x4d0
[64442.299348]  [] ? default_spin_lock_flags+0x9/0x10
[64442.299351]  [] ? scsi_finish_command+0xbd/0x130
[64442.299353]  [] ? scsi_softirq_done+0x145/0x170
[64442.299356]  [] ? blk_done_softirq+0x7d/0x90
[64442.299368]  [] ? __do_softirq+0xbd/0x200
[64442.299370]  [] ? call_softirq+0x1c/0x30
[64442.299372]  [] ? do_softirq+0x55/0x90
[64442.299374]  [] ? irq_exit+0x85/0x90
[64442.299376]  [] ? do_IRQ+0x70/0xe0
[64442.299379]  [] ? ret_from_intr+0x0/0x11
[64442.299380][] ? native_safe_halt+0x6/0x10
[64442.299390]  [] ? default_idle+0x4c/0xe0
[64442.299395]  [] ? atomic_notifier_call_chain+0x15/0x20
[64442.299398]  [] ? cpu_idle+0xb2/0x100
[64442.299406]  [] ? rest_init+0x66/0x70
[64442.299424]  [] ? start_kernel+0x352/0x35b
[64442.299427]  [] ? x86_64_start_reservations+0x125/0x129
[64442.299429]  [] ? x86_64_start_kernel+0xfa/0x109
[64442.299433] handlers:
[64442.299840] [] (e1000_intr+0x0/0x190 [e1000])
[64442.300046] Disabling IRQ #10

After this the guest is still allive, but download performance is down to 
approx. 500KB/s

This error is definetly not triggerable with option -no-kvm-irqchip. I have 
seen this error occasionally
since my first experiments with qemu-kvm-88 and also without hugetablefs.

Help appreciated.





Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
Am 25.05.2010 15:25, schrieb Avi Kivity:
> On 05/25/2010 04:17 PM, Anthony Liguori wrote:
>> On 05/25/2010 04:14 AM, Avi Kivity wrote:
>>> On 05/24/2010 10:38 PM, Anthony Liguori wrote:

> - Building a plugin API seems a bit simpler to me, although I'm to
> sure if I'd get the
>idea correctly:
>The block layer has already some kind of api (.bdrv_file_open, 
> .bdrv_read). We
>could simply compile the block-drivers as shared objects and 
> create a method
>for loading the necessary modules at runtime.

 That approach would be a recipe for disaster.   We would have to 
 introduce a new, reduced functionality block API that was supported 
 for plugins.  Otherwise, the only way a plugin could keep up with 
 our API changes would be if it was in tree which defeats the purpose 
 of having plugins.
>>>
>>> We could guarantee API/ABI stability in a stable branch but not 
>>> across releases.
>>
>> We have releases every six months.  There would be tons of block 
>> plugins that didn't work for random sets of releases.  That creates a 
>> lot of user confusion and unhappiness.
> 
> The current situation is that those block format drivers only exist in 
> qemu.git or as patches.  Surely that's even more unhappiness.

The difference is that in the current situation these drivers will be
part of the next qemu release, so the patch may be obsolete, but you
don't even need it any more.

If you start keeping block drivers outside qemu and not even try
integrating them, they'll stay external.

> Confusion could be mitigated:
> 
>$ qemu -module my-fancy-block-format-driver.so
>my-fancy-block-format-driver.so does not support this version of qemu 
> (0.19.2).  Please contact my-fancy-block-format-driver-de...@example.org.
> 
> The question is how many such block format drivers we expect.  We now 
> have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
> we'll want an lvm2 driver and btrfs driver.  This is an area with a lot 
> of activity and a relatively simply interface.

What's the reason for not having these drivers upstream? Do we gain
anything by hiding them from our users and requiring them to install the
drivers separately from somewhere else?

Kevin



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:54 PM, Anthony Liguori wrote:

On 05/25/2010 08:36 AM, Avi Kivity wrote:


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due do lots of copying.  Also needs a snapshot API.


The kernel could use splice.


Still can't make guest memory appear in (A)BUSE process memory without 
either mmu tricks (vmsplice in reverse) or a copy.  May be workable for 
an (A)BUSE driver that talks over a network, and thus can splice() its 
way out.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH 4/5] QMP: Emit migration events on incoming migration

2010-05-25 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/migration.c b/migration.c
index c388902..32470d5 100644
--- a/migration.c
+++ b/migration.c
@@ -60,10 +60,13 @@ int qemu_start_incoming_migration(const char *uri)

 void process_incoming_migration(QEMUFile *f)
 {
+monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL);
 if (qemu_loadvm_state(f) < 0) {
+monitor_protocol_event(QEVENT_MIGRATION_FAILED, NULL);
 fprintf(stderr, "load of migration failed\n");
 exit(0);
 }
+monitor_protocol_event(QEVENT_MIGRATION_ENDED, NULL);
 qemu_announce_self();
 DPRINTF("successfully loaded vm state\n");

-- 
1.6.6.1




[Qemu-devel] [PATCH 1/5] Exit if incoming migration fails

2010-05-25 Thread Juan Quintela
Signed-off-by: Juan Quintela 
---
 migration.c |   16 ++--
 migration.h |2 +-
 vl.c|7 ++-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/migration.c b/migration.c
index 05f6cc5..9c1d4b6 100644
--- a/migration.c
+++ b/migration.c
@@ -36,22 +36,26 @@ static uint32_t max_throttle = (32 << 20);

 static MigrationState *current_migration;

-void qemu_start_incoming_migration(const char *uri)
+int qemu_start_incoming_migration(const char *uri)
 {
 const char *p;
+int ret;

 if (strstart(uri, "tcp:", &p))
-tcp_start_incoming_migration(p);
+ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
 else if (strstart(uri, "exec:", &p))
-exec_start_incoming_migration(p);
+ret =  exec_start_incoming_migration(p);
 else if (strstart(uri, "unix:", &p))
-unix_start_incoming_migration(p);
+ret = unix_start_incoming_migration(p);
 else if (strstart(uri, "fd:", &p))
-fd_start_incoming_migration(p);
+ret = fd_start_incoming_migration(p);
 #endif
-else
+else {
 fprintf(stderr, "unknown migration protocol: %s\n", uri);
+ret = -EPROTONOSUPPORT;
+}
+return ret;
 }

 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
diff --git a/migration.h b/migration.h
index 385423f..dd423a1 100644
--- a/migration.h
+++ b/migration.h
@@ -50,7 +50,7 @@ struct FdMigrationState
 void *opaque;
 };

-void qemu_start_incoming_migration(const char *uri);
+int qemu_start_incoming_migration(const char *uri);

 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data);

diff --git a/vl.c b/vl.c
index 328395e..d13440d 100644
--- a/vl.c
+++ b/vl.c
@@ -3823,7 +3823,12 @@ int main(int argc, char **argv, char **envp)
 }

 if (incoming) {
-qemu_start_incoming_migration(incoming);
+int ret = qemu_start_incoming_migration(incoming);
+if (ret < 0) {
+fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
+incoming, ret);
+exit(ret);
+}
 } else if (autostart) {
 vm_start();
 }
-- 
1.6.6.1




[Qemu-devel] [PATCH v2 0/5] Add QMP migration events

2010-05-25 Thread Juan Quintela
v2:
- Address pbonzini and mst changes
  (error messages and doc fixes)

v1:

This series does:

- exit incoming migration on failure.  For exec/fd migrations, once
  there was a failure, there was nothing useful to do.  And for tcp
  migration, not exiting created interesting bugs when trying to
  migrate again to a process with a faild migration.

- Factorize common migration code, no more duplication, makes easier to do
  "global" migration things, like QMP events.

- Introduce QMP events, both for incoming and outgoing migration.


Now, the million dollar question: Why I didn't refactorize outgoing
migration?  I tried, and have it partially done on my local tree.  But
it depends (too much) of current_migration global variable -> Libvirt
folks will also want "info migrate" to work on the incoming side,
i.e. current_migraition has to also be updated on incoming side.  Done
until here, but then I hit the wall "incoming migration is synchronous".

To make the monitor work on incoming migration, we need to change
buffered_file.c abstraction to also work for incoming fd's, or another
similar solution.  I am open to suggestions about what to do here.

This series are quite simple (the unfinished part is more complex),
will send the other part as an RFC later.

Please review and consider to apply it.

Later, Juan.


Juan Quintela (5):
  Exit if incoming migration fails
  Factorize common migration incoming code
  QMP: Introduce MIGRATION events
  QMP: Emit migration events on incoming migration
  QMP: Emit migration events on outgoing migration

 QMP/qmp-events.txt |   50 ++
 migration-exec.c   |   17 +++--
 migration-fd.c |   15 ++-
 migration-tcp.c|   17 -
 migration-unix.c   |   17 -
 migration.c|   37 +++--
 migration.h|4 +++-
 monitor.c  |   12 
 monitor.h  |4 
 vl.c   |7 ++-
 10 files changed, 119 insertions(+), 61 deletions(-)




[Qemu-devel] [PATCH 2/5] Factorize common migration incoming code

2010-05-25 Thread Juan Quintela
Signed-off-by: Juan Quintela 
---
 migration-exec.c |   14 +-
 migration-fd.c   |   14 +-
 migration-tcp.c  |   15 ++-
 migration-unix.c |   15 ++-
 migration.c  |   13 +
 migration.h  |2 ++
 6 files changed, 21 insertions(+), 52 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 5435827..07af11a 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -111,20 +111,8 @@ err_after_alloc:
 static void exec_accept_incoming_migration(void *opaque)
 {
 QEMUFile *f = opaque;
-int ret;

-ret = qemu_loadvm_state(f);
-if (ret < 0) {
-fprintf(stderr, "load of migration failed\n");
-goto err;
-}
-qemu_announce_self();
-DPRINTF("successfully loaded vm state\n");
-
-if (autostart)
-vm_start();
-
-err:
+process_incoming_migration(f);
 qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
 qemu_fclose(f);
 }
diff --git a/migration-fd.c b/migration-fd.c
index 0abd372..6d14505 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,20 +104,8 @@ err_after_alloc:
 static void fd_accept_incoming_migration(void *opaque)
 {
 QEMUFile *f = opaque;
-int ret;

-ret = qemu_loadvm_state(f);
-if (ret < 0) {
-fprintf(stderr, "load of migration failed\n");
-goto err;
-}
-qemu_announce_self();
-DPRINTF("successfully loaded vm state\n");
-
-if (autostart)
-vm_start();
-
-err:
+process_incoming_migration(f);
 qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
 qemu_fclose(f);
 }
diff --git a/migration-tcp.c b/migration-tcp.c
index 95ce722..20f2e37 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -143,7 +143,7 @@ static void tcp_accept_incoming_migration(void *opaque)
 socklen_t addrlen = sizeof(addr);
 int s = (unsigned long)opaque;
 QEMUFile *f;
-int c, ret;
+int c;

 do {
 c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
@@ -162,18 +162,7 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }

-ret = qemu_loadvm_state(f);
-if (ret < 0) {
-fprintf(stderr, "load of migration failed\n");
-goto out_fopen;
-}
-qemu_announce_self();
-DPRINTF("successfully loaded vm state\n");
-
-if (autostart)
-vm_start();
-
-out_fopen:
+process_incoming_migration(f);
 qemu_fclose(f);
 out:
 qemu_set_fd_handler2(s, NULL, NULL, NULL, NULL);
diff --git a/migration-unix.c b/migration-unix.c
index 49de1b9..57232c0 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -149,7 +149,7 @@ static void unix_accept_incoming_migration(void *opaque)
 socklen_t addrlen = sizeof(addr);
 int s = (unsigned long)opaque;
 QEMUFile *f;
-int c, ret;
+int c;

 do {
 c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
@@ -168,18 +168,7 @@ static void unix_accept_incoming_migration(void *opaque)
 goto out;
 }

-ret = qemu_loadvm_state(f);
-if (ret < 0) {
-fprintf(stderr, "load of migration failed\n");
-goto out_fopen;
-}
-qemu_announce_self();
-DPRINTF("successfully loaded vm state\n");
-
-if (autostart)
-vm_start();
-
-out_fopen:
+process_incoming_migration(f);
 qemu_fclose(f);
 out:
 qemu_set_fd_handler2(s, NULL, NULL, NULL, NULL);
diff --git a/migration.c b/migration.c
index 9c1d4b6..c388902 100644
--- a/migration.c
+++ b/migration.c
@@ -58,6 +58,19 @@ int qemu_start_incoming_migration(const char *uri)
 return ret;
 }

+void process_incoming_migration(QEMUFile *f)
+{
+if (qemu_loadvm_state(f) < 0) {
+fprintf(stderr, "load of migration failed\n");
+exit(0);
+}
+qemu_announce_self();
+DPRINTF("successfully loaded vm state\n");
+
+if (autostart)
+vm_start();
+}
+
 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
 MigrationState *s = NULL;
diff --git a/migration.h b/migration.h
index dd423a1..017e9c3 100644
--- a/migration.h
+++ b/migration.h
@@ -50,6 +50,8 @@ struct FdMigrationState
 void *opaque;
 };

+void process_incoming_migration(QEMUFile *f);
+
 int qemu_start_incoming_migration(const char *uri);

 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data);
-- 
1.6.6.1




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:25 AM, Avi Kivity wrote:

On 05/25/2010 04:17 PM, Anthony Liguori wrote:

On 05/25/2010 04:14 AM, Avi Kivity wrote:

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm to
sure if I'd get the
   idea correctly:
   The block layer has already some kind of api (.bdrv_file_open, 
.bdrv_read). We
   could simply compile the block-drivers as shared objects and 
create a method

   for loading the necessary modules at runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with 
our API changes would be if it was in tree which defeats the 
purpose of having plugins.


We could guarantee API/ABI stability in a stable branch but not 
across releases.


We have releases every six months.  There would be tons of block 
plugins that didn't work for random sets of releases.  That creates a 
lot of user confusion and unhappiness.


The current situation is that those block format drivers only exist in 
qemu.git or as patches.  Surely that's even more unhappiness.


Confusion could be mitigated:

  $ qemu -module my-fancy-block-format-driver.so
  my-fancy-block-format-driver.so does not support this version of 
qemu (0.19.2).  Please contact 
my-fancy-block-format-driver-de...@example.org.


The question is how many such block format drivers we expect.  We now 
have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
we'll want an lvm2 driver and btrfs driver.  This is an area with a 
lot of activity and a relatively simply interface.


If we expose a simple interface, I'm all for it.  But BlockDriver is not 
simple and things like the snapshoting API need love.


Of course, there's certainly a question of why we're solving this in 
qemu at all.  Wouldn't it be more appropriate to either (1) implement a 
kernel module for ceph/sheepdog if performance matters or (2) implement 
BUSE to complement FUSE and CUSE to enable proper userspace block devices.


If you want to use a block device within qemu, you almost certainly want 
to be able to manipulate it on the host using standard tools (like mount 
and parted) so it stands to reason that addressing this in the kernel 
makes more sense.


Regards,

Anthony Liguori




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 09:01 AM, Avi Kivity wrote:

On 05/25/2010 04:55 PM, Anthony Liguori wrote:

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust 
in the face of poorly written block backends so if it's possible 
to make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very 
strong argument as to why a shared memory mechanism was not 
possible or at least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the with of memory hotplug.  What else is there?



Is it that fork() doesn't work or is it that fork() is very expensive?


It doesn't work, fork() is done at block device creation time, which 
freezes the child memory map, while guest memory is allocated at 
hotplug time.


Now I'm confused.  I thought you were saying shared memory somehow 
affects fork().  If you're talking about shared memory inheritance via 
fork(), that's less important.  You can also pass /dev/shm fd's via 
SCM_RIGHTs to establish shared memory segments dynamically.


Regards,

Anthony Liguori

fork() actually isn't very expensive since we use MADV_DONTFORK 
(probably fast enough for everything except realtime).


It may be possible to do a processfd() which can be mmap()ed by 
another process to export anonymous memory using mmu notifiers, not 
sure how easy or mergeable that is.







Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust 
in the face of poorly written block backends so if it's possible to 
make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the with of memory hotplug.  What else is there?



Is it that fork() doesn't work or is it that fork() is very expensive?

Regards,

Anthony Liguori



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:57 AM, Avi Kivity wrote:

On 05/25/2010 04:54 PM, Anthony Liguori wrote:

On 05/25/2010 08:36 AM, Avi Kivity wrote:


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due do lots of copying.  Also needs a snapshot API.


The kernel could use splice.


Still can't make guest memory appear in (A)BUSE process memory without 
either mmu tricks (vmsplice in reverse) or a copy.  May be workable 
for an (A)BUSE driver that talks over a network, and thus can splice() 
its way out.


splice() actually takes offset parameter so it may be possible to treat 
that offset parameter as a file offset.  That would essentially allow 
you to implement a splice() based thread pool where splice() replaces 
preadv/pwritev.


It's not quite linux-aio, but it should take you pretty far.   I think 
the main point is that the problem of allowing block plugins to qemu is 
the same as block plugins for the kernel.  The kernel doesn't provide a 
stable interface (and we probably can't for the same reasons) and it's 
generally discourage from a code quality perspective.


That said, making an external program work well as a block backend is 
identical to making userspace block devices fast.


Regards,

Anthony Liguori




Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:31 PM, Anthony Liguori wrote:

On 05/25/2010 08:19 AM, Avi Kivity wrote:

On 05/25/2010 04:03 PM, Anthony Liguori wrote:


I don't think that qdev device names and paths are something we 
have to worry much about changing over time since they reflect 
logical bus layout.  They should remain static provided the 
devices remain static.


Modulo mistakes.  We already saw one (lack of pci domains).  To 
reduce the possibility of mistakes, we need reviewable documentation.



pci domains was only a mistake as a nice-to-have.  We can add pci 
domains in a backwards compatible way.


It adds a new level to the qdev tree.


The tree is not organized like that today.  IOW, the PCI hierarchy is 
not reflected in the qdev hierarchy.  All PCI devices (regardless of 
whether they're a function or a full slot) simply sit below the PCI bus.


That's a bug IMO, but regardless, s/qdev tree/pci device component of 
the qdev path/.



--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:55 PM, Anthony Liguori wrote:

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust 
in the face of poorly written block backends so if it's possible 
to make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the with of memory hotplug.  What else is there?



Is it that fork() doesn't work or is it that fork() is very expensive?


It doesn't work, fork() is done at block device creation time, which 
freezes the child memory map, while guest memory is allocated at hotplug 
time.


fork() actually isn't very expensive since we use MADV_DONTFORK 
(probably fast enough for everything except realtime).


It may be possible to do a processfd() which can be mmap()ed by another 
process to export anonymous memory using mmu notifiers, not sure how 
easy or mergeable that is.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi
On Tue, May 25, 2010 at 2:52 PM, Avi Kivity  wrote:
> Hm.  Perhaps we can convert %{type} to %p for backends which don't support
> it, and to whatever format they do support for those that do.

True.

Stefan



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
Am 25.05.2010 15:55, schrieb Avi Kivity:
> On 05/25/2010 04:53 PM, Kevin Wolf wrote:
>>
>> I'm still not convinced that we need either. I share Christoph's concern
>> that we would make our life harder for almost no gain. It's probably a
>> very small group of users (if it exists at all) that wants to add new
>> block drivers themselves, but at the same time can't run upstream qemu.
>>
>>
> 
> The first part of your argument may be true, but the second isn't.  No 
> user can run upstream qemu.git.  It's not tested or supported, and has 
> no backwards compatibility guarantees.

The second part was basically meant to say "developers don't count here".

Kevin



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:36 AM, Avi Kivity wrote:


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due do lots of copying.  Also needs a snapshot API.


The kernel could use splice.


(ABUSE was proposed a while ago by Zach).

If you want to use a block device within qemu, you almost certainly 
want to be able to manipulate it on the host using standard tools 
(like mount and parted) so it stands to reason that addressing this 
in the kernel makes more sense.


qemu-nbd also allows this.

This reasoning also applies to qcow2, btw.


I know.

Regards,

Anthony Liguori





Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:55 AM, Avi Kivity wrote:

On 05/25/2010 04:53 PM, Kevin Wolf wrote:


I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.



The first part of your argument may be true, but the second isn't.  No 
user can run upstream qemu.git.  It's not tested or supported, and has 
no backwards compatibility guarantees.


Yes, it does have backwards compatibility guarantees.

Regards,

Anthony Liguori




[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:27 PM, Stefan Hajnoczi wrote:

On Tue, May 25, 2010 at 1:04 PM, Avi Kivity  wrote:
   

Those %ps are more or less useless.  We need better ways of identifying
them.
 

You're right, the vq pointer is useless in isolation.  We don't know
which virtio device or which virtqueue number.

With the full context of a trace it would be possible to correlate the
vq pointer if we had trace events for vdev and vq setup.

Adding custom formatters is could be tricky since the format string is
passed only to tracing backends that use it, like UST.  And UST uses
its own sprintf implementation which we don't have direct control
over.
   


Hm.  Perhaps we can convert %{type} to %p for backends which don't 
support it, and to whatever format they do support for those that do.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi
On Tue, May 25, 2010 at 1:04 PM, Avi Kivity  wrote:
> Those %ps are more or less useless.  We need better ways of identifying
> them.

You're right, the vq pointer is useless in isolation.  We don't know
which virtio device or which virtqueue number.

With the full context of a trace it would be possible to correlate the
vq pointer if we had trace events for vdev and vq setup.

Adding custom formatters is could be tricky since the format string is
passed only to tracing backends that use it, like UST.  And UST uses
its own sprintf implementation which we don't have direct control
over.

I think we just need to guarantee that any pointer can be correlated
with previous trace entries that give context for that pointer.

Stefan



[Qemu-devel] Re: [PATCH] add support for protocol driver create_options

2010-05-25 Thread Kevin Wolf
Am 24.05.2010 08:34, schrieb MORITA Kazutaka:
> At Fri, 21 May 2010 18:57:36 +0200,
> Kevin Wolf wrote:
>>
>> Am 20.05.2010 07:36, schrieb MORITA Kazutaka:
>>> +
>>> +/*
>>> + * Append an option list (list) to an option list (dest).
>>> + *
>>> + * If dest is NULL, a new copy of list is created.
>>> + *
>>> + * Returns a pointer to the first element of dest (or the newly allocated 
>>> copy)
>>> + */
>>> +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest,
>>> +QEMUOptionParameter *list)
>>> +{
>>> +size_t num_options, num_dest_options;
>>> +
>>> +num_options = count_option_parameters(dest);
>>> +num_dest_options = num_options;
>>> +
>>> +num_options += count_option_parameters(list);
>>> +
>>> +dest = qemu_realloc(dest, (num_options + 1) * 
>>> sizeof(QEMUOptionParameter));
>>> +
>>> +while (list && list->name) {
>>> +if (get_option_parameter(dest, list->name) == NULL) {
>>> +dest[num_dest_options++] = *list;
>>
>> You need to add a dest[num_dest_options].name = NULL; here. Otherwise
>> the next loop iteration works on uninitialized memory and possibly an
>> unterminated list. I got a segfault for that reason.
>>
> 
> I forgot to add it, sorry.
> Fixed version is below.
> 
> Thanks,
> 
> Kazutaka
> 
> ==
> This patch enables protocol drivers to use their create options which
> are not supported by the format.  For example, protcol drivers can use
> a backing_file option with raw format.
> 
> Signed-off-by: MORITA Kazutaka 

$ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G
Unknown option 'cluster_size'
qemu-img: Invalid options for file format 'qcow2'.

I think you added another num_dest_options++ which shouldn't be there.

Kevin



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:53 PM, Kevin Wolf wrote:


I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.

   


The first part of your argument may be true, but the second isn't.  No 
user can run upstream qemu.git.  It's not tested or supported, and has 
no backwards compatibility guarantees.


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:19 AM, Avi Kivity wrote:

On 05/25/2010 04:03 PM, Anthony Liguori wrote:


I don't think that qdev device names and paths are something we 
have to worry much about changing over time since they reflect 
logical bus layout.  They should remain static provided the devices 
remain static.


Modulo mistakes.  We already saw one (lack of pci domains).  To 
reduce the possibility of mistakes, we need reviewable documentation.



pci domains was only a mistake as a nice-to-have.  We can add pci 
domains in a backwards compatible way.


It adds a new level to the qdev tree.


The tree is not organized like that today.  IOW, the PCI hierarchy is 
not reflected in the qdev hierarchy.  All PCI devices (regardless of 
whether they're a function or a full slot) simply sit below the PCI bus.




The arguments you're making about the importance of backwards 
compatibility and what's needed to strongly guarantee it are equally 
applicable to the live migration protocol.  We really do need to 
formally document the live migration protocol in such a way that it's 
reviewable if we hope to truly make it compatible across versions.


Mostly agreed.  I think live migration has a faster/easier deprecation 
schedule (easier not to support migration from 0.n-k to 0.n than to 
remove qmp support for a feature introduced in 0.n-k when releasing 
0.n).  But that's a minor concern, improving our externally visible 
interface documentation is a good thing and badly needed.




Regards,

Anthony Liguori




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
Am 25.05.2010 15:25, schrieb Anthony Liguori:
> On 05/25/2010 06:25 AM, Avi Kivity wrote:
>> On 05/25/2010 02:02 PM, Kevin Wolf wrote:
>>>

> So could we not standardize a protocol for this that both sheepdog and
> ceph could implement?
 The protocol already exists, nbd.  It doesn't support snapshotting etc.
 but we could extend it.

 But IMO what's needed is a plugin API for the block layer.
>>> What would it buy us, apart from more downstreams and having to maintain
>>> a stable API and ABI?
>>
>> Currently if someone wants to add a new block format, they have to 
>> upstream it and wait for a new qemu to be released.  With a plugin 
>> API, they can add a new block format to an existing, supported qemu.
> 
> Whether we have a plugin or protocol based mechanism to implement block 
> formats really ends up being just an implementation detail.
> 
> In order to implement either, we need to take a subset of block 
> functionality that we feel we can support long term and expose that.  
> Right now, that's basically just querying characteristics (like size and 
> geometry) and asynchronous reads and writes.
> 
> A protocol based mechanism has the advantage of being more robust in the 
> face of poorly written block backends so if it's possible to make it 
> perform as well as a plugin, it's a preferable approach.
> 
> Plugins that just expose chunks of QEMU internal state directly (like 
> BlockDriver) are a really bad idea IMHO.

I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.

But if we were to decide that there's no way around it, I agree with you
that directly exposing the internal API isn't going to work.

Kevin



Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:26 PM, Anthony Liguori wrote:

On 05/25/2010 08:21 AM, Andre Przywara wrote:

What's the benefit of exposing this information to the guest?


That is mostly to propagate the cache size and organization 
parameters to the guest:

>> +/* safe CPUID leafs to propagate to guest if -cpu host is specified
>> + * Intel defined leafs:
>> + * Cache descriptors (0x02)
>> + * Deterministic cache parameters (0x04)
>> + * Monitor/MWAIT parameters (0x05)
>> + *
>> + * AMD defined leafs:
>> + * L1 Cache and TLB (0x05)
>> + * L2+L3 TLB (0x06)
>> + * LongMode address size (0x08)
>> + * 1GB page TLB (0x19)
>> + * Performance optimization (0x1A)
>> + */
Since at least L1 and L2 caches are mostly private to vCPUs, I see no 
reason to disguise them.


But in practice, what is it useful for? 


See my other mail.


Just because we can expose it doesn't mean we should.


What's the point of -cpu host then?

--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:25 PM, Anthony Liguori wrote:
Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin 
API, they can add a new block format to an existing, supported qemu.



Whether we have a plugin or protocol based mechanism to implement 
block formats really ends up being just an implementation detail.


True.

In order to implement either, we need to take a subset of block 
functionality that we feel we can support long term and expose that.  
Right now, that's basically just querying characteristics (like size 
and geometry) and asynchronous reads and writes.


Unfortunately, you're right.

A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to make 
it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.



Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Also, we don't want to expose all of the qemu API.  We should default 
the visibility attribute to "hidden" and expose only select functions, 
perhaps under their own interface.  And no inlines.


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to 
make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the with of memory hotplug.  What else is there?

--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to make 
it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?




Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Also, we don't want to expose all of the qemu API.  We should default 
the visibility attribute to "hidden" and expose only select functions, 
perhaps under their own interface.  And no inlines.


Yeah, if we did plugins, this would be a key requirement.

Regards,

Anthony Liguori



  1   2   >