[Qemu-devel] Tracing concerns for concurrent execution
Hi, I am writing a tracing back-end that logs tracepoint-based traces to qemu-internal global buffers. I had zeroed in on a lockless buffer for logging traces(to prevent slowdowns 'cos of locks held while tracing). However, I'm not sure if qemu threads might need some synchronisation to access the global trace buffer. I think the vcpu threads run lock-step, but this lock-step behaviour is not guaranteed for some other types of threads, such as the aio helper threads ? Also, can there be concurrency issues while tracing dynamic translator ? While I'm doing my study, it would be good to have some pointers on what assumptions can / not be made for concurrent thread execution in qemu, and what could be done to safeguard tracing in such hot-paths ? Regards, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
At Tue, 25 May 2010 10:12:53 -0700 (PDT), Sage Weil wrote: > > On Tue, 25 May 2010, Avi Kivity wrote: > > > What's the reason for not having these drivers upstream? Do we gain > > > anything by hiding them from our users and requiring them to install the > > > drivers separately from somewhere else? > > > > > > > Six months. > > FWIW, we (Ceph) aren't complaining about the 6 month lag time (and I don't > think the Sheepdog guys are either). > I agree. We aren't complaining about it. > From our perspective, the current BlockDriver abstraction is ideal, as it > represents the reality of qemu's interaction with storage. Any 'external' > interface will be inferior to that in one way or another. But either way, > we are perfectly willing to work with you to all to keep in sync with any > future BlockDriver API improvements. It is worth our time investment even > if the API is less stable. > I agree. > The ability to dynamically load a shared object using the existing api > would make development a bit easier, but I'm not convinced it's better for > for users. I think having ceph and sheepdog upstream with qemu will serve > end users best, and we at least are willing to spend the time to help > maintain that code in qemu.git. > I agree. Regards, Kazutaka
[Qemu-devel] migrating guest with msi-x interrupts
Hi, I'm trying to migrate a guest device with MSI-X interrupts. However, the interrupts are not injected into the guest. I've added some tracing to msix.c and it seems that the MSI-X vectors are masked when the guest is resumed (I'm testing with static migration). In particular, in msix.c, msix_is_masked(...) is returning true when the guest resumes which causes msix_set_pending() to be called instead of msix_set_irq(). /* Send an MSI-X message */ void msix_notify(PCIDevice *dev, unsigned vector) { uint8_t *table_entry = dev->msix_table_page + vector * MSIX_ENTRY_SIZE; uint64_t address; uint32_t data; if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) return; if (msix_is_masked(dev, vector)) { msix_set_pending(dev, vector); return; } ... Does migrating a guest device that uses MSI-X require msix_load()/save() to be called explicity in a pre/post_save/load function? Any pointers or comments would be helpful, Cam
Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive
On 05/25/2010 04:01 PM, Aurelien Jarno wrote: I really think this patch can be useful, in my own case when testing debian-installer (I already cache=writeback). In short all that is about developing and testing, as opposed to run a VM in production, can benefit about that. This was one of the original use case of QEMU before KVM arrived. Unless someone can convince me not to do it, I seriously considering applying this patch. There really needs to be an indication in the --help output of what the ramifications of this option are, in the very least. It should also be removable via a ./configure option because no sane distribution should enable this for end users. Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH] add support for protocol driver create_options
At Tue, 25 May 2010 15:43:17 +0200, Kevin Wolf wrote: > > Am 24.05.2010 08:34, schrieb MORITA Kazutaka: > > At Fri, 21 May 2010 18:57:36 +0200, > > Kevin Wolf wrote: > >> > >> Am 20.05.2010 07:36, schrieb MORITA Kazutaka: > >>> + > >>> +/* > >>> + * Append an option list (list) to an option list (dest). > >>> + * > >>> + * If dest is NULL, a new copy of list is created. > >>> + * > >>> + * Returns a pointer to the first element of dest (or the newly > >>> allocated copy) > >>> + */ > >>> +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest, > >>> +QEMUOptionParameter *list) > >>> +{ > >>> +size_t num_options, num_dest_options; > >>> + > >>> +num_options = count_option_parameters(dest); > >>> +num_dest_options = num_options; > >>> + > >>> +num_options += count_option_parameters(list); > >>> + > >>> +dest = qemu_realloc(dest, (num_options + 1) * > >>> sizeof(QEMUOptionParameter)); > >>> + > >>> +while (list && list->name) { > >>> +if (get_option_parameter(dest, list->name) == NULL) { > >>> +dest[num_dest_options++] = *list; > >> > >> You need to add a dest[num_dest_options].name = NULL; here. Otherwise > >> the next loop iteration works on uninitialized memory and possibly an > >> unterminated list. I got a segfault for that reason. > >> > > > > I forgot to add it, sorry. > > Fixed version is below. > > > > Thanks, > > > > Kazutaka > > > > == > > This patch enables protocol drivers to use their create options which > > are not supported by the format. For example, protcol drivers can use > > a backing_file option with raw format. > > > > Signed-off-by: MORITA Kazutaka > > $ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G > Unknown option 'cluster_size' > qemu-img: Invalid options for file format 'qcow2'. > > I think you added another num_dest_options++ which shouldn't be there. > Sorry again. I wrongly added `dest[num_dest_options++].name = NULL;' instead of `dest[num_dest_options].name = NULL;'. Thanks, Kazutaka == This patch enables protocol drivers to use their create options which are not supported by the format. For example, protcol drivers can use a backing_file option with raw format. Signed-off-by: MORITA Kazutaka --- block.c |7 +++ block.h |1 + qemu-img.c| 49 ++--- qemu-option.c | 53 ++--- qemu-option.h |2 ++ 5 files changed, 86 insertions(+), 26 deletions(-) diff --git a/block.c b/block.c index 6e7766a..f881f10 100644 --- a/block.c +++ b/block.c @@ -56,7 +56,6 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num, uint8_t *buf, int nb_sectors); static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, const uint8_t *buf, int nb_sectors); -static BlockDriver *find_protocol(const char *filename); static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); @@ -210,7 +209,7 @@ int bdrv_create_file(const char* filename, QEMUOptionParameter *options) { BlockDriver *drv; -drv = find_protocol(filename); +drv = bdrv_find_protocol(filename); if (drv == NULL) { drv = bdrv_find_format("file"); } @@ -283,7 +282,7 @@ static BlockDriver *find_hdev_driver(const char *filename) return drv; } -static BlockDriver *find_protocol(const char *filename) +BlockDriver *bdrv_find_protocol(const char *filename) { BlockDriver *drv1; char protocol[128]; @@ -478,7 +477,7 @@ int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags) BlockDriver *drv; int ret; -drv = find_protocol(filename); +drv = bdrv_find_protocol(filename); if (!drv) { return -ENOENT; } diff --git a/block.h b/block.h index 24efeb6..9034ebb 100644 --- a/block.h +++ b/block.h @@ -54,6 +54,7 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data); void bdrv_init(void); void bdrv_init_with_whitelist(void); +BlockDriver *bdrv_find_protocol(const char *filename); BlockDriver *bdrv_find_format(const char *format_name); BlockDriver *bdrv_find_whitelisted_format(const char *format_name); int bdrv_create(BlockDriver *drv, const char* filename, diff --git a/qemu-img.c b/qemu-img.c index cb007b7..ea091f0 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -252,8 +252,8 @@ static int img_create(int argc, char **argv) const char *base_fmt = NULL; const char *filename; const char *base_filename = NULL; -BlockDriver *drv; -QEMUOptionParameter *param = NULL; +BlockDriver *drv, *proto_drv; +QEMUOptionParameter *param = NULL, *create_options = NULL; char *options = NULL; flags = 0; @@ -286,33 +286,42 @@ static int img_create(int argc, char **argv) } } +/* Get the filename */ +if (optind >= argc) +help(); +filename = argv[optind++]; + /
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Anthony Liguori wrote: > On 05/25/2010 02:09 PM, Blue Swirl wrote: >> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: >> >>> From: Jan Kiszka >>> >>> This allows to communicate potential IRQ coalescing during delivery from >>> the sink back to the source. Targets that support IRQ coalescing >>> workarounds need to register handlers that return the appropriate >>> QEMU_IRQ_* code, and they have to propergate the code across all IRQ >>> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can >>> apply its workaround. If multiple sinks exist, the source may only >>> consider an IRQ coalesced if all other sinks either report >>> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. >>> >> No real devices are interested whether any of their output lines are >> even connected. This would introduce a new signal type, bidirectional >> multi-level, which is not correct. >> > > I don't think it's really an issue of correct, but I wouldn't disagree > to a suggestion that we ought to introduce a new signal type for this > type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a > similar interface as qemu_irq. A separate type would complicate the delivery of the feedback value across GPIO pins (as Paul requested for the RTC->HPET routing). > >> I think the real solution to coalescing is put the logic inside one >> device, in this case APIC because it has the information about irq >> delivery. APIC could monitor incoming RTC irqs for frequency >> information and whether they get delivered or not. If not, an internal >> timer is installed which injects the lost irqs. That won't fly as the IRQs will already arrive at the APIC with a sufficiently high jitter. At the bare minimum, you need to tell the interrupt controller about the fact that a particular IRQ should be delivered at a specific regular rate. For this, you also need a generic interface - nothing really "won". Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive
On Tue, May 25, 2010 at 07:59:18PM +0200, Alexander Graf wrote: > Anthony Liguori wrote: > > On 05/17/2010 11:23 AM, Paul Brook wrote: > I don't see a difference between the results. Apparently the barrier > option doesn't change a thing. > > >>> Ok. I don't like it, but I can see how it's compelling. I'd like to > >>> see the documentation improved though. I also think a warning printed > >>> on stdio about the safety of the option would be appropriate. > >>> > >> I disagree with this last bit. > >> > >> Errors should be issued if the user did something wrong. > >> Warnings should be issued if qemu did (or will soon do) something > >> other than > >> what the user requested, or otherwise made questionable decisions on the > >> user's behalf. > >> > >> In this case we're doing exactly what the user requested. The only > >> plausible > >> failure case is where a user is blindly trying options that they > >> clearly don't > >> understand or read the documentation for. I have zero sympathy for > >> complaints > >> like "Someone on the Internet told me to use --breakme, and broke > >> thinks". > >> > > > > I see it as the equivalent to the Taint bit in Linux. I want to make > > it clear to users up front that if you use this option, and you have > > data loss issues, don't complain. > > > > Just putting something in qemu-doc.texi is not enough IMHO. Few > > people actually read it. > > So what exactly is the conclusion here? I really want to see this > getting merged > I really think this patch can be useful, in my own case when testing debian-installer (I already cache=writeback). In short all that is about developing and testing, as opposed to run a VM in production, can benefit about that. This was one of the original use case of QEMU before KVM arrived. Unless someone can convince me not to do it, I seriously considering applying this patch. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On 05/25/2010 02:09 PM, Blue Swirl wrote: On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: From: Jan Kiszka This allows to communicate potential IRQ coalescing during delivery from the sink back to the source. Targets that support IRQ coalescing workarounds need to register handlers that return the appropriate QEMU_IRQ_* code, and they have to propergate the code across all IRQ redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can apply its workaround. If multiple sinks exist, the source may only consider an IRQ coalesced if all other sinks either report QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. No real devices are interested whether any of their output lines are even connected. This would introduce a new signal type, bidirectional multi-level, which is not correct. I don't think it's really an issue of correct, but I wouldn't disagree to a suggestion that we ought to introduce a new signal type for this type of bidirectional feedback. Maybe it's qemu_coalesced_irq and has a similar interface as qemu_irq. I think the real solution to coalescing is put the logic inside one device, in this case APIC because it has the information about irq delivery. APIC could monitor incoming RTC irqs for frequency information and whether they get delivered or not. If not, an internal timer is installed which injects the lost irqs. Of course, no real device could do such de-coalescing, but with this approach, the voodoo is contained to insides of one device, APIC. We should also take a step back to think what was the cause of lost irqs, IIRC uneven execution rate in QEMU. Not only that. The pathological case is where a host is limited to a 1khz timer frequency and the guest requests a 1khz timer frequency. Practically speaking, there is no way we'll ever be able to adjust timers to reinject lost interrupts because of the host timer limitation. Could this be fixed or taken into account in timer handling? For example, CPU loop could analyze the wall clock time between CPU exits and use that to offset the timers. Thus the timer frequency (in wall clock time) could be made to correspond a bit more to VCPU execution rate. A lot of what motivates the timer reinjection work is very old linux kernels that had fixed userspace timer frequencies. On newer host kernels, it's probably not nearly as important except when you get into pathological cases like exposing a high frequency HPET timer to the guest where you cannot keep up with the host. Regards, Anthony Liguori Signed-off-by: Jan Kiszka --- hw/irq.c | 38 +- hw/irq.h | 22 +++--- 2 files changed, 44 insertions(+), 16 deletions(-) diff --git a/hw/irq.c b/hw/irq.c index 7703f62..db2cce6 100644 --- a/hw/irq.c +++ b/hw/irq.c @@ -26,19 +26,27 @@ struct IRQState { qemu_irq_handler handler; +qemu_irq_fb_handler feedback_handler; void *opaque; int n; }; -void qemu_set_irq(qemu_irq irq, int level) +int qemu_set_irq(qemu_irq irq, int level) { -if (!irq) -return; - -irq->handler(irq->opaque, irq->n, level); +if (!irq) { +return 0; +} +if (irq->feedback_handler) { +return irq->feedback_handler(irq->opaque, irq->n, level); +} else { +irq->handler(irq->opaque, irq->n, level); +return QEMU_IRQ_DELIVERED; +} } -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) +static qemu_irq *allocate_irqs(qemu_irq_handler handler, + qemu_irq_fb_handler feedback_handler, + void *opaque, int n) { qemu_irq *s; struct IRQState *p; @@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) p = (struct IRQState *)qemu_mallocz(sizeof(struct IRQState) * n); for (i = 0; i< n; i++) { p->handler = handler; +p->feedback_handler = feedback_handler; p->opaque = opaque; p->n = i; s[i] = p; @@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) return s; } +qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) +{ +return allocate_irqs(handler, NULL, opaque, n); +} + +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler, + void *opaque, int n) +{ +return allocate_irqs(NULL, handler, opaque, n); +} + void qemu_free_irqs(qemu_irq *s) { qemu_free(s[0]); qemu_free(s); } -static void qemu_notirq(void *opaque, int line, int level) +static int qemu_notirq(void *opaque, int line, int level) { struct IRQState *irq = opaque; -irq->handler(irq->opaque, irq->n, !level); +return qemu_set_irq(irq, !level); } qemu_irq qemu_irq_invert(qemu_irq irq) { /* The default state for IRQs is low, so raise the output now. */ qemu_irq_raise(irq); -
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On Mon, May 24, 2010 at 2:17 AM, Yehuda Sadeh Weinraub wrote: > On Sun, May 23, 2010 at 12:59 AM, Blue Swirl wrote: >> On Thu, May 20, 2010 at 11:02 PM, Yehuda Sadeh Weinraub >> wrote: >>> On Thu, May 20, 2010 at 1:31 PM, Blue Swirl wrote: On Wed, May 19, 2010 at 7:22 PM, Christian Brunner wrote: > The attached patch is a block driver for the distributed file system > Ceph (http://ceph.newdream.net/). This driver uses librados (which > is part of the Ceph server) for direct access to the Ceph object > store and is running entirely in userspace. Therefore it is > called "rbd" - rados block device. >>> ... IIRC underscores here may conflict with system header use. Please use something like QEMU_BLOCK_RADOS_H. >>> >>> This header is shared between the linux kernel client and the ceph >>> userspace servers and client. We can actually get rid of it, as we >>> only need it to define CEPH_OSD_TMAP_SET. We can move this definition >>> to librados.h. >>> > diff --git a/block/rbd_types.h b/block/rbd_types.h > new file mode 100644 > index 000..dfd5aa0 > --- /dev/null > +++ b/block/rbd_types.h > @@ -0,0 +1,48 @@ > +#ifndef _FS_CEPH_RBD > +#define _FS_CEPH_RBD QEMU_BLOCK_RBD? >>> >>> This header is shared between the ceph kernel client, between the qemu >>> rbd module (and between other ceph utilities). It'd be much easier >>> maintaining it without having to have a different implementation for >>> each. The same goes to the use of __le32/64 and __u32/64 within these >>> headers. >> >> This is user space, so identifiers must conform to C standards. The >> identifiers beginning with underscores are reserved. >> >> Doesn't __le32/64 also depend on some GCC extension? Or sparse magic? > It depends on gcc extension. If needed we can probably have a separate > header for the qemu block device that uses alternative types. Though > looking at the qemu code I see use of other gcc extensions so I'm not > sure this is a real issue. We use some (contained with for example macros if possible), but in earlier discussions, __le32 etc. were considered problematic. IIRC it's hard to provide alternate versions for other compilers (or older versions of gcc). > >> >>> > + > +#include Can you use standard includes, like or ? Are Ceph libraries used in other systems than Linux? >>> >>> Not at the moment. I guess that we can take this include out. >>> > + > +/* > + * rbd image 'foo' consists of objects > + * foo.rbd - image metadata > + * foo. > + * foo.0001 > + * ... - data > + */ > + > +#define RBD_SUFFIX ".rbd" > +#define RBD_DIRECTORY "rbd_directory" > + > +#define RBD_DEFAULT_OBJ_ORDER 22 /* 4MB */ > + > +#define RBD_MAX_OBJ_NAME_SIZE 96 > +#define RBD_MAX_SEG_NAME_SIZE 128 > + > +#define RBD_COMP_NONE 0 > +#define RBD_CRYPT_NONE 0 > + > +static const char rbd_text[] = "<<< Rados Block Device Image >>>\n"; > +static const char rbd_signature[] = "RBD"; > +static const char rbd_version[] = "001.001"; > + > +struct rbd_obj_snap_ondisk { > + __le64 id; > + __le64 image_size; > +} __attribute__((packed)); > + > +struct rbd_obj_header_ondisk { > + char text[64]; > + char signature[4]; > + char version[8]; > + __le64 image_size; Unaligned? Is the disk format fixed? >>> >>> This is a packed structure that represents the on disk format. >>> Operations on it are being done only to read from the disk header or >>> to write to the disk header. >> >> That's clear. But what exactly is the alignment of field 'image_size'? >> Could there be implicit padding to mod 8 between 'version' and >> 'image_size' with some compilers? > > Obviously it's not 64 bit aligned. As it's an on-disk header, I don't > see alignment a real issue. As was said before, any operation on these > fields have to go through endianity conversion anyway, and this > structure should not be used directly. For such datastructures I'd > rather have the fields ordered in some logical order than maintaining > the alignment by ourselves. That's why we have that __attribute__ > packed in the end to let the compiler deal with those issues. Other > compilers though have their own syntax for packed structures (but I do > see other uses of this packed syntax in the qemu code). Packed structures are OK, but the padding should be explicit to avoid compiler problems. Eventually the disk format is read into memory buffer and then aligned fields should be also faster on all architectures, even on x86. >> >> If there were no other constraints, I'd either make the padding >> explicit, or rearrange/resize fields so that the field alignment is >> natural. Thus my question, can you change the disk format or are there >>
[Qemu-devel] [Bug 585529] Re: Documentation link is broken
Fixed now. In the future, please use the discussion page on the wiki to report bad links ** Changed in: qemu Status: New => Fix Released -- Documentation link is broken https://bugs.launchpad.net/bugs/585529 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: Fix Released Bug description: Sorry, did not know where else to send this. I could not find a "contact us" page for QEMU. The link to "QEMU Documentation" on the page http://wiki.qemu.org/Manual is broken. It points to "http://wiki.qemu.org/download/qemu-doc.html"; which does not currently exist.
Re: [Qemu-devel] [PATCH 2/3] Tracepoint, buffer & monitor framework
On Tue, May 25, 2010 at 7:20 PM, Prerna Saxena wrote: >> Some added lines of code use tabs for indentation, 4 space indentation >> should >> be used. >> >> +struct tracepoint { >> + char *name; /* Tracepoint name */ >> + uint8_t trace_id; /* numerical ID */ >> >> Maximum 256 tracepoints in QEMU? A limit of 65536 is less likely to >> be an issue in the future. >> > > No, this field describes the maximum tracepoints for a given hash queue. I see now, thanks. > I'll work on merging this circular buffer + monitor-based reader as a > backend for your proposed tracer. Would it be a good idea to have two trace > buffers -- when one is full, it gets written to disk ; while the second is > used to log traces. In a double-buffering approach there are finite resources. There needs to be a case for when the write-out buffer hasn't been written yet and the active trace buffer becomes full. I think in that case the active buffer should start overwriting the oldest entry. > I think the monitor interface for reading traces can be retained as is. > Also, I'd implemented the monitor interface for enabling/disabling data > logging for a given tracepoint (for a running guest) Not sure if this is > supported in the set of patches you've posted ? It might be a good to have > feature. The "disable" trace event feature in my tracing backends patchset allows statically disabling a trace event. It doesn't support enabling/disabling trace events at runtime, which is left up to the backend. The motivation for the "disable" attribute in the trace-events file is to allow completely disabling a trace event without having to remove it from trace-events *and* removing trace_*() calls in QEMU source code. It's a handy way of completely knocking out a trace event. Thanks for your patches, Stefan
[Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)
On Tue, May 25, 2010 at 5:00 PM, Artyom Tarasenko wrote: > 2010/5/21 Blue Swirl : >> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko >> wrote: >>> 2010/5/10 Blue Swirl : On 5/10/10, Artyom Tarasenko wrote: > 2010/5/10 Blue Swirl : > > > On 5/10/10, Artyom Tarasenko wrote: > >> 2010/5/9 Blue Swirl : > >> > On 5/9/10, Artyom Tarasenko wrote: > >> >> 2010/5/9 Blue Swirl : > >> >> > >> >> > On 5/8/10, Artyom Tarasenko wrote: > >> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but > aliased. > >> >> >> Software shouldn't use aliased addresses, neither should it > crash > >> >> >> when it uses (on the real hardware it wouldn't). Using > empty_slot > >> >> >> instead of aliasing can help with debugging such accesses. > >> >> > > >> >> > TurboSPARC Microprocessor User's Manual shows that there are > >> >> > additional pages after the main IOMMU for AFX registers. So > this is > >> >> > not board specific, but depends on CPU/IOMMU versions. > >> >> > >> >> > >> >> I checked it on the real hw: on LX and SS-5 these are aliased MMU > addresses. > >> >> SS-20 doesn't have any aliasing. > >> > > >> > But are your machines equipped with TurboSPARC or some other CPU? > >> > >> > >> Good point, I must confess, I missed the word "Turbo" in your first > >> answer. LX and SS-20 don't. > >> But SS-5 must have a TurboSPARC CPU: > >> > >> ok cd /FMI,MB86904 > >> ok .attributes > >> context-table 00 00 00 00 03 ff f0 00 00 00 10 00 > >> psr-implementation > >> psr-version 0004 > >> implementation > >> version 0004 > >> cache-line-size 0020 > >> cache-nlines 0200 > >> page-size 1000 > >> dcache-line-size 0010 > >> dcache-nlines 0200 > >> dcache-associativity 0001 > >> icache-line-size 0020 > >> icache-nlines 0200 > >> icache-associativity 0001 > >> ncaches 0002 > >> mmu-nctx 0100 > >> sparc-version 0008 > >> mask_rev 0026 > >> device_type cpu > >> name FMI,MB86904 > >> > >> and still it behaves the same as TI,TMS390S10 from the LX. This is > done on SS-5: > >> > >> ok 1000 20 spacel@ . > >> 409 > >> ok 1400 20 spacel@ . > >> 409 > >> ok 1404 20 spacel@ . > >> 23000 > >> ok 1f04 20 spacel@ . > >> 23000 > >> ok 1008 20 spacel@ . > >> 409 > >> ok 1428 20 spacel@ . > >> 409 > >> ok 100c 20 spacel@ . > >> 23000 > >> ok 1010 20 spacel@ . > >> 409 > >> > >> > >> LX is the same except for the IOMMU-version: > >> > >> ok 1000 20 spacel@ . > >> 405 > >> ok 1400 20 spacel@ . > >> 405 > >> ok 1800 20 spacel@ . > >> 405 > >> ok 1f00 20 spacel@ . > >> 405 > >> ok 1ff0 20 spacel@ . > >> 405 > >> ok 1fff0004 20 spacel@ . > >> 1fe000 > >> ok 1004 20 spacel@ . > >> 1fe000 > >> ok 1108 20 spacel@ . > >> 4105 > >> ok 1040 20 spacel@ . > >> 4105 > >> ok 1fff0040 20 spacel@ . > >> 4105 > >> ok 1fff0044 20 spacel@ . > >> 1fe000 > >> ok 1fff0024 20 spacel@ . > >> 1fe000 > >> > >> > >> >> At what address the additional AFX registers are located? > >> > > >> > Here's complete TurboSPARC IOMMU address map: > >> > PA[30:0] Register Access > >> > 1000_ IOMMU Control R/W > >> > 1000_0004 IOMMU Base Address R/W > >> > 1000_0014 Flush All IOTLB Entries W > >> > 1000_0018 Address Flush W > >> > 1000_1000 Asynchronous Fault Status R/W > >> > 1000_1004 Asynchronous Fault Address R/W > >> > 1000_1010 SBus Slot Configuration 0 R/W > >> > 1000_1014 SBus Slot Configuration 1 R/W > >> > 1000_1018 SBus Slot Configuration 2 R/W > >> > 1000_101C SBus Slot Configuration 3 R/W > >> > 1000_1020 SBus Slot Configuration 4 R/W > >> > 1000_1050 Memory Fault Status R/W > >> > 1000_1054 Memory Fault Address R/W > >> > 1000_2000 Module Identification R/W > >> > 1000_3018 Mask Identification R > >> > 1000_4000 AFX Queue Level W > >> > 1000_6000 AFX Queue Level R > >> > 1
Re: [Qemu-devel] [PATCH, RFC 3/4] Implement byte swapped MMIO type
On Mon, May 24, 2010 at 12:34 AM, Blue Swirl wrote: > BROKEN > > Signed-off-by: Blue Swirl > --- > cpu-common.h | 3 +- > softmmu_template.h | 69 > ++-- > 2 files changed, 63 insertions(+), 9 deletions(-) Changes to io_read and io_write seem to have issue with considering wrong value bits, you should check physaddr instead since index is cut off physaddr. It's probably a good idea to instead do a byteswap at __ld*/__st* level (a bit higher level than in your change.) These two sets are directly entered from runtime, and this way you need only two templates changed. Not sure if it is a good idea to handle unaligned accesses. -- Kind regards, Igor V. Kovalenko
Re: [Qemu-devel] [PATCH 2/2] sparc64: clean up pci bridge map
On Tue, May 25, 2010 at 11:24 PM, Blue Swirl wrote: > On Tue, May 25, 2010 at 12:09 PM, Igor V. Kovalenko > wrote: >> From: Igor V. Kovalenko >> >> - remove unused host state and store pci bus pointer only >> - do not map host state access into unused 1fe.1000 range >> - reorder pci region registration >> - assign pci i/o region to isa_mem_base > > Looks good. Could you make a separate patch from the part that depends > on OpenBIOS update and another for the cleanups? I think the cleanups > could be applied quickly, but the OpenBIOS PCI changes may need more > consideration. The only real cleanup is removal of host state which becomes unusable after mapping change. I think these changes may go in as is along with OpenBIOS set "sparc64 cleanups v1" which supports changed address ranges. The PCI changes to OpenBIOS in set "encode-int related changes and pci bus scan amendment" are separate but there is a simple dependency on these cleanups in register mapping area. It was probably bad idea to split OpenBIOS changes to 2 sets instead of sending those as one series. -- Kind regards, Igor V. Kovalenko
Re: [Qemu-devel] Re: irq problems after live migration with 0.12.4
25.05.2010 15:03, Peter Lieven wrote: Michael Tokarev wrote: 23.05.2010 13:55, Peter Lieven wrote: [] [64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option) [] [64442.299433] handlers: [64442.299840] [] (e1000_intr+0x0/0x190 [e1000]) [64442.300046] Disabling IRQ #10 Apparently, for some reason, e1000_intr decided it's not interesting IRQ or somehow wrong or not for that NIC. I dunno. But something fishy is going on with IRQs here. See also LP bug #584131 (https://bugs.launchpad.net/bugs/584131) and original Debian bug#580649 (http://bugs.debian.org/580649) Not sure if they're related... It looks they are actually the same thing, but happens with different devices and/or IRQs. Either spurious, or unwanted, or unrecognized or somesuch IRQ which is not recognized by the irq handler, which results in disabling that IRQ by the kernel, which is a bad thing (In your case it works because e1000 works in 2 modes, interrupts and polling). michael, do you have any ideas what i got do to debug whats happening? Unfortunately, no idea. I don't know neither kernel nor kvm internals. looking at launchpad and debian bug tracker i found other bugs also with a maybe related problem. so this issue might be greater... Can you share your findings? I don't know other debian bugs which are similar to this one. Thanks! /mjt
[Qemu-devel] [Bug 585529] [NEW] Documentation link is broken
Public bug reported: Sorry, did not know where else to send this. I could not find a "contact us" page for QEMU. The link to "QEMU Documentation" on the page http://wiki.qemu.org/Manual is broken. It points to "http://wiki.qemu.org/download/qemu-doc.html"; which does not currently exist. ** Affects: qemu Importance: Undecided Status: New -- Documentation link is broken https://bugs.launchpad.net/bugs/585529 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: Sorry, did not know where else to send this. I could not find a "contact us" page for QEMU. The link to "QEMU Documentation" on the page http://wiki.qemu.org/Manual is broken. It points to "http://wiki.qemu.org/download/qemu-doc.html"; which does not currently exist.
Re: [Qemu-devel] [PATCH 2/2] sparc64: clean up pci bridge map
On Tue, May 25, 2010 at 12:09 PM, Igor V. Kovalenko wrote: > From: Igor V. Kovalenko > > - remove unused host state and store pci bus pointer only > - do not map host state access into unused 1fe.1000 range > - reorder pci region registration > - assign pci i/o region to isa_mem_base Looks good. Could you make a separate patch from the part that depends on OpenBIOS update and another for the cleanups? I think the cleanups could be applied quickly, but the OpenBIOS PCI changes may need more consideration. > Signed-off-by: Igor V. Kovalenko > --- > hw/apb_pci.c | 49 ++--- > hw/sun4u.c | 4 ++-- > 2 files changed, 28 insertions(+), 25 deletions(-) > > diff --git a/hw/apb_pci.c b/hw/apb_pci.c > index 65d8ba6..b53e3c3 100644 > --- a/hw/apb_pci.c > +++ b/hw/apb_pci.c > @@ -65,7 +65,7 @@ do { printf("APB: " fmt , ## __VA_ARGS__); } while (0) > > typedef struct APBState { > SysBusDevice busdev; > - PCIHostState host_state; > + PCIBus *bus; > ReadWriteHandler pci_config_handler; > uint32_t iommu[4]; > uint32_t pci_control[16]; > @@ -191,7 +191,7 @@ static void apb_pci_config_write(ReadWriteHandler *h, > pcibus_t addr, > > val = qemu_bswap_len(val, size); > APB_DPRINTF("%s: addr " TARGET_FMT_lx " val %x\n", __func__, addr, val); > - pci_data_write(s->host_state.bus, addr, val, size); > + pci_data_write(s->bus, addr, val, size); > } > > static uint32_t apb_pci_config_read(ReadWriteHandler *h, pcibus_t addr, > @@ -200,7 +200,7 @@ static uint32_t apb_pci_config_read(ReadWriteHandler *h, > pcibus_t addr, > uint32_t ret; > APBState *s = container_of(h, APBState, pci_config_handler); > > - ret = pci_data_read(s->host_state.bus, addr, size); > + ret = pci_data_read(s->bus, addr, size); > ret = qemu_bswap_len(ret, size); > APB_DPRINTF("%s: addr " TARGET_FMT_lx " -> %x\n", __func__, addr, ret); > return ret; > @@ -331,37 +331,37 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base, > s = sysbus_from_qdev(dev); > /* apb_config */ > sysbus_mmio_map(s, 0, special_base); > + /* PCI configuration space */ > + sysbus_mmio_map(s, 1, special_base + 0x100ULL); > /* pci_ioport */ > - sysbus_mmio_map(s, 1, special_base + 0x200ULL); > - /* pci_config */ > - sysbus_mmio_map(s, 2, special_base + 0x100ULL); > - /* mem_data */ > - sysbus_mmio_map(s, 3, mem_base); > + sysbus_mmio_map(s, 2, special_base + 0x200ULL); > d = FROM_SYSBUS(APBState, s); > - d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci", > + > + d->bus = pci_register_bus(&d->busdev.qdev, "pci", > pci_apb_set_irq, pci_pbm_map_irq, d, > 0, 32); > - pci_bus_set_mem_base(d->host_state.bus, mem_base); > + pci_bus_set_mem_base(d->bus, mem_base); > > for (i = 0; i < 32; i++) { > sysbus_connect_irq(s, i, pic[i]); > } > > - pci_create_simple(d->host_state.bus, 0, "pbm"); > + pci_create_simple(d->bus, 0, "pbm"); > + > /* APB secondary busses */ > - *bus2 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 0), > + *bus2 = pci_bridge_init(d->bus, PCI_DEVFN(1, 0), > PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, > pci_apb_map_irq, > "Advanced PCI Bus secondary bridge 1"); > apb_pci_bridge_init(*bus2); > > - *bus3 = pci_bridge_init(d->host_state.bus, PCI_DEVFN(1, 1), > + *bus3 = pci_bridge_init(d->bus, PCI_DEVFN(1, 1), > PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_SIMBA, > pci_apb_map_irq, > "Advanced PCI Bus secondary bridge 2"); > apb_pci_bridge_init(*bus3); > > - return d->host_state.bus; > + return d->bus; > } > > static void pci_pbm_reset(DeviceState *d) > @@ -382,7 +382,7 @@ static void pci_pbm_reset(DeviceState *d) > static int pci_pbm_init_device(SysBusDevice *dev) > { > APBState *s; > - int pci_mem_data, apb_config, pci_ioport, pci_config; > + int pci_config, apb_config, pci_ioport; > unsigned int i; > > s = FROM_SYSBUS(APBState, dev); > @@ -396,20 +396,23 @@ static int pci_pbm_init_device(SysBusDevice *dev) > /* apb_config */ > apb_config = cpu_register_io_memory(apb_config_read, > apb_config_write, s); > + /* at region 0 */ > sysbus_init_mmio(dev, 0x1ULL, apb_config); > - /* pci_ioport */ > - pci_ioport = cpu_register_io_memory(pci_apb_ioread, > - pci_apb_iowrite, s); > - sysbus_init_mmio(dev, 0x1ULL, pci_ioport); > - /* pci_config */ > + > + /* PCI configuration space */ > s->pci_config_handler.read = apb_pci_config_read; > s->pci_config_handler.write = apb_pci_config_write; > pci_config = cpu_register_io_memory_simple(&s->pci_conf
Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation
On Tue, May 25, 2010 at 8:39 AM, Joerg Roedel wrote: > On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote: >> On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel wrote: >> >> + >> >> +#define MMIO_SIZE 0x2028 >> > >> > This size should be a power-of-two value. In this case probably 0x4000. >> >> Not really, the devices can reserve regions of any size. There were >> some implementation deficiencies in earlier versions of QEMU, where >> the whole page would be reserved anyway, but this limitation has been >> removed long time ago. > > The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux > driver maps the MMIO region with this size. So the emulation should > reserve this amount of MMIO space too. Well, Linux drivers may take a conservative approach so I'd check what's the value in the device specs. In practice, on x86 hardware the size doesn't matter too much, for example on Sparc an access beyond the end of the device region would trap.
[Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka wrote: > From: Jan Kiszka > > This allows to communicate potential IRQ coalescing during delivery from > the sink back to the source. Targets that support IRQ coalescing > workarounds need to register handlers that return the appropriate > QEMU_IRQ_* code, and they have to propergate the code across all IRQ > redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can > apply its workaround. If multiple sinks exist, the source may only > consider an IRQ coalesced if all other sinks either report > QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED. No real devices are interested whether any of their output lines are even connected. This would introduce a new signal type, bidirectional multi-level, which is not correct. I think the real solution to coalescing is put the logic inside one device, in this case APIC because it has the information about irq delivery. APIC could monitor incoming RTC irqs for frequency information and whether they get delivered or not. If not, an internal timer is installed which injects the lost irqs. Of course, no real device could do such de-coalescing, but with this approach, the voodoo is contained to insides of one device, APIC. We should also take a step back to think what was the cause of lost irqs, IIRC uneven execution rate in QEMU. Could this be fixed or taken into account in timer handling? For example, CPU loop could analyze the wall clock time between CPU exits and use that to offset the timers. Thus the timer frequency (in wall clock time) could be made to correspond a bit more to VCPU execution rate. > > Signed-off-by: Jan Kiszka > --- > hw/irq.c | 38 +- > hw/irq.h | 22 +++--- > 2 files changed, 44 insertions(+), 16 deletions(-) > > diff --git a/hw/irq.c b/hw/irq.c > index 7703f62..db2cce6 100644 > --- a/hw/irq.c > +++ b/hw/irq.c > @@ -26,19 +26,27 @@ > > struct IRQState { > qemu_irq_handler handler; > + qemu_irq_fb_handler feedback_handler; > void *opaque; > int n; > }; > > -void qemu_set_irq(qemu_irq irq, int level) > +int qemu_set_irq(qemu_irq irq, int level) > { > - if (!irq) > - return; > - > - irq->handler(irq->opaque, irq->n, level); > + if (!irq) { > + return 0; > + } > + if (irq->feedback_handler) { > + return irq->feedback_handler(irq->opaque, irq->n, level); > + } else { > + irq->handler(irq->opaque, irq->n, level); > + return QEMU_IRQ_DELIVERED; > + } > } > > -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) > +static qemu_irq *allocate_irqs(qemu_irq_handler handler, > + qemu_irq_fb_handler feedback_handler, > + void *opaque, int n) > { > qemu_irq *s; > struct IRQState *p; > @@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void > *opaque, int n) > p = (struct IRQState *)qemu_mallocz(sizeof(struct IRQState) * n); > for (i = 0; i < n; i++) { > p->handler = handler; > + p->feedback_handler = feedback_handler; > p->opaque = opaque; > p->n = i; > s[i] = p; > @@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, > void *opaque, int n) > return s; > } > > +qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n) > +{ > + return allocate_irqs(handler, NULL, opaque, n); > +} > + > +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler, > + void *opaque, int n) > +{ > + return allocate_irqs(NULL, handler, opaque, n); > +} > + > void qemu_free_irqs(qemu_irq *s) > { > qemu_free(s[0]); > qemu_free(s); > } > > -static void qemu_notirq(void *opaque, int line, int level) > +static int qemu_notirq(void *opaque, int line, int level) > { > struct IRQState *irq = opaque; > > - irq->handler(irq->opaque, irq->n, !level); > + return qemu_set_irq(irq, !level); > } > > qemu_irq qemu_irq_invert(qemu_irq irq) > { > /* The default state for IRQs is low, so raise the output now. */ > qemu_irq_raise(irq); > - return qemu_allocate_irqs(qemu_notirq, irq, 1)[0]; > + return allocate_irqs(NULL, qemu_notirq, irq, 1)[0]; > } > diff --git a/hw/irq.h b/hw/irq.h > index 5daae44..eee03e6 100644 > --- a/hw/irq.h > +++ b/hw/irq.h > @@ -3,15 +3,18 @@ > > /* Generic IRQ/GPIO pin infrastructure. */ > > -/* FIXME: Rmove one of these. */ > +#define QEMU_IRQ_DELIVERED 0 > +#define QEMU_IRQ_COALESCED (-1) > +#define QEMU_IRQ_MASKED (-2) > + > typedef void (*qemu_irq_handler)(void *opaque, int n, int level); > -typedef void SetIRQFunc(void *opaque, int irq_num, int level); > +typedef int (*qemu_irq_fb_handler)(void *opaque, int n, int level); > > -void qemu_set_irq(qemu_irq irq, int level); > +int qemu_set_irq(qemu_irq irq, int level); > > -static inline void qemu_irq_raise(qemu_irq irq) >
Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive
Anthony Liguori wrote: > On 05/25/2010 12:59 PM, Alexander Graf wrote: >>> I see it as the equivalent to the Taint bit in Linux. I want to make >>> it clear to users up front that if you use this option, and you have >>> data loss issues, don't complain. >>> >>> Just putting something in qemu-doc.texi is not enough IMHO. Few >>> people actually read it. >>> >> So what exactly is the conclusion here? I really want to see this >> getting merged. >> > > Make it more scary and try again. I don't see how to make it more scary while still considering users sane human beings. You don't print out a big fat warning on -drive if=scsi either, right? Alex
Re: [Qemu-devel] Re: [PATCH 1/5] Exit if incoming migration fails
On 05/25/2010 01:37 PM, Juan Quintela wrote: Luiz Capitulino wrote: On Tue, 25 May 2010 16:21:01 +0200 Juan Quintela wrote: Signed-off-by: Juan Quintela --- migration.c | 16 ++-- migration.h |2 +- vl.c|7 ++- 3 files changed, 17 insertions(+), 8 deletions(-) While I agree on the change, I have two comments: 1. By taking a look at the code I have the impression that most of the fun failures will happen on the handler passed to qemu_set_fd_handler2(), do you agree? Any plan to address that? That is outgoing migration, not incoming migration. Incoming migration in synchronous.. 1. Is exit()ing the best thing to be done? I understand it's the easiest and maybe better than nothing, but wouldn't it be better to enter in paused-forever state so that clients can query and decide what to do? For incoming migration, if it fails in the middle, every bet is off. You are in a really inconsistent state, not sure which one, and if migration was live, with the other host possibly retaking the disks to continue. I agree that exiting is the only sane behavior for the destination. Regards, Anthony Liguori In some cases, you can't do anything: - you got passed an fd, and fd got closed/image corrupted/... - you got passed an exec command like "exec: gzip -d< foo.gz" If gzip failed once, it will fail forever. If you are running it by hand, cursor up + enter, and you are back If you are using a management application, it is going to be easier to restart the process that trying to cleanup everything. Experience shows that people really tries to do weird things when machine is in this state. Later, Juan.
Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events
On 05/25/2010 01:31 PM, Luiz Capitulino wrote: On Tue, 25 May 2010 16:21:03 +0200 Juan Quintela wrote: They are emitted when migration starts, ends, has a failure or is canceled. Signed-off-by: Juan Quintela --- QMP/qmp-events.txt | 50 ++ monitor.c | 12 monitor.h |4 3 files changed, 66 insertions(+), 0 deletions(-) diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt index 01ec85f..93caa4d 100644 --- a/QMP/qmp-events.txt +++ b/QMP/qmp-events.txt @@ -26,6 +26,56 @@ Example: Note: If action is "stop", a STOP event will eventually follow the BLOCK_IO_ERROR event. +MIGRATION_CANCELED +-- + +Emitted when migration is canceled. This is emitted in the source. Shouldn't this one be emitted in the destination? Destination can't distinguish a cancelled from a closed pipe. But the idea is that a third party is talking to both source and destination so it knows if it's cancelled the migration. +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED +and CANCELED migration for target). + +Data: None + +Example: + +{ "event": "MIGRATION_CANCELED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_ENDED +--- + +Emitted when migration ends (both in source and target) + +Data: None + +Example: + +{ "event": "MIGRATION_ENDED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_FAILED + + +Emitted when migration fails (both is source and target). + +Data: None + +Example: + +{ "event": "MIGRATION_FAILED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } What about a MIGRATION_FINISHED event, which contains a 'success' key which is a bool? The only disadvantage of this is if we decide to add more information to the event (say, stats) then it'd get ugly. Otherwise, one event is enough. Anyway, the counterpart of MIGRATION_FAILED is MIGRATION_SUCCEEDED. I see MIGRATION_FAILED as being very similar to block I/O error events. I think we'll need a very similar solution for both. It boils down to, how do we raise asynchronous events when something fails? Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive
On 05/25/2010 12:59 PM, Alexander Graf wrote: I see it as the equivalent to the Taint bit in Linux. I want to make it clear to users up front that if you use this option, and you have data loss issues, don't complain. Just putting something in qemu-doc.texi is not enough IMHO. Few people actually read it. So what exactly is the conclusion here? I really want to see this getting merged. Make it more scary and try again. Regards, Anthony Liguori Alex
[Qemu-devel] Re: [PATCH 1/5] Exit if incoming migration fails
Luiz Capitulino wrote: > On Tue, 25 May 2010 16:21:01 +0200 > Juan Quintela wrote: > >> Signed-off-by: Juan Quintela >> --- >> migration.c | 16 ++-- >> migration.h |2 +- >> vl.c|7 ++- >> 3 files changed, 17 insertions(+), 8 deletions(-) >> > While I agree on the change, I have two comments: > > 1. By taking a look at the code I have the impression that most of the >fun failures will happen on the handler passed to qemu_set_fd_handler2(), >do you agree? Any plan to address that? That is outgoing migration, not incoming migration. Incoming migration in synchronous.. > 1. Is exit()ing the best thing to be done? I understand it's the easiest >and maybe better than nothing, but wouldn't it be better to enter in >paused-forever state so that clients can query and decide what to do? For incoming migration, if it fails in the middle, every bet is off. You are in a really inconsistent state, not sure which one, and if migration was live, with the other host possibly retaking the disks to continue. In some cases, you can't do anything: - you got passed an fd, and fd got closed/image corrupted/... - you got passed an exec command like "exec: gzip -d < foo.gz" If gzip failed once, it will fail forever. If you are running it by hand, cursor up + enter, and you are back If you are using a management application, it is going to be easier to restart the process that trying to cleanup everything. Experience shows that people really tries to do weird things when machine is in this state. Later, Juan.
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
Luiz Capitulino wrote: > On Tue, 25 May 2010 17:35:53 +0200 > Juan Quintela wrote: > >> Anthony Liguori wrote: >> > On 05/25/2010 09:21 AM, Juan Quintela wrote: >> >> >> +MIGRATION_CANCELED >> >> +-- >> >> + >> >> +Emitted when migration is canceled. This is emitted in the source. >> >> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED >> >> +and CANCELED migration for target). >> >> >> > >> > But the management tool is the one that cancels so surely, it knows >> > why already. >> >> ok, then that one is ok. > > Isn't this one important for the destination instead? Destination don't know what happened, only that conection/data is not coming anymore. >From management prespective, management application knows that it has canceled the migration, so no need to be told. Later, Juan.
Re: [Qemu-devel] [PATCH 1/2] virtio-9p: make virtio-9p available to all POSIX systems
On Mon, May 24, 2010 at 8:46 PM, Venkateswararao Jujjuri (JV) wrote: > Blue Swirl wrote: >> Field d_off in struct dirent is Linux specific. >> >> Signed-off-by: Blue Swirl >> --- >> Makefile.objs | 8 >> Makefile.target | 2 +- >> hw/virtio-9p.c | 2 +- >> hw/virtio-pci.c | 6 +++--- >> hw/virtio.h | 4 ++-- >> qemu-config.c | 4 ++-- >> qemu-config.h | 2 +- >> qemu-options.hx | 8 >> vl.c | 8 >> 9 files changed, 22 insertions(+), 22 deletions(-) >> >> diff --git a/Makefile.objs b/Makefile.objs >> index 1585101..b1a6e01 100644 >> --- a/Makefile.objs >> +++ b/Makefile.objs >> @@ -35,8 +35,8 @@ net-nested-$(CONFIG_SLIRP) += slirp.o >> net-nested-$(CONFIG_VDE) += vde.o >> net-obj-y += $(addprefix net/, $(net-nested-y)) >> >> -fsdev-nested-$(CONFIG_LINUX) = qemu-fsdev.o >> -fsdev-obj-$(CONFIG_LINUX) += $(addprefix fsdev/, $(fsdev-nested-y)) >> +fsdev-nested-$(CONFIG_POSIX) = qemu-fsdev.o >> +fsdev-obj-$(CONFIG_POSIX) += $(addprefix fsdev/, $(fsdev-nested-y)) >> >> ## >> # libqemu_common.a: Target independent part of system emulation. The >> @@ -47,7 +47,7 @@ fsdev-obj-$(CONFIG_LINUX) += $(addprefix fsdev/, >> $(fsdev-nested-y)) >> common-obj-y = $(block-obj-y) >> common-obj-y += $(net-obj-y) >> common-obj-y += $(qobject-obj-y) >> -common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX)) >> +common-obj-$(CONFIG_POSIX) += $(fsdev-obj-$(CONFIG_POSIX)) >> common-obj-y += readline.o console.o async.o qemu-error.o >> common-obj-y += tcg-runtime.o host-utils.o >> common-obj-y += irq.o ioport.o input.o >> @@ -229,7 +229,7 @@ sound-obj-$(CONFIG_CS4231A) += cs4231a.o >> adlib.o fmopl.o: QEMU_CFLAGS += -DBUILD_Y8950=0 >> hw-obj-$(CONFIG_SOUND) += $(sound-obj-y) >> >> -hw-obj-$(CONFIG_LINUX) += virtio-9p-debug.o virtio-9p-local.o >> +hw-obj-$(CONFIG_POSIX) += virtio-9p-debug.o virtio-9p-local.o >> >> ## >> # libdis >> diff --git a/Makefile.target b/Makefile.target >> index fda5bf3..00e140f 100644 >> --- a/Makefile.target >> +++ b/Makefile.target >> @@ -168,7 +168,7 @@ obj-y += virtio-blk.o virtio-balloon.o >> virtio-net.o virtio-serial-bus.o >> obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o >> obj-y += vhost_net.o >> obj-$(CONFIG_VHOST_NET) += vhost.o >> -obj-$(CONFIG_LINUX) += virtio-9p.o >> +obj-$(CONFIG_POSIX) += virtio-9p.o >> obj-y += rwhandler.o >> obj-$(CONFIG_KVM) += kvm.o kvm-all.o >> obj-$(CONFIG_NO_KVM) += kvm-stub.o >> diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c >> index e5d0112..68b0696 100644 >> --- a/hw/virtio-9p.c >> +++ b/hw/virtio-9p.c >> @@ -1447,8 +1447,8 @@ static void v9fs_read_post_dir_lstat(V9fsState >> *s, V9fsReadState *vs, >> vs->count += vs->len; >> v9fs_stat_free(&vs->v9stat); >> v9fs_string_free(&vs->name); >> - vs->dir_pos = vs->dent->d_off; >> vs->dent = v9fs_do_readdir(s, vs->fidp->dir); >> + vs->dir_pos = v9fs_do_telldir(s, vs->fidp->dir); > > > We need to save the the current dir position before making next readdir > We need to seek back if we can't fit it into PDU. > Hence moving the dir_pos after readdir is not a good idea. Hmm, the manual page for readdir says: On Linux, the dirent structure is defined as follows: struct dirent { ino_t d_ino; /* inode number */ off_t d_off; /* offset to the next dirent */ unsigned short d_reclen;/* length of this record */ unsigned char d_type; /* type of file */ char d_name[256]; /* filename */ }; My change was based on the comment for d_off. But when I run this program: $ cat dirent.c #include #include #include int main(int argc, const char **argv) { DIR *d; struct dirent *entry; off_t pos; d = opendir(argv[1]); entry = readdir(d); do { pos = telldir(d); printf("name %s d_off %ld ino %ld pos %ld\n", entry->d_name, entry->d_off, entry->d_ino, pos); entry = readdir(d); } while (entry); closedir(d); return 0; } d_off is equal to telldir value: $ ./dirent / name tmp d_off 206002973 ino 56 pos 206002973 name root d_off 224116791 ino 521217 pos 224116791 name vmlinuz.old d_off 255549115 ino 17 pos 255549115 name dev d_off 378658993 ino 374625 pos 378658993 I'll send an updated patch. > BTW, Thanks for making VirtFS generic to all POSIX systems. > > Thanks, > JV. > >> v9fs_read_post_readdir(s, vs, err); >> return; >> out: >> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c >> index 7ddf612..0a74781 100644 >> --- a/hw/virtio-pci.c >> +++ b/hw/virtio-pci.c >> @@ -102,7 +102,7 @@ typedef struct { >> BlockConf block; >> NICConf nic; >> uint32_t host_features; >> -#ifdef CONFIG_LINUX >> +#ifdef CONFIG_POSIX >> V9fsConf fsconf; >> #endif >> /* Max.
Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, 25 May 2010 16:21:03 +0200 Juan Quintela wrote: > They are emitted when migration starts, ends, has a failure or is canceled. > > Signed-off-by: Juan Quintela > --- > QMP/qmp-events.txt | 50 ++ > monitor.c | 12 > monitor.h |4 > 3 files changed, 66 insertions(+), 0 deletions(-) > > diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt > index 01ec85f..93caa4d 100644 > --- a/QMP/qmp-events.txt > +++ b/QMP/qmp-events.txt > @@ -26,6 +26,56 @@ Example: > Note: If action is "stop", a STOP event will eventually follow the > BLOCK_IO_ERROR event. > > +MIGRATION_CANCELED > +-- > + > +Emitted when migration is canceled. This is emitted in the source. Shouldn't this one be emitted in the destination? > +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED > +and CANCELED migration for target). > + > +Data: None > + > +Example: > + > +{ "event": "MIGRATION_CANCELED", > +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > + > +MIGRATION_ENDED > +--- > + > +Emitted when migration ends (both in source and target) > + > +Data: None > + > +Example: > + > +{ "event": "MIGRATION_ENDED", > +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > + > +MIGRATION_FAILED > + > + > +Emitted when migration fails (both is source and target). > + > +Data: None > + > +Example: > + > +{ "event": "MIGRATION_FAILED", > +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } What about a MIGRATION_FINISHED event, which contains a 'success' key which is a bool? The only disadvantage of this is if we decide to add more information to the event (say, stats) then it'd get ugly. Otherwise, one event is enough. Anyway, the counterpart of MIGRATION_FAILED is MIGRATION_SUCCEEDED. > + > +MIGRATION_STARTED > +- > + > +Emitted when migration starts (both in source and target). Don't you need this only on the destination?
Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, 25 May 2010 17:35:53 +0200 Juan Quintela wrote: > Anthony Liguori wrote: > > On 05/25/2010 09:21 AM, Juan Quintela wrote: > > >> +MIGRATION_CANCELED > >> +-- > >> + > >> +Emitted when migration is canceled. This is emitted in the source. > >> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED > >> +and CANCELED migration for target). > >> > > > > But the management tool is the one that cancels so surely, it knows > > why already. > > ok, then that one is ok. Isn't this one important for the destination instead?
Re: [Qemu-devel] [PATCH 2/3] Tracepoint, buffer & monitor framework
Hi Stefan, Thanks for having a look. As I'd mentioned, this patchset is *work in progress*, which explains the dummy comments and coding style violations at places :) I was merely sharing a draft of what my approach is -- so that we can work together on how much of it can add to the trace framework you've proposed. On 05/25/2010 05:10 PM, Stefan Hajnoczi wrote: I think this is too much work. Let each tracepoint have its own global struct tracepoint so it can directly reference it using tracepoint_##name - no hash lookup needed. Add the QLIST_ENTRY directly to struct tracepoint so the tracepoint register/unregister code can assign ids and look up tracepoints by name. No critical path code needs to do name lookups and the hash table can disappear. I had employed a combination of hash (derived from name) and an ID (which is the offset within a hash bucket where the tracepoint details are stored) to determine tracepoint information for a given name. Your suggestion to eliminate name queries is good, let me see how much of this can be scaled down. +#define DECLARE_TRACE(name, tproto, tstruct) \ + struct __trace_struct_##name { \ + tstruct \ + }; \ Should this struct be packed so more fields can fit? Yes, indeed. Thanks for reminding ! +trace_queue->trace_buffer[tmp].metadata.write_complete = 0; This is not guaranteed to work without memory barriers. There is no way for the trace consumer to block until there is more data available. The synchronization needs to consider writing traces to a file, which has different constraints than dumping the current contents of the trace buffer. We're missing a way to trace to a file. That could be done in binary or text. It would be easier in text because we already have the format strings and don't need a unique ID mapping in an external binary parsing tool. OK, at the time of working on this I hadnt really thought of dumping traces in a file. It meant too much of IO latency that such tracing would bring in. My idea of a tracer entailed buffer based logging with a simple reader(see last) Making data available after crash is also useful. The easiest way is to dump the trace buffer from the core dump using gdb. However, we'd need some way of making sense of the bytes. That could be done by reading the tracepoint_lib structures from the core dump. Agree. (The way I do trace recovery from a core dump in my simple tracer is to binary dump the trace buffer from the core dump. Since the trace buffer contents are normally written out to file unchanged anyway, the simpletrace.py script can read the dumped trace buffer like a normal trace file.) Nitpicks: As I mentioned, this is work in progress so you'd have seen quite a lot of violations. Thanks for pointing those out, I'll clean those up for whatever approach we choose to use :) Some added lines of code use tabs for indentation, 4 space indentation should be used. +struct tracepoint { + char *name; /* Tracepoint name */ + uint8_t trace_id; /* numerical ID */ Maximum 256 tracepoints in QEMU? A limit of 65536 is less likely to be an issue in the future. No, this field describes the maximum tracepoints for a given hash queue. + void __trace_print_##name(Monitor *mon, void *data) \ + { \ + struct __trace_struct_##name *entry; \ + \ + if(!entry)\ + return; \ This does not work, entry is not initialized. Typo ! should've been : if(!data) +#define DO_TRACE(name, args...)\ + trace_##name(args); This macro is unused? A relic that needs to be cleaned :) +/* In essence, the structure becomes : + * struct tracepoint_lib { This comment will get out of sync easily. + qemu_malloc(sizeof(struct tracepoint_lib)); + +if (!new_entry) + return NULL;/* No memory */ qemu_malloc() does not return NULL on out-of-memory, it aborts the program. Same for allocating new_entry->entry.name. Wondering how I forgot that ! thanks for reminding. +new_entry->entry.name = (char*)qemu_malloc(strlen(name)+1); +if(!new_entry->entry.name) + return NULL; + +strncpy(new_entry->entry.name, name, strlen(name)+1); Perhaps just strdup() instead of manual qemu_malloc()/strncpy(). Stefan I'll work on merging this circular buffer + monitor-based reader as a backend for your proposed tracer. Would it be a good idea to have two trace buffers -- when one is full, it gets written to disk ; while the second is used to log traces. I think
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, 25 May 2010 11:10:23 -0500 Anthony Liguori wrote: > There should be some information about why it failed, no? Preferrably > in a QError format. > > > >>> At this point, we have basically -1 :( > >>> > >>> I can add a field with an error number, but we are very bad at the > >>> moment about moving errno's upstack. > >>> > >>> > >> We need a better solution for reporting errors via notifications. > >> > > Suggestions? > > > > Notice that what we need now is a way to know if migration ended with > > success or in any other way, as soon as possible. > > > > Markus/Luiz? We need to redesign QError. I could give it a try, but quite frankly, I don't know how do it good enough.. Markus has worked more on error handling than me though and I think he's the best person to do it, but he's busy at other things atm. Note that major work is design, not code churn (could turn out into an issue, but I doubt it).
Re: [Qemu-devel] [PATCH -V3 2/7] virtio-9p: Rearrange fileop structures
On Fri, 21 May 2010 14:26:05 -0700 "Venkateswararao Jujjuri (JV)" wrote: Hi JV, While I agree that this patch is nice to have, why is this part of the security model patchset? Is it required to implement the models? Thanks, Sripathi. > Signed-off-by: Venkateswararao Jujjuri > --- > hw/virtio-9p.c | 185 > ++-- > hw/virtio-9p.h | 92 > 2 files changed, 138 insertions(+), 139 deletions(-) > > diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c > index 8ecd39c..fda3c4a 100644 > --- a/hw/virtio-9p.c > +++ b/hw/virtio-9p.c > @@ -21,6 +21,52 @@ > int dotu = 1; > int debug_9p_pdu; > > +enum { > +Oread = 0x00, > +Owrite = 0x01, > +Ordwr = 0x02, > +Oexec = 0x03, > +Oexcl = 0x04, > +Otrunc = 0x10, > +Orexec = 0x20, > +Orclose = 0x40, > +Oappend = 0x80, > +}; > + > +static int omode_to_uflags(int8_t mode) > +{ > +int ret = 0; > + > +switch (mode & 3) { > +case Oread: > +ret = O_RDONLY; > +break; > +case Ordwr: > +ret = O_RDWR; > +break; > +case Owrite: > +ret = O_WRONLY; > +break; > +case Oexec: > +ret = O_RDONLY; > +break; > +} > + > +if (mode & Otrunc) { > +ret |= O_TRUNC; > +} > + > +if (mode & Oappend) { > +ret |= O_APPEND; > +} > + > +if (mode & Oexcl) { > +ret |= O_EXCL; > +} > + > +return ret; > +} > + > static int v9fs_do_lstat(V9fsState *s, V9fsString *path, struct stat *stbuf) > { > return s->ops->lstat(&s->ctx, path->data, stbuf); > @@ -999,14 +1045,6 @@ out: > v9fs_string_free(&aname); > } > > -typedef struct V9fsStatState { > -V9fsPDU *pdu; > -size_t offset; > -V9fsStat v9stat; > -V9fsFidState *fidp; > -struct stat stbuf; > -} V9fsStatState; > - > static void v9fs_stat_post_lstat(V9fsState *s, V9fsStatState *vs, int err) > { > if (err == -1) { > @@ -1057,19 +1095,6 @@ out: > qemu_free(vs); > } > > -typedef struct V9fsWalkState { > -V9fsPDU *pdu; > -size_t offset; > -int16_t nwnames; > -int name_idx; > -V9fsQID *qids; > -V9fsFidState *fidp; > -V9fsFidState *newfidp; > -V9fsString path; > -V9fsString *wnames; > -struct stat stbuf; > -} V9fsWalkState; > - > static void v9fs_walk_complete(V9fsState *s, V9fsWalkState *vs, int err) > { > complete_pdu(s, vs->pdu, err); > @@ -1233,62 +1258,6 @@ out: > v9fs_walk_complete(s, vs, err); > } > > -typedef struct V9fsOpenState { > -V9fsPDU *pdu; > -size_t offset; > -int8_t mode; > -V9fsFidState *fidp; > -V9fsQID qid; > -struct stat stbuf; > - > -} V9fsOpenState; > - > -enum { > -Oread = 0x00, > -Owrite = 0x01, > -Ordwr = 0x02, > -Oexec = 0x03, > -Oexcl = 0x04, > -Otrunc = 0x10, > -Orexec = 0x20, > -Orclose = 0x40, > -Oappend = 0x80, > -}; > - > -static int omode_to_uflags(int8_t mode) > -{ > -int ret = 0; > - > -switch (mode & 3) { > -case Oread: > -ret = O_RDONLY; > -break; > -case Ordwr: > -ret = O_RDWR; > -break; > -case Owrite: > -ret = O_WRONLY; > -break; > -case Oexec: > -ret = O_RDONLY; > -break; > -} > - > -if (mode & Otrunc) { > -ret |= O_TRUNC; > -} > - > -if (mode & Oappend) { > -ret |= O_APPEND; > -} > - > -if (mode & Oexcl) { > -ret |= O_EXCL; > -} > - > -return ret; > -} > - > static void v9fs_open_post_opendir(V9fsState *s, V9fsOpenState *vs, int err) > { > if (vs->fidp->dir == NULL) { > @@ -1391,25 +1360,6 @@ out: > complete_pdu(s, pdu, err); > } > > -typedef struct V9fsReadState { > -V9fsPDU *pdu; > -size_t offset; > -int32_t count; > -int32_t total; > -int64_t off; > -V9fsFidState *fidp; > -struct iovec iov[128]; /* FIXME: bad, bad, bad */ > -struct iovec *sg; > -off_t dir_pos; > -struct dirent *dent; > -struct stat stbuf; > -V9fsString name; > -V9fsStat v9stat; > -int32_t len; > -int32_t cnt; > -int32_t max_count; > -} V9fsReadState; > - > static void v9fs_read_post_readdir(V9fsState *, V9fsReadState *, ssize_t); > > static void v9fs_read_post_seekdir(V9fsState *s, V9fsReadState *vs, ssize_t > err) > @@ -1597,19 +1547,6 @@ out: > qemu_free(vs); > } > > -typedef struct V9fsWriteState { > -V9fsPDU *pdu; > -size_t offset; > -int32_t len; > -int32_t count; > -int32_t total; > -int64_t off; > -V9fsFidState *fidp; > -struct iovec iov[128]; /* FIXME: bad, bad, bad */ > -struct iovec *sg; > -int cnt; > -} V9fsWriteState; > - > static void v9fs_write_post_writev(V9fsState *s, V9fsWriteState *vs, > ssize_t err) > { > @@ -1706,19 +1643,6 @@ out: > qemu_free(vs); > } > > -typedef struct V9fsCreateState { >
Re: [Qemu-devel] [PATCH 1/5] Exit if incoming migration fails
On Tue, 25 May 2010 16:21:01 +0200 Juan Quintela wrote: > Signed-off-by: Juan Quintela > --- > migration.c | 16 ++-- > migration.h |2 +- > vl.c|7 ++- > 3 files changed, 17 insertions(+), 8 deletions(-) > > diff --git a/migration.c b/migration.c > index 05f6cc5..9c1d4b6 100644 > --- a/migration.c > +++ b/migration.c > @@ -36,22 +36,26 @@ static uint32_t max_throttle = (32 << 20); > > static MigrationState *current_migration; > > -void qemu_start_incoming_migration(const char *uri) > +int qemu_start_incoming_migration(const char *uri) > { > const char *p; > +int ret; > > if (strstart(uri, "tcp:", &p)) > -tcp_start_incoming_migration(p); > +ret = tcp_start_incoming_migration(p); > #if !defined(WIN32) > else if (strstart(uri, "exec:", &p)) > -exec_start_incoming_migration(p); > +ret = exec_start_incoming_migration(p); > else if (strstart(uri, "unix:", &p)) > -unix_start_incoming_migration(p); > +ret = unix_start_incoming_migration(p); > else if (strstart(uri, "fd:", &p)) > -fd_start_incoming_migration(p); > +ret = fd_start_incoming_migration(p); > #endif > -else > +else { > fprintf(stderr, "unknown migration protocol: %s\n", uri); > +ret = -EPROTONOSUPPORT; > +} > +return ret; > } > > int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data) > diff --git a/migration.h b/migration.h > index 385423f..dd423a1 100644 > --- a/migration.h > +++ b/migration.h > @@ -50,7 +50,7 @@ struct FdMigrationState > void *opaque; > }; > > -void qemu_start_incoming_migration(const char *uri); > +int qemu_start_incoming_migration(const char *uri); > > int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data); > > diff --git a/vl.c b/vl.c > index 328395e..d13440d 100644 > --- a/vl.c > +++ b/vl.c > @@ -3823,7 +3823,12 @@ int main(int argc, char **argv, char **envp) > } > > if (incoming) { > -qemu_start_incoming_migration(incoming); > +int ret = qemu_start_incoming_migration(incoming); > +if (ret < 0) { > +fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n", > +incoming, ret); > +exit(ret); While I agree on the change, I have two comments: 1. By taking a look at the code I have the impression that most of the fun failures will happen on the handler passed to qemu_set_fd_handler2(), do you agree? Any plan to address that? 1. Is exit()ing the best thing to be done? I understand it's the easiest and maybe better than nothing, but wouldn't it be better to enter in paused-forever state so that clients can query and decide what to do? > +} > } else if (autostart) { > vm_start(); > }
Re: [Qemu-devel] Re: [PATCH] Add cache=volatile parameter to -drive
Anthony Liguori wrote: > On 05/17/2010 11:23 AM, Paul Brook wrote: I don't see a difference between the results. Apparently the barrier option doesn't change a thing. >>> Ok. I don't like it, but I can see how it's compelling. I'd like to >>> see the documentation improved though. I also think a warning printed >>> on stdio about the safety of the option would be appropriate. >>> >> I disagree with this last bit. >> >> Errors should be issued if the user did something wrong. >> Warnings should be issued if qemu did (or will soon do) something >> other than >> what the user requested, or otherwise made questionable decisions on the >> user's behalf. >> >> In this case we're doing exactly what the user requested. The only >> plausible >> failure case is where a user is blindly trying options that they >> clearly don't >> understand or read the documentation for. I have zero sympathy for >> complaints >> like "Someone on the Internet told me to use --breakme, and broke >> thinks". >> > > I see it as the equivalent to the Taint bit in Linux. I want to make > it clear to users up front that if you use this option, and you have > data loss issues, don't complain. > > Just putting something in qemu-doc.texi is not enough IMHO. Few > people actually read it. So what exactly is the conclusion here? I really want to see this getting merged. Alex
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
Anthony Liguori wrote: > On 05/25/2010 11:25 AM, Daniel P. Berrange wrote: >> On Tue, May 25, 2010 at 06:04:17PM +0200, Juan Quintela wrote: >> >>> Anthony Liguori wrote: > I'm not sure why you would need a notification of when migration > starts (since you know when you've started migration). But you don't know if the other end "knows" that it has also started. started is needed only in incoming part, because we don't have a monitor to ask if migration has started. Later, Juan.
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On Tue, 25 May 2010, Avi Kivity wrote: > > What's the reason for not having these drivers upstream? Do we gain > > anything by hiding them from our users and requiring them to install the > > drivers separately from somewhere else? > > > > Six months. FWIW, we (Ceph) aren't complaining about the 6 month lag time (and I don't think the Sheepdog guys are either). >From our perspective, the current BlockDriver abstraction is ideal, as it represents the reality of qemu's interaction with storage. Any 'external' interface will be inferior to that in one way or another. But either way, we are perfectly willing to work with you to all to keep in sync with any future BlockDriver API improvements. It is worth our time investment even if the API is less stable. The ability to dynamically load a shared object using the existing api would make development a bit easier, but I'm not convinced it's better for for users. I think having ceph and sheepdog upstream with qemu will serve end users best, and we at least are willing to spend the time to help maintain that code in qemu.git. sage
[Qemu-devel] Re: [PATCH 1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)
2010/5/21 Blue Swirl : > On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko > wrote: >> 2010/5/10 Blue Swirl : >>> On 5/10/10, Artyom Tarasenko wrote: 2010/5/10 Blue Swirl : > On 5/10/10, Artyom Tarasenko wrote: >> 2010/5/9 Blue Swirl : >> > On 5/9/10, Artyom Tarasenko wrote: >> >> 2010/5/9 Blue Swirl : >> >> >> >> > On 5/8/10, Artyom Tarasenko wrote: >> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased. >> >> >> Software shouldn't use aliased addresses, neither should it crash >> >> >> when it uses (on the real hardware it wouldn't). Using empty_slot >> >> >> instead of aliasing can help with debugging such accesses. >> >> > >> >> > TurboSPARC Microprocessor User's Manual shows that there are >> >> > additional pages after the main IOMMU for AFX registers. So this is >> >> > not board specific, but depends on CPU/IOMMU versions. >> >> >> >> >> >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses. >> >> SS-20 doesn't have any aliasing. >> > >> > But are your machines equipped with TurboSPARC or some other CPU? >> >> >> Good point, I must confess, I missed the word "Turbo" in your first >> answer. LX and SS-20 don't. >> But SS-5 must have a TurboSPARC CPU: >> >> ok cd /FMI,MB86904 >> ok .attributes >> context-table 00 00 00 00 03 ff f0 00 00 00 10 00 >> psr-implementation >> psr-version 0004 >> implementation >> version 0004 >> cache-line-size 0020 >> cache-nlines 0200 >> page-size 1000 >> dcache-line-size 0010 >> dcache-nlines 0200 >> dcache-associativity 0001 >> icache-line-size 0020 >> icache-nlines 0200 >> icache-associativity 0001 >> ncaches 0002 >> mmu-nctx 0100 >> sparc-version 0008 >> mask_rev 0026 >> device_type cpu >> name FMI,MB86904 >> >> and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5: >> >> ok 1000 20 spacel@ . >> 409 >> ok 1400 20 spacel@ . >> 409 >> ok 1404 20 spacel@ . >> 23000 >> ok 1f04 20 spacel@ . >> 23000 >> ok 1008 20 spacel@ . >> 409 >> ok 1428 20 spacel@ . >> 409 >> ok 100c 20 spacel@ . >> 23000 >> ok 1010 20 spacel@ . >> 409 >> >> >> LX is the same except for the IOMMU-version: >> >> ok 1000 20 spacel@ . >> 405 >> ok 1400 20 spacel@ . >> 405 >> ok 1800 20 spacel@ . >> 405 >> ok 1f00 20 spacel@ . >> 405 >> ok 1ff0 20 spacel@ . >> 405 >> ok 1fff0004 20 spacel@ . >> 1fe000 >> ok 1004 20 spacel@ . >> 1fe000 >> ok 1108 20 spacel@ . >> 4105 >> ok 1040 20 spacel@ . >> 4105 >> ok 1fff0040 20 spacel@ . >> 4105 >> ok 1fff0044 20 spacel@ . >> 1fe000 >> ok 1fff0024 20 spacel@ . >> 1fe000 >> >> >> >> At what address the additional AFX registers are located? >> > >> > Here's complete TurboSPARC IOMMU address map: >> > PA[30:0] Register Access >> > 1000_ IOMMU Control R/W >> > 1000_0004 IOMMU Base Address R/W >> > 1000_0014 Flush All IOTLB Entries W >> > 1000_0018 Address Flush W >> > 1000_1000 Asynchronous Fault Status R/W >> > 1000_1004 Asynchronous Fault Address R/W >> > 1000_1010 SBus Slot Configuration 0 R/W >> > 1000_1014 SBus Slot Configuration 1 R/W >> > 1000_1018 SBus Slot Configuration 2 R/W >> > 1000_101C SBus Slot Configuration 3 R/W >> > 1000_1020 SBus Slot Configuration 4 R/W >> > 1000_1050 Memory Fault Status R/W >> > 1000_1054 Memory Fault Address R/W >> > 1000_2000 Module Identification R/W >> > 1000_3018 Mask Identification R >> > 1000_4000 AFX Queue Level W >> > 1000_6000 AFX Queue Level R >> > 1000_7000 AFX Queue Status R >> >> >> >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is >> well above this limit
Re: [Qemu-devel] [PATCH 2/5] vnc: send desktopresize event as reply to set encodings
On Tue, May 25, 2010 at 6:25 PM, Gerd Hoffmann wrote: > In case the desktop did resize while the vnc connection setup was still > in progress the client isn't informed about it. Send a desktop resize > event as soon as the client told us it can handle deskop resize via set > encodings message to make sure the client us up to date. I had a similar patch on my queue but yours is probably cleaner :). -- Corentin Chary http://xf.iksaif.net
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 07:21 PM, Anthony Liguori wrote: On 05/25/2010 11:16 AM, Avi Kivity wrote: On 05/25/2010 06:01 PM, Anthony Liguori wrote: On 05/25/2010 10:00 AM, Avi Kivity wrote: The latter. Why is it less important? If you don't inherit the memory, you can't access it. You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared memory segments dynamically. Doesn't work for anonymous memory. What's wrong with /dev/shm memory? The kernel treats anonymous and nonymous memory differently for swapping (see /proc/sys/vm/swappiness); transparent hugepages won't work for /dev/shm (though it may be argued that that's a problem with thp); setup (/dev/shm defaults to half memory IIRC, we want mem+swap); different cgroup handling; somewhat clunky (a minor concern to be sure). Surely, with mmu notifiers, it wouldn't be that hard to share anonymous memory via an fd though, no? That's what I suggested with processfd(). I wouldn't call it easy but it's likely doable. Whether it's mergable is a different issue. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 3/5] vnc: keep track of client desktop size
Add two new variables to keep track of the vnc clients desktop size. Signed-off-by: Gerd Hoffmann --- vnc.c | 10 +++--- vnc.h |2 ++ 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/vnc.c b/vnc.c index 0e0e566..30e0bed 100644 --- a/vnc.c +++ b/vnc.c @@ -521,10 +521,12 @@ static void vnc_desktop_resize(VncState *vs) if (vs->csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) { return; } +vs->client_width = ds_get_width(ds); +vs->client_height = ds_get_height(ds); vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE); vnc_write_u8(vs, 0); vnc_write_u16(vs, 1); /* number of rects */ -vnc_framebuffer_update(vs, 0, 0, ds_get_width(ds), ds_get_height(ds), +vnc_framebuffer_update(vs, 0, 0, vs->client_width, vs->client_height, VNC_ENCODING_DESKTOPRESIZE); vnc_flush(vs); } @@ -1958,8 +1960,10 @@ static int protocol_client_init(VncState *vs, uint8_t *data, size_t len) char buf[1024]; int size; -vnc_write_u16(vs, ds_get_width(vs->ds)); -vnc_write_u16(vs, ds_get_height(vs->ds)); +vs->client_width = ds_get_width(vs->ds); +vs->client_height = ds_get_height(vs->ds); +vnc_write_u16(vs, vs->client_width); +vnc_write_u16(vs, vs->client_height); pixel_format_message(vs); diff --git a/vnc.h b/vnc.h index 0d39897..d648832 100644 --- a/vnc.h +++ b/vnc.h @@ -134,6 +134,8 @@ struct VncState int absolute; int last_x; int last_y; +int client_width; +int client_height; uint32_t vnc_encoding; -- 1.6.6.1
[Qemu-devel] [PATCH 0/5] vnc: desktop size patches.
Hi, This series brings a bunch of vnc desktop size patches, fixing the issues discussed in the "Possible race condition in VNC display resizing" thread. Check list archive here: http://lists.gnu.org/archive/html/qemu-devel/2010-04/msg01778.html cheers, Gerd Gerd Hoffmann (5): vnc: factor out vnc_desktop_resize() vnc: send desktopresize event as reply to set encodings vnc: keep track of client desktop size vnc: don't send invalid screen updates. vnc: move size-changed check into the vnc_desktop_resize function. vnc.c | 50 +- vnc.h |2 ++ 2 files changed, 35 insertions(+), 17 deletions(-)
[Qemu-devel] [PATCH 4/5] vnc: don't send invalid screen updates.
Don't send updates for screen areas which are outside the clients desktop. May happed with vnc clients which don't support the desktop resize message. Signed-off-by: Gerd Hoffmann --- vnc.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/vnc.c b/vnc.c index 30e0bed..119ffe8 100644 --- a/vnc.c +++ b/vnc.c @@ -836,6 +836,7 @@ static int vnc_update_client(VncState *vs, int has_dirty) int y; int n_rectangles; int saved_offset; +int width, height; if (vs->output.offset && !vs->audio_cap && !vs->force_update) /* kernel send buffers are full -> drop frames to throttle */ @@ -856,10 +857,13 @@ static int vnc_update_client(VncState *vs, int has_dirty) saved_offset = vs->output.offset; vnc_write_u16(vs, 0); -for (y = 0; y < vd->server->height; y++) { +width = MIN(vd->server->width, vs->client_width); +height = MIN(vd->server->height, vs->client_height); + +for (y = 0; y < height; y++) { int x; int last_x = -1; -for (x = 0; x < vd->server->width / 16; x++) { +for (x = 0; x < width / 16; x++) { if (vnc_get_bit(vs->dirty[y], x)) { if (last_x == -1) { last_x = x; -- 1.6.6.1
Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, May 25, 2010 at 06:04:17PM +0200, Juan Quintela wrote: > Anthony Liguori wrote: > > On 05/25/2010 10:35 AM, Juan Quintela wrote: > > >> problem here is that libvirt start target with -S, and waits to do the > >> "cont" as soon as possible. As of know, only way to do it is to poll > >> info migrate on source faster. > >> > > > > Why does it do that?? > > > > That sound like a terrible idea. > > Becaues migration is not reliable, and they don't have a way to issue > cont only in one of the sides :( > > We make migration protocol reliable, or management application have to > decide when migration suceeded or not. > > This new events help then a lot. But they issue the cont really fast > (before migration ends). I don't remember why they did that. The use of '-S / cont' isn't really because of reliability. There are several scenarios though. There's a migrate API option to leave the guest paused upon completion, hence we need to start it with -S to stop it auto-running upon completion. With some disk locking approaches we need todo a lock transfer before allowing the dest to continue running. It could be optimized to avoid the -S /cont in cases where those two scenarios aren't relevant, but only if we can get a separate async notification of when migration starts and completes on the destination, so we can notify mgmt apps that need this lifecycle event. So in summary these lifecycle events on source + dest for start, complete, fail, cancel are all focused on allowing libvirt to remove its existing hacks in migration support for current QEMU. Regards, Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On 05/25/2010 11:25 AM, Daniel P. Berrange wrote: On Tue, May 25, 2010 at 06:04:17PM +0200, Juan Quintela wrote: Anthony Liguori wrote: On 05/25/2010 10:35 AM, Juan Quintela wrote: problem here is that libvirt start target with -S, and waits to do the "cont" as soon as possible. As of know, only way to do it is to poll info migrate on source faster. Why does it do that?? That sound like a terrible idea. Becaues migration is not reliable, and they don't have a way to issue cont only in one of the sides :( We make migration protocol reliable, or management application have to decide when migration suceeded or not. This new events help then a lot. But they issue the cont really fast (before migration ends). I don't remember why they did that. The use of '-S / cont' isn't really because of reliability. There are several scenarios though. There's a migrate API option to leave the guest paused upon completion, hence we need to start it with -S to stop it auto-running upon completion. That's a strange API. Why would you want to do that? Why not just stop and then migrate? You're just wasting bandwidth doing a live migration and then leaving it stopped. This is a critical period of time for the guest and generally speaking, you don't want to involve many layers of management tooling in these decisions because the result is going to be that you break the migration downtime contract. With some disk locking approaches we need todo a lock transfer before allowing the dest to continue running. QEMU is going to read the disk before the migration completes so the lock transfer is not going to work with the current implementation (it needs to read the disk to do probing). I assume this is not something that's actually been implemented. It could be optimized to avoid the -S /cont in cases where those two scenarios aren't relevant, but only if we can get a separate async notification of when migration starts and completes on the destination, so we can notify mgmt apps that need this lifecycle event. Migration completes == guest starts running. You'll get a notification of that but you're not getting that now because you're doing -S which I'd argue is a functional problem on the part of libvirt (you're breaking the downtime contract). I'm not sure why you would need a notification of when migration starts (since you know when you've started migration). Regards, Anthony Liguori So in summary these lifecycle events on source + dest for start, complete, fail, cancel are all focused on allowing libvirt to remove its existing hacks in migration support for current QEMU. Regards, Daniel
[Qemu-devel] [PATCH 2/5] vnc: send desktopresize event as reply to set encodings
In case the desktop did resize while the vnc connection setup was still in progress the client isn't informed about it. Send a desktop resize event as soon as the client told us it can handle deskop resize via set encodings message to make sure the client us up to date. Signed-off-by: Gerd Hoffmann --- vnc.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/vnc.c b/vnc.c index aaebe24..0e0e566 100644 --- a/vnc.c +++ b/vnc.c @@ -1709,6 +1709,7 @@ static void set_encodings(VncState *vs, int32_t *encodings, size_t n_encodings) break; } } +vnc_desktop_resize(vs); check_pointer_type_change(&vs->mouse_mode_notifier); } -- 1.6.6.1
[Qemu-devel] [PATCH 5/5] vnc: move size-changed check into the vnc_desktop_resize function.
This make sure we send a desktop resize message only in case we actually have to, using the new variables which track the clients desktop size. Signed-off-by: Gerd Hoffmann --- vnc.c | 11 +-- 1 files changed, 5 insertions(+), 6 deletions(-) diff --git a/vnc.c b/vnc.c index 119ffe8..5715006 100644 --- a/vnc.c +++ b/vnc.c @@ -521,6 +521,10 @@ static void vnc_desktop_resize(VncState *vs) if (vs->csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) { return; } +if (vs->client_width == ds_get_width(ds) && +vs->client_height == ds_get_height(ds)) { +return; +} vs->client_width = ds_get_width(ds); vs->client_height = ds_get_height(ds); vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE); @@ -533,7 +537,6 @@ static void vnc_desktop_resize(VncState *vs) static void vnc_dpy_resize(DisplayState *ds) { -int size_changed; VncDisplay *vd = ds->opaque; VncState *vs; @@ -551,16 +554,12 @@ static void vnc_dpy_resize(DisplayState *ds) vd->guest.ds = qemu_mallocz(sizeof(*vd->guest.ds)); if (ds_get_bytes_per_pixel(ds) != vd->guest.ds->pf.bytes_per_pixel) console_color_init(ds); -size_changed = ds_get_width(ds) != vd->guest.ds->width || - ds_get_height(ds) != vd->guest.ds->height; *(vd->guest.ds) = *(ds->surface); memset(vd->guest.dirty, 0xFF, sizeof(vd->guest.dirty)); QTAILQ_FOREACH(vs, &vd->clients, next) { vnc_colordepth(vs); -if (size_changed) { -vnc_desktop_resize(vs); -} +vnc_desktop_resize(vs); if (vs->vd->cursor) { vnc_cursor_define(vs); } -- 1.6.6.1
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 05:01 PM, Kevin Wolf wrote: The current situation is that those block format drivers only exist in qemu.git or as patches. Surely that's even more unhappiness. The difference is that in the current situation these drivers will be part of the next qemu release, so the patch may be obsolete, but you don't even need it any more. The next qemu release may be six months in the future. So if you're not happy with running qemu.git master or with patching a stable release, you have to wait. If you start keeping block drivers outside qemu and not even try integrating them, they'll stay external. Which may or may not be a problem. Confusion could be mitigated: $ qemu -module my-fancy-block-format-driver.so my-fancy-block-format-driver.so does not support this version of qemu (0.19.2). Please contact my-fancy-block-format-driver-de...@example.org. The question is how many such block format drivers we expect. We now have two in the pipeline (ceph, sheepdog), it's reasonable to assume we'll want an lvm2 driver and btrfs driver. This is an area with a lot of activity and a relatively simply interface. What's the reason for not having these drivers upstream? Do we gain anything by hiding them from our users and requiring them to install the drivers separately from somewhere else? Six months. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 11:16 AM, Avi Kivity wrote: On 05/25/2010 06:01 PM, Anthony Liguori wrote: On 05/25/2010 10:00 AM, Avi Kivity wrote: The latter. Why is it less important? If you don't inherit the memory, you can't access it. You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared memory segments dynamically. Doesn't work for anonymous memory. What's wrong with /dev/shm memory? The kernel treats anonymous and nonymous memory differently for swapping (see /proc/sys/vm/swappiness); transparent hugepages won't work for /dev/shm (though it may be argued that that's a problem with thp); setup (/dev/shm defaults to half memory IIRC, we want mem+swap); different cgroup handling; somewhat clunky (a minor concern to be sure). Surely, with mmu notifiers, it wouldn't be that hard to share anonymous memory via an fd though, no? Regards, Anthony Liguori Nothing is a killer, but we should prefer anonymous memory.
[Qemu-devel] [PATCH 1/5] vnc: factor out vnc_desktop_resize()
Signed-off-by: Gerd Hoffmann --- vnc.c | 24 1 files changed, 16 insertions(+), 8 deletions(-) diff --git a/vnc.c b/vnc.c index 11ae3e5..aaebe24 100644 --- a/vnc.c +++ b/vnc.c @@ -514,6 +514,21 @@ void buffer_append(Buffer *buffer, const void *data, size_t len) buffer->offset += len; } +static void vnc_desktop_resize(VncState *vs) +{ +DisplayState *ds = vs->ds; + +if (vs->csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) { +return; +} +vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE); +vnc_write_u8(vs, 0); +vnc_write_u16(vs, 1); /* number of rects */ +vnc_framebuffer_update(vs, 0, 0, ds_get_width(ds), ds_get_height(ds), + VNC_ENCODING_DESKTOPRESIZE); +vnc_flush(vs); +} + static void vnc_dpy_resize(DisplayState *ds) { int size_changed; @@ -542,14 +557,7 @@ static void vnc_dpy_resize(DisplayState *ds) QTAILQ_FOREACH(vs, &vd->clients, next) { vnc_colordepth(vs); if (size_changed) { -if (vs->csock != -1 && vnc_has_feature(vs, VNC_FEATURE_RESIZE)) { -vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE); -vnc_write_u8(vs, 0); -vnc_write_u16(vs, 1); /* number of rects */ -vnc_framebuffer_update(vs, 0, 0, ds_get_width(ds), ds_get_height(ds), -VNC_ENCODING_DESKTOPRESIZE); -vnc_flush(vs); -} +vnc_desktop_resize(vs); } if (vs->vd->cursor) { vnc_cursor_define(vs); -- 1.6.6.1
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 06:01 PM, Anthony Liguori wrote: On 05/25/2010 10:00 AM, Avi Kivity wrote: The latter. Why is it less important? If you don't inherit the memory, you can't access it. You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared memory segments dynamically. Doesn't work for anonymous memory. What's wrong with /dev/shm memory? The kernel treats anonymous and nonymous memory differently for swapping (see /proc/sys/vm/swappiness); transparent hugepages won't work for /dev/shm (though it may be argued that that's a problem with thp); setup (/dev/shm defaults to half memory IIRC, we want mem+swap); different cgroup handling; somewhat clunky (a minor concern to be sure). Nothing is a killer, but we should prefer anonymous memory. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [Bug 585449] Re: 'Broken pipe' error when starting a vm
Please report this bug against libvirt as this does not appear to be an upstream qemu issue. ** Changed in: qemu Status: New => Invalid -- 'Broken pipe' error when starting a vm https://bugs.launchpad.net/bugs/585449 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in libvirt virtualization API: Invalid Bug description: Occasionally get an error 'libvir: Remote error : cannot send data: Broken pipe' scrolling down the screen when starting a VM. VM does not start and end up having to reboot server to be able to get vm going again virsh -q -c qemu:///system start dev32 libvir: QEMU error : server closed connection libvir: Remote error : cannot send data: Broken pipe libvir: Remote error : cannot send data: Broken pipe ...last line keep being printed every second or so... Trying to then use virt-manager produced the attached log file $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 10.04 LTS Release:10.04 Codename: lucid $ dpkg -l | grep kvm ii kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 dummy transitional pacakge from kvm to qemu- ii kvm-pxe5.4.4-1ubuntu1 PXE ROM's for KVM ii qemu-kvm 0.12.3+noroms-0ubuntu9 Full virtualization on i386 and amd64 hardwa # cat/var/log/libvirt/qemu/dev32.log LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus char device redirected to /dev/pts/5 pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin" LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus char device redirected to /dev/pts/10 # kvm-img info /srv/vm_disks/dev32/disk0.qcow2 image: /srv/vm_disks/dev32/disk0.qcow2 file format: qcow2 virtual size: 16G (17179869184 bytes) disk size: 8.2G cluster_size: 65536 What other information would be helpful?
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On 05/25/2010 11:04 AM, Juan Quintela wrote: Anthony Liguori wrote: On 05/25/2010 10:35 AM, Juan Quintela wrote: problem here is that libvirt start target with -S, and waits to do the "cont" as soon as possible. As of know, only way to do it is to poll info migrate on source faster. Why does it do that?? That sound like a terrible idea. Becaues migration is not reliable, and they don't have a way to issue cont only in one of the sides :( I don't know what you mean by reliable. When the migration completes on the destination, it will start automatically. The source will not start unless explicitly invoked. If you successfully cancel a migration on the source, it's guaranteed that it won't start on the destination. So the sequence looks like: src) // decide we want to give up migration src) migrate_cancel src) // check migration status src) cont // if migration cancelled src) //if migration succeeded, check destination for completion dst) // if not responsive and not completed in appropriate amount of time, kill guest src) cont // if killed destination I don't see what the problem is. We make migration protocol reliable, or management application have to decide when migration suceeded or not. Reliability has nothing to do with the protocol and everything to do with the presence of the third node. This new events help then a lot. But they issue the cont really fast (before migration ends). I don't remember why they did that. If libvirt is launching the destination with -S, it's doing the wrong thing and we ought make sure the proper fix gets implemented. danp? There should be some information about why it failed, no? Preferrably in a QError format. At this point, we have basically -1 :( I can add a field with an error number, but we are very bad at the moment about moving errno's upstack. We need a better solution for reporting errors via notifications. Suggestions? Notice that what we need now is a way to know if migration ended with success or in any other way, as soon as possible. Markus/Luiz? I think this makes more sense as a MIGRATION_CONNECTED event. It probably should carry peer information too. What kind of peer information? We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I don't care. But adding information? Notice that the management application knows what it did, I can put the: "exec: gzip -d< /tmp/foo" string, but not much more that I can put here. Basically, do we have any useful information in info migrate that we can include? (qemu) info migrate Migration status: active transferred ram: 874808 kbytes remaining ram: 227912 kbytes total ram: 1065344 kbytes (qemu) I can't see anything interesting to put here :( Ugh. About the CONNECTED/STARTED distintion, I fully agree with danp. We just want STARTED event for migration, CONNECTION should be generated (or not) for all sockets/char devices. it don't make sense for fd/exec for instance. That makes sense to me. Regards, Anthony Liguori Later, Juan.
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
Anthony Liguori wrote: > On 05/25/2010 10:35 AM, Juan Quintela wrote: >> problem here is that libvirt start target with -S, and waits to do the >> "cont" as soon as possible. As of know, only way to do it is to poll >> info migrate on source faster. >> > > Why does it do that?? > > That sound like a terrible idea. Becaues migration is not reliable, and they don't have a way to issue cont only in one of the sides :( We make migration protocol reliable, or management application have to decide when migration suceeded or not. This new events help then a lot. But they issue the cont really fast (before migration ends). I don't remember why they did that. danp? >>> There should be some information about why it failed, no? Preferrably >>> in a QError format. >>> >> At this point, we have basically -1 :( >> >> I can add a field with an error number, but we are very bad at the >> moment about moving errno's upstack. >> > > We need a better solution for reporting errors via notifications. Suggestions? Notice that what we need now is a way to know if migration ended with success or in any other way, as soon as possible. >>> I think this makes more sense as a MIGRATION_CONNECTED event. It >>> probably should carry peer information too. >>> >> What kind of peer information? >> >> We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I >> don't care. But adding information? Notice that the management >> application knows what it did, I can put the: >> >> "exec: gzip -d< /tmp/foo" >> >> string, but not much more that I can put here. >> > > Basically, do we have any useful information in info migrate that we > can include? (qemu) info migrate Migration status: active transferred ram: 874808 kbytes remaining ram: 227912 kbytes total ram: 1065344 kbytes (qemu) I can't see anything interesting to put here :( About the CONNECTED/STARTED distintion, I fully agree with danp. We just want STARTED event for migration, CONNECTION should be generated (or not) for all sockets/char devices. it don't make sense for fd/exec for instance. Later, Juan.
Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, May 25, 2010 at 10:57:33AM -0500, Anthony Liguori wrote: > On 05/25/2010 10:35 AM, Juan Quintela wrote: > >Anthony Liguori wrote: > > > > > >> > >>>+Data: None > >>>+ > >>>+Example: > >>>+ > >>>+{ "event": "MIGRATION_CANCELED", > >>>+"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > >>>+ > >>>+MIGRATION_ENDED > >>>+--- > >>>+ > >>>+Emitted when migration ends (both in source and target) > >>> > >>> > >>A start event is going to be generated already, no? > >> > >problem here is that libvirt start target with -S, and waits to do the > >"cont" as soon as possible. As of know, only way to do it is to poll > >info migrate on source faster. > > > > Why does it do that?? > > That sound like a terrible idea. Historically QEMU gave no alternative. Adding these STARTED/ENDED events is to allow libvirt to detect start + end of migration reliably, avoiding the previous hacks QEMU forced us todo on the dest, and avoid the high rate polling we had no choice but todo on the source. > >>I think this makes more sense as a MIGRATION_CONNECTED event. It > >>probably should carry peer information too. > >> > >What kind of peer information? > > > >We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I > >don't care. But adding information? Notice that the management > >application knows what it did, I can put the: > > > > "exec: gzip -d< /tmp/foo" > > > >string, but not much more that I can put here. > > Basically, do we have any useful information in info migrate that we can > include? info migrate just includes the progress info + state (running, finished, cancelled, failed). The event itself replicates state. I don't see a hugely compelling need to include the progress info in the FINISHED/CANCELLED events. If really needed, the app can still call 'info migrate' to get it. Regards, Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
Anthony Liguori wrote: > On 05/25/2010 10:35 AM, Juan Quintela wrote: >> problem here is that libvirt start target with -S, and waits to do the >> "cont" as soon as possible. As of know, only way to do it is to poll >> info migrate on source faster. >> > > Why does it do that?? > > That sound like a terrible idea. Becaues migration is not reliable, and they don't have a way to issue cont only in one of the sides :( We make migration protocol reliable, or management application have to decide when migration suceeded or not. This new events help then a lot. But they issue the cont really fast (before migration ends). I don't remember why they did that. danp? >>> There should be some information about why it failed, no? Preferrably >>> in a QError format. >>> >> At this point, we have basically -1 :( >> >> I can add a field with an error number, but we are very bad at the >> moment about moving errno's upstack. >> > > We need a better solution for reporting errors via notifications. Suggestions? Notice that what we need now is a way to know if migration ended with success or in any other way, as soon as possible. >>> I think this makes more sense as a MIGRATION_CONNECTED event. It >>> probably should carry peer information too. >>> >> What kind of peer information? >> >> We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I >> don't care. But adding information? Notice that the management >> application knows what it did, I can put the: >> >> "exec: gzip -d< /tmp/foo" >> >> string, but not much more that I can put here. >> > > Basically, do we have any useful information in info migrate that we > can include? (qemu) info migrate Migration status: active transferred ram: 874808 kbytes remaining ram: 227912 kbytes total ram: 1065344 kbytes (qemu) I can't see anything interesting to put here :( About the CONNECTED/STARTED distintion, I fully agree with danp. We just want STARTED event for migration, CONNECTION should be generated (or not) for all sockets/char devices. it don't make sense for fd/exec for instance. Later, Juan.
Re: [Qemu-devel] SDL fullscreen window dislikes being resized past the screen size
On 05/25/2010 10:53 AM, Michael Tokarev wrote: Initially it were a bugreport on #kvm IRC, someone asked why his kvm exits when entering fullscreen mode, saying the famous "Could not open SDL display" and nothing more. I added a bit of debug output and here's what I see: ... resizing to 1440x900 0 0x115 resizing to 1440x900 32 0x8115 Could not open SDL display for 1440x900, bpp=32, flags=0x8115 flag=0x8000 means fullscreen. My screen size is 1280x1024, -- 1440 is more than 1280. It works just fine if I choose resolution less or equal to my screen size. Depending on how SDL is configured, it uses DGA or some other relic to actually implement full screen mode. You cannot get a DGA screen that's larger than the physical monitor since it's often backed by video memory. SDL isn't smart enough to degrade into a scaled mode either. Honestly, SDL full screen mode is a bad idea. We offer no indication that the guest is actually running which is potentially very confusing/dangerous. We need a better backend to really implement a functioning full screen mode. Regards, Anthony Liguori And it works just fine (as seen in the example output above) that it works with larger resolutions but not fullscreen (in that case SDL window will be scaled to fit the actual desktop size). It looks to me like an incorrect usage or assumptions about SDL window, or maybe SDL bug. Mine is libsdl1.2debian 1.2.13-2. And the error message is in usual qemu style - not at all useful :) But it's at least better than pure exit without any messages at all, like was in hugetlbfs code ;) Thanks! /mjt
[Qemu-devel] [Bug 585449] Re: 'Broken pipe' error when starting a vm
** Project changed: qemu => libvirt -- 'Broken pipe' error when starting a vm https://bugs.launchpad.net/bugs/585449 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in libvirt virtualization API: Invalid Bug description: Occasionally get an error 'libvir: Remote error : cannot send data: Broken pipe' scrolling down the screen when starting a VM. VM does not start and end up having to reboot server to be able to get vm going again virsh -q -c qemu:///system start dev32 libvir: QEMU error : server closed connection libvir: Remote error : cannot send data: Broken pipe libvir: Remote error : cannot send data: Broken pipe ...last line keep being printed every second or so... Trying to then use virt-manager produced the attached log file $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 10.04 LTS Release:10.04 Codename: lucid $ dpkg -l | grep kvm ii kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 dummy transitional pacakge from kvm to qemu- ii kvm-pxe5.4.4-1ubuntu1 PXE ROM's for KVM ii qemu-kvm 0.12.3+noroms-0ubuntu9 Full virtualization on i386 and amd64 hardwa # cat/var/log/libvirt/qemu/dev32.log LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus char device redirected to /dev/pts/5 pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin" LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus char device redirected to /dev/pts/10 # kvm-img info /srv/vm_disks/dev32/disk0.qcow2 image: /srv/vm_disks/dev32/disk0.qcow2 file format: qcow2 virtual size: 16G (17179869184 bytes) disk size: 8.2G cluster_size: 65536 What other information would be helpful?
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On 05/25/2010 10:35 AM, Juan Quintela wrote: Anthony Liguori wrote: On 05/25/2010 09:21 AM, Juan Quintela wrote: +MIGRATION_CANCELED +-- + +Emitted when migration is canceled. This is emitted in the source. +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED +and CANCELED migration for target). But the management tool is the one that cancels so surely, it knows why already. ok, then that one is ok. +Data: None + +Example: + +{ "event": "MIGRATION_CANCELED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_ENDED +--- + +Emitted when migration ends (both in source and target) A start event is going to be generated already, no? problem here is that libvirt start target with -S, and waits to do the "cont" as soon as possible. As of know, only way to do it is to poll info migrate on source faster. Why does it do that?? That sound like a terrible idea. +Data: None + +Example: + +{ "event": "MIGRATION_ENDED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_FAILED + + +Emitted when migration fails (both is source and target). + +Data: None There should be some information about why it failed, no? Preferrably in a QError format. At this point, we have basically -1 :( I can add a field with an error number, but we are very bad at the moment about moving errno's upstack. We need a better solution for reporting errors via notifications. I think this makes more sense as a MIGRATION_CONNECTED event. It probably should carry peer information too. What kind of peer information? We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I don't care. But adding information? Notice that the management application knows what it did, I can put the: "exec: gzip -d< /tmp/foo" string, but not much more that I can put here. Basically, do we have any useful information in info migrate that we can include? Regards, Anthony Liguori Later, Juan.
[Qemu-devel] SDL fullscreen window dislikes being resized past the screen size
Initially it were a bugreport on #kvm IRC, someone asked why his kvm exits when entering fullscreen mode, saying the famous "Could not open SDL display" and nothing more. I added a bit of debug output and here's what I see: ... resizing to 1440x900 0 0x115 resizing to 1440x900 32 0x8115 Could not open SDL display for 1440x900, bpp=32, flags=0x8115 flag=0x8000 means fullscreen. My screen size is 1280x1024, -- 1440 is more than 1280. It works just fine if I choose resolution less or equal to my screen size. And it works just fine (as seen in the example output above) that it works with larger resolutions but not fullscreen (in that case SDL window will be scaled to fit the actual desktop size). It looks to me like an incorrect usage or assumptions about SDL window, or maybe SDL bug. Mine is libsdl1.2debian 1.2.13-2. And the error message is in usual qemu style - not at all useful :) But it's at least better than pure exit without any messages at all, like was in hugetlbfs code ;) Thanks! /mjt
Re: [Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, May 25, 2010 at 05:35:53PM +0200, Juan Quintela wrote: > Anthony Liguori wrote: > > >> +Example: > >> + > >> +{ "event": "MIGRATION_FAILED", > >> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > >> + > >> +MIGRATION_STARTED > >> +- > >> + > >> +Emitted when migration starts (both in source and target). > >> > > > > I think this makes more sense as a MIGRATION_CONNECTED event. It > > probably should carry peer information too. > > What kind of peer information? > > We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I > don't care. But adding information? Notice that the management > application knows what it did, I can put the: > > "exec: gzip -d < /tmp/foo" > > string, but not much more that I can put here. This is why I think network client CONNECT/DISCONNECT events should be separate from MIGRATION START/END events. They happen to occur at roughly the same time if using a TCP / UNIX socket based migration transport, but CONNECT/DISCONNECT + peer info is meaningless for exec or fd based migration. Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events
On Tue, May 25, 2010 at 10:09:55AM -0500, Anthony Liguori wrote: > On 05/25/2010 09:21 AM, Juan Quintela wrote: > >They are emitted when migration starts, ends, has a failure or is canceled. > > > >+Data: None > >+ > >+Example: > >+ > >+{ "event": "MIGRATION_CANCELED", > >+"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > >+ > >+MIGRATION_ENDED > >+--- > >+ > >+Emitted when migration ends (both in source and target) > > > > A start event is going to be generated already, no? > > >+Data: None > >+ > >+Example: > >+ > >+{ "event": "MIGRATION_ENDED", > >+"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > >+ > >+MIGRATION_FAILED > >+ > >+ > >+Emitted when migration fails (both is source and target). > >+ > >+Data: None > > > > There should be some information about why it failed, no? Preferrably in > a QError format. > > >+Example: > >+ > >+{ "event": "MIGRATION_FAILED", > >+"timestamp": {"seconds": 1274687575, "microseconds": 592483} } > >+ > >+MIGRATION_STARTED > >+- > >+ > >+Emitted when migration starts (both in source and target). > > > > I think this makes more sense as a MIGRATION_CONNECTED event. It > probably should carry peer information too. FYI the original request for these events from a libvirt POV for in terms of identifying the lifecycle transitions. Currently we issue a migration commend on source, and then have to poll very frequently on 'info migrate' to get progress stats, and to determine completion. We want to poll much less frequently for stats, and get async notification of completion/errors on the source. Simiarly on the destination, we need to know when any migration operation is taking place, so we can avoid issuing monitor commands to the QEMU process during that time, and also track success/failure + eventually get progress information via an equivalent of 'info migrate' on destination. So this is really focused on lifecycle transitions, rather than network client connections. I'm not convinced that we should mix up the two sorts of data. If we want to track network client connections IMHO they ought to be separate events. Perhaps there should be a generic QEMU network CONNECT/DISCONNECT event that works for all QEMU network sockets (migration, chardevs, netdev sockets, vnc, spice, and whatever we invent in future using sockets). Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
[Qemu-devel] Re: [PATCH 3/5] QMP: Introduce MIGRATION events
Anthony Liguori wrote: > On 05/25/2010 09:21 AM, Juan Quintela wrote: >> +MIGRATION_CANCELED >> +-- >> + >> +Emitted when migration is canceled. This is emitted in the source. >> +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED >> +and CANCELED migration for target). >> > > But the management tool is the one that cancels so surely, it knows > why already. ok, then that one is ok. > >> +Data: None >> + >> +Example: >> + >> +{ "event": "MIGRATION_CANCELED", >> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } >> + >> +MIGRATION_ENDED >> +--- >> + >> +Emitted when migration ends (both in source and target) >> > > A start event is going to be generated already, no? problem here is that libvirt start target with -S, and waits to do the "cont" as soon as possible. As of know, only way to do it is to poll info migrate on source faster. >> +Data: None >> + >> +Example: >> + >> +{ "event": "MIGRATION_ENDED", >> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } >> + >> +MIGRATION_FAILED >> + >> + >> +Emitted when migration fails (both is source and target). >> + >> +Data: None >> > > There should be some information about why it failed, no? Preferrably > in a QError format. At this point, we have basically -1 :( I can add a field with an error number, but we are very bad at the moment about moving errno's upstack. >> +Example: >> + >> +{ "event": "MIGRATION_FAILED", >> +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } >> + >> +MIGRATION_STARTED >> +- >> + >> +Emitted when migration starts (both in source and target). >> > > I think this makes more sense as a MIGRATION_CONNECTED event. It > probably should carry peer information too. What kind of peer information? We have tcp/fd/exec/unix migrations. calling it CONNECTED vs STARTED, I don't care. But adding information? Notice that the management application knows what it did, I can put the: "exec: gzip -d < /tmp/foo" string, but not much more that I can put here. Later, Juan.
[Qemu-devel] [Bug 585449] Re: 'Broken pipe' error when starting a vm
** Attachment added: "virt-manager.log" http://launchpadlibrarian.net/49083168/virt-manager.log -- 'Broken pipe' error when starting a vm https://bugs.launchpad.net/bugs/585449 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: Occasionally get an error 'libvir: Remote error : cannot send data: Broken pipe' scrolling down the screen when starting a VM. VM does not start and end up having to reboot server to be able to get vm going again virsh -q -c qemu:///system start dev32 libvir: QEMU error : server closed connection libvir: Remote error : cannot send data: Broken pipe libvir: Remote error : cannot send data: Broken pipe ...last line keep being printed every second or so... Trying to then use virt-manager produced the attached log file $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 10.04 LTS Release:10.04 Codename: lucid $ dpkg -l | grep kvm ii kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 dummy transitional pacakge from kvm to qemu- ii kvm-pxe5.4.4-1ubuntu1 PXE ROM's for KVM ii qemu-kvm 0.12.3+noroms-0ubuntu9 Full virtualization on i386 and amd64 hardwa # cat/var/log/libvirt/qemu/dev32.log LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus char device redirected to /dev/pts/5 pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin" LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus char device redirected to /dev/pts/10 # kvm-img info /srv/vm_disks/dev32/disk0.qcow2 image: /srv/vm_disks/dev32/disk0.qcow2 file format: qcow2 virtual size: 16G (17179869184 bytes) disk size: 8.2G cluster_size: 65536 What other information would be helpful?
[Qemu-devel] [Bug 585449] [NEW] 'Broken pipe' error when starting a vm
Public bug reported: Occasionally get an error 'libvir: Remote error : cannot send data: Broken pipe' scrolling down the screen when starting a VM. VM does not start and end up having to reboot server to be able to get vm going again virsh -q -c qemu:///system start dev32 libvir: QEMU error : server closed connection libvir: Remote error : cannot send data: Broken pipe libvir: Remote error : cannot send data: Broken pipe ...last line keep being printed every second or so... Trying to then use virt-manager produced the attached log file $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 10.04 LTS Release:10.04 Codename: lucid $ dpkg -l | grep kvm ii kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 dummy transitional pacakge from kvm to qemu- ii kvm-pxe5.4.4-1ubuntu1 PXE ROM's for KVM ii qemu-kvm 0.12.3+noroms-0ubuntu9 Full virtualization on i386 and amd64 hardwa # cat/var/log/libvirt/qemu/dev32.log LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-gb -vga cirrus char device redirected to /dev/pts/5 pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin" LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-gb -vga cirrus char device redirected to /dev/pts/10 # kvm-img info /srv/vm_disks/dev32/disk0.qcow2 image: /srv/vm_disks/dev32/disk0.qcow2 file format: qcow2 virtual size: 16G (17179869184 bytes) disk size: 8.2G cluster_size: 65536 What other information would be helpful? ** Affects: qemu Importance: Undecided Status: New -- 'Broken pipe' error when starting a vm https://bugs.launchpad.net/bugs/585449 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: Occasionally get an error 'libvir: Remote error : cannot send data: Broken pipe' scrolling down the screen when starting a VM. VM does not start and end up having to reboot server to be able to get vm going again virsh -q -c qemu:///system start dev32 libvir: QEMU error : server closed connection libvir: Remote error : cannot send data: Broken pipe libvir: Remote error : cannot send data: Broken pipe ...last line keep being printed every second or so... Trying to then use virt-manager produced the attached log file $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 10.04 LTS Release:10.04 Codename: lucid $ dpkg -l | grep kvm ii kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 dummy transitional pacakge from kvm to qemu- ii kvm-pxe5.4.4-1ubuntu1 PXE ROM's for KVM ii qemu-kvm 0.12.3+noroms-0ubuntu9 Full virtualization on i386 and amd64 hardwa # cat/var/log/libvirt/qemu/dev32.log LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 768 -smp 1 -name d ev32 -uuid 4eb67c83-8e6a-ba91-5741-6789da4e4004 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/dev32.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/srv/vm_disks/dev32/disk0.qcow2,if=ide,index=0,boot=on -drive file=/home/dferguson/Disk_Images/MS_Win7Pro.iso,if=ide,media=c drom,index=2 -net nic,macaddr=52:54:00:1b:24:fb,vlan=0,name=nic.0 -net tap,fd=35,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -paralle l none -usb -usbdevice tablet
Re: [Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events
On 05/25/2010 09:21 AM, Juan Quintela wrote: They are emitted when migration starts, ends, has a failure or is canceled. Signed-off-by: Juan Quintela --- QMP/qmp-events.txt | 50 ++ monitor.c | 12 monitor.h |4 3 files changed, 66 insertions(+), 0 deletions(-) diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt index 01ec85f..93caa4d 100644 --- a/QMP/qmp-events.txt +++ b/QMP/qmp-events.txt @@ -26,6 +26,56 @@ Example: Note: If action is "stop", a STOP event will eventually follow the BLOCK_IO_ERROR event. +MIGRATION_CANCELED +-- + +Emitted when migration is canceled. This is emitted in the source. +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED +and CANCELED migration for target). But the management tool is the one that cancels so surely, it knows why already. +Data: None + +Example: + +{ "event": "MIGRATION_CANCELED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_ENDED +--- + +Emitted when migration ends (both in source and target) A start event is going to be generated already, no? +Data: None + +Example: + +{ "event": "MIGRATION_ENDED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_FAILED + + +Emitted when migration fails (both is source and target). + +Data: None There should be some information about why it failed, no? Preferrably in a QError format. +Example: + +{ "event": "MIGRATION_FAILED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_STARTED +- + +Emitted when migration starts (both in source and target). I think this makes more sense as a MIGRATION_CONNECTED event. It probably should carry peer information too. Regards, Anthony Liguori +Data: None + +Example: + +{ "event": "MIGRATION_STARTED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + RESET - diff --git a/monitor.c b/monitor.c index ad50f12..5158780 100644 --- a/monitor.c +++ b/monitor.c @@ -444,6 +444,18 @@ void monitor_protocol_event(MonitorEvent event, QObject *data) case QEVENT_WATCHDOG: event_name = "WATCHDOG"; break; +case QEVENT_MIGRATION_STARTED: +event_name = "MIGRATION_STARTED"; +break; +case QEVENT_MIGRATION_ENDED: +event_name = "MIGRATION_ENDED"; +break; +case QEVENT_MIGRATION_FAILED: +event_name = "MIGRATION_FAILED"; +break; +case QEVENT_MIGRATION_CANCELED: +event_name = "MIGRATION_CANCELED"; +break; default: abort(); break; diff --git a/monitor.h b/monitor.h index ea15469..34bcd38 100644 --- a/monitor.h +++ b/monitor.h @@ -28,6 +28,10 @@ typedef enum MonitorEvent { QEVENT_BLOCK_IO_ERROR, QEVENT_RTC_CHANGE, QEVENT_WATCHDOG, +QEVENT_MIGRATION_STARTED, +QEVENT_MIGRATION_ENDED, +QEVENT_MIGRATION_FAILED, +QEVENT_MIGRATION_CANCELED, QEVENT_MAX, } MonitorEvent;
Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host
On 05/25/2010 08:21 AM, Andre Przywara wrote: What's the benefit of exposing this information to the guest? That is mostly to propagate the cache size and organization parameters to the guest: >> +/* safe CPUID leafs to propagate to guest if -cpu host is specified >> + * Intel defined leafs: >> + * Cache descriptors (0x02) >> + * Deterministic cache parameters (0x04) >> + * Monitor/MWAIT parameters (0x05) >> + * >> + * AMD defined leafs: >> + * L1 Cache and TLB (0x05) >> + * L2+L3 TLB (0x06) >> + * LongMode address size (0x08) >> + * 1GB page TLB (0x19) >> + * Performance optimization (0x1A) >> + */ Since at least L1 and L2 caches are mostly private to vCPUs, I see no reason to disguise them. But in practice, what is it useful for? Just because we can expose it doesn't mean we should. Regards, Anthony Liguori Regards, Andre.
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 05:09 PM, Kevin Wolf wrote: The first part of your argument may be true, but the second isn't. No user can run upstream qemu.git. It's not tested or supported, and has no backwards compatibility guarantees. The second part was basically meant to say "developers don't count here". Agreed. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 05:03 PM, Anthony Liguori wrote: On 05/25/2010 08:55 AM, Avi Kivity wrote: On 05/25/2010 04:53 PM, Kevin Wolf wrote: I'm still not convinced that we need either. I share Christoph's concern that we would make our life harder for almost no gain. It's probably a very small group of users (if it exists at all) that wants to add new block drivers themselves, but at the same time can't run upstream qemu. The first part of your argument may be true, but the second isn't. No user can run upstream qemu.git. It's not tested or supported, and has no backwards compatibility guarantees. Yes, it does have backwards compatibility guarantees. I meant a random untagged qemu.git snapshot. Do we guarantee anything about it, except that it's likely to be broken? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 05:05 PM, Anthony Liguori wrote: On 05/25/2010 09:01 AM, Avi Kivity wrote: On 05/25/2010 04:55 PM, Anthony Liguori wrote: On 05/25/2010 08:38 AM, Avi Kivity wrote: On 05/25/2010 04:35 PM, Anthony Liguori wrote: On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? fork() doesn't work in the with of memory hotplug. What else is there? Is it that fork() doesn't work or is it that fork() is very expensive? It doesn't work, fork() is done at block device creation time, which freezes the child memory map, while guest memory is allocated at hotplug time. Now I'm confused. I thought you were saying shared memory somehow affects fork(). If you're talking about shared memory inheritance via fork(), that's less important. The latter. Why is it less important? If you don't inherit the memory, you can't access it. You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared memory segments dynamically. Doesn't work for anonymous memory. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 10:00 AM, Avi Kivity wrote: The latter. Why is it less important? If you don't inherit the memory, you can't access it. You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared memory segments dynamically. Doesn't work for anonymous memory. What's wrong with /dev/shm memory? Regards, Anthony Liguori
[Qemu-devel] KVM call minutes for May 25
Generic Asynchronous task offloading - keep vcpu thread from blocking - generic approach is useful, comes down to specifics - e.g. what is done in worker threads, how locking is handled - offload blocking work to worker threads - need to make device model reentrant - can be simple w/ lock per device, but needs finer grained to be perfromant - but needs - thread pool should be relatively easy to merge - two alternatives for virtfs - one full request in thread, one that does syscalls asynchronously - need to have something demonstrably better to make threading acceptable - complexity of state machine vs. locking - performance difference at the end (e.g. coarse grained locking doesn't fully allow for parallel execution) - lapic needs to be reentrant - in qemu lapic should show off the benefit very well - hpet would also benefit from being threaded - multiple threads per device (needs device specific locking) - qxl, scsi, virtfs... - push global lock down to pio/mmio dispatch - audit for global qemu state changes done w/in device - anthony will set up wiki page to describe making device models reentrant - looking for volunteers!
[Qemu-devel] [PATCH 5/5] QMP: Emit migration events on outgoing migration
Signed-off-by: Juan Quintela --- migration-exec.c |3 ++- migration-fd.c |1 + migration-tcp.c |2 ++ migration-unix.c |2 ++ migration.c |5 + 5 files changed, 12 insertions(+), 1 deletions(-) diff --git a/migration-exec.c b/migration-exec.c index 07af11a..ebc9256 100644 --- a/migration-exec.c +++ b/migration-exec.c @@ -20,6 +20,7 @@ #include "sysemu.h" #include "buffered_file.h" #include "block.h" +#include "monitor.h" //#define DEBUG_MIGRATION_EXEC @@ -91,9 +92,9 @@ MigrationState *exec_start_outgoing_migration(Monitor *mon, s->mig_state.shared = inc; s->state = MIG_STATE_ACTIVE; +monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL); s->mon = NULL; s->bandwidth_limit = bandwidth_limit; - if (!detach) { migrate_fd_monitor_suspend(s, mon); } diff --git a/migration-fd.c b/migration-fd.c index 6d14505..9c4c7ae 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -83,6 +83,7 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon, s->mig_state.blk = blk; s->mig_state.shared = inc; +monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL); s->state = MIG_STATE_ACTIVE; s->mon = NULL; s->bandwidth_limit = bandwidth_limit; diff --git a/migration-tcp.c b/migration-tcp.c index 20f2e37..11a1203 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -18,6 +18,7 @@ #include "sysemu.h" #include "buffered_file.h" #include "block.h" +#include "monitor.h" //#define DEBUG_MIGRATION_TCP @@ -102,6 +103,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon, s->mig_state.blk = blk; s->mig_state.shared = inc; +monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL); s->state = MIG_STATE_ACTIVE; s->mon = NULL; s->bandwidth_limit = bandwidth_limit; diff --git a/migration-unix.c b/migration-unix.c index 57232c0..08f29a3 100644 --- a/migration-unix.c +++ b/migration-unix.c @@ -18,6 +18,7 @@ #include "sysemu.h" #include "buffered_file.h" #include "block.h" +#include "monitor.h" //#define DEBUG_MIGRATION_UNIX @@ -101,6 +102,7 @@ MigrationState *unix_start_outgoing_migration(Monitor *mon, s->mig_state.blk = blk; s->mig_state.shared = inc; +monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL); s->state = MIG_STATE_ACTIVE; s->mon = NULL; s->bandwidth_limit = bandwidth_limit; diff --git a/migration.c b/migration.c index 32470d5..86535a9 100644 --- a/migration.c +++ b/migration.c @@ -306,6 +306,7 @@ void migrate_fd_monitor_suspend(FdMigrationState *s, Monitor *mon) void migrate_fd_error(FdMigrationState *s) { DPRINTF("setting error state\n"); +monitor_protocol_event(QEVENT_MIGRATION_FAILED, NULL); s->state = MIG_STATE_ERROR; migrate_fd_cleanup(s); } @@ -403,8 +404,10 @@ void migrate_fd_put_ready(void *opaque) if (old_vm_running) { vm_start(); } +monitor_protocol_event(QEVENT_MIGRATION_FAILED, NULL); state = MIG_STATE_ERROR; } else { +monitor_protocol_event(QEVENT_MIGRATION_ENDED, NULL); state = MIG_STATE_COMPLETED; } migrate_fd_cleanup(s); @@ -427,6 +430,7 @@ void migrate_fd_cancel(MigrationState *mig_state) DPRINTF("cancelling migration\n"); +monitor_protocol_event(QEVENT_MIGRATION_CANCELED, NULL); s->state = MIG_STATE_CANCELLED; qemu_savevm_state_cancel(s->mon, s->file); @@ -440,6 +444,7 @@ void migrate_fd_release(MigrationState *mig_state) DPRINTF("releasing state\n"); if (s->state == MIG_STATE_ACTIVE) { +monitor_protocol_event(QEVENT_MIGRATION_CANCELED, NULL); s->state = MIG_STATE_CANCELLED; migrate_fd_cleanup(s); } -- 1.6.6.1
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
At Mon, 24 May 2010 14:16:32 -0500, Anthony Liguori wrote: > > On 05/24/2010 06:56 AM, Avi Kivity wrote: > > On 05/24/2010 02:42 PM, MORITA Kazutaka wrote: > >> > >>> The server would be local and talk over a unix domain socket, perhaps > >>> anonymous. > >>> > >>> nbd has other issues though, such as requiring a copy and no support > >>> for > >>> metadata operations such as snapshot and file size extension. > >>> > >> Sorry, my explanation was unclear. I'm not sure how running servers > >> on localhost can solve the problem. > > > > The local server can convert from the local (nbd) protocol to the > > remote (sheepdog, ceph) protocol. > > > >> What I wanted to say was that we cannot specify the image of VM. With > >> nbd protocol, command line arguments are as follows: > >> > >> $ qemu nbd:hostname:port > >> > >> As this syntax shows, with nbd protocol the client cannot pass the VM > >> image name to the server. > > > > We would extend it to allow it to connect to a unix domain socket: > > > > qemu nbd:unix:/path/to/socket > > nbd is a no-go because it only supports a single, synchronous I/O > operation at a time and has no mechanism for extensibility. > > If we go this route, I think two options are worth considering. The > first would be a purely socket based approach where we just accepted the > extra copy. > > The other potential approach would be shared memory based. We export > all guest ram as shared memory along with a small bounce buffer pool. > We would then use a ring queue (potentially even using virtio-blk) and > an eventfd for notification. > The shared memory approach assumes that there is a local server who can talk with the storage system. But Ceph doesn't require the local server, and Sheepdog would be extended to support VMs running outside the storage system. We could run a local daemon who can only work as proxy, but I don't think it looks a clean approach. So I think a socket based approach is the right way to go. BTW, is it required to design a common interface? The way Sheepdog replicates data is different from Ceph, so I think it is not possible to define a common protocol as Christian says. Regards, Kazutaka > > The server at the other end would associate the socket with a filename > > and forward it to the server using the remote protocol. > > > > However, I don't think nbd would be a good protocol. My preference > > would be for a plugin API, or for a new local protocol that uses > > splice() to avoid copies. > > I think a good shared memory implementation would be preferable to > plugins. I think it's worth attempting to do a plugin interface for the > block layer but I strongly suspect it would not be sufficient. > > I would not want to see plugins that interacted with BlockDriverState > directly, for instance. We change it far too often. Our main loop > functions are also not terribly stable so I'm not sure how we would > handle that (unless we forced all block plugins to be in a separate thread). >
[Qemu-devel] [PATCH 3/5] QMP: Introduce MIGRATION events
They are emitted when migration starts, ends, has a failure or is canceled. Signed-off-by: Juan Quintela --- QMP/qmp-events.txt | 50 ++ monitor.c | 12 monitor.h |4 3 files changed, 66 insertions(+), 0 deletions(-) diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt index 01ec85f..93caa4d 100644 --- a/QMP/qmp-events.txt +++ b/QMP/qmp-events.txt @@ -26,6 +26,56 @@ Example: Note: If action is "stop", a STOP event will eventually follow the BLOCK_IO_ERROR event. +MIGRATION_CANCELED +-- + +Emitted when migration is canceled. This is emitted in the source. +Target will emit MIGRATION_FAILED (no way to differentiate a FAILED +and CANCELED migration for target). + +Data: None + +Example: + +{ "event": "MIGRATION_CANCELED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_ENDED +--- + +Emitted when migration ends (both in source and target) + +Data: None + +Example: + +{ "event": "MIGRATION_ENDED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_FAILED + + +Emitted when migration fails (both is source and target). + +Data: None + +Example: + +{ "event": "MIGRATION_FAILED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + +MIGRATION_STARTED +- + +Emitted when migration starts (both in source and target). + +Data: None + +Example: + +{ "event": "MIGRATION_STARTED", +"timestamp": {"seconds": 1274687575, "microseconds": 592483} } + RESET - diff --git a/monitor.c b/monitor.c index ad50f12..5158780 100644 --- a/monitor.c +++ b/monitor.c @@ -444,6 +444,18 @@ void monitor_protocol_event(MonitorEvent event, QObject *data) case QEVENT_WATCHDOG: event_name = "WATCHDOG"; break; +case QEVENT_MIGRATION_STARTED: +event_name = "MIGRATION_STARTED"; +break; +case QEVENT_MIGRATION_ENDED: +event_name = "MIGRATION_ENDED"; +break; +case QEVENT_MIGRATION_FAILED: +event_name = "MIGRATION_FAILED"; +break; +case QEVENT_MIGRATION_CANCELED: +event_name = "MIGRATION_CANCELED"; +break; default: abort(); break; diff --git a/monitor.h b/monitor.h index ea15469..34bcd38 100644 --- a/monitor.h +++ b/monitor.h @@ -28,6 +28,10 @@ typedef enum MonitorEvent { QEVENT_BLOCK_IO_ERROR, QEVENT_RTC_CHANGE, QEVENT_WATCHDOG, +QEVENT_MIGRATION_STARTED, +QEVENT_MIGRATION_ENDED, +QEVENT_MIGRATION_FAILED, +QEVENT_MIGRATION_CANCELED, QEVENT_MAX, } MonitorEvent; -- 1.6.6.1
[Qemu-devel] [Bug 585113] Re: e1000 irq problems after live migration with qemu-kvm 0.12.4
Additional Info: 1) If I use rtl8139 instead of e1000 NIC driver. The VM freezes at 100% CPU after migration 2) Ubuntu Lucid LTS 64-bit Server is also affected and shows same symtomps -- e1000 irq problems after live migration with qemu-kvm 0.12.4 https://bugs.launchpad.net/bugs/585113 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: sorry for resubmitting. i accidently moved this bug to qemu-kvm at launchpad where it is stuck... After live migrating ubuntu 9.10 server (2.6.31-14-server) and suse linux 10.1 (2.6.16.13-4-smp) it happens sometimes that the guest runs into irq problems. i mention these 2 guest oss since i have seen the error there. there are likely others around with the same problem. on the host i run 2.6.33.3 (kernel+mod) and qemu-kvm 0.12.4. i started a vm with: /usr/bin/qemu-kvm-0.12.4 -net tap,vlan=141,script=no,downscript=no,ifname=tap0 -net nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72 -drive file=/dev/sdb,if=ide,boot=on,cache=none,aio=native -m 1024 -cpu qemu64,model_id='Intel(R) Xeon(R) CPU E5430 @ 2.66GHz' -monitor tcp:0:4001,server,nowait -vnc :1 -name 'migration-test-9-10' -boot order=dc,menu=on -k de -incoming tcp:172.21.55.22:5001 -pidfile /var/run/qemu/vm-155.pid -mem-path /hugepages -mem-prealloc -rtc base=utc,clock=host -usb -usbdevice tablet for testing i have a clean ubuntu 9.10 server 64-bit install and created a small script with fetches a dvd iso from a local server and checking md5sum in an endless loop. the download performance is approx. 50MB/s on that vm. to trigger the error i did several migrations of the vm throughout the last days. finally I ended up in the following oops in the guest: [64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option) [64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server #48-Ubuntu [64442.299179] Call Trace: [64442.299185][] __report_bad_irq+0x26/0xa0 [64442.299227] [] note_interrupt+0x18c/0x1d0 [64442.299232] [] handle_fasteoi_irq+0xd5/0x100 [64442.299244] [] handle_irq+0x1d/0x30 [64442.299246] [] do_IRQ+0x67/0xe0 [64442.299249] [] ret_from_intr+0x0/0x11 [64442.299266] [] ? handle_IRQ_event+0x24/0x160 [64442.299269] [] ? handle_edge_irq+0xcf/0x170 [64442.299271] [] ? handle_irq+0x1d/0x30 [64442.299273] [] ? do_IRQ+0x67/0xe0 [64442.299275] [] ? ret_from_intr+0x0/0x11 [64442.299290] [] ? _spin_unlock_irqrestore+0x14/0x20 [64442.299302] [] ? scsi_dispatch_cmd+0x16c/0x2d0 [64442.299307] [] ? scsi_request_fn+0x3aa/0x500 [64442.299322] [] ? __blk_run_queue+0x6c/0x150 [64442.299324] [] ? blk_run_queue+0x2b/0x50 [64442.299327] [] ? scsi_run_queue+0xcf/0x2a0 [64442.299336] [] ? scsi_next_command+0x3d/0x60 [64442.299338] [] ? scsi_end_request+0xab/0xb0 [64442.299340] [] ? scsi_io_completion+0x9e/0x4d0 [64442.299348] [] ? default_spin_lock_flags+0x9/0x10 [64442.299351] [] ? scsi_finish_command+0xbd/0x130 [64442.299353] [] ? scsi_softirq_done+0x145/0x170 [64442.299356] [] ? blk_done_softirq+0x7d/0x90 [64442.299368] [] ? __do_softirq+0xbd/0x200 [64442.299370] [] ? call_softirq+0x1c/0x30 [64442.299372] [] ? do_softirq+0x55/0x90 [64442.299374] [] ? irq_exit+0x85/0x90 [64442.299376] [] ? do_IRQ+0x70/0xe0 [64442.299379] [] ? ret_from_intr+0x0/0x11 [64442.299380][] ? native_safe_halt+0x6/0x10 [64442.299390] [] ? default_idle+0x4c/0xe0 [64442.299395] [] ? atomic_notifier_call_chain+0x15/0x20 [64442.299398] [] ? cpu_idle+0xb2/0x100 [64442.299406] [] ? rest_init+0x66/0x70 [64442.299424] [] ? start_kernel+0x352/0x35b [64442.299427] [] ? x86_64_start_reservations+0x125/0x129 [64442.299429] [] ? x86_64_start_kernel+0xfa/0x109 [64442.299433] handlers: [64442.299840] [] (e1000_intr+0x0/0x190 [e1000]) [64442.300046] Disabling IRQ #10 After this the guest is still allive, but download performance is down to approx. 500KB/s This error is definetly not triggerable with option -no-kvm-irqchip. I have seen this error occasionally since my first experiments with qemu-kvm-88 and also without hugetablefs. Help appreciated.
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Am 25.05.2010 15:25, schrieb Avi Kivity: > On 05/25/2010 04:17 PM, Anthony Liguori wrote: >> On 05/25/2010 04:14 AM, Avi Kivity wrote: >>> On 05/24/2010 10:38 PM, Anthony Liguori wrote: > - Building a plugin API seems a bit simpler to me, although I'm to > sure if I'd get the >idea correctly: >The block layer has already some kind of api (.bdrv_file_open, > .bdrv_read). We >could simply compile the block-drivers as shared objects and > create a method >for loading the necessary modules at runtime. That approach would be a recipe for disaster. We would have to introduce a new, reduced functionality block API that was supported for plugins. Otherwise, the only way a plugin could keep up with our API changes would be if it was in tree which defeats the purpose of having plugins. >>> >>> We could guarantee API/ABI stability in a stable branch but not >>> across releases. >> >> We have releases every six months. There would be tons of block >> plugins that didn't work for random sets of releases. That creates a >> lot of user confusion and unhappiness. > > The current situation is that those block format drivers only exist in > qemu.git or as patches. Surely that's even more unhappiness. The difference is that in the current situation these drivers will be part of the next qemu release, so the patch may be obsolete, but you don't even need it any more. If you start keeping block drivers outside qemu and not even try integrating them, they'll stay external. > Confusion could be mitigated: > >$ qemu -module my-fancy-block-format-driver.so >my-fancy-block-format-driver.so does not support this version of qemu > (0.19.2). Please contact my-fancy-block-format-driver-de...@example.org. > > The question is how many such block format drivers we expect. We now > have two in the pipeline (ceph, sheepdog), it's reasonable to assume > we'll want an lvm2 driver and btrfs driver. This is an area with a lot > of activity and a relatively simply interface. What's the reason for not having these drivers upstream? Do we gain anything by hiding them from our users and requiring them to install the drivers separately from somewhere else? Kevin
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:54 PM, Anthony Liguori wrote: On 05/25/2010 08:36 AM, Avi Kivity wrote: We'd need a kernel-level generic snapshot API for this eventually. or (2) implement BUSE to complement FUSE and CUSE to enable proper userspace block devices. Likely slow due do lots of copying. Also needs a snapshot API. The kernel could use splice. Still can't make guest memory appear in (A)BUSE process memory without either mmu tricks (vmsplice in reverse) or a copy. May be workable for an (A)BUSE driver that talks over a network, and thus can splice() its way out. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 4/5] QMP: Emit migration events on incoming migration
Signed-off-by: Juan Quintela --- migration.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/migration.c b/migration.c index c388902..32470d5 100644 --- a/migration.c +++ b/migration.c @@ -60,10 +60,13 @@ int qemu_start_incoming_migration(const char *uri) void process_incoming_migration(QEMUFile *f) { +monitor_protocol_event(QEVENT_MIGRATION_STARTED, NULL); if (qemu_loadvm_state(f) < 0) { +monitor_protocol_event(QEVENT_MIGRATION_FAILED, NULL); fprintf(stderr, "load of migration failed\n"); exit(0); } +monitor_protocol_event(QEVENT_MIGRATION_ENDED, NULL); qemu_announce_self(); DPRINTF("successfully loaded vm state\n"); -- 1.6.6.1
[Qemu-devel] [PATCH 1/5] Exit if incoming migration fails
Signed-off-by: Juan Quintela --- migration.c | 16 ++-- migration.h |2 +- vl.c|7 ++- 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/migration.c b/migration.c index 05f6cc5..9c1d4b6 100644 --- a/migration.c +++ b/migration.c @@ -36,22 +36,26 @@ static uint32_t max_throttle = (32 << 20); static MigrationState *current_migration; -void qemu_start_incoming_migration(const char *uri) +int qemu_start_incoming_migration(const char *uri) { const char *p; +int ret; if (strstart(uri, "tcp:", &p)) -tcp_start_incoming_migration(p); +ret = tcp_start_incoming_migration(p); #if !defined(WIN32) else if (strstart(uri, "exec:", &p)) -exec_start_incoming_migration(p); +ret = exec_start_incoming_migration(p); else if (strstart(uri, "unix:", &p)) -unix_start_incoming_migration(p); +ret = unix_start_incoming_migration(p); else if (strstart(uri, "fd:", &p)) -fd_start_incoming_migration(p); +ret = fd_start_incoming_migration(p); #endif -else +else { fprintf(stderr, "unknown migration protocol: %s\n", uri); +ret = -EPROTONOSUPPORT; +} +return ret; } int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data) diff --git a/migration.h b/migration.h index 385423f..dd423a1 100644 --- a/migration.h +++ b/migration.h @@ -50,7 +50,7 @@ struct FdMigrationState void *opaque; }; -void qemu_start_incoming_migration(const char *uri); +int qemu_start_incoming_migration(const char *uri); int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data); diff --git a/vl.c b/vl.c index 328395e..d13440d 100644 --- a/vl.c +++ b/vl.c @@ -3823,7 +3823,12 @@ int main(int argc, char **argv, char **envp) } if (incoming) { -qemu_start_incoming_migration(incoming); +int ret = qemu_start_incoming_migration(incoming); +if (ret < 0) { +fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n", +incoming, ret); +exit(ret); +} } else if (autostart) { vm_start(); } -- 1.6.6.1
[Qemu-devel] [PATCH v2 0/5] Add QMP migration events
v2: - Address pbonzini and mst changes (error messages and doc fixes) v1: This series does: - exit incoming migration on failure. For exec/fd migrations, once there was a failure, there was nothing useful to do. And for tcp migration, not exiting created interesting bugs when trying to migrate again to a process with a faild migration. - Factorize common migration code, no more duplication, makes easier to do "global" migration things, like QMP events. - Introduce QMP events, both for incoming and outgoing migration. Now, the million dollar question: Why I didn't refactorize outgoing migration? I tried, and have it partially done on my local tree. But it depends (too much) of current_migration global variable -> Libvirt folks will also want "info migrate" to work on the incoming side, i.e. current_migraition has to also be updated on incoming side. Done until here, but then I hit the wall "incoming migration is synchronous". To make the monitor work on incoming migration, we need to change buffered_file.c abstraction to also work for incoming fd's, or another similar solution. I am open to suggestions about what to do here. This series are quite simple (the unfinished part is more complex), will send the other part as an RFC later. Please review and consider to apply it. Later, Juan. Juan Quintela (5): Exit if incoming migration fails Factorize common migration incoming code QMP: Introduce MIGRATION events QMP: Emit migration events on incoming migration QMP: Emit migration events on outgoing migration QMP/qmp-events.txt | 50 ++ migration-exec.c | 17 +++-- migration-fd.c | 15 ++- migration-tcp.c| 17 - migration-unix.c | 17 - migration.c| 37 +++-- migration.h|4 +++- monitor.c | 12 monitor.h |4 vl.c |7 ++- 10 files changed, 119 insertions(+), 61 deletions(-)
[Qemu-devel] [PATCH 2/5] Factorize common migration incoming code
Signed-off-by: Juan Quintela --- migration-exec.c | 14 +- migration-fd.c | 14 +- migration-tcp.c | 15 ++- migration-unix.c | 15 ++- migration.c | 13 + migration.h |2 ++ 6 files changed, 21 insertions(+), 52 deletions(-) diff --git a/migration-exec.c b/migration-exec.c index 5435827..07af11a 100644 --- a/migration-exec.c +++ b/migration-exec.c @@ -111,20 +111,8 @@ err_after_alloc: static void exec_accept_incoming_migration(void *opaque) { QEMUFile *f = opaque; -int ret; -ret = qemu_loadvm_state(f); -if (ret < 0) { -fprintf(stderr, "load of migration failed\n"); -goto err; -} -qemu_announce_self(); -DPRINTF("successfully loaded vm state\n"); - -if (autostart) -vm_start(); - -err: +process_incoming_migration(f); qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL); qemu_fclose(f); } diff --git a/migration-fd.c b/migration-fd.c index 0abd372..6d14505 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -104,20 +104,8 @@ err_after_alloc: static void fd_accept_incoming_migration(void *opaque) { QEMUFile *f = opaque; -int ret; -ret = qemu_loadvm_state(f); -if (ret < 0) { -fprintf(stderr, "load of migration failed\n"); -goto err; -} -qemu_announce_self(); -DPRINTF("successfully loaded vm state\n"); - -if (autostart) -vm_start(); - -err: +process_incoming_migration(f); qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL); qemu_fclose(f); } diff --git a/migration-tcp.c b/migration-tcp.c index 95ce722..20f2e37 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -143,7 +143,7 @@ static void tcp_accept_incoming_migration(void *opaque) socklen_t addrlen = sizeof(addr); int s = (unsigned long)opaque; QEMUFile *f; -int c, ret; +int c; do { c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen); @@ -162,18 +162,7 @@ static void tcp_accept_incoming_migration(void *opaque) goto out; } -ret = qemu_loadvm_state(f); -if (ret < 0) { -fprintf(stderr, "load of migration failed\n"); -goto out_fopen; -} -qemu_announce_self(); -DPRINTF("successfully loaded vm state\n"); - -if (autostart) -vm_start(); - -out_fopen: +process_incoming_migration(f); qemu_fclose(f); out: qemu_set_fd_handler2(s, NULL, NULL, NULL, NULL); diff --git a/migration-unix.c b/migration-unix.c index 49de1b9..57232c0 100644 --- a/migration-unix.c +++ b/migration-unix.c @@ -149,7 +149,7 @@ static void unix_accept_incoming_migration(void *opaque) socklen_t addrlen = sizeof(addr); int s = (unsigned long)opaque; QEMUFile *f; -int c, ret; +int c; do { c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen); @@ -168,18 +168,7 @@ static void unix_accept_incoming_migration(void *opaque) goto out; } -ret = qemu_loadvm_state(f); -if (ret < 0) { -fprintf(stderr, "load of migration failed\n"); -goto out_fopen; -} -qemu_announce_self(); -DPRINTF("successfully loaded vm state\n"); - -if (autostart) -vm_start(); - -out_fopen: +process_incoming_migration(f); qemu_fclose(f); out: qemu_set_fd_handler2(s, NULL, NULL, NULL, NULL); diff --git a/migration.c b/migration.c index 9c1d4b6..c388902 100644 --- a/migration.c +++ b/migration.c @@ -58,6 +58,19 @@ int qemu_start_incoming_migration(const char *uri) return ret; } +void process_incoming_migration(QEMUFile *f) +{ +if (qemu_loadvm_state(f) < 0) { +fprintf(stderr, "load of migration failed\n"); +exit(0); +} +qemu_announce_self(); +DPRINTF("successfully loaded vm state\n"); + +if (autostart) +vm_start(); +} + int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data) { MigrationState *s = NULL; diff --git a/migration.h b/migration.h index dd423a1..017e9c3 100644 --- a/migration.h +++ b/migration.h @@ -50,6 +50,8 @@ struct FdMigrationState void *opaque; }; +void process_incoming_migration(QEMUFile *f); + int qemu_start_incoming_migration(const char *uri); int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data); -- 1.6.6.1
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:25 AM, Avi Kivity wrote: On 05/25/2010 04:17 PM, Anthony Liguori wrote: On 05/25/2010 04:14 AM, Avi Kivity wrote: On 05/24/2010 10:38 PM, Anthony Liguori wrote: - Building a plugin API seems a bit simpler to me, although I'm to sure if I'd get the idea correctly: The block layer has already some kind of api (.bdrv_file_open, .bdrv_read). We could simply compile the block-drivers as shared objects and create a method for loading the necessary modules at runtime. That approach would be a recipe for disaster. We would have to introduce a new, reduced functionality block API that was supported for plugins. Otherwise, the only way a plugin could keep up with our API changes would be if it was in tree which defeats the purpose of having plugins. We could guarantee API/ABI stability in a stable branch but not across releases. We have releases every six months. There would be tons of block plugins that didn't work for random sets of releases. That creates a lot of user confusion and unhappiness. The current situation is that those block format drivers only exist in qemu.git or as patches. Surely that's even more unhappiness. Confusion could be mitigated: $ qemu -module my-fancy-block-format-driver.so my-fancy-block-format-driver.so does not support this version of qemu (0.19.2). Please contact my-fancy-block-format-driver-de...@example.org. The question is how many such block format drivers we expect. We now have two in the pipeline (ceph, sheepdog), it's reasonable to assume we'll want an lvm2 driver and btrfs driver. This is an area with a lot of activity and a relatively simply interface. If we expose a simple interface, I'm all for it. But BlockDriver is not simple and things like the snapshoting API need love. Of course, there's certainly a question of why we're solving this in qemu at all. Wouldn't it be more appropriate to either (1) implement a kernel module for ceph/sheepdog if performance matters or (2) implement BUSE to complement FUSE and CUSE to enable proper userspace block devices. If you want to use a block device within qemu, you almost certainly want to be able to manipulate it on the host using standard tools (like mount and parted) so it stands to reason that addressing this in the kernel makes more sense. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 09:01 AM, Avi Kivity wrote: On 05/25/2010 04:55 PM, Anthony Liguori wrote: On 05/25/2010 08:38 AM, Avi Kivity wrote: On 05/25/2010 04:35 PM, Anthony Liguori wrote: On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? fork() doesn't work in the with of memory hotplug. What else is there? Is it that fork() doesn't work or is it that fork() is very expensive? It doesn't work, fork() is done at block device creation time, which freezes the child memory map, while guest memory is allocated at hotplug time. Now I'm confused. I thought you were saying shared memory somehow affects fork(). If you're talking about shared memory inheritance via fork(), that's less important. You can also pass /dev/shm fd's via SCM_RIGHTs to establish shared memory segments dynamically. Regards, Anthony Liguori fork() actually isn't very expensive since we use MADV_DONTFORK (probably fast enough for everything except realtime). It may be possible to do a processfd() which can be mmap()ed by another process to export anonymous memory using mmu notifiers, not sure how easy or mergeable that is.
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:38 AM, Avi Kivity wrote: On 05/25/2010 04:35 PM, Anthony Liguori wrote: On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? fork() doesn't work in the with of memory hotplug. What else is there? Is it that fork() doesn't work or is it that fork() is very expensive? Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:57 AM, Avi Kivity wrote: On 05/25/2010 04:54 PM, Anthony Liguori wrote: On 05/25/2010 08:36 AM, Avi Kivity wrote: We'd need a kernel-level generic snapshot API for this eventually. or (2) implement BUSE to complement FUSE and CUSE to enable proper userspace block devices. Likely slow due do lots of copying. Also needs a snapshot API. The kernel could use splice. Still can't make guest memory appear in (A)BUSE process memory without either mmu tricks (vmsplice in reverse) or a copy. May be workable for an (A)BUSE driver that talks over a network, and thus can splice() its way out. splice() actually takes offset parameter so it may be possible to treat that offset parameter as a file offset. That would essentially allow you to implement a splice() based thread pool where splice() replaces preadv/pwritev. It's not quite linux-aio, but it should take you pretty far. I think the main point is that the problem of allowing block plugins to qemu is the same as block plugins for the kernel. The kernel doesn't provide a stable interface (and we probably can't for the same reasons) and it's generally discourage from a code quality perspective. That said, making an external program work well as a block backend is identical to making userspace block devices fast. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization
On 05/25/2010 04:31 PM, Anthony Liguori wrote: On 05/25/2010 08:19 AM, Avi Kivity wrote: On 05/25/2010 04:03 PM, Anthony Liguori wrote: I don't think that qdev device names and paths are something we have to worry much about changing over time since they reflect logical bus layout. They should remain static provided the devices remain static. Modulo mistakes. We already saw one (lack of pci domains). To reduce the possibility of mistakes, we need reviewable documentation. pci domains was only a mistake as a nice-to-have. We can add pci domains in a backwards compatible way. It adds a new level to the qdev tree. The tree is not organized like that today. IOW, the PCI hierarchy is not reflected in the qdev hierarchy. All PCI devices (regardless of whether they're a function or a full slot) simply sit below the PCI bus. That's a bug IMO, but regardless, s/qdev tree/pci device component of the qdev path/. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:55 PM, Anthony Liguori wrote: On 05/25/2010 08:38 AM, Avi Kivity wrote: On 05/25/2010 04:35 PM, Anthony Liguori wrote: On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? fork() doesn't work in the with of memory hotplug. What else is there? Is it that fork() doesn't work or is it that fork() is very expensive? It doesn't work, fork() is done at block device creation time, which freezes the child memory map, while guest memory is allocated at hotplug time. fork() actually isn't very expensive since we use MADV_DONTFORK (probably fast enough for everything except realtime). It may be possible to do a processfd() which can be mmap()ed by another process to export anonymous memory using mmu notifiers, not sure how easy or mergeable that is. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations
On Tue, May 25, 2010 at 2:52 PM, Avi Kivity wrote: > Hm. Perhaps we can convert %{type} to %p for backends which don't support > it, and to whatever format they do support for those that do. True. Stefan
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Am 25.05.2010 15:55, schrieb Avi Kivity: > On 05/25/2010 04:53 PM, Kevin Wolf wrote: >> >> I'm still not convinced that we need either. I share Christoph's concern >> that we would make our life harder for almost no gain. It's probably a >> very small group of users (if it exists at all) that wants to add new >> block drivers themselves, but at the same time can't run upstream qemu. >> >> > > The first part of your argument may be true, but the second isn't. No > user can run upstream qemu.git. It's not tested or supported, and has > no backwards compatibility guarantees. The second part was basically meant to say "developers don't count here". Kevin
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:36 AM, Avi Kivity wrote: We'd need a kernel-level generic snapshot API for this eventually. or (2) implement BUSE to complement FUSE and CUSE to enable proper userspace block devices. Likely slow due do lots of copying. Also needs a snapshot API. The kernel could use splice. (ABUSE was proposed a while ago by Zach). If you want to use a block device within qemu, you almost certainly want to be able to manipulate it on the host using standard tools (like mount and parted) so it stands to reason that addressing this in the kernel makes more sense. qemu-nbd also allows this. This reasoning also applies to qcow2, btw. I know. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:55 AM, Avi Kivity wrote: On 05/25/2010 04:53 PM, Kevin Wolf wrote: I'm still not convinced that we need either. I share Christoph's concern that we would make our life harder for almost no gain. It's probably a very small group of users (if it exists at all) that wants to add new block drivers themselves, but at the same time can't run upstream qemu. The first part of your argument may be true, but the second isn't. No user can run upstream qemu.git. It's not tested or supported, and has no backwards compatibility guarantees. Yes, it does have backwards compatibility guarantees. Regards, Anthony Liguori
[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations
On 05/25/2010 04:27 PM, Stefan Hajnoczi wrote: On Tue, May 25, 2010 at 1:04 PM, Avi Kivity wrote: Those %ps are more or less useless. We need better ways of identifying them. You're right, the vq pointer is useless in isolation. We don't know which virtio device or which virtqueue number. With the full context of a trace it would be possible to correlate the vq pointer if we had trace events for vdev and vq setup. Adding custom formatters is could be tricky since the format string is passed only to tracing backends that use it, like UST. And UST uses its own sprintf implementation which we don't have direct control over. Hm. Perhaps we can convert %{type} to %p for backends which don't support it, and to whatever format they do support for those that do. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [PATCH 7/7] trace: Trace virtqueue operations
On Tue, May 25, 2010 at 1:04 PM, Avi Kivity wrote: > Those %ps are more or less useless. We need better ways of identifying > them. You're right, the vq pointer is useless in isolation. We don't know which virtio device or which virtqueue number. With the full context of a trace it would be possible to correlate the vq pointer if we had trace events for vdev and vq setup. Adding custom formatters is could be tricky since the format string is passed only to tracing backends that use it, like UST. And UST uses its own sprintf implementation which we don't have direct control over. I think we just need to guarantee that any pointer can be correlated with previous trace entries that give context for that pointer. Stefan
[Qemu-devel] Re: [PATCH] add support for protocol driver create_options
Am 24.05.2010 08:34, schrieb MORITA Kazutaka: > At Fri, 21 May 2010 18:57:36 +0200, > Kevin Wolf wrote: >> >> Am 20.05.2010 07:36, schrieb MORITA Kazutaka: >>> + >>> +/* >>> + * Append an option list (list) to an option list (dest). >>> + * >>> + * If dest is NULL, a new copy of list is created. >>> + * >>> + * Returns a pointer to the first element of dest (or the newly allocated >>> copy) >>> + */ >>> +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest, >>> +QEMUOptionParameter *list) >>> +{ >>> +size_t num_options, num_dest_options; >>> + >>> +num_options = count_option_parameters(dest); >>> +num_dest_options = num_options; >>> + >>> +num_options += count_option_parameters(list); >>> + >>> +dest = qemu_realloc(dest, (num_options + 1) * >>> sizeof(QEMUOptionParameter)); >>> + >>> +while (list && list->name) { >>> +if (get_option_parameter(dest, list->name) == NULL) { >>> +dest[num_dest_options++] = *list; >> >> You need to add a dest[num_dest_options].name = NULL; here. Otherwise >> the next loop iteration works on uninitialized memory and possibly an >> unterminated list. I got a segfault for that reason. >> > > I forgot to add it, sorry. > Fixed version is below. > > Thanks, > > Kazutaka > > == > This patch enables protocol drivers to use their create options which > are not supported by the format. For example, protcol drivers can use > a backing_file option with raw format. > > Signed-off-by: MORITA Kazutaka $ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G Unknown option 'cluster_size' qemu-img: Invalid options for file format 'qcow2'. I think you added another num_dest_options++ which shouldn't be there. Kevin
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:53 PM, Kevin Wolf wrote: I'm still not convinced that we need either. I share Christoph's concern that we would make our life harder for almost no gain. It's probably a very small group of users (if it exists at all) that wants to add new block drivers themselves, but at the same time can't run upstream qemu. The first part of your argument may be true, but the second isn't. No user can run upstream qemu.git. It's not tested or supported, and has no backwards compatibility guarantees. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH v2 12/15] monitor: Add basic device state visualization
On 05/25/2010 08:19 AM, Avi Kivity wrote: On 05/25/2010 04:03 PM, Anthony Liguori wrote: I don't think that qdev device names and paths are something we have to worry much about changing over time since they reflect logical bus layout. They should remain static provided the devices remain static. Modulo mistakes. We already saw one (lack of pci domains). To reduce the possibility of mistakes, we need reviewable documentation. pci domains was only a mistake as a nice-to-have. We can add pci domains in a backwards compatible way. It adds a new level to the qdev tree. The tree is not organized like that today. IOW, the PCI hierarchy is not reflected in the qdev hierarchy. All PCI devices (regardless of whether they're a function or a full slot) simply sit below the PCI bus. The arguments you're making about the importance of backwards compatibility and what's needed to strongly guarantee it are equally applicable to the live migration protocol. We really do need to formally document the live migration protocol in such a way that it's reviewable if we hope to truly make it compatible across versions. Mostly agreed. I think live migration has a faster/easier deprecation schedule (easier not to support migration from 0.n-k to 0.n than to remove qmp support for a feature introduced in 0.n-k when releasing 0.n). But that's a minor concern, improving our externally visible interface documentation is a good thing and badly needed. Regards, Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Am 25.05.2010 15:25, schrieb Anthony Liguori: > On 05/25/2010 06:25 AM, Avi Kivity wrote: >> On 05/25/2010 02:02 PM, Kevin Wolf wrote: >>> > So could we not standardize a protocol for this that both sheepdog and > ceph could implement? The protocol already exists, nbd. It doesn't support snapshotting etc. but we could extend it. But IMO what's needed is a plugin API for the block layer. >>> What would it buy us, apart from more downstreams and having to maintain >>> a stable API and ABI? >> >> Currently if someone wants to add a new block format, they have to >> upstream it and wait for a new qemu to be released. With a plugin >> API, they can add a new block format to an existing, supported qemu. > > Whether we have a plugin or protocol based mechanism to implement block > formats really ends up being just an implementation detail. > > In order to implement either, we need to take a subset of block > functionality that we feel we can support long term and expose that. > Right now, that's basically just querying characteristics (like size and > geometry) and asynchronous reads and writes. > > A protocol based mechanism has the advantage of being more robust in the > face of poorly written block backends so if it's possible to make it > perform as well as a plugin, it's a preferable approach. > > Plugins that just expose chunks of QEMU internal state directly (like > BlockDriver) are a really bad idea IMHO. I'm still not convinced that we need either. I share Christoph's concern that we would make our life harder for almost no gain. It's probably a very small group of users (if it exists at all) that wants to add new block drivers themselves, but at the same time can't run upstream qemu. But if we were to decide that there's no way around it, I agree with you that directly exposing the internal API isn't going to work. Kevin
Re: [Qemu-devel] [PATCH] resent: x86/cpuid: propagate further CPUID leafs when -cpu host
On 05/25/2010 04:26 PM, Anthony Liguori wrote: On 05/25/2010 08:21 AM, Andre Przywara wrote: What's the benefit of exposing this information to the guest? That is mostly to propagate the cache size and organization parameters to the guest: >> +/* safe CPUID leafs to propagate to guest if -cpu host is specified >> + * Intel defined leafs: >> + * Cache descriptors (0x02) >> + * Deterministic cache parameters (0x04) >> + * Monitor/MWAIT parameters (0x05) >> + * >> + * AMD defined leafs: >> + * L1 Cache and TLB (0x05) >> + * L2+L3 TLB (0x06) >> + * LongMode address size (0x08) >> + * 1GB page TLB (0x19) >> + * Performance optimization (0x1A) >> + */ Since at least L1 and L2 caches are mostly private to vCPUs, I see no reason to disguise them. But in practice, what is it useful for? See my other mail. Just because we can expose it doesn't mean we should. What's the point of -cpu host then? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:25 PM, Anthony Liguori wrote: Currently if someone wants to add a new block format, they have to upstream it and wait for a new qemu to be released. With a plugin API, they can add a new block format to an existing, supported qemu. Whether we have a plugin or protocol based mechanism to implement block formats really ends up being just an implementation detail. True. In order to implement either, we need to take a subset of block functionality that we feel we can support long term and expose that. Right now, that's basically just querying characteristics (like size and geometry) and asynchronous reads and writes. Unfortunately, you're right. A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. Plugins that just expose chunks of QEMU internal state directly (like BlockDriver) are a really bad idea IMHO. Also, we don't want to expose all of the qemu API. We should default the visibility attribute to "hidden" and expose only select functions, perhaps under their own interface. And no inlines. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 04:35 PM, Anthony Liguori wrote: On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? fork() doesn't work in the with of memory hotplug. What else is there? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
On 05/25/2010 08:31 AM, Avi Kivity wrote: A protocol based mechanism has the advantage of being more robust in the face of poorly written block backends so if it's possible to make it perform as well as a plugin, it's a preferable approach. May be hard due to difficulty of exposing guest memory. If someone did a series to add plugins, I would expect a very strong argument as to why a shared memory mechanism was not possible or at least plausible. I'm not sure I understand why shared memory is such a bad thing wrt KVM. Can you elaborate? Is it simply a matter of fork()? Plugins that just expose chunks of QEMU internal state directly (like BlockDriver) are a really bad idea IMHO. Also, we don't want to expose all of the qemu API. We should default the visibility attribute to "hidden" and expose only select functions, perhaps under their own interface. And no inlines. Yeah, if we did plugins, this would be a key requirement. Regards, Anthony Liguori