Re: [Qemu-devel] QEMU Command Line Options
hi, Qasim: pls refer to 'qemu-options.hx' and the big switch in 'vl.c'. Thanks, Jules. At 2014-03-28 14:25:43,"Qasim Maqbool" wrote: Hi, I need to add a few command line options to QEMU. However, I am yet to determine how QEMU takes input from the command line and parses the option values. I have tried looking at various files including vl.c and cmd.c but nothing seems to work right now. Can anyone put me on the right path here? Or am I trying to do something which is impossible? Thanks.
Re: [Qemu-devel] [PATCH v2 2/4] Curling: cmdline interface.
At 2013-10-01 06:16:34,"Eric Blake" wrote: >On 09/29/2013 02:14 PM, Jules Wang wrote: >> Add an option '-f' to migration cmdline. >> Indicating whether to enable fault tolerant or not. >> >> Signed-off-by: Jules Wang >> --- >> .help = "migrate to URI (using -d to not wait for completion)" >> "\n\t\t\t -b for migration without shared storage with" >> " full copy of disk\n\t\t\t -i for migration without " >> "shared storage with incremental copy of disk " >> - "(base image shared between src and destination)", >> + "(base image shared between src and destination)" >> + "\n\t\t\t -f for fault tolerant, this is another " >> + "feature rather than migrate", > >That sounds awkward, and overly long. Maybe go with just: > >-f for fault tolerance mode > >and let the user then read the full documentation for what it entails. Agree. >> -@item migrate [-d] [-b] [-i] @var{uri} >> +@item migrate [-d] [-b] [-i] [-f] @var{uri} >> @findex migrate >> Migrate to @var{uri} (using -d to not wait for completion). >> -b for migration with full copy of disk >> -i for migration with incremental copy of disk (base image is shared) >> +-f for fault tolerant > >Can -d and -f be used at the same time, or are they exclusive? AFAK, The migration is always detached(In the code, the -d option is always false), -d and -f can be used at the same time with no doubt. qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, !!ft, ft, &err); By the way, neither -b nor -i could be used at the same time with -f, fault tolerant needs shared storage. >> +++ b/hmp.c >> @@ -1213,10 +1213,11 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) >> int detach = qdict_get_try_bool(qdict, "detach", 0); >> int blk = qdict_get_try_bool(qdict, "blk", 0); >> int inc = qdict_get_try_bool(qdict, "inc", 0); >> + int ft = qdict_get_try_bool(qdict, "ft", 0); > >Why two spaces? To align the '=', I will remove them if you like. > >> +++ b/qapi-schema.json >> @@ -2420,7 +2420,8 @@ >> # Since: 0.14.0 >> ## >> { 'command': 'migrate', >> - 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' >> } } >> + 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool', >> + '*ft': 'bool' } } > >Missing documentation, including mention that the new option was only >made available in 1.7. We still don't have introspection; is there some >other means by which libvirt and other management apps can tell whether >this feature is available? I'm not clear about how to do that, could you pls give me some hints, where to add code and documentation. >Furthermore, 'ft' is an awfully short name; >for QMP, we prefer to use full words where possible, such as >'fault-tolerant'. Agree. >-- >Eric Blake eblake redhat com+1-919-301-3266 >Libvirt virtualization library http://libvirt.org >
Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
hi, >At the moment in your implementation the prefetch buffer can be very large >(several copies of guest memory size) >are you planning to address this >issue? >I agree but we need some way to notify the user of such problem. This issue has been handled (maybe not in the best way). The prefetch buffer size could be increased up to 1.5 * vm memory size. When the migration data size is larger than it, the prefetching is stopped with a warning (pls refer to the code 4/4) and the loading is started. In this situation, broken-in-the-middle problem is inevitable. >>>The second is that sadly live migration doesn't always converge this means that the backup VM won't have a consist state to failover to. >You need to >>>detect such a case and throttle down the guest to force convergence. > >> Yes, that's a problem. AFAK, qemu already have an auto convergence feature. > How about activating it when you do fault tolerance automatically? That is feasible. Any comments by others?
Re: [Qemu-devel] [PATCH RFC 4/4] Curling: the receiver
hi, At 2013-09-10 22:19:48,"Juan Quintela" wrote: >> @@ -112,13 +113,24 @@ static void process_incoming_migration_co(void *opaque) >> { >> QEMUFile *f = opaque; >> int ret; >> +int count = 0; >> >> -ret = qemu_loadvm_state(f); >> -qemu_fclose(f); >> -if (ret < 0) { >> -fprintf(stderr, "load of migration failed\n"); >> -exit(EXIT_FAILURE); >> +if (ft_enabled()) { >> +while (qemu_loadvm_state_ft(f) >= 0) { >> +count++; >> +DPRINTF("incoming count %d\r", count); >> +} >> +qemu_fclose(f); >> +fprintf(stderr, "ft connection lost, launching self..\n"); > >Obviously, here we are needing something more that an fprintf,, right? > >We are not checking either if it is one error. Agree. >> +} else { >> +ret = qemu_loadvm_state(f); >> +qemu_fclose(f); >> +if (ret < 0) { >> +fprintf(stderr, "load of migration failed\n"); >> +exit(EXIT_FAILURE); >> +} >> } >> +cpu_synchronize_all_post_init(); >> qemu_announce_self(); >> DPRINTF("successfully loaded vm state\n"); >> >> diff --git a/savevm.c b/savevm.c >> index 6daf690..d5bf153 100644 >> --- a/savevm.c >> +++ b/savevm.c >> @@ -52,6 +52,8 @@ >> #define ARP_PTYPE_IP 0x0800 >> #define ARP_OP_REQUEST_REV 0x3 >> >> +#define PFB_SIZE 0x01 >> + >> static int announce_self_create(uint8_t *buf, >> uint8_t *mac_addr) >> { >> @@ -135,6 +137,10 @@ struct QEMUFile { >> unsigned int iovcnt; >> >> int last_error; >> + >> +uint8_t *pfb; /* pfb -> PerFetch Buffer */ > >s/PreFetch/Prefetcth/ > >prefetch_buffer as name? not used in so many places, makes things >clearer or more convoluted? Other comments? > Agree. >> +static int socket_get_prefetch_buffer(void *opaque, uint8_t *buf, >> + int64_t pos, int size) >> +{ >> +QEMUFile *f = opaque; >> + >> +if (f->pfb_size - pos <= 0) { >> +return 0; >> +} >> + >> +if (f->pfb_size - pos < size) { >> +size = f->pfb_size - pos; >> +} >> + >> +memcpy(buf, f->pfb+pos, size); >> + >> +return size; >> +} >> + >> + >> static int socket_close(void *opaque) >> { >> QEMUFileSocket *s = opaque; >> @@ -440,6 +465,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode) >> static const QEMUFileOps socket_read_ops = { >> .get_fd = socket_get_fd, >> .get_buffer = socket_get_buffer, >> +.get_prefetch_buffer = socket_get_prefetch_buffer, >> .close = socket_close >> }; >> > >> if (f->last_error) { >> ret = f->last_error; >> } >> + >> +if (f->pfb) { >> +g_free(f->pfb); > >g_free(f->pfb); >It already checks for NULL. Got it. >> +} >> + >> g_free(f); >> return ret; >> } >> @@ -822,6 +853,14 @@ void qemu_put_byte(QEMUFile *f, int v) >> >> static void qemu_file_skip(QEMUFile *f, int size) >> { >> +if (f->pfb_index + size <= f->pfb_size) { >> +f->pfb_index += size; >> +return; >> +} else { >> +size -= f->pfb_size - f->pfb_index; >> +f->pfb_index = f->pfb_size; >> +} >> + >> if (f->buf_index + size <= f->buf_size) { >> f->buf_index += size; >> } >> @@ -831,6 +870,21 @@ static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, >> int size, size_t offset) >> { >> int pending; >> int index; >> +int done; >> + >> +if (f->ops->get_prefetch_buffer) { >> +if (f->pfb_index + offset < f->pfb_size) { >> +done = f->ops->get_prefetch_buffer(f, buf, f->pfb_index + >> offset, >> + size); >> +if (done == size) { >> +return size; >> +} >> +size -= done; >> +buf += done; >> +} else { >> +offset -= f->pfb_size - f->pfb_index; >> +} >> +} >> >> assert(!qemu_file_is_writable(f)); >> >> @@ -875,7 +929,15 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size) >> >> static int qemu_peek_byte(QEMUFile *f, int offset) >> { >> -int index = f->buf_index + offset; >> +int index; >> + >> +if (f->pfb_index + offset < f->pfb_size) { >> +return f->pfb[f->pfb_index + offset]; >> +} else { >> +offset -= f->pfb_size - f->pfb_index; >> +} >> + >> +index = f->buf_index + offset; >> >> assert(!qemu_file_is_writable(f)); >> >> @@ -1851,7 +1913,7 @@ void qemu_savevm_state_begin(QEMUFile *f, >> } >> se->ops->set_params(params, se->opaque); >> } >> - >> + >> qemu_put_be32(f, QEMU_VM_FILE_MAGIC); >> qemu_put_be32(f, QEMU_VM_FILE_VERSION); >> >> @@ -2294,8 +2356,6 @@ int qemu_loadvm_state(QEMUFile *f) >> } >> } >> >> -cpu_synchronize_all_post_init(); >> - >> ret = 0; >> >> out: >> @@ -2311,6 +2371,89 @@ out: >> return ret; >>
Re: [Qemu-devel] [PATCH RFC 3/4] Curling: the sender
hi, >> +bool create = false; > >This variable is never set. It is set in the following 'if' block. +create = true;<<=== >> -migration_bitmap = bitmap_new(ram_pages); >> -bitmap_set(migration_bitmap, 0, ram_pages); >> -migration_dirty_pages = ram_pages; >> +if (!ft_enabled() || !migration_bitmap) { >> +migration_bitmap = bitmap_new(ram_pages); >> +bitmap_set(migration_bitmap, 0, ram_pages); >> +migration_dirty_pages = ram_pages; >> +create = true; <== >> +} >Nothing in this patch sets the migration_bitmap to anything. Let me explain all the odd 'if' block: 1 >> +if (!ft_enabled() || !migration_bitmap) { 2 >> +if (!ft_enabled() || create) { 3 >> +if (!ft_enabled()) { As I mentioned in the commit log: >> We need to handle the variables related to live migration very >> carefully. So the new migration does not restart from the very >> begin of the migration, instead, it continues the previous >> migration. Some variables should not be reset after one migration, because the next one need these variables to continue the migration. This explains all the "if ft_enabled()" Besides, some variables need to be initialized at the first migration of curling. That explains the "if create" and "if !migration_bitmap" >> +if (ft_enabled()) { >> +if (old_vm_running) { >> +qemu_mutex_lock_iothread(); >> +vm_start(); >> +qemu_mutex_unlock_iothread(); >> + >> +current_time = >> qemu_clock_get_ms(QEMU_CLOCK_REALTIME); >> +time_spent = current_time - migration_start_time; >> +DPRINTF("this migration lasts for %" PRId64 "ms\n", >> +time_spent); >> +if (time_spent < time_window) { >> +g_usleep((time_window - time_spent)*1000); > >Why are we waiting here? If we are migration faster than allowed, why >we are waiting? Looping fast is not good, that means we enter iothread lock and do vm stop more frequently. The performance will drop and vm user will experience input stall if we do not sleep. How to deal with this is a difficult issue, any suggestion is welcomed. THIS IS ONE OF THE TWO MAIN PROBLEMS. The other one is related to the magic number 0xfeedcafe.
Re: [Qemu-devel] [PATCH RFC 2/4] Curling: cmdline interface
> Shouldn't this be in migration_state? Just wondering. And yes, I > don't see either a trivial place how to get it. get_current_migration()? That's a better idea, I will put 'ft_enabled' in MigrationState Struct. > I think for the outgoing side it should just be "migrate -f tcp:foo:". > On the incoming side, perhaps you could have a different ID instead of > QEMU_VM_FILE_MAGIC, that triggers fault-tolerance mode automatically? I am OK with this solution, '-f' indicates fault-tolerance, right? Have youdecided yet?
Re: [Qemu-devel] [PATCH RFC 0/4] Curling: KVM Fault Tolerance
Hi, >The first is that if the VM failure happen in the middle on the live migration > >the backup VM state will be inconsistent which means you can't failover to >it. Yes, I have concerned about this problem. That is why we need a prefetch buffer. >Solving it is not simple as you need some transaction mechanism that will >>change the backup VM state only when the transaction completes (the live >migration completes). >Kemari has something like that. > The backup VM state will be loaded only when the one whole migration data is prefetched. Otherwise, VM state will not be loaded. So the backup VM is ensured to have a consistent state like a checkpoint. However, how close this checkpoint to the point of the VM failure depends on the workload and bandwidth. >The second is that sadly live migration doesn't always converge this means >>that the backup VM won't have a consist state to failover to. >You need to >detect such a case and throttle down the guest to force convergence. Yes, that's a problem. AFAK, qemu already have an auto convergence feature. >From another perspective, if many migrations could not converge, maybe the >workload is high and the bandwidth is low, and it is not recommended to use >FT in general.