Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves
Il 19/07/2013 08:38, Alex Bligh ha scritto: > > However, I still don't quite see how the poll in the mainloop is > meant to exit when a timer expires. There's now no qemu_notify_event, > and no SIGALRM, and the timeout will still be infinite (unless I > calculate the timeout as the minimum across all clocks, in which > case I might as well do (b) above). Yes, that was the idea. Paolo
Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
On 16.07.2013 18:29, Paolo Bonzini wrote: Define the return value of get_block_status. Bits 0, 1, 2 and 9-62 are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-7 are left for future extensions. The return code is compatible with the old is_allocated API: returning just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change in clients of is_allocated. We will return more precise information in the next patches. Signed-off-by: Paolo Bonzini --- v1->v2: improved comment block.c | 7 +-- include/block/block.h | 26 ++ 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index 6e7a8a3..7ff0716 100644 --- a/block.c +++ b/block.c @@ -2990,7 +2990,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, if (!bs->drv->bdrv_co_get_block_status) { *pnum = nb_sectors; -return 1; +return BDRV_BLOCK_DATA; } return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum); @@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num, int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum) { -return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum); +int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum); +return +(ret & BDRV_BLOCK_DATA) || +((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs)); i do also not understand the "((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs))"; if a block is unallocated and reads as zero, but the device lacks zero init, it is declared as allocated with this, isn't it? for iscsi and host_device with lbprz==1 or discardzeroes respectively all blocks would return as allocated. is this wanted? Peter
Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves
Stefan, --On 19 July 2013 09:58:50 +0800 Stefan Hajnoczi wrote: Options: a) restore alarm timers (at least for the time being). Make all alarm timers do qemu_notify_event. However, only run the AioContext clock's timers within aio_poll. This is the least intrusive change. b) calculate the timeout in aio_poll with respect to the minimum deadline across all clocks, not just AioContext's clock. Use the same logic in mainloop. I'd go for (b), except for the millisecond accuracy thing. So my temptation (sadly) is (a). I think this is a non-issue because host_alarm_handler() can only be called from the main loop: main-loop.c:qemu_signal_init() sets up signalfd to monitor SIGALRM. Therefore we do not asynchronously invoke the SIGALRM signals handler. It is only invoked from main-loop.c:sigfd_handler() when the main loop runs. That's how I read the code. I haven't checked a running process to be sure. Not sure that's the case on win32, but OK let's go with that. However, I still don't quite see how the poll in the mainloop is meant to exit when a timer expires. There's now no qemu_notify_event, and no SIGALRM, and the timeout will still be infinite (unless I calculate the timeout as the minimum across all clocks, in which case I might as well do (b) above). -- Alex Bligh
Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
On 16.07.2013 18:29, Paolo Bonzini wrote: Define the return value of get_block_status. Bits 0, 1, 2 and 9-62 are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-7 are left for future extensions. The return code is compatible with the old is_allocated API: returning just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change in clients of is_allocated. We will return more precise information in the next patches. Signed-off-by: Paolo Bonzini --- v1->v2: improved comment block.c | 7 +-- include/block/block.h | 26 ++ 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index 6e7a8a3..7ff0716 100644 --- a/block.c +++ b/block.c @@ -2990,7 +2990,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, if (!bs->drv->bdrv_co_get_block_status) { *pnum = nb_sectors; -return 1; +return BDRV_BLOCK_DATA; } return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum); @@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num, int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum) { -return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum); +int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum); just stumbled about a question i have: isn't here a if (ret < 0) { *pnum = 0; return 0; } missing? I was told that this is the expected return behaviour in the error case of the old API. Peter +return +(ret & BDRV_BLOCK_DATA) || +((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs)); } /* diff --git a/include/block/block.h b/include/block/block.h index fbc0822..83c7112 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -81,6 +81,32 @@ typedef struct BlockDevOps { #define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS) #define BDRV_SECTOR_MASK ~(BDRV_SECTOR_SIZE - 1) +/* BDRV_BLOCK_DATA: data is read from bs->file or another file + * BDRV_BLOCK_ZERO: sectors read as zero + * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data + * + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in + * bs->file where sector data can be read from as raw data. + * + * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present. + * + * DATA ZERO OFFSET_VALID + * ttt sectors read as zero, bs->file is zero at offset + * tft sectors read as valid from bs->file at offset + * ftt sectors preallocated, read as zero, bs->file not + *necessarily zero at offset + * fft sectors preallocated but read from backing_hd, + *bs->file contains garbage at offset + * ttf sectors preallocated, read as zero, unknown offset + * tff sectors read from unknown file or offset + * ftf not allocated or unknown offset, read as zero + * fff not allocated or unknown offset, read from backing_hd + */ +#define BDRV_BLOCK_DATA 1 +#define BDRV_BLOCK_ZERO 2 +#define BDRV_BLOCK_OFFSET_VALID 4 +#define BDRV_BLOCK_OFFSET_MASK BDRV_SECTOR_MASK + typedef enum { BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP } BlockErrorAction; -- Mit freundlichen Grüßen Peter Lieven ... KAMP Netzwerkdienste GmbH Vestische Str. 89-91 | 46117 Oberhausen Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40 p...@kamp.de | http://www.kamp.de Geschäftsführer: Heiner Lante | Michael Lante Amtsgericht Duisburg | HRB Nr. 12154 USt-Id-Nr.: DE 120607556 ...
Re: [Qemu-devel] [PATCH 0/4] export internal snapshot by qemu-nbd
于 2013-7-17 22:23, Eric Blake 写道: On 07/17/2013 08:03 AM, Wenchao Xia wrote: This series allow user to read internal snapshot's contents without qemu-img convert. Another purpose is that, when qemu is online and have taken an internal snapshot, let user invoke qemu-nbd to do any thing on it except write. This brings two interesting issues: 1 is it safe to let qemu-nbd and qemu access that file at same time? Probably not, for the same reason we tell people to not use qemu-img while qemu is active on a file. For external case, or backing chain, qemu-nbd export while qemu is active, do you think it is OK? base->imageA qemu-nbd export base qemu use imageA. I think it is safe, since qemu-nbd is read only. The data will be correct from qemu-nbd, if qemu does not delete that snapshot when qemu-nbd is running, and data is flushed to storage after qemu take that snapshot so that qemu-nbd would see the correct data. You're making assumptions that qemu won't be touching any metadata in a manner in which the read-only qemu-nbd could get confused; I'm not sure we are ready to make that guarantee. I think the export has to be from the running qemu process itself, rather than from a second process. 2 should an nbd-server exporting internal snapshot be added in qemu? I think no. Compared with driver-backup, the snapshot, or COW happens in storage level, so it allows another program to read it itself. Actually it should be OK to let another server other than qemu's host, do the export I/O job, if data is flushed. Unfortunately, I disagree, and think the answer to this question is yes, we need to do the export from within the single qemu process, if we want to guarantee safety. Next step: As demonstrated before, an explict API should be added, which make sure all I/O request is flushed and sent to underlining storage, and cache is sync if it is writeback type. So at qemu level, we can make sure no request is left behind, before qemu-nbd start. That API should also benefit 3rd party block snapshot solution, such as LVM2. More: With this patch and previous qcow2 snapshot at block device level, I think export/import/backup solution around qcow2, is nearly complete at qemu's level. It can do similar things as backing chain but with better performance, Some small optimization place are left: 1 compare two snapshot's data to get the diff with help of qcow2's L1/L2 table. 2 convertion between external snapshot and internal snapshot. This series need following series applied first: [PATCH V5 0/8] add internal snapshot support at block device level http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg01831.html Wenchao Xia (4): 1 snapshot: distinguish id and name in load_tmp 2 qemu-nbd: support internal snapshot export 3 qemu-nbd: add doc for internal snapshot export 4 qemu-iotests: add 057 internal snapshot export with qemu-nbd case block/qcow2-snapshot.c | 16 +++- block/qcow2.h |5 ++- block/snapshot.c | 37 ++- include/block/block_int.h |4 ++- include/block/snapshot.h |4 ++- qemu-img.c |7 +++- qemu-nbd.c | 56 - qemu-nbd.texi |3 ++ tests/qemu-iotests/057 | 87 tests/qemu-iotests/057.out | 26 + tests/qemu-iotests/group |1 + 11 files changed, 237 insertions(+), 9 deletions(-) create mode 100755 tests/qemu-iotests/057 create mode 100644 tests/qemu-iotests/057.out -- Best Regards Wenchao Xia
Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves
Il 19/07/2013 03:58, Stefan Hajnoczi ha scritto: >> Options: >> >> a) restore alarm timers (at least for the time being). Make all >> alarm timers do qemu_notify_event. However, only run the AioContext >> clock's timers within aio_poll. This is the least intrusive change. >> >> b) calculate the timeout in aio_poll with respect to the minimum >> deadline across all clocks, not just AioContext's clock. Use the >> same logic in mainloop. >> >> I'd go for (b), except for the millisecond accuracy thing. So my >> temptation (sadly) is (a). > > I think this is a non-issue because host_alarm_handler() can only be > called from the main loop: > > main-loop.c:qemu_signal_init() sets up signalfd to monitor SIGALRM. > Therefore we do not asynchronously invoke the SIGALRM signals handler. > It is only invoked from main-loop.c:sigfd_handler() when the main loop > runs. > > That's how I read the code. I haven't checked a running process to be > sure. I think you're right. It was not strictly needed even with alarm timers, because host_alarm_handler() is called always before qemu_run_all_timers. But it made the code more robust. With your change to delete alarm timers and move the deadline to poll, the next poll call will have a timeout of 0 and all will be well. As to millisecond accuracy, as discussed earlier we can use ppoll on Linux. This of course should be introduced before alarm timers are deleted, to avoid breaking bisection. Paolo >> 2. If we do delete alarm timers, I'll need to delete the -clock option. > > I noticed this too because I think we should stub it out for > compatibility. Accept existing options but ignore them, update > documentation to state that they are kept for compatibility. > > Stefan >
Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert
On 19.07.2013 07:58, Paolo Bonzini wrote: Il 18/07/2013 21:28, Peter Lieven ha scritto: thanks for the details. I think to have optimal performance and best change for unmapping in qemu-img convert it might be best to export the OPTIMAL UNMAP GRANULARITY Agreed about this. as well as the write_zeroes_w_discard capability via the BDI But why this?!? It is _not_ needed. All you need is to change the default of the "-S" option to be the OPTIMAL UNMAP GRANULARITY if it is nonzero. 2 reasons: a) Does this guarantee that the requests are aligned to multiple of the -S value? b) If I this flag exists qemu-img convert can do the "zeroing" a priori. This has the benefit that combined with bdrv_get_block_status requests before it might not need to touch large areas of the volume. Speaking for iSCSI its likely that the user sets a fresh volume as the destination, but its not guaranteed. With the Patch 4 there are only done a few get_block_status requests to verify this. If we just write zeroes with BDRV_MAY_UNMAP, we send hundreds or thousands of writesame requests for possibly already unmapped data. To give an example. If I take my storage with an 1TB volume. It takes about 10-12 get_block_status requests to verify that it is completely unmapped. After this I am safe to set has_zero_init = 1 in qemu-img convert. If I would convert a 1TB image to this target where lets say 50% are at leat 15MB zero blocks (15MB is the OUG of my storage) it would take ~35000 write same requests to achieve the same. Peter Paolo and than zero out the whole device with bdrv_write_zeroes and the BDRV_MAY_DISCARD flag. the logic for this is already prepared in patch4 (topic of this email). except that i have to exchange bdrv_discard with bdrv_write_zeroes w/ BDRV_MAY_DISCARD. what do you think? On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven wrote: Am 18.07.2013 um 16:35 schrieb Paolo Bonzini : Il 18/07/2013 16:32, Peter Lieven ha scritto: (Mis)alignment and granularity can be handled later. We can ignore them for now. Later, if we decide the best way to support them is a flag, we'll add it. Let's not put the cart before the horse. BTW, I expect alignment!=0 to be really, really rare. To explain my concerns: I know that my target has internal page size of 15MB. I will check what happens if I deallocate this 15MB in chunks of lets say 1MB. If the page gets unprovisioned after the last chunk is unmapped it would be fine :-) You're talking of granularity here, not (mis)alignment. you are right. for the target i am talking about this is 30720 512-byte blocks for the granularity (pagesize) and 0 for the alignment. i will see what happens if I write same w/unmap the whole 30720 blocks in smaller blocks ;-) otherwise i will have to add support for honoring this values in qemu-img convert as a follow up. Peter -- Mit freundlichen Grüßen Peter Lieven ... KAMP Netzwerkdienste GmbH Vestische Str. 89-91 | 46117 Oberhausen Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40 p...@kamp.de | http://www.kamp.de Geschäftsführer: Heiner Lante | Michael Lante Amtsgericht Duisburg | HRB Nr. 12154 USt-Id-Nr.: DE 120607556 ...
[Qemu-devel] [PATCH] migration: don't use uninitialized variables
The qmp_migrate method uses the 'blk' and 'inc' parameter without checking if they're valid or not (they may be uninitialized if command is received via QMP) Signed-off-by: Pawit Pornkitprasan --- migration.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/migration.c b/migration.c index 9f5a423..f3d1ff7 100644 --- a/migration.c +++ b/migration.c @@ -385,8 +385,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, MigrationParams params; const char *p; -params.blk = blk; -params.shared = inc; +params.blk = has_blk && blk; +params.shared = has_inc && inc; if (s->state == MIG_STATE_ACTIVE) { error_set(errp, QERR_MIGRATION_ACTIVE); -- 1.8.1.2
[Qemu-devel] [PATCH] migration: send total time in QMP at "completed" stage
The "completed" stage sets total_time but not has_total_time and thus it is not sent via QMP reply (but sent via HMP nevertheless) Signed-off-by: Pawit Pornkitprasan --- migration.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration.c b/migration.c index 9f5a423..4c16f2e 100644 --- a/migration.c +++ b/migration.c @@ -219,6 +219,7 @@ MigrationInfo *qmp_query_migrate(Error **errp) info->has_status = true; info->status = g_strdup("completed"); +info->has_total_time = true; info->total_time = s->total_time; info->has_downtime = true; info->downtime = s->downtime; -- 1.8.1.2
Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert
Il 18/07/2013 21:28, Peter Lieven ha scritto: > thanks for the details. I think to have optimal performance and best > change for unmapping in qemu-img convert > it might be best to export the OPTIMAL UNMAP GRANULARITY Agreed about this. > as well as the write_zeroes_w_discard capability via the BDI But why this?!? It is _not_ needed. All you need is to change the default of the "-S" option to be the OPTIMAL UNMAP GRANULARITY if it is nonzero. Paolo > and than zero > out the whole device with bdrv_write_zeroes and the BDRV_MAY_DISCARD > flag. > > the logic for this is already prepared in patch4 (topic of this email). > except that > i have to exchange bdrv_discard with bdrv_write_zeroes w/ BDRV_MAY_DISCARD. > > what do you think? > > > >> >> >> On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven wrote: >>> >>> Am 18.07.2013 um 16:35 schrieb Paolo Bonzini : >>> Il 18/07/2013 16:32, Peter Lieven ha scritto: >>> >> (Mis)alignment and granularity can be handled later. We can ignore them >> for now. Later, if we decide the best way to support them is a flag, >> we'll add it. Let's not put the cart before the horse. >> >> BTW, I expect alignment!=0 to be really, really rare. > To explain my concerns: > > I know that my target has internal page size of 15MB. I will check what > happens > if I deallocate this 15MB in chunks of lets say 1MB. If the page gets > unprovisioned > after the last chunk is unmapped it would be fine :-) You're talking of granularity here, not (mis)alignment. >>> >>> you are right. for the target i am talking about this is 30720 512-byte >>> blocks for the granularity (pagesize) and 0 for the alignment. >>> i will see what happens if I write same w/unmap the whole 30720 blocks in >>> smaller blocks ;-) otherwise i will have to add support for honoring this >>> values in qemu-img convert >>> as a follow up. >>> >>> Peter >>> > > >
Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a "map" subcommand
Il 19/07/2013 06:48, Stefan Hajnoczi ha scritto: > On Thu, Jul 18, 2013 at 02:04:31PM -0600, Eric Blake wrote: >> On 07/03/2013 08:34 AM, Paolo Bonzini wrote: >>> This command dumps the metadata of an entire chain, in either tabular or >>> JSON >>> format. >>> >>> Signed-off-by: Paolo Bonzini >>> --- >>> qemu-img-cmds.hx | 6 ++ >>> qemu-img.c | 186 >>> +++ >>> 2 files changed, 192 insertions(+) >>> >> >>> +case OFORMAT_JSON: >>> +printf("%s{ 'depth': %d, 'start': %lld, 'length': %lld, " >>> + "'zero': %s, 'data': %s", >>> + (e->start == 0 ? "[" : ",\n"), >>> + e->depth, (long long) e->start, (long long) e->length, >>> + (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false", >>> + (e->flags & BDRV_BLOCK_DATA) ? "true" : "false"); >>> +if (e->flags & BDRV_BLOCK_OFFSET_VALID) { >>> +printf(", 'offset': %lld", (long long) e->offset); >>> +} >>> + putchar('}'); >> >> Can we please get this format documented in qapi-schema.json, even if we >> aren't using qapi to generate it yet? > > Paolo: Please send a follow-up patch documenting the json schema, I've > already merged this series. > > I was although thinking about qemu-iotests for qemu-img map, but it's > tricky since the allocation is an internal detail of the image format. > Perhaps a test case using raw? If you have interesting cases with sparse images, even raw depends on how the underlying file system splits extents. I can do a simple test that ignores the 'offset' field. Paolo
Re: [Qemu-devel] [PATCH V5 0/8] add internal snapshot support at block device level
On Thu, Jul 18, 2013 at 02:34:52PM +0200, Kevin Wolf wrote: > Am 11.07.2013 um 07:46 hat Wenchao Xia geschrieben: > > This series brings internal snapshot support at block devices level, now > > we > > have two three methods to do block snapshot lively: 1) backing chain, > > 2) internal one and 3) drive-back up approach. > > > > Comparation: > > Advantages:Disadvantages: > > 1)delta data, taken fast, export, sizeperformance, delete slow. > > 2) taken fast, delete fast, performance, size delta data, format > > 3) performance, export, format taken slow, delta data, > > size > > > > I think in most case, saving vmstate in an standalone file is better than > > saving it inside qcow2, So suggest treat internal snapshot as block level > > methods and not encourage user to savevm in qcow2 any more. > > > > Implemention details: > > To avoid trouble, this serial have hide ID in create interfaces, this make > > sure no chaos of ID and name will be introduced by these interfaces. > > There is one patch may be common to Pavel's savvm transaction, patch 1/11, > > others are not quite related. Patch 1/11 will not set errp when no snapshot > > find, since patch 3/11 need to distinguish real error case. > > > > Next steps to better full VM snapshot: > > Improve internal snapshot's export capability. > > Better vmstate saving. > > > > Thanks Kevin to give advisement about how add it in qmp_transaction, > > oldest > > version comes drom Dietmar Maurer. > > > > v3: > > General: > > Rebased after Stenfan's driver-backup patch V6. > > > > Address Eric's comments: > > 4/9: grammar fix and better doc. > > 5/9: parameter name is mandatory now. grammar fix. > > 6/9: redesiged interface: take both id and name as optional parameter, > > return > > the deleted snapshot's info. > > > > Address Stefan's comments: > > 4/9: add '' around %s in message. drop code comments about vm_clock. > > 9/9: better doc, refined the code and add more test case. > > > > v4: > > Address Stefan's comments: > > 4/9: use error_setg_errno() to show error reason for > > bdrv_snapshot_create(), > > spell fix and better doc. > > 5/9: better doc. > > 6/9: remove spurious ';' in code, spell fix and better doc. > > > > v5: > > Address Kevin's comments: > > 3/8, 4/8, 8/8: remove the limit of numeric snapshot name. > > General change: > > 4/8: use existing type as parameter in qapi schema. > > > > Wenchao Xia (8): > > 1 snapshot: new function bdrv_snapshot_find_by_id_and_name() > > 2 snapshot: distinguish id and name in snapshot delete > > 3 qmp: add internal snapshot support in qmp_transaction > > 4 qmp: add interface blockdev-snapshot-internal-sync > > 5 qmp: add interface blockdev-snapshot-delete-internal-sync > > 6 hmp: add interface hmp_snapshot_blkdev_internal > > 7 hmp: add interface hmp_snapshot_delete_blkdev_internal > > 8 qemu-iotests: add 056 internal snapshot for block device test case > > Okay, reviewed the whole series. I had some comments on two or three > patches. > > One thing that I want to add is that in order to provide transactional > behaviour (i.e. snapshotting multiple device in one transaction should > result in a consistent snapshot), I think we may in fact need to stop > the VM while taking the snapshots. If we are doing guest memory snapshots we definitely need to stop vcpus. For disk-only snapshots I do not think it's necessary to stop the guest because: 1. Guest I/O can only be processed from the QEMU global mutex. But we currently hold it so no one else can make progress. 2. In the dataplane case the drain callback that I added stops the dataplane thread temporarily. It cannot be restarted until the main loop iterates again (see #1). Therefore I think stopping vcpus is not necessary for disk-only snapshots. Stefan
Re: [Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers
On Thu, Jul 18, 2013 at 11:21:42PM +0200, Charlie Shepherd wrote: > This patch makes bdrv_flush a synchronous function and updates any callers > from > a coroutine context to use bdrv_co_flush instead. > > The motivation for this patch comes from the GSoC Continuation-Passing C > project. When coroutines were introduced, synchronous functions in the block > layer were converted to use asynchronous methods by dynamically detecting if > they were being run from a coroutine context by calling qemu_in_coroutine(), > and > yielding if so. If they were not, they would spawn a new coroutine and poll > until the asynchronous counterpart finished. > > However this approach does not work with CPC as the CPC translator converts > all > functions annotated coroutine_fn to a different (continuation based) calling > convention. This means that coroutine_fn annotated functions cannot be called > from a non-coroutine context. > > This patch is a Request For Comments on the approach of splitting these > "dynamic" functions into synchronous and asynchronous versions. This is easy > for > bdrv_flush as it already has an asynchronous counterpart - bdrv_co_flush. The > only caller of bdrv_flush from a coroutine context is mirror_drain in > block/mirror.c - this should be annotated as a coroutine_fn as it calls > qemu_coroutine_yield(). > > If this approach meets with approval I will develop a patchset splitting the > other "dynamic" functions in the block layer. This will allow all coroutine > functions to have a coroutine_fn annotation that can be statically checked > (CPC > can be used to verify annotations). > > I have audited the other callers of bdrv_flush, they are included below: > > block.c: bdrv_reopen_prepare, bdrv_close, bdrv_commit, bdrv_pwrite_sync bdrv_pwrite_sync() is a dynamic function. If called from coroutine context it will yield (indirectly from bdrv_pwrite() or bdrv_flush()). > block/qcow2-cache.c: qcow2_cache_entry_flush, qcow2_cache_flush > block/qcow2-refcount.c: qcow2_update_snapshot_refcount > block/qcow2-snapshot.c: qcow2_write_snapshots > block/qcow2.c: qcow2_mark_dirty, qcow2_mark_clean qcow2 runs in coroutine context, the coroutine_fn annotations are just missing. See block/qcow2.c:bdrv_qcow2 for the entry points like qcow2_co_readv. > block/qed-check.c: qed_check_mark_clean > block/qed.c: bdrv_qed_open, bdrv_qed_close > blockdev.c: external_snapshot_prepare, do_drive_del > cpus.c: do_vm_stop > hw/block/nvme.c: nvme_clear_ctrl > qemu-io-cmds.c: flush_f > savevm.c: bdrv_fclose I think the rest are fine and your patch looks good.
Re: [Qemu-devel] [PATCH 3/4] block/raw: add .bdrv_get_info
On Mon, Jul 15, 2013 at 12:49:34PM +0200, Peter Lieven wrote: > Signed-off-by: Peter Lieven > --- > block/raw.c |6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/block/raw.c b/block/raw.c > index 8c81de9..f1682d4 100644 > --- a/block/raw.c > +++ b/block/raw.c > @@ -121,6 +121,11 @@ static int raw_has_zero_init(BlockDriverState *bs) > return bdrv_has_zero_init(bs->file); > } > > +static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) > +{ > +return bdrv_get_info(bs->file, bdi); > +} > + > static BlockDriver bdrv_raw = { > .format_name= "raw", > > @@ -140,6 +145,7 @@ static BlockDriver bdrv_raw = { > > .bdrv_probe = raw_probe, > .bdrv_getlength = raw_getlength, > +.bdrv_get_info = raw_get_info, I checked BlockDriverInfo to make sure the fields still make sense for the raw BlockDriverState. The vm_state_offset field is questionable, since the raw BDS doesn't know about vmstate and you certainly cannot write to it. If protocols start supporting vmstate we might have to rework that anyway, so I'm happy with this patch. Stefan
Re: [Qemu-devel] [PATCH 0/4] qemu-img: conditionally discard target on convert
On Mon, Jul 15, 2013 at 12:49:31PM +0200, Peter Lieven wrote: > this adds the proposed solution to add a generic mechanism to zeroize > a target in qemu-img if it has discard_zeroes but has_zero_init is 0. > > Peter Lieven (4): > block: add discard_zeroes and max_unmap to BlockDriverInfo > iscsi: add .bdrv_get_info > block/raw: add .bdrv_get_info > qemu-img: conditionally discard target on convert > > block/iscsi.c | 19 +++-- > block/raw.c |6 ++ > include/block/block.h |5 + > qemu-img.c| 56 > - > 4 files changed, 83 insertions(+), 3 deletions(-) Thanks, applied Patch 3 to my block tree as requested: https://github.com/stefanha/qemu/commits/block Stefan
Re: [Qemu-devel] Buildbots out of disk space
On Thu, Jul 18, 2013 at 12:45:42PM +0200, Andreas Färber wrote: > Am 15.07.2013 08:43, schrieb Stefan Hajnoczi: > > On Mon, Jul 15, 2013 at 07:31:05AM +0200, Christian Berendt wrote: > >> Think we should clean up the registered build slaves. Here's a list > >> of offline slaves. Can they be removed from the bot? > >> > >> default_s390 > > > > This slave goes offline relatively often but we do try to get it back > > periodically. > > It's a z/VM instance that had some networking issues but I can SSH onto > the machine again now - I don't see any buildbot-related process running > nor is there anything related scheduled in cron. Please advise what to > do to get it back online. I have added the following crontab entry for user 'build' on qemu-s390.opensuse.org: @reboot cd /home/build/qemu && /usr/local/bin/buildslave start See here for buildslave status: http://buildbot.b1-systems.de/qemu/buildslaves Stefan
Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a "map" subcommand
On Thu, Jul 18, 2013 at 02:04:31PM -0600, Eric Blake wrote: > On 07/03/2013 08:34 AM, Paolo Bonzini wrote: > > This command dumps the metadata of an entire chain, in either tabular or > > JSON > > format. > > > > Signed-off-by: Paolo Bonzini > > --- > > qemu-img-cmds.hx | 6 ++ > > qemu-img.c | 186 > > +++ > > 2 files changed, 192 insertions(+) > > > > > +case OFORMAT_JSON: > > +printf("%s{ 'depth': %d, 'start': %lld, 'length': %lld, " > > + "'zero': %s, 'data': %s", > > + (e->start == 0 ? "[" : ",\n"), > > + e->depth, (long long) e->start, (long long) e->length, > > + (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false", > > + (e->flags & BDRV_BLOCK_DATA) ? "true" : "false"); > > +if (e->flags & BDRV_BLOCK_OFFSET_VALID) { > > +printf(", 'offset': %lld", (long long) e->offset); > > +} > > + putchar('}'); > > Can we please get this format documented in qapi-schema.json, even if we > aren't using qapi to generate it yet? Paolo: Please send a follow-up patch documenting the json schema, I've already merged this series. I was although thinking about qemu-iotests for qemu-img map, but it's tricky since the allocation is an internal detail of the image format. Perhaps a test case using raw? Stefan
Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
On Tue, Jul 16, 2013 at 06:29:11PM +0200, Paolo Bonzini wrote: > This series adds a subcommand to "map" that can dump file metadata. > Metadata that is dumped includes: > > - whether blocks are allocated in bs->file and, if so, where > > - whether blocks are zero > > - whether data is read from bs or bs->backing_hd > > Metadata is dumped for an entire chain of images. One example usage is > (for non-compressed, non-encrypted images) to transform the metadata > into a Linux device-mapper setup, and make a qcow2 image available (for > read only) as a block device. Another possible usage is to determine > the used areas of a file, and convert it in place to another format. > Alternatively, the richer data can be used by block jobs to quickly > determine if a block is zero without reading it. > > The patches implement a new operation, bdrv_co_get_block_status, that > supersedes bdrv_co_is_allocated. The bdrv_is_allocated API is still > available of course, and implemented on top of the new operation. > > Patches 1-3 touch cow, which uses bdrv_co_is_allocated during its reads > and writes. I optimized it a bit because it was unbearably slow during > testing. With these patches (also tested with blkverify), booting of > a cow guest image is not particularly slow. > > Patches 4 to 6 clean up the bdrv_is_allocated and bdrv_is_allocated_above > implementation, eliminating some code duplication. > > Patches 7 and 8 tweak bdrv_has_zero_init and its usage in qemu-img in > a way that helps when implementing the new API. > > Patches 9 to 11 implement bdrv_get_block_status and change the formats > to report all the available information. > > Patch 12 adds the "map" subcommand to qemu-img. > > Finally, patches 13 to 17 tweak the implementation to extract more > information from protocols, and combine this information with format > metadata whenever possible. > > Paolo > > v1->v2: see patches 1, 6, 9, 10, 11, 12 > > Paolo Bonzini (17): > cow: make reads go at a decent speed > cow: make writes go at a less indecent speed > cow: do not call bdrv_co_is_allocated > block: make bdrv_co_is_allocated static > block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above > distinction > block: expect errors from bdrv_co_is_allocated > qemu-img: always probe the input image for allocated sectors > block: make bdrv_has_zero_init return false for copy-on-write-images > block: introduce bdrv_get_block_status API > block: define get_block_status return value > block: return get_block_status data and flags for formats > qemu-img: add a "map" subcommand > block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO > raw-posix: return get_block_status data and flags > raw-posix: detect XFS unwritten extents > block: add default get_block_status implementation for protocols > block: look for zero blocks in bs->file > > block.c | 134 +-- > block/commit.c| 6 +- > block/cow.c | 90 +-- > block/mirror.c| 4 +- > block/qcow.c | 13 ++- > block/qcow2.c | 24 +++-- > block/qed.c | 39 ++-- > block/raw-posix.c | 24 +++-- > block/raw.c | 6 +- > block/sheepdog.c | 14 +-- > block/stream.c| 10 +-- > block/vdi.c | 17 +++- > block/vmdk.c | 23 - > block/vvfat.c | 15 ++-- > include/block/block.h | 34 +-- > include/block/block_int.h | 2 +- > qemu-img-cmds.hx | 6 ++ > qemu-img.c| 225 > +- > 18 files changed, 503 insertions(+), 183 deletions(-) Applied s/MAX/MIN/ fixup suggested by Fam. Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan
Re: [Qemu-devel] [PATCH] block: fix bdrv_read_unthrottled()
On Thu, Jul 18, 2013 at 10:37:32AM +0200, Peter Lieven wrote: > Signed-off-by: Peter Lieven > --- > block.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan
Re: [Qemu-devel] [PATCH] cpus: Let vm_stop[_force_state]() always flush block devices
On Thu, Jul 18, 2013 at 02:52:19PM +0200, Kevin Wolf wrote: > Even if the VM is already stopped, we cannot assume that all data has > already been successfully flushed to disk. The flush during the previous > vm_stop() could have failed. > > Run bdrv_flush_all() unconditionally so that we get an error each time > if the block device isn't really flushed. > > Signed-off-by: Kevin Wolf > --- > cpus.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan
Re: [Qemu-devel] [PATCH v2 repost 8/9] i386: generate pc guest info
<...> > +void ich9_lpc_set_guest_info(PcGuestInfo *guest_info) > +{ > +guest_info->sci_int = 9; > +guest_info->acpi_enable_cmd = ICH9_APM_ACPI_ENABLE; > +guest_info->acpi_disable_cmd = ICH9_APM_ACPI_DISABLE; > +} > + This function has to be called somewhere(ich9_lpc_pm_init?) to setup acpi_enable_cmd and acpi_disable_cmd, or guest Linux will report: ACPI Error: No ACPI mode transition supported in this system (enable/disable both zero) and disable ACPI.
Re: [Qemu-devel] [PATCH v7 00/10] qemu-ga: fsfreeze on Windows using VSS
Hi Michael, > "CoCreateInstance(VSSCoordinator) failed. (Error: 80040154) Class not > registered I have seen this error when I ran 32bit qemu-ga on 64bit Windows (2008 server R2). Just in case, could you confirm qemu-ga.exe and qga-vss.dll are built for the correct architecture? Stopping VSS service by "net stop vss" might be helpful to fix hung. Otherwise, possibly COM+ related registry are messed up...? ("qemu-ga -s uninstall" deletes whole qemu-ga VSS related registry entries, that should solve the problem in usual...) And If you have any VSS/COM related log in Event Viewer (maybe in System or Application log) , please let me know. Thanks, Tomoki Sekiyama From: fluxion [fluks...@gmail.com] on behalf of Michael Roth [mdr...@linux.vnet.ibm.com] Sent: Thursday, July 18, 2013 6:19 PM To: Tomoki Sekiyama; qemu-devel@nongnu.org Cc: libaiq...@huawei.com; ler...@redhat.com; gham...@redhat.com; stefa...@gmail.com; lcapitul...@redhat.com; vroze...@redhat.com; pbonz...@redhat.com; Seiji Aguchi; ebl...@redhat.com; ar...@redhat.com Subject: Re: [PATCH v7 00/10] qemu-ga: fsfreeze on Windows using VSS Quoting Tomoki Sekiyama (2013-07-15 11:20:23) > Hi, > > This is v7 of patch series to add fsfreeze for Windows qemu-guest-agent. > > changes from v7: > - Fix COM initialization issue for Windows service thread (patch 07/10) > > v6: http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg01788.html Hi Tomoki, I'm also having some issues testing this, though I think my problem is a little different than the issue in v6: When I initially ran qemu-ga -s install, I got some output indicating the VSS service was registered, but then it hung. I noticed afterward that I'd already had the service running, so figured that that might've been the problem. So I stopped the service and unregistered it (using a qemu-ga install that was in a separate directory). Then I went back to install via qemu-ga.exe -s install, and it just hangs with no output. I wasn't sure how to reset the state, so I took the Windows approach and rebooted. After reboot, running qemu-ga -s install gives me: C:\Users\mdroth\Documents\qga>qemu-ga.exe -s install Registering QEMU Guest Agent VSS Provider: C:\Users\mdroth\Documents\qga\qga-vss.dll C:\Users\mdroth\Documents\qga\qga-vss.tlb Failed to pCatalog->InstallComponent. (Error: 8011045c) The application name is not unique and cannot be resolved to an application id CoCreateInstance. (Error: 80040154) Class not registered Removing COM+ Application: QEMU Guest Agent VSS Provider Removing COM+ Application: QEMU Guest Agent VSS Provider [mdroth@vm5 ~]$ I'm not sure if I'm still in a bad state from earlier, but I can't seem to recover from here. If I try running qemu-ga -s uninstall, then qemu-ga -s install again, I get a pop-up error saying: "CoCreateInstance(VSSCoordinator) failed. (Error: 80040154) Class not registered Any ideas what's going on here? I'm on a Windows 7 vm and built using a Fedora 18 mingw environment. Let me know if you need any additional information to debug. Thanks! > > > * Description > In Windows, VSS (Volume Shadow Copy Service) provides a facility to > quiesce filesystems and applications before disk snapshots are taken. > This patch series implements "fsfreeze" command of qemu-ga using VSS. > > > * How to build & run qemu-ga with VSS support > > - Download Microsoft VSS SDK from: >http://www.microsoft.com/en-us/download/details.aspx?id=23490 > > - Setup the SDK >scripts/extract-vsssdk-headers setup.exe (on POSIX-systems) > > - Specify installed SDK directory to configure option as: >./configure -with-vss-sdk="path/to/VSS SDK" > --cross-prefix=i686-w64-mingw32- > > - make qemu-ga.exe qga/vss-win32/qga-vss.{dll,tlb} > > - Install qemu-ga.exe, qga/vss-win32/qga-vss.{dll,tlb}, and >the other required mingw libraries into the same directory in guests > > - Run `qemu-ga.exe -s install' and `net start qemu-ga' in the guests > > Any feedback are appreciated. > > --- > Tomoki Sekiyama (10): > configure: Support configuring C++ compiler > Add c++ keywords to QAPI helper script > checkpatch.pl: Check .cpp files > Add a script to extract VSS SDK headers on POSIX system > qemu-ga: Add configure options to specify path to Windows/VSS SDK > error: Add error_set_win32 and error_setg_win32 > qemu-ga: Add Windows VSS provider and requester as DLL > qemu-ga: Call Windows VSS requester in fsfreeze command handler > qemu-ga: Install Windows VSS provider on `qemu-ga -s install' > QMP/qemu-ga-client: Make timeout longer for guest-fsfreeze-freeze > command > > > .gitignore |1 > Makefile |3 > Makefile.objs |2 > QMP/qemu-ga-client |4 > configure | 87 +++ > hmp.c |2 > hw/pci/pci.c |
Re: [Qemu-devel] [PATCH] Bug Fix:Segmentation fault when use usb-ehci device
于 2013/7/19 1:14, Andreas Färber 写道: Hi, Am 18.07.2013 17:27, schrieb Mike Qiu: Hi all Any comments ? You should've CCed the USB maintainer whose file you are touching for review rather than just ppc people, see ./MAINTAINERS. I have CC to the usb naintainer Gerd Hoffmann, his files are hw/usb/*. There's some typos in the commit message, but the change looks okay to me - although there were discussions to catch this on the memory API side of things instead. You mean this patch: see below: exec: Support 64-bit operations in address_s if so it is very different. BTW, this bug has been opened before? Thanks Mike Regards, Andreas Thanks Mike 2013/7/16 11:50, Mike Qiu wrote: For usb-ehci in qemu, its caps just has read() operation, the write() operation does not exist. This cause a Segmentation fault when use usb-ehci device in ppc64 platform. here is gdb output: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x3fffa7fcef20 (LWP 6793)] 0x103f5244 in memory_region_oldmmio_write_accessor (opaque=0x113e9e78, addr=9, value=0x3fffa7fce088, size=1, shift=0, mask=255) at /home/Mike/qemu-impreza/memory.c:384 384 mr->ops->old_mmio.write[ctz32(size)](mr->opaque, addr, tmp); (gdb) p *mr->ops $1 = {read = @0x10716f68: 0x1020699c , write = 0, endianness = DEVICE_LITTLE_ENDIAN, valid = {min_access_size = 1, max_access_size = 4, unaligned = false, accepts = 0}, impl = {min_access_size = 1, max_access_size = 1, unaligned = false}, old_mmio = {read = {0, 0, 0}, write = {0, 0, 0}}} Becasue function write() of mr->ops has not been implement, in function memory_region_dispatch_write(), it call oldmmio write accessor, but at the same time old_mmio still not been implement by default. That is the root cause of the Segmentation fault. To solve this problem, add empty function: ehci_caps_write() Signed-off-by: Mike Qiu --- hw/usb/hcd-ehci.c |7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 67e4b24..6c8a439 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1072,6 +1072,12 @@ static void ehci_port_write(void *ptr, hwaddr addr, trace_usb_ehci_portsc_change(addr + s->portscbase, addr >> 2, *portsc, old); } +static void ehci_caps_write(void *ptr, hwaddr addr, uint64_t val, + unsigned size) +{ +/* nothing */ +} + static void ehci_opreg_write(void *ptr, hwaddr addr, uint64_t val, unsigned size) { @@ -2380,6 +2386,7 @@ static void ehci_frame_timer(void *opaque) static const MemoryRegionOps ehci_mmio_caps_ops = { .read = ehci_caps_read, +.write = ehci_caps_write, .valid.min_access_size = 1, .valid.max_access_size = 4, .impl.min_access_size = 1,
Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves
On Thu, Jul 18, 2013 at 07:48:28PM +0100, Alex Bligh wrote: > Stefan, > > --On 17 July 2013 11:02:30 +0800 Stefan Hajnoczi wrote: > > >The steps to achieving this: > > > >1. Drop alarm timers from qemu-timer.c and calculate g_poll() timeout > > instead for the main loop. > > > >2. Introduce a per-AioContext aio_ctx_clock that can be used with > > qemu_new_timer() to create a QEMUTimer that expires during > > aio_poll(). > > > >3. Calculate g_poll() timeout for aio_ctx_clock in aio_poll(). > > I've pretty much written this. > > Two issues: > > 1. I very much enjoyed deleting all the alarm timers code. However, it was > doing something vaguely useful, which was calling qemu_notify_event > when the timer expired. Under the new regime above, if the AioContext > clock's timers expires within aio_poll, life is good as the timeout > to g_poll will expire. However, this won't apply if any of the other > three static clock's timers expire. Also, it won't work within the > mainloop poll. Also, we lose the ability to respond to timers in a sub > millisecond way. > > Options: > > a) restore alarm timers (at least for the time being). Make all > alarm timers do qemu_notify_event. However, only run the AioContext > clock's timers within aio_poll. This is the least intrusive change. > > b) calculate the timeout in aio_poll with respect to the minimum > deadline across all clocks, not just AioContext's clock. Use the > same logic in mainloop. > > I'd go for (b), except for the millisecond accuracy thing. So my > temptation (sadly) is (a). I think this is a non-issue because host_alarm_handler() can only be called from the main loop: main-loop.c:qemu_signal_init() sets up signalfd to monitor SIGALRM. Therefore we do not asynchronously invoke the SIGALRM signals handler. It is only invoked from main-loop.c:sigfd_handler() when the main loop runs. That's how I read the code. I haven't checked a running process to be sure. > 2. If we do delete alarm timers, I'll need to delete the -clock option. I noticed this too because I think we should stub it out for compatibility. Accept existing options but ignore them, update documentation to state that they are kept for compatibility. Stefan
Re: [Qemu-devel] [PATCH V4 3/4] Add backing drive while performing backup.
On Thu, 07/18 15:13, Paolo Bonzini wrote: > Il 18/07/2013 08:39, Fam Zheng ha scritto: > > On Wed, 07/17 13:04, Ian Main wrote: > >> This patch adds the original source drive as a backing drive to our target > >> image so that the target image will appear complete during backup. This > >> is especially useful for SYNC_MODE_NONE as it allows export via NBD to > >> have a complete point-in-time snapshot available for export. > >> > >> Signed-off-by: Ian Main > >> --- > >> block/backup.c | 13 + > >> 1 file changed, 13 insertions(+) > >> > >> diff --git a/block/backup.c b/block/backup.c > >> index 68abd23..e086e76 100644 > >> --- a/block/backup.c > >> +++ b/block/backup.c > >> @@ -323,6 +323,10 @@ static void coroutine_fn backup_run(void *opaque) > >> > >> hbitmap_free(job->bitmap); > >> > >> +/* Set the target backing drive back to NULL before calling delete or > >> + * it will also delete the underlying drive. */ > >> +target->backing_hd = NULL; > >> + > >> bdrv_iostatus_disable(target); > >> bdrv_delete(target); > >> > >> @@ -362,6 +366,15 @@ void backup_start(BlockDriverState *bs, > >> BlockDriverState *target, > >> return; > >> } > >> > >> +/* Manually set the backing hd to be the backup source drive so > >> + * that all reads done while we are backing up will be passed > >> + * on to the original source drive. This allows reading from the > >> + * image while the backup is in progress, or in the case of > >> + * SYNC_MODE_NONE allows a complete image to be present for export. > >> + * Note that we do this for all modes including SYNC_MODE_TOP as > >> + * even then it allows on-the-fly reading. */ > >> +target->backing_hd = bs; > >> + > > > > Also set target->backing_file and target->backing_format here? Paolo? > > I don't think so, it is temporary while the job runs so that the NBD > server can already return the actual data. OK. For NBD export, it's also going to have a name, to be used with nbd_server_add. So should we call bdrv_set_in_use() here to protect target from being used elsewhere? -- Fam
Re: [Qemu-devel] [PATCH] linux-user: Fix target_stat and target_stat64 for OpenRISC
Hi Peter, On Thu, Jul 18, 2013 at 6:18 PM, Peter Maydell wrote: > Ping? > Thank you, it looks good to me, please push it. > thanks > -- PMM > > On 6 July 2013 21:44, Peter Maydell wrote: >> OpenRISC uses the asm-generic versions of target_stat and >> target_stat64, but it was incorrectly using the x86/ARM/etc version >> due to a misplaced defined(TARGET_OPENRISC). The previously unused >> OpenRISC section of the ifdef ladder also defined an incorrect >> target_stat and omitted the target_stat64 definition. Fix >> target_stat, provide target_stat64, and add a comment noting that >> these are the asm-generic versions for the benefit of future ports. >> >> Signed-off-by: Peter Maydell >> --- >> This fixes basic problems like "ls -l output is nonsense" and "shell >> thinks programs aren't executable and won't run them". >> >> linux-user/syscall_defs.h | 49 >> ++--- >> 1 file changed, 37 insertions(+), 12 deletions(-) >> >> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h >> index 92c01a9..cb6341f 100644 >> --- a/linux-user/syscall_defs.h >> +++ b/linux-user/syscall_defs.h >> @@ -1138,8 +1138,7 @@ struct target_winsize { >> #endif >> >> #if (defined(TARGET_I386) && defined(TARGET_ABI32)) || defined(TARGET_ARM) \ >> -|| defined(TARGET_CRIS) || defined(TARGET_UNICORE32) \ >> -|| defined(TARGET_OPENRISC) >> +|| defined(TARGET_CRIS) || defined(TARGET_UNICORE32) >> struct target_stat { >> unsigned short st_dev; >> unsigned short __pad1; >> @@ -1837,29 +1836,55 @@ struct target_stat { >> abi_ulong __unused[3]; >> }; >> #elif defined(TARGET_OPENRISC) >> + >> +/* These are the asm-generic versions of the stat and stat64 structures */ >> + >> struct target_stat { >> abi_ulong st_dev; >> abi_ulong st_ino; >> -abi_ulong st_nlink; >> - >> unsigned int st_mode; >> +unsigned int st_nlink; >> unsigned int st_uid; >> unsigned int st_gid; >> -unsigned int __pad0; >> abi_ulong st_rdev; >> +abi_ulong __pad1; >> abi_long st_size; >> -abi_long st_blksize; >> -abi_long st_blocks;/* Number 512-byte blocks allocated. */ >> - >> -abi_ulong target_st_atime; >> +int st_blksize; >> +int __pad2; >> +abi_long st_blocks; >> +abi_long target_st_atime; >> abi_ulong target_st_atime_nsec; >> -abi_ulong target_st_mtime; >> +abi_long target_st_mtime; >> abi_ulong target_st_mtime_nsec; >> -abi_ulong target_st_ctime; >> +abi_long target_st_ctime; >> abi_ulong target_st_ctime_nsec; >> +unsigned int __unused4; >> +unsigned int __unused5; >> +}; >> >> -abi_long __unused[3]; >> +struct target_stat64 { >> +uint64_t st_dev; >> +uint64_t st_ino; >> +unsigned int st_mode; >> +unsigned int st_nlink; >> +unsigned int st_uid; >> +unsigned int st_gid; >> +uint64_t st_rdev; >> +uint64_t __pad1; >> +int64_t st_size; >> +int st_blksize; >> +int __pad2; >> +int64_t st_blocks; >> +int target_st_atime; >> +unsigned int target_st_atime_nsec; >> +int target_st_mtime; >> +unsigned int target_st_mtime_nsec; >> +int target_st_ctime; >> +unsigned int target_st_ctime_nsec; >> +unsigned int __unused4; >> +unsigned int __unused5; >> }; >> + >> #else >> #error unsupported CPU >> #endif >> -- >> 1.7.9.5 >> >> Regards, Jia
[Qemu-devel] Is it possible to detect GPA access through the mapped HVA
Hi I am new to QEMU. I want to know is it possible to detect the guest OS physical memory access through QEMU? What I am doing right now is use mprotect to set the mapped RAM as not accessible. Then register the SIGSEGV handler to handle the segmentation fault in qemu_kvm_eat_signals. But I always get Segmentation fault. I don't know whether this method works or I miss something. Thanks for your time! Best Wishes, Yaohui Hu
[Qemu-devel] Fwd: Is it possible to detect GPA access through the mapped HVA
Hi I am new to QEMU. I want to know is it possible to detect the guest OS physical memory access through QEMU? What I am doing right now is use mprotect to set the mapped RAM as not accessible. Then register the SIGSEGV handler to handle the segmentation fault in qemu_kvm_eat_signals. But I always get Segmentation fault. I don't know whether this method works or I miss something. Thanks for your time! Best Wishes, Yaohui Hu
Re: [Qemu-devel] [PATCH] PPC: dbdma: macio: Fix format specifiers (build regression)
On 18.07.2013, at 21:49, Stefan Weil wrote: > Am 16.07.2013 07:54, schrieb Stefan Weil: >> Am 12.07.2013 18:48, schrieb Stefan Weil: >>> Fix a number of warnings for 32 bit builds (tested on MingW and Linux): >>> >>> CChw/ide/macio.o >>> qemu/hw/ide/macio.c: In function 'pmac_ide_atapi_transfer_cb': >>> qemu/hw/ide/macio.c:134:9: error: format '%lx' expects argument of type >>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format] >>> qemu/hw/ide/macio.c: In function 'pmac_ide_transfer_cb': >>> qemu/hw/ide/macio.c:215:5: error: format '%ld' expects argument of type >>> 'long int', but argument 5 has type 'int64_t' [-Werror=format] >>> qemu/hw/ide/macio.c:222:9: error: format '%lx' expects argument of type >>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format] >>> qemu/hw/ide/macio.c:264:9: error: format '%lx' expects argument of type >>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format] >>> cc1: all warnings being treated as errors >>> make: *** [hw/ide/macio.o] Error 1 >>> >>> Signed-off-by: Stefan Weil >>> --- >>> >>> >>> >>> Hi Anthony, >>> >>> the patch fixes a build regression which was introduced today. >>> Could you please apply it without waiting for the next pull requests? >>> >>> Thanks, >>> Stefan >>> >>> >>> >>> hw/ide/macio.c |9 + >>> 1 file changed, 5 insertions(+), 4 deletions(-) >>> >>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c >>> index 38ad924..ef4ba2b 100644 >>> --- a/hw/ide/macio.c >>> +++ b/hw/ide/macio.c >>> @@ -131,7 +131,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, >>> int ret) >>> int sector_num = (s->lba << 2) + (s->io_buffer_index >> 9); >>> int nsector = io->len >> 9; >>> >>> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n", >>> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx >>> "\n", >>> unaligned, io->addr + io->len - unaligned); >>> >>> bdrv_read(s->bs, sector_num + nsector, io->remainder, 1); >>> @@ -212,14 +212,15 @@ static void pmac_ide_transfer_cb(void *opaque, int >>> ret) >>> s->nsector -= n; >>> } >>> >>> -MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d sector_num: >>> %ld\n", >>> +MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d " >>> + "sector_num: %" PRId64 "\n", >>> io->remainder_len, io->len, s->nsector, sector_num); >>> if (io->remainder_len && io->len) { >>> /* guest wants the rest of its previous transfer */ >>> int remainder_len = MIN(io->remainder_len, io->len); >>> uint8_t *p = &io->remainder[0x200 - remainder_len]; >>> >>> -MACIO_DPRINTF("copying remainder %d bytes at %#lx\n", >>> +MACIO_DPRINTF("copying remainder %d bytes at %#" HWADDR_PRIx "\n", >>> remainder_len, io->addr); >>> >>> switch (s->dma_cmd) { >>> @@ -261,7 +262,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret) >>> if (unaligned) { >>> int nsector = io->len >> 9; >>> >>> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n", >>> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx >>> "\n", >>> unaligned, io->addr + io->len - unaligned); >>> >>> switch (s->dma_cmd) { >> >> Ping? > > > Ping²? Acked-by: Alexander Graf I assume this can go through the trivial tree? Or directly get applied by Anthony? Alex
Re: [Qemu-devel] [PATCH v7 00/10] qemu-ga: fsfreeze on Windows using VSS
Quoting Tomoki Sekiyama (2013-07-15 11:20:23) > Hi, > > This is v7 of patch series to add fsfreeze for Windows qemu-guest-agent. > > changes from v7: > - Fix COM initialization issue for Windows service thread (patch 07/10) > > v6: http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg01788.html Hi Tomoki, I'm also having some issues testing this, though I think my problem is a little different than the issue in v6: When I initially ran qemu-ga -s install, I got some output indicating the VSS service was registered, but then it hung. I noticed afterward that I'd already had the service running, so figured that that might've been the problem. So I stopped the service and unregistered it (using a qemu-ga install that was in a separate directory). Then I went back to install via qemu-ga.exe -s install, and it just hangs with no output. I wasn't sure how to reset the state, so I took the Windows approach and rebooted. After reboot, running qemu-ga -s install gives me: C:\Users\mdroth\Documents\qga>qemu-ga.exe -s install Registering QEMU Guest Agent VSS Provider: C:\Users\mdroth\Documents\qga\qga-vss.dll C:\Users\mdroth\Documents\qga\qga-vss.tlb Failed to pCatalog->InstallComponent. (Error: 8011045c) The application name is not unique and cannot be resolved to an application id CoCreateInstance. (Error: 80040154) Class not registered Removing COM+ Application: QEMU Guest Agent VSS Provider Removing COM+ Application: QEMU Guest Agent VSS Provider [mdroth@vm5 ~]$ I'm not sure if I'm still in a bad state from earlier, but I can't seem to recover from here. If I try running qemu-ga -s uninstall, then qemu-ga -s install again, I get a pop-up error saying: "CoCreateInstance(VSSCoordinator) failed. (Error: 80040154) Class not registered Any ideas what's going on here? I'm on a Windows 7 vm and built using a Fedora 18 mingw environment. Let me know if you need any additional information to debug. Thanks! > > > * Description > In Windows, VSS (Volume Shadow Copy Service) provides a facility to > quiesce filesystems and applications before disk snapshots are taken. > This patch series implements "fsfreeze" command of qemu-ga using VSS. > > > * How to build & run qemu-ga with VSS support > > - Download Microsoft VSS SDK from: >http://www.microsoft.com/en-us/download/details.aspx?id=23490 > > - Setup the SDK >scripts/extract-vsssdk-headers setup.exe (on POSIX-systems) > > - Specify installed SDK directory to configure option as: >./configure -with-vss-sdk="path/to/VSS SDK" > --cross-prefix=i686-w64-mingw32- > > - make qemu-ga.exe qga/vss-win32/qga-vss.{dll,tlb} > > - Install qemu-ga.exe, qga/vss-win32/qga-vss.{dll,tlb}, and >the other required mingw libraries into the same directory in guests > > - Run `qemu-ga.exe -s install' and `net start qemu-ga' in the guests > > Any feedback are appreciated. > > --- > Tomoki Sekiyama (10): > configure: Support configuring C++ compiler > Add c++ keywords to QAPI helper script > checkpatch.pl: Check .cpp files > Add a script to extract VSS SDK headers on POSIX system > qemu-ga: Add configure options to specify path to Windows/VSS SDK > error: Add error_set_win32 and error_setg_win32 > qemu-ga: Add Windows VSS provider and requester as DLL > qemu-ga: Call Windows VSS requester in fsfreeze command handler > qemu-ga: Install Windows VSS provider on `qemu-ga -s install' > QMP/qemu-ga-client: Make timeout longer for guest-fsfreeze-freeze > command > > > .gitignore |1 > Makefile |3 > Makefile.objs |2 > QMP/qemu-ga-client |4 > configure | 87 +++ > hmp.c |2 > hw/pci/pci.c |2 > include/qapi/error.h | 13 + > qga/Makefile.objs |5 > qga/commands-win32.c | 82 ++ > qga/main.c | 10 + > qga/vss-win32.c| 154 > qga/vss-win32.h| 27 ++ > qga/vss-win32/Makefile.objs| 23 ++ > qga/vss-win32/install.cpp | 424 + > qga/vss-win32/provider.cpp | 513 > > qga/vss-win32/qga-vss.def | 13 + > qga/vss-win32/qga-vss.idl | 20 ++ > qga/vss-win32/qga-vss.tlb | Bin > qga/vss-win32/requester.cpp| 487 ++ > qga/vss-win32/requester.h | 42 +++ > qga/vss-win32/vss-common.h | 128 ++ > rules.mak |9 + > scripts/checkpatch.pl | 37 ++- > scripts/extract-vsssdk-headers | 35 +++ > scripts/qapi.py| 12 + > util/error.c | 35 +++ > 27 files changed, 2146 insertions(+), 24 deletions(-) > create mode 100644 qga/vss-win32.c > create mode 100644 qga/vss-win32.h
[Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers
This patch makes bdrv_flush a synchronous function and updates any callers from a coroutine context to use bdrv_co_flush instead. The motivation for this patch comes from the GSoC Continuation-Passing C project. When coroutines were introduced, synchronous functions in the block layer were converted to use asynchronous methods by dynamically detecting if they were being run from a coroutine context by calling qemu_in_coroutine(), and yielding if so. If they were not, they would spawn a new coroutine and poll until the asynchronous counterpart finished. However this approach does not work with CPC as the CPC translator converts all functions annotated coroutine_fn to a different (continuation based) calling convention. This means that coroutine_fn annotated functions cannot be called from a non-coroutine context. This patch is a Request For Comments on the approach of splitting these "dynamic" functions into synchronous and asynchronous versions. This is easy for bdrv_flush as it already has an asynchronous counterpart - bdrv_co_flush. The only caller of bdrv_flush from a coroutine context is mirror_drain in block/mirror.c - this should be annotated as a coroutine_fn as it calls qemu_coroutine_yield(). If this approach meets with approval I will develop a patchset splitting the other "dynamic" functions in the block layer. This will allow all coroutine functions to have a coroutine_fn annotation that can be statically checked (CPC can be used to verify annotations). I have audited the other callers of bdrv_flush, they are included below: block.c: bdrv_reopen_prepare, bdrv_close, bdrv_commit, bdrv_pwrite_sync block/qcow2-cache.c: qcow2_cache_entry_flush, qcow2_cache_flush block/qcow2-refcount.c: qcow2_update_snapshot_refcount block/qcow2-snapshot.c: qcow2_write_snapshots block/qcow2.c: qcow2_mark_dirty, qcow2_mark_clean block/qed-check.c: qed_check_mark_clean block/qed.c: bdrv_qed_open, bdrv_qed_close blockdev.c: external_snapshot_prepare, do_drive_del cpus.c: do_vm_stop hw/block/nvme.c: nvme_clear_ctrl qemu-io-cmds.c: flush_f savevm.c: bdrv_fclose --- block.c| 13 - block/mirror.c | 4 ++-- 2 files changed, 6 insertions(+), 11 deletions(-) diff --git a/block.c b/block.c index 6c493ad..00d71df 100644 --- a/block.c +++ b/block.c @@ -4110,15 +4110,10 @@ int bdrv_flush(BlockDriverState *bs) .ret = NOT_DONE, }; -if (qemu_in_coroutine()) { -/* Fast-path if already in coroutine context */ -bdrv_flush_co_entry(&rwco); -} else { -co = qemu_coroutine_create(bdrv_flush_co_entry); -qemu_coroutine_enter(co, &rwco); -while (rwco.ret == NOT_DONE) { -qemu_aio_wait(); -} +co = qemu_coroutine_create(bdrv_flush_co_entry); +qemu_coroutine_enter(co, &rwco); +while (rwco.ret == NOT_DONE) { +qemu_aio_wait(); } return rwco.ret; diff --git a/block/mirror.c b/block/mirror.c index bed4a7e..3d5da7e 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -282,7 +282,7 @@ static void mirror_free_init(MirrorBlockJob *s) } } -static void mirror_drain(MirrorBlockJob *s) +static void coroutine_fn mirror_drain(MirrorBlockJob *s) { while (s->in_flight > 0) { qemu_coroutine_yield(); @@ -390,7 +390,7 @@ static void coroutine_fn mirror_run(void *opaque) should_complete = false; if (s->in_flight == 0 && cnt == 0) { trace_mirror_before_flush(s); -ret = bdrv_flush(s->target); +ret = bdrv_co_flush(s->target); if (ret < 0) { if (mirror_error_action(s, false, -ret) == BDRV_ACTION_REPORT) { goto immediate_exit; -- 1.8.3.2
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On Thursday, July 18, 2013 10:31:46 PM Peter Maydell wrote: > On 18 July 2013 21:05, Paul Moore wrote: > > On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote: > >> On 18 July 2013 20:39, Paul Moore wrote: > >> > On the plus side, I think libseccomp is very close to being pretty much > >> > feature complete (excluding new architectures that may pop up, at > >> > present > >> > we are only x86, x86_64, x32, and ARM) > >> > >> ...AArch64 ? :-) > > > > Not yet, just 32-bit ARM EABI. > > > > If you've got a working system and are willing to so some hacking or run > > some tests we could work on it for a future libseccomp release. An > > emulated AArch64 VM would also work, but that route can be slow/annoying. > > Simulators are all we have right now (we're juuust getting to the > point where hardware is starting to become available). I wasn't > being serious really, though I'm sure somebody (possibly even > somebody at Red Hat :-)) will work around to it at some point. Regardless, consider it a standing offer. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On 18 July 2013 21:05, Paul Moore wrote: > On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote: >> On 18 July 2013 20:39, Paul Moore wrote: >> > On the plus side, I think libseccomp is very close to being pretty much >> > feature complete (excluding new architectures that may pop up, at present >> > we are only x86, x86_64, x32, and ARM) >> >> ...AArch64 ? :-) >> > > Not yet, just 32-bit ARM EABI. > > If you've got a working system and are willing to so some hacking or run some > tests we could work on it for a future libseccomp release. An emulated > AArch64 VM would also work, but that route can be slow/annoying. Simulators are all we have right now (we're juuust getting to the point where hardware is starting to become available). I wasn't being serious really, though I'm sure somebody (possibly even somebody at Red Hat :-)) will work around to it at some point. -- PMM
[Qemu-devel] [Bug 1081416] Re: Qemu 1.2.0 crashes when using tcp serial console and GRUB boots
I'm seeing this too. If someone cares to tell me how I get a core file from qemu-under-libvirt I will do that and report back on debugging. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1081416 Title: Qemu 1.2.0 crashes when using tcp serial console and GRUB boots Status in QEMU: New Bug description: When booting OpenWRT Attitude Adjustement ( http://downloads.openwrt.org/attitude_adjustment/12.09-beta2/x86/generic/openwrt-x86-generic-combined-ext4.img.gz ) with this command line: qemu-system-x86_64 -serial tcp:127.0.0.1: -hda openwrt-x86-generic-combined-ext4.img Qemu crashes as soon as GRUB starts, after network cards start. *** buffer overflow detected ***: /usr/bin/qemu-system-x86_64 terminated === Backtrace: = /usr/lib/libc.so.6(__fortify_fail+0x37)[0x745f2ad7] /usr/lib/libc.so.6(+0xf9bb0)[0x745f0bb0] /usr/lib/libc.so.6(+0xfba47)[0x745f2a47] /usr/bin/qemu-system-x86_64[0x46a628] /usr/bin/qemu-system-x86_64[0x4e8a14] /usr/bin/qemu-system-x86_64[0x4e802b] /usr/lib/libc.so.6(__libc_start_main+0xf5)[0x74518725] /usr/bin/qemu-system-x86_64[0x40d949] Here is a GDB backtrace: Program received signal SIGABRT, Aborted. 0x7452bfa5 in raise () from /usr/lib/libc.so.6 (gdb) bt #0 0x7452bfa5 in raise () from /usr/lib/libc.so.6 #1 0x7452d428 in abort () from /usr/lib/libc.so.6 #2 0x7456acfb in __libc_message () from /usr/lib/libc.so.6 #3 0x745f2ad7 in __fortify_fail () from /usr/lib/libc.so.6 #4 0x745f0bb0 in __chk_fail () from /usr/lib/libc.so.6 #5 0x745f2a47 in __fdelt_warn () from /usr/lib/libc.so.6 #6 0x0046a628 in qemu_iohandler_poll (readfds=0xdb7da0 , writefds=0xdb7e20 , xfds=0x6, xfds@entry=0xdb7ea0 , ret=-1, ret@entry=1) at iohandler.c:121 #7 0x004e8a14 in main_loop_wait (nonblocking=) at main-loop.c:497 #8 0x004e802b in main_loop () at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:1643 #9 main (argc=, argv=, envp=) at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:3755 (gdb) Here is a more useless dump... To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1081416/+subscriptions
[Qemu-devel] [Bug 1081416] Re: Qemu 1.2.0 crashes when using tcp serial console and GRUB boots
(fairly sure it's in the iohandler based on a manual check of the symbols, though) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1081416 Title: Qemu 1.2.0 crashes when using tcp serial console and GRUB boots Status in QEMU: New Bug description: When booting OpenWRT Attitude Adjustement ( http://downloads.openwrt.org/attitude_adjustment/12.09-beta2/x86/generic/openwrt-x86-generic-combined-ext4.img.gz ) with this command line: qemu-system-x86_64 -serial tcp:127.0.0.1: -hda openwrt-x86-generic-combined-ext4.img Qemu crashes as soon as GRUB starts, after network cards start. *** buffer overflow detected ***: /usr/bin/qemu-system-x86_64 terminated === Backtrace: = /usr/lib/libc.so.6(__fortify_fail+0x37)[0x745f2ad7] /usr/lib/libc.so.6(+0xf9bb0)[0x745f0bb0] /usr/lib/libc.so.6(+0xfba47)[0x745f2a47] /usr/bin/qemu-system-x86_64[0x46a628] /usr/bin/qemu-system-x86_64[0x4e8a14] /usr/bin/qemu-system-x86_64[0x4e802b] /usr/lib/libc.so.6(__libc_start_main+0xf5)[0x74518725] /usr/bin/qemu-system-x86_64[0x40d949] Here is a GDB backtrace: Program received signal SIGABRT, Aborted. 0x7452bfa5 in raise () from /usr/lib/libc.so.6 (gdb) bt #0 0x7452bfa5 in raise () from /usr/lib/libc.so.6 #1 0x7452d428 in abort () from /usr/lib/libc.so.6 #2 0x7456acfb in __libc_message () from /usr/lib/libc.so.6 #3 0x745f2ad7 in __fortify_fail () from /usr/lib/libc.so.6 #4 0x745f0bb0 in __chk_fail () from /usr/lib/libc.so.6 #5 0x745f2a47 in __fdelt_warn () from /usr/lib/libc.so.6 #6 0x0046a628 in qemu_iohandler_poll (readfds=0xdb7da0 , writefds=0xdb7e20 , xfds=0x6, xfds@entry=0xdb7ea0 , ret=-1, ret@entry=1) at iohandler.c:121 #7 0x004e8a14 in main_loop_wait (nonblocking=) at main-loop.c:497 #8 0x004e802b in main_loop () at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:1643 #9 main (argc=, argv=, envp=) at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:3755 (gdb) Here is a more useless dump... To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1081416/+subscriptions
Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/2] Add Enhanced Three-Speed Ethernet Controller (eTSEC)
On 07/18/2013 04:27:50 AM, Fabien Chouteau wrote: On 07/17/2013 11:02 PM, Scott Wood wrote: > On 07/17/2013 05:17:06 AM, Fabien Chouteau wrote: >> On 07/16/2013 07:50 PM, Scott Wood wrote: >> > On 07/16/2013 10:28:28 AM, Fabien Chouteau wrote: >> >> On 07/16/2013 04:06 AM, Scott Wood wrote: >> >> > On 07/10/2013 12:10:02 PM, Fabien Chouteau wrote: >> >> >> +if (*size == etsec->rx_padding) { >> >> >> +/* The remaining bytes are for padding which is not actually allocated >> >> >> + in the buffer */ >> >> >> + >> >> >> +rem = MIN(etsec->regs[MRBLR].value - bd->length, etsec->rx_padding); >> >> >> + >> >> >> +if (rem > 0) { >> >> >> +memset(padd, 0x0, sizeof(padd)); >> >> >> +etsec->rx_padding -= rem; >> >> >> +*size -= rem; >> >> >> +bd->length+= rem; >> >> >> +cpu_physical_memory_write(bufptr, padd, rem); >> >> >> +} >> >> >> +} >> >> > >> >> > What if *size > 0 && *size < etsec->rx_padding? >> >> >> >> I don't think it's possible... >> > >> > Maybe throw in an assertion, then? >> > >> > I can see how it might not be possible if rx_padding is being used for padding a short frame, since MRBLR must be a multiple of 64, but what if it's 4 bytes for CRC? >> > >> >> Can you explain a possible error scenario? > > 126 byte packet, no fcb. rx_padding is 4 for CRC. Suppose MRBLR is 128. Wouldn't *size be 2 here? > Yes, at the end of the function, but then rx_padding is 2 as well. How is rx_padding 2? rx_init_frame() will set it to 4 (since the packet size is not less than 60). The only other place I see that modifies rx_padding is "etsec->rx_padding -= rem", but that doesn't happen because the "*size == etsec->rx_padding" check happens first. value of "to_write" will be 126: *size = etsec->rx_remaining_data + etsec->rx_padding; = 126 + 4; = 130; to_write = MIN(etsec->rx_fcb_size + *size - etsec->rx_padding, etsec->regs[MRBLR].value); = MIN(0 + 130 - 4, 128); = MIN(126, 128); = 126; So we write the packet in the first part of the BD, and there's 2 bytes left in the BD. *size -= to_write; = 4; bd->length = to_write; = 126; So *size == etsec->rx_padding (This is expected as the first write operation can only write data and no padding, I will comment this fact) rem = MIN(etsec->regs[MRBLR].value - bd->length, etsec->rx_padding); = MIN(128 - 126, 4); = MIN(2, 4); = 2; We write 2 bytes of padding. etsec->rx_padding -= rem; = 2; *size -= rem; = 2; bd->length+= rem; = 128; The BD is full, we will have to put the rest of padding in the next one. What rest of padding? I thought you said rx_padding was 2 somehow? If that were true, then it would be zero at the end. >> > Could you at least have a way to diagnose when the guest OS tries to >> > use some functionality that you don't support, rather than silently >> > doing the wrong thing? >> > >> >> This device is so complex, detecting unsupported features would take too >> much work. > > I was thinking along the lines of marking registers and bits within registers as supported (or which are properly no-ops in QEMU) -- and warning the first time you see a non-default-value write to an unsupported field or register. It could save a lot of debugging. > I think we'll spend more time implementing this than debugging. Another solution is to enable debug output and see which registers are read or write. I don't think it'd be that hard -- you already have an array of registers. The information could easily go in there, and be checked by common infrastructure. -Scott
[Qemu-devel] [PATCH v2 09/11] pseries: savevm support for PCI host bridge
From: David Gibson This adds the necessary support for saving the state of the PAPR virtual PCI host bridge (or host bridges). Signed-off-by: David Gibson Reviewed-by: Anthony Liguori --- hw/ppc/spapr_pci.c | 49 + include/hw/pci-host/spapr.h | 6 +++--- 2 files changed, 52 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index dd9e74e..560d375c 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -699,6 +699,54 @@ static Property spapr_phb_properties[] = { DEFINE_PROP_END_OF_LIST(), }; +static const VMStateDescription vmstate_spapr_pci_lsi = { +.name = "spapr_pci/lsi", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi), + +VMSTATE_END_OF_LIST() +}, +}; + +static const VMStateDescription vmstate_spapr_pci_msi = { +.name = "spapr_pci/lsi", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT32(config_addr, struct spapr_pci_msi), +VMSTATE_UINT32(irq, struct spapr_pci_msi), +VMSTATE_UINT32(nvec, struct spapr_pci_msi), + +VMSTATE_END_OF_LIST() +}, +}; + +static const VMStateDescription vmstate_spapr_pci = { +.name = "spapr_pci", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64_EQUAL(buid, sPAPRPHBState), +VMSTATE_UINT32_EQUAL(dma_liobn, sPAPRPHBState), +VMSTATE_UINT64_EQUAL(mem_win_addr, sPAPRPHBState), +VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState), +VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState), +VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState), +VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState), +VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0, + vmstate_spapr_pci_lsi, struct spapr_pci_lsi), +VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0, + vmstate_spapr_pci_msi, struct spapr_pci_msi), + +VMSTATE_END_OF_LIST() +}, +}; + static const char *spapr_phb_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) { @@ -717,6 +765,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data) sdc->init = spapr_phb_init; dc->props = spapr_phb_properties; dc->reset = spapr_phb_reset; +dc->vmsd = &vmstate_spapr_pci; } static const TypeInfo spapr_phb_info = { diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 1e23dbf..93f9511 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -52,14 +52,14 @@ typedef struct sPAPRPHBState { sPAPRTCETable *tcet; AddressSpace iommu_as; -struct { +struct spapr_pci_lsi { uint32_t irq; } lsi_table[PCI_NUM_PINS]; -struct { +struct spapr_pci_msi { uint32_t config_addr; uint32_t irq; -int nvec; +uint32_t nvec; } msi_table[SPAPR_MSIX_MAX_DEVS]; QLIST_ENTRY(sPAPRPHBState) list; -- 1.8.0
[Qemu-devel] [PATCH v2 04/11] pseries: savevm support for PAPR VIO logical tty
From: David Gibson This patch adds the necessary VMStateDescription information to support savevm/loadvm for the spapr_tty (PAPR logical serial) device. Signed-off-by: David Gibson Reviewed-by: Anthony Liguori --- hw/char/spapr_vty.c | 16 1 file changed, 16 insertions(+) diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c index 2993848..a799721 100644 --- a/hw/char/spapr_vty.c +++ b/hw/char/spapr_vty.c @@ -142,6 +142,21 @@ static Property spapr_vty_properties[] = { DEFINE_PROP_END_OF_LIST(), }; +static const VMStateDescription vmstate_spapr_vty = { +.name = "spapr_vty", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVTYDevice), + +VMSTATE_UINT32(in, VIOsPAPRVTYDevice), +VMSTATE_UINT32(out, VIOsPAPRVTYDevice), +VMSTATE_BUFFER(buf, VIOsPAPRVTYDevice), +VMSTATE_END_OF_LIST() +}, +}; + static void spapr_vty_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -152,6 +167,7 @@ static void spapr_vty_class_init(ObjectClass *klass, void *data) k->dt_type = "serial"; k->dt_compatible = "hvterm1"; dc->props = spapr_vty_properties; +dc->vmsd = &vmstate_spapr_vty; } static const TypeInfo spapr_vty_info = { -- 1.8.0
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote: > On 18 July 2013 20:39, Paul Moore wrote: > > On the plus side, I think libseccomp is very close to being pretty much > > feature complete (excluding new architectures that may pop up, at present > > we are only x86, x86_64, x32, and ARM) > > ...AArch64 ? :-) > Not yet, just 32-bit ARM EABI. If you've got a working system and are willing to so some hacking or run some tests we could work on it for a future libseccomp release. An emulated AArch64 VM would also work, but that route can be slow/annoying. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a "map" subcommand
On 07/03/2013 08:34 AM, Paolo Bonzini wrote: > This command dumps the metadata of an entire chain, in either tabular or JSON > format. > > Signed-off-by: Paolo Bonzini > --- > qemu-img-cmds.hx | 6 ++ > qemu-img.c | 186 > +++ > 2 files changed, 192 insertions(+) > > +case OFORMAT_JSON: > +printf("%s{ 'depth': %d, 'start': %lld, 'length': %lld, " > + "'zero': %s, 'data': %s", > + (e->start == 0 ? "[" : ",\n"), > + e->depth, (long long) e->start, (long long) e->length, > + (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false", > + (e->flags & BDRV_BLOCK_DATA) ? "true" : "false"); > +if (e->flags & BDRV_BLOCK_OFFSET_VALID) { > +printf(", 'offset': %lld", (long long) e->offset); > +} > + putchar('}'); Can we please get this format documented in qapi-schema.json, even if we aren't using qapi to generate it yet? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On 07/18/2013 01:13 PM, Ian Main wrote: > On Thu, Jul 18, 2013 at 12:56:51PM -0600, Eric Blake wrote: >> On 07/18/2013 12:47 PM, Ian Main wrote: >>> qcow2 supports backing files so it makes sense to default to qcow2 >>> for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing >>> drive and export it via nbd. Defaulting FULL and TOP to SYNC_MODE_NONE >>> breaks tests but that could be fixed if we wanted it. >>> >>> Signed-off-by: Ian Main >>> --- >>> blockdev.c | 5 - >>> qapi-schema.json | 1 + >>> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> Looks okay, but let's answer the meta-question first of whether we >> should just make 'format' mandatory and be done with it. >> >> Also, I've noticed you aren't cc'ing many people; that can slow down >> reviews. http://wiki.qemu.org/Contribute/SubmitAPatch gives hints on >> how to determine the right people to send your patches to, by >> deciphering MAINTAINERS and running ./scripts/getmaintainer.pl. > > Ah ok, I'll add them next rev. > > My take has been to just do a patch that implements the suggestion and > see what people think :). But this list is so high volume that the people that matter won't check your email unless they are cc'd :) If you want opinions fast, it pays to follow the directions. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH V4 1/4] Implement sync modes for drive-backup.
On 07/18/2013 01:06 PM, Ian Main wrote: > On Thu, Jul 18, 2013 at 11:19:43AM -0600, Eric Blake wrote: >> On 07/17/2013 02:04 PM, Ian Main wrote: >>> This patch adds sync-modes to the drive-backup interface and >>> implements the FULL, NONE and TOP modes of synchronization. >>> >>> @@ -1807,6 +1807,10 @@ >>> # @format: #optional the format of the new destination, default is to >>> # probe if @mode is 'existing', else the format of the source >>> # >>> +# @sync: what parts of the disk image should be copied to the destination >>> +#(all the disk, only the sectors allocated in the topmost image, or >>> +#only new I/O). >>> +# >>> # @mode: #optional whether and how QEMU should create a new image, default >>> is >>> #'absolute-paths'. >> >> This hunk looks bogus; the 'DriveBackup' type already documents @sync as >> of commit b53169e, which makes me suspect this hunk got misapplied >> during rebasing. > > Did you check that? I know there was one bit of documentation missing > that I fixed here. I also just did a clean rebase (git am) to > kevin/block and it all applied fine. Yes, it _applied_ fine, because of git's insistence on applying hunks regardless of line fuzzing. It probably applied to 'drive-mirror', since I see this context in the section for 'drive-mirror' when reading the unpatched file at current qemu.git head: > # @format: #optional the format of the new destination, default is to > # probe if @mode is 'existing', else the format of the source > # > # @mode: #optional whether and how QEMU should create a new image, default is > #'absolute-paths'. > # Hence, my argument that you DON'T want this hunk. >>> Example: >>> -> { "execute": "drive-backup", "arguments": { "device": "drive0", >>> + "sync": "full", >>> "target": "backup.img" } } >> >> Ouch - commit b53169e made 'sync' mandatory, but forgot to update the >> example to match; perhaps this hunk ought to be applied independently? > > That's up to you guys, I can split it out if needed. I don't care either way as long as it gets fixed before 1.6; it all depends on how long the review of the rest of your series takes. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATH 0/4] po/Makefile: Fix regression and some minor issues
Am 16.07.2013 07:16, schrieb Stefan Weil: > Am 05.07.2013 22:55, schrieb Stefan Weil: >> These patches are included: >> >> [PATCH 1/4] po/Makefile: Fix and improve help message >> [PATCH 2/4] po/Makefile: Fix *.mo generation for out-of-tree builds >> [PATCH 3/4] po/Makefile: Fix generation of messages.po >> [PATCH 4/4] po/Makefile: Use macro quiet-command for nice looking >> >> Regards >> Stefan W. > Ping? Ping^2
Re: [Qemu-devel] [PATCH] PPC: dbdma: macio: Fix format specifiers (build regression)
Am 16.07.2013 07:54, schrieb Stefan Weil: > Am 12.07.2013 18:48, schrieb Stefan Weil: >> Fix a number of warnings for 32 bit builds (tested on MingW and Linux): >> >> CChw/ide/macio.o >> qemu/hw/ide/macio.c: In function 'pmac_ide_atapi_transfer_cb': >> qemu/hw/ide/macio.c:134:9: error: format '%lx' expects argument of type >> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format] >> qemu/hw/ide/macio.c: In function 'pmac_ide_transfer_cb': >> qemu/hw/ide/macio.c:215:5: error: format '%ld' expects argument of type >> 'long int', but argument 5 has type 'int64_t' [-Werror=format] >> qemu/hw/ide/macio.c:222:9: error: format '%lx' expects argument of type >> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format] >> qemu/hw/ide/macio.c:264:9: error: format '%lx' expects argument of type >> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format] >> cc1: all warnings being treated as errors >> make: *** [hw/ide/macio.o] Error 1 >> >> Signed-off-by: Stefan Weil >> --- >> >> >> >> Hi Anthony, >> >> the patch fixes a build regression which was introduced today. >> Could you please apply it without waiting for the next pull requests? >> >> Thanks, >> Stefan >> >> >> >> hw/ide/macio.c |9 + >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/hw/ide/macio.c b/hw/ide/macio.c >> index 38ad924..ef4ba2b 100644 >> --- a/hw/ide/macio.c >> +++ b/hw/ide/macio.c >> @@ -131,7 +131,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int >> ret) >> int sector_num = (s->lba << 2) + (s->io_buffer_index >> 9); >> int nsector = io->len >> 9; >> >> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n", >> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx >> "\n", >>unaligned, io->addr + io->len - unaligned); >> >> bdrv_read(s->bs, sector_num + nsector, io->remainder, 1); >> @@ -212,14 +212,15 @@ static void pmac_ide_transfer_cb(void *opaque, int ret) >> s->nsector -= n; >> } >> >> -MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d sector_num: %ld\n", >> +MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d " >> + "sector_num: %" PRId64 "\n", >>io->remainder_len, io->len, s->nsector, sector_num); >> if (io->remainder_len && io->len) { >> /* guest wants the rest of its previous transfer */ >> int remainder_len = MIN(io->remainder_len, io->len); >> uint8_t *p = &io->remainder[0x200 - remainder_len]; >> >> -MACIO_DPRINTF("copying remainder %d bytes at %#lx\n", >> +MACIO_DPRINTF("copying remainder %d bytes at %#" HWADDR_PRIx "\n", >>remainder_len, io->addr); >> >> switch (s->dma_cmd) { >> @@ -261,7 +262,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret) >> if (unaligned) { >> int nsector = io->len >> 9; >> >> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n", >> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx >> "\n", >>unaligned, io->addr + io->len - unaligned); >> >> switch (s->dma_cmd) { > > Ping? Ping²?
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On 18 July 2013 20:39, Paul Moore wrote: > On the plus side, I think libseccomp is very close to being pretty much > feature complete (excluding new architectures that may pop up, at present we > are only x86, x86_64, x32, and ARM) ...AArch64 ? :-) -- PMM
[Qemu-devel] [PATCH v2 11/11] xics: rename types to be sane and follow coding style
>From Ben: Basically, in HW the layout of the interrupt network is: - One ICP per processor thread (the "presenter"). This contains the registers to fetch a pending interrupt (ack), EOI, and control the processor priority. - One ICS per logical source of interrupts (ie, one per PCI host bridge, and a few others here or there). This contains the per-interrupt source configuration (target processor(s), priority, mask) and the per-interrupt internal state. Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit oddball what pHyp does here, arguably there are two but we can ignore that distinction). There is no register level access. A pair of firmware (RTAS) calls is used to configure each virtual interrupt. So our model here is somewhat the same. We have one ICS in the emulated XICS which arguably *is* the emulated XICS, there's no point making it a separate "device", that would just be gross, and each VCPU has an associated ICP. Yet we call the "XICS" struct icp_state and then the ICPs 'struct icp_server_state'. It's particularly confusing when all of the functions have xics_prefixes yet take *icp arguments. Rename: struct icp_state -> XICSState struct icp_server_state -> ICPState struct ics_state -> ICSState struct ics_irq_state -> ICSIRQState Signed-off-by: David Gibson [aik: added ics_resend() on post_load] Signed-off-by: Alexey Kardashevskiy Signed-off-by: Anthony Liguori --- hw/intc/xics.c | 345 + hw/ppc/spapr.c | 28 include/hw/ppc/spapr.h | 3 +- include/hw/ppc/xics.h | 74 ++- 4 files changed, 330 insertions(+), 120 deletions(-) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index 091912e..6b3c071 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -34,34 +34,19 @@ * ICP: Presentation layer */ -struct icp_server_state { -uint32_t xirr; -uint8_t pending_priority; -uint8_t mfrr; -qemu_irq output; -}; - #define XISR_MASK 0x00ff #define CPPR_MASK 0xff00 #define XISR(ss) (((ss)->xirr) & XISR_MASK) #define CPPR(ss) (((ss)->xirr) >> 24) -struct ics_state; - -struct icp_state { -long nr_servers; -struct icp_server_state *ss; -struct ics_state *ics; -}; - -static void ics_reject(struct ics_state *ics, int nr); -static void ics_resend(struct ics_state *ics); -static void ics_eoi(struct ics_state *ics, int nr); +static void ics_reject(ICSState *ics, int nr); +static void ics_resend(ICSState *ics); +static void ics_eoi(ICSState *ics, int nr); -static void icp_check_ipi(struct icp_state *icp, int server) +static void icp_check_ipi(XICSState *icp, int server) { -struct icp_server_state *ss = icp->ss + server; +ICPState *ss = icp->ss + server; if (XISR(ss) && (ss->pending_priority <= ss->mfrr)) { return; @@ -78,9 +63,9 @@ static void icp_check_ipi(struct icp_state *icp, int server) qemu_irq_raise(ss->output); } -static void icp_resend(struct icp_state *icp, int server) +static void icp_resend(XICSState *icp, int server) { -struct icp_server_state *ss = icp->ss + server; +ICPState *ss = icp->ss + server; if (ss->mfrr < CPPR(ss)) { icp_check_ipi(icp, server); @@ -88,9 +73,9 @@ static void icp_resend(struct icp_state *icp, int server) ics_resend(icp->ics); } -static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr) +static void icp_set_cppr(XICSState *icp, int server, uint8_t cppr) { -struct icp_server_state *ss = icp->ss + server; +ICPState *ss = icp->ss + server; uint8_t old_cppr; uint32_t old_xisr; @@ -112,9 +97,9 @@ static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr) } } -static void icp_set_mfrr(struct icp_state *icp, int server, uint8_t mfrr) +static void icp_set_mfrr(XICSState *icp, int server, uint8_t mfrr) { -struct icp_server_state *ss = icp->ss + server; +ICPState *ss = icp->ss + server; ss->mfrr = mfrr; if (mfrr < CPPR(ss)) { @@ -122,7 +107,7 @@ static void icp_set_mfrr(struct icp_state *icp, int server, uint8_t mfrr) } } -static uint32_t icp_accept(struct icp_server_state *ss) +static uint32_t icp_accept(ICPState *ss) { uint32_t xirr = ss->xirr; @@ -135,9 +120,9 @@ static uint32_t icp_accept(struct icp_server_state *ss) return xirr; } -static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr) +static void icp_eoi(XICSState *icp, int server, uint32_t xirr) { -struct icp_server_state *ss = icp->ss + server; +ICPState *ss = icp->ss + server; /* Send EOI -> ICS */ ss->xirr = (ss->xirr & ~CPPR_MASK) | (xirr & CPPR_MASK); @@ -148,9 +133,9 @@ static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr) } } -static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority) +static void icp_irq(XICSState *icp, int server, int nr, uin
[Qemu-devel] [PATCH v2 10/11] pseries: savevm support with KVM
From: Alexey Kardashevskiy At present, the savevm / migration support for the pseries machine will not work when KVM is enabled. That's because KVM manages the guest's hash page table in the host kernel, so qemu has no visibility of it. This patch fixes this by using new kernel interfaces to extract and reinsert the guest's hash table during the migration process. Signed-off-by: David Gibson --- hw/ppc/spapr.c | 106 +++-- include/hw/ppc/spapr.h | 1 + target-ppc/kvm.c | 69 target-ppc/kvm_ppc.h | 22 ++ 4 files changed, 176 insertions(+), 22 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 734a782..7144829 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -735,17 +735,27 @@ static int htab_save_setup(QEMUFile *f, void *opaque) { sPAPREnvironment *spapr = opaque; -spapr->htab_save_index = 0; -spapr->htab_first_pass = true; - /* "Iteration" header */ qemu_put_be32(f, spapr->htab_shift); +if (spapr->htab) { +spapr->htab_save_index = 0; +spapr->htab_first_pass = true; +} else { +assert(kvm_enabled()); + +spapr->htab_fd = kvmppc_get_htab_fd(false); +if (spapr->htab_fd < 0) { +fprintf(stderr, "Unable to open fd for reading hash table from KVM: %s\n", +strerror(errno)); +return -1; +} +} + + return 0; } -#define MAX_ITERATION_NS500 /* 5 ms */ - static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr, int64_t max_ns) { @@ -796,8 +806,8 @@ static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr, spapr->htab_save_index = index; } -static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr, - int64_t max_ns) +static int htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr, +int64_t max_ns) { bool final = max_ns < 0; int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64; @@ -870,21 +880,32 @@ static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr, spapr->htab_save_index = index; -return (examined >= htabslots) && (sent == 0); +return (examined >= htabslots) && (sent == 0) ? 1 : 0; } +#define MAX_ITERATION_NS500 /* 5 ms */ +#define MAX_KVM_BUF_SIZE2048 + static int htab_save_iterate(QEMUFile *f, void *opaque) { sPAPREnvironment *spapr = opaque; -bool nothingleft = false;; +int rc = 0; /* Iteration header */ qemu_put_be32(f, 0); -if (spapr->htab_first_pass) { +if (!spapr->htab) { +assert(kvm_enabled()); + +rc = kvmppc_save_htab(f, spapr->htab_fd, + MAX_KVM_BUF_SIZE, MAX_ITERATION_NS); +if (rc < 0) { +return rc; +} +} else if (spapr->htab_first_pass) { htab_save_first_pass(f, spapr, MAX_ITERATION_NS); } else { -nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS); +rc = htab_save_later_pass(f, spapr, MAX_ITERATION_NS); } /* End marker */ @@ -892,7 +913,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque) qemu_put_be16(f, 0); qemu_put_be16(f, 0); -return nothingleft ? 1 : 0; +return rc; } static int htab_save_complete(QEMUFile *f, void *opaque) @@ -902,7 +923,20 @@ static int htab_save_complete(QEMUFile *f, void *opaque) /* Iteration header */ qemu_put_be32(f, 0); -htab_save_later_pass(f, spapr, -1); +if (!spapr->htab) { +int rc; + +assert(kvm_enabled()); + +rc = kvmppc_save_htab(f, spapr->htab_fd, MAX_KVM_BUF_SIZE, -1); +if (rc < 0) { +return rc; +} +close(spapr->htab_fd); +spapr->htab_fd = -1; +} else { +htab_save_later_pass(f, spapr, -1); +} /* End marker */ qemu_put_be32(f, 0); @@ -916,6 +950,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) { sPAPREnvironment *spapr = opaque; uint32_t section_hdr; +int fd = -1; if (version_id < 1 || version_id > 1) { fprintf(stderr, "htab_load() bad version\n"); @@ -932,6 +967,16 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) return 0; } +if (!spapr->htab) { +assert(kvm_enabled()); + +fd = kvmppc_get_htab_fd(true); +if (fd < 0) { +fprintf(stderr, "Unable to open fd to restore KVM hash table: %s\n", +strerror(errno)); +} +} + while (true) { uint32_t index; uint16_t n_valid, n_invalid; @@ -945,24 +990,41 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id) break; } -if ((index + n_valid + n_invalid) >= +if ((index + n_valid + n_invalid) > (HTAB_SIZE
[Qemu-devel] [PATCH v2 01/11] target-ppc: Convert ppc cpu savevm to VMStateDescription
From: Alexey Kardashevskiy The savevm code for the powerpc cpu emulation is currently based around the old register_savevm() rather than register_vmstate() method. It's also rather broken, missing some important state on some CPU models. This patch completely rewrites the savevm for target-ppc, using the new VMStateDescription approach. Exactly what needs to be saved in what configurations has been more carefully examined, too. This introduces a new version (5) of the cpu save format. The old load function is retained to support version 4 images. Signed-off-by: David Gibson [aik: ppc cpu savevm convertion fixed to use PowerPCCPU instead of CPUPPCState] Signed-off-by: Alexey Kardashevskiy --- target-ppc/cpu-qom.h| 4 + target-ppc/cpu.h| 8 +- target-ppc/machine.c| 531 target-ppc/translate_init.c | 2 + 4 files changed, 452 insertions(+), 93 deletions(-) diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h index 7132599..c660e3c 100644 --- a/target-ppc/cpu-qom.h +++ b/target-ppc/cpu-qom.h @@ -106,4 +106,8 @@ void ppc_cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf, void ppc_cpu_dump_statistics(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf, int flags); +#ifndef CONFIG_USER_ONLY +extern const struct VMStateDescription vmstate_ppc_cpu; +#endif + #endif diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index 7a7b1bf..454ea13 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -948,7 +948,7 @@ struct CPUPPCState { #if defined(TARGET_PPC64) /* PowerPC 64 SLB area */ ppc_slb_t slb[64]; -int slb_nr; +int32_t slb_nr; #endif /* segment registers */ hwaddr htab_base; @@ -957,11 +957,11 @@ struct CPUPPCState { /* externally stored hash table */ uint8_t *external_htab; /* BATs */ -int nb_BATs; +uint32_t nb_BATs; target_ulong DBAT[2][8]; target_ulong IBAT[2][8]; /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */ -int nb_tlb; /* Total number of TLB */ +int32_t nb_tlb; /* Total number of TLB */ int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */ int nb_ways; /* Number of ways in the TLB set*/ int last_way;/* Last used way used to allocate TLB in a LRU way */ @@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model) #define cpu_signal_handler cpu_ppc_signal_handler #define cpu_list ppc_cpu_list -#define CPU_SAVE_VERSION 4 - /* MMU modes definitions */ #define MMU_MODE0_SUFFIX _user #define MMU_MODE1_SUFFIX _kernel diff --git a/target-ppc/machine.c b/target-ppc/machine.c index 2d10adb..12e1512 100644 --- a/target-ppc/machine.c +++ b/target-ppc/machine.c @@ -1,96 +1,12 @@ #include "hw/hw.h" #include "hw/boards.h" #include "sysemu/kvm.h" +#include "helper_regs.h" -void cpu_save(QEMUFile *f, void *opaque) +static int cpu_load_old(QEMUFile *f, void *opaque, int version_id) { -CPUPPCState *env = (CPUPPCState *)opaque; -unsigned int i, j; -uint32_t fpscr; -target_ulong xer; - -for (i = 0; i < 32; i++) -qemu_put_betls(f, &env->gpr[i]); -#if !defined(TARGET_PPC64) -for (i = 0; i < 32; i++) -qemu_put_betls(f, &env->gprh[i]); -#endif -qemu_put_betls(f, &env->lr); -qemu_put_betls(f, &env->ctr); -for (i = 0; i < 8; i++) -qemu_put_be32s(f, &env->crf[i]); -xer = cpu_read_xer(env); -qemu_put_betls(f, &xer); -qemu_put_betls(f, &env->reserve_addr); -qemu_put_betls(f, &env->msr); -for (i = 0; i < 4; i++) -qemu_put_betls(f, &env->tgpr[i]); -for (i = 0; i < 32; i++) { -union { -float64 d; -uint64_t l; -} u; -u.d = env->fpr[i]; -qemu_put_be64(f, u.l); -} -fpscr = env->fpscr; -qemu_put_be32s(f, &fpscr); -qemu_put_sbe32s(f, &env->access_type); -#if defined(TARGET_PPC64) -qemu_put_betls(f, &env->spr[SPR_ASR]); -qemu_put_sbe32s(f, &env->slb_nr); -#endif -qemu_put_betls(f, &env->spr[SPR_SDR1]); -for (i = 0; i < 32; i++) -qemu_put_betls(f, &env->sr[i]); -for (i = 0; i < 2; i++) -for (j = 0; j < 8; j++) -qemu_put_betls(f, &env->DBAT[i][j]); -for (i = 0; i < 2; i++) -for (j = 0; j < 8; j++) -qemu_put_betls(f, &env->IBAT[i][j]); -qemu_put_sbe32s(f, &env->nb_tlb); -qemu_put_sbe32s(f, &env->tlb_per_way); -qemu_put_sbe32s(f, &env->nb_ways); -qemu_put_sbe32s(f, &env->last_way); -qemu_put_sbe32s(f, &env->id_tlbs); -qemu_put_sbe32s(f, &env->nb_pids); -if (env->tlb.tlb6) { -// XXX assumes 6xx -for (i = 0; i < env->nb_tlb; i++) { -qemu_put_betls(f, &env->tlb.tlb6[i].pte0); -qemu_put_betls(f, &env->tlb.tlb6[i].pte1); -q
[Qemu-devel] [PATCH v2 07/11] pseries: savevm support for PAPR virtual SCSI
From: David Gibson This patch adds the necessary support for saving the state of the PAPR VIO virtual SCSI device. This also saves and restores active SCSI requests. [aik: implemented vscsi_req save/restore] Signed-off-by: Alexey Kardashevskiy Cc: David Gibson --- hw/scsi/spapr_vscsi.c | 82 ++- 1 file changed, 81 insertions(+), 1 deletion(-) diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c index e2740fb..a86199b 100644 --- a/hw/scsi/spapr_vscsi.c +++ b/hw/scsi/spapr_vscsi.c @@ -578,6 +578,69 @@ static void vscsi_request_cancelled(SCSIRequest *sreq) vscsi_put_req(req); } +static const VMStateDescription vmstate_spapr_vscsi_req = { +.name = "spapr_vscsi_req", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_BUFFER(crq.raw, vscsi_req), +VMSTATE_BUFFER(iu.srp.reserved, vscsi_req), +VMSTATE_UINT32(qtag, vscsi_req), +VMSTATE_BOOL(active, vscsi_req), +VMSTATE_UINT32(data_len, vscsi_req), +VMSTATE_BOOL(writing, vscsi_req), +VMSTATE_UINT32(senselen, vscsi_req), +VMSTATE_BUFFER(sense, vscsi_req), +VMSTATE_UINT8(dma_fmt, vscsi_req), +VMSTATE_UINT16(local_desc, vscsi_req), +VMSTATE_UINT16(total_desc, vscsi_req), +VMSTATE_UINT16(cdb_offset, vscsi_req), + /*Restart SCSI request from the beginning for now */ + /*VMSTATE_UINT16(cur_desc_num, vscsi_req), +VMSTATE_UINT16(cur_desc_offset, vscsi_req),*/ +VMSTATE_END_OF_LIST() +}, +}; + +static void vscsi_save_request(QEMUFile *f, SCSIRequest *sreq) +{ +vscsi_req *req = sreq->hba_private; +assert(req->active); + +vmstate_save_state(f, &vmstate_spapr_vscsi_req, req); + +dprintf("VSCSI: saving tag=%u, current desc#%d, offset=%x\n", +req->qtag, req->cur_desc_num, req->cur_desc_offset); +} + +static void *vscsi_load_request(QEMUFile *f, SCSIRequest *sreq) +{ +SCSIBus *bus = sreq->bus; +VSCSIState *s = VIO_SPAPR_VSCSI_DEVICE(bus->qbus.parent); +vscsi_req *req; +int rc; + +assert(sreq->tag < VSCSI_REQ_LIMIT); +req = &s->reqs[sreq->tag]; +assert(!req->active); + +memset(req, 0, sizeof(*req)); +rc = vmstate_load_state(f, &vmstate_spapr_vscsi_req, req, 1); +if (rc) { +fprintf(stderr, "VSCSI: failed loading request tag#%u\n", sreq->tag); +return NULL; +} +assert(req->active); + +req->sreq = scsi_req_ref(sreq); + +dprintf("VSCSI: restoring tag=%u, current desc#%d, offset=%x\n", +req->qtag, req->cur_desc_num, req->cur_desc_offset); + +return req; +} + static void vscsi_process_login(VSCSIState *s, vscsi_req *req) { union viosrp_iu *iu = &req->iu; @@ -932,7 +995,9 @@ static const struct SCSIBusInfo vscsi_scsi_info = { .transfer_data = vscsi_transfer_data, .complete = vscsi_command_complete, -.cancel = vscsi_request_cancelled +.cancel = vscsi_request_cancelled, +.save_request = vscsi_save_request, +.load_request = vscsi_load_request, }; static void spapr_vscsi_reset(VIOsPAPRDevice *dev) @@ -991,6 +1056,20 @@ static Property spapr_vscsi_properties[] = { DEFINE_PROP_END_OF_LIST(), }; +static const VMStateDescription vmstate_spapr_vscsi = { +.name = "spapr_vscsi", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_SPAPR_VIO(vdev, VSCSIState), +/* VSCSI state */ +/* */ + +VMSTATE_END_OF_LIST() +}, +}; + static void spapr_vscsi_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -1005,6 +1084,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data) k->signal_mask = 0x0001; dc->props = spapr_vscsi_properties; k->rtce_window_size = 0x1000; +dc->vmsd = &vmstate_spapr_vscsi; } static const TypeInfo spapr_vscsi_info = { -- 1.8.0
Re: [Qemu-devel] [PATCH v2 0/2] libqtest leak fix & cleanup
Applied. Thanks. Regards, Anthony Liguori
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On Thursday, July 18, 2013 06:37:15 PM Paolo Bonzini wrote: > Il 18/07/2013 18:35, Eduardo Otubo ha scritto: > > On 07/18/2013 01:28 PM, Anthony Liguori wrote: > >> Eduardo Otubo writes: > >>> Hello all, > >> > >>> In this small patch series I basically: > >> Cover letter should be marked [PATCH 0/2]. Otherwise it defeats > >> filtering. > >> > >> Would like to see a Reviewed-by from someone before applying this. > > > > I'm running some tests with qemu && xen, I'll post a v3 by the end of > > the day. I'll format the cover letter in the correct way next time. > > I feel that, at some point, grep and code review must trump experiments... > > Paul, how did you guys handle this in other projects? To the best of my knowledge QEMU currently stands alone with its complexity and use of seccomp filtering. There are other applications, but they are either of the syscall sandboxing type where the users define the filters, or the rigid, smaller, well defined filter type. QEMU is both large and has a huge number of options which affect the syscalls used. At some point it would be nice to develop a mechanism to do some static analysis on a binary and its associated libraries to come up with a worst case filter (worst case because you might not want all the syscalls that a library uses, e.g. glibc). Unfortunately, we don't have such a tool the moment - it's hard enough generating correct filters with a nice architecture agnostic manner :) On the plus side, I think libseccomp is very close to being pretty much feature complete (excluding new architectures that may pop up, at present we are only x86, x86_64, x32, and ARM) so I'll be able to start turning some effort towards better tools and patches for existing applications. -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH v2 05/11] spapr-tce: make sPAPRTCETable a proper device
Model TCE tables as a device that's hooked up as a child object to the owner. Besides the code cleanup, we get a few nice benefits: 1) free actually works now (it was dead code before) 2) the TCE information is visible in the device tree 3) we can expose table information as properties such that if we change the window_size, we can use globals to keep migration working. Signed-off-by: David Gibson [dwg: pseries: savevm support for PAPR TCE tables] Signed-off-by: Alexey Kardashevskiy [alexey: ppc kvm: fix to compile] Signed-off-by: Anthony Liguori --- hw/ppc/spapr.c | 3 - hw/ppc/spapr_iommu.c | 146 - hw/ppc/spapr_pci.c | 2 +- hw/ppc/spapr_vio.c | 2 +- include/hw/ppc/spapr.h | 23 +--- target-ppc/kvm.c | 4 +- 6 files changed, 117 insertions(+), 63 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 48ae092..e340708 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -848,9 +848,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) /* Set up EPOW events infrastructure */ spapr_events_init(spapr); -/* Set up IOMMU */ -spapr_iommu_init(); - /* Set up VIO bus */ spapr->vio_bus = spapr_vio_bus_init(); diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c index 89b33a5..3d4a1fc 100644 --- a/hw/ppc/spapr_iommu.c +++ b/hw/ppc/spapr_iommu.c @@ -36,17 +36,6 @@ enum sPAPRTCEAccess { SPAPR_TCE_RW = 3, }; -struct sPAPRTCETable { -uint32_t liobn; -uint32_t window_size; -sPAPRTCE *table; -bool bypass; -int fd; -MemoryRegion iommu; -QLIST_ENTRY(sPAPRTCETable) list; -}; - - QLIST_HEAD(spapr_tce_tables, sPAPRTCETable) spapr_tce_tables; static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn) @@ -96,7 +85,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr) return (IOMMUTLBEntry) { .perm = IOMMU_NONE }; } -tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT].tce; +tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT]; #ifdef DEBUG_TCE fprintf(stderr, " -> *paddr=0x%llx, *len=0x%llx\n", @@ -112,55 +101,97 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr) }; } +static int spapr_tce_table_pre_load(void *opaque) +{ +sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque); + +tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT; + +return 0; +} + +static const VMStateDescription vmstate_spapr_tce_table = { +.name = "spapr_iommu", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.pre_load = spapr_tce_table_pre_load, +.fields = (VMStateField []) { +/* Sanity check */ +VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable), +VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable), + +/* IOMMU state */ +VMSTATE_BOOL(bypass, sPAPRTCETable), +VMSTATE_VARRAY_UINT32(table, sPAPRTCETable, nb_table, 0, vmstate_info_uint64, uint64_t), + +VMSTATE_END_OF_LIST() +}, +}; + static MemoryRegionIOMMUOps spapr_iommu_ops = { .translate = spapr_tce_translate_iommu, }; -sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t window_size) +static int spapr_tce_table_realize(DeviceState *dev) { -sPAPRTCETable *tcet; - -if (spapr_tce_find_by_liobn(liobn)) { -fprintf(stderr, "Attempted to create TCE table with duplicate" -" LIOBN 0x%x\n", liobn); -return NULL; -} - -if (!window_size) { -return NULL; -} - -tcet = g_malloc0(sizeof(*tcet)); -tcet->liobn = liobn; -tcet->window_size = window_size; +sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev); if (kvm_enabled()) { -tcet->table = kvmppc_create_spapr_tce(liobn, - window_size, +tcet->table = kvmppc_create_spapr_tce(tcet->liobn, + tcet->window_size, &tcet->fd); } if (!tcet->table) { -size_t table_size = (window_size >> SPAPR_TCE_PAGE_SHIFT) -* sizeof(sPAPRTCE); +size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT) +* sizeof(uint64_t); tcet->table = g_malloc0(table_size); } +tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT; #ifdef DEBUG_TCE fprintf(stderr, "spapr_iommu: New TCE table @ %p, liobn=0x%x, " "table @ %p, fd=%d\n", tcet, liobn, tcet->table, tcet->fd); #endif -memory_region_init_iommu(&tcet->iommu, OBJECT(owner), &spapr_iommu_ops, +memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops, "iommu-spapr", UINT64_MAX); QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list); +return 0; +} + +sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t window_size)
[Qemu-devel] [PATCH v2 00/11] pseries: migration and QOM support
This series is based on Alexey's series: spapr: migration, pci, msi, power8 Which in turn was based on work by David Gibson. I've removed the bits not related to migration and made the following changes: 1) QOMify TCE tables and XICS 2) Do everything in terms of VMStateDescriptions 3) Fix endianness problem with TCE table translation a) Drop the VMSTATE_DIVIDE thing in the process I've tested this with a TCG pseries guest on an x86_64 host. Since v1, I've incorporated some fixes that Alexey posted upon testing with KVM. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] ioport: remove LITTLE_ENDIAN mark for portio
Applied. Thanks. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 0/2] changes related to monitor flow control
Applied. Thanks. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] configure: Provide more helpful message if libvte not present
Applied. Thanks. Regards, Anthony Liguori
Re: [Qemu-devel] [PULL] virtio-ccw: dataplane enablement
Pulled. Thanks. Regards, Anthony Liguori
[Qemu-devel] [PATCH v2 03/11] pseries: savevm support for PAPR VIO logical lan
From: David Gibson This patch adds the necessary VMStateDescription information to support savevm/loadvm for the spapr_llan (PAPR logical lan) device. Signed-off-by: David Gibson Reviewed-by: Anthony Liguori --- hw/net/spapr_llan.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c index 03a09f2..46f7d5f 100644 --- a/hw/net/spapr_llan.c +++ b/hw/net/spapr_llan.c @@ -81,9 +81,9 @@ typedef struct VIOsPAPRVLANDevice { VIOsPAPRDevice sdev; NICConf nicconf; NICState *nic; -int isopen; +bool isopen; target_ulong buf_list; -int add_buf_ptr, use_buf_ptr, rx_bufs; +uint32_t add_buf_ptr, use_buf_ptr, rx_bufs; target_ulong rxq_ptr; } VIOsPAPRVLANDevice; @@ -500,6 +500,25 @@ static Property spapr_vlan_properties[] = { DEFINE_PROP_END_OF_LIST(), }; +static const VMStateDescription vmstate_spapr_llan = { +.name = "spapr_llan", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVLANDevice), +/* LLAN state */ +VMSTATE_BOOL(isopen, VIOsPAPRVLANDevice), +VMSTATE_UINTTL(buf_list, VIOsPAPRVLANDevice), +VMSTATE_UINT32(add_buf_ptr, VIOsPAPRVLANDevice), +VMSTATE_UINT32(use_buf_ptr, VIOsPAPRVLANDevice), +VMSTATE_UINT32(rx_bufs, VIOsPAPRVLANDevice), +VMSTATE_UINTTL(rxq_ptr, VIOsPAPRVLANDevice), + +VMSTATE_END_OF_LIST() +}, +}; + static void spapr_vlan_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -514,6 +533,7 @@ static void spapr_vlan_class_init(ObjectClass *klass, void *data) k->signal_mask = 0x1; dc->props = spapr_vlan_properties; k->rtce_window_size = 0x1000; +dc->vmsd = &vmstate_spapr_llan; } static const TypeInfo spapr_vlan_info = { -- 1.8.0
[Qemu-devel] [PATCH v2 06/11] pseries: rework PAPR virtual SCSI
From: Alexey Kardashevskiy The patch reimplements handling of indirect requests in order to simplify upcoming live migration support. - all pointers (except SCSIRequest*) were replaces with integer indexes and offsets; - DMA'ed srp_direct_buf kept untouched (ie. BE format); - vscsi_fetch_desc() is added, now it is the only place where descriptors are fetched and byteswapped; - vscsi_req struct fields converted to migration-friendly types; - many dprintf()'s fixed. This also removed an unused field 'lun' from the spapr_vscsi device which is assigned, but never used. So, remove it. [David Gibson: removed unused 'lun'] Signed-off-by: Alexey Kardashevskiy --- Changes: 2013/08/07: * fixed handling of indirect requests with an additional table descriptor --- hw/scsi/spapr_vscsi.c | 223 +- 1 file changed, 130 insertions(+), 93 deletions(-) diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c index e8978bf..e2740fb 100644 --- a/hw/scsi/spapr_vscsi.c +++ b/hw/scsi/spapr_vscsi.c @@ -75,20 +75,19 @@ typedef struct vscsi_req { /* SCSI request tracking */ SCSIRequest *sreq; uint32_tqtag; /* qemu tag != srp tag */ -int lun; -int active; -longdata_len; -int writing; -int senselen; +boolactive; +uint32_tdata_len; +boolwriting; +uint32_tsenselen; uint8_t sense[SCSI_SENSE_BUF_SIZE]; /* RDMA related bits */ uint8_t dma_fmt; -struct srp_direct_buf ext_desc; -struct srp_direct_buf *cur_desc; -struct srp_indirect_buf *ind_desc; -int local_desc; -int total_desc; +uint16_tlocal_desc; +uint16_ttotal_desc; +uint16_tcdb_offset; +uint16_tcur_desc_num; +uint16_tcur_desc_offset; } vscsi_req; #define TYPE_VIO_SPAPR_VSCSI_DEVICE "spapr-vscsi" @@ -264,93 +263,138 @@ static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req, return 0; } -static inline void vscsi_swap_desc(struct srp_direct_buf *desc) +static inline struct srp_direct_buf vscsi_swap_desc(struct srp_direct_buf desc) { -desc->va = be64_to_cpu(desc->va); -desc->len = be32_to_cpu(desc->len); +desc.va = be64_to_cpu(desc.va); +desc.len = be32_to_cpu(desc.len); +return desc; +} + +static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req, +unsigned n, unsigned buf_offset, +struct srp_direct_buf *ret) +{ +struct srp_cmd *cmd = &req->iu.srp.cmd; + +switch (req->dma_fmt) { +case SRP_NO_DATA_DESC: { +dprintf("VSCSI: no data descriptor\n"); +return 0; +} +case SRP_DATA_DESC_DIRECT: { +memcpy(ret, cmd->add_data + req->cdb_offset, sizeof(*ret)); +assert(req->cur_desc_num == 0); +dprintf("VSCSI: direct segment\n"); +break; +} +case SRP_DATA_DESC_INDIRECT: { +struct srp_indirect_buf *tmp = (struct srp_indirect_buf *) + (cmd->add_data + req->cdb_offset); +if (n < req->local_desc) { +*ret = tmp->desc_list[n]; +dprintf("VSCSI: indirect segment local tag=0x%x desc#%d/%d\n", +req->qtag, n, req->local_desc); + +} else if (n < req->total_desc) { +int rc; +struct srp_direct_buf tbl_desc = vscsi_swap_desc(tmp->table_desc); +unsigned desc_offset = n * sizeof(struct srp_direct_buf); + +if (desc_offset >= tbl_desc.len) { +dprintf("VSCSI: #%d is ouf of range (%d bytes)\n", +n, desc_offset); +return -1; +} +rc = spapr_vio_dma_read(&s->vdev, tbl_desc.va + desc_offset, +ret, sizeof(struct srp_direct_buf)); +if (rc) { +dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n", +rc); +return -1; +} +dprintf("VSCSI: indirect segment ext. tag=0x%x desc#%d/%d { va=%"PRIx64" len=%x }\n", +req->qtag, n, req->total_desc, tbl_desc.va, tbl_desc.len); +} else { +dprintf("VSCSI: Out of descriptors !\n"); +return 0; +} +break; +} +default: +fprintf(stderr, "VSCSI: Unknown format %x\n", req->dma_fmt); +return -1; +} + +*ret = vscsi_swap_desc(*ret); +if (buf_offset > ret->len) { +dprintf(" offset=%x is out of a descriptor #%d boundary=%x\n", +buf_offset, req->cur_desc_num, ret->len); +return -1; +} +ret->va += buf_offset; +re
[Qemu-devel] [PATCH v2 02/11] pseries: savevm support for VIO devices
From: David Gibson This patch adds helpers to allow PAPR VIO devices to save state common to all VIO devices during savevm. Signed-off-by: David Gibson Reviewed-by: Anthony Liguori --- hw/ppc/spapr_vio.c | 20 include/hw/ppc/spapr_vio.h | 5 + 2 files changed, 25 insertions(+) diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c index 7c6f6e4..75ce19f 100644 --- a/hw/ppc/spapr_vio.c +++ b/hw/ppc/spapr_vio.c @@ -542,6 +542,26 @@ static const TypeInfo spapr_vio_bridge_info = { .class_init= spapr_vio_bridge_class_init, }; +const VMStateDescription vmstate_spapr_vio = { +.name = "spapr_vio", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +/* Sanity check */ +VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice), +VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice), + +/* General VIO device state */ +VMSTATE_UINTTL(signal_state, VIOsPAPRDevice), +VMSTATE_UINT64(crq.qladdr, VIOsPAPRDevice), +VMSTATE_UINT32(crq.qsize, VIOsPAPRDevice), +VMSTATE_UINT32(crq.qnext, VIOsPAPRDevice), + +VMSTATE_END_OF_LIST() +}, +}; + static void vio_spapr_device_class_init(ObjectClass *klass, void *data) { DeviceClass *k = DEVICE_CLASS(klass); diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h index 3609327..46edc2a 100644 --- a/include/hw/ppc/spapr_vio.h +++ b/include/hw/ppc/spapr_vio.h @@ -134,4 +134,9 @@ VIOsPAPRDevice *spapr_vty_get_default(VIOsPAPRBus *bus); void spapr_vio_quiesce(void); +extern const VMStateDescription vmstate_spapr_vio; + +#define VMSTATE_SPAPR_VIO(_f, _s) \ +VMSTATE_STRUCT(_f, _s, 0, vmstate_spapr_vio, VIOsPAPRDevice) + #endif /* _HW_SPAPR_VIO_H */ -- 1.8.0
[Qemu-devel] [PATCH v2 08/11] pseries: savevm support for pseries machine
From: David Gibson This adds the necessary pieces to implement savevm / migration for the pseries machine. The most complex part here is migrating the hash table - for the paravirtualized pseries machine the guest's hash page table is not stored within guest memory, but externally and the guest accesses it via hypercalls. This patch uses a hypervisor reserved bit of the HPTE as a dirty bit (tracking changes to the HPTE itself, not the page it references). This is used to implement a live migration style incremental save and restore of the hash table contents. Normally a hash table is 16MB but it can get bigger depending on how much RAM the guest has. Due to its nature, updates to it are random so the live migration style is used for it. In addition it adds VMStateDescription information to save and restore the (few) remaining pieces of state information needed by the pseries machine. Signed-off-by: David Gibson Reviewed-by: Anthony Liguori --- Changes: 2013/09/07: * added a comment about HPT size and migration style choice --- hw/ppc/spapr.c | 269 - hw/ppc/spapr_hcall.c | 8 +- include/hw/ppc/spapr.h | 12 ++- 3 files changed, 281 insertions(+), 8 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index e340708..734a782 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -32,6 +32,7 @@ #include "sysemu/cpus.h" #include "sysemu/kvm.h" #include "kvm_ppc.h" +#include "mmu-hash64.h" #include "hw/boards.h" #include "hw/ppc/ppc.h" @@ -666,7 +667,7 @@ static void spapr_cpu_reset(void *opaque) env->spr[SPR_HIOR] = 0; -env->external_htab = spapr->htab; +env->external_htab = (uint8_t *)spapr->htab; env->htab_base = -1; env->htab_mask = HTAB_SIZE(spapr) - 1; env->spr[SPR_SDR1] = (target_ulong)(uintptr_t)spapr->htab | @@ -710,6 +711,268 @@ static int spapr_vga_init(PCIBus *pci_bus) } } +static const VMStateDescription vmstate_spapr = { +.name = "spapr", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT32(next_irq, sPAPREnvironment), + +/* RTC offset */ +VMSTATE_UINT64(rtc_offset, sPAPREnvironment), + +VMSTATE_END_OF_LIST() +}, +}; + +#define HPTE(_table, _i) (void *)(((uint64_t *)(_table)) + ((_i) * 2)) +#define HPTE_VALID(_hpte) (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_VALID) +#define HPTE_DIRTY(_hpte) (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_HPTE_DIRTY) +#define CLEAN_HPTE(_hpte) ((*(uint64_t *)(_hpte)) &= tswap64(~HPTE64_V_HPTE_DIRTY)) + +static int htab_save_setup(QEMUFile *f, void *opaque) +{ +sPAPREnvironment *spapr = opaque; + +spapr->htab_save_index = 0; +spapr->htab_first_pass = true; + +/* "Iteration" header */ +qemu_put_be32(f, spapr->htab_shift); + +return 0; +} + +#define MAX_ITERATION_NS500 /* 5 ms */ + +static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr, + int64_t max_ns) +{ +int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64; +int index = spapr->htab_save_index; +int64_t starttime = qemu_get_clock_ns(rt_clock); + +assert(spapr->htab_first_pass); + +do { +int chunkstart; + +/* Consume invalid HPTEs */ +while ((index < htabslots) + && !HPTE_VALID(HPTE(spapr->htab, index))) { +index++; +CLEAN_HPTE(HPTE(spapr->htab, index)); +} + +/* Consume valid HPTEs */ +chunkstart = index; +while ((index < htabslots) + && HPTE_VALID(HPTE(spapr->htab, index))) { +index++; +CLEAN_HPTE(HPTE(spapr->htab, index)); +} + +if (index > chunkstart) { +int n_valid = index - chunkstart; + +qemu_put_be32(f, chunkstart); +qemu_put_be16(f, n_valid); +qemu_put_be16(f, 0); +qemu_put_buffer(f, HPTE(spapr->htab, chunkstart), +HASH_PTE_SIZE_64 * n_valid); + +if ((qemu_get_clock_ns(rt_clock) - starttime) > max_ns) { +break; +} +} +} while ((index < htabslots) && !qemu_file_rate_limit(f)); + +if (index >= htabslots) { +assert(index == htabslots); +index = 0; +spapr->htab_first_pass = false; +} +spapr->htab_save_index = index; +} + +static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr, + int64_t max_ns) +{ +bool final = max_ns < 0; +int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64; +int examined = 0, sent = 0; +int index = spapr->htab_save_index; +int64_t starttime = qemu_get_clock_ns(rt_clock); + +assert(!spapr->htab_first_pass); + +do { +int chunkstart, invalidstart; + +/* Consume non-dirty HPTEs */ +while ((index < htabslots) + && !HPTE_
Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert
Am 18.07.2013 um 20:54 schrieb ronnie sahlberg : > BlockLimitsVPD OptimalUnmapGranularity also applies to unmapping with > writesame16 : > > An OPTIMAL UNMAP GRANULARITY field set to a non-zero value indicates > the optimal granularity in logical blocks > for unmap requests (e.g., an UNMAP command or a WRITE SAME (16) > command with the UNMAP bit set to > one). An unmap request with a number of logical blocks that is not a > multiple of this value may result in unmap > operations on fewer LBAs than requested. An OPTIMAL UNMAP GRANULARITY > field set to _h indicates > that the device server does not report an optimal unmap granularity. > > So if you send a writesame16+unmap-bit that honours the "optimal > unmap request" you have a pretty good chance > that the blocks will be unmapped. If they are not they will at least > be overwritten as zero. thanks for the details. I think to have optimal performance and best change for unmapping in qemu-img convert it might be best to export the OPTIMAL UNMAP GRANULARITY as well as the write_zeroes_w_discard capability via the BDI and than zero out the whole device with bdrv_write_zeroes and the BDRV_MAY_DISCARD flag. the logic for this is already prepared in patch4 (topic of this email). except that i have to exchange bdrv_discard with bdrv_write_zeroes w/ BDRV_MAY_DISCARD. what do you think? > > > On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven wrote: >> >> Am 18.07.2013 um 16:35 schrieb Paolo Bonzini : >> >>> Il 18/07/2013 16:32, Peter Lieven ha scritto: >> > (Mis)alignment and granularity can be handled later. We can ignore them > for now. Later, if we decide the best way to support them is a flag, > we'll add it. Let's not put the cart before the horse. > > BTW, I expect alignment!=0 to be really, really rare. To explain my concerns: I know that my target has internal page size of 15MB. I will check what happens if I deallocate this 15MB in chunks of lets say 1MB. If the page gets unprovisioned after the last chunk is unmapped it would be fine :-) >>> >>> You're talking of granularity here, not (mis)alignment. >> >> you are right. for the target i am talking about this is 30720 512-byte >> blocks for the granularity (pagesize) and 0 for the alignment. >> i will see what happens if I write same w/unmap the whole 30720 blocks in >> smaller blocks ;-) otherwise i will have to add support for honoring this >> values in qemu-img convert >> as a follow up. >> >> Peter >>
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On Thu, Jul 18, 2013 at 12:56:51PM -0600, Eric Blake wrote: > On 07/18/2013 12:47 PM, Ian Main wrote: > > qcow2 supports backing files so it makes sense to default to qcow2 > > for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing > > drive and export it via nbd. Defaulting FULL and TOP to SYNC_MODE_NONE > > breaks tests but that could be fixed if we wanted it. > > > > Signed-off-by: Ian Main > > --- > > blockdev.c | 5 - > > qapi-schema.json | 1 + > > 2 files changed, 5 insertions(+), 1 deletion(-) > > Looks okay, but let's answer the meta-question first of whether we > should just make 'format' mandatory and be done with it. > > Also, I've noticed you aren't cc'ing many people; that can slow down > reviews. http://wiki.qemu.org/Contribute/SubmitAPatch gives hints on > how to determine the right people to send your patches to, by > deciphering MAINTAINERS and running ./scripts/getmaintainer.pl. Ah ok, I'll add them next rev. My take has been to just do a patch that implements the suggestion and see what people think :). Ian
Re: [Qemu-devel] [PATCH V4 1/4] Implement sync modes for drive-backup.
On Thu, Jul 18, 2013 at 11:19:43AM -0600, Eric Blake wrote: > On 07/17/2013 02:04 PM, Ian Main wrote: > > This patch adds sync-modes to the drive-backup interface and > > implements the FULL, NONE and TOP modes of synchronization. > > > > FULL performs as before copying the entire contents of the drive > > while preserving the point-in-time using CoW. > > NONE only copies new writes to the target drive. > > TOP copies changes to the topmost drive image and preserves the > > point-in-time using CoW. > > > > > +++ b/qapi-schema.json > > @@ -1807,6 +1807,10 @@ > > # @format: #optional the format of the new destination, default is to > > # probe if @mode is 'existing', else the format of the source > > # > > +# @sync: what parts of the disk image should be copied to the destination > > +#(all the disk, only the sectors allocated in the topmost image, or > > +#only new I/O). > > +# > > # @mode: #optional whether and how QEMU should create a new image, default > > is > > #'absolute-paths'. > > This hunk looks bogus; the 'DriveBackup' type already documents @sync as > of commit b53169e, which makes me suspect this hunk got misapplied > during rebasing. Did you check that? I know there was one bit of documentation missing that I fixed here. I also just did a clean rebase (git am) to kevin/block and it all applied fine. > > # > > diff --git a/qmp-commands.hx b/qmp-commands.hx > > index e075df4..f3f6b3d 100644 > > --- a/qmp-commands.hx > > +++ b/qmp-commands.hx > > @@ -957,6 +957,7 @@ Arguments: > > > > Example: > > -> { "execute": "drive-backup", "arguments": { "device": "drive0", > > + "sync": "full", > > "target": "backup.img" } } > > Ouch - commit b53169e made 'sync' mandatory, but forgot to update the > example to match; perhaps this hunk ought to be applied independently? That's up to you guys, I can split it out if needed. Ian
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On 07/18/2013 12:47 PM, Ian Main wrote: > qcow2 supports backing files so it makes sense to default to qcow2 > for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing > drive and export it via nbd. Defaulting FULL and TOP to SYNC_MODE_NONE > breaks tests but that could be fixed if we wanted it. > > Signed-off-by: Ian Main > --- > blockdev.c | 5 - > qapi-schema.json | 1 + > 2 files changed, 5 insertions(+), 1 deletion(-) Looks okay, but let's answer the meta-question first of whether we should just make 'format' mandatory and be done with it. Also, I've noticed you aren't cc'ing many people; that can slow down reviews. http://wiki.qemu.org/Contribute/SubmitAPatch gives hints on how to determine the right people to send your patches to, by deciphering MAINTAINERS and running ./scripts/getmaintainer.pl. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert
BlockLimitsVPD OptimalUnmapGranularity also applies to unmapping with writesame16 : An OPTIMAL UNMAP GRANULARITY field set to a non-zero value indicates the optimal granularity in logical blocks for unmap requests (e.g., an UNMAP command or a WRITE SAME (16) command with the UNMAP bit set to one). An unmap request with a number of logical blocks that is not a multiple of this value may result in unmap operations on fewer LBAs than requested. An OPTIMAL UNMAP GRANULARITY field set to _h indicates that the device server does not report an optimal unmap granularity. So if you send a writesame16+unmap-bit that honours the "optimal unmap request" you have a pretty good chance that the blocks will be unmapped. If they are not they will at least be overwritten as zero. On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven wrote: > > Am 18.07.2013 um 16:35 schrieb Paolo Bonzini : > >> Il 18/07/2013 16:32, Peter Lieven ha scritto: > (Mis)alignment and granularity can be handled later. We can ignore them for now. Later, if we decide the best way to support them is a flag, we'll add it. Let's not put the cart before the horse. BTW, I expect alignment!=0 to be really, really rare. >>> To explain my concerns: >>> >>> I know that my target has internal page size of 15MB. I will check what >>> happens >>> if I deallocate this 15MB in chunks of lets say 1MB. If the page gets >>> unprovisioned >>> after the last chunk is unmapped it would be fine :-) >> >> You're talking of granularity here, not (mis)alignment. > > you are right. for the target i am talking about this is 30720 512-byte > blocks for the granularity (pagesize) and 0 for the alignment. > i will see what happens if I write same w/unmap the whole 30720 blocks in > smaller blocks ;-) otherwise i will have to add support for honoring this > values in qemu-img convert > as a follow up. > > Peter >
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On 07/18/2013 12:03 PM, Ian Main wrote: >> >> Or we could simplify life by making 'format' mandatory for drive-backup; >> it was optional for 'drive-mirror' due to incremental implementation, >> but for 'drive-backup', we still have the opportunity to do things right >> from the first release. > > Ah, I did make a doc change, I must have forgotten to add it. > > I'm all for making format mandatory if that is ok with everyone. Maybe > that is the best solution. We can always change our mind in 1.7 to make it optional if we change our minds, but I'd definitely like to see patches that make 'format' mandatory for DriveBackup for 1.6 - simpler all around. Converting mandatory to optional is discoverable via introspection; while converting optional to mandatory is an API break. And since we can argue that optional formats is relatively risky, it's better to have our initial release be conservative by requiring the field until (and unless) someone comes up with a use case why leaving it unspecified makes a difference. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves
Stefan, --On 17 July 2013 11:02:30 +0800 Stefan Hajnoczi wrote: The steps to achieving this: 1. Drop alarm timers from qemu-timer.c and calculate g_poll() timeout instead for the main loop. 2. Introduce a per-AioContext aio_ctx_clock that can be used with qemu_new_timer() to create a QEMUTimer that expires during aio_poll(). 3. Calculate g_poll() timeout for aio_ctx_clock in aio_poll(). I've pretty much written this. Two issues: 1. I very much enjoyed deleting all the alarm timers code. However, it was doing something vaguely useful, which was calling qemu_notify_event when the timer expired. Under the new regime above, if the AioContext clock's timers expires within aio_poll, life is good as the timeout to g_poll will expire. However, this won't apply if any of the other three static clock's timers expire. Also, it won't work within the mainloop poll. Also, we lose the ability to respond to timers in a sub millisecond way. Options: a) restore alarm timers (at least for the time being). Make all alarm timers do qemu_notify_event. However, only run the AioContext clock's timers within aio_poll. This is the least intrusive change. b) calculate the timeout in aio_poll with respect to the minimum deadline across all clocks, not just AioContext's clock. Use the same logic in mainloop. I'd go for (b), except for the millisecond accuracy thing. So my temptation (sadly) is (a). 2. If we do delete alarm timers, I'll need to delete the -clock option. WDYT? -- Alex Bligh
[Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
qcow2 supports backing files so it makes sense to default to qcow2 for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing drive and export it via nbd. Defaulting FULL and TOP to SYNC_MODE_NONE breaks tests but that could be fixed if we wanted it. Signed-off-by: Ian Main --- blockdev.c | 5 - qapi-schema.json | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/blockdev.c b/blockdev.c index 000dea6..a56ba08 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1462,7 +1462,10 @@ void qmp_drive_backup(const char *device, const char *target, } if (!has_format) { -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name; +format = NULL; +if (mode != NEW_IMAGE_MODE_EXISTING) { +format = sync == MIRROR_SYNC_MODE_NONE ? "qcow2" : bs->drv->format_name; +} } if (format) { drv = bdrv_find_format(format); diff --git a/qapi-schema.json b/qapi-schema.json index b3f6c2a..e2c86f9 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1806,6 +1806,7 @@ # # @format: #optional the format of the new destination, default is to # probe if @mode is 'existing', else the format of the source +# drive. If @sync is 'none' then the default is qcow2. # # @sync: what parts of the disk image should be copied to the destination #(all the disk, only the sectors allocated in the topmost image, or -- 1.8.1.4
Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert
Am 18.07.2013 um 16:35 schrieb Paolo Bonzini : > Il 18/07/2013 16:32, Peter Lieven ha scritto: >>> (Mis)alignment and granularity can be handled later. We can ignore them >>> for now. Later, if we decide the best way to support them is a flag, >>> we'll add it. Let's not put the cart before the horse. >>> >>> BTW, I expect alignment!=0 to be really, really rare. >> To explain my concerns: >> >> I know that my target has internal page size of 15MB. I will check what >> happens >> if I deallocate this 15MB in chunks of lets say 1MB. If the page gets >> unprovisioned >> after the last chunk is unmapped it would be fine :-) > > You're talking of granularity here, not (mis)alignment. you are right. for the target i am talking about this is 30720 512-byte blocks for the granularity (pagesize) and 0 for the alignment. i will see what happens if I write same w/unmap the whole 30720 blocks in smaller blocks ;-) otherwise i will have to add support for honoring this values in qemu-img convert as a follow up. Peter
Re: [Qemu-devel] [PATCH v2 0/6] Clean up bogus default boot order
Markus Armbruster writes: > Ping? This needs a rebase. Regards, Anthony Liguori > > Markus Armbruster writes: > >> This is on top of "[PATCH v4 00/12] Boot order tests", so it's >> protected by these tests. >> >> The first five patches are admittedly related to the stated purpose of >> this series pretty much only by "I can't stand perpetuating this >> stupid crap". Max Filippov and Peter Maydell already cleaned up >> Xtensa and ARM the same way. >> >> v2: >> * Straightforward rebase >> * PATCH 6 improved commit message (Alex) >> >> Markus Armbruster (6): >> pc: Don't prematurely explode QEMUMachineInitArgs >> pc: Don't explode QEMUMachineInitArgs into local variables needlessly >> sun4: Don't prematurely explode QEMUMachineInitArgs >> ppc: Don't explode QEMUMachineInitArgs into local variables >> needlessly >> ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params >> hw: Clean up bogus default boot order >> >> hw/alpha/dp264.c | 1 - >> hw/arm/collie.c | 1 - >> hw/arm/exynos4_boards.c | 2 - >> hw/arm/gumstix.c | 2 - >> hw/arm/highbank.c| 1 - >> hw/arm/integratorcp.c| 1 - >> hw/arm/kzm.c | 1 - >> hw/arm/mainstone.c | 1 - >> hw/arm/musicpal.c| 1 - >> hw/arm/nseries.c | 6 +- >> hw/arm/omap_sx1.c| 2 - >> hw/arm/palm.c| 1 - >> hw/arm/realview.c| 4 - >> hw/arm/spitz.c | 4 - >> hw/arm/stellaris.c | 2 - >> hw/arm/tosa.c| 1 - >> hw/arm/versatilepb.c | 2 - >> hw/arm/vexpress.c| 2 - >> hw/arm/xilinx_zynq.c | 1 - >> hw/arm/z2.c | 1 - >> hw/core/null-machine.c | 1 - >> hw/cris/axis_dev88.c | 1 - >> hw/i386/pc_piix.c| 89 +++-- >> hw/i386/pc_q35.c | 28 +++ >> hw/i386/xen_machine_pv.c | 1 - >> hw/lm32/lm32_boards.c| 2 - >> hw/lm32/milkymist.c | 1 - >> hw/m68k/an5206.c | 1 - >> hw/m68k/dummy_m68k.c | 1 - >> hw/m68k/mcf5208.c| 1 - >> hw/microblaze/petalogix_ml605_mmu.c | 1 - >> hw/microblaze/petalogix_s3adsp1800_mmu.c | 1 - >> hw/mips/mips_fulong2e.c | 1 - >> hw/mips/mips_jazz.c | 2 - >> hw/mips/mips_malta.c | 1 - >> hw/mips/mips_mipssim.c | 1 - >> hw/mips/mips_r4k.c | 1 - >> hw/openrisc/openrisc_sim.c | 1 - >> hw/ppc/e500.c| 35 + >> hw/ppc/e500.h| 13 +-- >> hw/ppc/e500plat.c| 15 +--- >> hw/ppc/mac_newworld.c| 4 +- >> hw/ppc/mac_oldworld.c| 4 +- >> hw/ppc/mpc8544ds.c | 15 +--- >> hw/ppc/ppc405_boards.c | 2 - >> hw/ppc/ppc440_bamboo.c | 1 - >> hw/ppc/prep.c| 4 +- >> hw/ppc/spapr.c | 4 +- >> hw/ppc/virtex_ml507.c| 1 - >> hw/s390x/s390-virtio-ccw.c | 1 - >> hw/s390x/s390-virtio.c | 1 - >> hw/sh4/r2d.c | 1 - >> hw/sh4/shix.c| 1 - >> hw/sparc/leon3.c | 1 - >> hw/sparc/sun4m.c | 131 >> --- >> hw/sparc64/sun4u.c | 58 +- >> hw/unicore32/puv3.c | 1 - >> hw/xtensa/xtensa_lx60.c | 2 - >> hw/xtensa/xtensa_sim.c | 1 - >> include/hw/boards.h | 7 +- >> vl.c | 4 +- >> 61 files changed, 130 insertions(+), 349 deletions(-)
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On Thu, Jul 18, 2013 at 11:27:21AM -0600, Eric Blake wrote: > On 07/17/2013 02:04 PM, Ian Main wrote: > > qcow2 supports backing files so it makes sense to default to qcow2 > > for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing > > drive and export it via nbd. Defaulting FULL and TOP to SYNC_MODE_NONE > > breaks tests but that could be fixed if we wanted it. > > > > Signed-off-by: Ian Main > > --- > > blockdev.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/blockdev.c b/blockdev.c > > index 000dea6..805b0e5 100644 > > --- a/blockdev.c > > +++ b/blockdev.c > > @@ -1462,7 +1462,7 @@ void qmp_drive_backup(const char *device, const char > > *target, > > } > > > > if (!has_format) { > > -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : > > bs->drv->format_name; > > +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2"; > > Is this the right thing to do? Or should we do: > > if (!has_format) { > if (mode == NEW_IMAGE_MODE_EXISTING) { > format = NULL; > } else { > format = bs->drv->format_name ?: "qcow2"; > } > } > > That is, I think we should default to doing a backup in the format given > by the original (what if the original is qed, which also supports > backing files), and only use qcow2 when there is no guidance whatsoever. > > But in practice, I don't care - format probing is a security hole, so > libvirt should always be passing a format, at which point the entire > !has_format condition should be skipped when called by libvirt. Heh, actually that is basically what I have now, as with the doc change I forgot to git add it. Doh! I'll repost.. I'm also not opposed to format being non-optional. Ian
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On Thu, Jul 18, 2013 at 11:32:52AM -0600, Eric Blake wrote: > On 07/18/2013 11:27 AM, Eric Blake wrote: > > >> if (!has_format) { > >> -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : > >> bs->drv->format_name; > >> +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2"; > > > > Is this the right thing to do? Or should we do: > > > > if (!has_format) { > > if (mode == NEW_IMAGE_MODE_EXISTING) { > > format = NULL; > > } else { > > format = bs->drv->format_name ?: "qcow2"; > > } > > } > > > > That is, I think we should default to doing a backup in the format given > > by the original (what if the original is qed, which also supports > > backing files), and only use qcow2 when there is no guidance whatsoever. > > > > But in practice, I don't care > > Well, I _DO_ care about one thing - make sure that the qapi-schema.json > page accurately documents how this variable is defaulted for callers > that don't care about the implications of omitting a format. > > Or we could simplify life by making 'format' mandatory for drive-backup; > it was optional for 'drive-mirror' due to incremental implementation, > but for 'drive-backup', we still have the opportunity to do things right > from the first release. Ah, I did make a doc change, I must have forgotten to add it. I'm all for making format mandatory if that is ok with everyone. Maybe that is the best solution. Ian
Re: [Qemu-devel] [PATCH v8 0/3] Throttle-down guest to help with live migration convergence
Hi Peter, Thanks for catching this. Tthis was perhaps accidentally left out during merge to Juan's migration.next tree. I have informed Juan and he said he would take care of it. Vinod -Original Message- From: Peter Lieven [mailto:lieven-li...@dlhnet.de] Sent: Wednesday, July 17, 2013 11:25 PM To: Vinod, Chegu Cc: ebl...@redhat.com; anth...@codemonkey.ws; quint...@redhat.com; owass...@redhat.com; qemu-devel@nongnu.org; pbonz...@redhat.com Subject: Re: [Qemu-devel] [PATCH v8 0/3] Throttle-down guest to help with live migration convergence Hi all, is it possible that not v8 of this patch got merged? Without checking line-by-line I see at least that this here +# @auto-converge: If enabled, QEMU will automatically throttle down the guest +# to speed up convergence of RAM migration. (since 1.6) +# is missing in qapi-schema.json. BR, Peter On 24.06.2013 11:49, Chegu Vinod wrote: > Busy enterprise workloads hosted on large sized VM's tend to dirty > memory faster than the transfer rate achieved via live guest migration. > Despite some good recent improvements (& using dedicated 10Gig NICs > between hosts) the live migration does NOT converge. > > If a user chooses to force convergence of their migration via a new > migration capability "auto-converge" then this change will auto-detect > lack of convergence scenario and trigger a slow down of the workload > by explicitly disallowing the VCPUs from spending much time in the VM > context. > > The migration thread tries to catchup and this eventually leads to > convergence in some "deterministic" amount of time. Yes it does impact > the performance of all the VCPUs but in my observation that lasts only > for a short duration of time. i.e. end up entering stage 3 (downtime > phase) soon after that. No external trigger is required. > > Thanks to Juan and Paolo for their useful suggestions. > > --- > Changes from v7: > - added a missing else to patch 3/3. > > Changes from v6: > - incorporated feedback from Paolo. > - rebased to latest qemu.git and removing RFC > > Changes from v5: > - incorporated feedback from Paolo & Igor. > - rebased to latest qemu.git > > Changes from v4: > - incorporated feedback from Paolo. > - split into 3 patches. > > Changes from v3: > - incorporated feedback from Paolo and Eric > - rebased to latest qemu.git > > Changes from v2: > - incorporated feedback from Orit, Juan and Eric > - stop the throttling thread at the start of stage 3 > - rebased to latest qemu.git > > Changes from v1: > - rebased to latest qemu.git > - added auto-converge capability(default off) - suggested by Anthony Liguori & > Eric Blake. > > Signed-off-by: Chegu Vinod > --- > > Chegu Vinod (3): >Introduce async_run_on_cpu() >Add 'auto-converge' migration capability >Force auto-convegence of live migration > > arch_init.c | 85 > + > cpus.c| 29 ++ > include/migration/migration.h |2 + > include/qemu-common.h |1 + > include/qom/cpu.h | 10 + > migration.c |9 > qapi-schema.json |5 ++- > 7 files changed, 140 insertions(+), 1 deletions(-) > >
Re: [Qemu-devel] Watchdog device in Qemu user mode
Hi Andreas, thanks alot. I only used user mode because I thought it is easier to implement. I will take a look into system mode. Best regards Bastian
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On 07/18/2013 11:27 AM, Eric Blake wrote: >> if (!has_format) { >> -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : >> bs->drv->format_name; >> +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2"; > > Is this the right thing to do? Or should we do: > > if (!has_format) { > if (mode == NEW_IMAGE_MODE_EXISTING) { > format = NULL; > } else { > format = bs->drv->format_name ?: "qcow2"; > } > } > > That is, I think we should default to doing a backup in the format given > by the original (what if the original is qed, which also supports > backing files), and only use qcow2 when there is no guidance whatsoever. > > But in practice, I don't care Well, I _DO_ care about one thing - make sure that the qapi-schema.json page accurately documents how this variable is defaulted for callers that don't care about the implications of omitting a format. Or we could simplify life by making 'format' mandatory for drive-backup; it was optional for 'drive-mirror' due to incremental implementation, but for 'drive-backup', we still have the opportunity to do things right from the first release. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Watchdog device in Qemu user mode
Hi Bastian, Am 18.07.2013 19:09, schrieb kbast...@mail.uni-paderborn.de: > The processor is usually embedded and > programs run without operating systems or with custom ones designed for > hard realtime. Would it be much work to port from user mode to system mode? > I am using a c-compiler to create testprograms and these programs are > serving the watchdog. This results in segmentation faults when I try to > execute loads or stores on addresses from the watchdog. You need a softmmu a.k.a. system emulation for that. linux-user is only for emulating programs that would actually run under Linux operating system. You'd use the -bios command line option to load bare metal software. Having no link to your code it's hard to judge how much work it might be, but in general system emulation is easier - you need to write a simple board file to load your binary, the instruction emulation remains unchanged, maybe one or two stub functions for interrupts and MMU faults. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.
On 07/17/2013 02:04 PM, Ian Main wrote: > qcow2 supports backing files so it makes sense to default to qcow2 > for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing > drive and export it via nbd. Defaulting FULL and TOP to SYNC_MODE_NONE > breaks tests but that could be fixed if we wanted it. > > Signed-off-by: Ian Main > --- > blockdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/blockdev.c b/blockdev.c > index 000dea6..805b0e5 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -1462,7 +1462,7 @@ void qmp_drive_backup(const char *device, const char > *target, > } > > if (!has_format) { > -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : > bs->drv->format_name; > +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2"; Is this the right thing to do? Or should we do: if (!has_format) { if (mode == NEW_IMAGE_MODE_EXISTING) { format = NULL; } else { format = bs->drv->format_name ?: "qcow2"; } } That is, I think we should default to doing a backup in the format given by the original (what if the original is qed, which also supports backing files), and only use qcow2 when there is no guidance whatsoever. But in practice, I don't care - format probing is a security hole, so libvirt should always be passing a format, at which point the entire !has_format condition should be skipped when called by libvirt. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Watchdog device in Qemu user mode
On 18 July 2013 18:09, wrote: > thanks for the fast response. The processor is usually embedded and programs > run without operating systems or with custom ones designed for hard > realtime. If this is your use case you should just implement system mode and not worry about implementing linux-user mode at all. thanks -- PMM
Re: [Qemu-devel] [PATCH V4 1/4] Implement sync modes for drive-backup.
On 07/17/2013 02:04 PM, Ian Main wrote: > This patch adds sync-modes to the drive-backup interface and > implements the FULL, NONE and TOP modes of synchronization. > > FULL performs as before copying the entire contents of the drive > while preserving the point-in-time using CoW. > NONE only copies new writes to the target drive. > TOP copies changes to the topmost drive image and preserves the > point-in-time using CoW. > > +++ b/qapi-schema.json > @@ -1807,6 +1807,10 @@ > # @format: #optional the format of the new destination, default is to > # probe if @mode is 'existing', else the format of the source > # > +# @sync: what parts of the disk image should be copied to the destination > +#(all the disk, only the sectors allocated in the topmost image, or > +#only new I/O). > +# > # @mode: #optional whether and how QEMU should create a new image, default is > #'absolute-paths'. This hunk looks bogus; the 'DriveBackup' type already documents @sync as of commit b53169e, which makes me suspect this hunk got misapplied during rebasing. > # > diff --git a/qmp-commands.hx b/qmp-commands.hx > index e075df4..f3f6b3d 100644 > --- a/qmp-commands.hx > +++ b/qmp-commands.hx > @@ -957,6 +957,7 @@ Arguments: > > Example: > -> { "execute": "drive-backup", "arguments": { "device": "drive0", > + "sync": "full", > "target": "backup.img" } } Ouch - commit b53169e made 'sync' mandatory, but forgot to update the example to match; perhaps this hunk ought to be applied independently? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] Bug Fix:Segmentation fault when use usb-ehci device
Hi, Am 18.07.2013 17:27, schrieb Mike Qiu: > Hi all > > Any comments ? You should've CCed the USB maintainer whose file you are touching for review rather than just ppc people, see ./MAINTAINERS. There's some typos in the commit message, but the change looks okay to me - although there were discussions to catch this on the memory API side of things instead. Regards, Andreas > > Thanks > Mike > 2013/7/16 11:50, Mike Qiu wrote: >> For usb-ehci in qemu, its caps just has read() operation, >> the write() operation does not exist. >> >> This cause a Segmentation fault when use usb-ehci device in ppc64 >> platform. >> >> here is gdb output: >> >> Program received signal SIGSEGV, Segmentation fault. >> [Switching to Thread 0x3fffa7fcef20 (LWP 6793)] >> 0x103f5244 in memory_region_oldmmio_write_accessor >> (opaque=0x113e9e78, addr=9, value=0x3fffa7fce088, >> size=1, shift=0, mask=255) at /home/Mike/qemu-impreza/memory.c:384 >> 384 mr->ops->old_mmio.write[ctz32(size)](mr->opaque, addr, tmp); >> (gdb) p *mr->ops >> $1 = {read = @0x10716f68: 0x1020699c , write = 0, >>endianness = DEVICE_LITTLE_ENDIAN, valid = {min_access_size = 1, >>max_access_size = 4, unaligned = false, accepts = 0}, impl = >>{min_access_size = 1, max_access_size = 1, unaligned = false}, >>old_mmio = {read = {0, 0, 0}, write = {0, 0, 0}}} >> >> Becasue function write() of mr->ops has not been implement, in >> function memory_region_dispatch_write(), it call >> oldmmio write accessor, but at the same time old_mmio still not >> been implement by default. >> >> That is the root cause of the Segmentation fault. >> >> To solve this problem, add empty function: ehci_caps_write() >> >> Signed-off-by: Mike Qiu >> --- >> hw/usb/hcd-ehci.c |7 +++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c >> index 67e4b24..6c8a439 100644 >> --- a/hw/usb/hcd-ehci.c >> +++ b/hw/usb/hcd-ehci.c >> @@ -1072,6 +1072,12 @@ static void ehci_port_write(void *ptr, hwaddr addr, >> trace_usb_ehci_portsc_change(addr + s->portscbase, addr >> 2, *portsc, >> old); >> } >> >> +static void ehci_caps_write(void *ptr, hwaddr addr, uint64_t val, >> + unsigned size) >> +{ >> +/* nothing */ >> +} >> + >> static void ehci_opreg_write(void *ptr, hwaddr addr, >> uint64_t val, unsigned size) >> { >> @@ -2380,6 +2386,7 @@ static void ehci_frame_timer(void *opaque) >> >> static const MemoryRegionOps ehci_mmio_caps_ops = { >> .read = ehci_caps_read, >> +.write = ehci_caps_write, >> .valid.min_access_size = 1, >> .valid.max_access_size = 4, >> .impl.min_access_size = 1, > > -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] Watchdog device in Qemu user mode
Hi Peter, thanks for the fast response. The processor is usually embedded and programs run without operating systems or with custom ones designed for hard realtime. Would it be much work to port from user mode to system mode? I am using a c-compiler to create testprograms and these programs are serving the watchdog. This results in segmentation faults when I try to execute loads or stores on addresses from the watchdog. Best regards Bastian
Re: [Qemu-devel] [PATCH] util/iov: Fix -O1 uninitialized variable warning
On 07/18/2013 09:36 AM, Peter Maydell wrote: > On 18 July 2013 17:14, Richard Henderson wrote: >> At -O2, code in the form >> >> if (p) A; B; if (p) C; >> >> may be rearranged via "jump threading" into >> >> if (p) { A; B; C; } else { B; } >> >> But at -O1 this doesn't happen and we -Werror out here on >> the "may be used uninitialized" orig_len. Perform this transform >> by hand so that -O1 remains a viable debugging alternative. >> >> Signed-off-by: Richard Henderson > > This is the same issue fixed by this (reviewed but > never applied) patch from June, isn't it? > > http://patchwork.ozlabs.org/patch/251410/ Yes, it is. r~
Re: [Qemu-devel] Watchdog device in Qemu user mode
On 18 July 2013 17:44, wrote: > Hey everybody, > > I am fairly new to qemu development. So here is my question: > I would like to write a virtual watchdog device, which > is part of the emulated cpu. This is part of an Infineon > Tricore Implementation I am writing. For simplicities sake > I use qemu user mode, to test my translation. > How could I implement such a watchdog device into qemu? > Can you give me some clues? Is it even possible to use a > device in user mode? To a first approximation, no, you can't use devices in user-mode. (Is the device really accessible to a normal Linux process running on this hardware? Usually the kernel would make it only kernel-accessible with the MMU.) -- PMM
[Qemu-devel] Watchdog device in Qemu user mode
Hey everybody, I am fairly new to qemu development. So here is my question: I would like to write a virtual watchdog device, which is part of the emulated cpu. This is part of an Infineon Tricore Implementation I am writing. For simplicities sake I use qemu user mode, to test my translation. How could I implement such a watchdog device into qemu? Can you give me some clues? Is it even possible to use a device in user mode? Best regards Bastian
Re: [Qemu-devel] [PATCH v2 0/6] Clean up bogus default boot order
Ping? Markus Armbruster writes: > This is on top of "[PATCH v4 00/12] Boot order tests", so it's > protected by these tests. > > The first five patches are admittedly related to the stated purpose of > this series pretty much only by "I can't stand perpetuating this > stupid crap". Max Filippov and Peter Maydell already cleaned up > Xtensa and ARM the same way. > > v2: > * Straightforward rebase > * PATCH 6 improved commit message (Alex) > > Markus Armbruster (6): > pc: Don't prematurely explode QEMUMachineInitArgs > pc: Don't explode QEMUMachineInitArgs into local variables needlessly > sun4: Don't prematurely explode QEMUMachineInitArgs > ppc: Don't explode QEMUMachineInitArgs into local variables > needlessly > ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params > hw: Clean up bogus default boot order > > hw/alpha/dp264.c | 1 - > hw/arm/collie.c | 1 - > hw/arm/exynos4_boards.c | 2 - > hw/arm/gumstix.c | 2 - > hw/arm/highbank.c| 1 - > hw/arm/integratorcp.c| 1 - > hw/arm/kzm.c | 1 - > hw/arm/mainstone.c | 1 - > hw/arm/musicpal.c| 1 - > hw/arm/nseries.c | 6 +- > hw/arm/omap_sx1.c| 2 - > hw/arm/palm.c| 1 - > hw/arm/realview.c| 4 - > hw/arm/spitz.c | 4 - > hw/arm/stellaris.c | 2 - > hw/arm/tosa.c| 1 - > hw/arm/versatilepb.c | 2 - > hw/arm/vexpress.c| 2 - > hw/arm/xilinx_zynq.c | 1 - > hw/arm/z2.c | 1 - > hw/core/null-machine.c | 1 - > hw/cris/axis_dev88.c | 1 - > hw/i386/pc_piix.c| 89 +++-- > hw/i386/pc_q35.c | 28 +++ > hw/i386/xen_machine_pv.c | 1 - > hw/lm32/lm32_boards.c| 2 - > hw/lm32/milkymist.c | 1 - > hw/m68k/an5206.c | 1 - > hw/m68k/dummy_m68k.c | 1 - > hw/m68k/mcf5208.c| 1 - > hw/microblaze/petalogix_ml605_mmu.c | 1 - > hw/microblaze/petalogix_s3adsp1800_mmu.c | 1 - > hw/mips/mips_fulong2e.c | 1 - > hw/mips/mips_jazz.c | 2 - > hw/mips/mips_malta.c | 1 - > hw/mips/mips_mipssim.c | 1 - > hw/mips/mips_r4k.c | 1 - > hw/openrisc/openrisc_sim.c | 1 - > hw/ppc/e500.c| 35 + > hw/ppc/e500.h| 13 +-- > hw/ppc/e500plat.c| 15 +--- > hw/ppc/mac_newworld.c| 4 +- > hw/ppc/mac_oldworld.c| 4 +- > hw/ppc/mpc8544ds.c | 15 +--- > hw/ppc/ppc405_boards.c | 2 - > hw/ppc/ppc440_bamboo.c | 1 - > hw/ppc/prep.c| 4 +- > hw/ppc/spapr.c | 4 +- > hw/ppc/virtex_ml507.c| 1 - > hw/s390x/s390-virtio-ccw.c | 1 - > hw/s390x/s390-virtio.c | 1 - > hw/sh4/r2d.c | 1 - > hw/sh4/shix.c| 1 - > hw/sparc/leon3.c | 1 - > hw/sparc/sun4m.c | 131 > --- > hw/sparc64/sun4u.c | 58 +- > hw/unicore32/puv3.c | 1 - > hw/xtensa/xtensa_lx60.c | 2 - > hw/xtensa/xtensa_sim.c | 1 - > include/hw/boards.h | 7 +- > vl.c | 4 +- > 61 files changed, 130 insertions(+), 349 deletions(-)
Re: [Qemu-devel] [PATCH v4 00/12] Boot order tests
Ping? Markus Armbruster writes: > v4: > * Old PATCH 1-6 got merged, only tests left; cover letter changed > accordingly > * PATCH 1,6 unchanged > * PATCH 2 fix system-reset race (Anthony) > * New PATCH 4,11 to make libqos/fw_cfg usable for the ppc, Sun4 tests > * New PATCH 3 shanghaied from Anthony, to make PATCH 4 compile without > make clean. > * PATCH 5,10,12 use libqos/fw_cfg (Anthony) > * PATCH 7-9 rebased, minor tweaks > v3: > * Rebased, with only trivial conflicts > * PATCH 08 cosmetic improvements > * More test cases: new PATCH 09-16 > v2: > * New PATCH 7 to make testing -boot once possible > * Old PATCH 5 reworked and extended became PATCH 8 > * Writing more tests uncovered -no-fd-bootchk weirdness, cleaned up in > new PATCH 5+6 > > Andreas Färber (1): > boot-order-test: Add tests for PowerMacs > > Anthony Liguori (1): > libqos: include dependencies > > Markus Armbruster (10): > qtest: Don't reset on qtest chardev connect > boot-order-test: New; covering just PC for now > libqos: Add support for memory-mapped fw_cfg > boot-order-test: Cover -boot once in ppc tests > boot-order-test: Better separate target-specific and generic parts > boot-order-test: Code motion for better readability > boot-order-test: Add tests for PowerPC PREP > boot-order-test: Add tests for Sun4m > libqos: Generalize I/O-mapped fw_cfg > boot-order-test: Add tests for Sun4u > > qtest.c | 7 +- > tests/Makefile | 7 +- > tests/boot-order-test.c | 209 > +++ > tests/fw_cfg-test.c | 2 +- > tests/libqos/fw_cfg-pc.c | 40 - > tests/libqos/fw_cfg-pc.h | 20 - > tests/libqos/fw_cfg.c| 55 + > tests/libqos/fw_cfg.h| 9 ++ > tests/libqos/malloc-pc.c | 2 +- > 9 files changed, 287 insertions(+), 64 deletions(-) > create mode 100644 tests/boot-order-test.c > delete mode 100644 tests/libqos/fw_cfg-pc.c > delete mode 100644 tests/libqos/fw_cfg-pc.h
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
Il 18/07/2013 18:35, Eduardo Otubo ha scritto: > > > On 07/18/2013 01:28 PM, Anthony Liguori wrote: >> Eduardo Otubo writes: >> >>> Hello all, >>> >>> In this small patch series I basically: >> >> Cover letter should be marked [PATCH 0/2]. Otherwise it defeats >> filtering. >> >> Would like to see a Reviewed-by from someone before applying this. > > I'm running some tests with qemu && xen, I'll post a v3 by the end of > the day. I'll format the cover letter in the correct way next time. I feel that, at some point, grep and code review must trump experiments... Paul, how did you guys handle this in other projects? Paolo
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On 07/18/2013 01:28 PM, Anthony Liguori wrote: Eduardo Otubo writes: Hello all, In this small patch series I basically: Cover letter should be marked [PATCH 0/2]. Otherwise it defeats filtering. Would like to see a Reviewed-by from someone before applying this. I'm running some tests with qemu && xen, I'll post a v3 by the end of the day. I'll format the cover letter in the correct way next time. Thanks, Regards, Anthony Liguori v2 update: - set libseccomp 2.1.0 as requirement on configure script. - removed setrlimit and added sendfile64 to the whitelist. 1) Remove the ifdef's for the (not so) new libseccomp version that does a best effort and translates x86_32 syscalls into x86_64 when possible. 2) Remove unused syscalls on the seccomp whitelist. For that removal, I've been running several instances of Qemu using a script written on top of virt-test[0]. After some weeks testing I could come up with this small list, and safely remove them without breaking anything. [0] - https://github.com/autotest/virt-test/wiki -- Eduardo Otubo IBM Linux Technology Center
Re: [Qemu-devel] [PATCH] util/iov: Fix -O1 uninitialized variable warning
On 18 July 2013 17:14, Richard Henderson wrote: > At -O2, code in the form > > if (p) A; B; if (p) C; > > may be rearranged via "jump threading" into > > if (p) { A; B; C; } else { B; } > > But at -O1 this doesn't happen and we -Werror out here on > the "may be used uninitialized" orig_len. Perform this transform > by hand so that -O1 remains a viable debugging alternative. > > Signed-off-by: Richard Henderson This is the same issue fixed by this (reviewed but never applied) patch from June, isn't it? http://patchwork.ozlabs.org/patch/251410/ thanks -- PMM
Re: [Qemu-devel] [RFC 0/3] Determinitic behaviour with icount.
Il 18/07/2013 18:31, Frederic Konrad ha scritto: > On 18/07/2013 17:35, Paolo Bonzini wrote: >> Il 18/07/2013 17:06, Peter Maydell ha scritto: >>> On 18 July 2013 16:02, wrote: As I said in the last email, we have issues with determinism with icount. We are wondering if determinism is really ensured with icount? >>> My opinion is that it *should* be deterministic but it would >>> be unsurprising if the determinism had got broken along the way. >> First of all, it can only be deterministic if the guest satisfies (at >> least) all the following condition: >> >> 1) only uses timer that QEMU bases on vm_clock (which means that you >> should use "-rtc clock=vm"---sorry Fred, didn't think about this in the >> previous answer); > > Oops sorry, I didn't mentioned that, but we used rtc clock=vm for our > tests. >> 2) never does any network operation nor any asynchronous disk I/O >> operation >> >> 3) never halts the VCPU waiting for an interrupt >> >> >> Point 1 is obvious. >> >> >> To explain points 2, let's consider what happens if a block device uses >> synchronous vs. asynchronous I/O. >> >> With synchronous I/O, each block device operation will complete >> immediately. All clocks are stalled during the operation. >> >> With asynchronous I/O, each block device operation will be done while >> the CPU is running. If the CPU is polling a completion flag, the number >> of instructions executed (thus icount) depends on how long it takes to >> do I/O. > > So I suppose this can happen even if there are any network card or block > device. > > We probably need to disable it until we finally save and replay IO, to > get this thing working. Are you aware of the work that was done on fault tolerance (Kemari)? Orit is working on resurrecting it. Paolo
Re: [Qemu-devel] [RFC 0/3] Determinitic behaviour with icount.
On 18/07/2013 17:35, Paolo Bonzini wrote: Il 18/07/2013 17:06, Peter Maydell ha scritto: On 18 July 2013 16:02, wrote: As I said in the last email, we have issues with determinism with icount. We are wondering if determinism is really ensured with icount? My opinion is that it *should* be deterministic but it would be unsurprising if the determinism had got broken along the way. First of all, it can only be deterministic if the guest satisfies (at least) all the following condition: 1) only uses timer that QEMU bases on vm_clock (which means that you should use "-rtc clock=vm"---sorry Fred, didn't think about this in the previous answer); Oops sorry, I didn't mentioned that, but we used rtc clock=vm for our tests. 2) never does any network operation nor any asynchronous disk I/O operation 3) never halts the VCPU waiting for an interrupt Point 1 is obvious. To explain points 2, let's consider what happens if a block device uses synchronous vs. asynchronous I/O. With synchronous I/O, each block device operation will complete immediately. All clocks are stalled during the operation. With asynchronous I/O, each block device operation will be done while the CPU is running. If the CPU is polling a completion flag, the number of instructions executed (thus icount) depends on how long it takes to do I/O. So I suppose this can happen even if there are any network card or block device. We probably need to disable it until we finally save and replay IO, to get this thing working. To explain point 3 (which is the only one that _might_ be fixable), let's see what happens if the VCPU halts waiting for an interrupt. If that is the case, and you haven't done any asynchronous I/O, there should be active vm_clock timers, and you have another possible source of non-deterministic behavior. The current QEMU behavior is (and has always been) to start tracking rt_clock. This is obviously not deterministic. Note that with the switch to separate threads for iothread/VCPU, the algorithm to do this has become much better. Let's look at a couple possibilities: 2) jump to the next vm_clock deadline. This sounds appealing, but it is still nondeterministic in the general case when the guest *is* doing asynchronous I/O too. How many vm_clock timers do you run before I/O finishes? Furthermore, the vm_clock might move too fast. Think of an RTC clock whose alarm registers are 0/0/0 so it fires at midnight; if it is the only active vm_clock timer, you end up in 2107 even before the kernel boots! Yes I didn't think about that :). 3) do not process vm_clock timers at all unless there is no pending I/O (block/network); if there is none, track rt_clock as in current behavior. I just made it up, but it sounds promising and similar to synchronous I/O. It should not be extremely hard to implement, and it can remove this kind of nondeterminism. But it won't fix the case when the CPU is polling. Thanks, I need to take a look at all this. Paolo ps: I'm not an expert on icount at all, I'm only reasoning of the possible interactions with the main loop. Both icount and reverse execution need an instruction counter. icount use a count-down mechanism but reverse execution need a continuous counter. For now we have build a separate counter and we think that these two counters can be merged. However we would like feedback about this before modifying this. I definitely think that there should only be one counter, not two. thanks -- PMM
Re: [Qemu-devel] [PATCH v2 0/8] Guest memory allocation fixes & cleanup
Ping? Markus Armbruster writes: > All I wanted to do is exit(1) instead of abort() on guest memory > allocation failure [07/08]. But that lead me into a minor #ifdef bog, > and here's what I brought back. Enjoy! > > Testing: > * Christian Borntraeger reports v1 works fine under LPAR (new S390 > KVM, i.e. generic allocation) and as second guest under z/VM (old > S390 KVM, i.e. legacy S390 allocation). Thanks for testing, and for > catching a stupid mistake. v2 differs from v1 only in code that > isn't reachable on S390. > > Changes since v1: > * 5/8: Fix assertion in qemu_ram_remap() (Paolo) > * All other patches unchanged except for Acked-by in commit messages > Changes since RFC: > * 1-3+8/8 unchanged except for commit message tweaks > * 4+6/8 rewritten to address Paolo's review > * 5/8 rewritten: don't fix dead code, just assert it's dead > * 7/8 fix mistakes caught by Richard Henderson and Peter Maydell > > Markus Armbruster (8): > exec: Fix Xen RAM allocation with unusual options > exec: Clean up fall back when -mem-path allocation fails > exec: Reduce ifdeffery around -mem-path > exec: Simplify the guest physical memory allocation hook > exec: Drop incorrect & dead S390 code in qemu_ram_remap() > exec: Clean up unnecessary S390 ifdeffery > exec: Don't abort when we can't allocate guest memory > pc_sysfw: Fix ISA BIOS init for ridiculously big flash > > exec.c | 121 > ++-- > hw/block/pc_sysfw.c | 5 +- > include/exec/cpu-all.h | 2 - > include/exec/exec-all.h | 2 + > include/sysemu/kvm.h| 5 -- > kvm-all.c | 13 -- > target-s390x/kvm.c | 23 +++-- > util/oslib-posix.c | 4 +- > util/oslib-win32.c | 5 +- > 9 files changed, 78 insertions(+), 102 deletions(-)
[Qemu-devel] large size PCI windows vs ivshmem vs windows guests
Currently seabios simply looks for 64 bit BARs and sums them up to get the size of the region. The result is reported in the CRS method in ACPI tables. But this means that e.g. hotplug is broken - if we try to add a huge ivshmem device there won't be place for it in the 32 bit PCI hole, and a 64 bit one isn't declared. So we really should declare some 64 bit windows, but the question is - what's a good value for CRS? It would appear that we should be able to declare a very large window in CRS and be done with it. Works fine for linux, but it turns out adding the following in CRS is enough to make e.g. win7 x86 crash on boot: diff --git a/src/acpi-dsdt-pci-crs.dsl b/src/acpi-dsdt-pci-crs.dsl index d421891..405859d 100644 --- a/src/acpi-dsdt-pci-crs.dsl +++ b/src/acpi-dsdt-pci-crs.dsl @@ -36,6 +36,13 @@ Scope(\_SB.PCI0) { 0x, // Address Translation Offset 0x0002, // Address Length ,, , AddressRangeMemory, TypeStatic) +QWordMemory(ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite, +0x, // Address Space Granularity +0x80,// Address Range Minimum +0x80,// Address Range Maximum +0x, // Address Translation Offset +0x01,// Address Length +,, , AddressRangeMemory, TypeStatic) DWordMemory(ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite, 0x, // Address Space Granularity 0xE000, // Address Range Minimum The issue is the Length field - if it's 0x008000 or less win7 32 bit boots (an interesting tidbit - it seems possible to declare more than one 2G resource, but each one needs to be at most 2G - did they use some signed integer somewhere?). I'd like to propose the following: - declare a safe value (2G) in CRS by default you won't be able to use ivshmem with huge sizes we can detect common cases and warn users - add a flag controlling size of 64 bit window declared I am guessing most people use <2G shared memory so won't be affected. Comments? -- MST
Re: [Qemu-devel] [PATCH v2 0/2] libqtest leak fix & cleanup
Ping? Markus Armbruster writes: > v2: qtest_start() function comment > > Markus Armbruster (2): > libqtest: Plug fd and memory leaks in qtest_quit() > libqtest: New qtest_end() to go with qtest_start() > > tests/fdc-test.c| 2 +- > tests/hd-geo-test.c | 8 > tests/ide-test.c| 2 +- > tests/libqtest.c| 4 > tests/libqtest.h| 12 > 5 files changed, 22 insertions(+), 6 deletions(-)
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
Eduardo Otubo writes: > Hello all, > > In this small patch series I basically: Cover letter should be marked [PATCH 0/2]. Otherwise it defeats filtering. Would like to see a Reviewed-by from someone before applying this. Regards, Anthony Liguori > > v2 update: > - set libseccomp 2.1.0 as requirement on configure script. > - removed setrlimit and added sendfile64 to the whitelist. > > 1) Remove the ifdef's for the (not so) new libseccomp version that does > a > best effort and translates x86_32 syscalls into x86_64 when possible. > > 2) Remove unused syscalls on the seccomp whitelist. For that removal, I've > been > running several instances of Qemu using a script written on top of > virt-test[0]. After some weeks testing I could come up with this small list, > and safely remove them without breaking anything. > > [0] - https://github.com/autotest/virt-test/wiki
Re: [Qemu-devel] [RFC 1/3] icount: base rt_clock on icount.
Il 18/07/2013 18:23, Frederic Konrad ha scritto: > On 18/07/2013 17:36, Paolo Bonzini wrote: >> Il 18/07/2013 17:02, fred.kon...@greensocs.com ha scritto: >>> From: KONRAD Frederic >>> >>> This bases rt_clock on icount, as vm_clock. >>> So vm_clock = rt_clock. >>> >>> Signed-off-by: KONRAD Frederic >>> --- >>> qemu-timer.c | 6 +- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/qemu-timer.c b/qemu-timer.c >>> index b2d95e2..6c607e5 100644 >>> --- a/qemu-timer.c >>> +++ b/qemu-timer.c >>> @@ -401,7 +401,11 @@ int64_t qemu_get_clock_ns(QEMUClock *clock) >>> switch(clock->type) { >>> case QEMU_CLOCK_REALTIME: >>> -return get_clock(); >>> +if (use_icount) { >>> +return cpu_get_icount(); >>> +} else { >>> +return get_clock(); >>> +} >>> default: >>> case QEMU_CLOCK_VIRTUAL: >>> if (use_icount) { >>> >> rt_clock is very little used in general. You should use "-rtc clock=vm" >> if you want to base the RTC on vm_clock. >> >> Paolo > > True but it seems used in some place: > > For example: ui/console.c: > ds->gui_timer = qemu_new_timer_ms(rt_clock, gui_update, ds); > > Maybe it can cause trouble no? In theory it is only used in places where it shouldn't cause trouble (those that do should use the similarly named rtc_clock variable). Paolo
Re: [Qemu-devel] [RFC 1/3] icount: base rt_clock on icount.
On 18/07/2013 17:36, Paolo Bonzini wrote: Il 18/07/2013 17:02, fred.kon...@greensocs.com ha scritto: From: KONRAD Frederic This bases rt_clock on icount, as vm_clock. So vm_clock = rt_clock. Signed-off-by: KONRAD Frederic --- qemu-timer.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/qemu-timer.c b/qemu-timer.c index b2d95e2..6c607e5 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -401,7 +401,11 @@ int64_t qemu_get_clock_ns(QEMUClock *clock) switch(clock->type) { case QEMU_CLOCK_REALTIME: -return get_clock(); +if (use_icount) { +return cpu_get_icount(); +} else { +return get_clock(); +} default: case QEMU_CLOCK_VIRTUAL: if (use_icount) { rt_clock is very little used in general. You should use "-rtc clock=vm" if you want to base the RTC on vm_clock. Paolo True but it seems used in some place: For example: ui/console.c: ds->gui_timer = qemu_new_timer_ms(rt_clock, gui_update, ds); Maybe it can cause trouble no? Fred
[Qemu-devel] [PATCH] util/iov: Fix -O1 uninitialized variable warning
At -O2, code in the form if (p) A; B; if (p) C; may be rearranged via "jump threading" into if (p) { A; B; C; } else { B; } But at -O1 this doesn't happen and we -Werror out here on the "may be used uninitialized" orig_len. Perform this transform by hand so that -O1 remains a viable debugging alternative. Signed-off-by: Richard Henderson --- util/iov.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/util/iov.c b/util/iov.c index cc6e837..a92eb3a 100644 --- a/util/iov.c +++ b/util/iov.c @@ -146,7 +146,7 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt, { ssize_t total = 0; ssize_t ret; -size_t orig_len, tail; +size_t tail; unsigned niov; while (bytes > 0) { @@ -174,21 +174,22 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt, for (niov = 0; niov < iov_cnt && iov[niov].iov_len <= tail; ++niov) { tail -= iov[niov].iov_len; } + if (tail) { /* second, fixup the last element, and remember the original * length */ +size_t orig_len = iov[niov].iov_len; assert(niov < iov_cnt); -assert(iov[niov].iov_len > tail); -orig_len = iov[niov].iov_len; +assert(orig_len > tail); iov[niov++].iov_len = tail; -} -ret = do_send_recv(sockfd, iov, niov, do_send); +ret = do_send_recv(sockfd, iov, niov, do_send); -/* Undo the changes above before checking for errors */ -if (tail) { iov[niov-1].iov_len = orig_len; +} else { +ret = do_send_recv(sockfd, iov, niov, do_send); } + if (offset) { iov[0].iov_base -= offset; iov[0].iov_len += offset; -- 1.8.1.4
Re: [Qemu-devel] [RFC 2/6] OptsVisitor: introduce list modes for interval flattening
On 07/18/13 16:56, Paolo Bonzini wrote: > Il 18/07/2013 15:59, Laszlo Ersek ha scritto: >> The new modes are equal-rank, exclusive sub-modes of LM_IN_PROGRESS. Teach >> opts_next_list(), opts_type_int() and opts_type_uint64() to handle them. > > Perhaps you could use a bitmap then: > > LM_NONE = 0 > LM_STARTED = 1 > LM_IN_PROGRESS = 2 > LM_SIGNED_INTERVAL = LM_IN_PROGRESS | 4 > LM_UNSIGNED_INTERVAL = LM_IN_PROGRESS | 8 > > I think the only change would be that this hunk: > >> @@ -211,7 +238,10 @@ opts_end_list(Visitor *v, Error **errp) >> { >> OptsVisitor *ov = DO_UPCAST(OptsVisitor, visitor, v); >> >> -assert(ov->list_mode == LM_STARTED || ov->list_mode == LM_IN_PROGRESS); >> +assert(ov->list_mode == LM_STARTED || >> + ov->list_mode == LM_IN_PROGRESS || >> + ov->list_mode == LM_SIGNED_INTERVAL || >> + ov->list_mode == LM_UNSIGNED_INTERVAL); >> ov->repeated_opts = NULL; >> ov->list_mode = LM_NONE; >> } > > could be changed to > > assert(ov->list_mode == LM_STARTED || >(ov->list_mode & LM_IN_PROGRESS)); If you don't insist (please don't :)), I wouldn't like to do this. (a) I wanted to represent and query each individual mode explicitly (helps with code search), (b) the "sub-mode" nature is a theoretical thing. It only applies to the two interval modes. This small orthogonality is limited, it doesn't cause "combinatorial explosion" (I didn't have to double the states or such). Most importantly, I specifically wanted one state in general to exclude any other state. Bitmaps beget thinking about the meaning of arbitrary variations, like 1|4, 0|2 etc; I intended to prevent such thoughts right in the type. Thanks Laszlo