[Qemu-devel] [PATCH v4 01/13] block: vhdx - minor comments and typo correction.
Just a couple of minor comments to help note where allocated buffers are freed, and a typo fix. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.c | 6 -- block/vhdx.h | 6 +++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/block/vhdx.c b/block/vhdx.c index e9704b1..56bc88e 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -6,9 +6,9 @@ * Authors: * Jeff Cody jc...@redhat.com * - * This is based on the VHDX Format Specification v0.95, published 4/12/2012 + * This is based on the VHDX Format Specification v1.00, published 8/25/2012 * by Microsoft: - * https://www.microsoft.com/en-us/download/details.aspx?id=29681 + * https://www.microsoft.com/en-us/download/details.aspx?id=34750 * * This work is licensed under the terms of the GNU LGPL, version 2 or later. * See the COPYING.LIB file in the top-level directory. @@ -262,6 +262,7 @@ static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s) uint64_t h2_seq = 0; uint8_t *buffer; +/* header1 header2 are freed in vhdx_close() */ header1 = qemu_blockalign(bs, sizeof(VHDXHeader)); header2 = qemu_blockalign(bs, sizeof(VHDXHeader)); @@ -787,6 +788,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags) goto fail; } +/* s-bat is freed in vhdx_close() */ s-bat = qemu_blockalign(bs, s-bat_rt.length); ret = bdrv_pread(bs-file, s-bat_offset, s-bat, s-bat_rt.length); diff --git a/block/vhdx.h b/block/vhdx.h index fb687ed..9eb6b97 100644 --- a/block/vhdx.h +++ b/block/vhdx.h @@ -6,9 +6,9 @@ * Authors: * Jeff Cody jc...@redhat.com * - * This is based on the VHDX Format Specification v0.95, published 4/12/2012 + * This is based on the VHDX Format Specification v1.00, published 8/25/2012 * by Microsoft: - * https://www.microsoft.com/en-us/download/details.aspx?id=29681 + * https://www.microsoft.com/en-us/download/details.aspx?id=34750 * * This work is licensed under the terms of the GNU LGPL, version 2 or later. * See the COPYING.LIB file in the top-level directory. @@ -116,7 +116,7 @@ typedef struct QEMU_PACKED VHDXHeader { valid. */ uint16_tlog_version;/* version of the log format. Mustn't be zero, unless log_guid is also zero */ -uint16_tversion;/* version of th evhdx file. Currently, +uint16_tversion;/* version of the vhdx file. Currently, only supported version is 1 */ uint32_tlog_length; /* length of the log. Must be multiple of 1MB */ -- 1.8.1.4
[Qemu-devel] [PATCH v4 00/13] VHDX log replay and write support, .bdrv_create()
This patch series contains the initial VHDX log parsing, replay, and write support. (New with v4: VHDX image file creation) === v4 changes === v4 patches are available from github as well, on branch vhdx-write-v4-upstream: https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v4-upstream https://github.com/codyprime/qemu-kvm-jtc.git Those in the midst of reviewing v3, don't fear - the only changes with v4 is the addition of patches on the end of the series (patches 10-13). These patches enable creating VHDX images. Image files created have been (briefly lightly) tested on Hyper-V running on Windows Server 2012. Some of the new patches could be squashed with earlier patches in the series, but I refrained from doing so, since some of the patches have already been reviewed, and others are in the midst of review. I want to make it as easy as possible on those currently reviewing. There is nothing critical that needs to be pushed into the earlier patches. New patches: Patch 10: Breaks out some more endian translation functions (likely squashable into patch 5) Patch 11: Break out some operations into seperate helper functions Patch 12: More comment typos and header fixes in vhdx.h (likely squashable into patch 1) Patch 13: Adds .bdrv_create() for vhdx. VHDX images are can be created for Fixed or Dynamic images. Patches 1-9 are unchanged. === end v4 changelog === === v3 changes === Thank you Kevin Stefan for the feedback; incoporated in v3: Patch 1: --- nil --- Patch 2: * use sizeof(crc) instead of 4 * remove outdated comment * use sizeof(MSGUID) instead of 16 * direct assignment of guid structs rather than memcpy * rename 'rw' to 'generate_data_write_guid' * use offsetof() instead of 4 * comment typos * add missing error checking * MSGUID is now QEMU_PACKED * configure enable for vhdx is now correct and not braindead Patch 3: --- nil --- Patch 4: * code style fixes * removed unused struct (VHDXLogEntryInfo) Patch 5: * more direct assignment of guid rather than memcpy * order of operation in export/import the same now * became less generous with newlines (bah-humbug!) Patch 6: * more direct assignment of guid rather than memcpy * add error check in vhdx_user_visible_write(), now returns int Patch 7: * check error return now of vhdx_user_visible_write() Patch 8: * check error return now of vhdx_user_visible_write() * vhdx_log_write_sectors() uses bdrv_pwrite() vs bdrv_pwrite_sync() * more direct assignment of guid rather than memcpy * use offsetof() instead of 4 Patch 9: --- nil --- === end v3 changelog === v2 changes: Incorporated Fam's review feedback - Thank you Fam for the feedback === end v2 changelog === This will allow an existing log in a VHDX image to be replayed (e.g., a VHDX image from a Hyper-V host that crashed). In addition, metadata writes are enabled through the log. This allows write support to be enabled for VHDX, as the BAT can be updated safely via the log journal. These exact patches are available from github, for testing: https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v2-upstream The latest vhdx work (including anything beyond these patches, such as backing/parent file support) can be found at: https://github.com/codyprime/qemu-kvm-jtc/tree/jtc-vhdx-latest Jeff Cody (13): block: vhdx - minor comments and typo correction. block: vhdx - add header update capability. block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header. block: vhdx - log support struct and defines block: vhdx - break endian translation functions out block: vhdx - update log guid in header, and first write tracker block: vhdx - log parsing, replay, and flush support block: vhdx - add log write support block: vhdx write support block: vhdx - move more endian translations to vhdx-endian.c block: vhdx - break out code operations to functions block: vhdx - fix comment typos in header, fix incorrect struct fields block: vhdx - add .bdrv_create() support block/Makefile.objs |2 +- block/vhdx-endian.c | 216 +++ block/vhdx-log.c| 1011 ++ block/vhdx.c| 1014 --- block/vhdx.h| 167 +++-- configure | 24 ++ 6 files changed, 2273 insertions(+), 161 deletions(-) create mode 100644 block/vhdx-endian.c create mode 100644 block/vhdx-log.c -- 1.8.1.4
[Qemu-devel] [PATCH v4 02/13] block: vhdx - add header update capability.
This adds the ability to update the headers in a VHDX image, including generating a new MS-compatible GUID. As VHDX depends on uuid.h, VHDX is now a configurable build option. If VHDX support is enabled, that will also enable uuid as well. The default is to have VHDX enabled. To enable/disable VHDX: --enable-vhdx, --disable-vhdx Signed-off-by: Jeff Cody jc...@redhat.com --- block/Makefile.objs | 2 +- block/vhdx.c| 161 +++- block/vhdx.h| 14 - configure | 24 4 files changed, 196 insertions(+), 5 deletions(-) diff --git a/block/Makefile.objs b/block/Makefile.objs index 4cf9aa4..e5e54e6 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o -block-obj-y += vhdx.o +block-obj-$(CONFIG_VHDX) += vhdx.o block-obj-y += parallels.o blkdebug.o blkverify.o block-obj-y += snapshot.o qapi.o block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o diff --git a/block/vhdx.c b/block/vhdx.c index 56bc88e..7bd7c12 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -21,6 +21,7 @@ #include qemu/crc32c.h #include block/vhdx.h +#include uuid/uuid.h /* Several metadata and region table data entries are identified by * guids in a MS-specific GUID format. */ @@ -156,11 +157,40 @@ typedef struct BDRVVHDXState { VHDXBatEntry *bat; uint64_t bat_offset; +MSGUID session_guid; + + VHDXParentLocatorHeader parent_header; VHDXParentLocatorEntry *parent_entries; } BDRVVHDXState; +/* Calculates new checksum. + * + * Zero is substituted during crc calculation for the original crc field + * crc_offset: byte offset in buf of the buffer crc + * buf: buffer pointer + * size: size of buffer (must be crc_offset+4) + * + * Note: The resulting checksum is in the CPU endianness, not necessarily + * in the file format endianness (LE). Any header export to disk should + * make sure that vhdx_header_le_export() is used to convert to the + * correct endianness + */ +uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset) +{ +uint32_t crc; + +assert(buf != NULL); +assert(size (crc_offset + sizeof(crc))); + +memset(buf + crc_offset, 0, sizeof(crc)); +crc = crc32c(0x, buf, size); +memcpy(buf + crc_offset, crc, sizeof(crc)); + +return crc; +} + uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size, int crc_offset) { @@ -212,6 +242,19 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset) /* + * This generates a UUID that is compliant with the MS GUIDs used + * in the VHDX spec (and elsewhere). + */ +void vhdx_guid_generate(MSGUID *guid) +{ +uuid_t uuid; +assert(guid != NULL); + +uuid_generate(uuid); +memcpy(guid, uuid, sizeof(MSGUID)); +} + +/* * Per the MS VHDX Specification, for every VHDX file: * - The header section is fixed size - 1 MB * - The header section is always the first object @@ -249,6 +292,113 @@ static void vhdx_header_le_import(VHDXHeader *h) le64_to_cpus(h-log_offset); } +/* All VHDX structures on disk are little endian */ +static void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h) +{ +assert(orig_h != NULL); +assert(new_h != NULL); + +new_h-signature = cpu_to_le32(orig_h-signature); +new_h-checksum= cpu_to_le32(orig_h-checksum); +new_h-sequence_number = cpu_to_le64(orig_h-sequence_number); + +new_h-file_write_guid = orig_h-file_write_guid; +new_h-data_write_guid = orig_h-data_write_guid; +new_h-log_guid= orig_h-log_guid; + +cpu_to_leguids(new_h-file_write_guid); +cpu_to_leguids(new_h-data_write_guid); +cpu_to_leguids(new_h-log_guid); + +new_h-log_version = cpu_to_le16(orig_h-log_version); +new_h-version = cpu_to_le16(orig_h-version); +new_h-log_length = cpu_to_le32(orig_h-log_length); +new_h-log_offset = cpu_to_le64(orig_h-log_offset); +} + +/* Update the VHDX headers + * + * This follows the VHDX spec procedures for header updates. + * + * - non-current header is updated with largest sequence number + */ +static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, + bool generate_data_write_guid) +{ +int ret = 0; +int hdr_idx = 0; +uint64_t header_offset = VHDX_HEADER1_OFFSET; + +VHDXHeader *active_header; +VHDXHeader *inactive_header; +VHDXHeader header_le; +uint8_t *buffer; + +/* operate on the non-current header */ +if (s-curr_header == 0) { +hdr_idx = 1; +header_offset = VHDX_HEADER2_OFFSET; +} + +active_header =
[Qemu-devel] [PATCH v4 03/13] block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header.
In preparation for VHDX log support, move these structures to the header. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.c | 51 --- block/vhdx.h | 47 +++ 2 files changed, 47 insertions(+), 51 deletions(-) diff --git a/block/vhdx.c b/block/vhdx.c index 7bd7c12..68648e1 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -104,16 +104,6 @@ static const MSGUID parent_vhdx_guid = { .data1 = 0xb04aefb7, META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \ META_PHYS_SECTOR_SIZE_PRESENT) -typedef struct VHDXMetadataEntries { -VHDXMetadataTableEntry file_parameters_entry; -VHDXMetadataTableEntry virtual_disk_size_entry; -VHDXMetadataTableEntry page83_data_entry; -VHDXMetadataTableEntry logical_sector_size_entry; -VHDXMetadataTableEntry phys_sector_size_entry; -VHDXMetadataTableEntry parent_locator_entry; -uint16_t present; -} VHDXMetadataEntries; - typedef struct VHDXSectorInfo { uint32_t bat_idx; /* BAT entry index */ @@ -124,47 +114,6 @@ typedef struct VHDXSectorInfo { uint64_t block_offset; /* block offset, in bytes */ } VHDXSectorInfo; - - -typedef struct BDRVVHDXState { -CoMutex lock; - -int curr_header; -VHDXHeader *headers[2]; - -VHDXRegionTableHeader rt; -VHDXRegionTableEntry bat_rt; /* region table for the BAT */ -VHDXRegionTableEntry metadata_rt;/* region table for the metadata */ - -VHDXMetadataTableHeader metadata_hdr; -VHDXMetadataEntries metadata_entries; - -VHDXFileParameters params; -uint32_t block_size; -uint32_t block_size_bits; -uint32_t sectors_per_block; -uint32_t sectors_per_block_bits; - -uint64_t virtual_disk_size; -uint32_t logical_sector_size; -uint32_t physical_sector_size; - -uint64_t chunk_ratio; -uint32_t chunk_ratio_bits; -uint32_t logical_sector_size_bits; - -uint32_t bat_entries; -VHDXBatEntry *bat; -uint64_t bat_offset; - -MSGUID session_guid; - - -VHDXParentLocatorHeader parent_header; -VHDXParentLocatorEntry *parent_entries; - -} BDRVVHDXState; - /* Calculates new checksum. * * Zero is substituted during crc calculation for the original crc field diff --git a/block/vhdx.h b/block/vhdx.h index 403f766..74b2d5d 100644 --- a/block/vhdx.h +++ b/block/vhdx.h @@ -308,6 +308,53 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry { /* - END VHDX SPECIFICATION STRUCTURES */ +typedef struct VHDXMetadataEntries { +VHDXMetadataTableEntry file_parameters_entry; +VHDXMetadataTableEntry virtual_disk_size_entry; +VHDXMetadataTableEntry page83_data_entry; +VHDXMetadataTableEntry logical_sector_size_entry; +VHDXMetadataTableEntry phys_sector_size_entry; +VHDXMetadataTableEntry parent_locator_entry; +uint16_t present; +} VHDXMetadataEntries; + +typedef struct BDRVVHDXState { +CoMutex lock; + +int curr_header; +VHDXHeader *headers[2]; + +VHDXRegionTableHeader rt; +VHDXRegionTableEntry bat_rt; /* region table for the BAT */ +VHDXRegionTableEntry metadata_rt;/* region table for the metadata */ + +VHDXMetadataTableHeader metadata_hdr; +VHDXMetadataEntries metadata_entries; + +VHDXFileParameters params; +uint32_t block_size; +uint32_t block_size_bits; +uint32_t sectors_per_block; +uint32_t sectors_per_block_bits; + +uint64_t virtual_disk_size; +uint32_t logical_sector_size; +uint32_t physical_sector_size; + +uint64_t chunk_ratio; +uint32_t chunk_ratio_bits; +uint32_t logical_sector_size_bits; + +uint32_t bat_entries; +VHDXBatEntry *bat; +uint64_t bat_offset; + +MSGUID session_guid; + +VHDXParentLocatorHeader parent_header; +VHDXParentLocatorEntry *parent_entries; + +} BDRVVHDXState; void vhdx_guid_generate(MSGUID *guid); -- 1.8.1.4
[Qemu-devel] [PATCH v4 08/13] block: vhdx - add log write support
This adds support for writing to the VHDX log. For spec details, see VHDX Specification Format v1.00: https://www.microsoft.com/en-us/download/details.aspx?id=34750 There are a few limitations to this log support: 1.) There is no caching yet 2.) The log is flushed after each entry The primary write interface, vhdx_log_write_and_flush(), performs a log write followed by an immediate flush of the log. As each log entry sector is a minimum of 4KB, partial sector writes are filled in with data from the disk write destination. If the current file log GUID is 0, a new GUID is generated and updated in the header. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx-log.c | 277 +++ block/vhdx.h | 3 + 2 files changed, 280 insertions(+) diff --git a/block/vhdx-log.c b/block/vhdx-log.c index 56808e0..7db6989 100644 --- a/block/vhdx-log.c +++ b/block/vhdx-log.c @@ -170,6 +170,55 @@ exit: return ret; } +/* Writes num_sectors to the log (all log sectors are 4096 bytes), + * from buffer 'buffer'. Upon return, *sectors_written will contain + * the number of sectors successfully written. + * + * It is assumed that 'buffer' is at least 4096*num_sectors large. + * + * 0 is returned on success, -errno otherwise */ +static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log, + uint32_t *sectors_written, void *buffer, + uint32_t num_sectors) +{ +int ret = 0; +uint64_t offset; +uint32_t write; +void *buffer_tmp; +BDRVVHDXState *s = bs-opaque; + +ret = vhdx_user_visible_write(bs, s); +if (ret 0) { +goto exit; +} + +write = log-write; + +buffer_tmp = buffer; +while (num_sectors) { + +offset = log-offset + write; +write = vhdx_log_inc_idx(write, log-length); +if (write == log-read) { +/* full */ +break; +} +ret = bdrv_pwrite(bs-file, offset, buffer_tmp, VHDX_LOG_SECTOR_SIZE); +if (ret 0) { +goto exit; +} +buffer_tmp += VHDX_LOG_SECTOR_SIZE; + +log-write = write; +*sectors_written = *sectors_written + 1; +num_sectors--; +} + +exit: +return ret; +} + + /* Validates a log entry header */ static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr, BDRVVHDXState *s) @@ -732,3 +781,231 @@ exit: return ret; } + + +static void vhdx_log_raw_to_le_sector(VHDXLogDescriptor *desc, + VHDXLogDataSector *sector, void *data, + uint64_t seq) +{ +/* 8 + 4084 + 4 = 4096, 1 log sector */ +memcpy(desc-leading_bytes, data, 8); +data += 8; +cpu_to_le64s(desc-leading_bytes); +memcpy(sector-data, data, 4084); +data += 4084; +memcpy(desc-trailing_bytes, data, 4); +cpu_to_le32s(desc-trailing_bytes); +data += 4; + +sector-sequence_high = (uint32_t) (seq 32); +sector-sequence_low = (uint32_t) (seq 0x); +sector-data_signature = VHDX_LOG_DATA_SIGNATURE; + +vhdx_log_desc_le_export(desc); +vhdx_log_data_le_export(sector); +} + + +static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s, + void *data, uint32_t length, uint64_t offset) +{ +int ret = 0; +void *buffer = NULL; +void *merged_sector = NULL; +void *data_tmp, *sector_write; +unsigned int i; +int sector_offset; +uint32_t desc_sectors, sectors, total_length; +uint32_t sectors_written = 0; +uint32_t aligned_length; +uint32_t leading_length = 0; +uint32_t trailing_length = 0; +uint32_t partial_sectors = 0; +uint32_t bytes_written = 0; +uint64_t file_offset; +VHDXHeader *header; +VHDXLogEntryHeader new_hdr; +VHDXLogDescriptor *new_desc = NULL; +VHDXLogDataSector *data_sector = NULL; +MSGUID new_guid = { 0 }; + +header = s-headers[s-curr_header]; + +/* need to have offset read data, and be on 4096 byte boundary */ + +if (length header-log_length) { +/* no log present. we could create a log here instead of failing */ +ret = -EINVAL; +goto exit; +} + +if (vhdx_log_guid_is_zero(header-log_guid)) { +vhdx_guid_generate(new_guid); +vhdx_update_headers(bs, s, false, new_guid); +} else { +/* currently, we require that the log be flushed after + * every write. */ +ret = -ENOTSUP; +} + +/* 0 is an invalid sequence number, but may also represent the first + * log write (or a wrapped seq) */ +if (s-log.sequence == 0) { +s-log.sequence = 1; +} + +sector_offset = offset % VHDX_LOG_SECTOR_SIZE; +file_offset = (offset / VHDX_LOG_SECTOR_SIZE) * VHDX_LOG_SECTOR_SIZE; + +aligned_length = length; + +/* add in the unaligned head and tail bytes */
[Qemu-devel] [PATCH v4 10/13] block: vhdx - move more endian translations to vhdx-endian.c
In preperation for vhdx_create(), move more endian translation functions out to vhdx-endian.c. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx-endian.c | 75 + block/vhdx.c| 20 +++--- block/vhdx.h| 9 ++- 3 files changed, 87 insertions(+), 17 deletions(-) diff --git a/block/vhdx-endian.c b/block/vhdx-endian.c index 3e93e63..fe879ed 100644 --- a/block/vhdx-endian.c +++ b/block/vhdx-endian.c @@ -139,3 +139,78 @@ void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr) } +/* Region table entries */ +void vhdx_region_header_le_import(VHDXRegionTableHeader *hdr) +{ +assert(hdr != NULL); + +le32_to_cpus(hdr-signature); +le32_to_cpus(hdr-checksum); +le32_to_cpus(hdr-entry_count); +} + +void vhdx_region_header_le_export(VHDXRegionTableHeader *hdr) +{ +assert(hdr != NULL); + +cpu_to_le32s(hdr-signature); +cpu_to_le32s(hdr-checksum); +cpu_to_le32s(hdr-entry_count); +} + +void vhdx_region_entry_le_import(VHDXRegionTableEntry *e) +{ +assert(e != NULL); + +leguid_to_cpus(e-guid); +le64_to_cpus(e-file_offset); +le32_to_cpus(e-length); +le32_to_cpus(e-data_bits); +} + +void vhdx_region_entry_le_export(VHDXRegionTableEntry *e) +{ +assert(e != NULL); + +cpu_to_leguids(e-guid); +cpu_to_le64s(e-file_offset); +cpu_to_le32s(e-length); +cpu_to_le32s(e-data_bits); +} + + +/* Metadata headers table */ +void vhdx_metadata_header_le_import(VHDXMetadataTableHeader *hdr) +{ +assert(hdr != NULL); + +le64_to_cpus(hdr-signature); +le16_to_cpus(hdr-entry_count); +} + +void vhdx_metadata_header_le_export(VHDXMetadataTableHeader *hdr) +{ +assert(hdr != NULL); + +cpu_to_le64s(hdr-signature); +cpu_to_le16s(hdr-entry_count); +} + +void vhdx_metadata_entry_le_import(VHDXMetadataTableEntry *e) +{ +assert(e != NULL); + +leguid_to_cpus(e-item_id); +le32_to_cpus(e-offset); +le32_to_cpus(e-length); +le32_to_cpus(e-data_bits); +} +void vhdx_metadata_entry_le_export(VHDXMetadataTableEntry *e) +{ +assert(e != NULL); + +cpu_to_leguids(e-item_id); +cpu_to_le32s(e-offset); +cpu_to_le32s(e-length); +cpu_to_le32s(e-data_bits); +} diff --git a/block/vhdx.c b/block/vhdx.c index c97955a..c917376 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -424,10 +424,7 @@ static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s) goto fail; } memcpy(s-rt, buffer, sizeof(s-rt)); -le32_to_cpus(s-rt.signature); -le32_to_cpus(s-rt.checksum); -le32_to_cpus(s-rt.entry_count); -le32_to_cpus(s-rt.reserved); +vhdx_region_header_le_import(s-rt); offset += sizeof(s-rt); if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) || @@ -446,10 +443,7 @@ static int vhdx_open_region_tables(BlockDriverState *bs, BDRVVHDXState *s) memcpy(rt_entry, buffer + offset, sizeof(rt_entry)); offset += sizeof(rt_entry); -leguid_to_cpus(rt_entry.guid); -le64_to_cpus(rt_entry.file_offset); -le32_to_cpus(rt_entry.length); -le32_to_cpus(rt_entry.data_bits); +vhdx_region_entry_le_import(rt_entry); /* see if we recognize the entry */ if (guid_eq(rt_entry.guid, bat_guid)) { @@ -524,9 +518,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s) memcpy(s-metadata_hdr, buffer, sizeof(s-metadata_hdr)); offset += sizeof(s-metadata_hdr); -le64_to_cpus(s-metadata_hdr.signature); -le16_to_cpus(s-metadata_hdr.reserved); -le16_to_cpus(s-metadata_hdr.entry_count); +vhdx_metadata_header_le_import(s-metadata_hdr); if (memcmp(s-metadata_hdr.signature, metadata, 8)) { ret = -EINVAL; @@ -545,11 +537,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s) memcpy(md_entry, buffer + offset, sizeof(md_entry)); offset += sizeof(md_entry); -leguid_to_cpus(md_entry.item_id); -le32_to_cpus(md_entry.offset); -le32_to_cpus(md_entry.length); -le32_to_cpus(md_entry.data_bits); -le32_to_cpus(md_entry.reserved2); +vhdx_metadata_entry_le_import(md_entry); if (guid_eq(md_entry.item_id, file_param_guid)) { if (s-metadata_entries.present META_FILE_PARAMETER_PRESENT) { diff --git a/block/vhdx.h b/block/vhdx.h index f3ca516..aab862c 100644 --- a/block/vhdx.h +++ b/block/vhdx.h @@ -412,7 +412,14 @@ void vhdx_log_desc_le_export(VHDXLogDescriptor *d); void vhdx_log_data_le_export(VHDXLogDataSector *d); void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr); void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr); - +void vhdx_region_header_le_import(VHDXRegionTableHeader *hdr); +void vhdx_region_header_le_export(VHDXRegionTableHeader *hdr); +void vhdx_region_entry_le_import(VHDXRegionTableEntry *e); +void vhdx_region_entry_le_export(VHDXRegionTableEntry *e); +void
[Qemu-devel] [PATCH v4 09/13] block: vhdx write support
This adds support for writing to VHDX image files, using coroutines. Writes into the BAT table goes through the VHDX log. Currently, BAT table writes occur when expanding a dynamic VHDX file, and allocating a new BAT entry. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.c | 150 ++- 1 file changed, 148 insertions(+), 2 deletions(-) diff --git a/block/vhdx.c b/block/vhdx.c index dabd688..c97955a 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -833,7 +833,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags) } } -/* TODO: differencing files, write */ +/* TODO: differencing files */ return 0; fail: @@ -965,7 +965,43 @@ exit: return ret; } +/* + * Allocate a new payload block at the end of the file. + * + * Allocation will happen at 1MB alignment inside the file + * + * Returns the file offset start of the new payload block + */ +static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s, +uint64_t *new_offset) +{ +*new_offset = bdrv_getlength(bs-file); + +/* per the spec, the address for a block is in units of 1MB */ +*new_offset = ROUND_UP(*new_offset, 1024*1024); + +return bdrv_truncate(bs-file, *new_offset + s-block_size); +} + +/* + * Update the BAT tablet entry with the new file offset, and the new entry + * state */ +static void vhdx_update_bat_table_entry(BlockDriverState *bs, BDRVVHDXState *s, + VHDXSectorInfo *sinfo, + uint64_t *bat_entry, + uint64_t *bat_offset, int state) +{ +/* The BAT entry is a uint64, with 44 bits for the file offset in units of + * 1MB, and 3 bits for the block state. */ +s-bat[sinfo-bat_idx] = ((sinfo-file_offset20) + VHDX_BAT_FILE_OFF_BITS); +s-bat[sinfo-bat_idx] |= state VHDX_BAT_STATE_BIT_MASK; + +*bat_entry = cpu_to_le64(s-bat[sinfo-bat_idx]); +*bat_offset = s-bat_offset + sinfo-bat_idx * sizeof(VHDXBatEntry); + +} /* Per the spec, on the first write of guest-visible data to the file the * data write guid must be updated in the header */ @@ -982,7 +1018,117 @@ int vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s) static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) { -return -ENOTSUP; +int ret = -ENOTSUP; +BDRVVHDXState *s = bs-opaque; +VHDXSectorInfo sinfo; +uint64_t bytes_done = 0; +uint64_t bat_entry = 0; +uint64_t bat_entry_offset = 0; +bool bat_update; +QEMUIOVector hd_qiov; + +qemu_iovec_init(hd_qiov, qiov-niov); + +qemu_co_mutex_lock(s-lock); + +ret = vhdx_user_visible_write(bs, s); +if (ret 0) { +goto exit; +} + +while (nb_sectors 0) { +if (s-params.data_bits VHDX_PARAMS_HAS_PARENT) { +/* not supported yet */ +ret = -ENOTSUP; +goto exit; +} else { +bat_update = false; +vhdx_block_translate(s, sector_num, nb_sectors, sinfo); + +qemu_iovec_reset(hd_qiov); +qemu_iovec_concat(hd_qiov, qiov, bytes_done, sinfo.bytes_avail); +/* check the payload block state */ +switch (s-bat[sinfo.bat_idx] VHDX_BAT_STATE_BIT_MASK) { +case PAYLOAD_BLOCK_ZERO: +/* in this case, we need to preserve zero writes for + * data that is not part of this write, so we must pad + * the rest of the buffer to zeroes */ + +/* if we are on a posix system with ftruncate() that extends + * a file, then it is zero-filled for us. On Win32, the raw + * layer uses SetFilePointer and SetFileEnd, which does not + * zero fill AFAIK */ + +/* TODO: queue another write of zero buffers if the host OS does + * not zero-fill on file extension */ + +/* fall through */ +case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */ +case PAYLOAD_BLOCK_UNMAPPED:/* fall through */ +case PAYLOAD_BLOCK_UNDEFINED: /* fall through */ +ret = vhdx_allocate_block(bs, s, sinfo.file_offset); +if (ret 0) { +goto exit; +} +/* once we support differencing files, this may also be + * partially present */ +/* update block state to the newly specified state */ +vhdx_update_bat_table_entry(bs, s, sinfo, bat_entry, +bat_entry_offset, +PAYLOAD_BLOCK_FULL_PRESENT); +bat_update = true; +/* since we just allocated
[Qemu-devel] [PATCH v4 05/13] block: vhdx - break endian translation functions out
This moves the endian translation functions out from the vhdx.c source, into a separate source file. In addition to the previously defined endian functions, new endian translation functions for log support are added as well. Signed-off-by: Jeff Cody jc...@redhat.com --- block/Makefile.objs | 2 +- block/vhdx-endian.c | 141 block/vhdx.c| 43 block/vhdx.h| 8 +++ 4 files changed, 150 insertions(+), 44 deletions(-) create mode 100644 block/vhdx-endian.c diff --git a/block/Makefile.objs b/block/Makefile.objs index e5e54e6..e6f5d33 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o -block-obj-$(CONFIG_VHDX) += vhdx.o +block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o block-obj-y += parallels.o blkdebug.o blkverify.o block-obj-y += snapshot.o qapi.o block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o diff --git a/block/vhdx-endian.c b/block/vhdx-endian.c new file mode 100644 index 000..3e93e63 --- /dev/null +++ b/block/vhdx-endian.c @@ -0,0 +1,141 @@ +/* + * Block driver for Hyper-V VHDX Images + * + * Copyright (c) 2013 Red Hat, Inc., + * + * Authors: + * Jeff Cody jc...@redhat.com + * + * This is based on the VHDX Format Specification v1.00, published 8/25/2012 + * by Microsoft: + * https://www.microsoft.com/en-us/download/details.aspx?id=34750 + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#include qemu-common.h +#include block/block_int.h +#include block/vhdx.h + +#include uuid/uuid.h + + +/* + * All the VHDX formats on disk are little endian - the following + * are helper import/export functions to correctly convert + * endianness from disk read to native cpu format, and back again. + */ + + +/* VHDX File Header */ + + +void vhdx_header_le_import(VHDXHeader *h) +{ +assert(h != NULL); + +le32_to_cpus(h-signature); +le32_to_cpus(h-checksum); +le64_to_cpus(h-sequence_number); + +leguid_to_cpus(h-file_write_guid); +leguid_to_cpus(h-data_write_guid); +leguid_to_cpus(h-log_guid); + +le16_to_cpus(h-log_version); +le16_to_cpus(h-version); +le32_to_cpus(h-log_length); +le64_to_cpus(h-log_offset); +} + +void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h) +{ +assert(orig_h != NULL); +assert(new_h != NULL); + +new_h-signature = cpu_to_le32(orig_h-signature); +new_h-checksum= cpu_to_le32(orig_h-checksum); +new_h-sequence_number = cpu_to_le64(orig_h-sequence_number); + +new_h-file_write_guid = orig_h-file_write_guid; +new_h-data_write_guid = orig_h-data_write_guid; +new_h-log_guid= orig_h-log_guid; + +cpu_to_leguids(new_h-file_write_guid); +cpu_to_leguids(new_h-data_write_guid); +cpu_to_leguids(new_h-log_guid); + +new_h-log_version = cpu_to_le16(orig_h-log_version); +new_h-version = cpu_to_le16(orig_h-version); +new_h-log_length = cpu_to_le32(orig_h-log_length); +new_h-log_offset = cpu_to_le64(orig_h-log_offset); +} + + +/* VHDX Log Headers */ + + +void vhdx_log_desc_le_import(VHDXLogDescriptor *d) +{ +assert(d != NULL); + +le32_to_cpus(d-signature); +le32_to_cpus(d-trailing_bytes); +le64_to_cpus(d-leading_bytes); +le64_to_cpus(d-file_offset); +le64_to_cpus(d-sequence_number); +} + +void vhdx_log_desc_le_export(VHDXLogDescriptor *d) +{ +assert(d != NULL); + +cpu_to_le32s(d-signature); +cpu_to_le32s(d-trailing_bytes); +cpu_to_le64s(d-leading_bytes); +cpu_to_le64s(d-file_offset); +cpu_to_le64s(d-sequence_number); +} + +void vhdx_log_data_le_export(VHDXLogDataSector *d) +{ +assert(d != NULL); + +cpu_to_le32s(d-data_signature); +cpu_to_le32s(d-sequence_high); +cpu_to_le32s(d-sequence_low); +} + +void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr) +{ +assert(hdr != NULL); + +le32_to_cpus(hdr-signature); +le32_to_cpus(hdr-checksum); +le32_to_cpus(hdr-entry_length); +le32_to_cpus(hdr-tail); +le64_to_cpus(hdr-sequence_number); +le32_to_cpus(hdr-descriptor_count); +leguid_to_cpus(hdr-log_guid); +le64_to_cpus(hdr-flushed_file_offset); +le64_to_cpus(hdr-last_file_offset); +} + +void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr) +{ +assert(hdr != NULL); + +cpu_to_le32s(hdr-signature); +cpu_to_le32s(hdr-checksum); +cpu_to_le32s(hdr-entry_length); +cpu_to_le32s(hdr-tail); +cpu_to_le64s(hdr-sequence_number); +cpu_to_le32s(hdr-descriptor_count); +cpu_to_leguids(hdr-log_guid); +cpu_to_le64s(hdr-flushed_file_offset); +
[Qemu-devel] [PATCH v4 06/13] block: vhdx - update log guid in header, and first write tracker
Allow tracking of first file write in the VHDX image, as well as the ability to update the GUID in the header. This is in preparation for log support. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.c | 30 -- block/vhdx.h | 6 ++ 2 files changed, 30 insertions(+), 6 deletions(-) diff --git a/block/vhdx.c b/block/vhdx.c index 15a4d1d..4dc056b 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -229,7 +229,7 @@ static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename) * - non-current header is updated with largest sequence number */ static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, - bool generate_data_write_guid) + bool generate_data_write_guid, MSGUID *log_guid) { int ret = 0; int hdr_idx = 0; @@ -261,6 +261,11 @@ static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, vhdx_guid_generate(inactive_header-data_write_guid); } +/* update the log guid if present */ +if (log_guid) { +inactive_header-log_guid = *log_guid; +} + /* the header checksum is not over just the packed size of VHDXHeader, * but rather over the entire 'reserved' range for the header, which is * 4KB (VHDX_HEADER_SIZE). */ @@ -293,16 +298,16 @@ exit: * The VHDX spec calls for header updates to be performed twice, so that both * the current and non-current header have valid info */ -static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, - bool generate_data_write_guid) +int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, +bool generate_data_write_guid, MSGUID *log_guid) { int ret; -ret = vhdx_update_header(bs, s, generate_data_write_guid); +ret = vhdx_update_header(bs, s, generate_data_write_guid, log_guid); if (ret 0) { return ret; } -ret = vhdx_update_header(bs, s, generate_data_write_guid); +ret = vhdx_update_header(bs, s, generate_data_write_guid, log_guid); return ret; } @@ -782,6 +787,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags) s-bat = NULL; +s-first_visible_write = true; qemu_co_mutex_init(s-lock); @@ -862,7 +868,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags) } if (flags BDRV_O_RDWR) { -ret = vhdx_update_headers(bs, s, false); +ret = vhdx_update_headers(bs, s, false, NULL); if (ret 0) { goto fail; } @@ -1002,6 +1008,18 @@ exit: +/* Per the spec, on the first write of guest-visible data to the file the + * data write guid must be updated in the header */ +int vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s) +{ +int ret = 0; +if (s-first_visible_write) { +s-first_visible_write = false; +ret = vhdx_update_headers(bs, s, true, NULL); +} +return ret; +} + static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) { diff --git a/block/vhdx.h b/block/vhdx.h index 89d9a78..8c61bfd 100644 --- a/block/vhdx.h +++ b/block/vhdx.h @@ -361,6 +361,7 @@ typedef struct BDRVVHDXState { VHDXBatEntry *bat; uint64_t bat_offset; +bool first_visible_write; MSGUID session_guid; VHDXLogEntries log; @@ -372,6 +373,9 @@ typedef struct BDRVVHDXState { void vhdx_guid_generate(MSGUID *guid); +int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw, +MSGUID *log_guid); + uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset); uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size, int crc_offset); @@ -401,4 +405,6 @@ void vhdx_log_data_le_export(VHDXLogDataSector *d); void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr); void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr); +int vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s); + #endif -- 1.8.1.4
[Qemu-devel] [PATCH v4 11/13] block: vhdx - break out code operations to functions
This is preperation for vhdx_create(). The ability to write headers, and calculate the number of BAT entries will be needed within the create() functions, so move this relevant code into helper functions. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.c | 121 +++ 1 file changed, 80 insertions(+), 41 deletions(-) diff --git a/block/vhdx.c b/block/vhdx.c index c917376..5c7d0c4 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -203,6 +203,14 @@ void vhdx_guid_generate(MSGUID *guid) memcpy(guid, uuid, sizeof(MSGUID)); } +static void vhdx_set_shift_bits(BDRVVHDXState *s) +{ +s-logical_sector_size_bits = 31 - clz32(s-logical_sector_size); +s-sectors_per_block_bits = 31 - clz32(s-sectors_per_block); +s-chunk_ratio_bits = 63 - clz64(s-chunk_ratio); +s-block_size_bits = 31 - clz32(s-block_size); +} + /* * Per the MS VHDX Specification, for every VHDX file: * - The header section is fixed size - 1 MB @@ -222,6 +230,50 @@ static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename) return 0; } +/* + * Writes the header to the specified offset. + * + * This will optionally read in buffer data from disk (otherwise zero-fill), + * and then update the header checksum. Header is converted to proper + * endianness before being written to the specified file offset + */ +static int vhdx_write_header(BlockDriverState *bs_file, VHDXHeader *hdr, + uint64_t offset, bool read) +{ +uint8_t *buffer = NULL; +int ret; +VHDXHeader header_le; + +assert(bs_file != NULL); +assert(hdr != NULL); + +/* the header checksum is not over just the packed size of VHDXHeader, + * but rather over the entire 'reserved' range for the header, which is + * 4KB (VHDX_HEADER_SIZE). */ + +buffer = qemu_blockalign(bs_file, VHDX_HEADER_SIZE); +if (read) { +/* if true, we can't assume the extra reserved bytes are 0 */ +ret = bdrv_pread(bs_file, offset, buffer, VHDX_HEADER_SIZE); +if (ret 0) { +goto exit; +} +} else { +memset(buffer, 0, VHDX_HEADER_SIZE); +} + +/* overwrite the actual VHDXHeader portion */ +memcpy(buffer, hdr, sizeof(VHDXHeader)); +hdr-checksum = vhdx_update_checksum(buffer, VHDX_HEADER_SIZE, + offsetof(VHDXHeader, checksum)); +vhdx_header_le_export(hdr, header_le); +ret = bdrv_pwrite_sync(bs_file, offset, header_le, sizeof(VHDXHeader)); + +exit: +qemu_vfree(buffer); +return ret; +} + /* Update the VHDX headers * * This follows the VHDX spec procedures for header updates. @@ -237,8 +289,6 @@ static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, VHDXHeader *active_header; VHDXHeader *inactive_header; -VHDXHeader header_le; -uint8_t *buffer; /* operate on the non-current header */ if (s-curr_header == 0) { @@ -266,31 +316,13 @@ static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, inactive_header-log_guid = *log_guid; } -/* the header checksum is not over just the packed size of VHDXHeader, - * but rather over the entire 'reserved' range for the header, which is - * 4KB (VHDX_HEADER_SIZE). */ - -buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE); -/* we can't assume the extra reserved bytes are 0 */ -ret = bdrv_pread(bs-file, header_offset, buffer, VHDX_HEADER_SIZE); -if (ret 0) { -goto exit; -} -/* overwrite the actual VHDXHeader portion */ -memcpy(buffer, inactive_header, sizeof(VHDXHeader)); -inactive_header-checksum = -vhdx_update_checksum(buffer, VHDX_HEADER_SIZE, - offsetof(VHDXHeader, checksum)); -vhdx_header_le_export(inactive_header, header_le); -ret = bdrv_pwrite_sync(bs-file, header_offset, header_le, - sizeof(VHDXHeader)); +vhdx_write_header(bs-file, inactive_header, header_offset, true); if (ret 0) { goto exit; } s-curr_header = hdr_idx; exit: -qemu_vfree(buffer); return ret; } @@ -710,10 +742,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s) goto exit; } -s-logical_sector_size_bits = 31 - clz32(s-logical_sector_size); -s-sectors_per_block_bits = 31 - clz32(s-sectors_per_block); -s-chunk_ratio_bits = 63 - clz64(s-chunk_ratio); -s-block_size_bits = 31 - clz32(s-block_size); +vhdx_set_shift_bits(s); ret = 0; @@ -722,6 +751,31 @@ exit: return ret; } +/* + * Calculate the number of BAT entries, including sector + * bitmap entries. + */ +static void vhdx_calc_bat_entries(BDRVVHDXState *s) +{ +uint32_t data_blocks_cnt, bitmap_blocks_cnt; + +data_blocks_cnt = s-virtual_disk_size s-block_size_bits; +if
[Qemu-devel] [PATCH v4 07/13] block: vhdx - log parsing, replay, and flush support
This adds support for VHDX v0 logs, as specified in Microsoft's VHDX Specification Format v1.00: https://www.microsoft.com/en-us/download/details.aspx?id=34750 The following support is added: * Log parsing, and validation - validate that an existing log is correct. * Log search - search through an existing log, to find any valid sequence of entries. * Log replay and flush - replay an existing log, and flush/clear the log when complete. The VHDX log is a circular buffer, with elements (sectors) of 4KB. A log entry is a variably-length number of sectors, that is comprised of a header and 'descriptors', that describe each sector. A log may contain multiple entries, know as a log sequence. In a log sequence, each log entry immediately follows the previous entry, with an incrementing sequence number. There can only ever be one active and valid sequence in the log. Each log entry must match the file log GUID in order to be valid (along with other criteria). Once we have flushed all valid log entries, we marked the file log GUID to be zero, which indicates a buffer with no valid entries. Signed-off-by: Jeff Cody jc...@redhat.com --- block/Makefile.objs | 2 +- block/vhdx-log.c| 734 block/vhdx.c| 44 +--- block/vhdx.h| 7 +- 4 files changed, 743 insertions(+), 44 deletions(-) create mode 100644 block/vhdx-log.c diff --git a/block/Makefile.objs b/block/Makefile.objs index e6f5d33..2fbd79a 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o -block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o +block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o block-obj-y += parallels.o blkdebug.o blkverify.o block-obj-y += snapshot.o qapi.o block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o diff --git a/block/vhdx-log.c b/block/vhdx-log.c new file mode 100644 index 000..56808e0 --- /dev/null +++ b/block/vhdx-log.c @@ -0,0 +1,734 @@ +/* + * Block driver for Hyper-V VHDX Images + * + * Copyright (c) 2013 Red Hat, Inc., + * + * Authors: + * Jeff Cody jc...@redhat.com + * + * This is based on the VHDX Format Specification v1.00, published 8/25/2012 + * by Microsoft: + * https://www.microsoft.com/en-us/download/details.aspx?id=34750 + * + * This file covers the functionality of the metadata log writing, parsing, and + * replay. + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ +#include qemu-common.h +#include block/block_int.h +#include qemu/module.h +#include block/vhdx.h + + +typedef struct VHDXLogSequence { +bool valid; +uint32_t count; +VHDXLogEntries log; +VHDXLogEntryHeader hdr; +} VHDXLogSequence; + +typedef struct VHDXLogDescEntries { +VHDXLogEntryHeader hdr; +VHDXLogDescriptor desc[]; +} VHDXLogDescEntries; + + +/* Returns true if the GUID is zero */ +static bool vhdx_log_guid_is_zero(MSGUID *guid) +{ +int i; +int ret = 0; + +/* If either the log guid, or log length is zero, + * then a replay log is not present */ +for (i = 0; i sizeof(MSGUID); i++) { +ret |= ((uint8_t *) guid)[i]; +} + +return ret == 0; +} + +/* The log located on the disk is circular buffer containing + * sectors of 4096 bytes each. + * + * It is assumed for the read/write functions below that the + * circular buffer scheme uses a 'one sector open' to indicate + * the buffer is full. Given the validation methods used for each + * sector, this method should be compatible with other methods that + * do not waste a sector. + */ + + +/* Allow peeking at the hdr entry at the beginning of the current + * read index, without advancing the read index */ +static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log, + VHDXLogEntryHeader *hdr) +{ +int ret = 0; +uint64_t offset; +uint32_t read; + +assert(hdr != NULL); + +/* peek is only supported on sector boundaries */ +if (log-read % VHDX_LOG_SECTOR_SIZE) { +ret = -EFAULT; +goto exit; +} + +read = log-read; +/* we are guaranteed that a) log sectors are 4096 bytes, + * and b) the log length is a multiple of 1MB. So, there + * is always a round number of sectors in the buffer */ +if ((read + sizeof(VHDXLogEntryHeader)) log-length) { +read = 0; +} + +if (read == log-write) { +ret = -EINVAL; +goto exit; +} + +offset = log-offset + read; + +ret = bdrv_pread(bs-file, offset, hdr, sizeof(VHDXLogEntryHeader)); +if (ret 0) { +goto exit; +} + +exit: +return ret; +} + +/* Index increment for
[Qemu-devel] [PATCH v4 13/13] block: vhdx - add .bdrv_create() support
This adds support for VHDX image creation, for images of type Fixed and Dynamic. Differencing types (i.e., VHDX images with backing files) are currently not supported. Options for image creation include: * log size: The size of the journaling log for VHDX. Minimum is 1MB, and it must be a multiple of 1MB. Invalid log sizes will be silently fixed by rounding up to the nearest MB. Default is 1MB. * block size: This is the size of a payload block. The range is 1MB to 256MB, inclusive, and must be a multiple of 1MB as well. Invalid sizes and multiples will be silently fixed. If '0' is passed, then a sane size is chosen (depending on virtual image size). Default is 0 (Auto-select). * subformat: - dynamic An image without data pre-allocated. - fixed An image with data pre-allocated. Default is dynamic When creating the image file, the lettered sections are created: -. | (A)| (B)|(C)| (D) | (E) | File ID | Header1 | Header 2 | Region Tbl 1 | Region Tbl 2 | | | | | .-. 0 64KB 128KB 192KB 256KB 320KB . ~ --- ~ ~ ~ ---. | (F) | (G) |(H)| | Journal Log | BAT / Bitmap | Metadata | data .. | | | | . ~ --- ~ ~ ~ ---. 1MB (var.) (var.) (var.) Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.c | 532 +++ block/vhdx.h | 15 +- 2 files changed, 546 insertions(+), 1 deletion(-) diff --git a/block/vhdx.c b/block/vhdx.c index 5c7d0c4..1551be1 100644 --- a/block/vhdx.c +++ b/block/vhdx.c @@ -22,6 +22,18 @@ #include block/vhdx.h #include uuid/uuid.h +#include glib.h + +/* Options for VHDX creation */ + +#define VHDX_BLOCK_OPT_LOG_SIZE log_size +#define VHDX_BLOCK_OPT_BLOCK_SIZE block_size + +typedef enum VHDXImageType { +VHDX_TYPE_DYNAMIC = 0, +VHDX_TYPE_FIXED, +VHDX_TYPE_DIFFERENCING, /* Currently unsupported */ +} VHDXImageType; /* Several metadata and region table data entries are identified by * guids in a MS-specific GUID format. */ @@ -1169,6 +1181,523 @@ static void vhdx_close(BlockDriverState *bs) qemu_vfree(s-log.hdr); } + +/* + * Create VHDX Headers + * + * There are 2 headers, and the highest sequence number will represent + * the active header + */ +static int vhdx_create_new_headers(BlockDriverState *bs, uint64_t image_size, + uint32_t log_size) +{ +int ret = 0; +VHDXHeader *hdr = NULL; + +hdr = g_malloc0(sizeof(VHDXHeader)); + +hdr-signature = VHDX_HEADER_SIGNATURE; +hdr-sequence_number = g_random_int(); +hdr-log_version = 0; +hdr-version = 1; +hdr-log_length = log_size; +hdr-log_offset = VHDX_HEADER_SECTION_END; +vhdx_guid_generate(hdr-file_write_guid); +vhdx_guid_generate(hdr-data_write_guid); + +ret = vhdx_write_header(bs, hdr, VHDX_HEADER1_OFFSET, false); +if (ret 0) { +goto exit; +} +hdr-sequence_number++; +ret = vhdx_write_header(bs, hdr, VHDX_HEADER2_OFFSET, false); +if (ret 0) { +goto exit; +} + +exit: +g_free(hdr); +return ret; +} + + +/* + * Create the Metadata entries. + * + * For more details on the entries, see section 3.5 (pg 29) in the + * VHDX 1.00 specification. + * + * We support 5 metadata entries (all required by spec): + * File Parameters, + * Virtual Disk Size, + * Page 83 Data, + * Logical Sector Size, + * Physical Sector Size + * + * The first 64KB of the Metadata section is reserved for the metadata + * header and entries; beyond that, the metadata items themselves reside. + */ +static int vhdx_create_new_metadata(BlockDriverState *bs, +uint64_t image_size, +uint32_t block_size, +uint32_t sector_size, +uint64_t metadata_offset, +VHDXImageType type) +{ +int ret = 0; +uint32_t offset = 0; +void *buffer = NULL; +void *entry_buffer; +VHDXMetadataTableHeader *md_table;; +VHDXMetadataTableEntry *md_table_entry; + +/* Metadata entries */ +VHDXFileParameters *mt_file_params; +VHDXVirtualDiskSize*mt_virtual_size; +VHDXPage83Data *mt_page83; +VHDXVirtualDiskLogicalSectorSize *mt_log_sector_size; +VHDXVirtualDiskPhysicalSectorSize *mt_phys_sector_size; + +entry_buffer =
[Qemu-devel] [PATCH v4 12/13] block: vhdx - fix comment typos in header, fix incorrect struct fields
VHDXPage83Data and VHDXParentLocatorHeader both incorrectly had their MSGUID fields set as arrays of 16. This is incorrect (it stems from an early version where those fields were uint_8 arrays). Those fields were, up to this patch, unused. Also, there were a couple of typos and incorrect wording in comments, and those have been fixed up as well. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/block/vhdx.h b/block/vhdx.h index aab862c..c9c1cdd 100644 --- a/block/vhdx.h +++ b/block/vhdx.h @@ -58,7 +58,7 @@ typedef struct VHDXFileIdentifier { uint64_tsignature; /* vhdxfile in ASCII */ uint16_tcreator[256]; /* optional; utf-16 string to identify - the vhdx file creator. Diagnotistic + the vhdx file creator. Diagnostic only */ } VHDXFileIdentifier; @@ -114,8 +114,8 @@ typedef struct QEMU_PACKED VHDXHeader { there is no valid log. If non-zero, log entries with this guid are valid. */ -uint16_tlog_version;/* version of the log format. Mustn't be - zero, unless log_guid is also zero */ +uint16_tlog_version;/* version of the log format. Must be + set to zero */ uint16_tversion;/* version of the vhdx file. Currently, only supported version is 1 */ uint32_tlog_length; /* length of the log. Must be multiple @@ -281,7 +281,7 @@ typedef struct QEMU_PACKED VHDXVirtualDiskSize { } VHDXVirtualDiskSize; typedef struct QEMU_PACKED VHDXPage83Data { -MSGUID page_83_data[16]; /* unique id for scsi devices that +MSGUID page_83_data; /* unique id for scsi devices that support page 0x83 */ } VHDXPage83Data; @@ -296,7 +296,7 @@ typedef struct QEMU_PACKED VHDXVirtualDiskPhysicalSectorSize { } VHDXVirtualDiskPhysicalSectorSize; typedef struct QEMU_PACKED VHDXParentLocatorHeader { -MSGUID locator_type[16]; /* type of the parent virtual disk. */ +MSGUID locator_type; /* type of the parent virtual disk. */ uint16_treserved; uint16_tkey_value_count;/* number of key/value pairs for this locator */ -- 1.8.1.4
Re: [Qemu-devel] [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL
Paolo Bonzini wrote: Il 19/08/2013 16:59, Andreas Färber ha scritto: qemu-kvm is no longer maintained since 1.3 so it should not be occurring any more. Please use a prefix of target-i386: (the directory name) to signal where you are changing code, i.e. x86 only. bugfix is not a very telling description of what a patch is doing. (Up to Paolo and Gleb whether they'll fix it or whether they require a resend.) No, not this time at least. :) Paolo Thanks Paolo, and Andreas's comments is also good, so I update commit message and will send out later. Regards, Jinsong
[Qemu-devel] [PATCH v4 04/13] block: vhdx - log support struct and defines
This adds some magic number defines, and internal structure definitions for VHDX log replay support. The struct VHDXLogEntries does not reflect an on-disk data structure, and thus does not need to be packed. Some minor code style fixes are applied as well. Signed-off-by: Jeff Cody jc...@redhat.com --- block/vhdx.h | 46 ++ 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/block/vhdx.h b/block/vhdx.h index 74b2d5d..0ab8bf3 100644 --- a/block/vhdx.h +++ b/block/vhdx.h @@ -30,12 +30,12 @@ * 0.64KB...128KB192KB..256KB1MB */ -#define VHDX_HEADER_BLOCK_SIZE (64*1024) +#define VHDX_HEADER_BLOCK_SIZE (64 * 1024) #define VHDX_FILE_ID_OFFSET 0 -#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE*1) -#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE*2) -#define VHDX_REGION_TABLE_OFFSET(VHDX_HEADER_BLOCK_SIZE*3) +#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE * 1) +#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE * 2) +#define VHDX_REGION_TABLE_OFFSET(VHDX_HEADER_BLOCK_SIZE * 3) /* @@ -77,10 +77,10 @@ typedef struct QEMU_PACKED MSGUID { #define guid_eq(a, b) \ (memcmp((a), (b), sizeof(MSGUID)) == 0) -#define VHDX_HEADER_SIZE (4*1024) /* although the vhdx_header struct in disk - is only 582 bytes, for purposes of crc - the header is the first 4KB of the 64KB - block */ +#define VHDX_HEADER_SIZE (4 * 1024) /* although the vhdx_header struct in disk + is only 582 bytes, for purposes of crc + the header is the first 4KB of the 64KB + block */ /* The full header is 4KB, although the actual header data is much smaller. * But for the checksum calculation, it is over the entire 4KB structure, @@ -92,7 +92,7 @@ typedef struct QEMU_PACKED VHDXHeader { VHDX file has 2 of these headers, and only the header with the highest sequence number is valid */ -MSGUID file_write_guid; /* 128 bit unique identifier. Must be +MSGUID file_write_guid;/* 128 bit unique identifier. Must be updated to new, unique value before the first modification is made to file */ @@ -151,7 +151,10 @@ typedef struct QEMU_PACKED VHDXRegionTableEntry { /* LOG ENTRY STRUCTURES */ +#define VHDX_LOG_MIN_SIZE (1024 * 1024) +#define VHDX_LOG_SECTOR_SIZE 4096 #define VHDX_LOG_HDR_SIZE 64 +#define VHDX_LOG_SIGNATURE 0x65676f6c typedef struct QEMU_PACKED VHDXLogEntryHeader { uint32_tsignature; /* loge in ASCII */ uint32_tchecksum; /* CRC-32C hash of the 64KB table */ @@ -174,7 +177,8 @@ typedef struct QEMU_PACKED VHDXLogEntryHeader { } VHDXLogEntryHeader; #define VHDX_LOG_DESC_SIZE 32 - +#define VHDX_LOG_DESC_SIGNATURE 0x63736564 +#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a typedef struct QEMU_PACKED VHDXLogDescriptor { uint32_tsignature; /* zero or desc in ASCII */ union { @@ -194,6 +198,7 @@ typedef struct QEMU_PACKED VHDXLogDescriptor { vhdx_log_entry_header */ } VHDXLogDescriptor; +#define VHDX_LOG_DATA_SIGNATURE 0x61746164 typedef struct QEMU_PACKED VHDXLogDataSector { uint32_tdata_signature; /* data in ASCII */ uint32_tsequence_high; /* 4 MSB of 8 byte sequence_number */ @@ -219,12 +224,12 @@ typedef struct QEMU_PACKED VHDXLogDataSector { #define SB_BLOCK_PRESENT6 /* per the spec */ -#define VHDX_MAX_SECTORS_PER_BLOCK (123) +#define VHDX_MAX_SECTORS_PER_BLOCK (1 23) /* upper 44 bits are the file offset in 1MB units lower 3 bits are the state other bits are reserved */ #define VHDX_BAT_STATE_BIT_MASK 0x07 -#define VHDX_BAT_FILE_OFF_BITS (64-44) +#define VHDX_BAT_FILE_OFF_BITS (64 - 44) typedef uint64_t VHDXBatEntry; /* METADATA REGION STRUCTURES */ @@ -252,8 +257,8 @@ typedef struct QEMU_PACKED VHDXMetadataTableEntry { metadata region */ /* note: if length = 0, so is offset */ uint32_tlength; /* length of metadata. = 1MB. */ -uint32_tdata_bits; /* least-significant 3 bits are flags, the - rest are reserved (see above) */ +uint32_tdata_bits; /* least-significant 3 bits are flags, + the rest are reserved (see
Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers
for (bh = ctx-first_bh; bh; bh = bh-next) { if (!bh-deleted bh-scheduled) { if (bh-idle) { /* idle bottom halves will be polled at least * every 10ms */ -*timeout = 10; +*timeout = qemu_soonest_timeout(*timeout, 10); glib will not set *timeout to a meaningful value before calling aio_ctx_prepare(), right? If so, *timeout = 10 should be used. I don't think that's correct. Each _prepare routine is called and has the abilitiy to alter the timeout but need not set it at all. Indeed it should not set it as there may already be a shorter timeout there. Here's the code before I touched it: aio_ctx_prepare(GSource *source, gint*timeout) { AioContext *ctx = (AioContext *) source; QEMUBH *bh; for (bh = ctx-first_bh; bh; bh = bh-next) { if (!bh-deleted bh-scheduled) { if (bh-idle) { /* idle bottom halves will be polled at least * every 10ms */ *timeout = 10; } else { /* non-idle bottom halves will be executed * immediately */ *timeout = 0; return true; } } } return false; } You'll note that what this does is: a) if there are no bottom halves, leaves *timeout as is b) if there is a non-idle bottom half, set timeout to 0 c) else set timeout to 10 if there are only idle bottom halves. Arguably (c) is a bug if timeout was shorter than 10 on entry but the whole of idle bhs are a bit of a bodge. This is fixed by my series. If you are asking WHERE it gets set to -1, I don't claim to be a glib expert but I believe it's the line gint source_timeout = -1 around line 2816 in glib/gmain.c -- Alex Bligh
Re: [Qemu-devel] vmdk stream-optimised format
On 20 Aug 2013, at 02:42, Fam Zheng wrote: OK, thanks for explaination. That sounds a valid use case for streamOptimized. However I am afraid QEMU and its users benefit not much from this feature anyway, because it's moving a VM away to VMware, :) that might be the reason it's not there yet, and I don't know about any plan to do it in the near future. Well, given it is an open source project, the more interoperability the better. Even if it just means users need not worry about lock in to faster hypervisors ... Being more serious, qemu-img is part of the project too. But if someone sends patches for this, I think it is possible to get in. I guessed send code might be the answer :-) What I'm not sure of is whether the streaming format has to be written sequentially from as opposed to random writes. I believe the way qemu-img convert works, one can't guarantee the writes are sequential. -- Alex Bligh
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
On 20.08.2013, at 02:36, Alexey Kardashevskiy wrote: On 08/19/2013 07:47 PM, Paolo Bonzini wrote: Il 19/08/2013 10:44, Alexey Kardashevskiy ha scritto: It means that if you use the same QEMU version with the same command line on a different kernel version, your guest looks different because we generate the dtb differently. Oh. Sorry for my ignorance again, I am not playing dump or anything like that - I do not understand how the device tree (which we cook in QEMU) on the destination can possibly survive migration and not to be overwritten by the one from the source. What was in the destination RAM before migration does not matter at all (including dt), QEMU device tree is what matters but this does not change. As it is the same QEMU version, hypercalls are supported anyway, the only difference where they will be handled - in the host kernel or QEMU. What do I miss? Nothing, I just asked to test that handling the hypercall in QEMU works. Well, I was asking rather Alex :) On x86 we have a similar problem, though with cpuid bits instead of the device tree. An older kernel might not support some cpuid bits, thus -cpu SandyBridge might have different cpuid bits depending on the host processor and kernel version. This is handled by having an enforce mode where -cpu SandyBridge,enforce will fail to start if the host processor or the kernel is not new enough. Hm. Here we might have a problem like this is we decide to migrate from QEMU with this patch running on modern kernel to QEMU without this patch running on old kernel - for this we might want to be able to disable multi-tce via machine options on newer kernels. Do we care enough to add such a parameter or we just disable migration and that's it? The problem is not only contained to migration. Even if you just start up your VM with a different host kernel you get a different guest environment, so you potentially get change lurking in that can kill reproducability. Imagine you're Amazon EC2: You don't want people to get any idea what host they're running on. If the pseries target was a mature, well established and used target, I'd add a machine option multi-tce with 3 possibilities: on - force multi-tce exposing on off - force multi-tce exposing off unset - use your current detection code That way libvirt for example can decide that it wants to nail down TCE support throughout a cluster. It's really the same as the cpu,enforce mode, just on a machine level rather than for cpuid bits. However, considering the current user base of KVM on pseries I think it's fine to just declare newer QEMU on older KVM as slower because it doesn't use the in-kernel multi-tce support and call it a day. It makes everyone's life a _lot_ easier. Or are you aware of any products using older kernels that are going to run QEMU 1.7 and above and won't change the kernel as well as they bump up the version? This SandyBridge,enforce - what if the destination host running on old kernel was run without this option - will the migration fail? What is the Migration requires you to use the same command line on both ends of the migration. Unfortunately in only enforces it implicitly - one of the protocol's shortcomings - but that's the idea. Alex mechanism? Do machine options migrate? I looked at target-i386/cpu.c but did not see the quick answer. But in this case, you do not need this because the hypercall works if emulated by QEMU. I like Alex's solution of making it universally available in the dtb. The solution would be good if we did not already have H_PUT_TCE accelerated for emulated devices in the host kernel but we do have it. -- Alexey
Re: [Qemu-devel] [PATCH v2 1/9] tests: QAPI schema parser tests
Laszlo Ersek ler...@redhat.com writes: On 07/27/13 17:41, Markus Armbruster wrote: The parser handles erroneous input badly. To be improved shortly. Signed-off-by: Markus Armbruster arm...@redhat.com diff --git a/tests/Makefile b/tests/Makefile index cdbb79e..ddb957c 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -233,13 +242,24 @@ check-report.html: check-report.xml check-tests/qemu-iotests-quick.sh: tests/qemu-iotests-quick.sh qemu-img$(EXESUF) qemu-io$(EXESUF) $ +.PHONY: check-tests/test-qapi.py +check-tests/test-qapi.py: tests/test-qapi.py + +.PHONY: $(patsubst %, check-%, $(check-qapi-schema-y)) +$(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: $(SRC_PATH)/%.json +$(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) $(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.out 2$*.err; echo $$? $*.exit, TEST $*.out) +@diff -q $(SRC_PATH)/$*.out $*.out +@diff -q $(SRC_PATH)/$*.err $*.err +@diff -q $(SRC_PATH)/$*.exit $*.exit + Unfortunately, this doesn't catch errors/differences when someone builds qemu in the source tree, and runs make check there. In that case, whatever output test-qapi.py produces, overwrites the in-tree file, and the diff commands compare files with themselves. D'uh! Will fix. Thanks! Also, why is 2/9 a good idea, namely, using qapi-schema-test.json as a test file itself? Whoever adds a new test, for anything that uses the schema, now has to update the .out file as well. The parser needs to be tested with input that exercises all schema features. Since such a test already existed, I extended it to additionally test the parser. If this coupling turns out to be inconvenient, we can split the test in two: qapi-schema-test.json goes back to just its original purpose, and a (possibly simplified) copy is used for testing the parser. Drawback: when you add schema features, you have to update both tests to cover them. Neglecting to add any tests is obvious in review. Forgetting one of two not so much. (And, due to the above, only realizes this burden when someone else tries to make-check his/her code outside of the tree...) Anyone posting patches without running make check gets exactly what he asked for: embarrassment :)
Re: [Qemu-devel] [PATCH RESEND] RCU implementation for Qemu. Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system.
Eric's advice in Message-ID: 5212712c.1080...@redhat.com applies: Your subject line is atrociously long. Please put a blank line between the summary (ca. 60 characters or less) and the rest of your commit message. 'git shortlog -30' will give you a hint on typical summary naming.
Re: [Qemu-devel] vmdk stream-optimised format
On Tue, 08/20 07:51, Alex Bligh wrote: On 20 Aug 2013, at 02:42, Fam Zheng wrote: OK, thanks for explaination. That sounds a valid use case for streamOptimized. However I am afraid QEMU and its users benefit not much from this feature anyway, because it's moving a VM away to VMware, :) that might be the reason it's not there yet, and I don't know about any plan to do it in the near future. Well, given it is an open source project, the more interoperability the better. Even if it just means users need not worry about lock in to faster hypervisors ... Being more serious, qemu-img is part of the project too. But if someone sends patches for this, I think it is possible to get in. I guessed send code might be the answer :-) What I'm not sure of is whether the streaming format has to be written sequentially from as opposed to random writes. I believe the way qemu-img convert works, one can't guarantee the writes are sequential. The order of sectors doesn't matter, but granularity should be aligned to, as the data is compressed cluster by cluster. And no overwrite, of course. The challenge may be that header comes at the end of file (well, called footer), which is not decided at create time. Thanks, Fam
Re: [Qemu-devel] [PATCH v2] monitor: print the invalid char in error message
Fam Zheng f...@redhat.com writes: It's more friendly to print which char is invalid to user, especially when user tries to input a float value and expect the monitor to round it to int. Since we don't round float number when we look for a integer, telling which char is invalid is less confusing. Signed-off-by: Fam Zheng f...@redhat.com --- monitor.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/monitor.c b/monitor.c index 5dc0aa9..da9c9a2 100644 --- a/monitor.c +++ b/monitor.c @@ -3171,9 +3171,13 @@ static const MonitorDef monitor_defs[] = { { NULL }, }; -static void expr_error(Monitor *mon, const char *msg) +static void expr_error(Monitor *mon, const char *fmt, ...) { -monitor_printf(mon, %s\n, msg); +va_list ap; +va_start(ap, fmt); +monitor_vprintf(mon, fmt, ap); +monitor_printf(mon, \n); +va_end(ap); siglongjmp(expr_env, 1); } @@ -3291,7 +3295,7 @@ static int64_t expr_unary(Monitor *mon) expr_error(mon, number too large); } if (pch == p) { -expr_error(mon, invalid char in expression); +expr_error(mon, invalid char '%c' in expression, *p); } pch = p; while (qemu_isspace(*pch)) The user is still left guessing *where* the error is. But it's an incremental improvement. Reviewed-by: Markus Armbruster arm...@redhat.com
Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers
于 2013-8-20 14:48, Alex Bligh 写道: for (bh = ctx-first_bh; bh; bh = bh-next) { if (!bh-deleted bh-scheduled) { if (bh-idle) { /* idle bottom halves will be polled at least * every 10ms */ -*timeout = 10; +*timeout = qemu_soonest_timeout(*timeout, 10); glib will not set *timeout to a meaningful value before calling aio_ctx_prepare(), right? If so, *timeout = 10 should be used. I don't think that's correct. Each _prepare routine is called and has the abilitiy to alter the timeout but need not set it at all. Indeed it should not set it as there may already be a shorter timeout there. Here's the code before I touched it: aio_ctx_prepare(GSource *source, gint*timeout) { AioContext *ctx = (AioContext *) source; QEMUBH *bh; for (bh = ctx-first_bh; bh; bh = bh-next) { if (!bh-deleted bh-scheduled) { if (bh-idle) { /* idle bottom halves will be polled at least * every 10ms */ *timeout = 10; } else { /* non-idle bottom halves will be executed * immediately */ *timeout = 0; return true; } } } return false; } You'll note that what this does is: a) if there are no bottom halves, leaves *timeout as is b) if there is a non-idle bottom half, set timeout to 0 c) else set timeout to 10 if there are only idle bottom halves. Arguably (c) is a bug if timeout was shorter than 10 on entry but the whole of idle bhs are a bit of a bodge. This is fixed by my series. If you are asking WHERE it gets set to -1, I don't claim to be a glib expert but I believe it's the line gint source_timeout = -1 around line 2816 in glib/gmain.c Thanks for the explanation. It seems *timeout is always set to -1 before calling GSource's prepare(), so *timeout = qemu_soonest_timeout(*timeout, 10); will always get *timeout = 10, so this call can be saved. -- Best Regards Wenchao Xia
Re: [Qemu-devel] [Bug?] qemu-1.6.0 python traceback in GEN qmp-commands.h
On August 19, 2013 at 6:15 PM Erik Rull erik.r...@rdsoftware.de wrote: Luiz Capitulino wrote: On Fri, 16 Aug 2013 14:21:50 +0100 Peter Maydell peter.mayd...@linaro.org wrote: On 16 August 2013 08:59, Erik Rull erik.r...@rdsoftware.de wrote: Hi all, when using the released qemu-1.6.0.tar.bz2, I get the following error message: File /home/erik/qemu-1.6.0/scripts/qapi.py, line 164 except QAPISchemaError as e: ^ SyntaxError: invalid syntax make: *** [qmp-commands.h] Error 1 My guess is that your python is older than 2.6; I think the except Foo as e syntax is new in 2.6. We probably missed this because most people use a newer Python than 2.6, but configure's check only requires 2.4 or better. We should probably update the scripts to not use overly new features (or alternatively decide that 2.6 is our new minimum -- what do RHEL5 and our other oldest-supported distros ship?) For this specific case I think it needs to change to except QAPISchemaError, e: Erik, can you try that and post a patch? Would be interesting to know if this is the only problem with older python. Yes, I will try that. I never really tried to send patches to this list... My python version is 2.4 - as it was assumed already. Best regards, Erik This fixes it - it compiles successfully, but my guest no longer boots up completely! Windows XP gets a bluescreen and reboots in an infinite loop. Strange is: I was requested to put some efi* files now on my target system for handling the network cards (qemu complains at startup via stderr when I don't have them available on my target system). But why? Where can I select to use the pxe* files? There seems to be no possibility to select them via ./configure or as qemu command line option. Maybe this is related to the bluescreen? 1.2.0 was working properly. Best regards, Erik
[Qemu-devel] [PATCH 1/2] tests: Fix schema parser test for in-tree build
From: Markus Armbruster arm...@redhat.com Commit 4f193e3 added the test, but screwed up in-tree builds (SRCDIR=.): the tests's output overwrites the expected output, and is thus compared to itself. Reported-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/.gitignore | 1 + tests/Makefile | 8 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/tests/.gitignore b/tests/.gitignore index fb05c2a..d9c2ef4 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -19,3 +19,4 @@ test-thread-pool test-x86-cpuid test-xbzrle *-test +qapi-schema/*.test.* diff --git a/tests/Makefile b/tests/Makefile index d044908..ad98439 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -247,10 +247,10 @@ check-tests/test-qapi.py: tests/test-qapi.py .PHONY: $(patsubst %, check-%, $(check-qapi-schema-y)) $(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: $(SRC_PATH)/%.json - $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) $(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.out 2$*.err; echo $$? $*.exit, TEST $*.out) - @diff -q $(SRC_PATH)/$*.out $*.out - @diff -q $(SRC_PATH)/$*.err $*.err - @diff -q $(SRC_PATH)/$*.exit $*.exit + $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) $(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.test.out 2$*.test.err; echo $$? $*.test.exit, TEST $*.out) + @diff -q $(SRC_PATH)/$*.out $*.test.out + @diff -q $(SRC_PATH)/$*.err $*.test.err + @diff -q $(SRC_PATH)/$*.exit $*.test.exit # Consolidated targets -- 1.8.1.4
[Qemu-devel] [PATCH 0/2] tests: Fixes for in-tree build
From: Markus Armbruster arm...@redhat.com Markus Armbruster (2): tests: Fix schema parser test for in-tree build tests: Update .gitignore for test-int128 and test-bitops tests/.gitignore | 3 +++ tests/Makefile | 8 2 files changed, 7 insertions(+), 4 deletions(-) -- 1.8.1.4
[Qemu-devel] [PATCH 2/2] tests: Update .gitignore for test-int128 and test-bitops
From: Markus Armbruster arm...@redhat.com Forgotten in commit 6046c62 and 3464700. Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/.gitignore | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/.gitignore b/tests/.gitignore index d9c2ef4..9ac044d 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -5,8 +5,10 @@ check-qjson check-qlist check-qstring test-aio +test-bitops test-cutils test-hbitmap +test-int128 test-iov test-mul64 test-qapi-types.[ch] -- 1.8.1.4
Re: [Qemu-devel] [PATCH 7/7] switch raw block driver from raw.o to raw_bsd.o
Am 18.08.2013 um 16:29 hat Paolo Bonzini geschrieben: Il 16/08/2013 16:15, Laszlo Ersek ha scritto: +static int raw_reopen_prepare(BDRVReopenState *reopen_state, + BlockReopenQueue *queue, Error **errp) { -return bdrv_reopen_prepare(bs-file); +BDRVReopenState tmp = *reopen_state; + +tmp.bs = tmp.bs-file; +return bdrv_reopen_prepare(tmp, queue, errp); } This should just return zero, my fault. Which is because bdrv_reopen_queue() already queues bs-file for reopen. The simple return 0; implementation is shared by all other format drivers that support reopening images. Kevin
Re: [Qemu-devel] [PATCH 4/7] raw_bsd: introduce special members
Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben: On 08/05/13 15:03, Paolo Bonzini wrote: [...] 3) These members are special .format_name is the string raw .bdrv_open raw_open should set bs-sg to bs-file-sg and return 0 .bdrv_closeraw_close should do nothing .bdrv_proberaw_probe should just return 1. Signed-off-by: Laszlo Ersek ler...@redhat.com --- block/raw_bsd.c | 20 1 files changed, 20 insertions(+), 0 deletions(-) diff --git a/block/raw_bsd.c b/block/raw_bsd.c index 5bcbe71..86e018d 100644 --- a/block/raw_bsd.c +++ b/block/raw_bsd.c @@ -112,3 +112,23 @@ static TYPE raw_create(void) { return bdrv_create_file(); } + +static const char *raw_format_name(void) +{ +return raw; +} + +static int raw_open(BlockDriverState *bs) +{ +bs-sg = bs-file-sg; I know that Paolo explicitly made this requirement, but I think it is ugly. We should instead fix bdrv_is_sg() to look at bs-file if it exists, like many other functions already do, and change the two readers of bs-sg in block.c to use bdrv_is_sg(). +return 0; +} + +static void raw_close(void) +{ +} + +static int raw_probe(void) +{ +return 1; +} Maybe add a comment here like smallest possible positive score so that raw is used if and only if no other block driver works. Kevin
Re: [Qemu-devel] [PATCH 0/3] target-ppc: Add support for dumping guest memory using qemu gdb server
Andreas Färber afaer...@suse.de writes: Hi Aneesh, Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V: This patch series implement support for dumping guest memory using qemu gdb server. I had a quick look through but will leave in-depth review to Alex or Anthony. Do you plan to implement dumping guest memory via QMP, too? Are you looking at memsave command ? That would fail before. This patch series should fix that too. For memsave to fail we need the below patch diff --git a/cpus.c b/cpus.c index 0f65e76..3340150 100644 --- a/cpus.c +++ b/cpus.c @@ -1309,7 +1309,10 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename, l = sizeof(buf); if (l size) l = size; -cpu_memory_rw_debug(cpu, addr, buf, l, 0); +if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) { +error_set(errp, QERR_IO_ERROR); +goto exit; +} if (fwrite(buf, 1, l, f) != l) { error_set(errp, QERR_IO_ERROR); goto exit;
Re: [Qemu-devel] [PATCH 0/7] introduce BSD-licensed block driver for raw
Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben: Paolo asked me to write such a driver based on his textual specification alone. The first patch captures his email in full, the rest re-quotes parts that are being implemented. The tree compiles at each patch. The series passes make check-block. block/raw.c is not removed because I wanted to keep it out of my series and out of my brain. Disclaimer: I couldn't care less if the raw block driver was public domain or AGPLv3+, as long as it qualifies as free software. I'm only trying to do what Paolo asked of me. Laszlo Ersek (7): add skeleton for BSD licensed raw BlockDriver raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev() raw_bsd: add raw_create() raw_bsd: introduce special members raw_bsd: add raw_create_options raw_bsd: register bdrv_raw switch raw block driver from raw.o to raw_bsd.o block/Makefile.objs |2 +- block/raw_bsd.c | 186 +++ 2 files changed, 187 insertions(+), 1 deletions(-) create mode 100644 block/raw_bsd.c Reviewed if the individual added functions make sense, whether all necessary function from struct BlockDriver are implemented, and which fields from BlockDriverState need special handling (it's only bs-sg, and we should probably get rid of that requirement) Looks good in general, but please CC Stefan and me for v2 (like for all block patches). Kevin
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
Il 20/08/2013 03:36, Alexey Kardashevskiy ha scritto: Hm. Here we might have a problem like this is we decide to migrate from QEMU with this patch running on modern kernel to QEMU without this patch running on old kernel - for this we might want to be able to disable multi-tce via machine options on newer kernels. Do we care enough to add such a parameter or we just disable migration and that's it? Upstream doesn't support migration to older QEMU. This SandyBridge,enforce - what if the destination host running on old kernel was run without this option - will the migration fail? The destination machine will not even start. But in this case, you do not need this because the hypercall works if emulated by QEMU. I like Alex's solution of making it universally available in the dtb. The solution would be good if we did not already have H_PUT_TCE accelerated for emulated devices in the host kernel but we do have it. The question is also whether you consider pSeries support complete enough to be production ready---and until you have versioned machine types I would say you don't. If you still consider it somewhat experimental, I would do as Alex said: make newer QEMU on older KVM as slower, and that's it. Paolo
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
On 08/20/2013 06:39 PM, Paolo Bonzini wrote: Il 20/08/2013 03:36, Alexey Kardashevskiy ha scritto: Hm. Here we might have a problem like this is we decide to migrate from QEMU with this patch running on modern kernel to QEMU without this patch running on old kernel - for this we might want to be able to disable multi-tce via machine options on newer kernels. Do we care enough to add such a parameter or we just disable migration and that's it? Upstream doesn't support migration to older QEMU. Hm. That makes things simpler. Then I do not understand why we need migration protocol versions as QEMU version in stronger version for migration. Ah, offtopic. This SandyBridge,enforce - what if the destination host running on old kernel was run without this option - will the migration fail? The destination machine will not even start. But in this case, you do not need this because the hypercall works if emulated by QEMU. I like Alex's solution of making it universally available in the dtb. The solution would be good if we did not already have H_PUT_TCE accelerated for emulated devices in the host kernel but we do have it. The question is also whether you consider pSeries support complete enough to be production ready---and until you have versioned machine types I would say you don't. If you still consider it somewhat experimental, I would do as Alex said: make newer QEMU on older KVM as slower, and that's it. Sorry if I miss anything, but is not it what the patch already does? :) If so, I'll repost this patch + traces rework tomorrow or so. -- Alexey
Re: [Qemu-devel] [PATCH v2 6/7] vl: Set current_machine early
Andreas Färber afaer...@suse.de writes: Am 19.08.2013 11:35, schrieb Markus Armbruster: Andreas Färber afaer...@suse.de writes: Am 16.08.2013 15:18, schrieb arm...@redhat.com: From: Markus Armbruster arm...@redhat.com I'd like to access QEMUMachine from a QEMUMachine init() method, which is currently not possible. Instead of passing it as an argument, I simply set current_machine earlier. We had such a patch for CPU hot-add and decided against doing this. Currently current_machine != signals that it has been initialized. And Does any code actually depend on this undocumented condition? I found none. I didn't audit. Currently the users are limited to vl.c itself, device-hotplug.c for block_default_type and qmp.c for hot_add_cpu. pc.c feels odd in that mix. [...] Can't you pass either QEMUMachine or the specific fields needed from PC code to those SMBIOS functions? You did add a bool argument. Can't see how to do that without passing the machine to QEMUMachine method init(), which involves touching all boards. I doubt that's a good idea, but if you insist, I can do it. Isn't that exactly what QEMUMachineArgs was meant to address? :) Had a look at your don't-explode patches and they looked good. I could give it a try, but to be honest, I'm reluctant to base new work on a series that has been ignored by all committers for more than two months.
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
Il 20/08/2013 10:49, Alexey Kardashevskiy ha scritto: On 08/20/2013 06:39 PM, Paolo Bonzini wrote: Il 20/08/2013 03:36, Alexey Kardashevskiy ha scritto: Hm. Here we might have a problem like this is we decide to migrate from QEMU with this patch running on modern kernel to QEMU without this patch running on old kernel - for this we might want to be able to disable multi-tce via machine options on newer kernels. Do we care enough to add such a parameter or we just disable migration and that's it? Upstream doesn't support migration to older QEMU. Hm. That makes things simpler. Then I do not understand why we need migration protocol versions as QEMU version in stronger version for migration. Ah, offtopic. Because you have multiple downstreams that can backport arbitrary crap on top of the same base QEMU version. In RHEL for example we do some effort to force usage of older version IDs (or to forcibly leave out subsections) depending on the machine type you're using. So even though upstream is limited in the kind of migrations we support, we provide the infrastructure for downstreams to give stronger guarantees. This SandyBridge,enforce - what if the destination host running on old kernel was run without this option - will the migration fail? The destination machine will not even start. But in this case, you do not need this because the hypercall works if emulated by QEMU. I like Alex's solution of making it universally available in the dtb. The solution would be good if we did not already have H_PUT_TCE accelerated for emulated devices in the host kernel but we do have it. The question is also whether you consider pSeries support complete enough to be production ready---and until you have versioned machine types I would say you don't. If you still consider it somewhat experimental, I would do as Alex said: make newer QEMU on older KVM as slower, and that's it. Sorry if I miss anything, but is not it what the patch already does? :) No, you need to expose multitce unconditionally in the device tree. Paolo If so, I'll repost this patch + traces rework tomorrow or so.
Re: [Qemu-devel] [Xen-devel] [PATCH] Qemu-xen: HVM S3 bugfix
On Mon, Aug 19, 2013 at 3:47 PM, Liu, Jinsong jinsong@intel.com wrote: Ping? I know Ian and Stefano are doing something at ARM this week -- Anthony might be able to give a review, at least... -George
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
On Tue, 2013-08-20 at 11:09 +0200, Paolo Bonzini wrote: Sorry if I miss anything, but is not it what the patch already does? :) No, you need to expose multitce unconditionally in the device tree. If I'm not mistaken the multitce kernel side patches are still not upstream so I disagree. Exposing it will make Linux use it which means that anything running on 3.10 or 3.11 will become very slow. So no, multitce should not be exposed if KVM doesn't support it. Cheers, Ben.
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
Il 20/08/2013 11:13, Benjamin Herrenschmidt ha scritto: On Tue, 2013-08-20 at 11:09 +0200, Paolo Bonzini wrote: Sorry if I miss anything, but is not it what the patch already does? :) No, you need to expose multitce unconditionally in the device tree. If I'm not mistaken the multitce kernel side patches are still not upstream so I disagree. Exposing it will make Linux use it which means that anything running on 3.10 or 3.11 will become very slow. So no, multitce should not be exposed if KVM doesn't support it. Then you have to do it right and: - provide the infrastructure to enable/disable it from the command line (which will be enough design effort alone); - add pseries-1.6 as a synonym of pseries in 1.6.1; - add pseries-1.7 a synonym of pseries in master; - add a pseries-1.6 machine type in master that always disables it. Paolo
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
On Tue, 2013-08-20 at 11:15 +0200, Paolo Bonzini wrote: - provide the infrastructure to enable/disable it from the command line (which will be enough design effort alone); sight ... why do things simply when we can come up with a cathedral instead ? - add pseries-1.6 as a synonym of pseries in 1.6.1; - add pseries-1.7 a synonym of pseries in master; - add a pseries-1.6 machine type in master that always disables it.
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
Il 20/08/2013 11:20, Benjamin Herrenschmidt ha scritto: On Tue, 2013-08-20 at 11:15 +0200, Paolo Bonzini wrote: - provide the infrastructure to enable/disable it from the command line (which will be enough design effort alone); sight ... why do things simply when we can come up with a cathedral instead ? Uhm... I thought Alex and I were the one who proposed the simple things. You _will_ need to do the complicated stuff sooner or later, but it's probably early enough now to ignore it. And I'm not saying this out of love for big cathedrals, but out of lessons we learned the hard way for x86 (where we haven't gotten everything right yet, either). Paolo - add pseries-1.6 as a synonym of pseries in 1.6.1; - add pseries-1.7 a synonym of pseries in master; - add a pseries-1.6 machine type in master that always disables it.
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
On Tue, 2013-08-20 at 11:22 +0200, Paolo Bonzini wrote: Uhm... I thought Alex and I were the one who proposed the simple things. You _will_ need to do the complicated stuff sooner or later, but it's probably early enough now to ignore it. And I'm not saying this out of love for big cathedrals, but out of lessons we learned the hard way for x86 (where we haven't gotten everything right yet, either). I suppose if RH is going to deploy 3.10 and we aren't going to backport the multitce patches then there *might* be a case for supporting that combo specifically... but it's going to be that bad every time we add a new feature with a kernel counter part or start adding the gazillion little bits of PAPR that we are still missing ? Ben.
Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers
On 20 Aug 2013, at 08:19, Wenchao Xia wrote: Thanks for the explanation. It seems *timeout is always set to -1 before calling GSource's prepare(), so *timeout = qemu_soonest_timeout(*timeout, 10); will always get *timeout = 10, so this call can be saved. I believe that's incorrect too. glib *currently* calls the API that way, but there's nothing to say a future or past glub couldn't call each g_source with the same timeout pointer, with each g_source adjusting the timeout downwards. Or (for instance) call it with 0 for a non-blocking prepare. Therefore the implemented way is safer (only reducing the timeout). -- Alex Bligh
Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
于 2013-8-16 16:12, Wenchao Xia 写道: 于 2013-8-16 15:15, Wenchao Xia 写道: 于 2013-8-16 0:32, Michael Roth 写道: Quoting Michael Roth (2013-08-15 10:23:20) Quoting Wenchao Xia (2013-08-13 03:44:39) 于 2013-8-13 1:01, Michael Roth 写道: Quoting Paolo Bonzini (2013-08-12 02:30:28) 1) rename AioContext to AioSource. This is my major purpose, which declare it is not a context concept, and GMainContext is the entity represent the thread's activity. Note that the nested event loops in QEMU are _very_ different from glib nested event loops. In QEMU, nested event loops only run block layer events. In glib, they run all events. That's why you need AioContext. Would it be possible to use glib for our nested loops as well by just setting a higher priority for the AioContext GSource? Stefan and I were considering how we could make use of his drop ioflush patches to use a common mechanism to register fd events, but still allow us to distinguish between AioContext and non-AioContext for nested loops. I was originally thinking of using prepare() functions to filter out non-AioContext events, but that requires we implement on GSource's with that in mind, and non make use of pre-baked ones like GIOChannel's, and bakes block stuff into every event source implementation. Besides priority, also g_source_set_can_recurse() can help. With a deeper think, I found a harder problem: g_main_context_acquire() and g_main_context_release(). In release, pending BH/IO call back need to be cleared, but this action can't be triggered automatically when user call g_main_context_release(). I don't understand why this is a requirement, gmctx_acquire/release ensure that only one thread attempts to iterate the main loop at a time. this isn't currently an issue in qemu, and if we re-implemented qemu_aio_wait() to use the same glib interfaces, the tracking of in-flight requests would be moved to the block layer via Stefan's 'drop io_flush' patches, which moves that block-specific logic out of the main loop/AioContext GSource by design. Are there other areas where you see this as a problem? I think I understand better what you're referring to, you mean that if qemu_aio_wait was called, and was implementated to essentially call g_main_context_iterate(), that after, say, 1 iteration, the iothread/dataplane thread might acquire the main loop and dispatch block/non-block events between qemu_aio_wait() returned? The simple approach would be to have qemu_aio_wait() call g_main_context_acquire/release at the start end of the function, which would ensure that this never happens. qemu_aio_wait() is relative simple to resolve by g_main_context_acquire(), but also shows additional code needed for a thread switch: (guess following is the plan to implement qemu_aio_wait()) qemu_aio_wait(GMainContext *ctx) { return g_main_context(ctx, PRI_BLOCK); } at caller: { while (qemu_aio_wait(ctx) { ; } g_main_context_release(ctx); } The above code shows that, in *ctx switch operation, there is more than glib's own event loop API envolved, qemu_aio_wait(). So it referenced to a question: what data structure should be used to represent context concept and control the thread switching behavior? It is better to have a clear layer, GMainContext or GlibQContext, instead of GMainContext plus custom function. The caller reference to at least two: nested user block layer, flat user above block layer. In my opinion, this problem is brought by Gsource AioContext, Take the call path of bdrv_aio_readv(*bdrv_cb) on raw linux file as an example, there are aync following operations involved for AioContext: 1 the bdrv_cb() will be executed in bdrv_co_em_bh(). 2 bdrv_co_io_em_complete() will be executed in event_notfier_ready(). 3 aio_worker() will be executed in worker_thread(). Operation 2 and 3 are tracked by block layer's queue after Stefan's dropping io_flush() series. Now if we stick to GMainContext to represent context concept, then when thread B want to aquire GMainContext used by thread A, all works setupped by A should be finished before B aquire it, otherwise B will execute some function supposed to work in A. The work refers to the three kind of aync functions above. For this issue, we can solve it by different means: 1 the event loop API doesn't guarentee work setupped by thread A will always be finished in A. This put a limitation to caller to consider thread switching. I talked on IRC with Stefan, and thinks it is possible for the nested user block layer, but I still want to avoid this to the flat user above block layer. 2 ask caller to finish all pending operations. 2.1 glib GMainContext API plus custom API such as aio_wait(). This is bad that detail under GMainContext is exposed to caller. Since operation 1 mentioned before is not tracked yet, to make sure bdrv_cb() is called in thread setupped it, 1 need to be added in the track queue, or in the call chain of aio_wait(), check the queue plus
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
Il 20/08/2013 11:26, Benjamin Herrenschmidt ha scritto: On Tue, 2013-08-20 at 11:22 +0200, Paolo Bonzini wrote: Uhm... I thought Alex and I were the one who proposed the simple things. You _will_ need to do the complicated stuff sooner or later, but it's probably early enough now to ignore it. And I'm not saying this out of love for big cathedrals, but out of lessons we learned the hard way for x86 (where we haven't gotten everything right yet, either). I suppose if RH is going to deploy 3.10 and we aren't going to backport the multitce patches then there *might* be a case for supporting that combo specifically... but it's going to be that bad every time we add a new feature with a kernel counter part or start adding the gazillion little bits of PAPR that we are still missing ? Yes. :( Unless you consider pSeries KVM not mature enough to provide a guest ABI (basically supporting live migration only between identical kernels and QEMUs). It would make some sense, for example (mutatis mutandis) Red Hat has a kernel ABI for the regular RHEL kernel, but not for the real-time RHEL kernel. Paolo
Re: [Qemu-devel] minimal linux distribution for qemu
On 08/18/2013 08:57:08 AM, Herbei Dacian wrote: good to know. I was working back in 2005-2006 with a company that had a 4MB kernel. At that time I was too inexperienced to work at that level but I thought now I could reproduce their work with some help. Anyhow for the moment I'll go for 256 MB of ram board just so that I don't worry too much about things that are not yet relevant for me. But thanks again for the warning. But since you helped me soo much I have another question. Is it fisible to change the emulator so that I may visualize the following aspects: _ address of the currently executed instruction from the guest system _ if this instruction is a form of jump like call return conditional jump. _ the address or range of addresses read by this instruction _ the address or range of addresses written by this instruction If you feed qemu the -s option it'll open a network port you can connect to to provide the gdbserver protocol (gdb's target remote command attaches to this). For system emulation it acts like a jtag attached to the emulated hardware, letting you see registers and such. I read some things about the emulator and if I understood it correctly the emulator breaks the instructions of the gurest platform in micro ops which are then executed on the host operation system. Not really, no. QEMU translates large blocks of code (used to be pages, now it's variable sized chunks depending on where the return instruction is) and keeps the translated versions cached (sort of like a java JIT). The main QEMU loop then calls the translated functions which execute until they return or get interrupted by signals (simulating things like timer IRQ). This is why QEMU is so fast, the actual translation overhead is amortized by the resulting native code being run lots of times, a function or loop gets translated once and then runs as native code. This means that the address of the currently executing instruction isn't really something qemu naturally tracks, because although there _is_ a copy of the untranslated code page, it's not what we're running. The gdbserver code tries to do so artifically, but it's slow and awkward and not always perfect. Self-modifying code is actually a horrible thing to do to qemu, from a performance perspective. Every time the emulated code page is modified, the cached copy of the translated code is discarded and the entire page gets retranslated. This means that in Aboriginal Linux, the shell scripts ./configure runs sped up 20% when I replaced my dynamically linked busybox with a statically linked one, due to the extra translations caused by the relocation fixups. Rob
Re: [Qemu-devel] [PATCH v2] Fix query-migrate documentation in qmp-commands.hx
On 08/19/2013 09:41 PM, Luiz Capitulino wrote: On Mon, 12 Aug 2013 10:19:52 -0400 Luiz Capitulino lcapitul...@redhat.com wrote: On Thu, 8 Aug 2013 20:05:48 +0300 Orit Wasserman owass...@redhat.com wrote: ram is present also when migration completes. expected-downtime, total-time and downtime are no longer part of ram data. Signed-off-by: Orit Wasserman owass...@redhat.com Applied to the qmp branch, thanks. This one missed the deadline for 1.6. I'm my updating my tree for 1.7, but this patch conflicts with 8f3067, which repeated for the setup-time key the mistake this patch is fixing for other keys. We need a respin for this patch, which should also move setup-time to the main dict. Orit, I can do it myself if you want. That will be great, Thanks, Orit --- qmp-commands.hx | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e59b0d..a22a841 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -2626,8 +2626,8 @@ The main json-object contains the following: - expected-downtime: only present while migration is active total amount in ms for downtime that was calculated on the last bitmap round (json-int) -- ram: only present if status is active, it is a json-object with the - following RAM information: +- ram: only present if status is active or complete, it is a + json-object with the following RAM information: - transferred: amount transferred in bytes (json-int) - remaining: amount remaining to transfer in bytes (json-int) - total: total amount of memory in bytes (json-int) @@ -2669,12 +2669,12 @@ Examples: - { execute: query-migrate } - { return: { status: completed, +total-time:12345, +downtime:12345, ram:{ transferred:123, remaining:123, total:246, - total-time:12345, - downtime:12345, duplicate:123, normal:123, normal-bytes:123456 @@ -2693,12 +2693,12 @@ Examples: - { return:{ status:active, + total-time:12345, + expected-downtime:12345, ram:{ transferred:123, remaining:123, total:246, -total-time:12345, -expected-downtime:12345, duplicate:123, normal:123, normal-bytes:123456 @@ -2712,12 +2712,12 @@ Examples: - { return:{ status:active, + total-time:12345, + expected-downtime:12345, ram:{ total:1057024, remaining:1053304, transferred:3720, -total-time:12345, -expected-downtime:12345, duplicate:123, normal:123, normal-bytes:123456 @@ -2736,13 +2736,13 @@ Examples: - { return:{ status:active, + total-time:12345, + expected-downtime:12345, capabilities : [ { capability: xbzrle, state : true } ], ram:{ total:1057024, remaining:1053304, transferred:3720, -total-time:12345, -expected-downtime:12345, duplicate:10, normal:, normal-bytes:3412992
Re: [Qemu-devel] [PATCH 0/2] tests: Fixes for in-tree build
Am 20.08.2013 09:47, schrieb arm...@redhat.com: From: Markus Armbruster arm...@redhat.com Markus Armbruster (2): tests: Fix schema parser test for in-tree build tests: Update .gitignore for test-int128 and test-bitops Series Reviewed-by: Andreas Färber afaer...@suse.de but you should probably add Cc: qemu-sta...@nongnu.org to 1/2 at least. Andreas tests/.gitignore | 3 +++ tests/Makefile | 8 2 files changed, 7 insertions(+), 4 deletions(-) -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [PATCH -V2 3/4] target-ppc: Fix page table lookup with kvm enabled
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com With kvm enabled, we store the hash page table information in the hypervisor. Use ioctl to read the htab contents. Without this we get the below error when trying to read the guest address (gdb) x/10 do_fork 0xc0098660 do_fork: Cannot access memory at address 0xc0098660 (gdb) Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- target-ppc/kvm.c| 45 + target-ppc/kvm_ppc.h| 9 - target-ppc/mmu-hash64.c | 25 - 3 files changed, 69 insertions(+), 10 deletions(-) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index a629447..b590808 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1885,3 +1885,48 @@ int kvm_arch_on_sigbus(int code, void *addr) void kvm_arch_init_irq_routing(KVMState *s) { } + +int kvmppc_hash64_load_hpte(PowerPCCPU *cpu, uint64_t index, +target_ulong *hpte0, target_ulong *hpte1) +{ +int htab_fd; +struct kvm_get_htab_fd ghf; +struct kvm_get_htab_buf { +struct kvm_get_htab_header header; +/* + * Older kernel required one extra byte. + */ +unsigned long hpte[3]; +} hpte_buf; + +*hpte0 = 0; +*hpte1 = 0; +if (!cap_htab_fd) { +return 0; +} +/* + * At this point we are only interested in reading only bolted entries + */ +ghf.flags = KVM_GET_HTAB_BOLTED_ONLY; +ghf.start_index = index; +htab_fd = kvm_vm_ioctl(kvm_state, KVM_PPC_GET_HTAB_FD, ghf); +if (htab_fd 0) { +return htab_fd; +} + +if (read(htab_fd, hpte_buf, sizeof(hpte_buf)) 0) { +goto out; +} +/* + * We only requested for one entry, So we should get only 1 + * valid entry at the same index + */ +if (hpte_buf.header.n_valid != 1 || hpte_buf.header.index != index) { +goto out; +} +*hpte0 = hpte_buf.hpte[0]; +*hpte1 = hpte_buf.hpte[1]; +out: +close(htab_fd); +return 0; +} diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index 4ae7bf2..e25373a 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -42,7 +42,8 @@ int kvmppc_get_htab_fd(bool write); int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns); int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index, uint16_t n_valid, uint16_t n_invalid); - +int kvmppc_hash64_load_hpte(PowerPCCPU *cpu, uint64_t index, +target_ulong *hpte0, target_ulong *hpte1); #else static inline uint32_t kvmppc_get_tbfreq(void) @@ -181,6 +182,12 @@ static inline int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index, abort(); } +static inline int kvmppc_hash64_load_hpte(PowerPCCPU *cpu, uint64_t index, + target_ulong *hpte0, + target_ulong *hpte1) +{ +abort(); +} #endif #ifndef CONFIG_KVM diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c index 67fc1b5..4d8120c 100644 --- a/target-ppc/mmu-hash64.c +++ b/target-ppc/mmu-hash64.c @@ -302,17 +302,26 @@ static int ppc_hash64_amr_prot(CPUPPCState *env, ppc_hash_pte64_t pte) return prot; } -static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr pteg_off, +static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr hash, bool secondary, target_ulong ptem, ppc_hash_pte64_t *pte) { -hwaddr pte_offset = pteg_off; +uint64_t index; +hwaddr pte_offset; target_ulong pte0, pte1; int i; +pte_offset = (hash * HASH_PTEG_SIZE_64) env-htab_mask;; +index = (hash * HPTES_PER_GROUP) env-htab_mask; + for (i = 0; i HPTES_PER_GROUP; i++) { -pte0 = ppc_hash64_load_hpte0(env, pte_offset); -pte1 = ppc_hash64_load_hpte1(env, pte_offset); +if (kvm_enabled()) { +index += i; +kvmppc_hash64_load_hpte(ppc_env_get_cpu(env), index, pte0, pte1); +} else { +pte0 = ppc_hash64_load_hpte0(env, pte_offset); +pte1 = ppc_hash64_load_hpte1(env, pte_offset); +} if ((pte0 HPTE64_V_VALID) (secondary == !!(pte0 HPTE64_V_SECONDARY)) @@ -332,7 +341,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env, ppc_slb_t *slb, target_ulong eaddr, ppc_hash_pte64_t *pte) { -hwaddr pteg_off, pte_offset; +hwaddr pte_offset; hwaddr hash; uint64_t vsid, epnshift, epnmask, epn, ptem; @@ -367,8 +376,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env, vsid= TARGET_FMT_lx ptem= TARGET_FMT_lx hash= TARGET_FMT_plx \n, env-htab_base, env-htab_mask, vsid, ptem, hash); -pteg_off = (hash * HASH_PTEG_SIZE_64) env-htab_mask;
[Qemu-devel] [PATCH -V2 2/4] target-ppc: Use #define instead of opencoding SLB valid bit
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Use SLB_ESID_V instead of (1 27) in the code Reviewed-by: Andreas Färber afaer...@suse.de Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- target-ppc/mmu_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c index 5dd4e05..9c9132e 100644 --- a/target-ppc/mmu_helper.c +++ b/target-ppc/mmu_helper.c @@ -2061,7 +2061,7 @@ void helper_store_sr(CPUPPCState *env, target_ulong srnum, target_ulong value) /* ESID = srnum */ rb |= ((uint32_t)srnum 0xf) 28; /* Set the valid bit */ -rb |= 1 27; +rb |= SLB_ESID_V; /* Index = ESID */ rb |= (uint32_t)srnum; -- 1.8.1.2
[Qemu-devel] [PATCH -V2 0/4] target-ppc: Add support for dumping guest memory using qemu gdb server
This patch series implement support for dumping guest memory using qemu gdb server. With this patch series we can now do (gdb) x/4i htab_call_hpte_insert1 0xc00470d8 .htab_call_hpte_insert1:bl 0xc00470d8 .htab_call_hpte_insert1 0xc00470dc .htab_call_hpte_insert1+4: cmpdi r3,0 0xc00470e0 .htab_call_hpte_insert1+8: bge 0xc0047190 htab_pte_insert_ok 0xc00470e4 .htab_call_hpte_insert1+12: cmpdi r3,-2 (gdb) target remote localhost:1234 Remote debugging using localhost:1234 .plpar_hcall_norets () at arch/powerpc/platforms/pseries/hvCall.S:119 119 HCALL_INST_POSTCALL_NORETS (gdb) x/4i htab_call_hpte_insert1 0xc00470d8 .htab_call_hpte_insert1:bl 0xc005f8f0 pSeries_lpar_hpte_insert 0xc00470dc .htab_call_hpte_insert1+4: cmpdi r3,0 0xc00470e0 .htab_call_hpte_insert1+8: bge 0xc0047190 htab_pte_insert_ok 0xc00470e4 .htab_call_hpte_insert1+12: cmpdi r3,-2 (gdb) NOTE: We still don't support inserting breakpoints. Before Fix: (qemu) memsave 0xc00470d8 10 memdump Invalid parameter 'addr' (qemu) After fix: (qemu) memsave 0xc00470d8 10 memdump (qemu)
[Qemu-devel] [PATCH -V2 1/4] target-ppc: Update slb array with correct index values.
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Without this, a value of rb=0 and rs=0 results in replacing the 0th index. This can be observed when using gdb remote debugging support. (gdb) x/10i do_fork 0xc0085330 do_fork:Cannot access memory at address 0xc0085330 (gdb) This is because when we do the slb sync via kvm_cpu_synchronize_state, we overwrite the slb entry (0th entry) for 0xc0085330 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- target-ppc/kvm.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 30a870e..a629447 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs) /* Sync SLB */ #ifdef TARGET_PPC64 for (i = 0; i 64; i++) { -ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe, - sregs.u.s.ppc64.slb[i].slbv); +target_ulong rb = sregs.u.s.ppc64.slb[i].slbe; +/* + * KVM_GET_SREGS doesn't retun slb entry with slot information + * same as index. So don't depend on the slot information in + * the returned value. + */ +rb = ~0xfff; +/* + * use the array index as the slot + */ +rb |= i; +ppc_store_slb(env, rb, sregs.u.s.ppc64.slb[i].slbv); } #endif -- 1.8.1.2
[Qemu-devel] [PATCH -V2 4/4] target-ppc: Check for error on address translation in memsave command
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com When we translate the virtual address to physical check for error. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- cpus.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 0f65e76..658366d 100644 --- a/cpus.c +++ b/cpus.c @@ -1309,7 +1309,10 @@ void qmp_memsave(int64_t addr, int64_t size, const char *filename, l = sizeof(buf); if (l size) l = size; -cpu_memory_rw_debug(cpu, addr, buf, l, 0); +if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) { +error_set(errp, QERR_INVALID_PARAMETER, addr); +goto exit; +} if (fwrite(buf, 1, l, f) != l) { error_set(errp, QERR_IO_ERROR); goto exit; -- 1.8.1.2
Re: [Qemu-devel] vmdk stream-optimised format
Il 20/08/2013 09:08, Fam Zheng ha scritto: On Tue, 08/20 07:51, Alex Bligh wrote: On 20 Aug 2013, at 02:42, Fam Zheng wrote: OK, thanks for explaination. That sounds a valid use case for streamOptimized. However I am afraid QEMU and its users benefit not much from this feature anyway, because it's moving a VM away to VMware, :) that might be the reason it's not there yet, and I don't know about any plan to do it in the near future. Well, given it is an open source project, the more interoperability the better. Even if it just means users need not worry about lock in to faster hypervisors ... Being more serious, qemu-img is part of the project too. But if someone sends patches for this, I think it is possible to get in. I guessed send code might be the answer :-) What I'm not sure of is whether the streaming format has to be written sequentially from as opposed to random writes. I believe the way qemu-img convert works, one can't guarantee the writes are sequential. The order of sectors doesn't matter, but granularity should be aligned to, as the data is compressed cluster by cluster. And no overwrite, of course. The challenge may be that header comes at the end of file (well, called footer), which is not decided at create time. It doesn't look too different from what qemu-img convert -c does, except that you need to use the right -o incantation to specify the format type. Paolo
Re: [Qemu-devel] [RFC] Hardware acceleration for Windows and OS X?
Il 19/08/2013 22:17, Stefan Weil ha scritto: Hello, is anybody working on KVM or something equivalent for Windows and Mac OS X? I recently noticed that the QEMU based Android emulator uses KVM for Linux and HAXM, an Intel hypervisor, for Windows and Mac OS X. Would it be a good plan to merge HAXM support in latest QEMU (the Android QEMU is based on a rather old version)? It only supports Intel based hosts, is restricted to 32 bit guests (?) and free, but not open source. Considering the level of openness in both Windows and Mac OS X, a proprietary hypervisor would be the smallest of the problems for someone using those OSes... I guess patches are welcome---if they are of good quality, why not. The community doesn't have that many kernel hackers for Windows and Mac OS X. Paolo
Re: [Qemu-devel] [RFC] Hardware acceleration for Windows and OS X?
On 20 August 2013 11:56, Paolo Bonzini pbonz...@redhat.com wrote: I guess patches are welcome---if they are of good quality, why not. Well, sort of. Given the fairly minimal level of support we currently have for OSX (ie Andreas and I test it a bit and fix egregious breakage), I'd prefer it if patches for significant new features came attached to a developer who was going to stay around and help maintain the platform :-) -- PMM
Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers
于 2013-8-20 17:29, Alex Bligh 写道: On 20 Aug 2013, at 08:19, Wenchao Xia wrote: Thanks for the explanation. It seems *timeout is always set to -1 before calling GSource's prepare(), so *timeout = qemu_soonest_timeout(*timeout, 10); will always get *timeout = 10, so this call can be saved. I believe that's incorrect too. glib *currently* calls the API that way, but there's nothing to say a future or past glub couldn't call each g_source with the same timeout pointer, with each g_source adjusting the timeout downwards. Or (for instance) call it with 0 for a So it is an undefined value, should avoid use it? For example, other gsource have minimum timeout of 5, and aio_ctx_prepare() is called with 0, then returned timeout is 0, but it is supposed to work with timeout 5 as the glib doc described: https://developer.gnome.org/glib/2.32/glib-The-Main-Event-Loop.html#GSourceFuncs That is my opinion, but I haven't read other project's code about prepare() usage, so can't be sure. Guess maintainer can give a conclusion. non-blocking prepare. Therefore the implemented way is safer (only reducing the timeout). -- Best Regards Wenchao Xia
Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests
On Tue, 2013-08-20 at 12:14 +0200, Paolo Bonzini wrote: I suppose if RH is going to deploy 3.10 and we aren't going to backport the multitce patches then there *might* be a case for supporting that combo specifically... but it's going to be that bad every time we add a new feature with a kernel counter part or start adding the gazillion little bits of PAPR that we are still missing ? Yes. :( Unless you consider pSeries KVM not mature enough to provide a guest ABI (basically supporting live migration only between identical kernels and QEMUs). It would make some sense, for example (mutatis mutandis) Red Hat has a kernel ABI for the regular RHEL kernel, but not for the real-time RHEL kernel. Migration is somewhat less of an issue, and there is to consider what products will actually support KVM on Power. So at this early stage of the game, I would still play it by ear and stay flexible. When we have something we really need to have long term support for around the corner (or whatever we haven't announced yet :-) we'll backport what's needed. But yes, the minute we have that out, I think the problem will present itself, at which point we need to probably start considering features in batches to limit the explosion of individual options ... Cheers, Ben.
[Qemu-devel] [PATCH v2 1/2] tests: Fix schema parser test for in-tree build
From: Markus Armbruster arm...@redhat.com Commit 4f193e3 added the test, but screwed up in-tree builds (SRCDIR=.): the tests's output overwrites the expected output, and is thus compared to itself. Cc: qemu-sta...@nongnu.org Reported-by: Laszlo Ersek ler...@redhat.com Reviewed-by: Andreas Färber afaer...@suse.de Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/.gitignore | 1 + tests/Makefile | 8 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/tests/.gitignore b/tests/.gitignore index fb05c2a..d9c2ef4 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -19,3 +19,4 @@ test-thread-pool test-x86-cpuid test-xbzrle *-test +qapi-schema/*.test.* diff --git a/tests/Makefile b/tests/Makefile index d044908..ad98439 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -247,10 +247,10 @@ check-tests/test-qapi.py: tests/test-qapi.py .PHONY: $(patsubst %, check-%, $(check-qapi-schema-y)) $(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: $(SRC_PATH)/%.json - $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) $(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.out 2$*.err; echo $$? $*.exit, TEST $*.out) - @diff -q $(SRC_PATH)/$*.out $*.out - @diff -q $(SRC_PATH)/$*.err $*.err - @diff -q $(SRC_PATH)/$*.exit $*.exit + $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) $(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.test.out 2$*.test.err; echo $$? $*.test.exit, TEST $*.out) + @diff -q $(SRC_PATH)/$*.out $*.test.out + @diff -q $(SRC_PATH)/$*.err $*.test.err + @diff -q $(SRC_PATH)/$*.exit $*.test.exit # Consolidated targets -- 1.8.1.4
[Qemu-devel] [PATCH v2 2/2] tests: Update .gitignore for test-int128 and test-bitops
From: Markus Armbruster arm...@redhat.com Forgotten in commit 6046c62 and 3464700. Cc: qemu-sta...@nongnu.org Reviewed-by: Andreas Färber afaer...@suse.de Signed-off-by: Markus Armbruster arm...@redhat.com --- tests/.gitignore | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/.gitignore b/tests/.gitignore index d9c2ef4..9ac044d 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -5,8 +5,10 @@ check-qjson check-qlist check-qstring test-aio +test-bitops test-cutils test-hbitmap +test-int128 test-iov test-mul64 test-qapi-types.[ch] -- 1.8.1.4
Re: [Qemu-devel] [PATCH 0/2] tests: Fixes for in-tree build
Andreas Färber afaer...@suse.de writes: Am 20.08.2013 09:47, schrieb arm...@redhat.com: From: Markus Armbruster arm...@redhat.com Markus Armbruster (2): tests: Fix schema parser test for in-tree build tests: Update .gitignore for test-int128 and test-bitops Series Reviewed-by: Andreas Färber afaer...@suse.de but you should probably add Cc: qemu-sta...@nongnu.org to 1/2 at least. Good point. v2 sent with the tag added, just to make sure it gets picked up.
[Qemu-devel] [PATCH v2 0/2] tests: Fixes for in-tree build
From: Markus Armbruster arm...@redhat.com v2: Nominate for qemu-stable (Andreas) Markus Armbruster (2): tests: Fix schema parser test for in-tree build tests: Update .gitignore for test-int128 and test-bitops tests/.gitignore | 3 +++ tests/Makefile | 8 2 files changed, 7 insertions(+), 4 deletions(-) -- 1.8.1.4
Re: [Qemu-devel] [PATCH 0/3] target-ppc: Add support for dumping guest memory using qemu gdb server
Am 20.08.2013 10:19, schrieb Aneesh Kumar K.V: Andreas Färber afaer...@suse.de writes: Hi Aneesh, Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V: This patch series implement support for dumping guest memory using qemu gdb server. I had a quick look through but will leave in-depth review to Alex or Anthony. Do you plan to implement dumping guest memory via QMP, too? Are you looking at memsave command ? No, I literally meant the dump-guest-memory command or so, which is only implemented for x86 and s390x so far. Cf. target-i386/arch_{dump,memory_mapping}.c and target-s390x/arch_dump.c for examples. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] target-i386: Only provide CMOV and friends if feature bit set
Ping^2! This has been reviewed and I've checked that the patch still applies to master. thanks -- PMM On 25 July 2013 17:54, Peter Maydell peter.mayd...@linaro.org wrote: Ping! (patchwork url: http://patchwork.ozlabs.org/patch/259148/) thanks -- PMM On 15 July 2013 18:21, Peter Maydell peter.mayd...@linaro.org wrote: The instructions CMOVcc, FCMOVcc and F[U]COMI[P] should only be present if the CMOV feature bit is set. Add missing feature bit checks so we correctly fault if emulating a 486 or 586. This fixes bug LP:1201446. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- target-i386/translate.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/target-i386/translate.c b/target-i386/translate.c index 6550c27..f75e3b1 100644 --- a/target-i386/translate.c +++ b/target-i386/translate.c @@ -6434,12 +6434,18 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s, } break; case 0x1d: /* fucomi */ +if (!(s-cpuid_features CPUID_CMOV)) { +goto illegal_op; +} gen_update_cc_op(s); gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg)); gen_helper_fucomi_ST0_FT0(cpu_env); set_cc_op(s, CC_OP_EFLAGS); break; case 0x1e: /* fcomi */ +if (!(s-cpuid_features CPUID_CMOV)) { +goto illegal_op; +} gen_update_cc_op(s); gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg)); gen_helper_fcomi_ST0_FT0(cpu_env); @@ -6495,6 +6501,9 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s, } break; case 0x3d: /* fucomip */ +if (!(s-cpuid_features CPUID_CMOV)) { +goto illegal_op; +} gen_update_cc_op(s); gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg)); gen_helper_fucomi_ST0_FT0(cpu_env); @@ -6502,6 +6511,9 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s, set_cc_op(s, CC_OP_EFLAGS); break; case 0x3e: /* fcomip */ +if (!(s-cpuid_features CPUID_CMOV)) { +goto illegal_op; +} gen_update_cc_op(s); gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg)); gen_helper_fcomi_ST0_FT0(cpu_env); @@ -6518,6 +6530,10 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s, (JCC_BE 1), (JCC_P 1), }; + +if (!(s-cpuid_features CPUID_CMOV)) { +goto illegal_op; +} op1 = fcmov_cc[op 3] | (((op 3) 1) ^ 1); l1 = gen_new_label(); gen_jcc1_noeob(s, op1, l1); @@ -6889,6 +6905,9 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s, gen_ldst_modrm(env, s, modrm, OT_BYTE, OR_TMP0, 1); break; case 0x140 ... 0x14f: /* cmov Gv, Ev */ +if (!(s-cpuid_features CPUID_CMOV)) { +goto illegal_op; +} ot = dflag + OT_WORD; modrm = cpu_ldub_code(env, s-pc++); reg = ((modrm 3) 7) | rex_r; -- 1.7.9.5
Re: [Qemu-devel] [PATCH] hw/openrisc/openrisc_sim: Avoid using uninitialised variable 'entry'
Ping for qemu-trivial now 1.7 is open. thanks -- PMM On 5 August 2013 19:24, Peter Maydell peter.mayd...@linaro.org wrote: clang warns that cpu_openrisc_load_kernel() can use 'entry' uninitialized: hw/openrisc/openrisc_sim.c:69:9: error: variable 'entry' is used uninitialized whenever '' condition is false [-Werror,-Wsometimes-uninitialized] if (kernel_filename !qtest_enabled()) { ^~~ hw/openrisc/openrisc_sim.c:91:19: note: uninitialized use occurs here cpu-env.pc = entry; ^ Fix this by not attempting to change the CPU's starting PC unless we actually loaded a kernel. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/openrisc/openrisc_sim.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c index a08f27c..4595fa9 100644 --- a/hw/openrisc/openrisc_sim.c +++ b/hw/openrisc/openrisc_sim.c @@ -86,9 +86,9 @@ static void cpu_openrisc_load_kernel(ram_addr_t ram_size, kernel_filename); exit(1); } -} -cpu-env.pc = entry; +cpu-env.pc = entry; +} } static void openrisc_sim_init(QEMUMachineInitArgs *args) -- 1.7.9.5
Re: [Qemu-devel] [PATCH] configure: disable clang -Wstring-plus-int warning
Ping for qemu-trivial now 1.7 is open. thanks -- PMM On 5 August 2013 20:16, Peter Maydell peter.mayd...@linaro.org wrote: Some versions of clang will warn about adding integers to strings: disas/i386.c:4753:23: error: adding 'char' to a string does not append to the string [-Werror,-Wstring-plus-int] oappend (%es: + intel_syntax); ~~~^~ disas/i386.c:4753:23: note: use array indexing to silence this warning oappend (%es: + intel_syntax); ^ [ ] disas/i386.c uses this idiom to to skip a % prefix if using intel rather than ATT syntax. This seems like a reasonable thing to do, and I don't think anybody contributing to QEMU is likely to believe that '+' is a string concatenation operator in C, so just disable -Wstring-plus-int. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- configure |1 + 1 file changed, 1 insertion(+) diff --git a/configure b/configure index a3bd9be..a1d1701 100755 --- a/configure +++ b/configure @@ -1204,6 +1204,7 @@ gcc_flags=-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers $gcc_ gcc_flags=-Wmissing-include-dirs -Wempty-body -Wnested-externs $gcc_flags gcc_flags=-Wendif-labels $gcc_flags gcc_flags=-Wno-initializer-overrides $gcc_flags +gcc_flags=-Wno-string-plus-int $gcc_flags # Note that we do not add -Werror to gcc_flags here, because that would # enable it for all configure tests. If a configure test failed due # to -Werror this would just silently disable some features, -- 1.7.9.5
Re: [Qemu-devel] [PATCH 00/14] ARM: make IRQ/FIQ gpio lines on CPU object
On 8 August 2013 17:04, Peter Maydell peter.mayd...@linaro.org wrote: This patch series makes IRQ and FIQ be inbound gpio lines on the ARMCPU object (which we can do now that it is a subclass of DeviceState). This allows us to drop the odd 'arm_pic' shim, which doesn't correspond to real hardware and existed purely to be a thing which exposed qemu_irqs Applying this to target-arm.next (since that lets me resolve the trivial textual conflict with the generic-timers patchset rather than having two pullreqs conflict). thanks -- PMM
Re: [Qemu-devel] [PATCH v2 0/4] target-arm: Implement support for generic timers
On 9 August 2013 17:17, Peter Maydell peter.mayd...@linaro.org wrote: This patch series implements support for the 'generic timers', which are a set of timers defined in the ARM Architecture Reference Manual and implemented by the Cortex-A15. We've got away without these up til now because Linux will generally fall back on whatever random timer is present on the devboard if it can't find the on-CPU timers, but this is less than ideal. (Among other things, the proposed mach-virt should just use the generic timers so it doesn't have to provide an sp804 timer.) Applying to target-arm.next. -- PMM
Re: [Qemu-devel] [PATCH] default-configs: Fix A9MP and A15MP config names
On 9 August 2013 14:50, Peter Maydell peter.mayd...@linaro.org wrote: When individual CONFIG_ switches for the A9MPcore and A15MPcore devices were created, they were inadvertently given incorrect names (CONFIG_ARM9MPCORE and CONFIG_ARM15MPCORE). These CPUs are Cortex-A9MP and Cortex-A15MP, and in particular the ARM9 is a different (rather older) CPU than the Cortex-A9. Rename the CONFIG_ switches to bring them into line with the source file names and CPU names. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Applying to target-arm.next. -- PMM
Re: [Qemu-devel] [PATCH] hw/openrisc/openrisc_sim: Avoid using uninitialised variable 'entry'
Hi Peter, On Tue, Aug 20, 2013 at 9:00 PM, Peter Maydell peter.mayd...@linaro.org wrote: Ping for qemu-trivial now 1.7 is open. Thank you, I'll send a PULL very soon. thanks -- PMM On 5 August 2013 19:24, Peter Maydell peter.mayd...@linaro.org wrote: clang warns that cpu_openrisc_load_kernel() can use 'entry' uninitialized: hw/openrisc/openrisc_sim.c:69:9: error: variable 'entry' is used uninitialized whenever '' condition is false [-Werror,-Wsometimes-uninitialized] if (kernel_filename !qtest_enabled()) { ^~~ hw/openrisc/openrisc_sim.c:91:19: note: uninitialized use occurs here cpu-env.pc = entry; ^ Fix this by not attempting to change the CPU's starting PC unless we actually loaded a kernel. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/openrisc/openrisc_sim.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c index a08f27c..4595fa9 100644 --- a/hw/openrisc/openrisc_sim.c +++ b/hw/openrisc/openrisc_sim.c @@ -86,9 +86,9 @@ static void cpu_openrisc_load_kernel(ram_addr_t ram_size, kernel_filename); exit(1); } -} -cpu-env.pc = entry; +cpu-env.pc = entry; +} } static void openrisc_sim_init(QEMUMachineInitArgs *args) -- 1.7.9.5 Regards, Jia
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
The QEMU command line (/var/log/libvirt/qemu/[domain name].log), LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,ser ver, n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw ,cac h e=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-dis k0,i d =virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci. 0 ,addr=0x3,bootindex=2 -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci. 0 ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci. 0 ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci. 0 ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci. 0 ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci. 0 ,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa Which QEMU version is this? Can you try with e1000 NICs instead of virtio? This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem exists, including the performance degradation and readonly GFNs' flooding. I tried with e1000 NICs instead of virtio, including the performance degradation and readonly GFNs' flooding, the QEMU version is 1.5.2. No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at post-restore stage (i.e. running stage), as soon as the restoring completed, the flooding is starting. Thanks, Zhang Haoyu -- Gleb. Should we focus on the first bad commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' flooding? Not really. There is no point in debugging very old version compiled with kvm-kmod, there are to many variables in the environment. I cannot reproduce the GFN flooding on upstream, so the problem may be gone, may be a result of kvm-kmod problem or something different in how I invoke qemu. So the best way to proceed is for you to reproduce with upstream version then at least I will be sure that we are using the same code. Thanks, I will test the combos of upstream kvm kernel and upstream qemu. And, the guest os version above I said was wrong, current running guest os is SLES10SP4. I tested below combos of qemu and kernel, +-+-+-+ | kvm kernel | QEMU | test result | +-+-+-+ | kvm-3.11-2 | qemu-1.5.2| GOOD | +-+-+-+ | SLES11SP2 | qemu-1.0.0| BAD| +-+-+-+ | SLES11SP2 | qemu-1.4.0| BAD| +-+-+-+ | SLES11SP2 | qemu-1.4.2| BAD| +-+-+-+ | SLES11SP2 | qemu-1.5.0-rc0 | GOOD | +-+-+-+ | SLES11SP2 | qemu-1.5.0| GOOD | +-+-+-+ | SLES11SP2 | qemu-1.5.1| GOOD | +-+-+-+ | SLES11SP2 | qemu-1.5.2| GOOD | +-+-+-+ NOTE: 1. above kvm-3.11-2 in the table is the whole tag kernel download from https://git.kernel.org/pub/scm/virt/kvm/kvm.git 2. SLES11SP2's kernel version is 3.0.13-0.27 Then I git bisect the qemu changes between qemu-1.4.2 and qemu-1.5.0-rc0 by marking the good version as bad, and the bad version as good, so the first bad commit is just the patch which fixes the degradation problem. ++---+-+-+ | bisect No. | commit | save-restore | migration|
Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers
On 20 Aug 2013, at 12:19, Wenchao Xia wrote: So it is an undefined value, should avoid use it? It's not an undefined value. It's the value that the poll should wait for subject to modification by the prepare call. -- Alex Bligh
Re: [Qemu-devel] vmdk stream-optimised format
On 20 Aug 2013, at 11:53, Paolo Bonzini wrote: It doesn't look too different from what qemu-img convert -c does, except that you need to use the right -o incantation to specify the format type. Extensive incanting did not produce a result that works. Possibly inadequate haruspication on my part. I suspect it's only a minor modification to support it. -- Alex Bligh
Re: [Qemu-devel] vmdk stream-optimised format
Il 20/08/2013 15:37, Alex Bligh ha scritto: On 20 Aug 2013, at 11:53, Paolo Bonzini wrote: It doesn't look too different from what qemu-img convert -c does, except that you need to use the right -o incantation to specify the format type. Extensive incanting did not produce a result that works. Possibly inadequate haruspication on my part. I suspect it's only a minor modification to support it. Yes, because vmdk does not implement bdrv_write_compressed. Sorry for leaving out the small detail. :( Paolo
Re: [Qemu-devel] [PATCH v2] monitor: print the invalid char in error message
On 08/19/2013 08:58 PM, Fam Zheng wrote: It's more friendly to print which char is invalid to user, especially when user tries to input a float value and expect the monitor to round it to int. Since we don't round float number when we look for a integer, telling which char is invalid is less confusing. Including a sample of the error message issued before and after this patch as part of the commit message is helpful, but not enough for me to require a respin. Signed-off-by: Fam Zheng f...@redhat.com --- monitor.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Call for agenda for 2013-08-20
Juan Quintela quint...@redhat.com wrote: Hi Please, send any topic that you are interested in covering. Call cancelled. As this was the only topic, and neither Frederik or Konrad are able to attend today, topic got moved to next call in two weeks. Thanks, Juan. Agenda so far: - Talk about qemu reverse executing (1st description was done this week) How to handle IO when we want to do reverse execution. How this relate to Kemari needs? And to icount changes? Call details: 10:00 AM to 11:00 AM EDT Every two weeks If you need phone number details, contact me privately.
Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
- Original Message - Set the guest numa nodes memory policies using the mbind(2) system call node by node. After this patch, we are able to set guest nodes memory policies through the QEMU options, this arms to solve the guest cross nodes memory access performance issue. And as you all know, if PCI-passthrough is used, direct-attached-device uses DMA transfer between device and qemu process. All pages of the guest will be pinned by get_user_pages(). KVM_ASSIGN_PCI_DEVICE ioctl kvm_vm_ioctl_assign_device() =kvm_assign_device() = kvm_iommu_map_memslots() = kvm_iommu_map_pages() = kvm_pin_pages() So, with direct-attached-device, all guest page's page count will be +1 and any page migration will not work. AutoNUMA won't too. So, we should set the guest nodes memory allocation policies before the pages are really mapped. Signed-off-by: Andre Przywara andre.przyw...@amd.com Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- numa.c | 89 ++ 1 file changed, 89 insertions(+) diff --git a/numa.c b/numa.c index 436b8e0..b2c0048 100644 --- a/numa.c +++ b/numa.c @@ -28,6 +28,16 @@ #include qapi-visit.h #include qapi/opts-visitor.h #include qapi/dealloc-visitor.h +#include exec/memory.h + +#ifdef CONFIG_NUMA +#include numa.h +#include numaif.h +#ifndef MPOL_F_RELATIVE_NODES +#define MPOL_F_RELATIVE_NODES (1 14) +#define MPOL_F_STATIC_NODES (1 15) +#endif +#endif QemuOptsList qemu_numa_opts = { .name = numa, @@ -209,6 +219,78 @@ void set_numa_nodes(void) } } +#ifdef CONFIG_NUMA +static int node_parse_bind_mode(unsigned int nodeid) +{ +int bind_mode; + +switch (numa_info[nodeid].policy) { +case NUMA_NODE_POLICY_MEMBIND: +bind_mode = MPOL_BIND; +break; +case NUMA_NODE_POLICY_INTERLEAVE: +bind_mode = MPOL_INTERLEAVE; +break; +case NUMA_NODE_POLICY_PREFERRED: +bind_mode = MPOL_PREFERRED; +break; +case NUMA_NODE_POLICY_DEFAULT: +default: +bind_mode = MPOL_DEFAULT; +return bind_mode; +} + +bind_mode |= numa_info[nodeid].relative ? +MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES; + +return bind_mode; +} +#endif + +static int set_node_mem_policy(int nodeid) +{ +#ifdef CONFIG_NUMA +void *ram_ptr; +RAMBlock *block; +ram_addr_t len, ram_offset = 0; +int bind_mode; +int i; + +QTAILQ_FOREACH(block, ram_list.blocks, next) { +if (!strcmp(block-mr-name, pc.ram)) { +break; +} +} + +if (block-host == NULL) { +return -1; +} + +ram_ptr = block-host; +for (i = 0; i nodeid; i++) { +len = numa_info[i].node_mem; +ram_offset += len; +} + +len = numa_info[i].node_mem; +bind_mode = node_parse_bind_mode(i); + +/* This is a workaround for a long standing bug in Linux' + * mbind implementation, which cuts off the last specified + * node. To stay compatible should this bug be fixed, we + * specify one more node and zero this one out. + */ +clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem); +if (mbind(ram_ptr + ram_offset, len, bind_mode, +numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) { +perror(mbind); +return -1; +} From my quick read of this patch series, I think these two calls of numa_num_configured_nodes() are the only places that libnuma is used. Is it really worth the new dependency? Actually libnuma will only calculate what it returns from numa_num_configured_nodes() once, because it simply counts bits in a bitmask that it only initializes at library load time. So it would be more robust wrt to node onlining/offlining to avoid libnuma and to just fetch information from sysfs as needed anyway. In this particular code though, I think replacing numa_num_configured_nodes() with a maxnode, where unsigned long maxnode = find_last_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS) would work the best. Another comment I have on this function is that I'd prefer to see something like unsigned long *nodes = numa_info[nodeid].host_mem; at the top, and then use that for a shorter name, rather than abusing the fact that i == nodeid after the loop, presumably just to keep the name short. drew +#endif + +return 0; +} + void set_numa_modes(void) { CPUState *cpu; @@ -221,4 +303,11 @@ void set_numa_modes(void) } } } + +for (i = 0; i nb_numa_nodes; i++) { +if (set_node_mem_policy(i) == -1) { +fprintf(stderr, +qemu: can not set host memory policy for node%d\n, i); +} +} } -- 1.8.4.rc1.21.gfb56570
[Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient
cow_co_is_allocated and cow_update_bitmap set bits by reading the relevant word, setting the specific bit in it and writing it back. These functions set a number of contiguous bits however, so this is an extremely inefficient way of doing this. This patch converts them to read the whole bitmap they need in one go, update it and then write it out, which should be much more more efficient. Signed-off-by: Charlie Shepherd char...@ctshepherd.com --- block/cow.c | 116 1 file changed, 62 insertions(+), 54 deletions(-) diff --git a/block/cow.c b/block/cow.c index 1cc2e89..87ebef6 100644 --- a/block/cow.c +++ b/block/cow.c @@ -102,84 +102,92 @@ static int cow_open(BlockDriverState *bs, QDict *options, int flags) return ret; } -/* - * XXX(hch): right now these functions are extremely inefficient. - * We should just read the whole bitmap we'll need in one go instead. - */ -static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum) -{ -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8; -uint8_t bitmap; -int ret; - -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; -} - -bitmap |= (1 (bitnum % 8)); - -ret = bdrv_pwrite_sync(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; -} -return 0; -} - -static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum) -{ -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8; -uint8_t bitmap; -int ret; - -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; -} - -return !!(bitmap (1 (bitnum % 8))); -} - /* Return true if first block has been changed (ie. current version is * in COW file). Set the number of continuous blocks for which that * is true. */ static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *num_same) { -int changed; +int ret, changed; +uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8; + +int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0; +int remaining = sector_num - init_bits; +int full_bytes = remaining / 8; +int trail = remaining % 8; + +int len = !!init_bits + full_bytes + !!trail; +uint8_t buf[len]; if (nb_sectors == 0) { - *num_same = nb_sectors; - return 0; +*num_same = nb_sectors; +return 0; } -changed = is_bit_set(bs, sector_num); -if (changed 0) { -return 0; /* XXX: how to return I/O errors? */ +ret = bdrv_pread(bs-file, offset, buf, len); +if (ret 0) { +return ret; } +#define is_bit_set(b) (!!(buf[(b)/8] (1 ((b) % 8 + +changed = is_bit_set(sector_num); for (*num_same = 1; *num_same nb_sectors; (*num_same)++) { - if (is_bit_set(bs, sector_num + *num_same) != changed) - break; +if (is_bit_set(sector_num + *num_same) != changed) { +break; +} } +#undef is_bit_set + return changed; } +/* Set the bits from sector_num to sector_num + nb_sectors in the bitmap of + * bs-file. */ static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { -int error = 0; -int i; +int ret; +uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8; -for (i = 0; i nb_sectors; i++) { -error = cow_set_bit(bs, sector_num + i); -if (error) { -break; -} +int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0; +int remaining = sector_num - init_bits; +int full_bytes = remaining / 8; +int trail = remaining % 8; + +int len = !!init_bits + full_bytes + !!trail; +uint8_t buf[len]; + +ret = bdrv_pread(bs-file, offset, buf, len); +if (ret 0) { +return ret; +} + +/* Do sector_num - nearest byte boundary */ +if (init_bits) { +/* This sets the highest init_bits bits in the byte */ +uint8_t bits = ((1 init_bits) - 1) (8 - init_bits); +buf[0] |= bits; +} + +if (full_bytes) { +memset(buf[!!init_bits], ~0, full_bytes); +} + +/* Set the trailing bits in the final byte */ +if (trail) { +/* This sets the lowest trail bits in the byte */ +uint8_t bits = (1 trail) - 1; +buf[len - 1] |= bits; +} + +ret = bdrv_pwrite_sync(bs-file, offset, buf, len); +if (ret 0) { +return ret; } -return error; +return 0; } static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num, -- 1.8.3.2
Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
Il 20/08/2013 03:07, Wanlong Gao ha scritto: -numa node,nodeid=0,cpus=0, \ -numa mem,size=1024M,policy=membind,host-nodes=0-1 \ -numa node,nodeid=1,cpus=1 \ -numa mem,size=1024M,policy=interleave,host-nodes=1 What nodes would the memory be in, for this command line? Does it just compute the total and split it evenly across the nodes (so that the -numa node options could omit nodeid and cpus too)? Also, do you still need a -m option if you use -numa mem? Paolo
Re: [Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient
Il 20/08/2013 15:42, Charlie Shepherd ha scritto: cow_co_is_allocated and cow_update_bitmap set bits by reading the relevant word, setting the specific bit in it and writing it back. These functions set a number of contiguous bits however, so this is an extremely inefficient way of doing this. This patch converts them to read the whole bitmap they need in one go, update it and then write it out, which should be much more more efficient. Signed-off-by: Charlie Shepherd char...@ctshepherd.com --- block/cow.c | 116 1 file changed, 62 insertions(+), 54 deletions(-) diff --git a/block/cow.c b/block/cow.c index 1cc2e89..87ebef6 100644 --- a/block/cow.c +++ b/block/cow.c @@ -102,84 +102,92 @@ static int cow_open(BlockDriverState *bs, QDict *options, int flags) return ret; } -/* - * XXX(hch): right now these functions are extremely inefficient. - * We should just read the whole bitmap we'll need in one go instead. - */ -static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum) -{ -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8; -uint8_t bitmap; -int ret; - -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; -} - -bitmap |= (1 (bitnum % 8)); - -ret = bdrv_pwrite_sync(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; -} -return 0; -} - -static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum) -{ -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8; -uint8_t bitmap; -int ret; - -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; -} - -return !!(bitmap (1 (bitnum % 8))); -} - /* Return true if first block has been changed (ie. current version is * in COW file). Set the number of continuous blocks for which that * is true. */ static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *num_same) { -int changed; +int ret, changed; +uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8; + +int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0; +int remaining = sector_num - init_bits; +int full_bytes = remaining / 8; +int trail = remaining % 8; + +int len = !!init_bits + full_bytes + !!trail; +uint8_t buf[len]; if (nb_sectors == 0) { - *num_same = nb_sectors; - return 0; +*num_same = nb_sectors; +return 0; } -changed = is_bit_set(bs, sector_num); -if (changed 0) { -return 0; /* XXX: how to return I/O errors? */ +ret = bdrv_pread(bs-file, offset, buf, len); +if (ret 0) { +return ret; } +#define is_bit_set(b) (!!(buf[(b)/8] (1 ((b) % 8 + +changed = is_bit_set(sector_num); for (*num_same = 1; *num_same nb_sectors; (*num_same)++) { - if (is_bit_set(bs, sector_num + *num_same) != changed) - break; +if (is_bit_set(sector_num + *num_same) != changed) { +break; +} } +#undef is_bit_set + return changed; } +/* Set the bits from sector_num to sector_num + nb_sectors in the bitmap of + * bs-file. */ static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { -int error = 0; -int i; +int ret; +uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8; -for (i = 0; i nb_sectors; i++) { -error = cow_set_bit(bs, sector_num + i); -if (error) { -break; -} +int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0; +int remaining = sector_num - init_bits; +int full_bytes = remaining / 8; +int trail = remaining % 8; + +int len = !!init_bits + full_bytes + !!trail; +uint8_t buf[len]; + +ret = bdrv_pread(bs-file, offset, buf, len); +if (ret 0) { +return ret; +} + +/* Do sector_num - nearest byte boundary */ +if (init_bits) { +/* This sets the highest init_bits bits in the byte */ +uint8_t bits = ((1 init_bits) - 1) (8 - init_bits); +buf[0] |= bits; +} + +if (full_bytes) { +memset(buf[!!init_bits], ~0, full_bytes); +} + +/* Set the trailing bits in the final byte */ +if (trail) { +/* This sets the lowest trail bits in the byte */ +uint8_t bits = (1 trail) - 1; +buf[len - 1] |= bits; +} + +ret = bdrv_pwrite_sync(bs-file, offset, buf, len); +if (ret 0) { +return ret; } -return error; +return 0; } static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num, I had very similar patches in my series to add get_block_status... Paolo
Re: [Qemu-devel] [Patch] ARM: Simplify and fix imx_epit implementation.
On 5 August 2013 02:27, Peter Chubb peter.ch...@nicta.com.au wrote: When imx_epit.c was last refactored, a common usecase (comparison register zero) broke. This patch fixes that, and simplifies the code yet more. It also fixes a major thinko in the reset path --- the wrong bits in the control register were being cleared. Signed-off-by: Peter Chubb peter.ch...@nicta.com.au Reviewed-by: Jean-Christophe DUBOIS j...@tribudubois.net Thanks; applied to target-arm.next. -- PMM
Re: [Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient
On 20/08/2013 14:45, Paolo Bonzini wrote: I had very similar patches in my series to add get_block_status... Ah, sorry if I've duplicated something you've already done. Stefan encouraged me to contribute to an area of QEMU I found interesting and seeing the XXX I thought I'd take a shot at improving the code. Where should I look in the future for patches like this? Charlie
Re: [Qemu-devel] [Patch] ARM: Simplify and fix imx_epit implementation.
Am 20.08.2013 15:46, schrieb Peter Maydell: On 5 August 2013 02:27, Peter Chubb peter.ch...@nicta.com.au wrote: When imx_epit.c was last refactored, a common usecase (comparison register zero) broke. This patch fixes that, and simplifies the code yet more. It also fixes a major thinko in the reset path --- the wrong bits in the control register were being cleared. Signed-off-by: Peter Chubb peter.ch...@nicta.com.au Reviewed-by: Jean-Christophe DUBOIS j...@tribudubois.net Thanks; applied to target-arm.next. Peter Ch., please call such a patch imx_epit: rather than ARM: since it affects only your device. (PMM, can you fix?) Also if there is a reset bug, then fixing that in its own patch would better allow backporting that to 1.6.1. The way it is right now with no Cc: line for qemu-stable, the released version will keep the thinko. Changing debug output from stdout to stderr would've also been a change of its own that is not even mentioned in the commit message. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] vmdk stream-optimised format
On 20 Aug 2013, at 14:37, Paolo Bonzini wrote: Yes, because vmdk does not implement bdrv_write_compressed. Sorry for leaving out the small detail. :( Would that be easy for me to fix? -- Alex Bligh
Re: [Qemu-devel] [Patch] ARM: Simplify and fix imx_epit implementation.
On 20 August 2013 15:01, Andreas Färber afaer...@suse.de wrote: Peter Ch., please call such a patch imx_epit: rather than ARM: since it affects only your device. (PMM, can you fix?) Yes, I fixed it to say hw/timer/imx_epit: ... -- PMM
Re: [Qemu-devel] [PATCH V8 00/13] monitor: support sub command group in auto completion and help
On Tue, 30 Jul 2013 12:03:11 -0400 Luiz Capitulino lcapitul...@redhat.com wrote: On Fri, 26 Jul 2013 11:20:29 +0800 Wenchao Xia xiaw...@linux.vnet.ibm.com wrote: This series make auto completion and help functions works normal for sub command, by using reentrant functions. In order to do that, global variables are not directly used in those functions any more. With this series, cmd_table is a member of structure Monitor so it is possible to create a monitor with different command table now, auto completion will work in that monitor. In short, info is not treated as a special case now, this series ensure help and auto complete function works normal for any sub command added in the future. Patch 5 replaced cur_mon with rs-mon, it is safe because: monitor_init() calls readline_init() which initialize mon-rs, result is mon-rs-mon == mon. Then qemu_chr_add_handlers() is called, which make monitor_read() function take *mon as its opaque. Later, when user input, monitor_read() is called, where cur_mon is set to *mon by cur_mon = opaque. If qemu's monitors run in one thread, then later in readline_handle_byte() and readline_comletion(), cur_mon is actually equal to rs-mon, in another word, it points to the monitor instance, so it is safe to replace *cur_mon in those functions. I've applied this to qmp-next with the change I suggested for patch 09/13. Unfortunately this series brakes make check: GTESTER check-qtest-x86_64 Broken pipe GTester: last random seed: R02S3492bd34f44dd17460851643383be44d main-loop: WARNING: I/O thread spun for 1000 iterations make: *** [check-qtest-x86_64] Error 1 I debugged it (with some help from Laszlo) and the problem is that it broke the human-monitor-command command. Any usage of this command triggers the bug like: { execute: human-monitor-command, arguments: { command-line: info registers } } It seems simple to fix, I think you just have to initialize mon-cmd_table in qmp_human_monitor_command(), but I'd recommend two things: 1. It's better to split off some/all QMP initialization from monitor_init() and call it from qmp_human_monitor_command() 2. Can you please take the opportunity and test all commands using cur_mon? Just grep for it Sorry for noticing this only now, but I only run make check before sending a pull request (although this very likely shows you didn't run it either).
Re: [Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient
Il 20/08/2013 15:50, Charlie Shepherd ha scritto: On 20/08/2013 14:45, Paolo Bonzini wrote: I had very similar patches in my series to add get_block_status... Ah, sorry if I've duplicated something you've already done. Stefan encouraged me to contribute to an area of QEMU I found interesting and seeing the XXX I thought I'd take a shot at improving the code. Where should I look in the future for patches like this? There is still room for improvement in this area. You can compare my patches and yours, they follow very different ideas. I could pick up your patch in my series too, actually, if they're reviewed fast enough (I'm not sure I'll have the cycles to do that, unfortunately). Paolo
[Qemu-devel] [PULL 08/21] hw/arm/musicpal: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-8-git-send-email-peter.mayd...@linaro.org --- hw/arm/musicpal.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c index d715143..4404b8d 100644 --- a/hw/arm/musicpal.c +++ b/hw/arm/musicpal.c @@ -1586,7 +1586,6 @@ static void musicpal_init(QEMUMachineInitArgs *args) const char *kernel_cmdline = args-kernel_cmdline; const char *initrd_filename = args-initrd_filename; ARMCPU *cpu; -qemu_irq *cpu_pic; qemu_irq pic[32]; DeviceState *dev; DeviceState *i2c_dev; @@ -1610,7 +1609,6 @@ static void musicpal_init(QEMUMachineInitArgs *args) fprintf(stderr, Unable to find CPU definition\n); exit(1); } -cpu_pic = arm_pic_init_cpu(cpu); /* For now we use a fixed - the original - RAM size */ memory_region_init_ram(ram, NULL, musicpal.ram, MP_RAM_DEFAULT_SIZE); @@ -1622,7 +1620,7 @@ static void musicpal_init(QEMUMachineInitArgs *args) memory_region_add_subregion(address_space_mem, MP_SRAM_BASE, sram); dev = sysbus_create_simple(TYPE_MV88W8618_PIC, MP_PIC_BASE, - cpu_pic[ARM_PIC_CPU_IRQ]); + qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ)); for (i = 0; i 32; i++) { pic[i] = qdev_get_gpio_in(dev, i); } -- 1.7.9.5
[Qemu-devel] [PULL 07/21] hw/arm/kzm: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-7-git-send-email-peter.mayd...@linaro.org --- hw/arm/kzm.c |8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c index bd6c05c..a248bf0 100644 --- a/hw/arm/kzm.c +++ b/hw/arm/kzm.c @@ -82,7 +82,6 @@ static void kzm_init(QEMUMachineInitArgs *args) MemoryRegion *ram = g_new(MemoryRegion, 1); MemoryRegion *sram = g_new(MemoryRegion, 1); MemoryRegion *ram_alias = g_new(MemoryRegion, 1); -qemu_irq *cpu_pic; DeviceState *dev; DeviceState *ccm; @@ -108,11 +107,10 @@ static void kzm_init(QEMUMachineInitArgs *args) memory_region_init_ram(sram, NULL, kzm.sram, 0x4000); memory_region_add_subregion(address_space_mem, 0x1FFFC000, sram); -cpu_pic = arm_pic_init_cpu(cpu); dev = sysbus_create_varargs(imx_avic, 0x6800, -cpu_pic[ARM_PIC_CPU_IRQ], -cpu_pic[ARM_PIC_CPU_FIQ], NULL); - +qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ), +qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ), +NULL); imx_serial_create(0, 0x43f9, qdev_get_gpio_in(dev, 45)); imx_serial_create(1, 0x43f94000, qdev_get_gpio_in(dev, 32)); -- 1.7.9.5
[Qemu-devel] [PULL 16/21] target-arm: Allow raw_read() and raw_write() to handle 64 bit regs
Extend the raw_read() and raw_write() helper accessors so that they can be used for 64 bit registers as well as 32 bit registers. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Tested-by: Laurent Desnogues laurent.desnog...@gmail.com Reviewed-by: Edgar E. Iglesias edgar.igles...@gmail.com Message-id: 1376065080-26661-2-git-send-email-peter.mayd...@linaro.org --- target-arm/helper.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/target-arm/helper.c b/target-arm/helper.c index 6d9026d..f8689e2 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -67,14 +67,22 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg) static int raw_read(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t *value) { -*value = CPREG_FIELD32(env, ri); +if (ri-type ARM_CP_64BIT) { +*value = CPREG_FIELD64(env, ri); +} else { +*value = CPREG_FIELD32(env, ri); +} return 0; } static int raw_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { -CPREG_FIELD32(env, ri) = value; +if (ri-type ARM_CP_64BIT) { +CPREG_FIELD64(env, ri) = value; +} else { +CPREG_FIELD32(env, ri) = value; +} return 0; } -- 1.7.9.5
[Qemu-devel] [PULL 11/21] hw/arm/strongarm: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-11-git-send-email-peter.mayd...@linaro.org --- hw/arm/strongarm.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c index 82a9492..7b8ef8c 100644 --- a/hw/arm/strongarm.c +++ b/hw/arm/strongarm.c @@ -1588,7 +1588,6 @@ StrongARMState *sa1110_init(MemoryRegion *sysmem, unsigned int sdram_size, const char *rev) { StrongARMState *s; -qemu_irq *pic; int i; s = g_malloc0(sizeof(StrongARMState)); @@ -1613,9 +1612,10 @@ StrongARMState *sa1110_init(MemoryRegion *sysmem, vmstate_register_ram_global(s-sdram); memory_region_add_subregion(sysmem, SA_SDCS0, s-sdram); -pic = arm_pic_init_cpu(s-cpu); s-pic = sysbus_create_varargs(strongarm_pic, 0x9005, -pic[ARM_PIC_CPU_IRQ], pic[ARM_PIC_CPU_FIQ], NULL); +qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_IRQ), +qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_FIQ), +NULL); sysbus_create_varargs(pxa25x-timer, 0x9000, qdev_get_gpio_in(s-pic, SA_PIC_OSTC0), -- 1.7.9.5
[Qemu-devel] [PULL 17/21] target-arm: Support coprocessor registers which do I/O
Add an ARM_CP_IO flag which an ARMCPRegInfo definition can use to indicate that the register's implementation does I/O and thus its accesses need to be surrounded by gen_io_start()/gen_io_end() in order for icount to work. Most notably, cp registers which implement clocks or timers need this. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Tested-by: Laurent Desnogues laurent.desnog...@gmail.com Reviewed-by: Edgar E. Iglesias edgar.igles...@gmail.com Message-id: 1376065080-26661-3-git-send-email-peter.mayd...@linaro.org --- target-arm/cpu.h |6 +- target-arm/translate.c | 16 +--- 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/target-arm/cpu.h b/target-arm/cpu.h index dffeec7..c2cb534 100644 --- a/target-arm/cpu.h +++ b/target-arm/cpu.h @@ -472,6 +472,9 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid) * old must have the OVERRIDE bit set. * NO_MIGRATE indicates that this register should be ignored for migration; * (eg because any state is accessed via some other coprocessor register). + * IO indicates that this register does I/O and therefore its accesses + * need to be surrounded by gen_io_start()/gen_io_end(). In particular, + * registers which implement clocks or timers require this. */ #define ARM_CP_SPECIAL 1 #define ARM_CP_CONST 2 @@ -479,13 +482,14 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid) #define ARM_CP_SUPPRESS_TB_END 8 #define ARM_CP_OVERRIDE 16 #define ARM_CP_NO_MIGRATE 32 +#define ARM_CP_IO 64 #define ARM_CP_NOP (ARM_CP_SPECIAL | (1 8)) #define ARM_CP_WFI (ARM_CP_SPECIAL | (2 8)) #define ARM_LAST_SPECIAL ARM_CP_WFI /* Used only as a terminator for ARMCPRegInfo lists */ #define ARM_CP_SENTINEL 0x /* Mask of only the flag bits in a type field */ -#define ARM_CP_FLAG_MASK 0x3f +#define ARM_CP_FLAG_MASK 0x7f /* Return true if cptype is a valid type field. This is used to try to * catch errors where the sentinel has been accidentally left off the end diff --git a/target-arm/translate.c b/target-arm/translate.c index 6db4c50..d1e8538 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -6280,6 +6280,10 @@ static int disas_coproc_insn(CPUARMState * env, DisasContext *s, uint32_t insn) break; } +if (use_icount (ri-type ARM_CP_IO)) { +gen_io_start(); +} + if (isread) { /* Read */ if (is64) { @@ -6369,14 +6373,20 @@ static int disas_coproc_insn(CPUARMState * env, DisasContext *s, uint32_t insn) store_cpu_offset(tmp, ri-fieldoffset); } } +} + +if (use_icount (ri-type ARM_CP_IO)) { +/* I/O operations must end the TB here (whether read or write) */ +gen_io_end(); +gen_lookup_tb(s); +} else if (!isread !(ri-type ARM_CP_SUPPRESS_TB_END)) { /* We default to ending the TB on a coprocessor register write, * but allow this to be suppressed by the register definition * (usually only necessary to work around guest bugs). */ -if (!(ri-type ARM_CP_SUPPRESS_TB_END)) { -gen_lookup_tb(s); -} +gen_lookup_tb(s); } + return 0; } -- 1.7.9.5
[Qemu-devel] [PULL 13/21] hw/arm/vexpress: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-13-git-send-email-peter.mayd...@linaro.org --- hw/arm/vexpress.c |8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c index 9586e38..fbd71a7 100644 --- a/hw/arm/vexpress.c +++ b/hw/arm/vexpress.c @@ -183,7 +183,6 @@ static void a9_daughterboard_init(const VEDBoardInfo *daughterboard, MemoryRegion *lowram = g_new(MemoryRegion, 1); DeviceState *dev; SysBusDevice *busdev; -qemu_irq *irqp; int n; qemu_irq cpu_irq[4]; ram_addr_t low_ram_size; @@ -198,8 +197,7 @@ static void a9_daughterboard_init(const VEDBoardInfo *daughterboard, fprintf(stderr, Unable to find CPU definition\n); exit(1); } -irqp = arm_pic_init_cpu(cpu); -cpu_irq[n] = irqp[ARM_PIC_CPU_IRQ]; +cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ); } if (ram_size 0x4000) { @@ -312,15 +310,13 @@ static void a15_daughterboard_init(const VEDBoardInfo *daughterboard, for (n = 0; n smp_cpus; n++) { ARMCPU *cpu; -qemu_irq *irqp; cpu = cpu_arm_init(cpu_model); if (!cpu) { fprintf(stderr, Unable to find CPU definition\n); exit(1); } -irqp = arm_pic_init_cpu(cpu); -cpu_irq[n] = irqp[ARM_PIC_CPU_IRQ]; +cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ); } { -- 1.7.9.5
[Qemu-devel] [PULL 20/21] default-configs: Fix A9MP and A15MP config names
When individual CONFIG_ switches for the A9MPcore and A15MPcore devices were created, they were inadvertently given incorrect names (CONFIG_ARM9MPCORE and CONFIG_ARM15MPCORE). These CPUs are Cortex-A9MP and Cortex-A15MP, and in particular the ARM9 is a different (rather older) CPU than the Cortex-A9. Rename the CONFIG_ switches to bring them into line with the source file names and CPU names. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1376056215-26391-1-git-send-email-peter.mayd...@linaro.org --- default-configs/arm-softmmu.mak |4 ++-- hw/cpu/Makefile.objs|4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 27cbe3d..ac0815d 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -34,9 +34,9 @@ CONFIG_PFLASH_CFI02=y CONFIG_MICRODRIVE=y CONFIG_USB_MUSB=y -CONFIG_ARM9MPCORE=y CONFIG_ARM11MPCORE=y -CONFIG_ARM15MPCORE=y +CONFIG_A9MPCORE=y +CONFIG_A15MPCORE=y CONFIG_ARM_GIC=y CONFIG_ARM_GIC_KVM=$(CONFIG_KVM) diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs index 4461ece..df287c1 100644 --- a/hw/cpu/Makefile.objs +++ b/hw/cpu/Makefile.objs @@ -1,5 +1,5 @@ obj-$(CONFIG_ARM11MPCORE) += arm11mpcore.o -obj-$(CONFIG_ARM9MPCORE) += a9mpcore.o -obj-$(CONFIG_ARM15MPCORE) += a15mpcore.o +obj-$(CONFIG_A9MPCORE) += a9mpcore.o +obj-$(CONFIG_A15MPCORE) += a15mpcore.o obj-$(CONFIG_ICC_BUS) += icc_bus.o -- 1.7.9.5
[Qemu-devel] [PULL 05/21] hw/arm/highbank: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-5-git-send-email-peter.mayd...@linaro.org --- hw/arm/highbank.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c index 35d5511..f733a6c 100644 --- a/hw/arm/highbank.c +++ b/hw/arm/highbank.c @@ -209,7 +209,6 @@ static void calxeda_init(QEMUMachineInitArgs *args, enum cxmachines machine) const char *initrd_filename = args-initrd_filename; DeviceState *dev = NULL; SysBusDevice *busdev; -qemu_irq *irqp; qemu_irq pic[128]; int n; qemu_irq cpu_irq[4]; @@ -239,8 +238,7 @@ static void calxeda_init(QEMUMachineInitArgs *args, enum cxmachines machine) /* This will become a QOM property eventually */ cpu-reset_cbar = GIC_BASE_ADDR; -irqp = arm_pic_init_cpu(cpu); -cpu_irq[n] = irqp[ARM_PIC_CPU_IRQ]; +cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ); } sysmem = get_system_memory(); -- 1.7.9.5
[Qemu-devel] [PULL 21/21] hw/timer/imx_epit: Simplify and fix imx_epit implementation
From: Peter Chubb peter.ch...@nicta.com.au When imx_epit.c was last refactored, a common usecase (comparison register zero) broke. This patch fixes that, and simplifies the code yet more. It also fixes a major thinko in the reset path --- the wrong bits in the control register were being cleared. Signed-off-by: Peter Chubb peter.ch...@nicta.com.au Reviewed-by: Jean-Christophe DUBOIS j...@tribudubois.net Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/timer/imx_epit.c | 94 --- 1 file changed, 36 insertions(+), 58 deletions(-) diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c index 117dc7b..dc73d65 100644 --- a/hw/timer/imx_epit.c +++ b/hw/timer/imx_epit.c @@ -43,7 +43,7 @@ static char const *imx_epit_reg_name(uint32_t reg) } # define DPRINTF(fmt, args...) \ - do { printf(%s: fmt , __func__, ##args); } while (0) +do { fprintf(stderr, %s: fmt , __func__, ##args); } while (0) #else # define DPRINTF(fmt, args...) do {} while (0) #endif @@ -152,7 +152,7 @@ static void imx_epit_reset(DeviceState *dev) /* * Soft reset doesn't touch some bits; hard reset clears them */ -s-cr = ~(CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN); +s-cr = (CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN); s-sr = 0; s-lr = TIMER_MAX; s-cmp = 0; @@ -167,7 +167,7 @@ static void imx_epit_reset(DeviceState *dev) ptimer_set_limit(s-timer_reload, TIMER_MAX, 1); if (s-freq (s-cr CR_EN)) { /* if the timer is still enabled, restart it */ -ptimer_run(s-timer_reload, 1); +ptimer_run(s-timer_reload, 0); } } @@ -218,17 +218,17 @@ static uint64_t imx_epit_read(void *opaque, hwaddr offset, unsigned size) static void imx_epit_reload_compare_timer(IMXEPITState *s) { -if ((s-cr CR_OCIEN) s-cmp) { -/* if the compare feature is on */ +if ((s-cr (CR_EN | CR_OCIEN)) == (CR_EN | CR_OCIEN)) { +/* if the compare feature is on and timers are running */ uint32_t tmp = imx_epit_update_count(s); +uint64_t next; if (tmp s-cmp) { -/* reinit the cmp timer if required */ -ptimer_set_count(s-timer_cmp, tmp - s-cmp); -if ((s-cr CR_EN)) { -/* Restart the cmp timer if required */ -ptimer_run(s-timer_cmp, 0); -} +/* It'll fire in this round of the timer */ +next = tmp - s-cmp; +} else { /* catch it next time around */ +next = tmp - s-cmp + ((s-cr CR_RLD) ? TIMER_MAX : s-lr); } +ptimer_set_count(s-timer_cmp, next); } } @@ -237,11 +237,14 @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, { IMXEPITState *s = IMX_EPIT(opaque); uint32_t reg = offset 2; +uint64_t oldcr; DPRINTF((%s, value = 0x%08x)\n, imx_epit_reg_name(reg), (uint32_t)value); switch (reg) { case 0: /* CR */ + +oldcr = s-cr; s-cr = value 0x03ff; if (s-cr CR_SWR) { /* handle the reset */ @@ -250,22 +253,35 @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, imx_epit_set_freq(s); } -if (s-freq (s-cr CR_EN)) { +if (s-freq (s-cr CR_EN) !(oldcr CR_EN)) { if (s-cr CR_ENMOD) { if (s-cr CR_RLD) { ptimer_set_limit(s-timer_reload, s-lr, 1); +ptimer_set_limit(s-timer_cmp, s-lr, 1); } else { ptimer_set_limit(s-timer_reload, TIMER_MAX, 1); +ptimer_set_limit(s-timer_cmp, TIMER_MAX, 1); } } imx_epit_reload_compare_timer(s); - -ptimer_run(s-timer_reload, 1); -} else { +ptimer_run(s-timer_reload, 0); +if (s-cr CR_OCIEN) { +ptimer_run(s-timer_cmp, 0); +} else { +ptimer_stop(s-timer_cmp); +} +} else if (!(s-cr CR_EN)) { /* stop both timers */ ptimer_stop(s-timer_reload); ptimer_stop(s-timer_cmp); +} else if (s-cr CR_OCIEN) { +if (!(oldcr CR_OCIEN)) { +imx_epit_reload_compare_timer(s); +ptimer_run(s-timer_cmp, 0); +} +} else { +ptimer_stop(s-timer_cmp); } break; @@ -284,13 +300,13 @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, /* Also set the limit if the LRD bit is set */ /* If IOVW bit is set then set the timer value */ ptimer_set_limit(s-timer_reload, s-lr, s-cr CR_IOVW); +ptimer_set_limit(s-timer_cmp, s-lr, 0); } else if (s-cr CR_IOVW) { /* If IOVW bit is set then set the timer value */ ptimer_set_count(s-timer_reload,
[Qemu-devel] [PULL 01/21] target-arm: Implement 'int' loglevel
The 'int' loglevel for recording interrupts and exceptions requires support in the target-specific code. Implement it for ARM. This improves debug logging in some situations that were otherwise pretty opaque, such as when we fault trying to execute at an exception vector address, which would otherwise cause an infinite loop of taking exceptions without any indication in the debug log of what was going on. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Message-id: 1375700771-21665-1-git-send-email-peter.mayd...@linaro.org --- target-arm/helper.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/target-arm/helper.c b/target-arm/helper.c index 4968391..6d9026d 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -1974,6 +1974,37 @@ static void do_v7m_exception_exit(CPUARMState *env) pointer. */ } +/* Exception names for debug logging; note that not all of these + * precisely correspond to architectural exceptions. + */ +static const char * const excnames[] = { +[EXCP_UDEF] = Undefined Instruction, +[EXCP_SWI] = SVC, +[EXCP_PREFETCH_ABORT] = Prefetch Abort, +[EXCP_DATA_ABORT] = Data Abort, +[EXCP_IRQ] = IRQ, +[EXCP_FIQ] = FIQ, +[EXCP_BKPT] = Breakpoint, +[EXCP_EXCEPTION_EXIT] = QEMU v7M exception exit, +[EXCP_KERNEL_TRAP] = QEMU intercept of kernel commpage, +[EXCP_STREX] = QEMU intercept of STREX, +}; + +static inline void arm_log_exception(int idx) +{ +if (qemu_loglevel_mask(CPU_LOG_INT)) { +const char *exc = NULL; + +if (idx = 0 idx ARRAY_SIZE(excnames)) { +exc = excnames[idx]; +} +if (!exc) { +exc = unknown; +} +qemu_log_mask(CPU_LOG_INT, Taking exception %d [%s]\n, idx, exc); +} +} + void arm_v7m_cpu_do_interrupt(CPUState *cs) { ARMCPU *cpu = ARM_CPU(cs); @@ -1982,6 +2013,8 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs) uint32_t lr; uint32_t addr; +arm_log_exception(env-exception_index); + lr = 0xfff1; if (env-v7m.current_sp) lr |= 4; @@ -2011,6 +2044,7 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs) if (nr == 0xab) { env-regs[15] += 2; env-regs[0] = do_arm_semihosting(env); +qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n); return; } } @@ -2064,6 +2098,8 @@ void arm_cpu_do_interrupt(CPUState *cs) assert(!IS_M(env)); +arm_log_exception(env-exception_index); + /* TODO: Vectored interrupt controller. */ switch (env-exception_index) { case EXCP_UDEF: @@ -2091,6 +2127,7 @@ void arm_cpu_do_interrupt(CPUState *cs) || (mask == 0xab env-thumb)) (env-uncached_cpsr CPSR_M) != ARM_CPU_MODE_USR) { env-regs[0] = do_arm_semihosting(env); +qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n); return; } } @@ -2108,18 +2145,23 @@ void arm_cpu_do_interrupt(CPUState *cs) (env-uncached_cpsr CPSR_M) != ARM_CPU_MODE_USR) { env-regs[15] += 2; env-regs[0] = do_arm_semihosting(env); +qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n); return; } } env-cp15.c5_insn = 2; /* Fall through to prefetch abort. */ case EXCP_PREFETCH_ABORT: +qemu_log_mask(CPU_LOG_INT, ...with IFSR 0x%x IFAR 0x%x\n, + env-cp15.c5_insn, env-cp15.c6_insn); new_mode = ARM_CPU_MODE_ABT; addr = 0x0c; mask = CPSR_A | CPSR_I; offset = 4; break; case EXCP_DATA_ABORT: +qemu_log_mask(CPU_LOG_INT, ...with DFSR 0x%x DFAR 0x%x\n, + env-cp15.c5_data, env-cp15.c6_data); new_mode = ARM_CPU_MODE_ABT; addr = 0x10; mask = CPSR_A | CPSR_I; -- 1.7.9.5
[Qemu-devel] [PULL 14/21] hw/arm/xilinx_zynq: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-14-git-send-email-peter.mayd...@linaro.org --- hw/arm/xilinx_zynq.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c index 3444823..0f18c85 100644 --- a/hw/arm/xilinx_zynq.c +++ b/hw/arm/xilinx_zynq.c @@ -108,11 +108,9 @@ static void zynq_init(QEMUMachineInitArgs *args) MemoryRegion *ocm_ram = g_new(MemoryRegion, 1); DeviceState *dev; SysBusDevice *busdev; -qemu_irq *irqp; qemu_irq pic[64]; NICInfo *nd; int n; -qemu_irq cpu_irq; if (!cpu_model) { cpu_model = cortex-a9; @@ -123,8 +121,6 @@ static void zynq_init(QEMUMachineInitArgs *args) fprintf(stderr, Unable to find CPU definition\n); exit(1); } -irqp = arm_pic_init_cpu(cpu); -cpu_irq = irqp[ARM_PIC_CPU_IRQ]; /* max 2GB ram */ if (ram_size 0x8000) { @@ -159,7 +155,8 @@ static void zynq_init(QEMUMachineInitArgs *args) qdev_init_nofail(dev); busdev = SYS_BUS_DEVICE(dev); sysbus_mmio_map(busdev, 0, 0xF8F0); -sysbus_connect_irq(busdev, 0, cpu_irq); +sysbus_connect_irq(busdev, 0, + qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ)); for (n = 0; n 64; n++) { pic[n] = qdev_get_gpio_in(dev, n); -- 1.7.9.5
[Qemu-devel] [PULL 15/21] hw/arm/pic_cpu: Remove the now-unneeded arm_pic_init_cpu()
Now all the boards have been converted arm_pic_init_cpu() is unused and can just be deleted. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-15-git-send-email-peter.mayd...@linaro.org --- hw/arm/Makefile.objs |2 +- hw/arm/pic_cpu.c | 25 - include/hw/arm/arm.h |5 - 3 files changed, 1 insertion(+), 31 deletions(-) delete mode 100644 hw/arm/pic_cpu.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 9e3a06f..3671b42 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -1,6 +1,6 @@ obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o -obj-y += omap_sx1.o palm.o pic_cpu.o realview.o spitz.o stellaris.o +obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o diff --git a/hw/arm/pic_cpu.c b/hw/arm/pic_cpu.c deleted file mode 100644 index 9c36273..000 --- a/hw/arm/pic_cpu.c +++ /dev/null @@ -1,25 +0,0 @@ -/* - * Generic ARM Programmable Interrupt Controller support. - * - * Copyright (c) 2006 CodeSourcery. - * Written by Paul Brook - * - * This code is licensed under the LGPL - */ - -#include hw/hw.h -#include hw/arm/arm.h - -/* Backwards compatibility shim; this can disappear when all - * board models have been updated to get IRQ and FIQ lines directly - * from the ARMCPU object rather than by calling this function. - */ -qemu_irq *arm_pic_init_cpu(ARMCPU *cpu) -{ -DeviceState *dev = DEVICE(cpu); -qemu_irq *irqs = g_new(qemu_irq, 2); - -irqs[0] = qdev_get_gpio_in(dev, ARM_CPU_IRQ); -irqs[1] = qdev_get_gpio_in(dev, ARM_CPU_FIQ); -return irqs; -} diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h index bae87c6..ecbbba8 100644 --- a/include/hw/arm/arm.h +++ b/include/hw/arm/arm.h @@ -14,11 +14,6 @@ #include exec/memory.h #include hw/irq.h -/* The CPU is also modelled as an interrupt controller. */ -#define ARM_PIC_CPU_IRQ 0 -#define ARM_PIC_CPU_FIQ 1 -qemu_irq *arm_pic_init_cpu(ARMCPU *cpu); - /* armv7m.c */ qemu_irq *armv7m_init(MemoryRegion *address_space_mem, int flash_size, int sram_size, -- 1.7.9.5
[Qemu-devel] [PULL 19/21] hw/cpu/a15mpcore: Wire generic timer outputs to GIC inputs
Now our A15 CPU implements the generic timers, we can wire them up to the appropriate inputs on the GIC. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Tested-by: Laurent Desnogues laurent.desnog...@gmail.com Message-id: 1376065080-26661-5-git-send-email-peter.mayd...@linaro.org --- hw/cpu/a15mpcore.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/hw/cpu/a15mpcore.c b/hw/cpu/a15mpcore.c index 4f37964..af182da 100644 --- a/hw/cpu/a15mpcore.c +++ b/hw/cpu/a15mpcore.c @@ -49,6 +49,8 @@ static int a15mp_priv_init(SysBusDevice *dev) A15MPPrivState *s = A15MPCORE_PRIV(dev); SysBusDevice *busdev; const char *gictype = arm_gic; +int i; +CPUState *cpu; if (kvm_irqchip_in_kernel()) { gictype = kvm-arm-gic; @@ -67,6 +69,22 @@ static int a15mp_priv_init(SysBusDevice *dev) /* Pass through inbound GPIO lines to the GIC */ qdev_init_gpio_in(DEVICE(dev), a15mp_priv_set_irq, s-num_irq - 32); +/* Wire the outputs from each CPU's generic timer to the + * appropriate GIC PPI inputs + */ +for (i = 0, cpu = first_cpu; i s-num_cpu; i++, cpu = cpu-next_cpu) { +DeviceState *cpudev = DEVICE(cpu); +int ppibase = s-num_irq - 32 + i * 32; +/* physical timer; we wire it up to the non-secure timer's ID, + * since a real A15 always has TrustZone but QEMU doesn't. + */ +qdev_connect_gpio_out(cpudev, 0, + qdev_get_gpio_in(s-gic, ppibase + 30)); +/* virtual timer */ +qdev_connect_gpio_out(cpudev, 1, + qdev_get_gpio_in(s-gic, ppibase + 27)); +} + /* Memory map (addresses are offsets from PERIPHBASE): * 0x-0x0fff -- reserved * 0x1000-0x1fff -- GIC Distributor -- 1.7.9.5
[Qemu-devel] [PULL 09/21] hw/arm/omap*: Don't use arm_pic_init_cpu()
Drop the now-deprecated arm_pic_init_cpu() in favour of directly getting the IRQ line from the ARMCPU object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Message-id: 1375977856-25046-9-git-send-email-peter.mayd...@linaro.org --- hw/arm/omap1.c |8 hw/arm/omap2.c |8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c index 19be5fc..b6a0b27 100644 --- a/hw/arm/omap1.c +++ b/hw/arm/omap1.c @@ -3827,7 +3827,6 @@ struct omap_mpu_state_s *omap310_mpu_init(MemoryRegion *system_memory, int i; struct omap_mpu_state_s *s = (struct omap_mpu_state_s *) g_malloc0(sizeof(struct omap_mpu_state_s)); -qemu_irq *cpu_irq; qemu_irq dma_irqs[6]; DriveInfo *dinfo; SysBusDevice *busdev; @@ -3860,14 +3859,15 @@ struct omap_mpu_state_s *omap310_mpu_init(MemoryRegion *system_memory, omap_clkm_init(system_memory, 0xfffece00, 0xe1008000, s); -cpu_irq = arm_pic_init_cpu(s-cpu); s-ih[0] = qdev_create(NULL, omap-intc); qdev_prop_set_uint32(s-ih[0], size, 0x100); qdev_prop_set_ptr(s-ih[0], clk, omap_findclk(s, arminth_ck)); qdev_init_nofail(s-ih[0]); busdev = SYS_BUS_DEVICE(s-ih[0]); -sysbus_connect_irq(busdev, 0, cpu_irq[ARM_PIC_CPU_IRQ]); -sysbus_connect_irq(busdev, 1, cpu_irq[ARM_PIC_CPU_FIQ]); +sysbus_connect_irq(busdev, 0, + qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_IRQ)); +sysbus_connect_irq(busdev, 1, + qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_FIQ)); sysbus_mmio_map(busdev, 0, 0xfffecb00); s-ih[1] = qdev_create(NULL, omap-intc); qdev_prop_set_uint32(s-ih[1], size, 0x800); diff --git a/hw/arm/omap2.c b/hw/arm/omap2.c index ec9610b..36efde0 100644 --- a/hw/arm/omap2.c +++ b/hw/arm/omap2.c @@ -2244,7 +2244,6 @@ struct omap_mpu_state_s *omap2420_mpu_init(MemoryRegion *sysmem, { struct omap_mpu_state_s *s = (struct omap_mpu_state_s *) g_malloc0(sizeof(struct omap_mpu_state_s)); -qemu_irq *cpu_irq; qemu_irq dma_irqs[4]; DriveInfo *dinfo; int i; @@ -2277,15 +2276,16 @@ struct omap_mpu_state_s *omap2420_mpu_init(MemoryRegion *sysmem, s-l4 = omap_l4_init(sysmem, OMAP2_L4_BASE, 54); /* Actually mapped at any 2K boundary in the ARM11 private-peripheral if */ -cpu_irq = arm_pic_init_cpu(s-cpu); s-ih[0] = qdev_create(NULL, omap2-intc); qdev_prop_set_uint8(s-ih[0], revision, 0x21); qdev_prop_set_ptr(s-ih[0], fclk, omap_findclk(s, mpu_intc_fclk)); qdev_prop_set_ptr(s-ih[0], iclk, omap_findclk(s, mpu_intc_iclk)); qdev_init_nofail(s-ih[0]); busdev = SYS_BUS_DEVICE(s-ih[0]); -sysbus_connect_irq(busdev, 0, cpu_irq[ARM_PIC_CPU_IRQ]); -sysbus_connect_irq(busdev, 1, cpu_irq[ARM_PIC_CPU_FIQ]); +sysbus_connect_irq(busdev, 0, + qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_IRQ)); +sysbus_connect_irq(busdev, 1, + qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_FIQ)); sysbus_mmio_map(busdev, 0, 0x480fe000); s-prcm = omap_prcm_init(omap_l4tao(s-l4, 3), qdev_get_gpio_in(s-ih[0], -- 1.7.9.5