date:20130820

[Qemu-devel] [PATCH v4 01/13] block: vhdx - minor comments and typo correction.

2013-08-20 Thread Jeff Cody

Just a couple of minor comments to help note where allocated
buffers are freed, and a typo fix.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.c | 6 --
 block/vhdx.h | 6 +++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index e9704b1..56bc88e 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -6,9 +6,9 @@
  * Authors:
  *  Jeff Cody jc...@redhat.com
  *
- *  This is based on the VHDX Format Specification v0.95, published 4/12/2012
+ *  This is based on the VHDX Format Specification v1.00, published 8/25/2012
  *  by Microsoft:
- *  https://www.microsoft.com/en-us/download/details.aspx?id=29681
+ *  https://www.microsoft.com/en-us/download/details.aspx?id=34750
  *
  * This work is licensed under the terms of the GNU LGPL, version 2 or later.
  * See the COPYING.LIB file in the top-level directory.
@@ -262,6 +262,7 @@ static int vhdx_parse_header(BlockDriverState *bs, 
BDRVVHDXState *s)
 uint64_t h2_seq = 0;
 uint8_t *buffer;
 
+/* header1  header2 are freed in vhdx_close() */
 header1 = qemu_blockalign(bs, sizeof(VHDXHeader));
 header2 = qemu_blockalign(bs, sizeof(VHDXHeader));
 
@@ -787,6 +788,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, 
int flags)
 goto fail;
 }
 
+/* s-bat is freed in vhdx_close() */
 s-bat = qemu_blockalign(bs, s-bat_rt.length);
 
 ret = bdrv_pread(bs-file, s-bat_offset, s-bat, s-bat_rt.length);
diff --git a/block/vhdx.h b/block/vhdx.h
index fb687ed..9eb6b97 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -6,9 +6,9 @@
  * Authors:
  *  Jeff Cody jc...@redhat.com
  *
- *  This is based on the VHDX Format Specification v0.95, published 4/12/2012
+ *  This is based on the VHDX Format Specification v1.00, published 8/25/2012
  *  by Microsoft:
- *  https://www.microsoft.com/en-us/download/details.aspx?id=29681
+ *  https://www.microsoft.com/en-us/download/details.aspx?id=34750
  *
  * This work is licensed under the terms of the GNU LGPL, version 2 or later.
  * See the COPYING.LIB file in the top-level directory.
@@ -116,7 +116,7 @@ typedef struct QEMU_PACKED VHDXHeader {
valid. */
 uint16_tlog_version;/* version of the log format. Mustn't 
be
zero, unless log_guid is also zero 
*/
-uint16_tversion;/* version of th evhdx file.  
Currently,
+uint16_tversion;/* version of the vhdx file.  
Currently,
only supported version is 1 */
 uint32_tlog_length; /* length of the log.  Must be multiple
of 1MB */
-- 
1.8.1.4

[Qemu-devel] [PATCH v4 00/13] VHDX log replay and write support, .bdrv_create()

2013-08-20 Thread Jeff Cody

This patch series contains the initial VHDX log parsing, replay,
and write support.  (New with v4: VHDX image file creation)

=== v4 changes ===  

v4 patches are available from github as well, on branch vhdx-write-v4-upstream:
https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v4-upstream
https://github.com/codyprime/qemu-kvm-jtc.git

Those in the midst of reviewing v3, don't fear - the only changes with v4 is
the addition of patches on the end of the series (patches 10-13).  These
patches enable creating VHDX images.  Image files created have been
(briefly  lightly) tested on Hyper-V running on Windows Server 2012.

Some of the new patches could be squashed with earlier patches in the series,
but I refrained from doing so, since some of the patches have already been
reviewed, and others are in the midst of review.  I want to make it as easy
as possible on those currently reviewing. There is nothing critical
that needs to be pushed into the earlier patches.

New patches:

Patch 10:  Breaks out some more endian translation functions
(likely squashable into patch 5)

Patch 11:  Break out some operations into seperate helper functions

Patch 12:  More comment typos and header fixes in vhdx.h
(likely squashable into patch 1)

Patch 13:  Adds .bdrv_create() for vhdx.  VHDX images are can be created for
   Fixed or Dynamic images.

Patches 1-9 are unchanged.

=== end v4 changelog ===

=== v3 changes ===  

Thank you Kevin  Stefan for the feedback; incoporated in v3:

Patch 1: --- nil ---

Patch 2: * use sizeof(crc) instead of 4
 * remove outdated comment
 * use sizeof(MSGUID) instead of 16
 * direct assignment of guid structs rather than memcpy
 * rename 'rw' to 'generate_data_write_guid'
 * use offsetof() instead of 4
 * comment typos
 * add missing error checking
 * MSGUID is now QEMU_PACKED
 * configure enable for vhdx is now correct and not braindead

Patch 3: --- nil ---

Patch 4: * code style fixes
 * removed unused struct (VHDXLogEntryInfo)

Patch 5: * more direct assignment of guid rather than memcpy
 * order of operation in export/import the same now
 * became less generous with newlines (bah-humbug!)

Patch 6: * more direct assignment of guid rather than memcpy 
 * add error check in vhdx_user_visible_write(), now returns int

Patch 7: * check error return now of vhdx_user_visible_write()

Patch 8: * check error return now of vhdx_user_visible_write()
 * vhdx_log_write_sectors() uses bdrv_pwrite() vs bdrv_pwrite_sync()
 * more direct assignment of guid rather than memcpy 
 * use offsetof() instead of 4

Patch 9: --- nil ---

=== end v3 changelog ===

v2 changes:  Incorporated Fam's review feedback - Thank you Fam for the feedback

=== end v2 changelog ===

This will allow an existing log in a VHDX image to be replayed (e.g., a VHDX
image from a Hyper-V host that crashed).  In addition, metadata writes are
enabled through the log.  This allows write support to be enabled for VHDX,
as the BAT can be updated safely via the log journal.

These exact patches are available from github, for testing:
https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v2-upstream

The latest vhdx work (including anything beyond these patches, such as
backing/parent file support) can be found at:
https://github.com/codyprime/qemu-kvm-jtc/tree/jtc-vhdx-latest

Jeff Cody (13):
  block: vhdx - minor comments and typo correction.
  block: vhdx - add header update capability.
  block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to
header.
  block: vhdx - log support struct and defines
  block: vhdx - break endian translation functions out
  block: vhdx - update log guid in header, and first write tracker
  block: vhdx - log parsing, replay, and flush support
  block: vhdx - add log write support
  block: vhdx write support
  block: vhdx - move more endian translations to vhdx-endian.c
  block: vhdx - break out code operations to functions
  block: vhdx - fix comment typos in header, fix incorrect struct fields
  block: vhdx - add .bdrv_create() support

 block/Makefile.objs |2 +-
 block/vhdx-endian.c |  216 +++
 block/vhdx-log.c| 1011 ++
 block/vhdx.c| 1014 ---
 block/vhdx.h|  167 +++--
 configure   |   24 ++
 6 files changed, 2273 insertions(+), 161 deletions(-)
 create mode 100644 block/vhdx-endian.c
 create mode 100644 block/vhdx-log.c

-- 
1.8.1.4

[Qemu-devel] [PATCH v4 02/13] block: vhdx - add header update capability.

2013-08-20 Thread Jeff Cody

This adds the ability to update the headers in a VHDX image, including
generating a new MS-compatible GUID.

As VHDX depends on uuid.h, VHDX is now a configurable build option.  If
VHDX support is enabled, that will also enable uuid as well.  The
default is to have VHDX enabled.

To enable/disable VHDX:  --enable-vhdx, --disable-vhdx

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/Makefile.objs |   2 +-
 block/vhdx.c| 161 +++-
 block/vhdx.h|  14 -
 configure   |  24 
 4 files changed, 196 insertions(+), 5 deletions(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 4cf9aa4..e5e54e6 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o 
bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-y += vhdx.o
+block-obj-$(CONFIG_VHDX) += vhdx.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/vhdx.c b/block/vhdx.c
index 56bc88e..7bd7c12 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -21,6 +21,7 @@
 #include qemu/crc32c.h
 #include block/vhdx.h
 
+#include uuid/uuid.h
 
 /* Several metadata and region table data entries are identified by
  * guids in  a MS-specific GUID format. */
@@ -156,11 +157,40 @@ typedef struct BDRVVHDXState {
 VHDXBatEntry *bat;
 uint64_t bat_offset;
 
+MSGUID session_guid;
+
+
 VHDXParentLocatorHeader parent_header;
 VHDXParentLocatorEntry *parent_entries;
 
 } BDRVVHDXState;
 
+/* Calculates new checksum.
+ *
+ * Zero is substituted during crc calculation for the original crc field
+ * crc_offset: byte offset in buf of the buffer crc
+ * buf: buffer pointer
+ * size: size of buffer (must be  crc_offset+4)
+ *
+ * Note: The resulting checksum is in the CPU endianness, not necessarily
+ *   in the file format endianness (LE).  Any header export to disk should
+ *   make sure that vhdx_header_le_export() is used to convert to the
+ *   correct endianness
+ */
+uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
+{
+uint32_t crc;
+
+assert(buf != NULL);
+assert(size  (crc_offset + sizeof(crc)));
+
+memset(buf + crc_offset, 0, sizeof(crc));
+crc =  crc32c(0x, buf, size);
+memcpy(buf + crc_offset, crc, sizeof(crc));
+
+return crc;
+}
+
 uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
 int crc_offset)
 {
@@ -212,6 +242,19 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int 
crc_offset)
 
 
 /*
+ * This generates a UUID that is compliant with the MS GUIDs used
+ * in the VHDX spec (and elsewhere).
+ */
+void vhdx_guid_generate(MSGUID *guid)
+{
+uuid_t uuid;
+assert(guid != NULL);
+
+uuid_generate(uuid);
+memcpy(guid, uuid, sizeof(MSGUID));
+}
+
+/*
  * Per the MS VHDX Specification, for every VHDX file:
  *  - The header section is fixed size - 1 MB
  *  - The header section is always the first object
@@ -249,6 +292,113 @@ static void vhdx_header_le_import(VHDXHeader *h)
 le64_to_cpus(h-log_offset);
 }
 
+/* All VHDX structures on disk are little endian */
+static void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
+{
+assert(orig_h != NULL);
+assert(new_h != NULL);
+
+new_h-signature   = cpu_to_le32(orig_h-signature);
+new_h-checksum= cpu_to_le32(orig_h-checksum);
+new_h-sequence_number = cpu_to_le64(orig_h-sequence_number);
+
+new_h-file_write_guid = orig_h-file_write_guid;
+new_h-data_write_guid = orig_h-data_write_guid;
+new_h-log_guid= orig_h-log_guid;
+
+cpu_to_leguids(new_h-file_write_guid);
+cpu_to_leguids(new_h-data_write_guid);
+cpu_to_leguids(new_h-log_guid);
+
+new_h-log_version = cpu_to_le16(orig_h-log_version);
+new_h-version = cpu_to_le16(orig_h-version);
+new_h-log_length  = cpu_to_le32(orig_h-log_length);
+new_h-log_offset  = cpu_to_le64(orig_h-log_offset);
+}
+
+/* Update the VHDX headers
+ *
+ * This follows the VHDX spec procedures for header updates.
+ *
+ *  - non-current header is updated with largest sequence number
+ */
+static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s,
+  bool generate_data_write_guid)
+{
+int ret = 0;
+int hdr_idx = 0;
+uint64_t header_offset = VHDX_HEADER1_OFFSET;
+
+VHDXHeader *active_header;
+VHDXHeader *inactive_header;
+VHDXHeader header_le;
+uint8_t *buffer;
+
+/* operate on the non-current header */
+if (s-curr_header == 0) {
+hdr_idx = 1;
+header_offset = VHDX_HEADER2_OFFSET;
+}
+
+active_header   =

[Qemu-devel] [PATCH v4 03/13] block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header.

2013-08-20 Thread Jeff Cody

In preparation for VHDX log support, move these structures to the
header.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.c | 51 ---
 block/vhdx.h | 47 +++
 2 files changed, 47 insertions(+), 51 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 7bd7c12..68648e1 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -104,16 +104,6 @@ static const MSGUID parent_vhdx_guid = { .data1 = 
0xb04aefb7,
  META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
  META_PHYS_SECTOR_SIZE_PRESENT)
 
-typedef struct VHDXMetadataEntries {
-VHDXMetadataTableEntry file_parameters_entry;
-VHDXMetadataTableEntry virtual_disk_size_entry;
-VHDXMetadataTableEntry page83_data_entry;
-VHDXMetadataTableEntry logical_sector_size_entry;
-VHDXMetadataTableEntry phys_sector_size_entry;
-VHDXMetadataTableEntry parent_locator_entry;
-uint16_t present;
-} VHDXMetadataEntries;
-
 
 typedef struct VHDXSectorInfo {
 uint32_t bat_idx;   /* BAT entry index */
@@ -124,47 +114,6 @@ typedef struct VHDXSectorInfo {
 uint64_t block_offset;  /* block offset, in bytes */
 } VHDXSectorInfo;
 
-
-
-typedef struct BDRVVHDXState {
-CoMutex lock;
-
-int curr_header;
-VHDXHeader *headers[2];
-
-VHDXRegionTableHeader rt;
-VHDXRegionTableEntry bat_rt; /* region table for the BAT */
-VHDXRegionTableEntry metadata_rt;/* region table for the metadata */
-
-VHDXMetadataTableHeader metadata_hdr;
-VHDXMetadataEntries metadata_entries;
-
-VHDXFileParameters params;
-uint32_t block_size;
-uint32_t block_size_bits;
-uint32_t sectors_per_block;
-uint32_t sectors_per_block_bits;
-
-uint64_t virtual_disk_size;
-uint32_t logical_sector_size;
-uint32_t physical_sector_size;
-
-uint64_t chunk_ratio;
-uint32_t chunk_ratio_bits;
-uint32_t logical_sector_size_bits;
-
-uint32_t bat_entries;
-VHDXBatEntry *bat;
-uint64_t bat_offset;
-
-MSGUID session_guid;
-
-
-VHDXParentLocatorHeader parent_header;
-VHDXParentLocatorEntry *parent_entries;
-
-} BDRVVHDXState;
-
 /* Calculates new checksum.
  *
  * Zero is substituted during crc calculation for the original crc field
diff --git a/block/vhdx.h b/block/vhdx.h
index 403f766..74b2d5d 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -308,6 +308,53 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry {
 
 /* - END VHDX SPECIFICATION STRUCTURES  */
 
+typedef struct VHDXMetadataEntries {
+VHDXMetadataTableEntry file_parameters_entry;
+VHDXMetadataTableEntry virtual_disk_size_entry;
+VHDXMetadataTableEntry page83_data_entry;
+VHDXMetadataTableEntry logical_sector_size_entry;
+VHDXMetadataTableEntry phys_sector_size_entry;
+VHDXMetadataTableEntry parent_locator_entry;
+uint16_t present;
+} VHDXMetadataEntries;
+
+typedef struct BDRVVHDXState {
+CoMutex lock;
+
+int curr_header;
+VHDXHeader *headers[2];
+
+VHDXRegionTableHeader rt;
+VHDXRegionTableEntry bat_rt; /* region table for the BAT */
+VHDXRegionTableEntry metadata_rt;/* region table for the metadata */
+
+VHDXMetadataTableHeader metadata_hdr;
+VHDXMetadataEntries metadata_entries;
+
+VHDXFileParameters params;
+uint32_t block_size;
+uint32_t block_size_bits;
+uint32_t sectors_per_block;
+uint32_t sectors_per_block_bits;
+
+uint64_t virtual_disk_size;
+uint32_t logical_sector_size;
+uint32_t physical_sector_size;
+
+uint64_t chunk_ratio;
+uint32_t chunk_ratio_bits;
+uint32_t logical_sector_size_bits;
+
+uint32_t bat_entries;
+VHDXBatEntry *bat;
+uint64_t bat_offset;
+
+MSGUID session_guid;
+
+VHDXParentLocatorHeader parent_header;
+VHDXParentLocatorEntry *parent_entries;
+
+} BDRVVHDXState;
 
 void vhdx_guid_generate(MSGUID *guid);
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v4 08/13] block: vhdx - add log write support

2013-08-20 Thread Jeff Cody

This adds support for writing to the VHDX log.

For spec details, see VHDX Specification Format v1.00:
https://www.microsoft.com/en-us/download/details.aspx?id=34750

There are a few limitations to this log support:
1.) There is no caching yet
2.) The log is flushed after each entry

The primary write interface, vhdx_log_write_and_flush(), performs a log
write followed by an immediate flush of the log.

As each log entry sector is a minimum of 4KB, partial sector writes are
filled in with data from the disk write destination.

If the current file log GUID is 0, a new GUID is generated and updated
in the header.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx-log.c | 277 +++
 block/vhdx.h |   3 +
 2 files changed, 280 insertions(+)

diff --git a/block/vhdx-log.c b/block/vhdx-log.c
index 56808e0..7db6989 100644
--- a/block/vhdx-log.c
+++ b/block/vhdx-log.c
@@ -170,6 +170,55 @@ exit:
 return ret;
 }
 
+/* Writes num_sectors to the log (all log sectors are 4096 bytes),
+ * from buffer 'buffer'.  Upon return, *sectors_written will contain
+ * the number of sectors successfully written.
+ *
+ * It is assumed that 'buffer' is at least 4096*num_sectors large.
+ *
+ * 0 is returned on success, -errno otherwise */
+static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
+  uint32_t *sectors_written, void *buffer,
+  uint32_t num_sectors)
+{
+int ret = 0;
+uint64_t offset;
+uint32_t write;
+void *buffer_tmp;
+BDRVVHDXState *s = bs-opaque;
+
+ret = vhdx_user_visible_write(bs, s);
+if (ret  0) {
+goto exit;
+}
+
+write = log-write;
+
+buffer_tmp = buffer;
+while (num_sectors) {
+
+offset = log-offset + write;
+write = vhdx_log_inc_idx(write, log-length);
+if (write == log-read) {
+/* full */
+break;
+}
+ret = bdrv_pwrite(bs-file, offset, buffer_tmp, VHDX_LOG_SECTOR_SIZE);
+if (ret  0) {
+goto exit;
+}
+buffer_tmp += VHDX_LOG_SECTOR_SIZE;
+
+log-write = write;
+*sectors_written = *sectors_written + 1;
+num_sectors--;
+}
+
+exit:
+return ret;
+}
+
+
 /* Validates a log entry header */
 static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
   BDRVVHDXState *s)
@@ -732,3 +781,231 @@ exit:
 return ret;
 }
 
+
+
+static void vhdx_log_raw_to_le_sector(VHDXLogDescriptor *desc,
+  VHDXLogDataSector *sector, void *data,
+  uint64_t seq)
+{
+/* 8 + 4084 + 4 = 4096, 1 log sector */
+memcpy(desc-leading_bytes, data, 8);
+data += 8;
+cpu_to_le64s(desc-leading_bytes);
+memcpy(sector-data, data, 4084);
+data += 4084;
+memcpy(desc-trailing_bytes, data, 4);
+cpu_to_le32s(desc-trailing_bytes);
+data += 4;
+
+sector-sequence_high  = (uint32_t) (seq  32);
+sector-sequence_low   = (uint32_t) (seq  0x);
+sector-data_signature = VHDX_LOG_DATA_SIGNATURE;
+
+vhdx_log_desc_le_export(desc);
+vhdx_log_data_le_export(sector);
+}
+
+
+static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
+  void *data, uint32_t length, uint64_t offset)
+{
+int ret = 0;
+void *buffer = NULL;
+void *merged_sector = NULL;
+void *data_tmp, *sector_write;
+unsigned int i;
+int sector_offset;
+uint32_t desc_sectors, sectors, total_length;
+uint32_t sectors_written = 0;
+uint32_t aligned_length;
+uint32_t leading_length = 0;
+uint32_t trailing_length = 0;
+uint32_t partial_sectors = 0;
+uint32_t bytes_written = 0;
+uint64_t file_offset;
+VHDXHeader *header;
+VHDXLogEntryHeader new_hdr;
+VHDXLogDescriptor *new_desc = NULL;
+VHDXLogDataSector *data_sector = NULL;
+MSGUID new_guid = { 0 };
+
+header = s-headers[s-curr_header];
+
+/* need to have offset read data, and be on 4096 byte boundary */
+
+if (length  header-log_length) {
+/* no log present.  we could create a log here instead of failing */
+ret = -EINVAL;
+goto exit;
+}
+
+if (vhdx_log_guid_is_zero(header-log_guid)) {
+vhdx_guid_generate(new_guid);
+vhdx_update_headers(bs, s, false, new_guid);
+} else {
+/* currently, we require that the log be flushed after
+ * every write. */
+ret = -ENOTSUP;
+}
+
+/* 0 is an invalid sequence number, but may also represent the first
+ * log write (or a wrapped seq) */
+if (s-log.sequence == 0) {
+s-log.sequence = 1;
+}
+
+sector_offset = offset % VHDX_LOG_SECTOR_SIZE;
+file_offset = (offset / VHDX_LOG_SECTOR_SIZE) * VHDX_LOG_SECTOR_SIZE;
+
+aligned_length = length;
+
+/* add in the unaligned head and tail bytes */

[Qemu-devel] [PATCH v4 10/13] block: vhdx - move more endian translations to vhdx-endian.c

2013-08-20 Thread Jeff Cody

In preperation for vhdx_create(), move more endian translation
functions out to vhdx-endian.c.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx-endian.c | 75 +
 block/vhdx.c| 20 +++---
 block/vhdx.h|  9 ++-
 3 files changed, 87 insertions(+), 17 deletions(-)

diff --git a/block/vhdx-endian.c b/block/vhdx-endian.c
index 3e93e63..fe879ed 100644
--- a/block/vhdx-endian.c
+++ b/block/vhdx-endian.c
@@ -139,3 +139,78 @@ void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr)
 }
 
 
+/* Region table entries */
+void vhdx_region_header_le_import(VHDXRegionTableHeader *hdr)
+{
+assert(hdr != NULL);
+
+le32_to_cpus(hdr-signature);
+le32_to_cpus(hdr-checksum);
+le32_to_cpus(hdr-entry_count);
+}
+
+void vhdx_region_header_le_export(VHDXRegionTableHeader *hdr)
+{
+assert(hdr != NULL);
+
+cpu_to_le32s(hdr-signature);
+cpu_to_le32s(hdr-checksum);
+cpu_to_le32s(hdr-entry_count);
+}
+
+void vhdx_region_entry_le_import(VHDXRegionTableEntry *e)
+{
+assert(e != NULL);
+
+leguid_to_cpus(e-guid);
+le64_to_cpus(e-file_offset);
+le32_to_cpus(e-length);
+le32_to_cpus(e-data_bits);
+}
+
+void vhdx_region_entry_le_export(VHDXRegionTableEntry *e)
+{
+assert(e != NULL);
+
+cpu_to_leguids(e-guid);
+cpu_to_le64s(e-file_offset);
+cpu_to_le32s(e-length);
+cpu_to_le32s(e-data_bits);
+}
+
+
+/* Metadata headers  table */
+void vhdx_metadata_header_le_import(VHDXMetadataTableHeader *hdr)
+{
+assert(hdr != NULL);
+
+le64_to_cpus(hdr-signature);
+le16_to_cpus(hdr-entry_count);
+}
+
+void vhdx_metadata_header_le_export(VHDXMetadataTableHeader *hdr)
+{
+assert(hdr != NULL);
+
+cpu_to_le64s(hdr-signature);
+cpu_to_le16s(hdr-entry_count);
+}
+
+void vhdx_metadata_entry_le_import(VHDXMetadataTableEntry *e)
+{
+assert(e != NULL);
+
+leguid_to_cpus(e-item_id);
+le32_to_cpus(e-offset);
+le32_to_cpus(e-length);
+le32_to_cpus(e-data_bits);
+}
+void vhdx_metadata_entry_le_export(VHDXMetadataTableEntry *e)
+{
+assert(e != NULL);
+
+cpu_to_leguids(e-item_id);
+cpu_to_le32s(e-offset);
+cpu_to_le32s(e-length);
+cpu_to_le32s(e-data_bits);
+}
diff --git a/block/vhdx.c b/block/vhdx.c
index c97955a..c917376 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -424,10 +424,7 @@ static int vhdx_open_region_tables(BlockDriverState *bs, 
BDRVVHDXState *s)
 goto fail;
 }
 memcpy(s-rt, buffer, sizeof(s-rt));
-le32_to_cpus(s-rt.signature);
-le32_to_cpus(s-rt.checksum);
-le32_to_cpus(s-rt.entry_count);
-le32_to_cpus(s-rt.reserved);
+vhdx_region_header_le_import(s-rt);
 offset += sizeof(s-rt);
 
 if (!vhdx_checksum_is_valid(buffer, VHDX_HEADER_BLOCK_SIZE, 4) ||
@@ -446,10 +443,7 @@ static int vhdx_open_region_tables(BlockDriverState *bs, 
BDRVVHDXState *s)
 memcpy(rt_entry, buffer + offset, sizeof(rt_entry));
 offset += sizeof(rt_entry);
 
-leguid_to_cpus(rt_entry.guid);
-le64_to_cpus(rt_entry.file_offset);
-le32_to_cpus(rt_entry.length);
-le32_to_cpus(rt_entry.data_bits);
+vhdx_region_entry_le_import(rt_entry);
 
 /* see if we recognize the entry */
 if (guid_eq(rt_entry.guid, bat_guid)) {
@@ -524,9 +518,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, 
BDRVVHDXState *s)
 memcpy(s-metadata_hdr, buffer, sizeof(s-metadata_hdr));
 offset += sizeof(s-metadata_hdr);
 
-le64_to_cpus(s-metadata_hdr.signature);
-le16_to_cpus(s-metadata_hdr.reserved);
-le16_to_cpus(s-metadata_hdr.entry_count);
+vhdx_metadata_header_le_import(s-metadata_hdr);
 
 if (memcmp(s-metadata_hdr.signature, metadata, 8)) {
 ret = -EINVAL;
@@ -545,11 +537,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, 
BDRVVHDXState *s)
 memcpy(md_entry, buffer + offset, sizeof(md_entry));
 offset += sizeof(md_entry);
 
-leguid_to_cpus(md_entry.item_id);
-le32_to_cpus(md_entry.offset);
-le32_to_cpus(md_entry.length);
-le32_to_cpus(md_entry.data_bits);
-le32_to_cpus(md_entry.reserved2);
+vhdx_metadata_entry_le_import(md_entry);
 
 if (guid_eq(md_entry.item_id, file_param_guid)) {
 if (s-metadata_entries.present  META_FILE_PARAMETER_PRESENT) {
diff --git a/block/vhdx.h b/block/vhdx.h
index f3ca516..aab862c 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -412,7 +412,14 @@ void vhdx_log_desc_le_export(VHDXLogDescriptor *d);
 void vhdx_log_data_le_export(VHDXLogDataSector *d);
 void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr);
 void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr);
-
+void vhdx_region_header_le_import(VHDXRegionTableHeader *hdr);
+void vhdx_region_header_le_export(VHDXRegionTableHeader *hdr);
+void vhdx_region_entry_le_import(VHDXRegionTableEntry *e);
+void vhdx_region_entry_le_export(VHDXRegionTableEntry *e);
+void

[Qemu-devel] [PATCH v4 09/13] block: vhdx write support

2013-08-20 Thread Jeff Cody

This adds support for writing to VHDX image files, using coroutines.
Writes into the BAT table goes through the VHDX log.  Currently, BAT
table writes occur when expanding a dynamic VHDX file, and allocating a
new BAT entry.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.c | 150 ++-
 1 file changed, 148 insertions(+), 2 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index dabd688..c97955a 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -833,7 +833,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, 
int flags)
 }
 }
 
-/* TODO: differencing files, write */
+/* TODO: differencing files */
 
 return 0;
 fail:
@@ -965,7 +965,43 @@ exit:
 return ret;
 }
 
+/*
+ * Allocate a new payload block at the end of the file.
+ *
+ * Allocation will happen at 1MB alignment inside the file
+ *
+ * Returns the file offset start of the new payload block
+ */
+static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
+uint64_t *new_offset)
+{
+*new_offset = bdrv_getlength(bs-file);
+
+/* per the spec, the address for a block is in units of 1MB */
+*new_offset = ROUND_UP(*new_offset, 1024*1024);
+
+return bdrv_truncate(bs-file, *new_offset + s-block_size);
+}
+
+/*
+ * Update the BAT tablet entry with the new file offset, and the new entry
+ * state */
+static void vhdx_update_bat_table_entry(BlockDriverState *bs, BDRVVHDXState *s,
+   VHDXSectorInfo *sinfo,
+   uint64_t *bat_entry,
+   uint64_t *bat_offset, int state)
+{
+/* The BAT entry is a uint64, with 44 bits for the file offset in units of
+ * 1MB, and 3 bits for the block state. */
+s-bat[sinfo-bat_idx]  = ((sinfo-file_offset20) 
+   VHDX_BAT_FILE_OFF_BITS);
 
+s-bat[sinfo-bat_idx] |= state  VHDX_BAT_STATE_BIT_MASK;
+
+*bat_entry = cpu_to_le64(s-bat[sinfo-bat_idx]);
+*bat_offset = s-bat_offset + sinfo-bat_idx * sizeof(VHDXBatEntry);
+
+}
 
 /* Per the spec, on the first write of guest-visible data to the file the
  * data write guid must be updated in the header */
@@ -982,7 +1018,117 @@ int vhdx_user_visible_write(BlockDriverState *bs, 
BDRVVHDXState *s)
 static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t 
sector_num,
   int nb_sectors, QEMUIOVector *qiov)
 {
-return -ENOTSUP;
+int ret = -ENOTSUP;
+BDRVVHDXState *s = bs-opaque;
+VHDXSectorInfo sinfo;
+uint64_t bytes_done = 0;
+uint64_t bat_entry = 0;
+uint64_t bat_entry_offset = 0;
+bool bat_update;
+QEMUIOVector hd_qiov;
+
+qemu_iovec_init(hd_qiov, qiov-niov);
+
+qemu_co_mutex_lock(s-lock);
+
+ret = vhdx_user_visible_write(bs, s);
+if (ret  0) {
+goto exit;
+}
+
+while (nb_sectors  0) {
+if (s-params.data_bits  VHDX_PARAMS_HAS_PARENT) {
+/* not supported yet */
+ret = -ENOTSUP;
+goto exit;
+} else {
+bat_update = false;
+vhdx_block_translate(s, sector_num, nb_sectors, sinfo);
+
+qemu_iovec_reset(hd_qiov);
+qemu_iovec_concat(hd_qiov, qiov,  bytes_done, sinfo.bytes_avail);
+/* check the payload block state */
+switch (s-bat[sinfo.bat_idx]  VHDX_BAT_STATE_BIT_MASK) {
+case PAYLOAD_BLOCK_ZERO:
+/* in this case, we need to preserve zero writes for
+ * data that is not part of this write, so we must pad
+ * the rest of the buffer to zeroes */
+
+/* if we are on a posix system with ftruncate() that extends
+ * a file, then it is zero-filled for us.  On Win32, the raw
+ * layer uses SetFilePointer and SetFileEnd, which does not
+ * zero fill AFAIK */
+
+/* TODO: queue another write of zero buffers if the host OS 
does
+ * not zero-fill on file extension */
+
+/* fall through */
+case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */
+case PAYLOAD_BLOCK_UNMAPPED:/* fall through */
+case PAYLOAD_BLOCK_UNDEFINED:   /* fall through */
+ret = vhdx_allocate_block(bs, s, sinfo.file_offset);
+if (ret  0) {
+goto exit;
+}
+/* once we support differencing files, this may also be
+ * partially present */
+/* update block state to the newly specified state */
+vhdx_update_bat_table_entry(bs, s, sinfo, bat_entry,
+bat_entry_offset,
+PAYLOAD_BLOCK_FULL_PRESENT);
+bat_update = true;
+/* since we just allocated

[Qemu-devel] [PATCH v4 05/13] block: vhdx - break endian translation functions out

2013-08-20 Thread Jeff Cody

This moves the endian translation functions out from the vhdx.c source,
into a separate source file. In addition to the previously defined
endian functions, new endian translation functions for log support are
added as well.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/Makefile.objs |   2 +-
 block/vhdx-endian.c | 141 
 block/vhdx.c|  43 
 block/vhdx.h|   8 +++
 4 files changed, 150 insertions(+), 44 deletions(-)
 create mode 100644 block/vhdx-endian.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e5e54e6..e6f5d33 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o 
bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-$(CONFIG_VHDX) += vhdx.o
+block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/vhdx-endian.c b/block/vhdx-endian.c
new file mode 100644
index 000..3e93e63
--- /dev/null
+++ b/block/vhdx-endian.c
@@ -0,0 +1,141 @@
+/*
+ * Block driver for Hyper-V VHDX Images
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ *  Jeff Cody jc...@redhat.com
+ *
+ *  This is based on the VHDX Format Specification v1.00, published 8/25/2012
+ *  by Microsoft:
+ *  https://www.microsoft.com/en-us/download/details.aspx?id=34750
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include qemu-common.h
+#include block/block_int.h
+#include block/vhdx.h
+
+#include uuid/uuid.h
+
+
+/*
+ * All the VHDX formats on disk are little endian - the following
+ * are helper import/export functions to correctly convert
+ * endianness from disk read to native cpu format, and back again.
+ */
+
+
+/* VHDX File Header */
+
+
+void vhdx_header_le_import(VHDXHeader *h)
+{
+assert(h != NULL);
+
+le32_to_cpus(h-signature);
+le32_to_cpus(h-checksum);
+le64_to_cpus(h-sequence_number);
+
+leguid_to_cpus(h-file_write_guid);
+leguid_to_cpus(h-data_write_guid);
+leguid_to_cpus(h-log_guid);
+
+le16_to_cpus(h-log_version);
+le16_to_cpus(h-version);
+le32_to_cpus(h-log_length);
+le64_to_cpus(h-log_offset);
+}
+
+void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
+{
+assert(orig_h != NULL);
+assert(new_h != NULL);
+
+new_h-signature   = cpu_to_le32(orig_h-signature);
+new_h-checksum= cpu_to_le32(orig_h-checksum);
+new_h-sequence_number = cpu_to_le64(orig_h-sequence_number);
+
+new_h-file_write_guid = orig_h-file_write_guid;
+new_h-data_write_guid = orig_h-data_write_guid;
+new_h-log_guid= orig_h-log_guid;
+
+cpu_to_leguids(new_h-file_write_guid);
+cpu_to_leguids(new_h-data_write_guid);
+cpu_to_leguids(new_h-log_guid);
+
+new_h-log_version = cpu_to_le16(orig_h-log_version);
+new_h-version = cpu_to_le16(orig_h-version);
+new_h-log_length  = cpu_to_le32(orig_h-log_length);
+new_h-log_offset  = cpu_to_le64(orig_h-log_offset);
+}
+
+
+/* VHDX Log Headers */
+
+
+void vhdx_log_desc_le_import(VHDXLogDescriptor *d)
+{
+assert(d != NULL);
+
+le32_to_cpus(d-signature);
+le32_to_cpus(d-trailing_bytes);
+le64_to_cpus(d-leading_bytes);
+le64_to_cpus(d-file_offset);
+le64_to_cpus(d-sequence_number);
+}
+
+void vhdx_log_desc_le_export(VHDXLogDescriptor *d)
+{
+assert(d != NULL);
+
+cpu_to_le32s(d-signature);
+cpu_to_le32s(d-trailing_bytes);
+cpu_to_le64s(d-leading_bytes);
+cpu_to_le64s(d-file_offset);
+cpu_to_le64s(d-sequence_number);
+}
+
+void vhdx_log_data_le_export(VHDXLogDataSector *d)
+{
+assert(d != NULL);
+
+cpu_to_le32s(d-data_signature);
+cpu_to_le32s(d-sequence_high);
+cpu_to_le32s(d-sequence_low);
+}
+
+void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr)
+{
+assert(hdr != NULL);
+
+le32_to_cpus(hdr-signature);
+le32_to_cpus(hdr-checksum);
+le32_to_cpus(hdr-entry_length);
+le32_to_cpus(hdr-tail);
+le64_to_cpus(hdr-sequence_number);
+le32_to_cpus(hdr-descriptor_count);
+leguid_to_cpus(hdr-log_guid);
+le64_to_cpus(hdr-flushed_file_offset);
+le64_to_cpus(hdr-last_file_offset);
+}
+
+void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr)
+{
+assert(hdr != NULL);
+
+cpu_to_le32s(hdr-signature);
+cpu_to_le32s(hdr-checksum);
+cpu_to_le32s(hdr-entry_length);
+cpu_to_le32s(hdr-tail);
+cpu_to_le64s(hdr-sequence_number);
+cpu_to_le32s(hdr-descriptor_count);
+cpu_to_leguids(hdr-log_guid);
+cpu_to_le64s(hdr-flushed_file_offset);
+

[Qemu-devel] [PATCH v4 06/13] block: vhdx - update log guid in header, and first write tracker

2013-08-20 Thread Jeff Cody

Allow tracking of first file write in the VHDX image, as well as
the ability to update the GUID in the header.  This is in preparation
for log support.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.c | 30 --
 block/vhdx.h |  6 ++
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 15a4d1d..4dc056b 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -229,7 +229,7 @@ static int vhdx_probe(const uint8_t *buf, int buf_size, 
const char *filename)
  *  - non-current header is updated with largest sequence number
  */
 static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s,
-  bool generate_data_write_guid)
+  bool generate_data_write_guid, MSGUID *log_guid)
 {
 int ret = 0;
 int hdr_idx = 0;
@@ -261,6 +261,11 @@ static int vhdx_update_header(BlockDriverState *bs, 
BDRVVHDXState *s,
 vhdx_guid_generate(inactive_header-data_write_guid);
 }
 
+/* update the log guid if present */
+if (log_guid) {
+inactive_header-log_guid = *log_guid;
+}
+
 /* the header checksum is not over just the packed size of VHDXHeader,
  * but rather over the entire 'reserved' range for the header, which is
  * 4KB (VHDX_HEADER_SIZE). */
@@ -293,16 +298,16 @@ exit:
  * The VHDX spec calls for header updates to be performed twice, so that both
  * the current and non-current header have valid info
  */
-static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s,
-   bool generate_data_write_guid)
+int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s,
+bool generate_data_write_guid, MSGUID *log_guid)
 {
 int ret;
 
-ret = vhdx_update_header(bs, s, generate_data_write_guid);
+ret = vhdx_update_header(bs, s, generate_data_write_guid, log_guid);
 if (ret  0) {
 return ret;
 }
-ret = vhdx_update_header(bs, s, generate_data_write_guid);
+ret = vhdx_update_header(bs, s, generate_data_write_guid, log_guid);
 return ret;
 }
 
@@ -782,6 +787,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, 
int flags)
 
 
 s-bat = NULL;
+s-first_visible_write = true;
 
 qemu_co_mutex_init(s-lock);
 
@@ -862,7 +868,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, 
int flags)
 }
 
 if (flags  BDRV_O_RDWR) {
-ret = vhdx_update_headers(bs, s, false);
+ret = vhdx_update_headers(bs, s, false, NULL);
 if (ret  0) {
 goto fail;
 }
@@ -1002,6 +1008,18 @@ exit:
 
 
 
+/* Per the spec, on the first write of guest-visible data to the file the
+ * data write guid must be updated in the header */
+int vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s)
+{
+int ret = 0;
+if (s-first_visible_write) {
+s-first_visible_write = false;
+ret = vhdx_update_headers(bs, s, true, NULL);
+}
+return ret;
+}
+
 static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t 
sector_num,
   int nb_sectors, QEMUIOVector *qiov)
 {
diff --git a/block/vhdx.h b/block/vhdx.h
index 89d9a78..8c61bfd 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -361,6 +361,7 @@ typedef struct BDRVVHDXState {
 VHDXBatEntry *bat;
 uint64_t bat_offset;
 
+bool first_visible_write;
 MSGUID session_guid;
 
 VHDXLogEntries log;
@@ -372,6 +373,9 @@ typedef struct BDRVVHDXState {
 
 void vhdx_guid_generate(MSGUID *guid);
 
+int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw,
+MSGUID *log_guid);
+
 uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset);
 uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
 int crc_offset);
@@ -401,4 +405,6 @@ void vhdx_log_data_le_export(VHDXLogDataSector *d);
 void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr);
 void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr);
 
+int vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s);
+
 #endif
-- 
1.8.1.4

[Qemu-devel] [PATCH v4 11/13] block: vhdx - break out code operations to functions

2013-08-20 Thread Jeff Cody

This is preperation for vhdx_create().  The ability to write headers,
and calculate the number of BAT entries will be needed within the
create() functions, so move this relevant code into helper functions.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.c | 121 +++
 1 file changed, 80 insertions(+), 41 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index c917376..5c7d0c4 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -203,6 +203,14 @@ void vhdx_guid_generate(MSGUID *guid)
 memcpy(guid, uuid, sizeof(MSGUID));
 }
 
+static void vhdx_set_shift_bits(BDRVVHDXState *s)
+{
+s-logical_sector_size_bits = 31 - clz32(s-logical_sector_size);
+s-sectors_per_block_bits =   31 - clz32(s-sectors_per_block);
+s-chunk_ratio_bits = 63 - clz64(s-chunk_ratio);
+s-block_size_bits =  31 - clz32(s-block_size);
+}
+
 /*
  * Per the MS VHDX Specification, for every VHDX file:
  *  - The header section is fixed size - 1 MB
@@ -222,6 +230,50 @@ static int vhdx_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 0;
 }
 
+/*
+ * Writes the header to the specified offset.
+ *
+ * This will optionally read in buffer data from disk (otherwise zero-fill),
+ * and then update the header checksum.  Header is converted to proper
+ * endianness before being written to the specified file offset
+ */
+static int vhdx_write_header(BlockDriverState *bs_file, VHDXHeader *hdr,
+ uint64_t offset, bool read)
+{
+uint8_t *buffer = NULL;
+int ret;
+VHDXHeader header_le;
+
+assert(bs_file != NULL);
+assert(hdr != NULL);
+
+/* the header checksum is not over just the packed size of VHDXHeader,
+ * but rather over the entire 'reserved' range for the header, which is
+ * 4KB (VHDX_HEADER_SIZE). */
+
+buffer = qemu_blockalign(bs_file, VHDX_HEADER_SIZE);
+if (read) {
+/* if true, we can't assume the extra reserved bytes are 0 */
+ret = bdrv_pread(bs_file, offset, buffer, VHDX_HEADER_SIZE);
+if (ret  0) {
+goto exit;
+}
+} else {
+memset(buffer, 0, VHDX_HEADER_SIZE);
+}
+
+/* overwrite the actual VHDXHeader portion */
+memcpy(buffer, hdr, sizeof(VHDXHeader));
+hdr-checksum = vhdx_update_checksum(buffer, VHDX_HEADER_SIZE,
+ offsetof(VHDXHeader, checksum));
+vhdx_header_le_export(hdr, header_le);
+ret = bdrv_pwrite_sync(bs_file, offset, header_le, sizeof(VHDXHeader));
+
+exit:
+qemu_vfree(buffer);
+return ret;
+}
+
 /* Update the VHDX headers
  *
  * This follows the VHDX spec procedures for header updates.
@@ -237,8 +289,6 @@ static int vhdx_update_header(BlockDriverState *bs, 
BDRVVHDXState *s,
 
 VHDXHeader *active_header;
 VHDXHeader *inactive_header;
-VHDXHeader header_le;
-uint8_t *buffer;
 
 /* operate on the non-current header */
 if (s-curr_header == 0) {
@@ -266,31 +316,13 @@ static int vhdx_update_header(BlockDriverState *bs, 
BDRVVHDXState *s,
 inactive_header-log_guid = *log_guid;
 }
 
-/* the header checksum is not over just the packed size of VHDXHeader,
- * but rather over the entire 'reserved' range for the header, which is
- * 4KB (VHDX_HEADER_SIZE). */
-
-buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
-/* we can't assume the extra reserved bytes are 0 */
-ret = bdrv_pread(bs-file, header_offset, buffer, VHDX_HEADER_SIZE);
-if (ret  0) {
-goto exit;
-}
-/* overwrite the actual VHDXHeader portion */
-memcpy(buffer, inactive_header, sizeof(VHDXHeader));
-inactive_header-checksum =
-vhdx_update_checksum(buffer, VHDX_HEADER_SIZE,
- offsetof(VHDXHeader, checksum));
-vhdx_header_le_export(inactive_header, header_le);
-ret = bdrv_pwrite_sync(bs-file, header_offset, header_le,
-   sizeof(VHDXHeader));
+vhdx_write_header(bs-file, inactive_header, header_offset, true);
 if (ret  0) {
 goto exit;
 }
 s-curr_header = hdr_idx;
 
 exit:
-qemu_vfree(buffer);
 return ret;
 }
 
@@ -710,10 +742,7 @@ static int vhdx_parse_metadata(BlockDriverState *bs, 
BDRVVHDXState *s)
 goto exit;
 }
 
-s-logical_sector_size_bits = 31 - clz32(s-logical_sector_size);
-s-sectors_per_block_bits =   31 - clz32(s-sectors_per_block);
-s-chunk_ratio_bits = 63 - clz64(s-chunk_ratio);
-s-block_size_bits =  31 - clz32(s-block_size);
+vhdx_set_shift_bits(s);
 
 ret = 0;
 
@@ -722,6 +751,31 @@ exit:
 return ret;
 }
 
+/*
+ * Calculate the number of BAT entries, including sector
+ * bitmap entries.
+ */
+static void vhdx_calc_bat_entries(BDRVVHDXState *s)
+{
+uint32_t data_blocks_cnt, bitmap_blocks_cnt;
+
+data_blocks_cnt = s-virtual_disk_size  s-block_size_bits;
+if

[Qemu-devel] [PATCH v4 07/13] block: vhdx - log parsing, replay, and flush support

2013-08-20 Thread Jeff Cody

This adds support for VHDX v0 logs, as specified in Microsoft's
VHDX Specification Format v1.00:
https://www.microsoft.com/en-us/download/details.aspx?id=34750

The following support is added:

* Log parsing, and validation - validate that an existing log
  is correct.

* Log search - search through an existing log, to find any valid
  sequence of entries.

* Log replay and flush - replay an existing log, and flush/clear
  the log when complete.

The VHDX log is a circular buffer, with elements (sectors) of 4KB.

A log entry is a variably-length number of sectors, that is
comprised of a header and 'descriptors', that describe each sector.

A log may contain multiple entries, know as a log sequence.  In a log
sequence, each log entry immediately follows the previous entry, with an
incrementing sequence number.  There can only ever be one active and
valid sequence in the log.

Each log entry must match the file log GUID in order to be valid (along
with other criteria).  Once we have flushed all valid log entries, we
marked the file log GUID to be zero, which indicates a buffer with no
valid entries.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/Makefile.objs |   2 +-
 block/vhdx-log.c| 734 
 block/vhdx.c|  44 +---
 block/vhdx.h|   7 +-
 4 files changed, 743 insertions(+), 44 deletions(-)
 create mode 100644 block/vhdx-log.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e6f5d33..2fbd79a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o 
bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o
+block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/vhdx-log.c b/block/vhdx-log.c
new file mode 100644
index 000..56808e0
--- /dev/null
+++ b/block/vhdx-log.c
@@ -0,0 +1,734 @@
+/*
+ * Block driver for Hyper-V VHDX Images
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ *  Jeff Cody jc...@redhat.com
+ *
+ *  This is based on the VHDX Format Specification v1.00, published 8/25/2012
+ *  by Microsoft:
+ *  https://www.microsoft.com/en-us/download/details.aspx?id=34750
+ *
+ * This file covers the functionality of the metadata log writing, parsing, and
+ * replay.
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+#include qemu-common.h
+#include block/block_int.h
+#include qemu/module.h
+#include block/vhdx.h
+
+
+typedef struct VHDXLogSequence {
+bool valid;
+uint32_t count;
+VHDXLogEntries log;
+VHDXLogEntryHeader hdr;
+} VHDXLogSequence;
+
+typedef struct VHDXLogDescEntries {
+VHDXLogEntryHeader hdr;
+VHDXLogDescriptor desc[];
+} VHDXLogDescEntries;
+
+
+/* Returns true if the GUID is zero */
+static bool vhdx_log_guid_is_zero(MSGUID *guid)
+{
+int i;
+int ret = 0;
+
+/* If either the log guid, or log length is zero,
+ * then a replay log is not present */
+for (i = 0; i  sizeof(MSGUID); i++) {
+ret |= ((uint8_t *) guid)[i];
+}
+
+return ret == 0;
+}
+
+/* The log located on the disk is circular buffer containing
+ * sectors of 4096 bytes each.
+ *
+ * It is assumed for the read/write functions below that the
+ * circular buffer scheme uses a 'one sector open' to indicate
+ * the buffer is full.  Given the validation methods used for each
+ * sector, this method should be compatible with other methods that
+ * do not waste a sector.
+ */
+
+
+/* Allow peeking at the hdr entry at the beginning of the current
+ * read index, without advancing the read index */
+static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log,
+ VHDXLogEntryHeader *hdr)
+{
+int ret = 0;
+uint64_t offset;
+uint32_t read;
+
+assert(hdr != NULL);
+
+/* peek is only supported on sector boundaries */
+if (log-read % VHDX_LOG_SECTOR_SIZE) {
+ret = -EFAULT;
+goto exit;
+}
+
+read = log-read;
+/* we are guaranteed that a) log sectors are 4096 bytes,
+ * and b) the log length is a multiple of 1MB. So, there
+ * is always a round number of sectors in the buffer */
+if ((read + sizeof(VHDXLogEntryHeader))  log-length) {
+read = 0;
+}
+
+if (read == log-write) {
+ret = -EINVAL;
+goto exit;
+}
+
+offset = log-offset + read;
+
+ret = bdrv_pread(bs-file, offset, hdr, sizeof(VHDXLogEntryHeader));
+if (ret  0) {
+goto exit;
+}
+
+exit:
+return ret;
+}
+
+/* Index increment for

[Qemu-devel] [PATCH v4 13/13] block: vhdx - add .bdrv_create() support

2013-08-20 Thread Jeff Cody

This adds support for VHDX image creation, for images of type Fixed
and Dynamic.  Differencing types (i.e., VHDX images with backing
files) are currently not supported.

Options for image creation include:
* log size:
The size of the journaling log for VHDX.  Minimum is 1MB,
and it must be a multiple of 1MB. Invalid log sizes will be
silently fixed by rounding up to the nearest MB.

Default is 1MB.

* block size:
This is the size of a payload block.  The range is 1MB to 256MB,
inclusive, and must be a multiple of 1MB as well.  Invalid sizes
and multiples will be silently fixed.  If '0' is passed, then
a sane size is chosen (depending on virtual image size).

Default is 0 (Auto-select).

* subformat:
- dynamic
An image without data pre-allocated.
- fixed
An image with data pre-allocated.

Default is dynamic

When creating the image file, the lettered sections are created:

-.
|   (A)|   (B)|(C)| (D)   | (E)
|  File ID |  Header1 |  Header 2 |  Region Tbl 1 |  Region Tbl 2
|  |  |   |   |
.-.
0 64KB  128KB   192KB   256KB  320KB

. ~ --- ~  ~  ~ ---.
| (F) | (G)   |(H)|
| Journal Log |  BAT / Bitmap |  Metadata |   data ..
| |   |   |
. ~ --- ~  ~  ~ ---.
1MB (var.)  (var.)  (var.)

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.c | 532 +++
 block/vhdx.h |  15 +-
 2 files changed, 546 insertions(+), 1 deletion(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 5c7d0c4..1551be1 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -22,6 +22,18 @@
 #include block/vhdx.h
 
 #include uuid/uuid.h
+#include glib.h
+
+/* Options for VHDX creation */
+
+#define VHDX_BLOCK_OPT_LOG_SIZE   log_size
+#define VHDX_BLOCK_OPT_BLOCK_SIZE block_size
+
+typedef enum VHDXImageType {
+VHDX_TYPE_DYNAMIC = 0,
+VHDX_TYPE_FIXED,
+VHDX_TYPE_DIFFERENCING,   /* Currently unsupported */
+} VHDXImageType;
 
 /* Several metadata and region table data entries are identified by
  * guids in  a MS-specific GUID format. */
@@ -1169,6 +1181,523 @@ static void vhdx_close(BlockDriverState *bs)
 qemu_vfree(s-log.hdr);
 }
 
+
+/*
+ * Create VHDX Headers
+ *
+ * There are 2 headers, and the highest sequence number will represent
+ * the active header
+ */
+static int vhdx_create_new_headers(BlockDriverState *bs, uint64_t image_size,
+   uint32_t log_size)
+{
+int ret = 0;
+VHDXHeader *hdr = NULL;
+
+hdr = g_malloc0(sizeof(VHDXHeader));
+
+hdr-signature   = VHDX_HEADER_SIGNATURE;
+hdr-sequence_number = g_random_int();
+hdr-log_version = 0;
+hdr-version = 1;
+hdr-log_length  = log_size;
+hdr-log_offset  = VHDX_HEADER_SECTION_END;
+vhdx_guid_generate(hdr-file_write_guid);
+vhdx_guid_generate(hdr-data_write_guid);
+
+ret = vhdx_write_header(bs, hdr, VHDX_HEADER1_OFFSET, false);
+if (ret  0) {
+goto exit;
+}
+hdr-sequence_number++;
+ret = vhdx_write_header(bs, hdr, VHDX_HEADER2_OFFSET, false);
+if (ret  0) {
+goto exit;
+}
+
+exit:
+g_free(hdr);
+return ret;
+}
+
+
+/*
+ * Create the Metadata entries.
+ *
+ * For more details on the entries, see section 3.5 (pg 29) in the
+ * VHDX 1.00 specification.
+ *
+ * We support 5 metadata entries (all required by spec):
+ *  File Parameters,
+ *  Virtual Disk Size,
+ *  Page 83 Data,
+ *  Logical Sector Size,
+ *  Physical Sector Size
+ *
+ * The first 64KB of the Metadata section is reserved for the metadata
+ * header and entries; beyond that, the metadata items themselves reside.
+ */
+static int vhdx_create_new_metadata(BlockDriverState *bs,
+uint64_t image_size,
+uint32_t block_size,
+uint32_t sector_size,
+uint64_t metadata_offset,
+VHDXImageType type)
+{
+int ret = 0;
+uint32_t offset = 0;
+void *buffer = NULL;
+void *entry_buffer;
+VHDXMetadataTableHeader *md_table;;
+VHDXMetadataTableEntry  *md_table_entry;
+
+/* Metadata entries */
+VHDXFileParameters *mt_file_params;
+VHDXVirtualDiskSize*mt_virtual_size;
+VHDXPage83Data *mt_page83;
+VHDXVirtualDiskLogicalSectorSize  *mt_log_sector_size;
+VHDXVirtualDiskPhysicalSectorSize *mt_phys_sector_size;
+
+entry_buffer =

[Qemu-devel] [PATCH v4 12/13] block: vhdx - fix comment typos in header, fix incorrect struct fields

2013-08-20 Thread Jeff Cody

VHDXPage83Data and VHDXParentLocatorHeader both incorrectly had their
MSGUID fields set as arrays of 16.  This is incorrect (it stems from
an early version where those fields were uint_8 arrays).  Those fields
were, up to this patch, unused.

Also, there were a couple of typos and incorrect wording in comments,
and those have been fixed up as well.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/vhdx.h b/block/vhdx.h
index aab862c..c9c1cdd 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -58,7 +58,7 @@
 typedef struct VHDXFileIdentifier {
 uint64_tsignature;  /* vhdxfile in ASCII */
 uint16_tcreator[256];   /* optional; utf-16 string to identify
-   the vhdx file creator.  Diagnotistic
+   the vhdx file creator.  Diagnostic
only */
 } VHDXFileIdentifier;
 
@@ -114,8 +114,8 @@ typedef struct QEMU_PACKED VHDXHeader {
there is no valid log. If non-zero,
log entries with this guid are
valid. */
-uint16_tlog_version;/* version of the log format. Mustn't 
be
-   zero, unless log_guid is also zero 
*/
+uint16_tlog_version;/* version of the log format. Must be
+   set to zero */
 uint16_tversion;/* version of the vhdx file.  
Currently,
only supported version is 1 */
 uint32_tlog_length; /* length of the log.  Must be multiple
@@ -281,7 +281,7 @@ typedef struct QEMU_PACKED VHDXVirtualDiskSize {
 } VHDXVirtualDiskSize;
 
 typedef struct QEMU_PACKED VHDXPage83Data {
-MSGUID  page_83_data[16];   /* unique id for scsi devices that
+MSGUID  page_83_data;   /* unique id for scsi devices that
support page 0x83 */
 } VHDXPage83Data;
 
@@ -296,7 +296,7 @@ typedef struct QEMU_PACKED 
VHDXVirtualDiskPhysicalSectorSize {
 } VHDXVirtualDiskPhysicalSectorSize;
 
 typedef struct QEMU_PACKED VHDXParentLocatorHeader {
-MSGUID  locator_type[16];   /* type of the parent virtual disk. */
+MSGUID  locator_type;   /* type of the parent virtual disk. */
 uint16_treserved;
 uint16_tkey_value_count;/* number of key/value pairs for this
locator */
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL

2013-08-20 Thread Liu, Jinsong

Paolo Bonzini wrote:
 Il 19/08/2013 16:59, Andreas Färber ha scritto:
 qemu-kvm is no longer maintained since 1.3 so it should not be
 occurring any more. 
 
 Please use a prefix of target-i386:  (the directory name) to signal
 where you are changing code, i.e. x86 only.
 
 bugfix is not a very telling description of what a patch is doing.
 
 (Up to Paolo and Gleb whether they'll fix it or whether they require
 a resend.)
 
 No, not this time at least. :)
 
 Paolo

Thanks Paolo, and Andreas's comments is also good, so I update commit message 
and will send out later.

Regards,
Jinsong

[Qemu-devel] [PATCH v4 04/13] block: vhdx - log support struct and defines

2013-08-20 Thread Jeff Cody

This adds some magic number defines, and internal structure definitions
for VHDX log replay support.  The struct VHDXLogEntries does not reflect
an on-disk data structure, and thus does not need to be packed.

Some minor code style fixes are applied as well.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vhdx.h | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/block/vhdx.h b/block/vhdx.h
index 74b2d5d..0ab8bf3 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -30,12 +30,12 @@
  * 
0.64KB...128KB192KB..256KB1MB
  */
 
-#define VHDX_HEADER_BLOCK_SIZE  (64*1024)
+#define VHDX_HEADER_BLOCK_SIZE  (64 * 1024)
 
 #define VHDX_FILE_ID_OFFSET 0
-#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE*1)
-#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE*2)
-#define VHDX_REGION_TABLE_OFFSET(VHDX_HEADER_BLOCK_SIZE*3)
+#define VHDX_HEADER1_OFFSET (VHDX_HEADER_BLOCK_SIZE * 1)
+#define VHDX_HEADER2_OFFSET (VHDX_HEADER_BLOCK_SIZE * 2)
+#define VHDX_REGION_TABLE_OFFSET(VHDX_HEADER_BLOCK_SIZE * 3)
 
 
 /*
@@ -77,10 +77,10 @@ typedef struct QEMU_PACKED MSGUID {
 #define guid_eq(a, b) \
 (memcmp((a), (b), sizeof(MSGUID)) == 0)
 
-#define VHDX_HEADER_SIZE (4*1024)   /* although the vhdx_header struct in disk
-   is only 582 bytes, for purposes of crc
-   the header is the first 4KB of the 64KB
-   block */
+#define VHDX_HEADER_SIZE (4 * 1024)   /* although the vhdx_header struct in 
disk
+ is only 582 bytes, for purposes of crc
+ the header is the first 4KB of the 
64KB
+ block */
 
 /* The full header is 4KB, although the actual header data is much smaller.
  * But for the checksum calculation, it is over the entire 4KB structure,
@@ -92,7 +92,7 @@ typedef struct QEMU_PACKED VHDXHeader {
VHDX file has 2 of these headers,
and only the header with the highest
sequence number is valid */
-MSGUID  file_write_guid;   /* 128 bit unique identifier. Must be
+MSGUID  file_write_guid;/* 128 bit unique identifier. Must be
updated to new, unique value before
the first modification is made to
file */
@@ -151,7 +151,10 @@ typedef struct QEMU_PACKED VHDXRegionTableEntry {
 
 
 /*  LOG ENTRY STRUCTURES  */
+#define VHDX_LOG_MIN_SIZE (1024 * 1024)
+#define VHDX_LOG_SECTOR_SIZE 4096
 #define VHDX_LOG_HDR_SIZE 64
+#define VHDX_LOG_SIGNATURE 0x65676f6c
 typedef struct QEMU_PACKED VHDXLogEntryHeader {
 uint32_tsignature;  /* loge in ASCII */
 uint32_tchecksum;   /* CRC-32C hash of the 64KB table */
@@ -174,7 +177,8 @@ typedef struct QEMU_PACKED VHDXLogEntryHeader {
 } VHDXLogEntryHeader;
 
 #define VHDX_LOG_DESC_SIZE 32
-
+#define VHDX_LOG_DESC_SIGNATURE 0x63736564
+#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a
 typedef struct QEMU_PACKED VHDXLogDescriptor {
 uint32_tsignature;  /* zero or desc in ASCII */
 union  {
@@ -194,6 +198,7 @@ typedef struct QEMU_PACKED VHDXLogDescriptor {
vhdx_log_entry_header */
 } VHDXLogDescriptor;
 
+#define VHDX_LOG_DATA_SIGNATURE 0x61746164
 typedef struct QEMU_PACKED VHDXLogDataSector {
 uint32_tdata_signature; /* data in ASCII */
 uint32_tsequence_high;  /* 4 MSB of 8 byte sequence_number */
@@ -219,12 +224,12 @@ typedef struct QEMU_PACKED VHDXLogDataSector {
 #define SB_BLOCK_PRESENT6
 
 /* per the spec */
-#define VHDX_MAX_SECTORS_PER_BLOCK  (123)
+#define VHDX_MAX_SECTORS_PER_BLOCK  (1  23)
 
 /* upper 44 bits are the file offset in 1MB units lower 3 bits are the state
other bits are reserved */
 #define VHDX_BAT_STATE_BIT_MASK 0x07
-#define VHDX_BAT_FILE_OFF_BITS (64-44)
+#define VHDX_BAT_FILE_OFF_BITS (64 - 44)
 typedef uint64_t VHDXBatEntry;
 
 /*  METADATA REGION STRUCTURES  */
@@ -252,8 +257,8 @@ typedef struct QEMU_PACKED VHDXMetadataTableEntry {
metadata region */
 /* note: if length = 0, so is offset */
 uint32_tlength; /* length of metadata. = 1MB. */
-uint32_tdata_bits;  /* least-significant 3 bits are flags, the
-   rest are reserved (see above) */
+uint32_tdata_bits;  /* least-significant 3 bits are flags,
+   the rest are reserved (see

Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-20 Thread Alex Bligh

 
  for (bh = ctx-first_bh; bh; bh = bh-next) {
  if (!bh-deleted  bh-scheduled) {
  if (bh-idle) {
  /* idle bottom halves will be polled at least
   * every 10ms */
 -*timeout = 10;
 +*timeout = qemu_soonest_timeout(*timeout, 10);
glib will not set *timeout to a meaningful value before calling
 aio_ctx_prepare(), right? If so, *timeout = 10 should be used.


I don't think that's correct. Each _prepare routine is called
and has the abilitiy to alter the timeout but need not
set it at all. Indeed it should not set it as there may already
be a shorter timeout there.

Here's the code before I touched it:

aio_ctx_prepare(GSource *source, gint*timeout)
{
AioContext *ctx = (AioContext *) source;
QEMUBH *bh;

for (bh = ctx-first_bh; bh; bh = bh-next) {
if (!bh-deleted  bh-scheduled) {
if (bh-idle) {
/* idle bottom halves will be polled at least
 * every 10ms */
*timeout = 10;
} else {
/* non-idle bottom halves will be executed
 * immediately */
*timeout = 0;
return true;
}
}
}

return false;
}

You'll note that what this does is:
a) if there are no bottom halves, leaves *timeout as is
b) if there is a non-idle bottom half, set timeout to 0
c) else set timeout to 10 if there are only idle bottom
   halves.

Arguably (c) is a bug if timeout was shorter than 10
on entry but the whole of idle bhs are a bit of a bodge.
This is fixed by my series.

If you are asking WHERE it gets set to -1, I don't claim
to be a glib expert but I believe it's the line
  gint source_timeout = -1
around line 2816 in glib/gmain.c

-- 
Alex Bligh

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-20 Thread Alex Bligh


On 20 Aug 2013, at 02:42, Fam Zheng wrote:

 OK, thanks for explaination. That sounds a valid use case for
 streamOptimized. However I am afraid QEMU and its users benefit not much
 from this feature anyway, because it's moving a VM away to VMware, :)
 that might be the reason it's not there yet, and I don't know about any
 plan to do it in the near future.

Well, given it is an open source project, the more interoperability
the better. Even if it just means users need not worry about lock
in to faster hypervisors ... Being more serious, qemu-img is
part of the project too.

 But if someone sends patches for this, I think it is possible to get in.

I guessed send code might be the answer :-)

What I'm not sure of is whether the streaming format has to be written
sequentially from as opposed to random writes. I believe the way
qemu-img convert works, one can't guarantee the writes are
sequential.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Alexander Graf


On 20.08.2013, at 02:36, Alexey Kardashevskiy wrote:

 On 08/19/2013 07:47 PM, Paolo Bonzini wrote:
 Il 19/08/2013 10:44, Alexey Kardashevskiy ha scritto:
 It means that if you use the same QEMU version with the same command
 line on a different kernel version, your guest looks different because
 we generate the dtb differently.
 Oh. Sorry for my ignorance again, I am not playing dump or anything like
 that - I do not understand how the device tree (which we cook in QEMU) on
 the destination can possibly survive migration and not to be overwritten by
 the one from the source. What was in the destination RAM before migration
 does not matter at all (including dt), QEMU device tree is what matters but
 this does not change. As it is the same QEMU version, hypercalls are
 supported anyway, the only difference where they will be handled - in the
 host kernel or QEMU. What do I miss?
 
 Nothing, I just asked to test that handling the hypercall in QEMU works.
 
 Well, I was asking rather Alex :)
 
 On x86 we have a similar problem, though with cpuid bits instead of the
 device tree.  An older kernel might not support some cpuid bits, thus
 -cpu SandyBridge might have different cpuid bits depending on the host
 processor and kernel version.  This is handled by having an enforce
 mode where -cpu SandyBridge,enforce will fail to start if the host
 processor or the kernel is not new enough.
 
 Hm. Here we might have a problem like this is we decide to migrate from
 QEMU with this patch running on modern kernel to QEMU without this patch
 running on old kernel - for this we might want to be able to disable
 multi-tce via machine options on newer kernels. Do we care enough to add
 such a parameter or we just disable migration and that's it?

The problem is not only contained to migration. Even if you just start up your 
VM with a different host kernel you get a different guest environment, so you 
potentially get change lurking in that can kill reproducability. Imagine you're 
Amazon EC2: You don't want people to get any idea what host they're running on.

If the pseries target was a mature, well established and used target, I'd add a 
machine option multi-tce with 3 possibilities:

  on - force multi-tce exposing on
  off - force multi-tce exposing off
  unset - use your current detection code

That way libvirt for example can decide that it wants to nail down TCE support 
throughout a cluster. It's really the same as the cpu,enforce mode, just on a 
machine level rather than for cpuid bits.

However, considering the current user base of KVM on pseries I think it's fine 
to just declare newer QEMU on older KVM as slower because it doesn't use the 
in-kernel multi-tce support and call it a day. It makes everyone's life a _lot_ 
easier.

Or are you aware of any products using older kernels that are going to run QEMU 
1.7 and above and won't change the kernel as well as they bump up the version?

 This SandyBridge,enforce - what if the destination host running on old
 kernel was run without this option - will the migration fail? What is the

Migration requires you to use the same command line on both ends of the 
migration. Unfortunately in only enforces it implicitly - one of the protocol's 
shortcomings - but that's the idea.


Alex

 mechanism? Do machine options migrate? I looked at target-i386/cpu.c but
 did not see the quick answer.
 
 
 But in this case, you do not need this because the hypercall works if
 emulated by QEMU.  I like Alex's solution of making it universally
 available in the dtb.
 
 The solution would be good if we did not already have H_PUT_TCE accelerated
 for emulated devices in the host kernel but we do have it.
 
 
 -- 
 Alexey

Re: [Qemu-devel] [PATCH v2 1/9] tests: QAPI schema parser tests

2013-08-20 Thread Markus Armbruster

Laszlo Ersek ler...@redhat.com writes:

 On 07/27/13 17:41, Markus Armbruster wrote:
 The parser handles erroneous input badly.  To be improved shortly.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com

 diff --git a/tests/Makefile b/tests/Makefile
 index cdbb79e..ddb957c 100644
 --- a/tests/Makefile
 +++ b/tests/Makefile

 @@ -233,13 +242,24 @@ check-report.html: check-report.xml
  check-tests/qemu-iotests-quick.sh: tests/qemu-iotests-quick.sh 
 qemu-img$(EXESUF) qemu-io$(EXESUF)
  $
  
 +.PHONY: check-tests/test-qapi.py
 +check-tests/test-qapi.py: tests/test-qapi.py
 +
 +.PHONY: $(patsubst %, check-%, $(check-qapi-schema-y))
 +$(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: 
 $(SRC_PATH)/%.json
 +$(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) 
 $(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.out 2$*.err; echo $$? 
 $*.exit,   TEST  $*.out)
 +@diff -q $(SRC_PATH)/$*.out $*.out
 +@diff -q $(SRC_PATH)/$*.err $*.err
 +@diff -q $(SRC_PATH)/$*.exit $*.exit
 +

 Unfortunately, this doesn't catch errors/differences when someone builds
 qemu in the source tree, and runs make check there. In that case,
 whatever output test-qapi.py produces, overwrites the in-tree file,
 and the diff commands compare files with themselves.

D'uh!  Will fix.  Thanks!

 Also, why is 2/9 a good idea, namely, using qapi-schema-test.json as a
 test file itself? Whoever adds a new test, for anything that uses the
 schema, now has to update the .out file as well.

The parser needs to be tested with input that exercises all schema
features.  Since such a test already existed, I extended it to
additionally test the parser.

If this coupling turns out to be inconvenient, we can split the test in
two: qapi-schema-test.json goes back to just its original purpose, and a
(possibly simplified) copy is used for testing the parser.

Drawback: when you add schema features, you have to update both tests to
cover them.  Neglecting to add any tests is obvious in review.
Forgetting one of two not so much.

  (And, due to the above,
 only realizes this burden when someone else tries to make-check his/her
 code outside of the tree...)

Anyone posting patches without running make check gets exactly what he
asked for: embarrassment :)

Re: [Qemu-devel] [PATCH RESEND] RCU implementation for Qemu. Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system.

2013-08-20 Thread Markus Armbruster

Eric's advice in Message-ID: 5212712c.1080...@redhat.com applies:

Your subject line is atrociously long.  Please put a blank line between
the summary (ca. 60 characters or less) and the rest of your commit
message.  'git shortlog -30' will give you a hint on typical summary naming.

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-20 Thread Fam Zheng

On Tue, 08/20 07:51, Alex Bligh wrote:
 
 On 20 Aug 2013, at 02:42, Fam Zheng wrote:
 
  OK, thanks for explaination. That sounds a valid use case for
  streamOptimized. However I am afraid QEMU and its users benefit not much
  from this feature anyway, because it's moving a VM away to VMware, :)
  that might be the reason it's not there yet, and I don't know about any
  plan to do it in the near future.
 
 Well, given it is an open source project, the more interoperability
 the better. Even if it just means users need not worry about lock
 in to faster hypervisors ... Being more serious, qemu-img is
 part of the project too.
 
  But if someone sends patches for this, I think it is possible to get in.
 
 I guessed send code might be the answer :-)
 
 What I'm not sure of is whether the streaming format has to be written
 sequentially from as opposed to random writes. I believe the way
 qemu-img convert works, one can't guarantee the writes are
 sequential.
 
The order of sectors doesn't matter, but granularity should be aligned
to, as the data is compressed cluster by cluster. And no overwrite, of
course. The challenge may be that header comes at the end of file
(well, called footer), which is not decided at create time.

Thanks,
Fam

Re: [Qemu-devel] [PATCH v2] monitor: print the invalid char in error message

2013-08-20 Thread Markus Armbruster

Fam Zheng f...@redhat.com writes:

 It's more friendly to print which char is invalid to user, especially
 when user tries to input a float value and expect the monitor to round
 it to int. Since we don't round float number when we look for a integer,
 telling which char is invalid is less confusing.

 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  monitor.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git a/monitor.c b/monitor.c
 index 5dc0aa9..da9c9a2 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -3171,9 +3171,13 @@ static const MonitorDef monitor_defs[] = {
  { NULL },
  };
  
 -static void expr_error(Monitor *mon, const char *msg)
 +static void expr_error(Monitor *mon, const char *fmt, ...)
  {
 -monitor_printf(mon, %s\n, msg);
 +va_list ap;
 +va_start(ap, fmt);
 +monitor_vprintf(mon, fmt, ap);
 +monitor_printf(mon, \n);
 +va_end(ap);
  siglongjmp(expr_env, 1);
  }
  
 @@ -3291,7 +3295,7 @@ static int64_t expr_unary(Monitor *mon)
  expr_error(mon, number too large);
  }
  if (pch == p) {
 -expr_error(mon, invalid char in expression);
 +expr_error(mon, invalid char '%c' in expression, *p);
  }
  pch = p;
  while (qemu_isspace(*pch))

The user is still left guessing *where* the error is.  But it's an
incremental improvement.

Reviewed-by: Markus Armbruster arm...@redhat.com

Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-20 Thread Wenchao Xia


于 2013-8-20 14:48, Alex Bligh 写道:


  for (bh = ctx-first_bh; bh; bh = bh-next) {
  if (!bh-deleted  bh-scheduled) {
  if (bh-idle) {
  /* idle bottom halves will be polled at least
   * every 10ms */
-*timeout = 10;
+*timeout = qemu_soonest_timeout(*timeout, 10);

glib will not set *timeout to a meaningful value before calling
aio_ctx_prepare(), right? If so, *timeout = 10 should be used.



I don't think that's correct. Each _prepare routine is called
and has the abilitiy to alter the timeout but need not
set it at all. Indeed it should not set it as there may already
be a shorter timeout there.

Here's the code before I touched it:

aio_ctx_prepare(GSource *source, gint*timeout)
{
 AioContext *ctx = (AioContext *) source;
 QEMUBH *bh;

 for (bh = ctx-first_bh; bh; bh = bh-next) {
 if (!bh-deleted  bh-scheduled) {
 if (bh-idle) {
 /* idle bottom halves will be polled at least
  * every 10ms */
 *timeout = 10;
 } else {
 /* non-idle bottom halves will be executed
  * immediately */
 *timeout = 0;
 return true;
 }
 }
 }

 return false;
}

You'll note that what this does is:
a) if there are no bottom halves, leaves *timeout as is
b) if there is a non-idle bottom half, set timeout to 0
c) else set timeout to 10 if there are only idle bottom
halves.

Arguably (c) is a bug if timeout was shorter than 10
on entry but the whole of idle bhs are a bit of a bodge.
This is fixed by my series.

If you are asking WHERE it gets set to -1, I don't claim
to be a glib expert but I believe it's the line
   gint source_timeout = -1
around line 2816 in glib/gmain.c


  Thanks for the explanation. It seems *timeout is always set
to -1 before calling GSource's prepare(), so
*timeout = qemu_soonest_timeout(*timeout, 10);
will always get *timeout = 10, so this call can be saved.
--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [Bug?] qemu-1.6.0 python traceback in GEN qmp-commands.h

2013-08-20 Thread Erik Rull

 On August 19, 2013 at 6:15 PM Erik Rull erik.r...@rdsoftware.de wrote:


 Luiz Capitulino wrote:
  On Fri, 16 Aug 2013 14:21:50 +0100
  Peter Maydell peter.mayd...@linaro.org wrote:
 
  On 16 August 2013 08:59, Erik Rull erik.r...@rdsoftware.de wrote:
  Hi all,
 
  when using the released qemu-1.6.0.tar.bz2, I get the following error
  message:
     File /home/erik/qemu-1.6.0/scripts/qapi.py, line 164
       except QAPISchemaError as e:
                               ^
  SyntaxError: invalid syntax
  make: *** [qmp-commands.h] Error 1
 
  My guess is that your python is older than 2.6; I think
  the except Foo as e syntax is new in 2.6. We probably
  missed this because most people use a newer Python than
  2.6, but configure's check only requires 2.4 or better.
 
  We should probably update the scripts to not use overly
  new features (or alternatively decide that 2.6 is our new
  minimum -- what do RHEL5 and our other oldest-supported
  distros ship?)
 
  For this specific case I think it needs to change to
        except QAPISchemaError, e:
 
  Erik, can you try that and post a patch? Would be interesting
  to know if this is the only problem with older python.
 

 Yes, I will try that. I never really tried to send patches to this list...
 My python version is 2.4 - as it was assumed already.

 Best regards,

 Erik

This fixes it - it compiles successfully, but my guest no longer boots up
completely!
Windows XP gets a bluescreen and reboots in an infinite loop. Strange is: I was
requested to put some efi* files now on my target system for handling the
network cards (qemu complains at startup via stderr when I don't have them
available on my target system). But why? Where can I select to use the pxe*
files? There seems to be no possibility to select them via ./configure or as
qemu command line option. Maybe this is related to the bluescreen? 1.2.0 was
working properly.

Best regards,

Erik

[Qemu-devel] [PATCH 1/2] tests: Fix schema parser test for in-tree build

2013-08-20 Thread armbru

From: Markus Armbruster arm...@redhat.com

Commit 4f193e3 added the test, but screwed up in-tree builds
(SRCDIR=.): the tests's output overwrites the expected output, and is
thus compared to itself.

Reported-by: Laszlo Ersek ler...@redhat.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 tests/.gitignore | 1 +
 tests/Makefile   | 8 
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index fb05c2a..d9c2ef4 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -19,3 +19,4 @@ test-thread-pool
 test-x86-cpuid
 test-xbzrle
 *-test
+qapi-schema/*.test.*
diff --git a/tests/Makefile b/tests/Makefile
index d044908..ad98439 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -247,10 +247,10 @@ check-tests/test-qapi.py: tests/test-qapi.py
 
 .PHONY: $(patsubst %, check-%, $(check-qapi-schema-y))
 $(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: 
$(SRC_PATH)/%.json
-   $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) 
$(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.out 2$*.err; echo $$? 
$*.exit,   TEST  $*.out)
-   @diff -q $(SRC_PATH)/$*.out $*.out
-   @diff -q $(SRC_PATH)/$*.err $*.err
-   @diff -q $(SRC_PATH)/$*.exit $*.exit
+   $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) 
$(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.test.out 2$*.test.err; echo 
$$? $*.test.exit,   TEST  $*.out)
+   @diff -q $(SRC_PATH)/$*.out $*.test.out
+   @diff -q $(SRC_PATH)/$*.err $*.test.err
+   @diff -q $(SRC_PATH)/$*.exit $*.test.exit
 
 # Consolidated targets
 
-- 
1.8.1.4

[Qemu-devel] [PATCH 0/2] tests: Fixes for in-tree build

2013-08-20 Thread armbru

From: Markus Armbruster arm...@redhat.com

Markus Armbruster (2):
  tests: Fix schema parser test for in-tree build
  tests: Update .gitignore for test-int128 and test-bitops

 tests/.gitignore | 3 +++
 tests/Makefile   | 8 
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
1.8.1.4

[Qemu-devel] [PATCH 2/2] tests: Update .gitignore for test-int128 and test-bitops

2013-08-20 Thread armbru

From: Markus Armbruster arm...@redhat.com

Forgotten in commit 6046c62 and 3464700.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 tests/.gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/.gitignore b/tests/.gitignore
index d9c2ef4..9ac044d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -5,8 +5,10 @@ check-qjson
 check-qlist
 check-qstring
 test-aio
+test-bitops
 test-cutils
 test-hbitmap
+test-int128
 test-iov
 test-mul64
 test-qapi-types.[ch]
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH 7/7] switch raw block driver from raw.o to raw_bsd.o

2013-08-20 Thread Kevin Wolf

Am 18.08.2013 um 16:29 hat Paolo Bonzini geschrieben:
 Il 16/08/2013 16:15, Laszlo Ersek ha scritto:
  +static int raw_reopen_prepare(BDRVReopenState *reopen_state,
  +  BlockReopenQueue *queue, Error **errp)
   {
  -return bdrv_reopen_prepare(bs-file);
  +BDRVReopenState tmp = *reopen_state;
  +
  +tmp.bs = tmp.bs-file;
  +return bdrv_reopen_prepare(tmp, queue, errp);
   }
 
 This should just return zero, my fault.

Which is because bdrv_reopen_queue() already queues bs-file for reopen.
The simple return 0; implementation is shared by all other format drivers
that support reopening images.

Kevin

Re: [Qemu-devel] [PATCH 4/7] raw_bsd: introduce special members

2013-08-20 Thread Kevin Wolf

Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben:
 On 08/05/13 15:03, Paolo Bonzini wrote:
 
  [...]
 
  3) These members are special
 
  .format_name   is the string raw
  .bdrv_open raw_open should set bs-sg to bs-file-sg and return 0
  .bdrv_closeraw_close should do nothing
  .bdrv_proberaw_probe should just return 1.
 
 Signed-off-by: Laszlo Ersek ler...@redhat.com
 ---
  block/raw_bsd.c |   20 
  1 files changed, 20 insertions(+), 0 deletions(-)
 
 diff --git a/block/raw_bsd.c b/block/raw_bsd.c
 index 5bcbe71..86e018d 100644
 --- a/block/raw_bsd.c
 +++ b/block/raw_bsd.c
 @@ -112,3 +112,23 @@ static TYPE raw_create(void)
  {
  return bdrv_create_file();
  }
 +
 +static const char *raw_format_name(void)
 +{
 +return raw;
 +}
 +
 +static int raw_open(BlockDriverState *bs)
 +{
 +bs-sg = bs-file-sg;

I know that Paolo explicitly made this requirement, but I think it is
ugly. We should instead fix bdrv_is_sg() to look at bs-file if it
exists, like many other functions already do, and change the two readers
of bs-sg in block.c to use bdrv_is_sg().

 +return 0;
 +}
 +
 +static void raw_close(void)
 +{
 +}
 +
 +static int raw_probe(void)
 +{
 +return 1;
 +}

Maybe add a comment here like smallest possible positive score so that
raw is used if and only if no other block driver works.

Kevin

Re: [Qemu-devel] [PATCH 0/3] target-ppc: Add support for dumping guest memory using qemu gdb server

2013-08-20 Thread Aneesh Kumar K.V

Andreas Färber afaer...@suse.de writes:

 Hi Aneesh,

 Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 This patch series implement support for dumping guest memory using qemu gdb 
 server.

 I had a quick look through but will leave in-depth review to Alex or
 Anthony.

 Do you plan to implement dumping guest memory via QMP, too?

Are you looking at memsave command ? That would fail before. This patch
series should fix that too. For memsave to fail we need the below patch

diff --git a/cpus.c b/cpus.c
index 0f65e76..3340150 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1309,7 +1309,10 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 l = sizeof(buf);
 if (l  size)
 l = size;
-cpu_memory_rw_debug(cpu, addr, buf, l, 0);
+if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) {
+error_set(errp, QERR_IO_ERROR);
+goto exit;
+}
 if (fwrite(buf, 1, l, f) != l) {
 error_set(errp, QERR_IO_ERROR);
 goto exit;

Re: [Qemu-devel] [PATCH 0/7] introduce BSD-licensed block driver for raw

2013-08-20 Thread Kevin Wolf

Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben:
 Paolo asked me to write such a driver based on his textual specification
 alone. The first patch captures his email in full, the rest re-quotes
 parts that are being implemented.
 
 The tree compiles at each patch. The series passes make check-block.
 
 block/raw.c is not removed because I wanted to keep it out of my
 series and out of my brain.
 
 Disclaimer: I couldn't care less if the raw block driver was public
 domain or AGPLv3+, as long as it qualifies as free software. I'm only
 trying to do what Paolo asked of me.
 
 Laszlo Ersek (7):
   add skeleton for BSD licensed raw BlockDriver
   raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev()
   raw_bsd: add raw_create()
   raw_bsd: introduce special members
   raw_bsd: add raw_create_options
   raw_bsd: register bdrv_raw
   switch raw block driver from raw.o to raw_bsd.o
 
  block/Makefile.objs |2 +-
  block/raw_bsd.c |  186 
 +++
  2 files changed, 187 insertions(+), 1 deletions(-)
  create mode 100644 block/raw_bsd.c

Reviewed if the individual added functions make sense, whether all
necessary function from struct BlockDriver are implemented, and which
fields from BlockDriverState need special handling (it's only bs-sg,
and we should probably get rid of that requirement)

Looks good in general, but please CC Stefan and me for v2 (like for all
block patches).

Kevin

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 03:36, Alexey Kardashevskiy ha scritto:
 Hm. Here we might have a problem like this is we decide to migrate from
 QEMU with this patch running on modern kernel to QEMU without this patch
 running on old kernel - for this we might want to be able to disable
 multi-tce via machine options on newer kernels. Do we care enough to add
 such a parameter or we just disable migration and that's it?

Upstream doesn't support migration to older QEMU.

 This SandyBridge,enforce - what if the destination host running on old
 kernel was run without this option - will the migration fail?

The destination machine will not even start.

 But in this case, you do not need this because the hypercall works if
 emulated by QEMU.  I like Alex's solution of making it universally
 available in the dtb.
 
 The solution would be good if we did not already have H_PUT_TCE accelerated
 for emulated devices in the host kernel but we do have it.

The question is also whether you consider pSeries support complete
enough to be production ready---and until you have versioned machine
types I would say you don't.

If you still consider it somewhat experimental, I would do as Alex said:
make newer QEMU on older KVM as slower, and that's it.

Paolo

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Alexey Kardashevskiy

On 08/20/2013 06:39 PM, Paolo Bonzini wrote:
 Il 20/08/2013 03:36, Alexey Kardashevskiy ha scritto:
 Hm. Here we might have a problem like this is we decide to migrate from
 QEMU with this patch running on modern kernel to QEMU without this patch
 running on old kernel - for this we might want to be able to disable
 multi-tce via machine options on newer kernels. Do we care enough to add
 such a parameter or we just disable migration and that's it?
 
 Upstream doesn't support migration to older QEMU.


Hm. That makes things simpler. Then I do not understand why we need
migration protocol versions as QEMU version in stronger version for
migration. Ah, offtopic.

 This SandyBridge,enforce - what if the destination host running on old
 kernel was run without this option - will the migration fail?
 
 The destination machine will not even start.

 But in this case, you do not need this because the hypercall works if
 emulated by QEMU.  I like Alex's solution of making it universally
 available in the dtb.

 The solution would be good if we did not already have H_PUT_TCE accelerated
 for emulated devices in the host kernel but we do have it.
 
 The question is also whether you consider pSeries support complete
 enough to be production ready---and until you have versioned machine
 types I would say you don't.
 
 If you still consider it somewhat experimental, I would do as Alex said:
 make newer QEMU on older KVM as slower, and that's it.


Sorry if I miss anything, but is not it what the patch already does? :)
If so, I'll repost this patch + traces rework tomorrow or so.


-- 
Alexey

Re: [Qemu-devel] [PATCH v2 6/7] vl: Set current_machine early

2013-08-20 Thread Markus Armbruster

Andreas Färber afaer...@suse.de writes:

 Am 19.08.2013 11:35, schrieb Markus Armbruster:
 Andreas Färber afaer...@suse.de writes:
 
 Am 16.08.2013 15:18, schrieb arm...@redhat.com:
 From: Markus Armbruster arm...@redhat.com

 I'd like to access QEMUMachine from a QEMUMachine init() method, which
 is currently not possible.  Instead of passing it as an argument, I
 simply set current_machine earlier.

 We had such a patch for CPU hot-add and decided against doing this.
 Currently current_machine != signals that it has been initialized. And
 
 Does any code actually depend on this undocumented condition?  I found
 none.

 I didn't audit. Currently the users are limited to vl.c itself,
 device-hotplug.c for block_default_type and qmp.c for hot_add_cpu. pc.c
 feels odd in that mix.

 [...]
 Can't you pass either QEMUMachine or the specific fields needed from PC
 code to those SMBIOS functions? You did add a bool argument.
 
 Can't see how to do that without passing the machine to QEMUMachine
 method init(), which involves touching all boards.  I doubt that's a
 good idea, but if you insist, I can do it.

 Isn't that exactly what QEMUMachineArgs was meant to address? :)
 Had a look at your don't-explode patches and they looked good.

I could give it a try, but to be honest, I'm reluctant to base new work
on a series that has been ignored by all committers for more than two
months.

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 On 08/20/2013 06:39 PM, Paolo Bonzini wrote:
 Il 20/08/2013 03:36, Alexey Kardashevskiy ha scritto:
 Hm. Here we might have a problem like this is we decide to migrate from
 QEMU with this patch running on modern kernel to QEMU without this patch
 running on old kernel - for this we might want to be able to disable
 multi-tce via machine options on newer kernels. Do we care enough to add
 such a parameter or we just disable migration and that's it?

 Upstream doesn't support migration to older QEMU.
 
 Hm. That makes things simpler. Then I do not understand why we need
 migration protocol versions as QEMU version in stronger version for
 migration. Ah, offtopic.

Because you have multiple downstreams that can backport arbitrary crap
on top of the same base QEMU version.

In RHEL for example we do some effort to force usage of older version
IDs (or to forcibly leave out subsections) depending on the machine type
you're using.

So even though upstream is limited in the kind of migrations we support,
we provide the infrastructure for downstreams to give stronger guarantees.

 This SandyBridge,enforce - what if the destination host running on old
 kernel was run without this option - will the migration fail?

 The destination machine will not even start.

 But in this case, you do not need this because the hypercall works if
 emulated by QEMU.  I like Alex's solution of making it universally
 available in the dtb.

 The solution would be good if we did not already have H_PUT_TCE accelerated
 for emulated devices in the host kernel but we do have it.

 The question is also whether you consider pSeries support complete
 enough to be production ready---and until you have versioned machine
 types I would say you don't.

 If you still consider it somewhat experimental, I would do as Alex said:
 make newer QEMU on older KVM as slower, and that's it.
 
 Sorry if I miss anything, but is not it what the patch already does? :)

No, you need to expose multitce unconditionally in the device tree.

Paolo

 If so, I'll repost this patch + traces rework tomorrow or so.

Re: [Qemu-devel] [Xen-devel] [PATCH] Qemu-xen: HVM S3 bugfix

2013-08-20 Thread George Dunlap

On Mon, Aug 19, 2013 at 3:47 PM, Liu, Jinsong jinsong@intel.com wrote:
 Ping?

I know Ian and Stefano are doing something at ARM this week -- Anthony
might be able to give a review, at least...
 -George

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Benjamin Herrenschmidt

On Tue, 2013-08-20 at 11:09 +0200, Paolo Bonzini wrote:
  Sorry if I miss anything, but is not it what the patch already
 does? :)
 
 No, you need to expose multitce unconditionally in the device tree.

If I'm not mistaken the multitce kernel side patches are still not
upstream so I disagree. Exposing it will make Linux use it which means
that anything running on 3.10 or 3.11 will become very slow.

So no, multitce should not be exposed if KVM doesn't support it.

Cheers,
Ben.

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 11:13, Benjamin Herrenschmidt ha scritto:
 On Tue, 2013-08-20 at 11:09 +0200, Paolo Bonzini wrote:
 Sorry if I miss anything, but is not it what the patch already
 does? :)

 No, you need to expose multitce unconditionally in the device tree.
 
 If I'm not mistaken the multitce kernel side patches are still not
 upstream so I disagree. Exposing it will make Linux use it which means
 that anything running on 3.10 or 3.11 will become very slow.
 
 So no, multitce should not be exposed if KVM doesn't support it.

Then you have to do it right and:

- provide the infrastructure to enable/disable it from the command line
(which will be enough design effort alone);

- add pseries-1.6 as a synonym of pseries in 1.6.1;

- add pseries-1.7 a synonym of pseries in master;

- add a pseries-1.6 machine type in master that always disables it.

Paolo

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Benjamin Herrenschmidt

On Tue, 2013-08-20 at 11:15 +0200, Paolo Bonzini wrote:
 
 - provide the infrastructure to enable/disable it from the command
 line
 (which will be enough design effort alone);

sight ... why do things simply when we can come up with a cathedral
instead ?

 - add pseries-1.6 as a synonym of pseries in 1.6.1;
 
 - add pseries-1.7 a synonym of pseries in master;
 
 - add a pseries-1.6 machine type in master that always disables it.

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 11:20, Benjamin Herrenschmidt ha scritto:
 On Tue, 2013-08-20 at 11:15 +0200, Paolo Bonzini wrote:

 - provide the infrastructure to enable/disable it from the command
 line
 (which will be enough design effort alone);
 
 sight ... why do things simply when we can come up with a cathedral
 instead ?

Uhm... I thought Alex and I were the one who proposed the simple things.
 You _will_ need to do the complicated stuff sooner or later, but it's
probably early enough now to ignore it.

And I'm not saying this out of love for big cathedrals, but out of
lessons we learned the hard way for x86 (where we haven't gotten
everything right yet, either).

Paolo

 - add pseries-1.6 as a synonym of pseries in 1.6.1;

 - add pseries-1.7 a synonym of pseries in master;

 - add a pseries-1.6 machine type in master that always disables it.

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Benjamin Herrenschmidt

On Tue, 2013-08-20 at 11:22 +0200, Paolo Bonzini wrote:
 
 Uhm... I thought Alex and I were the one who proposed the simple things.
  You _will_ need to do the complicated stuff sooner or later, but it's
 probably early enough now to ignore it.
 
 And I'm not saying this out of love for big cathedrals, but out of
 lessons we learned the hard way for x86 (where we haven't gotten
 everything right yet, either).

I suppose if RH is going to deploy 3.10 and we aren't going to backport
the multitce patches then there *might* be a case for supporting that
combo specifically... but it's going to be that bad every time we add
a new feature with a kernel counter part or start adding the gazillion
little bits of PAPR that we are still missing ?

Ben.

Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-20 Thread Alex Bligh


On 20 Aug 2013, at 08:19, Wenchao Xia wrote:

  Thanks for the explanation. It seems *timeout is always set
 to -1 before calling GSource's prepare(), so
 *timeout = qemu_soonest_timeout(*timeout, 10);
 will always get *timeout = 10, so this call can be saved.

I believe that's incorrect too. glib *currently* calls
the API that way, but there's nothing to say a future
or past glub couldn't call each g_source with the same
timeout pointer, with each g_source adjusting the timeout
downwards. Or (for instance) call it with 0 for a
non-blocking prepare. Therefore the implemented way
is safer (only reducing the timeout).

-- 
Alex Bligh

Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes

2013-08-20 Thread Wenchao Xia


于 2013-8-16 16:12, Wenchao Xia 写道:

于 2013-8-16 15:15, Wenchao Xia 写道:

于 2013-8-16 0:32, Michael Roth 写道:

Quoting Michael Roth (2013-08-15 10:23:20)

Quoting Wenchao Xia (2013-08-13 03:44:39)

于 2013-8-13 1:01, Michael Roth 写道:

Quoting Paolo Bonzini (2013-08-12 02:30:28)

1) rename AioContext to AioSource.
This is my major purpose, which declare it is not a context
concept,
and GMainContext is the entity represent the thread's activity.


Note that the nested event loops in QEMU are _very_ different from
glib nested event loops.  In QEMU, nested event loops only run block
layer events.  In glib, they run all events.  That's why you need
AioContext.


Would it be possible to use glib for our nested loops as well by just
setting a higher priority for the AioContext GSource?

Stefan and I were considering how we could make use of his drop
ioflush patches to use a common mechanism to register fd events, but
still allow us to distinguish between AioContext and non-AioContext
for nested loops. I was originally thinking of using prepare()
functions
to filter out non-AioContext events, but that requires we implement
on GSource's with that in mind, and non make use of pre-baked ones
like GIOChannel's, and bakes block stuff into every event source
implementation.


Besides priority, also g_source_set_can_recurse() can help.
With a deeper think, I found a harder problem:
g_main_context_acquire() and g_main_context_release(). In release,
pending BH/IO call back need to be cleared, but this action can't
be triggered automatically when user call g_main_context_release().


I don't understand why this is a requirement, gmctx_acquire/release
ensure
that only one thread attempts to iterate the main loop at a time. this
isn't currently an issue in qemu, and if we re-implemented
qemu_aio_wait()
to use the same glib interfaces, the tracking of in-flight requests
would
be moved to the block layer via Stefan's 'drop io_flush' patches, which
moves that block-specific logic out of the main loop/AioContext GSource
by design. Are there other areas where you see this as a problem?


I think I understand better what you're referring to, you mean that
if qemu_aio_wait was called, and was implementated to essentially call
g_main_context_iterate(), that after, say, 1 iteration, the
iothread/dataplane thread might acquire the main loop and dispatch
block/non-block events between qemu_aio_wait() returned? The simple
approach would be to have qemu_aio_wait() call
g_main_context_acquire/release
at the start end of the function, which would ensure that this never
happens.


   qemu_aio_wait() is relative simple to resolve by
g_main_context_acquire(), but also shows additional code needed
for a thread switch:
(guess following is the plan to implement qemu_aio_wait())
qemu_aio_wait(GMainContext *ctx)
{
 return g_main_context(ctx, PRI_BLOCK);
}
at caller:
{
 while (qemu_aio_wait(ctx) {
 ;
 }
 g_main_context_release(ctx);
}
   The above code shows that, in *ctx switch operation, there is
more than glib's own event loop API envolved, qemu_aio_wait(). So
it referenced to a question: what data structure
should be used to represent context concept and control the thread
switching behavior?  It is better to have a clear layer, GMainContext or
GlibQContext, instead of GMainContext plus custom function. The caller
reference to at least two: nested user block layer, flat user above
block layer.
   In my opinion, this problem is brought by Gsource AioContext, Take
the call path of bdrv_aio_readv(*bdrv_cb) on raw linux file as
an example, there are aync following operations involved for AioContext:
1 the bdrv_cb() will be executed in bdrv_co_em_bh().
2 bdrv_co_io_em_complete() will be executed in event_notfier_ready().
3 aio_worker() will be executed in worker_thread().
   Operation 2 and 3 are tracked by block layer's queue after Stefan's
dropping io_flush() series.
   Now if we stick to GMainContext to represent context concept,
then when thread B want to aquire GMainContext used by thread A,
all works setupped by A should be finished before B aquire it,
otherwise B will execute some function supposed to work in A. The
work refers to the three kind of aync functions above.
   For this issue, we can solve it by different means:
1 the event loop API doesn't guarentee work setupped by thread A
will always be finished in A. This put a limitation to caller to
consider thread switching. I talked on IRC with Stefan, and thinks
it is possible for the nested user block layer, but I still want to
avoid this to the flat user above block layer.
2 ask caller to finish all pending operations.
   2.1 glib GMainContext API plus custom API such as aio_wait(). This is
bad that detail under GMainContext is exposed to caller. Since
operation 1 mentioned before is not tracked yet, to make sure bdrv_cb()
is called in thread setupped it, 1 need to be added in the track
queue, or in the call chain of aio_wait(), check the queue plus

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 11:26, Benjamin Herrenschmidt ha scritto:
 On Tue, 2013-08-20 at 11:22 +0200, Paolo Bonzini wrote:

 Uhm... I thought Alex and I were the one who proposed the simple things.
  You _will_ need to do the complicated stuff sooner or later, but it's
 probably early enough now to ignore it.

 And I'm not saying this out of love for big cathedrals, but out of
 lessons we learned the hard way for x86 (where we haven't gotten
 everything right yet, either).
 
 I suppose if RH is going to deploy 3.10 and we aren't going to backport
 the multitce patches then there *might* be a case for supporting that
 combo specifically... but it's going to be that bad every time we add
 a new feature with a kernel counter part or start adding the gazillion
 little bits of PAPR that we are still missing ?

Yes. :(

Unless you consider pSeries KVM not mature enough to provide a guest ABI
(basically supporting live migration only between identical kernels and
QEMUs).  It would make some sense, for example (mutatis mutandis) Red
Hat has a kernel ABI for the regular RHEL kernel, but not for the
real-time RHEL kernel.

Paolo

Re: [Qemu-devel] minimal linux distribution for qemu

2013-08-20 Thread Rob Landley


On 08/18/2013 08:57:08 AM, Herbei Dacian wrote:


good to know.
I was working back in 2005-2006 with a company that had a 4MB kernel.
At that time I was too inexperienced to work at that level but I  
thought now I could reproduce their work with some help.
Anyhow for the moment I'll go for 256 MB of ram board just so that I  
don't worry too much about things that are not yet relevant for me.

But thanks again for the warning.
But since you helped me soo much I have another question.
Is it fisible to change the emulator so that I may visualize the  
following aspects:

_ address of the currently executed instruction from the guest system
_ if this instruction is a form of jump like call return conditional  
jump.

_ the address or range of addresses read by this instruction
_ the address or range of addresses written by this instruction


If you feed qemu the -s option it'll open a network port you can  
connect to to provide the gdbserver protocol (gdb's target remote  
command attaches to this). For system emulation it acts like a jtag  
attached to the emulated hardware, letting you see registers and such.


I read some things about the emulator and if I understood it  
correctly the emulator
breaks the instructions of the gurest platform in micro ops which are  
then executed

on the host operation system.


Not really, no.

QEMU translates large blocks of code (used to be pages, now it's  
variable sized chunks depending on where the return instruction is) and  
keeps the translated versions cached (sort of like a java JIT). The  
main QEMU loop then calls the translated functions which execute until  
they return or get interrupted by signals (simulating things like timer  
IRQ). This is why QEMU is so fast, the actual translation overhead is  
amortized by the resulting native code being run lots of times, a  
function or loop gets translated once and then runs as native code.


This means that the address of the currently executing instruction  
isn't really something qemu naturally tracks, because although there  
_is_ a copy of the untranslated code page, it's not what we're running.  
The gdbserver code tries to do so artifically, but it's slow and  
awkward and not always perfect.


Self-modifying code is actually a horrible thing to do to qemu, from a  
performance perspective. Every time the emulated code page is modified,  
the cached copy of the translated code is discarded and the entire page  
gets retranslated. This means that in Aboriginal Linux, the shell  
scripts ./configure runs sped up 20% when I replaced my dynamically  
linked busybox with a statically linked one, due to the extra  
translations caused by the relocation fixups.


Rob

Re: [Qemu-devel] [PATCH v2] Fix query-migrate documentation in qmp-commands.hx

2013-08-20 Thread Orit Wasserman

On 08/19/2013 09:41 PM, Luiz Capitulino wrote:
 On Mon, 12 Aug 2013 10:19:52 -0400
 Luiz Capitulino lcapitul...@redhat.com wrote:
 
 On Thu,  8 Aug 2013 20:05:48 +0300
 Orit Wasserman owass...@redhat.com wrote:

 ram is present also when migration completes.
 expected-downtime, total-time and downtime are no longer part of ram data.

 Signed-off-by: Orit Wasserman owass...@redhat.com

 Applied to the qmp branch, thanks.
 
 This one missed the deadline for 1.6. I'm my updating my tree for 1.7,
 but this patch conflicts with 8f3067, which repeated for the setup-time
 key the mistake this patch is fixing for other keys.
 
 We need a respin for this patch, which should also move setup-time
 to the main dict. Orit, I can do it myself if you want.
 

That will be great,
Thanks,
Orit

 ---
  qmp-commands.hx | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index 2e59b0d..a22a841 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -2626,8 +2626,8 @@ The main json-object contains the following:
  - expected-downtime: only present while migration is active
  total amount in ms for downtime that was calculated on
  the last bitmap round (json-int)
 -- ram: only present if status is active, it is a json-object with the
 -  following RAM information:
 +- ram: only present if status is active or complete, it is a
 + json-object with the following RAM information:
   - transferred: amount transferred in bytes (json-int)
   - remaining: amount remaining to transfer in bytes (json-int)
   - total: total amount of memory in bytes (json-int)
 @@ -2669,12 +2669,12 @@ Examples:
  - { execute: query-migrate }
  - { return: {
  status: completed,
 +total-time:12345,
 +downtime:12345,
  ram:{
transferred:123,
remaining:123,
total:246,
 -  total-time:12345,
 -  downtime:12345,
duplicate:123,
normal:123,
normal-bytes:123456
 @@ -2693,12 +2693,12 @@ Examples:
  - {
return:{
   status:active,
 + total-time:12345,
 + expected-downtime:12345,
   ram:{
  transferred:123,
  remaining:123,
  total:246,
 -total-time:12345,
 -expected-downtime:12345,
  duplicate:123,
  normal:123,
  normal-bytes:123456
 @@ -2712,12 +2712,12 @@ Examples:
  - {
return:{
   status:active,
 + total-time:12345,
 + expected-downtime:12345,
   ram:{
  total:1057024,
  remaining:1053304,
  transferred:3720,
 -total-time:12345,
 -expected-downtime:12345,
  duplicate:123,
  normal:123,
  normal-bytes:123456
 @@ -2736,13 +2736,13 @@ Examples:
  - {
return:{
   status:active,
 + total-time:12345,
 + expected-downtime:12345,
   capabilities : [ { capability: xbzrle, state : true } ],
   ram:{
  total:1057024,
  remaining:1053304,
  transferred:3720,
 -total-time:12345,
 -expected-downtime:12345,
  duplicate:10,
  normal:,
  normal-bytes:3412992

Re: [Qemu-devel] [PATCH 0/2] tests: Fixes for in-tree build

2013-08-20 Thread Andreas Färber

Am 20.08.2013 09:47, schrieb arm...@redhat.com:
 From: Markus Armbruster arm...@redhat.com
 
 Markus Armbruster (2):
   tests: Fix schema parser test for in-tree build
   tests: Update .gitignore for test-int128 and test-bitops

Series

Reviewed-by: Andreas Färber afaer...@suse.de

but you should probably add Cc: qemu-sta...@nongnu.org to 1/2 at least.

Andreas

 
  tests/.gitignore | 3 +++
  tests/Makefile   | 8 
  2 files changed, 7 insertions(+), 4 deletions(-)
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PATCH -V2 3/4] target-ppc: Fix page table lookup with kvm enabled

2013-08-20 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

With kvm enabled, we store the hash page table information in the hypervisor.
Use ioctl to read the htab contents. Without this we get the below error when
trying to read the guest address

 (gdb) x/10 do_fork
 0xc0098660 do_fork:   Cannot access memory at address 
0xc0098660
 (gdb)

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 target-ppc/kvm.c| 45 +
 target-ppc/kvm_ppc.h|  9 -
 target-ppc/mmu-hash64.c | 25 -
 3 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index a629447..b590808 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1885,3 +1885,48 @@ int kvm_arch_on_sigbus(int code, void *addr)
 void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
+
+int kvmppc_hash64_load_hpte(PowerPCCPU *cpu, uint64_t index,
+target_ulong *hpte0, target_ulong *hpte1)
+{
+int htab_fd;
+struct kvm_get_htab_fd ghf;
+struct kvm_get_htab_buf {
+struct kvm_get_htab_header header;
+/*
+ * Older kernel required one extra byte.
+ */
+unsigned long hpte[3];
+} hpte_buf;
+
+*hpte0 = 0;
+*hpte1 = 0;
+if (!cap_htab_fd) {
+return 0;
+}
+/*
+ * At this point we are only interested in reading only bolted entries
+ */
+ghf.flags = KVM_GET_HTAB_BOLTED_ONLY;
+ghf.start_index = index;
+htab_fd = kvm_vm_ioctl(kvm_state, KVM_PPC_GET_HTAB_FD, ghf);
+if (htab_fd  0) {
+return htab_fd;
+}
+
+if (read(htab_fd, hpte_buf, sizeof(hpte_buf))  0) {
+goto out;
+}
+/*
+ * We only requested for one entry, So we should get only 1
+ * valid entry at the same index
+ */
+if (hpte_buf.header.n_valid != 1 || hpte_buf.header.index != index) {
+goto out;
+}
+*hpte0 = hpte_buf.hpte[0];
+*hpte1 = hpte_buf.hpte[1];
+out:
+close(htab_fd);
+return 0;
+}
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 4ae7bf2..e25373a 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -42,7 +42,8 @@ int kvmppc_get_htab_fd(bool write);
 int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
 int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
uint16_t n_valid, uint16_t n_invalid);
-
+int kvmppc_hash64_load_hpte(PowerPCCPU *cpu, uint64_t index,
+target_ulong *hpte0, target_ulong *hpte1);
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
@@ -181,6 +182,12 @@ static inline int kvmppc_load_htab_chunk(QEMUFile *f, int 
fd, uint32_t index,
 abort();
 }
 
+static inline int kvmppc_hash64_load_hpte(PowerPCCPU *cpu, uint64_t index,
+  target_ulong *hpte0,
+  target_ulong *hpte1)
+{
+abort();
+}
 #endif
 
 #ifndef CONFIG_KVM
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 67fc1b5..4d8120c 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -302,17 +302,26 @@ static int ppc_hash64_amr_prot(CPUPPCState *env, 
ppc_hash_pte64_t pte)
 return prot;
 }
 
-static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr pteg_off,
+static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr hash,
  bool secondary, target_ulong ptem,
  ppc_hash_pte64_t *pte)
 {
-hwaddr pte_offset = pteg_off;
+uint64_t index;
+hwaddr pte_offset;
 target_ulong pte0, pte1;
 int i;
 
+pte_offset = (hash * HASH_PTEG_SIZE_64)  env-htab_mask;;
+index = (hash * HPTES_PER_GROUP)  env-htab_mask;
+
 for (i = 0; i  HPTES_PER_GROUP; i++) {
-pte0 = ppc_hash64_load_hpte0(env, pte_offset);
-pte1 = ppc_hash64_load_hpte1(env, pte_offset);
+if (kvm_enabled()) {
+index += i;
+kvmppc_hash64_load_hpte(ppc_env_get_cpu(env), index, pte0, pte1);
+} else {
+pte0 = ppc_hash64_load_hpte0(env, pte_offset);
+pte1 = ppc_hash64_load_hpte1(env, pte_offset);
+}
 
 if ((pte0  HPTE64_V_VALID)
  (secondary == !!(pte0  HPTE64_V_SECONDARY))
@@ -332,7 +341,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
  ppc_slb_t *slb, target_ulong eaddr,
  ppc_hash_pte64_t *pte)
 {
-hwaddr pteg_off, pte_offset;
+hwaddr pte_offset;
 hwaddr hash;
 uint64_t vsid, epnshift, epnmask, epn, ptem;
 
@@ -367,8 +376,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
  vsid= TARGET_FMT_lx  ptem= TARGET_FMT_lx
  hash= TARGET_FMT_plx \n,
 env-htab_base, env-htab_mask, vsid, ptem,  hash);
-pteg_off = (hash * HASH_PTEG_SIZE_64)  env-htab_mask;

[Qemu-devel] [PATCH -V2 2/4] target-ppc: Use #define instead of opencoding SLB valid bit

2013-08-20 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Use SLB_ESID_V instead of (1  27) in the code

Reviewed-by: Andreas Färber afaer...@suse.de
Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 target-ppc/mmu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 5dd4e05..9c9132e 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -2061,7 +2061,7 @@ void helper_store_sr(CPUPPCState *env, target_ulong 
srnum, target_ulong value)
 /* ESID = srnum */
 rb |= ((uint32_t)srnum  0xf)  28;
 /* Set the valid bit */
-rb |= 1  27;
+rb |= SLB_ESID_V;
 /* Index = ESID */
 rb |= (uint32_t)srnum;
 
-- 
1.8.1.2

[Qemu-devel] [PATCH -V2 0/4] target-ppc: Add support for dumping guest memory using qemu gdb server

2013-08-20 Thread Aneesh Kumar K.V


This patch series implement support for dumping guest memory using qemu gdb 
server.

With this patch series we can now do

(gdb) x/4i htab_call_hpte_insert1
   0xc00470d8 .htab_call_hpte_insert1:bl  
0xc00470d8 .htab_call_hpte_insert1
   0xc00470dc .htab_call_hpte_insert1+4:  cmpdi   r3,0
   0xc00470e0 .htab_call_hpte_insert1+8:  bge 
0xc0047190 htab_pte_insert_ok
   0xc00470e4 .htab_call_hpte_insert1+12: cmpdi   r3,-2
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
.plpar_hcall_norets () at arch/powerpc/platforms/pseries/hvCall.S:119
119 HCALL_INST_POSTCALL_NORETS
(gdb) x/4i htab_call_hpte_insert1
   0xc00470d8 .htab_call_hpte_insert1:bl  
0xc005f8f0 pSeries_lpar_hpte_insert
   0xc00470dc .htab_call_hpte_insert1+4:  cmpdi   r3,0
   0xc00470e0 .htab_call_hpte_insert1+8:  bge 
0xc0047190 htab_pte_insert_ok
   0xc00470e4 .htab_call_hpte_insert1+12: cmpdi   r3,-2
(gdb)

NOTE: We still don't support inserting breakpoints.

Before Fix:
(qemu) memsave 0xc00470d8 10 memdump
Invalid parameter 'addr'
(qemu)

After fix:

(qemu)  memsave 0xc00470d8 10 memdump
(qemu)

[Qemu-devel] [PATCH -V2 1/4] target-ppc: Update slb array with correct index values.

2013-08-20 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Without this, a value of rb=0 and rs=0 results in replacing the 0th
index. This can be observed when using gdb remote debugging support.

(gdb) x/10i do_fork
   0xc0085330 do_fork:Cannot access memory at address 
0xc0085330
(gdb)

This is because when we do the slb sync via kvm_cpu_synchronize_state,
we overwrite the slb entry (0th entry) for 0xc0085330

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 target-ppc/kvm.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 30a870e..a629447 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
 /* Sync SLB */
 #ifdef TARGET_PPC64
 for (i = 0; i  64; i++) {
-ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
-   sregs.u.s.ppc64.slb[i].slbv);
+target_ulong rb = sregs.u.s.ppc64.slb[i].slbe;
+/*
+ * KVM_GET_SREGS doesn't retun slb entry with slot information
+ * same as index. So don't depend on the slot information in
+ * the returned value.
+ */
+rb = ~0xfff;
+/*
+ * use the array index as the slot
+ */
+rb |= i;
+ppc_store_slb(env, rb, sregs.u.s.ppc64.slb[i].slbv);
 }
 #endif
 
-- 
1.8.1.2

[Qemu-devel] [PATCH -V2 4/4] target-ppc: Check for error on address translation in memsave command

2013-08-20 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

When we translate the virtual address to physical check for error.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 cpus.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 0f65e76..658366d 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1309,7 +1309,10 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 l = sizeof(buf);
 if (l  size)
 l = size;
-cpu_memory_rw_debug(cpu, addr, buf, l, 0);
+if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) {
+error_set(errp, QERR_INVALID_PARAMETER, addr);
+goto exit;
+}
 if (fwrite(buf, 1, l, f) != l) {
 error_set(errp, QERR_IO_ERROR);
 goto exit;
-- 
1.8.1.2

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 09:08, Fam Zheng ha scritto:
 On Tue, 08/20 07:51, Alex Bligh wrote:

 On 20 Aug 2013, at 02:42, Fam Zheng wrote:

 OK, thanks for explaination. That sounds a valid use case for
 streamOptimized. However I am afraid QEMU and its users benefit not much
 from this feature anyway, because it's moving a VM away to VMware, :)
 that might be the reason it's not there yet, and I don't know about any
 plan to do it in the near future.

 Well, given it is an open source project, the more interoperability
 the better. Even if it just means users need not worry about lock
 in to faster hypervisors ... Being more serious, qemu-img is
 part of the project too.

 But if someone sends patches for this, I think it is possible to get in.

 I guessed send code might be the answer :-)

 What I'm not sure of is whether the streaming format has to be written
 sequentially from as opposed to random writes. I believe the way
 qemu-img convert works, one can't guarantee the writes are
 sequential.

 The order of sectors doesn't matter, but granularity should be aligned
 to, as the data is compressed cluster by cluster. And no overwrite, of
 course. The challenge may be that header comes at the end of file
 (well, called footer), which is not decided at create time.

It doesn't look too different from what qemu-img convert -c does,
except that you need to use the right -o incantation to specify the
format type.

Paolo

Re: [Qemu-devel] [RFC] Hardware acceleration for Windows and OS X?

2013-08-20 Thread Paolo Bonzini

Il 19/08/2013 22:17, Stefan Weil ha scritto:
 Hello,
 
 is anybody working on KVM or something equivalent for Windows and Mac OS X?
 
 I recently noticed that the QEMU based Android emulator uses KVM for Linux
 and HAXM, an Intel hypervisor, for Windows and Mac OS X.
 
 Would it be a good plan to merge HAXM support in latest QEMU
 (the Android QEMU is based on a rather old version)? It only
 supports Intel based hosts, is restricted to 32 bit guests (?)
 and free, but not open source.

Considering the level of openness in both Windows and Mac OS X, a
proprietary hypervisor would be the smallest of the problems for someone
using those OSes...

I guess patches are welcome---if they are of good quality, why not.
The community doesn't have that many kernel hackers for Windows and Mac
OS X.

Paolo

Re: [Qemu-devel] [RFC] Hardware acceleration for Windows and OS X?

2013-08-20 Thread Peter Maydell

On 20 August 2013 11:56, Paolo Bonzini pbonz...@redhat.com wrote:
 I guess patches are welcome---if they are of good quality, why not.

Well, sort of. Given the fairly minimal level of support we currently
have for OSX (ie Andreas and I test it a bit and fix egregious breakage),
I'd prefer it if patches for significant new features came attached to a
developer who was going to stay around and help maintain the platform :-)

-- PMM

Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-20 Thread Wenchao Xia


于 2013-8-20 17:29, Alex Bligh 写道:


On 20 Aug 2013, at 08:19, Wenchao Xia wrote:


  Thanks for the explanation. It seems *timeout is always set
to -1 before calling GSource's prepare(), so
*timeout = qemu_soonest_timeout(*timeout, 10);
will always get *timeout = 10, so this call can be saved.


I believe that's incorrect too. glib *currently* calls
the API that way, but there's nothing to say a future
or past glub couldn't call each g_source with the same
timeout pointer, with each g_source adjusting the timeout
downwards. Or (for instance) call it with 0 for a

  So it is an undefined value, should avoid use it?
For example, other gsource have minimum timeout of 5, and
aio_ctx_prepare() is called with 0, then returned timeout
is 0, but it is supposed to work with timeout 5 as the glib
doc described:
https://developer.gnome.org/glib/2.32/glib-The-Main-Event-Loop.html#GSourceFuncs

That is my opinion, but I haven't read other project's
code about prepare() usage, so can't be sure. Guess maintainer
can give a conclusion.



non-blocking prepare. Therefore the implemented way
is safer (only reducing the timeout).


--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-20 Thread Benjamin Herrenschmidt

On Tue, 2013-08-20 at 12:14 +0200, Paolo Bonzini wrote:
  I suppose if RH is going to deploy 3.10 and we aren't going to backport
  the multitce patches then there *might* be a case for supporting that
  combo specifically... but it's going to be that bad every time we add
  a new feature with a kernel counter part or start adding the gazillion
  little bits of PAPR that we are still missing ?
 
 Yes. :(
 
 Unless you consider pSeries KVM not mature enough to provide a guest ABI
 (basically supporting live migration only between identical kernels and
 QEMUs).  It would make some sense, for example (mutatis mutandis) Red
 Hat has a kernel ABI for the regular RHEL kernel, but not for the
 real-time RHEL kernel.

Migration is somewhat less of an issue, and there is to consider what products
will actually support KVM on Power. So at this early stage of the game, I would
still play it by ear and stay flexible. When we have something we really need to
have long term support for around the corner (or whatever we haven't announced
yet :-) we'll backport what's needed.

But yes, the minute we have that out, I think the problem will present itself,
at which point we need to probably start considering features in batches to
limit the explosion of individual options ...

Cheers,
Ben.

[Qemu-devel] [PATCH v2 1/2] tests: Fix schema parser test for in-tree build

2013-08-20 Thread armbru

From: Markus Armbruster arm...@redhat.com

Commit 4f193e3 added the test, but screwed up in-tree builds
(SRCDIR=.): the tests's output overwrites the expected output, and is
thus compared to itself.

Cc: qemu-sta...@nongnu.org
Reported-by: Laszlo Ersek ler...@redhat.com
Reviewed-by: Andreas Färber afaer...@suse.de
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 tests/.gitignore | 1 +
 tests/Makefile   | 8 
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index fb05c2a..d9c2ef4 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -19,3 +19,4 @@ test-thread-pool
 test-x86-cpuid
 test-xbzrle
 *-test
+qapi-schema/*.test.*
diff --git a/tests/Makefile b/tests/Makefile
index d044908..ad98439 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -247,10 +247,10 @@ check-tests/test-qapi.py: tests/test-qapi.py
 
 .PHONY: $(patsubst %, check-%, $(check-qapi-schema-y))
 $(patsubst %, check-%, $(check-qapi-schema-y)): check-%.json: 
$(SRC_PATH)/%.json
-   $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) 
$(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.out 2$*.err; echo $$? 
$*.exit,   TEST  $*.out)
-   @diff -q $(SRC_PATH)/$*.out $*.out
-   @diff -q $(SRC_PATH)/$*.err $*.err
-   @diff -q $(SRC_PATH)/$*.exit $*.exit
+   $(call quiet-command, PYTHONPATH=$(SRC_PATH)/scripts $(PYTHON) 
$(SRC_PATH)/tests/qapi-schema/test-qapi.py $^ $*.test.out 2$*.test.err; echo 
$$? $*.test.exit,   TEST  $*.out)
+   @diff -q $(SRC_PATH)/$*.out $*.test.out
+   @diff -q $(SRC_PATH)/$*.err $*.test.err
+   @diff -q $(SRC_PATH)/$*.exit $*.test.exit
 
 # Consolidated targets
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 2/2] tests: Update .gitignore for test-int128 and test-bitops

2013-08-20 Thread armbru

From: Markus Armbruster arm...@redhat.com

Forgotten in commit 6046c62 and 3464700.

Cc: qemu-sta...@nongnu.org
Reviewed-by: Andreas Färber afaer...@suse.de
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 tests/.gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/.gitignore b/tests/.gitignore
index d9c2ef4..9ac044d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -5,8 +5,10 @@ check-qjson
 check-qlist
 check-qstring
 test-aio
+test-bitops
 test-cutils
 test-hbitmap
+test-int128
 test-iov
 test-mul64
 test-qapi-types.[ch]
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH 0/2] tests: Fixes for in-tree build

2013-08-20 Thread Markus Armbruster

Andreas Färber afaer...@suse.de writes:

 Am 20.08.2013 09:47, schrieb arm...@redhat.com:
 From: Markus Armbruster arm...@redhat.com
 
 Markus Armbruster (2):
   tests: Fix schema parser test for in-tree build
   tests: Update .gitignore for test-int128 and test-bitops

 Series

 Reviewed-by: Andreas Färber afaer...@suse.de

 but you should probably add Cc: qemu-sta...@nongnu.org to 1/2 at least.

Good point.  v2 sent with the tag added, just to make sure it gets
picked up.

[Qemu-devel] [PATCH v2 0/2] tests: Fixes for in-tree build

2013-08-20 Thread armbru

From: Markus Armbruster arm...@redhat.com

v2: Nominate for qemu-stable (Andreas)

Markus Armbruster (2):
  tests: Fix schema parser test for in-tree build
  tests: Update .gitignore for test-int128 and test-bitops

 tests/.gitignore | 3 +++
 tests/Makefile   | 8 
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
1.8.1.4

Re: [Qemu-devel] [PATCH 0/3] target-ppc: Add support for dumping guest memory using qemu gdb server

2013-08-20 Thread Andreas Färber

Am 20.08.2013 10:19, schrieb Aneesh Kumar K.V:
 Andreas Färber afaer...@suse.de writes:
 
 Hi Aneesh,

 Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 This patch series implement support for dumping guest memory using qemu gdb 
 server.

 I had a quick look through but will leave in-depth review to Alex or
 Anthony.

 Do you plan to implement dumping guest memory via QMP, too?
 
 Are you looking at memsave command ?

No, I literally meant the dump-guest-memory command or so, which is only
implemented for x86 and s390x so far.

Cf. target-i386/arch_{dump,memory_mapping}.c and
target-s390x/arch_dump.c for examples.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] target-i386: Only provide CMOV and friends if feature bit set

2013-08-20 Thread Peter Maydell

Ping^2! This has been reviewed and I've checked that the
patch still applies to master.

thanks
-- PMM

On 25 July 2013 17:54, Peter Maydell peter.mayd...@linaro.org wrote:
 Ping!

 (patchwork url: http://patchwork.ozlabs.org/patch/259148/)

 thanks
 -- PMM

 On 15 July 2013 18:21, Peter Maydell peter.mayd...@linaro.org wrote:
 The instructions CMOVcc, FCMOVcc and F[U]COMI[P] should only be
 present if the CMOV feature bit is set. Add missing feature bit
 checks so we correctly fault if emulating a 486 or 586.
 This fixes bug LP:1201446.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  target-i386/translate.c |   19 +++
  1 file changed, 19 insertions(+)

 diff --git a/target-i386/translate.c b/target-i386/translate.c
 index 6550c27..f75e3b1 100644
 --- a/target-i386/translate.c
 +++ b/target-i386/translate.c
 @@ -6434,12 +6434,18 @@ static target_ulong disas_insn(CPUX86State *env, 
 DisasContext *s,
  }
  break;
  case 0x1d: /* fucomi */
 +if (!(s-cpuid_features  CPUID_CMOV)) {
 +goto illegal_op;
 +}
  gen_update_cc_op(s);
  gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg));
  gen_helper_fucomi_ST0_FT0(cpu_env);
  set_cc_op(s, CC_OP_EFLAGS);
  break;
  case 0x1e: /* fcomi */
 +if (!(s-cpuid_features  CPUID_CMOV)) {
 +goto illegal_op;
 +}
  gen_update_cc_op(s);
  gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg));
  gen_helper_fcomi_ST0_FT0(cpu_env);
 @@ -6495,6 +6501,9 @@ static target_ulong disas_insn(CPUX86State *env, 
 DisasContext *s,
  }
  break;
  case 0x3d: /* fucomip */
 +if (!(s-cpuid_features  CPUID_CMOV)) {
 +goto illegal_op;
 +}
  gen_update_cc_op(s);
  gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg));
  gen_helper_fucomi_ST0_FT0(cpu_env);
 @@ -6502,6 +6511,9 @@ static target_ulong disas_insn(CPUX86State *env, 
 DisasContext *s,
  set_cc_op(s, CC_OP_EFLAGS);
  break;
  case 0x3e: /* fcomip */
 +if (!(s-cpuid_features  CPUID_CMOV)) {
 +goto illegal_op;
 +}
  gen_update_cc_op(s);
  gen_helper_fmov_FT0_STN(cpu_env, tcg_const_i32(opreg));
  gen_helper_fcomi_ST0_FT0(cpu_env);
 @@ -6518,6 +6530,10 @@ static target_ulong disas_insn(CPUX86State *env, 
 DisasContext *s,
  (JCC_BE  1),
  (JCC_P  1),
  };
 +
 +if (!(s-cpuid_features  CPUID_CMOV)) {
 +goto illegal_op;
 +}
  op1 = fcmov_cc[op  3] | (((op  3)  1) ^ 1);
  l1 = gen_new_label();
  gen_jcc1_noeob(s, op1, l1);
 @@ -6889,6 +6905,9 @@ static target_ulong disas_insn(CPUX86State *env, 
 DisasContext *s,
  gen_ldst_modrm(env, s, modrm, OT_BYTE, OR_TMP0, 1);
  break;
  case 0x140 ... 0x14f: /* cmov Gv, Ev */
 +if (!(s-cpuid_features  CPUID_CMOV)) {
 +goto illegal_op;
 +}
  ot = dflag + OT_WORD;
  modrm = cpu_ldub_code(env, s-pc++);
  reg = ((modrm  3)  7) | rex_r;
 --
 1.7.9.5

Re: [Qemu-devel] [PATCH] hw/openrisc/openrisc_sim: Avoid using uninitialised variable 'entry'

2013-08-20 Thread Peter Maydell

Ping for qemu-trivial now 1.7 is open.

thanks
-- PMM

On 5 August 2013 19:24, Peter Maydell peter.mayd...@linaro.org wrote:
 clang warns that cpu_openrisc_load_kernel() can use 'entry' uninitialized:

 hw/openrisc/openrisc_sim.c:69:9: error: variable 'entry' is used
   uninitialized whenever '' condition is false 
 [-Werror,-Wsometimes-uninitialized]
 if (kernel_filename  !qtest_enabled()) {
 ^~~
 hw/openrisc/openrisc_sim.c:91:19: note: uninitialized use occurs here
 cpu-env.pc = entry;
   ^

 Fix this by not attempting to change the CPU's starting PC unless
 we actually loaded a kernel.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  hw/openrisc/openrisc_sim.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
 index a08f27c..4595fa9 100644
 --- a/hw/openrisc/openrisc_sim.c
 +++ b/hw/openrisc/openrisc_sim.c
 @@ -86,9 +86,9 @@ static void cpu_openrisc_load_kernel(ram_addr_t ram_size,
  kernel_filename);
  exit(1);
  }
 -}

 -cpu-env.pc = entry;
 +cpu-env.pc = entry;
 +}
  }

  static void openrisc_sim_init(QEMUMachineInitArgs *args)
 --
 1.7.9.5

Re: [Qemu-devel] [PATCH] configure: disable clang -Wstring-plus-int warning

2013-08-20 Thread Peter Maydell

Ping for qemu-trivial now 1.7 is open.

thanks
-- PMM

On 5 August 2013 20:16, Peter Maydell peter.mayd...@linaro.org wrote:
 Some versions of clang will warn about adding integers to strings:

 disas/i386.c:4753:23: error: adding 'char' to a string does not append
   to the string [-Werror,-Wstring-plus-int]
   oappend (%es: + intel_syntax);
~~~^~
 disas/i386.c:4753:23: note: use array indexing to silence this warning
   oappend (%es: + intel_syntax);
   ^
  [ ]

 disas/i386.c uses this idiom to to skip a % prefix if using intel
 rather than ATT syntax. This seems like a reasonable  thing to do,
 and I don't think anybody contributing to QEMU is likely to believe
 that '+' is a string concatenation operator in C, so just disable
 -Wstring-plus-int.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  configure |1 +
  1 file changed, 1 insertion(+)

 diff --git a/configure b/configure
 index a3bd9be..a1d1701 100755
 --- a/configure
 +++ b/configure
 @@ -1204,6 +1204,7 @@ gcc_flags=-Wformat-security -Wformat-y2k -Winit-self 
 -Wignored-qualifiers $gcc_
  gcc_flags=-Wmissing-include-dirs -Wempty-body -Wnested-externs $gcc_flags
  gcc_flags=-Wendif-labels $gcc_flags
  gcc_flags=-Wno-initializer-overrides $gcc_flags
 +gcc_flags=-Wno-string-plus-int $gcc_flags
  # Note that we do not add -Werror to gcc_flags here, because that would
  # enable it for all configure tests. If a configure test failed due
  # to -Werror this would just silently disable some features,
 --
 1.7.9.5

Re: [Qemu-devel] [PATCH 00/14] ARM: make IRQ/FIQ gpio lines on CPU object

2013-08-20 Thread Peter Maydell

On 8 August 2013 17:04, Peter Maydell peter.mayd...@linaro.org wrote:
 This patch series makes IRQ and FIQ be inbound gpio lines on the
 ARMCPU object (which we can do now that it is a subclass of
 DeviceState). This allows us to drop the odd 'arm_pic' shim,
 which doesn't correspond to real hardware and existed purely
 to be a thing which exposed qemu_irqs

Applying this to target-arm.next (since that lets me resolve the
trivial textual conflict with the generic-timers patchset rather
than having two pullreqs conflict).

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 0/4] target-arm: Implement support for generic timers

2013-08-20 Thread Peter Maydell

On 9 August 2013 17:17, Peter Maydell peter.mayd...@linaro.org wrote:
 This patch series implements support for the 'generic timers',
 which are a set of timers defined in the ARM Architecture Reference
 Manual and implemented by the Cortex-A15. We've got away without
 these up til now because Linux will generally fall back on whatever
 random timer is present on the devboard if it can't find the
 on-CPU timers, but this is less than ideal. (Among other things,
 the proposed mach-virt should just use the generic timers so it
 doesn't have to provide an sp804 timer.)

Applying to target-arm.next.

-- PMM

Re: [Qemu-devel] [PATCH] default-configs: Fix A9MP and A15MP config names

2013-08-20 Thread Peter Maydell

On 9 August 2013 14:50, Peter Maydell peter.mayd...@linaro.org wrote:
 When individual CONFIG_ switches for the A9MPcore and A15MPcore
 devices were created, they were inadvertently given incorrect names
 (CONFIG_ARM9MPCORE and CONFIG_ARM15MPCORE). These CPUs are
 Cortex-A9MP and Cortex-A15MP, and in particular the ARM9 is
 a different (rather older) CPU than the Cortex-A9. Rename the
 CONFIG_ switches to bring them into line with the source file
 names and CPU names.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org

Applying to target-arm.next.

-- PMM

Re: [Qemu-devel] [PATCH] hw/openrisc/openrisc_sim: Avoid using uninitialised variable 'entry'

2013-08-20 Thread Jia Liu

Hi Peter,

On Tue, Aug 20, 2013 at 9:00 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 Ping for qemu-trivial now 1.7 is open.

Thank you, I'll send a PULL very soon.


 thanks
 -- PMM

 On 5 August 2013 19:24, Peter Maydell peter.mayd...@linaro.org wrote:
 clang warns that cpu_openrisc_load_kernel() can use 'entry' uninitialized:

 hw/openrisc/openrisc_sim.c:69:9: error: variable 'entry' is used
   uninitialized whenever '' condition is false 
 [-Werror,-Wsometimes-uninitialized]
 if (kernel_filename  !qtest_enabled()) {
 ^~~
 hw/openrisc/openrisc_sim.c:91:19: note: uninitialized use occurs here
 cpu-env.pc = entry;
   ^

 Fix this by not attempting to change the CPU's starting PC unless
 we actually loaded a kernel.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  hw/openrisc/openrisc_sim.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
 index a08f27c..4595fa9 100644
 --- a/hw/openrisc/openrisc_sim.c
 +++ b/hw/openrisc/openrisc_sim.c
 @@ -86,9 +86,9 @@ static void cpu_openrisc_load_kernel(ram_addr_t ram_size,
  kernel_filename);
  exit(1);
  }
 -}

 -cpu-env.pc = entry;
 +cpu-env.pc = entry;
 +}
  }

  static void openrisc_sim_init(QEMUMachineInitArgs *args)
 --
 1.7.9.5



Regards,
Jia

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-20 Thread Zhanghaoyu (A)

  The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
  LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
  QEMU_AUDIO_DRV=none
  /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu
  qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 
  -uuid
  0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
  -chardev 
  socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,ser
  ver, n owait -mon chardev=charmonitor,id=monitor,mode=control 
  -rtc base=localtime -no-shutdown -device
  piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
  file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw
  ,cac
  h
  e=none -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-dis
  k0,i
  d
  =virtio-disk0,bootindex=1 -netdev
  tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
  0
  ,addr=0x3,bootindex=2 -netdev
  tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
  virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
  0
  ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 
  -device 
  virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
  0
  ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 
  -device 
  virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
  0
  ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 
  -device 
  virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
  0
  ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 
  -device 
  virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
  0
  ,addr=0x9 -chardev pty,id=charserial0 -device 
  isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
  cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
  -watchdog-action poweroff -device 
  virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
  
 Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
 
 This QEMU version is 1.0.0, but I also test QEMU 1.5.2, the same problem 
 exists, including the performance degradation and readonly GFNs' flooding.
 I tried with e1000 NICs instead of virtio, including the performance 
 degradation and readonly GFNs' flooding, the QEMU version is 1.5.2.
 No matter e1000 NICs or virtio NICs, the GFNs' flooding is initiated at 
 post-restore stage (i.e. running stage), as soon as the restoring 
 completed, the flooding is starting.
 
 Thanks,
 Zhang Haoyu
 
 --
   Gleb.
 
 Should we focus on the first bad 
 commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' 
 flooding?
 
Not really. There is no point in debugging very old version compiled 
with kvm-kmod, there are to many variables in the environment. I cannot 
reproduce the GFN flooding on upstream, so the problem may be gone, may 
be a result of kvm-kmod problem or something different in how I invoke 
qemu. So the best way to proceed is for you to reproduce with upstream 
version then at least I will be sure that we are using the same code.

Thanks, I will test the combos of upstream kvm kernel and upstream qemu.
And, the guest os version above I said was wrong, current running guest os is 
SLES10SP4.

I tested below combos of qemu and kernel,
+-+-+-+
|  kvm kernel |  QEMU   |   test result   |
+-+-+-+
|  kvm-3.11-2 |   qemu-1.5.2|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.0.0|  BAD|
+-+-+-+
|  SLES11SP2  |   qemu-1.4.0|  BAD|
+-+-+-+
|  SLES11SP2  |   qemu-1.4.2|  BAD|
+-+-+-+
|  SLES11SP2  | qemu-1.5.0-rc0  |  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.0|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.1|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.2|  GOOD   |
+-+-+-+
NOTE:
1. above kvm-3.11-2 in the table is the whole tag kernel download from 
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2's kernel version is 3.0.13-0.27

Then I git bisect the qemu changes between qemu-1.4.2 and qemu-1.5.0-rc0 by 
marking the good version as bad, and the bad version as good,
so the first bad commit is just the patch which fixes the degradation problem.
++---+-+-+
| bisect No. |  commit   |  save-restore   |
migration|

Re: [Qemu-devel] [PATCHv11 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-20 Thread Alex Bligh


On 20 Aug 2013, at 12:19, Wenchao Xia wrote:
 So it is an undefined value, should avoid use it?


It's not an undefined value. It's the value that the poll should
wait for subject to modification by the prepare call.

-- 
Alex Bligh

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-20 Thread Alex Bligh


On 20 Aug 2013, at 11:53, Paolo Bonzini wrote:

 It doesn't look too different from what qemu-img convert -c does,
 except that you need to use the right -o incantation to specify the
 format type.

Extensive incanting did not produce a result that works. Possibly
inadequate haruspication on my part. I suspect it's only a minor
modification to support it.

-- 
Alex Bligh

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 15:37, Alex Bligh ha scritto:
 
 On 20 Aug 2013, at 11:53, Paolo Bonzini wrote:
 
 It doesn't look too different from what qemu-img convert -c does,
 except that you need to use the right -o incantation to specify the
 format type.
 
 Extensive incanting did not produce a result that works. Possibly
 inadequate haruspication on my part. I suspect it's only a minor
 modification to support it.

Yes, because vmdk does not implement bdrv_write_compressed.  Sorry for
leaving out the small detail. :(

Paolo

Re: [Qemu-devel] [PATCH v2] monitor: print the invalid char in error message

2013-08-20 Thread Eric Blake

On 08/19/2013 08:58 PM, Fam Zheng wrote:
 It's more friendly to print which char is invalid to user, especially
 when user tries to input a float value and expect the monitor to round
 it to int. Since we don't round float number when we look for a integer,
 telling which char is invalid is less confusing.

Including a sample of the error message issued before and after this
patch as part of the commit message is helpful, but not enough for me to
require a respin.

 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  monitor.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] Call for agenda for 2013-08-20

2013-08-20 Thread Juan Quintela

Juan Quintela quint...@redhat.com wrote:
 Hi

 Please, send any topic that you are interested in covering.


Call cancelled.

As this was the only topic,  and neither Frederik or Konrad are able to
attend today,  topic got moved to next call in two weeks.


 Thanks, Juan.

 Agenda so far:
 - Talk about qemu reverse executing (1st description was done this week)
   How to handle IO when we want to do reverse execution.
   How this relate to Kemari needs?
   And to icount changes?

 Call details:

 10:00 AM to 11:00 AM EDT
 Every two weeks

 If you need phone number details,  contact me privately.

Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy

2013-08-20 Thread Andrew Jones



- Original Message -
 Set the guest numa nodes memory policies using the mbind(2)
 system call node by node.
 After this patch, we are able to set guest nodes memory policies
 through the QEMU options, this arms to solve the guest cross
 nodes memory access performance issue.
 And as you all know, if PCI-passthrough is used,
 direct-attached-device uses DMA transfer between device and qemu process.
 All pages of the guest will be pinned by get_user_pages().
 
 KVM_ASSIGN_PCI_DEVICE ioctl
   kvm_vm_ioctl_assign_device()
 =kvm_assign_device()
   = kvm_iommu_map_memslots()
 = kvm_iommu_map_pages()
= kvm_pin_pages()
 
 So, with direct-attached-device, all guest page's page count will be +1 and
 any page migration will not work. AutoNUMA won't too.
 
 So, we should set the guest nodes memory allocation policies before
 the pages are really mapped.
 
 Signed-off-by: Andre Przywara andre.przyw...@amd.com
 Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
 ---
  numa.c | 89
  ++
  1 file changed, 89 insertions(+)
 
 diff --git a/numa.c b/numa.c
 index 436b8e0..b2c0048 100644
 --- a/numa.c
 +++ b/numa.c
 @@ -28,6 +28,16 @@
  #include qapi-visit.h
  #include qapi/opts-visitor.h
  #include qapi/dealloc-visitor.h
 +#include exec/memory.h
 +
 +#ifdef CONFIG_NUMA
 +#include numa.h
 +#include numaif.h
 +#ifndef MPOL_F_RELATIVE_NODES
 +#define MPOL_F_RELATIVE_NODES (1  14)
 +#define MPOL_F_STATIC_NODES   (1  15)
 +#endif
 +#endif
  
  QemuOptsList qemu_numa_opts = {
  .name = numa,
 @@ -209,6 +219,78 @@ void set_numa_nodes(void)
  }
  }
  
 +#ifdef CONFIG_NUMA
 +static int node_parse_bind_mode(unsigned int nodeid)
 +{
 +int bind_mode;
 +
 +switch (numa_info[nodeid].policy) {
 +case NUMA_NODE_POLICY_MEMBIND:
 +bind_mode = MPOL_BIND;
 +break;
 +case NUMA_NODE_POLICY_INTERLEAVE:
 +bind_mode = MPOL_INTERLEAVE;
 +break;
 +case NUMA_NODE_POLICY_PREFERRED:
 +bind_mode = MPOL_PREFERRED;
 +break;
 +case NUMA_NODE_POLICY_DEFAULT:
 +default:
 +bind_mode = MPOL_DEFAULT;
 +return bind_mode;
 +}
 +
 +bind_mode |= numa_info[nodeid].relative ?
 +MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
 +
 +return bind_mode;
 +}
 +#endif
 +
 +static int set_node_mem_policy(int nodeid)
 +{
 +#ifdef CONFIG_NUMA
 +void *ram_ptr;
 +RAMBlock *block;
 +ram_addr_t len, ram_offset = 0;
 +int bind_mode;
 +int i;
 +
 +QTAILQ_FOREACH(block, ram_list.blocks, next) {
 +if (!strcmp(block-mr-name, pc.ram)) {
 +break;
 +}
 +}
 +
 +if (block-host == NULL) {
 +return -1;
 +}
 +
 +ram_ptr = block-host;
 +for (i = 0; i  nodeid; i++) {
 +len = numa_info[i].node_mem;
 +ram_offset += len;
 +}
 +
 +len = numa_info[i].node_mem;
 +bind_mode = node_parse_bind_mode(i);
 +
 +/* This is a workaround for a long standing bug in Linux'
 + * mbind implementation, which cuts off the last specified
 + * node. To stay compatible should this bug be fixed, we
 + * specify one more node and zero this one out.
 + */
 +clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
 +if (mbind(ram_ptr + ram_offset, len, bind_mode,
 +numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
 +perror(mbind);
 +return -1;
 +}

From my quick read of this patch series, I think these two calls of
numa_num_configured_nodes() are the only places that libnuma is used.
Is it really worth the new dependency? Actually libnuma will only calculate
what it returns from numa_num_configured_nodes() once, because it simply
counts bits in a bitmask that it only initializes at library load time. So
it would be more robust wrt to node onlining/offlining to avoid libnuma and
to just fetch information from sysfs as needed anyway. In this particular
code though, I think replacing numa_num_configured_nodes() with a maxnode,
where

unsigned long maxnode = find_last_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS)

would work the best.

Another comment I have on this function is that I'd prefer to see something
like

unsigned long *nodes = numa_info[nodeid].host_mem;

at the top, and then use that for a shorter name, rather than abusing
the fact that i == nodeid after the loop, presumably just to keep the name
short.

drew

 +#endif
 +
 +return 0;
 +}
 +
  void set_numa_modes(void)
  {
  CPUState *cpu;
 @@ -221,4 +303,11 @@ void set_numa_modes(void)
  }
  }
  }
 +
 +for (i = 0; i  nb_numa_nodes; i++) {
 +if (set_node_mem_policy(i) == -1) {
 +fprintf(stderr,
 +qemu: can not set host memory policy for node%d\n, i);
 +}
 +}
  }
 --
 1.8.4.rc1.21.gfb56570

[Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient

2013-08-20 Thread Charlie Shepherd

cow_co_is_allocated and cow_update_bitmap set bits by reading the relevant
word, setting the specific bit in it and writing it back. These functions set
a number of contiguous bits however, so this is an extremely inefficient way
of doing this. This patch converts them to read the whole bitmap they need in
one go, update it and then write it out, which should be much more more
efficient.

Signed-off-by: Charlie Shepherd char...@ctshepherd.com
---
 block/cow.c | 116 
 1 file changed, 62 insertions(+), 54 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index 1cc2e89..87ebef6 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -102,84 +102,92 @@ static int cow_open(BlockDriverState *bs, QDict *options, 
int flags)
 return ret;
 }
 
-/*
- * XXX(hch): right now these functions are extremely inefficient.
- * We should just read the whole bitmap we'll need in one go instead.
- */
-static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
-{
-uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
-uint8_t bitmap;
-int ret;
-
-ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap));
-if (ret  0) {
-   return ret;
-}
-
-bitmap |= (1  (bitnum % 8));
-
-ret = bdrv_pwrite_sync(bs-file, offset, bitmap, sizeof(bitmap));
-if (ret  0) {
-   return ret;
-}
-return 0;
-}
-
-static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
-{
-uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
-uint8_t bitmap;
-int ret;
-
-ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap));
-if (ret  0) {
-   return ret;
-}
-
-return !!(bitmap  (1  (bitnum % 8)));
-}
-
 /* Return true if first block has been changed (ie. current version is
  * in COW file).  Set the number of continuous blocks for which that
  * is true. */
 static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, int *num_same)
 {
-int changed;
+int ret, changed;
+uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
+
+int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
+int remaining = sector_num - init_bits;
+int full_bytes = remaining / 8;
+int trail = remaining % 8;
+
+int len = !!init_bits + full_bytes + !!trail;
+uint8_t buf[len];
 
 if (nb_sectors == 0) {
-   *num_same = nb_sectors;
-   return 0;
+*num_same = nb_sectors;
+return 0;
 }
 
-changed = is_bit_set(bs, sector_num);
-if (changed  0) {
-return 0; /* XXX: how to return I/O errors? */
+ret = bdrv_pread(bs-file, offset, buf, len);
+if (ret  0) {
+return ret;
 }
 
+#define is_bit_set(b) (!!(buf[(b)/8]  (1  ((b) % 8
+
+changed = is_bit_set(sector_num);
 for (*num_same = 1; *num_same  nb_sectors; (*num_same)++) {
-   if (is_bit_set(bs, sector_num + *num_same) != changed)
-   break;
+if (is_bit_set(sector_num + *num_same) != changed) {
+break;
+}
 }
 
+#undef is_bit_set
+
 return changed;
 }
 
+/* Set the bits from sector_num to sector_num + nb_sectors in the bitmap of
+ * bs-file. */
 static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors)
 {
-int error = 0;
-int i;
+int ret;
+uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
 
-for (i = 0; i  nb_sectors; i++) {
-error = cow_set_bit(bs, sector_num + i);
-if (error) {
-break;
-}
+int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
+int remaining = sector_num - init_bits;
+int full_bytes = remaining / 8;
+int trail = remaining % 8;
+
+int len = !!init_bits + full_bytes + !!trail;
+uint8_t buf[len];
+
+ret = bdrv_pread(bs-file, offset, buf, len);
+if (ret  0) {
+return ret;
+}
+
+/* Do sector_num - nearest byte boundary */
+if (init_bits) {
+/* This sets the highest init_bits bits in the byte */
+uint8_t bits = ((1  init_bits) - 1)  (8 - init_bits);
+buf[0] |= bits;
+}
+
+if (full_bytes) {
+memset(buf[!!init_bits], ~0, full_bytes);
+}
+
+/* Set the trailing bits in the final byte */
+if (trail) {
+/* This sets the lowest trail bits in the byte */
+uint8_t bits = (1  trail) - 1;
+buf[len - 1] |= bits;
+}
+
+ret = bdrv_pwrite_sync(bs-file, offset, buf, len);
+if (ret  0) {
+return ret;
 }
 
-return error;
+return 0;
 }
 
 static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num,
-- 
1.8.3.2

Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 03:07, Wanlong Gao ha scritto:
  -numa node,nodeid=0,cpus=0, \
  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
  -numa node,nodeid=1,cpus=1 \
  -numa mem,size=1024M,policy=interleave,host-nodes=1

What nodes would the memory be in, for this command line?  Does it just
compute the total and split it evenly across the nodes (so that the
-numa node options could omit nodeid and cpus too)?

Also, do you still need a -m option if you use -numa mem?

Paolo

Re: [Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 15:42, Charlie Shepherd ha scritto:
 cow_co_is_allocated and cow_update_bitmap set bits by reading the relevant
 word, setting the specific bit in it and writing it back. These functions set
 a number of contiguous bits however, so this is an extremely inefficient way
 of doing this. This patch converts them to read the whole bitmap they need in
 one go, update it and then write it out, which should be much more more
 efficient.
 
 Signed-off-by: Charlie Shepherd char...@ctshepherd.com
 ---
  block/cow.c | 116 
 
  1 file changed, 62 insertions(+), 54 deletions(-)
 
 diff --git a/block/cow.c b/block/cow.c
 index 1cc2e89..87ebef6 100644
 --- a/block/cow.c
 +++ b/block/cow.c
 @@ -102,84 +102,92 @@ static int cow_open(BlockDriverState *bs, QDict 
 *options, int flags)
  return ret;
  }
  
 -/*
 - * XXX(hch): right now these functions are extremely inefficient.
 - * We should just read the whole bitmap we'll need in one go instead.
 - */
 -static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
 -{
 -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
 -uint8_t bitmap;
 -int ret;
 -
 -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap));
 -if (ret  0) {
 -   return ret;
 -}
 -
 -bitmap |= (1  (bitnum % 8));
 -
 -ret = bdrv_pwrite_sync(bs-file, offset, bitmap, sizeof(bitmap));
 -if (ret  0) {
 -   return ret;
 -}
 -return 0;
 -}
 -
 -static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
 -{
 -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
 -uint8_t bitmap;
 -int ret;
 -
 -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap));
 -if (ret  0) {
 -   return ret;
 -}
 -
 -return !!(bitmap  (1  (bitnum % 8)));
 -}
 -
  /* Return true if first block has been changed (ie. current version is
   * in COW file).  Set the number of continuous blocks for which that
   * is true. */
  static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, int *num_same)
  {
 -int changed;
 +int ret, changed;
 +uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
 +
 +int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
 +int remaining = sector_num - init_bits;
 +int full_bytes = remaining / 8;
 +int trail = remaining % 8;
 +
 +int len = !!init_bits + full_bytes + !!trail;
 +uint8_t buf[len];
  
  if (nb_sectors == 0) {
 - *num_same = nb_sectors;
 - return 0;
 +*num_same = nb_sectors;
 +return 0;
  }
  
 -changed = is_bit_set(bs, sector_num);
 -if (changed  0) {
 -return 0; /* XXX: how to return I/O errors? */
 +ret = bdrv_pread(bs-file, offset, buf, len);
 +if (ret  0) {
 +return ret;
  }
  
 +#define is_bit_set(b) (!!(buf[(b)/8]  (1  ((b) % 8
 +
 +changed = is_bit_set(sector_num);
  for (*num_same = 1; *num_same  nb_sectors; (*num_same)++) {
 - if (is_bit_set(bs, sector_num + *num_same) != changed)
 - break;
 +if (is_bit_set(sector_num + *num_same) != changed) {
 +break;
 +}
  }
  
 +#undef is_bit_set
 +
  return changed;
  }
  
 +/* Set the bits from sector_num to sector_num + nb_sectors in the bitmap of
 + * bs-file. */
  static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
  int nb_sectors)
  {
 -int error = 0;
 -int i;
 +int ret;
 +uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
  
 -for (i = 0; i  nb_sectors; i++) {
 -error = cow_set_bit(bs, sector_num + i);
 -if (error) {
 -break;
 -}
 +int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
 +int remaining = sector_num - init_bits;
 +int full_bytes = remaining / 8;
 +int trail = remaining % 8;
 +
 +int len = !!init_bits + full_bytes + !!trail;
 +uint8_t buf[len];
 +
 +ret = bdrv_pread(bs-file, offset, buf, len);
 +if (ret  0) {
 +return ret;
 +}
 +
 +/* Do sector_num - nearest byte boundary */
 +if (init_bits) {
 +/* This sets the highest init_bits bits in the byte */
 +uint8_t bits = ((1  init_bits) - 1)  (8 - init_bits);
 +buf[0] |= bits;
 +}
 +
 +if (full_bytes) {
 +memset(buf[!!init_bits], ~0, full_bytes);
 +}
 +
 +/* Set the trailing bits in the final byte */
 +if (trail) {
 +/* This sets the lowest trail bits in the byte */
 +uint8_t bits = (1  trail) - 1;
 +buf[len - 1] |= bits;
 +}
 +
 +ret = bdrv_pwrite_sync(bs-file, offset, buf, len);
 +if (ret  0) {
 +return ret;
  }
  
 -return error;
 +return 0;
  }
  
  static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num,
 

I had very similar patches in my series to add get_block_status...

Paolo

Re: [Qemu-devel] [Patch] ARM: Simplify and fix imx_epit implementation.

2013-08-20 Thread Peter Maydell

On 5 August 2013 02:27, Peter Chubb peter.ch...@nicta.com.au wrote:

 When imx_epit.c was last refactored, a common usecase (comparison
 register zero) broke.  This patch fixes that, and simplifies the code
 yet more.  It also fixes a major thinko in the reset path --- the
 wrong bits in the control register were being cleared.

 Signed-off-by: Peter Chubb peter.ch...@nicta.com.au
 Reviewed-by: Jean-Christophe DUBOIS j...@tribudubois.net

Thanks; applied to target-arm.next.

-- PMM

Re: [Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient

2013-08-20 Thread Charlie Shepherd


On 20/08/2013 14:45, Paolo Bonzini wrote:

I had very similar patches in my series to add get_block_status...


Ah, sorry if I've duplicated something you've already done. Stefan 
encouraged me to contribute to an area of QEMU I found interesting and 
seeing the XXX I thought I'd take a shot at improving the code. Where 
should I look in the future for patches like this?



Charlie

Re: [Qemu-devel] [Patch] ARM: Simplify and fix imx_epit implementation.

2013-08-20 Thread Andreas Färber

Am 20.08.2013 15:46, schrieb Peter Maydell:
 On 5 August 2013 02:27, Peter Chubb peter.ch...@nicta.com.au wrote:

 When imx_epit.c was last refactored, a common usecase (comparison
 register zero) broke.  This patch fixes that, and simplifies the code
 yet more.  It also fixes a major thinko in the reset path --- the
 wrong bits in the control register were being cleared.

 Signed-off-by: Peter Chubb peter.ch...@nicta.com.au
 Reviewed-by: Jean-Christophe DUBOIS j...@tribudubois.net
 
 Thanks; applied to target-arm.next.

Peter Ch., please call such a patch imx_epit:  rather than ARM: 
since it affects only your device. (PMM, can you fix?)

Also if there is a reset bug, then fixing that in its own patch would
better allow backporting that to 1.6.1. The way it is right now with no
Cc: line for qemu-stable, the released version will keep the thinko.
Changing debug output from stdout to stderr would've also been a change
of its own that is not even mentioned in the commit message.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-20 Thread Alex Bligh


On 20 Aug 2013, at 14:37, Paolo Bonzini wrote:

 Yes, because vmdk does not implement bdrv_write_compressed.  Sorry for
 leaving out the small detail. :(

Would that be easy for me to fix?

-- 
Alex Bligh

Re: [Qemu-devel] [Patch] ARM: Simplify and fix imx_epit implementation.

2013-08-20 Thread Peter Maydell

On 20 August 2013 15:01, Andreas Färber afaer...@suse.de wrote:
 Peter Ch., please call such a patch imx_epit:  rather than ARM: 
 since it affects only your device. (PMM, can you fix?)

Yes, I fixed it to say hw/timer/imx_epit: ...

-- PMM

Re: [Qemu-devel] [PATCH V8 00/13] monitor: support sub command group in auto completion and help

2013-08-20 Thread Luiz Capitulino

On Tue, 30 Jul 2013 12:03:11 -0400
Luiz Capitulino lcapitul...@redhat.com wrote:

 On Fri, 26 Jul 2013 11:20:29 +0800
 Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
 
  This series make auto completion and help functions works normal for sub
  command, by using reentrant functions. In order to do that, global variables
  are not directly used in those functions any more. With this series, 
  cmd_table
  is a member of structure Monitor so it is possible to create a monitor with
  different command table now, auto completion will work in that monitor. In
  short, info is not treated as a special case now, this series ensure help
  and auto complete function works normal for any sub command added in the 
  future.
  
  Patch 5 replaced cur_mon with rs-mon, it is safe because:
  monitor_init() calls readline_init() which initialize mon-rs, result is
  mon-rs-mon == mon. Then qemu_chr_add_handlers() is called, which make
  monitor_read() function take *mon as its opaque. Later, when user input,
  monitor_read() is called, where cur_mon is set to *mon by cur_mon = 
  opaque.
  If qemu's monitors run in one thread, then later in readline_handle_byte()
  and readline_comletion(), cur_mon is actually equal to rs-mon, in another
  word, it points to the monitor instance, so it is safe to replace *cur_mon
  in those functions.
 
 I've applied this to qmp-next with the change I suggested for
 patch 09/13.

Unfortunately this series brakes make check:

GTESTER check-qtest-x86_64
Broken pipe
GTester: last random seed: R02S3492bd34f44dd17460851643383be44d
main-loop: WARNING: I/O thread spun for 1000 iterations
make: *** [check-qtest-x86_64] Error 1

I debugged it (with some help from Laszlo) and the problem is that it
broke the human-monitor-command command. Any usage of this command
triggers the bug like:

{ execute: human-monitor-command,
 arguments: { command-line: info registers } }

It seems simple to fix, I think you just have to initialize
mon-cmd_table in qmp_human_monitor_command(), but I'd recommend two
things:

 1. It's better to split off some/all QMP initialization from
monitor_init() and call it from qmp_human_monitor_command()

 2. Can you please take the opportunity and test all commands using
cur_mon? Just grep for it

Sorry for noticing this only now, but I only run make check before
sending a pull request (although this very likely shows you didn't
run it either).

Re: [Qemu-devel] [PATCH] Make cow_co_is_allocated and cow_update_bitmap more efficient

2013-08-20 Thread Paolo Bonzini

Il 20/08/2013 15:50, Charlie Shepherd ha scritto:
 On 20/08/2013 14:45, Paolo Bonzini wrote:
 I had very similar patches in my series to add get_block_status...
 
 Ah, sorry if I've duplicated something you've already done. Stefan
 encouraged me to contribute to an area of QEMU I found interesting and
 seeing the XXX I thought I'd take a shot at improving the code. Where
 should I look in the future for patches like this?

There is still room for improvement in this area.  You can compare my
patches and yours, they follow very different ideas.

I could pick up your patch in my series too, actually, if they're
reviewed fast enough (I'm not sure I'll have the cycles to do that,
unfortunately).

Paolo

[Qemu-devel] [PULL 08/21] hw/arm/musicpal: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-8-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/musicpal.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index d715143..4404b8d 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -1586,7 +1586,6 @@ static void musicpal_init(QEMUMachineInitArgs *args)
 const char *kernel_cmdline = args-kernel_cmdline;
 const char *initrd_filename = args-initrd_filename;
 ARMCPU *cpu;
-qemu_irq *cpu_pic;
 qemu_irq pic[32];
 DeviceState *dev;
 DeviceState *i2c_dev;
@@ -1610,7 +1609,6 @@ static void musicpal_init(QEMUMachineInitArgs *args)
 fprintf(stderr, Unable to find CPU definition\n);
 exit(1);
 }
-cpu_pic = arm_pic_init_cpu(cpu);
 
 /* For now we use a fixed - the original - RAM size */
 memory_region_init_ram(ram, NULL, musicpal.ram, MP_RAM_DEFAULT_SIZE);
@@ -1622,7 +1620,7 @@ static void musicpal_init(QEMUMachineInitArgs *args)
 memory_region_add_subregion(address_space_mem, MP_SRAM_BASE, sram);
 
 dev = sysbus_create_simple(TYPE_MV88W8618_PIC, MP_PIC_BASE,
-   cpu_pic[ARM_PIC_CPU_IRQ]);
+   qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
 for (i = 0; i  32; i++) {
 pic[i] = qdev_get_gpio_in(dev, i);
 }
-- 
1.7.9.5

[Qemu-devel] [PULL 07/21] hw/arm/kzm: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-7-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/kzm.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
index bd6c05c..a248bf0 100644
--- a/hw/arm/kzm.c
+++ b/hw/arm/kzm.c
@@ -82,7 +82,6 @@ static void kzm_init(QEMUMachineInitArgs *args)
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 MemoryRegion *sram = g_new(MemoryRegion, 1);
 MemoryRegion *ram_alias = g_new(MemoryRegion, 1);
-qemu_irq *cpu_pic;
 DeviceState *dev;
 DeviceState *ccm;
 
@@ -108,11 +107,10 @@ static void kzm_init(QEMUMachineInitArgs *args)
 memory_region_init_ram(sram, NULL, kzm.sram, 0x4000);
 memory_region_add_subregion(address_space_mem, 0x1FFFC000, sram);
 
-cpu_pic = arm_pic_init_cpu(cpu);
 dev = sysbus_create_varargs(imx_avic, 0x6800,
-cpu_pic[ARM_PIC_CPU_IRQ],
-cpu_pic[ARM_PIC_CPU_FIQ], NULL);
-
+qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ),
+qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ),
+NULL);
 
 imx_serial_create(0, 0x43f9, qdev_get_gpio_in(dev, 45));
 imx_serial_create(1, 0x43f94000, qdev_get_gpio_in(dev, 32));
-- 
1.7.9.5

[Qemu-devel] [PULL 16/21] target-arm: Allow raw_read() and raw_write() to handle 64 bit regs

2013-08-20 Thread Peter Maydell

Extend the raw_read() and raw_write() helper accessors so that
they can be used for 64 bit registers as well as 32 bit registers.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Tested-by: Laurent Desnogues laurent.desnog...@gmail.com
Reviewed-by: Edgar E. Iglesias edgar.igles...@gmail.com
Message-id: 1376065080-26661-2-git-send-email-peter.mayd...@linaro.org
---
 target-arm/helper.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 6d9026d..f8689e2 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -67,14 +67,22 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, 
int reg)
 static int raw_read(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t *value)
 {
-*value = CPREG_FIELD32(env, ri);
+if (ri-type  ARM_CP_64BIT) {
+*value = CPREG_FIELD64(env, ri);
+} else {
+*value = CPREG_FIELD32(env, ri);
+}
 return 0;
 }
 
 static int raw_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
-CPREG_FIELD32(env, ri) = value;
+if (ri-type  ARM_CP_64BIT) {
+CPREG_FIELD64(env, ri) = value;
+} else {
+CPREG_FIELD32(env, ri) = value;
+}
 return 0;
 }
 
-- 
1.7.9.5

[Qemu-devel] [PULL 11/21] hw/arm/strongarm: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-11-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/strongarm.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
index 82a9492..7b8ef8c 100644
--- a/hw/arm/strongarm.c
+++ b/hw/arm/strongarm.c
@@ -1588,7 +1588,6 @@ StrongARMState *sa1110_init(MemoryRegion *sysmem,
 unsigned int sdram_size, const char *rev)
 {
 StrongARMState *s;
-qemu_irq *pic;
 int i;
 
 s = g_malloc0(sizeof(StrongARMState));
@@ -1613,9 +1612,10 @@ StrongARMState *sa1110_init(MemoryRegion *sysmem,
 vmstate_register_ram_global(s-sdram);
 memory_region_add_subregion(sysmem, SA_SDCS0, s-sdram);
 
-pic = arm_pic_init_cpu(s-cpu);
 s-pic = sysbus_create_varargs(strongarm_pic, 0x9005,
-pic[ARM_PIC_CPU_IRQ], pic[ARM_PIC_CPU_FIQ], NULL);
+qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_IRQ),
+qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_FIQ),
+NULL);
 
 sysbus_create_varargs(pxa25x-timer, 0x9000,
 qdev_get_gpio_in(s-pic, SA_PIC_OSTC0),
-- 
1.7.9.5

[Qemu-devel] [PULL 17/21] target-arm: Support coprocessor registers which do I/O

2013-08-20 Thread Peter Maydell

Add an ARM_CP_IO flag which an ARMCPRegInfo definition can use to
indicate that the register's implementation does I/O and thus
its accesses need to be surrounded by gen_io_start()/gen_io_end()
in order for icount to work. Most notably, cp registers which
implement clocks or timers need this.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Tested-by: Laurent Desnogues laurent.desnog...@gmail.com
Reviewed-by: Edgar E. Iglesias edgar.igles...@gmail.com
Message-id: 1376065080-26661-3-git-send-email-peter.mayd...@linaro.org
---
 target-arm/cpu.h   |6 +-
 target-arm/translate.c |   16 +---
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index dffeec7..c2cb534 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -472,6 +472,9 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
  * old must have the OVERRIDE bit set.
  * NO_MIGRATE indicates that this register should be ignored for migration;
  * (eg because any state is accessed via some other coprocessor register).
+ * IO indicates that this register does I/O and therefore its accesses
+ * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
+ * registers which implement clocks or timers require this.
  */
 #define ARM_CP_SPECIAL 1
 #define ARM_CP_CONST 2
@@ -479,13 +482,14 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
 #define ARM_CP_SUPPRESS_TB_END 8
 #define ARM_CP_OVERRIDE 16
 #define ARM_CP_NO_MIGRATE 32
+#define ARM_CP_IO 64
 #define ARM_CP_NOP (ARM_CP_SPECIAL | (1  8))
 #define ARM_CP_WFI (ARM_CP_SPECIAL | (2  8))
 #define ARM_LAST_SPECIAL ARM_CP_WFI
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0x
 /* Mask of only the flag bits in a type field */
-#define ARM_CP_FLAG_MASK 0x3f
+#define ARM_CP_FLAG_MASK 0x7f
 
 /* Return true if cptype is a valid type field. This is used to try to
  * catch errors where the sentinel has been accidentally left off the end
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 6db4c50..d1e8538 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -6280,6 +6280,10 @@ static int disas_coproc_insn(CPUARMState * env, 
DisasContext *s, uint32_t insn)
 break;
 }
 
+if (use_icount  (ri-type  ARM_CP_IO)) {
+gen_io_start();
+}
+
 if (isread) {
 /* Read */
 if (is64) {
@@ -6369,14 +6373,20 @@ static int disas_coproc_insn(CPUARMState * env, 
DisasContext *s, uint32_t insn)
 store_cpu_offset(tmp, ri-fieldoffset);
 }
 }
+}
+
+if (use_icount  (ri-type  ARM_CP_IO)) {
+/* I/O operations must end the TB here (whether read or write) */
+gen_io_end();
+gen_lookup_tb(s);
+} else if (!isread  !(ri-type  ARM_CP_SUPPRESS_TB_END)) {
 /* We default to ending the TB on a coprocessor register write,
  * but allow this to be suppressed by the register definition
  * (usually only necessary to work around guest bugs).
  */
-if (!(ri-type  ARM_CP_SUPPRESS_TB_END)) {
-gen_lookup_tb(s);
-}
+gen_lookup_tb(s);
 }
+
 return 0;
 }
 
-- 
1.7.9.5

[Qemu-devel] [PULL 13/21] hw/arm/vexpress: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-13-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/vexpress.c |8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index 9586e38..fbd71a7 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -183,7 +183,6 @@ static void a9_daughterboard_init(const VEDBoardInfo 
*daughterboard,
 MemoryRegion *lowram = g_new(MemoryRegion, 1);
 DeviceState *dev;
 SysBusDevice *busdev;
-qemu_irq *irqp;
 int n;
 qemu_irq cpu_irq[4];
 ram_addr_t low_ram_size;
@@ -198,8 +197,7 @@ static void a9_daughterboard_init(const VEDBoardInfo 
*daughterboard,
 fprintf(stderr, Unable to find CPU definition\n);
 exit(1);
 }
-irqp = arm_pic_init_cpu(cpu);
-cpu_irq[n] = irqp[ARM_PIC_CPU_IRQ];
+cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ);
 }
 
 if (ram_size  0x4000) {
@@ -312,15 +310,13 @@ static void a15_daughterboard_init(const VEDBoardInfo 
*daughterboard,
 
 for (n = 0; n  smp_cpus; n++) {
 ARMCPU *cpu;
-qemu_irq *irqp;
 
 cpu = cpu_arm_init(cpu_model);
 if (!cpu) {
 fprintf(stderr, Unable to find CPU definition\n);
 exit(1);
 }
-irqp = arm_pic_init_cpu(cpu);
-cpu_irq[n] = irqp[ARM_PIC_CPU_IRQ];
+cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ);
 }
 
 {
-- 
1.7.9.5

[Qemu-devel] [PULL 20/21] default-configs: Fix A9MP and A15MP config names

2013-08-20 Thread Peter Maydell

When individual CONFIG_ switches for the A9MPcore and A15MPcore
devices were created, they were inadvertently given incorrect names
(CONFIG_ARM9MPCORE and CONFIG_ARM15MPCORE). These CPUs are
Cortex-A9MP and Cortex-A15MP, and in particular the ARM9 is
a different (rather older) CPU than the Cortex-A9. Rename the
CONFIG_ switches to bring them into line with the source file
names and CPU names.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1376056215-26391-1-git-send-email-peter.mayd...@linaro.org
---
 default-configs/arm-softmmu.mak |4 ++--
 hw/cpu/Makefile.objs|4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 27cbe3d..ac0815d 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -34,9 +34,9 @@ CONFIG_PFLASH_CFI02=y
 CONFIG_MICRODRIVE=y
 CONFIG_USB_MUSB=y
 
-CONFIG_ARM9MPCORE=y
 CONFIG_ARM11MPCORE=y
-CONFIG_ARM15MPCORE=y
+CONFIG_A9MPCORE=y
+CONFIG_A15MPCORE=y
 
 CONFIG_ARM_GIC=y
 CONFIG_ARM_GIC_KVM=$(CONFIG_KVM)
diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
index 4461ece..df287c1 100644
--- a/hw/cpu/Makefile.objs
+++ b/hw/cpu/Makefile.objs
@@ -1,5 +1,5 @@
 obj-$(CONFIG_ARM11MPCORE) += arm11mpcore.o
-obj-$(CONFIG_ARM9MPCORE) += a9mpcore.o
-obj-$(CONFIG_ARM15MPCORE) += a15mpcore.o
+obj-$(CONFIG_A9MPCORE) += a9mpcore.o
+obj-$(CONFIG_A15MPCORE) += a15mpcore.o
 obj-$(CONFIG_ICC_BUS) += icc_bus.o
 
-- 
1.7.9.5

[Qemu-devel] [PULL 05/21] hw/arm/highbank: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-5-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/highbank.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index 35d5511..f733a6c 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -209,7 +209,6 @@ static void calxeda_init(QEMUMachineInitArgs *args, enum 
cxmachines machine)
 const char *initrd_filename = args-initrd_filename;
 DeviceState *dev = NULL;
 SysBusDevice *busdev;
-qemu_irq *irqp;
 qemu_irq pic[128];
 int n;
 qemu_irq cpu_irq[4];
@@ -239,8 +238,7 @@ static void calxeda_init(QEMUMachineInitArgs *args, enum 
cxmachines machine)
 
 /* This will become a QOM property eventually */
 cpu-reset_cbar = GIC_BASE_ADDR;
-irqp = arm_pic_init_cpu(cpu);
-cpu_irq[n] = irqp[ARM_PIC_CPU_IRQ];
+cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ);
 }
 
 sysmem = get_system_memory();
-- 
1.7.9.5

[Qemu-devel] [PULL 21/21] hw/timer/imx_epit: Simplify and fix imx_epit implementation

2013-08-20 Thread Peter Maydell

From: Peter Chubb peter.ch...@nicta.com.au

When imx_epit.c was last refactored, a common usecase (comparison
register zero) broke.  This patch fixes that, and simplifies the code
yet more.  It also fixes a major thinko in the reset path --- the
wrong bits in the control register were being cleared.

Signed-off-by: Peter Chubb peter.ch...@nicta.com.au
Reviewed-by: Jean-Christophe DUBOIS j...@tribudubois.net
Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/timer/imx_epit.c |   94 ---
 1 file changed, 36 insertions(+), 58 deletions(-)

diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
index 117dc7b..dc73d65 100644
--- a/hw/timer/imx_epit.c
+++ b/hw/timer/imx_epit.c
@@ -43,7 +43,7 @@ static char const *imx_epit_reg_name(uint32_t reg)
 }
 
 #  define DPRINTF(fmt, args...) \
-  do { printf(%s:  fmt , __func__, ##args); } while (0)
+do { fprintf(stderr, %s:  fmt , __func__, ##args); } while (0)
 #else
 #  define DPRINTF(fmt, args...) do {} while (0)
 #endif
@@ -152,7 +152,7 @@ static void imx_epit_reset(DeviceState *dev)
 /*
  * Soft reset doesn't touch some bits; hard reset clears them
  */
-s-cr = ~(CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN);
+s-cr = (CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN);
 s-sr = 0;
 s-lr = TIMER_MAX;
 s-cmp = 0;
@@ -167,7 +167,7 @@ static void imx_epit_reset(DeviceState *dev)
 ptimer_set_limit(s-timer_reload, TIMER_MAX, 1);
 if (s-freq  (s-cr  CR_EN)) {
 /* if the timer is still enabled, restart it */
-ptimer_run(s-timer_reload, 1);
+ptimer_run(s-timer_reload, 0);
 }
 }
 
@@ -218,17 +218,17 @@ static uint64_t imx_epit_read(void *opaque, hwaddr 
offset, unsigned size)
 
 static void imx_epit_reload_compare_timer(IMXEPITState *s)
 {
-if ((s-cr  CR_OCIEN)  s-cmp) {
-/* if the compare feature is on */
+if ((s-cr  (CR_EN | CR_OCIEN)) == (CR_EN | CR_OCIEN))  {
+/* if the compare feature is on and timers are running */
 uint32_t tmp = imx_epit_update_count(s);
+uint64_t next;
 if (tmp  s-cmp) {
-/* reinit the cmp timer if required */
-ptimer_set_count(s-timer_cmp, tmp - s-cmp);
-if ((s-cr  CR_EN)) {
-/* Restart the cmp timer if required */
-ptimer_run(s-timer_cmp, 0);
-}
+/* It'll fire in this round of the timer */
+next = tmp - s-cmp;
+} else { /* catch it next time around */
+next = tmp - s-cmp + ((s-cr  CR_RLD) ? TIMER_MAX : s-lr);
 }
+ptimer_set_count(s-timer_cmp, next);
 }
 }
 
@@ -237,11 +237,14 @@ static void imx_epit_write(void *opaque, hwaddr offset, 
uint64_t value,
 {
 IMXEPITState *s = IMX_EPIT(opaque);
 uint32_t reg = offset  2;
+uint64_t oldcr;
 
 DPRINTF((%s, value = 0x%08x)\n, imx_epit_reg_name(reg), (uint32_t)value);
 
 switch (reg) {
 case 0: /* CR */
+
+oldcr = s-cr;
 s-cr = value  0x03ff;
 if (s-cr  CR_SWR) {
 /* handle the reset */
@@ -250,22 +253,35 @@ static void imx_epit_write(void *opaque, hwaddr offset, 
uint64_t value,
 imx_epit_set_freq(s);
 }
 
-if (s-freq  (s-cr  CR_EN)) {
+if (s-freq  (s-cr  CR_EN)  !(oldcr  CR_EN)) {
 if (s-cr  CR_ENMOD) {
 if (s-cr  CR_RLD) {
 ptimer_set_limit(s-timer_reload, s-lr, 1);
+ptimer_set_limit(s-timer_cmp, s-lr, 1);
 } else {
 ptimer_set_limit(s-timer_reload, TIMER_MAX, 1);
+ptimer_set_limit(s-timer_cmp, TIMER_MAX, 1);
 }
 }
 
 imx_epit_reload_compare_timer(s);
-
-ptimer_run(s-timer_reload, 1);
-} else {
+ptimer_run(s-timer_reload, 0);
+if (s-cr  CR_OCIEN) {
+ptimer_run(s-timer_cmp, 0);
+} else {
+ptimer_stop(s-timer_cmp);
+}
+} else if (!(s-cr  CR_EN)) {
 /* stop both timers */
 ptimer_stop(s-timer_reload);
 ptimer_stop(s-timer_cmp);
+} else  if (s-cr  CR_OCIEN) {
+if (!(oldcr  CR_OCIEN)) {
+imx_epit_reload_compare_timer(s);
+ptimer_run(s-timer_cmp, 0);
+}
+} else {
+ptimer_stop(s-timer_cmp);
 }
 break;
 
@@ -284,13 +300,13 @@ static void imx_epit_write(void *opaque, hwaddr offset, 
uint64_t value,
 /* Also set the limit if the LRD bit is set */
 /* If IOVW bit is set then set the timer value */
 ptimer_set_limit(s-timer_reload, s-lr, s-cr  CR_IOVW);
+ptimer_set_limit(s-timer_cmp, s-lr, 0);
 } else if (s-cr  CR_IOVW) {
 /* If IOVW bit is set then set the timer value */
 ptimer_set_count(s-timer_reload,

[Qemu-devel] [PULL 01/21] target-arm: Implement 'int' loglevel

2013-08-20 Thread Peter Maydell

The 'int' loglevel for recording interrupts and exceptions
requires support in the target-specific code. Implement
it for ARM. This improves debug logging in some situations
that were otherwise pretty opaque, such as when we fault
trying to execute at an exception vector address, which
would otherwise cause an infinite loop of taking exceptions
without any indication in the debug log of what was going on.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Reviewed-by: Richard Henderson r...@twiddle.net
Message-id: 1375700771-21665-1-git-send-email-peter.mayd...@linaro.org
---
 target-arm/helper.c |   42 ++
 1 file changed, 42 insertions(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 4968391..6d9026d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1974,6 +1974,37 @@ static void do_v7m_exception_exit(CPUARMState *env)
pointer.  */
 }
 
+/* Exception names for debug logging; note that not all of these
+ * precisely correspond to architectural exceptions.
+ */
+static const char * const excnames[] = {
+[EXCP_UDEF] = Undefined Instruction,
+[EXCP_SWI] = SVC,
+[EXCP_PREFETCH_ABORT] = Prefetch Abort,
+[EXCP_DATA_ABORT] = Data Abort,
+[EXCP_IRQ] = IRQ,
+[EXCP_FIQ] = FIQ,
+[EXCP_BKPT] = Breakpoint,
+[EXCP_EXCEPTION_EXIT] = QEMU v7M exception exit,
+[EXCP_KERNEL_TRAP] = QEMU intercept of kernel commpage,
+[EXCP_STREX] = QEMU intercept of STREX,
+};
+
+static inline void arm_log_exception(int idx)
+{
+if (qemu_loglevel_mask(CPU_LOG_INT)) {
+const char *exc = NULL;
+
+if (idx = 0  idx  ARRAY_SIZE(excnames)) {
+exc = excnames[idx];
+}
+if (!exc) {
+exc = unknown;
+}
+qemu_log_mask(CPU_LOG_INT, Taking exception %d [%s]\n, idx, exc);
+}
+}
+
 void arm_v7m_cpu_do_interrupt(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
@@ -1982,6 +2013,8 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 uint32_t lr;
 uint32_t addr;
 
+arm_log_exception(env-exception_index);
+
 lr = 0xfff1;
 if (env-v7m.current_sp)
 lr |= 4;
@@ -2011,6 +2044,7 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 if (nr == 0xab) {
 env-regs[15] += 2;
 env-regs[0] = do_arm_semihosting(env);
+qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n);
 return;
 }
 }
@@ -2064,6 +2098,8 @@ void arm_cpu_do_interrupt(CPUState *cs)
 
 assert(!IS_M(env));
 
+arm_log_exception(env-exception_index);
+
 /* TODO: Vectored interrupt controller.  */
 switch (env-exception_index) {
 case EXCP_UDEF:
@@ -2091,6 +2127,7 @@ void arm_cpu_do_interrupt(CPUState *cs)
 || (mask == 0xab  env-thumb))
(env-uncached_cpsr  CPSR_M) != ARM_CPU_MODE_USR) {
 env-regs[0] = do_arm_semihosting(env);
+qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n);
 return;
 }
 }
@@ -2108,18 +2145,23 @@ void arm_cpu_do_interrupt(CPUState *cs)
(env-uncached_cpsr  CPSR_M) != ARM_CPU_MODE_USR) {
 env-regs[15] += 2;
 env-regs[0] = do_arm_semihosting(env);
+qemu_log_mask(CPU_LOG_INT, ...handled as semihosting call\n);
 return;
 }
 }
 env-cp15.c5_insn = 2;
 /* Fall through to prefetch abort.  */
 case EXCP_PREFETCH_ABORT:
+qemu_log_mask(CPU_LOG_INT, ...with IFSR 0x%x IFAR 0x%x\n,
+  env-cp15.c5_insn, env-cp15.c6_insn);
 new_mode = ARM_CPU_MODE_ABT;
 addr = 0x0c;
 mask = CPSR_A | CPSR_I;
 offset = 4;
 break;
 case EXCP_DATA_ABORT:
+qemu_log_mask(CPU_LOG_INT, ...with DFSR 0x%x DFAR 0x%x\n,
+  env-cp15.c5_data, env-cp15.c6_data);
 new_mode = ARM_CPU_MODE_ABT;
 addr = 0x10;
 mask = CPSR_A | CPSR_I;
-- 
1.7.9.5

[Qemu-devel] [PULL 14/21] hw/arm/xilinx_zynq: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-14-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/xilinx_zynq.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 3444823..0f18c85 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -108,11 +108,9 @@ static void zynq_init(QEMUMachineInitArgs *args)
 MemoryRegion *ocm_ram = g_new(MemoryRegion, 1);
 DeviceState *dev;
 SysBusDevice *busdev;
-qemu_irq *irqp;
 qemu_irq pic[64];
 NICInfo *nd;
 int n;
-qemu_irq cpu_irq;
 
 if (!cpu_model) {
 cpu_model = cortex-a9;
@@ -123,8 +121,6 @@ static void zynq_init(QEMUMachineInitArgs *args)
 fprintf(stderr, Unable to find CPU definition\n);
 exit(1);
 }
-irqp = arm_pic_init_cpu(cpu);
-cpu_irq = irqp[ARM_PIC_CPU_IRQ];
 
 /* max 2GB ram */
 if (ram_size  0x8000) {
@@ -159,7 +155,8 @@ static void zynq_init(QEMUMachineInitArgs *args)
 qdev_init_nofail(dev);
 busdev = SYS_BUS_DEVICE(dev);
 sysbus_mmio_map(busdev, 0, 0xF8F0);
-sysbus_connect_irq(busdev, 0, cpu_irq);
+sysbus_connect_irq(busdev, 0,
+   qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
 
 for (n = 0; n  64; n++) {
 pic[n] = qdev_get_gpio_in(dev, n);
-- 
1.7.9.5

[Qemu-devel] [PULL 15/21] hw/arm/pic_cpu: Remove the now-unneeded arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Now all the boards have been converted arm_pic_init_cpu()
is unused and can just be deleted.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-15-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/Makefile.objs |2 +-
 hw/arm/pic_cpu.c |   25 -
 include/hw/arm/arm.h |5 -
 3 files changed, 1 insertion(+), 31 deletions(-)
 delete mode 100644 hw/arm/pic_cpu.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 9e3a06f..3671b42 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -1,6 +1,6 @@
 obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
 obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
-obj-y += omap_sx1.o palm.o pic_cpu.o realview.o spitz.o stellaris.o
+obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
 obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
diff --git a/hw/arm/pic_cpu.c b/hw/arm/pic_cpu.c
deleted file mode 100644
index 9c36273..000
--- a/hw/arm/pic_cpu.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- * Generic ARM Programmable Interrupt Controller support.
- *
- * Copyright (c) 2006 CodeSourcery.
- * Written by Paul Brook
- *
- * This code is licensed under the LGPL
- */
-
-#include hw/hw.h
-#include hw/arm/arm.h
-
-/* Backwards compatibility shim; this can disappear when all
- * board models have been updated to get IRQ and FIQ lines directly
- * from the ARMCPU object rather than by calling this function.
- */
-qemu_irq *arm_pic_init_cpu(ARMCPU *cpu)
-{
-DeviceState *dev = DEVICE(cpu);
-qemu_irq *irqs = g_new(qemu_irq, 2);
-
-irqs[0] = qdev_get_gpio_in(dev, ARM_CPU_IRQ);
-irqs[1] = qdev_get_gpio_in(dev, ARM_CPU_FIQ);
-return irqs;
-}
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index bae87c6..ecbbba8 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -14,11 +14,6 @@
 #include exec/memory.h
 #include hw/irq.h
 
-/* The CPU is also modelled as an interrupt controller.  */
-#define ARM_PIC_CPU_IRQ 0
-#define ARM_PIC_CPU_FIQ 1
-qemu_irq *arm_pic_init_cpu(ARMCPU *cpu);
-
 /* armv7m.c */
 qemu_irq *armv7m_init(MemoryRegion *address_space_mem,
   int flash_size, int sram_size,
-- 
1.7.9.5

[Qemu-devel] [PULL 19/21] hw/cpu/a15mpcore: Wire generic timer outputs to GIC inputs

2013-08-20 Thread Peter Maydell

Now our A15 CPU implements the generic timers, we can wire them
up to the appropriate inputs on the GIC.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Tested-by: Laurent Desnogues laurent.desnog...@gmail.com
Message-id: 1376065080-26661-5-git-send-email-peter.mayd...@linaro.org
---
 hw/cpu/a15mpcore.c |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/cpu/a15mpcore.c b/hw/cpu/a15mpcore.c
index 4f37964..af182da 100644
--- a/hw/cpu/a15mpcore.c
+++ b/hw/cpu/a15mpcore.c
@@ -49,6 +49,8 @@ static int a15mp_priv_init(SysBusDevice *dev)
 A15MPPrivState *s = A15MPCORE_PRIV(dev);
 SysBusDevice *busdev;
 const char *gictype = arm_gic;
+int i;
+CPUState *cpu;
 
 if (kvm_irqchip_in_kernel()) {
 gictype = kvm-arm-gic;
@@ -67,6 +69,22 @@ static int a15mp_priv_init(SysBusDevice *dev)
 /* Pass through inbound GPIO lines to the GIC */
 qdev_init_gpio_in(DEVICE(dev), a15mp_priv_set_irq, s-num_irq - 32);
 
+/* Wire the outputs from each CPU's generic timer to the
+ * appropriate GIC PPI inputs
+ */
+for (i = 0, cpu = first_cpu; i  s-num_cpu; i++, cpu = cpu-next_cpu) {
+DeviceState *cpudev = DEVICE(cpu);
+int ppibase = s-num_irq - 32 + i * 32;
+/* physical timer; we wire it up to the non-secure timer's ID,
+ * since a real A15 always has TrustZone but QEMU doesn't.
+ */
+qdev_connect_gpio_out(cpudev, 0,
+  qdev_get_gpio_in(s-gic, ppibase + 30));
+/* virtual timer */
+qdev_connect_gpio_out(cpudev, 1,
+  qdev_get_gpio_in(s-gic, ppibase + 27));
+}
+
 /* Memory map (addresses are offsets from PERIPHBASE):
  *  0x-0x0fff -- reserved
  *  0x1000-0x1fff -- GIC Distributor
-- 
1.7.9.5

[Qemu-devel] [PULL 09/21] hw/arm/omap*: Don't use arm_pic_init_cpu()

2013-08-20 Thread Peter Maydell

Drop the now-deprecated arm_pic_init_cpu() in favour of directly
getting the IRQ line from the ARMCPU object.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1375977856-25046-9-git-send-email-peter.mayd...@linaro.org
---
 hw/arm/omap1.c |8 
 hw/arm/omap2.c |8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index 19be5fc..b6a0b27 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -3827,7 +3827,6 @@ struct omap_mpu_state_s *omap310_mpu_init(MemoryRegion 
*system_memory,
 int i;
 struct omap_mpu_state_s *s = (struct omap_mpu_state_s *)
 g_malloc0(sizeof(struct omap_mpu_state_s));
-qemu_irq *cpu_irq;
 qemu_irq dma_irqs[6];
 DriveInfo *dinfo;
 SysBusDevice *busdev;
@@ -3860,14 +3859,15 @@ struct omap_mpu_state_s *omap310_mpu_init(MemoryRegion 
*system_memory,
 
 omap_clkm_init(system_memory, 0xfffece00, 0xe1008000, s);
 
-cpu_irq = arm_pic_init_cpu(s-cpu);
 s-ih[0] = qdev_create(NULL, omap-intc);
 qdev_prop_set_uint32(s-ih[0], size, 0x100);
 qdev_prop_set_ptr(s-ih[0], clk, omap_findclk(s, arminth_ck));
 qdev_init_nofail(s-ih[0]);
 busdev = SYS_BUS_DEVICE(s-ih[0]);
-sysbus_connect_irq(busdev, 0, cpu_irq[ARM_PIC_CPU_IRQ]);
-sysbus_connect_irq(busdev, 1, cpu_irq[ARM_PIC_CPU_FIQ]);
+sysbus_connect_irq(busdev, 0,
+   qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_IRQ));
+sysbus_connect_irq(busdev, 1,
+   qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_FIQ));
 sysbus_mmio_map(busdev, 0, 0xfffecb00);
 s-ih[1] = qdev_create(NULL, omap-intc);
 qdev_prop_set_uint32(s-ih[1], size, 0x800);
diff --git a/hw/arm/omap2.c b/hw/arm/omap2.c
index ec9610b..36efde0 100644
--- a/hw/arm/omap2.c
+++ b/hw/arm/omap2.c
@@ -2244,7 +2244,6 @@ struct omap_mpu_state_s *omap2420_mpu_init(MemoryRegion 
*sysmem,
 {
 struct omap_mpu_state_s *s = (struct omap_mpu_state_s *)
 g_malloc0(sizeof(struct omap_mpu_state_s));
-qemu_irq *cpu_irq;
 qemu_irq dma_irqs[4];
 DriveInfo *dinfo;
 int i;
@@ -2277,15 +2276,16 @@ struct omap_mpu_state_s *omap2420_mpu_init(MemoryRegion 
*sysmem,
 s-l4 = omap_l4_init(sysmem, OMAP2_L4_BASE, 54);
 
 /* Actually mapped at any 2K boundary in the ARM11 private-peripheral if */
-cpu_irq = arm_pic_init_cpu(s-cpu);
 s-ih[0] = qdev_create(NULL, omap2-intc);
 qdev_prop_set_uint8(s-ih[0], revision, 0x21);
 qdev_prop_set_ptr(s-ih[0], fclk, omap_findclk(s, mpu_intc_fclk));
 qdev_prop_set_ptr(s-ih[0], iclk, omap_findclk(s, mpu_intc_iclk));
 qdev_init_nofail(s-ih[0]);
 busdev = SYS_BUS_DEVICE(s-ih[0]);
-sysbus_connect_irq(busdev, 0, cpu_irq[ARM_PIC_CPU_IRQ]);
-sysbus_connect_irq(busdev, 1, cpu_irq[ARM_PIC_CPU_FIQ]);
+sysbus_connect_irq(busdev, 0,
+   qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_IRQ));
+sysbus_connect_irq(busdev, 1,
+   qdev_get_gpio_in(DEVICE(s-cpu), ARM_CPU_FIQ));
 sysbus_mmio_map(busdev, 0, 0x480fe000);
 s-prcm = omap_prcm_init(omap_l4tao(s-l4, 3),
  qdev_get_gpio_in(s-ih[0],
-- 
1.7.9.5

1 2 >

1 - 100 of 198 matches

Mail list logo