Re: [Qemu-devel] [PATCH] spapr: fix LSI interrupt specifiers in the device tree

2017-12-03 Thread David Gibson
On Sat, Dec 02, 2017 at 08:30:11PM +0100, Greg Kurz wrote:
> PAPR 2.7 C.6.9.1.2 describes the "#interrupt-cells" property of the
> PowerPC External Interrupt Source Controller node as follows:
> 
> “#interrupt-cells”
> 
>   Standard property name to define the number of cells in an interrupt-
>   specifier within an interrupt domain.
> 
>   prop-encoded-array: An integer, encoded as with encode-int, that denotes
>   the number of cells required to represent an interrupt specifier in its
>   child nodes.
> 
>   The value of this property for the PowerPC External Interrupt option shall
>   be 2. Thus all interrupt specifiers (as used in the standard “interrupts”
>   property) shall consist of two cells, each containing an integer encoded
>   as with encode-int. The first integer represents the interrupt number the
>   second integer is the trigger code: 0 for edge triggered, 1 for level
>   triggered.
> 
> This patch adds a second cell to the interrupt specifier stored in the
> "interrupts" property of PCI device nodes. This property only exists if
> the Interrupt Pin register is set, ie, the interrupt is level, the extra
> cell is hence set to 1.

Nack.  This format of interrupt specifier is only needed for things
wired directly to the xics.  The PCI INTx interrupts aren't - they go
through the interrupt nexus in the PHB.  The interrupt-map is intended
to remap the simple 1,2,3,4 for INTA..INTD to xics interrupt specifiers.

> This also fixes the interrupt specifiers in the "interrupt-map" property
> of the PHB node, that were setting the second cell to 8 (confusion with
> IRQ_TYPE_LEVEL_LOW ?) instead of 1.

Fixing that is correct, though I think.  As might be the changes in
other places I'll have to check.

> While here, let's introduce defines for the interrupt specifier trigger
> code, and patch other users in spapr.
> 
> Signed-off-by: Greg Kurz 


> ---
> 
> This fixes /proc/interrupts in linux guests where LSIs appear as
> Edge instead of Level.
> ---
>  hw/ppc/spapr_events.c  |2 +-
>  hw/ppc/spapr_pci.c |4 +++-
>  hw/ppc/spapr_vio.c |3 ++-
>  include/hw/ppc/spapr.h |3 +++
>  4 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index e377fc7ddea2..4bcb98f948ea 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -283,7 +283,7 @@ void spapr_dt_events(sPAPRMachineState *spapr, void *fdt)
>  }
>  
>  interrupts[0] = cpu_to_be32(source->irq);
> -interrupts[1] = 0;
> +interrupts[1] = SPAPR_DT_INTERRUPT_IDENTIFIER_EDGE;
>  
>  _FDT(node_offset = fdt_add_subnode(fdt, event_sources, source_name));
>  _FDT(fdt_setprop(fdt, node_offset, "interrupts", interrupts,
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 5a3122a9f9f9..91fedbf0929c 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1231,6 +1231,8 @@ static void spapr_populate_pci_child_dt(PCIDevice *dev, 
> void *fdt, int offset,
>  if (pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)) {
>  _FDT(fdt_setprop_cell(fdt, offset, "interrupts",
>   pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
> +_FDT(fdt_appendprop_cell(fdt, offset, "interrupts",
> + SPAPR_DT_INTERRUPT_IDENTIFIER_LEVEL));
>  }
>  
>  if (!is_bridge) {
> @@ -2122,7 +2124,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>  irqmap[3] = cpu_to_be32(j+1);
>  irqmap[4] = cpu_to_be32(xics_phandle);
>  irqmap[5] = cpu_to_be32(phb->lsi_table[lsi_num].irq);
> -irqmap[6] = cpu_to_be32(0x8);
> +irqmap[6] = cpu_to_be32(SPAPR_DT_INTERRUPT_IDENTIFIER_LEVEL);
>  }
>  }
>  /* Write interrupt map */
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index ea3bc8bd9e21..29a17651a17c 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -126,7 +126,8 @@ static int vio_make_devnode(VIOsPAPRDevice *dev,
>  }
>  
>  if (dev->irq) {
> -uint32_t ints_prop[] = {cpu_to_be32(dev->irq), 0};
> +uint32_t ints_prop[] = { cpu_to_be32(dev->irq),
> + SPAPR_DT_INTERRUPT_IDENTIFIER_EDGE };
>  
>  ret = fdt_setprop(fdt, node_off, "interrupts", ints_prop,
>sizeof(ints_prop));
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 9d21ca9bde3a..8f6298bde59b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -590,6 +590,9 @@ void spapr_load_rtas(sPAPRMachineState *spapr, void *fdt, 
> hwaddr addr);
>  
>  #define RTAS_EVENT_SCAN_RATE1
>  
> +#define SPAPR_DT_INTERRUPT_IDENTIFIER_EDGE  0
> +#define SPAPR_DT_INTERRUPT_IDENTIFIER_LEVEL 1
> +
>  typedef struct sPAPRTCETable sPAPRTCETable;
>  
>  #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.

Re: [Qemu-devel] [PATCH v2 3/7] chardev: Use goto/label instead of do/break/while(0)

2017-12-03 Thread Marc-André Lureau
On Sat, Dec 2, 2017 at 12:24 AM, Eric Blake  wrote:
> Use of a do/while(0) control flow in order to permit an early break
> is an unusual paradigm, and triggers a false positive with a planned
> future syntax check against 'while (0);'.  Rewrite the code to use a
> goto instead.  This patch temporarily keeps an extra level of
> indentation to highlight the change; the next patch cleans it up.
>
> Signed-off-by: Eric Blake 
> ---

Reviewed-by: Marc-André Lureau 


>  chardev/char-serial.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/chardev/char-serial.c b/chardev/char-serial.c
> index 2f8f83821d..10162f9fa3 100644
> --- a/chardev/char-serial.c
> +++ b/chardev/char-serial.c
> @@ -64,9 +64,14 @@ static void tty_serial_init(int fd, int speed,
>  #endif
>  tcgetattr(fd, &tty);
>
> -#define check_speed(val) if (speed <= val) { spd = B##val; break; }
> +#define check_speed(val) \
> +if (speed <= val) {  \
> +spd = B##val;\
> +goto done;   \
> +}
> +
>  speed = speed * 10 / 11;
> -do {
> +{
>  check_speed(50);
>  check_speed(75);
>  check_speed(110);
> @@ -125,8 +130,10 @@ static void tty_serial_init(int fd, int speed,
>  check_speed(400);
>  #endif
>  spd = B115200;
> -} while (0);
> +}
>
> +#undef check_speed
> + done:
>  cfsetispeed(&tty, spd);
>  cfsetospeed(&tty, spd);
>
> --
> 2.14.3
>
>



-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH v2 4/7] chardev: Clean up previous patch indentation

2017-12-03 Thread Marc-André Lureau
On Sat, Dec 2, 2017 at 12:24 AM, Eric Blake  wrote:
> The previous patch left in an extra scope layer for ease of
> review; time to remove it.  No semantic change.
>
> Signed-off-by: Eric Blake 
>

Reviewed-by: Marc-André Lureau 


 ---
>  chardev/char-serial.c | 66 
> +--
>  1 file changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/chardev/char-serial.c b/chardev/char-serial.c
> index 10162f9fa3..93392c528c 100644
> --- a/chardev/char-serial.c
> +++ b/chardev/char-serial.c
> @@ -71,66 +71,64 @@ static void tty_serial_init(int fd, int speed,
>  }
>
>  speed = speed * 10 / 11;
> -{
> -check_speed(50);
> -check_speed(75);
> -check_speed(110);
> -check_speed(134);
> -check_speed(150);
> -check_speed(200);
> -check_speed(300);
> -check_speed(600);
> -check_speed(1200);
> -check_speed(1800);
> -check_speed(2400);
> -check_speed(4800);
> -check_speed(9600);
> -check_speed(19200);
> -check_speed(38400);
> -/* Non-Posix values follow. They may be unsupported on some systems. 
> */
> -check_speed(57600);
> -check_speed(115200);
> +check_speed(50);
> +check_speed(75);
> +check_speed(110);
> +check_speed(134);
> +check_speed(150);
> +check_speed(200);
> +check_speed(300);
> +check_speed(600);
> +check_speed(1200);
> +check_speed(1800);
> +check_speed(2400);
> +check_speed(4800);
> +check_speed(9600);
> +check_speed(19200);
> +check_speed(38400);
> +/* Non-Posix values follow. They may be unsupported on some systems. */
> +check_speed(57600);
> +check_speed(115200);
>  #ifdef B230400
> -check_speed(230400);
> +check_speed(230400);
>  #endif
>  #ifdef B460800
> -check_speed(460800);
> +check_speed(460800);
>  #endif
>  #ifdef B50
> -check_speed(50);
> +check_speed(50);
>  #endif
>  #ifdef B576000
> -check_speed(576000);
> +check_speed(576000);
>  #endif
>  #ifdef B921600
> -check_speed(921600);
> +check_speed(921600);
>  #endif
>  #ifdef B100
> -check_speed(100);
> +check_speed(100);
>  #endif
>  #ifdef B1152000
> -check_speed(1152000);
> +check_speed(1152000);
>  #endif
>  #ifdef B150
> -check_speed(150);
> +check_speed(150);
>  #endif
>  #ifdef B200
> -check_speed(200);
> +check_speed(200);
>  #endif
>  #ifdef B250
> -check_speed(250);
> +check_speed(250);
>  #endif
>  #ifdef B300
> -check_speed(300);
> +check_speed(300);
>  #endif
>  #ifdef B350
> -check_speed(350);
> +check_speed(350);
>  #endif
>  #ifdef B400
> -check_speed(400);
> +check_speed(400);
>  #endif
> -spd = B115200;
> -}
> +spd = B115200;
>
>  #undef check_speed
>   done:
> --
> 2.14.3
>
>



-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH 1/7] target/arm: Handle SPSEL and current stack being out of sync in MSP/PSP reads

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> For v8M it is possible for the CONTROL.SPSEL bit value and the
> current stack to be out of sync. This means we need to update
> the checks used in reads and writes of the PSP and MSP special
> registers to use v7m_using_psp() rather than directly checking
> the SPSEL bit in the control register.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 2/7] target/arm: Allow explicit writes to CONTROL.SPSEL in Handler mode

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> In ARMv7M the CPU ignores explicit writes to CONTROL.SPSEL
> in Handler mode. In v8M the behaviour is slightly different:
> writes to the bit are permitted but will have no effect.
> 
> We've already done the hard work to handle the value in
> CONTROL.SPSEL being out of sync with what stack pointer is
> actually in use, so all we need to do to fix this last loose
> end is to update the condition we use to guard whether we
> call write_v7m_control_spsel() on the register write.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 3/7] target/arm: Add missing M profile case to regime_is_user()

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> When we added the ARMMMUIdx_MSUser MMU index we forgot to
> add it to the case statement in regime_is_user(), so we
> weren't treating it as unprivileged when doing MPU lookups.
> Correct the omission.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 4/7] target/arm: Split M profile MNegPri mmu index into user and priv

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> For M profile, we currently have an mmu index MNegPri for
> "requested execution priority negative". This fails to
> distinguish "requested execution priority negative, privileged"
> from "requested execution priority negative, usermode", but
> the two can return different results for MPU lookups. Fix this
> by splitting MNegPri into MNegPriPriv and MNegPriUser, and
> similarly for the Secure equivalent MSNegPri.
> 
> This takes us from 6 M profile MMU modes to 8, which means
> we need to bump NB_MMU_MODES; this is OK since the point
> where we are forced to reduce TLB sizes is 9 MMU modes.
> 
> (It would in theory be possible to stick with 6 MMU indexes:
> {mpu-disabled,user,privileged} x {secure,nonsecure} since
> in the MPU-disabled case the result of an MPU lookup is
> always the same for both user and privileged code. However
> we would then need to rework the TB flags handling to put
> user/priv into the TB flags separately from the mmuidx.
> Adding an extra couple of mmu indexes is simpler.)
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/cpu.h   | 49 -
>  target/arm/internals.h |  6 --
>  target/arm/helper.c| 14 ++
>  target/arm/translate.c |  8 ++--
>  4 files changed, 48 insertions(+), 29 deletions(-)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 5/7] target/arm: Create new arm_v7m_mmu_idx_for_secstate_and_priv()

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> The TT instruction is going to need to look up the MMU index
> for a specified security and privilege state. Refactor the
> existing arm_v7m_mmu_idx_for_secstate() into a version that
> lets you specify the privilege state and one that uses the
> current state of the CPU.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/cpu.h | 21 -
>  1 file changed, 16 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 6/7] target/arm: Factor MPU lookup code out of get_phys_addr_pmsav8()

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> For the TT instruction we're going to need to do an MPU lookup that
> also tells us which MPU region the access hit. This requires us
> to do the MPU lookup without first doing the SAU security access
> check, so pull the MPU lookup parts of get_phys_addr_pmsav8()
> out into their own function.
> 
> The TT instruction also needs to know the MPU region number which
> the lookup hit, so provide this information to the caller of the
> MPU lookup code, even though get_phys_addr_pmsav8() doesn't
> need to know it.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper.c | 130 
> +++-
>  1 file changed, 79 insertions(+), 51 deletions(-)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [Bug 1735384] Re: OpenJDK JVM segfaults on qemu-sh4 (regression)

2017-12-03 Thread Thomas Huth
On 01.12.2017 00:25, John Paul Adrian Glaubitz wrote:
> The offending commit is:
> 
> d25f2a72272b9ffe0d06710d6217d1169bc2cc7d is the first bad commit
> commit d25f2a72272b9ffe0d06710d6217d1169bc2cc7d
> Author: Alex Bennée 
> Date:   Mon Nov 13 13:55:27 2017 +
> 
> accel/tcg/translate-all: expand cpu_restore_state addr check
> 
> We are still seeing signals during translation time when we walk over
> a page protection boundary. This expands the check to ensure the host
> PC is inside the code generation buffer. The original suggestion was
> to check versus tcg_ctx.code_gen_ptr but as we now segment the
> translation buffer we have to settle for just a general check for
> being inside.
> 
> I've also fixed up the declaration to make it clear it can deal with
> invalid addresses. A later patch will fix up the call sites.
> 
> Signed-off-by: Alex Bennée 
> Reported-by: Peter Maydell 
> Reviewed-by: Laurent Vivier 
> Reviewed-by: Richard Henderson 
> Message-id: 20171108153245.20740-2-alex.ben...@linaro.org
> Suggested-by: Paolo Bonzini 
> Cc: Richard Henderson 
> Tested-by: Peter Maydell 
> Signed-off-by: Peter Maydell 
> 
> :04 04 da50c4c43089d3ee7d1e9ad50d3c9036114e5f11 
> cd6a0dcaa1d284fe5439f6f3b61547d4b0662768 M  accel
> :04 04 c294a7c102d27295f8d81cc06b5d4d17357440ad 
> 5a1268b7634f69f0806f22161ec7d6a1a26c8812 M  include
> 
> Reverting the commit resolves the issue.
> 

Alex, any ideas what might be wrong here?

 Thomas



Re: [Qemu-devel] [PATCH 7/7] target/arm: Implement TT instruction

2017-12-03 Thread Richard Henderson
On 12/01/2017 10:44 AM, Peter Maydell wrote:
> Implement the TT instruction which queries the security
> state and access permissions of a memory location.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper.h|   2 +
>  target/arm/helper.c| 108 
> +
>  target/arm/translate.c |  29 -
>  3 files changed, 138 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PULL 3/8] nbd-client: Refuse read-only client with BDRV_O_RDWR

2017-12-03 Thread Richard W.M. Jones
On Thu, Nov 09, 2017 at 10:59:34AM -0600, Eric Blake wrote:
> The NBD spec says that clients should not try to write/trim to
> an export advertised as read-only by the server.  But we failed
> to check that, and would allow the block layer to use NBD with
> BDRV_O_RDWR even when the server is read-only, which meant we
> were depending on the server sending a proper EPERM failure for
> various commands, and also exposes a leaky abstraction: using
> qemu-io in read-write mode would succeed on 'w -z 0 0' because
> of local short-circuiting logic, but 'w 0 0' would send a
> request over the wire (where it then depends on the server, and
> fails at least for qemu-nbd but might pass for other NBD
> implementations).
> 
> With this patch, a client MUST request read-only mode to access
> a server that is doing a read-only export, or else it will get
> a message like:
> 
> can't open device nbd://localhost:10809/foo: request for write access 
> conflicts with read-only export

Nice one!

This caught 3 bugs in the nbdkit test suite where we were opening the
connection for write to a read-only server instance, and it happened
to work because the test did not write anything.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



Re: [Qemu-devel] [PATCH v3 2/2] tests/qemu-iotests: adding savevm/loadvm with postcopy flag test

2017-12-03 Thread Daniel Henrique Barboza

Hi Max,

On 12/01/2017 06:13 PM, Max Reitz wrote:

On 2017-11-16 23:35, Daniel Henrique Barboza wrote:

This patch implements a test case for the scenario that was failing
prior to the patch "migration/ram.c: do not set 'postcopy_running' in
POSTCOPY_INCOMING_END".

This new test file 198 was derived from the test file 181 authored
by Kevin Wolf.

CC: Kevin Wolf 
CC: Max Reitz 
CC: Cleber Rosa 
Signed-off-by: Daniel Henrique Barboza 

---
I CCed Cleber Rosa because this patch was developed at the same
time his patch series "[PATCH 00/10] I/O tests cleanups" hit the
mailing list. Most of the cleanups/fixes he made in the series was
done in this new test as well.

  tests/qemu-iotests/198 | 110 +
  tests/qemu-iotests/198.out |  20 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 131 insertions(+)
  create mode 100755 tests/qemu-iotests/198
  create mode 100644 tests/qemu-iotests/198.out

Looks good overall, just two nitpicks below.


diff --git a/tests/qemu-iotests/198 b/tests/qemu-iotests/198
new file mode 100755
index 00..786e37fc95
--- /dev/null
+++ b/tests/qemu-iotests/198
@@ -0,0 +1,110 @@
+#!/bin/bash
+#
+# Test savevm and loadvm after live migration with postcopy flag
+#
+# Copyright (C) 2017, IBM Corporation.
+#
+# This file is derived from tests/qemu-iotests/181 by Kevin Wolf
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner : danie...@linux.vnet.ibm.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+MIG_SOCKET="${TEST_DIR}/migrate"
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_cleanup()
+{
+rm -f "${MIG_SOCKET}"
+_cleanup_test_img
+_cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_supported_fmt qcow2
+_supported_proto generic
+_supported_os Linux
+
+size=64M
+_make_test_img $size
+
+echo
+echo === Starting VMs ===
+echo
+
+qemu_comm_method="monitor"
+
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+_launch_qemu \
+-drive "${TEST_IMG}",cache=${CACHEMODE},id=disk
+else
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk
+fi
+src=$QEMU_HANDLE
+
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+_launch_qemu \
+-drive "${TEST_IMG}",cache=${CACHEMODE},id=disk \
+-incoming "unix:${MIG_SOCKET}"
+else
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=${CACHEMODE},driver=$IMGFMT,id=disk \
+-incoming "unix:${MIG_SOCKET}"
+fi
+dest=$QEMU_HANDLE
+
+echo
+echo === Set \'migrate_set_capability postcopy-ram on\' and migrate ===
+echo
+
+silent=yes
+_send_qemu_cmd $dest 'migrate_set_capability postcopy-ram on' "(qemu)"
+_send_qemu_cmd $src 'migrate_set_capability postcopy-ram on' "(qemu)"
+_send_qemu_cmd $src "migrate -d unix:${MIG_SOCKET}" "(qemu)"
+
+QEMU_COMM_TIMEOUT=1 qemu_cmd_repeat=10 silent=yes \
+_send_qemu_cmd $src "info migrate" "completed\|failed"
+silent=yes _send_qemu_cmd $src "" "(qemu)"

Would it make sense to query the migration status non-silently here,
too, so that the test output would verify that it has actually succeeded?

(I suppose the savevm/loadvm coming up next would fail if the migration
failed, but maybe if it's easy enough...)


Not exactly non-silently, but we can do something like this:


echo
echo === Check if migration was successful ===
echo

QEMU_COMM_TIMEOUT=1 silent=yes \
    _send_qemu_cmd $src "info migrate" "completed"
silent=yes _send_qemu_cmd $src "" "(qemu)"


If migration failed, the test will fail with timeout error waiting for 
"completed"

prior to savevm. What do you think?





+
+echo
+echo === On destination, execute savevm and loadvm ===
+echo
+
+silent=
+_send_qemu_cmd $dest 'savevm state1"' "(qemu)"
+_send_qemu_cmd $dest 'loadvm state1"' "(qemu)"

Any specific reason why the snapshot name has a trailing double quote?


No reason, just an ugly that slipped by. I'll remove it in the next version.



Thanks,


Daniel




Max


+
+echo
+echo === Shut down and check image ===
+echo
+
+_send_qemu_cmd $src 'quit' ""
+_send_qemu_cmd $dest 'quit' ""
+wait=1 _cleanup_qemu
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+status=0
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
new file mode 100644
index 00..6c27303857
--- /dev/null
++

[Qemu-devel] R: Re: [PATCH] qemu-pr-helper: miscellaneous fixes

2017-12-03 Thread Paolo Bonzini

- Dr. David Alan Gilbert  ha scritto:
> * Paolo Bonzini (pbonz...@redhat.com) wrote:
> > 1) Return a generic sense if TEST UNIT READY does not provide one;
> > 
> > 2) Fix two mistakes in copying from the spec.
> > 
> > Reported-by: Dr. David Alan Gilbert 
> > Signed-off-by: Paolo Bonzini 
> 
> Reviewed-by: Dr. David Alan Gilbert 
> 
> (Is there a separate fix to the scsi_target_send_command sense
> comment?)

That will be in 2.12 only.

Paolo

> Dave
> 
> > ---
> >  include/scsi/utils.h  |  6 +-
> >  scsi/qemu-pr-helper.c | 30 ++
> >  scsi/utils.c  | 10 ++
> >  3 files changed, 41 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/scsi/utils.h b/include/scsi/utils.h
> > index 00a4bdb080..eb07e474ee 100644
> > --- a/include/scsi/utils.h
> > +++ b/include/scsi/utils.h
> > @@ -76,7 +76,11 @@ extern const struct SCSISense sense_code_LUN_FAILURE;
> >  extern const struct SCSISense sense_code_LUN_COMM_FAILURE;
> >  /* Command aborted, Overlapped Commands Attempted */
> >  extern const struct SCSISense sense_code_OVERLAPPED_COMMANDS;
> > -/* LUN not ready, Capacity data has changed */
> > +/* Medium error, Unrecovered read error */
> > +extern const struct SCSISense sense_code_READ_ERROR;
> > +/* LUN not ready, Cause not reportable */
> > +extern const struct SCSISense sense_code_NOT_READY;
> > +/* Unit attention, Capacity data has changed */
> >  extern const struct SCSISense sense_code_CAPACITY_CHANGED;
> >  /* Unit attention, SCSI bus reset */
> >  extern const struct SCSISense sense_code_SCSI_BUS_RESET;
> > diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
> > index dd9785143b..9fe615c73c 100644
> > --- a/scsi/qemu-pr-helper.c
> > +++ b/scsi/qemu-pr-helper.c
> > @@ -314,6 +314,22 @@ static int is_mpath(int fd)
> >  return !strncmp(tgt->target_type, "multipath", DM_MAX_TYPE_NAME);
> >  }
> >  
> > +static SCSISense mpath_generic_sense(int r)
> > +{
> > +switch (r) {
> > +case MPATH_PR_SENSE_NOT_READY:
> > + return SENSE_CODE(NOT_READY);
> > +case MPATH_PR_SENSE_MEDIUM_ERROR:
> > + return SENSE_CODE(READ_ERROR);
> > +case MPATH_PR_SENSE_HARDWARE_ERROR:
> > + return SENSE_CODE(TARGET_FAILURE);
> > +case MPATH_PR_SENSE_ABORTED_COMMAND:
> > + return SENSE_CODE(IO_ERROR);
> > +default:
> > + abort();
> > +}
> > +}
> > +
> >  static int mpath_reconstruct_sense(int fd, int r, uint8_t *sense)
> >  {
> >  switch (r) {
> > @@ -329,7 +345,13 @@ static int mpath_reconstruct_sense(int fd, int r, 
> > uint8_t *sense)
> >   */
> >  uint8_t cdb[6] = { TEST_UNIT_READY };
> >  int sz = 0;
> > -return do_sgio(fd, cdb, sense, NULL, &sz, SG_DXFER_NONE);
> > +int r = do_sgio(fd, cdb, sense, NULL, &sz, SG_DXFER_NONE);
> > +
> > +if (r != GOOD) {
> > +return r;
> > +}
> > +scsi_build_sense(sense, mpath_generic_sense(r));
> > +return CHECK_CONDITION;
> >  }
> >  
> >  case MPATH_PR_SENSE_UNIT_ATTENTION:
> > @@ -449,7 +471,7 @@ static int multipath_pr_out(int fd, const uint8_t *cdb, 
> > uint8_t *sense,
> >  memset(¶mp, 0, sizeof(paramp));
> >  memcpy(¶mp.key, ¶m[0], 8);
> >  memcpy(¶mp.sa_key, ¶m[8], 8);
> > -paramp.sa_flags = param[10];
> > +paramp.sa_flags = param[20];
> >  if (sz > PR_OUT_FIXED_PARAM_SIZE) {
> >  size_t transportid_len;
> >  int i, j;
> > @@ -478,8 +500,8 @@ static int multipath_pr_out(int fd, const uint8_t *cdb, 
> > uint8_t *sense,
> >  j += offsetof(struct transportid, n_port_name[8]);
> >  i += 24;
> >  break;
> > -case 3:
> > -case 0x43:
> > +case 5:
> > +case 0x45:
> >  /* iSCSI transport.  */
> >  len = lduw_be_p(¶m[i + 2]);
> >  if (len > 252 || (len & 3) || i + len + 4 > 
> > transportid_len) {
> > diff --git a/scsi/utils.c b/scsi/utils.c
> > index 5684951b12..e4182a9b09 100644
> > --- a/scsi/utils.c
> > +++ b/scsi/utils.c
> > @@ -211,6 +211,16 @@ const struct SCSISense sense_code_LUN_COMM_FAILURE = {
> >  .key = ABORTED_COMMAND, .asc = 0x08, .ascq = 0x00
> >  };
> >  
> > +/* Medium Error, Unrecovered read error */
> > +const struct SCSISense sense_code_READ_ERROR = {
> > +.key = MEDIUM_ERROR, .asc = 0x11, .ascq = 0x00
> > +};
> > +
> > +/* Not ready, Cause not reportable */
> > +const struct SCSISense sense_code_NOT_READY = {
> > +.key = NOT_READY, .asc = 0x04, .ascq = 0x00
> > +};
> > +
> >  /* Unit attention, Capacity data has changed */
> >  const struct SCSISense sense_code_CAPACITY_CHANGED = {
> >  .key = UNIT_ATTENTION, .asc = 0x2a, .ascq = 0x09
> > -- 
> > 2.14.3
> > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [Qemu-devel] [PATCH 05/17] timer: generalize Dallas/Maxim RTC i2c devices

2017-12-03 Thread Michael Davidsaver
On 11/29/2017 11:13 PM, David Gibson wrote:
> On Sun, Nov 26, 2017 at 03:59:03PM -0600, Michael Davidsaver wrote:
>> Support for: ds1307, ds1337, ds1338, ds1339,
>> ds1340, ds1375, ds1388, and ds3231.
>>
>> Tested with ds1338 and ds1375.
>>
>> Signed-off-by: Michael Davidsaver 
> 
> I certainly like the idea of consolidating this code, but reviewing to
> see that the new code really is a generalization of the old is
> something I won't have time for for a while.
> 
> Also, hw/timer is not within my purview so it'll probably need to go
> another path to merge.

Could you be a bit more explicit about what, if anything, I need to do
to move this forward?


>> ---
>>  default-configs/arm-softmmu.mak |   2 +-
>>  hw/timer/Makefile.objs  |   2 +-
>>  hw/timer/ds-rtc-i2c.c   | 461 
>> 
>>  hw/timer/ds1338.c   | 239 -
>>  4 files changed, 463 insertions(+), 241 deletions(-)
>>  create mode 100644 hw/timer/ds-rtc-i2c.c
>>  delete mode 100644 hw/timer/ds1338.c
>>
>> diff --git a/default-configs/arm-softmmu.mak 
>> b/default-configs/arm-softmmu.mak
>> index d37edc4312..b857823681 100644
>> --- a/default-configs/arm-softmmu.mak
>> +++ b/default-configs/arm-softmmu.mak
>> @@ -31,7 +31,7 @@ CONFIG_SMC91C111=y
>>  CONFIG_ALLWINNER_EMAC=y
>>  CONFIG_IMX_FEC=y
>>  CONFIG_FTGMAC100=y
>> -CONFIG_DS1338=y
>> +CONFIG_DSRTCI2C=y
>>  CONFIG_PFLASH_CFI01=y
>>  CONFIG_PFLASH_CFI02=y
>>  CONFIG_MICRODRIVE=y
>> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
>> index 8c19eac3b6..290015ebec 100644
>> --- a/hw/timer/Makefile.objs
>> +++ b/hw/timer/Makefile.objs
>> @@ -3,7 +3,7 @@ common-obj-$(CONFIG_ARM_MPTIMER) += arm_mptimer.o
>>  common-obj-$(CONFIG_ARM_V7M) += armv7m_systick.o
>>  common-obj-$(CONFIG_A9_GTIMER) += a9gtimer.o
>>  common-obj-$(CONFIG_CADENCE) += cadence_ttc.o
>> -common-obj-$(CONFIG_DS1338) += ds1338.o
>> +common-obj-$(CONFIG_DSRTCI2C) += ds-rtc-i2c.o
>>  common-obj-$(CONFIG_HPET) += hpet.o
>>  common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
>>  common-obj-$(CONFIG_M48T59) += m48t59.o
>> diff --git a/hw/timer/ds-rtc-i2c.c b/hw/timer/ds-rtc-i2c.c
>> new file mode 100644
>> index 00..ad2f8f2a68
>> --- /dev/null
>> +++ b/hw/timer/ds-rtc-i2c.c
>> @@ -0,0 +1,461 @@
>> +/* Emulation of various Dallas/Maxim RTCs accessed via I2C bus
>> + *
>> + * Copyright (c) 2017 Michael Davidsaver
>> + * Copyright (c) 2009 CodeSourcery
>> + *
>> + * Authors: Michael Davidsaver
>> + *  Paul Brook
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the LICENSE file in the top-level directory.
>> + *
>> + * Models real time read/set and NVRAM.
>> + * Does not model alarms, or control/status registers.
>> + *
>> + * Generalized register map is:
>> + *   [Current time]
>> + *   [Alarm settings] (optional)
>> + *   [Control/Status] (optional)
>> + *   [Non-volatile memory] (optional)
>> + *
>> + * The current time registers are almost always the same,
>> + * with the exception being that some have a CENTURY bit
>> + * in the month register.
>> + */
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qemu/timer.h"
>> +#include "qemu/bcd.h"
>> +#include "hw/hw.h"
>> +#include "hw/registerfields.h"
>> +#include "hw/i2c/i2c.h"
>> +#include "sysemu/qtest.h"
>> +#include "qemu/error-report.h"
>> +
>> +/* #define DEBUG_DSRTC */
>> +
>> +#ifdef DEBUG_DSRTC
>> +#define DPRINTK(FMT, ...) info_report(TYPE_DSRTC " : " FMT, ## __VA_ARGS__)
>> +#else
>> +#define DPRINTK(FMT, ...) do {} while (0)
>> +#endif
>> +
>> +#define LOG(MSK, FMT, ...) qemu_log_mask(MSK, TYPE_DSRTC " : " FMT "\n", \
>> +## __VA_ARGS__)
>> +
>> +#define DSRTC_REGSIZE (0x40)
>> +
>> +/* values stored in BCD */
>> +/* 00-59 */
>> +#define R_SEC   (0x0)
>> +/* 00-59 */
>> +#define R_MIN   (0x1)
>> +#define R_HOUR  (0x2)
>> +/* 1-7 */
>> +#define R_WDAY  (0x3)
>> +/* 0-31 */
>> +#define R_DATE  (0x4)
>> +#define R_MONTH (0x5)
>> +/* 0-99 */
>> +#define R_YEAR  (0x6)
>> +
>> +/* use 12 hour mode when set */
>> +FIELD(HOUR, SET12, 6, 1)
>> +/* 00-23 */
>> +FIELD(HOUR, HOUR24, 0, 6)
>> +FIELD(HOUR, AMPM, 5, 1)
>> +/* 1-12 (not 0-11!) */
>> +FIELD(HOUR, HOUR12, 0, 5)
>> +
>> +/* 1-12 */
>> +FIELD(MONTH, MONTH, 0, 5)
>> +FIELD(MONTH, CENTURY, 7, 1)
>> +
>> +typedef struct DSRTCInfo {
>> +/* if bit 7 of the Month register is set after Y2K */
>> +bool has_century;
>> +/* address of first non-volatile memory cell.
>> + * nv_start >= reg_end means no NV memory.
>> + */
>> +uint8_t nv_start;
>> +/* total size of register range.  When address counter rolls over. */
>> +uint8_t reg_size;
>> +} DSRTCInfo;
>> +
>> +typedef struct DSRTCState {
>> +I2CSlave parent_obj;
>> +
>> +const DSRTCInfo *info;
>> +
>> +qemu_irq alarm_irq;
>> +
>> +/* register address counter */
>> +uint8_t addr;
>> +/* when writing, whether the address has been sent */
>> + 

[Qemu-devel] [Bug 1715715] Re: [qemu-ppc] Segfault when booting from HD after MacOS9 install

2017-12-03 Thread Mark Cave-Ayland
I've just tested MacOS 9.2.1 as part of my QEMU pre-release testing and
it works fine for me with latest QEMU git and the following command
line:

qemu-system-ppc -M mac99 -m 256 -drive
file=os921.qcow2,format=qcow2,media=disk -net nic,model=rtl8139 -net
user

Note that the QEMU mac99 machine now defaults to using the sungem device
which should give out-of-the-box networking for supported OS 9 and OS X
versions, including 9.2.1 as you mention, if you simply specify "-net
nic,model=sungem -net user" instead. Otherwise you will need to make
sure that the rtl8139 MacOS 9 drivers have been installed inside the
guest.

If that doesn't help then does it work if you disable spice and/or
reduce the size of the disk image? Many older OSs will struggle to read
disks that are 128G in size so you may find that your image is corrupted
- perhaps try using something around 4G instead?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1715715

Title:
  [qemu-ppc] Segfault when booting from HD after MacOS9 install

Status in QEMU:
  New

Bug description:
  I created an empty 128G qcow2 image and booted from a Mac OS 9.2.1
  Install CD, in which I was able to install the OS successfully to the
  hard drive. Upon reboot, this time from the hard drive directly, qemu-
  system-ppc segfaults. Host system is Ubuntu 16.04.2 with latest qemu
  commit.

  qemu --version reports "2.10.50 (v2.10.0-244-gb07d1c2-dirty)", but I
  used git commit b07d1c2f5607489d4d4a6a65ce36a3e896ac065e and built
  with "./configure --target-list=ppc-softmmu --enable-debug --disable-
  strip".

  Here is the command-line arguments:

  qemu-system-ppc -boot c -g 1024x768x32 -M mac99 -m 256 -prom-env
  'auto-boot?=true' -prom-env 'boot-args=-v' -prom-env 'vga-ndrv?=true'
  -drive file=../os9.img,format=raw,media=cdrom -drive
  file=MacOS9.qcow2,format=qcow2,media=disk -spice
  port=5901,password=XXX -net nic,model=rtl8139 -net user -monitor stdio

  And the GDB backtrace:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x559065fe7d3a in timer_mod (ts=0x0, expire_time=888960717010) at 
util/qemu-timer.c:462
  462 timer_mod_ns(ts, expire_time * ts->scale);
  [Current thread is 1 (Thread 0x7f60e43cb700 (LWP 9853))]
  (gdb) bt
  #0  0x559065fe7d3a in timer_mod (ts=0x0, expire_time=888960717010) at 
util/qemu-timer.c:462
  #1  0x559065d63769 in openpic_tmr_set_tmr (tmr=0x5590676fa7e0, val=96, 
enabled=true) at hw/intc/openpic.c:861
  #2  0x559065d63995 in openpic_tmr_write (opaque=0x5590676f71f0, addr=16, 
val=96, len=4) at hw/intc/openpic.c:912
  #3  0x559065b02811 in memory_region_write_accessor (mr=0x5590676f7710, 
addr=32, value=0x7f60e43c7da8, size=4, shift=0, mask=4294967295, attrs=...) at 
/home/bp/qemu/memory.c:529
  #4  0x559065b02a29 in access_with_adjusted_size (addr=32, 
value=0x7f60e43c7da8, size=1, access_size_min=4, access_size_max=4, 
access=0x559065b02727 , mr=0x5590676f7710, 
attrs=...) at /home/bp/qemu/memory.c:595
  #5  0x559065b051eb in memory_region_dispatch_write (mr=0x5590676f7710, 
addr=32, data=96, size=1, attrs=...) at /home/bp/qemu/memory.c:1337
  #6  0x559065aa3a36 in address_space_write_continue (as=0x559067614d90, 
addr=2147750160, attrs=..., buf=0x7f60e43c7ed0 "`_'\310`\177", len=1, addr1=32, 
l=1, mr=0x5590676f7710) at /home/bp/qemu/exec.c:2942
  #7  0x559065aa3b84 in address_space_write (as=0x559067614d90, 
addr=2147750160, attrs=..., buf=0x7f60e43c7ed0 "`_'\310`\177", len=1) at 
/home/bp/qemu/exec.c:2987
  #8  0x559065aa2ec0 in subpage_write (opaque=0x7f60c8275fc0, addr=272, 
value=96, len=1, attrs=...) at /home/bp/qemu/exec.c:2565
  #9  0x559065b02906 in memory_region_write_with_attrs_accessor 
(mr=0x7f60c8275fc0, addr=272, value=0x7f60e43c7fc8, size=1, shift=0, mask=255, 
attrs=...) at /home/bp/qemu/memory.c:555
  #10 0x559065b029d3 in access_with_adjusted_size (addr=272, 
value=0x7f60e43c7fc8, size=1, access_size_min=1, access_size_max=8, 
access=0x559065b02818 , 
mr=0x7f60c8275fc0, attrs=...) at /home/bp/qemu/memory.c:590
  #11 0x559065b0523a in memory_region_dispatch_write (mr=0x7f60c8275fc0, 
addr=272, data=96, size=1, attrs=...) at /home/bp/qemu/memory.c:1344
  #12 0x559065b175db in io_writex (env=0x7f60e43d42a0, 
iotlbentry=0x7f60e43e8130, mmu_idx=3, val=96, addr=2147750160, 
retaddr=140054158295744, size=1) at /home/bp/qemu/accel/tcg/cputlb.c:807
  #13 0x559065b18055 in io_writeb (env=0x7f60e43d42a0, mmu_idx=3, index=65, 
val=96 '`', addr=2147750160, retaddr=140054158295744) at 
/home/bp/qemu/softmmu_template.h:265
  #14 0x559065b181ea in helper_ret_stb_mmu (env=0x7f60e43d42a0, 
addr=2147750160, val=96 '`', oi=3, retaddr=140054158295744) at 
/home/bp/qemu/softmmu_template.h:300
  #15 0x7f60e65ac2c0 in code_gen_buffer ()
  #16 0x559065b1ff26 in cpu_tb_exec (cpu=0x7f60e43cc010, itb=0x7f60e65ac5c0 
) at /home/bp/qemu/accel/tcg/c

[Qemu-devel] [Bug 1715715] Re: [qemu-ppc] Segfault when booting from HD after MacOS9 install

2017-12-03 Thread Brad Parker
I just tried the latest git and it actually boots fine with your
command... so I guess whatever issue I was having (the null dereference
in the timer code I pasted above) must have been fixed... however I've
noticed another issue with a different command that causes the bootup to
hang:

qemu-system-ppc -boot c -g 1024x768x32 -M mac99 -m 256 -prom-env 'auto-
boot?=true' -prom-env 'boot-args=-v' -prom-env 'vga-ndrv?=true' -drive
file=os9.2.1.iso,format=raw,media=cdrom -drive
file=os921.qcow2,format=qcow2,media=disk -spice port=5901,password=XXX
-net nic,model=sungem -net user -monitor stdio

This hangs at bootup at "Trying hd:,\\:tbxi" and never progresses any
further. If I remove the cdrom then it boots fine... however, simply
adding the cdrom to your working command, it still works there... not
sure what's going on, but thanks for the help. I have something that
works now.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1715715

Title:
  [qemu-ppc] Segfault when booting from HD after MacOS9 install

Status in QEMU:
  New

Bug description:
  I created an empty 128G qcow2 image and booted from a Mac OS 9.2.1
  Install CD, in which I was able to install the OS successfully to the
  hard drive. Upon reboot, this time from the hard drive directly, qemu-
  system-ppc segfaults. Host system is Ubuntu 16.04.2 with latest qemu
  commit.

  qemu --version reports "2.10.50 (v2.10.0-244-gb07d1c2-dirty)", but I
  used git commit b07d1c2f5607489d4d4a6a65ce36a3e896ac065e and built
  with "./configure --target-list=ppc-softmmu --enable-debug --disable-
  strip".

  Here is the command-line arguments:

  qemu-system-ppc -boot c -g 1024x768x32 -M mac99 -m 256 -prom-env
  'auto-boot?=true' -prom-env 'boot-args=-v' -prom-env 'vga-ndrv?=true'
  -drive file=../os9.img,format=raw,media=cdrom -drive
  file=MacOS9.qcow2,format=qcow2,media=disk -spice
  port=5901,password=XXX -net nic,model=rtl8139 -net user -monitor stdio

  And the GDB backtrace:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x559065fe7d3a in timer_mod (ts=0x0, expire_time=888960717010) at 
util/qemu-timer.c:462
  462 timer_mod_ns(ts, expire_time * ts->scale);
  [Current thread is 1 (Thread 0x7f60e43cb700 (LWP 9853))]
  (gdb) bt
  #0  0x559065fe7d3a in timer_mod (ts=0x0, expire_time=888960717010) at 
util/qemu-timer.c:462
  #1  0x559065d63769 in openpic_tmr_set_tmr (tmr=0x5590676fa7e0, val=96, 
enabled=true) at hw/intc/openpic.c:861
  #2  0x559065d63995 in openpic_tmr_write (opaque=0x5590676f71f0, addr=16, 
val=96, len=4) at hw/intc/openpic.c:912
  #3  0x559065b02811 in memory_region_write_accessor (mr=0x5590676f7710, 
addr=32, value=0x7f60e43c7da8, size=4, shift=0, mask=4294967295, attrs=...) at 
/home/bp/qemu/memory.c:529
  #4  0x559065b02a29 in access_with_adjusted_size (addr=32, 
value=0x7f60e43c7da8, size=1, access_size_min=4, access_size_max=4, 
access=0x559065b02727 , mr=0x5590676f7710, 
attrs=...) at /home/bp/qemu/memory.c:595
  #5  0x559065b051eb in memory_region_dispatch_write (mr=0x5590676f7710, 
addr=32, data=96, size=1, attrs=...) at /home/bp/qemu/memory.c:1337
  #6  0x559065aa3a36 in address_space_write_continue (as=0x559067614d90, 
addr=2147750160, attrs=..., buf=0x7f60e43c7ed0 "`_'\310`\177", len=1, addr1=32, 
l=1, mr=0x5590676f7710) at /home/bp/qemu/exec.c:2942
  #7  0x559065aa3b84 in address_space_write (as=0x559067614d90, 
addr=2147750160, attrs=..., buf=0x7f60e43c7ed0 "`_'\310`\177", len=1) at 
/home/bp/qemu/exec.c:2987
  #8  0x559065aa2ec0 in subpage_write (opaque=0x7f60c8275fc0, addr=272, 
value=96, len=1, attrs=...) at /home/bp/qemu/exec.c:2565
  #9  0x559065b02906 in memory_region_write_with_attrs_accessor 
(mr=0x7f60c8275fc0, addr=272, value=0x7f60e43c7fc8, size=1, shift=0, mask=255, 
attrs=...) at /home/bp/qemu/memory.c:555
  #10 0x559065b029d3 in access_with_adjusted_size (addr=272, 
value=0x7f60e43c7fc8, size=1, access_size_min=1, access_size_max=8, 
access=0x559065b02818 , 
mr=0x7f60c8275fc0, attrs=...) at /home/bp/qemu/memory.c:590
  #11 0x559065b0523a in memory_region_dispatch_write (mr=0x7f60c8275fc0, 
addr=272, data=96, size=1, attrs=...) at /home/bp/qemu/memory.c:1344
  #12 0x559065b175db in io_writex (env=0x7f60e43d42a0, 
iotlbentry=0x7f60e43e8130, mmu_idx=3, val=96, addr=2147750160, 
retaddr=140054158295744, size=1) at /home/bp/qemu/accel/tcg/cputlb.c:807
  #13 0x559065b18055 in io_writeb (env=0x7f60e43d42a0, mmu_idx=3, index=65, 
val=96 '`', addr=2147750160, retaddr=140054158295744) at 
/home/bp/qemu/softmmu_template.h:265
  #14 0x559065b181ea in helper_ret_stb_mmu (env=0x7f60e43d42a0, 
addr=2147750160, val=96 '`', oi=3, retaddr=140054158295744) at 
/home/bp/qemu/softmmu_template.h:300
  #15 0x7f60e65ac2c0 in code_gen_buffer ()
  #16 0x559065b1ff26 in cpu_tb_exec (cpu=0x7f60e43cc010, itb=0x7f60e65ac5c0 
) at /home/bp/qemu/accel/tc

[Qemu-devel] [Bug 1736042] [NEW] qemu-system-x86_64 does not boot image reliably

2017-12-03 Thread tezeb
Public bug reported:

Booting image as root user with following command works randomly.

./qemu-system-x86_64 -enable-kvm -curses -smp cpus=4 -m 8192
/root/ructfe2917_g.qcow2

Most of the time it ends up on "800x600 Graphic mode"(been stuck there
even for 4 hours before killed), but 1 out of ~20 it boots image
correctly(and instantly).

This is visible in v2.5.0 build from sources, v2.5.0 from Ubuntu Xenial
and v2.1.2 from Debian Jessie.

The image in question was converted from vmdk using:

qemu-img convert -O qcow2 file.vmdk file.qcow2

The image contains Ubuntu with grub.

I can provide debug logs, but will need guidance how to enable them(and
what logs are necessary).

As a side note, it seems that booting is more certain after
connecting(or mounting) partition using qemu-nbd/mount.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1736042

Title:
  qemu-system-x86_64 does not boot image reliably

Status in QEMU:
  New

Bug description:
  Booting image as root user with following command works randomly.

  ./qemu-system-x86_64 -enable-kvm -curses -smp cpus=4 -m 8192
  /root/ructfe2917_g.qcow2

  Most of the time it ends up on "800x600 Graphic mode"(been stuck there
  even for 4 hours before killed), but 1 out of ~20 it boots image
  correctly(and instantly).

  This is visible in v2.5.0 build from sources, v2.5.0 from Ubuntu
  Xenial and v2.1.2 from Debian Jessie.

  The image in question was converted from vmdk using:

  qemu-img convert -O qcow2 file.vmdk file.qcow2

  The image contains Ubuntu with grub.

  I can provide debug logs, but will need guidance how to enable
  them(and what logs are necessary).

  As a side note, it seems that booting is more certain after
  connecting(or mounting) partition using qemu-nbd/mount.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1736042/+subscriptions



Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] target/ppc: Fix system lockups caused by interrupt_request state corruption

2017-12-03 Thread David Gibson
On Fri, Dec 01, 2017 at 03:49:07PM +, Richard Purdie wrote:
> Occasionally in Linux guests on x86_64 we're seeing logs like:
> 
> ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0004
> 
> when they should read:
> 
> ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0002
> 
> The "0004" is CPU_INTERRUPT_EXITTB yet the code calls
> cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("0002") in this function
> just before the log message. Something is causing the HARD bit setting
> to get lost.
> 
> The knock on effect of losing that bit is the decrementer timer interrupts
> don't get delivered which causes the guest to sit idle in its idle handler
> and 'hang'.
> 
> The issue occurs due to races from code which sets CPU_INTERRUPT_EXITTB.
> 
> Rather than poking directly into cs->interrupt_request, that code needs to:
> 
> a) hold BQL
> b) use the cpu_interrupt() helper
> 
> This patch fixes the call sites to do this, fixing the hang.
> 
> Signed-off-by: Richard Purdie 

I strongly suspect there's a better way to do this long term - a lot
of that old ppc TCG code is really crufty.  But as best I can tell,
this is certainly a fix over what we had.  So, applied to ppc-for-2.11.

> ---
>  target/ppc/excp_helper.c | 16 +---
>  target/ppc/helper_regs.h | 10 --
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> v2: Fixes a compile issue with master and ensures BQL is held in one case
> where it potentially wasn't.
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index e6009e7..8040277 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -207,7 +207,13 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  "Entering checkstop state\n");
>  }
>  cs->halted = 1;
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +if (!qemu_mutex_iothread_locked()) {
> +qemu_mutex_lock_iothread();
> +cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> +qemu_mutex_unlock_iothread();
> +} else {
> +cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> +}
>  }
>  if (env->msr_mask & MSR_HVB) {
>  /* ISA specifies HV, but can be delivered to guest with HV clear
> @@ -940,7 +946,9 @@ void helper_store_msr(CPUPPCState *env, target_ulong val)
>  
>  if (excp != 0) {
>  CPUState *cs = CPU(ppc_env_get_cpu(env));
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +qemu_mutex_lock_iothread();
> +cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> +qemu_mutex_unlock_iothread();
>  raise_exception(env, excp);
>  }
>  }
> @@ -995,7 +1003,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong 
> nip, target_ulong msr)
>  /* No need to raise an exception here,
>   * as rfi is always the last insn of a TB
>   */
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +qemu_mutex_lock_iothread();
> +cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> +qemu_mutex_unlock_iothread();
>  
>  /* Reset the reservation */
>  env->reserve_addr = -1;
> diff --git a/target/ppc/helper_regs.h b/target/ppc/helper_regs.h
> index 2627a70..0beaad5 100644
> --- a/target/ppc/helper_regs.h
> +++ b/target/ppc/helper_regs.h
> @@ -20,6 +20,8 @@
>  #ifndef HELPER_REGS_H
>  #define HELPER_REGS_H
>  
> +#include "qemu/main-loop.h"
> +
>  /* Swap temporary saved registers with GPRs */
>  static inline void hreg_swap_gpr_tgpr(CPUPPCState *env)
>  {
> @@ -114,11 +116,15 @@ static inline int hreg_store_msr(CPUPPCState *env, 
> target_ulong value,
>  }
>  if (((value >> MSR_IR) & 1) != msr_ir ||
>  ((value >> MSR_DR) & 1) != msr_dr) {
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +qemu_mutex_lock_iothread();
> +cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> +qemu_mutex_unlock_iothread();
>  }
>  if ((env->mmu_model & POWERPC_MMU_BOOKE) &&
>  ((value >> MSR_GS) & 1) != msr_gs) {
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +qemu_mutex_lock_iothread();
> +cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> +qemu_mutex_unlock_iothread();
>  }
>  if (unlikely((env->flags & POWERPC_FLAG_TGPR) &&
>   ((value ^ env->msr) & (1 << MSR_TGPR {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/3] maint: Fix macros with broken 'do/while(0); ' usage

2017-12-03 Thread David Gibson
On Thu, Nov 30, 2017 at 07:41:59AM -0600, Eric Blake wrote:
> The point of writing a macro embedded in a 'do { ... } while (0)'
> loop is so that the macro can be used as a drop-in statement with
> the caller supplying the trailing ';'.  Although our coding style
> frowns on brace-less 'if':
>   if (cond)
> statement;
>   else
> something else;
> the use of do/while (0) in a macro is absolutely essential for the
> purpose of avoiding a syntax error on the 'else' - but it only works
> if there is no trailing ';' in the macro (as the ';' in the code
> calling the macro would then be a second statement and cause the
> 'else' to not pair to the 'if').
> 
> Many of the places touched in this code are examples of the ugly
> bit-rotting debug print statements; cleaning those up is left as
> a bite-sized task for another day.
> 
> Found mechanically via: $ git grep -B1 'while (0);' | grep -A1 
> 
> Signed-off-by: Eric Blake 

ppc related portions

Acked-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 1/2] colo: compare the packet based on the tcp sequence number

2017-12-03 Thread Zhang Chen
On Tue, Nov 28, 2017 at 8:04 PM, Mao Zhongyi 
wrote:

> The primary and secondary guest has the same TCP stream, but the
> the packet sizes are different due to the different fragmentation.
>
> In the current impletation, compare the packet with the size of
> payload, but packets of the same size and payload are very few,
> so it triggers checkopint frequently, which leads to a very low
> performance of the tcp packet comparison. In addtion, the method
> of comparing the size of packet is not correct in itself.
>
> like that:
> We send this payload:
> --
> | header |1|2|3|4|5|6|7|8|9|0|
> --
>
> primary:
> ppkt1:
> 
> | header |1|2|3|
> 
> ppkt2:
> 
> | header |4|5|6|7|8|9|0|
> 
>
> secondary:
> spkt1:
> --
> | header |1|2|3|4|5|6|7|8|9|0|
> --
>
> In the original method, ppkt1 and ppkt2 are diffrent in size and
> spkt1, so they can't compare and trigger the checkpoint.
>
> I have tested FTP get 200M and 1G file many times, I found that
> the performance was less than 1% of the native.
>
> Now I reconstructed the comparison of TCP packets based on the
> TCP sequence number. first of all, ppkt1 and spkt1 have the same
> starting sequence number, so they can compare, even though their
> length is different. And then ppkt1 with a smaller payload length
> is used as the comparison length, if the payload is same, send
> out the ppkt1 and record the offset(the length of ppkt1 payload)
> in spkt1. The next comparison, ppkt2 and spkt1 can be compared
> from the recorded position of spkt1.
>
> like that:
> 
> | header |1|2|3| ppkt1
> -|-|
>  | |
> -v-v--
> | header |1|2|3|4|5|6|7|8|9|0| spkt1
> ---|\|
>| \offset |
>   -v-v
>   | header |4|5|6|7|8|9|0| ppkt2
>   
>
> In this way, the performance can reach native 20% in my multiple
> tests.
>
> Cc: Zhang Chen 
> Cc: Li Zhijian 
> Cc: Jason Wang 
>
> Reported-by: Zhang Chen 
> Signed-off-by: Mao Zhongyi 
> Signed-off-by: Li Zhijian 
> ---
>  net/colo-compare.c | 300 ++
> +++
>  net/colo.c |   8 ++
>  net/colo.h |  14 +++
>  3 files changed, 211 insertions(+), 111 deletions(-)
>
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 1ce195f..0752e9f 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -38,6 +38,9 @@
>  #define COMPARE_READ_LEN_MAX NET_BUFSIZE
>  #define MAX_QUEUE_SIZE 1024
>
> +#define COLO_COMPARE_FREE_PRIMARY 0x01
> +#define COLO_COMPARE_FREE_SECONDARY   0x02
> +
>  /* TODO: Should be configurable */
>  #define REGULAR_PACKET_CHECK_MS 3000
>
> @@ -112,14 +115,31 @@ static gint seq_sorter(Packet *a, Packet *b,
> gpointer data)
>  return ntohl(atcp->th_seq) - ntohl(btcp->th_seq);
>  }
>
> +static void fill_pkt_seq(void *data, uint32_t *max_ack)
> +{
> +Packet *pkt = data;
> +struct tcphdr *tcphd;
> +
> +tcphd = (struct tcphdr *)pkt->transport_header;
> +
> +pkt->tcp_seq = ntohl(tcphd->th_seq);
> +pkt->tcp_ack = ntohl(tcphd->th_ack);
> +*max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack;
> +pkt->hdsize = pkt->transport_header - (uint8_t *)pkt->data
> +  + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
> +pkt->pdsize = pkt->size - pkt->hdsize;
>


In this function you are not just "fill_pkt_seq", use "fill_pkt_tcp_info"
is more suitable.
And use "header_size" and "payload_size" instead of "hdsize" and "pdsize"
maybe better to read.



> +pkt->seq_end = pkt->tcp_seq + pkt->pdsize;
> +}
> +
>  /*
>   * Return 1 on success, if return 0 means the
>   * packet will be dropped
>   */
> -static int colo_insert_packet(GQueue *queue, Packet *pkt)
> +static int colo_insert_packet(GQueue *queue, Packet *pkt, uint32_t
> *max_ack)
>  {
>  if (g_queue_get_length(queue) <= MAX_QUEUE_SIZE) {
>  if (pkt->ip->ip_p == IPPROTO_TCP) {
> +fill_pkt_seq(pkt, max_ack);
>  g_queue_insert_sorted(queue,
>pkt,
>(GCompareDataFunc)seq_sorter,
> @@ -169,12 +189,12 @@ static int packet_enqueue(CompareState *s, int mode,
> Connection **con)
>  }
>
>  if (mode == PRIMARY_IN) {
> -if (!colo_insert_packet(&conn->primary_list, pkt)) {
> +if (!colo_insert_packet(&conn->primary_list, pkt, &conn->pack)) {
>  error_report("colo compare primary queue size too big,"
>   "drop packet");
>  }
>  } else {
> -if (!co

Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] target/ppc: Fix system lockups caused by interrupt_request state corruption

2017-12-03 Thread David Gibson
On Mon, Dec 04, 2017 at 12:00:40PM +1100, David Gibson wrote:
> On Fri, Dec 01, 2017 at 03:49:07PM +, Richard Purdie wrote:
> > Occasionally in Linux guests on x86_64 we're seeing logs like:
> > 
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0004
> > 
> > when they should read:
> > 
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 0100req 0002
> > 
> > The "0004" is CPU_INTERRUPT_EXITTB yet the code calls
> > cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("0002") in this function
> > just before the log message. Something is causing the HARD bit setting
> > to get lost.
> > 
> > The knock on effect of losing that bit is the decrementer timer interrupts
> > don't get delivered which causes the guest to sit idle in its idle handler
> > and 'hang'.
> > 
> > The issue occurs due to races from code which sets CPU_INTERRUPT_EXITTB.
> > 
> > Rather than poking directly into cs->interrupt_request, that code needs to:
> > 
> > a) hold BQL
> > b) use the cpu_interrupt() helper
> > 
> > This patch fixes the call sites to do this, fixing the hang.
> > 
> > Signed-off-by: Richard Purdie 
> 
> I strongly suspect there's a better way to do this long term - a lot
> of that old ppc TCG code is really crufty.  But as best I can tell,
> this is certainly a fix over what we had.  So, applied to
> ppc-for-2.11.

I take that back.  Running make check with this patch results in:

  GTESTER check-qtest-ppc64
**
ERROR:/home/dwg/src/qemu/cpus.c:1582:qemu_mutex_lock_iothread: assertion 
failed: (!qemu_mutex_iothread_locked())
Broken pipe
qemu-system-ppc64: RP: Received invalid message 0x length 0x
GTester: last random seed: R02S895b0f4813776bf68c147bf987e73f7b
make: *** [/home/dwg/src/qemu/tests/Makefile.include:852: check-qtest-ppc64] 
Error 1

So, I've reverted it.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 15/25] spapr: notify the CPU when the XIVE interrupt priority is more privileged

2017-12-03 Thread David Gibson
On Sat, Dec 02, 2017 at 08:40:58AM -0600, Benjamin Herrenschmidt wrote:
> On Thu, 2017-11-30 at 16:00 +1100, David Gibson wrote:
> > 
> > >  static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
> > >  {
> > > -return 0;
> > > +uint8_t nsr = icp->tima_os[TM_NSR];
> > > +
> > > +qemu_irq_lower(icp->output);
> > > +
> > > +if (icp->tima_os[TM_NSR] & TM_QW1_NSR_EO) {
> > > +uint8_t cppr = icp->tima_os[TM_PIPR];
> > > +
> > > +icp->tima_os[TM_CPPR] = cppr;
> > > +
> > > +/* Reset the pending buffer bit */
> > > +icp->tima_os[TM_IPB] &= ~priority_to_ipb(cppr);
> > 
> > What if multiple irqs of the same priority were queued?
> 
> It's the job of the OS to handle that case by consuming from the queue
> until it's empty. There is an MMIO the guest can use if it wants to
> that can set the IPB bits back to 1 for a given priority. Otherwise in
> Linux we just have a SW way to force a replay.

Ok, so "accept" is effectively saying the OS is accepting all
interrupts from that queue, right?

> 
> > > +icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > > +
> > > +/* Drop Exception bit for OS */
> > > +icp->tima_os[TM_NSR] &= ~TM_QW1_NSR_EO;
> > > +}
> > > +
> > > +return (nsr << 8) | icp->tima_os[TM_CPPR];
> > > +}
> > > +
> > > +static void spapr_xive_icp_notify(sPAPRXiveICP *icp)
> > > +{
> > > +if (icp->tima_os[TM_PIPR] < icp->tima_os[TM_CPPR]) {
> > > +icp->tima_os[TM_NSR] |= TM_QW1_NSR_EO;
> > > +qemu_irq_raise(icp->output);
> > > +}
> > >  }
> > >  
> > >  static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, uint8_t cppr)
> > > @@ -51,6 +105,9 @@ static void spapr_xive_icp_set_cppr(sPAPRXiveICP *icp, 
> > > uint8_t cppr)
> > >  }
> > >  
> > >  icp->tima_os[TM_CPPR] = cppr;
> > > +
> > > +/* CPPR has changed, inform the ICP which might raise an exception */
> > > +spapr_xive_icp_notify(icp);
> > >  }
> > >  
> > >  /*
> > > @@ -224,6 +281,8 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >  XiveEQ *eq;
> > >  uint32_t eq_idx;
> > >  uint8_t priority;
> > > +uint32_t server;
> > > +sPAPRXiveICP *icp;
> > >  
> > >  ive = spapr_xive_get_ive(xive, lisn);
> > >  if (!ive || !(ive->w & IVE_VALID)) {
> > > @@ -253,6 +312,13 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >  qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not 
> > > implemented\n");
> > >  }
> > >  
> > > +server = GETFIELD(EQ_W6_NVT_INDEX, eq->w6);
> > > +icp = spapr_xive_icp_get(xive, server);
> > > +if (!icp) {
> > > +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No ICP for server %d\n", 
> > > server);
> > > +return;
> > > +}
> > > +
> > >  if (GETFIELD(EQ_W6_FORMAT_BIT, eq->w6) == 0) {
> > >  priority = GETFIELD(EQ_W7_F0_PRIORITY, eq->w7);
> > >  
> > > @@ -260,9 +326,18 @@ static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> > >  if (priority == 0xff) {
> > >  g_assert_not_reached();
> > >  }
> > > +
> > > +/* Update the IPB (Interrupt Pending Buffer) with the priority
> > > + * of the new notification and inform the ICP, which will
> > > + * decide to raise the exception, or not, depending the CPPR.
> > > + */
> > > +icp->tima_os[TM_IPB] |= priority_to_ipb(priority);
> > > +icp->tima_os[TM_PIPR] = ipb_to_pipr(icp->tima_os[TM_IPB]);
> > >  } else {
> > >  qemu_log_mask(LOG_UNIMP, "XIVE: w7 format1 not implemented\n");
> > >  }
> > > +
> > > +spapr_xive_icp_notify(icp);
> > >  }
> > >  
> > >  /*
> > 
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 2/2] colo: add trace for the tcp packet comparison

2017-12-03 Thread Zhang Chen
On Tue, Nov 28, 2017 at 8:04 PM, Mao Zhongyi 
wrote:

> Cc: Zhang Chen 
> Cc: Li Zhijian 
> Cc: Jason Wang 
>
> Signed-off-by: Mao Zhongyi 
> ---
>  net/colo-compare.c | 16 
>  net/colo.c |  1 +
>  net/colo.h |  1 +
>  net/trace-events   |  2 +-
>  4 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 0752e9f..4c0a1d8 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -129,6 +129,7 @@ static void fill_pkt_seq(void *data, uint32_t *max_ack)
>+ (tcphd->th_off << 2) - pkt->vnet_hdr_len;
>  pkt->pdsize = pkt->size - pkt->hdsize;
>  pkt->seq_end = pkt->tcp_seq + pkt->pdsize;
> +pkt->flags = tcphd->th_flags;
>  }
>
>  /*
> @@ -337,6 +338,16 @@ sec:
>  }
>
>  if (colo_mark_tcp_pkt(ppkt, spkt, &mark, max_ack)) {
> +trace_colo_compare_tcp_info("pri",
> +ppkt->tcp_seq, ppkt->tcp_ack,
> +ppkt->hdsize, ppkt->pdsize,
> +ppkt->offset, ppkt->flags);
> +
> +trace_colo_compare_tcp_info("sec",
> +spkt->tcp_seq, spkt->tcp_ack,
> +spkt->hdsize, spkt->pdsize,
> +spkt->offset, spkt->flags);
> +
>  if (mark == COLO_COMPARE_FREE_PRIMARY) {
>  conn->compare_seq = ppkt->seq_end;
>  colo_release_primary_pkt(s, ppkt);
> @@ -355,6 +366,11 @@ sec:
>  goto pri;
>  }
>  } else {
> +qemu_hexdump((char *)ppkt->data, stderr,
> + "colo-compare ppkt", ppkt->size);
> +qemu_hexdump((char *)spkt->data, stderr,
> + "colo-compare spkt", spkt->size);
> +
>  g_queue_push_head(&conn->primary_list, ppkt);
>  g_queue_push_head(&conn->secondary_list, spkt);
>
> diff --git a/net/colo.c b/net/colo.c
> index 1743522..0b469f2 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -171,6 +171,7 @@ Packet *packet_new(const void *data, int size, int
> vnet_hdr_len)
>  pkt->hdsize = 0;
>  pkt->pdsize = 0;
>  pkt->offset = 0;
> +pkt->flags = 0;
>
>  return pkt;
>  }
> diff --git a/net/colo.h b/net/colo.h
> index 97bc41e..0530dd0 100644
> --- a/net/colo.h
> +++ b/net/colo.h
> @@ -53,6 +53,7 @@ typedef struct Packet {
>  uint16_t pdsize; /* the payload length */
>  /* record the payload offset(the length that has been compared) */
>  uint16_t offset;
> +uint8_t flags; /* Flags(aka Control bits) */
>  } Packet;
>
>  typedef struct ConnectionKey {
> diff --git a/net/trace-events b/net/trace-events
> index 938263d..7b594cf 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -13,7 +13,7 @@ colo_compare_icmp_miscompare(const char *sta, int size)
> ": %s = %d"
>  colo_compare_ip_info(int psize, const char *sta, const char *stb, int
> ssize, const char *stc, const char *std) "ppkt size = %d, ip_src = %s,
> ip_dst = %s, spkt size = %d, ip_src = %s, ip_dst = %s"
>  colo_old_packet_check_found(int64_t old_time) "%" PRId64
>  colo_compare_miscompare(void) ""
> -colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int
> res, uint32_t flag, int size) "side: %s seq/ack= %u/%u res= %d flags= 0x%x
> pkt_size: %d\n"
> +colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int
> hdlen, int pdlen, int offset, int flags) "%s: seq/ack= %u/%u hdlen= %d
> pdlen= %d offset= %d flags=%d\n"
>


In patch 1/2 you have removed where they have been used, so you should
remove this definition in patch 1 firstly.


Thanks
Zhang Chen


>
>  # net/filter-rewriter.c
>  colo_filter_rewriter_debug(void) ""
> --
> 2.9.4
>
>
>
>


Re: [Qemu-devel] [PATCH 14/25] spapr: push the XIVE EQ data in OS event queue

2017-12-03 Thread David Gibson
On Sat, Dec 02, 2017 at 08:46:19AM -0600, Benjamin Herrenschmidt wrote:
> On Sat, 2017-12-02 at 08:45 -0600, Benjamin Herrenschmidt wrote:
> > On Fri, 2017-12-01 at 15:10 +1100, David Gibson wrote:
> > > 
> > > Hm, ok.  Guest endian (or at least, not definitively host-endian) data
> > > in a plain uint32_t makes me uncomfortable.  Could we use char data[4]
> > > instead, to make it clear it's a byte-ordered buffer, rather than a
> > > number as far as the XIVE is concerned.
> > > 
> > > Hm.. except that doesn't quite work, because the hardware must define
> > > which end that generation bit ends up in...
> > 
> > It also needs to be written atomically. Just say it's big endian.
> 
> Also the guest reads it using be32_to_cpup...

Ok.  Definitely should be treated as BE and read/written with the be32
DMA helper functions.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 13/25] spapr: introduce the XIVE Event Queues

2017-12-03 Thread David Gibson
On Fri, Dec 01, 2017 at 05:36:39PM +0100, Cédric Le Goater wrote:
> On 12/01/2017 12:35 AM, David Gibson wrote:
> > On Thu, Nov 30, 2017 at 02:06:27PM +, Cédric Le Goater wrote:
> >> On 11/30/2017 04:38 AM, David Gibson wrote:
> >>> On Thu, Nov 23, 2017 at 02:29:43PM +0100, Cédric Le Goater wrote:
>  The Event Queue Descriptor (EQD) table, also known as Event Notification
>  Descriptor (END), is one of the internal tables the XIVE interrupt
>  controller uses to redirect exception from event sources to CPU
>  threads.
> 
>  The EQD specifies on which Event Queue the event data should be posted
>  when an exception occurs (later on pulled by the OS) and which server
>  (VPD in XIVE terminology) to notify. The Event Queue is a much more
>  complex structure but we start with a simple model for the sPAPR
>  machine.
> >>>
> >>> Just to clarify my understanding a server / VPD in XIVE would
> >>> typically correspond to a cpu - either real or virtual, yes?
> >>
> >> yes. VP for "virtual processor" and VPD for "virtual processor 
> >> descriptor" which contains the XIVE interrupt state of the VP 
> >> when not dispatched. It is still described in some documentation 
> >> as an NVT : Notification Virtual Target.  
> >>
> >> XIVE concepts were renamed at some time but the old name perdured.
> >> I am still struggling my way through all the names.
> >>
> >>
>  There is one XiveEQ per priority and the model chooses to store them
>  under the Xive Interrupt presenter model. It will be retrieved, just
>  like for XICS, through the 'intc' object pointer of the CPU.
> 
>  The EQ indexing follows a simple pattern:
> 
> (server << 3) | (priority & 0x7)
> 
>  Signed-off-by: Cédric Le Goater 
>  ---
>   hw/intc/spapr_xive.c| 56 
>  +
>   hw/intc/xive-internal.h | 50 +++
>   2 files changed, 106 insertions(+)
> 
>  diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>  index 554b25e0884c..983317a6b3f6 100644
>  --- a/hw/intc/spapr_xive.c
>  +++ b/hw/intc/spapr_xive.c
>  @@ -23,6 +23,7 @@
>   #include "sysemu/dma.h"
>   #include "monitor/monitor.h"
>   #include "hw/ppc/spapr_xive.h"
>  +#include "hw/ppc/spapr.h"
>   #include "hw/ppc/xics.h"
>   
>   #include "xive-internal.h"
>  @@ -34,6 +35,8 @@ struct sPAPRXiveICP {
>   uint8_t   tima[TM_RING_COUNT * 0x10];
>   uint8_t   *tima_os;
>   qemu_irq  output;
>  +
>  +XiveEQeqt[XIVE_PRIORITY_MAX + 1];
>   };
>   
>   static uint64_t spapr_xive_icp_accept(sPAPRXiveICP *icp)
>  @@ -183,6 +186,13 @@ static const MemoryRegionOps spapr_xive_tm_ops = {
>   },
>   };
>   
>  +static sPAPRXiveICP *spapr_xive_icp_get(sPAPRXive *xive, int server)
>  +{
>  +PowerPCCPU *cpu = spapr_find_cpu(server);
>  +
>  +return cpu ? SPAPR_XIVE_ICP(cpu->intc) : NULL;
>  +}
>  +
>   static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>   {
>   
>  @@ -632,6 +642,8 @@ static void spapr_xive_icp_reset(void *dev)
>   sPAPRXiveICP *xicp = SPAPR_XIVE_ICP(dev);
>   
>   memset(xicp->tima, 0, sizeof(xicp->tima));
>  +
>  +memset(xicp->eqt, 0, sizeof(xicp->eqt));
>   }
>   
>   static void spapr_xive_icp_realize(DeviceState *dev, Error **errp)
>  @@ -683,6 +695,23 @@ static void spapr_xive_icp_init(Object *obj)
>   xicp->tima_os = &xicp->tima[TM_QW1_OS];
>   }
>   
>  +static const VMStateDescription vmstate_spapr_xive_icp_eq = {
>  +.name = TYPE_SPAPR_XIVE_ICP "/eq",
>  +.version_id = 1,
>  +.minimum_version_id = 1,
>  +.fields = (VMStateField []) {
>  +VMSTATE_UINT32(w0, XiveEQ),
>  +VMSTATE_UINT32(w1, XiveEQ),
>  +VMSTATE_UINT32(w2, XiveEQ),
>  +VMSTATE_UINT32(w3, XiveEQ),
>  +VMSTATE_UINT32(w4, XiveEQ),
>  +VMSTATE_UINT32(w5, XiveEQ),
>  +VMSTATE_UINT32(w6, XiveEQ),
>  +VMSTATE_UINT32(w7, XiveEQ),
> >>>
> >>> Wow.  Super descriptive field names there, but I guess that's not your 
> >>> fault.
> >>
> >> The defines in the "xive-internal.h" give a better view ... 
> >>
>  +VMSTATE_END_OF_LIST()
>  +},
>  +};
>  +
>   static bool vmstate_spapr_xive_icp_needed(void *opaque)
>   {
>   /* TODO check machine XIVE support */
>  @@ -696,6 +725,8 @@ static const VMStateDescription 
>  vmstate_spapr_xive_icp = {
>   .needed = vmstate_spapr_xive_icp_needed,
>   .fields = (VMStateField[]) {
>   VMSTATE_BUFFER(tima, sPAPRXiveICP),
>  +VMSTATE_STRUCT_ARRAY(eqt, sPAPRXiveICP, (XIVE_PRIORITY_MAX + 
>  1), 1,
>  + vmstate_spapr_xi

Re: [Qemu-devel] [PATCH 1/1] main loop: remove useless code

2017-12-03 Thread felix yao
Hi Peter Maydell:

Got it, thank you very much!

2017-12-02 19:52 GMT+08:00 Peter Maydell :

> On 2 December 2017 at 07:41, FelixYao  wrote:
> > hi Paolo Bonzini:
> >
> > Those codes seem useless, Could it be removed?
> >
> > Signed-off-by: FelixYao 
> > ---
> >  vl.c | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/vl.c b/vl.c
> > index 1ad1c04..5bed4c2 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -2995,10 +2995,6 @@ static void set_memory_options(uint64_t
> *ram_slots, ram_addr_t *maxram_size,
> >
> >  sz = QEMU_ALIGN_UP(sz, 8192);
> >  ram_size = sz;
> > -if (ram_size != sz) {
> > -error_report("ram size too large");
> > -exit(EXIT_FAILURE);
> > -}
>
> ram_size is a ramaddr_t, which may be a 32-bit variable, whereas
> sz is a uint64_t. This check is making sure that the ram size
> specified can actually fit in a ramaddr_t.
>
> thanks
> -- PMM
>


Re: [Qemu-devel] [PATCH 17/25] spapr: add a sPAPRXive object to the machine

2017-12-03 Thread David Gibson
On Fri, Dec 01, 2017 at 09:10:24AM +0100, Cédric Le Goater wrote:
> On 12/01/2017 05:14 AM, David Gibson wrote:
> > On Thu, Nov 30, 2017 at 03:15:09PM +, Cédric Le Goater wrote:
> >> On 11/30/2017 05:55 AM, David Gibson wrote:
> >>> On Thu, Nov 23, 2017 at 02:29:47PM +0100, Cédric Le Goater wrote:
>  The XIVE object is designed to be always available, so it is created
>  unconditionally on newer machines.
> >>>
> >>> There doesn't actually seem to be anything dependent on machine
> >>> version here.
> >>
> >> No. I thought that was too early in the patchset. This is handled 
> >> in the last patch with a 'xive_exploitation' bool which is set to 
> >> false on older machines. 
> >>
> >> But, nevertheless, the XIVE objects are always created even if not
> >> used. Something to discuss.
> > 
> > That'll definitely break backwards migration, since the destination
> > won't understand the (unused but still present) xive state it
> > receives. 
> 
> no because it's not sent. the vmstate 'needed' op of the sPAPRXive
> object discards it :
> 
> static bool vmstate_spapr_xive_needed(void *opaque)
> {
> sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> return spapr->xive_exploitation;
> }

Ah, sorry, missed that.  Once we have negotiation we'll need to make
sure the xive_exploitation bit is sent first, of course, but I'm
pretty sure the machine state is already sent first.

> > So xives can only be created on new machine types. 
> 
> That would be better I agree. I can probably use the 'xive_exploitation'
> bool to condition its creation.

Hrm.  I'm less sure about that - I'm not sure the lifetimes line up.
But I'd like to avoid creating them on earlier machine types, even if
xive_exploitation can't do the trick.

> 
> > I'm ok
> > (at least tentatively) with always creating them on the newer machine
> > types, regardless of whether the guest ends up exploiting it or not.
> 
> OK.
> 
> 
>  Depending on the configuration and
>  the guest capabilities, the CAS negotiation process will decide which
>  interrupt model to use, legacy or XIVE.
> 
>  The XIVE model makes use of the full range of the IRQ number space
>  because the IRQ numbers for the CPU IPIs are allocated in the range
>  below XICS_IRQ_BASE, which is unused by XICS.
> >>>
> >>> Ok.  And I take it 4096 is enough space for the XIVE IPIs for the
> >>> forseeable future?
> >>
> >> The biggest real system I am aware of as 16 sockets, 192 cores, SMT8. 
> >> That's 1536 cpus. pseries has a max_cpus of 1024.
> > 
> > Ok, so we can go to double the current system size, but not 4x.  Not
> > sure if that seems adequate or not.  Still it's a relatively minor
> > detail.
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 1/2] colo: compare the packet based on the tcp sequence number

2017-12-03 Thread Mao Zhongyi



On 12/04/2017 09:41 AM, Zhang Chen wrote:



On Tue, Nov 28, 2017 at 8:04 PM, Mao Zhongyi mailto:maozy.f...@cn.fujitsu.com>> wrote:

The primary and secondary guest has the same TCP stream, but the
the packet sizes are different due to the different fragmentation.

In the current impletation, compare the packet with the size of
payload, but packets of the same size and payload are very few,
so it triggers checkopint frequently, which leads to a very low
performance of the tcp packet comparison. In addtion, the method
of comparing the size of packet is not correct in itself.

like that:
We send this payload:
--
| header |1|2|3|4|5|6|7|8|9|0|
--

primary:
ppkt1:

| header |1|2|3|

ppkt2:

| header |4|5|6|7|8|9|0|


secondary:
spkt1:
--
| header |1|2|3|4|5|6|7|8|9|0|
--

In the original method, ppkt1 and ppkt2 are diffrent in size and
spkt1, so they can't compare and trigger the checkpoint.

I have tested FTP get 200M and 1G file many times, I found that
the performance was less than 1% of the native.

Now I reconstructed the comparison of TCP packets based on the
TCP sequence number. first of all, ppkt1 and spkt1 have the same
starting sequence number, so they can compare, even though their
length is different. And then ppkt1 with a smaller payload length
is used as the comparison length, if the payload is same, send
out the ppkt1 and record the offset(the length of ppkt1 payload)
in spkt1. The next comparison, ppkt2 and spkt1 can be compared
from the recorded position of spkt1.

like that:

| header |1|2|3| ppkt1
-|-|
 | |
-v-v--
| header |1|2|3|4|5|6|7|8|9|0| spkt1
---|\|
   | \offset |
  -v-v
  | header |4|5|6|7|8|9|0| ppkt2
  

In this way, the performance can reach native 20% in my multiple
tests.

Cc: Zhang Chen mailto:zhangc...@gmail.com>>
Cc: Li Zhijian mailto:lizhij...@cn.fujitsu.com>>
Cc: Jason Wang mailto:jasow...@redhat.com>>

Reported-by: Zhang Chen mailto:zhangc...@gmail.com>>
Signed-off-by: Mao Zhongyi mailto:maozy.f...@cn.fujitsu.com>>
Signed-off-by: Li Zhijian mailto:lizhij...@cn.fujitsu.com>>
---
 net/colo-compare.c | 300 
+
 net/colo.c |   8 ++
 net/colo.h |  14 +++
 3 files changed, 211 insertions(+), 111 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 1ce195f..0752e9f 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -38,6 +38,9 @@
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024

+#define COLO_COMPARE_FREE_PRIMARY 0x01
+#define COLO_COMPARE_FREE_SECONDARY   0x02
+
 /* TODO: Should be configurable */
 #define REGULAR_PACKET_CHECK_MS 3000

@@ -112,14 +115,31 @@ static gint seq_sorter(Packet *a, Packet *b, gpointer 
data)
 return ntohl(atcp->th_seq) - ntohl(btcp->th_seq);
 }

+static void fill_pkt_seq(void *data, uint32_t *max_ack)
+{
+Packet *pkt = data;
+struct tcphdr *tcphd;
+
+tcphd = (struct tcphdr *)pkt->transport_header;
+
+pkt->tcp_seq = ntohl(tcphd->th_seq);
+pkt->tcp_ack = ntohl(tcphd->th_ack);
+*max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack;
+pkt->hdsize = pkt->transport_header - (uint8_t *)pkt->data
+  + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
+pkt->pdsize = pkt->size - pkt->hdsize;



In this function you are not just "fill_pkt_seq", use "fill_pkt_tcp_info" is 
more suitable.
And use "header_size" and "payload_size" instead of "hdsize" and "pdsize" maybe 
better to read.


OK, it's greater, thanks.





+pkt->seq_end = pkt->tcp_seq + pkt->pdsize;
+}
+
 /*
  * Return 1 on success, if return 0 means the
  * packet will be dropped
  */
-static int colo_insert_packet(GQueue *queue, Packet *pkt)
+static int colo_insert_packet(GQueue *queue, Packet *pkt, uint32_t 
*max_ack)
 {
 if (g_queue_get_length(queue) <= MAX_QUEUE_SIZE) {
 if (pkt->ip->ip_p == IPPROTO_TCP) {
+fill_pkt_seq(pkt, max_ack);
 g_queue_insert_sorted(queue,
   pkt,
 

Re: [Qemu-devel] [PATCH 2/2] colo: add trace for the tcp packet comparison

2017-12-03 Thread Mao Zhongyi



On 12/04/2017 09:53 AM, Zhang Chen wrote:



On Tue, Nov 28, 2017 at 8:04 PM, Mao Zhongyi mailto:maozy.f...@cn.fujitsu.com>> wrote:

Cc: Zhang Chen mailto:zhangc...@gmail.com>>
Cc: Li Zhijian mailto:lizhij...@cn.fujitsu.com>>
Cc: Jason Wang mailto:jasow...@redhat.com>>

Signed-off-by: Mao Zhongyi mailto:maozy.f...@cn.fujitsu.com>>
---
 net/colo-compare.c | 16 
 net/colo.c |  1 +
 net/colo.h |  1 +
 net/trace-events   |  2 +-
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 0752e9f..4c0a1d8 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -129,6 +129,7 @@ static void fill_pkt_seq(void *data, uint32_t *max_ack)
   + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
 pkt->pdsize = pkt->size - pkt->hdsize;
 pkt->seq_end = pkt->tcp_seq + pkt->pdsize;
+pkt->flags = tcphd->th_flags;
 }

 /*
@@ -337,6 +338,16 @@ sec:
 }

 if (colo_mark_tcp_pkt(ppkt, spkt, &mark, max_ack)) {
+trace_colo_compare_tcp_info("pri",
+ppkt->tcp_seq, ppkt->tcp_ack,
+ppkt->hdsize, ppkt->pdsize,
+ppkt->offset, ppkt->flags);
+
+trace_colo_compare_tcp_info("sec",
+spkt->tcp_seq, spkt->tcp_ack,
+spkt->hdsize, spkt->pdsize,
+spkt->offset, spkt->flags);
+
 if (mark == COLO_COMPARE_FREE_PRIMARY) {
 conn->compare_seq = ppkt->seq_end;
 colo_release_primary_pkt(s, ppkt);
@@ -355,6 +366,11 @@ sec:
 goto pri;
 }
 } else {
+qemu_hexdump((char *)ppkt->data, stderr,
+ "colo-compare ppkt", ppkt->size);
+qemu_hexdump((char *)spkt->data, stderr,
+ "colo-compare spkt", spkt->size);
+
 g_queue_push_head(&conn->primary_list, ppkt);
 g_queue_push_head(&conn->secondary_list, spkt);

diff --git a/net/colo.c b/net/colo.c
index 1743522..0b469f2 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -171,6 +171,7 @@ Packet *packet_new(const void *data, int size, int 
vnet_hdr_len)
 pkt->hdsize = 0;
 pkt->pdsize = 0;
 pkt->offset = 0;
+pkt->flags = 0;

 return pkt;
 }
diff --git a/net/colo.h b/net/colo.h
index 97bc41e..0530dd0 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -53,6 +53,7 @@ typedef struct Packet {
 uint16_t pdsize; /* the payload length */
 /* record the payload offset(the length that has been compared) */
 uint16_t offset;
+uint8_t flags; /* Flags(aka Control bits) */
 } Packet;

 typedef struct ConnectionKey {
diff --git a/net/trace-events b/net/trace-events
index 938263d..7b594cf 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -13,7 +13,7 @@ colo_compare_icmp_miscompare(const char *sta, int size) ": %s 
= %d"
 colo_compare_ip_info(int psize, const char *sta, const char *stb, int ssize, const 
char *stc, const char *std) "ppkt size = %d, ip_src = %s, ip_dst = %s, spkt size = 
%d, ip_src = %s, ip_dst = %s"
 colo_old_packet_check_found(int64_t old_time) "%" PRId64
 colo_compare_miscompare(void) ""
-colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int res, uint32_t 
flag, int size) "side: %s seq/ack= %u/%u res= %d flags= 0x%x pkt_size: %d\n"
+colo_compare_tcp_info(const char *pkt, uint32_t seq, uint32_t ack, int hdlen, int 
pdlen, int offset, int flags) "%s: seq/ack= %u/%u hdlen= %d pdlen= %d offset= %d 
flags=%d\n"



In patch 1/2 you have removed where they have been used, so you should remove 
this definition in patch 1 firstly.


Well, Thanks for the clarification, I will fix it in the next.

Thanks,
Mao




Thanks
Zhang Chen



 # net/filter-rewriter.c
 colo_filter_rewriter_debug(void) ""
--
2.9.4










Re: [Qemu-devel] [PATCH v18 07/10] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-03 Thread Wei Wang

On 12/01/2017 11:38 PM, Michael S. Tsirkin wrote:

On Wed, Nov 29, 2017 at 09:55:23PM +0800, Wei Wang wrote:

+static void send_one_desc(struct virtio_balloon *vb,
+ struct virtqueue *vq,
+ uint64_t addr,
+ uint32_t len,
+ bool inbuf,
+ bool batch)
+{
+   int err;
+   unsigned int size;
+
+   /* Detach all the used buffers from the vq */
+   while (virtqueue_get_buf(vq, &size))
+   ;
+
+   err = virtqueue_add_one_desc(vq, addr, len, inbuf, vq);
+   /*
+* This is expected to never fail: there is always at least 1 entry
+* available on the vq, because when the vq is full the worker thread
+* that adds the desc will be put into sleep until at least 1 entry is
+* available to use.
+*/
+   BUG_ON(err);
+
+   /* If batching is requested, we batch till the vq is full */
+   if (!batch || !vq->num_free)
+   kick_and_wait(vq, vb->acked);
+}
+

This internal kick complicates callers. I suggest that instead,
you move this to callers, just return a "kick required" boolean.
This way callers do not need to play with num_free at all.


Then in what situation would the function return true of "kick required"?

I think this wouldn't make a difference fundamentally. For example, we 
have 257 sgs (batching size=256) to send to host:


while (i < 257) {
kick_required = send_sgs();
if (kick_required)
kick(); // After the 256 sgs have been added, the caller 
performs a kick().

}

Do we still need a kick here for the 257th sg as before? Only the caller 
knows if the last added sgs need a kick (when the send_sgs receives one 
sg, it doesn't know if there are more to come).


There is another approach to checking if the last added sgs haven't been 
sync-ed to the host: expose "vring_virtqueue->num_added" to the caller 
via a virtio_ring API:


unsigned int virtqueue_num_added(struct virtqueue *_vq)
   {
struct vring_virtqueue *vq = to_vvq(_vq);

return vq->num_added;
  }




+/*
+ * Send balloon pages in sgs to host. The balloon pages are recorded in the
+ * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE.
+ * The page xbitmap is searched for continuous "1" bits, which correspond
+ * to continuous pages, to chunk into sgs.
+ *
+ * @page_xb_start and @page_xb_end form the range of bits in the xbitmap that
+ * need to be searched.
+ */
+static void tell_host_sgs(struct virtio_balloon *vb,
+ struct virtqueue *vq,
+ unsigned long page_xb_start,
+ unsigned long page_xb_end)
+{
+   unsigned long pfn_start, pfn_end;
+   uint64_t addr;
+   uint32_t len, max_len = round_down(UINT_MAX, PAGE_SIZE);
+
+   pfn_start = page_xb_start;
+   while (pfn_start < page_xb_end) {
+   pfn_start = xb_find_next_set_bit(&vb->page_xb, pfn_start,
+page_xb_end);
+   if (pfn_start == page_xb_end + 1)
+   break;
+   pfn_end = xb_find_next_zero_bit(&vb->page_xb,
+   pfn_start + 1,
+   page_xb_end);
+   addr = pfn_start << PAGE_SHIFT;
+   len = (pfn_end - pfn_start) << PAGE_SHIFT;

This assugnment can overflow. Next line compares with UINT_MAX but by
that time it is too late.  I think you should do all math in 64 bit to
avoid surprises, then truncate to max_len and then it's safe to assign
to sg.


Sounds reasonable, thanks.



+
+   xb_clear_bit_range(&vb->page_xb, page_xb_start, page_xb_end);
+}
+
+static inline int xb_set_page(struct virtio_balloon *vb,
+  struct page *page,
+  unsigned long *pfn_min,
+  unsigned long *pfn_max)
+{
+   unsigned long pfn = page_to_pfn(page);
+   int ret;
+
+   *pfn_min = min(pfn, *pfn_min);
+   *pfn_max = max(pfn, *pfn_max);
+
+   do {
+   ret = xb_preload_and_set_bit(&vb->page_xb, pfn,
+GFP_NOWAIT | __GFP_NOWARN);
+   } while (unlikely(ret == -EAGAIN));

what exactly does this loop do? Does this wait
forever until there is some free memory? why GFP_NOWAIT?


Basically, "-EAGAIN" is returned from xb_set_bit() in the case when the 
pre-allocated per-cpu ida_bitmap is NULL. In that case, the caller 
re-invokes xb_preload_and_set_bit(), which re-invokes xb_preload to 
allocate ida_bitmap. So "-EAGAIN" actually does not indicate a status 
about memory allocation. "-ENOMEM" is the one to indicate the failure of 
memory allocation, but the loop doesn't re-try on "-ENOMEM".


GFP_NOWAIT is used to avoid memory reclaiming, which could cause the 
deadlock issue we discussed before.






   

[Qemu-devel] [PULL 0/3] ppc-for-2.11 queue 20171204

2017-12-03 Thread David Gibson
The following changes since commit c11d61271b9e6e7a1f0479ef1ca8fb55fa457a62:

  Update version for v2.11.0-rc3 release (2017-11-29 17:59:34 +)

are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-2.11-20171204

for you to fetch changes up to 768a20f3a491ed4afce73ebb65347d55251c0ebd:

  spapr: Include "pre-plugged" DIMMS in ram size calculation at reset 
(2017-12-04 11:31:22 +1100)


ppc patch queue 2017-12-04

We are, alas, not yet to the bottom of ppc bugs.  This pull request
fixes several more.  I believe they're important enough to include in
2.11. despite the late date.


David Gibson (1):
  spapr: Include "pre-plugged" DIMMS in ram size calculation at reset

Kurban Mallachiev (1):
  target-ppc: Don't invalidate non-supported msr bits

Laurent Vivier (1):
  pseries: fix TCG migration

 hw/ppc/spapr.c   | 7 +--
 target/ppc/machine.c | 4 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)



[Qemu-devel] [PULL 2/3] target-ppc: Don't invalidate non-supported msr bits

2017-12-03 Thread David Gibson
From: Kurban Mallachiev 

The msr invalidation code (commits 993eb and 2360b) inverts all
bits except MSR_TGPR and MSR_HVB. On non PowerPC 601 processors
this leads to incorrect change of excp_prefix in hreg_store_msr()
function. The problem is that new msr value get multiplied by msr_mask
and inverted msr does not, thus values of MSR_EP bit in new msr value
and inverted msr are distinct, so that excp_prefix changes but should
not.

Signed-off-by: Kurban Mallachiev 
Signed-off-by: David Gibson 
---
 target/ppc/machine.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 24117e8f31..e475206c6a 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -300,9 +300,9 @@ static int cpu_post_load(void *opaque, int version_id)
 ppc_store_sdr1(env, env->spr[SPR_SDR1]);
 }
 
-/* Invalidate all msr bits except MSR_TGPR/MSR_HVB before restoring */
+/* Invalidate all supported msr bits except MSR_TGPR/MSR_HVB before 
restoring */
 msr = env->msr;
-env->msr ^= ~((1ULL << MSR_TGPR) | MSR_HVB);
+env->msr ^= env->msr_mask & ~((1ULL << MSR_TGPR) | MSR_HVB);
 ppc_store_msr(env, msr);
 
 hreg_compute_mem_idx(env);
-- 
2.14.3




[Qemu-devel] [PULL 1/3] pseries: fix TCG migration

2017-12-03 Thread David Gibson
From: Laurent Vivier 

Migration of pseries is broken with TCG because
QEMU tries to restore KVM MMU state unconditionally.

The result is a SIGSEGV in kvm_vm_ioctl():

  #0  kvm_vm_ioctl (s=0x0, type=-2146390353)
  at qemu/accel/kvm/kvm-all.c:2032
  #1  0x0001003e3e2c in kvmppc_configure_v3_mmu (cpu=,
  radix=, gtse=, proc_tbl=)
  at qemu/target/ppc/kvm.c:396
  #2  0x0001002f8b88 in spapr_post_load (opaque=0x1019103c0,
  version_id=) at qemu/hw/ppc/spapr.c:1578
  #3  0x00010059e4cc in vmstate_load_state (f=0x10623,
  vmsd=0x1009479e0 , opaque=0x1019103c0,
  version_id=) at qemu/migration/vmstate.c:165
  #4  0x0001005987e0 in vmstate_load (f=, se=)
  at qemu/migration/savevm.c:748

This patch fixes the problem by not calling the KVM function with the
TCG mode.

Fixes: d39c90f5f3 ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier 
Reviewed-by: Suraj Jitindar Singh 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9efddeaee5..a471de6cab 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1570,7 +1570,7 @@ static int spapr_post_load(void *opaque, int version_id)
 err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset);
 }
 
-if (spapr->patb_entry) {
+if (kvm_enabled() && spapr->patb_entry) {
 PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
 bool radix = !!(spapr->patb_entry & PATBE1_GR);
 bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE);
-- 
2.14.3




[Qemu-devel] [PULL 3/3] spapr: Include "pre-plugged" DIMMS in ram size calculation at reset

2017-12-03 Thread David Gibson
At guest reset time, we allocate a hash page table (HPT) for the guest
based on the guest's RAM size.  If dynamic HPT resizing is not available we
use the maximum RAM size, if it is we use the current RAM size.

But the "current RAM size" calculation is incorrect - we just use the
"base" ram_size from the machine structure.  This doesn't include any
pluggable DIMMs that are already plugged at reset time.

This means that if you try to start a 'pseries' machine with a DIMM
specified on the command line that's much larger than the "base" RAM size,
then the guest will get a woefully inadequate HPT.  This can lead to a
guest freeze during boot as it runs out of HPT space during initial MMU
setup.

Signed-off-by: David Gibson 
Reviewed-by: Greg Kurz 
Tested-by: Greg Kurz 
---
 hw/ppc/spapr.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a471de6cab..1ac7eb0f8c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1386,7 +1386,10 @@ void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr)
 && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
 hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
 } else {
-hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->ram_size);
+uint64_t current_ram_size;
+
+current_ram_size = MACHINE(spapr)->ram_size + 
get_plugged_memory_size();
+hpt_shift = spapr_hpt_shift_for_ramsize(current_ram_size);
 }
 spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
 
-- 
2.14.3




Re: [Qemu-devel] [PATCH v1 2/2] intel-iommu: Extend address width to 48 bits

2017-12-03 Thread Peter Xu
On Fri, Dec 01, 2017 at 09:02:30AM -0800, Prasad Singamsetty wrote:

[...]

> 
> > 
> > And... you may also need to create that PC_COMPAT_2_11 macro after
> > 2.11 is released.  For that you can refer to a6fd5b0e050a.
> > 
> > Anyway, I think this "set default" work can be postponed after recent
> > release, which can be a separate work besides current series.
> 
> OK. To be clear, are you suggesting that we can change the default
> value to 48 bits as a separate patch and not include it with the
> current patch set?

Yes.  Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [for-2.12 3/7] pci: Fold pci_bus.h into pci.h

2017-12-03 Thread David Gibson
On Sun, Dec 03, 2017 at 07:07:36AM +0200, Michael S. Tsirkin wrote:
> On Sat, Dec 02, 2017 at 11:59:20AM +1100, David Gibson wrote:
> > On Fri, Dec 01, 2017 at 06:29:39PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 30, 2017 at 03:02:48PM +1100, David Gibson wrote:
> > > > On Wed, Nov 29, 2017 at 12:38:00PM +0200, Marcel Apfelbaum wrote:
> > > > > On 29/11/2017 10:46, David Gibson wrote:
> > > > > > include/hw/pci/pci_bus.h is now very small and can only safely be 
> > > > > > included
> > > > > > after hw/pci/pci.h.  So, just fold it into pci.h.
> > > > > > 
> > > > > 
> > > > > I don't get the benefit from merging the header files.
> > > > > I would go the other way around and find stuff specific
> > > > > to pci_bus and add it there, like the pci_bus_new*
> > > > > you touched in the prev patch.
> > > > 
> > > > Hrm.  Except the point of the earlier patch was that those are
> > > > actually spoecific to root buses, so would really belong in say
> > > > pci-host.h, rather than pci-bus.h.
> > > > 
> > > > A log of PCI stuff deals with interaction between the device and bus
> > > > though, so it just seemed like more trouble than it was worth to go
> > > > disentangling them properly.
> > > > 
> > > > > Maybe if *all* code files requiring pci.h would automatically
> > > > > need pci_bus.h would make sense, but I didn't check that.
> > > > 
> > > > Yeah, I don't think every user of pci.h needs pci_bus.h, although a
> > > > fair few do as you can see from the diff.  Well, I guess it's up to
> > > > Michael.  I'll be tolerably content either way - I'd say this is the
> > > > least important patch of the series.
> > > 
> > > I'm inclined to agree with Marcel also because it's lots of noise for no
> > > real win.  The next patch depends on this one. I skipped this and next
> > > one, pls feel free to repost next one.
> > 
> > Ok, will do.  Is the tree you've merged the others to public
> > somewhere, so I can rebase on it?
> 
> git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git branch is named
> pci

Thanks.

> pls note this is a rebased branch, commit IDs won't be stable.

Understood.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v4 30/32] migration: delay the postcopy-active state switch

2017-12-03 Thread Peter Xu
On Fri, Dec 01, 2017 at 12:34:32PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Switch the state until we try to start the VM on destination side.  The
> > problem is that without doing this we may have a very small window that
> > we'll be in such a state:
> > 
> > - dst VM is in postcopy-active state,
> > - main thread is handling MIG_CMD_PACKAGED message, which loads all the
> >   device states,
> > - ram load thread is reading memory data from source.
> > 
> > Then if we failed at this point when reading the migration stream we'll
> > also switch to postcopy-paused state, but that is not what we want.  If
> > device states failed to load, we should fail the migration directly
> > instead of pause.
> > 
> > Postponing the state switch to the point when we have already loaded the
> > devices' states and been ready to start running destination VM.
> > 
> > Signed-off-by: Peter Xu 
> 
> If it's the only way, then this is OK; but I'd prefer you use a separate
> flag somewhere to let you know this, because this means that
> POSTCOPY_ACTIVE on the destination happens at a different point that it
> does on the source (and changing it on the source I think will break
> lots of things).

Yes, I thought it was fine to postpone it a bit to the point even
after receiving the packed data, but I fully understand your point.

> Can't you use the PostcopyState value and check if it's in
> POSTCOPY_INCOMING_RUNNING?

I think, yes. :)

Let me drop this patch, instead I'll check explicitly on
POSTCOPY_INCOMING_RUNNING state in patch 6.  Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v4 31/32] migration, qmp: new command "migrate-pause"

2017-12-03 Thread Peter Xu
On Fri, Dec 01, 2017 at 04:53:28PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > It is used to manually trigger the postcopy pause state.  It works just
> > like when we found the migration stream failed during postcopy, but
> > provide an explicit way for user in case of misterious socket hangs.
> > 
> > Signed-off-by: Peter Xu 
> 
> Can we change the name to something like 'migrate-disconnect' - pause
> is a bit easy to confuse with other things and this is really more
> an explicit network disconnect (Is it worth just making it a flag to
> migrate-cancel?)

Then I would prefer to reuse the migrate_cancel command.  

Actually this reminded me about what would happen now if someone on
src VM sends a "migrate_cancel" during postcopy active.  It should
crash the VM, right?

Considering above, I'm thinking whether we should just make it a
default behavior that when do migrate_cancel during postcopy-active we
just do a pause instead of real cancel. After all it cannot re-start
the VM any more on source, so IMHO a real cancel does not mean much
here.  More importantly, what if someone wants to manually trigger
this pause but accidentally forgot to type that new flag (say,
-D[isconnect])?  It'll crash the VM directly.

What do you think?

> 
> 
> > ---
> >  migration/migration.c | 18 ++
> >  qapi/migration.json   | 22 ++
> >  2 files changed, 40 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 536a771803..30348a5e27 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1485,6 +1485,24 @@ void qmp_migrate_incoming(const char *uri, Error 
> > **errp)
> >  once = false;
> >  }
> >  
> > +void qmp_migrate_pause(Error **errp)
> > +{
> > +int ret;
> > +MigrationState *ms = migrate_get_current();
> > +
> > +if (ms->state != MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > +error_setg(errp, "Migration pause is currently only allowed during"
> > +   " an active postcopy phase.");
> > +return;
> > +}
> > +
> > +ret = qemu_file_shutdown(ms->to_dst_file);
> > +
> > +if (ret) {
> > +error_setg(errp, "Failed to pause migration stream.");
> > +}
> > +}
> > +
> >  bool migration_is_blocked(Error **errp)
> >  {
> >  if (qemu_savevm_state_blocked(errp)) {
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 4a3eff62f1..52901f7e2e 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1074,6 +1074,28 @@
> >  { 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
> >  
> >  ##
> > +# @migrate-pause:
> > +#
> > +# Pause an migration.  Currently it can only pause a postcopy
> > +# migration.  Pausing a precopy migration is not supported yet.
> > +#
> > +# It is mostly used as a manual way to trigger the postcopy paused
> > +# state when the network sockets hang due to some reason, so that we
> > +# can try a recovery afterward.
> 
> Can we say this explicitly;
> 'Force closes the migration connection to trigger the postcopy paused
>  state when the network sockets hang due to some reason, so that we
> can try a recovery afterwards'

Sure!  I'll just see where I should properly put these sentences.

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [for-2.12 3/7] pci: Fold pci_bus.h into pci.h

2017-12-03 Thread David Gibson
On Fri, Dec 01, 2017 at 06:29:39PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 30, 2017 at 03:02:48PM +1100, David Gibson wrote:
> > On Wed, Nov 29, 2017 at 12:38:00PM +0200, Marcel Apfelbaum wrote:
> > > On 29/11/2017 10:46, David Gibson wrote:
> > > > include/hw/pci/pci_bus.h is now very small and can only safely be 
> > > > included
> > > > after hw/pci/pci.h.  So, just fold it into pci.h.
> > > > 
> > > 
> > > I don't get the benefit from merging the header files.
> > > I would go the other way around and find stuff specific
> > > to pci_bus and add it there, like the pci_bus_new*
> > > you touched in the prev patch.
> > 
> > Hrm.  Except the point of the earlier patch was that those are
> > actually spoecific to root buses, so would really belong in say
> > pci-host.h, rather than pci-bus.h.
> > 
> > A log of PCI stuff deals with interaction between the device and bus
> > though, so it just seemed like more trouble than it was worth to go
> > disentangling them properly.
> > 
> > > Maybe if *all* code files requiring pci.h would automatically
> > > need pci_bus.h would make sense, but I didn't check that.
> > 
> > Yeah, I don't think every user of pci.h needs pci_bus.h, although a
> > fair few do as you can see from the diff.  Well, I guess it's up to
> > Michael.  I'll be tolerably content either way - I'd say this is the
> > least important patch of the series.
> 
> I'm inclined to agree with Marcel also because it's lots of noise for no
> real win.  The next patch depends on this one. I skipped this and next
> one, pls feel free to repost next one.

Ok, will do.  I have some other cleanups in mind, so if I can finish
them time, I'll fold them in as well.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v18 10/10] virtio-balloon: don't report free pages when page poisoning is enabled

2017-12-03 Thread Wei Wang

On 12/01/2017 11:49 PM, Michael S. Tsirkin wrote:

On Wed, Nov 29, 2017 at 09:55:26PM +0800, Wei Wang wrote:

The guest free pages should not be discarded by the live migration thread
when page poisoning is enabled with PAGE_POISONING_NO_SANITY=n, because
skipping the transfer of such poisoned free pages will trigger false
positive when new pages are allocated and checked on the destination.
This patch skips the reporting of free pages in the above case.

Reported-by: Michael S. Tsirkin 
Signed-off-by: Wei Wang 
Cc: Michal Hocko 
---
  drivers/virtio/virtio_balloon.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 035bd3a..6ac4cff 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -652,7 +652,9 @@ static void report_free_page(struct work_struct *work)
/* Start by sending the obtained cmd id to the host with an outbuf */
send_one_desc(vb, vb->free_page_vq, virt_to_phys(&vb->start_cmd_id),
  sizeof(uint32_t), false, true, false);
-   walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages);
+   if (!(page_poisoning_enabled() &&
+   !IS_ENABLED(CONFIG_PAGE_POISONING_NO_SANITY)))
+   walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages);
/*
 * End by sending the stop id to the host with an outbuf. Use the
 * non-batching mode here to trigger a kick after adding the stop id.

PAGE_POISONING_ZERO is actually OK.


I think the 0-filled pages still need to be sent. If the host on the 
destination doesn't use PAGE_POISONING_ZERO, then the pages it offers to 
the guest on the destination may have non-0 values.






But I really would prefer it that we still send pages to host,
otherwise debugging becomes much harder.

And it does not have to be completely useless, even though
you can not discard them as they would be zero-filled then.

How about a config field telling host what should be there in the free
pages? This way even though host can not discard them, host can send
them out without reading them, still a win.




OK, but I think we would need two 32-bit config registers:

__u32 page_poison_val; // stores the PAGE_POISON VALUE, but it couldn't 
indicate if page poisoning is in use


__u32 special_features; // set bit 0 to indicate page poisoning is in use

#define VIRTIO_BALLOON_SF_PAGE_POISON (1 << 0)


Best,
Wei







Re: [Qemu-devel] [PATCH v3 3/3] nvdimm: add 'unarmed' option

2017-12-03 Thread Haozhong Zhang
On 12/01/17 10:44 +, Stefan Hajnoczi wrote:
> On Mon, Nov 27, 2017 at 12:35:17PM +0800, Haozhong Zhang wrote:
> > @@ -312,6 +315,10 @@ nvdimm_build_structure_memdev(GArray *structures, 
> > DeviceState *dev)
> >  
> >  /* Only one interleave for PMEM. */
> >  nfit_memdev->interleave_ways = cpu_to_le16(1);
> > +
> > +if (nvdimm->unarmed) {
> > +nfit_memdev->flags |= ACPI_NFIT_MEM_NOT_ARMED;
> 
> Missing cpu_to_le16()?
> 
> > +static void nvdimm_set_unarmed(Object *obj, bool value, Error **errp)
> > +{
> > +NVDIMMDevice *nvdimm = NVDIMM(obj);
> > +Error *local_err = NULL;
> > +
> > +if (memory_region_size(&nvdimm->nvdimm_mr)) {
> > +error_setg(&local_err, "cannot change property value");
> 
> This error message does not explain why the value cannot be changed.  I
> guess this is checking if the device has been realized.
> 
> Please call qdev_prop_set_after_realize() instead:
> 
>   if (memory_region_size(&nvdimm->nvdimm_mr)) {
>   qdev_prop_set_after_realize(DEVICE(obj), "unarmed", &local_err);
>   goto out;
>   }

I'll change in the next version.

Thanks,
Haozhong




Re: [Qemu-devel] [PATCH v1] spapr.c: Update qemu's maxcpus for pseries machine.

2017-12-03 Thread seeteena

Thanks David for the explanation.

I thought the changes been into upstream code for pseries to limit

maxcpus to 240 instead of 1024.



On 12/01/2017 06:07 PM, David Gibson wrote:

On Fri, Dec 01, 2017 at 04:54:09PM +0530, Seeteena Thoufeek wrote:

Need to adjust the max cpus supported number from error message since
it was conflicting with KVM's.

Steps to Reproduce:
1.boot up with
"-smp 64,maxcpus=102464,cores=8,threads=1,sockets=8"

qemu-kvm: Number of SMP CPUs requested (102464) exceeds max CPUs
supported by machine 'pseries-rhel7.4.0alt' (1024)

2. On KVM machine it shows

boot up with
"-m 6G,maxmem=300G,slots=256 -smp 64,maxcpus=1024,cores=8,threads=1
,sockets=128"

Number of hotpluggable cpus requested (1024) exceeds the maximum cpus
  supported by KVM (240)

It seemed that 1024 was useless since KVM only support 240 so far.
Hence,we need to adjust it to an reasonable value 240.

Signed-off-by: Seeteena Thoufeek 

This has been written without adequate thought and investigation.

First, in upstream code there's nothing wrong with having different
limits in qemu and kernel - the can both vary depending on various
factors.  Secondly, the 240 limit doesn't come from the kernel
upstream - only in RHEL.

Third, it doesn't even come from the kernel in RHEL - the downstream
qemu lies about the kernel limit.

I'm in the process of fixing this correctly, I expect to have
something on Monday.


---
  hw/ppc/spapr.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9efddea..c753254 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3691,6 +3691,7 @@ static const TypeInfo spapr_machine_info = {
  if (latest) {\
  mc->alias = "pseries";   \
  mc->is_default = 1;  \
+mc->max_cpus = 240;  \
  }\
  }\
  static void spapr_machine_##suffix##_instance_init(Object *obj)  \





[Qemu-devel] Guest Reboot After Plug HDMI

2017-12-03 Thread Lying
Hello, Everybody. I encounter a problem when i using my guest.I booting my 
guest without HDMI primarily, Then i add it, but my guest is be rebooted.To 
know what cause it, i do it again, especially i check my guest is running 
normal or not with "ping" before plugging, that's well.What i was suffer? can 
you help me solve it?


Re: [Qemu-devel] [PATCH 1/2] colo: compare the packet based on the tcp sequence number

2017-12-03 Thread Zhang Chen
On Mon, Dec 4, 2017 at 3:32 AM, Mao Zhongyi 
wrote:

>
>
> On 12/04/2017 09:41 AM, Zhang Chen wrote:
>
>>
>>
>> On Tue, Nov 28, 2017 at 8:04 PM, Mao Zhongyi > > wrote:
>>
>> The primary and secondary guest has the same TCP stream, but the
>> the packet sizes are different due to the different fragmentation.
>>
>> In the current impletation, compare the packet with the size of
>> payload, but packets of the same size and payload are very few,
>> so it triggers checkopint frequently, which leads to a very low
>> performance of the tcp packet comparison. In addtion, the method
>> of comparing the size of packet is not correct in itself.
>>
>> like that:
>> We send this payload:
>> --
>> | header |1|2|3|4|5|6|7|8|9|0|
>> --
>>
>> primary:
>> ppkt1:
>> 
>> | header |1|2|3|
>> 
>> ppkt2:
>> 
>> | header |4|5|6|7|8|9|0|
>> 
>>
>> secondary:
>> spkt1:
>> --
>> | header |1|2|3|4|5|6|7|8|9|0|
>> --
>>
>> In the original method, ppkt1 and ppkt2 are diffrent in size and
>> spkt1, so they can't compare and trigger the checkpoint.
>>
>> I have tested FTP get 200M and 1G file many times, I found that
>> the performance was less than 1% of the native.
>>
>> Now I reconstructed the comparison of TCP packets based on the
>> TCP sequence number. first of all, ppkt1 and spkt1 have the same
>> starting sequence number, so they can compare, even though their
>> length is different. And then ppkt1 with a smaller payload length
>> is used as the comparison length, if the payload is same, send
>> out the ppkt1 and record the offset(the length of ppkt1 payload)
>> in spkt1. The next comparison, ppkt2 and spkt1 can be compared
>> from the recorded position of spkt1.
>>
>> like that:
>> 
>> | header |1|2|3| ppkt1
>> -|-|
>>  | |
>> -v-v--
>> | header |1|2|3|4|5|6|7|8|9|0| spkt1
>> ---|\|
>>| \offset |
>>   -v-v
>>   | header |4|5|6|7|8|9|0| ppkt2
>>   
>>
>> In this way, the performance can reach native 20% in my multiple
>> tests.
>>
>> Cc: Zhang Chen mailto:zhangc...@gmail.com>>
>> Cc: Li Zhijian > lizhij...@cn.fujitsu.com>>
>> Cc: Jason Wang mailto:jasow...@redhat.com>>
>>
>> Reported-by: Zhang Chen > zhangc...@gmail.com>>
>> Signed-off-by: Mao Zhongyi > maozy.f...@cn.fujitsu.com>>
>> Signed-off-by: Li Zhijian > lizhij...@cn.fujitsu.com>>
>>
>> ---
>>  net/colo-compare.c | 300 ++
>> +++
>>  net/colo.c |   8 ++
>>  net/colo.h |  14 +++
>>  3 files changed, 211 insertions(+), 111 deletions(-)
>>
>> diff --git a/net/colo-compare.c b/net/colo-compare.c
>> index 1ce195f..0752e9f 100644
>> --- a/net/colo-compare.c
>> +++ b/net/colo-compare.c
>> @@ -38,6 +38,9 @@
>>  #define COMPARE_READ_LEN_MAX NET_BUFSIZE
>>  #define MAX_QUEUE_SIZE 1024
>>
>> +#define COLO_COMPARE_FREE_PRIMARY 0x01
>> +#define COLO_COMPARE_FREE_SECONDARY   0x02
>> +
>>  /* TODO: Should be configurable */
>>  #define REGULAR_PACKET_CHECK_MS 3000
>>
>> @@ -112,14 +115,31 @@ static gint seq_sorter(Packet *a, Packet *b,
>> gpointer data)
>>  return ntohl(atcp->th_seq) - ntohl(btcp->th_seq);
>>  }
>>
>> +static void fill_pkt_seq(void *data, uint32_t *max_ack)
>> +{
>> +Packet *pkt = data;
>> +struct tcphdr *tcphd;
>> +
>> +tcphd = (struct tcphdr *)pkt->transport_header;
>> +
>> +pkt->tcp_seq = ntohl(tcphd->th_seq);
>> +pkt->tcp_ack = ntohl(tcphd->th_ack);
>> +*max_ack = *max_ack > pkt->tcp_ack ? *max_ack : pkt->tcp_ack;
>> +pkt->hdsize = pkt->transport_header - (uint8_t *)pkt->data
>> +  + (tcphd->th_off << 2) - pkt->vnet_hdr_len;
>> +pkt->pdsize = pkt->size - pkt->hdsize;
>>
>>
>>
>> In this function you are not just "fill_pkt_seq", use "fill_pkt_tcp_info"
>> is more suitable.
>> And use "header_size" and "payload_size" instead of "hdsize" and "pdsize"
>> maybe better to read.
>>
>
> OK, it's greater, thanks.
>
>
>
>>
>>
>> +pkt->seq_end = pkt->tcp_seq + pkt->pdsize;
>> +}
>> +
>>  /*
>>   * Return 1 on success, if return 0 means the
>>   * packet will be dropped
>>   */
>> -s

[Qemu-devel] [Bug 1736042] Re: qemu-system-x86_64 does not boot image reliably

2017-12-03 Thread Thomas Huth
Please always use the latest version of QEMU (currently v2.10, soon
v2.11) when reporting bugs to the upstream QEMU project - we don't
support old versions like v2.5 any more. So I guess you wanted to report
a bug agains v2.5 in Ubuntu instead and I changed the target of this bug
accordingly.

** Project changed: qemu => qemu (Ubuntu)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1736042

Title:
  qemu-system-x86_64 does not boot image reliably

Status in qemu package in Ubuntu:
  New

Bug description:
  Booting image as root user with following command works randomly.

  ./qemu-system-x86_64 -enable-kvm -curses -smp cpus=4 -m 8192
  /root/ructfe2917_g.qcow2

  Most of the time it ends up on "800x600 Graphic mode"(been stuck there
  even for 4 hours before killed), but 1 out of ~20 it boots image
  correctly(and instantly).

  This is visible in v2.5.0 build from sources, v2.5.0 from Ubuntu
  Xenial and v2.1.2 from Debian Jessie.

  The image in question was converted from vmdk using:

  qemu-img convert -O qcow2 file.vmdk file.qcow2

  The image contains Ubuntu with grub.

  I can provide debug logs, but will need guidance how to enable
  them(and what logs are necessary).

  As a side note, it seems that booting is more certain after
  connecting(or mounting) partition using qemu-nbd/mount.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1736042/+subscriptions