[PATCH 5/5] hw/arm/smmuv3: Report F_STE_FETCH fault address in correct word position

2019-12-04 Thread Simon Veith
The smmuv3_record_event() function that generates the F_STE_FETCH error
uses the EVT_SET_ADDR macro to record the fetch address, placing it in
32-bit words 4 and 5.

The correct position for this address is in words 6 and 7, per the
SMMUv3 Architecture Specification.

Update the function to use the EVT_SET_ADDR2 macro instead, which is the
macro intended for writing to these words.

ref. ARM IHI 0070C, section 7.3.4.

Signed-off-by: Simon Veith 
Cc: Eric Auger 
Cc: qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org
---
 hw/arm/smmuv3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2d6c275..125e47d 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -172,7 +172,7 @@ void smmuv3_record_event(SMMUv3State *s, SMMUEventInfo 
*info)
 case SMMU_EVT_F_STE_FETCH:
 EVT_SET_SSID(, info->u.f_ste_fetch.ssid);
 EVT_SET_SSV(,  info->u.f_ste_fetch.ssv);
-EVT_SET_ADDR(, info->u.f_ste_fetch.addr);
+EVT_SET_ADDR2(, info->u.f_ste_fetch.addr);
 break;
 case SMMU_EVT_C_BAD_STE:
 EVT_SET_SSID(, info->u.c_bad_ste.ssid);
-- 
2.7.4




Re: [PATCH v2 09/18] qga: Fix guest-get-fsinfo error API violations

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 10:36 AM, Markus Armbruster wrote:

build_guest_fsinfo_for_virtual_device() dereferences @errp when
build_guest_fsinfo_for_device() fails.  That's wrong; see the big
comment in error.h.  Introduced in commit 46d4c5723e "qga: Add
guest-get-fsinfo command".

No caller actually passes null.

Fix anyway: splice in a local Error *err, and error_propagate().

Cc: Michael Roth 
Signed-off-by: Markus Armbruster 
---
  qga/commands-posix.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 1c1a165dae..0be527ccb8 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1049,6 +1049,7 @@ static void build_guest_fsinfo_for_virtual_device(char 
const *syspath,
GuestFilesystemInfo *fs,
Error **errp)
  {
+Error *err = NULL;
  DIR *dir;
  char *dirpath;
  struct dirent *entry;
@@ -1078,10 +1079,11 @@ static void build_guest_fsinfo_for_virtual_device(char 
const *syspath,
  
  g_debug(" slave device '%s'", entry->d_name);

  path = g_strdup_printf("%s/slaves/%s", syspath, entry->d_name);
-build_guest_fsinfo_for_device(path, fs, errp);
+build_guest_fsinfo_for_device(path, fs, );
  g_free(path);
  
-if (*errp) {

+if (err) {
+error_propagate(errp, err);
  break;
  }
  }



Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] target/i386: relax assert when old host kernels don't include msrs

2019-12-04 Thread Paolo Bonzini
On 04/12/19 16:07, Catherine Ho wrote:
>> Ok, so the problem is that some MSR didn't exist in that version.  Which
> I thought in my platform, the only MSR didn't exist is MSR_IA32_VMX_BASIC
> (0x480). If I remove this kvm_msr_entry_add(), everything is ok, the guest can
> be boot up successfully.
> 

MSR_IA32_VMX_BASIC was added in kvm-4.10.  Maybe the issue is the
_value_ that is being written to the VM is not valid?  Can you check
what's happening in vmx_restore_vmx_basic?

Paolo




Re: [PATCH v3 2/4] s390x: Add missing vcpu reset functions

2019-12-04 Thread Cornelia Huck
On Wed, 4 Dec 2019 16:08:48 +0100
David Hildenbrand  wrote:

> On 04.12.19 16:07, David Hildenbrand wrote:
> > On 04.12.19 15:59, Cornelia Huck wrote:  
> >> On Tue,  3 Dec 2019 08:28:11 -0500
> >> Janosch Frank  wrote:
> >>  
> >>> Up to now we only had an ioctl to reset vcpu data QEMU couldn't reach
> >>> for the initial reset, and that was also called for the clear
> >>> reset. To be architecture compliant, we also need to clear local
> >>> interrupts on a normal reset.  
> >>
> >> Do we also need to do something like that for tcg? David?
> >>  
> > 
> > So, we have
> > 
> > /* Fields up to this point are not cleared by initial CPU reset */
> > struct {} start_initial_reset_fields;
> > [...]
> > int pending_int
> > uint16_t external_call_addr;
> > DECLARE_BITMAP(emergency_signals, S390_MAX_CPUS);
> > [...]
> > /* Fields up to this point are cleared by a CPU reset */
> > struct {} end_reset_fields;
> > 
> > This means, local interrupts will be cleared by everything that zeroes
> > "start_initial_reset_fields->end_reset_fields"
> > 
> > So, they will get cleared by S390_CPU_RESET_INITIAL only if I am not
> > wrong. In order to clear them on S390_CPU_RESET_NORMAL, we have to
> > manually set them to zero.  
> 
> Sorry, by S390_CPU_RESET_INITIAL and S390_CPU_RESET_CLEAR.

Ok. Will you cook up a patch, or should I do it?

> 
> > 
> > (we really should wrap TCG-only fields by ifdefs)

They probably do not get into the way, but making it obvious that they
a tcg-only is something we really should do.

> >   
> 
> 




Re: [PATCH v4 16/40] target/arm: Rearrange ARMMMUIdxBit

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> Define via macro expansion, so that renumbering of the base ARMMMUIdx
> symbols is automatically reflexed in the bit definitions.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/cpu.h | 39 +++
>  1 file changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 5f295c7e60..6ba5126852 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2886,27 +2886,34 @@ typedef enum ARMMMUIdx {
>  ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
>  } ARMMMUIdx;
>  
> -/* Bit macros for the core-mmu-index values for each index,
> +/*
> + * Bit macros for the core-mmu-index values for each index,
>   * for use when calling tlb_flush_by_mmuidx() and friends.
>   */
> +#define TO_CORE_BIT(NAME) \
> +ARMMMUIdxBit_##NAME = 1 << (ARMMMUIdx_##NAME & ARM_MMU_IDX_COREIDX_MASK)
> +
>  typedef enum ARMMMUIdxBit {
> -ARMMMUIdxBit_EL10_0 = 1 << 0,
> -ARMMMUIdxBit_EL10_1 = 1 << 1,
> -ARMMMUIdxBit_E2 = 1 << 2,
> -ARMMMUIdxBit_SE3 = 1 << 3,
> -ARMMMUIdxBit_SE0 = 1 << 4,
> -ARMMMUIdxBit_SE1 = 1 << 5,
> -ARMMMUIdxBit_Stage2 = 1 << 6,
> -ARMMMUIdxBit_MUser = 1 << 0,
> -ARMMMUIdxBit_MPriv = 1 << 1,
> -ARMMMUIdxBit_MUserNegPri = 1 << 2,
> -ARMMMUIdxBit_MPrivNegPri = 1 << 3,
> -ARMMMUIdxBit_MSUser = 1 << 4,
> -ARMMMUIdxBit_MSPriv = 1 << 5,
> -ARMMMUIdxBit_MSUserNegPri = 1 << 6,
> -ARMMMUIdxBit_MSPrivNegPri = 1 << 7,
> +TO_CORE_BIT(EL10_0),
> +TO_CORE_BIT(EL10_1),
> +TO_CORE_BIT(E2),
> +TO_CORE_BIT(SE0),
> +TO_CORE_BIT(SE1),
> +TO_CORE_BIT(SE3),
> +TO_CORE_BIT(Stage2),
> +
> +TO_CORE_BIT(MUser),
> +TO_CORE_BIT(MPriv),
> +TO_CORE_BIT(MUserNegPri),
> +TO_CORE_BIT(MPrivNegPri),
> +TO_CORE_BIT(MSUser),
> +TO_CORE_BIT(MSPriv),
> +TO_CORE_BIT(MSUserNegPri),
> +TO_CORE_BIT(MSPrivNegPri),
>  } ARMMMUIdxBit;
>  
> +#undef TO_CORE_BIT
> +
>  #define MMU_USER_IDX 0
>  
>  static inline int arm_to_core_mmu_idx(ARMMMUIdx mmu_idx)


-- 
Alex Bennée



Re: [PATCH] target/i386: relax assert when old host kernels don't include msrs

2019-12-04 Thread Catherine Ho
Hi Paolo
[sorry to resend it, seems to reply it incorrectly]

On Wed, 4 Dec 2019 at 19:23, Paolo Bonzini  wrote:

> On 04/12/19 09:50, Catherine Ho wrote:
> > Commit 20a78b02d315 ("target/i386: add VMX features") unconditionally
> > add vmx msr entry although older host kernels don't include them.
> >
> > But old host kernel + newest qemu will cause a qemu crash as follows:
> > qemu-system-x86_64: error: failed to set MSR 0x480 to 0x0
> > target/i386/kvm.c:2932: kvm_put_msrs: Assertion `ret ==
> > cpu->kvm_msr_buf->nmsrs' failed.
> >
> > This fixes it by relaxing the condition.
>
> This is intentional.  The VMX MSR entries should not have been added.
> What combination of host kernel/QEMU are you using, and what QEMU
> command line?
>
>
> Host kernel: 4.15.0 (ubuntu 18.04)
Qemu: https://gitlab.com/virtio-fs/qemu/tree/virtio-fs-dev
cmdline: qemu-system-x86_64 -M pc -cpu host --enable-kvm -smp 8 \
  -m 4G,maxmem=4G

But before 20a78b02d315, the older kernel + latest qemu can boot guest
successfully.

Best Regards,
Catherine


Re: [PATCH v2] qga: fence guest-set-time if hwclock not available

2019-12-04 Thread Michael Roth
Quoting Cornelia Huck (2019-11-28 12:11:00)
> The Posix implementation of guest-set-time invokes hwclock to
> set/retrieve the time to/from the hardware clock. If hwclock
> is not available, the user is currently informed that "hwclock
> failed to set hardware clock to system time", which is quite
> misleading. This may happen e.g. on s390x, which has a different
> timekeeping concept anyway.
> 
> Let's check for the availability of the hwclock command and
> return QERR_UNSUPPORTED for guest-set-time if it is not available.
> 
> Signed-off-by: Cornelia Huck 

Reviewed-by: Michael Roth 

> ---
> 
> v1 (RFC) -> v2:
> - use hwclock_path[]
> - use access() instead of stat()
> 
> ---
>  qga/commands-posix.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 1c1a165daed8..ffb6420fa9cd 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -156,6 +156,17 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
> Error **errp)
>  pid_t pid;
>  Error *local_err = NULL;
>  struct timeval tv;
> +const char hwclock_path[] = "/sbin/hwclock";
> +static int hwclock_available = -1;
> +
> +if (hwclock_available < 0) {
> +hwclock_available = (access(hwclock_path, X_OK) == 0);
> +}
> +
> +if (!hwclock_available) {
> +error_setg(errp, QERR_UNSUPPORTED);
> +return;
> +}
>  
>  /* If user has passed a time, validate and set it. */
>  if (has_time) {
> @@ -195,7 +206,7 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
> Error **errp)
>  
>  /* Use '/sbin/hwclock -w' to set RTC from the system time,
>   * or '/sbin/hwclock -s' to set the system time from RTC. */
> -execle("/sbin/hwclock", "hwclock", has_time ? "-w" : "-s",
> +execle(hwclock_path, "hwclock", has_time ? "-w" : "-s",
> NULL, environ);
>  _exit(EXIT_FAILURE);
>  } else if (pid < 0) {
> -- 
> 2.21.0
> 



[PATCH v2 0/7] Enable Travis builds on arm64, ppc64le and s390x

2019-12-04 Thread Thomas Huth
Travis recently added build hosts for arm64, ppc64le and s390x, so
this is a welcome addition to our Travis testing matrix.

Unfortunately, the builds are running in quite restricted LXD containers
there, for example it is not possible to create huge files there (even
if they are just sparse), and certain system calls are blocked. So we
have to change some tests first to stop them failing in such environments.

v2:
 - Added "make check-tcg" and Alex' patch to disable cross-containers
 - Explicitely set "dist: xenial" for arm64 and ppc64le since some
   iotests are crashing on bionic on these hosts.
 - Dropped "libcap-dev" from the package list since it will be replaced
   by libcapng-dev soon.

Alex Bennée (1):
  configure: allow disable of cross compilation containers

Thomas Huth (6):
  iotests: Provide a function for checking the creation of huge files
  iotests: Skip test 060 if it is not possible to create large files
  iotests: Skip test 079 if it is not possible to create large files
  tests/hd-geo-test: Skip test when images can not be created
  tests/test-util-filemonitor: Skip test on non-x86 Travis containers
  travis.yml: Enable builds on arm64, ppc64le and s390x

 .travis.yml   | 86 +++
 configure |  8 +++-
 tests/hd-geo-test.c   | 12 -
 tests/qemu-iotests/005|  5 +-
 tests/qemu-iotests/060|  3 ++
 tests/qemu-iotests/079|  3 ++
 tests/qemu-iotests/220|  6 +--
 tests/qemu-iotests/common.rc  | 10 
 tests/tcg/configure.sh|  6 ++-
 tests/test-util-filemonitor.c | 11 +
 10 files changed, 138 insertions(+), 12 deletions(-)

-- 
2.18.1




Re: [PATCH v6 6/9] docs/clocks: add device's clock documentation

2019-12-04 Thread Damien Hedde


On 12/2/19 4:17 PM, Peter Maydell wrote:
> On Wed, 4 Sep 2019 at 13:56, Damien Hedde  wrote:
>>
>> Add the documentation about the clock inputs and outputs in devices.
>>
>> This is based on the original work of Frederic Konrad.
>>
>> Signed-off-by: Damien Hedde 
>> ---
>>  docs/devel/clock.txt | 246 +++
>>  1 file changed, 246 insertions(+)
>>  create mode 100644 docs/devel/clock.txt
> 
> Could you convert this to rst format, please?Yes.
> 
> 
> 
>> +Changing a clock output
>> +===
>> +
>> +A device can change its outputs using the clock_set_frequency function. It
>> +will trigger updates on every connected inputs.
> 
> "input"
> 
>> +
>> +For example, let's say that we have an output clock "clkout" and we have a
>> +pointer to it in the device state because we did the following in init 
>> phase:
>> +dev->clkout = qdev_init_clock_out(DEVICE(dev), "clkout");
>> +
>> +Then at any time (apart from the cases listed below), it is possible to
>> +change the clock value by doing:
>> +clock_set_frequency(dev->clkout, 1000 * 1000 * 1000); /* 1Ghz */
>> +This operation must be done while holding the qemu io lock.
>> +
>> +One can change clocks only when it is allowed to have side effects on other
>> +objects. In consequence, it is forbidden:
>> ++ during migration,
>> ++ and in the init phase of reset.
>> +
>> +Forwarding clocks
>> +=
>> +
>> +Sometimes, one needs to forward, or inherit, a clock from another device.
>> +Typically, when doing device composition, a device might expose a 
>> sub-device's
>> +clock without interfering with it.
>> +The function qdev_pass_clock() can be used to achieve this behaviour. Note, 
>> that
> 
> "Note that"
> 
>> +it is possible to expose the clock under a different name. This works for 
>> both
>> +inputs or outputs.
> 
> "inputs and outputs"
> 
> 
>> +Migration
>> +=
>> +
>> +Only the ClockIn object has a state. ClockOut is not concerned by migration.
> 
> "has any state".
> 
> "ClockOut has no state and does not need special handling for migration."
> 
>> +
>> +In case the frequency of in input clock is needed for a device's migration,
>> +this state must be migrated.
> 
> Are you trying to say that if an input clock is known to be a
> fixed frequency we don't need to migrate anything? I wonder
> if we need to worry about that or if we could/should just say that
> input clocks should always be migrated.

What I wanted to say is that there are indeed probably cases where
migrating the frequency is unnecessary. For example if we only use the
callback and never fetch the frequency outside it: if the frequency is
only used to compute something which is already saved/loaded during
migration.

But yes we could just do as you say. It's probably less confusing.

> 
>> The VMSTATE_CLOCKIN macro defines an entry to
>> +be added in a vmstate description.
>> +
>> +For example, if a device has a clock input and the device state looks like:
>> +MyDeviceState {
>> +DeviceState parent_obj;
>> +ClockIn *clk;
>> +};
>> +
>> +Then, to add the clock frequency to the device's migrated state, the vmstate
>> +description is:
>> +VMStateDescription my_device_vmstate = {
>> +.name = "my_device",
>> +.fields = (VMStateField[]) {
>> +VMSTATE_CLOCKIN(clk, MyDeviceState),
>> +VMSTATE_END_OF_LIST()
>> +}
>> +};
>> +
>> +When adding a input clock support to an existing device, you must care about
>> +migration compatibility. To this end, you can use the clock_init_frequency 
>> in
>> +a pre_load function to setup a default value in case the source vm does not
>> +migrate the frequency.
> 
> thanks
> -- PMM
> 



NBD reconnect on open

2019-12-04 Thread Vladimir Sementsov-Ogievskiy
Hi Eric!

There is a question to discuss.

We need to make an option to allow nbd-reconnect loop on nbd-open.
For example, add optional nbd blockdev option open-reconnect-delay, to
make it possible to start qemu with specified nbd connection, when nbd
server is down, and make several tries to connect before starting the
guest.

So, we need it for nbd opened from commandline arguments, and this case
seems OK.

But adding option to QAPI, we also allow it for qmp blockdev-add, and
reconnecting in context of qmp command execution is a wrong thing..

I can add new option only to options in block/nbd.c, but this way
-blockdev command line option will not work, it needs QAPI definition.

What do you think about it?

I can detect somehow in nbd_open that we are in qmp monitor context, and
return error if open-reconnect-delay specified.. Is it OK? Is there a
way to do it?


-- 
Best regards,
Vladimir


Re: virtiofsd: Where should it live?

2019-12-04 Thread Thomas Huth
On 04/12/2019 14.28, Kevin Wolf wrote:
> Am 04.12.2019 um 09:17 hat Gerd Hoffmann geschrieben:
>>   Hi,
>>
 |   ...
 +- qemu-edid
>>>
>>> Has its own MAINTAINERS section, together with hw/display/edit* and
>>> include/hw/display/edid.h.  I'm not sure moving it hw/display/ is a good
>>> idea.  Gerd?
>>
>> Sort-of makes sense.  My personal preference would be a tools/ directory
>> for all those small utilities though.
> 
> I think I would like that better than throwing tools into block/ where
> currently mostly just block drivers live.

+1 for tools/

 Thomas




Re: [PATCH v3] travis.yml: Run tcg tests with tci

2019-12-04 Thread Alex Bennée


Thomas Huth  writes:

> So far we only have compile coverage for tci. But since commit
> 2f160e0f9797c7522bfd0d09218d0c9340a5137c ("tci: Add implementation
> for INDEX_op_ld16u_i64") has been included now, we can also run the
> "tcg" and "qtest" tests with tci, so let's enable them in Travis now.
> Since we don't gain much additional test coverage by compiling all
> targets, and TCI is broken e.g. with the Sparc targets, we also limit
> the target list to a reasonable subset now (which should still get us
> test coverage by tests/boot-serial-test for example).

Queued to testing/next, thanks.
>
> Tested-by: Stefan Weil 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Thomas Huth 
> ---
>  v3:
>  - Add --disable-kvm option since we're only interested in TCG here
>
>  .travis.yml | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index 445b0646c1..d73e2fb744 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -215,10 +215,11 @@ matrix:
>  - TEST_CMD=""
>  
>  
> -# We manually include builds which we disable "make check" for
> +# Check the TCG interpreter (TCI)
>  - env:
> -- CONFIG="--enable-debug --enable-tcg-interpreter"
> -- TEST_CMD=""
> +- CONFIG="--enable-debug --enable-tcg-interpreter --disable-kvm 
> --disable-containers
> +
> --target-list=alpha-softmmu,arm-softmmu,hppa-softmmu,m68k-softmmu,microblaze-softmmu,moxie-softmmu,ppc-softmmu,s390x-softmmu,x86_64-softmmu"
> +- TEST_CMD="make check-qtest check-tcg V=1"
>  
>  
>  # We don't need to exercise every backend with every front-end


-- 
Alex Bennée



Re: [PATCH v3] travis.yml: Run tcg tests with tci

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> On 12/4/19 12:31 AM, Thomas Huth wrote:
>> -# We manually include builds which we disable "make check" for
>> +# Check the TCG interpreter (TCI)
>>  - env:
>> -- CONFIG="--enable-debug --enable-tcg-interpreter"
>> -- TEST_CMD=""
>> +- CONFIG="--enable-debug --enable-tcg-interpreter --disable-kvm
>
> While we're changing things, the interpreter will go much faster with
> optimization enabled.  We can change this to --enable-debug-tcg, which leaves
> the asserts enabled, but compiles with -O2.

I've amended the commit in my tree - no need to resend.
>
>
> r~


-- 
Alex Bennée



Re: [PATCH] target/i386: relax assert when old host kernels don't include msrs

2019-12-04 Thread Paolo Bonzini
On 04/12/19 14:33, Catherine Ho wrote:
> Hi Paolo
> [sorry to resend it, seems to reply it incorrectly]
> 
> On Wed, 4 Dec 2019 at 19:23, Paolo Bonzini  > wrote:
> 
> On 04/12/19 09:50, Catherine Ho wrote:
> > Commit 20a78b02d315 ("target/i386: add VMX features") unconditionally
> > add vmx msr entry although older host kernels don't include them.
> >
> > But old host kernel + newest qemu will cause a qemu crash as follows:
> > qemu-system-x86_64: error: failed to set MSR 0x480 to 0x0
> > target/i386/kvm.c:2932: kvm_put_msrs: Assertion `ret ==
> > cpu->kvm_msr_buf->nmsrs' failed.
> >
> > This fixes it by relaxing the condition.
> 
> This is intentional.  The VMX MSR entries should not have been added.
> What combination of host kernel/QEMU are you using, and what QEMU
> command line?
> 
> 
> Host kernel: 4.15.0 (ubuntu 18.04)
> Qemu: https://gitlab.com/virtio-fs/qemu/tree/virtio-fs-dev
> cmdline: qemu-system-x86_64 -M pc -cpu host --enable-kvm -smp 8 \
>                   -m 4G,maxmem=4G
> 
> But before 20a78b02d315, the older kernel + latest qemu can boot guest
> successfully.

Ok, so the problem is that some MSR didn't exist in that version.  Which
one it is?  Can you make it conditional, similar to MSR_IA32_VMX_VMFUNC?

Thanks,

Paolo




Re: [PATCH v4 14/40] target/arm: Recover 4 bits from TBFLAGs

2019-12-04 Thread Richard Henderson
On 12/4/19 3:43 AM, Alex Bennée wrote:
> I'm not sure if this visual aid helps but here you go:
> 
>  *  31  20 1916 15 10 90
>  * +--++-+--+
>  * |  ||   TBFLAG_A64   |
>  * |  | +--+-+--+
>  * | TBFLAG_ANY   | |   TBFLAG_A32   |  |
>  * |  | +-+--+  TBFLAG_AM32 |
>  * |  |   |TBFLAG_M32|  |
>  * +--+---+--+--+

Oooh ahh.  Pretty.  Sure, that's helpful.  We'll see how irritating it is to
keep up-to-date as time goes on.  ;-)


>>  void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int 
>> max_insns)
>>  {
>> -DisasContext dc;
>> +DisasContext dc = { };
> 
> We seemed to have dropped an initialise here which seems unrelated.

Added, not dropped.


r~



Re: [PATCH v2 07/18] hw/core: Fix fit_load_fdt() error handling violations

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 10:36 AM, Markus Armbruster wrote:

fit_load_fdt() passes @errp to fit_image_addr(), then recovers from
ENOENT failures.  Passing @errp is wrong, because it works only as
long as @errp is neither @error_fatal nor @error_abort.  Messed up in
commit 3eb99edb48 "loader-fit: Wean off error_printf()".

No caller actually passes such values.

Fix anyway: splice in a local Error *err, and error_propagate().

Signed-off-by: Markus Armbruster 
---
  hw/core/loader-fit.c | 15 ---
  1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/core/loader-fit.c b/hw/core/loader-fit.c
index 953b16bc82..c465921b8f 100644
--- a/hw/core/loader-fit.c
+++ b/hw/core/loader-fit.c
@@ -178,11 +178,12 @@ static int fit_load_fdt(const struct fit_loader *ldr, 
const void *itb,
  int cfg, void *opaque, const void *match_data,
  hwaddr kernel_end, Error **errp)
  {
+Error *err = NULL;
  const char *name;
  const void *data;
  const void *load_data;
  hwaddr load_addr;
-int img_off, err;
+int img_off;
  size_t sz;
  int ret;
  
@@ -197,13 +198,13 @@ static int fit_load_fdt(const struct fit_loader *ldr, const void *itb,

  return -EINVAL;
  }
  
-err = fit_image_addr(itb, img_off, "load", _addr, errp);

-if (err == -ENOENT) {
+ret = fit_image_addr(itb, img_off, "load", _addr, );
+if (ret == -ENOENT) {
  load_addr = ROUND_UP(kernel_end, 64 * KiB) + (10 * MiB);
-error_free(*errp);
-} else if (err) {
-error_prepend(errp, "unable to read FDT load address from FIT: ");
-ret = err;
+error_free(err);
+} else if (ret) {
+error_propagate_prepend(errp, err,
+"unable to read FDT load address from FIT: ");
  goto out;
  }
  


Cleaner.

Reviewed-by: Philippe Mathieu-Daudé 




Re: [RFC v5 024/126] error: auto propagated local_err

2019-12-04 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> Here is introduced ERRP_AUTO_PROPAGATE macro, to be used at start of
> functions with errp OUT parameter.
>
> It has three goals:
>
> 1. Fix issue with error_fatal & error_prepend/error_append_hint: user
> can't see this additional information, because exit() happens in
> error_setg earlier than information is added. [Reported by Greg Kurz]
>
> 2. Fix issue with error_abort & error_propagate: when we wrap
> error_abort by local_err+error_propagate, resulting coredump will
> refer to error_propagate and not to the place where error happened.

I get what you mean, but I have plenty of context.

> (the macro itself doesn't fix the issue, but it allows to [3.] drop all
> local_err+error_propagate pattern, which will definitely fix the issue)

The parenthesis is not part of the goal.

> [Reported by Kevin Wolf]
>
> 3. Drop local_err+error_propagate pattern, which is used to workaround
> void functions with errp parameter, when caller wants to know resulting
> status. (Note: actually these functions could be merely updated to
> return int error code).
>
> To achieve these goals, we need to add invocation of the macro at start
> of functions, which needs error_prepend/error_append_hint (1.); add
> invocation of the macro at start of functions which do
> local_err+error_propagate scenario the check errors, drop local errors
> from them and just use *errp instead (2., 3.).

The paragraph talks about two cases: 1. and 2.+3.  Makes me think we
want two paragraphs, each illustrated with an example.

What about you provide the examples, and then I try to polish the prose?

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 
> ---
>
> CC: Gerd Hoffmann 
> CC: "Gonglei (Arei)" 
> CC: Eduardo Habkost 
> CC: Igor Mammedov 
> CC: Laurent Vivier 
> CC: Amit Shah 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: John Snow 
> CC: Ari Sundholm 
> CC: Pavel Dovgalyuk 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Stefan Weil 
> CC: Ronnie Sahlberg 
> CC: Peter Lieven 
> CC: Eric Blake 
> CC: "Denis V. Lunev" 
> CC: Markus Armbruster 
> CC: Alberto Garcia 
> CC: Jason Dillaman 
> CC: Wen Congyang 
> CC: Xie Changlong 
> CC: Liu Yuan 
> CC: "Richard W.M. Jones" 
> CC: Jeff Cody 
> CC: "Marc-André Lureau" 
> CC: "Daniel P. Berrangé" 
> CC: Richard Henderson 
> CC: Greg Kurz 
> CC: "Michael S. Tsirkin" 
> CC: Marcel Apfelbaum 
> CC: Beniamino Galvani 
> CC: Peter Maydell 
> CC: "Cédric Le Goater" 
> CC: Andrew Jeffery 
> CC: Joel Stanley 
> CC: Andrew Baumann 
> CC: "Philippe Mathieu-Daudé" 
> CC: Antony Pavlov 
> CC: Jean-Christophe Dubois 
> CC: Peter Chubb 
> CC: Subbaraya Sundeep 
> CC: Eric Auger 
> CC: Alistair Francis 
> CC: "Edgar E. Iglesias" 
> CC: Stefano Stabellini 
> CC: Anthony Perard 
> CC: Paul Durrant 
> CC: Paul Burton 
> CC: Aleksandar Rikalo 
> CC: Chris Wulff 
> CC: Marek Vasut 
> CC: David Gibson 
> CC: Cornelia Huck 
> CC: Halil Pasic 
> CC: Christian Borntraeger 
> CC: "Hervé Poussineau" 
> CC: Xiao Guangrong 
> CC: Aurelien Jarno 
> CC: Aleksandar Markovic 
> CC: Mark Cave-Ayland 
> CC: Jason Wang 
> CC: Laszlo Ersek 
> CC: Yuval Shaia 
> CC: Palmer Dabbelt 
> CC: Sagar Karandikar 
> CC: Bastian Koppelmann 
> CC: David Hildenbrand 
> CC: Thomas Huth 
> CC: Eric Farman 
> CC: Matthew Rosato 
> CC: Hannes Reinecke 
> CC: Michael Walle 
> CC: Artyom Tarasenko 
> CC: Stefan Berger 
> CC: Samuel Thibault 
> CC: Alex Williamson 
> CC: Tony Krowiak 
> CC: Pierre Morel 
> CC: Michael Roth 
> CC: Hailiang Zhang 
> CC: Juan Quintela 
> CC: "Dr. David Alan Gilbert" 
> CC: Luigi Rizzo 
> CC: Giuseppe Lettieri 
> CC: Vincenzo Maffione 
> CC: Jan Kiszka 
> CC: Anthony Green 
> CC: Stafford Horne 
> CC: Guan Xuetao 
> CC: Max Filippov 
> CC: qemu-bl...@nongnu.org
> CC: integrat...@gluster.org
> CC: sheep...@lists.wpkg.org
> CC: qemu-...@nongnu.org
> CC: xen-de...@lists.xenproject.org
> CC: qemu-...@nongnu.org
> CC: qemu-s3...@nongnu.org
> CC: qemu-ri...@nongnu.org
>
>  include/qapi/error.h | 38 ++
>  1 file changed, 38 insertions(+)
>
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index d6898d833b..47238d9065 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -345,6 +345,44 @@ void error_set_internal(Error **errp,
>  ErrorClass err_class, const char *fmt, ...)
>  GCC_FMT_ATTR(6, 7);
>  
> +typedef struct ErrorPropagator {
> +Error *local_err;
> +Error **errp;
> +} ErrorPropagator;
> +
> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> +{
> +error_propagate(prop->errp, prop->local_err);
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
> +
> +/*
> + * ERRP_AUTO_PROPAGATE
> + *
> + * This macro is created to be the first line of a function with Error **errp
> + * OUT parameter. It's needed only in cases where we want to use 
> error_prepend,
> + * error_append_hint or dereference 

Re: [PATCH v3 2/4] s390x: Add missing vcpu reset functions

2019-12-04 Thread David Hildenbrand
On 04.12.19 16:33, Cornelia Huck wrote:
> On Wed, 4 Dec 2019 16:08:48 +0100
> David Hildenbrand  wrote:
> 
>> On 04.12.19 16:07, David Hildenbrand wrote:
>>> On 04.12.19 15:59, Cornelia Huck wrote:  
 On Tue,  3 Dec 2019 08:28:11 -0500
 Janosch Frank  wrote:
  
> Up to now we only had an ioctl to reset vcpu data QEMU couldn't reach
> for the initial reset, and that was also called for the clear
> reset. To be architecture compliant, we also need to clear local
> interrupts on a normal reset.  

 Do we also need to do something like that for tcg? David?
  
>>>
>>> So, we have
>>>
>>> /* Fields up to this point are not cleared by initial CPU reset */
>>> struct {} start_initial_reset_fields;
>>> [...]
>>> int pending_int
>>> uint16_t external_call_addr;
>>> DECLARE_BITMAP(emergency_signals, S390_MAX_CPUS);
>>> [...]
>>> /* Fields up to this point are cleared by a CPU reset */
>>> struct {} end_reset_fields;
>>>
>>> This means, local interrupts will be cleared by everything that zeroes
>>> "start_initial_reset_fields->end_reset_fields"
>>>
>>> So, they will get cleared by S390_CPU_RESET_INITIAL only if I am not
>>> wrong. In order to clear them on S390_CPU_RESET_NORMAL, we have to
>>> manually set them to zero.  
>>
>> Sorry, by S390_CPU_RESET_INITIAL and S390_CPU_RESET_CLEAR.
> 
> Ok. Will you cook up a patch, or should I do it?
> 

If you want to have some TCG fun, feel free. Otherwise, let me know :)

-- 
Thanks,

David / dhildenb




[PATCH v2 6/7] configure: allow disable of cross compilation containers

2019-12-04 Thread Thomas Huth
From: Alex Bennée 

Our docker infrastructure isn't quite as multiarch as we would wish so
let's allow the user to disable it if they want. This will allow us to
use still run check-tcg on non-x86 CI setups.

Signed-off-by: Alex Bennée 
Reviewed-by: Stefan Weil 
Signed-off-by: Thomas Huth 
---
 configure  | 8 +++-
 tests/tcg/configure.sh | 6 --
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 6099be1d84..fe6d0971f1 100755
--- a/configure
+++ b/configure
@@ -302,6 +302,7 @@ audio_win_int=""
 libs_qga=""
 debug_info="yes"
 stack_protector=""
+use_containers="yes"
 
 if test -e "$source_path/.git"
 then
@@ -1539,6 +1540,10 @@ for opt do
   ;;
   --disable-plugins) plugins="no"
   ;;
+  --enable-containers) use_containers="yes"
+  ;;
+  --disable-containers) use_containers="no"
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1722,6 +1727,7 @@ Advanced options (experts only):
track the maximum stack usage of stacks created by 
qemu_alloc_stack
   --enable-plugins
enable plugins via shared library loading
+  --disable-containers don't use containers for cross-building
 
 Optional features, enabled with --enable-FEATURE and
 disabled with --disable-FEATURE, default is enabled if available:
@@ -8039,7 +8045,7 @@ done
 (for i in $cross_cc_vars; do
   export $i
 done
-export target_list source_path
+export target_list source_path use_containers
 $source_path/tests/tcg/configure.sh)
 
 # temporary config to build submodules
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index 6c4a471aea..210e68396f 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -36,8 +36,10 @@ TMPC="${TMPDIR1}/qemu-conf.c"
 TMPE="${TMPDIR1}/qemu-conf.exe"
 
 container="no"
-if has "docker" || has "podman"; then
-  container=$($python $source_path/tests/docker/docker.py probe)
+if test $use_containers = "yes"; then
+if has "docker" || has "podman"; then
+container=$($python $source_path/tests/docker/docker.py probe)
+fi
 fi
 
 # cross compilers defaults, can be overridden with --cross-cc-ARCH
-- 
2.18.1




Re: [PATCH v2 03/13] s390x: protvirt: Support unpack facility

2019-12-04 Thread Janosch Frank
On 12/4/19 12:34 PM, Thomas Huth wrote:
> On 04/12/2019 12.32, Janosch Frank wrote:
>> On 12/4/19 11:48 AM, Thomas Huth wrote:
>>> On 29/11/2019 10.47, Janosch Frank wrote:
 When a guest has saved a ipib of type 5 and call diagnose308 with
 subcode 10, we have to setup the protected processing environment via
 Ultravisor calls. The calls are done by KVM and are exposed via an API.

 The following steps are necessary:
 1. Create a VM (register it with the Ultravisor)
 2. Create secure CPUs for all of our current cpus
 3. Forward the secure header to the Ultravisor (has all information on
 how to decrypt the image and VM information)
 4. Protect image pages from the host and decrypt them
 5. Verify the image integrity

 Only after step 5 a protected VM is allowed to run.

 Signed-off-by: Janosch Frank 
 ---
>>> [...]
 +++ b/hw/s390x/pv.c
 @@ -0,0 +1,118 @@
 +/*
 + * Secure execution functions
 + *
 + * Copyright IBM Corp. 2019
 + * Author(s):
 + *  Janosch Frank 
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or (at
 + * your option) any later version. See the COPYING file in the top-level
 + * directory.
 + */
 +#include "qemu/osdep.h"
 +#include 
 +
 +#include 
 +
 +#include "qemu/error-report.h"
 +#include "sysemu/kvm.h"
 +#include "pv.h"
 +
 +static int s390_pv_cmd(uint32_t cmd, void *data)
 +{
 +int rc;
 +struct kvm_pv_cmd pv_cmd = {
 +.cmd = cmd,
 +.data = (uint64_t)data,
 +};
 +
 +rc = kvm_vm_ioctl(kvm_state, KVM_S390_PV_COMMAND, _cmd);
 +if (rc) {
 +error_report("KVM PV command failed cmd: %d rc: %d", cmd, rc);
 +exit(1);
 +}
 +return rc;
 +}
 +
 +static int s390_pv_cmd_vcpu(CPUState *cs, uint32_t cmd, void *data)
 +{
 +int rc;
 +struct kvm_pv_cmd pv_cmd = {
 +.cmd = cmd,
 +.data = (uint64_t)data,
 +};
 +
 +rc = kvm_vcpu_ioctl(cs, KVM_S390_PV_COMMAND_VCPU, _cmd);
 +if (rc) {
 +error_report("KVM PV VCPU command failed cmd: %d rc: %d", cmd, 
 rc);
 +exit(1);
 +}
 +return rc;
 +}
 +
 +int s390_pv_vm_create(void)
 +{
 +return s390_pv_cmd(KVM_PV_VM_CREATE, NULL);
 +}
 +
 +int s390_pv_vm_destroy(void)
 +{
 +return s390_pv_cmd(KVM_PV_VM_DESTROY, NULL);
 +}
 +
 +int s390_pv_vcpu_create(CPUState *cs)
 +{
 +return s390_pv_cmd_vcpu(cs, KVM_PV_VCPU_CREATE, NULL);
 +}
 +
 +int s390_pv_vcpu_destroy(CPUState *cs)
 +{
 +S390CPU *cpu = S390_CPU(cs);
 +CPUS390XState *env = >env;
 +int rc;
 +
 +rc = s390_pv_cmd_vcpu(cs, KVM_PV_VCPU_DESTROY, NULL);
 +if (!rc) {
 +env->pv = false;
 +}
 +return rc;
 +}
 +
 +int s390_pv_set_sec_parms(uint64_t origin, uint64_t length)
 +{
 +struct kvm_s390_pv_sec_parm args = {
 +.origin = origin,
 +.length = length,
 +};
 +
 +return s390_pv_cmd(KVM_PV_VM_SET_SEC_PARMS, );
 +}
 +
 +/*
 + * Called for each component in the SE type IPL parameter block 0.
 + */
 +int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak)
 +{
 +struct kvm_s390_pv_unp args = {
 +.addr = addr,
 +.size = size,
 +.tweak = tweak,
 +};
 +
 +return s390_pv_cmd(KVM_PV_VM_UNPACK, );
 +}
 +
 +int s390_pv_perf_clear_reset(void)
 +{
 +return s390_pv_cmd(KVM_PV_VM_PERF_CLEAR_RESET, NULL);
 +}
 +
 +int s390_pv_verify(void)
 +{
 +return s390_pv_cmd(KVM_PV_VM_VERIFY, NULL);
 +}
 +
 +int s390_pv_unshare(void)
 +{
 +return s390_pv_cmd(KVM_PV_VM_UNSHARE, NULL);
 +}
 diff --git a/hw/s390x/pv.h b/hw/s390x/pv.h
 new file mode 100644
 index 00..eb074e4bc9
 --- /dev/null
 +++ b/hw/s390x/pv.h
 @@ -0,0 +1,26 @@
 +/*
 + * Protected Virtualization header
 + *
 + * Copyright IBM Corp. 2019
 + * Author(s):
 + *  Janosch Frank 
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or (at
 + * your option) any later version. See the COPYING file in the top-level
 + * directory.
 + */
 +
 +#ifndef HW_S390_PV_H
 +#define HW_S390_PV_H
 +
 +int s390_pv_vm_create(void);
 +int s390_pv_vm_destroy(void);
 +int s390_pv_vcpu_destroy(CPUState *cs);
 +int s390_pv_vcpu_create(CPUState *cs);
 +int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
 +int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak);
 +int s390_pv_perf_clear_reset(void);
 +int s390_pv_verify(void);
 +int 

Re: [PATCH v2 03/13] s390x: protvirt: Support unpack facility

2019-12-04 Thread Janosch Frank
On 12/4/19 11:48 AM, Thomas Huth wrote:
> On 29/11/2019 10.47, Janosch Frank wrote:
>> When a guest has saved a ipib of type 5 and call diagnose308 with
>> subcode 10, we have to setup the protected processing environment via
>> Ultravisor calls. The calls are done by KVM and are exposed via an API.
>>
>> The following steps are necessary:
>> 1. Create a VM (register it with the Ultravisor)
>> 2. Create secure CPUs for all of our current cpus
>> 3. Forward the secure header to the Ultravisor (has all information on
>> how to decrypt the image and VM information)
>> 4. Protect image pages from the host and decrypt them
>> 5. Verify the image integrity
>>
>> Only after step 5 a protected VM is allowed to run.
>>
>> Signed-off-by: Janosch Frank 
>> ---
> [...]
>> +++ b/hw/s390x/pv.c
>> @@ -0,0 +1,118 @@
>> +/*
>> + * Secure execution functions
>> + *
>> + * Copyright IBM Corp. 2019
>> + * Author(s):
>> + *  Janosch Frank 
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>> + * your option) any later version. See the COPYING file in the top-level
>> + * directory.
>> + */
>> +#include "qemu/osdep.h"
>> +#include 
>> +
>> +#include 
>> +
>> +#include "qemu/error-report.h"
>> +#include "sysemu/kvm.h"
>> +#include "pv.h"
>> +
>> +static int s390_pv_cmd(uint32_t cmd, void *data)
>> +{
>> +int rc;
>> +struct kvm_pv_cmd pv_cmd = {
>> +.cmd = cmd,
>> +.data = (uint64_t)data,
>> +};
>> +
>> +rc = kvm_vm_ioctl(kvm_state, KVM_S390_PV_COMMAND, _cmd);
>> +if (rc) {
>> +error_report("KVM PV command failed cmd: %d rc: %d", cmd, rc);
>> +exit(1);
>> +}
>> +return rc;
>> +}
>> +
>> +static int s390_pv_cmd_vcpu(CPUState *cs, uint32_t cmd, void *data)
>> +{
>> +int rc;
>> +struct kvm_pv_cmd pv_cmd = {
>> +.cmd = cmd,
>> +.data = (uint64_t)data,
>> +};
>> +
>> +rc = kvm_vcpu_ioctl(cs, KVM_S390_PV_COMMAND_VCPU, _cmd);
>> +if (rc) {
>> +error_report("KVM PV VCPU command failed cmd: %d rc: %d", cmd, rc);
>> +exit(1);
>> +}
>> +return rc;
>> +}
>> +
>> +int s390_pv_vm_create(void)
>> +{
>> +return s390_pv_cmd(KVM_PV_VM_CREATE, NULL);
>> +}
>> +
>> +int s390_pv_vm_destroy(void)
>> +{
>> +return s390_pv_cmd(KVM_PV_VM_DESTROY, NULL);
>> +}
>> +
>> +int s390_pv_vcpu_create(CPUState *cs)
>> +{
>> +return s390_pv_cmd_vcpu(cs, KVM_PV_VCPU_CREATE, NULL);
>> +}
>> +
>> +int s390_pv_vcpu_destroy(CPUState *cs)
>> +{
>> +S390CPU *cpu = S390_CPU(cs);
>> +CPUS390XState *env = >env;
>> +int rc;
>> +
>> +rc = s390_pv_cmd_vcpu(cs, KVM_PV_VCPU_DESTROY, NULL);
>> +if (!rc) {
>> +env->pv = false;
>> +}
>> +return rc;
>> +}
>> +
>> +int s390_pv_set_sec_parms(uint64_t origin, uint64_t length)
>> +{
>> +struct kvm_s390_pv_sec_parm args = {
>> +.origin = origin,
>> +.length = length,
>> +};
>> +
>> +return s390_pv_cmd(KVM_PV_VM_SET_SEC_PARMS, );
>> +}
>> +
>> +/*
>> + * Called for each component in the SE type IPL parameter block 0.
>> + */
>> +int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak)
>> +{
>> +struct kvm_s390_pv_unp args = {
>> +.addr = addr,
>> +.size = size,
>> +.tweak = tweak,
>> +};
>> +
>> +return s390_pv_cmd(KVM_PV_VM_UNPACK, );
>> +}
>> +
>> +int s390_pv_perf_clear_reset(void)
>> +{
>> +return s390_pv_cmd(KVM_PV_VM_PERF_CLEAR_RESET, NULL);
>> +}
>> +
>> +int s390_pv_verify(void)
>> +{
>> +return s390_pv_cmd(KVM_PV_VM_VERIFY, NULL);
>> +}
>> +
>> +int s390_pv_unshare(void)
>> +{
>> +return s390_pv_cmd(KVM_PV_VM_UNSHARE, NULL);
>> +}
>> diff --git a/hw/s390x/pv.h b/hw/s390x/pv.h
>> new file mode 100644
>> index 00..eb074e4bc9
>> --- /dev/null
>> +++ b/hw/s390x/pv.h
>> @@ -0,0 +1,26 @@
>> +/*
>> + * Protected Virtualization header
>> + *
>> + * Copyright IBM Corp. 2019
>> + * Author(s):
>> + *  Janosch Frank 
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>> + * your option) any later version. See the COPYING file in the top-level
>> + * directory.
>> + */
>> +
>> +#ifndef HW_S390_PV_H
>> +#define HW_S390_PV_H
>> +
>> +int s390_pv_vm_create(void);
>> +int s390_pv_vm_destroy(void);
>> +int s390_pv_vcpu_destroy(CPUState *cs);
>> +int s390_pv_vcpu_create(CPUState *cs);
>> +int s390_pv_set_sec_parms(uint64_t origin, uint64_t length);
>> +int s390_pv_unpack(uint64_t addr, uint64_t size, uint64_t tweak);
>> +int s390_pv_perf_clear_reset(void);
>> +int s390_pv_verify(void);
>> +int s390_pv_unshare(void);
> 
> I still think you should make all those functions returning "void"
> instead of "int" - since errors results in an exit() in s390_pv_cmd()
> and s390_pv_cmd_vcpu() anyway...

Hey Thomas,

I have requested an error code for diag 308 subcode 10 that tells the
VM, that we didn't succeed starting a secure guest. Once that is in
place, I'll need to check the return codes.

Although I'm a bit unsure how I should 

Re: virtiofsd: Where should it live?

2019-12-04 Thread Dr. David Alan Gilbert
* Markus Armbruster (arm...@redhat.com) wrote:
> Daniel P. Berrangé  writes:
> 
> > On Tue, Dec 03, 2019 at 11:06:44AM +, Peter Maydell wrote:
> >> On Tue, 3 Dec 2019 at 10:53, Dr. David Alan Gilbert  
> >> wrote:
> >> >
> >> > We seem to be coming to the conclusion something that:
> >> >
> >> >   a) It should live in the qemu tree
> >> >   b) It shouldn't live under contrib
> >> >   c) We'll create a new top level, i.e. 'daemons'
> >> >   d) virtiofsd will be daemons/virtiofsd
> >> >
> >> > Now, somethings I'm less clear on:
> >> >   e) What else would move into daemons?  It was suggested
> >> > that if we've got virtiofsd in there, then we should move
> >> > libvhost-user - which I understand, but then it's not a
> >> > 'daemons'.
> >> > Are there any otehr daemons that should move?
> >> 
> >> I like the idea of a new top level directory, but I think
> >> 'daemons' is a bit too specific -- for instance it seems to
> >> me that qemu-img would be sensible to move out of the root,
> >> and that's not a daemon.
> >
> > Do we really need an extra directory level ?
> 
> +1
> 
> > IIUC, the main point against having $GIT_ROOT/virtiofsd is that
> > the root of our repo is quite cluttered already.
> >
> > Rather than trying to create a multi-level hierarchy which adds
> > a debate around naming, why not address the clutter by moving
> > *ALL* the .c/.h files out of the root so that we have a flatter
> > tree:
> >
> >   $GITROOT
> > +- qemu-system
> > |   +- vl.c
> > |   +- ...most other files...
> 
> Sounds good to me.
> 
> > +- qemu-img
> > |   +- qemu-img.c
> 
> Perhaps this one can all go into existing block/, similar to how
> pr-manager-helper.c is in scsi/, and virtfs-proxy-helper.c is in fsdev/.
> Up to the block maintainers, of course.
> 
> > +- qemu-nbd
> > |   +- qemu-nbd.c
> 
> block/ or nbd/?
> 
> > +- qemu-io
> > |   +- qemu-io.c
> > |   +- qemu-io-cmds.c
> 
> block/?
> 
> > +- qemu-bridge-helper
> 
> net/?
> 
> > |   ...
> > +- qemu-edid
> 
> Has its own MAINTAINERS section, together with hw/display/edit* and
> include/hw/display/edid.h.  I'm not sure moving it hw/display/ is a good
> idea.  Gerd?
> 
> > +- qemu-keymap
> 
> Not covered by MAINTAINERS.  scripts/get_maintainer.pl --git-blame
> points to Gerd.
> 
> > +- qga  (already exists)
> 
> Yes.
> 
> > Then we can add virtiofsd and other programs at the root with no big
> > issue.
> 
> We don't *have* to put each program into its own directory.  Simple ones
> could also share one.  We just need a directory name.

So what do you think of Paolo's suggestion of putting virtiofsd in 
fsdev (mkdir fsdev/9p && mv fsdev/* fsdev/9p && mkdir fsdev/virtiofsd )

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 08/13] s390x: protvirt: Add new VCPU reset functions

2019-12-04 Thread Thomas Huth
On 29/11/2019 10.48, Janosch Frank wrote:
> CPU resets for protected guests need to be done via Ultravisor
> calls. Hence we need a way to issue these calls for each reset.
> 
> As we formerly had only one reset function and it was called for
> initial, as well as for the clear reset, we now need a new interface.
> 
> Signed-off-by: Janosch Frank 
> ---
[...]
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index b802d02ff5..5b1ed3acb4 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -154,6 +154,7 @@ static int cap_ri;
>  static int cap_gs;
>  static int cap_hpage_1m;
>  static int cap_protvirt;
> +static int cap_vcpu_resets;
>  
>  static int active_cmma;
>  
> @@ -346,6 +347,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
>  cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
>  cap_protvirt = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
> +cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
>  
>  if (!kvm_check_extension(s, KVM_CAP_S390_GMAP)
>  || !kvm_check_extension(s, KVM_CAP_S390_COW)) {
> @@ -407,20 +409,44 @@ int kvm_arch_destroy_vcpu(CPUState *cs)
>  return 0;
>  }
>  
> -void kvm_s390_reset_vcpu(S390CPU *cpu)
> +static void kvm_s390_reset_vcpu(S390CPU *cpu, unsigned long type)
>  {
>  CPUState *cs = CPU(cpu);
>  
> -/* The initial reset call is needed here to reset in-kernel
> - * vcpu data that we can't access directly from QEMU
> - * (i.e. with older kernels which don't support sync_regs/ONE_REG).
> - * Before this ioctl cpu_synchronize_state() is called in common kvm
> - * code (kvm-all) */
> +/*
> + * The reset call is needed here to reset in-kernel vcpu data that
> + * we can't access directly from QEMU (i.e. with older kernels
> + * which don't support sync_regs/ONE_REG).  Before this ioctl
> + * cpu_synchronize_state() is called in common kvm code
> + * (kvm-all).
> + */
> +if (cap_vcpu_resets) {
> +if (kvm_vcpu_ioctl(cs, KVM_S390_VCPU_RESET, type)) {
> +error_report("CPU reset type %ld failed on CPU %i",
> + type, cs->cpu_index);
> +}
> +return;
> +}
>  if (kvm_vcpu_ioctl(cs, KVM_S390_INITIAL_RESET, NULL)) {
>  error_report("Initial CPU reset failed on CPU %i", cs->cpu_index);
>  }

Don't you want to check for type != KVM_S390_VCPU_RESET_NORMAL before
doing the INITIAL_RESET here?

>  }
>  
> +void kvm_s390_reset_vcpu_initial(S390CPU *cpu)
> +{
> +kvm_s390_reset_vcpu(cpu, KVM_S390_VCPU_RESET_INITIAL);
> +}
> +
> +void kvm_s390_reset_vcpu_clear(S390CPU *cpu)
> +{
> +kvm_s390_reset_vcpu(cpu, KVM_S390_VCPU_RESET_CLEAR);
> +}
> +
> +void kvm_s390_reset_vcpu_normal(S390CPU *cpu)
> +{
> +kvm_s390_reset_vcpu(cpu, KVM_S390_VCPU_RESET_NORMAL);
> +}

... otherwise this might reset more things than expected?

Or do I miss something here?

 Thomas




Re: [PATCH] Revert "qemu-options.hx: Update for reboot-timeout parameter"

2019-12-04 Thread Dr. David Alan Gilbert
* Han Han (h...@redhat.com) wrote:
> This reverts commit bbd9e6985ff342cbe15b9cb7eb30e842796fbbe8.

Patchew spotted you're missing the signed-off-by; please send one.

Dave

> In 20a1922032 we allowed reboot-timeout=-1 again, so update the doc
> accordingly.
> ---
>  qemu-options.hx | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 65c9473b73..e14d88e9b2 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -327,8 +327,8 @@ format(true color). The resolution should be supported by 
> the SVGA mode, so
>  the recommended is 320x240, 640x480, 800x640.
>  
>  A timeout could be passed to bios, guest will pause for @var{rb_timeout} ms
> -when boot failed, then reboot. If @option{reboot-timeout} is not set,
> -guest will not reboot by default. Currently Seabios for X86
> +when boot failed, then reboot. If @var{rb_timeout} is '-1', guest will not
> +reboot, qemu passes '-1' to bios by default. Currently Seabios for X86
>  system support it.
>  
>  Do strict boot via @option{strict=on} as far as firmware/BIOS
> -- 
> 2.24.0.rc1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH 1/5] hw/arm/smmuv3: Apply address mask to linear strtab base address

2019-12-04 Thread Simon Veith
In the SMMU_STRTAB_BASE register, the stream table base address only
occupies bits [51:6]. Other bits, such as RA (bit [62]), must be masked
out to obtain the base address.

The branch for 2-level stream tables correctly applies this mask by way
of SMMU_BASE_ADDR_MASK, but the one for linear stream tables does not.

Apply the missing mask in that case as well so that the correct stream
base address is used by guests which configure a linear stream table.

Linux guests are unaffected by this change because they choose a 2-level
stream table layout for the QEMU SMMUv3, based on the size of its stream
ID space.

ref. ARM IHI 0070C, section 6.3.23.

Signed-off-by: Simon Veith 
Cc: Eric Auger 
Cc: qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org
---
 hw/arm/smmuv3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e2fbb83..eef9a18 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -429,7 +429,7 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE 
*ste,
 }
 addr = l2ptr + l2_ste_offset * sizeof(*ste);
 } else {
-addr = s->strtab_base + sid * sizeof(*ste);
+addr = (s->strtab_base & SMMU_BASE_ADDR_MASK) + sid * sizeof(*ste);
 }
 
 if (smmu_get_ste(s, addr, ste, event)) {
-- 
2.7.4




[PATCH 2/5] hw/arm/smmuv3: Check stream IDs against actual table LOG2SIZE

2019-12-04 Thread Simon Veith
When checking whether a stream ID is in range of the stream table, we
have so far been only checking it against our implementation limit
(SMMU_IDR1_SIDSIZE). However, the guest can program the
STRTAB_BASE_CFG.LOG2SIZE field to a size that is smaller than this
limit.

Check the stream ID against this limit as well to match the hardware
behavior of raising C_BAD_STREAMID events in case the limit is exceeded.

ref. ARM IHI 0070C, section 6.3.24.

Signed-off-by: Simon Veith 
Cc: Eric Auger 
Cc: qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org
---
 hw/arm/smmuv3.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index eef9a18..aad4639 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -377,11 +377,15 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, 
STE *ste,
  SMMUEventInfo *event)
 {
 dma_addr_t addr;
+uint32_t log2size;
 int ret;
 
 trace_smmuv3_find_ste(sid, s->features, s->sid_split);
-/* Check SID range */
-if (sid > (1 << SMMU_IDR1_SIDSIZE)) {
+log2size = FIELD_EX32(s->strtab_base_cfg, STRTAB_BASE_CFG, LOG2SIZE);
+/*
+ * Check SID range against both guest-configured and implementation limits
+ */
+if (sid > (1 << MIN(log2size, SMMU_IDR1_SIDSIZE))) {
 event->type = SMMU_EVT_C_BAD_STREAMID;
 return -EINVAL;
 }
-- 
2.7.4




[PATCH 0/5] hw/arm/smmuv3: Correct stream ID and event address handling

2019-12-04 Thread Simon Veith
While working on the Linux SMMUv3 driver, I noticed a few cases where the QEMU
SMMUv3 behavior relating to stream tables was inconsistent with our hardware.

Also, when debugging those differences, I found that the errors reported through
the QEMU SMMUv3 event queue contained the address fields in an incorrect
position.

These patches correct the QEMU SMMUv3 behavior to match the specification (and
the behavior that I observed in our hardware). Linux guests normally will not
notice these issues, but other SMMUv3 driver implementations might.

Simon Veith (5):
  hw/arm/smmuv3: Apply address mask to linear strtab base address
  hw/arm/smmuv3: Check stream IDs against actual table LOG2SIZE
  hw/arm/smmuv3: Align stream table base address to table size
  hw/arm/smmuv3: Use correct bit positions in EVT_SET_ADDR2 macro
  hw/arm/smmuv3: Report F_STE_FETCH fault address in correct word
position

 hw/arm/smmuv3-internal.h |  4 ++--
 hw/arm/smmuv3.c  | 39 ---
 2 files changed, 34 insertions(+), 9 deletions(-)

-- 
2.7.4




[PATCH 4/5] hw/arm/smmuv3: Use correct bit positions in EVT_SET_ADDR2 macro

2019-12-04 Thread Simon Veith
The bit offsets in the EVT_SET_ADDR2 macro do not match those specified
in the ARM SMMUv3 Architecture Specification. In all events that use
this macro, e.g. F_WALK_EABT, the faulting fetch address or IPA actually
occupies the 32-bit words 6 and 7 in the event record contiguously, with
the upper and lower unused bits clear due to alignment or maximum
supported address bits. How many bits are clear depends on the
individual event type.

Update the macro to write to the correct words in the event record so
that guest drivers can obtain accurate address information on events.

ref. ARM IHI 0070C, sections 7.3.12 through 7.3.16.

Signed-off-by: Simon Veith 
Cc: Eric Auger 
Cc: qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org
---
 hw/arm/smmuv3-internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index d190181..eb275e2 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -461,8 +461,8 @@ typedef struct SMMUEventInfo {
 } while (0)
 #define EVT_SET_ADDR2(x, addr)\
 do {  \
-(x)->word[7] = deposit32((x)->word[7], 3, 29, addr >> 16);   \
-(x)->word[7] = deposit32((x)->word[7], 0, 16, addr & 0x);\
+(x)->word[7] = (uint32_t)(addr >> 32);\
+(x)->word[6] = (uint32_t)(addr & 0x); \
 } while (0)
 
 void smmuv3_record_event(SMMUv3State *s, SMMUEventInfo *event);
-- 
2.7.4




Re: virtiofsd: Where should it live?

2019-12-04 Thread Eric Blake

On 12/4/19 1:43 AM, Markus Armbruster wrote:


 +- qemu-img
 |   +- qemu-img.c


Perhaps this one can all go into existing block/, similar to how
pr-manager-helper.c is in scsi/, and virtfs-proxy-helper.c is in fsdev/.
Up to the block maintainers, of course.


 +- qemu-nbd
 |   +- qemu-nbd.c


block/ or nbd/?


I'd lean towards nbd/, but am open to opinions from other block 
maintainers.  Kevin's plan to add a generic block utility that includes 
the functionality of qemu-nbd may also matter (where block/ starts to 
sound better).





 +- qemu-io
 |   +- qemu-io.c
 |   +- qemu-io-cmds.c


block/?



qemu-img and qemu-io in block/ sound reasonable to me.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 05/18] exec: Fix file_ram_alloc() error API violations

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 10:36 AM, Markus Armbruster wrote:

When os_mem_prealloc() fails, file_ram_alloc() calls qemu_ram_munmap()
and returns null.  Except it doesn't when its @errp argument is null,
because it checks for failure with (errp && *errp).  Introduced in
commit 056b68af77 "fix qemu exit on memory hotplug when allocation
fails at prealloc time".

No caller actually passes null.

Fix anyway: splice in a local Error *err, and error_propagate().

Cc: Igor Mammedov 
Signed-off-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 


Reviewed-by: Philippe Mathieu-Daudé 


---
  exec.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index ffdb518535..45695a5f2d 100644
--- a/exec.c
+++ b/exec.c
@@ -1841,6 +1841,7 @@ static void *file_ram_alloc(RAMBlock *block,
  bool truncate,
  Error **errp)
  {
+Error *err = NULL;
  MachineState *ms = MACHINE(qdev_get_machine());
  void *area;
  
@@ -1898,8 +1899,9 @@ static void *file_ram_alloc(RAMBlock *block,

  }
  
  if (mem_prealloc) {

-os_mem_prealloc(fd, area, memory, ms->smp.cpus, errp);
-if (errp && *errp) {
+os_mem_prealloc(fd, area, memory, ms->smp.cpus, );
+if (err) {
+error_propagate(errp, err);
  qemu_ram_munmap(fd, area, memory);
  return NULL;
  }






[PATCH v2 1/7] iotests: Provide a function for checking the creation of huge files

2019-12-04 Thread Thomas Huth
Some tests create huge (but sparse) files, and to be able to run those
tests in certain limited environments (like CI containers), we have to
check for the possibility to create such files first. Thus let's introduce
a common function to check for large files, and replace the already
existing checks in the iotests 005 and 220 with this function.

Reviewed-by: Alex Bennée 
Signed-off-by: Thomas Huth 
---
 tests/qemu-iotests/005   |  5 +
 tests/qemu-iotests/220   |  6 ++
 tests/qemu-iotests/common.rc | 10 ++
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/005 b/tests/qemu-iotests/005
index 58442762fe..b6d03ac37d 100755
--- a/tests/qemu-iotests/005
+++ b/tests/qemu-iotests/005
@@ -59,10 +59,7 @@ fi
 # Sanity check: For raw, we require a file system that permits the creation
 # of a HUGE (but very sparse) file. Check we can create it before continuing.
 if [ "$IMGFMT" = "raw" ]; then
-if ! truncate --size=5T "$TEST_IMG"; then
-_notrun "file system on $TEST_DIR does not support large enough files"
-fi
-rm "$TEST_IMG"
+_require_large_file 5T
 fi
 
 echo
diff --git a/tests/qemu-iotests/220 b/tests/qemu-iotests/220
index 2d62c5dcac..15159270d3 100755
--- a/tests/qemu-iotests/220
+++ b/tests/qemu-iotests/220
@@ -42,10 +42,8 @@ echo "== Creating huge file =="
 
 # Sanity check: We require a file system that permits the creation
 # of a HUGE (but very sparse) file.  tmpfs works, ext4 does not.
-if ! truncate --size=513T "$TEST_IMG"; then
-_notrun "file system on $TEST_DIR does not support large enough files"
-fi
-rm "$TEST_IMG"
+_require_large_file 513T
+
 IMGOPTS='cluster_size=2M,refcount_bits=1' _make_test_img 513T
 
 echo "== Populating refcounts =="
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 0cc8acc9ed..6f0582c79a 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -643,5 +643,15 @@ _require_drivers()
 done
 }
 
+# Check that we have a file system that allows huge (but very sparse) files
+#
+_require_large_file()
+{
+if ! truncate --size="$1" "$TEST_IMG"; then
+_notrun "file system on $TEST_DIR does not support large enough files"
+fi
+rm "$TEST_IMG"
+}
+
 # make sure this script returns success
 true
-- 
2.18.1




[PATCH v2 4/7] tests/hd-geo-test: Skip test when images can not be created

2019-12-04 Thread Thomas Huth
In certain environments like restricted containers, we can not create
huge test images. To be able to use "make check" in such container
environments, too, let's skip the hd-geo-test instead of failing when
the test images could not be created.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Signed-off-by: Thomas Huth 
---
 tests/hd-geo-test.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tests/hd-geo-test.c b/tests/hd-geo-test.c
index 7e86c5416c..a249800544 100644
--- a/tests/hd-geo-test.c
+++ b/tests/hd-geo-test.c
@@ -34,8 +34,13 @@ static char *create_test_img(int secs)
 fd = mkstemp(template);
 g_assert(fd >= 0);
 ret = ftruncate(fd, (off_t)secs * 512);
-g_assert(ret == 0);
 close(fd);
+
+if (ret) {
+free(template);
+template = NULL;
+}
+
 return template;
 }
 
@@ -934,6 +939,10 @@ int main(int argc, char **argv)
 for (i = 0; i < backend_last; i++) {
 if (img_secs[i] >= 0) {
 img_file_name[i] = create_test_img(img_secs[i]);
+if (!img_file_name[i]) {
+g_test_message("Could not create test images.");
+goto test_add_done;
+}
 } else {
 img_file_name[i] = NULL;
 }
@@ -965,6 +974,7 @@ int main(int argc, char **argv)
"skipping hd-geo/override/* tests");
 }
 
+test_add_done:
 ret = g_test_run();
 
 for (i = 0; i < backend_last; i++) {
-- 
2.18.1




Re: [PATCH v6 3/9] qdev: add clock input support to devices.

2019-12-04 Thread Damien Hedde



On 12/4/19 10:53 AM, Philippe Mathieu-Daudé wrote:
> On 12/4/19 10:05 AM, Damien Hedde wrote:
>> On 12/2/19 3:34 PM, Peter Maydell wrote:
>>> On Wed, 4 Sep 2019 at 13:56, Damien Hedde
>>>  wrote:

> [...]
 +/**
 + * qdev_pass_clock:
 + * @dev: the device to forward the clock to
 + * @name: the name of the clock to be added (can't be NULL)
 + * @container: the device which already has the clock
 + * @cont_name: the name of the clock in the container device
 + *
 + * Add a clock @name to @dev which forward to the clock @cont_name
 in @container
 + */
>>>
>>> 'container' seems odd terminology here, because I would expect
>>> the usual use of this function to be when a 'container' object
>>> like an SoC wants to forward a clock to one of its components;
>>> in that case the 'container' SoC would be @dev, wouldn't it?
>>
>> Yes. I agree it is confusing.
>> This function just allow a a device 'A' to exhibit a clock from another
>> device 'B' (typically a composing sub-device, inside a soc like you
>> said). The device A is not supposed to interact with the clock itself.
>> The original sub-device is the true
>> owner/controller of the clock and is the only one which should use the
>> clock API: setting a callback on it (input clock); or updating the clock
>> frequency (output clock).
>> Basically, it just add the clock into the device clock namespace without
>> interfering with it.
>>
>>> We should get this to be the same way round as qdev_pass_gpios(),
>>> which takes "DeviceState *dev, DeviceState *container", and
>>> passes the gpios that exist on 'dev' over to 'container' so that
>>> 'container' now has gpios which it did not before.
>>
>> Ok, I'll invert the arguments.
>>
>>>
>>> Also, your use of 'forward to' is inconsistent: in the 'dev'
>>> documentation you say we're forwarding the clock to 'dev',
>>> but in the body of the documentation you say we're forwarding
>>> the clock to the clock in 'container'.
>>
>> I'll try to clarify this and make the documentation more consistent with
>> the comments here.
>>
>>>
>>> I think the way to resolve this is to stick to the terminology
>>> in the function name itself:
>>>   @dev: the device which has the clock
>>>   @name: the name of the clock on @dev
>>>   @container: the name of the device which the clock should
>>>    be passed to
>>>   @cont_name: the name to use for the clock on @container
>>
>> I think container is confusing because depending on how we reason (clock
>> in a device; device in another device), we end up thinking the opposite.
>>
>> Maybe we can use "exhibit" instead of "container" in the 2nd pair of
>> parameters, or prefix use "origin" or "owner" as a prefix for the first
>> pair of parameters.
> 
> @sink vs @source?
> 
>>> Q: if you pass a clock to another device with this function,
>>> does it still exist to be used directly on the original
>>> device? For qdev_pass_gpios it does not (I think), but
>>> this is more accident of implementation than anything else.
>>
>> It depends what we mean by "used by".
>> Original device can:
>> + set the callback in case it is an input
>> + update the frequency in case it is an output
> 
> Hmm here you use @input vs @output...

Not really. What I meant here is that the device which originally
creates the clock is the only device which can interact with the clock.
And it will never gives this right to another device even if
qdev_pass_clock() is used.

There are 2 interactions, depending on the clock direction (input or
output). If it is an input clock, it can register a callback on
frequency changes; and if it is an output clock, it can updates the
frequency of the clock.

> 
>> But since an input clock can only be connected once,
>> I think the logic here is that any connection should now go through the
>> new device. But this is not checked and using one or the other is
>> exactly the same.
>>
>> Maybe I misunderstood the meaning of qdev_pass_gpios().
> [...]
> 



Re: virtiofsd: Where should it live?

2019-12-04 Thread Markus Armbruster
"Dr. David Alan Gilbert"  writes:

> So what do you think of Paolo's suggestion of putting virtiofsd in 
> fsdev (mkdir fsdev/9p && mv fsdev/* fsdev/9p && mkdir fsdev/virtiofsd )

No objections.

Flatter: fsdev-9p/ and fsdev-virtio/.  Matter of taste.




Re: NBD reconnect on open

2019-12-04 Thread Kevin Wolf
Am 04.12.2019 um 13:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
> There is a question to discuss.
> 
> We need to make an option to allow nbd-reconnect loop on nbd-open.
> For example, add optional nbd blockdev option open-reconnect-delay, to
> make it possible to start qemu with specified nbd connection, when nbd
> server is down, and make several tries to connect before starting the
> guest.
> 
> So, we need it for nbd opened from commandline arguments, and this case
> seems OK.
> 
> But adding option to QAPI, we also allow it for qmp blockdev-add, and
> reconnecting in context of qmp command execution is a wrong thing..
> 
> I can add new option only to options in block/nbd.c, but this way
> -blockdev command line option will not work, it needs QAPI definition.
> 
> What do you think about it?

I think there is a more general problem here actually. bdrv_open() is a
blocking operation and it shouldn't be. BlockDriver should probably have
a .bdrv_co_open instead, and I think this wouldn't be too hard to do.

However, that's only half of the solution: QMP still takes the BQL while
it's executing a command, so even if we used a coroutine, it would be of
no use if then the QMP command implementation would just call
BDRV_POLL_WHILE() to wait for the completion of the coroutine while
holding the BQL.

This is going in direction of async commands that Marc-André was working
on. I didn't follow this closely, so I'm not sure what the status there
is, but he and Markus should be able to tell more.

> I can detect somehow in nbd_open that we are in qmp monitor context, and
> return error if open-reconnect-delay specified.. Is it OK? Is there a
> way to do it?

The whole point of -blockdev is that it's a direct mapping of
blockdev-add to the command line, so making things behave differently
between them sounds like a bad idea.

Kevin




Re: [PATCH v3] travis.yml: Run tcg tests with tci

2019-12-04 Thread Richard Henderson
On 12/4/19 12:31 AM, Thomas Huth wrote:
> -# We manually include builds which we disable "make check" for
> +# Check the TCG interpreter (TCI)
>  - env:
> -- CONFIG="--enable-debug --enable-tcg-interpreter"
> -- TEST_CMD=""
> +- CONFIG="--enable-debug --enable-tcg-interpreter --disable-kvm

While we're changing things, the interpreter will go much faster with
optimization enabled.  We can change this to --enable-debug-tcg, which leaves
the asserts enabled, but compiles with -O2.


r~



Re: [PATCH] travis.yml: Drop libcap-dev

2019-12-04 Thread Alex Bennée


Greg Kurz  writes:

> Commit b1553ab12fe0 converted virtfs-proxy-helper to using libcap-ng. There
> aren't any users of libcap anymore. No need to install libcap-dev.
>
> Signed-off-by: Greg Kurz 
> ---
>
> Yet another follow-up to Paolo's patch to use libcap-ng instead of libcap.
> Like with the docker and the gitlab CI patches, if I get an ack from Alex
> I'll gladly merge this in the 9p tree and send a PR as soon as 5.0 dev
> begins. I'll make sure the SHA1 for Paolo's patch remains the same.

Acked-by: Alex Bennée 

>
>  .travis.yml |1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index 445b0646c18a..6cb8af6fa599 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -26,7 +26,6 @@ addons:
>- libaio-dev
>- libattr1-dev
>- libbrlapi-dev
> -  - libcap-dev
>- libcap-ng-dev
>- libgcc-4.8-dev
>- libgnutls28-dev


-- 
Alex Bennée



Re: [PATCH v2] qga: fence guest-set-time if hwclock not available

2019-12-04 Thread Cornelia Huck
On Thu, 28 Nov 2019 19:11:00 +0100
Cornelia Huck  wrote:

> The Posix implementation of guest-set-time invokes hwclock to
> set/retrieve the time to/from the hardware clock. If hwclock
> is not available, the user is currently informed that "hwclock
> failed to set hardware clock to system time", which is quite
> misleading. This may happen e.g. on s390x, which has a different
> timekeeping concept anyway.
> 
> Let's check for the availability of the hwclock command and
> return QERR_UNSUPPORTED for guest-set-time if it is not available.
> 
> Signed-off-by: Cornelia Huck 

Michael, any comments before I send a v3?

> ---
> 
> v1 (RFC) -> v2:
> - use hwclock_path[]
> - use access() instead of stat()
> 
> ---
>  qga/commands-posix.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 1c1a165daed8..ffb6420fa9cd 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -156,6 +156,17 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
> Error **errp)
>  pid_t pid;
>  Error *local_err = NULL;
>  struct timeval tv;
> +const char hwclock_path[] = "/sbin/hwclock";
> +static int hwclock_available = -1;
> +
> +if (hwclock_available < 0) {
> +hwclock_available = (access(hwclock_path, X_OK) == 0);
> +}
> +
> +if (!hwclock_available) {
> +error_setg(errp, QERR_UNSUPPORTED);
> +return;
> +}
>  
>  /* If user has passed a time, validate and set it. */
>  if (has_time) {
> @@ -195,7 +206,7 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
> Error **errp)
>  
>  /* Use '/sbin/hwclock -w' to set RTC from the system time,
>   * or '/sbin/hwclock -s' to set the system time from RTC. */
> -execle("/sbin/hwclock", "hwclock", has_time ? "-w" : "-s",
> +execle(hwclock_path, "hwclock", has_time ? "-w" : "-s",
> NULL, environ);
>  _exit(EXIT_FAILURE);
>  } else if (pid < 0) {




Re: virtiofsd: Where should it live?

2019-12-04 Thread Eric Blake

On 12/4/19 7:28 AM, Kevin Wolf wrote:

Am 04.12.2019 um 09:17 hat Gerd Hoffmann geschrieben:

   Hi,


 |   ...
 +- qemu-edid


Has its own MAINTAINERS section, together with hw/display/edit* and
include/hw/display/edid.h.  I'm not sure moving it hw/display/ is a good
idea.  Gerd?


Sort-of makes sense.  My personal preference would be a tools/ directory
for all those small utilities though.


I think I would like that better than throwing tools into block/ where
currently mostly just block drivers live.


Ooh, I should have read this sub-thread before writing my reply at the 
top level.  Yes, I like the idea of tools/ for qemu-img, qemu-io, and 
qemu-nbd better than block/.




Or, if we want to move the tools there, we'd need another directory
level inside block/ to keep things reasonably organised.


Separating drivers from executables sounds worthwhile either way, 
whether that division is by a new tools/ or by a subdirectory in block/.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 20/40] target/arm: Update arm_mmu_idx for VHE

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> Return the indexes for the EL2&0 regime when the appropriate bits
> are set within HCR_EL2.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/helper.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 27adf24fa6..c6b4c0a25f 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11172,12 +11172,16 @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
>  return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
>  }
>  
> +/* See ARM pseudo-function ELIsInHost.  */
>  switch (el) {
>  case 0:
> -/* TODO: ARMv8.1-VHE */
>  if (arm_is_secure_below_el3(env)) {
>  return ARMMMUIdx_SE0;
>  }
> +if ((env->cp15.hcr_el2 & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)
> +&& arm_el_is_aa64(env, 2)) {
> +return ARMMMUIdx_EL20_0;
> +}
>  return ARMMMUIdx_EL10_0;
>  case 1:
>  if (arm_is_secure_below_el3(env)) {
> @@ -11185,8 +11189,11 @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
>  }
>  return ARMMMUIdx_EL10_1;
>  case 2:
> -/* TODO: ARMv8.1-VHE */
>  /* TODO: ARMv8.4-SecEL2 */
> +/* Note that TGE does not apply at EL2.  */
> +if ((env->cp15.hcr_el2 & HCR_E2H) && arm_el_is_aa64(env, 2)) {
> +return ARMMMUIdx_EL20_2;
> +}
>  return ARMMMUIdx_E2;
>  case 3:
>  return ARMMMUIdx_SE3;


-- 
Alex Bennée



Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-12-04 Thread Eduardo Habkost
On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote:
> On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote:
> > +jfreimann, +mst
> > 
> > On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote:
> > > On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost  wrote:
> > > > So, to summarize the current issues:
> > > >
> > > > 1) realize triggers a plug operation implicitly.
> > > > 2) unplug triggers unrealize implicitly.
> > > >
> > > > Do you expect to see use cases that will require us to implement
> > > > realize-without-plug?
> > > 
> > > I don't think so, but only because of the oddity that
> > > we put lots of devices on the 'sysbus' and claim that
> > > that's plugging them into the bus. The common case of
> > > 'realize' is where one device (say an SoC) has a bunch of child
> > > devices (like UARTs); the SoC's realize method realizes its child
> > > devices. Those devices all end up plugged into the 'sysbus'
> > > but there's no actual bus there, it's fictional and about
> > > the only thing it matters for is reset propagation (which
> > > we don't model right either). A few devices don't live on
> > > buses at all.
> > 
> > That's my impression as well.
> > 
> > > 
> > > > Similarly, do you expect use cases that will require us to
> > > > implement unplug-without-unrealize?
> > > 
> > > I don't know enough about hotplug to answer this one:
> > > it's essentially what I'm hoping you'd be able to answer.
> > > I vaguely had in mind that eg the user might be able to
> > > create a 'disk' object, plug it into a SCSI bus, then
> > > unplug it from the bus without the disk and all its data
> > > evaporating, and maybe plug it back into the SCSI
> > > bus (or some other SCSI bus) later ? But I don't know
> > > anything about how we expose that kind of thing to the
> > > user via QMP/HMP.
> > 
> > This ability isn't exposed to the user at all.  Our existing
> > interfaces are -device, device_add and device_del.
> > 
> > We do have something new that sounds suspiciously similar to
> > "unplugged but not unrealized", though: the new hidden device
> > API, added by commit f3a850565693 ("qdev/qbus: add hidden device
> > support").
> > 
> > Jens, Michael, what exactly is the difference between a "hidden"
> > device and a "unplugged" device?
> 
> "hidden" the way we use it for virtio-net failover is actually unplugged. But 
> it
> doesn't have to be that way. You can register a function that decides
> if the device should be hidden, i.e. plugged now, or do something else
> with it (in the virtio-net failover case we just save everything we
> need to plug the device later).
> 
> We did introduce a "unplugged but not unrealized" function too as part
> of the failover feature. See "a99c4da9fc pci: mark devices partially
> unplugged"
> 
> This was needed so we would be able to re-plug the device in case a
> migration failed and we need to hotplug the primary device back to the
> guest. To avoid the risk of not getting the resources the device needs
> we don't unrealize but just trigger the unplug from the guest OS.

Thanks for the explanation.  Let me confirm if I understand the
purpose of the new mechanisms: should_be_hidden is a mechanism
for implementing realize-without-plug.  partially_hotplugged is a
mechanism for implementing unplug-without-unrealize.  Is that
correct?

-- 
Eduardo




Re: [PATCH v4 22/40] target/arm: Update aa64_zva_access for EL2

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> The comment that we don't support EL2 is somewhat out of date.
> Update to include checks against HCR_EL2.TDZ.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/helper.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 4f5e0b656c..ffa82b5509 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -4109,11 +4109,27 @@ static void tlbi_aa64_ipas2e1is_write(CPUARMState 
> *env, const ARMCPRegInfo *ri,
>  static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo 
> *ri,
>bool isread)
>  {
> -/* We don't implement EL2, so the only control on DC ZVA is the
> - * bit in the SCTLR which can prohibit access for EL0.
> - */
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_DZE)) {
> -return CP_ACCESS_TRAP;
> +int cur_el = arm_current_el(env);
> +
> +if (cur_el < 2) {
> +uint64_t hcr = arm_hcr_el2_eff(env);
> +
> +if (cur_el == 0) {
> +if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
> +if (!(env->cp15.sctlr_el[2] & SCTLR_DZE)) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +} else {
> +if (!(env->cp15.sctlr_el[1] & SCTLR_DZE)) {
> +return CP_ACCESS_TRAP;
> +}
> +if (hcr & HCR_TDZ) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +}
> +} else if (hcr & HCR_TDZ) {
> +return CP_ACCESS_TRAP_EL2;
> +}
>  }
>  return CP_ACCESS_OK;
>  }


-- 
Alex Bennée



Re: [PATCH] net/imx_fec: Updating the IMX_FEC IP to support loopback mode.

2019-12-04 Thread Peter Maydell
On Wed, 4 Dec 2019 at 02:15, Jason Wang  wrote:
>
>
> On 2019/11/30 上午12:04, Philippe Mathieu-Daudé wrote:
> > On Fri, Nov 29, 2019 at 4:59 PM Wasim, Bilal  wrote:
> >> Thanks for the pointers philippe.. Is the patch okay to be merged without 
> >> it or do I need to do a re-submission with the updated username ?
> > If there are no review comments on your patch, I think the maintainer
> > taking your patch can fix this details directly, no need to resend.
> >
> >> -Original Message-
> >> From: Philippe Mathieu-Daudé [mailto:phi...@redhat.com]
> >> Sent: Friday, November 29, 2019 8:38 PM
> >> To: bilalwasim...@gmail.com; qemu-devel@nongnu.org
> >> Cc: peter.mayd...@linaro.org; aa1ron...@gmail.com; j...@tribudubois.net; 
> >> qemu-...@nongnu.org; Wasim, Bilal ; 
> >> li...@roeck-us.net; Jason Wang 
> >> Subject: Re: [PATCH] net/imx_fec: Updating the IMX_FEC IP to support 
> >> loopback mode.
> >>
> >> Hi Bilal,
> >>
> >> Cc'ing Jason, the maintainer of network devices.
> >>
> >> On 11/29/19 4:05 PM, bilalwasim...@gmail.com wrote:
> >>> From: bwasim 
> >> Your git setup misses your 'user.name', you could fix it running:
> >>
> >> git config user.name "Bilal Wasim"
> >>
> >> (eventually with the --global option).
> >>
> >> The patch looks good otherwise.
> >>
> >> Thanks!
>
>
> Applied with the fix for user.name.

Could you fix up the non-standard block comment formatting too?

thanks
-- PMM



Re: [PATCH v6] error: rename errp to errp_in where it is IN-argument

2019-12-04 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 04.12.2019 16:03, Markus Armbruster wrote:
>> Markus Armbruster  writes:
>> 
>>> Vladimir Sementsov-Ogievskiy  writes:
>>>
 29.11.2019 17:35, Markus Armbruster wrote:
>> [...]
> I pushed my fixups to git://repo.or.cz/qemu/armbru.git branch error-prep
> for your convenience.  The commit messages of the fixed up commits need
> rephrasing, of course.
>

 Looked through fixups, looks OK for me, thanks! What next?
>>>
>>> Let me finish my fixing incorrect dereferences of errp, and then we
>>> figure out what to include in v7.
>> 
>> Your v6 with my fixups does not conflict with my "[PATCH v2 00/18] Error
>> handling fixes", except for "hw/core/loader-fit: fix freeing errp in
>> fit_load_fdt", which my "[PATCH v2 07/18] hw/core: Fix fit_load_fdt()
>> error handling violations" supersedes.
>> 
>> Suggest you work in the fixups and post as v7.  I'll merge that in my
>> tree, to give you a base for the remainder of your "auto propagated
>> local_err" work.  While you work on that, I'll work on getting the base
>> merged into master.  Sounds like a plan?
>> 
>
> Yes, that's good. I'll send v7 tomorrow.
>
> What you suggest to do after it?
> Send in one series a patch with macro + coccinelle +
> subset of autogenerated patches, which were reviewed (but not sending half
> a subsystem of course)?

Sounds good to me.

Visibility into the complete work is useful, though.  Having the cover
letter point to a branch in your public git repo should do.




[PATCH v2 5/7] tests/test-util-filemonitor: Skip test on non-x86 Travis containers

2019-12-04 Thread Thomas Huth
test-util-filemonitor fails in restricted non-x86 Travis containers
since they apparently blacklisted some required system calls there.
Let's simply skip the test if we detect such an environment.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Signed-off-by: Thomas Huth 
---
 tests/test-util-filemonitor.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/test-util-filemonitor.c b/tests/test-util-filemonitor.c
index 301cd2db61..45009c69f4 100644
--- a/tests/test-util-filemonitor.c
+++ b/tests/test-util-filemonitor.c
@@ -406,10 +406,21 @@ test_file_monitor_events(void)
 char *pathdst = NULL;
 QFileMonitorTestData data;
 GHashTable *ids = g_hash_table_new(g_int64_hash, g_int64_equal);
+char *travis_arch;
 
 qemu_mutex_init();
 data.records = NULL;
 
+/*
+ * This test does not work on Travis LXD containers since some
+ * syscalls are blocked in that environment.
+ */
+travis_arch = getenv("TRAVIS_ARCH");
+if (travis_arch && !g_str_equal(travis_arch, "x86_64")) {
+g_test_skip("Test does not work on non-x86 Travis containers.");
+return;
+}
+
 /*
  * The file monitor needs the main loop running in
  * order to receive events from inotify. We must
-- 
2.18.1




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-12-04 Thread Dr. David Alan Gilbert
* Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> sorting is maintained by searching from head to tail for a suitable
> insertion spot.  Insertion is thus an O(n) operation.
> 
> If we instead keep track of the head of each priority's subqueue
> within that larger queue we can reduce this operation to O(1) time.
> 
> savevm_state_handler_remove() becomes slightly more complex to
> accomodate these gains: we need to replace the head of a priority's
> subqueue when removing it.
> 
> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> more plausible.  For example, a ppc64 VM with maxmem=8T has 4 such
> objects to insert.
> 
> Signed-off-by: Scott Cheloha 

OK, it took me a while to figure out why you didn't just
turn handlers into handlers[MIG_PRI_MAX]; but I guess the problem is
you would have to change all the foreach's scattered around that walk
the list.  So


Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/savevm.c | 26 +++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b2e3b7222a..f7a2d36bba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -250,6 +250,7 @@ typedef struct SaveStateEntry {
>  
>  typedef struct SaveState {
>  QTAILQ_HEAD(, SaveStateEntry) handlers;
> +SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1];
>  int global_section_id;
>  uint32_t len;
>  const char *name;
> @@ -261,6 +262,7 @@ typedef struct SaveState {
>  
>  static SaveState savevm_state = {
>  .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
> +.handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
>  .global_section_id = 0,
>  };
>  
> @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry 
> *nse)
>  {
>  MigrationPriority priority = save_state_priority(nse);
>  SaveStateEntry *se;
> +int i;
>  
>  assert(priority <= MIG_PRI_MAX);
>  
> -QTAILQ_FOREACH(se, _state.handlers, entry) {
> -if (save_state_priority(se) < priority) {
> +for (i = priority - 1; i >= 0; i--) {
> +se = savevm_state.handler_pri_head[i];
> +if (se != NULL) {
> +assert(save_state_priority(se) < priority);
>  break;
>  }
>  }
>  
> -if (se) {
> +if (i >= 0) {
>  QTAILQ_INSERT_BEFORE(se, nse, entry);
>  } else {
>  QTAILQ_INSERT_TAIL(_state.handlers, nse, entry);
>  }
> +
> +if (savevm_state.handler_pri_head[priority] == NULL) {
> +savevm_state.handler_pri_head[priority] = nse;
> +}
>  }
>  
>  static void savevm_state_handler_remove(SaveStateEntry *se)
>  {
> +SaveStateEntry *next;
> +MigrationPriority priority = save_state_priority(se);
> +
> +if (se == savevm_state.handler_pri_head[priority]) {
> +next = QTAILQ_NEXT(se, entry);
> +if (next != NULL && save_state_priority(next) == priority) {
> +savevm_state.handler_pri_head[priority] = next;
> +} else {
> +savevm_state.handler_pri_head[priority] = NULL;
> +}
> +}
>  QTAILQ_REMOVE(_state.handlers, se, entry);
>  }
>  
> -- 
> 2.23.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH for-5.0 7/8] acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command

2019-12-04 Thread Igor Mammedov
Extend CPU hotplug interface to return architecture specific
identifier for current CPU in 2 registers:
 - lower 32 bits existing ACPI_CPU_CMD_DATA_OFFSET_RW
 - upper 32 bits in new ACPI_CPU_CMD_DATA2_OFFSET_R at
   offset 0.

Target user is UEFI firmware, which needs a way to enumerate
all CPUs (including possible CPUs) to allocate and initialize
CPU structures on boot.
(for x86: it needs APIC ID and later command will be used to
retrieve ARM's MPIDR which serves the similar to APIC ID purpose)

The new ACPI_CPU_CMD_DATA2_OFFSET_R register will also be used
to detect presence of modern CPU hotplug, which will be described
in follow up patch.

Signed-off-by: Igor Mammedov 
---
v1:
 - s/ACPI_CPU_CMD_DATA2_OFFSET_RW/ACPI_CPU_CMD_DATA2_OFFSET_R/.
---
 docs/specs/acpi_cpu_hotplug.txt | 10 --
 hw/acpi/cpu.c   | 15 +++
 hw/acpi/trace-events|  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index 58c16c6..bb33144 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -44,7 +44,11 @@ keeps the current value.
 
 read access:
 offset:
-[0x0-0x3] reserved
+[0x0-0x3] Command data 2: (DWORD access)
+  if last stored 'Command field' value:
+  3: upper 32 bits of architecture specific identifying CPU 
value
+ (n x86 case: 0x0)
+  other values: reserved
 [0x4] CPU device status fields: (1 byte access)
 bits:
0: Device is enabled and may be used by guest
@@ -96,7 +100,9 @@ write access:
   2: stores value into OST status register, triggers
  ACPI_DEVICE_OST QMP event from QEMU to external applications
  with current values of OST event and status registers.
-other values: reserved
+  3: lower 32 bit of architecture specific identifier
+ (in x86 case: APIC ID)
+  other values: reserved
 
 Typical usecases:
 - Get a cpu with pending event
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 87f30a3..87813ce 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -12,11 +12,13 @@
 #define ACPI_CPU_FLAGS_OFFSET_RW 4
 #define ACPI_CPU_CMD_OFFSET_WR 5
 #define ACPI_CPU_CMD_DATA_OFFSET_RW 8
+#define ACPI_CPU_CMD_DATA2_OFFSET_R 0
 
 enum {
 CPHP_GET_NEXT_CPU_WITH_EVENT_CMD = 0,
 CPHP_OST_EVENT_CMD = 1,
 CPHP_OST_STATUS_CMD = 2,
+CPHP_GET_CPU_ID_CMD = 3,
 CPHP_CMD_MAX
 };
 
@@ -74,11 +76,24 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, 
unsigned size)
 case CPHP_GET_NEXT_CPU_WITH_EVENT_CMD:
val = cpu_st->selector;
break;
+case CPHP_GET_CPU_ID_CMD:
+   val = cdev->arch_id & 0x;
+   break;
 default:
break;
 }
 trace_cpuhp_acpi_read_cmd_data(cpu_st->selector, val);
 break;
+case ACPI_CPU_CMD_DATA2_OFFSET_R:
+switch (cpu_st->command) {
+case CPHP_GET_CPU_ID_CMD:
+   val = cdev->arch_id >> 32;
+   break;
+default:
+   break;
+}
+trace_cpuhp_acpi_read_cmd_data2(cpu_st->selector, val);
+break;
 default:
 break;
 }
diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
index 96b8273..afbc77d 100644
--- a/hw/acpi/trace-events
+++ b/hw/acpi/trace-events
@@ -23,6 +23,7 @@ cpuhp_acpi_read_flags(uint32_t idx, uint8_t flags) 
"idx[0x%"PRIx32"] flags: 0x%"
 cpuhp_acpi_write_idx(uint32_t idx) "set active cpu idx: 0x%"PRIx32
 cpuhp_acpi_write_cmd(uint32_t idx, uint8_t cmd) "idx[0x%"PRIx32"] cmd: 
0x%"PRIx8
 cpuhp_acpi_read_cmd_data(uint32_t idx, uint32_t data) "idx[0x%"PRIx32"] data: 
0x%"PRIx32
+cpuhp_acpi_read_cmd_data2(uint32_t idx, uint32_t data) "idx[0x%"PRIx32"] data: 
0x%"PRIx32
 cpuhp_acpi_cpu_has_events(uint32_t idx, bool ins, bool rm) "idx[0x%"PRIx32"] 
inserting: %d, removing: %d"
 cpuhp_acpi_clear_inserting_evt(uint32_t idx) "idx[0x%"PRIx32"]"
 cpuhp_acpi_clear_remove_evt(uint32_t idx) "idx[0x%"PRIx32"]"
-- 
2.7.4




[PATCH for-5.0 2/8] tests: q35: MCH: add default SMBASE SMRAM lock test

2019-12-04 Thread Igor Mammedov
test lockable SMRAM at default SMBASE feature, introduced by
patch "q35: implement 128K SMRAM at default SMBASE address"

Signed-off-by: Igor Mammedov 
---
 tests/q35-test.c | 105 +++
 1 file changed, 105 insertions(+)

diff --git a/tests/q35-test.c b/tests/q35-test.c
index a68183d..dd02660 100644
--- a/tests/q35-test.c
+++ b/tests/q35-test.c
@@ -186,6 +186,109 @@ static void test_tseg_size(const void *data)
 qtest_quit(qts);
 }
 
+#define SMBASE 0x3
+#define SMRAM_TEST_PATTERN 0x32
+#define SMRAM_TEST_RESET_PATTERN 0x23
+
+static void test_smram_smbase_lock(void)
+{
+QPCIBus *pcibus;
+QPCIDevice *pcidev;
+QDict *response;
+QTestState *qts;
+int i;
+
+qts = qtest_init("-M q35");
+
+pcibus = qpci_new_pc(qts, NULL);
+g_assert(pcibus != NULL);
+
+pcidev = qpci_device_find(pcibus, 0);
+g_assert(pcidev != NULL);
+
+/* check that SMRAM is not enabled by default */
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == 0);
+qtest_writeb(qts, SMBASE, SMRAM_TEST_PATTERN);
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, SMRAM_TEST_PATTERN);
+
+/* check that writinng junk to 0x9c before before negotiating is ignorred 
*/
+for (i = 0; i < 0xff; i++) {
+qpci_config_writeb(pcidev, MCH_HOST_BRIDGE_F_SMBASE, i);
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == 0);
+}
+
+/* enable SMRAM at SMBASE */
+qpci_config_writeb(pcidev, MCH_HOST_BRIDGE_F_SMBASE, 0xff);
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == 0x01);
+/* lock SMRAM at SMBASE */
+qpci_config_writeb(pcidev, MCH_HOST_BRIDGE_F_SMBASE, 0x02);
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == 0x02);
+
+/* check that SMRAM at SMBASE is locked and can't be unlocked */
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, 0xff);
+for (i = 0; i <= 0xff; i++) {
+/* make sure register is immutable */
+qpci_config_writeb(pcidev, MCH_HOST_BRIDGE_F_SMBASE, i);
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == 0x02);
+
+/* RAM access should go inot black hole */
+qtest_writeb(qts, SMBASE, SMRAM_TEST_PATTERN);
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, 0xff);
+}
+
+/* reset */
+response = qtest_qmp(qts, "{'execute': 'system_reset', 'arguments': {} }");
+g_assert(response);
+g_assert(!qdict_haskey(response, "error"));
+qobject_unref(response);
+
+/* check RAM at SMBASE is available after reset */
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, SMRAM_TEST_PATTERN);
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == 0);
+qtest_writeb(qts, SMBASE, SMRAM_TEST_RESET_PATTERN);
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, SMRAM_TEST_RESET_PATTERN);
+
+g_free(pcidev);
+qpci_free_pc(pcibus);
+
+qtest_quit(qts);
+}
+
+static void test_without_smram_base(void)
+{
+QPCIBus *pcibus;
+QPCIDevice *pcidev;
+QTestState *qts;
+int i;
+
+qts = qtest_init("-M pc-q35-4.1");
+
+pcibus = qpci_new_pc(qts, NULL);
+g_assert(pcibus != NULL);
+
+pcidev = qpci_device_find(pcibus, 0);
+g_assert(pcidev != NULL);
+
+/* check that RAM accessible */
+qtest_writeb(qts, SMBASE, SMRAM_TEST_PATTERN);
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, SMRAM_TEST_PATTERN);
+
+/* check that writing to 0x9c succeeds */
+for (i = 0; i <= 0xff; i++) {
+qpci_config_writeb(pcidev, MCH_HOST_BRIDGE_F_SMBASE, i);
+g_assert(qpci_config_readb(pcidev, MCH_HOST_BRIDGE_F_SMBASE) == i);
+}
+
+/* check that RAM is still accessible */
+qtest_writeb(qts, SMBASE, SMRAM_TEST_PATTERN + 1);
+g_assert_cmpint(qtest_readb(qts, SMBASE), ==, (SMRAM_TEST_PATTERN + 1));
+
+g_free(pcidev);
+qpci_free_pc(pcibus);
+
+qtest_quit(qts);
+}
+
 int main(int argc, char **argv)
 {
 g_test_init(, , NULL);
@@ -197,5 +300,7 @@ int main(int argc, char **argv)
 qtest_add_data_func("/q35/tseg-size/8mb", _8mb, test_tseg_size);
 qtest_add_data_func("/q35/tseg-size/ext/16mb", _ext_16mb,
 test_tseg_size);
+qtest_add_func("/q35/smram/smbase_lock", test_smram_smbase_lock);
+qtest_add_func("/q35/smram/legacy_smbase", test_without_smram_base);
 return g_test_run();
 }
-- 
2.7.4




[PATCH for-5.0 4/8] acpi: cpuhp: spec: fix 'Command data' description

2019-12-04 Thread Igor Mammedov
Correct returned value description in case 'Command field' == 0x0,
it's in not PXM but CPU selector value with pending event

In addition describe 0 blanket value in case of not supported
'Command field' value.

Signed-off-by: Igor Mammedov 
---
 docs/specs/acpi_cpu_hotplug.txt | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index 4e65286..19c508f 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -56,9 +56,8 @@ read access:
3-7: reserved and should be ignored by OSPM
 [0x5-0x7] reserved
 [0x8] Command data: (DWORD access)
-  in case of error or unsupported command reads is 0x
-  current 'Command field' value:
-  0: returns PXM value corresponding to device
+  contains 0 unless last stored in 'Command field' value is one of:
+  0: contains 'CPU selector' value of a CPU with pending event[s]
 
 write access:
 offset:
@@ -81,9 +80,9 @@ write access:
   value:
 0: selects a CPU device with inserting/removing events and
following reads from 'Command data' register return
-   selected CPU (CPU selector value). If no CPU with events
-   found, the current CPU selector doesn't change and
-   corresponding insert/remove event flags are not set.
+   selected CPU ('CPU selector' value).
+   If no CPU with events found, the current 'CPU selector' doesn't
+   change and corresponding insert/remove event flags are not set.
 1: following writes to 'Command data' register set OST event
register in QEMU
 2: following writes to 'Command data' register set OST status
-- 
2.7.4




[PATCH for-5.0 3/8] acpi: cpuhp: spec: clarify 'CPU selector' register usage and endianness

2019-12-04 Thread Igor Mammedov
* Move reserved registers to the top of the section, so reader would be
  aware of effects when reading registers description.
* State registers endianness explicitly at the beginning of the section
* Describe registers behavior in case of 'CPU selector' register contains
  value that doesn't point to a possible CPU.

Signed-off-by: Igor Mammedov 
---
 docs/specs/acpi_cpu_hotplug.txt | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index ee219c8..4e65286 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -30,6 +30,18 @@ Register block base address:
 Register block size:
 ACPI_CPU_HOTPLUG_REG_LEN = 12
 
+All accesses to registers described below, imply little-endian byte order.
+
+Reserved resisters behavior:
+   - write accesses are ignored
+   - read accesses return all bits set to 0.
+
+The last stored value in 'CPU selector' must refer to a possible CPU, otherwise
+  - reads from any register return 0
+  - writes to any other register are ignored until valid value is stored into 
it
+On QEMU start, 'CPU selector' is initialized to a valid value, on reset it
+keeps the current value.
+
 read access:
 offset:
 [0x0-0x3] reserved
@@ -86,9 +98,3 @@ write access:
  ACPI_DEVICE_OST QMP event from QEMU to external applications
  with current values of OST event and status registers.
 other values: reserved
-
-Selecting CPU device beyond possible range has no effect on platform:
-   - write accesses to CPU hot-plug registers not documented above are
- ignored
-   - read accesses to CPU hot-plug registers not documented above return
- all bits set to 0.
-- 
2.7.4




Re: [PATCH v4 26/40] target/arm: Update define_one_arm_cp_reg_with_opaque for VHE

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> For ARMv8.1, op1 == 5 is reserved for EL2 aliases of
> EL1 and EL0 registers.
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/helper.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 023b8963cf..1812588fa1 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -7437,13 +7437,10 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
>  mask = PL0_RW;
>  break;
>  case 4:
> +case 5:
>  /* min_EL EL2 */
>  mask = PL2_RW;
>  break;
> -case 5:
> -/* unallocated encoding, so not possible */
> -assert(false);
> -break;

This change is fine - I don't think we should have asserted here anyway.
But don't we generate an unallocated exception if the CPU is v8.0?


>  case 6:
>  /* min_EL EL3 */
>  mask = PL3_RW;


-- 
Alex Bennée



Re: [PATCH v2 1/5] virtiofsd: Get rid of unused fields in fv_QueueInfo

2019-12-04 Thread Dr. David Alan Gilbert
* Vivek Goyal (vgo...@redhat.com) wrote:
> There are some unused fields in "struct fv_QueueInfo". Get rid of these 
> fields.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  contrib/virtiofsd/fuse_virtio.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
> index 31c8542b6c..2a9cd60a01 100644
> --- a/contrib/virtiofsd/fuse_virtio.c
> +++ b/contrib/virtiofsd/fuse_virtio.c
> @@ -50,12 +50,6 @@ struct fv_QueueInfo {
>  int qidx;
>  int kick_fd;
>  int kill_fd; /* For killing the thread */
> -
> -/* The element for the command currently being processed */
> -VuVirtqElement *qe;
> -/* If any of the qe vec elements (towards vmm) are unmappable */
> -unsigned int elem_bad_in;
> -bool reply_sent;

Yep, those last two got moved into FVRequest as part of the thread pool
stuff.


Reviewed-by: Dr. David Alan Gilbert 

>  };
>  
>  /* A FUSE request */
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 2/7] iotests: Skip test 060 if it is not possible to create large files

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 4:46 PM, Thomas Huth wrote:

Test 060 fails in the arm64, s390x and ppc64le LXD containers on Travis
(which we will hopefully enable in our CI soon). These containers
apparently do not allow large files to be created. The repair process
in test 060 creates a file of 64 GiB, so test first whether such large
files are possible and skip the test if that's not the case.

Signed-off-by: Thomas Huth 
---
  tests/qemu-iotests/060 | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index b91d8321bb..d96f17a484 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -49,6 +49,9 @@ _supported_fmt qcow2
  _supported_proto file
  _supported_os Linux
  
+# The repair process will create a large file - so check for availability first

+_require_large_file 64G
+
  rt_offset=65536  # 0x1 (XXX: just an assumption)
  rb_offset=131072 # 0x2 (XXX: just an assumption)
  l1_offset=196608 # 0x3 (XXX: just an assumption)



Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] target/i386: relax assert when old host kernels don't include msrs

2019-12-04 Thread Paolo Bonzini
On 04/12/19 16:47, Eduardo Habkost wrote:
> On Wed, Dec 04, 2019 at 04:34:45PM +0100, Paolo Bonzini wrote:
>> On 04/12/19 16:07, Catherine Ho wrote:
 Ok, so the problem is that some MSR didn't exist in that version.  Which
>>> I thought in my platform, the only MSR didn't exist is MSR_IA32_VMX_BASIC
>>> (0x480). If I remove this kvm_msr_entry_add(), everything is ok, the guest 
>>> can
>>> be boot up successfully.
>>>
>>
>> MSR_IA32_VMX_BASIC was added in kvm-4.10.  Maybe the issue is the
>> _value_ that is being written to the VM is not valid?  Can you check
>> what's happening in vmx_restore_vmx_basic?
> 
> I believe env->features[FEAT_VMX_BASIC] will be initialized to 0
> if the host kernel doesn't have KVM_CAP_GET_MSR_FEATURES.

But the host must have MSR features if the MSRs are added:

if (kvm_feature_msrs && cpu_has_vmx(env)) {
kvm_msr_entry_add_vmx(cpu, env->features);
}

Looks like feature MSRs were backported to 4.14, but
1389309c811b0c954bf3b591b761d79b1700283d and the previous commit weren't.

Paolo




Re: [PATCH v4 23/40] target/arm: Update ctr_el0_access for EL2

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> Update to include checks against HCR_EL2.TID2.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/helper.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index ffa82b5509..9ad5015d5c 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5212,11 +5212,27 @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
>  static CPAccessResult ctr_el0_access(CPUARMState *env, const ARMCPRegInfo 
> *ri,
>   bool isread)
>  {
> -/* Only accessible in EL0 if SCTLR.UCT is set (and only in AArch64,
> - * but the AArch32 CTR has its own reginfo struct)
> - */
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UCT)) {
> -return CP_ACCESS_TRAP;
> +int cur_el = arm_current_el(env);
> +
> +if (cur_el < 2) {
> +uint64_t hcr = arm_hcr_el2_eff(env);
> +
> +if (cur_el == 0) {
> +if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
> +if (!(env->cp15.sctlr_el[2] & SCTLR_UCT)) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +} else {
> +if (!(env->cp15.sctlr_el[1] & SCTLR_UCT)) {
> +return CP_ACCESS_TRAP;
> +}
> +if (hcr & HCR_TID2) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +}
> +} else if (hcr & HCR_TID2) {
> +return CP_ACCESS_TRAP_EL2;
> +}
>  }
>  return CP_ACCESS_OK;
>  }


-- 
Alex Bennée



Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-12-04 Thread Jens Freimann

On Wed, Dec 04, 2019 at 11:35:37AM -0300, Eduardo Habkost wrote:

On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote:

On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote:
> +jfreimann, +mst
>
> On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote:
> > On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost  wrote:
> > > So, to summarize the current issues:
> > >
> > > 1) realize triggers a plug operation implicitly.
> > > 2) unplug triggers unrealize implicitly.
> > >
> > > Do you expect to see use cases that will require us to implement
> > > realize-without-plug?
> >
> > I don't think so, but only because of the oddity that
> > we put lots of devices on the 'sysbus' and claim that
> > that's plugging them into the bus. The common case of
> > 'realize' is where one device (say an SoC) has a bunch of child
> > devices (like UARTs); the SoC's realize method realizes its child
> > devices. Those devices all end up plugged into the 'sysbus'
> > but there's no actual bus there, it's fictional and about
> > the only thing it matters for is reset propagation (which
> > we don't model right either). A few devices don't live on
> > buses at all.
>
> That's my impression as well.
>
> >
> > > Similarly, do you expect use cases that will require us to
> > > implement unplug-without-unrealize?
> >
> > I don't know enough about hotplug to answer this one:
> > it's essentially what I'm hoping you'd be able to answer.
> > I vaguely had in mind that eg the user might be able to
> > create a 'disk' object, plug it into a SCSI bus, then
> > unplug it from the bus without the disk and all its data
> > evaporating, and maybe plug it back into the SCSI
> > bus (or some other SCSI bus) later ? But I don't know
> > anything about how we expose that kind of thing to the
> > user via QMP/HMP.
>
> This ability isn't exposed to the user at all.  Our existing
> interfaces are -device, device_add and device_del.
>
> We do have something new that sounds suspiciously similar to
> "unplugged but not unrealized", though: the new hidden device
> API, added by commit f3a850565693 ("qdev/qbus: add hidden device
> support").
>
> Jens, Michael, what exactly is the difference between a "hidden"
> device and a "unplugged" device?

"hidden" the way we use it for virtio-net failover is actually unplugged. But it
doesn't have to be that way. You can register a function that decides
if the device should be hidden, i.e. plugged now, or do something else
with it (in the virtio-net failover case we just save everything we
need to plug the device later).

We did introduce a "unplugged but not unrealized" function too as part
of the failover feature. See "a99c4da9fc pci: mark devices partially
unplugged"

This was needed so we would be able to re-plug the device in case a
migration failed and we need to hotplug the primary device back to the
guest. To avoid the risk of not getting the resources the device needs
we don't unrealize but just trigger the unplug from the guest OS.


Thanks for the explanation.  Let me confirm if I understand the
purpose of the new mechanisms: should_be_hidden is a mechanism
for implementing realize-without-plug.  partially_hotplugged is a
mechanism for implementing unplug-without-unrealize.  Is that
correct?


should_be_hidden is a mechanism for implementing
realize-without-plug: kind of. It's a mechanism that ensures
qdev_device_add() returns early as long as the condition to hide the
device is true. You could to the realize-without-plug in the handler
function that decides if the device should be "hidden".  


partially_hotplugged is a mechanism for implementing
unplug-without-unrealize: yes. 


regards
Jens




[for-5.0 PATCH 2/4] xics: Don't deassert outputs

2019-12-04 Thread Greg Kurz
The correct way to do this is to deassert the input pins on the CPU side.
This is the case since a previous change.

Signed-off-by: Greg Kurz 
---
 hw/intc/xics.c |3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 0b259a09c545..1952009e6d22 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -289,9 +289,6 @@ void icp_reset(ICPState *icp)
 icp->pending_priority = 0xff;
 icp->mfrr = 0xff;
 
-/* Make all outputs are deasserted */
-qemu_set_irq(icp->output, 0);
-
 if (kvm_irqchip_in_kernel()) {
 Error *local_err = NULL;
 




Re: [PATCH 04/10] arm: allwinner-h3: add USB host controller

2019-12-04 Thread Aleksandar Markovic
On Monday, December 2, 2019, Niek Linnenbank 
wrote:

> The Allwinner H3 System on Chip contains multiple USB 2.0 bus
> connections which provide software access using the Enhanced
> Host Controller Interface (EHCI) and Open Host Controller
> Interface (OHCI) interfaces. This commit adds support for
> both interfaces in the Allwinner H3 System on Chip.
>
> Signed-off-by: Niek Linnenbank 
> ---


Niek, hi!

I would like to clarify a detail here:

The spec of the SoC enumerates (in 8.5.2.4. USB Host Register List) a
number of registers for reading various USB-related states, but also for
setting some of USB features.

Does this series cover these registers, and interaction with them? If yes,
how and where? If not, do you think it is not necessary at all? Or perhaps
that it is a non-crucial limitation of this series?

Thanks in advance, and congrats for your, it seems, first submission!

Aleksandar


 hw/arm/allwinner-h3.c| 20 
>  hw/usb/hcd-ehci-sysbus.c | 17 +
>  hw/usb/hcd-ehci.h|  1 +
>  3 files changed, 38 insertions(+)
>
> diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
> index 5566e979ec..afeb49c0ac 100644
> --- a/hw/arm/allwinner-h3.c
> +++ b/hw/arm/allwinner-h3.c
> @@ -26,6 +26,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/arm/allwinner-h3.h"
>  #include "hw/misc/unimp.h"
> +#include "hw/usb/hcd-ehci.h"
>  #include "sysemu/sysemu.h"
>
>  static void aw_h3_init(Object *obj)
> @@ -183,6 +184,25 @@ static void aw_h3_realize(DeviceState *dev, Error
> **errp)
>  }
>  sysbus_mmio_map(SYS_BUS_DEVICE(>ccu), 0, AW_H3_CCU_BASE);
>
> +/* Universal Serial Bus */
> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI0_BASE,
> + s->irq[AW_H3_GIC_SPI_EHCI0]);
> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI1_BASE,
> + s->irq[AW_H3_GIC_SPI_EHCI1]);
> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI2_BASE,
> + s->irq[AW_H3_GIC_SPI_EHCI2]);
> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI3_BASE,
> + s->irq[AW_H3_GIC_SPI_EHCI3]);
> +
> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI0_BASE,
> + s->irq[AW_H3_GIC_SPI_OHCI0]);
> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI1_BASE,
> + s->irq[AW_H3_GIC_SPI_OHCI1]);
> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI2_BASE,
> + s->irq[AW_H3_GIC_SPI_OHCI2]);
> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI3_BASE,
> + s->irq[AW_H3_GIC_SPI_OHCI3]);
> +
>  /* UART */
>  if (serial_hd(0)) {
>  serial_mm_init(get_system_memory(), AW_H3_UART0_REG_BASE, 2,
> diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c
> index 020211fd10..174c3446ef 100644
> --- a/hw/usb/hcd-ehci-sysbus.c
> +++ b/hw/usb/hcd-ehci-sysbus.c
> @@ -145,6 +145,22 @@ static const TypeInfo ehci_exynos4210_type_info = {
>  .class_init= ehci_exynos4210_class_init,
>  };
>
> +static void ehci_aw_h3_class_init(ObjectClass *oc, void *data)
> +{
> +SysBusEHCIClass *sec = SYS_BUS_EHCI_CLASS(oc);
> +DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +sec->capsbase = 0x0;
> +sec->opregbase = 0x10;
> +set_bit(DEVICE_CATEGORY_USB, dc->categories);
> +}
> +
> +static const TypeInfo ehci_aw_h3_type_info = {
> +.name  = TYPE_AW_H3_EHCI,
> +.parent= TYPE_SYS_BUS_EHCI,
> +.class_init= ehci_aw_h3_class_init,
> +};
> +
>  static void ehci_tegra2_class_init(ObjectClass *oc, void *data)
>  {
>  SysBusEHCIClass *sec = SYS_BUS_EHCI_CLASS(oc);
> @@ -267,6 +283,7 @@ static void ehci_sysbus_register_types(void)
>  type_register_static(_platform_type_info);
>  type_register_static(_xlnx_type_info);
>  type_register_static(_exynos4210_type_info);
> +type_register_static(_aw_h3_type_info);
>  type_register_static(_tegra2_type_info);
>  type_register_static(_ppc4xx_type_info);
>  type_register_static(_fusbh200_type_info);
> diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
> index 0298238f0b..edb59311c4 100644
> --- a/hw/usb/hcd-ehci.h
> +++ b/hw/usb/hcd-ehci.h
> @@ -342,6 +342,7 @@ typedef struct EHCIPCIState {
>  #define TYPE_SYS_BUS_EHCI "sysbus-ehci-usb"
>  #define TYPE_PLATFORM_EHCI "platform-ehci-usb"
>  #define TYPE_EXYNOS4210_EHCI "exynos4210-ehci-usb"
> +#define TYPE_AW_H3_EHCI "aw-h3-ehci-usb"
>  #define TYPE_TEGRA2_EHCI "tegra2-ehci-usb"
>  #define TYPE_PPC4xx_EHCI "ppc4xx-ehci-usb"
>  #define TYPE_FUSBH200_EHCI "fusbh200-ehci-usb"
> --
> 2.17.1
>
>
>


Re: [PATCH] target/sparc: Remove old TODO file

2019-12-04 Thread Thomas Huth
On 30/09/2019 19.10, Thomas Huth wrote:
> This file hasn't seen a real (non-trivial) update since 2008 anymore,
> so we can assume that it is pretty much out of date and nobody cares
> for it anymore. Let's simply remove it.
> 
> Signed-off-by: Thomas Huth 
> ---
>  target/sparc/TODO | 88 ---
>  1 file changed, 88 deletions(-)
>  delete mode 100644 target/sparc/TODO
> 
> diff --git a/target/sparc/TODO b/target/sparc/TODO
> deleted file mode 100644
> index b8c727e858..00
> --- a/target/sparc/TODO
> +++ /dev/null
> @@ -1,88 +0,0 @@
> -TODO-list:
> -
> -CPU common:
> -- Unimplemented features/bugs:
> - - Delay slot handling may fail sometimes (branch end of page, delay
> - slot next page)
> - - Atomical instructions
> - - CPU features should match real CPUs (also ASI selection)
> -- Optimizations/improvements:
> - - Condition code/branch handling like x86, also for FPU?
> - - Remove remaining explicit alignment checks
> - - Global register for regwptr, so that windowed registers can be
> - accessed directly
> - - Improve Sparc32plus addressing
> - - NPC/PC static optimisations (use JUMP_TB when possible)? (Is this
> - obsolete?)
> - - Synthetic instructions
> - - MMU model dependent on CPU model
> - - Select ASI helper at translation time (on V9 only if known)
> - - KQemu/KVM support for VM only
> - - Hardware breakpoint/watchpoint support
> - - Cache emulation mode
> - - Reverse-endian pages
> - - Faster FPU emulation
> - - Busy loop detection
> -
> -Sparc32 CPUs:
> -- Unimplemented features/bugs:
> - - Sun4/Sun4c MMUs
> - - Some V8 ASIs
> -
> -Sparc64 CPUs:
> -- Unimplemented features/bugs:
> - - Interrupt handling
> - - Secondary address space, other MMU functions
> - - Many V9/UA2005/UA2007 ASIs
> - - Rest of V9 instructions, missing VIS instructions
> - - IG/MG/AG vs. UA2007 globals
> - - Full hypervisor support
> - - SMP/CMT
> - - Sun4v CPUs
> -
> -Sun4:
> -- To be added
> -
> -Sun4c:
> -- A lot of unimplemented features
> -- Maybe split from Sun4m
> -
> -Sun4m:
> -- Unimplemented features/bugs:
> - - Hardware devices do not match real boards
> - - Floppy does not work
> - - CS4231: merge with cs4231a, add DMA
> - - Add cg6, bwtwo
> - - Arbitrary resolution support
> - - PCI for MicroSparc-IIe
> - - JavaStation machines
> - - SBus slot probing, FCode ROM support
> - - SMP probing support
> - - Interrupt routing does not match real HW
> - - SuSE 7.3 keyboard sometimes unresponsive
> - - Gentoo 2004.1 SMP does not work
> - - SS600MP ledma -> lebuffer
> - - Type 5 keyboard
> - - Less fixed hardware choices
> - - DBRI audio (Am7930)
> - - BPP parallel
> - - Diagnostic switch
> - - ESP PIO mode
> -
> -Sun4d:
> -- A lot of unimplemented features:
> - - SBI
> - - IO-unit
> -- Maybe split from Sun4m
> -
> -Sun4u:
> -- Unimplemented features/bugs:
> - - Interrupt controller
> - - PCI/IOMMU support (Simba, JIO, Tomatillo, Psycho, Schizo, Safari...)
> - - SMP
> - - Happy Meal Ethernet, flash, I2C, GPIO
> - - A lot of real machine types
> -
> -Sun4v:
> -- A lot of unimplemented features
> - - A lot of real machine types
> 

Ping?

 Thomas




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-12-04 Thread Dr. David Alan Gilbert
* Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> > * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > > >>
> > > > > >> If we instead keep track of the head of each priority's subqueue
> > > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > > >>
> > > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > > >> accomodate these gains: we need to replace the head of a priority's
> > > > > >> subqueue when removing it.
> > > > > >>
> > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects 
> > > > > >> is
> > > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 4 
> > > > > >> such
> > > > > >> objects to insert.
> > > > > > 
> > > > > > Separate from reviewing this patch, I'd like to understand why 
> > > > > > you've
> > > > > > got 4 objects.  This feels very very wrong and is likely to 
> > > > > > cause
> > > > > > problems to random other bits of qemu as well.
> > > > > 
> > > > > I think the 4 objects are the "dr-connectors" that are used to 
> > > > > plug
> > > > > peripherals (memory, pci card, cpus, ...).
> > > > 
> > > > Yes, Scott confirmed that in the reply to the previous version.
> > > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > > - I'm sure that something other than the migration code is going to
> > > > get upset.
> > > 
> > > It kind of did.  Particularly when there was n^2 and n^3 cubed
> > > behaviour in the property stuff we had some ludicrously long startup
> > > times (hours) with large maxmem values.
> > > 
> > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > > chunk of memory (or possible memory).
> > > 
> > > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > > device that knows about all DRCs?
> > > 
> > > Maybe.  The tricky bit is how to get there from here without breaking
> > > migration or something else along the way.
> > 
> > Switch on the next machine type version - it doesn't matter if migration
> > is incompatible then.
> 
> 1mo bump.
> 
> Is there anything I need to do with this patch in particular to make it 
> suitable
> for merging?

Apologies for the delay;  hopefully this will go in one of the pulls
just after the tree opens again.

Please please try and work on reducing the number of objects somehow -
while this migration fix is a useful short term fix, and not too
invasive; having that many objects around qemu is a really really bad
idea so needs fixing properly.

Dave

> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH for-5.0 8/8] acpi: cpuhp: spec: document procedure for enabling modern CPU hotplug

2019-12-04 Thread Igor Mammedov
Describe how to enable and detect modern CPU hotplug interface.
Detection part is based on new CPHP_GET_CPU_ID_CMD command,
introduced by "acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command" patch.

Signed-off-by: Igor Mammedov 
---
 docs/specs/acpi_cpu_hotplug.txt | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index bb33144..667b264 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -15,14 +15,14 @@ CPU present bitmap for:
   PIIX-PM  (IO port 0xaf00-0xaf1f, 1-byte access)
   One bit per CPU. Bit position reflects corresponding CPU APIC ID. Read-only.
   The first DWORD in bitmap is used in write mode to switch from legacy
-  to new CPU hotplug interface, write 0 into it to do switch.
+  to modern CPU hotplug interface, write 0 into it to do switch.
 ---
 QEMU sets corresponding CPU bit on hot-add event and issues SCI
 with GPE.2 event set. CPU present map is read by ACPI BIOS GPE.2 handler
 to notify OS about CPU hot-add events. CPU hot-remove isn't supported.
 
 =
-ACPI CPU hotplug interface registers:
+Modern ACPI CPU hotplug interface registers:
 -
 Register block base address:
 ICH9-LPC IO port 0x0cd8
@@ -105,6 +105,24 @@ write access:
   other values: reserved
 
 Typical usecases:
+- (x86) Detecting and enabling modern CPU hotplug interface.
+  QEMU starts with legacy CPU hotplug interface enabled. Detecting and
+  switching to modern interface is based on the 2 legacy CPU hotplug 
features:
+1. Writes into CPU bitmap are ignored.
+2. CPU bitmap always has bit#0 set, corresponding to boot CPU.
+
+  Use following steps to detect and enable modern CPU hotplug 
interface:
+1. Store 0x0 to the 'CPU selector' register,
+   attempting to switch to modern mode
+2. Store 0x0 to the 'CPU selector' register,
+   to ensure valid selector value
+3. Store 0x3 to the 'Command field' register,
+   sets the 'Command data 2' register into architecture specific
+   CPU identifier mode
+4. Read the 'Command data 2' register.
+   If read value is 0x0, the modern interface is enabled.
+   Otherwise legacy or no CPU hotplug interface available
+
 - Get a cpu with pending event
   1. Store 0x0 to the 'CPU selector' register.
   2. Store 0x0 to the 'Command field' register.
-- 
2.7.4




Re: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.

2019-12-04 Thread Alex Williamson
On Wed, 4 Dec 2019 23:40:25 +0530
Kirti Wankhede  wrote:

> On 12/3/2019 11:34 PM, Alex Williamson wrote:
> > On Mon, 25 Nov 2019 19:57:39 -0500
> > Yan Zhao  wrote:
> >   
> >> On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:  
> >>> On Fri, 15 Nov 2019 00:26:07 +0530
> >>> Kirti Wankhede  wrote:
> >>>  
>  On 11/14/2019 1:37 AM, Alex Williamson wrote:  
> > On Thu, 14 Nov 2019 01:07:21 +0530
> > Kirti Wankhede  wrote:
> >
> >> On 11/13/2019 4:00 AM, Alex Williamson wrote:  
> >>> On Tue, 12 Nov 2019 22:33:37 +0530
> >>> Kirti Wankhede  wrote:
> >>>   
>  All pages pinned by vendor driver through vfio_pin_pages API should 
>  be
>  considered as dirty during migration. IOMMU container maintains a 
>  list of
>  all such pinned pages. Added an ioctl defination to get bitmap of 
>  such  
> >>>
> >>> definition
> >>>   
>  pinned pages for requested IO virtual address range.  
> >>>
> >>> Additionally, all mapped pages are considered dirty when physically
> >>> mapped through to an IOMMU, modulo we discussed devices opting in to
> >>> per page pinning to indicate finer granularity with a TBD mechanism to
> >>> figure out if any non-opt-in devices remain.
> >>>   
> >>
> >> You mean, in case of device direct assignment (device pass through)?  
> >
> > Yes, or IOMMU backed mdevs.  If vfio_dmas in the container are fully
> > pinned and mapped, then the correct dirty page set is all mapped pages.
> > We discussed using the vpfn list as a mechanism for vendor drivers to
> > reduce their migration footprint, but we also discussed that we would
> > need a way to determine that all participants in the container have
> > explicitly pinned their working pages or else we must consider the
> > entire potential working set as dirty.
> >
> 
>  How can vendor driver tell this capability to iommu module? Any 
>  suggestions?  
> >>>
> >>> I think it does so by pinning pages.  Is it acceptable that if the
> >>> vendor driver pins any pages, then from that point forward we consider
> >>> the IOMMU group dirty page scope to be limited to pinned pages?  There  
> >> we should also be aware of that dirty page scope is pinned pages + 
> >> unpinned pages,
> >> which means ever since a page is pinned, it should be regarded as dirty
> >> no matter whether it's unpinned later. only after log_sync is called and
> >> dirty info retrieved, its dirty state should be cleared.  
> > 
> > Yes, good point.  We can't just remove a vpfn when a page is unpinned
> > or else we'd lose information that the page potentially had been
> > dirtied while it was pinned.  Maybe that vpfn needs to move to a dirty
> > list and both the currently pinned vpfns and the dirty vpfns are walked
> > on a log_sync.  The dirty vpfns list would be cleared after a log_sync.
> > The container would need to know that dirty tracking is enabled and
> > only manage the dirty vpfns list when necessary.  Thanks,
> >   
> 
> If page is unpinned, then that page is available in free page pool for 
> others to use, then how can we say that unpinned page has valid data?
> 
> If suppose, one driver A unpins a page and when driver B of some other 
> device gets that page and he pins it, uses it, and then unpins it, then 
> how can we say that page has valid data for driver A?
> 
> Can you give one example where unpinned page data is considered reliable 
> and valid?

We can only pin pages that the user has already allocated* and mapped
through the vfio DMA API.  The pinning of the page simply locks the
page for the vendor driver to access it and unpinning that page only
indicates that access is complete.  Pages are not freed when a vendor
driver unpins them, they still exist and at this point we're now
assuming the device dirtied the page while it was pinned.  Thanks,

Alex

* An exception here is that the page might be demand allocated and the
  act of pinning the page could actually allocate the backing page for
  the user if they have not faulted the page to trigger that allocation
  previously.  That page remains mapped for the user's virtual address
  space even after the unpinning though.




Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-12-04 Thread Eduardo Habkost
On Wed, Dec 04, 2019 at 05:21:25PM +0100, Jens Freimann wrote:
> On Wed, Dec 04, 2019 at 11:35:37AM -0300, Eduardo Habkost wrote:
> > On Wed, Dec 04, 2019 at 10:18:24AM +0100, Jens Freimann wrote:
> > > On Tue, Dec 03, 2019 at 06:40:04PM -0300, Eduardo Habkost wrote:
> > > > +jfreimann, +mst
> > > >
> > > > On Sat, Nov 30, 2019 at 11:10:19AM +, Peter Maydell wrote:
> > > > > On Fri, 29 Nov 2019 at 20:05, Eduardo Habkost  
> > > > > wrote:
> > > > > > So, to summarize the current issues:
> > > > > >
> > > > > > 1) realize triggers a plug operation implicitly.
> > > > > > 2) unplug triggers unrealize implicitly.
> > > > > >
> > > > > > Do you expect to see use cases that will require us to implement
> > > > > > realize-without-plug?
> > > > >
> > > > > I don't think so, but only because of the oddity that
> > > > > we put lots of devices on the 'sysbus' and claim that
> > > > > that's plugging them into the bus. The common case of
> > > > > 'realize' is where one device (say an SoC) has a bunch of child
> > > > > devices (like UARTs); the SoC's realize method realizes its child
> > > > > devices. Those devices all end up plugged into the 'sysbus'
> > > > > but there's no actual bus there, it's fictional and about
> > > > > the only thing it matters for is reset propagation (which
> > > > > we don't model right either). A few devices don't live on
> > > > > buses at all.
> > > >
> > > > That's my impression as well.
> > > >
> > > > >
> > > > > > Similarly, do you expect use cases that will require us to
> > > > > > implement unplug-without-unrealize?
> > > > >
> > > > > I don't know enough about hotplug to answer this one:
> > > > > it's essentially what I'm hoping you'd be able to answer.
> > > > > I vaguely had in mind that eg the user might be able to
> > > > > create a 'disk' object, plug it into a SCSI bus, then
> > > > > unplug it from the bus without the disk and all its data
> > > > > evaporating, and maybe plug it back into the SCSI
> > > > > bus (or some other SCSI bus) later ? But I don't know
> > > > > anything about how we expose that kind of thing to the
> > > > > user via QMP/HMP.
> > > >
> > > > This ability isn't exposed to the user at all.  Our existing
> > > > interfaces are -device, device_add and device_del.
> > > >
> > > > We do have something new that sounds suspiciously similar to
> > > > "unplugged but not unrealized", though: the new hidden device
> > > > API, added by commit f3a850565693 ("qdev/qbus: add hidden device
> > > > support").
> > > >
> > > > Jens, Michael, what exactly is the difference between a "hidden"
> > > > device and a "unplugged" device?
> > > 
> > > "hidden" the way we use it for virtio-net failover is actually unplugged. 
> > > But it
> > > doesn't have to be that way. You can register a function that decides
> > > if the device should be hidden, i.e. plugged now, or do something else
> > > with it (in the virtio-net failover case we just save everything we
> > > need to plug the device later).
> > > 
> > > We did introduce a "unplugged but not unrealized" function too as part
> > > of the failover feature. See "a99c4da9fc pci: mark devices partially
> > > unplugged"
> > > 
> > > This was needed so we would be able to re-plug the device in case a
> > > migration failed and we need to hotplug the primary device back to the
> > > guest. To avoid the risk of not getting the resources the device needs
> > > we don't unrealize but just trigger the unplug from the guest OS.
> > 
> > Thanks for the explanation.  Let me confirm if I understand the
> > purpose of the new mechanisms: should_be_hidden is a mechanism
> > for implementing realize-without-plug.  partially_hotplugged is a
> > mechanism for implementing unplug-without-unrealize.  Is that
> > correct?
> 
> should_be_hidden is a mechanism for implementing
> realize-without-plug: kind of. It's a mechanism that ensures
> qdev_device_add() returns early as long as the condition to hide the
> device is true. You could to the realize-without-plug in the handler
> function that decides if the device should be "hidden".

Oh, right.  I thought "qdev_device_add() returns early" meant
"return after realize, before plug".  Now I see it returns before
object_new().  This means we have another user-visible device
state: "defined (in QemuOpts), but not created".

> 
> partially_hotplugged is a mechanism for implementing
> unplug-without-unrealize: yes.

Thanks!

-- 
Eduardo




[PATCH v2 3/5] virtiofd: Create a notification queue

2019-12-04 Thread Vivek Goyal
Add a notification queue which will be used to send async notifications
for file lock availability.

Signed-off-by: Vivek Goyal 
---
 contrib/virtiofsd/fuse_i.h |  1 +
 contrib/virtiofsd/fuse_virtio.c| 74 +++---
 hw/virtio/vhost-user-fs-pci.c  |  2 +-
 hw/virtio/vhost-user-fs.c  | 37 +--
 include/hw/virtio/vhost-user-fs.h  |  1 +
 include/standard-headers/linux/virtio_fs.h |  3 +
 6 files changed, 87 insertions(+), 31 deletions(-)

diff --git a/contrib/virtiofsd/fuse_i.h b/contrib/virtiofsd/fuse_i.h
index 966b1a3baa..4eeae0bfeb 100644
--- a/contrib/virtiofsd/fuse_i.h
+++ b/contrib/virtiofsd/fuse_i.h
@@ -74,6 +74,7 @@ struct fuse_session {
char *vu_socket_lock;
struct fv_VuDev *virtio_dev;
int thread_pool_size;
+   bool notify_enabled;
 };
 
 struct fuse_chan {
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 2a9cd60a01..b1eebcf054 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/iov.h"
 #include "qapi/error.h"
+#include "standard-headers/linux/virtio_fs.h"
 #include "fuse_i.h"
 #include "fuse_kernel.h"
 #include "fuse_misc.h"
@@ -92,23 +93,31 @@ struct fv_VuDev {
  */
 size_t nqueues;
 struct fv_QueueInfo **qi;
-};
-
-/* From spec */
-struct virtio_fs_config {
-char tag[36];
-uint32_t num_queues;
+/* True if notification queue is being used */
+bool notify_enabled;
 };
 
 /* Callback from libvhost-user */
 static uint64_t fv_get_features(VuDev *dev)
 {
-return 1ULL << VIRTIO_F_VERSION_1;
+uint64_t features;
+
+features = 1ull << VIRTIO_F_VERSION_1 |
+   1ull << VIRTIO_FS_F_NOTIFICATION;
+
+return features;
 }
 
 /* Callback from libvhost-user */
 static void fv_set_features(VuDev *dev, uint64_t features)
 {
+struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
+struct fuse_session *se = vud->se;
+
+if ((1ull << VIRTIO_FS_F_NOTIFICATION) & features) {
+vud->notify_enabled = true;
+se->notify_enabled = true;
+}
 }
 
 /*
@@ -765,6 +774,9 @@ static void fv_queue_set_started(VuDev *dev, int qidx, bool 
started)
 {
 struct fv_VuDev *vud = container_of(dev, struct fv_VuDev, dev);
 struct fv_QueueInfo *ourqi;
+void * (*thread_func) (void *) = fv_queue_thread;
+int valid_queues = 2; /* One hiprio queue and one request queue */
+bool notification_q = false;
 
 fuse_log(FUSE_LOG_INFO, "%s: qidx=%d started=%d\n", __func__, qidx,
  started);
@@ -776,10 +788,18 @@ static void fv_queue_set_started(VuDev *dev, int qidx, 
bool started)
  * well-behaved client in mind and may not protect against all types of
  * races yet.
  */
-if (qidx > 1) {
-fuse_log(FUSE_LOG_ERR,
- "%s: multiple request queues not yet implemented, please only 
"
- "configure 1 request queue\n",
+if (vud->notify_enabled) {
+valid_queues++;
+/*
+ * If notification queue is enabled, then qidx 1 is notificaiton queue.
+ */
+if (qidx == 1)
+notification_q = true;
+}
+
+if (qidx >= valid_queues) {
+fuse_log(FUSE_LOG_ERR, "%s: multiple request queues not yet"
+ "implemented, please only configure 1 request queue\n",
  __func__);
 exit(EXIT_FAILURE);
 }
@@ -803,13 +823,19 @@ static void fv_queue_set_started(VuDev *dev, int qidx, 
bool started)
 assert(vud->qi[qidx]->kick_fd == -1);
 }
 ourqi = vud->qi[qidx];
+pthread_mutex_init(>vq_lock, NULL);
+/*
+ * For notification queue, we don't have to start a thread yet.
+ */
+if (notification_q)
+return;
+
 ourqi->kick_fd = dev->vq[qidx].kick_fd;
 
 ourqi->kill_fd = eventfd(0, EFD_CLOEXEC | EFD_SEMAPHORE);
 assert(ourqi->kill_fd != -1);
-pthread_mutex_init(>vq_lock, NULL);
 
-if (pthread_create(>thread, NULL, fv_queue_thread, ourqi)) {
+if (pthread_create(>thread, NULL, thread_func, ourqi)) {
 fuse_log(FUSE_LOG_ERR, "%s: Failed to create thread for queue 
%d\n",
  __func__, qidx);
 assert(0);
@@ -819,17 +845,19 @@ static void fv_queue_set_started(VuDev *dev, int qidx, 
bool started)
 assert(qidx < vud->nqueues);
 ourqi = vud->qi[qidx];
 
-/* Kill the thread */
-if (eventfd_write(ourqi->kill_fd, 1)) {
-fuse_log(FUSE_LOG_ERR, "Eventfd_read for queue: %m\n");
-}
-ret = pthread_join(ourqi->thread, NULL);
-if (ret) {
-fuse_log(FUSE_LOG_ERR, "%s: Failed to join thread idx %d err %d\n",
- __func__, qidx, ret);
+if (!notification_q) {
+/* Kill the thread */
+if 

[for-5.0 PATCH 0/4] ppc: Fix interrupt controller emulation

2019-12-04 Thread Greg Kurz
Guest hangs have been observed recently on POWER9 hosts, specifically LC92x
"Boston" systems, when the guests are being rebooted multiple times. The
issue isn't POWER9 specific though. It is caused by a very long standing bug
when using the uncommon accel=kvm,kernel-irqchip=off machine configuration
which happens to be enforced on LC92x because of a host FW limitation. This
affects both the XICS and XIVE emulated interrupt controllers.

The actual fix is in patch 1. Patch 2 is a followup cleanup. The other
patches are unrelated cleanups I came up with while investigating.

Since this bug always existed and we're already in rc4, I think it is better
to fix it in 5.0 and possibly backport it to stable and downstream if needed.

--
Greg

---

Greg Kurz (4):
  ppc: Deassert the external interrupt pin in KVM on reset
  xics: Don't deassert outputs
  ppc: Don't use CPUPPCState::irq_input_state with modern Book3s CPU models
  ppc: Ignore the CPU_INTERRUPT_EXITTB interrupt with KVM


 hw/intc/xics.c  |3 ---
 hw/ppc/ppc.c|   24 ++--
 include/hw/ppc/ppc.h|2 ++
 target/ppc/cpu.h|4 +++-
 target/ppc/helper_regs.h|5 +
 target/ppc/translate_init.inc.c |1 +
 6 files changed, 21 insertions(+), 18 deletions(-)




[for-5.0 PATCH 1/4] ppc: Deassert the external interrupt pin in KVM on reset

2019-12-04 Thread Greg Kurz
When a CPU is reset, QEMU makes sure no interrupt is pending by clearing
CPUPPCstate::pending_interrupts in ppc_cpu_reset(). In the case of a
complete machine emulation, eg. a sPAPR machine, an external interrupt
request could still be pending in KVM though, eg. an IPI. It will be
eventually presented to the guest, which is supposed to acknowledge it at
the interrupt controller. If the interrupt controller is emulated in QEMU,
either XICS or XIVE, ppc_set_irq() won't deassert the external interrupt
pin in KVM since it isn't pending anymore for QEMU. When the vCPU re-enters
the guest, the interrupt request is still pending and the vCPU will try
again to acknowledge it. This causes an infinite loop and eventually hangs
the guest.

The code has been broken since the beginning. The issue wasn't hit before
because accel=kvm,kernel-irqchip=off is an awkward setup that never got
used until recently with the LC92x IBM systems (aka, Boston).

Add a ppc_irq_reset() function to do the necessary cleanup, ie. deassert
the IRQ pins of the CPU in QEMU and most importantly the external interrupt
pin for this vCPU in KVM.

Reported-by: Satheesh Rajendran 
Signed-off-by: Greg Kurz 
---
 hw/ppc/ppc.c|8 
 include/hw/ppc/ppc.h|2 ++
 target/ppc/translate_init.inc.c |1 +
 3 files changed, 11 insertions(+)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 8dd982fc1e40..fab73f1b1fc9 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1515,3 +1515,11 @@ PowerPCCPU *ppc_get_vcpu_by_pir(int pir)
 
 return NULL;
 }
+
+void ppc_irq_reset(PowerPCCPU *cpu)
+{
+CPUPPCState *env = >env;
+
+env->irq_input_state = 0;
+kvmppc_set_interrupt(cpu, PPC_INTERRUPT_EXT, 0);
+}
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 585be6ab98c5..89e1dd065af7 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -77,6 +77,7 @@ static inline void ppc970_irq_init(PowerPCCPU *cpu) {}
 static inline void ppcPOWER7_irq_init(PowerPCCPU *cpu) {}
 static inline void ppcPOWER9_irq_init(PowerPCCPU *cpu) {}
 static inline void ppce500_irq_init(PowerPCCPU *cpu) {}
+static inline void ppc_irq_reset(PowerPCCPU *cpu) {}
 #else
 void ppc40x_irq_init(PowerPCCPU *cpu);
 void ppce500_irq_init(PowerPCCPU *cpu);
@@ -84,6 +85,7 @@ void ppc6xx_irq_init(PowerPCCPU *cpu);
 void ppc970_irq_init(PowerPCCPU *cpu);
 void ppcPOWER7_irq_init(PowerPCCPU *cpu);
 void ppcPOWER9_irq_init(PowerPCCPU *cpu);
+void ppc_irq_reset(PowerPCCPU *cpu);
 #endif
 
 /* PPC machines for OpenBIOS */
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index ba726dec4d00..64a838095c7a 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10461,6 +10461,7 @@ static void ppc_cpu_reset(CPUState *s)
 env->pending_interrupts = 0;
 s->exception_index = POWERPC_EXCP_NONE;
 env->error_code = 0;
+ppc_irq_reset(cpu);
 
 /* tininess for underflow is detected before rounding */
 set_float_detect_tininess(float_tininess_before_rounding,




Re: [PATCH v4 14/40] target/arm: Recover 4 bits from TBFLAGs

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> On 12/4/19 3:43 AM, Alex Bennée wrote:

>>>  void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int 
>>> max_insns)
>>>  {
>>> -DisasContext dc;
>>> +DisasContext dc = { };
>> 
>> We seemed to have dropped an initialise here which seems unrelated.
>
> Added, not dropped.

But is it related to this patch or fixing another bug?


-- 
Alex Bennée



[PATCH for-5.0 6/8] acpi: cpuhp: spec: add typical usecases

2019-12-04 Thread Igor Mammedov
Document work-flows for
  * finding a CPU with pending 'insert/remove' event
  * enumerating present and possible CPUs

Signed-off-by: Igor Mammedov 
---
 docs/specs/acpi_cpu_hotplug.txt | 29 +
 1 file changed, 29 insertions(+)

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index f3c552d..58c16c6 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -64,6 +64,7 @@ write access:
 [0x0-0x3] CPU selector: (DWORD access)
   selects active CPU device. All following accesses to other
   registers will read/store data from/to selected CPU.
+  Valid values: [0 .. max_cpus)
 [0x4] CPU device control fields: (1 byte access)
 bits:
 0: reserved, OSPM must clear it before writing to register.
@@ -96,3 +97,31 @@ write access:
  ACPI_DEVICE_OST QMP event from QEMU to external applications
  with current values of OST event and status registers.
 other values: reserved
+
+Typical usecases:
+- Get a cpu with pending event
+  1. Store 0x0 to the 'CPU selector' register.
+  2. Store 0x0 to the 'Command field' register.
+  3. Read the 'CPU device status fields' register.
+  4. If both bit#1 and bit#2 are clear in the value read, there is no 
CPU
+ with a pending event and selected CPU remains unchanged.
+  5. Otherwise, read the 'Command data' register. The value read is the
+ selector of the CPU with the pending event (which is already
+ selected).
+
+- Enumerate CPUs present/non present CPUs
+  01. Set the present CPU count to 0.
+  02. Set the iterator to 0.
+  03. Store 0x0 to the 'CPU selector' register, to ensure that it's in
+  a valid state and that access to other registers won't be 
ignored.
+  04. Store 0x0 to the 'Command field' register to make 'Command data'
+  register return 'CPU selector' value of selected CPU
+  05. Read the 'CPU device status fields' register.
+  06. If bit#0 is set, increment the present CPU count.
+  07. Increment the iterator.
+  08. Store the iterator to the 'CPU selector' register.
+  09. Read the 'Command data' register.
+  10. If the value read is not zero, goto 05.
+  11. Otherwise store 0x0 to the 'CPU selector' register, to put it
+  into a valid state and exit.
+  The iterator at this point equals "max_cpus".
-- 
2.7.4




[PATCH v2 5/5] virtiofsd: Implement blocking posix locks

2019-12-04 Thread Vivek Goyal
As of now we don't support fcntl(F_SETLKW) and if we see one, we return
-EOPNOTSUPP.

Change that by accepting these requests and returning a reply immediately
asking caller to wait. Once lock is available, send a notification to
the waiter indicating lock is available.

Signed-off-by: Vivek Goyal 
---
 contrib/virtiofsd/fuse_kernel.h|  7 +++
 contrib/virtiofsd/fuse_lowlevel.c  | 23 ++-
 contrib/virtiofsd/fuse_lowlevel.h  | 25 
 contrib/virtiofsd/fuse_virtio.c| 97 --
 contrib/virtiofsd/passthrough_ll.c | 49 ---
 5 files changed, 185 insertions(+), 16 deletions(-)

diff --git a/contrib/virtiofsd/fuse_kernel.h b/contrib/virtiofsd/fuse_kernel.h
index 2bdc8b1c88..432eb14d14 100644
--- a/contrib/virtiofsd/fuse_kernel.h
+++ b/contrib/virtiofsd/fuse_kernel.h
@@ -444,6 +444,7 @@ enum fuse_notify_code {
FUSE_NOTIFY_STORE = 4,
FUSE_NOTIFY_RETRIEVE = 5,
FUSE_NOTIFY_DELETE = 6,
+   FUSE_NOTIFY_LOCK = 7,
FUSE_NOTIFY_CODE_MAX,
 };
 
@@ -836,6 +837,12 @@ struct fuse_notify_retrieve_in {
uint64_tdummy4;
 };
 
+struct fuse_notify_lock_out {
+   uint64_tunique;
+   int32_t error;
+   int32_t padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_CLONE _IOR(229, 0, uint32_t)
 
diff --git a/contrib/virtiofsd/fuse_lowlevel.c 
b/contrib/virtiofsd/fuse_lowlevel.c
index d4a42d9804..3d9c289510 100644
--- a/contrib/virtiofsd/fuse_lowlevel.c
+++ b/contrib/virtiofsd/fuse_lowlevel.c
@@ -183,7 +183,8 @@ int fuse_send_reply_iov_nofree(fuse_req_t req, int error, 
struct iovec *iov,
 {
struct fuse_out_header out;
 
-   if (error <= -1000 || error > 0) {
+   /* error = 1 has been used to signal client to wait for notificaiton */
+   if (error <= -1000 || error > 1) {
fuse_log(FUSE_LOG_ERR, "fuse: bad error value: %i\n",   error);
error = -ERANGE;
}
@@ -291,6 +292,12 @@ int fuse_reply_err(fuse_req_t req, int err)
return send_reply(req, -err, NULL, 0);
 }
 
+int fuse_reply_wait(fuse_req_t req)
+{
+   /* TODO: This is a hack. Fix it */
+   return send_reply(req, 1, NULL, 0);
+}
+
 void fuse_reply_none(fuse_req_t req)
 {
fuse_free_req(req);
@@ -2207,6 +2214,20 @@ static int send_notify_iov(struct fuse_session *se, int 
notify_code,
return fuse_send_msg(se, NULL, iov, count);
 }
 
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
+ int32_t error)
+{
+   struct fuse_notify_lock_out outarg = {0};
+   struct iovec iov[2];
+
+   outarg.unique = unique;
+   outarg.error = -error;
+
+   iov[1].iov_base = 
+   iov[1].iov_len = sizeof(outarg);
+   return send_notify_iov(se, FUSE_NOTIFY_LOCK, iov, 2);
+}
+
 int fuse_lowlevel_notify_poll(struct fuse_pollhandle *ph)
 {
if (ph != NULL) {
diff --git a/contrib/virtiofsd/fuse_lowlevel.h 
b/contrib/virtiofsd/fuse_lowlevel.h
index e664d2d12d..4126b4f967 100644
--- a/contrib/virtiofsd/fuse_lowlevel.h
+++ b/contrib/virtiofsd/fuse_lowlevel.h
@@ -1251,6 +1251,22 @@ struct fuse_lowlevel_ops {
  */
 int fuse_reply_err(fuse_req_t req, int err);
 
+/**
+ * Ask caller to wait for lock.
+ *
+ * Possible requests:
+ *   setlkw
+ *
+ * If caller sends a blocking lock request (setlkw), then reply to caller
+ * that wait for lock to be available. Once lock is available caller will
+ * receive a notification with request's unique id. Notification will
+ * carry info whether lock was successfully obtained or not.
+ *
+ * @param req request handle
+ * @return zero for success, -errno for failure to send reply
+ */
+int fuse_reply_wait(fuse_req_t req);
+
 /**
  * Don't send reply
  *
@@ -1704,6 +1720,15 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
 int fuse_lowlevel_notify_store(struct fuse_session *se, fuse_ino_t ino,
   off_t offset, struct fuse_bufvec *bufv,
   enum fuse_buf_copy_flags flags);
+/**
+ * Notify event related to previous lock request
+ *
+ * @param se the session object
+ * @param unique the unique id of the request which requested setlkw
+ * @param error zero for success, -errno for the failure
+ */
+int fuse_lowlevel_notify_lock(struct fuse_session *se, uint64_t unique,
+ int32_t error);
 
 /* --- *
  * Utility functions  *
diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 94cf9b3791..129dd329f6 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -208,6 +208,83 @@ static void copy_iov(struct iovec *src_iov, int src_count,
 }
 }
 
+static int virtio_send_notify_msg(struct fuse_session *se, struct iovec *iov,
+ int count)
+{
+struct fv_QueueInfo *qi;
+VuDev *dev = 

Re: [PATCH v2 3/7] iotests: Skip test 079 if it is not possible to create large files

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 4:46 PM, Thomas Huth wrote:

Test 079 fails in the arm64, s390x and ppc64le LXD containers on Travis
(which we will hopefully enable in our CI soon). These containers
apparently do not allow large files to be created. Test 079 tries to
create a 4G sparse file, which is apparently already too big for these
containers, so check first whether we can really create such files before
executing the test.

Signed-off-by: Thomas Huth 
---
  tests/qemu-iotests/079 | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/079 b/tests/qemu-iotests/079
index 81f0c21f53..78536d3bbf 100755
--- a/tests/qemu-iotests/079
+++ b/tests/qemu-iotests/079
@@ -39,6 +39,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  _supported_fmt qcow2
  _supported_proto file nfs
  
+# Some containers (e.g. non-x86 on Travis) do not allow large files

+_require_large_file 4G
+
  echo "=== Check option preallocation and cluster_size ==="
  echo
  cluster_sizes="16384 32768 65536 131072 262144 524288 1048576 2097152 4194304"



Reviewed-by: Philippe Mathieu-Daudé 




[PATCH for-5.0 0/8] q35: CPU hotplug with secure boot, part 1+2

2019-12-04 Thread Igor Mammedov
Series consists of 2 parts: 1st is lockable SMRAM at SMBASE
and the 2nd adds means to enumerate APIC IDs for possible CPUs.

1st part [1-2/8]:
 In order to support CPU hotplug in secure boot mode,
 UEFI firmware needs to relocate SMI handler of hotplugged CPU,
 in a way that won't allow ring 0 user to break in priveleged
 SMM mode that firmware maintains during runtime.
 Used approach allows to hide RAM at default SMBASE to make it
 accessible only to SMM mode, which lets us to make sure that
 SMI handler installed by firmware can not be hijacked by
 unpriveleged user (similar to TSEG behavior). 

2nd part:
 mostly fixes and extra documentation on how to detect and use
 modern CPU hotplug interface (MMIO block).
 So firmware could reuse it for enumerating possible CPUs and
 detecting hotplugged CPU(s). It also adds support for
 CPHP_GET_CPU_ID_CMD command [7/8], which should allow firmware
 to fetch APIC IDs for possible CPUs which is necessary for
 initializing internal structures for possible CPUs on boot.
 

CC: m...@redhat.com
CC: pbonz...@redhat.com
CC: ler...@redhat.com
CC: phi...@redhat.com

Igor Mammedov (8):
  q35: implement 128K SMRAM at default SMBASE address
  tests: q35: MCH: add default SMBASE SMRAM lock test
  acpi: cpuhp: spec: clarify 'CPU selector' register usage and
endianness
  acpi: cpuhp: spec: fix 'Command data' description
  acpi: cpuhp: spec: clarify store into 'Command data' when 'Command
field' == 0
  acpi: cpuhp: spec: add typical usecases
  acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command
  acpi: cpuhp: spec: document procedure for enabling modern CPU hotplug

 include/hw/pci-host/q35.h   |  10 
 docs/specs/acpi_cpu_hotplug.txt |  91 +++---
 hw/acpi/cpu.c   |  15 ++
 hw/acpi/trace-events|   1 +
 hw/i386/pc.c|   4 +-
 hw/pci-host/q35.c   |  80 +++---
 tests/q35-test.c| 105 
 7 files changed, 281 insertions(+), 25 deletions(-)

-- 
2.7.4




[PATCH for-5.0 1/8] q35: implement 128K SMRAM at default SMBASE address

2019-12-04 Thread Igor Mammedov
Use commit (2f295167e0 q35/mch: implement extended TSEG sizes) for
inspiration and (ab)use reserved register in config space at 0x9c
offset [*] to extend q35 pci-host with ability to use 128K at
0x3 as SMRAM and hide it (like TSEG) from non-SMM context.

Usage:
  1: write 0xff in the register
  2: if the feature is supported, follow up read from the register
 should return 0x01. At this point RAM at 0x3 is still
 available for SMI handler configuration from non-SMM context
  3: writing 0x02 in the register, locks SMBASE area, making its contents
 available only from SMM context. In non-SMM context, reads return
 0xff and writes are ignored. Further writes into the register are
 ignored until the system reset.

*) https://www.mail-archive.com/qemu-devel@nongnu.org/msg455991.html

Signed-off-by: Igor Mammedov 
---
 include/hw/pci-host/q35.h | 10 ++
 hw/i386/pc.c  |  4 ++-
 hw/pci-host/q35.c | 80 ++-
 3 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index b3bcf2e..976fbae 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -32,6 +32,7 @@
 #include "hw/acpi/ich9.h"
 #include "hw/pci-host/pam.h"
 #include "hw/i386/intel_iommu.h"
+#include "qemu/units.h"
 
 #define TYPE_Q35_HOST_DEVICE "q35-pcihost"
 #define Q35_HOST_DEVICE(obj) \
@@ -54,6 +55,8 @@ typedef struct MCHPCIState {
 MemoryRegion smram_region, open_high_smram;
 MemoryRegion smram, low_smram, high_smram;
 MemoryRegion tseg_blackhole, tseg_window;
+MemoryRegion smbase_blackhole, smbase_window;
+bool has_smram_at_smbase;
 Range pci_hole;
 uint64_t below_4g_mem_size;
 uint64_t above_4g_mem_size;
@@ -97,6 +100,13 @@ typedef struct Q35PCIHost {
 #define MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY  0x
 #define MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_MAX0xfff
 
+#define MCH_HOST_BRIDGE_SMBASE_SIZE(128 * KiB)
+#define MCH_HOST_BRIDGE_SMBASE_ADDR0x3
+#define MCH_HOST_BRIDGE_F_SMBASE   0x9c
+#define MCH_HOST_BRIDGE_F_SMBASE_QUERY 0xff
+#define MCH_HOST_BRIDGE_F_SMBASE_IN_RAM0x01
+#define MCH_HOST_BRIDGE_F_SMBASE_LCK   0x02
+
 #define MCH_HOST_BRIDGE_PCIEXBAR   0x60/* 64bit register */
 #define MCH_HOST_BRIDGE_PCIEXBAR_SIZE  8   /* 64bit register */
 #define MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT   0xb000
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ac08e63..9c4b4ac 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -103,7 +103,9 @@
 
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
-GlobalProperty pc_compat_4_1[] = {};
+GlobalProperty pc_compat_4_1[] = {
+{ "mch", "smbase-smram", "off" },
+};
 const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
 
 GlobalProperty pc_compat_4_0[] = {};
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 158d270..c1bd9f7 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -275,20 +275,20 @@ static const TypeInfo q35_host_info = {
  * MCH D0:F0
  */
 
-static uint64_t tseg_blackhole_read(void *ptr, hwaddr reg, unsigned size)
+static uint64_t blackhole_read(void *ptr, hwaddr reg, unsigned size)
 {
 return 0x;
 }
 
-static void tseg_blackhole_write(void *opaque, hwaddr addr, uint64_t val,
- unsigned width)
+static void blackhole_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned width)
 {
 /* nothing */
 }
 
-static const MemoryRegionOps tseg_blackhole_ops = {
-.read = tseg_blackhole_read,
-.write = tseg_blackhole_write,
+static const MemoryRegionOps blackhole_ops = {
+.read = blackhole_read,
+.write = blackhole_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
 .valid.min_access_size = 1,
 .valid.max_access_size = 4,
@@ -430,6 +430,46 @@ static void mch_update_ext_tseg_mbytes(MCHPCIState *mch)
 }
 }
 
+static void mch_update_smbase_smram(MCHPCIState *mch)
+{
+PCIDevice *pd = PCI_DEVICE(mch);
+uint8_t *reg = pd->config + MCH_HOST_BRIDGE_F_SMBASE;
+bool lck;
+
+if (!mch->has_smram_at_smbase) {
+return;
+}
+
+if (*reg == MCH_HOST_BRIDGE_F_SMBASE_QUERY) {
+pd->wmask[MCH_HOST_BRIDGE_F_SMBASE] =
+MCH_HOST_BRIDGE_F_SMBASE_LCK;
+*reg = MCH_HOST_BRIDGE_F_SMBASE_IN_RAM;
+return;
+}
+
+/*
+ * default/reset state, discard written value
+ * which will disable SMRAM balackhole at SMBASE
+ */
+if (pd->wmask[MCH_HOST_BRIDGE_F_SMBASE] == 0xff) {
+*reg = 0x00;
+}
+
+memory_region_transaction_begin();
+if (*reg & MCH_HOST_BRIDGE_F_SMBASE_LCK) {
+/* disable all writes */
+pd->wmask[MCH_HOST_BRIDGE_F_SMBASE] &=
+~MCH_HOST_BRIDGE_F_SMBASE_LCK;
+*reg = MCH_HOST_BRIDGE_F_SMBASE_LCK;
+lck = true;
+} else {
+lck = false;
+}
+

[PATCH v2 1/5] virtiofsd: Get rid of unused fields in fv_QueueInfo

2019-12-04 Thread Vivek Goyal
There are some unused fields in "struct fv_QueueInfo". Get rid of these fields.

Signed-off-by: Vivek Goyal 
---
 contrib/virtiofsd/fuse_virtio.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index 31c8542b6c..2a9cd60a01 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -50,12 +50,6 @@ struct fv_QueueInfo {
 int qidx;
 int kick_fd;
 int kill_fd; /* For killing the thread */
-
-/* The element for the command currently being processed */
-VuVirtqElement *qe;
-/* If any of the qe vec elements (towards vmm) are unmappable */
-unsigned int elem_bad_in;
-bool reply_sent;
 };
 
 /* A FUSE request */
-- 
2.20.1




Re: [PATCH] target/i386: relax assert when old host kernels don't include msrs

2019-12-04 Thread Eduardo Habkost
On Wed, Dec 04, 2019 at 04:34:45PM +0100, Paolo Bonzini wrote:
> On 04/12/19 16:07, Catherine Ho wrote:
> >> Ok, so the problem is that some MSR didn't exist in that version.  Which
> > I thought in my platform, the only MSR didn't exist is MSR_IA32_VMX_BASIC
> > (0x480). If I remove this kvm_msr_entry_add(), everything is ok, the guest 
> > can
> > be boot up successfully.
> > 
> 
> MSR_IA32_VMX_BASIC was added in kvm-4.10.  Maybe the issue is the
> _value_ that is being written to the VM is not valid?  Can you check
> what's happening in vmx_restore_vmx_basic?

I believe env->features[FEAT_VMX_BASIC] will be initialized to 0
if the host kernel doesn't have KVM_CAP_GET_MSR_FEATURES.

-- 
Eduardo




Re: [PATCH v2 1/7] iotests: Provide a function for checking the creation of huge files

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 4:46 PM, Thomas Huth wrote:

Some tests create huge (but sparse) files, and to be able to run those
tests in certain limited environments (like CI containers), we have to
check for the possibility to create such files first. Thus let's introduce
a common function to check for large files, and replace the already
existing checks in the iotests 005 and 220 with this function.

Reviewed-by: Alex Bennée 
Signed-off-by: Thomas Huth 
---
  tests/qemu-iotests/005   |  5 +
  tests/qemu-iotests/220   |  6 ++
  tests/qemu-iotests/common.rc | 10 ++
  3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/005 b/tests/qemu-iotests/005
index 58442762fe..b6d03ac37d 100755
--- a/tests/qemu-iotests/005
+++ b/tests/qemu-iotests/005
@@ -59,10 +59,7 @@ fi
  # Sanity check: For raw, we require a file system that permits the creation
  # of a HUGE (but very sparse) file. Check we can create it before continuing.
  if [ "$IMGFMT" = "raw" ]; then
-if ! truncate --size=5T "$TEST_IMG"; then
-_notrun "file system on $TEST_DIR does not support large enough files"
-fi
-rm "$TEST_IMG"
+_require_large_file 5T
  fi
  
  echo

diff --git a/tests/qemu-iotests/220 b/tests/qemu-iotests/220
index 2d62c5dcac..15159270d3 100755
--- a/tests/qemu-iotests/220
+++ b/tests/qemu-iotests/220
@@ -42,10 +42,8 @@ echo "== Creating huge file =="
  
  # Sanity check: We require a file system that permits the creation

  # of a HUGE (but very sparse) file.  tmpfs works, ext4 does not.
-if ! truncate --size=513T "$TEST_IMG"; then
-_notrun "file system on $TEST_DIR does not support large enough files"
-fi
-rm "$TEST_IMG"
+_require_large_file 513T
+
  IMGOPTS='cluster_size=2M,refcount_bits=1' _make_test_img 513T
  
  echo "== Populating refcounts =="

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 0cc8acc9ed..6f0582c79a 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -643,5 +643,15 @@ _require_drivers()
  done
  }
  
+# Check that we have a file system that allows huge (but very sparse) files

+#
+_require_large_file()
+{
+if ! truncate --size="$1" "$TEST_IMG"; then
+_notrun "file system on $TEST_DIR does not support large enough files"
+fi
+rm "$TEST_IMG"
+}


:)

Reviewed-by: Philippe Mathieu-Daudé 


+
  # make sure this script returns success
  true






Re: [PATCH v4 14/40] target/arm: Recover 4 bits from TBFLAGs

2019-12-04 Thread Richard Henderson
On 12/4/19 7:53 AM, Alex Bennée wrote:
> 
> Richard Henderson  writes:
> 
>> On 12/4/19 3:43 AM, Alex Bennée wrote:
> 
  void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int 
 max_insns)
  {
 -DisasContext dc;
 +DisasContext dc = { };
>>>
>>> We seemed to have dropped an initialise here which seems unrelated.
>>
>> Added, not dropped.
> 
> But is it related to this patch or fixing another bug?

It is related to the patch.

We used to initialize all of the a32 and m32 fields in DisasContext by
assignment.  Now we only initialize either the a32 or m32 by assignment,
because the bits overlap in tbflags.  So zero out the other bits here.

I'll add this to the commit message.


r~



Re: [PATCH 01/10] hw: arm: add Allwinner H3 System-on-Chip

2019-12-04 Thread Philippe Mathieu-Daudé

Hi Niek,

On 12/2/19 10:09 PM, Niek Linnenbank wrote:

The Allwinner H3 is a System on Chip containing four ARM Cortex A7
processor cores. Features and specifications include DDR2/DDR3 memory,
SD/MMC storage cards, 10/100/1000Mbit ethernet, USB 2.0, HDMI and
various I/O modules. This commit adds support for the Allwinner H3
System on Chip.

Signed-off-by: Niek Linnenbank 
---
  MAINTAINERS |   7 ++
  default-configs/arm-softmmu.mak |   1 +
  hw/arm/Kconfig  |   8 ++
  hw/arm/Makefile.objs|   1 +
  hw/arm/allwinner-h3.c   | 215 
  include/hw/arm/allwinner-h3.h   | 118 ++
  6 files changed, 350 insertions(+)
  create mode 100644 hw/arm/allwinner-h3.c
  create mode 100644 include/hw/arm/allwinner-h3.h


Since your series changes various files, can you have a look at the 
scripts/git.orderfile file and setup it for your QEMU contributions?




diff --git a/MAINTAINERS b/MAINTAINERS
index 5e5e3e52d6..29c9936037 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -479,6 +479,13 @@ F: hw/*/allwinner*
  F: include/hw/*/allwinner*
  F: hw/arm/cubieboard.c
  
+Allwinner-h3

+M: Niek Linnenbank 
+L: qemu-...@nongnu.org
+S: Maintained
+F: hw/*/allwinner-h3*
+F: include/hw/*/allwinner-h3*
+
  ARM PrimeCell and CMSDK devices
  M: Peter Maydell 
  L: qemu-...@nongnu.org
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1f2e0e7fde..d75a239c2c 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
  CONFIG_FSL_IMX7=y
  CONFIG_FSL_IMX6UL=y
  CONFIG_SEMIHOSTING=y
+CONFIG_ALLWINNER_H3=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index c6e7782580..ebf8d2325f 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -291,6 +291,14 @@ config ALLWINNER_A10
  select SERIAL
  select UNIMP
  
+config ALLWINNER_H3

+bool
+select ALLWINNER_A10_PIT
+select SERIAL
+select ARM_TIMER
+select ARM_GIC
+select UNIMP
+
  config RASPI
  bool
  select FRAMEBUFFER
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index fe749f65fd..956e496052 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -34,6 +34,7 @@ obj-$(CONFIG_DIGIC) += digic.o
  obj-$(CONFIG_OMAP) += omap1.o omap2.o
  obj-$(CONFIG_STRONGARM) += strongarm.o
  obj-$(CONFIG_ALLWINNER_A10) += allwinner-a10.o cubieboard.o
+obj-$(CONFIG_ALLWINNER_H3) += allwinner-h3.o
  obj-$(CONFIG_RASPI) += bcm2835_peripherals.o bcm2836.o raspi.o
  obj-$(CONFIG_STM32F205_SOC) += stm32f205_soc.o
  obj-$(CONFIG_XLNX_ZYNQMP_ARM) += xlnx-zynqmp.o xlnx-zcu102.o
diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
new file mode 100644
index 00..470fdfebef
--- /dev/null
+++ b/hw/arm/allwinner-h3.c
@@ -0,0 +1,215 @@
+/*
+ * Allwinner H3 System on Chip emulation
+ *
+ * Copyright (C) 2019 Niek Linnenbank 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "exec/address-spaces.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "qemu/units.h"
+#include "cpu.h"
+#include "hw/sysbus.h"
+#include "hw/arm/allwinner-h3.h"
+#include "hw/misc/unimp.h"
+#include "sysemu/sysemu.h"
+
+static void aw_h3_init(Object *obj)
+{
+AwH3State *s = AW_H3(obj);
+
+sysbus_init_child_obj(obj, "gic", >gic, sizeof(s->gic),
+  TYPE_ARM_GIC);
+
+sysbus_init_child_obj(obj, "timer", >timer, sizeof(s->timer),
+  TYPE_AW_A10_PIT);
+}
+
+static void aw_h3_realize(DeviceState *dev, Error **errp)
+{
+AwH3State *s = AW_H3(dev);
+SysBusDevice *sysbusdev = NULL;
+Error *err = NULL;
+unsigned i = 0;
+
+/* CPUs */
+for (i = 0; i < AW_H3_NUM_CPUS; i++) {


In https://www.mail-archive.com/qemu-devel@nongnu.org/msg662942.html
Markus noted some incorrect pattern, and apparently you inherited it.
You should initialize 'err' in the loop.


+Object *cpuobj = object_new(ARM_CPU_TYPE_NAME("cortex-a7"));
+CPUState *cpustate = CPU(cpuobj);


We loose access to the CPUs. Can you use an array of AW_H3_NUM_CPUS cpus 
in AwH3State?



+
+/* Set the proper CPU index */
+cpustate->cpu_index = i;
+
+/* Provide Power State Coordination Interface */
+object_property_set_int(cpuobj, QEMU_PSCI_CONDUIT_HVC,
+   

Re: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.

2019-12-04 Thread Kirti Wankhede




On 12/3/2019 11:34 PM, Alex Williamson wrote:

On Mon, 25 Nov 2019 19:57:39 -0500
Yan Zhao  wrote:


On Fri, Nov 15, 2019 at 05:06:25AM +0800, Alex Williamson wrote:

On Fri, 15 Nov 2019 00:26:07 +0530
Kirti Wankhede  wrote:
   

On 11/14/2019 1:37 AM, Alex Williamson wrote:

On Thu, 14 Nov 2019 01:07:21 +0530
Kirti Wankhede  wrote:
 

On 11/13/2019 4:00 AM, Alex Williamson wrote:

On Tue, 12 Nov 2019 22:33:37 +0530
Kirti Wankhede  wrote:


All pages pinned by vendor driver through vfio_pin_pages API should be
considered as dirty during migration. IOMMU container maintains a list of
all such pinned pages. Added an ioctl defination to get bitmap of such


definition


pinned pages for requested IO virtual address range.


Additionally, all mapped pages are considered dirty when physically
mapped through to an IOMMU, modulo we discussed devices opting in to
per page pinning to indicate finer granularity with a TBD mechanism to
figure out if any non-opt-in devices remain.



You mean, in case of device direct assignment (device pass through)?


Yes, or IOMMU backed mdevs.  If vfio_dmas in the container are fully
pinned and mapped, then the correct dirty page set is all mapped pages.
We discussed using the vpfn list as a mechanism for vendor drivers to
reduce their migration footprint, but we also discussed that we would
need a way to determine that all participants in the container have
explicitly pinned their working pages or else we must consider the
entire potential working set as dirty.
 


How can vendor driver tell this capability to iommu module? Any suggestions?


I think it does so by pinning pages.  Is it acceptable that if the
vendor driver pins any pages, then from that point forward we consider
the IOMMU group dirty page scope to be limited to pinned pages?  There

we should also be aware of that dirty page scope is pinned pages + unpinned 
pages,
which means ever since a page is pinned, it should be regarded as dirty
no matter whether it's unpinned later. only after log_sync is called and
dirty info retrieved, its dirty state should be cleared.


Yes, good point.  We can't just remove a vpfn when a page is unpinned
or else we'd lose information that the page potentially had been
dirtied while it was pinned.  Maybe that vpfn needs to move to a dirty
list and both the currently pinned vpfns and the dirty vpfns are walked
on a log_sync.  The dirty vpfns list would be cleared after a log_sync.
The container would need to know that dirty tracking is enabled and
only manage the dirty vpfns list when necessary.  Thanks,



If page is unpinned, then that page is available in free page pool for 
others to use, then how can we say that unpinned page has valid data?


If suppose, one driver A unpins a page and when driver B of some other 
device gets that page and he pins it, uses it, and then unpins it, then 
how can we say that page has valid data for driver A?


Can you give one example where unpinned page data is considered reliable 
and valid?


Thanks,
Kirti


Alex
  

are complications around non-singleton IOMMU groups, but I think we're
already leaning towards that being a non-worthwhile problem to solve.
So if we require that only singleton IOMMU groups can pin pages and we
pass the IOMMU group as a parameter to
vfio_iommu_driver_ops.pin_pages(), then the type1 backend can set a
flag on its local vfio_group struct to indicate dirty page scope is
limited to pinned pages.  We might want to keep a flag on the
vfio_iommu struct to indicate if all of the vfio_groups for each
vfio_domain in the vfio_iommu.domain_list dirty page scope limited to
pinned pages as an optimization to avoid walking lists too often.  Then
we could test if vfio_iommu.domain_list is not empty and this new flag
does not limit the dirty page scope, then everything within each
vfio_dma is considered dirty.


Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
include/uapi/linux/vfio.h | 23 +++
1 file changed, 23 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 35b09427ad9f..6fd3822aa610 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -902,6 +902,29 @@ struct vfio_iommu_type1_dma_unmap {
#define VFIO_IOMMU_ENABLE   _IO(VFIO_TYPE, VFIO_BASE + 15)
#define VFIO_IOMMU_DISABLE  _IO(VFIO_TYPE, VFIO_BASE + 16)

+/**

+ * VFIO_IOMMU_GET_DIRTY_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ *
+ * IOCTL to get dirty pages bitmap for IOMMU container during migration.
+ * Get dirty pages bitmap of given IO virtual addresses range using
+ * struct vfio_iommu_type1_dirty_bitmap. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. User should allocate memory to get
+ * bitmap and should set size of allocated memory in bitmap_size field.
+ * One bit is used to represent per page consecutively starting 

Re: [PATCH v4 25/40] target/arm: Update timer access for VHE

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/helper.c | 102 +++-
>  1 file changed, 81 insertions(+), 21 deletions(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index a4a7f82661..023b8963cf 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -2287,10 +2287,18 @@ static CPAccessResult gt_cntfrq_access(CPUARMState 
> *env, const ARMCPRegInfo *ri,
>   * Writable only at the highest implemented exception level.
>   */
>  int el = arm_current_el(env);
> +uint64_t hcr;
> +uint32_t cntkctl;
>  
>  switch (el) {
>  case 0:
> -if (!extract32(env->cp15.c14_cntkctl, 0, 2)) {
> +hcr = arm_hcr_el2_eff(env);
> +if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
> +cntkctl = env->cp15.cnthctl_el2;
> +} else {
> +cntkctl = env->cp15.c14_cntkctl;
> +}
> +if (!extract32(cntkctl, 0, 2)) {
>  return CP_ACCESS_TRAP;
>  }
>  break;
> @@ -2318,17 +2326,47 @@ static CPAccessResult gt_counter_access(CPUARMState 
> *env, int timeridx,
>  {
>  unsigned int cur_el = arm_current_el(env);
>  bool secure = arm_is_secure(env);
> +uint64_t hcr = arm_hcr_el2_eff(env);
>  
> -/* CNT[PV]CT: not visible from PL0 if ELO[PV]CTEN is zero */
> -if (cur_el == 0 &&
> -!extract32(env->cp15.c14_cntkctl, timeridx, 1)) {
> -return CP_ACCESS_TRAP;
> -}
> +switch (cur_el) {
> +case 0:
> +/* If HCR_EL2. == '11': check CNTHCTL_EL2.EL0[PV]CTEN. */
> +if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
> +return (extract32(env->cp15.cnthctl_el2, timeridx, 1)
> +? CP_ACCESS_OK : CP_ACCESS_TRAP_EL2);
> +}
>  
> -if (arm_feature(env, ARM_FEATURE_EL2) &&
> -timeridx == GTIMER_PHYS && !secure && cur_el < 2 &&
> -!extract32(env->cp15.cnthctl_el2, 0, 1)) {
> -return CP_ACCESS_TRAP_EL2;
> +/* CNT[PV]CT: not visible from PL0 if EL0[PV]CTEN is zero */
> +if (!extract32(env->cp15.c14_cntkctl, timeridx, 1)) {
> +return CP_ACCESS_TRAP;
> +}
> +
> +/* If HCR_EL2. == '10': check CNTHCTL_EL2.EL1PCTEN. */
> +if (hcr & HCR_E2H) {
> +if (timeridx == GTIMER_PHYS &&
> +!extract32(env->cp15.cnthctl_el2, 10, 1)) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +} else {
> +/* If HCR_EL2. == 0: check CNTHCTL_EL2.EL1PCEN. */
> +if (arm_feature(env, ARM_FEATURE_EL2) &&
> +timeridx == GTIMER_PHYS && !secure &&
> +!extract32(env->cp15.cnthctl_el2, 1, 1)) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +}
> +break;
> +
> +case 1:
> +/* Check CNTHCTL_EL2.EL1PCTEN, which changes location based on E2H. 
> */
> +if (arm_feature(env, ARM_FEATURE_EL2) &&
> +timeridx == GTIMER_PHYS && !secure &&
> +(hcr & HCR_E2H
> + ? !extract32(env->cp15.cnthctl_el2, 10, 1)
> + : !extract32(env->cp15.cnthctl_el2, 0, 1))) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +break;
>  }
>  return CP_ACCESS_OK;
>  }
> @@ -2338,19 +2376,41 @@ static CPAccessResult gt_timer_access(CPUARMState 
> *env, int timeridx,
>  {
>  unsigned int cur_el = arm_current_el(env);
>  bool secure = arm_is_secure(env);
> +uint64_t hcr = arm_hcr_el2_eff(env);
>  
> -/* CNT[PV]_CVAL, CNT[PV]_CTL, CNT[PV]_TVAL: not visible from PL0 if
> - * EL0[PV]TEN is zero.
> - */
> -if (cur_el == 0 &&
> -!extract32(env->cp15.c14_cntkctl, 9 - timeridx, 1)) {
> -return CP_ACCESS_TRAP;
> -}
> +switch (cur_el) {
> +case 0:
> +if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
> +/* If HCR_EL2. == '11': check CNTHCTL_EL2.EL0[PV]TEN. */
> +return (extract32(env->cp15.cnthctl_el2, 9 - timeridx, 1)
> +? CP_ACCESS_OK : CP_ACCESS_TRAP_EL2);
> +}
>  
> -if (arm_feature(env, ARM_FEATURE_EL2) &&
> -timeridx == GTIMER_PHYS && !secure && cur_el < 2 &&
> -!extract32(env->cp15.cnthctl_el2, 1, 1)) {
> -return CP_ACCESS_TRAP_EL2;
> +/*
> + * CNT[PV]_CVAL, CNT[PV]_CTL, CNT[PV]_TVAL: not visible from
> + * EL0 if EL0[PV]TEN is zero.
> + */
> +if (!extract32(env->cp15.c14_cntkctl, 9 - timeridx, 1)) {
> +return CP_ACCESS_TRAP;
> +}
> +/* fall through */
> +
> +case 1:
> +if (arm_feature(env, ARM_FEATURE_EL2) &&
> +timeridx == GTIMER_PHYS && !secure) {
> +if (hcr & HCR_E2H) {
> +/* If HCR_EL2. == '10': check CNTHCTL_EL2.EL1PTEN. 
> */
> +if (!extract32(env->cp15.cnthctl_el2, 11, 1)) {
> +  

[PATCH v2 0/5] [RFC] virtiofsd, vhost-user-fs: Add support for notification queue

2019-12-04 Thread Vivek Goyal
Hi,

Here is V2 of RFC patches for adding a notification queue to
vhost-user-fs device to send notifications from host to guest.
It also has patches to support remote posix locks which make use of this
newly introduced notification queue.

I have taken care of most of the comments from last iteration. Still one
major TODO item is to be able to interrupt/stop blocked thrads for locks
when guest reboots. 

Patches are also available here.

https://github.com/rhvgoyal/qemu/commits/blocking-locks-v2

Associated kernel changes are available here.

https://github.com/rhvgoyal/linux/commits/blocking-locks-v2

Thanks
Vivek

Vivek Goyal (5):
  virtiofsd: Get rid of unused fields in fv_QueueInfo
  virtiofsd: Release file locks using F_UNLCK
  virtiofd: Create a notification queue
  virtiofsd: Specify size of notification buffer using config space
  virtiofsd: Implement blocking posix locks

 contrib/virtiofsd/fuse_i.h |   1 +
 contrib/virtiofsd/fuse_kernel.h|   7 +
 contrib/virtiofsd/fuse_lowlevel.c  |  23 ++-
 contrib/virtiofsd/fuse_lowlevel.h  |  25 +++
 contrib/virtiofsd/fuse_virtio.c| 208 +
 contrib/virtiofsd/passthrough_ll.c |  80 ++--
 hw/virtio/vhost-user-fs-pci.c  |   2 +-
 hw/virtio/vhost-user-fs.c  |  63 ++-
 include/hw/virtio/vhost-user-fs.h  |   3 +
 include/standard-headers/linux/virtio_fs.h |   5 +
 10 files changed, 354 insertions(+), 63 deletions(-)

-- 
2.20.1




[PATCH v2 4/5] virtiofsd: Specify size of notification buffer using config space

2019-12-04 Thread Vivek Goyal
Daemon specifies size of notification buffer needed and that should be done
using config space.

Only ->notify_buf_size value of config space comes from daemon. Rest of
it is filled by qemu device emulation code.

Signed-off-by: Vivek Goyal 
---
 contrib/virtiofsd/fuse_virtio.c| 31 ++
 hw/virtio/vhost-user-fs.c  | 26 ++
 include/hw/virtio/vhost-user-fs.h  |  2 ++
 include/standard-headers/linux/virtio_fs.h |  2 ++
 4 files changed, 61 insertions(+)

diff --git a/contrib/virtiofsd/fuse_virtio.c b/contrib/virtiofsd/fuse_virtio.c
index b1eebcf054..94cf9b3791 100644
--- a/contrib/virtiofsd/fuse_virtio.c
+++ b/contrib/virtiofsd/fuse_virtio.c
@@ -869,6 +869,35 @@ static bool fv_queue_order(VuDev *dev, int qidx)
 return false;
 }
 
+static uint64_t fv_get_protocol_features(VuDev *dev)
+{
+   return 1ull << VHOST_USER_PROTOCOL_F_CONFIG;
+}
+
+static int fv_get_config(VuDev *dev, uint8_t *config, uint32_t len)
+{
+   struct virtio_fs_config fscfg = {};
+   unsigned notify_size, roundto = 64;
+   union fuse_notify_union {
+   struct fuse_notify_poll_wakeup_out  wakeup_out;
+   struct fuse_notify_inval_inode_out  inode_out;
+   struct fuse_notify_inval_entry_out  entry_out;
+   struct fuse_notify_delete_out   delete_out;
+   struct fuse_notify_store_outstore_out;
+   struct fuse_notify_retrieve_out retrieve_out;
+   };
+
+   notify_size = sizeof(struct fuse_out_header) +
+ sizeof(union fuse_notify_union);
+   notify_size = ((notify_size + roundto)/roundto) * roundto;
+
+   fscfg.notify_buf_size = notify_size;
+   memcpy(config, , len);
+   fuse_log(FUSE_LOG_DEBUG, "%s:Setting notify_buf_size=%d\n", __func__,
+ fscfg.notify_buf_size);
+   return 0;
+}
+
 static const VuDevIface fv_iface = {
 .get_features = fv_get_features,
 .set_features = fv_set_features,
@@ -877,6 +906,8 @@ static const VuDevIface fv_iface = {
 .queue_set_started = fv_queue_set_started,
 
 .queue_is_processed_in_order = fv_queue_order,
+.get_protocol_features = fv_get_protocol_features,
+.get_config = fv_get_config,
 };
 
 /*
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index fe9dbe..5a6d244b98 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -277,16 +277,40 @@ uint64_t vhost_user_fs_slave_io(struct vhost_dev *dev, 
VhostUserFSSlaveMsg *sm,
 return (uint64_t)done;
 }
 
+static int vhost_user_fs_handle_config_change(struct vhost_dev *dev)
+{
+return 0;
+}
+
+const VhostDevConfigOps fs_ops = {
+.vhost_dev_config_notifier = vhost_user_fs_handle_config_change,
+};
 
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
 VHostUserFS *fs = VHOST_USER_FS(vdev);
 struct virtio_fs_config fscfg = {};
+int ret;
+
+/*
+ * As of now we only get notification buffer size from device. And that's
+ * needed only if notification queue is enabled.
+ */
+if (fs->notify_enabled) {
+ret = vhost_dev_get_config(>vhost_dev, (uint8_t *)>fscfg,
+   sizeof(struct virtio_fs_config));
+if (ret < 0) {
+error_report("vhost-user-fs: get device config space failed."
+ " ret=%d\n", ret);
+return;
+}
+}
 
 memcpy((char *)fscfg.tag, fs->conf.tag,
MIN(strlen(fs->conf.tag) + 1, sizeof(fscfg.tag)));
 
 virtio_stl_p(vdev, _request_queues, fs->conf.num_request_queues);
+virtio_stl_p(vdev, _buf_size, fs->fscfg.notify_buf_size);
 
 memcpy(config, , sizeof(fscfg));
 }
@@ -545,6 +569,8 @@ static void vuf_device_realize(DeviceState *dev, Error 
**errp)
 fs->vhost_dev.nvqs = 2 + fs->conf.num_request_queues;
 
 fs->vhost_dev.vqs = g_new0(struct vhost_virtqueue, fs->vhost_dev.nvqs);
+
+vhost_dev_set_config_notifier(>vhost_dev, _ops);
 ret = vhost_dev_init(>vhost_dev, >vhost_user,
  VHOST_BACKEND_TYPE_USER, 0);
 if (ret < 0) {
diff --git a/include/hw/virtio/vhost-user-fs.h 
b/include/hw/virtio/vhost-user-fs.h
index bd47e0da98..f667cc4b5a 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -14,6 +14,7 @@
 #ifndef _QEMU_VHOST_USER_FS_H
 #define _QEMU_VHOST_USER_FS_H
 
+#include "standard-headers/linux/virtio_fs.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
@@ -58,6 +59,7 @@ typedef struct {
 struct vhost_virtqueue *vhost_vqs;
 struct vhost_dev vhost_dev;
 VhostUserState vhost_user;
+struct virtio_fs_config fscfg;
 
 /*< public >*/
 MemoryRegion cache;
diff --git a/include/standard-headers/linux/virtio_fs.h 
b/include/standard-headers/linux/virtio_fs.h
index 9ee95f584f..719216a262 100644
--- 

[PATCH v2 2/5] virtiofsd: Release file locks using F_UNLCK

2019-12-04 Thread Vivek Goyal
We are emulating posix locks for guest using open file description locks
in virtiofsd. When any of the fd is closed in guest, we find associated
OFD lock fd (if there is one) and close it to release all the locks.

Assumption here is that there is no other thread using lo_inode_plock
structure or plock->fd, hence it is safe to do so.

But now we are about to introduce blocking variant of locks (SETLKW),
and that means we might be waiting to a lock to be available and
using plock->fd. And that means there are still users of plock structure.

So release locks using fcntl(SETLK, F_UNLCK) instead and plock will
be freed later.

Signed-off-by: Vivek Goyal 
---
 contrib/virtiofsd/passthrough_ll.c | 31 --
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/contrib/virtiofsd/passthrough_ll.c 
b/contrib/virtiofsd/passthrough_ll.c
index bc214df0c7..6aa56882e8 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -936,6 +936,14 @@ static void put_shared(struct lo_data *lo, struct lo_inode 
*inode)
}
 }
 
+static void posix_locks_value_destroy(gpointer data)
+{
+   struct lo_inode_plock *plock = data;
+
+   close(plock->fd);
+   free(plock);
+}
+
 /* Increments nlookup and caller must release refcount using
  * lo_inode_put().
  */
@@ -994,7 +1002,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, 
const char *name,
inode->key.ino = e->attr.st_ino;
inode->key.dev = e->attr.st_dev;
pthread_mutex_init(>plock_mutex, NULL);
-   inode->posix_locks = g_hash_table_new(g_direct_hash, 
g_direct_equal);
+   inode->posix_locks = g_hash_table_new_full(g_direct_hash,
+   g_direct_equal, NULL,
+   posix_locks_value_destroy);
 
get_shared(lo, inode);
 
@@ -1436,9 +1446,6 @@ static void unref_inode(struct lo_data *lo, struct 
lo_inode *inode, uint64_t n)
if (!inode->nlookup) {
lo_map_remove(>ino_map, inode->fuse_ino);
 g_hash_table_remove(lo->inodes, >key);
-   if (g_hash_table_size(inode->posix_locks)) {
-   fuse_log(FUSE_LOG_WARNING, "Hash table is not empty\n");
-   }
g_hash_table_destroy(inode->posix_locks);
pthread_mutex_destroy(>plock_mutex);
 
@@ -1868,6 +1875,7 @@ static struct lo_inode_plock 
*lookup_create_plock_ctx(struct lo_data *lo,
plock->fd = fd;
g_hash_table_insert(inode->posix_locks,
GUINT_TO_POINTER(plock->lock_owner), plock);
+   fuse_log(FUSE_LOG_DEBUG, "lookup_create_plock_ctx(): Inserted element 
in posix_locks hash table with value pointer %p\n", plock);
return plock;
 }
 
@@ -2046,6 +2054,7 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, 
struct fuse_file_info *fi)
(void) ino;
struct lo_inode *inode;
struct lo_inode_plock *plock;
+   struct flock flock;
 
inode = lo_inode(req, ino);
if (!inode) {
@@ -2058,14 +2067,16 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, 
struct fuse_file_info *fi)
plock = g_hash_table_lookup(inode->posix_locks,
GUINT_TO_POINTER(fi->lock_owner));
if (plock) {
-   g_hash_table_remove(inode->posix_locks,
-   GUINT_TO_POINTER(fi->lock_owner));
/*
-* We had used open() for locks and had only one fd. So
-* closing this fd should release all OFD locks.
+* An fd is being closed. For posix locks, this means
+* drop all the associated locks.
 */
-   close(plock->fd);
-   free(plock);
+   memset(, 0, sizeof(struct flock));
+   flock.l_type = F_UNLCK;
+   flock.l_whence = SEEK_SET;
+   /* Unlock whole file */
+   flock.l_start = flock.l_len = 0;
+   fcntl(plock->fd, F_OFD_SETLK, );
}
pthread_mutex_unlock(>plock_mutex);
 
-- 
2.20.1




Re: [PATCH v4 16/40] target/arm: Rearrange ARMMMUIdxBit

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/3/19 3:29 AM, Richard Henderson wrote:

Define via macro expansion, so that renumbering of the base ARMMMUIdx
symbols is automatically reflexed in the bit definitions.

Signed-off-by: Richard Henderson 
---
  target/arm/cpu.h | 39 +++
  1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 5f295c7e60..6ba5126852 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2886,27 +2886,34 @@ typedef enum ARMMMUIdx {
  ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
  } ARMMMUIdx;
  
-/* Bit macros for the core-mmu-index values for each index,

+/*
+ * Bit macros for the core-mmu-index values for each index,
   * for use when calling tlb_flush_by_mmuidx() and friends.
   */
+#define TO_CORE_BIT(NAME) \
+ARMMMUIdxBit_##NAME = 1 << (ARMMMUIdx_##NAME & ARM_MMU_IDX_COREIDX_MASK)
+
  typedef enum ARMMMUIdxBit {
-ARMMMUIdxBit_EL10_0 = 1 << 0,
-ARMMMUIdxBit_EL10_1 = 1 << 1,
-ARMMMUIdxBit_E2 = 1 << 2,
-ARMMMUIdxBit_SE3 = 1 << 3,
-ARMMMUIdxBit_SE0 = 1 << 4,
-ARMMMUIdxBit_SE1 = 1 << 5,
-ARMMMUIdxBit_Stage2 = 1 << 6,
-ARMMMUIdxBit_MUser = 1 << 0,
-ARMMMUIdxBit_MPriv = 1 << 1,
-ARMMMUIdxBit_MUserNegPri = 1 << 2,
-ARMMMUIdxBit_MPrivNegPri = 1 << 3,
-ARMMMUIdxBit_MSUser = 1 << 4,
-ARMMMUIdxBit_MSPriv = 1 << 5,
-ARMMMUIdxBit_MSUserNegPri = 1 << 6,
-ARMMMUIdxBit_MSPrivNegPri = 1 << 7,
+TO_CORE_BIT(EL10_0),
+TO_CORE_BIT(EL10_1),
+TO_CORE_BIT(E2),
+TO_CORE_BIT(SE0),
+TO_CORE_BIT(SE1),
+TO_CORE_BIT(SE3),
+TO_CORE_BIT(Stage2),
+
+TO_CORE_BIT(MUser),
+TO_CORE_BIT(MPriv),
+TO_CORE_BIT(MUserNegPri),
+TO_CORE_BIT(MPrivNegPri),
+TO_CORE_BIT(MSUser),
+TO_CORE_BIT(MSPriv),
+TO_CORE_BIT(MSUserNegPri),
+TO_CORE_BIT(MSPrivNegPri),
  } ARMMMUIdxBit;
  
+#undef TO_CORE_BIT

+
  #define MMU_USER_IDX 0
  
  static inline int arm_to_core_mmu_idx(ARMMMUIdx mmu_idx)




Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] target/sparc: Remove old TODO file

2019-12-04 Thread Artyom Tarasenko
On Wed, Dec 4, 2019 at 5:27 PM Thomas Huth  wrote:
>
> On 30/09/2019 19.10, Thomas Huth wrote:
> > This file hasn't seen a real (non-trivial) update since 2008 anymore,
> > so we can assume that it is pretty much out of date and nobody cares
> > for it anymore. Let's simply remove it.
> >
> > Signed-off-by: Thomas Huth 
> > ---
> >  target/sparc/TODO | 88 ---
> >  1 file changed, 88 deletions(-)
> >  delete mode 100644 target/sparc/TODO
> >
> > diff --git a/target/sparc/TODO b/target/sparc/TODO
> > deleted file mode 100644
> > index b8c727e858..00
> > --- a/target/sparc/TODO
> > +++ /dev/null
> > @@ -1,88 +0,0 @@
> > -TODO-list:
> > -
> > -CPU common:
> > -- Unimplemented features/bugs:
> > - - Delay slot handling may fail sometimes (branch end of page, delay
> > - slot next page)
> > - - Atomical instructions
> > - - CPU features should match real CPUs (also ASI selection)
> > -- Optimizations/improvements:
> > - - Condition code/branch handling like x86, also for FPU?
> > - - Remove remaining explicit alignment checks
> > - - Global register for regwptr, so that windowed registers can be
> > - accessed directly
> > - - Improve Sparc32plus addressing
> > - - NPC/PC static optimisations (use JUMP_TB when possible)? (Is this
> > - obsolete?)
> > - - Synthetic instructions
> > - - MMU model dependent on CPU model
> > - - Select ASI helper at translation time (on V9 only if known)
> > - - KQemu/KVM support for VM only
> > - - Hardware breakpoint/watchpoint support
> > - - Cache emulation mode
> > - - Reverse-endian pages
> > - - Faster FPU emulation
> > - - Busy loop detection
> > -
> > -Sparc32 CPUs:
> > -- Unimplemented features/bugs:
> > - - Sun4/Sun4c MMUs
> > - - Some V8 ASIs
> > -
> > -Sparc64 CPUs:
> > -- Unimplemented features/bugs:
> > - - Interrupt handling
> > - - Secondary address space, other MMU functions
> > - - Many V9/UA2005/UA2007 ASIs
> > - - Rest of V9 instructions, missing VIS instructions
> > - - IG/MG/AG vs. UA2007 globals
> > - - Full hypervisor support
> > - - SMP/CMT
> > - - Sun4v CPUs
> > -
> > -Sun4:
> > -- To be added
> > -
> > -Sun4c:
> > -- A lot of unimplemented features
> > -- Maybe split from Sun4m
> > -
> > -Sun4m:
> > -- Unimplemented features/bugs:
> > - - Hardware devices do not match real boards
> > - - Floppy does not work
> > - - CS4231: merge with cs4231a, add DMA
> > - - Add cg6, bwtwo
> > - - Arbitrary resolution support
> > - - PCI for MicroSparc-IIe
> > - - JavaStation machines
> > - - SBus slot probing, FCode ROM support
> > - - SMP probing support
> > - - Interrupt routing does not match real HW
> > - - SuSE 7.3 keyboard sometimes unresponsive
> > - - Gentoo 2004.1 SMP does not work
> > - - SS600MP ledma -> lebuffer
> > - - Type 5 keyboard
> > - - Less fixed hardware choices
> > - - DBRI audio (Am7930)
> > - - BPP parallel
> > - - Diagnostic switch
> > - - ESP PIO mode
> > -
> > -Sun4d:
> > -- A lot of unimplemented features:
> > - - SBI
> > - - IO-unit
> > -- Maybe split from Sun4m
> > -
> > -Sun4u:
> > -- Unimplemented features/bugs:
> > - - Interrupt controller
> > - - PCI/IOMMU support (Simba, JIO, Tomatillo, Psycho, Schizo, Safari...)
> > - - SMP
> > - - Happy Meal Ethernet, flash, I2C, GPIO
> > - - A lot of real machine types
> > -
> > -Sun4v:
> > -- A lot of unimplemented features
> > - - A lot of real machine types
> >
>
> Ping?

Sorry for the delay, you are right the file doesn't reflect the
current state, so

Reviewed-by: Artyom Tarasenko 


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu



Re: virtiofsd: Where should it live?

2019-12-04 Thread Dr. David Alan Gilbert
We seem to be settling out to either fsdev/virtiofsd or tools/virtiofsd
with tools picking up some speed as people seem to want to put a bunch
of other stuff in there.

Unless anyone shouts really loud, I'll work on making it
tools/virtiofsd.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v6 0/9] Clock framework API

2019-12-04 Thread Damien Hedde



On 12/2/19 5:15 PM, Peter Maydell wrote:
> 
> The one topic I think we could do with discussing is whether
> a simple uint64_t giving the frequency of the clock in Hz is
> the right representation. In particular in your patch 9 the
> board has a clock frequency that's not a nice integer number
> of Hz. I think Philippe also mentioned on irc some board where
> the UART clock ends up at a weird frequency. Since the
> representation of the frequency is baked into the migration
> format it's going to be easier to get it right first rather
> than trying to change it later.
> 
> So what should the representation be? Some random thoughts:
> 
> 1) ptimer internally uses a 'period plus fraction' representation:
>  int64_t period is the integer part of the period in nanoseconds,
>  uint32_t period_frac is the fractional part of the period
> (if you like you can think of this as "96-bit integer
> period measured in units of one-2^32nd of a nanosecond").
> However its only public interfaces for setting the frequency
> are (a) set the frequency in Hz (uint32_t) or (b) set
> the period in nanoseconds (int64_t); the period_frac part
> is used to handle frequencies which don't work out to
> a nice whole number of nanoseconds per cycle.
> 
> 2) I hear that SystemC uses "value plus a time unit", with
> the smallest unit being a picosecond. (I think SystemC
> also lets you specify the duty cycle, but we definitely
> don't want to get into that!)

The "value" is internally stored in a 64bits unsigned integer.

> 
> 3) QEMUTimers are basically just nanosecond timers
> 
> 4) The MAME emulator seems to work with periods of
> 96-bit attoseconds (represented internally by a
> 32-bit count of seconds plus a 64-bit count of
> attoseconds). One attosecond is 1e-18 seconds.
> 
> Does anybody else have experience with other modelling
> or emulator technology and how it represents clocks ?

5) In linux, a clock rate is an "unsigned long" representing Hz.

> 
> I feel we should at least be able to represent clocks
> with the same accuracy that ptimer has.

Then is a maybe a good idea to store the period and not the frequency in
clocks so that we don't loose anything when we switch from a clock to a
ptimer ?

Regarding the clock, I don't see any strong obstacle to switch
internally to a period based value.
The only things we have to choose is how to represent a disabled clock.
Since putting a "0" period to a ptimer will disable the timer in
ptimer_reload(). We can choose that (and it's a good value because we
can multiply or divide it, it stays the same).

We could use the same representation as a ptimer. But if we don't keep a
C number representation, then computation of frequencies/periods will be
complicated at best and error prone.

>From that point of view, if we could stick to a 64bits integer (or
floating point number) it would be great. Can we use a sub nanosecond
unit that fit our needs ?

I did some test with a unit of 2^-32 of nanoseconds on 64bits (is that
the unit of the ptimer fractional part ?) and if I'm not mistaken
+ we have a frequency range from ~0.2Hz up to 10^18Hz
+ the resolution is decreasing with the frequency (but at 100Mhz we have
a ~2.3mHz resolution, at 1GHz it's ~0.23Hz and at 10GHz ~23Hz
resolution). We hit 1Hz resolution around 2GHz.

So it sounds to me we have largely enough resolution to model clocks in
the range of frequencies we will have to handle. What do you think ?

--
Damien



Re: [PATCH v2 1/2] migration: add savevm_state_handler_remove()

2019-12-04 Thread Dr. David Alan Gilbert
* Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> Create a function to abstract common logic needed when removing a
> SaveStateEntry element from the savevm_state.handlers queue.
> 
> For now we just remove the element.  Soon it will involve additional
> cleanup.
> 
> Signed-off-by: Scott Cheloha 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/savevm.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 8d95e261f6..b2e3b7222a 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -725,6 +725,11 @@ static void savevm_state_handler_insert(SaveStateEntry 
> *nse)
>  }
>  }
>  
> +static void savevm_state_handler_remove(SaveStateEntry *se)
> +{
> +QTAILQ_REMOVE(_state.handlers, se, entry);
> +}
> +
>  /* TODO: Individual devices generally have very little idea about the rest
> of the system, so instance_id should be removed/replaced.
> Meanwhile pass -1 as instance_id if you do not already have a clearly
> @@ -777,7 +782,7 @@ void unregister_savevm(DeviceState *dev, const char 
> *idstr, void *opaque)
>  
>  QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) {
>  if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
> -QTAILQ_REMOVE(_state.handlers, se, entry);
> +savevm_state_handler_remove(se);
>  g_free(se->compat);
>  g_free(se);
>  }
> @@ -841,7 +846,7 @@ void vmstate_unregister(DeviceState *dev, const 
> VMStateDescription *vmsd,
>  
>  QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) {
>  if (se->vmsd == vmsd && se->opaque == opaque) {
> -QTAILQ_REMOVE(_state.handlers, se, entry);
> +savevm_state_handler_remove(se);
>  g_free(se->compat);
>  g_free(se);
>  }
> -- 
> 2.23.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH for-5.0 5/8] acpi: cpuhp: spec: clarify store into 'Command data' when 'Command field' == 0

2019-12-04 Thread Igor Mammedov
Write section of 'Command data' register should describe what happens
when it's written into. Correct description in case the last stored
'Command field' value equals to 0, to reflect that currently it's not
supported.

Signed-off-by: Igor Mammedov 
---
 docs/specs/acpi_cpu_hotplug.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index 19c508f..f3c552d 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -90,8 +90,7 @@ write access:
 other values: reserved
 [0x6-0x7] reserved
 [0x8] Command data: (DWORD access)
-  current 'Command field' value:
-  0: OSPM reads value of CPU selector
+  if last stored 'Command field' value:
   1: stores value into OST event register
   2: stores value into OST status register, triggers
  ACPI_DEVICE_OST QMP event from QEMU to external applications
-- 
2.7.4




Re: [PATCH v2 0/7] Enable Travis builds on arm64, ppc64le and s390x

2019-12-04 Thread Cleber Rosa
On Wed, Dec 04, 2019 at 04:46:11PM +0100, Thomas Huth wrote:
> Travis recently added build hosts for arm64, ppc64le and s390x, so
> this is a welcome addition to our Travis testing matrix.
> 
> Unfortunately, the builds are running in quite restricted LXD containers
> there, for example it is not possible to create huge files there (even
> if they are just sparse), and certain system calls are blocked. So we
> have to change some tests first to stop them failing in such environments.
>

Hi Thomas,

FIY, Avocado[1] has been running checks on those arches for a little
over two weeks and in my experience, there are still some reliability
issues (besides the other limitations you're already aware).

During the last week I've stopped seeing "machines" that wouldn't boot,
or severe networking limitations, but things are still not as smooth
as I'd like.

Anyway, I think we should insist on it, and give it a bit more time,
so I definitely agree with and appreciate this work.

[1] https://travis-ci.org/avocado-framework/avocado/builds

- Cleber.

> v2:
>  - Added "make check-tcg" and Alex' patch to disable cross-containers
>  - Explicitely set "dist: xenial" for arm64 and ppc64le since some
>iotests are crashing on bionic on these hosts.
>  - Dropped "libcap-dev" from the package list since it will be replaced
>by libcapng-dev soon.
> 
> Alex Bennée (1):
>   configure: allow disable of cross compilation containers
> 
> Thomas Huth (6):
>   iotests: Provide a function for checking the creation of huge files
>   iotests: Skip test 060 if it is not possible to create large files
>   iotests: Skip test 079 if it is not possible to create large files
>   tests/hd-geo-test: Skip test when images can not be created
>   tests/test-util-filemonitor: Skip test on non-x86 Travis containers
>   travis.yml: Enable builds on arm64, ppc64le and s390x
> 
>  .travis.yml   | 86 +++
>  configure |  8 +++-
>  tests/hd-geo-test.c   | 12 -
>  tests/qemu-iotests/005|  5 +-
>  tests/qemu-iotests/060|  3 ++
>  tests/qemu-iotests/079|  3 ++
>  tests/qemu-iotests/220|  6 +--
>  tests/qemu-iotests/common.rc  | 10 
>  tests/tcg/configure.sh|  6 ++-
>  tests/test-util-filemonitor.c | 11 +
>  10 files changed, 138 insertions(+), 12 deletions(-)
> 
> -- 
> 2.18.1
> 




Re: [PATCH 02/10] hw: arm: add Xunlong Orange Pi PC machine

2019-12-04 Thread Niek Linnenbank
On Wed, Dec 4, 2019 at 10:03 AM Philippe Mathieu-Daudé 
wrote:

> On 12/3/19 8:33 PM, Niek Linnenbank wrote:
> > Hello Philippe,
> >
> > Thanks for your quick review comments!
> > I'll start working on a v2 of the patches and include the changes you
> > suggested.
>
> Thanks, but I'd suggest to wait few more days to give time to others
> reviewers. Else having multiple versions of a big series reviewed at the
> same time is very confusing.
> I have other minor comments on others patches, but need to find the time
> to continue reviewing.
>
>
OK Philippe, I will follow your advise and wait a few more days before
submitting a new version.
I'll wait at least until you had a chance to review all the patches. I'm
new to the QEMU
community, so I will need to learn the process along the way.

Regards,
Niek





-- 
Niek Linnenbank


Re: [PATCH 01/10] hw: arm: add Allwinner H3 System-on-Chip

2019-12-04 Thread Niek Linnenbank
Hello Philippe,

On Wed, Dec 4, 2019 at 5:53 PM Philippe Mathieu-Daudé 
wrote:

> Hi Niek,
>
> On 12/2/19 10:09 PM, Niek Linnenbank wrote:
> > The Allwinner H3 is a System on Chip containing four ARM Cortex A7
> > processor cores. Features and specifications include DDR2/DDR3 memory,
> > SD/MMC storage cards, 10/100/1000Mbit ethernet, USB 2.0, HDMI and
> > various I/O modules. This commit adds support for the Allwinner H3
> > System on Chip.
> >
> > Signed-off-by: Niek Linnenbank 
> > ---
> >   MAINTAINERS |   7 ++
> >   default-configs/arm-softmmu.mak |   1 +
> >   hw/arm/Kconfig  |   8 ++
> >   hw/arm/Makefile.objs|   1 +
> >   hw/arm/allwinner-h3.c   | 215 
> >   include/hw/arm/allwinner-h3.h   | 118 ++
> >   6 files changed, 350 insertions(+)
> >   create mode 100644 hw/arm/allwinner-h3.c
> >   create mode 100644 include/hw/arm/allwinner-h3.h
>
> Since your series changes various files, can you have a look at the
> scripts/git.orderfile file and setup it for your QEMU contributions?
>

OK, done! I didn't know such a script existed, thanks.
I ran this command in my local repository:
 $ git config diff.orderFile scripts/git.orderfile
It seems to work, when I re-generate the patches, the order of the diff is
different.



> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 5e5e3e52d6..29c9936037 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -479,6 +479,13 @@ F: hw/*/allwinner*
> >   F: include/hw/*/allwinner*
> >   F: hw/arm/cubieboard.c
> >
> > +Allwinner-h3
> > +M: Niek Linnenbank 
> > +L: qemu-...@nongnu.org
> > +S: Maintained
> > +F: hw/*/allwinner-h3*
> > +F: include/hw/*/allwinner-h3*
> > +
> >   ARM PrimeCell and CMSDK devices
> >   M: Peter Maydell 
> >   L: qemu-...@nongnu.org
> > diff --git a/default-configs/arm-softmmu.mak
> b/default-configs/arm-softmmu.mak
> > index 1f2e0e7fde..d75a239c2c 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
> >   CONFIG_FSL_IMX7=y
> >   CONFIG_FSL_IMX6UL=y
> >   CONFIG_SEMIHOSTING=y
> > +CONFIG_ALLWINNER_H3=y
> > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> > index c6e7782580..ebf8d2325f 100644
> > --- a/hw/arm/Kconfig
> > +++ b/hw/arm/Kconfig
> > @@ -291,6 +291,14 @@ config ALLWINNER_A10
> >   select SERIAL
> >   select UNIMP
> >
> > +config ALLWINNER_H3
> > +bool
> > +select ALLWINNER_A10_PIT
> > +select SERIAL
> > +select ARM_TIMER
> > +select ARM_GIC
> > +select UNIMP
> > +
> >   config RASPI
> >   bool
> >   select FRAMEBUFFER
> > diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
> > index fe749f65fd..956e496052 100644
> > --- a/hw/arm/Makefile.objs
> > +++ b/hw/arm/Makefile.objs
> > @@ -34,6 +34,7 @@ obj-$(CONFIG_DIGIC) += digic.o
> >   obj-$(CONFIG_OMAP) += omap1.o omap2.o
> >   obj-$(CONFIG_STRONGARM) += strongarm.o
> >   obj-$(CONFIG_ALLWINNER_A10) += allwinner-a10.o cubieboard.o
> > +obj-$(CONFIG_ALLWINNER_H3) += allwinner-h3.o
> >   obj-$(CONFIG_RASPI) += bcm2835_peripherals.o bcm2836.o raspi.o
> >   obj-$(CONFIG_STM32F205_SOC) += stm32f205_soc.o
> >   obj-$(CONFIG_XLNX_ZYNQMP_ARM) += xlnx-zynqmp.o xlnx-zcu102.o
> > diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
> > new file mode 100644
> > index 00..470fdfebef
> > --- /dev/null
> > +++ b/hw/arm/allwinner-h3.c
> > @@ -0,0 +1,215 @@
> > +/*
> > + * Allwinner H3 System on Chip emulation
> > + *
> > + * Copyright (C) 2019 Niek Linnenbank 
> > + *
> > + * This program is free software: you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation, either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see  >.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "exec/address-spaces.h"
> > +#include "qapi/error.h"
> > +#include "qemu/module.h"
> > +#include "qemu/units.h"
> > +#include "cpu.h"
> > +#include "hw/sysbus.h"
> > +#include "hw/arm/allwinner-h3.h"
> > +#include "hw/misc/unimp.h"
> > +#include "sysemu/sysemu.h"
> > +
> > +static void aw_h3_init(Object *obj)
> > +{
> > +AwH3State *s = AW_H3(obj);
> > +
> > +sysbus_init_child_obj(obj, "gic", >gic, sizeof(s->gic),
> > +  TYPE_ARM_GIC);
> > +
> > +sysbus_init_child_obj(obj, "timer", >timer, sizeof(s->timer),
> > +  TYPE_AW_A10_PIT);
> > +}
> > +
> > +static void 

[PATCH-for-5.0] roms/edk2-funcs.sh: Use available GCC for ARM/Aarch64 targets

2019-12-04 Thread Philippe Mathieu-Daudé
Centos 7.7 only provides cross GCC 4.8.5, but the script forces
us to use GCC5. Since the same machinery is valid to check the
GCC version, remove the $emulation_target check.

  $ cat /etc/redhat-release
  CentOS Linux release 7.7.1908 (Core)

  $ aarch64-linux-gnu-gcc -v 2>&1 | tail -1
  gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)

Signed-off-by: Philippe Mathieu-Daudé 
---
Patch to review with --ignore-all-space
---
 roms/edk2-funcs.sh | 48 +++---
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/roms/edk2-funcs.sh b/roms/edk2-funcs.sh
index 3f4485b201..a455611c0d 100644
--- a/roms/edk2-funcs.sh
+++ b/roms/edk2-funcs.sh
@@ -135,35 +135,27 @@ qemu_edk2_get_toolchain()
 return 1
   fi
 
-  case "$emulation_target" in
-(arm|aarch64)
-  printf 'GCC5\n'
+  if ! cross_prefix=$(qemu_edk2_get_cross_prefix "$emulation_target"); then
+return 1
+  fi
+
+  gcc_version=$("${cross_prefix}gcc" -v 2>&1 | tail -1 | awk '{print $3}')
+  # Run "git-blame" on "OvmfPkg/build.sh" in edk2 for more information on
+  # the mapping below.
+  case "$gcc_version" in
+([1-3].*|4.[0-7].*)
+  printf '%s: unsupported gcc version "%s"\n' \
+"$program_name" "$gcc_version" >&2
+  return 1
   ;;
-
-(i386|x86_64)
-  if ! cross_prefix=$(qemu_edk2_get_cross_prefix "$emulation_target"); then
-return 1
-  fi
-
-  gcc_version=$("${cross_prefix}gcc" -v 2>&1 | tail -1 | awk '{print $3}')
-  # Run "git-blame" on "OvmfPkg/build.sh" in edk2 for more information on
-  # the mapping below.
-  case "$gcc_version" in
-([1-3].*|4.[0-7].*)
-  printf '%s: unsupported gcc version "%s"\n' \
-"$program_name" "$gcc_version" >&2
-  return 1
-  ;;
-(4.8.*)
-  printf 'GCC48\n'
-  ;;
-(4.9.*|6.[0-2].*)
-  printf 'GCC49\n'
-  ;;
-(*)
-  printf 'GCC5\n'
-  ;;
-  esac
+(4.8.*)
+  printf 'GCC48\n'
+  ;;
+(4.9.*|6.[0-2].*)
+  printf 'GCC49\n'
+  ;;
+(*)
+  printf 'GCC5\n'
   ;;
   esac
 }
-- 
2.21.0




Re: [PATCH 04/10] arm: allwinner-h3: add USB host controller

2019-12-04 Thread Niek Linnenbank
On Wed, Dec 4, 2019 at 5:11 PM Aleksandar Markovic <
aleksandar.m.m...@gmail.com> wrote:

>
>
> On Monday, December 2, 2019, Niek Linnenbank 
> wrote:
>
>> The Allwinner H3 System on Chip contains multiple USB 2.0 bus
>> connections which provide software access using the Enhanced
>> Host Controller Interface (EHCI) and Open Host Controller
>> Interface (OHCI) interfaces. This commit adds support for
>> both interfaces in the Allwinner H3 System on Chip.
>>
>> Signed-off-by: Niek Linnenbank 
>> ---
>
>
> Niek, hi!
>
> I would like to clarify a detail here:
>
> The spec of the SoC enumerates (in 8.5.2.4. USB Host Register List) a
> number of registers for reading various USB-related states, but also for
> setting some of USB features.
>
> Does this series cover these registers, and interaction with them? If yes,
> how and where? If not, do you think it is not necessary at all? Or perhaps
> that it is a non-crucial limitation of this series?
>

Hello Aleksandar!

Very good question, I will try to explain what I did to support USB for the
Allwinner H3 emulation.
EHCI and OHCI are both standardized interfaces to the USB bus and both
provide their own standardized software interface.
Because they are standards, operatings system drivers can implement a
generic driver which uses the defined interface and
re-use it in multiple boards/platforms. Things that can be different
between boards are, for example the base address in
memory where the registers are provided.

In QEMU I found that both the OHCI and EHCI host controllers are already
emulated and used by other boards as well. For example,
you can find the OHCI registers from 8.5.2.4 implemented in the file
hw/usb/hcd-ohci.c:1515 in ohci_mem_read(). So for the Allwinner
H3 I simply had to define the base address for both controllers and create
the objects. At that point, the Linux kernel can access
the USB bus with the generic EHCI/OHCI platform drivers. In the Linux code,
you can see in the file ./arch/arm/boot/dts/sunxi-h3-h5.dtsi:281
the definitions named ehci0-ehci3 and ohci0-ohci3 where it specifies in the
device tree configuration to load the generic drivers.


>
> Thanks in advance, and congrats for your, it seems, first submission!
>
>
Thank you Aleksandar! Indeed, it is my first submission. I will do my best
to
update the patches to comply with the QEMU coding style and best practises.

Regards,
Niek


> Aleksandar
>
>
>  hw/arm/allwinner-h3.c| 20 
>>  hw/usb/hcd-ehci-sysbus.c | 17 +
>>  hw/usb/hcd-ehci.h|  1 +
>>  3 files changed, 38 insertions(+)
>>
>> diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
>> index 5566e979ec..afeb49c0ac 100644
>> --- a/hw/arm/allwinner-h3.c
>> +++ b/hw/arm/allwinner-h3.c
>> @@ -26,6 +26,7 @@
>>  #include "hw/sysbus.h"
>>  #include "hw/arm/allwinner-h3.h"
>>  #include "hw/misc/unimp.h"
>> +#include "hw/usb/hcd-ehci.h"
>>  #include "sysemu/sysemu.h"
>>
>>  static void aw_h3_init(Object *obj)
>> @@ -183,6 +184,25 @@ static void aw_h3_realize(DeviceState *dev, Error
>> **errp)
>>  }
>>  sysbus_mmio_map(SYS_BUS_DEVICE(>ccu), 0, AW_H3_CCU_BASE);
>>
>> +/* Universal Serial Bus */
>> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI0_BASE,
>> + s->irq[AW_H3_GIC_SPI_EHCI0]);
>> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI1_BASE,
>> + s->irq[AW_H3_GIC_SPI_EHCI1]);
>> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI2_BASE,
>> + s->irq[AW_H3_GIC_SPI_EHCI2]);
>> +sysbus_create_simple(TYPE_AW_H3_EHCI, AW_H3_EHCI3_BASE,
>> + s->irq[AW_H3_GIC_SPI_EHCI3]);
>> +
>> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI0_BASE,
>> + s->irq[AW_H3_GIC_SPI_OHCI0]);
>> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI1_BASE,
>> + s->irq[AW_H3_GIC_SPI_OHCI1]);
>> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI2_BASE,
>> + s->irq[AW_H3_GIC_SPI_OHCI2]);
>> +sysbus_create_simple("sysbus-ohci", AW_H3_OHCI3_BASE,
>> + s->irq[AW_H3_GIC_SPI_OHCI3]);
>> +
>>  /* UART */
>>  if (serial_hd(0)) {
>>  serial_mm_init(get_system_memory(), AW_H3_UART0_REG_BASE, 2,
>> diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c
>> index 020211fd10..174c3446ef 100644
>> --- a/hw/usb/hcd-ehci-sysbus.c
>> +++ b/hw/usb/hcd-ehci-sysbus.c
>> @@ -145,6 +145,22 @@ static const TypeInfo ehci_exynos4210_type_info = {
>>  .class_init= ehci_exynos4210_class_init,
>>  };
>>
>> +static void ehci_aw_h3_class_init(ObjectClass *oc, void *data)
>> +{
>> +SysBusEHCIClass *sec = SYS_BUS_EHCI_CLASS(oc);
>> +DeviceClass *dc = DEVICE_CLASS(oc);
>> +
>> +sec->capsbase = 0x0;
>> +sec->opregbase = 0x10;
>> +set_bit(DEVICE_CATEGORY_USB, dc->categories);
>> +}
>> +
>> +static const TypeInfo ehci_aw_h3_type_info = {
>> +.name  = TYPE_AW_H3_EHCI,

Re: [PATCH v6 0/9] Clock framework API

2019-12-04 Thread Philippe Mathieu-Daudé

On 12/4/19 5:40 PM, Damien Hedde wrote:

On 12/2/19 5:15 PM, Peter Maydell wrote:


The one topic I think we could do with discussing is whether
a simple uint64_t giving the frequency of the clock in Hz is
the right representation. In particular in your patch 9 the
board has a clock frequency that's not a nice integer number
of Hz. I think Philippe also mentioned on irc some board where
the UART clock ends up at a weird frequency. Since the
representation of the frequency is baked into the migration
format it's going to be easier to get it right first rather
than trying to change it later.


Important precision for Damien, IIUC we can not migrate float/double types.


So what should the representation be? Some random thoughts:

1) ptimer internally uses a 'period plus fraction' representation:
  int64_t period is the integer part of the period in nanoseconds,
  uint32_t period_frac is the fractional part of the period
(if you like you can think of this as "96-bit integer
period measured in units of one-2^32nd of a nanosecond").
However its only public interfaces for setting the frequency
are (a) set the frequency in Hz (uint32_t) or (b) set
the period in nanoseconds (int64_t); the period_frac part
is used to handle frequencies which don't work out to
a nice whole number of nanoseconds per cycle.


This is very clear, thanks Peter!

The period+period_frac split allow us to migrate the 96 bits:

VMSTATE_UINT32(period_frac, ptimer_state),
VMSTATE_INT64(period, ptimer_state),


2) I hear that SystemC uses "value plus a time unit", with
the smallest unit being a picosecond. (I think SystemC
also lets you specify the duty cycle, but we definitely
don't want to get into that!)


The "value" is internally stored in a 64bits unsigned integer.



3) QEMUTimers are basically just nanosecond timers


Similarly to SystemC, the QEMUTimers macro use a 'scale' unit, of:

#define SCALE_MS 100
#define SCALE_US 1000
#define SCALE_NS 1



4) The MAME emulator seems to work with periods of
96-bit attoseconds (represented internally by a
32-bit count of seconds plus a 64-bit count of
attoseconds). One attosecond is 1e-18 seconds.

Does anybody else have experience with other modelling
or emulator technology and how it represents clocks ?


5) In linux, a clock rate is an "unsigned long" representing Hz.



I feel we should at least be able to represent clocks
with the same accuracy that ptimer has.


Then is a maybe a good idea to store the period and not the frequency in
clocks so that we don't loose anything when we switch from a clock to a
ptimer ?


I think storing the period as an integer type is a good idea.

However if we store the period in nanoseconds, we get at most 1GHz 
frequency.


The attosecond granularity feels overkill.

If we use a 96-bit integer to store picoseconds and use similar SCALE 
macros we get to 1THz.


Regardless the unit chosen, as long it is integer, we can migrate it.
If can migrate the period, we don't need to migrate the frequency.
We can then use the float type in with the timer API to pass frequencies 
(which in the modeled hardware are ratios, likely not integers).


So we could use set_freq(100e6 / 3), set_freq(40e6 / 5.5) directly.


Regarding the clock, I don't see any strong obstacle to switch
internally to a period based value.
The only things we have to choose is how to represent a disabled clock.
Since putting a "0" period to a ptimer will disable the timer in
ptimer_reload(). We can choose that (and it's a good value because we
can multiply or divide it, it stays the same).

We could use the same representation as a ptimer. But if we don't keep a
C number representation, then computation of frequencies/periods will be
complicated at best and error prone.

 From that point of view, if we could stick to a 64bits integer (or
floating point number) it would be great. Can we use a sub nanosecond
unit that fit our needs ?

I did some test with a unit of 2^-32 of nanoseconds on 64bits (is that
the unit of the ptimer fractional part ?) and if I'm not mistaken
+ we have a frequency range from ~0.2Hz up to 10^18Hz
+ the resolution is decreasing with the frequency (but at 100Mhz we have
a ~2.3mHz resolution, at 1GHz it's ~0.23Hz and at 10GHz ~23Hz
resolution). We hit 1Hz resolution around 2GHz.

So it sounds to me we have largely enough resolution to model clocks in
the range of frequencies we will have to handle. What do you think ?


Back to your series, I wonder why you want to store the frequency in 
ClockIn. ClockIn shouldn't be aware at what frequency it is clocked. 
What matters is ClockOut, and each device exposing ClockOuts has a 
(migrated) state of the output frequencies (rather in fields, or encoded 
in registers). Once migrated, after the state is loaded back into the 
device, we call post_load(). Isn't it a good place to call 
clock_set_frequency(ClockOut[]) which will correctly set each ClockIn 
frequency.


IOW I don't think ClockIn/ClockOut require to 

[PATCH-for-5.0] hw/alpha/dp264: Use the DECchip Tulip network interface

2019-12-04 Thread Philippe Mathieu-Daudé
Commit 34ea023d4b9 introduced the Tulip PCI NIC.
Since this better models the DP264 hardware, use it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/alpha/dp264.c | 4 ++--
 hw/alpha/Kconfig | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index 51b3cf7a61..4424551ba1 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -85,9 +85,9 @@ static void clipper_init(MachineState *machine)
 /* VGA setup.  Don't bother loading the bios.  */
 pci_vga_init(pci_bus);
 
-/* Network setup.  e1000 is good enough, failing Tulip support.  */
+/* Network setup */
 for (i = 0; i < nb_nics; i++) {
-pci_nic_init_nofail(_table[i], pci_bus, "e1000", NULL);
+pci_nic_init_nofail(_table[i], pci_bus, "tulip", NULL);
 }
 
 /* 2 82C37 (dma) */
diff --git a/hw/alpha/Kconfig b/hw/alpha/Kconfig
index 15c59ff264..552e6a4c23 100644
--- a/hw/alpha/Kconfig
+++ b/hw/alpha/Kconfig
@@ -2,7 +2,7 @@ config DP264
 bool
 imply PCI_DEVICES
 imply TEST_DEVICES
-imply E1000_PCI
+imply TULIP
 select I82374
 select I8254
 select I8259
-- 
2.21.0




[for-5.0 PATCH 4/4] ppc: Ignore the CPU_INTERRUPT_EXITTB interrupt with KVM

2019-12-04 Thread Greg Kurz
This only makes sense with an emulated CPU. Don't set the bit in
CPUState::interrupt_request when using KVM to avoid confusions.

Signed-off-by: Greg Kurz 
---
 target/ppc/helper_regs.h |5 +
 1 file changed, 5 insertions(+)

diff --git a/target/ppc/helper_regs.h b/target/ppc/helper_regs.h
index 85dfe7687fbb..d78c2af63eac 100644
--- a/target/ppc/helper_regs.h
+++ b/target/ppc/helper_regs.h
@@ -22,6 +22,7 @@
 
 #include "qemu/main-loop.h"
 #include "exec/exec-all.h"
+#include "sysemu/kvm.h"
 
 /* Swap temporary saved registers with GPRs */
 static inline void hreg_swap_gpr_tgpr(CPUPPCState *env)
@@ -102,6 +103,10 @@ static inline void hreg_compute_hflags(CPUPPCState *env)
 
 static inline void cpu_interrupt_exittb(CPUState *cs)
 {
+if (!kvm_enabled()) {
+return;
+}
+
 if (!qemu_mutex_iothread_locked()) {
 qemu_mutex_lock_iothread();
 cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);




[for-5.0 PATCH 3/4] ppc: Don't use CPUPPCState::irq_input_state with modern Book3s CPU models

2019-12-04 Thread Greg Kurz
The power7_set_irq() and power9_set_irq() functions set this but it is
never used actually. Modern Book3s compatible CPUs are only supported
by the pnv and spapr machines. They have an interrupt controller, XICS
for POWER7/8 and XIVE for POWER9, whose models don't require to track
IRQ input states at the CPU level.

Drop these lines to avoid confusion.

Signed-off-by: Greg Kurz 
---
 hw/ppc/ppc.c |   16 ++--
 target/ppc/cpu.h |4 +++-
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index fab73f1b1fc9..45834f98d176 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -275,10 +275,9 @@ void ppc970_irq_init(PowerPCCPU *cpu)
 static void power7_set_irq(void *opaque, int pin, int level)
 {
 PowerPCCPU *cpu = opaque;
-CPUPPCState *env = >env;
 
 LOG_IRQ("%s: env %p pin %d level %d\n", __func__,
-env, pin, level);
+>env, pin, level);
 
 switch (pin) {
 case POWER7_INPUT_INT:
@@ -292,11 +291,6 @@ static void power7_set_irq(void *opaque, int pin, int 
level)
 LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
 return;
 }
-if (level) {
-env->irq_input_state |= 1 << pin;
-} else {
-env->irq_input_state &= ~(1 << pin);
-}
 }
 
 void ppcPOWER7_irq_init(PowerPCCPU *cpu)
@@ -311,10 +305,9 @@ void ppcPOWER7_irq_init(PowerPCCPU *cpu)
 static void power9_set_irq(void *opaque, int pin, int level)
 {
 PowerPCCPU *cpu = opaque;
-CPUPPCState *env = >env;
 
 LOG_IRQ("%s: env %p pin %d level %d\n", __func__,
-env, pin, level);
+>env, pin, level);
 
 switch (pin) {
 case POWER9_INPUT_INT:
@@ -334,11 +327,6 @@ static void power9_set_irq(void *opaque, int pin, int 
level)
 LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
 return;
 }
-if (level) {
-env->irq_input_state |= 1 << pin;
-} else {
-env->irq_input_state &= ~(1 << pin);
-}
 }
 
 void ppcPOWER9_irq_init(PowerPCCPU *cpu)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index e3e82327b723..f9528fc29d98 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1090,7 +1090,9 @@ struct CPUPPCState {
 #if !defined(CONFIG_USER_ONLY)
 /*
  * This is the IRQ controller, which is implementation dependent
- * and only relevant when emulating a complete machine.
+ * and only relevant when emulating a complete machine. Note that
+ * this isn't used by recent Book3s compatible CPUs (POWER7 and
+ * newer).
  */
 uint32_t irq_input_state;
 void **irq_inputs;




Re: [PATCH v4 26/40] target/arm: Update define_one_arm_cp_reg_with_opaque for VHE

2019-12-04 Thread Alex Bennée


Richard Henderson  writes:

> On 12/4/19 10:58 AM, Alex Bennée wrote:
>>> @@ -7437,13 +7437,10 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
>>>  mask = PL0_RW;
>>>  break;
>>>  case 4:
>>> +case 5:
>>>  /* min_EL EL2 */
>>>  mask = PL2_RW;
>>>  break;
>>> -case 5:
>>> -/* unallocated encoding, so not possible */
>>> -assert(false);
>>> -break;
>> 
>> This change is fine - I don't think we should have asserted here anyway.
>> But don't we generate an unallocated exception if the CPU is v8.0?
>
> This change is only for validation of the system registers themselves.  It has
> nothing to do with the usage of system registers from the actual guest.

So what is the mechanism that feeds back to the translator?
access_check_cp_reg only seems to care about XSCALE. I guess
cp_access_ok would trip if you weren't at EL2 but what if you are a v8.0
at EL2?

-- 
Alex Bennée



Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-12-04 Thread David Gibson
On Wed, Dec 04, 2019 at 04:49:15PM +, Dr. David Alan Gilbert wrote:
> * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> > > * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  
> > > > > > >> Priority
> > > > > > >> sorting is maintained by searching from head to tail for a 
> > > > > > >> suitable
> > > > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > > > >>
> > > > > > >> If we instead keep track of the head of each priority's subqueue
> > > > > > >> within that larger queue we can reduce this operation to O(1) 
> > > > > > >> time.
> > > > > > >>
> > > > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > > > >> accomodate these gains: we need to replace the head of a 
> > > > > > >> priority's
> > > > > > >> subqueue when removing it.
> > > > > > >>
> > > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry 
> > > > > > >> objects is
> > > > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 
> > > > > > >> 4 such
> > > > > > >> objects to insert.
> > > > > > > 
> > > > > > > Separate from reviewing this patch, I'd like to understand why 
> > > > > > > you've
> > > > > > > got 4 objects.  This feels very very wrong and is likely to 
> > > > > > > cause
> > > > > > > problems to random other bits of qemu as well.
> > > > > > 
> > > > > > I think the 4 objects are the "dr-connectors" that are used to 
> > > > > > plug
> > > > > > peripherals (memory, pci card, cpus, ...).
> > > > > 
> > > > > Yes, Scott confirmed that in the reply to the previous version.
> > > > > IMHO nothing in qemu is designed to deal with that many 
> > > > > devices/objects
> > > > > - I'm sure that something other than the migration code is going to
> > > > > get upset.
> > > > 
> > > > It kind of did.  Particularly when there was n^2 and n^3 cubed
> > > > behaviour in the property stuff we had some ludicrously long startup
> > > > times (hours) with large maxmem values.
> > > > 
> > > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > > > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > > > chunk of memory (or possible memory).
> > > > 
> > > > > Is perhaps the structure wrong somewhere - should there be a single 
> > > > > DRC
> > > > > device that knows about all DRCs?
> > > > 
> > > > Maybe.  The tricky bit is how to get there from here without breaking
> > > > migration or something else along the way.
> > > 
> > > Switch on the next machine type version - it doesn't matter if migration
> > > is incompatible then.
> > 
> > 1mo bump.
> > 
> > Is there anything I need to do with this patch in particular to make it 
> > suitable
> > for merging?
> 
> Apologies for the delay;  hopefully this will go in one of the pulls
> just after the tree opens again.
> 
> Please please try and work on reducing the number of objects somehow -
> while this migration fix is a useful short term fix, and not too
> invasive; having that many objects around qemu is a really really bad
> idea so needs fixing properly.

I'm hoping to have a crack at this tomorrow.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4 26/40] target/arm: Update define_one_arm_cp_reg_with_opaque for VHE

2019-12-04 Thread Richard Henderson
On 12/4/19 10:58 AM, Alex Bennée wrote:
>> @@ -7437,13 +7437,10 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
>>  mask = PL0_RW;
>>  break;
>>  case 4:
>> +case 5:
>>  /* min_EL EL2 */
>>  mask = PL2_RW;
>>  break;
>> -case 5:
>> -/* unallocated encoding, so not possible */
>> -assert(false);
>> -break;
> 
> This change is fine - I don't think we should have asserted here anyway.
> But don't we generate an unallocated exception if the CPU is v8.0?

This change is only for validation of the system registers themselves.  It has
nothing to do with the usage of system registers from the actual guest.


r~




<    1   2   3   >