Re: [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices

2021-06-06 Thread Vladimir Sementsov-Ogievskiy

03.06.2021 16:37, Paolo Bonzini wrote:

Even though it was only called for devices that have bs->sg set (which
must be character devices),
sg_get_max_segments looked at /sys/dev/block which only works for
block devices.

On Linux the sg driver has its own way to provide the maximum number of
iovecs in a scatter/gather list.

Signed-off-by: Paolo Bonzini 
---
  block/file-posix.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index f37dfc10b3..58db526cc2 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1180,6 +1180,17 @@ static int sg_get_max_segments(int fd)
  goto out;
  }
  
+if (S_ISCHR(st->st_mode)) {

+if (ioctl(fd, SG_GET_SG_TABLESIZE, ) == 0) {
+return ret;


So, this is a new most-probable path for char-dev, yes?

Is it better do it at start of the function to avoid extra fstat() ?


+}
+return -EIO;
+}
+
+if (!S_ISBLK(st->st_mode)) {
+return -ENOTSUP;
+}
+
  sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
  major(st.st_rdev), minor(st.st_rdev));
  sysfd = open(sysfspath, O_RDONLY);




--
Best regards,
Vladimir



Re: [RFC PATCH 00/11] hw/nvme: reimplement all multi-aio commands with custom aiocbs

2021-06-06 Thread Vladimir Sementsov-Ogievskiy

04.06.2021 09:52, Klaus Jensen wrote:

From: Klaus Jensen 

This series reimplements flush, dsm, copy, zone reset and format nvm to
allow cancellation. I posted an RFC back in March ("hw/block/nvme:
convert ad-hoc aio tracking to aiocb") and I've applied some feedback
from Stefan and reimplemented the remaining commands.

The basic idea is to define custom AIOCBs for these commands. The custom
AIOCB takes care of issuing all the "nested" AIOs one by one instead of
blindly sending them off simultaneously without tracking the returned
aiocbs.


I'm not familiar with nvme. But intuitively, isn't it less efficient to send 
mutltiple commands one-by-one? Overall, wouldn't it be slower? In block layer 
we mostly do opposite transition: instead of doing IO operations one-by-one, 
run them simultaneously to make a non-empty queue on a device.. Even on one 
device. This way overall performance is increased.

If you need to store nested AIOCBs, you may store them in a list for example, 
and cancel in a loop, keeping simultaneous start for all flushes.. If you send 
two flushes on two different disks, what's the reason to wait for first flush 
finish before issuing the second?



I've kept the RFC since I'm still new to using the block layer like
this. I was hoping that Stefan could find some time to look over this -
this is a huge series, so I don't expect non-nvme folks to spend a large
amount of time on it, but I would really like feedback on my approach in
the reimplementation of flush and format.


If I understand your code correctly, you do stat next io operation from 
call-back of a previous. It works, and it is similar to haw mirror block-job 
was operating some time ago (still it maintained several in-flight requests 
simultaneously, but I'm about using _aio_ functions). Still, now mirror doesn't 
use _aio_ functions like this.

Better approach to call several io functions of block layer one-by-one is 
creating a coroutine. You may just add a coroutine function, that does all your 
linear logic as you want, without any callbacks like:

nvme_co_flush()
{
   for (...) {
  blk_co_flush();
   }
}

(and you'll need qemu_coroutine_create() and qemu_coroutine_enter() to start a 
coroutine).

Still, I'm not sure that moving from simultaneous issuing several IO commands 
to sequential is good idea..
And this way you of course can't use blk_aio_canel.. This leads to my last 
doubt:

One more thing I don't understand after fast look at the series: how cancelation 
works? It seems to me, that you just call cancel on nested AIOCBs, produced by 
blk_, but no one of them implement cancel.. I see only four 
implementations of .cancel_async callback in the whole Qemu code: in iscsi, in 
ide/core.c, in dma-helpers.c and in thread-pool.c.. Seems no one is related to 
blk_aio_flush() and other blk_* functions you call in the series. Or, what I miss?



Those commands are special in
that may issue AIOs to multiple namespaces and thus, to multiple block
backends. Since this device does not support iothreads, I've opted for
simply always returning the main loop aio context, but I wonder if this
is acceptable or not. It might be the case that this should contain an
assert of some kind, in case someone starts adding iothread support.

Klaus Jensen (11):
   hw/nvme: reimplement flush to allow cancellation
   hw/nvme: add nvme_block_status_all helper
   hw/nvme: reimplement dsm to allow cancellation
   hw/nvme: save reftag when generating pi
   hw/nvme: remove assert from nvme_get_zone_by_slba
   hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_check
   hw/nvme: add dw0/1 to the req completion trace event
   hw/nvme: reimplement the copy command to allow aio cancellation
   hw/nvme: reimplement zone reset to allow cancellation
   hw/nvme: reimplement format nvm to allow cancellation
   Partially revert "hw/block/nvme: drain namespaces on sq deletion"

  hw/nvme/nvme.h   |   10 +-
  include/block/nvme.h |8 +
  hw/nvme/ctrl.c   | 1861 --
  hw/nvme/dif.c|   64 +-
  hw/nvme/trace-events |   21 +-
  5 files changed, 1102 insertions(+), 862 deletions(-)




--
Best regards,
Vladimir



Re: [RFC PATCH 5/5] target/ppc: powerpc_excp: Move interrupt raising code to QOM

2021-06-06 Thread David Gibson
On Tue, Jun 01, 2021 at 06:46:49PM -0300, Fabiano Rosas wrote:
> This patch introduces a new way to dispatch the emulated interrupts in
> powerpc_excp. It leverages the QEMU object model to store the
> implementations for each interrupt and link them to their identifier
> from POWERPC_EXCP enum. The processor-specific code then uses this
> identifier to register which interrupts it supports.
> 
> Interrupts now come out of the big switch in powerpc_excp into their
> own functions:
> 
>   static void ppc_intr_system_reset()
>   {
>   /*
>* Interrupt code. Sets any specific registers and MSR bits.
>*/
>   }
>   PPC_DEFINE_INTR(POWERPC_EXCP_RESET, system_reset, "System reset");
> 
>   ^This line registers the interrupt with QOM.
> 
> When we initialize the emulated processor, the correct set of
> interrupts is instantiated (pretty much like we already do):
> 
>   static void init_excp_POWER9(CPUPPCState *env)
>   {
>   ppc_intr_add(env, 0x0100, POWERPC_EXCP_RESET);
>   (...)
>   }
> 
> When it comes the time to inject the interrupt:
> 
>   static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>   {
>   (...)
> 
>   intr = >entry_points[excp];
>   intr->setup_regs();<-- ppc_intr_system_reset function
> 
>   (...)
>   env->spr[srr0] = nip;
>   env->spr[srr1] = msr;
> 
>   env->nip = intr->addr;
>   env->msr = new_msr;
>   }
> 
> Some points to notice:
> 
> - The structure for the new PPCInterrupt class object is stored
>   directly inside of CPUPPCState (env) so the translation code can
>   still access it linearly at an offset.
> 
> - Some interrupts were being registered for P7/8/9/10 but were never
>   implemented (i.e. not in the powerpc_excp switch statement). They
>   are likely never triggered. We now get the benefit of QOM warning in
>   such cases:
> 
>   qemu-system-ppc64: missing object type 'POWERPC_EXCP_SDOOR'
>   qemu-system-ppc64: missing object type 'POWERPC_EXCP_HV_MAINT'
> 
> - The code currently allows for Program interrupts to be ignored and
>   System call interrupts to be directed to the vhyp hypercall code. I
>   have added an 'ignore' flag to deal with these two cases and return
>   early from powerpc_excp.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  target/ppc/cpu.h |  29 +-
>  target/ppc/cpu_init.c| 640 +++
>  target/ppc/excp_helper.c | 545 ++---
>  target/ppc/interrupts.c  | 638 ++
>  target/ppc/machine.c |   2 +-
>  target/ppc/meson.build   |   1 +
>  target/ppc/ppc_intr.h|  55 
>  target/ppc/translate.c   |   3 +-
>  8 files changed, 1066 insertions(+), 847 deletions(-)
>  create mode 100644 target/ppc/interrupts.c
>  create mode 100644 target/ppc/ppc_intr.h
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b0934d9be4..012677965f 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -174,6 +174,33 @@ enum {
>  POWERPC_EXCP_TRAP  = 0x40,
>  };
>  
> +typedef struct PPCInterrupt PPCInterrupt;
> +typedef struct ppc_intr_args ppc_intr_args;
> +typedef void (*ppc_intr_fn_t)(PowerPCCPU *cpu, PPCInterrupt *intr,
> +  int excp_model, ppc_intr_args *regs,
> +  bool *ignore);

Hmm.  Using this signature kind of enforces that we dispatch based on
which exception *then* then the exception model.  I think that's
backwards: since what vectors exist and make sense depends on the
exception model, I think we should ideally be splitting on model
first, then exception type.

Now, a lot of the existing code is exception-then-model and changing
that is a long term project, but I don't think we should lock
ourselves further into doing it the backwards way.

> +
> +struct ppc_intr_args {
> +target_ulong nip;
> +target_ulong msr;
> +target_ulong new_nip;
> +target_ulong new_msr;
> +int sprn_srr0;
> +int sprn_srr1;
> +int sprn_asrr0;
> +int sprn_asrr1;
> +int lev;
> +};
> +
> +struct PPCInterrupt {

Having an info/dispatch structure for each vector makes sense..

> +Object parent;

..but making it a QOM object really seems like overkill.  In fact
making it a QOM object at least somewhat exposes the internal
structure to the user via QMP, which I really don't think we want to
do.

> +
> +int id;
> +const char *name;
> +target_ulong addr;
> +ppc_intr_fn_t setup_regs;
> +};
> +
>  #define PPC_INPUT(env) ((env)->bus_model)
>  
>  
> /*/
> @@ -1115,7 +1142,7 @@ struct CPUPPCState {
>  uint32_t irq_input_state;
>  void **irq_inputs;
>  
> -target_ulong excp_vectors[POWERPC_EXCP_NB]; /* Exception vectors */
> +PPCInterrupt entry_points[POWERPC_EXCP_NB];
>  target_ulong excp_prefix;
>  target_ulong ivor_mask;
>  target_ulong ivpr_mask;
> diff --git 

Re: [PATCH 02/11] target/ppc: moved ppc_store_sdr1 to cpu.c

2021-06-06 Thread David Gibson
On Mon, May 31, 2021 at 03:17:47PM -0300, Bruno Piazera Larsen wrote:
> 
> On 13/05/2021 00:50, David Gibson wrote:
> > On Wed, May 12, 2021 at 11:08:04AM -0300, Bruno Larsen (billionai) wrote:
> > > Moved this function that is required in !TCG cases into a
> > > common code file
> > The reasons it's needed by !TCG are kind of bogus, related to
> > weirdness in the way KVM PR works.  But it's fair not to care about
> > that right now, so, applied to ppc-for-6.1.
> Now that the future is here, I was looking into why might the reasons be
> bogus. From what I can see, what should be happening is just storing what
> was retrieved by the kvm ioctl, right? Am I missing something?

Actually, I was mixing this up with something else.  We invoke
ppc_store_sdr1() in several places from !TCG code.  From explicitly
KVM code in kvmppc_book_get_sregs() - since we more or less trust KVM
we could in theory just store directly to the value.  However we also
use it in common code in machine.c in the loadvm path.  I don't want
to bypass ppc_store_sdr1() there so that we have a common place for
all SDR1 updates, for sanity checking and any secondary alterations we
need.

So having this in common code is the right thing to do.

(In case you care, the "bogus reason" thing I was thinking of in
connection to SDR1 handling is actually the special
encode_hpt_for_kvm_pr() case in kvmppc_but_books_sregs.  That's there
because KVM PR requires us to inform us of the *qemu userspace*
location of the hash table for PAPR guests via the SDR1 value, which
doesn't really make sense)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC PATCH 4/5] target/ppc: powerpc_excp: Standardize arguments to interrupt code

2021-06-06 Thread David Gibson
On Tue, Jun 01, 2021 at 06:46:48PM -0300, Fabiano Rosas wrote:
> The next patches will split the big switch statement in powerpc_excp
> into individual functions so it would be cleaner if all variables are
> already grouped in a structure and their names consistent.
> 
> This patch makes it so that the old values for MSR and NIP (from env)
> are saved at the beginning as regs.msr and regs.nip and all
> modifications are done over this regs version. At the end of the
> function regs.msr and regs.nip are saved in the SRRs and regs.new_msr
> and regs.new_nip are written to env.
> 
> There are two points of interest here:
> 
> - The system call code has a particularity where it needs to use
> env->nip because it might return early and the modification needs to
> be seen by the virtual hypervisor hypercall code. I have added a
> comment making this clear.
> 
> - The MSR filter at the beginning is being applied to the old MSR value
> only, i.e. the one that goes into SRR1. The new_msr is taken from
> env->msr without filtering the reserved bits. This might be a bug in
> the existing code. I'm also adding a comment to point that out.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  target/ppc/excp_helper.c | 231 +--
>  1 file changed, 125 insertions(+), 106 deletions(-)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index fd147e2a37..12bf829c8f 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -186,7 +186,7 @@ static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState 
> *env, int excp,
>  static inline void ppc_excp_apply_ail(PowerPCCPU *cpu, int excp_model, int 
> excp,
>target_ulong msr,
>target_ulong *new_msr,
> -  target_ulong *vector)
> +  target_ulong *new_nip)
>  {
>  #if defined(TARGET_PPC64)
>  CPUPPCState *env = >env;
> @@ -263,9 +263,9 @@ static inline void ppc_excp_apply_ail(PowerPCCPU *cpu, 
> int excp_model, int excp,
>  
>  if (excp != POWERPC_EXCP_SYSCALL_VECTORED) {
>  if (ail == 2) {
> -*vector |= 0x00018000ull;
> +*new_nip |= 0x00018000ull;
>  } else if (ail == 3) {
> -*vector |= 0xc0004000ull;
> +*new_nip |= 0xc0004000ull;
>  }
>  } else {
>  /*
> @@ -273,15 +273,15 @@ static inline void ppc_excp_apply_ail(PowerPCCPU *cpu, 
> int excp_model, int excp,
>   * only the MSR. AIL=3 replaces the 0x17000 base with 0xc...3000.
>   */
>  if (ail == 3) {
> -*vector &= ~0x00017000ull; /* Un-apply the base offset */
> -*vector |= 0xc0003000ull; /* Apply scv's AIL=3 offset */
> +*new_nip &= ~0x00017000ull; /* Un-apply the base offset 
> */
> +*new_nip |= 0xc0003000ull; /* Apply scv's AIL=3 offset */
>  }
>  }
>  #endif
>  }
>  
> -static inline void powerpc_set_excp_state(PowerPCCPU *cpu,
> -  target_ulong vector, target_ulong 
> msr)
> +static inline void powerpc_set_excp_state(PowerPCCPU *cpu, target_ulong 
> new_nip,
> +  target_ulong new_msr)
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = >env;
> @@ -294,9 +294,9 @@ static inline void powerpc_set_excp_state(PowerPCCPU *cpu,
>   * will prevent setting of the HV bit which some exceptions might need
>   * to do.
>   */
> -env->msr = msr & env->msr_mask;
> +env->msr = new_msr & env->msr_mask;
>  hreg_compute_hflags(env);
> -env->nip = vector;
> +env->nip = new_nip;
>  /* Reset exception state */
>  cs->exception_index = POWERPC_EXCP_NONE;
>  env->error_code = 0;
> @@ -311,6 +311,17 @@ static inline void powerpc_set_excp_state(PowerPCCPU 
> *cpu,
>  check_tlb_flush(env, false);
>  }
>  
> +struct ppc_intr_args {

Please use qemu coding style convetions for type names (CamelCase and
a typedef).

> +target_ulong nip;
> +target_ulong msr;
> +target_ulong new_nip;
> +target_ulong new_msr;
> +int sprn_srr0;
> +int sprn_srr1;
> +int sprn_asrr0;
> +int sprn_asrr1;
> +};
> +
>  /*
>   * Note that this function should be greatly optimized when called
>   * with a constant excp, from ppc_hw_interrupt
> @@ -319,37 +330,40 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = >env;
> -target_ulong msr, new_msr, vector;
> -int srr0, srr1, asrr0, asrr1, lev = -1;
> +struct ppc_intr_args regs;
> +int lev = -1;
>  
>  qemu_log_mask(CPU_LOG_INT, "Raise exception at " TARGET_FMT_lx
>" => %08x (%02x)\n", env->nip, excp, env->error_code);
>  
>  /* new srr1 value excluding must-be-zero bits */
>  if 

Re: [RFC PATCH] target/ppc: removed usage of ppc_store_sdr1 in kvm.c

2021-06-06 Thread David Gibson
On Tue, Jun 01, 2021 at 03:42:42PM -0300, Bruno Larsen (billionai) wrote:
> The only use of this function in kvm.c is right after using the KVM
> ioctl to get the registers themselves, so there is no need to do the
> error checks done by ppc_store_sdr1.
> 
> The probable reason this was here before is because of the hack where
> KVM PR stores the hash table size along with the SDR1 information, but
> since ppc_store_sdr1 would also store that information, there should be
> no need to do any extra processing here.
> 
> Signed-off-by: Bruno Larsen (billionai) 
> ---
> 
> This change means we won't have to compile ppc_store_sdr1 when we get
> disable-tcg working, but I'm not working on that code motion just yet
> since Lucas is dealing with the same file.
> 
> I'm sending this as an RFC because I'm pretty sure I'm missing
> something, but from what I can see, this is all we'd need

I don't think this is a good idea.  Even though it's not strictly
necessary for KVM, I'd prefer to have a common entry point for SDR1
updates to reduce confusion.  Plus this won't be sufficient to fix
things for !TCG builds, since we still have the common calls on the
loadvm path.

> 
>  target/ppc/kvm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 104a308abb..3f52a7189d 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1159,7 +1159,11 @@ static int kvmppc_get_books_sregs(PowerPCCPU *cpu)
>  }
>  
>  if (!cpu->vhyp) {
> -ppc_store_sdr1(env, sregs.u.s.sdr1);
> +/*
> + * We have just gotten the SDR1, there should be no
> + * reason to do error checking right?
> + */
> +env->spr[SPR_SDR1] = sregs.u.s.sdr1;
>  }
>  
>  /* Sync SLB */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v20] spapr: Implement Open Firmware client interface

2021-06-06 Thread David Gibson
On Fri, Jun 04, 2021 at 03:59:22PM +0200, BALATON Zoltan wrote:
> On Fri, 4 Jun 2021, David Gibson wrote:
> > On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote:
> > > On Wed, 2 Jun 2021, David Gibson wrote:
> > > > On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote:
> > > > > On Thu, 27 May 2021, David Gibson wrote:
> > > > > > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote:
> > > > > > > On Tue, 25 May 2021, David Gibson wrote:
> > > > > > > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote:
> > > > > > > > > On Mon, 24 May 2021, David Gibson wrote:
> > > > > > > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote:
> > > > > > > > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote:
> > > > > > > > > > > > > One thing to note about PCI is that normally I think 
> > > > > > > > > > > > > the client
> > > > > > > > > > > > > expects the firmware to do PCI probing and SLOF does 
> > > > > > > > > > > > > it. But VOF
> > > > > > > > > > > > > does not and Linux scans PCI bus(es) itself. Might be 
> > > > > > > > > > > > > a problem for
> > > > > > > > > > > > > you kernel.
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm not sure what info does MorphOS get from the device 
> > > > > > > > > > > > tree and what it
> > > > > > > > > > > > probes itself but I think it may at least need device 
> > > > > > > > > > > > ids and info about
> > > > > > > > > > > > the PCI bus to be able to access the config regs, after 
> > > > > > > > > > > > that it should
> > > > > > > > > > > > set the devices up hopefully. I could add these from 
> > > > > > > > > > > > the board code to
> > > > > > > > > > > > device tree so VOF does not need to do anything about 
> > > > > > > > > > > > it. However I'm
> > > > > > > > > > > > not getting to that point yet because it crashes on 
> > > > > > > > > > > > something that it's
> > > > > > > > > > > > missing and couldn't yet find out what is that.
> > > > > > > > > > > > 
> > > > > > > > > > > > I'd like to get Linux working now as that would be 
> > > > > > > > > > > > enough to test this
> > > > > > > > > > > > and then if for MorphOS we still need a ROM it's not a 
> > > > > > > > > > > > problem if at
> > > > > > > > > > > > least we can boot Linux without the original firmware. 
> > > > > > > > > > > > But I can't make
> > > > > > > > > > > > Linux open a serial console and I don't know what it 
> > > > > > > > > > > > needs for that. Do
> > > > > > > > > > > > you happen to know? I've looked at the sources in 
> > > > > > > > > > > > Linux/arch/powerpc but
> > > > > > > > > > > > not sure how it would find and open a serial port on 
> > > > > > > > > > > > pegasos2. It seems
> > > > > > > > > > > > to work with the board firmware and now I can get it to 
> > > > > > > > > > > > boot with VOF
> > > > > > > > > > > > but then it does not open serial so it probably needs 
> > > > > > > > > > > > something in the
> > > > > > > > > > > > device tree or expects the firmware to set something up 
> > > > > > > > > > > > that we should
> > > > > > > > > > > > add in pegasos2.c when using VOF.
> > > > > > > > > > > 
> > > > > > > > > > > I've now found that Linux uses rtas methods 
> > > > > > > > > > > read-pci-config and
> > > > > > > > > > > write-pci-config for PCI access on pegasos2 so this means 
> > > > > > > > > > > that we'll
> > > > > > > > > > > probably need rtas too (I hoped we could get away without 
> > > > > > > > > > > it if it were only
> > > > > > > > > > > used for shutdown/reboot or so but seems Linux needs it 
> > > > > > > > > > > for PCI as well and
> > > > > > > > > > > does not scan the bus and won't find some devices without 
> > > > > > > > > > > it).
> > > > > > > > > > 
> > > > > > > > > > Yes, definitely sounds like you'll need an RTAS 
> > > > > > > > > > implementation.
> > > > > > > > > 
> > > > > > > > > I plan to fix that after managed to get serial working as 
> > > > > > > > > that seems to not
> > > > > > > > > need it. If I delete the rtas-size property from /rtas on the 
> > > > > > > > > original
> > > > > > > > > firmware that makes Linux skip instantiating rtas, but I 
> > > > > > > > > still get serial
> > > > > > > > > output just not accessing PCI devices. So I think it should 
> > > > > > > > > work and keeps
> > > > > > > > > things simpler at first. Then I'll try rtas later.
> > > > > > > > > 
> > > > > > > > > > > While VOF can do rtas, this causes a problem with the 
> > > > > > > > > > > hypercall method using
> > > > > > > > > > > sc 1 that goes through vhyp but trips the assert in 
> > > > > > > > > > > ppc_store_sdr1() so
> > > > > > > > > > > cannot work after guest is past quiesce.
> > > > > > > > > > 
> > > > > > > > > > > So the question is why is that
> > > > > > > > > > > assert there
> > > > > > > > > > 
> > > > > > > > > > Ah.. right.  So, vhyp was designed 

Re: [RFC PATCH 4/4] target/ppc: Moved helpers to mmu_helper.c

2021-06-06 Thread David Gibson
On Wed, Jun 02, 2021 at 04:26:04PM -0300, Lucas Mateus Castro (alqotel) wrote:
> Moved helpers from target/ppc/mmu-hash64.c to target/ppc/mmu_helpers.c
> and removed #ifdef CONFIG_TCG and #include exec/helper-proto.h from
> mmu-hash64.c
> 
> Signed-off-by: Lucas Mateus Castro (alqotel)
> 

I'd prefer not to do this.  These helpers used to be in
mmu_helper.c, along with most of the stuff in mmu-*.[ch].  I think the
division by MMU model is more important than the TCG/!TCG distinction,
so I'd prefer to keep these here, even if it means ifdefs.  Eventually
we could consider splitting each of the individual MMU files into
TCG/!TCG parts, but I don't want to go back to having all the helpers
for umpteen very different MMU models all lumped into one giant file.

> ---
> I had to turn slb_lookup in a non static function as it had calls from
> the code that was moved to mmu_helper.c and from the code that wasn't
> moved.
> 
> Also perhaps it would be best to create a new file to move the mmu-hash
> functions that are not compiled in !TCG, personally I thought that
> moving the helpers in mmu-hash64 to mmu_helpers the better choice.
> ---
>  target/ppc/mmu-hash64.c | 219 +---
>  target/ppc/mmu-hash64.h |   1 +
>  target/ppc/mmu_helper.c | 209 ++
>  3 files changed, 211 insertions(+), 218 deletions(-)
> 
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 708dffc31b..d2ded71107 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -32,10 +32,6 @@
>  #include "mmu-book3s-v3.h"
>  #include "helper_regs.h"
>  
> -#ifdef CONFIG_TCG
> -#include "exec/helper-proto.h"
> -#endif
> -
>  /* #define DEBUG_SLB */
>  
>  #ifdef DEBUG_SLB
> @@ -48,7 +44,7 @@
>   * SLB handling
>   */
>  
> -static ppc_slb_t *slb_lookup(PowerPCCPU *cpu, target_ulong eaddr)
> +ppc_slb_t *slb_lookup(PowerPCCPU *cpu, target_ulong eaddr)
>  {
>  CPUPPCState *env = >env;
>  uint64_t esid_256M, esid_1T;
> @@ -100,114 +96,6 @@ void dump_slb(PowerPCCPU *cpu)
>  }
>  }
>  
> -#ifdef CONFIG_TCG
> -void helper_slbia(CPUPPCState *env, uint32_t ih)
> -{
> -PowerPCCPU *cpu = env_archcpu(env);
> -int starting_entry;
> -int n;
> -
> -/*
> - * slbia must always flush all TLB (which is equivalent to ERAT in ppc
> - * architecture). Matching on SLB_ESID_V is not good enough, because 
> slbmte
> - * can overwrite a valid SLB without flushing its lookaside information.
> - *
> - * It would be possible to keep the TLB in synch with the SLB by flushing
> - * when a valid entry is overwritten by slbmte, and therefore slbia would
> - * not have to flush unless it evicts a valid SLB entry. However it is
> - * expected that slbmte is more common than slbia, and slbia is usually
> - * going to evict valid SLB entries, so that tradeoff is unlikely to be a
> - * good one.
> - *
> - * ISA v2.05 introduced IH field with values 0,1,2,6. These all 
> invalidate
> - * the same SLB entries (everything but entry 0), but differ in what
> - * "lookaside information" is invalidated. TCG can ignore this and flush
> - * everything.
> - *
> - * ISA v3.0 introduced additional values 3,4,7, which change what SLBs 
> are
> - * invalidated.
> - */
> -
> -env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> -
> -starting_entry = 1; /* default for IH=0,1,2,6 */
> -
> -if (env->mmu_model == POWERPC_MMU_3_00) {
> -switch (ih) {
> -case 0x7:
> -/* invalidate no SLBs, but all lookaside information */
> -return;
> -
> -case 0x3:
> -case 0x4:
> -/* also considers SLB entry 0 */
> -starting_entry = 0;
> -break;
> -
> -case 0x5:
> -/* treat undefined values as ih==0, and warn */
> -qemu_log_mask(LOG_GUEST_ERROR,
> -  "slbia undefined IH field %u.\n", ih);
> -break;
> -
> -default:
> -/* 0,1,2,6 */
> -break;
> -}
> -}
> -
> -for (n = starting_entry; n < cpu->hash64_opts->slb_size; n++) {
> -ppc_slb_t *slb = >slb[n];
> -
> -if (!(slb->esid & SLB_ESID_V)) {
> -continue;
> -}
> -if (env->mmu_model == POWERPC_MMU_3_00) {
> -if (ih == 0x3 && (slb->vsid & SLB_VSID_C) == 0) {
> -/* preserves entries with a class value of 0 */
> -continue;
> -}
> -}
> -
> -slb->esid &= ~SLB_ESID_V;
> -}
> -}
> -
> -static void __helper_slbie(CPUPPCState *env, target_ulong addr,
> -   target_ulong global)
> -{
> -PowerPCCPU *cpu = env_archcpu(env);
> -ppc_slb_t *slb;
> -
> -slb = slb_lookup(cpu, addr);
> -if (!slb) {
> -return;
> -}
> -
> -if (slb->esid & SLB_ESID_V) {
> -slb->esid &= ~SLB_ESID_V;
> -
> -/*
> -  

Re: [RFC PATCH 3/4] target/ppc: moved ppc_store_sdr1 to mmu_common.c

2021-06-06 Thread David Gibson
On Wed, Jun 02, 2021 at 04:26:03PM -0300, Lucas Mateus Castro (alqotel) wrote:
> Moved ppc_store_sdr1 to mmu_common.c as it was originally in
> mmu_helper.c.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel)
> 

Reviewed-by: David Gibson 

Though it will need a rebase for comments on the earlier patches in
the series.

> ---
> Talking to billionai he commented that ppc_store_sdr1 was at first in
> mmu_helper.c and was moved as part of the patches to enable the
> disable-tcg option, now it's being moved back to a file that will be
> compiled in a !TCG build.
> ---
>  target/ppc/cpu.c| 28 
>  target/ppc/mmu_common.c | 28 
>  2 files changed, 28 insertions(+), 28 deletions(-)
> 
> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
> index c8e87e30f1..5ff00a6c01 100644
> --- a/target/ppc/cpu.c
> +++ b/target/ppc/cpu.c
> @@ -66,34 +66,6 @@ uint32_t ppc_get_vscr(CPUPPCState *env)
>  return env->vscr | (sat << VSCR_SAT);
>  }
>  
> -#ifdef CONFIG_SOFTMMU
> -void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
> -{
> -PowerPCCPU *cpu = env_archcpu(env);
> -qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
> -assert(!cpu->vhyp);
> -#if defined(TARGET_PPC64)
> -if (mmu_is_64bit(env->mmu_model)) {
> -target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;
> -target_ulong htabsize = value & SDR_64_HTABSIZE;
> -
> -if (value & ~sdr_mask) {
> -qemu_log_mask(LOG_GUEST_ERROR, "Invalid bits 0x"TARGET_FMT_lx
> - " set in SDR1", value & ~sdr_mask);
> -value &= sdr_mask;
> -}
> -if (htabsize > 28) {
> -qemu_log_mask(LOG_GUEST_ERROR, "Invalid HTABSIZE 0x" 
> TARGET_FMT_lx
> - " stored in SDR1", htabsize);
> -return;
> -}
> -}
> -#endif /* defined(TARGET_PPC64) */
> -/* FIXME: Should check for valid HTABMASK values in 32-bit case */
> -env->spr[SPR_SDR1] = value;
> -}
> -#endif /* CONFIG_SOFTMMU */
> -
>  /* GDBstub can read and write MSR... */
>  void ppc_store_msr(CPUPPCState *env, target_ulong value)
>  {
> diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
> index d95399d67f..50b8799d71 100644
> --- a/target/ppc/mmu_common.c
> +++ b/target/ppc/mmu_common.c
> @@ -58,6 +58,34 @@
>  #  define LOG_BATS(...) do { } while (0)
>  #endif
>  
> +#ifdef CONFIG_SOFTMMU
> +void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
> +{
> +PowerPCCPU *cpu = env_archcpu(env);
> +qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
> +assert(!cpu->vhyp);
> +#if defined(TARGET_PPC64)
> +if (mmu_is_64bit(env->mmu_model)) {
> +target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;
> +target_ulong htabsize = value & SDR_64_HTABSIZE;
> +
> +if (value & ~sdr_mask) {
> +qemu_log_mask(LOG_GUEST_ERROR, "Invalid bits 0x"TARGET_FMT_lx
> + " set in SDR1", value & ~sdr_mask);
> +value &= sdr_mask;
> +}
> +if (htabsize > 28) {
> +qemu_log_mask(LOG_GUEST_ERROR, "Invalid HTABSIZE 0x" 
> TARGET_FMT_lx
> + " stored in SDR1", htabsize);
> +return;
> +}
> +}
> +#endif /* defined(TARGET_PPC64) */
> +/* FIXME: Should check for valid HTABMASK values in 32-bit case */
> +env->spr[SPR_SDR1] = value;
> +}
> +#endif /* CONFIG_SOFTMMU */
> +
>  
> /*/
>  /* PowerPC MMU emulation */
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v20] spapr: Implement Open Firmware client interface

2021-06-06 Thread David Gibson
On Fri, Jun 04, 2021 at 03:50:28PM +0200, BALATON Zoltan wrote:
> On Fri, 4 Jun 2021, David Gibson wrote:
> > On Sun, May 30, 2021 at 07:33:01PM +0200, BALATON Zoltan wrote:
[snip]
> > > MorphOS checks the name property of the root node ("/") to decide what
> > > platform it runs on so we may need to be able to set this property on /
> > > where it should return "bplan,Pegasos2", therefore the above maybe should 
> > > do
> > > getprop first and only generate name property if it's not set (or at least
> > > check if we're on the root node and allow setting name property there). 
> > > (On
> > > Macs the root node is named "device-tree" and this was before found to be
> > > needed for MorphOS.)
> > 
> > Ah.  Hrm.  Have to think about what to do about that.
> 
> This is easy to fix, this seems to allow setting a name property or return a
> default:
> 
> > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> index b47bbd509d..746842593e 100644
> --- a/hw/ppc/vof.c
> +++ b/hw/ppc/vof.c
> @@ -163,14 +163,14 @@ static uint32_t vof_finddevice(const void *fdt,
> uint32_t nodeaddr)
>  static const void *getprop(const void *fdt, int nodeoff, const char 
> *propname,
> int *proplen, bool *write0)
>  {
> -const char *unit, *prop;
> +const char *unit, *prop = fdt_getprop(fdt, nodeoff, propname, proplen);
> 
>  /*
>   * The "name" property is not actually stored as a property in the FDT,
>   * we emulate it by returning a pointer to the node's name and adjust
>   * proplen to include only the name but not the unit.
>   */
> -if (strcmp(propname, "name") == 0) {
> +if (!prop && strcmp(propname, "name") == 0) {
>  prop = fdt_get_name(fdt, nodeoff, proplen);
>  if (!prop) {
>  *proplen = 0;
> @@ -196,7 +196,7 @@ static const void *getprop(const void *fdt, int nodeoff, 
> const char *propname,
>  if (write0) {
>  *write0 = false;
>  }
> -return fdt_getprop(fdt, nodeoff, propname, proplen);
> +return prop;
>  }

Kind of a hack, but it'll do for now.

> This allows adding a name property to "/" different from the default but
> this does not yet fix MorphOS booting with VOF on pegasos2. I think it tries
> to query name on / and check if it's called "device-tree" in which case it
> assumes Mac hardware otherwise it goes with pegasos2 so even if we return
> nothing for name it would not matter in this case as we don't use VOF on
> Mac. If we wanted that then this would become a problem so it could be fixed
> now in advance just in case other guests may need it.
> 
> > > Other than the above two problems, I've found that getting the device tree
> > > from vof returns it in reverse order compared to the board firmware if I 
> > > add
> > > it the expected order. This may or may not be a problem but to avoid it I
> > > can build the tree in reverse order then it comes out right so unless
> > > there's an easy fix this should not cause a problem but may worth a 
> > > comment
> > > somewhere.
> > 
> > The order of things in the device tree *should* never matter.  If it
> > does, that's definitely a client bug... but of course that doesn't
> > necessarily mean we won't have to work around it in practice.
> 
> I don't know if it matters or not but having the device tree in the same
> order as the firmware ROM helps with comparing it for debugging but I've
> found I can solve this by building the tree in reverse order so no changes
> to VOF is needed for this, just thought adding a comment somewhere may
> clarify it but it's not really a problem.
> 
> I still don't know what's MorphOS is missing, I've tried adding almost all
> misssing properties, checked what hardware is init by the firmware and tried
> to do the same in board reset code and even after that MorphOS still takes a
> different route with VOF and crashes but boots with the board firmware. I'm
> now thinking it may be either different memory organisation or the missing
> name properties that are not returned by nextprop in VOF so they are only
> appearing when explicitely queried whereas with the board firmware they are
> present as properties. With the above patch I could explicitely set it on
> nodes and test if that makes a difference.
> 
> I got to this because adding more missing props or init more devices did not
> make a difference so I'm guessing it may be something else then and the only
> difference I can see compared to board firmware are the different memory
> ranges in claimed (VOF puts itself to 0 for example); and the missing name
> and additional phandle props in the device tree. MorphOS copies the whole
> device tree on startup then later it uses this copy of the device tree after
> shutting down OF with quiesce. I can imagine it may use some name props like
> that on the cpu node without checking assuming it's always there and if
> we're missing that it may cause a NULL dereference. I have no better idea
> what else could be missing so I'll test this 

Re: [PATCH qemu v20] spapr: Implement Open Firmware client interface

2021-06-06 Thread David Gibson
On Mon, Jun 07, 2021 at 12:21:21AM +0200, BALATON Zoltan wrote:
> On Fri, 4 Jun 2021, David Gibson wrote:
> > On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote:
> > > On Wed, 2 Jun 2021, David Gibson wrote:
> > > > On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote:
> > > > > On Thu, 27 May 2021, David Gibson wrote:
> > > > > > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote:
> > > > > > > On Tue, 25 May 2021, David Gibson wrote:
> > > > > > > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote:
> > > > > > > > > On Mon, 24 May 2021, David Gibson wrote:
> > > > > > > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote:
> > > > > > > > > > > and would using sc 1 for hypercalls on pegasos2 cause 
> > > > > > > > > > > other
> > > > > > > > > > > problems later even if the assert could be removed?
> > > > > > > > > > 
> > > > > > > > > > At least in the short term, I think you probably can remove 
> > > > > > > > > > the
> > > > > > > > > > assert.  In your case the 'sc 1' calls aren't truly to a 
> > > > > > > > > > hypervisor,
> > > > > > > > > > but a special case escape to qemu for the firmware 
> > > > > > > > > > emulation.  I think
> > > > > > > > > > it's unlikely to cause problems later, because nothing on a 
> > > > > > > > > > 32-bit
> > > > > > > > > > system should be attempting an 'sc 1'.  The only thing I 
> > > > > > > > > > can think of
> > > > > > > > > > that would fail is some test case which explicitly verified 
> > > > > > > > > > that 'sc
> > > > > > > > > > 1' triggered a 0x700 (SIGILL from userspace).
> > > > > > > > > 
> > > > > > > > > OK so the assert should check if the CPU has an HV bit. I 
> > > > > > > > > think there was a
> > > > > > > > > #detine for that somewhere that I can add to the assert then 
> > > > > > > > > I can try that.
> > > > > > > > > What I wasn't sure about is that sc 1 would conflict with the 
> > > > > > > > > guest's usage
> > > > > > > > > of normal sc calls or are these going through different paths 
> > > > > > > > > and only sc 1
> > > > > > > > > will trigger vhyp callback not affecting notmal sc calls?
> > > > > > > > 
> > > > > > > > The vhyp shouldn't affect normal system calls, 'sc 1' is 
> > > > > > > > specifically
> > > > > > > > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and 
> > > > > > > > the
> > > > > > > > vhyp only intercepts the hypercall version (after all Linux on 
> > > > > > > > PAPR
> > > > > > > > certainly uses its own system calls, and hypercalls are active 
> > > > > > > > for the
> > > > > > > > lifetime of the guest there).
> > > > > > > > 
> > > > > > > > > (Or if this causes
> > > > > > > > > an otherwise unnecessary VM exit on KVM even when it works 
> > > > > > > > > then maybe
> > > > > > > > > looking for a different way in the future might be needed.
> > > > > > > > 
> > > > > > > > What you're doing here won't work with KVM as it stands.  There 
> > > > > > > > are
> > > > > > > > basically two paths into the vhyp hypercall path: 1) from TCG, 
> > > > > > > > if we
> > > > > > > > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if 
> > > > > > > > we get
> > > > > > > > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path.
> > > > > > > > 
> > > > > > > > The second path is specific to the PAPR (ppc64) implementation 
> > > > > > > > of KVM,
> > > > > > > > and will not work for a non-PAPR platform without substantial
> > > > > > > > modification of the KVM code.
> > > > > > > 
> > > > > > > OK so then at that point when we try KVM we'll need to look at 
> > > > > > > alternative
> > > > > > > ways, I think MOL OSI worked with KVM at least in MOL but will 
> > > > > > > probably make
> > > > > > > all syscalls exit KVM but since we'll probably need to use KVM PR 
> > > > > > > it will
> > > > > > > exit anyway. For now I keep this vhyp as it does not run with KVM 
> > > > > > > for other
> > > > > > > reasons yet so that's another area to clean up so as a proof of 
> > > > > > > concept
> > > > > > > first version of using VOF vhyp will do.
> > > > > > 
> > > > > > Eh, since you'll need to modify KVM anyway, it probably makes just 
> > > > > > as
> > > > > > much sense to modify it to catch the 'sc 1' as MoL's magic thingy.
> > > > > 
> > > > > I'm not sure how KVM works for this case so I also don't know why and 
> > > > > what
> > > > > would need to be modified. I think we'll only have KVM PR working as 
> > > > > newer
> > > > > POWER CPUs having HV (besides being rare among potential users) are 
> > > > > probably
> > > > > too different to run the OSes that expect at most a G4 on pegasos2 so 
> > > > > likely
> > > > > it won't work with KVM HV.
> > > > 
> > > > Oh, it definitely won't work with KVM HV.
> > > > 
> > > > > If we have KVM PR doesn't sc already trap so we
> > > > > could add MOL OSI without further 

Re: [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event

2021-06-06 Thread David Gibson
On Fri, Jun 04, 2021 at 05:03:51PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is the v2 of a series that started with 2 events,
> DEVICE_UNPLUG_ERROR and DEVICE_NOT_DELETED [1]. After discussions in v1
> we reached the conclussion that the DEVICE_NOT_DELETED wasn't doing much
> of anything. It was an event that was trying to say 'I think something
> happen, but I'm not sure', forcing the QAPI listener to inspect the
> guest itself to see what went wrong, or just wait for some sort of
> internal timeout (as Libvirt will do) and fail the operation regardless.
> 
> During this period between v1 and this v2 the PowerPC kernel was changed
> to add a reliable error report mechanism in the device_removal path of
> CPUs, which in turn gave QEMU the opportunity to do the same. This made
> the DEVICE_UNPLUG_ERROR more relevant because now we can report CPU and
> DIMM hotunplug errors.
> 
> 
> changes from v1:
> - former patches 1 and 2: dropped
> - patch 1 (former 3): changed the version to '6.1'
> - patch 2 (former 4): add a DEVICE_UNPLUG_ERROR event in the device
>   unplug error path of CPUs and DIMMs
> 
> [1] v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html

It would be nice to add a patch making x86 also issue the new error
format (as well as the old) on memory hot unplug errors.

> 
> 
> 
> Daniel Henrique Barboza (2):
>   qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
>   spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
> 
>  hw/ppc/spapr.c |  2 +-
>  hw/ppc/spapr_drc.c | 15 +--
>  qapi/machine.json  | 23 +++
>  3 files changed, 33 insertions(+), 7 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH qemu v20] spapr: Implement Open Firmware client interface

2021-06-06 Thread David Gibson
On Fri, Jun 04, 2021 at 03:27:12PM +0200, BALATON Zoltan wrote:
> On Fri, 4 Jun 2021, David Gibson wrote:
> > On Tue, Jun 01, 2021 at 04:12:44PM +0200, BALATON Zoltan wrote:
> > > On Tue, 1 Jun 2021, Alexey Kardashevskiy wrote:
> > > > On 31/05/2021 23:07, BALATON Zoltan wrote:
> > > > > On Sun, 30 May 2021, BALATON Zoltan wrote:
> > > > > > On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
> > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > new file mode 100644
> > > > > > > index ..a283b7d251a7
> > > > > > > --- /dev/null
> > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > @@ -0,0 +1,1021 @@
> > > > > > > +/*
> > > > > > > + * QEMU PowerPC Virtual Open Firmware.
> > > > > > > + *
> > > > > > > + * This implements client interface from OpenFirmware
> > > > > > > IEEE1275 on the QEMU
> > > > > > > + * side to leave only a very basic firmware in the VM.
> > > > > > > + *
> > > > > > > + * Copyright (c) 2021 IBM Corporation.
> > > > > > > + *
> > > > > > > + * SPDX-License-Identifier: GPL-2.0-or-later
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include "qemu/osdep.h"
> > > > > > > +#include "qemu-common.h"
> > > > > > > +#include "qemu/timer.h"
> > > > > > > +#include "qemu/range.h"
> > > > > > > +#include "qemu/units.h"
> > > > > > > +#include "qapi/error.h"
> > > > > > > +#include 
> > > > > > > +#include "exec/ram_addr.h"
> > > > > > > +#include "exec/address-spaces.h"
> > > > > > > +#include "hw/ppc/vof.h"
> > > > > > > +#include "hw/ppc/fdt.h"
> > > > > > > +#include "sysemu/runstate.h"
> > > > > > > +#include "qom/qom-qobject.h"
> > > > > > > +#include "trace.h"
> > > > > > > +
> > > > > > > +#include 
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * OF 1275 "nextprop" description suggests is it 32 bytes max but
> > > > > > > + * LoPAPR defines "ibm,query-interrupt-source-number" which
> > > > > > > is 33 chars long.
> > > > > > > + */
> > > > > > > +#define OF_PROPNAME_LEN_MAX 64
> > > > > > > +
> > > > > > > +#define VOF_MAX_PATH    256
> > > > > > > +#define VOF_MAX_SETPROPLEN  2048
> > > > > > > +#define VOF_MAX_METHODLEN   256
> > > > > > > +#define VOF_MAX_FORTHCODE   256
> > > > > > > +#define VOF_VTY_BUF_SIZE    256
> > > > > > > +
> > > > > > > +typedef struct {
> > > > > > > +    uint64_t start;
> > > > > > > +    uint64_t size;
> > > > > > > +} OfClaimed;
> > > > > > > +
> > > > > > > +typedef struct {
> > > > > > > +    char *path; /* the path used to open the instance */
> > > > > > > +    uint32_t phandle;
> > > > > > > +} OfInstance;
> > > > > > > +
> > > > > > > +#define VOF_MEM_READ(pa, buf, size) \
> > > > > > > +    address_space_read_full(_space_memory, \
> > > > > > > +    (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size))
> > > > > > > +#define VOF_MEM_WRITE(pa, buf, size) \
> > > > > > > +    address_space_write(_space_memory, \
> > > > > > > +    (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size))
> > > > > > > +
> > > > > > > +static int readstr(hwaddr pa, char *buf, int size)
> > > > > > > +{
> > > > > > > +    if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) {
> > > > > > > +    return -1;
> > > > > > > +    }
> > > > > > > +    if (strnlen(buf, size) == size) {
> > > > > > > +    buf[size - 1] = '\0';
> > > > > > > +    trace_vof_error_str_truncated(buf, size);
> > > > > > > +    return -1;
> > > > > > > +    }
> > > > > > > +    return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static bool cmpservice(const char *s, unsigned nargs, unsigned 
> > > > > > > nret,
> > > > > > > +   const char *s1, unsigned nargscheck,
> > > > > > > unsigned nretcheck)
> > > > > > > +{
> > > > > > > +    if (strcmp(s, s1)) {
> > > > > > > +    return false;
> > > > > > > +    }
> > > > > > > +    if ((nargscheck && (nargs != nargscheck)) ||
> > > > > > > +    (nretcheck && (nret != nretcheck))) {
> > > > > > > +    trace_vof_error_param(s, nargscheck, nretcheck, nargs, 
> > > > > > > nret);
> > > > > > > +    return false;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    return true;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void prop_format(char *tval, int tlen, const void *prop, 
> > > > > > > int len)
> > > > > > > +{
> > > > > > > +    int i;
> > > > > > > +    const unsigned char *c;
> > > > > > > +    char *t;
> > > > > > > +    const char bin[] = "...";
> > > > > > > +
> > > > > > > +    for (i = 0, c = prop; i < len; ++i, ++c) {
> > > > > > > +    if (*c == '\0' && i == len - 1) {
> > > > > > > +    strncpy(tval, prop, tlen - 1);
> > > > > > > +    return;
> > > > > > > +    }
> > > > > > > +    if (*c < 0x20 || *c >= 0x80) {
> > > > > > > +    break;
> > > > > > > +    }
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    for (i = 0, c = prop, t = tval; i < len; ++i, ++c) {
> > > > > > > +    if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) {
> > > > > > > +    strcpy(t, bin);
> > > > > > > +    return;
> > > > > > > +    

Re: [RFC PATCH 1/4] target/ppc: Don't compile ppc_tlb_invalid_all without TCG

2021-06-06 Thread David Gibson
On Wed, Jun 02, 2021 at 04:26:01PM -0300, Lucas Mateus Castro (alqotel) wrote:
> The function ppc_tlb_invalid_all is not compiled anymore in a TCG-less
> environment, and the call to that function has been disabled in this
> situation.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel) 
> ---
> Is there a better way than to deal with the
> ppc_tlb_invalidate_all call than ifdef? I couldn't think of one.
> ---
>  target/ppc/cpu_init.c   | 4 
>  target/ppc/mmu_helper.c | 4 
>  2 files changed, 8 insertions(+)
> 
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 74a397ad6c..2051f24467 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -9114,7 +9114,11 @@ static void ppc_cpu_reset(DeviceState *dev)
>  #if !defined(CONFIG_USER_ONLY)
>  env->nip = env->hreset_vector | env->excp_prefix;
>  if (env->mmu_model != POWERPC_MMU_REAL) {
> +#if defined(CONFIG_TCG)
>  ppc_tlb_invalidate_all(env);
> +#else
> + cpu_abort(env_cpu(env),"PowerPC not in real mode, invalid in this 
> build\n");

As Lucas says, this doesn't look right, I think you want a no-op stub
for ppc_tlb_invalidate_all instead().

Also, qemu style is not to use tabs for indentation.

> +#endif
>  }
>  #endif
>  
> diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
> index 1ecb36e85a..803b66f2b0 100644
> --- a/target/ppc/mmu_helper.c
> +++ b/target/ppc/mmu_helper.c
> @@ -825,6 +825,7 @@ static int mmubooke_get_physical_address(CPUPPCState 
> *env, mmu_ctx_t *ctx,
>  return ret;
>  }
>  
> +#ifdef CONFIG_TCG
>  static void booke206_flush_tlb(CPUPPCState *env, int flags,
> const int check_iprot)
>  {
> @@ -846,6 +847,7 @@ static void booke206_flush_tlb(CPUPPCState *env, int 
> flags,
>  
>  tlb_flush(env_cpu(env));
>  }
> +#endif
>  
>  static hwaddr booke206_tlb_to_page_size(CPUPPCState *env,
>  ppcmas_tlb_t *tlb)
> @@ -1952,6 +1954,7 @@ void helper_store_601_batl(CPUPPCState *env, uint32_t 
> nr, target_ulong value)
>  }
>  #endif
>  
> +#ifdef CONFIG_TCG
>  
> /*/
>  /* TLB management */
>  void ppc_tlb_invalidate_all(CPUPPCState *env)
> @@ -1995,6 +1998,7 @@ void ppc_tlb_invalidate_all(CPUPPCState *env)
>  break;
>  }
>  }
> +#endif
>  
>  #ifdef CONFIG_TCG
>  void ppc_tlb_invalidate_one(CPUPPCState *env, target_ulong addr)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event

2021-06-06 Thread David Gibson
On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> At this moment we only provide one event to report a hotunplug error,
> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> machine is now able to report unplug errors for other device types, such
> as CPUs.
> 
> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> unplug errors in the future.
> 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  qapi/machine.json | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 58a9c86b36..f0c7e56be0 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1274,3 +1274,26 @@
>  ##
>  { 'event': 'MEM_UNPLUG_ERROR',
>'data': { 'device': 'str', 'msg': 'str' } }
> +
> +##
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message
> +#
> +# Since: 6.1
> +#
> +# Example:
> +#
> +# <- { "event": "DEVICE_UNPLUG_ERROR"
> +#  "data": { "device": "dimm1",
> +#"msg": "Memory hotunplug rejected by the guest for device 
> dimm1"
> +#  },
> +#  "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> +#
> +##
> +{ 'event': 'DEVICE_UNPLUG_ERROR',
> +  'data': { 'device': 'str', 'msg': 'str' } }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [RFC PATCH 2/4] target/ppc: divided mmu_helper.c in 2 files

2021-06-06 Thread David Gibson
On Wed, Jun 02, 2021 at 04:26:02PM -0300, Lucas Mateus Castro (alqotel) wrote:
> Moved functions in mmu_helper.c that should be compiled in build to
> mmu_common.c, moved declaration of functions that both files use to
> cpu.h and moved struct declarations and inline functions needed by
> both to target/ppc/internal.h. Updated meson.build to compile the
> new file.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel) 
> ---
> Had to turn a few functions non static as it was used by both
> mmu_common.c and mmu_helper.c. Added the declaration of mmu_ctx_t to
> cpu.h so functions there can reference it and added the definition to
> internal.h so functions in both mmu_* files can access it.
> And maybe the tlb functions should be declared in internal.h instead of
> cpu.h.
> ---
>  target/ppc/cpu.h|   35 +
>  target/ppc/internal.h   |   26 +
>  target/ppc/meson.build  |6 +-
>  target/ppc/mmu_common.c | 1606 ++
>  target/ppc/mmu_helper.c | 1814 +++
>  5 files changed, 1774 insertions(+), 1713 deletions(-)
>  create mode 100644 target/ppc/mmu_common.c
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b0934d9be4..cfc35ef83e 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1329,6 +1329,41 @@ void store_booke_tsr(CPUPPCState *env, target_ulong 
> val);
>  void ppc_tlb_invalidate_all(CPUPPCState *env);
>  void ppc_tlb_invalidate_one(CPUPPCState *env, target_ulong addr);
>  void cpu_ppc_set_vhyp(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp);
> +
> +typedef struct mmu_ctx_t mmu_ctx_t;
> +int ppcemb_tlb_check(CPUPPCState *env, ppcemb_tlb_t *tlb,
> +hwaddr *raddrp,
> +target_ulong address, uint32_t pid, int ext,
> +int i);
> +int get_physical_address_wtlb(CPUPPCState *env, mmu_ctx_t *ctx,
> + target_ulong eaddr,
> + MMUAccessType access_type, int type,
> + int mmu_idx);
> +hwaddr booke206_tlb_to_page_size(CPUPPCState *env,
> +ppcmas_tlb_t *tlb);
> +int ppcmas_tlb_check(CPUPPCState *env, ppcmas_tlb_t *tlb,
> +hwaddr *raddrp, target_ulong address,
> +uint32_t pid);
> +int get_physical_address(CPUPPCState *env, mmu_ctx_t *ctx,
> +target_ulong eaddr, MMUAccessType 
> access_type,
> +int type);
> +static inline int ppc6xx_tlb_getnum(CPUPPCState *env, target_ulong eaddr,
> +int way, int is_code)
> +{
> +int nr;
> +
> +/* Select TLB num in a way from address */
> +nr = (eaddr >> TARGET_PAGE_BITS) & (env->tlb_per_way - 1);
> +/* Select TLB way */
> +nr += env->tlb_per_way * way;
> +/* 6xx have separate TLBs for instructions and data */
> +if (is_code && env->id_tlbs == 1) {
> +nr += env->nb_tlb;
> +}
> +
> +return nr;
> +}

This is a rather complex and model specific function to have inline in
a header.

> +
>  #endif
>  #endif
>  
> diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> index 2b4b06eb76..7d2913ec28 100644
> --- a/target/ppc/internal.h
> +++ b/target/ppc/internal.h
> @@ -247,4 +247,30 @@ static inline int prot_for_access_type(MMUAccessType 
> access_type)
>  g_assert_not_reached();
>  }
>  
> +/* Context used internally during MMU translations */
> +
> +struct mmu_ctx_t {
> +hwaddr raddr;  /* Real address  */
> +hwaddr eaddr;  /* Effective address */
> +int prot;  /* Protection bits   */
> +hwaddr hash[2];/* Pagetable hash values */
> +target_ulong ptem; /* Virtual segment ID | API  */
> +int key;   /* Access key*/
> +int nx;/* Non-execute area  */
> +};
> +
> +/* Common routines used by software and hardware TLBs emulation */
> +static inline int pte_is_valid(target_ulong pte0)
> +{
> +return pte0 & 0x8000 ? 1 : 0;
> +}
> +
> +static inline void pte_invalidate(target_ulong *pte0)
> +{
> +*pte0 &= ~0x8000;
> +}
> +
> +#define PTE_PTEM_MASK 0x7FBF
> +#define PTE_CHECK_MASK (TARGET_PAGE_MASK | 0x7B)
> +
>  #endif /* PPC_INTERNAL_H */
> diff --git a/target/ppc/meson.build b/target/ppc/meson.build
> index a6a53a8d5c..a54445d0da 100644
> --- a/target/ppc/meson.build
> +++ b/target/ppc/meson.build
> @@ -28,10 +28,12 @@ ppc_softmmu_ss.add(files(
>'arch_dump.c',
>'machine.c',
>'mmu-hash32.c',
> -  'mmu_helper.c',
> +  'mmu_common.c',
>'monitor.c',
>  ))
> -ppc_softmmu_ss.add(when: 'CONFIG_TCG', if_false: files(
> +ppc_softmmu_ss.add(when: 'CONFIG_TCG', if_true: files(
> +  'mmu_helper.c',
> +), if_false: files(
>'tcg-stub.c'
>  ))
>  
> diff --git a/target/ppc/mmu_common.c 

Re: [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors

2021-06-06 Thread David Gibson
On Fri, Jun 04, 2021 at 05:03:53PM -0300, Daniel Henrique Barboza wrote:
> Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
> error path, signalling that the hotunplug process wasn't successful.
> This allow us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical()
> to signal this error to the management layer.
> 
> We also have another error path in spapr_memory_unplug_rollback() for
> configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
> in the hotunplug error path, but it will reconfigure them.  Let's send
> the DEVICE_UNPLUG_ERROR event in that code path as well to cover the
> case of older kernels.
> 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/spapr.c |  2 +-
>  hw/ppc/spapr_drc.c | 15 +--
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c23bcc4490..29aa2f467d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3639,7 +3639,7 @@ void spapr_memory_unplug_rollback(SpaprMachineState 
> *spapr, DeviceState *dev)
>   */
>  qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>   "for device %s", dev->id);
> -qapi_event_send_mem_unplug_error(dev->id, qapi_error);
> +qapi_event_send_device_unplug_error(dev->id, qapi_error);
>  }
>  
>  /* Callback to be called during DRC release. */
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a2f2634601..0e1a8733bc 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -17,6 +17,8 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qom/object.h"
>  #include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-events-machine.h"
>  #include "qapi/visitor.h"
>  #include "qemu/error-report.h"
>  #include "hw/ppc/spapr.h" /* for RTAS return codes */
> @@ -160,6 +162,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>   * means that the kernel is refusing the removal.
>   */
>  if (drc->unplug_requested && drc->dev) {
> +const char qapi_error_fmt[] = "Device hotunplug rejected by the "
> +  "guest for device %s";
> +g_autofree char *qapi_error = NULL;
> +
>  if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
>  spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> @@ -167,13 +173,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>  }
>  
>  drc->unplug_requested = false;
> -error_report("Device hotunplug rejected by the guest "
> - "for device %s", drc->dev->id);
> +error_report(qapi_error_fmt, drc->dev->id);
>  
> -/*
> - * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
> - * it is implemented.
> - */
> +qapi_error = g_strdup_printf(qapi_error_fmt, drc->dev->id);
> +qapi_event_send_device_unplug_error(drc->dev->id, qapi_error);
>  }
>  
>  return RTAS_OUT_SUCCESS; /* Nothing to do */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event

2021-06-06 Thread David Gibson
On Mon, Jun 07, 2021 at 12:23:08PM +1000, David Gibson wrote:
> On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> > At this moment we only provide one event to report a hotunplug error,
> > MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> > machine is now able to report unplug errors for other device types, such
> > as CPUs.
> > 
> > Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> > create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> > unplug errors in the future.
> > 
> > Signed-off-by: Daniel Henrique Barboza 
> 
> Reviewed-by: David Gibson 

Markus, I'm happy to take this through my tree if that's convenient
for you, but I'd like to get an ack.

> 
> > ---
> >  qapi/machine.json | 23 +++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index 58a9c86b36..f0c7e56be0 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -1274,3 +1274,26 @@
> >  ##
> >  { 'event': 'MEM_UNPLUG_ERROR',
> >'data': { 'device': 'str', 'msg': 'str' } }
> > +
> > +##
> > +# @DEVICE_UNPLUG_ERROR:
> > +#
> > +# Emitted when a device hot unplug error occurs.
> > +#
> > +# @device: device name
> > +#
> > +# @msg: Informative message
> > +#
> > +# Since: 6.1
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "DEVICE_UNPLUG_ERROR"
> > +#  "data": { "device": "dimm1",
> > +#"msg": "Memory hotunplug rejected by the guest for device 
> > dimm1"
> > +#  },
> > +#  "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> > +#
> > +##
> > +{ 'event': 'DEVICE_UNPLUG_ERROR',
> > +  'data': { 'device': 'str', 'msg': 'str' } }
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


TCG op for 32 bit only cpu on qemu-riscv64

2021-06-06 Thread LIU Zhiwei

Hi Alistair,

As I see,  we are moving  on to remove TARGET_RISCV64 macro.

I have some questions:

1) Which tcg op should use when translate an instruction for 32bit cpu. 
The tcg_*_i64, tcg_*_i32 or tcg_*_tl?
I see some API such as gen_get_gpr that are using the tcg_*_tl. But I am 
not sure if it is

right for 32bit cpu.

2) Do we should have a sign-extend 64 bit register(bit 31 as the sign 
bit)  for 32 bit cpu?


Best Regards,
Zhiwei




Re: [PATCH v3 0/7] support dirtyrate at the granualrity of vcpu

2021-06-06 Thread Hyman

cc Chuan Zheng 

在 2021/6/7 9:11, huang...@chinatelecom.cn 写道:

From: Hyman Huang(黄勇) 

v3:
- pick up "migration/dirtyrate: make sample page count configurable" to
   make patchset apply master correctly

v2:
- rebase to "migration/dirtyrate: make sample page count configurable"

- rename "vcpu" to "per_vcpu" to show the per-vcpu method

- squash patch 5/6 into a single one, squash patch 1/2 also

- pick up "hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds"

- make global_dirty_log a bitmask to make sure both migration and dirty
   could not intefer with each other

- add memory free callback to prevent memory leaking

the most different of v2 fron v1 is that we make the global_dirty_log a
bitmask. the reason is dirty rate measurement may start or stop dirty
logging during calculation. this conflict with migration because stop
dirty log make migration leave dirty pages out then that'll be a
problem.

make global_dirty_log a bitmask can let both migration and dirty
rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and
GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty log aims
for, migration or dirty rate.
 
all references to global_dirty_log should be untouched because any bit

set there should justify that global dirty logging is enabled.

Please review, thanks !

v1:

Since the Dirty Ring on QEMU part has been merged recently, how to use
this feature is under consideration.

In the scene of migration, it is valuable to provide a more accurante
interface to track dirty memory than existing one, so that the upper
layer application can make a wise decision, or whatever. More
importantly,
dirtyrate info at the granualrity of vcpu could provide a possibility to
make migration convergent by imposing restriction on vcpu. With Dirty
Ring, we can calculate dirtyrate efficiently and cheaply.

The old interface implemented by sampling pages, it consumes cpu
resource, and the larger guest memory size become, the more cpu resource
it consumes, namely, hard to scale. New interface has no such drawback.

Please review, thanks !

Best Regards !

Hyman Huang(黄勇) (6):
   migration/dirtyrate: make sample page count configurable
   KVM: introduce dirty_pages and kvm_dirty_ring_enabled
   migration/dirtyrate: add per-vcpu option for calc-dirty-rate
   migration/dirtyrate: adjust struct DirtyRateStat
   memory: make global_dirty_log a bitmask
   migration/dirtyrate: implement dirty-ring dirtyrate calculation

Peter Xu (1):
   hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds

  accel/kvm/kvm-all.c|   7 +
  hmp-commands-info.hx   |  13 ++
  hmp-commands.hx|  15 ++
  include/exec/memory.h  |  13 +-
  include/hw/core/cpu.h  |   1 +
  include/monitor/hmp.h  |   2 +
  include/sysemu/kvm.h   |   1 +
  migration/dirtyrate.c  | 347 ++---
  migration/dirtyrate.h  |  27 +++-
  migration/ram.c|   8 +-
  migration/trace-events |   5 +
  qapi/migration.json|  38 -
  softmmu/memory.c   |  36 +++--
  13 files changed, 468 insertions(+), 45 deletions(-)





[PATCH v3 5/7] migration/dirtyrate: adjust struct DirtyRateStat

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

use union to store stat data of two mutual exclusive methods.

Signed-off-by: Hyman Huang(黄勇) 
---
 migration/dirtyrate.c | 40 +---
 migration/dirtyrate.h | 18 +++---
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 5947c688f6..055145c24c 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -88,33 +88,39 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 return info;
 }
 
-static void init_dirtyrate_stat(int64_t start_time, int64_t calc_time,
-uint64_t sample_pages)
+static void init_dirtyrate_stat(int64_t start_time,
+struct DirtyRateConfig config)
 {
-DirtyStat.total_dirty_samples = 0;
-DirtyStat.total_sample_count = 0;
-DirtyStat.total_block_mem_MB = 0;
 DirtyStat.dirty_rate = -1;
 DirtyStat.start_time = start_time;
-DirtyStat.calc_time = calc_time;
-DirtyStat.sample_pages = sample_pages;
+DirtyStat.calc_time = config.sample_period_seconds;
+DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
+
+if (config.per_vcpu) {
+DirtyStat.method.vcpu.nvcpu = -1;
+DirtyStat.method.vcpu.rates = NULL;
+} else {
+DirtyStat.method.vm.total_dirty_samples = 0;
+DirtyStat.method.vm.total_sample_count = 0;
+DirtyStat.method.vm.total_block_mem_MB = 0;
+}
 }
 
 static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
 {
-DirtyStat.total_dirty_samples += info->sample_dirty_count;
-DirtyStat.total_sample_count += info->sample_pages_count;
+DirtyStat.method.vm.total_dirty_samples += info->sample_dirty_count;
+DirtyStat.method.vm.total_sample_count += info->sample_pages_count;
 /* size of total pages in MB */
-DirtyStat.total_block_mem_MB += (info->ramblock_pages *
+DirtyStat.method.vm.total_block_mem_MB += (info->ramblock_pages *
  TARGET_PAGE_SIZE) >> 20;
 }
 
 static void update_dirtyrate(uint64_t msec)
 {
 uint64_t dirtyrate;
-uint64_t total_dirty_samples = DirtyStat.total_dirty_samples;
-uint64_t total_sample_count = DirtyStat.total_sample_count;
-uint64_t total_block_mem_MB = DirtyStat.total_block_mem_MB;
+uint64_t total_dirty_samples = DirtyStat.method.vm.total_dirty_samples;
+uint64_t total_sample_count = DirtyStat.method.vm.total_sample_count;
+uint64_t total_block_mem_MB = DirtyStat.method.vm.total_block_mem_MB;
 
 dirtyrate = total_dirty_samples * total_block_mem_MB *
 1000 / (total_sample_count * msec);
@@ -327,7 +333,7 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo 
*info,
 update_dirtyrate_stat(block_dinfo);
 }
 
-if (DirtyStat.total_sample_count == 0) {
+if (DirtyStat.method.vm.total_sample_count == 0) {
 return false;
 }
 
@@ -372,8 +378,6 @@ void *get_dirtyrate_thread(void *arg)
 struct DirtyRateConfig config = *(struct DirtyRateConfig *)arg;
 int ret;
 int64_t start_time;
-int64_t calc_time;
-uint64_t sample_pages;
 
 ret = dirtyrate_set_state(, DIRTY_RATE_STATUS_UNSTARTED,
   DIRTY_RATE_STATUS_MEASURING);
@@ -383,9 +387,7 @@ void *get_dirtyrate_thread(void *arg)
 }
 
 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
-calc_time = config.sample_period_seconds;
-sample_pages = config.sample_pages_per_gigabytes;
-init_dirtyrate_stat(start_time, calc_time, sample_pages);
+init_dirtyrate_stat(start_time, config);
 
 calculate_dirtyrate(config);
 
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index ec82716b40..af32e33d5d 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -59,17 +59,29 @@ struct RamblockDirtyInfo {
 uint32_t *hash_result; /* array of hash result for sampled pages */
 };
 
+typedef struct SampleVMStat {
+uint64_t total_dirty_samples; /* total dirty sampled page */
+uint64_t total_sample_count; /* total sampled pages */
+uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
+} SampleVMStat;
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
 /*
  * Store calculation statistics for each measure.
  */
 struct DirtyRateStat {
-uint64_t total_dirty_samples; /* total dirty sampled page */
-uint64_t total_sample_count; /* total sampled pages */
-uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
 int64_t dirty_rate; /* dirty rate in MB/s */
 int64_t start_time; /* calculation start time in units of second */
 int64_t calc_time; /* time duration of two sampling in units of second */
 uint64_t sample_pages; /* sample pages per GB */
+union {
+SampleVMStat vm;
+VcpuStat vcpu;
+} method;
 };
 
 void 

[PATCH v3 1/7] migration/dirtyrate: make sample page count configurable

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

introduce optional sample-pages argument in calc-dirty-rate,
making sample page count per GB configurable so that more
accurate dirtyrate can be calculated.

Signed-off-by: Hyman Huang(黄勇) 
---
 migration/dirtyrate.c | 31 +++
 migration/dirtyrate.h |  8 +++-
 qapi/migration.json   | 13 ++---
 3 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index ccb98147e8..2ee3890721 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -48,6 +48,12 @@ static bool is_sample_period_valid(int64_t sec)
 return true;
 }
 
+static bool is_sample_pages_valid(int64_t pages)
+{
+return pages >= MIN_SAMPLE_PAGE_COUNT &&
+   pages <= MAX_SAMPLE_PAGE_COUNT;
+}
+
 static int dirtyrate_set_state(int *state, int old_state, int new_state)
 {
 assert(new_state < DIRTY_RATE_STATUS__MAX);
@@ -72,13 +78,15 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 info->status = CalculatingState;
 info->start_time = DirtyStat.start_time;
 info->calc_time = DirtyStat.calc_time;
+info->sample_pages = DirtyStat.sample_pages;
 
 trace_query_dirty_rate_info(DirtyRateStatus_str(CalculatingState));
 
 return info;
 }
 
-static void init_dirtyrate_stat(int64_t start_time, int64_t calc_time)
+static void init_dirtyrate_stat(int64_t start_time, int64_t calc_time,
+uint64_t sample_pages)
 {
 DirtyStat.total_dirty_samples = 0;
 DirtyStat.total_sample_count = 0;
@@ -86,6 +94,7 @@ static void init_dirtyrate_stat(int64_t start_time, int64_t 
calc_time)
 DirtyStat.dirty_rate = -1;
 DirtyStat.start_time = start_time;
 DirtyStat.calc_time = calc_time;
+DirtyStat.sample_pages = sample_pages;
 }
 
 static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
@@ -361,6 +370,7 @@ void *get_dirtyrate_thread(void *arg)
 int ret;
 int64_t start_time;
 int64_t calc_time;
+uint64_t sample_pages;
 
 ret = dirtyrate_set_state(, DIRTY_RATE_STATUS_UNSTARTED,
   DIRTY_RATE_STATUS_MEASURING);
@@ -371,7 +381,8 @@ void *get_dirtyrate_thread(void *arg)
 
 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
 calc_time = config.sample_period_seconds;
-init_dirtyrate_stat(start_time, calc_time);
+sample_pages = config.sample_pages_per_gigabytes;
+init_dirtyrate_stat(start_time, calc_time, sample_pages);
 
 calculate_dirtyrate(config);
 
@@ -383,7 +394,8 @@ void *get_dirtyrate_thread(void *arg)
 return NULL;
 }
 
-void qmp_calc_dirty_rate(int64_t calc_time, Error **errp)
+void qmp_calc_dirty_rate(int64_t calc_time, bool has_sample_pages,
+ int64_t sample_pages, Error **errp)
 {
 static struct DirtyRateConfig config;
 QemuThread thread;
@@ -404,6 +416,17 @@ void qmp_calc_dirty_rate(int64_t calc_time, Error **errp)
 return;
 }
 
+if (has_sample_pages) {
+if (!is_sample_pages_valid(sample_pages)) {
+error_setg(errp, "sample-pages is out of range[%d, %d].",
+MIN_SAMPLE_PAGE_COUNT,
+MAX_SAMPLE_PAGE_COUNT);
+return;
+}
+} else {
+sample_pages = DIRTYRATE_DEFAULT_SAMPLE_PAGES;
+}
+
 /*
  * Init calculation state as unstarted.
  */
@@ -415,7 +438,7 @@ void qmp_calc_dirty_rate(int64_t calc_time, Error **errp)
 }
 
 config.sample_period_seconds = calc_time;
-config.sample_pages_per_gigabytes = DIRTYRATE_DEFAULT_SAMPLE_PAGES;
+config.sample_pages_per_gigabytes = sample_pages;
 qemu_thread_create(, "get_dirtyrate", get_dirtyrate_thread,
(void *), QEMU_THREAD_DETACHED);
 }
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 6ec429534d..e1fd29089e 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -15,7 +15,6 @@
 
 /*
  * Sample 512 pages per GB as default.
- * TODO: Make it configurable.
  */
 #define DIRTYRATE_DEFAULT_SAMPLE_PAGES512
 
@@ -35,6 +34,12 @@
 #define MIN_FETCH_DIRTYRATE_TIME_SEC  1
 #define MAX_FETCH_DIRTYRATE_TIME_SEC  60
 
+/*
+ * Take 1/16 pages in 1G as the maxmum sample page count
+ */
+#define MIN_SAMPLE_PAGE_COUNT 128
+#define MAX_SAMPLE_PAGE_COUNT 16384
+
 struct DirtyRateConfig {
 uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
 int64_t sample_period_seconds; /* time duration between two sampling */
@@ -63,6 +68,7 @@ struct DirtyRateStat {
 int64_t dirty_rate; /* dirty rate in MB/s */
 int64_t start_time; /* calculation start time in units of second */
 int64_t calc_time; /* time duration of two sampling in units of second */
+uint64_t sample_pages; /* sample pages per GB */
 };
 
 void *get_dirtyrate_thread(void *arg);
diff --git a/qapi/migration.json b/qapi/migration.json
index 

[PATCH v3 6/7] memory: make global_dirty_log a bitmask

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

dirty rate measurement may start or stop dirty logging during
calculation. this conflict with migration because stop dirty
log make migration leave dirty pages out then that'll be a problem.

make global_dirty_log a bitmask can let both migration and dirty
rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and
GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty log aims
for, migration or dirty rate.

all references to global_dirty_log should be untouched because any bit
set there should justify that global dirty logging is enabled.

Signed-off-by: Hyman Huang(黄勇) 
---
 include/exec/memory.h | 13 ++---
 migration/ram.c   |  8 
 softmmu/memory.c  | 36 +++-
 3 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c158fd7084..94c7088299 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -55,7 +55,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 }
 #endif
 
-extern bool global_dirty_log;
+#define GLOBAL_DIRTY_MIGRATION  (1U<<0)
+#define GLOBAL_DIRTY_DIRTY_RATE (1U<<1)
+
+extern int global_dirty_log;
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 
@@ -2099,13 +2102,17 @@ void memory_listener_unregister(MemoryListener 
*listener);
 
 /**
  * memory_global_dirty_log_start: begin dirty logging for all regions
+ *
+ * @flags: purpose of start dirty log, migration or dirty rate
  */
-void memory_global_dirty_log_start(void);
+void memory_global_dirty_log_start(int flags);
 
 /**
  * memory_global_dirty_log_stop: end dirty logging for all regions
+ *
+ * @flags: purpose of stop dirty log, migration or dirty rate
  */
-void memory_global_dirty_log_stop(void);
+void memory_global_dirty_log_stop(int flags);
 
 void mtree_info(bool flatview, bool dispatch_tree, bool owner, bool disabled);
 
diff --git a/migration/ram.c b/migration/ram.c
index 60ea913c54..9ce31af9d1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2190,7 +2190,7 @@ static void ram_save_cleanup(void *opaque)
 /* caller have hold iothread lock or is in a bh, so there is
  * no writing race against the migration bitmap
  */
-memory_global_dirty_log_stop();
+memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
 }
 
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
@@ -2652,7 +2652,7 @@ static void ram_init_bitmaps(RAMState *rs)
 ram_list_init_bitmaps();
 /* We don't use dirty log with background snapshots */
 if (!migrate_background_snapshot()) {
-memory_global_dirty_log_start();
+memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
 migration_bitmap_sync_precopy(rs);
 }
 }
@@ -3393,7 +3393,7 @@ void colo_incoming_start_dirty_log(void)
 /* Discard this dirty bitmap record */
 bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
 }
-memory_global_dirty_log_start();
+memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
 }
 ram_state->migration_dirty_pages = 0;
 qemu_mutex_unlock_ramlist();
@@ -3405,7 +3405,7 @@ void colo_release_ram_cache(void)
 {
 RAMBlock *block;
 
-memory_global_dirty_log_stop();
+memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 g_free(block->bmap);
 block->bmap = NULL;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index c19b0be6b1..b93baba82d 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -39,7 +39,7 @@
 static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
-bool global_dirty_log;
+int global_dirty_log;
 
 static QTAILQ_HEAD(, MemoryListener) memory_listeners
 = QTAILQ_HEAD_INITIALIZER(memory_listeners);
@@ -2659,14 +2659,20 @@ void memory_global_after_dirty_log_sync(void)
 
 static VMChangeStateEntry *vmstate_change;
 
-void memory_global_dirty_log_start(void)
+void memory_global_dirty_log_start(int flags)
 {
 if (vmstate_change) {
 qemu_del_vm_change_state_handler(vmstate_change);
 vmstate_change = NULL;
 }
 
-global_dirty_log = true;
+if (flags & GLOBAL_DIRTY_MIGRATION) {
+global_dirty_log |= GLOBAL_DIRTY_MIGRATION;
+}
+
+if (flags & GLOBAL_DIRTY_DIRTY_RATE) {
+global_dirty_log |= GLOBAL_DIRTY_DIRTY_RATE;
+}
 
 MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
 
@@ -2676,9 +2682,15 @@ void memory_global_dirty_log_start(void)
 memory_region_transaction_commit();
 }
 
-static void memory_global_dirty_log_do_stop(void)
+static void memory_global_dirty_log_do_stop(int flags)
 {
-global_dirty_log = false;
+if (flags & GLOBAL_DIRTY_MIGRATION) {
+global_dirty_log &= ~GLOBAL_DIRTY_MIGRATION;
+}
+
+if (flags & GLOBAL_DIRTY_DIRTY_RATE) {
+global_dirty_log &= ~GLOBAL_DIRTY_DIRTY_RATE;
+}
 
 /* Refresh 

[PATCH v3 2/7] hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds

2021-06-06 Thread huangy81
From: Peter Xu 

These two commands are missing when adding the QMP sister commands.
Add them, so developers can play with them easier.

Signed-off-by: Peter Xu 
Signed-off-by: Hyman Huang(黄勇) 
---
 hmp-commands-info.hx  | 13 
 hmp-commands.hx   | 14 +
 include/monitor/hmp.h |  2 ++
 migration/dirtyrate.c | 47 +++
 4 files changed, 76 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index b2347a6aea..fb59c27200 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -867,3 +867,16 @@ SRST
   ``info replay``
 Display the record/replay information: mode and the current icount.
 ERST
+
+{
+.name   = "dirty_rate",
+.args_type  = "",
+.params = "",
+.help   = "show dirty rate information",
+.cmd= hmp_info_dirty_rate,
+},
+
+SRST
+  ``info dirty_rate``
+Display the vcpu dirty rate information.
+ERST
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2d21fe5ad4..84dcc3aae6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1727,3 +1727,17 @@ ERST
 .flags  = "p",
 },
 
+SRST
+``calc_dirty_rate`` *second*
+  Start a round of dirty rate measurement with the period specified in 
*second*.
+  The result of the dirty rate measurement may be observed with ``info
+  dirty_rate`` command.
+ERST
+
+{
+.name   = "calc_dirty_rate",
+.args_type  = "second:l,sample_pages_per_GB:l?",
+.params = "second [sample_pages_per_GB]",
+.help   = "start a round of guest dirty rate measurement",
+.cmd= hmp_calc_dirty_rate,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 605d57287a..3baa1058e2 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -129,5 +129,7 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
 void hmp_replay_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
+void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 2ee3890721..320c56ba2c 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -20,6 +20,9 @@
 #include "ram.h"
 #include "trace.h"
 #include "dirtyrate.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
 
 static int CalculatingState = DIRTY_RATE_STATUS_UNSTARTED;
 static struct DirtyRateStat DirtyStat;
@@ -447,3 +450,47 @@ struct DirtyRateInfo *qmp_query_dirty_rate(Error **errp)
 {
 return query_dirty_rate_info();
 }
+
+void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
+{
+DirtyRateInfo *info = query_dirty_rate_info();
+
+monitor_printf(mon, "Status: %s\n",
+   DirtyRateStatus_str(info->status));
+monitor_printf(mon, "Start Time: %"PRIi64" (ms)\n",
+   info->start_time);
+monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
+   info->sample_pages);
+monitor_printf(mon, "Period: %"PRIi64" (sec)\n",
+   info->calc_time);
+monitor_printf(mon, "Dirty rate: ");
+if (info->has_dirty_rate) {
+monitor_printf(mon, "%"PRIi64" (MB/s)\n", info->dirty_rate);
+} else {
+monitor_printf(mon, "(not ready)\n");
+}
+g_free(info);
+}
+
+void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
+{
+int64_t sec = qdict_get_try_int(qdict, "second", 0);
+int64_t sample_pages = qdict_get_try_int(qdict, "sample_pages_per_GB", -1);
+bool has_sample_pages = (sample_pages != -1);
+Error *err = NULL;
+
+if (!sec) {
+monitor_printf(mon, "Incorrect period length specified!\n");
+return;
+}
+
+qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, );
+if (err) {
+hmp_handle_error(mon, err);
+return;
+}
+
+monitor_printf(mon, "Starting dirty rate measurement with period %"PRIi64
+   " seconds\n", sec);
+monitor_printf(mon, "[Please use 'info dirty_rate' to check results]\n");
+}
-- 
2.18.2




[PATCH v3 7/7] migration/dirtyrate: implement dirty-ring dirtyrate calculation

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

use dirty ring feature to implement dirtyrate calculation.
to enable it, set vcpu option as true in calc-dirty-rate.

add per_vcpu as mandatory option in calc_dirty_rate, to calculate
dirty rate for vcpu, and use hmp cmd:
(qemu) calc_dirty_rate 1 on

Signed-off-by: Hyman Huang(黄勇) 
---
 hmp-commands.hx|   7 +-
 migration/dirtyrate.c  | 226 ++---
 migration/trace-events |   5 +
 3 files changed, 220 insertions(+), 18 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 84dcc3aae6..cc24ab2ab1 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1736,8 +1736,9 @@ ERST
 
 {
 .name   = "calc_dirty_rate",
-.args_type  = "second:l,sample_pages_per_GB:l?",
-.params = "second [sample_pages_per_GB]",
-.help   = "start a round of guest dirty rate measurement",
+.args_type  = "second:l,per_vcpu:b,sample_pages_per_GB:l?",
+.params = "second on|off [sample_pages_per_GB]",
+.help   = "start a round of guest dirty rate measurement, "
+  "calculate for vcpu use on|off",
 .cmd= hmp_calc_dirty_rate,
 },
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 055145c24c..e432118f49 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -16,6 +16,9 @@
 #include "cpu.h"
 #include "exec/ramblock.h"
 #include "qemu/rcu_queue.h"
+#include "qemu/main-loop.h"
+#include "sysemu/kvm.h"
+#include "sysemu/runstate.h"
 #include "qapi/qapi-commands-migration.h"
 #include "ram.h"
 #include "trace.h"
@@ -23,9 +26,38 @@
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qdict.h"
+#include "exec/memory.h"
+
+typedef enum {
+CALC_NONE = 0,
+CALC_DIRTY_RING,
+CALC_SAMPLE_PAGES,
+} CalcMethod;
+
+typedef struct DirtyPageRecord {
+int64_t start_pages;
+int64_t end_pages;
+} DirtyPageRecord;
+
+static DirtyPageRecord *dirty_pages;
 
 static int CalculatingState = DIRTY_RATE_STATUS_UNSTARTED;
 static struct DirtyRateStat DirtyStat;
+static CalcMethod last_method = CALC_NONE;
+bool register_powerdown_callback = false;
+
+static void dirtyrate_powerdown_req(Notifier *n, void *opaque)
+{
+if (last_method == CALC_DIRTY_RING) {
+g_free(DirtyStat.method.vcpu.rates);
+DirtyStat.method.vcpu.rates = NULL;
+}
+trace_dirtyrate_powerdown_callback();
+}
+
+static Notifier dirtyrate_powerdown_notifier = {
+.notify = dirtyrate_powerdown_req
+};
 
 static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 {
@@ -72,6 +104,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 {
 int64_t dirty_rate = DirtyStat.dirty_rate;
 struct DirtyRateInfo *info = g_malloc0(sizeof(DirtyRateInfo));
+DirtyRateVcpuList *head = NULL, **tail = 
 
 if (qatomic_read() == DIRTY_RATE_STATUS_MEASURED) {
 info->has_dirty_rate = true;
@@ -81,7 +114,22 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 info->status = CalculatingState;
 info->start_time = DirtyStat.start_time;
 info->calc_time = DirtyStat.calc_time;
-info->sample_pages = DirtyStat.sample_pages;
+
+if (last_method == CALC_DIRTY_RING) {
+int i = 0;
+info->per_vcpu = true;
+info->has_vcpu_dirty_rate = true;
+for (i = 0; i < DirtyStat.method.vcpu.nvcpu; i++) {
+DirtyRateVcpu *rate = g_malloc0(sizeof(DirtyRateVcpu));
+rate->id = DirtyStat.method.vcpu.rates[i].id;
+rate->dirty_rate = DirtyStat.method.vcpu.rates[i].dirty_rate;
+QAPI_LIST_APPEND(tail, rate);
+}
+info->vcpu_dirty_rate = head;
+} else {
+info->has_sample_pages = true;
+info->sample_pages = DirtyStat.sample_pages;
+}
 
 trace_query_dirty_rate_info(DirtyRateStatus_str(CalculatingState));
 
@@ -94,15 +142,37 @@ static void init_dirtyrate_stat(int64_t start_time,
 DirtyStat.dirty_rate = -1;
 DirtyStat.start_time = start_time;
 DirtyStat.calc_time = config.sample_period_seconds;
-DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
-
-if (config.per_vcpu) {
-DirtyStat.method.vcpu.nvcpu = -1;
-DirtyStat.method.vcpu.rates = NULL;
-} else {
-DirtyStat.method.vm.total_dirty_samples = 0;
-DirtyStat.method.vm.total_sample_count = 0;
-DirtyStat.method.vm.total_block_mem_MB = 0;
+DirtyStat.sample_pages =
+config.per_vcpu ? -1 : config.sample_pages_per_gigabytes;
+
+if (unlikely(!register_powerdown_callback)) {
+qemu_register_powerdown_notifier(_powerdown_notifier);
+register_powerdown_callback = true;
+}
+
+switch (last_method) {
+case CALC_NONE:
+case CALC_SAMPLE_PAGES:
+if (config.per_vcpu) {
+DirtyStat.method.vcpu.nvcpu = -1;
+DirtyStat.method.vcpu.rates = NULL;
+} else {
+DirtyStat.method.vm.total_dirty_samples = 

[PATCH v3 4/7] migration/dirtyrate: add per-vcpu option for calc-dirty-rate

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

calculate dirtyrate for each vcpu if vcpu is true, add the
dirtyrate of each vcpu to the return value also.

Signed-off-by: Hyman Huang(黄勇) 
---
 migration/dirtyrate.c | 45 ++-
 migration/dirtyrate.h |  1 +
 qapi/migration.json   | 29 ++--
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 320c56ba2c..5947c688f6 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -397,8 +397,11 @@ void *get_dirtyrate_thread(void *arg)
 return NULL;
 }
 
-void qmp_calc_dirty_rate(int64_t calc_time, bool has_sample_pages,
- int64_t sample_pages, Error **errp)
+void qmp_calc_dirty_rate(int64_t calc_time,
+ bool per_vcpu,
+ bool has_sample_pages,
+ int64_t sample_pages,
+ Error **errp)
 {
 static struct DirtyRateConfig config;
 QemuThread thread;
@@ -419,6 +422,12 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 return;
 }
 
+if (has_sample_pages && per_vcpu) {
+error_setg(errp, "per-vcpu and sample-pages are mutually exclusive, "
+ "only one of then can be specified!\n");
+return;
+}
+
 if (has_sample_pages) {
 if (!is_sample_pages_valid(sample_pages)) {
 error_setg(errp, "sample-pages is out of range[%d, %d].",
@@ -430,6 +439,15 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 sample_pages = DIRTYRATE_DEFAULT_SAMPLE_PAGES;
 }
 
+/*
+ * Vcpu method only works when kvm dirty ring is enabled.
+ */
+if (per_vcpu && !kvm_dirty_ring_enabled()) {
+error_setg(errp, "dirty ring is disabled or conflict with migration"
+ "use sample method or remeasure later.");
+return;
+}
+
 /*
  * Init calculation state as unstarted.
  */
@@ -442,6 +460,7 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 
 config.sample_period_seconds = calc_time;
 config.sample_pages_per_gigabytes = sample_pages;
+config.per_vcpu = per_vcpu;
 qemu_thread_create(, "get_dirtyrate", get_dirtyrate_thread,
(void *), QEMU_THREAD_DETACHED);
 }
@@ -459,13 +478,22 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
DirtyRateStatus_str(info->status));
 monitor_printf(mon, "Start Time: %"PRIi64" (ms)\n",
info->start_time);
-monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
-   info->sample_pages);
 monitor_printf(mon, "Period: %"PRIi64" (sec)\n",
info->calc_time);
+if (info->has_sample_pages) {
+monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
+   info->sample_pages);
+}
 monitor_printf(mon, "Dirty rate: ");
 if (info->has_dirty_rate) {
 monitor_printf(mon, "%"PRIi64" (MB/s)\n", info->dirty_rate);
+if (info->per_vcpu && info->has_vcpu_dirty_rate) {
+DirtyRateVcpuList *rate, *head = info->vcpu_dirty_rate;
+for (rate = head; rate != NULL; rate = rate->next) {
+monitor_printf(mon, "vcpu[%"PRIi64"], Dirty rate: %"PRIi64"\n",
+   rate->value->id, rate->value->dirty_rate);
+}
+}
 } else {
 monitor_printf(mon, "(not ready)\n");
 }
@@ -477,6 +505,7 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
 int64_t sec = qdict_get_try_int(qdict, "second", 0);
 int64_t sample_pages = qdict_get_try_int(qdict, "sample_pages_per_GB", -1);
 bool has_sample_pages = (sample_pages != -1);
+bool per_vcpu = qdict_get_try_bool(qdict, "per_vcpu", false);
 Error *err = NULL;
 
 if (!sec) {
@@ -484,7 +513,13 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
 return;
 }
 
-qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, );
+if (has_sample_pages && per_vcpu) {
+monitor_printf(mon, "per_vcpu and sample_pages are mutually exclusive, 
"
+   "only one of then can be specified!\n");
+return;
+}
+
+qmp_calc_dirty_rate(sec, per_vcpu, has_sample_pages, sample_pages, );
 if (err) {
 hmp_handle_error(mon, err);
 return;
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index e1fd29089e..ec82716b40 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -43,6 +43,7 @@
 struct DirtyRateConfig {
 uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
 int64_t sample_period_seconds; /* time duration between two sampling */
+bool per_vcpu; /* calculate dirtyrate for each vcpu using dirty ring */
 };
 
 /*
diff --git a/qapi/migration.json b/qapi/migration.json
index 770ae54c17..7eef988182 100644
--- 

[PATCH v3 0/7] support dirtyrate at the granualrity of vcpu

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

v3:
- pick up "migration/dirtyrate: make sample page count configurable" to
  make patchset apply master correctly

v2:
- rebase to "migration/dirtyrate: make sample page count configurable"

- rename "vcpu" to "per_vcpu" to show the per-vcpu method

- squash patch 5/6 into a single one, squash patch 1/2 also 

- pick up "hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds"

- make global_dirty_log a bitmask to make sure both migration and dirty
  could not intefer with each other

- add memory free callback to prevent memory leaking 

the most different of v2 fron v1 is that we make the global_dirty_log a 
bitmask. the reason is dirty rate measurement may start or stop dirty
logging during calculation. this conflict with migration because stop
dirty log make migration leave dirty pages out then that'll be a
problem.

make global_dirty_log a bitmask can let both migration and dirty
rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and
GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty log aims
for, migration or dirty rate.

all references to global_dirty_log should be untouched because any bit
set there should justify that global dirty logging is enabled.

Please review, thanks !

v1:

Since the Dirty Ring on QEMU part has been merged recently, how to use
this feature is under consideration.

In the scene of migration, it is valuable to provide a more accurante
interface to track dirty memory than existing one, so that the upper
layer application can make a wise decision, or whatever. More
importantly,
dirtyrate info at the granualrity of vcpu could provide a possibility to
make migration convergent by imposing restriction on vcpu. With Dirty
Ring, we can calculate dirtyrate efficiently and cheaply.

The old interface implemented by sampling pages, it consumes cpu 
resource, and the larger guest memory size become, the more cpu resource
it consumes, namely, hard to scale. New interface has no such drawback.

Please review, thanks !

Best Regards !

Hyman Huang(黄勇) (6):
  migration/dirtyrate: make sample page count configurable
  KVM: introduce dirty_pages and kvm_dirty_ring_enabled
  migration/dirtyrate: add per-vcpu option for calc-dirty-rate
  migration/dirtyrate: adjust struct DirtyRateStat
  memory: make global_dirty_log a bitmask
  migration/dirtyrate: implement dirty-ring dirtyrate calculation

Peter Xu (1):
  hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds

 accel/kvm/kvm-all.c|   7 +
 hmp-commands-info.hx   |  13 ++
 hmp-commands.hx|  15 ++
 include/exec/memory.h  |  13 +-
 include/hw/core/cpu.h  |   1 +
 include/monitor/hmp.h  |   2 +
 include/sysemu/kvm.h   |   1 +
 migration/dirtyrate.c  | 347 ++---
 migration/dirtyrate.h  |  27 +++-
 migration/ram.c|   8 +-
 migration/trace-events |   5 +
 qapi/migration.json|  38 -
 softmmu/memory.c   |  36 +++--
 13 files changed, 468 insertions(+), 45 deletions(-)

-- 
2.18.2




[PATCH v3 3/7] KVM: introduce dirty_pages and kvm_dirty_ring_enabled

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

dirty_pages is used to calculate dirtyrate via dirty ring, when
enabled, kvm-reaper will increase the dirty pages after gfns
being dirtied.

kvm_dirty_ring_enabled shows if kvm-reaper is working. dirtyrate
thread could use it to check if measurement can base on dirty
ring feature.

Signed-off-by: Hyman Huang(黄勇) 
---
 accel/kvm/kvm-all.c   | 7 +++
 include/hw/core/cpu.h | 1 +
 include/sysemu/kvm.h  | 1 +
 3 files changed, 9 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c7ec538850..bc012f0bee 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -469,6 +469,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_fd = ret;
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -743,6 +744,7 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
 count++;
 }
 cpu->kvm_fetch_index = fetch;
+cpu->dirty_pages += count;
 
 return count;
 }
@@ -2293,6 +2295,11 @@ bool kvm_vcpu_id_is_valid(int vcpu_id)
 return vcpu_id >= 0 && vcpu_id < kvm_max_vcpu_id(s);
 }
 
+bool kvm_dirty_ring_enabled(void)
+{
+return kvm_state->kvm_dirty_ring_size ? true : false;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 4e0ea68efc..80fcb1d563 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -374,6 +374,7 @@ struct CPUState {
 struct kvm_run *kvm_run;
 struct kvm_dirty_gfn *kvm_dirty_gfns;
 uint32_t kvm_fetch_index;
+uint64_t dirty_pages;
 
 /* Used for events with 'vcpu' and *without* the 'disabled' properties */
 DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee12d..7b22aeb6ae 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -547,4 +547,5 @@ bool kvm_cpu_check_are_resettable(void);
 
 bool kvm_arch_cpu_check_are_resettable(void);
 
+bool kvm_dirty_ring_enabled(void);
 #endif
-- 
2.18.2




debug script for QEMU/KVM

2021-06-06 Thread Kallol Biswas
Hi,
   I am planning to develop an open source debug package that will
bring visibility to a KVM

It will navigate through the data structures and display information
in a text file and on a web page.

I have never worked on Qemu/KVM and so a guideline from a book like
Qiang Li's book "Qemu/KVM Source Code Analysis and Application" would
be great. Unfortunately there is no english version.

It seems the Qemu developers have blogs that I can follow to get
familiar with  the design and data structures of QEMU/KVM. I would
like to start with memory virtualization data structures.

How can I find Qiang Li's blog and other qemu-developers' blogs?



Re: [PATCH qemu v20] spapr: Implement Open Firmware client interface

2021-06-06 Thread BALATON Zoltan

On Fri, 4 Jun 2021, David Gibson wrote:

On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote:

On Wed, 2 Jun 2021, David Gibson wrote:

On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote:

On Thu, 27 May 2021, David Gibson wrote:

On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote:

On Tue, 25 May 2021, David Gibson wrote:

On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote:

On Mon, 24 May 2021, David Gibson wrote:

On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote:

On Sun, 23 May 2021, BALATON Zoltan wrote:
and would using sc 1 for hypercalls on pegasos2 cause other
problems later even if the assert could be removed?


At least in the short term, I think you probably can remove the
assert.  In your case the 'sc 1' calls aren't truly to a hypervisor,
but a special case escape to qemu for the firmware emulation.  I think
it's unlikely to cause problems later, because nothing on a 32-bit
system should be attempting an 'sc 1'.  The only thing I can think of
that would fail is some test case which explicitly verified that 'sc
1' triggered a 0x700 (SIGILL from userspace).


OK so the assert should check if the CPU has an HV bit. I think there was a
#detine for that somewhere that I can add to the assert then I can try that.
What I wasn't sure about is that sc 1 would conflict with the guest's usage
of normal sc calls or are these going through different paths and only sc 1
will trigger vhyp callback not affecting notmal sc calls?


The vhyp shouldn't affect normal system calls, 'sc 1' is specifically
for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the
vhyp only intercepts the hypercall version (after all Linux on PAPR
certainly uses its own system calls, and hypercalls are active for the
lifetime of the guest there).


(Or if this causes
an otherwise unnecessary VM exit on KVM even when it works then maybe
looking for a different way in the future might be needed.


What you're doing here won't work with KVM as it stands.  There are
basically two paths into the vhyp hypercall path: 1) from TCG, if we
interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get
a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path.

The second path is specific to the PAPR (ppc64) implementation of KVM,
and will not work for a non-PAPR platform without substantial
modification of the KVM code.


OK so then at that point when we try KVM we'll need to look at alternative
ways, I think MOL OSI worked with KVM at least in MOL but will probably make
all syscalls exit KVM but since we'll probably need to use KVM PR it will
exit anyway. For now I keep this vhyp as it does not run with KVM for other
reasons yet so that's another area to clean up so as a proof of concept
first version of using VOF vhyp will do.


Eh, since you'll need to modify KVM anyway, it probably makes just as
much sense to modify it to catch the 'sc 1' as MoL's magic thingy.


I'm not sure how KVM works for this case so I also don't know why and what
would need to be modified. I think we'll only have KVM PR working as newer
POWER CPUs having HV (besides being rare among potential users) are probably
too different to run the OSes that expect at most a G4 on pegasos2 so likely
it won't work with KVM HV.


Oh, it definitely won't work with KVM HV.


If we have KVM PR doesn't sc already trap so we
could add MOL OSI without further modification to KVM itself only needing
change in QEMU?


Uh... I guess so?


I also hope that MOL OSI could be useful for porting some
paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't
know about that for sure so I'm open to any other solution too.


Maybe.  I never know much about MOL to begin with, and anything I did
know was a decade or more ago so I've probably forgotten.


That may still be more than what I know about it since I never had any
knowledge about PPC KVM and don't have any PPC hardware to test with so I'm
mostly guessing. (I could test with KVM emulated in QEMU and I did set up an
environment for that but that's a bit slow and inconvenient so I'd leave KVM
support to those interested and have more knowledge and hardware for it.)


Sounds like a problem for someone else another time, then.


So now that it works on TCG with vhyp I tried what it would do on KVM PR 
with the sc 1 but I could only test that on QEMU itself running in a Linux 
guest. First I've hit missing this callback:


https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/kvm.c;h=104a308abb5700b2fe075397271f314d7f607543;hb=HEAD#l856

that I can fix by providing a callback in pegasos2.c that does what the 
else clause would do returning POWERPC_CPU(current_cpu)->env.spr[SPR_SDR1] 
(I guess that's the correct thing to do if it works without vhyp).


After getting past this, the host QEMU crashed on the first sc 1 call with 
this error:


qemu: fatal: Trying to deliver HV exception (MSR) 8 with no HV support

NIP 0148   LR 

[PATCH RFC] USB Video Class device emulation.

2021-06-06 Thread Raphael Lisicki
This continues work started by Natalia Portillo , as
submitted here: 
https://lists.nongnu.org/archive/html/qemu-devel/2010-06/msg01126.html

I have updated the changes so that the patch now applies to v4.2.1. Video
input has been expanded to support user-mode buffers so that an UVC device
itself can be used as input for this module.

My work on this has already stalled some weeks ago and I am submitting it in
case somebody might be interested.

Best regards
Raphael
---
 hw/usb/Makefile.objs |1 +
 hw/usb/usb-uvc.c | 1435 ++
 2 files changed, 1436 insertions(+)
 create mode 100644 hw/usb/usb-uvc.c

diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
index 303ac084a0..83e460af8a 100644
--- a/hw/usb/Makefile.objs
+++ b/hw/usb/Makefile.objs
@@ -26,6 +26,7 @@ common-obj-$(CONFIG_USB_AUDIO)+= dev-audio.o
 common-obj-$(CONFIG_USB_SERIAL)   += dev-serial.o
 common-obj-$(CONFIG_USB_NETWORK)  += dev-network.o
 common-obj-$(CONFIG_USB_BLUETOOTH)+= dev-bluetooth.o
+common-obj-$(CONFIG_LINUX)+= usb-uvc.o
 
 ifeq ($(CONFIG_USB_SMARTCARD),y)
 common-obj-y  += dev-smartcard-reader.o
diff --git a/hw/usb/usb-uvc.c b/hw/usb/usb-uvc.c
new file mode 100644
index 00..a7ffba68d8
--- /dev/null
+++ b/hw/usb/usb-uvc.c
@@ -0,0 +1,1435 @@
+/*
+ * USB Video Class Device emulation.
+ *
+ * Copyright (c) 2010 Claunia.com
+ * Written by Natalia Portillo 
+ *
+ * Based on hw/usb-hid.c:
+ * Copyright (c) 2005 Fabrice Bellard
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation in its version 2.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+#include "usb.h"
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "hw/usb.h"
+#include "hw/usb/desc.h"
+#include "hw/qdev-properties.h"
+#include "hw/hw.h"
+#include 
+#include 
+#include 
+#include "qapi/error.h"
+// V4L2 ioctls
+#include 
+#include 
+
+#define DEBUG_UVC
+
+#ifdef DEBUG_UVC
+#define DPRINTF(fmt, ...) \
+do { printf("usb-uvc: " fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) do {} while(0)
+#endif
+
+/* USB Video Class Request codes */
+#define USB_UVC_RC_UNDEFINED   0x00
+#define USB_UVC_SET_CUR0x01
+#define USB_UVC_GET_CUR0x81
+#define USB_UVC_GET_MIN0x82
+#define USB_UVC_GET_MAX0x83
+#define USB_UVC_GET_RES0x84
+#define USB_UVC_GET_LEN0x85
+#define USB_UVC_GET_INFO   0x86
+#define USB_UVC_GET_DEF0x87
+
+/* USB Video Class Request types */
+#define UVCSetVideoControl 0x2100
+#define UVCSetVideoStreaming   0x2200
+#define UVCGetVideoControl 0xA100
+#define UVCGetVideoStreaming   0xA200
+
+
+enum v4l2_api {
+   V4L2_API_UNKOWN = 0,
+   V4L2_API_READ = 1,
+   V4L2_API_USERPTRS = 2,
+};
+
+
+
+typedef struct USBUVCState {
+USBDevice dev;
+   charcurrent_input;
+   char*v4l2_device;
+   size_t  height;
+   size_t  width;
+
+
+   int v4l2_fd;
+   enum v4l2_api v4l2_api;
+   char *frame;
+   char *frame_start;
+   char* frame_storage[3];
+   int frame_id;
+   int frame_remaining_bytes;
+   int frame_max_length;
+
+   /* values in 16bit, as by UVC. Lets assume the 64bit values from V4L2 
fit. Report error if the assumption breaks. */
+   struct {
+   bool supported;
+   int16_t minimum;
+   int16_t maximum;
+   int16_t resolution;
+   int16_t default_value;
+   int16_t current_value;
+   } brightness;
+} USBUVCState;
+
+
+
+static void get_frame_read_api(USBUVCState *s)
+{
+   DPRINTF("Getting frame.\n");
+   s->frame = s->frame_start;
+   int frame_length = read(s->v4l2_fd, s->frame+2, s->frame_max_length);
+   
+   if(frame_length == -1)
+   {
+   DPRINTF("Error while reading frame. errno %d\n", errno);
+   }
+   else
+   {
+   s->frame[0] = 2;
+   s->frame_id = s->frame_id ^ 1;
+   s->frame[1] = 0x82 | s->frame_id;
+   s->frame_remaining_bytes = frame_length+2;
+   DPRINTF("Got a frame of %d bytes.\n", frame_length);
+   }
+   
+   return;
+}
+
+
+
+static void get_frame_userptrs_api(USBUVCState *s)
+{
+   struct v4l2_buffer buf;
+   memset(, 0, 

Re: [RFC PATCH 2/5] ppc/pegasos2: Introduce Pegasos2MachineState structure

2021-06-06 Thread BALATON Zoltan

On Sun, 6 Jun 2021, Philippe Mathieu-Daudé wrote:

On 6/6/21 5:46 PM, BALATON Zoltan wrote:

Add own machine state structure which will be used to store state
needed for firmware emulation.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/pegasos2.c | 50 +++
 1 file changed, 37 insertions(+), 13 deletions(-)



+struct Pegasos2MachineState {
+MachineState parent_obj;
+PowerPCCPU *cpu;
+DeviceState *mv;
+};
+
 static void pegasos2_cpu_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
@@ -51,9 +60,9 @@ static void pegasos2_cpu_reset(void *opaque)

 static void pegasos2_init(MachineState *machine)
 {
-PowerPCCPU *cpu = NULL;
+Pegasos2MachineState *pm = PEGASOS2_MACHINE(machine);
+CPUPPCState *env;
 MemoryRegion *rom = g_new(MemoryRegion, 1);


It would be nice to have the 'rom' variable also in the machine state.
Can be done later...


I've only moved vars to the new machine state that will be needed outside 
of the init method for VOF and rom isn't so that's kept local. It could be 
put there for consistency but not really needed so I've left it here.



Reviewed-by: Philippe Mathieu-Daudé 


Thanks. By the way, even though this series allows loading Linux and 
MorphOS without the board ROM I'd prefer to keep the test using the ROM as 
implemented in your series because that excercies the emulation more than 
booting without ROM as the firmware inits more devices and uses more 
facilities that VOF mostly bypasses so for a test the board firmware is 
more complete.


Regards,
BALATON Zoltan

Re: [PATCH 8/8] Fixes for seconday CPU start-up.

2021-06-06 Thread Richard Henderson

On 6/2/21 8:53 PM, Jason Thorpe wrote:

Changes to make secondary CPU start-up work on NetBSD, which depends
on some specific behavior in the architecture specification:

- Change the internal swppal() function to take the new VPTPTR and
   Procedure Value as explicit arguments.  Adapt do_start() to the
   new the new swppal() signature.

- In do_start_wait(), extract the new VPTPTR and PV from the relevant
   HWRPB fields, which will have been initialized by the OS, and pass
   them to swppal().

- In the SWPPAL PAL call, get the value to stuff into PV (r27) from
   a4 (r20), and add a comment describing why this implementation detail
   is allowed by the architecture specification.

Signed-off-by: Jason Thorpe
---
  init.c | 25 -
  pal.S  | 13 ++---
  2 files changed, 26 insertions(+), 12 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 0/8] PALcode fixes required to run NetBSD/alpha.

2021-06-06 Thread Richard Henderson

On 6/2/21 8:53 PM, Jason Thorpe wrote:

Jason Thorpe (8):
   Make qemu-palcode build environment standalone. NFC.
   Fix delivery of unaligned access exceptions.
   Fix initialization of the hwrpb.hwrpb.cpuid field.
   Make some PCI macros available to other files.  NFC.
   Fix incorrect initialization of PCI BARs.
   Provide interrupt mapping information in PCI config registers.
   Provide a Console Terminal Block in the HWRPB.
   Fixes for seconday CPU start-up.


Thanks.  Applied all but the Console Terminal Block patch, and pushed to my 
upstream github repo.


I'll wait for an update on the ctb patch before refreshing the image in the 
qemu repo.



r~



Re: [PATCH 7/8] Provide a Console Terminal Block in the HWRPB.

2021-06-06 Thread Jason Thorpe



> On Jun 6, 2021, at 12:27 PM, Richard Henderson  
> wrote:
> 
> On 6/2/21 8:53 PM, Jason Thorpe wrote:
>> +  hwrpb.hwrpb.ctbt_offset = offsetof(struct hwrpb_combine, ctb);
>> +  hwrpb.hwrpb.ctb_size = sizeof(hwrpb.ctb);
>> +  if (have_vga && !CONFIG_NOGRAPHICS(config))
>> +{
>> +  hwrpb.ctb.term_type = CTB_GRAPHICS;
>> +  hwrpb.ctb.turboslot = (CTB_TURBOSLOT_TYPE_PCI << 16) |
>> +(pci_vga_bus << 8) | pci_vga_dev;
>> +}
>> +  else
>> +{
>> +  hwrpb.ctb.term_type = CTB_PRINTERPORT;
>> +}
> 
> I'm concerned that you're initializing only 1 or 2 slots of 34.
> 
> It would seem that at a bare minimum the struct should be zeroed, and the 
> device-independent header (4 slots) should be set.

I'll rework it.

> I notice you're setting term_type (offset 56) and not type (offset 0), which 
> is where my documentation says that CTB_GRAPHICS goes (Console Interface 
> Architecture 2.3.8.2 Console Terminal Block Table).

It could be that the value was mirrored in both fields.  I'll investigate 
further.

> I'm also confused that this
> 
>> + * Format of the Console Terminal Block Type 4 `turboslot' field:
> 
> says "type 4", but you're actually using type 3 (GRAPHICS) above.

Yes.  The GRAPHICS type was originally just for the TURBOchannel systems, but 
when the first AlphaStations landed, SRM continued using GRAPHICS as the 
"term_type" ... it's entirely possible that the "type" field was in fact set to 
MULTIPURPOSE in that case.

> But I do see that what you're filling in is exactly what netbsd examines -- 
> no header checks, no size checks, or anything.  And that openbsd has an exact 
> copy of that code.

I'll see if I can figure out what Digital Unix does.

-- thorpej




[PATCH v2 5/6] memory: make global_dirty_log a bitmask

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

dirty rate measurement may start or stop dirty logging during
calculation. this conflict with migration because stop dirty
log make migration leave dirty pages out then that'll be a problem.

make global_dirty_log a bitmask can let both migration and dirty
rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and
GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty log aims
for, migration or dirty rate.

all references to global_dirty_log should be untouched because any bit
set there should justify that global dirty logging is enabled.

Signed-off-by: Hyman Huang(黄勇) 
---
 include/exec/memory.h | 13 ++---
 migration/ram.c   |  8 
 softmmu/memory.c  | 36 +++-
 3 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c158fd7084..94c7088299 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -55,7 +55,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 }
 #endif
 
-extern bool global_dirty_log;
+#define GLOBAL_DIRTY_MIGRATION  (1U<<0)
+#define GLOBAL_DIRTY_DIRTY_RATE (1U<<1)
+
+extern int global_dirty_log;
 
 typedef struct MemoryRegionOps MemoryRegionOps;
 
@@ -2099,13 +2102,17 @@ void memory_listener_unregister(MemoryListener 
*listener);
 
 /**
  * memory_global_dirty_log_start: begin dirty logging for all regions
+ *
+ * @flags: purpose of start dirty log, migration or dirty rate
  */
-void memory_global_dirty_log_start(void);
+void memory_global_dirty_log_start(int flags);
 
 /**
  * memory_global_dirty_log_stop: end dirty logging for all regions
+ *
+ * @flags: purpose of stop dirty log, migration or dirty rate
  */
-void memory_global_dirty_log_stop(void);
+void memory_global_dirty_log_stop(int flags);
 
 void mtree_info(bool flatview, bool dispatch_tree, bool owner, bool disabled);
 
diff --git a/migration/ram.c b/migration/ram.c
index 60ea913c54..9ce31af9d1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2190,7 +2190,7 @@ static void ram_save_cleanup(void *opaque)
 /* caller have hold iothread lock or is in a bh, so there is
  * no writing race against the migration bitmap
  */
-memory_global_dirty_log_stop();
+memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
 }
 
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
@@ -2652,7 +2652,7 @@ static void ram_init_bitmaps(RAMState *rs)
 ram_list_init_bitmaps();
 /* We don't use dirty log with background snapshots */
 if (!migrate_background_snapshot()) {
-memory_global_dirty_log_start();
+memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
 migration_bitmap_sync_precopy(rs);
 }
 }
@@ -3393,7 +3393,7 @@ void colo_incoming_start_dirty_log(void)
 /* Discard this dirty bitmap record */
 bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
 }
-memory_global_dirty_log_start();
+memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
 }
 ram_state->migration_dirty_pages = 0;
 qemu_mutex_unlock_ramlist();
@@ -3405,7 +3405,7 @@ void colo_release_ram_cache(void)
 {
 RAMBlock *block;
 
-memory_global_dirty_log_stop();
+memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 g_free(block->bmap);
 block->bmap = NULL;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index c19b0be6b1..b93baba82d 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -39,7 +39,7 @@
 static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
-bool global_dirty_log;
+int global_dirty_log;
 
 static QTAILQ_HEAD(, MemoryListener) memory_listeners
 = QTAILQ_HEAD_INITIALIZER(memory_listeners);
@@ -2659,14 +2659,20 @@ void memory_global_after_dirty_log_sync(void)
 
 static VMChangeStateEntry *vmstate_change;
 
-void memory_global_dirty_log_start(void)
+void memory_global_dirty_log_start(int flags)
 {
 if (vmstate_change) {
 qemu_del_vm_change_state_handler(vmstate_change);
 vmstate_change = NULL;
 }
 
-global_dirty_log = true;
+if (flags & GLOBAL_DIRTY_MIGRATION) {
+global_dirty_log |= GLOBAL_DIRTY_MIGRATION;
+}
+
+if (flags & GLOBAL_DIRTY_DIRTY_RATE) {
+global_dirty_log |= GLOBAL_DIRTY_DIRTY_RATE;
+}
 
 MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
 
@@ -2676,9 +2682,15 @@ void memory_global_dirty_log_start(void)
 memory_region_transaction_commit();
 }
 
-static void memory_global_dirty_log_do_stop(void)
+static void memory_global_dirty_log_do_stop(int flags)
 {
-global_dirty_log = false;
+if (flags & GLOBAL_DIRTY_MIGRATION) {
+global_dirty_log &= ~GLOBAL_DIRTY_MIGRATION;
+}
+
+if (flags & GLOBAL_DIRTY_DIRTY_RATE) {
+global_dirty_log &= ~GLOBAL_DIRTY_DIRTY_RATE;
+}
 
 /* Refresh 

[PATCH v2 3/6] migration/dirtyrate: add per-vcpu option for calc-dirty-rate

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

calculate dirtyrate for each vcpu if vcpu is true, add the
dirtyrate of each vcpu to the return value also.

Signed-off-by: Hyman Huang(黄勇) 
---
 migration/dirtyrate.c | 45 ++-
 migration/dirtyrate.h |  1 +
 qapi/migration.json   | 29 ++--
 3 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 320c56ba2c..5947c688f6 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -397,8 +397,11 @@ void *get_dirtyrate_thread(void *arg)
 return NULL;
 }
 
-void qmp_calc_dirty_rate(int64_t calc_time, bool has_sample_pages,
- int64_t sample_pages, Error **errp)
+void qmp_calc_dirty_rate(int64_t calc_time,
+ bool per_vcpu,
+ bool has_sample_pages,
+ int64_t sample_pages,
+ Error **errp)
 {
 static struct DirtyRateConfig config;
 QemuThread thread;
@@ -419,6 +422,12 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 return;
 }
 
+if (has_sample_pages && per_vcpu) {
+error_setg(errp, "per-vcpu and sample-pages are mutually exclusive, "
+ "only one of then can be specified!\n");
+return;
+}
+
 if (has_sample_pages) {
 if (!is_sample_pages_valid(sample_pages)) {
 error_setg(errp, "sample-pages is out of range[%d, %d].",
@@ -430,6 +439,15 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 sample_pages = DIRTYRATE_DEFAULT_SAMPLE_PAGES;
 }
 
+/*
+ * Vcpu method only works when kvm dirty ring is enabled.
+ */
+if (per_vcpu && !kvm_dirty_ring_enabled()) {
+error_setg(errp, "dirty ring is disabled or conflict with migration"
+ "use sample method or remeasure later.");
+return;
+}
+
 /*
  * Init calculation state as unstarted.
  */
@@ -442,6 +460,7 @@ void qmp_calc_dirty_rate(int64_t calc_time, bool 
has_sample_pages,
 
 config.sample_period_seconds = calc_time;
 config.sample_pages_per_gigabytes = sample_pages;
+config.per_vcpu = per_vcpu;
 qemu_thread_create(, "get_dirtyrate", get_dirtyrate_thread,
(void *), QEMU_THREAD_DETACHED);
 }
@@ -459,13 +478,22 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
DirtyRateStatus_str(info->status));
 monitor_printf(mon, "Start Time: %"PRIi64" (ms)\n",
info->start_time);
-monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
-   info->sample_pages);
 monitor_printf(mon, "Period: %"PRIi64" (sec)\n",
info->calc_time);
+if (info->has_sample_pages) {
+monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
+   info->sample_pages);
+}
 monitor_printf(mon, "Dirty rate: ");
 if (info->has_dirty_rate) {
 monitor_printf(mon, "%"PRIi64" (MB/s)\n", info->dirty_rate);
+if (info->per_vcpu && info->has_vcpu_dirty_rate) {
+DirtyRateVcpuList *rate, *head = info->vcpu_dirty_rate;
+for (rate = head; rate != NULL; rate = rate->next) {
+monitor_printf(mon, "vcpu[%"PRIi64"], Dirty rate: %"PRIi64"\n",
+   rate->value->id, rate->value->dirty_rate);
+}
+}
 } else {
 monitor_printf(mon, "(not ready)\n");
 }
@@ -477,6 +505,7 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
 int64_t sec = qdict_get_try_int(qdict, "second", 0);
 int64_t sample_pages = qdict_get_try_int(qdict, "sample_pages_per_GB", -1);
 bool has_sample_pages = (sample_pages != -1);
+bool per_vcpu = qdict_get_try_bool(qdict, "per_vcpu", false);
 Error *err = NULL;
 
 if (!sec) {
@@ -484,7 +513,13 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
 return;
 }
 
-qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, );
+if (has_sample_pages && per_vcpu) {
+monitor_printf(mon, "per_vcpu and sample_pages are mutually exclusive, 
"
+   "only one of then can be specified!\n");
+return;
+}
+
+qmp_calc_dirty_rate(sec, per_vcpu, has_sample_pages, sample_pages, );
 if (err) {
 hmp_handle_error(mon, err);
 return;
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index e1fd29089e..ec82716b40 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -43,6 +43,7 @@
 struct DirtyRateConfig {
 uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
 int64_t sample_period_seconds; /* time duration between two sampling */
+bool per_vcpu; /* calculate dirtyrate for each vcpu using dirty ring */
 };
 
 /*
diff --git a/qapi/migration.json b/qapi/migration.json
index 770ae54c17..7eef988182 100644
--- 

[PATCH v2 6/6] migration/dirtyrate: implement dirty-ring dirtyrate calculation

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

use dirty ring feature to implement dirtyrate calculation.
to enable it, set vcpu option as true in calc-dirty-rate.

add per_vcpu as mandatory option in calc_dirty_rate, to calculate
dirty rate for vcpu, and use hmp cmd:
(qemu) calc_dirty_rate 1 on

Signed-off-by: Hyman Huang(黄勇) 
---
 hmp-commands.hx|   7 +-
 migration/dirtyrate.c  | 226 ++---
 migration/trace-events |   5 +
 3 files changed, 220 insertions(+), 18 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 84dcc3aae6..cc24ab2ab1 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1736,8 +1736,9 @@ ERST
 
 {
 .name   = "calc_dirty_rate",
-.args_type  = "second:l,sample_pages_per_GB:l?",
-.params = "second [sample_pages_per_GB]",
-.help   = "start a round of guest dirty rate measurement",
+.args_type  = "second:l,per_vcpu:b,sample_pages_per_GB:l?",
+.params = "second on|off [sample_pages_per_GB]",
+.help   = "start a round of guest dirty rate measurement, "
+  "calculate for vcpu use on|off",
 .cmd= hmp_calc_dirty_rate,
 },
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 055145c24c..e432118f49 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -16,6 +16,9 @@
 #include "cpu.h"
 #include "exec/ramblock.h"
 #include "qemu/rcu_queue.h"
+#include "qemu/main-loop.h"
+#include "sysemu/kvm.h"
+#include "sysemu/runstate.h"
 #include "qapi/qapi-commands-migration.h"
 #include "ram.h"
 #include "trace.h"
@@ -23,9 +26,38 @@
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qdict.h"
+#include "exec/memory.h"
+
+typedef enum {
+CALC_NONE = 0,
+CALC_DIRTY_RING,
+CALC_SAMPLE_PAGES,
+} CalcMethod;
+
+typedef struct DirtyPageRecord {
+int64_t start_pages;
+int64_t end_pages;
+} DirtyPageRecord;
+
+static DirtyPageRecord *dirty_pages;
 
 static int CalculatingState = DIRTY_RATE_STATUS_UNSTARTED;
 static struct DirtyRateStat DirtyStat;
+static CalcMethod last_method = CALC_NONE;
+bool register_powerdown_callback = false;
+
+static void dirtyrate_powerdown_req(Notifier *n, void *opaque)
+{
+if (last_method == CALC_DIRTY_RING) {
+g_free(DirtyStat.method.vcpu.rates);
+DirtyStat.method.vcpu.rates = NULL;
+}
+trace_dirtyrate_powerdown_callback();
+}
+
+static Notifier dirtyrate_powerdown_notifier = {
+.notify = dirtyrate_powerdown_req
+};
 
 static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
 {
@@ -72,6 +104,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 {
 int64_t dirty_rate = DirtyStat.dirty_rate;
 struct DirtyRateInfo *info = g_malloc0(sizeof(DirtyRateInfo));
+DirtyRateVcpuList *head = NULL, **tail = 
 
 if (qatomic_read() == DIRTY_RATE_STATUS_MEASURED) {
 info->has_dirty_rate = true;
@@ -81,7 +114,22 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 info->status = CalculatingState;
 info->start_time = DirtyStat.start_time;
 info->calc_time = DirtyStat.calc_time;
-info->sample_pages = DirtyStat.sample_pages;
+
+if (last_method == CALC_DIRTY_RING) {
+int i = 0;
+info->per_vcpu = true;
+info->has_vcpu_dirty_rate = true;
+for (i = 0; i < DirtyStat.method.vcpu.nvcpu; i++) {
+DirtyRateVcpu *rate = g_malloc0(sizeof(DirtyRateVcpu));
+rate->id = DirtyStat.method.vcpu.rates[i].id;
+rate->dirty_rate = DirtyStat.method.vcpu.rates[i].dirty_rate;
+QAPI_LIST_APPEND(tail, rate);
+}
+info->vcpu_dirty_rate = head;
+} else {
+info->has_sample_pages = true;
+info->sample_pages = DirtyStat.sample_pages;
+}
 
 trace_query_dirty_rate_info(DirtyRateStatus_str(CalculatingState));
 
@@ -94,15 +142,37 @@ static void init_dirtyrate_stat(int64_t start_time,
 DirtyStat.dirty_rate = -1;
 DirtyStat.start_time = start_time;
 DirtyStat.calc_time = config.sample_period_seconds;
-DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
-
-if (config.per_vcpu) {
-DirtyStat.method.vcpu.nvcpu = -1;
-DirtyStat.method.vcpu.rates = NULL;
-} else {
-DirtyStat.method.vm.total_dirty_samples = 0;
-DirtyStat.method.vm.total_sample_count = 0;
-DirtyStat.method.vm.total_block_mem_MB = 0;
+DirtyStat.sample_pages =
+config.per_vcpu ? -1 : config.sample_pages_per_gigabytes;
+
+if (unlikely(!register_powerdown_callback)) {
+qemu_register_powerdown_notifier(_powerdown_notifier);
+register_powerdown_callback = true;
+}
+
+switch (last_method) {
+case CALC_NONE:
+case CALC_SAMPLE_PAGES:
+if (config.per_vcpu) {
+DirtyStat.method.vcpu.nvcpu = -1;
+DirtyStat.method.vcpu.rates = NULL;
+} else {
+DirtyStat.method.vm.total_dirty_samples = 

[PATCH v2 4/6] migration/dirtyrate: adjust struct DirtyRateStat

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

use union to store stat data of two mutual exclusive methods.

Signed-off-by: Hyman Huang(黄勇) 
---
 migration/dirtyrate.c | 40 +---
 migration/dirtyrate.h | 18 +++---
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 5947c688f6..055145c24c 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -88,33 +88,39 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 return info;
 }
 
-static void init_dirtyrate_stat(int64_t start_time, int64_t calc_time,
-uint64_t sample_pages)
+static void init_dirtyrate_stat(int64_t start_time,
+struct DirtyRateConfig config)
 {
-DirtyStat.total_dirty_samples = 0;
-DirtyStat.total_sample_count = 0;
-DirtyStat.total_block_mem_MB = 0;
 DirtyStat.dirty_rate = -1;
 DirtyStat.start_time = start_time;
-DirtyStat.calc_time = calc_time;
-DirtyStat.sample_pages = sample_pages;
+DirtyStat.calc_time = config.sample_period_seconds;
+DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
+
+if (config.per_vcpu) {
+DirtyStat.method.vcpu.nvcpu = -1;
+DirtyStat.method.vcpu.rates = NULL;
+} else {
+DirtyStat.method.vm.total_dirty_samples = 0;
+DirtyStat.method.vm.total_sample_count = 0;
+DirtyStat.method.vm.total_block_mem_MB = 0;
+}
 }
 
 static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
 {
-DirtyStat.total_dirty_samples += info->sample_dirty_count;
-DirtyStat.total_sample_count += info->sample_pages_count;
+DirtyStat.method.vm.total_dirty_samples += info->sample_dirty_count;
+DirtyStat.method.vm.total_sample_count += info->sample_pages_count;
 /* size of total pages in MB */
-DirtyStat.total_block_mem_MB += (info->ramblock_pages *
+DirtyStat.method.vm.total_block_mem_MB += (info->ramblock_pages *
  TARGET_PAGE_SIZE) >> 20;
 }
 
 static void update_dirtyrate(uint64_t msec)
 {
 uint64_t dirtyrate;
-uint64_t total_dirty_samples = DirtyStat.total_dirty_samples;
-uint64_t total_sample_count = DirtyStat.total_sample_count;
-uint64_t total_block_mem_MB = DirtyStat.total_block_mem_MB;
+uint64_t total_dirty_samples = DirtyStat.method.vm.total_dirty_samples;
+uint64_t total_sample_count = DirtyStat.method.vm.total_sample_count;
+uint64_t total_block_mem_MB = DirtyStat.method.vm.total_block_mem_MB;
 
 dirtyrate = total_dirty_samples * total_block_mem_MB *
 1000 / (total_sample_count * msec);
@@ -327,7 +333,7 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo 
*info,
 update_dirtyrate_stat(block_dinfo);
 }
 
-if (DirtyStat.total_sample_count == 0) {
+if (DirtyStat.method.vm.total_sample_count == 0) {
 return false;
 }
 
@@ -372,8 +378,6 @@ void *get_dirtyrate_thread(void *arg)
 struct DirtyRateConfig config = *(struct DirtyRateConfig *)arg;
 int ret;
 int64_t start_time;
-int64_t calc_time;
-uint64_t sample_pages;
 
 ret = dirtyrate_set_state(, DIRTY_RATE_STATUS_UNSTARTED,
   DIRTY_RATE_STATUS_MEASURING);
@@ -383,9 +387,7 @@ void *get_dirtyrate_thread(void *arg)
 }
 
 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
-calc_time = config.sample_period_seconds;
-sample_pages = config.sample_pages_per_gigabytes;
-init_dirtyrate_stat(start_time, calc_time, sample_pages);
+init_dirtyrate_stat(start_time, config);
 
 calculate_dirtyrate(config);
 
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index ec82716b40..af32e33d5d 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -59,17 +59,29 @@ struct RamblockDirtyInfo {
 uint32_t *hash_result; /* array of hash result for sampled pages */
 };
 
+typedef struct SampleVMStat {
+uint64_t total_dirty_samples; /* total dirty sampled page */
+uint64_t total_sample_count; /* total sampled pages */
+uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
+} SampleVMStat;
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
 /*
  * Store calculation statistics for each measure.
  */
 struct DirtyRateStat {
-uint64_t total_dirty_samples; /* total dirty sampled page */
-uint64_t total_sample_count; /* total sampled pages */
-uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
 int64_t dirty_rate; /* dirty rate in MB/s */
 int64_t start_time; /* calculation start time in units of second */
 int64_t calc_time; /* time duration of two sampling in units of second */
 uint64_t sample_pages; /* sample pages per GB */
+union {
+SampleVMStat vm;
+VcpuStat vcpu;
+} method;
 };
 
 void 

[PATCH v2 2/6] KVM: introduce dirty_pages and kvm_dirty_ring_enabled

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

dirty_pages is used to calculate dirtyrate via dirty ring, when
enabled, kvm-reaper will increase the dirty pages after gfns
being dirtied.

kvm_dirty_ring_enabled shows if kvm-reaper is working. dirtyrate
thread could use it to check if measurement can base on dirty
ring feature.

Signed-off-by: Hyman Huang(黄勇) 
---
 accel/kvm/kvm-all.c   | 7 +++
 include/hw/core/cpu.h | 1 +
 include/sysemu/kvm.h  | 1 +
 3 files changed, 9 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c7ec538850..bc012f0bee 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -469,6 +469,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_fd = ret;
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -743,6 +744,7 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
 count++;
 }
 cpu->kvm_fetch_index = fetch;
+cpu->dirty_pages += count;
 
 return count;
 }
@@ -2293,6 +2295,11 @@ bool kvm_vcpu_id_is_valid(int vcpu_id)
 return vcpu_id >= 0 && vcpu_id < kvm_max_vcpu_id(s);
 }
 
+bool kvm_dirty_ring_enabled(void)
+{
+return kvm_state->kvm_dirty_ring_size ? true : false;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 4e0ea68efc..80fcb1d563 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -374,6 +374,7 @@ struct CPUState {
 struct kvm_run *kvm_run;
 struct kvm_dirty_gfn *kvm_dirty_gfns;
 uint32_t kvm_fetch_index;
+uint64_t dirty_pages;
 
 /* Used for events with 'vcpu' and *without* the 'disabled' properties */
 DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee12d..7b22aeb6ae 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -547,4 +547,5 @@ bool kvm_cpu_check_are_resettable(void);
 
 bool kvm_arch_cpu_check_are_resettable(void);
 
+bool kvm_dirty_ring_enabled(void);
 #endif
-- 
2.18.2




[PATCH v2 1/6] hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds

2021-06-06 Thread huangy81
From: Peter Xu 

These two commands are missing when adding the QMP sister commands.
Add them, so developers can play with them easier.

Signed-off-by: Peter Xu 
Signed-off-by: Hyman Huang(黄勇) 
---
 hmp-commands-info.hx  | 13 
 hmp-commands.hx   | 14 +
 include/monitor/hmp.h |  2 ++
 migration/dirtyrate.c | 47 +++
 4 files changed, 76 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index b2347a6aea..fb59c27200 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -867,3 +867,16 @@ SRST
   ``info replay``
 Display the record/replay information: mode and the current icount.
 ERST
+
+{
+.name   = "dirty_rate",
+.args_type  = "",
+.params = "",
+.help   = "show dirty rate information",
+.cmd= hmp_info_dirty_rate,
+},
+
+SRST
+  ``info dirty_rate``
+Display the vcpu dirty rate information.
+ERST
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2d21fe5ad4..84dcc3aae6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1727,3 +1727,17 @@ ERST
 .flags  = "p",
 },
 
+SRST
+``calc_dirty_rate`` *second*
+  Start a round of dirty rate measurement with the period specified in 
*second*.
+  The result of the dirty rate measurement may be observed with ``info
+  dirty_rate`` command.
+ERST
+
+{
+.name   = "calc_dirty_rate",
+.args_type  = "second:l,sample_pages_per_GB:l?",
+.params = "second [sample_pages_per_GB]",
+.help   = "start a round of guest dirty rate measurement",
+.cmd= hmp_calc_dirty_rate,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 605d57287a..3baa1058e2 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -129,5 +129,7 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
 void hmp_replay_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
+void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 2ee3890721..320c56ba2c 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -20,6 +20,9 @@
 #include "ram.h"
 #include "trace.h"
 #include "dirtyrate.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
 
 static int CalculatingState = DIRTY_RATE_STATUS_UNSTARTED;
 static struct DirtyRateStat DirtyStat;
@@ -447,3 +450,47 @@ struct DirtyRateInfo *qmp_query_dirty_rate(Error **errp)
 {
 return query_dirty_rate_info();
 }
+
+void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
+{
+DirtyRateInfo *info = query_dirty_rate_info();
+
+monitor_printf(mon, "Status: %s\n",
+   DirtyRateStatus_str(info->status));
+monitor_printf(mon, "Start Time: %"PRIi64" (ms)\n",
+   info->start_time);
+monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
+   info->sample_pages);
+monitor_printf(mon, "Period: %"PRIi64" (sec)\n",
+   info->calc_time);
+monitor_printf(mon, "Dirty rate: ");
+if (info->has_dirty_rate) {
+monitor_printf(mon, "%"PRIi64" (MB/s)\n", info->dirty_rate);
+} else {
+monitor_printf(mon, "(not ready)\n");
+}
+g_free(info);
+}
+
+void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
+{
+int64_t sec = qdict_get_try_int(qdict, "second", 0);
+int64_t sample_pages = qdict_get_try_int(qdict, "sample_pages_per_GB", -1);
+bool has_sample_pages = (sample_pages != -1);
+Error *err = NULL;
+
+if (!sec) {
+monitor_printf(mon, "Incorrect period length specified!\n");
+return;
+}
+
+qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, );
+if (err) {
+hmp_handle_error(mon, err);
+return;
+}
+
+monitor_printf(mon, "Starting dirty rate measurement with period %"PRIi64
+   " seconds\n", sec);
+monitor_printf(mon, "[Please use 'info dirty_rate' to check results]\n");
+}
-- 
2.18.2




[PATCH v2 0/6] support dirtyrate at the granualrity of vcpu

2021-06-06 Thread huangy81
From: Hyman Huang(黄勇) 

v2:
- rebase to "migration/dirtyrate: make sample page count configurable"

- rename "vcpu" to "per_vcpu" to show the per-vcpu method

- squash patch 5/6 into a single one, squash patch 1/2 also 

- pick up "hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds"

- make global_dirty_log a bitmask to make sure both migration and dirty
  could not intefer with each other

- add memory free callback to prevent memory leaking 

the most different of v2 fron v1 is that we make the global_dirty_log a 
bitmask. the reason is dirty rate measurement may start or stop dirty
logging during calculation. this conflict with migration because stop
dirty log make migration leave dirty pages out then that'll be a problem.

make global_dirty_log a bitmask can let both migration and dirty
rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and
GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty log aims
for, migration or dirty rate.

all references to global_dirty_log should be untouched because any bit
set there should justify that global dirty logging is enabled.

v1:

Since the Dirty Ring on QEMU part has been merged recently, how to use
this feature is under consideration.

In the scene of migration, it is valuable to provide a more accurante
interface to track dirty memory than existing one, so that the upper
layer application can make a wise decision, or whatever. More
importantly,
dirtyrate info at the granualrity of vcpu could provide a possibility to
make migration convergent by imposing restriction on vcpu. With Dirty
Ring, we can calculate dirtyrate efficiently and cheaply.

The old interface implemented by sampling pages, it consumes cpu 
resource, and the larger guest memory size become, the more cpu resource
it consumes, namely, hard to scale. New interface has no such drawback.

Please review, thanks !

Best Regards !

Hyman Huang(黄勇) (5):
  KVM: introduce dirty_pages and kvm_dirty_ring_enabled
  migration/dirtyrate: add per-vcpu option for calc-dirty-rate
  migration/dirtyrate: adjust struct DirtyRateStat
  memory: make global_dirty_log a bitmask
  migration/dirtyrate: implement dirty-ring dirtyrate calculation

Peter Xu (1):
  hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds

 accel/kvm/kvm-all.c|   7 +
 hmp-commands-info.hx   |  13 ++
 hmp-commands.hx|  15 ++
 include/exec/memory.h  |  13 +-
 include/hw/core/cpu.h  |   1 +
 include/monitor/hmp.h  |   2 +
 include/sysemu/kvm.h   |   1 +
 migration/dirtyrate.c  | 334 +
 migration/dirtyrate.h  |  19 ++-
 migration/ram.c|   8 +-
 migration/trace-events |   5 +
 qapi/migration.json|  29 +++-
 softmmu/memory.c   |  36 +++--
 13 files changed, 435 insertions(+), 48 deletions(-)

-- 
2.18.2




Re: [PATCH 7/8] Provide a Console Terminal Block in the HWRPB.

2021-06-06 Thread Richard Henderson

On 6/2/21 8:53 PM, Jason Thorpe wrote:

+  hwrpb.hwrpb.ctbt_offset = offsetof(struct hwrpb_combine, ctb);
+  hwrpb.hwrpb.ctb_size = sizeof(hwrpb.ctb);
+  if (have_vga && !CONFIG_NOGRAPHICS(config))
+{
+  hwrpb.ctb.term_type = CTB_GRAPHICS;
+  hwrpb.ctb.turboslot = (CTB_TURBOSLOT_TYPE_PCI << 16) |
+   (pci_vga_bus << 8) | pci_vga_dev;
+}
+  else
+{
+  hwrpb.ctb.term_type = CTB_PRINTERPORT;
+}


I'm concerned that you're initializing only 1 or 2 slots of 34.

It would seem that at a bare minimum the struct should be zeroed, and the 
device-independent header (4 slots) should be set.


I notice you're setting term_type (offset 56) and not type (offset 0), which is 
where my documentation says that CTB_GRAPHICS goes (Console Interface 
Architecture 2.3.8.2 Console Terminal Block Table).


I'm also confused that this


+ * Format of the Console Terminal Block Type 4 `turboslot' field:


says "type 4", but you're actually using type 3 (GRAPHICS) above.

But I do see that what you're filling in is exactly what netbsd examines -- no 
header checks, no size checks, or anything.  And that openbsd has an exact copy 
of that code.



r~



Re: [RFC PATCH 2/5] ppc/pegasos2: Introduce Pegasos2MachineState structure

2021-06-06 Thread Philippe Mathieu-Daudé
On 6/6/21 5:46 PM, BALATON Zoltan wrote:
> Add own machine state structure which will be used to store state
> needed for firmware emulation.
> 
> Signed-off-by: BALATON Zoltan 
> ---
>  hw/ppc/pegasos2.c | 50 +++
>  1 file changed, 37 insertions(+), 13 deletions(-)

> +struct Pegasos2MachineState {
> +MachineState parent_obj;
> +PowerPCCPU *cpu;
> +DeviceState *mv;
> +};
> +
>  static void pegasos2_cpu_reset(void *opaque)
>  {
>  PowerPCCPU *cpu = opaque;
> @@ -51,9 +60,9 @@ static void pegasos2_cpu_reset(void *opaque)
>  
>  static void pegasos2_init(MachineState *machine)
>  {
> -PowerPCCPU *cpu = NULL;
> +Pegasos2MachineState *pm = PEGASOS2_MACHINE(machine);
> +CPUPPCState *env;
>  MemoryRegion *rom = g_new(MemoryRegion, 1);

It would be nice to have the 'rom' variable also in the machine state.
Can be done later...

Reviewed-by: Philippe Mathieu-Daudé 



[Bug 1920013] Re: Unable to pass-through PCIe devices from a ppc64le host to an x86_64 guest

2021-06-06 Thread cyrozap
I've moved this bug over to GitLab here: https://gitlab.com/qemu-
project/qemu/-/issues/391

** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #391
   https://gitlab.com/qemu-project/qemu/-/issues/391

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1920013

Title:
  Unable to pass-through PCIe devices from a ppc64le host to an x86_64
  guest

Status in QEMU:
  Incomplete

Bug description:
  Attempting to pass through a PCIe device from a ppc64le host to an
  x86_64 guest with QEMU v5.2.0-3031-g571d413b5d (built from git master)
  fails with the following error:

  include/exec/memory.h:43:IOMMU_MEMORY_REGION: Object 0x10438eb00
  is not an instance of type qemu:iommu-memory-region

  To reproduce this issue, simply run the following command on a POWER9
  system:

  qemu-system-x86_64 -machine q35 -device vfio-pci,host=$DBSF

  Where $DBSF is a domain:bus:slot.function PCIe device address.

  This also fails with QEMU 3.1.0 (from Debian Buster), so I assume this
  has never worked. Helpfully, the error message it prints seems to
  indicate where the problem is:

  hw/vfio/spapr.c:147:vfio_spapr_create_window: Object 0x164473510
  is not an instance of type qemu:iommu-memory-region

  My kernel (Linux v5.8.0 plus some small unrelated patches) is built
  with the page size set to 4k, so this issue shouldn't be due to a page
  size mismatch. And as I stated earlier, my host arch is ppc64le, so it
  shouldn't be an endianness issue, either.

  I assume this should be possible (in theory) since I've seen reports
  of others getting PCIe passthrough working with aarch64 guests on
  x86_64 hosts, but of course that (passthrough to weird guest arch on
  x86) is somewhat the opposite of what I'm trying to do (passthrough to
  x86 on weird host arch) so I don't know for sure. If it is possible,
  I'm willing to develop a fix myself, but I'm almost completely
  unfamiliar with QEMU's internals so if anyone has any advice on where
  to start I'd greatly appreciate it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1920013/+subscriptions



Re: [PATCH 3/3] target/arm: Diagnose UNALLOCATED in disas_simd_three_reg_same_fp16

2021-06-06 Thread Philippe Mathieu-Daudé
On 6/4/21 8:35 PM, Richard Henderson wrote:
> This fprintf+assert has been in place since the beginning.
> It is after to the fp_access_check, so we need to move the
> check up.  Fold that in to the pairwise filter.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate-a64.c | 82 +++---
>  1 file changed, 50 insertions(+), 32 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH 2/3] target/arm: Remove fprintf from disas_simd_mod_imm

2021-06-06 Thread Philippe Mathieu-Daudé
On 6/4/21 8:35 PM, Richard Henderson wrote:
> The default of this switch is truly unreachable.
> The switch selector is 3 bits, and all 8 cases are present.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate-a64.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH 1/3] target/arm: Diagnose UNALLOCATED in disas_simd_two_reg_misc_fp16

2021-06-06 Thread Philippe Mathieu-Daudé
On 6/4/21 8:35 PM, Richard Henderson wrote:
> This fprintf+assert has been in place since the beginning.
> It is prior to the fp_access_check, so we're still good to
> raise sigill here.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/381
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate-a64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [Qemu-devel] [PATCH] configure: Define NCURSES_WIDECHAR if we're using curses

2021-06-06 Thread Stefan Weil

Am 02.06.17 um 16:35 schrieb Peter Maydell:


We want the wide character functions from the ncurses header.
Unfortunately it doesn't provide them by default, but only
if either:
  * NCURSES_WIDECHAR is defined (for ncurses 20111030 and up)
  * _XOPEN_SOURCE/_XOPEN_SOURCE_EXTENDED are suitably defined

So far we have been implicitly relying on the latter, because
for GNU libc when we define _GNU_SOURCE this causes libc
to define the _XOPEN_SOURCE macros for us. Unfortunately
this doesn't work on all libcs, because some (like OSX and
musl libc) do not define _XOPEN_SOURCE when _GNU_SOURCE
is defined.

We can't fix this by defining _XOPEN_SOURCE ourselves, because
that also means "and don't provide any functions that aren't in
that standard", and not all libcs provide any way to override
that to also get the non-standard functions. In particular
FreeBSD has no such mechanism, and OSX's _DARWIN_C_SOURCE
doesn't reenable everything (for instance getpagesize()
is still not prototyped if _DARWIN_C_SOURCE and _XOPEN_SOURCE
are both defined).

So we have to define NCURSES_WIDECHAR. (This will only work
if your ncurses is at least 20111030, as older versions
don't honour this macro.)



I answer to this very old e-mail because I noticed today that defining 
NCURSES_WIDECHAR does not work with the latest MacOS on M1:


Apple still delivers a curses.h which indicates NCURSES_VERSION "5.7" 
(20081102). It does not know NCURSES_WIDECHAR, but support for curses 
can be enabled with _XOPEN_SOURCE_EXTENDED=1.


I used this patch for git master:

diff --git a/meson.build b/meson.build
index 626cf932c1..513332a76d 100644
--- a/meson.build
+++ b/meson.build
@@ -606,7 +606,7 @@ if have_system and not get_option('curses').disabled()
 endif
   endforeach
   msg = get_option('curses').enabled() ? 'curses library not found' : ''
-  curses_compile_args = ['-DNCURSES_WIDECHAR']
+  curses_compile_args = ['-DNCURSES_WIDECHAR', 
'-D_XOPEN_SOURCE_EXTENDED=1']

   if curses.found()
 if cc.links(curses_test, args: curses_compile_args, dependencies: 
[curses])
   curses = declare_dependency(compile_args: curses_compile_args, 
dependencies: [curses])



Then curses is detected and works when configure is given the right 
PKG_CONFIG_PATH:


PKG_CONFIG_PATH=/opt/homebrew/Library/Homebrew/os/mac/pkgconfig/11 
./configure


Regards

Stefan





[RFC PATCH 4/5] ppc/pegasos2: Use Virtual Open Firmware as firmware replacement

2021-06-06 Thread BALATON Zoltan
The pegasos2 board comes with an Open Firmware compliant ROM based on
SmartFirmware but it has some changes that are not open source
therefore the ROM binary cannot be included in QEMU. Guests running on
the board however depend on services provided by the firmware. The
Virtual Open Firmware recently added to QEMU imlements a minimal set
of these services to allow some guests to boot without the original
firmware. This patch adds VOF as the default firmware for pegasos2
which allows booting Linux and MorphOS via -kernel option while a ROM
image can still be used with -bios for guests that don't run with VOF.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/Kconfig|   1 +
 hw/ppc/pegasos2.c | 622 +-
 2 files changed, 621 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index b895720b28..0eb48128fe 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -75,6 +75,7 @@ config PEGASOS2
 select VT82C686
 select IDE_VIA
 select SMBUS_EEPROM
+select VOF
 # This should come with VT82C686
 select ACPI_X86
 
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 07971175c9..91e5fa8fbe 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -34,13 +34,36 @@
 #include "trace.h"
 #include "qemu/datadir.h"
 #include "sysemu/device_tree.h"
+#include "hw/ppc/vof.h"
 
-#define PROM_FILENAME "pegasos2.rom"
+#include 
+
+#define PROM_FILENAME "vof.bin"
 #define PROM_ADDR 0xfff0
 #define PROM_SIZE 0x8
 
+#define KVMPPC_HCALL_BASE0xf000
+#define KVMPPC_H_VOF_CLIENT  (KVMPPC_HCALL_BASE + 0x5)
+
+/* Copied from SLOF, and 4K is definitely not enough for GRUB */
+#define OF_STACK_SIZE   0x8000
+
+#define H_SUCCESS 0
+#define H_PRIVILEGE  -3  /* Caller not privileged */
+#define H_PARAMETER  -4  /* Parameter invalid, out-of-range or conflicting */
+
 #define BUS_FREQ_HZ 1
 
+#define PCI0_MEM_BASE 0xc000
+#define PCI0_MEM_SIZE 0x2000
+#define PCI0_IO_BASE  0xf800
+#define PCI0_IO_SIZE  0x1
+
+#define PCI1_MEM_BASE 0x8000
+#define PCI1_MEM_SIZE 0x4000
+#define PCI1_IO_BASE  0xfe00
+#define PCI1_IO_SIZE  0x1
+
 #define TYPE_PEGASOS2_MACHINE  MACHINE_TYPE_NAME("pegasos2")
 OBJECT_DECLARE_TYPE(Pegasos2MachineState, MachineClass, PEGASOS2_MACHINE)
 
@@ -48,14 +71,26 @@ struct Pegasos2MachineState {
 MachineState parent_obj;
 PowerPCCPU *cpu;
 DeviceState *mv;
+Vof *vof;
+void *fdt_blob;
+uint64_t kernel_addr;
+uint64_t kernel_entry;
+uint64_t kernel_size;
 };
 
+static void *build_fdt(MachineState *machine, int *fdt_size);
+
 static void pegasos2_cpu_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
+Pegasos2MachineState *pm = PEGASOS2_MACHINE(current_machine);
 
 cpu_reset(CPU(cpu));
 cpu->env.spr[SPR_HID1] = 7ULL << 28;
+if (pm->vof) {
+cpu->env.gpr[1] = 2 * OF_STACK_SIZE - 0x20;
+cpu->env.nip = 0x100;
+}
 }
 
 static void pegasos2_init(MachineState *machine)
@@ -92,18 +127,24 @@ static void pegasos2_init(MachineState *machine)
 error_report("Could not find firmware '%s'", fwname);
 exit(1);
 }
+if (!machine->firmware && !pm->vof) {
+pm->vof = g_malloc0(sizeof(*pm->vof));
+}
 memory_region_init_rom(rom, NULL, "pegasos2.rom", PROM_SIZE, _fatal);
 memory_region_add_subregion(get_system_memory(), PROM_ADDR, rom);
 sz = load_elf(filename, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 1,
   PPC_ELF_MACHINE, 0, 0);
 if (sz <= 0) {
-sz = load_image_targphys(filename, PROM_ADDR, PROM_SIZE);
+sz = load_image_targphys(filename, pm->vof ? 0 : PROM_ADDR, PROM_SIZE);
 }
 if (sz <= 0 || sz > PROM_SIZE) {
 error_report("Could not load firmware '%s'", filename);
 exit(1);
 }
 g_free(filename);
+if (pm->vof) {
+pm->vof->fw_size = sz;
+}
 
 /* Marvell Discovery II system controller */
 pm->mv = DEVICE(sysbus_create_simple(TYPE_MV64361, -1,
@@ -137,20 +178,179 @@ static void pegasos2_init(MachineState *machine)
 
 /* other PC hardware */
 pci_vga_init(pci_bus);
+
+if (machine->kernel_filename) {
+sz = load_elf(machine->kernel_filename, NULL, NULL, NULL,
+  >kernel_entry, >kernel_addr, NULL, NULL, 1,
+  PPC_ELF_MACHINE, 0, 0);
+if (sz <= 0) {
+error_report("Could not load kernel '%s'",
+ machine->kernel_filename);
+exit(1);
+}
+pm->kernel_size = sz;
+if (!pm->vof) {
+warn_report("Option -kernel may be ineffective with -bios.");
+}
+}
+if (machine->kernel_cmdline && !pm->vof) {
+warn_report("Option -append may be ineffective with -bios.");
+}
+}
+
+static void pegasos2_pci_config_write(AddressSpace *as, int bus, uint32_t addr,
+  uint32_t len, uint32_t val)
+{
+hwaddr pcicfg = 

[RFC PATCH 0/5] ppc/Pegasos2 VOF

2021-06-06 Thread BALATON Zoltan


Based-on: <20210520090557.435689-1-...@ozlabs.ru>
^ That is v20 of Alexey's VOF patch

Hello,

Posting these for early review now. I plan to rebase on the next VOF
patch that hopefully fixes those points that I had to circumvent in
patch 1 for now. I've reported these before but now all of those that
are needed for pegasos2 are in one place. Other points I've reported
could be clean ups but not sttictly needed.

With this series on top of VOF v20 I can now boot Linux and MorphOS on
pegasos2 without needing a firmware blob so I hope this is enough to
get this board in 6.1 and also have it enabled so users can start
using it. That means that VOF will also be merged by then. This now
gives VOF another use case that may help it getting finished.

I've also updated my development tree with this series here:

https://osdn.net/projects/qmiga/scm/git/qemu/tree/pegasos2/

Please review so I can do any needed changes together with the rebase
on next VOF patch so we don't miss 6.1 this time.

Regards,
BALATON Zoltan

BALATON Zoltan (5):
  Misc VOF fixes
  ppc/pegasos2: Introduce Pegasos2MachineState structure
  target/ppc: Allow virtual hypervisor on CPU without HV
  ppc/pegasos2: Use Virtual Open Firmware as firmware replacement
  ppc/pegasos2: Implement some RTAS functions with VOF

 default-configs/devices/ppc-softmmu.mak |   2 +-
 hw/ppc/Kconfig  |   1 +
 hw/ppc/pegasos2.c   | 780 +++-
 hw/ppc/vof.c|  11 +-
 pc-bios/vof.bin | Bin 3784 -> 3784 bytes
 pc-bios/vof/entry.S |   2 +-
 target/ppc/cpu.c|   2 +-
 7 files changed, 776 insertions(+), 22 deletions(-)

-- 
2.21.4




[RFC PATCH 5/5] ppc/pegasos2: Implement some RTAS functions with VOF

2021-06-06 Thread BALATON Zoltan
Linux uses RTAS functions to access PCI devices so we need to provide
these with VOF. Implement some of the most important functions to
allow booting Linux with VOF. With this the board is now usable
without a binary ROM image and we can enable it by default as other
boards.

Signed-off-by: BALATON Zoltan 
---
 default-configs/devices/ppc-softmmu.mak |   2 +-
 hw/ppc/pegasos2.c   | 108 
 2 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/default-configs/devices/ppc-softmmu.mak 
b/default-configs/devices/ppc-softmmu.mak
index c2d41198cd..4535993d8d 100644
--- a/default-configs/devices/ppc-softmmu.mak
+++ b/default-configs/devices/ppc-softmmu.mak
@@ -14,7 +14,7 @@ CONFIG_SAM460EX=y
 CONFIG_MAC_OLDWORLD=y
 CONFIG_MAC_NEWWORLD=y
 
-CONFIG_PEGASOS2=n
+CONFIG_PEGASOS2=y
 
 # For PReP
 CONFIG_PREP=y
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 91e5fa8fbe..cb2be27394 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -43,6 +43,7 @@
 #define PROM_SIZE 0x8
 
 #define KVMPPC_HCALL_BASE0xf000
+#define KVMPPC_H_RTAS(KVMPPC_HCALL_BASE + 0x0)
 #define KVMPPC_H_VOF_CLIENT  (KVMPPC_HCALL_BASE + 0x5)
 
 /* Copied from SLOF, and 4K is definitely not enough for GRUB */
@@ -198,6 +199,30 @@ static void pegasos2_init(MachineState *machine)
 }
 }
 
+static uint32_t pegasos2_pci_config_read(AddressSpace *as, int bus,
+ uint32_t addr, uint32_t len)
+{
+hwaddr pcicfg = (bus ? 0xf1000c78 : 0xf1000cf8);
+uint32_t val = 0x;
+
+stl_le_phys(as, pcicfg, addr | BIT(31));
+switch (len) {
+case 4:
+val = ldl_le_phys(as, pcicfg + 4);
+break;
+case 2:
+val = lduw_le_phys(as, pcicfg + 4);
+break;
+case 1:
+val = ldub_phys(as, pcicfg + 4);
+break;
+default:
+qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid length\n", __func__);
+break;
+}
+return val;
+}
+
 static void pegasos2_pci_config_write(AddressSpace *as, int bus, uint32_t addr,
   uint32_t len, uint32_t val)
 {
@@ -307,6 +332,87 @@ static void pegasos2_machine_reset(MachineState *machine)
 pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);
 }
 
+enum pegasos2_rtas_tokens {
+RTAS_RESTART_RTAS = 0,
+RTAS_NVRAM_FETCH = 1,
+RTAS_NVRAM_STORE = 2,
+RTAS_GET_TIME_OF_DAY = 3,
+RTAS_SET_TIME_OF_DAY = 4,
+RTAS_EVENT_SCAN = 6,
+RTAS_CHECK_EXCEPTION = 7,
+RTAS_READ_PCI_CONFIG = 8,
+RTAS_WRITE_PCI_CONFIG = 9,
+RTAS_DISPLAY_CHARACTER = 10,
+RTAS_SET_INDICATOR = 11,
+RTAS_POWER_OFF = 17,
+RTAS_SUSPEND = 18,
+RTAS_HIBERNATE = 19,
+RTAS_SYSTEM_REBOOT = 20,
+};
+
+static target_ulong pegasos2_rtas(PowerPCCPU *cpu, Pegasos2MachineState *pm,
+  target_ulong args_real)
+{
+AddressSpace *as = CPU(cpu)->as;
+uint32_t token = ldl_be_phys(as, args_real);
+uint32_t nargs = ldl_be_phys(as, args_real + 4);
+uint32_t nrets = ldl_be_phys(as, args_real + 8);
+uint32_t args = args_real + 12;
+uint32_t rets = args_real + 12 + nargs * 4;
+
+if (nrets < 1) {
+qemu_log_mask(LOG_GUEST_ERROR, "Too few return values in RTAS call\n");
+return H_PARAMETER;
+}
+switch (token) {
+case RTAS_READ_PCI_CONFIG:
+{
+uint32_t addr, len, val;
+
+if (nargs != 2 || nrets != 2) {
+stl_be_phys(as, rets, -1);
+return H_PARAMETER;
+}
+addr = ldl_be_phys(as, args);
+len = ldl_be_phys(as, args + 4);
+val = pegasos2_pci_config_read(as, !(addr >> 24),
+   addr & 0x0fff, len);
+stl_be_phys(as, rets, 0);
+stl_be_phys(as, rets + 4, val);
+return H_SUCCESS;
+}
+case RTAS_WRITE_PCI_CONFIG:
+{
+uint32_t addr, len, val;
+
+if (nargs != 3 || nrets != 1) {
+stl_be_phys(as, rets, -1);
+return H_PARAMETER;
+}
+addr = ldl_be_phys(as, args);
+len = ldl_be_phys(as, args + 4);
+val = ldl_be_phys(as, args + 8);
+pegasos2_pci_config_write(as, !(addr >> 24),
+  addr & 0x0fff, len, val);
+stl_be_phys(as, rets, 0);
+return H_SUCCESS;
+}
+case RTAS_DISPLAY_CHARACTER:
+if (nargs != 1 || nrets != 1) {
+stl_be_phys(as, rets, -1);
+return H_PARAMETER;
+}
+qemu_log_mask(LOG_UNIMP, "%c", ldl_be_phys(as, args));
+stl_be_phys(as, rets, 0);
+return H_SUCCESS;
+default:
+qemu_log_mask(LOG_UNIMP, "Unknown RTAS token %u (args=%u, rets=%u)\n",
+  token, nargs, nrets);
+stl_be_phys(as, rets, 0);
+return H_SUCCESS;
+}
+}
+
 static void pegasos2_hypercall(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
 {
 Pegasos2MachineState *pm = PEGASOS2_MACHINE(vhyp);

[RFC PATCH 3/5] target/ppc: Allow virtual hypervisor on CPU without HV

2021-06-06 Thread BALATON Zoltan
Change the assert in ppc_store_sdr1() to allow vhyp to be set on CPUs
without HV bit. This allows using the vhyp interface for firmware
emulation on pegasos2.

Signed-off-by: BALATON Zoltan 
---
 target/ppc/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index 19d67b5b07..a29299882a 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -72,7 +72,7 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
 {
 PowerPCCPU *cpu = env_archcpu(env);
 qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
-assert(!cpu->vhyp);
+assert(!cpu->env.has_hv_mode || !cpu->vhyp);
 #if defined(TARGET_PPC64)
 if (mmu_is_64bit(env->mmu_model)) {
 target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;
-- 
2.21.4




[RFC PATCH 2/5] ppc/pegasos2: Introduce Pegasos2MachineState structure

2021-06-06 Thread BALATON Zoltan
Add own machine state structure which will be used to store state
needed for firmware emulation.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/pegasos2.c | 50 +++
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 0bfd0928aa..07971175c9 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -1,7 +1,7 @@
 /*
  * QEMU PowerPC CHRP (Genesi/bPlan Pegasos II) hardware System Emulator
  *
- * Copyright (c) 2018-2020 BALATON Zoltan
+ * Copyright (c) 2018-2021 BALATON Zoltan
  *
  * This work is licensed under the GNU GPL license version 2 or later.
  *
@@ -41,6 +41,15 @@
 
 #define BUS_FREQ_HZ 1
 
+#define TYPE_PEGASOS2_MACHINE  MACHINE_TYPE_NAME("pegasos2")
+OBJECT_DECLARE_TYPE(Pegasos2MachineState, MachineClass, PEGASOS2_MACHINE)
+
+struct Pegasos2MachineState {
+MachineState parent_obj;
+PowerPCCPU *cpu;
+DeviceState *mv;
+};
+
 static void pegasos2_cpu_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
@@ -51,9 +60,9 @@ static void pegasos2_cpu_reset(void *opaque)
 
 static void pegasos2_init(MachineState *machine)
 {
-PowerPCCPU *cpu = NULL;
+Pegasos2MachineState *pm = PEGASOS2_MACHINE(machine);
+CPUPPCState *env;
 MemoryRegion *rom = g_new(MemoryRegion, 1);
-DeviceState *mv;
 PCIBus *pci_bus;
 PCIDevice *dev;
 I2CBus *i2c_bus;
@@ -63,15 +72,16 @@ static void pegasos2_init(MachineState *machine)
 uint8_t *spd_data;
 
 /* init CPU */
-cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
-if (PPC_INPUT(>env) != PPC_FLAGS_INPUT_6xx) {
+pm->cpu = POWERPC_CPU(cpu_create(machine->cpu_type));
+env = >cpu->env;
+if (PPC_INPUT(env) != PPC_FLAGS_INPUT_6xx) {
 error_report("Incompatible CPU, only 6xx bus supported");
 exit(1);
 }
 
 /* Set time-base frequency */
-cpu_ppc_tb_init(>env, BUS_FREQ_HZ / 4);
-qemu_register_reset(pegasos2_cpu_reset, cpu);
+cpu_ppc_tb_init(env, BUS_FREQ_HZ / 4);
+qemu_register_reset(pegasos2_cpu_reset, pm->cpu);
 
 /* RAM */
 memory_region_add_subregion(get_system_memory(), 0, machine->ram);
@@ -96,16 +106,16 @@ static void pegasos2_init(MachineState *machine)
 g_free(filename);
 
 /* Marvell Discovery II system controller */
-mv = DEVICE(sysbus_create_simple(TYPE_MV64361, -1,
-((qemu_irq *)cpu->env.irq_inputs)[PPC6xx_INPUT_INT]));
-pci_bus = mv64361_get_pci_bus(mv, 1);
+pm->mv = DEVICE(sysbus_create_simple(TYPE_MV64361, -1,
+ ((qemu_irq *)env->irq_inputs)[PPC6xx_INPUT_INT]));
+pci_bus = mv64361_get_pci_bus(pm->mv, 1);
 
 /* VIA VT8231 South Bridge (multifunction PCI device) */
 /* VT8231 function 0: PCI-to-ISA Bridge */
 dev = pci_create_simple_multifunction(pci_bus, PCI_DEVFN(12, 0), true,
   TYPE_VT8231_ISA);
 qdev_connect_gpio_out(DEVICE(dev), 0,
-  qdev_get_gpio_in_named(mv, "gpp", 31));
+  qdev_get_gpio_in_named(pm->mv, "gpp", 31));
 
 /* VT8231 function 1: IDE Controller */
 dev = pci_create_simple(pci_bus, PCI_DEVFN(12, 1), "via-ide");
@@ -129,8 +139,10 @@ static void pegasos2_init(MachineState *machine)
 pci_vga_init(pci_bus);
 }
 
-static void pegasos2_machine(MachineClass *mc)
+static void pegasos2_machine_class_init(ObjectClass *oc, void *data)
 {
+MachineClass *mc = MACHINE_CLASS(oc);
+
 mc->desc = "Genesi/bPlan Pegasos II";
 mc->init = pegasos2_init;
 mc->block_default_type = IF_IDE;
@@ -141,4 +153,16 @@ static void pegasos2_machine(MachineClass *mc)
 mc->default_ram_size = 512 * MiB;
 }
 
-DEFINE_MACHINE("pegasos2", pegasos2_machine)
+static const TypeInfo pegasos2_machine_info = {
+.name  = TYPE_PEGASOS2_MACHINE,
+.parent= TYPE_MACHINE,
+.class_init= pegasos2_machine_class_init,
+.instance_size = sizeof(Pegasos2MachineState),
+};
+
+static void pegasos2_machine_register_types(void)
+{
+type_register_static(_machine_info);
+}
+
+type_init(pegasos2_machine_register_types)
-- 
2.21.4




[RFC PATCH 1/5] Misc VOF fixes

2021-06-06 Thread BALATON Zoltan
Signed-off-by: BALATON Zoltan 
---
 hw/ppc/vof.c|  11 +++
 pc-bios/vof.bin | Bin 3784 -> 3784 bytes
 pc-bios/vof/entry.S |   2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index a283b7d251..ac95be9666 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -144,12 +144,15 @@ static uint32_t vof_finddevice(const void *fdt, uint32_t 
nodeaddr)
 char fullnode[VOF_MAX_PATH];
 uint32_t ret = -1;
 int offset;
+gchar *p;
 
 if (readstr(nodeaddr, fullnode, sizeof(fullnode))) {
 return (uint32_t) ret;
 }
 
-offset = fdt_path_offset(fdt, fullnode);
+p = g_ascii_strdown(fullnode, -1);
+offset = fdt_path_offset(fdt, p);
+g_free(p);
 if (offset >= 0) {
 ret = fdt_get_phandle(fdt, offset);
 }
@@ -160,14 +163,14 @@ static uint32_t vof_finddevice(const void *fdt, uint32_t 
nodeaddr)
 static const void *getprop(const void *fdt, int nodeoff, const char *propname,
int *proplen, bool *write0)
 {
-const char *unit, *prop;
+const char *unit, *prop = fdt_getprop(fdt, nodeoff, propname, proplen);
 
 /*
  * The "name" property is not actually stored as a property in the FDT,
  * we emulate it by returning a pointer to the node's name and adjust
  * proplen to include only the name but not the unit.
  */
-if (strcmp(propname, "name") == 0) {
+if (!prop && strcmp(propname, "name") == 0) {
 prop = fdt_get_name(fdt, nodeoff, proplen);
 if (!prop) {
 *proplen = 0;
@@ -193,7 +196,7 @@ static const void *getprop(const void *fdt, int nodeoff, 
const char *propname,
 if (write0) {
 *write0 = false;
 }
-return fdt_getprop(fdt, nodeoff, propname, proplen);
+return prop;
 }
 
 static uint32_t vof_getprop(const void *fdt, uint32_t nodeph, uint32_t pname,
diff --git a/pc-bios/vof.bin b/pc-bios/vof.bin
index 
7e4c3742deae3c1904f4b2bf03ef72576b12d532..1ec670be82134adcb5ae128732aff6e371281360
 100755
GIT binary patch
delta 14
VcmX>hdqQ@D4kKgpW?jbFyZ|U11hoJF

delta 14
VcmX>hdqQ@D4kP31=Uc>yYn1swnY

diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S
index 569688714c..f8066775ec 100644
--- a/pc-bios/vof/entry.S
+++ b/pc-bios/vof/entry.S
@@ -30,7 +30,7 @@ ENTRY(_prom_entry)
bl prom_entry
nop
mtlr%r31
-   ld  %r31,104(%r1)
+   lwz %r31,104(%r1)
addi%r1,%r1,112
blr
 
-- 
2.21.4




Re: [PATCH 01/11] hw/char: Renesas SCI module.

2021-06-06 Thread Yoshinori Sato
On Fri, 04 Jun 2021 18:09:44 +0900,
Peter Maydell wrote:
> 
> On Thu, 27 May 2021 at 06:30, Yoshinori Sato  
> wrote:
> >
> > This module supported SCI / SCIa / SCIF.
> >
> > Hardware manual.
> > SCI / SCIF
> > https://www.renesas.com/us/en/doc/products/mpumcu/001/r01uh0457ej0401_sh7751.pdf
> > SCIa
> > https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf
> >
> > Signed-off-by: Yoshinori Sato 
> > ---
> >  include/hw/char/renesas_sci.h |  129 +++-
> >  hw/char/renesas_sci.c | 1039 +++--
> >  2 files changed, 966 insertions(+), 202 deletions(-)
> 
> This patch is much too large to review. Could you split it up into
> multiple logical pieces, please?

OK. I will snt v2.

> 
> >
> > diff --git a/include/hw/char/renesas_sci.h b/include/hw/char/renesas_sci.h
> > index a4764e3eee..ae9554db60 100644
> > --- a/include/hw/char/renesas_sci.h
> > +++ b/include/hw/char/renesas_sci.h
> > @@ -1,54 +1,137 @@
> >  /*
> >   * Renesas Serial Communication Interface
> >   *
> > - * Copyright (c) 2018 Yoshinori Sato
> > + * Copyright (c) 2020 Yoshinori Sato
> > + *
> > + * This code is licensed under the GPL version 2 or later.
> >   *
> > - * SPDX-License-Identifier: GPL-2.0-or-later
> 
> Did you have a reason for deleting the SPDX line ?
> 
> >   */
> >
> > -#ifndef HW_CHAR_RENESAS_SCI_H
> > -#define HW_CHAR_RENESAS_SCI_H
> > -
> 
> Why have you deleted the multiple-include guard?
> 
> thanks
> -- PMM

-- 
Yoshinori Sato



Re: [PATCH 02/11] hw/char: remove sh_serial.

2021-06-06 Thread Yoshinori Sato
On Fri, 04 Jun 2021 19:08:07 +0900,
Peter Maydell wrote:
> 
> On Thu, 27 May 2021 at 06:24, Yoshinori Sato  
> wrote:
> >
> > Migrate to renesas_sci.
> >
> > Signed-off-by: Yoshinori Sato 
> > ---
> >  hw/char/sh_serial.c | 431 
> >  MAINTAINERS |   4 +-
> >  hw/char/Kconfig |   3 -
> >  hw/char/meson.build |   1 -
> >  4 files changed, 2 insertions(+), 437 deletions(-)
> >  delete mode 100644 hw/char/sh_serial.c
> 
> This won't compile, because there are still boards which use
> the old code you are removing here. The patchset has to be able
> to compile and work at every step, so you need to first add new
> devices, then convert the boards to use them, and then as the
> last step you can remove the old implementations.
> 
> Checking, I find that patch 1 doesn't compile either.
> 
> thanks
> -- PMM

OK. I will sent v2.

-- 
Yoshinori Sato



Re: [PATCH 02/11] hw/char: remove sh_serial.

2021-06-06 Thread Yoshinori Sato
On Fri, 04 Jun 2021 19:08:07 +0900,
Peter Maydell wrote:
> 
> On Thu, 27 May 2021 at 06:24, Yoshinori Sato  
> wrote:
> >
> > Migrate to renesas_sci.
> >
> > Signed-off-by: Yoshinori Sato 
> > ---
> >  hw/char/sh_serial.c | 431 
> >  MAINTAINERS |   4 +-
> >  hw/char/Kconfig |   3 -
> >  hw/char/meson.build |   1 -
> >  4 files changed, 2 insertions(+), 437 deletions(-)
> >  delete mode 100644 hw/char/sh_serial.c
> 
> This won't compile, because there are still boards which use
> the old code you are removing here. The patchset has to be able
> to compile and work at every step, so you need to first add new
> devices, then convert the boards to use them, and then as the
> last step you can remove the old implementations.
> 
> Checking, I find that patch 1 doesn't compile either.
> 
> thanks
> -- PMM

OK. I will sent v2.

-- 
Yoshinori Sato



Re: [PATCH 01/11] hw/char: Renesas SCI module.

2021-06-06 Thread Yoshinori Sato
On Fri, 04 Jun 2021 18:09:44 +0900,
Peter Maydell wrote:
> 
> On Thu, 27 May 2021 at 06:30, Yoshinori Sato  
> wrote:
> >
> > This module supported SCI / SCIa / SCIF.
> >
> > Hardware manual.
> > SCI / SCIF
> > https://www.renesas.com/us/en/doc/products/mpumcu/001/r01uh0457ej0401_sh7751.pdf
> > SCIa
> > https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf
> >
> > Signed-off-by: Yoshinori Sato 
> > ---
> >  include/hw/char/renesas_sci.h |  129 +++-
> >  hw/char/renesas_sci.c | 1039 +++--
> >  2 files changed, 966 insertions(+), 202 deletions(-)
> 
> This patch is much too large to review. Could you split it up into
> multiple logical pieces, please?

OK. I will snt v2.

> 
> >
> > diff --git a/include/hw/char/renesas_sci.h b/include/hw/char/renesas_sci.h
> > index a4764e3eee..ae9554db60 100644
> > --- a/include/hw/char/renesas_sci.h
> > +++ b/include/hw/char/renesas_sci.h
> > @@ -1,54 +1,137 @@
> >  /*
> >   * Renesas Serial Communication Interface
> >   *
> > - * Copyright (c) 2018 Yoshinori Sato
> > + * Copyright (c) 2020 Yoshinori Sato
> > + *
> > + * This code is licensed under the GPL version 2 or later.
> >   *
> > - * SPDX-License-Identifier: GPL-2.0-or-later
> 
> Did you have a reason for deleting the SPDX line ?
> 
> >   */
> >
> > -#ifndef HW_CHAR_RENESAS_SCI_H
> > -#define HW_CHAR_RENESAS_SCI_H
> > -
> 
> Why have you deleted the multiple-include guard?
> 
> thanks
> -- PMM

-- 
Yoshinori Sato



Re: [PATCH 02/11] hw/char: remove sh_serial.

2021-06-06 Thread Yoshinori Sato
On Fri, 04 Jun 2021 19:08:07 +0900,
Peter Maydell wrote:
> 
> On Thu, 27 May 2021 at 06:24, Yoshinori Sato  
> wrote:
> >
> > Migrate to renesas_sci.
> >
> > Signed-off-by: Yoshinori Sato 
> > ---
> >  hw/char/sh_serial.c | 431 
> >  MAINTAINERS |   4 +-
> >  hw/char/Kconfig |   3 -
> >  hw/char/meson.build |   1 -
> >  4 files changed, 2 insertions(+), 437 deletions(-)
> >  delete mode 100644 hw/char/sh_serial.c
> 
> This won't compile, because there are still boards which use
> the old code you are removing here. The patchset has to be able
> to compile and work at every step, so you need to first add new
> devices, then convert the boards to use them, and then as the
> last step you can remove the old implementations.
> 
> Checking, I find that patch 1 doesn't compile either.
> 
> thanks
> -- PMM

OK. I will sent v2.

-- 
Yoshinori Sato