from:"Aurelien Jarno"

Re: [Qemu-devel] [PATCH v3 00/11] tcg mips64 and mips r6 improvements

2016-11-25 Thread Aurelien Jarno

On 2016-11-25 11:31, Jin Guojie wrote:
> Changes since v2:
>   * Update against master(v2.8.0-rc1)
>   * Tested on Loongson as mips32r2(el) and mips64r2(el) hosts.
> Loongson only implements little-endian mips32/mips64 ISA.
>   * Fully work for 32-bit and 64-bit guests.
> Fix two bugs：segmentation fault on mips64el with 32-bit guests,
>   blocking when emulating i386 kernel on mips64el.
>   * Fix some minor style problems.
>   * PATCH v2 12~16 are not examined due to the lack of R6 machine. 
> 
> To be tested:
>   * big-endian mips32 and mips64 hosts.
> I have tried running qemu-system-mips on an X86. The speed is awful. 
> The compilation of qemu did not complete over a night until I gave up.
> A better way is needed to do this test.

Thanks for the new version. I should be able to try this on a 32-bit BE
host during the week-end.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] tcg/mips: Add support for mips64el backend

2016-11-14 Thread Aurelien Jarno

Hi,

On 2016-11-14 17:33, Jin Guojie wrote:
> Richard,
> 
> I have studied your V2 patch
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg02969.html
> 
> . Though I have not tested this patch on Loongson machine, I feel this 
> patch has implemented MIPS64 ISA very completely, including big/little
> endian, N32 and N64 ABI. The use of #if is more clean. Many corner cases
> are well handled. My patch is only a subset of yours.
> 
> I wonder why your patch have not be merged into the mainstream.
> If I had seen it before, I don't need to waste time reinventing my patch.
> 
> Since tcg target for MIPS64 is of great use for developers, I 
> really hope this feature can be merged into mainstream.
> 
> Is your v2 patch still in review process? Is there chance for this
> patch to be merged in a not so long term? Or should other code
> work should be done before being merged?

Please see:
 https://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg06444.html

In short this patch set looks overall good, but breaks support for
existing big-endian 32-bit hosts. It also doesn't fully work on 64-bit
hosts for 32-bit guests, but I guess that's something acceptable, as
it's not a regression.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2] target-sh4: add atomic tas

2016-11-03 Thread Aurelien Jarno

On 2016-11-03 15:07, Laurent Vivier wrote:
> Implement real atomic tas:
> 
> When (Rn) = 0, 1 -> T
> Otherwise, 0 -> T
> In both cases, 1 -> MSB of (Rn)
> 
> using atomic_fetch_or_i32() and setcondi_i32().
> 
> Tested with image from:
> http://wiki.qemu.org/download/sh-test-0.2.tar.bz2
> 
> This image contains a "tas_test" that runs without
> error with this change.
> 
> Signed-off-by: Laurent Vivier <laur...@vivier.eu>
> ---
> v2:
> - don't use helper but atomic_fetch_or_i32
>   Thank you Paolo!

Thanks, this look good. I have tried it with my test image, and it
doesn't break it.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>
Acked-by: Aurelien Jarno <aurel...@aurel32.net>

I consider this as a bugfix, not a new feature, so that should be fine
despite the soft freeze. Do you want me to send a pull request?

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


signature.asc
Description: PGP signature

Re: [Qemu-devel] Provide safe_syscall for s390x

2016-10-18 Thread Aurelien Jarno

On 2016-10-17 07:35, Richard Henderson wrote:
> On 10/17/2016 03:55 AM, Christian Borntraeger wrote:
> > On 10/17/2016 10:26 AM, Thomas Huth wrote:
> > > On 14.10.2016 20:58, Michael Tokarev wrote:
> > > > Hi.
> > > > 
> > > > This commit: c9bc3437a905b660561a26cd4ecc64579843267b
> > > > Author: Richard Henderson <r...@twiddle.net>
> > > > Date:   Tue Jun 21 17:32:12 2016 -0700
> > > > 
> > > > linux-user: Provide safe_syscall for s390x
> > > > 
> > > > does not build on debian unstable porterbox for s390x, with
> > > > the following error message:
> > > > 
> > > > linux-user/host/s390x/safe-syscall.inc.S: Assembler messages:
> > > > linux-user/host/s390x/safe-syscall.inc.S:75: Error: Unrecognized opcode:
> > > > `lt'
> > > > rules.mak:72: recipe for target 'linux-user/safe-syscall.o' failed
> > > > 
> > > > Since I know nothing about s390, I've no idea what's at fault
> > > > here... :)  Thought I'd report this :)
> > > 
> > > "lt" seems to be a newer s390x opcode which has been added to the
> > > architecture within the last ten years or so. So maybe you've got to add
> > > some "-march=xxx" flag when compiling this file?
> > > Could you maybe start with finding out the exact comand line that is
> > > used to compile this file?
> > 
> > Yes, lt was added with the extended immidiate facility. So either use 
> > -march=z9-109 (introduced in
> > 2005) or replace the lt with an l + ltr to also run on older models.
> > 
> 
> Whoops.  I'd forgotten that RHEL defaults to z196 these days.  This piece of
> code is certainly not important enough to conditionalize on newer hardware.
> I'll put together a patch.

Actually gcc suggests to use icm %r0,15,0(%r8). It's even shorter than
lt %r0,0(%r8).

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] exec: fix tlb_vaddr_to_host()

2016-09-22 Thread Aurelien Jarno

On 2016-09-21 19:06, Laurent Vivier wrote:
> When used in linux-user mode, tlb_vaddr_to_host(..., addr, ...))
> should return "g2h(addr)", but instead it returns "g2h(vaddr)".
> As "vaddr" is "typedef uint64_t", the result of "g2h(vaddr)" is
> "((void *)((unsigned long)(target_ulong)(uint64_t) + guest_base))".
> 
> This bug has been found trying to run "ls" with qemu-ppc.
> 
> Fixes: "c9f82d0 ppc: Speed up dcbz"
> Reported-by: Andreas Färber <afaer...@suse.de>
> Signed-off-by: Laurent Vivier <lviv...@redhat.com>
> ---
>  include/exec/cpu_ldst.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
> index b573df5..6eb5fe8 100644
> --- a/include/exec/cpu_ldst.h
> +++ b/include/exec/cpu_ldst.h
> @@ -401,7 +401,7 @@ static inline void *tlb_vaddr_to_host(CPUArchState *env, 
> target_ulong addr,
>int access_type, int mmu_idx)
>  {
>  #if defined(CONFIG_USER_ONLY)
> -return g2h(vaddr);
> +return g2h(addr);
>  #else
>  int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>  CPUTLBEntry *tlbentry = >tlb_table[mmu_idx][index];

That looks fine to me, sorry for the typo.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2] sh4: fix broken link to documentation

2016-08-31 Thread Aurelien Jarno

On 2016-08-31 18:31, Reda Sallahi wrote:
> The page that was previously linked in the source code and the README file is
> no longer available so it now returns a 404 error message.
> 
> This puts a previous snapshot from archive.org instead.
> 
> Signed-off-by: Reda Sallahi <fullma...@gmail.com>
> ---
> Changes from v1:
> * Add the 'https://' part to the link in hw/sh4/shix.c.
> 
>  hw/sh4/shix.c | 2 +-
>  target-sh4/README.sh4 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Thanks for the new version.

Acked-by: Aurelien Jarno <aurel...@aurel32.net>

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] sh4: fix broken link to documentation

2016-08-31 Thread Aurelien Jarno

On 2016-08-31 17:54, Reda Sallahi wrote:
> The page that was previously linked in the source code and the README file is
> no longer available so it now returns a 404 error message.
> 
> This puts a previous snapshot from archive.org instead.
> 
> Signed-off-by: Reda Sallahi <fullma...@gmail.com>
> ---
>  hw/sh4/shix.c | 2 +-
>  target-sh4/README.sh4 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sh4/shix.c b/hw/sh4/shix.c
> index ccc9e75..2ac79fc 100644
> --- a/hw/sh4/shix.c
> +++ b/hw/sh4/shix.c
> @@ -23,7 +23,7 @@
>   */
>  /*
> Shix 2.0 board by Alexis Polti, described at
> -   http://perso.enst.fr/~polti/realisations/shix20/
> +   
> web.archive.org/web/20070917001736/perso.enst.fr/~polti/realisations/shix20

Thanks for the patch. Maybe put the http:// in front. It will go over
the 80 characters limits, but I think that's fine in such a case.

I guess this will be merged through the trivial queue, that seems the
best to me.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] softfloat: Fix warn about implicit conversion from int to int8_t

2016-08-10 Thread Aurelien Jarno

On 2016-08-10 14:55, Pranith Kumar wrote:
> Change the flag type to 'uint8_t' to fix the implicit conversion error.
> 
> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
> ---
>  fpu/softfloat-specialize.h | 2 +-
>  include/fpu/softfloat.h| 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
> index 43d0890..f5aed72 100644
> --- a/fpu/softfloat-specialize.h
> +++ b/fpu/softfloat-specialize.h
> @@ -197,7 +197,7 @@ float128 float128_default_nan(float_status *status)
>  | should be simply `float_exception_flags |= flags;'.
>  
> **/
>  
> -void float_raise(int8_t flags, float_status *status)
> +void float_raise(uint8_t flags, float_status *status)
>  {
>  status->float_exception_flags |= flags;
>  }
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 0e57ee5..1bde349 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -198,7 +198,7 @@ enum {
>  typedef struct float_status {
>  signed char float_detect_tininess;
>  signed char float_rounding_mode;
> -signed char float_exception_flags;
> +uint8_t float_exception_flags;
>  signed char floatx80_rounding_precision;
>  /* should denormalised results go to zero and set the inexact flag? */
>  flag flush_to_zero;
> @@ -274,7 +274,7 @@ static inline flag get_default_nan_mode(float_status 
> *status)
>  | Routine to raise any or all of the software IEC/IEEE floating-point
>  | exception flags.
>  
> **/
> -void float_raise(int8_t flags, float_status *status);
> +void float_raise(uint8_t flags, float_status *status);
>  
>  
> /*--------
>  | If `a' is denormal and we are in flush-to-zero mode then set the

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 2/5] softfloat: Fix warn about implicit conversion from int to int8_t

2016-08-10 Thread Aurelien Jarno

On 2016-08-09 22:12, Peter Maydell wrote:
> On 9 August 2016 at 20:16, Aurelien Jarno <aurel...@aurel32.net> wrote:
> > On 2016-08-09 15:02, Pranith Kumar wrote:
> >> Change the flag type to 'int' to fix the implicit conversion error.
> >>
> >> Suggested-by: Peter Maydell <peter.mayd...@linaro.org>
> >> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
> >> ---
> >>  fpu/softfloat-specialize.h | 2 +-
> >>  include/fpu/softfloat.h| 4 ++--
> >>  2 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
> >> index 43d0890..46b4091 100644
> >> --- a/fpu/softfloat-specialize.h
> >> +++ b/fpu/softfloat-specialize.h
> >> @@ -197,7 +197,7 @@ float128 float128_default_nan(float_status *status)
> >>  | should be simply `float_exception_flags |= flags;'.
> >>  
> >> **/
> >>
> >> -void float_raise(int8_t flags, float_status *status)
> >> +void float_raise(int flags, float_status *status)
> >>  {
> >>  status->float_exception_flags |= flags;
> >>  }
> >> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> >> index 0e57ee5..416cf7a 100644
> >> --- a/include/fpu/softfloat.h
> >> +++ b/include/fpu/softfloat.h
> >> @@ -196,9 +196,9 @@ enum {
> >>  };
> >>
> >>  typedef struct float_status {
> >> +int float_exception_flags;
> >>  signed char float_detect_tininess;
> >>  signed char float_rounding_mode;
> >> -signed char float_exception_flags;
> >>  signed char floatx80_rounding_precision;
> >>  /* should denormalised results go to zero and set the inexact flag? */
> >>  flag flush_to_zero;
> >
> > This changes the size of the structure, and thus of the CPU*State
> > structures. I don't think it's something we want to do, especially given
> > we currently only use 7 flags, so 7 bits and that fits in a char.
> 
> It does, but only by four bytes, which I didn't think was that
> big a deal. If we want to keep it to one byte then I think

Indeed it's not a lot, but if we do that with everything that goes into
the CPU*state structures, it has a significant impact.

> making it a uint8_t is probably better than leaving it as
> signed char, given we're definitely not treating it as a
> signed value.

I agree the signed char type is very strange here, it is probably there
for historical reasons. I guess using a uint8_t would indeed be the
correct short term fix.

The long term fix is probably to change this whole structure as a
bitfield of unsigned int, like we already do in the tcg code.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 2/5] softfloat: Fix warn about implicit conversion from int to int8_t

2016-08-09 Thread Aurelien Jarno

On 2016-08-09 15:02, Pranith Kumar wrote:
> Change the flag type to 'int' to fix the implicit conversion error.
>
> Suggested-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
> ---
>  fpu/softfloat-specialize.h | 2 +-
>  include/fpu/softfloat.h| 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
> index 43d0890..46b4091 100644
> --- a/fpu/softfloat-specialize.h
> +++ b/fpu/softfloat-specialize.h
> @@ -197,7 +197,7 @@ float128 float128_default_nan(float_status *status)
>  | should be simply `float_exception_flags |= flags;'.
>  
> **/
>  
> -void float_raise(int8_t flags, float_status *status)
> +void float_raise(int flags, float_status *status)
>  {
>  status->float_exception_flags |= flags;
>  }
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 0e57ee5..416cf7a 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -196,9 +196,9 @@ enum {
>  };
>  
>  typedef struct float_status {
> +int float_exception_flags;
>  signed char float_detect_tininess;
>  signed char float_rounding_mode;
> -signed char float_exception_flags;
>  signed char floatx80_rounding_precision;
>  /* should denormalised results go to zero and set the inexact flag? */
>  flag flush_to_zero;

This changes the size of the structure, and thus of the CPU*State
structures. I don't think it's something we want to do, especially given
we currently only use 7 flags, so 7 bits and that fits in a char.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 for-2.7 2/7] tcg: Reorg TCGOp chaining

2016-08-05 Thread Aurelien Jarno

On 2016-08-04 21:56, Richard Henderson wrote:
> Instead of using -1 as end of chain, use 0, and link through the 0
> entry as a fully circular double-linked list.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>

Thanks for the new patchset. It looks fine to me.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> Rather than rely on recursion during the middle of register allocation,
> lower indirect registers to loads and stores off the indirect base into
> plain temps.
> 
> For an x86_64 host, with sufficient registers, this results in identical
> code, modulo the actual register assignments.
> 
> For an i686 host, with insufficient registers, this means that temps can
> be (temporarily) spilled to the stack in order to satisfy an allocation.
> This as opposed to the possibility of not being able to spill, to allocate
> a register for the indirect base, in order to perform a spill.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  include/qemu/log.h |   1 +
>  tcg/optimize.c |  31 +-
>  tcg/tcg.c  | 306 
> +++--
>  tcg/tcg.h  |   4 +
>  util/log.c |   5 +-
>  5 files changed, 263 insertions(+), 84 deletions(-)

This patch is a difficult one to review... On the purely technical side
it does what it is supposed to do and I haven't found any issue, though
it's probably very easy to miss one in this kind of code. I have done
tests with various sparc images and I haven't found any obvious
regression on an x86_64 host.

Now on the less technical side, I really like the idea of being able to
transform more or less in place the TCG instruction stream. Your more or
less recent patches towards that direction are great. That said I am a
bit worried that we loop many times on the various ops. We used to have
one forward pass (optimizer) and one backward pass (liveness analysis).
Your patch adds up to two additional passes (one forward and one
backward), this clearly has a cost. Given that indirect registers bring
a lot of performance I think it is worth it. Now I wonder if there is
any way to do the lowering of registers earlier, I mean before the
liveness analysis. This would probably generate plenty of useless ops,
but that are later removed by the liveness analysis. Maybe you have
already try that?

I think it also depends on which direction we want to go with TCG,
either plenty of small independent optimization passes, or keep the
number of passes limited which means more complex code. Contrary to
a compiler we have to do a much more difficult trade-off between the
optimization time and the level of optimization.

Nevertheless I think it's the correct way to go forward for now and
this patch fixes real issues on hosts with limited registers. Maybe just
add a note saying there *might* be better ways to do that.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index fd92b06..3e4bc99 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1009,6 +1009,7 @@ void tcg_dump_ops(TCGContext *s)
>  const TCGOpDef *def;
>  const TCGArg *args;
>  TCGOpcode c;
> +long pos = ftell(qemu_logfile);
>  

Small additional note: ftell() doesn't work well when the logfile is the
terminal. As such when not dumping to the file, the alignment is wrong.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 26 ++
>  1 file changed, 26 insertions(+)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> We only need two bits per temporary.  Fold the two bytes into one,
> and reduce the memory and cachelines required during compilation.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 118 
> +++---
>  1 file changed, 59 insertions(+), 59 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index c41640f..fd92b06 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -332,7 +332,7 @@ void tcg_context_init(TCGContext *s)
>  
>  memset(s, 0, sizeof(*s));
>  s->nb_globals = 0;
> -
> +
>  /* Count total number of arguments and allocate the corresponding
> space */
>  total_args = 0;
> @@ -824,16 +824,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg 
> ret,
>  real_args++;
>  }
>  #endif
> - /* If stack grows up, then we will be placing successive
> -arguments at lower addresses, which means we need to
> -reverse the order compared to how we would normally
> -treat either big or little-endian.  For those arguments
> -that will wind up in registers, this still works for
> -HPPA (the only current STACK_GROWSUP target) since the
> -argument registers are *also* allocated in decreasing
> -order.  If another such target is added, this logic may
> -have to get more complicated to differentiate between
> -stack arguments and register arguments.  */
> +   /* If stack grows up, then we will be placing successive
> +  arguments at lower addresses, which means we need to
> +  reverse the order compared to how we would normally
> +  treat either big or little-endian.  For those arguments
> +  that will wind up in registers, this still works for
> +  HPPA (the only current STACK_GROWSUP target) since the
> +  argument registers are *also* allocated in decreasing
> +  order.  If another such target is added, this logic may
> +  have to get more complicated to differentiate between
> +  stack arguments and register arguments.  */
>  #if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
>  s->gen_opparam_buf[pi++] = args[i] + 1;
>  s->gen_opparam_buf[pi++] = args[i];
> @@ -1297,27 +1297,28 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
>  #endif
>  }
>  
> +#define TS_DEAD  1
> +#define TS_SYNC  2

I am not sure that TS_SYNC is the best name for that. There we really
want to tell that at this moment in the TCG instruction stream the
operand is in memory. It doesn't implied it is synced. Maybe TS_MEM?

The rest looks fine to me. The other alternative would have been to use
the TCGTempSet with the bitmap functions like in optimize.c, and use
only 2 bits per temp. That something that can be done later though.

All that said and provided you change the name:

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> Reduce the size of other bitfields to make room.
> This reduces the cache footprint of compilation.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c |  9 +++--
>  tcg/tcg.h | 26 ++
>  2 files changed, 17 insertions(+), 18 deletions(-)

This looks fine and goes in the right direction of having all the data
at the same location. In the long term it might be useful to have the
TCG stream and the associated data together to implement more agressive
optimisations.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining

2016-07-25 Thread Aurelien Jarno

izeof(*op));

Doing so means you can remove the op at gen_op_buf[0]. The doubled
linked list is still valid, but then I guess you can't assume anymore
that the first op is gen_op_buf[0], as it is done elsewhere your patch.

I guess it's unlikely to happen that we have to remove the first op.
Maybe adding an assert there is enough?


>  #ifdef CONFIG_PROFILER
>  s->del_op_count++;
> @@ -1344,7 +1336,7 @@ static void tcg_liveness_analysis(TCGContext *s)
>  mem_temps = tcg_malloc(s->nb_temps);
>  tcg_la_func_end(s, dead_temps, mem_temps);
>  
> -for (oi = s->gen_last_op_idx; oi >= 0; oi = oi_prev) {
> +for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
>  int i, nb_iargs, nb_oargs;
>  TCGOpcode opc_new, opc_new2;
>  bool have_opc_new2;
> @@ -2321,7 +2313,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>  {
>  int n;
>  
> -n = s->gen_last_op_idx + 1;
> +n = s->gen_op_buf[0].prev + 1;
>  s->op_count += n;
>  if (n > s->op_count_max) {
>  s->op_count_max = n;
> @@ -2380,7 +2372,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>  tcg_out_tb_init(s);
>  
>  num_insns = -1;
> -for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
> +for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
>  TCGOp * const op = >gen_op_buf[oi];
>  TCGArg * const args = >gen_opparam_buf[op->args];
>  TCGOpcode opc = op->opc;
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index cc14560..49b396d 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -520,17 +520,21 @@ typedef struct TCGOp {
>  unsigned callo  : 2;
>  unsigned calli  : 6;
>  
> -/* Index of the arguments for this op, or -1 for zero-operand ops.  */
> -signed args : 16;
> +/* Index of the arguments for this op, or 0 for zero-operand ops.  */
> +unsigned args   : 16;
>  
> -/* Index of the prex/next op, or -1 for the end of the list.  */
> -signed prev : 16;
> -signed next : 16;
> +/* Index of the prex/next op, or 0 for the end of the list.  */

It's not introduced by your patch, but you might want to fix the typo
prex -> prev.

> +unsigned prev   : 16;
> +unsigned next   : 16;
>  } TCGOp;
>  
> -QEMU_BUILD_BUG_ON(NB_OPS > 0xff);
> -QEMU_BUILD_BUG_ON(OPC_BUF_SIZE >= 0x7fff);
> -QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE >= 0x7fff);
> +/* Make sure operands fit in the bitfields above.  */
> +QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
> +QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));
> +QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 16));
> +
> +/* Make sure that we don't overflow 64 bits without noticing.  */
> +QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
>  
>  struct TCGContext {
>  uint8_t *pool_cur, *pool_end;

It seems that gen_first_op_idx and gen_last_op_idx are now unused.
Shouldn't they be removed?

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 20 
>  1 file changed, 20 deletions(-)
> 

Thanks for doing that. The code without liveness analysis is not really
tested anymore and given the kind of optimisations we are doing there
it became quite difficult to compare the generated code with and without
liveness.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 254427b..154ffe8 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -557,7 +557,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr 
> base,
>  ts2->mem_offset = offset + (1 - bigendian) * 4;
>  pstrcpy(buf, sizeof(buf), name);
>  pstrcat(buf, sizeof(buf), "_1");
> -ts->name = strdup(buf);
> +ts2->name = strdup(buf);
>  } else {
>  ts->base_type = type;
>  ts->type = type;

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits

2016-07-25 Thread Aurelien Jarno

On 2016-06-23 20:48, Richard Henderson wrote:
> This reduces both memory usage and per-insn cacheline usage
> during code generation.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 58 ++
>  tcg/tcg.h | 16 ++--
>  2 files changed, 32 insertions(+), 42 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 64060c6..400e69c 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1329,7 +1329,7 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t 
> *dead_temps,
>  }
>  }
>  
> -/* Liveness analysis : update the opc_dead_args array to tell if a
> +/* Liveness analysis : update the opc_arg_life array to tell if a
> given input arguments is dead. Instructions updating dead
> temporaries are removed. */
>  static void tcg_liveness_analysis(TCGContext *s)
> @@ -1338,9 +1338,8 @@ static void tcg_liveness_analysis(TCGContext *s)
>  int oi, oi_prev, nb_ops;
>  
>  nb_ops = s->gen_next_op_idx;
> -s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
> -s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
> -
> +s->op_arg_life = tcg_malloc(nb_ops * sizeof(TCGLifeData));
> +
>  dead_temps = tcg_malloc(s->nb_temps);
>  mem_temps = tcg_malloc(s->nb_temps);
>  tcg_la_func_end(s, dead_temps, mem_temps);
> @@ -1349,8 +1348,7 @@ static void tcg_liveness_analysis(TCGContext *s)
>  int i, nb_iargs, nb_oargs;
>  TCGOpcode opc_new, opc_new2;
>  bool have_opc_new2;
> -uint16_t dead_args;
> -uint8_t sync_args;
> +TCGLifeData arg_life = 0;

A small improvement, probably for later: we can zero the s->op_arg_life
structure, and then access it directly instead of using the arg_life
temporary variable.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] gt64xxx: access right I/O port when activating byte swapping

2016-07-22 Thread Aurelien Jarno

On 2016-06-18 22:48, Hervé Poussineau wrote:
> Hi Aurélien,
> 
> Le 20/05/2016 à 21:56, Aurelien Jarno a écrit :
> > On 2016-05-20 15:05, Hervé Poussineau wrote:
> > > Incidentally, this fixes YAMON on big endian guest.
> > > 
> > > Signed-off-by: Hervé Poussineau <hpous...@reactos.org>
> > > ---
> > >  hw/mips/gt64xxx_pci.c | 62 
> > > +--
> > >  1 file changed, 60 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
> > > index 3f4523d..c76ee88 100644
> > > --- a/hw/mips/gt64xxx_pci.c
> > > +++ b/hw/mips/gt64xxx_pci.c
> > > @@ -177,6 +177,7 @@
> > > 
> > >  /* PCI Internal */
> > >  #define GT_PCI0_CMD  (0xc00 >> 2)
> > > +#define   GT_CMD_MWORDSWAP  (1 << 10)
> > >  #define GT_PCI0_TOR  (0xc04 >> 2)
> > >  #define GT_PCI0_BS_SCS10 (0xc08 >> 2)
> > >  #define GT_PCI0_BS_SCS32 (0xc0c >> 2)
> > > @@ -294,6 +295,62 @@ static void gt64120_isd_mapping(GT64120State *s)
> > >  memory_region_add_subregion(get_system_memory(), s->ISD_start, 
> > > >ISD_mem);
> > >  }
> > > 
> > > +static uint64_t gt64120_pci_io_read(void *opaque, hwaddr addr,
> > > +unsigned int size)
> > > +{
> > > +GT64120State *s = opaque;
> > > +uint8_t buf[4];
> > > +
> > > +if (s->regs[GT_PCI0_CMD] & GT_CMD_MWORDSWAP) {
> > 
> > First of all, it should be noted that this bit doesn't control byte
> > swapping, but swaps the 2 4-byte words in a 8-byte word.
> > 
> > > +addr = (addr & ~3) + 4 - size - (addr & 3);
> > 
> > This looks complicated, and I don't think it is correct. In addition
> > this doesn't behave correctly at the edges of the address space. For
> > example a 2 byte access at address 0x3 would access address
> > 0x.
> > 
> > For sizes <= 4, swapping the 2 words should be done with addr ^= 4.
> > Maybe you should also check for MBYTESWAP which also swaps the bytes
> > within a 8-byte word.
> 
> The real word problem (ie the one from Yamon) is:
> In LE Yamon, there is a read a 0x4d1 (len = 1). MWORDSWAP and MBYTESWAP are 
> disabled
> In BE Yamon, the same read is at address 0x4d2. MWORDSWAP is enabled while 
> MBYTESWAP is disabled.
> 
> MWORDSWAP documentation is:
> "The GT-64120 PCI master swaps the words of the incoming and outgoing PCI 
> data (swap the 2 words of a long word)"
> 
> Do we have to ignore it, as QEMU only handles 4-bytes accesses?

I think indeed that it should be ignored.

> Then, how to change this address 0x4d2 to 0x4d1, address where is located the 
> i8259 ELCR register?
> Next accesses are for the RTC, at address 0x72 in BE and address 0x71 in LE.
> I think I'm missing something.

If you talk about byte accesses, it really looks like a simple byteswap.
0x4d2 ^ 3 = 0x4d1 and 0x72 ^ 3 = 0x71. This could be there is a byteswap
that is missing somewhere. It would be interesting to see how 16-bit and
32-bit accesses are changed between big and little endian.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation

2016-07-22 Thread Aurelien Jarno

On 2016-07-19 09:09, Richard Henderson wrote:
> On 06/24/2016 09:18 AM, Richard Henderson wrote:
> > I was unhappy about the complexity of the second try.
> > 
> > Better to convert to normal temps, allowing in rare
> > occasions, spilling the "globals" to the stack in order
> > to satisfy register allocation.
> > 
> > I can no longer provoke an allocation failure on i686.
> > Hopefully this fixes the OpenBSD case that Mark mentioned
> > re the second attempt.
> 
> Ping for review.  It would be nice to have this fixed for 2.7, but this is
> complex enough I'd prefer another set of eyes.

I'll try to have a look during the week-end. Sorry about the delay.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] hw/mips_malta: Fix YAMON API print routine

2016-07-22 Thread Aurelien Jarno

On 2016-07-22 10:55, Paul Burton wrote:
> The print routine provided as part of the in-built bootloader had a bug
> in that it attempted to use a jump instruction as part of a loop, but
> the target has its upper bits zeroed leading to control flow
> transferring to 0xb814 rather than the intended 0xbfc00814. Fix this
> by using a branch instruction instead, which seems more fit for purpose.
> 
> A simple way to test this is to build a Linux kernel with EVA enabled &
> attempt to boot it in QEMU. It will attempt to print a message
> indicating the configuration mismatch but QEMU would previously
> incorrectly jump & wind up printing a continuous stream of the letter E.
> 
> Signed-off-by: Paul Burton <paul.bur...@imgtec.com>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> Cc: Leon Alrae <leon.al...@imgtec.com>
> ---
>  hw/mips/mips_malta.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
> index 34d41ef..e90857e 100644
> --- a/hw/mips/mips_malta.c
> +++ b/hw/mips/mips_malta.c
> @@ -727,7 +727,7 @@ static void write_bootloader(uint8_t *base, int64_t 
> run_addr,
>  stl_p(p++, 0x); /* nop */
>  stl_p(p++, 0x0ff0021c); /* jal 870 */
>  stl_p(p++, 0x); /* nop */
> -stl_p(p++, 0x08000205); /* j 814 */
> +stl_p(p++, 0x1000fff9); /* b 814 */
>  stl_p(p++, 0x); /* nop */
>  stl_p(p++, 0x01a9); /* jalr t5 */
>  stl_p(p++, 0x01602021); /* move 
> a0,t3 */

This looks fine. The switch from jump to branch is questionable given
there are other jumps around in the code, but that's just nitpicking.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] target-sh4: Use glib allocator in movcal helper

2016-07-21 Thread Aurelien Jarno

On 2016-07-21 13:44, Peter Maydell wrote:
> Ping?
> 
> thanks
> -- PMM
> 
> On 12 July 2016 at 13:50, Peter Maydell <peter.mayd...@linaro.org> wrote:
> > Coverity spots that helper_movcal() calls malloc() but doesn't
> > check for failure. Fix this by switching to the glib allocation
> > functions, which abort on allocation failure.
> >
> > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> > ---
> >  target-sh4/op_helper.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)

I have just looked at it and test it. It's all fine, sorry for the
delay.

Acked-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] hw/sh4: Add dtb support

2016-06-26 Thread Aurelien Jarno

On 2016-06-19 14:34, Yoshinori Sato wrote:
> New SH kernel use dtb.

Do you have a pointer to the corresponding kernel code? I can't find in
linus' tree.

> This patch add dtb support for R2D board emulation.
> 
> Signed-off-by: Yoshinori Sato <ys...@users.sourceforge.jp>
> ---
>  hw/sh4/r2d.c | 52 +++-
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/sh4/r2d.c b/hw/sh4/r2d.c
> index f2547ed..203c117 100644
> --- a/hw/sh4/r2d.c
> +++ b/hw/sh4/r2d.c
> @@ -41,6 +41,7 @@
>  #include "hw/usb.h"
>  #include "hw/block/flash.h"
>  #include "sysemu/block-backend.h"
> +#include "sysemu/device_tree.h"
>  #include "exec/address-spaces.h"
>  
>  #define FLASH_BASE 0x
> @@ -51,7 +52,7 @@
>  
>  #define SM501_VRAM_SIZE 0x80
>  
> -#define BOOT_PARAMS_OFFSET 0x0001000
> +#define BOOT_PARAMS_OFFSET 0x001

Hmm, the current code already contains the value 0x001

>  /* CONFIG_BOOT_LINK_OFFSET of Linux kernel */
>  #define LINUX_LOAD_OFFSET  0x080
>  #define INITRD_LOAD_OFFSET 0x180
> @@ -198,6 +199,7 @@ static qemu_irq *r2d_fpga_init(MemoryRegion *sysmem,
>  typedef struct ResetData {
>  SuperHCPU *cpu;
>  uint32_t vector;
> +uint32_t dtb;
>  } ResetData;
>  
>  static void main_cpu_reset(void *opaque)
> @@ -207,6 +209,8 @@ static void main_cpu_reset(void *opaque)
>  
>  cpu_reset(CPU(s->cpu));
>  env->pc = s->vector;
> +/* r4_bank1 is DTB address */
> +env->gregs[4 + 16] = s->dtb;
>  }
>  
>  static struct QEMU_PACKED
> @@ -225,10 +229,12 @@ static struct QEMU_PACKED
>  
>  static void r2d_init(MachineState *machine)
>  {
> +QemuOpts *machine_opts;
>  const char *cpu_model = machine->cpu_model;
> -const char *kernel_filename = machine->kernel_filename;
> -const char *kernel_cmdline = machine->kernel_cmdline;
> -const char *initrd_filename = machine->initrd_filename;
> +const char *kernel_filename;
> +const char *kernel_cmdline;
> +const char *initrd_filename;
> +const char *dtb_filename;
>  SuperHCPU *cpu;
>  CPUSH4State *env;
>  ResetData *reset_info;
> @@ -258,6 +264,12 @@ static void r2d_init(MachineState *machine)
>  reset_info->vector = env->pc;
>  qemu_register_reset(main_cpu_reset, reset_info);
>  
> +machine_opts = qemu_get_machine_opts();
> +kernel_filename = qemu_opt_get(machine_opts, "kernel");
> +kernel_cmdline = qemu_opt_get(machine_opts, "append");
> +initrd_filename = qemu_opt_get(machine_opts, "initrd");
> +dtb_filename = qemu_opt_get(machine_opts, "dtb");
> +
>  /* Allocate memory space */
>  memory_region_init_ram(sdram, NULL, "r2d.sdram", SDRAM_SIZE, 
> _fatal);
>  vmstate_register_ram_global(sdram);
> @@ -347,11 +359,33 @@ static void r2d_init(MachineState *machine)
>  boot_params.initrd_size = tswap32(initrd_size);
>  }
>  
> -if (kernel_cmdline) {
> -/* I see no evidence that this .kernel_cmdline buffer requires
> -   NUL-termination, so using strncpy should be ok. */
> -strncpy(boot_params.kernel_cmdline, kernel_cmdline,
> -sizeof(boot_params.kernel_cmdline));
> +if (!dtb_filename) {
> +if (kernel_cmdline) {
> +/* I see no evidence that this .kernel_cmdline buffer requires
> +   NUL-termination, so using strncpy should be ok. */
> +strncpy(boot_params.kernel_cmdline, kernel_cmdline,
> +sizeof(boot_params.kernel_cmdline));
> +}
> +} else {
> +void *fdt;
> +int r = 0;
> +uint32_t addr;
> +int fdt_size;
> +
> +fdt = load_device_tree(dtb_filename, _size);
> +if (!fdt) {
> +return;
> +}

If the user explicitly passed a dtb file, the error should not be
silently ignore if the file can't be loaded. It should print an error
message and bail out, just like it's done when the kernel image can't be
loaded.

> +if (kernel_cmdline) {
> +r = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
> +kernel_cmdline);
> +if (r < 0) {
> +fprintf(stderr, "couldn't set /chosen/bootargs\n");
> +}
> +}
> +addr = (SDRAM_BASE + SDRAM_SIZE - fdt_size) & ~0xfff;
> +cpu_physical_memory_write(addr, fdt, fdt_size);
> +reset_info->dtb = addr;
>  }
>  
>  rom_add_blob_fixed("boot_params", _params, sizeof(boot_params),

When a dtb file is passed, this memory area is initialized. Either the
command line should also be copied there, or this blob should not be
added in the dtb case.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] net: mipsnet: check transmit buffer size before sending

2016-06-13 Thread Aurelien Jarno

On 2016-06-02 10:28, Peter Maydell wrote:
> On 2 June 2016 at 07:44, P J P <ppan...@redhat.com> wrote:
> > From: Prasad J Pandit <p...@fedoraproject.org>
> >
> > When processing MIPSnet I/O port write operation, it uses a
> > transmit buffer tx_buffer[MAX_ETH_FRAME_SIZE=1514]. Two indices
> > 's->tx_written' and 's->tx_count' are used to control data written
> > to this buffer. If the two were to be equal before writing, it'd
> > lead to an OOB write access beyond tx_buffer. Add check to avoid it.
> >
> > Reported-by: Li Qiang <liqiang...@360.cn>
> > Signed-off-by: Prasad J Pandit <p...@fedoraproject.org>
> > ---
> >  hw/net/mipsnet.c | 10 ++
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
> > index 740cd98..8d5e5bf 100644
> > --- a/hw/net/mipsnet.c
> > +++ b/hw/net/mipsnet.c
> > @@ -158,7 +158,7 @@ static void mipsnet_ioport_write(void *opaque, hwaddr 
> > addr,
> >  trace_mipsnet_write(addr, val);
> >  switch (addr) {
> >  case MIPSNET_TX_DATA_COUNT:
> > -   s->tx_count = (val <= MAX_ETH_FRAME_SIZE) ? val : 0;
> > +s->tx_count = (val < MAX_ETH_FRAME_SIZE) ? val : 
> > MAX_ETH_FRAME_SIZE;
> >  s->tx_written = 0;
> 
> This is a behaviour change -- the register will now read
> back as MAX_ETH_FRAME_SIZE rather than 0 if written with
> an overlarge value.
> 
> Do we have any documentation on how this (simulated)
> device is supposed to behave in this case?

This device is not supported by the linux kernel for more than 2.5 years
(since v3.7). Do we want to keep this device in QEMU? 

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 1/2] Fix confusing argument names of do_unaligned_access() functions

2016-06-13 Thread Aurelien Jarno

On 2016-06-10 19:26, Sergey Sorokin wrote:
> There are functions cpu_unaligned_access() and do_unaligned_access() that
> are called with access type and mmu index arguments. But these arguments
> are named 'is_write' and 'is_user' in their declarations.
> The patch fixes the names to avoid a confusion.

Unless I missed something, it seems that the is_user/mmu_idx argument is
never used. Should we maybe just drop it?

Otherwise it looks fine.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] hw/sh4/sh_pci.c: Use ldl_le_p() and stl_le_p()

2016-06-13 Thread Aurelien Jarno

On 2016-06-10 17:10, Peter Maydell wrote:
> Use ldl_le_p() and stl_le_p() instead of le32_to_cpup() and
> cpu_to_le32w(); the former handle misaligned addresses and don't
> need casts, and the latter are deprecated.
> 
> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> ---
>  hw/sh4/sh_pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sh4/sh_pci.c b/hw/sh4/sh_pci.c
> index e820a32..1747628 100644
> --- a/hw/sh4/sh_pci.c
> +++ b/hw/sh4/sh_pci.c
> @@ -55,7 +55,7 @@ static void sh_pci_reg_write (void *p, hwaddr addr, 
> uint64_t val,
>  
>  switch(addr) {
>  case 0 ... 0xfc:
> -cpu_to_le32w((uint32_t*)(pcic->dev->config + addr), val);
> +stl_le_p(pcic->dev->config + addr, val);
>  break;
>  case 0x1c0:
>  pcic->par = val;
> @@ -85,7 +85,7 @@ static uint64_t sh_pci_reg_read (void *p, hwaddr addr,
>  
>  switch(addr) {
>  case 0 ... 0xfc:
> -return le32_to_cpup((uint32_t*)(pcic->dev->config + addr));
> +return ldl_le_p(pcic->dev->config + addr);
>  case 0x1c0:
>  return pcic->par;
>  case 0x1c4:

Thanks for the patch. I confirm it builds and works fine.

Acked-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 0/2] macio: switch over to new byte-aligned DMA helpers

2016-05-30 Thread Aurelien Jarno

On 2016-05-27 09:48, Mark Cave-Ayland wrote:
> Here is a tidied up version of my patch to convert the macio controller over 
> to
> using the new byte-aligned DMA helpers.
> 
> The first patch is just a hack and temporarily disables unaligned iovec
> truncation in the DMA helper (as discussed in the recent thread) until Paolo 
> or
> someone else can devise a proper solution. Without this, the subsequent switch
> over to the DMA helpers will appear to work during a Darwin PPC install but 
> the
> resulting image is corrupt and will fail to boot.
> 
> The second patch is the real one and switches the macio controller over to use
> the new byte-aligned DMA helpers. Here I see a speed-up of around 2.5x-3x for
> a typical Darwin PPC installation compared to the previous code.
> 
> Aurelien, I'd be grateful if you could test the TRIM path as I know this is
> something you've had issues with before and I couldn't quite figure out how to
> reproduce your TRIM tests from before.

I have just tested the TRIM path, all works fine with your 2 patches
applied.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 06/12] tcg/mips: Add support for fence

2016-05-30 Thread Aurelien Jarno

On 2016-05-26 18:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/mips/tcg-target.inc.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 50e98ea..cad1d4d 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -292,6 +292,7 @@ typedef enum {
>  OPC_JALR = OPC_SPECIAL | 0x09,
>  OPC_MOVZ = OPC_SPECIAL | 0x0A,
>  OPC_MOVN = OPC_SPECIAL | 0x0B,
> +OPC_SYNC = OPC_SPECIAL | 0x0F,
>  OPC_MFHI = OPC_SPECIAL | 0x10,
>  OPC_MFLO = OPC_SPECIAL | 0x12,
>  OPC_MULT = OPC_SPECIAL | 0x18,
> @@ -1636,6 +1637,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
> opc,
>  const_args[4], const_args[5], true);
>  break;
>  
> +case INDEX_op_fence:
> +tcg_out32(s, OPC_SYNC);
> +break;
>  case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>  case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
>  case INDEX_op_call: /* Always emitted via tcg_out_call.  */
> @@ -1716,6 +1720,8 @@ static const TCGTargetOpDef mips_op_defs[] = {
>  { INDEX_op_qemu_ld_i64, { "L", "L", "lZ", "lZ" } },
>  { INDEX_op_qemu_st_i64, { "SZ", "SZ", "SZ", "SZ" } },
>  #endif
> +
> +    { INDEX_op_fence, { } },
>  { -1 },
>  };

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

Also compiled tested, but we don't really have a way to test that so
far.


-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] gt64xxx: access right I/O port when activating byte swapping

2016-05-20 Thread Aurelien Jarno

On 2016-05-20 15:05, Hervé Poussineau wrote:
> Incidentally, this fixes YAMON on big endian guest.
> 
> Signed-off-by: Hervé Poussineau <hpous...@reactos.org>
> ---
>  hw/mips/gt64xxx_pci.c | 62 
> +--
>  1 file changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
> index 3f4523d..c76ee88 100644
> --- a/hw/mips/gt64xxx_pci.c
> +++ b/hw/mips/gt64xxx_pci.c
> @@ -177,6 +177,7 @@
>  
>  /* PCI Internal */
>  #define GT_PCI0_CMD  (0xc00 >> 2)
> +#define   GT_CMD_MWORDSWAP  (1 << 10)
>  #define GT_PCI0_TOR  (0xc04 >> 2)
>  #define GT_PCI0_BS_SCS10 (0xc08 >> 2)
>  #define GT_PCI0_BS_SCS32 (0xc0c >> 2)
> @@ -294,6 +295,62 @@ static void gt64120_isd_mapping(GT64120State *s)
>  memory_region_add_subregion(get_system_memory(), s->ISD_start, 
> >ISD_mem);
>  }
>  
> +static uint64_t gt64120_pci_io_read(void *opaque, hwaddr addr,
> +unsigned int size)
> +{
> +GT64120State *s = opaque;
> +uint8_t buf[4];
> +
> +if (s->regs[GT_PCI0_CMD] & GT_CMD_MWORDSWAP) {

First of all, it should be noted that this bit doesn't control byte
swapping, but swaps the 2 4-byte words in a 8-byte word.

> +addr = (addr & ~3) + 4 - size - (addr & 3);

This looks complicated, and I don't think it is correct. In addition
this doesn't behave correctly at the edges of the address space. For
example a 2 byte access at address 0x3 would access address
0x.

For sizes <= 4, swapping the 2 words should be done with addr ^= 4.
Maybe you should also check for MBYTESWAP which also swaps the bytes
within a 8-byte word.

> +}
> +
> +address_space_read(_space_io, addr, MEMTXATTRS_UNSPECIFIED,
> +   buf, size);
> +
> +if (size == 1) {
> +return buf[0];
> +} else if (size == 2) {
> +return lduw_le_p(buf);
> +} else if (size == 4) {
> +return ldl_le_p(buf);
> +} else {
> +g_assert_not_reached();
> +    }

The device is configured is little endian, and then the little endian
value converted into native endianness. Wouldn't it be simple to declare
it as DEVICE_NATIVE_ENDIAN?

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH] target-mips: fix call to memset in soft reset code

2016-05-09 Thread Aurelien Jarno

Recent versions of GCC report the following error when compiling
target-mips/helper.c:

  qemu/target-mips/helper.c:542:9: warning: ‘memset’ used with length
  equal to number of elements without multiplication by element size
  [-Wmemset-elt-size]

This is indeed correct and due to a wrong usage of sizeof(). Fix that.

Cc: Stefan Weil <s...@weilnetz.de>
Cc: Leon Alrae <leon.al...@imgtec.com>
LP: https://bugs.launchpad.net/qemu/+bug/1577841
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 target-mips/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-mips/helper.c b/target-mips/helper.c
index 1004ede..cfea177 100644
--- a/target-mips/helper.c
+++ b/target-mips/helper.c
@@ -539,7 +539,7 @@ void mips_cpu_do_interrupt(CPUState *cs)
 break;
 case EXCP_SRESET:
 env->CP0_Status |= (1 << CP0St_SR);
-memset(env->CP0_WatchLo, 0, sizeof(*env->CP0_WatchLo));
+memset(env->CP0_WatchLo, 0, sizeof(env->CP0_WatchLo));
 goto set_error_EPC;
 case EXCP_NMI:
 env->CP0_Status |= (1 << CP0St_NMI);
-- 
2.8.1

Re: [Qemu-devel] [Bug 1577841] [NEW] target-mips/helper.c:542: bad sizeof ?

2016-05-03 Thread Aurelien Jarno

On 2016-05-03 20:11, Stefan Weil wrote:
> Am 03.05.2016 um 17:58 schrieb dcb:
> > Public bug reported:
> >
> > Recent versions of gcc say this:
> >
> > qemu/target-mips/helper.c:542:9: warning: ‘memset’ used with length
> > equal to number of elements without multiplication by element size
> > [-Wmemset-elt-size]
> >
> > Source code is
> >
> >memset(env->CP0_WatchLo, 0, sizeof(*env->CP0_WatchLo));
> >
> > Maybe better code
> >
> >memset(env->CP0_WatchLo, 0, 8 * sizeof(target_ulong));
> >
> > ** Affects: qemu
> >  Importance: Undecided
> >  Status: New
> 
> This might be an error. I think it should be
> 
> memset(env->CP0_WatchLo, 0, sizeof(env->CP0_WatchLo));
> 

I confirm this is the correct version of the code. Someone is sending a
patch, i should I do that?

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] target-mips: Fix RDHWR exception host PC

2016-04-28 Thread Aurelien Jarno

On 2016-04-27 23:21, James Hogan wrote:
> Commit b00c72180c36 ("target-mips: add PC, XNP reg numbers to RDHWR")
> changed the rdhwr helpers to use check_hwrena() to check the register
> being accessed is enabled in CP0_HWREna when used from user mode. If
> that check fails an EXCP_RI exception is raised at the host PC
> calculated with GETPC().
> 
> However check_hwrena() may not be fully inlined as the
> do_raise_exception() part of it is common regardless of the arguments.
> This causes GETPC() to calculate the address in the call in the helper
> instead of the generated code calling the helper. No TB will be found
> and the EPC reported with the resulting guest RI exception points to the
> beginning of the TB instead of the RDHWR instruction.
> 
> We can't reliably force check_hwrena() to be inlined, and converting it
> to a macro would be ugly, so instead pass the host PC in as an argument,
> with each rdhwr helper passing GETPC(). This should avoid any dependence
> on compiler behaviour, and in practice seems to prevent the partial
> inlining of check_hwrena() on x86_64.
> 
> This issue causes failures when running a MIPS KVM (trap & emulate)
> guest in a MIPS QEMU TCG guest, as the inner guest kernel will do a
> RDHWR of counter, which is disabled in the outer guest's CP0_HWREna by
> KVM so it can emulate the inner guest's counter. The emulation fails and
> the RI exception is passed to the inner guest.
> 
> Fixes: b00c72180c36 ("target-mips: add PC, XNP reg numbers to RDHWR")
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Cc: Leon Alrae <leon.al...@imgtec.com>
> Cc: Yongbok Kim <yongbok@imgtec.com>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
>  target-mips/op_helper.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Thanks for the detailed analysis. The other solution would have been to
declare the function as __attribute__((__always_inline__)), but I think
your solution is even better.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] 'tcg fatal error' with qemu v2.6.0-rc3 (bisected)

2016-04-27 Thread Aurelien Jarno

On 2016-04-26 20:12, Guenter Roeck wrote:
> Hi,
> 
> when running qemu version 2.6.0-rc3, I get the following error message.
> 
> /opt/buildbot/qemu/qemu/tcg/tcg.c:1769: tcg fatal error
> 
> qemu command line is as follows.
> 
> qemu-system-ppc64 -M mpc8544ds \
>   -cpu e5500 \
>   -m 1024 -kernel arch/powerpc/boot/uImage -initrd busybox-ppc.cpio \
>   -nographic -vga none -monitor null -no-reboot \
>   --append "rdinit=/sbin/init console=tty console=ttyS0" \
>   -machine "dt_compatible=fsl,,P5020DS"
> 
> Files are from my test suite at https://github.com/groeck/linux-build-test.

I have try to reproduce the issue with a kernel build with my toolchain
and your defconfig, but it worked for me. Could you please share the
binaries so that it is easier to reproduce the issue?

> Bisect points to commit 40ae5c62ebaaf7d9d3b93b88c2d32bf6342f7889 ("tcg:
> Introduce temp_load"). Bisect log is attached.

I look at this patch again quickly, and I don't think the bug is
actually introduced by this commit, it is just that it does more strict
tests on the value. I guess it was working before by chance by loading
a random value into the register. That said to look deeper into it, it
would be better to be able to reproduce the issue.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 10/11] tcg/mips: Make direct jump patching thread-safe

2016-04-22 Thread Aurelien Jarno

On 2016-04-22 20:00, Sergey Fedorov wrote:
> On 22/04/16 19:51, Aurelien Jarno wrote:
> > On 2016-04-22 18:47, Aurelien Jarno wrote:
> >> On 2016-04-22 19:08, Sergey Fedorov wrote:
> >>> From: Sergey Fedorov <serge.f...@gmail.com>
> >>>
> >>> Ensure direct jump patching in MIPS is atomic by using
> >>> atomic_read()/atomic_set() for code patching.
> >>>
> >>> Signed-off-by: Sergey Fedorov <serge.f...@gmail.com>
> >>> Signed-off-by: Sergey Fedorov <sergey.fedo...@linaro.org>
> >>> ---
> >>>
> >>> Changes in v2:
> >>>  * s/atomic_write/atomic_set/
> >>>
> >>>  tcg/mips/tcg-target.inc.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> >>> index 682e19897db0..cefc0398018a 100644
> >>> --- a/tcg/mips/tcg-target.inc.c
> >>> +++ b/tcg/mips/tcg-target.inc.c
> >>> @@ -1886,6 +1886,7 @@ static void tcg_target_init(TCGContext *s)
> >>>  void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
> >>>  {
> >>>  uint32_t *ptr = (uint32_t *)jmp_addr;
> >>> -*ptr = deposit32(*ptr, 0, 26, addr >> 2);
> >>> +uint32_t insn = atomic_read(ptr);
> >>> +atomic_set(ptr, deposit32(insn, 0, 26, addr >> 2));
> >>>  flush_icache_range(jmp_addr, jmp_addr + 4);
> >> Does it really make sense to read and write the value atomically? The
> >> resulting operation is still not atomic, something can happen in
> >> between.
> > Hmm, thinking more about that, given the only instruction used is "J",
> > we don't have to read the value, patch it and write it. We can directly
> > use something like (untested):
> >
> > atomic_set(ptr, (0x02 << 26) | (addr >> 2));
> 
> Hmm, looking at "case INDEX_op_goto_tb:" in tcg_out_op() again I'm
> thinking about:
> 
> atomic_set(ptr, deposit32(OPC_J, 0, 26, addr >> 2));

Oh right, I forgot when reviewing the patch that this code is actually
in tcg-target.inc.c, and that the OPC_J constant is available.

Your version looks good to me.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 10/11] tcg/mips: Make direct jump patching thread-safe

2016-04-22 Thread Aurelien Jarno

On 2016-04-22 18:47, Aurelien Jarno wrote:
> On 2016-04-22 19:08, Sergey Fedorov wrote:
> > From: Sergey Fedorov <serge.f...@gmail.com>
> > 
> > Ensure direct jump patching in MIPS is atomic by using
> > atomic_read()/atomic_set() for code patching.
> > 
> > Signed-off-by: Sergey Fedorov <serge.f...@gmail.com>
> > Signed-off-by: Sergey Fedorov <sergey.fedo...@linaro.org>
> > ---
> > 
> > Changes in v2:
> >  * s/atomic_write/atomic_set/
> > 
> >  tcg/mips/tcg-target.inc.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> > index 682e19897db0..cefc0398018a 100644
> > --- a/tcg/mips/tcg-target.inc.c
> > +++ b/tcg/mips/tcg-target.inc.c
> > @@ -1886,6 +1886,7 @@ static void tcg_target_init(TCGContext *s)
> >  void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
> >  {
> >  uint32_t *ptr = (uint32_t *)jmp_addr;
> > -*ptr = deposit32(*ptr, 0, 26, addr >> 2);
> > +uint32_t insn = atomic_read(ptr);
> > +atomic_set(ptr, deposit32(insn, 0, 26, addr >> 2));
> >  flush_icache_range(jmp_addr, jmp_addr + 4);
> 
> Does it really make sense to read and write the value atomically? The
> resulting operation is still not atomic, something can happen in
> between.

Hmm, thinking more about that, given the only instruction used is "J",
we don't have to read the value, patch it and write it. We can directly
use something like (untested):

atomic_set(ptr, (0x02 << 26) | (addr >> 2));

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 10/11] tcg/mips: Make direct jump patching thread-safe

2016-04-22 Thread Aurelien Jarno

On 2016-04-22 19:08, Sergey Fedorov wrote:
> From: Sergey Fedorov <serge.f...@gmail.com>
> 
> Ensure direct jump patching in MIPS is atomic by using
> atomic_read()/atomic_set() for code patching.
> 
> Signed-off-by: Sergey Fedorov <serge.f...@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedo...@linaro.org>
> ---
> 
> Changes in v2:
>  * s/atomic_write/atomic_set/
> 
>  tcg/mips/tcg-target.inc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 682e19897db0..cefc0398018a 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -1886,6 +1886,7 @@ static void tcg_target_init(TCGContext *s)
>  void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
>  {
>  uint32_t *ptr = (uint32_t *)jmp_addr;
> -*ptr = deposit32(*ptr, 0, 26, addr >> 2);
> +uint32_t insn = atomic_read(ptr);
> +atomic_set(ptr, deposit32(insn, 0, 26, addr >> 2));
>  flush_icache_range(jmp_addr, jmp_addr + 4);

Does it really make sense to read and write the value atomically? The
resulting operation is still not atomic, something can happen in
between.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH 1/2] tcg: use tcg_debug_assert instead of assert (fix performance regression)

2016-04-21 Thread Aurelien Jarno

The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include 

Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes . As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.

Cc: Peter Maydell <peter.mayd...@linaro.org>
Cc: Richard Henderson <r...@twiddle.net>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 tcg/aarch64/tcg-target.inc.c | 24 
 tcg/arm/tcg-target.inc.c | 12 ++--
 tcg/i386/tcg-target.inc.c|  8 
 tcg/ia64/tcg-target.inc.c|  6 +++---
 tcg/mips/tcg-target.inc.c| 18 +-
 tcg/optimize.c   |  4 ++--
 tcg/ppc/tcg-target.inc.c | 22 +++---
 tcg/s390/tcg-target.inc.c|  6 +++---
 tcg/sparc/tcg-target.inc.c   |  8 
 tcg/tcg.c| 36 ++--
 tcg/tci/tcg-target.inc.c | 42 +-
 11 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 0ed10a9..f5f0ce3 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -67,7 +67,7 @@ static const int tcg_target_call_oarg_regs[1] = {
 static inline void reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
-assert(offset == sextract64(offset, 0, 26));
+tcg_debug_assert(offset == sextract64(offset, 0, 26));
 /* read instruction, mask away previous PC_REL26 parameter contents,
set the proper offset, then write back the instruction. */
 *code_ptr = deposit32(*code_ptr, 0, 26, offset);
@@ -76,14 +76,14 @@ static inline void reloc_pc26(tcg_insn_unit *code_ptr, 
tcg_insn_unit *target)
 static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
-assert(offset == sextract64(offset, 0, 19));
+tcg_debug_assert(offset == sextract64(offset, 0, 19));
 *code_ptr = deposit32(*code_ptr, 5, 19, offset);
 }
 
 static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
intptr_t value, intptr_t addend)
 {
-assert(addend == 0);
+tcg_debug_assert(addend == 0);
 switch (type) {
 case R_AARCH64_JUMP26:
 case R_AARCH64_CALL26:
@@ -402,7 +402,7 @@ static void tcg_out_insn_3314(TCGContext *s, AArch64Insn 
insn,
 insn |= pre << 24;
 insn |= w << 23;
 
-assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
+tcg_debug_assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
 insn |= (ofs & (0x7f << 3)) << (15 - 3);
 
 tcg_out32(s, insn | r2 << 10 | rn << 5 | r1);
@@ -412,9 +412,9 @@ static void tcg_out_insn_3401(TCGContext *s, AArch64Insn 
insn, TCGType ext,
   TCGReg rd, TCGReg rn, uint64_t aimm)
 {
 if (aimm > 0xfff) {
-assert((aimm & 0xfff) == 0);
+tcg_debug_assert((aimm & 0xfff) == 0);
 aimm >>= 12;
-assert(aimm <= 0xfff);
+tcg_debug_assert(aimm <= 0xfff);
 aimm |= 1 << 12;  /* apply LSL 12 */
 }
 tcg_out32(s, insn | ext << 31 | aimm << 10 | rn << 5 | rd);
@@ -444,7 +444,7 @@ static void tcg_out_insn_3403(TCGContext *s, AArch64Insn 
insn, TCGType ext,
 static void tcg_out_insn_3405(TCGContext *s, AArch64Insn insn, TCGType ext,
   TCGReg rd, uint16_t half, unsigned shift)
 {
-assert((shift & ~0x30) == 0);
+tcg_debug_assert((shift & ~0x30) == 0);
 tcg_out32(s, insn | ext << 31 | shift << (21 - 4) | half << 5 | rd);
 }
 
@@ -538,7 +538,7 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn 
insn, TCGType ext,
 {
 unsigned h, l, r, c;
 
-assert(is_limm(limm));
+tcg_debug_assert(is_limm(limm));
 
 h = clz64(limm);
 l = ctz64(limm);
@@ -793,7 +793,7 @@ static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGReg 
a,
 static inline void tcg_out_goto(TCGContext *s, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - s->code_ptr;
-

[Qemu-devel] [PATCH 2/2] tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG

2016-04-21 Thread Aurelien Jarno

Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.

Cc: Richard Henderson <r...@twiddle.net>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 tcg/aarch64/tcg-target.inc.c | 4 ++--
 tcg/arm/tcg-target.inc.c | 2 +-
 tcg/i386/tcg-target.inc.c| 2 +-
 tcg/ia64/tcg-target.inc.c| 2 +-
 tcg/mips/tcg-target.inc.c| 2 +-
 tcg/ppc/tcg-target.inc.c | 2 +-
 tcg/s390/tcg-target.inc.c| 2 +-
 tcg/sparc/tcg-target.inc.c   | 2 +-
 tcg/tcg.c| 9 ++---
 tcg/tci/tcg-target.inc.c | 2 +-
 10 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index f5f0ce3..a8fb442 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -18,14 +18,14 @@
makes things much cleaner.  */
 QEMU_BUILD_BUG_ON(TCG_TYPE_I32 != 0 || TCG_TYPE_I64 != 1);
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7",
 "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15",
 "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23",
 "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp",
 };
-#endif /* NDEBUG */
+#endif /* CONFIG_DEBUG_TCG */
 
 static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 9995b74..2b7fbdd 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -67,7 +67,7 @@ bool use_idiv_instructions;
 # define USING_SOFTMMU 0
 #endif
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "%r0",
 "%r1",
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 65f0d4c..007407c 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -24,7 +24,7 @@
 
 #include "tcg-be-ldst.h"
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #if TCG_TARGET_REG_BITS == 64
 "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index 5233da1..7557e6a 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -27,7 +27,7 @@
  * Register definitions
  */
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
  "r0",  "r1",  "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
  "r8",  "r9", "r10", "r11", "r12", "r13", "r14", "r15",
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index f7194b6..aaf881c 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -35,7 +35,7 @@
 #define LO_OFF(MIPS_BE * 4)
 #define HI_OFF(4 - LO_OFF)
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "zero",
 "at",
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 60bcaa7..00bb90f 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -89,7 +89,7 @@ static bool have_isa_2_06;
 #define TCG_GUEST_BASE_REG 30
 #endif
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "r0",
 "r1",
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 8b30256..5805532 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -221,7 +221,7 @@ typedef enum S390Opcode {
 RX_STH  = 0x40,
 } S390Opcode;
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "%r0", "%r1", "%r2", "%r3", "%r4", "%r5", "%r6", "%r7",
 "%r8", "%r9", "%r10" "%r11" "%r12" "%r13" "%r14" "%r15"
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index d64daba..d641cfd 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -24,7 +24,7 @@
 
 #include "tcg-be-null.h"
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG_TCG
 stati

[Qemu-devel] [PATCH] cuda: fix off-by-one error in SET_TIME command

2016-04-18 Thread Aurelien Jarno

With the new framework the cuda_cmd_set_time command directly receive
the data, without the command byte. Therefore the time is stored at
in_data[0], not at in_data[1].

This fixes the "hwclock --systohc" command in a guest.

Cc: Hervé Poussineau <hpous...@reactos.org>
Cc: David Gibson <da...@gibson.dropbear.id.au>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 hw/misc/macio/cuda.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/misc/macio/cuda.c b/hw/misc/macio/cuda.c
index c7472aa..f15f301 100644
--- a/hw/misc/macio/cuda.c
+++ b/hw/misc/macio/cuda.c
@@ -685,8 +685,8 @@ static bool cuda_cmd_set_time(CUDAState *s,
 return false;
 }
 
-ti = (((uint32_t)in_data[1]) << 24) + (((uint32_t)in_data[2]) << 16)
- + (((uint32_t)in_data[3]) << 8) + in_data[4];
+ti = (((uint32_t)in_data[0]) << 24) + (((uint32_t)in_data[1]) << 16)
+ + (((uint32_t)in_data[2]) << 8) + in_data[3];
 s->tick_offset = ti - (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
/ NANOSECONDS_PER_SECOND);
 return true;
-- 
2.8.0.rc3

Re: [Qemu-devel] [PATCH] tcg/mips: Fix type of tcg_target_reg_alloc_order[]

2016-04-01 Thread Aurelien Jarno

On 2016-04-01 15:49, James Hogan wrote:
> The MIPS TCG backend is the only one to have
> tcg_target_reg_alloc_order[] elements of type TCGReg rather than int.
> This resulted in commit 91478cefaaf2 ("tcg: Allocate indirect_base
> temporaries in a different order") breaking the build on MIPS since the
> type differed from indirect_reg_alloc_order[]:
> 
> tcg/tcg.c:1725:44: error: pointer type mismatch in conditional expression 
> [-Werror]
>  order = rev ? indirect_reg_alloc_order : tcg_target_reg_alloc_order;
> ^
> 
> Make it an array of ints to fix the build and match other architectures.
> 
> Fixes: 91478cefaaf2 ("tcg: Allocate indirect_base temporaries in a different 
> order")
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> Cc: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/mips/tcg-target.inc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 297bd00910b7..682e19897db0 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -76,7 +76,7 @@ static const char * const 
> tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  #define TCG_TMP1  TCG_REG_T9
>  
>  /* check if we really need so many registers :P */
> -static const TCGReg tcg_target_reg_alloc_order[] = {
> +static const int tcg_target_reg_alloc_order[] = {
>  /* Call saved registers.  */
>  TCG_REG_S0,
>  TCG_REG_S1,

Acked-by: Aurelien Jarno <aurel...@aurel32.net>

Richard, do you have a pending TCG pull request in which you can include
this one?

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 01/16] tcg-mips: Always use tcg_debug_assert

2016-02-28 Thread Aurelien Jarno

On 2016-02-15 14:42, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/mips/tcg-target.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
> index 2dc4998..ebb936d 100644
> --- a/tcg/mips/tcg-target.c
> +++ b/tcg/mips/tcg-target.c
> @@ -128,7 +128,7 @@ static inline uint32_t reloc_pc16_val(tcg_insn_unit *pc, 
> tcg_insn_unit *target)
>  {
>  /* Let the compiler perform the right-shift as part of the arithmetic.  
> */
>  ptrdiff_t disp = target - (pc + 1);
> -assert(disp == (int16_t)disp);
> +tcg_debug_assert(disp == (int16_t)disp);
>  return disp & 0x;
>  }
>  
> @@ -139,7 +139,7 @@ static inline void reloc_pc16(tcg_insn_unit *pc, 
> tcg_insn_unit *target)
>  
>  static inline uint32_t reloc_26_val(tcg_insn_unit *pc, tcg_insn_unit *target)
>  {
> -assertuintptr_t)pc ^ (uintptr_t)target) & 0xf000) == 0);
> +tcg_debug_assertuintptr_t)pc ^ (uintptr_t)target) & 0xf000) == 
> 0);
>  return ((uintptr_t)target >> 2) & 0x3ff;
>  }
>  
> @@ -151,8 +151,8 @@ static inline void reloc_26(tcg_insn_unit *pc, 
> tcg_insn_unit *target)
>  static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  intptr_t value, intptr_t addend)
>  {
> -assert(type == R_MIPS_PC16);
> -assert(addend == 0);
> +tcg_debug_assert(type == R_MIPS_PC16);
> +tcg_debug_assert(addend == 0);
>  reloc_pc16(code_ptr, (tcg_insn_unit *)value);
>  }
>  
> @@ -433,7 +433,7 @@ static bool tcg_out_opc_jmp(TCGContext *s, MIPSInsn opc, 
> void *target)
>  if ((from ^ dest) & -(1 << 28)) {
>  return false;
>  }
> -assert((dest & 3) == 0);
> +tcg_debug_assert((dest & 3) == 0);
>  
>  inst = opc;
>  inst |= (dest >> 2) & 0x3ff;
> @@ -808,9 +808,9 @@ static void tcg_out_setcond2(TCGContext *s, TCGCond cond, 
> TCGReg ret,
>  TCGReg tmp0 = TCG_TMP0;
>  TCGReg tmp1 = ret;
>  
> -assert(ret != TCG_TMP0);
> +tcg_debug_assert(ret != TCG_TMP0);
>  if (ret == ah || ret == bh) {
> -assert(ret != TCG_TMP1);
> +tcg_debug_assert(ret != TCG_TMP1);
>  tmp1 = TCG_TMP1;
>  }
>  
> @@ -1471,8 +1471,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
> opc,
>  case INDEX_op_and_i32:
>  if (c2 && a2 != (uint16_t)a2) {
>  int msb = ctz32(~a2) - 1;
> -assert(use_mips32r2_instructions);
> -    assert(is_p2m1(a2));
> +tcg_debug_assert(use_mips32r2_instructions);
> +tcg_debug_assert(is_p2m1(a2));
>  tcg_out_opc_bf(s, OPC_EXT, a0, a1, msb, 0);
>  break;
>  }

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 00/16] tcg mips64 and mips r6 improvements

2016-02-28 Thread Aurelien Jarno

On 2016-02-15 14:42, Richard Henderson wrote:
> Changes since v1:
>   * Some bugs pointed out by Mark fixed.
>   * Canonicalize the whole file on tcg_debug_assert.
>   * Switch bswap code to subroutine earlier; the first patch is
> standalone for mips32, and there is no longer an intermediate
> patch with inline bswap for mips64.
>   * Use NAL for pre-r6 mips64 loading of the slow path return address.
> 

Thanks a lot for working on that, it's something I have on my TODO list
for months. I have finally found time to have a look and give a try over
the week-end (sorry about the delay).

It seems to work perfectly for 64-bit guests on mips64el but 32-bit
guests end-up quickly in a segmentation fault. It's easily reproducible 
by starting qemu-system-i386 on a mips64el host, it crashes when
executing seabios.

More problematic it seems that the patch "Adjust qemu_ld/st for mips64"
causes a regression on at least a big-endian 32-bit host running
qemu-system-i386. It is reproducible by booting a Debian i386 wheezy
guest on such a system. Unfortunately the week-end was too short for
finding the issue, I'll continue looking in the next days.

I have a few comments on the individual patches, I'll send them asap.
Note that I don't have an R6 machine, so I haven't been able to test
that part.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [Qemu-ppc] [PULL 03/39] macio: use the existing IDEDMA aiocb to hold the active DMA aiocb

2016-01-29 Thread Aurelien Jarno

On 2016-01-29 16:06, David Gibson wrote:
> From: Mark Cave-Ayland <mark.cave-ayl...@ilande.co.uk>
> 
> Currently the aiocb is held within MACIOIDEState, however the IDE core code
> assumes that the current actvie DMA aiocb is held in aiocb in a few places,
> e.g. ide_bus_reset() and ide_reset().
> 
> Switch over to using IDEDMA aiocb to store the aiocb for the current active
> DMA request so that bus resets and restarts are handled correctly. As a
> consequence we can now use ide_set_inactive() rather than handling its
> functionality ourselves.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayl...@ilande.co.uk>
> Reviewed-by: John Snow <js...@redhat.com>
> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> ---
>  hw/ide/macio.c  |  20 +-
>  hw/ide/macio.c.orig | 634 
> ++++

I don't think you want to add this file to the git.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] target-mips: Stop using uint_fast*_t types in r4k_tlb_t struct

2016-01-29 Thread Aurelien Jarno

On 2016-01-25 17:40, Peter Maydell wrote:
> The r4k_tlb_t structure uses the uint_fast*_t types. Most of these
> uses are in bitfields and are thus pointless, because the bitfield
> itself specifies the width of the type; just use 'unsigned int'
> instead. (On glibc uint_fast16_t is defined as either 32 or 64 bits,
> so we know the code is not reliant on it being exactly 16 bits.)
> There is also one use of uint_fast8_t, which we replace with uint8_t,
> because both are exactly 8 bits on glibc and this is the only
> place outside the softfloat code which uses an int_fast*_t type.
> 
> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> ---
> I'm going to have a go at getting rid of the int_fast16_t usage
> in the softfloat code too, but in the meantime this is an
> independent cleanup.
> 
>  target-mips/cpu.h | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)

Thanks for the cleanup.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 0/4] fpu: Remove use of int_fast*_t types

2016-01-29 Thread Aurelien Jarno

On 2016-01-26 11:30, Peter Maydell wrote:
> This patchset removes the uses of int_fast*_t types from the
> softfloat code:
>  * the return types for the "convert to 16 bit integer" functions
>are changed to int16_t
>  * uses of int_fast*_t for a shift count or an exponent value
>are changed to int
> 
> Basically, where the type was being used to mean what should
> logically be exactly 16 bits we use int16_t; where it was just
> being used for a value which isn't inherently 16 bits wide
> we switch to plain int.
> 
> Compatibility note: both these changes match the logical definition
> of int_fast*_t so if the code was not previously buggily relying
> on the width it happened to be they will not introduce any new bugs.
> In practice on glibc int_fast16_t is 32-bits on 32-bit platforms
> and 64-bits on 64-bit platforms so we are changing the underlying
> type size. I have tested by running a bunch of ARM regression
> tests with 'risu', so I'm pretty happy this doesn't cause problems.
> 
> The final patch removes some back-compat defines from osdep.h;
> it depends on both the earlier patches in this series and also
> on the targe-mips patch I sent out yesterday:
>   http://patchwork.ozlabs.org/patch/572843/
> (there are no other uses of the int_fast* types in QEMU.)
> 
> thanks
> -- PMM
> 
> Peter Maydell (4):
>   fpu: Remove use of int_fast16_t in conversions to int16
>   fpu: Use plain 'int' rather than 'int_fast16_t' for shift counts
>   fpu: Use plain 'int' rather than 'int_fast16_t' for exponents
>   osdep.h: Remove int_fast*_t Solaris compatibility code
> 
>  fpu/softfloat-macros.h  |  18 +++---
>  fpu/softfloat.c | 162 
> ++--
>  include/fpu/softfloat.h |  16 ++---
>  include/qemu/osdep.h|   7 ---
>  4 files changed, 104 insertions(+), 99 deletions(-)

Great work.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] MAINTAINERS: Add section for FPU emulation

2016-01-29 Thread Aurelien Jarno

On 2016-01-26 13:27, Peter Maydell wrote:
> Add an entry to the MAINTAINERS file for our softfloat FPU
> emulation code. This code is only 'odd fixes' but it's useful to
> record who to cc on patches to it.
> 
> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> ---
> Would anybody else like to be listed here (ie to be cc'd on softfloat
> patches) ? Richard? Aurelien?

As long as it is in "Odd Fixes" mode, it would like to get it listed
please. I don't have time to follow the whole mailing list anymore, so
being Cc'd n on softfloat patches would be nice.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH RESEND] softfloat: fix return type of roundAndPackFloat16

2016-01-17 Thread Aurelien Jarno

On 2016-01-15 14:21, Peter Maydell wrote:
> On 13 January 2016 at 16:03, Aurelien Jarno <aurel...@aurel32.net> wrote:
> > The roundAndPackFloat16 function should return a float16 value, not a
> > float32 one. Fix that.
> >
> > Cc: Peter Maydell <peter.mayd...@linaro.org>
> > Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
> > ---
> >  fpu/softfloat.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Peter, given you are working on softfloat patches, you might want to get
> > this one merged at the same time.
> >
> > diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> > index f1170fe..acc9099 100644
> > --- a/fpu/softfloat.c
> > +++ b/fpu/softfloat.c
> > @@ -3368,7 +3368,7 @@ static float16 packFloat16(flag zSign, int_fast16_t 
> > zExp, uint16_t zSig)
> >  | Binary Floating-Point Arithmetic.
> >  
> > **/
> >
> > -static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
> > +static float16 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
> > uint32_t zSig, flag ieee,
> > float_status *status)
> >  {
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> 
> (a harmless error in the current code but we might as well get it right).

It's harmless in the default build, but it fails to build when softfloat
type checking is activated. Unfortunately more code with the wrong type
has been added recently.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 0/6] Get rid of confusing softfloat-specific integer types

2016-01-13 Thread Aurelien Jarno

On 2016-01-12 12:55, Peter Maydell wrote:
> This patchset removes the confusing softfloat-specific integer
> types int8, uint8, int32, uint32, int64 and uint64, replacing
> them with the standard _t types that they were typedef'd as.
> These frequently got accidentally used outside the softfloat
> code as a simple typo for the standard types (as you can see
> from the various files touched in the diffstat).
> 
> Although there is technically a semantic difference (the
> softfloat types are "at least X bits" whereas the standard
> types are "exactly X bits", the distinction is unlikely to
> make much performance difference and "upgrading" the types to
> use int_fast*_t would require careful code analysis to check we
> weren't accidentally  relying on the type width. It also means
> we might potentially have subtle bugs on only some host platforms,
> which is worth avoiding I think.
> 
> (In particular glibc defines int_fast32_t as a 64 bit type
> on 64 bit systems, which is unlikely to be the most sensible
> type to actually use for performance. I was reading a discussion
> about the _fast_ types from the musl irc channel recently:
> https://gist.github.com/andrewrk/ac66b24a0a202d87cea7
> which suggests that they're in practice not very useful.)

Thanks for doing this change. I hope this time we'll reach a consensus.

> This is admittedly a different decision to the one we made in
> the past for int16/uint16 (commits 94a49d86c536af3, 5aea4c589aa).
> I can do a followup patch which converts our int_fast16_t/uint_fast16_t
> usage to int16_t/uint16_t if people would like.
> (I think the difference is partly that the old int16/uint16 types
> really were bigger than 16 bits so we knew the code was not
> accidentally relying on exactly-16-bitness. Also I have a feeling
> that I was one of those suggesting the _fast_ types, but I have
> changed my mind and think I was wrong back then.) 

Yes please it would be nice if we can use standard consistent type
everywhere.

> I have left the 'flag' type alone. This could reasonably be changed
> to 'bool' if we checked all the uses to make sure they weren't
> accidentally relying on it being an integer type. The type name
> is not such that it will be accidentally used outside softfloat,
> so it's less of an irritant.

Indeed.

> thanks
> -- PMM
> 
> Peter Maydell (6):
>   fpu: Replace int64 typedef with int64_t
>   fpu: Replace uint64 typedef with uint64_t
>   fpu: Replace int32 typedef with int32_t
>   fpu: Replace uint32 typedef with uint32_t
>   fpu: Replace int8 typedef with int8_t
>   fpu: Replace uint8 typedef with uint8_t
> 
>  crypto/secret.c|   2 +-
>  fpu/softfloat-macros.h |  26 +++---
>  fpu/softfloat-specialize.h |   2 +-
>  fpu/softfloat.c| 218 
> ++---
>  hw/i386/pc.c   |   2 +-
>  hw/ipmi/isa_ipmi_bt.c  |   2 +-
>  hw/ipmi/isa_ipmi_kcs.c |   2 +-
>  hw/misc/imx25_ccm.c|   2 +-
>  hw/misc/imx31_ccm.c|   2 +-
>  hw/net/vmware_utils.h  |   2 +-
>  hw/net/vmxnet3.c   |   2 +-
>  hw/ppc/spapr_events.c  |   4 +-
>  include/fpu/softfloat.h|  68 ++
>  include/hw/i386/pc.h   |   2 +-
>  migration/ram.c|   2 +-
>  target-alpha/fpu_helper.c  |   2 +-
>  target-mips/kvm.c  |   4 +-
>  target-mips/msa_helper.c   |  36 
>  target-s390x/kvm.c |   2 +-
>  tests/vhost-user-test.c|   2 +-
>  20 files changed, 187 insertions(+), 197 deletions(-)

So I am happy to give a:

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>


Regards,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH RESEND] softfloat: fix return type of roundAndPackFloat16

2016-01-13 Thread Aurelien Jarno

The roundAndPackFloat16 function should return a float16 value, not a
float32 one. Fix that.

Cc: Peter Maydell <peter.mayd...@linaro.org>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Peter, given you are working on softfloat patches, you might want to get
this one merged at the same time.

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f1170fe..acc9099 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3368,7 +3368,7 @@ static float16 packFloat16(flag zSign, int_fast16_t zExp, 
uint16_t zSig)
 | Binary Floating-Point Arithmetic.
 **/
 
-static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
+static float16 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
uint32_t zSig, flag ieee,
float_status *status)
 {
-- 
2.1.4

Re: [Qemu-devel] [PATCH] target-mips: Fix ALIGN instruction when bp=0

2016-01-01 Thread Aurelien Jarno

[snip]

> From e01539a11061c447bece8dccde1715da9534024d Mon Sep 17 00:00:00 2001
> From: Miodrag Dinic <miodrag.di...@imgtec.com>
> Date: Thu, 3 Dec 2015 16:48:57 +0100
> Subject: [PATCH] target-mips: Fix ALIGN instruction when bp=0
> 
> If executing ALIGN with shift count bp=0 within mips64 emulation,
> the result of the operation should be sign extended.
> 
> Taken from the official documentation (pseudo code) :
> 
> ALIGN:
>   tmp_rt_hi = unsigned_word(GPR[rt]) << (8*bp)
>   tmp_rs_lo = unsigned_word(GPR[rs]) >> (8*(4-bp))
>   tmp = tmp_rt_hi || tmp_rt_lo
>   GPR[rd] = sign_extend.32(tmp)
> 
> Signed-off-by: Miodrag Dinic <miodrag.di...@imgtec.com>
> ---
>  target-mips/translate.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 5626647..f20678c 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -4630,7 +4630,15 @@ static void gen_align(DisasContext *ctx, int opc, int 
> rd, int rs, int rt,
>  t0 = tcg_temp_new();
>  gen_load_gpr(t0, rt);
>  if (bp == 0) {
> -tcg_gen_mov_tl(cpu_gpr[rd], t0);
> +switch (opc) {
> +case OPC_ALIGN:
> +#if defined(TARGET_MIPS64)
> +tcg_gen_ext32s_i64(cpu_gpr[rd], t0);
> +break;
> +#endif

The way to fix that is basically ok. However you should just use
tcg_gen_ext32s_tl instead of tcg_gen_ext32s_i64 and drop the
TARGET_MIPS64 #ifdef.

> +default:

Then you can replace this by OPC_DALIGN for more clarity.

> +tcg_gen_mov_tl(cpu_gpr[rd], t0);
> +}
>  } else {
>  TCGv t1 = tcg_temp_new();
>  gen_load_gpr(t1, rs);

The resulting binary code should be the same, but less #ifdef means less
risk of breakage, as the code is always compiled.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 06/14] tcg: Change reg_to_temp to TCGTemp pointer

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 113 
> ++
>  tcg/tcg.h |   6 ++--
>  2 files changed, 57 insertions(+), 62 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 01/14] tcg: Change tcg_global_mem_new_* to take a TCGv_ptr

2015-12-31 Thread Aurelien Jarno

On 2015-12-31 11:26, Richard Henderson wrote:
> Thus, use cpu_env as the parameter, not TCG_AREG0 directly.
> Update all uses in the translators.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-alpha/translate.c  |  8 ++---
>  target-arm/translate-a64.c|  6 ++--
>  target-arm/translate.c| 18 +-
>  target-cris/translate.c   | 24 ++---
>  target-cris/translate_v10.c   | 82 
> +--
>  target-i386/translate.c   | 10 +++---
>  target-lm32/translate.c   | 24 ++---
>  target-m68k/translate.c   | 30 +---
>  target-microblaze/translate.c | 18 +-
>  target-mips/translate.c   | 25 ++---
>  target-moxie/translate.c  |  8 ++---
>  target-openrisc/translate.c   | 26 +++---
>  target-ppc/translate.c| 44 +++
>  target-s390x/translate.c  | 18 +-
>  target-sh4/translate.c| 48 -
>  target-sparc/translate.c  | 60 ---
>  target-tilegx/translate.c |  4 +--
>  target-tricore/translate.c| 22 ++--
>  target-unicore32/translate.c  |  2 +-
>  target-xtensa/translate.c | 10 +++---
>  tcg/tcg.c | 21 +++
>  tcg/tcg.h | 38 +++-
>  22 files changed, 282 insertions(+), 264 deletions(-)
> 

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 11/14] tcg: Implement indirect memory registers

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> That is, global_mem registers whose base is another global_mem
> register, rather than a fixed register.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 95 
> ---
>  tcg/tcg.h |  2 ++
>  2 files changed, 68 insertions(+), 29 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index c51e0ec..7150a3f 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -509,17 +509,23 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr 
> base,
>  TCGContext *s = _ctx;
>  TCGTemp *base_ts = >temps[GET_TCGV_PTR(base)];
>  TCGTemp *ts = tcg_global_alloc(s);
> -int bigendian = 0;
> +int indirect_reg = 0, bigendian = 0;
>  #ifdef HOST_WORDS_BIGENDIAN
>  bigendian = 1;
>  #endif
>  
> +if (!base_ts->fixed_reg) {
> +indirect_reg = 1;
> +base_ts->indirect_base = 1;
> +}
> +
>  if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
>  TCGTemp *ts2 = tcg_global_alloc(s);
>  char buf[64];
>  
>  ts->base_type = TCG_TYPE_I64;
>  ts->type = TCG_TYPE_I32;
> +ts->indirect_reg = indirect_reg;

Do we really need to add this new bit, while we can simply test
ts->mem_base->fixed_reg? This means one more derefence, but anyway it
has to be done later when calling temp_load?

>  ts->mem_allocated = 1;
>  ts->mem_base = base_ts;
>  ts->mem_offset = offset + bigendian * 4;

> @@ -1781,13 +1803,16 @@ static inline void temp_save(TCGContext *s, TCGTemp 
> *ts,
>   TCGRegSet allocated_regs)
>  {
>  #ifdef USE_LIVENESS_ANALYSIS
> -/* The liveness analysis already ensures that globals are back
> -   in memory. Keep an assert for safety. */
> -tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
> -#else
> +/* ??? Liveness does not yet incorporate indirect bases.  */
> +if (!ts->indirect_base) {
> +/* The liveness analysis already ensures that globals are back
> +   in memory. Keep an assert for safety. */
> +tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
> +return;
> +}

This basically disables the assert. What does it mean in practice? Can
it generates bad code?

> +#endif
>  temp_sync(s, ts, allocated_regs);
>  temp_dead(s, ts);
> -#endif
>  }
>  
>  /* save globals to their canonical location and assume they can be
> @@ -1812,12 +1837,15 @@ static void sync_globals(TCGContext *s, TCGRegSet 
> allocated_regs)
>  for (i = 0; i < s->nb_globals; i++) {
>  TCGTemp *ts = >temps[i];
>  #ifdef USE_LIVENESS_ANALYSIS
> -tcg_debug_assert(ts->val_type != TEMP_VAL_REG
> - || ts->fixed_reg
> - || ts->mem_coherent);
> -#else
> -temp_sync(s, ts, allocated_regs);
> +/* ??? Liveness does not yet incorporate indirect bases.  */
> +if (!ts->indirect_base) {
> +tcg_debug_assert(ts->val_type != TEMP_VAL_REG
> + || ts->fixed_reg
> + || ts->mem_coherent);
> +continue;
> +}

Same here.

>  #endif
> +temp_sync(s, ts, allocated_regs);
>  }
>  }
>  
> @@ -1833,12 +1861,15 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, 
> TCGRegSet allocated_regs)
>  temp_save(s, ts, allocated_regs);
>  } else {
>  #ifdef USE_LIVENESS_ANALYSIS
> -/* The liveness analysis already ensures that temps are dead.
> -   Keep an assert for safety. */
> -assert(ts->val_type == TEMP_VAL_DEAD);
> -#else
> -temp_dead(s, ts);
> +/* ??? Liveness does not yet incorporate indirect bases.  */
> +if (!ts->indirect_base) {
> +/* The liveness analysis already ensures that temps are dead.
> +   Keep an assert for safety. */
> +assert(ts->val_type == TEMP_VAL_DEAD);
> +continue;
> +}

And here.

>  #endif
> +temp_dead(s, ts);
>  }
>  }
>  

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 10/14] tcg: Introduce temp_load

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Unify all of the places that realize a temporary into a register.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 116 
> --
>  1 file changed, 52 insertions(+), 64 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 6f40bf3..c51e0ec 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c

[ snip ]

> @@ -2162,23 +2163,9 @@ static void tcg_reg_alloc_call(TCGContext *s, int 
> nb_oargs, int nb_iargs,
>  #endif
>  if (arg != TCG_CALL_DUMMY_ARG) {
>  ts = >temps[arg];
> -if (ts->val_type == TEMP_VAL_REG) {
> -tcg_out_st(s, ts->type, ts->reg, TCG_REG_CALL_STACK, 
> stack_offset);
> -} else if (ts->val_type == TEMP_VAL_MEM) {
> -reg = tcg_reg_alloc(s, tcg_target_available_regs[ts->type], 
> -s->reserved_regs);
> -/* XXX: not correct if reading values from the stack */

Shouldn't we keep this comment by moving it to temp_load?

> -tcg_out_ld(s, ts->type, reg, ts->mem_base->reg, 
> ts->mem_offset);
> -tcg_out_st(s, ts->type, reg, TCG_REG_CALL_STACK, 
> stack_offset);
> -} else if (ts->val_type == TEMP_VAL_CONST) {
> -reg = tcg_reg_alloc(s, tcg_target_available_regs[ts->type], 
> -s->reserved_regs);
> -/* XXX: sign extend may be needed on some targets */

Same here.

> -tcg_out_movi(s, ts->type, reg, ts->val);
> -tcg_out_st(s, ts->type, reg, TCG_REG_CALL_STACK, 
> stack_offset);
> -} else {
> -tcg_abort();
> -}
> +temp_load(s, ts, tcg_target_available_regs[ts->type],
> +  s->reserved_regs);
> +tcg_out_st(s, ts->type, ts->reg, TCG_REG_CALL_STACK, 
> stack_offset);
>  }
>  #ifndef TCG_TARGET_STACK_GROWSUP
>  stack_offset += sizeof(tcg_target_long);
> @@ -2193,18 +2180,19 @@ static void tcg_reg_alloc_call(TCGContext *s, int 
> nb_oargs, int nb_iargs,
>  ts = >temps[arg];
>  reg = tcg_target_call_iarg_regs[i];
>  tcg_reg_free(s, reg);
> +
>  if (ts->val_type == TEMP_VAL_REG) {
>  if (ts->reg != reg) {
>  tcg_out_mov(s, ts->type, reg, ts->reg);
>  }
> -} else if (ts->val_type == TEMP_VAL_MEM) {
> -tcg_out_ld(s, ts->type, reg, ts->mem_base->reg, 
> ts->mem_offset);
> -} else if (ts->val_type == TEMP_VAL_CONST) {
> -/* XXX: sign extend ? */

And here.

> -tcg_out_movi(s, ts->type, reg, ts->val);
>  } else {
> -tcg_abort();
> +TCGRegSet arg_set;
> +
> +    tcg_regset_clear(arg_set);
> +    tcg_regset_set_reg(arg_set, reg);
> +temp_load(s, ts, arg_set, allocated_regs);
>  }
> +
>  tcg_regset_set_reg(allocated_regs, reg);
>  }
>  }

Otherwise it's a nice cleanup :-).

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 02/14] tcg: Change ts->mem_reg to ts->mem_base

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Chain the temporaries together via pointers intstead of indices.
> The mem_reg value is now mem_base->reg.  This will be important later.
> 
> This does require that the frame pointer have a global temporary
> allocated for it.  This is simple bar the existing reserved_regs check.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 65 
> +++
>  tcg/tcg.h |  4 ++--
>  2 files changed, 38 insertions(+), 31 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 04/14] tcg: More use of TCGReg where appropriate

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 26 +++---
>  tcg/tcg.h |  8 
>  2 files changed, 19 insertions(+), 15 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 03/14] tcg: Tidy temporary allocation

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> In particular, make sure the memory is memset before use.
> Continues the increased use of TCGTemp pointers instead of
> integer indices where appropriate.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 123 
> --
>  1 file changed, 56 insertions(+), 67 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index b1864d3..0a6edfb 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -429,32 +429,45 @@ void tcg_func_start(TCGContext *s)
>  s->be = tcg_malloc(sizeof(TCGBackendData));
>  }
>  
> -static inline void tcg_temp_alloc(TCGContext *s, int n)
> +static inline int temp_idx(TCGContext *s, TCGTemp *ts)
>  {
> -if (n > TCG_MAX_TEMPS)
> -tcg_abort();
> +ptrdiff_t n = ts - s->temps;
> +tcg_debug_assert(n >= 0 && n < s->nb_temps);
> +return n;
> +}
> +
> +static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
> +{
> +int n = s->nb_temps++;
> +tcg_debug_assert(n < TCG_MAX_TEMPS);
> +return memset(>temps[n], 0, sizeof(TCGTemp));
> +}
> +
> +static inline TCGTemp *tcg_global_alloc(TCGContext *s)
> +{
> +tcg_debug_assert(s->nb_globals == s->nb_temps);
> +s->nb_globals++;
> +return tcg_temp_alloc(s);
>  }

This is transforming an abort() which can happen all the time in an
assert which can happen only when TCG debug is enabled. Is it really
something we want? Maybe we should add a tcg_assert() function.

Otherwise it looks fine.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 05/14] tcg: Remove tcg_get_arg_str_i32/64

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 10 --
>  tcg/tcg.h |  5 -
>  2 files changed, 15 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 07/14] tcg: Change temp_dead argument to TCGTemp

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 48 +++-
>  1 file changed, 23 insertions(+), 25 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 08/14] tcg: Change temp_sync argument to TCGTemp

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 55 ---
>  1 file changed, 28 insertions(+), 27 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2 09/14] tcg: Change temp_save argument to TCGTemp

2015-12-31 Thread Aurelien Jarno

On 2015-12-17 12:00, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] tcg/arm: improve direct jump

2015-12-10 Thread Aurelien Jarno

On 2015-12-10 07:31, Richard Henderson wrote:
> On 12/10/2015 12:02 AM, Aurelien Jarno wrote:
> >Note: I don't really get the reason for the current 16MB limit. With the
> >standard branch instructions the offset is coded on 24 bits, but shifted
> >right by 2, which should give us a +/-32MB jumps, and therefore a 32MB
> >limit.
> 
> That might be me with the off-by-one error on the bit counting...

While the way to store the value has been changed a few times recently,
the original value dates from this commit:

commit 1cb0661e009267a5d060c4686f0857784a8da228
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Mon Dec 1 02:10:17 2008 +

arm: Reserve code buffer in memory range reachable for pc-relative branch.

Unfortunately this range is so narrow that I'm not sure if it makes more
sense to always use memory load to pc kind of branch instead.

git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5844 
c046a42c-6fe2-441c-8c8c-71466251a162

This doesn't fully explain the reason why 16MB and not 32MB.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH] tcg/arm: improve direct jump

2015-12-10 Thread Aurelien Jarno

Use ldr pc, [pc, #-4] kind of branch for direct jump. This removes the
need to flush the icache on TB linking, and allow to remove the limit
on the code generation buffer.

Cc: Richard Henderson <r...@twiddle.net>
Cc: TeLeMan <gele...@gmail.com>
Cc: Andrzej Zaborowski <balr...@gmail.com>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 include/exec/exec-all.h | 24 
 tcg/arm/tcg-target.c|  8 +++-
 translate-all.c |  2 --
 3 files changed, 7 insertions(+), 27 deletions(-)

Note: I don't really get the reason for the current 16MB limit. With the
standard branch instructions the offset is coded on 24 bits, but shifted
right by 2, which should give us a +/-32MB jumps, and therefore a 32MB
limit.

Therefore it might be a good idea to benchmark the original QEMU vs a
QEMU with a 32MB buffer vs QEMU with this patch.

If mixing data and instructions like in this patch causes too much
performances trouble, at least on ARMv7 we might want to try with movw +
movt + movpc. It's only 4 bytes longer than the current patch.

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d900b0d..3cd63c9 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -274,26 +274,10 @@ void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, 
uintptr_t addr);
 #elif defined(__arm__)
 static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
 {
-#if !QEMU_GNUC_PREREQ(4, 1)
-register unsigned long _beg __asm ("a1");
-register unsigned long _end __asm ("a2");
-register unsigned long _flg __asm ("a3");
-#endif
-
-/* we could use a ldr pc, [pc, #-4] kind of branch and avoid the flush */
-*(uint32_t *)jmp_addr =
-(*(uint32_t *)jmp_addr & ~0xff)
-| (((addr - (jmp_addr + 8)) >> 2) & 0xff);
-
-#if QEMU_GNUC_PREREQ(4, 1)
-__builtin___clear_cache((char *) jmp_addr, (char *) jmp_addr + 4);
-#else
-/* flush icache */
-_beg = jmp_addr;
-_end = jmp_addr + 4;
-_flg = 0;
-__asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
-#endif
+/* Patch the branch destination. It uses a ldr pc, [pc, #-4] kind
+   of branch so we write absolute address and we don't need to
+   flush icache. */
+*(uint32_t *)jmp_addr = addr;
 }
 #elif defined(__sparc__) || defined(__mips__)
 void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 3edf6a6..f28b9ba 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -986,10 +986,6 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
 tcg_out_st8_12(s, cond, rd, rn, offset);
 }
 
-/* The _goto case is normally between TBs within the same code buffer, and
- * with the code buffer limited to 16MB we wouldn't need the long case.
- * But we also use it for the tail-call to the qemu_ld/st helpers, which does.
- */
 static inline void tcg_out_goto(TCGContext *s, int cond, tcg_insn_unit *addr)
 {
 intptr_t addri = (intptr_t)addr;
@@ -1649,8 +1645,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_goto_tb:
 if (s->tb_jmp_offset) {
 /* Direct jump method */
+tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
 s->tb_jmp_offset[args[0]] = tcg_current_code_size(s);
-tcg_out_b_noaddr(s, COND_AL);
+/* Skip over address */
+s->code_ptr++;
 } else {
 /* Indirect jump method */
 intptr_t ptr = (intptr_t)(s->tb_next + args[0]);
diff --git a/translate-all.c b/translate-all.c
index 042a857..1ca113c 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -472,8 +472,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
 #elif defined(__aarch64__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
-#elif defined(__arm__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (16u * 1024 * 1024)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
 # define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
-- 
2.6.2

Re: [Qemu-devel] tcg: improve MAX_CODE_GEN_BUFFER_SIZE for arm

2015-12-08 Thread Aurelien Jarno

On 2015-12-08 10:43, TeLeMan wrote:
> I know MAX_CODE_GEN_BUFFER_SIZE is limited by the host direct branch
> instructions.But the arm's MAX_CODE_GEN_BUFFER_SIZE is so small.I
> tried improving MAX_CODE_GEN_BUFFER_SIZE.I wrote some check codes for
> the overflow offset in tcg_out_b(), tcg_out_bl(),
> tcg_out_blx_imm(),reloc_pc24(). But I didn't catch any overflow case
> when tb_size and MAX_CODE_GEN_BUFFER_SIZE were larger than 32MB. After
> the generated code size was larger than 32MB, qemu crashed.

Instrumenting all the tcg_out_* branch related functions do not work
here as the address is actually not known at code generation:

case INDEX_op_goto_tb:
if (s->tb_jmp_offset) {
/* Direct jump method */
s->tb_jmp_offset[args[0]] = tcg_current_code_size(s);
tcg_out_b_noaddr(s, COND_AL);

It is patched later during TB linking.

> Any suggest for this issue?

I already posted a patch a long time ago to remove the 16MB limit on ARM
hosts:

http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html

However as you can see in the thread, it has been rejected as it doesn't
not bring improvement in all cases.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] tcg: improve MAX_CODE_GEN_BUFFER_SIZE for arm

2015-12-08 Thread Aurelien Jarno

On 2015-12-08 11:51, Laurent Desnogues wrote:
> Hello,
> 
> On Tue, Dec 8, 2015 at 11:39 AM, Aurelien Jarno <aurel...@aurel32.net> wrote:
> [...]
> > I already posted a patch a long time ago to remove the 16MB limit on ARM
> > hosts:
> >
> > http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html
> >
> > However as you can see in the thread, it has been rejected as it doesn't
> > not bring improvement in all cases.
> 
> We could perhaps resurrect it and do some more benchmarking?  Who
> would be able to do testing on (recent) ARM hardware?

I can provide an updated patch, but I would prefer if someone else does
the benchmarking on a really recent hardware. Not sure the hardware I
have (cortex A7) is really representative of a modern ARM CPU.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH] target-mips: silence NaNs for cvt.s.d and cvt.d.s

2015-12-06 Thread Aurelien Jarno

cvt.s.d and cvt.d.s are FP operations and thus need to convert input
sNaN into corresponding qNaN. Explicitely use the floatXX_maybe_silence_nan
functions for that as the floatXX_to_floatXX functions do not do that.

Cc: Leon Alrae <leon.al...@imgtec.com>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 target-mips/op_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index d2c98c9..20e79be 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -2545,6 +2545,7 @@ uint64_t helper_float_cvtd_s(CPUMIPSState *env, uint32_t 
fst0)
 uint64_t fdt2;
 
 fdt2 = float32_to_float64(fst0, >active_fpu.fp_status);
+fdt2 = float64_maybe_silence_nan(fdt2);
 update_fcr31(env, GETPC());
 return fdt2;
 }
@@ -2634,6 +2635,7 @@ uint32_t helper_float_cvts_d(CPUMIPSState *env, uint64_t 
fdt0)
 uint32_t fst2;
 
 fst2 = float64_to_float32(fdt0, >active_fpu.fp_status);
+fst2 = float32_maybe_silence_nan(fst2);
 update_fcr31(env, GETPC());
 return fst2;
 }
-- 
2.6.2

Re: [Qemu-devel] [PATCH for-2.5] tcg: Increase the highwater reservation

2015-12-04 Thread Aurelien Jarno

On 2015-12-03 13:19, Aurelien Jarno wrote:
> On 2015-12-02 10:36, Richard Henderson wrote:
> > On 12/01/2015 08:32 AM, Aurelien Jarno wrote:
> > >On 2015-12-01 08:19, Richard Henderson wrote:
> > >>If there are a lot of guest memory ops in the TB, the amount of
> > >>code generated by tcg_out_tb_finalize could be well more than 1k.
> > >>In the short term, increase the reservation larger than any TB
> > >>seen in practice.
> > >>
> > >>Reported-by: Aurelien Jarno <aurel...@aurel32.net>
> > >>Signed-off-by: Richard Henderson <r...@twiddle.net>
> > >
> > >Thanks! I was testing the same patch locally after our discussion, and I
> > >confirm it fixes the issue.
> > 
> > Here's the proper patch for 2.6.  If you'd like to run it through your same
> > tests, please?
> 
> The patch looks fine to me, the only tiny nitpick I have is that you can
> return a bool instead of a hint.
> 
> I am currently testing it (which is basically running a glibc build and
> testsuite multiple time in a VM), I'll let you know the result by
> tomorrow. In the meantime, you can add:
> 
> Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

I haven't seen any crash so far, which is more time than it usually
needs to trigger the issue. So I guess we can consider that the patch is
correct. Therefore:

Tested-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH for-2.5] tcg: Increase the highwater reservation

2015-12-03 Thread Aurelien Jarno

On 2015-12-02 10:36, Richard Henderson wrote:
> On 12/01/2015 08:32 AM, Aurelien Jarno wrote:
> >On 2015-12-01 08:19, Richard Henderson wrote:
> >>If there are a lot of guest memory ops in the TB, the amount of
> >>code generated by tcg_out_tb_finalize could be well more than 1k.
> >>In the short term, increase the reservation larger than any TB
> >>seen in practice.
> >>
> >>Reported-by: Aurelien Jarno <aurel...@aurel32.net>
> >>Signed-off-by: Richard Henderson <r...@twiddle.net>
> >
> >Thanks! I was testing the same patch locally after our discussion, and I
> >confirm it fixes the issue.
> 
> Here's the proper patch for 2.6.  If you'd like to run it through your same
> tests, please?

The patch looks fine to me, the only tiny nitpick I have is that you can
return a bool instead of a hint.

I am currently testing it (which is basically running a glibc build and
testsuite multiple time in a VM), I'll let you know the result by
tomorrow. In the meantime, you can add:

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH for-2.5] tcg: Increase the highwater reservation

2015-12-01 Thread Aurelien Jarno

On 2015-12-01 08:19, Richard Henderson wrote:
> If there are a lot of guest memory ops in the TB, the amount of
> code generated by tcg_out_tb_finalize could be well more than 1k.
> In the short term, increase the reservation larger than any TB
> seen in practice.
> 
> Reported-by: Aurelien Jarno <aurel...@aurel32.net>
> Signed-off-by: Richard Henderson <r...@twiddle.net>

Thanks! I was testing the same patch locally after our discussion, and I
confirm it fixes the issue.

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>
Tested-by: Aurelien Jarno <aurel...@aurel32.net>


-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH for-2.5] tcg: Increase the highwater reservation

2015-12-01 Thread Aurelien Jarno

On 2015-12-01 16:28, Peter Maydell wrote:
> On 1 December 2015 at 16:19, Richard Henderson <r...@twiddle.net> wrote:
> > If there are a lot of guest memory ops in the TB, the amount of
> > code generated by tcg_out_tb_finalize could be well more than 1k.
> > In the short term, increase the reservation larger than any TB
> > seen in practice.
> >
> > Reported-by: Aurelien Jarno <aurel...@aurel32.net>
> > Signed-off-by: Richard Henderson <r...@twiddle.net>
> > ---
> >
> > Reported and discussed with Aurelien on IRC yesterday.  This seems
> > to be the easiest fix for the upcoming release.  I will fix this
> > properly (by modifying every backend's finalize routines) for 2.6.
> 
> What would be the result of our hitting this bug? I ask because
> there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> and the debug log (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> suggests we're segfaulting in translation on the TB shortly
> after we (successfully) translate a TB whose final 'out' size
> is 1100 and which has 64 guest writes in it. So I'm wondering
> if that's actually the same bug this is fixing...

I don't think this is the same bug. The problem happens because the slow
path of the softmmu load/store access is written at the end of the TB.
In user mode, there is no slow path, so nothing is written at the end.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH for-2.5] tcg: Increase the highwater reservation

2015-12-01 Thread Aurelien Jarno

On 2015-12-01 17:34, Aurelien Jarno wrote:
> On 2015-12-01 16:28, Peter Maydell wrote:
> > On 1 December 2015 at 16:19, Richard Henderson <r...@twiddle.net> wrote:
> > > If there are a lot of guest memory ops in the TB, the amount of
> > > code generated by tcg_out_tb_finalize could be well more than 1k.
> > > In the short term, increase the reservation larger than any TB
> > > seen in practice.
> > >
> > > Reported-by: Aurelien Jarno <aurel...@aurel32.net>
> > > Signed-off-by: Richard Henderson <r...@twiddle.net>
> > > ---
> > >
> > > Reported and discussed with Aurelien on IRC yesterday.  This seems
> > > to be the easiest fix for the upcoming release.  I will fix this
> > > properly (by modifying every backend's finalize routines) for 2.6.
> > 
> > What would be the result of our hitting this bug? I ask because
> > there's a report on qemu-discuss about a qemu-i386-on-ARM-host
> > bug: http://lists.nongnu.org/archive/html/qemu-discuss/2015-11/msg00042.html
> > and the debug log 
> > (http://www.mediafire.com/download/ge611be9vbebbw7/qemu.log)
> > suggests we're segfaulting in translation on the TB shortly
> > after we (successfully) translate a TB whose final 'out' size
> > is 1100 and which has 64 guest writes in it. So I'm wondering
> > if that's actually the same bug this is fixing...
> 
> I don't think this is the same bug. The problem happens because the slow
> path of the softmmu load/store access is written at the end of the TB.
> In user mode, there is no slow path, so nothing is written at the end.

That said the problem reported is likely fixed by this commit that went
just after it has been reported:

commit 644da9b39e477caa80bab69d2847dfcb468f0d33
Author: John Clarke <jo...@kirriwa.net>
Date:   Thu Nov 19 10:30:50 2015 +0100

tcg: Fix highwater check

A simple typo in the variable to use when comparing vs the highwater mark.
Reports are that qemu can in fact segfault occasionally due to this mistake.

Signed-off-by: John Clarke <jo...@kirriwa.net>
Signed-off-by: Richard Henderson <r...@twiddle.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] hw/mips_malta: Fix KVM PC initialisation

2015-10-15 Thread Aurelien Jarno

On 2015-10-12 17:54, James Hogan wrote:
> Commit 71c199c81d29 ("mips_malta: provide ememsize env variable to
> kernels") changed the meaning of loaderparams.ram_size to be the whole
> of RAM rather than just the low part below where the boot code is placed
> for KVM, but it didn't update the PC initialisation for KVM to use
> ram_low_size. Fix that now.
> 
> Fixes: 71c199c81d29 ("mips_malta: provide ememsize env variable to kernels")
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Cc: Paul Burton <paul.bur...@imgtec.com>
> Cc: Leon Alrae <leon.al...@imgtec.com>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
>  hw/mips/mips_malta.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v2] target-mips: remove wrong checks for recip.fmt and rsqrt.fmt

2015-10-11 Thread Aurelien Jarno

(sorry for the late answer)

On 2015-08-26 14:12, Petar Jovanovic wrote:
> From: Petar Jovanovic <petar.jovano...@imgtec.com>
> 
> Instructions recip.{s|d} and rsqrt.{s|d} do not require 64-bit FPU neither
> they require any particular mode for its FPU. This patch removes the checks
> that may break a program that uses these instructions.

That is correct. That said these instructions do require at least a
MIPS32R2 or a MIPS64R1 CPU. I guess we should add these checks now that
check_cop1x do not guard them anymore.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 5/6] tcg/mips: Support r6 multiply/divide encodings

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 13:24, James Hogan wrote:
> MIPSr6 adds several new integer multiply, divide, and modulo
> instructions, and removes several pre-r6 encodings, along with the HI/LO
> registers which were the implicit operands of some of those
> instructions. Update TCG to use the new instructions when built for r6.
> 
> The new instructions actually map much more directly to the TCG ops, as
> they only provide a single 32-bit half of the result and in a normal
> general purpose register instead of HI or LO.
> 
> The mulu2_i32 and muls2_i32 operations are no longer appropriate for r6,
> so they are removed from the TCG opcode table. This is because they
> would need to emit two separate host instructions anyway (for the high
> and low half of the result), which TCG can arrange automatically for us
> in the absense of mulu2_i32/muls2_i32 by splitting it into mul_i32 and
> mul*h_i32 TCG ops.
> 
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Reviewed-by: Richard Henderson <r...@twiddle.net>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
> Changes in v2:
> - Use a common OPC_MUL definition. use_mips32_instructions will always
>   be 1 for MIPS r6 builds (Richard)
> ---
>  tcg/mips/tcg-target.c | 42 +-
>  tcg/mips/tcg-target.h |  4 ++--
>  2 files changed, 43 insertions(+), 3 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 1/6] tcg-opc.h: Simplify debug_insn_start def

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 13:24, James Hogan wrote:
> We already have a TLADDR_ARGS definition, so rearrange the order
> slightly and use it in the definition of debug_insn_start, instead of
> having an #ifdef.
> 
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Reviewed-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg-opc.h | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

Note that it will conflict with the "tcg: Rename debug_insn_start to
insn_start" patch that Richard has posted on the list, but it should
be trivial to solve it.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 4/6] tcg/mips: Support r6 JR encoding

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 13:24, James Hogan wrote:
> MIPSr6 encodes JR as JALR with zero as the link register, and the pre-r6
> JR encoding is removed. Update TCG to use the new encoding when built
> for r6.
> 
> We still use the old encoding for pre-r6, so as not to confuse return
> prediction stack hardware which may detect only particular encodings of
> the return instruction.
> 
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Reviewed-by: Richard Henderson <r...@twiddle.net>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
> Changes in v2:
> - Turn #define into enum (Richard).
> ---
>  tcg/mips/tcg-target.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 2/6] disas/mips: Add R6 jr/jr.hb to disassembler

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 13:24, James Hogan wrote:
> MIPS r6 encodes jr as jalr zero, and jr.hb as jalr.hb zero, so add these
> encodings to the MIPS disassembly table.
> 
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Reviewed-by: Leon Alrae <leon.al...@imgtec.com>
> Reviewed-by: Richard Henderson <r...@twiddle.net>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
> Changes in v3:
> - Whoops. Fix jr.hb r6 encoding (Leon)
> ---
>  disas/mips.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 6/6] tcg/mips: Support r6 SEL{NE, EQ}Z instead of MOVN/MOVZ

2015-10-08 Thread Aurelien Jarno

On 2015-10-07 12:47, Leon Alrae wrote:
> On 07/10/15 10:46, Richard Henderson wrote:
> > Leon, do you want to take this as a mips maintainer, or shall I as tcg
> > maintainer?
> 
> I thought this would go via Aurelien's mips tcg-backend queue. But if
> Aurelien is busy, could you take them? (at the moment I don't have
> anything handy to test the mips backend).

Sorry I have been indeed a bit busy. I can send a pull request in the
next days. As you prefer.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 6/6] tcg/mips: Support r6 SEL{NE, EQ}Z instead of MOVN/MOVZ

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 13:24, James Hogan wrote:
> Extend MIPS movcond implementation to support the SELNEZ/SELEQZ
> instructions introduced in MIPS r6 (where MOVN/MOVZ have been removed).
> 
> Whereas the "MOVN/MOVZ rd, rs, rt" instructions have the following
> semantics:
>  rd = [!]rt ? rs : rd
> 
> The "SELNEZ/SELEQZ rd, rs, rt" instructions are slightly different:
>  rd = [!]rt ? rs : 0
> 
> First we ensure that if one of the movcond input values is zero that it
> comes last (we can swap the input arguments if we invert the condition).
> This is so that it can exactly match one of the SELNEZ/SELEQZ
> instructions and avoid the need to emit the other one.
> 
> Otherwise we emit the opposite instruction first into a temporary
> register, and OR that into the result:
>  SELNEZ/SELEQZ  TMP1, v2, c1
>  SELEQZ/SELNEZ  ret, v1, c1
>  OR ret, ret, TMP1
> 
> Which does the following:
>  ret = cond ? v1 : v2
> 
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Cc: Richard Henderson <r...@twiddle.net>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
> Changes in v3:
> - Switch to using bool eqz to indicate whether to use SELEQZ / MOVZ
>   instead of SELNEZ / MOVN (Richard).
> - Add tcg_debug_assert(v2 == ret) for pre-r6 case with comment to remind
>   reader that it should be guaranteed via constraints (Richard).
> 
> Changes in v2:
> - Combine with patch 6 from v1, and drop functional changes to movcond
>   implementation pre-r6. We now provide different constraints for
>   movcond depending on presence of r6. (thanks Richard for feedback).
> ---
>  tcg/mips/tcg-target.c | 43 +++++--
>  1 file changed, 37 insertions(+), 6 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 3/6] tcg/mips: Add use_mips32r6_instructions definition

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 13:24, James Hogan wrote:
> Add definition use_mips32r6_instructions to the MIPS TCG backend which
> is constant 1 when built for MIPS release 6. This will be used to decide
> between pre-R6 and R6 instruction encodings.
> 
> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Reviewed-by: Richard Henderson <r...@twiddle.net>
> Cc: Aurelien Jarno <aurel...@aurel32.net>
> ---
>  tcg/mips/tcg-target.h | 7 +++
>  1 file changed, 7 insertions(+)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index f5ba52cacfe5..e579c10b9aaa 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -96,6 +96,13 @@ extern bool use_mips32_instructions;
>  extern bool use_mips32r2_instructions;
>  #endif
>  
> +/* MIPS32R6 instruction set detection */
> +#if defined(__mips_isa_rev) && (__mips_isa_rev >= 6)
> +#define use_mips32r6_instructions  1
> +#else
> +#define use_mips32r6_instructions  0
> +#endif
> +
>  /* optional instructions */
>  #define TCG_TARGET_HAS_div_i32  1
>  #define TCG_TARGET_HAS_rem_i32  1
> -- 
> 2.4.9
> 
> 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] target-mips: Add enum for BREAK32

2015-10-08 Thread Aurelien Jarno

On 2015-10-02 17:50, Yongbok Kim wrote:
> Add enum for BREAK32
> 
> Signed-off-by: Yongbok Kim <yongbok@imgtec.com>
> ---
>  target-mips/translate.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 23/26] tcg: Emit prologue to the beginning of code_gen_buffer

2015-10-01 Thread Aurelien Jarno

On 2015-09-30 15:09, Richard Henderson wrote:
> By putting the prologue at the end, we risk overwriting the
> prologue should our estimate of maximum TB size.  Given the
> two different placements of the call to tcg_prologue_init,
> move the high water mark computation into tcg_prologue_init.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c   | 35 ---
>  translate-all.c | 28 +---
>  2 files changed, 37 insertions(+), 26 deletions(-)

Good idea to move it. I have done some experiments with putting slow
path "helpers" in the prologue, and I ended-up going over the 1024
bytes limits.

> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index d3693b1..5609108 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -363,17 +363,38 @@ void tcg_context_init(TCGContext *s)
>  
>  void tcg_prologue_init(TCGContext *s)
>  {
> -/* init global prologue and epilogue */
> -s->code_buf = s->code_gen_prologue;
> -s->code_ptr = s->code_buf;
> +size_t prologue_size, total_size;
> +void *buf0, *buf1;
> +
> +/* Put the prologue at the beginning of code_gen_buffer.  */
> +buf0 = s->code_gen_buffer;
> +s->code_ptr = buf0;
> +s->code_buf = buf0;
> +s->code_gen_prologue = buf0;
> +
> +/* Generate the prologue.  */
>  tcg_target_qemu_prologue(s);
> -flush_icache_range((uintptr_t)s->code_buf, (uintptr_t)s->code_ptr);
> +buf1 = s->code_ptr;
> +flush_icache_range((uintptr_t)buf0, (uintptr_t)buf1);
> +
> +/* Deduct the prologue from the buffer.  */
> +prologue_size = tcg_current_code_size(s);
> +s->code_gen_ptr = buf1;
> +s->code_gen_buffer = buf1;
> +s->code_buf = buf1;
> +total_size = s->code_gen_buffer_size - prologue_size;
> +s->code_gen_buffer_size = total_size;
> +
> +/* Compute a high-water mark, at which we voluntarily flush the
> +   buffer and start over.  */
> +s->code_gen_buffer_max_size = total_size - TCG_MAX_OP_SIZE * 
> OPC_BUF_SIZE;
> +
> +tcg_register_jit(s->code_gen_buffer, total_size);

I am not sure why you moved this 2 lines there, I think they have more
their place in code_gen_alloc() so that the heuristics stay at the same
place. total_size is available in s->code_gen_buffer_size, so that
should be doable.

>  #ifdef DEBUG_DISAS
>  if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM)) {
> -size_t size = tcg_current_code_size(s);
> -qemu_log("PROLOGUE: [size=%zu]\n", size);
> -log_disas(s->code_buf, size);
> +qemu_log("PROLOGUE: [size=%zu]\n", prologue_size);
> +log_disas(buf0, prologue_size);
>  qemu_log("\n");
>  qemu_log_flush();
>  }
> diff --git a/translate-all.c b/translate-all.c
> index 3454f4e..0e8d176 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -690,23 +690,15 @@ static inline void code_gen_alloc(size_t tb_size)
>  }
>  
>  qemu_madvise(tcg_ctx.code_gen_buffer, tcg_ctx.code_gen_buffer_size,
> -QEMU_MADV_HUGEPAGE);
> -
> -/* Steal room for the prologue at the end of the buffer.  This ensures
> -   (via the MAX_CODE_GEN_BUFFER_SIZE limits above) that direct branches
> -   from TB's to the prologue are going to be in range.  It also means
> -   that we don't need to mark (additional) portions of the data segment
> -   as executable.  */
> -tcg_ctx.code_gen_prologue = tcg_ctx.code_gen_buffer +
> -tcg_ctx.code_gen_buffer_size - 1024;
> -tcg_ctx.code_gen_buffer_size -= 1024;
> -
> -tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size -
> -(TCG_MAX_OP_SIZE * OPC_BUF_SIZE);
> -tcg_ctx.code_gen_max_blocks = tcg_ctx.code_gen_buffer_size /
> -CODE_GEN_AVG_BLOCK_SIZE;
> -tcg_ctx.tb_ctx.tbs =
> -g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
> + QEMU_MADV_HUGEPAGE);
> +
> +/* Estimate a good size for the number of TBs we can support.  We
> +   still haven't deducted the prologue from the buffer size here,
> +   but that's minimal and won't affect the estimate much.  */
> +tcg_ctx.code_gen_max_blocks
> += tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE;
> +tcg_ctx.tb_ctx.tbs = g_new(TranslationBlock, 
> tcg_ctx.code_gen_max_blocks);
> +
>  qemu_mutex_init(_ctx.tb_ctx.tb_lock);
>  }
>  
> @@ -717,8 +709,6 @@ void tcg_exec_init(unsigned long tb_size)
>  {
>  cpu_gen_init();
>  code_gen_alloc(tb_size);
> -tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
> -tcg_register_jit(tcg_ctx.code_gen_buffer, tcg_ctx.code_gen_buffer_size);
>  page_init();
>  #if defined(CONFIG_SOFTMMU)
>  /* There's no guest base to take into account, so go ahead and

Otherwise the patch looks fine to me.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 00/26] Do away with TB retranslation

2015-10-01 Thread Aurelien Jarno

On 2015-09-30 15:09, Richard Henderson wrote:
> Version 4:
>   * Typos noticed by pmm during round 3.
>   * RB's collected during round 3.
>   * Reduce the number of TBs allocated.
> 
> I believe the only patch left without an RB is 24; and of
> course 26 which is new.
> 
> 
> r~
> 
> 
> Richard Henderson (26):
>   tcg: Rename debug_insn_start to insn_start
>   target-*: Unconditionally emit tcg_gen_insn_start
>   target-*: Increment num_insns immediately after tcg_gen_insn_start
>   target-*: Introduce and use cpu_breakpoint_test
>   tcg: Allow extra data to be attached to insn_start
>   target-arm: Add condexec state to insn_start
>   target-i386: Add cc_op state to insn_start
>   target-mips: Add delayed branch state to insn_start
>   target-s390x: Add cc_op state to insn_start
>   target-sh4: Add flags state to insn_start
>   target-cris: Mirror gen_opc_pc into insn_start
>   target-sparc: Tidy gen_branch_a interface
>   target-sparc: Split out gen_branch_n
>   target-sparc: Remove gen_opc_jump_pc
>   target-sparc: Add npc state to insn_start
>   tcg: Merge cpu_gen_code into tb_gen_code
>   target-*: Drop cpu_gen_code define
>   tcg: Add TCG_MAX_INSNS
>   tcg: Pass data argument to restore_state_to_opc
>   tcg: Save insn data and use it in cpu_restore_state_from_tb
>   tcg: Remove gen_intermediate_code_pc
>   tcg: Remove tcg_gen_code_search_pc
>   tcg: Emit prologue to the beginning of code_gen_buffer
>   tcg: Allocate a guard page after code_gen_buffer
>   tcg: Check for overflow via highwater mark
>   tcg: Adjust CODE_GEN_AVG_BLOCK_SIZE
> 
>  include/exec/exec-all.h   |  23 +-
>  include/qom/cpu.h |  16 ++
>  target-alpha/cpu.h|   1 -
>  target-alpha/translate.c  |  70 ++
>  target-arm/cpu.h  |   2 +-
>  target-arm/translate-a64.c|  48 +---
>  target-arm/translate.c|  83 +++
>  target-arm/translate.h|   8 +-
>  target-cris/cpu.h |   1 -
>  target-cris/translate.c   |  93 ++--
>  target-cris/translate_v10.c   |   3 -
>  target-i386/cpu.h |   2 +-
>  target-i386/translate.c   | 106 +++--
>  target-lm32/cpu.h |   1 -
>  target-lm32/translate.c   |  83 ++-
>  target-m68k/cpu.h |   1 -
>  target-m68k/translate.c   |  82 ++-
>  target-microblaze/cpu.h   |   1 -
>  target-microblaze/translate.c |  83 ++-
>  target-mips/cpu.h |   2 +-
>  target-mips/translate.c   |  98 +++-
>  target-moxie/cpu.h|   1 -
>  target-moxie/translate.c  |  82 +++
>  target-openrisc/cpu.h |   1 -
>  target-openrisc/translate.c   |  78 ++-
>  target-ppc/cpu.h  |   1 -
>  target-ppc/translate.c|  72 ++
>  target-s390x/cpu.h|   2 +-
>  target-s390x/translate.c  |  78 ++-
>  target-sh4/cpu.h  |   2 +-
>  target-sh4/translate.c|  91 +++-
>  target-sparc/cpu.h|   2 +-
>  target-sparc/translate.c  | 185 +++
>  target-tilegx/cpu.h   |   1 -
>  target-tilegx/translate.c |  58 ++---
>  target-tricore/translate.c|  59 ++---
>  target-unicore32/translate.c  |  83 ++-
>  target-xtensa/cpu.h   |   1 -
>  target-xtensa/translate.c |  79 ++-
>  tcg/tcg-op.h  |  52 -
>  tcg/tcg-opc.h |   4 +-
>  tcg/tcg.c | 168 --
>  tcg/tcg.h |  20 +-
>  tci.c |   9 -
>  translate-all.c   | 520 
> +-
>  45 files changed, 964 insertions(+), 1492 deletions(-)

I have finally been able to review this whole patchset, sorry for the 
time it took. Thanks a lot for working on that, I think it's really a 
great improvement, both in term of performances, and in code cleanup
(look at the final diff above). It's probably possible to do some more 
code cleaning after it's applied, but that should wait until this
patchset is applied. There is at least the "do not change the jump
target while retranslating" code in a few TCG targets that can be
removed.

As said while reviewing this patchset, it opens some place for more
improvements (like not saving data for instructions which can't trigger
exceptions), but that should come after it has been applied.

On this subject, I believe this patchset is now a good state, except
maybe for the last few patches. I am running an earlier version of it
on plenty of VM without any issue, except on SH4. I realized SH4
broken in some rare cases, but not directly by this patchset. There is 
an issue when a delay slot is split in two parts when reaching the
maximum number of TC

Re: [Qemu-devel] [PATCH v4 24/26] tcg: Allocate a guard page after code_gen_buffer

2015-10-01 Thread Aurelien Jarno

On 2015-09-30 15:09, Richard Henderson wrote:
> This will catch any overflow of the buffer.
> 
> Add a native win32 alternative for alloc_code_gen_buffer;
> remove the malloc alternative.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  translate-all.c | 210 
> 
>  1 file changed, 119 insertions(+), 91 deletions(-)

I havent reviewed the patch in details, but I wonder if that could really
happen? Given the size of the code generation buffer (a few MB at
least), I don't think it's a problem if we don't use it to the last kB,
and thus we could keep some safe margin if needed.

Also what happens if an overflow really happens? In softmmu mode a
segmentation fault will happen. In user-mode I guess the fault will
be forwarded to the guest process, so this will likely wrongly
be interpreted as a bug in the guest code.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 26/26] tcg: Adjust CODE_GEN_AVG_BLOCK_SIZE

2015-10-01 Thread Aurelien Jarno

On 2015-09-30 15:09, Richard Henderson wrote:
> At present, the "average" guestimate of TB size is way too small, leading
> to many unused entries in the pre-allocated TB array.  For a guest with 1GB
> ram, we're currently allocating 256MB for the array.
> 
> Survey arm, alpha, aarch64, ppc, sparc, i686, x86_64 guests running on
> x86_64 and ppc64 hosts and select a new average.  The size of the array
> drops to 81MB with no more flushing than before.
> 
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  include/exec/exec-all.h | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 04/26] target-*: Introduce and use cpu_breakpoint_test

2015-10-01 Thread Aurelien Jarno

On 2015-09-30 15:09, Richard Henderson wrote:
> Reduce the boilerplate required for each target.  At the same time,
> move the test for breakpoint after calling tcg_gen_insn_start.
> 
> Note that arm and aarch64 do not use cpu_breakpoint_test, but still
> move the inline test down after tcg_gen_insn_start.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  include/qom/cpu.h | 16 
>  target-alpha/translate.c  | 13 -
>  target-arm/translate-a64.c| 26 +-
>  target-arm/translate.c| 31 ---
>  target-cris/translate.c   | 27 ---
>  target-i386/translate.c   | 17 +++--
>  target-lm32/translate.c   | 25 +++--
>  target-m68k/translate.c   | 18 ++
>  target-microblaze/translate.c | 36 +---
>  target-mips/translate.c   | 25 ++---
>  target-moxie/translate.c  | 19 +++
>  target-openrisc/translate.c   | 24 +++-
>  target-ppc/translate.c| 14 +-
>  target-s390x/translate.c  | 16 ++--
>  target-sh4/translate.c| 20 
>  target-sparc/translate.c  | 23 ++-
>  target-unicore32/translate.c  | 24 ++--
>  target-xtensa/translate.c | 25 +++------
>  18 files changed, 160 insertions(+), 239 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 25/26] tcg: Check for overflow via highwater mark

2015-10-01 Thread Aurelien Jarno

On 2015-09-30 15:09, Richard Henderson wrote:
> We currently pre-compute an worst case code size for any TB, which
> works out to be 122kB.  Since the average TB size is near 1kB, this
> wastes quite a lot of storage.

The code generation buffer is currently computed as 1/4 of the guest
RAM in softmmu mode (so 32MB for the default 128MB of RAM) or 32MB in
user mode. 122kB is therefore less than 0.4% of waster memory, I am not
therefore sure we need to add so much code just for that.

BTW, I wonder if it is really a good idea to scale the code generation
buffer with the size of the RAM, as the two do not seem that related. It
happens that at some point we don't really increases performances
anymore, and always defining it as 32MB might actually be a good idea.
Personally I am using a patch that limits it to 128MB.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 20/25] tcg: Save insn data and use it in cpu_restore_state_from_tb

2015-09-25 Thread Aurelien Jarno

 +p = encode_sleb128(p, tcg_ctx.gen_insn_end_off[i] - prev);
> +}
> +
> +return p - block;
> +}
> +

Given we save both the host and the guest PC in this structure, one
obvious optimization would be to skip saving data for host instructions
which can not generate exception. It means that all the TCG ops in this
instruction do not generate exceptions either. We can easily test that
for all TCG instructions except all by looking at the
TCG_OPF_SIDE_EFFECTS flag. For the call op, we have to look at the
TCG_CALL_NO_SIDE_EFFECTS flag, even if it doesn't necessary means the
helper might generate exception.

That should significantly save space on load/store architectures. That
said we can probably do that in a latter time.

>  /* The cpu state corresponding to 'searched_pc' is restored.  */
>  static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>   uintptr_t searched_pc)
>  {
> +target_ulong data[TARGET_INSN_START_WORDS] = { tb->pc };
> +uintptr_t host_pc = (uintptr_t)tb->tc_ptr;
>  CPUArchState *env = cpu->env_ptr;
> -TCGContext *s = _ctx;
> -int j;
> -uintptr_t tc_ptr;
> +uint8_t *p = tb->tc_search;
> +int i, j, num_insns = tb->icount;
>  #ifdef CONFIG_PROFILER
> -int64_t ti;
> +int64_t ti = profile_getclock();
>  #endif
>  
> -#ifdef CONFIG_PROFILER
> -ti = profile_getclock();
> -#endif
> -tcg_func_start(s);
> +if (searched_pc < host_pc) {
> +return -1;
> +}
>  
> -gen_intermediate_code_pc(env, tb);
> +/* Reconstruct the stored insn data while looking for the point at
> +   which the end of the insn exceeds the searched_pc.  */
> +for (i = 0; i < num_insns; ++i) {
> +for (j = 0; j < TARGET_INSN_START_WORDS; ++j) {
> +data[j] += decode_sleb128();
> +}
> +host_pc += decode_sleb128();
> +if (host_pc > searched_pc) {
> +goto found;
> +}
> +}
> +return -1;
>  
> + found:
>  if (tb->cflags & CF_USE_ICOUNT) {
>  assert(use_icount);
>  /* Reset the cycle counter to the start of the block.  */
> -cpu->icount_decr.u16.low += tb->icount;
> +cpu->icount_decr.u16.low += num_insns;
>  /* Clear the IO flag.  */
>  cpu->can_do_io = 0;
>  }
> -
> -/* find opc index corresponding to search_pc */
> -tc_ptr = (uintptr_t)tb->tc_ptr;
> -if (searched_pc < tc_ptr)
> -return -1;
> -
> -s->tb_next_offset = tb->tb_next_offset;
> -#ifdef USE_DIRECT_JUMP
> -s->tb_jmp_offset = tb->tb_jmp_offset;
> -s->tb_next = NULL;
> -#else
> -s->tb_jmp_offset = NULL;
> -s->tb_next = tb->tb_next;
> -#endif
> -j = tcg_gen_code_search_pc(s, (tcg_insn_unit *)tc_ptr,
> -   searched_pc - tc_ptr);
> -if (j < 0)
> -return -1;
> -/* now find start of instruction before */
> -while (s->gen_opc_instr_start[j] == 0) {
> -j--;
> -}
> -cpu->icount_decr.u16.low -= s->gen_opc_icount[j];
> -
> -restore_state_to_opc(env, tb, s->gen_opc_data);
> +cpu->icount_decr.u16.low -= i;
> +restore_state_to_opc(env, tb, data);
>  
>  #ifdef CONFIG_PROFILER
> -s->restore_time += profile_getclock() - ti;
> -s->restore_count++;
> +tcg_ctx.restore_time += profile_getclock() - ti;
> +tcg_ctx.restore_count++;
>  #endif
>  return 0;
>  }
> @@ -969,7 +1035,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>  tb_page_addr_t phys_pc, phys_page2;
>  target_ulong virt_page2;
>  tcg_insn_unit *gen_code_buf;
> -int gen_code_size;
> +int gen_code_size, search_size;
>  #ifdef CONFIG_PROFILER
>  int64_t ti;
>  #endif
> @@ -1025,11 +1091,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>  #endif
>  
>  gen_code_size = tcg_gen_code(_ctx, gen_code_buf);
> +search_size = encode_search(tb, (void *)gen_code_buf + gen_code_size);
>  
>  #ifdef CONFIG_PROFILER
>  tcg_ctx.code_time += profile_getclock();
>  tcg_ctx.code_in_len += tb->size;
>  tcg_ctx.code_out_len += gen_code_size;
> +tcg_ctx.search_out_len += search_size;
>  #endif
>  
>  #ifdef DEBUG_DISAS
> @@ -1041,8 +1109,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>  }
>  #endif
>  
> -tcg_ctx.code_gen_ptr = (void *)(((uintptr_t)gen_code_buf +
> -gen_code_size + CODE_GEN_ALIGN - 1) & ~(CODE_GEN_ALIGN - 1));
> +tcg_ctx.code_gen_ptr = (void *)
> +ROUND_UP((uintptr_t)gen_code_buf + gen_code_size + search_size,
> + CODE_GEN_ALIGN);
>  
>  /* check next page if needed */
>  virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;

If you fix the coding style issue I mentioned above, you get:

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 21/25] tcg: Remove gen_intermediate_code_pc

2015-09-25 Thread Aurelien Jarno

On 2015-09-22 13:25, Richard Henderson wrote:
> It is no longer used, so tidy up everything reached by it.
> This includes the gen_opc_* arrays, the search_pc parameter
> and the inline gen_intermediate_code_internal functions.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  include/exec/exec-all.h   |  1 -
>  target-alpha/translate.c  | 41 
>  target-arm/translate-a64.c| 30 +++-
>  target-arm/translate.c| 54 
> ---
>  target-arm/translate.h|  8 ++-
>  target-cris/translate.c   | 50 +--
>  target-i386/translate.c   | 49 ---
>  target-lm32/translate.c   | 42 -
>  target-m68k/translate.c   | 43 --
>  target-microblaze/translate.c | 40 
>  target-mips/translate.c   | 48 --
>  target-moxie/translate.c  | 41 
>  target-openrisc/translate.c   | 42 -
>  target-ppc/translate.c| 40 
>  target-s390x/translate.c  | 44 ---
>  target-sh4/translate.c| 43 --
>  target-sparc/translate.c  | 51 
>  target-tilegx/translate.c | 41 
>  target-tricore/translate.c| 31 -
>  target-unicore32/translate.c  | 44 ---
>  target-xtensa/translate.c | 39 ---
>  tcg/tcg.h     |  4 
>  22 files changed, 90 insertions(+), 736 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 22/25] tcg: Remove tcg_gen_code_search_pc

2015-09-25 Thread Aurelien Jarno

On 2015-09-22 13:25, Richard Henderson wrote:
> It's no longer used, so tidy up everything reached by it.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg.c | 59 +++
>  tcg/tcg.h |  2 --
>  2 files changed, 19 insertions(+), 42 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 19/25] tcg: Pass data argument to restore_state_to_opc

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:25, Richard Henderson wrote:
> The gen_opc_* arrays are already redundant with the data stored in
> the insn_start arguments.  Transition restore_state_to_opc to use
> data from the latter.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  include/exec/exec-all.h   |  2 +-
>  target-alpha/translate.c  |  5 +++--
>  target-arm/translate.c|  9 +
>  target-cris/translate.c   |  5 +++--
>  target-i386/translate.c   | 26 ++
>  target-lm32/translate.c   |  5 +++--
>  target-m68k/translate.c   |  5 +++--
>  target-microblaze/translate.c |  5 +++--
>  target-mips/translate.c   |  9 +
>  target-moxie/translate.c  |  5 +++--
>  target-openrisc/translate.c   |  4 ++--
>  target-ppc/translate.c|  5 +++--
>  target-s390x/translate.c  |  8 
>  target-sh4/translate.c|  7 ---
>  target-sparc/translate.c  | 10 ++
>  target-tilegx/translate.c |  5 +++--
>  target-tricore/translate.c|  5 +++--
>  target-unicore32/translate.c  |  5 +++--
>  target-xtensa/translate.c |  5 +++--
>  tcg/tcg.c | 11 ++-
>  tcg/tcg.h |  2 ++
>  translate-all.c   |  2 +-
>  22 files changed, 79 insertions(+), 66 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 16/25] tcg: Merge cpu_gen_code into tb_gen_code

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:24, Richard Henderson wrote:
> As it's only caller, this tidies things a bit.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  include/exec/exec-all.h |   2 -
>  translate-all.c | 131 
> ++--
>  2 files changed, 59 insertions(+), 74 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 17/25] target-*: Drop cpu_gen_code define

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:24, Richard Henderson wrote:
> This symbol no longer exists.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-alpha/cpu.h  | 1 -
>  target-arm/cpu.h| 1 -
>  target-cris/cpu.h   | 1 -
>  target-i386/cpu.h   | 1 -
>  target-lm32/cpu.h   | 1 -
>  target-m68k/cpu.h   | 1 -
>  target-microblaze/cpu.h | 1 -
>  target-mips/cpu.h   | 1 -
>  target-moxie/cpu.h  | 1 -
>  target-openrisc/cpu.h   | 1 -
>  target-ppc/cpu.h| 1 -
>  target-s390x/cpu.h  | 1 -
>  target-sh4/cpu.h| 1 -
>  target-sparc/cpu.h  | 1 -
>  target-tilegx/cpu.h | 1 -
>  target-xtensa/cpu.h | 1 -
>  16 files changed, 16 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 15/25] target-sparc: Add npc state to insn_start

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:24, Richard Henderson wrote:
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-sparc/cpu.h   | 1 +
>  target-sparc/translate.c | 7 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net> 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 14/25] target-sparc: Remove gen_opc_jump_pc

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:24, Richard Henderson wrote:
> Since jump_pc[1] is always npc + 4, we can infer after incrementing
> that jump_pc[1] == pc + 4.  Because of that, we can encode the branch
> destination into a single word, and store that in npc.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-sparc/translate.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 12/25] target-sparc: Tidy gen_branch_a interface

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:24, Richard Henderson wrote:
> We always pass pc2 == dc->npc and r_cond == cpu_cond,
> and always set is_br afterward.  Infer all of that.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-sparc/translate.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>
-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 13/25] target-sparc: Split out gen_branch_n

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:24, Richard Henderson wrote:
> Unify three copies of this code from different
> branch types.  Fix the case when npc == DYNAMIC_PC,
> i.e. a branch within a delay slot.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-sparc/translate.c | 55 
> 
>  1 file changed, 28 insertions(+), 27 deletions(-)

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v3 18/25] tcg: Add TCG_MAX_INSNS

2015-09-24 Thread Aurelien Jarno

On 2015-09-22 13:25, Richard Henderson wrote:
> Adjust all translators to respect it.
> 
> Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  target-alpha/translate.c  |  3 +++
>  target-arm/translate-a64.c|  3 +++
>  target-arm/translate.c|  6 +-
>  target-cris/translate.c   |  3 +++
>  target-i386/translate.c   |  6 +-
>  target-lm32/translate.c   |  3 +++
>  target-m68k/translate.c   |  6 +-
>  target-microblaze/translate.c |  6 +-
>  target-mips/translate.c   |  7 ++-
>  target-moxie/translate.c  | 13 +++--
>  target-openrisc/translate.c   |  3 +++
>  target-ppc/translate.c|  6 +-
>  target-s390x/translate.c  |  3 +++
>  target-sh4/translate.c|  7 ++-
>  target-sparc/translate.c  |  7 ++-
>  target-tilegx/translate.c |  3 +++
>  target-tricore/translate.c| 20 +---
>  target-unicore32/translate.c  |  3 +++
>  target-xtensa/translate.c |  3 +++
>  tcg/tcg.h |  1 +
>  20 files changed, 95 insertions(+), 17 deletions(-)
> 
> diff --git a/target-alpha/translate.c b/target-alpha/translate.c
> index c10193e..538e202 100644
> --- a/target-alpha/translate.c
> +++ b/target-alpha/translate.c
> @@ -2903,6 +2903,9 @@ static inline void 
> gen_intermediate_code_internal(AlphaCPU *cpu,
>  if (max_insns == 0) {
>  max_insns = CF_COUNT_MASK;
>  }

I guess you can change also change the value to TCG_MAX_INSNS, though I
guess the compiler will realize about that.

> +if (max_insns > TCG_MAX_INSNS) {
> +max_insns = TCG_MAX_INSNS;
> +}
>  
>  if (in_superpage(, pc_start)) {
>  pc_mask = (1ULL << 41) - 1;

Given we have the same pattern in all targets, I do wonder if it
wouldn't be better to just setup (cflags & CF_COUNT_MASK) to
TCG_MAX_INSNS instead of 0 in translate-all.c when not using icount.

That said your code is correct, so:

Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

< 1 2 3 4 5 6 7 8 9 10 >

501 - 600 of 4024 matches

Mail list logo