Re: [Qemu-devel] [PATCH v2 00/20] Emulate guest vector operations with host vector operations

2017-02-01 no-reply
Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v2 00/20] Emulate guest vector operations with host vector operations
Message-id: 1485951502-28774-1-git-send-email-batuz...@ispras.ru

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1485951502-28774-1-git-send-email-batuz...@ispras.ru -> patchew/1485951502-28774-1-git-send-email-batuz...@ispras.ru
Switched to a new branch 'test'
02942b7 tcg/README: update README to include information about vector opcodes
44febfa target/arm: load two consecutive 64-bits vector regs as a 128-bit vector reg
8b2f8a3 tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops
ae2a31c softmmu: create helpers for vector loads
26f10d4 tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes
a5a8f82 tcg: introduce new TCGMemOp - MO_128
e0d0067 tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend
8233c86 tcg/i386: support remaining vector addition operations
9bd34b5 tcg/i386: support 64-bit vector operations
773dc86 tcg/i386: add support for vector opcodes
4896d8b target/arm: use vector opcode to handle vadd. instruction
8117d04 target/arm: support access to vector guest registers as globals
6eb8190 tcg: add vector addition operations
003734a tcg: allow globals to overlap
0b4f31e tcg: use results of alias analysis in liveness analysis
5dc7612 tcg: add simple alias analysis
8b1630f tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes
ec6b284 tcg: support representing vector type with smaller vector or scalar types
2e9c5ae tcg: add support for 64bit vector type
61f938a tcg: add support for 128bit vector type

=== OUTPUT BEGIN ===
Checking PATCH 1/20: tcg: add support for 128bit vector type...
Checking PATCH 2/20: tcg: add support for 64bit vector type...
Checking PATCH 3/20: tcg: support representing vector type with smaller vector or scalar types...
Checking PATCH 4/20: tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes...
Checking PATCH 5/20: tcg: add simple alias analysis...
ERROR: spaces required around that ':' (ctx:VxE)
#81: FILE: tcg/optimize.c:1472:
+CASE_OP_32_64(movi):
^

ERROR: spaces required around that ':' (ctx:VxE)
#85: FILE: tcg/optimize.c:1476:
+CASE_OP_32_64(mov):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#90: FILE: tcg/optimize.c:1481:
+CASE_OP_32_64(add):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#91: FILE: tcg/optimize.c:1482:
+CASE_OP_32_64(sub):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#101: FILE: tcg/optimize.c:1492:
+CASE_OP_32_64(ld8s):
^

ERROR: spaces required around that ':' (ctx:VxE)
#102: FILE: tcg/optimize.c:1493:
+CASE_OP_32_64(ld8u):
^

ERROR: spaces required around that ':' (ctx:VxE)
#106: FILE: tcg/optimize.c:1497:
+CASE_OP_32_64(ld16s):
 ^

ERROR: spaces required around that ':' (ctx:VxE)
#107: FILE: tcg/optimize.c:1498:
+CASE_OP_32_64(ld16u):
 ^

ERROR: spaces required around that ':' (ctx:VxE)
#125: FILE: tcg/optimize.c:1516:
+CASE_OP_32_64(st8):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#129: FILE: tcg/optimize.c:1520:
+CASE_OP_32_64(st16):
^

total: 10 errors, 0 warnings, 196 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 6/20: tcg: use results of alias analysis in liveness analysis...
Checking PATCH 7/20: tcg: allow globals to overlap...
Checking PATCH 8/20: tcg: add vector addition operations...
Checking PATCH 9/20: target/arm: support access to vector guest registers as globals...
ERROR: that open brace { should be on the previous line
#38: FILE: target/arm/translate.c:82:
+static const char *regnames_q[] =
+{ "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7",

ERROR: that open brace { should be on the previous line
#42: FILE: target/arm/translate.c:86:
+static const char *regnames_d[] =
+{ "d0", "

[Qemu-devel] [PATCH v2 00/20] Emulate guest vector operations with host vector operations

2017-02-01 Kirill Batuzov
The goal of this patch series is to set up an infrastructure to emulate
guest vector operations using host vector operations. Preliminary
experiments show that simply translating loads and stores increases the
performance of the x264 video codec by 10%. The performance of a
GCC-vectorized for loop doubled.

To be able to emulate guest vector operations using host vector operations,
several things need to be done.

1. Corresponding vector types should be added to TCG. This series adds
TCG_v128 and TCG_v64. I've made TCG_v64 a different type from TCG_i64
because it usually needs to be allocated to different registers and
supports different operations.

2. Load/store operations for these new types need to be implemented.

3. For a seamless transition from the current model to the new one, we need
to handle cases where the memory occupied by a global variable can also be
accessed via a pointer to the CPUArchState structure. A very simple,
conservative alias analysis has been added for this. It tracks memory loads
and stores that overlap with fields of CPUArchState and passes this
information to the register allocator, which then spills and reloads the
affected globals when needed.

4. Allow overlapping globals. For scalar registers this is a rare case, and
overlapping registers can be handled as a single one (ah, al, ax, eax,
rax). On ARM, however, every Q-register consists of two D-registers, each
consisting of two S-registers. Handling 4 S-registers as one just because
they are parts of the same Q-register is far too inefficient.

5. Add a new memory addressing mode to the MMU code for large accesses and
create the needed helpers. Only 128-bit vectors are handled for now.

6. Create TCG opcodes for vector operations. Only addition has been handled
in this series. Each operation has a wrapper that checks whether the backend
supports the corresponding operation. If it does, the vector opcode is
generated; otherwise the operation is emulated with scalar operations. The
emulation code is generated inline for performance reasons (there is a huge
performance difference between inline generation and calling a helper). As
a positive side effect, this will eventually allow similar emulation code
for vector instructions from different frontends to be merged into a
target-independent implementation.

7. Use the new operations in the frontend (ARM was used in this series).

8. Support the new operations in the backend (x86_64 was used in this series).
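Points 3 and 4 both come down to asking whether two byte ranges inside CPUArchState overlap. The following is a minimal sketch under stated assumptions, not the code from this series: `MemAccess`, `access_may_alias`, `RegKind` and the flat register layout (Q[n] = D[2n]:D[2n+1], D[n] = S[2n]:S[2n+1]) are hypothetical, and the real CPUARMState layout may differ. "Conservative" is modeled here as treating any access whose address is not a known constant offset from env as potentially aliasing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Half-open byte ranges [off, off + size) inside CPUArchState. */
static bool ranges_overlap(size_t off1, size_t size1,
                           size_t off2, size_t size2)
{
    return off1 < off2 + size2 && off2 < off1 + size1;
}

/* Point 3: hypothetical summary of a host load/store seen by the analysis. */
typedef struct {
    bool base_is_env;   /* address is env + a known constant offset */
    size_t offset;      /* that constant offset (valid if base_is_env) */
    size_t size;        /* access size in bytes */
} MemAccess;

/* Conservative: unless the access is provably disjoint from the global's
 * field, assume that it may alias it. */
static bool access_may_alias(size_t g_off, size_t g_size, const MemAccess *a)
{
    if (!a->base_is_env) {
        return true;    /* unknown pointer: it could point into env */
    }
    return ranges_overlap(g_off, g_size, a->offset, a->size);
}

/* Point 4: hypothetical flat register file where Q[n] = D[2n]:D[2n+1] and
 * D[n] = S[2n]:S[2n+1]. Each enum value is the register size in bytes. */
typedef enum { REG_S = 4, REG_D = 8, REG_Q = 16 } RegKind;

static bool regs_overlap(RegKind k1, unsigned i1, RegKind k2, unsigned i2)
{
    return ranges_overlap((size_t)k1 * i1, (size_t)k1,
                          (size_t)k2 * i2, (size_t)k2);
}

/* Number of registers of kind `other` (indices 0..n_other-1) that share
 * bytes with register `idx` of kind `k`. */
static unsigned count_overlapping(RegKind k, unsigned idx,
                                  RegKind other, unsigned n_other)
{
    unsigned n = 0;
    for (unsigned j = 0; j < n_other; j++) {
        if (regs_overlap(k, idx, other, j)) {
            n++;
        }
    }
    return n;
}
```

In this model a write to Q1 shares bytes with D2, D3 and S4 to S7, so a per-global overlap check lets the allocator keep the S-registers as separate globals instead of merging all of them into one.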

For experiments I have used an ARM guest on an x86_64 host. I wanted a pair
of different architectures that both have vector extensions, and the ARM
and x86_64 pair fits well.
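Point 6 above describes a wrapper that emits a vector opcode when the backend supports it and inline scalar code otherwise. A minimal sketch of that dispatch, assuming a hypothetical capability flag and a toy op buffer in place of TCG's real code generator: the names `OpKind`, `OpBuf`, `gen_vec_add8_v128` and `add8_in_u64` are illustrative only. `add8_in_u64` shows one well-known way (a SWAR trick, chosen here for illustration and not necessarily what the series generates) for a scalar op to add eight packed 8-bit lanes without carries leaking between lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ops a generator could emit for a 128-bit add of 8-bit lanes. */
typedef enum {
    OP_VEC_ADD8_V128,   /* one host vector instruction */
    OP_ADD8_I64_LO,     /* inline scalar emulation, low 64 bits */
    OP_ADD8_I64_HI      /* inline scalar emulation, high 64 bits */
} OpKind;

typedef struct {
    OpKind ops[4];
    int n;
} OpBuf;

static void emit(OpBuf *buf, OpKind k)
{
    assert(buf->n < 4);
    buf->ops[buf->n++] = k;
}

/* Hypothetical wrapper: use the vector opcode when the backend supports it,
 * otherwise emit inline scalar code (no helper call). */
static void gen_vec_add8_v128(OpBuf *buf, bool backend_has_vec_add8)
{
    if (backend_has_vec_add8) {
        emit(buf, OP_VEC_ADD8_V128);
    } else {
        emit(buf, OP_ADD8_I64_LO);
        emit(buf, OP_ADD8_I64_HI);
    }
}

/* What each scalar fallback op computes: eight independent 8-bit additions
 * packed in one 64-bit integer, with no carry crossing a lane boundary. */
static uint64_t add8_in_u64(uint64_t a, uint64_t b)
{
    const uint64_t h = 0x8080808080808080ULL;  /* top bit of every lane */
    uint64_t sum = (a & ~h) + (b & ~h);        /* low 7 bits; carries stop at bit 7 */
    return sum ^ ((a ^ b) & h);                /* add the top bits modulo 2 */
}
```

Masking off the top bit of each lane before the 64-bit add is what keeps every lane's carry inside the lane, so the two scalar halves reproduce the wrap-around behavior of a per-lane vector add.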

v1 -> v2:
 - represent v128 type with smaller types when it is not supported by the host
 - detect AVX support and use AVX instructions when available
 - tcg/README updated
 - generate two v64 adds instead of one v128 when applicable
 - rebased to newer master
 - overlap detection for temps added (it needs to be explicitly called from
   _translate_init)
 - the stack is used to temporarily store 128-bit variables in memory
   (instead of the TCGContext field)
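Two of the v1 -> v2 changes (representing v128 with smaller types, and generating two v64 adds instead of one v128) rely on the fact that no element of 64 bits or less straddles the middle of a 128-bit vector, so a lane-wise v128 add is exactly two independent v64 adds. The sketch below demonstrates this with 32-bit lanes; `V64`, `V128` and both functions are hypothetical stand-ins for illustration, not TCG types or ops.

```c
#include <stdint.h>

typedef struct { uint32_t lane[2]; } V64;   /* hypothetical: two 32-bit lanes */
typedef struct { V64 lo, hi; } V128;        /* v128 represented as two v64 halves */

/* Lane-wise add; each lane wraps modulo 2^32 like a SIMD add. */
static V64 vec_add32_v64(V64 a, V64 b)
{
    V64 r;
    for (int i = 0; i < 2; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* A v128 add split into two independent v64 adds: no lane crosses the
 * 64-bit boundary, so nothing has to propagate from lo to hi. */
static V128 vec_add32_v128(V128 a, V128 b)
{
    V128 r;
    r.lo = vec_add32_v64(a.lo, b.lo);
    r.hi = vec_add32_v64(a.hi, b.hi);
    return r;
}
```

The same split would not be valid for a single 128-bit integer add, where a carry out of the low half must reach the high half.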

Outstanding issues:
 - qemu_ld_v128 and qemu_st_v128 do not generate fallback code if the host
   does not support 128-bit registers. The reason is that I do not know how
   to handle differing host/guest endianness (do we swap only the bytes
   within each element, or the whole vector?). Different targets seem to
   have different ideas on how this should be done.
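The endianness question can be illustrated with the two candidate interpretations applied to a 16-byte vector of 32-bit elements. This sketch, with hypothetical helpers `bswap_elements32` and `bswap_whole128`, only shows that the two choices give different byte layouts; which one is correct for a given target is exactly the open question.

```c
#include <stdint.h>

/* Interpretation A: byte-swap each 32-bit element in place. */
static void bswap_elements32(uint8_t v[16])
{
    for (int e = 0; e < 16; e += 4) {
        uint8_t t0 = v[e], t1 = v[e + 1];
        v[e] = v[e + 3];
        v[e + 1] = v[e + 2];
        v[e + 2] = t1;
        v[e + 3] = t0;
    }
}

/* Interpretation B: reverse all 16 bytes of the vector. */
static void bswap_whole128(uint8_t v[16])
{
    for (int i = 0; i < 8; i++) {
        uint8_t t = v[i];
        v[i] = v[15 - i];
        v[15 - i] = t;
    }
}
```

Reversing the whole vector is the same as swapping the bytes within each element and additionally reversing the order of the elements, so the two interpretations agree on element values but disagree on which element ends up in which lane.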

Kirill Batuzov (20):
  tcg: add support for 128bit vector type
  tcg: add support for 64bit vector type
  tcg: support representing vector type with smaller vector or scalar types
  tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes
  tcg: add simple alias analysis
  tcg: use results of alias analysis in liveness analysis
  tcg: allow globals to overlap
  tcg: add vector addition operations
  target/arm: support access to vector guest registers as globals
  target/arm: use vector opcode to handle vadd. instruction
  tcg/i386: add support for vector opcodes
  tcg/i386: support 64-bit vector operations
  tcg/i386: support remaining vector addition operations
  tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend
  tcg: introduce new TCGMemOp - MO_128
  tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes
  softmmu: create helpers for vector loads
  tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops
  target/arm: load two consecutive 64-bits vector regs as a 128-bit vector reg
  tcg/README: update README to include information about vector opcodes

 cputlb.c                     |   4 +
 softmmu_template_vector.h    | 266 +++
 target/arm/translate.c       |  74 -
 tcg/README                   |  47 +-
 tcg/aarch64/tcg-target.inc.c |   4 +-
 tcg/arm/tcg-target.inc.c     |   4 +-
 tcg/i386/tcg-target.h        |  45 +-
 tcg/i386/tcg-target.inc.c    | 260 +--
 tcg/mips/tcg-target.inc.c    |   4 +-
 tcg/optimize.c               | 165