[Qemu-devel] some comments on virtio crypto device specification

2019-04-20 Thread Michael S. Tsirkin
Hi guys!
Some comments about things to improve in the virtio
crypto device part of the spec:
https://news.ycombinator.com/item?id=19698399#19706987

It would be great if someone translated this to
github issues and tried to improve the spec by
addressing them.

Thanks!

-- 
MST



Re: [Qemu-devel] virtfs/9p duplicate inodes

2019-04-20 Thread Christian Schoenebeck via Qemu-devel
On Saturday, 30 March 2019 21:01:28 CEST Christian Schoenebeck wrote:
> On Saturday, 30 March 2019 17:47:51 CET Greg Kurz wrote:
> > Maybe have a look at this tentative to fix QID collisions:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02283.html
[snip]
> Question: so far I just had a look at that patch set, but haven't tried it
> yet. Am I correct that the inode numbers (of the same file) would actually
> change on guest side with every reboot (i.e. depending on the precise
> sequence individual files would be accessed by guest after each reboot)?

I intended to extend Antonios' patch set regarding 9p QID collisions with the
goal of keeping the IDs constant across reboots by storing the qpp_table as an
fs xattr.

My plan was to load the qpp_table in v9fs_device_realize_common() and save the
table only once in v9fs_device_unrealize_common(), instead of storing the
table on every new insertion. The problem, though, is that none of the 9p
unrealize functions are called on guest shutdown.

Is there any callback that is guaranteed to be called on guest shutdown?

Best regards,
Christian Schoenebeck



Re: [Qemu-devel] [PATCH] cputlb: Fix io_readx() to respect the access_type

2019-04-20 Thread Peter Maydell
On Fri, 19 Apr 2019 at 12:46, Shahab Vahedi  wrote:
>
> This change adapts io_readx() to its input access_type. Currently
> io_readx() treats any memory access as a read, although it has an
> input argument "MMUAccessType access_type". This results in:
>
> 1) Calling the tlb_fill() only with MMU_DATA_LOAD
> 2) Considering only entry->addr_read as the tlb_addr
>
> Buglink: https://bugs.launchpad.net/qemu/+bug/1825359
>
> Signed-off-by: Shahab Vahedi 
> ---
> Changelog:
> - Extra space before closing parenthesis is removed
>
>  accel/tcg/cputlb.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Hi; this patch mostly looks good; thanks for submitting it.

> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 88cc8389e9..4a305ac942 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -878,10 +878,13 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>  CPUTLBEntry *entry;
>  target_ulong tlb_addr;
>
> -tlb_fill(cpu, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
> +tlb_fill(cpu, addr, size, access_type, mmu_idx, retaddr);
>
>  entry = tlb_entry(env, mmu_idx, addr);
> -tlb_addr = entry->addr_read;
> +tlb_addr =
> +(access_type == MMU_DATA_LOAD)  ? entry->addr_read  :
> +(access_type == MMU_DATA_STORE) ? entry->addr_write :
> +entry->addr_code;

Here you don't need to handle MMU_DATA_STORE, because
we're in io_readx -- stores will go to io_writex, not here.

Style-wise it's probably better just to use an
  if (...) {
      tlb_addr = ...;
  } else {
      tlb_addr = ...;
  }

rather than a multi-line ?: expression.

>  if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
>  /* RAM access */
>  uintptr_t haddr = addr + entry->addend;
> --
> 2.21.0
>

thanks
-- PMM
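A minimal, self-contained sketch of the `if`/`else` form suggested above. `ToyTLBEntry` and `read_side_tlb_addr()` are hypothetical stand-ins (QEMU's real `CPUTLBEntry` uses `target_ulong` fields and more members); the point is only that the read path needs to tell data loads from instruction fetches, since stores go through `io_writex()`.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins; the enumerator names mirror QEMU's MMUAccessType. */
typedef enum {
    MMU_DATA_LOAD,
    MMU_DATA_STORE,
    MMU_INST_FETCH,
} MMUAccessType;

typedef struct {
    uint64_t addr_read;   /* tag checked for data loads */
    uint64_t addr_write;  /* tag checked for data stores */
    uint64_t addr_code;   /* tag checked for instruction fetches */
} ToyTLBEntry;

/* Pick the TLB tag on the read path: loads or instruction fetches only. */
static uint64_t read_side_tlb_addr(const ToyTLBEntry *entry,
                                   MMUAccessType access_type)
{
    uint64_t tlb_addr;

    /* Stores never reach the read path (they go through io_writex()). */
    assert(access_type != MMU_DATA_STORE);

    if (access_type == MMU_INST_FETCH) {
        tlb_addr = entry->addr_code;
    } else {
        tlb_addr = entry->addr_read;
    }
    return tlb_addr;
}
```

Dropping the store case keeps the selection to a single two-way branch, which is what makes the plain `if`/`else` read more naturally than a chained `?:`.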



[Qemu-devel] [PATCH v2] net/colo-compare.c: Fix a crash in COLO Primary.

2019-04-20 Thread Lukas Straub
From: Lukas Straub 
Because event_unhandled_count may be accessed concurrently, it needs
to be protected by taking the lock. However, the assert is outside the
lock, probably causing it to read garbage and abort QEMU erroneously.

The bug only happens when running QEMU in COLO mode.

This patch fixes the following bug: https://bugs.launchpad.net/qemu/+bug/1824622

Signed-off-by: Lukas Straub 
---
 net/colo-compare.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index bf10526f05..fcb491121b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -813,9 +813,8 @@ static void colo_compare_handle_event(void *opaque)
 break;
 }

-assert(event_unhandled_count > 0);
-
 qemu_mutex_lock(&event_mtx);
+assert(event_unhandled_count > 0);
 event_unhandled_count--;
 qemu_cond_broadcast(&event_complete_cond);
 qemu_mutex_unlock(&event_mtx);
--
2.20.1
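The locking rule this patch restores can be illustrated with a small pthreads model: the counter is both checked and decremented while holding one mutex, so the `assert` can never observe a value another thread is in the middle of changing. The names `event_mtx`, `event_complete_cond` and `event_unhandled_count` only mirror those in `net/colo-compare.c`; the producer/handler structure and the helpers `handle_one_event()`, `handler_thread()` and `post_and_wait()` are a simplified, hypothetical model, not the COLO code path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Names mirror net/colo-compare.c; the structure is a simplified model. */
static pthread_mutex_t event_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t event_complete_cond = PTHREAD_COND_INITIALIZER;
static int event_unhandled_count;

/* Handler side: check and decrement the counter under the same lock. */
static void handle_one_event(void)
{
    pthread_mutex_lock(&event_mtx);
    assert(event_unhandled_count > 0);   /* read only while locked */
    event_unhandled_count--;
    pthread_cond_broadcast(&event_complete_cond);
    pthread_mutex_unlock(&event_mtx);
}

static void *handler_thread(void *arg)
{
    int n = *(const int *)arg;

    for (int i = 0; i < n; i++) {
        handle_one_event();
    }
    return NULL;
}

/* Notifier side: post n events, then wait until all are handled. */
static void post_and_wait(int n)
{
    pthread_t t;

    pthread_mutex_lock(&event_mtx);
    event_unhandled_count = n;
    pthread_mutex_unlock(&event_mtx);

    pthread_create(&t, NULL, handler_thread, &n);

    pthread_mutex_lock(&event_mtx);
    while (event_unhandled_count > 0) {
        pthread_cond_wait(&event_complete_cond, &event_mtx);
    }
    pthread_mutex_unlock(&event_mtx);

    pthread_join(t, NULL);
}
```

Moving the `assert` after `lock` costs nothing extra, because the lock has to be taken for the decrement anyway.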




[Qemu-devel] [PATCH v2 0/3] hw: edu: some fixes

2019-04-20 Thread Li Qiang
I am currently considering writing a driver for the edu device.
After reading the spec, I found these three small issues.
The first two relate to MMIO access and the third
relates to the DMA operation.

Change since v1:
Fix format compile error on Windows

Li Qiang (3):
  edu: mmio: set 'max_access_size' to 8
  edu: mmio: allow mmio read dispatch accept 8 bytes
  edu: uses uint64_t in dma operation

 hw/misc/edu.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

-- 
2.17.1





[Qemu-devel] [PATCH v2 3/3] edu: uses uint64_t in dma operation

2019-04-20 Thread Li Qiang
The DMA-related variables are of type dma_addr_t, which is uint64_t
on the x64 platform. Change these uses from uint32_t to uint64_t to
avoid truncation.

Signed-off-by: Li Qiang 
---
Change since v1:
Fix format compile error on Windows

 hw/misc/edu.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 4018dddcb8..f4a6d5f1c5 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -98,23 +98,24 @@ static void edu_lower_irq(EduState *edu, uint32_t val)
 }
 }
 
-static bool within(uint32_t addr, uint32_t start, uint32_t end)
+static bool within(uint64_t addr, uint64_t start, uint64_t end)
 {
 return start <= addr && addr < end;
 }
 
-static void edu_check_range(uint32_t addr, uint32_t size1, uint32_t start,
+static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start,
 uint32_t size2)
 {
-uint32_t end1 = addr + size1;
-uint32_t end2 = start + size2;
+uint64_t end1 = addr + size1;
+uint64_t end2 = start + size2;
 
 if (within(addr, start, end2) &&
 end1 > addr && within(end1, start, end2)) {
 return;
 }
 
-hw_error("EDU: DMA range 0x%.8x-0x%.8x out of bounds (0x%.8x-0x%.8x)!",
+hw_error("EDU: DMA range 0x%016"PRIx64"-0x%016"PRIx64
+ " out of bounds (0x%016"PRIx64"-0x%016"PRIx64")!",
 addr, end1 - 1, start, end2 - 1);
 }
 
@@ -139,13 +140,13 @@ static void edu_dma_timer(void *opaque)
 }
 
 if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
-uint32_t dst = edu->dma.dst;
+uint64_t dst = edu->dma.dst;
 edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
 dst -= DMA_START;
 pci_dma_read(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
 edu->dma_buf + dst, edu->dma.cnt);
 } else {
-uint32_t src = edu->dma.src;
+uint64_t src = edu->dma.src;
 edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
 src -= DMA_START;
 pci_dma_write(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
-- 
2.17.1
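A standalone model of why the variable width matters. `DMA_START` (0x40000) and `DMA_SIZE` (4096) are assumed to mirror edu.c; `range_ok64()` and `range_ok_truncated()` are hypothetical helpers that return a bool instead of calling `hw_error()`, and they keep the same bounds test as `edu_check_range()`. A guest-programmed address above 4 GiB is rejected in 64 bits but silently accepted once its upper half is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the edu DMA buffer window. */
#define DMA_START 0x40000u
#define DMA_SIZE  4096u

static bool within(uint64_t addr, uint64_t start, uint64_t end)
{
    return start <= addr && addr < end;
}

/* Same bounds test as edu_check_range(), done in 64 bits. */
static bool range_ok64(uint64_t addr, uint64_t size)
{
    uint64_t end1 = addr + size;
    uint64_t end2 = (uint64_t)DMA_START + DMA_SIZE;

    return within(addr, DMA_START, end2) &&
           end1 > addr && within(end1, DMA_START, end2);
}

/* Pre-patch behaviour: the 64-bit DMA address is first stored in a
 * uint32_t, silently dropping its upper 32 bits. */
static bool range_ok_truncated(uint64_t dma_addr, uint64_t size)
{
    uint32_t addr = (uint32_t)dma_addr;
    uint32_t end1 = addr + (uint32_t)size;
    uint32_t end2 = DMA_START + DMA_SIZE;

    return within(addr, DMA_START, end2) &&
           end1 > addr && within(end1, DMA_START, end2);
}
```

For example, `0x100040000` truncates to `0x40000`, which lies inside the window, so the 32-bit check passes an address that is actually far outside it.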





[Qemu-devel] [PATCH v2 1/3] edu: mmio: set 'max_access_size' to 8

2019-04-20 Thread Li Qiang
The edu spec says the MMIO area can be accessed with 8-byte accesses.
However, 'max_access_size' is currently not set, so the MMIO access
dispatch can only access 4 bytes at a time. This patch fixes this to
respect the spec.

Note: setting 'min_access_size' here is not strictly required; I set it
for completeness.

Signed-off-by: Li Qiang 
---
 hw/misc/edu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 91af452c9e..65fc32b928 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -289,6 +289,15 @@ static const MemoryRegionOps edu_mmio_ops = {
 .read = edu_mmio_read,
 .write = edu_mmio_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+
 };
 
 /*
-- 
2.17.1
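As a rough model of what `.impl.max_access_size` controls: assuming the memory core splits a guest access wider than the `impl` limit into several callback invocations and reassembles the pieces little-endian, an 8-byte read reaches the device hook as two 4-byte reads when the limit is 4. `toy_read()` and `toy_dispatch_read()` are hypothetical; QEMU's real dispatch also handles endianness conversion and the `.valid` checks, which this toy skips.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: two 64-bit registers (hypothetical contents). */
static const uint64_t toy_regs[2] = { 0x1122334455667788ULL, 0 };
static unsigned toy_callbacks;  /* how many times the device hook ran */

/* Toy device read hook: returns `size` bytes starting at byte offset
 * `addr` (addr is assumed naturally aligned and inside toy_regs). */
static uint64_t toy_read(uint64_t addr, unsigned size)
{
    uint64_t reg = toy_regs[addr / 8];
    unsigned shift = (unsigned)(addr % 8) * 8;
    uint64_t mask = (size == 8) ? UINT64_MAX : ((1ULL << (size * 8)) - 1);

    toy_callbacks++;
    return (reg >> shift) & mask;
}

/* Toy dispatcher: split one guest access into chunks of at most
 * impl_max bytes and reassemble them in little-endian order. */
static uint64_t toy_dispatch_read(uint64_t addr, unsigned size,
                                  unsigned impl_max)
{
    unsigned chunk = size < impl_max ? size : impl_max;
    uint64_t val = 0;

    for (unsigned off = 0; off < size; off += chunk) {
        val |= toy_read(addr + off, chunk) << (off * 8);
    }
    return val;
}
```

With `impl_max` raised to 8 the same guest read arrives as one call, which is what lets the device handle a 64-bit register in a single access.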





[Qemu-devel] [PATCH v2 2/3] edu: mmio: allow mmio read dispatch accept 8 bytes

2019-04-20 Thread Li Qiang
The edu spec says that for addresses >= 0x80, the MMIO area can
be accessed with 8-byte accesses.

Signed-off-by: Li Qiang 
---
 hw/misc/edu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 65fc32b928..4018dddcb8 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -189,6 +189,10 @@ static uint64_t edu_mmio_read(void *opaque, hwaddr addr, unsigned size)
 return val;
 }
 
+if (addr >= 0x80 && size != 4 && size != 8) {
+return val;
+}
+
 switch (addr) {
 case 0x00:
 val = 0x01edu;
-- 
2.17.1
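The access-size rule this patch encodes can be written as a single predicate. `edu_size_ok()` is a hypothetical helper; its `addr < 0x80` branch (4-byte accesses only) is an assumption about the spec and about the check that precedes this hunk in `edu_mmio_read()`, which the diff does not show in full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed edu MMIO rule: only 4-byte accesses below 0x80,
 * 4- or 8-byte accesses at 0x80 and above. */
static bool edu_size_ok(uint64_t addr, unsigned size)
{
    if (addr < 0x80) {
        return size == 4;
    }
    return size == 4 || size == 8;
}
```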





[Qemu-devel] [PATCH v2 2/3] edu: mmio: set 'max_access_size' to 8

2019-04-20 Thread Li Qiang
The edu spec says the MMIO area can be accessed with 8-byte accesses.
However, 'max_access_size' is currently not set, so the MMIO access
dispatch can only access 4 bytes at a time. This patch fixes this to
respect the spec.

Note: setting 'min_access_size' here is not strictly required; I set it
for completeness.

Signed-off-by: Li Qiang 
---
 hw/misc/edu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 91af452c9e..65fc32b928 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -289,6 +289,15 @@ static const MemoryRegionOps edu_mmio_ops = {
 .read = edu_mmio_read,
 .write = edu_mmio_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+
 };
 
 /*
-- 
2.17.1





[Qemu-devel] [PATCH v2 0/3] hw: edu: some fixes

2019-04-20 Thread Li Qiang
I am currently considering writing a driver for the edu device.
After reading the spec, I found these three small issues.
The first two relate to MMIO access and the third
relates to the DMA operation.

Change since v1:
Fix format compile error on Windows

Li Qiang (3):
  tests: fw_cfg: add splash time test case
  edu: mmio: set 'max_access_size' to 8
  edu: mmio: allow mmio read dispatch accept 8 bytes

 hw/misc/edu.c   | 13 +
 tests/fw_cfg-test.c | 19 +++
 2 files changed, 32 insertions(+)

-- 
2.17.1





[Qemu-devel] [PATCH v2 1/3] tests: fw_cfg: add splash time test case

2019-04-20 Thread Li Qiang
Signed-off-by: Li Qiang 
---
 tests/fw_cfg-test.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tests/fw_cfg-test.c b/tests/fw_cfg-test.c
index 9f75dbb5f4..de8e81ea9d 100644
--- a/tests/fw_cfg-test.c
+++ b/tests/fw_cfg-test.c
@@ -192,6 +192,24 @@ static void test_fw_cfg_reboot_timeout(void)
 qtest_quit(s);
 }
 
+static void test_fw_cfg_splash_time(void)
+{
+QFWCFG *fw_cfg;
+QTestState *s;
+uint16_t splash_time = 0;
+size_t filesize;
+
+s = qtest_init("-boot splash-time=12");
+fw_cfg = pc_fw_cfg_init(s);
+
+filesize = qfw_cfg_get_file(fw_cfg, "etc/boot-menu-wait",
+ &splash_time, sizeof(splash_time));
+g_assert_cmpint(filesize, ==, sizeof(splash_time));
+g_assert_cmpint(splash_time, ==, 12);
+pc_fw_cfg_uninit(fw_cfg);
+qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
 int ret;
@@ -214,6 +232,7 @@ int main(int argc, char **argv)
 qtest_add_func("fw_cfg/numa", test_fw_cfg_numa);
 qtest_add_func("fw_cfg/boot_menu", test_fw_cfg_boot_menu);
 qtest_add_func("fw_cfg/reboot_timeout", test_fw_cfg_reboot_timeout);
+qtest_add_func("fw_cfg/splash_time", test_fw_cfg_splash_time);
 
 ret = g_test_run();
 
-- 
2.17.1
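The test reads `etc/boot-menu-wait` straight into a `uint16_t` and compares it with 12, which holds only when the host byte order matches the order the file was written in. Assuming the fw_cfg file contents are little-endian (an assumption here, not something the patch states), a host-independent decode could look like the sketch below. `load_le16()` and `load_le32()` are hypothetical helpers; inside QEMU the existing `le16_to_cpu()`/`le32_to_cpu()` helpers would be the natural choice.

```c
#include <assert.h>
#include <stdint.h>

/* Decode little-endian integers byte by byte, independent of host order. */
static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```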





[Qemu-devel] [PATCH v2 3/3] edu: mmio: allow mmio read dispatch accept 8 bytes

2019-04-20 Thread Li Qiang
The edu spec says that for addresses >= 0x80, the MMIO area can
be accessed with 8-byte accesses.

Signed-off-by: Li Qiang 
---
Change since v1:
Fix format compile error on Windows

 hw/misc/edu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 65fc32b928..4018dddcb8 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -189,6 +189,10 @@ static uint64_t edu_mmio_read(void *opaque, hwaddr addr, unsigned size)
 return val;
 }
 
+if (addr >= 0x80 && size != 4 && size != 8) {
+return val;
+}
+
 switch (addr) {
 case 0x00:
 val = 0x01edu;
-- 
2.17.1





Re: [Qemu-devel] [PATCH 0/3] hw: edu: some fixes

2019-04-20 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190420145120.122847-1-liq...@163.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  SIGNoptionrom/multiboot.bin
  LINKqemu-edid.exe
/tmp/qemu-test/src/hw/misc/edu.c: In function 'edu_check_range':
/tmp/qemu-test/src/hw/misc/edu.c:117:36: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
 hw_error("EDU: DMA range 0x%.8lx-0x%.8lx out of bounds (0x%.8lx-0x%.8lx)!",
^
%.8llx
 addr, end1 - 1, start, end2 - 1);
 
/tmp/qemu-test/src/hw/misc/edu.c:117:44: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
 hw_error("EDU: DMA range 0x%.8lx-0x%.8lx out of bounds (0x%.8lx-0x%.8lx)!",
^
%.8llx
 addr, end1 - 1, start, end2 - 1);
     
/tmp/qemu-test/src/hw/misc/edu.c:117:67: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
 hw_error("EDU: DMA range 0x%.8lx-0x%.8lx out of bounds (0x%.8lx-0x%.8lx)!",
   ^
   %.8llx
 addr, end1 - 1, start, end2 - 1);
 ~  
/tmp/qemu-test/src/hw/misc/edu.c:117:75: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'uint64_t' {aka 'long long unsigned int'} [-Werror=format=]
 hw_error("EDU: DMA range 0x%.8lx-0x%.8lx out of bounds (0x%.8lx-0x%.8lx)!",
   ^
   %.8llx


The full log is available at
http://patchew.org/logs/20190420145120.122847-1-liq...@163.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
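These failures come from `%lx` expecting `unsigned long`: on 64-bit Windows `long` is 32 bits, so `uint64_t` is `unsigned long long` there, while on 64-bit Linux it is `unsigned long`. The `PRIx64` macro from `<inttypes.h>`, which the v2 patch switches to, expands to the correct length modifier on each platform. `format_dma_range()` is a hypothetical helper that formats into a buffer instead of calling `hw_error()`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a DMA range portably: PRIx64 supplies the right length
 * modifier for uint64_t on both LP64 (Linux) and LLP64 (Windows). */
static int format_dma_range(char *buf, size_t len,
                            uint64_t first, uint64_t last)
{
    return snprintf(buf, len, "0x%016" PRIx64 "-0x%016" PRIx64,
                    first, last);
}
```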

[Qemu-devel] [PATCH 3/3] edu: uses uint64_t in dma operation

2019-04-20 Thread Li Qiang
The DMA-related variables are of type dma_addr_t, which is uint64_t
on the x64 platform. Change these uses from uint32_t to uint64_t to
avoid truncation.

Signed-off-by: Li Qiang 
---
 hw/misc/edu.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 4018dddcb8..b93a679adf 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -98,23 +98,23 @@ static void edu_lower_irq(EduState *edu, uint32_t val)
 }
 }
 
-static bool within(uint32_t addr, uint32_t start, uint32_t end)
+static bool within(uint64_t addr, uint64_t start, uint64_t end)
 {
 return start <= addr && addr < end;
 }
 
-static void edu_check_range(uint32_t addr, uint32_t size1, uint32_t start,
+static void edu_check_range(uint64_t addr, uint64_t size1, uint64_t start,
 uint32_t size2)
 {
-uint32_t end1 = addr + size1;
-uint32_t end2 = start + size2;
+uint64_t end1 = addr + size1;
+uint64_t end2 = start + size2;
 
 if (within(addr, start, end2) &&
 end1 > addr && within(end1, start, end2)) {
 return;
 }
 
-hw_error("EDU: DMA range 0x%.8x-0x%.8x out of bounds (0x%.8x-0x%.8x)!",
+hw_error("EDU: DMA range 0x%.8lx-0x%.8lx out of bounds (0x%.8lx-0x%.8lx)!",
 addr, end1 - 1, start, end2 - 1);
 }
 
@@ -139,13 +139,13 @@ static void edu_dma_timer(void *opaque)
 }
 
 if (EDU_DMA_DIR(edu->dma.cmd) == EDU_DMA_FROM_PCI) {
-uint32_t dst = edu->dma.dst;
+uint64_t dst = edu->dma.dst;
 edu_check_range(dst, edu->dma.cnt, DMA_START, DMA_SIZE);
 dst -= DMA_START;
 pci_dma_read(&edu->pdev, edu_clamp_addr(edu, edu->dma.src),
 edu->dma_buf + dst, edu->dma.cnt);
 } else {
-uint32_t src = edu->dma.src;
+uint64_t src = edu->dma.src;
 edu_check_range(src, edu->dma.cnt, DMA_START, DMA_SIZE);
 src -= DMA_START;
 pci_dma_write(&edu->pdev, edu_clamp_addr(edu, edu->dma.dst),
-- 
2.17.1





[Qemu-devel] [PATCH 1/3] edu: mmio: set 'max_access_size' to 8

2019-04-20 Thread Li Qiang
The edu spec says the MMIO area can be accessed with 8-byte accesses.
However, 'max_access_size' is currently not set, so the MMIO access
dispatch can only access 4 bytes at a time. This patch fixes this to
respect the spec.

Note: setting 'min_access_size' here is not strictly required; I set it
for completeness.

Signed-off-by: Li Qiang 
---
 hw/misc/edu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 91af452c9e..65fc32b928 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -289,6 +289,15 @@ static const MemoryRegionOps edu_mmio_ops = {
 .read = edu_mmio_read,
 .write = edu_mmio_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+
 };
 
 /*
-- 
2.17.1





[Qemu-devel] [PATCH 0/3] hw: edu: some fixes

2019-04-20 Thread Li Qiang
I am currently considering writing a driver for the edu device.
After reading the spec, I found these three small issues.
The first two relate to MMIO access and the third
relates to the DMA operation.

Li Qiang (3):
  edu: mmio: set 'max_access_size' to 8
  edu: mmio: allow mmio read dispatch accept 8 bytes
  edu: uses uint64_t in dma operation

 hw/misc/edu.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

-- 
2.17.1





[Qemu-devel] [PATCH 2/3] edu: mmio: allow mmio read dispatch accept 8 bytes

2019-04-20 Thread Li Qiang
The edu spec says that for addresses >= 0x80, the MMIO area can
be accessed with 8-byte accesses.

Signed-off-by: Li Qiang 
---
 hw/misc/edu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index 65fc32b928..4018dddcb8 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -189,6 +189,10 @@ static uint64_t edu_mmio_read(void *opaque, hwaddr addr, unsigned size)
 return val;
 }
 
+if (addr >= 0x80 && size != 4 && size != 8) {
+return val;
+}
+
 switch (addr) {
 case 0x00:
 val = 0x01edu;
-- 
2.17.1





Re: [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov

2019-04-20 Thread Philippe Mathieu-Daudé
On 4/20/19 9:34 AM, Richard Henderson wrote:
> This patch merely changes the interface, aborting on all failures,
> of which there are currently none.
> 
> Reviewed-by: David Gibson 
> Signed-off-by: Richard Henderson 

Reviewed-by: Philippe Mathieu-Daudé 
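
The shape of this interface change can be modelled in a few lines: the backend hook reports success as a `bool` (a no-op move where `ret == arg` still counts as success), and the generic caller aborts if a backend ever reports that it cannot emit the move. `toy_out_mov()` and `toy_mov_or_abort()` are hypothetical; the real hooks take a `TCGContext` and emit host instructions, and the exact failure handling in `tcg/tcg.c` is not shown in the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef int ToyReg;          /* hypothetical register number */
static int toy_emitted;      /* count of "emitted" move instructions */

/* Backend hook: a move to the same register emits nothing but still
 * succeeds; every other move is assumed encodable in this toy. */
static bool toy_out_mov(ToyReg ret, ToyReg arg)
{
    if (ret == arg) {
        return true;
    }
    toy_emitted++;           /* stand-in for writing one host insn */
    return true;
}

/* Generic caller: at this stage any failure is treated as fatal. */
static void toy_mov_or_abort(ToyReg ret, ToyReg arg)
{
    if (!toy_out_mov(ret, arg)) {
        fprintf(stderr, "cannot emit mov r%d, r%d\n", ret, arg);
        abort();
    }
}
```

Returning `bool` now, while every backend still succeeds, lets a later patch teach some backends to refuse a move without touching every caller again.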

> ---
>  tcg/aarch64/tcg-target.inc.c |  5 +++--
>  tcg/arm/tcg-target.inc.c |  7 +--
>  tcg/i386/tcg-target.inc.c|  5 +++--
>  tcg/mips/tcg-target.inc.c|  3 ++-
>  tcg/ppc/tcg-target.inc.c |  3 ++-
>  tcg/riscv/tcg-target.inc.c   |  5 +++--
>  tcg/s390/tcg-target.inc.c|  3 ++-
>  tcg/sparc/tcg-target.inc.c   |  3 ++-
>  tcg/tcg.c| 14 ++
>  tcg/tci/tcg-target.inc.c |  3 ++-
>  10 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 8b93598bce..b2d3f9c0a5 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -938,10 +938,10 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
>  tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>  if (ret == arg) {
> -return;
> +return true;
>  }
>  switch (type) {
>  case TCG_TYPE_I32:
> @@ -970,6 +970,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  default:
>  g_assert_not_reached();
>  }
> +return true;
>  }
>  
>  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 6873b0cf95..34e6652142 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -2275,10 +2275,13 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
>  return false;
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
> TCGReg ret, TCGReg arg)
>  {
> -tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
> +if (ret != arg) {
> +tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
> +}
> +return true;
>  }
>  
>  static inline void tcg_out_movi(TCGContext *s, TCGType type,
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 1fa833840e..817a167767 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -809,12 +809,12 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
>  tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>  int rexw = 0;
>  
>  if (arg == ret) {
> -return;
> +return true;
>  }
>  switch (type) {
>  case TCG_TYPE_I64:
> @@ -852,6 +852,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  default:
>  g_assert_not_reached();
>  }
> +return true;
>  }
>  
>  static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 8a92e916dd..f31ebb43bf 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -558,13 +558,14 @@ static inline void tcg_out_dsra(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
>  tcg_out_opc_sa64(s, OPC_DSRA, OPC_DSRA32, rd, rt, sa);
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
> TCGReg ret, TCGReg arg)
>  {
>  /* Simple reg-reg move, optimising out the 'do nothing' case */
>  if (ret != arg) {
>  tcg_out_opc_reg(s, OPC_OR, ret, arg, TCG_REG_ZERO);
>  }
> +return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type,
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 773690f1d9..ec8e336be8 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -566,12 +566,13 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>  static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
>   TCGReg base, tcg_target_long offset);
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>  tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
>  if (ret != arg) {
>  tcg_out32(s, OR | SAB(arg, ret, arg));
>  }
> +return true;
>  }
>  
>  static inline void tcg_out_rld(TCGContext *s, int op, 

Re: [Qemu-devel] [PATCH for-QEMU-4.1 v5 23/29] hw/arm: Express dependencies of the MSF2 / EMCRAFT_SF2 machine with Kconfig

2019-04-20 Thread Philippe Mathieu-Daudé
On 4/18/19 8:00 PM, Thomas Huth wrote:
> Add Kconfig dependencies for the emcraft-sf2 machine - we also
> distinguish between the machine (CONFIG_EMCRAFT_SF2) and the SoC
> (CONFIG_MSF2) now.

Thanks!

> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  default-configs/arm-softmmu.mak | 3 +--
>  hw/arm/Kconfig  | 8 
>  hw/arm/Makefile.objs| 3 ++-
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index ef7dd7156a..1455d083d8 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -34,9 +34,9 @@ CONFIG_MPS2=y
>  CONFIG_RASPI=y
>  CONFIG_DIGIC=y
>  CONFIG_SABRELITE=y
> +CONFIG_EMCRAFT_SF2=y
>  
>  CONFIG_VGA=y
> -CONFIG_SSI_M25P80=y
>  CONFIG_IMX_FEC=y
>  
>  CONFIG_NRF51_SOC=y
> @@ -49,5 +49,4 @@ CONFIG_PCIE_PORT=y
>  CONFIG_XIO3130=y
>  CONFIG_IOH3420=y
>  CONFIG_I82801B11=y
> -CONFIG_MSF2=y
>  CONFIG_PCI_EXPRESS_DESIGNWARE=y
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 065c7acf1b..f5cd63860d 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -331,9 +331,17 @@ config FSL_IMX6UL
>  config NRF51_SOC
>  bool
>  
> +config EMCRAFT_SF2
> +bool
> +select MSF2
> +select SSI_M25P80
> +
>  config MSF2
>  bool
> +select ARM_V7M
>  select PTIMER
> +select SERIAL
> +select SSI
>  
>  config ZAURUS
>  bool
> diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
> index fadd69882c..eae9f6c442 100644
> --- a/hw/arm/Makefile.objs
> +++ b/hw/arm/Makefile.objs
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ARM_VIRT) += virt.o
>  obj-$(CONFIG_ACPI) += virt-acpi-build.o
>  obj-$(CONFIG_DIGIC) += digic_boards.o
>  obj-$(CONFIG_EXYNOS4) += exynos4_boards.o
> +obj-$(CONFIG_EMCRAFT_SF2) += msf2-som.o
>  obj-$(CONFIG_HIGHBANK) += highbank.o
>  obj-$(CONFIG_INTEGRATOR) += integratorcp.o
>  obj-$(CONFIG_MAINSTONE) += mainstone.o
> @@ -41,7 +42,7 @@ obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o
>  obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
>  obj-$(CONFIG_MPS2) += mps2.o
>  obj-$(CONFIG_MPS2) += mps2-tz.o
> -obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
> +obj-$(CONFIG_MSF2) += msf2-soc.o
>  obj-$(CONFIG_MUSCA) += musca.o
>  obj-$(CONFIG_ARMSSE) += armsse.o
>  obj-$(CONFIG_FSL_IMX7) += fsl-imx7.o mcimx7d-sabre.o
> 



Re: [Qemu-devel] [PATCH] configure: Change capstone's default state to disabled

2019-04-20 Thread Thomas Huth
On 19/04/2019 15.44, G 3 wrote:
> 
> On Apr 19, 2019, at 3:10 AM, Thomas Huth wrote:
> 
>> On 19/04/2019 00.47, John Arbuckle wrote:
>>> Capstone is not necessary in order to use QEMU. Disable it by default.
>>> This will save the user the pain of having to figure out why QEMU isn't
>>> building when this library is missing.
>>>
>>> Signed-off-by: John Arbuckle 
>>> ---
>>>  configure | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/configure b/configure
>>> index 1c563a7027..77d7967f92 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -433,7 +433,7 @@ opengl_dmabuf="no"
>>>  cpuid_h="no"
>>>  avx2_opt=""
>>>  zlib="yes"
>>> -capstone=""
>>> +capstone="no"
>>>  lzo=""
>>>  snappy=""
>>>  bzip2=""
>>
>> AFAIK we ship capstone as a submodule, so how can this be missing? Also,
>> our philosophy is to keep everything enabled by default if possible, so
>> that the code paths don't bitrot. Thus I don't think that disabling this
>> by default is a good idea. ... so if you've got a problem here, there
>> must be another solution (e.g. is the system capstone detection not
>> working right on your system?).
>>
>>  Thomas
> 
> Thank you for replying. Capstone comes with QEMU? Every time I try to
> compile QEMU I see an error relating to Capstone not being on my system.
> Why do you feel that disabling Capstone by default is not a good idea?
> 
> Here is the error message I see when compiling QEMU:
> 
> CHK version_gen.h
> make[1]: *** No rule to make target
> `/Users/John/qemu-git/capstone/libcapstone.a'.  Stop.
> make: *** [subdir-capstone] Error 2

I assume you're using a git checkout here, right? For git checkouts, the
Makefile should take care of calling the scripts/git-submodule.sh script
which should initialize the submodule in the capstone directory.

What's the content of your .git-submodule-status file? What does
"configure" say about capstone support on your system?

 Thomas



Re: [Qemu-devel] [PATCH for-QEMU-4.1 v5 02/29] hw/ide/ahci: Add a Kconfig switch for the AHDI-ICH9 device

2019-04-20 Thread Philippe Mathieu-Daudé
On 4/18/19 8:00 PM, Thomas Huth wrote:
> Some of our machines (like the ARM cubieboard) use CONFIG_AHCI for an AHCI
> sysbus device, but do not use CONFIG_PCI since they do not feature a PCI
> bus. With CONFIG_AHCI but without CONFIG_PCI, currently linking fails:
> 
> ../hw/ide/ich.o: In function `pci_ich9_ahci_realize':
> hw/ide/ich.c:124: undefined reference to `pci_allocate_irq'
> hw/ide/ich.c:126: undefined reference to `pci_register_bar'
> hw/ide/ich.c:128: undefined reference to `pci_register_bar'
> hw/ide/ich.c:131: undefined reference to `pci_add_capability'
> hw/ide/ich.c:147: undefined reference to `msi_init'
> ../hw/ide/ich.o: In function `pci_ich9_uninit':
> hw/ide/ich.c:158: undefined reference to `msi_uninit'
> ../hw/ide/ich.o:(.data.rel+0x50): undefined reference to 
> `vmstate_pci_device'
> 
> We must only compile ich.c if CONFIG_PCI is available, too, so introduce a
> new config switch for this device.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  hw/ide/Kconfig   | 6 +-
>  hw/ide/Makefile.objs | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ide/Kconfig b/hw/ide/Kconfig
> index ab47b6a7a3..5d9106b1ac 100644
> --- a/hw/ide/Kconfig
> +++ b/hw/ide/Kconfig
> @@ -43,10 +43,14 @@ config MICRODRIVE
>  select IDE_QDEV
>  
>  config AHCI
> +bool
> +select IDE_QDEV
> +
> +config AHCI_ICH9
>  bool
>  default y if PCI_DEVICES
>  depends on PCI
> -select IDE_QDEV
> +select AHCI
>  
>  config IDE_SII3112
>  bool
> diff --git a/hw/ide/Makefile.objs b/hw/ide/Makefile.objs
> index a142add90e..faf04e0209 100644
> --- a/hw/ide/Makefile.objs
> +++ b/hw/ide/Makefile.objs
> @@ -9,6 +9,6 @@ common-obj-$(CONFIG_IDE_MMIO) += mmio.o
>  common-obj-$(CONFIG_IDE_VIA) += via.o
>  common-obj-$(CONFIG_MICRODRIVE) += microdrive.o
>  common-obj-$(CONFIG_AHCI) += ahci.o
> -common-obj-$(CONFIG_AHCI) += ich.o
> +common-obj-$(CONFIG_AHCI_ICH9) += ich.o
>  common-obj-$(CONFIG_ALLWINNER_A10) += ahci-allwinner.o
>  common-obj-$(CONFIG_IDE_SII3112) += sii3112.o
> 



Re: [Qemu-devel] [PATCH 2/2] tests: fw_cfg: add reboot_timeout test case

2019-04-20 Thread Li Qiang
On Fri, 19 Apr 2019 at 05:01, Philippe Mathieu-Daudé wrote:

> Hi Li,
>
> On 3/19/19 3:30 AM, Li Qiang wrote:
> > Signed-off-by: Li Qiang 
> > ---
> >  tests/fw_cfg-test.c | 15 ++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/fw_cfg-test.c b/tests/fw_cfg-test.c
> > index 1c5103fe1c..551b51e38f 100644
> > --- a/tests/fw_cfg-test.c
> > +++ b/tests/fw_cfg-test.c
> > @@ -99,6 +99,17 @@ static void test_fw_cfg_boot_menu(void)
> >  g_assert_cmpint(qfw_cfg_get_u16(fw_cfg, FW_CFG_BOOT_MENU), ==, boot_menu);
> >  }
> >
> > +static void test_fw_cfg_reboot_timeout(void)
> > +{
> > +uint32_t reboot_timeout = 0;
> > +size_t filesize;
> > +
> > +filesize = qfw_cfg_get_file(fw_cfg, "etc/boot-fail-wait",
> > + &reboot_timeout, sizeof(reboot_timeout));
> > +g_assert_cmpint(filesize, ==, sizeof(reboot_timeout));
> > +g_assert_cmpint(reboot_timeout, ==, 15);
> > +}
> > +
> >  int main(int argc, char **argv)
> >  {
> >  QTestState *s;
> > @@ -106,7 +117,8 @@ int main(int argc, char **argv)
> >
> >  g_test_init(&argc, &argv, NULL);
> >
> > -s = qtest_init("-uuid 4600cb32-38ec-4b2f-8acb-81c6ea54f2d8");
> > +s = qtest_init("-uuid 4600cb32-38ec-4b2f-8acb-81c6ea54f2d8 "
> > +   "-boot reboot-timeout=15");
>
> This modifies all tests. I'd rather add a specific test with this option.
> Doing so, we can easily modify the timeout and add the <0 and >0x
> cases.
>
> Can you think of a 'splash-time' test (for commit 6912bb0b3d3b1)?
>
>
Hi Philippe,

I have sent out a new revision of the patchset.
Please note that since the new patchset changed a lot (it refactors
fw_cfg_test and adds two test cases), I did not bump the version.

Thanks,
Li Qiang



> Regards,
>
> Phil.
>
> >
> >  fw_cfg = pc_fw_cfg_init(s);
> >
> > @@ -125,6 +137,7 @@ int main(int argc, char **argv)
> >  qtest_add_func("fw_cfg/max_cpus", test_fw_cfg_max_cpus);
> >  qtest_add_func("fw_cfg/numa", test_fw_cfg_numa);
> >  qtest_add_func("fw_cfg/boot_menu", test_fw_cfg_boot_menu);
> > +qtest_add_func("fw_cfg/reboot_timeout", test_fw_cfg_reboot_timeout);
> >
> >  ret = g_test_run();
> >
> >
>


[Qemu-devel] [PATCH 4/4] tests: fw_cfg: add splash time test case

2019-04-20 Thread Li Qiang
Signed-off-by: Li Qiang 
---
 tests/fw_cfg-test.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tests/fw_cfg-test.c b/tests/fw_cfg-test.c
index 9f75dbb5f4..de8e81ea9d 100644
--- a/tests/fw_cfg-test.c
+++ b/tests/fw_cfg-test.c
@@ -192,6 +192,24 @@ static void test_fw_cfg_reboot_timeout(void)
     qtest_quit(s);
 }
 
+static void test_fw_cfg_splash_time(void)
+{
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint16_t splash_time = 0;
+    size_t filesize;
+
+    s = qtest_init("-boot splash-time=12");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    filesize = qfw_cfg_get_file(fw_cfg, "etc/boot-menu-wait",
+                                &splash_time, sizeof(splash_time));
+    g_assert_cmpint(filesize, ==, sizeof(splash_time));
+    g_assert_cmpint(splash_time, ==, 12);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -214,6 +232,7 @@ int main(int argc, char **argv)
     qtest_add_func("fw_cfg/numa", test_fw_cfg_numa);
     qtest_add_func("fw_cfg/boot_menu", test_fw_cfg_boot_menu);
     qtest_add_func("fw_cfg/reboot_timeout", test_fw_cfg_reboot_timeout);
+    qtest_add_func("fw_cfg/splash_time", test_fw_cfg_splash_time);
 
     ret = g_test_run();
 
-- 
2.17.1
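The splash-time test above (like the reboot-timeout test in patch 3/4) compares the raw integer read from the fw_cfg file directly against the expected value. Assuming QEMU stores these payloads (`etc/boot-menu-wait`, `etc/boot-fail-wait`) little-endian, as its fw_cfg setup code appears to do, that raw comparison only holds on little-endian hosts. A host-independent decode could look like the sketch below; `decode_le16()` and `decode_le32()` are hypothetical helpers, not libqos API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of libqos): decode a little-endian
 * fw_cfg file payload byte by byte, so the result is the same on
 * little-endian and big-endian hosts.
 */
static uint16_t decode_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t decode_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Inside the qtest itself, `le16_to_cpu()` and `le32_to_cpu()` from `qemu/bswap.h` (already included by `libqos/fw_cfg.c`) would presumably serve the same purpose.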





Re: [Qemu-devel] [PATCH] cputlb: Fix io_readx() to respect the access_type

2019-04-20 Thread Philippe Mathieu-Daudé
Hi Alex,

On Sat, 20 Apr 2019 at 01:05, Alex Bennée wrote:

>
> Shahab Vahedi  writes:
>
> > This change adapts io_readx() to its input access_type. Currently
> > io_readx() treats any memory access as a read, although it has an
> > input argument "MMUAccessType access_type". This results in:
> >
> > 1) Calling the tlb_fill() only with MMU_DATA_LOAD
> > 2) Considering only entry->addr_read as the tlb_addr
> >
> > Buglink: https://bugs.launchpad.net/qemu/+bug/1825359
>
> This bug talks about the distinction between DATA_LOAD and INST_FETCH
> but...
>
> >
> > Signed-off-by: Shahab Vahedi 
> > ---
> >  accel/tcg/cputlb.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> > index 88cc8389e9..0daac0e806 100644
> > --- a/accel/tcg/cputlb.c
> > +++ b/accel/tcg/cputlb.c
> > @@ -878,10 +878,13 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
> >      CPUTLBEntry *entry;
> >      target_ulong tlb_addr;
> >
> > -    tlb_fill(cpu, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
> > +    tlb_fill(cpu, addr, size, access_type, mmu_idx, retaddr);
> >
> >      entry = tlb_entry(env, mmu_idx, addr);
> > -    tlb_addr = entry->addr_read;
> > +    tlb_addr =
> > +        (access_type == MMU_DATA_LOAD ) ? entry->addr_read  :
> > +        (access_type == MMU_DATA_STORE) ? entry->addr_write :
> > +        entry->addr_code;
>
> ...why do we care here about MMU_DATA_STORE?
>
> We could just assert (access_type == MMU_DATA_LOAD || access_type ==
> MMU_INST_FETCH) and then have:
>

Is asserting the best we can do here?


>   (access_type == MMU_DATA_LOAD ) ? entry->addr_read  : entry->addr_code
>
>
> >      if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
> >          /* RAM access */
> >          uintptr_t haddr = addr + entry->addend;
>
>
> --
> Alex Bennée
>
>
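Alex's suggestion (assert that only loads and fetches reach io_readx(), then pick `addr_read` or `addr_code`) can be modelled outside QEMU. This is a minimal sketch with hypothetical stand-ins: `TLBEntrySketch` mimics only the three address fields of `CPUTLBEntry`, and the enum merely reuses the names of `MMUAccessType`; it is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins that reuse the names from the thread. */
typedef enum {
    MMU_DATA_LOAD,
    MMU_DATA_STORE,
    MMU_INST_FETCH,
} AccessTypeSketch;

typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
} TLBEntrySketch;

/*
 * A read path only services loads and instruction fetches, so a store
 * reaching this point is treated as a programming error.
 */
static uint64_t tlb_addr_for_read(const TLBEntrySketch *entry,
                                  AccessTypeSketch access_type)
{
    assert(access_type == MMU_DATA_LOAD || access_type == MMU_INST_FETCH);
    return access_type == MMU_DATA_LOAD ? entry->addr_read : entry->addr_code;
}
```

The assertion encodes the invariant that a read helper never sees stores; whether an assertion is the best handling is the open question raised in the reply above.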


[Qemu-devel] [PATCH 0/4] fw_cfg_test refactor and add two test cases

2019-04-20 Thread Li Qiang
In the discussion of adding the reboot timeout test case:
https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg03304.html

Philippe suggested that each test should use only the options related
to it. However, we currently use one QTestState for all the test cases.
To implement Philippe's idea, I split the test cases so that each one
has its own QTestState. As this patchset has changed a lot, I didn't
bump the version.


Li Qiang (4):
  tests: refactor fw_cfg_test
  tests: fw_cfg: add a function to get the fw_cfg file
  tests: fw_cfg: add reboot_timeout test case
  tests: fw_cfg: add splash time test case

 tests/fw_cfg-test.c   | 122 +++---
 tests/libqos/fw_cfg.c |  55 +++
 tests/libqos/fw_cfg.h |   9 
 3 files changed, 178 insertions(+), 8 deletions(-)

-- 
2.17.1





[Qemu-devel] [PATCH 2/4] tests: fw_cfg: add a function to get the fw_cfg file

2019-04-20 Thread Li Qiang
This is useful for writing qtests that check fw_cfg file entries.

Signed-off-by: Li Qiang 
---
 tests/libqos/fw_cfg.c | 45 +++
 tests/libqos/fw_cfg.h |  2 ++
 2 files changed, 47 insertions(+)

diff --git a/tests/libqos/fw_cfg.c b/tests/libqos/fw_cfg.c
index c6839c53c8..1f46258f96 100644
--- a/tests/libqos/fw_cfg.c
+++ b/tests/libqos/fw_cfg.c
@@ -16,6 +16,7 @@
 #include "libqos/fw_cfg.h"
 #include "libqtest.h"
 #include "qemu/bswap.h"
+#include "hw/nvram/fw_cfg.h"
 
 void qfw_cfg_select(QFWCFG *fw_cfg, uint16_t key)
 {
@@ -59,6 +60,50 @@ static void mm_fw_cfg_select(QFWCFG *fw_cfg, uint16_t key)
     qtest_writew(fw_cfg->qts, fw_cfg->base, key);
 }
 
+/*
+ * The caller needs to check the return value. When the return value is
+ * nonzero, it means that some bytes have been transferred.
+ *
+ * If the fw_cfg file in question is smaller than the allocated & passed-in
+ * buffer, then the buffer has been populated only in part.
+ *
+ * If the fw_cfg file in question is larger than the passed-in
+ * buffer, then the return value explains how much room would have been
+ * necessary in total. And, while the caller's buffer has been fully
+ * populated, it has received only a starting slice of the fw_cfg file.
+ */
+size_t qfw_cfg_get_file(QFWCFG *fw_cfg, const char *filename,
+                        void *data, size_t buflen)
+{
+    uint32_t count;
+    uint32_t i;
+    unsigned char *filesbuf = NULL;
+    size_t dsize;
+    FWCfgFile *pdir_entry;
+    size_t filesize = 0;
+
+    qfw_cfg_get(fw_cfg, FW_CFG_FILE_DIR, &count, sizeof(count));
+    count = be32_to_cpu(count);
+    dsize = sizeof(uint32_t) + count * sizeof(struct fw_cfg_file);
+    filesbuf = g_malloc(dsize);
+    qfw_cfg_get(fw_cfg, FW_CFG_FILE_DIR, filesbuf, dsize);
+    pdir_entry = (FWCfgFile *)(filesbuf + sizeof(uint32_t));
+    for (i = 0; i < count; ++i, ++pdir_entry) {
+        if (!strcmp(pdir_entry->name, filename)) {
+            uint32_t len = be32_to_cpu(pdir_entry->size);
+            uint16_t sel = be16_to_cpu(pdir_entry->select);
+            filesize = len;
+            if (len > buflen) {
+                len = buflen;
+            }
+            qfw_cfg_get(fw_cfg, sel, data, len);
+            break;
+        }
+    }
+    g_free(filesbuf);
+    return filesize;
+}
+
 static void mm_fw_cfg_read(QFWCFG *fw_cfg, void *data, size_t len)
 {
     uint8_t *ptr = data;
diff --git a/tests/libqos/fw_cfg.h b/tests/libqos/fw_cfg.h
index 60de81e863..13325cc4ff 100644
--- a/tests/libqos/fw_cfg.h
+++ b/tests/libqos/fw_cfg.h
@@ -31,6 +31,8 @@ void qfw_cfg_get(QFWCFG *fw_cfg, uint16_t key, void *data, size_t len);
 uint16_t qfw_cfg_get_u16(QFWCFG *fw_cfg, uint16_t key);
 uint32_t qfw_cfg_get_u32(QFWCFG *fw_cfg, uint16_t key);
 uint64_t qfw_cfg_get_u64(QFWCFG *fw_cfg, uint16_t key);
+size_t qfw_cfg_get_file(QFWCFG *fw_cfg, const char *filename,
+                        void *data, size_t buflen);
 
 QFWCFG *mm_fw_cfg_init(QTestState *qts, uint64_t base);
 void mm_fw_cfg_uninit(QFWCFG *fw_cfg);
-- 
2.17.1
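The lookup that qfw_cfg_get_file() performs, and the truncation contract described in its comment, can be modelled without QEMU. This sketch assumes the file directory layout documented in QEMU's fw_cfg specification: a big-endian 32-bit entry count followed by 64-byte entries (be32 size, be16 select, 16-bit reserved, 56-byte NUL-padded name). `dir_lookup()` and the byte-order helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIR_ENTRY_SIZE 64   /* assumed size of one directory entry */
#define DIR_NAME_LEN   56   /* assumed length of the name field */

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

static uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Look up @filename in the directory blob @dir. On a match, store the
 * selector in *select and return the full file size; *copy_len gets how
 * many bytes a buffer of @buflen would receive (the smaller of the two).
 * Returns 0 and sets *copy_len to 0 if the file is not found.
 */
static size_t dir_lookup(const uint8_t *dir, const char *filename,
                         size_t buflen, uint16_t *select, size_t *copy_len)
{
    uint32_t count = rd_be32(dir);
    const uint8_t *e = dir + 4;

    for (uint32_t i = 0; i < count; i++, e += DIR_ENTRY_SIZE) {
        const char *name = (const char *)(e + 8);
        if (strncmp(name, filename, DIR_NAME_LEN) == 0) {
            size_t size = rd_be32(e);
            *select = rd_be16(e + 4);
            *copy_len = size < buflen ? size : buflen;
            return size;
        }
    }
    *copy_len = 0;
    return 0;
}
```

Bounding the name comparison with `strncmp()` keeps the sketch from reading past a name that fills all 56 bytes without a terminating NUL; the patch itself uses `strcmp()`.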





[Qemu-devel] [PATCH 3/4] tests: fw_cfg: add reboot_timeout test case

2019-04-20 Thread Li Qiang
Signed-off-by: Li Qiang 
---
 tests/fw_cfg-test.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tests/fw_cfg-test.c b/tests/fw_cfg-test.c
index c22503619f..9f75dbb5f4 100644
--- a/tests/fw_cfg-test.c
+++ b/tests/fw_cfg-test.c
@@ -174,6 +174,24 @@ static void test_fw_cfg_boot_menu(void)
     qtest_quit(s);
 }
 
+static void test_fw_cfg_reboot_timeout(void)
+{
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint32_t reboot_timeout = 0;
+    size_t filesize;
+
+    s = qtest_init("-boot reboot-timeout=15");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    filesize = qfw_cfg_get_file(fw_cfg, "etc/boot-fail-wait",
+                                &reboot_timeout, sizeof(reboot_timeout));
+    g_assert_cmpint(filesize, ==, sizeof(reboot_timeout));
+    g_assert_cmpint(reboot_timeout, ==, 15);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
     int ret;
@@ -195,6 +213,7 @@ int main(int argc, char **argv)
     qtest_add_func("fw_cfg/max_cpus", test_fw_cfg_max_cpus);
     qtest_add_func("fw_cfg/numa", test_fw_cfg_numa);
     qtest_add_func("fw_cfg/boot_menu", test_fw_cfg_boot_menu);
+    qtest_add_func("fw_cfg/reboot_timeout", test_fw_cfg_reboot_timeout);
 
     ret = g_test_run();
 
-- 
2.17.1





[Qemu-devel] [PATCH 1/4] tests: refactor fw_cfg_test

2019-04-20 Thread Li Qiang
Currently, fw_cfg_test uses one shared QTestState for all test cases.
This adds every command-line option to every test case, which is
unnecessary. This patch splits the test cases so that each one uses
its own QTestState. It does the following:

1. Get rid of the global 'fw_cfg'; this requires adding an uninit function.

2. Convert every test case to use a separate QTestState.

After this patch, we can add fw_cfg test cases freely without affecting
other test cases.

Signed-off-by: Li Qiang 
---
 tests/fw_cfg-test.c   | 86 ++-
 tests/libqos/fw_cfg.c | 10 +
 tests/libqos/fw_cfg.h |  7 
 3 files changed, 94 insertions(+), 9 deletions(-)

diff --git a/tests/fw_cfg-test.c b/tests/fw_cfg-test.c
index 1c5103fe1c..c22503619f 100644
--- a/tests/fw_cfg-test.c
+++ b/tests/fw_cfg-test.c
@@ -21,62 +21,127 @@ static uint16_t nb_cpus = 1;
 static uint16_t max_cpus = 1;
 static uint64_t nb_nodes = 0;
 static uint16_t boot_menu = 0;
-static QFWCFG *fw_cfg = NULL;
 
 static void test_fw_cfg_signature(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
     char buf[5];
 
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     qfw_cfg_get(fw_cfg, FW_CFG_SIGNATURE, buf, 4);
     buf[4] = 0;
 
     g_assert_cmpstr(buf, ==, "QEMU");
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_id(void)
 {
-    uint32_t id = qfw_cfg_get_u32(fw_cfg, FW_CFG_ID);
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint32_t id;
+
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    id = qfw_cfg_get_u32(fw_cfg, FW_CFG_ID);
     g_assert((id == 1) ||
              (id == 3));
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_uuid(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
+
     uint8_t buf[16];
     static const uint8_t uuid[16] = {
         0x46, 0x00, 0xcb, 0x32, 0x38, 0xec, 0x4b, 0x2f,
         0x8a, 0xcb, 0x81, 0xc6, 0xea, 0x54, 0xf2, 0xd8,
     };
 
+    s = qtest_init("-uuid 4600cb32-38ec-4b2f-8acb-81c6ea54f2d8");
+    fw_cfg = pc_fw_cfg_init(s);
+
     qfw_cfg_get(fw_cfg, FW_CFG_UUID, buf, 16);
     g_assert(memcmp(buf, uuid, sizeof(buf)) == 0);
+
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+
 }
 
 static void test_fw_cfg_ram_size(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
+
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     g_assert_cmpint(qfw_cfg_get_u64(fw_cfg, FW_CFG_RAM_SIZE), ==, ram_size);
+
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_nographic(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
+
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     g_assert_cmpint(qfw_cfg_get_u16(fw_cfg, FW_CFG_NOGRAPHIC), ==, 0);
+
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_nb_cpus(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
+
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     g_assert_cmpint(qfw_cfg_get_u16(fw_cfg, FW_CFG_NB_CPUS), ==, nb_cpus);
+
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_max_cpus(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
+
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     g_assert_cmpint(qfw_cfg_get_u16(fw_cfg, FW_CFG_MAX_CPUS), ==, max_cpus);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_numa(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
     uint64_t *cpu_mask;
     uint64_t *node_mask;
 
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     g_assert_cmpint(qfw_cfg_get_u64(fw_cfg, FW_CFG_NUMA), ==, nb_nodes);
 
     cpu_mask = g_new0(uint64_t, max_cpus);
@@ -92,24 +157,29 @@ static void test_fw_cfg_numa(void)
 
     g_free(node_mask);
     g_free(cpu_mask);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 static void test_fw_cfg_boot_menu(void)
 {
+    QFWCFG *fw_cfg;
+    QTestState *s;
+
+    s = qtest_init("");
+    fw_cfg = pc_fw_cfg_init(s);
+
     g_assert_cmpint(qfw_cfg_get_u16(fw_cfg, FW_CFG_BOOT_MENU), ==, boot_menu);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
 }
 
 int main(int argc, char **argv)
 {
-    QTestState *s;
     int ret;
 
     g_test_init(&argc, &argv, NULL);
 
-    s = qtest_init("-uuid 4600cb32-38ec-4b2f-8acb-81c6ea54f2d8");
-
-    fw_cfg = pc_fw_cfg_init(s);
-
     qtest_add_func("fw_cfg/signature", test_fw_cfg_signature);
     qtest_add_func("fw_cfg/id", test_fw_cfg_id);
     qtest_add_func("fw_cfg/uuid", test_fw_cfg_uuid);
@@ -128,7 +198,5 @@ int main(int argc, char **argv)
 
     ret = g_test_run();
 
-    qtest_quit(s);
-
     return ret;
 }
diff --git a/tests/libqos/fw_cfg.c b/tests/libqos/fw_cfg.c
index d0889d1e22..c6839c53c8 100644
--- a/tests/libqos/fw_cfg.c
+++ b/tests/libqos/fw_cfg.c
@@ -81,6 +81,11 @@ QFWCFG *mm_fw_cfg_init(QTestState *qts, uint64_t base)
     return fw_cfg;
 }
 
+void mm_fw_cfg_uninit(QFWCFG *fw_cfg)
+{
+    g_free(fw_cfg);
+}
+
 static void io_fw_cfg_select(QFWCFG *fw_cfg, uint16_t key)
 {
 

[Qemu-devel] [PATCH] libvhost-user: fix bad vu_log_write

2019-04-20 Thread Li Feng
Dirty memory is logged per page, so the page index must advance by 1
on each iteration, not by VHOST_LOG_PAGE.

Signed-off-by: Li Feng 
---
 contrib/libvhost-user/libvhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index e08d6c7b97..2689de6d1c 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -433,7 +433,7 @@ vu_log_write(VuDev *dev, uint64_t address, uint64_t length)
     page = address / VHOST_LOG_PAGE;
     while (page * VHOST_LOG_PAGE < address + length) {
         vu_log_page(dev->log_table, page);
-        page += VHOST_LOG_PAGE;
+        page += 1;
     }
 
     vu_log_kick(dev);
-- 
2.11.0
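Why the step matters: `page` is a page index, so adding `VHOST_LOG_PAGE` (a byte count) jumps thousands of pages ahead and only the first page of a multi-page write gets logged. The sketch below models the loop with a plain bitmap; `mark_dirty()` and the 4 KiB `LOG_PAGE` value are assumptions for illustration, not libvhost-user code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed logging granularity of 4 KiB. */
#define LOG_PAGE 0x1000u

/*
 * Set one bit per page touched by [address, address + length).
 * @step is the page-index increment: 1 is the fixed behaviour, while
 * LOG_PAGE reproduces the bug. Returns the number of bits set.
 */
static unsigned mark_dirty(uint8_t *bitmap, uint64_t nbits,
                           uint64_t address, uint64_t length, uint64_t step)
{
    unsigned marked = 0;
    uint64_t page = address / LOG_PAGE;

    while (page * LOG_PAGE < address + length) {
        if (page < nbits) {
            bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
            marked++;
        }
        page += step;
    }
    return marked;
}
```

For a 12 KiB write starting at 0x1000, a step of 1 marks pages 1, 2 and 3, whereas a step of `LOG_PAGE` marks only page 1.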




Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements

2019-04-20 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20190420073442.7488-1-richard.hender...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20190420073442.7488-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH 00/38] tcg vector improvements

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20190420073442.7488-1-richard.hender...@linaro.org -> patchew/20190420073442.7488-1-richard.hender...@linaro.org
Switched to a new branch 'test'
eaace97609 tcg/aarch64: Use ORRI and BICI for vector logical operations
a701168d9c tcg/aarch64: Use MVNI for expansion of dupi
508dc19d39 tcg: Expand vector minmax using cmp+cmpsel
6976ef828e tcg: Introduce do_op3_nofail for vector expansion
2d6f5f7050 tcg: Do not recreate INDEX_op_neg_vec unless supported
7425b741da tcg/aarch64: Do not advertise minmax for MO_64
54ed8f1e33 target/arm: Vectorize USHL and SSHL
23ef118db8 target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D}
65cae51501 tcg/aarch64: Support vector comparison select value
28392fb9a2 tcg/i386: Support vector comparison select value
a7efba49a2 tcg: Add support for vector comparison select
2aded2eb78 tcg/aarch64: Support vector absolute value
4537dd40fc tcg/i386: Support vector absolute value
f81e912312 target/xtensa: Use tcg_gen_abs_i32
3c3292af32 target/s390x: Use tcg_gen_abs_i64
e81e04de28 target/ppc: Use tcg_gen_abs_tl
1869b46302 target/cris: Use tcg_gen_abs_tl
74420abd9d target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs
72e6fb61b1 tcg: Add support for vector absolute value
b6885d0400 tcg: Add support for integer absolute value
9c178a385e tcg/i386: Support vector scalar shift opcodes
8a79bb2407 tcg: Add gvec expanders for vector shift by scalar
c611d5ab1d tcg: Specify optional vector requirements with a list
2d22c62f7d tcg: Implement tcg_gen_gvec_3i()
3e06422a97 tcg/aarch64: Support vector variable shift opcodes
1d95e80f8b tcg/i386: Support vector variable shift opcodes
8f6ed1c661 tcg: Add gvec expanders for variable shift
8c573e5a9c tcg: Add INDEX_op_dup_mem_vec
ca9a67767b tcg/aarch64: Implement tcg_out_dupm_vec
24cd3652da tcg/i386: Implement tcg_out_dupm_vec
9a14d7cf98 tcg: Add tcg_out_dupm_vec to the backend interface
504eb1c2ce tcg: Manually expand INDEX_op_dup_vec
71309aeb54 tcg: Promote tcg_out_{dup, dupi}_vec to backend interface
24aee13663 tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
0ccf0219e9 tcg: Support cross-class moves without instruction support
476aacd9e7 tcg: Return bool success from tcg_out_mov
41a8017efa tcg: Assert fixed_reg is read-only
d2482ee256 target/arm: Fill in .opc for cmtst_op

=== OUTPUT BEGIN ===
1/38 Checking commit d2482ee25652 (target/arm: Fill in .opc for cmtst_op)
2/38 Checking commit 41a8017efa97 (tcg: Assert fixed_reg is read-only)
WARNING: Block comments use a leading /* on a separate line
#103: FILE: tcg/tcg.c:3529:
+/* temp value is modified, so the value kept in memory is

WARNING: Block comments use * on subsequent lines
#104: FILE: tcg/tcg.c:3530:
+/* temp value is modified, so the value kept in memory is
+   potentially not the same */

WARNING: Block comments use a trailing */ on a separate line
#104: FILE: tcg/tcg.c:3530:
+   potentially not the same */

total: 0 errors, 3 warnings, 140 lines checked

Patch 2/38 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/38 Checking commit 476aacd9e75c (tcg: Return bool success from tcg_out_mov)
4/38 Checking commit 0ccf0219e9e3 (tcg: Support cross-class moves without instruction support)
WARNING: Block comments use a leading /* on a separate line
#24: FILE: tcg/tcg.c:3372:
+/* Cross register class move not supported.

WARNING: Block comments use * on subsequent lines
#25: FILE: tcg/tcg.c:3373:
+/* Cross register class move not supported.
+   Store the source register into the destination slot

WARNING: Block comments use a trailing */ on a separate line
#26: FILE: tcg/tcg.c:3374:
+   and leave the destination temp as TEMP_VAL_MEM.  */

WARNING: Block comments use a leading /* on a separate line
#44: FILE: tcg/tcg.c:3485:
+/* Cross register class move not supported.  Sync the

WARNING: Block comments use * on subsequent lines
#45: FILE: tcg/tcg.c:3486:
+/* Cross register class move not supported.  Sync the
+   temp back to its slot and load from there.  */

WARNING: Block comments use a trailing */ on a separate line

[Qemu-devel] [PATCH 30/38] tcg/aarch64: Support vector comparison select value

2019-04-20 Thread Richard Henderson
The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination.  We
can represent the operation without overlap and choose based on
the operands seen.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.inc.c | 24 +++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e1135e930a..e030bf3c8f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,7 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_cmpsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   1
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index cf891defd4..84d402acd8 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -525,6 +525,9 @@ typedef enum {
     I3616_ADD   = 0x0e208400,
     I3616_AND   = 0x0e201c00,
     I3616_BIC   = 0x0e601c00,
+    I3616_BIF   = 0x2ee01c00,
+    I3616_BIT   = 0x2ea01c00,
+    I3616_BSL   = 0x2e601c00,
     I3616_EOR   = 0x2e201c00,
     I3616_MUL   = 0x0e209c00,
     I3616_ORR   = 0x0ea01c00,
@@ -2178,7 +2181,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 
 TCGType type = vecl + TCG_TYPE_V64;
 unsigned is_q = vecl;
-TCGArg a0, a1, a2;
+TCGArg a0, a1, a2, a3;
 
 a0 = args[0];
 a1 = args[1];
@@ -2301,6 +2304,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_cmpsel_vec:
+        a3 = args[3];
+        if (a0 == a3) {
+            tcg_out_insn(s, 3616, BIT, is_q, 0, a0, a2, a1);
+        } else if (a0 == a2) {
+            tcg_out_insn(s, 3616, BIF, is_q, 0, a0, a3, a1);
+        } else {
+            if (a0 != a1) {
+                tcg_out_mov(s, type, a0, a1);
+            }
+            tcg_out_insn(s, 3616, BSL, is_q, 0, a0, a2, a3);
+        }
+        break;
+
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
@@ -2323,6 +2340,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_abs_vec:
     case INDEX_op_not_vec:
     case INDEX_op_cmp_vec:
+    case INDEX_op_cmpsel_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
@@ -2405,6 +2423,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "rA", "rZ", "rZ" } };
     static const TCGTargetOpDef add2
         = { .args_ct_str = { "r", "r", "rZ", "rZ", "rA", "rMZ" } };
+    static const TCGTargetOpDef w_w_w_w
+        = { .args_ct_str = { "w", "w", "w", "w" } };
 
     switch (op) {
     case INDEX_op_goto_ptr:
@@ -2577,6 +2597,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_wr;
     case INDEX_op_cmp_vec:
         return &w_w_wZ;
+    case INDEX_op_cmpsel_vec:
+        return &w_w_w_w;
 
     default:
         return NULL;
-- 
2.17.1
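As read from the `INDEX_op_cmpsel_vec` case above, `a1` acts as a bit mask that selects bits of `a2` where it is set and bits of `a3` where it is clear. The sketch below models BSL, BIT and BIF on a 64-bit chunk, following the Arm architecture's definitions of these instructions as I understand them (not QEMU code), and checks that each of the three emitted forms computes the same selection. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* BSL Vd, Vn, Vm: the old destination value d is the selector. */
static uint64_t bsl(uint64_t d, uint64_t n, uint64_t m)
{
    return (d & n) | (~d & m);
}

/* BIT Vd, Vn, Vm: insert bits of n where m is 1, keep d elsewhere. */
static uint64_t bit(uint64_t d, uint64_t n, uint64_t m)
{
    return (d & ~m) | (n & m);
}

/* BIF Vd, Vn, Vm: insert bits of n where m is 0, keep d elsewhere. */
static uint64_t bif(uint64_t d, uint64_t n, uint64_t m)
{
    return (d & m) | (n & ~m);
}

/* Reference selection: a2 where mask a1 is set, a3 where it is clear. */
static uint64_t select_ref(uint64_t a1, uint64_t a2, uint64_t a3)
{
    return (a1 & a2) | (~a1 & a3);
}
```

With `a0 == a3` the patch emits `BIT a0, a2, a1`, i.e. `bit(a3, a2, a1)`; with `a0 == a2` it emits `BIF a0, a3, a1`, i.e. `bif(a2, a3, a1)`; otherwise it copies the mask into `a0` and emits `BSL a0, a2, a3`, i.e. `bsl(a1, a2, a3)`. All three reduce to `select_ref(a1, a2, a3)`, which is why no operand has to overlap the destination in the TCG op itself.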




Re: [Qemu-devel] [PATCH v3] cputlb: Fix io_readx() to respect the access_type

2019-04-20 Thread Richard Henderson
On 4/19/19 9:22 PM, Shahab Vahedi wrote:
> This change adapts io_readx() to its input access_type. Currently
> io_readx() treats any memory access as a read, although it has an
> input argument "MMUAccessType access_type". This results in:
> 
> 1) Calling the tlb_fill() only with MMU_DATA_LOAD
> 2) Considering only entry->addr_read as the tlb_addr
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1825359
> 
> Signed-off-by: Shahab Vahedi 
> ---
> Changelog:
> v3
>   - Only handle read/fetch. There must be no write access.
> 
> v2
>   - Extra space before closing parenthesis is removed
> 
> v1
>   - Initial submit
> 
>  accel/tcg/cputlb.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PATCH 20/38] tcg: Add support for vector absolute value

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h      |  5 +++
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/i386/tcg-target.h        |  1 +
 tcg/tcg-op-gvec.h            |  2 +
 tcg/tcg-opc.h                |  1 +
 tcg/tcg.h                    |  1 +
 accel/tcg/tcg-runtime-gvec.c | 48
 tcg/tcg-op-gvec.c            | 71
 tcg/tcg-op-vec.c             | 31
 tcg/tcg.c                    |  2 +
 tcg/README                   |  4 ++
 11 files changed, 167 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index ed3ce5fd91..6d73dc2d65 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -225,6 +225,11 @@ DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(gvec_abs8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_abs16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_abs32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_abs64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index f5640a229b..21d06d928c 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -132,6 +132,7 @@ typedef enum {
 #define TCG_TARGET_HAS_orc_vec  1
 #define TCG_TARGET_HAS_not_vec  1
 #define TCG_TARGET_HAS_neg_vec  1
+#define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 618aa520d2..7445f05885 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -182,6 +182,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_orc_vec  0
 #define TCG_TARGET_HAS_not_vec  0
 #define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  1
 #define TCG_TARGET_HAS_shv_vec  have_avx2
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index f9c6058e92..46f58febbf 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -228,6 +228,8 @@ void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_abs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4bf71f261f..4a2dd116eb 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -225,6 +225,7 @@ DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
 DEF(mul_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_mul_vec))
 DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+DEF(abs_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_abs_vec))
 DEF(ssadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(usadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(sssub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 48d4d2e03e..986055fdfa 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -176,6 +176,7 @@ typedef uint64_t TCGRegSet;
     && !defined(TCG_TARGET_HAS_v128) \
     && !defined(TCG_TARGET_HAS_v256)
 #define TCG_TARGET_MAYBE_vec    0
+#define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_not_vec  0
 #define TCG_TARGET_HAS_andc_vec 0
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 7b88f5590c..dd08095773 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -398,6 +398,54 @@ void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc)
     clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_abs8)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
+        int8_t aa = *(int8_t *)(a + i);
+        *(int8_t *)(d + i) = aa < 0 ? -aa : aa;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_abs16)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
+        int16_t aa = *(int16_t *)(a + i);
+        *(int16_t *)(d + i) = aa < 0

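The new helpers apply an element-wise absolute value to `oprsz` bytes and then call `clear_high()`, which is assumed here to zero the bytes from `oprsz` up to the maximum vector size. A standalone sketch of the 8-bit case follows; `abs8_model()` is a hypothetical name. It works on `uint8_t` so that abs(-128) is well defined and wraps to 0x80: in the helper above, `-aa` for -128 is computed as the `int` 128, and narrowing that back to `int8_t` is implementation-defined in C (GCC wraps it to -128).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Element-wise 8-bit abs over oprsz bytes, then zero [oprsz, maxsz)
 * as clear_high() is assumed to do. Negative inputs are recognised by
 * their sign bit and negated modulo 256.
 */
static void abs8_model(uint8_t *d, const uint8_t *a,
                       size_t oprsz, size_t maxsz)
{
    for (size_t i = 0; i < oprsz; i++) {
        uint8_t v = a[i];
        d[i] = (v & 0x80) ? (uint8_t)(0u - v) : v;
    }
    if (maxsz > oprsz) {
        memset(d + oprsz, 0, maxsz - oprsz);
    }
}
```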
[Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 11b2c11174..0374718f66 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -1685,18 +1685,11 @@ static int dec_cmp_r(CPUCRISState *env, DisasContext *dc)
 
 static int dec_abs_r(CPUCRISState *env, DisasContext *dc)
 {
-    TCGv t0;
-
     LOG_DIS("abs $r%u, $r%u\n",
             dc->op1, dc->op2);
     cris_cc_mask(dc, CC_MASK_NZ);
 
-    t0 = tcg_temp_new();
-    tcg_gen_sari_tl(t0, cpu_R[dc->op1], 31);
-    tcg_gen_xor_tl(cpu_R[dc->op2], cpu_R[dc->op1], t0);
-    tcg_gen_sub_tl(cpu_R[dc->op2], cpu_R[dc->op2], t0);
-    tcg_temp_free(t0);
-
+    tcg_gen_abs_tl(cpu_R[dc->op2], cpu_R[dc->op1]);
     cris_alu(dc, CC_OP_MOVE,
              cpu_R[dc->op2], cpu_R[dc->op2], cpu_R[dc->op2], 4);
     return 2;
-- 
2.17.1
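The sequence removed from dec_abs_r() is the classic branch-free absolute value: with `t = x >> 31` (all ones for negative `x`, zero otherwise), `(x ^ t) - t` yields |x|. A portable C sketch of the same arithmetic follows; `abs32_branchless()` is a hypothetical name, and it uses unsigned operations so that the sign extraction and the `INT32_MIN` case are well defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Branch-free 32-bit abs mirroring the removed sari/xor/sub sequence.
 * The result is a uint32_t: abs(INT32_MIN) wraps to 0x80000000, as the
 * 32-bit TCG sequence would also produce.
 */
static uint32_t abs32_branchless(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t t = 0u - (ux >> 31);   /* 0xffffffff if negative, else 0 */
    return (ux ^ t) - t;
}
```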




[Qemu-devel] [PATCH 32/38] target/arm: Vectorize USHL and SSHL

2019-04-20 Thread Richard Henderson
These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift.  This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|  15 +-
 target/arm/translate.h |   6 +
 target/arm/neon_helper.c   |  33 -
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c | 288 +++--
 target/arm/vec_helper.c| 176 +++
 6 files changed, 470 insertions(+), 66 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 3d90b5be66..1c0de661fb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -292,14 +292,8 @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -686,6 +680,15 @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 912cc2a4a5..633668fa1b 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -244,6 +244,8 @@ extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -253,6 +255,10 @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index 4259056723..c581ffb7d3 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -615,24 +615,9 @@ NEON_VOP(abd_u32, neon_u32, 1)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    if (shift >= 64 || shift <= -64) {
-        val = 0;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
@@ -645,27 +630,9 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-int8_t shift = (int8_t)shiftop;
-int64_t val = valop;
-if (shift >= 64) {
-val = 0;
-} else if (shift <= -64) {
-val >>= 63;
-} else if (shift < 0) {
-val >>= -shift;
-} else {
-val <<= shift;
-}
-return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fd8921565e..c30f99c7cd 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -8845,9 +8845,9 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 break;
 case 0x8: /* SSHL, USHL */
 

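The 64-bit helpers deleted above, HELPER(neon_shl_u64) and HELPER(neon_shl_s64), spell out the per-element semantics that the new sshl/ushl expansions have to reproduce. Only the low byte of the shift operand counts, and it is signed. A negative count shifts right. A count whose magnitude is 64 or more yields 0, or the sign fill for a signed right shift. The following standalone C sketch reproduces those semantics from the removed code. The names ushl64 and sshl64 are hypothetical, and the arithmetic right shift of a negative value assumes GCC/Clang behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: same logic as the removed HELPER(neon_shl_u64). */
static uint64_t ushl64(uint64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)shiftop;     /* only the signed low byte counts */

    if (shift >= 64 || shift <= -64) {
        return 0;                       /* every bit is shifted out */
    } else if (shift < 0) {
        return val >> -shift;           /* negative count: logical right shift */
    } else {
        return val << shift;
    }
}

/* Hypothetical name: same logic as the removed HELPER(neon_shl_s64). */
static int64_t sshl64(int64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)shiftop;

    if (shift >= 64) {
        return 0;
    } else if (shift <= -64) {
        return val >> 63;               /* only sign bits remain: 0 or -1 */
    } else if (shift < 0) {
        return val >> -shift;           /* arithmetic right shift (GCC/Clang) */
    } else {
        /* Shift the unsigned representation to avoid signed-overflow UB. */
        return (int64_t)((uint64_t)val << shift);
    }
}
```

For example, ushl64(0x80, (uint64_t)-4) returns 0x8, because a count of -4 in the low byte means a logical right shift by 4. ushl64(x, 0x100) returns x unchanged because the count byte is 0.
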
[Qemu-devel] [PATCH 35/38] tcg: Introduce do_op3_nofail for vector expansion

2019-04-20 Thread Richard Henderson
This makes do_op3 match do_op2 in allowing for failure,
and thus for fallback expansions.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-vec.c | 45 +++--
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 7d8f7b490a..5868a51270 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -450,7 +450,7 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
 }
 }
 
-static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
+static bool do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
TCGv_vec b, TCGOpcode opc)
 {
 TCGTemp *rt = tcgv_vec_temp(r);
@@ -468,82 +468,91 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
 can = tcg_can_emit_vec_op(opc, type, vece);
 if (can > 0) {
 vec_gen_3(opc, type, vece, ri, ai, bi);
-} else {
+} else if (can < 0) {
 const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
-tcg_debug_assert(can < 0);
 tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
 tcg_swap_vecop_list(hold_list);
+} else {
+return false;
 }
+return true;
+}
+
+static void do_op3_nofail(unsigned vece, TCGv_vec r, TCGv_vec a,
+  TCGv_vec b, TCGOpcode opc)
+{
+bool ok = do_op3(vece, r, a, b, opc);
+tcg_debug_assert(ok);
 }
 
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_add_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_add_vec);
 }
 
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_sub_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_sub_vec);
 }
 
 void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_mul_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_mul_vec);
 }
 
 void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_ssadd_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_ssadd_vec);
 }
 
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_usadd_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_usadd_vec);
 }
 
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_sssub_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_sssub_vec);
 }
 
 void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_ussub_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_smin_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_umin_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_smax_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_umax_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_shlv_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_shlv_vec);
 }
 
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_shrv_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_shrv_vec);
 }
 
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_sarv_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_sarv_vec);
 }
 
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
@@ -579,7 +588,7 @@ static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
 } else {
 tcg_gen_dup_i32_vec(vece, vec_s, s);
 }
-do_op3(vece, r, a, vec_s, opc_v);
+do_op3_nofail(vece, r, a, vec_s, opc_v);
 tcg_temp_free_vec(vec_s);
 }
 tcg_swap_vecop_list(hold_list);
-- 
2.17.1
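
The tri-state result that do_op3 now propagates can be modelled outside TCG as below. This is a hypothetical sketch, not QEMU code. try_emit and emit_nofail stand in for do_op3 and do_op3_nofail. The integer argument plays the role of the tcg_can_emit_vec_op() return value: positive for a native instruction, negative for a backend expansion, zero for unsupported.

```c
#include <assert.h>
#include <stdbool.h>

enum emit_path { PATH_NATIVE, PATH_EXPAND, PATH_NONE };

/* Fallible worker in the style of do_op3(): report failure to the caller
 * instead of asserting, so that the caller may pick a fallback. */
static bool try_emit(int can, enum emit_path *path)
{
    if (can > 0) {
        *path = PATH_NATIVE;            /* host instruction exists */
    } else if (can < 0) {
        *path = PATH_EXPAND;            /* backend expands into other ops */
    } else {
        *path = PATH_NONE;
        return false;                   /* unsupported: let the caller decide */
    }
    return true;
}

/* Infallible wrapper in the style of do_op3_nofail(): for opcodes that
 * must always be available.  Like tcg_debug_assert, assert() compiles
 * away in non-debug builds (here: when NDEBUG is defined). */
static enum emit_path emit_nofail(int can)
{
    enum emit_path path;
    bool ok = try_emit(can, &path);

    assert(ok);
    (void)ok;                           /* silence unused warning with NDEBUG */
    return path;
}
```

Returning false instead of asserting is what lets a caller choose its own fallback. The do_minmax helper in patch 36 of this series relies on that.
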




[Qemu-devel] [PATCH 33/38] tcg/aarch64: Do not advertise minmax for MO_64

2019-04-20 Thread Richard Henderson
The min/max instructions are not available for 64-bit elements.

Fixes: 93f332a50371
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 84d402acd8..e68e4de08c 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2348,16 +2348,16 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_sssub_vec:
 case INDEX_op_usadd_vec:
 case INDEX_op_ussub_vec:
-case INDEX_op_smax_vec:
-case INDEX_op_smin_vec:
-case INDEX_op_umax_vec:
-case INDEX_op_umin_vec:
 case INDEX_op_shlv_vec:
 return 1;
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
 return -1;
 case INDEX_op_mul_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_umax_vec:
+case INDEX_op_umin_vec:
 return vece < MO_64;
 
 default:
-- 
2.17.1




[Qemu-devel] [PATCH 28/38] tcg: Add support for vector comparison select

2019-04-20 Thread Richard Henderson
At present, only tcg_gen_cmpsel_vec is added, which can be used by
other target-specific vector expanders.  It is not clear whether
a full gvec expander would be worthwhile, given the unspecified
nature of the selector.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  1 +
 tcg/tcg-op.h |  2 ++
 tcg/tcg-opc.h|  1 +
 tcg/tcg.h|  1 +
 tcg/tcg-op-gvec.c|  3 +++
 tcg/tcg-op-vec.c | 37 +
 tcg/tcg.c|  2 ++
 tcg/README   | 12 
 9 files changed, 60 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e43554c3c7..e1135e930a 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,6 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
+#define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 66f16fbe3c..683e029980 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -190,6 +190,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
+#define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
 (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 660fe205d0..6c4cd0aa14 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -999,6 +999,8 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
  TCGv_vec a, TCGv_vec b);
+void tcg_gen_cmpsel_vec(unsigned vece, TCGv_vec r, TCGv_vec s,
+TCGv_vec a, TCGv_vec b);
 
 void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a2dd116eb..05fb9e3f37 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -255,6 +255,7 @@ DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
+DEF(cmpsel_vec, 1, 3, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_cmpsel_vec))
 
 DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 986055fdfa..1abee6cbe5 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -187,6 +187,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_sat_vec  0
 #define TCG_TARGET_HAS_minmax_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   0
 #else
 #define TCG_TARGET_MAYBE_vec1
 #endif
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 87d5a01cc9..e7029d26f4 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -96,6 +96,9 @@ static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
 continue;
 }
 break;
+case INDEX_op_cmpsel_vec:
+/* Fallback expansion uses only required logical ops.  */
+continue;
 default:
 break;
 }
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index b7f21145bb..7d8f7b490a 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -599,3 +599,40 @@ void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
 do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
 }
+
+void tcg_gen_cmpsel_vec(unsigned vece, TCGv_vec r, TCGv_vec s,
+TCGv_vec a, TCGv_vec b)
+{
+TCGTemp *rt = tcgv_vec_temp(r);
+TCGTemp *st = tcgv_vec_temp(s);
+TCGTemp *at = tcgv_vec_temp(a);
+TCGTemp *bt = tcgv_vec_temp(b);
+TCGArg ri = temp_arg(rt);
+TCGArg si = temp_arg(st);
+TCGArg ai = temp_arg(at);
+TCGArg bi = temp_arg(bt);
+TCGType type = rt->base_type;
+const TCGOpcode *hold_list;
+int can;
+
+tcg_debug_assert(st->base_type >= type);
+tcg_debug_assert(at->base_type >= type);
+tcg_debug_assert(bt->base_type >= type);
+tcg_assert_listed_vecop(INDEX_op_cmpsel_vec);
+hold_list = tcg_swap_vecop_list(NULL);
+
+can = tcg_can_emit_vec_op(INDEX_op_cmpsel_vec, type, vece);
+if (can > 0) {
+vec_gen_4(INDEX_op_cmpsel_vec, type, vece, ri, si, ai, bi);
+} else if (can < 0) {
+tcg_expand_vec_op(INDEX_op_cmpsel_vec, type, vece, ri, si, ai, bi);
+} else {
+TCGv_vec t = tcg_temp_new_vec(type);
+
+tcg_gen_and_vec(vece, t, a, s);
+tcg_gen_andc_vec(vece, r, b, s);
+tcg_gen_or_vec(vece, r, r, t);
+tcg_temp_free_vec(t);
+}
+tcg_swap_vecop_list(hold_list);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 86a95a636b..0c68bd5cf5 100644

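When the backend neither implements cmpsel_vec nor expands it (tcg_can_emit_vec_op returns 0), tcg_gen_cmpsel_vec above falls back to and, andc and or, computing r = (a & s) | (b & ~s). Because s normally comes from tcg_gen_cmp_vec, each lane is all ones or all zeros, so the bitwise formula picks whole lanes. Here is a scalar C sketch on a 64-bit value holding packed lanes; bitsel64 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: each result bit comes from a where the selector bit is 1
 * and from b where it is 0.  The comments name the TCG ops of the fallback. */
static uint64_t bitsel64(uint64_t s, uint64_t a, uint64_t b)
{
    uint64_t t = a & s;     /* tcg_gen_and_vec(vece, t, a, s)  */
    uint64_t r = b & ~s;    /* tcg_gen_andc_vec(vece, r, b, s) */
    return r | t;           /* tcg_gen_or_vec(vece, r, r, t)   */
}
```

With s = 0xffff0000ffff0000, the result takes the upper 16 bits of each 32-bit half from a and the lower 16 bits from b.
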
[Qemu-devel] [PATCH 26/38] tcg/i386: Support vector absolute value

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  2 +-
 tcg/i386/tcg-target.inc.c | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7445f05885..66f16fbe3c 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -182,7 +182,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_orc_vec  0
 #define TCG_TARGET_HAS_not_vec  0
 #define TCG_TARGET_HAS_neg_vec  0
-#define TCG_TARGET_HAS_abs_vec  0
+#define TCG_TARGET_HAS_abs_vec  1
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  1
 #define TCG_TARGET_HAS_shv_vec  have_avx2
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 85b68e4326..3dae0bf0c5 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -369,6 +369,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVSLQ (0x63 | P_REXW)
 #define OPC_MOVZBL (0xb6 | P_EXT)
 #define OPC_MOVZWL (0xb7 | P_EXT)
+#define OPC_PABSB   (0x1c | P_EXT38 | P_DATA16)
+#define OPC_PABSW   (0x1d | P_EXT38 | P_DATA16)
+#define OPC_PABSD   (0x1e | P_EXT38 | P_DATA16)
 #define OPC_PACKSSDW(0x6b | P_EXT | P_DATA16)
 #define OPC_PACKSSWB(0x63 | P_EXT | P_DATA16)
 #define OPC_PACKUSDW(0x2b | P_EXT38 | P_DATA16)
@@ -2739,6 +2742,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static int const sars_insn[4] = {
 OPC_UD2, OPC_PSRAW, OPC_PSRAD, OPC_UD2
 };
+static int const abs_insn[4] = {
+/* TODO: AVX512 adds support for MO_64.  */
+OPC_PABSB, OPC_PABSW, OPC_PABSD, OPC_UD2
+};
 
 TCGType type = vecl + TCG_TYPE_V64;
 int insn, sub;
@@ -2827,6 +2834,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 insn = OPC_PUNPCKLDQ;
 goto gen_simd;
 #endif
+case INDEX_op_abs_vec:
+insn = abs_insn[vece];
+a2 = a1;
+a1 = 0;
+goto gen_simd;
 gen_simd:
 tcg_debug_assert(insn != OPC_UD2);
 if (type == TCG_TYPE_V256) {
@@ -3204,6 +3216,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_dup2_vec:
 #endif
 return &x_x_x;
+case INDEX_op_abs_vec:
 case INDEX_op_dup_vec:
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
@@ -3281,6 +3294,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
 return vece <= MO_32 ? 1 : -1;
+case INDEX_op_abs_vec:
+return vece <= MO_32;
 
 default:
 return 0;
-- 
2.17.1
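
PABSB, PABSW and PABSD compute a per-lane absolute value for 8-, 16- and 32-bit lanes. That is why the patch only advertises abs_vec for vece <= MO_32; the abs_insn table leaves MO_64 as OPC_UD2. The most negative lane value has no positive counterpart, so the result wraps back to the same bit pattern. AArch64 ABS (the non-saturating form, as opposed to SQABS) behaves the same way. Here is a minimal C model for 8-bit lanes; abs8_lanes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-lane absolute value of signed 8-bit lanes, modelling PABSB.
 * -128 (0x80) has no positive 8-bit counterpart and comes back as 0x80. */
static void abs8_lanes(uint8_t *d, const uint8_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int8_t v = (int8_t)s[i];
        d[i] = (uint8_t)(v < 0 ? -v : v);   /* -v is computed in int, then truncated */
    }
}
```
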




[Qemu-devel] [PATCH 27/38] tcg/aarch64: Support vector absolute value

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h | 2 +-
 tcg/aarch64/tcg-target.inc.c | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 21d06d928c..e43554c3c7 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -132,7 +132,7 @@ typedef enum {
 #define TCG_TARGET_HAS_orc_vec  1
 #define TCG_TARGET_HAS_not_vec  1
 #define TCG_TARGET_HAS_neg_vec  1
-#define TCG_TARGET_HAS_abs_vec  0
+#define TCG_TARGET_HAS_abs_vec  1
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 7d2a8213ec..cf891defd4 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -554,6 +554,7 @@ typedef enum {
 I3617_CMGE0 = 0x2e208800,
 I3617_CMLE0 = 0x2e20a800,
 I3617_NOT   = 0x2e205800,
+I3617_ABS   = 0x0e20b800,
 I3617_NEG   = 0x2e20b800,
 
 /* System instructions.  */
@@ -2205,6 +2206,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_neg_vec:
 tcg_out_insn(s, 3617, NEG, is_q, vece, a0, a1);
 break;
+case INDEX_op_abs_vec:
+tcg_out_insn(s, 3617, ABS, is_q, vece, a0, a1);
+break;
 case INDEX_op_and_vec:
 tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
 break;
@@ -2316,6 +2320,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_andc_vec:
 case INDEX_op_orc_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_abs_vec:
 case INDEX_op_not_vec:
 case INDEX_op_cmp_vec:
 case INDEX_op_shli_vec:
@@ -2559,6 +2564,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 return &w_w_w;
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_abs_vec:
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
-- 
2.17.1




[Qemu-devel] [PATCH 36/38] tcg: Expand vector minmax using cmp+cmpsel

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c |  8 
 tcg/tcg-op-vec.c  | 19 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index e7029d26f4..dddb00719a 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -99,6 +99,14 @@ static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
 case INDEX_op_cmpsel_vec:
 /* Fallback expansion uses only required logical ops.  */
 continue;
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
+if (tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece)) {
+continue;
+}
+break;
 default:
 break;
 }
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 5868a51270..43abeb0674 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -520,24 +520,35 @@ void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
+static void do_minmax(unsigned vece, TCGv_vec r, TCGv_vec a,
+  TCGv_vec b, TCGOpcode opc, TCGCond cond)
+{
+if (!do_op3(vece, r, a, b, opc)) {
+TCGv_vec t = tcg_temp_new_vec_matching(r);
+tcg_gen_cmp_vec(cond, vece, t, a, b);
+tcg_gen_cmpsel_vec(vece, r, t, a, b);
+tcg_temp_free_vec(t);
+}
+}
+
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
+do_minmax(vece, r, a, b, INDEX_op_smin_vec, TCG_COND_LT);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
+do_minmax(vece, r, a, b, INDEX_op_umin_vec, TCG_COND_LTU);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
+do_minmax(vece, r, a, b, INDEX_op_smax_vec, TCG_COND_GT);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
+do_minmax(vece, r, a, b, INDEX_op_umax_vec, TCG_COND_GTU);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-- 
2.17.1
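
The do_minmax fallback above compares a and b into a mask t and then calls tcg_gen_cmpsel_vec(vece, r, t, a, b). That call takes a in the lanes where t is set and b elsewhere. To produce a minimum, the condition must be true when a is the smaller operand (LT, or LTU for unsigned); a maximum needs GT or GTU. The sketch below models this on 8-bit lanes in plain C. The enum and function names are hypothetical, and the comparisons use int8_t or uint8_t to mimic signed and unsigned lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cond { COND_LT, COND_GT, COND_LTU, COND_GTU };

/* Model of cmp_vec on 8-bit lanes: 0xff where "a cond b" holds, else 0. */
static void cmp8(enum cond c, uint8_t *t, const uint8_t *a,
                 const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool hit;
        switch (c) {
        case COND_LT:  hit = (int8_t)a[i] < (int8_t)b[i]; break;
        case COND_GT:  hit = (int8_t)a[i] > (int8_t)b[i]; break;
        case COND_LTU: hit = a[i] < b[i]; break;
        default:       hit = a[i] > b[i]; break;   /* COND_GTU */
        }
        t[i] = hit ? 0xff : 0x00;
    }
}

/* Model of cmpsel: take a where the mask lane is set, otherwise b. */
static void sel8(uint8_t *r, const uint8_t *s, const uint8_t *a,
                 const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r[i] = (uint8_t)((a[i] & s[i]) | (b[i] & ~s[i]));
    }
}

/* min: COND_LT / COND_LTU; max: COND_GT / COND_GTU. */
static void minmax8(enum cond c, uint8_t *r, const uint8_t *a,
                    const uint8_t *b, size_t n)
{
    uint8_t t[64];

    assert(n <= sizeof(t));     /* this sketch uses a fixed-size mask */
    cmp8(c, t, a, b, n);
    sel8(r, t, a, b, n);
}
```

With a = {1, 0xff, 5, 0x80} and b = {2, 1, 5, 0x7f}, COND_LT yields the signed minimum {1, 0xff, 5, 0x80}, while COND_LTU yields the unsigned minimum {1, 1, 5, 0x7f}.
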




[Qemu-devel] [PATCH 34/38] tcg: Do not recreate INDEX_op_neg_vec unless supported

2019-04-20 Thread Richard Henderson
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.

Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5150c38a25..24faa06260 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -734,9 +734,13 @@ void tcg_optimize(TCGContext *s)
 } else if (opc == INDEX_op_sub_i64) {
 neg_op = INDEX_op_neg_i64;
 have_neg = TCG_TARGET_HAS_neg_i64;
-} else {
+} else if (TCG_TARGET_HAS_neg_vec) {
+TCGType type = TCGOP_VECL(op) + TCG_TYPE_V64;
+unsigned vece = TCGOP_VECE(op);
 neg_op = INDEX_op_neg_vec;
-have_neg = TCG_TARGET_HAS_neg_vec;
+have_neg = tcg_can_emit_vec_op(neg_op, type, vece) > 0;
+} else {
+break;
 }
 if (!have_neg) {
 break;
-- 
2.17.1




[Qemu-devel] [PATCH 37/38] tcg/aarch64: Use MVNI for expansion of dupi

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index e68e4de08c..20c8699f79 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -513,6 +513,7 @@ typedef enum {
 
 /* AdvSIMD modified immediate */
 I3606_MOVI  = 0x0f000400,
+I3606_MVNI  = 0x2f000400,
 
 /* AdvSIMD shift by immediate */
 I3614_SSHR  = 0x0f000400,
@@ -823,6 +824,9 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 
 if (is_fimm(v64, &op, &cmode, &imm8)) {
 tcg_out_insn(s, 3606, MOVI, type == TCG_TYPE_V128, rd, op, cmode, imm8);
+} else if (is_fimm(~v64, &op, &cmode, &imm8)
+   && op == 0 && ((1 << cmode) & 0x03555)) {
+tcg_out_insn(s, 3606, MVNI, type == TCG_TYPE_V128, rd, 0, cmode, imm8);
 } else if (type == TCG_TYPE_V128) {
 new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, v64, v64);
 tcg_out_insn(s, 3305, LDR_v128, 0, rd);
-- 
2.17.1
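
MOVI can only materialise constants of a few shapes, for example a single byte shifted into one position of each 32-bit lane. MVNI writes the bitwise inverse of such a constant. So when v64 itself is not encodable but ~v64 is, one MVNI replaces the constant-pool load. The sketch below checks only that 32-bit shifted-byte subset; is_fimm in the patch handles further cmode forms, as its cmode masks suggest. shifted_imm8_32, choose_dupi and the enum are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified check: is v two identical 32-bit lanes, each
 * holding one byte at bit offset 0, 8, 16 or 24 and zeros elsewhere? */
static bool shifted_imm8_32(uint64_t v, uint8_t *imm8, int *shift)
{
    uint32_t lo = (uint32_t)v;

    if ((uint32_t)(v >> 32) != lo) {
        return false;                       /* lanes differ */
    }
    for (int s = 0; s < 32; s += 8) {
        if ((lo & ~(0xffu << s)) == 0) {    /* all bits outside this byte are 0 */
            *imm8 = (uint8_t)(lo >> s);
            *shift = s;
            return true;
        }
    }
    return false;
}

enum dupi_kind { DUPI_MOVI, DUPI_MVNI, DUPI_LOAD };

/* Prefer MOVI, then MVNI (which writes ~(imm8 << shift) per lane),
 * and otherwise fall back to loading the constant from memory. */
static enum dupi_kind choose_dupi(uint64_t v, uint8_t *imm8, int *shift)
{
    if (shifted_imm8_32(v, imm8, shift)) {
        return DUPI_MOVI;
    }
    if (shifted_imm8_32(~v, imm8, shift)) {
        return DUPI_MVNI;
    }
    return DUPI_LOAD;
}
```

For instance, 0xffff00ffffff00ff is not a shifted byte, but its inverse 0x0000ff000000ff00 is, so the sketch picks MVNI with imm8 = 0xff and shift = 8.
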




[Qemu-devel] [PATCH 38/38] tcg/aarch64: Use ORRI and BICI for vector logical operations

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 77 ++--
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 20c8699f79..1e79a60fb2 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -119,6 +119,8 @@ static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_LIMM 0x200
 #define TCG_CT_CONST_ZERO 0x400
 #define TCG_CT_CONST_MONE 0x800
+#define TCG_CT_CONST_PVI  0x1000
+#define TCG_CT_CONST_IVI  0x2000
 
 /* parse target specific constraints */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
@@ -154,6 +156,12 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
 case 'M': /* minus one */
 ct->ct |= TCG_CT_CONST_MONE;
 break;
+case 'P': /* vector positive immediate */
+ct->ct |= TCG_CT_CONST_PVI;
+break;
+case 'I': /* vector inverted immediate */
+ct->ct |= TCG_CT_CONST_IVI;
+break;
 case 'Z': /* zero */
 ct->ct |= TCG_CT_CONST_ZERO;
 break;
@@ -294,6 +302,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
   const TCGArgConstraint *arg_ct)
 {
 int ct = arg_ct->ct;
+int op, cmode, imm8;
 
 if (ct & TCG_CT_CONST) {
 return 1;
@@ -313,6 +322,16 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 if ((ct & TCG_CT_CONST_MONE) && val == -1) {
 return 1;
 }
+if ((ct & TCG_CT_CONST_PVI) &&
+is_fimm(val, &op, &cmode, &imm8) &&
+op == 0 && ((1 << cmode) & 0x555)) {
+return 1;
+}
+if ((ct & TCG_CT_CONST_IVI) &&
+is_fimm(~val, &op, &cmode, &imm8) &&
+op == 0 && ((1 << cmode) & 0x555)) {
+return 1;
+}
 
 return 0;
 }
@@ -514,6 +533,8 @@ typedef enum {
 /* AdvSIMD modified immediate */
 I3606_MOVI  = 0x0f000400,
 I3606_MVNI  = 0x2f000400,
+I3606_ORRI  = 0x0f001400,
+I3606_BICI  = 0x2f001400,
 
 /* AdvSIMD shift by immediate */
 I3614_SSHR  = 0x0f000400,
@@ -2217,20 +2238,48 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_insn(s, 3617, ABS, is_q, vece, a0, a1);
 break;
 case INDEX_op_and_vec:
-tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
+if (const_args[2]) {
+int op, cmode, imm8;
+is_fimm(~a2, &op, &cmode, &imm8);
+tcg_out_mov(s, type, a0, a1);
+tcg_out_insn(s, 3606, BICI, is_q, a0, 0, cmode, imm8);
+} else {
+tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
+}
 break;
 case INDEX_op_or_vec:
-tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2);
+if (const_args[2]) {
+int op, cmode, imm8;
+is_fimm(a2, &op, &cmode, &imm8);
+tcg_out_mov(s, type, a0, a1);
+tcg_out_insn(s, 3606, ORRI, is_q, a0, 0, cmode, imm8);
+} else {
+tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2);
+}
+break;
+case INDEX_op_andc_vec:
+if (const_args[2]) {
+int op, cmode, imm8;
+is_fimm(a2, &op, &cmode, &imm8);
+tcg_out_mov(s, type, a0, a1);
+tcg_out_insn(s, 3606, BICI, is_q, a0, 0, cmode, imm8);
+} else {
+tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2);
+}
+break;
+case INDEX_op_orc_vec:
+if (const_args[2]) {
+int op, cmode, imm8;
+is_fimm(~a2, &op, &cmode, &imm8);
+tcg_out_mov(s, type, a0, a1);
+tcg_out_insn(s, 3606, ORRI, is_q, a0, 0, cmode, imm8);
+} else {
+tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2);
+}
 break;
 case INDEX_op_xor_vec:
 tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2);
 break;
-case INDEX_op_andc_vec:
-tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2);
-break;
-case INDEX_op_orc_vec:
-tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2);
-break;
 case INDEX_op_ssadd_vec:
 tcg_out_insn(s, 3616, SQADD, is_q, vece, a0, a1, a2);
 break;
@@ -2413,6 +2462,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
 static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
 static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
+static const TCGTargetOpDef w_w_wI = { .args_ct_str = { "w", "w", "wI" } };
+static const TCGTargetOpDef w_w_wP = { .args_ct_str = { "w", "w", "wP" } };
 static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
 static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
 static const TCGTargetOpDef r_r_rA = { .args_ct_str = { "r", "r", "rA" } };
@@ -2568,11 +2619,7 @@ static const 

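Patch 38 lets and/or/andc/orc_vec accept a constant second operand. It emits the AdvSIMD ORR/BIC (vector, immediate) forms, which compute d | imm and d & ~imm in place on the destination register; that is why the code first copies a1 to a0 with tcg_out_mov. Each logical op maps onto one of the two forms, inverting the constant where needed. This matches the is_fimm(~a2, ...) calls in the and_vec and orc_vec cases. Here is a plain C check of those identities on one 32-bit lane; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane models of the AdvSIMD immediate forms (destructive on Vd). */
static uint32_t orri(uint32_t d, uint32_t imm) { return d | imm; }   /* ORR (vector, immediate) */
static uint32_t bici(uint32_t d, uint32_t imm) { return d & ~imm; }  /* BIC (vector, immediate) */

/* Each logical op with a constant operand c, rewritten onto ORRI/BICI. */
static uint32_t and_const(uint32_t x, uint32_t c)  { return bici(x, ~c); }  /* x & c  */
static uint32_t or_const(uint32_t x, uint32_t c)   { return orri(x, c); }   /* x | c  */
static uint32_t andc_const(uint32_t x, uint32_t c) { return bici(x, c); }   /* x & ~c */
static uint32_t orc_const(uint32_t x, uint32_t c)  { return orri(x, ~c); }  /* x | ~c */
```
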
[Qemu-devel] [Bug 1818367] Re: Initialization of device cfi.pflash01 failed: Block node is read-only

2019-04-20 Thread José Pekkarinen
This turned out to be related to the AppArmor profile: rebooting with AppArmor
disabled allows the domain to boot, so I'll take it up with the Gentoo community,
and this can be closed.

Thanks!

José.

** Changed in: libvirt
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1818367

Title:
  Initialization of device cfi.pflash01 failed: Block node is read-only

Status in libvirt:
  Invalid
Status in QEMU:
  Invalid

Bug description:
  Hi,

  I have several VMs defined in libvirt using OVMF for UEFI. Since a recent
  update of my server, I'm unable to start any of the defined domains. This is
  an example of the output:

  # virsh start os-1
  error: Failed to start domain os-1
  error: internal error: qemu unexpectedly closed the monitor: 2019-03-02T21:23:51.726446Z qemu-system-x86_64: Initialization of device cfi.pflash01 failed: Block node is read-only

  An example domain looks like this:

    os-1
    34c41008-ab91-483b-959c-81a7a12ae9be
    8388608
    8388608
    4
    hvm
    /var/lib/libvirt/qemu/nvram/os-1-ovmf.fd

[Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/s390x/translate.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 0afa8f7ca5..030129acbb 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -1407,13 +1407,7 @@ static DisasJumpType help_branch(DisasContext *s, DisasCompare *c,
 
 static DisasJumpType op_abs(DisasContext *s, DisasOps *o)
 {
-TCGv_i64 z, n;
-z = tcg_const_i64(0);
-n = tcg_temp_new_i64();
-tcg_gen_neg_i64(n, o->in2);
-tcg_gen_movcond_i64(TCG_COND_LT, o->out, o->in2, z, n, o->in2);
-tcg_temp_free_i64(n);
-tcg_temp_free_i64(z);
+tcg_gen_abs_i64(o->out, o->in2);
 return DISAS_NEXT;
 }
 
-- 
2.17.1
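
Patches 21, 24 and 25 in this series replace open-coded negate-and-select sequences with tcg_gen_abs_i32/i64. The old s390x sequence computes n = -x and then selects n when x < 0. The sketch below contrasts that with a common branch-free formulation. It is plain C with hypothetical names, and it does not claim that tcg_gen_abs_i64 emits this exact sequence. Both forms map INT64_MIN to itself, since its negation overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Old open-coded form: n = -x, then out = (x < 0) ? n : x (neg + movcond). */
static int64_t abs_movcond(int64_t x)
{
    int64_t n = (int64_t)(0 - (uint64_t)x);   /* negate without signed-overflow UB */
    return x < 0 ? n : x;
}

/* Branch-free alternative: t is all ones for negative x, else 0;
 * (x ^ t) - t then conditionally negates x. */
static int64_t abs_branchfree(int64_t x)
{
    uint64_t t = (uint64_t)(x >> 63);         /* arithmetic shift on GCC/Clang */
    return (int64_t)(((uint64_t)x ^ t) - t);
}
```
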




[Qemu-devel] [PATCH 21/38] target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|  2 --
 target/arm/neon_helper.c   |  5 -
 target/arm/translate-a64.c | 41 +-
 target/arm/translate.c | 11 +++---
 4 files changed, 8 insertions(+), 51 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index a09566f795..3d90b5be66 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -347,8 +347,6 @@ DEF_HELPER_2(neon_ceq_u8, i32, i32, i32)
 DEF_HELPER_2(neon_ceq_u16, i32, i32, i32)
 DEF_HELPER_2(neon_ceq_u32, i32, i32, i32)
 
-DEF_HELPER_1(neon_abs_s8, i32, i32)
-DEF_HELPER_1(neon_abs_s16, i32, i32)
 DEF_HELPER_1(neon_clz_u8, i32, i32)
 DEF_HELPER_1(neon_clz_u16, i32, i32)
 DEF_HELPER_1(neon_cls_s8, i32, i32)
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index ed1c6fc41c..4259056723 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -1228,11 +1228,6 @@ NEON_VOP(ceq_u16, neon_u16, 2)
 NEON_VOP(ceq_u32, neon_u32, 1)
 #undef NEON_FN
 
-#define NEON_FN(dest, src, dummy) dest = (src < 0) ? -src : src
-NEON_VOP1(abs_s8, neon_s8, 4)
-NEON_VOP1(abs_s16, neon_s16, 2)
-#undef NEON_FN
-
 /* Count Leading Sign/Zero Bits.  */
 static inline int do_clz8(uint8_t x)
 {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index dcdeb80176..fd8921565e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -9468,11 +9468,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
 if (u) {
 tcg_gen_neg_i64(tcg_rd, tcg_rn);
 } else {
-TCGv_i64 tcg_zero = tcg_const_i64(0);
-tcg_gen_neg_i64(tcg_rd, tcg_rn);
-tcg_gen_movcond_i64(TCG_COND_GT, tcg_rd, tcg_rn, tcg_zero,
-tcg_rn, tcg_rd);
-tcg_temp_free_i64(tcg_zero);
+tcg_gen_abs_i64(tcg_rd, tcg_rn);
 }
 break;
 case 0x2f: /* FABS */
@@ -12366,11 +12362,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
 }
 break;
 case 0xb:
-if (u) { /* NEG */
+if (u) { /* ABS, NEG */
 gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_neg, size);
-return;
+} else {
+gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_abs, size);
 }
-break;
+return;
 }
 
 if (size == 3) {
@@ -12438,17 +12435,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
 gen_helper_neon_qabs_s32(tcg_res, cpu_env, tcg_op);
 }
 break;
-case 0xb: /* ABS, NEG */
-if (u) {
-tcg_gen_neg_i32(tcg_res, tcg_op);
-} else {
-TCGv_i32 tcg_zero = tcg_const_i32(0);
-tcg_gen_neg_i32(tcg_res, tcg_op);
-tcg_gen_movcond_i32(TCG_COND_GT, tcg_res, tcg_op,
-tcg_zero, tcg_op, tcg_res);
-tcg_temp_free_i32(tcg_zero);
-}
-break;
 case 0x2f: /* FABS */
 gen_helper_vfp_abss(tcg_res, tcg_op);
 break;
@@ -12561,23 +12547,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
 tcg_temp_free_i32(tcg_zero);
 break;
 }
-case 0xb: /* ABS, NEG */
-if (u) {
-TCGv_i32 tcg_zero = tcg_const_i32(0);
-if (size) {
-gen_helper_neon_sub_u16(tcg_res, tcg_zero, tcg_op);
-} else {
-gen_helper_neon_sub_u8(tcg_res, tcg_zero, tcg_op);
-}
-tcg_temp_free_i32(tcg_zero);
-} else {
-if (size) {
-gen_helper_neon_abs_s16(tcg_res, tcg_op);
-} else {
-gen_helper_neon_abs_s8(tcg_res, tcg_op);
-}
-}
-break;
 case 0x4: /* CLS, CLZ */
 if (u) {
 if (size == 0) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 721171794d..911ad0bdab 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8031,6 +8031,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 case NEON_2RM_VNEG:
 tcg_gen_gvec_neg(size, rd_ofs, rm_ofs, vec_size, vec_size);
 break;
+case NEON_2RM_VABS:
+tcg_gen_gvec_abs(size, rd_ofs, rm_ofs, vec_size, vec_size);
+break;
 
 default:
 elementwise:
@@ -8136,14 +8139,6 @@ static int 

[Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/xtensa/translate.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 65561d2c49..62be8a6f6a 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -1707,14 +1707,7 @@ void restore_state_to_opc(CPUXtensaState *env, TranslationBlock *tb,
 static void translate_abs(DisasContext *dc, const OpcodeArg arg[],
   const uint32_t par[])
 {
-TCGv_i32 zero = tcg_const_i32(0);
-TCGv_i32 neg = tcg_temp_new_i32();
-
-tcg_gen_neg_i32(neg, arg[1].in);
-tcg_gen_movcond_i32(TCG_COND_GE, arg[0].out,
-arg[1].in, zero, arg[1].in, neg);
-tcg_temp_free(neg);
-tcg_temp_free(zero);
+tcg_gen_abs_i32(arg[0].out, arg[1].in);
 }
 
 static void translate_add(DisasContext *dc, const OpcodeArg arg[],
-- 
2.17.1




[Qemu-devel] [PATCH 16/38] tcg: Specify optional vector requirements with a list

2019-04-20 Thread Richard Henderson
Replace the single opcode in .opc with a null-terminated
array in .opt_opc.  We still require that all opcodes be
used with the same .vece.

Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion.  Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.

Convert all existing vector aware front ends.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.h   |  24 +-
 tcg/tcg.h   |  18 ++
 target/arm/translate-sve.c  |   9 +-
 target/arm/translate.c  | 127 +++
 target/ppc/translate/vmx-impl.inc.c |   7 +-
 tcg/tcg-op-gvec.c   | 341 ++--
 tcg/tcg-op-vec.c|  18 ++
 7 files changed, 365 insertions(+), 179 deletions(-)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index aaeb6e5d8b..a0e0902f6c 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -91,8 +91,8 @@ typedef struct {
 void (*fniv)(unsigned, TCGv_vec, TCGv_vec);
 /* Expand out-of-line helper w/descriptor.  */
 gen_helper_gvec_2 *fno;
-/* The opcode, if any, to which this corresponds.  */
-TCGOpcode opc;
+/* The optional opcodes, if any, utilized by .fniv.  */
+const TCGOpcode *opt_opc;
 /* The data argument to the out-of-line helper.  */
 int32_t data;
 /* The vector element size, if applicable.  */
@@ -112,8 +112,8 @@ typedef struct {
 gen_helper_gvec_2 *fno;
 /* Expand out-of-line helper w/descriptor, data as argument.  */
 gen_helper_gvec_2i *fnoi;
-/* The opcode, if any, to which this corresponds.  */
-TCGOpcode opc;
+/* The optional opcodes, if any, utilized by .fniv.  */
+const TCGOpcode *opt_opc;
 /* The vector element size, if applicable.  */
 uint8_t vece;
 /* Prefer i64 to v64.  */
@@ -131,8 +131,8 @@ typedef struct {
 void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec);
 /* Expand out-of-line helper w/descriptor.  */
 gen_helper_gvec_2i *fno;
-/* The opcode, if any, to which this corresponds.  */
-TCGOpcode opc;
+/* The optional opcodes, if any, utilized by .fniv.  */
+const TCGOpcode *opt_opc;
 /* The data argument to the out-of-line helper.  */
 uint32_t data;
 /* The vector element size, if applicable.  */
@@ -152,8 +152,8 @@ typedef struct {
 void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec);
 /* Expand out-of-line helper w/descriptor.  */
 gen_helper_gvec_3 *fno;
-/* The opcode, if any, to which this corresponds.  */
-TCGOpcode opc;
+/* The optional opcodes, if any, utilized by .fniv.  */
+const TCGOpcode *opt_opc;
 /* The data argument to the out-of-line helper.  */
 int32_t data;
 /* The vector element size, if applicable.  */
@@ -175,8 +175,8 @@ typedef struct {
 void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
 /* Expand out-of-line helper w/descriptor, data in descriptor.  */
 gen_helper_gvec_3 *fno;
-/* The opcode, if any, to which this corresponds.  */
-TCGOpcode opc;
+/* The optional opcodes, if any, utilized by .fniv.  */
+const TCGOpcode *opt_opc;
 /* The vector element size, if applicable.  */
 uint8_t vece;
 /* Prefer i64 to v64.  */
@@ -194,8 +194,8 @@ typedef struct {
 void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec);
 /* Expand out-of-line helper w/descriptor.  */
 gen_helper_gvec_4 *fno;
-/* The opcode, if any, to which this corresponds.  */
-TCGOpcode opc;
+/* The optional opcodes, if any, utilized by .fniv.  */
+const TCGOpcode *opt_opc;
 /* The data argument to the out-of-line helper.  */
 int32_t data;
 /* The vector element size, if applicable.  */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 7b1c15b40b..48d4d2e03e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -694,6 +694,7 @@ struct TCGContext {
 QSIMPLEQ_HEAD(, TCGLabel) labels;
 int temps_in_use;
 int goto_tb_issue_mask;
+const TCGOpcode *vecop_list;
 #endif
 
 /* Code generation.  Note that we specifically do not use tcg_insn_unit
@@ -1493,4 +1494,21 @@ void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
   TCGMemOpIdx oi, uintptr_t retaddr);
 
+#ifdef CONFIG_DEBUG_TCG
+void tcg_assert_listed_vecop(TCGOpcode);
+#else
+static inline void tcg_assert_listed_vecop(TCGOpcode op) { }
+#endif
+
+static inline const TCGOpcode *tcg_swap_vecop_list(const TCGOpcode *n)
+{
+#ifdef CONFIG_DEBUG_TCG
+const TCGOpcode *o = tcg_ctx->vecop_list;
+tcg_ctx->vecop_list = n;
+return o;
+#else
+return NULL;
+#endif
+}
+
 #endif /* TCG_H */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 245cd82621..0682c0d32b 100644
--- a/target/arm/translate-sve.c
+++ 

[Qemu-devel] [PATCH 23/38] target/ppc: Use tcg_gen_abs_tl

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/ppc/translate.c | 80 +-
 1 file changed, 32 insertions(+), 48 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index badc1ae1a3..97b8e8ddaf 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -5013,39 +5013,27 @@ static void gen_ecowx(DisasContext *ctx)
 /* abs - abs. */
 static void gen_abs(DisasContext *ctx)
 {
-TCGLabel *l1 = gen_new_label();
-TCGLabel *l2 = gen_new_label();
-tcg_gen_brcondi_tl(TCG_COND_GE, cpu_gpr[rA(ctx->opcode)], 0, l1);
-tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-tcg_gen_br(l2);
-gen_set_label(l1);
-tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-gen_set_label(l2);
-if (unlikely(Rc(ctx->opcode) != 0))
-gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+TCGv d = cpu_gpr[rD(ctx->opcode)];
+TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+tcg_gen_abs_tl(d, a);
+if (unlikely(Rc(ctx->opcode) != 0)) {
+gen_set_Rc0(ctx, d);
+}
 }
 
 /* abso - abso. */
 static void gen_abso(DisasContext *ctx)
 {
-TCGLabel *l1 = gen_new_label();
-TCGLabel *l2 = gen_new_label();
-TCGLabel *l3 = gen_new_label();
-/* Start with XER OV disabled, the most likely case */
-tcg_gen_movi_tl(cpu_ov, 0);
-tcg_gen_brcondi_tl(TCG_COND_GE, cpu_gpr[rA(ctx->opcode)], 0, l2);
-tcg_gen_brcondi_tl(TCG_COND_NE, cpu_gpr[rA(ctx->opcode)], 0x8000, l1);
-tcg_gen_movi_tl(cpu_ov, 1);
-tcg_gen_movi_tl(cpu_so, 1);
-tcg_gen_br(l2);
-gen_set_label(l1);
-tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-tcg_gen_br(l3);
-gen_set_label(l2);
-tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-gen_set_label(l3);
-if (unlikely(Rc(ctx->opcode) != 0))
-gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+TCGv d = cpu_gpr[rD(ctx->opcode)];
+TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_ov, a, 0x8000);
+tcg_gen_abs_tl(d, a);
+tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
+if (unlikely(Rc(ctx->opcode) != 0)) {
+gen_set_Rc0(ctx, d);
+}
 }
 
 /* clcs */
@@ -5265,33 +5253,29 @@ static void gen_mulo(DisasContext *ctx)
 /* nabs - nabs. */
 static void gen_nabs(DisasContext *ctx)
 {
-TCGLabel *l1 = gen_new_label();
-TCGLabel *l2 = gen_new_label();
-tcg_gen_brcondi_tl(TCG_COND_GT, cpu_gpr[rA(ctx->opcode)], 0, l1);
-tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-tcg_gen_br(l2);
-gen_set_label(l1);
-tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-gen_set_label(l2);
-if (unlikely(Rc(ctx->opcode) != 0))
-gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+TCGv d = cpu_gpr[rD(ctx->opcode)];
+TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+tcg_gen_abs_tl(d, a);
+tcg_gen_neg_tl(d, d);
+if (unlikely(Rc(ctx->opcode) != 0)) {
+gen_set_Rc0(ctx, d);
+}
 }
 
 /* nabso - nabso. */
 static void gen_nabso(DisasContext *ctx)
 {
-TCGLabel *l1 = gen_new_label();
-TCGLabel *l2 = gen_new_label();
-tcg_gen_brcondi_tl(TCG_COND_GT, cpu_gpr[rA(ctx->opcode)], 0, l1);
-tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-tcg_gen_br(l2);
-gen_set_label(l1);
-tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-gen_set_label(l2);
+TCGv d = cpu_gpr[rD(ctx->opcode)];
+TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+tcg_gen_abs_tl(d, a);
+tcg_gen_neg_tl(d, d);
 /* nabs never overflows */
 tcg_gen_movi_tl(cpu_ov, 0);
-if (unlikely(Rc(ctx->opcode) != 0))
-gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+if (unlikely(Rc(ctx->opcode) != 0)) {
+gen_set_Rc0(ctx, d);
+}
 }
 
 /* rlmi - rlmi. */
-- 
2.17.1
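The flag behaviour of the rewritten gen_abso()/gen_nabso() can be modelled on plain 32-bit host values. This is a hypothetical reference model, not QEMU code: `XerModel` stands in for cpu_ov/cpu_so, and the sketch treats 0x80000000, the most negative 32-bit value, as the only input whose absolute value is not representable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two XER bits touched by abso/nabso. */
typedef struct {
    bool ov;   /* overflow: rewritten by every abso/nabso */
    bool so;   /* summary overflow: sticky, only ever set here */
} XerModel;

/* Branchless |a|; the most negative value wraps to itself. */
static uint32_t model_abs32(uint32_t a)
{
    uint32_t t = 0u - (a >> 31);   /* all-ones if negative, else 0 */
    return (a ^ t) - t;
}

/* abso: d = |a|, OV = (a == most negative), SO |= OV. */
static uint32_t model_abso(uint32_t a, XerModel *xer)
{
    assert(xer != NULL);
    xer->ov = (a == 0x80000000u);
    xer->so = xer->so || xer->ov;
    return model_abs32(a);
}

/* nabso: d = -|a|, which is always representable, so OV is cleared. */
static uint32_t model_nabso(uint32_t a, XerModel *xer)
{
    assert(xer != NULL);
    xer->ov = false;
    return 0u - model_abs32(a);
}
```

Note how SO stays set after a later non-overflowing abso, which is why the patch ORs cpu_ov into cpu_so instead of assigning it.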




[Qemu-devel] [PATCH 13/38] tcg/i386: Support vector variable shift opcodes

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  2 +-
 tcg/i386/tcg-target.inc.c | 35 +++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 241bf19413..b240633455 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -184,7 +184,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  have_avx2
 #define TCG_TARGET_HAS_cmp_vec  1
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 4c42a2430d..04e609c7b2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -467,6 +467,11 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_VPBROADCASTQ (0x59 | P_EXT38 | P_DATA16)
 #define OPC_VPERMQ  (0x00 | P_EXT3A | P_DATA16 | P_REXW)
 #define OPC_VPERM2I128  (0x46 | P_EXT3A | P_DATA16 | P_VEXL)
+#define OPC_VPSLLVD (0x47 | P_EXT38 | P_DATA16)
+#define OPC_VPSLLVQ (0x47 | P_EXT38 | P_DATA16 | P_REXW)
+#define OPC_VPSRAVD (0x46 | P_EXT38 | P_DATA16)
+#define OPC_VPSRLVD (0x45 | P_EXT38 | P_DATA16)
+#define OPC_VPSRLVQ (0x45 | P_EXT38 | P_DATA16 | P_REXW)
 #define OPC_VZEROUPPER  (0x77 | P_EXT)
 #define OPC_XCHG_ax_r32(0x90)
 
@@ -2705,6 +2710,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static int const umax_insn[4] = {
 OPC_PMAXUB, OPC_PMAXUW, OPC_PMAXUD, OPC_UD2
 };
+static int const shlv_insn[4] = {
+/* TODO: AVX512 adds support for MO_16.  */
+OPC_UD2, OPC_UD2, OPC_VPSLLVD, OPC_VPSLLVQ
+};
+static int const shrv_insn[4] = {
+/* TODO: AVX512 adds support for MO_16.  */
+OPC_UD2, OPC_UD2, OPC_VPSRLVD, OPC_VPSRLVQ
+};
+static int const sarv_insn[4] = {
+/* TODO: AVX512 adds support for MO_16, MO_64.  */
+OPC_UD2, OPC_UD2, OPC_VPSRAVD, OPC_UD2
+};
 
 TCGType type = vecl + TCG_TYPE_V64;
 int insn, sub;
@@ -2757,6 +2774,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_umax_vec:
 insn = umax_insn[vece];
 goto gen_simd;
+case INDEX_op_shlv_vec:
+insn = shlv_insn[vece];
+goto gen_simd;
+case INDEX_op_shrv_vec:
+insn = shrv_insn[vece];
+goto gen_simd;
+case INDEX_op_sarv_vec:
+insn = sarv_insn[vece];
+goto gen_simd;
 case INDEX_op_x86_punpckl_vec:
 insn = punpckl_insn[vece];
 goto gen_simd;
@@ -3134,6 +3160,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_umin_vec:
 case INDEX_op_smax_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 case INDEX_op_cmp_vec:
 case INDEX_op_x86_shufps_vec:
 case INDEX_op_x86_blend_vec:
@@ -3191,6 +3220,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 }
 return 1;
 
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+return have_avx2 && vece >= MO_32;
+case INDEX_op_sarv_vec:
+return have_avx2 && vece == MO_32;
+
 case INDEX_op_mul_vec:
 if (vece == MO_8) {
 /* We can expand the operation for MO_8.  */
-- 
2.17.1
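For reference, a per-lane C model of the AVX2 instructions this patch wires up (VPSLLVD, VPSRLVD, VPSRAVD on 32-bit lanes), plus a helper mirroring the tcg_can_emit_vec_op() rules added above. All function names are hypothetical; the out-of-range behaviour (counts above 31 give 0 for the logical shifts and a full sign fill for VPSRAVD) follows the Intel instruction descriptions, and MO_8..MO_64 are assumed to be 0..3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the patch: shlv/shrv for MO_32 and MO_64, sarv for MO_32 only. */
static int can_emit_shv(int is_sar, unsigned vece, int have_avx2)
{
    assert(vece <= 3);
    return have_avx2 && (is_sar ? vece == 2 : vece >= 2);
}

/* VPSLLVD: each lane shifts by its own count; counts above 31 give 0. */
static void model_vpsllvd(uint32_t *d, const uint32_t *a,
                          const uint32_t *cnt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = cnt[i] > 31 ? 0 : a[i] << cnt[i];
    }
}

/* VPSRLVD: logical right shift; counts above 31 give 0. */
static void model_vpsrlvd(uint32_t *d, const uint32_t *a,
                          const uint32_t *cnt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = cnt[i] > 31 ? 0 : a[i] >> cnt[i];
    }
}

/* VPSRAVD: arithmetic right shift; counts above 31 act like 31,
 * leaving every bit equal to the sign bit. */
static void model_vpsravd(uint32_t *d, const uint32_t *a,
                          const uint32_t *cnt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t sign = 0u - (a[i] >> 31);       /* all-ones if negative */
        uint32_t c = cnt[i] > 31 ? 31 : cnt[i];
        /* Logical shift, then put copies of the sign into the vacated bits. */
        d[i] = (a[i] >> c) | (c ? sign << (32 - c) : 0);
    }
}
```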




[Qemu-devel] [PATCH 18/38] tcg/i386: Support vector scalar shift opcodes

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  2 +-
 tcg/i386/tcg-target.inc.c | 35 +++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b240633455..618aa520d2 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -183,7 +183,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_not_vec  0
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_shi_vec  1
-#define TCG_TARGET_HAS_shs_vec  0
+#define TCG_TARGET_HAS_shs_vec  1
 #define TCG_TARGET_HAS_shv_vec  have_avx2
 #define TCG_TARGET_HAS_cmp_vec  1
 #define TCG_TARGET_HAS_mul_vec  1
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 04e609c7b2..85b68e4326 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -420,6 +420,14 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_PSHIFTW_Ib  (0x71 | P_EXT | P_DATA16) /* /2 /6 /4 */
 #define OPC_PSHIFTD_Ib  (0x72 | P_EXT | P_DATA16) /* /2 /6 /4 */
 #define OPC_PSHIFTQ_Ib  (0x73 | P_EXT | P_DATA16) /* /2 /6 /4 */
+#define OPC_PSLLW   (0xf1 | P_EXT | P_DATA16)
+#define OPC_PSLLD   (0xf2 | P_EXT | P_DATA16)
+#define OPC_PSLLQ   (0xf3 | P_EXT | P_DATA16)
+#define OPC_PSRAW   (0xe1 | P_EXT | P_DATA16)
+#define OPC_PSRAD   (0xe2 | P_EXT | P_DATA16)
+#define OPC_PSRLW   (0xd1 | P_EXT | P_DATA16)
+#define OPC_PSRLD   (0xd2 | P_EXT | P_DATA16)
+#define OPC_PSRLQ   (0xd3 | P_EXT | P_DATA16)
 #define OPC_PSUBB   (0xf8 | P_EXT | P_DATA16)
 #define OPC_PSUBW   (0xf9 | P_EXT | P_DATA16)
 #define OPC_PSUBD   (0xfa | P_EXT | P_DATA16)
@@ -2722,6 +2730,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 /* TODO: AVX512 adds support for MO_16, MO_64.  */
 OPC_UD2, OPC_UD2, OPC_VPSRAVD, OPC_UD2
 };
+static int const shls_insn[4] = {
+OPC_UD2, OPC_PSLLW, OPC_PSLLD, OPC_PSLLQ
+};
+static int const shrs_insn[4] = {
+OPC_UD2, OPC_PSRLW, OPC_PSRLD, OPC_PSRLQ
+};
+static int const sars_insn[4] = {
+OPC_UD2, OPC_PSRAW, OPC_PSRAD, OPC_UD2
+};
 
 TCGType type = vecl + TCG_TYPE_V64;
 int insn, sub;
@@ -2783,6 +2800,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 insn = sarv_insn[vece];
 goto gen_simd;
+case INDEX_op_shls_vec:
+insn = shls_insn[vece];
+goto gen_simd;
+case INDEX_op_shrs_vec:
+insn = shrs_insn[vece];
+goto gen_simd;
+case INDEX_op_sars_vec:
+insn = sars_insn[vece];
+goto gen_simd;
 case INDEX_op_x86_punpckl_vec:
 insn = punpckl_insn[vece];
 goto gen_simd;
@@ -3163,6 +3189,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_shls_vec:
+case INDEX_op_shrs_vec:
+case INDEX_op_sars_vec:
 case INDEX_op_cmp_vec:
 case INDEX_op_x86_shufps_vec:
 case INDEX_op_x86_blend_vec:
@@ -3220,6 +3249,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 }
 return 1;
 
+case INDEX_op_shls_vec:
+case INDEX_op_shrs_vec:
+return vece >= MO_16;
+case INDEX_op_sars_vec:
+return vece >= MO_16 && vece <= MO_32;
+
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 return have_avx2 && vece >= MO_32;
-- 
2.17.1
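Unlike the variable shifts, the SSE scalar-count forms (PSLLW, PSRAW and friends) apply one count to every lane. A small hypothetical model for 16-bit lanes, plus a helper mirroring the new tcg_can_emit_vec_op() rules; the behaviour for counts above 15 (zero for logical shifts, sign fill for PSRAW) follows the Intel descriptions, and MO_8..MO_64 are assumed to be 0..3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the patch: shls/shrs for MO_16..MO_64, sars for MO_16..MO_32. */
static int can_emit_shs(int is_sar, unsigned vece)
{
    assert(vece <= 3);
    return is_sar ? (vece >= 1 && vece <= 2) : vece >= 1;
}

/* PSRAW: one count for all 16-bit lanes; counts above 15 act like 15. */
static void model_psraw(uint16_t *d, const uint16_t *a, uint64_t count, size_t n)
{
    unsigned c = count > 15 ? 15 : (unsigned)count;

    for (size_t i = 0; i < n; i++) {
        uint16_t sign = (a[i] & 0x8000u) ? 0xFFFFu : 0;
        uint16_t fill = c ? (uint16_t)(sign << (16 - c)) : 0;
        d[i] = (uint16_t)((a[i] >> c) | fill);
    }
}

/* PSLLW: one count for all 16-bit lanes; counts above 15 clear them. */
static void model_psllw(uint16_t *d, const uint16_t *a, uint64_t count, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = count > 15 ? 0 : (uint16_t)(a[i] << count);
    }
}
```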




[Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value

2019-04-20 Thread Richard Henderson
Remove a function of the same name from target/arm/.
Use a branchless implementation of abs that gcc uses for x86.
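The branchless sequence that tcg_gen_abs_i32/i64 emit (arithmetic shift right by width-1, xor, subtract) can be checked with a small host-side C model. The function names are hypothetical; the arithmetic shift is written as `0 - (a >> 31)` on an unsigned value because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Same steps as tcg_gen_abs_i32: t = sar(a, 31); ret = (a ^ t) - t. */
static uint32_t model_abs_i32(uint32_t a)
{
    uint32_t t = 0u - (a >> 31);   /* 0, or 0xffffffff if a is negative */
    return (a ^ t) - t;            /* a unchanged, or ~a + 1 == -a */
}

/* Same steps as tcg_gen_abs_i64, with a 63-bit shift. */
static uint64_t model_abs_i64(uint64_t a)
{
    uint64_t t = 0u - (a >> 63);
    return (a ^ t) - t;
}
```

As with the movcond-based helper removed from target/arm, the most negative value maps to itself.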

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.h   |  5 +
 target/arm/translate.c | 10 --
 tcg/tcg-op.c   | 20 
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 472b73cb38..660fe205d0 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
 
 static inline void tcg_gen_discard_i32(TCGv_i32 arg)
 {
@@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
 
 #if TCG_TARGET_REG_BITS == 64
 static inline void tcg_gen_discard_i64(TCGv_i64 arg)
@@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
+void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
@@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_addi_tl tcg_gen_addi_i64
 #define tcg_gen_sub_tl tcg_gen_sub_i64
 #define tcg_gen_neg_tl tcg_gen_neg_i64
+#define tcg_gen_abs_tl tcg_gen_abs_i64
 #define tcg_gen_subfi_tl tcg_gen_subfi_i64
 #define tcg_gen_subi_tl tcg_gen_subi_i64
 #define tcg_gen_and_tl tcg_gen_and_i64
@@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_addi_tl tcg_gen_addi_i32
 #define tcg_gen_sub_tl tcg_gen_sub_i32
 #define tcg_gen_neg_tl tcg_gen_neg_i32
+#define tcg_gen_abs_tl tcg_gen_abs_i32
 #define tcg_gen_subfi_tl tcg_gen_subfi_i32
 #define tcg_gen_subi_tl tcg_gen_subi_i32
 #define tcg_gen_and_tl tcg_gen_and_i32
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 83a008e945..721171794d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
 tcg_temp_free_i32(tmp1);
 }
 
-static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
-{
-TCGv_i32 c0 = tcg_const_i32(0);
-TCGv_i32 tmp = tcg_temp_new_i32();
-tcg_gen_neg_i32(tmp, src);
-tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
-tcg_temp_free_i32(c0);
-tcg_temp_free_i32(tmp);
-}
-
 static void shifter_out_im(TCGv_i32 var, int shift)
 {
 if (shift == 0) {
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a00d1df37e..0ac291f1c4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
 tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
 }
 
+void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
+{
+TCGv_i32 t = tcg_temp_new_i32();
+
+tcg_gen_sari_i32(t, a, 31);
+tcg_gen_xor_i32(ret, a, t);
+tcg_gen_sub_i32(ret, ret, t);
+tcg_temp_free_i32(t);
+}
+
 /* 64-bit ops */
 
 #if TCG_TARGET_REG_BITS == 32
@@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
 tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
 }
 
+void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_sari_i64(t, a, 63);
+tcg_gen_xor_i64(ret, a, t);
+tcg_gen_sub_i64(ret, ret, t);
+tcg_temp_free_i64(t);
+}
+
 /* Size changing operations.  */
 
 void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
-- 
2.17.1




[Qemu-devel] [PATCH 11/38] tcg: Add INDEX_op_dup_mem_vec

2019-04-20 Thread Richard Henderson
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.

  VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.

Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
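Semantically, dupm reads one element of (1 << vece) bytes from memory and replicates it into every lane, with no intermediate integer temp. A minimal host-side reference model of that result, assuming vece 0..3 for MO_8..MO_64; `model_dup_mem` is hypothetical and says nothing about how a backend encodes the load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate the element at base + offset across vec_bytes bytes.
 * memcpy preserves host byte order within each element. */
static void model_dup_mem(uint8_t *vec, size_t vec_bytes, unsigned vece,
                          const uint8_t *base, size_t offset)
{
    size_t esize = (size_t)1 << vece;

    assert(vece <= 3 && vec_bytes % esize == 0);
    for (size_t i = 0; i < vec_bytes; i += esize) {
        memcpy(vec + i, base + offset, esize);
    }
}
```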

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.h |  1 +
 tcg/tcg-opc.h|  1 +
 tcg/aarch64/tcg-target.inc.c |  4 ++
 tcg/i386/tcg-target.inc.c|  4 ++
 tcg/tcg-op-gvec.c| 88 +++-
 tcg/tcg-op-vec.c | 11 +
 tcg/tcg.c|  1 +
 7 files changed, 69 insertions(+), 41 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 1f1824c30a..9fff9864f6 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -954,6 +954,7 @@ void tcg_gen_atomic_umax_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
 void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 1bad6e4208..4bf71f261f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -219,6 +219,7 @@ DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
 
 DEF(ld_vec, 1, 1, 1, IMPLVEC)
 DEF(st_vec, 0, 2, 1, IMPLVEC)
+DEF(dupm_vec, 1, 1, 1, IMPLVEC)
 
 DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1db4e22365..1c9f4b0cb3 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2188,6 +2188,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_st_vec:
 tcg_out_st(s, type, a0, a1, a2);
 break;
+case INDEX_op_dupm_vec:
+tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+break;
 case INDEX_op_add_vec:
 tcg_out_insn(s, 3616, ADD, is_q, vece, a0, a1, a2);
 break;
@@ -2520,6 +2523,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 return _w;
 case INDEX_op_ld_vec:
 case INDEX_op_st_vec:
+case INDEX_op_dupm_vec:
 return _r;
 case INDEX_op_dup_vec:
 return _wr;
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index fcabc1bdf2..4c42a2430d 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2827,6 +2827,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_st_vec:
 tcg_out_st(s, type, a0, a1, a2);
 break;
+case INDEX_op_dupm_vec:
+tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+break;
 
 case INDEX_op_x86_shufps_vec:
 insn = OPC_SHUFPS;
@@ -3113,6 +3116,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_ld_vec:
 case INDEX_op_st_vec:
+case INDEX_op_dupm_vec:
 return _r;
 
 case INDEX_op_add_vec:
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0996ef0812..f056018713 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -390,6 +390,40 @@ static TCGType choose_vector_type(TCGOpcode op, unsigned vece, uint32_t size,
 return 0;
 }
 
+static void do_dup_store(TCGType type, uint32_t dofs, uint32_t oprsz,
+ uint32_t maxsz, TCGv_vec t_vec)
+{
+uint32_t i = 0;
+
+switch (type) {
+case TCG_TYPE_V256:
+/* Recall that ARM SVE allows vector sizes that are not a
+ * power of 2, but always a multiple of 16.  The intent is
+ * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+ */
+for (; i + 32 <= oprsz; i += 32) {
+tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
+}
+/* fallthru */
+case TCG_TYPE_V128:
+for (; i + 16 <= oprsz; i += 16) {
+tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
+}
+break;
+case TCG_TYPE_V64:
+for (; i < oprsz; i += 8) {
+

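The comment in do_dup_store() above notes that an SVE size such as 80 would be stored as 2x32 + 1x16. A hypothetical host-side sketch of that plan, mirroring the V256 loop falling through to the V128 loop; `plan_dup_stores` is not a QEMU function, and only the 16-byte-multiple case from the comment is modelled (the truncated V64 branch is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record the store widths used for oprsz bytes; returns how many stores. */
static size_t plan_dup_stores(uint32_t oprsz, int use_v256,
                              uint32_t *sizes, size_t max)
{
    uint32_t i = 0;
    size_t n = 0;

    assert(oprsz % 16 == 0);          /* SVE sizes are multiples of 16 */
    if (use_v256) {
        for (; i + 32 <= oprsz; i += 32) {   /* TCG_TYPE_V256 stores */
            assert(n < max);
            sizes[n++] = 32;
        }
    }
    for (; i + 16 <= oprsz; i += 16) {       /* fall through to V128 */
        assert(n < max);
        sizes[n++] = 16;
    }
    return n;
}
```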
[Qemu-devel] [PATCH 31/38] target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D}

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h |  24 ++--
 target/ppc/int_helper.c |   6 +-
 target/ppc/translate/vmx-impl.inc.c | 168 ++--
 3 files changed, 172 insertions(+), 26 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 638a6e99c4..5416dc55ce 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -180,18 +180,18 @@ DEF_HELPER_3(vmuloub, void, avr, avr, avr)
 DEF_HELPER_3(vmulouh, void, avr, avr, avr)
 DEF_HELPER_3(vmulouw, void, avr, avr, avr)
 DEF_HELPER_3(vmuluwm, void, avr, avr, avr)
-DEF_HELPER_3(vsrab, void, avr, avr, avr)
-DEF_HELPER_3(vsrah, void, avr, avr, avr)
-DEF_HELPER_3(vsraw, void, avr, avr, avr)
-DEF_HELPER_3(vsrad, void, avr, avr, avr)
-DEF_HELPER_3(vsrb, void, avr, avr, avr)
-DEF_HELPER_3(vsrh, void, avr, avr, avr)
-DEF_HELPER_3(vsrw, void, avr, avr, avr)
-DEF_HELPER_3(vsrd, void, avr, avr, avr)
-DEF_HELPER_3(vslb, void, avr, avr, avr)
-DEF_HELPER_3(vslh, void, avr, avr, avr)
-DEF_HELPER_3(vslw, void, avr, avr, avr)
-DEF_HELPER_3(vsld, void, avr, avr, avr)
+DEF_HELPER_4(vsrab, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrah, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsraw, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrad, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrb, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrh, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrw, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrd, void, avr, avr, avr, i32)
+DEF_HELPER_4(vslb, void, avr, avr, avr, i32)
+DEF_HELPER_4(vslh, void, avr, avr, avr, i32)
+DEF_HELPER_4(vslw, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsld, void, avr, avr, avr, i32)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 162add561e..35ec1ccdfb 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1770,7 +1770,8 @@ VSHIFT(r, 0)
 #undef VSHIFT
 
 #define VSL(suffix, element, mask)  \
-void helper_vsl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
+void helper_vsl##suffix(ppc_avr_t *r, ppc_avr_t *a, \
+ppc_avr_t *b, uint32_t desc)\
 {   \
 int i;  \
 \
@@ -1958,7 +1959,8 @@ VNEG(vnegd, s64)
 #undef VNEG
 
 #define VSR(suffix, element, mask)  \
-void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
+void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, \
+ppc_avr_t *b, uint32_t desc)\
 {   \
 int i;  \
 \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index c83e605a00..8cc2e99963 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -511,6 +511,150 @@ static void gen_vmrgow(DisasContext *ctx)
 tcg_temp_free_i64(avr);
 }
 
+static void gen_vsl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t = tcg_temp_new_vec_matching(b);
+tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
+tcg_gen_and_vec(vece, b, b, t);
+tcg_temp_free_vec(t);
+tcg_gen_shlv_vec(vece, d, a, b);
+}
+
+static void gen_vslw_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+tcg_gen_andi_i32(b, b, 31);
+tcg_gen_shl_i32(d, a, b);
+}
+
+static void gen_vsld_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+tcg_gen_andi_i64(b, b, 63);
+tcg_gen_shl_i64(d, a, b);
+}
+
+static void gen__vsl(unsigned vece, uint32_t dofs, uint32_t aofs,
+ uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode shlv_list[] = { INDEX_op_shlv_vec, 0 };
+static const GVecGen3 g[4] = {
+{ .fniv = gen_vsl_vec,
+  .fno = gen_helper_vslb,
+  .opt_opc = shlv_list,
+  .vece = MO_8 },
+{ .fniv = gen_vsl_vec,
+  .fno = gen_helper_vslh,
+  .opt_opc = shlv_list,
+  .vece = MO_16 },
+{ .fni4 = gen_vslw_i32,
+  .fniv = gen_vsl_vec,
+  .fno = gen_helper_vslw,
+  .opt_opc = shlv_list,
+  .vece = MO_32 },
+{ .fni8 = gen_vsld_i64,
+  .fniv = gen_vsl_vec,
+  .fno = gen_helper_vsld,
+  .opt_opc = shlv_list,
+  .vece = MO_64 }
+};
+tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+static void gen_vsr_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t = tcg_temp_new_vec_matching(b);
+tcg_gen_dupi_vec(vece, t, (8 

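gen_vsl_vec() and the scalar fallbacks mask each count with (8 << vece) - 1 before shifting, so the count never reaches the lane width. This keeps the AltiVec meaning (only the low log2(width) bits of the count matter) and presumably avoids relying on how a host handles out-of-range counts. A hypothetical host-side model; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The mask applied by gen_vsl_vec: 7, 15, 31 or 63 for MO_8..MO_64. */
static unsigned lane_mask(unsigned vece)
{
    assert(vece <= 3);
    return (8u << vece) - 1;
}

/* vslw: 32-bit lanes, count masked to 0..31. */
static void model_vslw(uint32_t *d, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = a[i] << (b[i] & lane_mask(2));
    }
}

/* vslb: 8-bit lanes, count masked to 0..7; the result is truncated to 8 bits. */
static void model_vslb(uint8_t *d, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = (uint8_t)(a[i] << (b[i] & lane_mask(0)));
    }
}
```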
[Qemu-devel] [PATCH 14/38] tcg/aarch64: Support vector variable shift opcodes

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  2 +-
 tcg/aarch64/tcg-target.opc.h |  2 ++
 tcg/aarch64/tcg-target.inc.c | 42 
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index ce2bb1f90b..f5640a229b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -134,7 +134,7 @@ typedef enum {
 #define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_cmp_vec  1
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
diff --git a/tcg/aarch64/tcg-target.opc.h b/tcg/aarch64/tcg-target.opc.h
index 4816a6c3d4..59e1d3f7f7 100644
--- a/tcg/aarch64/tcg-target.opc.h
+++ b/tcg/aarch64/tcg-target.opc.h
@@ -1,3 +1,5 @@
 /* Target-specific opcodes for host vector expansion.  These will be
emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
consider these to be UNSPEC with names.  */
+
+DEF(aa64_sshl_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1c9f4b0cb3..7d2a8213ec 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -538,12 +538,14 @@ typedef enum {
 I3616_CMEQ  = 0x2e208c00,
 I3616_SMAX  = 0x0e206400,
 I3616_SMIN  = 0x0e206c00,
+I3616_SSHL  = 0x0e204400,
 I3616_SQADD = 0x0e200c00,
 I3616_SQSUB = 0x0e202c00,
 I3616_UMAX  = 0x2e206400,
 I3616_UMIN  = 0x2e206c00,
 I3616_UQADD = 0x2e200c00,
 I3616_UQSUB = 0x2e202c00,
+I3616_USHL  = 0x2e204400,
 
 /* AdvSIMD two-reg misc.  */
 I3617_CMGT0 = 0x0e208800,
@@ -2254,6 +2256,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sari_vec:
 tcg_out_insn(s, 3614, SSHR, is_q, a0, a1, (16 << vece) - a2);
 break;
+case INDEX_op_shlv_vec:
+tcg_out_insn(s, 3616, USHL, is_q, vece, a0, a1, a2);
+break;
+case INDEX_op_aa64_sshl_vec:
+tcg_out_insn(s, 3616, SSHL, is_q, vece, a0, a1, a2);
+break;
 case INDEX_op_cmp_vec:
 {
 TCGCond cond = args[3];
@@ -2321,7 +2329,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_smin_vec:
 case INDEX_op_umax_vec:
 case INDEX_op_umin_vec:
+case INDEX_op_shlv_vec:
 return 1;
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
+return -1;
 case INDEX_op_mul_vec:
 return vece < MO_64;
 
@@ -2333,6 +2345,32 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
 {
+va_list va;
+TCGv_vec v0, v1, v2, t1;
+
+va_start(va, a0);
+v0 = temp_tcgv_vec(arg_temp(a0));
+v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+
+switch (opc) {
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
+/* Right shifts are negative left shifts for AArch64.  */
+t1 = tcg_temp_new_vec(type);
+tcg_gen_neg_vec(vece, t1, v2);
+opc = (opc == INDEX_op_shrv_vec
+   ? INDEX_op_shlv_vec : INDEX_op_aa64_sshl_vec);
+vec_gen_3(opc, type, vece, tcgv_vec_arg(v0),
+  tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+tcg_temp_free_vec(t1);
+break;
+
+default:
+g_assert_not_reached();
+}
+
+va_end(va);
 }
 
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
@@ -2514,6 +2552,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_smin_vec:
 case INDEX_op_umax_vec:
 case INDEX_op_umin_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
+case INDEX_op_aa64_sshl_vec:
 return _w_w;
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
-- 
2.17.1
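The expansion above relies on AArch64 USHL/SSHL taking a signed per-lane count from the low byte of the second operand: positive counts shift left, negative counts shift right (logical for USHL, arithmetic for SSHL). A hypothetical per-lane model for 32-bit lanes, including the patch's "negate, then shift left" trick for shrv/sarv. The saturation for magnitudes of 32 or more follows the Arm descriptions; none of these names exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of the low byte, computed without implementation-defined casts. */
static int low_byte_signed(uint32_t cnt)
{
    int s = (int)(cnt & 0xff);
    return s > 127 ? s - 256 : s;
}

/* USHL on one lane: left if s >= 0, logical right otherwise. */
static uint32_t model_ushl32(uint32_t x, uint32_t cnt)
{
    int s = low_byte_signed(cnt);
    if (s >= 32 || s <= -32) {
        return 0;
    }
    return s >= 0 ? x << s : x >> -s;
}

/* SSHL on one lane: like USHL, but right shifts fill with the sign. */
static uint32_t model_sshl32(uint32_t x, uint32_t cnt)
{
    int s = low_byte_signed(cnt);
    uint32_t sign = 0u - (x >> 31);          /* all-ones if negative */
    if (s >= 32) {
        return 0;
    }
    if (s <= -32) {
        return sign;
    }
    if (s >= 0) {
        return x << s;
    }
    return (x >> -s) | (sign << (32 + s));   /* s is in -31..-1 here */
}

/* The expansion in tcg_expand_vec_op: negate the count, then shift left. */
static uint32_t model_shrv32(uint32_t x, uint32_t cnt) { return model_ushl32(x, 0u - cnt); }
static uint32_t model_sarv32(uint32_t x, uint32_t cnt) { return model_sshl32(x, 0u - cnt); }
```

Negating the whole lane also negates its low byte modulo 256, so the per-byte count that the instruction reads comes out right.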




[Qemu-devel] [PATCH 29/38] tcg/i386: Support vector comparison select value

2019-04-20 Thread Richard Henderson
We already had backend support for this feature, but with a
backend-specific opcode.  Remove the old name, and reorder
the arguments to match the generic opcode.
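The backend implements the select with VPBLENDVB. Per byte, that instruction takes the second source when the top bit of the mask byte is set and the first source otherwise. Because vector compares produce all-ones or all-zeros elements, testing each byte's top bit selects whole elements. In the reordered encoding, a1 now supplies the mask (placed in the immediate's high nibble) and a2/args[3] supply the two sources. A hypothetical per-byte model:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VPBLENDVB dst, src1, src2, mask: pick src2 where the mask byte's MSB is set. */
static void model_vpblendvb(uint8_t *dst, const uint8_t *src1,
                            const uint8_t *src2, const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (mask[i] & 0x80) ? src2[i] : src1[i];
    }
}
```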

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  2 +-
 tcg/i386/tcg-target.opc.h |  1 -
 tcg/i386/tcg-target.inc.c | 13 ++---
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 683e029980..acdf96b99d 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -190,7 +190,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_cmpsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
 (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/i386/tcg-target.opc.h b/tcg/i386/tcg-target.opc.h
index e5fa88ba25..d761cca0cd 100644
--- a/tcg/i386/tcg-target.opc.h
+++ b/tcg/i386/tcg-target.opc.h
@@ -3,7 +3,6 @@
consider these to be UNSPEC with names.  */
 
 DEF(x86_shufps_vec, 1, 2, 1, IMPLVEC)
-DEF(x86_vpblendvb_vec, 1, 3, 0, IMPLVEC)
 DEF(x86_blend_vec, 1, 2, 1, IMPLVEC)
 DEF(x86_packss_vec, 1, 2, 0, IMPLVEC)
 DEF(x86_packus_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3dae0bf0c5..50296b4b2f 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2921,13 +2921,13 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out8(s, sub);
 break;
 
-case INDEX_op_x86_vpblendvb_vec:
+case INDEX_op_cmpsel_vec:
 insn = OPC_VPBLENDVB;
 if (type == TCG_TYPE_V256) {
 insn |= P_VEXL;
 }
-tcg_out_vex_modrm(s, insn, a0, a1, a2);
-tcg_out8(s, args[3] << 4);
+tcg_out_vex_modrm(s, insn, a0, a2, args[3]);
+tcg_out8(s, a1 << 4);
 break;
 
 case INDEX_op_x86_psrldq_vec:
@@ -3223,7 +3223,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_sari_vec:
 case INDEX_op_x86_psrldq_vec:
 return _x;
-case INDEX_op_x86_vpblendvb_vec:
+case INDEX_op_cmpsel_vec:
 return _x_x_x;
 
 default:
@@ -3241,6 +3241,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_or_vec:
 case INDEX_op_xor_vec:
 case INDEX_op_andc_vec:
+case INDEX_op_cmpsel_vec:
 return 1;
 case INDEX_op_cmp_vec:
 return -1;
@@ -3537,9 +3538,7 @@ static void expand_vec_minmax(TCGType type, unsigned vece,
 TCGv_vec t2;
 t2 = v1, v1 = v2, v2 = t2;
 }
-vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
-  tcgv_vec_arg(v0), tcgv_vec_arg(v1),
-  tcgv_vec_arg(v2), tcgv_vec_arg(t1));
+tcg_gen_cmpsel_vec(vece, v0, t1, v1, v2);
 tcg_temp_free_vec(t1);
 }
 
-- 
2.17.1
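For readers following the cmpsel change: VPBLENDVB chooses, byte by byte, between two sources according to the top bit of a mask byte. The sketch below is a plain-C model of that selection, plus an unsigned byte minimum built from a compare mask and a blend, in the spirit of expand_vec_minmax. The function names are hypothetical, and the model covers only the byte-level semantics, not TCG's operand order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise model of VPBLENDVB: the top bit of each mask byte picks
   src2, otherwise src1. */
static void blendvb(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                    const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (mask[i] & 0x80) ? src2[i] : src1[i];
    }
}

/* Unsigned byte minimum as a compare-and-select expansion: first a
   compare that yields all-ones/all-zeros lanes, then a mask-driven blend. */
static void umin_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t mask[64];

    assert(n <= sizeof(mask));
    for (size_t i = 0; i < n; i++) {
        mask[i] = a[i] > b[i] ? 0xff : 0x00;   /* like a GTU vector compare */
    }
    blendvb(dst, a, b, mask, n);               /* mask set -> take b */
}
```

Building min/max this way needs only a compare and a select, which is why a generic compare-and-select operation is a convenient target for this expansion.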




[Qemu-devel] [PATCH 09/38] tcg/i386: Implement tcg_out_dupm_vec

2019-04-20 Thread Richard Henderson
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 57 +--
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 0be7d8f589..fcabc1bdf2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -358,7 +358,6 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
 #define OPC_MOVD_VyEy   (0x6e | P_EXT | P_DATA16)
 #define OPC_MOVD_EyVy   (0x7e | P_EXT | P_DATA16)
-#define OPC_MOVDDUP (0x12 | P_EXT | P_SIMDF2)
 #define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16)
 #define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16)
 #define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3)
@@ -458,6 +457,10 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_UD2 (0x0b | P_EXT)
 #define OPC_VPBLENDD(0x02 | P_EXT3A | P_DATA16)
 #define OPC_VPBLENDVB   (0x4c | P_EXT3A | P_DATA16)
+#define OPC_VPINSRB (0x20 | P_EXT3A | P_DATA16)
+#define OPC_VPINSRW (0xc4 | P_EXT | P_DATA16)
+#define OPC_VBROADCASTSS (0x18 | P_EXT38 | P_DATA16)
+#define OPC_VBROADCASTSD (0x19 | P_EXT38 | P_DATA16)
 #define OPC_VPBROADCASTB (0x78 | P_EXT38 | P_DATA16)
 #define OPC_VPBROADCASTW (0x79 | P_EXT38 | P_DATA16)
 #define OPC_VPBROADCASTD (0x58 | P_EXT38 | P_DATA16)
@@ -855,16 +858,17 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 return true;
 }
 
+static const int avx2_dup_insn[4] = {
+OPC_VPBROADCASTB, OPC_VPBROADCASTW,
+OPC_VPBROADCASTD, OPC_VPBROADCASTQ,
+};
+
 static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 TCGReg r, TCGReg a)
 {
 if (have_avx2) {
-static const int dup_insn[4] = {
-OPC_VPBROADCASTB, OPC_VPBROADCASTW,
-OPC_VPBROADCASTD, OPC_VPBROADCASTQ,
-};
 int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
-tcg_out_vex_modrm(s, dup_insn[vece] + vex_l, r, 0, a);
+tcg_out_vex_modrm(s, avx2_dup_insn[vece] + vex_l, r, 0, a);
 } else {
 switch (vece) {
 case MO_8:
@@ -894,10 +898,35 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
  TCGReg r, TCGReg base, intptr_t offset)
 {
-return false;
+if (have_avx2) {
+int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
+tcg_out_vex_modrm_offset(s, avx2_dup_insn[vece] + vex_l,
+ r, 0, base, offset);
+} else {
+switch (vece) {
+case MO_64:
+tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSD, r, 0, base, offset);
+break;
+case MO_32:
+tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSS, r, 0, base, offset);
+break;
+case MO_16:
+tcg_out_vex_modrm_offset(s, OPC_VPINSRW, r, r, base, offset);
+tcg_out8(s, 0); /* imm8 */
+tcg_out_dup_vec(s, type, vece, r, r);
+break;
+case MO_8:
+tcg_out_vex_modrm_offset(s, OPC_VPINSRB, r, r, base, offset);
+tcg_out8(s, 0); /* imm8 */
+tcg_out_dup_vec(s, type, vece, r, r);
+break;
+default:
+g_assert_not_reached();
+}
+}
+return true;
 }
 
-
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg)
 {
@@ -918,16 +947,16 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 } else if (have_avx2) {
 tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret);
 } else {
-tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
+tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD, ret);
 }
 new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
-} else if (have_avx2) {
-tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret);
-new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
 } else {
-tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy, ret);
+if (have_avx2) {
+tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD + vex_l, ret);
+} else {
+tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+}
 new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
-tcg_out_dup_vec(s, type, MO_32, ret, ret);
 }
 }
 
-- 
2.17.1
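A scalar model of what a dup-from-memory produces may help when reviewing the non-AVX2 path above, where 8- and 16-bit elements are first inserted into lane 0 (VPINSRB/VPINSRW) and then broadcast. The helper name dupm_model is hypothetical, and it works on byte arrays rather than host registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: read one (1 << vece)-byte element at base + offset,
   place it in lane 0, then replicate lane 0 across a vlen-byte vector.
   Mirrors the "insert into lane 0, then broadcast" fallback. */
static void dupm_model(uint8_t *vec, size_t vlen, unsigned vece,
                       const uint8_t *base, ptrdiff_t offset)
{
    size_t esz = (size_t)1 << vece;

    assert(esz <= 8 && vlen % esz == 0);
    memcpy(vec, base + offset, esz);            /* step 1: lane 0 */
    for (size_t i = esz; i < vlen; i += esz) {  /* step 2: broadcast */
        memcpy(vec + i, vec, esz);
    }
}
```

For 32- and 64-bit elements the single-instruction broadcasts (VBROADCASTSS/VBROADCASTSD) produce the same result in one step.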




[Qemu-devel] [PATCH 15/38] tcg: Implement tcg_gen_gvec_3i()

2019-04-20 Thread Richard Henderson
From: David Hildenbrand 

Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.

Signed-off-by: David Hildenbrand 
Message-Id: <20190416185301.25344-2-da...@redhat.com>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.h |  24 
 tcg/tcg-op-gvec.c | 139 ++
 2 files changed, 163 insertions(+)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 1cd18a959a..aaeb6e5d8b 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -164,6 +164,27 @@ typedef struct {
 bool load_dest;
 } GVecGen3;
 
+typedef struct {
+/*
+ * Expand inline as a 64-bit or 32-bit integer. Only one of these will be
+ * non-NULL.
+ */
+void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t);
+void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t);
+/* Expand inline with a host vector type.  */
+void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
+/* Expand out-of-line helper w/descriptor, data in descriptor.  */
+gen_helper_gvec_3 *fno;
+/* The opcode, if any, to which this corresponds.  */
+TCGOpcode opc;
+/* The vector element size, if applicable.  */
+uint8_t vece;
+/* Prefer i64 to v64.  */
+bool prefer_i64;
+/* Load dest as a 3rd source operand.  */
+bool load_dest;
+} GVecGen3i;
+
 typedef struct {
 /* Expand inline as a 64-bit or 32-bit integer.
Only one of these will be non-NULL.  */
@@ -193,6 +214,9 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
  uint32_t maxsz, TCGv_i64 c, const GVecGen2s *);
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 uint32_t oprsz, uint32_t maxsz, const GVecGen3 *);
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+ uint32_t oprsz, uint32_t maxsz, int64_t c,
+ const GVecGen3i *);
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
 uint32_t oprsz, uint32_t maxsz, const GVecGen4 *);
 
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 5d28184045..3eb9126f50 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -659,6 +659,29 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
 tcg_temp_free_i32(t0);
 }
 
+static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+  uint32_t oprsz, int32_t c, bool load_dest,
+  void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t))
+{
+TCGv_i32 t0 = tcg_temp_new_i32();
+TCGv_i32 t1 = tcg_temp_new_i32();
+TCGv_i32 t2 = tcg_temp_new_i32();
+uint32_t i;
+
+for (i = 0; i < oprsz; i += 4) {
+tcg_gen_ld_i32(t0, cpu_env, aofs + i);
+tcg_gen_ld_i32(t1, cpu_env, bofs + i);
+if (load_dest) {
+tcg_gen_ld_i32(t2, cpu_env, dofs + i);
+}
+fni(t2, t0, t1, c);
+tcg_gen_st_i32(t2, cpu_env, dofs + i);
+}
+tcg_temp_free_i32(t0);
+tcg_temp_free_i32(t1);
+tcg_temp_free_i32(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
  uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -766,6 +789,29 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
 tcg_temp_free_i64(t0);
 }
 
+static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+  uint32_t oprsz, int64_t c, bool load_dest,
+  void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t))
+{
+TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
+TCGv_i64 t2 = tcg_temp_new_i64();
+uint32_t i;
+
+for (i = 0; i < oprsz; i += 8) {
+tcg_gen_ld_i64(t0, cpu_env, aofs + i);
+tcg_gen_ld_i64(t1, cpu_env, bofs + i);
+if (load_dest) {
+tcg_gen_ld_i64(t2, cpu_env, dofs + i);
+}
+fni(t2, t0, t1, c);
+tcg_gen_st_i64(t2, cpu_env, dofs + i);
+}
+tcg_temp_free_i64(t0);
+tcg_temp_free_i64(t1);
+tcg_temp_free_i64(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i64 elements.  */
 static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
  uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -879,6 +925,35 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
 tcg_temp_free_vec(t0);
 }
 
+/*
+ * Expand OPSZ bytes worth of three-vector operands and an immediate operand
+ * using host vectors.
+ */
+static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+  uint32_t bofs, uint32_t oprsz, uint32_t tysz,
+  TCGType type, int64_t c, bool load_dest,
+  void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec,
+

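The new tcg_gen_gvec_3i() expansion walks the operands in fixed-size chunks, combining two vector elements with one immediate. A minimal plain-C model of the expand_3i_i32 loop shows where load_dest matters. It assumes 32-bit elements, and the callbacks add_shl and acc_shr are hypothetical examples, not operations from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar callback: d is the old destination element and is
   only meaningful when load_dest is true. */
typedef uint32_t (*fni3i_fn)(uint32_t d, uint32_t a, uint32_t b, int32_t c);

/* Plain-C model of the expand_3i_i32 loop: walk oprsz bytes in 4-byte
   steps, combine a[i], b[i] and the immediate c, store into d[i]. */
static void expand_3i_model(uint8_t *d, const uint8_t *a, const uint8_t *b,
                            uint32_t oprsz, int32_t c, bool load_dest,
                            fni3i_fn fni)
{
    for (uint32_t i = 0; i < oprsz; i += 4) {
        uint32_t t0, t1, t2 = 0;

        memcpy(&t0, a + i, 4);
        memcpy(&t1, b + i, 4);
        if (load_dest) {
            memcpy(&t2, d + i, 4);      /* dest acts as a third input */
        }
        t2 = fni(t2, t0, t1, c);
        memcpy(d + i, &t2, 4);
    }
}

/* Illustrative callback that ignores the old destination. */
static uint32_t add_shl(uint32_t d, uint32_t a, uint32_t b, int32_t c)
{
    (void)d;
    return a + (b << c);
}

/* Illustrative accumulating callback that needs load_dest. */
static uint32_t acc_shr(uint32_t d, uint32_t a, uint32_t b, int32_t c)
{
    return d + ((a ^ b) >> c);
}
```

The immediate is passed unchanged to every chunk, which is what distinguishes this shape from the plain three-operand expansion.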
[Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h  |  15 
 tcg/tcg-op-gvec.h|   7 ++
 tcg/tcg-op.h |   4 ++
 accel/tcg/tcg-runtime-gvec.c | 132 +++
 tcg/tcg-op-gvec.c|  87 +++
 tcg/tcg-op-vec.c |  15 
 6 files changed, 260 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index dfe325625c..ed3ce5fd91 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -254,6 +254,21 @@ DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_shl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shl64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_shr8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shr16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shr32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shr64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_sar8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sar16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sar32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sar64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 850da32ded..1cd18a959a 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -294,6 +294,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
+void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
+   uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
+   uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
+   uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+
 void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
   uint32_t aofs, uint32_t bofs,
   uint32_t oprsz, uint32_t maxsz);
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 9fff9864f6..833c6330b5 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -986,6 +986,10 @@ void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
+void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
  TCGv_vec a, TCGv_vec b);
 
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index e2c6f24262..7b88f5590c 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -725,6 +725,138 @@ void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc)
 clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_shl8v)(void *d, void *a, void *b, uint32_t desc)
+{
+intptr_t oprsz = simd_oprsz(desc);
+intptr_t i;
+
+for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+*(uint8_t *)(d + i) = *(uint8_t *)(a + i) << *(uint8_t *)(b + i);
+}
+clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shl16v)(void *d, void *a, void *b, uint32_t desc)
+{
+intptr_t oprsz = simd_oprsz(desc);
+intptr_t i;
+
+for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+*(uint16_t *)(d + i) = *(uint16_t *)(a + i) << *(uint16_t *)(b + i);
+}
+clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shl32v)(void *d, void *a, void *b, uint32_t desc)
+{
+intptr_t oprsz = simd_oprsz(desc);
+intptr_t i;
+
+for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+*(uint32_t *)(d + i) = *(uint32_t *)(a + i) << *(uint32_t *)(b + i);
+}
+clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shl64v)(void *d, void *a, void *b, uint32_t desc)
+{
+intptr_t oprsz = simd_oprsz(desc);
+intptr_t i;
+
+for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+*(uint64_t *)(d + i) = 

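The gvec_*v helpers shift each element by the count held in the corresponding element of the second operand. The sketch below models two of them in plain C. It reduces each count modulo the element width. That is an assumption made here to keep every C shift well-defined, because a raw count of 32 or more on a 32-bit operand is undefined behaviour in C. The helpers as posted shift by the raw count. Arithmetic right shift of a negative value is implementation-defined in C, and GCC and Clang both sign-extend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Per-element variable left shift over 32-bit lanes.  Assumption: the
   count is masked to 0..31 so the C shift is always defined. */
static void shl32v_model(uint8_t *d, const uint8_t *a, const uint8_t *b,
                         size_t oprsz)
{
    for (size_t i = 0; i < oprsz; i += sizeof(uint32_t)) {
        uint32_t x, sh;

        memcpy(&x, a + i, sizeof(x));
        memcpy(&sh, b + i, sizeof(sh));
        x <<= sh & 31;
        memcpy(d + i, &x, sizeof(x));
    }
}

/* Per-element arithmetic right shift over 8-bit lanes, same masking
   convention; relies on the compiler sign-extending signed >>. */
static void sar8v_model(uint8_t *d, const uint8_t *a, const uint8_t *b,
                        size_t oprsz)
{
    for (size_t i = 0; i < oprsz; i++) {
        int8_t x = (int8_t)a[i];
        d[i] = (uint8_t)(int8_t)(x >> (b[i] & 7));
    }
}
```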
[Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support

2019-04-20 Thread Richard Henderson
PowerPC Altivec does not support direct moves between vector registers
and general registers.  So when tcg_out_mov fails, we can use the
backing memory for the temporary to perform the move.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b083faacd2..d3dcfe3dca 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3373,7 +3373,18 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
  ots->indirect_base);
 }
 if (!tcg_out_mov(s, otype, ots->reg, ts->reg)) {
-abort();
+/* Cross register class move not supported.
+   Store the source register into the destination slot
+   and leave the destination temp as TEMP_VAL_MEM.  */
+assert(!ots->fixed_reg);
+if (!ts->mem_allocated) {
+temp_allocate_frame(s, ots);
+}
+tcg_out_st(s, ts->type, ts->reg,
+   ots->mem_base->reg, ots->mem_offset);
+ots->mem_coherent = 1;
+temp_free_or_dead(s, ots, -1);
+return;
 }
 }
 ots->val_type = TEMP_VAL_REG;
@@ -3475,7 +3486,11 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
 o_preferred_regs, ts->indirect_base);
 if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
-abort();
+/* Cross register class move not supported.  Sync the
+   temp back to its slot and load from there.  */
+temp_sync(s, ts, i_allocated_regs, 0, 0);
+tcg_out_ld(s, ts->type, reg,
+   ts->mem_base->reg, ts->mem_offset);
 }
 }
 new_args[i] = reg;
@@ -3634,7 +3649,11 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 if (ts->reg != reg) {
 tcg_reg_free(s, reg, allocated_regs);
 if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
-abort();
+/* Cross register class move not supported.  Sync the
+   temp back to its slot and load from there.  */
+temp_sync(s, ts, allocated_regs, 0, 0);
+tcg_out_ld(s, ts->type, reg,
+   ts->mem_base->reg, ts->mem_offset);
 }
 }
 } else {
-- 
2.17.1
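The fallback described in the commit message can be pictured with a toy model. When a direct move between register classes is impossible, the value goes through the temp's memory slot as a store followed by a load. All names below are hypothetical. In tcg_reg_alloc_mov the patch stops after the store and leaves the destination as TEMP_VAL_MEM. The op and call paths also reload the value, and that is the variant modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model: two register classes, and a direct move that fails across
   classes, as on a host without GPR<->vector move instructions. */
enum reg_class { RC_GPR, RC_VEC };

struct toy_reg {
    enum reg_class cls;
    uint64_t val;
};

static bool try_direct_mov(struct toy_reg *dst, const struct toy_reg *src)
{
    if (dst->cls != src->cls) {
        return false;
    }
    dst->val = src->val;
    return true;
}

/* Move with a fallback through memory; returns the number of host
   instructions the move would take in this model. */
static int mov_with_fallback(struct toy_reg *dst, const struct toy_reg *src,
                             uint8_t *slot)
{
    if (try_direct_mov(dst, src)) {
        return 1;                                 /* register-to-register */
    }
    memcpy(slot, &src->val, sizeof(src->val));    /* like tcg_out_st */
    memcpy(&dst->val, slot, sizeof(dst->val));    /* like tcg_out_ld */
    return 2;                                     /* store + load */
}
```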




[Qemu-devel] [PATCH 10/38] tcg/aarch64: Implement tcg_out_dupm_vec

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 38 ++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 3f95930e88..1db4e22365 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -381,6 +381,9 @@ typedef enum {
 I3207_BLR   = 0xd63f0000,
 I3207_RET   = 0xd65f0000,
 
+/* AdvSIMD load/store single structure.  */
+I3303_LD1R  = 0x0d40c000,
+
 /* Load literal for loading the address at pc-relative offset */
 I3305_LDR   = 0x58000000,
 I3305_LDR_v64   = 0x5c000000,
@@ -414,6 +417,8 @@ typedef enum {
 I3312_LDRVQ = 0x3c000000 | 3 << 22 | 0 << 30,
 I3312_STRVQ = 0x3c000000 | 2 << 22 | 0 << 30,
 
+
+
 I3312_TO_I3310  = 0x00200800,
 I3312_TO_I3313  = 0x01000000,
 
@@ -566,7 +571,14 @@ static inline uint32_t tcg_in32(TCGContext *s)
 #define tcg_out_insn(S, FMT, OP, ...) \
 glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
 
-static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn, int imm19, TCGReg rt)
+static void tcg_out_insn_3303(TCGContext *s, AArch64Insn insn, bool q,
+  TCGReg rt, TCGReg rn, unsigned size)
+{
+tcg_out32(s, insn | (rt & 0x1f) | (rn << 5) | (size << 10) | (q << 30));
+}
+
+static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn,
+  int imm19, TCGReg rt)
 {
 tcg_out32(s, insn | (imm19 & 0x7ffff) << 5 | rt);
 }
@@ -825,7 +837,29 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
  TCGReg r, TCGReg base, intptr_t offset)
 {
-return false;
+if (offset != 0) {
+AArch64Insn add_insn = I3401_ADDI;
+TCGReg temp = TCG_REG_TMP;
+
+if (offset < 0) {
+add_insn = I3401_SUBI;
+offset = -offset;
+}
+if (offset <= 0xfff) {
+tcg_out_insn_3401(s, add_insn, 1, temp, base, offset);
+} else if (offset <= 0xffffff) {
+tcg_out_insn_3401(s, add_insn, 1, temp, base, offset & 0xfff000);
+if (offset & 0xfff) {
+tcg_out_insn_3401(s, add_insn, 1, temp, base, offset & 0xfff);
+}
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, temp, offset);
+tcg_out_insn(s, 3502, ADD, 1, temp, temp, base);
+}
+base = temp;
+}
+tcg_out_insn(s, 3303, LD1R, type == TCG_TYPE_V128, r, base, vece);
+return true;
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
-- 
2.17.1
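The offset handling in tcg_out_dupm_vec relies on AArch64 ADD/SUB (immediate) accepting a 12-bit value, optionally shifted left by 12. Offsets up to 0xffffff in magnitude therefore need at most two instructions. The hypothetical planner below is not QEMU code. It records which immediates such a decomposition would use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical planner (illustrative names): how to form base + offset in
   a scratch register with ADD/SUB (immediate), each step taking a 12-bit
   value optionally shifted left by 12. */
struct addr_plan {
    int uses_movi;   /* offset too wide: materialize it, then ADD (reg) */
    int is_sub;      /* negative offset: SUB the magnitude instead      */
    int n_imm;       /* number of ADD/SUB (immediate) steps, 0..2       */
    int64_t imm[2];  /* immediates, high part first                     */
};

static struct addr_plan plan_offset(int64_t offset)
{
    struct addr_plan p = { 0, 0, 0, { 0, 0 } };

    if (offset < -0xffffff || offset > 0xffffff) {
        p.uses_movi = 1;
        return p;
    }
    if (offset < 0) {
        p.is_sub = 1;
        offset = -offset;
    }
    if (offset & 0xfff000) {
        p.imm[p.n_imm++] = offset & 0xfff000;   /* tmp = base +/- hi */
    }
    if (offset & 0xfff) {
        p.imm[p.n_imm++] = offset & 0xfff;      /* tmp = tmp +/- lo  */
    }
    return p;
}
```

When both immediates are needed, the second instruction has to read the scratch register written by the first so that the two parts accumulate. A zero offset needs no instruction and the load can use the base register directly.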




[Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.h |   7 ++
 tcg/tcg-op.h  |   4 +
 tcg/tcg-op-gvec.c | 210 ++
 tcg/tcg-op-vec.c  |  54 
 4 files changed, 275 insertions(+)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index a0e0902f6c..f9c6058e92 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -318,6 +318,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
+void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
+   TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
+   TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
+   TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+
 void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 833c6330b5..472b73cb38 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -986,6 +986,10 @@ void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
+void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 40858a83e0..4eb0747ddd 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2617,6 +2617,216 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
 }
 }
 
+/*
+ * Specialized generation vector shifts by a non-constant scalar.
+ */
+
+static void expand_2sh_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+   uint32_t oprsz, uint32_t tysz, TCGType type,
+   TCGv_i32 shift,
+   void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_i32))
+{
+TCGv_vec t0 = tcg_temp_new_vec(type);
+uint32_t i;
+
+for (i = 0; i < oprsz; i += tysz) {
+tcg_gen_ld_vec(t0, cpu_env, aofs + i);
+fni(vece, t0, t0, shift);
+tcg_gen_st_vec(t0, cpu_env, dofs + i);
+}
+tcg_temp_free_vec(t0);
+}
+
+static void do_shifts(unsigned vece, uint32_t dofs, uint32_t aofs,
+  TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz,
+  void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32),
+  void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64),
+  void (*fniv_s)(unsigned, TCGv_vec, TCGv_vec, TCGv_i32),
+  void (*fniv_v)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec),
+  gen_helper_gvec_2 *fno,
+  const TCGOpcode *s_list, const TCGOpcode *v_list)
+{
+TCGType type;
+uint32_t some;
+
+check_size_align(oprsz, maxsz, dofs | aofs);
+check_overlap_2(dofs, aofs, maxsz);
+
+/* If the backend has a scalar expansion, great.  */
+type = choose_vector_type(s_list, vece, oprsz, vece == MO_64);
+if (type) {
+const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+switch (type) {
+case TCG_TYPE_V256:
+some = QEMU_ALIGN_DOWN(oprsz, 32);
+expand_2sh_vec(vece, dofs, aofs, some, 32,
+   TCG_TYPE_V256, shift, fniv_s);
+if (some == oprsz) {
+break;
+}
+dofs += some;
+aofs += some;
+oprsz -= some;
+maxsz -= some;
+/* fallthru */
+case TCG_TYPE_V128:
+expand_2sh_vec(vece, dofs, aofs, oprsz, 16,
+   TCG_TYPE_V128, shift, fniv_s);
+break;
+case TCG_TYPE_V64:
+expand_2sh_vec(vece, dofs, aofs, oprsz, 8,
+   TCG_TYPE_V64, shift, fniv_s);
+break;
+default:
+g_assert_not_reached();
+}
+tcg_swap_vecop_list(hold_list);
+goto clear_tail;
+}
+
+/* If the backend supports variable vector shifts, also cool.  */
+type = choose_vector_type(v_list, vece, oprsz, vece == MO_64);
+if (type) {
+const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+TCGv_vec v_shift = tcg_temp_new_vec(type);
+
+  

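do_shifts first tries a native vector shift-by-scalar. If that is unavailable, it tries a per-element variable shift. The remaining fallback is cut off in this archive. The second route works because shifting every lane by one scalar gives the same result as a variable shift whose count vector holds that scalar in every lane. The plain-C sketch below shows both ideas. The capability flags and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend capability flags. */
struct backend_caps {
    bool has_shls_vec;   /* vector shift by scalar count */
    bool has_shlv_vec;   /* vector shift by per-lane counts */
};

enum shift_strategy { USE_SCALAR_VEC, USE_VARIABLE_VEC, USE_FALLBACK };

/* Order of preference mirroring the comments in do_shifts. */
static enum shift_strategy pick_shift(struct backend_caps caps)
{
    if (caps.has_shls_vec) {
        return USE_SCALAR_VEC;
    }
    if (caps.has_shlv_vec) {
        return USE_VARIABLE_VEC;
    }
    return USE_FALLBACK;
}

/* Per-lane variable shift on 16-bit lanes (count masked to 0..15). */
static void shlv16(uint16_t *d, const uint16_t *a, const uint16_t *cnt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = (uint16_t)(a[i] << (cnt[i] & 15));
    }
}

/* Shift by scalar, expressed as broadcast + variable shift. */
static void shls16_via_shlv(uint16_t *d, const uint16_t *a, uint32_t sh, size_t n)
{
    uint16_t cnt[32];

    assert(n <= 32);
    for (size_t i = 0; i < n; i++) {
        cnt[i] = (uint16_t)sh;          /* broadcast the scalar count */
    }
    shlv16(d, a, cnt, n);
}
```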
[Qemu-devel] [PATCH 08/38] tcg: Add tcg_out_dupm_vec to the backend interface

2019-04-20 Thread Richard Henderson
Currently stubbed out in all backends that support vectors.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c |  6 ++
 tcg/i386/tcg-target.inc.c|  7 +++
 tcg/tcg.c| 19 ++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index b272822969..3f95930e88 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -822,6 +822,12 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 return true;
 }
 
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg r, TCGReg base, intptr_t offset)
+{
+return false;
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
  tcg_target_long value)
 {
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 49691c4f56..0be7d8f589 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -891,6 +891,13 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 return true;
 }
 
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg r, TCGReg base, intptr_t offset)
+{
+return false;
+}
+
+
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 55498b63d7..1c34c08791 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -110,6 +110,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 #if TCG_TARGET_MAYBE_vec
 static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 TCGReg dst, TCGReg src);
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg base, intptr_t offset);
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
  TCGReg dst, tcg_target_long arg);
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
@@ -121,6 +123,11 @@ static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 {
 g_assert_not_reached();
 }
+static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg dst, TCGReg base, intptr_t offset)
+{
+g_assert_not_reached();
+}
 static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 TCGReg dst, tcg_target_long arg)
 {
@@ -3416,6 +3423,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
 TCGRegSet dup_out_regs, dup_in_regs;
 TCGTemp *its, *ots;
 TCGType itype, vtype;
+intptr_t endian_fixup;
 unsigned vece;
 bool ok;
 
@@ -3485,7 +3493,16 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
 /* fall through */
 
 case TEMP_VAL_MEM:
-/* TODO: dup from memory */
+#ifdef HOST_WORDS_BIGENDIAN
+endian_fixup = itype == TCG_TYPE_I32 ? 4 : 8;
+endian_fixup -= 1 << vece;
+#else
+endian_fixup = 0;
+#endif
+if (tcg_out_dupm_vec(s, vtype, vece, ots->reg, its->mem_base->reg,
+ its->mem_offset + endian_fixup)) {
+goto done;
+}
 tcg_out_ld(s, itype, ots->reg, its->mem_base->reg, its->mem_offset);
 break;
 
-- 
2.17.1
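The endian_fixup arithmetic locates the least-significant element of a 4- or 8-byte slot. On a big-endian host that element sits at slot_size - (1 << vece), and on a little-endian host it sits at offset 0. The sketch below checks this on whatever host runs it. It probes endianness at run time instead of using HOST_WORDS_BIGENDIAN, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Run-time endianness probe (stands in for HOST_WORDS_BIGENDIAN). */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Byte offset of the least-significant (1 << vece)-byte element inside
   a slot of slot_size bytes. */
static size_t endian_fixup(size_t slot_size, unsigned vece)
{
    return host_is_big_endian() ? slot_size - ((size_t)1 << vece) : 0;
}

/* Read the low 16 bits of a 64-bit value straight from its memory slot. */
static uint16_t low16_from_slot(uint64_t v)
{
    uint8_t slot[8];
    uint16_t r;

    memcpy(slot, &v, sizeof(v));
    memcpy(&r, slot + endian_fixup(sizeof(slot), 1), sizeof(r));
    return r;
}
```

Applying this correction before the dup-from-memory lets the broadcast read the element that a register-based dup would have used.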




[Qemu-devel] [PATCH 06/38] tcg: Promote tcg_out_{dup, dupi}_vec to backend interface

2019-04-20 Thread Richard Henderson
The i386 backend already has these functions, and the aarch64
backend could easily split out one.  Nothing is done with these
functions yet, but this will aid register allocation of
INDEX_op_dup_vec in a later patch.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 12 ++--
 tcg/i386/tcg-target.inc.c|  3 ++-
 tcg/tcg.c| 14 ++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index b2d3f9c0a5..116ebd8c1a 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -799,7 +799,7 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
- TCGReg rd, uint64_t v64)
+ TCGReg rd, tcg_target_long v64)
 {
 int op, cmode, imm8;
 
@@ -814,6 +814,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 }
 }
 
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg rd, TCGReg rs)
+{
+int is_q = type - TCG_TYPE_V64;
+tcg_out_insn(s, 3605, DUP, is_q, rd, rs, 1 << vece, 0);
+return true;
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
  tcg_target_long value)
 {
@@ -2197,7 +2205,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a1);
 break;
 case INDEX_op_dup_vec:
-tcg_out_insn(s, 3605, DUP, is_q, a0, a1, 1 << vece, 0);
+tcg_out_dup_vec(s, type, vece, a0, a1);
 break;
 case INDEX_op_shli_vec:
 tcg_out_insn(s, 3614, SHL, is_q, a0, a1, a2 + (8 << vece));
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 817a167767..04e3d37b05 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -855,7 +855,7 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 return true;
 }
 
-static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 TCGReg r, TCGReg a)
 {
 if (have_avx2) {
@@ -888,6 +888,7 @@ static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 g_assert_not_reached();
 }
 }
+return true;
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
diff --git a/tcg/tcg.c b/tcg/tcg.c
index d3dcfe3dca..5ed9c7bee5 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -108,10 +108,24 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
const int *const_args);
 #if TCG_TARGET_MAYBE_vec
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg dst, TCGReg src);
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
+ TCGReg dst, tcg_target_long arg);
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
unsigned vece, const TCGArg *args,
const int *const_args);
 #else
+static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+   TCGReg dst, TCGReg src)
+{
+g_assert_not_reached();
+}
+static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
+TCGReg dst, tcg_target_long arg)
+{
+g_assert_not_reached();
+}
 static inline void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
   unsigned vece, const TCGArg *args,
   const int *const_args)
-- 
2.17.1
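For reference, a dup replicates one element across every lane. Within a single 64-bit lane, the scalar equivalent can be written as a multiply by a repeating 0x01 pattern. QEMU has a helper along these lines (dup_const()). The standalone replicate64 below is only an illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low (8 << vece) bits of x across a 64-bit value: the
   scalar equivalent of a DUP/broadcast into one 64-bit lane. */
static uint64_t replicate64(unsigned vece, uint64_t x)
{
    switch (vece) {
    case 0:
        return 0x0101010101010101ull * (uint8_t)x;
    case 1:
        return 0x0001000100010001ull * (uint16_t)x;
    case 2:
        return 0x0000000100000001ull * (uint32_t)x;
    case 3:
        return x;
    default:
        assert(0);
        return 0;
    }
}
```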




[Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only

2019-04-20 Thread Richard Henderson
The only fixed_reg is cpu_env, and it should not be modified
during any TB.  Therefore code that tries to special-case moves
into a fixed_reg is dead.  Remove it.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 85 +--
 1 file changed, 38 insertions(+), 47 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ade6050982..4f77a957b0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3279,11 +3279,8 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
   tcg_target_ulong val, TCGLifeData arg_life,
   TCGRegSet preferred_regs)
 {
-if (ots->fixed_reg) {
-/* For fixed registers, we do not do any constant propagation.  */
-tcg_out_movi(s, ots->type, ots->reg, val);
-return;
-}
+/* ENV should not be modified.  */
+tcg_debug_assert(!ots->fixed_reg);
 
 /* The movi is not explicitly generated here.  */
 if (ots->val_type == TEMP_VAL_REG) {
@@ -3319,6 +3316,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
 ots = arg_temp(op->args[0]);
 ts = arg_temp(op->args[1]);
 
+/* ENV should not be modified.  */
+tcg_debug_assert(!ots->fixed_reg);
+
 /* Note that otype != itype for no-op truncation.  */
 otype = ots->type;
 itype = ts->type;
@@ -3343,7 +3343,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
 }
 
 tcg_debug_assert(ts->val_type == TEMP_VAL_REG);
-if (IS_DEAD_ARG(0) && !ots->fixed_reg) {
+if (IS_DEAD_ARG(0)) {
 /* mov to a non-saved dead register makes no sense (even with
liveness analysis disabled). */
 tcg_debug_assert(NEED_SYNC_ARG(0));
@@ -3356,7 +3356,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
 }
 temp_dead(s, ots);
 } else {
-if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
+if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
 /* the mov can be suppressed */
 if (ots->val_type == TEMP_VAL_REG) {
 s->reg_to_temp[ots->reg] = NULL;
@@ -3509,6 +3509,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 arg = op->args[i];
 arg_ct = >args_ct[i];
 ts = arg_temp(arg);
+
+/* ENV should not be modified.  */
+tcg_debug_assert(!ts->fixed_reg);
+
 if ((arg_ct->ct & TCG_CT_ALIAS)
 && !const_args[arg_ct->alias_index]) {
 reg = new_args[arg_ct->alias_index];
@@ -3517,29 +3521,19 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 i_allocated_regs | o_allocated_regs,
 op->output_pref[k], ts->indirect_base);
 } else {
-/* if fixed register, we try to use it */
-reg = ts->reg;
-if (ts->fixed_reg &&
-tcg_regset_test_reg(arg_ct->u.regs, reg)) {
-goto oarg_end;
-}
 reg = tcg_reg_alloc(s, arg_ct->u.regs, o_allocated_regs,
 op->output_pref[k], ts->indirect_base);
 }
 tcg_regset_set_reg(o_allocated_regs, reg);
-/* if a fixed register is used, then a move will be done afterwards */
-if (!ts->fixed_reg) {
-if (ts->val_type == TEMP_VAL_REG) {
-s->reg_to_temp[ts->reg] = NULL;
-}
-ts->val_type = TEMP_VAL_REG;
-ts->reg = reg;
-/* temp value is modified, so the value kept in memory is
-   potentially not the same */
-ts->mem_coherent = 0;
-s->reg_to_temp[reg] = ts;
+if (ts->val_type == TEMP_VAL_REG) {
+s->reg_to_temp[ts->reg] = NULL;
 }
-oarg_end:
+ts->val_type = TEMP_VAL_REG;
+ts->reg = reg;
+/* temp value is modified, so the value kept in memory is
+   potentially not the same */
+ts->mem_coherent = 0;
+s->reg_to_temp[reg] = ts;
 new_args[i] = reg;
 }
 }
@@ -3555,10 +3549,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     /* move the outputs in the correct register if needed */
     for(i = 0; i < nb_oargs; i++) {
         ts = arg_temp(op->args[i]);
-        reg = new_args[i];
-        if (ts->fixed_reg && ts->reg != reg) {
-            tcg_out_mov(s, ts->type, ts->reg, reg);
-        }
+
+        /* ENV should not be modified.  */
+        tcg_debug_assert(!ts->fixed_reg);
+
         if (NEED_SYNC_ARG(i)) {
             temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
         } else if (IS_DEAD_ARG(i)) {
@@ -3679,26 +3673,23 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     for(i = 0; i < nb_oargs; i++) {
 

[Qemu-devel] [PATCH 07/38] tcg: Manually expand INDEX_op_dup_vec

2019-04-20 Thread Richard Henderson
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c |   9 ++--
 tcg/i386/tcg-target.inc.c    |   8 ++-
 tcg/tcg.c                    | 102 +++
 3 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 116ebd8c1a..b272822969 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2104,10 +2104,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_mov_vec:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_movi_i64:
-    case INDEX_op_dupi_vec:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
@@ -2204,9 +2202,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_not_vec:
         tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a1);
         break;
-    case INDEX_op_dup_vec:
-        tcg_out_dup_vec(s, type, vece, a0, a1);
-        break;
     case INDEX_op_shli_vec:
         tcg_out_insn(s, 3614, SHL, is_q, a0, a1, a2 + (8 << vece));
         break;
@@ -2250,6 +2245,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
             }
         }
         break;
+
+    case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
+    case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 04e3d37b05..49691c4f56 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2601,10 +2601,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_mov_vec:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_movi_i64:
-    case INDEX_op_dupi_vec:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -2793,9 +2791,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
-    case INDEX_op_dup_vec:
-        tcg_out_dup_vec(s, type, vece, a0, a1);
-        break;
 
     case INDEX_op_x86_shufps_vec:
         insn = OPC_SHUFPS;
@@ -2837,6 +2832,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out8(s, a2);
         break;
 
+    case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
+    case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5ed9c7bee5..55498b63d7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3410,6 +3410,105 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     }
 }
 
+static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
+{
+    const TCGLifeData arg_life = op->life;
+    TCGRegSet dup_out_regs, dup_in_regs;
+    TCGTemp *its, *ots;
+    TCGType itype, vtype;
+    unsigned vece;
+    bool ok;
+
+    ots = arg_temp(op->args[0]);
+    its = arg_temp(op->args[1]);
+
+    /* There should be no fixed vector registers.  */
+    tcg_debug_assert(!ots->fixed_reg);
+
+    itype = its->type;
+    vece = TCGOP_VECE(op);
+    vtype = TCGOP_VECL(op) + TCG_TYPE_V64;
+
+    if (its->val_type == TEMP_VAL_CONST) {
+        /* Propagate constant via movi -> dupi.  */
+        tcg_target_ulong val = its->val;
+        if (IS_DEAD_ARG(1)) {
+            temp_dead(s, its);
+        }
+        tcg_reg_alloc_do_movi(s, ots, val, arg_life, op->output_pref[0]);
+        return;
+    }
+
+    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].u.regs;
+    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].u.regs;
+
+    /* Allocate the output register now.  */
+    if (ots->val_type != TEMP_VAL_REG) {
+        TCGRegSet allocated_regs = s->reserved_regs;
+
+        if (!IS_DEAD_ARG(1) && its->val_type == TEMP_VAL_REG) {
+            /* Make sure to not spill the input register. */
+            tcg_regset_set_reg(allocated_regs, its->reg);
+        }
+        ots->reg = tcg_reg_alloc(s, dup_out_regs, allocated_regs,
+                                 op->output_pref[0], ots->indirect_base);
+        ots->val_type = TEMP_VAL_REG;
+        ots->mem_coherent = 0;
+        s->reg_to_temp[ots->reg] = ots;
+    }
+
+    switch (its->val_type) {
+    case TEMP_VAL_REG:
+        /*
+         * The dup constraints must be broad, covering all possible
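The hunk is cut off here, but the commit message and the code above already show the shape of the logic: the allocator branches on where the source temp currently lives. As a hedged illustration only, the standalone C sketch below models that dispatch. The enum, struct and function names are hypothetical, not QEMU's. The memory case (load, then broadcast) is an assumption, since the corresponding code is not shown. `replicate_const()` only shows what broadcasting a constant element yields, similar in spirit to TCG's `dup_const()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for where the dup source currently lives. */
typedef enum { LOC_CONST, LOC_REG, LOC_MEM } SrcLoc;

/* Hypothetical names for what the allocator would emit in each case. */
typedef enum {
    EMIT_DUPI,          /* broadcast a translation-time constant */
    EMIT_DUP_FROM_REG,  /* broadcast straight from a register */
    EMIT_LOAD_THEN_DUP  /* assumed: load into a register, then broadcast */
} DupStrategy;

typedef struct {
    SrcLoc loc;
    uint64_t const_val;  /* only meaningful when loc == LOC_CONST */
} SrcTemp;

static DupStrategy choose_dup_strategy(const SrcTemp *src)
{
    switch (src->loc) {
    case LOC_CONST:
        return EMIT_DUPI;
    case LOC_REG:
        return EMIT_DUP_FROM_REG;
    case LOC_MEM:
        return EMIT_LOAD_THEN_DUP;
    }
    assert(!"unknown source location");
    return EMIT_DUPI;
}

/* Replicate the low (8 << vece) bits of c across 64 bits, which is what
   a dupi of a constant element produces. */
static uint64_t replicate_const(unsigned vece, uint64_t c)
{
    switch (vece) {
    case 0:
        return 0x0101010101010101ull * (uint8_t)c;
    case 1:
        return 0x0001000100010001ull * (uint16_t)c;
    case 2:
        return 0x0000000100000001ull * (uint32_t)c;
    default:
        return c;
    }
}
```

For example, a constant source of 0xAB with 8-bit elements becomes the 64-bit pattern 0xABABABABABABABAB. The multiplication cannot carry between elements because each element value is smaller than the element's place value.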

[Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov

2019-04-20 Thread Richard Henderson
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Reviewed-by: David Gibson 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c |  5 +++--
 tcg/arm/tcg-target.inc.c     |  7 +--
 tcg/i386/tcg-target.inc.c    |  5 +++--
 tcg/mips/tcg-target.inc.c    |  3 ++-
 tcg/ppc/tcg-target.inc.c     |  3 ++-
 tcg/riscv/tcg-target.inc.c   |  5 +++--
 tcg/s390/tcg-target.inc.c    |  3 ++-
 tcg/sparc/tcg-target.inc.c   |  3 ++-
 tcg/tcg.c                    | 14 ++
 tcg/tci/tcg-target.inc.c     |  3 ++-
 10 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 8b93598bce..b2d3f9c0a5 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -938,10 +938,10 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
     tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
 }
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     if (ret == arg) {
-        return;
+        return true;
     }
     switch (type) {
     case TCG_TYPE_I32:
@@ -970,6 +970,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     default:
         g_assert_not_reached();
     }
+    return true;
 }
 
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6873b0cf95..34e6652142 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2275,10 +2275,13 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
     return false;
 }
 
-static inline void tcg_out_mov(TCGContext *s, TCGType type,
+static inline bool tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
-    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
+    if (ret != arg) {
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
+    }
+    return true;
 }
 
 static inline void tcg_out_movi(TCGContext *s, TCGType type,
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 1fa833840e..817a167767 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -809,12 +809,12 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
     tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
 }
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     int rexw = 0;
 
     if (arg == ret) {
-        return;
+        return true;
     }
     switch (type) {
     case TCG_TYPE_I64:
@@ -852,6 +852,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     default:
         g_assert_not_reached();
     }
+    return true;
 }
 
 static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 8a92e916dd..f31ebb43bf 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -558,13 +558,14 @@ static inline void tcg_out_dsra(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
     tcg_out_opc_sa64(s, OPC_DSRA, OPC_DSRA32, rd, rt, sa);
 }
 
-static inline void tcg_out_mov(TCGContext *s, TCGType type,
+static inline bool tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
     /* Simple reg-reg move, optimising out the 'do nothing' case */
     if (ret != arg) {
         tcg_out_opc_reg(s, OPC_OR, ret, arg, TCG_REG_ZERO);
     }
+    return true;
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 773690f1d9..ec8e336be8 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -566,12 +566,13 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
                              TCGReg base, tcg_target_long offset);
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
     if (ret != arg) {
         tcg_out32(s, OR | SAB(arg, ret, arg));
     }
+    return true;
 }
 
 static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
index b785f4acb7..e2bf1c2c6e 100644
--- a/tcg/riscv/tcg-target.inc.c
+++ b/tcg/riscv/tcg-target.inc.c
@@ -515,10 +515,10 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
  * TCG intrinsics
  */
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg 
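This commit leaves behaviour unchanged: every backend still returns true, and callers abort on false. It presumably prepares for "tcg: Support cross-class moves without instruction support" later in the series, where a move may legitimately be unavailable. The standalone C sketch below illustrates that contract. The register classes, the counter and both functions are hypothetical and are not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical register classes: general-purpose vs vector. */
typedef enum { CLASS_GPR, CLASS_VEC } RegClass;

typedef struct {
    RegClass cls;
    int num;
} ToyReg;

static int emitted;  /* number of move "instructions" emitted so far */

/*
 * Hypothetical backend hook with the new contract: true on success
 * (including the no-op case), false if no instruction can do the move.
 */
static bool toy_out_mov(ToyReg ret, ToyReg arg)
{
    if (ret.cls == arg.cls && ret.num == arg.num) {
        return true;    /* nothing to emit, still a success */
    }
    if (ret.cls != arg.cls) {
        return false;   /* assumed: this toy backend has no cross-class mov */
    }
    emitted++;          /* pretend one register-to-register mov was emitted */
    return true;
}

/* Caller that, like the callers in this patch, treats failure as fatal. */
static void toy_out_mov_or_abort(ToyReg ret, ToyReg arg)
{
    if (!toy_out_mov(ret, arg)) {
        fprintf(stderr, "unsupported move\n");
        abort();
    }
}
```

Treating "same register" as success, rather than as a separate case, matches what the backends in this patch do. A caller therefore only needs to handle a false return when no instruction can perform the move.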

[Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded

2019-04-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-vec.c | 49 
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 27f65600c3..cfb18682b1 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -226,16 +226,6 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType low_type)
     vec_gen_3(INDEX_op_st_vec, low_type, 0, ri, bi, o);
 }
 
-void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-{
-    vec_gen_op3(INDEX_op_add_vec, vece, r, a, b);
-}
-
-void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-{
-    vec_gen_op3(INDEX_op_sub_vec, vece, r, a, b);
-}
-
 void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     vec_gen_op3(INDEX_op_and_vec, 0, r, a, b);
@@ -296,11 +286,30 @@ void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
     tcg_gen_not_vec(0, r, r);
 }
 
+static bool do_op2(unsigned vece, TCGv_vec r, TCGv_vec a, TCGOpcode opc)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGArg ri = temp_arg(rt);
+    TCGArg ai = temp_arg(at);
+    TCGType type = rt->base_type;
+    int can;
+
+    tcg_debug_assert(at->base_type >= type);
+    can = tcg_can_emit_vec_op(opc, type, vece);
+    if (can > 0) {
+        vec_gen_2(opc, type, vece, ri, ai);
+    } else if (can < 0) {
+        tcg_expand_vec_op(opc, type, vece, ri, ai);
+    } else {
+        return false;
+    }
+    return true;
+}
+
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 {
-    if (TCG_TARGET_HAS_not_vec) {
-        vec_gen_op2(INDEX_op_not_vec, 0, r, a);
-    } else {
+    if (!TCG_TARGET_HAS_not_vec || !do_op2(vece, r, a, INDEX_op_not_vec)) {
         TCGv_vec t = tcg_const_ones_vec_matching(r);
         tcg_gen_xor_vec(0, r, a, t);
         tcg_temp_free_vec(t);
@@ -309,9 +318,7 @@ void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 {
-    if (TCG_TARGET_HAS_neg_vec) {
-        vec_gen_op2(INDEX_op_neg_vec, vece, r, a);
-    } else {
+    if (!TCG_TARGET_HAS_neg_vec || !do_op2(vece, r, a, INDEX_op_neg_vec)) {
         TCGv_vec t = tcg_const_zeros_vec_matching(r);
         tcg_gen_sub_vec(vece, r, t, a);
         tcg_temp_free_vec(t);
@@ -409,6 +416,16 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
     }
 }
 
+void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_add_vec);
+}
+
+void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_sub_vec);
+}
+
 void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     do_op3(vece, r, a, b, INDEX_op_mul_vec);
-- 
2.17.1
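`do_op2()` relies on the three-way result of `tcg_can_emit_vec_op()`:

- A positive result means emit the opcode directly.
- A negative result means the backend expands it.
- Zero makes `do_op2()` return false, so the caller falls back to a generic identity: NOT as XOR with all-ones, NEG as zero minus the operand.

The standalone C sketch below models this on eight 8-bit lanes packed into a `uint64_t`. The capability callbacks and lane helpers are hypothetical. The positive and negative cases are merged because both compute the same value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_NOT, OP_NEG } ToyOp;

/* Hypothetical capability query: >0 native, <0 expandable, 0 unsupported. */
typedef int (*CanEmitFn)(ToyOp op);

/* Per-lane negation of eight 8-bit lanes (the "native" operation). */
static uint64_t neg_u8x8(uint64_t a)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t lane = (uint8_t)(a >> (8 * i));
        r |= (uint64_t)(uint8_t)(0u - lane) << (8 * i);
    }
    return r;
}

/* Per-lane subtraction r = a - b on eight 8-bit lanes. */
static uint64_t sub_u8x8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x - y) << (8 * i);
    }
    return r;
}

/* Mirror of do_op2(): returns false when the caller must fall back. */
static bool toy_do_op2(CanEmitFn can_emit, ToyOp op, uint64_t a, uint64_t *r)
{
    if (can_emit(op) == 0) {
        return false;
    }
    /* Emitting (>0) and backend expansion (<0) yield the same value. */
    *r = (op == OP_NOT) ? ~a : neg_u8x8(a);
    return true;
}

static uint64_t toy_gen_not(CanEmitFn can_emit, uint64_t a)
{
    uint64_t r;
    if (!toy_do_op2(can_emit, OP_NOT, a, &r)) {
        r = a ^ ~(uint64_t)0;      /* fallback: xor with all-ones */
    }
    return r;
}

static uint64_t toy_gen_neg(CanEmitFn can_emit, uint64_t a)
{
    uint64_t r;
    if (!toy_do_op2(can_emit, OP_NEG, a, &r)) {
        r = sub_u8x8(0, a);        /* fallback: 0 - a in every lane */
    }
    return r;
}

static int can_native(ToyOp op) { (void)op; return 1; }
static int can_none(ToyOp op)   { (void)op; return 0; }
```

Returning bool from the helper keeps the fallback code in one place in each caller, instead of repeating the capability test there.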




[Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op

2019-04-20 Thread Richard Henderson
This allows us to fall back to integers if the tcg backend
does not support comparisons in the given vece.

Signed-off-by: Richard Henderson 
---
 target/arm/translate.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index d408e4d7ef..13e2dc6562 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6140,16 +6140,20 @@ static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
 
 const GVecGen3 cmtst_op[4] = {
     { .fni4 = gen_helper_neon_tst_u8,
       .fniv = gen_cmtst_vec,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_8 },
     { .fni4 = gen_helper_neon_tst_u16,
       .fniv = gen_cmtst_vec,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_16 },
     { .fni4 = gen_cmtst_i32,
       .fniv = gen_cmtst_vec,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_32 },
     { .fni8 = gen_cmtst_i64,
       .fniv = gen_cmtst_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_64 },
 };
 
-- 
2.17.1
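For reference, CMTST sets each element to all-ones when `a & b` is nonzero in that element, and to zero otherwise. With `.opc` filled in, the generic gvec expander can presumably check backend support for `INDEX_op_cmp_vec` at the given `vece`, and fall back to the integer `.fni4`/`.fni8` path when that support is missing. The C sketch below is a hypothetical illustration of the per-element semantics for 8-bit elements. It gives a lane-by-lane reference and a SWAR variant on a `uint64_t`. Neither function comes from QEMU, whose 8-bit integer path is the `gen_helper_neon_tst_u8` helper above.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: per 8-bit element, 0xff if (a & b) != 0, else 0x00. */
static uint64_t cmtst8_ref(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t t = (uint8_t)((a & b) >> (8 * i));
        if (t != 0) {
            r |= (uint64_t)0xff << (8 * i);
        }
    }
    return r;
}

/* SWAR variant: the same result without a per-lane loop. */
static uint64_t cmtst8_swar(uint64_t a, uint64_t b)
{
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7full;
    const uint64_t high = 0x8080808080808080ull;
    uint64_t x = a & b;
    /* Each byte of (x & low7) is <= 0x7f, so adding 0x7f never carries into
       the next byte; bit 7 ends up set iff the low 7 bits were nonzero. */
    uint64_t y = (x & low7) + low7;
    /* Bit 7 of each byte is now set iff the whole byte of x was nonzero. */
    uint64_t m = (y | x) & high;
    /* (m >> 7) holds 0x01 in those bytes; times 0xff gives 0xff, no carries. */
    return (m >> 7) * 0xff;
}
```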




[Qemu-devel] [PATCH 00/38] tcg vector improvements

2019-04-20 Thread Richard Henderson
Based-on: tcg-next, which at present is only tcg_gen_extract2.

The dupm patches have been on the list before, with a larger context
of supporting tcg/ppc.  The rest of the set was written to support
David's s390 vector patches.  In particular:

(1) Add vector absolute value.
(2) Add vector shift by non-constant scalar.
(3) Add vector shift by vector.
(4) Add vector select.
(5) Be more precise in handling target-specific vector expansions.
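For reference, the per-element behaviour behind items (1) to (4) can be modelled in plain C. The sketch below is hypothetical and is not the tcg/README definition. It makes three simplifications:

- It assumes 128-bit vectors of 8-bit elements.
- It masks shift counts to the element width; check tcg/README for how the real opcodes treat out-of-range counts.
- It models the comparison select for a single signed less-than condition only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLANES 16  /* assumed: a 128-bit vector of 8-bit elements */

typedef struct { uint8_t e[NLANES]; } Vec;

/* Interpret an 8-bit lane as signed without implementation-defined casts. */
static int as_s8(uint8_t u)
{
    return (u & 0x80) ? (int)u - 256 : (int)u;
}

/* (1) Absolute value; 0x80 (-128) maps to itself, as in two's complement. */
static Vec ref_abs(Vec a)
{
    Vec r;
    for (size_t i = 0; i < NLANES; i++) {
        r.e[i] = (a.e[i] & 0x80) ? (uint8_t)(0u - a.e[i]) : a.e[i];
    }
    return r;
}

/* (2) Shift every element left by the same scalar count. */
static Vec ref_shl_scalar(Vec a, unsigned s)
{
    Vec r;
    for (size_t i = 0; i < NLANES; i++) {
        r.e[i] = (uint8_t)(a.e[i] << (s & 7));       /* count masking assumed */
    }
    return r;
}

/* (3) Shift each element left by the count in the matching lane of b. */
static Vec ref_shl_vector(Vec a, Vec b)
{
    Vec r;
    for (size_t i = 0; i < NLANES; i++) {
        r.e[i] = (uint8_t)(a.e[i] << (b.e[i] & 7));  /* count masking assumed */
    }
    return r;
}

/* (4) Select: r[i] = (c1[i] < c2[i], signed) ? v3[i] : v4[i]. */
static Vec ref_select_lt(Vec c1, Vec c2, Vec v3, Vec v4)
{
    Vec r;
    for (size_t i = 0; i < NLANES; i++) {
        r.e[i] = as_s8(c1.e[i]) < as_s8(c2.e[i]) ? v3.e[i] : v4.e[i];
    }
    return r;
}
```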

And then there's a set of bugs that I encountered while working
on this across x86, aa64, and ppc hosts.  Tested primarily with
aa64 as the guest, via RISU.


r~


David Hildenbrand (1):
  tcg: Implement tcg_gen_gvec_3i()

Richard Henderson (37):
  target/arm: Fill in .opc for cmtst_op
  tcg: Assert fixed_reg is read-only
  tcg: Return bool success from tcg_out_mov
  tcg: Support cross-class moves without instruction support
  tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
  tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
  tcg: Manually expand INDEX_op_dup_vec
  tcg: Add tcg_out_dupm_vec to the backend interface
  tcg/i386: Implement tcg_out_dupm_vec
  tcg/aarch64: Implement tcg_out_dupm_vec
  tcg: Add INDEX_op_dup_mem_vec
  tcg: Add gvec expanders for variable shift
  tcg/i386: Support vector variable shift opcodes
  tcg/aarch64: Support vector variable shift opcodes
  tcg: Specify optional vector requirements with a list
  tcg: Add gvec expanders for vector shift by scalar
  tcg/i386: Support vector scalar shift opcodes
  tcg: Add support for integer absolute value
  tcg: Add support for vector absolute value
  target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs
  target/cris: Use tcg_gen_abs_tl
  target/ppc: Use tcg_gen_abs_tl
  target/s390x: Use tcg_gen_abs_i64
  target/xtensa: Use tcg_gen_abs_i32
  tcg/i386: Support vector absolute value
  tcg/aarch64: Support vector absolute value
  tcg: Add support for vector comparison select
  tcg/i386: Support vector comparison select value
  tcg/aarch64: Support vector comparison select value
  target/ppc: Use vector variable shifts for VS{L,R,RA}{B,H,W,D}
  target/arm: Vectorize USHL and SSHL
  tcg/aarch64: Do not advertise minmax for MO_64
  tcg: Do not recreate INDEX_op_neg_vec unless supported
  tcg: Introduce do_op3_nofail for vector expansion
  tcg: Expand vector minmax using cmp+cmpsel
  tcg/aarch64: Use MVNI for expansion of dupi
  tcg/aarch64: Use ORRI and BICI for vector logical operations

 accel/tcg/tcg-runtime.h             |  20 +
 target/arm/helper.h                 |  17 +-
 target/arm/translate.h              |   6 +
 target/ppc/helper.h                 |  24 +-
 tcg/aarch64/tcg-target.h            |   4 +-
 tcg/aarch64/tcg-target.opc.h        |   2 +
 tcg/i386/tcg-target.h               |   6 +-
 tcg/i386/tcg-target.opc.h           |   1 -
 tcg/tcg-op-gvec.h                   |  60 +-
 tcg/tcg-op.h                        |  16 +
 tcg/tcg-opc.h                       |   3 +
 tcg/tcg.h                           |  20 +
 accel/tcg/tcg-runtime-gvec.c        | 180 ++
 target/arm/neon_helper.c            |  38 --
 target/arm/translate-a64.c          |  59 +-
 target/arm/translate-sve.c          |   9 +-
 target/arm/translate.c              | 432 ++---
 target/arm/vec_helper.c             | 176 ++
 target/cris/translate.c             |   9 +-
 target/ppc/int_helper.c             |   6 +-
 target/ppc/translate.c              |  80 +--
 target/ppc/translate/vmx-impl.inc.c | 175 +-
 target/s390x/translate.c            |   8 +-
 target/xtensa/translate.c           |   9 +-
 tcg/aarch64/tcg-target.inc.c        | 227 ++-
 tcg/arm/tcg-target.inc.c            |   7 +-
 tcg/i386/tcg-target.inc.c           | 176 +-
 tcg/mips/tcg-target.inc.c           |   3 +-
 tcg/optimize.c                      |   8 +-
 tcg/ppc/tcg-target.inc.c            |   3 +-
 tcg/riscv/tcg-target.inc.c          |   5 +-
 tcg/s390/tcg-target.inc.c           |   3 +-
 tcg/sparc/tcg-target.inc.c          |   3 +-
 tcg/tcg-op-gvec.c                   | 917 +++-
 tcg/tcg-op-vec.c                    | 259 +++-
 tcg/tcg-op.c                        |  20 +
 tcg/tcg.c                           | 256 ++--
 tcg/tci/tcg-target.inc.c            |   3 +-
 tcg/README                          |  16 +
 39 files changed, 2699 insertions(+), 567 deletions(-)

-- 
2.17.1




[Qemu-devel] [PATCH v3] cputlb: Fix io_readx() to respect the access_type

2019-04-20 Thread Shahab Vahedi
This change adapts io_readx() to its input access_type. Currently
io_readx() treats any memory access as a read, although it has an
input argument "MMUAccessType access_type". This results in:

1) Calling tlb_fill() only with MMU_DATA_LOAD
2) Considering only entry->addr_read as the tlb_addr

Buglink: https://bugs.launchpad.net/qemu/+bug/1825359

Signed-off-by: Shahab Vahedi 
---
Changelog:
v3
  - Only handle read/fetch. There must be no write access.

v2
  - Extra space before closing parenthesis is removed

v1
  - Initial submit

 accel/tcg/cputlb.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 88cc8389e9..6d50fcc52d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -868,6 +868,9 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     bool locked = false;
     MemTxResult r;
 
+    /* Only support for reading/fetching IO */
+    assert(access_type == MMU_DATA_LOAD || access_type == MMU_INST_FETCH);
+
     if (recheck) {
         /*
          * This is a TLB_RECHECK access, where the MMU protection
@@ -878,10 +881,11 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
         CPUTLBEntry *entry;
         target_ulong tlb_addr;
 
-        tlb_fill(cpu, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
+        tlb_fill(cpu, addr, size, access_type, mmu_idx, retaddr);
 
         entry = tlb_entry(env, mmu_idx, addr);
-        tlb_addr = entry->addr_read;
+        tlb_addr = (access_type == MMU_DATA_LOAD) ?
+            entry->addr_read : entry->addr_code;
         if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
             /* RAM access */
             uintptr_t haddr = addr + entry->addend;
-- 
2.21.0
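The core of the fix is picking the TLB comparator that matches the access type, instead of always using `addr_read`. Below is a standalone C sketch of the behaviour before and after the fix. The enum mirrors the names of QEMU's `MMUAccessType`, but its numeric values here are illustrative. The struct and both functions are hypothetical reductions of `CPUTLBEntry` and of the logic in `io_readx()`.

```c
#include <assert.h>
#include <stdint.h>

/* Names follow QEMU's MMUAccessType; the numeric values are illustrative. */
typedef enum { MMU_DATA_LOAD, MMU_DATA_STORE, MMU_INST_FETCH } MMUAccessType;

/* Reduced, hypothetical model of a TLB entry's per-access comparators. */
typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
} ToyTLBEntry;

/* Before the fix: every access used the read comparator. */
static uint64_t tlb_addr_old(const ToyTLBEntry *entry, MMUAccessType access_type)
{
    (void)access_type;
    return entry->addr_read;
}

/* After the fix: loads use addr_read and instruction fetches use addr_code.
   Writes never reach io_readx(), so they are rejected outright. */
static uint64_t tlb_addr_new(const ToyTLBEntry *entry, MMUAccessType access_type)
{
    assert(access_type == MMU_DATA_LOAD || access_type == MMU_INST_FETCH);
    return access_type == MMU_DATA_LOAD ? entry->addr_read : entry->addr_code;
}
```

With an entry whose read and code comparators differ, an instruction fetch previously checked `addr_read`. After the fix it checks `addr_code`.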