[PATCH v2 11/11] docs: system: arm: Introduce bananapi_m2u

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

Add documents for Banana Pi M2U

Signed-off-by: qianfan Zhao 
---
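A note on the power-of-two requirement for SD card images described in
the document below. The following is a minimal Python sketch, not part
of the patch, for picking the size to hand to ``qemu-img resize``. The
helper names and the 122 MiB example size are hypothetical.

```python
def is_power_of_two(size_bytes: int) -> bool:
    """True when size_bytes is an exact power of two."""
    return size_bytes > 0 and size_bytes & (size_bytes - 1) == 0


def next_power_of_two(size_bytes: int) -> int:
    """Smallest power of two that is >= size_bytes (grow, never shrink)."""
    if size_bytes < 1:
        raise ValueError("size must be positive")
    return 1 << (size_bytes - 1).bit_length()


# Hypothetical example: an extracted image of 122 MiB.
current_size = 122 * 1024 * 1024
target_size = next_power_of_two(current_size)
print(f"qemu-img resize IMAGE {target_size}")
```

Rounding up rather than down matches the recommendation to only grow
the image, so no data at the end of the file is lost.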
 docs/system/arm/bananapi_m2u.rst | 138 +++
 1 file changed, 138 insertions(+)
 create mode 100644 docs/system/arm/bananapi_m2u.rst
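
The two ``dd`` commands in the "Running U-Boot" section place the
bootloader at the 8 KiB offset of a zero-filled 32 MiB image. The same
layout can be sketched in Python; this is an illustration only, and the
function names are hypothetical.

```python
from pathlib import Path

SPL_OFFSET = 8 * 1024           # the BootROM loads the bootloader from 8 KiB
IMAGE_SIZE = 32 * 1024 * 1024   # same size as the dd example (32 MiB)


def make_sd_image(bootloader: bytes, size: int = IMAGE_SIZE) -> bytes:
    """Return a zero-filled image with the bootloader copied to 8 KiB."""
    if SPL_OFFSET + len(bootloader) > size:
        raise ValueError("bootloader does not fit in the image")
    image = bytearray(size)  # zero-filled, like dd if=/dev/zero
    image[SPL_OFFSET:SPL_OFFSET + len(bootloader)] = bootloader
    return bytes(image)


def write_sd_image(bootloader_path: str, image_path: str) -> None:
    """Read the bootloader file and write the resulting SD card image."""
    data = Path(bootloader_path).read_bytes()
    Path(image_path).write_bytes(make_sd_image(data))
```

For example, ``write_sd_image("u-boot-sunxi-with-spl.bin", "sd.img")``
would produce an image equivalent to the one built with ``dd``.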

diff --git a/docs/system/arm/bananapi_m2u.rst b/docs/system/arm/bananapi_m2u.rst
new file mode 100644
index 00..ae7194a9df
--- /dev/null
+++ b/docs/system/arm/bananapi_m2u.rst
@@ -0,0 +1,138 @@
+Banana Pi BPI-M2U (``bpim2u``)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Banana Pi BPI-M2 Ultra is a quad-core mini single board computer built with
+Allwinner A40i/R40/V40 SoC. It features 2GB of RAM and 8GB eMMC. It also
+has onboard WiFi and BT. On the ports side, the BPI-M2 Ultra has 2 USB A
+2.0 ports, 1 USB OTG port, 1 HDMI port, 1 audio jack, a DC power port,
+and last but not least, a SATA port.
+
+Supported devices
+"""""""""""""""""
+
+The Banana Pi M2U machine supports the following devices:
+
+ * SMP (Quad Core Cortex-A7)
+ * Generic Interrupt Controller configuration
+ * SRAM mappings
+ * SDRAM controller
+ * Timer device (re-used from Allwinner A10)
+ * UART
+ * SD/MMC storage controller
+ * EMAC ethernet
+ * GMAC ethernet
+ * Clock Control Unit
+ * TWI (I2C)
+
+Limitations
+"""""""""""
+
+Currently, Banana Pi M2U does *not* support the following features:
+
+- Graphical output via HDMI, GPU and/or the Display Engine
+- Audio output
+- Hardware Watchdog
+- Real Time Clock
+- USB 2.0 interfaces
+
+Also see the 'unimplemented' array in the Allwinner R40 SoC module
+for a complete list of unimplemented I/O devices: ``./hw/arm/allwinner-r40.c``
+
+Boot options
+""""""""""""
+
+The Banana Pi M2U machine can start using the standard -kernel functionality
+for loading a Linux kernel or ELF executable. Additionally, the Banana Pi M2U
+machine can also emulate the BootROM which is present on an actual Allwinner R40
+based SoC, which loads the bootloader from an SD card, specified via the -sd
+argument to qemu-system-arm.
+
+Running mainline Linux
+""""""""""""""""""""""
+
+To build a Linux mainline kernel that can be booted by the Banana Pi M2U machine,
+simply configure the kernel using the sunxi_defconfig configuration:
+
+.. code-block:: bash
+
+  $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make mrproper
+  $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make sunxi_defconfig
+
+To boot the newly built Linux kernel in QEMU with the Banana Pi M2U machine, use:
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M bpim2u -nographic \
+  -kernel /path/to/linux/arch/arm/boot/zImage \
+  -append 'console=ttyS0,115200' \
+  -dtb /path/to/linux/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dtb
+
+Banana Pi M2U images
+""""""""""""""""""""
+
+Note that the mainline kernel does not have a root filesystem. You can choose
+to build your own image with buildroot using the bananapi_m2_ultra_defconfig.
+Also see https://buildroot.org for more information.
+
+Another possibility is to run an OpenWrt image for Banana Pi M2U which
+can be downloaded from:
+
+   https://downloads.openwrt.org/releases/22.03.3/targets/sunxi/cortexa7/
+
+When using an image as an SD card, its size must be a power of two. The image
+can be resized with the ``qemu-img`` command. It is recommended to only increase
+the image size instead of shrinking it to a power of two, to avoid loss of data.
+For example, to prepare a downloaded OpenWrt image, first extract it and then
+increase its size to one gigabyte as follows:
+
+.. code-block:: bash
+
+  $ qemu-img resize \
+      openwrt-22.03.3-sunxi-cortexa7-sinovoip_bananapi-m2-ultra-ext4-sdcard.img \
+      1G
+
+Instead of providing a custom Linux kernel via the -kernel option you may also
+choose to let the Banana Pi M2U machine load the bootloader from the SD card,
+just like a real board would do using the BootROM. Simply pass the selected image
+via the -sd argument and remove the -kernel, -append, -dtb and -initrd arguments:
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M bpim2u -nic user -nographic \
+      -sd openwrt-22.03.3-sunxi-cortexa7-sinovoip_bananapi-m2-ultra-ext4-sdcard.img
+
+Running U-Boot
+""""""""""""""
+
+U-Boot mainline can be built and configured using the Bananapi_M2_Ultra_defconfig
+using similar commands as described above for Linux. Note that it is recommended
+for development/testing to select the following configuration setting in U-Boot:
+
+  Device Tree Control > Provider for DTB for DT Control > Embedded DTB
+
+The BootROM of the Allwinner R40 loads U-Boot from the 8 KiB offset of the SD card.
+Let's create a bootable disk image:
+
+.. code-block:: bash
+
+  $ dd if=/dev/zero of=sd.img bs=32M count=1
+  $ dd if=u-boot-sunxi-with-spl.bin of=sd.img bs=1k seek=8 conv=notrunc
+
+And then boot it.
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M bpim2u -nographic -sd sd.img
+
+Banana Pi M2U integration tests
+"""""""""""""""""""""""""""""""
+
+The Banana Pi M2U machine has several integration tests included.
+To run the whole set of 

[PATCH v2 01/12] hw: arm: Add bananapi M2-Ultra and allwinner-r40 support

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

Allwinner R40 (sun8i) SoC features a Quad-Core Cortex-A7 ARM CPU,
and a Mali400 MP2 GPU from ARM. It's also known as the Allwinner T3
for In-Car Entertainment usage, A40i and A40pro are variants that
differ in applicable temperatures range (industrial and military).

This patch is a draft and provides very few features; these will be
improved later.

Signed-off-by: qianfan Zhao 
---
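The r40_unimplemented table added below is a list of (name, base, size)
entries covering I/O ranges that have no device model yet. As a rough
illustration of how such a table maps a guest address to a named
region, here is a small Python sketch. It is not how QEMU dispatches
accesses internally; the lookup helper is hypothetical and only a few
entries from the table are copied.

```python
from typing import NamedTuple, Optional

KiB = 1024


class Region(NamedTuple):
    name: str
    base: int
    size: int


# A few entries copied from r40_unimplemented in this patch.
REGIONS = [
    Region("dma", 0x01c02000, 4 * KiB),
    Region("nfdc", 0x01c03000, 4 * KiB),
    Region("sata", 0x01c18000, 4 * KiB),
    Region("rtc", 0x01c20400, 1 * KiB),
]


def find_region(addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if nothing covers it."""
    for region in REGIONS:
        if region.base <= addr < region.base + region.size:
            return region
    return None
```

The end of each range is exclusive, so the first address after "rtc"
(0x01c20800, where "pio" starts in the full table) is not matched by
the "rtc" entry.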
 configs/devices/arm-softmmu/default.mak |   1 +
 hw/arm/Kconfig  |   9 +
 hw/arm/allwinner-r40.c  | 418 
 hw/arm/bananapi_m2u.c   | 129 
 hw/arm/meson.build  |   1 +
 include/hw/arm/allwinner-r40.h  | 110 +++
 6 files changed, 668 insertions(+)
 create mode 100644 hw/arm/allwinner-r40.c
 create mode 100644 hw/arm/bananapi_m2u.c
 create mode 100644 include/hw/arm/allwinner-r40.h

diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
index 1b49a7830c..76a43add23 100644
--- a/configs/devices/arm-softmmu/default.mak
+++ b/configs/devices/arm-softmmu/default.mak
@@ -43,3 +43,4 @@ CONFIG_FSL_IMX6UL=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
 CONFIG_ALLWINNER_H3=y
+CONFIG_ALLWINNER_R40=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index b5aed4aff5..9e14c3427e 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -344,6 +344,15 @@ config ALLWINNER_H3
 select USB_EHCI_SYSBUS
 select SD
 
+config ALLWINNER_R40
+bool
+select ALLWINNER_A10_PIT
+select SERIAL
+select ARM_TIMER
+select ARM_GIC
+select UNIMP
+select SD
+
 config RASPI
 bool
 select FRAMEBUFFER
diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
new file mode 100644
index 00..b743d64253
--- /dev/null
+++ b/hw/arm/allwinner-r40.c
@@ -0,0 +1,418 @@
+/*
+ * Allwinner R40/A40i/T3 System on Chip emulation
+ *
+ * Copyright (C) 2023 qianfan Zhao 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/bswap.h"
+#include "qemu/module.h"
+#include "qemu/units.h"
+#include "hw/qdev-core.h"
+#include "hw/sysbus.h"
+#include "hw/char/serial.h"
+#include "hw/misc/unimp.h"
+#include "hw/usb/hcd-ehci.h"
+#include "hw/loader.h"
+#include "sysemu/sysemu.h"
+#include "hw/arm/allwinner-r40.h"
+
+/* Memory map */
+const hwaddr allwinner_r40_memmap[] = {
+    [AW_R40_DEV_SRAM_A1]    = 0x00000000,
+    [AW_R40_DEV_SRAM_A2]    = 0x00004000,
+    [AW_R40_DEV_SRAM_A3]    = 0x00008000,
+    [AW_R40_DEV_SRAM_A4]    = 0x0000b400,
+    [AW_R40_DEV_MMC0]       = 0x01c0f000,
+    [AW_R40_DEV_MMC1]       = 0x01c10000,
+    [AW_R40_DEV_MMC2]       = 0x01c11000,
+    [AW_R40_DEV_MMC3]       = 0x01c12000,
+    [AW_R40_DEV_PIT]        = 0x01c20c00,
+    [AW_R40_DEV_UART0]      = 0x01c28000,
+    [AW_R40_DEV_GIC_DIST]   = 0x01c81000,
+    [AW_R40_DEV_GIC_CPU]    = 0x01c82000,
+    [AW_R40_DEV_GIC_HYP]    = 0x01c84000,
+    [AW_R40_DEV_GIC_VCPU]   = 0x01c86000,
+    [AW_R40_DEV_SDRAM]      = 0x40000000
+};
+
+/* List of unimplemented devices */
+struct AwR40Unimplemented {
+const char *device_name;
+hwaddr base;
+hwaddr size;
+};
+
+static struct AwR40Unimplemented r40_unimplemented[] = {
+    { "d-engine",   0x01000000, 4 * MiB },
+    { "d-inter",    0x01400000, 128 * KiB },
+    { "sram-c",     0x01c00000, 4 * KiB },
+    { "dma",        0x01c02000, 4 * KiB },
+    { "nfdc",       0x01c03000, 4 * KiB },
+    { "ts",         0x01c04000, 4 * KiB },
+    { "spi0",       0x01c05000, 4 * KiB },
+    { "spi1",       0x01c06000, 4 * KiB },
+    { "cs0",        0x01c09000, 4 * KiB },
+    { "keymem",     0x01c0a000, 4 * KiB },
+    { "emac",       0x01c0b000, 4 * KiB },
+    { "usb0-otg",   0x01c13000, 4 * KiB },
+    { "usb0-host",  0x01c14000, 4 * KiB },
+    { "crypto",     0x01c15000, 4 * KiB },
+    { "spi2",       0x01c17000, 4 * KiB },
+    { "sata",       0x01c18000, 4 * KiB },
+    { "usb1-host",  0x01c19000, 4 * KiB },
+    { "sid",        0x01c1b000, 4 * KiB },
+    { "usb2-host",  0x01c1c000, 4 * KiB },
+    { "cs1",        0x01c1d000, 4 * KiB },
+    { "spi3",       0x01c1f000, 4 * KiB },
+    { "ccu",        0x01c20000, 1 * KiB },
+    { "rtc",        0x01c20400, 1 * KiB },
+    { "pio",        0x01c20800, 1 * KiB },
+{ "owa",

[PATCH v2 08/12] hw: arm: allwinner-r40: Fix the mmc controller's type

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

R40 has the SAMP_DL_REG register and its mmc2 controller has only an 8K
DMA buffer. Fix its compatible string by switching the controller type
to the sun50i-a64 variant.

Signed-off-by: qianfan Zhao 
---
 hw/arm/allwinner-r40.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index 0e4542d35f..b148c56449 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -271,7 +271,7 @@ static void allwinner_r40_init(Object *obj)
 
     for (int i = 0; i < AW_R40_NUM_MMCS; i++) {
         object_initialize_child(obj, mmc_names[i], &s->mmc[i],
-                                TYPE_AW_SDHOST_SUN5I);
+                                TYPE_AW_SDHOST_SUN50I_A64);
     }
 
     object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I);
-- 
2.25.1




[PATCH v2 05/12] hw/misc: Rename axp209 to axp22x and add support AXP221 PMU

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

This patch adds minimal support for the AXP-221 PMU and connects it to
the bananapi M2U board.

Signed-off-by: qianfan Zhao 
---
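The board code below attaches the "axp221_pmu" device to the i2c0 bus
at address 0x34. To illustrate what "a register-based device at a bus
address" means, here is a toy Python model. It assumes the common I2C
convention that the first byte written selects a register and that
later accesses auto-increment; it is not taken from axp2xx.c, and the
class names and the reset value used in the example are hypothetical.

```python
class RegisterDevice:
    """Toy I2C peripheral: the first byte written selects a register."""

    def __init__(self, reset_values):
        self.regs = bytearray(256)
        for index, value in reset_values.items():
            self.regs[index] = value
        self.pointer = 0

    def write(self, data: bytes) -> None:
        if not data:
            return
        self.pointer = data[0]              # register select
        for value in data[1:]:              # optional payload bytes
            self.regs[self.pointer] = value
            self.pointer = (self.pointer + 1) % 256

    def read(self, count: int) -> bytes:
        out = bytearray()
        for _ in range(count):
            out.append(self.regs[self.pointer])
            self.pointer = (self.pointer + 1) % 256
        return bytes(out)


class Bus:
    """Toy I2C bus mapping 7-bit addresses to devices."""

    def __init__(self):
        self.devices = {}

    def attach(self, address: int, device: RegisterDevice) -> None:
        if address in self.devices:
            raise ValueError(f"address {address:#x} already in use")
        self.devices[address] = device

    def device_at(self, address: int) -> RegisterDevice:
        return self.devices[address]


# Hypothetical usage: a PMU model at 0x34 with one made-up reset value.
bus = Bus()
bus.attach(0x34, RegisterDevice({0x03: 0x06}))
```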
 hw/arm/Kconfig|   3 +-
 hw/arm/bananapi_m2u.c |   6 +
 hw/misc/Kconfig   |   2 +-
 hw/misc/axp209.c  | 238 ---
 hw/misc/axp2xx.c  | 283 ++
 hw/misc/meson.build   |   2 +-
 hw/misc/trace-events  |   8 +-
 7 files changed, 297 insertions(+), 245 deletions(-)
 delete mode 100644 hw/misc/axp209.c
 create mode 100644 hw/misc/axp2xx.c

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 9e14c3427e..85ded354ed 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -327,7 +327,7 @@ config ALLWINNER_A10
 select ALLWINNER_A10_DRAMC
 select ALLWINNER_EMAC
 select ALLWINNER_I2C
-select AXP209_PMU
+select AXP2XX_PMU
 select SERIAL
 select UNIMP
 
@@ -347,6 +347,7 @@ config ALLWINNER_H3
 config ALLWINNER_R40
 bool
 select ALLWINNER_A10_PIT
+select AXP2XX_PMU
 select SERIAL
 select ARM_TIMER
 select ARM_GIC
diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c
index 1d49a006b5..9c5360a41b 100644
--- a/hw/arm/bananapi_m2u.c
+++ b/hw/arm/bananapi_m2u.c
@@ -23,6 +23,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "hw/boards.h"
+#include "hw/i2c/i2c.h"
 #include "hw/qdev-properties.h"
 #include "hw/arm/allwinner-r40.h"
 
@@ -61,6 +62,7 @@ static void bpim2u_init(MachineState *machine)
 {
 bool bootroom_loaded = false;
 AwR40State *r40;
+I2CBus *i2c;
 
 /* BIOS is not supported by this board */
 if (machine->firmware) {
@@ -104,6 +106,10 @@ static void bpim2u_init(MachineState *machine)
 }
 }
 
+    /* Connect AXP221 */
+    i2c = I2C_BUS(qdev_get_child_bus(DEVICE(&r40->i2c0), "i2c"));
+    i2c_slave_create_simple(i2c, "axp221_pmu", 0x34);
+
 /* SDRAM */
 memory_region_add_subregion(get_system_memory(),
 r40->memmap[AW_R40_DEV_SDRAM], machine->ram);
diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 2ef5781ef8..efeb430a6c 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -176,7 +176,7 @@ config ALLWINNER_A10_CCM
 config ALLWINNER_A10_DRAMC
 bool
 
-config AXP209_PMU
+config AXP2XX_PMU
 bool
 depends on I2C
 
diff --git a/hw/misc/axp209.c b/hw/misc/axp209.c
deleted file mode 100644
index 2908ed99a6..00
--- a/hw/misc/axp209.c
+++ /dev/null
@@ -1,238 +0,0 @@
-/*
- * AXP-209 PMU Emulation
- *
- * Copyright (C) 2022 Strahinja Jankovic 
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- *
- * SPDX-License-Identifier: MIT
- */
-
-#include "qemu/osdep.h"
-#include "qemu/log.h"
-#include "trace.h"
-#include "hw/i2c/i2c.h"
-#include "migration/vmstate.h"
-
-#define TYPE_AXP209_PMU "axp209_pmu"
-
-#define AXP209(obj) \
-OBJECT_CHECK(AXP209I2CState, (obj), TYPE_AXP209_PMU)
-
-/* registers */
-enum {
-REG_POWER_STATUS = 0x0u,
-REG_OPERATING_MODE,
-REG_OTG_VBUS_STATUS,
-REG_CHIP_VERSION,
-REG_DATA_CACHE_0,
-REG_DATA_CACHE_1,
-REG_DATA_CACHE_2,
-REG_DATA_CACHE_3,
-REG_DATA_CACHE_4,
-REG_DATA_CACHE_5,
-REG_DATA_CACHE_6,
-REG_DATA_CACHE_7,
-REG_DATA_CACHE_8,
-REG_DATA_CACHE_9,
-REG_DATA_CACHE_A,
-REG_DATA_CACHE_B,
-REG_POWER_OUTPUT_CTRL = 0x12u,
-REG_DC_DC2_OUT_V_CTRL = 0x23u,
-REG_DC_DC2_DVS_CTRL = 0x25u,
-REG_DC_DC3_OUT_V_CTRL = 0x27u,
-REG_LDO2_4_OUT_V_CTRL,
-REG_LDO3_OUT_V_CTRL,
-REG_VBUS_CH_MGMT = 0x30u,
-REG_SHUTDOWN_V_CTRL,
-REG_SHUTDOWN_CTRL,
-REG_CHARGE_CTRL_1,
-REG_CHARGE_CTRL_2,
-REG_SPARE_CHARGE_CTRL,
-REG_PEK_KEY_CTRL,
-REG_DC_DC_FREQ_SET,
-REG_CHR_TEMP_TH_SET,
-REG_CHR_HIGH_TEMP_TH_CTRL,
-REG_IPSOUT_WARN_L1,
-REG_IPSOUT_WARN_L2,
-REG_DISCHR_TEMP_TH_SET,
-REG_DISCHR_HIGH_TEMP_TH_CTRL,
-REG_IRQ_BANK_1_CTRL = 0x40u,
-REG_IRQ_BANK_2_CTRL,
-

[PATCH v2 11/12] tests: avocado: boot_linux_console: Add test case for bpim2u

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

Add test case for booting from initrd and sd card.

Signed-off-by: qianfan Zhao 
---
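The tests below pass a 40-hex-digit asset_hash to fetch_asset(), which
is consistent with SHA-1. Assuming SHA-1 is indeed the algorithm used,
a downloaded asset can be checked by hand with a sketch like the
following. The helper names are hypothetical and this is not the
avocado implementation.

```python
import hashlib


def sha1_hex(chunks) -> str:
    """SHA-1 hex digest over an iterable of byte chunks."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path: str, chunk_size: int = 1 << 20):
    """Yield the file contents in chunks to keep memory use bounded."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def asset_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA-1 with the expected hex digest."""
    return sha1_hex(file_chunks(path)) == expected_hex.lower()
```

For example, ``asset_matches(deb_path, deb_hash)`` would return True
for an intact copy of the Armbian package, under the SHA-1 assumption.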
 tests/avocado/boot_linux_console.py | 176 
 1 file changed, 176 insertions(+)

diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py
index 574609bf43..d17417828c 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -760,6 +760,182 @@ def test_arm_quanta_gsj_initrd(self):
 self.wait_for_console_pattern(
 'Give root password for system maintenance')
 
+    def test_arm_bpim2u(self):
+        """
+        :avocado: tags=arch:arm
+        :avocado: tags=machine:bpim2u
+        :avocado: tags=accel:tcg
+        """
+        deb_url = ('https://apt.armbian.com/pool/main/l/linux-5.10.16-sunxi/'
+                   'linux-image-current-sunxi_21.02.2_armhf.deb')
+        deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0'
+        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+        kernel_path = self.extract_from_deb(deb_path,
+                                            '/boot/vmlinuz-5.10.16-sunxi')
+        dtb_path = ('/usr/lib/linux-image-current-sunxi/'
+                    'sun8i-r40-bananapi-m2-ultra.dtb')
+        dtb_path = self.extract_from_deb(deb_path, dtb_path)
+
+        self.vm.set_console()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0,115200n8 '
+                               'earlycon=uart,mmio32,0x1c28000')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-dtb', dtb_path,
+                         '-append', kernel_command_line)
+        self.vm.launch()
+        console_pattern = 'Kernel command line: %s' % kernel_command_line
+        self.wait_for_console_pattern(console_pattern)
+
+    def test_arm_bpim2u_initrd(self):
+        """
+        :avocado: tags=arch:arm
+        :avocado: tags=accel:tcg
+        :avocado: tags=machine:bpim2u
+        """
+        deb_url = ('https://apt.armbian.com/pool/main/l/linux-5.10.16-sunxi/'
+                   'linux-image-current-sunxi_21.02.2_armhf.deb')
+        deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0'
+        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+        kernel_path = self.extract_from_deb(deb_path,
+                                            '/boot/vmlinuz-5.10.16-sunxi')
+        dtb_path = ('/usr/lib/linux-image-current-sunxi/'
+                    'sun8i-r40-bananapi-m2-ultra.dtb')
+        dtb_path = self.extract_from_deb(deb_path, dtb_path)
+        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
+                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
+                      'arm/rootfs-armv7a.cpio.gz')
+        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
+        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
+        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
+        archive.gzip_uncompress(initrd_path_gz, initrd_path)
+
+        self.vm.set_console()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0,115200 '
+                               'panic=-1 noreboot')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-dtb', dtb_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line,
+                         '-no-reboot')
+        self.vm.launch()
+        self.wait_for_console_pattern('Boot successful.')
+
+        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
+                                                'Allwinner sun8i Family')
+        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
+                                                'system-control@1c00000')
+        exec_command_and_wait_for_pattern(self, 'reboot',
+                                                'reboot: Restarting system')
+        # Wait for VM to shut down gracefully
+        self.vm.wait()
+
+    def test_arm_bpim2u_gmac(self):
+        """
+        :avocado: tags=arch:arm
+        :avocado: tags=accel:tcg
+        :avocado: tags=machine:bpim2u
+        :avocado: tags=device:sd
+        """
+        self.require_netdev('user')
+
+        deb_url = ('https://apt.armbian.com/pool/main/l/linux-5.10.16-sunxi/'
+                   'linux-image-current-sunxi_21.02.2_armhf.deb')
+        deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0'
+        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+        kernel_path = self.extract_from_deb(deb_path,
+                                            '/boot/vmlinuz-5.10.16-sunxi')
+        dtb_path = ('/usr/lib/linux-image-current-sunxi/'
+                    'sun8i-r40-bananapi-m2-ultra.dtb')
+        dtb_path = self.extract_from_deb(deb_path, dtb_path)
+        rootfs_url = 

[PATCH v2 06/12] hw/arm/allwinner-r40: add SDRAM controller device

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

The SDRAM controller supports DDR2/DDR3 memory with capacities of up
to 2GiB. This commit adds emulation support for the Allwinner R40
SDRAM controller.

This driver only supports 256M, 512M and 1024M of memory for now.

Signed-off-by: qianfan Zhao 
---
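The board code below converts the machine RAM size to MiB before
handing it to the controller, and the commit message limits the
supported sizes to 256M, 512M and 1024M. A small Python sketch of that
check follows; the function name is hypothetical and the patch itself
may report unsupported sizes differently.

```python
MiB = 1024 * 1024
SUPPORTED_RAM_MIB = (256, 512, 1024)  # sizes listed in the commit message


def ram_size_in_mib(ram_size_bytes: int) -> int:
    """Convert a RAM size in bytes to MiB, rejecting unsupported sizes."""
    size_mib, remainder = divmod(ram_size_bytes, MiB)
    if remainder or size_mib not in SUPPORTED_RAM_MIB:
        raise ValueError(f"unsupported RAM size: {ram_size_bytes} bytes")
    return size_mib
```

With this check, a machine started with 512 MiB of RAM passes 512 to
the controller, while 2 GiB would be rejected.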
 hw/arm/allwinner-r40.c|  21 +-
 hw/arm/bananapi_m2u.c |   7 +
 hw/misc/allwinner-r40-dramc.c | 513 ++
 hw/misc/meson.build   |   1 +
 hw/misc/trace-events  |  14 +
 include/hw/arm/allwinner-r40.h|  13 +-
 include/hw/misc/allwinner-r40-dramc.h | 108 ++
 7 files changed, 674 insertions(+), 3 deletions(-)
 create mode 100644 hw/misc/allwinner-r40-dramc.c
 create mode 100644 include/hw/misc/allwinner-r40-dramc.h

diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index 4bc582630c..0e4542d35f 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -31,6 +31,7 @@
 #include "hw/loader.h"
 #include "sysemu/sysemu.h"
 #include "hw/arm/allwinner-r40.h"
+#include "hw/misc/allwinner-r40-dramc.h"
 
 /* Memory map */
 const hwaddr allwinner_r40_memmap[] = {
@@ -53,6 +54,9 @@ const hwaddr allwinner_r40_memmap[] = {
 [AW_R40_DEV_UART6]  = 0x01c29800,
 [AW_R40_DEV_UART7]  = 0x01c29c00,
 [AW_R40_DEV_TWI0]   = 0x01c2ac00,
+[AW_R40_DEV_DRAMCOM]= 0x01c62000,
+[AW_R40_DEV_DRAMCTL]= 0x01c63000,
+[AW_R40_DEV_DRAMPHY]= 0x01c65000,
 [AW_R40_DEV_GIC_DIST]   = 0x01c81000,
 [AW_R40_DEV_GIC_CPU]= 0x01c82000,
 [AW_R40_DEV_GIC_HYP]= 0x01c84000,
@@ -129,8 +133,6 @@ static struct AwR40Unimplemented r40_unimplemented[] = {
     { "gpu",        0x01c40000, 64 * KiB },
     { "gmac",       0x01c50000, 64 * KiB },
     { "hstmr",      0x01c60000, 4 * KiB },
-    { "dram-com",   0x01c62000, 4 * KiB },
-    { "dram-ctl",   0x01c63000, 4 * KiB },
     { "tcon-top",   0x01c70000, 4 * KiB },
     { "lcd0",       0x01c71000, 4 * KiB },
     { "lcd1",       0x01c72000, 4 * KiB },
@@ -273,6 +275,12 @@ static void allwinner_r40_init(Object *obj)
 }
 
     object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I);
+
+    object_initialize_child(obj, "dramc", &s->dramc, TYPE_AW_R40_DRAMC);
+    object_property_add_alias(obj, "ram-addr", OBJECT(&s->dramc),
+                             "ram-addr");
+    object_property_add_alias(obj, "ram-size", OBJECT(&s->dramc),
+                              "ram-size");
 }
 
 static void allwinner_r40_realize(DeviceState *dev, Error **errp)
@@ -425,6 +433,15 @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp)
     sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c0), 0,
                        qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_TWI0));
 
+    /* DRAMC */
+    sysbus_realize(SYS_BUS_DEVICE(&s->dramc), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 0,
+                    s->memmap[AW_R40_DEV_DRAMCOM]);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 1,
+                    s->memmap[AW_R40_DEV_DRAMCTL]);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 2,
+                    s->memmap[AW_R40_DEV_DRAMPHY]);
+
 /* Unimplemented devices */
 for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) {
 create_unimplemented_device(r40_unimplemented[i].device_name,
diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c
index 9c5360a41b..20a4550c68 100644
--- a/hw/arm/bananapi_m2u.c
+++ b/hw/arm/bananapi_m2u.c
@@ -85,6 +85,13 @@ static void bpim2u_init(MachineState *machine)
     object_property_set_int(OBJECT(r40), "clk1-freq", 24 * 1000 * 1000,
                             &error_abort);
 
+    /* DRAMC */
+    r40->ram_size = machine->ram_size / MiB;
+    object_property_set_uint(OBJECT(r40), "ram-addr",
+                             r40->memmap[AW_R40_DEV_SDRAM], &error_abort);
+    object_property_set_int(OBJECT(r40), "ram-size",
+                            r40->ram_size, &error_abort);
+
     /* Mark R40 object realized */
     qdev_realize(DEVICE(r40), NULL, &error_abort);
 
diff --git a/hw/misc/allwinner-r40-dramc.c b/hw/misc/allwinner-r40-dramc.c
new file mode 100644
index 00..b102bcdaba
--- /dev/null
+++ b/hw/misc/allwinner-r40-dramc.c
@@ -0,0 +1,513 @@
+/*
+ * Allwinner R40 SDRAM Controller emulation
+ *
+ * Copyright (C) 2023 qianfan Zhao
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"

[PATCH v2 09/12] hw: arm: allwinner-r40: Add emac and gmac support

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

R40 has two ethernet controllers named emac and gmac. The emac is
compatible with the A10, and the gmac is compatible with the H3.

Signed-off-by: qianfan Zhao 
---
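The realize function below chooses between the two controllers by
looking up the requested NIC model in the list { "gmac", "emac" }, with
the first entry as the default. A Python sketch of that selection logic
follows. It mirrors the switch statement in the patch under the
assumption that an unset model falls back to the default and an unknown
model yields -1; the function name is hypothetical.

```python
R40_NIC_MODELS = ("gmac", "emac")


def find_nic_model(requested, models=R40_NIC_MODELS,
                   default=R40_NIC_MODELS[0]) -> int:
    """Return the index of the requested model, or -1 if it is unknown."""
    model = requested if requested is not None else default
    try:
        return models.index(model)
    except ValueError:
        return -1
```

Index 0 corresponds to the "gmac" case and index 1 to the "emac" case
in the switch; any other result takes the error path.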
 hw/arm/allwinner-r40.c | 50 --
 hw/arm/bananapi_m2u.c  |  3 ++
 include/hw/arm/allwinner-r40.h |  6 
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index b148c56449..c018ad231a 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -39,6 +39,7 @@ const hwaddr allwinner_r40_memmap[] = {
     [AW_R40_DEV_SRAM_A2]    = 0x00004000,
     [AW_R40_DEV_SRAM_A3]    = 0x00008000,
     [AW_R40_DEV_SRAM_A4]    = 0x0000b400,
+    [AW_R40_DEV_EMAC]       = 0x01c0b000,
     [AW_R40_DEV_MMC0]       = 0x01c0f000,
     [AW_R40_DEV_MMC1]       = 0x01c10000,
     [AW_R40_DEV_MMC2]       = 0x01c11000,
@@ -54,6 +55,7 @@ const hwaddr allwinner_r40_memmap[] = {
     [AW_R40_DEV_UART6]      = 0x01c29800,
     [AW_R40_DEV_UART7]      = 0x01c29c00,
     [AW_R40_DEV_TWI0]       = 0x01c2ac00,
+    [AW_R40_DEV_GMAC]       = 0x01c50000,
     [AW_R40_DEV_DRAMCOM]    = 0x01c62000,
     [AW_R40_DEV_DRAMCTL]    = 0x01c63000,
     [AW_R40_DEV_DRAMPHY]    = 0x01c65000,
@@ -82,7 +84,6 @@ static struct AwR40Unimplemented r40_unimplemented[] = {
     { "spi1",       0x01c06000, 4 * KiB },
     { "cs0",        0x01c09000, 4 * KiB },
     { "keymem",     0x01c0a000, 4 * KiB },
-    { "emac",       0x01c0b000, 4 * KiB },
     { "usb0-otg",   0x01c13000, 4 * KiB },
     { "usb0-host",  0x01c14000, 4 * KiB },
     { "crypto",     0x01c15000, 4 * KiB },
@@ -131,7 +132,6 @@ static struct AwR40Unimplemented r40_unimplemented[] = {
     { "tvd2",       0x01c33000, 4 * KiB },
     { "tvd3",       0x01c34000, 4 * KiB },
     { "gpu",        0x01c40000, 64 * KiB },
-    { "gmac",       0x01c50000, 64 * KiB },
     { "hstmr",      0x01c60000, 4 * KiB },
     { "tcon-top",   0x01c70000, 4 * KiB },
     { "lcd0",       0x01c71000, 4 * KiB },
@@ -180,6 +180,8 @@ enum {
 AW_R40_GIC_SPI_MMC1  = 33,
 AW_R40_GIC_SPI_MMC2  = 34,
 AW_R40_GIC_SPI_MMC3  = 35,
+AW_R40_GIC_SPI_EMAC  = 55,
+AW_R40_GIC_SPI_GMAC  = 85,
 };
 
 /* Allwinner R40 general constants */
@@ -276,6 +278,11 @@ static void allwinner_r40_init(Object *obj)
 
     object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I);
 
+    object_initialize_child(obj, "emac", &s->emac, TYPE_AW_EMAC);
+    object_initialize_child(obj, "gmac", &s->gmac, TYPE_AW_SUN8I_EMAC);
+    object_property_add_alias(obj, "gmac-phy-addr",
+                              OBJECT(&s->gmac), "phy-addr");
+
     object_initialize_child(obj, "dramc", &s->dramc, TYPE_AW_R40_DRAMC);
     object_property_add_alias(obj, "ram-addr", OBJECT(&s->dramc),
                              "ram-addr");
@@ -285,6 +292,7 @@ static void allwinner_r40_init(Object *obj)
 
 static void allwinner_r40_realize(DeviceState *dev, Error **errp)
 {
+const char *r40_nic_models[] = { "gmac", "emac", NULL };
 AwR40State *s = AW_R40(dev);
 unsigned i;
 
@@ -442,6 +450,44 @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp)
     sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 2,
                     s->memmap[AW_R40_DEV_DRAMPHY]);
 
+    /* nic support gmac and emac */
+    for (int i = 0; i < ARRAY_SIZE(r40_nic_models) - 1; i++) {
+        NICInfo *nic = &nd_table[i];
+
+        if (!nic->used) {
+            continue;
+        }
+        if (qemu_show_nic_models(nic->model, r40_nic_models)) {
+            exit(0);
+        }
+
+        switch (qemu_find_nic_model(nic, r40_nic_models, r40_nic_models[0])) {
+        case 0: /* gmac */
+            qdev_set_nic_properties(DEVICE(&s->gmac), nic);
+            break;
+        case 1: /* emac */
+            qdev_set_nic_properties(DEVICE(&s->emac), nic);
+            break;
+        default:
+            exit(1);
+            break;
+        }
+    }
+
+    /* GMAC */
+    object_property_set_link(OBJECT(&s->gmac), "dma-memory",
+                             OBJECT(get_system_memory()), &error_fatal);
+    sysbus_realize(SYS_BUS_DEVICE(&s->gmac), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->gmac), 0, s->memmap[AW_R40_DEV_GMAC]);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->gmac), 0,
+                       qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_GMAC));
+
+    /* EMAC */
+    sysbus_realize(SYS_BUS_DEVICE(&s->emac), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->emac), 0, s->memmap[AW_R40_DEV_EMAC]);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->emac), 0,
+                       qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_EMAC));
+
+
 /* Unimplemented devices */
 for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) {
 create_unimplemented_device(r40_unimplemented[i].device_name,
diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c
index 20a4550c68..74121d8966 100644
--- a/hw/arm/bananapi_m2u.c
+++ b/hw/arm/bananapi_m2u.c
@@ -92,6 +92,9 @@ static void 

[PATCH v2 00/12] *** add allwinner-r40 support ***

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

*** history ***

# v1: 2023-03-21

The first version, which adds allwinner-r40 support. Supported features:

+ ccu
+ dram controller
+ uart
+ i2c and pmic(axp221)
+ sdcard
+ emac/gmac

A test case is also provided under avocado. To run it quickly:

$ AVOCADO_ALLOW_LARGE_STORAGE=yes tests/venv/bin/avocado \
--verbose --show=app,console run -t machine:bpim2u \
../tests/avocado/boot_linux_console.py

# v2: 2023-03-28

1. Fix the warnings and errors reported by checkpatch.pl
2. Remove all i2c controllers except i2c0
3. Use an array to register mmc and uart devices
4. Rename axp209 to axp22x and add axp221 support
5. Add a basic SRAM controller

qianfan Zhao (12):
  hw: arm: Add bananapi M2-Ultra and allwinner-r40 support
  hw/arm/allwinner-r40: add Clock Control Unit
  hw: allwinner-r40: Complete uart devices
  hw: arm: allwinner-r40: Add i2c0 device
  hw/misc: Rename axp209 to axp22x and add support AXP221 PMU
  hw/arm/allwinner-r40: add SDRAM controller device
  hw: sd: allwinner-sdhost: Add sun50i-a64 SoC support
  hw: arm: allwinner-r40: Fix the mmc controller's type
  hw: arm: allwinner-r40: Add emac and gmac support
  hw: arm: allwinner-sramc: Add SRAM Controller support for R40
  tests: avocado: boot_linux_console: Add test case for bpim2u
  docs: system: arm: Introduce bananapi_m2u

 configs/devices/arm-softmmu/default.mak |   1 +
 docs/system/arm/bananapi_m2u.rst| 138 +++
 hw/arm/Kconfig  |  13 +-
 hw/arm/allwinner-r40.c  | 526 
 hw/arm/bananapi_m2u.c   | 145 +++
 hw/arm/meson.build  |   1 +
 hw/misc/Kconfig |   5 +-
 hw/misc/allwinner-r40-ccu.c | 209 ++
 hw/misc/allwinner-r40-dramc.c   | 513 +++
 hw/misc/allwinner-sramc.c   | 184 +
 hw/misc/axp209.c| 238 ---
 hw/misc/axp2xx.c| 283 +
 hw/misc/meson.build |   5 +-
 hw/misc/trace-events|  26 +-
 hw/sd/allwinner-sdhost.c|  70 +++-
 include/hw/arm/allwinner-r40.h  | 143 +++
 include/hw/misc/allwinner-r40-ccu.h |  65 +++
 include/hw/misc/allwinner-r40-dramc.h   | 108 +
 include/hw/misc/allwinner-sramc.h   |  69 
 include/hw/sd/allwinner-sdhost.h|   9 +
 tests/avocado/boot_linux_console.py | 176 
 21 files changed, 2679 insertions(+), 248 deletions(-)
 create mode 100644 docs/system/arm/bananapi_m2u.rst
 create mode 100644 hw/arm/allwinner-r40.c
 create mode 100644 hw/arm/bananapi_m2u.c
 create mode 100644 hw/misc/allwinner-r40-ccu.c
 create mode 100644 hw/misc/allwinner-r40-dramc.c
 create mode 100644 hw/misc/allwinner-sramc.c
 delete mode 100644 hw/misc/axp209.c
 create mode 100644 hw/misc/axp2xx.c
 create mode 100644 include/hw/arm/allwinner-r40.h
 create mode 100644 include/hw/misc/allwinner-r40-ccu.h
 create mode 100644 include/hw/misc/allwinner-r40-dramc.h
 create mode 100644 include/hw/misc/allwinner-sramc.h

-- 
2.25.1




[PATCH v2 03/12] hw: allwinner-r40: Complete uart devices

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

R40 has eight UARTs, which support both 16450- and 16550-compatible modes.

Signed-off-by: qianfan Zhao 
---
 hw/arm/allwinner-r40.c | 31 ---
 include/hw/arm/allwinner-r40.h |  8 
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index 128c0ca470..537a90b23d 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -45,6 +45,13 @@ const hwaddr allwinner_r40_memmap[] = {
 [AW_R40_DEV_CCU]= 0x01c20000,
 [AW_R40_DEV_PIT]= 0x01c20c00,
 [AW_R40_DEV_UART0]  = 0x01c28000,
+[AW_R40_DEV_UART1]  = 0x01c28400,
+[AW_R40_DEV_UART2]  = 0x01c28800,
+[AW_R40_DEV_UART3]  = 0x01c28c00,
+[AW_R40_DEV_UART4]  = 0x01c29000,
+[AW_R40_DEV_UART5]  = 0x01c29400,
+[AW_R40_DEV_UART6]  = 0x01c29800,
+[AW_R40_DEV_UART7]  = 0x01c29c00,
 [AW_R40_DEV_GIC_DIST]   = 0x01c81000,
 [AW_R40_DEV_GIC_CPU]= 0x01c82000,
 [AW_R40_DEV_GIC_HYP]= 0x01c84000,
@@ -160,6 +167,10 @@ enum {
 AW_R40_GIC_SPI_UART1 =  2,
 AW_R40_GIC_SPI_UART2 =  3,
 AW_R40_GIC_SPI_UART3 =  4,
+AW_R40_GIC_SPI_UART4 = 17,
+AW_R40_GIC_SPI_UART5 = 18,
+AW_R40_GIC_SPI_UART6 = 19,
+AW_R40_GIC_SPI_UART7 = 20,
 AW_R40_GIC_SPI_TIMER0= 22,
 AW_R40_GIC_SPI_TIMER1= 23,
 AW_R40_GIC_SPI_MMC0  = 32,
@@ -387,9 +398,23 @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp)
 }
 
 /* UART0. For future clocktree API: All UARTS are connected to APB2_CLK. */
-serial_mm_init(get_system_memory(), s->memmap[AW_R40_DEV_UART0], 2,
-   qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_UART0),
-   115200, serial_hd(0), DEVICE_NATIVE_ENDIAN);
+for (int i = 0; i < AW_R40_NUM_UARTS; i++) {
+static const int uart_irqs[AW_R40_NUM_UARTS] = {
+AW_R40_GIC_SPI_UART0,
+AW_R40_GIC_SPI_UART1,
+AW_R40_GIC_SPI_UART2,
+AW_R40_GIC_SPI_UART3,
+AW_R40_GIC_SPI_UART4,
+AW_R40_GIC_SPI_UART5,
+AW_R40_GIC_SPI_UART6,
+AW_R40_GIC_SPI_UART7,
+};
+const hwaddr addr = s->memmap[AW_R40_DEV_UART0 + i];
+
+serial_mm_init(get_system_memory(), addr, 2,
+   qdev_get_gpio_in(DEVICE(&s->gic), uart_irqs[i]),
+   115200, serial_hd(i), DEVICE_NATIVE_ENDIAN);
+}
 
 /* Unimplemented devices */
 for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) {
diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h
index 3be9dc962b..959b5dc4e0 100644
--- a/include/hw/arm/allwinner-r40.h
+++ b/include/hw/arm/allwinner-r40.h
@@ -41,6 +41,13 @@ enum {
 AW_R40_DEV_CCU,
 AW_R40_DEV_PIT,
 AW_R40_DEV_UART0,
+AW_R40_DEV_UART1,
+AW_R40_DEV_UART2,
+AW_R40_DEV_UART3,
+AW_R40_DEV_UART4,
+AW_R40_DEV_UART5,
+AW_R40_DEV_UART6,
+AW_R40_DEV_UART7,
 AW_R40_DEV_GIC_DIST,
 AW_R40_DEV_GIC_CPU,
 AW_R40_DEV_GIC_HYP,
@@ -70,6 +77,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AwR40State, AW_R40)
  * which are currently emulated by the R40 SoC code.
  */
 #define AW_R40_NUM_MMCS 4
+#define AW_R40_NUM_UARTS 8
 
 struct AwR40State {
 /*< private >*/
-- 
2.25.1
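
As an illustration of the table-driven registration above, here is a
minimal standalone sketch (not QEMU code). It assumes the UART entries
stay contiguous in both the device enum and the memory map, which is
what s->memmap[AW_R40_DEV_UART0 + i] relies on. The helpers uart_base()
and uart_irq() are hypothetical names, and the SPI number used for UART0
(1) is an assumption, since that enum entry is not visible in the quoted
hunk.

```c
#include <assert.h>
#include <stdint.h>

#define AW_R40_NUM_UARTS 8
#define UART0_BASE  0x01c28000u /* AW_R40_DEV_UART0 in the memmap */
#define UART_STRIDE 0x400u      /* gap between consecutive UART entries */

/* GIC SPI numbers from the patch; the UART0 value (1) is assumed. */
static const int uart_irqs[AW_R40_NUM_UARTS] = {
    1, 2, 3, 4, 17, 18, 19, 20,
};

/* Hypothetical helper: MMIO base of UART i, as the memmap lists it. */
static uint32_t uart_base(unsigned i)
{
    assert(i < AW_R40_NUM_UARTS);
    return UART0_BASE + i * UART_STRIDE;
}

/* Hypothetical helper: SPI line of UART i. Note the jump from 4 to 17. */
static int uart_irq(unsigned i)
{
    assert(i < AW_R40_NUM_UARTS);
    return uart_irqs[i];
}
```

The base addresses follow a fixed stride, but the interrupt numbers do
not, which is presumably why the patch keeps an explicit uart_irqs[]
table rather than computing the SPI line from the index.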




[PATCH v2 04/12] hw: arm: allwinner-r40: Add i2c0 device

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

TWI (I2C) serves as the interface between the host CPU and the serial
2-wire bus. It supports all standard 2-wire transfers and can operate in
standard mode (100 kbit/s) or fast mode, with data rates of up to
400 kbit/s.

Signed-off-by: qianfan Zhao 
---
 hw/arm/allwinner-r40.c | 11 ++-
 include/hw/arm/allwinner-r40.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index 537a90b23d..4bc582630c 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -52,6 +52,7 @@ const hwaddr allwinner_r40_memmap[] = {
 [AW_R40_DEV_UART5]  = 0x01c29400,
 [AW_R40_DEV_UART6]  = 0x01c29800,
 [AW_R40_DEV_UART7]  = 0x01c29c00,
+[AW_R40_DEV_TWI0]   = 0x01c2ac00,
 [AW_R40_DEV_GIC_DIST]   = 0x01c81000,
 [AW_R40_DEV_GIC_CPU]= 0x01c82000,
 [AW_R40_DEV_GIC_HYP]= 0x01c84000,
@@ -115,7 +116,6 @@ static struct AwR40Unimplemented r40_unimplemented[] = {
 { "uart7",  0x01c29c00, 1 * KiB },
 { "ps20",   0x01c2a000, 1 * KiB },
 { "ps21",   0x01c2a400, 1 * KiB },
-{ "twi0",   0x01c2ac00, 1 * KiB },
 { "twi1",   0x01c2b000, 1 * KiB },
 { "twi2",   0x01c2b400, 1 * KiB },
 { "twi3",   0x01c2b800, 1 * KiB },
@@ -167,6 +167,7 @@ enum {
 AW_R40_GIC_SPI_UART1 =  2,
 AW_R40_GIC_SPI_UART2 =  3,
 AW_R40_GIC_SPI_UART3 =  4,
+AW_R40_GIC_SPI_TWI0  =  7,
 AW_R40_GIC_SPI_UART4 = 17,
 AW_R40_GIC_SPI_UART5 = 18,
 AW_R40_GIC_SPI_UART6 = 19,
@@ -270,6 +271,8 @@ static void allwinner_r40_init(Object *obj)
 object_initialize_child(obj, mmc_names[i], &s->mmc[i],
 TYPE_AW_SDHOST_SUN5I);
 }
+
+object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I);
 }
 
 static void allwinner_r40_realize(DeviceState *dev, Error **errp)
@@ -416,6 +419,12 @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp)
115200, serial_hd(i), DEVICE_NATIVE_ENDIAN);
 }
 
+/* I2C */
+sysbus_realize(SYS_BUS_DEVICE(&s->i2c0), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c0), 0, s->memmap[AW_R40_DEV_TWI0]);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c0), 0,
+   qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_TWI0));
+
 /* Unimplemented devices */
 for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) {
 create_unimplemented_device(r40_unimplemented[i].device_name,
diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h
index 959b5dc4e0..95366f4eee 100644
--- a/include/hw/arm/allwinner-r40.h
+++ b/include/hw/arm/allwinner-r40.h
@@ -26,6 +26,7 @@
 #include "hw/intc/arm_gic.h"
 #include "hw/sd/allwinner-sdhost.h"
 #include "hw/misc/allwinner-r40-ccu.h"
+#include "hw/i2c/allwinner-i2c.h"
 #include "target/arm/cpu.h"
 #include "sysemu/block-backend.h"
 
@@ -48,6 +49,7 @@ enum {
 AW_R40_DEV_UART5,
 AW_R40_DEV_UART6,
 AW_R40_DEV_UART7,
+AW_R40_DEV_TWI0,
 AW_R40_DEV_GIC_DIST,
 AW_R40_DEV_GIC_CPU,
 AW_R40_DEV_GIC_HYP,
@@ -89,6 +91,7 @@ struct AwR40State {
 AwA10PITState timer;
 AwSdHostState mmc[AW_R40_NUM_MMCS];
 AwR40ClockCtlState ccu;
+AWI2CState i2c0;
 GICState gic;
 MemoryRegion sram_a1;
 MemoryRegion sram_a2;
-- 
2.25.1




[PATCH v2 10/12] hw: arm: allwinner-sramc: Add SRAM Controller support for R40

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

Only a few important registers are added, especially the SRAM_VER
register.

Signed-off-by: qianfan Zhao 
---
 hw/arm/Kconfig|   1 +
 hw/arm/allwinner-r40.c|   7 +-
 hw/misc/Kconfig   |   3 +
 hw/misc/allwinner-sramc.c | 184 ++
 hw/misc/meson.build   |   1 +
 hw/misc/trace-events  |   4 +
 include/hw/arm/allwinner-r40.h|   3 +
 include/hw/misc/allwinner-sramc.h |  69 +++
 8 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/allwinner-sramc.c
 create mode 100644 include/hw/misc/allwinner-sramc.h

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 85ded354ed..f3a4eb3f78 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -346,6 +346,7 @@ config ALLWINNER_H3
 
 config ALLWINNER_R40
 bool
+select ALLWINNER_SRAMC
 select ALLWINNER_A10_PIT
 select AXP2XX_PMU
 select SERIAL
diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index c018ad231a..7d29eb224f 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -39,6 +39,7 @@ const hwaddr allwinner_r40_memmap[] = {
 [AW_R40_DEV_SRAM_A2]= 0x00004000,
 [AW_R40_DEV_SRAM_A3]= 0x00008000,
 [AW_R40_DEV_SRAM_A4]= 0x0000b400,
+[AW_R40_DEV_SRAMC]  = 0x01c00000,
 [AW_R40_DEV_EMAC]   = 0x01c0b000,
 [AW_R40_DEV_MMC0]   = 0x01c0f000,
 [AW_R40_DEV_MMC1]   = 0x01c10000,
@@ -76,7 +77,6 @@ struct AwR40Unimplemented {
 static struct AwR40Unimplemented r40_unimplemented[] = {
 { "d-engine",   0x01000000, 4 * MiB },
 { "d-inter",0x01400000, 128 * KiB },
-{ "sram-c", 0x01c00000, 4 * KiB },
 { "dma",0x01c02000, 4 * KiB },
 { "nfdc",   0x01c03000, 4 * KiB },
 { "ts", 0x01c04000, 4 * KiB },
@@ -288,6 +288,8 @@ static void allwinner_r40_init(Object *obj)
  "ram-addr");
 object_property_add_alias(obj, "ram-size", OBJECT(&s->dramc),
   "ram-size");
+
+object_initialize_child(obj, "sramc", &s->sramc, TYPE_AW_SRAMC_SUN8I_R40);
 }
 
 static void allwinner_r40_realize(DeviceState *dev, Error **errp)
@@ -382,6 +384,9 @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp)
AW_R40_GIC_SPI_TIMER1));
 
 /* SRAM */
+sysbus_realize(SYS_BUS_DEVICE(&s->sramc), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->sramc), 0, s->memmap[AW_R40_DEV_SRAMC]);
+
 memory_region_init_ram(&s->sram_a1, OBJECT(dev), "sram A1",
 16 * KiB, &error_abort);
 memory_region_init_ram(&s->sram_a2, OBJECT(dev), "sram A2",
diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index efeb430a6c..e4c2149175 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -170,6 +170,9 @@ config VIRT_CTRL
 config LASI
 bool
 
+config ALLWINNER_SRAMC
+bool
+
 config ALLWINNER_A10_CCM
 bool
 
diff --git a/hw/misc/allwinner-sramc.c b/hw/misc/allwinner-sramc.c
new file mode 100644
index 0000000000..a8b731f8f2
--- /dev/null
+++ b/hw/misc/allwinner-sramc.c
@@ -0,0 +1,184 @@
+/*
+ * Allwinner R40 SRAM controller emulation
+ *
+ * Copyright (C) 2023 qianfan Zhao 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "hw/misc/allwinner-sramc.h"
+#include "trace.h"
+
+/*
+ * register offsets
+ * https://linux-sunxi.org/SRAM_Controller_Register_Guide
+ */
+enum {
+REG_SRAM_CTL1_CFG   = 0x04, /* SRAM Control register 1 */
+REG_SRAM_VER= 0x24, /* SRAM Version register */
+REG_SRAM_R40_SOFT_ENTRY_REG0= 0xbc,
+};
+
+/* REG_SRAMC_VERSION bit defines */
+#define SRAM_VER_READ_ENABLE    (1 << 15)
+#define SRAM_VER_VERSION_SHIFT  16
+#define SRAM_VERSION_SUN8I_R40  0x1701
+
+static uint64_t allwinner_sramc_read(void *opaque, hwaddr offset,
+ unsigned size)
+{
+AwSRAMCState *s = AW_SRAMC(opaque);
+AwSRAMCClass *sc = AW_SRAMC_GET_CLASS(s);
+uint64_t val = 0;
+
+switch (offset) {
+case REG_SRAM_CTL1_CFG:
+val = s->sram_ctl1;
+break;

[PATCH v2 07/12] hw: sd: allwinner-sdhost: Add sun50i-a64 SoC support

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

The A64's SD host registers are similar to the H3's, with one addition:
a new register named SAMP_DL_REG located at offset 0x144. The DMA
descriptor buffer size of mmc2 is only 8K, while the other mmc
controllers have 64K.
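
The capability gating this implies can be sketched in isolation. The
following is a minimal standalone model, not the QEMU device: struct
toy_sdhost, toy_read() and the guest_errors counter are hypothetical
names, and only the 0x144 offset, the can_calibrate flag and the 0x2000
reset value are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REG_SAMP_DL 0x144u  /* Sample Delay Control offset */
#define SAMP_DL_RST 0x2000u /* reset value used by the patch */

/* Hypothetical, stripped-down stand-in for the SD host state. */
struct toy_sdhost {
    bool can_calibrate;    /* true only for the A64-style variants */
    uint32_t sample_delay;
    unsigned guest_errors; /* stands in for LOG_GUEST_ERROR messages */
};

/* Read a register; variants without SAMP_DL treat it as out of bounds. */
static uint32_t toy_read(struct toy_sdhost *s, uint32_t offset)
{
    bool out_of_bounds = false;
    uint32_t res = 0;

    switch (offset) {
    case REG_SAMP_DL:
        if (s->can_calibrate) {
            res = s->sample_delay;
        } else {
            out_of_bounds = true;
        }
        break;
    default:
        out_of_bounds = true;
        break;
    }

    /* Report once, after the switch, as the patch does. */
    if (out_of_bounds) {
        s->guest_errors++;
    }
    return res;
}
```

Deferring the report to a single place after the switch lets both the
default case and the gated register share one error path.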

Signed-off-by: qianfan Zhao 
---
 hw/sd/allwinner-sdhost.c | 70 ++--
 include/hw/sd/allwinner-sdhost.h |  9 
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/hw/sd/allwinner-sdhost.c b/hw/sd/allwinner-sdhost.c
index 51e5e90830..38e7844399 100644
--- a/hw/sd/allwinner-sdhost.c
+++ b/hw/sd/allwinner-sdhost.c
@@ -77,6 +77,7 @@ enum {
 REG_SD_DATA1_CRC  = 0x12C, /* CRC Data 1 from card/eMMC */
 REG_SD_DATA0_CRC  = 0x130, /* CRC Data 0 from card/eMMC */
 REG_SD_CRC_STA= 0x134, /* CRC status from card/eMMC during write */
+REG_SD_SAMP_DL= 0x144, /* Sample Delay Control (sun50i-a64) */
 REG_SD_FIFO   = 0x200, /* Read/Write FIFO */
 };
 
@@ -158,6 +159,7 @@ enum {
 REG_SD_RES_CRC_RST  = 0x0,
 REG_SD_DATA_CRC_RST = 0x0,
 REG_SD_CRC_STA_RST  = 0x0,
+REG_SD_SAMPLE_DL_RST= 0x00002000,
 REG_SD_FIFO_RST = 0x0,
 };
 
@@ -438,6 +440,7 @@ static uint64_t allwinner_sdhost_read(void *opaque, hwaddr offset,
 {
 AwSdHostState *s = AW_SDHOST(opaque);
 AwSdHostClass *sc = AW_SDHOST_GET_CLASS(s);
+bool out_of_bounds = false;
 uint32_t res = 0;
 
 switch (offset) {
@@ -556,13 +559,24 @@ static uint64_t allwinner_sdhost_read(void *opaque, hwaddr offset,
 case REG_SD_FIFO:  /* Read/Write FIFO */
 res = allwinner_sdhost_fifo_read(s);
 break;
+case REG_SD_SAMP_DL: /* Sample Delay */
+if (sc->can_calibrate) {
+res = s->sample_delay;
+} else {
+out_of_bounds = true;
+}
+break;
 default:
-qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset %"
-  HWADDR_PRIx"\n", __func__, offset);
+out_of_bounds = true;
 res = 0;
 break;
 }
 
+if (out_of_bounds) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset %"
+  HWADDR_PRIx"\n", __func__, offset);
+}
+
 trace_allwinner_sdhost_read(offset, res, size);
 return res;
 }
@@ -581,6 +595,7 @@ static void allwinner_sdhost_write(void *opaque, hwaddr offset,
 {
 AwSdHostState *s = AW_SDHOST(opaque);
 AwSdHostClass *sc = AW_SDHOST_GET_CLASS(s);
+bool out_of_bounds = false;
 
 trace_allwinner_sdhost_write(offset, value, size);
 
@@ -704,10 +719,21 @@ static void allwinner_sdhost_write(void *opaque, hwaddr offset,
 case REG_SD_DATA0_CRC: /* CRC Data 0 from card/eMMC */
 case REG_SD_CRC_STA:   /* CRC status from card/eMMC in write operation */
 break;
+case REG_SD_SAMP_DL: /* Sample delay control */
+if (sc->can_calibrate) {
+s->sample_delay = value;
+} else {
+out_of_bounds = true;
+}
+break;
 default:
+out_of_bounds = true;
+break;
+}
+
+if (out_of_bounds) {
 qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset %"
   HWADDR_PRIx"\n", __func__, offset);
-break;
 }
 }
 
@@ -756,6 +782,7 @@ static const VMStateDescription vmstate_allwinner_sdhost = {
 VMSTATE_UINT32(response_crc, AwSdHostState),
 VMSTATE_UINT32_ARRAY(data_crc, AwSdHostState, 8),
 VMSTATE_UINT32(status_crc, AwSdHostState),
+VMSTATE_UINT32(sample_delay, AwSdHostState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -794,6 +821,7 @@ static void allwinner_sdhost_realize(DeviceState *dev, Error **errp)
 static void allwinner_sdhost_reset(DeviceState *dev)
 {
 AwSdHostState *s = AW_SDHOST(dev);
+AwSdHostClass *sc = AW_SDHOST_GET_CLASS(s);
 
 s->global_ctl = REG_SD_GCTL_RST;
 s->clock_ctl = REG_SD_CKCR_RST;
@@ -834,6 +862,10 @@ static void allwinner_sdhost_reset(DeviceState *dev)
 }
 
 s->status_crc = REG_SD_CRC_STA_RST;
+
+if (sc->can_calibrate) {
+s->sample_delay = REG_SD_SAMPLE_DL_RST;
+}
 }
 
 static void allwinner_sdhost_bus_class_init(ObjectClass *klass, void *data)
@@ -867,6 +899,24 @@ static void allwinner_sdhost_sun5i_class_init(ObjectClass *klass, void *data)
 sc->is_sun4i = false;
 }
 
+static void allwinner_sdhost_sun50i_a64_class_init(ObjectClass *klass,
+   void *data)
+{
+AwSdHostClass *sc = AW_SDHOST_CLASS(klass);
+sc->max_desc_size = 64 * KiB;
+sc->is_sun4i = false;
+sc->can_calibrate = true;
+}
+
+static void allwinner_sdhost_sun50i_a64_emmc_class_init(ObjectClass *klass,
+void *data)
+{
+AwSdHostClass *sc = AW_SDHOST_CLASS(klass);
+sc->max_desc_size = 8 * KiB;
+sc->is_sun4i = false;
+sc->can_calibrate = true;
+}
+
 static const TypeInfo allwinner_sdhost_info = {
 .name  = TYPE_AW_SDHOST,
 

[PATCH v2 02/12] hw/arm/allwinner-r40: add Clock Control Unit

2023-03-27 Thread qianfanguijin
From: qianfan Zhao 

The CCU provides the registers to program the PLLs and controls most of
the clock generation, division, distribution, synchronization and
gating.

This commit adds support for the Clock Control Unit which emulates
a simple read/write register interface.
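
A minimal standalone sketch of such a read/write register file follows.
It is not the code from this patch: struct toy_ccu, toy_ccu_read() and
toy_ccu_write() are hypothetical names, and the 1 KiB window size is an
assumption based on the size of the "ccu" entry this patch removes from
the unimplemented-device table.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CCU_IOSIZE 0x400u /* 1 KiB register window (assumed) */
#define REG_INDEX(offset) ((offset) / sizeof(uint32_t))

/* Hypothetical register file: one 32-bit slot per 4-byte offset. */
struct toy_ccu {
    uint32_t regs[TOY_CCU_IOSIZE / sizeof(uint32_t)];
};

/* Return the stored value; out-of-range offsets read as zero. */
static uint32_t toy_ccu_read(const struct toy_ccu *s, uint32_t offset)
{
    if (offset >= TOY_CCU_IOSIZE) {
        return 0;
    }
    return s->regs[REG_INDEX(offset)];
}

/* Store the value unchanged; out-of-range offsets are ignored. */
static void toy_ccu_write(struct toy_ccu *s, uint32_t offset, uint32_t val)
{
    if (offset >= TOY_CCU_IOSIZE) {
        return;
    }
    s->regs[REG_INDEX(offset)] = val;
}
```

A real model would typically add per-register behaviour on top of this,
for example status bits that the guest polls; the sketch only covers the
plain storage part.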

Signed-off-by: qianfan Zhao 
---
 hw/arm/allwinner-r40.c  |   8 +-
 hw/misc/allwinner-r40-ccu.c | 209 
 hw/misc/meson.build |   1 +
 include/hw/arm/allwinner-r40.h  |   2 +
 include/hw/misc/allwinner-r40-ccu.h |  65 +
 5 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/allwinner-r40-ccu.c
 create mode 100644 include/hw/misc/allwinner-r40-ccu.h

diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c
index b743d64253..128c0ca470 100644
--- a/hw/arm/allwinner-r40.c
+++ b/hw/arm/allwinner-r40.c
@@ -42,6 +42,7 @@ const hwaddr allwinner_r40_memmap[] = {
 [AW_R40_DEV_MMC1]   = 0x01c10000,
 [AW_R40_DEV_MMC2]   = 0x01c11000,
 [AW_R40_DEV_MMC3]   = 0x01c12000,
+[AW_R40_DEV_CCU]= 0x01c20000,
 [AW_R40_DEV_PIT]= 0x01c20c00,
 [AW_R40_DEV_UART0]  = 0x01c28000,
 [AW_R40_DEV_GIC_DIST]   = 0x01c81000,
@@ -80,7 +81,6 @@ static struct AwR40Unimplemented r40_unimplemented[] = {
 { "usb2-host",  0x01c1c000, 4 * KiB },
 { "cs1",0x01c1d000, 4 * KiB },
 { "spi3",   0x01c1f000, 4 * KiB },
-{ "ccu",0x01c20000, 1 * KiB },
 { "rtc",0x01c20400, 1 * KiB },
 { "pio",0x01c20800, 1 * KiB },
 { "owa",0x01c21000, 1 * KiB },
@@ -253,6 +253,8 @@ static void allwinner_r40_init(Object *obj)
 object_property_add_alias(obj, "clk1-freq", OBJECT(&s->timer),
   "clk1-freq");
 
+object_initialize_child(obj, "ccu", &s->ccu, TYPE_AW_R40_CCU);
+
 for (int i = 0; i < AW_R40_NUM_MMCS; i++) {
 object_initialize_child(obj, mmc_names[i], &s->mmc[i],
 TYPE_AW_SDHOST_SUN5I);
@@ -367,6 +369,10 @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp)
 memory_region_add_subregion(get_system_memory(),
 s->memmap[AW_R40_DEV_SRAM_A4], &s->sram_a4);
 
+/* Clock Control Unit */
+sysbus_realize(SYS_BUS_DEVICE(&s->ccu), &error_fatal);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->ccu), 0, s->memmap[AW_R40_DEV_CCU]);
+
 /* SD/MMC */
 for (int i = 0; i < AW_R40_NUM_MMCS; i++) {
 qemu_irq irq = qdev_get_gpio_in(DEVICE(&s->gic),
diff --git a/hw/misc/allwinner-r40-ccu.c b/hw/misc/allwinner-r40-ccu.c
new file mode 100644
index 0000000000..d82fee12db
--- /dev/null
+++ b/hw/misc/allwinner-r40-ccu.c
@@ -0,0 +1,209 @@
+/*
+ * Allwinner R40 Clock Control Unit emulation
+ *
+ * Copyright (C) 2023 qianfan Zhao 
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/sysbus.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/misc/allwinner-r40-ccu.h"
+
+/* CCU register offsets */
+enum {
+REG_PLL_CPUX_CTRL   = 0x0000,
+REG_PLL_AUDIO_CTRL  = 0x0008,
+REG_PLL_VIDEO0_CTRL = 0x0010,
+REG_PLL_VE_CTRL = 0x0018,
+REG_PLL_DDR0_CTRL   = 0x0020,
+REG_PLL_PERIPH0_CTRL= 0x0028,
+REG_PLL_PERIPH1_CTRL= 0x002c,
+REG_PLL_VIDEO1_CTRL = 0x0030,
+REG_PLL_SATA_CTRL   = 0x0034,
+REG_PLL_GPU_CTRL= 0x0038,
+REG_PLL_MIPI_CTRL   = 0x0040,
+REG_PLL_DE_CTRL = 0x0048,
+REG_PLL_DDR1_CTRL   = 0x004c,
+REG_AHB1_APB1_CFG   = 0x0054,
+REG_APB2_CFG= 0x0058,
+REG_MMC0_CLK= 0x0088,
+REG_MMC1_CLK= 0x008c,
+REG_MMC2_CLK= 0x0090,
+REG_MMC3_CLK= 0x0094,
+REG_USBPHY_CFG  = 0x00cc,
+REG_PLL_DDR_AUX = 0x00f0,
+REG_DRAM_CFG= 0x00f4,
+REG_PLL_DDR1_CFG= 0x00f8,
+REG_DRAM_CLK_GATING = 0x0100,
+REG_GMAC_CLK= 0x0164,
+REG_SYS_32K_CLK = 0x0310,
+REG_PLL_LOCK_CTRL   = 0x0320,
+};
+
+#define REG_INDEX(offset)    (offset / sizeof(uint32_t))
+
+/* CCU register flags */
+enum {
+REG_PLL_ENABLE

Re: [PATCH v4 06/10] migration: Introduce dirty-limit capability

2023-03-27 Thread Hyman Huang




On 2023/3/27 14:41, Markus Armbruster wrote:

Hyman Huang  writes:


On 2023/3/24 22:32, Markus Armbruster wrote:

Hyman Huang  writes:


On 2023/3/24 20:11, Markus Armbruster wrote:

huang...@chinatelecom.cn writes:


From: Hyman Huang(黄勇) 

Introduce migration dirty-limit capability, which can
be turned on before live migration and limit dirty
page rate during live migration.

Introduce migrate_dirty_limit function to help check
if dirty-limit capability enabled during live migration.

Meanwhile, refactor vcpu_dirty_rate_stat_collect
so that period can be configured instead of hardcoded.

dirty-limit capability is kind of like auto-converge
but using dirty limit instead of traditional cpu-throttle
to throttle guest down. To enable this feature, turn on
the dirty-limit capability before live migration using
migrate-set-capabilities, and set the parameters
"x-vcpu-dirty-limit-period", "vcpu-dirty-limit" suitably
to speed up convergence.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 

[...]


diff --git a/qapi/migration.json b/qapi/migration.json
index d33cc2d582..b7a92be055 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,8 @@
#will be handled faster.  This is a performance feature 
and
#should not affect the correctness of postcopy 
migration.
#(since 7.1)
+# @dirty-limit: Use dirty-limit to throttle down guest if enabled.
+#   (since 8.0)


Feels too terse.  What exactly is used and how?  It's not the capability
itself (although the text sure sounds like it).  I guess it's the thing
you set with command set-vcpu-dirty-limit.

Is that used only when the capability is set?


Yes, live migration sets the "dirty-limit" value when that capability is
set. The comment will change to "Apply the dirty page rate limit
algorithm to throttle down the guest if the capability is set, rather
than auto-converge".

Please continue to polish the doc if needed. Thanks.


Let's see whether I understand.

Throttling happens only during migration.

There are two throttling algorithms: "auto-converge" (default) and
"dirty page rate limit".

The latter can be tuned with set-vcpu-dirty-limit.
Correct?


Yes


What happens when migration capability dirty-limit is enabled, but the
user hasn't set a limit with set-vcpu-dirty-limit, or canceled it with
cancel-vcpu-dirty-limit?


The dirty-limit capability uses the default value if the user hasn't set one.


What is the default value?  I can't find it in the doc comments.

The default value is 1MB/s; I'll add it to the doc comments.



In the cancel-vcpu-dirty-limit path, canceling should be checked and
disallowed while migration is in progress.


Can you change the dirty limit with set-vcpu-dirty-limit while migration
is in progress?  Let's see...

No, this is not allowed.



Has the dirty limit any effect while migration is not in progress?
Like the auto-converge capability, dirty-limit capability has no effect 
if migration is not in progress.



see the following code in commit:
[PATCH v4 08/10] migration: Implement dirty-limit convergence algo

--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -438,6 +438,8 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
   int64_t cpu_index,
   Error **errp)
  {
+MigrationState *ms = migrate_get_current();
+
  if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
  return;
  }
@@ -451,6 +453,15 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
  return;
  }

+if (migration_is_running(ms->state) &&
+(!qemu_thread_is_self(&ms->thread)) &&
+migrate_dirty_limit() &&
+dirtylimit_in_service()) {
+error_setg(errp, "can't cancel dirty page limit while"
+   " migration is running");
+return;
+}


We can get here even when migration_is_running() is true.  Seems to
contradict your claim "no cancel while migration is in progress".  Am I
confused?

Please drop the superfluous parenthesis around !qemu_thread_is_self().


+
  dirtylimit_state_lock();

  if (has_cpu_index) {
@@ -486,6 +497,8 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
uint64_t dirty_rate,
Error **errp)
  {
+MigrationState *ms = migrate_get_current();
+
  if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
  error_setg(errp, "dirty page limit feature requires KVM with"
 " accelerator property 'dirty-ring-size' set'");
@@ -502,6 +515,15 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index,
  return;
  }

+if (migration_is_running(ms->state) &&
+(!qemu_thread_is_self(&ms->thread)) &&
+migrate_dirty_limit() &&
+dirtylimit_in_service()) {
+error_setg(errp, "can't cancel dirty page limit while"
+   " migration is running");


Same condition, i.e. we dirty limit change is 

Re: [PATCH v5 1/1] util/async-teardown: wire up query-command-line-options

2023-03-27 Thread Markus Armbruster
Paolo Bonzini  writes:

> I am honestly not a fan of adding a more complex option, just because
> query-command-line-options only returns the square holes whereas here we
> got a round one.
>
> Can we imagine another functionality that would be added to -teardown? If
> not, it's not a good design. If it works, I would add a completely dummy
> (no suboptions) group "async-teardown" and not modify the parsing at all.

Does v2 implement your suggestion?
Message-Id: <20230320131648.61728-1-imbre...@linux.ibm.com>

I dislike it, because it makes query-command-line-options claim
-async-teardown has an option argument with unknown keys, which is
plainly wrong, and must be treated as a special case.  Worse, a new kind
of special case.

Can we have a QMP command, so libvirt can use query-qmp-schema?

In case QMP becomes functional too late for the command to actually
work: make it always fail for now.  It can still serve as a witness for
-async-teardown.  If we rework QEMU startup so that QMP can do
everything the CLI can do, we'll make the QMP command work.




[PULL 06/12] igb: handle PF/VF reset properly

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

Use PFRSTD to reset RSTI bit for VFs, and raise VFLRE interrupt when VF
is reset.

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c   | 38 ++
 hw/net/igb_regs.h   |  3 +++
 hw/net/trace-events |  2 ++
 3 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 78d3073..6ba9696 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1898,14 +1898,6 @@ static void igb_set_eims(IGBCore *core, int index, uint32_t val)
 igb_update_interrupt_state(core);
 }
 
-static void igb_vf_reset(IGBCore *core, uint16_t vfn)
-{
-/* TODO: Reset of the queue enable and the interrupt registers of the VF. */
-
-core->mac[V2PMAILBOX0 + vfn] &= ~E1000_V2PMAILBOX_RSTI;
-core->mac[V2PMAILBOX0 + vfn] = E1000_V2PMAILBOX_RSTD;
-}
-
 static void mailbox_interrupt_to_vf(IGBCore *core, uint16_t vfn)
 {
 uint32_t ent = core->mac[VTIVAR_MISC + vfn];
@@ -1983,6 +1975,17 @@ static void igb_set_vfmailbox(IGBCore *core, int index, uint32_t val)
 }
 }
 
+static void igb_vf_reset(IGBCore *core, uint16_t vfn)
+{
+/* disable Rx and Tx for the VF*/
+core->mac[VFTE] &= ~BIT(vfn);
+core->mac[VFRE] &= ~BIT(vfn);
+/* indicate VF reset to PF */
+core->mac[VFLRE] |= BIT(vfn);
+/* VFLRE and mailbox use the same interrupt cause */
+mailbox_interrupt_to_pf(core);
+}
+
 static void igb_w1c(IGBCore *core, int index, uint32_t val)
 {
 core->mac[index] &= ~val;
@@ -2237,14 +2240,20 @@ igb_set_status(IGBCore *core, int index, uint32_t val)
 static void
 igb_set_ctrlext(IGBCore *core, int index, uint32_t val)
 {
-trace_e1000e_link_set_ext_params(!!(val & E1000_CTRL_EXT_ASDCHK),
- !!(val & E1000_CTRL_EXT_SPD_BYPS));
-
-/* TODO: PFRSTD */
+trace_igb_link_set_ext_params(!!(val & E1000_CTRL_EXT_ASDCHK),
+  !!(val & E1000_CTRL_EXT_SPD_BYPS),
+  !!(val & E1000_CTRL_EXT_PFRSTD));
 
 /* Zero self-clearing bits */
 val &= ~(E1000_CTRL_EXT_ASDCHK | E1000_CTRL_EXT_EE_RST);
 core->mac[CTRL_EXT] = val;
+
+if (core->mac[CTRL_EXT] & E1000_CTRL_EXT_PFRSTD) {
+for (int vfn = 0; vfn < IGB_MAX_VF_FUNCTIONS; vfn++) {
+core->mac[V2PMAILBOX0 + vfn] &= ~E1000_V2PMAILBOX_RSTI;
+core->mac[V2PMAILBOX0 + vfn] |= E1000_V2PMAILBOX_RSTD;
+}
+}
 }
 
 static void
@@ -4027,6 +4036,11 @@ static void igb_reset(IGBCore *core, bool sw)
 
 e1000x_reset_mac_addr(core->owner_nic, core->mac, core->permanent_mac);
 
+for (int vfn = 0; vfn < IGB_MAX_VF_FUNCTIONS; vfn++) {
+/* Set RSTI, so VF can identify a PF reset is in progress */
+core->mac[V2PMAILBOX0 + vfn] |= E1000_V2PMAILBOX_RSTI;
+}
+
 for (i = 0; i < ARRAY_SIZE(core->tx); i++) {
 tx = &core->tx[i];
 net_tx_pkt_reset(tx->tx_pkt, NULL);
diff --git a/hw/net/igb_regs.h b/hw/net/igb_regs.h
index 00934d4..a658f9b 100644
--- a/hw/net/igb_regs.h
+++ b/hw/net/igb_regs.h
@@ -240,6 +240,9 @@ union e1000_adv_rx_desc {
 
 /* from igb/e1000_defines.h */
 
+/* Physical Func Reset Done Indication */
+#define E1000_CTRL_EXT_PFRSTD   0x00004000
+
 #define E1000_IVAR_VALID 0x80
 #define E1000_GPIE_NSICR 0x00000001
 #define E1000_GPIE_MSIX_MODE 0x00000010
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 6575341..d35554f 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -280,6 +280,8 @@ igb_core_mdic_read_unhandled(uint32_t addr) "MDIC READ: PHY[%u] UNHANDLED"
 igb_core_mdic_write(uint32_t addr, uint32_t data) "MDIC WRITE: PHY[%u] = 0x%x"
 igb_core_mdic_write_unhandled(uint32_t addr) "MDIC WRITE: PHY[%u] UNHANDLED"
 
+igb_link_set_ext_params(bool asd_check, bool speed_select_bypass, bool pfrstd) "Set extended link params: ASD check: %d, Speed select bypass: %d, PF reset done: %d"
+
 igb_rx_desc_buff_size(uint32_t b) "buffer size: %u"
 igb_rx_desc_buff_write(uint64_t addr, uint16_t offset, const void* source, uint32_t len) "addr: 0x%"PRIx64", offset: %u, from: %p, length: %u"
 
-- 
2.7.4




[PULL 05/12] MAINTAINERS: Add Sriram Yagnaraman as a igb reviewer

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

I would like to review and be informed of changes to the igb device.

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 34b50b2..ef45b5e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2252,6 +2252,7 @@ F: tests/qtest/libqos/e1000e.*
 
 igb
 M: Akihiko Odaki 
+R: Sriram Yagnaraman 
 S: Maintained
 F: docs/system/devices/igb.rst
 F: hw/net/igb*
-- 
2.7.4




[PULL 01/12] igb: Save more Tx states

2023-03-27 Thread Jason Wang
From: Akihiko Odaki 

The current implementation of igb uses only part of an advanced Tx
context descriptor and first data descriptor because it lacks some
features and infers packet traits by inspecting the packet instead of
respecting the packet type specified in the descriptor. However, we
will need the entire Tx context descriptor when we update igb to
respect these ignored fields. Save the entire context descriptor and
first data descriptor, except the buffer address, to prepare for such
a change.

This also introduces the distinction between contexts with different
indexes, which exists in igb but not in e1000e.
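
A minimal sketch of how the saved descriptors might be consulted later,
assuming the bit layout used in the diff below (context index in bit 4,
MSS in the upper 16 bits of mss_l4len_idx). The struct and function
names are illustrative and are not QEMU's; endianness conversion is
left out.

```c
#include <stdint.h>

/* Illustrative stand-in for the four 32-bit words of an advanced Tx
 * context descriptor; field names follow the patch. */
struct tx_ctx_desc {
    uint32_t vlan_macip_lens;
    uint32_t seqnum_seed;
    uint32_t type_tucmd_mlhl;
    uint32_t mss_l4len_idx;
};

/* Hypothetical per-queue state: two saved contexts plus the
 * olinfo_status word of the first data descriptor. */
struct tx_state {
    struct tx_ctx_desc ctx[2];
    uint32_t first_olinfo_status;
};

/* Bit 4 selects which of the two contexts a descriptor refers to. */
static inline uint32_t ctx_index(uint32_t word)
{
    return (word >> 4) & 1;
}

/* Store a context descriptor in the slot that it names itself. */
static inline void save_ctx(struct tx_state *tx, const struct tx_ctx_desc *d)
{
    tx->ctx[ctx_index(d->mss_l4len_idx)] = *d;
}

/* MSS of the context selected by the first data descriptor. */
static inline uint32_t selected_mss(const struct tx_state *tx)
{
    return tx->ctx[ctx_index(tx->first_olinfo_status)].mss_l4len_idx >> 16;
}
```

Keeping the raw words, rather than pre-decoded booleans, means fields
that are ignored today can be honoured later without another change to
the saved state.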

Signed-off-by: Akihiko Odaki 
Reviewed-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb.c  | 26 +++---
 hw/net/igb_core.c | 39 +++
 hw/net/igb_core.h |  8 +++-
 3 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/hw/net/igb.c b/hw/net/igb.c
index c6d753d..51a7e91 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -502,16 +502,28 @@ static int igb_post_load(void *opaque, int version_id)
 return igb_core_post_load(&s->core);
 }
 
-static const VMStateDescription igb_vmstate_tx = {
-.name = "igb-tx",
+static const VMStateDescription igb_vmstate_tx_ctx = {
+.name = "igb-tx-ctx",
 .version_id = 1,
 .minimum_version_id = 1,
 .fields = (VMStateField[]) {
-VMSTATE_UINT16(vlan, struct igb_tx),
-VMSTATE_UINT16(mss, struct igb_tx),
-VMSTATE_BOOL(tse, struct igb_tx),
-VMSTATE_BOOL(ixsm, struct igb_tx),
-VMSTATE_BOOL(txsm, struct igb_tx),
+VMSTATE_UINT32(vlan_macip_lens, struct e1000_adv_tx_context_desc),
+VMSTATE_UINT32(seqnum_seed, struct e1000_adv_tx_context_desc),
+VMSTATE_UINT32(type_tucmd_mlhl, struct e1000_adv_tx_context_desc),
+VMSTATE_UINT32(mss_l4len_idx, struct e1000_adv_tx_context_desc),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription igb_vmstate_tx = {
+.name = "igb-tx",
+.version_id = 2,
+.minimum_version_id = 2,
+.fields = (VMStateField[]) {
+VMSTATE_STRUCT_ARRAY(ctx, struct igb_tx, 2, 0, igb_vmstate_tx_ctx,
+ struct e1000_adv_tx_context_desc),
+VMSTATE_UINT32(first_cmd_type_len, struct igb_tx),
+VMSTATE_UINT32(first_olinfo_status, struct igb_tx),
 VMSTATE_BOOL(first, struct igb_tx),
 VMSTATE_BOOL(skip_cp, struct igb_tx),
 VMSTATE_END_OF_LIST()
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index a7c7bfd..7708333 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -389,8 +389,10 @@ igb_rss_parse_packet(IGBCore *core, struct NetRxPkt *pkt, 
bool tx,
 static bool
 igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 {
-if (tx->tse) {
-if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, tx->mss)) {
+if (tx->first_cmd_type_len & E1000_ADVTXD_DCMD_TSE) {
+uint32_t idx = (tx->first_olinfo_status >> 4) & 1;
+uint32_t mss = tx->ctx[idx].mss_l4len_idx >> 16;
+if (!net_tx_pkt_build_vheader(tx->tx_pkt, true, true, mss)) {
 return false;
 }
 
@@ -399,13 +401,13 @@ igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 return true;
 }
 
-if (tx->txsm) {
+if (tx->first_olinfo_status & E1000_ADVTXD_POTS_TXSM) {
 if (!net_tx_pkt_build_vheader(tx->tx_pkt, false, true, 0)) {
 return false;
 }
 }
 
-if (tx->ixsm) {
+if (tx->first_olinfo_status & E1000_ADVTXD_POTS_IXSM) {
 net_tx_pkt_update_ip_hdr_checksum(tx->tx_pkt);
 }
 
@@ -527,7 +529,7 @@ igb_process_tx_desc(IGBCore *core,
 {
 struct e1000_adv_tx_context_desc *tx_ctx_desc;
 uint32_t cmd_type_len;
-uint32_t olinfo_status;
+uint32_t idx;
 uint64_t buffer_addr;
 uint16_t length;
 
@@ -538,20 +540,19 @@ igb_process_tx_desc(IGBCore *core,
 E1000_ADVTXD_DTYP_DATA) {
 /* advanced transmit data descriptor */
 if (tx->first) {
-olinfo_status = le32_to_cpu(tx_desc->read.olinfo_status);
-
-tx->tse = !!(cmd_type_len & E1000_ADVTXD_DCMD_TSE);
-tx->ixsm = !!(olinfo_status & E1000_ADVTXD_POTS_IXSM);
-tx->txsm = !!(olinfo_status & E1000_ADVTXD_POTS_TXSM);
-
+tx->first_cmd_type_len = cmd_type_len;
+tx->first_olinfo_status = 
le32_to_cpu(tx_desc->read.olinfo_status);
 tx->first = false;
 }
 } else if ((cmd_type_len & E1000_ADVTXD_DTYP_CTXT) ==
E1000_ADVTXD_DTYP_CTXT) {
 /* advanced transmit context descriptor */
 tx_ctx_desc = (struct e1000_adv_tx_context_desc *)tx_desc;
-tx->vlan = le32_to_cpu(tx_ctx_desc->vlan_macip_lens) >> 16;
-tx->mss = le32_to_cpu(tx_ctx_desc->mss_l4len_idx) >> 16;
+idx = (le32_to_cpu(tx_ctx_desc->mss_l4len_idx) >> 4) & 1;
+

[PULL 02/12] igb: Fix DMA requester specification for Tx packet

2023-03-27 Thread Jason Wang
From: Akihiko Odaki 

igb used to specify the PF as the DMA requester when reading Tx packets.
This caused Tx requests from VFs to be performed in the address space of
the PF, defeating the purpose of SR-IOV. Add some logic to change the
requester depending on the queue, which can be assigned to a VF.
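
The idea can be sketched roughly as follows. This is not the code from
the patch: the types are hypothetical stand-ins, and the mapping of a
queue to a pool by "queue % 8" is an assumption borrowed from the other
igb patches in this series.

```c
#include <stddef.h>

#define NUM_POOLS 8  /* assumption: eight VM pools */

struct device { int id; };  /* hypothetical stand-in for PCIDevice */

struct nic {
    struct device *pf;             /* physical function */
    struct device *vf[NUM_POOLS];  /* NULL where no VF is present */
    int vmdq_enabled;
};

/* Pick the device whose address space should be used for Tx DMA. */
static struct device *tx_requester(struct nic *n, unsigned queue)
{
    if (n->vmdq_enabled) {
        struct device *vf = n->vf[queue % NUM_POOLS];
        if (vf) {
            return vf;
        }
    }
    return n->pf;  /* fall back to the PF */
}
```

With such a helper, each packet is reset with the requester of the
queue it is sent from, so guest addresses in a VF's descriptors are
resolved in that VF's address space.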

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c |  6 +++---
 hw/net/igb_core.c| 13 -
 hw/net/net_tx_pkt.c  |  3 ++-
 hw/net/net_tx_pkt.h  |  3 ++-
 hw/net/vmxnet3.c |  4 ++--
 5 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 4d9679c..c0c09b6 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -765,7 +765,7 @@ e1000e_process_tx_desc(E1000ECore *core,
 }
 
 tx->skip_cp = false;
-net_tx_pkt_reset(tx->tx_pkt);
+net_tx_pkt_reset(tx->tx_pkt, core->owner);
 
 tx->sum_needed = 0;
 tx->cptse = 0;
@@ -3447,7 +3447,7 @@ e1000e_core_pci_uninit(E1000ECore *core)
 qemu_del_vm_change_state_handler(core->vmstate);
 
 for (i = 0; i < E1000E_NUM_QUEUES; i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt);
+net_tx_pkt_reset(core->tx[i].tx_pkt, core->owner);
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
 
@@ -3572,7 +3572,7 @@ static void e1000e_reset(E1000ECore *core, bool sw)
 e1000x_reset_mac_addr(core->owner_nic, core->mac, core->permanent_mac);
 
 for (i = 0; i < ARRAY_SIZE(core->tx); i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt);
+net_tx_pkt_reset(core->tx[i].tx_pkt, core->owner);
 memset(&core->tx[i].props, 0, sizeof(core->tx[i].props));
 core->tx[i].skip_cp = false;
 }
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 7708333..78d3073 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -523,6 +523,7 @@ igb_on_tx_done_update_stats(IGBCore *core, struct NetTxPkt 
*tx_pkt)
 
 static void
 igb_process_tx_desc(IGBCore *core,
+PCIDevice *dev,
 struct igb_tx *tx,
 union e1000_adv_tx_desc *tx_desc,
 int queue_index)
@@ -588,7 +589,7 @@ igb_process_tx_desc(IGBCore *core,
 
 tx->first = true;
 tx->skip_cp = false;
-net_tx_pkt_reset(tx->tx_pkt);
+net_tx_pkt_reset(tx->tx_pkt, dev);
 }
 }
 
@@ -803,6 +804,8 @@ igb_start_xmit(IGBCore *core, const IGB_TxRing *txr)
 d = core->owner;
 }
 
+net_tx_pkt_reset(txr->tx->tx_pkt, d);
+
 while (!igb_ring_empty(core, txi)) {
 base = igb_ring_head_descr(core, txi);
 
@@ -811,7 +814,7 @@ igb_start_xmit(IGBCore *core, const IGB_TxRing *txr)
 trace_e1000e_tx_descr((void *)(intptr_t)desc.read.buffer_addr,
   desc.read.cmd_type_len, desc.wb.status);
 
-igb_process_tx_desc(core, txr->tx, &desc, txi->idx);
+igb_process_tx_desc(core, d, txr->tx, &desc, txi->idx);
 igb_ring_advance(core, txi, 1);
 eic |= igb_txdesc_writeback(core, base, &desc, txi);
 }
@@ -3828,7 +3831,7 @@ igb_core_pci_realize(IGBCore*core,
 core->vmstate = qemu_add_vm_change_state_handler(igb_vm_state_change, 
core);
 
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
-net_tx_pkt_init(&core->tx[i].tx_pkt, core->owner, E1000E_MAX_TX_FRAGS);
+net_tx_pkt_init(&core->tx[i].tx_pkt, NULL, E1000E_MAX_TX_FRAGS);
 }
 
 net_rx_pkt_init(&core->rx_pkt);
@@ -3853,7 +3856,7 @@ igb_core_pci_uninit(IGBCore *core)
 qemu_del_vm_change_state_handler(core->vmstate);
 
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
-net_tx_pkt_reset(core->tx[i].tx_pkt);
+net_tx_pkt_reset(core->tx[i].tx_pkt, NULL);
 net_tx_pkt_uninit(core->tx[i].tx_pkt);
 }
 
@@ -4026,7 +4029,7 @@ static void igb_reset(IGBCore *core, bool sw)
 
 for (i = 0; i < ARRAY_SIZE(core->tx); i++) {
 tx = &core->tx[i];
-net_tx_pkt_reset(tx->tx_pkt);
+net_tx_pkt_reset(tx->tx_pkt, NULL);
 memset(tx->ctx, 0, sizeof(tx->ctx));
 tx->first = true;
 tx->skip_cp = false;
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 986a3ad..cb606cc 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -443,7 +443,7 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt)
 #endif
 }
 
-void net_tx_pkt_reset(struct NetTxPkt *pkt)
+void net_tx_pkt_reset(struct NetTxPkt *pkt, PCIDevice *pci_dev)
 {
 int i;
 
@@ -467,6 +467,7 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt)
   pkt->raw[i].iov_len, DMA_DIRECTION_TO_DEVICE, 0);
 }
 }
+pkt->pci_dev = pci_dev;
 pkt->raw_frags = 0;
 
 pkt->hdr_len = 0;
diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index f57b4e0..e5ce6f2 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -148,9 +148,10 @@ void net_tx_pkt_dump(struct NetTxPkt *pkt);
  * reset tx packet private context (needed to be called between packets)
  *
  * @pkt:   

[PULL 08/12] igb: implement VFRE and VFTE registers

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

Also introduce:
- Checks for RXDCTL/TXDCTL queue enable bits
- IGB_NUM_VM_POOLS enum (Sec 1.5: Table 1-7)
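
As a rough illustration of the checks listed above, a Tx queue is
usable only when three gates are open: the global transmit enable, the
per-pool VFTE bit (only relevant in VMDq mode), and the per-queue
enable bit. The struct below is a simplified stand-in for the register
file; the per-queue bitmap in particular is an assumption made to keep
the example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_VM_POOLS 8
#define BIT(n) (1u << (n))

/* Simplified, hypothetical view of the relevant register state. */
struct gates {
    bool     global_tx_enable;  /* TCTL.EN */
    bool     vmdq;              /* MRQC bit 0 */
    uint32_t vfte;              /* one bit per pool */
    uint32_t queue_enable;      /* one bit per queue (illustrative) */
};

static bool tx_enabled(const struct gates *g, unsigned qn)
{
    unsigned pool = qn % NUM_VM_POOLS;

    return g->global_tx_enable &&
           (!g->vmdq || (g->vfte & BIT(pool))) &&
           (g->queue_enable & BIT(qn));
}
```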

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c | 38 +++---
 hw/net/igb_core.h |  1 +
 hw/net/igb_regs.h |  3 +++
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 9ab90e8..753f17b 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -784,6 +784,18 @@ igb_txdesc_writeback(IGBCore *core, dma_addr_t base,
 return igb_tx_wb_eic(core, txi->idx);
 }
 
+static inline bool
+igb_tx_enabled(IGBCore *core, const E1000E_RingInfo *txi)
+{
+bool vmdq = core->mac[MRQC] & 1;
+uint16_t qn = txi->idx;
+uint16_t pool = qn % IGB_NUM_VM_POOLS;
+
+return (core->mac[TCTL] & E1000_TCTL_EN) &&
+(!vmdq || core->mac[VFTE] & BIT(pool)) &&
+(core->mac[TXDCTL0 + (qn * 16)] & E1000_TXDCTL_QUEUE_ENABLE);
+}
+
 static void
 igb_start_xmit(IGBCore *core, const IGB_TxRing *txr)
 {
@@ -793,8 +805,7 @@ igb_start_xmit(IGBCore *core, const IGB_TxRing *txr)
 const E1000E_RingInfo *txi = txr->i;
 uint32_t eic = 0;
 
-/* TODO: check if the queue itself is enabled too. */
-if (!(core->mac[TCTL] & E1000_TCTL_EN)) {
+if (!igb_tx_enabled(core, txi)) {
 trace_e1000e_tx_disabled();
 return;
 }
@@ -872,6 +883,9 @@ igb_can_receive(IGBCore *core)
 
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
 E1000E_RxRing rxr;
+if (!(core->mac[RXDCTL0 + (i * 16)] & E1000_RXDCTL_QUEUE_ENABLE)) {
+continue;
+}
 
 igb_rx_ring_init(core, &rxr, i);
 if (igb_ring_enabled(core, rxr.i) && igb_has_rxbufs(core, rxr.i, 1)) {
@@ -938,7 +952,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
struct eth_header *ehdr,
 
 if (core->mac[MRQC] & 1) {
 if (is_broadcast_ether_addr(ehdr->h_dest)) {
-for (i = 0; i < 8; i++) {
+for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
 if (core->mac[VMOLR0 + i] & E1000_VMOLR_BAM) {
 queues |= BIT(i);
 }
@@ -972,7 +986,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
struct eth_header *ehdr,
 f = ta_shift[(rctl >> E1000_RCTL_MO_SHIFT) & 3];
 f = (((ehdr->h_dest[5] << 8) | ehdr->h_dest[4]) >> f) & 0xfff;
 if (macp[f >> 5] & (1 << (f & 0x1f))) {
-for (i = 0; i < 8; i++) {
+for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
 if (core->mac[VMOLR0 + i] & E1000_VMOLR_ROMPE) {
 queues |= BIT(i);
 }
@@ -995,7 +1009,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
struct eth_header *ehdr,
 }
 }
 } else {
-for (i = 0; i < 8; i++) {
+for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
 if (core->mac[VMOLR0 + i] & E1000_VMOLR_AUPE) {
 mask |= BIT(i);
 }
@@ -1011,6 +1025,7 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
struct eth_header *ehdr,
 queues = BIT(def_pl >> E1000_VT_CTL_DEFAULT_POOL_SHIFT);
 }
 
+queues &= core->mac[VFRE];
 igb_rss_parse_packet(core, core->rx_pkt, external_tx != NULL, 
rss_info);
 if (rss_info->queue & 1) {
 queues <<= 8;
@@ -1571,7 +1586,8 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 e1000x_fcs_len(core->mac);
 
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
-if (!(queues & BIT(i))) {
+if (!(queues & BIT(i)) ||
+!(core->mac[RXDCTL0 + (i * 16)] & E1000_RXDCTL_QUEUE_ENABLE)) {
 continue;
 }
 
@@ -1977,9 +1993,16 @@ static void igb_set_vfmailbox(IGBCore *core, int index, 
uint32_t val)
 
 static void igb_vf_reset(IGBCore *core, uint16_t vfn)
 {
+uint16_t qn0 = vfn;
+uint16_t qn1 = vfn + IGB_NUM_VM_POOLS;
+
 /* disable Rx and Tx for the VF*/
-core->mac[VFTE] &= ~BIT(vfn);
+core->mac[RXDCTL0 + (qn0 * 16)] &= ~E1000_RXDCTL_QUEUE_ENABLE;
+core->mac[RXDCTL0 + (qn1 * 16)] &= ~E1000_RXDCTL_QUEUE_ENABLE;
+core->mac[TXDCTL0 + (qn0 * 16)] &= ~E1000_TXDCTL_QUEUE_ENABLE;
+core->mac[TXDCTL0 + (qn1 * 16)] &= ~E1000_TXDCTL_QUEUE_ENABLE;
 core->mac[VFRE] &= ~BIT(vfn);
+core->mac[VFTE] &= ~BIT(vfn);
 /* indicate VF reset to PF */
 core->mac[VFLRE] |= BIT(vfn);
 /* VFLRE and mailbox use the same interrupt cause */
@@ -3914,6 +3937,7 @@ igb_phy_reg_init[] = {
 static const uint32_t igb_mac_reg_init[] = {
 [LEDCTL]= 2 | (3 << 8) | BIT(15) | (6 << 16) | (7 << 24),
 [EEMNGCTL]  = BIT(31),
+[TXDCTL0]   = E1000_TXDCTL_QUEUE_ENABLE,
 [RXDCTL0]   = E1000_RXDCTL_QUEUE_ENABLE | (1 << 16),
 [RXDCTL1]   = 1 << 16,
 [RXDCTL2]   = 1 << 16,

[PULL 03/12] hw/net/net_tx_pkt: Ignore ECN bit

2023-03-27 Thread Jason Wang
From: Akihiko Odaki 

No segmentation should be performed if the GSO type is
VIRTIO_NET_HDR_GSO_NONE, even if the ECN bit is set.
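
A small standalone sketch of the masking, with the constants written
out using the values from the virtio specification. The helper name is
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined by the virtio specification. */
#define VIRTIO_NET_HDR_GSO_NONE  0
#define VIRTIO_NET_HDR_GSO_TCPV4 1
#define VIRTIO_NET_HDR_GSO_ECN   0x80

/* ECN is a flag OR-ed into gso_type, not a GSO type of its own, so it
 * has to be masked off before comparing against GSO_NONE. */
static bool needs_segmentation(uint8_t gso_type)
{
    uint8_t type = gso_type & (uint8_t)~VIRTIO_NET_HDR_GSO_ECN;

    return type != VIRTIO_NET_HDR_GSO_NONE;
}
```

Without the mask, a header carrying only the ECN flag compares unequal
to GSO_NONE and would be treated as a request for segmentation.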

Fixes: e263cd49c7 ("Packet abstraction for VMWARE network devices")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1544
Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 hw/net/net_tx_pkt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index cb606cc..efe80b1 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -796,11 +796,13 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool 
offload,
 {
 assert(pkt);
 
+uint8_t gso_type = pkt->virt_hdr.gso_type & ~VIRTIO_NET_HDR_GSO_ECN;
+
 /*
  * Since underlying infrastructure does not support IP datagrams longer
  * than 64K we should drop such packets and don't even try to send
  */
-if (VIRTIO_NET_HDR_GSO_NONE != pkt->virt_hdr.gso_type) {
+if (VIRTIO_NET_HDR_GSO_NONE != gso_type) {
 if (pkt->payload_len >
 ETH_MAX_IP_DGRAM_LEN -
 pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_len) {
@@ -808,7 +810,7 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool 
offload,
 }
 }
 
-if (offload || pkt->virt_hdr.gso_type == VIRTIO_NET_HDR_GSO_NONE) {
+if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
 if (!offload && pkt->virt_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
   pkt->payload_frags + 
NET_TX_PKT_PL_START_FRAG - 1,
-- 
2.7.4




[PULL 09/12] igb: check oversized packets for VMDq

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 753f17b..38aa459 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -921,12 +921,26 @@ igb_rx_l4_cso_enabled(IGBCore *core)
 return !!(core->mac[RXCSUM] & E1000_RXCSUM_TUOFLD);
 }
 
+static bool
+igb_rx_is_oversized(IGBCore *core, uint16_t qn, size_t size)
+{
+uint16_t pool = qn % IGB_NUM_VM_POOLS;
+bool lpe = !!(core->mac[VMOLR0 + pool] & E1000_VMOLR_LPE);
+int max_ethernet_lpe_size =
+core->mac[VMOLR0 + pool] & E1000_VMOLR_RLPML_MASK;
+int max_ethernet_vlan_size = 1522;
+
+return size > (lpe ? max_ethernet_lpe_size : max_ethernet_vlan_size);
+}
+
 static uint16_t igb_receive_assign(IGBCore *core, const struct eth_header 
*ehdr,
-   E1000E_RSSInfo *rss_info, bool *external_tx)
+   size_t size, E1000E_RSSInfo *rss_info,
+   bool *external_tx)
 {
 static const int ta_shift[] = { 4, 3, 2, 0 };
 uint32_t f, ra[2], *macp, rctl = core->mac[RCTL];
 uint16_t queues = 0;
+uint16_t oversized = 0;
 uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(ehdr)->h_tci) & VLAN_VID_MASK;
 bool accepted = false;
 int i;
@@ -1026,9 +1040,26 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
struct eth_header *ehdr,
 }
 
 queues &= core->mac[VFRE];
-igb_rss_parse_packet(core, core->rx_pkt, external_tx != NULL, 
rss_info);
-if (rss_info->queue & 1) {
-queues <<= 8;
+if (queues) {
+for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
+if ((queues & BIT(i)) && igb_rx_is_oversized(core, i, size)) {
+oversized |= BIT(i);
+}
+}
+/* 8.19.37 increment ROC if packet is oversized for all queues */
+if (oversized == queues) {
+trace_e1000x_rx_oversized(size);
+e1000x_inc_reg_if_not_full(core->mac, ROC);
+}
+queues &= ~oversized;
+}
+
+if (queues) {
+igb_rss_parse_packet(core, core->rx_pkt,
+ external_tx != NULL, rss_info);
+if (rss_info->queue & 1) {
+queues <<= 8;
+}
 }
 } else {
 switch (net_rx_pkt_get_packet_type(core->rx_pkt)) {
@@ -1576,7 +1607,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
e1000x_vlan_enabled(core->mac),
 core->mac[VET] & 0xffff);
 
-queues = igb_receive_assign(core, ehdr, &rss_info, external_tx);
+queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx);
 if (!queues) {
 trace_e1000e_rx_flt_dropped();
 return orig_size;
-- 
2.7.4




[PULL 12/12] igb: respect VMVIR and VMOLR for VLAN

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

Add support for stripping/inserting VLAN for VFs.

The checksum calculation had to move back into the for loop, since packet
data is pulled inside the loop based on the VLAN stripping decision made
for each VF.

net_rx_pkt_fix_l4_csum should be extended to accept a buffer instead for
igb; that is work for a future patch.
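
The Tx side of the decision can be sketched as below. The two VMVIR
action bits are placeholders chosen for the example and the 16-bit tag
mask is an assumption; the global "VLAN enabled" check that the patch
also applies is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for the per-pool VLAN action (assumed). */
#define VLANA_DEFAULT 0x40000000u  /* always insert the pool's default tag */
#define VLANA_NEVER   0x80000000u  /* never insert a tag */

struct vlan_decision {
    bool     insert;
    uint16_t tag;
};

/* Start from what the descriptor asked for, then let the per-pool
 * VMVIR setting override it when VMDq is active. */
static struct vlan_decision tx_vlan(bool vmdq, uint32_t vmvir,
                                    bool desc_wants_vlan, uint16_t desc_tag)
{
    struct vlan_decision d = { desc_wants_vlan, desc_tag };

    if (vmdq) {
        if (vmvir & VLANA_DEFAULT) {
            d.insert = true;
            d.tag = (uint16_t)(vmvir & 0xffff);
        } else if (vmvir & VLANA_NEVER) {
            d.insert = false;
        }
    }
    return d;
}
```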

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c | 62 +++
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 162ba8b..d733fed 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -386,6 +386,28 @@ igb_rss_parse_packet(IGBCore *core, struct NetRxPkt *pkt, 
bool tx,
 info->queue = E1000_RSS_QUEUE(&core->mac[RETA], info->hash);
 }
 
+static void
+igb_tx_insert_vlan(IGBCore *core, uint16_t qn, struct igb_tx *tx,
+uint16_t vlan, bool insert_vlan)
+{
+if (core->mac[MRQC] & 1) {
+uint16_t pool = qn % IGB_NUM_VM_POOLS;
+
+if (core->mac[VMVIR0 + pool] & E1000_VMVIR_VLANA_DEFAULT) {
+/* always insert default VLAN */
+insert_vlan = true;
+vlan = core->mac[VMVIR0 + pool] & 0xffff;
+} else if (core->mac[VMVIR0 + pool] & E1000_VMVIR_VLANA_NEVER) {
+insert_vlan = false;
+}
+}
+
+if (insert_vlan && e1000x_vlan_enabled(core->mac)) {
+net_tx_pkt_setup_vlan_header_ex(tx->tx_pkt, vlan,
+core->mac[VET] & 0xffff);
+}
+}
+
 static bool
 igb_setup_tx_offloads(IGBCore *core, struct igb_tx *tx)
 {
@@ -583,12 +605,11 @@ igb_process_tx_desc(IGBCore *core,
 
 if (cmd_type_len & E1000_TXD_CMD_EOP) {
 if (!tx->skip_cp && net_tx_pkt_parse(tx->tx_pkt)) {
-if (cmd_type_len & E1000_TXD_CMD_VLE) {
-idx = (tx->first_olinfo_status >> 4) & 1;
-uint16_t vlan = tx->ctx[idx].vlan_macip_lens >> 16;
-uint16_t vet = core->mac[VET] & 0xffff;
-net_tx_pkt_setup_vlan_header_ex(tx->tx_pkt, vlan, vet);
-}
+idx = (tx->first_olinfo_status >> 4) & 1;
+igb_tx_insert_vlan(core, queue_index, tx,
+tx->ctx[idx].vlan_macip_lens >> 16,
+!!(cmd_type_len & E1000_TXD_CMD_VLE));
+
 if (igb_tx_pkt_send(core, tx, queue_index)) {
 igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index);
 }
@@ -1547,6 +1568,20 @@ igb_write_packet_to_guest(IGBCore *core, struct NetRxPkt 
*pkt,
 igb_update_rx_stats(core, rxi, size, total_size);
 }
 
+static bool
+igb_rx_strip_vlan(IGBCore *core, const E1000E_RingInfo *rxi)
+{
+if (core->mac[MRQC] & 1) {
+uint16_t pool = rxi->idx % IGB_NUM_VM_POOLS;
+/* Sec 7.10.3.8: CTRL.VME is ignored, only VMOLR/RPLOLR is used */
+return (net_rx_pkt_get_packet_type(core->rx_pkt) == ETH_PKT_MCAST) ?
+core->mac[RPLOLR] & E1000_RPLOLR_STRVLAN :
+core->mac[VMOLR0 + pool] & E1000_VMOLR_STRVLAN;
+}
+
+return e1000x_vlan_enabled(core->mac);
+}
+
 static inline void
 igb_rx_fix_l4_csum(IGBCore *core, struct NetRxPkt *pkt)
 {
@@ -1627,10 +1662,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 
 ehdr = PKT_GET_ETH_HDR(filter_buf);
 net_rx_pkt_set_packet_type(core->rx_pkt, get_eth_packet_type(ehdr));
-
-net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs,
-   e1000x_vlan_enabled(core->mac),
-   core->mac[VET] & 0xffff);
+net_rx_pkt_set_protocols(core->rx_pkt, filter_buf, size);
 
 queues = igb_receive_assign(core, ehdr, size, &rss_info, external_tx);
 if (!queues) {
@@ -1638,9 +1670,6 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 return orig_size;
 }
 
-total_size = net_rx_pkt_get_total_len(core->rx_pkt) +
-e1000x_fcs_len(core->mac);
-
 for (i = 0; i < IGB_NUM_QUEUES; i++) {
 if (!(queues & BIT(i)) ||
 !(core->mac[RXDCTL0 + (i * 16)] & E1000_RXDCTL_QUEUE_ENABLE)) {
@@ -1649,6 +1678,13 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 
 igb_rx_ring_init(core, &rxr, i);
 
+net_rx_pkt_attach_iovec_ex(core->rx_pkt, iov, iovcnt, iov_ofs,
+   igb_rx_strip_vlan(core, rxr.i),
+   core->mac[VET] & 0xffff);
+
+total_size = net_rx_pkt_get_total_len(core->rx_pkt) +
+e1000x_fcs_len(core->mac);
+
 if (!igb_has_rxbufs(core, rxr.i, total_size)) {
 n |= E1000_ICS_RXO;
 trace_e1000e_rx_not_written_to_guest(rxr.i->idx);
-- 
2.7.4




[PULL 11/12] igb: implement VF Tx and Rx stats

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

Please note that the loopback counters for VM-to-VM traffic are not
implemented yet: VFGOTLBC, VFGPTLBC, VFGORLBC and VFGPRLBC.

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index fd61c6c..162ba8b 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -492,7 +492,7 @@ igb_tx_pkt_send(IGBCore *core, struct igb_tx *tx, int 
queue_index)
 }
 
 static void
-igb_on_tx_done_update_stats(IGBCore *core, struct NetTxPkt *tx_pkt)
+igb_on_tx_done_update_stats(IGBCore *core, struct NetTxPkt *tx_pkt, int qn)
 {
 static const int PTCregs[6] = { PTC64, PTC127, PTC255, PTC511,
 PTC1023, PTC1522 };
@@ -519,6 +519,13 @@ igb_on_tx_done_update_stats(IGBCore *core, struct NetTxPkt 
*tx_pkt)
 core->mac[GPTC] = core->mac[TPT];
 core->mac[GOTCL] = core->mac[TOTL];
 core->mac[GOTCH] = core->mac[TOTH];
+
+if (core->mac[MRQC] & 1) {
+uint16_t pool = qn % IGB_NUM_VM_POOLS;
+
+core->mac[PVFGOTC0 + (pool * 64)] += tot_len;
+core->mac[PVFGPTC0 + (pool * 64)]++;
+}
 }
 
 static void
@@ -583,7 +590,7 @@ igb_process_tx_desc(IGBCore *core,
 net_tx_pkt_setup_vlan_header_ex(tx->tx_pkt, vlan, vet);
 }
 if (igb_tx_pkt_send(core, tx, queue_index)) {
-igb_on_tx_done_update_stats(core, tx->tx_pkt);
+igb_on_tx_done_update_stats(core, tx->tx_pkt, queue_index);
 }
 }
 
@@ -1409,7 +1416,8 @@ igb_write_to_rx_buffers(IGBCore *core,
 }
 
 static void
-igb_update_rx_stats(IGBCore *core, size_t data_size, size_t data_fcs_size)
+igb_update_rx_stats(IGBCore *core, const E1000E_RingInfo *rxi,
+size_t data_size, size_t data_fcs_size)
 {
 e1000x_update_rx_total_stats(core->mac, data_size, data_fcs_size);
 
@@ -1425,6 +1433,16 @@ igb_update_rx_stats(IGBCore *core, size_t data_size, 
size_t data_fcs_size)
 default:
 break;
 }
+
+if (core->mac[MRQC] & 1) {
+uint16_t pool = rxi->idx % IGB_NUM_VM_POOLS;
+
+core->mac[PVFGORC0 + (pool * 64)] += data_size + 4;
+core->mac[PVFGPRC0 + (pool * 64)]++;
+if (net_rx_pkt_get_packet_type(core->rx_pkt) == ETH_PKT_MCAST) {
+core->mac[PVFMPRC0 + (pool * 64)]++;
+}
+}
 }
 
 static inline bool
@@ -1526,7 +1544,7 @@ igb_write_packet_to_guest(IGBCore *core, struct NetRxPkt 
*pkt,
 
 } while (desc_offset < total_size);
 
-igb_update_rx_stats(core, size, total_size);
+igb_update_rx_stats(core, rxi, size, total_size);
 }
 
 static inline void
-- 
2.7.4




[PULL 00/12] Net patches

2023-03-27 Thread Jason Wang
The following changes since commit e3debd5e7d0ce031356024878a0a18b9d109354a:

  Merge tag 'pull-request-2023-03-24' of https://gitlab.com/thuth/qemu into 
staging (2023-03-24 16:08:46 +)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to fba7c3b788dfcb99a3f9253f7d99cc0d217d6d3c:

  igb: respect VMVIR and VMOLR for VLAN (2023-03-28 13:10:55 +0800)




Akihiko Odaki (4):
  igb: Save more Tx states
  igb: Fix DMA requester specification for Tx packet
  hw/net/net_tx_pkt: Ignore ECN bit
  hw/net/net_tx_pkt: Align l3_hdr

Sriram Yagnaraman (8):
  MAINTAINERS: Add Sriram Yagnaraman as a igb reviewer
  igb: handle PF/VF reset properly
  igb: add ICR_RXDW
  igb: implement VFRE and VFTE registers
  igb: check oversized packets for VMDq
  igb: respect E1000_VMOLR_RSSE
  igb: implement VF Tx and Rx stats
  igb: respect VMVIR and VMOLR for VLAN

 MAINTAINERS  |   1 +
 hw/net/e1000e_core.c |   6 +-
 hw/net/e1000x_regs.h |   4 +
 hw/net/igb.c |  26 --
 hw/net/igb_core.c| 256 ++-
 hw/net/igb_core.h|   9 +-
 hw/net/igb_regs.h|   6 ++
 hw/net/net_tx_pkt.c  |  30 +++---
 hw/net/net_tx_pkt.h  |   3 +-
 hw/net/trace-events  |   2 +
 hw/net/vmxnet3.c |   4 +-
 11 files changed, 254 insertions(+), 93 deletions(-)





[PULL 04/12] hw/net/net_tx_pkt: Align l3_hdr

2023-03-27 Thread Jason Wang
From: Akihiko Odaki 

Align the l3_hdr member of NetTxPkt by defining it as a union of
ip_header, ip6_header, and an array of octets.
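
To see why the union helps, consider this reduced example. The header
struct and the array sizes are invented for the illustration and do not
match QEMU's definitions; only the technique is the same.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented header with 32-bit members, so it needs stricter alignment
 * than a byte array provides. */
struct demo_ip_header {
    uint8_t  ver_ihl;
    uint8_t  tos;
    uint16_t len;
    uint32_t src;
    uint32_t dst;
};

/* Before: the L3 header is a plain byte array and simply follows the
 * L2 buffer, at whatever offset that happens to end. */
struct pkt_bytes {
    uint8_t l2_hdr[18];
    uint8_t l3_hdr[64];
};

/* After: the union takes the alignment of its strictest member, so the
 * compiler pads after l2_hdr as needed. */
struct pkt_union {
    uint8_t l2_hdr[18];
    union {
        struct demo_ip_header ip;
        uint8_t octets[64];
    } l3_hdr;
};

/* Offset of l3_hdr modulo the header's required alignment. */
static size_t l3_misalignment_union(void)
{
    return offsetof(struct pkt_union, l3_hdr) %
           _Alignof(struct demo_ip_header);
}
```

Accessing the header through the union members also removes the need to
cast the byte array to a struct pointer.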

Fixes: e263cd49c7 ("Packet abstraction for VMWARE network devices")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1544
Signed-off-by: Akihiko Odaki 
Signed-off-by: Jason Wang 
---
 hw/net/net_tx_pkt.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index efe80b1..8dc8568 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -43,7 +43,11 @@ struct NetTxPkt {
 struct iovec *vec;
 
 uint8_t l2_hdr[ETH_MAX_L2_HDR_LEN];
-uint8_t l3_hdr[ETH_MAX_IP_DGRAM_LEN];
+union {
+struct ip_header ip;
+struct ip6_header ip6;
+uint8_t octets[ETH_MAX_IP_DGRAM_LEN];
+} l3_hdr;
 
 uint32_t payload_len;
 
@@ -89,16 +93,14 @@ void net_tx_pkt_update_ip_hdr_checksum(struct NetTxPkt *pkt)
 {
 uint16_t csum;
 assert(pkt);
-struct ip_header *ip_hdr;
-ip_hdr = pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_base;
 
-ip_hdr->ip_len = cpu_to_be16(pkt->payload_len +
+pkt->l3_hdr.ip.ip_len = cpu_to_be16(pkt->payload_len +
 pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_len);
 
-ip_hdr->ip_sum = 0;
-csum = net_raw_checksum((uint8_t *)ip_hdr,
+pkt->l3_hdr.ip.ip_sum = 0;
+csum = net_raw_checksum(pkt->l3_hdr.octets,
 pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_len);
-ip_hdr->ip_sum = cpu_to_be16(csum);
+pkt->l3_hdr.ip.ip_sum = cpu_to_be16(csum);
 }
 
 void net_tx_pkt_update_ip_checksums(struct NetTxPkt *pkt)
@@ -832,15 +834,14 @@ void net_tx_pkt_fix_ip6_payload_len(struct NetTxPkt *pkt)
 {
 struct iovec *l2 = &pkt->vec[NET_TX_PKT_L2HDR_FRAG];
 if (eth_get_l3_proto(l2, 1, l2->iov_len) == ETH_P_IPV6) {
-struct ip6_header *ip6 = (struct ip6_header *) pkt->l3_hdr;
 /*
  * TODO: if qemu would support >64K packets - add jumbo option check
  * something like that:
  * 'if (ip6->ip6_plen == 0 && !has_jumbo_option(ip6)) {'
  */
-if (ip6->ip6_plen == 0) {
+if (pkt->l3_hdr.ip6.ip6_plen == 0) {
 if (pkt->payload_len <= ETH_MAX_IP_DGRAM_LEN) {
-ip6->ip6_plen = htons(pkt->payload_len);
+pkt->l3_hdr.ip6.ip6_plen = htons(pkt->payload_len);
 }
 /*
  * TODO: if qemu would support >64K packets
-- 
2.7.4




[PULL 10/12] igb: respect E1000_VMOLR_RSSE

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

RSS for VFs is only enabled if VMOLR[n].RSSE is set.
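
The queue selection can be modelled in isolation as follows, assuming
eight pools and the mapping PQn = VFn + VQn*8 quoted in the diff below.
The function name and the rsse[] array, which stands in for the
per-pool VMOLR[n].RSSE bits, are made up for the example.

```c
#include <stdint.h>

#define NUM_VM_POOLS 8
#define BIT(n) (1u << (n))

/* queues: bitmap of candidate pools (bits 0..7).
 * rsse:   non-zero where the pool has RSS enabled.
 * rss_queue: queue index computed by RSS; only bit 0 matters here. */
static uint16_t apply_rss(uint16_t queues,
                          const uint8_t rsse[NUM_VM_POOLS],
                          unsigned rss_queue)
{
    if (rss_queue & 1) {
        for (unsigned i = 0; i < NUM_VM_POOLS; i++) {
            if ((queues & BIT(i)) && rsse[i]) {
                /* Move this pool from its first queue to its second. */
                queues |= BIT(i + NUM_VM_POOLS);
                queues &= ~BIT(i);
            }
        }
    }
    return queues;
}
```

Pools without RSSE keep receiving on their first queue regardless of
the hash result, which is the difference from the old unconditional
"queues <<= 8".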

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/igb_core.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 38aa459..fd61c6c 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1057,8 +1057,15 @@ static uint16_t igb_receive_assign(IGBCore *core, const 
struct eth_header *ehdr,
 if (queues) {
 igb_rss_parse_packet(core, core->rx_pkt,
  external_tx != NULL, rss_info);
+/* Sec 8.26.1: PQn = VFn + VQn*8 */
 if (rss_info->queue & 1) {
-queues <<= 8;
+for (i = 0; i < IGB_NUM_VM_POOLS; i++) {
+if ((queues & BIT(i)) &&
+(core->mac[VMOLR0 + i] & E1000_VMOLR_RSSE)) {
+queues |= BIT(i + IGB_NUM_VM_POOLS);
+queues &= ~BIT(i);
+}
+}
 }
 }
 } else {
-- 
2.7.4




[PULL 07/12] igb: add ICR_RXDW

2023-03-27 Thread Jason Wang
From: Sriram Yagnaraman 

IGB uses the RXDW ICR bit to indicate that an Rx descriptor has been
written back. This is the same as the RXT0 bit in older hardware.

Signed-off-by: Sriram Yagnaraman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000x_regs.h | 4 
 hw/net/igb_core.c| 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/e1000x_regs.h b/hw/net/e1000x_regs.h
index c0832fa..6d3c4c6 100644
--- a/hw/net/e1000x_regs.h
+++ b/hw/net/e1000x_regs.h
@@ -335,6 +335,7 @@
 #define E1000_ICR_RXDMT00x0010 /* rx desc min. threshold (0) */
 #define E1000_ICR_RXO   0x0040 /* rx overrun */
 #define E1000_ICR_RXT0  0x0080 /* rx timer intr (ring 0) */
+#define E1000_ICR_RXDW  0x0080 /* rx desc written back */
 #define E1000_ICR_MDAC  0x0200 /* MDIO access complete */
 #define E1000_ICR_RXCFG 0x0400 /* RX /c/ ordered set */
 #define E1000_ICR_GPI_EN0   0x0800 /* GP Int 0 */
@@ -378,6 +379,7 @@
 #define E1000_ICS_RXDMT0E1000_ICR_RXDMT0/* rx desc min. threshold */
 #define E1000_ICS_RXO   E1000_ICR_RXO   /* rx overrun */
 #define E1000_ICS_RXT0  E1000_ICR_RXT0  /* rx timer intr */
+#define E1000_ICS_RXDW  E1000_ICR_RXDW  /* rx desc written back */
 #define E1000_ICS_MDAC  E1000_ICR_MDAC  /* MDIO access complete */
 #define E1000_ICS_RXCFG E1000_ICR_RXCFG /* RX /c/ ordered set */
 #define E1000_ICS_GPI_EN0   E1000_ICR_GPI_EN0   /* GP Int 0 */
@@ -407,6 +409,7 @@
 #define E1000_IMS_RXDMT0E1000_ICR_RXDMT0/* rx desc min. threshold */
 #define E1000_IMS_RXO   E1000_ICR_RXO   /* rx overrun */
 #define E1000_IMS_RXT0  E1000_ICR_RXT0  /* rx timer intr */
+#define E1000_IMS_RXDW  E1000_ICR_RXDW  /* rx desc written back */
 #define E1000_IMS_MDAC  E1000_ICR_MDAC  /* MDIO access complete */
 #define E1000_IMS_RXCFG E1000_ICR_RXCFG /* RX /c/ ordered set */
 #define E1000_IMS_GPI_EN0   E1000_ICR_GPI_EN0   /* GP Int 0 */
@@ -441,6 +444,7 @@
 #define E1000_IMC_RXDMT0E1000_ICR_RXDMT0/* rx desc min. threshold */
 #define E1000_IMC_RXO   E1000_ICR_RXO   /* rx overrun */
 #define E1000_IMC_RXT0  E1000_ICR_RXT0  /* rx timer intr */
+#define E1000_IMC_RXDW  E1000_ICR_RXDW  /* rx desc written back */
 #define E1000_IMC_MDAC  E1000_ICR_MDAC  /* MDIO access complete */
 #define E1000_IMC_RXCFG E1000_ICR_RXCFG /* RX /c/ ordered set */
 #define E1000_IMC_GPI_EN0   E1000_ICR_GPI_EN0   /* GP Int 0 */
diff --git a/hw/net/igb_core.c b/hw/net/igb_core.c
index 6ba9696..9ab90e8 100644
--- a/hw/net/igb_core.c
+++ b/hw/net/igb_core.c
@@ -1583,7 +1583,7 @@ igb_receive_internal(IGBCore *core, const struct iovec 
*iov, int iovcnt,
 continue;
 }
 
-n |= E1000_ICR_RXT0;
+n |= E1000_ICR_RXDW;
 
 igb_rx_fix_l4_csum(core, core->rx_pkt);
 igb_write_packet_to_guest(core, core->rx_pkt, &rxr, &rss_info);
-- 
2.7.4




Re: [PATCH 5/5] target/riscv: Add pointer mask support for instruction fetch

2023-03-27 Thread liweiwei



On 2023/3/28 11:31, Richard Henderson wrote:

On 3/27/23 18:55, liweiwei wrote:


On 2023/3/28 02:04, Richard Henderson wrote:

On 3/27/23 03:00, Weiwei Li wrote:
@@ -1248,6 +1265,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
    __func__, address, access_type, mmu_idx);
  +    if (access_type == MMU_INST_FETCH) {
+    address = adjust_pc_address(env, address);
+    }


Why do you want to do this so late, as opposed to earlier in 
cpu_get_tb_cpu_state?


In this way, the pc for tb may be different from the reg pc. Then the 
pc register will be wrong if sync from tb.


Hmm, true.

But you certainly cannot adjust the address in tlb_fill, as you'll be 
producing different result for read/write and exec.  You could 
plausibly use a separate mmu_idx, but that's not ideal either.


The best solution might be to implement pc-relative translation 
(CF_PCREL).  At which point cpu_pc always has the correct results and 
we make relative adjustments to that.


I'm not very familiar with how CF_PCREL works currently. I'll try this 
way later.


Regards,

Weiwei Li




r~





Question: why no VMEXIT occur when I read/write emulated pci devices's bar by mmap inside guest userspace?

2023-03-27 Thread Wu Zongyong
Hi,

I created a VM with a virtual PCI device ("-net nic,model=e1000") and
attempted to read and write BAR0 of the virtual NIC from guest userspace
by mmap()ing "/sys/bus/pci/devices/:xx:xx.x/resource0".
I expected a VMEXIT to occur when I read or write the BAR, but I did not
see any VMEXIT event.
I am confused by this, and any detailed analysis of the MMIO handling
path would be appreciated.

Thanks
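
For readers who want to reproduce the access pattern described above, here is a minimal sketch of reading one 32-bit register through an mmap() of a sysfs resource file. The helper name bar_read32 and its error convention are hypothetical and not taken from the original mail. The code only shows the userspace side and makes no claim about when a VMEXIT is or is not generated.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the 32-bit value at byte offset `reg` of a
 * mappable file such as a sysfs PCI resourceN file.
 * Returns 0 on success and -1 on failure.
 */
static int bar_read32(const char *path, off_t reg, uint32_t *out)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base = reg & ~((off_t)page - 1);   /* mmap offsets must be page aligned */
    int fd = open(path, O_RDWR | O_SYNC);
    void *map;

    if (fd < 0) {
        return -1;
    }
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd);                               /* the mapping outlives the descriptor */
    if (map == MAP_FAILED) {
        return -1;
    }
    /* volatile forces a real load from the mapping for every access */
    *out = *(volatile uint32_t *)((volatile uint8_t *)map + (reg - base));
    munmap(map, (size_t)page);
    return 0;
}
```

Inside the guest this would be called with the resource0 path of the NIC, which requires root privileges.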




Re: [PATCH 1/5] target/riscv: Fix effective address for pointer mask

2023-03-27 Thread liweiwei


On 2023/3/28 11:18, Richard Henderson wrote:

On 3/27/23 19:48, liweiwei wrote:


On 2023/3/28 10:20, LIU Zhiwei wrote:


On 2023/3/27 18:00, Weiwei Li wrote:

Since pointer mask works on effective address, and the xl works on the
generation of effective address, so xl related calculation should be done
before pointer mask.


Incorrect. It has been done.

When updating the pm_mask,  we have already considered the env->xl.

You can see it in riscv_cpu_update_mask

    if (env->xl == MXL_RV32) {
    env->cur_pmmask = mask & UINT32_MAX;
    env->cur_pmbase = base & UINT32_MAX;
    } else {
    env->cur_pmmask = mask;
    env->cur_pmbase = base;
    }

Yeah, I missed this part. Then we should ensure cur_pmmask/base is 
updated when xl changes.


Is that even possible?  XL can change on priv level changes (SXL, UXL).


Yeah. Not possible, since only UXL is changeable currently, and SXL/UXL 
can only be changed at a higher privilege level.


So the recomputation for xl in write_mstatus() seems redundant.

Maybe there will be a way to change the current xl in future if misa.mxl 
becomes changeable.


Regards,

Weiwei Li




r~
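
The exchange above can be sketched as a small standalone model. The pm_update function mirrors the riscv_cpu_update_mask snippet quoted in the thread. The pm_apply step, which clears the masked bits and then ORs in the base, is an assumption about how the mask is applied to an effective address and is not quoted from the patch. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cur_pmmask/cur_pmbase fields of the CPU state. */
struct pm_state {
    uint64_t cur_pmmask;
    uint64_t cur_pmbase;
};

/* Mirrors the quoted logic: in RV32 mode only the low 32 bits are kept. */
static void pm_update(struct pm_state *s, bool rv32, uint64_t mask, uint64_t base)
{
    if (rv32) {
        s->cur_pmmask = mask & UINT32_MAX;
        s->cur_pmbase = base & UINT32_MAX;
    } else {
        s->cur_pmmask = mask;
        s->cur_pmbase = base;
    }
}

/* Assumed application step: drop the masked bits, then insert the base. */
static uint64_t pm_apply(const struct pm_state *s, uint64_t addr)
{
    return (addr & ~s->cur_pmmask) | s->cur_pmbase;
}
```

Because the truncation happens in pm_update, the question raised in the thread is whether that function is re-run every time the effective xl changes.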

Re: [PATCH 5/5] target/riscv: Add pointer mask support for instruction fetch

2023-03-27 Thread Richard Henderson

On 3/27/23 18:55, liweiwei wrote:


On 2023/3/28 02:04, Richard Henderson wrote:

On 3/27/23 03:00, Weiwei Li wrote:

@@ -1248,6 +1265,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
    __func__, address, access_type, mmu_idx);
  +    if (access_type == MMU_INST_FETCH) {
+    address = adjust_pc_address(env, address);
+    }


Why do you want to do this so late, as opposed to earlier in 
cpu_get_tb_cpu_state?


In this way, the pc for tb may be different from the reg pc. Then the pc register will be 
wrong if sync from tb.


Hmm, true.

But you certainly cannot adjust the address in tlb_fill, as you'll be producing different 
result for read/write and exec.  You could plausibly use a separate mmu_idx, but that's 
not ideal either.


The best solution might be to implement pc-relative translation (CF_PCREL).  At which 
point cpu_pc always has the correct results and we make relative adjustments to that.



r~



Re: [PATCH 1/5] target/riscv: Fix effective address for pointer mask

2023-03-27 Thread LIU Zhiwei



On 2023/3/28 11:18, Richard Henderson wrote:

On 3/27/23 19:48, liweiwei wrote:


On 2023/3/28 10:20, LIU Zhiwei wrote:


On 2023/3/27 18:00, Weiwei Li wrote:

Since pointer mask works on effective address, and the xl works on the
generation of effective address, so xl related calculation should be done
before pointer mask.


Incorrect. It has been done.

When updating the pm_mask,  we have already considered the env->xl.

You can see it in riscv_cpu_update_mask

    if (env->xl == MXL_RV32) {
    env->cur_pmmask = mask & UINT32_MAX;
    env->cur_pmbase = base & UINT32_MAX;
    } else {
    env->cur_pmmask = mask;
    env->cur_pmbase = base;
    }

Yeah, I missed this part. Then we should ensure cur_pmmask/base is 
updated when xl changes.


Is that even possible?  XL can change on priv level changes (SXL, UXL).


I think I have considered this.

https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg04366.html

Zhiwei




r~




Re: [PATCH] hw/loongarch/virt: Fix virt_to_phys_addr function

2023-03-27 Thread gaosong



On 2023/3/28 1:44 AM, Richard Henderson wrote:

On 3/27/23 04:23, Tianrui Zhao wrote:

The virt addr should mask TARGET_PHYS_ADDR_SPACE_BITS to
get the phys addr, and this is used by loading kernel elf.

Signed-off-by: Tianrui Zhao 
---
  hw/loongarch/virt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index b702c3f51e..f4bf14c1c8 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -399,7 +399,7 @@ static struct _loaderparams {
  static uint64_t cpu_loongarch_virt_to_phys(void *opaque, uint64_t addr)
  {
-    return addr & 0x1fffffffll;
+    return addr & MAKE_64BIT_MASK(0, TARGET_PHYS_ADDR_SPACE_BITS);
  }
    static int64_t load_kernel_info(void)


Looks correct.  Any idea where this 29-bit value originated?

We only considered loading the kernel into the 256M of low memory and did 
not consider using high memory.


Thanks.
Song Gao


Acked-by: Richard Henderson 

r~
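
To make the effect of the change concrete, here is a minimal standalone sketch comparing the old 29-bit mask with a mask built from the physical address width. MASK64 has the same shape as QEMU's MAKE_64BIT_MASK macro. The value 48 for the address width is an assumption made for this example, and the function names are hypothetical.

```c
#include <stdint.h>

/* Same shape as MAKE_64BIT_MASK(shift, length): `length` one bits starting at `shift`. */
#define MASK64(shift, length) (((~0ULL) >> (64 - (length))) << (shift))

/* Assumed value for illustration only; the real constant is target defined. */
#define PHYS_ADDR_SPACE_BITS 48

/* Old behaviour: keep only the low 29 bits of the virtual address. */
static uint64_t virt_to_phys_old(uint64_t addr)
{
    return addr & 0x1fffffffULL;
}

/* New behaviour: keep every bit that fits in the physical address space. */
static uint64_t virt_to_phys_new(uint64_t addr)
{
    return addr & MASK64(0, PHYS_ADDR_SPACE_BITS);
}
```

With these definitions, an ELF segment linked above the low-memory window keeps its upper physical address bits in the new version, while the old mask silently folds it back into low memory.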





Re: [PATCH 1/5] target/riscv: Fix effective address for pointer mask

2023-03-27 Thread Richard Henderson

On 3/27/23 19:48, liweiwei wrote:


On 2023/3/28 10:20, LIU Zhiwei wrote:


On 2023/3/27 18:00, Weiwei Li wrote:

Since pointer mask works on effective address, and the xl works on the
generation of effective address, so xl related calculation should be done
before pointer mask.


Incorrect. It has been done.

When updating the pm_mask,  we have already considered the env->xl.

You can see it in riscv_cpu_update_mask

    if (env->xl == MXL_RV32) {
    env->cur_pmmask = mask & UINT32_MAX;
    env->cur_pmbase = base & UINT32_MAX;
    } else {
    env->cur_pmmask = mask;
    env->cur_pmbase = base;
    }


Yeah, I missed this part. Then we should ensure cur_pmmask/base is updated when 
xl changes.


Is that even possible?  XL can change on priv level changes (SXL, UXL).


r~



Re: [PATCH 5/5] target/riscv: Add pointer mask support for instruction fetch

2023-03-27 Thread liweiwei


On 2023/3/28 10:31, LIU Zhiwei wrote:


On 2023/3/28 9:55, liweiwei wrote:


On 2023/3/28 02:04, Richard Henderson wrote:

On 3/27/23 03:00, Weiwei Li wrote:
@@ -1248,6 +1265,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
    __func__, address, access_type, mmu_idx);
  +    if (access_type == MMU_INST_FETCH) {
+    address = adjust_pc_address(env, address);
+    }


Why do you want to do this so late, as opposed to earlier in 
cpu_get_tb_cpu_state?


In this way, the pc for tb may be different from the reg pc. 

I don't understand.

Then the pc register will be wrong if sync from tb.


I think you should explain here why it is wrong.

Zhiwei


Assume the pc is 0x1fff and pmmask is 0xf000. If we adjust the pc in 
cpu_get_tb_cpu_state, then tb->pc will be 0x0fff.

If we then sync the pc from the tb via riscv_cpu_synchronize_from_tb(), 
the pc will be updated to 0x0fff in this case, which differs from the 
original value.


I have skipped many internal steps in the case above. Have I missed any 
critical condition, or misunderstood anything?


Regards,

Weiwei Li





Regards,

Weiwei Li




r~

[RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr

2023-03-27 Thread Song Gao
This patch includes:
- VINSGR2VR.{B/H/W/D};
- VPICKVE2GR.{B/H/W/D}[U];
- VREPLGR2VR.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  33 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 110 
 target/loongarch/insns.decode   |  30 ++
 3 files changed, 173 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ecf0c7b577..7255a2aa4f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -818,6 +818,21 @@ static void output_vvvv(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, v%d, v%d, v%d", a->vd, a->vj, a->vk, a->va);
 }
 
+static void output_vr_i(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, r%d, 0x%x", a->vd, a->rj, a->imm);
+}
+
+static void output_rv_i(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "r%d, v%d, 0x%x", a->rd, a->vj,  a->imm);
+}
+
+static void output_vr(DisasContext *ctx, arg_vr *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, r%d", a->vd, a->rj);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -1561,3 +1576,21 @@ INSN_LSX(vsetallnez_b, cv)
 INSN_LSX(vsetallnez_h, cv)
 INSN_LSX(vsetallnez_w, cv)
 INSN_LSX(vsetallnez_d, cv)
+
+INSN_LSX(vinsgr2vr_b,  vr_i)
+INSN_LSX(vinsgr2vr_h,  vr_i)
+INSN_LSX(vinsgr2vr_w,  vr_i)
+INSN_LSX(vinsgr2vr_d,  vr_i)
+INSN_LSX(vpickve2gr_b, rv_i)
+INSN_LSX(vpickve2gr_h, rv_i)
+INSN_LSX(vpickve2gr_w, rv_i)
+INSN_LSX(vpickve2gr_d, rv_i)
+INSN_LSX(vpickve2gr_bu,rv_i)
+INSN_LSX(vpickve2gr_hu,rv_i)
+INSN_LSX(vpickve2gr_wu,rv_i)
+INSN_LSX(vpickve2gr_du,rv_i)
+
+INSN_LSX(vreplgr2vr_b, vr)
+INSN_LSX(vreplgr2vr_h, vr)
+INSN_LSX(vreplgr2vr_w, vr)
+INSN_LSX(vreplgr2vr_d, vr)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7fc5c6c1d6..b2489537ef 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3221,3 +3221,113 @@ TRANS(vsetallnez_b, gen_cv, gen_helper_vsetallnez_b)
 TRANS(vsetallnez_h, gen_cv, gen_helper_vsetallnez_h)
 TRANS(vsetallnez_w, gen_cv, gen_helper_vsetallnez_w)
 TRANS(vsetallnez_d, gen_cv, gen_helper_vsetallnez_d)
+
+static bool trans_vinsgr2vr_b(DisasContext *ctx, arg_vr_i *a)
+{
+CHECK_SXE;
+tcg_gen_st8_i64(cpu_gpr[a->rj], cpu_env,
+offsetof(CPULoongArchState, fpr[a->vd].vreg.B(a->imm)));
+return true;
+}
+
+static bool trans_vinsgr2vr_h(DisasContext *ctx, arg_vr_i *a)
+{
+CHECK_SXE;
+tcg_gen_st16_i64(cpu_gpr[a->rj], cpu_env,
+offsetof(CPULoongArchState, fpr[a->vd].vreg.H(a->imm)));
+return true;
+}
+
+static bool trans_vinsgr2vr_w(DisasContext *ctx, arg_vr_i *a)
+{
+CHECK_SXE;
+tcg_gen_st32_i64(cpu_gpr[a->rj], cpu_env,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.W(a->imm)));
+return true;
+}
+
+static bool trans_vinsgr2vr_d(DisasContext *ctx, arg_vr_i *a)
+{
+CHECK_SXE;
+tcg_gen_st_i64(cpu_gpr[a->rj], cpu_env,
+   offsetof(CPULoongArchState, fpr[a->vd].vreg.D(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_b(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld8s_i64(cpu_gpr[a->rd], cpu_env,
+ offsetof(CPULoongArchState, fpr[a->vj].vreg.B(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_h(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld16s_i64(cpu_gpr[a->rd], cpu_env,
+  offsetof(CPULoongArchState, fpr[a->vj].vreg.H(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_w(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld32s_i64(cpu_gpr[a->rd], cpu_env,
+  offsetof(CPULoongArchState, fpr[a->vj].vreg.W(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_d(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld_i64(cpu_gpr[a->rd], cpu_env,
+   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_bu(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld8u_i64(cpu_gpr[a->rd], cpu_env,
+ offsetof(CPULoongArchState, fpr[a->vj].vreg.B(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_hu(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld16u_i64(cpu_gpr[a->rd], cpu_env,
+  offsetof(CPULoongArchState, fpr[a->vj].vreg.H(a->imm)));
+return true;
+}
+
+static bool trans_vpickve2gr_wu(DisasContext *ctx, arg_rv_i *a)
+{
+CHECK_SXE;
+tcg_gen_ld32u_i64(cpu_gpr[a->rd], cpu_env,
+  offsetof(CPULoongArchState, fpr[a->vj].vreg.W(a->imm)));
+return true;
+}
+
+static bool 
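
As a reading aid for the translators above, the following scalar sketch models the halfword variants: insert from a general register, signed pick, unsigned pick, and replicate. The vreg_t union and the function names are hypothetical stand-ins for this example. Host endianness, which the B()/H()/W()/D() accessors in the patch take care of, is ignored here.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector register model; only the H view is used below. */
typedef union {
    uint8_t  B[16];
    uint16_t H[8];
    uint32_t W[4];
    uint64_t D[2];
} vreg_t;

/* VINSGR2VR.H: store the low 16 bits of rj into element idx, leave the rest. */
static void vinsgr2vr_h(vreg_t *vd, uint64_t rj, unsigned idx)
{
    vd->H[idx & 7] = (uint16_t)rj;
}

/* VPICKVE2GR.H: load element idx and sign-extend it to 64 bits (like ld16s). */
static uint64_t vpickve2gr_h(const vreg_t *vj, unsigned idx)
{
    return (uint64_t)(int64_t)(int16_t)vj->H[idx & 7];
}

/* VPICKVE2GR.HU: load element idx and zero-extend it to 64 bits (like ld16u). */
static uint64_t vpickve2gr_hu(const vreg_t *vj, unsigned idx)
{
    return (uint64_t)vj->H[idx & 7];
}

/* VREPLGR2VR.H: broadcast the low 16 bits of rj to every element. */
static void vreplgr2vr_h(vreg_t *vd, uint64_t rj)
{
    for (int i = 0; i < 8; i++) {
        vd->H[i] = (uint16_t)rj;
    }
}
```

The only difference between the signed and unsigned pick is the extension applied after the load, which matches the ld16s/ld16u pairing in the patch.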

[RFC PATCH v2 16/44] target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}

2023-03-27 Thread Song Gao
This patch includes:
- VMADD.{B/H/W/D};
- VMSUB.{B/H/W/D};
- VMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  34 ++
 target/loongarch/helper.h   |  36 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 454 
 target/loongarch/insns.decode   |  34 ++
 target/loongarch/lsx_helper.c   | 114 +
 5 files changed, 672 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 48e6ef5309..980e6e6375 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1010,3 +1010,37 @@ INSN_LSX(vmulwod_h_bu_b,   vvv)
 INSN_LSX(vmulwod_w_hu_h,   vvv)
 INSN_LSX(vmulwod_d_wu_w,   vvv)
 INSN_LSX(vmulwod_q_du_d,   vvv)
+
+INSN_LSX(vmadd_b,  vvv)
+INSN_LSX(vmadd_h,  vvv)
+INSN_LSX(vmadd_w,  vvv)
+INSN_LSX(vmadd_d,  vvv)
+INSN_LSX(vmsub_b,  vvv)
+INSN_LSX(vmsub_h,  vvv)
+INSN_LSX(vmsub_w,  vvv)
+INSN_LSX(vmsub_d,  vvv)
+
+INSN_LSX(vmaddwev_h_b, vvv)
+INSN_LSX(vmaddwev_w_h, vvv)
+INSN_LSX(vmaddwev_d_w, vvv)
+INSN_LSX(vmaddwev_q_d, vvv)
+INSN_LSX(vmaddwod_h_b, vvv)
+INSN_LSX(vmaddwod_w_h, vvv)
+INSN_LSX(vmaddwod_d_w, vvv)
+INSN_LSX(vmaddwod_q_d, vvv)
+INSN_LSX(vmaddwev_h_bu,vvv)
+INSN_LSX(vmaddwev_w_hu,vvv)
+INSN_LSX(vmaddwev_d_wu,vvv)
+INSN_LSX(vmaddwev_q_du,vvv)
+INSN_LSX(vmaddwod_h_bu,vvv)
+INSN_LSX(vmaddwod_w_hu,vvv)
+INSN_LSX(vmaddwod_d_wu,vvv)
+INSN_LSX(vmaddwod_q_du,vvv)
+INSN_LSX(vmaddwev_h_bu_b,  vvv)
+INSN_LSX(vmaddwev_w_hu_h,  vvv)
+INSN_LSX(vmaddwev_d_wu_w,  vvv)
+INSN_LSX(vmaddwev_q_du_d,  vvv)
+INSN_LSX(vmaddwod_h_bu_b,  vvv)
+INSN_LSX(vmaddwod_w_hu_h,  vvv)
+INSN_LSX(vmaddwod_d_wu_w,  vvv)
+INSN_LSX(vmaddwod_q_du_d,  vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 437b47fa78..6bb273fefe 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -282,3 +282,39 @@ DEF_HELPER_FLAGS_4(vmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmulwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmadd_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmaddwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmaddwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 583b608cd2..29c7aca8f9 100644
--- 
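
The helper bodies are cut off in this archive copy, so the following is only a scalar sketch of what the instruction names suggest. It assumes VMADDWEV.H.B accumulates the widened product of the even-numbered byte elements into each halfword of the destination, and that VMADDWOD.H.B does the same with the odd-numbered elements. The function name and the array-based register model are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed semantics of VMADDW{EV/OD}.H.B:
 *   vd.H[i] += (int16_t)vj.B[2*i + odd] * (int16_t)vk.B[2*i + odd]
 * The result wraps modulo 2^16, as a 16-bit destination lane would.
 */
static void vmaddw_h_b(int16_t vd[8], const int8_t vj[16], const int8_t vk[16],
                       int odd)
{
    for (int i = 0; i < 8; i++) {
        int idx = 2 * i + (odd ? 1 : 0);
        /* widen both sources first so the product cannot overflow 8 bits */
        int32_t prod = (int32_t)vj[idx] * (int32_t)vk[idx];
        vd[i] = (int16_t)(uint16_t)((uint32_t)vd[i] + (uint32_t)prod);
    }
}
```

Under the same assumption, the [U] and .BU.B forms would differ only in whether each source element is sign-extended or zero-extended before the multiply.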

[RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick

2023-03-27 Thread Song Gao
This patch includes:
- VREPLVE[I].{B/H/W/D};
- VBSLL.V, VBSRL.V;
- VPACK{EV/OD}.{B/H/W/D};
- VPICK{EV/OD}.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  35 +
 target/loongarch/helper.h   |  18 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 154 
 target/loongarch/insns.decode   |  34 +
 target/loongarch/lsx_helper.c   |  92 
 5 files changed, 333 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 7255a2aa4f..c6cf782725 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -833,6 +833,11 @@ static void output_vr(DisasContext *ctx, arg_vr *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, r%d", a->vd, a->rj);
 }
 
+static void output_vvr(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, v%d, r%d", a->vd, a->vj, a->rk);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -1594,3 +1599,33 @@ INSN_LSX(vreplgr2vr_b, vr)
 INSN_LSX(vreplgr2vr_h, vr)
 INSN_LSX(vreplgr2vr_w, vr)
 INSN_LSX(vreplgr2vr_d, vr)
+
+INSN_LSX(vreplve_b,vvr)
+INSN_LSX(vreplve_h,vvr)
+INSN_LSX(vreplve_w,vvr)
+INSN_LSX(vreplve_d,vvr)
+INSN_LSX(vreplvei_b,   vv_i)
+INSN_LSX(vreplvei_h,   vv_i)
+INSN_LSX(vreplvei_w,   vv_i)
+INSN_LSX(vreplvei_d,   vv_i)
+
+INSN_LSX(vbsll_v,  vv_i)
+INSN_LSX(vbsrl_v,  vv_i)
+
+INSN_LSX(vpackev_b,vvv)
+INSN_LSX(vpackev_h,vvv)
+INSN_LSX(vpackev_w,vvv)
+INSN_LSX(vpackev_d,vvv)
+INSN_LSX(vpackod_b,vvv)
+INSN_LSX(vpackod_h,vvv)
+INSN_LSX(vpackod_w,vvv)
+INSN_LSX(vpackod_d,vvv)
+
+INSN_LSX(vpickev_b,vvv)
+INSN_LSX(vpickev_h,vvv)
+INSN_LSX(vpickev_w,vvv)
+INSN_LSX(vpickev_d,vvv)
+INSN_LSX(vpickod_b,vvv)
+INSN_LSX(vpickod_h,vvv)
+INSN_LSX(vpickod_w,vvv)
+INSN_LSX(vpickod_d,vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index cdc007a072..bf03a16afd 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -668,3 +668,21 @@ DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
 DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
+
+DEF_HELPER_4(vpackev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpackod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpickev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index b2489537ef..66cb67a19c 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3331,3 +3331,157 @@ TRANS(vreplgr2vr_b, gvec_dup, MO_8)
 TRANS(vreplgr2vr_h, gvec_dup, MO_16)
 TRANS(vreplgr2vr_w, gvec_dup, MO_32)
 TRANS(vreplgr2vr_d, gvec_dup, MO_64)
+
+static bool trans_vreplvei_b(DisasContext *ctx, arg_vv_i *a)
+{
+CHECK_SXE;
+tcg_gen_gvec_dup_mem(MO_8,vreg_full_offset(a->vd),
+ offsetof(CPULoongArchState,
+  fpr[a->vj].vreg.B((a->imm))),
+ 16, 16);
+return true;
+}
+
+static bool trans_vreplvei_h(DisasContext *ctx, arg_vv_i *a)
+{
+CHECK_SXE;
+tcg_gen_gvec_dup_mem(MO_16, vreg_full_offset(a->vd),
+ offsetof(CPULoongArchState,
+  fpr[a->vj].vreg.H((a->imm))),
+ 16, 16);
+return true;
+}
+static bool trans_vreplvei_w(DisasContext *ctx, arg_vv_i *a)
+{
+CHECK_SXE;
+tcg_gen_gvec_dup_mem(MO_32, vreg_full_offset(a->vd),
+ offsetof(CPULoongArchState,
+  fpr[a->vj].vreg.W((a->imm))),
+16, 16);
+return true;
+}
+static bool trans_vreplvei_d(DisasContext *ctx, arg_vv_i *a)
+{
+CHECK_SXE;
+tcg_gen_gvec_dup_mem(MO_64, vreg_full_offset(a->vd),
+ offsetof(CPULoongArchState,
+  fpr[a->vj].vreg.D((a->imm))),
+ 16, 16);
+return true;
+}
+
+static bool gen_vreplve(DisasContext *ctx, 
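
The trans_vreplvei_* functions above hand the work to tcg_gen_gvec_dup_mem, which replicates one element read from memory across the destination. A minimal scalar sketch of the word variant follows. The function name and the array-based register model are hypothetical, and the index is masked to the lane count as an assumption.

```c
#include <stdint.h>

/* VREPLVEI.W modelled on plain arrays: copy element `imm` of vj to all lanes of vd. */
static void vreplvei_w(uint32_t vd[4], const uint32_t vj[4], unsigned imm)
{
    uint32_t e = vj[imm & 3];   /* read first, so vd may alias vj */

    for (int i = 0; i < 4; i++) {
        vd[i] = e;
    }
}
```

Reading the source element before the loop matters when the destination and source registers are the same.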

[RFC PATCH v2 18/44] target/loongarch: Implement vsat

2023-03-27 Thread Song Gao
This patch includes:
- VSAT.{B/H/W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|   9 ++
 target/loongarch/helper.h   |   9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 105 
 target/loongarch/insns.decode   |  12 +++
 target/loongarch/lsx_helper.c   |  73 ++
 5 files changed, 208 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6e4f676a42..b04aefe3ed 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1061,3 +1061,12 @@ INSN_LSX(vmod_bu,  vvv)
 INSN_LSX(vmod_hu,  vvv)
 INSN_LSX(vmod_wu,  vvv)
 INSN_LSX(vmod_du,  vvv)
+
+INSN_LSX(vsat_b,   vv_i)
+INSN_LSX(vsat_h,   vv_i)
+INSN_LSX(vsat_w,   vv_i)
+INSN_LSX(vsat_d,   vv_i)
+INSN_LSX(vsat_bu,  vv_i)
+INSN_LSX(vsat_hu,  vv_i)
+INSN_LSX(vsat_wu,  vv_i)
+INSN_LSX(vsat_du,  vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e46f12cb65..6345b7ef9c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -335,3 +335,12 @@ DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vsat_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 46a18da6dd..7dfb3b33f6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2382,3 +2382,108 @@ TRANS(vmod_bu, gen_vvv, gen_helper_vmod_bu)
 TRANS(vmod_hu, gen_vvv, gen_helper_vmod_hu)
 TRANS(vmod_wu, gen_vvv, gen_helper_vmod_wu)
 TRANS(vmod_du, gen_vvv, gen_helper_vmod_du)
+
+static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+TCGv_vec t1;
+int64_t max  = (1l << imm) - 1;
+int64_t min =  ~max;
+
+t1 = tcg_temp_new_vec_matching(t);
+tcg_gen_dupi_vec(vece, t, min);
+tcg_gen_smax_vec(vece, t, a, t);
+tcg_gen_dupi_vec(vece, t1, max);
+tcg_gen_smin_vec(vece, t, t, t1);
+}
+
+static void do_vsat_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+  int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_smax_vec, INDEX_op_smin_vec, 0
+};
+static const GVecGen2i op[4] = {
+{
+.fniv = gen_vsat_s,
+.fnoi = gen_helper_vsat_b,
+.opt_opc = vecop_list,
+.vece = MO_8
+},
+{
+.fniv = gen_vsat_s,
+.fnoi = gen_helper_vsat_h,
+.opt_opc = vecop_list,
+.vece = MO_16
+},
+{
+.fniv = gen_vsat_s,
+.fnoi = gen_helper_vsat_w,
+.opt_opc = vecop_list,
+.vece = MO_32
+},
+{
+.fniv = gen_vsat_s,
+.fnoi = gen_helper_vsat_d,
+.opt_opc = vecop_list,
+.vece = MO_64
+},
+};
+
+tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(vsat_b, gvec_vv_i, MO_8, do_vsat_s)
+TRANS(vsat_h, gvec_vv_i, MO_16, do_vsat_s)
+TRANS(vsat_w, gvec_vv_i, MO_32, do_vsat_s)
+TRANS(vsat_d, gvec_vv_i, MO_64, do_vsat_s)
+
+static void gen_vsat_u(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+uint64_t max;
+
+max = (imm == 0x3f) ? UINT64_MAX : (1ul << (imm + 1)) - 1;
+
+tcg_gen_dupi_vec(vece, t, max);
+tcg_gen_umin_vec(vece, t, a, t);
+}
+
+static void do_vsat_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+   int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_umin_vec, 0
+};
+static const GVecGen2i op[4] = {
+{
+.fniv = gen_vsat_u,
+.fnoi = gen_helper_vsat_bu,
+.opt_opc = vecop_list,
+.vece = MO_8
+},
+{
+.fniv = gen_vsat_u,
+.fnoi = gen_helper_vsat_hu,
+.opt_opc = vecop_list,
+.vece = MO_16
+},
+{
+.fniv = gen_vsat_u,
+.fnoi = gen_helper_vsat_wu,
+.opt_opc = vecop_list,
+.vece = MO_32
+},
+{
+.fniv = gen_vsat_u,
+.fnoi 
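
The clamping bounds computed in gen_vsat_s and gen_vsat_u can be checked with a scalar model. The sketch below applies the same bounds to a single 64-bit lane. The function names are hypothetical, and the code uses fixed-width types and unsigned shifts instead of the 1l/1ul literals of the patch, so that imm = 63 is well defined on every host.

```c
#include <stdint.h>

/* Signed saturation: clamp x to [-(2^imm), 2^imm - 1]. */
static int64_t sat_s(int64_t x, unsigned imm)
{
    int64_t max = (int64_t)((UINT64_C(1) << imm) - 1);
    int64_t min = ~max;

    if (x < min) {
        return min;
    }
    return x > max ? max : x;
}

/* Unsigned saturation: clamp x to [0, 2^(imm + 1) - 1]. */
static uint64_t sat_u(uint64_t x, unsigned imm)
{
    uint64_t max = (imm == 0x3f) ? UINT64_MAX : (UINT64_C(1) << (imm + 1)) - 1;

    return x < max ? x : max;
}
```

With imm = 7 the signed form clamps to the int8_t range and the unsigned form to the uint8_t range, which is the familiar byte-saturation case.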

[RFC PATCH v2 43/44] target/loongarch: Implement vldi

2023-03-27 Thread Song Gao
This patch includes:
- VLDI.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|   7 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 142 
 target/loongarch/insns.decode   |   4 +
 3 files changed, 153 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8627908fc9..5c402d944d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -858,6 +858,11 @@ static void output_vrr(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, r%d, r%d", a->vd, a->rj, a->rk);
 }
 
+static void output_v_i(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, 0x%x", a->vd, a->imm);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -1143,6 +1148,8 @@ INSN_LSX(vmskltz_d,vv)
 INSN_LSX(vmskgez_b,vv)
 INSN_LSX(vmsknz_b, vv)
 
+INSN_LSX(vldi, v_i)
+
 INSN_LSX(vand_v,   vvv)
 INSN_LSX(vor_v,vvv)
 INSN_LSX(vxor_v,   vvv)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index ab896f8a9e..cb5aa9e4a9 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2606,6 +2606,148 @@ TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
 TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
 TRANS(vmsknz_b, gen_vv, gen_helper_vmsknz_b)
 
+#define EXPAND_BYTE(bit)  ((uint64_t)(bit ? 0xff : 0))
+
+static uint64_t vldi_get_value(DisasContext *ctx, uint32_t imm)
+{
+int mode;
+uint64_t data, t;
+
+/*
+ * imm bit [11:8] is mode, mode value is 0-12.
+ * other values are invalid.
+ */
+mode = (imm >> 8) & 0xf;
+t =  imm & 0xff;
+switch (mode) {
+case 0:
+/* data: {2{24'0, imm[7:0]}} */
+data =  (t << 32) | t ;
+break;
+case 1:
+/* data: {2{16'0, imm[7:0], 8'0}} */
+data = (t << 24) | (t << 8);
+break;
+case 2:
+/* data: {2{8'0, imm[7:0], 16'0}} */
+data = (t << 48) | (t << 16);
+break;
+case 3:
+/* data: {2{imm[7:0], 24'0}} */
+data = (t << 56) | (t << 24);
+break;
+case 4:
+/* data: {4{8'0, imm[7:0]}} */
+data = (t << 48) | (t << 32) | (t << 16) | t;
+break;
+case 5:
+/* data: {4{imm[7:0], 8'0}} */
+data = (t << 56) |(t << 40) | (t << 24) | (t << 8);
+break;
+case 6:
+/* data: {2{16'0, imm[7:0], 8'1}} */
+data = (t << 40) | ((uint64_t)0xff << 32) | (t << 8) | 0xff;
+break;
+case 7:
+/* data: {2{8'0, imm[7:0], 16'1}} */
+data = (t << 48) | ((uint64_t)0xffff << 32) | (t << 16) | 0xffff;
+break;
+case 8:
+/* data: {8{imm[7:0]}} */
+data =(t << 56) | (t << 48) | (t << 40) | (t << 32) |
+  (t << 24) | (t << 16) | (t << 8) | t;
+break;
+case 9:
+/* data: {{8{imm[7]}, ..., 8{imm[0]}}} */
+{
+uint64_t b0,b1,b2,b3,b4,b5,b6,b7;
+b0 = t& 0x1;
+b1 = (t & 0x2) >> 1;
+b2 = (t & 0x4) >> 2;
+b3 = (t & 0x8) >> 3;
+b4 = (t & 0x10) >> 4;
+b5 = (t & 0x20) >> 5;
+b6 = (t & 0x40) >> 6;
+b7 = (t & 0x80) >> 7;
+data = (EXPAND_BYTE(b7) << 56) |
+   (EXPAND_BYTE(b6) << 48) |
+   (EXPAND_BYTE(b5) << 40) |
+   (EXPAND_BYTE(b4) << 32) |
+   (EXPAND_BYTE(b3) << 24) |
+   (EXPAND_BYTE(b2) << 16) |
+   (EXPAND_BYTE(b1) <<  8) |
+   EXPAND_BYTE(b0);
+}
+break;
+case 10:
+/* data: {2{imm[7], ~imm[6], {5{imm[6]}}, imm[5:0], 19'0}} */
+{
+uint64_t b6, b7;
+uint64_t t0, t1;
+b6 = (imm & 0x40) >> 6;
+b7 = (imm & 0x80) >> 7;
+t0 = (imm & 0x3f);
+t1 = (b7 << 6) | ((1-b6) << 5) | (uint64_t)(b6 ? 0x1f : 0);
+data  = (t1 << 57) | (t0 << 51) | (t1 << 25) | (t0 << 19);
+}
+break;
+case 11:
+/* data: {32'0, imm[7], ~{imm[6]}, 5{imm[6]}, imm[5:0], 19'0} */
+{
+uint64_t b6,b7;
+uint64_t t0, t1;
+b6 = (imm & 0x40) >> 6;
+b7 = (imm & 0x80) >> 7;
+t0 = (imm & 0x3f);
+t1 = (b7 << 6) | ((1-b6) << 5) | (b6 ? 0x1f : 0);
+data = (t1 << 25) | (t0 << 19);
+}
+break;
+case 12:
+/* data: {imm[7], ~imm[6], 8{imm[6]}, imm[5:0], 48'0} */
+{
+uint64_t b6,b7;
+uint64_t t0, t1;
+b6 = (imm & 0x40) >> 6;
+b7 = (imm & 0x80) >> 7;
+t0 = (imm & 0x3f);
+t1 = (b7 << 9) | ((1-b6) << 8) | (b6 ? 0xff : 0);
+data = (t1 << 54) | (t0 << 48);
+
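
Two of the VLDI modes above are easy to verify in isolation: mode 8, which repeats the immediate byte eight times, and mode 9, which expands each immediate bit into a full byte. The sketch below is a standalone model of those two cases only. The function names are hypothetical, and the loop and the multiplication are a compact rewrite of the shifts spelled out in vldi_get_value.

```c
#include <stdint.h>

/* Mode 8, {8{imm[7:0]}}: repeat the byte in every byte lane of a 64-bit value. */
static uint64_t vldi_mode8(uint8_t t)
{
    return (uint64_t)t * UINT64_C(0x0101010101010101);
}

/* Mode 9, {8{imm[7]}, ..., 8{imm[0]}}: bit i of t becomes byte i (0x00 or 0xff). */
static uint64_t vldi_mode9(uint8_t t)
{
    uint64_t data = 0;

    for (int i = 0; i < 8; i++) {
        if (t & (1u << i)) {
            data |= UINT64_C(0xff) << (8 * i);
        }
    }
    return data;
}
```

Both functions return one 64-bit half of the 128-bit result.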

[RFC PATCH v2 30/44] target/loongarch: Implement vclo vclz

2023-03-27 Thread Song Gao
This patch includes:
- VCLO.{B/H/W/D};
- VCLZ.{B/H/W/D}.
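
For orientation, here is a scalar sketch of the byte variants. It assumes VCLZ counts leading zero bits and VCLO counts leading one bits in each element, as the mnemonics suggest. The function names are hypothetical and the code is not taken from the helper in this patch.

```c
#include <stdint.h>

/* Count leading zero bits of one byte element; returns 8 for x == 0. */
static unsigned clz8(uint8_t x)
{
    unsigned n = 0;

    for (unsigned bit = 0x80; bit != 0 && !(x & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Count leading one bits by counting the leading zeros of the complement. */
static unsigned clo8(uint8_t x)
{
    return clz8((uint8_t)~x);
}
```

Expressing clo in terms of clz keeps a single counting loop, and wider elements would follow the same pattern with a wider type.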

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  9 ++
 target/loongarch/helper.h   |  9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |  9 ++
 target/loongarch/insns.decode   |  9 ++
 target/loongarch/lsx_helper.c   | 31 +
 5 files changed, 67 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 405e8885cd..0c82a1d9d1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1258,3 +1258,12 @@ INSN_LSX(vssrarni_bu_h,vv_i)
 INSN_LSX(vssrarni_hu_w,vv_i)
 INSN_LSX(vssrarni_wu_d,vv_i)
 INSN_LSX(vssrarni_du_q,vv_i)
+
+INSN_LSX(vclo_b,   vv)
+INSN_LSX(vclo_h,   vv)
+INSN_LSX(vclo_w,   vv)
+INSN_LSX(vclo_d,   vv)
+INSN_LSX(vclz_b,   vv)
+INSN_LSX(vclz_h,   vv)
+INSN_LSX(vclz_w,   vv)
+INSN_LSX(vclz_d,   vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d602de390b..a7facc6bc1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -486,3 +486,12 @@ DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vclo_b, void, env, i32, i32)
+DEF_HELPER_3(vclo_h, void, env, i32, i32)
+DEF_HELPER_3(vclo_w, void, env, i32, i32)
+DEF_HELPER_3(vclo_d, void, env, i32, i32)
+DEF_HELPER_3(vclz_b, void, env, i32, i32)
+DEF_HELPER_3(vclz_h, void, env, i32, i32)
+DEF_HELPER_3(vclz_w, void, env, i32, i32)
+DEF_HELPER_3(vclz_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c732c43580..5d81c02103 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2785,3 +2785,12 @@ TRANS(vssrarni_bu_h, gen_vv_i, gen_helper_vssrarni_bu_h)
 TRANS(vssrarni_hu_w, gen_vv_i, gen_helper_vssrarni_hu_w)
 TRANS(vssrarni_wu_d, gen_vv_i, gen_helper_vssrarni_wu_d)
 TRANS(vssrarni_du_q, gen_vv_i, gen_helper_vssrarni_du_q)
+
+TRANS(vclo_b, gen_vv, gen_helper_vclo_b)
+TRANS(vclo_h, gen_vv, gen_helper_vclo_h)
+TRANS(vclo_w, gen_vv, gen_helper_vclo_w)
+TRANS(vclo_d, gen_vv, gen_helper_vclo_d)
+TRANS(vclz_b, gen_vv, gen_helper_vclz_b)
+TRANS(vclz_h, gen_vv, gen_helper_vclz_h)
+TRANS(vclz_w, gen_vv, gen_helper_vclz_w)
+TRANS(vclz_d, gen_vv, gen_helper_vclz_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index bb4b2a8632..7591ec1bab 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -959,3 +959,12 @@ vssrarni_bu_h0111 00110110 11000 1  . .   
@vv_ui4
 vssrarni_hu_w0111 00110110 11001 . . .@vv_ui5
 vssrarni_wu_d0111 00110110 1101 .. . .@vv_ui6
 vssrarni_du_q0111 00110110 111 ... . .@vv_ui7
+
+vclo_b   0111 00101001 11000 00000 ..... .....    @vv
+vclo_h   0111 00101001 11000 00001 ..... .....    @vv
+vclo_w   0111 00101001 11000 00010 ..... .....    @vv
+vclo_d   0111 00101001 11000 00011 ..... .....    @vv
+vclz_b   0111 00101001 11000 00100 ..... .....    @vv
+vclz_h   0111 00101001 11000 00101 ..... .....    @vv
+vclz_w   0111 00101001 11000 00110 ..... .....    @vv
+vclz_d   0111 00101001 11000 00111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 4b933f8a69..8ec479dc2d 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2170,3 +2170,34 @@ void HELPER(vssrarni_du_q)(CPULoongArchState *env,
 VSSRARNUI(vssrarni_bu_h, 16, B, H)
 VSSRARNUI(vssrarni_hu_w, 32, H, W)
 VSSRARNUI(vssrarni_wu_d, 64, W, D)
+
+#define DO_2OP(NAME, BIT, E, T, DO_OP)  \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{   \
+int i;  \
+VReg *Vd = &(env->fpr[vd].vreg);\
+VReg *Vj = &(env->fpr[vj].vreg);\
+\
+for (i = 0; i < LSX_LEN/BIT; i++)   \
+{   \
+Vd->E(i) = DO_OP((T)Vj->E(i));  \
+}   \
+}
+
+#define DO_CLO_B(N)  (clz32((uint8_t)~N) - 24)
+#define DO_CLO_H(N)  (clz32((uint16_t)~N) - 16)
+#define DO_CLO_W(N)  (clz32((uint32_t)~N))
+#define DO_CLO_D(N)  (clz64((uint64_t)~N))
+#define DO_CLZ_B(N)  (clz32(N) - 24)
+#define DO_CLZ_H(N)  (clz32(N) - 16)
+#define 
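The DO_CLO_*/DO_CLZ_* macros above derive per-element counts from a 32-bit or 64-bit count-leading-zeros primitive by subtracting the unused upper bits. A width-parameterised sketch of the same idea follows; `clz_width` and `clo_width` are hypothetical names for this illustration and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Count leading zeros of the low 'bits' bits of v (1 <= bits <= 64).
 * Returns 'bits' when those bits are all zero.
 */
unsigned clz_width(uint64_t v, unsigned bits)
{
    unsigned n = 0;
    uint64_t mask;

    for (mask = 1ULL << (bits - 1); mask != 0 && (v & mask) == 0; mask >>= 1) {
        n++;
    }
    return n;
}

/* Count leading ones: invert within the element width, then count zeros. */
unsigned clo_width(uint64_t v, unsigned bits)
{
    uint64_t lane_mask = (bits == 64) ? ~0ULL : ((1ULL << bits) - 1);

    return clz_width(~v & lane_mask, bits);
}
```

For an 8-bit element, `clo_width(0xF0, 8)` gives 4, the same value as `clz32((uint8_t)~0xF0) - 24` in the macro form.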

[RFC PATCH v2 00/44] Add LoongArch LSX instructions

2023-03-27 Thread Song Gao
Hi,

This series adds the LoongArch LSX instructions. Since LoongArch
Vol2 is not yet public, the series is posted with an 'RFC' title.

Testing:
For v2, the LoongArch LSX instructions were tested with RISU.
No problems have been found so far.

QEMU:
https://github.com/loongson/qemu/tree/tcg-old-abi-support-lsx
RISU:
https://github.com/loongson/risu/tree/loongarch-suport-lsx

Changes in v2:
  - Use gvec;
  - Fix instruction bugs;
  - Add set_fpr()/get_fpr() to replace cpu_fpr.

Thanks.
Song Gao

Song Gao (44):
  target/loongarch: Add LSX data type VReg
  target/loongarch: CPUCFG support LSX
  target/loongarch: meson.build support build LSX
  target/loongarch: Add CHECK_SXE maccro for check LSX enable
  target/loongarch: Implement vadd/vsub
  target/loongarch: Implement vaddi/vsubi
  target/loongarch: Implement vneg
  target/loongarch: Implement vsadd/vssub
  target/loongarch: Implement vhaddw/vhsubw
  target/loongarch: Implement vaddw/vsubw
  target/loongarch: Implement vavg/vavgr
  target/loongarch: Implement vabsd
  target/loongarch: Implement vadda
  target/loongarch: Implement vmax/vmin
  target/loongarch: Implement vmul/vmuh/vmulw{ev/od}
  target/loongarch: Implement vmadd/vmsub/vmaddw{ev/od}
  target/loongarch: Implement vdiv/vmod
  target/loongarch: Implement vsat
  target/loongarch: Implement vexth
  target/loongarch: Implement vsigncov
  target/loongarch: Implement vmskltz/vmskgez/vmsknz
  target/loongarch: Implement LSX logic instructions
  target/loongarch: Implement vsll vsrl vsra vrotr
  target/loongarch: Implement vsllwil vextl
  target/loongarch: Implement vsrlr vsrar
  target/loongarch: Implement vsrln vsran
  target/loongarch: Implement vsrlrn vsrarn
  target/loongarch: Implement vssrln vssran
  target/loongarch: Implement vssrlrn vssrarn
  target/loongarch: Implement vclo vclz
  target/loongarch: Implement vpcnt
  target/loongarch: Implement vbitclr vbitset vbitrev
  target/loongarch: Implement vfrstp
  target/loongarch: Implement LSX fpu arith instructions
  target/loongarch: Implement LSX fpu fcvt instructions
  target/loongarch: Implement vseq vsle vslt
  target/loongarch: Implement vfcmp
  target/loongarch: Implement vbitsel vset
  target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr
  target/loongarch: Implement vreplve vpack vpick
  target/loongarch: Implement vilvl vilvh vextrins vshuf
  target/loongarch: Implement vld vst
  target/loongarch: Implement vldi
  target/loongarch: Use {set/get}_gpr replace to cpu_fpr

 fpu/softfloat.c   |   55 +
 include/fpu/softfloat.h   |   27 +
 linux-user/loongarch64/signal.c   |4 +-
 target/loongarch/cpu.c|5 +-
 target/loongarch/cpu.h|   37 +-
 target/loongarch/disas.c  |  911 
 target/loongarch/fpu_helper.c |2 +-
 target/loongarch/gdbstub.c|4 +-
 target/loongarch/helper.h |  593 +++
 .../loongarch/insn_trans/trans_farith.c.inc   |   72 +-
 target/loongarch/insn_trans/trans_fcmp.c.inc  |   12 +-
 .../loongarch/insn_trans/trans_fmemory.c.inc  |   37 +-
 target/loongarch/insn_trans/trans_fmov.c.inc  |   31 +-
 target/loongarch/insn_trans/trans_lsx.c.inc   | 3724 +
 target/loongarch/insns.decode |  811 
 target/loongarch/internals.h  |1 +
 target/loongarch/lsx_helper.c | 3553 
 target/loongarch/machine.c|   34 +-
 target/loongarch/meson.build  |1 +
 target/loongarch/translate.c  |   38 +-
 20 files changed, 9901 insertions(+), 51 deletions(-)
 create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
 create mode 100644 target/loongarch/lsx_helper.c

-- 
2.31.1




[RFC PATCH v2 34/44] target/loongarch: Implement LSX fpu arith instructions

2023-03-27 Thread Song Gao
This patch includes:
- VF{ADD/SUB/MUL/DIV}.{S/D};
- VF{MADD/MSUB/NMADD/NMSUB}.{S/D};
- VF{MAX/MIN}.{S/D};
- VF{MAXA/MINA}.{S/D};
- VFLOGB.{S/D};
- VFCLASS.{S/D};
- VF{SQRT/RECIP/RSQRT}.{S/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/cpu.h  |   4 +
 target/loongarch/disas.c|  46 +
 target/loongarch/fpu_helper.c   |   2 +-
 target/loongarch/helper.h   |  41 +
 target/loongarch/insn_trans/trans_lsx.c.inc |  55 ++
 target/loongarch/insns.decode   |  43 +
 target/loongarch/internals.h|   1 +
 target/loongarch/lsx_helper.c   | 187 
 8 files changed, 378 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 2e5326f474..abbe79f783 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -55,6 +55,10 @@ FIELD(FCSR0, CAUSE, 24, 5)
 do { \
 (REG) = FIELD_DP32(REG, FCSR0, CAUSE, V); \
 } while (0)
+#define UPDATE_FP_CAUSE(REG, V) \
+do { \
+(REG) |= FIELD_DP32(0, FCSR0, CAUSE, V); \
+} while (0)
 
 #define GET_FP_ENABLES(REG)FIELD_EX32(REG, FCSR0, ENABLES)
 #define SET_FP_ENABLES(REG, V) \
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index be2bb9cc42..b57b284e49 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -807,6 +807,11 @@ static void output_vv(DisasContext *ctx, arg_vv *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, v%d", a->vd, a->vj);
 }
 
+static void output_vvvv(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, v%d, v%d, v%d", a->vd, a->vj, a->vk, a->va);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -1302,3 +1307,44 @@ INSN_LSX(vfrstp_b, vvv)
 INSN_LSX(vfrstp_h, vvv)
 INSN_LSX(vfrstpi_b,vv_i)
 INSN_LSX(vfrstpi_h,vv_i)
+
+INSN_LSX(vfadd_s,  vvv)
+INSN_LSX(vfadd_d,  vvv)
+INSN_LSX(vfsub_s,  vvv)
+INSN_LSX(vfsub_d,  vvv)
+INSN_LSX(vfmul_s,  vvv)
+INSN_LSX(vfmul_d,  vvv)
+INSN_LSX(vfdiv_s,  vvv)
+INSN_LSX(vfdiv_d,  vvv)
+
+INSN_LSX(vfmadd_s, vvvv)
+INSN_LSX(vfmadd_d, vvvv)
+INSN_LSX(vfmsub_s, vvvv)
+INSN_LSX(vfmsub_d, vvvv)
+INSN_LSX(vfnmadd_s,vvvv)
+INSN_LSX(vfnmadd_d,vvvv)
+INSN_LSX(vfnmsub_s,vvvv)
+INSN_LSX(vfnmsub_d,vvvv)
+
+INSN_LSX(vfmax_s,  vvv)
+INSN_LSX(vfmax_d,  vvv)
+INSN_LSX(vfmin_s,  vvv)
+INSN_LSX(vfmin_d,  vvv)
+
+INSN_LSX(vfmaxa_s, vvv)
+INSN_LSX(vfmaxa_d, vvv)
+INSN_LSX(vfmina_s, vvv)
+INSN_LSX(vfmina_d, vvv)
+
+INSN_LSX(vflogb_s, vv)
+INSN_LSX(vflogb_d, vv)
+
+INSN_LSX(vfclass_s,vv)
+INSN_LSX(vfclass_d,vv)
+
+INSN_LSX(vfsqrt_s, vv)
+INSN_LSX(vfsqrt_d, vv)
+INSN_LSX(vfrecip_s,vv)
+INSN_LSX(vfrecip_d,vv)
+INSN_LSX(vfrsqrt_s,vv)
+INSN_LSX(vfrsqrt_d,vv)
diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 4b9637210a..f6753c5875 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -33,7 +33,7 @@ void restore_fp_status(CPULoongArchState *env)
 set_flush_to_zero(0, &env->fp_status);
 }
 
-static int ieee_ex_to_loongarch(int xcpt)
+int ieee_ex_to_loongarch(int xcpt)
 {
 int ret = 0;
 if (xcpt & float_flag_invalid) {
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d8b783ebc7..2c59fb09c0 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -530,3 +530,44 @@ DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfadd_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfsub_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfsub_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmul_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmul_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfdiv_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfdiv_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(vfmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfmsub_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfnmsub_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_4(vfmax_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmax_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmin_s, void, env, i32, i32, i32)
+DEF_HELPER_4(vfmin_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfmaxa_s, void, 
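The cpu.h hunk of this patch adds UPDATE_FP_CAUSE, which ORs a value into the FCSR0 CAUSE field instead of overwriting it. The sketch below contrasts the two behaviours using plain shifts and masks in place of QEMU's FIELD_DP32; the function names are hypothetical, and the field position (bit 24, length 5) is taken from the `FIELD(FCSR0, CAUSE, 24, 5)` hunk header. That the OR form exists so that exceptions from several vector elements accumulate is an assumption, not something this excerpt states.

```c
#include <stdint.h>

#define FCSR0_CAUSE_SHIFT 24
#define FCSR0_CAUSE_LEN   5
#define FCSR0_CAUSE_MASK  (((1u << FCSR0_CAUSE_LEN) - 1) << FCSR0_CAUSE_SHIFT)

/* Overwrite the CAUSE field, leaving all other bits untouched. */
uint32_t set_fp_cause(uint32_t reg, uint32_t v)
{
    return (reg & ~FCSR0_CAUSE_MASK) |
           ((v << FCSR0_CAUSE_SHIFT) & FCSR0_CAUSE_MASK);
}

/* OR new cause bits into the field, keeping bits that are already set. */
uint32_t update_fp_cause(uint32_t reg, uint32_t v)
{
    return reg | ((v << FCSR0_CAUSE_SHIFT) & FCSR0_CAUSE_MASK);
}

/* Extract the CAUSE field. */
uint32_t get_fp_cause(uint32_t reg)
{
    return (reg & FCSR0_CAUSE_MASK) >> FCSR0_CAUSE_SHIFT;
}
```

Two successive updates with 0x1 and 0x4 leave 0x5 in the field, whereas two successive sets leave only 0x4.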

[RFC PATCH v2 09/44] target/loongarch: Implement vhaddw/vhsubw

2023-03-27 Thread Song Gao
This patch includes:
- VHADDW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU};
- VHSUBW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 17 
 target/loongarch/helper.h   | 17 
 target/loongarch/insn_trans/trans_lsx.c.inc | 17 
 target/loongarch/insns.decode   | 17 
 target/loongarch/lsx_helper.c   | 89 +
 5 files changed, 157 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b7f9320ba0..adfd693938 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -848,3 +848,20 @@ INSN_LSX(vssub_bu, vvv)
 INSN_LSX(vssub_hu, vvv)
 INSN_LSX(vssub_wu, vvv)
 INSN_LSX(vssub_du, vvv)
+
+INSN_LSX(vhaddw_h_b,   vvv)
+INSN_LSX(vhaddw_w_h,   vvv)
+INSN_LSX(vhaddw_d_w,   vvv)
+INSN_LSX(vhaddw_q_d,   vvv)
+INSN_LSX(vhaddw_hu_bu, vvv)
+INSN_LSX(vhaddw_wu_hu, vvv)
+INSN_LSX(vhaddw_du_wu, vvv)
+INSN_LSX(vhaddw_qu_du, vvv)
+INSN_LSX(vhsubw_h_b,   vvv)
+INSN_LSX(vhsubw_w_h,   vvv)
+INSN_LSX(vhsubw_d_w,   vvv)
+INSN_LSX(vhsubw_q_d,   vvv)
+INSN_LSX(vhsubw_hu_bu, vvv)
+INSN_LSX(vhsubw_wu_hu, vvv)
+INSN_LSX(vhsubw_du_wu, vvv)
+INSN_LSX(vhsubw_qu_du, vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 13390c07d6..040f12c92c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -134,3 +134,20 @@ DEF_HELPER_1(idle, void, env)
 /* LoongArch LSX  */
 DEF_HELPER_4(vadd_q, void, env, i32, i32, i32)
 DEF_HELPER_4(vsub_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vhaddw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhaddw_qu_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 0bf4759a0f..d8b8c2a5ea 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -128,3 +128,20 @@ TRANS(vssub_bu, gvec_vvv, MO_8, tcg_gen_gvec_ussub)
 TRANS(vssub_hu, gvec_vvv, MO_16, tcg_gen_gvec_ussub)
 TRANS(vssub_wu, gvec_vvv, MO_32, tcg_gen_gvec_ussub)
 TRANS(vssub_du, gvec_vvv, MO_64, tcg_gen_gvec_ussub)
+
+TRANS(vhaddw_h_b, gen_vvv, gen_helper_vhaddw_h_b)
+TRANS(vhaddw_w_h, gen_vvv, gen_helper_vhaddw_w_h)
+TRANS(vhaddw_d_w, gen_vvv, gen_helper_vhaddw_d_w)
+TRANS(vhaddw_q_d, gen_vvv, gen_helper_vhaddw_q_d)
+TRANS(vhaddw_hu_bu, gen_vvv, gen_helper_vhaddw_hu_bu)
+TRANS(vhaddw_wu_hu, gen_vvv, gen_helper_vhaddw_wu_hu)
+TRANS(vhaddw_du_wu, gen_vvv, gen_helper_vhaddw_du_wu)
+TRANS(vhaddw_qu_du, gen_vvv, gen_helper_vhaddw_qu_du)
+TRANS(vhsubw_h_b, gen_vvv, gen_helper_vhsubw_h_b)
+TRANS(vhsubw_w_h, gen_vvv, gen_helper_vhsubw_w_h)
+TRANS(vhsubw_d_w, gen_vvv, gen_helper_vhsubw_d_w)
+TRANS(vhsubw_q_d, gen_vvv, gen_helper_vhsubw_q_d)
+TRANS(vhsubw_hu_bu, gen_vvv, gen_helper_vhsubw_hu_bu)
+TRANS(vhsubw_wu_hu, gen_vvv, gen_helper_vhsubw_wu_hu)
+TRANS(vhsubw_du_wu, gen_vvv, gen_helper_vhsubw_du_wu)
+TRANS(vhsubw_qu_du, gen_vvv, gen_helper_vhsubw_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3a29f0a9ab..10a20858e5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -542,3 +542,20 @@ vssub_bu 0111 0100 11000 . . .
@vvv
 vssub_hu 0111 0100 11001 . . .@vvv
 vssub_wu 0111 0100 11010 . . .@vvv
 vssub_du 0111 0100 11011 . . .@vvv
+
+vhaddw_h_b   0111 0101 01000 . . .@vvv
+vhaddw_w_h   0111 0101 01001 . . .@vvv
+vhaddw_d_w   0111 0101 01010 . . .@vvv
+vhaddw_q_d   0111 0101 01011 . . .@vvv
+vhaddw_hu_bu 0111 0101 1 . . .@vvv
+vhaddw_wu_hu 0111 0101 10001 . . .@vvv
+vhaddw_du_wu 0111 0101 10010 . . .@vvv
+vhaddw_qu_du 0111 0101 10011 . . .@vvv
+vhsubw_h_b   0111 0101 01100 . . .@vvv
+vhsubw_w_h   0111 0101 01101 . . .@vvv
+vhsubw_d_w   0111 0101 01110 . . .  
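The helper bodies for this patch are cut off in the archive, so the following is only a reference sketch of what a widening horizontal add such as VHADDW.H.B computes. It assumes the usual LSX definition, in which each odd-indexed element of vj is added to the preceding even-indexed element of vk after widening; that pairing is an assumption here, and `vhaddw_h_b_ref` is a hypothetical name.

```c
#include <stdint.h>

#define LSX_BYTES 16

/*
 * Assumed semantics: vd.H[i] = sext(vj.B[2i+1]) + sext(vk.B[2i]).
 * The sum of two sign-extended bytes always fits in 16 bits.
 */
void vhaddw_h_b_ref(int16_t vd[LSX_BYTES / 2],
                    const int8_t vj[LSX_BYTES],
                    const int8_t vk[LSX_BYTES])
{
    int i;

    for (i = 0; i < LSX_BYTES / 2; i++) {
        vd[i] = (int16_t)((int16_t)vj[2 * i + 1] + (int16_t)vk[2 * i]);
    }
}
```

Because the operands are widened before the addition, 127 + 127 yields 254 rather than wrapping as a byte-sized add would.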

[RFC PATCH v2 10/44] target/loongarch: Implement vaddw/vsubw

2023-03-27 Thread Song Gao
This patch includes:
- VADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VSUBW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  43 +
 target/loongarch/helper.h   |  45 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 832 
 target/loongarch/insns.decode   |  43 +
 target/loongarch/lsx_helper.c   | 210 +
 5 files changed, 1173 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index adfd693938..8ee14916f3 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -865,3 +865,46 @@ INSN_LSX(vhsubw_hu_bu, vvv)
 INSN_LSX(vhsubw_wu_hu, vvv)
 INSN_LSX(vhsubw_du_wu, vvv)
 INSN_LSX(vhsubw_qu_du, vvv)
+
+INSN_LSX(vaddwev_h_b,  vvv)
+INSN_LSX(vaddwev_w_h,  vvv)
+INSN_LSX(vaddwev_d_w,  vvv)
+INSN_LSX(vaddwev_q_d,  vvv)
+INSN_LSX(vaddwod_h_b,  vvv)
+INSN_LSX(vaddwod_w_h,  vvv)
+INSN_LSX(vaddwod_d_w,  vvv)
+INSN_LSX(vaddwod_q_d,  vvv)
+INSN_LSX(vsubwev_h_b,  vvv)
+INSN_LSX(vsubwev_w_h,  vvv)
+INSN_LSX(vsubwev_d_w,  vvv)
+INSN_LSX(vsubwev_q_d,  vvv)
+INSN_LSX(vsubwod_h_b,  vvv)
+INSN_LSX(vsubwod_w_h,  vvv)
+INSN_LSX(vsubwod_d_w,  vvv)
+INSN_LSX(vsubwod_q_d,  vvv)
+
+INSN_LSX(vaddwev_h_bu, vvv)
+INSN_LSX(vaddwev_w_hu, vvv)
+INSN_LSX(vaddwev_d_wu, vvv)
+INSN_LSX(vaddwev_q_du, vvv)
+INSN_LSX(vaddwod_h_bu, vvv)
+INSN_LSX(vaddwod_w_hu, vvv)
+INSN_LSX(vaddwod_d_wu, vvv)
+INSN_LSX(vaddwod_q_du, vvv)
+INSN_LSX(vsubwev_h_bu, vvv)
+INSN_LSX(vsubwev_w_hu, vvv)
+INSN_LSX(vsubwev_d_wu, vvv)
+INSN_LSX(vsubwev_q_du, vvv)
+INSN_LSX(vsubwod_h_bu, vvv)
+INSN_LSX(vsubwod_w_hu, vvv)
+INSN_LSX(vsubwod_d_wu, vvv)
+INSN_LSX(vsubwod_q_du, vvv)
+
+INSN_LSX(vaddwev_h_bu_b,   vvv)
+INSN_LSX(vaddwev_w_hu_h,   vvv)
+INSN_LSX(vaddwev_d_wu_w,   vvv)
+INSN_LSX(vaddwev_q_du_d,   vvv)
+INSN_LSX(vaddwod_h_bu_b,   vvv)
+INSN_LSX(vaddwod_w_hu_h,   vvv)
+INSN_LSX(vaddwod_d_wu_w,   vvv)
+INSN_LSX(vaddwod_q_du_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 040f12c92c..566d9b6293 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -151,3 +151,48 @@ DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
 DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsubwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vaddwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vaddwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsubwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsubwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
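To make the difference between the signed and [U] forms listed in this patch concrete, here is a reference sketch of an even-element widening add on byte elements. It assumes that the EV forms take elements 0, 2, 4, ... of both sources and widen them before adding; the function names are hypothetical and the helper bodies themselves are not part of this excerpt.

```c
#include <stdint.h>

/* Assumed VADDWEV.H.B: sign-extend even bytes of vj and vk, then add. */
void vaddwev_h_b_ref(int16_t vd[8], const int8_t vj[16], const int8_t vk[16])
{
    int i;

    for (i = 0; i < 8; i++) {
        vd[i] = (int16_t)(vj[2 * i] + vk[2 * i]);
    }
}

/* Assumed VADDWEV.H.BU: zero-extend even bytes of vj and vk, then add. */
void vaddwev_h_bu_ref(uint16_t vd[8], const uint8_t vj[16], const uint8_t vk[16])
{
    int i;

    for (i = 0; i < 8; i++) {
        vd[i] = (uint16_t)(vj[2 * i] + vk[2 * i]);
    }
}
```

The same input byte pattern 0xFF in both sources gives -2 in the signed form and 510 in the unsigned form. Under the same assumption, the OD forms would index `2 * i + 1` instead.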

[RFC PATCH v2 13/44] target/loongarch: Implement vadda

2023-03-27 Thread Song Gao
This patch includes:
- VADDA.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  5 ++
 target/loongarch/helper.h   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 53 +
 target/loongarch/insns.decode   |  5 ++
 target/loongarch/lsx_helper.c   | 19 
 5 files changed, 87 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e98ea37793..1f61e67d1f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -934,3 +934,8 @@ INSN_LSX(vabsd_bu, vvv)
 INSN_LSX(vabsd_hu, vvv)
 INSN_LSX(vabsd_wu, vvv)
 INSN_LSX(vabsd_du, vvv)
+
+INSN_LSX(vadda_b,  vvv)
+INSN_LSX(vadda_h,  vvv)
+INSN_LSX(vadda_w,  vvv)
+INSN_LSX(vadda_d,  vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a2f197..37685ded2c 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -223,3 +223,8 @@ DEF_HELPER_FLAGS_4(vabsd_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vabsd_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vabsd_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vabsd_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vadda_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vadda_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vadda_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vadda_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 3a75347db1..a3fcb47c4f 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1269,3 +1269,56 @@ TRANS(vabsd_bu, gvec_vvv, MO_8, do_vabsd_u)
 TRANS(vabsd_hu, gvec_vvv, MO_16, do_vabsd_u)
 TRANS(vabsd_wu, gvec_vvv, MO_32, do_vabsd_u)
 TRANS(vabsd_du, gvec_vvv, MO_64, do_vabsd_u)
+
+static void gen_vadda(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t1, t2;
+
+t1 = tcg_temp_new_vec_matching(a);
+t2 = tcg_temp_new_vec_matching(b);
+
+tcg_gen_abs_vec(vece, t1, a);
+tcg_gen_abs_vec(vece, t2, b);
+tcg_gen_add_vec(vece, t, t1, t2);
+}
+
+static void do_vadda(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+ uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_abs_vec, INDEX_op_add_vec, 0
+};
+static const GVecGen3 op[4] = {
+{
+.fniv = gen_vadda,
+.fno = gen_helper_vadda_b,
+.opt_opc = vecop_list,
+.vece = MO_8
+},
+{
+.fniv = gen_vadda,
+.fno = gen_helper_vadda_h,
+.opt_opc = vecop_list,
+.vece = MO_16
+},
+{
+.fniv = gen_vadda,
+.fno = gen_helper_vadda_w,
+.opt_opc = vecop_list,
+.vece = MO_32
+},
+{
+.fniv = gen_vadda,
+.fno = gen_helper_vadda_d,
+.opt_opc = vecop_list,
+.vece = MO_64
+},
+};
+
+tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vadda_b, gvec_vvv, MO_8, do_vadda)
+TRANS(vadda_h, gvec_vvv, MO_16, do_vadda)
+TRANS(vadda_w, gvec_vvv, MO_32, do_vadda)
+TRANS(vadda_d, gvec_vvv, MO_64, do_vadda)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 825ddedf4d..6cb22f9297 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -628,3 +628,8 @@ vabsd_bu 0111 0110 00100 . . .
@vvv
 vabsd_hu 0111 0110 00101 . . .@vvv
 vabsd_wu 0111 0110 00110 . . .@vvv
 vabsd_du 0111 0110 00111 . . .@vvv
+
+vadda_b  0111 0101 11000 . . .@vvv
+vadda_h  0111 0101 11001 . . .@vvv
+vadda_w  0111 0101 11010 . . .@vvv
+vadda_d  0111 0101 11011 . . .@vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 18d566feaa..c28eb62cff 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -409,3 +409,22 @@ DO_VABSD_U(vabsd_bu, 8, uint8_t, B, DO_VABSD)
 DO_VABSD_U(vabsd_hu, 16, uint16_t, H, DO_VABSD)
 DO_VABSD_U(vabsd_wu, 32, uint32_t, W, DO_VABSD)
 DO_VABSD_U(vabsd_du, 64, uint64_t, D, DO_VABSD)
+
+#define DO_VABS(a)  ((a < 0) ? (-a) : (a))
+
+#define DO_VADDA(NAME, BIT, E, DO_OP)   \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{   \
+int i;  \
+VReg *Vd = (VReg *)vd;   
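The out-of-line helper is truncated here, but gen_vadda above shows the operation: take the absolute value of each source element and add the results. A scalar reference sketch for byte elements follows; `vadda_b_ref` is a hypothetical name, and the macro is written with extra parentheses compared with the DO_VABS in the patch.

```c
#include <stdint.h>

#define DO_VABS(a)  (((a) < 0) ? (-(a)) : (a))

/* vd.B[i] = |vj.B[i]| + |vk.B[i]|, truncated to the element width. */
void vadda_b_ref(int8_t vd[16], const int8_t vj[16], const int8_t vk[16])
{
    int i;

    for (i = 0; i < 16; i++) {
        /* Operands are promoted to int, so the sum itself cannot overflow. */
        vd[i] = (int8_t)(DO_VABS(vj[i]) + DO_VABS(vk[i]));
    }
}
```

For example, elements -3 and 4 produce 7, and -100 and -20 produce 120.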

[RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset

2023-03-27 Thread Song Gao
This patch includes:
- VBITSEL.V;
- VBITSELI.B;
- VSET{EQZ/NEZ}.V;
- VSETANYEQZ.{B/H/W/D};
- VSETALLNEZ.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 20 +++
 target/loongarch/helper.h   | 13 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 58 +
 target/loongarch/insns.decode   | 17 ++
 target/loongarch/lsx_helper.c   | 57 
 5 files changed, 165 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 64db01d2f9..ecf0c7b577 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -792,6 +792,12 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
 return true;\
 }
 
+static void output_cv(DisasContext *ctx, arg_cv *a,
+const char *mnemonic)
+{
+output(ctx, mnemonic, "fcc%d, v%d", a->cd, a->vj);
+}
+
 static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
 {
 output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
@@ -1541,3 +1547,17 @@ static bool trans_vfcmp_cond_##suffix(DisasContext *ctx, \
 
 LSX_FCMP_INSN(s)
 LSX_FCMP_INSN(d)
+
+INSN_LSX(vbitsel_v,vvvv)
+INSN_LSX(vbitseli_b,   vv_i)
+
+INSN_LSX(vseteqz_v,cv)
+INSN_LSX(vsetnez_v,cv)
+INSN_LSX(vsetanyeqz_b, cv)
+INSN_LSX(vsetanyeqz_h, cv)
+INSN_LSX(vsetanyeqz_w, cv)
+INSN_LSX(vsetanyeqz_d, cv)
+INSN_LSX(vsetallnez_b, cv)
+INSN_LSX(vsetallnez_h, cv)
+INSN_LSX(vsetallnez_w, cv)
+INSN_LSX(vsetallnez_d, cv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ef0b67349d..cdc007a072 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -655,3 +655,16 @@ DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
 DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vbitseli_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_3(vseteqz_v, void, env, i32, i32)
+DEF_HELPER_3(vsetnez_v, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_b, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_h, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_w, void, env, i32, i32)
+DEF_HELPER_3(vsetanyeqz_d, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
+DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 593b8b481d..7fc5c6c1d6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -65,6 +65,17 @@ static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
 return true;
 }
 
+static bool gen_cv(DisasContext *ctx, arg_cv *a,
+void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+TCGv_i32 vj = tcg_constant_i32(a->vj);
+TCGv_i32 cd = tcg_constant_i32(a->cd);
+
+CHECK_SXE;
+func(cpu_env, cd, vj);
+return true;
+}
+
 static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
  void (*func)(unsigned, uint32_t, uint32_t,
   uint32_t, uint32_t, uint32_t))
@@ -3163,3 +3174,50 @@ static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
 
 return true;
 }
+
+static bool trans_vbitsel_v(DisasContext *ctx, arg_vvvv *a)
+{
+CHECK_SXE;
+
+tcg_gen_gvec_bitsel(MO_64, vreg_full_offset(a->vd), vreg_full_offset(a->va),
+vreg_full_offset(a->vk), vreg_full_offset(a->vj),
+16, 16);
+return true;
+}
+
+static void gen_vbitseli(unsigned vece, TCGv_vec a, TCGv_vec b, int64_t imm)
+{
+TCGv_vec t;
+
+t = tcg_temp_new_vec_matching(a);
+tcg_gen_dupi_vec(vece, t, imm);
+tcg_gen_bitsel_vec(vece, a, a, t, b);
+}
+
+static bool trans_vbitseli_b(DisasContext *ctx, arg_vv_i *a)
+{
+static const GVecGen2i op = {
+   .fniv = gen_vbitseli,
+   .fnoi = gen_helper_vbitseli_b,
+   .vece = MO_8,
+   .load_dest = true
+};
+
+CHECK_SXE;
+
+tcg_gen_gvec_2i(vreg_full_offset(a->vd), vreg_full_offset(a->vj),
+16, 16, a->imm, &op);
+return true;
+}
+
+
+TRANS(vseteqz_v, gen_cv, gen_helper_vseteqz_v)
+TRANS(vsetnez_v, gen_cv, gen_helper_vsetnez_v)
+TRANS(vsetanyeqz_b, gen_cv, gen_helper_vsetanyeqz_b)
+TRANS(vsetanyeqz_h, gen_cv, gen_helper_vsetanyeqz_h)
+TRANS(vsetanyeqz_w, gen_cv, gen_helper_vsetanyeqz_w)
+TRANS(vsetanyeqz_d, gen_cv, gen_helper_vsetanyeqz_d)
+TRANS(vsetallnez_b, gen_cv, gen_helper_vsetallnez_b)
+TRANS(vsetallnez_h, gen_cv, gen_helper_vsetallnez_h)
+TRANS(vsetallnez_w, gen_cv, gen_helper_vsetallnez_w)
+TRANS(vsetallnez_d, gen_cv, gen_helper_vsetallnez_d)
diff --git 
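As a reading aid for the two bitsel translations above: TCG's bitsel takes a selector followed by the value used where selector bits are 1 and the value used where they are 0. The sketch below restates what the argument order in trans_vbitsel_v and gen_vbitseli implies, one lane at a time. The function names are hypothetical and this is not code from the patch.

```c
#include <stdint.h>

/* Generic bit select: take if_set where sel has 1 bits, if_clear elsewhere. */
uint64_t bitsel64(uint64_t sel, uint64_t if_set, uint64_t if_clear)
{
    return (if_set & sel) | (if_clear & ~sel);
}

/* VBITSEL.V on one 64-bit lane: va selects between vk (1 bits) and vj (0 bits). */
uint64_t vbitsel_lane(uint64_t vj, uint64_t vk, uint64_t va)
{
    return bitsel64(va, vk, vj);
}

/* VBITSELI.B on one byte: the old vd is the selector between imm and vj. */
uint8_t vbitseli_byte(uint8_t vd, uint8_t vj, uint8_t imm)
{
    return (uint8_t)((imm & vd) | (vj & (uint8_t)~vd));
}
```

The second function also shows why the GVecGen2i descriptor sets `.load_dest = true`: the previous contents of vd are an input to the operation.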

[RFC PATCH v2 32/44] target/loongarch: Implement vbitclr vbitset vbitrev

2023-03-27 Thread Song Gao
This patch includes:
- VBITCLR[I].{B/H/W/D};
- VBITSET[I].{B/H/W/D};
- VBITREV[I].{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 25 +
 target/loongarch/helper.h   | 25 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 25 +
 target/loongarch/insns.decode   | 25 +
 target/loongarch/lsx_helper.c   | 57 +
 5 files changed, 157 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0ca51de9d8..48c7ea47a4 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1272,3 +1272,28 @@ INSN_LSX(vpcnt_b,  vv)
 INSN_LSX(vpcnt_h,  vv)
 INSN_LSX(vpcnt_w,  vv)
 INSN_LSX(vpcnt_d,  vv)
+
+INSN_LSX(vbitclr_b,vvv)
+INSN_LSX(vbitclr_h,vvv)
+INSN_LSX(vbitclr_w,vvv)
+INSN_LSX(vbitclr_d,vvv)
+INSN_LSX(vbitclri_b,   vv_i)
+INSN_LSX(vbitclri_h,   vv_i)
+INSN_LSX(vbitclri_w,   vv_i)
+INSN_LSX(vbitclri_d,   vv_i)
+INSN_LSX(vbitset_b,vvv)
+INSN_LSX(vbitset_h,vvv)
+INSN_LSX(vbitset_w,vvv)
+INSN_LSX(vbitset_d,vvv)
+INSN_LSX(vbitseti_b,   vv_i)
+INSN_LSX(vbitseti_h,   vv_i)
+INSN_LSX(vbitseti_w,   vv_i)
+INSN_LSX(vbitseti_d,   vv_i)
+INSN_LSX(vbitrev_b,vvv)
+INSN_LSX(vbitrev_h,vvv)
+INSN_LSX(vbitrev_w,vvv)
+INSN_LSX(vbitrev_d,vvv)
+INSN_LSX(vbitrevi_b,   vv_i)
+INSN_LSX(vbitrevi_h,   vv_i)
+INSN_LSX(vbitrevi_w,   vv_i)
+INSN_LSX(vbitrevi_d,   vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 38e310512b..4622f788ee 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -500,3 +500,28 @@ DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
 DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
+
+DEF_HELPER_4(vbitclr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitclri_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitset_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitseti_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vbitrevi_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 59923eb1fa..6d3a804767 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2799,3 +2799,28 @@ TRANS(vpcnt_b, gen_vv, gen_helper_vpcnt_b)
 TRANS(vpcnt_h, gen_vv, gen_helper_vpcnt_h)
 TRANS(vpcnt_w, gen_vv, gen_helper_vpcnt_w)
 TRANS(vpcnt_d, gen_vv, gen_helper_vpcnt_d)
+
+TRANS(vbitclr_b, gen_vvv, gen_helper_vbitclr_b)
+TRANS(vbitclr_h, gen_vvv, gen_helper_vbitclr_h)
+TRANS(vbitclr_w, gen_vvv, gen_helper_vbitclr_w)
+TRANS(vbitclr_d, gen_vvv, gen_helper_vbitclr_d)
+TRANS(vbitclri_b, gen_vv_i, gen_helper_vbitclri_b)
+TRANS(vbitclri_h, gen_vv_i, gen_helper_vbitclri_h)
+TRANS(vbitclri_w, gen_vv_i, gen_helper_vbitclri_w)
+TRANS(vbitclri_d, gen_vv_i, gen_helper_vbitclri_d)
+TRANS(vbitset_b, gen_vvv, gen_helper_vbitset_b)
+TRANS(vbitset_h, gen_vvv, gen_helper_vbitset_h)
+TRANS(vbitset_w, gen_vvv, gen_helper_vbitset_w)
+TRANS(vbitset_d, gen_vvv, gen_helper_vbitset_d)
+TRANS(vbitseti_b, gen_vv_i, gen_helper_vbitseti_b)
+TRANS(vbitseti_h, gen_vv_i, gen_helper_vbitseti_h)
+TRANS(vbitseti_w, gen_vv_i, gen_helper_vbitseti_w)
+TRANS(vbitseti_d, gen_vv_i, gen_helper_vbitseti_d)
+TRANS(vbitrev_b, gen_vvv, gen_helper_vbitrev_b)
+TRANS(vbitrev_h, gen_vvv, gen_helper_vbitrev_h)
+TRANS(vbitrev_w, gen_vvv, gen_helper_vbitrev_w)
+TRANS(vbitrev_d, gen_vvv, gen_helper_vbitrev_d)
+TRANS(vbitrevi_b, gen_vv_i, gen_helper_vbitrevi_b)
+TRANS(vbitrevi_h, gen_vv_i, gen_helper_vbitrevi_h)
+TRANS(vbitrevi_w, gen_vv_i, gen_helper_vbitrevi_w)
+TRANS(vbitrevi_d, gen_vv_i, gen_helper_vbitrevi_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index f865e83da5..801c97714e 100644
--- a/target/loongarch/insns.decode
+++ 

[RFC PATCH v2 35/44] target/loongarch: Implement LSX fpu fcvt instructions

2023-03-27 Thread Song Gao
This patch includes:
- VFCVT{L/H}.{S.H/D.S};
- VFCVT.{H.S/S.D};
- VFRINT[{RNE/RZ/RP/RM}].{S/D};
- VFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
- VFTINT[RZ].{WU.S/LU.D};
- VFTINT[{RNE/RZ/RP/RM}].W.D;
- VFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
- VFFINT.{S.W/D.L}[U];
- VFFINT.S.L, VFFINT{L/H}.D.W.

Signed-off-by: Song Gao 
---
 fpu/softfloat.c |  55 +++
 include/fpu/softfloat.h |  27 ++
 target/loongarch/disas.c|  56 +++
 target/loongarch/helper.h   |  56 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  56 +++
 target/loongarch/insns.decode   |  56 +++
 target/loongarch/lsx_helper.c   | 369 
 7 files changed, 675 insertions(+)
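
Note on the rounding suffixes: RNE, RZ, RP and RM select round-to-nearest-even,
toward zero, toward +infinity and toward -infinity. The block below is a
minimal standalone sketch of those four modes for a float-to-integer
conversion. The names `to_int64_mode` and `enum round_mode` are hypothetical
and not part of this patch, and the sketch assumes a finite input well inside
the int64_t range; it does not model the saturation or exception flags that
softfloat handles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; they only mirror the RNE/RZ/RP/RM suffixes. */
enum round_mode { MODE_RNE, MODE_RZ, MODE_RP, MODE_RM };

/*
 * Convert x to int64_t under an explicit rounding mode.
 * Sketch only: x must be finite and well inside the int64_t range.
 */
static int64_t to_int64_mode(double x, enum round_mode mode)
{
    assert(x > -9.0e18 && x < 9.0e18);   /* also rejects NaN */

    int64_t t = (int64_t)x;              /* a C cast truncates toward zero */
    double frac = x - (double)t;         /* same sign as x, |frac| < 1 */

    switch (mode) {
    case MODE_RZ:                        /* toward zero */
        return t;
    case MODE_RP:                        /* toward +infinity */
        return frac > 0 ? t + 1 : t;
    case MODE_RM:                        /* toward -infinity */
        return frac < 0 ? t - 1 : t;
    case MODE_RNE:                       /* nearest, ties to even */
    default:
        if (frac > 0.5 || (frac == 0.5 && (t & 1))) {
            return t + 1;
        }
        if (frac < -0.5 || (frac == -0.5 && (t & 1))) {
            return t - 1;
        }
        return t;
    }
}
```

For example, 2.5 converts to 2 under RNE and RZ and to 3 under RP, while -2.25
converts to -3 under RM.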

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c7454c3eb1..79975c6b01 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2988,6 +2988,25 @@ float64 float64_round_to_int(float64 a, float_status *s)
 return float64_round_pack_canonical(&p, s);
 }
 
+#define FRINT_RM(rm, rmode, bits) \
+float ## bits float ## bits ## _round_to_int_ ## rm(  \
+ float ## bits a, float_status *s)\
+{ \
+FloatParts64 pa;   \
+float ## bits ## _unpack_canonical(&pa, a, s); \
+parts_round_to_int(&pa, rmode, 0, s, &float ## bits ## _params);\
+return float ## bits ## _round_pack_canonical(&pa, s);\
+}
+FRINT_RM(rne, float_round_nearest_even, 32)
+FRINT_RM(rm,  float_round_down, 32)
+FRINT_RM(rp,  float_round_up,   32)
+FRINT_RM(rz,  float_round_to_zero,  32)
+FRINT_RM(rne, float_round_nearest_even, 64)
+FRINT_RM(rm,  float_round_down, 64)
+FRINT_RM(rp,  float_round_up,   64)
+FRINT_RM(rz,  float_round_to_zero,  64)
+#undef FRINT_RM
+
 bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 {
 FloatParts64 p;
@@ -3349,6 +3368,42 @@ int32_t float64_to_int32_round_to_zero(float64 a, float_status *s)
 return float64_to_int32_scalbn(a, float_round_to_zero, 0, s);
 }
 
+#define FTINT_RM(rm, rmode, sbits, dbits) \
+int ## dbits ## _t float ## sbits ## _to_int ## dbits ## _ ## rm( \
+ float ## sbits a, float_status *s)   \
+{ \
+return float ## sbits ## _to_int ## dbits ## _scalbn(a, rmode, 0, s); \
+}
+FTINT_RM(rne, float_round_nearest_even, 32, 32)
+FTINT_RM(rm,  float_round_down, 32, 32)
+FTINT_RM(rp,  float_round_up,   32, 32)
+FTINT_RM(rz,  float_round_to_zero,  32, 32)
+FTINT_RM(rne, float_round_nearest_even, 64, 64)
+FTINT_RM(rm,  float_round_down, 64, 64)
+FTINT_RM(rp,  float_round_up,   64, 64)
+FTINT_RM(rz,  float_round_to_zero,  64, 64)
+
+FTINT_RM(rne, float_round_nearest_even, 32, 64)
+FTINT_RM(rm,  float_round_down, 32, 64)
+FTINT_RM(rp,  float_round_up,   32, 64)
+FTINT_RM(rz,  float_round_to_zero,  32, 64)
+#undef FTINT_RM
+
+int32_t float64_to_int32_round_up(float64 a, float_status *s)
+{
+return float64_to_int32_scalbn(a, float_round_up, 0, s);
+}
+
+int32_t float64_to_int32_round_down(float64 a, float_status *s)
+{
+return float64_to_int32_scalbn(a, float_round_down, 0, s);
+}
+
+int32_t float64_to_int32_round_nearest_even(float64 a, float_status *s)
+{
+return float64_to_int32_scalbn(a, float_round_nearest_even, 0, s);
+}
+
 int64_t float64_to_int64_round_to_zero(float64 a, float_status *s)
 {
 return float64_to_int64_scalbn(a, float_round_to_zero, 0, s);
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3dcf20e3a2..ebdbaa4ac8 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -559,6 +559,16 @@ int16_t float32_to_int16_round_to_zero(float32, float_status *status);
 int32_t float32_to_int32_round_to_zero(float32, float_status *status);
 int64_t float32_to_int64_round_to_zero(float32, float_status *status);
 
+int64_t float32_to_int64_rm(float32, float_status *status);
+int64_t float32_to_int64_rp(float32, float_status *status);
+int64_t float32_to_int64_rz(float32, float_status *status);
+int64_t float32_to_int64_rne(float32, float_status *status);
+
+int32_t float32_to_int32_rm(float32, float_status *status);
+int32_t float32_to_int32_rp(float32, float_status *status);
+int32_t float32_to_int32_rz(float32, float_status *status);
+int32_t float32_to_int32_rne(float32, float_status *status);
+
 uint16_t float32_to_uint16_scalbn(float32, FloatRoundMode, int, float_status *);
 uint32_t float32_to_uint32_scalbn(float32, FloatRoundMode, int, float_status *);
 uint64_t float32_to_uint64_scalbn(float32, FloatRoundMode, int, float_status *);
@@ -579,6 +589,10 @@ float128 float32_to_float128(float32, float_status *status);
 | Software IEC/IEEE single-precision operations.
 

[RFC PATCH v2 24/44] target/loongarch: Implement vsllwil vextl

2023-03-27 Thread Song Gao
This patch includes:
- VSLLWIL.{H.B/W.H/D.W};
- VSLLWIL.{HU.BU/WU.HU/DU.WU};
- VEXTL.Q.D, VEXTL.QU.DU.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  9 +
 target/loongarch/helper.h   |  9 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 21 +++
 target/loongarch/insns.decode   |  9 +
 target/loongarch/lsx_helper.c   | 40 +
 5 files changed, 88 insertions(+)
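
Note on VSLLWIL: the sketch below assumes the instruction widens each element
of the low half of the source (sign-extending for the signed forms,
zero-extending for the unsigned forms) and then shifts it left by the
immediate. The function names are hypothetical and plain arrays stand in for
the 128-bit vector registers; the helper body in this patch is truncated in
the archive, so treat this as an illustration of the assumed semantics rather
than the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed VSLLWIL.H.B: sign-extend the low 8 bytes to 16 bits, shift left. */
static void sllwil_h_b(int16_t dst[8], const int8_t src[16], unsigned imm)
{
    assert(imm < 8);                     /* ui3 immediate */
    for (int i = 0; i < 8; i++) {
        /* Shift as unsigned to avoid undefined behaviour on negatives. */
        uint16_t wide = (uint16_t)(int16_t)src[i];
        dst[i] = (int16_t)(uint16_t)(wide << imm);
    }
}

/* Assumed VSLLWIL.HU.BU: zero-extend instead of sign-extend. */
static void sllwil_hu_bu(uint16_t dst[8], const uint8_t src[16], unsigned imm)
{
    assert(imm < 8);
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint16_t)((uint16_t)src[i] << imm);
    }
}
```

With an immediate of 2, a source byte of -1 becomes -4 in the signed form,
while the same bit pattern 0xff becomes 0x03fc in the unsigned form.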

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f7d0fb4441..087cac10ad 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1139,3 +1139,12 @@ INSN_LSX(vrotri_b, vv_i)
 INSN_LSX(vrotri_h, vv_i)
 INSN_LSX(vrotri_w, vv_i)
 INSN_LSX(vrotri_d, vv_i)
+
+INSN_LSX(vsllwil_h_b,  vv_i)
+INSN_LSX(vsllwil_w_h,  vv_i)
+INSN_LSX(vsllwil_d_w,  vv_i)
+INSN_LSX(vextl_q_d,vv)
+INSN_LSX(vsllwil_hu_bu,vv_i)
+INSN_LSX(vsllwil_wu_hu,vv_i)
+INSN_LSX(vsllwil_du_wu,vv_i)
+INSN_LSX(vextl_qu_du,  vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 1eeb614427..0266b9a4ad 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -367,3 +367,12 @@ DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
 DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
 
 DEF_HELPER_FLAGS_4(vnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_4(vsllwil_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_d_w, void, env, i32, i32, i32)
+DEF_HELPER_3(vextl_q_d, void, env, i32, i32)
+DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 84c8d92ad6..fb40aaf5ad 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -39,6 +39,18 @@ static bool gen_vv(DisasContext *ctx, arg_vv *a,
 return true;
 }
 
+static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+TCGv_i32 vd = tcg_constant_i32(a->vd);
+TCGv_i32 vj = tcg_constant_i32(a->vj);
+TCGv_i32 imm = tcg_constant_i32(a->imm);
+
+CHECK_SXE;
+func(cpu_env, vd, vj, imm);
+return true;
+}
+
 static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
  void (*func)(unsigned, uint32_t, uint32_t,
   uint32_t, uint32_t, uint32_t))
@@ -2654,3 +2666,12 @@ TRANS(vrotri_b, gvec_vv_i, MO_8, tcg_gen_gvec_rotri)
 TRANS(vrotri_h, gvec_vv_i, MO_16, tcg_gen_gvec_rotri)
 TRANS(vrotri_w, gvec_vv_i, MO_32, tcg_gen_gvec_rotri)
 TRANS(vrotri_d, gvec_vv_i, MO_64, tcg_gen_gvec_rotri)
+
+TRANS(vsllwil_h_b, gen_vv_i, gen_helper_vsllwil_h_b)
+TRANS(vsllwil_w_h, gen_vv_i, gen_helper_vsllwil_w_h)
+TRANS(vsllwil_d_w, gen_vv_i, gen_helper_vsllwil_d_w)
+TRANS(vextl_q_d, gen_vv, gen_helper_vextl_q_d)
+TRANS(vsllwil_hu_bu, gen_vv_i, gen_helper_vsllwil_hu_bu)
+TRANS(vsllwil_wu_hu, gen_vv_i, gen_helper_vsllwil_wu_hu)
+TRANS(vsllwil_du_wu, gen_vv_i, gen_helper_vsllwil_du_wu)
+TRANS(vextl_qu_du, gen_vv, gen_helper_vextl_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7c0b0c4ac8..23dd338026 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -839,3 +839,12 @@ vrotri_b 0111 00101010 0 01 ... . .   @vv_ui3
 vrotri_h 0111 00101010 0 1  . .   @vv_ui4
 vrotri_w 0111 00101010 1 . . .@vv_ui5
 vrotri_d 0111 00101010 0001 .. . .@vv_ui6
+
+vsllwil_h_b  0111 0011 1 01 ... . .   @vv_ui3
+vsllwil_w_h  0111 0011 1 1  . .   @vv_ui4
+vsllwil_d_w  0111 0011 10001 . . .@vv_ui5
+vextl_q_d0111 0011 10010 0 . .@vv
+vsllwil_hu_bu0111 0011 11000 01 ... . .   @vv_ui3
+vsllwil_wu_hu0111 0011 11000 1  . .   @vv_ui4
+vsllwil_du_wu0111 0011 11001 . . .@vv_ui5
+vextl_qu_du  0111 0011 11010 0 . .@vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 198ab3088b..72efdd5a74 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1031,3 +1031,43 @@ void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
 Vd->B(i) = ~(Vj->B(i) | (uint8_t)imm);
 }
 }
+
+#define VSLLWIL(NAME, BIT, T1, T2, E1, E2)\
+void HELPER(NAME)(CPULoongArchState *env, \
+  uint32_t vd, uint32_t vj, uint32_t imm) \
+{ \
+int i; 

[RFC PATCH v2 21/44] target/loongarch: Implement vmskltz/vmskgez/vmsknz

2023-03-27 Thread Song Gao
This patch includes:
- VMSKLTZ.{B/H/W/D};
- VMSKGEZ.B;
- VMSKNZ.B.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|   7 ++
 target/loongarch/helper.h   |   7 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |   7 ++
 target/loongarch/insns.decode   |   7 ++
 target/loongarch/lsx_helper.c   | 130 
 5 files changed, 158 insertions(+)
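
Note on the mask helpers: do_vmskltz_b() gathers the sign bit of each byte of
a 64-bit lane into the low 8 bits using three shift-and-OR steps. The sketch
below pairs that fold with a naive per-byte loop so the two can be compared.
The names `mskltz_b_ref`, `mskltz_b_fold` and `check_mskltz_b` are
hypothetical; only the fold body follows the helper in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: bit i of the result is the sign bit of byte i. */
static uint64_t mskltz_b_ref(uint64_t val)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r |= ((val >> (8 * i + 7)) & 1) << i;
    }
    return r;
}

/* Shift-and-OR gather, same steps as do_vmskltz_b() in this patch. */
static uint64_t mskltz_b_fold(uint64_t val)
{
    uint64_t c = val & 0x8080808080808080ULL;  /* keep the 8 sign bits */
    c |= c << 7;    /* each byte now holds 2 adjacent sign bits */
    c |= c << 14;   /* ... 4 sign bits */
    c |= c << 28;   /* the top byte now holds all 8, in order */
    return c >> 56;
}

/* Abort if the two versions disagree for a given input. */
static void check_mskltz_b(uint64_t val)
{
    assert(mskltz_b_ref(val) == mskltz_b_fold(val));
}
```

For the input 0x0080008000800080 (bytes 0, 2, 4 and 6 negative) both versions
return 0x55.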

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 46e808c321..2725b827ee 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1084,3 +1084,10 @@ INSN_LSX(vsigncov_b,   vvv)
 INSN_LSX(vsigncov_h,   vvv)
 INSN_LSX(vsigncov_w,   vvv)
 INSN_LSX(vsigncov_d,   vvv)
+
+INSN_LSX(vmskltz_b,vv)
+INSN_LSX(vmskltz_h,vv)
+INSN_LSX(vmskltz_w,vv)
+INSN_LSX(vmskltz_d,vv)
+INSN_LSX(vmskgez_b,vv)
+INSN_LSX(vmsknz_b, vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a7394b2eb7..cc2f542278 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -358,3 +358,10 @@ DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_3(vmskltz_b, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_h, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
+DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
+DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
+DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 865485ea10..9ca3a23106 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2561,3 +2561,10 @@ TRANS(vsigncov_b, gvec_vvv, MO_8, do_vsigncov)
 TRANS(vsigncov_h, gvec_vvv, MO_16, do_vsigncov)
 TRANS(vsigncov_w, gvec_vvv, MO_32, do_vsigncov)
 TRANS(vsigncov_d, gvec_vvv, MO_64, do_vsigncov)
+
+TRANS(vmskltz_b, gen_vv, gen_helper_vmskltz_b)
+TRANS(vmskltz_h, gen_vv, gen_helper_vmskltz_h)
+TRANS(vmskltz_w, gen_vv, gen_helper_vmskltz_w)
+TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
+TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
+TRANS(vmsknz_b, gen_vv, gen_helper_vmsknz_b)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4233dd7404..47c1ef78a7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -783,3 +783,10 @@ vsigncov_b   0111 00010010 11100 . . .@vvv
 vsigncov_h   0111 00010010 11101 . . .@vvv
 vsigncov_w   0111 00010010 0 . . .@vvv
 vsigncov_d   0111 00010010 1 . . .@vvv
+
+vmskltz_b0111 00101001 11000 1 . .@vv
+vmskltz_h0111 00101001 11000 10001 . .@vv
+vmskltz_w0111 00101001 11000 10010 . .@vv
+vmskltz_d0111 00101001 11000 10011 . .@vv
+vmskgez_b0111 00101001 11000 10100 . .@vv
+vmsknz_b 0111 00101001 11000 11000 . .@vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b3a9b8cb66..f8916c06da 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -890,3 +890,133 @@ VSIGNCOV(vsigncov_b, 8, B, DO_SIGNCOV)
 VSIGNCOV(vsigncov_h, 16, H, DO_SIGNCOV)
 VSIGNCOV(vsigncov_w, 32, W, DO_SIGNCOV)
 VSIGNCOV(vsigncov_d, 64, D, DO_SIGNCOV)
+
+static uint64_t do_vmskltz_b(int64_t val)
+{
+uint64_t m = 0x8080808080808080ULL;
+uint64_t c =  val & m;
+c |= c << 7;
+c |= c << 14;
+c |= c << 28;
+return c >> 56;
+}
+
+void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+VReg temp;
+VReg *Vd = &(env->fpr[vd].vreg);
+VReg *Vj = &(env->fpr[vj].vreg);
+
+temp.D(0) = 0;
+temp.D(1) = 0;
+temp.H(0) = do_vmskltz_b(Vj->D(0));
+temp.H(0) |= (do_vmskltz_b(Vj->D(1)) << 8);
+Vd->D(0) = temp.D(0);
+Vd->D(1) = 0;
+}
+
+static uint64_t do_vmskltz_h(int64_t val)
+{
+uint64_t m = 0x8000800080008000ULL;
+uint64_t c =  val & m;
+c |= c << 15;
+c |= c << 30;
+return c >> 60;
+}
+
+void HELPER(vmskltz_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+VReg temp;
+VReg *Vd = &(env->fpr[vd].vreg);
+VReg *Vj = &(env->fpr[vj].vreg);
+
+temp.D(0) = 0;
+temp.D(1) = 0;
+temp.H(0) = do_vmskltz_h(Vj->D(0));
+temp.H(0) |= (do_vmskltz_h(Vj->D(1)) << 4);
+Vd->D(0) = temp.D(0);
+Vd->D(1) = 0;
+}
+
+static uint64_t do_vmskltz_w(int64_t val)
+{
+uint64_t m = 0x8000000080000000ULL;
+uint64_t c =  val & m;
+c |= c << 31;
+return c >> 62;
+}
+
+void HELPER(vmskltz_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+{
+VReg temp;
+VReg *Vd 

[RFC PATCH v2 29/44] target/loongarch: Implement vssrlrn vssrarn

2023-03-27 Thread Song Gao
This patch includes:
- VSSRLRN.{B.H/H.W/W.D};
- VSSRARN.{B.H/H.W/W.D};
- VSSRLRN.{BU.H/HU.W/WU.D};
- VSSRARN.{BU.H/HU.W/WU.D};
- VSSRLRNI.{B.H/H.W/W.D/D.Q};
- VSSRARNI.{B.H/H.W/W.D/D.Q};
- VSSRLRNI.{BU.H/HU.W/WU.D/DU.Q};
- VSSRARNI.{BU.H/HU.W/WU.D/DU.Q}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  30 ++
 target/loongarch/helper.h   |  30 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |  30 ++
 target/loongarch/insns.decode   |  30 ++
 target/loongarch/lsx_helper.c   | 362 
 5 files changed, 482 insertions(+)
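
Note on the VSSRARN family: the sketch below assumes the per-element behaviour
is an arithmetic right shift with rounding (the last bit shifted out is added
back), followed by signed saturation to the narrower type. It covers only one
16-bit to 8-bit element of the assumed VSSRARN.B.H form; the function names
are hypothetical and the helper code for this patch is not shown in the
archive, so this is an illustration rather than the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right with rounding: add back the last bit shifted out. */
static int16_t sra_round16(int16_t x, unsigned sh)
{
    assert(sh < 16);
    if (sh == 0) {
        return x;
    }
    /*
     * Relies on >> of a negative value being an arithmetic shift, which is
     * implementation-defined in C but holds on common compilers.
     */
    return (int16_t)((x >> sh) + ((x >> (sh - 1)) & 1));
}

/* Saturate a 16-bit value to the int8_t range. */
static int8_t sat_s8(int16_t x)
{
    if (x > INT8_MAX) {
        return INT8_MAX;
    }
    if (x < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)x;
}

/* One element of the assumed VSSRARN.B.H: shift amount taken modulo 16. */
static int8_t ssrarn_b_h_elem(int16_t x, unsigned sh)
{
    return sat_s8(sra_round16(x, sh % 16));
}
```

Under these assumptions, 5 shifted by 1 rounds to 3, -5 shifted by 1 rounds to
-2, and 1000 shifted by 2 (250) saturates to 127.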

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 426d30dc01..405e8885cd 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1228,3 +1228,33 @@ INSN_LSX(vssrani_bu_h, vv_i)
 INSN_LSX(vssrani_hu_w, vv_i)
 INSN_LSX(vssrani_wu_d, vv_i)
 INSN_LSX(vssrani_du_q, vv_i)
+
+INSN_LSX(vssrlrn_b_h,  vvv)
+INSN_LSX(vssrlrn_h_w,  vvv)
+INSN_LSX(vssrlrn_w_d,  vvv)
+INSN_LSX(vssrarn_b_h,  vvv)
+INSN_LSX(vssrarn_h_w,  vvv)
+INSN_LSX(vssrarn_w_d,  vvv)
+INSN_LSX(vssrlrn_bu_h, vvv)
+INSN_LSX(vssrlrn_hu_w, vvv)
+INSN_LSX(vssrlrn_wu_d, vvv)
+INSN_LSX(vssrarn_bu_h, vvv)
+INSN_LSX(vssrarn_hu_w, vvv)
+INSN_LSX(vssrarn_wu_d, vvv)
+
+INSN_LSX(vssrlrni_b_h, vv_i)
+INSN_LSX(vssrlrni_h_w, vv_i)
+INSN_LSX(vssrlrni_w_d, vv_i)
+INSN_LSX(vssrlrni_d_q, vv_i)
+INSN_LSX(vssrlrni_bu_h,vv_i)
+INSN_LSX(vssrlrni_hu_w,vv_i)
+INSN_LSX(vssrlrni_wu_d,vv_i)
+INSN_LSX(vssrlrni_du_q,vv_i)
+INSN_LSX(vssrarni_b_h, vv_i)
+INSN_LSX(vssrarni_h_w, vv_i)
+INSN_LSX(vssrarni_w_d, vv_i)
+INSN_LSX(vssrarni_d_q, vv_i)
+INSN_LSX(vssrarni_bu_h,vv_i)
+INSN_LSX(vssrarni_hu_w,vv_i)
+INSN_LSX(vssrarni_wu_d,vv_i)
+INSN_LSX(vssrarni_du_q,vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 7562f01ad6..d602de390b 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -456,3 +456,33 @@ DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrn_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarn_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlrni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 58f27d7f65..c732c43580 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2755,3 +2755,33 @@ TRANS(vssrani_bu_h, gen_vv_i, gen_helper_vssrani_bu_h)
 TRANS(vssrani_hu_w, gen_vv_i, gen_helper_vssrani_hu_w)
 TRANS(vssrani_wu_d, gen_vv_i, gen_helper_vssrani_wu_d)
 TRANS(vssrani_du_q, gen_vv_i, gen_helper_vssrani_du_q)
+
+TRANS(vssrlrn_b_h, gen_vvv, gen_helper_vssrlrn_b_h)
+TRANS(vssrlrn_h_w, gen_vvv, gen_helper_vssrlrn_h_w)
+TRANS(vssrlrn_w_d, gen_vvv, gen_helper_vssrlrn_w_d)
+TRANS(vssrarn_b_h, gen_vvv, gen_helper_vssrarn_b_h)
+TRANS(vssrarn_h_w, gen_vvv, gen_helper_vssrarn_h_w)
+TRANS(vssrarn_w_d, gen_vvv, gen_helper_vssrarn_w_d)
+TRANS(vssrlrn_bu_h, gen_vvv, gen_helper_vssrlrn_bu_h)
+TRANS(vssrlrn_hu_w, gen_vvv, gen_helper_vssrlrn_hu_w)
+TRANS(vssrlrn_wu_d, gen_vvv, gen_helper_vssrlrn_wu_d)
+TRANS(vssrarn_bu_h, gen_vvv, gen_helper_vssrarn_bu_h)
+TRANS(vssrarn_hu_w, gen_vvv, gen_helper_vssrarn_hu_w)
+TRANS(vssrarn_wu_d, gen_vvv, gen_helper_vssrarn_wu_d)
+
+TRANS(vssrlrni_b_h, gen_vv_i, gen_helper_vssrlrni_b_h)

[RFC PATCH v2 08/44] target/loongarch: Implement vsadd/vssub

2023-03-27 Thread Song Gao
This patch includes:
- VSADD.{B/H/W/D}[U];
- VSSUB.{B/H/W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 17 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 17 +
 target/loongarch/insns.decode   | 17 +
 3 files changed, 51 insertions(+)
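
Note on saturation: VSADD and VSSUB map directly onto the TCG
ssadd/usadd/sssub/ussub vector operations, which clamp each element to the
range of its type instead of wrapping. The sketch below shows that clamping
for a single 8-bit element. The function names are hypothetical and the code
is a scalar illustration, not what TCG generates.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v into [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Signed saturating add/subtract on one 8-bit element. */
static int8_t sadd_s8(int8_t a, int8_t b)
{
    return (int8_t)clamp_int(a + b, INT8_MIN, INT8_MAX);
}

static int8_t ssub_s8(int8_t a, int8_t b)
{
    return (int8_t)clamp_int(a - b, INT8_MIN, INT8_MAX);
}

/* Unsigned saturating add/subtract on one 8-bit element. */
static uint8_t sadd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)clamp_int(a + b, 0, UINT8_MAX);
}

static uint8_t ssub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)clamp_int(a - b, 0, UINT8_MAX);
}
```

For instance, 100 + 100 saturates to 127 in the signed case, and 1 - 2
saturates to 0 in the unsigned case.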

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5eabb8c47a..b7f9320ba0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -831,3 +831,20 @@ INSN_LSX(vneg_b,   vv)
 INSN_LSX(vneg_h,   vv)
 INSN_LSX(vneg_w,   vv)
 INSN_LSX(vneg_d,   vv)
+
+INSN_LSX(vsadd_b,  vvv)
+INSN_LSX(vsadd_h,  vvv)
+INSN_LSX(vsadd_w,  vvv)
+INSN_LSX(vsadd_d,  vvv)
+INSN_LSX(vsadd_bu, vvv)
+INSN_LSX(vsadd_hu, vvv)
+INSN_LSX(vsadd_wu, vvv)
+INSN_LSX(vsadd_du, vvv)
+INSN_LSX(vssub_b,  vvv)
+INSN_LSX(vssub_h,  vvv)
+INSN_LSX(vssub_w,  vvv)
+INSN_LSX(vssub_d,  vvv)
+INSN_LSX(vssub_bu, vvv)
+INSN_LSX(vssub_hu, vvv)
+INSN_LSX(vssub_wu, vvv)
+INSN_LSX(vssub_du, vvv)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index dc66e44a75..0bf4759a0f 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -111,3 +111,20 @@ TRANS(vneg_b, gvec_vv, MO_8, tcg_gen_gvec_neg)
 TRANS(vneg_h, gvec_vv, MO_16, tcg_gen_gvec_neg)
 TRANS(vneg_w, gvec_vv, MO_32, tcg_gen_gvec_neg)
 TRANS(vneg_d, gvec_vv, MO_64, tcg_gen_gvec_neg)
+
+TRANS(vsadd_b, gvec_vvv, MO_8, tcg_gen_gvec_ssadd)
+TRANS(vsadd_h, gvec_vvv, MO_16, tcg_gen_gvec_ssadd)
+TRANS(vsadd_w, gvec_vvv, MO_32, tcg_gen_gvec_ssadd)
+TRANS(vsadd_d, gvec_vvv, MO_64, tcg_gen_gvec_ssadd)
+TRANS(vsadd_bu, gvec_vvv, MO_8, tcg_gen_gvec_usadd)
+TRANS(vsadd_hu, gvec_vvv, MO_16, tcg_gen_gvec_usadd)
+TRANS(vsadd_wu, gvec_vvv, MO_32, tcg_gen_gvec_usadd)
+TRANS(vsadd_du, gvec_vvv, MO_64, tcg_gen_gvec_usadd)
+TRANS(vssub_b, gvec_vvv, MO_8, tcg_gen_gvec_sssub)
+TRANS(vssub_h, gvec_vvv, MO_16, tcg_gen_gvec_sssub)
+TRANS(vssub_w, gvec_vvv, MO_32, tcg_gen_gvec_sssub)
+TRANS(vssub_d, gvec_vvv, MO_64, tcg_gen_gvec_sssub)
+TRANS(vssub_bu, gvec_vvv, MO_8, tcg_gen_gvec_ussub)
+TRANS(vssub_hu, gvec_vvv, MO_16, tcg_gen_gvec_ussub)
+TRANS(vssub_wu, gvec_vvv, MO_32, tcg_gen_gvec_ussub)
+TRANS(vssub_du, gvec_vvv, MO_64, tcg_gen_gvec_ussub)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d90798be11..3a29f0a9ab 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -525,3 +525,20 @@ vneg_b   0111 00101001 11000 01100 . .@vv
 vneg_h   0111 00101001 11000 01101 . .@vv
 vneg_w   0111 00101001 11000 01110 . .@vv
 vneg_d   0111 00101001 11000 0 . .@vv
+
+vsadd_b  0111 0100 01100 . . .@vvv
+vsadd_h  0111 0100 01101 . . .@vvv
+vsadd_w  0111 0100 01110 . . .@vvv
+vsadd_d  0111 0100 0 . . .@vvv
+vsadd_bu 0111 0100 10100 . . .@vvv
+vsadd_hu 0111 0100 10101 . . .@vvv
+vsadd_wu 0111 0100 10110 . . .@vvv
+vsadd_du 0111 0100 10111 . . .@vvv
+vssub_b  0111 0100 1 . . .@vvv
+vssub_h  0111 0100 10001 . . .@vvv
+vssub_w  0111 0100 10010 . . .@vvv
+vssub_d  0111 0100 10011 . . .@vvv
+vssub_bu 0111 0100 11000 . . .@vvv
+vssub_hu 0111 0100 11001 . . .@vvv
+vssub_wu 0111 0100 11010 . . .@vvv
+vssub_du 0111 0100 11011 . . .@vvv
-- 
2.31.1




[RFC PATCH v2 14/44] target/loongarch: Implement vmax/vmin

2023-03-27 Thread Song Gao
This patch includes:
- VMAX[I].{B/H/W/D}[U];
- VMIN[I].{B/H/W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  33 +++
 target/loongarch/helper.h   |  18 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 211 
 target/loongarch/insns.decode   |  35 
 target/loongarch/lsx_helper.c   |  43 
 5 files changed, 340 insertions(+)
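
Note on the immediate forms: VMAXI/VMINI compare every element against a
single immediate. The sketch below shows that for 8-bit elements, assuming a
signed 5-bit immediate (-16..15) for the signed forms and an unsigned 5-bit
immediate (0..31) for the unsigned forms. Those ranges, the function names and
the use of plain arrays are assumptions made for illustration; the
do_vminmax() body is truncated in the archive.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed VMAXI.B: per-element signed max against a broadcast immediate. */
static void maxi_b(int8_t d[16], const int8_t a[16], int imm)
{
    assert(imm >= -16 && imm <= 15);     /* assumed si5 range */
    for (int i = 0; i < 16; i++) {
        d[i] = a[i] > imm ? a[i] : (int8_t)imm;
    }
}

/* Assumed VMINI.BU: per-element unsigned min against a broadcast immediate. */
static void mini_bu(uint8_t d[16], const uint8_t a[16], unsigned imm)
{
    assert(imm < 32);                    /* assumed ui5 range */
    for (int i = 0; i < 16; i++) {
        d[i] = a[i] < imm ? a[i] : (uint8_t)imm;
    }
}
```

With an immediate of -16, maxi_b() raises -20 to -16 and leaves 100 unchanged;
with an immediate of 31, mini_bu() lowers 255 to 31.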

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1f61e67d1f..6b0e518bfa 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -939,3 +939,36 @@ INSN_LSX(vadda_b,  vvv)
 INSN_LSX(vadda_h,  vvv)
 INSN_LSX(vadda_w,  vvv)
 INSN_LSX(vadda_d,  vvv)
+
+INSN_LSX(vmax_b,   vvv)
+INSN_LSX(vmax_h,   vvv)
+INSN_LSX(vmax_w,   vvv)
+INSN_LSX(vmax_d,   vvv)
+INSN_LSX(vmin_b,   vvv)
+INSN_LSX(vmin_h,   vvv)
+INSN_LSX(vmin_w,   vvv)
+INSN_LSX(vmin_d,   vvv)
+INSN_LSX(vmax_bu,  vvv)
+INSN_LSX(vmax_hu,  vvv)
+INSN_LSX(vmax_wu,  vvv)
+INSN_LSX(vmax_du,  vvv)
+INSN_LSX(vmin_bu,  vvv)
+INSN_LSX(vmin_hu,  vvv)
+INSN_LSX(vmin_wu,  vvv)
+INSN_LSX(vmin_du,  vvv)
+INSN_LSX(vmaxi_b,  vv_i)
+INSN_LSX(vmaxi_h,  vv_i)
+INSN_LSX(vmaxi_w,  vv_i)
+INSN_LSX(vmaxi_d,  vv_i)
+INSN_LSX(vmini_b,  vv_i)
+INSN_LSX(vmini_h,  vv_i)
+INSN_LSX(vmini_w,  vv_i)
+INSN_LSX(vmini_d,  vv_i)
+INSN_LSX(vmaxi_bu, vv_i)
+INSN_LSX(vmaxi_hu, vv_i)
+INSN_LSX(vmaxi_wu, vv_i)
+INSN_LSX(vmaxi_du, vv_i)
+INSN_LSX(vmini_bu, vv_i)
+INSN_LSX(vmini_hu, vv_i)
+INSN_LSX(vmini_wu, vv_i)
+INSN_LSX(vmini_du, vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 37685ded2c..f0fc7760bd 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -228,3 +228,21 @@ DEF_HELPER_FLAGS_4(vadda_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vadda_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vadda_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vadda_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmini_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmini_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vmaxi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index a3fcb47c4f..4e2f1ff097 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1322,3 +1322,214 @@ TRANS(vadda_b, gvec_vvv, MO_8, do_vadda)
 TRANS(vadda_h, gvec_vvv, MO_16, do_vadda)
 TRANS(vadda_w, gvec_vvv, MO_32, do_vadda)
 TRANS(vadda_d, gvec_vvv, MO_64, do_vadda)
+
+TRANS(vmax_b, gvec_vvv, MO_8, tcg_gen_gvec_smax)
+TRANS(vmax_h, gvec_vvv, MO_16, tcg_gen_gvec_smax)
+TRANS(vmax_w, gvec_vvv, MO_32, tcg_gen_gvec_smax)
+TRANS(vmax_d, gvec_vvv, MO_64, tcg_gen_gvec_smax)
+TRANS(vmax_bu, gvec_vvv, MO_8, tcg_gen_gvec_umax)
+TRANS(vmax_hu, gvec_vvv, MO_16, tcg_gen_gvec_umax)
+TRANS(vmax_wu, gvec_vvv, MO_32, tcg_gen_gvec_umax)
+TRANS(vmax_du, gvec_vvv, MO_64, tcg_gen_gvec_umax)
+
+TRANS(vmin_b, gvec_vvv, MO_8, tcg_gen_gvec_smin)
+TRANS(vmin_h, gvec_vvv, MO_16, tcg_gen_gvec_smin)
+TRANS(vmin_w, gvec_vvv, MO_32, tcg_gen_gvec_smin)
+TRANS(vmin_d, gvec_vvv, MO_64, tcg_gen_gvec_smin)
+TRANS(vmin_bu, gvec_vvv, MO_8, tcg_gen_gvec_umin)
+TRANS(vmin_hu, gvec_vvv, MO_16, tcg_gen_gvec_umin)
+TRANS(vmin_wu, gvec_vvv, MO_32, tcg_gen_gvec_umin)
+TRANS(vmin_du, gvec_vvv, MO_64, tcg_gen_gvec_umin)
+
+static void do_vminmax(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm,
+   void(*gen_vminmax_vec)(unsigned,
+  TCGv_vec, TCGv_vec, TCGv_vec))
+{
+TCGv_vec t1;
+
+t1 = tcg_temp_new_vec_matching(t);
+

[RFC PATCH v2 12/44] target/loongarch: Implement vabsd

2023-03-27 Thread Song Gao
This patch includes:
- VABSD.{B/H/W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  9 ++
 target/loongarch/helper.h   |  9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 95 +
 target/loongarch/insns.decode   |  9 ++
 target/loongarch/lsx_helper.c   | 36 
 5 files changed, 158 insertions(+)
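
Note on the approach: gen_vabsd_s() and gen_vabsd_u() compute the absolute
difference as max(a, b) - min(a, b), which avoids a separate comparison or
negation. The sketch below shows the same identity for a single 8-bit
element. The function names are hypothetical; the result is returned as
uint8_t because the signed difference can reach 255, which matches the bit
pattern a wrapping 8-bit subtraction would produce.

```c
#include <assert.h>
#include <stdint.h>

/* Signed absolute difference: max(a, b) - min(a, b), modulo 2^8. */
static uint8_t absd_s8(int8_t a, int8_t b)
{
    int hi = a > b ? a : b;
    int lo = a > b ? b : a;
    int diff = hi - lo;
    assert(diff >= 0 && diff <= 255);    /* always fits in 8 bits */
    return (uint8_t)diff;
}

/* Unsigned absolute difference. */
static uint8_t absd_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}
```

For example, absd_s8(127, -128) yields 255 and absd_u8(3, 250) yields 247.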

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e7592e7a34..e98ea37793 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -925,3 +925,12 @@ INSN_LSX(vavgr_bu, vvv)
 INSN_LSX(vavgr_hu, vvv)
 INSN_LSX(vavgr_wu, vvv)
 INSN_LSX(vavgr_du, vvv)
+
+INSN_LSX(vabsd_b,  vvv)
+INSN_LSX(vabsd_h,  vvv)
+INSN_LSX(vabsd_w,  vvv)
+INSN_LSX(vabsd_d,  vvv)
+INSN_LSX(vabsd_bu, vvv)
+INSN_LSX(vabsd_hu, vvv)
+INSN_LSX(vabsd_wu, vvv)
+INSN_LSX(vabsd_du, vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 021fe3cd60..a2f197 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -214,3 +214,12 @@ DEF_HELPER_FLAGS_4(vavgr_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vavgr_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vavgr_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vavgr_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vabsd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vabsd_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 512fe947f6..3a75347db1 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1174,3 +1174,98 @@ TRANS(vavgr_bu, gvec_vvv, MO_8, do_vavgr_u)
 TRANS(vavgr_hu, gvec_vvv, MO_16, do_vavgr_u)
 TRANS(vavgr_wu, gvec_vvv, MO_32, do_vavgr_u)
 TRANS(vavgr_du, gvec_vvv, MO_64, do_vavgr_u)
+
+static void gen_vabsd_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+tcg_gen_smax_vec(vece, t, a, b);
+tcg_gen_smin_vec(vece, a, a, b);
+tcg_gen_sub_vec(vece, t, t, a);
+}
+
+static void do_vabsd_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+   uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_smax_vec, INDEX_op_smin_vec, INDEX_op_sub_vec, 0
+};
+static const GVecGen3 op[4] = {
+{
+.fniv = gen_vabsd_s,
+.fno = gen_helper_vabsd_b,
+.opt_opc = vecop_list,
+.vece = MO_8
+},
+{
+.fniv = gen_vabsd_s,
+.fno = gen_helper_vabsd_h,
+.opt_opc = vecop_list,
+.vece = MO_16
+},
+{
+.fniv = gen_vabsd_s,
+.fno = gen_helper_vabsd_w,
+.opt_opc = vecop_list,
+.vece = MO_32
+},
+{
+.fniv = gen_vabsd_s,
+.fno = gen_helper_vabsd_d,
+.opt_opc = vecop_list,
+.vece = MO_64
+},
+};
+
+tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void gen_vabsd_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_umax_vec(vece, t, a, b);
+    tcg_gen_umin_vec(vece, a, a, b);
+    tcg_gen_sub_vec(vece, t, t, a);
+}
+
+static void do_vabsd_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+                       uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_umax_vec, INDEX_op_umin_vec, INDEX_op_sub_vec, 0
+    };
+    static const GVecGen3 op[4] = {
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_bu,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_hu,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_wu,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vabsd_u,
+            .fno = gen_helper_vabsd_du,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vabsd_b, gvec_vvv, MO_8, do_vabsd_s)
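
[Editor's note] The vector expansions gen_vabsd_s and gen_vabsd_u in this hunk compute the per-lane absolute difference as max(a, b) - min(a, b), so no compare-and-select is needed. The scalar sketch below shows the same identity for one 8-bit lane. The names absd_s8 and absd_u8 are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <stdint.h>

/* Signed 8-bit lane: |a - b| computed as max - min.
 * The subtraction is done in int and then truncated to 8 bits,
 * which mirrors a modular per-lane vector subtract. */
static inline uint8_t absd_s8(int8_t a, int8_t b)
{
    int8_t hi = a > b ? a : b;
    int8_t lo = a > b ? b : a;
    return (uint8_t)(hi - lo);
}

/* Unsigned 8-bit lane: same identity with unsigned max/min. */
static inline uint8_t absd_u8(uint8_t a, uint8_t b)
{
    uint8_t hi = a > b ? a : b;
    uint8_t lo = a > b ? b : a;
    return (uint8_t)(hi - lo);
}
```

The signed variant can produce a result that does not fit in int8_t (for example the difference between -128 and 127), which is why the sketch returns the lane as an unsigned byte.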

[RFC PATCH v2 19/44] target/loongarch: Implement vexth

2023-03-27 Thread Song Gao
This patch includes:
- VEXTH.{H.B/W.H/D.W/Q.D};
- VEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  9 ++
 target/loongarch/helper.h   |  9 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 20 
 target/loongarch/insns.decode   |  9 ++
 target/loongarch/lsx_helper.c   | 35 +
 5 files changed, 82 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b04aefe3ed..412c1cedcb 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1070,3 +1070,12 @@ INSN_LSX(vsat_bu,  vv_i)
 INSN_LSX(vsat_hu,  vv_i)
 INSN_LSX(vsat_wu,  vv_i)
 INSN_LSX(vsat_du,  vv_i)
+
+INSN_LSX(vexth_h_b,vv)
+INSN_LSX(vexth_w_h,vv)
+INSN_LSX(vexth_d_w,vv)
+INSN_LSX(vexth_q_d,vv)
+INSN_LSX(vexth_hu_bu,  vv)
+INSN_LSX(vexth_wu_hu,  vv)
+INSN_LSX(vexth_du_wu,  vv)
+INSN_LSX(vexth_qu_du,  vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6345b7ef9c..0876aa3331 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -344,3 +344,12 @@ DEF_HELPER_FLAGS_4(vsat_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_3(vexth_h_b, void, env, i32, i32)
+DEF_HELPER_3(vexth_w_h, void, env, i32, i32)
+DEF_HELPER_3(vexth_d_w, void, env, i32, i32)
+DEF_HELPER_3(vexth_q_d, void, env, i32, i32)
+DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
+DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
+DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
+DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7dfb3b33f6..f6058c1360 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -28,6 +28,17 @@ static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
     return true;
 }
 
+static bool gen_vv(DisasContext *ctx, arg_vv *a,
+                   void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+    TCGv_i32 vd = tcg_constant_i32(a->vd);
+    TCGv_i32 vj = tcg_constant_i32(a->vj);
+
+    CHECK_SXE;
+    func(cpu_env, vd, vj);
+    return true;
+}
+
 static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
                      void (*func)(unsigned, uint32_t, uint32_t,
                                   uint32_t, uint32_t, uint32_t))
@@ -2487,3 +2498,12 @@ TRANS(vsat_bu, gvec_vv_i, MO_8, do_vsat_u)
 TRANS(vsat_hu, gvec_vv_i, MO_16, do_vsat_u)
 TRANS(vsat_wu, gvec_vv_i, MO_32, do_vsat_u)
 TRANS(vsat_du, gvec_vv_i, MO_64, do_vsat_u)
+
+TRANS(vexth_h_b, gen_vv, gen_helper_vexth_h_b)
+TRANS(vexth_w_h, gen_vv, gen_helper_vexth_w_h)
+TRANS(vexth_d_w, gen_vv, gen_helper_vexth_d_w)
+TRANS(vexth_q_d, gen_vv, gen_helper_vexth_q_d)
+TRANS(vexth_hu_bu, gen_vv, gen_helper_vexth_hu_bu)
+TRANS(vexth_wu_hu, gen_vv, gen_helper_vexth_wu_hu)
+TRANS(vexth_du_wu, gen_vv, gen_helper_vexth_du_wu)
+TRANS(vexth_qu_du, gen_vv, gen_helper_vexth_qu_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3ed61b3d68..39c582d098 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -769,3 +769,12 @@ vsat_bu  0111 00110010 1 01 ... . .   
@vv_ui3
 vsat_hu  0111 00110010 1 1  . .   @vv_ui4
 vsat_wu  0111 00110010 10001 . . .@vv_ui5
 vsat_du  0111 00110010 1001 .. . .@vv_ui6
+
+vexth_h_b        0111 00101001 11101 11000 ..... .....    @vv
+vexth_w_h        0111 00101001 11101 11001 ..... .....    @vv
+vexth_d_w        0111 00101001 11101 11010 ..... .....    @vv
+vexth_q_d        0111 00101001 11101 11011 ..... .....    @vv
+vexth_hu_bu      0111 00101001 11101 11100 ..... .....    @vv
+vexth_wu_hu      0111 00101001 11101 11101 ..... .....    @vv
+vexth_du_wu      0111 00101001 11101 11110 ..... .....    @vv
+vexth_qu_du      0111 00101001 11101 11111 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 15efc64e4e..9a0b358576 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -836,3 +836,38 @@ VSAT_U(vsat_bu, 8, uint8_t, B)
 VSAT_U(vsat_hu, 16, uint16_t, H)
 VSAT_U(vsat_wu, 32, uint32_t, W)
 VSAT_U(vsat_du, 64, uint64_t, D)
+
+#define VEXTH(NAME, BIT, T1, T2, E1, E2)                            \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = 
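
[Editor's note] The archive cuts the VEXTH macro off at this point, so its body is not shown here. As a rough illustration only, and going by the instruction names listed in the commit message, VEXTH.H.B is assumed to widen the eight byte elements in the upper 64 bits of vj into eight halfword elements of vd with sign extension, while VEXTH.HU.BU does the same with zero extension. The function names and the plain-array register model below are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Assumed behaviour of VEXTH.H.B: sign-extend bytes 8..15 of vj
 * into the eight 16-bit lanes of vd. */
static void vexth_h_b_model(int16_t vd[8], const int8_t vj[16])
{
    for (int i = 0; i < 8; i++) {
        vd[i] = vj[i + 8];          /* implicit sign extension */
    }
}

/* Assumed behaviour of VEXTH.HU.BU: zero-extend bytes 8..15 of vj. */
static void vexth_hu_bu_model(uint16_t vd[8], const uint8_t vj[16])
{
    for (int i = 0; i < 8; i++) {
        vd[i] = vj[i + 8];          /* implicit zero extension */
    }
}
```

The sketch uses separate source and destination arrays. A real helper that operates on register files has to consider the case where vd and vj name the same register.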

[RFC PATCH v2 04/44] target/loongarch: Add CHECK_SXE macro for check LSX enable

2023-03-27 Thread Song Gao
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c  |  2 ++
 target/loongarch/cpu.h  |  2 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 11 +++
 3 files changed, 15 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 2263bd4fdd..a3ce1ccf00 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -52,6 +52,7 @@ static const char * const excp_names[] = {
     [EXCCODE_FPE] = "Floating Point Exception",
     [EXCCODE_DBP] = "Debug breakpoint",
     [EXCCODE_BCE] = "Bound Check Exception",
+    [EXCCODE_SXD] = "128 bit vector instructions Disable exception",
 };
 
 const char *loongarch_exception_name(int32_t exception)
@@ -187,6 +188,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
     case EXCCODE_FPD:
     case EXCCODE_FPE:
     case EXCCODE_BCE:
+    case EXCCODE_SXD:
         env->CSR_BADV = env->pc;
         QEMU_FALLTHROUGH;
     case EXCCODE_ADEM:
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 6e5fa6a01d..2e5326f474 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -429,6 +429,7 @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
 #define HW_FLAGS_PLV_MASK   R_CSR_CRMD_PLV_MASK  /* 0x03 */
 #define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
+#define HW_FLAGS_EUEN_SXE   0x08
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
                                         target_ulong *pc,
@@ -439,6 +440,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
     *cs_base = 0;
     *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
     *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
+    *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
 }
 
 void loongarch_cpu_list(void);
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 1cf3ab34a9..5dedb044d7 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3,3 +3,14 @@
  * LSX translate functions
  * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
  */
+
+#ifndef CONFIG_USER_ONLY
+#define CHECK_SXE do { \
+    if ((ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0) { \
+        generate_exception(ctx, EXCCODE_SXD); \
+        return true; \
+    } \
+} while (0)
+#else
+#define CHECK_SXE
+#endif
-- 
2.31.1
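
[Editor's note] This patch mirrors the EUEN.SXE enable bit into the translation-block flags and has CHECK_SXE test that flag at translate time, so a disabled LSX unit raises EXCCODE_SXD before any vector code is generated. The standalone sketch below models only the flag plumbing. The two HW_FLAGS_* values are taken from the patch; the EUEN bit positions (FPE at bit 0, SXE at bit 1) and the function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the patch. */
#define HW_FLAGS_EUEN_FPE 0x04u
#define HW_FLAGS_EUEN_SXE 0x08u

/* Assumed bit positions inside CSR_EUEN (illustration only). */
#define EUEN_FPE_SHIFT 0
#define EUEN_SXE_SHIFT 1

/* Model of the flag computation done in cpu_get_tb_cpu_state():
 * each one-bit field is multiplied by its HW_FLAGS_* mask. */
static uint32_t tb_flags_from_euen(uint64_t euen)
{
    uint32_t flags = 0;

    flags |= (uint32_t)((euen >> EUEN_FPE_SHIFT) & 1) * HW_FLAGS_EUEN_FPE;
    flags |= (uint32_t)((euen >> EUEN_SXE_SHIFT) & 1) * HW_FLAGS_EUEN_SXE;
    return flags;
}

/* Model of the test performed by CHECK_SXE. */
static bool lsx_enabled(uint32_t tb_flags)
{
    return (tb_flags & HW_FLAGS_EUEN_SXE) != 0;
}
```

Because the bit is part of the TB flags, code translated while LSX was disabled is not reused once the guest enables it.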




[RFC PATCH v2 33/44] target/loongarch: Implement vfrstp

2023-03-27 Thread Song Gao
This patch includes:
- VFRSTP[I].{B/H}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  5 +++
 target/loongarch/helper.h   |  5 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 +++
 target/loongarch/insns.decode   |  5 +++
 target/loongarch/lsx_helper.c   | 41 +
 5 files changed, 61 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 48c7ea47a4..be2bb9cc42 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1297,3 +1297,8 @@ INSN_LSX(vbitrevi_b,   vv_i)
 INSN_LSX(vbitrevi_h,   vv_i)
 INSN_LSX(vbitrevi_w,   vv_i)
 INSN_LSX(vbitrevi_d,   vv_i)
+
+INSN_LSX(vfrstp_b, vvv)
+INSN_LSX(vfrstp_h, vvv)
+INSN_LSX(vfrstpi_b,vv_i)
+INSN_LSX(vfrstpi_h,vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4622f788ee..d8b783ebc7 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -525,3 +525,8 @@ DEF_HELPER_4(vbitrevi_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vbitrevi_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 6d3a804767..9ba9113ca3 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2824,3 +2824,8 @@ TRANS(vbitrevi_b, gen_vv_i, gen_helper_vbitrevi_b)
 TRANS(vbitrevi_h, gen_vv_i, gen_helper_vbitrevi_h)
 TRANS(vbitrevi_w, gen_vv_i, gen_helper_vbitrevi_w)
 TRANS(vbitrevi_d, gen_vv_i, gen_helper_vbitrevi_d)
+
+TRANS(vfrstp_b, gen_vvv, gen_helper_vfrstp_b)
+TRANS(vfrstp_h, gen_vvv, gen_helper_vfrstp_h)
+TRANS(vfrstpi_b, gen_vv_i, gen_helper_vfrstpi_b)
+TRANS(vfrstpi_h, gen_vv_i, gen_helper_vfrstpi_h)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 801c97714e..4cb286ffe5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -998,3 +998,8 @@ vbitrevi_b   0111 00110001 1 01 ... . .   
@vv_ui3
 vbitrevi_h   0111 00110001 1 1  . .   @vv_ui4
 vbitrevi_w   0111 00110001 10001 . . .@vv_ui5
 vbitrevi_d   0111 00110001 1001 .. . .@vv_ui6
+
+vfrstp_b         0111 00010010 10110 ..... ..... .....    @vvv
+vfrstp_h         0111 00010010 10111 ..... ..... .....    @vvv
+vfrstpi_b        0111 00101001 10100 ..... ..... .....    @vv_ui5
+vfrstpi_h        0111 00101001 10101 ..... ..... .....    @vv_ui5
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index e23c75bd56..d6143a0016 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2288,3 +2288,44 @@ DO_BITI(vbitrevi_b, 8, uint8_t, B, DO_BITREV)
 DO_BITI(vbitrevi_h, 16, uint16_t, H, DO_BITREV)
 DO_BITI(vbitrevi_w, 32, uint32_t, W, DO_BITREV)
 DO_BITI(vbitrevi_d, 64, uint64_t, D, DO_BITREV)
+
+#define VFRSTP(NAME, BIT, MASK, E)                       \
+void HELPER(NAME)(CPULoongArchState *env,                \
+                  uint32_t vd, uint32_t vj, uint32_t vk) \
+{                                                        \
+    int i, m;                                            \
+    VReg *Vd = &(env->fpr[vd].vreg);                     \
+    VReg *Vj = &(env->fpr[vj].vreg);                     \
+    VReg *Vk = &(env->fpr[vk].vreg);                     \
+                                                         \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                  \
+        if (Vj->E(i) < 0) {                              \
+            break;                                       \
+        }                                                \
+    }                                                    \
+    m = Vk->E(0) & MASK;                                 \
+    Vd->E(m) = i;                                        \
+}
+
+VFRSTP(vfrstp_b, 8, 0xf, B)
+VFRSTP(vfrstp_h, 16, 0x7, H)
+
+#define VFRSTPI(NAME, BIT, E)                             \
+void HELPER(NAME)(CPULoongArchState *env,                 \
+                  uint32_t vd, uint32_t vj, uint32_t imm) \
+{                                                         \
+    int i, m;                                             \
+    VReg *Vd = &(env->fpr[vd].vreg);                      \
+    VReg *Vj = &(env->fpr[vj].vreg);                      \
+                                                          \
+    for (i = 0; i < LSX_LEN/BIT; i++) {                   \
+        if (Vj->E(i) < 0) {                               \
+            break;                                        \
+        }
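
[Editor's note] The VFRSTP helper above scans vj for the first negative element, then stores that index (or the lane count when no element is negative) into the element of vd selected by the low bits of element 0 of vk. The sketch below restates that logic for the byte variant using plain arrays instead of VReg. The name vfrstp_b_model and the array-based register model are hypothetical.

```c
#include <stdint.h>

#define LANES_B 16  /* 128-bit register split into 8-bit lanes */

/* Plain-array model of the VFRSTP.B helper logic shown in the patch. */
static void vfrstp_b_model(int8_t vd[LANES_B],
                           const int8_t vj[LANES_B],
                           const int8_t vk[LANES_B])
{
    int i, m;

    /* Find the index of the first negative element of vj. */
    for (i = 0; i < LANES_B; i++) {
        if (vj[i] < 0) {
            break;
        }
    }
    /* i == LANES_B here means "no negative element found". */
    m = vk[0] & 0xf;
    vd[m] = (int8_t)i;
}
```

Only one element of vd is written; all other elements keep their previous contents.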

[RFC PATCH v2 25/44] target/loongarch: Implement vsrlr vsrar

2023-03-27 Thread Song Gao
This patch includes:
- VSRLR[I].{B/H/W/D};
- VSRAR[I].{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  18 
 target/loongarch/helper.h   |  18 
 target/loongarch/insn_trans/trans_lsx.c.inc |  18 
 target/loongarch/insns.decode   |  18 
 target/loongarch/lsx_helper.c   | 104 
 5 files changed, 176 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 087cac10ad..c62b6720ec 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1148,3 +1148,21 @@ INSN_LSX(vsllwil_hu_bu,vv_i)
 INSN_LSX(vsllwil_wu_hu,vv_i)
 INSN_LSX(vsllwil_du_wu,vv_i)
 INSN_LSX(vextl_qu_du,  vv)
+
+INSN_LSX(vsrlr_b,  vvv)
+INSN_LSX(vsrlr_h,  vvv)
+INSN_LSX(vsrlr_w,  vvv)
+INSN_LSX(vsrlr_d,  vvv)
+INSN_LSX(vsrlri_b, vv_i)
+INSN_LSX(vsrlri_h, vv_i)
+INSN_LSX(vsrlri_w, vv_i)
+INSN_LSX(vsrlri_d, vv_i)
+
+INSN_LSX(vsrar_b,  vvv)
+INSN_LSX(vsrar_h,  vvv)
+INSN_LSX(vsrar_w,  vvv)
+INSN_LSX(vsrar_d,  vvv)
+INSN_LSX(vsrari_b, vv_i)
+INSN_LSX(vsrari_h, vv_i)
+INSN_LSX(vsrari_w, vv_i)
+INSN_LSX(vsrari_d, vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0266b9a4ad..c28353d822 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -376,3 +376,21 @@ DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
 DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
 DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
+
+DEF_HELPER_4(vsrlr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlri_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrar_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrar_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index fb40aaf5ad..2ee763fb32 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2675,3 +2675,21 @@ TRANS(vsllwil_hu_bu, gen_vv_i, gen_helper_vsllwil_hu_bu)
 TRANS(vsllwil_wu_hu, gen_vv_i, gen_helper_vsllwil_wu_hu)
 TRANS(vsllwil_du_wu, gen_vv_i, gen_helper_vsllwil_du_wu)
 TRANS(vextl_qu_du, gen_vv, gen_helper_vextl_qu_du)
+
+TRANS(vsrlr_b, gen_vvv, gen_helper_vsrlr_b)
+TRANS(vsrlr_h, gen_vvv, gen_helper_vsrlr_h)
+TRANS(vsrlr_w, gen_vvv, gen_helper_vsrlr_w)
+TRANS(vsrlr_d, gen_vvv, gen_helper_vsrlr_d)
+TRANS(vsrlri_b, gen_vv_i, gen_helper_vsrlri_b)
+TRANS(vsrlri_h, gen_vv_i, gen_helper_vsrlri_h)
+TRANS(vsrlri_w, gen_vv_i, gen_helper_vsrlri_w)
+TRANS(vsrlri_d, gen_vv_i, gen_helper_vsrlri_d)
+
+TRANS(vsrar_b, gen_vvv, gen_helper_vsrar_b)
+TRANS(vsrar_h, gen_vvv, gen_helper_vsrar_h)
+TRANS(vsrar_w, gen_vvv, gen_helper_vsrar_w)
+TRANS(vsrar_d, gen_vvv, gen_helper_vsrar_d)
+TRANS(vsrari_b, gen_vv_i, gen_helper_vsrari_b)
+TRANS(vsrari_h, gen_vv_i, gen_helper_vsrari_h)
+TRANS(vsrari_w, gen_vv_i, gen_helper_vsrari_w)
+TRANS(vsrari_d, gen_vv_i, gen_helper_vsrari_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 23dd338026..a21743 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -848,3 +848,21 @@ vsllwil_hu_bu0111 0011 11000 01 ... . .   
@vv_ui3
 vsllwil_wu_hu0111 0011 11000 1  . .   @vv_ui4
 vsllwil_du_wu0111 0011 11001 . . .@vv_ui5
 vextl_qu_du  0111 0011 11010 0 . .@vv
+
+vsrlr_b  0111  0 . . .@vvv
+vsrlr_h  0111  1 . . .@vvv
+vsrlr_w  0111  00010 . . .@vvv
+vsrlr_d  0111  00011 . . .@vvv
+vsrlri_b 0111 00101010 01000 01 ... . .   @vv_ui3
+vsrlri_h 0111 00101010 01000 1  . .   @vv_ui4
+vsrlri_w 0111 00101010 01001 . . .@vv_ui5
+vsrlri_d 0111 00101010 0101 .. . .@vv_ui6
+
+vsrar_b  0111  00100 . . .@vvv
+vsrar_h  0111  00101 . . .@vvv
+vsrar_w  0111  00110 . . .@vvv
+vsrar_d  0111  00111 . . .@vvv
+vsrari_b 0111 

[RFC PATCH v2 36/44] target/loongarch: Implement vseq vsle vslt

2023-03-27 Thread Song Gao
This patch includes:
- VSEQ[I].{B/H/W/D};
- VSLE[I].{B/H/W/D}[U];
- VSLT[I].{B/H/W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  43 +
 target/loongarch/helper.h   |  23 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 191 
 target/loongarch/insns.decode   |  43 +
 target/loongarch/lsx_helper.c   |  36 
 5 files changed, 336 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c04271081f..e589b23f4c 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1404,3 +1404,46 @@ INSN_LSX(vffint_d_lu,  vv)
 INSN_LSX(vffintl_d_w,  vv)
 INSN_LSX(vffinth_d_w,  vv)
 INSN_LSX(vffint_s_l,   vvv)
+
+INSN_LSX(vseq_b,   vvv)
+INSN_LSX(vseq_h,   vvv)
+INSN_LSX(vseq_w,   vvv)
+INSN_LSX(vseq_d,   vvv)
+INSN_LSX(vseqi_b,  vv_i)
+INSN_LSX(vseqi_h,  vv_i)
+INSN_LSX(vseqi_w,  vv_i)
+INSN_LSX(vseqi_d,  vv_i)
+
+INSN_LSX(vsle_b,   vvv)
+INSN_LSX(vsle_h,   vvv)
+INSN_LSX(vsle_w,   vvv)
+INSN_LSX(vsle_d,   vvv)
+INSN_LSX(vslei_b,  vv_i)
+INSN_LSX(vslei_h,  vv_i)
+INSN_LSX(vslei_w,  vv_i)
+INSN_LSX(vslei_d,  vv_i)
+INSN_LSX(vsle_bu,  vvv)
+INSN_LSX(vsle_hu,  vvv)
+INSN_LSX(vsle_wu,  vvv)
+INSN_LSX(vsle_du,  vvv)
+INSN_LSX(vslei_bu, vv_i)
+INSN_LSX(vslei_hu, vv_i)
+INSN_LSX(vslei_wu, vv_i)
+INSN_LSX(vslei_du, vv_i)
+
+INSN_LSX(vslt_b,   vvv)
+INSN_LSX(vslt_h,   vvv)
+INSN_LSX(vslt_w,   vvv)
+INSN_LSX(vslt_d,   vvv)
+INSN_LSX(vslti_b,  vv_i)
+INSN_LSX(vslti_h,  vv_i)
+INSN_LSX(vslti_w,  vv_i)
+INSN_LSX(vslti_d,  vv_i)
+INSN_LSX(vslt_bu,  vvv)
+INSN_LSX(vslt_hu,  vvv)
+INSN_LSX(vslt_wu,  vvv)
+INSN_LSX(vslt_du,  vvv)
+INSN_LSX(vslti_bu, vv_i)
+INSN_LSX(vslti_hu, vv_i)
+INSN_LSX(vslti_wu, vv_i)
+INSN_LSX(vslti_du, vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b2cc1a6ddb..25ea9b633d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -627,3 +627,26 @@ DEF_HELPER_3(vffint_d_lu, void, env, i32, i32)
 DEF_HELPER_3(vffintl_d_w, void, env, i32, i32)
 DEF_HELPER_3(vffinth_d_w, void, env, i32, i32)
 DEF_HELPER_4(vffint_s_l, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(vseqi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vseqi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vseqi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vseqi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vslei_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslei_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vslti_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index ee3817dd31..7368731424 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2940,3 +2940,194 @@ TRANS(vffint_d_lu, gen_vv, gen_helper_vffint_d_lu)
 TRANS(vffintl_d_w, gen_vv, gen_helper_vffintl_d_w)
 TRANS(vffinth_d_w, gen_vv, gen_helper_vffinth_d_w)
 TRANS(vffint_s_l, gen_vvv, gen_helper_vffint_s_l)
+
+static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond,
+                   void (*func)(TCGCond, unsigned, uint32_t, uint32_t,
+                                uint32_t, uint32_t, uint32_t))
+{
+    uint32_t vd_ofs, vj_ofs, vk_ofs;
+
+    CHECK_SXE;
+
+    vd_ofs = vreg_full_offset(a->vd);
+    vj_ofs = vreg_full_offset(a->vj);
+    vk_ofs = vreg_full_offset(a->vk);
+
+    func(cond, mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
+    return true;
+}
+
+static void do_cmpi_vec(TCGCond cond,
+                        unsigned vece, TCGv_vec t, TCGv_vec a, 

[RFC PATCH v2 15/44] target/loongarch: Implement vmul/vmuh/vmulw{ev/od}

2023-03-27 Thread Song Gao
This patch includes:
- VMUL.{B/H/W/D};
- VMUH.{B/H/W/D}[U];
- VMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- VMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  38 ++
 target/loongarch/helper.h   |  36 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 378 
 target/loongarch/insns.decode   |  38 ++
 target/loongarch/lsx_helper.c   | 140 
 5 files changed, 630 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6b0e518bfa..48e6ef5309 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -972,3 +972,41 @@ INSN_LSX(vmini_bu, vv_i)
 INSN_LSX(vmini_hu, vv_i)
 INSN_LSX(vmini_wu, vv_i)
 INSN_LSX(vmini_du, vv_i)
+
+INSN_LSX(vmul_b,   vvv)
+INSN_LSX(vmul_h,   vvv)
+INSN_LSX(vmul_w,   vvv)
+INSN_LSX(vmul_d,   vvv)
+INSN_LSX(vmuh_b,   vvv)
+INSN_LSX(vmuh_h,   vvv)
+INSN_LSX(vmuh_w,   vvv)
+INSN_LSX(vmuh_d,   vvv)
+INSN_LSX(vmuh_bu,  vvv)
+INSN_LSX(vmuh_hu,  vvv)
+INSN_LSX(vmuh_wu,  vvv)
+INSN_LSX(vmuh_du,  vvv)
+
+INSN_LSX(vmulwev_h_b,  vvv)
+INSN_LSX(vmulwev_w_h,  vvv)
+INSN_LSX(vmulwev_d_w,  vvv)
+INSN_LSX(vmulwev_q_d,  vvv)
+INSN_LSX(vmulwod_h_b,  vvv)
+INSN_LSX(vmulwod_w_h,  vvv)
+INSN_LSX(vmulwod_d_w,  vvv)
+INSN_LSX(vmulwod_q_d,  vvv)
+INSN_LSX(vmulwev_h_bu, vvv)
+INSN_LSX(vmulwev_w_hu, vvv)
+INSN_LSX(vmulwev_d_wu, vvv)
+INSN_LSX(vmulwev_q_du, vvv)
+INSN_LSX(vmulwod_h_bu, vvv)
+INSN_LSX(vmulwod_w_hu, vvv)
+INSN_LSX(vmulwod_d_wu, vvv)
+INSN_LSX(vmulwod_q_du, vvv)
+INSN_LSX(vmulwev_h_bu_b,   vvv)
+INSN_LSX(vmulwev_w_hu_h,   vvv)
+INSN_LSX(vmulwev_d_wu_w,   vvv)
+INSN_LSX(vmulwev_q_du_d,   vvv)
+INSN_LSX(vmulwod_h_bu_b,   vvv)
+INSN_LSX(vmulwod_w_hu_h,   vvv)
+INSN_LSX(vmulwod_d_wu_w,   vvv)
+INSN_LSX(vmulwod_q_du_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index f0fc7760bd..437b47fa78 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -246,3 +246,39 @@ DEF_HELPER_FLAGS_4(vmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vmuh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmuh_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmulwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmulwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vmulwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmulwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc 

[RFC PATCH v2 03/44] target/loongarch: meson.build support build LSX

2023-03-27 Thread Song Gao
Signed-off-by: Song Gao 
---
 target/loongarch/insn_trans/trans_lsx.c.inc | 5 +
 target/loongarch/lsx_helper.c   | 6 ++
 target/loongarch/meson.build| 1 +
 target/loongarch/translate.c| 1 +
 4 files changed, 13 insertions(+)
 create mode 100644 target/loongarch/insn_trans/trans_lsx.c.inc
 create mode 100644 target/loongarch/lsx_helper.c

diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
new file mode 100644
index 0000000000..1cf3ab34a9
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LSX translate functions
+ * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
new file mode 100644
index 0000000000..9332163aff
--- /dev/null
+++ b/target/loongarch/lsx_helper.c
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch LSX helper functions.
+ *
+ * Copyright (c) 2022-2023 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index 9293a8ab78..1117a51c52 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -11,6 +11,7 @@ loongarch_tcg_ss.add(files(
   'op_helper.c',
   'translate.c',
   'gdbstub.c',
+  'lsx_helper.c',
 ))
 loongarch_tcg_ss.add(zlib)
 
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index f443b5822f..104d4f2fbd 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -171,6 +171,7 @@ static void gen_set_gpr(int reg_num, TCGv t, DisasExtend dst_ext)
 #include "insn_trans/trans_fmemory.c.inc"
 #include "insn_trans/trans_branch.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
+#include "insn_trans/trans_lsx.c.inc"
 
 static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
 {
-- 
2.31.1




[RFC PATCH v2 31/44] target/loongarch: Implement vpcnt

2023-03-27 Thread Song Gao
This patch includes:
- VPCNT.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  5 
 target/loongarch/helper.h   |  5 
 target/loongarch/insn_trans/trans_lsx.c.inc |  5 
 target/loongarch/insns.decode   |  5 
 target/loongarch/lsx_helper.c   | 30 +
 5 files changed, 50 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0c82a1d9d1..0ca51de9d8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1267,3 +1267,8 @@ INSN_LSX(vclz_b,   vv)
 INSN_LSX(vclz_h,   vv)
 INSN_LSX(vclz_w,   vv)
 INSN_LSX(vclz_d,   vv)
+
+INSN_LSX(vpcnt_b,  vv)
+INSN_LSX(vpcnt_h,  vv)
+INSN_LSX(vpcnt_w,  vv)
+INSN_LSX(vpcnt_d,  vv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a7facc6bc1..38e310512b 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -495,3 +495,8 @@ DEF_HELPER_3(vclz_b, void, env, i32, i32)
 DEF_HELPER_3(vclz_h, void, env, i32, i32)
 DEF_HELPER_3(vclz_w, void, env, i32, i32)
 DEF_HELPER_3(vclz_d, void, env, i32, i32)
+
+DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
+DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5d81c02103..59923eb1fa 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2794,3 +2794,8 @@ TRANS(vclz_b, gen_vv, gen_helper_vclz_b)
 TRANS(vclz_h, gen_vv, gen_helper_vclz_h)
 TRANS(vclz_w, gen_vv, gen_helper_vclz_w)
 TRANS(vclz_d, gen_vv, gen_helper_vclz_d)
+
+TRANS(vpcnt_b, gen_vv, gen_helper_vpcnt_b)
+TRANS(vpcnt_h, gen_vv, gen_helper_vpcnt_h)
+TRANS(vpcnt_w, gen_vv, gen_helper_vpcnt_w)
+TRANS(vpcnt_d, gen_vv, gen_helper_vpcnt_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7591ec1bab..f865e83da5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -968,3 +968,8 @@ vclz_b           0111 00101001 11000 00100 ..... .....    @vv
 vclz_h           0111 00101001 11000 00101 ..... .....    @vv
 vclz_w           0111 00101001 11000 00110 ..... .....    @vv
 vclz_d           0111 00101001 11000 00111 ..... .....    @vv
+
+vpcnt_b          0111 00101001 11000 01000 ..... .....    @vv
+vpcnt_h          0111 00101001 11000 01001 ..... .....    @vv
+vpcnt_w          0111 00101001 11000 01010 ..... .....    @vv
+vpcnt_d          0111 00101001 11000 01011 ..... .....    @vv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 8ec479dc2d..94dded7e49 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2201,3 +2201,33 @@ DO_2OP(vclz_b, 8, B, uint8_t, DO_CLZ_B)
 DO_2OP(vclz_h, 16, H, uint16_t, DO_CLZ_H)
 DO_2OP(vclz_w, 32, W, uint32_t, DO_CLZ_W)
 DO_2OP(vclz_d, 64, D, uint64_t, DO_CLZ_D)
+
+static uint64_t do_vpcnt(uint64_t u1)
+{
+    u1 = (u1 & 0x5555555555555555ULL) + ((u1 >>  1) & 0x5555555555555555ULL);
+    u1 = (u1 & 0x3333333333333333ULL) + ((u1 >>  2) & 0x3333333333333333ULL);
+    u1 = (u1 & 0x0F0F0F0F0F0F0F0FULL) + ((u1 >>  4) & 0x0F0F0F0F0F0F0F0FULL);
+    u1 = (u1 & 0x00FF00FF00FF00FFULL) + ((u1 >>  8) & 0x00FF00FF00FF00FFULL);
+    u1 = (u1 & 0x0000FFFF0000FFFFULL) + ((u1 >> 16) & 0x0000FFFF0000FFFFULL);
+    u1 = (u1 & 0x00000000FFFFFFFFULL) + ((u1 >> 32));
+
+    return u1;
+}
+
+#define VPCNT(NAME, BIT, E, T)                                      \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
+{                                                                   \
+    int i;                                                          \
+    VReg *Vd = &(env->fpr[vd].vreg);                                \
+    VReg *Vj = &(env->fpr[vj].vreg);                                \
+                                                                    \
+    for (i = 0; i < LSX_LEN/BIT; i++)                               \
+    {                                                               \
+        Vd->E(i) = do_vpcnt((T)Vj->E(i));                           \
+    }                                                               \
+}
+
+VPCNT(vpcnt_b, 8, B, uint8_t)
+VPCNT(vpcnt_h, 16, H, uint16_t)
+VPCNT(vpcnt_w, 32, W, uint32_t)
+VPCNT(vpcnt_d, 64, D, uint64_t)
-- 
2.31.1




[RFC PATCH v2 22/44] target/loongarch: Implement LSX logic instructions

2023-03-27 Thread Song Gao
This patch includes:
- V{AND/OR/XOR/NOR/ANDN/ORN}.V;
- V{AND/OR/XOR/NOR}I.B.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 12 +
 target/loongarch/helper.h   |  2 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 50 +
 target/loongarch/insns.decode   | 13 ++
 target/loongarch/lsx_helper.c   | 11 +
 5 files changed, 88 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2725b827ee..eca0a4bb7b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1091,3 +1091,15 @@ INSN_LSX(vmskltz_w,vv)
 INSN_LSX(vmskltz_d,vv)
 INSN_LSX(vmskgez_b,vv)
 INSN_LSX(vmsknz_b, vv)
+
+INSN_LSX(vand_v,   vvv)
+INSN_LSX(vor_v,vvv)
+INSN_LSX(vxor_v,   vvv)
+INSN_LSX(vnor_v,   vvv)
+INSN_LSX(vandn_v,  vvv)
+INSN_LSX(vorn_v,   vvv)
+
+INSN_LSX(vandi_b,  vv_i)
+INSN_LSX(vori_b,   vv_i)
+INSN_LSX(vxori_b,  vv_i)
+INSN_LSX(vnori_b,  vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index cc2f542278..1eeb614427 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -365,3 +365,5 @@ DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
 DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
 DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
 DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
+
+DEF_HELPER_FLAGS_4(vnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 9ca3a23106..c20d77bd3a 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2568,3 +2568,53 @@ TRANS(vmskltz_w, gen_vv, gen_helper_vmskltz_w)
 TRANS(vmskltz_d, gen_vv, gen_helper_vmskltz_d)
 TRANS(vmskgez_b, gen_vv, gen_helper_vmskgez_b)
 TRANS(vmsknz_b, gen_vv, gen_helper_vmsknz_b)
+
+TRANS(vand_v, gvec_vvv, MO_64, tcg_gen_gvec_and)
+TRANS(vor_v, gvec_vvv, MO_64, tcg_gen_gvec_or)
+TRANS(vxor_v, gvec_vvv, MO_64, tcg_gen_gvec_xor)
+TRANS(vnor_v, gvec_vvv, MO_64, tcg_gen_gvec_nor)
+
+static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
+{
+uint32_t vd_ofs, vj_ofs, vk_ofs;
+
+CHECK_SXE;
+
+vd_ofs = vreg_full_offset(a->vd);
+vj_ofs = vreg_full_offset(a->vj);
+vk_ofs = vreg_full_offset(a->vk);
+
+tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, 16, 16);
+return true;
+}
+TRANS(vorn_v, gvec_vvv, MO_64, tcg_gen_gvec_orc)
+TRANS(vandi_b, gvec_vv_i, MO_8, tcg_gen_gvec_andi)
+TRANS(vori_b, gvec_vv_i, MO_8, tcg_gen_gvec_ori)
+TRANS(vxori_b, gvec_vv_i, MO_8, tcg_gen_gvec_xori)
+
+static void gen_vnori(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+TCGv_vec t1;
+
+t1 = tcg_temp_new_vec_matching(t);
+tcg_gen_dupi_vec(vece, t1, imm);
+tcg_gen_nor_vec(vece, t, a, t1);
+}
+
+static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+   int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_nor_vec, 0
+};
+static const GVecGen2i op = {
+   .fniv = gen_vnori,
+   .fnoi = gen_helper_vnori_b,
+   .opt_opc = vecop_list,
+   .vece = MO_8
+};
+
+tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op);
+}
+
+TRANS(vnori_b, gvec_vv_i, MO_8, do_vnori_b)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 47c1ef78a7..6309683be9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -503,6 +503,7 @@ dbcl  0010 10101 ...  
@i15
 @vv_ui4   . . imm:4 vj:5 vd:5_i
 @vv_ui5     . imm:5 vj:5 vd:5_i
 @vv_ui6   imm:6 vj:5 vd:5_i
+@vv_ui8    .. imm:8 vj:5 vd:5_i
 @vv_i5     . imm:s5 vj:5 vd:5_i
 
 vadd_b   0111  10100 . . .@vvv
@@ -790,3 +791,15 @@ vmskltz_w0111 00101001 11000 10010 . .
@vv
 vmskltz_d0111 00101001 11000 10011 . .@vv
 vmskgez_b0111 00101001 11000 10100 . .@vv
 vmsknz_b 0111 00101001 11000 11000 . .@vv
+
+vand_v   0111 00010010 01100 ..... ..... .....    @vvv
+vor_v    0111 00010010 01101 ..... ..... .....    @vvv
+vxor_v   0111 00010010 01110 ..... ..... .....    @vvv
+vnor_v   0111 00010010 01111 ..... ..... .....    @vvv
+vandn_v  0111 00010010 10000 ..... ..... .....    @vvv
+vorn_v   0111 00010010 10001 ..... ..... .....    @vvv
+
+vandi_b  0111 0001 00  . .@vv_ui8
+vori_b   0111 0001 01  . .@vv_ui8
+vxori_b  0111 0001 10  . .@vv_ui8
+vnori_b  0111 0001 11  . .@vv_ui8
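
A note on operand order in the patch above: trans_vandn_v passes vk before
vj to tcg_gen_gvec_andc, which computes "a AND NOT b", so the result appears
to be vd = ~vj & vk. The scalar sketch below models that, plus VNORI.B on a
single byte. The vec128 type and the function names are made up for
illustration and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register model: two 64-bit lanes. */
typedef struct {
    uint64_t d[2];
} vec128;

/* AND-with-complement, same operand order as tcg_gen_gvec_andc: a & ~b. */
static vec128 andc128(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < 2; i++) {
        r.d[i] = a.d[i] & ~b.d[i];
    }
    return r;
}

/* VANDN.V as the translation appears to compute it: vd = ~vj & vk. */
static vec128 vandn_v(vec128 vj, vec128 vk)
{
    return andc128(vk, vj);     /* note the swapped operands */
}

/* VNORI.B on one byte element: vd = ~(vj | imm). */
static uint8_t vnori_b_elem(uint8_t vj, uint8_t imm)
{
    return (uint8_t)~(vj | imm);
}
```

For example, with vj = 0xF0 and vk = 0xFF in the low byte, vandn_v leaves
0x0F there, while a plain "vj & ~vk" would give 0.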

[RFC PATCH v2 20/44] target/loongarch: Implement vsigncov

2023-03-27 Thread Song Gao
This patch includes:
- VSIGNCOV.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  5 ++
 target/loongarch/helper.h   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 54 +
 target/loongarch/insns.decode   |  5 ++
 target/loongarch/lsx_helper.c   | 19 
 5 files changed, 88 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 412c1cedcb..46e808c321 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1079,3 +1079,8 @@ INSN_LSX(vexth_hu_bu,  vv)
 INSN_LSX(vexth_wu_hu,  vv)
 INSN_LSX(vexth_du_wu,  vv)
 INSN_LSX(vexth_qu_du,  vv)
+
+INSN_LSX(vsigncov_b,   vvv)
+INSN_LSX(vsigncov_h,   vvv)
+INSN_LSX(vsigncov_w,   vvv)
+INSN_LSX(vsigncov_d,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0876aa3331..a7394b2eb7 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -353,3 +353,8 @@ DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
 DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
 DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
 DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index f6058c1360..865485ea10 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2507,3 +2507,57 @@ TRANS(vexth_hu_bu, gen_vv, gen_helper_vexth_hu_bu)
 TRANS(vexth_wu_hu, gen_vv, gen_helper_vexth_wu_hu)
 TRANS(vexth_du_wu, gen_vv, gen_helper_vexth_du_wu)
 TRANS(vexth_qu_du, gen_vv, gen_helper_vexth_qu_du)
+
+static void gen_vsigncov(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t1, t2;
+
+t1 = tcg_temp_new_vec_matching(t);
+t2 = tcg_temp_new_vec_matching(t);
+
+tcg_gen_neg_vec(vece, t1, b);
+tcg_gen_dupi_vec(vece, t2, 0);
+tcg_gen_cmpsel_vec(TCG_COND_LT, vece, t, a, t2, t1, b);
+tcg_gen_cmpsel_vec(TCG_COND_EQ, vece, t, a, t2, t2, t);
+}
+
+static void do_vsigncov(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_neg_vec, INDEX_op_cmpsel_vec, 0
+};
+static const GVecGen3 op[4] = {
+{
+.fniv = gen_vsigncov,
+.fno = gen_helper_vsigncov_b,
+.opt_opc = vecop_list,
+.vece = MO_8
+},
+{
+.fniv = gen_vsigncov,
+.fno = gen_helper_vsigncov_h,
+.opt_opc = vecop_list,
+.vece = MO_16
+},
+{
+.fniv = gen_vsigncov,
+.fno = gen_helper_vsigncov_w,
+.opt_opc = vecop_list,
+.vece = MO_32
+},
+{
+.fniv = gen_vsigncov,
+.fno = gen_helper_vsigncov_d,
+.opt_opc = vecop_list,
+.vece = MO_64
+},
+};
+
+tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(vsigncov_b, gvec_vvv, MO_8, do_vsigncov)
+TRANS(vsigncov_h, gvec_vvv, MO_16, do_vsigncov)
+TRANS(vsigncov_w, gvec_vvv, MO_32, do_vsigncov)
+TRANS(vsigncov_d, gvec_vvv, MO_64, do_vsigncov)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 39c582d098..4233dd7404 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -778,3 +778,8 @@ vexth_hu_bu  0111 00101001 11101 11100 . .
@vv
 vexth_wu_hu  0111 00101001 11101 11101 . .@vv
 vexth_du_wu  0111 00101001 11101 0 . .@vv
 vexth_qu_du  0111 00101001 11101 1 . .@vv
+
+vsigncov_b   0111 00010010 11100 ..... ..... .....    @vvv
+vsigncov_h   0111 00010010 11101 ..... ..... .....    @vvv
+vsigncov_w   0111 00010010 11110 ..... ..... .....    @vvv
+vsigncov_d   0111 00010010 11111 ..... ..... .....    @vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9a0b358576..b3a9b8cb66 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -871,3 +871,22 @@ VEXTH(vexth_d_w, 64, int64_t, int32_t, D, W)
 VEXTH(vexth_hu_bu, 16, uint16_t, uint8_t, H, B)
 VEXTH(vexth_wu_hu, 32, uint32_t, uint16_t, W, H)
 VEXTH(vexth_du_wu, 64, uint64_t, uint32_t, D, W)
+
+#define DO_SIGNCOV(a, b)  (a == 0 ? 0 : a < 0 ? -b : b)
+
+#define VSIGNCOV(NAME, BIT, E, DO_OP)   \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+{   \
+int i;   
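
The helper above is cut off in this archive. As a rough, standalone sketch of
what DO_SIGNCOV does per element (not the author's actual helper), the byte
variant could look like the following. The LANES constant and the function
name signcov_b are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16    /* 128-bit register viewed as 16 signed bytes */

/*
 * Per element: result is 0 if vj is 0, -vk if vj is negative, else vk.
 * This mirrors DO_SIGNCOV(a, b) = (a == 0 ? 0 : a < 0 ? -b : b).
 * Negating -128 does not fit in int8_t; the cast back is
 * implementation-defined (it wraps on common compilers).
 */
static void signcov_b(int8_t vd[LANES], const int8_t vj[LANES],
                      const int8_t vk[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int8_t a = vj[i];
        int8_t b = vk[i];
        vd[i] = (a == 0) ? 0 : (a < 0) ? (int8_t)-b : b;
    }
}
```

The vector path in gen_vsigncov reaches the same result with two cmpsel
steps: first pick -b or b depending on a < 0, then force 0 where a == 0.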

[RFC PATCH v2 42/44] target/loongarch: Implement vld vst

2023-03-27 Thread Song Gao
This patch includes:
- VLD[X], VST[X];
- VLDREPL.{B/H/W/D};
- VSTELM.{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  34 +++
 target/loongarch/helper.h   |  12 +
 target/loongarch/insn_trans/trans_lsx.c.inc |  70 +
 target/loongarch/insns.decode   |  36 +++
 target/loongarch/lsx_helper.c   | 267 
 target/loongarch/translate.c|  10 +
 6 files changed, 429 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0b62bbb8be..8627908fc9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -21,11 +21,21 @@ static inline int plus_1(DisasContext *ctx, int x)
 return x + 1;
 }
 
+static inline int shl_1(DisasContext *ctx, int x)
+{
+return x << 1;
+}
+
 static inline int shl_2(DisasContext *ctx, int x)
 {
 return x << 2;
 }
 
+static inline int shl_3(DisasContext *ctx, int x)
+{
+return x << 3;
+}
+
 #define CSR_NAME(REG) \
 [LOONGARCH_CSR_##REG] = (#REG)
 
@@ -823,6 +833,11 @@ static void output_vr_i(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, r%d, 0x%x", a->vd, a->rj, a->imm);
 }
 
+static void output_vr_ii(DisasContext *ctx, arg_vr_ii *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, r%d, 0x%x, 0x%x", a->vd, a->rj, a->imm, a->imm2);
+}
+
 static void output_rv_i(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
 {
 output(ctx, mnemonic, "r%d, v%d, 0x%x", a->rd, a->vj,  a->imm);
@@ -838,6 +853,11 @@ static void output_vvr(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, v%d, r%d", a->vd, a->vj, a->rk);
 }
 
+static void output_vrr(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, r%d, r%d", a->vd, a->rj, a->rk);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -1654,3 +1674,17 @@ INSN_LSX(vextrins_d,   vv_i)
 INSN_LSX(vextrins_w,   vv_i)
 INSN_LSX(vextrins_h,   vv_i)
 INSN_LSX(vextrins_b,   vv_i)
+
+INSN_LSX(vld,  vr_i)
+INSN_LSX(vst,  vr_i)
+INSN_LSX(vldx, vrr)
+INSN_LSX(vstx, vrr)
+
+INSN_LSX(vldrepl_d,vr_i)
+INSN_LSX(vldrepl_w,vr_i)
+INSN_LSX(vldrepl_h,vr_i)
+INSN_LSX(vldrepl_b,vr_i)
+INSN_LSX(vstelm_d, vr_ii)
+INSN_LSX(vstelm_w, vr_ii)
+INSN_LSX(vstelm_h, vr_ii)
+INSN_LSX(vstelm_b, vr_ii)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 86c7eeeae1..5b6674ff0e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -711,3 +711,15 @@ DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(vld_b, void, env, i32, tl)
+DEF_HELPER_3(vst_b, void, env, i32, tl)
+
+DEF_HELPER_3(vldrepl_d, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_w, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_h, void, env, i32, tl)
+DEF_HELPER_3(vldrepl_b, void, env, i32, tl)
+DEF_HELPER_4(vstelm_d, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_w, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_h, void, env, i32, tl, i32)
+DEF_HELPER_4(vstelm_b, void, env, i32, tl, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 0ea7c65445..ab896f8a9e 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3510,3 +3510,73 @@ TRANS(vextrins_b, gen_vv_i, gen_helper_vextrins_b)
 TRANS(vextrins_h, gen_vv_i, gen_helper_vextrins_h)
 TRANS(vextrins_w, gen_vv_i, gen_helper_vextrins_w)
 TRANS(vextrins_d, gen_vv_i, gen_helper_vextrins_d)
+
+static bool gen_memory(DisasContext *ctx, arg_vr_i *a,
+   void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+TCGv_i32 vd = tcg_constant_i32(a->vd);
+TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+TCGv temp;
+
+CHECK_SXE;
+
+if (a->imm) {
+temp = tcg_temp_new();
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+func(cpu_env, vd, addr);
+
+return true;
+}
+
+static bool gen_memory_x(DisasContext *ctx, arg_vrr *a,
+void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+TCGv_i32 vd = tcg_constant_i32(a->vd);
+TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+
+CHECK_SXE;
+
+TCGv addr = tcg_temp_new();
+tcg_gen_add_tl(addr, src1, src2);
+func(cpu_env, vd, addr);
+return true;
+}
+
+TRANS(vld, gen_memory, gen_helper_vld_b)
+TRANS(vst, gen_memory, gen_helper_vst_b)
+TRANS(vldx, gen_memory_x, gen_helper_vld_b)
+TRANS(vstx, gen_memory_x, gen_helper_vst_b)
+
+static bool gen_memory_elm(DisasContext *ctx, arg_vr_ii *a,
+   void (*func)(TCGv_ptr, 

Re: [PATCH 2/5] target/riscv: Use sign-extended data address when xl = 32

2023-03-27 Thread liweiwei


On 2023/3/28 10:14, LIU Zhiwei wrote:



On 2023/3/27 18:00, Weiwei Li wrote:

Currently, the pc is sign-extended (in gen_set_pc*) when xl = 32, and
data addresses should use the same memory address space as the pc when
xl = 32. So we should change the address calculation to use sign-extended
addresses when xl = 32.


Incorrect. PC sign-extension is mandated by the spec, and it is visible to 
gdb and the OS. The memory address for xl = 32, however, is a QEMU-internal 
implementation detail.


Yeah, the spec does not describe the memory address for xlen = 32. 
But it seems easier to use the original (sign-extended) pc in this case.


Then we needn't truncate the pc in cpu_get_tb_cpu_state and sign-extend it 
again in riscv_cpu_synchronize_from_tb.


Regards,

Weiwei Li


We should not make it too complex.

Even for the PC, when fetching instructions, we only use the low 32 bits, 
as you can see from cpu_get_tb_cpu_state.


*pc = cpu_get_xl(env) == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;

Zhiwei


Signed-off-by: Weiwei Li
Signed-off-by: Junqiang Wang
---
  target/riscv/translate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index bf0e2d318e..c48cb19389 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -570,7 +570,7 @@ static TCGv get_address(DisasContext *ctx, int rs1, int imm)
  tcg_gen_addi_tl(addr, src1, imm);
  
  if (get_xl(ctx) == MXL_RV32) {

-tcg_gen_ext32u_tl(addr, addr);
+tcg_gen_ext32s_tl(addr, addr);
  }
  
  if (ctx->pm_mask_enabled) {

@@ -592,7 +592,7 @@ static TCGv get_address_indexed(DisasContext *ctx, int rs1, TCGv offs)
  tcg_gen_add_tl(addr, src1, offs);
  
  if (get_xl(ctx) == MXL_RV32) {

-tcg_gen_ext32u_tl(addr, addr);
+tcg_gen_ext32s_tl(addr, addr);
  }
  
  if (ctx->pm_mask_enabled) {
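
To make the two options in this thread concrete, here is a small host-side
sketch of what zero-extending versus sign-extending a 32-bit address does to
the value held in a 64-bit variable. The function names ext32u and ext32s
only echo the TCG op names; this is plain C, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 32 bits (the current ext32u behaviour). */
static uint64_t ext32u(uint64_t x)
{
    return (uint32_t)x;
}

/*
 * Sign-extend the low 32 bits (the proposed ext32s behaviour).
 * The uint32_t -> int32_t conversion is implementation-defined for
 * values above INT32_MAX, but wraps on all common compilers.
 */
static uint64_t ext32s(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}
```

The two only differ when bit 31 is set: 0x80001000 stays 0x0000000080001000
with ext32u and becomes 0xffffffff80001000 with ext32s. The low 32 bits are
identical in both cases, which is the point made above about
cpu_get_tb_cpu_state masking the pc with UINT32_MAX.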

[RFC PATCH v2 05/44] target/loongarch: Implement vadd/vsub

2023-03-27 Thread Song Gao
This patch includes:
- VADD.{B/H/W/D/Q};
- VSUB.{B/H/W/D/Q}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 23 
 target/loongarch/helper.h   |  4 +++
 target/loongarch/insn_trans/trans_lsx.c.inc | 40 +
 target/loongarch/insns.decode   | 22 
 target/loongarch/lsx_helper.c   | 25 +
 target/loongarch/translate.c|  7 
 6 files changed, 121 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2e93e77e0d..a5948d7847 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -784,3 +784,26 @@ PCADD_INSN(pcaddi)
 PCADD_INSN(pcalau12i)
 PCADD_INSN(pcaddu12i)
 PCADD_INSN(pcaddu18i)
+
+#define INSN_LSX(insn, type)\
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{   \
+output_##type(ctx, a, #insn);   \
+return true;\
+}
+
+static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
+}
+
+INSN_LSX(vadd_b,   vvv)
+INSN_LSX(vadd_h,   vvv)
+INSN_LSX(vadd_w,   vvv)
+INSN_LSX(vadd_d,   vvv)
+INSN_LSX(vadd_q,   vvv)
+INSN_LSX(vsub_b,   vvv)
+INSN_LSX(vsub_h,   vvv)
+INSN_LSX(vsub_w,   vvv)
+INSN_LSX(vsub_d,   vvv)
+INSN_LSX(vsub_q,   vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 9c01823a26..13390c07d6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -130,3 +130,7 @@ DEF_HELPER_4(ldpte, void, env, tl, tl, i32)
 DEF_HELPER_1(ertn, void, env)
 DEF_HELPER_1(idle, void, env)
 #endif
+
+/* LoongArch LSX  */
+DEF_HELPER_4(vadd_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsub_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5dedb044d7..2fe0e4ace5 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -14,3 +14,43 @@
 #else
 #define CHECK_SXE
 #endif
+
+static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
+void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+TCGv_i32 vd = tcg_constant_i32(a->vd);
+TCGv_i32 vj = tcg_constant_i32(a->vj);
+TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+CHECK_SXE;
+
+func(cpu_env, vd, vj, vk);
+return true;
+}
+
+static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
+ void (*func)(unsigned, uint32_t, uint32_t,
+  uint32_t, uint32_t, uint32_t))
+{
+uint32_t vd_ofs, vj_ofs, vk_ofs;
+
+CHECK_SXE;
+
+vd_ofs = vreg_full_offset(a->vd);
+vj_ofs = vreg_full_offset(a->vj);
+vk_ofs = vreg_full_offset(a->vk);
+
+func(mop, vd_ofs, vj_ofs, vk_ofs, 16, 16);
+return true;
+}
+
+TRANS(vadd_b, gvec_vvv, MO_8, tcg_gen_gvec_add)
+TRANS(vadd_h, gvec_vvv, MO_16, tcg_gen_gvec_add)
+TRANS(vadd_w, gvec_vvv, MO_32, tcg_gen_gvec_add)
+TRANS(vadd_d, gvec_vvv, MO_64, tcg_gen_gvec_add)
+TRANS(vadd_q, gen_vvv, gen_helper_vadd_q)
+TRANS(vsub_b, gvec_vvv, MO_8, tcg_gen_gvec_sub)
+TRANS(vsub_h, gvec_vvv, MO_16, tcg_gen_gvec_sub)
+TRANS(vsub_w, gvec_vvv, MO_32, tcg_gen_gvec_sub)
+TRANS(vsub_d, gvec_vvv, MO_64, tcg_gen_gvec_sub)
+TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index de7b8f0f3c..d18db68d51 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -485,3 +485,25 @@ ldpte 01100100 01  . 0
@j_i
 ertn  01100100 1 01110 0 0@empty
 idle  01100100 10001 ...  @i15
 dbcl  0010 10101 ...  @i15
+
+#
+# LSX Argument sets
+#
+
+&vvv  vd vj vk
+
+#
+# LSX Formats
+#
+@vvv     . vk:5 vj:5 vd:5
+
+vadd_b   0111  10100 . . .@vvv
+vadd_h   0111  10101 . . .@vvv
+vadd_w   0111  10110 . . .@vvv
+vadd_d   0111  10111 . . .@vvv
+vadd_q   0111 00010010 11010 . . .@vvv
+vsub_b   0111  11000 . . .@vvv
+vsub_h   0111  11001 . . .@vvv
+vsub_w   0111  11010 . . .@vvv
+vsub_d   0111  11011 . . .@vvv
+vsub_q   0111 00010010 11011 . . .@vvv
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9332163aff..edd6e99b23 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -4,3 +4,28 @@
  *
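
The lsx_helper.c part of this patch is cut off in the archive. VADD.Q and
VSUB.Q go through a helper rather than a gvec expansion, presumably because
they operate on the whole 128-bit register. The sketch below shows one
portable way to do 128-bit add and subtract with two 64-bit halves; the u128
struct and function names are hypothetical stand-ins, not QEMU's Int128 API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* 128-bit add: carry out of the low half feeds the high half. */
static u128 add128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);   /* r.lo wrapped => carry */
    return r;
}

/* 128-bit subtract: borrow from the high half when a.lo < b.lo. */
static u128 sub128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - (a.lo < b.lo);
    return r;
}
```

Both wrap modulo 2^128, which matches the non-saturating behaviour of the
narrower VADD/VSUB variants handled by tcg_gen_gvec_add and tcg_gen_gvec_sub.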
 

[RFC PATCH v2 23/44] target/loongarch: Implement vsll vsrl vsra vrotr

2023-03-27 Thread Song Gao
This patch includes:
- VSLL[I].{B/H/W/D};
- VSRL[I].{B/H/W/D};
- VSRA[I].{B/H/W/D};
- VROTR[I].{B/H/W/D}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 36 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 36 +
 target/loongarch/insns.decode   | 36 +
 3 files changed, 108 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index eca0a4bb7b..f7d0fb4441 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1103,3 +1103,39 @@ INSN_LSX(vandi_b,  vv_i)
 INSN_LSX(vori_b,   vv_i)
 INSN_LSX(vxori_b,  vv_i)
 INSN_LSX(vnori_b,  vv_i)
+
+INSN_LSX(vsll_b,   vvv)
+INSN_LSX(vsll_h,   vvv)
+INSN_LSX(vsll_w,   vvv)
+INSN_LSX(vsll_d,   vvv)
+INSN_LSX(vslli_b,  vv_i)
+INSN_LSX(vslli_h,  vv_i)
+INSN_LSX(vslli_w,  vv_i)
+INSN_LSX(vslli_d,  vv_i)
+
+INSN_LSX(vsrl_b,   vvv)
+INSN_LSX(vsrl_h,   vvv)
+INSN_LSX(vsrl_w,   vvv)
+INSN_LSX(vsrl_d,   vvv)
+INSN_LSX(vsrli_b,  vv_i)
+INSN_LSX(vsrli_h,  vv_i)
+INSN_LSX(vsrli_w,  vv_i)
+INSN_LSX(vsrli_d,  vv_i)
+
+INSN_LSX(vsra_b,   vvv)
+INSN_LSX(vsra_h,   vvv)
+INSN_LSX(vsra_w,   vvv)
+INSN_LSX(vsra_d,   vvv)
+INSN_LSX(vsrai_b,  vv_i)
+INSN_LSX(vsrai_h,  vv_i)
+INSN_LSX(vsrai_w,  vv_i)
+INSN_LSX(vsrai_d,  vv_i)
+
+INSN_LSX(vrotr_b,  vvv)
+INSN_LSX(vrotr_h,  vvv)
+INSN_LSX(vrotr_w,  vvv)
+INSN_LSX(vrotr_d,  vvv)
+INSN_LSX(vrotri_b, vv_i)
+INSN_LSX(vrotri_h, vv_i)
+INSN_LSX(vrotri_w, vv_i)
+INSN_LSX(vrotri_d, vv_i)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c20d77bd3a..84c8d92ad6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2618,3 +2618,39 @@ static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
 }
 
 TRANS(vnori_b, gvec_vv_i, MO_8, do_vnori_b)
+
+TRANS(vsll_b, gvec_vvv, MO_8, tcg_gen_gvec_shlv)
+TRANS(vsll_h, gvec_vvv, MO_16, tcg_gen_gvec_shlv)
+TRANS(vsll_w, gvec_vvv, MO_32, tcg_gen_gvec_shlv)
+TRANS(vsll_d, gvec_vvv, MO_64, tcg_gen_gvec_shlv)
+TRANS(vslli_b, gvec_vv_i, MO_8, tcg_gen_gvec_shli)
+TRANS(vslli_h, gvec_vv_i, MO_16, tcg_gen_gvec_shli)
+TRANS(vslli_w, gvec_vv_i, MO_32, tcg_gen_gvec_shli)
+TRANS(vslli_d, gvec_vv_i, MO_64, tcg_gen_gvec_shli)
+
+TRANS(vsrl_b, gvec_vvv, MO_8, tcg_gen_gvec_shrv)
+TRANS(vsrl_h, gvec_vvv, MO_16, tcg_gen_gvec_shrv)
+TRANS(vsrl_w, gvec_vvv, MO_32, tcg_gen_gvec_shrv)
+TRANS(vsrl_d, gvec_vvv, MO_64, tcg_gen_gvec_shrv)
+TRANS(vsrli_b, gvec_vv_i, MO_8, tcg_gen_gvec_shri)
+TRANS(vsrli_h, gvec_vv_i, MO_16, tcg_gen_gvec_shri)
+TRANS(vsrli_w, gvec_vv_i, MO_32, tcg_gen_gvec_shri)
+TRANS(vsrli_d, gvec_vv_i, MO_64, tcg_gen_gvec_shri)
+
+TRANS(vsra_b, gvec_vvv, MO_8, tcg_gen_gvec_sarv)
+TRANS(vsra_h, gvec_vvv, MO_16, tcg_gen_gvec_sarv)
+TRANS(vsra_w, gvec_vvv, MO_32, tcg_gen_gvec_sarv)
+TRANS(vsra_d, gvec_vvv, MO_64, tcg_gen_gvec_sarv)
+TRANS(vsrai_b, gvec_vv_i, MO_8, tcg_gen_gvec_sari)
+TRANS(vsrai_h, gvec_vv_i, MO_16, tcg_gen_gvec_sari)
+TRANS(vsrai_w, gvec_vv_i, MO_32, tcg_gen_gvec_sari)
+TRANS(vsrai_d, gvec_vv_i, MO_64, tcg_gen_gvec_sari)
+
+TRANS(vrotr_b, gvec_vvv, MO_8, tcg_gen_gvec_rotrv)
+TRANS(vrotr_h, gvec_vvv, MO_16, tcg_gen_gvec_rotrv)
+TRANS(vrotr_w, gvec_vvv, MO_32, tcg_gen_gvec_rotrv)
+TRANS(vrotr_d, gvec_vvv, MO_64, tcg_gen_gvec_rotrv)
+TRANS(vrotri_b, gvec_vv_i, MO_8, tcg_gen_gvec_rotri)
+TRANS(vrotri_h, gvec_vv_i, MO_16, tcg_gen_gvec_rotri)
+TRANS(vrotri_w, gvec_vv_i, MO_32, tcg_gen_gvec_rotri)
+TRANS(vrotri_d, gvec_vv_i, MO_64, tcg_gen_gvec_rotri)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6309683be9..7c0b0c4ac8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -803,3 +803,39 @@ vandi_b  0111 0001 00  . .
@vv_ui8
 vori_b   0111 0001 01  . .@vv_ui8
 vxori_b  0111 0001 10  . .@vv_ui8
 vnori_b  0111 0001 11  . .@vv_ui8
+
+vsll_b   0111 1110 1 . . .@vvv
+vsll_h   0111 1110 10001 . . .@vvv
+vsll_w   0111 1110 10010 . . .@vvv
+vsll_d   0111 1110 10011 . . .@vvv
+vslli_b  0111 00110010 11000 01 ... . .   @vv_ui3
+vslli_h  0111 00110010 11000 1  . .   @vv_ui4
+vslli_w  0111 00110010 11001 . . .@vv_ui5
+vslli_d  0111 00110010 1101 .. . .@vv_ui6
+
+vsrl_b   0111 1110 10100 . . .@vvv
+vsrl_h   0111 1110 10101 . . .@vvv
+vsrl_w
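
For readers unfamiliar with the per-element behaviour, here is a scalar
sketch of the byte variants of the shifts and rotate mapped above. It assumes
the shift count is taken modulo the element width (8 for .B), which is how I
understand the variable gvec shift expanders to behave; treat that as an
assumption rather than a statement about the LSX manual. Function names are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* VSLL.B-style element: logical left shift, count taken modulo 8. */
static uint8_t sll8(uint8_t x, unsigned n)
{
    return (uint8_t)(x << (n & 7));
}

/* VSRL.B-style element: logical right shift, count taken modulo 8. */
static uint8_t srl8(uint8_t x, unsigned n)
{
    return (uint8_t)(x >> (n & 7));
}

/* VROTR.B-style element: rotate right, count taken modulo 8. */
static uint8_t rotr8(uint8_t x, unsigned n)
{
    n &= 7;
    /* n == 0 is handled separately to keep the sketch easy to read. */
    return n ? (uint8_t)((x >> n) | (x << (8 - n))) : x;
}
```

The immediate forms (VSLLI, VSRLI, VROTRI) behave the same way, except that
the count comes from the ui3/ui4/ui5/ui6 field and is already in range.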

[RFC PATCH v2 44/44] target/loongarch: Use {set/get}_fpr to replace cpu_fpr

2023-03-27 Thread Song Gao
Introduce set_fpr() and get_fpr() and remove cpu_fpr.

Signed-off-by: Song Gao 
---
 .../loongarch/insn_trans/trans_farith.c.inc   | 72 +++
 target/loongarch/insn_trans/trans_fcmp.c.inc  | 12 ++--
 .../loongarch/insn_trans/trans_fmemory.c.inc  | 37 ++
 target/loongarch/insn_trans/trans_fmov.c.inc  | 31 +---
 target/loongarch/translate.c  | 20 --
 5 files changed, 129 insertions(+), 43 deletions(-)

diff --git a/target/loongarch/insn_trans/trans_farith.c.inc b/target/loongarch/insn_trans/trans_farith.c.inc
index 7081fbb89b..21ea47308b 100644
--- a/target/loongarch/insn_trans/trans_farith.c.inc
+++ b/target/loongarch/insn_trans/trans_farith.c.inc
@@ -17,18 +17,29 @@
 static bool gen_fff(DisasContext *ctx, arg_fff *a,
 void (*func)(TCGv, TCGv_env, TCGv, TCGv))
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src1 = get_fpr(ctx, a->fj);
+TCGv src2 = get_fpr(ctx, a->fk);
+
 CHECK_FPE;
 
-func(cpu_fpr[a->fd], cpu_env, cpu_fpr[a->fj], cpu_fpr[a->fk]);
+func(dest, cpu_env, src1, src2);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool gen_ff(DisasContext *ctx, arg_ff *a,
void (*func)(TCGv, TCGv_env, TCGv))
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-func(cpu_fpr[a->fd], cpu_env, cpu_fpr[a->fj]);
+func(dest, cpu_env, src);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
@@ -37,61 +48,98 @@ static bool gen_muladd(DisasContext *ctx, arg_ffff *a,
int flag)
 {
 TCGv_i32 tflag = tcg_constant_i32(flag);
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src1 = get_fpr(ctx, a->fj);
+TCGv src2 = get_fpr(ctx, a->fk);
+TCGv src3 = get_fpr(ctx, a->fa);
 
 CHECK_FPE;
 
-func(cpu_fpr[a->fd], cpu_env, cpu_fpr[a->fj],
- cpu_fpr[a->fk], cpu_fpr[a->fa], tflag);
+func(dest, cpu_env, src1, src2, src3, tflag);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool trans_fcopysign_s(DisasContext *ctx, arg_fcopysign_s *a)
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src1 = get_fpr(ctx, a->fk);
+TCGv src2 = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-tcg_gen_deposit_i64(cpu_fpr[a->fd], cpu_fpr[a->fk], cpu_fpr[a->fj], 0, 31);
+tcg_gen_deposit_i64(dest, src1, src2, 0, 31);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool trans_fcopysign_d(DisasContext *ctx, arg_fcopysign_d *a)
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src1 = get_fpr(ctx, a->fk);
+TCGv src2 = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-tcg_gen_deposit_i64(cpu_fpr[a->fd], cpu_fpr[a->fk], cpu_fpr[a->fj], 0, 63);
+tcg_gen_deposit_i64(dest, src1, src2, 0, 63);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool trans_fabs_s(DisasContext *ctx, arg_fabs_s *a)
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-tcg_gen_andi_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], MAKE_64BIT_MASK(0, 31));
-gen_nanbox_s(cpu_fpr[a->fd], cpu_fpr[a->fd]);
+tcg_gen_andi_i64(dest, src, MAKE_64BIT_MASK(0, 31));
+gen_nanbox_s(dest, dest);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool trans_fabs_d(DisasContext *ctx, arg_fabs_d *a)
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-tcg_gen_andi_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], MAKE_64BIT_MASK(0, 63));
+tcg_gen_andi_i64(dest, src, MAKE_64BIT_MASK(0, 63));
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool trans_fneg_s(DisasContext *ctx, arg_fneg_s *a)
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-tcg_gen_xori_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], 0x80000000);
-gen_nanbox_s(cpu_fpr[a->fd], cpu_fpr[a->fd]);
+tcg_gen_xori_i64(dest, src, 0x80000000);
+gen_nanbox_s(dest, dest);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
 static bool trans_fneg_d(DisasContext *ctx, arg_fneg_d *a)
 {
+TCGv dest = get_fpr(ctx, a->fd);
+TCGv src = get_fpr(ctx, a->fj);
+
 CHECK_FPE;
 
-tcg_gen_xori_i64(cpu_fpr[a->fd], cpu_fpr[a->fj], 0x8000000000000000LL);
+tcg_gen_xori_i64(dest, src, 0x8000000000000000LL);
+set_fpr(a->fd, dest);
+
 return true;
 }
 
diff --git a/target/loongarch/insn_trans/trans_fcmp.c.inc b/target/loongarch/insn_trans/trans_fcmp.c.inc
index 3b0da2b9f4..a78868dbc4 100644
--- a/target/loongarch/insn_trans/trans_fcmp.c.inc
+++ b/target/loongarch/insn_trans/trans_fcmp.c.inc
@@ -25,17 +25,19 @@ static uint32_t get_fcmp_flags(int cond)
 
 static bool trans_fcmp_cond_s(DisasContext *ctx, arg_fcmp_cond_s *a)
 {
-TCGv var;
+TCGv var, src1, src2;
 uint32_t flags;
 void (*fn)(TCGv, TCGv_env, TCGv, TCGv, TCGv_i32);
 
 CHECK_FPE;
 
 var = tcg_temp_new();
+src1 = get_fpr(ctx, a->fj);
+src2 = get_fpr(ctx, a->fk);
 fn = (a->fcond & 1 ? gen_helper_fcmp_s_s : gen_helper_fcmp_c_s);
 

[RFC PATCH v2 01/44] target/loongarch: Add LSX data type VReg

2023-03-27 Thread Song Gao
Signed-off-by: Song Gao 
---
 linux-user/loongarch64/signal.c |  4 ++--
 target/loongarch/cpu.c  |  2 +-
 target/loongarch/cpu.h  | 31 +-
 target/loongarch/gdbstub.c  |  4 ++--
 target/loongarch/machine.c  | 34 -
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 7c7afb652e..bb8efb1172 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -128,7 +128,7 @@ static void setup_sigframe(CPULoongArchState *env,
 
 fpu_ctx = (struct target_fpu_context *)(info + 1);
 for (i = 0; i < 32; ++i) {
-__put_user(env->fpr[i], &fpu_ctx->regs[i]);
+__put_user(env->fpr[i].vreg.D(0), &fpu_ctx->regs[i]);
 }
 __put_user(read_fcc(env), &fpu_ctx->fcc);
 __put_user(env->fcsr0, &fpu_ctx->fcsr);
@@ -193,7 +193,7 @@ static void restore_sigframe(CPULoongArchState *env,
 uint64_t fcc;
 
 for (i = 0; i < 32; ++i) {
-__get_user(env->fpr[i], &fpu_ctx->regs[i]);
+__get_user(env->fpr[i].vreg.D(0), &fpu_ctx->regs[i]);
 }
 __get_user(fcc, &fpu_ctx->fcc);
 write_fcc(env, fcc);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 97e6579f6a..18b41221a6 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -656,7 +656,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 /* fpr */
 if (flags & CPU_DUMP_FPU) {
 for (i = 0; i < 32; i++) {
-qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i]);
+qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i].vreg.D(0));
 if ((i & 3) == 3) {
 qemu_fprintf(f, "\n");
 }
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index e11c875188..6e5fa6a01d 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -8,6 +8,7 @@
 #ifndef LOONGARCH_CPU_H
 #define LOONGARCH_CPU_H
 
+#include "qemu/int128.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 #include "hw/registerfields.h"
@@ -241,6 +242,34 @@ FIELD(TLB_MISC, ASID, 1, 10)
 FIELD(TLB_MISC, VPPN, 13, 35)
 FIELD(TLB_MISC, PS, 48, 6)
 
+#define LSX_LEN   (128)
+typedef union VReg {
+int8_t   B[LSX_LEN / 8];
+int16_t  H[LSX_LEN / 16];
+int32_t  W[LSX_LEN / 32];
+int64_t  D[LSX_LEN / 64];
+Int128   Q[LSX_LEN / 128];
+}VReg;
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+VReg  vreg;
+};
+
+#if  HOST_BIG_ENDIAN
+#define B(x)  B[15 - (x)]
+#define H(x)  H[7 - (x)]
+#define W(x)  W[3 - (x)]
+#define D(x)  D[1 - (x)]
+#define Q(x)  Q[x]
+#else
+#define B(x)  B[x]
+#define H(x)  H[x]
+#define W(x)  W[x]
+#define D(x)  D[x]
+#define Q(x)  Q[x]
+#endif
+
 struct LoongArchTLB {
 uint64_t tlb_misc;
 /* Fields corresponding to CSR_TLBELO0/1 */
@@ -253,7 +282,7 @@ typedef struct CPUArchState {
 uint64_t gpr[32];
 uint64_t pc;
 
-uint64_t fpr[32];
+fpr_t fpr[32];
 float_status fp_status;
 bool cf[8];
 
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index fa3e034d15..0752fff924 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -69,7 +69,7 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
  GByteArray *mem_buf, int n)
 {
 if (0 <= n && n < 32) {
-return gdb_get_reg64(mem_buf, env->fpr[n]);
+return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
 } else if (n == 32) {
 uint64_t val = read_fcc(env);
 return gdb_get_reg64(mem_buf, val);
@@ -85,7 +85,7 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
 int length = 0;
 
 if (0 <= n && n < 32) {
-env->fpr[n] = ldq_p(mem_buf);
+env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
 length = 8;
 } else if (n == 32) {
 uint64_t val = ldq_p(mem_buf);
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index b1e523ea72..54e67e63bc 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -33,7 +33,39 @@ const VMStateDescription vmstate_loongarch_cpu = {
 
 VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
 VMSTATE_UINTTL(env.pc, LoongArchCPU),
-VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
+VMSTATE_INT64(env.fpr[0].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[1].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[2].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[3].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[4].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[5].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[6].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[7].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[8].vreg.D(0), LoongArchCPU),
+VMSTATE_INT64(env.fpr[9].vreg.D(0), LoongArchCPU),
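
The B()/H()/W()/D() accessor macros in cpu.h are worth a short illustration:
they index elements from the opposite end on big-endian hosts, so element 0
always names the least significant part of the register regardless of host
byte order. The standalone sketch below reproduces the union without the
Int128 member and substitutes the compiler's __BYTE_ORDER__ macro for QEMU's
HOST_BIG_ENDIAN; both changes, and the make_reg helper, are my assumptions to
keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define LSX_LEN 128

typedef union VReg {
    int8_t  B[LSX_LEN / 8];
    int16_t H[LSX_LEN / 16];
    int32_t W[LSX_LEN / 32];
    int64_t D[LSX_LEN / 64];
} VReg;

/* Element accessors: reverse the index on big-endian hosts. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define B(x)  B[15 - (x)]
#define H(x)  H[7 - (x)]
#define W(x)  W[3 - (x)]
#define D(x)  D[1 - (x)]
#else
#define B(x)  B[x]
#define H(x)  H[x]
#define W(x)  W[x]
#define D(x)  D[x]
#endif

/* Hypothetical helper: build a register from its two 64-bit halves. */
static VReg make_reg(int64_t lo, int64_t hi)
{
    VReg v;
    v.D(0) = lo;
    v.D(1) = hi;
    return v;
}
```

With D(0) set to 0x0102030405060708, B(0) reads 0x08, H(0) reads 0x0708 and
W(1) reads 0x01020304 on either kind of host. This is also why the FP
register accesses in this patch become env->fpr[i].vreg.D(0): the scalar FP
value lives in the low 64 bits of the vector register.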
+

[RFC PATCH v2 28/44] target/loongarch: Implement vssrln vssran

2023-03-27 Thread Song Gao
This patch includes:
- VSSRLN.{B.H/H.W/W.D};
- VSSRAN.{B.H/H.W/W.D};
- VSSRLN.{BU.H/HU.W/WU.D};
- VSSRAN.{BU.H/HU.W/WU.D};
- VSSRLNI.{B.H/H.W/W.D/D.Q};
- VSSRANI.{B.H/H.W/W.D/D.Q};
- VSSRLNI.{BU.H/HU.W/WU.D/DU.Q};
- VSSRANI.{BU.H/HU.W/WU.D/DU.Q}.

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  30 ++
 target/loongarch/helper.h   |  30 ++
 target/loongarch/insn_trans/trans_lsx.c.inc |  30 ++
 target/loongarch/insns.decode   |  30 ++
 target/loongarch/lsx_helper.c   | 383 
 5 files changed, 503 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 185cd36381..426d30dc01 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1198,3 +1198,33 @@ INSN_LSX(vsrarni_b_h,  vv_i)
 INSN_LSX(vsrarni_h_w,  vv_i)
 INSN_LSX(vsrarni_w_d,  vv_i)
 INSN_LSX(vsrarni_d_q,  vv_i)
+
+INSN_LSX(vssrln_b_h,   vvv)
+INSN_LSX(vssrln_h_w,   vvv)
+INSN_LSX(vssrln_w_d,   vvv)
+INSN_LSX(vssran_b_h,   vvv)
+INSN_LSX(vssran_h_w,   vvv)
+INSN_LSX(vssran_w_d,   vvv)
+INSN_LSX(vssrln_bu_h,  vvv)
+INSN_LSX(vssrln_hu_w,  vvv)
+INSN_LSX(vssrln_wu_d,  vvv)
+INSN_LSX(vssran_bu_h,  vvv)
+INSN_LSX(vssran_hu_w,  vvv)
+INSN_LSX(vssran_wu_d,  vvv)
+
+INSN_LSX(vssrlni_b_h,  vv_i)
+INSN_LSX(vssrlni_h_w,  vv_i)
+INSN_LSX(vssrlni_w_d,  vv_i)
+INSN_LSX(vssrlni_d_q,  vv_i)
+INSN_LSX(vssrani_b_h,  vv_i)
+INSN_LSX(vssrani_h_w,  vv_i)
+INSN_LSX(vssrani_w_d,  vv_i)
+INSN_LSX(vssrani_d_q,  vv_i)
+INSN_LSX(vssrlni_bu_h, vv_i)
+INSN_LSX(vssrlni_hu_w, vv_i)
+INSN_LSX(vssrlni_wu_d, vv_i)
+INSN_LSX(vssrlni_du_q, vv_i)
+INSN_LSX(vssrani_bu_h, vv_i)
+INSN_LSX(vssrani_hu_w, vv_i)
+INSN_LSX(vssrani_wu_d, vv_i)
+INSN_LSX(vssrani_du_q, vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ee0812dca2..7562f01ad6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -426,3 +426,33 @@ DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrln_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssran_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vssrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrlni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index a8f699915d..58f27d7f65 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2725,3 +2725,33 @@ TRANS(vsrarni_b_h, gen_vv_i, gen_helper_vsrarni_b_h)
 TRANS(vsrarni_h_w, gen_vv_i, gen_helper_vsrarni_h_w)
 TRANS(vsrarni_w_d, gen_vv_i, gen_helper_vsrarni_w_d)
 TRANS(vsrarni_d_q, gen_vv_i, gen_helper_vsrarni_d_q)
+
+TRANS(vssrln_b_h, gen_vvv, gen_helper_vssrln_b_h)
+TRANS(vssrln_h_w, gen_vvv, gen_helper_vssrln_h_w)
+TRANS(vssrln_w_d, gen_vvv, gen_helper_vssrln_w_d)
+TRANS(vssran_b_h, gen_vvv, gen_helper_vssran_b_h)
+TRANS(vssran_h_w, gen_vvv, gen_helper_vssran_h_w)
+TRANS(vssran_w_d, gen_vvv, gen_helper_vssran_w_d)
+TRANS(vssrln_bu_h, gen_vvv, gen_helper_vssrln_bu_h)
+TRANS(vssrln_hu_w, gen_vvv, gen_helper_vssrln_hu_w)
+TRANS(vssrln_wu_d, gen_vvv, gen_helper_vssrln_wu_d)
+TRANS(vssran_bu_h, gen_vvv, gen_helper_vssran_bu_h)
+TRANS(vssran_hu_w, gen_vvv, gen_helper_vssran_hu_w)
+TRANS(vssran_wu_d, gen_vvv, gen_helper_vssran_wu_d)
+
+TRANS(vssrlni_b_h, gen_vv_i, gen_helper_vssrlni_b_h)
+TRANS(vssrlni_h_w, gen_vv_i, gen_helper_vssrlni_h_w)
+TRANS(vssrlni_w_d, 

[RFC PATCH v2 26/44] target/loongarch: Implement vsrln vsran

2023-03-27 Thread Song Gao
This patch includes:
- VSRLN.{B.H/H.W/W.D};
- VSRAN.{B.H/H.W/W.D};
- VSRLNI.{B.H/H.W/W.D/D.Q};
- VSRANI.{B.H/H.W/W.D/D.Q}.
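
For readers unfamiliar with the narrowing shifts, here is a minimal scalar sketch of one B.H lane: shift the 16-bit source element right, then keep only the low 8 bits. The function names are hypothetical and the code is not the helper from this patch. It assumes the shift amount is the vk element taken modulo 16.

```c
#include <stdint.h>

/* Hypothetical model of one VSRLN.B.H lane: logical shift, truncate. */
static inline uint8_t srln_b_h(uint16_t vj, uint16_t vk)
{
    unsigned sa = vk & 15;            /* assumed: amount modulo source width */
    return (uint8_t)(vj >> sa);       /* keep the low 8 bits of the result */
}

/* Hypothetical model of one VSRAN.B.H lane: arithmetic shift, truncate.
 * The result is returned as a raw 8-bit pattern. Relies on >> being an
 * arithmetic shift for negative values (true for GCC and Clang). */
static inline uint8_t sran_b_h(int16_t vj, uint16_t vk)
{
    unsigned sa = vk & 15;
    return (uint8_t)(vj >> sa);
}
```

The two forms differ only when the sign bit of the source is shifted into the kept byte, for example with a shift amount of 15.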

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  16 +++
 target/loongarch/helper.h   |  16 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
 target/loongarch/insns.decode   |  17 +++
 target/loongarch/lsx_helper.c   | 118 
 5 files changed, 183 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c62b6720ec..f0fc2ff84b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1166,3 +1166,19 @@ INSN_LSX(vsrari_b, vv_i)
 INSN_LSX(vsrari_h, vv_i)
 INSN_LSX(vsrari_w, vv_i)
 INSN_LSX(vsrari_d, vv_i)
+
+INSN_LSX(vsrln_b_h,   vvv)
+INSN_LSX(vsrln_h_w,   vvv)
+INSN_LSX(vsrln_w_d,   vvv)
+INSN_LSX(vsran_b_h,   vvv)
+INSN_LSX(vsran_h_w,   vvv)
+INSN_LSX(vsran_w_d,   vvv)
+
+INSN_LSX(vsrlni_b_h,   vv_i)
+INSN_LSX(vsrlni_h_w,   vv_i)
+INSN_LSX(vsrlni_w_d,   vv_i)
+INSN_LSX(vsrlni_d_q,   vv_i)
+INSN_LSX(vsrani_b_h,   vv_i)
+INSN_LSX(vsrani_h_w,   vv_i)
+INSN_LSX(vsrani_w_d,   vv_i)
+INSN_LSX(vsrani_d_q,   vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index c28353d822..e7d0a8d6cf 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -394,3 +394,19 @@ DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsran_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2ee763fb32..77f7d6319f 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2693,3 +2693,19 @@ TRANS(vsrari_b, gen_vv_i, gen_helper_vsrari_b)
 TRANS(vsrari_h, gen_vv_i, gen_helper_vsrari_h)
 TRANS(vsrari_w, gen_vv_i, gen_helper_vsrari_w)
 TRANS(vsrari_d, gen_vv_i, gen_helper_vsrari_d)
+
+TRANS(vsrln_b_h, gen_vvv, gen_helper_vsrln_b_h)
+TRANS(vsrln_h_w, gen_vvv, gen_helper_vsrln_h_w)
+TRANS(vsrln_w_d, gen_vvv, gen_helper_vsrln_w_d)
+TRANS(vsran_b_h, gen_vvv, gen_helper_vsran_b_h)
+TRANS(vsran_h_w, gen_vvv, gen_helper_vsran_h_w)
+TRANS(vsran_w_d, gen_vvv, gen_helper_vsran_w_d)
+
+TRANS(vsrlni_b_h, gen_vv_i, gen_helper_vsrlni_b_h)
+TRANS(vsrlni_h_w, gen_vv_i, gen_helper_vsrlni_h_w)
+TRANS(vsrlni_w_d, gen_vv_i, gen_helper_vsrlni_w_d)
+TRANS(vsrlni_d_q, gen_vv_i, gen_helper_vsrlni_d_q)
+TRANS(vsrani_b_h, gen_vv_i, gen_helper_vsrani_b_h)
+TRANS(vsrani_h_w, gen_vv_i, gen_helper_vsrani_h_w)
+TRANS(vsrani_w_d, gen_vv_i, gen_helper_vsrani_w_d)
+TRANS(vsrani_d_q, gen_vv_i, gen_helper_vsrani_d_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a21743..ee54b632a7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -503,6 +503,7 @@ dbcl  0010 10101 ...  @i15
 @vv_ui4   . . imm:4 vj:5 vd:5_i
 @vv_ui5     . imm:5 vj:5 vd:5_i
 @vv_ui6   imm:6 vj:5 vd:5_i
+@vv_ui7   ... imm:7 vj:5 vd:5_i
 @vv_ui8    .. imm:8 vj:5 vd:5_i
 @vv_i5     . imm:s5 vj:5 vd:5_i
 
@@ -866,3 +867,19 @@ vsrari_b 0111 00101010 1 01 ... . .   @vv_ui3
 vsrari_h 0111 00101010 1 1  . .   @vv_ui4
 vsrari_w 0111 00101010 10001 . . .@vv_ui5
 vsrari_d 0111 00101010 1001 .. . .@vv_ui6
+
+vsrln_b_h0111  01001 . . .@vvv
+vsrln_h_w0111  01010 . . .@vvv
+vsrln_w_d0111  01011 . . .@vvv
+vsran_b_h0111  01101 . . .@vvv
+vsran_h_w0111  01110 . . .@vvv
+vsran_w_d0111  0 . . .@vvv
+
+vsrlni_b_h   0111 00110100 0 1  . .   @vv_ui4
+vsrlni_h_w   0111 00110100 1 . . .@vv_ui5

[RFC PATCH v2 27/44] target/loongarch: Implement vsrlrn vsrarn

2023-03-27 Thread Song Gao
This patch includes:
- VSRLRN.{B.H/H.W/W.D};
- VSRARN.{B.H/H.W/W.D};
- VSRLRNI.{B.H/H.W/W.D/D.Q};
- VSRARNI.{B.H/H.W/W.D/D.Q}.
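
The rounding variants can be pictured as the plain narrowing shift plus the last bit shifted out. The sketch below is a scalar model of one B.H lane under that assumption. The names are hypothetical, the shift amount is assumed to be the vk element modulo 16, and the code is not the helper from this patch.

```c
#include <stdint.h>

/* Hypothetical model of one VSRLRN.B.H lane: logical shift with rounding. */
static inline uint8_t srlrn_b_h(uint16_t vj, uint16_t vk)
{
    unsigned sa = vk & 15;
    if (sa == 0) {
        return (uint8_t)vj;           /* nothing shifted out, nothing to round */
    }
    unsigned round = (vj >> (sa - 1)) & 1;   /* last bit shifted out */
    return (uint8_t)((vj >> sa) + round);
}

/* Hypothetical model of one VSRARN.B.H lane: arithmetic shift with rounding.
 * Relies on >> being an arithmetic shift for negative values (GCC, Clang). */
static inline uint8_t srarn_b_h(int16_t vj, uint16_t vk)
{
    unsigned sa = vk & 15;
    if (sa == 0) {
        return (uint8_t)vj;
    }
    int round = (vj >> (sa - 1)) & 1;
    return (uint8_t)((vj >> sa) + round);
}
```

With this definition, halves round toward positive infinity: -9 shifted by 1 gives -4, not -5.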

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  16 +++
 target/loongarch/helper.h   |  16 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  16 +++
 target/loongarch/insns.decode   |  16 +++
 target/loongarch/lsx_helper.c   | 132 
 5 files changed, 196 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f0fc2ff84b..185cd36381 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1182,3 +1182,19 @@ INSN_LSX(vsrani_b_h,   vv_i)
 INSN_LSX(vsrani_h_w,   vv_i)
 INSN_LSX(vsrani_w_d,   vv_i)
 INSN_LSX(vsrani_d_q,   vv_i)
+
+INSN_LSX(vsrlrn_b_h,   vvv)
+INSN_LSX(vsrlrn_h_w,   vvv)
+INSN_LSX(vsrlrn_w_d,   vvv)
+INSN_LSX(vsrarn_b_h,   vvv)
+INSN_LSX(vsrarn_h_w,   vvv)
+INSN_LSX(vsrarn_w_d,   vvv)
+
+INSN_LSX(vsrlrni_b_h,  vv_i)
+INSN_LSX(vsrlrni_h_w,  vv_i)
+INSN_LSX(vsrlrni_w_d,  vv_i)
+INSN_LSX(vsrlrni_d_q,  vv_i)
+INSN_LSX(vsrarni_b_h,  vv_i)
+INSN_LSX(vsrarni_h_w,  vv_i)
+INSN_LSX(vsrarni_w_d,  vv_i)
+INSN_LSX(vsrarni_d_q,  vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e7d0a8d6cf..ee0812dca2 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -410,3 +410,19 @@ DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
 DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarn_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vsrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 77f7d6319f..a8f699915d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2709,3 +2709,19 @@ TRANS(vsrani_b_h, gen_vv_i, gen_helper_vsrani_b_h)
 TRANS(vsrani_h_w, gen_vv_i, gen_helper_vsrani_h_w)
 TRANS(vsrani_w_d, gen_vv_i, gen_helper_vsrani_w_d)
 TRANS(vsrani_d_q, gen_vv_i, gen_helper_vsrani_d_q)
+
+TRANS(vsrlrn_b_h, gen_vvv, gen_helper_vsrlrn_b_h)
+TRANS(vsrlrn_h_w, gen_vvv, gen_helper_vsrlrn_h_w)
+TRANS(vsrlrn_w_d, gen_vvv, gen_helper_vsrlrn_w_d)
+TRANS(vsrarn_b_h, gen_vvv, gen_helper_vsrarn_b_h)
+TRANS(vsrarn_h_w, gen_vvv, gen_helper_vsrarn_h_w)
+TRANS(vsrarn_w_d, gen_vvv, gen_helper_vsrarn_w_d)
+
+TRANS(vsrlrni_b_h, gen_vv_i, gen_helper_vsrlrni_b_h)
+TRANS(vsrlrni_h_w, gen_vv_i, gen_helper_vsrlrni_h_w)
+TRANS(vsrlrni_w_d, gen_vv_i, gen_helper_vsrlrni_w_d)
+TRANS(vsrlrni_d_q, gen_vv_i, gen_helper_vsrlrni_d_q)
+TRANS(vsrarni_b_h, gen_vv_i, gen_helper_vsrarni_b_h)
+TRANS(vsrarni_h_w, gen_vv_i, gen_helper_vsrarni_h_w)
+TRANS(vsrarni_w_d, gen_vv_i, gen_helper_vsrarni_w_d)
+TRANS(vsrarni_d_q, gen_vv_i, gen_helper_vsrarni_d_q)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ee54b632a7..29bf4a8a6a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -883,3 +883,19 @@ vsrani_b_h   0111 00110101 1 1  . .   @vv_ui4
 vsrani_h_w   0111 00110101 10001 . . .@vv_ui5
 vsrani_w_d   0111 00110101 1001 .. . .@vv_ui6
 vsrani_d_q   0111 00110101 101 ... . .@vv_ui7
+
+vsrlrn_b_h   0111  10001 . . .@vvv
+vsrlrn_h_w   0111  10010 . . .@vvv
+vsrlrn_w_d   0111  10011 . . .@vvv
+vsrarn_b_h   0111  10101 . . .@vvv
+vsrarn_h_w   0111  10110 . . .@vvv
+vsrarn_w_d   0111  10111 . . .@vvv
+
+vsrlrni_b_h  0111 00110100 01000 1  . .   @vv_ui4
+vsrlrni_h_w  0111 00110100 01001 . . .@vv_ui5
+vsrlrni_w_d  0111 00110100 0101 .. . .@vv_ui6
+vsrlrni_d_q  0111 00110100 011 ... . .@vv_ui7
+vsrarni_b_h  0111 00110101 11000 1  . .   @vv_ui4
+vsrarni_h_w  0111 00110101 11001 . . .@vv_ui5
+vsrarni_w_d  0111 00110101 1101 .. . .@vv_ui6
+vsrarni_d_q  0111 00110101 111 

[RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf

2023-03-27 Thread Song Gao
This patch includes:
- VILV{L/H}.{B/H/W/D};
- VSHUF.{B/H/W/D};
- VSHUF4I.{B/H/W/D};
- VPERMI.W;
- VEXTRINS.{B/H/W/D}.
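
To make the interleave operations concrete, the following sketch models VILVL.B and VILVH.B on plain 16-byte arrays. It is an illustration only. The function names are hypothetical, and the element order (vk supplies the even result lanes, vj the odd ones) is an assumption that should be confirmed against the LSX manual and lsx_helper.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LSX_BYTES = 16 };

/* Interleave the low halves of vj and vk (assumed lane order). */
static void ilvl_b(uint8_t vd[LSX_BYTES], const uint8_t vj[LSX_BYTES],
                   const uint8_t vk[LSX_BYTES])
{
    uint8_t tmp[LSX_BYTES];           /* temporary so vd may alias vj or vk */

    for (size_t i = 0; i < LSX_BYTES / 2; i++) {
        tmp[2 * i] = vk[i];
        tmp[2 * i + 1] = vj[i];
    }
    memcpy(vd, tmp, sizeof(tmp));
}

/* Interleave the high halves of vj and vk (assumed lane order). */
static void ilvh_b(uint8_t vd[LSX_BYTES], const uint8_t vj[LSX_BYTES],
                   const uint8_t vk[LSX_BYTES])
{
    uint8_t tmp[LSX_BYTES];

    for (size_t i = 0; i < LSX_BYTES / 2; i++) {
        tmp[2 * i] = vk[i + LSX_BYTES / 2];
        tmp[2 * i + 1] = vj[i + LSX_BYTES / 2];
    }
    memcpy(vd, tmp, sizeof(tmp));
}
```

The temporary buffer matters because the destination register may be the same as one of the sources.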

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  25 +++
 target/loongarch/helper.h   |  25 +++
 target/loongarch/insn_trans/trans_lsx.c.inc |  25 +++
 target/loongarch/insns.decode   |  25 +++
 target/loongarch/lsx_helper.c   | 163 
 5 files changed, 263 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c6cf782725..0b62bbb8be 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1629,3 +1629,28 @@ INSN_LSX(vpickod_b,vvv)
 INSN_LSX(vpickod_h,vvv)
 INSN_LSX(vpickod_w,vvv)
 INSN_LSX(vpickod_d,vvv)
+
+INSN_LSX(vilvl_b,  vvv)
+INSN_LSX(vilvl_h,  vvv)
+INSN_LSX(vilvl_w,  vvv)
+INSN_LSX(vilvl_d,  vvv)
+INSN_LSX(vilvh_b,  vvv)
+INSN_LSX(vilvh_h,  vvv)
+INSN_LSX(vilvh_w,  vvv)
+INSN_LSX(vilvh_d,  vvv)
+
+INSN_LSX(vshuf_b,  vvvv)
+INSN_LSX(vshuf_h,  vvv)
+INSN_LSX(vshuf_w,  vvv)
+INSN_LSX(vshuf_d,  vvv)
+INSN_LSX(vshuf4i_b,vv_i)
+INSN_LSX(vshuf4i_h,vv_i)
+INSN_LSX(vshuf4i_w,vv_i)
+INSN_LSX(vshuf4i_d,vv_i)
+
+INSN_LSX(vpermi_w, vv_i)
+
+INSN_LSX(vextrins_d,   vv_i)
+INSN_LSX(vextrins_w,   vv_i)
+INSN_LSX(vextrins_h,   vv_i)
+INSN_LSX(vextrins_b,   vv_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index bf03a16afd..86c7eeeae1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -686,3 +686,28 @@ DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
 DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vilvl_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvl_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vilvh_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(vshuf_b, void, env, i32, i32, i32, i32)
+DEF_HELPER_4(vshuf_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vshuf4i_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vpermi_w, void, env, i32, i32, i32)
+
+DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 66cb67a19c..0ea7c65445 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3485,3 +3485,28 @@ TRANS(vpickod_b, gen_vvv, gen_helper_vpickod_b)
 TRANS(vpickod_h, gen_vvv, gen_helper_vpickod_h)
 TRANS(vpickod_w, gen_vvv, gen_helper_vpickod_w)
 TRANS(vpickod_d, gen_vvv, gen_helper_vpickod_d)
+
+TRANS(vilvl_b, gen_vvv, gen_helper_vilvl_b)
+TRANS(vilvl_h, gen_vvv, gen_helper_vilvl_h)
+TRANS(vilvl_w, gen_vvv, gen_helper_vilvl_w)
+TRANS(vilvl_d, gen_vvv, gen_helper_vilvl_d)
+TRANS(vilvh_b, gen_vvv, gen_helper_vilvh_b)
+TRANS(vilvh_h, gen_vvv, gen_helper_vilvh_h)
+TRANS(vilvh_w, gen_vvv, gen_helper_vilvh_w)
+TRANS(vilvh_d, gen_vvv, gen_helper_vilvh_d)
+
+TRANS(vshuf_b, gen_vvvv, gen_helper_vshuf_b)
+TRANS(vshuf_h, gen_vvv, gen_helper_vshuf_h)
+TRANS(vshuf_w, gen_vvv, gen_helper_vshuf_w)
+TRANS(vshuf_d, gen_vvv, gen_helper_vshuf_d)
+TRANS(vshuf4i_b, gen_vv_i, gen_helper_vshuf4i_b)
+TRANS(vshuf4i_h, gen_vv_i, gen_helper_vshuf4i_h)
+TRANS(vshuf4i_w, gen_vv_i, gen_helper_vshuf4i_w)
+TRANS(vshuf4i_d, gen_vv_i, gen_helper_vshuf4i_d)
+
+TRANS(vpermi_w, gen_vv_i, gen_helper_vpermi_w)
+
+TRANS(vextrins_b, gen_vv_i, gen_helper_vextrins_b)
+TRANS(vextrins_h, gen_vv_i, gen_helper_vextrins_h)
+TRANS(vextrins_w, gen_vv_i, gen_helper_vextrins_w)
+TRANS(vextrins_d, gen_vv_i, gen_helper_vextrins_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ab9e9e422f..0263bce28e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1231,3 +1231,28 @@ vpickod_b0111 00010010 0 . . .   @vvv
 vpickod_h0111 00010010 1 . . .@vvv
 vpickod_w0111 00010010 00010 . . .@vvv
 vpickod_d0111 00010010 00011 . . .@vvv
+
+vilvl_b  0111 00010001 10100 . . .@vvv
+vilvl_h  0111 00010001 10101 . 

[RFC PATCH v2 37/44] target/loongarch: Implement vfcmp

2023-03-27 Thread Song Gao
This patch includes:
- VFCMP.cond.{S/D}.
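
In the disassembler table in this patch, bit 0 of fcond separates the quiet (c) and signaling (s) forms, and the remaining bits select the condition. The sketch below shows that decomposition as a small table lookup. It is an alternative illustration, not the code in disas.c, and `vfcmp_mnemonic` and `fcond_names` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Condition names indexed by fcond >> 1; gaps are unallocated encodings. */
static const char *const fcond_names[] = {
    [0x0] = "af",  [0x1] = "lt",  [0x2] = "eq",  [0x3] = "le",
    [0x4] = "un",  [0x5] = "ult", [0x6] = "ueq", [0x7] = "ule",
    [0x8] = "ne",  [0xA] = "or",  [0xC] = "une",
};

/*
 * Format "vfcmp_<c|s><cond>_<suffix>" into buf.
 * Returns the snprintf() result, or -1 for an unallocated fcond value.
 */
static int vfcmp_mnemonic(char *buf, size_t len, uint32_t fcond,
                          const char *suffix)
{
    uint32_t base = fcond >> 1;
    size_t count = sizeof(fcond_names) / sizeof(fcond_names[0]);

    if (base >= count || fcond_names[base] == NULL) {
        return -1;
    }
    /* Bit 0 set selects the signaling form, clear selects the quiet form. */
    return snprintf(buf, len, "vfcmp_%c%s_%s",
                    (fcond & 1) ? 's' : 'c', fcond_names[base], suffix);
}
```

The same low bit is what the translator tests with `a->fcond & 1` when choosing between the `_s_` and `_c_` helpers.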

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 94 +
 target/loongarch/helper.h   |  5 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 32 +++
 target/loongarch/insns.decode   |  5 ++
 target/loongarch/lsx_helper.c   | 51 +++
 5 files changed, 187 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e589b23f4c..64db01d2f9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1447,3 +1447,97 @@ INSN_LSX(vslti_bu, vv_i)
 INSN_LSX(vslti_hu, vv_i)
 INSN_LSX(vslti_wu, vv_i)
 INSN_LSX(vslti_du, vv_i)
+
+#define output_vfcmp(C, PREFIX, SUFFIX) \
+{   \
+(C)->info->fprintf_func((C)->info->stream, "%08x   %s%s\t%d, f%d, f%d", \
+(C)->insn, PREFIX, SUFFIX, a->vd,   \
+a->vj, a->vk);  \
+}
+
+static bool output_vvv_fcond(DisasContext *ctx, arg_vvv_fcond * a,
+ const char *suffix)
+{
+bool ret = true;
+switch (a->fcond) {
+case 0x0:
+output_vfcmp(ctx, "vfcmp_caf_", suffix);
+break;
+case 0x1:
+output_vfcmp(ctx, "vfcmp_saf_", suffix);
+break;
+case 0x2:
+output_vfcmp(ctx, "vfcmp_clt_", suffix);
+break;
+case 0x3:
+output_vfcmp(ctx, "vfcmp_slt_", suffix);
+break;
+case 0x4:
+output_vfcmp(ctx, "vfcmp_ceq_", suffix);
+break;
+case 0x5:
+output_vfcmp(ctx, "vfcmp_seq_", suffix);
+break;
+case 0x6:
+output_vfcmp(ctx, "vfcmp_cle_", suffix);
+break;
+case 0x7:
+output_vfcmp(ctx, "vfcmp_sle_", suffix);
+break;
+case 0x8:
+output_vfcmp(ctx, "vfcmp_cun_", suffix);
+break;
+case 0x9:
+output_vfcmp(ctx, "vfcmp_sun_", suffix);
+break;
+case 0xA:
+output_vfcmp(ctx, "vfcmp_cult_", suffix);
+break;
+case 0xB:
+output_vfcmp(ctx, "vfcmp_sult_", suffix);
+break;
+case 0xC:
+output_vfcmp(ctx, "vfcmp_cueq_", suffix);
+break;
+case 0xD:
+output_vfcmp(ctx, "vfcmp_sueq_", suffix);
+break;
+case 0xE:
+output_vfcmp(ctx, "vfcmp_cule_", suffix);
+break;
+case 0xF:
+output_vfcmp(ctx, "vfcmp_sule_", suffix);
+break;
+case 0x10:
+output_vfcmp(ctx, "vfcmp_cne_", suffix);
+break;
+case 0x11:
+output_vfcmp(ctx, "vfcmp_sne_", suffix);
+break;
+case 0x14:
+output_vfcmp(ctx, "vfcmp_cor_", suffix);
+break;
+case 0x15:
+output_vfcmp(ctx, "vfcmp_sor_", suffix);
+break;
+case 0x18:
+output_vfcmp(ctx, "vfcmp_cune_", suffix);
+break;
+case 0x19:
+output_vfcmp(ctx, "vfcmp_sune_", suffix);
+break;
+default:
+ret = false;
+}
+return ret;
+}
+
+#define LSX_FCMP_INSN(suffix)\
+static bool trans_vfcmp_cond_##suffix(DisasContext *ctx, \
+ arg_vvv_fcond * a)  \
+{\
+return output_vvv_fcond(ctx, a, #suffix);\
+}
+
+LSX_FCMP_INSN(s)
+LSX_FCMP_INSN(d)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 25ea9b633d..ef0b67349d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -650,3 +650,8 @@ DEF_HELPER_FLAGS_4(vslti_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(vslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7368731424..593b8b481d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3131,3 +3131,35 @@ TRANS(vslti_bu, do_vslti_u, MO_8)
 TRANS(vslti_hu, do_vslti_u, MO_16)
 TRANS(vslti_wu, do_vslti_u, MO_32)
 TRANS(vslti_du, do_vslti_u, MO_64)
+
+static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
+{
+uint32_t flags;
+void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+TCGv_i32 vd = tcg_constant_i32(a->vd);
+TCGv_i32 vj = tcg_constant_i32(a->vj);
+TCGv_i32 vk = tcg_constant_i32(a->vk);
+
+CHECK_SXE;
+
+fn = (a->fcond & 1 ? gen_helper_vfcmp_s_s : gen_helper_vfcmp_c_s);
+   

[RFC PATCH v2 02/44] target/loongarch: CPUCFG support LSX

2023-03-27 Thread Song Gao
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 18b41221a6..2263bd4fdd 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -386,6 +386,7 @@ static void loongarch_la464_initfn(Object *obj)
 data = FIELD_DP32(data, CPUCFG2, FP_SP, 1);
 data = FIELD_DP32(data, CPUCFG2, FP_DP, 1);
 data = FIELD_DP32(data, CPUCFG2, FP_VER, 1);
+data = FIELD_DP32(data, CPUCFG2, LSX, 1);
 data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
 data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
 data = FIELD_DP32(data, CPUCFG2, LAM, 1);
-- 
2.31.1




[RFC PATCH v2 17/44] target/loongarch: Implement vdiv/vmod

2023-03-27 Thread Song Gao
This patch includes:
- VDIV.{B/H/W/D}[U];
- VMOD.{B/H/W/D}[U].
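
Element-wise division in a C helper has two host-side hazards: division by zero and INT_MIN / -1, both undefined behaviour. The sketch below shows one way to guard a signed 64-bit lane. The function names are hypothetical, and the values returned in the guarded cases (0 for a zero divisor, the dividend for the overflowing quotient, 0 for the overflowing remainder) are assumptions for illustration, not a statement of what the architecture specifies.

```c
#include <stdint.h>

/* Hypothetical guarded signed division for one 64-bit lane. */
static inline int64_t div_d(int64_t n, int64_t m)
{
    if (m == 0) {
        return 0;                     /* assumed placeholder result */
    }
    if (n == INT64_MIN && m == -1) {
        return n;                     /* avoid the overflowing quotient */
    }
    return n / m;
}

/* Hypothetical guarded signed remainder for one 64-bit lane. */
static inline int64_t mod_d(int64_t n, int64_t m)
{
    if (m == 0) {
        return 0;                     /* assumed placeholder result */
    }
    if (n == INT64_MIN && m == -1) {
        return 0;                     /* the mathematical remainder is 0 */
    }
    return n % m;
}
```

The unsigned forms only need the zero-divisor check.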

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 17 +
 target/loongarch/helper.h   | 17 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 17 +
 target/loongarch/insns.decode   | 17 +
 target/loongarch/lsx_helper.c   | 38 +
 5 files changed, 106 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 980e6e6375..6e4f676a42 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1044,3 +1044,20 @@ INSN_LSX(vmaddwod_h_bu_b,  vvv)
 INSN_LSX(vmaddwod_w_hu_h,  vvv)
 INSN_LSX(vmaddwod_d_wu_w,  vvv)
 INSN_LSX(vmaddwod_q_du_d,  vvv)
+
+INSN_LSX(vdiv_b,   vvv)
+INSN_LSX(vdiv_h,   vvv)
+INSN_LSX(vdiv_w,   vvv)
+INSN_LSX(vdiv_d,   vvv)
+INSN_LSX(vdiv_bu,  vvv)
+INSN_LSX(vdiv_hu,  vvv)
+INSN_LSX(vdiv_wu,  vvv)
+INSN_LSX(vdiv_du,  vvv)
+INSN_LSX(vmod_b,   vvv)
+INSN_LSX(vmod_h,   vvv)
+INSN_LSX(vmod_w,   vvv)
+INSN_LSX(vmod_d,   vvv)
+INSN_LSX(vmod_bu,  vvv)
+INSN_LSX(vmod_hu,  vvv)
+INSN_LSX(vmod_wu,  vvv)
+INSN_LSX(vmod_du,  vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6bb273fefe..e46f12cb65 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -318,3 +318,20 @@ DEF_HELPER_FLAGS_4(vmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vmaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_4(vdiv_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vdiv_du, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 29c7aca8f9..46a18da6dd 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2365,3 +2365,20 @@ TRANS(vmaddwod_h_bu_b, gvec_vvv, MO_8, do_vmaddwod_u_s)
 TRANS(vmaddwod_w_hu_h, gvec_vvv, MO_16, do_vmaddwod_u_s)
 TRANS(vmaddwod_d_wu_w, gvec_vvv, MO_32, do_vmaddwod_u_s)
 TRANS(vmaddwod_q_du_d, gvec_vvv, MO_64, do_vmaddwod_u_s)
+
+TRANS(vdiv_b, gen_vvv, gen_helper_vdiv_b)
+TRANS(vdiv_h, gen_vvv, gen_helper_vdiv_h)
+TRANS(vdiv_w, gen_vvv, gen_helper_vdiv_w)
+TRANS(vdiv_d, gen_vvv, gen_helper_vdiv_d)
+TRANS(vdiv_bu, gen_vvv, gen_helper_vdiv_bu)
+TRANS(vdiv_hu, gen_vvv, gen_helper_vdiv_hu)
+TRANS(vdiv_wu, gen_vvv, gen_helper_vdiv_wu)
+TRANS(vdiv_du, gen_vvv, gen_helper_vdiv_du)
+TRANS(vmod_b, gen_vvv, gen_helper_vmod_b)
+TRANS(vmod_h, gen_vvv, gen_helper_vmod_h)
+TRANS(vmod_w, gen_vvv, gen_helper_vmod_w)
+TRANS(vmod_d, gen_vvv, gen_helper_vmod_d)
+TRANS(vmod_bu, gen_vvv, gen_helper_vmod_bu)
+TRANS(vmod_hu, gen_vvv, gen_helper_vmod_hu)
+TRANS(vmod_wu, gen_vvv, gen_helper_vmod_wu)
+TRANS(vmod_du, gen_vvv, gen_helper_vmod_du)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index df23d4ee1e..67d016edb7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -740,3 +740,20 @@ vmaddwod_h_bu_b  0111 1011 11100 . . .@vvv
 vmaddwod_w_hu_h  0111 1011 11101 . . .@vvv
 vmaddwod_d_wu_w  0111 1011 0 . . .@vvv
 vmaddwod_q_du_d  0111 1011 1 . . .@vvv
+
+vdiv_b   0111 1110 0 . . .@vvv
+vdiv_h   0111 1110 1 . . .@vvv
+vdiv_w   0111 1110 00010 . . .@vvv
+vdiv_d   0111 1110 00011 . . .@vvv
+vdiv_bu  0111 1110 01000 . . .@vvv
+vdiv_hu  0111 1110 01001 . . .@vvv
+vdiv_wu  0111 1110 01010 . . .@vvv
+vdiv_du  0111 1110 01011 . . .@vvv
+vmod_b   0111 1110 00100 . . .@vvv
+vmod_h   0111 1110 00101 . . .@vvv
+vmod_w   0111 1110 00110 . . .@vvv
+vmod_d   0111 1110 00111 . . .

[RFC PATCH v2 06/44] target/loongarch: Implement vaddi/vsubi

2023-03-27 Thread Song Gao
This patch includes:
- VADDI.{B/H/W/D}U;
- VSUBI.{B/H/W/D}U.
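
VSUBI is implemented in this patch by passing the negated immediate to tcg_gen_gvec_addi. The scalar sketch below shows why that works for a byte lane: element arithmetic wraps modulo 2^8, so adding -imm gives the same bit pattern as subtracting imm. The function names are hypothetical.

```c
#include <stdint.h>

/* One byte lane of an add-immediate; the result wraps modulo 2^8. */
static inline uint8_t addi_bu(uint8_t vj, int64_t imm)
{
    return (uint8_t)(vj + imm);
}

/* Subtract-immediate expressed as add-immediate of the negated value. */
static inline uint8_t subi_bu(uint8_t vj, int64_t imm)
{
    return addi_bu(vj, -imm);
}
```

The immediate is an unsigned 5-bit field (0 to 31), so negating it cannot overflow.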

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 14 
 target/loongarch/insn_trans/trans_lsx.c.inc | 37 +
 target/loongarch/insns.decode   | 11 ++
 3 files changed, 62 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index a5948d7847..c1960610c2 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -797,6 +797,11 @@ static void output_vvv(DisasContext *ctx, arg_vvv *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, v%d, v%d", a->vd, a->vj, a->vk);
 }
 
+static void output_vv_i(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, v%d, 0x%x", a->vd, a->vj, a->imm);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -807,3 +812,12 @@ INSN_LSX(vsub_h,   vvv)
 INSN_LSX(vsub_w,   vvv)
 INSN_LSX(vsub_d,   vvv)
 INSN_LSX(vsub_q,   vvv)
+
+INSN_LSX(vaddi_bu, vv_i)
+INSN_LSX(vaddi_hu, vv_i)
+INSN_LSX(vaddi_wu, vv_i)
+INSN_LSX(vaddi_du, vv_i)
+INSN_LSX(vsubi_bu, vv_i)
+INSN_LSX(vsubi_hu, vv_i)
+INSN_LSX(vsubi_wu, vv_i)
+INSN_LSX(vsubi_du, vv_i)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2fe0e4ace5..99a5c2474d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -44,6 +44,34 @@ static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
 return true;
 }
 
+static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
+  void (*func)(unsigned, uint32_t, uint32_t,
+   int64_t, uint32_t, uint32_t))
+{
+uint32_t vd_ofs, vj_ofs;
+
+CHECK_SXE;
+
+vd_ofs = vreg_full_offset(a->vd);
+vj_ofs = vreg_full_offset(a->vj);
+
+func(mop, vd_ofs, vj_ofs, a->imm , 16, 16);
+return true;
+}
+
+static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
+{
+uint32_t vd_ofs, vj_ofs;
+
+CHECK_SXE;
+
+vd_ofs = vreg_full_offset(a->vd);
+vj_ofs = vreg_full_offset(a->vj);
+
+tcg_gen_gvec_addi(mop, vd_ofs, vj_ofs, -(a->imm), 16, 16);
+return true;
+}
+
 TRANS(vadd_b, gvec_vvv, MO_8, tcg_gen_gvec_add)
 TRANS(vadd_h, gvec_vvv, MO_16, tcg_gen_gvec_add)
 TRANS(vadd_w, gvec_vvv, MO_32, tcg_gen_gvec_add)
@@ -54,3 +82,12 @@ TRANS(vsub_h, gvec_vvv, MO_16, tcg_gen_gvec_sub)
 TRANS(vsub_w, gvec_vvv, MO_32, tcg_gen_gvec_sub)
 TRANS(vsub_d, gvec_vvv, MO_64, tcg_gen_gvec_sub)
 TRANS(vsub_q, gen_vvv, gen_helper_vsub_q)
+
+TRANS(vaddi_bu, gvec_vv_i, MO_8, tcg_gen_gvec_addi)
+TRANS(vaddi_hu, gvec_vv_i, MO_16, tcg_gen_gvec_addi)
+TRANS(vaddi_wu, gvec_vv_i, MO_32, tcg_gen_gvec_addi)
+TRANS(vaddi_du, gvec_vv_i, MO_64, tcg_gen_gvec_addi)
+TRANS(vsubi_bu, gvec_subi, MO_8)
+TRANS(vsubi_hu, gvec_subi, MO_16)
+TRANS(vsubi_wu, gvec_subi, MO_32)
+TRANS(vsubi_du, gvec_subi, MO_64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d18db68d51..2a98c14518 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -491,11 +491,13 @@ dbcl  0010 10101 ...  @i15
 #
 
   vd vj vk
+_i vd vj imm
 
 #
 # LSX Formats
 #
 @vvv     . vk:5 vj:5 vd:5
+@vv_ui5     . imm:5 vj:5 vd:5_i
 
 vadd_b   0111  10100 . . .@vvv
 vadd_h   0111  10101 . . .@vvv
@@ -507,3 +509,12 @@ vsub_h   0111  11001 . . .@vvv
 vsub_w   0111  11010 . . .@vvv
 vsub_d   0111  11011 . . .@vvv
 vsub_q   0111 00010010 11011 . . .@vvv
+
+vaddi_bu 0111 00101000 10100 . . .@vv_ui5
+vaddi_hu 0111 00101000 10101 . . .@vv_ui5
+vaddi_wu 0111 00101000 10110 . . .@vv_ui5
+vaddi_du 0111 00101000 10111 . . .@vv_ui5
+vsubi_bu 0111 00101000 11000 . . .@vv_ui5
+vsubi_hu 0111 00101000 11001 . . .@vv_ui5
+vsubi_wu 0111 00101000 11010 . . .@vv_ui5
+vsubi_du 0111 00101000 11011 . . .@vv_ui5
-- 
2.31.1




[RFC PATCH v2 07/44] target/loongarch: Implement vneg

2023-03-27 Thread Song Gao
This patch includes:
- VNEG.{B/H/W/D}.
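
For reference, a scalar picture of what the byte form computes per lane: two's-complement negation with wrap-around, so 0x80 maps to itself. The sketch works on raw byte patterns to stay within defined C behaviour, and `neg_b` is a hypothetical name, not a function from this patch.

```c
#include <stddef.h>
#include <stdint.h>

enum { LSX_BYTES = 16 };

/* Negate each byte lane modulo 2^8 (two's complement). */
static void neg_b(uint8_t vd[LSX_BYTES], const uint8_t vj[LSX_BYTES])
{
    for (size_t i = 0; i < LSX_BYTES; i++) {
        vd[i] = (uint8_t)(0u - vj[i]);
    }
}
```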

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c| 10 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 20 
 target/loongarch/insns.decode   |  7 +++
 3 files changed, 37 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c1960610c2..5eabb8c47a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -802,6 +802,11 @@ static void output_vv_i(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
 output(ctx, mnemonic, "v%d, v%d, 0x%x", a->vd, a->vj, a->imm);
 }
 
+static void output_vv(DisasContext *ctx, arg_vv *a, const char *mnemonic)
+{
+output(ctx, mnemonic, "v%d, v%d", a->vd, a->vj);
+}
+
 INSN_LSX(vadd_b,   vvv)
 INSN_LSX(vadd_h,   vvv)
 INSN_LSX(vadd_w,   vvv)
@@ -821,3 +826,8 @@ INSN_LSX(vsubi_bu, vv_i)
 INSN_LSX(vsubi_hu, vv_i)
 INSN_LSX(vsubi_wu, vv_i)
 INSN_LSX(vsubi_du, vv_i)
+
+INSN_LSX(vneg_b,   vv)
+INSN_LSX(vneg_h,   vv)
+INSN_LSX(vneg_w,   vv)
+INSN_LSX(vneg_d,   vv)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 99a5c2474d..dc66e44a75 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -44,6 +44,21 @@ static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
 return true;
 }
 
+static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
+void (*func)(unsigned, uint32_t, uint32_t,
+ uint32_t, uint32_t))
+{
+uint32_t dofs, jofs;
+
+CHECK_SXE;
+
+dofs = vreg_full_offset(a->vd);
+jofs = vreg_full_offset(a->vj);
+
+func(mop, dofs, jofs, 16, 16);
+return true;
+}
+
 static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
   void (*func)(unsigned, uint32_t, uint32_t,
int64_t, uint32_t, uint32_t))
@@ -91,3 +106,8 @@ TRANS(vsubi_bu, gvec_subi, MO_8)
 TRANS(vsubi_hu, gvec_subi, MO_16)
 TRANS(vsubi_wu, gvec_subi, MO_32)
 TRANS(vsubi_du, gvec_subi, MO_64)
+
+TRANS(vneg_b, gvec_vv, MO_8, tcg_gen_gvec_neg)
+TRANS(vneg_h, gvec_vv, MO_16, tcg_gen_gvec_neg)
+TRANS(vneg_w, gvec_vv, MO_32, tcg_gen_gvec_neg)
+TRANS(vneg_d, gvec_vv, MO_64, tcg_gen_gvec_neg)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 2a98c14518..d90798be11 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -490,12 +490,14 @@ dbcl  0010 10101 ...  
@i15
 # LSX Argument sets
 #
 
+&vv           vd vj
 &vvv          vd vj vk
 &vv_i         vd vj imm
 
 #
 # LSX Formats
 #
+@vv     . . vj:5 vd:5
 @vvv     . vk:5 vj:5 vd:5
 @vv_ui5     . imm:5 vj:5 vd:5_i
 
@@ -518,3 +520,8 @@ vsubi_bu 0111 00101000 11000 . . .
@vv_ui5
 vsubi_hu 0111 00101000 11001 . . .@vv_ui5
 vsubi_wu 0111 00101000 11010 . . .@vv_ui5
 vsubi_du 0111 00101000 11011 . . .@vv_ui5
+
+vneg_b   0111 00101001 11000 01100 . .@vv
+vneg_h   0111 00101001 11000 01101 . .@vv
+vneg_w   0111 00101001 11000 01110 . .@vv
+vneg_d   0111 00101001 11000 0 . .@vv
-- 
2.31.1
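
For readers unfamiliar with the instruction, VNEG negates each element of a
128-bit vector register independently. The sketch below models that on a plain
16-byte buffer. It is not QEMU code: `vneg_model` and its `esz` parameter are
names invented here for illustration, and the model assumes elements are laid
out little-endian within the buffer.

```c
#include <stdint.h>

/*
 * Hypothetical model of VNEG.{B/H/W/D}: negate every element of a
 * 16-byte vector. esz is the element size in bytes (1, 2, 4 or 8).
 */
static void vneg_model(uint8_t vd[16], const uint8_t vj[16], unsigned esz)
{
    for (unsigned i = 0; i < 16; i += esz) {
        uint64_t x = 0;

        /* Assemble one element from its little-endian bytes. */
        for (unsigned k = 0; k < esz; k++) {
            x |= (uint64_t)vj[i + k] << (8 * k);
        }

        /* Two's complement negation; unsigned arithmetic wraps. */
        x = 0 - x;

        /* Store back only the low esz bytes of the result. */
        for (unsigned k = 0; k < esz; k++) {
            vd[i + k] = (uint8_t)(x >> (8 * k));
        }
    }
}
```

Note that the most negative value of each element width (0x80 for bytes)
negates to itself, as is usual for two's complement arithmetic.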




[RFC PATCH v2 11/44] target/loongarch: Implement vavg/vavgr

2023-03-27 Thread Song Gao
This patch includes:
- VAVG.{B/H/W/D}[U];
- VAVGR.{B/H/W/D}[U].

Signed-off-by: Song Gao 
---
 target/loongarch/disas.c|  17 ++
 target/loongarch/helper.h   |  18 ++
 target/loongarch/insn_trans/trans_lsx.c.inc | 197 
 target/loongarch/insns.decode   |  17 ++
 target/loongarch/lsx_helper.c   |  45 +
 5 files changed, 294 insertions(+)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8ee14916f3..e7592e7a34 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -908,3 +908,20 @@ INSN_LSX(vaddwod_h_bu_b,   vvv)
 INSN_LSX(vaddwod_w_hu_h,   vvv)
 INSN_LSX(vaddwod_d_wu_w,   vvv)
 INSN_LSX(vaddwod_q_du_d,   vvv)
+
+INSN_LSX(vavg_b,   vvv)
+INSN_LSX(vavg_h,   vvv)
+INSN_LSX(vavg_w,   vvv)
+INSN_LSX(vavg_d,   vvv)
+INSN_LSX(vavg_bu,  vvv)
+INSN_LSX(vavg_hu,  vvv)
+INSN_LSX(vavg_wu,  vvv)
+INSN_LSX(vavg_du,  vvv)
+INSN_LSX(vavgr_b,  vvv)
+INSN_LSX(vavgr_h,  vvv)
+INSN_LSX(vavgr_w,  vvv)
+INSN_LSX(vavgr_d,  vvv)
+INSN_LSX(vavgr_bu, vvv)
+INSN_LSX(vavgr_hu, vvv)
+INSN_LSX(vavgr_wu, vvv)
+INSN_LSX(vavgr_du, vvv)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 566d9b6293..021fe3cd60 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -196,3 +196,21 @@ DEF_HELPER_FLAGS_4(vaddwod_h_bu_b, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(vaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vavg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavg_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vavgr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vavgr_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc 
b/target/loongarch/insn_trans/trans_lsx.c.inc
index 213a775490..512fe947f6 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -977,3 +977,200 @@ TRANS(vaddwod_h_bu_b, gvec_vvv, MO_8, do_vaddwod_u_s)
 TRANS(vaddwod_w_hu_h, gvec_vvv, MO_16, do_vaddwod_u_s)
 TRANS(vaddwod_d_wu_w, gvec_vvv, MO_32, do_vaddwod_u_s)
 TRANS(vaddwod_q_du_d, gvec_vvv, MO_64, do_vaddwod_u_s)
+
+static void do_vavg(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
+void (*gen_shr_vec)(unsigned, TCGv_vec,
+TCGv_vec, int64_t),
+void (*gen_round_vec)(unsigned, TCGv_vec,
+  TCGv_vec, TCGv_vec))
+{
+TCGv_vec tmp = tcg_temp_new_vec_matching(t);
+gen_round_vec(vece, tmp, a, b);
+tcg_gen_and_vec(vece, tmp, tmp, tcg_constant_vec_matching(t, vece, 1));
+gen_shr_vec(vece, a, a, 1);
+gen_shr_vec(vece, b, b, 1);
+tcg_gen_add_vec(vece, t, a, b);
+tcg_gen_add_vec(vece, t, t, tmp);
+}
+
+static void gen_vavg_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+do_vavg(vece, t, a, b, tcg_gen_sari_vec, tcg_gen_and_vec);
+}
+
+static void gen_vavg_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+do_vavg(vece, t, a, b, tcg_gen_shri_vec, tcg_gen_and_vec);
+}
+
+static void gen_vavgr_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+do_vavg(vece, t, a, b, tcg_gen_sari_vec, tcg_gen_or_vec);
+}
+
+static void gen_vavgr_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+do_vavg(vece, t, a, b, tcg_gen_shri_vec, tcg_gen_or_vec);
+}
+
+static void do_vavg_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
+  uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_sari_vec, INDEX_op_add_vec, 0
+};
+static const GVecGen3 op[4] = {
+{
+.fniv = gen_vavg_s,
+.fno = 
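
The arithmetic in do_vavg above can be checked with a scalar sketch. The idea
is that the average is built from the two halved operands plus a one-bit
correction, so the result never needs more bits than one element: (a & b) & 1
gives the truncating average, (a | b) & 1 the rounding one. The function names
below are invented for illustration, and the signed variants assume that `>>`
on a negative value is an arithmetic shift, which C leaves
implementation-defined.

```c
#include <stdint.h>

/* Truncating average, floor((a + b) / 2), for signed bytes. */
static int8_t avg_s8(int8_t a, int8_t b)
{
    return (int8_t)((a >> 1) + (b >> 1) + (a & b & 1));
}

/* Rounding average, floor((a + b + 1) / 2), for signed bytes. */
static int8_t avgr_s8(int8_t a, int8_t b)
{
    return (int8_t)((a >> 1) + (b >> 1) + ((a | b) & 1));
}

/* Same two formulas for unsigned bytes (logical shift). */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + (a & b & 1));
}

static uint8_t avgr_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 1) + (b >> 1) + ((a | b) & 1));
}
```

Writing a = 2 * (a >> 1) + (a & 1), and likewise for b, shows why this works:
the two low bits carry into the halved sum only when both are set (truncating)
or when at least one is set (rounding).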

Re: [PATCH 1/5] target/riscv: Fix effective address for pointer mask

2023-03-27 Thread liweiwei



On 2023/3/28 10:20, LIU Zhiwei wrote:


On 2023/3/27 18:00, Weiwei Li wrote:

Pointer masking operates on the effective address, and xl affects how the
effective address is generated, so the xl-related calculation should be done
before pointer masking.


Incorrect. It has been done.

When updating the pm_mask,  we have already considered the env->xl.

You can see it in riscv_cpu_update_mask

    if (env->xl == MXL_RV32) {
    env->cur_pmmask = mask & UINT32_MAX;
    env->cur_pmbase = base & UINT32_MAX;
    } else {
    env->cur_pmmask = mask;
    env->cur_pmbase = base;
    }

Yeah, I missed this part. Then we should ensure cur_pmmask/base is 
updated when xl changes.


If so, I'll drop this patch.

Regards,

Weiwei Li



Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/translate.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..bf0e2d318e 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -568,11 +568,15 @@ static TCGv get_address(DisasContext *ctx, int 
rs1, int imm)

  TCGv src1 = get_gpr(ctx, rs1, EXT_NONE);
    tcg_gen_addi_tl(addr, src1, imm);
+
+    if (get_xl(ctx) == MXL_RV32) {
+    tcg_gen_ext32u_tl(addr, addr);
+    }
+
  if (ctx->pm_mask_enabled) {
  tcg_gen_andc_tl(addr, addr, pm_mask);
-    } else if (get_xl(ctx) == MXL_RV32) {
-    tcg_gen_ext32u_tl(addr, addr);
  }


The else branch handles the case where only xl applies and pm_mask is not enabled.

Zhiwei


+
  if (ctx->pm_base_enabled) {
  tcg_gen_or_tl(addr, addr, pm_base);
  }
@@ -586,11 +590,15 @@ static TCGv get_address_indexed(DisasContext 
*ctx, int rs1, TCGv offs)

  TCGv src1 = get_gpr(ctx, rs1, EXT_NONE);
    tcg_gen_add_tl(addr, src1, offs);
+
+    if (get_xl(ctx) == MXL_RV32) {
+    tcg_gen_ext32u_tl(addr, addr);
+    }
+
  if (ctx->pm_mask_enabled) {
  tcg_gen_andc_tl(addr, addr, pm_mask);
-    } else if (get_xl(ctx) == MXL_RV32) {
-    tcg_gen_ext32u_tl(addr, addr);
  }
+
  if (ctx->pm_base_enabled) {
  tcg_gen_or_tl(addr, addr, pm_base);
  }





Re: [PATCH v6 07/25] target/riscv: Reduce overhead of MSTATUS_SUM change

2023-03-27 Thread LIU Zhiwei



On 2023/3/25 18:54, Richard Henderson wrote:

From: Fei Wu 

The kernel needs to access user mode memory, e.g. during syscalls. The window
is usually opened up for a very limited time through MSTATUS.SUM, and the
overhead is too much if tlb_flush() gets called for every SUM change.

This patch creates a separate MMU index for S+SUM, so that it's not
necessary to flush tlb anymore when SUM changes. This is similar to how
ARM handles Privileged Access Never (PAN).
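
As a rough illustration of the mapping this patch introduces, the sketch below
models the privilege-to-MMU-index selection outside of QEMU. The `cpu_model`
struct and `mmu_index_model` are stand-ins invented here; only the index values
and the order of the checks are taken from the patch.

```c
#include <stdbool.h>

/* Privilege levels and MMU indexes, values as listed in the patch. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };
enum { MMUIdx_U = 0, MMUIdx_S = 1, MMUIdx_S_SUM = 2, MMUIdx_M = 3 };

/* Hypothetical stand-in for the relevant parts of CPURISCVState. */
struct cpu_model {
    int priv;   /* current privilege level */
    bool mprv;  /* mstatus.MPRV */
    int mpp;    /* mstatus.MPP */
    bool sum;   /* mstatus.SUM */
};

static int mmu_index_model(const struct cpu_model *env, bool ifetch)
{
    int mode = env->priv;

    /* Instruction fetches always use the current privilege level. */
    if (ifetch) {
        return env->priv;
    }
    /* MPRV redirects M-mode data accesses to the privilege in MPP. */
    if (mode == PRV_M && env->mprv) {
        mode = env->mpp;
    }
    /* S-mode with SUM set gets its own index, so no TLB flush is needed. */
    if (mode == PRV_S && env->sum) {
        return MMUIdx_S_SUM;
    }
    return mode;
}
```

Because S and S+SUM translations live under different indexes, toggling SUM
only changes which index is selected and leaves existing TLB entries valid.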

The result of 'pipe 10' from unixbench improves from 223656 to 1705006. Many
other syscalls benefit a lot from this too.

Reviewed-by: Richard Henderson 
Signed-off-by: Fei Wu 
Message-Id: <20230324054154.414846-3-fei2...@intel.com>
---
  target/riscv/cpu.h  |  2 --
  target/riscv/internals.h| 14 ++
  target/riscv/cpu_helper.c   | 17 +++--
  target/riscv/csr.c  |  3 +--
  target/riscv/op_helper.c|  5 +++--
  target/riscv/insn_trans/trans_rvh.c.inc |  4 ++--
  6 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3e59dbb3fd..5e589db106 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -631,8 +631,6 @@ G_NORETURN void riscv_raise_exception(CPURISCVState *env,
  target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
  void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
  
-#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)

-
  #include "exec/cpu-all.h"
  
  FIELD(TB_FLAGS, MEM_IDX, 0, 3)

diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index 5620fbffb6..b55152a7dc 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -21,6 +21,20 @@
  
  #include "hw/registerfields.h"
  
+/*

+ * The current MMU Modes are:
+ *  - U 0b000
+ *  - S 0b001
+ *  - S+SUM 0b010
+ *  - M 0b011
+ *  - HLV/HLVX/HSV adds 0b100


Reviewed-by: LIU Zhiwei 

Zhiwei


+ */
+#define MMUIdx_U0
+#define MMUIdx_S1
+#define MMUIdx_S_SUM2
+#define MMUIdx_M3
+#define MMU_HYP_ACCESS_BIT  (1 << 2)
+
  /* share data between vector helpers and decode code */
  FIELD(VDATA, VM, 0, 1)
  FIELD(VDATA, LMUL, 1, 3)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 5753126c7a..052fdd2d9d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -21,6 +21,7 @@
  #include "qemu/log.h"
  #include "qemu/main-loop.h"
  #include "cpu.h"
+#include "internals.h"
  #include "pmu.h"
  #include "exec/exec-all.h"
  #include "instmap.h"
@@ -36,7 +37,19 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
  #ifdef CONFIG_USER_ONLY
  return 0;
  #else
-return env->priv;
+if (ifetch) {
+return env->priv;
+}
+
+/* All priv -> mmu_idx mapping are here */
+int mode = env->priv;
+if (mode == PRV_M && get_field(env->mstatus, MSTATUS_MPRV)) {
+mode = get_field(env->mstatus, MSTATUS_MPP);
+}
+if (mode == PRV_S && get_field(env->mstatus, MSTATUS_SUM)) {
+return MMUIdx_S_SUM;
+}
+return mode;
  #endif
  }
  
@@ -600,7 +613,7 @@ void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable)
  
  bool riscv_cpu_two_stage_lookup(int mmu_idx)

  {
-return mmu_idx & TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+return mmu_idx & MMU_HYP_ACCESS_BIT;
  }
  
  int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint64_t interrupts)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index abea7b749e..b79758a606 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1246,8 +1246,7 @@ static RISCVException write_mstatus(CPURISCVState *env, 
int csrno,
  RISCVMXL xl = riscv_cpu_mxl(env);
  
  /* flush tlb on mstatus fields that affect VM */

-if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
-MSTATUS_MPRV | MSTATUS_SUM)) {
+if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPV)) {
  tlb_flush(env_cpu(env));
  }
  mask = MSTATUS_SIE | MSTATUS_SPIE | MSTATUS_MIE | MSTATUS_MPIE |
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 84ee018f7d..962a061228 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -20,6 +20,7 @@
  
  #include "qemu/osdep.h"

  #include "cpu.h"
+#include "internals.h"
  #include "qemu/main-loop.h"
  #include "exec/exec-all.h"
  #include "exec/helper-proto.h"
@@ -428,14 +429,14 @@ void helper_hyp_gvma_tlb_flush(CPURISCVState *env)
  
  target_ulong helper_hyp_hlvx_hu(CPURISCVState *env, target_ulong address)

  {
-int mmu_idx = cpu_mmu_index(env, true) | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+int mmu_idx = cpu_mmu_index(env, true) | MMU_HYP_ACCESS_BIT;
  
  return cpu_lduw_mmuidx_ra(env, address, mmu_idx, GETPC());

  }
  
  target_ulong helper_hyp_hlvx_wu(CPURISCVState *env, target_ulong address)

  {
-int mmu_idx = cpu_mmu_index(env, true) | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+int mmu_idx = 

Re: [PATCH v6 06/25] target/riscv: Separate priv from mmu_idx

2023-03-27 Thread LIU Zhiwei



On 2023/3/25 18:54, Richard Henderson wrote:

From: Fei Wu 

Currently it's assumed that the 2 low bits of mmu_idx map to the privilege mode.
This assumption won't last, as we are about to add more mmu_idx values. Here a
separate priv field is added into TB_FLAGS.
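
To make the bit layout concrete, here is a minimal sketch of depositing and
extracting a two-bit PRIV field at bit 25 of a 32-bit flags word, alongside the
three-bit MEM_IDX field at bit 0. The helper names are invented for
illustration and are not QEMU's FIELD_DP32/FIELD_EX32 macros, although they are
meant to behave similarly for field lengths below 32.

```c
#include <stdint.h>

/* Field positions as used in the patch. */
#define MEM_IDX_SHIFT  0
#define MEM_IDX_LENGTH 3
#define PRIV_SHIFT     25
#define PRIV_LENGTH    2

/* Write val into the field [shift, shift + len) of flags; len < 32. */
static uint32_t field_deposit32(uint32_t flags, unsigned shift,
                                unsigned len, uint32_t val)
{
    uint32_t mask = ((1u << len) - 1) << shift;

    return (flags & ~mask) | ((val << shift) & mask);
}

/* Read the field [shift, shift + len) of flags; len < 32. */
static uint32_t field_extract32(uint32_t flags, unsigned shift, unsigned len)
{
    return (flags >> shift) & ((1u << len) - 1);
}
```

With separate fields, the translator can read the privilege level without
masking it out of the memory index.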

Reviewed-by: Richard Henderson 
Signed-off-by: Fei Wu 
Message-Id: <20230324054154.414846-2-fei2...@intel.com>
---
  target/riscv/cpu.h | 2 +-
  target/riscv/cpu_helper.c  | 4 +++-
  target/riscv/translate.c   | 2 ++
  target/riscv/insn_trans/trans_privileged.c.inc | 2 +-
  target/riscv/insn_trans/trans_xthead.c.inc | 7 +--
  5 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 86a82e25dc..3e59dbb3fd 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -631,7 +631,6 @@ G_NORETURN void riscv_raise_exception(CPURISCVState *env,
  target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
  void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
  
-#define TB_FLAGS_PRIV_MMU_MASK3

  #define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
  
  #include "exec/cpu-all.h"

@@ -658,6 +657,7 @@ FIELD(TB_FLAGS, ITRIGGER, 22, 1)
  /* Virtual mode enabled */
  FIELD(TB_FLAGS, VIRT_ENABLED, 23, 1)
  FIELD(TB_FLAGS, VSTART_EQ_ZERO, 24, 1)
+FIELD(TB_FLAGS, PRIV, 25, 2)
I don't prefer this, but it is acceptable, as the other patches will
explicitly encode the mem_idx in tb flags.

After that, this is necessary.
  
  #ifdef TARGET_RISCV32

  #define riscv_cpu_mxl(env)  ((void)(env), MXL_RV32)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 4f0999d50b..5753126c7a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -83,6 +83,8 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  fs = EXT_STATUS_DIRTY;
  vs = EXT_STATUS_DIRTY;
  #else
+flags = FIELD_DP32(flags, TB_FLAGS, PRIV, env->priv);
+
  flags |= cpu_mmu_index(env, 0);
  fs = get_field(env->mstatus, MSTATUS_FS);
  vs = get_field(env->mstatus, MSTATUS_VS);
@@ -764,7 +766,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
   * (riscv_cpu_do_interrupt) is correct */
  MemTxResult res;
  MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
-int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;
+int mode = env->priv;
  bool use_background = false;
  hwaddr ppn;
  RISCVCPU *cpu = env_archcpu(env);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index f8c077525c..abfc152553 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -67,6 +67,7 @@ typedef struct DisasContext {
  RISCVExtStatus mstatus_fs;
  RISCVExtStatus mstatus_vs;
  uint32_t mem_idx;
+uint32_t priv;
  /* Remember the rounding mode encoded in the previous fp instruction,
 which we have already installed into env->fp_status.  Or -1 for
 no previous fp instruction.  Note that we exit the TB when writing
@@ -1140,6 +1141,7 @@ static void riscv_tr_init_disas_context(DisasContextBase 
*dcbase, CPUState *cs)
  uint32_t tb_flags = ctx->base.tb->flags;
  
  ctx->pc_succ_insn = ctx->base.pc_first;

+ctx->priv = FIELD_EX32(tb_flags, TB_FLAGS, PRIV);
  ctx->mem_idx = FIELD_EX32(tb_flags, TB_FLAGS, MEM_IDX);
  ctx->mstatus_fs = FIELD_EX32(tb_flags, TB_FLAGS, FS);
  ctx->mstatus_vs = FIELD_EX32(tb_flags, TB_FLAGS, VS);
diff --git a/target/riscv/insn_trans/trans_privileged.c.inc 
b/target/riscv/insn_trans/trans_privileged.c.inc
index 59501b2780..9305b18299 100644
--- a/target/riscv/insn_trans/trans_privileged.c.inc
+++ b/target/riscv/insn_trans/trans_privileged.c.inc
@@ -52,7 +52,7 @@ static bool trans_ebreak(DisasContext *ctx, arg_ebreak *a)
   * that no exception will be raised when fetching them.
   */
  
-if (semihosting_enabled(ctx->mem_idx < PRV_S) &&

+if (semihosting_enabled(ctx->priv < PRV_S) &&
  (pre_addr & TARGET_PAGE_MASK) == (post_addr & TARGET_PAGE_MASK)) {
  pre    = opcode_at(&ctx->base, pre_addr);
  ebreak = opcode_at(&ctx->base, ebreak_addr);
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc 
b/target/riscv/insn_trans/trans_xthead.c.inc
index df504c3f2c..adfb53cb4c 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -265,12 +265,7 @@ static bool trans_th_tst(DisasContext *ctx, arg_th_tst *a)
  
  static inline int priv_level(DisasContext *ctx)

  {
-#ifdef CONFIG_USER_ONLY
-return PRV_U;
-#else
- /* Priv level is part of mem_idx. */
-return ctx->mem_idx & TB_FLAGS_PRIV_MMU_MASK;
-#endif
+return ctx->priv;
  }


Could you remove priv_level and use ctx->priv directly in this file?

Otherwise,

Reviewed-by: LIU Zhiwei 

Zhiwei

  
  /* Test if priv level is M, S, or U (cannot fail). */




Re: [PATCH v6 04/25] target/riscv: Remove mstatus_hs_{fs,vs} from tb_flags

2023-03-27 Thread LIU Zhiwei



On 2023/3/25 18:54, Richard Henderson wrote:

Merge with mstatus_{fs,vs}.  We might perform a redundant
assignment to one or the other field, but it's trivial
and saves 4 bits from TB_FLAGS.
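
The merge can be pictured with a small sketch. It relies on the extension
status values being ordered DISABLED < INITIAL < CLEAN < DIRTY, which is an
assumption about the enum stated here rather than shown in this patch, and
`merge_ext_status` is a name invented for illustration.

```c
/* Assumed ordering of the extension status values. */
enum ext_status {
    EXT_STATUS_DISABLED = 0,
    EXT_STATUS_INITIAL,
    EXT_STATUS_CLEAN,
    EXT_STATUS_DIRTY,
};

/*
 * Combine the current status with the HS-level status when
 * virtualization is enabled. Taking the minimum means the result is
 * DISABLED if either is disabled, and DIRTY only if both are dirty.
 */
static enum ext_status merge_ext_status(enum ext_status cur,
                                        enum ext_status hs,
                                        int virt_enabled)
{
    if (!virt_enabled) {
        return cur;
    }
    return cur < hs ? cur : hs;
}
```

A merged value below DIRTY tells the translator that at least one of the two
fields still needs to be marked dirty, which is why a write may redundantly set
a field that was already dirty.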

Signed-off-by: Richard Henderson 


Reviewed-by: LIU Zhiwei 

Zhiwei


---
  target/riscv/cpu.h| 16 +++-
  target/riscv/cpu_helper.c | 34 --
  target/riscv/translate.c  | 32 ++--
  3 files changed, 33 insertions(+), 49 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index f787145a21..d9e0eaaf9b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -646,19 +646,17 @@ FIELD(TB_FLAGS, VL_EQ_VLMAX, 13, 1)
  FIELD(TB_FLAGS, VILL, 14, 1)
  /* Is a Hypervisor instruction load/store allowed? */
  FIELD(TB_FLAGS, HLSX, 15, 1)
-FIELD(TB_FLAGS, MSTATUS_HS_FS, 16, 2)
-FIELD(TB_FLAGS, MSTATUS_HS_VS, 18, 2)
  /* The combination of MXL/SXL/UXL that applies to the current cpu mode. */
-FIELD(TB_FLAGS, XL, 20, 2)
+FIELD(TB_FLAGS, XL, 16, 2)
  /* If PointerMasking should be applied */
-FIELD(TB_FLAGS, PM_MASK_ENABLED, 22, 1)
-FIELD(TB_FLAGS, PM_BASE_ENABLED, 23, 1)
-FIELD(TB_FLAGS, VTA, 24, 1)
-FIELD(TB_FLAGS, VMA, 25, 1)
+FIELD(TB_FLAGS, PM_MASK_ENABLED, 18, 1)
+FIELD(TB_FLAGS, PM_BASE_ENABLED, 19, 1)
+FIELD(TB_FLAGS, VTA, 20, 1)
+FIELD(TB_FLAGS, VMA, 21, 1)
  /* Native debug itrigger */
-FIELD(TB_FLAGS, ITRIGGER, 26, 1)
+FIELD(TB_FLAGS, ITRIGGER, 22, 1)
  /* Virtual mode enabled */
-FIELD(TB_FLAGS, VIRT_ENABLED, 27, 1)
+FIELD(TB_FLAGS, VIRT_ENABLED, 23, 1)
  
  #ifdef TARGET_RISCV32

  #define riscv_cpu_mxl(env)  ((void)(env), MXL_RV32)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 1e7ee9aa30..4fdd6fe021 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -45,7 +45,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  {
  CPUState *cs = env_cpu(env);
  RISCVCPU *cpu = RISCV_CPU(cs);
-
+RISCVExtStatus fs, vs;
  uint32_t flags = 0;
  
  *pc = env->xl == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;

@@ -79,18 +79,12 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  }
  
  #ifdef CONFIG_USER_ONLY

-flags = FIELD_DP32(flags, TB_FLAGS, FS, EXT_STATUS_DIRTY);
-flags = FIELD_DP32(flags, TB_FLAGS, VS, EXT_STATUS_DIRTY);
+fs = EXT_STATUS_DIRTY;
+vs = EXT_STATUS_DIRTY;
  #else
  flags |= cpu_mmu_index(env, 0);
-if (riscv_cpu_fp_enabled(env)) {
-flags =  FIELD_DP32(flags, TB_FLAGS, FS,
-get_field(env->mstatus,  MSTATUS_FS));
-}
-if (riscv_cpu_vector_enabled(env)) {
-flags =  FIELD_DP32(flags, TB_FLAGS, VS,
-get_field(env->mstatus, MSTATUS_VS));
-}
+fs = get_field(env->mstatus, MSTATUS_FS);
+vs = get_field(env->mstatus, MSTATUS_VS);
  
  if (riscv_has_ext(env, RVH)) {

  if (env->priv == PRV_M ||
@@ -100,19 +94,23 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  flags = FIELD_DP32(flags, TB_FLAGS, HLSX, 1);
  }
  
-flags = FIELD_DP32(flags, TB_FLAGS, MSTATUS_HS_FS,

-   get_field(env->mstatus_hs, MSTATUS_FS));
-
-flags = FIELD_DP32(flags, TB_FLAGS, MSTATUS_HS_VS,
-   get_field(env->mstatus_hs, MSTATUS_VS));
-flags = FIELD_DP32(flags, TB_FLAGS, VIRT_ENABLED,
-   get_field(env->virt, VIRT_ONOFF));
+if (riscv_cpu_virt_enabled(env)) {
+flags = FIELD_DP32(flags, TB_FLAGS, VIRT_ENABLED, 1);
+/*
+ * Merge DISABLED and !DIRTY states using MIN.
+ * We will set both fields when dirtying.
+ */
+fs = MIN(fs, get_field(env->mstatus_hs, MSTATUS_FS));
+vs = MIN(vs, get_field(env->mstatus_hs, MSTATUS_VS));
+}
  }
  if (cpu->cfg.debug && !icount_enabled()) {
  flags = FIELD_DP32(flags, TB_FLAGS, ITRIGGER, env->itrigger_enabled);
  }
  #endif
  
+flags = FIELD_DP32(flags, TB_FLAGS, FS, fs);

+flags = FIELD_DP32(flags, TB_FLAGS, VS, vs);
  flags = FIELD_DP32(flags, TB_FLAGS, XL, env->xl);
  if (env->cur_pmmask < (env->xl == MXL_RV32 ? UINT32_MAX : UINT64_MAX)) {
  flags = FIELD_DP32(flags, TB_FLAGS, PM_MASK_ENABLED, 1);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index b897bf6006..74d0b9889d 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -66,8 +66,6 @@ typedef struct DisasContext {
  uint32_t opcode;
  RISCVExtStatus mstatus_fs;
  RISCVExtStatus mstatus_vs;
-RISCVExtStatus mstatus_hs_fs;
-RISCVExtStatus mstatus_hs_vs;
  uint32_t mem_idx;
  /* Remember the rounding mode encoded in the previous fp instruction,
 which we have already installed into env->fp_status.  Or -1 for
@@ -618,16 +616,12 @@ static void mark_fs_dirty(DisasContext *ctx)
  tcg_gen_ld_tl(tmp, 

Re: [PATCH 5/5] target/riscv: Add pointer mask support for instruction fetch

2023-03-27 Thread LIU Zhiwei



On 2023/3/28 9:55, liweiwei wrote:


On 2023/3/28 02:04, Richard Henderson wrote:

On 3/27/23 03:00, Weiwei Li wrote:
@@ -1248,6 +1265,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr 
address, int size,
  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d 
mmu_idx %d\n",

    __func__, address, access_type, mmu_idx);
  +    if (access_type == MMU_INST_FETCH) {
+    address = adjust_pc_address(env, address);
+    }


Why do you want to do this so late, as opposed to earlier in 
cpu_get_tb_cpu_state?


In this way, the pc for tb may be different from the reg pc. 

I don't understand.

Then the pc register will be wrong if sync from tb.


I think you should explain here why it is wrong.

Zhiwei



Regards,

Weiwei Li




r~




Re: [PATCH 4/5] target/riscv: take xl into consideration for vector address

2023-03-27 Thread LIU Zhiwei



On 2023/3/27 18:00, Weiwei Li wrote:

Sign-extend the vector address when xl = 32.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/vector_helper.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index a58d82af8c..07477663eb 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -172,6 +172,9 @@ static inline uint32_t vext_get_total_elems(CPURISCVState 
*env, uint32_t desc,
  
  static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)

  {
+if (env->xl == MXL_RV32) {
+addr = (int32_t)addr;
+}


Incorrect. Same reason as patch 1.

Zhiwei


  return (addr & ~env->cur_pmmask) | env->cur_pmbase;
  }
  




Re: [PATCH 3/5] target/riscv: Fix pointer mask transformation for vector address

2023-03-27 Thread LIU Zhiwei



On 2023/3/27 18:00, Weiwei Li wrote:

actual_address = (requested_address & ~mpmmask) | mpmbase.
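
A worked example of the formula, as a standalone sketch: bits set in the mask
are cleared from the requested address and then replaced by the corresponding
bits of the base. The function names and the sample mask and base values are
chosen here for illustration only; the second function reproduces the
pre-patch expression for contrast.

```c
#include <stdint.h>

/* Transformation as stated above: masked bits are taken from the base. */
static uint64_t adjust_addr_model(uint64_t addr, uint64_t pmmask,
                                  uint64_t pmbase)
{
    return (addr & ~pmmask) | pmbase;
}

/* Pre-patch expression: keeps the masked bits, drops the rest. */
static uint64_t adjust_addr_old(uint64_t addr, uint64_t pmmask,
                                uint64_t pmbase)
{
    return (addr & pmmask) | pmbase;
}
```

With a mask covering the top byte and a zero base, an address carrying a tag
in that byte is reduced to its untagged form, whereas the old expression would
keep only the tag.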

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/vector_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2423affe37..a58d82af8c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -172,7 +172,7 @@ static inline uint32_t vext_get_total_elems(CPURISCVState 
*env, uint32_t desc,
  
  static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)

  {
-return (addr & env->cur_pmmask) | env->cur_pmbase;
+return (addr & ~env->cur_pmmask) | env->cur_pmbase;


It's my typo. Thanks.

Reviewed-by: LIU Zhiwei 

Zhiwei


  }
  
  /*




Re: [PATCH 1/5] target/riscv: Fix effective address for pointer mask

2023-03-27 Thread LIU Zhiwei



On 2023/3/27 18:00, Weiwei Li wrote:

Pointer masking operates on the effective address, and xl affects how the
effective address is generated, so the xl-related calculation should be done
before pointer masking.


Incorrect. It has been done.

When updating the pm_mask,  we have already considered the env->xl.

You can see it in riscv_cpu_update_mask

    if (env->xl == MXL_RV32) {
    env->cur_pmmask = mask & UINT32_MAX;
    env->cur_pmbase = base & UINT32_MAX;
    } else {
    env->cur_pmmask = mask;
    env->cur_pmbase = base;
    }



Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/translate.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..bf0e2d318e 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -568,11 +568,15 @@ static TCGv get_address(DisasContext *ctx, int rs1, int 
imm)
  TCGv src1 = get_gpr(ctx, rs1, EXT_NONE);
  
  tcg_gen_addi_tl(addr, src1, imm);

+
+if (get_xl(ctx) == MXL_RV32) {
+tcg_gen_ext32u_tl(addr, addr);
+}
+
  if (ctx->pm_mask_enabled) {
  tcg_gen_andc_tl(addr, addr, pm_mask);
-} else if (get_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32u_tl(addr, addr);
  }


The else branch handles the case where only xl applies and pm_mask is not enabled.

Zhiwei


+
  if (ctx->pm_base_enabled) {
  tcg_gen_or_tl(addr, addr, pm_base);
  }
@@ -586,11 +590,15 @@ static TCGv get_address_indexed(DisasContext *ctx, int 
rs1, TCGv offs)
  TCGv src1 = get_gpr(ctx, rs1, EXT_NONE);
  
  tcg_gen_add_tl(addr, src1, offs);

+
+if (get_xl(ctx) == MXL_RV32) {
+tcg_gen_ext32u_tl(addr, addr);
+}
+
  if (ctx->pm_mask_enabled) {
  tcg_gen_andc_tl(addr, addr, pm_mask);
-} else if (get_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32u_tl(addr, addr);
  }
+
  if (ctx->pm_base_enabled) {
  tcg_gen_or_tl(addr, addr, pm_base);
  }




Re: [PATCH 2/5] target/riscv: Use sign-extended data address when xl = 32

2023-03-27 Thread LIU Zhiwei


On 2023/3/27 18:00, Weiwei Li wrote:

Currently, the pc is sign-extended (in gen_set_pc*) when xl = 32, and
data addresses should use the same memory address space as the pc when
xl = 32. So we should change their address calculation to use a sign-extended
address when xl = 32.


Incorrect. PC sign-extension is mandated by the spec, and it is visible to
gdb or the OS. But how the memory address is handled for xl = 32 is a QEMU
internal implementation detail.


We should not make it too complex.

Even for the PC, when fetching instructions, we only use the low 32 bits,
as you can see from cpu_get_tb_cpu_state.


*pc = cpu_get_xl(env) == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;
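
The difference between the two extensions, and the effect of the mask in the
line above, can be seen in a small standalone sketch. The function names are
invented for illustration and only mirror what tcg_gen_ext32u_tl and
tcg_gen_ext32s_tl compute on a 64-bit target; the narrowing conversion to
int32_t assumes the usual two's complement wraparound.

```c
#include <stdint.h>

/* Zero-extend the low 32 bits to 64 bits. */
static uint64_t ext32u_model(uint64_t x)
{
    return (uint32_t)x;
}

/* Sign-extend the low 32 bits to 64 bits. */
static uint64_t ext32s_model(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

/* What the fetch path uses when xl is RV32: only the low 32 bits. */
static uint64_t fetch_pc_rv32(uint64_t pc)
{
    return pc & UINT32_MAX;
}
```

For any value, masking the sign-extended form with UINT32_MAX gives back the
zero-extended form, so the two representations name the same 32-bit address.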

Zhiwei



Signed-off-by: Weiwei Li
Signed-off-by: Junqiang Wang
---
  target/riscv/translate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index bf0e2d318e..c48cb19389 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -570,7 +570,7 @@ static TCGv get_address(DisasContext *ctx, int rs1, int imm)
  tcg_gen_addi_tl(addr, src1, imm);
  
  if (get_xl(ctx) == MXL_RV32) {

-tcg_gen_ext32u_tl(addr, addr);
+tcg_gen_ext32s_tl(addr, addr);
  }
  
  if (ctx->pm_mask_enabled) {

@@ -592,7 +592,7 @@ static TCGv get_address_indexed(DisasContext *ctx, int rs1, 
TCGv offs)
  tcg_gen_add_tl(addr, src1, offs);
  
  if (get_xl(ctx) == MXL_RV32) {

-tcg_gen_ext32u_tl(addr, addr);
+tcg_gen_ext32s_tl(addr, addr);
  }
  
  if (ctx->pm_mask_enabled) {

Re: [PATCH 5/5] target/riscv: Add pointer mask support for instruction fetch

2023-03-27 Thread liweiwei



On 2023/3/28 02:04, Richard Henderson wrote:

On 3/27/23 03:00, Weiwei Li wrote:
@@ -1248,6 +1265,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr 
address, int size,
  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx 
%d\n",

    __func__, address, access_type, mmu_idx);
  +    if (access_type == MMU_INST_FETCH) {
+    address = adjust_pc_address(env, address);
+    }


Why do you want to do this so late, as opposed to earlier in 
cpu_get_tb_cpu_state?


Done that way, the pc for the tb may differ from the pc register. The pc
register would then be wrong if it is synced from the tb.


Regards,

Weiwei Li




r~





Re: [PATCH v6 13/25] target/riscv: Introduce mmuidx_priv

2023-03-27 Thread LIU Zhiwei



On 2023/3/28 9:33, LIU Zhiwei wrote:


On 2023/3/28 0:29, Richard Henderson wrote:

On 3/26/23 19:07, LIU Zhiwei wrote:

+static inline int mmuidx_priv(int mmu_idx)
+{
+    int ret = mmu_idx & 3;
+    if (ret == MMUIdx_S_SUM) {
+    ret = PRV_S;
+    }
+    return ret;
+}
+


Can we remove the PRIV from the tb flags after we have this function?


No, because this is the priv of the memory operation as modified by 
e.g. MPRV, not the true cpu priv.


For this implementation, we explicitly use the tb flags for the mmu index.
I think that is the reason why we have to maintain the redundant
privilege in tb flags.
It may be better to store only machine state in tb flags. Can we
just pass everything that we need, for example the priv and sum, and
then implicitly calculate ctx->mem_idx in disas_init_fn?

I remember that you gave a similar suggestion during review:

https://mail.gnu.org/archive/html/qemu-riscv/2023-03/msg00566.html

Best Regards,
Zhiwei

To make this comment clear, I paste a simple implementation here. But
it is just for discussion, not a normal patch for merging.


diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 623288e6f9..d4506be5be 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -632,12 +632,10 @@ G_NORETURN void 
riscv_raise_exception(CPURISCVState *env,

 target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
 void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);

-#define TB_FLAGS_PRIV_MMU_MASK    3
-#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
-
 #include "exec/cpu-all.h"

-FIELD(TB_FLAGS, MEM_IDX, 0, 3)
+FIELD(TB_FLAGS, PRIV, 0, 2)
+FIELD(TB_FLAGS, SUM, 2, 1)
 FIELD(TB_FLAGS, FS, 3, 2)
 /* Vector flags */
 FIELD(TB_FLAGS, VS, 5, 2)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f80d069884..b11f583643 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -30,6 +30,7 @@
 #include "sysemu/cpu-timers.h"
 #include "cpu_bits.h"
 #include "debug.h"
+#include "internals.h"

 int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 {
@@ -40,6 +41,20 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 #endif
 }

+#ifndef CONFIG_USER_ONLY
+static bool riscv_cpu_sum_enabled(CPURISCVState *env)
+{
+    int mode = env->priv;
+    if (mode == PRV_M && get_field(env->mstatus, MSTATUS_MPRV)) {
+    mode = get_field(env->mstatus, MSTATUS_MPP);
+    }
+    if (mode == PRV_S && get_field(env->mstatus, MSTATUS_SUM)) {
+    return true;
+    }
+    return false;
+}
+#endif
+
 void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *pflags)
 {
@@ -80,10 +95,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, 
target_ulong *pc,

 }

 #ifdef CONFIG_USER_ONLY
    flags =  FIELD_DP32(flags, TB_FLAGS, FS, EXT_STATUS_DIRTY);
    flags =  FIELD_DP32(flags, TB_FLAGS, VS, EXT_STATUS_DIRTY);
 #else
-    flags |= cpu_mmu_index(env, 0);
+    flags = FIELD_DP32(flags, TB_FLAGS, PRIV, env->priv);
+    flags = FIELD_DP32(flags, TB_FLAGS, SUM, riscv_cpu_sum_enabled(env));
 if (riscv_cpu_fp_enabled(env)) {
 flags =  FIELD_DP32(flags, TB_FLAGS, FS,
 get_field(env->mstatus, MSTATUS_FS));
@@ -600,7 +616,7 @@ void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable)


 bool riscv_cpu_two_stage_lookup(int mmu_idx)
 {
-    return mmu_idx & TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+    return mmu_idx & MMU_HYP_ACCESS_BIT;
 }

 int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint64_t interrupts)
@@ -766,7 +782,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,

  * (riscv_cpu_do_interrupt) is correct */
 MemTxResult res;
 MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
-    int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;
+    int mode = mmu_idx;
 bool use_background = false;
 hwaddr ppn;
 RISCVCPU *cpu = env_archcpu(env);
@@ -788,10 +804,8 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,

    instructions, HLV, HLVX, and HSV. */
 if (riscv_cpu_two_stage_lookup(mmu_idx)) {
 mode = get_field(env->hstatus, HSTATUS_SPVP);
-    } else if (mode == PRV_M && access_type != MMU_INST_FETCH) {
-    if (get_field(env->mstatus, MSTATUS_MPRV)) {
-    mode = get_field(env->mstatus, MSTATUS_MPP);
-    }
+    } else if (mmu_idx == MMUIdx_S_SUM) {
+    mode = PRV_S;
 }
 if (first_stage == false) {
@@ -847,7 +861,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,

 widened = 2;
 }
 /* status.SUM will be ignored if execute on background */
-    sum = get_field(env->mstatus, MSTATUS_SUM) || use_background || is_debug;
+    sum = (mmu_idx == MMUIdx_S_SUM) || use_background || is_debug;
 switch (vm) {
 case VM_1_10_SV32:
   levels = 2; ptidxbits = 10; ptesize = 4; break;
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index abea7b749e..dd5f774b2f 

Re: [PATCH v6 13/25] target/riscv: Introduce mmuidx_priv

2023-03-27 Thread LIU Zhiwei



On 2023/3/28 0:29, Richard Henderson wrote:

On 3/26/23 19:07, LIU Zhiwei wrote:

+static inline int mmuidx_priv(int mmu_idx)
+{
+    int ret = mmu_idx & 3;
+    if (ret == MMUIdx_S_SUM) {
+        ret = PRV_S;
+    }
+    return ret;
+}
+


Can we remove the PRIV from the tb flags after we have this function?


No, because this is the priv of the memory operation as modified by 
e.g. MPRV, not the true cpu priv.
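
The distinction drawn here, between the true CPU privilege and the privilege applied to a memory operation, can be illustrated with a minimal sketch. It assumes a reduced, hypothetical `struct cpu_state` (not QEMU's `CPURISCVState`) and a hypothetical helper name `data_access_priv`; it only models the mstatus.MPRV rule for loads and stores and ignores the hypervisor extension.

```c
#include <stdbool.h>

/* Privilege levels as encoded by the RISC-V privileged specification. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

/* Hypothetical, reduced stand-in for the CPU state used in this sketch. */
struct cpu_state {
    int priv;   /* true current privilege level */
    bool mprv;  /* mstatus.MPRV */
    int mpp;    /* mstatus.MPP */
};

/*
 * Privilege applied to loads and stores. Instruction fetches always use
 * the true privilege, so they must not go through this helper.
 */
static int data_access_priv(const struct cpu_state *s)
{
    if (s->priv == PRV_M && s->mprv) {
        return s->mpp;      /* M-mode with MPRV set borrows MPP */
    }
    return s->priv;
}
```

With `priv == PRV_M`, `mprv == true` and `mpp == PRV_S`, the helper returns `PRV_S` even though the CPU is still executing in M-mode, which is why a value derived from the mmu index cannot stand in for the true privilege.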


For this implementation, we explicitly use the tb flags for the mmu index.
I think that is why we have to maintain the redundant privilege in the tb
flags. It may be better to store only machine state in the tb flags. Can we
just pass everything that we need, for example, the priv and sum, and then
implicitly calculate the ctx->mem_idx in disas_init_fn?

I remember that you gave a similar suggestion during an earlier review:

https://mail.gnu.org/archive/html/qemu-riscv/2023-03/msg00566.html
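
A minimal sketch of the question above, under stated assumptions: the flag word carries only PRIV (bits 0-1) and SUM (bit 2), and the translator derives the memory index from them. The macro names, the helper names `tb_flags_pack` and `mem_idx_from_tb_flags`, and the index numbering (with S+SUM taking the otherwise reserved value 2) are hypothetical stand-ins, and MPRV and the hypervisor extension are deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Privilege levels as encoded by the RISC-V privileged specification. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

/* Assumed MMU index numbering for this sketch only. */
enum { MMUIDX_U = 0, MMUIDX_S = 1, MMUIDX_S_SUM = 2, MMUIDX_M = 3 };

/* Assumed layout: PRIV in bits 0-1, SUM in bit 2. */
#define TBF_PRIV_SHIFT 0
#define TBF_PRIV_MASK  0x3u
#define TBF_SUM_SHIFT  2
#define TBF_SUM_MASK   0x1u

/* Store raw machine state in the flag word, not a precomputed index. */
static uint32_t tb_flags_pack(unsigned priv, bool sum)
{
    return ((priv & TBF_PRIV_MASK) << TBF_PRIV_SHIFT) |
           (((uint32_t)sum & TBF_SUM_MASK) << TBF_SUM_SHIFT);
}

/* What a translator init hook could do: derive the index from raw state. */
static int mem_idx_from_tb_flags(uint32_t flags)
{
    unsigned priv = (flags >> TBF_PRIV_SHIFT) & TBF_PRIV_MASK;
    bool sum = ((flags >> TBF_SUM_SHIFT) & TBF_SUM_MASK) != 0;

    if (priv == PRV_S && sum) {
        return MMUIDX_S_SUM;
    }
    return (int)priv;
}
```

The point of the split is that the flag word then holds only architectural state, and the mapping from that state to a memory index lives in one place.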

Best Regards,
Zhiwei




r~




Re: [PATCH v6 00/25] target/riscv: MSTATUS_SUM + cleanups

2023-03-27 Thread Wu, Fei
On 3/28/2023 12:43 AM, Daniel Henrique Barboza wrote:
> 
> 
> On 3/25/23 07:54, Richard Henderson wrote:
>> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
>>
>>    * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
>>
>>    * Using cpu_mmu_index(env, true) is insufficient to implement
>>  HLVX properly.  While that chooses the correct mmu_idx, it
>>  does not perform the read with execute permission.
>>  I add a new tcg interface to perform a read-for-execute with
>>  an arbitrary mmu_idx.  This is still not 100% compliant, but
>>  it's closer.
>>
>>    * Handle mstatus.MPV in cpu_mmu_index.
>>    * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>>    * Cleanups for get_physical_address.
>>
>> While this passes check-avocado, I'm sure that's insufficient.
>> Please have a close look.
> 
> Tested fine in my end with some buildroot tests and 'stress-ng' in a 'virt'
> machine with Ubuntu.
> 
> Tested-by: Daniel Henrique Barboza 
> 
Great. I suppose the 'os' class in stress-ng should see a performance boost too.

BTW, is there a public URL where we can check QEMU regressions and
performance data?

Thanks,
Fei.

>>
>>
>> r~
>>
>>
>> Fei Wu (2):
>>    target/riscv: Separate priv from mmu_idx
>>    target/riscv: Reduce overhead of MSTATUS_SUM change
>>
>> LIU Zhiwei (4):
>>    target/riscv: Extract virt enabled state from tb flags
>>    target/riscv: Add a general status enum for extensions
>>    target/riscv: Encode the FS and VS on a normal way for tb flags
>>    target/riscv: Add a tb flags field for vstart
>>
>> Richard Henderson (19):
>>    target/riscv: Remove mstatus_hs_{fs,vs} from tb_flags
>>    accel/tcg: Add cpu_ld*_code_mmu
>>    target/riscv: Use cpu_ld*_code_mmu for HLVX
>>    target/riscv: Handle HLV, HSV via helpers
>>    target/riscv: Rename MMU_HYP_ACCESS_BIT to MMU_2STAGE_BIT
>>    target/riscv: Introduce mmuidx_sum
>>    target/riscv: Introduce mmuidx_priv
>>    target/riscv: Introduce mmuidx_2stage
>>    target/riscv: Move hstatus.spvp check to check_access_hlsv
>>    target/riscv: Set MMU_2STAGE_BIT in riscv_cpu_mmu_index
>>    target/riscv: Check SUM in the correct register
>>    target/riscv: Hoist second stage mode change to callers
>>    target/riscv: Hoist pbmte and hade out of the level loop
>>    target/riscv: Move leaf pte processing out of level loop
>>    target/riscv: Suppress pte update with is_debug
>>    target/riscv: Don't modify SUM with is_debug
>>    target/riscv: Merge checks for reserved pte flags
>>    target/riscv: Reorg access check in get_physical_address
>>    target/riscv: Reorg sum check in get_physical_address
>>
>>   include/exec/cpu_ldst.h   |   9 +
>>   target/riscv/cpu.h    |  47 ++-
>>   target/riscv/cpu_bits.h   |  12 +-
>>   target/riscv/helper.h |  12 +-
>>   target/riscv/internals.h  |  35 ++
>>   accel/tcg/cputlb.c    |  48 +++
>>   accel/tcg/user-exec.c |  58 +++
>>   target/riscv/cpu.c    |   2 +-
>>   target/riscv/cpu_helper.c | 393 +-
>>   target/riscv/csr.c    |  21 +-
>>   target/riscv/op_helper.c  | 113 -
>>   target/riscv/translate.c  |  72 ++--
>>   .../riscv/insn_trans/trans_privileged.c.inc   |   2 +-
>>   target/riscv/insn_trans/trans_rvf.c.inc   |   2 +-
>>   target/riscv/insn_trans/trans_rvh.c.inc   | 135 +++---
>>   target/riscv/insn_trans/trans_rvv.c.inc   |  22 +-
>>   target/riscv/insn_trans/trans_xthead.c.inc    |   7 +-
>>   17 files changed, 595 insertions(+), 395 deletions(-)
>>




Re: [PATCH v2 06/10] target/riscv: Remove riscv_cpu_virt_enabled()

2023-03-27 Thread LIU Zhiwei



On 2023/3/27 16:08, Weiwei Li wrote:

Directly use env->virt_enabled instead.

Suggested-by: LIU Zhiwei 
Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu.c|  2 +-
  target/riscv/cpu.h|  1 -
  target/riscv/cpu_helper.c | 51 ++-
  target/riscv/csr.c| 46 +--
  target/riscv/debug.c  | 10 
  target/riscv/op_helper.c  | 18 +++---
  target/riscv/pmu.c|  4 +--
  target/riscv/translate.c  |  2 +-
  8 files changed, 64 insertions(+), 70 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 16e465a0ab..e71b4d24a7 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -549,7 +549,7 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
  
  #if !defined(CONFIG_USER_ONLY)

  if (riscv_has_ext(env, RVH)) {
-qemu_fprintf(f, " %s %d\n", "V  =  ", riscv_cpu_virt_enabled(env));
+qemu_fprintf(f, " %s %d\n", "V  =  ", env->virt_enabled);
  }
  #endif
  qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "pc  ", env->pc);
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 22dc5ddb95..dc9817b40d 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -576,7 +576,6 @@ bool riscv_cpu_fp_enabled(CPURISCVState *env);
  target_ulong riscv_cpu_get_geilen(CPURISCVState *env);
  void riscv_cpu_set_geilen(CPURISCVState *env, target_ulong geilen);
  bool riscv_cpu_vector_enabled(CPURISCVState *env);
-bool riscv_cpu_virt_enabled(CPURISCVState *env);
  void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable);
  bool riscv_cpu_two_stage_lookup(int mmu_idx);
  int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch);
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index c7bc3fc553..1ad39e7157 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -93,8 +93,8 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
  
  if (riscv_has_ext(env, RVH)) {

  if (env->priv == PRV_M ||
-(env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
-(env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
+(env->priv == PRV_S && !env->virt_enabled) ||
+(env->priv == PRV_U && !env->virt_enabled &&
  get_field(env->hstatus, HSTATUS_HU))) {
  flags = FIELD_DP32(flags, TB_FLAGS, HLSX, 1);
  }
@@ -391,7 +391,7 @@ static int riscv_cpu_local_irq_pending(CPURISCVState *env)
  uint64_t irqs, pending, mie, hsie, vsie;
  
  /* Determine interrupt enable state of all privilege modes */

-if (riscv_cpu_virt_enabled(env)) {
+if (env->virt_enabled) {
  mie = 1;
  hsie = 1;
  vsie = (env->priv < PRV_S) ||
@@ -452,7 +452,7 @@ bool riscv_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
  bool riscv_cpu_fp_enabled(CPURISCVState *env)
  {
  if (env->mstatus & MSTATUS_FS) {
-if (riscv_cpu_virt_enabled(env) && !(env->mstatus_hs & MSTATUS_FS)) {
+if (env->virt_enabled && !(env->mstatus_hs & MSTATUS_FS)) {
  return false;
  }
  return true;
@@ -465,7 +465,7 @@ bool riscv_cpu_fp_enabled(CPURISCVState *env)
  bool riscv_cpu_vector_enabled(CPURISCVState *env)
  {
  if (env->mstatus & MSTATUS_VS) {
-if (riscv_cpu_virt_enabled(env) && !(env->mstatus_hs & MSTATUS_VS)) {
+if (env->virt_enabled && !(env->mstatus_hs & MSTATUS_VS)) {
  return false;
  }
  return true;
@@ -483,7 +483,7 @@ void riscv_cpu_swap_hypervisor_regs(CPURISCVState *env)
  if (riscv_has_ext(env, RVF)) {
  mstatus_mask |= MSTATUS_FS;
  }
-bool current_virt = riscv_cpu_virt_enabled(env);
+bool current_virt = env->virt_enabled;
  
  g_assert(riscv_has_ext(env, RVH));
  
@@ -558,11 +558,6 @@ void riscv_cpu_set_geilen(CPURISCVState *env, target_ulong geilen)

  env->geilen = geilen;
  }
  
-bool riscv_cpu_virt_enabled(CPURISCVState *env)

-{
-return env->virt_enabled;
-}


Reviewed-by: LIU Zhiwei 

Zhiwei


-
  /* This function can only be called to set virt when RVH is enabled */
  void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable)
  {
@@ -609,7 +604,7 @@ uint64_t riscv_cpu_update_mip(CPURISCVState *env, uint64_t mask,
  CPUState *cs = env_cpu(env);
  uint64_t gein, vsgein = 0, vstip = 0, old = env->mip;
  
-if (riscv_cpu_virt_enabled(env)) {

+if (env->virt_enabled) {
  gein = get_field(env->hstatus, HSTATUS_VGEIN);
  vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0;
  }
@@ -768,7 +763,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,
   * was called. Background registers will be used if the guest has
   * forced a two stage translation to be on (in HS or M mode).
   */
-if (!riscv_cpu_virt_enabled(env) && two_stage) {
+if (!env->virt_enabled && two_stage) {
 

Re: [PATCH for-8.0] nbd/server: Request TCP_NODELAY

2023-03-27 Thread Eric Blake
On Tue, Mar 28, 2023 at 12:42:59AM +0200, Florian Westphal wrote:
> Eric Blake  wrote:
> > Nagle's algorithm adds latency in order to reduce network packet
> > overhead on small packets.  But when we are already using corking to
> > merge smaller packets into transactional requests, the extra delay
> > from TCP defaults just gets in the way.
> > 
> > For reference, qemu as an NBD client already requests TCP_NODELAY (see
> > nbd_connect() in nbd/client-connection.c); as does libnbd as a client
> > [1], and nbdkit as a server [2].
> > 
> > [1] https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
> > [2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
> > 
> > CC: Florian Westphal 
> > Signed-off-by: Eric Blake 
> > ---
> >  nbd/server.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index a4750e41880..976223860bf 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -2755,6 +2755,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
> >  }
> >  client->tlsauthz = g_strdup(tlsauthz);
> >  client->sioc = sioc;
> > +qio_channel_set_delay(QIO_CHANNEL(cioc), false);
> 
> ../nbd/server.c: In function 'nbd_client_new':
> ../nbd/server.c:2763:39: error: 'cioc' undeclared (first use in this function); did you mean 'sioc'?
> 
> Other than that this looks good to me.

Arrgh. Bitten by hitting send before saving the edits in my buffer.
Yes, the obvious fix is needed and intended.
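
For readers who want to see what the request amounts to at the socket level, here is a minimal POSIX sketch. It assumes plain `setsockopt()` on a TCP socket rather than QEMU's QIOChannel API, and the helper names `tcp_set_nodelay` and `tcp_get_nodelay` are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket; 0 on success, -1 on error. */
static int tcp_set_nodelay(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Query the option: 1 if set, 0 if clear, -1 on error. */
static int tcp_get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0) {
        return -1;
    }
    return val != 0;
}
```

A caller that already batches its writes, as the commit message describes with corking, would call `tcp_set_nodelay(fd)` once after the connection is established.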

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH v2 11/19] target/riscv: remove cpu->cfg.ext_s

2023-03-27 Thread Daniel Henrique Barboza
Create a new "s" RISCVCPUMisaExtConfig property that will update
env->misa_ext* with RVS. Instances of cpu->cfg.ext_s and similar are
replaced with riscv_has_ext(env, RVS).

Remove the old "s" property and 'ext_s' from RISCVCPUConfig.
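
A minimal sketch of the bitmask test this change relies on, assuming the usual MISA encoding of one bit per extension letter. The `MISA_BIT` macro, the reduced `struct env`, and the helpers `has_ext` and `validate` are hypothetical stand-ins for illustration; unlike this patch, the sketch also reads U and H from the bitmask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per single-letter extension: bit 0 is 'A', bit 25 is 'Z'. */
#define MISA_BIT(letter) ((uint32_t)1 << ((letter) - 'A'))

#define RVS MISA_BIT('S')
#define RVU MISA_BIT('U')
#define RVH MISA_BIT('H')

/* Hypothetical reduced CPU state holding only the extension bitmask. */
struct env {
    uint32_t misa_ext;
};

static bool has_ext(const struct env *env, uint32_t ext)
{
    return (env->misa_ext & ext) != 0;
}

/* Returns NULL when the combination is valid, else an error message. */
static const char *validate(const struct env *env)
{
    if (has_ext(env, RVS) && !has_ext(env, RVU)) {
        return "Setting S extension without U extension is illegal";
    }
    if (has_ext(env, RVH) && !has_ext(env, RVS)) {
        return "H extension implicitly requires S-mode";
    }
    return NULL;
}
```

Keeping the extension in a single bitmask means there is one source of truth, so no separate boolean can drift out of step with it.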

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 11 +--
 target/riscv/cpu.h |  1 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 67a7d518c1..768b0a79ca 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -402,7 +402,6 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 
 cpu->cfg.ext_g = true;
 cpu->cfg.ext_u = true;
-cpu->cfg.ext_s = true;
 cpu->cfg.ext_icsr = true;
 cpu->cfg.ext_zfh = true;
 cpu->cfg.mmu = true;
@@ -837,7 +836,7 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
-if (cpu->cfg.ext_s && !cpu->cfg.ext_u) {
+if (riscv_has_ext(env, RVS) && !cpu->cfg.ext_u) {
 error_setg(errp,
"Setting S extension without U extension is illegal");
 return;
@@ -849,7 +848,7 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
-if (cpu->cfg.ext_h && !cpu->cfg.ext_s) {
+if (cpu->cfg.ext_h && !riscv_has_ext(env, RVS)) {
 error_setg(errp, "H extension implicitly requires S-mode");
 return;
 }
@@ -1109,7 +1108,7 @@ static void riscv_cpu_sync_misa_cfg(CPURISCVState *env)
 if (riscv_has_ext(env, RVC)) {
 ext |= RVC;
 }
-if (riscv_cpu_cfg(env)->ext_s) {
+if (riscv_has_ext(env, RVS)) {
 ext |= RVS;
 }
 if (riscv_cpu_cfg(env)->ext_u) {
@@ -1448,6 +1447,8 @@ static const RISCVCPUMisaExtConfig misa_ext_cfgs[] = {
  .misa_bit = RVE, .enabled = false},
 {.name = "m", .description = "Integer multiplication and division",
  .misa_bit = RVM, .enabled = true},
+{.name = "s", .description = "Supervisor-level instructions",
+ .misa_bit = RVS, .enabled = true},
 };
 
 static void riscv_cpu_add_misa_properties(Object *cpu_obj)
@@ -1471,7 +1472,6 @@ static void riscv_cpu_add_misa_properties(Object *cpu_obj)
 static Property riscv_cpu_extensions[] = {
 /* Defaults for standard extensions */
 DEFINE_PROP_BOOL("g", RISCVCPU, cfg.ext_g, false),
-DEFINE_PROP_BOOL("s", RISCVCPU, cfg.ext_s, true),
 DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
 DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
 DEFINE_PROP_BOOL("h", RISCVCPU, cfg.ext_h, true),
@@ -1579,7 +1579,6 @@ static void register_cpu_props(Object *obj)
  */
 if (cpu->env.misa_ext != 0) {
 cpu->cfg.ext_v = misa_ext & RVV;
-cpu->cfg.ext_s = misa_ext & RVS;
 cpu->cfg.ext_u = misa_ext & RVU;
 cpu->cfg.ext_h = misa_ext & RVH;
 cpu->cfg.ext_j = misa_ext & RVJ;
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7a42c80b7d..fc35aa7509 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -419,7 +419,6 @@ typedef struct {
 
 struct RISCVCPUConfig {
 bool ext_g;
-bool ext_s;
 bool ext_u;
 bool ext_h;
 bool ext_j;
-- 
2.39.2




[PATCH v2 16/19] target/riscv: remove riscv_cpu_sync_misa_cfg()

2023-03-27 Thread Daniel Henrique Barboza
This function was created to move the sync between cpu->cfg.ext_N bit
changes to env->misa_ext* from the validation step to an earlier step,
giving us a guarantee that we could use either cpu->cfg.ext_N or
riscv_has_ext(env,N) in the validation.

We don't have any cpu->cfg.ext_N left that has an existing MISA bit
(cfg.ext_g will be handled shortly). The function is now a no-op, simply
copying the existing values of misa_ext* back to misa_ext*.

Remove it.
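
The no-op claim can be checked with a small sketch, assuming the same twelve extension letters the removed function tested. `MISA_BIT`, `synced_letters`, and `rebuild_misa` are hypothetical names, and the sketch models only the rebuilt bitmask, not the assignment to misa_ext_mask.

```c
#include <stdint.h>

/* One bit per single-letter extension: bit 0 is 'A', bit 25 is 'Z'. */
#define MISA_BIT(letter) ((uint32_t)1 << ((letter) - 'A'))

/* The twelve letters the removed function tested, in its order. */
static const char synced_letters[] = "IEMAFDCSUHVJ";

/* Rebuild the mask bit by bit from the mask itself. */
static uint32_t rebuild_misa(uint32_t misa_ext)
{
    uint32_t ext = 0;

    for (const char *p = synced_letters; *p != '\0'; p++) {
        uint32_t bit = MISA_BIT(*p);

        if (misa_ext & bit) {
            ext |= bit;
        }
    }
    return ext;
}
```

For any input made up only of those twelve bits the result equals the input, which is the sense in which the function had become a no-op; a bit outside that set would be dropped by the rebuild.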

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 52 --
 1 file changed, 52 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a916252077..fa50aae4a5 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1082,50 +1082,6 @@ static void riscv_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
 #endif
 }
 
-static void riscv_cpu_sync_misa_cfg(CPURISCVState *env)
-{
-uint32_t ext = 0;
-
-if (riscv_has_ext(env, RVI)) {
-ext |= RVI;
-}
-if (riscv_has_ext(env, RVE)) {
-ext |= RVE;
-}
-if (riscv_has_ext(env, RVM)) {
-ext |= RVM;
-}
-if (riscv_has_ext(env, RVA)) {
-ext |= RVA;
-}
-if (riscv_has_ext(env, RVF)) {
-ext |= RVF;
-}
-if (riscv_has_ext(env, RVD)) {
-ext |= RVD;
-}
-if (riscv_has_ext(env, RVC)) {
-ext |= RVC;
-}
-if (riscv_has_ext(env, RVS)) {
-ext |= RVS;
-}
-if (riscv_has_ext(env, RVU)) {
-ext |= RVU;
-}
-if (riscv_has_ext(env, RVH)) {
-ext |= RVH;
-}
-if (riscv_has_ext(env, RVV)) {
-ext |= RVV;
-}
-if (riscv_has_ext(env, RVJ)) {
-ext |= RVJ;
-}
-
-env->misa_ext = env->misa_ext_mask = ext;
-}
-
 static void riscv_cpu_validate_misa_priv(CPURISCVState *env, Error **errp)
 {
 if (riscv_has_ext(env, RVH) && env->priv_ver < PRIV_VERSION_1_12_0) {
@@ -1169,14 +1125,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 set_priv_version(env, priv_version);
 }
 
-/*
- * We can't be sure of whether we set defaults during cpu_init()
- * or whether the user enabled/disabled some bits via cpu->cfg
- * flags. Sync env->misa_ext with cpu->cfg now to allow us to
- * use just env->misa_ext later.
- */
-riscv_cpu_sync_misa_cfg(env);
-
 riscv_cpu_validate_misa_priv(env, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
-- 
2.39.2



