This series adds support for the Hexagon processor with Linux user support. Hexagon is Qualcomm's very long instruction word (VLIW) digital signal processor (DSP). We also support Hexagon Vector eXtensions (HVX). HVX is a wide vector coprocessor designed for high-performance computer vision, image processing, machine learning, and other workloads.
This series of patches supports the following versions of the Hexagon core:
    Scalar core: v67
        https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
    HVX extension: v66
        https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center

The patches up to and including "Hexagon build infrastructure" implement the base Hexagon core, and the remainder add HVX. Once the build infrastructure patch is applied, you can build, and qemu will execute non-HVX Hexagon programs.

We have a parallel effort to make the Hexagon Linux toolchain publicly available.


*** Required patches ***

In order to build, we need this patch:
    https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg01203.html

In order to run pthread_cancel, we need this patch series:
    https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg00834.html
    https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg00832.html
    https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg00833.html
    https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg00835.html
    https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg00836.html


*** Testing ***

The port passes the following tests:
    Directed unit tests
    MUSL libc test suite (good coverage of Linux system calls)
    Compiler intrinsics test suite (good coverage of instructions)
    Hexagon machine learning library unit tests
    make check-tcg TIMEOUT=60


*** Known checkpatch issues ***

The following are known checkpatch errors in the series:
    include/disas/dis-asm.h                 space prohibited
        (Follow convention of other targets on prior lines)
    target/hexagon/reg_fields.h             Complex macro
    target/hexagon/attribs.h                Complex macro
    target/hexagon/decode.c                 Complex macro
    target/hexagon/q6v_decode.c             Macro needs do - while
    target/hexagon/printinsn.c              Macro needs do - while
    target/hexagon/gen_semantics.c          Suspicious ; after while (0)
    target/hexagon/gen_dectree_import.c     Complex macro
    target/hexagon/gen_dectree_import.c     Suspicious ; after while (0)
    target/hexagon/opcodes.c                Complex macro
    target/hexagon/iclass.h                 Complex macro
    scripts/qemu-binfmt-conf.sh             Line over 90 characters
    target/hexagon/mmvec/macros.h           Suspicious ; after while (0)

The following are known checkpatch warnings in the series:
    target/hexagon/fma_emu.c                Comments inside macro definition
    scripts/qemu-binfmt-conf.sh             Line over 80 characters


*** Tour of the code ***

The qemu-hexagon implementation is a combination of qemu and the Hexagon architecture library (aka archlib). The three primary directories with Hexagon-specific code are:

    qemu/target/hexagon
        This has all the instruction and packet semantics
    qemu/target/hexagon/imported
        These files are imported with very little modification from archlib
        *.idef                  Instruction semantics definition
        macros.def              Mapping of macros to instruction attributes
        encode*.def             Encoding patterns for each instruction
        iclass.def              Instruction class definitions used to determine
                                legal VLIW slots for each instruction
    qemu/linux-user/hexagon
        Helpers for loading the ELF file and making Linux system calls,
        signals, etc.

We start with a generator that creates a qemu helper for each instruction. This is a two-step process. The first step is to use the C preprocessor to expand macros inside the architecture definition files. This is done in target/hexagon/gen_semantics.c and produces <BUILD_DIR>/hexagon-linux-user/semantics_generated.pyinc. That file is consumed by the do_qemu.py script, which generates several files. All of the generated files end in "_generated.*".
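As a rough illustration of phase 1, the sketch below shows the general idea of letting the C preprocessor turn instruction definitions into text that a Python script can later consume. It is not the actual gen_semantics.c: the INSN_DEFS list macro, the emit_semantics_pyinc() function, and the SEMANTICS(...) output format are all hypothetical. Only the three tag/syntax/semantics triples are taken from the examples in this cover letter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the imported *.idef files: tag, assembly
 * syntax, and C semantics for three instructions quoted in this cover
 * letter. The real archlib macro layout is different.
 */
#define INSN_DEFS(X)                                                   \
    X(A2_add, "Rd32=add(Rs32,Rt32)", "{ RdV=RsV+RtV;}")                \
    X(Y2_dczeroa, "dczeroa(Rs32)", "{fEA_REG(RsV); fDCZEROA(EA);}")    \
    X(SL2_loadrd_sp, "Rdd8=memd(r29+#u5:3)",                           \
      "{fEA_RI(fREAD_SP(),uiV); fLOAD(1,8,u,EA,RddV);}")

/*
 * Expand one entry into one line of Python source (hypothetical format).
 * #TAG turns the bare tag into a string literal. Makes the enclosing
 * function return -1 if the output buffer is too small.
 */
#define EMIT_SEMANTICS(TAG, SYNTAX, SEM)                               \
    n = snprintf(out + used, size - used,                              \
                 "SEMANTICS(\"%s\", \"%s\", \"\"\"%s\"\"\")\n",        \
                 #TAG, SYNTAX, SEM);                                   \
    if (n < 0 || (size_t)n >= size - used) {                           \
        return -1;                                                     \
    }                                                                  \
    used += (size_t)n;

/* Write the generated text into out; return its length, or -1. */
int emit_semantics_pyinc(char *out, size_t size)
{
    size_t used = 0;
    int n;

    assert(out != NULL && size > 0);
    out[0] = '\0';
    INSN_DEFS(EMIT_SEMANTICS)
    return (int)used;
}
```

The sketch writes into a caller-supplied buffer only so that it is easy to check. A real generator would more likely print straight to the .pyinc file that the Python step imports.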
The primary file produced is <BUILD_DIR>/hexagon-linux-user/qemu_def_generated.h

Qemu helper functions have 3 parts:
    DEF_HELPER declaration indicates the signature of the helper
    gen_helper_<NAME> will generate a TCG call to the helper function
    The helper implementation

In the qemu_def_generated.h file, there is a DEF_QEMU macro for each user-space instruction. The file is included several times with DEF_QEMU defined differently, depending on the context. The macro has five arguments:
    The instruction tag
    The semantics_short code
    DEF_HELPER declaration
    Call to the helper
    Helper implementation

Here's an example of the A2_add instruction.
    Instruction tag        A2_add
    Assembly syntax        "Rd32=add(Rs32,Rt32)"
    Instruction semantics  "{ RdV=RsV+RtV;}"

By convention, the operands are identified by letter:
    RdV is the destination register
    RsV, RtV are source registers

The generator uses the operand naming conventions (see the large comment in do_qemu.py) to determine the signature of the helper function. Here is the result for A2_add from qemu_def_generated.h:

    DEF_QEMU(A2_add,{ RdV=RsV+RtV;},
    #ifndef fWRAP_A2_add
    DEF_HELPER_3(A2_add, s32, env, s32, s32)
    #endif
    ,
    {
    /* A2_add */
        DECL_RREG_d(RdV, RdN, 0, 0);
        DECL_RREG_s(RsV, RsN, 1, 0);
        DECL_RREG_t(RtV, RtN, 2, 0);
        READ_RREG_s(RsV, RsN);
        READ_RREG_t(RtV, RtN);
        fWRAP_A2_add(
        do {
            gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
        } while (0),
        { RdV=RsV+RtV;});
        WRITE_RREG_d(RdN, RdV);
        FREE_RREG_d(RdV);
        FREE_RREG_s(RsV);
        FREE_RREG_t(RtV);
    /* A2_add */
    },
    #ifndef fWRAP_A2_add
    int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
    {
        uint32_t slot = 4; slot = slot;
        int32_t RdV = 0;
        { RdV=RsV+RtV;}
        COUNT_HELPER(A2_add);
        return RdV;
    }
    #endif
    )

For each operand, there are macros for DECL, FREE, READ, and WRITE. These are defined in macros.h. Note that we append the operand type to the macro name, which allows us to specialize the generated TCG code.
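Including qemu_def_generated.h several times with DEF_QEMU defined differently is the classic "X macro" technique. The sketch below assumes a simplified two-argument DEF_QEMU (tag and semantics only) and uses a list macro in place of a separate header so that everything fits in one file. QEMU_DEFS, EX_sub, helper_table, and lookup_helper() are hypothetical names and are not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for qemu_def_generated.h: one DEF_QEMU-style
 * entry per instruction (tag, semantics). A2_add is the example from this
 * cover letter; EX_sub is a made-up tag for illustration only.
 */
#define QEMU_DEFS(X)                \
    X(A2_add, RdV = RsV + RtV)      \
    X(EX_sub, RdV = RsV - RtV)

/* Context 1: expand every entry into a helper implementation. */
#define DEF_QEMU(TAG, SEM)                                  \
    static int32_t helper_##TAG(int32_t RsV, int32_t RtV)   \
    {                                                       \
        int32_t RdV = 0;                                    \
        SEM;                                                \
        return RdV;                                         \
    }
QEMU_DEFS(DEF_QEMU)
#undef DEF_QEMU

typedef int32_t (*helper_fn)(int32_t RsV, int32_t RtV);

struct helper_entry {
    const char *tag;
    helper_fn fn;
};

/* Context 2: expand the same list again, this time into a lookup table. */
#define DEF_QEMU(TAG, SEM) { #TAG, helper_##TAG },
static const struct helper_entry helper_table[] = {
    QEMU_DEFS(DEF_QEMU)
};
#undef DEF_QEMU

/* Return the helper for an instruction tag, or NULL if there is none. */
helper_fn lookup_helper(const char *tag)
{
    assert(tag != NULL);
    for (size_t i = 0; i < sizeof(helper_table) / sizeof(helper_table[0]); i++) {
        if (strcmp(helper_table[i].tag, tag) == 0) {
            return helper_table[i].fn;
        }
    }
    return NULL;
}
```

The instruction list is written once. Each #define/#undef pair then extracts a different artifact from it, which is the same reason the series can derive declarations, TCG generation code, and helper bodies from a single generated file.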
For read-only operands, DECL simply declares the TCGv variable (no need for tcg_temp_local_new()), READ assigns from the TCGv corresponding to the GPR, and FREE doesn't have to do anything. Also note that the WRITE macros update the disassembly context (DisasContext) so that the write is processed when the packet commits (see "Packet Semantics" below).

Note the fWRAP_A2_add macro around the gen_helper call. Each instruction has a fWRAP_<tag> macro that takes 2 arguments:
    gen_helper call
    C semantics (aka short code)

This allows the code generator to override the auto-generated code. In some cases this is necessary for correct execution. We can also override for faster emulation. For example, calling a helper for add is more expensive than generating a TCG add operation.

The qemu_wrap_generated.h file contains a default fWRAP_<tag> for each instruction. The default is to invoke the gen_helper code.

    #ifndef fWRAP_A2_add
    #define fWRAP_A2_add(GENHLPR, SHORTCODE) GENHLPR
    #endif

The helper_overrides.h file contains the overrides. For example,

    #define fWRAP_A2_add(GENHLPR, SHORTCODE) \
        tcg_gen_add_tl(RdV, RsV, RtV)

This file is included twice:
    1) In genptr.c, it overrides the semantics of the desired instructions.
    2) In helper.h, it prevents the generation of helpers for overridden
       instructions. Notice the #ifndef fWRAP_A2_add above.

The instruction semantics C code relies heavily on macros. In cases where the C semantics are specified only with macros, we can override the default with the short semantics option and #define the macros to generate TCG code. One example is Y2_dczeroa (dc == data cache, zero == zero out the cache line, a == address: zero out the data cache line at the given address):
    Instruction tag        Y2_dczeroa
    Assembly syntax        "dczeroa(Rs32)"
    Instruction semantics  "{fEA_REG(RsV); fDCZEROA(EA);}"

In helper_overrides.h, we use the shortcode:

    #define fWRAP_Y2_dczeroa(GENHLPR, SHORTCODE) SHORTCODE

In other cases, just a little bit of wrapper code needs to be written.
    #define fWRAP_tmp(SHORTCODE) \
    { \
        TCGv tmp = tcg_temp_new(); \
        SHORTCODE; \
        tcg_temp_free(tmp); \
    }

For example, some load instructions use a temporary for address computation. The SL2_loadrd_sp instruction needs a temporary to hold the value of the stack pointer (r29):
    Instruction tag        SL2_loadrd_sp
    Assembly syntax        "Rdd8=memd(r29+#u5:3)"
    Instruction semantics  "{fEA_RI(fREAD_SP(),uiV); fLOAD(1,8,u,EA,RddV);}"

In helper_overrides.h you'll see:

    #define fWRAP_SL2_loadrd_sp(GENHLPR, SHORTCODE)      fWRAP_tmp(SHORTCODE)

There are also cases where we brute force the TCG code generation. The allocframe and deallocframe instructions are examples. Other examples are instructions with multiple definitions (i.e., instructions that write more than one destination operand). These require special handling because qemu helpers can only return a single value.

In addition to instruction semantics, we use a generator to create the decode tree. This generation is also a two-step process. The first step is to run target/hexagon/gen_dectree_import.c to produce
    <BUILD_DIR>/hexagon-linux-user/iset.py
This file is imported by target/hexagon/dectree.py to produce
    <BUILD_DIR>/hexagon-linux-user/dectree_generated.h


*** Key Files ***

cpu.h

This file contains the definition of the CPUHexagonState struct. It holds the runtime information for each thread, including state such as the GPRs and predicate registers.

macros.h
mmvec/macros.h

The Hexagon arch lib relies heavily on macros for the instruction semantics. This is a great advantage for qemu because we can override them for different purposes. You will also notice that there are sometimes two definitions of a macro. The QEMU_GENERATE macro determines whether we want the macro to generate TCG code. If QEMU_GENERATE is not defined, we want the macro to generate vanilla C code that will work in the helper implementation.

translate.c

The functions in this file generate TCG code for a translation block.
Some important functions in this file are:
    gen_start_packet    - initialize the data structures for packet semantics
    gen_commit_packet   - commit the register writes, stores, etc. for a packet
    decode_packet       - disassemble a packet and generate code

genptr.c
genptr_helpers.h
helper_overrides.h

These files create a function for each instruction. The code is mostly composed of fWRAP_<tag> definitions followed by including qemu_def_generated.h. The genptr_helpers.h file contains helper functions that are invoked by the macros in helper_overrides.h and macros.h.

op_helper.c

This file contains the implementations of all the helpers. There are a few general-purpose helpers, but most of them are generated by including qemu_def_generated.h. There are also several helpers used for debugging.


*** Packet Semantics ***

VLIW packet semantics differ from serial semantics in that all input operands are read, then the operations are performed, then all the results are written. For example, this packet performs a swap of registers r0 and r1:
    { r0 = r1; r1 = r0 }
Note that the result is different if the instructions are executed serially: both r0 and r1 would end up holding the original value of r1.

Packet semantics dictate that we defer any changes of state until the entire packet is committed. We record the results of each instruction in a side data structure, and update the visible processor state when we commit the packet.

The data structures are divided between the runtime state and the translation context.

During TCG generation (see translate.[ch]), we use the DisasContext to track what needs to be done during packet commit.
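To make the deferred-write idea concrete, here is a minimal interpreter-style sketch of the swap packet. It assumes a simplified PacketState whose new_value and reg_written arrays echo the CPUHexagonState fields described in this cover letter. start_packet(), log_reg_write(), and commit_packet() are hypothetical stand-ins for the work that gen_start_packet and gen_commit_packet arrange in generated TCG code, not their actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32

/*
 * Hypothetical, simplified state. The field names echo new_value and
 * reg_written from CPUHexagonState, but this struct is illustrative only.
 */
typedef struct {
    uint32_t gpr[NUM_GPRS];         /* architecturally visible registers */
    uint32_t new_value[NUM_GPRS];   /* pending results for this packet */
    bool reg_written[NUM_GPRS];     /* which pending results are valid */
} PacketState;

/* Start of packet: clear the log of pending register writes. */
void start_packet(PacketState *s)
{
    memset(s->reg_written, 0, sizeof(s->reg_written));
}

/* Instructions always read the pre-packet (committed) value. */
uint32_t read_reg(const PacketState *s, int r)
{
    assert(r >= 0 && r < NUM_GPRS);
    return s->gpr[r];
}

/* Results are logged, not made visible yet. */
void log_reg_write(PacketState *s, int r, uint32_t val)
{
    assert(r >= 0 && r < NUM_GPRS);
    s->new_value[r] = val;
    s->reg_written[r] = true;
}

/* End of packet: make all logged writes visible at once. */
void commit_packet(PacketState *s)
{
    for (int r = 0; r < NUM_GPRS; r++) {
        if (s->reg_written[r]) {
            s->gpr[r] = s->new_value[r];
        }
    }
}

/* Emulate the packet { r0 = r1; r1 = r0 }. */
void run_swap_packet(PacketState *s)
{
    start_packet(s);
    log_reg_write(s, 0, read_reg(s, 1));   /* r0 = r1 */
    log_reg_write(s, 1, read_reg(s, 0));   /* r1 = r0: still sees old r0 */
    commit_packet(s);
}
```

Starting from r0 = 10 and r1 = 20, run_swap_packet() leaves r0 = 20 and r1 = 10. Writing gpr[] directly instead of logging would leave both registers at 20, which is the serial result.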
Here are the relevant DisasContext fields:
    ctx_reg_log          list of registers written
    ctx_reg_log_idx      index into ctx_reg_log
    ctx_pred_log         list of predicates written
    ctx_pred_log_idx     index into ctx_pred_log
    ctx_store_width      width of stores (indexed by slot)

During runtime, the following fields in CPUHexagonState (see cpu.h) are used:
    new_value            new value of a given register
    reg_written          boolean indicating if register was written
    new_pred_value       new value of a predicate register
    new_pred_written     boolean indicating if predicate was written
    mem_log_stores       record of the stores (indexed by slot)

For Hexagon Vector eXtensions (HVX), the following fields are used:
    future_VRegs
    tmp_VRegs
    future_ZRegs
    ZRegs_updated
    VRegs_updated_tmp
    VRegs_updated
    VRegs_select


*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in internal.h. This will stream a lot of information as it generates TCG and executes the code.

To track down nasty issues with Hexagon->TCG generation, we compare the execution results with actual hardware running on a Hexagon Linux target. Run qemu with the "-d cpu" option. Then, we can diff the results and figure out where qemu and hardware behave differently.

The stack is located at different addresses in qemu and on hardware. We handle this by changing env->stack_adjust in translate.c. First, set this to zero and run qemu. Then, change env->stack_adjust to the difference between the two stack locations. Then rebuild qemu and run again. That will produce a very clean diff.
Here are some handy places to set breakpoints:

    At the call to gen_start_packet for a given PC (note that the line number
    might change in the future)
        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
    The helper function for each instruction is named helper_<TAG>, so here's
    an example that will set a breakpoint at the start of the V6_vgathermh
    helper
        br helper_V6_vgathermh
    If you have the HEX_DEBUG macro set, the following will be useful
        At the start of execution of a packet for a given PC
            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
        At the end of execution of a packet for a given PC
            br helper_debug_commit_end if env->this_PC == 0xdeadbeef


Taylor Simpson (66):
  Hexagon Maintainers
  Hexagon ELF Machine Definition
  Hexagon CPU Scalar Core Definition
  Hexagon register names
  Hexagon Disassembler
  Hexagon CPU Scalar Core Helpers
  Hexagon GDB Stub
  Hexagon instruction and packet types
  Hexagon architecture types
  Hexagon register fields
  Hexagon instruction attributes
  Hexagon register map
  Hexagon instruction/packet decode
  Hexagon instruction printing
  Hexagon arch import - instruction semantics definitions
  Hexagon arch import - macro definitions
  Hexagon arch import - instruction encoding
  Hexagon instruction class definitions
  Hexagon instruction utility functions
  Hexagon generator phase 1 - C preprocessor for semantics
  Hexagon generator phase 2 - qemu_def_generated.h
  Hexagon generator phase 2 - qemu_wrap_generated.h
  Hexagon generator phase 2 - opcodes_def_generated.h
  Hexagon generator phase 2 - op_attribs_generated.h
  Hexagon generator phase 2 - op_regs_generated.h
  Hexagon generator phase 2 - printinsn-generated.h
  Hexagon generator phase 3 - C preprocessor for decode tree
  Hexagon generater phase 4 - Decode tree
  Hexagon opcode data structures
  Hexagon macros to interface with the generator
  Hexagon macros referenced in instruction semantics
  Hexagon instruction classes
  Hexagon TCG generation helpers - step 1
  Hexagon TCG generation helpers - step 2
  Hexagon TCG generation helpers - step 3
  Hexagon TCG generation helpers - step 4
  Hexagon TCG generation helpers - step 5
  Hexagon TCG generation - step 01
  Hexagon TCG generation - step 02
  Hexagon TCG generation - step 03
  Hexagon TCG generation - step 04
  Hexagon TCG generation - step 05
  Hexagon TCG generation - step 06
  Hexagon TCG generation - step 07
  Hexagon TCG generation - step 08
  Hexagon TCG generation - step 09
  Hexagon TCG generation - step 10
  Hexagon TCG generation - step 11
  Hexagon TCG generation - step 12
  Hexagon translation
  Hexagon Linux user emulation
  Hexagon build infrastructure
  Hexagon - Add Hexagon Vector eXtensions (HVX) to core definition
  Hexagon HVX support in gdbstub
  Hexagon HVX import instruction encodings
  Hexagon HVX import semantics
  Hexagon HVX import macro definitions
  Hexagon HVX semantics generator
  Hexagon HVX instruction decoding
  Hexagon HVX instruction utility functions
  Hexagon HVX macros to interface with the generator
  Hexagon HVX macros referenced in instruction semantics
  Hexagon HVX helper to commit vector stores (masked and scatter/gather)
  Hexagon HVX TCG generation
  Hexagon HVX translation
  Hexagon HVX build infrastructure

 MAINTAINERS | 8 +
 configure | 9 +
 default-configs/hexagon-linux-user.mak | 1 +
 disas/Makefile.objs | 1 +
 disas/hexagon.c | 56 +
 include/disas/dis-asm.h | 1 +
 include/elf.h | 2 +
 linux-user/elfload.c | 16 +
 linux-user/hexagon/cpu_loop.c | 173 ++
 linux-user/hexagon/signal.c | 276 ++
 linux-user/hexagon/sockbits.h | 18 +
 linux-user/hexagon/syscall_nr.h | 346 +++
 linux-user/hexagon/target_cpu.h | 44 +
 linux-user/hexagon/target_elf.h | 38 +
 linux-user/hexagon/target_fcntl.h | 18 +
 linux-user/hexagon/target_signal.h | 34 +
 linux-user/hexagon/target_structs.h | 46 +
 linux-user/hexagon/target_syscall.h | 32 +
 linux-user/hexagon/termbits.h | 18 +
 linux-user/syscall.c | 2 +
 linux-user/syscall_defs.h | 33 +
 scripts/qemu-binfmt-conf.sh | 6 +-
 target/hexagon/Makefile.objs | 109 +
 target/hexagon/arch.c | 664 +++++
 target/hexagon/arch.h | 62 +
 target/hexagon/attribs.h | 32 +
 target/hexagon/attribs_def.h | 404 +++
 target/hexagon/conv_emu.c | 370 +++
 target/hexagon/conv_emu.h | 50 +
 target/hexagon/cpu-param.h | 26 +
 target/hexagon/cpu.c | 356 +++
 target/hexagon/cpu.h | 207 ++
 target/hexagon/cpu_bits.h | 37 +
 target/hexagon/decode.c | 792 +++++
 target/hexagon/decode.h | 39 +
 target/hexagon/dectree.py | 354 +++
 target/hexagon/do_qemu.py | 1198 ++++++++
 target/hexagon/fma_emu.c | 918 ++++++
 target/hexagon/fma_emu.h | 30 +
 target/hexagon/gdbstub.c | 111 +
 target/hexagon/gen_dectree_import.c | 205 ++
 target/hexagon/gen_semantics.c | 101 +
 target/hexagon/genptr.c | 62 +
 target/hexagon/genptr.h | 25 +
 target/hexagon/genptr_helpers.h | 1022 +++++++
 target/hexagon/helper.h | 38 +
 target/hexagon/helper_overrides.h | 1850 ++++++++++++
 target/hexagon/hex_arch_types.h | 42 +
 target/hexagon/hex_regs.h | 97 +
 target/hexagon/iclass.c | 109 +
 target/hexagon/iclass.h | 46 +
 target/hexagon/imported/allext.idef | 25 +
 target/hexagon/imported/allext_macros.def | 25 +
 target/hexagon/imported/allextenc.def | 20 +
 target/hexagon/imported/allidefs.def | 92 +
 target/hexagon/imported/alu.idef | 1335 +++++++++
 target/hexagon/imported/branch.idef | 344 +++
 target/hexagon/imported/compare.idef | 639 +++++
 target/hexagon/imported/encode.def | 126 +
 target/hexagon/imported/encode_pp.def | 2283 +++++++++++++++
 target/hexagon/imported/encode_subinsn.def | 150 +
 target/hexagon/imported/float.idef | 498 ++++
 target/hexagon/imported/iclass.def | 52 +
 target/hexagon/imported/ldst.idef | 421 +++
 target/hexagon/imported/macros.def | 3970 ++++++++++++++++++++++++++
 target/hexagon/imported/mmvec/encode_ext.def | 830 ++++++
 target/hexagon/imported/mmvec/ext.idef | 2809 ++++++++++++++++++
 target/hexagon/imported/mmvec/macros.def | 1110 +++++++
 target/hexagon/imported/mpy.idef | 1269 ++++++++
 target/hexagon/imported/shift.idef | 1211 ++++++++
 target/hexagon/imported/subinsns.idef | 152 +
 target/hexagon/imported/system.idef | 302 ++
 target/hexagon/insn.h | 149 +
 target/hexagon/internal.h | 54 +
 target/hexagon/macros.h | 1499 ++++++++++
 target/hexagon/mmvec/decode_ext_mmvec.c | 673 +++++
 target/hexagon/mmvec/decode_ext_mmvec.h | 24 +
 target/hexagon/mmvec/macros.h | 668 +++++
 target/hexagon/mmvec/mmvec.h | 87 +
 target/hexagon/mmvec/system_ext_mmvec.c | 265 ++
 target/hexagon/mmvec/system_ext_mmvec.h | 38 +
 target/hexagon/op_helper.c | 507 ++++
 target/hexagon/opcodes.c | 223 ++
 target/hexagon/opcodes.h | 67 +
 target/hexagon/printinsn.c | 93 +
 target/hexagon/printinsn.h | 26 +
 target/hexagon/q6v_decode.c | 416 +++
 target/hexagon/reg_fields.c | 28 +
 target/hexagon/reg_fields.h | 40 +
 target/hexagon/reg_fields_def.h | 109 +
 target/hexagon/regmap.h | 38 +
 target/hexagon/translate.c | 906 ++++++
 target/hexagon/translate.h | 112 +
 tests/tcg/configure.sh | 4 +-
 tests/tcg/hexagon/float_convs.ref | 748 +++++
 tests/tcg/hexagon/float_madds.ref | 768 +++++
 96 files changed, 35737 insertions(+), 2 deletions(-)
 create mode 100644 default-configs/hexagon-linux-user.mak
 create mode 100644 disas/hexagon.c
 create mode 100644 linux-user/hexagon/cpu_loop.c
 create mode 100644 linux-user/hexagon/signal.c
 create mode 100644 linux-user/hexagon/sockbits.h
 create mode 100644 linux-user/hexagon/syscall_nr.h
 create mode 100644 linux-user/hexagon/target_cpu.h
 create mode 100644 linux-user/hexagon/target_elf.h
 create mode 100644 linux-user/hexagon/target_fcntl.h
 create mode 100644 linux-user/hexagon/target_signal.h
 create mode 100644 linux-user/hexagon/target_structs.h
 create mode 100644 linux-user/hexagon/target_syscall.h
 create mode 100644 linux-user/hexagon/termbits.h
 create mode 100644 target/hexagon/Makefile.objs
 create mode 100644 target/hexagon/arch.c
 create mode 100644 target/hexagon/arch.h
 create mode 100644 target/hexagon/attribs.h
 create mode 100644 target/hexagon/attribs_def.h
 create mode 100644 target/hexagon/conv_emu.c
 create mode 100644 target/hexagon/conv_emu.h
 create mode 100644 target/hexagon/cpu-param.h
 create mode 100644 target/hexagon/cpu.c
 create mode 100644 target/hexagon/cpu.h
 create mode 100644 target/hexagon/cpu_bits.h
 create mode 100644 target/hexagon/decode.c
 create mode 100644 target/hexagon/decode.h
 create mode 100755 target/hexagon/dectree.py
 create mode 100755 target/hexagon/do_qemu.py
 create mode 100644 target/hexagon/fma_emu.c
 create mode 100644 target/hexagon/fma_emu.h
 create mode 100644 target/hexagon/gdbstub.c
 create mode 100644 target/hexagon/gen_dectree_import.c
 create mode 100644 target/hexagon/gen_semantics.c
 create mode 100644 target/hexagon/genptr.c
 create mode 100644 target/hexagon/genptr.h
 create mode 100644 target/hexagon/genptr_helpers.h
 create mode 100644 target/hexagon/helper.h
 create mode 100644 target/hexagon/helper_overrides.h
 create mode 100644 target/hexagon/hex_arch_types.h
 create mode 100644 target/hexagon/hex_regs.h
 create mode 100644 target/hexagon/iclass.c
 create mode 100644 target/hexagon/iclass.h
 create mode 100644 target/hexagon/imported/allext.idef
 create mode 100644 target/hexagon/imported/allext_macros.def
 create mode 100644 target/hexagon/imported/allextenc.def
 create mode 100644 target/hexagon/imported/allidefs.def
 create mode 100644 target/hexagon/imported/alu.idef
 create mode 100644 target/hexagon/imported/branch.idef
 create mode 100644 target/hexagon/imported/compare.idef
 create mode 100644 target/hexagon/imported/encode.def
 create mode 100644 target/hexagon/imported/encode_pp.def
 create mode 100644 target/hexagon/imported/encode_subinsn.def
 create mode 100644 target/hexagon/imported/float.idef
 create mode 100644 target/hexagon/imported/iclass.def
 create mode 100644 target/hexagon/imported/ldst.idef
 create mode 100755 target/hexagon/imported/macros.def
 create mode 100644 target/hexagon/imported/mmvec/encode_ext.def
 create mode 100644 target/hexagon/imported/mmvec/ext.idef
 create mode 100755 target/hexagon/imported/mmvec/macros.def
 create mode 100644 target/hexagon/imported/mpy.idef
 create mode 100644 target/hexagon/imported/shift.idef
 create mode 100644 target/hexagon/imported/subinsns.idef
 create mode 100644 target/hexagon/imported/system.idef
 create mode 100644 target/hexagon/insn.h
 create mode 100644 target/hexagon/internal.h
 create mode 100644 target/hexagon/macros.h
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.c
 create mode 100644 target/hexagon/mmvec/decode_ext_mmvec.h
 create mode 100644 target/hexagon/mmvec/macros.h
 create mode 100644 target/hexagon/mmvec/mmvec.h
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.c
 create mode 100644 target/hexagon/mmvec/system_ext_mmvec.h
 create mode 100644 target/hexagon/op_helper.c
 create mode 100644 target/hexagon/opcodes.c
 create mode 100644 target/hexagon/opcodes.h
 create mode 100644 target/hexagon/printinsn.c
 create mode 100644 target/hexagon/printinsn.h
 create mode 100644 target/hexagon/q6v_decode.c
 create mode 100644 target/hexagon/reg_fields.c
 create mode 100644 target/hexagon/reg_fields.h
 create mode 100644 target/hexagon/reg_fields_def.h
 create mode 100644 target/hexagon/regmap.h
 create mode 100644 target/hexagon/translate.c
 create mode 100644 target/hexagon/translate.h
 create mode 100644 tests/tcg/hexagon/float_convs.ref
 create mode 100644 tests/tcg/hexagon/float_madds.ref

-- 
2.7.4