yanyanyanggg opened a new issue, #18566:
URL: https://github.com/apache/tvm/issues/18566
### Issue: [RISC-V RVV] round operator shows suboptimal vectorization
#### Description
The round operator performs worse with the RISC‑V Vector (RVV) extension,
achieving only 0.547× the performance of the scalar implementation. This
indicates inefficient vectorization for rounding operations.
#### Steps to Reproduce
1. Generate the round operator with the following configuration:
```python
params = {
    "dtype": "float32",
    "batch": 14,
    "channels": 23,
    "input_height": 67,
    "input_width": 99,
}
```
2. Export the operator to two targets:
- **RV target** (scalar, without vector extension):
```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d
-mattr=+64bit,+m,+a,+f,+d,+c
```
- **RVV target** (with vector extension):
```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d
-mattr=+64bit,+m,+a,+f,+d,+c,+v
```
3. Run performance measurement on both targets (a hedged measurement sketch follows the operator definition below).
Operator definition code:
```python
from tvm import relay  # import omitted in the original snippet

def export_round(params, set_dir=None, platform="rv"):
    # Note: params must also contain an "op_name" entry, and export_op is the
    # reporter's own export/benchmark helper (not shown here).
    data = relay.var("data",
                     shape=(params["batch"], params["channels"],
                            params["input_height"], params["input_width"]),
                     dtype=params["dtype"])
    round_op = relay.round(data)
    export_op(round_op, params["op_name"], [data], params, set_dir=set_dir)
```
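For completeness, a minimal end-to-end sketch of the reproduction, assuming the graph is built and timed directly on the RISC-V board with the graph executor (the reporter's actual `export_op` cross-compilation/deployment flow may differ; the target strings are the ones from step 2):
```python
# Hedged reproduction sketch: build relay.round for both target strings from
# step 2 and time it with the graph executor. Assumes direct execution on the
# RISC-V board (no cross-compile/RPC step).
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

RV_TARGET = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d "
             "-mattr=+64bit,+m,+a,+f,+d,+c")
RVV_TARGET = RV_TARGET + ",+v"

params = {"dtype": "float32", "batch": 14, "channels": 23,
          "input_height": 67, "input_width": 99}
shape = (params["batch"], params["channels"],
         params["input_height"], params["input_width"])

data = relay.var("data", shape=shape, dtype=params["dtype"])
mod = tvm.IRModule.from_expr(relay.Function([data], relay.round(data)))

for name, target in [("rv", RV_TARGET), ("rvv", RVV_TARGET)]:
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=tvm.target.Target(target))
    dev = tvm.cpu(0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input("data", np.random.rand(*shape).astype(params["dtype"]))
    timer = module.module.time_evaluator("run", dev, number=10, repeat=3)
    print(name, "mean ms:", timer().mean * 1e3)
```
Running this once for each target should reproduce the roughly 2× gap reported below.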
#### Performance Data
- **RV execution time**: 7.314920 ms
- **RVV execution time**: 13.376600 ms
- **Acceleration ratio (RV/RVV)**: 0.547 (RVV is ~1.8× slower)
#### Environment Information
- **TVM version**: 0.19.0
- **LLVM version**: [Please provide: `llvm-config --version`]
- **Hardware**: Spacemit K1‑X bit‑brick board
- **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
- **ISA**: rv64imafdcv (with vector extensions)
- **Memory**: 7.6 GB
- **OS**: Bianbu 2.2, Linux kernel 6.6.63
- **Operation**: Elementwise rounding on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
#### Expected Behavior
RVV vectorization should provide a performance improvement over the scalar
RV baseline for elementwise operations like round.
#### Additional Context
- The round operation is applied elementwise to a tensor of ~2.1M elements
  (14 × 23 × 67 × 99).
- The performance regression is significant and suggests that the vectorized
  implementation of round may be using inefficient instructions or a
  suboptimal vector length; dumping the generated code (see the sketch after
  this list) would help confirm this.
- This is part of a pattern where multiple elementwise operations (including
floor, round, etc.) show performance degradation with RVV, indicating a
potential systemic issue in the vectorization of these operations.
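One way to narrow this down (a hedged sketch, reusing `mod` and `RVV_TARGET` from the reproduction sketch above) is to dump the LLVM IR and assembly of the RVV build and check whether the round loop is actually vectorized or falls back to scalar calls:
```python
# Hedged sketch: inspect the code generated for the RVV target. Reuses `mod`
# and `RVV_TARGET` from the reproduction sketch above.
import tvm
from tvm import relay

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=tvm.target.Target(RVV_TARGET))
print(lib.get_lib().get_source("ll"))   # generated LLVM IR
print(lib.get_lib().get_source("asm"))  # generated RISC-V assembly
```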