yanyanyanggg opened a new issue, #18560:
URL: https://github.com/apache/tvm/issues/18560

   ### Issue: [RISC-V RVV] Performance Regression: sum operator slower on RVV than RV
   
   #### Description
   The sum operator shows significant performance degradation when using the 
RISC‑V Vector (RVV) extension compared to the scalar RV baseline. The 
acceleration ratio is 0.325, meaning the RVV version is about 3× slower. This 
is unexpected because vector extensions should improve performance, especially 
for reduction operations like sum.
   
   #### Steps to Reproduce
   1. Generate the sum operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99,
       "axis": 1,
       "keepdims": True
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
         llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
         llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets; a sketch of one possible build-and-timing flow is shown below.
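
   Our measurement harness is not included here; the following is a minimal sketch of an equivalent flow using the standard `relay.build` and `graph_executor` `time_evaluator` APIs (the `RV_TARGET`/`RVV_TARGET` names and the helper are illustrative):

   ```python
   import numpy as np
   import tvm
   from tvm import relay
   from tvm.contrib import graph_executor

   RV_TARGET = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
                "-mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c")
   RVV_TARGET = RV_TARGET + ",+v"  # identical flags plus the vector extension

   def build_and_time(mod, weights, target_str, shape, dtype="float32"):
       """Compile a Relay module for target_str and return mean runtime in ms."""
       with tvm.transform.PassContext(opt_level=3):
           lib = relay.build(mod, target=tvm.target.Target(target_str), params=weights)
       dev = tvm.cpu(0)  # we run natively on the board; cross-compiled builds need RPC
       m = graph_executor.GraphModule(lib["default"](dev))
       m.set_input("data", np.random.uniform(size=shape).astype(dtype))
       timer = m.module.time_evaluator("run", dev, number=10, repeat=50)
       return timer().mean * 1e3
   ```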
   
   Operator definition code:
   ```python
   from tvm import relay

   def export_sum(params, set_dir=None, platform="rv"):
       data = relay.var("data",
                        shape=(params["batch"], params["channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       sum_op = relay.sum(data, axis=params["axis"], keepdims=params["keepdims"])
       # export_op is a helper from our test harness (not shown here)
       export_op(sum_op, params["op_name"], [data], params, set_dir=set_dir)
   ```
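
   For completeness, a hypothetical invocation under the configuration from step 1 (`export_op` reads an `op_name` key that the configuration above does not define, so one is added here; the output path is illustrative):

   ```python
   params = {
       "op_name": "sum",  # assumed: required by export_op, absent from step 1
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99,
       "axis": 1,
       "keepdims": True,
   }
   export_sum(params, set_dir="./exported_ops", platform="rv")  # path is illustrative
   ```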
   
   #### Performance Data
   - **RV execution time**: 9.301150 ms
   - **RVV execution time**: 28.622800 ms
   - **Acceleration ratio (RV/RVV)**: 9.301150 ms / 28.622800 ms ≈ 0.325 (RVV is ~3× slower)
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
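
   The missing LLVM version can also be read back from the TVM build itself; a small sketch (the keys reported by `libinfo()` depend on the build configuration):

   ```python
   import tvm

   # libinfo() reports compile-time settings; LLVM_VERSION is present in
   # typical LLVM-enabled builds, but the exact keys vary per build.
   info = tvm.support.libinfo()
   print("TVM :", tvm.__version__)
   print("LLVM:", info.get("LLVM_VERSION", "<not recorded in this build>"))
   ```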
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar 
RV baseline for reduction operations like sum.
   
   #### Additional Context
   - The sum operation reduces along axis=1 on a tensor of shape (14, 23, 67, 99), i.e. 14 × 23 × 67 × 99 ≈ 2.1M input elements.
   - The regression suggests suboptimal code generation for reduction loops on the RVV path.
   - Other operators (log, relu, bias_add, sqrt, etc.) show similar regressions, indicating a broader RVV code-generation or optimization issue.
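
   One quick check is to dump the generated assembly from the compiled module and look for RVV instructions; a rough sketch (`lib` is the object returned by `relay.build`, and the marker list is only a heuristic for a vectorized float reduction):

   ```python
   # Dump target assembly and grep for instructions a vectorized float
   # reduction would be expected to use (vfredusum/vfredosum are the RVV
   # unordered/ordered float-add reductions).
   asm = lib.get_lib().get_source("asm")
   markers = ("vsetvli", "vle32", "vfredusum", "vfredosum")
   hits = [line for line in asm.splitlines() if any(m in line for m in markers)]
   print(f"{len(hits)} RVV-looking instructions in the generated assembly")
   ```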

