yanyanyanggg opened a new issue, #18562:
URL: https://github.com/apache/tvm/issues/18562

   ### Issue: [RISC-V RVV] Performance Degradation: ReLU activation slower with vector extension
   
   #### Description
   The ReLU (rectified linear unit) operator shows significant performance degradation with the RISC-V Vector (RVV) extension. The acceleration ratio (scalar RV time / RVV time) is 0.337, meaning the RVV build is roughly 3× slower than the scalar implementation. This is unexpected for a simple elementwise operation that should benefit greatly from vectorization.
   
   #### Steps to Reproduce
   1. Generate the ReLU operator with the following configuration:
   ```python
   params = {
       "op_name": "relu",  # referenced by export_relu below; exact value assumed
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99
   }
   ```
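   This corresponds to a 14 × 23 × 67 × 99 float32 tensor, i.e. 2,135,826 elements (≈ 8.5 MB).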
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets (a minimal benchmarking sketch follows the operator definition below).
   
   Operator definition code:
   ```python
   from tvm import relay

   # `export_op` is the reporter's export helper (not shown in the issue).
   def export_relu(params, set_dir=None, platform="rv"):
       data = relay.var("data",
                        shape=(params["batch"], params["channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       relu = relay.nn.relu(data)
       export_op(relu, params["op_name"], [data], params, set_dir=set_dir)
   ```
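
   For completeness, here is a minimal sketch of how the two builds could be compared end to end. The original harness does this through `export_op`, so the `relay.build` / `export_library` / `benchmark` calls below are assumptions about the measurement setup (standard TVM usage), not the reporter's exact code:
   ```python
   import tvm
   from tvm import relay

   base = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
           "-mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c")
   targets = {"rv": base, "rvv": base + ",+v"}

   shape = (14, 23, 67, 99)
   data = relay.var("data", shape=shape, dtype="float32")
   mod = tvm.IRModule.from_expr(relay.Function([data], relay.nn.relu(data)))

   # Cross-compile one shared library per target (cross-compiler name assumed).
   for name, target in targets.items():
       lib = relay.build(mod, target=target)
       lib.export_library(f"relu_{name}.so", cc="riscv64-linux-gnu-gcc")

   # Then, on the board, load each library and time it:
   #   import numpy as np
   #   from tvm.contrib import graph_executor
   #   dev = tvm.cpu()
   #   m = graph_executor.GraphModule(
   #       tvm.runtime.load_module("relu_rvv.so")["default"](dev))
   #   m.set_input("data", np.random.rand(*shape).astype("float32"))
   #   print(m.benchmark(dev, number=10, repeat=3))
   ```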
   
   #### Performance Data
   - **RV execution time**: 7.945310 ms
   - **RVV execution time**: 23.579300 ms
   - **Acceleration ratio (RV/RVV)**: 7.945 / 23.579 ≈ 0.337 (RVV is ~3× slower)
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Elementwise ReLU on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar 
RV baseline for simple elementwise operations like ReLU.
   
   #### Additional Context
   - The ReLU operation is applied elementwise to a tensor of ~2.1M elements.
   - The severe performance regression (3× slower) is particularly surprising 
for such a simple operation that should be a perfect candidate for 
vectorization.
   - This issue is part of a broader pattern where multiple operators (sum, 
log, relu, bias_add, sqrt, etc.) show significant performance degradation with 
RVV, suggesting a potential systemic issue in TVM's RVV code generation or 
optimization.
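
   To help narrow this down, here is a hedged sketch of one way to check whether the RVV build actually emits vector instructions; the target string and API calls are assumptions based on standard TVM usage, not part of the original report:
   ```python
   import tvm
   from tvm import relay

   rvv_target = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
                 "-mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v")

   data = relay.var("data", shape=(14, 23, 67, 99), dtype="float32")
   mod = tvm.IRModule.from_expr(relay.Function([data], relay.nn.relu(data)))
   lib = relay.build(mod, target=rvv_target)

   # Dump the generated assembly and look for RVV instructions such as
   # vsetvli / vle32.v. Their absence would point at codegen never
   # vectorizing; their presence would instead point at the schedule
   # (e.g. strip-mining overhead or poor VL/LMUL choices).
   asm = lib.get_lib().get_source("asm")
   print("vsetvli" in asm, "vle32.v" in asm)
   ```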

