yanyanyanggg opened a new issue, #18570:
URL: https://github.com/apache/tvm/issues/18570

   ### Issue: [RISC-V RVV] negative operator shows performance degradation
   
   #### Description
   The negative operator (elementwise negation) shows a performance regression with the RISC-V Vector (RVV) extension. It achieves only 0.854× the performance of the scalar implementation, which is unexpected for a simple arithmetic operation that should benefit from vectorization.
   
   #### Steps to Reproduce
   1. Generate the negative operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets.
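
Step 3 does not say how the timings were taken. Below is a minimal, stdlib-only sketch of one common approach: warm up first, then report the median of many repeated runs so that a single noisy run on the board does not decide the comparison. `measure_ms` and `run_once` are hypothetical names. Inside TVM, the runtime module's `time_evaluator` serves the same purpose.

```python
import statistics
import time
from typing import Callable


def measure_ms(run_once: Callable[[], None], warmup: int = 3, repeat: int = 20) -> float:
    """Median wall-clock time of run_once() in milliseconds.

    run_once is a hypothetical stand-in for one invocation of the compiled
    RV or RVV module on a fixed, preallocated input.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    for _ in range(warmup):  # untimed runs to warm caches and page in buffers
        run_once()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


if __name__ == "__main__":
    # Pure-Python stand-in workload; replace with the real module call.
    values = [float(i) for i in range(100_000)]
    print(f"{measure_ms(lambda: [-v for v in values]):.3f} ms")
```

Running both builds through the same harness, with the same input and the same repeat count, keeps the RV/RVV comparison like-for-like.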
   
   Operator definition code:
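   ```python
   from tvm import relay

   def export_negative(params, set_dir=None, platform="rv"):
       # NCHW float32 input built from the params dict above
       data = relay.var("data",
                        shape=(params["batch"], params["channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       neg_op = relay.negative(data)
       # export_op is the reporter's own build/export helper (not shown in the issue).
       # The params dict above has no "op_name" key, so fall back to "negative".
       # Note: `platform` is not used in this excerpt.
       export_op(neg_op, params.get("op_name", "negative"), [data], params,
                 set_dir=set_dir)
   ```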
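
Before comparing timings, it is worth confirming that the RV and RVV builds compute the same result. The sketch below assumes the outputs have already been copied back to NumPy arrays (for example via the runtime array's `.numpy()` accessor). `matches_reference` is a hypothetical helper, and the arrays below are stand-ins, not real build outputs. Negating a float32 value involves no rounding, so an exact comparison is appropriate rather than a tolerance-based one.

```python
import numpy as np


def matches_reference(inp: np.ndarray, out: np.ndarray) -> bool:
    """Exact check that out == -inp elementwise.

    Negation involves no rounding, so no tolerance is needed. np.array_equal
    treats +0.0 and -0.0 as equal, which also accepts a backend that lowers
    negation as 0 - x instead of a sign flip.
    """
    return out.shape == inp.shape and out.dtype == inp.dtype and bool(
        np.array_equal(out, np.negative(inp))
    )


# Same shape as the issue's params: (batch, channels, input_height, input_width)
rng = np.random.default_rng(0)
x = rng.standard_normal((14, 23, 67, 99), dtype=np.float32)
y_rv = np.negative(x)   # stand-in for the scalar (RV) build's output
y_rvv = -x              # stand-in for the vector (RVV) build's output
print(matches_reference(x, y_rv), matches_reference(x, y_rvv))  # True True
```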
   
   #### Performance Data
   - **RV execution time**: 7.581020 ms
   - **RVV execution time**: 8.875480 ms
   - **Acceleration ratio (RV/RVV)**: 0.854 (RVV is ~1.17× slower)
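
The ratio and slowdown follow directly from the two timings. The sketch below reproduces them and adds a back-of-envelope effective memory throughput. It assumes each float32 element is read once and written once (8 bytes of traffic per element) and ignores cache reuse, so the throughput figure is an estimate, not a measurement from the issue.

```python
import math

shape = (14, 23, 67, 99)        # (batch, channels, input_height, input_width)
numel = math.prod(shape)        # number of float32 elements
bytes_moved = numel * (4 + 4)   # assumption: one 4-byte read + one 4-byte write each

rv_ms, rvv_ms = 7.581020, 8.875480  # measured times reported in the issue

ratio = rv_ms / rvv_ms          # RV/RVV; below 1 means the RVV build is slower
slowdown = rvv_ms / rv_ms


def gb_per_s(ms: float) -> float:
    """Effective throughput in GB/s (10**9 bytes per second)."""
    return bytes_moved / (ms * 1e-3) / 1e9


print(f"elements: {numel}")              # elements: 2135826
print(f"RV/RVV ratio: {ratio:.3f}")      # RV/RVV ratio: 0.854
print(f"RVV slowdown: {slowdown:.3f}x")  # RVV slowdown: 1.171x
print(f"RV ~{gb_per_s(rv_ms):.2f} GB/s, RVV ~{gb_per_s(rvv_ms):.2f} GB/s")
# RV ~2.25 GB/s, RVV ~1.93 GB/s
```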
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1-X bit-brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Elementwise negation on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar RV baseline for simple arithmetic operations like negation.
   
   #### Additional Context
   - The negative operation is applied elementwise to a tensor of ~2.1M elements.
   - The regression is less severe than for other operators, but it is still unexpected and indicates that even simple arithmetic operations are not being vectorized efficiently.
   - This issue is part of a broader pattern in which every tested operator is slower with RVV, suggesting a potential systemic issue in TVM's RVV code generation or optimization.
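
One way to probe the suspected code-generation issue is to check whether the RVV build's assembly contains vector instructions for the negation loop at all. The sketch below assumes the assembly text has been dumped separately (for example with `llvm-objdump -d` on the exported library). `count_rvv_ops` and the mnemonic set are illustrative choices, not an exhaustive RVV inventory, and the sample listing is hypothetical. `vfneg.v` is an assembler alias for `vfsgnjn.vv`, so a disassembler may print either.

```python
from collections import Counter

# Illustrative subset of RVV mnemonics expected in a vectorized float32 negation loop.
RVV_MNEMONICS = {
    "vsetvli", "vsetivli",     # configure vector length / element width
    "vle32.v", "vse32.v",      # unit-stride 32-bit vector load / store
    "vfneg.v", "vfsgnjn.vv",   # negation (vfneg.v is an alias of vfsgnjn.vv)
}


def count_rvv_ops(asm_text: str) -> Counter:
    """Count occurrences of the RVV mnemonics above in an assembly listing."""
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split("#", 1)[0]  # drop assembler comments
        for token in code.replace(",", " ").split():
            if token in RVV_MNEMONICS:
                counts[token] += 1
    return counts


sample = """
.LBB0_2:
    vsetvli a4, a3, e32, m8, ta, ma   # hypothetical snippet
    vle32.v v8, (a1)
    vfneg.v v8, v8
    vse32.v v8, (a0)
"""
print(dict(count_rvv_ops(sample)))
# {'vsetvli': 1, 'vle32.v': 1, 'vfneg.v': 1, 'vse32.v': 1}
```

A count of zero for the RVV build would suggest the loop was left scalar. Non-zero counts only show that vector code exists, not that it is efficient.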

