yanyanyanggg opened a new issue, #18564:
URL: https://github.com/apache/tvm/issues/18564

   ### Issue: [RISC-V RVV] sqrt operator shows poor vectorization performance
   
   #### Description
   The sqrt (square root) operator performs poorly with the RISC‑V Vector (RVV) extension: the RVV build reaches only 0.385× the speed of the scalar build, i.e. it takes about 2.6× as long. This is unexpected for an elementwise mathematical function that should benefit significantly from vectorization.
   
   #### Steps to Reproduce
   1. Generate the sqrt operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets.
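
The two target strings in step 2 differ only in the `+v` attribute. A minimal sketch of deriving both from one shared base, so that no other flag can drift between the two builds, might look like the following. The names `BASE_TARGET`, `SCALAR_ATTRS`, and `make_target` are hypothetical and are not part of the reporter's scripts.

```python
# Hypothetical helper: build the RV and RVV target strings from one base,
# so the only difference between the two builds is the "+v" attribute.
BASE_TARGET = "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d"
SCALAR_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def make_target(vector: bool) -> str:
    """Return the target string, appending "+v" when vector is True."""
    attrs = SCALAR_ATTRS + (["+v"] if vector else [])
    return f"{BASE_TARGET} -mattr={','.join(attrs)}"


rv_target = make_target(vector=False)   # scalar baseline
rvv_target = make_target(vector=True)   # same flags plus the vector extension
```
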
   
   Operator definition code:
   ```python
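   from tvm import relay

   # `export_op` is the reporter's own helper and is not shown in this issue.
   # Note: `params["op_name"]` is read below but is not set in the params dict
   # above, and `platform` is unused in this snippet.
   def export_sqrt(params, set_dir=None, platform="rv"):
       # Input tensor laid out as (batch, channels, height, width).
       data = relay.var("data",
                        shape=(params["batch"], params["channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       sqrt_op = relay.sqrt(data)  # elementwise square root
       export_op(sqrt_op, params["op_name"], [data], params, set_dir=set_dir)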
   ```
   
   #### Performance Data
   - **RV execution time**: 11.502000 ms
   - **RVV execution time**: 29.906500 ms
   - **Acceleration ratio (RV time / RVV time)**: 0.385 (RVV takes ~2.6× as long)
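
The ratio above can be re-derived from the two reported timings, and dividing by the element count gives a rough per-element cost for each build. This is only a back-of-the-envelope sketch: the timings are copied from the report, the element count is computed from the shape in the reproduction steps, and the variable names are illustrative.

```python
# Timings copied from the report (milliseconds).
rv_ms = 11.502
rvv_ms = 29.9065

# Tensor shape from the reproduction config: batch, channels, height, width.
shape = (14, 23, 67, 99)

elements = 1
for dim in shape:
    elements *= dim

speedup = rv_ms / rvv_ms    # < 1 means the RVV build is slower
slowdown = rvv_ms / rv_ms   # how many times longer the RVV build takes

# Rough per-element cost in nanoseconds (1 ms = 1e6 ns).
ns_per_elem_rv = rv_ms * 1e6 / elements
ns_per_elem_rvv = rvv_ms * 1e6 / elements

print(f"elements={elements}, speedup={speedup:.3f}, slowdown={slowdown:.2f}x")
print(f"ns/element: RV={ns_per_elem_rv:.2f}, RVV={ns_per_elem_rvv:.2f}")
```
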
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Elementwise square root on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar RV baseline for mathematical functions like square root.
   
   #### Additional Context
   - The sqrt operation is applied elementwise to a tensor of ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826).
   - The slowdown (RVV takes ~2.6× as long) suggests that the vectorized implementation of sqrt may be using suboptimal instructions or inefficient vector length management.
   - This is part of a broader pattern where multiple mathematical operators (log, sqrt, etc.) show severe performance degradation with RVV, indicating a potential issue with vector intrinsic mapping or loop vectorization for elementwise math functions.
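
One way to test the "suboptimal instructions" hypothesis is to count a few telling mnemonics in the generated assembly of each build: `vfsqrt.v` (vector square root), `fsqrt.s` (scalar single-precision square root), `vsetvli`/`vsetivli` (vector configuration), and any call into a `sqrt*` library routine. The sketch below assumes the assembly text has already been obtained by some means, for example with `objdump -d` on the exported library. The function name `count_mnemonics` and the sample snippet are hypothetical; the sample is hand-written for illustration and is not compiler output.

```python
from collections import Counter

# Mnemonics of interest: vector sqrt, scalar float32 sqrt, vector config.
MNEMONICS = ("vfsqrt.v", "fsqrt.s", "vsetvli", "vsetivli")


def count_mnemonics(asm_text: str) -> Counter:
    """Count selected RISC-V mnemonics and sqrt library calls in assembly text."""
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split("#", 1)[0].strip()  # drop trailing comments
        # Skip blank lines, labels ("name:") and assembler directives (".xyz").
        if not code or code.endswith(":") or code.startswith("."):
            continue
        parts = code.split()
        mnemonic = parts[0]
        if mnemonic in MNEMONICS:
            counts[mnemonic] += 1
        elif mnemonic in ("call", "tail") and len(parts) > 1 and "sqrt" in parts[1]:
            counts["call:" + parts[1]] += 1  # e.g. a scalar libm fallback
    return counts


# Hand-written illustrative loop body, not real compiler output.
sample = """
.LBB0_1:                      # hypothetical loop label
    vsetvli a3, a2, e32, m1, ta, ma
    vle32.v v8, (a0)
    vfsqrt.v v8, v8
    vse32.v v8, (a1)
"""
print(count_mnemonics(sample))
```

If the RVV build shows no `vfsqrt.v` at all, or a `vsetvli` on every iteration alongside scalar `fsqrt.s` or library calls, that would point toward the lowering rather than the hardware, though the counts alone do not prove either.
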


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

