yanyanyanggg opened a new issue, #18568:
URL: https://github.com/apache/tvm/issues/18568

   ### Issue: [RISC-V RVV] sigmoid operator slower with vector extension
   
   #### Description
   The sigmoid activation function shows performance degradation with the 
RISC‑V Vector (RVV) extension, achieving only 0.703× the performance of the 
scalar implementation. This is unexpected for a common activation function that 
should benefit from vectorization.
   
   #### Steps to Reproduce
   1. Generate the sigmoid operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets.
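
The two target strings in step 2 differ only in the trailing `+v` attribute. A minimal sketch that builds both from one attribute list, so the comparison cannot drift; `make_target` and `BASE_ATTRS` are hypothetical names introduced here, not part of TVM or of the reporter's scripts:

```python
# Attributes shared by both targets, copied from step 2.
BASE_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def make_target(vector: bool) -> str:
    """Return the LLVM target string, adding +v only for the RVV build."""
    attrs = BASE_ATTRS + (["+v"] if vector else [])
    return (
        "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
        "-mabi=lp64d -mattr=" + ",".join(attrs)
    )


rv_target = make_target(vector=False)
rvv_target = make_target(vector=True)

print(rv_target)
print(rvv_target)
```

Everything else (triple, CPU model, ABI) is identical, so any timing difference should come from how the code generator uses the vector extension.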
   
   Operator definition code:
   ```python
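   from tvm import relay

   # export_op is the reporter's own helper; its definition is not shown here.
   # Note: params["op_name"] is read below but is absent from the
   # configuration listed in step 1.
   def export_sigmoid(params, set_dir=None, platform="rv"):
       # NCHW input tensor described by the params dict
       data = relay.var("data",
                        shape=(params["batch"], params["channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       sigmoid = relay.sigmoid(data)
       export_op(sigmoid, params["op_name"], [data], params, set_dir=set_dir)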
   ```
   
   #### Performance Data
   - **RV execution time**: 19.811500 ms
   - **RVV execution time**: 28.199300 ms
   - **Acceleration ratio (RV/RVV)**: 0.703 (the RVV build takes ~1.42× as long as the scalar build)
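
For reference, the ratio can be reproduced from the two timings. This is plain arithmetic on the numbers reported above; the variable names are only illustrative:

```python
# Timings copied from the Performance Data list (milliseconds).
rv_ms = 19.811500
rvv_ms = 28.199300

speedup = rv_ms / rvv_ms    # a value above 1 would mean RVV is faster
slowdown = rvv_ms / rv_ms   # how many times longer the RVV build takes

print(f"speedup (RV/RVV): {speedup:.3f}")
print(f"RVV takes {slowdown:.2f}x as long as scalar")
```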
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Elementwise sigmoid activation on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
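
As a rough sanity check on scale, the element count follows from the shape in step 1, and dividing the reported times by it gives a per-element cost. The sketch below assumes the run was single-threaded and that the core ran at the nominal 1.6 GHz listed above; neither is confirmed in this report, so the cycle figures are only indicative.

```python
from math import prod

# Shape from step 1: batch, channels, input_height, input_width.
shape = (14, 23, 67, 99)
n_elements = prod(shape)

CLOCK_HZ = 1.6e9  # nominal clock from the CPU line; an assumption for this estimate
timings_ms = {"rv": 19.811500, "rvv": 28.199300}


def per_element(ms: float) -> tuple[float, float]:
    """Return (nanoseconds, estimated cycles) spent per element."""
    ns = ms * 1e6 / n_elements
    cycles = ns * 1e-9 * CLOCK_HZ
    return ns, cycles


print(f"elements: {n_elements}")
for name, ms in timings_ms.items():
    ns, cycles = per_element(ms)
    print(f"{name}: {ns:.2f} ns/element, ~{cycles:.1f} cycles/element")
```

Under those assumptions this comes to roughly 9.3 ns per element for the scalar build and 13.2 ns for the RVV build, which is far more than a memory-bound elementwise pass would be expected to need and is consistent with the time being dominated by the exponential.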
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar 
RV baseline for activation functions like sigmoid.
   
   #### Additional Context
   - The sigmoid operation is applied elementwise to a tensor of ~2.1M elements (2,135,826 for the shape in step 1).
   - While the performance regression is less severe than for some other 
operators, it is still significant and indicates that the vectorized 
implementation of sigmoid may be using suboptimal instructions or inefficient 
vector length management.
   - This is part of a pattern where multiple mathematical and activation 
functions (log, sqrt, sigmoid, etc.) show performance degradation with RVV, 
suggesting a potential issue with vector intrinsic mapping for transcendental 
functions.
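
One way to test the intrinsic-mapping hypothesis is to inspect the generated assembly for each target and count vector instructions versus calls into libm. The sketch below works on assembly text only, assuming it has already been dumped by some means (for example via the built module's `get_source("asm")`, if that is available in your build). `summarize_asm` is a hypothetical helper, and `SAMPLE` is hand-written for illustration, not real TVM output.

```python
import re
from collections import Counter

# RVV mnemonics: vsetvli/vsetivli/vsetvl, or v<name>.<suffix> such as vle32.v, vfmul.vv.
VECTOR_OP = re.compile(r"^(vseti?vli?|v[a-z0-9]+\.[a-z0-9.]+)$")
# Scalar libm routines a sigmoid lowering might call.
LIBM_CALL = re.compile(r"\b(expf?|tanhf?)\b")
CALL_MNEMONICS = {"call", "tail", "jal", "jalr"}


def summarize_asm(asm_text: str) -> Counter:
    """Count vector instructions, libm calls, and all other instructions."""
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split("#", 1)[0].strip()  # drop trailing comments
        if not code or code.endswith(":") or code.startswith("."):
            continue  # skip blanks, labels, and assembler directives
        parts = code.split(None, 1)
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        if VECTOR_OP.match(mnemonic):
            counts["vector"] += 1
        elif mnemonic in CALL_MNEMONICS and LIBM_CALL.search(operands):
            counts["libm_call"] += 1
        else:
            counts["other"] += 1
    return counts


# Hand-written illustrative snippet, NOT output from TVM.
SAMPLE = """
.LBB0_1:
    flw     fa0, 0(s1)      # scalar load
    fneg.s  fa0, fa0
    call    expf@plt
    fadd.s  fa0, fa0, fs0
    fdiv.s  fa0, fs0, fa0
    fsw     fa0, 0(s2)
    vsetvli t0, a0, e32, m1, ta, ma
    vle32.v v8, (a1)
"""

counts = summarize_asm(SAMPLE)
print(dict(counts))
```

If the RVV build shows vector loads and stores wrapped around per-element `expf` calls, the slowdown would point to the exponential being scalarized rather than to vector length management.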


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
