yanyanyanggg opened a new issue, #18563:
URL: https://github.com/apache/tvm/issues/18563

   ### Issue: [RISC-V RVV] Performance Issue: bias_add operator slower with vectorization
   
   #### Description
   The bias_add operator shows significant performance degradation when using the RISC‑V Vector (RVV) extension: with an acceleration ratio (RV time / RVV time) of 0.360, the RVV build is nearly 3× slower than the scalar build. This is unexpected for a channel‑wise addition, an operation that should benefit from vectorization.
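
   For reference, `nn.bias_add` on an NCHW tensor is just a broadcast add along the channel axis; a minimal NumPy sketch of the expected semantics (for clarity only, not the TVM implementation):
   ```python
   import numpy as np

   # Shapes match the reproduction config below.
   data = np.random.rand(14, 23, 67, 99).astype("float32")
   bias = np.random.rand(23).astype("float32")

   # relay.nn.bias_add with the default axis=1 broadcasts the bias
   # across N, H, and W: out[n, c, h, w] = data[n, c, h, w] + bias[c]
   out = data + bias.reshape(1, 23, 1, 1)
   ```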
   
   #### Steps to Reproduce
   1. Generate the bias_add operator with the following configuration:
   ```python
    params = {
        "op_name": "bias_add",   # consumed by export_bias_add below
        "dtype": "float32",
        "batch": 14,
        "channels": 23,
        "input_height": 67,
        "input_width": 99
    }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets.
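
   A minimal sketch of how step 3 can be done, assuming the build goes through `relay.build` and timing uses the graph executor's `time_evaluator`; `mod` is a Relay module built from the operator definition below, and this helper is illustrative, not the reporter's actual harness (on-board runs would typically go through TVM's RPC infrastructure):
   ```python
   import tvm
   from tvm import relay
   from tvm.contrib import graph_executor

   RV_TARGET = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
                "-mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c")
   RVV_TARGET = RV_TARGET + ",+v"

   def measure_ms(mod, target, inputs):
       # Build the Relay module for the given target and time one full run.
       with tvm.transform.PassContext(opt_level=3):
           lib = relay.build(mod, target=target)
       dev = tvm.cpu()  # on the K1-X board itself, or a remote device via tvm.rpc
       m = graph_executor.GraphModule(lib["default"](dev))
       for name, value in inputs.items():
           m.set_input(name, value)
       timer = m.module.time_evaluator("run", dev, number=10, repeat=3)
       return timer().mean * 1e3  # milliseconds
   ```
   With `number=10, repeat=3`, `time_evaluator` reports a steady-state mean and amortizes timer overhead.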
   
   Operator definition code:
   ```python
    from tvm import relay  # module-level import assumed by this snippet

    def export_bias_add(params, set_dir=None, platform="rv"):
        # NCHW input tensor plus a 1D per-channel bias.
        data = relay.var("data",
                         shape=(params["batch"], params["channels"],
                                params["input_height"], params["input_width"]),
                         dtype=params["dtype"])
        bias = relay.var("bias", shape=(params["channels"],),
                         dtype=params["dtype"])
        bias_add = relay.nn.bias_add(data, bias)
        export_op(bias_add, params["op_name"], [data, bias], params,
                  set_dir=set_dir)
   ```
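
   `export_op` is the reporter's local helper and is not shown in the issue. A plausible minimal version, assuming it only wraps the expression in a function, builds it, and exports a shared library (the `params["target"]` key and the file layout are illustrative assumptions):
   ```python
   import tvm
   from tvm import relay

   def export_op(expr, op_name, inputs, params, set_dir=None):
       # Wrap the Relay expression into a module and build it for the target.
       func = relay.Function(inputs, expr)
       mod = tvm.IRModule.from_expr(func)
       with tvm.transform.PassContext(opt_level=3):
           lib = relay.build(mod, target=params["target"])  # assumed key
       # Export a loadable shared library; cross-compiling for riscv64
       # usually also needs fcompile pointing at a riscv64 toolchain.
       if set_dir is not None:
           lib.export_library(f"{set_dir}/{op_name}.so")
   ```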
   
   #### Performance Data
   - **RV execution time**: 7.683920 ms
   - **RVV execution time**: 21.363800 ms
   - **Acceleration ratio (RV/RVV)**: 0.360 (RVV is ~2.8× slower)
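
   A quick back-of-envelope conversion of these times into effective memory traffic (input read plus output write, ignoring the small bias vector and cache effects) suggests the kernel should be memory-bound, which makes the RVV slowdown even more surprising:
   ```python
   elems = 14 * 23 * 67 * 99           # 2,135,826 float32 elements
   bytes_moved = 2 * elems * 4         # read input once, write output once
   print(bytes_moved / 7.683920e-3 / 1e9)   # RV:  ~2.2 GB/s
   print(bytes_moved / 21.363800e-3 / 1e9)  # RVV: ~0.8 GB/s
   ```
   Both figures are well below typical DRAM bandwidth, so the gap points at the generated code rather than a bandwidth limit.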
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Channel‑wise bias addition on a tensor of shape (14, 23, 67, 99)
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar RV baseline for broadcast addition operations like bias_add.
   
   #### Additional Context
   - The bias_add operation adds a 1D bias vector to each channel of a 4D tensor (14 × 23 × 67 × 99 ≈ 2.1M elements total).
   - The regression is severe and mirrors regressions observed for other operators (sum, log, relu, etc.).
   - This suggests that the current RVV vectorization of broadcast operations may be suboptimal, possibly due to inefficient memory access patterns or poor instruction selection; one way to check this is sketched below.
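
   To check the instruction-selection hypothesis, one option is to dump the generated assembly for both targets and compare the inner loops (a sketch; `lib` is the result of `relay.build` for the target in question):
   ```python
   # After lib = relay.build(mod, target=...):
   asm = lib.get_lib().get_source("asm")
   with open("bias_add.s", "w") as f:
       f.write(asm)
   # Diffing the scalar and +v builds shows whether the RVV build emits
   # vsetvli/vle32.v/vse32.v in the hot loop or degrades to strided
   # scalar accesses.
   ```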


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

