yanyanyanggg opened a new issue, #18565:
URL: https://github.com/apache/tvm/issues/18565

   ### Issue: [RISC-V RVV] floor operator performance regression
   
   #### Description
   The floor operator runs slower with the RISC‑V Vector (RVV) extension
   enabled than without it: the RVV build reaches only 0.521× the performance
   of the scalar build. This suggests that floor is vectorized inefficiently.
   
   #### Steps to Reproduce
   1. Generate the floor operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "channels": 23,
       "input_height": 67,
       "input_width": 99
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets.
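
The two target strings in step 2 differ only in the `+v` attribute. As a minimal sketch, both can be built from one shared base so that the scalar and vector configurations cannot drift apart. The helper name `make_target` is hypothetical and is not part of TVM; it only assembles strings.

```python
# Shared pieces of the LLVM target string, copied from step 2.
BASE = "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d"
SCALAR_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def make_target(vector: bool) -> str:
    """Return the RV (scalar) or RVV (vector) target string."""
    # Hypothetical helper for illustration; only the "+v" attribute differs.
    attrs = SCALAR_ATTRS + (["+v"] if vector else [])
    return f"{BASE} -mattr={','.join(attrs)}"


rv_target = make_target(vector=False)
rvv_target = make_target(vector=True)
print(rv_target)
print(rvv_target)
```

Either string can then be handed to the export step in place of a hand-typed target, which rules out a typo in `-mattr` as the cause of the difference.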
   
   Operator definition code:
   ```python
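   from tvm import relay

   def export_floor(params, set_dir=None, platform="rv"):
       # Input tensor shaped (batch, channels, height, width) from the parameters.
       data = relay.var("data",
                        shape=(params["batch"], params["channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       floor_op = relay.floor(data)
       # export_op is the reporter's helper and is not shown in this issue.
       # "op_name" is absent from the configuration above, so fall back to "floor".
       export_op(floor_op, params.get("op_name", "floor"), [data], params, set_dir=set_dir)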
   ```
   
   #### Performance Data
   - **RV execution time**: 8.891440 ms
   - **RVV execution time**: 17.061200 ms
   - **Speedup (RV/RVV)**: 0.521 (RVV is ~1.9× slower)
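
The ratio follows directly from the two measured times. A short sketch of the arithmetic, using only the numbers reported above (the variable names are illustrative):

```python
rv_ms = 8.891440    # scalar (RV) execution time from the report
rvv_ms = 17.061200  # vector (RVV) execution time from the report

speedup = rv_ms / rvv_ms   # a value above 1 would mean RVV is faster
slowdown = rvv_ms / rv_ms  # how many times slower the RVV build is

print(f"speedup  (RV/RVV): {speedup:.3f}")
print(f"slowdown (RVV/RV): {slowdown:.2f}x")
```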
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Elementwise floor on ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826)
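
The element count can be checked from the shape in the reproduction parameters. The sketch below also estimates the minimum memory traffic, assuming one float32 read and one float32 write per element; that assumption is a lower bound for an elementwise operator and not a measurement.

```python
from math import prod

shape = (14, 23, 67, 99)  # batch, channels, input_height, input_width
num_elements = prod(shape)
bytes_per_element = 4  # float32

# Assumed lower bound: one read and one write per element.
traffic_bytes = 2 * num_elements * bytes_per_element

print(f"elements: {num_elements:,}")
print(f"minimum traffic: {traffic_bytes / 1e6:.2f} MB")
```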
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar
   RV baseline for elementwise operations like floor.
   
   #### Additional Context
   - The floor operation is applied elementwise to a tensor of ~2.1M elements
     (2,135,826 for the shape above).
   - The regression is less severe than for other operators (sum, log, etc.),
     but it is still a significant slowdown for a simple elementwise operation.
   - This suggests that the code generated for floor on RVV may be using
     suboptimal vector instructions or inefficient vector length management.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

