yanyanyanggg opened a new issue, #18567:
URL: https://github.com/apache/tvm/issues/18567

   ### Issue: [RISC-V RVV] avg_pool2d operator shows performance degradation
   
   #### Description
   The average pooling operator (`avg_pool2d`) runs slower when compiled with 
the RISC‑V Vector (RVV) extension than without it, reaching only 0.621× the 
performance of the scalar build. This suggests suboptimal vectorization for 2D 
pooling operations.
   
   #### Steps to Reproduce
   1. Generate the avg_pool2d operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "pool_channels": 23,
       "pool_size": 2,
       "stride": 4,
       "padding": 1,
       "input_height": 99,
       "input_width": 95
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
        llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets.
   
   Operator definition code:
   ```python
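   from tvm import relay

   # Note: `export_op` is not defined in this issue, and `params` must also
   # contain an "op_name" entry, which the configuration above does not list.
   def export_avg_pool2d(params, set_dir=None, platform="rv"):
       # NCHW input placeholder built from the benchmark configuration.
       data = relay.var("data",
                        shape=(params["batch"], params["pool_channels"],
                               params["input_height"], params["input_width"]),
                        dtype=params["dtype"])
       # Square pooling window, equal strides and padding in both dimensions.
       pool = relay.nn.avg_pool2d(
           data,
           pool_size=(params["pool_size"], params["pool_size"]),
           strides=(params["stride"], params["stride"]),
           padding=(params["padding"], params["padding"])
       )
       export_op(pool, params["op_name"], [data], params, set_dir=set_dir)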
   ```
   
   #### Performance Data
   - **RV execution time**: 8.779250 ms
   - **RVV execution time**: 14.134500 ms
   - **Speedup (RV time / RVV time)**: 0.621 (RVV is ~1.6× slower)
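
The ratio above follows directly from the two measured times. A minimal check, with variable names chosen here purely for illustration:

```python
rv_ms = 8.779250    # scalar (RV) execution time from this report
rvv_ms = 14.134500  # vector (RVV) execution time from this report

ratio = rv_ms / rvv_ms      # values below 1 mean the RVV build is slower
slowdown = rvv_ms / rv_ms   # how many times longer the RVV build takes

print(f"ratio (RV/RVV): {ratio:.3f}")
print(f"RVV slowdown:   {slowdown:.2f}x")
```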
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: 2×2 average pooling with stride 4 on input shape (14, 23, 
99, 95)
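
For reference, the output shape implied by this configuration can be worked out with the usual floor-mode pooling formula. The sketch below assumes `ceil_mode=False` and the same padding on both sides of each spatial dimension, neither of which is stated explicitly in this report; `pool_out_dim` is a hypothetical helper name.

```python
def pool_out_dim(size, pool, stride, pad):
    """Output length of one spatial dimension (floor mode, symmetric padding)."""
    return (size + 2 * pad - pool) // stride + 1

batch, channels = 14, 23
out_h = pool_out_dim(99, pool=2, stride=4, pad=1)
out_w = pool_out_dim(95, pool=2, stride=4, pad=1)

out_shape = (batch, channels, out_h, out_w)
num_outputs = batch * channels * out_h * out_w

print(out_shape)
print(num_outputs)
```

Under these assumptions each output element averages at most four inputs, so the reduction per output is very short.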
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar 
RV baseline for 2D pooling operations like avg_pool2d.
   
   #### Additional Context
   - The operation performs 2×2 average pooling with stride 4 and padding 1 on 
a 4D tensor.
   - The performance regression indicates that the vectorized implementation of 
2D pooling may have inefficient memory access patterns or suboptimal use of 
vector instructions for reduction within pooling windows.
   - This is part of a broader pattern where multiple operators show 
performance degradation with RVV, suggesting potential issues with 
vectorization strategies for 2D operations.
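
To make the access pattern concrete, here is a naive NumPy reference for the same operation. It is only a sketch for reasoning about which input elements each output reads, not the code TVM generates. It assumes NCHW layout, floor-mode output sizing, and that padded zeros are excluded from the divisor (believed to match Relay's default `count_include_pad=False`, but treat that as an assumption); `avg_pool2d_ref` is a hypothetical name.

```python
import numpy as np

def avg_pool2d_ref(x, pool=2, stride=4, pad=1):
    """Naive NCHW average pooling; padding is excluded from the divisor."""
    n, c, h, w = x.shape
    out_h = (h + 2 * pad - pool) // stride + 1
    out_w = (w + 2 * pad - pool) // stride + 1
    # Zero-pad only the two spatial dimensions.
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    # 1 where the padded image holds real data, 0 in the padding border.
    valid = np.pad(np.ones((h, w), dtype=x.dtype), pad)
    out = np.empty((n, c, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, s = i * stride, j * stride  # window origin in the padded input
            window = xp[:, :, r:r + pool, s:s + pool]
            count = valid[r:r + pool, s:s + pool].sum()
            out[:, :, i, j] = window.sum(axis=(2, 3)) / count
    return out

x = np.ones((14, 23, 99, 95), dtype=np.float32)
y = avg_pool2d_ref(x)
print(y.shape)
```

With a 2×2 window and stride 4, neighbouring outputs in a row start 4 input elements (16 bytes for float32) apart and use only 2 of each 4. Vectorizing across the output width therefore needs strided or gathered loads rather than unit-stride ones. That may be related to the slowdown, though this report does not establish the cause.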


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

