yanyanyanggg opened a new issue, #18571:
URL: https://github.com/apache/tvm/issues/18571
### Issue: [RISC-V RVV] max_pool2d operator shows minor performance
regression
#### Description
The max_pool2d operator shows slight performance degradation with the RISC‑V
Vector (RVV) extension, achieving only 0.867× the performance of the scalar
implementation. While the regression is smaller than for other operators, it
still indicates suboptimal vectorization for 2D max pooling.
#### Steps to Reproduce
1. Generate the max_pool2d operator with the following configuration:
```python
params = {
    "dtype": "float32",
    "batch": 14,
    "pool_channels": 23,
    "pool_size": 2,
    "stride": 4,
    "padding": 1,
    "input_height": 99,
    "input_width": 95,
}
```
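With these parameters, the output spatial extent follows the standard pooling
size formula `floor((n + 2*pad - kernel) / stride) + 1`. A quick sanity check
(the helper name below is ours, for illustration only):

```python
def pooled_size(n, k, s, p):
    # floor((n + 2p - k) / s) + 1, the standard pooling output-size formula
    return (n + 2 * p - k) // s + 1

# input 99x95, kernel 2, stride 4, padding 1 -> 25x24 output per channel
print(pooled_size(99, 2, 4, 1), pooled_size(95, 2, 4, 1))  # 25 24
```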
2. Export the operator to two targets:
- **RV target** (scalar, without vector extension):
```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
```
- **RVV target** (with vector extension):
```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
```
3. Run performance measurement on both targets.
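On-device timing is normally done with TVM's `time_evaluator`; purely as a
self-contained illustration of the measurement methodology (median of
repeated wall-clock runs), a generic harness can look like the sketch below.
The `measure_ms` helper is our own, not a TVM API:

```python
import time
import statistics

def measure_ms(fn, repeat=10, number=3):
    """Median wall-clock time of one fn() call, in milliseconds."""
    samples = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        for _ in range(number):
            fn()
        # average over `number` inner calls to reduce timer granularity noise
        samples.append((time.perf_counter() - t0) / number * 1e3)
    return statistics.median(samples)

ms = measure_ms(lambda: sum(range(10000)))
print(f"{ms:.3f} ms")
```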
Operator definition code:
```python
from tvm import relay  # required import (TVM 0.19 Relay frontend)

def export_max_pool2d(params, set_dir=None, platform="rv"):
    data = relay.var(
        "data",
        shape=(params["batch"], params["pool_channels"],
               params["input_height"], params["input_width"]),
        dtype=params["dtype"],
    )
    pool = relay.nn.max_pool2d(
        data,
        pool_size=(params["pool_size"], params["pool_size"]),
        strides=(params["stride"], params["stride"]),
        padding=(params["padding"], params["padding"]),
    )
    # export_op is a local helper (not part of TVM) that builds the expression
    # for the chosen target and saves the compiled artifact; note that params
    # must also contain an "op_name" key for this call.
    export_op(pool, params["op_name"], [data], params, set_dir=set_dir)
```
#### Performance Data
- **RV execution time**: 8.357100 ms
- **RVV execution time**: 9.634620 ms
- **Acceleration ratio (RV/RVV)**: 0.867 (RVV is ~1.15× slower)
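The reported ratio can be reproduced directly from the two raw timings:

```python
rv_ms = 8.357100   # scalar RV execution time
rvv_ms = 9.634620  # vectorized RVV execution time

ratio = rv_ms / rvv_ms     # "acceleration" ratio: < 1 means RVV is slower
slowdown = rvv_ms / rv_ms  # how many times slower RVV is than scalar

print(round(ratio, 3), round(slowdown, 2))  # 0.867 1.15
```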
#### Environment Information
- **TVM version**: 0.19.0
- **LLVM version**: [Please provide: `llvm-config --version`]
- **Hardware**: Spacemit K1‑X bit‑brick board
- **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
- **ISA**: rv64imafdcv (with vector extensions)
- **Memory**: 7.6 GB
- **OS**: Bianbu 2.2, Linux kernel 6.6.63
- **Operation**: 2×2 max pooling with stride 4 on input shape (14, 23, 99,
95)
#### Expected Behavior
RVV vectorization should provide a performance improvement over the scalar
RV baseline for 2D pooling operations like max_pool2d.
#### Additional Context
- The operation performs 2×2 max pooling with stride 4 and padding 1 on a 4D
tensor.
- While the performance regression is less severe than for other operators,
it still indicates that the vectorized implementation of 2D max pooling may
have inefficiencies in memory access patterns or vector reduction within
pooling windows.
- This is part of a broader pattern where all tested operators show
performance degradation with RVV, suggesting potential issues with
vectorization strategies in TVM's RISC‑V backend.
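A plain NumPy reference for this exact configuration (our own sketch, not
TVM's generated code) makes the access pattern concrete: because the stride
(4) exceeds the kernel size (2), adjacent output elements read input columns
4 apart, so vector lanes see non-unit-stride, gather-like loads rather than
contiguous ones:

```python
import numpy as np

def max_pool2d_ref(x, k=2, s=4, p=1):
    """NCHW max pooling; pads with -inf so padding never wins the max."""
    n, c, h, w = x.shape
    xp = np.full((n, c, h + 2 * p, w + 2 * p), -np.inf, dtype=x.dtype)
    xp[:, :, p:p + h, p:p + w] = x
    oh = (h + 2 * p - k) // s + 1
    ow = (w + 2 * p - k) // s + 1
    out = np.empty((n, c, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            # each window starts s columns after the previous one (s > k,
            # so windows do not even overlap: a sparse, strided read)
            out[:, :, i, j] = xp[:, :, i*s:i*s + k, j*s:j*s + k].max(axis=(2, 3))
    return out

x = np.random.rand(14, 23, 99, 95).astype("float32")
print(max_pool2d_ref(x).shape)  # (14, 23, 25, 24)
```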
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]