yanyanyanggg opened a new issue, #18565:
URL: https://github.com/apache/tvm/issues/18565
### Issue: [RISC-V RVV] floor operator performance regression
#### Description
The floor operator runs slower when compiled with the RISC‑V Vector
(RVV) extension than without it: the RVV build reaches only 0.521× the
performance of the scalar build. This suggests that floor is vectorized inefficiently.
#### Steps to Reproduce
1. Generate the floor operator with the following configuration:
```python
params = {
"dtype": "float32",
"batch": 14,
"channels": 23,
"input_height": 67,
"input_width": 99
}
```
2. Export the operator to two targets:
- **RV target** (scalar, without vector extension):
```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
```
- **RVV target** (with vector extension):
```
llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
```
3. Run performance measurement on both targets.
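The two target strings in step 2 differ only in the `+v` attribute. As a minimal sketch, both can be built from one shared base so that the scalar and vector configurations cannot drift apart. The helper name `make_target` is hypothetical and not part of TVM.

```python
# Hypothetical helper (not a TVM API): builds the two target strings from step 2.
BASE = "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d"
SCALAR_ATTRS = ["+64bit", "+m", "+a", "+f", "+d", "+c"]


def make_target(vector: bool) -> str:
    """Return the scalar (RV) or vector (RVV) target string."""
    attrs = SCALAR_ATTRS + (["+v"] if vector else [])
    return f"{BASE} -mattr={','.join(attrs)}"


rv_target = make_target(vector=False)
rvv_target = make_target(vector=True)
print(rv_target)
print(rvv_target)
```

Either string should then be usable wherever a TVM target string is accepted, for example `tvm.target.Target(rvv_target)`.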
Operator definition code:
```python
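from tvm import relay


def export_floor(params, set_dir=None, platform="rv"):
    # Input tensor in NCHW layout, shaped from the benchmark configuration.
    data = relay.var("data",
                     shape=(params["batch"], params["channels"],
                            params["input_height"], params["input_width"]),
                     dtype=params["dtype"])
    floor_op = relay.floor(data)
    # export_op is a helper that is not shown here; note that it reads
    # params["op_name"], which the configuration in step 1 does not set.
    export_op(floor_op, params["op_name"], [data], params, set_dir=set_dir)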
```
#### Performance Data
- **RV execution time**: 8.891440 ms
- **RVV execution time**: 17.061200 ms
- **Acceleration ratio (RV/RVV)**: 0.521 (RVV is ~1.9× slower)
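The figures above can be re-derived from the raw timings and the tensor shape. The sketch below uses only numbers given in this issue; the variable names are illustrative.

```python
# Timings and shape are copied from the issue; everything else is derived.
from math import prod

shape = (14, 23, 67, 99)  # batch, channels, input_height, input_width
rv_ms = 8.891440          # scalar (RV) execution time in milliseconds
rvv_ms = 17.061200        # vector (RVV) execution time in milliseconds

elements = prod(shape)
speedup = rv_ms / rvv_ms   # a value below 1 means RVV is slower
slowdown = rvv_ms / rv_ms

# Elements processed per microsecond for each build.
rv_rate = elements / (rv_ms * 1000)
rvv_rate = elements / (rvv_ms * 1000)

print(f"elements: {elements}")
print(f"RV/RVV ratio: {speedup:.3f}, slowdown: {slowdown:.2f}x")
print(f"throughput: RV {rv_rate:.0f} elem/us, RVV {rvv_rate:.0f} elem/us")
```

Expressing the result as elements per microsecond makes it easier to compare against other elementwise operators measured on the same board.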
#### Environment Information
- **TVM version**: 0.19.0
- **LLVM version**: [Please provide: `llvm-config --version`]
- **Hardware**: Spacemit K1‑X bit‑brick board
- **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
- **ISA**: rv64imafdcv (with vector extensions)
- **Memory**: 7.6 GB
- **OS**: Bianbu 2.2, Linux kernel 6.6.63
- **Operation**: Elementwise floor on ~2.1M elements
#### Expected Behavior
RVV vectorization should provide a performance improvement over the scalar
RV baseline for elementwise operations like floor.
#### Additional Context
- The floor operation is applied elementwise to a tensor of ~2.1M elements (14 × 23 × 67 × 99 = 2,135,826).
- While the regression is less severe than for other operators (sum, log,
etc.), it still represents a significant performance degradation for a simple
arithmetic operation.
- This suggests that the current RVV implementation of floor may be using
suboptimal vector instructions or inefficient vector length management.
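One way to check the hypothesis about vector instructions and vector length management is to inspect the assembly generated for the RVV build. The sketch below counts three things in an assembly listing: `vsetvli`-family instructions (vector length configuration), other vector instructions, and calls to a scalar `floor` routine, which would indicate that the loop was not vectorized. The function name `summarize_asm` and the sample listing are illustrative assumptions, not output from this benchmark.

```python
from collections import Counter

VSET = {"vsetvli", "vsetivli", "vsetvl"}  # RVV vector-length configuration
CALLS = {"call", "tail", "jal"}           # ways a scalar libm routine may be invoked


def summarize_asm(asm: str) -> Counter:
    """Count vector-configuration, vector, and scalar-floor-call instructions."""
    counts = Counter()
    for line in asm.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop trailing comments
        # Skip blank lines, labels (".LBB0_1:") and directives (".text").
        if not tokens or tokens[0].endswith(":") or tokens[0].startswith("."):
            continue
        mnemonic = tokens[0]
        if mnemonic in VSET:
            counts["vsetvl"] += 1
        elif mnemonic.startswith("v"):
            # Assumption: any other mnemonic starting with "v" is an RVV instruction.
            counts["vector"] += 1
        elif mnemonic in CALLS and any("floor" in t for t in tokens[1:]):
            counts["scalar_floor_calls"] += 1
    return counts


# Made-up listing for illustration only; not a real or correct floor sequence.
SAMPLE = """
.LBB0_1:
    vsetvli t0, a2, e32, m1, ta, ma   # configure vector length
    vle32.v v8, (a0)
    vfcvt.x.f.v v9, v8
    vfcvt.f.x.v v9, v9
    vse32.v v9, (a1)
    call floorf
"""

summary = summarize_asm(SAMPLE)
print(dict(summary))
```

A high `vsetvl` count relative to vector instructions, or any scalar floor calls inside the hot loop, would support the explanation above. In TVM, the assembly of an LLVM-compiled module can typically be obtained with `get_source("asm")` on that module; whether this is reachable through the `export_op` flow used here is an assumption.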
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]