cbalint13 commented on PR #18182: URL: https://github.com/apache/tvm/pull/18182#issuecomment-3220162168
Further, I investigated the correctness of the proposed tensorization kernels. The proposed ```multivmul``` does multiple dot products, which is where the highest benefit inside RVV comes from. All tests here need https://github.com/apache/tvm/pull/18232

---

* The proposed kernels look wrong; the implementation also produces bad numerical results (a numpy reference model is sketched at the end of this comment): [riscv64-rvv-kernels-pr18182.py.gz](https://github.com/user-attachments/files/21968923/riscv64-rvv-kernels-pr18182.py.gz)

```
$ ./riscv64-rvv-kernels-pr18182.py 64
Testing rvv_float32_multivmul_8_64_m8
C (output): (8,) [float32]
[1363. 0. 0. 0. 0. 0. 0. 0.]
Output (kernel) [1363. 0. 0. 0. 0. 0. 0. 0.]
Output (numpy) [1363. 1407. 1460. 1388. 1504. 1373. 1268. 1270.]

$ ./riscv64-rvv-kernels-pr18182.py 32
Testing rvv_float32_multivmul_8_32_m8
C (output): (8,) [float32]
[699. 0. 0. 0. 0. 0. 0. 0.]
Output (kernel) [699. 0. 0. 0. 0. 0. 0. 0.]
Output (numpy) [699. 493. 671. 707. 635. 639. 764. 611.]

$ ./riscv64-rvv-kernels-pr18182.py 16
Testing rvv_float32_multivmul_8_16_m8
C (output): (8,) [float32]
[425. 0. 0. 0. 0. 0. 0. 0.]
Output (kernel) [425. 0. 0. 0. 0. 0. 0. 0.]
Output (numpy) [425. 192. 382. 464. 465. 382. 438. 202.]

{...}
```

* Here is a working reference fp32 kernel leveraging one-hot [full RVV occupancy](https://github.com/cbalint13/rvv-kernels/blob/main/dot_fp32_kernel.ir): [riscv64-rvv-full-fp32_kern.py.gz](https://github.com/user-attachments/files/21968922/riscv64-rvv-full-fp32_kern.py.gz)

```
$ ./riscv64-rvv-full-fp32_kern.py
DEBUG:pydot:pydot initializing
DEBUG:pydot:pydot 3.0.1
DEBUG:pydot.core:pydot core module initializing
DEBUG:pydot.dot_parser:pydot dot_parser module initializing
# from tvm.script import ir as I
# from tvm.script import tir as T

@I.ir_module
class Module:
    @T.prim_func
    def main(A_handle: T.handle, B_handle: T.handle, C_handle: T.handle):
        T.func_attr({"global_symbol": "rvv_dot_4f32_4x4f32_2f32"})
        A = T.match_buffer(A_handle, (4,), align=4, offset_factor=1)
        B = T.match_buffer(B_handle, (4, 4), strides=(4, 1), align=4, offset_factor=1)
        C = T.match_buffer(C_handle, (4,), align=4, offset_factor=1)
        with T.block("root"):
            T.reads(A[0:4], B[0:4, 0:4])
            T.writes(C[0:4])
            zero: T.float32xvscalex2 = T.call_llvm_intrin("float32xvscalex2", "llvm.riscv.vfmv.v.f", T.Broadcast(T.float32(0.0), T.vscale() * 2), C[0], T.uint64(1))
            vec_A: T.float32xvscalex4 = T.call_llvm_intrin("float32xvscalex4", "llvm.riscv.vle", T.Broadcast(T.float32(0.0), T.vscale() * 4), T.tvm_access_ptr(T.type_annotation("float32"), A.data, 0, 4, 1), T.int64(4))
            for i in range(4):
                with T.block("reduction"):
                    vi = T.axis.spatial(4, i)
                    T.reads(B[0:4, 0:4])
                    T.writes(C[vi])
                    vec_B: T.float32xvscalex4 = T.call_llvm_intrin("float32xvscalex4", "llvm.riscv.vle", T.Broadcast(T.float32(0.0), T.vscale() * 4), T.tvm_access_ptr(T.type_annotation("float32"), B.data, vi * 4, 4, 1), T.int64(4))
                    product: T.float32xvscalex4 = T.call_llvm_intrin("float32xvscalex4", "llvm.riscv.vfmul", T.Broadcast(T.float32(0.0), T.vscale() * 4), vec_A, vec_B, T.uint64(7), T.uint64(4))
                    reduction_result_vec: T.float32xvscalex2 = T.call_llvm_intrin("float32xvscalex2", "llvm.riscv.vfredusum", T.Broadcast(T.float32(0.0), T.vscale() * 2), product, zero, T.uint64(7), T.uint64(4))
                    C[vi] = T.call_llvm_intrin("float32", "llvm.riscv.vfmv.f.s", reduction_result_vec)

[6. 6. 9. 3.]
[[3. 7. 7. 7.]
 [0. 2. 5. 7.]
 [3. 9. 5. 7.]
 [9. 3. 6. 1.]]
Output (kernel) [144. 78. 138. 129.]
Output (numpy) [144. 78. 138. 129.]
```
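As a side note, should one want to exercise this kernel through TVM's `tensorize`, a rough registration sketch could look like the one below. This is my own sketch, not the PR's code: the scalar description `rvv_dot_desc` is written by me, and it assumes the printed `Module` above has been pasted into scope.

```python
import tvm
from tvm.script import tir as T

# Scalar description of what the intrinsic computes:
# C[i] += sum_k A[k] * B[i, k] over a 4 x (4x4) -> 4 fp32 tile.
@T.prim_func
def rvv_dot_desc(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (4,), "float32", offset_factor=1)
    B = T.match_buffer(b, (4, 4), "float32", offset_factor=1)
    C = T.match_buffer(c, (4,), "float32", offset_factor=1)
    with T.block("root"):
        T.reads(C[0:4], A[0:4], B[0:4, 0:4])
        T.writes(C[0:4])
        for i, k in T.grid(4, 4):
            with T.block("update"):
                vi, vk = T.axis.remap("SR", [i, k])
                C[vi] = C[vi] + A[vk] * B[vi, vk]

rvv_dot_impl = Module["main"]  # the RVV intrinsic prim_func printed above

# Register the (description, implementation) pair; a schedule can then call
# sch.tensorize(block_or_loop, "rvv_dot_4f32_4x4f32_2f32").
tvm.tir.TensorIntrin.register("rvv_dot_4f32_4x4f32_2f32", rvv_dot_desc, rvv_dot_impl)
```

Whether such a pattern actually gets matched when coming from the relax flow is, of course, part of what is being questioned in this thread.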
For this working sample, 4 x (4x4) -> 4 lanes for the VLEN=256 @ fp32 case is [the maximum](https://github.com/cbalint13/rvv-kernels/blob/f9ae5903b557c31df90eaafab644d9598a08e0cb/rvv-dot-kernel-gen.py#L37-L40) for a fully occupied RVV machine (a back-of-the-envelope check is sketched below).

---

Now, besides the template-matching issues due to the relax flow (exemplified with a working dense/matmul testcase), the numerical implementation of the kernels themselves is also wrong, and personally I don't see how they fully exploit the RVV machine (a working testcase is provided above as well).
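On the occupancy point, my back-of-the-envelope arithmetic goes like this (assumptions mine: VLEN=256 bits, fp32 elements, and LLVM's RISC-V convention of vscale = VLEN/64; this is a sanity check, not taken from the PR):

```python
# Register-budget sanity check for the "maximum occupancy" claim above.
# Assumed: VLEN = 256 bits, fp32 => SEW = 32 bits.
VLEN, SEW = 256, 32
lanes_lmul1 = VLEN // SEW    # 8 fp32 lanes per vector register at LMUL=1
vscale = VLEN // 64          # 4, per LLVM's RISC-V vscale convention
lanes_vec4 = vscale * 4      # 16 fp32 lanes in a <vscale x 4 x float> value
tile_values = 4 * 4          # one 4x4 fp32 tile of the kernel above
assert lanes_vec4 == tile_values  # the 4x4 tile exactly fills that register group
```

Anything wider than the 4 x (4x4) tile would spill past the register group at this VLEN, which is how I read the generator's limit linked above.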

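Finally, the numpy reference that the failing kernels are compared against boils down to a plain multi-row dot product. A minimal sketch of my own (shapes inferred from the kernel names `rvv_float32_multivmul_8_K_m8` and the printed outputs; the helper name `multivmul_ref` is hypothetical):

```python
import numpy as np

def multivmul_ref(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Eight independent dot products of A against the rows of B.

    A: (K,) float32, B: (8, K) float32 -> C: (8,), with K in {16, 32, 64}.
    """
    return (B @ A).astype(np.float32)
```

Against this reference, the proposed kernels only get lane 0 right in every tested configuration; lanes 1..7 come back as zero.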