cbalint13 opened a new pull request, #18243:
URL: https://github.com/apache/tvm/pull/18243

   This PR adds RISC-V kernels compliant with the RVV v1.0 specification.
   
   ---
   
   #### Notes
   
    * Enables high-performance kernels covering the majority of common [ML datatype combinations](https://github.com/apache/tvm/compare/main...cbalint13:tvm:riscv-rvv-metasch?expand=1#diff-7330d34376acbd4172bbc2ed08b7238486f27fdfc32efa3188b172845996e87bR189-R191)
    * Currently compliant with the [RVV spec version v1.0](https://github.com/riscvarchive/riscv-v-spec/releases/tag/v1.0) (does not work with the older v0.7.1)
   
   Like all other CPU intrinsics, only a limited list of operators, currently dense (linear), works with metaschedule.
   The list of operators will be revisited and extended in the near future to cover transposed flavours and convolutions.
   
   
   #### Performance
   
   Performance evaluation showed an approximately 10x improvement on a SpaceMIT-x60 SoC board.
   
   * Results from the evaluation program [riscv64-dense-relax-metaschedule.py.gz](https://github.com/user-attachments/files/22007138/riscv64-dense-relax-metaschedule.py.gz):
   
   ```
   $ ./riscv64-dense-relax-metaschedule.py --num_trials 256 \
            --data_dtype uint8 --weight-dtype int8 --output-dtype int32
   {...}
    ID |  Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
   ---------------------------------------------------------------------------------------------------------
     0 | dense | 268435456 |      1 |       852.2994 |     314.9544 |              314.9544 |    256 |
   ---------------------------------------------------------------------------------------------------------
   
   $ ./riscv64-dense-relax-metaschedule.py --num_trials 256 \
           --data_dtype float16 --weight-dtype float16 --output-dtype float16
    ID |  Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
   ---------------------------------------------------------------------------------------------------------
     0 | dense | 268435456 |      1 |       798.0926 |     336.3463 |              336.3463 |    256 |    Y
   ---------------------------------------------------------------------------------------------------------
   
   $ ./riscv64-dense-relax-metaschedule.py --num_trials 256 \
           --data_dtype float32 --weight-dtype float32 --output-dtype float32
    ID |  Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
   ---------------------------------------------------------------------------------------------------------
     0 | dense | 268435456 |      1 |       464.1279 |     578.3653 |              578.3653 |    256 |    Y
   ---------------------------------------------------------------------------------------------------------
   ```
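
As a quick sanity check on how the columns in these tables relate, the reported speed should equal the FLOP count divided by the latency. The sketch below recomputes it from the figures quoted above; the helper name `gflops` and the row labels are illustrative and not part of the evaluation script.

```python
def gflops(flop, latency_us):
    """Throughput in GFLOPS for `flop` operations completed in `latency_us` microseconds."""
    return flop / (latency_us * 1e-6) / 1e9


# (label, FLOP, latency in us, reported GFLOPS), copied from the result tables.
rows = [
    ("uint8 x int8 -> int32", 268435456, 314.9544, 852.2994),
    ("float16 x float16 -> float16", 268435456, 336.3463, 798.0926),
    ("float32 x float32 -> float32", 268435456, 578.3653, 464.1279),
]

for label, flop, latency_us, reported in rows:
    computed = gflops(flop, latency_us)
    print(f"{label}: computed {computed:.4f} GFLOPS, reported {reported:.4f} GFLOPS")
```

The recomputed values agree with the reported ones up to rounding of the printed latency.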
   
   ---
   
   #### Tests
   
    * All intrinsics were tested for numerical correctness; the test programs and log are provided below.
   
   
[kernel-numerical-testing.log.gz](https://github.com/user-attachments/files/22007176/kernel-numerical-testing.log.gz)
   
[riscv64-rvv-kernels-numerical-testing.py.gz](https://github.com/user-attachments/files/22007178/riscv64-rvv-kernels-numerical-testing.py.gz)
   
[riscv64-rvv-kernels-numerical-testing.sh.gz](https://github.com/user-attachments/files/22007179/riscv64-rvv-kernels-numerical-testing.sh.gz)
   
   
   ```
   $ ./riscv64-rvv-kernels-numerical-testing.sh
   {...}
   $ cat kernel-numerical-testing.log | grep -e Testing 
   Testing rvv_dot_4u8_8x4i8_8i32
   Testing rvv_dot_4i8_8x4i8_8i32
   Testing rvv_dot_4f16_8x4f16_8f16
   Testing rvv_dot_4f32_8x4f32_8f32
   Testing rvv_dot_8u8_8x8i8_8i32
   Testing rvv_dot_8i8_8x8i8_8i32
   Testing rvv_dot_8f16_8x8f16_8f16
   Testing rvv_dot_8f32_8x8f32_8f32
   Testing rvv_dot_16u8_8x16i8_8i32
   Testing rvv_dot_16i8_8x16i8_8i32
   Testing rvv_dot_16f16_8x16f16_8f16
   Testing rvv_dot_16f32_8x16f32_8f32
   Testing rvv_dot_32u8_8x32i8_8i32
   Testing rvv_dot_32i8_8x32i8_8i32
   Testing rvv_dot_32f16_8x32f16_8f16
   Testing rvv_dot_32f32_8x32f32_8f32
   Testing rvv_dot_64u8_8x64i8_8i32
   Testing rvv_dot_64i8_8x64i8_8i32
   Testing rvv_dot_64f16_8x64f16_8f16
   Testing rvv_dot_64f32_8x64f32_8f32
   Testing rvv_dot_128u8_8x128i8_8i32
   Testing rvv_dot_128i8_8x128i8_8i32
   Testing rvv_dot_128f16_8x128f16_8f16
   Testing rvv_dot_128f32_8x128f32_8f32
   ```
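
The kernel names in the log appear to follow the pattern `rvv_dot_<K><data dtype>_<N>x<K><weight dtype>_<N><output dtype>`: a length-K data vector multiplied against an N×K weight tile, producing N outputs. This reading is inferred from the names alone and is not confirmed by the PR. Under that assumption, the following sketch parses a name and gives a plain-Python reference for the computation; `parse_kernel_name` and `reference_dot` are hypothetical helpers, not functions from the PR or from TVM, and whether the real kernels accumulate into an existing output is not modeled here.

```python
import re

# Assumed naming scheme, inferred from the test log (not confirmed by the PR):
#   rvv_dot_<K><data dtype>_<N>x<K><weight dtype>_<N><output dtype>
NAME_RE = re.compile(
    r"rvv_dot_(\d+)([a-z]+\d+)_(\d+)x(\d+)([a-z]+\d+)_(\d+)([a-z]+\d+)"
)


def parse_kernel_name(name):
    """Split a kernel name into its assumed shape and dtype fields."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized kernel name: {name}")
    k, data_dtype, n, k_w, weight_dtype, n_out, out_dtype = m.groups()
    if k != k_w or n != n_out:
        raise ValueError(f"inconsistent shapes in kernel name: {name}")
    return {
        "k": int(k),
        "n": int(n),
        "data_dtype": data_dtype,
        "weight_dtype": weight_dtype,
        "output_dtype": out_dtype,
    }


def reference_dot(data, weight):
    """Reference result: out[j] = sum over k of data[k] * weight[j][k]."""
    return [sum(d * w for d, w in zip(data, row)) for row in weight]


info = parse_kernel_name("rvv_dot_4u8_8x4i8_8i32")
data = list(range(1, info["k"] + 1))                # K data values
weight = [[j] * info["k"] for j in range(info["n"])]  # N rows of K weights
print(info)
print(reference_dot(data, weight))
```

A reference of this kind is what a numerical test would compare each intrinsic's output against, with integer kernels expected to match exactly and floating-point kernels within a tolerance.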
   
   
   
   
   

