giuseros opened a new pull request #6445:
URL: https://github.com/apache/incubator-tvm/pull/6445


   ### High level description of the submission
   We added two new intrinsics in `topi/arm_cpu/tensor_intrin.py` (a reference
   sketch of both tile computations follows this list):
   - `mmla4x4`: computes a matrix multiplication between tile `A(4,4)` and
     tile `B(4,4)`
   - `mmla16x4`: computes a matrix multiplication between tile `A(rows,4)` and
     tile `B(4,16)`
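   As a point of reference, here is a minimal NumPy sketch of the computations
   the two intrinsics tensorize (not the actual TVM tensor intrinsic
   definitions; the int8 inputs accumulating into int32 are an assumption
   based on the quantized, dot-product context of this PR):

   ```python
   import numpy as np

   def mmla4x4_reference(A, B):
       # A: (4, 4) int8 tile, B: (4, 4) int8 tile -> C: (4, 4) int32 tile
       assert A.shape == (4, 4) and B.shape == (4, 4)
       return A.astype(np.int32) @ B.astype(np.int32)

   def mmla16x4_reference(A, B):
       # A: (rows, 4) int8 tile, B: (4, 16) int8 tile -> C: (rows, 16) int32 tile
       assert A.shape[1] == 4 and B.shape == (4, 16)
       return A.astype(np.int32) @ B.astype(np.int32)
   ```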
   Then we used those intrinsics in two separate strategies. We added the
   strategies in `topi/arm_cpu/conv2d_int8.py` and implemented the schedules
   in `topi/arm_cpu/conv2d_gemm.py`. In particular (a schematic contrast of
   the two flows follows this list):
   - `schedule_conv2d_gemm`, when accelerated, packs matrix `A`, computes the
     GEMM, and unpacks the resulting matrix. This uses the `mmla4x4` intrinsic.
   - `schedule_conv2d_gemm_hybrid` doesn't do any packing on `A` and `C`,
     which stay in their native form. This uses the `mmla16x4` intrinsic.
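   Below is a schematic contrast of the two flows at the NumPy level. This is
   an illustration only, not the actual TVM schedules; the 4x4 interleaved
   packing layout and the concrete GEMM sizes are assumptions chosen to keep
   the example self-contained:

   ```python
   import numpy as np

   def tile_matmul(a, b):
       # Stand-in for the tensorized intrinsics: int8 tiles, int32 accumulation.
       return a.astype(np.int32) @ b.astype(np.int32)

   M, K, N = 8, 12, 32
   A = np.random.randint(-128, 128, size=(M, K), dtype=np.int8)
   B = np.random.randint(-128, 128, size=(K, N), dtype=np.int8)

   # Packed flow (mmla4x4): pack A into 4x4 tiles, run the tiled GEMM,
   # then unpack C back into a plain row-major matrix.
   A_packed = A.reshape(M // 4, 4, K // 4, 4).transpose(0, 2, 1, 3)
   C_tiles = np.zeros((M // 4, N // 4, 4, 4), dtype=np.int32)
   for i in range(M // 4):
       for j in range(N // 4):
           for k in range(K // 4):
               C_tiles[i, j] += tile_matmul(A_packed[i, k],
                                            B[4 * k:4 * (k + 1), 4 * j:4 * (j + 1)])
   C_packed = C_tiles.transpose(0, 2, 1, 3).reshape(M, N)

   # Hybrid flow (mmla16x4): A and C stay in their native layout; the inner
   # tile works on (rows, 4) x (4, 16) blocks.
   C_hybrid = np.zeros((M, N), dtype=np.int32)
   for j in range(N // 16):
       for k in range(K // 4):
           C_hybrid[:, 16 * j:16 * (j + 1)] += tile_matmul(
               A[:, 4 * k:4 * (k + 1)], B[4 * k:4 * (k + 1), 16 * j:16 * (j + 1)])

   assert np.array_equal(C_packed, C_hybrid)
   ```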
   
   Please note that, due to the limitations of `tensorize`, we need to pad
   matrix `A` in both cases (when its dimensions are not a multiple of the
   tiling shape).
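   A small sketch of that zero-padding (an illustration, not the code in
   `conv2d_gemm.py`): `tensorize` requires the tensorized loop extents to
   match the tile shape exactly, so `A` is padded up to the next multiple:

   ```python
   import numpy as np

   def pad_to_tiles(A, tile_rows, tile_cols):
       # Zero-pad A so both dimensions become multiples of the tile shape.
       rows, cols = A.shape
       pad_r = (tile_rows - rows % tile_rows) % tile_rows
       pad_c = (tile_cols - cols % tile_cols) % tile_cols
       return np.pad(A, ((0, pad_r), (0, pad_c)))  # pads with zeros by default

   A = np.ones((10, 6), dtype=np.int8)
   print(pad_to_tiles(A, 4, 4).shape)  # (12, 8)
   ```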
   
   ### RFC
   This PR is based on the following RFC: https://discuss.tvm.apache.org/t/rfc-accelerate-quantized-convolution-through-dot-product/7873
   
   Change-Id: Id0d818d84ffc458c6dad7983fd350a0f3d5db395

