LeiWang1999 commented on PR #15111:
URL: https://github.com/apache/tvm/pull/15111#issuecomment-1709871632

   Thanks, I see. I mentioned that because I benchmarked the Llama-70b GEMM
shapes, comparing cuBLAS (FP16xFP16) against Relax.Cutlass (FP16xINT4):
   
   M | N | K | cublas | cutlass-fpa-intb | speedup (cublas / cutlass-fpa-intb)
   -- | -- | -- | -- | -- | --
   1 | 1024 | 8192 | 0.030588 | 0.046896935 | 0.652236035
   1 | 8192 | 8192 | 0.199339 | 0.087285042 | 2.28376673
   1 | 8192 | 28672 | 1.055949 | 0.187683105 | 5.626232851
   1 | 28672 | 8192 | 0.672551 | 0.192594528 | 3.492055583
   16 | 1024 | 8192 | 0.036352 | 0.095558167 | 0.380417524
   16 | 8192 | 8192 | 0.19456 | 0.086522102 | 2.248674049
   16 | 8192 | 28672 | 0.666054 | 0.187945366 | 3.54386972
   16 | 28672 | 8192 | 0.669696 | 0.19159317 | 3.495406297
   32 | 1024 | 8192 | 0.037856 | 0.101447105 | 0.373159996
   32 | 8192 | 8192 | 0.196592 | 0.101852417 | 1.930165321
   32 | 8192 | 28672 | 0.664686 | 0.319719315 | 2.078966443
   32 | 28672 | 8192 | 0.67072 | 0.279974937 | 2.395642936
   64 | 1024 | 8192 | 0.05376 | 0.170302391 | 0.31567378
   64 | 8192 | 8192 | 0.199168 | 0.175619125 | 1.134090585
   64 | 8192 | 28672 | 0.675594 | 0.57182312 | 1.181474212
   64 | 28672 | 8192 | 0.681984 | 0.283193588 | 2.408190141
   128 | 1024 | 8192 | 0.078336 | 0.051283836 | 1.527498838
   128 | 8192 | 8192 | 0.238592 | 0.178599358 | 1.335906254
   128 | 8192 | 28672 | 0.739888 | 0.575089455 | 1.286561606
   128 | 28672 | 8192 | 0.714752 | 0.521111488 | 1.371591367
   1024 | 1024 | 8192 | 0.28323 | 0.171804428 | 1.648558319
   1024 | 8192 | 8192 | 1.158315 | 1.086783409 | 1.065819275
   1024 | 8192 | 28672 | 4.166997 | 4.37412262 | 0.952647604
   1024 | 28672 | 8192 | 3.709171 | 3.643035889 | 1.01815373
   4096 | 1024 | 8192 | 0.597701 | 0.690698624 | 0.865357094
   4096 | 8192 | 8192 | 4.334677 | 6.313800812 | 0.686540065
   4096 | 8192 | 28672 | 16.15016 | 23.39763641 | 0.690247328
   4096 | 28672 | 8192 | 14.95114 | 22.15902805 | 0.674720217
   8192 | 1024 | 8192 | 1.200128 | 1.547074318 | 0.775740341
   8192 | 8192 | 8192 | 8.550848 | 12.74256706 | 0.671045949
   8192 | 8192 | 28672 | 30.88835 | 44.36366558 | 0.696253236
   8192 | 28672 | 8192 | 30.19989 | 44.58220005 | 0.677397932
   16384 | 1024 | 8192 | 2.349141 | 3.225851059 | 0.728223751
   16384 | 8192 | 8192 | 16.97864 | 25.81973076 | 0.657583766
   16384 | 8192 | 28672 | 61.27206 | 91.48414135 | 0.669756127
   16384 | 28672 | 8192 | 61.03514 | 90.86410999 | 0.671718869
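
For anyone re-checking the numbers, the speedup column appears to be the cuBLAS latency divided by the cutlass-fpa-intb latency, so values above 1 mean the CUTLASS kernel is faster. Below is a minimal sketch that recomputes it for three rows copied from the table above. The `rows` layout and the `speedup` helper are illustrative names only, not part of TVM or of the original benchmark script.

```python
# Three rows copied from the table above: (M, N, K, cublas, cutlass_fpa_intb).
# Latency units are whatever the table reports; only the ratio matters here.
rows = [
    (1, 8192, 8192, 0.199339, 0.087285042),
    (64, 1024, 8192, 0.05376, 0.170302391),
    (4096, 8192, 8192, 4.334677, 6.313800812),
]


def speedup(cublas, cutlass):
    """Assumed definition: cuBLAS latency over cutlass-fpa-intb latency."""
    return cublas / cutlass


speedups = [speedup(c, f) for (_, _, _, c, f) in rows]

for (m, n, k, _, _), s in zip(rows, speedups):
    verdict = "cutlass faster" if s > 1.0 else "cublas faster"
    print(f"M={m} N={n} K={k}: speedup={s:.3f} ({verdict})")
```

The three rows were picked to cover a small-M win, a small-N loss, and a large-M loss for the FP16xINT4 kernel.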
   


