LeiWang1999 commented on PR #15111: URL: https://github.com/apache/tvm/pull/15111#issuecomment-1709871632
Thanks, I see. I mentioned that because I benchmarked the performance on the Llama-70b GEMM (cuBLAS FP16xFP16 vs. Relax.Cutlass fp16xint4):

M | N | K | cublas | cutlass-fpa-intb | speedup
-- | -- | -- | -- | -- | --
1 | 1024 | 8192 | 0.030588 | 0.046896935 | 0.652236035
1 | 8192 | 8192 | 0.199339 | 0.087285042 | 2.28376673
1 | 8192 | 28672 | 1.055949 | 0.187683105 | 5.626232851
1 | 28672 | 8192 | 0.672551 | 0.192594528 | 3.492055583
16 | 1024 | 8192 | 0.036352 | 0.095558167 | 0.380417524
16 | 8192 | 8192 | 0.19456 | 0.086522102 | 2.248674049
16 | 8192 | 28672 | 0.666054 | 0.187945366 | 3.54386972
16 | 28672 | 8192 | 0.669696 | 0.19159317 | 3.495406297
32 | 1024 | 8192 | 0.037856 | 0.101447105 | 0.373159996
32 | 8192 | 8192 | 0.196592 | 0.101852417 | 1.930165321
32 | 8192 | 28672 | 0.664686 | 0.319719315 | 2.078966443
32 | 28672 | 8192 | 0.67072 | 0.279974937 | 2.395642936
64 | 1024 | 8192 | 0.05376 | 0.170302391 | 0.31567378
64 | 8192 | 8192 | 0.199168 | 0.175619125 | 1.134090585
64 | 8192 | 28672 | 0.675594 | 0.57182312 | 1.181474212
64 | 28672 | 8192 | 0.681984 | 0.283193588 | 2.408190141
128 | 1024 | 8192 | 0.078336 | 0.051283836 | 1.527498838
128 | 8192 | 8192 | 0.238592 | 0.178599358 | 1.335906254
128 | 8192 | 28672 | 0.739888 | 0.575089455 | 1.286561606
128 | 28672 | 8192 | 0.714752 | 0.521111488 | 1.371591367
1024 | 1024 | 8192 | 0.28323 | 0.171804428 | 1.648558319
1024 | 8192 | 8192 | 1.158315 | 1.086783409 | 1.065819275
1024 | 8192 | 28672 | 4.166997 | 4.37412262 | 0.952647604
1024 | 28672 | 8192 | 3.709171 | 3.643035889 | 1.01815373
4096 | 1024 | 8192 | 0.597701 | 0.690698624 | 0.865357094
4096 | 8192 | 8192 | 4.334677 | 6.313800812 | 0.686540065
4096 | 8192 | 28672 | 16.15016 | 23.39763641 | 0.690247328
4096 | 28672 | 8192 | 14.95114 | 22.15902805 | 0.674720217
8192 | 1024 | 8192 | 1.200128 | 1.547074318 | 0.775740341
8192 | 8192 | 8192 | 8.550848 | 12.74256706 | 0.671045949
8192 | 8192 | 28672 | 30.88835 | 44.36366558 | 0.696253236
8192 | 28672 | 8192 | 30.19989 | 44.58220005 | 0.677397932
16384 | 1024 | 8192 | 2.349141 | 3.225851059 | 0.728223751
16384 | 8192 | 8192 | 16.97864 | 25.81973076 | 0.657583766
16384 | 8192 | 28672 | 61.27206 | 91.48414135 | 0.669756127
16384 | 28672 | 8192 | 61.03514 | 90.86410999 | 0.671718869
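For readers checking the numbers: the speedup column appears to be the cublas time divided by the cutlass-fpa-intb time, so values above 1 mean the fp16xint4 kernel is faster. Below is a minimal sketch that recomputes it for a few rows copied from the table. The helper name `speedup` is made up for illustration and is not part of TVM or CUTLASS, and the time unit is not stated in the comment.

```python
# Rows copied from the benchmark table: (M, N, K, cublas_time, cutlass_time).
# The time unit is not given in the original comment.
rows = [
    (1, 1024, 8192, 0.030588, 0.046896935),
    (1, 8192, 28672, 1.055949, 0.187683105),
    (1024, 8192, 8192, 1.158315, 1.086783409),
    (16384, 8192, 8192, 16.97864, 25.81973076),
]


def speedup(cublas_time, cutlass_time):
    """Baseline time over candidate time; a value > 1 means CUTLASS is faster."""
    return cublas_time / cutlass_time


for m, n, k, cublas, cutlass in rows:
    s = speedup(cublas, cutlass)
    faster = "cutlass" if s > 1 else "cublas"
    print(f"M={m:<6} N={n:<6} K={k:<6} speedup={s:.3f} ({faster} faster)")
```

Read this way, the table shows the fp16xint4 kernel ahead for most shapes up to M=1024 (except N=1024 at M of 64 or less, and the 8192x28672 shape at M=1024) and behind cuBLAS for every shape at M=4096 and above.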
