billishyahao commented on PR #11513: URL: https://github.com/apache/tvm/pull/11513#issuecomment-1150858975
> One important comment about performance. Just to point out.
>
> In this patch you are using mechanic of auto detection proper layout inside of dnnl_json_runtime. It works correctly and dense primitive will use optimal layout. But it will execute weight reordering each inference call. This reordering significantly break performance (still better than previously, but less than possible).
>
> To avoid weight reordering it should be done once during `Init`. For that you need change dense weight pattern from `wildcard` to `is_constant`.

Hi @apeskov, the following is a clip of the dnnl verbose log:

`onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0400391
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc1024,0.0717773
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0351562
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,mb49ic512oc2048,0.215088
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.227051
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0339355
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc1024,0.072998
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0349121
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,mb49ic512oc2048,0.226807
onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic2048oc512,0.231934`

I don't observe any reorder primitive executed before or after inner_product in this log. So I think the current mechanism still works?
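The check on the log above can also be scripted instead of read by eye. This is only a minimal sketch: `count_primitives` is a hypothetical helper, not part of TVM or oneDNN, and it assumes the field order seen in the lines above (`onednn_verbose`, `exec`, engine, primitive kind, ...). Other oneDNN versions may lay the fields out differently. The two sample lines are copied from the log in this comment.

```python
from collections import Counter


def count_primitives(verbose_log: str) -> Counter:
    """Count executed oneDNN primitives by kind (hypothetical helper).

    Assumes each record looks like the lines quoted in this comment:
    onednn_verbose,exec,<engine>,<primitive kind>,...
    """
    counts = Counter()
    for line in verbose_log.splitlines():
        # Drop surrounding whitespace and any stray markdown backticks.
        fields = line.strip().strip("`").split(",")
        # Only "exec" records describe a primitive execution.
        if len(fields) < 4 or fields[0] != "onednn_verbose" or fields[1] != "exec":
            continue
        counts[fields[3]] += 1
    return counts


# Two records copied from the log above.
sample = "\n".join([
    "onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,"
    "src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_undef::undef::f0 "
    "dst_f32::blocked:ab:f0,attr-scratchpad:user ,,mb49ic512oc512,0.0400391",
    "onednn_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_inference,"
    "src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 "
    "dst_f32::blocked:ab:f0,attr-scratchpad:user attr-post-ops:eltwise_gelu_erf ,,"
    "mb49ic512oc2048,0.215088",
])

counts = count_primitives(sample)
print(dict(counts))
print("reorder executed:", counts["reorder"] > 0)
```

Running the same function over a full capture that covers several inference calls would show whether any `reorder` records appear between the `inner_product` ones.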