cyx-6 opened a new pull request, #14608: URL: https://github.com/apache/tvm/pull/14608
In some models, the Q, K, and V inputs to attention ops come from a single stacked tensor: they are split and reshaped before the attention call, i.e. stacked_qkv -> split -> reshape -> attention. We can skip the split and reshape ops by manipulating the layout parameters in codegen. This PR adds such fused patterns for stacked attention in BYOC, so that we can codegen directly from stacked_qkv.
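As an illustrative sketch (plain NumPy, not the TVM API), the idea is that the split and reshape are pure layout operations, so Q, K, and V can be read straight out of the stacked buffer with offsets instead of materializing separate tensors. The shapes below are hypothetical:

```python
import numpy as np

batch, seq_len, num_heads, head_dim = 2, 4, 8, 16
stacked_qkv = np.random.rand(batch, seq_len, 3 * num_heads * head_dim)

# Path 1: explicit split -> reshape (the pattern this PR fuses away).
q, k, v = np.split(stacked_qkv, 3, axis=-1)
q = q.reshape(batch, seq_len, num_heads, head_dim)

# Path 2: read Q directly from the stacked buffer via an offset,
# with no intermediate split op -- the same data, just a different view.
q_direct = stacked_qkv[..., : num_heads * head_dim].reshape(
    batch, seq_len, num_heads, head_dim
)

assert np.array_equal(q, q_direct)
```

Matching the whole stacked_qkv -> split -> reshape -> attention chain as one pattern lets the BYOC backend hand the external attention kernel the stacked buffer plus the appropriate offsets/strides directly.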