masahi opened a new pull request, #16474: URL: https://github.com/apache/tvm/pull/16474
FlashAttention recently added support for loading from a paged KV cache in https://github.com/Dao-AILab/flash-attention/commit/54e80a3829c6d2337570d01e78ebd9529c02d342. The support was added to the Flash-Decoding kernel, which we haven't used so far. This PR lets us use Flash-Decoding with its paged KV cache support from TVM. We already use other kernels from FlashAttention via BYOC, but due to the specialized nature of this kernel, it is supported as a contrib kernel (similar to vLLM).

@vinx13 @sunggg
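For illustration, here is a minimal sketch of how a contrib kernel like this is typically invoked from Relax as a packed function; the registered function name `tvm.contrib.flash_attn.flash_decoding_with_paged_kvcache`, the argument order, and the tensor shapes below are assumptions for exposition, not the exact API added by this PR:

```python
# Hypothetical sketch of calling a contrib Flash-Decoding kernel via
# R.call_dps_packed. The function name, argument layout, and shapes are
# illustrative assumptions, not the exact interface introduced here.
from tvm.script import relax as R


@R.function
def decode_attention(
    q: R.Tensor((1, 1, 32, 128), "float16"),  # [batch, seqlen_q=1, heads, head_dim]
    k_cache: R.Tensor(("num_blocks", 16, 32, 128), "float16"),  # paged K cache blocks
    v_cache: R.Tensor(("num_blocks", 16, 32, 128), "float16"),  # paged V cache blocks
    block_tables: R.Tensor((1, "max_blocks_per_seq"), "int32"),  # page table per sequence
    context_lens: R.Tensor((1,), "int32"),  # current length of each sequence
) -> R.Tensor((1, 1, 32, 128), "float16"):
    # Dispatch to the externally registered kernel in destination-passing style.
    out = R.call_dps_packed(
        "tvm.contrib.flash_attn.flash_decoding_with_paged_kvcache",  # assumed name
        (q, k_cache, v_cache, block_tables, context_lens),
        out_sinfo=R.Tensor((1, 1, 32, 128), "float16"),
    )
    return out
```

Exposing the kernel this way, rather than through BYOC pattern matching, fits its specialized signature: the page table and per-sequence lengths are runtime arguments that don't map cleanly onto the attention patterns the BYOC path rewrites.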