[PR] [CUDA] Update FlashInfer JIT integration [tvm]

via GitHub Wed, 01 Oct 2025 12:24:09 -0700


MasterJH5574 opened a new pull request, #18353:
URL: https://github.com/apache/tvm/pull/18353


   Following the recent JIT refactor in FlashInfer, which now uses TVM FFI as 
its JIT interface, this PR updates the FlashInfer JIT integration in TVM.
   
   Major changes:
   * We use FlashInfer's `JitSpec.build_and_load` to compile all the 
JIT-generated source files, and remove the corresponding compilation logic 
from TVM.
   * For efficient tensor buffer management and pointer calculation, we now 
require the `byte_offset` field of every auxiliary tensor in the KV cache to be 
zero. The byte offset is instead applied directly to the data pointer.
   * We also add a new parameter to the FlashInfer JIT that controls whether it 
returns a linked shared library or a list of compiled object file paths. For 
unit tests, returning a shared library is convenient and preferred, while for 
cases such as MLC model compilation, object files are needed to serialize the 
compiled model.
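
   The `byte_offset` change above can be illustrated with a minimal sketch. 
It is not the code in this PR: `TensorView`, `fold_byte_offset`, and 
`first_element_address` are hypothetical names, loosely modeled on DLPack's 
`DLTensor`, which carries a `data` pointer and a `byte_offset` field. The idea 
is that folding the offset into the pointer leaves the effective address 
unchanged while letting consumers assume a zero offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorView:
    """Hypothetical stand-in for a DLPack-style tensor descriptor."""

    data: int         # base address of the allocation, in bytes
    byte_offset: int  # offset from `data` to the first element, in bytes


def first_element_address(view: TensorView) -> int:
    """Address that a kernel would actually read from."""
    return view.data + view.byte_offset


def fold_byte_offset(view: TensorView) -> TensorView:
    """Return an equivalent view whose byte_offset is zero.

    The offset is applied to the data pointer, so the address of the
    first element is unchanged.
    """
    return TensorView(data=view.data + view.byte_offset, byte_offset=0)


# Example: a sub-buffer that starts 256 bytes into an allocation at 4096.
original = TensorView(data=4096, byte_offset=256)
folded = fold_byte_offset(original)
```

   With the offset folded in, code that receives `folded` can read `data` 
directly and does not need to add `byte_offset` on every access.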


