MasterJH5574 opened a new pull request, #18353: URL: https://github.com/apache/tvm/pull/18353
Following the recent JIT refactor in FlashInfer, which uses TVM FFI as the JIT interface, this PR updates the FlashInfer JIT integration in TVM.

Major changes:

* We use FlashInfer's `JitSpec.build_and_load` to compile all the JIT-generated source files, and remove the corresponding compilation logic from TVM.
* For efficient tensor buffer management and pointer calculation, we now require the `byte_offset` field of every auxiliary tensor in the KV cache to be zero. The byte offset is instead applied directly to the data pointer.
* We add a new parameter to FlashInfer JIT that controls whether it returns a linked shared library or a list of compiled object file paths. For unit tests, returning a shared library is convenient and preferred; for cases such as MLC model compilation, object files are needed to serialize the compiled model.
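To illustrate the `byte_offset` change, here is a minimal sketch of folding a byte offset into a data pointer. It is not the code in this PR: `TensorView`, `fold_byte_offset`, and `element_address` are hypothetical names, and the class only mimics the `data` and `byte_offset` fields of a DLPack-style tensor using plain integers as addresses.

```python
from dataclasses import dataclass


@dataclass
class TensorView:
    """Hypothetical stand-in for the two DLTensor fields that matter here."""
    data: int         # base address of the underlying allocation
    byte_offset: int  # offset of the first element from `data`, in bytes


def fold_byte_offset(view: TensorView) -> TensorView:
    """Return an equivalent view whose byte_offset is zero.

    The offset is added to the data pointer once, so later address
    calculations no longer need to add it.
    """
    return TensorView(data=view.data + view.byte_offset, byte_offset=0)


def element_address(view: TensorView, index: int, itemsize: int) -> int:
    """Address of element `index` in a contiguous 1-D view."""
    return view.data + view.byte_offset + index * itemsize


if __name__ == "__main__":
    original = TensorView(data=0x1000, byte_offset=64)
    folded = fold_byte_offset(original)
    # Both views address the same element; only the bookkeeping differs.
    assert element_address(original, 3, 4) == element_address(folded, 3, 4)
    print(hex(folded.data), folded.byte_offset)
```

Under this assumption, consumers of the folded view can treat `data` as the address of the first element and skip the offset term entirely.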
