This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-site.git
The following commit(s) were added to refs/heads/main by this push:
new b89bf8f701 Initial post checkin
b89bf8f701 is described below
commit b89bf8f701616b12ff343c571cdaef56a00743f0
Author: tqchen <[email protected]>
AuthorDate: Tue Oct 21 11:39:48 2025 -0700
Initial post checkin
---
_data/menus.yml | 6 +-
_layouts/post.html | 4 +-
_posts/2025-10-21-tvm-ffi.md | 127 +++++++++++++++++++++++++++++++++++
css/custom.scss | 16 +++++
images/tvm-ffi/c_abi.png | Bin 0 -> 346556 bytes
images/tvm-ffi/cuda_export.png | Bin 0 -> 301164 bytes
images/tvm-ffi/interop-challenge.png | Bin 0 -> 228386 bytes
images/tvm-ffi/load_cpp.png | Bin 0 -> 152841 bytes
images/tvm-ffi/load_pytorch.png | Bin 0 -> 73488 bytes
images/tvm-ffi/mydsl.png | Bin 0 -> 103941 bytes
images/tvm-ffi/safecall.png | Bin 0 -> 44343 bytes
images/tvm-ffi/shiponewheel.png | Bin 0 -> 210773 bytes
images/tvm-ffi/throw.png | Bin 0 -> 27243 bytes
images/tvm-ffi/tvm-ffi.png | Bin 0 -> 138495 bytes
images/tvm-ffi/tvmffiany.png | Bin 0 -> 166291 bytes
images/tvm-ffi/tvmffiobject.png | Bin 0 -> 60762 bytes
blog.html => posts.html | 9 ++-
17 files changed, 156 insertions(+), 6 deletions(-)
diff --git a/_data/menus.yml b/_data/menus.yml
index c535446e33..9f02025c49 100644
--- a/_data/menus.yml
+++ b/_data/menus.yml
@@ -2,7 +2,9 @@
link: /community
- name: Download
link: /download
-- name: Docs
- link: https://tvm.apache.org/docs/
+- name: Posts
+ link: /posts
+- name: TVM FFI
+ link: https://github.com/apache/tvm-ffi/
- name: Github
link: https://github.com/apache/tvm/
diff --git a/_layouts/post.html b/_layouts/post.html
index b9c5edee29..7011aef6c3 100644
--- a/_layouts/post.html
+++ b/_layouts/post.html
@@ -27,7 +27,9 @@ layout: default
</span>
</span>{% endif %}</p>
</br>
- {{ content }}
+ <div class="post-content">
+ {{ content }}
+ </div>
</div>
</div>
</div>
diff --git a/_posts/2025-10-21-tvm-ffi.md b/_posts/2025-10-21-tvm-ffi.md
new file mode 100644
index 0000000000..9e49f27d10
--- /dev/null
+++ b/_posts/2025-10-21-tvm-ffi.md
@@ -0,0 +1,127 @@
+---
+ layout: post
+ title: "Building an Open ABI and FFI for ML Systems"
+ date: 2025-10-21
+ author: "Apache TVM FFI Community"
+---
+
+
+
+We are currently living in an exciting era for AI, where machine learning systems and infrastructure are crucial for training and deploying efficient AI models. The modern machine learning systems landscape is rich with diverse components, including popular ML frameworks and array libraries like JAX, PyTorch, and CuPy. It also includes specialized libraries such as FlashAttention, FlashInfer, and cuDNN. Furthermore, there is a growing trend of ML compilers and domain-specific languages [...]
+
+The exciting growth of the ecosystem is the reason for the fast pace of innovation in AI today. However, it also presents a significant challenge: **interoperability**. Many of these components need to integrate with each other. For example, libraries such as FlashInfer and cuDNN need to be integrated into the runtime systems of PyTorch, JAX, and TensorRT, each of which may come with different interface requirements. ML compilers and DSLs also usually expose Python JIT bindings, while also needing to bri [...]
+
+![Interoperability challenge](/images/tvm-ffi/interop-challenge.png){: style="width: 70%; margin: auto; display: block;" }
+
+At the core of these interoperability challenges are the **Application Binary Interface (ABI)** and the **Foreign Function Interface (FFI)**. The **ABI** defines how data structures are stored in memory and precisely what occurs when a function is called. For instance, the way PyTorch stores Tensors may differ from, say, CuPy or NumPy, so we cannot directly pass a torch.Tensor pointer and treat it as a cupy.NDArray. The very nature of machine learning applications usually mandates cross [...]
+
+All of the above observations call for an **ABI and FFI for ML systems** use cases. Looking at the state today, luckily, we do have something to start with: the C ABI, which every programming language speaks and which remains stable over time. Unfortunately, C only covers low-level data types such as int, float, and raw pointers. On the other end of the spectrum, we know that Python must gain first-class support, but there is still a need for different-la [...]
+
+This post introduces TVM FFI, an **open ABI and FFI for machine learning systems**. The project evolved from multiple years of iterating on ABI and calling-convention design in the Apache TVM project. We found that the design can be made generic, independent of the choice of compiler or language, and should benefit the ML systems community. As a result, we distilled it into a minimal library built from the ground up with a clear intention to become an open, standalone library that can be shared and e [...]
+
+- **Stable, minimal C ABI** designed for kernels, DSLs, and runtime extensibility.
+- **Zero-copy interop** across PyTorch, JAX, and CuPy using the [DLPack protocol](https://data-apis.org/array-api/2024.12/design_topics/data_interchange.html).
+- **Compact value and calling convention** covering common data types for ultra-low-overhead ML applications.
+- **Multi-language support out of the box:** Python, C++, and Rust (with a path towards more languages).
+
+![TVM FFI overview](/images/tvm-ffi/tvm-ffi.png){: style="width: 70%; margin: auto; display: block;" }
+
+Importantly, the goal of the project is not to create another framework or language. Instead, it aims to let ML system components do their magic, and enables them to amplify each other more organically.
+
+
+## Technical Design
+
+To start with, we need a mechanism to store the values that are passed across machine learning frameworks. TVM FFI achieves this using a core data structure called TVMFFIAny: a 16-byte C structure that follows the design principle of a tagged union.
+
+![TVMFFIAny layout](/images/tvm-ffi/tvmffiany.png){: style="width: 50%; margin: auto; display: block;" }
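+
+To make the layout concrete, here is a minimal C++ sketch of such a tagged-union value. The field names are illustrative rather than the exact ones in the tvm-ffi headers:
+
+```c++
+#include <cstdint>
+
+// Sketch of a 16-byte tagged-union value: a 4-byte type tag plus padding,
+// followed by an 8-byte payload that holds the actual value.
+extern "C" {
+typedef struct TVMFFIAny {
+  int32_t type_index;   // tag: which kind of value the payload holds
+  int32_t padding;      // reserved; keeps the struct at 16 bytes
+  union {               // 8-byte payload
+    int64_t v_int64;    // integers and booleans
+    double v_float64;   // floating-point values
+    void* v_ptr;        // raw pointers, DLTensor*, or object handles
+  };
+} TVMFFIAny;
+}
+```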
+
+
+
+The objects are managed as intrusive pointers, where TVMFFIObject itself serves as the header that manages type information and deletion. This design allows us to use the same type_index mechanism for the future growth and recognition of new kinds of objects within the FFI, ensuring extensibility. The standalone deleter ensures objects can be safely allocated by one source or language and deleted in another.
+
+![TVMFFIObject layout](/images/tvm-ffi/tvmffiobject.png){: style="width: 50%; margin: auto; display: block;" }
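+
+The object header can be sketched as follows; again, the exact field names in tvm-ffi may differ:
+
+```c++
+#include <cstdint>
+
+// Sketch of the intrusive object header. Heap objects embed this header,
+// so reference counting and deletion work across language boundaries.
+extern "C" {
+typedef struct TVMFFIObject {
+  int64_t ref_count;    // intrusive reference counter
+  int32_t type_index;   // runtime type, extensible through registration
+  int32_t padding;      // alignment
+  // The deleter travels with the object, so whichever language drops the
+  // last reference frees the object with the allocator that created it.
+  void (*deleter)(struct TVMFFIObject* self);
+} TVMFFIObject;
+}
+```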
+
+
+We provide first-class support for owned and unowned Tensors that adopt the DLPack DLTensor layout. Thanks to the collective efforts of the ML systems ecosystem, we can leverage DLPack to bring in tensors/arrays from PyTorch, NumPy, and JAX with first-class support. We also provide support for common data types such as string, array, and map. Generally, these values cover most common machine learning system use cases we know of. The type_index mechanism still leaves room for registerin [...]
+
+As discussed in the overview, we need to treat foreign function calls as first-class citizens. We adopt a single standard C function signature as follows:
+
+![Safe call convention](/images/tvm-ffi/safecall.png){: style="width: 50%; margin: auto; display: block;" }
+
+
+The `handle` contains the pointer to the function object itself, allowing us to support closures. `args` and `num_args` describe the input arguments, and `result` stores the return value. When `args` and `result` contain heap-managed objects, we expect the caller to own them.
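+
+In C, the convention sketched above boils down to a single function-pointer type along the following lines, reusing the TVMFFIAny sketch from earlier (parameter names are illustrative):
+
+```c++
+// Sketch of the type-erased calling convention. A non-zero return value
+// signals an error, which is stored in thread-local storage (see below).
+extern "C" {
+typedef int (*TVMFFISafeCallType)(
+    void* handle,           // the function/closure object being invoked
+    const TVMFFIAny* args,  // input arguments, owned by the caller
+    int32_t num_args,       // number of input arguments
+    TVMFFIAny* result);     // output slot, also owned by the caller
+}
+```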
+
+We call this approach a packed function, as it provides a single signature to represent all functions in a “type-erased” way. It removes the need to declare and JIT-compile a shim for each FFI function call while maintaining reasonable efficiency. This mechanism enables the following scenarios:
+
+- **Calling from Dynamic Languages (e.g., Python):** we provide a tvm_ffi binding that prepares the args by dynamically examining the Python arguments passed in.
+- **Calling from Static Languages (e.g., C++):** for static languages, we can leverage C++ templates to directly instantiate the arguments on the stack, avoiding the need for dynamic examination (see the sketch after this list).
+- **Dynamic Language Callbacks:** the signature enables us to easily bring dynamic language (Python) callbacks in as ffi::Function, as we can take each argument and convert it to a dynamic value.
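+
+As a rough illustration of the static-language path, a C++ binding can pack arguments on the stack with variadic templates. This sketch builds on the TVMFFIAny and TVMFFISafeCallType sketches above; ToAny and the type-index values are hypothetical, not the actual tvm-ffi API:
+
+```c++
+#include <stdexcept>
+
+// Hypothetical converters from C++ values into the tagged-union value
+// (the type_index constants are made up for illustration).
+inline TVMFFIAny ToAny(int64_t v) { TVMFFIAny a{}; a.type_index = 1; a.v_int64 = v; return a; }
+inline TVMFFIAny ToAny(double v) { TVMFFIAny a{}; a.type_index = 2; a.v_float64 = v; return a; }
+
+template <typename... Args>
+TVMFFIAny CallPacked(TVMFFISafeCallType fn, void* handle, Args... args) {
+  // Arguments are materialized directly on the stack: no heap allocation
+  // and no per-signature shim to generate.
+  TVMFFIAny packed[] = {ToAny(args)...};
+  TVMFFIAny result{};
+  if (fn(handle, packed, static_cast<int32_t>(sizeof...(Args)), &result) != 0) {
+    throw std::runtime_error("FFI call failed");  // real bindings fetch the TLS error
+  }
+  return result;
+}
+```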
+
+**Efficiency** In practice, we find this approach is sufficient for machine-learning-focused workloads. For example, we can get to **0.4 us**-level overhead for Python/C++ calls, which is already very close to the limit (for reference, each Python C extension call costs at least **0.1 us**) and much faster than most ML system Python eager use cases, which are usually above the 1-2 us level. When both sides of the call are static languages, the overhead goes down to tens of nanoseconds. As a sid [...]
+
+We support first-class Function objects that allow us to pass functions/closures around between different components, enabling cool usages such as quick Python callbacks for prototyping and dynamic Functor creation for driver-based kernel launching.
+
+**Error handling** Because the function ABI is based on C, we need a method to propagate errors. A non-zero return value of TVMFFISafeCallType indicates an error. We provide a thread-local storage (TLS) based C API to set and fetch errors, and we also build library bindings to automatically translate exceptions. For example, the macro
+
+![Throwing an error](/images/tvm-ffi/throw.png){: style="width: 50%; margin: auto; display: block;" }
+
+will raise an exception that translates into a TypeError in Python. We also preserve and propagate tracebacks across FFI boundaries whenever possible. The TLS-based API is a simple yet effective convention for DSL compilers and libraries to leverage for efficient error propagation.
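+
+In C++ library code the pattern looks roughly like the following, assuming the TVM_FFI_THROW macro shown in the figure above (the header path here is an assumption):
+
+```c++
+#include <dlpack/dlpack.h>
+#include <tvm/ffi/error.h>  // assumed location of TVM_FFI_THROW
+
+// Raise a typed error from an exported function. The binding layer catches
+// it, stores it in the TLS error slot, and returns a non-zero code; the
+// Python side then re-raises it as a TypeError with the traceback preserved.
+void CheckRank(const DLTensor* x) {
+  if (x->ndim != 2) {
+    TVM_FFI_THROW(TypeError) << "expected a 2-D tensor, got ndim=" << x->ndim;
+  }
+}
+```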
+
+**First-class GPU Support for PyTorch** We provide first-class support for torch.Tensor arguments, which are automatically converted to FFI Tensors with zero copies. We also provide a minimal stream context so that the stream is carried over from the PyTorch stream context. In short, calling such a function behaves like calling a normal PyTorch function when passing in torch Tensor arguments.
+
+## Ship One Wheel
+
+TVM FFI provides a minimal pip package that includes libtvm_ffi, which handles essential registration and context management. The package consists of a C++ library that automatically manages function types built upon the C ABI, and a Python library for interacting with this convention.
+Because we have defined a stable ABI for ML systems and kernel libraries, the compiled library is agnostic to the **Python ABI and PyTorch versions,** and can work across multiple Python versions (including free-threaded Python). This allows us to **ship one wheel (library)** for multiple frameworks and Python environments, which greatly simplifies deployment.
+
+![Ship one wheel](/images/tvm-ffi/shiponewheel.png){: style="width: 70%; margin: auto; display: block;" }
+
+
+The figure above shows how this works in practice: most libraries only need to ship `mylib.so` linked against the ABI, and the Python-version-specific apache-tvm-ffi package then handles the bridge to the particular Python version. The same mechanism also works for non-Python inference engines. There are many ways to build a library that targets the tvm-ffi ABI. The following example shows how we can do that in CUDA:
+
+![CUDA export example](/images/tvm-ffi/cuda_export.png){: style="width: 50%; margin: auto; display: block;" }
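+
+Independent of the CUDA example in the figure, the same export can also be hand-written in plain C++ against the raw ABI. The sketch below uses the illustrative structs from earlier and assumes 1-D float32 tensors; a real library would typically use the tvm-ffi C++ helpers, and a CUDA version would launch a kernel instead of the loop:
+
+```c++
+#include <dlpack/dlpack.h>
+
+// Hand-written export against the packed-function ABI sketched earlier:
+// add_one(x, y) computes y = x + 1 elementwise on the CPU.
+// Type-index validation is omitted for brevity.
+extern "C" int add_one(void* /*handle*/, const TVMFFIAny* args,
+                       int32_t num_args, TVMFFIAny* /*result*/) {
+  if (num_args != 2) return -1;  // real code would set the TLS error here
+  const DLTensor* x = static_cast<const DLTensor*>(args[0].v_ptr);
+  DLTensor* y = static_cast<DLTensor*>(args[1].v_ptr);
+  const float* in = static_cast<const float*>(x->data);
+  float* out = static_cast<float*>(y->data);
+  for (int64_t i = 0; i < x->shape[0]; ++i) out[i] = in[i] + 1.0f;
+  return 0;  // zero signals success under the convention above
+}
+```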
+
+
+Once we compile this library into mylib, it can be loaded back into Python or any other runtime that works with TVM FFI.
+
+![Loading from PyTorch](/images/tvm-ffi/load_pytorch.png){: style="width: 50%; margin: auto; display: block;" }
+
+Notably, this same function can be loaded from other runtimes and languages that interface with tvm-ffi. For example, the same example includes loading from C++:
+
+![Loading from C++](/images/tvm-ffi/load_cpp.png){: style="width: 50%; margin: auto; display: block;" }
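+
+As a textual companion to the figure, a minimal C++ sketch of this loading flow might look as follows; the header path and helper names are assumptions based on the quick-start, so treat the figure and docs as canonical:
+
+```c++
+#include <tvm/ffi/extra/module.h>  // assumed header location
+
+int main() {
+  namespace ffi = tvm::ffi;
+  // Load the shared library that was built against the TVM FFI ABI.
+  ffi::Module mod = ffi::Module::LoadFromFile("mylib.so");
+  // Look up the exported packed function by name; it can then be called
+  // like a regular function with DLPack-compatible tensors.
+  ffi::Function add_one = mod->GetFunction("add_one");
+  return 0;
+}
+```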
+
+
+The ABI is designed **with the needs of DSL compilers in mind.** Because the ABI is minimal, we can readily target it from C (or any low-level compiler IR, such as LLVM IR or the MLIR LLVM dialect).
+Once a DSL integrates with the ABI, we can leverage the same flow to load the library back and run it as normal torch functions. Additionally, we can also support JIT mechanisms on top of the same ABI.
+
+![DSL integration](/images/tvm-ffi/mydsl.png){: style="width: 40%; margin: auto; display: block;" }
+
+
+
+As we can see, the common open ABI foundation offers numerous opportunities for ML systems to interoperate. We anticipate that this solution can significantly benefit various aspects of ML systems and AI infrastructure:
+
+* **Kernel libraries**: Ship a single package that supports multiple frameworks, Python versions, and different languages.
+* **Kernel DSLs**: Reuse one ABI for exposing JIT and AOT kernels to frameworks and runtimes.
+* **Frameworks and runtimes**: Offer uniform interop with ABI-compliant libraries and DSLs.
+* **ML infrastructure**: Enable out-of-the-box interoperability across Python, C++, and Rust.
+* **Coding agents**: Establish a unified mechanism for shipping generated code in production.
+
+Currently, the tvm-ffi package offers out-of-the-box support for frameworks like PyTorch, JAX, and CuPy. We are also collaborating with machine learning system builders to develop solutions based on it. For instance, FlashInfer now ships with tvm-ffi, and active work is underway to enable more DSL libraries, agent solutions, and inference runtimes.
+This project is also an important step for Apache TVM itself, as we will start to provide neutral and modular infrastructure pieces that can be broadly useful to the machine learning systems ecosystem.
+
+## Links
+
+TVM FFI is an open convention that is independent of any specific compiler or framework.
+We welcome contributions and encourage the ML systems community to collaborate on improving the open ABI.
+Please check out the following resources:
+
+- GitHub: [https://github.com/apache/tvm-ffi/](https://github.com/apache/tvm-ffi/)
+- [Quick-start examples](https://tvm.apache.org/ffi/get_started/quickstart.html)
+
+## Acknowledgement
+
+The project draws on the collective wisdom of the machine learning systems community and the Python open-source ecosystem, including past development insights of many developers from NumPy, PyTorch, JAX, Caffe, MXNet, XGBoost, CuPy, pybind11, nanobind, and more.
+
+We would specifically like to thank the PyTorch team, JAX team, CUDA Python team, CuTe DSL team, cuTile team, Apache TVM community, XGBoost team, TileLang team, Triton-distributed team, FlashInfer team, SGLang community, TensorRT-LLM community, and the vLLM community for their insightful feedback.
diff --git a/css/custom.scss b/css/custom.scss
index 01da555b30..3412368487 100644
--- a/css/custom.scss
+++ b/css/custom.scss
@@ -58,6 +58,17 @@ ul{
padding:0;
margin:0;
}
+
+/* Re-enable bullets inside blog post content only */
+.post-content ul,
+.post-content ol {
+ list-style: revert;
+ margin-left: 1.25rem;
+ padding-left: 1.25rem;
+}
+.post-content li {
+ list-style-position: outside;
+}
h1 {
font-weight: 400;
font-size: 55px;
@@ -1421,6 +1432,11 @@ table th, table td {
.highlight .w {
color: #bbbbbb;
}
+.bloglist {
+ list-style-type: disc;
+ padding-left: 20px;
+}
+
.highlight {
background-color: #f8f8f8;
}
diff --git a/images/tvm-ffi/c_abi.png b/images/tvm-ffi/c_abi.png
new file mode 100644
index 0000000000..87f64e3fa2
Binary files /dev/null and b/images/tvm-ffi/c_abi.png differ
diff --git a/images/tvm-ffi/cuda_export.png b/images/tvm-ffi/cuda_export.png
new file mode 100644
index 0000000000..9babbf4ced
Binary files /dev/null and b/images/tvm-ffi/cuda_export.png differ
diff --git a/images/tvm-ffi/interop-challenge.png
b/images/tvm-ffi/interop-challenge.png
new file mode 100644
index 0000000000..9449b5fa2f
Binary files /dev/null and b/images/tvm-ffi/interop-challenge.png differ
diff --git a/images/tvm-ffi/load_cpp.png b/images/tvm-ffi/load_cpp.png
new file mode 100644
index 0000000000..21240988ef
Binary files /dev/null and b/images/tvm-ffi/load_cpp.png differ
diff --git a/images/tvm-ffi/load_pytorch.png b/images/tvm-ffi/load_pytorch.png
new file mode 100644
index 0000000000..87612bbbd6
Binary files /dev/null and b/images/tvm-ffi/load_pytorch.png differ
diff --git a/images/tvm-ffi/mydsl.png b/images/tvm-ffi/mydsl.png
new file mode 100644
index 0000000000..e5a08d7284
Binary files /dev/null and b/images/tvm-ffi/mydsl.png differ
diff --git a/images/tvm-ffi/safecall.png b/images/tvm-ffi/safecall.png
new file mode 100644
index 0000000000..d64773c94a
Binary files /dev/null and b/images/tvm-ffi/safecall.png differ
diff --git a/images/tvm-ffi/shiponewheel.png b/images/tvm-ffi/shiponewheel.png
new file mode 100644
index 0000000000..fbf8d4dd54
Binary files /dev/null and b/images/tvm-ffi/shiponewheel.png differ
diff --git a/images/tvm-ffi/throw.png b/images/tvm-ffi/throw.png
new file mode 100644
index 0000000000..cc4f5f86ac
Binary files /dev/null and b/images/tvm-ffi/throw.png differ
diff --git a/images/tvm-ffi/tvm-ffi.png b/images/tvm-ffi/tvm-ffi.png
new file mode 100644
index 0000000000..2bcd65ec65
Binary files /dev/null and b/images/tvm-ffi/tvm-ffi.png differ
diff --git a/images/tvm-ffi/tvmffiany.png b/images/tvm-ffi/tvmffiany.png
new file mode 100644
index 0000000000..ef622e29b1
Binary files /dev/null and b/images/tvm-ffi/tvmffiany.png differ
diff --git a/images/tvm-ffi/tvmffiobject.png b/images/tvm-ffi/tvmffiobject.png
new file mode 100644
index 0000000000..bcf8c72e98
Binary files /dev/null and b/images/tvm-ffi/tvmffiobject.png differ
diff --git a/blog.html b/posts.html
similarity index 71%
rename from blog.html
rename to posts.html
index 3cefd770d9..bcfa3c2b7c 100644
--- a/blog.html
+++ b/posts.html
@@ -1,16 +1,18 @@
---
layout: page
-title : Blog
-header : Blogposts
+title : Posts
+header : Posts
group : blog
order : 100
---
{% include JB/setup %}
-<h1>TVM Community Blog</h1>
+<h1>Posts</h1>
<ul class="bloglist">
{% for post in site.posts %}
+{% assign post_year = post.date | date: "%Y" %}
+{% if post_year >= "2025" %}
<li>
<span>
<a class="post-link" href="{{ post.url | prepend: site.baseurl }}">{{
post.title }}</a>
@@ -20,5 +22,6 @@ order : 100
{{ post.date | date: "%b %-d, %Y" }}
</span>
</li>
+{% endif %}
{% endfor %}
</ul>