This is an automated email from the ASF dual-hosted git repository.
junrushao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git
The following commit(s) were added to refs/heads/main by this push:
new 0369a07 doc: Tensor and DLPack (#363)
0369a07 is described below
commit 0369a07158685dd260f00f81b5ccfa67c35634c6
Author: Junru Shao <[email protected]>
AuthorDate: Sun Jan 4 15:00:16 2026 -0800
doc: Tensor and DLPack (#363)
---
docs/.rstcheck.cfg | 2 +-
docs/concepts/abi_overview.md | 2 +
docs/concepts/tensor.rst | 483 ++++++++++++++++++++++++++++++++++++
docs/get_started/quickstart.rst | 78 +++---
docs/get_started/stable_c_abi.rst | 35 ++-
docs/index.rst | 1 +
docs/packaging/python_packaging.rst | 91 ++++---
7 files changed, 588 insertions(+), 104 deletions(-)
diff --git a/docs/.rstcheck.cfg b/docs/.rstcheck.cfg
index 080a7cc..4e532c1 100644
--- a/docs/.rstcheck.cfg
+++ b/docs/.rstcheck.cfg
@@ -1,5 +1,5 @@
[rstcheck]
report_level = warning
ignore_directives = automodule, autosummary, currentmodule, toctree, ifconfig,
tab-set, collapse, tabs, dropdown
-ignore_roles = ref, cpp:class, cpp:func, py:func, c:macro,
external+data-api:doc, external+scikit_build_core:doc
+ignore_roles = ref, cpp:class, cpp:func, py:func, c:macro,
external+data-api:doc, external+scikit_build_core:doc, external+dlpack:doc
ignore_languages = cpp, python
diff --git a/docs/concepts/abi_overview.md b/docs/concepts/abi_overview.md
index c8e0cd5..7762ede 100644
--- a/docs/concepts/abi_overview.md
+++ b/docs/concepts/abi_overview.md
@@ -184,6 +184,8 @@ and hash TVMFFIAny in bytes for quick equality checks
without going through
type index switching.
:::
+(object-storage-format)=
+
## Object Storage Format
When TVMFFIAny points to a heap-allocated object (such as n-dimensional
arrays),
diff --git a/docs/concepts/tensor.rst b/docs/concepts/tensor.rst
new file mode 100644
index 0000000..17aeef7
--- /dev/null
+++ b/docs/concepts/tensor.rst
@@ -0,0 +1,483 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Tensor and DLPack
+=================
+
+At runtime, TVM-FFI often needs to accept tensors from many sources:
+
+* Frameworks (e.g. PyTorch, JAX) via :py:meth:`array_api.array.__dlpack__`;
+* C/C++ callers passing :c:struct:`DLTensor* <DLTensor>`;
+* Tensors allocated by a library but managed by TVM-FFI itself.
+
+TVM-FFI standardizes on **DLPack as the lingua franca**: tensors are
+built on top of DLPack structs with additional C++ convenience methods
+and minimal extensions for ownership management.
+
+.. tip::
+
+ Prefer :cpp:class:`tvm::ffi::TensorView` or :cpp:class:`tvm::ffi::Tensor` in
C++ code;
+ they provide safer and more convenient abstractions over raw DLPack structs.
+
+
+This tutorial is organized as follows:
+
+* **Tensor Classes**: the available tensor types and which one you should use.
+* **Conversion to and from TVMFFIAny**: how tensors flow across ABI boundaries.
+* **Tensor APIs**: the most important tensor APIs you will use, including allocation and stream handling.
+
+Glossary
+--------
+
+DLPack
+ A cross-library tensor interchange standard defined in the small C header
``dlpack.h``.
+ It defines pure C data structures for describing n-dimensional arrays and
their memory layout,
+ including :c:struct:`DLTensor`, :c:struct:`DLManagedTensorVersioned`,
:c:struct:`DLDataType`,
+ :c:struct:`DLDevice`, and related types.
+
+View (non-owning)
+ A "header" that describes a tensor but does not own its memory. When a
consumer
+ receives a view, it must respect that the producer owns the underlying
storage and controls its
+ lifetime. The view is valid only while the producer guarantees it remains
valid.
+
+Managed object (owning)
+ An object that includes lifetime management, using reference counting or a
cleanup callback
+ mechanism. This establishes a contract between producer and consumer about
when the consumer's ownership ends.
+
+.. note::
+
+ As a loose analogy, think of **view** vs. **managed** as similar to
+ ``T*`` (raw pointer) vs. ``std::shared_ptr<T>`` (reference-counted pointer)
in C++.
+
+Tensor Classes
+--------------
+
+This section defines each tensor type you will encounter in the TVM-FFI C++
API and explains the
+*intended* usage. Exact C layout details are covered later in
:ref:`layout-and-conversion`.
+
+.. tip::
+
+ On the Python side, only :py:class:`tvm_ffi.Tensor` exists. It strictly
follows DLPack semantics for interop and can be converted to PyTorch via
:py:func:`torch.from_dlpack`.
+
+
+DLPack Tensors
+~~~~~~~~~~~~~~
+
+DLPack tensors come in two main flavors:
+
+*Non-owning* object, :c:struct:`DLTensor`
+ The tensor descriptor is a **view** of the underlying data.
+ It describes the device the tensor lives on, its shape, dtype, and data
pointer. It does not own the underlying data.
+
+*Owning* object, :c:struct:`DLManagedTensorVersioned`, or its legacy
counterpart :c:struct:`DLManagedTensor`
+ It is a **managed** variant that wraps a :c:struct:`DLTensor` descriptor with
additional fields.
+ Notably, it includes a ``deleter`` callback that releases ownership when the
consumer is done with the tensor,
+ and an opaque ``manager_ctx`` handle used by the producer to store additional
context.
+
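+To make the ownership contract concrete, below is a minimal sketch of a producer
+handing out an owning :c:struct:`DLManagedTensorVersioned` that wraps a 1-D
+float32 CPU buffer. The helper names are illustrative, not part of DLPack or
+TVM-FFI:
+
+.. code-block:: cpp
+
+   #include <dlpack/dlpack.h>
+
+   #include <cstdint>
+   #include <cstdlib>
+
+   // Deleter: invoked once by the consumer when it is done with the tensor.
+   static void DeleteManaged(DLManagedTensorVersioned* self) {
+     std::free(self->dl_tensor.data);  // data buffer
+     delete[] self->dl_tensor.shape;   // shape array
+     delete self;                      // the managed struct itself
+   }
+
+   // Illustrative producer: allocates the buffer and fills in the descriptor.
+   DLManagedTensorVersioned* MakeOwned1DFloat(int64_t n) {
+     auto* managed = new DLManagedTensorVersioned{};
+     managed->version = {DLPACK_MAJOR_VERSION, DLPACK_MINOR_VERSION};
+     managed->manager_ctx = nullptr;  // opaque producer context, unused here
+     managed->deleter = DeleteManaged;
+     DLTensor* t = &managed->dl_tensor;  // the embedded view descriptor
+     t->data = std::malloc(n * sizeof(float));
+     t->device = {kDLCPU, 0};
+     t->ndim = 1;
+     t->dtype = {kDLFloat, 32, 1};
+     t->shape = new int64_t[1]{n};
+     t->strides = nullptr;  // nullptr means compact row-major layout
+     t->byte_offset = 0;
+     return managed;
+   }
+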
+TVM-FFI Tensors
+~~~~~~~~~~~~~~~
+
+Similarly, TVM-FFI defines two main tensor types in C++:
+
+*Non-owning* object, :cpp:class:`tvm::ffi::TensorView`
+ A thin C++ wrapper around :c:struct:`DLTensor` for inspecting metadata and
accessing the data pointer.
+ It is designed for **kernel authors** to inspect metadata and access the
underlying data pointer during a call,
+ without taking ownership of the tensor's memory. Being a **view** also means
you must ensure the backing tensor remains valid while you use it.
+
+*Owning* object, :cpp:class:`tvm::ffi::TensorObj` and
:cpp:class:`tvm::ffi::Tensor`
+ :cpp:class:`Tensor <tvm::ffi::Tensor>`, similar to
``std::shared_ptr<TensorObj>``, is the managed class to hold heap-allocated
+ :cpp:class:`TensorObj <tvm::ffi::TensorObj>`. Once the reference count drops
to zero, the cleanup logic deallocates the descriptor
+ and releases ownership of the underlying data buffer.
+
+
+.. note::
+
+ - For handwritten C++, prefer TVM-FFI tensors over DLPack's raw C tensors.
+
+ - For compiler development, DLPack's raw C tensors are recommended because
C is easier to target from codegen.
+
+The owning :cpp:class:`Tensor <tvm::ffi::Tensor>` is the recommended interface
for passing around managed tensors.
+Use owning tensors when you need one or more of the following (see the sketch after this list):
+
+* return a tensor from a function across the ABI boundary, where it is converted to :cpp:class:`tvm::ffi::Any`;
+* allocate an output tensor as the producer, and hand it to a kernel consumer;
+* store a tensor in a long-lived object.
+
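+A minimal sketch of the first two cases follows, using the env allocator covered
+later under *Allocation in C++*; ``AddOneCPU`` is an illustrative function, not
+a TVM-FFI API:
+
+.. code-block:: cpp
+
+   namespace ffi = tvm::ffi;
+
+   // Allocate an owning output, fill it, and return it across the ABI.
+   // When exported through TVM-FFI, the returned Tensor is converted into
+   // an owning Any automatically.
+   ffi::Tensor AddOneCPU(ffi::TensorView x) {
+     ffi::Tensor y = ffi::Tensor::FromEnvAlloc(
+         TVMFFIEnvTensorAlloc,
+         /*shape=*/{x.numel()},
+         /*dtype=*/DLDataType({kDLFloat, 32, 1}),
+         /*device=*/DLDevice({kDLCPU, 0}));
+     const float* src = static_cast<const float*>(x.data_ptr());
+     float* dst = static_cast<float*>(y.data_ptr());
+     for (int64_t i = 0; i < x.numel(); ++i) dst[i] = src[i] + 1.0f;
+     return y;  // ownership travels with the returned Tensor
+   }
+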
+.. admonition:: :cpp:class:`TensorObj <tvm::ffi::TensorObj>` vs
:cpp:class:`Tensor <tvm::ffi::Tensor>`
+ :class: hint
+
+ :cpp:class:`Tensor <tvm::ffi::Tensor>` is an intrusive pointer of a
heap-allocated :cpp:class:`TensorObj <tvm::ffi::TensorObj>`.
+ As an analogy to ``std::shared_ptr``, think of
+
+ .. code-block:: cpp
+
+ using Tensor = std::shared_ptr<TensorObj>;
+
+ You can convert between the two types:
+
+ - :cpp:func:`Tensor::get() <tvm::ffi::Tensor::get>` converts it to
:cpp:class:`TensorObj* <tvm::ffi::TensorObj>`.
+ - :cpp:func:`GetRef\<Tensor\> <tvm::ffi::GetRef>` converts a
:cpp:class:`TensorObj* <tvm::ffi::TensorObj>` back to :cpp:class:`Tensor
<tvm::ffi::Tensor>`.
+
+.. _layout-and-conversion:
+
+Tensor Layouts
+~~~~~~~~~~~~~~
+
+:ref:`Figure 1 <fig:layout-tensor>` summarizes the layout relationships among
DLPack tensors and TVM-FFI tensors.
+All tensor classes are POD-like; :cpp:class:`tvm::ffi::TensorObj` is also a
standard TVM-FFI object, typically
+heap-allocated and reference-counted.
+
+.. figure::
https://raw.githubusercontent.com/tlc-pack/web-data/main/images/tvm-ffi/tensor-layout.png
+ :alt: Layout of DLPack Tensors and TVM-FFI Tensors
+ :align: center
+ :name: fig:layout-tensor
+
+ Figure 1. Layout specification of DLPack tensors and TVM-FFI tensors. All
the tensor types share :c:struct:`DLTensor` as the common descriptor, while
carrying different metadata and ownership semantics.
+
+As demonstrated in the figure, all tensor classes share :c:struct:`DLTensor`
as the common descriptor.
+In particular,
+
+- :c:struct:`DLTensor` and :cpp:class:`TensorView <tvm::ffi::TensorView>`
share the exact same memory layout.
+- :c:struct:`DLManagedTensorVersioned` and :cpp:class:`TensorObj
<tvm::ffi::TensorObj>` both have a deleter
+ callback to manage the lifetime of the underlying data buffer, while
:c:struct:`DLTensor` and :cpp:class:`TensorView <tvm::ffi::TensorView>` do not.
+- Compared with :cpp:class:`TensorView <tvm::ffi::TensorView>`,
:cpp:class:`TensorObj <tvm::ffi::TensorObj>`
+ has an extra TVM-FFI object header, making it reference-countable via the
standard managed reference :cpp:class:`Tensor <tvm::ffi::Tensor>`.
+
+What Tensor Is Not
+~~~~~~~~~~~~~~~~~~
+
+TVM-FFI is not a tensor library. While it presents a unified representation
for tensors,
+it does not provide any of the following:
+
+* kernels, such as vector addition, matrix multiplication;
+* host-device copy or synchronization primitives;
+* advanced indexing or slicing;
+* automatic differentiation or computational graph support.
+
+Conversion to and from :cpp:class:`TVMFFIAny`
+---------------------------------------------
+
+At the stable C ABI boundary, TVM-FFI passes values using an "Any-like"
carrier, often referred
+to as :cpp:class:`Any <tvm::ffi::Any>` (owning) or :cpp:class:`AnyView
<tvm::ffi::AnyView>` (non-owning).
+These are 128-bit tagged unions derived from :cpp:class:`TVMFFIAny` that
contain:
+
+* a :cpp:member:`type_index <TVMFFIAny::type_index>` that indicates the type
of the payload, and
+* a union payload that may contain:
+
+ * A1. Primitive values, such as integers, floats, enums, raw pointers, or
+ * A2. TVM-FFI object handles, which are reference-counted pointers.
+
+Specifically for tensors stored in :cpp:class:`Any <tvm::ffi::Any>` or
:cpp:class:`AnyView <tvm::ffi::AnyView>`,
+there are two possible representations:
+
+* Non-owning views as A1 (primitive values), i.e. :c:struct:`DLTensor*
<DLTensor>` whose type index is
:cpp:enumerator:`TVMFFITypeIndex::kTVMFFIDLTensorPtr`.
+* Owning objects as A2 (TVM-FFI tensor object handles), i.e.,
:cpp:class:`TensorObj* <tvm::ffi::TensorObj>` whose type index is
:cpp:enumerator:`TVMFFITypeIndex::kTVMFFITensor`.
+
+Therefore, when you see a tensor in :cpp:class:`Any <tvm::ffi::Any>` or
:cpp:class:`AnyView <tvm::ffi::AnyView>`,
+first check its :cpp:member:`type_index <TVMFFIAny::type_index>` to determine
whether it is a raw pointer or an object handle
+before converting it to the desired tensor type.
+
+.. important::
+
+ As a rule of thumb, an owning object can be converted to a non-owning view,
but not vice versa.
+
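+A sketch of this rule, assuming :cpp:class:`TensorView <tvm::ffi::TensorView>`
+is constructible from an owning :cpp:class:`Tensor <tvm::ffi::Tensor>` (as when
+passing owning tensors to kernel parameters); ``Demo`` is an illustrative name:
+
+.. code-block:: cpp
+
+   void Demo(tvm::ffi::Tensor owned) {
+     // OK: owning -> non-owning view of the same descriptor.
+     tvm::ffi::TensorView view = owned;
+     // Not possible: a TensorView cannot produce an owning Tensor, because
+     // the view does not participate in reference counting.
+   }
+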
+To Non-Owning Tensor
+~~~~~~~~~~~~~~~~~~~~
+
+This converts an owning :cpp:class:`Any <tvm::ffi::Any>` or non-owning
:cpp:class:`AnyView <tvm::ffi::AnyView>` into a non-owning tensor.
+Two type indices can be converted to a non-owning tensor view:
+
+- :cpp:enumerator:`TVMFFITypeIndex::kTVMFFIDLTensorPtr`: the payload is a raw
pointer :c:struct:`DLTensor* <DLTensor>`.
+- :cpp:enumerator:`TVMFFITypeIndex::kTVMFFITensor`: the payload is a TVM-FFI
tensor object handle, from which you can extract the underlying
:c:struct:`DLTensor` according to the layout defined in :ref:`Figure 1
<fig:layout-tensor>`.
+
+The snippets below are plain C (C99-compatible) and assume the TVM-FFI C ABI definitions from
+``tvm/ffi/c_api.h`` are available; ``SUCCESS`` and ``FAILURE`` are illustrative status codes.
+
+.. code-block:: cpp
+
+ // Converts Any/AnyView to DLTensor*
+ int AnyToDLTensorView(const TVMFFIAny* value, DLTensor** out) {
+ if (value->type_index == kTVMFFIDLTensorPtr) {
+ *out = (DLTensor*)value->v_ptr;
+ return SUCCESS;
+ }
+ if (value->type_index == kTVMFFITensor) {
+ // See Figure 1 for layout of tvm::ffi::TensorObj
+ TVMFFIObject* obj = value->v_obj;
+ *out = (DLTensor*)((char*)obj + sizeof(TVMFFIObject));
+ return SUCCESS;
+ }
+ return FAILURE;
+ }
+
+:cpp:class:`TensorView <tvm::ffi::TensorView>` can be constructed directly
from the returned :c:struct:`DLTensor* <DLTensor>`.
+
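+For example, a sketch reusing ``AnyToDLTensorView`` above (``UseAsView`` is an
+illustrative name, and the constructor from :c:struct:`DLTensor* <DLTensor>`
+follows the statement above):
+
+.. code-block:: cpp
+
+   void UseAsView(const TVMFFIAny* any) {
+     DLTensor* dl = nullptr;
+     if (AnyToDLTensorView(any, &dl) == SUCCESS) {
+       // Non-owning: valid only while the producer keeps the tensor alive.
+       tvm::ffi::TensorView view(dl);
+       // ... inspect view.shape(), view.dtype(), view.data_ptr(), etc. ...
+     }
+   }
+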
+To Owning Tensor
+~~~~~~~~~~~~~~~~
+
+This converts an owning :cpp:class:`Any <tvm::ffi::Any>` or non-owning
:cpp:class:`AnyView <tvm::ffi::AnyView>` into an owning :cpp:class:`TensorObj
<tvm::ffi::TensorObj>`. Only type index
:cpp:enumerator:`TVMFFITypeIndex::kTVMFFITensor` can be converted to an owning
tensor because it contains a TVM-FFI tensor object handle. The conversion
involves incrementing the reference count to take ownership.
+
+.. code-block:: cpp
+
+ // Converts Any/AnyView to TensorObj*
+ int AnyToOwnedTensor(const TVMFFIAny* value, TVMFFIObjectHandle* out) {
+ if (value->type_index == kTVMFFITensor) {
+ *out = (TVMFFIObjectHandle)value->v_obj;
+ return SUCCESS;
+ }
+ return FAILURE;
+ }
+
+The caller can obtain shared ownership by calling
:cpp:func:`TVMFFIObjectIncRef` on the returned handle,
+and later release it with :cpp:func:`TVMFFIObjectDecRef`.
+
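+A sketch of that lifetime pattern, reusing ``AnyToOwnedTensor`` above
+(``KeepTensor`` is an illustrative name):
+
+.. code-block:: cpp
+
+   void KeepTensor(const TVMFFIAny* any) {
+     TVMFFIObjectHandle handle = nullptr;
+     if (AnyToOwnedTensor(any, &handle) == SUCCESS) {
+       TVMFFIObjectIncRef(handle);  // ref count +1: we now co-own the tensor
+       // ... stash `handle` in a long-lived structure, use it across calls ...
+       TVMFFIObjectDecRef(handle);  // ref count -1: may trigger cleanup
+     }
+   }
+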
+From Owning Tensor
+~~~~~~~~~~~~~~~~~~
+
+This converts an owning :cpp:class:`TensorObj <tvm::ffi::TensorObj>` to an
owning :cpp:class:`Any <tvm::ffi::Any>` or non-owning :cpp:class:`AnyView
<tvm::ffi::AnyView>`. It sets the type index to
:cpp:enumerator:`TVMFFITypeIndex::kTVMFFITensor` and stores the tensor object
handle in the payload.
+
+.. code-block:: cpp
+
+ // Converts TensorObj* to AnyView
+ int TensorToAnyView(TVMFFIObjectHandle tensor, TVMFFIAny* out_any_view) {
+ out_any_view->type_index = kTVMFFITensor;
+ out_any_view->zero_padding = 0;
+ out_any_view->v_obj = (TVMFFIObject*)tensor;
+ return SUCCESS;
+ }
+
+ // Converts TensorObj* to Any
+ int TensorToAny(TVMFFIObjectHandle tensor, TVMFFIAny* out_any) {
+ TVMFFIAny any_view;
+ int ret = TensorToAnyView(tensor, &any_view);
+ if (ret != SUCCESS) {
+ return ret;
+ }
+ TVMFFIObjectIncRef(tensor);
+ *out_any = any_view;
+ return SUCCESS;
+ }
+
+The call to :cpp:func:`TVMFFIObjectIncRef` gives ``out_any`` shared ownership of the tensor.
+Release it later by calling :cpp:func:`TVMFFIObjectDecRef` on its :cpp:member:`TVMFFIAny::v_obj` field.
+
+From Non-Owning Tensor
+~~~~~~~~~~~~~~~~~~~~~~
+
+This converts a non-owning :cpp:class:`TensorView <tvm::ffi::TensorView>` to
non-owning :cpp:class:`AnyView <tvm::ffi::AnyView>`.
+It sets the type index to :cpp:enumerator:`TVMFFITypeIndex::kTVMFFIDLTensorPtr` and stores the raw :c:struct:`DLTensor* <DLTensor>` in the payload.
+
+.. warning::
+
+ Non-owning :c:struct:`DLTensor` or :cpp:class:`TensorView
<tvm::ffi::TensorView>` can be converted to non-owning :cpp:class:`AnyView
<tvm::ffi::AnyView>`, but cannot be converted to owning :cpp:class:`Any
<tvm::ffi::Any>`.
+
+.. code-block:: cpp
+
+ // Converts DLTensor* to AnyView
+ int DLTensorToAnyView(DLTensor* tensor, TVMFFIAny* out) {
+ out->type_index = kTVMFFIDLTensorPtr;
+ out->zero_padding = 0;
+ out->v_ptr = tensor;
+ return SUCCESS;
+ }
+
+ // Converts TensorView to AnyView
+ int TensorViewToAnyView(const tvm::ffi::TensorView& tensor_view,
TVMFFIAny* out) {
+ return DLTensorToAnyView(tensor_view.GetDLTensorPtr(), out);
+ }
+
+Tensor APIs
+-----------
+
+This section introduces the most important APIs you will use in C++ and
Python. It intentionally
+focuses on introductory, day-to-day methods.
+
+C++ APIs
+~~~~~~~~
+
+**Common pattern**. A typical kernel accepts a :cpp:class:`TensorView <tvm::ffi::TensorView>` parameter,
+validates its metadata (dtype, shape, device), and then accesses its data pointer for computation.
+
+.. code-block:: cpp
+
+ void MyKernel(tvm::ffi::TensorView input, tvm::ffi::TensorView output) {
+ // Validate dtype & device
+ if (input.dtype() != DLDataType{kDLFloat, 32, 1})
+ TVM_FFI_THROW(TypeError) << "Expect float32 input, but got " <<
input.dtype();
+ if (input.device() != DLDevice{kDLCUDA, 0})
+ TVM_FFI_THROW(ValueError) << "Expect input on CUDA:0, but got " <<
input.device();
+ // Access data pointer
+ float* input_data_ptr = static_cast<float*>(input.data_ptr());
+ float* output_data_ptr = static_cast<float*>(output.data_ptr());
+ Kernel<<<...>>>(..., input_data_ptr, output_data_ptr, ...);
+ }
+
+**Metadata APIs**. The example above uses metadata APIs to query tensor shapes, data types,
+device information, data pointers, and more. Common ones include (see the sketch after this list):
+
+ :cpp:func:`TensorView::shape() <tvm::ffi::TensorView::shape>` and
:cpp:func:`Tensor::shape() <tvm::ffi::Tensor::shape>`
+ shape array
+
+ :cpp:func:`TensorView::dtype() <tvm::ffi::TensorView::dtype>` and
:cpp:func:`Tensor::dtype() <tvm::ffi::Tensor::dtype>`
+ element data type
+
+ :cpp:func:`TensorView::data_ptr() <tvm::ffi::TensorView::data_ptr>` and
:cpp:func:`Tensor::data_ptr() <tvm::ffi::Tensor::data_ptr>`
+ base pointer to the tensor's data
+
+ :cpp:func:`TensorView::device() <tvm::ffi::TensorView::device>` and
:cpp:func:`Tensor::device() <tvm::ffi::Tensor::device>`
+ device type and id
+
+ :cpp:func:`TensorView::byte_offset() <tvm::ffi::TensorView::byte_offset>` and
:cpp:func:`Tensor::byte_offset() <tvm::ffi::Tensor::byte_offset>`
+ byte offset to the first element
+
+ :cpp:func:`TensorView::ndim() <tvm::ffi::TensorView::ndim>` and
:cpp:func:`Tensor::ndim() <tvm::ffi::Tensor::ndim>`
+ number of dimensions (:cpp:func:`ShapeView::size
<tvm::ffi::ShapeView::size>`)
+
+ :cpp:func:`TensorView::numel() <tvm::ffi::TensorView::numel>` and
:cpp:func:`Tensor::numel() <tvm::ffi::Tensor::numel>`
+ total number of elements (:cpp:func:`ShapeView::Product
<tvm::ffi::ShapeView::Product>`)
+
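+A small usage sketch of these accessors (``CheckMatrix`` is an illustrative
+helper, and indexing into the shape assumes :cpp:class:`ShapeView
+<tvm::ffi::ShapeView>` supports ``operator[]``):
+
+.. code-block:: cpp
+
+   // Validate a 2-D float32 matrix using only the metadata APIs above.
+   void CheckMatrix(tvm::ffi::TensorView t, int64_t expected_rows) {
+     if (t.ndim() != 2 || t.shape()[0] != expected_rows) {
+       TVM_FFI_THROW(ValueError) << "Expected shape (" << expected_rows << ", N)";
+     }
+     if (t.dtype() != DLDataType{kDLFloat, 32, 1}) {
+       TVM_FFI_THROW(TypeError) << "Expected float32, but got " << t.dtype();
+     }
+   }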
+
+Python APIs
+~~~~~~~~~~~
+
+The Python-facing :py:class:`tvm_ffi.Tensor` is a managed n-dimensional array
that:
+
+* Can be created via :py:func:`tvm_ffi.from_dlpack(ext_tensor, ...)
<tvm_ffi.from_dlpack>` to import tensors from external frameworks, e.g.
:ref:`PyTorch <ship-to-pytorch>`, :ref:`JAX <ship-to-jax>`, :ref:`NumPy/CuPy
<ship-to-numpy>`.
+* Implements the DLPack protocol so it can be passed back to frameworks
without copying, e.g. :py:func:`torch.from_dlpack`.
+
+Typical import pattern:
+
+.. code-block:: python
+
+ import tvm_ffi
+ import torch
+
+ x = torch.randn(1024, device="cuda")
+ t = tvm_ffi.from_dlpack(x, require_contiguous=True)
+
+ # t is a tvm_ffi.Tensor that views the same memory.
+ # You can pass t into TVM-FFI-exposed functions.
+
+Allocation in C++
+~~~~~~~~~~~~~~~~~
+
+TVM-FFI is not a kernel library per se and is not linked to any specific
device memory allocator or runtime.
+However, for kernel library developers, it provides standardized allocation
entry points by
+interfacing with the surrounding framework's allocator. For example, it uses
PyTorch's allocator when running inside
+a PyTorch environment.
+
+**Env Allocator.** Use :cpp:func:`Tensor::FromEnvAlloc() <tvm::ffi::Tensor::FromEnvAlloc>` together with the C API
+:cpp:func:`TVMFFIEnvTensorAlloc` to allocate a tensor using the framework's allocator.
+
+.. code-block:: cpp
+
+ Tensor tensor = Tensor::FromEnvAlloc(
+ TVMFFIEnvTensorAlloc,
+ /*shape=*/{1, 2, 3},
+ /*dtype=*/DLDataType({kDLFloat, 32, 1}),
+ /*device=*/DLDevice({kDLCPU, 0})
+ );
+
+In a PyTorch environment, this is equivalent to :py:func:`torch.empty`.
+
+.. warning::
+
+ While allocation APIs are available, it is generally **recommended** to
avoid allocating tensors inside kernels.
+ Instead, prefer pre-allocating outputs and passing them in as
:cpp:class:`tvm::ffi::TensorView` parameters.
+ Reasons include:
+
+ - Avoiding fragmentation and performance pitfalls;
+ - Avoiding CUDA graph incompatibilities on GPU;
+ - Allowing the outer framework to control allocation policy (pools, device
strategies, etc.).
+
+
+**Custom Allocator.** Use :cpp:func:`Tensor::FromNDAlloc(custom_alloc, ...)
<tvm::ffi::Tensor::FromNDAlloc>`,
+or its advanced variant :cpp:func:`Tensor::FromNDAllocStrided(custom_alloc,
...) <tvm::ffi::Tensor::FromNDAllocStrided>`,
+to allocate a tensor with a user-provided allocation callback.
+
+Below is an example that uses ``cudaMalloc``/``cudaFree`` as custom allocators
for GPU tensors.
+
+.. code-block:: cpp
+
+ struct CUDANDAlloc {
+ void AllocData(DLTensor* tensor) {
+ size_t data_size = ffi::GetDataSize(*tensor);
+ void* ptr = nullptr;
+ cudaError_t err = cudaMalloc(&ptr, data_size);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaMalloc failed: " <<
cudaGetErrorString(err);
+ tensor->data = ptr;
+ }
+
+ void FreeData(DLTensor* tensor) {
+ if (tensor->data != nullptr) {
+ cudaError_t err = cudaFree(tensor->data);
+ TVM_FFI_ICHECK_EQ(err, cudaSuccess) << "cudaFree failed: " <<
cudaGetErrorString(err);
+ tensor->data = nullptr;
+ }
+ }
+ };
+
+ ffi::Tensor cuda_tensor = ffi::Tensor::FromNDAlloc(
+ CUDANDAlloc(),
+ /*shape=*/{3, 4, 5},
+ /*dtype=*/DLDataType({kDLFloat, 32, 1}),
+ /*device=*/DLDevice({kDLCUDA, 0})
+ );
+
+
+
+Stream Handling in C++
+~~~~~~~~~~~~~~~~~~~~~~
+
+Besides tensors, stream context is another key concept in a kernel library. While CUDA itself
+does not track a "current stream" globally, frameworks like PyTorch maintain one per device
+(:py:func:`torch.cuda.current_stream`), and kernel libraries must read the current stream from
+the surrounding environment.
+
+As a hardware-agnostic abstraction layer, TVM-FFI is not linked to any specific stream
+management library. To ensure GPU kernels launch on the correct stream, however, it provides
+standardized APIs to obtain the stream context from the host framework (e.g. PyTorch).
+
+**Obtain Stream Context.** Use the C API :cpp:func:`TVMFFIEnvGetStream` to obtain the current stream for a given device.
+
+.. code-block:: cpp
+
+   void func(ffi::TensorView input, ...) {
+     DLDevice device = input.device();
+     cudaStream_t stream = reinterpret_cast<cudaStream_t>(
+         TVMFFIEnvGetStream(device.device_type, device.device_id));
+   }
+
+which is equivalent to:
+
+.. code-block:: cpp
+
+   void func(at::Tensor input, ...) {
+     c10::Device device = input.device();
+     cudaStream_t stream = reinterpret_cast<cudaStream_t>(
+         c10::cuda::getCurrentCUDAStream(device.index()).stream());
+   }
+
+
+**Auto-Update Stream Context.** When converting framework tensors as mentioned
above, TVM-FFI automatically updates the stream context to match the device of
the converted tensors.
+
+For example, when converting a PyTorch tensor at ``torch.device('cuda:3')``, TVM-FFI
+automatically sets the stream context to ``torch.cuda.current_stream(device='cuda:3')``.
+
+**Set Stream Context.** :py:func:`tvm_ffi.use_torch_stream` and
:py:func:`tvm_ffi.use_raw_stream` are provided to manually update the stream
context when the automatic update is insufficient.
+
+Further Reading
+---------------
+
+- :cpp:class:`TensorObj <tvm::ffi::TensorObj>` and :cpp:class:`Tensor
<tvm::ffi::Tensor>` are part of the standard TVM-FFI object system.
+ See :ref:`Object System <object-storage-format>` for details on how TVM-FFI
objects work.
+- :cpp:class:`AnyView <tvm::ffi::AnyView>` and :cpp:class:`Any
<tvm::ffi::Any>` are part of the stable C ABI.
+ The :doc:`Stable C ABI <../get_started/stable_c_abi>` tutorial explains the ABI design at a high level,
+ and :doc:`ABI Overview <abi_overview>` covers the design in detail.
+- The DLPack specification is documented in the :external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`; see also the :external+dlpack:doc:`C API <c_api>` and :external+dlpack:doc:`Python API <python_spec>` references.
+- Kernel library developers may also refer to
:doc:`../guides/kernel_library_guide` and `FlashInfer
<https://github.com/flashinfer-ai/flashinfer/>`_ for best practices on building
high-performance kernel libraries with TVM-FFI.
diff --git a/docs/get_started/quickstart.rst b/docs/get_started/quickstart.rst
index 8dd0fe5..6d608e7 100644
--- a/docs/get_started/quickstart.rst
+++ b/docs/get_started/quickstart.rst
@@ -20,16 +20,16 @@ Quick Start
.. note::
- All the code in this tutorial can be found under `examples/quickstart
<https://github.com/apache/tvm-ffi/tree/main/examples/quickstart>`_ in the
repository.
+ All the code in this tutorial is under `examples/quickstart
<https://github.com/apache/tvm-ffi/tree/main/examples/quickstart>`_ in the
repository.
This guide walks through shipping a minimal ``add_one`` function that computes
``y = x + 1`` in C++ and CUDA.
TVM-FFI's Open ABI and FFI make it possible to **ship one library** for
multiple frameworks and languages.
We can build a single shared library that works across:
-- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, etc., and
-- **Languages**, e.g. C++, Python, Rust, etc.,
-- **Python ABI versions**, e.g. ship one wheel to support all Python versions,
including free-threaded ones.
+- **ML frameworks**, e.g. PyTorch, JAX, NumPy, CuPy, and others;
+- **Languages**, e.g. C++, Python, Rust, and others;
+- **Python ABI versions**, e.g. one wheel that supports all Python versions,
including free-threaded ones.
.. admonition:: Prerequisite
:class: hint
@@ -39,7 +39,7 @@ We can build a single shared library that works across:
- Compiler: C++17-capable toolchain (GCC/Clang/MSVC)
- Optional ML frameworks for testing: NumPy, PyTorch, JAX, CuPy
- CUDA: Any modern version (if you want to try the CUDA part)
- - TVM-FFI installed via
+ - TVM-FFI installed via:
.. code-block:: bash
@@ -52,7 +52,7 @@ Write a Simple ``add_one``
Source Code
~~~~~~~~~~~
-Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y =
x + 1`` for a 1-D ``float32`` vector. The source code (C++, CUDA) is:
+Suppose we implement a C++ function ``AddOne`` that performs elementwise ``y =
x + 1`` for a 1-D ``float32`` vector. The source code (C++ and CUDA) is:
.. hint::
@@ -84,23 +84,23 @@ Suppose we implement a C++ function ``AddOne`` that
performs elementwise ``y = x
The macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` exports the C++ function
``AddOne``
-as a TVM FFI compatible symbol ``__tvm_ffi_add_one_cpu/cuda``. If
:c:macro:`TVM_FFI_DLL_EXPORT_INCLUDE_METADATA` is set to 1,
+as a TVM-FFI-compatible symbol ``__tvm_ffi_add_one_cpu/cuda``. If
:c:macro:`TVM_FFI_DLL_EXPORT_INCLUDE_METADATA` is set to 1,
it also exports the function's metadata as a symbol
``__tvm_ffi__metadata_add_one_cpu/cuda`` for type checking and stub generation.
-The class :cpp:class:`tvm::ffi::TensorView` allows zero-copy interop with
tensors from different ML frameworks:
+The class :cpp:class:`tvm::ffi::TensorView` enables zero-copy interop with
tensors from different ML frameworks:
- NumPy, CuPy,
- PyTorch, JAX, or
- any array type that supports the standard :external+data-api:doc:`DLPack
protocol <design_topics/data_interchange>`.
-Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch
a kernel on the caller's stream.
+Finally, :cpp:func:`TVMFFIEnvGetStream` can be used in the CUDA code to launch
kernels on the caller's stream.
.. _sec-cpp-compile-with-tvm-ffi:
Compile with TVM-FFI
~~~~~~~~~~~~~~~~~~~~
-**Raw command.** We can use the following minimal commands to compile the
source code:
+**Raw command.** Use the following minimal commands to compile the source code:
.. tabs::
@@ -118,16 +118,16 @@ Compile with TVM-FFI
:start-after: [cuda_compile.begin]
:end-before: [cuda_compile.end]
-This step produces a shared library ``add_one_cpu.so`` and ``add_one_cuda.so``
that can be used across languages and frameworks.
+These steps produce shared libraries ``add_one_cpu.so`` and
``add_one_cuda.so`` that can be used across languages and frameworks.
.. hint::
- For a single-file C++/CUDA project, a convenient method
:py:func:`tvm_ffi.cpp.load_inline`
- is provided to minimize boilerplate code in compilation, linking, and
loading.
+ For a single-file C++/CUDA project, :py:func:`tvm_ffi.cpp.load_inline`
+ minimizes boilerplate for compilation, linking, and loading.
**CMake.** CMake is the preferred approach for building across platforms.
-TVM-FFI natively integrates with CMake via ``find_package`` as demonstrated
below:
+TVM-FFI integrates with CMake via ``find_package`` as demonstrated below:
.. tabs::
@@ -158,19 +158,19 @@ TVM-FFI natively integrates with CMake via
``find_package`` as demonstrated belo
add_library(add_one_cuda SHARED compile/add_one_cuda.cu)
tvm_ffi_configure_target(add_one_cuda)
-**Artifact.** The resulting ``add_one_cpu.so`` and ``add_one_cuda.so`` are
minimal libraries that are agnostic to:
+**Artifact.** The resulting ``add_one_cpu.so`` and ``add_one_cuda.so`` are
small libraries that are agnostic to:
-- Python version/ABI. It is not compiled/linked with Python and depends only
on TVM-FFI's stable C ABI;
-- Languages, including C++, Python, Rust or any other language that can
interop with C ABI;
-- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or anything with standard
:external+data-api:doc:`DLPack protocol <design_topics/data_interchange>`.
+- Python version/ABI. They are not compiled or linked with Python and depend
only on TVM-FFI's stable C ABI;
+- Languages, including C++, Python, Rust, or any other language that can
interop with the C ABI;
+- ML frameworks, such as PyTorch, JAX, NumPy, CuPy, or any array library that
implements the standard :external+data-api:doc:`DLPack protocol
<design_topics/data_interchange>`.
.. _sec-use-across-framework:
Ship Across ML Frameworks
-------------------------
-TVM-FFI's Python package provides :py:func:`tvm_ffi.load_module`, which can
load either
-the ``add_one_cpu.so`` or ``add_one_cuda.so`` into :py:class:`tvm_ffi.Module`.
+TVM-FFI's Python package provides :py:func:`tvm_ffi.load_module` to load either
+``add_one_cpu.so`` or ``add_one_cuda.so`` into a :py:class:`tvm_ffi.Module`.
.. code-block:: python
@@ -179,7 +179,7 @@ the ``add_one_cpu.so`` or ``add_one_cuda.so`` into
:py:class:`tvm_ffi.Module`.
func : tvm_ffi.Function = mod.add_one_cpu
``mod.add_one_cpu`` retrieves a callable :py:class:`tvm_ffi.Function` that
accepts tensors from host frameworks
-directly. This process is done zero-copy, without any boilerplate code, under
extremely low latency.
+directly. This is zero-copy, requires no boilerplate code, and adds very
little overhead.
We can then use these functions in the following ways:
@@ -198,13 +198,13 @@ PyTorch
JAX
~~~
-Support via `nvidia/jax-tvm-ffi <https://github.com/nvidia/jax-tvm-ffi>`_.
This can be installed via
+Support is provided via `nvidia/jax-tvm-ffi
<https://github.com/nvidia/jax-tvm-ffi>`_. Install it with:
.. code-block:: bash
pip install jax-tvm-ffi
-After installation, ``add_one_cuda`` can be registered as a target to JAX's
``ffi_call``.
+After installation, ``add_one_cuda`` can be registered as a target for JAX's
``ffi_call``.
.. code-block:: python
@@ -248,9 +248,9 @@ NumPy/CuPy
Ship Across Languages
---------------------
-TVM-FFI's core loading mechanism is ABI stable and works across language
boundaries.
-A single library can be loaded in every language TVM-FFI supports,
-without having to recompile different libraries targeting different ABIs or
languages.
+TVM-FFI's core loading mechanism is ABI-stable and works across language
boundaries.
+A single library can be loaded in any language TVM-FFI supports,
+without recompiling for different ABIs or languages.
.. _ship-to-python:
@@ -258,8 +258,8 @@ Python
~~~~~~
As shown in the :ref:`previous section<sec-use-across-framework>`,
:py:func:`tvm_ffi.load_module` loads a language-
-and framework-independent ``add_one_cpu.so`` or ``add_one_cuda.so`` and can be
used to incorporate it into all Python
-array frameworks that implement the standard :external+data-api:doc:`DLPack
protocol <design_topics/data_interchange>`.
+and framework-independent ``add_one_cpu.so`` or ``add_one_cuda.so`` and can be
used with any Python
+array framework that implements the standard :external+data-api:doc:`DLPack
protocol <design_topics/data_interchange>`.
.. _ship-to-cpp:
@@ -267,7 +267,7 @@ C++
~~~
TVM-FFI's C++ API :cpp:func:`tvm::ffi::Module::LoadFromFile` loads
``add_one_cpu.so`` or ``add_one_cuda.so`` and
-can be used directly in C/C++ with no Python dependency.
+can be used directly from C/C++ without a Python dependency.
.. literalinclude:: ../../examples/quickstart/load/load_cpp.cc
:language: cpp
@@ -290,13 +290,13 @@ Compile and run it with:
.. note::
- Don't like loading shared libraries? Static linking is also supported.
+ Prefer not to load shared libraries? Static linking is also supported.
- In such cases, we can use :cpp:func:`tvm::ffi::Function::FromExternC` to
create a
+ In such cases, use :cpp:func:`tvm::ffi::Function::FromExternC` to create a
:cpp:class:`tvm::ffi::Function` from the exported symbol, or directly use
:cpp:func:`tvm::ffi::Function::InvokeExternC` to invoke the function.
- This feature can be useful on iOS, or when the exported module is generated
by another DSL compiler matching the ABI.
+ This feature can be useful on iOS, or when the exported module is generated
by another DSL compiler targeting the ABI.
.. code-block:: cpp
@@ -321,7 +321,7 @@ Rust
TVM-FFI's Rust API ``tvm_ffi::Module::load_from_file`` loads
``add_one_cpu.so`` or ``add_one_cuda.so`` and
then retrieves a function ``add_one_cpu`` or ``add_one_cuda`` from it.
-This procedure is identical to those in C++ and Python:
+This mirrors the C++ and Python flows:
.. code-block:: rust
@@ -336,8 +336,8 @@ This procedure is identical to those in C++ and Python:
.. hint::
- We can also use the Rust API to target the TVM FFI ABI. This means we can
use Rust to write the function
- implementation and export to Python/C++ in the same fashion.
+ You can also use the Rust API to target the TVM-FFI ABI. This lets you
write the function
+ implementation in Rust and export it to Python/C++ in the same way.
Troubleshooting
@@ -351,7 +351,7 @@ Troubleshooting
Further Reading
---------------
-- :doc:`Python Packaging <../packaging/python_packaging>` provides details on
ABI-agnostic Python wheel building, as well as
- exposing functions, classes and C symbols from TVM-FFI modules.
-- :doc:`Stable C ABI <stable_c_abi>` explains the ABI in depth and how it
enables stability guarantee. Its C examples demonstrate
- how to interoperate through the stable C ABI from both callee and caller
sides.
+- :doc:`Python Packaging <../packaging/python_packaging>` provides details on
ABI-agnostic Python wheel builds and on
+ exposing functions, classes, and C symbols from TVM-FFI modules.
+- :doc:`Stable C ABI <stable_c_abi>` explains the ABI in depth and the
stability guarantees it enables. Its C examples demonstrate
+ how to interoperate through the stable C ABI from both the callee and caller
sides.
diff --git a/docs/get_started/stable_c_abi.rst
b/docs/get_started/stable_c_abi.rst
index bcfe491..abab5c2 100644
--- a/docs/get_started/stable_c_abi.rst
+++ b/docs/get_started/stable_c_abi.rst
@@ -20,7 +20,7 @@ Stable C ABI
.. note::
- All code used in this guide lives under
+ All code used in this guide is under
`examples/stable_c_abi
<https://github.com/apache/tvm-ffi/tree/main/examples/stable_c_abi>`_.
.. admonition:: Prerequisite
@@ -34,11 +34,10 @@ Stable C ABI
pip install --reinstall --upgrade apache-tvm-ffi
-This guide introduces TVM-FFI's stable C ABI: a single, minimal and stable
-ABI that represents any cross-language calls, with DSL and ML compiler codegen
-in mind.
+This guide introduces TVM-FFI's stable C ABI: a single, minimal ABI that
represents
+cross-language calls and is designed for DSL and ML compiler codegen.
-TVM-FFI builds on the following key idea:
+TVM-FFI is built around the following key idea:
.. _tvm_ffi_c_abi:
@@ -56,19 +55,19 @@ TVM-FFI builds on the following key idea:
TVMFFIAny* result, // output: *result
);
- where :cpp:class:`TVMFFIAny`, is a tagged union of all supported types, e.g.
integers, floats, Tensors, strings, etc., and can be further extended to
arbitrary user-defined types.
+ where :cpp:class:`TVMFFIAny` is a tagged union of all supported types, e.g.
integers, floats, tensors, strings, and more, and can be extended to
user-defined types.
-Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for
all functions, and further provides an extensible, performant, and
ecosystem-friendly open solution for all.
+Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for
all functions and provides an extensible, performant, and ecosystem-friendly
solution.
The rest of this guide covers:
- The stable C layout and calling convention of ``tvm_ffi_c_abi``;
-- C examples from both callee and caller side of this ABI.
+- C examples from both the callee and caller side of this ABI.
Stable C Layout
---------------
-TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` uses a stable layout for all the input
and output arguments.
+TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` uses a stable layout for all input and
output arguments.
Layout of :cpp:class:`TVMFFIAny`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -76,7 +75,7 @@ Layout of :cpp:class:`TVMFFIAny`
:cpp:class:`TVMFFIAny` is a fixed-size (128-bit) tagged union that represents
all supported types.
- First 32 bits: type index indicating which value is stored (supports up to
2^32 types).
-- Next 32 bits: reserved (used for flags in rare cases, e.g. small-string
optimization).
+- Next 32 bits: reserved (used for flags in rare cases, e.g., small-string
optimization).
- Last 64 bits: payload that is either a 64-bit integer, a 64-bit
floating-point number, or a pointer to a heap-allocated object.
.. figure::
https://raw.githubusercontent.com/tlc-pack/web-data/main/images/tvm-ffi/stable-c-abi-layout-any.svg
@@ -137,9 +136,9 @@ Stable ABI in C Code
You can build and run the examples either with raw compiler commands or with
CMake.
Both approaches are demonstrated below.
-TVM FFI's :ref:`C ABI <tvm_ffi_c_abi>` is designed with DSL and ML compilers
in mind. DSL codegen usually relies on MLIR, LLVM or low-level C as the
compilation target, where no access to C++ features is available, and where
stable C ABIs are preferred for simplicity and stability.
+TVM-FFI's :ref:`C ABI <tvm_ffi_c_abi>` is designed with DSL and ML compilers
in mind. DSL codegen often targets MLIR, LLVM, or low-level C, where C++
features are unavailable and stable C ABIs are preferred for simplicity and
stability.
-This section shows how to write C code that follows the stable C ABI.
Specifically, we provide two examples:
+This section shows how to write C code that follows the stable C ABI using two
examples:
- Callee side: A CPU ``add_one_cpu`` kernel in C that is equivalent to the
:ref:`C++ example <cpp_add_one_kernel>`.
- Caller side: A loader and runner in C that invokes the kernel, a direct C
translation of the :ref:`C++ example <ship-to-cpp>`.
@@ -149,11 +148,11 @@ The C code is minimal and dependency-free, so it can
serve as a direct reference
Callee: ``add_one_cpu`` Kernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Below is a minimal ``add_one_cpu`` kernel in C that follows the stable C ABI.
It has three steps:
+Below is a minimal ``add_one_cpu`` kernel in C that follows the stable C ABI
in three steps:
- **Step 1**. Extract input ``x`` and output ``y`` as DLPack tensors;
- **Step 2**. Implement the kernel ``y = x + 1`` on CPU with a simple for-loop;
-- **Step 3**. Set the output result to ``result``.
+- **Step 3**. Set the output result in ``result``.
.. literalinclude:: ../../examples/stable_c_abi/src/add_one_cpu.c
:language: c
@@ -188,7 +187,7 @@ Build it with either approach:
**C vs. C++.** Compared to the :ref:`C++ example <cpp_add_one_kernel>`, there
are a few key differences:
- The explicit marshalling in **Step 1** is only needed in C. In C++,
templates hide these details.
-- The C++ macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` (used to export
``add_one_cpu``) is not needed in C, because this example directly defines the
exported C symbol ``__tvm_ffi_add_one_cpu``.
+- The C++ macro :c:macro:`TVM_FFI_DLL_EXPORT_TYPED_FUNC` (used to export
``add_one_cpu``) is not needed in C, since this example directly defines the
exported C symbol ``__tvm_ffi_add_one_cpu``.
.. hint::
@@ -200,7 +199,7 @@ Build it with either approach:
Caller: Kernel Loader
~~~~~~~~~~~~~~~~~~~~~
-Next, a minimal C loader invokes the ``add_one_cpu`` kernel. It is
functionally identical to the :ref:`C++ example <ship-to-cpp>` and performs:
+Next, a minimal C loader invokes the ``add_one_cpu`` kernel. It mirrors the
:ref:`C++ example <ship-to-cpp>` and performs:
- **Step 1**. Load the shared library ``build/add_one_cpu.so`` that contains
the kernel;
- **Step 2**. Get function ``add_one_cpu`` from the library;
@@ -238,7 +237,7 @@ Build and run the loader with either approach:
cmake --build build --config RelWithDebInfo
build/load
-To call a function via the stable C ABI in C, idiomatically:
+In C, the idiomatic steps to call a function via the stable C ABI are:
- Convert input arguments to the :cpp:class:`TVMFFIAny` type;
- Call the target function (e.g., ``add_one_cpu``) via
:cpp:func:`TVMFFIFunctionCall`;
@@ -247,7 +246,7 @@ To call a function via the stable C ABI in C, idiomatically:
What's Next
-----------
-**ABI specification.** See the complete ABI specification in
:doc:`../concepts/abi_overview`.
+**ABI specification.** See the full ABI specification in
:doc:`../concepts/abi_overview`.
**Convenient compiler target.** The stable C ABI is a simple, portable codegen
target for DSL compilers. Emit C that follows this ABI to integrate with
TVM-FFI and call the result from multiple languages and frameworks. See
:doc:`../concepts/abi_overview`.
diff --git a/docs/index.rst b/docs/index.rst
index 2387846..f6eced7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -59,6 +59,7 @@ Table of Contents
:caption: Concepts
concepts/abi_overview.md
+ concepts/tensor.rst
.. toctree::
:maxdepth: 1
diff --git a/docs/packaging/python_packaging.rst
b/docs/packaging/python_packaging.rst
index c3d8f81..b89abc0 100644
--- a/docs/packaging/python_packaging.rst
+++ b/docs/packaging/python_packaging.rst
@@ -18,10 +18,10 @@
Python Packaging
================
-This guide walks through a small but complete workflow for packaging a TVM-FFI
extension
+This guide walks through a small, complete workflow for packaging a TVM-FFI
extension
as a Python wheel. The goal is to help you wire up a simple extension, produce
a wheel,
and ship user-friendly typing annotations without needing to know every detail
of TVM
-internals. We will cover three checkpoints:
+internals. We cover three checkpoints:
- Export C++ to Python;
- Build Python wheel;
@@ -29,7 +29,7 @@ internals. We will cover three checkpoints:
.. note::
- All code used in this guide lives under
+ All code used in this guide is under
`examples/python_packaging
<https://github.com/apache/tvm-ffi/tree/main/examples/python_packaging>`_.
.. admonition:: Prerequisite
@@ -55,19 +55,19 @@ Include the umbrella header to access the core TVM-FFI C++
API.
TVM-FFI offers three ways to expose code:
-- C symbols in TVM FFI ABI: Export code as plain C symbols. This is the
recommended way for
- most usecases as it keeps the boundary thin and works well with compiler
codegen;
-- Functions: Reflect functions via the global registry;
-- Classes: Register C++ classes derived from :cpp:class:`tvm::ffi::Object` to
Python dataclasses.
+- C symbols in the TVM-FFI ABI: export code as plain C symbols. This is the
recommended way for
+ most use cases because it keeps the boundary thin and works well with
compiler codegen;
+- Functions: expose functions via the global registry;
+- Classes: register C++ classes derived from :cpp:class:`tvm::ffi::Object` as
Python dataclasses.
-Metadata is automatically captured and is later be turned into type hints for
proper LSP help.
+Metadata is captured automatically and later turned into type hints for LSP
support.
TVM-FFI ABI (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~
If you prefer to export plain C symbols, TVM-FFI provides helpers to make them
accessible
-to Python. This option keeps the boundary thin and works well with LLVM
compilers where
-C symbols are easier to call into.
+from Python. This option keeps the boundary thin and works well with
LLVM-based compilers where
+C symbols are easier to call.
.. tabs::
@@ -107,8 +107,8 @@ C symbols are easier to call into.
Global Function
~~~~~~~~~~~~~~~
-This example registers a function into the global registry and then calls it
from Python.
-It registry handles type translation, error handling, and metadata.
+This example registers a function in the global registry and then calls it
from Python.
+The registry handles type translation, error handling, and metadata.
.. tabs::
@@ -164,7 +164,7 @@ It registry handles type translation, error handling, and
metadata.
Class
~~~~~
-Any class derived from :cpp:class:`tvm::ffi::Object` can be registered,
exported and
+Any class derived from :cpp:class:`tvm::ffi::Object` can be registered,
exported, and
instantiated from Python. The reflection helper
:cpp:class:`tvm::ffi::reflection::ObjectDef`
makes it easy to expose:
@@ -205,7 +205,7 @@ makes it easy to expose:
import my_ffi_extension
pair = my_ffi_extension.IntPair(1, 2)
- pair.sum() # -> 3
+ pair.sum() # -> 3
.. group-tab:: Python (Generated)
@@ -227,11 +227,11 @@ makes it easy to expose:
Build Python Wheel
------------------
-Once the C++ side is ready, TVM-FFI provides convenient helpers to build and
ship
-ABI-agnostic Python extensions using any standard packaging tool.
+Once the C++ side is ready, TVM-FFI provides helpers to build and ship
+ABI-agnostic Python extensions using standard packaging tools.
The flow below uses :external+scikit_build_core:doc:`scikit-build-core <index>`
-that drives CMake build, but the same ideas translate to setuptools or other
:pep:`517` backends.
+to drive a CMake build, but the same ideas apply to setuptools or other
:pep:`517` backends.
CMake Target
~~~~~~~~~~~~
@@ -244,19 +244,19 @@ creates a shared target ``my_ffi_extension`` and
configures it against TVM-FFI.
:start-after: [example.cmake.begin]
:end-before: [example.cmake.end]
-Function ``tvm_ffi_configure_target`` sets up TVM-FFI include paths, link
against TVM-FFI library,
+Function ``tvm_ffi_configure_target`` sets up TVM-FFI include paths, links
against the TVM-FFI library,
generates stubs under ``STUB_DIR``, and can scaffold stub files when
``STUB_INIT`` is
enabled.
-Function ``tvm_ffi_install`` places necessary information, e.g. debug symbols
in macOS, next to
-the shared library for proper packaging.
+Function ``tvm_ffi_install`` places necessary information (e.g., debug symbols
on macOS) next to
+the shared library for packaging.
Python Build Backend
~~~~~~~~~~~~~~~~~~~~
-Define a :pep:`517` build backend in ``pyproject.toml``, with the following
steps:
+Define a :pep:`517` build backend in ``pyproject.toml`` with the following
steps:
-- Sepcfiy ``apache-tvm-ffi`` as a build requirement, so that CMake can find
TVM-FFI;
+- Specify ``apache-tvm-ffi`` as a build requirement, so that CMake can find
TVM-FFI;
- Configure ``wheel.py-api`` that indicates a Python ABI-agnostic wheel;
- Specify the source directory of the package via ``wheel.packages``, and the
installation
destination via ``wheel.install-dir``.
@@ -266,13 +266,13 @@ Define a :pep:`517` build backend in ``pyproject.toml``,
with the following step
:start-after: [pyproject.build.begin]
:end-before: [pyproject.build.end]
-Once fully specified, scikit-build-core will invoke CMake and drive the
extension building process.
+Once specified, scikit-build-core will invoke CMake and drive the extension
build.
Wheel Auditing
~~~~~~~~~~~~~~
-**Build wheels**. The wheel can be built using the standard workflows, e.g.:
+**Build wheels**. You can build wheels using standard workflows, for example:
- `pip workflow <https://pip.pypa.io/en/stable/cli/pip_wheel/>`_ or `editable
install
<https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs>`_
@@ -295,9 +295,9 @@ Wheel Auditing
cibuildwheel --output-dir dist
-**Audit wheels**. In practice, an extra step is usually necessary to remove
redundant
-and error-prone shared library dependencies. In our case, given
``libtvm_ffi.so``
-(or its respective platform variants) is guaranteed to be loaded by importing
``tvm_ffi``,
+**Audit wheels**. In practice, an extra step is usually needed to remove
redundant
+and error-prone shared library dependencies. In our case, because
``libtvm_ffi.so``
+(or its platform variants) is guaranteed to be loaded by importing ``tvm_ffi``,
we can safely exclude this dependency from the final wheel.
.. code-block:: bash
@@ -322,11 +322,11 @@ corresponding Python code **inline** and **statically**.
Inline Directives
~~~~~~~~~~~~~~~~~
-Similar to linter tools, ``tvm-ffi-stubgen`` uses special comments
+Like linter tools, ``tvm-ffi-stubgen`` uses special comments
to identify what to generate and where to write generated code.
-**Directive 1 (Global functions)**. Example below shows an directive
-``global/${prefix}`` marking a type stub section of global functions.
+**Directive 1 (Global functions)**. The example below shows a directive
+``global/${prefix}`` that marks a type stub section for global functions.
.. code-block:: python
@@ -340,10 +340,10 @@ to identify what to generate and where to write generated
code.
Running ``tvm-ffi-stubgen`` fills in the function stubs between the
``begin`` and ``end`` markers based on the loaded registry, and in this case
-introduces all the global functions named ``my_ext.arith.*``.
+adds all the global functions named ``my_ext.arith.*``.
-**Directive 2 (Classes)**. Example below shows an directive
-``object/${type_key}`` marking the fields and methods of a registered class.
+**Directive 2 (Classes)**. The example below shows a directive
+``object/${type_key}`` that marks the fields and methods of a registered class.
.. code-block:: python
@@ -360,12 +360,12 @@ introduces all the global functions named
``my_ext.arith.*``.
Directive-based Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~
-After TVM-FFI extension is built as a shared library, say at
-``build/libmy_ffi_extension.so``
+After the TVM-FFI extension is built as a shared library, for example at
+``build/libmy_ffi_extension.so``:
**Command line tool**. The command below generates stubs for
the package located at ``python/my_ffi_extension``, updating
-all sections marked by the directives.
+all sections marked by directives.
.. code-block:: bash
@@ -384,15 +384,14 @@ every time the target is built.
STUB_DIR "python"
)
-Inside the function, CMake manages to find proper ``--dlls`` arguments
+Inside the function, CMake derives the proper ``--dlls`` arguments
via ``$<TARGET_FILE:${target}>``.
Scaffold Missing Directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Command line tool**. Beyond updating existing directives, ``tvm-ffi-stubgen``
-can be used to scaffold missing directives if they are not yet present in the
-package with a few extra flags.
+can scaffold missing directives with a few extra flags.
.. code-block:: bash
@@ -405,11 +404,11 @@ package with a few extra flags.
- ``--init-pypkg <pypkg>``: Specifies the name of the Python package to
initialize, e.g. ``apache-tvm-ffi``, ``my-ffi-extension``;
- ``--init-lib <libtarget>``: Specifies the name of the CMake target (shared
library) to load for reflection metadata;
-- ``--init-prefix <prefix>``: Specifies the registry prefix to include for
stub generation, e.g. ``my_ffi_extension.``. If names of global functions or
classes start with this prefix, they will be included in the generated stubs.
+- ``--init-prefix <prefix>``: Specifies the registry prefix to include for
stub generation, e.g. ``my_ffi_extension.``. If global function or class names
start with this prefix, they will be included in the generated stubs.
**CMake Integration**. CMake function ``tvm_ffi_configure_target``
-also supports scaffolding missing directives via the extra options
-``STUB_INIT``, ``STUB_PKG``, and ``STUB_PREFIX``.
+also supports scaffolding missing directives via the ``STUB_INIT``,
``STUB_PKG``,
+and ``STUB_PREFIX`` options.
.. code-block:: cmake
@@ -424,7 +423,7 @@ based on the target and package information already
specified.
Other Directives
~~~~~~~~~~~~~~~~
-All the supported directives are documented via:
+All supported directives are documented via:
.. code-block:: bash
@@ -448,8 +447,8 @@ It includes:
from typing import Any, Callable
# tvm-ffi-stubgen(end)
-**Directive 4 (Export)**. It re-exports names defined in `_ffi_api.__all__`
into the current file. Usually
-used in ``__init__.py`` to aggregate all exported names. Example:
+**Directive 4 (Export)**. It re-exports names defined in ``_ffi_api.__all__`` into the current file, usually
+in ``__init__.py`` to aggregate exported names. Example:
.. code-block:: python
@@ -481,7 +480,7 @@ classes and functions, as well as ``LIB`` if present. It's
usually placed at the
# tvm-ffi-stubgen(ty-map): ffi.reflection.AccessStep ->
ffi.access_path.AccessStep
-means the class with type key ``ffi.reflection.AccessStep``, is instead class
``ffi.access_path.AccessStep``
+means the class with type key ``ffi.reflection.AccessStep`` is mapped to
``ffi.access_path.AccessStep``
in Python.
**Directive 7 (Import object)**. It injects a custom import into generated
code, optionally