gemini-code-assist[bot] commented on code in PR #382:
URL: https://github.com/apache/tvm-ffi/pull/382#discussion_r2660657359

##########
docs/concepts/any.rst:
##########
@@ -0,0 +1,460 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements. See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership. The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied. See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
+Any and AnyView
+===============
+
+How do you pass arbitrary values across language boundaries - integers, floats, tensors,
+functions, custom objects - without writing marshalling code for every type combination?
+
+TVM-FFI's answer is :cpp:class:`tvm::ffi::Any` and :cpp:class:`tvm::ffi::AnyView`:
+a type-erased container that can hold any supported value and transport it across
+C, C++, Python, and Rust boundaries through a stable ABI.
+
+Similar to ``std::any``, :cpp:class:`~tvm::ffi::Any` is a tagged union that can store
+values of any type. Unlike ``std::any``, it is designed for inter-language
+communication, with a fixed 16-byte layout and built-in support for reference counting
+and ownership semantics.
+
+This tutorial covers the full Any system: its ownership semantics, memory layout,
+and the patterns that make cross-language interop seamless.
+
+
+Ownership
+---------
+
+The core distinction between :cpp:class:`tvm::ffi::Any` and
+:cpp:class:`tvm::ffi::AnyView` is **ownership**:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 35 40
+
+   * - Aspect
+     - :cpp:class:`~tvm::ffi::AnyView`
+     - :cpp:class:`~tvm::ffi::Any`
+   * - Ownership
+     - Non-owning (like ``std::string_view``)
+     - Owning (like ``std::string``)
+   * - Reference counting
+     - No refcount changes on copy
+     - Increments refcount on copy; decrements on destroy
+   * - Lifetime
+     - Valid only while source lives
+     - Extends object lifetime
+   * - Primary use
+     - Function inputs
+     - Return values, storage
+
+:cpp:class:`~tvm::ffi::AnyView` is a lightweight view. Copying it just copies 16 bytes - no reference
+count updates. This makes it perfect for passing arguments without overhead:
+
+.. code-block:: cpp
+
+   void process(ffi::AnyView value) {
+     // value is a view into the caller's data
+     // Zero refcount overhead
+     int x = value.cast<int>();
+   }
+
+:cpp:class:`~tvm::ffi::Any` is a managed container. Copy an :cpp:class:`~tvm::ffi::Any` holding an object, and the reference
+count goes up. Destroy it, and the count goes down:
+
+.. code-block:: cpp
+
+   ffi::Any create_value() {
+     ffi::String str = "hello";
+     ffi::Any result = str;  // refcount: 1 → 2
+     return result;          // str destroyed, refcount: 2 → 1
+   }                         // result owns the string
+
+
+Ownership at the C ABI Boundary
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TVM-FFI's function calling convention follows a simple rule:
+
+- **Inputs are non-owning**: Arguments are passed as :cpp:class:`~tvm::ffi::AnyView`. The caller
+  retains ownership; the callee borrows them for the duration of the call.
+- **Outputs are owning**: The return value is an :cpp:class:`~tvm::ffi::Any`. Ownership transfers
+  to the caller, who becomes responsible for the value's lifetime.
+
+.. code-block:: cpp
+
+   // At the C ABI level (TVMFFISafeCallType):
+   int my_function(
+     void* handle,
+     const TVMFFIAny* args,  // Non-owning: caller owns these
+     int32_t num_args,
+     TVMFFIAny* result       // Owning: caller takes ownership of result
+   );
+
+This convention enables zero-copy argument passing while ensuring returned
+values don't dangle.
+
+
+Converting AnyView to Any
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When an :cpp:class:`~tvm::ffi::AnyView` is assigned to an :cpp:class:`~tvm::ffi::Any`, TVM-FFI automatically handles the
+ownership transition:
+
+- **POD types**: Direct copy (no ownership semantics needed)
+- **Objects**: Reference count incremented
+- **Raw strings** (``const char*``): Copied into a managed :cpp:class:`~tvm::ffi::String` object
+
+.. code-block:: cpp
+
+   void example() {
+     ffi::AnyView view = "raw string";  // kTVMFFIRawStr, no allocation
+     ffi::Any owned = view;             // Converted to kTVMFFIStr or kTVMFFISmallStr
+   }
+
+
+Layout
+------
+
+The Any system is built on three abstraction layers:
+
+.. code-block:: text
+
+   ┌─────────────────────────────────────────┐
+   │  C++ Layer: Any (owning), AnyView       │  ← (0 bytes) RAII wrapper or view
+   ├─────────────────────────────────────────┤
+   │  C Layer: TVMFFIAny                     │  ← (16 bytes) Stable ABI
+   ├─────────────────────────────────────────┤
+   │  Payload: Atomic values or pointers     │  ← Actual data
+   └─────────────────────────────────────────┘
+
+**Bottom layer**: The payload is either a primitive value (int, float, device) stored
+directly, or a pointer to a heap-allocated, reference-counted object.
+
+**Middle layer**: :cpp:class:`TVMFFIAny` is a 16-byte C struct that provides the stable
+ABI. This is what crosses language boundaries.
+
+**Top layer**: :cpp:class:`tvm::ffi::Any` and :cpp:class:`tvm::ffi::AnyView` are C++
+wrappers that add type safety, RAII, and ergonomic APIs.
+
+
+.. figure:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/tvm-ffi/stable-c-abi-layout-any.svg
+   :alt: Layout of the 128-bit Any tagged union
+   :align: center
+
+   Figure 1. Layout of the :cpp:class:`TVMFFIAny` tagged union.
+
+
+.. tip::
+
+   Think of :cpp:class:`TVMFFIAny` as the "wire format" and :cpp:class:`~tvm::ffi::Any`/:cpp:class:`~tvm::ffi::AnyView` as the
+   "application API". You work with the C++ classes; the C struct handles ABI stability.
+
+
+Tagged Union
+~~~~~~~~~~~~
+
+At the ABI level, every value lives in a :cpp:class:`TVMFFIAny`:
+
+.. code-block:: c
+
+   typedef struct TVMFFIAny {
+     int32_t type_index;       // Bytes 0-3: identifies the stored type
+     union {
+       uint32_t zero_padding;  // Bytes 4-7: must be zero (or small_str_len)
+       uint32_t small_str_len;
+     };
+     union {                   // Bytes 8-15: the actual value
+       int64_t v_int64;
+       double v_float64;
+       void* v_ptr;
+       TVMFFIObject* v_obj;
+       DLDevice v_device;
+       char v_bytes[8];
+       // ... other union members
+     };
+   } TVMFFIAny;
+
+It is effectively a layout-stable 16-byte tagged union.
+
+* The first 4-byte :cpp:member:`TVMFFIAny::type_index` is a tag that tells you the type of the stored value.
+* The last 8-byte field holds the actual value, either directly stored in place as an atomic value (e.g. ``int64_t``, ``float64_t``, ``void*``) or as a pointer to a heap object.
+
+Strings and bytes get special treatment: those 7 bytes or shorter are stored inline using
+the **small string optimization**, avoiding heap allocation entirely:
+
+.. code-block:: cpp
+
+   ffi::Any small = "hello";                    // kTVMFFISmallStr, in v_bytes
+   ffi::Any large = "this is a longer string";  // kTVMFFIStr, heap allocated
+
+Atomic Types
+~~~~~~~~~~~~
+
+Primitive values - integers, floats, booleans, devices, and raw pointers - are stored
+directly in the 8-byte payload. No heap allocation or no reference counting.
+
+.. list-table:: Figure 2. Common atomic types stored directly in :cpp:class:`TVMFFIAny`
+   :header-rows: 1
+   :name: atomic-types-table
+   :widths: 40 40 30
+
+   * - Type
+     - type_index
+     - Payload Field
+   * - ``None`` / ``nullptr``
+     - :cpp:enumerator:`kTVMFFINone <TVMFFITypeIndex::kTVMFFINone>` = 0
+     - :cpp:member:`~TVMFFIAny::v_int64` (must be 0)
+   * - ``int64_t``
+     - :cpp:enumerator:`kTVMFFIInt <TVMFFITypeIndex::kTVMFFIInt>` = 1
+     - :cpp:member:`~TVMFFIAny::v_int64`
+   * - ``bool``
+     - :cpp:enumerator:`kTVMFFIBool <TVMFFITypeIndex::kTVMFFIBool>` = 2
+     - :cpp:member:`~TVMFFIAny::v_int64` (0 or 1)
+   * - ``float64_t``
+     - :cpp:enumerator:`kTVMFFIFloat <TVMFFITypeIndex::kTVMFFIFloat>` = 3
+     - :cpp:member:`~TVMFFIAny::v_float64`
+   * - ``void*`` (opaque pointer)
+     - :cpp:enumerator:`kTVMFFIOpaquePtr <TVMFFITypeIndex::kTVMFFIOpaquePtr>` = 4
+     - :cpp:member:`~TVMFFIAny::v_ptr`
+   * - :c:struct:`DLDataType <DLDataType>`
+     - :cpp:enumerator:`kTVMFFIDataType <TVMFFITypeIndex::kTVMFFIDataType>` = 5
+     - :cpp:member:`~TVMFFIAny::v_dtype`
+   * - :c:struct:`DLDevice <DLDevice>`
+     - :cpp:enumerator:`kTVMFFIDevice <TVMFFITypeIndex::kTVMFFIDevice>` = 6
+     - :cpp:member:`~TVMFFIAny::v_device`
+   * - :c:struct:`DLTensor* <DLTensor>`
+     - :cpp:enumerator:`kTVMFFIDLTensorPtr <TVMFFITypeIndex::kTVMFFIDLTensorPtr>` = 7
+     - :cpp:member:`~TVMFFIAny::v_ptr`
+   * - ``const char*`` (raw string)
+     - :cpp:enumerator:`kTVMFFIRawStr <TVMFFITypeIndex::kTVMFFIRawStr>` = 8
+     - :cpp:member:`~TVMFFIAny::v_c_str`
+   * - :cpp:class:`TVMFFIByteArray* <TVMFFIByteArray>`
+     - :cpp:enumerator:`kTVMFFIByteArrayPtr <TVMFFITypeIndex::kTVMFFIByteArrayPtr>` = 9
+     - :cpp:member:`~TVMFFIAny::v_ptr`
+
+:ref:`Figure 2 <atomic-types-table>` shows common atomic types stored in-place inside the
+:cpp:class:`TVMFFIAny` payload.
+
+.. code-block:: cpp
+
+   AnyView int_val = 42;                    // v_int64 = 42
+   AnyView float_val = 3.14;                // v_float64 = 3.14
+   AnyView bool_val = true;                 // v_int64 = 1
+   AnyView device = DLDevice{kDLCUDA, 0};   // v_device
+   DLTensor tensor;
+   AnyView view = &tensor;                  // v_ptr = &tensor
+
+Note that raw pointers like :c:struct:`DLTensor* <DLTensor>` and ``char *`` also fit here. They carry no ownership.
+It means the caller must ensure the pointed-to data outlives the :cpp:class:`~tvm::ffi::AnyView` or :cpp:class:`~tvm::ffi::Any`.
+
+Heap-Allocated Objects
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table:: Figure 3. Common TVM-FFI object types stored as pointers in :cpp:member:`TVMFFIAny::v_obj`.
+   :header-rows: 1
+   :widths: 40 40 30
+
+   * - Type
+     - type_index
+     - Payload Field
+   * - :cpp:class:`ErrorObj* <tvm::ffi::ErrorObj>`
+     - :cpp:enumerator:`kTVMFFIError <TVMFFITypeIndex::kTVMFFIError>` = 67
+     - :cpp:member:`~TVMFFIAny::v_obj`
+   * - :cpp:class:`FunctionObj* <tvm::ffi::FunctionObj>`
+     - :cpp:enumerator:`kTVMFFIFunction <TVMFFITypeIndex::kTVMFFIFunction>` = 68
+     - :cpp:member:`~TVMFFIAny::v_obj`
+   * - :cpp:class:`TensorObj* <tvm::ffi::TensorObj>`
+     - :cpp:enumerator:`kTVMFFITensor <TVMFFITypeIndex::kTVMFFITensor>` = 70
+     - :cpp:member:`~TVMFFIAny::v_obj`
+   * - :cpp:class:`ArrayObj* <tvm::ffi::ArrayObj>`
+     - :cpp:enumerator:`kTVMFFIArray <TVMFFITypeIndex::kTVMFFIArray>` = 71
+     - :cpp:member:`~TVMFFIAny::v_obj`
+   * - :cpp:class:`MapObj* <tvm::ffi::MapObj>`
+     - :cpp:enumerator:`kTVMFFIMap <TVMFFITypeIndex::kTVMFFIMap>` = 72
+     - :cpp:member:`~TVMFFIAny::v_obj`
+   * - :cpp:class:`ModuleObj* <tvm::ffi::ModuleObj>`
+     - :cpp:enumerator:`kTVMFFIModule <TVMFFITypeIndex::kTVMFFIModule>` = 73
+     - :cpp:member:`~TVMFFIAny::v_obj`
+
+
+Heap-allocated objects - :cpp:class:`~tvm::ffi::String`, :cpp:class:`~tvm::ffi::Function`, :cpp:class:`~tvm::ffi::Tensor`, :cpp:class:`~tvm::ffi::Array`, :cpp:class:`~tvm::ffi::Map`, and custom types - are
+stored as pointers to reference-counted :cpp:class:`TVMFFIObject` headers:
+
+.. code-block:: cpp
+
+   ffi::String str = "hello world";
+   ffi::Any any_str = str;  // v_obj points to StringObj
+
+   // Object layout in memory:
+   // [TVMFFIObject header (24 bytes)][object-specific data]
+
+
+Common Patterns
+---------------
+
+Extracting Values
+~~~~~~~~~~~~~~~~~
+
+Three methods pull values out of :cpp:class:`~tvm::ffi::Any` and :cpp:class:`~tvm::ffi::AnyView`, each with different
+strictness:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 20 40 40
+
+   * - Method
+     - Behavior
+     - Use When
+   * - :cpp:func:`~tvm::ffi::Any::cast`
+     - Returns ``T`` or throws :cpp:class:`~tvm::ffi::Error`
+     - You know the type
+   * - :cpp:func:`~tvm::ffi::Any::as`
+     - Returns ``std::optional<T>`` (strict)
+     - Type must match exactly
+   * - :cpp:func:`~tvm::ffi::Any::try_cast`
+     - Returns ``std::optional<T>`` (with coercion)
+     - Allow type conversion
+
+:cpp:func:`~tvm::ffi::Any::cast` is the workhorse. It returns the value or throws:
+
+.. code-block:: cpp
+
+   ffi::Any value = 42;
+   int x = value.cast<int>();        // OK: 42
+   double y = value.cast<double>();  // OK: 42.0 (int → double)
+
+   try {
+     ffi::String s = value.cast<ffi::String>();  // Throws TypeError
+   } catch (const ffi::Error& e) {
+     // "Cannot convert from type `int` to `ffi.Str`"
+   }
+
+:cpp:func:`~tvm::ffi::Any::as` is strict - it only succeeds if the stored type exactly matches:
+
+.. code-block:: cpp
+
+   ffi::Any value = 42;
+
+   std::optional<int64_t> opt_int = value.as<int64_t>();
+   // opt_int.has_value() == true
+
+   std::optional<double> opt_float = value.as<double>();
+   // opt_float.has_value() == false (int stored, not float)
+
+:cpp:func:`~tvm::ffi::Any::try_cast` allows type coercion:
+
+.. code-block:: cpp
+
+   ffi::Any value = 42;
+
+   std::optional<double> opt_float = value.try_cast<double>();
+   // opt_float.has_value() == true, *opt_float == 42.0
+
+   std::optional<bool> opt_bool = value.try_cast<bool>();
+   // opt_bool.has_value() == true, *opt_bool == true
+
+**Conversion rules** (what :cpp:func:`~tvm::ffi::Any::try_cast` allows):
+
+- ``int`` ↔ ``bool``: integers convert to bool (0 = false, non-zero = true)
+- ``int`` → ``float``: integers convert to floating point
+- ``bool`` → ``int``: bools convert to 0 or 1
+- Object downcasting: :cpp:class:`~tvm::ffi::ObjectRef` → derived type if runtime type matches

Review Comment:
The current description of conversion rules is a bit redundant and the use of `↔` could be clearer.
The rule for `int ↔ bool` is described, and then a separate rule for `bool → int` is also listed.
To improve clarity and remove redundancy, I suggest rephrasing these rules to be more direct and distinct.

```suggestion
- ``int`` to ``bool``: Non-zero integers convert to `true`, while `0` converts to `false`.
- ``bool`` to ``int``: `true` converts to `1`, while `false` converts to `0`.
- ``int`` to ``float``: Standard integer to floating-point conversion.
- Object downcasting: :cpp:class:`~tvm::ffi::ObjectRef` → derived type if runtime type matches
```
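
To make the three scalar conversion rules discussed in this thread concrete, here is a minimal, self-contained sketch. It is a toy model and not the tvm-ffi implementation: `ToyAny`, `Tag`, `StrictAs`, `TryCast`, and `CheckConversionRules` are hypothetical names invented for illustration, and only the rules listed in the quoted doc (`int` to `bool`, `bool` to `int`, `int` to `float`) are modelled. Object downcasting and strings are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <type_traits>

// Hypothetical tags for this sketch only; they mirror the idea of type_index.
enum class Tag : int32_t { kNone = 0, kInt = 1, kBool = 2, kFloat = 3 };

// Toy 16-byte tagged value: 4-byte tag, 4-byte padding, 8-byte payload.
struct ToyAny {
  Tag tag = Tag::kNone;
  uint32_t zero_padding = 0;
  union {
    int64_t v_int64 = 0;
    double v_float64;
  };

  static ToyAny FromInt(int64_t v) { ToyAny a; a.tag = Tag::kInt; a.v_int64 = v; return a; }
  static ToyAny FromBool(bool v) { ToyAny a; a.tag = Tag::kBool; a.v_int64 = v ? 1 : 0; return a; }
  static ToyAny FromFloat(double v) { ToyAny a; a.tag = Tag::kFloat; a.v_float64 = v; return a; }
};
static_assert(sizeof(ToyAny) == 16, "the sketch assumes a 16-byte layout");

// Strict extraction: succeeds only when the stored tag matches T exactly.
template <typename T>
std::optional<T> StrictAs(const ToyAny& a) {
  if constexpr (std::is_same_v<T, bool>) {
    if (a.tag == Tag::kBool) return a.v_int64 != 0;
  } else if constexpr (std::is_integral_v<T>) {
    if (a.tag == Tag::kInt) return static_cast<T>(a.v_int64);
  } else if constexpr (std::is_floating_point_v<T>) {
    if (a.tag == Tag::kFloat) return static_cast<T>(a.v_float64);
  }
  return std::nullopt;
}

// Coercing extraction: exact match first, then the three listed rules.
template <typename T>
std::optional<T> TryCast(const ToyAny& a) {
  if (auto exact = StrictAs<T>(a)) return exact;
  if constexpr (std::is_same_v<T, bool>) {
    if (a.tag == Tag::kInt) return a.v_int64 != 0;              // int -> bool
  } else if constexpr (std::is_integral_v<T>) {
    if (a.tag == Tag::kBool) return static_cast<T>(a.v_int64);  // bool -> int
  } else if constexpr (std::is_floating_point_v<T>) {
    if (a.tag == Tag::kInt) return static_cast<T>(a.v_int64);   // int -> float
  }
  return std::nullopt;
}

// Exercises each rule once; returns true if every assertion passed.
inline bool CheckConversionRules() {
  const ToyAny i = ToyAny::FromInt(42);
  assert(StrictAs<int64_t>(i).has_value());
  assert(!StrictAs<double>(i).has_value());         // strict: no coercion
  assert(TryCast<double>(i).value() == 42.0);       // int -> float
  assert(TryCast<bool>(i).value() == true);         // non-zero int -> true
  assert(TryCast<bool>(ToyAny::FromInt(0)).value() == false);
  assert(TryCast<int64_t>(ToyAny::FromBool(true)).value() == 1);
  assert(!TryCast<int64_t>(ToyAny::FromFloat(3.5)).has_value());  // not listed
  return true;
}
```

Writing each direction as its own branch, as the suggestion above does for the prose, keeps the asymmetry visible: in this sketch a float never converts to an integer, because that rule does not appear in the list.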
