## Introduction
The TVM stack has been evolving for more than two years now. The current
compiler stack contains several components that were designed at different
points in time. This RFC proposes a unified IR infrastructure for the TVM stack
by combining past lessons.
At a high level, the proposed infrastructure will consist of:
- A unified module, pass and type system for all IR function variants.
- Two major variants of IR expressions and functions: the high-level functional
IR (Relay) and the tensor-level IR for loop optimizations.
- First-class Python and hybrid script support, and a cross-language in-memory
IR structure.
- A unified runtime::Module to enable extensive combinations of traditional
devices, microcontrollers, and NPUs.
## Unified IR Module and Pass Infra
The main component of our proposal is to introduce a unified IRModule structure
that can contain different variants of functions (relay::Function/te::Function).
The namespace te (tensor expression) is a tentative namespace for low-level
functions, and comments and suggestions about name choices are more than
welcome.

This change will simplify the intermediate representation data structures and
terminology used in our previous compilation flow, which can be shown as
follows:
```
importer: model ->
high-level optimizations: relay::Module -> optimizations -> relay::Module ->
low-level optimizations: compute/schedule declaration -> Stmt -> ir passes -> Stmt ->
device-specific codegen: LoweredFunc -> runtime::Module
```
As shown above, our current flow has different intermediate data structures at
different stages (relay::Module, Stmt, LoweredFunc) with different
terminologies. Under the new design, we will have a single module structure,
and transformations between IRs become ir::Module to ir::Module transformations.
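Under the proposed design, the same flow can be summarized roughly as follows (the stage labels are illustrative):
```
importer: model -> ir::Module
optimizations (high-level and low-level): ir::Module -> ir::Module
codegen: ir::Module -> runtime::Module
```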

More importantly, we can use specific calling conventions, enabling different
function variants to call each other. The following code snippet is a mock-up
to demonstrate a module containing both a relay.Function and a te.Function. The
```relay_add_one``` function can call into the ```te_add_one``` function using
the destination-passing convention, where outputs are passed as inputs to the
function.
```
def @relay_add_one(%x : Tensor((10,), f32)) {
  call_destination_passing @te_add_one(%x, out=%b)
}

def @te_add_one(%a: NDArray, %b: NDArray) {
  var %n
  %A = decl_buffer(shape=[%n], src=%a)
  %B = decl_buffer(shape=[%n], src=%b)
  // body ir contents need to be evolved
  for %i = 0 to 10 [data_par] {
    %B[%i] = %A[%i] + 1.0
  }
}
```
Enabling both high-level functions (`relay.Function`) and tensor-expression
functions to coexist in the same module opens up the potential for cross-layer
optimizations. Of course, most transformations will only operate on one
type of function and will simply ignore other functions in the same module.
Most importantly, the proposed change will minimize the number of concepts developers need to learn.
Developers only need to know about ir::Module and runtime::Module. Every
transformation is ir::Module -> ir::Module.
AutoTVM and other schedule transformations can be viewed as an intelligent way
to transform an ir::Module. We are exploring ways to embed the current tensor
expressions as a special type of Function in ir::Module.
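As an illustration, a module-to-module pass could be written from Python roughly as follows. This is a minimal sketch that assumes a `module_pass` decorator along the lines of the existing Relay pass infrastructure; the exact namespace and the pass body are illustrative.

```python
import tvm

# A minimal sketch of an ir::Module -> ir::Module transformation, assuming a
# module_pass decorator similar to the existing Relay pass infrastructure.
@tvm.transform.module_pass(opt_level=1)
def list_functions(mod, ctx):
    # Every pass sees the whole module and returns a (possibly rewritten) module.
    # Here we simply enumerate the functions and return the module unchanged.
    for gvar, func in mod.functions.items():
        print("visiting", gvar.name_hint)
    return mod

# Applying the pass is itself a module -> module call: mod = list_functions(mod)
```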
### Discussions
One design question is whether to unify te::Expr and relay::Expr into a single
base class. relay::Expr allows tensor types and supports broadcasting in
operations, while te::Expr only refers to primitive data types (int, float,
pointers, buffers). Unifying them into a single base allows reuse of certain
AST nodes and potentially allows mixing the expressions. On the other hand,
combining the two namespaces brings the additional complexity of mixing
expressions in ways that could be invalid. Keeping separate namespaces also
allows us to use the low-level expressions inside tensor types to express shape
constraints. Given these considerations, we suggest that the two be kept
separate, at least for now. Note, however, that both use the same FunctionExpr
to enable calling across different types of functions.
Another design question is how to express shape and data-type constraints on
buffers in a te.Function. One option is to express them as constraints in the
type and make the te.Function polymorphic over n. While this approach works well
for shape constraints, it is harder to express the sharing of the data field for
in-place updates. The current alternative is to express the constraint via a
bind expression; we can also make the bind information part of the function
signature, but not of the type. This signature is a more faithful representation
of the final generated code, and it allows us to make use of the constraints
during analysis. It does, however, provide less information for shape-related
type checking when a relay function calls into a te.Function. We can resolve
that by providing an auxiliary function that reconstructs the function type with
constraints.
## External Function Interoperation
Besides relay.Function and te.Function, we can also introduce other, external
function types, as long as we define a clear calling convention into these
modules. For example, we can introduce an ASM function to embed external
assembly code. We can also incorporate functions expressed in other IRs, such as
TorchScript, MLIR, and LLVM, to make use of these ecosystems.
The main limitation of external functions is the ability to do cross-function
optimizations (as a special code path has to be written for each external IR).
Given that most of our use cases partition functions in a coarse-grained manner,
we expect the impact to be low, as long as the major chunk of the code is
optimized using the unified in-house variants.
## First-class Python and Hybrid Script Support
Python is the most popular language for deep learning frameworks due to its
great flexibility and rich ecosystem. We plan to provide first-class Python
support. This is also a fundamental infrastructure design choice that makes TVM
different from other stacks.
Specifically, we expose a programmatic interface which allows developers and
researchers to customize compilation, and write new passes in Python. All IR
data structures can be created, manipulated and transformed using Python APIs.
We will leverage the rich Python ML ecosystem to further explore ML-guided
compilation.
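For instance, constructing a small function and running a standard pass over it from Python might look roughly like this. This is a minimal sketch that assumes the unified module is exposed to Python (written here as tvm.IRModule) alongside the existing Relay APIs.

```python
import tvm
from tvm import relay

# Build a tiny high-level function entirely from Python.
x = relay.var("x", shape=(10,), dtype="float32")
func = relay.Function([x], relay.add(x, relay.const(1.0, "float32")))

# Wrap it in a unified module and apply a module -> module transformation.
mod = tvm.IRModule.from_expr(func)
mod = relay.transform.InferType()(mod)
print(mod)
```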
Additionally, we plan to specify a subset of the Python AST that can express
every possible function in the TVM IR. The new dialect will be an extension of
the current TVM hybrid script and will serve as a way to construct and inspect
the IR in Python. Eventually, it can serve as a secondary text format. In order
to cover all possible IR nodes and enable round-tripping, we will take a gradual
upgrade approach by introducing the ```meta``` keyword in the hybrid script,
similar to the meta support in the current relay text format. The meta section
is a dictionary of IR nodes that are opaque in the text format but can be
populated from a JSON blob. We can always serialize the IR components that are
not yet in the hybrid dialect specification into meta, and improve the dialect
specification gradually by adding more native constructs. The code below
sketches the same te_add_one function from the earlier example in hybrid script
form.
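The decorator and helper names below (tvm.hybrid_te, var, bind_buffer) are illustrative placeholders rather than a finalized API; the body simply mirrors the earlier te_add_one mock-up.

```python
# Illustrative only: decorator and helper names are placeholders, not a
# finalized API; the body mirrors the earlier te_add_one mock-up.
@tvm.hybrid_te
def te_add_one(a, b):
    n = var("n")
    A = bind_buffer(a, shape=[n])
    B = bind_buffer(b, shape=[n])
    for i in range(10):
        B[i] = A[i] + 1.0
```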

ML compilation is an open research area, and its great value has already led to
quick transfer into production. We believe this additional flexibility will
increase the rate of innovation and is critical to help accelerate the wide-open
field of deep learning compilation. When optimization passes become stable and
production-ready, we can seamlessly move them into the C++ core.
## Pave the Way for Other Languages
Besides the Python support, the cross-language runtime object protocol also
paves the way for bringing native compiler support to other languages besides
Python and C++. For example, one possible candidate is Rust, based on current
community interest. We can build language bindings that allow us to write
additional compilation passes in Rust and expose them to the core TVM compiler.
We already provide runtime support for Python, Rust, Go, JavaScript, and Java.
## Extensible Unified Runtime Module
Because the TVM stack supports different kinds of compilation targets, we need a
unified runtime interface to expose compiled modules to developers.
runtime::Module is an abstract interface for all possible compilation targets,
and codegen is defined as a transformation from ir::Module -> runtime::Module.
The current runtime module interface already handles serialization (as a shared
library) and runtime linking (via PackedFunc).

As an artifact of compilation, we will get a module that imports several
modules of different types. For example, we could have a graph runtime module
that calls into host functions defined in the DSOModule, which then calls into
a CUDA module to invoke device functions. It is important to create an extensible
mechanism to be able to run and export any heterogeneous runtime modules. We
have an on-going RFC that proposes a unified exportation format to package all
possible runtime::Module into a single shared library.
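As a usage sketch, loading and invoking such an exported artifact from Python could look roughly like this; the file name and function name are hypothetical and refer back to the earlier te_add_one mock-up.

```python
import numpy as np
import tvm

# Load a previously exported shared library; the path is hypothetical.
loaded = tvm.runtime.load_module("compiled_artifact.so")

# Functions are looked up and invoked through the PackedFunc convention.
add_one = loaded["te_add_one"]
a = tvm.nd.array(np.arange(10, dtype="float32"))
b = tvm.nd.array(np.zeros(10, dtype="float32"))
add_one(a, b)  # destination-passing: the output buffer is passed in
```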

We can also evolve the interface to better handle cases such as heterogeneous
devices and flexible task scheduling.
### Discussions
One tension in the runtime design is whether to introduce a more opinionated
serialization format (e.g. ONNX for model serialization) or to allow each module
to define its own format. We eventually chose the latter to give more
flexibility to each module, and only ask each module to expose the necessary
interfaces for unified packaging. Any module packaged in this way will
immediately enjoy the benefits of interacting with other components of the
stack, including automatic exposure to Python, Java, and other runtime APIs, and
use of the RPC infrastructure for scalable remote profiling.
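For example, remote profiling through the RPC infrastructure might look roughly like this; the host, port, file name, and function name are hypothetical.

```python
import tvm
from tvm import rpc

# Connect to a device running an RPC server (address is hypothetical).
remote = rpc.connect("192.168.1.42", 9090)

# Upload and load any module that follows the unified packaging interface.
remote.upload("compiled_artifact.so")
rlib = remote.load_module("compiled_artifact.so")

# Time a function on the remote device through the PackedFunc interface.
timer = rlib.time_evaluator("te_add_one", remote.cpu(), number=100)
```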
## Relation with other IRs
Deep learning compilation is an active field and there are other related
efforts such as MLIR, LLVM, TorchScript, TensorRT and vendor specific
compilation and runtime toolchains.
We believe the most important principle for an open source project is to allow
our developers to tap into these ecosystems. In particular, we can incorporate
these compilation flows and runtimes into TVM by using ExternFunc in the
compilation flow, and a specific runtime module in the unified runtime interface.
We would also like to facilitate cross-IR translation, for example, a
TorchScript-to-Relay importer that exposes compilation back as a
PyTorch-compatible function. Another area of interest would be to create a
translation from the MLIR TF dialect to the TVM IR, and to enable MLIR functions
as part of the ExternFunc support to give better support for and interoperation
with the TF ecosystem.
The scope of TVM's IR remains very focused: automating the optimization of deep
learning workloads on diverse hardware backends and enabling our developers to
easily integrate TVM into their stacks.
## Timeline and Upgrade Path
Once we agree on the general design, we can take several refactoring steps to
bring the current codebase onto the new IR infrastructure.
- Introduce ir::Module and ir::Function
- Move relay::Module to make use of ir::Module
- Move the current low-level IR infra (ir_pass and LoweredFunc transformations)
to Module -> Module transformations
- Introduce text format and hybrid script specifications of the unified
ir::Module
- Continue to evolve the design of the IRs themselves