zhiics commented on a change in pull request #6097:
URL: https://github.com/apache/incubator-tvm/pull/6097#discussion_r458193399
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,322 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high level description of a model into a deployable module.
+ To get started, please read the this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
Review comment:
```suggestion
First, we review a single end-to-end compilation flow and discuss the key
data structures and the transformations.
```
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,322 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high level description of a model into a deployable module.
+ To get started, please read the this section first.
Review comment:
```suggestion
To get started, please read this section first.
```
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,360 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high level description of a model into a deployable module.
+ To get started, please read the this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
+This runtime-based view focuses on the interactions of each component when
running the compiler.
+Then we will review the logical modules of the codebase and their
relationship. This part provides a static overarching view of the design.
+
+
+Example Compilation Flow
+------------------------
+
+In this guide, we will study an example compilation flow in the compiler. The
figure below shows the flow. At a high-level, it contains several steps:
+
+- Import: The frontend component ingests a model into an IRModule, which
contains a collection of functions that internally represent the model.
+- Transformation: The compiler transforms an IRModule to another functionally
equivalent or approximately
+ equivalent (e.g. in the case of quantization) IRModule. Many of the
transformations are target (backend) independent.
+ We also allow the target to affect the configuration of the transformation
pipeline.
+- Target Translation: The compiler translates(codegen) the IRModule to an
executable format specified by the target.
+ The target translation result is encapsulated as a `runtime.Module` that can
be exported, loaded, and executed on the target runtime environment.
+- Runtime Execution: the user loads back a `runtime.Module` and runs the
compiled functions in the supported runtime environment.
+
+
+.. figure::
https://raw.githubusercontent.com/tvmai/web-data/master/images/design/tvm_dyn_workflow.svg
+ :align: center
+ :width: 85%
+
+
+Key data structures
+~~~~~~~~~~~~~~~~~~~
+
+One of the best ways to design and understand a complex system is to identify
the key data structures and APIs that
+manipulate (transform) these data structures. Once we have identified the key data
structures, we can then break down a system into logical
+components that either define a collection of key data structures or
transformations among the data structures.
+
+**IRModule** is the primary data structure used across the entire stack. An
IRModule (intermediate representation module)
+contains a collection of functions. Currently, we support two primary variants
of functions.
+
+- **relay::Function** is a high-level functional program representation. A
relay.Function usually corresponds to an end to end model.
+ You can view a relay.Function as a computational graph with additional
support for control-flow, recursion, and complex data structures.
+- **tir::PrimFunc** is a low-level program representation that contains
elements including loop-nest choices, multi-dimensional load/store,
+ threading, and vector/tensor instructions. It is usually used to represent
an operator program that executes a (possibly-fused) layer in a model.
+
+During the compilation, a relay function may be lowered to multiple
tir::PrimFunc functions and a top-level function that calls into
+those tir::PrimFunc functions.
+
+Transformations
+~~~~~~~~~~~~~~~
+
+Now that we have covered the key data structures, let us talk about the
transformations. Each transformation could serve one of the following purposes:
+
+- optimization: transform a program to an equivalent, possibly more optimized
version.
+- lowering: transform a program to a lower-level representation that is closer
to the target.
+
+**relay/transform** contains a collection of passes that optimize the model.
The optimizations include common program
+optimizations such as constant folding and dead-code elimination, and
tensor-computation specific passes such as layout
+transformation and scaling factor folding.
+
+Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end to end function(e.g. mobilenet)
+into sub-function(e.g. conv2d-relu) segments. We call these segments of
functions.
+This process helps us to divide the original problem into two sub-problems:
+
+- Compilation and optimization for each sub-function.
+- Overall execution structure: we need to do a sequence of calls into the
generated sub-functions to execute the whole model.
+
+We use the low-level tir phase to compile and optimize each sub-function. For
specific targets, we may also directly go to the target translation phase and
use external code generators.
+
+There are a few different ways(in relay/backend) to handle the calls into the
overall execution problem. For simple models with known shapes and no control
flow, we can lower to a graph runtime that stores the execution structure in a
graph. We also support a virtual machine backend for dynamic executions.
Finally, we plan to support ahead of time compilation that compiles the
high-level execution structure into the executable and generated primitive
functions. All of these execution modes are encapsulated by a unified
**runtime.Module** interface, which we will discuss in the latter part of the
guide.
+
+**tir/transform** contains transformation passes for TIR level functions. Many
tir passes serve the purpose of lowering. For example, there are passes to
flatten multi-dimensional access to one-dimensional pointer access, to expand
the intrinsics into target-specific ones, and to decorate the function entry to
meet the runtime calling convention. Of course, there are also optimization
passes, such as access index simplification and dead code elimination.
+
+Many low-level optimizations can be handled in the target phase by the LLVM,
CUDA C, and other target compilers. As a result, we leave low-level
optimizations such as register allocation to the downstream compilers and only
focus on optimizations that are not covered by them.
+
+Search-space and Learning-based Transformations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The transformation passes we described so far are deterministic and
rule-based. One design goal of the TVM stack is to support high-performance
code optimizations for different hardware platforms. To do so, we will need to
investigate as many optimization choices as possible, including but not
limited to, multi-dimensional tensor access, loop tiling behavior, special
accelerator memory hierarchy, and threading.
+
+It is hard to define a heuristic to make all of the choices. Instead, we will
take a search and learning-based approach.
+We first define a collection of actions we can take to transform a program.
Example actions include loop transformations, inlining,
+and vectorization. We call these actions **scheduling primitives**. The collection
of scheduling primitives defines a search space of possible
+optimizations we can make to a program. The system then searches over
different possible scheduling
+sequences to pick the best scheduling combination.
+The search procedure is usually guided by a machine learning algorithm.
+
+We can record the best schedule sequence for a (possibly-fused) operator once
the search is completed. The compiler can then just look up the best
+schedule sequence and apply it to the program. Notably, this schedule
application phase is **exactly like** the rule-based transformations,
+enabling us to share the same interface convention with traditional passes.
+
+We use search-based optimizations to handle the initial tir function
generation problem. This part of the module is called AutoTVM (auto_scheduler).
+We expect to expand the learning-based transformations to more areas as we
continue to develop the TVM stack.
+
+Target Translation
+~~~~~~~~~~~~~~~~~~
+
+The target translation phase transforms an IRModule to the corresponding
target executable format.
+For backends such as x86 and ARM, we will use the LLVM IRBuilder to build
in-memory LLVM IR.
+We can also generate source-level languages such as CUDA C and OpenCL.
+Finally, we support the direct translation of a Relay function (sub-graph) for
external code generators.
+Importantly, the final code generation phase should be as lightweight as possible
with the vast majority of transformations
+and lowering performed before target translation.
+We also provide a Target structure to specify the compilation target.
+The transformations before the target translation phase can also be affected
by the target — for example,
+a target's vector length would change the vectorization behavior.
+
+Runtime Execution
+~~~~~~~~~~~~~~~~~
+
+The main goal of TVM's runtime is to provide a minimal API for loading and
executing the compiled artifact in a language of the user's choice, including
Python, C++, Rust, Go, Java, and JavaScript. The code snippet below shows such
an example in Python:
+
+.. code-block:: python
+
+ import tvm
+ # Example runtime execution program in python, with type annotated
+ mod: tvm.runtime.Module = tvm.runtime.load_module("compiled_artifact.so")
+ arr: tvm.runtime.NDArray = tvm.nd.array([1, 2, 3], ctx=tvm.gpu(0))
+ fun: tvm.runtime.PackedFunc = mod["addone"]
+ fun(arr)
+ print(arr.asnumpy())
+
+
+:py:class:`tvm.runtime.Module` encapsulates the result of compilation. A
runtime.Module contains a GetFunction method to obtain PackedFuncs by name.
+
+:py:class:`tvm.runtime.PackedFunc` is a type-erased function interface for
both the generated functions. A runtime.PackedFunc can take arguments and
return values with the following types: POD types(int, float), string,
runtime.PackedFunc, runtime.Module, runtime.NDArray, sub-classes of
runtime.Object.
+
+:py:class:`tvm.runtime.Module` and :py:class:`tvm.runtime.PackedFunc` are
powerful mechanisms to modularize the runtime. For example, to get the above
`addone` function on CUDA, we can use LLVM to generate the host-side code to
compute the launching parameters(e.g. size of the thread groups) and then call
into another PackedFunc from a CUDAModule that is backed by the CUDA driver
API. The same mechanism can be used for OpenCL kernels.
+
+The above example only deals with a simple `addone` function. The code snippet
below gives an example of an end to end model execution using the same
interface:
Review comment:
```suggestion
The above example only deals with a simple `addone` function. The code
snippet below gives an example of an end-to-end model execution using the same
interface:
```
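For context on the modularization point made just before this paragraph (host-side code calling into a PackedFunc that lives in another module), a toy sketch in plain Python may help. It does not use TVM: `ToyModule`, `import_module`, `addone_kernel`, and the thread-count heuristic are hypothetical names and choices made only for illustration, and the real runtime.Module lookup may behave differently.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a type-erased function; NOT a TVM type.
PackedFuncLike = Callable[..., Any]


class ToyModule:
    """Toy module: a table of named callables plus a list of imported modules."""

    def __init__(self, funcs: Dict[str, PackedFuncLike]) -> None:
        self._funcs = dict(funcs)
        self._imports: List["ToyModule"] = []

    def import_module(self, other: "ToyModule") -> None:
        # Make the functions of `other` reachable through this module.
        self._imports.append(other)

    def get_function(self, name: str) -> PackedFuncLike:
        # Look in this module first, then in imported modules.
        if name in self._funcs:
            return self._funcs[name]
        for mod in self._imports:
            try:
                return mod.get_function(name)
            except KeyError:
                continue
        raise KeyError(name)

    def __getitem__(self, name: str) -> PackedFuncLike:
        return self.get_function(name)


def addone_kernel(data: List[int], num_threads: int) -> None:
    # Pretend device kernel; the launch parameter is accepted but unused here.
    for i in range(len(data)):
        data[i] += 1


device_mod = ToyModule({"addone_kernel": addone_kernel})


def addone(data: List[int]) -> None:
    # Host-side wrapper: compute a (made-up) launch parameter, then call
    # into the function provided by the device module.
    num_threads = min(len(data), 64)
    device_mod["addone_kernel"](data, num_threads)


host_mod = ToyModule({"addone": addone})
host_mod.import_module(device_mod)

arr = [1, 2, 3]
host_mod["addone"](arr)
print(arr)  # [2, 3, 4]
```

The caller only ever sees a name lookup followed by a call, which is the property the paragraph relies on.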
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,360 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high level description of a model into a deployable module.
+ To get started, please read the this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
+This runtime-based view focuses on the interactions of each component when
running the compiler.
+Then we will review the logical modules of the codebase and their
relationship. This part provides a static overarching view of the design.
+
+
+Example Compilation Flow
+------------------------
+
+In this guide, we will study an example compilation flow in the compiler. The
figure below shows the flow. At a high-level, it contains several steps:
+
+- Import: The frontend component ingests a model into an IRModule, which
contains a collection of functions that internally represent the model.
+- Transformation: The compiler transforms an IRModule to another functionally
equivalent or approximately
+ equivalent (e.g. in the case of quantization) IRModule. Many of the
transformations are target (backend) independent.
+ We also allow the target to affect the configuration of the transformation
pipeline.
+- Target Translation: The compiler translates(codegen) the IRModule to an
executable format specified by the target.
+ The target translation result is encapsulated as a `runtime.Module` that can
be exported, loaded, and executed on the target runtime environment.
+- Runtime Execution: the user loads back a `runtime.Module` and runs the
compiled functions in the supported runtime environment.
+
+
+.. figure::
https://raw.githubusercontent.com/tvmai/web-data/master/images/design/tvm_dyn_workflow.svg
+ :align: center
+ :width: 85%
+
+
+Key data structures
+~~~~~~~~~~~~~~~~~~~
+
+One of the best ways to design and understand a complex system is to identify
the key data structures and APIs that
+manipulate (transform) these data structures. Once we have identified the key data
structures, we can then break down a system into logical
+components that either define a collection of key data structures or
transformations among the data structures.
+
+**IRModule** is the primary data structure used across the entire stack. An
IRModule (intermediate representation module)
+contains a collection of functions. Currently, we support two primary variants
of functions.
+
+- **relay::Function** is a high-level functional program representation. A
relay.Function usually corresponds to an end to end model.
+ You can view a relay.Function as a computational graph with additional
support for control-flow, recursion, and complex data structures.
+- **tir::PrimFunc** is a low-level program representation that contains
elements including loop-nest choices, multi-dimensional load/store,
+ threading, and vector/tensor instructions. It is usually used to represent
an operator program that executes a (possibly-fused) layer in a model.
+
+During the compilation, a relay function may be lowered to multiple
tir::PrimFunc functions and a top-level function that calls into
+those tir::PrimFunc functions.
+
+Transformations
+~~~~~~~~~~~~~~~
+
+Now that we have covered the key data structures, let us talk about the
transformations. Each transformation could serve one of the following purposes:
+
+- optimization: transform a program to an equivalent, possibly more optimized
version.
+- lowering: transform a program to a lower-level representation that is closer
to the target.
+
+**relay/transform** contains a collection of passes that optimize the model.
The optimizations include common program
+optimizations such as constant folding and dead-code elimination, and
tensor-computation specific passes such as layout
+transformation and scaling factor folding.
+
+Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end to end function(e.g. mobilenet)
+into sub-function(e.g. conv2d-relu) segments. We call these segments of
functions.
+This process helps us to divide the original problem into two sub-problems:
+
+- Compilation and optimization for each sub-function.
+- Overall execution structure: we need to do a sequence of calls into the
generated sub-functions to execute the whole model.
+
+We use the low-level tir phase to compile and optimize each sub-function. For
specific targets, we may also directly go to the target translation phase and
use external code generators.
+
+There are a few different ways(in relay/backend) to handle the calls into the
overall execution problem. For simple models with known shapes and no control
flow, we can lower to a graph runtime that stores the execution structure in a
graph. We also support a virtual machine backend for dynamic executions.
Finally, we plan to support ahead of time compilation that compiles the
high-level execution structure into the executable and generated primitive
functions. All of these execution modes are encapsulated by a unified
**runtime.Module** interface, which we will discuss in the latter part of the
guide.
+
+**tir/transform** contains transformation passes for TIR level functions. Many
tir passes serve the purpose of lowering. For example, there are passes to
flatten multi-dimensional access to one-dimensional pointer access, to expand
the intrinsics into target-specific ones, and to decorate the function entry to
meet the runtime calling convention. Of course, there are also optimization
passes, such as access index simplification and dead code elimination.
+
+Many low-level optimizations can be handled in the target phase by the LLVM,
CUDA C, and other target compilers. As a result, we leave low-level
optimizations such as register allocation to the downstream compilers and only
focus on optimizations that are not covered by them.
+
+Search-space and Learning-based Transformations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The transformation passes we described so far are deterministic and
rule-based. One design goal of the TVM stack is to support high-performance
code optimizations for different hardware platforms. To do so, we will need to
investigate as many optimization choices as possible, including but not
limited to, multi-dimensional tensor access, loop tiling behavior, special
accelerator memory hierarchy, and threading.
+
+It is hard to define a heuristic to make all of the choices. Instead, we will
take a search and learning-based approach.
+We first define a collection of actions we can take to transform a program.
Example actions include loop transformations, inlining,
+and vectorization. We call these actions **scheduling primitives**. The collection
of scheduling primitives defines a search space of possible
+optimizations we can make to a program. The system then searches over
different possible scheduling
+sequences to pick the best scheduling combination.
+The search procedure is usually guided by a machine learning algorithm.
+
+We can record the best schedule sequence for a (possibly-fused) operator once
the search is completed. The compiler can then just look up the best
+schedule sequence and apply it to the program. Notably, this schedule
application phase is **exactly like** the rule-based transformations,
+enabling us to share the same interface convention with traditional passes.
+
+We use search-based optimizations to handle the initial tir function
generation problem. This part of the module is called AutoTVM (auto_scheduler).
+We expect to expand the learning-based transformations to more areas as we
continue to develop the TVM stack.
+
+Target Translation
+~~~~~~~~~~~~~~~~~~
+
+The target translation phase transforms an IRModule to the corresponding
target executable format.
+For backends such as x86 and ARM, we will use the LLVM IRBuilder to build
in-memory LLVM IR.
+We can also generate source-level languages such as CUDA C and OpenCL.
+Finally, we support the direct translation of a Relay function (sub-graph) for
external code generators.
+Importantly, the final code generation phase should be as lightweight as possible
with the vast majority of transformations
+and lowering performed before target translation.
+We also provide a Target structure to specify the compilation target.
+The transformations before the target translation phase can also be affected
by the target — for example,
+a target's vector length would change the vectorization behavior.
+
+Runtime Execution
+~~~~~~~~~~~~~~~~~
+
+The main goal of TVM's runtime is to provide a minimal API for loading and
executing the compiled artifact in a language of the user's choice, including
Python, C++, Rust, Go, Java, and JavaScript. The code snippet below shows such
an example in Python:
+
+.. code-block:: python
+
+ import tvm
+ # Example runtime execution program in python, with type annotated
+ mod: tvm.runtime.Module = tvm.runtime.load_module("compiled_artifact.so")
+ arr: tvm.runtime.NDArray = tvm.nd.array([1, 2, 3], ctx=tvm.gpu(0))
+ fun: tvm.runtime.PackedFunc = mod["addone"]
+ fun(arr)
+ print(arr.asnumpy())
+
+
+:py:class:`tvm.runtime.Module` encapsulates the result of compilation. A
runtime.Module contains a GetFunction method to obtain PackedFuncs by name.
+
+:py:class:`tvm.runtime.PackedFunc` is a type-erased function interface for
both the generated functions. A runtime.PackedFunc can take arguments and
return values with the following types: POD types(int, float), string,
runtime.PackedFunc, runtime.Module, runtime.NDArray, sub-classes of
runtime.Object.
+
+:py:class:`tvm.runtime.Module` and :py:class:`tvm.runtime.PackedFunc` are
powerful mechanisms to modularize the runtime. For example, to get the above
`addone` function on CUDA, we can use LLVM to generate the host-side code to
compute the launching parameters(e.g. size of the thread groups) and then call
into another PackedFunc from a CUDAModule that is backed by the CUDA driver
API. The same mechanism can be used for OpenCL kernels.
+
+The above example only deals with a simple `addone` function. The code snippet
below gives an example of an end to end model execution using the same
interface:
+
+.. code-block:: python
+
+ import tvm
+ # Example runtime execution program in python, with type annotated
+ factory: tvm.runtime.Module = tvm.runtime.load_module("resnet18.so")
+ # Create a stateful graph execution module for resnet18 on gpu(0)
+ gmod: tvm.runtime.Module = factory["resnet18"](tvm.gpu(0))
+ data: tvm.runtime.NDArray = get_input_data()
+ # set input
+ gmod["set_input"](0, data)
+ # execute the model
+ gmod["run"]()
+ # get the output
+ result = gmod["get_output"](0).asnumpy()
+
+The main take away is that the runtime.Module and runtime.PackedFunc are
sufficient to encapsulate both operator level programs(such as addone), as well
as the end to end models.
Review comment:
```suggestion
The main take away is that runtime.Module and runtime.PackedFunc are
sufficient to encapsulate both operator level programs(such as addone), as well
as the end-to-end models.
```
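To illustrate the takeaway, here is a toy sketch in plain Python of how a stateful, end-to-end executor can sit behind the same "look up a function by name, then call it" interface. It does not use TVM: `ToyGraphModule`, the `factory` dictionary, and the `scale`/`relu` layers are hypothetical and only mirror the shape of the resnet18 snippet quoted above.

```python
from typing import Any, Callable, Dict, List

Layer = Callable[[List[int]], List[int]]


class ToyGraphModule:
    """Hypothetical stateful executor exposing its functions by name."""

    def __init__(self, layers: List[Layer]) -> None:
        self._layers = list(layers)
        self._inputs: Dict[int, List[int]] = {}
        self._outputs: List[List[int]] = []
        # Name-to-callable table, standing in for a module's function lookup.
        self._funcs: Dict[str, Callable[..., Any]] = {
            "set_input": self._set_input,
            "run": self._run,
            "get_output": self._get_output,
        }

    def __getitem__(self, name: str) -> Callable[..., Any]:
        return self._funcs[name]

    def _set_input(self, index: int, value: List[int]) -> None:
        self._inputs[index] = value

    def _run(self) -> None:
        # Execute the layers in sequence on input 0.
        value = self._inputs[0]
        for layer in self._layers:
            value = layer(value)
        self._outputs = [value]

    def _get_output(self, index: int) -> List[int]:
        return self._outputs[index]


def scale(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def relu(xs: List[int]) -> List[int]:
    return [max(0, x) for x in xs]


# The factory maps a model name to a function that creates an executor.
factory = {"toy_model": lambda: ToyGraphModule([scale, relu])}

gmod = factory["toy_model"]()
gmod["set_input"](0, [-1, 2, 3])
gmod["run"]()
result = gmod["get_output"](0)
print(result)  # [0, 4, 6]
```

Each layer here plays the role of an operator-level function, while the executor plays the role of the end-to-end model; both are reached through the same lookup-and-call pattern.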
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,360 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high level description of a model into a deployable module.
+ To get started, please read the this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
+This runtime-based view focuses on the interactions of each component when
running the compiler.
+Then we will review the logical modules of the codebase and their
relationship. This part provides a static overarching view of the design.
+
+
+Example Compilation Flow
+------------------------
+
+In this guide, we will study an example compilation flow in the compiler. The
figure below shows the flow. At a high-level, it contains several steps:
+
+- Import: The frontend component ingests a model into an IRModule, which
contains a collection of functions that internally represent the model.
+- Transformation: The compiler transforms an IRModule to another functionally
equivalent or approximately
+ equivalent (e.g. in the case of quantization) IRModule. Many of the
transformations are target (backend) independent.
+ We also allow the target to affect the configuration of the transformation
pipeline.
+- Target Translation: The compiler translates(codegen) the IRModule to an
executable format specified by the target.
+ The target translation result is encapsulated as a `runtime.Module` that can
be exported, loaded, and executed on the target runtime environment.
+- Runtime Execution: the user loads back a `runtime.Module` and runs the
compiled functions in the supported runtime environment.
+
+
+.. figure::
https://raw.githubusercontent.com/tvmai/web-data/master/images/design/tvm_dyn_workflow.svg
+ :align: center
+ :width: 85%
+
+
+Key data structures
+~~~~~~~~~~~~~~~~~~~
+
+One of the best ways to design and understand a complex system is to identify
the key data structures and APIs that
+manipulate (transform) these data structures. Once we have identified the key data
structures, we can then break down a system into logical
+components that either define a collection of key data structures or
transformations among the data structures.
+
+**IRModule** is the primary data structure used across the entire stack. An
IRModule (intermediate representation module)
+contains a collection of functions. Currently, we support two primary variants
of functions.
+
+- **relay::Function** is a high-level functional program representation. A
relay.Function usually corresponds to an end to end model.
+ You can view a relay.Function as a computational graph with additional
support for control-flow, recursion, and complex data structures.
+- **tir::PrimFunc** is a low-level program representation that contains
elements including loop-nest choices, multi-dimensional load/store,
+ threading, and vector/tensor instructions. It is usually used to represent
an operator program that executes a (possibly-fused) layer in a model.
+
+During the compilation, a relay function may be lowered to multiple
tir::PrimFunc functions and a top-level function that calls into
+those tir::PrimFunc functions.
+
+Transformations
+~~~~~~~~~~~~~~~
+
+Now that we have covered the key data structures, let us talk about the
transformations. Each transformation could serve one of the following purposes:
+
+- optimization: transform a program to an equivalent, possibly more optimized
version.
+- lowering: transform a program to a lower-level representation that is closer
to the target.
+
+**relay/transform** contains a collection of passes that optimize the model.
The optimizations include common program
+optimizations such as constant folding and dead-code elimination, and
tensor-computation specific passes such as layout
+transformation and scaling factor folding.
+
+Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end to end function(e.g. mobilenet)
+into sub-function(e.g. conv2d-relu) segments. We call these segments of
functions.
+This process helps us to divide the original problem into two sub-problems:
+
+- Compilation and optimization for each sub-function.
+- Overall execution structure: we need to do a sequence of calls into the
generated sub-functions to execute the whole model.
+
+We use the low-level tir phase to compile and optimize each sub-function. For
specific targets, we may also directly go to the target translation phase and
use external code generators.
+
+There are a few different ways(in relay/backend) to handle the calls into the
overall execution problem. For simple models with known shapes and no control
flow, we can lower to a graph runtime that stores the execution structure in a
graph. We also support a virtual machine backend for dynamic executions.
Finally, we plan to support ahead of time compilation that compiles the
high-level execution structure into the executable and generated primitive
functions. All of these execution modes are encapsulated by a unified
**runtime.Module** interface, which we will discuss in the latter part of the
guide.
+
+**tir/transform** contains transformation passes for TIR level functions. Many
tir passes serve the purpose of lowering. For example, there are passes to
flatten multi-dimensional access to one-dimensional pointer access, to expand
the intrinsics into target-specific ones, and to decorate the function entry to
meet the runtime calling convention. Of course, there are also optimization
passes, such as access index simplification and dead code elimination.
+
+Many low-level optimizations can be handled in the target phase by the LLVM,
CUDA C, and other target compilers. As a result, we leave low-level
optimizations such as register allocation to the downstream compilers and only
focus on optimizations that are not covered by them.
+
+Search-space and Learning-based Transformations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The transformation passes we described so far are deterministic and
rule-based. One design goal of the TVM stack is to support high-performance
code optimizations for different hardware platforms. To do so, we will need to
investigate as many optimization choices as possible, including but not
limited to, multi-dimensional tensor access, loop tiling behavior, special
accelerator memory hierarchy, and threading.
+
+It is hard to define a heuristic to make all of the choices. Instead, we will
take a search and learning-based approach.
+We first define a collection of actions we can take to transform a program.
Example actions include loop transformations, inlining, and
+vectorization. We call these actions **scheduling primitives**. The collection
of scheduling primitives defines a search space of possible
+optimizations we can make to a program. The system will use then searches over
different possible scheduling
+sequence to pick the best scheduling combination.
+The search procedure is usually guided by a machine learning algorithm.
+
+We can record the best schedule sequence for a (possibly-fused) operator once
the search is completed. The compiler can then just look up the best
+schedule sequence and apply it to the program. Notably, this schedule
application phase **exactly like** the rule-based transformations,
+enabling us to share the same interface convention with traditional passes.
+
+We use search based optimizations to handle the initial tir function
generation problem. This part of the module is called AutoTVM(auto_scheduler).
+We expect to expand the learning-based transformations to more areas as we
continue to develop the TVM stack.
+
+Target Translation
+~~~~~~~~~~~~~~~~~~
+
+The target translation phase transforms an IRModule to the corresponding
target executable format.
+For backends such as x86 and ARM, we will use the LLVM IRBuilder to build
in-memory LLVM IR.
+We can also generate source-level languages such as CUDA C and OpenCL.
+Finally, we support the direct translation of a Relay function (sub-graph) for
external code generators.
+Importantly, the final code generation phase should be as lightweight as possible
with the vast majority of transformations
+and lowering performed before target translation.
+We also provide a Target structure to specify the compilation target.
+The transformations before the target translation phase can also be affected
by the target — for example,
+a target's vector length would change the vectorization behavior.
+
+Runtime Execution
+~~~~~~~~~~~~~~~~~
+
+The main goal of TVM's runtime is to provide a minimal API for loading and
executing the compiled artifact in a language of the user's choice, including
Python, C++, Rust, Go, Java, and JavaScript. The code snippet below shows such
an example in Python:
+
+.. code-block:: python
+
+ import tvm
+ # Example runtime execution program in Python, with type annotations
+ mod: tvm.runtime.Module = tvm.runtime.load_module("compiled_artifact.so")
+ arr: tvm.runtime.NDArray = tvm.nd.array([1, 2, 3], ctx=tvm.gpu(0))
+ fun: tvm.runtime.PackedFunc = mod["addone"]
+ fun(arr)
+ print(arr.asnumpy())
+
+
+:py:class:`tvm.runtime.Module` encapsulates the result of compilation. A
runtime.Module contains a GetFunction method to obtain PackedFuncs by name.
+
+:py:class:`tvm.runtime.PackedFunc` is a type-erased function interface for
both the generated functions. A runtime.PackedFunc can take arguments and
return values with the following types: POD types(int, float), string,
runtime.PackedFunc, runtime.Module, runtime.NDArray, sub-classes of
runtime.Object.
Review comment:
Should this read "and other sub-classes of runtime.Object"? Module is also a
sub-class of it, although it is handled differently.
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,360 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high-level description of a model into a deployable module.
+ To get started, please read this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
+This runtime-based view focuses on the interactions of each component when
running the compiler.
+Then we will review the logical modules of the codebase and their
relationship. This part provides a static overarching view of the design.
+
+
+Example Compilation Flow
+------------------------
+
+In this guide, we will study an example compilation flow in the compiler. The
figure below shows the flow. At a high level, it contains several steps:
+
+- Import: The frontend component ingests a model into an IRModule, which
contains a collection of functions that internally represent the model.
+- Transformation: The compiler transforms an IRModule to another functionally
equivalent or approximately
+ equivalent(e.g. in the case of quantization) IRModule. Many of the
transformations are target (backend) independent.
+ We also allow the target to affect the configuration of the transformation
pipeline.
+- Target Translation: The compiler translates(codegen) the IRModule to an
executable format specified by the target.
+ The target translation result is encapsulated as a `runtime.Module` that can
be exported, loaded, and executed on the target runtime environment.
+- Runtime Execution: the user loads back a `runtime.Module` and runs the
compiled functions in the supported runtime environment.
+
+
+.. figure::
https://raw.githubusercontent.com/tvmai/web-data/master/images/design/tvm_dyn_workflow.svg
+ :align: center
+ :width: 85%
+
+
+Key data structures
+~~~~~~~~~~~~~~~~~~~
+
+One of the best ways to design and understand a complex system is to identify
the key data structures and APIs that
+manipulate (transform) these data structures. Once we have identified the key data
structures, we can then break down a system into logical
+components that either define a collection of key data structures or
transformations among the data structures.
+
+**IRModule** is the primary data structure used across the entire stack. An
IRModule (intermediate representation module)
+contains a collection of functions. Currently, we support two primary variants
of functions.
+
+- **relay::Function** is a high-level functional program representation. A
relay.Function usually corresponds to an end-to-end model.
+ You can view a relay.Function as a computational graph with additional
support for control-flow, recursion, and complex data structures.
+- **tir::PrimFunc** is a low-level program representation that contains
elements including loop-nest choices, multi-dimensional load/store,
+ threading, and vector/tensor instructions. It is usually used to represent
an operator program that executes a (possibly-fused) layer in a model.
+
+During the compilation, a relay function may be lowered to multiple
tir::PrimFunc functions and a top-level function that calls into
+those tir::PrimFunc functions.
+
+Transformations
+~~~~~~~~~~~~~~~
+
+Now that we have covered the key data structures, let us talk about the
transformations. Each transformation could serve one of the following purposes:
+
+- optimization: transform a program to an equivalent, possibly more optimized
version.
+- lowering: transform a program to a lower-level representation that is closer
to the target.
+
+**relay/transform** contains a collection of passes that optimize the model.
The optimizations include common program
+optimizations such as constant folding and dead-code elimination, and
tensor-computation specific passes such as layout
+transformation and scaling factor folding.
+
+Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end to end function(e.g. mobilenet)
Review comment:
```suggestion
Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end-to-end function(e.g. MobileNet)
```
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,360 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high-level description of a model into a deployable module.
+ To get started, please read this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
+This runtime-based view focuses on the interactions of each component when
running the compiler.
+Then we will review the logical modules of the codebase and their
relationship. This part provides a static overarching view of the design.
+
+
+Example Compilation Flow
+------------------------
+
+In this guide, we will study an example compilation flow in the compiler. The
figure below shows the flow. At a high level, it contains several steps:
+
+- Import: The frontend component ingests a model into an IRModule, which
contains a collection of functions that internally represent the model.
+- Transformation: The compiler transforms an IRModule to another functionally
equivalent or approximately
+ equivalent(e.g. in the case of quantization) IRModule. Many of the
transformations are target (backend) independent.
+ We also allow the target to affect the configuration of the transformation
pipeline.
+- Target Translation: The compiler translates(codegen) the IRModule to an
executable format specified by the target.
+ The target translation result is encapsulated as a `runtime.Module` that can
be exported, loaded, and executed on the target runtime environment.
+- Runtime Execution: the user loads back a `runtime.Module` and runs the
compiled functions in the supported runtime environment.
+
+
+.. figure::
https://raw.githubusercontent.com/tvmai/web-data/master/images/design/tvm_dyn_workflow.svg
+ :align: center
+ :width: 85%
+
+
+Key data structures
+~~~~~~~~~~~~~~~~~~~
+
+One of the best ways to design and understand a complex system is to identify
the key data structures and APIs that
+manipulate (transform) these data structures. Once we have identified the key data
structures, we can then break down a system into logical
+components that either define a collection of key data structures or
transformations among the data structures.
+
+**IRModule** is the primary data structure used across the entire stack. An
IRModule (intermediate representation module)
+contains a collection of functions. Currently, we support two primary variants
of functions.
+
+- **relay::Function** is a high-level functional program representation. A
relay.Function usually corresponds to an end-to-end model.
+ You can view a relay.Function as a computational graph with additional
support for control-flow, recursion, and complex data structures.
+- **tir::PrimFunc** is a low-level program representation that contains
elements including loop-nest choices, multi-dimensional load/store,
+ threading, and vector/tensor instructions. It is usually used to represent
an operator program that executes a (possibly-fused) layer in a model.
+
+During the compilation, a relay function may be lowered to multiple
tir::PrimFunc functions and a top-level function that calls into
+those tir::PrimFunc functions.
+
+Transformations
+~~~~~~~~~~~~~~~
+
+Now that we have covered the key data structures, let us talk about the
transformations. Each transformation could serve one of the following purposes:
+
+- optimization: transform a program to an equivalent, possibly more optimized
version.
+- lowering: transform a program to a lower-level representation that is closer
to the target.
+
+**relay/transform** contains a collection of passes that optimize the model.
The optimizations include common program
+optimizations such as constant folding and dead-code elimination, and
tensor-computation specific passes such as layout
+transformation and scaling factor folding.
+
+Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end to end function(e.g. mobilenet)
+into sub-function(e.g. conv2d-relu) segments. We call these segments
sub-functions.
+This process helps us to divide the original problem into two sub-problems:
+
+- Compilation and optimization for each sub-function.
+- Overall execution structure: we need to do a sequence of calls into the
generated sub-functions to execute the whole model.
+
+We use the low-level tir phase to compile and optimize each sub-function. For
specific targets, we may also directly go to the target translation phase and
use external code generators.
+
+There are a few different ways(in relay/backend) to handle the overall
execution problem. For simple models with known shapes and no control
flow, we can lower to a graph runtime that stores the execution structure in a
graph. We also support a virtual machine backend for dynamic executions.
Finally, we plan to support ahead of time compilation that compiles the
high-level execution structure into the executable and generated primitive
functions. All of these execution modes are encapsulated by a unified
**runtime.Module** interface, which we will discuss in the latter part of the
guide.
+
+**tir/transform** contains transformation passes for TIR level functions. Many
tir passes serve the purpose of lowering. For example, there are passes to
flatten multi-dimensional access to one-dimensional pointer access, to expand
the intrinsics into target-specific ones, and to decorate the function entry to
meet the runtime calling convention. Of course, there are also optimization
passes, such as access index simplification and dead code elimination.
+
+Many low-level optimizations can be handled in the target phase by the LLVM,
CUDA C, and other target compilers. As a result, we leave low-level
optimizations such as register allocation to the downstream compilers and only
focus on optimizations that are not covered by them.
+
+Search-space and Learning-based Transformations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The transformation passes we described so far are deterministic and
rule-based. One design goal of the TVM stack is to support high-performance
code optimizations for different hardware platforms. To do so, we will need to
investigate as many optimization choices as possible, including but not
limited to multi-dimensional tensor access, loop tiling behavior, special
accelerator memory hierarchy, and threading.
+
+It is hard to define a heuristic to make all of the choices. Instead, we will
take a search and learning-based approach.
+We first define a collection of actions we can take to transform a program.
Example actions include loop transformations, inlining, and
+vectorization. We call these actions **scheduling primitives**. The collection
of scheduling primitives defines a search space of possible
+optimizations we can make to a program. The system will use then searches over
different possible scheduling
Review comment:
Should this be "will use them to search"?
##########
File path: docs/dev/index.rst
##########
@@ -15,28 +15,360 @@
specific language governing permissions and limitations
under the License.
-Design and Developer Guide
-==========================
+Design and Architecture
+=======================
+
+This document is intended for developers who want to understand the
+architecture of TVM and/or actively develop on the project.
+This page is organized as follows:
+
+- The `Example Compilation Flow`_ gives an overview of the steps that TVM
takes to turn a high-level description of a model into a deployable module.
+ To get started, please read this section first.
+- The `Logical Architecture Components`_ section describes the logical
components.
+ The sections after are specific guides focused on each logical component,
organized
+ by the component's name.
+- The `How Tos`_ section contains useful tutorials to solve specific
development problems.
+
+This guide provides a few complementary views of the architecture.
+First, we review a single end to end compilation flow and discuss the key data
structures and the transformations.
+This runtime-based view focuses on the interactions of each component when
running the compiler.
+Then we will review the logical modules of the codebase and their
relationship. This part provides a static overarching view of the design.
+
+
+Example Compilation Flow
+------------------------
+
+In this guide, we will study an example compilation flow in the compiler. The
figure below shows the flow. At a high level, it contains several steps:
+
+- Import: The frontend component ingests a model into an IRModule, which
contains a collection of functions that internally represent the model.
+- Transformation: The compiler transforms an IRModule to another functionally
equivalent or approximately
+ equivalent(e.g. in the case of quantization) IRModule. Many of the
transformations are target (backend) independent.
+ We also allow the target to affect the configuration of the transformation
pipeline.
+- Target Translation: The compiler translates(codegen) the IRModule to an
executable format specified by the target.
+ The target translation result is encapsulated as a `runtime.Module` that can
be exported, loaded, and executed on the target runtime environment.
+- Runtime Execution: the user loads back a `runtime.Module` and runs the
compiled functions in the supported runtime environment.
+
+
+.. figure::
https://raw.githubusercontent.com/tvmai/web-data/master/images/design/tvm_dyn_workflow.svg
+ :align: center
+ :width: 85%
+
+
+Key data structures
+~~~~~~~~~~~~~~~~~~~
+
+One of the best ways to design and understand a complex system is to identify
the key data structures and APIs that
+manipulate (transform) these data structures. Once we have identified the key data
structures, we can then break down a system into logical
+components that either define a collection of key data structures or
transformations among the data structures.
+
+**IRModule** is the primary data structure used across the entire stack. An
IRModule (intermediate representation module)
+contains a collection of functions. Currently, we support two primary variants
of functions.
+
+- **relay::Function** is a high-level functional program representation. A
relay.Function usually corresponds to an end-to-end model.
+ You can view a relay.Function as a computational graph with additional
support for control-flow, recursion, and complex data structures.
+- **tir::PrimFunc** is a low-level program representation that contains
elements including loop-nest choices, multi-dimensional load/store,
+ threading, and vector/tensor instructions. It is usually used to represent
an operator program that executes a (possibly-fused) layer in a model.
+
+During the compilation, a relay function may be lowered to multiple
tir::PrimFunc functions and a top-level function that calls into
+those tir::PrimFunc functions.
+
+Transformations
+~~~~~~~~~~~~~~~
+
+Now that we have covered the key data structures, let us talk about the
transformations. Each transformation could serve one of the following purposes:
+
+- optimization: transform a program to an equivalent, possibly more optimized
version.
+- lowering: transform a program to a lower-level representation that is closer
to the target.
+
+**relay/transform** contains a collection of passes that optimize the model.
The optimizations include common program
+optimizations such as constant folding and dead-code elimination, and
tensor-computation specific passes such as layout
+transformation and scaling factor folding.
+
+Near the end of the relay optimization pipeline, we will run a pass(FuseOps)
to break the end to end function(e.g. mobilenet)
+into sub-function(e.g. conv2d-relu) segments. We call these segments
sub-functions.
+This process helps us to divide the original problem into two sub-problems:
+
+- Compilation and optimization for each sub-function.
+- Overall execution structure: we need to do a sequence of calls into the
generated sub-functions to execute the whole model.
+
+We use the low-level tir phase to compile and optimize each sub-function. For
specific targets, we may also directly go to the target translation phase and
use external code generators.
+
+There are a few different ways(in relay/backend) to handle the overall
execution problem. For simple models with known shapes and no control
flow, we can lower to a graph runtime that stores the execution structure in a
graph. We also support a virtual machine backend for dynamic executions.
Finally, we plan to support ahead of time compilation that compiles the
high-level execution structure into the executable and generated primitive
functions. All of these execution modes are encapsulated by a unified
**runtime.Module** interface, which we will discuss in the latter part of the
guide.
+
+**tir/transform** contains transformation passes for TIR level functions. Many
tir passes serve the purpose of lowering. For example, there are passes to
flatten multi-dimensional access to one-dimensional pointer access, to expand
the intrinsics into target-specific ones, and to decorate the function entry to
meet the runtime calling convention. Of course, there are also optimization
passes, such as access index simplification and dead code elimination.
+
+Many low-level optimizations can be handled in the target phase by the LLVM,
CUDA C, and other target compilers. As a result, we leave low-level
optimizations such as register allocation to the downstream compilers and only
focus on optimizations that are not covered by them.
+
+Search-space and Learning-based Transformations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The transformation passes we described so far are deterministic and
rule-based. One design goal of the TVM stack is to support high-performance
code optimizations for different hardware platforms. To do so, we will need to
investigate as many optimization choices as possible, including but not
limited to multi-dimensional tensor access, loop tiling behavior, special
accelerator memory hierarchy, and threading.
+
+It is hard to define a heuristic to make all of the choices. Instead, we will
take a search and learning-based approach.
+We first define a collection of actions we can take to transform a program.
Example actions include loop transformations, inlining, and
+vectorization. We call these actions **scheduling primitives**. The collection
of scheduling primitives defines a search space of possible
+optimizations we can make to a program. The system will use then searches over
different possible scheduling
+sequence to pick the best scheduling combination.
+The search procedure is usually guided by a machine learning algorithm.
+
+We can record the best schedule sequence for a (possibly-fused) operator once
the search is completed. The compiler can then just look up the best
+schedule sequence and apply it to the program. Notably, this schedule
application phase **exactly like** the rule-based transformations,
Review comment:
```suggestion
schedule sequence and apply it to the program. Notably, this schedule
application phase is **exactly like** the rule-based transformations,
```