junrushao1994 commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r664878703



##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,339 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: TBD (apache/tvm-rfcs#0000)
+* GitHub Issue: TBD (apache/tvm-rfcs#0000)
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a probabilistic scheduling DSL on TIR 
that unifies the approaches of AutoTVM and Auto Scheduler (Ansor). Meta 
schedule provides a pragmatic way to define the space of automatic tuning, 
extensibility in terms of all possible TIR schedule primitives like 
tensorization and loop partitioning, and customizability on every layer of the 
automation system.
+
+Meta Schedule is our 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling and Design Space**
+
+In TVM TensorIR, optimization of a TensorIR program is done via a sequence of 
transformations. For example, we reorder loops for better locality and we 
tensorize for specific hardware intrinsics. The process of invoking such a set 
of pre-defined transformations is called "**scheduling**", and each 
transformation is called a "**schedule primitive**". These primitives form a 
domain-specific language (DSL) describing the transformation of TensorIR 
programs. **Design space** is the set of all possible schedulings with respect 
to a TensorIR program.
+
+**Problems with the Current Scheduling System**
+
+* **Manual schedule**: Developers optimize their programs by manually invoking 
schedule primitives, i.e. explore points in the design space with humans in the 
loop. This can be a tedious and error-prone approach, hence the creation of 
AutoTVM and AutoScheduler (Ansor).
+* **AutoTVM**: The automation system requires users to define "schedule 
templates" as the design space for each operator. Therefore, it is inextensible 
to hundreds of operators and variety hardware platforms.
+* **AutoScheduler (Ansor)**: It automatically generates schedule templates as 
the design space, according to a set of predefined "search rules". However, it 
is non-trivial to extend AutoScheduler to new schedule primitives (tensorize, 
loop partition, software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of 
their own abstraction, which are not only hard to learn, but also 
engineering-intensive to customize.
+
+**Benefit of Meta Schedule**
+
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of 
abstraction.
+* Provides unified APIs for implementing manual schedule, AutoTVM and 
AutoScheduler (Ansor).
+* Extensibility to all the schedule primitives, including tensorization and 
loop partitioning. Almost no extra effort is needed to use a new primitive in 
auto-tuning.
+* The automation infrastructure is extensible on every of its components. 
Every component of the system can be customized easily in pure python or C++ or 
both. For example, one can develop a new design space generator in python, a 
new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+In this section, we describe the syntax of meta schedule DSL, and how it could 
be used to describe and auto-generate the design space.
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is 
an example of a manual schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this example, the developers may tweak the tile sizes and measure the 
performance of the generated kernels to explore the opportunities of potential 
optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters 
that are hard to determine ahead of time, for example, tile sizes, unroll 
steps, or which tensor intrinsics to use. Developers may manually enumerate 
possible combinations of these unknown factors, and then pick the best schedule 
according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive 
handles only a very basic transformation of the IR, for example, `split` only 
splits a loop into two. In the real world, the over-fine granularity of those 
primitives usually leads to repetitive and verbose scheduling code, as 
[mentioned](https://discuss.tvm.apache.org/t/rfc-tensorir-a-schedulable-ir-for-tvm/7872/43?u=junrushao1994)
 by developers in our community.
+
+To make it more convenient and modular, we allow users to register "composite 
schedules" that apply a sequence of schedule primitives according to certain 
analysis of the IR. For instance, a composite schedule may inspect a TensorIR 
block and decide whether we should call `compute_inline` on it.
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with sampling instructions. When 
included in a schedule, these instructions parametrize the schedule from a 
single deterministic point to a space supported by random variables (tile size, 
etc.), making it possible for developers to describe the design space with meta 
schedule APIs.
+
+We can extend the matmul example above to cover all possible tilings using 
these sampling instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)
+j_tiles = sch.sample_perfect_tile(j, n=4)
+k_tiles = sch.sample_perfect_tile(k, n=2)
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+AutoScheduler (Ansor) generates schedule templates by applying their 
SearchRules to each stage. SearchRule analyzes TE and eagerly trigger schedule 
primitives accordingly in its internally maintained mini IR.
+
+As introduced in Section 3.2, composite schedule rules are equivalent to 
AutoScheduler's SearchRule in TensorIR scheduling. To further generate a design 
space for scheduling, sampling instructions are used in composite schedule 
rules. Similarly, the sketch generation phase in AutoScheduler is equivalent to 
applying composite schedule rules to each block in TensorIR.
+
+Several built-in composite schedule rules are shipped with our system to align 
with Ansor's design space:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+* ...
+
+Developers may implement their own rules in either Python or C++, and specify 
which rules to use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / Ansor
+
+In this section, we show that the design space induced by TE manual schedule, 
AutoTVM and Ansor are all subsets of meta schedule, and meta schedule further 
allows mixing those three styles to search jointly.
+
+**Manual schedule**. The TE schedule is a special case of a meta schedule 
program, where there is no randomness introduced by sampling instructions. It 
is a single point in terms of design space.
+
+**AutoTVM (Template-based tuning)**. It is more natural representation of 
AutoTVM’s schedule templates (knobs) by writing one or more schedule functions 
in meta schedule with sampling instructions. The execution traces generated by 
the schedule functions are the design space to be explored.
+
+**AutoScheduler (Ansor, Template-free tuning)**. As mentioned in the previous 
section, application of composite schedule rules generates the design space, 
which is equivalent to Ansor’s sketch generation.
+
+**Mixing styles in design space definition**. By taking union of the spaces 
induced by the three special cases, our system allows developers to combine 
generic rules that Ansor provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+In this section, we introduce the underlying techniques for the automation 
system to extract and explore the design space. The figure below briefly 
illustrates the workflow of the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the 
underlying system records all the instructions users applied to the schedule 
class, including sampling and schedule primitives. We call this list of 
instructions a trace.
+
+Executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling 
instructions, which introduces the uncertainty in scheduling - one instance of 
sampling results in one point in the design space. Therefore, the trace itself 
forms a design space to explore, e.g. which set of tile sizes works best on a 
specific hardware.
+
+**Union of design space**. Our system works on a set of traces, representing 
the union of the design spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are 
equally important to generate high-performance schedules, we allow forking the 
trace into two, and the design space is the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with 
the following syntax:
+
+```python
+# requires to trace the execution
+sch = tir.Schedule(..., traced=True)
+# do a lot of scheduling
+...
+# print the trace
+print(sch.trace)
+```
+
+And below is an example of the printed trace, which honestly reflects the 
schedule as a snippet of python scheduling function:
+
+```python
+b0 = sch.get_block(name="matmul", func_name="main")
+l1, l2, l3 = sch.get_loops(block=b0)
+v4, v5, v6, v7 = sch.sample_perfect_tile(loop=l1, n=4, 
max_innermost_factor=16, decision=[32, 1, 16, 2])
+v8, v9, v10, v11 = sch.sample_perfect_tile(loop=l2, n=4, 
max_innermost_factor=16, decision=[64, 4, 2, 2])
+v12, v13 = sch.sample_perfect_tile(loop=l3, n=2, max_innermost_factor=16, 
decision=[64, 16])
+l14, l15, l16, l17 = sch.split(loop=l1, factors=[v4, v5, v6, v7])
+l18, l19, l20, l21 = sch.split(loop=l2, factors=[v8, v9, v10, v11])
+l22, l23 = sch.split(loop=l3, factors=[v12, v13])
+sch.reorder(l14, l18, l15, l19, l22, l16, l20, l23, l17, l21)
+```
+
+### 4.2. Exploring the Search Space
+
+Meta Schedule provides several built-in exploration strategies to exhaustively 
or efficiently search for efficient schedules.
+
+**Program replay**. A simple strategy that replays the user-provided schedule 
function without using or taking any advantage of trace.

Review comment:
       This paragraph was not well written to express the difference between 
"replaying an opaque schedule function" and "replaying a trace". I am going to 
rephrase a little bit




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to