This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 1791746f79 deploying docs (apache/tvm@d76c7292598e4b7ebaac0b6928f18fc32b95a579)
1791746f79 is described below

commit 1791746f79cd559ac1090283c5b1950eb284c007
Author: tvm-bot <[email protected]>
AuthorDate: Thu Feb 5 16:48:19 2026 +0000

    deploying docs (apache/tvm@d76c7292598e4b7ebaac0b6928f18fc32b95a579)
---
 .../11c11e53c7dace51a8be968ee169ed0d/ir_module.zip | Bin 23904 -> 23904 bytes
 .../tir_transformation.zip                         | Bin 15611 -> 15611 bytes
 .../relax_creation.zip                             | Bin 22398 -> 22398 bytes
 .../relax_transformation.zip                       | Bin 11460 -> 11460 bytes
 .../optimize_llm.zip                               | Bin 54101 -> 54101 bytes
 .../e2e_opt_model.zip                              | Bin 14486 -> 14486 bytes
 .../quick_start.zip                                | Bin 16250 -> 16250 bytes
 .../export_and_load_executable.zip                 | Bin 31428 -> 31428 bytes
 .../tir_creation.zip                               | Bin 24395 -> 24395 bytes
 .../cross_compilation_and_rpc.zip                  | Bin 46611 -> 46611 bytes
 .../customize_opt.zip                              | Bin 19813 -> 19813 bytes
 .../relax/tutorials/sg_execution_times.rst.txt     |   6 +--
 .../tensor_ir/tutorials/sg_execution_times.rst.txt |   6 +--
 .../tensor_ir/tutorials/tir_creation.rst.txt       |  20 +++++-----
 .../tensor_ir/tutorials/tir_transformation.rst.txt |   6 +--
 .../get_started/tutorials/ir_module.rst.txt        |   8 ++--
 .../get_started/tutorials/quick_start.rst.txt      |   4 +-
 .../tutorials/sg_execution_times.rst.txt           |   6 +--
 .../tutorials/cross_compilation_and_rpc.rst.txt    |   6 +--
 .../how_to/tutorials/customize_opt.rst.txt         |   4 +-
 .../how_to/tutorials/e2e_opt_model.rst.txt         |   2 +-
 .../how_to/tutorials/sg_execution_times.rst.txt    |  10 ++---
 docs/_sources/sg_execution_times.rst.txt           |  22 +++++------
 docs/deep_dive/relax/tutorials/relax_creation.html |  16 +-------
 .../relax/tutorials/relax_transformation.html      |  15 +------
 .../relax/tutorials/sg_execution_times.html        |   6 +--
 .../tensor_ir/tutorials/sg_execution_times.html    |   6 +--
 .../tensor_ir/tutorials/tir_creation.html          |  44 ++++++---------------
 .../tensor_ir/tutorials/tir_transformation.html    |  23 +++--------
 docs/get_started/tutorials/ir_module.html          |  16 ++++----
 docs/get_started/tutorials/quick_start.html        |  24 +++++------
 docs/get_started/tutorials/sg_execution_times.html |   6 +--
 .../tutorials/cross_compilation_and_rpc.html       |   6 +--
 docs/how_to/tutorials/customize_opt.html           |   8 ++--
 docs/how_to/tutorials/e2e_opt_model.html           |   6 +--
 .../tutorials/export_and_load_executable.html      |   6 +--
 docs/how_to/tutorials/optimize_llm.html            |  10 ++---
 docs/how_to/tutorials/sg_execution_times.html      |  10 ++---
 docs/objects.inv                                   | Bin 19879 -> 19883 bytes
 docs/reference/api/python/runtime/vm.html          |   2 +-
 docs/searchindex.js                                |   2 +-
 docs/sg_execution_times.html                       |  22 +++++------
 42 files changed, 136 insertions(+), 192 deletions(-)

diff --git a/docs/_downloads/11c11e53c7dace51a8be968ee169ed0d/ir_module.zip 
b/docs/_downloads/11c11e53c7dace51a8be968ee169ed0d/ir_module.zip
index e7c95e61bd..b8bd097d23 100644
Binary files a/docs/_downloads/11c11e53c7dace51a8be968ee169ed0d/ir_module.zip 
and b/docs/_downloads/11c11e53c7dace51a8be968ee169ed0d/ir_module.zip differ
diff --git 
a/docs/_downloads/18ba0d2ee8120824175aaef66bc9c9bf/tir_transformation.zip 
b/docs/_downloads/18ba0d2ee8120824175aaef66bc9c9bf/tir_transformation.zip
index 750ddf396b..1a67a76974 100644
Binary files 
a/docs/_downloads/18ba0d2ee8120824175aaef66bc9c9bf/tir_transformation.zip and 
b/docs/_downloads/18ba0d2ee8120824175aaef66bc9c9bf/tir_transformation.zip differ
diff --git 
a/docs/_downloads/4753776bbe68e7c9ee4d19117973fc8b/relax_creation.zip 
b/docs/_downloads/4753776bbe68e7c9ee4d19117973fc8b/relax_creation.zip
index 43534451a6..0f2d0f8e61 100644
Binary files 
a/docs/_downloads/4753776bbe68e7c9ee4d19117973fc8b/relax_creation.zip and 
b/docs/_downloads/4753776bbe68e7c9ee4d19117973fc8b/relax_creation.zip differ
diff --git 
a/docs/_downloads/7d201684dfa095a5ea48d98e9a2ef7ad/relax_transformation.zip 
b/docs/_downloads/7d201684dfa095a5ea48d98e9a2ef7ad/relax_transformation.zip
index 13994e8224..18e6e11638 100644
Binary files 
a/docs/_downloads/7d201684dfa095a5ea48d98e9a2ef7ad/relax_transformation.zip and 
b/docs/_downloads/7d201684dfa095a5ea48d98e9a2ef7ad/relax_transformation.zip 
differ
diff --git a/docs/_downloads/83e85f38cf16f1d926d06615fd54095c/optimize_llm.zip 
b/docs/_downloads/83e85f38cf16f1d926d06615fd54095c/optimize_llm.zip
index f3cdf14de8..102f812c35 100644
Binary files 
a/docs/_downloads/83e85f38cf16f1d926d06615fd54095c/optimize_llm.zip and 
b/docs/_downloads/83e85f38cf16f1d926d06615fd54095c/optimize_llm.zip differ
diff --git a/docs/_downloads/a7dd7652b2ad50f82d7b739ce3645799/e2e_opt_model.zip 
b/docs/_downloads/a7dd7652b2ad50f82d7b739ce3645799/e2e_opt_model.zip
index a6d40052e8..5f0581b343 100644
Binary files 
a/docs/_downloads/a7dd7652b2ad50f82d7b739ce3645799/e2e_opt_model.zip and 
b/docs/_downloads/a7dd7652b2ad50f82d7b739ce3645799/e2e_opt_model.zip differ
diff --git a/docs/_downloads/bb7db6678496193ed0c55d3b95fa6778/quick_start.zip 
b/docs/_downloads/bb7db6678496193ed0c55d3b95fa6778/quick_start.zip
index f7f51f7ad1..256780ba12 100644
Binary files a/docs/_downloads/bb7db6678496193ed0c55d3b95fa6778/quick_start.zip 
and b/docs/_downloads/bb7db6678496193ed0c55d3b95fa6778/quick_start.zip differ
diff --git 
a/docs/_downloads/bc875d02d5382abc9ea5fb9eb2c1de2c/export_and_load_executable.zip
 
b/docs/_downloads/bc875d02d5382abc9ea5fb9eb2c1de2c/export_and_load_executable.zip
index af3936bf9d..ed25126b33 100644
Binary files 
a/docs/_downloads/bc875d02d5382abc9ea5fb9eb2c1de2c/export_and_load_executable.zip
 and 
b/docs/_downloads/bc875d02d5382abc9ea5fb9eb2c1de2c/export_and_load_executable.zip
 differ
diff --git a/docs/_downloads/be26483bb70b8468499a01c55e8e866c/tir_creation.zip 
b/docs/_downloads/be26483bb70b8468499a01c55e8e866c/tir_creation.zip
index b8120ae2f0..0558ddbd9b 100644
Binary files 
a/docs/_downloads/be26483bb70b8468499a01c55e8e866c/tir_creation.zip and 
b/docs/_downloads/be26483bb70b8468499a01c55e8e866c/tir_creation.zip differ
diff --git 
a/docs/_downloads/f69380821f417ef2210f45503d81bded/cross_compilation_and_rpc.zip
 
b/docs/_downloads/f69380821f417ef2210f45503d81bded/cross_compilation_and_rpc.zip
index 16c6d42265..efdee9b26c 100644
Binary files 
a/docs/_downloads/f69380821f417ef2210f45503d81bded/cross_compilation_and_rpc.zip
 and 
b/docs/_downloads/f69380821f417ef2210f45503d81bded/cross_compilation_and_rpc.zip
 differ
diff --git a/docs/_downloads/f69433a4a80715725df90d1386679956/customize_opt.zip 
b/docs/_downloads/f69433a4a80715725df90d1386679956/customize_opt.zip
index 616cee5ca8..e2b4748408 100644
Binary files 
a/docs/_downloads/f69433a4a80715725df90d1386679956/customize_opt.zip and 
b/docs/_downloads/f69433a4a80715725df90d1386679956/customize_opt.zip differ
diff --git a/docs/_sources/deep_dive/relax/tutorials/sg_execution_times.rst.txt 
b/docs/_sources/deep_dive/relax/tutorials/sg_execution_times.rst.txt
index 576c5e270b..4c7db14738 100644
--- a/docs/_sources/deep_dive/relax/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/deep_dive/relax/tutorials/sg_execution_times.rst.txt
@@ -6,7 +6,7 @@
 
 Computation times
 =================
-**00:00.171** total execution time for 2 files **from 
deep_dive/relax/tutorials**:
+**00:00.169** total execution time for 2 files **from 
deep_dive/relax/tutorials**:
 
 .. container::
 
@@ -33,8 +33,8 @@ Computation times
      - Time
      - Mem (MB)
    * - :ref:`sphx_glr_deep_dive_relax_tutorials_relax_creation.py` 
(``relax_creation.py``)
-     - 00:00.109
+     - 00:00.107
      - 0.0
    * - :ref:`sphx_glr_deep_dive_relax_tutorials_relax_transformation.py` 
(``relax_transformation.py``)
-     - 00:00.062
+     - 00:00.063
      - 0.0
diff --git 
a/docs/_sources/deep_dive/tensor_ir/tutorials/sg_execution_times.rst.txt 
b/docs/_sources/deep_dive/tensor_ir/tutorials/sg_execution_times.rst.txt
index d288c15946..a93a9004f2 100644
--- a/docs/_sources/deep_dive/tensor_ir/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/deep_dive/tensor_ir/tutorials/sg_execution_times.rst.txt
@@ -6,7 +6,7 @@
 
 Computation times
 =================
-**00:00.469** total execution time for 2 files **from 
deep_dive/tensor_ir/tutorials**:
+**00:00.459** total execution time for 2 files **from 
deep_dive/tensor_ir/tutorials**:
 
 .. container::
 
@@ -33,8 +33,8 @@ Computation times
      - Time
      - Mem (MB)
    * - :ref:`sphx_glr_deep_dive_tensor_ir_tutorials_tir_transformation.py` 
(``tir_transformation.py``)
-     - 00:00.296
+     - 00:00.289
      - 0.0
    * - :ref:`sphx_glr_deep_dive_tensor_ir_tutorials_tir_creation.py` 
(``tir_creation.py``)
-     - 00:00.174
+     - 00:00.170
      - 0.0
diff --git a/docs/_sources/deep_dive/tensor_ir/tutorials/tir_creation.rst.txt 
b/docs/_sources/deep_dive/tensor_ir/tutorials/tir_creation.rst.txt
index b1ccbb5a1c..f257be01f4 100644
--- a/docs/_sources/deep_dive/tensor_ir/tutorials/tir_creation.rst.txt
+++ b/docs/_sources/deep_dive/tensor_ir/tutorials/tir_creation.rst.txt
@@ -319,17 +319,17 @@ Now let's check the runtime dynamic shape inference:
 
  .. code-block:: none
 
-    [[0.66395146 0.9731671  0.5579553  0.83329487]
-     [0.28166202 0.2968866  0.18112361 0.2758509 ]
-     [0.29287887 0.59326327 0.31995752 0.4811729 ]
-     [0.36032397 0.9437797  0.569279   0.672779  ]]
-    [[31.453966 28.067139 29.689894 ... 31.817009 34.22411  30.223434]
-     [28.98735  29.64394  29.827055 ... 31.29005  34.07603  29.910816]
-     [30.846943 30.170332 31.640345 ... 32.7398   35.731262 30.372322]
+    [[1.1707907  1.1987079  0.90032184 1.230586  ]
+     [1.2139356  1.1025238  0.9032923  1.1955839 ]
+     [0.95371306 0.86685586 0.69221765 1.0787495 ]
+     [1.0905532  0.71847934 0.96397567 0.5273048 ]]
+    [[30.138514 28.199673 24.100554 ... 30.719109 28.356918 29.140915]
+     [37.550213 33.45395  31.837355 ... 36.413628 32.489414 33.274647]
+     [35.609116 33.061962 31.23119  ... 34.974236 32.11245  30.839909]
      ...
-     [33.3841   30.616657 33.785934 ... 34.35256  38.367065 34.010597]
-     [33.75541  33.209534 33.53989  ... 37.496735 38.519188 34.53092 ]
-     [30.60816  28.78788  30.272537 ... 31.03617  35.37258  29.144073]]
+     [35.50563  35.12555  30.850153 ... 37.291557 32.579887 32.654045]
+     [30.811462 29.73757  29.051355 ... 33.262543 30.94805  29.032318]
+     [35.33563  35.39359  30.650606 ... 37.331318 33.28751  33.48364 ]]
 
 
 
diff --git 
a/docs/_sources/deep_dive/tensor_ir/tutorials/tir_transformation.rst.txt 
b/docs/_sources/deep_dive/tensor_ir/tutorials/tir_transformation.rst.txt
index 2426c0c757..5b87848f61 100644
--- a/docs/_sources/deep_dive/tensor_ir/tutorials/tir_transformation.rst.txt
+++ b/docs/_sources/deep_dive/tensor_ir/tutorials/tir_transformation.rst.txt
@@ -117,7 +117,7 @@ original implementation.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-       2.7574       2.7574       2.7574       2.7574       0.0000              
    
+       2.7350       2.7350       2.7350       2.7350       0.0000              
    
 
 
 
@@ -289,7 +289,7 @@ action involves reordering these two loops.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-       0.8681       0.8681       0.8681       0.8681       0.0000              
    
+       0.8611       0.8611       0.8611       0.8611       0.0000              
    
 
 
 
@@ -417,7 +417,7 @@ from the reduction update via the **decompose_reduction** 
primitive.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-       0.3475       0.3475       0.3475       0.3475       0.0000              
    
+       0.3371       0.3371       0.3371       0.3371       0.0000              
    
 
 
 
diff --git a/docs/_sources/get_started/tutorials/ir_module.rst.txt 
b/docs/_sources/get_started/tutorials/ir_module.rst.txt
index fb11329e28..d6fe97b7f4 100644
--- a/docs/_sources/get_started/tutorials/ir_module.rst.txt
+++ b/docs/_sources/get_started/tutorials/ir_module.rst.txt
@@ -692,8 +692,8 @@ We can deploy the IRModule on CPU by specifying the target 
as ``llvm``.
 
  .. code-block:: none
 
-    [[ 0.13186169  0.03027971  0.25219893 -0.00776528  0.04768787  0.049862
-      -0.00460886  0.12887058  0.11069757  0.07827966]]
+    [[ 0.04623564  0.15965246 -0.10441527 -0.07932861 -0.11549026  0.09914821
+       0.010793   -0.0678812   0.03407104 -0.07796752]]
 
 
 
@@ -759,8 +759,8 @@ Now we can compile the IRModule on GPU, the similar way as 
we did on CPU.
 
  .. code-block:: none
 
-    [[ 0.13186167  0.03027973  0.25219896 -0.00776527  0.04768784  0.049862
-      -0.00460887  0.12887052  0.11069752  0.07827966]]
+    [[ 0.04623562  0.15965243 -0.10441533 -0.07932858 -0.11549021  0.09914818
+       0.01079302 -0.0678813   0.03407101 -0.07796749]]
 
 
 
diff --git a/docs/_sources/get_started/tutorials/quick_start.rst.txt 
b/docs/_sources/get_started/tutorials/quick_start.rst.txt
index 0bb0865577..e46b3b62cd 100644
--- a/docs/_sources/get_started/tutorials/quick_start.rst.txt
+++ b/docs/_sources/get_started/tutorials/quick_start.rst.txt
@@ -224,8 +224,8 @@ different devices.
 
  .. code-block:: none
 
-    [[26306.326 26073.63  26867.973 26758.207 25452.387 26039.125 26349.092
-      26437.273 26610.932 25183.252]]
+    [[24096.695 23445.986 24458.24  25460.695 24985.992 25700.53  24365.977
+      24206.19  23916.527 24693.312]]
 
 
 
diff --git a/docs/_sources/get_started/tutorials/sg_execution_times.rst.txt 
b/docs/_sources/get_started/tutorials/sg_execution_times.rst.txt
index 303eafb4cc..9e4b888281 100644
--- a/docs/_sources/get_started/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/get_started/tutorials/sg_execution_times.rst.txt
@@ -6,7 +6,7 @@
 
 Computation times
 =================
-**00:07.539** total execution time for 2 files **from get_started/tutorials**:
+**00:05.508** total execution time for 2 files **from get_started/tutorials**:
 
 .. container::
 
@@ -33,8 +33,8 @@ Computation times
      - Time
      - Mem (MB)
    * - :ref:`sphx_glr_get_started_tutorials_ir_module.py` (``ir_module.py``)
-     - 00:07.364
+     - 00:05.226
      - 0.0
    * - :ref:`sphx_glr_get_started_tutorials_quick_start.py` 
(``quick_start.py``)
-     - 00:00.175
+     - 00:00.283
      - 0.0
diff --git a/docs/_sources/how_to/tutorials/cross_compilation_and_rpc.rst.txt 
b/docs/_sources/how_to/tutorials/cross_compilation_and_rpc.rst.txt
index 62dff311c0..ad6dac0359 100644
--- a/docs/_sources/how_to/tutorials/cross_compilation_and_rpc.rst.txt
+++ b/docs/_sources/how_to/tutorials/cross_compilation_and_rpc.rst.txt
@@ -268,7 +268,7 @@ device and returns the measured cost. Network overhead is 
excluded.
 
  .. code-block:: none
 
-    1.33e-07 secs/op
+    1.15e-07 secs/op
 
 
 
@@ -651,8 +651,8 @@ This workflow is applicable to various deployment scenarios:
     Converted PyTorch model to Relax:
       - Number of parameters: 4
     Using local target for demonstration
-    Exported library to: /tmp/tmp5u8y539w/model_deployed.so
-    Saved parameters to: /tmp/tmp5u8y539w/model_params.npz
+    Exported library to: /tmp/tmprxhj41dx/model_deployed.so
+    Saved parameters to: /tmp/tmprxhj41dx/model_params.npz
 
     RPC workflow (works for any remote device):
     ==================================================
diff --git a/docs/_sources/how_to/tutorials/customize_opt.rst.txt 
b/docs/_sources/how_to/tutorials/customize_opt.rst.txt
index f69cbe56b4..09258395cd 100644
--- a/docs/_sources/how_to/tutorials/customize_opt.rst.txt
+++ b/docs/_sources/how_to/tutorials/customize_opt.rst.txt
@@ -414,8 +414,8 @@ We can build and deploy the optimized model to the TVM 
runtime.
 
  .. code-block:: none
 
-    [[24151.723 24796.18  25246.785 24885.469 25175.598 25225.684 25009.82
-      25679.812 26206.955 25438.086]]
+    [[25672.938 25102.557 25765.455 27441.629 26252.055 26036.45  25429.342
+      25760.045 26325.924 24135.113]]
 
 
 
diff --git a/docs/_sources/how_to/tutorials/e2e_opt_model.rst.txt 
b/docs/_sources/how_to/tutorials/e2e_opt_model.rst.txt
index 8a8e292084..a2276ef9aa 100644
--- a/docs/_sources/how_to/tutorials/e2e_opt_model.rst.txt
+++ b/docs/_sources/how_to/tutorials/e2e_opt_model.rst.txt
@@ -53,7 +53,7 @@ PyTorch.
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
-       0%|          | 0.00/44.7M [00:00<?, ?B/s]      53%|█████▎    | 
23.6M/44.7M [00:00<00:00, 247MB/s]     100%|██████████| 44.7M/44.7M 
[00:00<00:00, 278MB/s]
+       0%|          | 0.00/44.7M [00:00<?, ?B/s]      58%|█████▊    | 
26.0M/44.7M [00:00<00:00, 272MB/s]     100%|██████████| 44.7M/44.7M 
[00:00<00:00, 299MB/s]
 
 
 
diff --git a/docs/_sources/how_to/tutorials/sg_execution_times.rst.txt 
b/docs/_sources/how_to/tutorials/sg_execution_times.rst.txt
index 1d759bae62..382e998b68 100644
--- a/docs/_sources/how_to/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tutorials/sg_execution_times.rst.txt
@@ -6,7 +6,7 @@
 
 Computation times
 =================
-**00:32.929** total execution time for 5 files **from how_to/tutorials**:
+**00:32.661** total execution time for 5 files **from how_to/tutorials**:
 
 .. container::
 
@@ -33,16 +33,16 @@ Computation times
      - Time
      - Mem (MB)
    * - :ref:`sphx_glr_how_to_tutorials_optimize_llm.py` (``optimize_llm.py``)
-     - 00:30.561
+     - 00:30.689
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_cross_compilation_and_rpc.py` 
(``cross_compilation_and_rpc.py``)
-     - 00:00.912
+     - 00:00.816
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_customize_opt.py` (``customize_opt.py``)
-     - 00:00.756
+     - 00:00.679
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_e2e_opt_model.py` (``e2e_opt_model.py``)
-     - 00:00.697
+     - 00:00.475
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_export_and_load_executable.py` 
(``export_and_load_executable.py``)
      - 00:00.002
diff --git a/docs/_sources/sg_execution_times.rst.txt 
b/docs/_sources/sg_execution_times.rst.txt
index adc52cfe70..ae4d59a1b6 100644
--- a/docs/_sources/sg_execution_times.rst.txt
+++ b/docs/_sources/sg_execution_times.rst.txt
@@ -6,7 +6,7 @@
 
 Computation times
 =================
-**00:41.109** total execution time for 11 files **from all galleries**:
+**00:38.797** total execution time for 11 files **from all galleries**:
 
 .. container::
 
@@ -33,34 +33,34 @@ Computation times
      - Time
      - Mem (MB)
    * - :ref:`sphx_glr_how_to_tutorials_optimize_llm.py` 
(``../how_to/tutorials/optimize_llm.py``)
-     - 00:30.561
+     - 00:30.689
      - 0.0
    * - :ref:`sphx_glr_get_started_tutorials_ir_module.py` 
(``../get_started/tutorials/ir_module.py``)
-     - 00:07.364
+     - 00:05.226
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_cross_compilation_and_rpc.py` 
(``../how_to/tutorials/cross_compilation_and_rpc.py``)
-     - 00:00.912
+     - 00:00.816
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_customize_opt.py` 
(``../how_to/tutorials/customize_opt.py``)
-     - 00:00.756
+     - 00:00.679
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_e2e_opt_model.py` 
(``../how_to/tutorials/e2e_opt_model.py``)
-     - 00:00.697
+     - 00:00.475
      - 0.0
    * - :ref:`sphx_glr_deep_dive_tensor_ir_tutorials_tir_transformation.py` 
(``../deep_dive/tensor_ir/tutorials/tir_transformation.py``)
-     - 00:00.296
+     - 00:00.289
      - 0.0
    * - :ref:`sphx_glr_get_started_tutorials_quick_start.py` 
(``../get_started/tutorials/quick_start.py``)
-     - 00:00.175
+     - 00:00.283
      - 0.0
    * - :ref:`sphx_glr_deep_dive_tensor_ir_tutorials_tir_creation.py` 
(``../deep_dive/tensor_ir/tutorials/tir_creation.py``)
-     - 00:00.174
+     - 00:00.170
      - 0.0
    * - :ref:`sphx_glr_deep_dive_relax_tutorials_relax_creation.py` 
(``../deep_dive/relax/tutorials/relax_creation.py``)
-     - 00:00.109
+     - 00:00.107
      - 0.0
    * - :ref:`sphx_glr_deep_dive_relax_tutorials_relax_transformation.py` 
(``../deep_dive/relax/tutorials/relax_transformation.py``)
-     - 00:00.062
+     - 00:00.063
      - 0.0
    * - :ref:`sphx_glr_how_to_tutorials_export_and_load_executable.py` 
(``../how_to/tutorials/export_and_load_executable.py``)
      - 00:00.002
diff --git a/docs/deep_dive/relax/tutorials/relax_creation.html 
b/docs/deep_dive/relax/tutorials/relax_creation.html
index 760c654e87..ac36c4c44f 100644
--- a/docs/deep_dive/relax/tutorials/relax_creation.html
+++ b/docs/deep_dive/relax/tutorials/relax_creation.html
@@ -192,22 +192,10 @@
 <li class="toctree-l1"><a class="reference internal" 
href="../../../how_to/dev/index.html">Development Guides</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Deep 
Dive</span></p>
-<ul class="current">
+<ul>
 <li class="toctree-l1"><a class="reference internal" 
href="../../../arch/index.html">Design and Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" 
href="../../tensor_ir/index.html">TensorIR</a></li>
-<li class="toctree-l1 current"><a class="reference internal" 
href="../index.html">Relax</a><ul class="current">
-<li class="toctree-l2"><a class="reference internal" 
href="../abstraction.html">Graph Abstraction for ML Models</a></li>
-<li class="toctree-l2"><a class="reference internal" 
href="../learning.html">Understand Relax Abstraction</a></li>
-<li class="toctree-l2 current"><a class="current reference internal" 
href="#">Relax Creation</a><ul>
-<li class="toctree-l3"><a class="reference internal" 
href="#create-relax-programs-using-tvmscript">Create Relax programs using 
TVMScript</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#create-relax-programs-using-nnmodule-api">Create Relax programs using 
NNModule API</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#create-relax-programs-using-block-builder-api">Create Relax programs 
using Block Builder API</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#summary">Summary</a></li>
-</ul>
-</li>
-<li class="toctree-l2"><a class="reference internal" 
href="relax_transformation.html">Transformation</a></li>
-</ul>
-</li>
+<li class="toctree-l1"><a class="reference internal" 
href="../index.html">Relax</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">API 
Reference</span></p>
 <ul>
diff --git a/docs/deep_dive/relax/tutorials/relax_transformation.html 
b/docs/deep_dive/relax/tutorials/relax_transformation.html
index dbe7e49e7b..d1eff89b89 100644
--- a/docs/deep_dive/relax/tutorials/relax_transformation.html
+++ b/docs/deep_dive/relax/tutorials/relax_transformation.html
@@ -192,21 +192,10 @@
 <li class="toctree-l1"><a class="reference internal" 
href="../../../how_to/dev/index.html">Development Guides</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Deep 
Dive</span></p>
-<ul class="current">
+<ul>
 <li class="toctree-l1"><a class="reference internal" 
href="../../../arch/index.html">Design and Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" 
href="../../tensor_ir/index.html">TensorIR</a></li>
-<li class="toctree-l1 current"><a class="reference internal" 
href="../index.html">Relax</a><ul class="current">
-<li class="toctree-l2"><a class="reference internal" 
href="../abstraction.html">Graph Abstraction for ML Models</a></li>
-<li class="toctree-l2"><a class="reference internal" 
href="../learning.html">Understand Relax Abstraction</a></li>
-<li class="toctree-l2"><a class="reference internal" 
href="relax_creation.html">Relax Creation</a></li>
-<li class="toctree-l2 current"><a class="current reference internal" 
href="#">Transformation</a><ul>
-<li class="toctree-l3"><a class="reference internal" 
href="#apply-transformations">Apply transformations</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#custom-passes">Custom Passes</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#summary">Summary</a></li>
-</ul>
-</li>
-</ul>
-</li>
+<li class="toctree-l1"><a class="reference internal" 
href="../index.html">Relax</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">API 
Reference</span></p>
 <ul>
diff --git a/docs/deep_dive/relax/tutorials/sg_execution_times.html 
b/docs/deep_dive/relax/tutorials/sg_execution_times.html
index 7d5a1c4977..300d9d8d73 100644
--- a/docs/deep_dive/relax/tutorials/sg_execution_times.html
+++ b/docs/deep_dive/relax/tutorials/sg_execution_times.html
@@ -294,7 +294,7 @@
             
   <section id="computation-times">
 <span 
id="sphx-glr-deep-dive-relax-tutorials-sg-execution-times"></span><h1>Computation
 times<a class="headerlink" href="#computation-times" title="Link to this 
heading"></a></h1>
-<p><strong>00:00.171</strong> total execution time for 2 files <strong>from 
deep_dive/relax/tutorials</strong>:</p>
+<p><strong>00:00.169</strong> total execution time for 2 files <strong>from 
deep_dive/relax/tutorials</strong>:</p>
 <div class="docutils container">
 <style scoped>
 <link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css" rel="stylesheet" />
@@ -316,11 +316,11 @@ $(document).ready( function () {
 </thead>
 <tbody>
 <tr class="row-even"><td><p><a class="reference internal" 
href="relax_creation.html#sphx-glr-deep-dive-relax-tutorials-relax-creation-py"><span
 class="std std-ref">Relax Creation</span></a> (<code class="docutils literal 
notranslate"><span class="pre">relax_creation.py</span></code>)</p></td>
-<td><p>00:00.109</p></td>
+<td><p>00:00.107</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="relax_transformation.html#sphx-glr-deep-dive-relax-tutorials-relax-transformation-py"><span
 class="std std-ref">Transformation</span></a> (<code class="docutils literal 
notranslate"><span class="pre">relax_transformation.py</span></code>)</p></td>
-<td><p>00:00.062</p></td>
+<td><p>00:00.063</p></td>
 <td><p>0.0</p></td>
 </tr>
 </tbody>
diff --git a/docs/deep_dive/tensor_ir/tutorials/sg_execution_times.html 
b/docs/deep_dive/tensor_ir/tutorials/sg_execution_times.html
index 8b51d4239e..acd4043b2a 100644
--- a/docs/deep_dive/tensor_ir/tutorials/sg_execution_times.html
+++ b/docs/deep_dive/tensor_ir/tutorials/sg_execution_times.html
@@ -294,7 +294,7 @@
             
   <section id="computation-times">
 <span 
id="sphx-glr-deep-dive-tensor-ir-tutorials-sg-execution-times"></span><h1>Computation
 times<a class="headerlink" href="#computation-times" title="Link to this 
heading"></a></h1>
-<p><strong>00:00.469</strong> total execution time for 2 files <strong>from 
deep_dive/tensor_ir/tutorials</strong>:</p>
+<p><strong>00:00.459</strong> total execution time for 2 files <strong>from 
deep_dive/tensor_ir/tutorials</strong>:</p>
 <div class="docutils container">
 <style scoped>
 <link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css" rel="stylesheet" />
@@ -316,11 +316,11 @@ $(document).ready( function () {
 </thead>
 <tbody>
 <tr class="row-even"><td><p><a class="reference internal" 
href="tir_transformation.html#sphx-glr-deep-dive-tensor-ir-tutorials-tir-transformation-py"><span
 class="std std-ref">Transformation</span></a> (<code class="docutils literal 
notranslate"><span class="pre">tir_transformation.py</span></code>)</p></td>
-<td><p>00:00.296</p></td>
+<td><p>00:00.289</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="tir_creation.html#sphx-glr-deep-dive-tensor-ir-tutorials-tir-creation-py"><span
 class="std std-ref">TensorIR Creation</span></a> (<code class="docutils 
literal notranslate"><span class="pre">tir_creation.py</span></code>)</p></td>
-<td><p>00:00.174</p></td>
+<td><p>00:00.170</p></td>
 <td><p>0.0</p></td>
 </tr>
 </tbody>
diff --git a/docs/deep_dive/tensor_ir/tutorials/tir_creation.html 
b/docs/deep_dive/tensor_ir/tutorials/tir_creation.html
index 8474ec6ce7..5eb7a8c3c3 100644
--- a/docs/deep_dive/tensor_ir/tutorials/tir_creation.html
+++ b/docs/deep_dive/tensor_ir/tutorials/tir_creation.html
@@ -193,29 +193,9 @@
 <li class="toctree-l1"><a class="reference internal" 
href="../../../how_to/dev/index.html">Development Guides</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Deep 
Dive</span></p>
-<ul class="current">
+<ul>
 <li class="toctree-l1"><a class="reference internal" 
href="../../../arch/index.html">Design and Architecture</a></li>
-<li class="toctree-l1 current"><a class="reference internal" 
href="../index.html">TensorIR</a><ul class="current">
-<li class="toctree-l2"><a class="reference internal" 
href="../abstraction.html">Tensor Program Abstraction</a></li>
-<li class="toctree-l2"><a class="reference internal" 
href="../learning.html">Understand TensorIR Abstraction</a></li>
-<li class="toctree-l2 current"><a class="current reference internal" 
href="#">TensorIR Creation</a><ul>
-<li class="toctree-l3"><a class="reference internal" 
href="#create-tensorir-using-tvmscript">Create TensorIR using TVMScript</a><ul>
-<li class="toctree-l4"><a class="reference internal" 
href="#standard-format">Standard Format</a></li>
-<li class="toctree-l4"><a class="reference internal" 
href="#concise-with-syntactic-sugar">Concise with Syntactic Sugar</a></li>
-<li class="toctree-l4"><a class="reference internal" 
href="#interactive-with-python-variables">Interactive with Python 
Variables</a></li>
-<li class="toctree-l4"><a class="reference internal" 
href="#tensorir-function-with-dynamic-shapes">TensorIR Function with Dynamic 
Shapes</a></li>
-</ul>
-</li>
-<li class="toctree-l3"><a class="reference internal" 
href="#create-tensorir-using-tensor-expression">Create TensorIR using Tensor 
Expression</a><ul>
-<li class="toctree-l4"><a class="reference internal" 
href="#create-static-shape-functions">Create Static-Shape Functions</a></li>
-<li class="toctree-l4"><a class="reference internal" 
href="#create-dynamic-shape-functions">Create Dynamic-Shape Functions</a></li>
-</ul>
-</li>
-</ul>
-</li>
-<li class="toctree-l2"><a class="reference internal" 
href="tir_transformation.html">Transformation</a></li>
-</ul>
-</li>
+<li class="toctree-l1"><a class="reference internal" 
href="../index.html">TensorIR</a></li>
 <li class="toctree-l1"><a class="reference internal" 
href="../../relax/index.html">Relax</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">API 
Reference</span></p>
@@ -509,17 +489,17 @@ be used to ascertain the shape and data type of a 
TensorIR.</p>
 <span class="nb">print</span><span class="p">(</span><span 
class="n">evaluate_dynamic_shape</span><span class="p">(</span><span 
class="n">dyn_shape_lib</span><span class="p">,</span> <span 
class="n">m</span><span class="o">=</span><span class="mi">64</span><span 
class="p">,</span> <span class="n">n</span><span class="o">=</span><span 
class="mi">64</span><span class="p">,</span> <a 
href="../../../reference/api/python/tir/tir.html#tvm.tir.IterVar" 
title="tvm.tir.IterVar" class="sphx-glr-ba [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[0.66395146 0.9731671  0.5579553  
0.83329487]
- [0.28166202 0.2968866  0.18112361 0.2758509 ]
- [0.29287887 0.59326327 0.31995752 0.4811729 ]
- [0.36032397 0.9437797  0.569279   0.672779  ]]
-[[31.453966 28.067139 29.689894 ... 31.817009 34.22411  30.223434]
- [28.98735  29.64394  29.827055 ... 31.29005  34.07603  29.910816]
- [30.846943 30.170332 31.640345 ... 32.7398   35.731262 30.372322]
+<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[1.1707907  1.1987079  0.90032184 1.230586 
 ]
+ [1.2139356  1.1025238  0.9032923  1.1955839 ]
+ [0.95371306 0.86685586 0.69221765 1.0787495 ]
+ [1.0905532  0.71847934 0.96397567 0.5273048 ]]
+[[30.138514 28.199673 24.100554 ... 30.719109 28.356918 29.140915]
+ [37.550213 33.45395  31.837355 ... 36.413628 32.489414 33.274647]
+ [35.609116 33.061962 31.23119  ... 34.974236 32.11245  30.839909]
  ...
- [33.3841   30.616657 33.785934 ... 34.35256  38.367065 34.010597]
- [33.75541  33.209534 33.53989  ... 37.496735 38.519188 34.53092 ]
- [30.60816  28.78788  30.272537 ... 31.03617  35.37258  29.144073]]
+ [35.50563  35.12555  30.850153 ... 37.291557 32.579887 32.654045]
+ [30.811462 29.73757  29.051355 ... 33.262543 30.94805  29.032318]
+ [35.33563  35.39359  30.650606 ... 37.331318 33.28751  33.48364 ]]
 </pre></div>
 </div>
 </section>
diff --git a/docs/deep_dive/tensor_ir/tutorials/tir_transformation.html 
b/docs/deep_dive/tensor_ir/tutorials/tir_transformation.html
index 9a0a03a71a..0cd95da591 100644
--- a/docs/deep_dive/tensor_ir/tutorials/tir_transformation.html
+++ b/docs/deep_dive/tensor_ir/tutorials/tir_transformation.html
@@ -192,22 +192,9 @@
 <li class="toctree-l1"><a class="reference internal" 
href="../../../how_to/dev/index.html">Development Guides</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">Deep 
Dive</span></p>
-<ul class="current">
+<ul>
 <li class="toctree-l1"><a class="reference internal" 
href="../../../arch/index.html">Design and Architecture</a></li>
-<li class="toctree-l1 current"><a class="reference internal" 
href="../index.html">TensorIR</a><ul class="current">
-<li class="toctree-l2"><a class="reference internal" 
href="../abstraction.html">Tensor Program Abstraction</a></li>
-<li class="toctree-l2"><a class="reference internal" 
href="../learning.html">Understand TensorIR Abstraction</a></li>
-<li class="toctree-l2"><a class="reference internal" 
href="tir_creation.html">TensorIR Creation</a></li>
-<li class="toctree-l2 current"><a class="current reference internal" 
href="#">Transformation</a><ul>
-<li class="toctree-l3"><a class="reference internal" 
href="#initialization-schedule">Initialization Schedule</a></li>
-<li class="toctree-l3"><a class="reference internal" href="#loop-tiling">Loop 
Tiling</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#leverage-localities">Leverage Localities</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#rewrite-reduction">Rewrite Reduction</a></li>
-<li class="toctree-l3"><a class="reference internal" 
href="#trace-the-transformation">Trace the Transformation</a></li>
-</ul>
-</li>
-</ul>
-</li>
+<li class="toctree-l1"><a class="reference internal" 
href="../index.html">TensorIR</a></li>
 <li class="toctree-l1"><a class="reference internal" 
href="../../relax/index.html">Relax</a></li>
 </ul>
 <p class="caption" role="heading"><span class="caption-text">API 
Reference</span></p>
@@ -381,7 +368,7 @@ original implementation.</p>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-   2.7574       2.7574       2.7574       2.7574       0.0000
+   2.7350       2.7350       2.7350       2.7350       0.0000
 </pre></div>
 </div>
 <section id="initialization-schedule">
@@ -477,7 +464,7 @@ class Module:
 
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-   0.8681       0.8681       0.8681       0.8681       0.0000
+   0.8611       0.8611       0.8611       0.8611       0.0000
 </pre></div>
 </div>
 </section>
@@ -571,7 +558,7 @@ class Module:
 
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-   0.3475       0.3475       0.3475       0.3475       0.0000
+   0.3371       0.3371       0.3371       0.3371       0.0000
 </pre></div>
 </div>
 </section>
diff --git a/docs/get_started/tutorials/ir_module.html 
b/docs/get_started/tutorials/ir_module.html
index 6fbb2ab84b..578d0fdf67 100644
--- a/docs/get_started/tutorials/ir_module.html
+++ b/docs/get_started/tutorials/ir_module.html
@@ -803,16 +803,16 @@ backends.</p>
 <p>We can deploy the IRModule on CPU by specifying the target as <code 
class="docutils literal notranslate"><span class="pre">llvm</span></code>.</p>
 <div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">exec</span></a> <span class="o">=</span> <a 
href="../../reference/api/python/driver.html#tvm.compile" title="tvm.compile" 
class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-func [...]
 <span class="n">dev</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">cpu</span><span 
class="p">()</span>
-<a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a> <span 
class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class=" [...]
+<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class [...]
 
 <span class="n">raw_data</span> <span class="o">=</span> <span 
class="n">np</span><span class="o">.</span><span class="n">random</span><span 
class="o">.</span><span class="n">rand</span><span class="p">(</span><span 
class="mi">1</span><span class="p">,</span> <span class="mi">784</span><span 
class="p">)</span><span class="o">.</span><span class="n">astype</span><span 
class="p">(</span><span class="s2">&quot;float32&quot;</span><span 
class="p">)</span>
 <span class="n">data</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">tensor</span><span class="p">(</span><span 
class="n">raw_data</span><span class="p">,</span> <span 
class="n">dev</span><span class="p">)</span>
-<span class="n">cpu_out</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;main&quot;</span><span 
class="p">](</span><span class="n">data</span><span class="p">,</span> <span 
class="o">*</span><a href="https:// [...]
+<span class="n">cpu_out</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;main&quot;</span><span class="p">](</span><span 
class="n">data</span><span class="p">,</span> <span class="o">*</span><a 
href="https://docs.python.org/3/library/stdtypes.html#dict" 
title="builtins.dict" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">params_from_torch</span></a><span class="p">[</ [...]
 <span class="nb">print</span><span class="p">(</span><span 
class="n">cpu_out</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[ 0.13186169  0.03027971  0.25219893 
-0.00776528  0.04768787  0.049862
-  -0.00460886  0.12887058  0.11069757  0.07827966]]
+<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[ 0.04623564  0.15965246 -0.10441527 
-0.07932861 -0.11549026  0.09914821
+   0.010793   -0.0678812   0.03407104 -0.07796752]]
 </pre></div>
 </div>
 </section>
@@ -835,19 +835,19 @@ the details of <code class="docutils literal 
notranslate"><span class="pre">DLig
 <p>Now we can compile the IRModule on GPU, the similar way as we did on 
CPU.</p>
 <div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">exec</span></a> <span class="o">=</span> <a 
href="../../reference/api/python/driver.html#tvm.compile" title="tvm.compile" 
class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-func [...]
 <span class="n">dev</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">device</span><span 
class="p">(</span><span class="s2">&quot;cuda&quot;</span><span 
class="p">,</span> <span class="mi">0</span><span class="p">)</span>
-<a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a> <span 
class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class=" [...]
+<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class [...]
 <span class="c1"># Need to allocate data and params on GPU device</span>
 <span class="n">data</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">tensor</span><span class="p">(</span><span 
class="n">raw_data</span><span class="p">,</span> <span 
class="n">dev</span><span class="p">)</span>
 <a href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">gpu_params</span></a> <span class="o">=</span> <span 
class="p">[</span><span class="n">tvm</span><span class="o">.</span><span 
class="n">runtime</span><span class="o">.</span><span 
class="n">tensor</span><span class="p">(</span><span class="n">p</span><span 
class="p">,</span> <span class="n"> [...]
-<span class="n">gpu_out</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;main&quot;</span><span 
class="p">](</span><span class="n">data</span><span class="p">,</span> <span 
class="o">*</span><a href="https:// [...]
+<span class="n">gpu_out</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;main&quot;</span><span class="p">](</span><span 
class="n">data</span><span class="p">,</span> <span class="o">*</span><a 
href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">gpu_params</span></a><span class="p">)</span><s [...]
 <span class="nb">print</span><span class="p">(</span><span 
class="n">gpu_out</span><span class="p">)</span>
 
 <span class="c1"># Check the correctness of the results</span>
 <span class="k">assert</span> <span class="n">np</span><span 
class="o">.</span><span class="n">allclose</span><span class="p">(</span><span 
class="n">cpu_out</span><span class="p">,</span> <span 
class="n">gpu_out</span><span class="p">,</span> <span 
class="n">atol</span><span class="o">=</span><span class="mf">1e-3</span><span 
class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[ 0.13186167  0.03027973  0.25219896 
-0.00776527  0.04768784  0.049862
-  -0.00460887  0.12887052  0.11069752  0.07827966]]
+<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[ 0.04623562  0.15965243 -0.10441533 
-0.07932858 -0.11549021  0.09914818
+   0.01079302 -0.0678813   0.03407101 -0.07796749]]
 </pre></div>
 </div>
 </section>
diff --git a/docs/get_started/tutorials/quick_start.html 
b/docs/get_started/tutorials/quick_start.html
index e7d8b744f3..1faa671b96 100644
--- a/docs/get_started/tutorials/quick_start.html
+++ b/docs/get_started/tutorials/quick_start.html
@@ -449,16 +449,16 @@ different devices.</p>
 <a href="../../reference/api/python/target.html#tvm.target.Target" 
title="tvm.target.Target" class="sphx-glr-backref-module-tvm-target 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">target</span></a> <span class="o">=</span> <a 
href="../../reference/api/python/target.html#tvm.target.Target" 
title="tvm.target.Target" class="sphx-glr-backref-module-tvm-target 
sphx-glr-backref-type-py-class"><span class="n">tvm</span><span 
class="o">.</span><span class="n">target< [...]
 <a href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">ex</span></a> <span class="o">=</span> <a 
href="../../reference/api/python/driver.html#tvm.compile" title="tvm.compile" 
class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span 
class="n">tvm</span><span class="o">.</span><span class="n">compile</span [...]
 <span class="n">device</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">cpu</span><span 
class="p">()</span>
-<a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a> <span 
class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class=" [...]
+<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class [...]
 <span class="n">data</span> <span class="o">=</span> <span 
class="n">np</span><span class="o">.</span><span class="n">random</span><span 
class="o">.</span><span class="n">rand</span><span class="p">(</span><span 
class="mi">1</span><span class="p">,</span> <span class="mi">784</span><span 
class="p">)</span><span class="o">.</span><span class="n">astype</span><span 
class="p">(</span><span class="s2">&quot;float32&quot;</span><span 
class="p">)</span>
 <span class="n">tvm_data</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">tensor</span><span class="p">(</span><span 
class="n">data</span><span class="p">,</span> <span 
class="n">device</span><span class="o">=</span><span 
class="n">device</span><span class="p">)</span>
 <a href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">params</span></a> <span class="o">=</span> <span 
class="p">[</span><span class="n">np</span><span class="o">.</span><span 
class="n">random</span><span class="o">.</span><span class="n">rand</span><span 
class="p">(</span><span class="o">*</span><span class="n">param</span><span 
class="o">.</sp [...]
 <a href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">params</span></a> <span class="o">=</span> <span 
class="p">[</span><span class="n">tvm</span><span class="o">.</span><span 
class="n">runtime</span><span class="o">.</span><span 
class="n">tensor</span><span class="p">(</span><span 
class="n">param</span><span class="p">,</span> <span class="n"> [...]
-<span class="nb">print</span><span class="p">(</span><a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;forward&quot;</span><span 
class="p">](</span><span class="n">tvm_data</span><span class="p">,</span> 
<span class="o">*</span><a href="http [...]
+<span class="nb">print</span><span class="p">(</span><span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;forward&quot;</span><span class="p">](</span><span 
class="n">tvm_data</span><span class="p">,</span> <span class="o">*</span><a 
href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">params</span></a><span class="p">)</span><s [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[26306.326 26073.63  26867.973 26758.207 
25452.387 26039.125 26349.092
-  26437.273 26610.932 25183.252]]
+<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[24096.695 23445.986 24458.24  25460.695 
24985.992 25700.53  24365.977
+  24206.19  23916.527 24693.312]]
 </pre></div>
 </div>
 <p>Our goal is to bring machine learning to the application with any language 
of interest,
@@ -466,8 +466,8 @@ with the minimum runtime support.</p>
 <ul>
 <li><p>Each function in IRModule becomes a runnable function in the runtime. 
For example in LLM
 cases, we can call <code class="docutils literal notranslate"><span 
class="pre">prefill</span></code> and <code class="docutils literal 
notranslate"><span class="pre">decode</span></code> functions directly.</p>
-<div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><span class="n">prefill_logits</span> <span 
class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;prefill&quot;</span><span 
class="p">](</span> [...]
-<span class="n">decoded_logits</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;decode&quot;</span><span 
class="p">](</span><span class="n">inputs</span><span class="p">,</span> <span 
class="n">weight</span>< [...]
+<div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><span class="n">prefill_logits</span> <span 
class="o">=</span> <span class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;prefill&quot;</span><span class="p">](</span><span 
class="n">inputs</span><span class="p">,</span> <span 
class="n">weight</span><span class="p">,</span> <span 
class="n">kv_cache</span><span class="p">)</span>
+<span class="n">decoded_logits</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;decode&quot;</span><span class="p">](</span><span 
class="n">inputs</span><span class="p">,</span> <span 
class="n">weight</span><span class="p">,</span> <span 
class="n">kv_cache</span><span class="p">)</span>
 </pre></div>
 </div>
 </li>
@@ -482,15 +482,15 @@ copy exchange with existing ecosystem (DLPack exchange 
with PyTorch)</p>
 </li>
 <li><p>TVM runtime works in non-python environments, so it works on settings 
such as mobile</p>
 <div class="highlight-C++ notranslate"><div 
class="highlight"><pre><span></span><span class="c1">// C++ snippet</span>
-<span class="n">runtime</span><span class="o">::</span><span 
class="n">Module</span><span class="w"> </span><a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><a 
href="../../reference/api/python/relax/relax.html#tvm.r [...]
-<a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">.</span><span class="n">GetFunction</span><span 
class="p">(</span><span class="s">&quot;init&quot;</span><span 
class="p">)(...);</span>
-<span class="n">Tensor</span><span class="w"> </span><span 
class="n">out</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">.</span><span class="n">GetFunction</span><span 
class="p">(</span><span [...]
+<span class="n">runtime</span><span class="o">::</span><span 
class="n">Module</span><span class="w"> </span><span class="n">vm</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">ex</span></a><span class="p">.</span><span class="n">GetFunction [...]
+<span class="n">vm</span><span class="p">.</span><span 
class="n">GetFunction</span><span class="p">(</span><span 
class="s">&quot;init&quot;</span><span class="p">)(...);</span>
+<span class="n">Tensor</span><span class="w"> </span><span 
class="n">out</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">vm</span><span class="p">.</span><span 
class="n">GetFunction</span><span class="p">(</span><span 
class="s">&quot;prefill&quot;</span><span class="p">)(</span><span 
class="n">data</span><span class="p">,</span><span class="w"> </span><span 
class="n">weight</span><span class="p">,</span><span class="w"> </span><span 
class="n" [...]
 </pre></div>
 </div>
 <div class="highlight-Java notranslate"><div 
class="highlight"><pre><span></span><span class="c1">// Java snippet</span>
-<span class="n">Module</span><span class="w"> </span><a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class [...]
-<a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">.</span><span class="na">getFunction</span><span 
class="p">(</span><span class="s">&quot;init&quot;</span><span 
class="p">).</span><span class="na">pushArg</span><span 
class="p">(...).</span><span class="na">invoke</span>< [...]
-<span class="n">Tensor</span><span class="w"> </span><span 
class="n">out</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">.</span><span class="na">getFunction</span><span 
class="p">(</span><spa [...]
+<span class="n">Module</span><span class="w"> </span><span 
class="n">vm</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">ex</span></a><span class="p">.</span><span 
class="na">getFunction</span><span class="p">(</span><span class="s">&quot;l 
[...]
+<span class="n">vm</span><span class="p">.</span><span 
class="na">getFunction</span><span class="p">(</span><span 
class="s">&quot;init&quot;</span><span class="p">).</span><span 
class="na">pushArg</span><span class="p">(...).</span><span 
class="na">invoke</span><span class="p">;</span>
+<span class="n">Tensor</span><span class="w"> </span><span 
class="n">out</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">vm</span><span class="p">.</span><span 
class="na">getFunction</span><span class="p">(</span><span 
class="s">&quot;prefill&quot;</span><span class="p">).</span><span 
class="na">pushArg</span><span class="p">(</span><span 
class="n">data</span><span class="p">).</span><span 
class="na">pushArg</span><span class="p">(</span><spa [...]
 </pre></div>
 </div>
 </li>
diff --git a/docs/get_started/tutorials/sg_execution_times.html 
b/docs/get_started/tutorials/sg_execution_times.html
index f816b0f553..184692bc9f 100644
--- a/docs/get_started/tutorials/sg_execution_times.html
+++ b/docs/get_started/tutorials/sg_execution_times.html
@@ -294,7 +294,7 @@
             
   <section id="computation-times">
 <span 
id="sphx-glr-get-started-tutorials-sg-execution-times"></span><h1>Computation 
times<a class="headerlink" href="#computation-times" title="Link to this 
heading"></a></h1>
-<p><strong>00:07.539</strong> total execution time for 2 files <strong>from 
get_started/tutorials</strong>:</p>
+<p><strong>00:05.508</strong> total execution time for 2 files <strong>from 
get_started/tutorials</strong>:</p>
 <div class="docutils container">
 <style scoped>
 <link 
href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css"
 rel="stylesheet" />
@@ -316,11 +316,11 @@ $(document).ready( function () {
 </thead>
 <tbody>
 <tr class="row-even"><td><p><a class="reference internal" 
href="ir_module.html#sphx-glr-get-started-tutorials-ir-module-py"><span 
class="std std-ref">IRModule</span></a> (<code class="docutils literal 
notranslate"><span class="pre">ir_module.py</span></code>)</p></td>
-<td><p>00:07.364</p></td>
+<td><p>00:05.226</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="quick_start.html#sphx-glr-get-started-tutorials-quick-start-py"><span 
class="std std-ref">Quick Start</span></a> (<code class="docutils literal 
notranslate"><span class="pre">quick_start.py</span></code>)</p></td>
-<td><p>00:00.175</p></td>
+<td><p>00:00.283</p></td>
 <td><p>0.0</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/tutorials/cross_compilation_and_rpc.html 
b/docs/how_to/tutorials/cross_compilation_and_rpc.html
index f17cb5894e..d5888940dc 100644
--- a/docs/how_to/tutorials/cross_compilation_and_rpc.html
+++ b/docs/how_to/tutorials/cross_compilation_and_rpc.html
@@ -473,7 +473,7 @@ device and returns the measured cost. Network overhead is 
excluded.</p>
 <span class="nb">print</span><span class="p">(</span><span 
class="s2">&quot;</span><span class="si">%g</span><span class="s2"> 
secs/op&quot;</span> <span class="o">%</span> <span class="n">cost</span><span 
class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>1.33e-07 secs/op
+<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>1.15e-07 secs/op
 </pre></div>
 </div>
 </section>
@@ -822,8 +822,8 @@ for ONNX models. Simply replace <code class="docutils 
literal notranslate"><span
 <div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>Converted PyTorch model to Relax:
   - Number of parameters: 4
 Using local target for demonstration
-Exported library to: /tmp/tmp5u8y539w/model_deployed.so
-Saved parameters to: /tmp/tmp5u8y539w/model_params.npz
+Exported library to: /tmp/tmprxhj41dx/model_deployed.so
+Saved parameters to: /tmp/tmprxhj41dx/model_params.npz
 
 RPC workflow (works for any remote device):
 ==================================================
diff --git a/docs/how_to/tutorials/customize_opt.html 
b/docs/how_to/tutorials/customize_opt.html
index d82bbd307b..b491e4da1d 100644
--- a/docs/how_to/tutorials/customize_opt.html
+++ b/docs/how_to/tutorials/customize_opt.html
@@ -598,16 +598,16 @@ pushing the performance to the limit. The current 
optimization may not be the be
 <p>We can build and deploy the optimized model to the TVM runtime.</p>
 <div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">ex</span></a> <span class="o">=</span> <a 
href="../../reference/api/python/driver.html#tvm.compile" title="tvm.compile" 
class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-functi [...]
 <span class="n">dev</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">device</span><span 
class="p">(</span><span class="s2">&quot;cuda&quot;</span><span 
class="p">,</span> <span class="mi">0</span><span class="p">)</span>
-<a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a> <span 
class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class=" [...]
+<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class [...]
 <span class="c1"># Need to allocate data and params on GPU device</span>
 <span class="n">data</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">tensor</span><span class="p">(</span><span 
class="n">np</span><span class="o">.</span><span class="n">random</span><span 
class="o">.</span><span class="n">rand</span><span class="p">(</span><span 
class="o">*</span><a 
href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-ba [...]
 <a href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">gpu_params</span></a> <span class="o">=</span> <span 
class="p">[</span><span class="n">tvm</span><span class="o">.</span><span 
class="n">runtime</span><span class="o">.</span><span 
class="n">tensor</span><span class="p">(</span><span class="n">np</span><span 
class="o">.</span><span class="n"> [...]
-<span class="n">gpu_out</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;forward&quot;</span><span 
class="p">](</span><span class="n">data</span><span class="p">,</span> <span 
class="o">*</span><a href="https [...]
+<span class="n">gpu_out</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;forward&quot;</span><span class="p">](</span><span 
class="n">data</span><span class="p">,</span> <span class="o">*</span><a 
href="https://docs.python.org/3/library/stdtypes.html#list" 
title="builtins.list" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">gpu_params</span></a><span class="p">)</span [...]
 <span class="nb">print</span><span class="p">(</span><span 
class="n">gpu_out</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[24151.723 24796.18  25246.785 24885.469 
25175.598 25225.684 25009.82
-  25679.812 26206.955 25438.086]]
+<div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>[[25672.938 25102.557 25765.455 27441.629 
26252.055 26036.45  25429.342
+  25760.045 26325.924 24135.113]]
 </pre></div>
 </div>
 </section>
diff --git a/docs/how_to/tutorials/e2e_opt_model.html 
b/docs/how_to/tutorials/e2e_opt_model.html
index de1ab95fa5..7512be4c09 100644
--- a/docs/how_to/tutorials/e2e_opt_model.html
+++ b/docs/how_to/tutorials/e2e_opt_model.html
@@ -328,8 +328,8 @@ PyTorch.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div 
class="highlight"><pre><span></span>Downloading: 
&quot;https://download.pytorch.org/models/resnet18-f37072fd.pth&quot; to 
/workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
 
   0%|          | 0.00/44.7M [00:00&lt;?, ?B/s]
- 53%|█████▎    | 23.6M/44.7M [00:00&lt;00:00, 247MB/s]
-100%|██████████| 44.7M/44.7M [00:00&lt;00:00, 278MB/s]
+ 58%|█████▊    | 26.0M/44.7M [00:00&lt;00:00, 272MB/s]
+100%|██████████| 44.7M/44.7M [00:00&lt;00:00, 299MB/s]
 </pre></div>
 </div>
 </section>
@@ -430,7 +430,7 @@ We skip this step in the CI environment.</p>
         <span class="n">mod</span> <span class="o">=</span> <a 
href="../../reference/api/python/tir/transform.html#tvm.tir.transform.DefaultGPUSchedule"
 title="tvm.tir.transform.DefaultGPUSchedule" 
class="sphx-glr-backref-module-tvm-tir-transform 
sphx-glr-backref-type-py-function"><span class="n">tvm</span><span 
class="o">.</span><span class="n">tir</span><span class="o">.</span><span 
class="n">transform</span><span class="o">.</span><span 
class="n">DefaultGPUSchedule</span></a><span cla [...]
     <span class="n">ex</span> <span class="o">=</span> <a 
href="../../reference/api/python/driver.html#tvm.compile" title="tvm.compile" 
class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span 
class="n">tvm</span><span class="o">.</span><span 
class="n">compile</span></a><span class="p">(</span><span 
class="n">mod</span><span class="p">,</span> <a 
href="../../reference/api/python/target.html#tvm.target.Target" 
title="tvm.target.Target" class="sphx-glr-backref-module-tvm [...]
     <span class="n">dev</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">device</span><span 
class="p">(</span><span class="s2">&quot;cuda&quot;</span><span 
class="p">,</span> <span class="mi">0</span><span class="p">)</span>
-    <span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class="n">relax</span><span 
class="o">.</span><span class="n">VirtualMachine</span></a><span 
class="p">(</span><span class="n">ex</span><span class="p">,</span> <span 
class="n">dev</span><span class="p">)</span>
+    <span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><span 
class="n">ex</span><span class="p">,</span> <span class="n">dev</span><span 
class="p">)</span>
     <span class="c1"># Need to allocate data and params on GPU device</span>
     <span class="n">gpu_data</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">tensor</span><span class="p">(</span><span 
class="n">np</span><span class="o">.</span><span class="n">random</span><span 
class="o">.</span><span class="n">rand</span><span class="p">(</span><span 
class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span 
class="p">,</span> <span class="mi">224< [...]
     <span class="n">gpu_params</span> <span class="o">=</span> <span 
class="p">[</span><span class="n">tvm</span><span class="o">.</span><span 
class="n">runtime</span><span class="o">.</span><span 
class="n">tensor</span><span class="p">(</span><span class="n">p</span><span 
class="p">,</span> <span class="n">dev</span><span class="p">)</span> <span 
class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span 
class="n">params</span><span class="p">[</span><span class="s2" [...]
diff --git a/docs/how_to/tutorials/export_and_load_executable.html 
b/docs/how_to/tutorials/export_and_load_executable.html
index 380d96df46..df0a774356 100644
--- a/docs/how_to/tutorials/export_and_load_executable.html
+++ b/docs/how_to/tutorials/export_and_load_executable.html
@@ -441,7 +441,7 @@ runtime module directly.</p>
 <div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><span class="k">if</span> <a 
href="https://docs.python.org/3/library/functions.html#bool" 
title="builtins.bool" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">RUN_EXAMPLE</span></a><span class="p">:</span>
     <span class="n">loaded_rt_mod</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">load_module</span><span 
class="p">(</span><span class="nb">str</span><span class="p">(</span><span 
class="n">library_path</span><span class="p">))</span>
     <span class="n">dev</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">cpu</span><span 
class="p">(</span><span class="mi">0</span><span class="p">)</span>
-    <span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class="n">relax</span><span 
class="o">.</span><span class="n">VirtualMachine</span></a><span 
class="p">(</span><span class="n">loaded_rt_mod</span><span class="p">,</span> 
<span class="n">dev</span><span class="p">)</span>
+    <span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><span 
class="n">loaded_rt_mod</span><span class="p">,</span> <span 
class="n">dev</span><span class="p">)</span>
 
     <span class="c1"># Prepare input data</span>
     <span class="n">input_tensor</span> <span class="o">=</span> <span 
class="n">torch</span><span class="o">.</span><span class="n">randn</span><span 
class="p">(</span><span class="mi">1</span><span class="p">,</span> <span 
class="mi">1</span><span class="p">,</span> <span class="mi">28</span><span 
class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span 
class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span 
class="o">.</span><span class="n">f [...]
@@ -522,7 +522,7 @@ of how to reload and run the model. Save this as <code 
class="docutils literal n
 
 <span class="c1"># Step 2: Create Virtual Machine</span>
 <span class="n">device</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">cpu</span><span 
class="p">(</span><span class="mi">0</span><span class="p">)</span>
-<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class="n">relax</span><span 
class="o">.</span><span class="n">VirtualMachine</span></a><span 
class="p">(</span><span class="n">lib</span><span class="p">,</span> <span 
class="n">device</span><span class="p">)</span>
+<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><span 
class="n">lib</span><span class="p">,</span> <span class="n">device</span><span 
class="p">)</span>
 
 <span class="c1"># Step 3: Load parameters from the .npz file</span>
 <span class="n">params_npz</span> <span class="o">=</span> <span 
class="n">np</span><span class="o">.</span><span class="n">load</span><span 
class="p">(</span><span 
class="s2">&quot;relax_export_artifacts/model_params.npz&quot;</span><span 
class="p">)</span>
@@ -624,7 +624,7 @@ for a comprehensive guide on:</p>
 
 <span class="c1"># Step 4: Load and run on remote device</span>
 <span class="n">lib</span> <span class="o">=</span> <span 
class="n">remote</span><span class="o">.</span><span 
class="n">load_module</span><span class="p">(</span><span 
class="s2">&quot;mlp_arm.so&quot;</span><span class="p">)</span>
-<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span class="n">relax</span><span 
class="o">.</span><span class="n">VirtualMachine</span></a><span 
class="p">(</span><span class="n">lib</span><span class="p">,</span> <span 
class="n">remote</span><span class="o">.</span><span class="n">cpu</ [...]
+<span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><span 
class="n">lib</span><span class="p">,</span> <span class="n">remote</span><span 
class="o">.</span><span cla [...]
 <span class="c1"># ... prepare input and params, then run inference</span>
 </pre></div>
 </div>
diff --git a/docs/how_to/tutorials/optimize_llm.html 
b/docs/how_to/tutorials/optimize_llm.html
index fe8dac81ff..2140fedf12 100644
--- a/docs/how_to/tutorials/optimize_llm.html
+++ b/docs/how_to/tutorials/optimize_llm.html
@@ -725,7 +725,7 @@ is designed specifically for the LLMs.</p>
 
 <span class="k">with</span> <a 
href="../../reference/api/python/target.html#tvm.target.Target" 
title="tvm.target.Target" class="sphx-glr-backref-module-tvm-target 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">target</span></a><span class="p">:</span>
     <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">ex</span></a> <span class="o">=</span> <a 
href="../../reference/api/python/driver.html#tvm.compile" title="tvm.compile" 
class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span 
class="n">tvm</span><span class="o">.</span><span class="n">compile</ [...]
-    <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a> <span 
class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm 
sphx-glr-backref-type-py-class"><span cla [...]
+    <span class="n">vm</span> <span class="o">=</span> <a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VirtualMachine" 
title="tvm.relax.VirtualMachine" class="sphx-glr-backref-module-tvm-relax 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">relax</span><span class="o">.</span><span 
class="n">VirtualMachine</span></a><span class="p">(</span><a 
href="../../reference/api/python/relax/relax.html#tvm.relax.VMExecutable" 
title="tvm.relax.VMExecutable" c [...]
 </pre></div>
 </div>
 </section>
@@ -823,7 +823,7 @@ the model documentation for the correct tokenization and 
prompt format.</p>
 key and value tensors for the attention layer. Apache TVM provides a 
PagedKVCache to store the
 key and value tensors. We create the PagedKVCache with the specified 
parameters.</p>
 <div class="highlight-Python notranslate"><div 
class="highlight"><pre><span></span><span class="k">if</span> <span 
class="ow">not</span> <a 
href="https://docs.python.org/3/library/functions.html#bool" 
title="builtins.bool" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span 
class="n">IS_IN_CI</span></a><span class="p">:</span>
-    <span class="n">kv_cache</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span 
class="s2">&quot;create_tir_paged_kv_cache&quot;</span><span class="p">](</span>
+    <span class="n">kv_cache</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;create_tir_paged_kv_cache&quot;</span><span class="p">](</span>
         <a href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class"><span class="n">ShapeTuple</span></a><span 
class="p">([</span><span class="mi">1</span><span class="p">]),</span>  <span 
class="c1"># max_batch_size=1</span>
         <a href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class"><span class="n">ShapeTuple</span></a><span 
class="p">([</span><span class="mi">2048</span><span class="p">]),</span>  
<span class="c1"># max_total_seq_len=2048</span>
         <a href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class"><span class="n">ShapeTuple</span></a><span 
class="p">([</span><span class="mi">2048</span><span class="p">]),</span>  
<span class="c1"># prefill_chunk_size=2048</span>
@@ -840,7 +840,7 @@ compiled in the Relax IRModule to embed the tokens into the 
hidden states.</p>
 
 
 <span class="k">def</span><span class="w"> </span><span 
class="nf">embed</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">,</span> <span 
class="n">params</span><span class="p">):</span>
-    <span class="n">_embed</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;embed&quot;</span><span 
class="p">](</span><span class="n">tokens</span><span class="p">,</span> <span 
class="n">params</span><span  [...]
+    <span class="n">_embed</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;embed&quot;</span><span class="p">](</span><span 
class="n">tokens</span><span class="p">,</span> <span 
class="n">params</span><span class="p">)</span>
     <span class="c1"># Reshape hidden from [seq_len, hidden_size] to [1, 
seq_len, hidden_size]</span>
     <span class="n">_embed</span> <span class="o">=</span> <span 
class="n">nd_view_func</span><span class="p">(</span><span 
class="n">_embed</span><span class="p">,</span> <a 
href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class"><span class="n">ShapeTuple</span></a><span 
class="p">([</span><span class="mi">1</span><span class="p">,</span> <span 
class="n">_embed</span><span class="o">.</s [...]
     <span class="k">return</span> <span class="n">_embed</span>
@@ -863,7 +863,7 @@ and <cite>end_forward_func</cite> to end the forward 
pass.</p>
     <span class="n">add_sequence_func</span><span class="p">(</span><span 
class="n">kv_cache</span><span class="p">,</span> <span 
class="n">seq_id</span><span class="p">)</span>
     <span class="n">hidden_states</span> <span class="o">=</span> <span 
class="n">embed</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">,</span> <span 
class="n">params</span><span class="p">)</span>
     <span class="n">begin_forward_func</span><span class="p">(</span><span 
class="n">kv_cache</span><span class="p">,</span> <a 
href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class"><span class="n">ShapeTuple</span></a><span 
class="p">([</span><span class="n">seq_id</span><span class="p">]),</span> <a 
href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" cla [...]
-    <span class="n">logits</span><span class="p">,</span> <span 
class="n">kv_cache</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;prefill&quot;</span><span 
class="p">](</span><span class="n">hidden_states</ [...]
+    <span class="n">logits</span><span class="p">,</span> <span 
class="n">kv_cache</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;prefill&quot;</span><span class="p">](</span><span 
class="n">hidden_states</span><span class="p">,</span> <span 
class="n">kv_cache</span><span class="p">,</span> <span 
class="n">params</span><span class="p">)</span>
     <span class="n">end_forward_func</span><span class="p">(</span><span 
class="n">kv_cache</span><span class="p">)</span>
 </pre></div>
 </div>
@@ -895,7 +895,7 @@ IRModule to generate the token.</p>
         <span class="n">tokens</span> <span class="o">=</span> <span 
class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span 
class="o">.</span><span class="n">tensor</span><span class="p">(</span><span 
class="n">np</span><span class="o">.</span><span class="n">array</span><span 
class="p">([</span><span class="n">last_token</span><span 
class="p">])</span><span class="o">.</span><span class="n">astype</span><span 
class="p">(</span><span class="s2">&quot;int32&quot;< [...]
         <span class="n">hidden_states</span> <span class="o">=</span> <span 
class="n">embed</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">,</span> <span 
class="n">params</span><span class="p">)</span>
         <span class="n">begin_forward_func</span><span class="p">(</span><span 
class="n">kv_cache</span><span class="p">,</span> <a 
href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" class="sphx-glr-backref-module-builtins 
sphx-glr-backref-type-py-class"><span class="n">ShapeTuple</span></a><span 
class="p">([</span><span class="n">seq_id</span><span class="p">]),</span> <a 
href="https://docs.python.org/3/library/stdtypes.html#tuple" 
title="builtins.tuple" [...]
-        <span class="n">logits</span><span class="p">,</span> <span 
class="n">kv_cache</span> <span class="o">=</span> <a 
href="../../reference/api/python/runtime/vm.html#tvm.runtime.vm.VirtualMachine" 
title="tvm.runtime.vm.VirtualMachine" 
class="sphx-glr-backref-module-tvm-runtime-vm sphx-glr-backref-type-py-class 
sphx-glr-backref-instance"><span class="n">vm</span></a><span 
class="p">[</span><span class="s2">&quot;decode&quot;</span><span 
class="p">](</span><span class="n">hidden_state [...]
+        <span class="n">logits</span><span class="p">,</span> <span 
class="n">kv_cache</span> <span class="o">=</span> <span 
class="n">vm</span><span class="p">[</span><span 
class="s2">&quot;decode&quot;</span><span class="p">](</span><span 
class="n">hidden_states</span><span class="p">,</span> <span 
class="n">kv_cache</span><span class="p">,</span> <span 
class="n">params</span><span class="p">)</span>
 
         <span class="n">end_forward_func</span><span class="p">(</span><span 
class="n">kv_cache</span><span class="p">)</span>
         <span class="n">last_token</span> <span class="o">=</span> <span 
class="n">sample_token</span><span class="p">(</span><span 
class="n">logits</span><span class="p">)</span>
diff --git a/docs/how_to/tutorials/sg_execution_times.html 
b/docs/how_to/tutorials/sg_execution_times.html
index 15cc6cc0ac..b71ad08678 100644
--- a/docs/how_to/tutorials/sg_execution_times.html
+++ b/docs/how_to/tutorials/sg_execution_times.html
@@ -294,7 +294,7 @@
             
   <section id="computation-times">
 <span id="sphx-glr-how-to-tutorials-sg-execution-times"></span><h1>Computation 
times<a class="headerlink" href="#computation-times" title="Link to this 
heading"></a></h1>
-<p><strong>00:32.929</strong> total execution time for 5 files <strong>from 
how_to/tutorials</strong>:</p>
+<p><strong>00:32.661</strong> total execution time for 5 files <strong>from 
how_to/tutorials</strong>:</p>
 <div class="docutils container">
 <style scoped>
 <link 
href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css"
 rel="stylesheet" />
@@ -316,19 +316,19 @@ $(document).ready( function () {
 </thead>
 <tbody>
 <tr class="row-even"><td><p><a class="reference internal" 
href="optimize_llm.html#sphx-glr-how-to-tutorials-optimize-llm-py"><span 
class="std std-ref">Optimize Large Language Model</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">optimize_llm.py</span></code>)</p></td>
-<td><p>00:30.561</p></td>
+<td><p>00:30.689</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="cross_compilation_and_rpc.html#sphx-glr-how-to-tutorials-cross-compilation-and-rpc-py"><span
 class="std std-ref">Cross Compilation and RPC</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">cross_compilation_and_rpc.py</span></code>)</p></td>
-<td><p>00:00.912</p></td>
+<td><p>00:00.816</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="customize_opt.html#sphx-glr-how-to-tutorials-customize-opt-py"><span 
class="std std-ref">Customize Optimization</span></a> (<code class="docutils 
literal notranslate"><span class="pre">customize_opt.py</span></code>)</p></td>
-<td><p>00:00.756</p></td>
+<td><p>00:00.679</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="e2e_opt_model.html#sphx-glr-how-to-tutorials-e2e-opt-model-py"><span 
class="std std-ref">End-to-End Optimize Model</span></a> (<code class="docutils 
literal notranslate"><span class="pre">e2e_opt_model.py</span></code>)</p></td>
-<td><p>00:00.697</p></td>
+<td><p>00:00.475</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="export_and_load_executable.html#sphx-glr-how-to-tutorials-export-and-load-executable-py"><span
 class="std std-ref">Export and Load Relax Executables</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">export_and_load_executable.py</span></code>)</p></td>
diff --git a/docs/objects.inv b/docs/objects.inv
index 26758ecb83..48f1b1cff1 100644
Binary files a/docs/objects.inv and b/docs/objects.inv differ
diff --git a/docs/reference/api/python/runtime/vm.html 
b/docs/reference/api/python/runtime/vm.html
index 866aa7d24a..3d60479ce8 100644
--- a/docs/reference/api/python/runtime/vm.html
+++ b/docs/reference/api/python/runtime/vm.html
@@ -490,7 +490,7 @@ more details.</p>
 <div class="admonition seealso">
 <p class="admonition-title">See also</p>
 <dl class="simple">
-<dt><a class="reference internal" 
href="#tvm.runtime.vm.VMInstrumentReturnKind" 
title="tvm.runtime.vm.VMInstrumentReturnKind"><code class="xref py py-obj 
docutils literal notranslate"><span 
class="pre">VMInstrumentReturnKind</span></code></a></dt><dd><p>the possible 
return values in VM.</p>
+<dt><a class="reference internal" 
href="../relax/relax.html#tvm.relax.VMInstrumentReturnKind" 
title="tvm.runtime.vm.VMInstrumentReturnKind"><code class="xref py py-obj 
docutils literal notranslate"><span 
class="pre">VMInstrumentReturnKind</span></code></a></dt><dd><p>the possible 
return values in VM.</p>
 </dd>
 </dl>
 </div>
diff --git a/docs/searchindex.js b/docs/searchindex.js
index 98a37abe8d..68485b3885 100644
--- a/docs/searchindex.js
+++ b/docs/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Cross Compile TVM Runtime": [[40, 
"cross-compile-tvm-runtime"]], "1. The lack of numpy on device machine caused 
the RPC server can\u2019t be launched.": [[40, 
"the-lack-of-numpy-on-device-machine-caused-the-rpc-server-can-t-be-launched"]],
 "2. Pack and Deploy to Device Machine": [[40, 
"pack-and-deploy-to-device-machine"]], "2. The lack of cloudpickle on device 
machine caused the RPC server can\u2019t be launched.": [[40, 
"the-lack-of-cloudpickle-on-devi [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Cross Compile TVM Runtime": [[40, 
"cross-compile-tvm-runtime"]], "1. The lack of numpy on device machine caused 
the RPC server can\u2019t be launched.": [[40, 
"the-lack-of-numpy-on-device-machine-caused-the-rpc-server-can-t-be-launched"]],
 "2. Pack and Deploy to Device Machine": [[40, 
"pack-and-deploy-to-device-machine"]], "2. The lack of cloudpickle on device 
machine caused the RPC server can\u2019t be launched.": [[40, 
"the-lack-of-cloudpickle-on-devi [...]
\ No newline at end of file
diff --git a/docs/sg_execution_times.html b/docs/sg_execution_times.html
index 325a48eec3..f6c0fb2228 100644
--- a/docs/sg_execution_times.html
+++ b/docs/sg_execution_times.html
@@ -294,7 +294,7 @@
             
   <section id="computation-times">
 <span id="sphx-glr-sg-execution-times"></span><h1>Computation times<a 
class="headerlink" href="#computation-times" title="Link to this 
heading"></a></h1>
-<p><strong>00:41.109</strong> total execution time for 11 files <strong>from 
all galleries</strong>:</p>
+<p><strong>00:38.797</strong> total execution time for 11 files <strong>from 
all galleries</strong>:</p>
 <div class="docutils container">
 <style scoped>
 <link 
href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css"
 rel="stylesheet" />
@@ -316,43 +316,43 @@ $(document).ready( function () {
 </thead>
 <tbody>
 <tr class="row-even"><td><p><a class="reference internal" 
href="how_to/tutorials/optimize_llm.html#sphx-glr-how-to-tutorials-optimize-llm-py"><span
 class="std std-ref">Optimize Large Language Model</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">../how_to/tutorials/optimize_llm.py</span></code>)</p></td>
-<td><p>00:30.561</p></td>
+<td><p>00:30.689</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="get_started/tutorials/ir_module.html#sphx-glr-get-started-tutorials-ir-module-py"><span
 class="std std-ref">IRModule</span></a> (<code class="docutils literal 
notranslate"><span 
class="pre">../get_started/tutorials/ir_module.py</span></code>)</p></td>
-<td><p>00:07.364</p></td>
+<td><p>00:05.226</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="how_to/tutorials/cross_compilation_and_rpc.html#sphx-glr-how-to-tutorials-cross-compilation-and-rpc-py"><span
 class="std std-ref">Cross Compilation and RPC</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">../how_to/tutorials/cross_compilation_and_rpc.py</span></code>)</p></td>
-<td><p>00:00.912</p></td>
+<td><p>00:00.816</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="how_to/tutorials/customize_opt.html#sphx-glr-how-to-tutorials-customize-opt-py"><span
 class="std std-ref">Customize Optimization</span></a> (<code class="docutils 
literal notranslate"><span 
class="pre">../how_to/tutorials/customize_opt.py</span></code>)</p></td>
-<td><p>00:00.756</p></td>
+<td><p>00:00.679</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="how_to/tutorials/e2e_opt_model.html#sphx-glr-how-to-tutorials-e2e-opt-model-py"><span
 class="std std-ref">End-to-End Optimize Model</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">../how_to/tutorials/e2e_opt_model.py</span></code>)</p></td>
-<td><p>00:00.697</p></td>
+<td><p>00:00.475</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="deep_dive/tensor_ir/tutorials/tir_transformation.html#sphx-glr-deep-dive-tensor-ir-tutorials-tir-transformation-py"><span
 class="std std-ref">Transformation</span></a> (<code class="docutils literal 
notranslate"><span 
class="pre">../deep_dive/tensor_ir/tutorials/tir_transformation.py</span></code>)</p></td>
-<td><p>00:00.296</p></td>
+<td><p>00:00.289</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="get_started/tutorials/quick_start.html#sphx-glr-get-started-tutorials-quick-start-py"><span
 class="std std-ref">Quick Start</span></a> (<code class="docutils literal 
notranslate"><span 
class="pre">../get_started/tutorials/quick_start.py</span></code>)</p></td>
-<td><p>00:00.175</p></td>
+<td><p>00:00.283</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="deep_dive/tensor_ir/tutorials/tir_creation.html#sphx-glr-deep-dive-tensor-ir-tutorials-tir-creation-py"><span
 class="std std-ref">TensorIR Creation</span></a> (<code class="docutils 
literal notranslate"><span 
class="pre">../deep_dive/tensor_ir/tutorials/tir_creation.py</span></code>)</p></td>
-<td><p>00:00.174</p></td>
+<td><p>00:00.170</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="deep_dive/relax/tutorials/relax_creation.html#sphx-glr-deep-dive-relax-tutorials-relax-creation-py"><span
 class="std std-ref">Relax Creation</span></a> (<code class="docutils literal 
notranslate"><span 
class="pre">../deep_dive/relax/tutorials/relax_creation.py</span></code>)</p></td>
-<td><p>00:00.109</p></td>
+<td><p>00:00.107</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" 
href="deep_dive/relax/tutorials/relax_transformation.html#sphx-glr-deep-dive-relax-tutorials-relax-transformation-py"><span
 class="std std-ref">Transformation</span></a> (<code class="docutils literal 
notranslate"><span 
class="pre">../deep_dive/relax/tutorials/relax_transformation.py</span></code>)</p></td>
-<td><p>00:00.062</p></td>
+<td><p>00:00.063</p></td>
 <td><p>0.0</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" 
href="how_to/tutorials/export_and_load_executable.html#sphx-glr-how-to-tutorials-export-and-load-executable-py"><span
 class="std std-ref">Export and Load Relax Executables</span></a> (<code 
class="docutils literal notranslate"><span 
class="pre">../how_to/tutorials/export_and_load_executable.py</span></code>)</p></td>
