[tvm] branch main updated: [BugFix][Ansor] Fixing Ansor Gradient Bug (#16739)

2024-04-01 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new ffa9cfd0dd [BugFix][Ansor] Fixing Ansor Gradient Bug (#16739)
ffa9cfd0dd is described below

commit ffa9cfd0dd096000d356103c6c4df9cfd2e226e2
Author: Thais Camacho 
AuthorDate: Mon Apr 1 04:37:33 2024 -0300

[BugFix][Ansor] Fixing Ansor Gradient Bug (#16739)

* Fixing ansor gradient bug

* Changing to dead_task

* Applying reviews
---
 python/tvm/auto_scheduler/task_scheduler.py | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/python/tvm/auto_scheduler/task_scheduler.py b/python/tvm/auto_scheduler/task_scheduler.py
index 547e5a5833..58457daad0 100644
--- a/python/tvm/auto_scheduler/task_scheduler.py
+++ b/python/tvm/auto_scheduler/task_scheduler.py
@@ -358,6 +358,11 @@ class TaskScheduler:
             self.best_ct = self.ct
             self.best_score = self.cur_score

+        # put tasks that still have no valid schedule after warm-up into the dead state
+        for task_idx, cost in enumerate(self.best_costs):
+            if cost == 1e10:
+                self.dead_tasks.add(task_idx)
+
         # use the specific strategy to choose workload to tune
         task_idx = -1
         while self.ct < tune_option.num_measure_trials and len(self.dead_tasks) < len(self.tasks):
@@ -367,6 +372,7 @@ class TaskScheduler:
                 task_idx = (task_idx + 1) % len(self.tasks)
             elif self.strategy == "gradient":
                 gradients = []
+
                 for i in range(len(self.tasks)):
                     if i in self.dead_tasks:
                         gradients.append(0)
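For context, a minimal sketch of driving the gradient strategy that this fix hardens; the workload and trial budget are illustrative, not taken from the commit:

```python
import tvm
from tvm import auto_scheduler, te

@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

task = auto_scheduler.SearchTask(func=matmul, args=(128, 128, 128), target="llvm")
# Tasks whose best cost is still the 1e10 placeholder after the warm-up round
# are now moved to dead_tasks instead of feeding bogus values into the
# gradient computation.
tuner = auto_scheduler.TaskScheduler([task], strategy="gradient")
tuner.tune(auto_scheduler.TuningOptions(num_measure_trials=64))
```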



[tvm] branch main updated: [Codegen] Fix CUDA codegen for int64 Ramp (#13382)

2022-11-14 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 3aa16f72dd [Codegen] Fix CUDA codegen for int64 Ramp (#13382)
3aa16f72dd is described below

commit 3aa16f72dd3f1807b11ec61cf372af07d32099c4
Author: Wuwei Lin 
AuthorDate: Mon Nov 14 17:45:39 2022 -0800

[Codegen] Fix CUDA codegen for int64 Ramp (#13382)
---
 src/target/source/codegen_cuda.cc   | 4 +++-
 tests/python/topi/python/test_topi_transform.py | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/target/source/codegen_cuda.cc b/src/target/source/codegen_cuda.cc
index d96e0cbc16..3ae74cc16d 100644
--- a/src/target/source/codegen_cuda.cc
+++ b/src/target/source/codegen_cuda.cc
@@ -1005,7 +1005,9 @@ void CodeGenCUDA::VisitStmt_(const EvaluateNode* op) {

 void CodeGenCUDA::VisitExpr_(const RampNode* op, std::ostream& os) {
   CHECK_LE(op->lanes, 4) << "ValueError: Ramp of more than 4 lanes is not allowed.";
-  os << "(make_int" << op->lanes << "(";
+  os << "(make_";
+  PrintType(op->dtype, os);
+  os << "(";
   for (int i = 0; i < op->lanes; i++) {
     os << "(" << PrintExpr(op->base) << ")"
        << "+(" << PrintExpr(op->stride) << "*" << i << ")";
diff --git a/tests/python/topi/python/test_topi_transform.py b/tests/python/topi/python/test_topi_transform.py
index dd5ad1b119..0f64b486f3 100644
--- a/tests/python/topi/python/test_topi_transform.py
+++ b/tests/python/topi/python/test_topi_transform.py
@@ -1040,6 +1040,7 @@ def test_gather():
     verify_gather(np.random.randn(4, 7, 5), 1, np.random.randint(low=0, high=7, size=(4, 10, 5)))
     verify_gather(np.random.randn(4, 7, 5), 2, np.random.randint(low=0, high=5, size=(4, 7, 2)))
     verify_gather(np.random.randn(4, 7, 5), 2, np.random.randint(low=0, high=5, size=(4, 7, 10)))
+    verify_gather(np.random.randn(4, 7, 2), 0, np.random.randint(low=0, high=4, size=(4, 7, 2)))
 
 
 @tvm.testing.uses_gpu
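A minimal sketch of the node this visitor prints, constructed directly at the TIR level; the exact CUDA builtin name is an assumption about the fixed output:

```python
import tvm

# A 4-lane int64 Ramp: base, base+stride, base+2*stride, base+3*stride.
ramp = tvm.tir.Ramp(tvm.tir.const(0, "int64"), tvm.tir.const(1, "int64"), 4)
print(ramp.dtype)  # int64x4
# Before the fix the CUDA printer emitted make_int4(...) for every Ramp;
# it now prints the actual dtype, presumably make_longlong4(...) here.
```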



[tvm] branch main updated (6b238c4b6e -> e398d16de8)

2022-11-07 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 6b238c4b6e [Bugfix][Runtime] Fix sched_setaffinity in Android (#13158)
 add e398d16de8 [Torch] Fix advanced indexing with boolean mask (#13306)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/pytorch.py  | 15 +--
 tests/python/frontend/pytorch/test_forward.py |  8 
 2 files changed, 21 insertions(+), 2 deletions(-)



[tvm] branch main updated (9aedb8bdda -> 1311cac88b)

2022-10-20 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 9aedb8bdda [Hexagon] refactor HexagonBufferManager class (#13145)
 add 1311cac88b Fix typo in test_pipeline_executor.py (#13134)

No new revisions were added by this update.

Summary of changes:
 tests/python/relay/test_pipeline_executor.py | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)



[tvm] branch fix_placeholder created (now 2677a7c536)

2022-09-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch fix_placeholder
in repository https://gitbox.apache.org/repos/asf/tvm.git


  at 2677a7c536 [Relay][TE] Add default param name if needed

This branch includes the following new commits:

 new 2677a7c536 [Relay][TE] Add default param name if needed

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[tvm] 01/01: [Relay][TE] Add default param name if needed

2022-09-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch fix_placeholder
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit 2677a7c536078595e28eb4f58b834ca940d50882
Author: Cody Yu 
AuthorDate: Mon Sep 26 20:47:30 2022 +

[Relay][TE] Add default param name if needed
---
 src/relay/backend/te_compiler_cache.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/relay/backend/te_compiler_cache.cc b/src/relay/backend/te_compiler_cache.cc
index 17eac443ff..6f55402bad 100644
--- a/src/relay/backend/te_compiler_cache.cc
+++ b/src/relay/backend/te_compiler_cache.cc
@@ -131,8 +131,9 @@ class LowerToTECompute : public backend::MemoizedExprTranslator<Array<te::Tensor>> {
     for (Var param : relay_func->params) {
       Array<te::Tensor> inputs;
       for (const auto& ttype : FlattenTupleType(param->checked_type())) {
-        tvm::te::Tensor tensor =
-            tvm::te::placeholder(GetShape(ttype->shape), ttype->dtype, param->vid->name_hint);
+        auto name_hint = param->vid->name_hint;
+        tvm::te::Tensor tensor = tvm::te::placeholder(
+            GetShape(ttype->shape), ttype->dtype, (name_hint == "") ? "placeholder" : name_hint);
         inputs.push_back(tensor);
         fn_inputs_.push_back(tensor);
       }
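In Python terms, the guard is equivalent to the following sketch (the helper is hypothetical; `te.placeholder` is the API the compiler calls):

```python
from tvm import te

def make_input(shape, dtype, name_hint):
    # Mirror of the C++ change: fall back to "placeholder" when the Relay
    # var carries an empty name hint, so the TE input tensor stays named.
    return te.placeholder(shape, dtype=dtype, name=name_hint or "placeholder")

t = make_input((4, 8), "float32", "")
print(t.op.name)  # placeholder
```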



[tvm-rfcs] branch main updated: [RFC77] Added tracking issue link (#85)

2022-08-01 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
 new e6708b4  [RFC77] Added tracking issue link (#85)
e6708b4 is described below

commit e6708b478aac1199a4d68e2bfa91818795285d43
Author: Eric Lunderberg 
AuthorDate: Mon Aug 1 11:15:27 2022 -0500

[RFC77] Added tracking issue link (#85)
---
 rfcs/0077-layout-transform-padding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/0077-layout-transform-padding.md b/rfcs/0077-layout-transform-padding.md
index ecbda54..8f40a66 100644
--- a/rfcs/0077-layout-transform-padding.md
+++ b/rfcs/0077-layout-transform-padding.md
@@ -5,7 +5,7 @@
[Junru Shao](https://github.com/junrushao1994)
 - Start Date: 2022-06-06
 - RFC PR: [apache/tvm-rfcs#0077](https://github.com/apache/tvm-rfcs/pull/0077)
-- GitHub Issue: TBD
+- GitHub Issue: [apache/tvm#12261](https://github.com/apache/tvm/issues/12261)
 
 # Table of contents
 - [Table of contents](#table-of-contents)



[tvm] branch comaniac-patch-1 created (now 418b890fff)

2022-07-02 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git


  at 418b890fff [COMMUNITY] Hongyi Jin -> Reviewer

This branch includes the following new commits:

 new 418b890fff [COMMUNITY] Hongyi Jin -> Reviewer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[tvm] 01/01: [COMMUNITY] Hongyi Jin -> Reviewer

2022-07-02 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit 418b890fff4f3c47ae19cec5febf16d0d7c593c5
Author: Cody Yu 
AuthorDate: Sat Jul 2 13:34:58 2022 -0700

[COMMUNITY] Hongyi Jin -> Reviewer
---
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 95e006513d..e3b3082040 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -115,6 +115,7 @@ We do encourage everyone to work anything they are interested in.
 - [Chenfan Jia](https://github.com/jcf94): @jcf94
 - [Hua Jiang](https://github.com/huajsj): @huajsj
 - [Ziheng Jiang](https://github.com/ZihengJiang): @ZihengJiang
+- [Hongyi Jin](https://github.com/jinhongyii): @jinhongyii
 - [Manupa Karunaratne](https://github.com/manupa-arm): @manupa-arm
 - [Elen Kalda](https://github.com/ekalda): @ekalda
 - [Marisa Kirisame](https://github.com/MarisaKirisame): @MarisaKirisame



[tvm] branch main updated (993f72877d -> a063404812)

2022-06-27 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 993f72877d [PyTorch] [Relay] Add aten::pad (#11922)
 add a063404812  [DNNL] Add bfloat16 type support for dnnl conv2d kernel (#11902)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/dnnl.py   |   5 ++
 src/runtime/contrib/dnnl/dnnl.cc |  51 -
 tests/python/relay/test_op_level2.py | 140 ++-
 3 files changed, 125 insertions(+), 71 deletions(-)



[tvm] branch main updated: [python][docs] fix docstring / comment typos (#11608)

2022-06-23 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new a6cbe0d13e [python][docs] fix docstring / comment typos (#11608)
a6cbe0d13e is described below

commit a6cbe0d13eacbdcb6471caade4baa4b02926a490
Author: Christian Convey 
AuthorDate: Thu Jun 23 13:41:59 2022 -0400

[python][docs] fix docstring / comment typos (#11608)
---
 python/tvm/auto_scheduler/cost_model/xgb_model.py | 10 +-
 python/tvm/auto_scheduler/task_scheduler.py   | 12 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/python/tvm/auto_scheduler/cost_model/xgb_model.py b/python/tvm/auto_scheduler/cost_model/xgb_model.py
index 3cf65954be..a4e39b9061 100644
--- a/python/tvm/auto_scheduler/cost_model/xgb_model.py
+++ b/python/tvm/auto_scheduler/cost_model/xgb_model.py
@@ -98,8 +98,8 @@ class XGBModel(PythonBasedModel):
         The random seed
     model_file: Optional[str]
         If is not None, save model to this file after every update.
-    adapative_training: bool = False
-        Whether to use adapatie training, which reduces the training frequency when there are
+    adaptive_training: bool = False
+        Whether to use adaptive training, which reduces the training frequency when there are
         too many logs.
     """

@@ -109,7 +109,7 @@ class XGBModel(PythonBasedModel):
         num_warmup_sample=100,
         seed=None,
         model_file=None,
-        adapative_training=False,
+        adaptive_training=False,
     ):
         global xgb
         try:
@@ -141,7 +141,7 @@ class XGBModel(PythonBasedModel):
         self.num_warmup_sample = num_warmup_sample
         self.verbose_eval = verbose_eval
         self.model_file = model_file
-        self.adapative_training = adapative_training
+        self.adaptive_training = adaptive_training

         super().__init__()

@@ -169,7 +169,7 @@ class XGBModel(PythonBasedModel):
         self.results.extend(results)

         if (
-            self.adapative_training
+            self.adaptive_training
             and len(self.inputs) - self.last_train_length < self.last_train_length / 5
         ):
             # Set a training threshold related to `last_train_length` to reduce the training
diff --git a/python/tvm/auto_scheduler/task_scheduler.py b/python/tvm/auto_scheduler/task_scheduler.py
index 762c507359..c23c9b3c0c 100644
--- a/python/tvm/auto_scheduler/task_scheduler.py
+++ b/python/tvm/auto_scheduler/task_scheduler.py
@@ -47,7 +47,7 @@ def make_search_policies(
     verbose,
     load_model_file=None,
     load_log_file=None,
-    adapative_training=False,
+    adaptive_training=False,
 ):
     """Make a list of search policies for a list of search tasks.
     It creates one policy per task.
@@ -71,7 +71,7 @@ def make_search_policies(
     load_log_file: Optional[str]
        Load measurement records from this file. If it is not None, the status of the
        task scheduler, search policies and cost models will be restored according to this file.
-    adapative_training: bool = False
+    adaptive_training: bool = False
        Option used by XGBModel to reduce the model training frequency when there're too
        many logs.

@@ -89,7 +89,7 @@ def make_search_policies(
         cost_model = XGBModel(
             num_warmup_sample=len(tasks) * num_measures_per_round,
             model_file=load_model_file,
-            adapative_training=adapative_training,
+            adaptive_training=adaptive_training,
         )
         if load_model_file and os.path.isfile(load_model_file):
             logger.info("TaskScheduler: Load pretrained model...")
@@ -283,7 +283,7 @@ class TaskScheduler:
         tune_option,
         search_policy="default",
         search_policy_params=None,
-        adapative_training=False,
+        adaptive_training=False,
         per_task_early_stopping=None,
     ):
         """Tune a batch of tasks together.
@@ -300,7 +300,7 @@ class TaskScheduler:
            "sketch.random" for SketchPolicy + RandomModel.
         search_policy_params : Optional[Dict[str, Any]]
             The parameters of the search policy
-        adapative_training : bool = False
+        adaptive_training : bool = False
             Option used by XGBModel to reduce the model training frequency when there're
             too many logs.
         per_task_early_stopping : Optional[int]
@@ -347,7 +347,7 @@ class TaskScheduler:
             tune_option.verbose,
             self.load_model_file,
             self.load_log_file,
-            adapative_training,
+            adaptive_training,
         )

         # do a round robin first to warm up



[tvm] branch main updated: Fix typos in target warn of dnnl (#11678)

2022-06-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new d0da0b94de Fix typos in target warn of dnnl (#11678)
d0da0b94de is described below

commit d0da0b94dea206400d3bf4e15cb7815713c5b6e7
Author: billishyahao 
AuthorDate: Sat Jun 11 13:49:50 2022 +0800

Fix typos in target warn of dnnl (#11678)
---
 python/tvm/target/target.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/tvm/target/target.py b/python/tvm/target/target.py
index aea3dfec43..830cd03cec 100644
--- a/python/tvm/target/target.py
+++ b/python/tvm/target/target.py
@@ -111,8 +111,8 @@ class Target(Object):
         if isinstance(target, str) and "-libs=mkldnn" in target:
             target = target.replace("mkldnn", "dnnl")
             warnings.warn(
-                "legacy supoort of mkldnn will be eprecated in the next release."
-                " Please replace -libs=mkldnn to -libs=dnnl to enable Intel OneDNN.",
+                "Legacy support of mkldnn is going to be deprecated. "
+                "Please use -libs=dnnl instead.",
             )
         if isinstance(target, (dict, str)):
             target = convert(target)
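The user-visible behavior, as a sketch (the target string is illustrative):

```python
import tvm

# The legacy library name still works but now emits the shorter warning
# and is rewritten to dnnl under the hood.
target = tvm.target.Target("llvm -libs=mkldnn")
print(target)  # ... -libs=dnnl
```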



[tvm-rfcs] branch main updated: [RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72)

2022-05-31 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
 new a518000  [RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72)
a518000 is described below

commit a518000cbc82e53321f526d2090c7c8067607391
Author: Bojian Zheng 
AuthorDate: Tue May 31 16:43:09 2022 -0400

[RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72)

* Create 0072-dynamic-autoscheduler.md

* Address all the feedbacks
---
 rfcs/0072-dynamic-autoscheduler.md | 247 +
 1 file changed, 247 insertions(+)

diff --git a/rfcs/0072-dynamic-autoscheduler.md b/rfcs/0072-dynamic-autoscheduler.md
new file mode 100644
index 000..68be422
--- /dev/null
+++ b/rfcs/0072-dynamic-autoscheduler.md
@@ -0,0 +1,247 @@
+- Feature Name: DietCode: An Auto-Scheduler for Dynamic Tensor Programs
+- Start Date: (2022-05-10)
+- RFC PR: [apache/tvm-rfcs#xx](https://github.com/apache/tvm-rfcs/pull/xx)
+- GitHub Issue: [apache/tvm#yy](https://github.com/apache/tvm/pull/yy)
+
+# Summary
+[summary]: #summary
+
+We propose to integrate DietCode, an auto-scheduler for dynamic tensor programs,
+into AutoTIR. DietCode offers the following features:
+- A shape-generic search space to cover possible shapes in dynamic-shape
+  workloads.
+- A dynamic-shape-aware cost model to judge the quality of schedule candidates.
+- Enhancements to the TVM CUDA codegen for imperfect tiling.
+
+DietCode has been published at MLSys 2022, so please see [the
+paper](https://proceedings.mlsys.org/paper/2022/hash/fa7cdfad1a5aaf8370ebeda47a1ff1c3-Abstract.html)
+for more details and evaluations. Meanwhile, the latest DietCode codebase is also publicly
+available [here](https://github.com/UofT-EcoSystem/DietCode).
+
+# Motivation
+[motivation]: #motivation
+
+Achieving high performance for compute-intensive operators in machine learning
+workloads is a crucial but challenging task. Many machine learning and system
+practitioners rely on vendor libraries or auto-schedulers to do the job. While
+the former requires significant engineering effort, the latter currently supports
+only static-shape workloads in TVM. It is difficult, if not impractical,
+to apply the existing auto-scheduler directly to **dynamic-shape workloads**, as
+this leads to extremely long tuning time.
+
+We observe that the key challenge faced by existing auto-schedulers when
+handling a dynamic-shape workload is that they cannot construct a conclusive search
+space for all the possible shapes of the workload, because their search space is
+shape-dependent. To address this, this RFC aims to add dynamic-shape support to
+AutoTIR by integrating the DietCode framework, which constructs **a shape-generic
+search space and cost model** to auto-schedule dynamic-shape workloads
+efficiently.
+
+Our evaluation shows that DietCode has the following key strengths when
+auto-scheduling an entire model end-to-end:
+
+1. it reduces auto-scheduling time by up to 5.88x compared with the current
+   auto-scheduler on 8 uniformly sampled dynamic shapes, and
+1. it improves performance by up to 69.5% over the auto-scheduler and by 18.6%
+   over the vendor library. All these advantages make DietCode an
+   efficient and practical solution for dynamic-shape workloads.
+
+
+# Guide-Level Explanation
+[guide-level-explanation]: #guide-level-explanation
+
+The existing experiments were largely conducted with the auto-scheduler. However,
+having been syncing with the AutoTIR team for several quarters, we plan to integrate
+this RFC into MetaSchedule (AutoTIR), because it provides a more systematic
+interface and a cleaner integration path with fewer hacks.
+
+To give an example of the additional information users are required to feed the
+system (see https://github.com/UofT-EcoSystem/DietCode/tree/MLSys2022_AE for a
+PoC design):
+
+```python
+# Symbolic shape constraints
+T = tir.ShapeVar('T')
+I = tir.ShapeVar('I')
+H = tir.ShapeVar('H')
+# The candidate values of `T`
+T_vals = range(1, 128)
+wkl_insts = []
+for t in T_vals:
+  wkl_insts.append((t, 768, 768))
+  wkl_insts.append((t, 768, 3072))
+  wkl_insts.append((t, 3072, 768))
+
+
+task = Task(func=Dense,
+            args=(16*T, I, H),
+            shape_vars=(T, I, H),
+            wkl_insts=wkl_insts,
+            wkl_inst_weights=([1. for _ in T_vals],))
+```
+
+To enable auto-scheduling for dynamic-shape workloads, users only need to:
+1. Have `ShapeVar` in the TE/TensorIR computation.
+2. Specify the weight/distribution of each shape value.
+
+Notes:
+1. A symbolic constraint is additionally required in Relay, but it could be inferred
+   automatically after Relax is introduced;
+2. The proposed interface does not change any existing functionality.
+
+# Reference-Level Explanation
+[reference-level-explanation]: #reference-level

[tvm] branch main updated: Fix mixed precision output type to original type (#11142)

2022-05-05 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new eae836cdf6 Fix mixed precision output type to original type (#11142)
eae836cdf6 is described below

commit eae836cdf66f54f1e81e78e48bfa051431e8556f
Author: Gayatri P K 
AuthorDate: Thu May 5 21:04:30 2022 +0530

Fix mixed precision output type to original type (#11142)
---
 src/relay/transforms/to_mixed_precision.cc| 60 +++
 tests/python/relay/test_to_mixed_precision.py | 39 -
 2 files changed, 82 insertions(+), 17 deletions(-)

diff --git a/src/relay/transforms/to_mixed_precision.cc b/src/relay/transforms/to_mixed_precision.cc
index 4ad3482f74..e1d3a264c2 100644
--- a/src/relay/transforms/to_mixed_precision.cc
+++ b/src/relay/transforms/to_mixed_precision.cc
@@ -36,6 +36,7 @@
 namespace tvm {
 namespace relay {

+TVM_REGISTER_PASS_CONFIG_OPTION("relay.ToMixedPrecision.keep_orig_output_dtype", Bool);
 // A callable which hashes std::pair
 struct pair_hash {
   template <class T1, class T2>
@@ -105,6 +106,9 @@ class MixedPrecisionPass : public MixedModeMutator {
    * encountered. Used for emitting warnings on missing ops in the pass.
    */
   std::unordered_map<std::string, int> missing_ops_;
+  const RelayExprNode* root_;
+  std::vector<DataType> original_dtype_;
+  bool keep_orig_output_dtype_;

   Attrs GetNewAttrs(const CallNode* call, const DataType& accumulation_dtype) const {
     /* If the accumulation dtype is in the attributes make a copy and mutate the field. */
@@ -278,8 +282,23 @@ class MixedPrecisionPass : public MixedModeMutator {
  public:
   using MixedModeMutator::VisitExpr_;

-  explicit MixedPrecisionPass(DataType mixed_precision_type = DataType::Float(16))
-      : MixedModeMutator(), mixed_precision_type_(mixed_precision_type) {
+  explicit MixedPrecisionPass(Expr base, bool keep_orig_output_dtype,
+                              DataType mixed_precision_type = DataType::Float(16))
+      : MixedModeMutator(),
+        mixed_precision_type_(mixed_precision_type),
+        root_(Downcast<Function>(base)->body.get()),
+        keep_orig_output_dtype_(keep_orig_output_dtype) {
+    if (keep_orig_output_dtype_) {
+      if (root_->IsInstance<TupleNode>()) {
+        const TupleTypeNode* tuple_type = (root_->checked_type_).as<TupleTypeNode>();
+        for (Type t : tuple_type->fields) {
+          const TensorTypeNode* tensor_type = t.as<TensorTypeNode>();
+          original_dtype_.push_back(tensor_type->dtype);
+        }
+      } else if (root_->IsInstance<CallNode>()) {
+        original_dtype_.push_back((root_->checked_type_).as<TensorTypeNode>()->dtype);
+      }
+    }
     if (!mixed_precision_type_.is_float() && !mixed_precision_type_.is_bfloat16()) {
       LOG(FATAL) << "Only support IEEE floating point mixed precision types and bfloat16, but got "
                  << mixed_precision_type_;
@@ -381,6 +400,11 @@ class MixedPrecisionPass : public MixedModeMutator {
       if (accumulation_dtype != output_dtype) {
         output = CastArg(output, GetType(output), output_dtype);
       }
+      if (pre_call_node == root_ && keep_orig_output_dtype_) {
+        if (original_dtype_[0] != output_dtype) {
+          output = CastArg(output, GetType(output), original_dtype_[0]);
+        }
+      }
       return output;
     }

@@ -396,6 +420,21 @@ class MixedPrecisionPass : public MixedModeMutator {
   Expr Rewrite_(const TupleNode* pre, const Expr& post) {
     // The old checked type in the expression may not be valid so clear it
     post->checked_type_ = Type(nullptr);
+    if (pre == root_ && keep_orig_output_dtype_) {
+      Array<Expr> new_expr;
+      bool all_same = true;
+      for (size_t i = 0; i < original_dtype_.size(); i++) {
+        Expr output_element = GetField(post, i);
+        Expr casted_element;
+        auto output_element_type = transform::InferTypeLocal(output_element);
+        casted_element = CastArg(output_element, output_element_type, original_dtype_[i]);
+        new_expr.push_back(casted_element);
+        all_same &= casted_element.same_as(output_element);
+      }
+      if (!all_same) {
+        return Tuple(new_expr);
+      }
+    }
     return post;
   }

@@ -421,11 +460,12 @@ class MixedPrecisionPass : public MixedModeMutator {
   }

   // To access map of ops not registered for error reporting
-  friend Expr ToMixedPrecision(const Expr& expr, const DataType& mixed_precision_type,
-                               int missing_op_mode);
+  friend Expr ToMixedPrecision(const Expr& expr, bool keep_orig_output_dtype,
+                               const DataType& mixed_precision_type, int missing_op_mode);
 };

-Expr ToMixedPrecision(const Expr& expr, const DataType& mixed_precision_type, int missing_op_mode) {
+Expr ToMixedPrecision(const
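A sketch of enabling the new behavior from Python (the pass-config key comes from the registration at the top of the diff; the module is illustrative):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(4, 4), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(x, x))

# Without the option, outputs stay in the mixed-precision dtype; with it,
# the pass casts the (possibly tuple) outputs back to their original dtype.
with tvm.transform.PassContext(
    opt_level=3,
    config={"relay.ToMixedPrecision.keep_orig_output_dtype": True},
):
    mod = relay.transform.ToMixedPrecision("float16")(mod)
```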

[tvm] branch main updated (169f824d69 -> 6e23e22b17)

2022-05-02 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 169f824d69 [TIR] Reduced duplication in op.h (#11129)
 add 6e23e22b17 [TRT] Add check to use setBindingDimensions in TRT 6.0.1+ (#11178)

No new revisions were added by this update.

Summary of changes:
 src/runtime/contrib/tensorrt/tensorrt_runtime.cc | 2 ++
 1 file changed, 2 insertions(+)



[tvm] branch main updated (be90c656e8 -> 9284d32e3a)

2022-05-02 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from be90c656e8 [FIX] Avoid stack overflow in TargetHookVisitor with large modules (#11135)
 add 9284d32e3a [TRT] Add check to support split op with TRT 5.1.5+ (#11154)

No new revisions were added by this update.

Summary of changes:
 docs/how_to/deploy/tensorrt.rst  | 4 
 src/runtime/contrib/tensorrt/tensorrt_ops.cc | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)



[tvm] 01/01: Update CONTRIBUTORS.md

2022-04-11 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit 4c10879fd7f623d48575ec90894ad2e1825cf6bb
Author: Cody Yu 
AuthorDate: Mon Apr 11 18:00:02 2022 -0700

Update CONTRIBUTORS.md
---
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 24fb8f424a..ed67d6b889 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -46,6 +46,7 @@ We do encourage everyone to work anything they are interested in.
 - [Ziheng Jiang](https://github.com/ZihengJiang) (PMC): @ZihengJiang - relay, compiler
 - [Manupa Karunaratne](https://github.com/manupa-arm): @manupa-arm - ethos-u, memory planner
 - [Marisa Kirisame](https://github.com/MarisaKirisame): @MarisaKirisame - relay
+- [Ruihang Lai](https://github.com/MasterJH5574): @MasterJH5574 - tir, tvm-script
 - [Wuwei Lin](https://github.com/vinx13): @vinx13 - relay, topi
 - [Yizhi Liu](https://github.com/yzhliu) (PMC): @yzhliu - jvm, topi, relay
 - [Hao Lu](https://github.com/hlu1): @hlu1 - nnpack, frontends



[tvm] branch comaniac-patch-1 created (now 4c10879fd7)

2022-04-11 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git


  at 4c10879fd7 Update CONTRIBUTORS.md

This branch includes the following new commits:

 new 4c10879fd7 Update CONTRIBUTORS.md

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[tvm] branch main updated (4b3b86c2c3 -> 7f52cc4c0e)

2022-04-04 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 4b3b86c2c3 Fix submodule URLs. (#10888)
 add 7f52cc4c0e Handle uint8 in ConstantNode visitor in LowerToTECompute (#10894)

No new revisions were added by this update.

Summary of changes:
 src/relay/backend/te_compiler_cache.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[tvm] branch main updated (5cacecc -> 5814fdd)

2022-03-31 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 5cacecc  [CUBLAS] Add support for nn.dense and nn.batch_matmul (#10826)
 add 5814fdd  prune dnnl subgraph, and add related test case. (#10835)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/contrib/dnnl.py | 96 -
 tests/python/contrib/test_dnnl.py   | 51 +---
 2 files changed, 139 insertions(+), 8 deletions(-)


[tvm] branch main updated (1f60529 -> 48793f3)

2022-03-09 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 1f60529  [Hexagon] Resolve breakage in test_hexagon/test_cache_read_write (#10520)
 add 48793f3  Add ONNX LinearRegressor operator support (#10477)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/onnx.py  | 31 +
 tests/python/frontend/onnx/test_forward.py | 44 ++
 2 files changed, 75 insertions(+)


[tvm] branch main updated (4d586e2 -> 2ea6e55)

2022-03-03 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 4d586e2  Modify Jenkinsfile to prevent builds from triggering on branch indexing (#10432)
 add 2ea6e55  [skip ci][ci] Skip actions on forks (#10468)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/cc_bot.yml| 1 +
 .github/workflows/ping_reviewers.yml| 1 +
 .github/workflows/tag_teams.yml | 1 +
 .github/workflows/update_last_successful_branch.yml | 1 +
 4 files changed, 4 insertions(+)


[tvm] branch main updated (a1f51aa -> 780f88a)

2022-02-01 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from a1f51aa  [CUTLASS] Conv2d dgrad (#10110)
 add 780f88a  [FIX,AUTOTVM] Add backtraces to tuning errors (#9901)

No new revisions were added by this update.

Summary of changes:
 python/tvm/autotvm/measure/measure_methods.py | 59 ++-
 python/tvm/autotvm/tuner/tuner.py |  6 +--
 python/tvm/runtime/object.py  |  2 +-
 3 files changed, 52 insertions(+), 15 deletions(-)


[tvm] branch main updated: [Bugfix][Op] Fix shape inference of adv_index (#9717)

2022-01-31 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new dad8f62  [Bugfix][Op] Fix shape inference of adv_index (#9717)
dad8f62 is described below

commit dad8f62fc10282691227f303be3a7bd306e511c8
Author: Huang, Guangtai 
AuthorDate: Tue Feb 1 01:53:45 2022 +0800

[Bugfix][Op] Fix shape inference of adv_index (#9717)

* init

* test

* lint
---
 include/tvm/topi/transform.h| 36 ---
 python/tvm/relay/op/_transform.py   | 58 ++---
 src/relay/op/tensor/transform.cc| 39 +++--
 tests/python/relay/test_any.py  | 13 +++---
 tests/python/relay/test_op_level3.py|  1 +
 tests/python/topi/python/test_topi_transform.py |  2 +-
 6 files changed, 48 insertions(+), 101 deletions(-)

diff --git a/include/tvm/topi/transform.h b/include/tvm/topi/transform.h
index 59e6d41..acff301 100644
--- a/include/tvm/topi/transform.h
+++ b/include/tvm/topi/transform.h
@@ -1902,43 +1902,23 @@ inline Tensor matrix_set_diag(const Tensor& input, const Tensor& diagonal, int k
 inline Tensor adv_index(const Tensor& data, const Array<Tensor>& indices,
                         const std::string name = "advanced_index",
                         const std::string tag = kInjective) {
+  ICHECK_LE(indices.size(), data->shape.size()) << "too many indices for data!";
   Array<PrimExpr> oshape;
   Array<PrimExpr> broadcast_shape;
   Array<Tensor> bindices;
-  std::vector<int64_t> flatten_shape_lens;
-  int64_t num_picked_elems = 1;
-  bool has_dyn_shape = false;

+  broadcast_shape = indices[0]->shape;
+  for (size_t i = 1; i < indices.size(); ++i) {
+    auto bh = detail::BroadcastShape(broadcast_shape, indices[i]->shape);
+    broadcast_shape = Array<PrimExpr>(bh.common_shape.begin(), bh.common_shape.end());
+  }
   if (indices.size() == 1) {
-    broadcast_shape = indices[0]->shape;
+    // quick path
     bindices = indices;
   } else {
-    for (const auto& index : indices) {
-      int64_t flatten_len = 1;
-      for (const auto& dim : index->shape) {
-        const IntImmNode* axis_len = dim.as<IntImmNode>();
-        if (!axis_len) {
-          broadcast_shape = index->shape;
-          has_dyn_shape = true;
-          break;
-        }
-        flatten_len *= axis_len->value;
-      }
-      if (has_dyn_shape) break;
-      flatten_shape_lens.push_back(flatten_len);
-      if (flatten_len > num_picked_elems) {
-        num_picked_elems = flatten_len;
-        broadcast_shape = index->shape;
-      }
-    }
-
     // Do broadcast for indices
     for (size_t i = 0; i < indices.size(); ++i) {
-      if (!has_dyn_shape && flatten_shape_lens[i] < num_picked_elems) {
-        bindices.push_back(broadcast_to(indices[i], broadcast_shape));
-      } else {
-        bindices.push_back(indices[i]);
-      }
+      bindices.push_back(broadcast_to(indices[i], broadcast_shape));
     }
   }

diff --git a/python/tvm/relay/op/_transform.py b/python/tvm/relay/op/_transform.py
index cc71ea1..b67579a 100644
--- a/python/tvm/relay/op/_transform.py
+++ b/python/tvm/relay/op/_transform.py
@@ -976,40 +976,6 @@ def split_shape_func(attrs, inputs, _):


 @script
-def _adv_index_shape_func(inputs):
-    index_rank = inputs[1].shape[0]
-    data_rank = inputs[0].shape[0]
-    out = output_tensor((data_rank + index_rank - len(inputs) + 1,), "int64")
-
-    max_flatten_len = int64(1)
-    for i in const_range(index_rank):
-        max_flatten_len *= inputs[1][i]
-        out[i] = inputs[1][i]
-    for i in const_range(len(inputs) - 2):
-        flatten_len = int64(1)
-        for j in const_range(index_rank):
-            flatten_len *= inputs[i + 2][j]
-        if flatten_len > max_flatten_len:
-            max_flatten_len = flatten_len
-            for k in const_range(index_rank):
-                out[k] = inputs[i + 2][k]
-
-    for i in const_range(data_rank - len(inputs) + 1):
-        out[i + index_rank] = inputs[0][i + len(inputs) - 1]
-
-    return out
-
-
-@_reg.register_shape_func("adv_index", False)
-def adv_index_shape_func(attrs, inputs, _):
-    """
-    Shape func for adv_index.
-    Only allow single index tensor.
-    """
-    return [_adv_index_shape_func(inputs)]
-
-
-@script
 def _repeat_shape_func(data_shape, repeats, axis):
     out = output_tensor((data_shape.shape[0],), "int64")

@@ -1116,6 +1082,30 @@ def where_shape_func(attrs, inputs, _):


 @script
+def _adv_index_post_process(data_shape, bcast_shape, num_indices):
+    data_rank = data_shape.shape[0]
+    bcast_rank = bcast_shape.shape[0]
+    out = output_tensor((data_rank + bcast_rank - num_indices,), "int64")
+
+    for i in const_range(bcast
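The broadcasting rule the rewrite standardizes on, as a sketch (shapes are illustrative):

```python
import numpy as np
import tvm
from tvm import relay

data = relay.var("data", shape=(4, 7, 5))
i0 = relay.const(np.zeros((3, 1), "int64"))
i1 = relay.const(np.zeros((1, 2), "int64"))
# The two index tensors broadcast to (3, 2); the remaining data dims are
# appended, so the result type is (3, 2, 5), matching numpy advanced indexing.
out = relay.adv_index([data, i0, i1])
mod = tvm.IRModule.from_expr(relay.Function([data], out))
print(relay.transform.InferType()(mod))
```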

[tvm] branch main updated (3af9c30 -> 80d4d05)

2022-01-28 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 3af9c30  [microTVM] Update Zephyr to 2.7 (#10094)
 add 80d4d05  [Runtime][PipelineExecutor] Pipeline Executor Sequential execution (#10082)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/pipeline_executor.py  |  10 +++
 src/runtime/pipeline/pipeline_executor.cc|  34 ++--
 src/runtime/pipeline/pipeline_executor.h |  12 +++
 src/runtime/pipeline/pipeline_scheduler.cc   |  80 ++
 src/runtime/pipeline/pipeline_scheduler.h|  25 ++
 src/runtime/pipeline/pipeline_struct.h   |  82 +-
 tests/python/relay/test_pipeline_executor.py | 120 +++
 7 files changed, 340 insertions(+), 23 deletions(-)


[tvm] branch main updated (920c380 -> ff2c434)

2022-01-12 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 920c380  [Caffe Frontend] adding Reduction op (#8015)
 add ff2c434  [CUTLASS] Support more kernels: int8, tf32, and 3xtf32 (#9899)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/cutlass/build.py  |  91 +++-
 python/tvm/contrib/cutlass/gen_conv2d.py |  29 +++-
 python/tvm/contrib/cutlass/gen_gemm.py   |  66 +++--
 python/tvm/contrib/cutlass/gen_tensor_op.py  | 210 +++
 python/tvm/contrib/cutlass/library.py|  17 +++
 python/tvm/relay/op/contrib/cutlass.py   |   7 +-
 src/relay/backend/contrib/cutlass/codegen.cc |   6 +-
 tests/python/contrib/test_cutlass.py | 135 ++---
 8 files changed, 445 insertions(+), 116 deletions(-)


[tvm] branch comaniac-patch-1 updated (35a04ac -> 7939316)

2022-01-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 35a04ac  Update CONTRIBUTORS.md
 add 7939316  Update CONTRIBUTORS.md

No new revisions were added by this update.

Summary of changes:
 CONTRIBUTORS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[tvm] 01/01: Update CONTRIBUTORS.md

2022-01-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit 35a04acc826432beb682d849d2e6b608eb7f2d48
Author: Cody Yu 
AuthorDate: Mon Jan 10 15:30:37 2022 -0800

Update CONTRIBUTORS.md
---
 CONTRIBUTORS.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 0adea78..cc8edf5 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -48,8 +48,8 @@ We do encourage everyone to work anything they are interested in.
 - [Wuwei Lin](https://github.com/vinx13): @vinx13 - relay, topi
 - [Yizhi Liu](https://github.com/yzhliu) (PMC): @yzhliu - jvm, topi, relay
 - [Hao Lu](https://github.com/hlu1): @hlu1 - nnpack, frontends
-- [Eric Lunderberg](https://github.com/Lunderberg): @Lunderberg - CI, Vulkan
-  backend
+- [Eric Lunderberg](https://github.com/Lunderberg): @Lunderberg - CI, Vulkan backend
+- [Andrew Z. Luo](https://github.com/AndrewZhaoLuo): @AndrewZhaoLuo - AMP, relay, frontends
 - [Steven Lyubomirsky](https://github.com/slyubomirsky): @slyubomirsky - relay
 - [Masahiro Masuda](https://github.com/masahi) (PMC): @masahi - topi, relay
 - [Thierry Moreau](https://github.com/tmoreau89) (PMC): @tmoreau89 - vta


[tvm] branch comaniac-patch-1 created (now 35a04ac)

2022-01-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git.


  at 35a04ac  Update CONTRIBUTORS.md

This branch includes the following new commits:

 new 35a04ac  Update CONTRIBUTORS.md

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[tvm] branch main updated (07a46a1 -> 65e5ddd)

2022-01-06 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 07a46a1  [BugFix] resolve integer 32. ~ 64. mismatch by casting (#9582)
 add 65e5ddd  [Torch] Better support in-place variant of ops (aten::relu_ etc) (#9851)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/pytorch.py | 45 +---
 1 file changed, 21 insertions(+), 24 deletions(-)


[tvm] branch main updated: Fix reduce NCHWc infer layout (do not keep reduced inner c when keepdims=false) (#9821)

2022-01-03 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 11379f7  Fix reduce NCHWc infer layout (do not keep reduced inner c when keepdims=false) (#9821)
11379f7 is described below

commit 11379f710bf9bebf4a7a0cf6c0943899047d11ed
Author: masahi 
AuthorDate: Tue Jan 4 02:32:36 2022 +0900

Fix reduce NCHWc infer layout (do not keep reduced inner c when keepdims=false) (#9821)

* Fix reduce NCHWc infer layout (do not keep reduced inner c when keepdims=false)

* black

* lint
---
 src/relay/op/tensor/reduce.cc   |  2 +-
 tests/python/relay/test_pass_alter_op_layout.py | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/relay/op/tensor/reduce.cc b/src/relay/op/tensor/reduce.cc
index 5001925..d844bb5 100644
--- a/src/relay/op/tensor/reduce.cc
+++ b/src/relay/op/tensor/reduce.cc
@@ -176,7 +176,7 @@ InferCorrectLayoutOutput ReduceInferCorrectLayout(const Attrs& attrs,
       if (params->exclude) {
         // The primal axis is not reduced, so keep the input packed dim.
         inferred_out_string += packed_dim;
-      } else {
+      } else if (params->keepdims) {
         // If the primal axis is part of reduce axes in the original layout, the inner dim
         // becomes 1 after reduction.
         inferred_out_string += "1" + layout_dim;
diff --git a/tests/python/relay/test_pass_alter_op_layout.py b/tests/python/relay/test_pass_alter_op_layout.py
index 7514a93..ea7fe0b 100644
--- a/tests/python/relay/test_pass_alter_op_layout.py
+++ b/tests/python/relay/test_pass_alter_op_layout.py
@@ -24,6 +24,7 @@ from tvm.relay.testing.temp_op_attr import TempOpAttr
 from tvm.relay.testing import run_infer_type
 import numpy as np
 import tvm.testing
+from tvm.relay import testing


 def run_opt_pass(expr, passes):
@@ -1452,5 +1453,23 @@ def test_conv2d_strided_slice_packed_to_unpacked():
     assert tvm.ir.structural_equal(a, b)


+def test_conv2d_reduce_channels():
+    x = relay.var("data", shape=(1, 8, 48, 48))
+    y = relay.nn.conv2d(
+        data=x,
+        weight=relay.var("weight"),
+        kernel_size=(1, 1),
+        channels=8,
+        dilation=1,
+        strides=(47, 47),
+    )
+    z = relay.argmin(y, axis=1)
+
+    mod, params = testing.create_workload(z)
+
+    with tvm.transform.PassContext(opt_level=3):
+        relay.build(mod, params=params, target="llvm")
+
+
 if __name__ == "__main__":
     pytest.main([__file__])


[tvm] branch main updated: [CUTLASS] Refactor cutlass kernel generation and selection (#9800)

2021-12-30 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 6d35f0b  [CUTLASS] Refactor cutlass kernel generation and selection (#9800)
6d35f0b is described below

commit 6d35f0bbdf656d393b0722cb93cd213217781c9d
Author: masahi 
AuthorDate: Fri Dec 31 08:20:03 2021 +0900

[CUTLASS] Refactor cutlass kernel generation and selection (#9800)
---
 python/tvm/contrib/cutlass/build.py|  78 +++--
 python/tvm/contrib/cutlass/conv2d_operation.py |   2 +-
 python/tvm/contrib/cutlass/gen_conv2d.py   | 198 -
 python/tvm/contrib/cutlass/gen_gemm.py | 234 ++---
 python/tvm/contrib/cutlass/gen_tensor_op.py|  18 ++
 python/tvm/relay/op/contrib/cutlass.py |  11 +-
 6 files changed, 301 insertions(+), 240 deletions(-)

diff --git a/python/tvm/contrib/cutlass/build.py b/python/tvm/contrib/cutlass/build.py
index 3bc3b5d..e921302 100644
--- a/python/tvm/contrib/cutlass/build.py
+++ b/python/tvm/contrib/cutlass/build.py
@@ -94,15 +94,17 @@ class OpAnnotator(tvm.relay.ExprVisitor):


 def select_gemm_kernel(
-    cutlass_profiler, MM, KK, NN, out_dtype, batched, profile_all, use_multiprocessing
+    cutlass_profiler, op_type, MM, KK, NN, out_dtype, batched, profile_all, use_multiprocessing
 ):
     """Run CUTLASS profiler to select the best kernel, or return the default one for dynamic
     workloads."""
     if any(isinstance(s, tvm.tir.Any) for s in [MM, KK, NN]):
-        out = cutlass_profiler.get_default(out_dtype, batched=batched)
-        logger.info("Picked the default kernel %s", out["name"])
+        out = cutlass_profiler.get_default(op_type, out_dtype, batched=batched)
+        name, cutlass_op_def = out["name"], out["opdef"]
+        logger.info("Picked the default kernel %s", name)
     else:
-        out = cutlass_profiler.profile(
+        name, cutlass_op_def, _ = cutlass_profiler.profile(
+            op_type,
             MM,
             NN,
             KK,
@@ -112,10 +114,11 @@ def select_gemm_kernel(
             use_multiprocessing=use_multiprocessing,
         )
         if profile_all:
-            logger.info("The best kernel is %s", out["name"])
+            logger.info("The best kernel is %s", name)
         else:
-            logger.info("Picked the first kernel found %s", out["name"])
-    return out
+            logger.info("Picked the first kernel found %s", name)
+
+    return name, cutlass_op_def


 def handle_batch_matmul(
@@ -126,24 +129,17 @@ def handle_batch_matmul(
     KK = arg0_shape[2]
     NN = arg1_shape[1]

-    out = select_gemm_kernel(
-        cutlass_profiler, MM, KK, NN, out_dtype, True, profile_all, use_multiprocessing
+    name, cutlass_op_def = select_gemm_kernel(
+        cutlass_profiler, op_type, MM, KK, NN, out_dtype, True, profile_all, use_multiprocessing
     )

-    if op_type == "cutlass.batch_matmul":
-        cutlass_op_def = out["opdef"]
-    else:
-        raise ValueError("%s pattern is not implemented." % op_type)
-
-    assert "tn_align" in out["name"], "Only supports (row_major, col_major) input layout for now."
-
     return {
         "batch": arg0_shape[0],
         "batch_stride_A": arg0_shape[1] * arg0_shape[2],
         "batch_stride_B": arg1_shape[1] * arg1_shape[2],
         "batch_stride_C": arg0_shape[1] * arg1_shape[1],
         "cutlass_op_def": cutlass_op_def,
-        "cutlass_op_name": out["name"],
+        "cutlass_op_name": name,
         "lda": "K",
         "ldb": "K",
         "ldc": "N",
@@ -158,26 +154,15 @@ def handle_dense(
     KK = arg0_shape[1]
     NN = arg1_shape[0]

-    out = select_gemm_kernel(
-        cutlass_profiler, MM, KK, NN, out_dtype, False, profile_all, use_multiprocessing
+    name, cutlass_op_def = select_gemm_kernel(
+        cutlass_profiler, op_type, MM, KK, NN, out_dtype, False, profile_all, use_multiprocessing
     )

-    if op_type == "cutlass.dense":
-        cutlass_op_def = out["opdef"]
-    elif op_type == "cutlass.dense_bias":
-        cutlass_op_def = out["opdef_bias"]
-    elif op_type == "cutlass.dense_bias_relu":
-        cutlass_op_def = out["opdef_bias_relu"]
-    elif "cutlass.dense_bias_gelu" in op_type:
-        cutlass_op_def = out["opdef_bias_gelu"]
-    else:
-        raise ValueError("%s pattern is not implemented." % op_type)
-
-    assert "tn_align" in out["name"

[tvm-rfcs] branch main updated: [RFC] Integrate LIBXSMM with TVM. (#47)

2021-12-24 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
 new 1a3d4f1  [RFC] Integrate LIBXSMM with TVM. (#47)
1a3d4f1 is described below

commit 1a3d4f13bf9c2ffb2baf420daeae460901fe79c7
Author: zhuwenxi 
AuthorDate: Sat Dec 25 05:17:48 2021 +0800

[RFC] Integrate LIBXSMM with TVM. (#47)

* RFC to integrate LIBXSMM with TVM.

* Fix indent.

* Convert tab to space.

* Fix typo: partition_for_cmsisnn -> partition_for_libxsmm

* Add python annotation.

* Add support for target system and Relay op strategy.

* Add upstream plan.

* Reschedule the integration plan.
---
 rfcs/0046-Intel-LIBXSMM-integration.md | 109 +
 1 file changed, 109 insertions(+)

diff --git a/rfcs/0046-Intel-LIBXSMM-integration.md b/rfcs/0046-Intel-LIBXSMM-integration.md
new file mode 100644
index 000..c3384e1
--- /dev/null
+++ b/rfcs/0046-Intel-LIBXSMM-integration.md
@@ -0,0 +1,109 @@
+# Summary
+This RFC introduces the plan of integrating LIBXSMM into TVM. LIBXSMM leverages a JIT code generator to produce highly efficient kernels targeting x86 architectures.
+
+For details of LIBXSMM, please refer to:
+* [LIBXSMM User Manual](https://libxsmm.readthedocs.io/en/latest/)
+* [LIBXSMM github repo](https://github.com/hfp/libxsmm)
+
+# Motivation
+TVM has shown satisfactory performance on MLP models with CPU. However, there are still some defects in the assembly code generated by LLVM which block AutoTVM/AutoScheduler from achieving the optimum on GEMM.
+
+LIBXSMM is an open source library developed by Intel Labs for accelerating small matrix multiplication. It leverages a JIT code generator to produce highly efficient GEMM kernels for x86 CPUs, which can come very close to the hardware roofline. According to our evaluation, on "small" GEMM (cube_root(m * n * k) <= 256), LIBXSMM shows superior performance over the well-known BLAS library Intel MKL.
+
+Moreover, given that LIBXSMM can generate quite efficient GEMM kernel implementations, it is also an ideal substitution for the inner kernel of normal-size GEMM. According to our experiments, the AutoTVM templates we wrote with LIBXSMM as the register-block generator achieve much higher performance than MKL and the existing TOPI implementation.
+
+# Guide-level explanation
+This proposal aims to integrate LIBXSMM into TVM to accelerate small GEMM and to serve as an inner kernel to accelerate normal-size GEMM.
+
+We will integrate LIBXSMM with TVM in the following components:
+1. Add an extern call "tvm.contrib.libxsmm.gemm" in the "src/runtime/contrib" directory, and a corresponding Python interface in the "python/tvm/contrib/" directory, so users can call it just as they do CBLAS;
+2. Use BYOC to accelerate small GEMM (cube_root(m * n * k) <= 256) and its epilogue fusion variations (bias/relu/sigmoid/bias_relu/bias_sigmoid);
+3. Add the AutoTVM template we wrote with LIBXSMM as the inner kernel into TOPI, as a GEMM implementation candidate;
+4. Add target system and Relay op strategy support. When users specify `llvm -libs=libxsmm`, the Relay op strategy automatically lowers corresponding GEMM ops to libxsmm.
+
+# Reference-level explanation
+1. Users can call libxsmm as CBLAS through the extern call API.
+```python
+  def matmul(lhs, rhs, transa=False, transb=False, alpha=1.0, beta=0.0, lda=-1, ldb=-1, ldc=-1, **kwargs):
+    n = lhs.shape[1] if transa else lhs.shape[0]
+    m = rhs.shape[0] if transb else rhs.shape[1]
+    return te.extern(
+      (n, m),
+      [lhs, rhs],
+      lambda ins, outs: tvm.tir.call_packed(
+        "tvm.contrib.libxsmm.matmul", ins[0], ins[1], outs[0], transa, transb, alpha, beta, lda, ldb, ldc),
+      name="C",
+      **kwargs,
+    )
+```
+2. BYOC allows for graph partitioning and using LIBXSMM for code generation.
+  * API to obtain the partitioned function:
+```python
+  from tvm.relay.op.contrib import libxsmm
+
+  # API to call LIBXSMM partitioning
+  libxsmm_module = libxsmm.partition_for_libxsmm(module)
+```
+  * Pattern matching table:
+```python
+  @register_pattern_table("libxsmm")
+  def pattern_table():
+    dense_pattern = ("libxsmm.dense", make_pattern(with_bias=False, with_activation=None))
+    denese_bias_pattern = ("libxsmm.dense_bias", make_pattern(with_bias=True, with_activation=None))
+    denese_relu_pattern = ("libxsmm.dense_relu", make_pattern(with_bias=False, with_activation="relu"))
+    denese_sigmoid_pattern = ("libxsmm.dense_sigmoid", make_pattern(with_bias=False, with_activation="sigmoid"))
+    denese_bias_relu = ("libxsmm.dense_bias_relu", make_pattern(with_bias=True, with_activation="relu"))
+    denese_bias_sigmoid =

[tvm] branch main updated (c7ddb41 -> aa86dc0)

2021-12-16 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from c7ddb41  [Relay] Add a unit test for structural equality (#9745)
 add aa86dc0  [CUTLASS] Support conv2d activation fusion (#9746)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/cutlass/build.py|  9 
 python/tvm/contrib/cutlass/conv2d_operation.py | 35 ++
 python/tvm/contrib/cutlass/gen_conv2d.py   |  9 ++--
 python/tvm/contrib/cutlass/gen_gemm.py |  1 -
 python/tvm/contrib/cutlass/library.py  |  2 +
 python/tvm/relay/op/contrib/cutlass.py | 55 ++---
 src/relay/backend/contrib/cutlass/codegen.cc   | 56 --
 tests/python/contrib/test_cutlass.py   | 66 +-
 8 files changed, 196 insertions(+), 37 deletions(-)


[tvm] branch main updated (6e9e4e6 -> a674121)

2021-12-13 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 6e9e4e6  [TVMC] Add --opt-level to compile mode (#9722)
 add a674121  [Relay] Non-recursive dependency graph (#9528)

No new revisions were added by this update.

Summary of changes:
 src/relay/analysis/dependency_graph.cc | 38 +++---
 src/relay/ir/indexed_graph.cc  | 20 ++
 2 files changed, 51 insertions(+), 7 deletions(-)


[tvm] branch main updated (bd361b9 -> 01599d1)

2021-12-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from bd361b9  [RELAY] [AST] Add virtual_device as a first class field in Relay (#9641)
 add 01599d1  [SimplifyExpr] Simplify consecutive adds with constants (#9671)

No new revisions were added by this update.

Summary of changes:
 src/relay/transforms/simplify_expr.cc | 44 +++
 tests/python/relay/test_pass_simplify_expr.py | 42 +
 2 files changed, 86 insertions(+)
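The rewrite this pass adds, as a sketch (the expression is illustrative):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(4,))
y = x + relay.const(1.0) + relay.const(2.0)
mod = tvm.IRModule.from_expr(relay.Function([x], y))
# Consecutive adds of constants now fold into a single add of 3.0.
print(relay.transform.SimplifyExpr()(mod))
```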


[tvm] branch main updated: [Dyn] Use SizeVar instead of Var in the GetShape function (#9650)

2021-12-05 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new ccd59e8  [Dyn] Use SizeVar instead of Var in the GetShape function (#9650)
ccd59e8 is described below

commit ccd59e89d21cc81cc06f2a16cddcc1ffeed1e2a1
Author: Chenfan 
AuthorDate: Mon Dec 6 03:39:03 2021 +0800

[Dyn] Use SizeVar instead of Var in the GetShape function (#9650)
---
 python/tvm/relay/backend/te_compiler.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/tvm/relay/backend/te_compiler.py 
b/python/tvm/relay/backend/te_compiler.py
index db75049..def827c 100644
--- a/python/tvm/relay/backend/te_compiler.py
+++ b/python/tvm/relay/backend/te_compiler.py
@@ -268,7 +268,7 @@ def get_shape(shape):
 assert val <= np.iinfo(np.int32).max
 ret.append(tvm.tir.IntImm("int32", val))
 elif isinstance(dim, tvm.tir.Any):
-ret.append(te.var("any_dim", "int32"))
+ret.append(te.size_var("any_dim", "int32"))
 else:
 ret.append(dim)
 return ret
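
For context, the practical difference is that TVM's arithmetic analyzer may assume a `SizeVar` is non-negative, while a plain `Var` carries no sign information; that extra fact lets downstream shape arithmetic simplify further. A small illustration (assuming a standard TVM install):

```python
import tvm
from tvm import te

n = te.var("n", "int32")        # plain Var: sign unknown to the analyzer
m = te.size_var("m", "int32")   # SizeVar: treated as >= 0

ana = tvm.arith.Analyzer()
print(ana.can_prove(m >= 0))  # True
print(ana.can_prove(n >= 0))  # False
```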


[tvm] branch main updated: [CUTLASS] Initial conv2d support (#9595)

2021-12-01 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new dc988b2  [CUTLASS] Initial conv2d support (#9595)
dc988b2 is described below

commit dc988b288d85660822e0fbadbf1fc74326e763e5
Author: masahi 
AuthorDate: Thu Dec 2 10:34:46 2021 +0900

[CUTLASS] Initial conv2d support (#9595)

* Add initial conv generator

* added conv2d pattern

* profile by gemm profiler

* remove conv2d profiler for now

* remove unused code

* add default

* minor fix, profiling working

* start codegen

* generated code compiled

* fixed layout initialization

* matched with autotvm tensorcore result

* test refactor

* minor cleanup

* remove iteration algo "Analytic"

* add test for dynamic batch conv2d

* pass dl tensor as output too

* support conv2d dynamic shape in codegen

* test working

* lint

* simplify codegen

* fix weird formatting

* typo fix

* check if cutlass is enabled in the test

* simplify gen_conv2d.py
---
 python/tvm/contrib/cutlass/build.py |  90 +++--
 python/tvm/contrib/cutlass/conv2d_operation.py  | 240 
 python/tvm/contrib/cutlass/gen_conv2d.py| 147 +++
 python/tvm/contrib/cutlass/gen_gemm.py  |   3 +
 python/tvm/contrib/cutlass/library.py   |  57 +-
 python/tvm/relay/op/contrib/cutlass.py  |   7 +
 python/tvm/relay/op/nn/_nn.py   |  15 ++
 src/relay/backend/contrib/codegen_c/codegen_c.h |  12 +-
 src/relay/backend/contrib/cutlass/codegen.cc| 134 -
 tests/python/contrib/test_cutlass.py|  96 ++
 10 files changed, 776 insertions(+), 25 deletions(-)

diff --git a/python/tvm/contrib/cutlass/build.py 
b/python/tvm/contrib/cutlass/build.py
index 615b900..c3a8fdc 100644
--- a/python/tvm/contrib/cutlass/build.py
+++ b/python/tvm/contrib/cutlass/build.py
@@ -23,6 +23,7 @@ import tvm
 from tvm import runtime, relay
 from tvm.contrib.nvcc import find_cuda_path, get_cuda_version
 from .gen_gemm import CutlassGemmProfiler
+from .gen_conv2d import CutlassConv2DProfiler
 
 logger = logging.getLogger("cutlass")
 
@@ -65,7 +66,7 @@ def _get_cutlass_compile_options(sm, threads):
 return kwargs
 
 
-class GemmAnnotator(tvm.relay.ExprVisitor):
+class OpAnnotator(tvm.relay.ExprVisitor):
 """Annotates partitioned functions with shape and dtype information."""
 
 def __init__(self):
@@ -81,6 +82,10 @@ class GemmAnnotator(tvm.relay.ExprVisitor):
 self.signature["arg%d_dtype" % i] = arg.checked_type.dtype
 self.signature["ret_shape"] = op.ret_type.shape
 self.signature["ret_dtype"] = op.ret_type.dtype
+self.visit(op.body)
+
+if str(op) == "nn.conv2d":
+self.op_attrs = call.attrs
 
 
 def select_gemm_kernel(
@@ -125,6 +130,8 @@ def handle_batch_matmul(
 else:
 raise ValueError("%s pattern is not implemented." % op_type)
 
+assert "tn_align" in out["name"], "Only supports (row_major, col_major) 
input layout for now."
+
 return {
 "batch": arg0_shape[0],
 "batch_stride_A": arg0_shape[1] * arg0_shape[2],
@@ -132,6 +139,9 @@ def handle_batch_matmul(
 "batch_stride_C": arg0_shape[1] * arg1_shape[1],
 "cutlass_op_def": cutlass_op_def,
 "cutlass_op_name": out["name"],
+"lda": "K",
+"ldb": "K",
+"ldc": "N",
 }
 
 
@@ -158,6 +168,50 @@ def handle_dense(
 else:
 raise ValueError("%s pattern is not implemented." % op_type)
 
+assert "tn_align" in out["name"], "Only supports (row_major, col_major) 
input layout for now."
+
+return {
+"cutlass_op_def": cutlass_op_def,
+"cutlass_op_name": out["name"],
+"lda": "K",
+"ldb": "K",
+"ldc": "N",
+}
+
+
+def handle_conv2d(
+cutlass_profiler,
+op_type,
+d_shape,
+w_shape,
+out_shape,
+out_dtype,
+profile_all,
+use_multiprocessing,
+):
+"""Profile and select a kernel for conv2d op workload."""
+if any(isinstance(s, tvm.tir.Any) for s in d_shape):
+out = cutlass_profiler.get_default(out_dtype)
+logger.info("Picked the default kernel %s", out["name"])
+else:

[tvm] branch main updated: [CUTLASS] Refactor GEMM generator in preparation for conv2d (#9571)

2021-11-25 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new adf560e  [CUTLASS] Refactor GEMM generator in preparation for conv2d 
(#9571)
adf560e is described below

commit adf560ebed8465c22bf58f406d0a8d20663cdd1d
Author: masahi 
AuthorDate: Fri Nov 26 08:53:55 2021 +0900

[CUTLASS] Refactor GEMM generator in preparation for conv2d (#9571)

* split non-gemm specific generator code to gen_tensor_op.py

commit 250f915652e72e0012e9aa6ce0b6ef337d3da845
Author: Masahiro Masuda 
Date:   Sun Nov 14 06:44:52 2021 +0900

remove conv2d stuff

commit 1a6b27c438472f13acd4a0f466d78f293415e076
Author: Masahiro Masuda 
Date:   Sun Nov 14 06:41:31 2021 +0900

remove unused import

commit f7c3b5a191b8c73e8b178c32f6d3182fb0f697d6
Author: Masahiro Masuda 
Date:   Sun Nov 14 06:37:07 2021 +0900

add profiler boilarplate for conv2d

commit ca1ae274fb8f96a1dcde688deaf15339fe5604fb
Author: Masahiro Masuda 
Date:   Sun Nov 14 06:22:06 2021 +0900

introduce gen_tensor_op.py

commit 37bb918e0873f04457c29479eb21a530b7052217
Author: Masahiro Masuda 
Date:   Sun Nov 14 05:45:41 2021 +0900

more conv2d code

commit 5c00398892c99cb2a03be51f75878992663432dd
Author: Masahiro Masuda 
Date:   Sun Nov 14 05:13:30 2021 +0900

Begin conv2d support

* fix

* use functools.partial

* remove unused import
---
 python/tvm/contrib/cutlass/gen_gemm.py | 230 ++---
 .../cutlass/{gen_gemm.py => gen_tensor_op.py}  | 202 +-
 tests/python/contrib/test_cutlass.py   |   2 +-
 3 files changed, 30 insertions(+), 404 deletions(-)

diff --git a/python/tvm/contrib/cutlass/gen_gemm.py 
b/python/tvm/contrib/cutlass/gen_gemm.py
index 1ed4bfe..4025354 100644
--- a/python/tvm/contrib/cutlass/gen_gemm.py
+++ b/python/tvm/contrib/cutlass/gen_gemm.py
@@ -15,37 +15,29 @@
 # specific language governing permissions and limitations
 # under the License.
 # pylint: disable=invalid-name
-"""Kernel generator and profiler for CUTLASS."""
-import logging
-import os
+"""GEMM kernel generator and profiler for CUTLASS."""
+from functools import partial
 import re
-import tempfile
-import subprocess
-import multiprocessing
 from .gemm_operation import GemmOperation, EmitGemmInstance
 from .gemm_profiler import GemmProfilerEmitter
+from .gen_tensor_op import (
+ProfilerEngine,
+generate_sm75_tensor_op_1688,
+generate_sm80_tensor_op_16816,
+)
 from .library import (
 EpilogueFunctor,
 SwizzlingFunctor,
 TensorDescription,
 DataTypeTag,
 LayoutType,
-MathInstruction,
-DataType,
-OpcodeClass,
-MathOperation,
-TileDescription,
 )
 
-logger = logging.getLogger("cutlass")
-
 
 def create_gemm_operator(
-layouts,
 tile_descriptions,
 data_type,
 alignment_constraints,
-epilogue_functor=EpilogueFunctor.LinearCombination,
 swizzling_functor=SwizzlingFunctor.Identity8,
 batched=False,
 ):
@@ -59,6 +51,10 @@ def create_gemm_operator(
 if batched:
 swizzling_functor = SwizzlingFunctor.Batched
 
+layouts = [
+(LayoutType.RowMajor, LayoutType.ColumnMajor, LayoutType.RowMajor),
+]
+
 for layout in layouts:
 for tile_description in tile_descriptions:
 for alignment in alignment_constraints:
@@ -76,7 +72,7 @@ def create_gemm_operator(
 B,
 C,
 element_epilogue,
-epilogue_functor,
+EpilogueFunctor.LinearCombination,
 swizzling_functor,
 )
 op_bias = GemmOperation(
@@ -110,7 +106,6 @@ def create_gemm_operator(
 swizzling_functor,
 )
 
-kernel_emitter = EmitGemmInstance()
 op_entry["op"] = op
 op_entry["name"] = op.procedural_name()
 op_entry["opdef"] = kernel_emitter.emit(op, batched=batched)
@@ -134,141 +129,12 @@ def create_gemm_operator(
 return ret
 
 
-def generate_tensor_op_common(
-math_instructions, alignment_constraints, get_tile_descriptions, 
batched=False
-):
-"""Common kernel generator to be used by archtecture specific 
generators."""
-ops = []
-layouts = [
-(LayoutType.RowMajor, LayoutType.ColumnMajor, LayoutType.RowMajor),
-]
-for math_inst in math_instructions:
-tile_descriptions = get_tile_descriptions(math_inst)
-data_type = [
-math_inst.element_a,
-m

[tvm] branch main updated (289bd90 -> 0195afc)

2021-11-24 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 289bd90  Prepare for switching VM to LowerTEPass. (#9550)
 add 0195afc  [Target] enable -arch=sm_xx for assigning cuda target arch 
and deprecate autotvm.measure.set_cuda_target_arch api (#9544)

No new revisions were added by this update.

Summary of changes:
 apps/topi_recipe/broadcast/test_broadcast_map.py  |  4 +-
 apps/topi_recipe/conv/depthwise_conv2d_test.py|  4 +-
 apps/topi_recipe/conv/test_conv2d_hwcn_map.py |  4 +-
 apps/topi_recipe/gemm/cuda_gemm_square.py |  4 +-
 apps/topi_recipe/rnn/lstm.py  |  4 +-
 apps/topi_recipe/rnn/matexp.py|  4 +-
 jvm/core/src/test/scripts/test_add_gpu.py |  8 ++-
 python/tvm/auto_scheduler/measure.py  |  5 --
 python/tvm/autotvm/env.py |  1 -
 python/tvm/autotvm/measure/measure_methods.py | 34 +++---
 python/tvm/contrib/nvcc.py| 76 ---
 python/tvm/meta_schedule/builder/local_builder.py |  4 --
 python/tvm/target/target.py   | 15 -
 src/target/opt/build_cuda_on.cc   |  4 ++
 src/target/target_kind.cc | 30 -
 tests/python/integration/test_ewise.py|  4 +-
 16 files changed, 118 insertions(+), 87 deletions(-)


[tvm] branch main updated (dc56eea -> f1c2c5f)

2021-11-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from dc56eea  [Support] Add libinfo into the runtime build (#9310)
 add f1c2c5f  Fixed some warnings about lambda's closures that are bigger 
than necessary (#9481)

No new revisions were added by this update.

Summary of changes:
 src/driver/driver_api.cc | 2 +-
 src/relay/backend/vm/compiler.cc | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)


[tvm] branch main updated (a6e90b9 -> 93b764c)

2021-10-31 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from a6e90b9  [CUDA] Support memory reuse for dynamic shared memory (#9341)
 add 93b764c  [CUTLASS, Eazy] Cache profiling result and support compiling 
generated kernels in parallel  (#9402)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/cutlass/build.py| 13 -
 python/tvm/contrib/cutlass/gen_gemm.py |  5 +
 2 files changed, 17 insertions(+), 1 deletion(-)


[tvm] branch comaniac-patch-1 created (now 15d727b)

2021-10-25 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git.


  at 15d727b  [COMMUNITY] Mehrdad Hessar -> Reviewer

This branch includes the following new commits:

 new 15d727b  [COMMUNITY] Mehrdad Hessar -> Reviewer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[tvm] 01/01: [COMMUNITY] Mehrdad Hessar -> Reviewer

2021-10-25 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit 15d727bb7e8fbb58cc2185d7bd07c6e44f94c5dd
Author: Cody Yu 
AuthorDate: Mon Oct 25 14:44:01 2021 -0700

[COMMUNITY] Mehrdad Hessar -> Reviewer
---
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index c8e2d70..c883b70 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -90,6 +90,7 @@ We do encourage everyone to work anything they are interested 
in.
 - [Siyuan Feng](https://github.com/Hzfengsy): @Hzfengsy
 - [Josh Fromm](https://github.com/jwfromm): @jwfromm
 - [Sergei Grechanik](https://github.com/sgrechanik-h): @sgrechanik-h
+- [Mehrdad Hessar](https://github.com/mehrdadh): @mehrdadh
 - [Bohan Hou](https://github.com/spectrometerHBH): @spectrometerHBH
 - [Yuwei Hu](https://github.com/Huyuwei): @Huyuwei
 - [Luke Hutton](https://github.com/lhutton1): @lhutton1


[tvm] 01/01: [Community] @elvin-n -> Reviewer

2021-10-19 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit e37df7878f77ec8492abc9c394b1108fe7c90501
Author: Cody Yu 
AuthorDate: Tue Oct 19 09:51:03 2021 -0700

[Community] @elvin-n -> Reviewer
---
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index b9ef047..19287b4 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -111,6 +111,7 @@ We do encourage everyone to work anything they are 
interested in.
 - [Andrew Z. Luo](https://github.com/AndrewZhaoLuo): @AndrewZhaoLuo
 - [Steven Lyubomirsky](https://github.com/slyubomirsky): @slyubomirsky
 - [Masahiro Masuda](https://github.com/masahi): @masahi
+- [Andrey Malyshev](https://github.com/elvin-n): @elvin-n
 - [Sergey Mironov](https://github.com/grwlf): @grwlf
 - [Thierry Moreau](https://github.com/tmoreau89): @tmoreau89
 - [Kazutaka Morita](https://github.com/kazum): @kazum


[tvm] branch comaniac-patch-1 created (now e37df78)

2021-10-19 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git.


  at e37df78  [Community] @elvin-n -> Reviewer

This branch includes the following new commits:

 new e37df78  [Community] @elvin-n -> Reviewer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[tvm] branch main updated: [Relay] Gather op dynamic input support (#9240)

2021-10-11 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 5ad2f77  [Relay] Gather op dynamic input support (#9240)
5ad2f77 is described below

commit 5ad2f77403bed9a2bf356cc0d3d785ecc13e6c58
Author: masahi 
AuthorDate: Tue Oct 12 01:22:10 2021 +0900

[Relay] Gather op dynamic input support (#9240)

* support gather op dynamic input

* fix shape func and add test

* remove constness check

* fix shape func output rank

* restore check

Co-authored-by: masa 
---
 include/tvm/topi/transform.h  |  6 --
 python/tvm/relay/op/_transform.py | 20 
 src/relay/op/tensor/transform.cc  |  6 --
 tests/python/relay/test_any.py| 22 ++
 4 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/tvm/topi/transform.h b/include/tvm/topi/transform.h
index 8d1a49a..3df9caf 100644
--- a/include/tvm/topi/transform.h
+++ b/include/tvm/topi/transform.h
@@ -1233,8 +1233,10 @@ inline Tensor gather(const Tensor& data, int axis, const 
Tensor& indices,
   }
   ICHECK_GE(axis, 0);
   ICHECK_LT(axis, ndim_d);
-  size_t indices_dim_i = 
static_cast(GetConstInt(indices->shape[axis]));
-  ICHECK_GE(indices_dim_i, 1);
+  if (indices->shape[axis].as()) {
+size_t indices_dim_i = 
static_cast(GetConstInt(indices->shape[axis]));
+ICHECK_GE(indices_dim_i, 1);
+  }
   ICHECK(indices->dtype.is_int());
 
   Array out_shape;
diff --git a/python/tvm/relay/op/_transform.py 
b/python/tvm/relay/op/_transform.py
index 0284d24..76c8069 100644
--- a/python/tvm/relay/op/_transform.py
+++ b/python/tvm/relay/op/_transform.py
@@ -1174,3 +1174,23 @@ def gather_nd_shape_func(attrs, inputs, _):
 assert index_rank > 0, "index_rank needs to be specified for dynamic 
gather_nd"
 
 return [_gather_nd_shape(inputs[0], inputs[1], convert(batch_dims), 
convert(index_rank))]
+
+
+@script
+def _gather_shape(data_shape, indices_shape, axis):
+out_shape = output_tensor((data_shape.shape[0],), "int64")
+for i in range(data_shape.shape[0]):
+if i != axis:
+assert (
+data_shape[i] == indices_shape[i]
+), "data and indices size at non-gather axes must be the same"
+out_shape[i] = indices_shape[i]
+return out_shape
+
+
+@_reg.register_shape_func("gather", False)
+def gather_shape_func(attrs, inputs, _):
+"""
+Shape func for gather operator.
+"""
+return [_gather_shape(inputs[0], inputs[1], attrs.axis)]
diff --git a/src/relay/op/tensor/transform.cc b/src/relay/op/tensor/transform.cc
index 3781107..fa5b31a 100644
--- a/src/relay/op/tensor/transform.cc
+++ b/src/relay/op/tensor/transform.cc
@@ -3260,8 +3260,10 @@ bool GatherRel(const Array& types, int num_inputs, 
const Attrs& attrs,
   oshape.reserve(ndim_data);
   for (size_t i = 0; i < ndim_data; ++i) {
 if (i == static_cast(axis)) {
-  const int64_t* indice_shape_i = tir::as_const_int(indices->shape[i]);
-  ICHECK_GE(*indice_shape_i, 1);
+  if (indices->shape[i].as()) {
+const int64_t* indice_shape_i = tir::as_const_int(indices->shape[i]);
+ICHECK_GE(*indice_shape_i, 1);
+  }
 } else {
   ICHECK(reporter->AssertEQ(indices->shape[i], data->shape[i]));
 }
diff --git a/tests/python/relay/test_any.py b/tests/python/relay/test_any.py
index decddc1..8788faf 100644
--- a/tests/python/relay/test_any.py
+++ b/tests/python/relay/test_any.py
@@ -2064,5 +2064,27 @@ def test_scatter_nd():
 verify_scatter_nd(data, indices, updates, out)
 
 
+@tvm.testing.uses_gpu
+def test_gather():
+def verify_gather(data_shape, indices_shape, data_shape_np, 
indices_shape_np, axis):
+x = relay.var("x", relay.TensorType(data_shape, "float32"))
+y = relay.var("y", relay.TensorType(indices_shape, "int32"))
+z = relay.gather(x, axis, y)
+
+mod = tvm.IRModule()
+mod["main"] = relay.Function([x, y], z)
+
+data_np = np.random.uniform(size=data_shape_np).astype("float32")
+indices_np = np.random.randint(low=0, high=2, size=indices_shape_np, 
dtype="int32")
+
+ref_res = tvm.topi.testing.gather_python(data_np, axis, indices_np)
+check_result([data_np, indices_np], mod, [ref_res])
+
+verify_gather((relay.Any(),), (relay.Any(),), (10,), (10,), 0)
+verify_gather((2, 2), (2, relay.Any()), (2, 2), (2, 3), 1)
+verify_gather((relay.Any(), 2), (2, relay.Any()), (2, 2), (2, 3), 1)
+verify_gather((relay.Any(), relay.Any()), (relay.Any(), relay.Any()), (2, 
3), (1, 3), 0)
+
+
 if __name__ == "__main__":
 pytest.main([__file__])


[tvm] branch main updated: [VitisAI] Update Vitis AI integration to 1.4 release (#8815)

2021-10-05 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 627e92e  [VitisAI] Update Vitis AI integration to 1.4 release (#8815)
627e92e is described below

commit 627e92e7c2261d3b2ed8111f13c298a54417084b
Author: Jorn Tuyls 
AuthorDate: Tue Oct 5 18:55:14 2021 +0200

[VitisAI] Update Vitis AI integration to 1.4 release (#8815)

* Update Vitis AI to 1.4 release

* Parameterize Vitis AI codegen tests

* Update Dockerfile.demo_vitis_ai
---
 docker/Dockerfile.demo_vitis_ai|   4 +-
 docker/install/ubuntu_install_vitis_ai_core.sh |  11 +
 docs/deploy/vitis_ai.rst   | 833 -
 python/tvm/relay/op/contrib/vitis_ai.py|  21 +-
 .../python/contrib/test_vitis_ai/infrastructure.py |  13 +-
 .../contrib/test_vitis_ai/test_vitis_ai_codegen.py | 170 +++--
 .../test_vitis_ai_runtime_cpu_part.py  |  15 +-
 7 files changed, 457 insertions(+), 610 deletions(-)

diff --git a/docker/Dockerfile.demo_vitis_ai b/docker/Dockerfile.demo_vitis_ai
index 8cc623e..c38ccaf 100644
--- a/docker/Dockerfile.demo_vitis_ai
+++ b/docker/Dockerfile.demo_vitis_ai
@@ -15,8 +15,8 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# CI docker VAI env
-FROM xilinx/vitis-ai:latest
+# Main Vitis AI docker env
+FROM xilinx/vitis-ai:1.4.916
 
 RUN apt-get update --fix-missing
 
diff --git a/docker/install/ubuntu_install_vitis_ai_core.sh 
b/docker/install/ubuntu_install_vitis_ai_core.sh
index a2d7c2e..09e7aae 100755
--- a/docker/install/ubuntu_install_vitis_ai_core.sh
+++ b/docker/install/ubuntu_install_vitis_ai_core.sh
@@ -20,6 +20,9 @@ set -e
 set -u
 set -o pipefail
 
+export PYXIR_HOME=/opt/pyxir
+mkdir "$PYXIR_HOME"
+
 # install libraries for building Vitis-AI on ubuntu
 apt-get update && apt-get install -y \
 graphviz \
@@ -27,3 +30,11 @@ apt-get update && apt-get install -y \
 gpg-agent \
 gcc-aarch64-linux-gnu \
 && rm -rf /var/lib/apt/lists/*
+
+
+. $VAI_ROOT/conda/etc/profile.d/conda.sh
+conda activate vitis-ai-tensorflow
+pip3 install progressbar h5py==2.10.0
+
+git clone --recursive --branch rel-v0.3.1 --depth 1 
https://github.com/Xilinx/pyxir.git "${PYXIR_HOME}"
+cd "${PYXIR_HOME}" && python3 setup.py install --use_vart_cloud_dpu 
--use_dpuczdx8g_vart
diff --git a/docs/deploy/vitis_ai.rst b/docs/deploy/vitis_ai.rst
index d3e3ca0..7e97ddc 100755
--- a/docs/deploy/vitis_ai.rst
+++ b/docs/deploy/vitis_ai.rst
@@ -16,170 +16,96 @@
 under the License.
 
 
-Vitis-AI Integration
+Vitis AI Integration
 
 
-`Vitis-AI <https://github.com/Xilinx/Vitis-AI>`__ is Xilinx's
+`Vitis AI <https://github.com/Xilinx/Vitis-AI>`__ is Xilinx's
 development stack for hardware-accelerated AI inference on Xilinx
 platforms, including both edge devices and Alveo cards. It consists of
 optimized IP, tools, libraries, models, and example designs. It is
 designed with high efficiency and ease of use in mind, unleashing the
 full potential of AI acceleration on Xilinx FPGA and ACAP.
 
-The current Vitis-AI Byoc flow inside TVM enables acceleration of Neural
-Network model inference on edge and cloud. The identifiers for the
-supported edge and cloud Deep Learning Processor Units (DPU's) are
-DPUCZDX8G respectively DPUCADX8G. DPUCZDX8G and DPUCADX8G are hardware
-accelerators for convolutional neural networks (CNN's) on top of the
-Xilinx `Zynq Ultrascale+
-MPSoc 
<https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html>`__
-respectively
-`Alveo <https://www.xilinx.com/products/boards-and-kits/alveo.html>`__
-(U200/U250) platforms. For more information about the DPU identifiers
-see the section on `DPU naming information <#dpu-naming-information>`__.
-
-On this page you will find information on how to
-`build <#build-instructions>`__ TVM with Vitis-AI and on how to `get
-started <#getting-started>`__ with an example.
-
-DPU naming information
---
-
-+-+-+-++---+--+
-| DPU | Application | HW Platform  
   | Quantization Method 

[tvm] branch main updated (b9f2284 -> 2f02b1e)

2021-10-04 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from b9f2284  Support quantised RSQRT operator in TFLite (#9165)
 add 2f02b1e  support Torch all and any op (#9185)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/pytorch.py  | 13 +
 tests/python/frontend/pytorch/test_forward.py | 12 
 2 files changed, 25 insertions(+)


[tvm] branch main updated (f573007 -> 0564d38)

2021-09-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from f573007  relu of dnnl json runtime only support 4-dims input (#9122)
 add 0564d38  add `multiply` and remove `subtract` for dnnl json runtime 
(#9120)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/contrib/dnnl.py   |  1 -
 src/runtime/contrib/dnnl/dnnl_json_runtime.cc | 15 -
 tests/python/relay/test_json_runtime.py   | 45 +++
 3 files changed, 53 insertions(+), 8 deletions(-)


[tvm] branch main updated (80de123 -> f573007)

2021-09-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 80de123  [Meta Schedule][M3a] SpaceGenerator  (#9079)
 add f573007  relu of dnnl json runtime only support 4-dims input (#9122)

No new revisions were added by this update.

Summary of changes:
 src/runtime/contrib/dnnl/dnnl_json_runtime.cc |  5 ++---
 tests/python/relay/test_json_runtime.py   | 28 +++
 2 files changed, 18 insertions(+), 15 deletions(-)


[tvm] branch main updated (44b644c -> 44d3934)

2021-09-20 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 44b644c  [3/10] Moved TIR generation from Python to C++ for CMSIS-NN 
(#8951)
 add 44d3934  [Meta Schedule][M3b] Builder (#9044)

No new revisions were added by this update.

Summary of changes:
 CMakeLists.txt |   1 +
 include/tvm/meta_schedule/builder.h| 151 ++
 .../{_ffi/_ctypes => meta_schedule}/__init__.py|   3 +-
 python/tvm/{arith => meta_schedule}/_ffi_api.py|   7 +-
 .../builder/__init__.py}   |  11 +-
 python/tvm/meta_schedule/builder/builder.py| 131 
 python/tvm/meta_schedule/builder/local_builder.py  | 229 +
 python/tvm/meta_schedule/utils.py  |  97 +
 python/tvm/relay/transform/transform.py|   2 +-
 src/meta_schedule/builder/builder.cc   |  69 +++
 .../utils.cc => meta_schedule/utils.h} |  20 +-
 .../python/unittest/test_meta_schedule_builder.py  | 219 
 tests/scripts/task_mypy.sh |   7 +-
 13 files changed, 919 insertions(+), 28 deletions(-)
 create mode 100644 include/tvm/meta_schedule/builder.h
 copy python/tvm/{_ffi/_ctypes => meta_schedule}/__init__.py (89%)
 copy python/tvm/{arith => meta_schedule}/_ffi_api.py (84%)
 copy python/tvm/{driver/tvmc/__main__.py => meta_schedule/builder/__init__.py} 
(77%)
 create mode 100644 python/tvm/meta_schedule/builder/builder.py
 create mode 100644 python/tvm/meta_schedule/builder/local_builder.py
 create mode 100644 python/tvm/meta_schedule/utils.py
 create mode 100644 src/meta_schedule/builder/builder.cc
 copy src/{auto_scheduler/utils.cc => meta_schedule/utils.h} (78%)
 mode change 100755 => 100644
 create mode 100644 tests/python/unittest/test_meta_schedule_builder.py


[tvm] branch main updated (be37923 -> db78d96)

2021-09-16 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from be37923  Implementation of relay_to_tir target hook (#8423)
 add db78d96  [CUDA] Fix dense tensorcore legalize type error when units is 
specified (#9030)

No new revisions were added by this update.

Summary of changes:
 python/tvm/topi/cuda/tensorcore_alter_op.py |  6 ++
 tests/python/relay/test_pass_legalize_tensorcore.py | 12 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)


[tvm] branch main updated (e44f6c0 -> 6f5b674)

2021-09-15 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from e44f6c0  [ONNX] Add Einsum converter (#8985)
 add 6f5b674  [BYOC][TensorRT] Add TensorRT own int8 calibration support to 
TensorRT BYOC integration (#8808)

No new revisions were added by this update.

Summary of changes:
 src/runtime/contrib/tensorrt/tensorrt_builder.cc   |  19 ++-
 src/runtime/contrib/tensorrt/tensorrt_builder.h|  11 +-
 src/runtime/contrib/tensorrt/tensorrt_calibrator.h | 130 ++
 src/runtime/contrib/tensorrt/tensorrt_runtime.cc   | 108 +--
 tests/python/contrib/test_tensorrt_int8_exp.py | 149 +
 5 files changed, 399 insertions(+), 18 deletions(-)
 create mode 100755 src/runtime/contrib/tensorrt/tensorrt_calibrator.h
 create mode 100644 tests/python/contrib/test_tensorrt_int8_exp.py


[tvm] branch comaniac-patch-1 created (now afb3c28)

2021-09-15 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git.


  at afb3c28  [Community] @AndrewZhaoLuo -> Reviewer

This branch includes the following new commits:

 new afb3c28  [Community] @AndrewZhaoLuo -> Reviewer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[tvm] 01/01: [Community] @AndrewZhaoLuo -> Reviewer

2021-09-15 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit afb3c289a6ee1c74d6d850248e168bc6c04c051b
Author: Cody Yu 
AuthorDate: Wed Sep 15 11:42:57 2021 -0700

[Community] @AndrewZhaoLuo -> Reviewer
---
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 2821446..14f8191 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -108,6 +108,7 @@ We do encourage everyone to work anything they are 
interested in.
 - [Yizhi Liu](https://github.com/yzhliu) : @yzhliu
 - [Hao Lu](https://github.com/hlu1): @hlu1
 - [Eric Lunderberg](https://github.com/Lunderberg): @Lunderberg
+- [Andrew Z. Luo](https://github.com/AndrewZhaoLuo): @AndrewZhaoLuo
 - [Steven Lyubomirsky](https://github.com/slyubomirsky): @slyubomirsky
 - [Masahiro Masuda](https://github.com/masahi): @masahi
 - [Sergey Mironov](https://github.com/grwlf): @grwlf


[tvm] branch main updated (e1ae821 -> 5bf63be)

2021-09-14 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from e1ae821  Add while node support in TVMScript (#9004)
 add 5bf63be  [CMake] Corrected warning message about 
USE_GRAPH_EXECUTOR_DEBUG (#9006)

No new revisions were added by this update.

Summary of changes:
 CMakeLists.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[tvm] branch main updated (01aeeb1 -> f8b1df4)

2021-09-08 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 01aeeb1  [2/6] Arm(R) Ethos(TM)-U NPU Relay passes and Conv2D op 
(#8795)
 add f8b1df4  [Bugfix] Fix visit_attrs error if its function pointer is 
equal to nullptr (#8920)

No new revisions were added by this update.

Summary of changes:
 python/tvm/ir/container.py | 20 +
 tests/python/unittest/test_ir_container.py | 35 --
 2 files changed, 48 insertions(+), 7 deletions(-)


[tvm] branch main updated (0034732 -> 0fb840e)

2021-09-07 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 0034732  [Documentation] Document rewrite_once option (#8900)
 add 0fb840e  [Layout] Unify dense op input layout (#8921)

No new revisions were added by this update.

Summary of changes:
 include/tvm/relay/attrs/nn.h|  4 ++--
 python/tvm/relay/op/nn/nn.py|  4 ++--
 python/tvm/topi/x86/dense_alter_op.py   |  2 +-
 src/relay/op/nn/nn.cc   |  2 +-
 tests/python/frontend/pytorch/test_forward.py   | 10 ++
 tests/python/relay/test_pass_alter_op_layout.py |  8 
 6 files changed, 20 insertions(+), 10 deletions(-)


[tvm] branch main updated (7eda4a5 -> 054e2bb)

2021-09-06 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 7eda4a5  [Relay, TOPI] Make Softmax op fusible with elemwise ops 
(#8909)
 add 054e2bb  [CUDA] Improve adaptive and global pool schedule  (#8936)

No new revisions were added by this update.

Summary of changes:
 python/tvm/topi/cuda/pooling.py | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)


[tvm-site] branch main updated: Add SiMa.ai logo to contributing organizations (#30)

2021-09-03 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/main by this push:
 new 00f550c  Add SiMa.ai logo to contributing organizations (#30)
00f550c is described below

commit 00f550c53e50cbfdec8206cda2e7a08aaf0a1ef5
Author: Alicja Kwasniewska <87716821+alicja-sima...@users.noreply.github.com>
AuthorDate: Fri Sep 3 22:08:12 2021 +0200

Add SiMa.ai logo to contributing organizations (#30)

* Add SiMa.ai logo file

* Add SiMa.ai to contributing organizations
---
 _data/contrib_orgs.yml  |   3 +++
 images/community/simaai.png | Bin 0 -> 37282 bytes
 2 files changed, 3 insertions(+)

diff --git a/_data/contrib_orgs.yml b/_data/contrib_orgs.yml
index 9b489df..daffd71 100644
--- a/_data/contrib_orgs.yml
+++ b/_data/contrib_orgs.yml
@@ -54,6 +54,9 @@
 - img: /images/community/sertis.png
   title: sertis
   alter: sertis
+- img: /images/community/simaai.png
+  title: simaai
+  alter: simaai
 - img: /images/community/sjtu.png
   title: sjtu
   alter: sjtu
diff --git a/images/community/simaai.png b/images/community/simaai.png
new file mode 100644
index 000..097278b
Binary files /dev/null and b/images/community/simaai.png differ


[tvm-rfcs] branch main updated: Simple module connection API. (#26)

2021-08-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
 new 7003e21  Simple module connection API. (#26)
7003e21 is described below

commit 7003e21edb22ee3c32a9f631bd9a2b79430cff6b
Author: Hua Jiang 
AuthorDate: Thu Aug 26 22:17:34 2021 -0700

Simple module connection API. (#26)
---
 rfcs/0014-pipeline-executor.md | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/rfcs/0014-pipeline-executor.md b/rfcs/0014-pipeline-executor.md
index 7a173d2..f37f19b 100644
--- a/rfcs/0014-pipeline-executor.md
+++ b/rfcs/0014-pipeline-executor.md
@@ -69,20 +69,20 @@ This section introduces the use case for Pipeline Executor.
 
 ```python
 
-mod1, mod2, mod3 = my_manual_partitioner(mod)
+mod1, mod2, mod3 = my_manual_partitioner()
 pipe_cfg = PipelineModuleConfig()
 
 # Define pipeline inputs. Here I assume two inputs of mod1 and one input of 
mod3 are the pipeline inputs.
-pipe_cfg.inputs["data_0"] = (mod1, "data_0")
-pipe_cfg.inputs["data_1"] = (mod1, "data_1")
-pipe_cfg.inputs["data_2"] = (mod3, "data_0")
+pipe_config.connect(pipe_config.pipe_input("data_0"), 
pipe_config[mod1].input("data_0"))
+pipe_config.connect(pipe_config.pipe_input("data_1"), 
pipe_config[mod1].input("data_1"))
+pipe_config.connect(pipe_config.pipe_input("data_2"), 
pipe_config[mod3].input("data_1"))
 
 # Define pipeline outputs to be the first output of mod3.
-pipe_cfg.outputs.append((mod3, 0))
+pipe_config.connect(pipe_config[mod3].output(0), pipe_config.pipe_output("0"))
 
 # Define connections.
-pipe_cfg.connect(mod1, 0, mod2, "data_0") # mod1.output(0) -> mod2.data_0
-pipe_cfg.connect(mod2, 0, mod3, "data_1") # mod2.output(0) -> mod3.data_1
+pipe_config.connect(pipe_config[mod1].output(0), 
pipe_config[mod2].input("data_0")) # mod1.output(0) -> mod2.data_0
+pipe_config.connect(pipe_config[mod2].output(0), 
pipe_config[mod3].input("data_1")) # mod2.output(0) -> mod3.data_1
 
 # Print config for debugging
 print(str(pipe_cfg))
@@ -91,7 +91,7 @@ print(str(pipe_cfg))
 #   |- data_1: mod1.data_1
 #   |- data_2: mod3.data_0
 # Outputs:
-#   |- mod3.output(0)
+#   |- output(0): mod3.output(0)
 # Connections:
 #   |- mod1.output(0) -> mod2.data_0
 #   |- mod2.output(0) -> mod3.data_1


[tvm] branch main updated: update gpu and cpu (#8853)

2021-08-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 4fd1bf4  update gpu and cpu (#8853)
4fd1bf4 is described below

commit 4fd1bf4e512aafc0bea0b809789cd27f8dd944d4
Author: Mehrdad Hessar 
AuthorDate: Thu Aug 26 19:08:15 2021 +0200

update gpu and cpu (#8853)
---
 Jenkinsfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Jenkinsfile b/Jenkinsfile
index 4814dc7..9eafb44 100755
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -45,8 +45,8 @@
 
 // NOTE: these lines are scanned by docker/dev_common.sh. Please update the 
regex as needed. -->
 ci_lint = "tlcpack/ci-lint:v0.67"
-ci_gpu = "tlcpack/ci-gpu:v0.76"
-ci_cpu = "tlcpack/ci-cpu:v0.76"
+ci_gpu = "tlcpack/ci-gpu:v0.77"
+ci_cpu = "tlcpack/ci-cpu:v0.77"
 ci_wasm = "tlcpack/ci-wasm:v0.71"
 ci_i386 = "tlcpack/ci-i386:v0.73"
 ci_qemu = "tlcpack/ci-qemu:v0.08"


[tvm] branch main updated (3f777d5 -> d263c6d)

2021-08-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 3f777d5  [Hexagon] Rework tvm.target.hexagon() interface (#8823)
 add d263c6d  [Pattern matching] Add an option to rewrite the graph only 
once (#8843)

No new revisions were added by this update.

Summary of changes:
 include/tvm/relay/dataflow_matcher.h  |  6 +-
 python/tvm/relay/dataflow_pattern/__init__.py | 17 +++--
 src/relay/ir/dataflow_matcher.cc  | 11 +--
 tests/python/relay/test_dataflow_pattern.py   | 98 +--
 4 files changed, 58 insertions(+), 74 deletions(-)
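
For context, this change adds a `rewrite_once` option to `DFPatternCallback`, which stops the rewriter after a single pass instead of iterating to a fixed point. A sketch of where that matters (hypothetical callback; the option name is the one this PR introduces):

```python
import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import DFPatternCallback, is_op, wildcard, rewrite


class SwapAddOperands(DFPatternCallback):
    def __init__(self):
        # Without rewrite_once=True, the swapped result would match the
        # pattern again and the rewriter would never reach a fixed point.
        super().__init__(rewrite_once=True)
        self.a = wildcard()
        self.b = wildcard()
        self.pattern = is_op("add")(self.a, self.b)

    def callback(self, pre, post, node_map):
        return relay.add(node_map[self.b][0], node_map[self.a][0])


x = relay.var("x", shape=(2,), dtype="float32")
y = relay.var("y", shape=(2,), dtype="float32")
print(rewrite(SwapAddOperands(), relay.add(x, y)))  # add(y, x)
```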


[tvm] branch main updated (2859c20 -> 7ae8f89)

2021-08-24 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 2859c20  [M3a][Meta Schedule] Add Sampling Primitive 
SampleCategorical. (#8817)
 add 7ae8f89  [Community] @Lunderberg -> Reviewer (#8834)

No new revisions were added by this update.

Summary of changes:
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)


[tvm-rfcs] branch main updated (2d57c28 -> dd2e7a8)

2021-08-24 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git.


from 2d57c28  [RFC] Pipeline Executor (#14)
 add dd2e7a8  [RFC] [Relay] Automatic Mixed Precision Pass (#6)

No new revisions were added by this update.

Summary of changes:
 rfcs/0006-AMP_pass.md | 307 ++
 1 file changed, 307 insertions(+)
 create mode 100644 rfcs/0006-AMP_pass.md


[tvm] branch comaniac-patch-1 created (now 5861379)

2021-08-24 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git.


  at 5861379  [Community] @Lunderberg -> Reviewer

This branch includes the following new commits:

 new 5861379  [Community] @Lunderberg -> Reviewer

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[tvm] 01/01: [Community] @Lunderberg -> Reviewer

2021-08-24 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm.git

commit 58613792d38ca2b979d94d48ca7dd9050141a4ef
Author: Cody Yu 
AuthorDate: Tue Aug 24 09:34:39 2021 -0700

[Community] @Lunderberg -> Reviewer
---
 CONTRIBUTORS.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index 6a48690..614479b 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -115,6 +115,7 @@ We do encourage everyone to work anything they are 
interested in.
 - [Xin Liu](https://github.com/Meteorix): @Meteorix
 - [Yizhi Liu](https://github.com/yzhliu) : @yzhliu
 - [Hao Lu](https://github.com/hlu1): @hlu1
+- [Eric Lunderberg](https://github.com/Lunderberg): @Lunderberg
 - [Steven Lyubomirsky](https://github.com/slyubomirsky): @slyubomirsky
 - [Masahiro Masuda](https://github.com/masahi): @masahi
 - [Sergey Mironov](https://github.com/grwlf): @grwlf


[tvm-rfcs] branch main updated: [RFC] Pipeline Executor (#14)

2021-08-20 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git


The following commit(s) were added to refs/heads/main by this push:
 new 2d57c28  [RFC] Pipeline Executor (#14)
2d57c28 is described below

commit 2d57c28d55ab26587c748724c9eef4e0835d5ea8
Author: Hua Jiang 
AuthorDate: Fri Aug 20 09:41:49 2021 -0700

[RFC] Pipeline Executor (#14)

* add pipeline compute rfc.

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* address review comments.

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* address review comments.

* address review comments.

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* Update rfcs/0012-pipeline-executor.md

Co-authored-by: Cody Yu 

* rename rfcs file name into 0014.

Co-authored-by: hua jiang 
Co-authored-by: Cody Yu 
---
 resources/pipeline-executor-arch.png   | Bin 0 -> 72676 bytes
 resources/pipeline-executor-pipeline.png   | Bin 0 -> 237086 bytes
 resources/pipeline-executor-runtime.png| Bin 0 -> 90514 bytes
 resources/pipeline-executor-schedule.png   | Bin 0 -> 97559 bytes
 resources/pipeline-executor-subgraph-split.png | Bin 0 -> 156056 bytes
 resources/pipeline-executor.png| Bin 0 -> 39948 bytes
 rfcs/0014-pipeline-executor.md | 236 +
 7 files changed, 236 insertions(+)

diff --git a/resources/pipeline-executor-arch.png 
b/resources/pipeline-executor-arch.png
new file mode 100644
index 000..3f91dd3
Binary files /dev/null and b/resources/pipeline-executor-arch.png differ
diff --git a/resources/pipeline-executor-pipeline.png 
b/resources/pipeline-executor-pipeline.png
new file mode 100644
index 000..a634c3a
Binary files /dev/null and b/resources/pipeline-executor-pipeline.png differ
diff --git a/resources/pipeline-executor-runtime.png 
b/resources/pipeline-executor-runtime.png
new file mode 100644
index 000..a9857d2
Binary files /dev/null and b/resources/pipeline-executor-runtime.png differ
diff --git a/resources/pipeline-executor-schedule.png 
b/resources/pipeline-executor-schedule.png
new file mode 100644
index 000..e3dcc83
Binary files /dev/null and b/resources/pipeline-executor-schedule.png differ
diff --git a/resources/pipeline-executor-subgraph-split.png 
b/resources/pipeline-executor-subgraph-split.png
new file mode 100644
index 000..d9e2937
Binary files /dev/null and b/resources/pipeline-executor-subgraph-split.png 
differ
diff --git a/resources/pipeline-executor.png b/resources/pipeline-executor.png
new file mode 100644
index 000..a7858ee
Binary files /dev/null and b/resources/pipeline-executor.png differ
diff --git a/rfcs/0014-pipeline-executor.md b/rfcs/0014-pipeline-executor.md
new file mode 100644
index 000..7a173d2
--- /dev/null
+++ b/rfcs/0014-pipeline-executor.md
@@ -0,0 +1,236 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+- Feature Name: Pipeline Executor
+- Start Date: 2021-07-30
+- RFC PR: [apache/tvm-rfcs#0014](https://github.com/apache/tvm-rfcs/pull/0014)
+- GitHub Issue: [apache/tvm#8596](https://github.com/apache/tvm/issues/8596)
+
+## 1. Summary
+
+
+This proposal introduces Pipeline Executor: A runtime executor that schedules
+a list of Relay modules in pipeline to achieve task level parallelism to 
improve
+computation throughput.
+
+## 2. Motivation
+
+
+
+Currently more and more edge device inference deployments happen on SOC 
devices.
+Sinc

[tvm] branch main updated (36ea17a -> 7f237dd)

2021-08-19 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 36ea17a  [Docker][Vulkan] Allow Vulkan GPU access in docker container. 
(#8784)
 add 7f237dd  Extend tune_relay_x86 tutorial to measure default and kernel 
level tune (#8794)

No new revisions were added by this update.

Summary of changes:
 tutorials/autotvm/tune_relay_x86.py | 71 +
 1 file changed, 56 insertions(+), 15 deletions(-)


[tvm] branch main updated: [FIX] Bug fix for batch_matmul parameters mismatch (#8785)

2021-08-18 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 41879b2  [FIX] Bug fix for batch_matmul parameters mismatch (#8785)
41879b2 is described below

commit 41879b2552364f094492470a77a3ec0866b30eae
Author: Chenfan 
AuthorDate: Thu Aug 19 11:15:34 2021 +0800

[FIX] Bug fix for batch_matmul parameters mismatch (#8785)
---
 python/tvm/topi/cuda/batch_matmul.py| 13 -
 python/tvm/topi/cuda/batch_matmul_tensorcore.py |  9 +++--
 python/tvm/topi/rocm/batch_matmul.py|  7 +--
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/python/tvm/topi/cuda/batch_matmul.py 
b/python/tvm/topi/cuda/batch_matmul.py
index 3fc8a58..bd556d2 100644
--- a/python/tvm/topi/cuda/batch_matmul.py
+++ b/python/tvm/topi/cuda/batch_matmul.py
@@ -237,7 +237,9 @@ def schedule_batch_matmul_cublas(_, outs):
 
 
 @autotvm.register_topi_compute("batch_matmul_int8.cuda")
-def batch_matmul_int8(cfg, x, y, out_shape=None, out_dtype=None):
+def batch_matmul_int8(
+cfg, x, y, out_shape=None, out_dtype=None, transpose_a=False, 
transpose_b=True
+):
 """Batch Matmul operator for int8 on CUDA.
 
 Parameters
@@ -258,11 +260,20 @@ def batch_matmul_int8(cfg, x, y, out_shape=None, 
out_dtype=None):
 out_dtype : Optional[str]
 Specifies the output data type for mixed precision batch matmul.
 
+transpose_a : Optional[bool] = False
+Whether the first tensor is in transposed format.
+
+transpose_b : Optional[bool] = True
+Whether the second tensor is in transposed format.
+
 Returns
 ---
 output : tvm.te.Tensor
 3-D with shape [batch, M, N]
 """
+del out_shape
+# TODO(jcf94): Deal with different transpose combinations
+assert not transpose_a and transpose_b
 if out_dtype is None:
 out_dtype = x.dtype
 
diff --git a/python/tvm/topi/cuda/batch_matmul_tensorcore.py 
b/python/tvm/topi/cuda/batch_matmul_tensorcore.py
index a56d3c3..5324302 100644
--- a/python/tvm/topi/cuda/batch_matmul_tensorcore.py
+++ b/python/tvm/topi/cuda/batch_matmul_tensorcore.py
@@ -29,9 +29,14 @@ from .tensor_intrin import (
 
 
 @autotvm.register_topi_compute("batch_matmul_tensorcore.cuda")
-def batch_matmul_tensorcore(cfg, x, y, out_shape=None, out_dtype=None):
+def batch_matmul_tensorcore(
+cfg, x, y, out_shape=None, out_dtype=None, transpose_a=False, 
transpose_b=True
+):
 """batch matmul tensorcore operator on cuda"""
-# todo: deal with out_shape for broadcast, liuxin.ai
+# TODO(jcf94): Deal with different transpose combinations
+assert not transpose_a and transpose_b
+# TODO(liuxin.ai): Deal with out_shape for broadcast
+del out_shape
 return batch_matmul_tensorcore_cuda(x, y, out_dtype)
 
 
diff --git a/python/tvm/topi/rocm/batch_matmul.py 
b/python/tvm/topi/rocm/batch_matmul.py
index 7f35f4b..53b51ee 100644
--- a/python/tvm/topi/rocm/batch_matmul.py
+++ b/python/tvm/topi/rocm/batch_matmul.py
@@ -23,7 +23,9 @@ from ..utils import get_const_tuple
 
 
 @autotvm.register_topi_compute("batch_matmul_rocblas.rocm")
-def batch_matmul_rocblas(cfg, x, y, out_shape=None):
+def batch_matmul_rocblas(
+cfg, x, y, out_shape=None, out_dtype=None, transpose_a=False, 
transpose_b=True
+):
 """Computes matrix multiplication of `x` and `y` via rocblas when
 `x` and `y` are batched matrices.
 
@@ -40,12 +42,13 @@ def batch_matmul_rocblas(cfg, x, y, out_shape=None):
 output : tvm.te.Tensor
 3-D with shape [batch, M, N]
 """
+del out_dtype
 batch, M, K = get_const_tuple(x.shape)
 _, N, _ = get_const_tuple(y.shape)
 if out_shape is not None:
 assert out_shape[0] == batch, "Input and output batch sizes must match"
 assert out_shape[1] == M and out_shape[2] == N, "Invalid output shape"
-result = rocblas.batch_matmul(x, y, False, True)
+result = rocblas.batch_matmul(x, y, transpose_a, transpose_b)
 cfg.add_flop(batch * M * N * K * 2)
 return result
 


[tvm] branch main updated (5b2e504 -> 3f881ab)

2021-08-18 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 5b2e504  Restore License (#8779)
 add 3f881ab  Expose FTVMInferCorrectLayout Python interface (#8755)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/op.py  | 17 
 .../transform/infer_layout_utils.py}   | 32 +++
 src/relay/transforms/convert_layout.cc |  7 
 src/relay/transforms/infer_layout_utils.h  |  9 +
 tests/python/relay/test_pass_convert_op_layout.py  | 45 ++
 5 files changed, 91 insertions(+), 19 deletions(-)
 copy python/tvm/{arith/bound.py => relay/transform/infer_layout_utils.py} (59%)
 mode change 100644 => 100755


[tvm] branch main updated: [Relay testing] densenet implementation fix (#8704)

2021-08-17 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 26f7c0d  [Relay testing] densenet implementation fix (#8704)
26f7c0d is described below

commit 26f7c0d7c1959bc1fe37915abe26db5c080dbb57
Author: Jaehun Ryu 
AuthorDate: Wed Aug 18 02:14:05 2021 +0900

[Relay testing] densenet implementation fix (#8704)

* Fixed testing densenet bug

* Fixed code format using black
---
 python/tvm/relay/testing/densenet.py | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/python/tvm/relay/testing/densenet.py 
b/python/tvm/relay/testing/densenet.py
index 1ceb626..d621249 100644
--- a/python/tvm/relay/testing/densenet.py
+++ b/python/tvm/relay/testing/densenet.py
@@ -1,4 +1,3 @@
-# Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements.  See the NOTICE file
 # distributed with this work for additional information
 # regarding copyright ownership.  The ASF licenses this file
@@ -44,9 +43,12 @@ def _make_dense_layer(data, growth_rate, bn_size, index):
 def _make_dense_block(data, num_layers, bn_size, growth_rate, index):
 """Makes a block of dense layers of the specified size."""
 layer_out = data
+blocks = []
 for i in range(num_layers):
 layer_out = _make_dense_layer(layer_out, growth_rate, bn_size, "%s_%s" 
% (index, i))
-return layer_out
+blocks.append(layer_out)
+block_out = relay.concatenate(blocks, 1)
+return block_out
 
 
 def _make_transition(data, num_output_features, index):
@@ -63,7 +65,9 @@ def _make_dense_net(
 num_init_features, growth_rate, block_config, data_shape, data_dtype, 
bn_size=4, classes=1000
 ):
 """Builds up a densenet."""
-data = relay.Var("data", relay.TensorType(data_shape, data_dtype))  # 
(bn_size, 3, 224, 224)))
+data = relay.Var(
+"data", relay.TensorType(data_shape, data_dtype)
+)  # (batch_size, 3, 224, 224)))
 conv1 = layers.conv2d(
 data,
 channels=num_init_features,
@@ -79,7 +83,7 @@ def _make_dense_net(
 num_features = num_init_features
 layer_out = mp
 for i, num_layers in enumerate(block_config):
-layer_out = _make_dense_block(layer_out, num_layers, growth_rate, 
bn_size, i)
+layer_out = _make_dense_block(layer_out, num_layers, bn_size, 
growth_rate, i)
 num_features = num_features + num_layers * growth_rate
 if i != len(block_config) - 1:
 layer_out = _make_transition(layer_out, num_features // 2, i)
@@ -131,10 +135,10 @@ def get_workload(
 169: (69, 32, [6, 12, 32, 32]),
 201: (64, 32, [6, 12, 48, 32]),
 }
-
+bn_size = 4
 num_init_features, growth_rate, block_config = specs[densenet_size]
 data_shape = tuple([batch_size] + list(image_shape))
 net = _make_dense_net(
-num_init_features, growth_rate, block_config, data_shape, dtype, 
batch_size, classes
+num_init_features, growth_rate, block_config, data_shape, dtype, 
bn_size, classes
 )
 return create_workload(net)


[tvm] branch main updated (5e20ef9 -> 66ac470)

2021-08-12 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 5e20ef9  Remove qemu installation from Zephyr RVM (#8701)
 add 66ac470  [Relay] Dense alter layout fixed for packed input (#8669)

No new revisions were added by this update.

Summary of changes:
 include/tvm/relay/attrs/nn.h   | 19 +++
 python/tvm/relay/op/nn/_nn.py  |  6 +--
 python/tvm/relay/op/nn/nn.py   | 14 ++---
 python/tvm/topi/x86/dense_alter_op.py  |  4 +-
 src/relay/op/nn/nn.cc  | 62 +++---
 src/relay/op/nn/nn.h   | 23 
 .../contrib/test_arm_compute_lib/test_dense.py |  7 ++-
 tests/python/relay/test_pass_alter_op_layout.py| 48 -
 8 files changed, 139 insertions(+), 44 deletions(-)


[tvm] branch main updated (00bed97 -> 3145867)

2021-08-09 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 00bed97  Remove unused variables in AOT tests (#8686)
 add 3145867  [Meta Schedule][M3a] Traced Schedule (#8623)

No new revisions were added by this update.

Summary of changes:
 include/tvm/tir/schedule/schedule.h|  23 ++-
 include/tvm/tir/schedule/state.h   |  12 +-
 python/tvm/tir/schedule/schedule.py|  96 ++-
 python/tvm/tir/schedule/state.py   |  57 ---
 python/tvm/tir/schedule/testing.py |  62 
 src/tir/schedule/concrete_schedule.cc  |  14 +-
 src/tir/schedule/concrete_schedule.h   |   6 +-
 src/tir/schedule/schedule.cc   |  13 +-
 src/tir/schedule/state.cc  |  28 ++--
 src/tir/schedule/traced_schedule.cc| 156 ++
 src/tir/schedule/traced_schedule.h |  73 +
 tests/python/unittest/test_te_create_primfunc.py   |   4 +-
 .../unittest/test_tir_schedule_block_scope.py  |   6 +-
 .../unittest/test_tir_schedule_compute_inline.py   |  44 --
 tests/python/unittest/test_tir_schedule_error.py   |   6 +-
 .../python/unittest/test_tir_schedule_reduction.py | 175 +++--
 .../unittest/test_tir_schedule_split_fuse.py   |  55 ---
 tests/python/unittest/test_tir_schedule_state.py   |   8 +-
 .../test_tir_schedule_state_cached_flags.py|  32 ++--
 tests/python/unittest/test_tir_schedule_trace.py   |   4 +-
 .../python/unittest/test_tir_schedule_utilities.py |  46 +-
 21 files changed, 625 insertions(+), 295 deletions(-)
 create mode 100644 python/tvm/tir/schedule/testing.py
 create mode 100644 src/tir/schedule/traced_schedule.cc
 create mode 100644 src/tir/schedule/traced_schedule.h


[tvm] branch main updated (d4d4e89 -> 208a537)

2021-08-07 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from d4d4e89  [Contrib] Support fp16 input in cpu sort (#8672)
 add 208a537  [Refactor] Rename .asnumpy() to .numpy() (#8659)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/onnx.py   | 12 ++--
 python/tvm/relay/frontend/tensorflow2_ops.py|  2 +-
 python/tvm/relay/testing/tf.py  |  4 ++--
 tests/micro/zephyr/test_zephyr.py   |  4 ++--
 tests/python/frontend/mxnet/test_forward.py |  8 +++-
 tests/python/frontend/tensorflow2/common.py |  2 +-
 tests/python/relay/test_op_level10.py   |  2 +-
 tests/python/relay/test_prng.py |  2 +-
 tests/python/relay/test_to_mixed_precision.py   |  4 ++--
 tests/python/topi/python/test_topi_loss.py  |  2 +-
 tests/python/topi/python/test_topi_prng.py  |  4 ++--
 tests/python/topi/python/test_topi_relu.py  |  2 +-
 tests/python/topi/python/test_topi_transform.py |  4 ++--
 web/tests/python/webgpu_rpc_test.py |  4 ++--
 web/tests/python/websock_rpc_test.py|  2 +-
 15 files changed, 28 insertions(+), 30 deletions(-)


[tvm] branch main updated (a729787 -> d4d4e89)

2021-08-07 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from a729787  [microTVM] Project API infrastructure (#8380)
 add d4d4e89  [Contrib] Support fp16 input in cpu sort (#8672)

No new revisions were added by this update.

Summary of changes:
 src/runtime/contrib/sort/sort.cc | 71 ++--
 tests/python/relay/test_op_level6.py | 42 +++--
 web/Makefile |  2 +-
 3 files changed, 83 insertions(+), 32 deletions(-)


[tvm] branch main updated (5140d90 -> 4b9d43e)

2021-08-03 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 5140d90  Parametrize ONNX Unit tests (#8621)
 add 4b9d43e  [Refactor] Avoid Override Generic Op Strategy in "hls.py" (#8614)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/strategy/hls.py | 4 ++--
 tests/python/relay/test_any.py  | 1 -
 tests/python/relay/test_pass_alter_op_layout.py | 1 -
 3 files changed, 2 insertions(+), 4 deletions(-)


[tvm] branch main updated (9f29e2a -> 7653972)

2021-08-01 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 9f29e2a  [BUILD] Add caching to CMake (#8373)
 add 7653972  [Meta Schedule][M3a] Instruction and Trace (#8615)

No new revisions were added by this update.

Summary of changes:
 include/tvm/tir/schedule/instruction.h | 288 +++
 include/tvm/tir/schedule/schedule.h|  19 +-
 include/tvm/tir/schedule/state.h   |   8 -
 include/tvm/tir/schedule/trace.h   | 164 +++
 python/tvm/tir/schedule/__init__.py|   4 +-
 .../schedule/{_ffi_api_schedule.py => _ffi_api.py} |   0
 python/tvm/tir/schedule/block_scope.py |  14 +-
 python/tvm/tir/schedule/instruction.py | 166 +++
 python/tvm/tir/schedule/schedule.py| 130 +++--
 python/tvm/tir/schedule/state.py   |  26 +-
 python/tvm/tir/schedule/trace.py   | 260 ++
 src/tir/schedule/analysis.h|  16 -
 src/tir/schedule/analysis/analysis.cc  |  34 --
 src/tir/schedule/concrete_schedule.h   |  22 +-
 src/tir/schedule/instruction.cc| 102 
 src/tir/schedule/instruction_traits.h  | 536 +
 src/tir/schedule/primitive.h   |  36 +-
 src/tir/schedule/primitive/compute_inline.cc   |  51 ++
 src/tir/schedule/primitive/get_block_loop.cc   | 113 +
 src/tir/schedule/primitive/loop_transformation.cc  |  74 +++
 src/tir/schedule/primitive/reduction.cc|  29 ++
 src/tir/schedule/schedule.cc   |  38 +-
 src/tir/schedule/state.cc  |  17 +-
 src/tir/schedule/trace.cc  | 533 
 src/tir/schedule/utils.h   |   4 +
 .../unittest/test_tir_schedule_block_scope.py  |   7 +-
 .../unittest/test_tir_schedule_compute_inline.py   |  20 +-
 tests/python/unittest/test_tir_schedule_error.py   |   7 +-
 .../unittest/test_tir_schedule_instruction.py  |  68 +++
 .../python/unittest/test_tir_schedule_reduction.py |   5 +-
 .../unittest/test_tir_schedule_split_fuse.py   |   4 +-
 tests/python/unittest/test_tir_schedule_state.py   |  17 +-
 .../test_tir_schedule_state_cached_flags.py|  19 +-
 tests/python/unittest/test_tir_schedule_trace.py   | 241 +
 .../python/unittest/test_tir_schedule_utilities.py |   9 +-
 35 files changed, 2836 insertions(+), 245 deletions(-)
 create mode 100644 include/tvm/tir/schedule/instruction.h
 create mode 100644 include/tvm/tir/schedule/trace.h
 rename python/tvm/tir/schedule/{_ffi_api_schedule.py => _ffi_api.py} (100%)
 create mode 100644 python/tvm/tir/schedule/instruction.py
 create mode 100644 python/tvm/tir/schedule/trace.py
 create mode 100644 src/tir/schedule/instruction.cc
 create mode 100644 src/tir/schedule/instruction_traits.h
 create mode 100644 src/tir/schedule/primitive/get_block_loop.cc
 create mode 100644 src/tir/schedule/trace.cc
 create mode 100644 tests/python/unittest/test_tir_schedule_instruction.py
 create mode 100644 tests/python/unittest/test_tir_schedule_trace.py


[tvm] branch main updated (e39204e -> 66b3cc9)

2021-07-29 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from e39204e  [Bugfix] Preserve IRModule type definition and imports in NameMangleExtFuncs (#8523)
 add 66b3cc9  [TIR] cast disparate floating point types for binary ops (#8517)

No new revisions were added by this update.

Summary of changes:
 src/tir/op/op.cc   | 30 ++--
 tests/python/unittest/test_tir_base.py | 65 +-
 tests/python/unittest/test_tir_ops.py  | 19 +++---
 3 files changed, 89 insertions(+), 25 deletions(-)


[tvm] branch main updated (850abb0 -> e39204e)

2021-07-29 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 850abb0  [TOPI] Add transpose_a/b & dynamic shape support for batch matmul (#8527)
 add e39204e  [Bugfix] Preserve IRModule type definition and imports in NameMangleExtFuncs (#8523)

No new revisions were added by this update.

Summary of changes:
 src/relay/transforms/partition_graph.cc |  2 +-
 tests/python/relay/test_pass_partition_graph.py | 29 +
 2 files changed, 30 insertions(+), 1 deletion(-)


[tvm] branch main updated (cb395ff -> 850abb0)

2021-07-29 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from cb395ff  Disable pip cache when creating Docker images (#8575)
 add 850abb0  [TOPI] Add transpose_a/b & dynamic shape support for batch matmul (#8527)

No new revisions were added by this update.

Summary of changes:
 include/tvm/relay/attrs/nn.h   |  14 +-
 python/tvm/relay/frontend/tensorflow.py|  23 +++-
 python/tvm/relay/frontend/tensorflow_ops.py|  15 ++-
 python/tvm/relay/op/_tensor_grad.py|  56 +++-
 python/tvm/relay/op/nn/_nn.py  |  24 ++--
 python/tvm/relay/op/nn/nn.py   |  26 ++--
 python/tvm/relay/op/op_attrs.py|   5 +
 python/tvm/relay/op/strategy/cuda.py   |  15 ++-
 python/tvm/relay/op/strategy/generic.py|   5 +-
 python/tvm/topi/cuda/batch_matmul.py   | 115 ++--
 python/tvm/topi/cuda/tensorcore_alter_op.py|  11 +-
 python/tvm/topi/nn/batch_matmul.py | 134 --
 python/tvm/topi/testing/batch_matmul.py|  17 ++-
 python/tvm/topi/x86/batch_matmul.py| 149 -
 src/relay/op/make_op.h |   2 +-
 src/relay/op/nn/nn.cc  |  26 ++--
 src/relay/op/nn/nn.h   |  47 +++
 src/relay/qnn/op/batch_matmul.cc   |  10 +-
 .../transforms/combine_parallel_batch_matmul.cc|  13 +-
 src/relay/transforms/combine_parallel_dense.cc |   3 +-
 tests/python/frontend/tensorflow/test_forward.py   |  47 +--
 tests/python/relay/test_any.py |  92 +
 tests/python/relay/test_op_grad_level10.py |  20 ++-
 tests/python/relay/test_op_level10.py  |  81 ++-
 24 files changed, 673 insertions(+), 277 deletions(-)


[tvm-rfcs] branch comaniac-patch-1 updated (8d4be1a -> 4cde53e)

2021-07-27 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git.


from 8d4be1a  Update README.md
 add 4cde53e  Update README.md

No new revisions were added by this update.

Summary of changes:
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


[tvm-rfcs] branch comaniac-patch-1 updated (5e0e01c -> 8d4be1a)

2021-07-27 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git.


from 5e0e01c  Update the guideline of RFC tracking issues
 add 8d4be1a  Update README.md

No new revisions were added by this update.

Summary of changes:
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


[tvm-rfcs] branch comaniac-patch-1 created (now 5e0e01c)

2021-07-27 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch comaniac-patch-1
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git.


  at 5e0e01c  Update the guideline of RFC tracking issues

No new revisions were added by this update.


[tvm] branch main updated (ee207fd -> a492db8)

2021-07-26 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from ee207fd  [RPC] Add explicit type cast to print. (#8524)
 add a492db8  [Bugfix] Visit each input param of the function in ExprVisitor visit_function (#8521)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/expr_functor.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)


[tvm] branch main updated (e664ef0 -> 8ab2074)

2021-07-25 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from e664ef0  [PRINTER] Fix the repeatitive cast in scripr printing (#8531)
 add 8ab2074  [Frontend, Tensorflow2] Added support for TensorList ops (#8454)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/tensorflow2.py   | 206 -
 python/tvm/relay/frontend/tensorflow2_ops.py   | 179 ++
 python/tvm/relay/frontend/tensorflow_ops.py|  12 ++
 .../frontend/tensorflow2/test_functional_models.py | 136 ++
 .../frontend/tensorflow2/test_sequential_models.py |  55 ++
 5 files changed, 583 insertions(+), 5 deletions(-)
 create mode 100644 python/tvm/relay/frontend/tensorflow2_ops.py


[tvm] branch main updated (eacc2cb -> e8c7f67)

2021-07-21 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from eacc2cb  [TIR] Bugfix for zero number arguments tir functions. (#8515)
 add e8c7f67  [Relay] Fix bug in test_op_level3 (#8508)

No new revisions were added by this update.

Summary of changes:
 tests/python/relay/test_op_level3.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[tvm] branch main updated: Fix dynamic batching when use_implicit_batch=False (#8461)

2021-07-17 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new d3db5d6  Fix dynamic batching when use_implicit_batch=False (#8461)
d3db5d6 is described below

commit d3db5d65c9aefacda128756c15c7ec3f0a7b91ea
Author: Trevor Morris 
AuthorDate: Sat Jul 17 14:47:25 2021 -0700

Fix dynamic batching when use_implicit_batch=False (#8461)
---
 src/runtime/contrib/tensorrt/tensorrt_builder.cc | 13 +-
 src/runtime/contrib/tensorrt/tensorrt_runtime.cc |  8 +++-
 tests/python/contrib/test_tensorrt.py| 54 
 3 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/src/runtime/contrib/tensorrt/tensorrt_builder.cc b/src/runtime/contrib/tensorrt/tensorrt_builder.cc
index d8182b0..08ac2ae 100644
--- a/src/runtime/contrib/tensorrt/tensorrt_builder.cc
+++ b/src/runtime/contrib/tensorrt/tensorrt_builder.cc
@@ -163,10 +163,19 @@ TensorRTEngineAndContext TensorRTBuilder::BuildEngine() {
     auto profile = builder_->createOptimizationProfile();
     for (int i = 0; i < network_->getNbInputs(); ++i) {
       auto name = network_->getInput(i)->getName();
-      auto dims = network_->getInput(i)->getDimensions();
-      profile->setDimensions(name, nvinfer1::OptProfileSelector::kMIN, dims);
+      const uint32_t entry_id = entry_id_map_[name];
+      std::vector<int64_t> shape(data_entry_[entry_id]->shape,
+                                 data_entry_[entry_id]->shape + data_entry_[entry_id]->ndim);
+      auto dims = VectorToTrtDims(shape);
+
       profile->setDimensions(name, nvinfer1::OptProfileSelector::kOPT, dims);
       profile->setDimensions(name, nvinfer1::OptProfileSelector::kMAX, dims);
+      // Set minimum batch size to 1 when dynamic batching is used.
+      if (network_->getInput(i)->getDimensions().nbDims >= 1 &&
+          network_->getInput(i)->getDimensions().d[0] == -1) {
+        dims.d[0] = 1;
+      }
+      profile->setDimensions(name, nvinfer1::OptProfileSelector::kMIN, dims);
     }
     config_->addOptimizationProfile(profile);
   }
diff --git a/src/runtime/contrib/tensorrt/tensorrt_runtime.cc b/src/runtime/contrib/tensorrt/tensorrt_runtime.cc
index 6358e59..5562f85 100644
--- a/src/runtime/contrib/tensorrt/tensorrt_runtime.cc
+++ b/src/runtime/contrib/tensorrt/tensorrt_runtime.cc
@@ -140,6 +140,12 @@ class TensorRTRuntime : public JSONRuntimeBase {
       const std::string name = nodes_[nid].GetOpName() + "_" + std::to_string(j);
       int binding_index = engine->getBindingIndex(name.c_str());
       ICHECK_NE(binding_index, -1);
+      if (!use_implicit_batch_) {
+        std::vector<int64_t> shape(data_entry_[eid]->shape,
+                                   data_entry_[eid]->shape + data_entry_[eid]->ndim);
+        auto dims = VectorToTrtDims(shape);
+        ICHECK(context->setBindingDimensions(binding_index, dims));
+      }
       if (data_entry_[eid]->device.device_type == kDLCUDA) {
         bindings[binding_index] = data_entry_[eid]->data;
       } else {
@@ -300,7 +306,7 @@ class TensorRTRuntime : public JSONRuntimeBase {
     helper.DeclareField("inputs", &engine_and_context.inputs);
     helper.DeclareField("outputs", &engine_and_context.outputs);
     helper.ReadAllFields();
-    const int batch_size = 1;
+    const int batch_size = GetBatchSize();
     trt_engine_cache_[std::make_pair(symbol_name_, batch_size)] = engine_and_context;
     return true;
   }
diff --git a/tests/python/contrib/test_tensorrt.py b/tests/python/contrib/test_tensorrt.py
index 59f1c3a..3f57df5 100644
--- a/tests/python/contrib/test_tensorrt.py
+++ b/tests/python/contrib/test_tensorrt.py
@@ -1251,33 +1251,35 @@ def test_tensorrt_dynamic_batch_conv():
     x_data = np.ones([max(batches_to_test)] + list(x_shape)[1:]).astype("float32")
     k_shape = (16, 32, 3, 3)
     params = {"kernel": np.random.uniform(-1, 1, k_shape).astype("float32")}
-    result_arr = [{"cuda": {}, "llvm": {}} for _ in range(len(batches_to_test))]
-    for use_trt in [True, False]:
-        x = relay.var("x", shape=x_shape, dtype="float32")
-        kernel = relay.var("kernel", shape=k_shape, dtype="float32")
-        out = relay.nn.conv2d(x, kernel, channels=16, kernel_size=(3, 3), groups=1)
-        f = relay.Function([x, kernel], out)
-        mod = tvm.IRModule()
-        mod["main"] = f
-        if use_trt:
-            mod, _ = tensorrt.partition_for_tensorrt(mod, params)
-
+    for use_implicit_batch in [True, False]:
+        result_arr = [{"cuda": {}, "llvm": {}} for _ in range(len(batches_to_test))]
+        for use_trt in [True, False]:
+

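The core of the builder change above: the kOPT/kMAX profile dimensions are taken from the concrete tensor bound at build time, while kMIN drops a dynamic (-1) batch dimension to 1. A Python-only sketch of that shape logic, with illustrative function and variable names that are not part of the patch:

    def profile_dims(declared_shape, bound_shape):
        """Return (min, opt, max) dims for one network input."""
        opt_dims = list(bound_shape)  # kOPT: the shape seen at build time
        max_dims = list(bound_shape)  # kMAX: same concrete shape
        min_dims = list(bound_shape)
        if len(declared_shape) >= 1 and declared_shape[0] == -1:
            min_dims[0] = 1           # kMIN: allow any batch size down to 1
        return min_dims, opt_dims, max_dims

    # A conv input declared with a dynamic batch, bound to batch 8:
    assert profile_dims([-1, 32, 224, 224], [8, 32, 224, 224]) == (
        [1, 32, 224, 224],
        [8, 32, 224, 224],
        [8, 32, 224, 224],
    )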
[tvm] branch main updated (cba9cf3 -> c8b9900)

2021-07-16 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from cba9cf3  [VM] Fix the shape function of conv nhwc (#8480)
 add c8b9900  [BYOC] add multi functions support in partition pass (#8464)

No new revisions were added by this update.

Summary of changes:
 src/relay/analysis/annotated_region_set.cc |  18 ++-
 src/relay/analysis/annotated_region_set.h  |  11 +-
 src/relay/transforms/partition_graph.cc|  11 +-
 .../contrib/test_bnns/test_conv2d_patterns.py  |   6 +-
 tests/python/contrib/test_ethosn/test_networks.py  |   4 +
 .../contrib/test_vitis_ai/test_vitis_ai_codegen.py |   4 +-
 tests/python/relay/test_pass_partition_graph.py| 145 -
 7 files changed, 152 insertions(+), 47 deletions(-)


[tvm] branch main updated (8a8c9b2 -> bd88ee2)

2021-07-15 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 8a8c9b2  [AMP] Add default op attribute registration to __init__.py (#8460)
 add bd88ee2  Fix auto-scheduling after 9c6658721 (#8478)

No new revisions were added by this update.

Summary of changes:
 python/tvm/auto_scheduler/task_scheduler.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[tvm] branch main updated: [CUDA] Improve injective schedule to enable half2 (#8457)

2021-07-13 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 5c1a1cf  [CUDA] Improve injective schedule to enable half2 (#8457)
5c1a1cf is described below

commit 5c1a1cf7289b439b0042a85b63b0007dc1d9b98a
Author: Cody Yu 
AuthorDate: Tue Jul 13 17:57:19 2021 -0700

[CUDA] Improve injective schedule to enable half2 (#8457)

* [CUDA] Improve injective schedule to enable half2

* lint

* fix

* trigger ci
---
 python/tvm/topi/cuda/injective.py | 36 +---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/python/tvm/topi/cuda/injective.py b/python/tvm/topi/cuda/injective.py
index cce56b7..0faddc3 100644
--- a/python/tvm/topi/cuda/injective.py
+++ b/python/tvm/topi/cuda/injective.py
@@ -16,6 +16,8 @@
 # under the License.
 # pylint: disable=invalid-name, unused-variable,
 """Schedule for composition of injective operator"""
+import numpy as np
+
 import tvm
 from tvm import te
 from .. import utils
@@ -36,13 +38,21 @@ def schedule_injective_from_existing(sch, out):
     sch: Schedule
         The updated schedule.
     """
+
+    def find_nearest_small_factor(num, target):
+        """Find the nearest factor of the given number that is smaller than the target."""
+        for i in range(target, 0, -1):
+            if num % i == 0:
+                return i
+        # Unreachable because i=1 must hold.
+        return -1
+
     fused = sch[out].fuse(*sch[out].op.axis)
     num_thread = tvm.target.Target.current(allow_none=False).max_num_threads
     max_block = 256
 
-    # vectorize on fp16 data type. This allows to better utilize the memory
-    # bandwidth.
-    vector_width = 4 if out.dtype == "float16" else 1
+    # Vectorize on fp16 data type to enable half2 for better memory bandwidth utilization.
+    vector_width = 2 if out.dtype == "float16" else 1
 
     is_dynamic_output = False
     for dim in out.shape:
@@ -54,6 +64,26 @@ def schedule_injective_from_existing(sch, out):
 
     try:
         const_size = utils.get_const_int(out_len)
+
+        # Adjust block and thread to make sure they are dividable so that vectorize can be
+        # correctly applied.
+        if vector_width > 1 and const_size % vector_width == 0:
+            remain_total_size = const_size // vector_width
+            cand_sizes = []
+            for max_size in [num_thread, max_block]:
+                cand_sizes.append(
+                    max_size
+                    if remain_total_size % max_size == 0
+                    else find_nearest_small_factor(remain_total_size, max_size)
+                )
+                remain_total_size //= cand_sizes[-1]
+
+            # If the product of candidate dividable (block * thread) is too small,
+            # then the performance may be worse even half2 is enabled. Note that 0.7
+            # is just a heuristic ratio and may not be optimal for all workloads.
+            if np.prod(cand_sizes) / (max_block * num_thread) >= 0.7:
+                num_thread, max_block = cand_sizes
+
         need_block_split = const_size > max_block * num_thread * vector_width
     except ValueError:
         need_block_split = False

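To make the heuristic above concrete, a standalone worked example (the numbers are chosen for illustration only) of how the thread/block extents are adjusted so that vectorization divides the workload evenly:

    def find_nearest_small_factor(num, target):
        # Same helper as in the patch: largest factor of num that is <= target.
        for i in range(target, 0, -1):
            if num % i == 0:
                return i
        return -1

    const_size = 1000000   # fp16 output elements
    vector_width = 2       # half2
    num_thread, max_block = 1024, 256

    remain = const_size // vector_width                     # 500000
    thread = find_nearest_small_factor(remain, num_thread)  # 1000 (1024 does not divide 500000)
    remain //= thread                                       # 500
    block = find_nearest_small_factor(remain, max_block)    # 250

    ratio = (thread * block) / (num_thread * max_block)     # ~0.95
    print(thread, block, ratio)  # 0.95 >= 0.7, so the adjusted sizes are kept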

[tvm] branch main updated (d67514b -> 136f218)

2021-07-13 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from d67514b  [PROFILING] Use PAPI to collect hardware performance counters on CPU and CUDA (#7983)
 add 136f218  [Relay][ONNX] Batch_matmul to dense optimization (#8440)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/frontend/onnx.py  | 27 ++-
 tests/python/frontend/onnx/test_forward.py |  7 ---
 2 files changed, 22 insertions(+), 12 deletions(-)


[tvm] branch main updated: fix wrong log of tir pass VerifyMemory (#8445)

2021-07-11 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 3a9a388  fix wrong log of tir pass VerifyMemory (#8445)
3a9a388 is described below

commit 3a9a388229d701007cbefe96e9625ecd237a45c6
Author: Sen Yang 
AuthorDate: Mon Jul 12 01:04:27 2021 +0800

fix wrong log of tir pass VerifyMemory (#8445)
---
 src/tir/analysis/verify_memory.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tir/analysis/verify_memory.cc b/src/tir/analysis/verify_memory.cc
index 3c29e4e..2089ead 100644
--- a/src/tir/analysis/verify_memory.cc
+++ b/src/tir/analysis/verify_memory.cc
@@ -170,7 +170,7 @@ class MemoryAccessVerifier final : protected StmtExprVisitor {
 /// Interface of VerifyMemory pass
 std::vector<String> VerifyMemory_(const PrimFunc& func) {
   auto target = func->GetAttr<Target>(tvm::attr::kTarget);
-  ICHECK(target.defined()) << "LowerWarpMemory: Require the target attribute";
+  ICHECK(target.defined()) << "VerifyMemory: Require the target attribute";
 
   if (func->GetAttr<Integer>(tvm::attr::kCallingConv, Integer(CallingConv::kDefault)) ==
       CallingConv::kDefault) {


[tvm] branch main updated: Replace RuntimeError in _lookup_task with deferred error. (#8421)

2021-07-09 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 6141cac  Replace RuntimeError in _lookup_task with deferred error. (#8421)
6141cac is described below

commit 6141cac635fbdaad25b0f8ec3bce130e787922b5
Author: Matt Welsh (OctoML) <63477620+mdw-oct...@users.noreply.github.com>
AuthorDate: Fri Jul 9 14:56:34 2021 -0400

Replace RuntimeError in _lookup_task with deferred error. (#8421)

* Replace RuntimeError in _lookup_task with deferred error.

This allows unknown tasks to be created (e.g., when parsing
autotvm log files) but not invoked.

* Format.

* Update python/tvm/autotvm/task/task.py

Co-authored-by: Cody Yu 

Co-authored-by: Matt Welsh 
Co-authored-by: Cody Yu 
---
 python/tvm/autotvm/task/task.py | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/python/tvm/autotvm/task/task.py b/python/tvm/autotvm/task/task.py
index 3097c29..ee17508 100644
--- a/python/tvm/autotvm/task/task.py
+++ b/python/tvm/autotvm/task/task.py
@@ -40,11 +40,11 @@ from .space import ConfigSpace
 def _lookup_task(name):
     task = TASK_TABLE.get(name)
     if task is None:
-        raise RuntimeError(
-            f"Could not find a registered function for the task {name}. It is "
-            "possible that the function is registered in a python file which was "
-            "not imported in this run."
-        )
+        # Unable to find the given task. This might be because we are
+        # creating a task based on a name that has not been imported.
+        # Rather than raising an exception here, we return a dummy
+        # task which cannot be invoked.
+        task = MissingTask(name)
     return task
 
 
@@ -264,6 +264,25 @@ class TaskTemplate(object):
         return inputs
 
 
+class MissingTask(TaskTemplate):
+    """
+    Dummy task template for a task lookup which cannot be resolved.
+    This can occur if the task being requested from _lookup_task()
+    has not been imported in this run.
+    """
+
+    def __init__(self, taskname: str):
+        super().__init__()
+        self._taskname = taskname
+
+    def __call__(self, *args, **kwargs):
+        raise RuntimeError(
+            f"Attempting to invoke a missing task {self._taskname}."
+            "It is possible that the function is registered in a "
+            "Python module that is not imported in this run, or the log is out-of-date."
+        )
+
+
 def _register_task_compute(name, func=None):
     """Register compute function to autotvm task
 

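The deferred-error pattern in this patch generalizes: lookups always succeed (so autotvm log parsing can proceed past unknown task names), and the failure only surfaces if the unresolved entry is actually invoked. A self-contained sketch of the idea, with illustrative names that are not the autotvm API:

    TASKS = {}  # name -> callable

    class MissingTask:
        def __init__(self, name):
            self._name = name

        def __call__(self, *args, **kwargs):
            raise RuntimeError(f"Attempting to invoke a missing task {self._name}.")

    def lookup_task(name):
        task = TASKS.get(name)
        if task is None:
            task = MissingTask(name)  # defer the error until invocation
        return task

    task = lookup_task("conv2d_nchw.cuda")  # succeeds even if never registered
    try:
        task()                              # only calling it raises
    except RuntimeError as err:
        print(err)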

[tvm] branch main updated (b803bab -> c8f54f9)

2021-06-29 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from b803bab  Decoupling AOT from graph memory planner (#8096)
 add c8f54f9  [Bugfix, CuDNN] fix segfault when cudnnDestroy called with destroyed cuda context (#8267)

No new revisions were added by this update.

Summary of changes:
 src/runtime/contrib/cudnn/cudnn_utils.cc | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)


[tvm] branch main updated (b71b837 -> 4ff5cef)

2021-06-27 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from b71b837  Remove an extra print from the relay astext tests (#8342)
 add 4ff5cef  ffi: add missing binding for FixedPointMultiplyAttrs (#8353)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/op_attrs.py | 5 +
 1 file changed, 5 insertions(+)


[tvm] branch main updated (53e4c60 -> 1f2ca06)

2021-06-09 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 53e4c60  [DOC] Improve "Getting Started with TVM" tutorials and fix warnings (#8221)
 add 1f2ca06  Expose list of PassContext configurations to the Python APIs (#8212)

No new revisions were added by this update.

Summary of changes:
 include/tvm/ir/transform.h   |  6 ++
 python/tvm/ir/transform.py   |  5 +
 src/ir/transform.cc  | 14 ++
 tests/cpp/relay_transform_sequential_test.cc |  7 +++
 tests/python/relay/test_pass_instrument.py   |  7 +++
 5 files changed, 39 insertions(+)
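A hedged usage sketch of the Python API this change exposes; the method name follows the PR, but the exact return layout may vary across TVM versions:

    import tvm

    # List every registered PassContext configuration key with its type info.
    for name, info in tvm.transform.PassContext.list_configs().items():
        print(name, info)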


  1   2   3   >