Re: [PR] [3rdparty] AUTO mode for custom all-reduce strategy [tvm]

2024-03-26 Thread via GitHub


yongwww merged PR #16797:
URL: https://github.com/apache/tvm/pull/16797


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated: [3rdparty] AUTO mode for custom all-reduce strategy (#16797)

2024-03-26 Thread yongwww
This is an automated email from the ASF dual-hosted git repository.

yongwww pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 2f889774ec [3rdparty] AUTO mode for custom all-reduce strategy (#16797)
2f889774ec is described below

commit 2f889774ec10b56ebfac89f78698e06eb200db46
Author: Ruihang Lai 
AuthorDate: Wed Mar 27 01:30:09 2024 -0400

[3rdparty] AUTO mode for custom all-reduce strategy (#16797)

This PR adds automatic mode selection for the customized all-reduce
kernels, following the approach used in TensorRT-LLM.

Meanwhile, this PR fixes a bug that may cause the customized all-reduce
kernels to hang forever. Prior to this PR, each worker resets its
barrier values to 0 *after using all-gather to exchange the
barrier handles*. Afterwards, the customized all-reduce kernels
update the barriers of all workers. So it is possible that worker 0
updates worker 1's barrier *before* worker 1 resets its barrier to 0,
which causes the all-reduce kernel to hang forever.

This PR changes the behavior to reset the barriers before the all-gather,
and to force a device synchronization after the reset.
---
 3rdparty/tensorrt_llm/custom_allreduce_kernels.h   | 33 ++
 .../tvm/relax/transform/ipc_allreduce_rewrite.py   |  2 --
 src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc  | 26 +++--
 src/runtime/disco/cuda_ipc/custom_allreduce.cc | 12 ++--
 tests/python/disco/test_custom_allreduce.py|  4 +++
 5 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/3rdparty/tensorrt_llm/custom_allreduce_kernels.h 
b/3rdparty/tensorrt_llm/custom_allreduce_kernels.h
index 7fd66e5d10..7c515a03ac 100644
--- a/3rdparty/tensorrt_llm/custom_allreduce_kernels.h
+++ b/3rdparty/tensorrt_llm/custom_allreduce_kernels.h
@@ -25,8 +25,10 @@ constexpr size_t MAX_RANKS_PER_NODE = 8;
 constexpr size_t DEFAULT_BLOCK_SIZE = 1024;
 
 enum class AllReduceStrategyType : int8_t {
+  RING = 0,
   ONESHOT = 1,
   TWOSHOT = 2,
+  AUTO = 3,
 };
 
 struct AllReduceParams {
@@ -42,6 +44,37 @@ struct AllReduceParams {
   void* local_output_buffer_ptr;
 };
 
+inline size_t GetMaxRequiredWorkspaceSize(int world_size) {
+  if (world_size <= 2) {
+return 16 * 1000 * 1000;
+  }
+  return 8 * 1000 * 1000;
+}
+
+inline AllReduceStrategyType SelectImplementation(size_t message_size, int 
world_size) {
+  const size_t maxWorkspaceSize = GetMaxRequiredWorkspaceSize(world_size);
+
+  if (message_size > maxWorkspaceSize) {
+return AllReduceStrategyType::RING;
+  }
+
+  if (world_size <= 2) {
+return AllReduceStrategyType::ONESHOT;
+  }
+
+  if (world_size <= 4) {
+if (message_size < 1 * 1000 * 1000) {
+  return AllReduceStrategyType::ONESHOT;
+}
+return AllReduceStrategyType::TWOSHOT;
+  }
+
+  if (message_size < 500 * 1000) {
+return AllReduceStrategyType::ONESHOT;
+  }
+  return AllReduceStrategyType::TWOSHOT;
+}
+
 void customAllReduce(AllReduceParams& params, void* data, size_t elts, 
DLDataType dataType,
  AllReduceStrategyType strat, cudaStream_t stream);
 
diff --git a/python/tvm/relax/transform/ipc_allreduce_rewrite.py 
b/python/tvm/relax/transform/ipc_allreduce_rewrite.py
index 3e7b005a60..df40181cb9 100644
--- a/python/tvm/relax/transform/ipc_allreduce_rewrite.py
+++ b/python/tvm/relax/transform/ipc_allreduce_rewrite.py
@@ -40,8 +40,6 @@ class IPCAllReduceRewrite:
 The all-reduce strategy. Only "1" and "2" are supported.
 "1" stands for one-shot, and "2" stands for two-shot.
 """
-if allreduce_strategy not in [1, 2]:
-raise ValueError(f"All-reduce strategy {allreduce_strategy} is not 
supported.")
 self.allreduce_strategy = allreduce_strategy
 
 def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassContext) 
-> IRModule:
diff --git a/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc 
b/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc
index 451c3df0cb..fec5abec86 100644
--- a/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc
+++ b/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc
@@ -91,15 +91,13 @@ class CUDAIPCMemoryAllocator final : public 
memory::PooledAllocator {
  private:
   void* DeviceAllocDataSpace(Device dev, size_t size, size_t alignment,
  DLDataType type_hint) final {
-auto [data_ptr, data_comm_ptrs] = AllocIPCMemory(dev, size, alignment, 
type_hint);
+auto [data_ptr, data_comm_ptrs] =
+AllocIPCMemory(dev, size, alignment, type_hint, 
/*reset_memory_to_zero=*/false);
 int barrier_ptr_size = sizeof(uint32_t) * (MAX_ALL_REDUCE_BLOCKS + 2) * 
MAX_RANKS_PER_NODE;
-auto [barrier_in_ptr, barrier_in_comm_ptrs] =
-AllocIPCMemory(dev, barrier_ptr_size, alignment, DataType::UInt(32));
-auto [barrier_out_ptr, barrier_out_comm_ptrs] =
-AllocIPCMemory(dev, barrier_ptr_size, 

Re: [PR] [TIR] Modify IntImmNode deep_equal to match regardless of type [tvm]

2024-03-26 Thread via GitHub


quic-sanirudh commented on PR #16795:
URL: https://github.com/apache/tvm/pull/16795#issuecomment-2021948537

   > Yeah I think fixing the dtype is a good idea, it would hopefully avoid 
this kind of problem in the future as well. Out of interest, what were the 
mismatching dtypes of the two compared `IntImmNode`s that you observed 
@quic-sanirudh?
   
   Thanks @ekalda. I'll update the PR to fix the dtypes in RampNode (and 
perhaps the broadcast node as well).
   
   The dtypes in my case were `int32` and `int64`. The expression I saw was 
something like this (slightly simpler version)
   `T.Broadcast(c, 128) + T.Ramp(T.int64(0), T.int64(1), T.int64(128))`
   
   The RampNode seems to get the int64 lanes because all the iterators in 
our case are int64 by default, but the broadcast seems to be inserted during the 
[evaluation of AddNode in op.cc 
here](https://github.com/apache/tvm/blob/d43e1ab71d5d9e16bbc962d4d7952dcc7a1cdbca/src/tir/op/op.cc#L126-L139)
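
   For illustration, a minimal sketch (with assumed values) of how the two
dtypes defeat the comparison:

   ```python
   import tvm
   from tvm import tir

   a = tir.IntImm("int32", 128)
   b = tir.IntImm("int64", 128)

   # Both the deep-equality analysis and structural equality currently take the
   # dtype into account, so equal values with different dtypes do not match.
   assert not tvm.tir.analysis.expr_deep_equal(a, b)
   assert not tvm.ir.structural_equal(a, b)
   ```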
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch nightly updated (b2204ae698 -> d43e1ab71d)

2024-03-26 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch nightly
in repository https://gitbox.apache.org/repos/asf/tvm.git


from b2204ae698 [IR] Default to empty attributes, instead of NULL (#16745)
 add 69c091400a [Fix] Fix build errors with VS2022 (#16790)
 add ae7b8d9aed [Codegen, Cuda] Add overload for fp8x4 e5m2 <-> half4 
conversion (#16787)
 add 72f0326a88 [Analysis] Allow calls to GlobalVar in @R.function (#16778)
 add bf2d43e314 [IR][Relax] Improve highlighting in assert_structural_equal 
(#16756)
 add bcfbcabff8 [Bugfix][Cutlass] Remove a typo in cutlass build (#16789)
 add 016b512ad4 [Relax] Refactor PatternRewriter into separate Block/Expr 
mutators (#16730)
 add 8274d142a3 [Relax] Implement operators to inspec DLTensor::strides and 
offset  (#16721)
 add 571fdaf1eb [Web] Add `kv_state` and `rnn_state` to wasm_runtime 
(#16791)
 add 4f3a863c1f [Cutlass] Add check for group gemm param shapes (#16788)
 add ac2f47867f [SME] Add support for inserting processor state annotations 
(#16761)
 add a768ee4900 [Fix] fix for numpy 2.0 compatibility (#16793)
 add d43e1ab71d [Doc] Fix set_axis_separator example (#16792)

No new revisions were added by this update.

Summary of changes:
 include/tvm/relax/analysis.h   |   6 +-
 include/tvm/relax/dataflow_matcher.h   |   4 +-
 include/tvm/relax/expr.h   |  24 +-
 python/tvm/_ffi/runtime_ctypes.py  |   2 +-
 python/tvm/contrib/cutlass/build.py|   2 +-
 python/tvm/relax/analysis/analysis.py  |   8 +-
 python/tvm/relax/expr.py   |  97 
 .../tvm/relax/transform/legalize_ops/__init__.py   |   1 +
 .../tvm/relax/transform/legalize_ops/inspect_op.py | 128 +++
 python/tvm/relay/frontend/paddlepaddle.py  |   2 +-
 python/tvm/relay/frontend/pytorch.py   |   4 +-
 python/tvm/script/parser/core/entry.py |  26 ++-
 python/tvm/tir/schedule/schedule.py|   2 +-
 python/tvm/topi/arm_cpu/pstate_attributes.py   |  84 +++
 src/node/structural_equal.cc   |  45 ++--
 src/relax/analysis/well_formed.cc  |  47 ++--
 src/relax/ir/dataflow_matcher.cc   | 238 ++-
 src/relax/ir/expr.cc   |  50 
 src/relax/op/tensor/inspect.cc | 180 ---
 src/relax/op/tensor/inspect.h  |  39 
 src/runtime/contrib/cutlass/fp8_group_gemm.cu  |   4 +-
 src/runtime/metadata.cc|   3 +-
 src/target/llvm/codegen_aarch64.cc | 102 +
 src/target/source/literal/cuda_half_t.h|  23 +-
 src/tir/analysis/identify_memcpy.cc|   2 +-
 src/tir/contrib/ethosu/passes.cc   |   2 +-
 src/tir/transforms/lower_tvm_builtin.cc|  36 ++-
 .../python/codegen/test_target_codegen_aarch64.py  | 116 +-
 .../contrib/test_msc/test_translate_tensorflow.py  |   2 +-
 tests/python/frontend/pytorch/test_forward.py  |   4 +-
 tests/python/frontend/tensorflow/test_forward.py   |   2 +-
 tests/python/relax/test_analysis_well_formed.py|  34 +++
 tests/python/relax/test_op_inspect.py  | 252 +
 tests/python/relax/test_op_unpack.py   | 127 ---
 tests/python/relax/test_tvmscript_parser.py|  37 +++
 tests/python/relax/test_utils.py   |  63 +-
 tests/python/relay/test_op_level3.py   |   4 +-
 .../test_tir_transform_lower_tvm_builtin.py|  37 ++-
 tests/python/topi/test_topi_math.py|   4 +-
 web/emcc/wasm_runtime.cc   |   2 +
 40 files changed, 1480 insertions(+), 365 deletions(-)
 create mode 100644 python/tvm/relax/transform/legalize_ops/inspect_op.py
 create mode 100644 python/tvm/topi/arm_cpu/pstate_attributes.py
 create mode 100644 src/target/llvm/codegen_aarch64.cc
 create mode 100644 tests/python/relax/test_op_inspect.py
 delete mode 100644 tests/python/relax/test_op_unpack.py



[I] [Bug] [VTA, RPC] Can’t upload custom bit file by RPC on ZCU104 [tvm]

2024-03-26 Thread via GitHub


muonkmu opened a new issue, #16799:
URL: https://github.com/apache/tvm/issues/16799

   I am testing VTA in the following environment.
   
   Target: ZCU104 (PYNQ 2.7)
   Host: Ubuntu 20.04 + TVM (v0.16.dev0)
   Xilinx tools: Vivado 2020.1
   
   I successfully synthesized the “vta.bit” file for ZCU104 and successfully 
launched the RPC server on ZCU104. However, when I try to upload “vta.bit” using 
`vta.program_fpga(remote, bitstream="vta.bit")`, the following error occurs. 
   
   Which versions of TVM and PYNQ are guaranteed to be compatible? Is there a 
solution for this?
   
   ```bash
   Traceback (most recent call last):
 File "Simple_Matrix_Multiply.py", line 24, in 
   vta.program_fpga(remote, bitstream="vta.bit")
 File 
"/home/minwook/Workspace/Study_lab/71_tvm/tvm/vta/python/vta/rpc_client.py", 
line 66, in program_fpga
   fprogram(os.path.basename(bitstream))
 File 
"/home/minwook/Workspace/Study_lab/71_tvm/tvm/python/tvm/_ffi/_ctypes/packed_func.py",
 line 239, in __call__
   raise_last_ffi_error()
 File 
"/home/minwook/Workspace/Study_lab/71_tvm/tvm/python/tvm/_ffi/base.py", line 
481, in raise_last_ffi_error
   raise py_err
   tvm.error.RPCError: Traceback (most recent call last):
 3: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
 2: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int 
const*, int, std::function const&)
 1: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, 
int, std::function)
 0: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, 
std::function)
 File 
"/home/minwook/Workspace/Study_lab/71_tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", 
line 427
   RPCError: Error caught from RPC call:
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [RELAX] Tuning capability for external cuBLAS codegen [tvm]

2024-03-26 Thread via GitHub


vinx13 commented on code in PR #16764:
URL: https://github.com/apache/tvm/pull/16764#discussion_r1539988421


##
src/runtime/contrib/cublas/cublas_json_runtime.cc:
##
@@ -129,14 +132,50 @@ class CublasJSONRuntime : public JSONRuntimeBase {
 
 auto [a_ptr, b_ptr, bias_ptr] = get_inputs(node, epilogue != 
CUBLASLT_EPILOGUE_DEFAULT);
 
+const cublasLtMatmulAlgo_t* predef_algo_ptr = nullptr;
+int64_t dyn_dim_val = 
dl_tensors[std::get<0>(dyn_dim_position)]->shape[std::get<1>(dyn_dim_position)];
+auto algo_desc = algo_collection(dyn_dim_val);
+if (algo_desc.defined())
+  predef_algo_ptr = &algo_desc->algo;
+
 tvm::contrib::CallCublasLt(entry_ptr->handle, stream, 
entry_ptr->matmul_pref_desc, a_ptr,
b_ptr, bias_ptr, out_ptr, transa, transb,
-   entry_ptr->workspace_ptr, 
entry_ptr->workspace_size, epilogue);
+   entry_ptr->workspace_ptr, 
entry_ptr->workspace_size, epilogue,
+   predef_algo_ptr);
   }
 }
   }
 
   void Run() override { LOG(FATAL) << "Unreachable"; }
+
+ protected:
+  void LoadPredefAlgoCollection() {
+for (const auto& node : nodes_) {
+  if (node.GetOpType() == "kernel" && node.HasAttr("predefined_algos")) {
+// Load algo collection
+auto predef_algos_str = 
node.GetAttr<std::vector<std::string>>("predefined_algos");
+ICHECK_EQ(predef_algos_str.size(), 1);
+algo_collection = 
tvm::contrib::AlgoCollection::FromJSON(predef_algos_str[0]);
+
+// Define dynamic dimension position
+for (const auto& ne : node.GetInputs()) {
+  auto shape = nodes_[ne.id_].GetOpShape()[ne.index_];
+  auto found = std::find(shape.begin(), shape.end(), -1);
+  if (found != shape.end()) {
+uint32_t dyn_dim_idx = std::distance(shape.begin(), found);
+uint32_t dyn_dim_eid = EntryID(ne);
+dyn_dim_position = {dyn_dim_eid, dyn_dim_idx};

Review Comment:
   When there are multiple nodes with predefined algos, does this overwrite the 
results of previous iterations?



##
src/relax/backend/contrib/cublas/algo_db.h:
##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \brief Codegen part of tuning capabilities for cublas matmul primitives. 
+ */
+
+#include 
+
+#include "../../../../runtime/contrib/cublas/cublas_algo.h"
+
+namespace tvm {
+namespace relax {
+namespace contrib {
+
+using AlgoCollection = tvm::contrib::AlgoCollection;
+using AlgoDesc = tvm::contrib::AlgoDesc;
+
+/*! \brief Algo database with predefined Algo objects. */
+class AlgoDatabaseNode: public runtime::Object {
+  /*! \brief Mapping of composite func struct hash to algo collection. */
+  std::map collections;
+
+public:
+  void VisitAttrs(tvm::AttrVisitor* v) {
+// v->Visit("collections", );

Review Comment:
   remove this



##
src/runtime/contrib/cublas/cublas_json_runtime.cc:
##
@@ -129,14 +132,50 @@ class CublasJSONRuntime : public JSONRuntimeBase {
 
 auto [a_ptr, b_ptr, bias_ptr] = get_inputs(node, epilogue != 
CUBLASLT_EPILOGUE_DEFAULT);
 
+const cublasLtMatmulAlgo_t* predef_algo_ptr = nullptr;
+int64_t dyn_dim_val = 
dl_tensors[std::get<0>(dyn_dim_position)]->shape[std::get<1>(dyn_dim_position)];
+auto algo_desc = algo_collection(dyn_dim_val);
+if (algo_desc.defined())
+  predef_algo_ptr = &algo_desc->algo;

Review Comment:
   nit
   ```suggestion
   if (algo_desc.defined()) {
 predef_algo_ptr = &algo_desc->algo;
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (a768ee4900 -> d43e1ab71d)

2024-03-26 Thread wuwei
This is an automated email from the ASF dual-hosted git repository.

wuwei pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from a768ee4900 [Fix] fix for numpy 2.0 compatibility (#16793)
 add d43e1ab71d [Doc] Fix set_axis_separator example (#16792)

No new revisions were added by this update.

Summary of changes:
 python/tvm/tir/schedule/schedule.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



Re: [PR] [Doc] Fix set_axis_separator example [tvm]

2024-03-26 Thread via GitHub


vinx13 merged PR #16792:
URL: https://github.com/apache/tvm/pull/16792


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (ac2f47867f -> a768ee4900)

2024-03-26 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from ac2f47867f [SME] Add support for inserting processor state annotations 
(#16761)
 add a768ee4900 [Fix] fix for numpy 2.0 compatibility (#16793)

No new revisions were added by this update.

Summary of changes:
 python/tvm/_ffi/runtime_ctypes.py  | 2 +-
 python/tvm/relay/frontend/paddlepaddle.py  | 2 +-
 python/tvm/relay/frontend/pytorch.py   | 4 ++--
 tests/python/contrib/test_msc/test_translate_tensorflow.py | 2 +-
 tests/python/frontend/pytorch/test_forward.py  | 4 ++--
 tests/python/frontend/tensorflow/test_forward.py   | 2 +-
 tests/python/relay/test_op_level3.py   | 4 +---
 tests/python/topi/test_topi_math.py| 4 +---
 8 files changed, 10 insertions(+), 14 deletions(-)



Re: [PR] [Fix] fix for numpy 2.0 compatibility [tvm]

2024-03-26 Thread via GitHub


tqchen merged PR #16793:
URL: https://github.com/apache/tvm/pull/16793


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features [tvm]

2024-03-26 Thread via GitHub


cbalint13 commented on PR #16425:
URL: https://github.com/apache/tvm/pull/16425#issuecomment-2021120932

   > > Here is a reproducer:
   > > mem_leak.cpp
   > 
   > Thanks a lot for this, I start to look at it now.
   
   @lhutton1 ,
   
   Here is a patch:  
[tvm-llvm-memleak.diff.gz](https://github.com/apache/tvm/files/14762596/tvm-llvm-memleak.diff.gz)
   Can you confirm that it is fine on your side?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SME] Add support for inserting processor state annotations [tvm]

2024-03-26 Thread via GitHub


ekalda merged PR #16761:
URL: https://github.com/apache/tvm/pull/16761


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (4f3a863c1f -> ac2f47867f)

2024-03-26 Thread ekalda
This is an automated email from the ASF dual-hosted git repository.

ekalda pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 4f3a863c1f [Cutlass] Add check for group gemm param shapes (#16788)
 add ac2f47867f [SME] Add support for inserting processor state annotations 
(#16761)

No new revisions were added by this update.

Summary of changes:
 python/tvm/topi/arm_cpu/pstate_attributes.py   |  84 +++
 src/target/llvm/codegen_aarch64.cc | 102 ++
 .../python/codegen/test_target_codegen_aarch64.py  | 116 -
 3 files changed, 300 insertions(+), 2 deletions(-)
 create mode 100644 python/tvm/topi/arm_cpu/pstate_attributes.py
 create mode 100644 src/target/llvm/codegen_aarch64.cc



Re: [PR] [SME] Add support for inserting processor state annotations [tvm]

2024-03-26 Thread via GitHub


ekalda commented on PR #16761:
URL: https://github.com/apache/tvm/pull/16761#issuecomment-2021063350

   Thanks @lhutton1!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [TIR] Modify IntImmNode deep_equal to match regardless of type [tvm]

2024-03-26 Thread via GitHub


ekalda commented on PR #16795:
URL: https://github.com/apache/tvm/pull/16795#issuecomment-2021055007

   Yeah I think fixing the dtype is a good idea, it would hopefully avoid this 
kind of problem in the future as well. Out of interest, what were the 
mismatching dtypes of the two compared `IntImmNode`s that you observed 
@quic-sanirudh? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Improve CanonicalizeBindings in DataflowVar edge case [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16783:
URL: https://github.com/apache/tvm/pull/16783#issuecomment-2021040212

   @tvm-bot rerun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax][Transform] Provide callback versions of LazyTransformParams [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16798:
URL: https://github.com/apache/tvm/pull/16798#issuecomment-2021025223

   This PR is currently marked as a draft, as the unit tests depend on 
functionality introduced in https://github.com/apache/tvm/pull/16642.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [Relax][Transform] Provide callback versions of LazyTransformParams [tvm]

2024-03-26 Thread via GitHub


Lunderberg opened a new pull request, #16798:
URL: https://github.com/apache/tvm/pull/16798

   Prior to this commit, the `LazyTransformParams` function could be used to 
load model parameters on demand.  However, the function used to load or set 
parameters needed to be registered within the global registry of `PackedFunc`s. 
 This PR provides `LazyGetInput` and `LazySetOutput` transforms, which perform 
the lazy loading through an `R.Callable` callback argument, rather than through 
a globally-registered `PackedFunc`.
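
   As a rough, plain-Python illustration of the callback-based flow
(hypothetical helper names; the actual transforms rewrite Relax functions
rather than Python code):

   ```python
   import numpy as np

   def load_tensor(path):
       # Hypothetical stand-in for reading one weight from disk.
       return np.load(path)

   def make_get_param(param_files):
       """Build a callback that loads each parameter on first use and caches it."""
       cache = {}

       def get_param(index, name):
           if name not in cache:
               cache[name] = load_tensor(param_files[name])
           return cache[name]

       return get_param

   # A lazily transformed function receives the callback as an argument and asks
   # for weights on demand, e.g. `w0 = get_param(0, "linear.weight")`, instead of
   # taking the full parameter list up front or consulting a global registry.
   ```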


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SVE] Support scalable vectors in LoopVectorizer [tvm]

2024-03-26 Thread via GitHub


ekalda commented on PR #16782:
URL: https://github.com/apache/tvm/pull/16782#issuecomment-2020983646

   Thank you for your feedback @Lunderberg, much appreciated!
   
   > The implementation looks reasonable, though I have one main question for 
it: What is the behavior of the updated pass for a target that doesn't support 
SVE? Prior SVE-commits enabled the functionality, but didn't produce SVE in any 
of the default lowering passes.
   > 
   > From [this 
line](https://github.com/apache/tvm/pull/16696/files#diff-f61b04b100f5145f2681340c81d3f2af221239594ed01e2e24896522329ce92cR598-R600),
 versions of LLVM before 11.0 do not support SVE, nor from my brief reading of 
the CUDA codegen 
[here](https://github.com/apache/tvm/blob/main/src/target/source/codegen_cuda.cc#L253)
 does CUDA.
   
   When it comes to targets that don't support SVE, I'd expect these targets to 
not trigger the creation of scalable vectors. In the current plan the creation 
of scalable vectors has to be intentional, i.e. it comes from splitting an axis 
by a `vscale`-dependent expression in the (target-dependent) schedules and 
vectorizing the resulting axis. If the `LoopVectorizer` is trying to create 
scalable vectors for a target that doesn't support them, something has gone wrong 
and the compilation will fall over at some point:
   * If by some mistake a schedule that doesn't support VLA programming 
contains `vscale`, it will fall over at the latest in the target-dependent codegen
   * If there is an attempt to vectorize a loop with a non-int extent that doesn't 
contain `vscale`, the "scalable ramp" creation will error out since it expects the 
`PrimExpr lanes` to be in the form `vscale * int`. I realize though that this is a 
weird deviation from the current behaviour of
   ```
 if (!extent_as_int || extent_as_int->value < 1) {
   LOG(FATAL) << "Failed to vectorize loop with extent " << op->extent;
 }
   ```
   so I'll modify the patch such that it checks for a target and fails as 
before if the extent is not an int.
   
   > Since `VectorizeLoop` occurs after the `BindTarget` pass, we can check the 
function attribute to know which target will be executing each function. I 
think we should have the loop vectorization apply only to fixed-extent loops by 
default, but enable the scalable vectorization for targets that support it.
   
   In principle I'm not against making the scalable-vector vectorization 
functionality more explicitly target-specific, but it is not obvious to me what 
that would mean in terms of code. `ICHECK`s for the appropriate targets at the 
places where scalable vectors are created? 
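
   For reference, the "intentional" path described above looks roughly like this 
on the schedule side (a sketch only; the module, block name, and split factor are 
assumed for illustration and are not taken from the patch):

   ```python
   import tvm
   from tvm import tir
   from tvm.script import tir as T

   @T.prim_func
   def add_one(A: T.Buffer((128,), "float32"), B: T.Buffer((128,), "float32")):
       for i in range(128):
           with T.block("compute"):
               v = T.axis.remap("S", [i])
               B[v] = A[v] + T.float32(1.0)

   sch = tir.Schedule(add_one)
   (i,) = sch.get_loops(sch.get_block("compute"))
   _, inner = sch.split(i, factors=[None, 4 * tir.vscale()])
   sch.vectorize(inner)  # the extent of `inner` is `4 * vscale`, i.e. scalable
   ```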
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Cutlass] Add check for group gemm param shapes [tvm]

2024-03-26 Thread via GitHub


tqchen merged PR #16788:
URL: https://github.com/apache/tvm/pull/16788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (571fdaf1eb -> 4f3a863c1f)

2024-03-26 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 571fdaf1eb [Web] Add `kv_state` and `rnn_state` to wasm_runtime 
(#16791)
 add 4f3a863c1f [Cutlass] Add check for group gemm param shapes (#16788)

No new revisions were added by this update.

Summary of changes:
 src/runtime/contrib/cutlass/fp8_group_gemm.cu | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)



Re: [PR] [TIR] Modify IntImmNode deep_equal to match regardless of type [tvm]

2024-03-26 Thread via GitHub


tqchen commented on PR #16795:
URL: https://github.com/apache/tvm/pull/16795#issuecomment-2020842450

   Ah OK, I think in this case we should try to come up with a rule for lanes. 
I think having a fixed dtype probably makes sense; then we handle casts for the 
related cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Web] Add `kv_state` and `rnn_state` to wasm_runtime [tvm]

2024-03-26 Thread via GitHub


tqchen merged PR #16791:
URL: https://github.com/apache/tvm/pull/16791


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (8274d142a3 -> 571fdaf1eb)

2024-03-26 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from 8274d142a3 [Relax] Implement operators to inspec DLTensor::strides and 
offset  (#16721)
 add 571fdaf1eb [Web] Add `kv_state` and `rnn_state` to wasm_runtime 
(#16791)

No new revisions were added by this update.

Summary of changes:
 web/emcc/wasm_runtime.cc | 2 ++
 1 file changed, 2 insertions(+)



Re: [PR] [Relax] Allow R.Prim('bool') in relax::If and assert_op [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16642:
URL: https://github.com/apache/tvm/pull/16642#issuecomment-2020783444

   Rebased onto main to resolve a merge conflict.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Allow modules to define pre-processing of weights [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16785:
URL: https://github.com/apache/tvm/pull/16785#discussion_r1539406497


##
python/tvm/relax/frontend/nn/op.py:
##
@@ -676,12 +676,31 @@ def permute_dims(x: Tensor, axes: Optional[List[int]] = 
None, name: str = None)
 result : Tensor
 The transposed result.
 """
+
+# TODO(Lunderberg): This is a more extensive auto-naming than
+# intended here.  Is this still worth it?

Review Comment:
   Long-term, I want to move this automatic naming from the `nn.Module` side to 
the Relax side, since it could then be performed after removal of trivial 
bindings.  I don't expect these chains to be deep, as it only tracks trivial 
bindings.  The trivial binding from the Relax function parameter to the 
parameter's `param._expr` field should be the only one that would be tracked.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Allow modules to define pre-processing of weights [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16785:
URL: https://github.com/apache/tvm/pull/16785#discussion_r1539402961


##
tests/python/relax/test_frontend_nn_packing.py:
##
@@ -25,7 +25,9 @@ def _iter_binding_names(mod):
 """Helper function to compare the names of relax variables"""
 for block in mod["forward"].body.blocks:
 for binding in block.bindings:
-yield binding.var.name_hint
+# Relax variable names may contain '.' even though it
+# cannot be expressed in TVMScript.

Review Comment:
   I could go either way.  It's nice to have the 1:1 mapping between Relax and 
TVMScript, which would forbid the period within a relax variable name.  
However, it's also nice to have a 1:1 mapping between a Relax function 
parameter and weight tensor's name in a pytorch or safetensor file, which are 
usually written with a period in the name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Allow modules to define pre-processing of weights [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16785:
URL: https://github.com/apache/tvm/pull/16785#discussion_r1539397202


##
python/tvm/relax/frontend/nn/core.py:
##
@@ -591,7 +609,22 @@ def wrap_nested(expr: rx.Expr, name: str) -> Union[Tensor, 
Sequence[Tensor]]:
 The computed result.
 """
 if not isinstance(expr, rx.DataflowVar):
-expr = BlockBuilder.current().emit(expr, name)
+block_builder = BlockBuilder.current()
+if block_builder is None:
+# Normalize to make sure we have valid StructInfo, but
+# wait until we are actually building the function to
+# flatten nested expressions.
+#
+# TODO(Lunderberg): Make this easier to call.  Infering
+# struct info for a nested expression should be doable in
+# a free function, without requiring an active
+# BlockBuilder and an active FunctionFrame.

Review Comment:
   Long-term, I think it would be nice to distinguish between local struct 
inference and non-local struct inference.  The local inference could be applied 
when a relax object is constructed, which would avoid the current two-phase 
initialization of relax objects.  Since this step can only perform local struct 
inference, which would be applied by default, this entire conditional could be 
removed.
   
   There are some kinks that would need to be worked out first.  Some of the 
struct inference for tensor operations currently throws errors a bit more often 
than I think it should.  (e.g. `R.matmul` throws an exception if the arguments 
are not `R.Tensor`; the exception is still thrown when the arguments are 
`R.Object`, even though `R.Tensor` is a subtype of `R.Object`.)  These fallbacks 
would probably get more exercise with local inference, as there may be less 
information available.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Allow modules to define pre-processing of weights [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16785:
URL: https://github.com/apache/tvm/pull/16785#discussion_r1539399718


##
python/tvm/relax/frontend/nn/exporter.py:
##
@@ -190,34 +207,64 @@ def _convert_input(arg):
 def _params(mode: str) -> typing.List[rx.Var]:
 inputs: typing.List[rx.Var] = []
 
-def _get_var(shape_var: tir.Var) -> tir.Var:
-name = shape_var.name
-if name in str2var_params:
-return str2var_params[name]
-var = tir.Var(name, "int64")
-str2var_params[name] = var
-return var
+def _normalize_dim(dim: typing.Union[int, str, tir.Var]) -> 
tir.PrimExpr:
+if isinstance(dim, int):
+return tir.IntImm("int64", dim)
+elif isinstance(dim, str):
+if dim in str2var_params:
+return str2var_params[dim]
+else:
+new_var = tir.Var(dim, "int64")
+str2var_params[dim] = new_var
+return new_var
+elif isinstance(dim, tir.Var):
+return dim
+else:
+raise TypeError(
+f"Expected dim to be int, str, or tir.Var, "
+f"but {dim} was of type {type(dim)}."
+)
 
 for name, param in params:
 # Make sure the a symbolic shape is not re-registered (same as 
_method_spec_to_inputs)
 # e.g. we do not see `vocab_size` for `lm_head` and `vocab_size_1` 
for `embed_tokens`
-new_shape = [_get_var(x) if isinstance(x, tir.Var) else x for x in 
param.shape]
-var = core.Tensor.placeholder(new_shape, param.dtype, name)._expr
+new_shape = [_normalize_dim(dim) for dim in param._shape]
+# var_cls = rx.DataflowVar if mode == "packed" else rx.Var

Review Comment:
   Whoops, that was a test during dev work.  Removing the commented-out 
`var_cls` line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Allow modules to define pre-processing of weights [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16785:
URL: https://github.com/apache/tvm/pull/16785#discussion_r1539376002


##
python/tvm/relax/frontend/nn/exporter.py:
##
@@ -135,9 +136,18 @@ def _effects() -> typing.List[typing.Tuple[str, 
core.Effect]]:
 with self.builder.dataflow():
 outputs, inputs = _emit_method(self.builder, 
method_spec, params, effects)
 self.builder.emit_func_output(outputs, inputs)
+
+# TODO(Lunderberg): Make a `ir.transform.ConvertSSA`,
+# similar to the existing `tir.transform.ConvertSSA`,
+# that converts an entire module to SSA, including TIR
+# variable definitions used in either TIR or Relax.

Review Comment:
   Both Relax and TIR require SSA to be well-formed.  However, there's a number 
of cases where a module could be unambiguously converted to SSA.  (e.g. Two 
functions use the same `relax.Var` as a parameter, which can be fixed by 
substituting a new variable in one of the functions.)
   
   So, it wouldn't be a pass that would be called directly by end users, but 
would be for internal use.  If a pass is most easily written in a way that 
results in the same symbolic variable occurring in multiple different 
functions, then this would be used as a post-processing pass. (e.g. Apply 
`BindSymbolicVars` to one variable in a function, then save the result as a new 
function in the same IRModule.  Useful, but would duplicate all other symbolic 
variables.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SVE] Support scalable vectors in LoopVectorizer [tvm]

2024-03-26 Thread via GitHub


lhutton1 commented on code in PR #16782:
URL: https://github.com/apache/tvm/pull/16782#discussion_r1539232691


##
src/tir/ir/expr.cc:
##
@@ -196,7 +196,9 @@ TVM_REGISTER_NODE_TYPE(StringImmNode);
 // Cast
 Cast::Cast(DataType t, PrimExpr value, Span span) {
   ICHECK(value.defined());
-  ICHECK_EQ(t.lanes(), value.dtype().lanes());
+  ICHECK_EQ(t.get_lanes_or_vscale_factor(), 
value.dtype().get_lanes_or_vscale_factor());
+  ICHECK((t.is_scalable_vector() == value.dtype().is_scalable_vector()) ||
+ (!t.is_scalable_vector() && !value.dtype().is_scalable_vector()));

Review Comment:
   I think `a == b` already covers the `!a && !b` case, so the expression could be 
simplified to just `t.is_scalable_vector() == 
value.dtype().is_scalable_vector()`



##
src/tir/transforms/vectorize_loop.cc:
##
@@ -37,19 +37,36 @@
 namespace tvm {
 namespace tir {
 
-// TODO(ekalda): P5 in https://github.com/apache/tvm/issues/16455
-inline PrimExpr BroadcastTo(PrimExpr e, int lanes) {
-  if (e.dtype().lanes() == lanes) return e;
+inline PrimExpr CreateNewLanes(bool is_scalable, int lanes_or_vscale_factor) {
+  if (is_scalable) {
+return Mul(Call(DataType::Int(32), builtin::vscale(), {}), 
lanes_or_vscale_factor);
+  } else {
+return lanes_or_vscale_factor;
+  }
+}
+
+inline PrimExpr BroadcastTo(PrimExpr e, int lanes, bool is_scalable) {
+  // Check if e is already in the expected form
+  if (e.dtype().get_lanes_or_vscale_factor() == lanes &&
+  e.dtype().is_scalable_vector() == is_scalable)
+return e;
+
   if (const BroadcastNode* op = e.as<BroadcastNode>()) {
-ICHECK(!e.dtype().is_scalable_vector());
-int broadcast_lanes = static_cast<int>(Downcast<IntImm>(op->lanes)->value);
-if (lanes % broadcast_lanes == 0) {
-  return Broadcast(op->value, lanes);
+ICHECK(op->dtype.is_scalable_vector() == is_scalable)
+<< "Can't broadcast between scalable and fixed length vectors.";
+int e_lanes = is_scalable ? op->dtype.vscale_factor() : op->dtype.lanes();

Review Comment:
   nit: `get_lanes_or_vscale_factor()`



##
src/tir/transforms/vectorize_loop.cc:
##
@@ -433,20 +488,27 @@ class Vectorizer : public StmtMutator, public 
ExprFunctorVisitExpr(op->value);
 
 if (!indices.same_as(op->indices) || !value.same_as(op->value)) {
+  ICHECK(!op->buffer->dtype.is_scalable_vector())
+  << "Vectorizing over scalable buffer elements is not supported in 
vectorizer.";
   // How many lanes of indexing are present in the index and
-  // buffer element type, excluding the last index.  T
+  // buffer element type, excluding the last index.
   int other_index_lanes = op->buffer->dtype.lanes();
   for (size_t i = 0; i < indices.size() - 1; i++) {
 other_index_lanes *= indices[i].dtype().lanes();
+// Only allow the last index to be scalable
+ICHECK(!indices[i].dtype().is_scalable_vector()) << "Only the last 
index can be scalable.";
   }
 
   // The total number of lanes of indexing, including the last index.
-  int index_lanes = other_index_lanes * indices[indices.size() - 
1].dtype().lanes();
+  int lanes_in_last_index = indices[indices.size() - 
1].dtype().get_lanes_or_vscale_factor();
+  int index_lanes = other_index_lanes * lanes_in_last_index;
 
   // The total number of lanes in this store operation.  Either
   // the index or the value will be broadcast out to this number
   // of lanes, depending on which has more lanes.
-  int total_lanes = std::max(index_lanes, value.dtype().lanes());
+  int value_dtype_lanes = value.dtype().get_lanes_or_vscale_factor();
+  bool is_last_index_scalable = indices[indices.size() - 
1].dtype().is_scalable_vector();

Review Comment:
   nit: might be nicer to replace uses of `indices[indices.size() - 1].dtype()` 
with a `last_index_dtype` variable



##
src/tir/transforms/vectorize_loop.cc:
##
@@ -635,19 +701,22 @@ class Vectorizer : public StmtMutator, public 
ExprFunctora) && b.same_as(op->b)) {
   return GetRef(op);
 } else {
-  int lanes = std::max(a.dtype().lanes(), b.dtype().lanes());
+  int a_lanes = a.dtype().get_lanes_or_vscale_factor();
+  int b_lanes = b.dtype().get_lanes_or_vscale_factor();
+  int lanes = std::max(a_lanes, b_lanes);
   if (lanes != 1) {
const RampNode* b_ramp = b.as<RampNode>();
const RampNode* a_ramp = a.as<RampNode>();
-if (a.dtype().lanes() == 1 && b_ramp) {
+if (!a.dtype().is_scalable_or_fixed_length_vector() && b_ramp) {

Review Comment:
   `is_scalar`?



##
tests/python/tir-transform/test_tir_transform_vectorize.py:
##
@@ -64,28 +61,86 @@ def test_vectorize_vector():
 assert isinstance(stmt.body.value, tvm.tir.Broadcast)
 
 
-def test_vectorize_with_if():
-n = te.var("n")
-x = te.var("x")
-ib = tvm.tir.ir_builder.create()
-A = ib.pointer("float32", name="A")
-with ib.for_range(0, 4, kind="vectorize") as i:
-

Re: [PR] [TIR] LowerTVMBuiltin may use device_type from PrimFunc annotation [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16727:
URL: https://github.com/apache/tvm/pull/16727#discussion_r1539352746


##
tests/python/tir-transform/test_tir_transform_lower_tvm_builtin.py:
##
@@ -260,11 +260,13 @@ def expected():
 
 
 class TestLowerAllocateRequiresDeviceID(tvm.testing.CompareBeforeAfter):
+"""If device id is missing, error."""
+
 transform = tvm.tir.transform.LowerTVMBuiltin()
 
 def before():
 T.func_attr({"target": T.target("llvm")})
-T.attr("dummy", "device_id", 0)
+T.attr("dummy", "device_type", 2)  # kDLCuda

Review Comment:
   Good question, and looks like it is defined in the `tvm.runtime.Device` 
struct.  I've updated the usage here, and throughout this unit test file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [TIR] Fix segfaults from ordering of Let/Assert in MakePackedAPI [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16543:
URL: https://github.com/apache/tvm/pull/16543#discussion_r1539344129


##
src/tir/transforms/arg_binder.cc:
##
@@ -186,18 +191,8 @@ void ArgBinder::BindDLTensor(const Buffer& buffer, const 
PrimExpr& device_type,
   if (!(buffer->dtype == DataType::Int(1) || buffer->dtype == DataType::Int(4) 
||
 buffer->dtype == DataType::UInt(4))) {
 auto type_msg = tvm::tir::StringImm(type_err_msg.str());
-asserts_.emplace_back(AssertStmt(a_ndim == v_ndim, msg, nop));

Review Comment:
   Yup.  The buffer's dimensionality is checked earlier, so this is entirely a 
duplicate check on the dimensionality.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [TIR] Fix segfaults from ordering of Let/Assert in MakePackedAPI [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16543:
URL: https://github.com/apache/tvm/pull/16543#discussion_r1539338859


##
rust/tvm-graph-rt/tests/test_tvm_basic/build.rs:
##
@@ -48,10 +48,6 @@ fn main() -> Result<()> {
 obj_file.exists(),
 "Could not build tvm lib: {}",
 String::from_utf8(output.stderr)?
-.trim()
-.split("\n")
-.last()
-.unwrap_or("")

Review Comment:
   Oh, that's really weird.  I'm guessing it was from bouncing over to the PR 
branch of https://github.com/apache/tvm/pull/16183, which touched a number of 
the FFI bindings.  I've removed this delta from the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SVE] Support scalable vectors in LoopVectorizer [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16782:
URL: https://github.com/apache/tvm/pull/16782#issuecomment-2020540959

   The implementation looks reasonable, though I have one main question for it: 
What is the behavior of the updated pass for a target that doesn't support SVE? 
 Prior SVE-commits enabled the functionality, but didn't produce SVE in any of 
the default lowering passes.
   
   From [this 
line](https://github.com/apache/tvm/pull/16696/files#diff-f61b04b100f5145f2681340c81d3f2af221239594ed01e2e24896522329ce92cR598-R600),
 versions of LLVM before 11.0 do not support SVE, nor from my brief reading of 
the CUDA codegen 
[here](https://github.com/apache/tvm/blob/main/src/target/source/codegen_cuda.cc#L253)
 does CUDA.
   
   Since `VectorizeLoop` occurs after the `BindTarget` pass, we can check the 
function attribute to know which target will be executing each function.  I 
think we should have the loop vectorization apply only to fixed-extent loops by 
default, but enable the scalable vectorization for targets that support it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Implement operators to inspec DLTensor::strides and offset [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16721:
URL: https://github.com/apache/tvm/pull/16721#issuecomment-2020500989

   > We may want to revisit the default inferred PrimStructInfo for some of 
these calls in the future, namely if we handle offsets/strides more 
systematically later, though the approach here is correct for the present.
   
   Sounds like a plan.  I think the biggest use of `strides` would be in 
exposing a view of a tensor to a compute kernel, without requiring the entire 
tensor to be exposed.  (e.g. Improved `R.split` legalization)  That said, 
there's enough kernels that assume contiguous tensors, as are currently 
provided by Relax, that for now I'd want to keep that requirement.
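
   For reference, the inspection properties added by this PR can be exercised 
roughly like this (shapes and names are assumed for illustration; the full 
definitions appear in the commit later in this digest):

   ```python
   import tvm
   from tvm import relax

   x = relax.Var("x", relax.TensorStructInfo([16, 32], "float32"))

   byte_off = x.byte_offset  # relax.Call of "relax.inspect.tensor_byte_offset"
   elem_off = x.elem_offset  # derived from byte_offset and the tensor's dtype
   strides = x.strides       # proxy object for accessing DLTensor::strides
   ```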


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Implement operators to inspec DLTensor::strides and offset [tvm]

2024-03-26 Thread via GitHub


Lunderberg merged PR #16721:
URL: https://github.com/apache/tvm/pull/16721


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Improve CanonicalizeBindings in DataflowVar edge case [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16783:
URL: https://github.com/apache/tvm/pull/16783#discussion_r1539285615


##
src/relax/transform/canonicalize_bindings.cc:
##
@@ -91,18 +91,20 @@ class CanonicalizePlanner : public ExprVisitor {
 bound_to = opt.value();
   }
 
-  if (bound_var.as<DataflowVarNode>() || !bound_to.as<DataflowVarNode>()) {
+  if (bound_var.as<DataflowVarNode>() || !bound_to.as<DataflowVarNode>() ||
+  !visitor.used_outside_home_dataflow_.count(bound_var)) {
 // Case 1: Var = Var
 // Case 2: DataflowVar = Var
 // Case 3: DataflowVar = DataflowVar
+// Case 4a: Var = DataflowVar, but used outside this DataflowBlock

Review Comment:
   Thank you, and updated the comment to be more explicit.  I've also changed 
"this DataflowBlock" to "the DataflowBlock containing the binding", since this 
function is called after the entire function is visited, not during the visit 
of any specific dataflow block.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated: [Relax] Implement operators to inspec DLTensor::strides and offset (#16721)

2024-03-26 Thread lunderberg
This is an automated email from the ASF dual-hosted git repository.

lunderberg pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 8274d142a3 [Relax] Implement operators to inspec DLTensor::strides and 
offset  (#16721)
8274d142a3 is described below

commit 8274d142a3c229eb664d041c5a8034c3638f8c0f
Author: Eric Lunderberg 
AuthorDate: Tue Mar 26 08:55:10 2024 -0500

[Relax] Implement operators to inspec DLTensor::strides and offset  (#16721)

* [TIR] LowerTVMBuiltin may use device_type from PrimFunc annotation

If an allocation occurs within a host function, it may not have a
device/host split.

* lint fix

* [Relax] Implement operators to inspec DLTensor::strides and offset

A follow-up PR to https://github.com/apache/tvm/pull/16563.  This PR
implements similar operators to inspect the runtime values of
`DLTensor::strides` and `DLTensor::byte_offset`.  In addition, while the
element offset is not explicitly present in the `DLTensor` struct, a
Relax operator is implemented to infer it from the `byte_offset` and
`data_type` fields, for use when interacting with the TIR
`BufferNode::elem_offset` field.
---
 python/tvm/relax/expr.py   |  97 
 .../tvm/relax/transform/legalize_ops/__init__.py   |   1 +
 .../tvm/relax/transform/legalize_ops/inspect_op.py | 128 +++
 src/relax/op/tensor/inspect.cc | 180 ---
 src/relax/op/tensor/inspect.h  |  39 
 src/tir/transforms/lower_tvm_builtin.cc|  36 ++-
 tests/python/relax/test_op_inspect.py  | 252 +
 tests/python/relax/test_op_unpack.py   | 127 ---
 .../test_tir_transform_lower_tvm_builtin.py|  37 ++-
 9 files changed, 727 insertions(+), 170 deletions(-)

diff --git a/python/tvm/relax/expr.py b/python/tvm/relax/expr.py
index 12f08f4dbf..4dca710e77 100644
--- a/python/tvm/relax/expr.py
+++ b/python/tvm/relax/expr.py
@@ -280,6 +280,33 @@ class ExprWithOp(Expr, Scriptable):
 self._check_for_tensor_struct_info()
 return _DLTensorShapeProxy(self)
 
+@property
+def strides(self) -> "_DLTensorStrideProxy":
+"""Returns a proxy object for accessing DLTensor::strides"""
+self._check_for_tensor_struct_info()
+return _DLTensorStrideProxy(self)
+
+@property
+def byte_offset(self) -> "Expr":
+"""Returns a proxy object for accessing DLTensor::byte_offset"""
+self._check_for_tensor_struct_info()
+op = tvm.ir.Op.get("relax.inspect.tensor_byte_offset")
+return tvm.relax.Call(op, [self])
+
+@property
+def elem_offset(self) -> "Expr":
+"""Returns a proxy object for accessing a DLTensor's elem_offset
+
+This parameter is not stored in the DLTensor, but is instead
+derived from the DLTensor's byte offset and datatype.  This is
+exposed in Relax for ease of use, and for translation into the
+`tir::BufferNode::elem_offset` field when interacting with TIR
+buffers.
+"""
+self._check_for_tensor_struct_info()
+op = tvm.ir.Op.get("relax.inspect.tensor_elem_offset")
+return tvm.relax.Call(op, [self])
+
 
 class _DLTensorDTypeProxy(tvm.runtime.ObjectGeneric):
 """A proxy object for unpacking DLDatatype from DLTensor
@@ -431,6 +458,76 @@ class _DLTensorShapeProxy(tvm.runtime.ObjectGeneric):
 return tvm.relax.Call(op, [self.tensor, axis])
 
 
+class _DLTensorStrideProxy(tvm.runtime.ObjectGeneric):
+"""A proxy object for unpacking the strides from DLTensor
+
+Exposes accessors for the `DLTensor::strides` field.  Accessing
+these fields will produce `relax.Call` expressions, representing
+the field's runtime value.  If the datatype of the tensor is known
+at compile-time, the `relax.Call` will be normalized into a
+`relax.PrimValue`, with no runtime cost.
+
+Parameters
+--
+tensor: relax.Expr
+
+The relax tensor (or a variable referring to a relax tensor),
+whose runtime strides is being inspected.
+"""
+
+def __init__(self, tensor):
+self.tensor = tensor
+
+def asobject(self):
+"""Provide expected in error message
+
+This method is called when `_DLTensorStrideProxy` is used in a
+context that requires a `relax.Expr`.  This usage is not
+supported, and raising an error here can provide suggested
+fixes that are not present in the default error message from
+`tvm.runtime.convert_to_object`.
+"""
+raise TypeError(
+f"{self.tensor}.strides cannot be converted to a relax expression, 
"
+f"and should be used as a proxy object to access the runtime 
strides of the DLTensor. "
+f"The DLTensor::ndim 

(tvm) branch p0-install-testing-infra deleted (was 02c6cad99e)

2024-03-26 Thread lukhut
This is an automated email from the ASF dual-hosted git repository.

lukhut pushed a change to branch p0-install-testing-infra
in repository https://gitbox.apache.org/repos/asf/tvm.git


 was 02c6cad99e [SME][Docker] Add Fixed Virtual Platform (FVP) and 
toolchain install

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



Re: [PR] [Relax] Improve CanonicalizeBindings in DataflowVar edge case [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16783:
URL: https://github.com/apache/tvm/pull/16783#discussion_r1539278031


##
src/relax/transform/canonicalize_bindings.cc:
##
@@ -91,18 +91,20 @@ class CanonicalizePlanner : public ExprVisitor {
 bound_to = opt.value();
   }
 
-  if (bound_var.as() || !bound_to.as()) {
+  if (bound_var.as() || !bound_to.as() ||
+  !visitor.used_outside_home_dataflow_.count(bound_var)) {
 // Case 1: Var = Var
 // Case 2: DataflowVar = Var
 // Case 3: DataflowVar = DataflowVar
+// Case 4a: Var = DataflowVar, but used outside this DataflowBlock
 //
 // For these three cases, the trivial binding can be

Review Comment:
   Off by one errors, my ~~two~~ one nemesis!  (Thank you, and fixed.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Allow composition of DFPattern replacements [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16732:
URL: https://github.com/apache/tvm/pull/16732#issuecomment-2020480480

   The pre-requisite PR https://github.com/apache/tvm/pull/16730 has landed, so 
this PR is now rebased on top of `main` and marked as ready.  Thank you 
@slyubomirsky for the review; I think it's just waiting on CI now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Web] Add `kv_state` and `rnn_state` to wasm_runtime [tvm]

2024-03-26 Thread via GitHub


CharlieFRuan commented on PR #16791:
URL: https://github.com/apache/tvm/pull/16791#issuecomment-2020477526

   Thank you so much @Hzfengsy!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Allow composition of DFPattern replacements [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16732:
URL: https://github.com/apache/tvm/pull/16732#discussion_r1539272553


##
src/relax/ir/dataflow_matcher.cc:
##
@@ -1140,34 +1071,173 @@ class PatternRewriter : ExprMutator {
 return block;
   }
 
-  /*! \brief The pattern for rewriting call nodes */
-  Optional pattern_;
   /*! \brief The pattern constraint contexts for rewriting dataflow blocks */
-  Optional ctx_;
+  PatternContext ctx_;
   /*!
* \brief The user-provided rewriter function. Its signature and semantics 
are:
-   * - (Call, Map) -> Call for call node rewriting. Given the 
matched
-   *call node and the map of patterns and matched expressions, it should 
return a new call node
-   *to replace the original one or the original matched call node as is.
-   * - (Map, Map) -> Map for dataflow 
block rewriting.
-   *Given the map of patterns and corresponding variables (bound variables 
or parameters),
-   *it should return a map that specifies new values for matched bound 
variables. It can refer
+   *
+   * - (Map, Map) -> Map
+   *
+   *Given the map of patterns and corresponding variables (bound
+   *variables or parameters), it should return a map that
+   *specifies new values for matched bound variables. It can refer
*to the passed bindings to create the replacement expressions.
*/
-  PackedFunc rewriter_func_;
-  std::unordered_set params_;
+  TypedPackedFunc(Map, Map)> 
rewriter_func_;
+};
+
+/*!
+ * \brief Apply pattern matching to each expression, replacing
+ * matches with the output of a user-provided rewriter function.
+ */
+class ExprPatternRewriter : ExprMutator {
+ public:
+  using ExprMutator::VisitBindingBlock_;
+  using ExprMutator::VisitExpr_;
+
+  ExprPatternRewriter(DFPattern pat,
+  TypedPackedFunc)> 
rewriter_func)
+  : pattern_(pat), rewriter_func_(rewriter_func) {}
+
+  template 
+  static Function Run(PatternType pat,
+  TypedPackedFunc)> 
rewriter_func,
+  Function func) {
+ExprPatternRewriter rewriter(pat, rewriter_func);
+func = Downcast(rewriter(func));
+func = Downcast(RemoveAllUnused(func));
+return func;
+  }
+
+  Expr VisitExpr_(const SeqExprNode* seq) override {
+auto cache = bindings_;
+SeqExpr prev = GetRef(seq);
+
+StructuralEqual struct_equal;
+
+while (true) {
+  SeqExpr next = 
Downcast(builder_->Normalize(ExprMutator::VisitExpr_(prev.get(;
+  if (struct_equal(prev, next)) {
+return std::move(next);
+  }
+
+  // Canonicalization may result in two previously-different
+  // expressions being recognized as identical.  Elimination of
+  // common subexpressions may result in trival var-to-var
+  // bindings that can be canonicalized.  Therefore, iterate the
+  // simplification steps until converged.
+  while (true) {
+auto start_of_loop = next;
+next = Downcast(CanonicalizeBindings(next));
+next = Downcast(EliminateCommonSubexpr(next));
+next = Downcast(RemoveAllUnused(next));
+if (struct_equal(start_of_loop, next)) {
+  break;
+}
+  }
+
+  if (struct_equal(prev, next)) {
+return std::move(next);
+  }
+
+  // Reset all knowledge of bindings that were collected from
+  // this SeqExpr.  The collected bindings are only after
+  // the point where they were collected, and we are repeating
+  // the mutation of this SeqExpr.
+  bindings_ = cache;
+  prev = next;
+}
+  }
+
+  void VisitBinding_(const VarBindingNode* binding) override {
+auto expr = VisitExpr(binding->value);
+bindings_.Set(binding->var, expr);
+ReEmitBinding(binding, expr);
+  }
+
+  Expr VisitExpr(const Expr& expr) override {
+auto node = ExprMutator::VisitExpr(expr);
+
+std::vector matches_top_level;
+if (auto rewritten = TryRewrite(node, pattern_, _top_level)) {
+  return builder_->Normalize(rewritten.value());
+}
+
+return node;
+  }
+
+ private:
+  Optional TryRewrite(const Expr& expr, const DFPattern& pattern,
+std::vector* matches_top_level) {
+ICHECK(matches_top_level);
+
+// Special handling if the user-supplied pattern is a `OrPattern`.
+// While the `ExtractMatchedExpr` can handle match the

Review Comment:
   Whoops, this was a typo.  It should be "handle matching", and I've updated 
the PR with the correction.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Relax] Refactor PatternRewriter into separate Block/Expr mutators [tvm]

2024-03-26 Thread via GitHub


Lunderberg merged PR #16730:
URL: https://github.com/apache/tvm/pull/16730


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated: [Relax] Refactor PatternRewriter into separate Block/Expr mutators (#16730)

2024-03-26 Thread lunderberg
This is an automated email from the ASF dual-hosted git repository.

lunderberg pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 016b512ad4 [Relax] Refactor PatternRewriter into separate Block/Expr 
mutators (#16730)
016b512ad4 is described below

commit 016b512ad4950cba32eaf81be0cfe3c0321851f7
Author: Eric Lunderberg 
AuthorDate: Tue Mar 26 08:43:36 2024 -0500

[Relax] Refactor PatternRewriter into separate Block/Expr mutators (#16730)

Prior to this commit, the `PatternRewriter` mutator handled pattern
rewriting at either the expression level (`rewrite_call`) or the
dataflow block level (`rewrite_bindings`).  These two functionalities
had different external APIs, defined different member variables, and
visited different IR nodes.  In effect, it had two entirely
independent implementations, which just happened to be implemented
within the same class.

This commit refactors the single `PatternRewriter` mutator into
separate `BlockPatternRewriter` and `ExprPatternRewriter` mutators.
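
For readers unfamiliar with the two entry points mentioned above, here is a
minimal, hypothetical sketch of the expression-level path (`rewrite_call`); it
assumes the public `tvm.relax.dpl` API and is not taken from this commit.

```python
# Sketch only: rewrite every relax.add(x, x) into a multiply by a constant.
import tvm
from tvm import relax
from tvm.relax.dpl import is_op, wildcard, rewrite_call
from tvm.script import relax as R

@R.function
def before(x: R.Tensor((4,), "float32")) -> R.Tensor((4,), "float32"):
    with R.dataflow():
        y = R.add(x, x)
        R.output(y)
    return y

# Pattern matching any call to relax.add.
pattern = is_op("relax.add")(wildcard(), wildcard())

def rewriter(orig_call, matches):
    # Replace the matched add with x * 2.
    return relax.op.multiply(orig_call.args[0], relax.const(2.0, "float32"))

after = rewrite_call(pattern, rewriter, before)
print(after)
```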
---
 include/tvm/relax/dataflow_matcher.h |   4 +-
 src/relax/ir/dataflow_matcher.cc | 238 ---
 2 files changed, 140 insertions(+), 102 deletions(-)

diff --git a/include/tvm/relax/dataflow_matcher.h 
b/include/tvm/relax/dataflow_matcher.h
index bbc8e9382e..8f2024f264 100644
--- a/include/tvm/relax/dataflow_matcher.h
+++ b/include/tvm/relax/dataflow_matcher.h
@@ -67,7 +67,9 @@ TVM_DLL Optional> MatchGraph(const 
PatternContext& ctx,
  * \param f The function to rewrite
  * \return The rewritten or the input function, depending on the pattern 
matching result.
  */
-TVM_DLL Function RewriteBindings(const PatternContext& ctx, PackedFunc 
rewriter, Function f);
+TVM_DLL Function RewriteBindings(
+const PatternContext& ctx,
+TypedPackedFunc(Map, Map)> 
rewriter, Function f);
 
 /**
  * \brief Rewrite a function with the given pattern and the rewriter function.
diff --git a/src/relax/ir/dataflow_matcher.cc b/src/relax/ir/dataflow_matcher.cc
index a14d43f6d3..531971d3db 100644
--- a/src/relax/ir/dataflow_matcher.cc
+++ b/src/relax/ir/dataflow_matcher.cc
@@ -973,102 +973,33 @@ TVM_REGISTER_GLOBAL("relax.dpl.match_dfb")
 });
 
 /*!
- * \brief Apply pattern matching to each call node and dataflow block, and 
replace matching ones
+ * \brief Apply pattern matching to each dataflow block, replacing matches
  * with the output of a user-provided rewriter function.
  */
-class PatternRewriter : ExprMutator {
+class BlockPatternRewriter : ExprMutator {
  public:
   using ExprMutator::VisitBindingBlock_;
   using ExprMutator::VisitExpr_;
 
-  PatternRewriter(DFPattern pat, PackedFunc rewriter_func,
-  const std::unordered_set& params)
-  : pattern_(pat), rewriter_func_(rewriter_func), params_(params) {}
-
-  PatternRewriter(const PatternContext& ctx, PackedFunc rewriter_func,
-  const std::unordered_set& params)
-  : ctx_(ctx), rewriter_func_(rewriter_func), params_(params) {}
+  BlockPatternRewriter(
+  const PatternContext& ctx,
+  TypedPackedFunc(Map, Map)> 
rewriter_func)
+  : ctx_(ctx), rewriter_func_(rewriter_func) {}
 
   template 
-  static Function Run(PatternType pat, PackedFunc rewriter_func, Function f) {
-std::unordered_set params;
-for (const auto& p : f->params) {
-  params.insert(p.get());
-}
-PatternRewriter rewriter(pat, rewriter_func, params);
-return Downcast(RemoveAllUnused(rewriter.VisitExpr(f)));
-  }
-
-  Expr VisitExpr_(const SeqExprNode* seq) override {
-if (ctx_) {
-  return ExprMutator::VisitExpr_(seq);
-}
-
-auto cache = bindings_;
-SeqExpr prev = GetRef(seq);
-
-StructuralEqual struct_equal;
-
-while (true) {
-  SeqExpr next = 
Downcast(builder_->Normalize(ExprMutator::VisitExpr_(prev.get(;
-  if (struct_equal(prev, next)) {
-return std::move(next);
-  }
-
-  // Canonicalization may result in two previously-different
-  // expressions being recognized as identical.  Elimination of
-  // common subexpressions may result in trival var-to-var
-  // bindings that can be canonicalized.  Therefore, iterate the
-  // simplification steps until converged.
-  while (true) {
-auto start_of_loop = next;
-next = Downcast(CanonicalizeBindings(next));
-next = Downcast(EliminateCommonSubexpr(next));
-next = Downcast(RemoveAllUnused(next));
-if (struct_equal(start_of_loop, next)) {
-  break;
-}
-  }
-
-  if (struct_equal(prev, next)) {
-return std::move(next);
-  }
-
-  // Reset all knowledge of bindings that were collected from
-  // this DataflowBlock.  The collected bindings are only after
-  // the point where they were collected, and we are repeating
-  // the mutation of 

Re: [PR] [Bugfix][Cutlass] Remove a typo in cutlass build [tvm]

2024-03-26 Thread via GitHub


Lunderberg merged PR #16789:
URL: https://github.com/apache/tvm/pull/16789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [IR][Relax] Improve highlighting in assert_structural_equal [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16756:
URL: https://github.com/apache/tvm/pull/16756#issuecomment-2020467097

   And the additional unit test is added in 
https://github.com/apache/tvm/pull/16796.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (bf2d43e314 -> bcfbcabff8)

2024-03-26 Thread lunderberg
This is an automated email from the ASF dual-hosted git repository.

lunderberg pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from bf2d43e314 [IR][Relax] Improve highlighting in assert_structural_equal 
(#16756)
 add bcfbcabff8 [Bugfix][Cutlass] Remove a typo in cutlass build (#16789)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/cutlass/build.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[PR] [Relax] Unit-test for structural equal of recursive function [tvm]

2024-03-26 Thread via GitHub


Lunderberg opened a new pull request, #16796:
URL: https://github.com/apache/tvm/pull/16796

   A follow-up PR to https://github.com/apache/tvm/pull/16756, adding an 
explicit unit test for `tvm.ir.assert_structural_equal` of two distinct 
recursive functions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [TIR] Modify IntImmNode deep_equal to match regardless of type [tvm]

2024-03-26 Thread via GitHub


quic-sanirudh commented on PR #16795:
URL: https://github.com/apache/tvm/pull/16795#issuecomment-2020458701

   > In this particular case (deep equality), I think types do matter, so it 
would be better instead to fix the cases that depend on the relaxed 
behavior.
   > 
   > I know we had some i64/i32 issues, and the general rule of thumb now is to 
be as explicit as possible, which helps reduce errors.
   
   Oh okay, thanks for the feedback @tqchen. The cases we started seeing were 
that some expressions were not getting simplified properly after [`RampNode` 
lanes were changed to 
PrimExpr](https://github.com/apache/tvm/commit/a6157a6369c184b6fa5f66654feb685e58726737#diff-046cdcb6494a6719465080bb9156cd4620828af4b18f7018e5b443d6c7c1c1d0L792-R792).
 I narrowed the failure down to a [rewrite simplify rule 
here](https://github.com/apache/tvm/blob/main/src/arith/rewrite_simplify.cc#L401).
 
   
   I realized that the simplification was not happening because the lanes of the 
broadcast and the RampNode had different types in this case, so a couple of other 
solutions I thought would apply here are to either fix the RampNode constructor 
to stick to some fixed dtype for lanes (something like int32/int16, since 
`DLDataType` anyway only supports an int16 dtype), or to update the simplify 
rules here to try the same rules with a `PVar` lanes type in the case of 
fixed-length vectors.
   
   But if dtype does matter, then should we update the `PEqualChecked` 
to also check for dtypes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated: [IR][Relax] Improve highlighting in assert_structural_equal (#16756)

2024-03-26 Thread lunderberg
This is an automated email from the ASF dual-hosted git repository.

lunderberg pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new bf2d43e314 [IR][Relax] Improve highlighting in assert_structural_equal 
(#16756)
bf2d43e314 is described below

commit bf2d43e314ca7e682ae26dca70ada657054f8786
Author: Eric Lunderberg 
AuthorDate: Tue Mar 26 08:26:52 2024 -0500

[IR][Relax] Improve highlighting in assert_structural_equal (#16756)

* [IR][Relax] Improve highlighting in assert_structural_equal

Prior to this commit, `tvm.ir.assert_structural_equal` would highlight
an entire `relax::BindingBlock` if the number of elements in the
binding block differs.  This can result in the entire Relax function
being highlighted, making it difficult to identify the location of the
mismatch.

This commit makes the following changes, to improve the error messages
that occur when `tvm.ir.assert_structural_equal` raises an exception.

- In `"node.StructuralEqual"`, set `defer_fails = true` when
  `assert_mode` is true.  This highlights the first mismatch of an
  `Array`, rather than the entire array, in cases
  where the LHS and RHS have different sizes.

- In the `SHashReduce` for `VarBinding` and `MatchCast`, visit the
  value first, and then the variable to which it is bound.  This
  highlights the mismatched expression, rather than mismatches in the
  resulting struct info.

- In `SEqualHandlerDefault::Impl::SEqualReduce`, defer the failure if
  enabled.  This highlights the first mismatch, which may also have been
  deferred, rather than an early return when a later mismatch occurs
  involving `NullOpt`.

* DeferFail should follow assert_mode

* Handle recursively defined lambda functions
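
As a quick, hypothetical illustration of the user-facing behaviour (not part of
this commit), a failing `assert_structural_equal` raises a `ValueError` whose
message points at the first mismatching node:

```python
# Illustration only: two expressions that differ in a single constant.
import tvm
from tvm import tir

x = tir.Var("x", "int32")
lhs = x + 1
rhs = x + 2

try:
    tvm.ir.assert_structural_equal(lhs, rhs)
except ValueError as err:
    # The reported path highlights the mismatched constant (1 vs 2).
    print(err)
```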
---
 include/tvm/relax/expr.h | 24 ---
 src/node/structural_equal.cc | 45 +++-
 src/relax/ir/expr.cc | 50 +++
 tests/python/relax/test_utils.py | 63 +++-
 4 files changed, 149 insertions(+), 33 deletions(-)

diff --git a/include/tvm/relax/expr.h b/include/tvm/relax/expr.h
index 4634d1e228..40707675fe 100644
--- a/include/tvm/relax/expr.h
+++ b/include/tvm/relax/expr.h
@@ -780,18 +780,8 @@ class MatchCastNode : public BindingNode {
 v->Visit("span", );
   }
 
-  bool SEqualReduce(const MatchCastNode* other, SEqualReducer equal) const {
-// NOTE: pattern can contain ShapeExpr which defines the vars
-return equal.DefEqual(var, other->var) && equal.DefEqual(struct_info, 
other->struct_info) &&
-   equal(value, other->value);
-  }
-
-  void SHashReduce(SHashReducer hash_reduce) const {
-// NOTE: pattern can contain ShapeExpr which defines the vars
-hash_reduce.DefHash(var);
-hash_reduce.DefHash(struct_info);
-hash_reduce(value);
-  }
+  bool SEqualReduce(const MatchCastNode* other, SEqualReducer equal) const;
+  void SHashReduce(SHashReducer hash_reduce) const;
 
   static constexpr const char* _type_key = "relax.expr.MatchCast";
   static constexpr const bool _type_has_method_sequal_reduce = true;
@@ -822,13 +812,9 @@ class VarBindingNode : public BindingNode {
 v->Visit("span", );
   }
 
-  bool SEqualReduce(const VarBindingNode* other, SEqualReducer equal) const {
-return equal.DefEqual(var, other->var) && equal(value, other->value);
-  }
-  void SHashReduce(SHashReducer hash_reduce) const {
-hash_reduce.DefHash(var);
-hash_reduce(value);
-  }
+  bool SEqualReduce(const VarBindingNode* other, SEqualReducer equal) const;
+  void SHashReduce(SHashReducer hash_reduce) const;
+
   static constexpr const char* _type_key = "relax.expr.VarBinding";
   static constexpr const bool _type_has_method_sequal_reduce = true;
   static constexpr const bool _type_has_method_shash_reduce = true;
diff --git a/src/node/structural_equal.cc b/src/node/structural_equal.cc
index 66a347f6b8..e0de514122 100644
--- a/src/node/structural_equal.cc
+++ b/src/node/structural_equal.cc
@@ -27,6 +27,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include "ndarray_hash_equal.h"
@@ -249,15 +250,30 @@ class SEqualHandlerDefault::Impl {
 // in which case we can use same_as for quick checking,
 // or we have to run deep comparison and avoid to use same_as checks.
 auto run = [=]() {
-  if (!lhs.defined() && !rhs.defined()) return true;
-  if (!lhs.defined() && rhs.defined()) return false;
-  if (!rhs.defined() && lhs.defined()) return false;
-  if (lhs->type_index() != rhs->type_index()) return false;
-  auto it = equal_map_lhs_.find(lhs);
-  if (it != equal_map_lhs_.end()) {
-return it->second.same_as(rhs);
+  std::optional early_result = [&]() -> std::optional {
+if (!lhs.defined() && !rhs.defined()) return true;
+ 

Re: [PR] [IR][Relax] Improve highlighting in assert_structural_equal [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16756:
URL: https://github.com/apache/tvm/pull/16756#issuecomment-2020428978

   > The only change that might be warranted would be a test case of a 
recursive function that does not match.
   
   Good call, and I'll add one in a follow-up PR.  The failure mode that 
occurred for the recursive functions was an error being thrown for use of an 
undefined variable, which could be triggered by any comparison at all and is tested in 
`test_structural_equal_with_recursive_lambda_function`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [IR][Relax] Improve highlighting in assert_structural_equal [tvm]

2024-03-26 Thread via GitHub


Lunderberg merged PR #16756:
URL: https://github.com/apache/tvm/pull/16756


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Disco] Propagate structlog/logging config to workers [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16715:
URL: https://github.com/apache/tvm/pull/16715#issuecomment-2020421377

   > This seems straightforward enough, though I'm not aware of the wider 
context.
   
   Thank you.  For context, some applications use `structlog` to provide more 
flexible logging than python's stdlib `logging`.  The `structlog` configuration 
determines what pre-processing is done for the log statements (e.g. appending 
contextual information to a log statement).  When starting child processes 
using `multiprocessing`, it would be useful for the child processes to 
format/save their logs in the same manner as the parent, but this doesn't occur 
by default.  In [PR#16618](https://github.com/apache/tvm/pull/16618), I added 
handling to forward the `structlog` configuration from the main process to the 
`tvm.runtime.disco` worker processes.
   
   However, some configurations ([example from `structlog`'s 
documentation](https://www.structlog.org/en/stable/standard-library.html#rendering-using-structlog-based-formatters-within-logging))
 integrate `structlog` with the stdlib `logging`.  The previous implementation 
only forwarded the configuration held by `structlog`, and didn't forward the 
configuration within the stdlib `logging`.  This PR closes that gap.
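
   A rough, generic sketch of the idea (not the actual disco worker code):
re-applying the parent's stdlib `logging` level inside a `multiprocessing`
child so that both processes log consistently.

```python
# Generic illustration only; the TVM implementation forwards more than this.
import logging
import multiprocessing

def worker(parent_level: int) -> None:
    logging.basicConfig(level=parent_level)  # re-apply the parent's level
    logging.getLogger(__name__).info("child configured like the parent")

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    parent_level = logging.getLogger().getEffectiveLevel()
    proc = multiprocessing.Process(target=worker, args=(parent_level,))
    proc.start()
    proc.join()
```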


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Add unit tests for SLM to Relax exporter [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on PR #16784:
URL: https://github.com/apache/tvm/pull/16784#issuecomment-2020396730

   > It's good to have test cases, especially when they explain the intended 
functionality very well. A tutorial based on these examples (perhaps 
literally generated from the test cases) might be a good investment of time as 
well, especially if it can be made more visible.
   
   Thank you, and that was in part my goal with the sequence of unit tests.  
Whenever possible, I prefer test cases that double as mini-tutorials, since 
then they are less likely to become stale than full tutorials.  (Though I agree 
that it would be beneficial to have a tutorial that follows user-focused flow, 
rather than the feature-focused flow of these unit tests.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [SLM] Add unit tests for SLM to Relax exporter [tvm]

2024-03-26 Thread via GitHub


Lunderberg commented on code in PR #16784:
URL: https://github.com/apache/tvm/pull/16784#discussion_r1539197565


##
tests/python/relax/test_frontend_nn_exporter.py:
##
@@ -0,0 +1,632 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+
+import tvm
+import tvm.testing
+
+from tvm import relax, tir
+from tvm.ir import assert_structural_equal
+from tvm.relax.frontend import nn
+from tvm.script import ir as I, relax as R, tir as T
+
+
+def test_simple():
+"""A module may be exported from nn.Module to Relax"""
+
+slm_mod = nn.modules.ReLU()
+exported_mod, _ = slm_mod.export_tvm(
+spec={"forward": {"x": nn.spec.Tensor((3, 3), "float32")}},
+debug=False,
+)
+
+@I.ir_module
+class Expected:
+@R.function
+def forward(x: R.Tensor([3, 3], dtype="float32")):
+R.func_attr({"num_input": 1})
+with R.dataflow():
+relu = R.nn.relu(x)
+relu = relu
+R.output(relu)
+return relu
+
+assert_structural_equal(exported_mod, Expected)
+
+
+def test_custom_module():
+"""A module may be exported from nn.Module to Relax"""

Review Comment:
   Good call, and I've updated the docstring to remove the copy/paste duplicate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [TIR] Modify IntImmNode deep_equal to match regardless of type [tvm]

2024-03-26 Thread via GitHub


tqchen commented on PR #16795:
URL: https://github.com/apache/tvm/pull/16795#issuecomment-2020375413

   In this particular case, I think types do matter, so it would be better 
instead to fix the cases that depend on the relaxed behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Analysis] Allow calls to GlobalVar in @R.function [tvm]

2024-03-26 Thread via GitHub


Lunderberg merged PR #16778:
URL: https://github.com/apache/tvm/pull/16778


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated: [Analysis] Allow calls to GlobalVar in @R.function (#16778)

2024-03-26 Thread lunderberg
This is an automated email from the ASF dual-hosted git repository.

lunderberg pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new 72f0326a88 [Analysis] Allow calls to GlobalVar in @R.function (#16778)
72f0326a88 is described below

commit 72f0326a889b60a146fb51aca4041abf0fb0fbb9
Author: Eric Lunderberg 
AuthorDate: Tue Mar 26 08:03:33 2024 -0500

[Analysis] Allow calls to GlobalVar in @R.function (#16778)

* [Analysis] Allow calls to GlobalVar in @R.function

Prior to this commit, the post-parsing well-formed check performed by
TVMScript allowed a call to `GlobalVar` in a `@R.function`, but only
if it occurred within the context of a `@I.ir_module`.  If
`@R.function` appeared on its own, calls to a `GlobalVar` would be
treated as calls to an undefined function.

* Use appropriate well-formed checks for TIR/Relax functions

* Lint fix

* Import order fix
---
 include/tvm/relax/analysis.h|  6 ++--
 python/tvm/relax/analysis/analysis.py   |  8 ++---
 python/tvm/script/parser/core/entry.py  | 26 +-
 src/relax/analysis/well_formed.cc   | 47 ++---
 tests/python/relax/test_analysis_well_formed.py | 34 ++
 tests/python/relax/test_tvmscript_parser.py | 37 +++
 6 files changed, 122 insertions(+), 36 deletions(-)

diff --git a/include/tvm/relax/analysis.h b/include/tvm/relax/analysis.h
index 0c43732813..fa928d082d 100644
--- a/include/tvm/relax/analysis.h
+++ b/include/tvm/relax/analysis.h
@@ -547,15 +547,15 @@ TVM_DLL bool ContainsImpureCall(const Expr& expr,
 /*!
  * \brief Check if the IRModule is well formed.
  *
- * \param m the IRModule to check.
+ * \param obj The IRModule or relax::Function to check.
  * \param check_struct_info A boolean flag indicating if the property "every 
Expr
  * must have defined structure info" will be checked.
- * \return true if the IRModule is well formed, false if not.
+ * \return true if the object is well formed, false if not.
  * \note By default the structure info is always checked. It is only in test 
cases
  * where `check_struct_info` might be false, so that other well-formed 
requirements
  * will be well tested and will not be blocked by not having structure info.
  */
-TVM_DLL bool WellFormed(IRModule m, bool check_struct_info = true);
+TVM_DLL bool WellFormed(Variant obj, bool 
check_struct_info = true);
 
 /*!
  * \brief Using the layout transforms on the outputs, suggest layout 
transformation on the blocks
diff --git a/python/tvm/relax/analysis/analysis.py 
b/python/tvm/relax/analysis/analysis.py
index 83286c0980..e6eaff3711 100644
--- a/python/tvm/relax/analysis/analysis.py
+++ b/python/tvm/relax/analysis/analysis.py
@@ -434,13 +434,13 @@ def remove_all_unused(func: Function) -> Function:
 return _ffi_api.remove_all_unused(func)  # type: ignore
 
 
-def well_formed(mod: IRModule, check_struct_info: bool = True) -> bool:
+def well_formed(obj: Union[IRModule, Function], check_struct_info: bool = 
True) -> bool:
 """Check if the IRModule is well formed.
 
 Parameters
 --
-mod : tvm.IRModule
-The input IRModule.
+obj : Union[tvm.IRModule, Function]
+The input IRModule or relax.Function.
 
 check_struct_info : bool
 A boolean flag indicating if the property "every Expr must
@@ -457,7 +457,7 @@ def well_formed(mod: IRModule, check_struct_info: bool = 
True) -> bool:
 where `check_struct_info` might be false, so that other well-formed 
requirements
 will be well tested and will not be blocked by not having structure info.
 """
-return _ffi_api.well_formed(mod, check_struct_info)  # type: ignore
+return _ffi_api.well_formed(obj, check_struct_info)  # type: ignore
 
 
 def _get_prim_func_default_dtype(func: PrimFunc):
diff --git a/python/tvm/script/parser/core/entry.py 
b/python/tvm/script/parser/core/entry.py
index 0c88cacf8a..e7a7f98b76 100644
--- a/python/tvm/script/parser/core/entry.py
+++ b/python/tvm/script/parser/core/entry.py
@@ -18,6 +18,7 @@
 import inspect
 from typing import Any, Dict, Union
 
+import tvm
 from ir.module import IRModule
 from ...ir_builder import IRBuilder
 from . import doc
@@ -34,12 +35,19 @@ WELL_FORMED_ERROR_MESSAGE = (
 
 
 def _default_globals() -> Dict[str, Any]:
-import tvm  # pylint: disable=import-outside-toplevel
 from tvm.script.parser import ir  # pylint: disable=import-outside-toplevel
 from tvm.script.parser import relax  # pylint: 
disable=import-outside-toplevel
 from tvm.script.parser import tir  # pylint: 
disable=import-outside-toplevel
 
-extra_vars = {"tvm": tvm, "I": ir, "ir": ir, "T": tir, "tir": tir, "R": 
relax, "relax": relax}
+extra_vars = {
+"tvm": tvm,
+"I": ir,
+"ir": ir,
+"T": tir,
+

(tvm) branch main updated: [Codegen, Cuda] Add overload for fp8x4 e5m2 <-> half4 conversion (#16787)

2024-03-26 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
 new ae7b8d9aed [Codegen, Cuda] Add overload for fp8x4 e5m2 <-> half4 
conversion (#16787)
ae7b8d9aed is described below

commit ae7b8d9aeddd81c862e03255b7628bf5932c24ec
Author: Wuwei Lin 
AuthorDate: Tue Mar 26 05:58:18 2024 -0700

[Codegen, Cuda] Add overload for fp8x4 e5m2 <-> half4 conversion (#16787)
---
 src/target/source/literal/cuda_half_t.h | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/target/source/literal/cuda_half_t.h 
b/src/target/source/literal/cuda_half_t.h
index bf3e83928e..27d44d9f7f 100644
--- a/src/target/source/literal/cuda_half_t.h
+++ b/src/target/source/literal/cuda_half_t.h
@@ -410,7 +410,28 @@ struct __align__(8) half4 {
 result.__x =
 (static_cast<__uint32_t>(lo_part.__x) | 
(static_cast<__uint32_t>(hi_part.__x) << 16));
 return result;
-  })";
+  }
+  __host__ __device__ explicit half4(const __nv_fp8x4_e5m2& fp8x4) {
+__nv_fp8x2_e5m2 lo_part, hi_part;
+lo_part.__x = static_cast<__nv_fp8x2_storage_t>(fp8x4.__x & 0x);
+hi_part.__x = static_cast<__nv_fp8x2_storage_t>((fp8x4.__x >> 16) & 
0x);
+__half2 lo_half2 = static_cast<__half2>(lo_part);
+__half2 hi_half2 = static_cast<__half2>(hi_part);
+x = reinterpret_cast<__half*>(_half2)[0];
+y = reinterpret_cast<__half*>(_half2)[1];
+z = reinterpret_cast<__half*>(_half2)[0];
+w = reinterpret_cast<__half*>(_half2)[1];
+  }
+  __host__ __device__ explicit operator __nv_fp8x4_e5m2() const {
+__nv_fp8x4_e5m2 result;
+__half2 lo_half2 = *reinterpret_cast();
+__half2 hi_half2 = *reinterpret_cast();
+__nv_fp8x2_e5m2 lo_part(lo_half2), hi_part(hi_half2);
+result.__x =
+(static_cast<__uint32_t>(lo_part.__x) | 
(static_cast<__uint32_t>(hi_part.__x) << 16));
+return result;
+  }
+  )";
 }
 stream << R"(
 };



Re: [PR] [Fix] Fix build errors with VS2022 [tvm]

2024-03-26 Thread via GitHub


tqchen commented on PR #16790:
URL: https://github.com/apache/tvm/pull/16790#issuecomment-2020360881

   Thank you @Jiawei-Shao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Fix] Fix build errors with VS2022 [tvm]

2024-03-26 Thread via GitHub


tqchen merged PR #16790:
URL: https://github.com/apache/tvm/pull/16790


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Codegen, Cuda] Add overload for fp8x4 e5m2 <-> half4 conversion [tvm]

2024-03-26 Thread via GitHub


tqchen merged PR #16787:
URL: https://github.com/apache/tvm/pull/16787


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(tvm) branch main updated (b2204ae698 -> 69c091400a)

2024-03-26 Thread tqchen
This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


from b2204ae698 [IR] Default to empty attributes, instead of NULL (#16745)
 add 69c091400a [Fix] Fix build errors with VS2022 (#16790)

No new revisions were added by this update.

Summary of changes:
 src/runtime/metadata.cc | 3 +--
 src/tir/analysis/identify_memcpy.cc | 2 +-
 src/tir/contrib/ethosu/passes.cc| 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)



[PR] [TIR] Modify IntImmNode deep_equal to match regardless of type [tvm]

2024-03-26 Thread via GitHub


quic-sanirudh opened a new pull request, #16795:
URL: https://github.com/apache/tvm/pull/16795

   This patch makes a small change to compare the values of IntImmNode to see 
if they're equal when performing a deep_equal of expressions. This is to try 
and align it with how the 
[`PEqualChecker`](https://github.com/apache/tvm/blob/b2204ae6988c7745ea9736340ccd900bc21ae821/src/arith/pattern_match.h#L168)
 works where we only compare the values if both are IntImm.
   
   This caused some simplifications to be inconsistent based on whether we used 
IntImmNode or PrimExpr to pass an integer between different passes, and it 
seemed to make more sense to say that if the values are equal, then we can 
conclude the immediates are equal.
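
   To make the intended behaviour concrete, the snippet below (an illustration,
not part of the patch) builds two immediates with equal values but different
dtypes, which is exactly the case this change relaxes:

```python
import tvm
from tvm import tir

a = tir.IntImm("int32", 4)
b = tir.IntImm("int64", 4)
print(a.value == b.value)  # True: the values match
print(a.dtype, b.dtype)    # int32 int64: the dtypes differ
# Without this patch the deep-equality check is expected to report False here,
# since it also takes the dtype into account.
print(tvm.tir.analysis.expr_deep_equal(a, b))
```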


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Frontend][PaddlePaddle] Update the export method of PaddlePaddle Softmax [tvm]

2024-03-26 Thread via GitHub


Zheng-Bicheng commented on PR #16653:
URL: https://github.com/apache/tvm/pull/16653#issuecomment-2020098015

   > Hello @leandron. I found in 
[cmsis.py](https://github.com/apache/tvm/blob/ff3716b83a72c2ff261c492f259e1fcd260600ce/python/tvm/relay/op/contrib/cmsisnn.py#L90)
 that the scale of softmax must be 1/256 and the zero point must be -128. Why 
is that? According to the formula Q(x_fp32, scale, zero_point) = 
round(x_fp32/scale) + zero_point, scale and zp should be adjustable (for 
example, in the case where scale is 1/128 and zp is 0, it should still meet the 
conditions for int8), right?
   
   By the way, in my testing of the Paddle model, the scale is 0.0078649195 
(close to 1/127), and the zero point is 0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Frontend][PaddlePaddle] Update the export method of PaddlePaddle Softmax [tvm]

2024-03-26 Thread via GitHub


Zheng-Bicheng commented on PR #16653:
URL: https://github.com/apache/tvm/pull/16653#issuecomment-2020092045

   Hello @leandron. I found in 
[cmsis.py](https://github.com/apache/tvm/blob/ff3716b83a72c2ff261c492f259e1fcd260600ce/python/tvm/relay/op/contrib/cmsisnn.py#L90)
 that the scale of softmax must be 1/256 and the zero point must be -128. Why 
is that? According to the formula Q(x_fp32, scale, zero_point) = 
round(x_fp32/scale) + zero_point, scale and zp should be adjustable (for 
example, in the case where scale is 1/128 and zp is 0, it should still meet the 
conditions for int8), right?
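
   As a small numeric illustration of the formula quoted above (plain NumPy, not
CMSIS-NN code): scale 1/256 with zero point -128 maps the softmax output range
[0, 1] onto the full int8 range, while scale 1/128 with zero point 0 only uses
half of it.

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, clipped to int8
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

x = np.array([0.0, 0.5, 1.0])
print(quantize(x, scale=1 / 256, zero_point=-128))  # [-128    0  127]
print(quantize(x, scale=1 / 128, zero_point=0))     # [   0   64  127]
```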


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [SME] Target parser support for SME [tvm]

2024-03-26 Thread via GitHub


lhutton1 opened a new pull request, #16794:
URL: https://github.com/apache/tvm/pull/16794

   This commit adds support for recognising when the SME architecture feature 
is available based on the target string. A Python user can use 
`target.features.has_sme` to check availability.
   
   This PR relies on #16425
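
   A hedged usage sketch of the check mentioned above; the target triple and the
`+sme` attribute string below are illustrative assumptions, not taken from this
PR.

```python
# Illustration only: query the parsed target features for SME support.
import tvm

target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mattr=+v9.2a,+sme")
print(bool(target.features.has_sme))  # True when the target parser reports SME
```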


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug] Tensorization Failure During Multilevel Tiling with Tensor Intrin [tvm]

2024-03-26 Thread via GitHub


krishnab30 commented on issue #16614:
URL: https://github.com/apache/tvm/issues/16614#issuecomment-2020074885

   Hi @zxybazh, I am facing the same issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features [tvm]

2024-03-26 Thread via GitHub


cbalint13 commented on PR #16425:
URL: https://github.com/apache/tvm/pull/16425#issuecomment-2019978336

   > Here is a reproducer:
   > mem_leak.cpp
   
   Thanks a lot for this, I start to look at it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features [tvm]

2024-03-26 Thread via GitHub


lhutton1 commented on PR #16425:
URL: https://github.com/apache/tvm/pull/16425#issuecomment-2019958120

   Here is a reproducer:
   mem_leak.cpp
   ```c++
   #include "tvm/runtime/registry.h"
   #include "tvm/target/target.h"
   
   int main() {
 auto pf = tvm::runtime::Registry::Get("target.llvm_get_cpu_archlist");
 (*pf)(tvm::Target("llvm"));
   }
   ```
   Compile:
   ```bash
   g++ -std=c++17 -O2 -fPIC -I{TVM_DIR}/include 
-I{TVM_DIR}/3rdparty/dmlc-core/include -I{TVM_DIR}/tvm/3rdparty/dlpack/include 
-DDMLC_USE_LOGGING_LIBRARY=\ -o mem_leak_exec 
mem_leak.cpp -L{TVM_BUILD_DIR} -ldl -ltvm -pthread
   ```
   Run with valgrind:
   ```bash
   LD_PRELOAD="{TVM_BUILD_DIR}/libtvm.so" valgrind --leak-check=full -v 
--track-origins=yes ./mem_leak_exec
   ```
   
   Output:
   ```
   ...
   ==475237== 12,369 (1,560 direct, 10,809 indirect) bytes in 1 blocks are 
definitely lost in loss record 42,596 of 42,630
   ==475237==at 0x4849013: operator new(unsigned long) (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
   ==475237==by 0x12244479: ??? (in 
/usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
   ==475237==by 0xBC0131B: 
llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, 
llvm::StringRef, llvm::TargetOptions const&, std::optional, 
std::optional, llvm::CodeGenOpt::Level, bool) const 
(TargetRegistry.h:488)
   ==475237==by 0xBBFBC05: 
tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, 
std::__cxx11::basic_string, std::allocator > 
const&, std::__cxx11::basic_string, 
std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, llvm::TargetOptions 
const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, 
llvm::CodeGenOpt::Level const&) (llvm_instance.cc:393)
   ==475237==by 0xBBFBD8A: 
tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, std::__cxx11::basic_string, 
std::allocator > const&) (llvm_instance.cc:408)
   ==475237==by 0xBBFEAE1: 
tvm::codegen::LLVMTargetInfo::GetAllLLVMTargetArches() const 
(llvm_instance.cc:835)
   ==475237==by 0xBBFA2BB: 
tvm::codegen::LLVMTargetInfo::LLVMTargetInfo(tvm::codegen::LLVMInstance&, 
tvm::Target const&) (llvm_instance.cc:218)
   ==475237==by 0xBC0EE16: tvm::codegen::__mk_TVM8::{lambda(tvm::Target 
const&)#1}::operator()(tvm::Target const) const (llvm_module.cc:695)
   ==475237==by 0xBC188B0: 
tvm::runtime::TypedPackedFunc 
(tvm::Target 
const&)>::AssignTypedLambda(tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}, 
std::__cxx11::basic_string, std::allocator 
>)::{lambda(tvm::runtime::TVMArgs const&, 
tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, 
tvm::runtime::TVMRetValue) const (packed_func.h:1826)
   ==475237==by 0xBC233EE: 
tvm::runtime::PackedFuncObj::Extractor (tvm::Target 
const&)>::AssignTypedLambda(tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}, 
std::__cxx11::basic_string, std::allocator 
>)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> 
>::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string, std::allocator >, tvm::runtime::TVMRetValue) 
(packed_func.h:1252)
   ==475237==by 0x1092F8: main (in 
/workspaces/tvm-ethosn/src/tvm/test_mem_leak/cpp_deploy)
   ...
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [Fix] fix for numpy 2.0 Compatibility [tvm]

2024-03-26 Thread via GitHub


mshr-h opened a new pull request, #16793:
URL: https://github.com/apache/tvm/pull/16793

   I checked the entire tvm codebase with `ruff check . --select NPY201` and no 
other deprecations were detected.
   Changes are below.
   - use `-np.inf` instead of `np.NINF`
   - use `np.inf` instead of `np.infty`
   - better attribute existence check #16780 
   
   ref: [NumPy 2.0 migration guide — NumPy v2.1.dev0 
Manual](https://numpy.org/devdocs/numpy_2_0_migration_guide.html)
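
   For reference, a tiny self-contained example of the replacements listed above
(it runs on both NumPy 1.x and 2.x):

```python
import numpy as np

lowest = -np.inf   # replaces np.NINF, which was removed in NumPy 2.0
highest = np.inf   # replaces np.infty, which was removed in NumPy 2.0
print(lowest < 0.0 < highest)  # True
```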


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [CI][AArch64] Enable ONNX and PyTorch tests on AArch64 [tvm]

2024-03-26 Thread via GitHub


Liam-Sturge commented on PR #16747:
URL: https://github.com/apache/tvm/pull/16747#issuecomment-2019939171

   OK, thanks for your feedback @tqchen. I see your point about overdoing 
the testing. Do we currently have any nightly test setup that I could add these 
tests to? I didn't see anything when I looked, but 
I may have missed something.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Fix] Fix build errors with VS2022 [tvm]

2024-03-26 Thread via GitHub


Jiawei-Shao commented on PR #16790:
URL: https://github.com/apache/tvm/pull/16790#issuecomment-2019840094

   @tqchen PTAL, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [Doc] Fix set_axis_separator example [tvm]

2024-03-26 Thread via GitHub


quic-sanirudh opened a new pull request, #16792:
URL: https://github.com/apache/tvm/pull/16792

   Minor fix to update the `set_axis_separator` example to match the definition


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org