[GitHub] [incubator-tvm] comaniac commented on pull request #6867: Fix bug in processing script
comaniac commented on pull request #6867: URL: https://github.com/apache/incubator-tvm/pull/6867#issuecomment-723392973 Thanks @hogepodge
[GitHub] [incubator-tvm] comaniac merged pull request #6867: Fix bug in processing script
comaniac merged pull request #6867: URL: https://github.com/apache/incubator-tvm/pull/6867
[incubator-tvm] branch main updated: Fix bug in processing script (#6867)
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git

The following commit(s) were added to refs/heads/main by this push:
     new b7b69a2  Fix bug in processing script (#6867)
b7b69a2 is described below

commit b7b69a2d1dbfe7a9cd04ddab2e60f33654419d58
Author: Chris Hoge
AuthorDate: Fri Nov 6 21:07:08 2020 -0800

    Fix bug in processing script (#6867)

    The argsort command returns a new array that is the sorted index rather than
    a new sorted value array. This patch stores the sorted index in a new variable
    and uses it to reference the predicted values.
---
 tutorials/get_started/tvmc_command_line_driver.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tutorials/get_started/tvmc_command_line_driver.py b/tutorials/get_started/tvmc_command_line_driver.py
index d844de5..bcdf03e 100644
--- a/tutorials/get_started/tvmc_command_line_driver.py
+++ b/tutorials/get_started/tvmc_command_line_driver.py
@@ -246,10 +246,10 @@ if os.path.exists(output_file):
     with np.load(output_file) as data:
         scores = softmax(data["output_0"])
         scores = np.squeeze(scores)
-        scores = np.argsort(scores)[::-1]
+        ranks = np.argsort(scores)[::-1]

-        for i in scores[0:5]:
-            print("class='%s' with probability=%f" % (labels[i], scores[i]))
+        for rank in ranks[0:5]:
+            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
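To make the fix concrete outside the tutorial, here is a minimal, self-contained sketch of the corrected post-processing. The scores and labels below are invented for illustration; only the `np.argsort` behavior is the point.

```python
import numpy as np

# Hypothetical stand-ins for the tutorial's softmax output and label list.
scores = np.array([0.10, 0.70, 0.05, 0.15])
labels = ["cat", "dog", "fish", "bird"]

# np.argsort returns the indices that would sort the array, not sorted values,
# so the result must be kept in its own variable and used to index the originals.
ranks = np.argsort(scores)[::-1]  # descending confidence, here [1, 3, 0, 2]

for rank in ranks[:3]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
```

Overwriting `scores` with the index array, as the pre-fix code did, makes the loop print index values instead of probabilities.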
[GitHub] [incubator-tvm] vinx13 commented on pull request #6840: conv1d_transpose speedup
vinx13 commented on pull request #6840: URL: https://github.com/apache/incubator-tvm/pull/6840#issuecomment-723379937 Thanks @alexgl-github @anijain2305 @giuseros
[incubator-tvm] branch main updated: conv1d_transpose speedup. (#6840)
This is an automated email from the ASF dual-hosted git repository. wuwei pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git The following commit(s) were added to refs/heads/main by this push: new f0979e4 conv1d_transpose speedup. (#6840) f0979e4 is described below commit f0979e4207d8e61c470f86f7ee0137402330b650 Author: Alex Gladkov AuthorDate: Fri Nov 6 18:53:54 2020 -0800 conv1d_transpose speedup. (#6840) Improve performance of transposed convolution by avoiding redundant multiplication by zero values from dilated data. Co-authored-by: Ubuntu --- python/tvm/topi/cuda/conv1d_transpose_ncw.py | 75 +++--- .../topi/python/test_topi_conv1d_transpose_ncw.py | 4 ++ 2 files changed, 40 insertions(+), 39 deletions(-) diff --git a/python/tvm/topi/cuda/conv1d_transpose_ncw.py b/python/tvm/topi/cuda/conv1d_transpose_ncw.py index 1ddbdcc..58f53ea 100644 --- a/python/tvm/topi/cuda/conv1d_transpose_ncw.py +++ b/python/tvm/topi/cuda/conv1d_transpose_ncw.py @@ -65,29 +65,46 @@ def conv1d_transpose_ncw(cfg, data, kernel, stride, padding, out_dtype, output_p out_width = (inp_width - 1) * stride + kernel_size - pad_left - pad_right + output_padding pad_left = kernel_size - 1 - pad_left pad_right = kernel_size - 1 - pad_right + output_padding -dilated_width = stride * (inp_width - 1) + 1 -data = te.compute( -(batch, inp_channels, pad_left + dilated_width + pad_right), +padded_width = pad_left + inp_width + pad_right + +padded_data = te.compute( +(batch, inp_channels, padded_width), lambda n, c, x: tvm.tir.if_then_else( -tvm.tir.all( -x >= pad_left, -x < pad_left + dilated_width, -tvm.tir.indexmod(x - pad_left, stride).equal(0), -), -data[n, c, tvm.tir.indexdiv(x - pad_left, stride)], +tvm.tir.all(x >= pad_left, x < pad_left + inp_width), +data[n, c, x - pad_left], tvm.tir.const(0.0, "float32"), ), name="data_pad", ) -dc = te.reduce_axis((0, inp_channels), name="dc") -dw = te.reduce_axis((0, kernel_size), name="dw") +padded_kernel = te.compute( +(inp_channels, out_channels, kernel_size + stride - 1), +lambda ci, co, k: tvm.tir.if_then_else( +tvm.tir.all(k < kernel_size), +kernel[ci, co, kernel_size - k - 1], +tvm.tir.const(0.0, "float32"), +), +name="kernel_pad", +) + +ci = te.reduce_axis((0, inp_channels), name="ci") +k = te.reduce_axis((0, tvm.tir.indexdiv(kernel_size + stride - 1, stride)), name="k") +border = pad_left * (stride - 1) + +# Skip multiplication by 0 values in the input data inserted when stride is greater then 1. 
+# During multiplication of kernel by padded data: +# Kernel indices are: 0, 1 * stride, 2 * stride, ..., ceil(kernel_size / stride) plus +# data offset mod stride data_out = te.compute( (batch, out_channels, out_width), -lambda b, c, w: te.sum( -data[b, dc, w + dw].astype(out_dtype) -* kernel[dc, c, kernel_size - 1 - dw].astype(out_dtype), -axis=[dc, dw], +lambda b, co, w: te.sum( +padded_data[b, ci, tvm.tir.indexdiv(border + w + stride - 1, stride) + k].astype( +out_dtype +) +* padded_kernel[ +ci, co, k * stride + tvm.tir.indexmod(stride - w - border, stride) +].astype(out_dtype), +axis=[ci, k], ), tag="conv1d_transpose_ncw", ) @@ -118,8 +135,8 @@ def schedule_conv1d_transpose_ncw(cfg, outs): def _callback(op): if op.tag == "conv1d_transpose_ncw": -pad_data = op.input_tensors[0] -kernel = op.input_tensors[1] +padded_data = op.input_tensors[0] +padded_kernel = op.input_tensors[1] conv = op.output(0) # space definition begin # @@ -139,9 +156,6 @@ def schedule_conv1d_transpose_ncw(cfg, outs): # space definition end # -if isinstance(kernel.op, tvm.te.ComputeOp) and "dilate" in kernel.op.tag: -s[kernel].compute_inline() - if conv.op in s.outputs: output = conv OL = s.cache_write(conv, "local") @@ -150,10 +164,8 @@ def schedule_conv1d_transpose_ncw(cfg, outs): s[conv].set_scope("local") OL = conv -# create cache stage -s[pad_data].set_scope("shared") -AA = pad_data -WW = s.cache_read(kernel, "shared", [OL]) +s[padded_kernel].compute_inline() +s[padded_data].compute_inline() # tile and bind spatial axes n, f, x = s[output].op.axis @@ -172,9 +184,6 @@ def schedule_con
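The optimization above can be illustrated with a toy NumPy sketch. The arrays and sizes are made up, and the real implementation works on TE tensors with indexdiv/indexmod arithmetic rather than NumPy.

```python
import numpy as np

# A transposed convolution can be computed by dilating the input with
# (stride - 1) zeros between elements and then running a regular convolution.
data = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.5, -1.0, 0.25])
stride = 2

dilated = np.zeros((len(data) - 1) * stride + 1)
dilated[::stride] = data  # [1, 0, 2, 0, 3] -- the inserted zeros are wasted work

naive = np.convolve(dilated, kernel)  # many multiply-accumulates hit a zero
print(naive)

# The patch instead pads the data and kernel and indexes the original,
# un-dilated input directly, so only the non-zero taps are accumulated and
# roughly a 1/stride fraction of the multiplications is needed.
```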
[GitHub] [incubator-tvm] vinx13 merged pull request #6840: conv1d_transpose speedup
vinx13 merged pull request #6840: URL: https://github.com/apache/incubator-tvm/pull/6840
[GitHub] [incubator-tvm] vinx13 commented on pull request #6714: More flexible conv2d_NCHWc_int8 generic operator.
vinx13 commented on pull request #6714: URL: https://github.com/apache/incubator-tvm/pull/6714#issuecomment-723378800 @cbalint13 could you try restarting the CI?
[incubator-tvm] branch main updated: making quantization tweaks (#6731)
This is an automated email from the ASF dual-hosted git repository. wuwei pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git The following commit(s) were added to refs/heads/main by this push: new ff9c480 making quantization tweaks (#6731) ff9c480 is described below commit ff9c4803913b82085f281c98afbd54feedefeb7c Author: Thierry Moreau AuthorDate: Fri Nov 6 18:20:56 2020 -0800 making quantization tweaks (#6731) --- python/tvm/relay/quantize/_annotate.py | 43 ++ src/relay/quantize/realize.cc | 36 2 files changed, 79 insertions(+) diff --git a/python/tvm/relay/quantize/_annotate.py b/python/tvm/relay/quantize/_annotate.py index b187387..6c395e2 100644 --- a/python/tvm/relay/quantize/_annotate.py +++ b/python/tvm/relay/quantize/_annotate.py @@ -175,6 +175,28 @@ def conv2d_rewrite(ref_call, new_args, ctx): return QAnnotateExpr(expr, QAnnotateKind.ACTIVATION) +@register_annotate_function("nn.conv1d") +def conv1d_rewrite(ref_call, new_args, ctx): +"""Rewrite function for conv1d. Lhs of conv will be quantized to +input field, and rhs of conv will be quantized to weight field. +Output would be in activation field""" +if quantize_context().check_to_skip(ref_call): +return None + +lhs_expr, lhs_kind = _get_expr_kind(new_args[0]) +rhs_expr, rhs_kind = _get_expr_kind(new_args[1]) + +if lhs_kind is None or lhs_kind == QAnnotateKind.ACTIVATION: +lhs_expr = attach_simulated_quantize(lhs_expr, QAnnotateKind.INPUT) + +assert rhs_kind is None +rhs_expr = attach_simulated_quantize(rhs_expr, QAnnotateKind.WEIGHT) + +expr = _forward_op(ref_call, [lhs_expr, rhs_expr]) + +return QAnnotateExpr(expr, QAnnotateKind.ACTIVATION) + + @register_annotate_function("nn.dense") def dense_rewrite(ref_call, new_args, ctx): """Rewrite function for dense. 
Lhs of dense will be quantized to input field, and rhs of @@ -289,6 +311,8 @@ register_annotate_function("clip", identity_rewrite) register_annotate_function("nn.relu", identity_rewrite) register_annotate_function("strided_slice", identity_rewrite) register_annotate_function("nn.avg_pool2d", identity_rewrite) +register_annotate_function("nn.batch_flatten", identity_rewrite) +register_annotate_function("transpose", identity_rewrite) register_annotate_function("annotation.stop_fusion", identity_rewrite) @@ -311,6 +335,25 @@ def pool2d_rewrite(ref_call, new_args, ctx): register_annotate_function("nn.max_pool2d", pool2d_rewrite) +def pool1d_rewrite(ref_call, new_args, ctx): +"""Rewrite function for max pool1d""" +if quantize_context().check_to_skip(ref_call): +return None + +expr, x_kind = _get_expr_kind(new_args[0]) + +if x_kind is None: +return None +if x_kind == QAnnotateKind.ACTIVATION: +expr = attach_simulated_quantize(expr, QAnnotateKind.INPUT) + +expr = _forward_op(ref_call, [expr]) +return QAnnotateExpr(expr, QAnnotateKind.INPUT) + + +register_annotate_function("nn.max_pool1d", pool1d_rewrite) + + @register_annotate_function("annotation.cast_hint") def cast_hint_rewrite(ref_call, new_args, ctx): """Rewrite function to force cast""" diff --git a/src/relay/quantize/realize.cc b/src/relay/quantize/realize.cc index 8db72a3..2716c6e 100644 --- a/src/relay/quantize/realize.cc +++ b/src/relay/quantize/realize.cc @@ -234,6 +234,37 @@ Expr Conv2dRealize(const Call& ref_call, const Array& new_args, const Obje RELAY_REGISTER_OP("nn.conv2d").set_attr("FQRealizeRewrite", Conv2dRealize); +Expr Conv1dRealize(const Call& ref_call, const Array& new_args, const ObjectRef& ctx) { + const QConfig& cfg = QConfig::Current(); + CHECK_EQ(new_args.size(), 2); + if (!new_args[0]->IsInstance() && !new_args[1]->IsInstance()) { +return Expr(nullptr); + } + const auto* lhs = new_args[0].as(); + CHECK(lhs); + const auto* rhs = new_args[1].as(); + CHECK(rhs); + + Expr ldata = lhs->data; + if (lhs->dtype != cfg->dtype_input) { +ldata = Cast(ldata, cfg->dtype_input); + } + Expr rdata = Cast(rhs->data, cfg->dtype_weight); + + const auto ref_attrs = ref_call->attrs.as(); + auto attrs = make_object(); + *attrs = *ref_attrs; + DataType out_dtype = cfg->dtype_activation; + attrs->out_dtype = out_dtype; + + Expr ret = Call(ref_call->op, {ldata, rdata}, Attrs(attrs), ref_call->type_args); + Expr mul = Multiply(lhs->dom_scale, rhs->dom_scale); + Expr dom_scale = FoldConstantOpt(mul); + return QRealizeIntExpr(ret, dom_scale, out_dtype); +} + +RELAY_REGISTER_OP("nn.conv1d").set_attr("FQRealizeRewrite", Conv1dRealize); + Expr DenseRealize(const Call& ref_call, const Array& new_args, const ObjectRef& ctx) { const QConfig& cfg = QConfig::Current(); ICHECK_EQ(new_args.size(), 2); @@ -449,6 +480,8 @@ RELAY_REGISTER_OP("strided_slice").set_attr("FQRealizeRewrite", RELAY_REGISTER_OP("nn.batch_flatten")
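For context, a brief usage sketch of how these registrations are exercised: with `nn.conv1d`, `nn.max_pool1d`, `nn.batch_flatten`, and `transpose` now handled by the annotate/realize passes, a model containing 1-D convolutions can go through the standard quantization flow. The qconfig values below are placeholders, not recommendations.

```python
from tvm import relay

def quantize_conv1d_model(mod, params):
    # nn.conv1d inputs/weights are annotated for simulated quantization and
    # later realized to integer ops, mirroring the existing conv2d path.
    with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
        return relay.quantize.quantize(mod, params=params)
```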
[GitHub] [incubator-tvm] vinx13 merged pull request #6731: [Quantization] Support for more ops (conv1d)
vinx13 merged pull request #6731: URL: https://github.com/apache/incubator-tvm/pull/6731
[incubator-tvm] branch main updated: Update search for bitcode files for rocm 3.9 (#6865)
This is an automated email from the ASF dual-hosted git repository. masahi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git The following commit(s) were added to refs/heads/main by this push: new 89ce1ed Update search for bitcode files for rocm 3.9 (#6865) 89ce1ed is described below commit 89ce1ed44904dd2a37237241c14f7e70bdc2729e Author: Thomas Viehmann AuthorDate: Sat Nov 7 02:41:04 2020 +0100 Update search for bitcode files for rocm 3.9 (#6865) rocm 3.9 moved the bitcodes, we adapt to that. As this gives opaque error messages that are hard to debug (loading the module fails with could not initialize shared object but does not tell you about the missing symbols), we tighten the checks at this stage: - we become more strict with missing bitcodes, - we let the linker fail loudly for unresolved symbols. --- python/tvm/contrib/rocm.py | 72 -- 1 file changed, 51 insertions(+), 21 deletions(-) diff --git a/python/tvm/contrib/rocm.py b/python/tvm/contrib/rocm.py index e69b255..4f62f1a 100644 --- a/python/tvm/contrib/rocm.py +++ b/python/tvm/contrib/rocm.py @@ -73,7 +73,20 @@ def rocm_link(in_file, out_file, lld=None): The lld linker, if not specified, we will try to guess the matched clang version. """ -args = [lld if lld is not None else find_lld()[0], "-shared", in_file, "-o", out_file] + +# if our result has undefined symbols, it will fail to load +# (hipModuleLoad/hipModuleLoadData), but with a somewhat opaque message +# so we have ld.lld check this here. +# If you get a complaint about missing symbols you might want to check the +# list of bitcode files below. +args = [ +lld if lld is not None else find_lld()[0], +"--no-undefined", +"-shared", +in_file, +"-o", +out_file, +] proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) (out, _) = proc.communicate() @@ -108,7 +121,7 @@ def callback_rocm_link(obj_bin): @tvm._ffi.register_func("tvm_callback_rocm_bitcode_path") -def callback_rocm_bitcode_path(rocdl_dir="/opt/rocm/lib/"): +def callback_rocm_bitcode_path(rocdl_dir=None): """Utility function to find ROCm device library bitcodes Parameters @@ -118,23 +131,40 @@ def callback_rocm_bitcode_path(rocdl_dir="/opt/rocm/lib/"): The default value is the standard location """ # seems link order matters. 
-bitcode_files = [ -"oclc_daz_opt_on.amdgcn.bc", -"ocml.amdgcn.bc", -"hc.amdgcn.bc", -"irif.amdgcn.bc", -"ockl.amdgcn.bc", -"oclc_correctly_rounded_sqrt_off.amdgcn.bc", -"oclc_correctly_rounded_sqrt_on.amdgcn.bc", -"oclc_daz_opt_off.amdgcn.bc", -"oclc_finite_only_off.amdgcn.bc", -"oclc_finite_only_on.amdgcn.bc", -"oclc_isa_version_803.amdgcn.bc", -"oclc_isa_version_900.amdgcn.bc", -"oclc_isa_version_906.amdgcn.bc", -"oclc_unsafe_math_off.amdgcn.bc", -"oclc_unsafe_math_on.amdgcn.bc", -"oclc_wavefrontsize64_on.amdgcn.bc", + +if rocdl_dir is None: +if exists("/opt/rocm/amdgcn/bitcode/"): +rocdl_dir = "/opt/rocm/amdgcn/bitcode/" # starting with rocm 3.9 +else: +rocdl_dir = "/opt/rocm/lib/" # until rocm 3.8 + +bitcode_names = [ +"oclc_daz_opt_on", +"ocml", +"hc", +"irif", # this does not exist in rocm 3.9, drop eventually +"ockl", +"oclc_correctly_rounded_sqrt_off", +"oclc_correctly_rounded_sqrt_on", +"oclc_daz_opt_off", +"oclc_finite_only_off", +"oclc_finite_only_on", +"oclc_isa_version_803", # todo (t-vi): an alternative might be to scan for the +"oclc_isa_version_900", # isa version files (if the linker throws out +"oclc_isa_version_906", # the unneeded ones or we filter for the arch we need) +"oclc_unsafe_math_off", +"oclc_unsafe_math_on", +"oclc_wavefrontsize64_on", ] -paths = [join(rocdl_dir, bitcode) for bitcode in bitcode_files] -return tvm.runtime.convert([path for path in paths if exists(path)]) + +bitcode_files = [] +for n in bitcode_names: +p = join(rocdl_dir, n + ".bc") # rocm >= 3.9 +if not exists(p): # rocm <= 3.8 +p = join(rocdl_dir, n + ".amdgcn.bc") +if exists(p): +bitcode_files.append(p) +elif "isa_version" not in n and n not in {"irif"}: +raise RuntimeError("could not find bitcode " + n) + +return tvm.runtime.convert(bitcode_files)
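Condensed, the lookup strategy the patch adopts looks roughly like the helper below (`find_bitcode` is not a function in the codebase; it simply restates the logic of the diff).

```python
from os.path import exists, join

def find_bitcode(name, rocdl_dir=None):
    # rocm >= 3.9 keeps the device-library bitcodes under amdgcn/bitcode/ and
    # drops the ".amdgcn" infix; older installs keep them in /opt/rocm/lib/.
    if rocdl_dir is None:
        if exists("/opt/rocm/amdgcn/bitcode/"):
            rocdl_dir = "/opt/rocm/amdgcn/bitcode/"
        else:
            rocdl_dir = "/opt/rocm/lib/"
    path = join(rocdl_dir, name + ".bc")  # naming used by rocm >= 3.9
    if not exists(path):
        path = join(rocdl_dir, name + ".amdgcn.bc")  # naming used by rocm <= 3.8
    return path if exists(path) else None
```

Bitcodes that are required but cannot be found under either name now raise an error, and the linker is invoked with `--no-undefined`, so missing symbols fail loudly at build time instead of as an opaque `hipModuleLoad` failure.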
[GitHub] [incubator-tvm] masahi commented on pull request #6865: Update search for bitcode files for rocm 3.9
masahi commented on pull request #6865: URL: https://github.com/apache/incubator-tvm/pull/6865#issuecomment-723370898 Thanks @t-vi
[GitHub] [incubator-tvm] masahi merged pull request #6865: Update search for bitcode files for rocm 3.9
masahi merged pull request #6865: URL: https://github.com/apache/incubator-tvm/pull/6865
[GitHub] [incubator-tvm] comaniac commented on a change in pull request #6872: [BYOC][TRT] Allocate GPU data buffers and transfer data when needed
comaniac commented on a change in pull request #6872: URL: https://github.com/apache/incubator-tvm/pull/6872#discussion_r519076124 ## File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc ## @@ -141,26 +150,38 @@ class TensorRTRuntime : public JSONRuntimeBase { #else ICHECK(context->execute(batch_size_, bindings.data())) << "Running TensorRT failed."; #endif + +// Copy outputs from GPU buffers if needed. +for (size_t i = 0; i < outputs_.size(); ++i) { + uint32_t eid = EntryID(outputs_[i]); + const std::string& name = engine_and_context.outputs[i]; + int binding_index = engine->getBindingIndex(name.c_str()); + ICHECK_NE(binding_index, -1); + if (data_entry_[eid]->ctx.device_type != kDLGPU) { + device_buffers[binding_index].CopyTo(const_cast(data_entry_[eid])); + } +} } private: /*! * \brief Build TensorRT engine from JSON representation. */ void BuildEngine() { +if (trt_engine_cache_.count(symbol_name_)) return; Review comment: Improve the docstring to explicitly mention the caching functionality. ## File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc ## @@ -217,6 +231,20 @@ void TensorRTBuilder::CleanUp() { } } +void TensorRTBuilder::AllocateDeviceBufferIfNeeded(nvinfer1::ICudaEngine* engine, Review comment: We can just name it `AllocateDeviceBuffer` and add comments to mention we will bypass if the data entry is already on the GPU. ## File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc ## @@ -106,9 +104,11 @@ class TensorRTRuntime : public JSONRuntimeBase { #ifdef TVM_GRAPH_RUNTIME_TENSORRT /*! \brief Run inference using built engine. */ void Run() override { +BuildEngine(); Review comment: Is the reason of moving `BuildEngine` from `Init` to `Run` because you need subgraph specific information (e.g., I/O data entry IDs) to allocate device buffers? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] jroesch opened a new pull request #6874: [Diagnostics] Enable AnnotateSpans by default, add environment variable for controlling top-level.
jroesch opened a new pull request #6874: URL: https://github.com/apache/incubator-tvm/pull/6874 Fix some parsing issues from NMS and add top-level environment variables.
[GitHub] [incubator-tvm] merrymercy edited a comment on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy edited a comment on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357787 @Hecmay It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096), the size is around 400 MB) It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This should work well for GPU, but you should not do this for CPU. Luckily, this problem is fixed in the upstream version because we use a different implementation in the upstream version. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy commented on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy commented on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357787 @Hecmay It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096), the size is around 400 MB) It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This should work well for GPU, but you should not do this for CPU. This problem is fixed in the upstream version. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy removed a comment on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy removed a comment on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357031 It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096), the size is around 400 MB) It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This should work well for GPU, but you should not do this for CPU. This problem is fixed in the upstream version. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy edited a comment on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy edited a comment on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357031 It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096), the size is around 400 MB) It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This should work well for GPU, but you should not do this for CPU. This problem is fixed in the upstream version. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy edited a comment on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy edited a comment on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357031 It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096), the size is around 400 MB) It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This should work well for GPU, but you should not do this for CPU. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy edited a comment on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy edited a comment on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357031 It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096), the size is around 400 MB) It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy edited a comment on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy edited a comment on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357031 It seems you are using our dev branch. I did some debugging and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096)). It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] merrymercy commented on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
merrymercy commented on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723357031 It seems you are using our dev branch. I did some debug and found that the problem is because the weight tensor in your dense layer is too large (the shape is (25088, 4096)). It causes some problems in the RPC copy. You can apply this diff in the `Ansor-dev` repo to fix the problem. ```diff diff --git a/python/tvm/ansor/measure.py b/python/tvm/ansor/measure.py index 7d06a5d96..93c293a5f 100644 --- a/python/tvm/ansor/measure.py +++ b/python/tvm/ansor/measure.py @@ -477,7 +477,9 @@ def rpc_run_worker(index): if get_special_buffer(arg.name) is not None: args.append(ndarray.array(get_special_buffer(arg.name))) else: - args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) + #args.append(ndarray.non_empty(get_const_tuple(arg.shape), arg.dtype, ctx)) +tmp = ndarray.empty(get_const_tuple(arg.shape), arg.dtype, ctx) +args.append(ndarray.array(tmp, ctx=ctx)) ctx.sync() # retry until the coefficient of variation is small enough ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] manupa-arm commented on a change in pull request #6777: [BYOC] Configurable optimize pass for PartitionGraph
manupa-arm commented on a change in pull request #6777: URL: https://github.com/apache/incubator-tvm/pull/6777#discussion_r519063704 ## File path: python/tvm/relay/op/contrib/arm_compute_lib.py ## @@ -62,7 +72,7 @@ def partition_for_arm_compute_lib(mod, params=None): transform.InferType(), transform.MergeComposite(arm_compute_lib_pattern_table()), transform.AnnotateTarget("arm_compute_lib"), -transform.PartitionGraph(), +transform.PartitionGraph(optimize), Review comment: OK, sounds good :). I think PartitionGraph should just perform partitioning unless it's really unavoidable to perform the needed post-processing after the pass.
[GitHub] [incubator-tvm] comaniac commented on a change in pull request #6777: [BYOC] Configurable optimize pass for PartitionGraph
comaniac commented on a change in pull request #6777: URL: https://github.com/apache/incubator-tvm/pull/6777#discussion_r519061574 ## File path: python/tvm/relay/op/contrib/arm_compute_lib.py ## @@ -62,7 +72,7 @@ def partition_for_arm_compute_lib(mod, params=None): transform.InferType(), transform.MergeComposite(arm_compute_lib_pattern_table()), transform.AnnotateTarget("arm_compute_lib"), -transform.PartitionGraph(), +transform.PartitionGraph(optimize), Review comment: As said in that PR, we can definitely do that. In fact we are planning an RFC right now, so depending on the RFC, we probably won't need this PR anymore.
[GitHub] [incubator-tvm] manupa-arm commented on a change in pull request #6777: [BYOC] Configurable optimize pass for PartitionGraph
manupa-arm commented on a change in pull request #6777: URL: https://github.com/apache/incubator-tvm/pull/6777#discussion_r519057683 ## File path: python/tvm/relay/op/contrib/arm_compute_lib.py ## @@ -62,7 +72,7 @@ def partition_for_arm_compute_lib(mod, params=None): transform.InferType(), transform.MergeComposite(arm_compute_lib_pattern_table()), transform.AnnotateTarget("arm_compute_lib"), -transform.PartitionGraph(), +transform.PartitionGraph(optimize), Review comment: Hmm, then why not after PartitionGraph ? (using kCompiler as a filter) ?
[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6868: [WIP][TOPI][OP] cuda for argwhere
tkonolige commented on a change in pull request #6868: URL: https://github.com/apache/incubator-tvm/pull/6868#discussion_r519057540 ## File path: python/tvm/topi/cuda/argwhere.py ## @@ -0,0 +1,621 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# pylint: disable=too-many-arguments, invalid-name +"""Argwhere operator""" + +import logging + +import tvm +from tvm import te +from tvm._ffi import get_global_func +from .injective import schedule_injective_from_existing +from .nms import atomic_add +from .sort import topk, topk_thrust, argsort, argsort_thrust +from .. import tag +from ..transform import strided_slice, adv_index, squeeze + +logger = logging.getLogger("topi") + + +def _get_sort_func(mode=0): +"""Get sort function for argwhere. mode 0 for topk and others for argsort.""" +if get_global_func("tvm.contrib.thrust.sort", allow_missing=True): +ret = topk_thrust if mode == 0 else argsort_thrust +else: +logger.warn( +"It's highly recommended to enable thrust library with set(USE_THRUST ON)" +" when compiling argwhere for cuda target. Otherwise, it can result in" +" significant performance degradation or incorrect result" +) +ret = topk if mode == 0 else argsort + +return ret + + +def argwhere_1d_ir(condition, out): +"""Low level IR for argwhere 1D + +Parameters +-- +condition : Buffer +The condition buffer. + +out : Buffer +The output buffer. + +Returns +--- +stmt : Stmt +The result IR statement. +""" +ib = tvm.tir.ir_builder.create() +a0 = condition.shape[0] + +condition = ib.buffer_ptr(condition) +out = ib.buffer_ptr(out) + +valid_index = ib.allocate("int32", (1,), name="valid_index", scope="global") +tmp = ib.allocate("int32", (1,), name="tmp", scope="local") +one_count = tvm.tir.const(1, dtype="int32") + +max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) +nthread_tx = max_threads +# Limit threads to a single block to make sure atomic_add works normally. +tx = te.thread_axis("threadIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +len_inner_for = a0 // nthread_tx + 1 +valid_index[0] = 0 + +with ib.for_range(0, len_inner_for, name="i") as i: +idx = tx * len_inner_for + i +with ib.if_scope(idx < a0): +with ib.if_scope(condition[idx] != 0): +tmp[0] = atomic_add( +tvm.tir.call_intrin("handle", "tir.address_of", valid_index[0]), +one_count, +) +out[tmp[0]] = idx + +return ib.get() + + +def argwhere_1d(output_shape, condition): +"""Compute for argwhere 1D + +Parameters +-- +condition : list of int or tvm.tir.Any +The output shape + +out : tvm.te.Tensor +Tensor with boolean values. + +Returns +--- +stmt : Stmt +The result IR statement. 
+""" +condition_buf = tvm.tir.decl_buffer( +condition.shape, condition.dtype, "data_buf", data_alignment=8 +) +out_buf = tvm.tir.decl_buffer(output_shape, "int32", "out_buf", data_alignment=8) + +out = te.extern( +[output_shape], +[condition], +lambda ins, outs: argwhere_1d_ir(ins[0], outs[0]), +dtype=["int32"], +in_buffers=[condition_buf], +out_buffers=[out_buf], +name="argwhere_1d", +tag="argwhere1d_gpu", +) + +if out.shape[0] <= 1: +return out + +sorted_out = _get_sort_func()( +out, k=0, axis=0, ret_type="values", is_ascend="True", dtype="int32" +) + +return sorted_out + + +def argwhere_2d_ir(condition, out): +"""Low level IR for argwhere 2D + +Parameters +-- +condition : Buffer +The condition buffer. + +out : Buffer +The output buffer. + +Returns +--- +stmt : Stmt +The result IR statement. +""" +ib = tvm.tir.ir_builder.create() +a0 = condition.shape[0] +a1 = condition.shape[1] + +condition = ib.buffer_ptr(condition) +out = ib.buffer_ptr(out) + +valid_inde
[GitHub] [incubator-tvm] kevinthesun commented on a change in pull request #6868: [WIP][TOPI][OP] cuda for argwhere
kevinthesun commented on a change in pull request #6868: URL: https://github.com/apache/incubator-tvm/pull/6868#discussion_r519055922 ## File path: python/tvm/topi/cuda/argwhere.py ## @@ -0,0 +1,621 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# pylint: disable=too-many-arguments, invalid-name +"""Argwhere operator""" + +import logging + +import tvm +from tvm import te +from tvm._ffi import get_global_func +from .injective import schedule_injective_from_existing +from .nms import atomic_add +from .sort import topk, topk_thrust, argsort, argsort_thrust +from .. import tag +from ..transform import strided_slice, adv_index, squeeze + +logger = logging.getLogger("topi") + + +def _get_sort_func(mode=0): +"""Get sort function for argwhere. mode 0 for topk and others for argsort.""" +if get_global_func("tvm.contrib.thrust.sort", allow_missing=True): +ret = topk_thrust if mode == 0 else argsort_thrust +else: +logger.warn( +"It's highly recommended to enable thrust library with set(USE_THRUST ON)" +" when compiling argwhere for cuda target. Otherwise, it can result in" +" significant performance degradation or incorrect result" +) +ret = topk if mode == 0 else argsort + +return ret + + +def argwhere_1d_ir(condition, out): +"""Low level IR for argwhere 1D + +Parameters +-- +condition : Buffer +The condition buffer. + +out : Buffer +The output buffer. + +Returns +--- +stmt : Stmt +The result IR statement. +""" +ib = tvm.tir.ir_builder.create() +a0 = condition.shape[0] + +condition = ib.buffer_ptr(condition) +out = ib.buffer_ptr(out) + +valid_index = ib.allocate("int32", (1,), name="valid_index", scope="global") +tmp = ib.allocate("int32", (1,), name="tmp", scope="local") +one_count = tvm.tir.const(1, dtype="int32") + +max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) +nthread_tx = max_threads +# Limit threads to a single block to make sure atomic_add works normally. +tx = te.thread_axis("threadIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +len_inner_for = a0 // nthread_tx + 1 +valid_index[0] = 0 + +with ib.for_range(0, len_inner_for, name="i") as i: +idx = tx * len_inner_for + i +with ib.if_scope(idx < a0): +with ib.if_scope(condition[idx] != 0): +tmp[0] = atomic_add( +tvm.tir.call_intrin("handle", "tir.address_of", valid_index[0]), +one_count, +) +out[tmp[0]] = idx + +return ib.get() + + +def argwhere_1d(output_shape, condition): +"""Compute for argwhere 1D + +Parameters +-- +condition : list of int or tvm.tir.Any +The output shape + +out : tvm.te.Tensor +Tensor with boolean values. + +Returns +--- +stmt : Stmt +The result IR statement. 
+""" +condition_buf = tvm.tir.decl_buffer( +condition.shape, condition.dtype, "data_buf", data_alignment=8 +) +out_buf = tvm.tir.decl_buffer(output_shape, "int32", "out_buf", data_alignment=8) + +out = te.extern( +[output_shape], +[condition], +lambda ins, outs: argwhere_1d_ir(ins[0], outs[0]), +dtype=["int32"], +in_buffers=[condition_buf], +out_buffers=[out_buf], +name="argwhere_1d", +tag="argwhere1d_gpu", +) + +if out.shape[0] <= 1: +return out + +sorted_out = _get_sort_func()( +out, k=0, axis=0, ret_type="values", is_ascend="True", dtype="int32" +) + +return sorted_out + + +def argwhere_2d_ir(condition, out): +"""Low level IR for argwhere 2D + +Parameters +-- +condition : Buffer +The condition buffer. + +out : Buffer +The output buffer. + +Returns +--- +stmt : Stmt +The result IR statement. +""" +ib = tvm.tir.ir_builder.create() +a0 = condition.shape[0] +a1 = condition.shape[1] + +condition = ib.buffer_ptr(condition) +out = ib.buffer_ptr(out) + +valid_in
[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6868: [WIP][TOPI][OP] cuda for argwhere
tkonolige commented on a change in pull request #6868: URL: https://github.com/apache/incubator-tvm/pull/6868#discussion_r519047405 ## File path: python/tvm/topi/cuda/argwhere.py ## @@ -0,0 +1,621 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# pylint: disable=too-many-arguments, invalid-name +"""Argwhere operator""" + +import logging + +import tvm +from tvm import te +from tvm._ffi import get_global_func +from .injective import schedule_injective_from_existing +from .nms import atomic_add +from .sort import topk, topk_thrust, argsort, argsort_thrust +from .. import tag +from ..transform import strided_slice, adv_index, squeeze + +logger = logging.getLogger("topi") + + +def _get_sort_func(mode=0): +"""Get sort function for argwhere. mode 0 for topk and others for argsort.""" +if get_global_func("tvm.contrib.thrust.sort", allow_missing=True): +ret = topk_thrust if mode == 0 else argsort_thrust +else: +logger.warn( +"It's highly recommended to enable thrust library with set(USE_THRUST ON)" +" when compiling argwhere for cuda target. Otherwise, it can result in" +" significant performance degradation or incorrect result" +) +ret = topk if mode == 0 else argsort + +return ret + + +def argwhere_1d_ir(condition, out): +"""Low level IR for argwhere 1D + +Parameters +-- +condition : Buffer +The condition buffer. + +out : Buffer +The output buffer. + +Returns +--- +stmt : Stmt +The result IR statement. +""" +ib = tvm.tir.ir_builder.create() +a0 = condition.shape[0] + +condition = ib.buffer_ptr(condition) +out = ib.buffer_ptr(out) + +valid_index = ib.allocate("int32", (1,), name="valid_index", scope="global") +tmp = ib.allocate("int32", (1,), name="tmp", scope="local") +one_count = tvm.tir.const(1, dtype="int32") + +max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) +nthread_tx = max_threads +# Limit threads to a single block to make sure atomic_add works normally. +tx = te.thread_axis("threadIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +len_inner_for = a0 // nthread_tx + 1 +valid_index[0] = 0 + +with ib.for_range(0, len_inner_for, name="i") as i: +idx = tx * len_inner_for + i +with ib.if_scope(idx < a0): +with ib.if_scope(condition[idx] != 0): +tmp[0] = atomic_add( +tvm.tir.call_intrin("handle", "tir.address_of", valid_index[0]), +one_count, +) +out[tmp[0]] = idx + +return ib.get() + + +def argwhere_1d(output_shape, condition): +"""Compute for argwhere 1D + +Parameters +-- +condition : list of int or tvm.tir.Any +The output shape + +out : tvm.te.Tensor +Tensor with boolean values. + +Returns +--- +stmt : Stmt +The result IR statement. 
+""" +condition_buf = tvm.tir.decl_buffer( +condition.shape, condition.dtype, "data_buf", data_alignment=8 +) +out_buf = tvm.tir.decl_buffer(output_shape, "int32", "out_buf", data_alignment=8) + +out = te.extern( +[output_shape], +[condition], +lambda ins, outs: argwhere_1d_ir(ins[0], outs[0]), +dtype=["int32"], +in_buffers=[condition_buf], +out_buffers=[out_buf], +name="argwhere_1d", +tag="argwhere1d_gpu", +) + +if out.shape[0] <= 1: +return out + +sorted_out = _get_sort_func()( +out, k=0, axis=0, ret_type="values", is_ascend="True", dtype="int32" +) + +return sorted_out + + +def argwhere_2d_ir(condition, out): +"""Low level IR for argwhere 2D + +Parameters +-- +condition : Buffer +The condition buffer. + +out : Buffer +The output buffer. + +Returns +--- +stmt : Stmt +The result IR statement. +""" +ib = tvm.tir.ir_builder.create() +a0 = condition.shape[0] +a1 = condition.shape[1] + +condition = ib.buffer_ptr(condition) +out = ib.buffer_ptr(out) + +valid_inde
[GitHub] [incubator-tvm] tqchen opened a new pull request #6873: [COMMUNITY] New committer -- @mbaret
tqchen opened a new pull request #6873: URL: https://github.com/apache/incubator-tvm/pull/6873 Please join us to welcome @mbaret as a new committer. Matt has been contributing to Relay graph optimization and customized codegen (BYOC). He also spends a good chunk of time interacting with developers outside his organization by participating in discussions, helping out others, and creating proposals.
- [Commits History](https://github.com/apache/incubator-tvm/commits?author=mbaret)
- [Code Review](https://github.com/apache/incubator-tvm/pulls?utf8=%E2%9C%93&q=reviewed-by%3Ambaret)
- [Community Forum Summary](https://discuss.tvm.apache.org/u/matt-arm/summary)
[GitHub] [incubator-tvm] tkonolige commented on pull request #6854: [RELAY,TOPI] Add scatter_nd op
tkonolige commented on pull request #6854: URL: https://github.com/apache/incubator-tvm/pull/6854#issuecomment-723335052 @mbrookhart Actually, you've got your wish. I've added a simple x86 and cuda implementation.
[GitHub] [incubator-tvm] mbrookhart commented on pull request #6870: Dynamic gpu tests, add dynamic strided slice to topi
mbrookhart commented on pull request #6870: URL: https://github.com/apache/incubator-tvm/pull/6870#issuecomment-72814 added the python api and tests
[GitHub] [incubator-tvm] FrozenGene merged pull request #6802: Add smmla/ummla support in quantized Conv2d
FrozenGene merged pull request #6802: URL: https://github.com/apache/incubator-tvm/pull/6802
[GitHub] [incubator-tvm] mbrookhart commented on a change in pull request #6839: [WIP][ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/incubator-tvm/pull/6839#discussion_r519041432 ## File path: python/tvm/topi/cuda/nms.py ## @@ -519,14 +557,90 @@ def non_max_suppression( coord_start, id_index, score_index, +return_indices, ), dtype=[data.dtype, "int32"], -in_buffers=[data_buf, sort_tensor_buf, valid_count_buf], +in_buffers=[data_buf, sort_tensor_buf, valid_count_buf, indices_buf], name="nms", tag="nms", ) -# TODO(yongwww): Update cuda nms to be consistent with cpu version if return_indices: -return box_indices +out_shape = box_indices.shape +valid_box_count_shape = [box_indices.shape[0], 1] +valid_box_count = tvm.tir.decl_buffer(valid_box_count_shape, "int32", "valid_box_count") +output = tvm.tir.decl_buffer(box_indices.shape, "int32", "output") +return te.extern( +[out_shape, valid_box_count_shape], +[box_indices], +lambda ins, outs: rearrange_indices_out_ir(ins[0], outs[0], outs[1]), +dtype="int32", +out_buffers=[output, valid_box_count], +name="rearrange_indices_out_gpu", +tag="rearrange_indices_out_gpu", +) return out + + +def rearrange_indices_out_ir(data, output, valid_box_count): +"""Hybrid routine to rearrange nms output to +move all valid entries to top. + +Parameters +-- +data : tvm.te.Tensor or numpy NDArray +NMS output. 3-D tensor with shape +[batch_size, num_anchors, 6] or +[batch_size, num_anchors, 5], or 2-D +tensor with shape [batch_size, num_anchors]. + +one: tvm.tir.const +Constant one with the same dtype as data. + +batch_size: tvm.tir.IntImm or tvm.tir.Var +Batch size. We need to pass it in since hybrid script doesn't support +binding variable to symbolic dim. + +num_anchors: tvm.tir.IntImm or tvm.tir.Var +Number of anchors. + +Returns +--- +output : tvm.te.Tensor or numpy NDArray +2-D tensor with shape [batch_size, num_anchors]. + +valid_box_count : tvm.te.Tensor or numpy NDArray +Tensor with shape [batch_size, 1], indicates +the valid number of boxes. +""" +batch_size = data.shape[0] +num_anchors = data.shape[1] + +ib = tvm.tir.ir_builder.create() + +data = ib.buffer_ptr(data) +valid_box_count = ib.buffer_ptr(valid_box_count) +output = ib.buffer_ptr(output) + +with ib.new_scope(): +i = te.thread_axis("blockIdx.x") +ib.scope_attr(i, "thread_extent", batch_size) +valid_idx = ib.allocate("int32", (1), name="valid_idx", scope="local") +valid_idx[0] = 0 +with ib.for_range(0, num_anchors, name="j") as j: +with ib.if_scope(data[i, j] >= 0): +with ib.if_scope(data[i, j] > num_anchors): +output[i, valid_idx[0]] = 0 +valid_idx[0] = valid_idx[0] + 1 +with ib.else_scope(): +output[i, valid_idx[0]] = data[i, j] +valid_idx[0] = valid_idx[0] + 1 +with ib.else_scope(): +with ib.if_scope(data[i, j] < -num_anchors): +output[i, valid_idx[0]] = 0 +valid_idx[0] = valid_idx[0] + 1 +with ib.if_scope(j >= valid_idx[0]): +output[i, j] = -1 +valid_box_count[i, 0] = valid_idx[0] + +return ib.get() Review comment: I've been attempted to get SSD-Mobilenet from the ONNX model zoo working, but I"m hitting other bugs. I'll post some perf metrics as soon as I can get that working. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] FrozenGene commented on pull request #6802: Add smmla/ummla support in quantized Conv2d
FrozenGene commented on pull request #6802: URL: https://github.com/apache/incubator-tvm/pull/6802#issuecomment-723332467 > Hi @FrozenGene , > Thanks for approving! Yes, I wanted to do that in a separate PR to not pollute this one. Is that ok with you? yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch main updated (3ea0686 -> 83b75f8)
This is an automated email from the ASF dual-hosted git repository. zhaowu pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 3ea0686 [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method (#6858) add 83b75f8 Add smmla/ummla support in quantized Conv2d (#6802) No new revisions were added by this update. Summary of changes: python/tvm/topi/arm_cpu/arm_utils.py | 27 +++- python/tvm/topi/arm_cpu/conv2d_gemm.py| 179 +++--- python/tvm/topi/arm_cpu/tensor_intrin.py | 106 - tests/python/topi/python/test_topi_conv2d_int8.py | 6 + 4 files changed, 252 insertions(+), 66 deletions(-)
[incubator-tvm] branch main updated (3cf997a -> 3ea0686)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 3cf997a [Relay] Mix mode type inference (#6704) add 3ea0686 [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method (#6858) No new revisions were added by this update. Summary of changes: tests/python/unittest/test_runtime_rpc.py | 16 1 file changed, 16 insertions(+)
[GitHub] [incubator-tvm] tqchen merged pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tqchen merged pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tqchen commented on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tqchen commented on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723331514 merge this for now, we should re-enable after we rewrite test to use startup env callback This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tqchen commented on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
tqchen commented on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723331185 It would be a great topic for https://discuss.tvm.apache.org/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tqchen closed issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
tqchen closed issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] Laurawly commented on a change in pull request #6839: [WIP][ONNX] NMS in ONNX
Laurawly commented on a change in pull request #6839: URL: https://github.com/apache/incubator-tvm/pull/6839#discussion_r519038726 ## File path: python/tvm/topi/cuda/nms.py ## @@ -519,14 +557,90 @@ def non_max_suppression( coord_start, id_index, score_index, +return_indices, ), dtype=[data.dtype, "int32"], -in_buffers=[data_buf, sort_tensor_buf, valid_count_buf], +in_buffers=[data_buf, sort_tensor_buf, valid_count_buf, indices_buf], name="nms", tag="nms", ) -# TODO(yongwww): Update cuda nms to be consistent with cpu version if return_indices: -return box_indices +out_shape = box_indices.shape +valid_box_count_shape = [box_indices.shape[0], 1] +valid_box_count = tvm.tir.decl_buffer(valid_box_count_shape, "int32", "valid_box_count") +output = tvm.tir.decl_buffer(box_indices.shape, "int32", "output") +return te.extern( +[out_shape, valid_box_count_shape], +[box_indices], +lambda ins, outs: rearrange_indices_out_ir(ins[0], outs[0], outs[1]), +dtype="int32", +out_buffers=[output, valid_box_count], +name="rearrange_indices_out_gpu", +tag="rearrange_indices_out_gpu", +) return out + + +def rearrange_indices_out_ir(data, output, valid_box_count): +"""Hybrid routine to rearrange nms output to +move all valid entries to top. + +Parameters +-- +data : tvm.te.Tensor or numpy NDArray +NMS output. 3-D tensor with shape +[batch_size, num_anchors, 6] or +[batch_size, num_anchors, 5], or 2-D +tensor with shape [batch_size, num_anchors]. + +one: tvm.tir.const +Constant one with the same dtype as data. + +batch_size: tvm.tir.IntImm or tvm.tir.Var +Batch size. We need to pass it in since hybrid script doesn't support +binding variable to symbolic dim. + +num_anchors: tvm.tir.IntImm or tvm.tir.Var +Number of anchors. + +Returns +--- +output : tvm.te.Tensor or numpy NDArray +2-D tensor with shape [batch_size, num_anchors]. + +valid_box_count : tvm.te.Tensor or numpy NDArray +Tensor with shape [batch_size, 1], indicates +the valid number of boxes. +""" +batch_size = data.shape[0] +num_anchors = data.shape[1] + +ib = tvm.tir.ir_builder.create() + +data = ib.buffer_ptr(data) +valid_box_count = ib.buffer_ptr(valid_box_count) +output = ib.buffer_ptr(output) + +with ib.new_scope(): +i = te.thread_axis("blockIdx.x") +ib.scope_attr(i, "thread_extent", batch_size) +valid_idx = ib.allocate("int32", (1), name="valid_idx", scope="local") +valid_idx[0] = 0 +with ib.for_range(0, num_anchors, name="j") as j: +with ib.if_scope(data[i, j] >= 0): +with ib.if_scope(data[i, j] > num_anchors): +output[i, valid_idx[0]] = 0 +valid_idx[0] = valid_idx[0] + 1 +with ib.else_scope(): +output[i, valid_idx[0]] = data[i, j] +valid_idx[0] = valid_idx[0] + 1 +with ib.else_scope(): +with ib.if_scope(data[i, j] < -num_anchors): +output[i, valid_idx[0]] = 0 +valid_idx[0] = valid_idx[0] + 1 +with ib.if_scope(j >= valid_idx[0]): +output[i, j] = -1 +valid_box_count[i, 0] = valid_idx[0] + +return ib.get() Review comment: Could you show the performance benchmark on some popular OD model workloads with the modified nms.py? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] trevor-m edited a comment on pull request #6872: [BYOC][TRT] Allocate GPU data buffers when needed and transfer data
trevor-m edited a comment on pull request #6872: URL: https://github.com/apache/incubator-tvm/pull/6872#issuecomment-723327290 @zhiics @comaniac @anijain2305 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] trevor-m commented on pull request #6872: [BYOC][TRT] Allocate GPU data buffers when needed and transfer data
trevor-m commented on pull request #6872: URL: https://github.com/apache/incubator-tvm/pull/6872#issuecomment-723327290 @zhiics @comaniac This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] trevor-m opened a new pull request #6872: [BYOC][TRT] Allocate GPU data buffers when needed and transfer data
trevor-m opened a new pull request #6872: URL: https://github.com/apache/incubator-tvm/pull/6872 This PR enables the TRT BYOC integration to be used with target="llvm" (previously only "cuda" could be used). If an input or output DLTensor is not located on the GPU, we will now allocate a GPU buffer to pass to TensorRT and transfer the data from the DLTensor accordingly. Since data_entry_ is now needed during BuildEngine, we had to move BuildEngine from JsonRuntime::Init to the first run. This is a prerequisite for using TRT BYOC in combination with the Relay VM, which in general requires the llvm target. Thanks @ylc for the original implementation: https://github.com/neo-ai/tvm/pull/147 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
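For readers unfamiliar with the data movement being described, the following is a minimal Python-level sketch of the same idea using the standard TVM NDArray API. It is only a conceptual illustration; the PR itself implements this logic inside the C++ TensorRT runtime, not in Python.

```python
import numpy as np
import tvm

# A CPU-resident input, as it would arrive when the module is built with target="llvm".
cpu_input = tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32"), ctx=tvm.cpu(0))

# If the tensor is not already on the GPU, allocate a GPU buffer and copy the
# data across before handing it to TensorRT.
if cpu_input.ctx.device_type != tvm.gpu(0).device_type:
    gpu_buffer = tvm.nd.empty(cpu_input.shape, cpu_input.dtype, ctx=tvm.gpu(0))
    cpu_input.copyto(gpu_buffer)
```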
[GitHub] [incubator-tvm] hogepodge commented on pull request #6867: Fix bug in processing script
hogepodge commented on pull request #6867: URL: https://github.com/apache/incubator-tvm/pull/6867#issuecomment-723318753 Thanks, updated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] mbrookhart commented on pull request #6870: Dynamic gpu tests, add dynamic strided slice to topi
mbrookhart commented on pull request #6870: URL: https://github.com/apache/incubator-tvm/pull/6870#issuecomment-723310203 Oops! I was missing part of it in this commit, let me extract it from the other branch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] comaniac commented on a change in pull request #6777: [BYOC] Configurable optimize pass for PartitionGraph
comaniac commented on a change in pull request #6777: URL: https://github.com/apache/incubator-tvm/pull/6777#discussion_r519014138 ## File path: python/tvm/relay/op/contrib/arm_compute_lib.py ## @@ -62,7 +72,7 @@ def partition_for_arm_compute_lib(mod, params=None): transform.InferType(), transform.MergeComposite(arm_compute_lib_pattern_table()), transform.AnnotateTarget("arm_compute_lib"), -transform.PartitionGraph(), +transform.PartitionGraph(optimize), Review comment: This pass applies to each partitioned function, so it has to be called after the partitioned function has been created. See #6068 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tqchen commented on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tqchen commented on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723307232 I think this is because of the location of the "main" file. The Python spawn method might automatically run the global registrations in the main file, but not in other places. Pytest itself has a different main file. Putting the registrations under tvm.testing means they are imported regardless. Alternatively, passing a startup env callback to RPCServer would mean the startup env function gets called explicitly (and registers the functions). My guess is that the startup env callback might be an attractive approach we can try; a rough illustration of the registration-scope issue follows. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
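The following is a minimal, hypothetical sketch of the registration-scope issue discussed in this thread. The function names used here ("rpc.test_spawn_visible", "rpc.test_spawn_hidden") are made up for illustration and are not part of the actual RPC tests.

```python
import tvm


# Registered at module (top-level) scope: when multiprocessing's spawn method
# re-imports this module in the server process, this registration runs again,
# so the spawned server can find the function.
@tvm.register_func("rpc.test_spawn_visible")
def visible(x):
    return x + 1


def test_something():
    # Registered inside a test function: the spawned server process never
    # executes this body, so the function is missing on the server side.
    @tvm.register_func("rpc.test_spawn_hidden")
    def hidden(x):
        return x + 1
```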
[GitHub] [incubator-tvm] comaniac commented on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
comaniac commented on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723306184 Looks like you're using a very old branch? We don't have namespace `ansor` anymore and some API names are changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
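For reference, on a recent main branch the reproducer's import would change along the lines below; the exact task-creation API names differ between versions, so anything beyond the import should be checked against the current auto_scheduler documentation rather than taken from this sketch.

```python
# The old `tvm.ansor` namespace no longer exists on main; the auto-scheduler
# now lives under tvm.auto_scheduler. Other API names in the script will
# likely need updating as well.
from tvm import auto_scheduler
```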
[GitHub] [incubator-tvm] zhiics commented on pull request #6870: Dynamic gpu tests, add dynamic strided slice to topi
zhiics commented on pull request #6870: URL: https://github.com/apache/incubator-tvm/pull/6870#issuecomment-723305330 @mbrookhart Thanks for the PR. I think this probably solves the problem if we invoke it from Relay directly. But similarly to topk, argwhere also needs to invoke strided_slice within topi. I think we also need a dispatch in topi to invoke either dyn_strided_slice or strided_slice, as sketched below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
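A hypothetical sketch of the kind of dispatch being suggested is shown below. The helper name and the `dyn_strided_slice` entry point are assumptions for illustration only, not existing TVM APIs; the real dispatch would live inside topi.

```python
from tvm import te


def strided_slice_dispatch(data, begin, end, strides, static_impl, dynamic_impl):
    # Pick the dynamic strided_slice implementation when begin/end/strides
    # arrive as tensors (symbolic shapes), and the static one when they are
    # plain Python lists of ints.
    is_dynamic = any(isinstance(arg, te.Tensor) for arg in (begin, end, strides))
    impl = dynamic_impl if is_dynamic else static_impl
    return impl(data, begin, end, strides)
```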
[GitHub] [incubator-tvm] tqchen edited a comment on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tqchen edited a comment on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723300424 This is because some of the register_func calls are in local function scope instead of global scope, so they won't be executed right after spawn. Globally registered PackedFuncs will be preserved. If we move most of the RPC testing function registrations to tvm.testing, or enable an env setup function that is passed to the spawn method (via cloudpickle) and registers the functions during RPC startup, it should work. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tkonolige commented on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tkonolige commented on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723301559 > This is because some of the register_func are in local function scope instead of global, so they won't be executed right after spawn. Globally registered packedfunc will be preserved This is what I thought initially, but something weirder is going on. The globally registered packedfunc works when you run the test script on its own (without pytest). But when running under pytest, the registered functions cannot be found. I commented about it in the source code: ``` # tkonolige: The issue as I understand it is this: multiprocessing's spawn # method launches a new process and then imports the relevant modules. This # means that all registered functions must exist at the top level scope. In # this file they are, so all is well when we run this file directly. # However, when run under pytest, the functions aren't registered on the # server. I believe this is because pytest is also using multiprocessing to # run individual functions. Somewhere along the way, the imports are being # lost, so the server ends up not registering the functions. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tqchen commented on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tqchen commented on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723300424 This is because some of the register_func are in local function scope instead of global, so they won't be executed right after spawn. Globally registered packedfunc will be preserved This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] kazum opened a new pull request #6871: [Rust] Revive wasm32 test
kazum opened a new pull request #6871: URL: https://github.com/apache/incubator-tvm/pull/6871 - Follow the latest BackendPackedCFunc signature - Use llvm-10 by default to create the library (it looks like at least llvm-9 is required for wasm32) - Enable a wasm32 test in CI @jroesch @binarybana @mwillsey Please take a look This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] mbrookhart opened a new pull request #6870: Dynamic gpu tests, add dynamic strided slice to topi
mbrookhart opened a new pull request #6870: URL: https://github.com/apache/incubator-tvm/pull/6870 cc @zhiics Like I said, this is from my topk branch, which is still having issues with memory corruption, so I'm not actually using the topi version in this commit. Let me know if this works for argwhere; I'll try to update it with a topi unit test this afternoon. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] Hecmay commented on issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
Hecmay commented on issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869#issuecomment-723286159 @merrymercy @comaniac This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] Hecmay opened a new issue #6869: [Relay][Ansor] cannot find a working schedule for certain tasks
Hecmay opened a new issue #6869: URL: https://github.com/apache/incubator-tvm/issues/6869 ## Env * Ubuntu 18.04 LTS (LLVM dev 6.0.0 ) * GPU: RTX2080 + CUDA 10.2 ## Problem Description I was trying to construct a neural network using realy's APIs, and tune it with TVM's auto scheduler. During the tuning process, I can see from the log that some tasks failed all the evaluation (e.g. the log outputs `*T*T*T*T*T*T"` for all the evaluation). As a result, those tasks are not scheduled at all. Here is error msg I got from relay runtime when trying to run the network. I suppose this is the problematic task in the network. ```shell RuntimeError: Check failed: VerifyMemory(func): Direct host side access to device memory is detected. Did you forget to bind? PrimFunc([placeholder, placeholder, placeholder, T_relu]) attrs={"global_symbol": "fused_nn_dense_add_nn_relu_3", "tir.noalias": (bool)1, "target": cuda} { // attr [T_dense] storage_scope = "global" allocate T_dense[float32 * 4096] for (j, 0, 4096) { T_dense[j] = 0f for (k, 0, 25088) { T_dense[j] = (T_dense[j] + (placeholder[k]*placeholder[((j*25088) + k)])) } } for (ax1, 0, 4096) { T_dense[ax1] = (placeholder[ax1] + T_dense[ax1]) } for (ax1, 0, 4096) { T_relu[ax1] = max(T_dense[ax1], 0f) } } ``` ## Test case Here is a VGG11 network. I set the timeout to be a very large value, but for some tasks, all the measurement still would not pass because of timeout. ```python import os import numpy as np import tvm from tvm import relay from tvm import ansor as auto_scheduler from tvm.relay import testing import tvm.contrib.graph_runtime as runtime def build_graph(): t1 = relay.var('I_1', shape=(1,3,224,224), dtype='float32') t2 = relay.var('I_2', shape=(64,3,3,3), dtype='float32') t3 = relay.var('I_3', shape=(64,), dtype='float32') t4 = relay.var('I_4', shape=(128,64,3,3), dtype='float32') t5 = relay.var('I_5', shape=(128,), dtype='float32') t6 = relay.var('I_6', shape=(256,128,3,3), dtype='float32') t7 = relay.var('I_7', shape=(256,), dtype='float32') t8 = relay.var('I_8', shape=(256,256,3,3), dtype='float32') t9 = relay.var('I_9', shape=(256,), dtype='float32') t10 = relay.var('I_10', shape=(512,256,3,3), dtype='float32') t11 = relay.var('I_11', shape=(512,), dtype='float32') t12 = relay.var('I_12', shape=(512,512,3,3), dtype='float32') t13 = relay.var('I_13', shape=(512,), dtype='float32') t14 = relay.var('I_14', shape=(512,512,3,3), dtype='float32') t15 = relay.var('I_15', shape=(512,), dtype='float32') t16 = relay.var('I_16', shape=(512,512,3,3), dtype='float32') t17 = relay.var('I_17', shape=(512,), dtype='float32') t18 = relay.var('I_18', shape=(4096,25088), dtype='float32') t19 = relay.var('I_19', shape=(4096,), dtype='float32') t20 = relay.var('I_20', shape=(4096,4096), dtype='float32') t21 = relay.var('I_21', shape=(4096,), dtype='float32') t22 = relay.var('I_22', shape=(1000,4096), dtype='float32') t23 = relay.var('I_23', shape=(1000,), dtype='float32') t24 = relay.nn.conv2d(t1, t2, padding=[1,1]) t25 = relay.reshape(t3, (64,1,1)) t26 = relay.reshape(t5, (128,1,1)) t27 = relay.reshape(t7, (256,1,1)) t28 = relay.reshape(t9, (256,1,1)) t29 = relay.reshape(t11, (512,1,1)) t30 = relay.reshape(t13, (512,1,1)) t31 = relay.reshape(t15, (512,1,1)) t32 = relay.reshape(t17, (512,1,1)) t33 = relay.add(t24, t25) t34 = relay.nn.relu(t33) t35 = relay.nn.max_pool2d(t34, pool_size=[2,2], strides=[2,2], padding=[0,0]) t36 = relay.nn.conv2d(t35, t4, padding=[1,1]) t37 = relay.add(t36, t26) t38 = relay.nn.relu(t37) t39 = relay.nn.max_pool2d(t38, pool_size=[2,2], 
strides=[2,2], padding=[0,0]) t40 = relay.nn.conv2d(t39, t6, padding=[1,1]) t41 = relay.add(t40, t27) t42 = relay.nn.relu(t41) t43 = relay.nn.conv2d(t42, t8, padding=[1,1]) t44 = relay.add(t43, t28) t45 = relay.nn.relu(t44) t46 = relay.nn.max_pool2d(t45, pool_size=[2,2], strides=[2,2], padding=[0,0]) t47 = relay.nn.conv2d(t46, t10, padding=[1,1]) t48 = relay.add(t47, t29) t49 = relay.nn.relu(t48) t50 = relay.nn.conv2d(t49, t12, padding=[1,1]) t51 = relay.add(t50, t30) t52 = relay.nn.relu(t51) t53 = relay.nn.max_pool2d(t52, pool_size=[2,2], strides=[2,2], padding=[0,0]) t54 = relay.nn.conv2d(t53, t14, padding=[1,1]) t55 = relay.add(t54, t31) t56 = relay.nn.relu(t55) t57 = relay.nn.conv2d(t56, t16, padding=[1,1]) t58 = relay.add(t57, t32) t59 = relay.nn.relu(t58) t60 = relay.nn.max_pool2d(t59, pool_size=[2,2], strides=[2,2], padding=[0,0]) t61 = relay.nn.avg_pool2d(t60, pool_size=[1,1], strides=[1,1], padding=[0,0]) t62 = relay.resh
[GitHub] [incubator-tvm] zhiics commented on pull request #6868: [WIP][TOPI][OP] cuda for argwhere
zhiics commented on pull request #6868: URL: https://github.com/apache/incubator-tvm/pull/6868#issuecomment-723283024 @mbrookhart Thanks. That would be cool. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] mbrookhart commented on pull request #6868: [WIP][TOPI][OP] cuda for argwhere
mbrookhart commented on pull request #6868: URL: https://github.com/apache/incubator-tvm/pull/6868#issuecomment-723282021 @zhiics I have a branch with the changes you'd need, but I haven't opened a PR because I've been fighting that memory corruption issue with topk. Would you like me to submit a PR to enable the other dynamic tests and include my refactors to strided slice? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch main updated (a6c29b2 -> 3cf997a)
This is an automated email from the ASF dual-hosted git repository. jroesch pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from a6c29b2 fix first-order AD on tuple arguments (#6827) add 3cf997a [Relay] Mix mode type inference (#6704) No new revisions were added by this update. Summary of changes: include/tvm/relay/expr_functor.h | 68 ++ src/relay/ir/expr_functor.cc | 68 -- src/relay/op/algorithm/topk.cc | 2 +- src/relay/transforms/type_infer.cc | 51 +++- 4 files changed, 112 insertions(+), 77 deletions(-)
[GitHub] [incubator-tvm] jroesch merged pull request #6704: [Relay] Mix mode type inference
jroesch merged pull request #6704: URL: https://github.com/apache/incubator-tvm/pull/6704 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] jroesch commented on pull request #6704: [Relay] Mix mode type inference
jroesch commented on pull request #6704: URL: https://github.com/apache/incubator-tvm/pull/6704#issuecomment-723264700 @lixiaoquan sorry I forgot to merge this one, going to land now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch main updated (df6a796 -> a6c29b2)
This is an automated email from the ASF dual-hosted git repository. jroesch pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from df6a796 [RELAY][OP] Support MXNet-style attributes for reshape_like (#6851) add a6c29b2 fix first-order AD on tuple arguments (#6827) No new revisions were added by this update. Summary of changes: src/relay/transforms/gradient.cc | 20 ++- tests/python/relay/test_pass_gradient.py | 33 2 files changed, 52 insertions(+), 1 deletion(-)
[GitHub] [incubator-tvm] junrushao1994 commented on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
junrushao1994 commented on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723264095 Thanks for the contribution! I don't quite understand where the issue comes from, though. Is it because some of the `register_func`s are not correctly executed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] jroesch merged pull request #6827: [RELAY][GRAD] Fix first-order AD on tuple arguments
jroesch merged pull request #6827: URL: https://github.com/apache/incubator-tvm/pull/6827 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tqchen commented on pull request #6858: [FIX,RPC] Skip RPC tests when using multiprocessing's spawn method
tqchen commented on pull request #6858: URL: https://github.com/apache/incubator-tvm/pull/6858#issuecomment-723263307 I think it is fine as it is. A better way is to also fix the testcases so that we don't have to rely on the forking behavior in many rpc tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] areusch commented on pull request #6865: Update search for bitcode files for rocm 3.9
areusch commented on pull request #6865: URL: https://github.com/apache/incubator-tvm/pull/6865#issuecomment-723262149 @jroesch @t-vi My best guess is a race condition in the test: it launches an rpc.Server, doesn't wait for it to start listening, then immediately connects. You can see in the captured teardown stdio that the "listening" message is printed, so I'd guess the test could use a sleep(0.1), or better yet a retry on the connect call (sketched below). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
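A minimal sketch of the "retry on connect" idea follows. It assumes the standard tvm.rpc.connect API; the retry helper itself is not part of TVM and is only meant to illustrate the suggestion.

```python
import time

from tvm import rpc


def connect_with_retry(host, port, retries=5, delay=0.1):
    # The freshly launched server may not be listening yet, so retry the
    # connect a few times with a short back-off instead of failing immediately.
    last_err = None
    for _ in range(retries):
        try:
            return rpc.connect(host, port)
        except Exception as err:
            last_err = err
            time.sleep(delay)
    raise last_err
```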
[GitHub] [incubator-tvm] zhiics opened a new pull request #6868: [WIP][TOPI][OP] cuda for argwhere
zhiics opened a new pull request #6868: URL: https://github.com/apache/incubator-tvm/pull/6868 This PR adds a cuda schedule for argwhere. - Since frameworks require sorted results, we sort the indices from the least significant to the most significant column (see the ordering sketch after this message). - Only one block is used to avoid atomic_add emitting flaky results. - The added argwhere tests in test_any would currently fail because topi strided_slice doesn't support symbolic shapes yet. @mbrookhart has some work on it. Will ping reviewers when we can run the argwhere relay tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
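The NumPy snippet below illustrates the ordering the PR description refers to: sorting the argwhere indices column by column, from least to most significant, produces the row-major (lexicographic) order that frameworks expect. It is only a reference for the expected output order, not the CUDA implementation.

```python
import numpy as np

condition = np.array([[0, 1, 0],
                      [1, 0, 1]])
indices = np.stack(np.nonzero(condition), axis=1)  # unsorted (row, col) pairs

# np.lexsort uses its last key as the primary sort key, so pass the columns
# from least significant (col) to most significant (row).
order = np.lexsort((indices[:, 1], indices[:, 0]))
sorted_indices = indices[order]
print(sorted_indices)  # [[0 1], [1 0], [1 2]] in row-major order
```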
[GitHub] [incubator-tvm] junrushao1994 commented on pull request #6851: [RELAY][OP] Support MXNet-style attributes for reshape_like
junrushao1994 commented on pull request #6851: URL: https://github.com/apache/incubator-tvm/pull/6851#issuecomment-723258939 Thanks @altanh @tkonolige @electriclilies @jroesch @giuseros! It is now merged :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch main updated (6d71a32 -> df6a796)
This is an automated email from the ASF dual-hosted git repository. junrushao pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 6d71a32 [TIR] Make loop unrolling in LoopPartition optional (#6823) add df6a796 [RELAY][OP] Support MXNet-style attributes for reshape_like (#6851) No new revisions were added by this update. Summary of changes: include/tvm/relay/attrs/transform.h | 20 +++ python/tvm/relay/op/op_attrs.py | 5 +++ python/tvm/relay/op/transform.py | 43 ++- src/relay/op/make_op.h | 3 ++ src/relay/op/tensor/transform.cc | 66 +--- src/relay/transforms/pattern_utils.h | 6 ++-- tests/python/relay/test_op_level3.py | 41 +++--- 7 files changed, 163 insertions(+), 21 deletions(-)
[GitHub] [incubator-tvm] junrushao1994 merged pull request #6851: [RELAY][OP] Support MXNet-style attributes for reshape_like
junrushao1994 merged pull request #6851: URL: https://github.com/apache/incubator-tvm/pull/6851 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch main updated (3bfe6d3 -> 6d71a32)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 3bfe6d3 [BYOC][CONTRIB] Vitis-AI codegen integration (#6343) add 6d71a32 [TIR] Make loop unrolling in LoopPartition optional (#6823) No new revisions were added by this update. Summary of changes: src/tir/transforms/loop_partition.cc | 21 +++- .../unittest/test_tir_transform_loop_partition.py | 28 ++ 2 files changed, 43 insertions(+), 6 deletions(-)
[GitHub] [incubator-tvm] tqchen merged pull request #6823: [TIR] Make loop unrolling in LoopPartition optional
tqchen merged pull request #6823: URL: https://github.com/apache/incubator-tvm/pull/6823 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] jroesch commented on pull request #6865: Update search for bitcode files for rocm 3.9
jroesch commented on pull request #6865: URL: https://github.com/apache/incubator-tvm/pull/6865#issuecomment-723257007 It looks like it could be flaky networking? Networking code can be flaky even on localhost. I re-ran the pipeline; any thoughts @tqchen or @areusch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] comaniac commented on pull request #6867: Fix bug in processing script
comaniac commented on pull request #6867: URL: https://github.com/apache/incubator-tvm/pull/6867#issuecomment-723256729 It's a bit confusing to assign index to `scores`. How about something like ```python ranks = np.argsort(scores)[::-1] for rank in ranks[0:5]: print("class='%s' with probability=%f" % (labels[rank], scores[rank])) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch v0.7 updated: [Backport][Bugfix][Module] Fix recursive GetFunction in runtime::Module (#6866)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a commit to branch v0.7 in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git The following commit(s) were added to refs/heads/v0.7 by this push: new 5eb56a9 [Backport][Bugfix][Module] Fix recursive GetFunction in runtime::Module (#6866) 5eb56a9 is described below commit 5eb56a99c0fabae02671298f10c6222e75966bb1 Author: Junru Shao AuthorDate: Fri Nov 6 11:22:52 2020 -0800 [Backport][Bugfix][Module] Fix recursive GetFunction in runtime::Module (#6866) --- src/runtime/module.cc | 3 +++ .../test_runtime_module_based_interface.py | 30 ++ 2 files changed, 33 insertions(+) diff --git a/src/runtime/module.cc b/src/runtime/module.cc index 98b0b3a..e50ea1c 100644 --- a/src/runtime/module.cc +++ b/src/runtime/module.cc @@ -68,6 +68,9 @@ PackedFunc ModuleNode::GetFunction(const std::string& name, bool query_imports) if (query_imports) { for (Module& m : self->imports_) { pf = m.operator->()->GetFunction(name, query_imports); + if (pf != nullptr) { +return pf; + } } } return pf; diff --git a/tests/python/unittest/test_runtime_module_based_interface.py b/tests/python/unittest/test_runtime_module_based_interface.py index 1d682d2..a2e2e59 100644 --- a/tests/python/unittest/test_runtime_module_based_interface.py +++ b/tests/python/unittest/test_runtime_module_based_interface.py @@ -538,6 +538,35 @@ def test_debug_graph_runtime(): tvm.testing.assert_allclose(out, verify(data), atol=1e-5) +def test_multiple_imported_modules(): +def make_func(symbol): +n = tvm.te.size_var("n") +Ab = tvm.tir.decl_buffer((n,), dtype="float32") +i = tvm.te.var("i") +stmt = tvm.tir.For( +i, +0, +n - 1, +0, +0, +tvm.tir.Store(Ab.data, tvm.tir.Load("float32", Ab.data, i) + 1, i + 1), +) +return tvm.tir.PrimFunc([Ab], stmt).with_attr("global_symbol", symbol) + +def make_module(mod): +mod = tvm.IRModule(mod) +mod = tvm.driver.build(mod, target="llvm") +return mod + +module_main = make_module({"main": make_func("main")}) +module_a = make_module({"func_a": make_func("func_a")}) +module_b = make_module({"func_b": make_func("func_b")}) +module_main.import_module(module_a) +module_main.import_module(module_b) +module_main.get_function("func_a", query_imports=True) +module_main.get_function("func_b", query_imports=True) + + if __name__ == "__main__": test_legacy_compatibility() test_cpu() @@ -545,3 +574,4 @@ if __name__ == "__main__": test_mod_export() test_remove_package_params() test_debug_graph_runtime() +test_multiple_imported_modules()
[GitHub] [incubator-tvm] tqchen merged pull request #6866: [Backport][Bugfix][Module] Fix recursive GetFunction in runtime::Module
tqchen merged pull request #6866: URL: https://github.com/apache/incubator-tvm/pull/6866 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch v0.7 updated (091a427 -> 12e3551)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a change to branch v0.7 in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 091a427 [FFI][BUGFIX] Fix memory leak when Pac callback argument is NDArray (#6744) (#6821) add 12e3551 Update task_golang.sh No new revisions were added by this update. Summary of changes: tests/scripts/task_golang.sh | 2 ++ 1 file changed, 2 insertions(+)
[GitHub] [incubator-tvm] giuseros edited a comment on pull request #6860: [TIR] Add spans to all ExprNodes
giuseros edited a comment on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723235173 Hi @jroesch , Thanks for the detailed explanation! I still think that it would have been very nice to have this explanation on the forum (also for future reference and future changes). For this change, if we commit to not adding more to the AST, I am happy to approve! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] giuseros commented on pull request #6802: Add smmla/ummla support in quantized Conv2d
giuseros commented on pull request #6802: URL: https://github.com/apache/incubator-tvm/pull/6802#issuecomment-723238412 Hi @FrozenGene , Thanks for approving! Yes, I wanted to do that in a separate PR to not pollute this one. Is that ok with you? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] giuseros commented on pull request #6860: [TIR] Add spans to all ExprNodes
giuseros commented on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723235173 Hi @jroesch , Thanks for the brilliant explanation! I still think that it would have been very nice to have this explanation on the forum (also for future reference and future changes). For this change, if we commit to not adding more to the AST, I am happy to approve! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] jroesch commented on pull request #6860: [TIR] Add spans to all ExprNodes
jroesch commented on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723230910 @giuseros The entire Relay AST has had spans for the entire existence of the IR; this change is a follow-on from the UnifiedIR refactors where we make things more consistent. The span (or Location) style of diagnostics is the style adopted by many modern compilers, including Rust and MLIR. The reason to have spans directly on the AST is the same reason to have type information there: they are important fields, and it matters whether they are "intrinsic" vs. "extrinsic" properties. In my experience working on compilers, having things live in global stateful maps which must be kept in sync introduces complexity, as the global state must be passed everywhere and you have to be very careful about when you read from the maps. For example, propagating span information inside a pass which builds new AST fragments is easy, as you can directly build span information from existing spans (see the sketch below). If we want to attach more metadata for diagnostics, I think we should attach that information to the diagnostic objects instead of attaching it to the spans/AST nodes directly. The diagnostics correspond to a location where some information was generated, and the spans are indexes into the source representation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
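A small sketch of the "build new span information from existing spans" point, at the Relay level. It assumes the Python Call constructor accepts a span argument (added alongside the Relay span work in #6274); this is worth double-checking against the version of TVM in use.

```python
from tvm import relay


def rewrite_call_keep_span(call, new_args):
    # Rebuild a call with transformed arguments, but keep pointing at the
    # original source location so later diagnostics stay meaningful.
    return relay.Call(call.op, new_args, call.attrs, call.type_args, call.span)
```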
[GitHub] [incubator-tvm] t-vi commented on pull request #6865: Update search for bitcode files for rocm 3.9
t-vi commented on pull request #6865: URL: https://github.com/apache/incubator-tvm/pull/6865#issuecomment-723227479 I could use a hint regarding the CI failure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] hogepodge commented on pull request #6867: Fix bug in processing script
hogepodge commented on pull request #6867: URL: https://github.com/apache/incubator-tvm/pull/6867#issuecomment-723224164 @mbrookhart This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] hogepodge opened a new pull request #6867: Fix bug in processing script
hogepodge opened a new pull request #6867: URL: https://github.com/apache/incubator-tvm/pull/6867 The argsort command returns a new array that is the sorted index rather than a new sorted value array. This patch stores the sorted index in a new variable and uses it to reference the predicted values. Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @ them in the pull request thread. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] junrushao1994 commented on pull request #6859: [Bugfix][Module] Fix recursive GetFunction in runtime::Module
junrushao1994 commented on pull request #6859: URL: https://github.com/apache/incubator-tvm/pull/6859#issuecomment-723215782 @tqchen Thanks! Backport PR: #6866. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] junrushao1994 opened a new pull request #6866: [Backport][Bugfix][Module] Fix recursive GetFunction in runtime::Module
junrushao1994 opened a new pull request #6866: URL: https://github.com/apache/incubator-tvm/pull/6866 Backports #6859 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] giuseros commented on pull request #6860: [TIR] Add spans to all ExprNodes
giuseros commented on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723204745 Well, I meant an RFC discussing this interface change. In general, I think those interface changes should be first discussed in the discuss forum, and their implementation should then be discussed in the PR. Probably the same applies to the Relay PR you mentioned. Anyway, I am OK with continuing the interface discussion here. I think that adding a single span parameter to all the Expr nodes risks polluting the interface (especially if, as you said, more info will be needed in the future). Is it possible to at least wrap the span in a DebugInfo structure? What are the deltas of this approach? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tkonolige commented on pull request #6860: [TIR] Add spans to all ExprNodes
tkonolige commented on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723196876 @giuseros This is the RFC that covers all error handling: https://discuss.tvm.apache.org/t/rfc-meta-rfc-3-pronged-plan-for-improving-error-messages-in-tvm. This PR is really just a continuation of #6274. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[incubator-tvm] branch main updated (01e76c2 -> 3bfe6d3)
This is an automated email from the ASF dual-hosted git repository. zhic pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 01e76c2 [Bugfix][Module] Fix recursive GetFunction in runtime::Module (#6859) add 3bfe6d3 [BYOC][CONTRIB] Vitis-AI codegen integration (#6343) No new revisions were added by this update. Summary of changes: CMakeLists.txt | 2 + cmake/config.cmake | 3 + cmake/modules/contrib/VitisAI.cmake| 47 ++ docs/deploy/index.rst | 1 + docs/deploy/vitis_ai.rst | 652 + python/tvm/contrib/target/vitis_ai.py | 156 + python/tvm/relay/op/contrib/vitis_ai.py| 100 .../backend/contrib/vitis_ai/config_vitis_ai.cc| 46 ++ src/runtime/contrib/vitis_ai/vitis_ai_runtime.cc | 194 ++ src/runtime/contrib/vitis_ai/vitis_ai_runtime.h| 115 .../python/contrib/test_vitis_ai}/__init__.py | 2 +- .../python/contrib/test_vitis_ai/infrastructure.py | 171 ++ .../contrib/test_vitis_ai/test_vitis_ai_codegen.py | 336 +++ .../test_vitis_ai_runtime_cpu_part.py | 82 +++ tests/scripts/task_config_build_cpu.sh | 1 + 15 files changed, 1907 insertions(+), 1 deletion(-) create mode 100644 cmake/modules/contrib/VitisAI.cmake create mode 100755 docs/deploy/vitis_ai.rst create mode 100644 python/tvm/contrib/target/vitis_ai.py create mode 100644 python/tvm/relay/op/contrib/vitis_ai.py create mode 100644 src/relay/backend/contrib/vitis_ai/config_vitis_ai.cc create mode 100755 src/runtime/contrib/vitis_ai/vitis_ai_runtime.cc create mode 100755 src/runtime/contrib/vitis_ai/vitis_ai_runtime.h copy {vta/python/vta/exec => tests/python/contrib/test_vitis_ai}/__init__.py (93%) create mode 100644 tests/python/contrib/test_vitis_ai/infrastructure.py create mode 100644 tests/python/contrib/test_vitis_ai/test_vitis_ai_codegen.py create mode 100644 tests/python/contrib/test_vitis_ai/test_vitis_ai_runtime_cpu_part.py
[GitHub] [incubator-tvm] zhiics merged pull request #6343: [BYOC][CONTRIB] Vitis-AI codegen integration
zhiics merged pull request #6343: URL: https://github.com/apache/incubator-tvm/pull/6343 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] zhiics commented on pull request #6343: [BYOC][CONTRIB] Vitis-AI codegen integration
zhiics commented on pull request #6343: URL: https://github.com/apache/incubator-tvm/pull/6343#issuecomment-723196376 Thanks everyone. This is now merged! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] giuseros edited a comment on pull request #6860: [TIR] Add spans to all ExprNodes
giuseros edited a comment on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723194362 @tkonolige is there an RFC (or anything similar) discussing these interface changes with an evaluation of the alternative designs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] giuseros commented on pull request #6860: [TIR] Add spans to all ExprNodes
giuseros commented on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723194362 @tkonolige is there an RFC (or anything similar) discussing these changes with an evaluation of the alternative designs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6797: [TVMSCRIPT] Using diagnostics for TVM Script
tkonolige commented on a change in pull request #6797: URL: https://github.com/apache/incubator-tvm/pull/6797#discussion_r518885303 ## File path: python/tvm/script/diagnostics.py ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""Bridge from synr DiagnosticContext to TVM's diagnostics""" Review comment: When you parse the AST with synr, you also pass in a object implementing synr's DiagnosticContext interface (https://synr.readthedocs.io/en/latest/#synr.DiagnosticContext). The DiagnosticContext is responsible for accumulating and reporting errors. This file defines a wrapper object that implements synr's DiagnosticContext and calls out to TVMCtx to report errors. I think the closest RFC you will find is https://discuss.tvm.apache.org/t/rfc-meta-rfc-3-pronged-plan-for-improving-error-messages-in-tvm/7214. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] zhiics commented on issue #5765: Feature-request: State-of-art Yolo v4 Detector
zhiics commented on issue #5765: URL: https://github.com/apache/incubator-tvm/issues/5765#issuecomment-723192392 Gentle ping @siju-samuel @AlexeyAB Is there any update on the support of Darknet Yolo V4? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-tvm] u99127 commented on a change in pull request #6797: [TVMSCRIPT] Using diagnostics for TVM Script
u99127 commented on a change in pull request #6797: URL: https://github.com/apache/incubator-tvm/pull/6797#discussion_r517327118 ## File path: python/tvm/script/diagnostics.py ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""Bridge from synr DiagnosticContext to TVM's diagnostics""" Review comment: The comment is not very clear, as it only refers to linking a DiagnosticContext in synr to TVM. Is there an RFC or something related that you could point people to, for anyone interested in what problem this is solving and why it is needed?
[GitHub] [incubator-tvm] tkonolige commented on pull request #6860: [TIR] Add spans to all ExprNodes
tkonolige commented on pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#issuecomment-723180231 @giuseros This is related to #6797. The goal is to use tvmscript to provide line numbers for TIR. In the future we would like all TIR nodes to have span information. This PR is also related to #6274, in which span information was added to all Relay nodes. You'll have to ask @jroesch about why we are using Spans vs. a more complex DebugInfo struct. I know that in the future we may want more complex debugging info.
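As a rough illustration of what "span information" on a node means here, the sketch below builds a span by hand. It assumes the tvm.ir.Span and tvm.ir.SourceName Python bindings that came with the Relay span work mentioned above (#6274); a span simply records which file, line, and column range a node originated from, which is what later lets error messages point back at the user's source.

```python
# Sketch only: the shape of the span metadata discussed above, assuming the
# tvm.ir.Span / tvm.ir.SourceName bindings from the Relay span work (#6274).
import tvm

# "my_script.py", line 10, columns 5-12 (start line, end line, start col, end col).
src = tvm.ir.SourceName("my_script.py")
span = tvm.ir.Span(src, 10, 10, 5, 12)

print(span.line, span.column)
```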
[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6860: [TIR] Add spans to all ExprNodes
tkonolige commented on a change in pull request #6860: URL: https://github.com/apache/incubator-tvm/pull/6860#discussion_r518868048 ## File path: src/ir/expr.cc ## @@ -55,20 +55,21 @@ PrimExpr PrimExpr::FromObject_(ObjectRef ref) { return Downcast<PrimExpr>(ref); } -IntImm::IntImm(DataType dtype, int64_t value) { - ICHECK(dtype.is_scalar()) << "ValueError: IntImm can only take scalar."; - ICHECK(dtype.is_int() || dtype.is_uint()) << "ValueError: IntImm supports only int or uint type."; +IntImm::IntImm(DataType dtype, int64_t value, Span span) { + ICHECK(dtype.is_scalar()) << "ValueError: IntImm can only take scalar, but " << dtype << " was supplied."; + ICHECK(dtype.is_int() || dtype.is_uint()) << "ValueError: IntImm supports only int or uint type, but " << dtype << " was supplied."; Review comment: I just cleaned up the error messages while I was there.
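For context on the constructor change quoted above, here is a sketch of how a caller might use it from Python. It assumes the Python binding for tvm.tir.IntImm gains a matching optional span keyword once this PR lands; that keyword is not taken from the diff itself, which only shows the C++ side.

```python
# Sketch under the assumption that tvm.tir.IntImm grows an optional `span`
# keyword mirroring the new C++ signature quoted above. The span is what lets
# later error reporting point back at a location in the original source.
import tvm

span = tvm.ir.Span(tvm.ir.SourceName("my_script.py"), 3, 3, 9, 10)
x = tvm.tir.IntImm("int32", 7, span=span)
print(x)

# With the cleaned-up checks in the diff, an invalid dtype such as "float32"
# would now fail with a message naming the offending type, e.g.
# "ValueError: IntImm supports only int or uint type, but float32 was supplied."
```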