[tvm] branch main updated: [Auto Scheduler] Add target host to measure record (#7046)
This is an automated email from the ASF dual-hosted git repository. zhaowu pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new a867bcb [Auto Scheduler] Add target host to measure record (#7046) a867bcb is described below commit a867bcbf1ecf537cfb061a2ca4790b16a9cc9748 Author: Zhao Wu AuthorDate: Tue Dec 8 14:46:29 2020 +0800 [Auto Scheduler] Add target host to measure record (#7046) * [Auto Scheduler] Add target host to measure record * Fix PyLint * Fix lint * Solve the serialization logic when we don't have hardware params * update auto scheduler log --- src/auto_scheduler/measure_record.cc | 12 -- .../python/unittest/test_auto_scheduler_measure.py | 26 ++ 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/src/auto_scheduler/measure_record.cc b/src/auto_scheduler/measure_record.cc index d57e2f2..aad0abe 100644 --- a/src/auto_scheduler/measure_record.cc +++ b/src/auto_scheduler/measure_record.cc @@ -163,6 +163,9 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> { writer->WriteArrayItem(std::string(data.workload_key)); writer->WriteArrayItem(data.target->str()); writer->WriteArrayItem(*data.hardware_params.get()); +if (data.target_host.defined()) { + writer->WriteArrayItem(data.target_host->str()); +} writer->EndArray(); } inline static void Read(dmlc::JSONReader* reader, ::tvm::auto_scheduler::SearchTaskNode* data) { @@ -183,7 +186,12 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> { reader->Read(hardware_params_node.get()); s = reader->NextArrayItem(); data->hardware_params = ::tvm::auto_scheduler::HardwareParams(hardware_params_node); - ICHECK(!s); + if (s) { +reader->Read(&str_value); +data->target_host = ::tvm::Target(str_value); +s = reader->NextArrayItem(); +ICHECK(!s); + } } } }; @@ -271,7 +279,7 @@ namespace auto_scheduler { TVM_REGISTER_OBJECT_TYPE(RecordToFileNode); TVM_REGISTER_OBJECT_TYPE(RecordReaderNode); -const std::string AUTO_SCHEDULER_LOG_VERSION = "v0.3"; // NOLINT(*) +const std::string AUTO_SCHEDULER_LOG_VERSION = "v0.4"; // NOLINT(*) RecordToFile::RecordToFile(String filename) { auto node = make_object(); diff --git a/tests/python/unittest/test_auto_scheduler_measure.py b/tests/python/unittest/test_auto_scheduler_measure.py index b214d9c..10bb0b4 100644 --- a/tests/python/unittest/test_auto_scheduler_measure.py +++ b/tests/python/unittest/test_auto_scheduler_measure.py @@ -250,6 +250,31 @@ def test_measure_local_builder_rpc_runner_spawn(): p.join() +@tvm.testing.requires_llvm +def test_measure_target_host(): +task = auto_scheduler.SearchTask( +func=matmul_auto_scheduler_test, +args=(512, 512, 512), +target="llvm", +target_host="llvm -mtriple=aarch64-linux-gnu", +) + +inp = auto_scheduler.measure.MeasureInput(task, task.compute_dag.init_state) +res = auto_scheduler.measure.MeasureResult([0.1], 0, "", 0.2, 1) + +with tempfile.NamedTemporaryFile() as fp: +auto_scheduler.save_records(fp.name, [inp], [res]) + +log_reader = auto_scheduler.RecordReader(fp.name) +inputs, results = log_reader.read_lines() +assert len(inputs) == 1 + +raw_inp = inputs[0] + +recovered_inp = auto_scheduler.measure.recover_measure_input(raw_inp) +assert str(recovered_inp.task.target_host) == str(inp.task.target_host) + + if __name__ == "__main__": test_record_split_reorder_fuse_annotation() test_record_compute_at_root_inline_cache_read_write() @@ -258,3 +283,4 @@ if __name__ == "__main__": test_recover_measure_input() test_measure_local_builder_runner() 
test_measure_local_builder_rpc_runner() +test_measure_target_host()
[GitHub] [tvm] FrozenGene merged pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene merged pull request #7046: URL: https://github.com/apache/tvm/pull/7046
[GitHub] [tvm] antinucleon commented on a change in pull request #7053: [auto_scheduler] buffer support, correctness check
antinucleon commented on a change in pull request #7053: URL: https://github.com/apache/tvm/pull/7053#discussion_r538065121 ## File path: python/tvm/auto_scheduler/auto_schedule.py ## @@ -136,9 +159,14 @@ def __init__( if isinstance(runner, str): if runner == "local": -runner = LocalRunner() +runner = LocalRunner(working_dir=self.temp_working_dir) else: raise ValueError("Invalid runner: " + runner) + +elif isinstance(runner, RPCRunner): +rpc_kwargs = runner.kwargs +rpc_kwargs["working_dir"] = self.temp_working_dir Review comment: https://github.com/apache/tvm/pull/7053/files#diff-5e828f80da7fdc456523468e7f69c1617d8a88867500f68998567fbd6f95a1d7R183 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on a change in pull request #7053: [auto_scheduler] buffer support, correctness check
FrozenGene commented on a change in pull request #7053: URL: https://github.com/apache/tvm/pull/7053#discussion_r538062449 ## File path: python/tvm/auto_scheduler/auto_schedule.py ## @@ -136,9 +159,14 @@ def __init__( if isinstance(runner, str): if runner == "local": -runner = LocalRunner() +runner = LocalRunner(working_dir=self.temp_working_dir) else: raise ValueError("Invalid runner: " + runner) + +elif isinstance(runner, RPCRunner): +rpc_kwargs = runner.kwargs +rpc_kwargs["working_dir"] = self.temp_working_dir Review comment: How this argument be used later? I can not find any logic to handle this parameter... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on a change in pull request #7053: [auto_scheduler] buffer support, correctness check
FrozenGene commented on a change in pull request #7053: URL: https://github.com/apache/tvm/pull/7053#discussion_r538060242 ## File path: python/tvm/auto_scheduler/auto_schedule.py ## @@ -210,6 +253,35 @@ def auto_schedule(task, search_policy=None, tuning_options=TuningOptions()): if search_policy is None: cost_model = XGBModel() search_policy = SketchPolicy(task, cost_model) + +if tuning_options.check_correctness == True: +empty_sch, args = task.compute_dag.apply_steps_from_state( +task.compute_dag.get_init_state(), layout_rewrite=True) +cpu_func = build_module.build( +empty_sch, args, target="llvm", target_host=task.target_host +) +buffer_path = os.path.join(tuning_options.working_dir, "buffer.pkl") +if os.path.exists(buffer_path) is True: +with open(buffer_path, "rb") as fi: +buffer = pickle.load(fi) +if len(buffer) == len(args): +# we skip check each arg shape here +pass +elif len(buffer) == len(args) - 1: +# assume only one output +np_args = np.zeros(size=get_const_tuple(args[-1].shape)).astype(args[-1].dtype) +cpu_args = [v for _, v in buffer.items()] + [ndarray.array(np_args, ctx=tvm.cpu())] +cpu_func(*cpu_args) +### save cpu result +answer = [x.asnumpy() for x in cpu_args] +tuning_options.register_buffer(args[-1].name, answer[-1]) +else: +np_args = [np.random.uniform(-0.1, 0.1, size=get_const_tuple(x.shape)).astype(x.dtype) for x in args] Review comment: We should use `random_fill` function as it help us handle different types. For example, for quantized uint8 dtype, [-0.1, 0.1] of `np.random.uniform(-0.1, 0.1, size=get_const_tuple(x.shape)).astype(x.dtype) for x in args` will be all zeros, which is not we want. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
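For context on the suggestion above, a minimal sketch of filling a measurement buffer with TVM's `random_fill` instead of `np.random.uniform` — the buffer shape and the uint8 dtype are illustrative, and the packed function is only registered when TVM is built with the random contrib package:

```python
import tvm

# Look up the packed function; the second argument allows it to be missing.
random_fill = tvm.get_global_func("tvm.contrib.random.random_fill", True)
assert random_fill is not None, "TVM must be built with USE_RANDOM=ON"

# For a quantized uint8 buffer, uniform values in [-0.1, 0.1] cast to uint8
# would all be zero; random_fill draws values appropriate for the dtype.
arg = tvm.nd.empty((1, 56, 56, 32), dtype="uint8", ctx=tvm.cpu())
random_fill(arg)
```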
[GitHub] [tvm] antinucleon opened a new pull request #7053: [auto_scheduler] buffer support, correctness check
antinucleon opened a new pull request #7053: URL: https://github.com/apache/tvm/pull/7053 This PR enables correctness check for a generated schedule. It is useful for: - Metal / ROCM: For some invalid schedule, driver may skip it instead of return any errors (which will show as an impossble large FLOPS number) - Sparse kernel search Example 1: correctness check. This will generate random buffers for correctness check ``` if train_flag: #measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300) measure_runner = auto_scheduler.RPCRunner("m1", "127.0.0.1", 9190, min_repeat_ms=300, timeout=30, repeat=3) tune_option = auto_scheduler.TuningOptions( num_measure_trials=1500, check_correctness=True, builder_n_parallel=1, runner=measure_runner, measure_callbacks=[auto_scheduler.RecordToFile(log_file)], verbose=2, ) sch, args = auto_scheduler.auto_schedule(task, tuning_options=tune_option) ``` Example 2: Sparse tuning, this will register given buffers for measure. ``` if train_flag: #measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300) measure_runner = auto_scheduler.RPCRunner("m1", "127.0.0.1", 9190, min_repeat_ms=300, timeout=30, repeat=3) tune_option = auto_scheduler.TuningOptions( num_measure_trials=1500, #runner=measure_ctx.runner, check_correctness=False, builder_n_parallel=1, runner=measure_runner, measure_callbacks=[auto_scheduler.RecordToFile(log_file)], verbose=2, ) for k, v in BUFFER.items(): tune_option.register_buffer(k, v) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on a change in pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene commented on a change in pull request #7046: URL: https://github.com/apache/tvm/pull/7046#discussion_r537987159 ## File path: src/auto_scheduler/measure_record.cc ## @@ -183,7 +186,12 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> { reader->Read(hardware_params_node.get()); s = reader->NextArrayItem(); data->hardware_params = ::tvm::auto_scheduler::HardwareParams(hardware_params_node); - ICHECK(!s); + if (s) { +reader->Read(&str_value); +data->target_host = ::tvm::Target(str_value); +s = reader->NextArrayItem(); +ICHECK(!s); Review comment: Move out will break back compatibility This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740321677 @merrymercy @comaniac Please have another look.
[GitHub] [tvm] Laurawly commented on a change in pull request #6839: [ONNX] NMS in ONNX
Laurawly commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537977579 ## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -with ib.if_scope(tid < batch_size): -valid_count[tid] = 0 -with ib.if_scope(tid < batch_size * num_anchors): -i = idxd(tid, num_anchors) -with ib.if_scope( -tvm.tir.all( -data[tid * elem_length + score_index] > score_threshold, -tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0), -) -): -atomic_add_return[0] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count -) -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = data[tid * elem_length + k] -out_indices[tid + k] = tid + k -with ib.else_scope(): -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = -one -out_indices[tid + k] = -one_count - +with ib.new_scope(): +nthread_tx = max_threads +nthread_bx = batch_size // max_threads + 1 +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * max_threads + tx +with ib.if_scope(tid < batch_size): +valid_count[tid] = 0 +i = tid +with ib.for_range(0, num_anchors) as j: +score = data[(i * num_anchors + j) * elem_length + score_index] +with ib.if_scope( +tvm.tir.all( +score > score_threshold, +tvm.tir.any( +id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0 +), +) +): +with ib.for_range(0, elem_length) as k: +out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[ +(i * num_anchors + j) * elem_length + k +] +out_indices[i * num_anchors + valid_count[i]] = j +valid_count[i] += 1 Review comment: I see. So we lose parallelism in `num_anchors` (could be quite large) compared with the original implementation. Are we able to keep the level of parallelism while getting the correct output indices? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] Coder-nlper opened a new issue #7052: c++ run slower than python?
Coder-nlper opened a new issue #7052: URL: https://github.com/apache/tvm/issues/7052 I use the following example, and modify it to load my model. https://github.com/apache/tvm/blob/main/apps/howto_deploy/cpp_deploy.cc I count the inference time of python and C++. c++ code: clock_t startTime, endTime; startTime=clock(); for (int i = 0; i < 36; ++i) { static_cast(x->data)[i] = array[i]; } // set the right input set_input("input_ids", x); // run the code run(); // get the output get_output(0, y); for (int i = 0; i < 36; ++i) { cout << static_cast(y->data)[i] << " "; } endTime = clock(); cout<<(double)(endTime - startTime)/CLOCKS_PER_SEC<
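A common pitfall in such comparisons is that the C++ snippet above times a single first inference with `clock()` (CPU time), so one-time initialization costs are included. On the Python side the usual practice is `time_evaluator`, which warms up and averages steady-state runs; a minimal sketch, assuming `lib` is the module produced by `relay.build(...)` for an LLVM target, and the input name, shape, and dtype are placeholders loosely based on the issue:

```python
import numpy as np
import tvm
from tvm.contrib import graph_runtime

ctx = tvm.cpu(0)
module = graph_runtime.GraphModule(lib["default"](ctx))  # lib assumed from relay.build(...)
module.set_input("input_ids", tvm.nd.array(np.zeros((1, 36), dtype="int64")))

module.run()  # warm-up run, excluded from the measurement below

# Time only the steady-state "run" calls.
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
print("mean inference time: %.3f ms" % (np.mean(ftimer().results) * 1000))
```

Measuring the C++ side the same way (wall-clock timer, warm-up excluded, averaged over many runs) usually closes most of the apparent gap.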
[GitHub] [tvm-vta] dsteger commented on a change in pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on a change in pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#discussion_r537919215 ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: @liangfu Let me know if you have any more questions. Would love to get this merged so I can submit the next PR to support the ultra96 variants. (v1 and v2 have different parts from a tools point of view - v2 is industrial grade). ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: AVNET has board files that can be used as external sources. Would be a good update for the Ultra96 hardware. https://github.com/Avnet/bdf/tree/master/ultra96v2/1.1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] jwfromm commented on pull request #7036: [Relay][Frontend][Onnx] MaxUnpool Operator
jwfromm commented on pull request #7036: URL: https://github.com/apache/tvm/pull/7036#issuecomment-740284637 @masahi that would be nice, although in this case I was just being lazy and not calculating the output shape. I've fixed the test now if you want to take another look.
[GitHub] [tvm] altanh commented on pull request #6798: [Relay][VM] Add support for references.
altanh commented on pull request #6798: URL: https://github.com/apache/tvm/pull/6798#issuecomment-740276633 I think the failing unit tests are actually unsound and rely on DCE for refs, or perhaps they are sound but correctness is definitely not guaranteed by what we currently have. cc @MarisaKirisame, how should we proceed? We might have to block this PR until DCE gets fixed once and for all, or disable all the offending tests.
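For readers following along, the soundness concern is that dead-code elimination must not drop a `RefWrite` whose effect is later observed through a `RefRead`. A minimal sketch of such a program using Relay's reference constructors (purely illustrative; it is not taken from the failing tests):

```python
from tvm import relay

ref = relay.var("ref")
unused = relay.var("unused")

# let ref = RefCreate(1); let unused = RefWrite(ref, 2); RefRead(ref)
# The program should evaluate to 2; a pass that treats the RefWrite as dead
# code and removes it would change the result to 1.
prog = relay.Let(
    ref,
    relay.RefCreate(relay.const(1)),
    relay.Let(unused, relay.RefWrite(ref, relay.const(2)), relay.RefRead(ref)),
)
func = relay.Function([], prog)
print(func)
```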
[tvm] branch main updated (2a2081e -> 5e68e6a)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git.

from 2a2081e [TOPI] GPU scatter_add using atomic (#7044)
add 5e68e6a [DOCS] Document cloudpickle dependency in tutorials (#7049)

No new revisions were added by this update.

Summary of changes:
 docs/install/from_source.rst               | 2 +-
 tutorials/autotvm/tune_conv2d_cuda.py      | 2 +-
 tutorials/autotvm/tune_relay_arm.py        | 2 +-
 tutorials/autotvm/tune_relay_cuda.py       | 2 +-
 tutorials/autotvm/tune_relay_mobile_gpu.py | 2 +-
 tutorials/autotvm/tune_simple_template.py  | 2 +-
 vta/tutorials/autotvm/tune_relay_vta.py    | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)
[GitHub] [tvm] tqchen merged pull request #7049: [DOCS] Document cloudpickle dependency in tutorials
tqchen merged pull request #7049: URL: https://github.com/apache/tvm/pull/7049
[GitHub] [tvm] altanh commented on pull request #6798: [Relay][VM] Add support for references.
altanh commented on pull request #6798: URL: https://github.com/apache/tvm/pull/6798#issuecomment-740251072 I've addressed the ADT tag issue as suggested (using Tuple), and left a TODO comment in DCE. For the record, I tried adding Feature set checking in DCE to error on detecting RefWrite. However, the VM compilation process requires DCE in other places (I think most critically in `InlinePrimitives`), so I couldn't get the test to pass without removing feature checking. I think this raises the criticality of fixing DCE slightly, and we are working on it. re @mbrookhart, I'm not totally sure what you meant by the debug stuff, but references have been in Relay since higher-order AD was introduced. I'm not sure if there is/was an RFC for adding references, although I agree we should probably make one especially as we are working towards supporting stateful stuff going forward. cc @jroesch who might have more thoughts
[GitHub] [tvm] comaniac commented on pull request #7046: [Auto Scheduler] Add target host to measure record
comaniac commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740235434 > Currently, we have 4 versions of log format. They all have backward compatibility. > We can call this PR v0.4. This PR can correctly read v0.3 and v0.2. We do not need to do any additional checks. Yeah that's definitely more flexible. I'm just afraid that it might introduce some confusion, as AutoTVM strictly checks log versions. Maybe we could let it be for now and add the check once the newer log format is no longer backward compatible.
[GitHub] [tvm] merrymercy commented on pull request #7046: [Auto Scheduler] Add target host to measure record
merrymercy commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740232724 Currently, we have 4 versions of log format. They all have backward compatibility. We can call this PR v0.4. v0.4 can correctly read v0.3 and v0.2. We do not need to do any additional checks.
[GitHub] [tvm] merrymercy edited a comment on pull request #7046: [Auto Scheduler] Add target host to measure record
merrymercy edited a comment on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740232724 Currently, we have 4 versions of log format. They all have backward compatibility. We can call this PR v0.4. This PR can correctly read v0.3 and v0.2. We do not need to do any additional checks.
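As a usage note, backward compatibility here means older logs are read unchanged: records written before this change simply omit the trailing target-host entry, and the reader leaves `target_host` undefined for them. A minimal sketch of loading records regardless of version (the log file name is illustrative):

```python
from tvm import auto_scheduler
from tvm.auto_scheduler import measure

# Works for v0.2/v0.3 logs as well as the new v0.4 format.
for inp, res in auto_scheduler.load_records("matmul_tuning.json"):
    task = measure.recover_measure_input(inp).task
    print(task.workload_key, task.target, task.target_host)
```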
[GitHub] [tvm-vta] dsteger commented on pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#issuecomment-740216108 I just force-pushed a change to apply the preset. FYI
[GitHub] [tvm] masahi opened a new pull request #7051: [LLVM] Support atomic for GPU backend (NVPTX, ROCm)
masahi opened a new pull request #7051: URL: https://github.com/apache/tvm/pull/7051
[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537873092 ## File path: python/tvm/topi/cuda/nms.py ## @@ -54,64 +54,66 @@ def atomic_add(x, y): return tvm.tir.call_intrin(y.dtype, "tir.atomic_add", x, y) -def rearrange_indices_out_ir(data, out, valid_box_count): +def rearrange_indices_out_ir(data, output, valid_box_count): """Hybrid routine to rearrange nms output to move all valid entries to top. Parameters -- data : tvm.te.Tensor or numpy NDArray +NMS output. 3-D tensor with shape +[batch_size, num_anchors, 6] or +[batch_size, num_anchors, 5], or 2-D tensor with shape [batch_size, num_anchors]. +one: tvm.tir.const +Constant one with the same dtype as data. + +batch_size: tvm.tir.IntImm or tvm.tir.Var +Batch size. We need to pass it in since hybrid script doesn't support +binding variable to symbolic dim. + +num_anchors: tvm.tir.IntImm or tvm.tir.Var +Number of anchors. Returns --- -stmt : Stmt -The result IR statement. +output : tvm.te.Tensor or numpy NDArray +2-D tensor with shape [batch_size, num_anchors]. + +valid_box_count : tvm.te.Tensor or numpy NDArray +Tensor with shape [batch_size, 1], indicates +the valid number of boxes. """ batch_size = data.shape[0] num_anchors = data.shape[1] ib = tvm.tir.ir_builder.create() + data = ib.buffer_ptr(data) -out = ib.buffer_ptr(out) valid_box_count = ib.buffer_ptr(valid_box_count) - -one_count = tvm.tir.const(1, dtype="int32") -atomic_add_return = ib.allocate( -valid_box_count.dtype, (batch_size,), name="atomic_add_return", scope="local" -) - -max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -tx = te.thread_axis("threadIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -len_inner_for = (batch_size * num_anchors) // nthread_tx + 2 - -idxd = tvm.tir.indexdiv -idxm = tvm.tir.indexmod - -with ib.for_range(0, len_inner_for, name="i") as i: -idx = tx * len_inner_for + i -batch_idx = idxd(idx, num_anchors) -with ib.if_scope(idx < batch_size): -valid_box_count[idx] = 0 -with ib.if_scope(idx < batch_size * num_anchors): -with ib.if_scope(data[idx] >= 0): -atomic_add_return[batch_idx] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_box_count[batch_idx]), -one_count, -) -out[batch_idx * num_anchors + atomic_add_return[batch_idx]] = data[idx] -with ib.if_scope(tvm.tir.any(data[idx] > num_anchors, data[idx] < -num_anchors)): -atomic_add_return[batch_idx] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_box_count[batch_idx]), -one_count, -) -out[batch_idx * num_anchors + atomic_add_return[batch_idx]] = 0 - -with ib.if_scope(idxm(idx, num_anchors) >= valid_box_count[batch_idx]): -out[idx] = -1 Review comment: This implementation of rearrange_indices_out_ir returns an undersized tensor in some case, I think the threading isn't quite right, but i haven't been able to fix. 
## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -with ib.if_scope(tid < batch_size): -valid_count[tid] = 0 -with ib.if_scope(tid < batch_size * num_anchors): -i = idxd(tid, num_anchors) -with ib.if_scope( -tv
[GitHub] [tvm] masahi commented on pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740207942 Thanks @mbrookhart @tkonolige @Laurawly
[tvm] branch main updated: [TOPI] GPU scatter_add using atomic (#7044)
This is an automated email from the ASF dual-hosted git repository. masahi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 2a2081e [TOPI] GPU scatter_add using atomic (#7044) 2a2081e is described below commit 2a2081e536f26b49506ef38fec820dc196bc6a2f Author: masahi AuthorDate: Tue Dec 8 07:04:34 2020 +0900 [TOPI] GPU scatter_add using atomic (#7044) * use atomic add for faster 1d scatter add * update tests * run black * more pylint fix * remove fp64 bintcount test Co-authored-by: masa --- python/tvm/relay/frontend/pytorch.py | 17 +- python/tvm/topi/cuda/scatter.py | 80 ++- tests/python/frontend/pytorch/test_forward.py | 10 ++-- tests/python/relay/test_op_level3.py | 4 ++ 4 files changed, 102 insertions(+), 9 deletions(-) diff --git a/python/tvm/relay/frontend/pytorch.py b/python/tvm/relay/frontend/pytorch.py index 4f75cf3..d2c52fb 100644 --- a/python/tvm/relay/frontend/pytorch.py +++ b/python/tvm/relay/frontend/pytorch.py @@ -1921,18 +1921,29 @@ class PyTorchOpConverter: def bincount(self, inputs, input_types): data = inputs[0] weights = inputs[1] +input_type = _infer_type(data).checked_type.dtype +if input_type == "int64": +logging.warning( +"Casting an int64 input to int32, since we do not have int64 atomic add" +"needed for bincount yet." +) +data = _op.cast(data, "int32") maximum = _op.max(data) -dim = maximum + _expr.const(1, dtype="int64") +dim = maximum + _expr.const(1, dtype="int32") if weights: weight_type = _infer_type(weights).checked_type out_dtype = weight_type.dtype updates = weights else: -out_dtype = "int64" +out_dtype = "int32" updates = _op.ones_like(data) counts = _op.zeros(_op.reshape(dim, [1]), out_dtype) -return _op.scatter_add(counts, data, updates, axis=0) +out = _op.scatter_add(counts, data, updates, axis=0) +if input_type == "int32": +# Torch always outputs int64 results for bincount +return _op.cast(out, "int64") +return out def scatter_add(self, inputs, input_types): data = inputs[0] diff --git a/python/tvm/topi/cuda/scatter.py b/python/tvm/topi/cuda/scatter.py index 5e03faf..89c5cd2 100644 --- a/python/tvm/topi/cuda/scatter.py +++ b/python/tvm/topi/cuda/scatter.py @@ -19,6 +19,7 @@ import tvm from tvm import te from ..scatter import _verify_scatter_nd_inputs +from .nms import atomic_add def ceil_div(a, b): @@ -470,6 +471,83 @@ def scatter(data, indices, updates, axis=0): return out +def gen_scatter_add_1d_atomic(data, indices, updates, axis, out, _): +"""Generate scatter add ir for 1d inputs, using atomic_add instruction + +Parameters +-- +data : tir.Tensor +The input data to the operator. + +indices : tir.Tensor +The index locations to update. + +updates : tir.Tensor +The values to update. + +axis : int +The axis to scatter on + +out : tir.Tensor +The output tensor. + +Returns +--- +ret : tir +The computational ir. 
+""" +assert axis == 0 +n = data.shape[0] + +ib = tvm.tir.ir_builder.create() + +out_ptr = ib.buffer_ptr(out) +data_ptr = ib.buffer_ptr(data) + +max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) +nthread_tx = max_threads + +with ib.new_scope(): +nthread_bx = ceil_div(n, nthread_tx) +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * nthread_tx + tx +with ib.if_scope(tid < n): +out_ptr[tid] = data_ptr[tid] + +indices_ptr = ib.buffer_ptr(indices) +updates_ptr = ib.buffer_ptr(updates) + +ni = indices.shape[0] + +atomic_add_return = ib.allocate(updates.dtype, (1,), name="atomic_add_return", scope="local") + +with ib.new_scope(): +nthread_bx = ceil_div(ni, nthread_tx) +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * nthread_tx + tx + +with ib.if_scope(tid < ni): +index = indices_ptr[tid] +with ib.if_scope(index < 0): +atomic_add_return[0] = atomic_add( +tvm.tir.call_intrin("handle", "tir.address_of", out_ptr[index + n]), +updates_ptr[tid], +) +
[GitHub] [tvm] masahi merged pull request #7044: [TOPI] GPU scatter_add using atomic
masahi merged pull request #7044: URL: https://github.com/apache/tvm/pull/7044
[GitHub] [tvm] comaniac commented on pull request #7046: [Auto Scheduler] Add target host to measure record
comaniac commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740197227 > @comaniac This change does not break compatibility. It can correctly read all old logs. I don't think we have to update the logs. Ah I didn't notice that we didn't check the log format when reading from file. Should we have that check?
[GitHub] [tvm] merrymercy commented on pull request #7046: [Auto Scheduler] Add target host to measure record
merrymercy commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740195089 @comaniac This change does not break compatibility. It can correctly read all old logs. I don't think we have to update the logs.
[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537847330 ## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -with ib.if_scope(tid < batch_size): -valid_count[tid] = 0 -with ib.if_scope(tid < batch_size * num_anchors): -i = idxd(tid, num_anchors) -with ib.if_scope( -tvm.tir.all( -data[tid * elem_length + score_index] > score_threshold, -tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0), -) -): -atomic_add_return[0] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count -) -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = data[tid * elem_length + k] -out_indices[tid + k] = tid + k -with ib.else_scope(): -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = -one -out_indices[tid + k] = -one_count - +with ib.new_scope(): +nthread_tx = max_threads +nthread_bx = batch_size // max_threads + 1 +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * max_threads + tx +with ib.if_scope(tid < batch_size): +valid_count[tid] = 0 +i = tid +with ib.for_range(0, num_anchors) as j: +score = data[(i * num_anchors + j) * elem_length + score_index] +with ib.if_scope( +tvm.tir.all( +score > score_threshold, +tvm.tir.any( +id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0 +), +) +): +with ib.for_range(0, elem_length) as k: +out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[ +(i * num_anchors + j) * elem_length + k +] +out_indices[i * num_anchors + valid_count[i]] = j +valid_count[i] += 1 Review comment: atomic_add doesn't work with nvptx. That's a headache... 
## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -
[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on pull request #6839: URL: https://github.com/apache/tvm/pull/6839#issuecomment-740189993 @Laurawly @kevinthesun I have rebased, but I was unable to get it passing tests with Yao's changes. I'm going back through the kernels one by one to see if I can get the faster versions to pass tests before attempting the ONNX integration.
[GitHub] [tvm-vta] dsteger commented on a change in pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on a change in pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#discussion_r537844949 ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: AVNET has board files that can be used as external sources. Would be a good update for the Ultra96 hardware. https://github.com/Avnet/bdf/tree/master/ultra96v2/1.1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm-vta] dsteger commented on a change in pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on a change in pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#discussion_r537829814 ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: When you build a hardware design, Vivado lets you specify something called presets based on BOARD_PART. Presets are board-specific configurations related to the hardware, most importantly the DDR configuration. If you look at the hardware design built without a preset you will notice that the DDR defaults to 1600 MHz. If you apply the preset (ZCU104 for this example) the DDR clock will be 2133 MHz. If we want meaningful output products then we should specify this for the boards.
[GitHub] [tvm] tkonolige commented on a change in pull request #7050: Sparse Conv2D for CPU (NCHW)
tkonolige commented on a change in pull request #7050: URL: https://github.com/apache/tvm/pull/7050#discussion_r537807196 ## File path: python/tvm/topi/sparse/csrmm.py ## @@ -121,3 +121,46 @@ def csrmm(a, b, c=None): 2-D with shape [m, n] """ return csrmm_default(a.data, a.indices, a.indptr, b, c) + + +def batch_csrmm(data, indices, indptr, dense, oshape): +# pylint: disable=invalid-name +assert len(data.shape) == 1 and len(indices.shape) == 1 and len(indptr.shape) == 1 \ +and len(dense.shape) == 3, "only support 2-dim csrmm" +assert indptr.dtype == 'int32', f"CSR indptr must be integers, but is {indptr.dtype}" +assert indices.dtype == 'int32', f"CSR indices must be integers, but is {indices.dtype}" + +assert isinstance(dense, te.tensor.Tensor), \ +"dense matrix is assumed to be tvm.te.Tensor, but dense is `%s`" % (type(dense)) + +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +def csrmm_default_ir(data, indices, indptr, dense, out): +"""define ir for csrmm""" +irb = tvm.tir.ir_builder.create() +data_ptr = irb.buffer_ptr(data) +indices_ptr = irb.buffer_ptr(indices) +indptr_ptr = irb.buffer_ptr(indptr) +dense_ptr = irb.buffer_ptr(dense) +out_ptr = irb.buffer_ptr(out) +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +with irb.for_range(0, batches, name='batch') as batch: +with irb.for_range(0, N, for_type="vectorize", name='n') as n: +with irb.for_range(0, M, for_type="parallel", name='row') as row: +dot = irb.allocate('float32', (1,), name='dot', scope='local') +out_ptr[(batch*N*M) + (row*N+n)] = 0. Review comment: ir_builder supports multidimensional access (`out_ptr[batch, row, n]`), which might make this code cleaner. ## File path: python/tvm/topi/sparse/csrmm.py ## @@ -121,3 +121,46 @@ def csrmm(a, b, c=None): 2-D with shape [m, n] """ return csrmm_default(a.data, a.indices, a.indptr, b, c) + + +def batch_csrmm(data, indices, indptr, dense, oshape): +# pylint: disable=invalid-name +assert len(data.shape) == 1 and len(indices.shape) == 1 and len(indptr.shape) == 1 \ +and len(dense.shape) == 3, "only support 2-dim csrmm" +assert indptr.dtype == 'int32', f"CSR indptr must be integers, but is {indptr.dtype}" +assert indices.dtype == 'int32', f"CSR indices must be integers, but is {indices.dtype}" + +assert isinstance(dense, te.tensor.Tensor), \ +"dense matrix is assumed to be tvm.te.Tensor, but dense is `%s`" % (type(dense)) + +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +def csrmm_default_ir(data, indices, indptr, dense, out): +"""define ir for csrmm""" +irb = tvm.tir.ir_builder.create() +data_ptr = irb.buffer_ptr(data) +indices_ptr = irb.buffer_ptr(indices) +indptr_ptr = irb.buffer_ptr(indptr) +dense_ptr = irb.buffer_ptr(dense) +out_ptr = irb.buffer_ptr(out) +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +with irb.for_range(0, batches, name='batch') as batch: +with irb.for_range(0, N, for_type="vectorize", name='n') as n: +with irb.for_range(0, M, for_type="parallel", name='row') as row: +dot = irb.allocate('float32', (1,), name='dot', scope='local') +out_ptr[(batch*N*M) + (row*N+n)] = 0. +dot[0] = 0. 
+row_start = indptr_ptr[row] +row_end = indptr_ptr[row+1] +row_elems = row_end-row_start +with irb.for_range(0, row_elems, name='idx') as idx: +elem = row_start+idx +dot[0] += data_ptr[elem] * dense_ptr[indices_ptr[elem]*N+n] +out_ptr[(batch*N*M) + row*N+n] += dot[0] +return irb.get() +matmul = te.extern(oshape, [data, indices, indptr, dense], + lambda ins, outs: csrmm_default_ir(ins[0], ins[1], ins[2], ins[3], outs[0]), + tag="csrmm", dtype='float32', name='out') Review comment: I think we would like to support more than float32. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
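Building on the reviewer's multidimensional-access suggestion, a minimal self-contained sketch of an `ir_builder` kernel indexing buffers as `ptr[b, r, c]` rather than with hand-computed flat offsets (shapes are illustrative, and the sketch relies on the multidimensional indexing the reviewer mentions):

```python
import tvm
from tvm import te

def copy_3d_ir(src, dst):
    """Copy a 3-D buffer element-wise using multidimensional indexing."""
    ib = tvm.tir.ir_builder.create()
    src_ptr = ib.buffer_ptr(src)
    dst_ptr = ib.buffer_ptr(dst)
    batches, rows, cols = src.shape
    with ib.for_range(0, batches, name="b") as b:
        with ib.for_range(0, rows, name="r") as r:
            with ib.for_range(0, cols, name="c") as c:
                # instead of dst_ptr[b * rows * cols + r * cols + c]
                dst_ptr[b, r, c] = src_ptr[b, r, c]
    return ib.get()

A = te.placeholder((2, 8, 16), name="A", dtype="float32")
B = te.extern(A.shape, [A], lambda ins, outs: copy_3d_ir(ins[0], outs[0]),
              name="B", dtype="float32")
s = te.create_schedule(B.op)
mod = tvm.build(s, [A, B], target="llvm")
```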
[GitHub] [tvm] tkonolige commented on a change in pull request #7050: Sparse Conv2D for CPU (NCHW)
tkonolige commented on a change in pull request #7050: URL: https://github.com/apache/tvm/pull/7050#discussion_r537793211 ## File path: python/tvm/topi/nn/conv2d_sparse.py ## @@ -0,0 +1,261 @@ +import tvm +from tvm import te +from tvm.topi import nn +from tvm.topi.nn.util import get_pad_tuple +from tvm.topi.util import get_const_tuple +from tvm import autotvm +from ..nn.conv2d import conv2d_infer_layout, _get_workload as _get_conv2d_workload +from ..util import get_const_tuple, traverse_inline +from tvm.topi.sparse import batch_csrmm, csrmm_default + +def _fallback_schedule(cfg, wkl): +HPAD, WPAD = wkl.hpad, wkl.wpad +HSTR, WSTR = wkl.hstride, wkl.wstride +out_width = (wkl.width + 2 * WPAD - wkl.wkernel) // WSTR + 1 + +def _get_default_config(cfg, data, kernel, strides, padding, out_dtype, is_depthwise=False, +layout='NCHW'): +""" +Get default schedule config for the workload +""" +static_data_shape = [] +for dim in get_const_tuple(data.shape): +if isinstance(dim, tvm.tir.Var): +static_data_shape.append(1) +else: +static_data_shape.append(dim) +data = te.placeholder(static_data_shape, dtype=data.dtype) +wkl = _get_conv2d_workload(data, kernel, strides, padding, out_dtype, layout) +is_kernel_1x1 = wkl.hkernel == 1 and wkl.wkernel == 1 +_fallback_schedule(cfg, wkl) + +def conv2d_sparse_gemm_nchw(data, w_data, w_indices, w_indptr, +OC, KH, KW, +strides, padding, dilation, +out_dtype='float32'): +"""Compute conv2d by transforming the input, +executing GEMM and not transforming the output back yet""" +batches, IC, IH, IW = get_const_tuple(data.shape) + +K = KH * KW + +if isinstance(dilation, int): +dilation_h = dilation_w = dilation +else: +dilation_h, dilation_w = dilation + +dilated_kernel_h = (KH - 1) * dilation_h + 1 +dilated_kernel_w = (KW - 1) * dilation_w + 1 + +pad_top, pad_left, pad_down, pad_right = \ +get_pad_tuple(padding, (dilated_kernel_h, dilated_kernel_w)) +HSTR, WSTR = strides if isinstance(strides, (tuple, list)) else (strides, strides) + +OH = (IH + pad_top + pad_down - dilated_kernel_h) // HSTR + 1 +OW = (IW + pad_left + pad_right - dilated_kernel_w) // WSTR + 1 + +N = OC +K = KH * KW * IC +M = OH * OW + +if pad_top or pad_left: +data_pad = nn.pad(data, [0, 0, pad_top, pad_left], [0, 0, pad_down, pad_right], + name="data_pad") +else: +data_pad = data + +# --- Im2col + +B_shape = (batches, K, M) +idxmod = tvm.tir.indexmod +idxdiv = tvm.tir.indexdiv +# print(KH, KW, IC, OW, HSTR) + +B = te.compute(B_shape, lambda n, k, m: + data_pad[n, (k // (KH*KW)) % IC, +(k // KH) % KW + ((m // OW) * HSTR), +(k % KW) + ((m % OW) * WSTR)], + name='data_im2col') + + +# --- GEMM: A*B' +# oshape = (batches, N, M) +oshape = (batches, OC, OH, OW) +# B = te.compute((N,M), lambda n, m: +#B[0, n, m], +#name='data_flatten') +C = batch_csrmm(w_data, w_indices, w_indptr, B, oshape) +# C = csrmm_default(w_data, w_indices, w_indptr, B) + + +# placeholder reshape +# k = te.reduce_axis((0, K), 'k') +# C = te.compute( +# oshape, +# lambda b, c, h, w: te.sum(C[b, c, w] * C[b, c, w], axis=k), +# name='C') + +return C + +def csrdc(data, indices, indptr, inputs, oshape, kdim, strides, padding): Review comment: What is csrdc? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] mbrookhart commented on pull request #6978: [Relay][Topi][Dynamic] Add a Sort op and use the legalization pass to perform dynamic topk on GPU
mbrookhart commented on pull request #6978: URL: https://github.com/apache/tvm/pull/6978#issuecomment-740143976 closing in favor of #7018
[GitHub] [tvm] mbrookhart closed pull request #6978: [Relay][Topi][Dynamic] Add a Sort op and use the legalization pass to perform dynamic topk on GPU
mbrookhart closed pull request #6978: URL: https://github.com/apache/tvm/pull/6978
[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on pull request #6839: URL: https://github.com/apache/tvm/pull/6839#issuecomment-740142785 #7005 re-implemented some of the features in this PR; I'll rebase and try to reconcile.
[GitHub] [tvm] altanh commented on pull request #7050: Sparse Conv2D for CPU (NCHW)
altanh commented on pull request #7050: URL: https://github.com/apache/tvm/pull/7050#issuecomment-740137646 cc @tkonolige who has some sparse experience
[GitHub] [tvm] Wheest opened a new pull request #7050: Sparse Conv2D for CPU (NCHW)
Wheest opened a new pull request #7050: URL: https://github.com/apache/tvm/pull/7050 This pull request adds sparse conv2d implementations to CPU for TOPI. I have implemented sparse GEMM convolution, and sparse direct convolution for the NCHW data layout, using the CSR sparse data format. The extension to the C++ runtime is pretty stable. The code for TOPI is not clean or very well integrated yet, but I am looking for some guidance from other developers. [This gist](https://gist.github.com/Wheest/94433f73ff3279669bf35adcc38b321d) has a simple example of running a single layer Conv2D network with sparsity. You can choose what algorithm the Relay strategy uses with the two environment variables:

```
export TVM_DIRECT_CONV=1
export TVM_GEMM_CONV=0
```

Comments on how to improve the integration appreciated. Further pull requests could add other sparse algorithms, and sparse data formats. I am in the process of creating sparse versions for GPU runtimes, but am having some difficulties I am discussing on the [Discuss](https://discuss.tvm.apache.org/t/sparse-opencl-error-scheduling-sparse-computations-that-use-tir-ir-builder/).
[GitHub] [tvm] comaniac commented on a change in pull request #7046: [Auto Scheduler] Add target host to measure record
comaniac commented on a change in pull request #7046: URL: https://github.com/apache/tvm/pull/7046#discussion_r537759346

## File path: src/auto_scheduler/measure_record.cc

@@ -183,7 +186,12 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> {
      reader->Read(hardware_params_node.get());
      s = reader->NextArrayItem();
      data->hardware_params = ::tvm::auto_scheduler::HardwareParams(hardware_params_node);
-     ICHECK(!s);
+     if (s) {
+       reader->Read(&str_value);
+       data->target_host = ::tvm::Target(str_value);
+       s = reader->NextArrayItem();
+       ICHECK(!s);

Review comment:
   This check should be out of this if-statement.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] tqchen commented on pull request #6953: Add retry to sockets on EINTR error
tqchen commented on pull request #6953: URL: https://github.com/apache/tvm/pull/6953#issuecomment-740110237 I see, in that case @areusch is right that we might need a PackedFunc callback in the handler This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] masahi commented on pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740103421 @tkonolige Thanks for the pointers. I've only improved scatter_add; I'm afraid I have no idea how to improve scatter. But yeah, I can see that if we can assume some structure on the indices, we can do some parallelization of scatter too. I find this problem interesting, so I'll put scatter improvement in my backlog. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] tkonolige opened a new pull request #7049: [DOCS] Document cloudpickle dependency in tutorials
tkonolige opened a new pull request #7049: URL: https://github.com/apache/tvm/pull/7049 This PR documents the cloudpickle dependency introduced in #6790. @merrymercy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] trevor-m commented on a change in pull request #7026: [BYOC][TRT] Support batch norm for all ranks <=5, and all axes
trevor-m commented on a change in pull request #7026: URL: https://github.com/apache/tvm/pull/7026#discussion_r537729585

## File path: src/runtime/contrib/tensorrt/tensorrt_ops.cc

@@ -386,8 +386,35 @@ class BatchNormOpConverter : public TensorRTOpConverter {
     const int axis = std::stoi(params->node.GetAttr<std::vector<std::string>>("axis")[0]);
     const bool scale = std::stoi(params->node.GetAttr<std::vector<std::string>>("scale")[0]);
     const bool center = std::stoi(params->node.GetAttr<std::vector<std::string>>("center")[0]);
-    ICHECK(axis == 1 || axis == 3);
-    const bool need_transpose = axis == 3;
+    auto input_dims = TrtDimsToVector(input->getDimensions());
+    const size_t min_rank = TRT_HAS_IMPLICIT_BATCH(params) ? 3 : 4;
+    const size_t max_rank = TRT_HAS_IMPLICIT_BATCH(params) ? 4 : 5;
+    ICHECK_LE(input_dims.size(), max_rank);

Review comment:
   Hi @jroesch, thanks for reviewing! These checks are more for sanity checking, since the annotation functions in python/tvm/relay/op/contrib/tensorrt.py will filter out the unsupported ops before they ever get to this code. I don't expect users to ever see these. Anyway, I can make a separate PR to port all of the ICHECK to Diagnostics.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
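The division of labor trevor-m describes (the Python annotation layer filters unsupported ops; the C++ converter only sanity-checks) can be summarized by a small predicate. The sketch below is schematic only, not the actual code in python/tvm/relay/op/contrib/tensorrt.py; the function name and signature are invented, and the rank bounds simply mirror the min_rank/max_rank logic in the quoted diff.

```python
# Schematic of the annotation-side filter: reject batch_norm inputs whose
# TensorRT input rank falls outside the supported range, so the C++ ICHECKs
# above act purely as sanity checks. Name and signature are illustrative.
def batch_norm_rank_supported(trt_input_rank, implicit_batch=True):
    min_rank = 3 if implicit_batch else 4   # same bounds as in tensorrt_ops.cc
    max_rank = 4 if implicit_batch else 5
    return min_rank <= trt_input_rank <= max_rank

# Example: a rank-5 input is only offloaded when TensorRT runs without an
# implicit batch dimension.
assert not batch_norm_rank_supported(5, implicit_batch=True)
assert batch_norm_rank_supported(5, implicit_batch=False)
```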
[GitHub] [tvm] tkonolige commented on a change in pull request #6839: [ONNX] NMS in ONNX
tkonolige commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537725421

## File path: python/tvm/topi/cuda/nms.py

@@ -97,47 +97,44 @@ def get_valid_counts_ir(
     valid_count = ib.buffer_ptr(valid_count)
     out = ib.buffer_ptr(out)
     out_indices = ib.buffer_ptr(out_indices)
-    atomic_add_return = ib.allocate(
-        valid_count.dtype, (1,), name="atomic_add_return", scope="local"
-    )
-    one_count = tvm.tir.const(1, dtype=valid_count.dtype)
     one = tvm.tir.const(1, dtype=out.dtype)
-    score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
+    if isinstance(score_threshold, float):
+        score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
     id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index)
     score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index)
     max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads)
-    nthread_tx = max_threads
-    nthread_bx = batch_size * num_anchors // max_threads + 1
-    tx = te.thread_axis("threadIdx.x")
-    bx = te.thread_axis("blockIdx.x")
-    ib.scope_attr(tx, "thread_extent", nthread_tx)
-    ib.scope_attr(bx, "thread_extent", nthread_bx)
-    tid = bx * max_threads + tx
-    idxd = tvm.tir.indexdiv
-
-    # initialize valid_count
-    with ib.if_scope(tid < batch_size):
-        valid_count[tid] = 0
-    with ib.if_scope(tid < batch_size * num_anchors):
-        i = idxd(tid, num_anchors)
-        with ib.if_scope(
-            tvm.tir.all(
-                data[tid * elem_length + score_index] > score_threshold,
-                tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0),
-            )
-        ):
-            atomic_add_return[0] = atomic_add(
-                tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count
-            )
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = data[tid * elem_length + k]
-                out_indices[tid + k] = tid + k
-        with ib.else_scope():
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = -one
-                out_indices[tid + k] = -one_count
-
+    with ib.new_scope():
+        nthread_tx = max_threads
+        nthread_bx = batch_size // max_threads + 1
+        tx = te.thread_axis("threadIdx.x")
+        bx = te.thread_axis("blockIdx.x")
+        ib.scope_attr(tx, "thread_extent", nthread_tx)
+        ib.scope_attr(bx, "thread_extent", nthread_bx)
+        tid = bx * max_threads + tx
+        with ib.if_scope(tid < batch_size):
+            valid_count[tid] = 0
+            i = tid
+            with ib.for_range(0, num_anchors) as j:
+                score = data[(i * num_anchors + j) * elem_length + score_index]
+                with ib.if_scope(
+                    tvm.tir.all(
+                        score > score_threshold,
+                        tvm.tir.any(
+                            id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0
+                        ),
+                    )
+                ):
+                    with ib.for_range(0, elem_length) as k:
+                        out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[
+                            (i * num_anchors + j) * elem_length + k
+                        ]
+                    out_indices[i * num_anchors + valid_count[i]] = j
+                    valid_count[i] += 1

Review comment:
   Could you use atomic add here?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] tkonolige commented on pull request #7044: [TOPI] GPU scatter_add using atomic
tkonolige commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740090664 Hey @masahi, thanks for this work. I'm wondering if you've looked at a sort then lookup approach to scatter (some references: https://www.cse.ust.hk/catalac/papers/scatter_sc07.pdf, https://developer.nvidia.com/gpugems/gpugems2/part-iv-general-purpose-computation-gpus-primer/chapter-32-taking-plunge-gpu)? You also might want to look at `scatter_nd` in the codebase. It is a generalization of scatter to arbitrary dimensions. For 1D its performance probably won't be great, so maybe you could improve it too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
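As a rough illustration of the sort-then-lookup idea tkonolige mentions (independent of the referenced papers and of the TOPI code), a 1D scatter_add can be computed by sorting the indices, reducing each run of equal indices once, and issuing a single write per unique destination. The sketch below is a minimal NumPy reference; the function name is invented.

```python
# Minimal NumPy sketch of sort-based scatter_add: sort by destination index,
# reduce each run of equal indices, then write once per unique destination.
# Illustration of the approach only, not the TOPI implementation.
import numpy as np

def scatter_add_sorted(base, indices, updates):
    out = base.copy()
    order = np.argsort(indices, kind="stable")     # 1. sort by destination
    sorted_idx = indices[order]
    sorted_upd = updates[order]
    uniq, starts = np.unique(sorted_idx, return_index=True)
    sums = np.add.reduceat(sorted_upd, starts)     # 2. reduce each run
    out[uniq] += sums                              # 3. one write per index
    return out

base = np.zeros(8, dtype=np.float32)
idx = np.array([3, 1, 3, 7, 1], dtype=np.int64)
upd = np.array([1.0, 2.0, 3.0, 4.0, 5.0], dtype=np.float32)
ref = base.copy()
np.add.at(ref, idx, upd)                           # reference scatter_add
assert np.allclose(scatter_add_sorted(base, idx, upd), ref)
```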
[GitHub] [tvm] masahi commented on pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740084249

> What kind of perf improvement do you see with this?

Well, the comparison would be a bit embarrassing, since the current one is the worst gpu kernel ever :) Below is an excerpt from the nvprof log. This result is obtained on my crappy laptop gpu, but the difference is still significant (5 ms vs 20 us).

Current one
```
Duration    Grid Size   Block Size   Name
5.5576ms    (1 1 1)     (1 1 1)      fused_scatter_add_1_kernel1
```

New one in this PR
```
Duration    Grid Size   Block Size   Name
22.176us    (10 1 1)    (1024 1 1)   fused_scatter_add_1_kernel1
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] masahi commented on a change in pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on a change in pull request #7044: URL: https://github.com/apache/tvm/pull/7044#discussion_r537707428

## File path: tests/python/frontend/pytorch/test_forward.py

@@ -3355,12 +3355,12 @@ def test_bincount():
     def test_fn(x, weights=None):
         return torch.bincount(x, weights=weights)

-    inp = torch.randint(0, 8, (5,), dtype=torch.int64)
-    weights = torch.linspace(0, 1, steps=5)
+    inp = torch.randint(0, 100, (1,), dtype=torch.int64)
+    weights = torch.linspace(0, 100, steps=1)

-    verify_trace_model(test_fn, [inp], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights.to(torch.float64)], ["llvm"])

Review comment:
   No. For some reason, CUDA on CI fails to compile fp64 atomic add: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-7044/3/pipeline/ I don't have this problem locally.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537667569

## File path: python/tvm/topi/cuda/nms.py

@@ -97,47 +97,44 @@ def get_valid_counts_ir(
     valid_count = ib.buffer_ptr(valid_count)
     out = ib.buffer_ptr(out)
     out_indices = ib.buffer_ptr(out_indices)
-    atomic_add_return = ib.allocate(
-        valid_count.dtype, (1,), name="atomic_add_return", scope="local"
-    )
-    one_count = tvm.tir.const(1, dtype=valid_count.dtype)
     one = tvm.tir.const(1, dtype=out.dtype)
-    score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
+    if isinstance(score_threshold, float):
+        score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
    id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index)
    score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index)
    max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads)
-    nthread_tx = max_threads
-    nthread_bx = batch_size * num_anchors // max_threads + 1
-    tx = te.thread_axis("threadIdx.x")
-    bx = te.thread_axis("blockIdx.x")
-    ib.scope_attr(tx, "thread_extent", nthread_tx)
-    ib.scope_attr(bx, "thread_extent", nthread_bx)
-    tid = bx * max_threads + tx
-    idxd = tvm.tir.indexdiv
-
-    # initialize valid_count
-    with ib.if_scope(tid < batch_size):
-        valid_count[tid] = 0
-    with ib.if_scope(tid < batch_size * num_anchors):
-        i = idxd(tid, num_anchors)
-        with ib.if_scope(
-            tvm.tir.all(
-                data[tid * elem_length + score_index] > score_threshold,
-                tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0),
-            )
-        ):
-            atomic_add_return[0] = atomic_add(
-                tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count
-            )
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = data[tid * elem_length + k]
-                out_indices[tid + k] = tid + k
-        with ib.else_scope():
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = -one
-                out_indices[tid + k] = -one_count
-
+    with ib.new_scope():
+        nthread_tx = max_threads
+        nthread_bx = batch_size // max_threads + 1
+        tx = te.thread_axis("threadIdx.x")
+        bx = te.thread_axis("blockIdx.x")
+        ib.scope_attr(tx, "thread_extent", nthread_tx)
+        ib.scope_attr(bx, "thread_extent", nthread_bx)
+        tid = bx * max_threads + tx
+        with ib.if_scope(tid < batch_size):
+            valid_count[tid] = 0
+            i = tid
+            with ib.for_range(0, num_anchors) as j:
+                score = data[(i * num_anchors + j) * elem_length + score_index]
+                with ib.if_scope(
+                    tvm.tir.all(
+                        score > score_threshold,
+                        tvm.tir.any(
+                            id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0
+                        ),
+                    )
+                ):
+                    with ib.for_range(0, elem_length) as k:
+                        out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[
+                            (i * num_anchors + j) * elem_length + k
+                        ]
+                    out_indices[i * num_anchors + valid_count[i]] = j
+                    valid_count[i] += 1

Review comment:
   There is definitely not a data race now, because I removed the threading :smile: But I think I see your point, this might be why I couldn't pass the test with threading on. I will investigate.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] ANSHUMAN87 opened a new pull request #7048: [Frontend][TFLite] Densify Op added
ANSHUMAN87 opened a new pull request #7048: URL: https://github.com/apache/tvm/pull/7048 The Densify op performs a sparse-to-dense transformation for sparse weights, based on the sparsity parameters provided. This op is needed for sparse ConvNet models. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
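For intuition, densifying compressed weights means reconstructing the full tensor from its sparse encoding. The sketch below shows the idea for a plain CSR matrix using SciPy; TFLite's sparsity parameters (per-dimension dense/sparse segments and block maps) are richer than CSR, so this is only an analogy, not the frontend code in this PR.

```python
# Illustrative only: "densify" a CSR-encoded weight matrix back to dense form.
# TFLite's sparsity format is more general than CSR; this just shows the
# sparse-to-dense idea behind the Densify op.
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([5.0, 3.0, 2.0], dtype=np.float32)   # non-zero values
indices = np.array([0, 2, 1])                         # column of each value
indptr = np.array([0, 2, 2, 3])                       # row start offsets
sparse_w = csr_matrix((data, indices, indptr), shape=(3, 4))

dense_w = sparse_w.toarray()                          # the "densify" step
print(dense_w)
# [[5. 0. 3. 0.]
#  [0. 0. 0. 0.]
#  [0. 2. 0. 0.]]
```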
[GitHub] [tvm] mbrookhart commented on a change in pull request #7044: [TOPI] GPU scatter_add using atomic
mbrookhart commented on a change in pull request #7044: URL: https://github.com/apache/tvm/pull/7044#discussion_r537647724

## File path: tests/python/frontend/pytorch/test_forward.py

@@ -3355,12 +3355,12 @@ def test_bincount():
     def test_fn(x, weights=None):
         return torch.bincount(x, weights=weights)

-    inp = torch.randint(0, 8, (5,), dtype=torch.int64)
-    weights = torch.linspace(0, 1, steps=5)
+    inp = torch.randint(0, 100, (1,), dtype=torch.int64)
+    weights = torch.linspace(0, 100, steps=1)

-    verify_trace_model(test_fn, [inp], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights.to(torch.float64)], ["llvm"])

Review comment:
   I assume you removed this because the atomic add isn't precise enough for float64?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] zhiics commented on issue #7047: is-there-a-horizontal-fusion-demo
zhiics commented on issue #7047: URL: https://github.com/apache/tvm/issues/7047#issuecomment-740003810 Thanks. Let's just use the Discuss thread to continue the discussion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] zhiics closed issue #7047: is-there-a-horizontal-fusion-demo
zhiics closed issue #7047: URL: https://github.com/apache/tvm/issues/7047 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[tvm-site] branch asf-site updated: Build at Mon Dec 7 10:45:56 EST 2020
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/tvm-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 1cf4e15 Build at Mon Dec 7 10:45:56 EST 2020 1cf4e15 is described below commit 1cf4e15e63e714bd6340334268a179ac837fc5ce Author: tqchen AuthorDate: Mon Dec 7 10:45:57 2020 -0500 Build at Mon Dec 7 10:45:56 EST 2020 --- atom.xml | 2 +- feed.xml | 2 +- rss.xml | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/atom.xml b/atom.xml index 1e1edc7..9c061c8 100644 --- a/atom.xml +++ b/atom.xml @@ -4,7 +4,7 @@ TVM https://tvm.apache.org"; rel="self"/> https://tvm.apache.org"/> - 2020-11-25T14:54:37-05:00 + 2020-12-07T10:45:44-05:00 https://tvm.apache.org diff --git a/feed.xml b/feed.xml index 4aadebb..64f5387 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,4 @@ -http://www.w3.org/2005/Atom"; >https://jekyllrb.com/"; version="4.1.1">Jekyll2020-11-25T14:54:37-05:00/feed.xmlTVM{"name"=>nil}Bring Your Own Datatypes: Enabling Custom Datatype [...] +http://www.w3.org/2005/Atom"; >https://jekyllrb.com/"; version="4.1.1">Jekyll2020-12-07T10:45:44-05:00/feed.xmlTVM{"name"=>nil}Bring Your Own Datatypes: Enabling Custom Datatype [...]Introduction
diff --git a/rss.xml b/rss.xml index 12029fb..f3dee7f 100644 --- a/rss.xml +++ b/rss.xml @@ -5,8 +5,8 @@ TVM - https://tvm.apache.org https://tvm.apache.org"; rel="self" type="application/rss+xml" /> -Wed, 25 Nov 2020 14:54:37 -0500 -Wed, 25 Nov 2020 14:54:37 -0500 +Mon, 07 Dec 2020 10:45:44 -0500 +Mon, 07 Dec 2020 10:45:44 -0500 60
[GitHub] [tvm] zhxfl commented on issue #7047: is-there-a-horizontal-fusion-demo
zhxfl commented on issue #7047: URL: https://github.com/apache/tvm/issues/7047#issuecomment-739870925 Paper: https://arxiv.org/pdf/2007.01277.pdf This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] zhxfl commented on issue #7047: is-there-a-horizontal-fusion-demo
zhxfl commented on issue #7047: URL: https://github.com/apache/tvm/issues/7047#issuecomment-739868539 It is very important for small networks, when the number of blocks is smaller than the number of SMs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-739853283 @merrymercy @comaniac @jcf94 @minminsun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene opened a new pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene opened a new pull request #7046: URL: https://github.com/apache/tvm/pull/7046 We don't append the target host to the measure record serialization / deserialization currently, which works well when the host is an x86 machine and the target is simply an ARM CPU. However, it causes problems for more complex deployments. Say we want to cross-compile on an x86 server machine for an ARM machine's Mali GPU: the target host is the ARM CPU, but the target is the Mali GPU. So we should add the target host to the measure record if `target_host` is not `nullptr`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
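To make the deployment scenario concrete, a search task for the case described above might be constructed roughly as follows. This is a hedged sketch, not code from the PR: the workload function is a placeholder registered for this example, and the exact Mali target string depends on the device.

```python
# Sketch of the cross-compilation scenario: tuning for a Mali GPU whose host
# CPU is 64-bit ARM, while the tuning itself runs on an x86 server. The
# workload function is a placeholder; register your own as needed.
import tvm
from tvm import auto_scheduler, te

@auto_scheduler.register_workload
def matmul(n):
    A = te.placeholder((n, n), name="A")
    B = te.placeholder((n, n), name="B")
    k = te.reduce_axis((0, n), name="k")
    C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

task = auto_scheduler.SearchTask(
    func=matmul,
    args=(512,),
    target="opencl -device=mali",                   # device we generate code for
    target_host="llvm -mtriple=aarch64-linux-gnu",  # CPU that drives it
)
# With this change, records saved for such a task also keep the target host,
# so they can be reloaded and compiled correctly on a different machine.
```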
[GitHub] [tvm] hzfan opened a new pull request #7045: [Arith] Simplify cast
hzfan opened a new pull request #7045: URL: https://github.com/apache/tvm/pull/7045 Follow up of #6691 Simplify `cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2)` to `2` by first transforming to `cast(i32, c * 2) + cast(i32, 1) + 1 - cast(i32, c * 2)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
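A quick way to observe the effect of such a rewrite is to feed the expression to the public arithmetic Analyzer. The snippet below is not from the PR; whether this particular expression folds all the way to 2 depends on the simplifier rules that land with it.

```python
# Sketch: exercise a cast simplification through tvm.arith.Analyzer.simplify.
# Not taken from the PR; the fold to 2 relies on the rewrite it introduces.
import tvm
from tvm import te

ana = tvm.arith.Analyzer()
c = te.var("c", dtype="int64")
expr = (c * 2 + 1).astype("int32") + 1 - (c * 2).astype("int32")
print(ana.simplify(expr))   # expected: 2 once the new rewrite applies
```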
[GitHub] [tvm] masahi opened a new pull request #7044: [TOPI] GPU scatter_add using atomic
masahi opened a new pull request #7044: URL: https://github.com/apache/tvm/pull/7044 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org