[tvm] branch main updated: [Auto Scheduler] Add target host to measure record (#7046)
This is an automated email from the ASF dual-hosted git repository. zhaowu pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new a867bcb [Auto Scheduler] Add target host to measure record (#7046) a867bcb is described below commit a867bcbf1ecf537cfb061a2ca4790b16a9cc9748 Author: Zhao Wu AuthorDate: Tue Dec 8 14:46:29 2020 +0800 [Auto Scheduler] Add target host to measure record (#7046) * [Auto Scheduler] Add target host to measure record * Fix PyLint * Fix lint * Solve the serialization logic when we don't have hardware params * update auto scheduler log --- src/auto_scheduler/measure_record.cc | 12 -- .../python/unittest/test_auto_scheduler_measure.py | 26 ++ 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/src/auto_scheduler/measure_record.cc b/src/auto_scheduler/measure_record.cc index d57e2f2..aad0abe 100644 --- a/src/auto_scheduler/measure_record.cc +++ b/src/auto_scheduler/measure_record.cc @@ -163,6 +163,9 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> { writer->WriteArrayItem(std::string(data.workload_key)); writer->WriteArrayItem(data.target->str()); writer->WriteArrayItem(*data.hardware_params.get()); +if (data.target_host.defined()) { + writer->WriteArrayItem(data.target_host->str()); +} writer->EndArray(); } inline static void Read(dmlc::JSONReader* reader, ::tvm::auto_scheduler::SearchTaskNode* data) { @@ -183,7 +186,12 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> { reader->Read(hardware_params_node.get()); s = reader->NextArrayItem(); data->hardware_params = ::tvm::auto_scheduler::HardwareParams(hardware_params_node); - ICHECK(!s); + if (s) { +reader->Read(&str_value); +data->target_host = ::tvm::Target(str_value); +s = reader->NextArrayItem(); +ICHECK(!s); + } } } }; @@ -271,7 +279,7 @@ namespace auto_scheduler { TVM_REGISTER_OBJECT_TYPE(RecordToFileNode); TVM_REGISTER_OBJECT_TYPE(RecordReaderNode); -const std::string AUTO_SCHEDULER_LOG_VERSION = "v0.3"; // NOLINT(*) +const std::string AUTO_SCHEDULER_LOG_VERSION = "v0.4"; // NOLINT(*) RecordToFile::RecordToFile(String filename) { auto node = make_object(); diff --git a/tests/python/unittest/test_auto_scheduler_measure.py b/tests/python/unittest/test_auto_scheduler_measure.py index b214d9c..10bb0b4 100644 --- a/tests/python/unittest/test_auto_scheduler_measure.py +++ b/tests/python/unittest/test_auto_scheduler_measure.py @@ -250,6 +250,31 @@ def test_measure_local_builder_rpc_runner_spawn(): p.join() +@tvm.testing.requires_llvm +def test_measure_target_host(): +task = auto_scheduler.SearchTask( +func=matmul_auto_scheduler_test, +args=(512, 512, 512), +target="llvm", +target_host="llvm -mtriple=aarch64-linux-gnu", +) + +inp = auto_scheduler.measure.MeasureInput(task, task.compute_dag.init_state) +res = auto_scheduler.measure.MeasureResult([0.1], 0, "", 0.2, 1) + +with tempfile.NamedTemporaryFile() as fp: +auto_scheduler.save_records(fp.name, [inp], [res]) + +log_reader = auto_scheduler.RecordReader(fp.name) +inputs, results = log_reader.read_lines() +assert len(inputs) == 1 + +raw_inp = inputs[0] + +recovered_inp = auto_scheduler.measure.recover_measure_input(raw_inp) +assert str(recovered_inp.task.target_host) == str(inp.task.target_host) + + if __name__ == "__main__": test_record_split_reorder_fuse_annotation() test_record_compute_at_root_inline_cache_read_write() @@ -258,3 +283,4 @@ if __name__ == "__main__": test_recover_measure_input() test_measure_local_builder_runner() 
test_measure_local_builder_rpc_runner() +test_measure_target_host()
[GitHub] [tvm] FrozenGene merged pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene merged pull request #7046: URL: https://github.com/apache/tvm/pull/7046
[GitHub] [tvm] antinucleon commented on a change in pull request #7053: [auto_scheduler] buffer support, correctness check
antinucleon commented on a change in pull request #7053: URL: https://github.com/apache/tvm/pull/7053#discussion_r538065121 ## File path: python/tvm/auto_scheduler/auto_schedule.py ## @@ -136,9 +159,14 @@ def __init__( if isinstance(runner, str): if runner == "local": -runner = LocalRunner() +runner = LocalRunner(working_dir=self.temp_working_dir) else: raise ValueError("Invalid runner: " + runner) + +elif isinstance(runner, RPCRunner): +rpc_kwargs = runner.kwargs +rpc_kwargs["working_dir"] = self.temp_working_dir Review comment: https://github.com/apache/tvm/pull/7053/files#diff-5e828f80da7fdc456523468e7f69c1617d8a88867500f68998567fbd6f95a1d7R183 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on a change in pull request #7053: [auto_scheduler] buffer support, correctness check
FrozenGene commented on a change in pull request #7053: URL: https://github.com/apache/tvm/pull/7053#discussion_r538062449 ## File path: python/tvm/auto_scheduler/auto_schedule.py ## @@ -136,9 +159,14 @@ def __init__( if isinstance(runner, str): if runner == "local": -runner = LocalRunner() +runner = LocalRunner(working_dir=self.temp_working_dir) else: raise ValueError("Invalid runner: " + runner) + +elif isinstance(runner, RPCRunner): +rpc_kwargs = runner.kwargs +rpc_kwargs["working_dir"] = self.temp_working_dir Review comment: How this argument be used later? I can not find any logic to handle this parameter... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on a change in pull request #7053: [auto_scheduler] buffer support, correctness check
FrozenGene commented on a change in pull request #7053: URL: https://github.com/apache/tvm/pull/7053#discussion_r538060242 ## File path: python/tvm/auto_scheduler/auto_schedule.py ## @@ -210,6 +253,35 @@ def auto_schedule(task, search_policy=None, tuning_options=TuningOptions()): if search_policy is None: cost_model = XGBModel() search_policy = SketchPolicy(task, cost_model) + +if tuning_options.check_correctness == True: +empty_sch, args = task.compute_dag.apply_steps_from_state( +task.compute_dag.get_init_state(), layout_rewrite=True) +cpu_func = build_module.build( +empty_sch, args, target="llvm", target_host=task.target_host +) +buffer_path = os.path.join(tuning_options.working_dir, "buffer.pkl") +if os.path.exists(buffer_path) is True: +with open(buffer_path, "rb") as fi: +buffer = pickle.load(fi) +if len(buffer) == len(args): +# we skip check each arg shape here +pass +elif len(buffer) == len(args) - 1: +# assume only one output +np_args = np.zeros(size=get_const_tuple(args[-1].shape)).astype(args[-1].dtype) +cpu_args = [v for _, v in buffer.items()] + [ndarray.array(np_args, ctx=tvm.cpu())] +cpu_func(*cpu_args) +### save cpu result +answer = [x.asnumpy() for x in cpu_args] +tuning_options.register_buffer(args[-1].name, answer[-1]) +else: +np_args = [np.random.uniform(-0.1, 0.1, size=get_const_tuple(x.shape)).astype(x.dtype) for x in args] Review comment: We should use `random_fill` function as it help us handle different types. For example, for quantized uint8 dtype, [-0.1, 0.1] of `np.random.uniform(-0.1, 0.1, size=get_const_tuple(x.shape)).astype(x.dtype) for x in args` will be all zeros, which is not we want. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
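For context on the suggestion above, a minimal sketch of filling a measurement buffer with TVM's `random_fill` instead of `np.random.uniform` — the buffer shape and the uint8 dtype are illustrative, and the packed function is only registered when TVM is built with the random contrib package:

```python
import tvm

# Look up the packed function; the second argument allows it to be missing.
random_fill = tvm.get_global_func("tvm.contrib.random.random_fill", True)
assert random_fill is not None, "TVM must be built with USE_RANDOM=ON"

# For a quantized uint8 buffer, uniform values in [-0.1, 0.1] cast to uint8
# would all be zero; random_fill draws values appropriate for the dtype.
arg = tvm.nd.empty((1, 56, 56, 32), dtype="uint8", ctx=tvm.cpu())
random_fill(arg)
```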
[GitHub] [tvm] antinucleon opened a new pull request #7053: [auto_scheduler] buffer support, correctness check
antinucleon opened a new pull request #7053: URL: https://github.com/apache/tvm/pull/7053 This PR enables correctness check for a generated schedule. It is useful for: - Metal / ROCM: For some invalid schedule, driver may skip it instead of return any errors (which will show as an impossble large FLOPS number) - Sparse kernel search Example 1: correctness check. This will generate random buffers for correctness check ``` if train_flag: #measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300) measure_runner = auto_scheduler.RPCRunner("m1", "127.0.0.1", 9190, min_repeat_ms=300, timeout=30, repeat=3) tune_option = auto_scheduler.TuningOptions( num_measure_trials=1500, check_correctness=True, builder_n_parallel=1, runner=measure_runner, measure_callbacks=[auto_scheduler.RecordToFile(log_file)], verbose=2, ) sch, args = auto_scheduler.auto_schedule(task, tuning_options=tune_option) ``` Example 2: Sparse tuning, this will register given buffers for measure. ``` if train_flag: #measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300) measure_runner = auto_scheduler.RPCRunner("m1", "127.0.0.1", 9190, min_repeat_ms=300, timeout=30, repeat=3) tune_option = auto_scheduler.TuningOptions( num_measure_trials=1500, #runner=measure_ctx.runner, check_correctness=False, builder_n_parallel=1, runner=measure_runner, measure_callbacks=[auto_scheduler.RecordToFile(log_file)], verbose=2, ) for k, v in BUFFER.items(): tune_option.register_buffer(k, v) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on a change in pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene commented on a change in pull request #7046: URL: https://github.com/apache/tvm/pull/7046#discussion_r537987159 ## File path: src/auto_scheduler/measure_record.cc ## @@ -183,7 +186,12 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> { reader->Read(hardware_params_node.get()); s = reader->NextArrayItem(); data->hardware_params = ::tvm::auto_scheduler::HardwareParams(hardware_params_node); - ICHECK(!s); + if (s) { +reader->Read(&str_value); +data->target_host = ::tvm::Target(str_value); +s = reader->NextArrayItem(); +ICHECK(!s); Review comment: Move out will break back compatibility This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740321677 @merrymercy @comaniac Please have another look.
[GitHub] [tvm] Laurawly commented on a change in pull request #6839: [ONNX] NMS in ONNX
Laurawly commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537977579 ## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -with ib.if_scope(tid < batch_size): -valid_count[tid] = 0 -with ib.if_scope(tid < batch_size * num_anchors): -i = idxd(tid, num_anchors) -with ib.if_scope( -tvm.tir.all( -data[tid * elem_length + score_index] > score_threshold, -tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0), -) -): -atomic_add_return[0] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count -) -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = data[tid * elem_length + k] -out_indices[tid + k] = tid + k -with ib.else_scope(): -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = -one -out_indices[tid + k] = -one_count - +with ib.new_scope(): +nthread_tx = max_threads +nthread_bx = batch_size // max_threads + 1 +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * max_threads + tx +with ib.if_scope(tid < batch_size): +valid_count[tid] = 0 +i = tid +with ib.for_range(0, num_anchors) as j: +score = data[(i * num_anchors + j) * elem_length + score_index] +with ib.if_scope( +tvm.tir.all( +score > score_threshold, +tvm.tir.any( +id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0 +), +) +): +with ib.for_range(0, elem_length) as k: +out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[ +(i * num_anchors + j) * elem_length + k +] +out_indices[i * num_anchors + valid_count[i]] = j +valid_count[i] += 1 Review comment: I see. So we lose parallelism in `num_anchors` (could be quite large) compared with the original implementation. Are we able to keep the level of parallelism while getting the correct output indices? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] Coder-nlper opened a new issue #7052: c++ run slower than python?
Coder-nlper opened a new issue #7052: URL: https://github.com/apache/tvm/issues/7052 I use the following example, and modify it to load my model. https://github.com/apache/tvm/blob/main/apps/howto_deploy/cpp_deploy.cc I count the inference time of python and C++. c++ code: clock_t startTime, endTime; startTime=clock(); for (int i = 0; i < 36; ++i) { static_cast(x->data)[i] = array[i]; } // set the right input set_input("input_ids", x); // run the code run(); // get the output get_output(0, y); for (int i = 0; i < 36; ++i) { cout << static_cast(y->data)[i] << " "; } endTime = clock(); cout<<(double)(endTime - startTime)/CLOCKS_PER_SEC<
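A common pitfall in such comparisons is that the C++ snippet above times a single first inference with `clock()` (CPU time), so one-time initialization costs are included. On the Python side the usual practice is `time_evaluator`, which warms up and averages steady-state runs; a minimal sketch, assuming `lib` is the module produced by `relay.build(...)` for an LLVM target, and the input name, shape, and dtype are placeholders loosely based on the issue:

```python
import numpy as np
import tvm
from tvm.contrib import graph_runtime

ctx = tvm.cpu(0)
module = graph_runtime.GraphModule(lib["default"](ctx))  # lib assumed from relay.build(...)
module.set_input("input_ids", tvm.nd.array(np.zeros((1, 36), dtype="int64")))

module.run()  # warm-up run, excluded from the measurement below

# Time only the steady-state "run" calls.
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
print("mean inference time: %.3f ms" % (np.mean(ftimer().results) * 1000))
```

Measuring the C++ side the same way (wall-clock timer, warm-up excluded, averaged over many runs) usually closes most of the apparent gap.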
[GitHub] [tvm-vta] dsteger commented on a change in pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on a change in pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#discussion_r537919215 ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: @liangfu Let me know if you have any more questions. Would love to get this merged so I can submit the next PR to support the ultra96 variants. (v1 and v2 have different parts from a tools point of view - v2 is industrial grade). ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: AVNET has board files that can be used as external sources. Would be a good update for the Ultra96 hardware. https://github.com/Avnet/bdf/tree/master/ultra96v2/1.1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] jwfromm commented on pull request #7036: [Relay][Frontend][Onnx] MaxUnpool Operator
jwfromm commented on pull request #7036: URL: https://github.com/apache/tvm/pull/7036#issuecomment-740284637 @masahi that would be nice, although in this case I was just being lazy and not calculating the output shape. I've fixed the test now if you want to take another look.
[GitHub] [tvm] altanh commented on pull request #6798: [Relay][VM] Add support for references.
altanh commented on pull request #6798: URL: https://github.com/apache/tvm/pull/6798#issuecomment-740276633 I think the failing unit tests are actually unsound and rely on DCE for refs, or perhaps they are sound but correctness is definitely not guaranteed by what we currently have. cc @MarisaKirisame, how should we proceed? We might have to block this PR until DCE gets fixed once and for all, or disable all the offending tests.
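For readers following along, the soundness concern is that dead-code elimination must not drop a `RefWrite` whose effect is later observed through a `RefRead`. A minimal sketch of such a program using Relay's reference constructors (purely illustrative; it is not taken from the failing tests):

```python
from tvm import relay

ref = relay.var("ref")
unused = relay.var("unused")

# let ref = RefCreate(1); let unused = RefWrite(ref, 2); RefRead(ref)
# The program should evaluate to 2; a pass that treats the RefWrite as dead
# code and removes it would change the result to 1.
prog = relay.Let(
    ref,
    relay.RefCreate(relay.const(1)),
    relay.Let(unused, relay.RefWrite(ref, relay.const(2)), relay.RefRead(ref)),
)
func = relay.Function([], prog)
print(func)
```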
[tvm] branch main updated (2a2081e -> 5e68e6a)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git.

from 2a2081e [TOPI] GPU scatter_add using atomic (#7044)
add 5e68e6a [DOCS] Document cloudpickle dependency in tutorials (#7049)

No new revisions were added by this update.

Summary of changes:
 docs/install/from_source.rst               | 2 +-
 tutorials/autotvm/tune_conv2d_cuda.py      | 2 +-
 tutorials/autotvm/tune_relay_arm.py        | 2 +-
 tutorials/autotvm/tune_relay_cuda.py       | 2 +-
 tutorials/autotvm/tune_relay_mobile_gpu.py | 2 +-
 tutorials/autotvm/tune_simple_template.py  | 2 +-
 vta/tutorials/autotvm/tune_relay_vta.py    | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)
[GitHub] [tvm] tqchen merged pull request #7049: [DOCS] Document cloudpickle dependency in tutorials
tqchen merged pull request #7049: URL: https://github.com/apache/tvm/pull/7049
[GitHub] [tvm] altanh commented on pull request #6798: [Relay][VM] Add support for references.
altanh commented on pull request #6798: URL: https://github.com/apache/tvm/pull/6798#issuecomment-740251072 I've addressed the ADT tag issue as suggested (using Tuple), and left a TODO comment in DCE. For the record, I tried adding Feature set checking in DCE to error on detecting RefWrite. However, the VM compilation process requires DCE in other places (I think most critically in `InlinePrimitives`), so I couldn't get the test to pass without removing feature checking. I think this raises the criticality of fixing DCE slightly, and we are working on it. re @mbrookhart, I'm not totally sure what you meant by the debug stuff, but references have been in Relay since higher-order AD was introduced. I'm not sure if there is/was an RFC for adding references, although I agree we should probably make one especially as we are working towards supporting stateful stuff going forward. cc @jroesch who might have more thoughts
[GitHub] [tvm] comaniac commented on pull request #7046: [Auto Scheduler] Add target host to measure record
comaniac commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740235434 > Currently, we have 4 versions of log format. They all have backward compatibility. > We can call this PR v0.4. This PR can correctly read v0.3 and v0.2. We do not need to do any additional checks. Yeah that's definitely more flexible. I'm just afraid that it might introduce some confusion, as AutoTVM strictly checks log versions. Maybe we could let it be for now and add the check once the newer log format is no longer backward compatible.
[GitHub] [tvm] merrymercy commented on pull request #7046: [Auto Scheduler] Add target host to measure record
merrymercy commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740232724 Currently, we have 4 versions of log format. They all have backward compatibility. We can call this PR v0.4. v0.4 can correctly read v0.3 and v0.2. We do not need to do any additional checks.
[GitHub] [tvm] merrymercy edited a comment on pull request #7046: [Auto Scheduler] Add target host to measure record
merrymercy edited a comment on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740232724 Currently, we have 4 versions of log format. They all have backward compatibility. We can call this PR v0.4. This PR can correctly read v0.3 and v0.2. We do not need to do any additional checks.
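As a usage note, backward compatibility here means older logs are read unchanged: records written before this change simply omit the trailing target-host entry, and the reader leaves `target_host` undefined for them. A minimal sketch of loading records regardless of version (the log file name is illustrative):

```python
from tvm import auto_scheduler
from tvm.auto_scheduler import measure

# Works for v0.2/v0.3 logs as well as the new v0.4 format.
for inp, res in auto_scheduler.load_records("matmul_tuning.json"):
    task = measure.recover_measure_input(inp).task
    print(task.workload_key, task.target, task.target_host)
```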
[GitHub] [tvm-vta] dsteger commented on pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#issuecomment-740216108 I just force-pushed a change to apply the preset. FYI
[GitHub] [tvm] masahi opened a new pull request #7051: [LLVM] Support atomic for GPU backend (NVPTX, ROCm)
masahi opened a new pull request #7051: URL: https://github.com/apache/tvm/pull/7051
[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537873092 ## File path: python/tvm/topi/cuda/nms.py ## @@ -54,64 +54,66 @@ def atomic_add(x, y): return tvm.tir.call_intrin(y.dtype, "tir.atomic_add", x, y) -def rearrange_indices_out_ir(data, out, valid_box_count): +def rearrange_indices_out_ir(data, output, valid_box_count): """Hybrid routine to rearrange nms output to move all valid entries to top. Parameters -- data : tvm.te.Tensor or numpy NDArray +NMS output. 3-D tensor with shape +[batch_size, num_anchors, 6] or +[batch_size, num_anchors, 5], or 2-D tensor with shape [batch_size, num_anchors]. +one: tvm.tir.const +Constant one with the same dtype as data. + +batch_size: tvm.tir.IntImm or tvm.tir.Var +Batch size. We need to pass it in since hybrid script doesn't support +binding variable to symbolic dim. + +num_anchors: tvm.tir.IntImm or tvm.tir.Var +Number of anchors. Returns --- -stmt : Stmt -The result IR statement. +output : tvm.te.Tensor or numpy NDArray +2-D tensor with shape [batch_size, num_anchors]. + +valid_box_count : tvm.te.Tensor or numpy NDArray +Tensor with shape [batch_size, 1], indicates +the valid number of boxes. """ batch_size = data.shape[0] num_anchors = data.shape[1] ib = tvm.tir.ir_builder.create() + data = ib.buffer_ptr(data) -out = ib.buffer_ptr(out) valid_box_count = ib.buffer_ptr(valid_box_count) - -one_count = tvm.tir.const(1, dtype="int32") -atomic_add_return = ib.allocate( -valid_box_count.dtype, (batch_size,), name="atomic_add_return", scope="local" -) - -max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -tx = te.thread_axis("threadIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -len_inner_for = (batch_size * num_anchors) // nthread_tx + 2 - -idxd = tvm.tir.indexdiv -idxm = tvm.tir.indexmod - -with ib.for_range(0, len_inner_for, name="i") as i: -idx = tx * len_inner_for + i -batch_idx = idxd(idx, num_anchors) -with ib.if_scope(idx < batch_size): -valid_box_count[idx] = 0 -with ib.if_scope(idx < batch_size * num_anchors): -with ib.if_scope(data[idx] >= 0): -atomic_add_return[batch_idx] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_box_count[batch_idx]), -one_count, -) -out[batch_idx * num_anchors + atomic_add_return[batch_idx]] = data[idx] -with ib.if_scope(tvm.tir.any(data[idx] > num_anchors, data[idx] < -num_anchors)): -atomic_add_return[batch_idx] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_box_count[batch_idx]), -one_count, -) -out[batch_idx * num_anchors + atomic_add_return[batch_idx]] = 0 - -with ib.if_scope(idxm(idx, num_anchors) >= valid_box_count[batch_idx]): -out[idx] = -1 Review comment: This implementation of rearrange_indices_out_ir returns an undersized tensor in some case, I think the threading isn't quite right, but i haven't been able to fix. 
## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -with ib.if_scope(tid < batch_size): -valid_count[tid] = 0 -with ib.if_scope(tid < batch_size * num_anchors): -i = idxd(tid, num_anchors) -with ib.if_scope( -tv
[GitHub] [tvm] masahi commented on pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740207942 Thanks @mbrookhart @tkonolige @Laurawly
[tvm] branch main updated: [TOPI] GPU scatter_add using atomic (#7044)
This is an automated email from the ASF dual-hosted git repository. masahi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/tvm.git The following commit(s) were added to refs/heads/main by this push: new 2a2081e [TOPI] GPU scatter_add using atomic (#7044) 2a2081e is described below commit 2a2081e536f26b49506ef38fec820dc196bc6a2f Author: masahi AuthorDate: Tue Dec 8 07:04:34 2020 +0900 [TOPI] GPU scatter_add using atomic (#7044) * use atomic add for faster 1d scatter add * update tests * run black * more pylint fix * remove fp64 bintcount test Co-authored-by: masa --- python/tvm/relay/frontend/pytorch.py | 17 +- python/tvm/topi/cuda/scatter.py | 80 ++- tests/python/frontend/pytorch/test_forward.py | 10 ++-- tests/python/relay/test_op_level3.py | 4 ++ 4 files changed, 102 insertions(+), 9 deletions(-) diff --git a/python/tvm/relay/frontend/pytorch.py b/python/tvm/relay/frontend/pytorch.py index 4f75cf3..d2c52fb 100644 --- a/python/tvm/relay/frontend/pytorch.py +++ b/python/tvm/relay/frontend/pytorch.py @@ -1921,18 +1921,29 @@ class PyTorchOpConverter: def bincount(self, inputs, input_types): data = inputs[0] weights = inputs[1] +input_type = _infer_type(data).checked_type.dtype +if input_type == "int64": +logging.warning( +"Casting an int64 input to int32, since we do not have int64 atomic add" +"needed for bincount yet." +) +data = _op.cast(data, "int32") maximum = _op.max(data) -dim = maximum + _expr.const(1, dtype="int64") +dim = maximum + _expr.const(1, dtype="int32") if weights: weight_type = _infer_type(weights).checked_type out_dtype = weight_type.dtype updates = weights else: -out_dtype = "int64" +out_dtype = "int32" updates = _op.ones_like(data) counts = _op.zeros(_op.reshape(dim, [1]), out_dtype) -return _op.scatter_add(counts, data, updates, axis=0) +out = _op.scatter_add(counts, data, updates, axis=0) +if input_type == "int32": +# Torch always outputs int64 results for bincount +return _op.cast(out, "int64") +return out def scatter_add(self, inputs, input_types): data = inputs[0] diff --git a/python/tvm/topi/cuda/scatter.py b/python/tvm/topi/cuda/scatter.py index 5e03faf..89c5cd2 100644 --- a/python/tvm/topi/cuda/scatter.py +++ b/python/tvm/topi/cuda/scatter.py @@ -19,6 +19,7 @@ import tvm from tvm import te from ..scatter import _verify_scatter_nd_inputs +from .nms import atomic_add def ceil_div(a, b): @@ -470,6 +471,83 @@ def scatter(data, indices, updates, axis=0): return out +def gen_scatter_add_1d_atomic(data, indices, updates, axis, out, _): +"""Generate scatter add ir for 1d inputs, using atomic_add instruction + +Parameters +-- +data : tir.Tensor +The input data to the operator. + +indices : tir.Tensor +The index locations to update. + +updates : tir.Tensor +The values to update. + +axis : int +The axis to scatter on + +out : tir.Tensor +The output tensor. + +Returns +--- +ret : tir +The computational ir. 
+""" +assert axis == 0 +n = data.shape[0] + +ib = tvm.tir.ir_builder.create() + +out_ptr = ib.buffer_ptr(out) +data_ptr = ib.buffer_ptr(data) + +max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) +nthread_tx = max_threads + +with ib.new_scope(): +nthread_bx = ceil_div(n, nthread_tx) +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * nthread_tx + tx +with ib.if_scope(tid < n): +out_ptr[tid] = data_ptr[tid] + +indices_ptr = ib.buffer_ptr(indices) +updates_ptr = ib.buffer_ptr(updates) + +ni = indices.shape[0] + +atomic_add_return = ib.allocate(updates.dtype, (1,), name="atomic_add_return", scope="local") + +with ib.new_scope(): +nthread_bx = ceil_div(ni, nthread_tx) +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * nthread_tx + tx + +with ib.if_scope(tid < ni): +index = indices_ptr[tid] +with ib.if_scope(index < 0): +atomic_add_return[0] = atomic_add( +tvm.tir.call_intrin("handle", "tir.address_of", out_ptr[index + n]), +updates_ptr[tid], +) +
[GitHub] [tvm] masahi merged pull request #7044: [TOPI] GPU scatter_add using atomic
masahi merged pull request #7044: URL: https://github.com/apache/tvm/pull/7044
[GitHub] [tvm] comaniac commented on pull request #7046: [Auto Scheduler] Add target host to measure record
comaniac commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740197227 > @comaniac This change does not break compatibility. It can correctly read all old logs. I don't think we have to update the logs. Ah I didn't notice that we didn't check the log format when reading from file. Should we have that check?
[GitHub] [tvm] merrymercy commented on pull request #7046: [Auto Scheduler] Add target host to measure record
merrymercy commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-740195089 @comaniac This change does not break compatibility. It can correctly read all old logs. I don't think we have to update the logs.
[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537847330 ## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -with ib.if_scope(tid < batch_size): -valid_count[tid] = 0 -with ib.if_scope(tid < batch_size * num_anchors): -i = idxd(tid, num_anchors) -with ib.if_scope( -tvm.tir.all( -data[tid * elem_length + score_index] > score_threshold, -tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0), -) -): -atomic_add_return[0] = atomic_add( -tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count -) -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = data[tid * elem_length + k] -out_indices[tid + k] = tid + k -with ib.else_scope(): -with ib.for_range(0, elem_length) as k: -out[tid * elem_length + k] = -one -out_indices[tid + k] = -one_count - +with ib.new_scope(): +nthread_tx = max_threads +nthread_bx = batch_size // max_threads + 1 +tx = te.thread_axis("threadIdx.x") +bx = te.thread_axis("blockIdx.x") +ib.scope_attr(tx, "thread_extent", nthread_tx) +ib.scope_attr(bx, "thread_extent", nthread_bx) +tid = bx * max_threads + tx +with ib.if_scope(tid < batch_size): +valid_count[tid] = 0 +i = tid +with ib.for_range(0, num_anchors) as j: +score = data[(i * num_anchors + j) * elem_length + score_index] +with ib.if_scope( +tvm.tir.all( +score > score_threshold, +tvm.tir.any( +id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0 +), +) +): +with ib.for_range(0, elem_length) as k: +out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[ +(i * num_anchors + j) * elem_length + k +] +out_indices[i * num_anchors + valid_count[i]] = j +valid_count[i] += 1 Review comment: atomic_add doesn't work with nvptx. That's a headache... 
## File path: python/tvm/topi/cuda/nms.py ## @@ -97,47 +97,44 @@ def get_valid_counts_ir( valid_count = ib.buffer_ptr(valid_count) out = ib.buffer_ptr(out) out_indices = ib.buffer_ptr(out_indices) -atomic_add_return = ib.allocate( -valid_count.dtype, (1,), name="atomic_add_return", scope="local" -) -one_count = tvm.tir.const(1, dtype=valid_count.dtype) one = tvm.tir.const(1, dtype=out.dtype) -score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) +if isinstance(score_threshold, float): +score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold) id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index) score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index) max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads) -nthread_tx = max_threads -nthread_bx = batch_size * num_anchors // max_threads + 1 -tx = te.thread_axis("threadIdx.x") -bx = te.thread_axis("blockIdx.x") -ib.scope_attr(tx, "thread_extent", nthread_tx) -ib.scope_attr(bx, "thread_extent", nthread_bx) -tid = bx * max_threads + tx -idxd = tvm.tir.indexdiv - -# initialize valid_count -
[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on pull request #6839: URL: https://github.com/apache/tvm/pull/6839#issuecomment-740189993 @Laurawly @kevinthesun I have rebased, but I was unable to get it passing tests with Yao's changes. I'm going back through the kernels one by one to see if I can get the faster versions to pass tests before attempting the ONNX integration.
[GitHub] [tvm-vta] dsteger commented on a change in pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on a change in pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#discussion_r537844949 ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: AVNET has board files that can be used as external sources. Would be a good update for the Ultra96 hardware. https://github.com/Avnet/bdf/tree/master/ultra96v2/1.1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm-vta] dsteger commented on a change in pull request #20: Enable Supported Xilinx target ZCU104 with Hardware Preset
dsteger commented on a change in pull request #20: URL: https://github.com/apache/tvm-vta/pull/20#discussion_r537829814 ## File path: hardware/xilinx/scripts/vivado.tcl ## @@ -80,6 +82,11 @@ set store_ip "${ip_path}/vta_store/soln/impl/ip/xilinx_com_hls_store_1_0.zip" # Create custom project create_project -force $proj_name $proj_path -part $device +# Apply board preset if exists +if {$board != "None" && $board_rev != "None"} { + set_property BOARD_PART $board:$board_rev [current_project] Review comment: When you build a hardware design, Vivado lets you specify something called presets based on BOARD_PART. Presets are board-specific configurations related to the hardware, most importantly the DDR configuration. If you look at the hardware design built without a preset you will notice that the DDR defaults to 1600 MHz. If you apply the preset (ZCU104 for this example) the DDR clock will be 2133 MHz. If we want meaningful output products then we should specify this for the boards.
[GitHub] [tvm] tkonolige commented on a change in pull request #7050: Sparse Conv2D for CPU (NCHW)
tkonolige commented on a change in pull request #7050: URL: https://github.com/apache/tvm/pull/7050#discussion_r537807196 ## File path: python/tvm/topi/sparse/csrmm.py ## @@ -121,3 +121,46 @@ def csrmm(a, b, c=None): 2-D with shape [m, n] """ return csrmm_default(a.data, a.indices, a.indptr, b, c) + + +def batch_csrmm(data, indices, indptr, dense, oshape): +# pylint: disable=invalid-name +assert len(data.shape) == 1 and len(indices.shape) == 1 and len(indptr.shape) == 1 \ +and len(dense.shape) == 3, "only support 2-dim csrmm" +assert indptr.dtype == 'int32', f"CSR indptr must be integers, but is {indptr.dtype}" +assert indices.dtype == 'int32', f"CSR indices must be integers, but is {indices.dtype}" + +assert isinstance(dense, te.tensor.Tensor), \ +"dense matrix is assumed to be tvm.te.Tensor, but dense is `%s`" % (type(dense)) + +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +def csrmm_default_ir(data, indices, indptr, dense, out): +"""define ir for csrmm""" +irb = tvm.tir.ir_builder.create() +data_ptr = irb.buffer_ptr(data) +indices_ptr = irb.buffer_ptr(indices) +indptr_ptr = irb.buffer_ptr(indptr) +dense_ptr = irb.buffer_ptr(dense) +out_ptr = irb.buffer_ptr(out) +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +with irb.for_range(0, batches, name='batch') as batch: +with irb.for_range(0, N, for_type="vectorize", name='n') as n: +with irb.for_range(0, M, for_type="parallel", name='row') as row: +dot = irb.allocate('float32', (1,), name='dot', scope='local') +out_ptr[(batch*N*M) + (row*N+n)] = 0. Review comment: ir_builder supports multidimensional access (`out_ptr[batch, row, n]`), which might make this code cleaner. ## File path: python/tvm/topi/sparse/csrmm.py ## @@ -121,3 +121,46 @@ def csrmm(a, b, c=None): 2-D with shape [m, n] """ return csrmm_default(a.data, a.indices, a.indptr, b, c) + + +def batch_csrmm(data, indices, indptr, dense, oshape): +# pylint: disable=invalid-name +assert len(data.shape) == 1 and len(indices.shape) == 1 and len(indptr.shape) == 1 \ +and len(dense.shape) == 3, "only support 2-dim csrmm" +assert indptr.dtype == 'int32', f"CSR indptr must be integers, but is {indptr.dtype}" +assert indices.dtype == 'int32', f"CSR indices must be integers, but is {indices.dtype}" + +assert isinstance(dense, te.tensor.Tensor), \ +"dense matrix is assumed to be tvm.te.Tensor, but dense is `%s`" % (type(dense)) + +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +def csrmm_default_ir(data, indices, indptr, dense, out): +"""define ir for csrmm""" +irb = tvm.tir.ir_builder.create() +data_ptr = irb.buffer_ptr(data) +indices_ptr = irb.buffer_ptr(indices) +indptr_ptr = irb.buffer_ptr(indptr) +dense_ptr = irb.buffer_ptr(dense) +out_ptr = irb.buffer_ptr(out) +M = simplify(indptr.shape[0]-1) +batches, _, N = dense.shape +with irb.for_range(0, batches, name='batch') as batch: +with irb.for_range(0, N, for_type="vectorize", name='n') as n: +with irb.for_range(0, M, for_type="parallel", name='row') as row: +dot = irb.allocate('float32', (1,), name='dot', scope='local') +out_ptr[(batch*N*M) + (row*N+n)] = 0. +dot[0] = 0. 
+row_start = indptr_ptr[row] +row_end = indptr_ptr[row+1] +row_elems = row_end-row_start +with irb.for_range(0, row_elems, name='idx') as idx: +elem = row_start+idx +dot[0] += data_ptr[elem] * dense_ptr[indices_ptr[elem]*N+n] +out_ptr[(batch*N*M) + row*N+n] += dot[0] +return irb.get() +matmul = te.extern(oshape, [data, indices, indptr, dense], + lambda ins, outs: csrmm_default_ir(ins[0], ins[1], ins[2], ins[3], outs[0]), + tag="csrmm", dtype='float32', name='out') Review comment: I think we would like to support more than float32. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
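Building on the reviewer's multidimensional-access suggestion, a minimal self-contained sketch of an `ir_builder` kernel indexing buffers as `ptr[b, r, c]` rather than with hand-computed flat offsets (shapes are illustrative, and the sketch relies on the multidimensional indexing the reviewer mentions):

```python
import tvm
from tvm import te

def copy_3d_ir(src, dst):
    """Copy a 3-D buffer element-wise using multidimensional indexing."""
    ib = tvm.tir.ir_builder.create()
    src_ptr = ib.buffer_ptr(src)
    dst_ptr = ib.buffer_ptr(dst)
    batches, rows, cols = src.shape
    with ib.for_range(0, batches, name="b") as b:
        with ib.for_range(0, rows, name="r") as r:
            with ib.for_range(0, cols, name="c") as c:
                # instead of dst_ptr[b * rows * cols + r * cols + c]
                dst_ptr[b, r, c] = src_ptr[b, r, c]
    return ib.get()

A = te.placeholder((2, 8, 16), name="A", dtype="float32")
B = te.extern(A.shape, [A], lambda ins, outs: copy_3d_ir(ins[0], outs[0]),
              name="B", dtype="float32")
s = te.create_schedule(B.op)
mod = tvm.build(s, [A, B], target="llvm")
```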
[GitHub] [tvm] tkonolige commented on a change in pull request #7050: Sparse Conv2D for CPU (NCHW)
tkonolige commented on a change in pull request #7050: URL: https://github.com/apache/tvm/pull/7050#discussion_r537793211 ## File path: python/tvm/topi/nn/conv2d_sparse.py ## @@ -0,0 +1,261 @@ +import tvm +from tvm import te +from tvm.topi import nn +from tvm.topi.nn.util import get_pad_tuple +from tvm.topi.util import get_const_tuple +from tvm import autotvm +from ..nn.conv2d import conv2d_infer_layout, _get_workload as _get_conv2d_workload +from ..util import get_const_tuple, traverse_inline +from tvm.topi.sparse import batch_csrmm, csrmm_default + +def _fallback_schedule(cfg, wkl): +HPAD, WPAD = wkl.hpad, wkl.wpad +HSTR, WSTR = wkl.hstride, wkl.wstride +out_width = (wkl.width + 2 * WPAD - wkl.wkernel) // WSTR + 1 + +def _get_default_config(cfg, data, kernel, strides, padding, out_dtype, is_depthwise=False, +layout='NCHW'): +""" +Get default schedule config for the workload +""" +static_data_shape = [] +for dim in get_const_tuple(data.shape): +if isinstance(dim, tvm.tir.Var): +static_data_shape.append(1) +else: +static_data_shape.append(dim) +data = te.placeholder(static_data_shape, dtype=data.dtype) +wkl = _get_conv2d_workload(data, kernel, strides, padding, out_dtype, layout) +is_kernel_1x1 = wkl.hkernel == 1 and wkl.wkernel == 1 +_fallback_schedule(cfg, wkl) + +def conv2d_sparse_gemm_nchw(data, w_data, w_indices, w_indptr, +OC, KH, KW, +strides, padding, dilation, +out_dtype='float32'): +"""Compute conv2d by transforming the input, +executing GEMM and not transforming the output back yet""" +batches, IC, IH, IW = get_const_tuple(data.shape) + +K = KH * KW + +if isinstance(dilation, int): +dilation_h = dilation_w = dilation +else: +dilation_h, dilation_w = dilation + +dilated_kernel_h = (KH - 1) * dilation_h + 1 +dilated_kernel_w = (KW - 1) * dilation_w + 1 + +pad_top, pad_left, pad_down, pad_right = \ +get_pad_tuple(padding, (dilated_kernel_h, dilated_kernel_w)) +HSTR, WSTR = strides if isinstance(strides, (tuple, list)) else (strides, strides) + +OH = (IH + pad_top + pad_down - dilated_kernel_h) // HSTR + 1 +OW = (IW + pad_left + pad_right - dilated_kernel_w) // WSTR + 1 + +N = OC +K = KH * KW * IC +M = OH * OW + +if pad_top or pad_left: +data_pad = nn.pad(data, [0, 0, pad_top, pad_left], [0, 0, pad_down, pad_right], + name="data_pad") +else: +data_pad = data + +# --- Im2col + +B_shape = (batches, K, M) +idxmod = tvm.tir.indexmod +idxdiv = tvm.tir.indexdiv +# print(KH, KW, IC, OW, HSTR) + +B = te.compute(B_shape, lambda n, k, m: + data_pad[n, (k // (KH*KW)) % IC, +(k // KH) % KW + ((m // OW) * HSTR), +(k % KW) + ((m % OW) * WSTR)], + name='data_im2col') + + +# --- GEMM: A*B' +# oshape = (batches, N, M) +oshape = (batches, OC, OH, OW) +# B = te.compute((N,M), lambda n, m: +#B[0, n, m], +#name='data_flatten') +C = batch_csrmm(w_data, w_indices, w_indptr, B, oshape) +# C = csrmm_default(w_data, w_indices, w_indptr, B) + + +# placeholder reshape +# k = te.reduce_axis((0, K), 'k') +# C = te.compute( +# oshape, +# lambda b, c, h, w: te.sum(C[b, c, w] * C[b, c, w], axis=k), +# name='C') + +return C + +def csrdc(data, indices, indptr, inputs, oshape, kdim, strides, padding): Review comment: What is csrdc? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] mbrookhart commented on pull request #6978: [Relay][Topi][Dynamic] Add a Sort op and use the legalization pass to perform dynamic topk on GPU
mbrookhart commented on pull request #6978: URL: https://github.com/apache/tvm/pull/6978#issuecomment-740143976 closing in favor of #7018
[GitHub] [tvm] mbrookhart closed pull request #6978: [Relay][Topi][Dynamic] Add a Sort op and use the legalization pass to perform dynamic topk on GPU
mbrookhart closed pull request #6978: URL: https://github.com/apache/tvm/pull/6978
[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on pull request #6839: URL: https://github.com/apache/tvm/pull/6839#issuecomment-740142785 #7005 re-implemented some of the features in this PR; I'll rebase and try to reconcile.
[GitHub] [tvm] altanh commented on pull request #7050: Sparse Conv2D for CPU (NCHW)
altanh commented on pull request #7050: URL: https://github.com/apache/tvm/pull/7050#issuecomment-740137646 cc @tkonolige who has some sparse experience
[GitHub] [tvm] Wheest opened a new pull request #7050: Sparse Conv2D for CPU (NCHW)
Wheest opened a new pull request #7050: URL: https://github.com/apache/tvm/pull/7050 This pull request adds sparse conv2d implementations to CPU for TOPI. I have implemented sparse GEMM convolution, and sparse direct convolution for the NCHW data layout, using the CSR sparse data format. The extension to the C++ runtime is pretty stable. The code for TOPI is not clean or very well integrated yet, but I am looking for some guidance from other developers. [This gist](https://gist.github.com/Wheest/94433f73ff3279669bf35adcc38b321d) has a simple example of running a single layer Conv2D network with sparsity. You can choose what algorithm the Relay strategy uses with the two environment variables:

```
export TVM_DIRECT_CONV=1
export TVM_GEMM_CONV=0
```

Comments on how to improve the integration appreciated. Further pull requests could add other sparse algorithms, and sparse data formats. I am in the process of creating sparse versions for GPU runtimes, but am having some difficulties I am discussing on the [Discuss](https://discuss.tvm.apache.org/t/sparse-opencl-error-scheduling-sparse-computations-that-use-tir-ir-builder/).
[GitHub] [tvm] comaniac commented on a change in pull request #7046: [Auto Scheduler] Add target host to measure record
comaniac commented on a change in pull request #7046: URL: https://github.com/apache/tvm/pull/7046#discussion_r537759346

## File path: src/auto_scheduler/measure_record.cc

@@ -183,7 +186,12 @@ struct Handler<::tvm::auto_scheduler::SearchTaskNode> {
      reader->Read(hardware_params_node.get());
      s = reader->NextArrayItem();
      data->hardware_params = ::tvm::auto_scheduler::HardwareParams(hardware_params_node);
-     ICHECK(!s);
+     if (s) {
+       reader->Read(&str_value);
+       data->target_host = ::tvm::Target(str_value);
+       s = reader->NextArrayItem();
+       ICHECK(!s);

Review comment:
   This check should be out of this if-statement.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] tqchen commented on pull request #6953: Add retry to sockets on EINTR error
tqchen commented on pull request #6953: URL: https://github.com/apache/tvm/pull/6953#issuecomment-740110237 I see, in that case @areusch is right that we might need a PackedFunc callback in the handler This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] masahi commented on pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740103421 @tkonolige Thanks for the pointers. I've only improved scatter_add; I'm afraid I have no idea how to improve scatter. But yeah, I can see that if we can assume some structure on the indices, we can do some parallelization of scatter too. I find this problem interesting, so I'll put scatter improvement in my backlog. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] tkonolige opened a new pull request #7049: [DOCS] Document cloudpickle dependency in tutorials
tkonolige opened a new pull request #7049: URL: https://github.com/apache/tvm/pull/7049 This PR documents the cloudpickle dependency introduced in #6790. @merrymercy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] trevor-m commented on a change in pull request #7026: [BYOC][TRT] Support batch norm for all ranks <=5, and all axes
trevor-m commented on a change in pull request #7026: URL: https://github.com/apache/tvm/pull/7026#discussion_r537729585

## File path: src/runtime/contrib/tensorrt/tensorrt_ops.cc

@@ -386,8 +386,35 @@ class BatchNormOpConverter : public TensorRTOpConverter {
     const int axis = std::stoi(params->node.GetAttr<std::vector<std::string>>("axis")[0]);
     const bool scale = std::stoi(params->node.GetAttr<std::vector<std::string>>("scale")[0]);
     const bool center = std::stoi(params->node.GetAttr<std::vector<std::string>>("center")[0]);
-    ICHECK(axis == 1 || axis == 3);
-    const bool need_transpose = axis == 3;
+    auto input_dims = TrtDimsToVector(input->getDimensions());
+    const size_t min_rank = TRT_HAS_IMPLICIT_BATCH(params) ? 3 : 4;
+    const size_t max_rank = TRT_HAS_IMPLICIT_BATCH(params) ? 4 : 5;
+    ICHECK_LE(input_dims.size(), max_rank);

Review comment:
   Hi @jroesch, thanks for reviewing! These checks are more for sanity checking, since the annotation functions in python/tvm/relay/op/contrib/tensorrt.py will filter out the unsupported ops before they ever get to this code. I don't expect users to ever see these. Anyway, I can make a separate PR to port all of the ICHECK to Diagnostics.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
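The division of labor trevor-m describes (the Python annotation layer filters unsupported ops; the C++ converter only sanity-checks) can be summarized by a small predicate. The sketch below is schematic only, not the actual code in python/tvm/relay/op/contrib/tensorrt.py; the function name and signature are invented, and the rank bounds simply mirror the min_rank/max_rank logic in the quoted diff.

```python
# Schematic of the annotation-side filter: reject batch_norm inputs whose
# TensorRT input rank falls outside the supported range, so the C++ ICHECKs
# above act purely as sanity checks. Name and signature are illustrative.
def batch_norm_rank_supported(trt_input_rank, implicit_batch=True):
    min_rank = 3 if implicit_batch else 4   # same bounds as in tensorrt_ops.cc
    max_rank = 4 if implicit_batch else 5
    return min_rank <= trt_input_rank <= max_rank

# Example: a rank-5 input is only offloaded when TensorRT runs without an
# implicit batch dimension.
assert not batch_norm_rank_supported(5, implicit_batch=True)
assert batch_norm_rank_supported(5, implicit_batch=False)
```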
[GitHub] [tvm] tkonolige commented on a change in pull request #6839: [ONNX] NMS in ONNX
tkonolige commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537725421

## File path: python/tvm/topi/cuda/nms.py

@@ -97,47 +97,44 @@ def get_valid_counts_ir(
     valid_count = ib.buffer_ptr(valid_count)
     out = ib.buffer_ptr(out)
     out_indices = ib.buffer_ptr(out_indices)
-    atomic_add_return = ib.allocate(
-        valid_count.dtype, (1,), name="atomic_add_return", scope="local"
-    )
-    one_count = tvm.tir.const(1, dtype=valid_count.dtype)
     one = tvm.tir.const(1, dtype=out.dtype)
-    score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
+    if isinstance(score_threshold, float):
+        score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
     id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index)
     score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index)
     max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads)
-    nthread_tx = max_threads
-    nthread_bx = batch_size * num_anchors // max_threads + 1
-    tx = te.thread_axis("threadIdx.x")
-    bx = te.thread_axis("blockIdx.x")
-    ib.scope_attr(tx, "thread_extent", nthread_tx)
-    ib.scope_attr(bx, "thread_extent", nthread_bx)
-    tid = bx * max_threads + tx
-    idxd = tvm.tir.indexdiv
-
-    # initialize valid_count
-    with ib.if_scope(tid < batch_size):
-        valid_count[tid] = 0
-    with ib.if_scope(tid < batch_size * num_anchors):
-        i = idxd(tid, num_anchors)
-        with ib.if_scope(
-            tvm.tir.all(
-                data[tid * elem_length + score_index] > score_threshold,
-                tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0),
-            )
-        ):
-            atomic_add_return[0] = atomic_add(
-                tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count
-            )
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = data[tid * elem_length + k]
-                out_indices[tid + k] = tid + k
-        with ib.else_scope():
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = -one
-                out_indices[tid + k] = -one_count
-
+    with ib.new_scope():
+        nthread_tx = max_threads
+        nthread_bx = batch_size // max_threads + 1
+        tx = te.thread_axis("threadIdx.x")
+        bx = te.thread_axis("blockIdx.x")
+        ib.scope_attr(tx, "thread_extent", nthread_tx)
+        ib.scope_attr(bx, "thread_extent", nthread_bx)
+        tid = bx * max_threads + tx
+        with ib.if_scope(tid < batch_size):
+            valid_count[tid] = 0
+            i = tid
+            with ib.for_range(0, num_anchors) as j:
+                score = data[(i * num_anchors + j) * elem_length + score_index]
+                with ib.if_scope(
+                    tvm.tir.all(
+                        score > score_threshold,
+                        tvm.tir.any(
+                            id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0
+                        ),
+                    )
+                ):
+                    with ib.for_range(0, elem_length) as k:
+                        out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[
+                            (i * num_anchors + j) * elem_length + k
+                        ]
+                    out_indices[i * num_anchors + valid_count[i]] = j
+                    valid_count[i] += 1

Review comment:
   Could you use atomic add here?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] tkonolige commented on pull request #7044: [TOPI] GPU scatter_add using atomic
tkonolige commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740090664 Hey @masahi, thanks for this work. I'm wondering if you've looked at a sort then lookup approach to scatter (some references: https://www.cse.ust.hk/catalac/papers/scatter_sc07.pdf, https://developer.nvidia.com/gpugems/gpugems2/part-iv-general-purpose-computation-gpus-primer/chapter-32-taking-plunge-gpu)? You also might want to look at `scatter_nd` in the codebase. It is a generalization of scatter to arbitrary dimensions. For 1D its performance probably won't be great, so maybe you could improve it too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
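As a rough illustration of the sort-then-lookup idea tkonolige mentions (independent of the referenced papers and of the TOPI code), a 1D scatter_add can be computed by sorting the indices, reducing each run of equal indices once, and issuing a single write per unique destination. The sketch below is a minimal NumPy reference; the function name is invented.

```python
# Minimal NumPy sketch of sort-based scatter_add: sort by destination index,
# reduce each run of equal indices, then write once per unique destination.
# Illustration of the approach only, not the TOPI implementation.
import numpy as np

def scatter_add_sorted(base, indices, updates):
    out = base.copy()
    order = np.argsort(indices, kind="stable")     # 1. sort by destination
    sorted_idx = indices[order]
    sorted_upd = updates[order]
    uniq, starts = np.unique(sorted_idx, return_index=True)
    sums = np.add.reduceat(sorted_upd, starts)     # 2. reduce each run
    out[uniq] += sums                              # 3. one write per index
    return out

base = np.zeros(8, dtype=np.float32)
idx = np.array([3, 1, 3, 7, 1], dtype=np.int64)
upd = np.array([1.0, 2.0, 3.0, 4.0, 5.0], dtype=np.float32)
ref = base.copy()
np.add.at(ref, idx, upd)                           # reference scatter_add
assert np.allclose(scatter_add_sorted(base, idx, upd), ref)
```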
[GitHub] [tvm] masahi commented on pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on pull request #7044: URL: https://github.com/apache/tvm/pull/7044#issuecomment-740084249

> What kind of perf improvement do you see with this?

Well, the comparison would be a bit embarrassing, since the current one is the worst gpu kernel ever :) Below is an excerpt from the nvprof log. This result is obtained on my crappy laptop gpu, but the difference is still significant (5 ms vs 20 us).

Current one
```
Duration    Grid Size   Block Size   Name
5.5576ms    (1 1 1)     (1 1 1)      fused_scatter_add_1_kernel1
```

New one in this PR
```
Duration    Grid Size   Block Size   Name
22.176us    (10 1 1)    (1024 1 1)   fused_scatter_add_1_kernel1
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] masahi commented on a change in pull request #7044: [TOPI] GPU scatter_add using atomic
masahi commented on a change in pull request #7044: URL: https://github.com/apache/tvm/pull/7044#discussion_r537707428

## File path: tests/python/frontend/pytorch/test_forward.py

@@ -3355,12 +3355,12 @@ def test_bincount():
     def test_fn(x, weights=None):
         return torch.bincount(x, weights=weights)

-    inp = torch.randint(0, 8, (5,), dtype=torch.int64)
-    weights = torch.linspace(0, 1, steps=5)
+    inp = torch.randint(0, 100, (1,), dtype=torch.int64)
+    weights = torch.linspace(0, 100, steps=1)

-    verify_trace_model(test_fn, [inp], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights.to(torch.float64)], ["llvm"])

Review comment:
   No. For some reason, CUDA on CI fails to compile fp64 atomic add: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-7044/3/pipeline/ I don't have this problem locally.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX
mbrookhart commented on a change in pull request #6839: URL: https://github.com/apache/tvm/pull/6839#discussion_r537667569

## File path: python/tvm/topi/cuda/nms.py

@@ -97,47 +97,44 @@ def get_valid_counts_ir(
     valid_count = ib.buffer_ptr(valid_count)
     out = ib.buffer_ptr(out)
     out_indices = ib.buffer_ptr(out_indices)
-    atomic_add_return = ib.allocate(
-        valid_count.dtype, (1,), name="atomic_add_return", scope="local"
-    )
-    one_count = tvm.tir.const(1, dtype=valid_count.dtype)
     one = tvm.tir.const(1, dtype=out.dtype)
-    score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
+    if isinstance(score_threshold, float):
+        score_threshold = tvm.ir.make_node("FloatImm", dtype="float32", value=score_threshold)
    id_index = tvm.ir.make_node("IntImm", dtype="int32", value=id_index)
    score_index = tvm.ir.make_node("IntImm", dtype="int32", value=score_index)
    max_threads = int(tvm.target.Target.current(allow_none=False).max_num_threads)
-    nthread_tx = max_threads
-    nthread_bx = batch_size * num_anchors // max_threads + 1
-    tx = te.thread_axis("threadIdx.x")
-    bx = te.thread_axis("blockIdx.x")
-    ib.scope_attr(tx, "thread_extent", nthread_tx)
-    ib.scope_attr(bx, "thread_extent", nthread_bx)
-    tid = bx * max_threads + tx
-    idxd = tvm.tir.indexdiv
-
-    # initialize valid_count
-    with ib.if_scope(tid < batch_size):
-        valid_count[tid] = 0
-    with ib.if_scope(tid < batch_size * num_anchors):
-        i = idxd(tid, num_anchors)
-        with ib.if_scope(
-            tvm.tir.all(
-                data[tid * elem_length + score_index] > score_threshold,
-                tvm.tir.any(id_index < 0, data[tid * elem_length + id_index] >= 0),
-            )
-        ):
-            atomic_add_return[0] = atomic_add(
-                tvm.tir.call_intrin("handle", "tir.address_of", valid_count[i]), one_count
-            )
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = data[tid * elem_length + k]
-                out_indices[tid + k] = tid + k
-        with ib.else_scope():
-            with ib.for_range(0, elem_length) as k:
-                out[tid * elem_length + k] = -one
-                out_indices[tid + k] = -one_count
-
+    with ib.new_scope():
+        nthread_tx = max_threads
+        nthread_bx = batch_size // max_threads + 1
+        tx = te.thread_axis("threadIdx.x")
+        bx = te.thread_axis("blockIdx.x")
+        ib.scope_attr(tx, "thread_extent", nthread_tx)
+        ib.scope_attr(bx, "thread_extent", nthread_bx)
+        tid = bx * max_threads + tx
+        with ib.if_scope(tid < batch_size):
+            valid_count[tid] = 0
+            i = tid
+            with ib.for_range(0, num_anchors) as j:
+                score = data[(i * num_anchors + j) * elem_length + score_index]
+                with ib.if_scope(
+                    tvm.tir.all(
+                        score > score_threshold,
+                        tvm.tir.any(
+                            id_index < 0, data[(i * num_anchors + j) * elem_length + id_index] >= 0
+                        ),
+                    )
+                ):
+                    with ib.for_range(0, elem_length) as k:
+                        out[(i * num_anchors + valid_count[i]) * elem_length + k] = data[
+                            (i * num_anchors + j) * elem_length + k
+                        ]
+                    out_indices[i * num_anchors + valid_count[i]] = j
+                    valid_count[i] += 1

Review comment:
   There is definitely not a data race now, because I removed the threading :smile: But I think I see your point, this might be why I couldn't pass the test with threading on. I will investigate.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] ANSHUMAN87 opened a new pull request #7048: [Frontend][TFLite] Densify Op added
ANSHUMAN87 opened a new pull request #7048: URL: https://github.com/apache/tvm/pull/7048 The Densify op performs a sparse-to-dense transformation for sparse weights, based on the sparsity parameters provided. This op is needed for sparse ConvNet models. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
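For intuition, densifying compressed weights means reconstructing the full tensor from its sparse encoding. The sketch below shows the idea for a plain CSR matrix using SciPy; TFLite's sparsity parameters (per-dimension dense/sparse segments and block maps) are richer than CSR, so this is only an analogy, not the frontend code in this PR.

```python
# Illustrative only: "densify" a CSR-encoded weight matrix back to dense form.
# TFLite's sparsity format is more general than CSR; this just shows the
# sparse-to-dense idea behind the Densify op.
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([5.0, 3.0, 2.0], dtype=np.float32)   # non-zero values
indices = np.array([0, 2, 1])                         # column of each value
indptr = np.array([0, 2, 2, 3])                       # row start offsets
sparse_w = csr_matrix((data, indices, indptr), shape=(3, 4))

dense_w = sparse_w.toarray()                          # the "densify" step
print(dense_w)
# [[5. 0. 3. 0.]
#  [0. 0. 0. 0.]
#  [0. 2. 0. 0.]]
```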
[GitHub] [tvm] mbrookhart commented on a change in pull request #7044: [TOPI] GPU scatter_add using atomic
mbrookhart commented on a change in pull request #7044: URL: https://github.com/apache/tvm/pull/7044#discussion_r537647724

## File path: tests/python/frontend/pytorch/test_forward.py

@@ -3355,12 +3355,12 @@ def test_bincount():
     def test_fn(x, weights=None):
         return torch.bincount(x, weights=weights)

-    inp = torch.randint(0, 8, (5,), dtype=torch.int64)
-    weights = torch.linspace(0, 1, steps=5)
+    inp = torch.randint(0, 100, (1,), dtype=torch.int64)
+    weights = torch.linspace(0, 100, steps=1)

-    verify_trace_model(test_fn, [inp], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights], ["llvm"])
-    verify_trace_model(test_fn, [inp, weights.to(torch.float64)], ["llvm"])

Review comment:
   I assume you removed this because the atomic add isn't precise enough for float64?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] zhiics commented on issue #7047: is-there-a-horizontal-fusion-demo
zhiics commented on issue #7047: URL: https://github.com/apache/tvm/issues/7047#issuecomment-740003810 Thanks. Let's just use the Discuss thread to continue the discussion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] zhiics closed issue #7047: is-there-a-horizontal-fusion-demo
zhiics closed issue #7047: URL: https://github.com/apache/tvm/issues/7047 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[tvm-site] branch asf-site updated: Build at Mon Dec 7 10:45:56 EST 2020
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/tvm-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 1cf4e15 Build at Mon Dec 7 10:45:56 EST 2020 1cf4e15 is described below commit 1cf4e15e63e714bd6340334268a179ac837fc5ce Author: tqchen AuthorDate: Mon Dec 7 10:45:57 2020 -0500 Build at Mon Dec 7 10:45:56 EST 2020 --- atom.xml | 2 +- feed.xml | 2 +- rss.xml | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/atom.xml b/atom.xml index 1e1edc7..9c061c8 100644 --- a/atom.xml +++ b/atom.xml @@ -4,7 +4,7 @@ TVM https://tvm.apache.org"; rel="self"/> https://tvm.apache.org"/> - 2020-11-25T14:54:37-05:00 + 2020-12-07T10:45:44-05:00 https://tvm.apache.org diff --git a/feed.xml b/feed.xml index 4aadebb..64f5387 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,4 @@ -http://www.w3.org/2005/Atom"; >https://jekyllrb.com/"; version="4.1.1">Jekyll2020-11-25T14:54:37-05:00/feed.xmlTVM{"name"=>nil}Bring Your Own Datatypes: Enabling Custom Datatype [...] +http://www.w3.org/2005/Atom"; >https://jekyllrb.com/"; version="4.1.1">Jekyll2020-12-07T10:45:44-05:00/feed.xmlTVM{"name"=>nil}Bring Your Own Datatypes: Enabling Custom Datatype [...]Introduction
diff --git a/rss.xml b/rss.xml index 12029fb..f3dee7f 100644 --- a/rss.xml +++ b/rss.xml @@ -5,8 +5,8 @@ TVM - https://tvm.apache.org https://tvm.apache.org"; rel="self" type="application/rss+xml" /> -Wed, 25 Nov 2020 14:54:37 -0500 -Wed, 25 Nov 2020 14:54:37 -0500 +Mon, 07 Dec 2020 10:45:44 -0500 +Mon, 07 Dec 2020 10:45:44 -0500 60
[GitHub] [tvm] zhxfl commented on issue #7047: is-there-a-horizontal-fusion-demo
zhxfl commented on issue #7047: URL: https://github.com/apache/tvm/issues/7047#issuecomment-739870925 Paper: https://arxiv.org/pdf/2007.01277.pdf This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] zhxfl commented on issue #7047: is-there-a-horizontal-fusion-demo
zhxfl commented on issue #7047: URL: https://github.com/apache/tvm/issues/7047#issuecomment-739868539 It is very important for small networks, when the number of blocks is smaller than the number of SMs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene commented on pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene commented on pull request #7046: URL: https://github.com/apache/tvm/pull/7046#issuecomment-739853283 @merrymercy @comaniac @jcf94 @minminsun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tvm] FrozenGene opened a new pull request #7046: [Auto Scheduler] Add target host to measure record
FrozenGene opened a new pull request #7046: URL: https://github.com/apache/tvm/pull/7046 We don't append the target host to the measure record serialization / deserialization currently, which works well when the host is an x86 machine and the target is simply an ARM CPU. However, it causes problems for more complex deployments. Say we want to cross-compile on an x86 server machine for an ARM machine's Mali GPU: the target host is the ARM CPU, but the target is the Mali GPU. So we should add the target host to the measure record if `target_host` is not `nullptr`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
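To make the deployment scenario concrete, a search task for the case described above might be constructed roughly as follows. This is a hedged sketch, not code from the PR: the workload function is a placeholder registered for this example, and the exact Mali target string depends on the device.

```python
# Sketch of the cross-compilation scenario: tuning for a Mali GPU whose host
# CPU is 64-bit ARM, while the tuning itself runs on an x86 server. The
# workload function is a placeholder; register your own as needed.
import tvm
from tvm import auto_scheduler, te

@auto_scheduler.register_workload
def matmul(n):
    A = te.placeholder((n, n), name="A")
    B = te.placeholder((n, n), name="B")
    k = te.reduce_axis((0, n), name="k")
    C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

task = auto_scheduler.SearchTask(
    func=matmul,
    args=(512,),
    target="opencl -device=mali",                   # device we generate code for
    target_host="llvm -mtriple=aarch64-linux-gnu",  # CPU that drives it
)
# With this change, records saved for such a task also keep the target host,
# so they can be reloaded and compiled correctly on a different machine.
```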
[GitHub] [tvm] hzfan opened a new pull request #7045: [Arith] Simplify cast
hzfan opened a new pull request #7045: URL: https://github.com/apache/tvm/pull/7045 Follow up of #6691 Simplify `cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2)` to `2` by first transforming to `cast(i32, c * 2) + cast(i32, 1) + 1 - cast(i32, c * 2)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
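A quick way to observe the effect of such a rewrite is to feed the expression to the public arithmetic Analyzer. The snippet below is not from the PR; whether this particular expression folds all the way to 2 depends on the simplifier rules that land with it.

```python
# Sketch: exercise a cast simplification through tvm.arith.Analyzer.simplify.
# Not taken from the PR; the fold to 2 relies on the rewrite it introduces.
import tvm
from tvm import te

ana = tvm.arith.Analyzer()
c = te.var("c", dtype="int64")
expr = (c * 2 + 1).astype("int32") + 1 - (c * 2).astype("int32")
print(ana.simplify(expr))   # expected: 2 once the new rewrite applies
```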
[GitHub] [tvm] masahi opened a new pull request #7044: [TOPI] GPU scatter_add using atomic
masahi opened a new pull request #7044: URL: https://github.com/apache/tvm/pull/7044 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org