[GitHub] [tvm] masahi merged pull request #7074: Fix QNN type inference

2020-12-10 Thread GitBox


masahi merged pull request #7074:
URL: https://github.com/apache/tvm/pull/7074


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[tvm] branch main updated (fcead9f -> ffb6029)

2020-12-10 Thread masahi
This is an automated email from the ASF dual-hosted git repository.

masahi pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from fcead9f  [Relay] Support deformable Conv2D NHWC (#7075)
 add ffb6029  Fix QNN type inference (#7074)

No new revisions were added by this update.

Summary of changes:
 src/relay/qnn/op/concatenate.cc   | 36 ++-
 src/relay/qnn/op/convolution.cc   | 13 +++---
 src/relay/qnn/op/convolution_transpose.cc | 11 +++--
 src/relay/qnn/op/dense.cc | 15 ---
 src/relay/qnn/op/op_common.h  | 13 ++
 src/relay/qnn/op/requantize.cc|  7 ++
 tests/python/frontend/pytorch/qnn_test.py | 41 +++
 7 files changed, 121 insertions(+), 15 deletions(-)



[GitHub] [tvm] masahi commented on pull request #7074: Fix QNN type inference

2020-12-10 Thread GitBox


masahi commented on pull request #7074:
URL: https://github.com/apache/tvm/pull/7074#issuecomment-742364417


   Thanks @mbrookhart @jwfromm 







[GitHub] [tvm] masahi closed issue #7067: [QNN] [BYOC] MergeComposite pass breaks on QNN graph

2020-12-10 Thread GitBox


masahi closed issue #7067:
URL: https://github.com/apache/tvm/issues/7067


   







[tvm] branch main updated (ffb6029 -> f7e13d7)

2020-12-10 Thread lmzheng
This is an automated email from the ASF dual-hosted git repository.

lmzheng pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from ffb6029  Fix QNN type inference (#7074)
 add ffb6029 is superseded here; this update adds f7e13d7  [AutoTVM] Compatibility improvement with XGBoost v1.3.0 (#7076)

No new revisions were added by this update.

Summary of changes:
 python/tvm/autotvm/tuner/xgboost_cost_model.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)



[GitHub] [tvm] merrymercy merged pull request #7076: [AutoTVM] Compatibility improvement with XGBoost v1.3.0

2020-12-10 Thread GitBox


merrymercy merged pull request #7076:
URL: https://github.com/apache/tvm/pull/7076


   







[GitHub] [tvm] fprotopapa commented on a change in pull request #7059: #7058 [Tutorial] Import errors in deploy_detection.py and deploy_classification.py

2020-12-10 Thread GitBox


fprotopapa commented on a change in pull request #7059:
URL: https://github.com/apache/tvm/pull/7059#discussion_r540069099



##
File path: vta/tutorials/frontend/legacy/deploy_detection.py
##
@@ -58,7 +58,7 @@
 from tvm import rpc, autotvm, relay
 from tvm.relay.testing import yolo_detection, darknet
 from tvm.relay.testing.darknet import __darknetffi__
-from tvm.contrib import graph_runtime, graph_runtime, util
+from tvm.contrib import graph_runtime, graph_runtime, utils

Review comment:
   Duplicated import is removed.
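   For reference, a minimal sketch of how the de-duplicated import would presumably 
read after this change (assuming `graph_runtime` only needs to be imported once):

   ```
# Hypothetical cleaned-up form: duplicate module dropped, deprecated `util` replaced by `utils`.
from tvm.contrib import graph_runtime, utils
   ```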









[GitHub] [tvm] yongwww commented on issue #4102: [Relay] Support for Tensorflow 2.0

2020-12-10 Thread GitBox


yongwww commented on issue #4102:
URL: https://github.com/apache/tvm/issues/4102#issuecomment-742463914


   @srkreddy1238 yeah, I agree that the GraphDef protocol buffer definition has only 
changed a little, so creating a new parser would end up duplicating work 
(copying from the existing parser). I tried to run the test cases in your shared 
branch and found that the existing parser works for most of the TF 2.x ops 
(previously I assumed we would have to modify most of the mappings in the parser).
   
   I agree that we can just update the existing parser for TF 2.x and remove the 
unnecessary parts once we stop supporting 1.x.
   
   Please rebase your work; it would be good to have it work on top of TF 2.4 
(the latest version is 2.4.0rc4). We can work on variable- and tensorlist-related 
ops at the same time.
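   For anyone wanting to experiment, a rough sketch of feeding a TF 2.x model to the 
existing GraphDef-based parser (the model, input name and shape below are purely 
illustrative):

   ```
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)
from tvm import relay

# Build any TF 2.x model and freeze it into a GraphDef, which is what the
# existing TF 1.x-style parser consumes.
model = tf.keras.applications.MobileNetV2(weights=None)
func = tf.function(model).get_concrete_function(
    tf.TensorSpec([1, 224, 224, 3], tf.float32)
)
frozen_func = convert_variables_to_constants_v2(func)
graph_def = frozen_func.graph.as_graph_def()

# Hand the frozen graph to the existing parser; "input_1" is an assumed input name.
mod, params = relay.frontend.from_tensorflow(graph_def, shape={"input_1": (1, 224, 224, 3)})
   ```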







[GitHub] [tvm] yongwww edited a comment on issue #4102: [Relay] Support for Tensorflow 2.0

2020-12-10 Thread GitBox


yongwww edited a comment on issue #4102:
URL: https://github.com/apache/tvm/issues/4102#issuecomment-742463914


   @srkreddy1238 yeah, I agree that the GraphDef protocol buffer definition has only 
changed a little, so creating a new parser would end up duplicating work 
(copying from the existing parser). I tried to run the test cases in your shared 
branch and found that the existing parser works for most of the TF 2.x ops 
(previously I assumed we would have to modify most of the mappings in the parser). We 
can just update the existing parser for TF 2.x and remove the unnecessary parts 
once we stop supporting 1.x.
   
   Please rebase your work; it would be good to have it work on top of TF 2.4 
(the latest version is 2.4.0rc4). We can work on variable- and tensorlist-related 
ops at the same time.







[GitHub] [tvm] u99127 commented on pull request #7048: [Frontend][TFLite] Densify Op added

2020-12-10 Thread GitBox


u99127 commented on pull request #7048:
URL: https://github.com/apache/tvm/pull/7048#issuecomment-742526188


   > 
   > 
   > > Thanks @ANSHUMAN87. Overall looks pretty good, but I am a little 
confused at exactly what is happening in this pass. Is it converting matrices 
from sparse to dense at compile time?
   > 
   > Thanks @tkonolige for the review! The Densify op does not need to be present 
in the TVM runtime currently, because its purpose is to convert sparse 
weights (constants) to dense form, so I have optimized it out. However, in case it 
is required in the future to be present in the TVM runtime, we can always do it with 
the sparse_to_dense op using the indices produced from the utility 
prepare_dense_matrix_from_sparse().
   > 
   Sorry about the delay in reviewing.
   
   I think there are a couple of design points here that we should consider so we can 
take a well-informed decision on this direction.
   
   1. The advantage of keeping the sparse weight representation all the way 
through is that it keeps the weight size low in the final generated binary, which 
could help with static size / footprint at the expense of some runtime cost 
(obviously). For use cases where we go through this with BYOC, or others where 
weight compression techniques exist, this should clear up reasonably well; 
however, where we fall back to traditional TVM compilation on CPUs and GPUs there 
are no weight compression techniques, and thus this contributes to rodata or 
flash bloat.
   
   2. Even if it doesn't make sense to keep this in the final representation to 
be expanded at runtime, either as an op for the Relay VM or a new op in the 
graph runtime - is this operator something that appears in other frameworks, and 
if so, does it make sense to keep this behaviour common across the stack so that 
we can reuse the logic for other frontends as well? I.e. the various frontends 
lower to a relay.op.densify() whose result represents a dense weight tensor, and 
further legalization could end up creating this dense weight tensor. One of the 
principles in compiler design is to bind things as late as possible, to give us a 
chance of optimizing and being able to recover easily without too much reverse 
engineering. So this comes back to that as well.
   
   TBH I would probably prefer #2 as an approach, to try and reduce any 
duplication in the frontends and give other compilers, or others who use Relay, 
a chance to get to the densify op. I don't think we can use it for any of our 
work instantly. If performance or code size becomes an issue we'll need to 
reconsider a runtime or partial-runtime representation of sparse weights, at 
which point #1 would be a stepping stone towards it.
   
   These are my thoughts on the design.
   
   regards
   Ramana
   
   > As to your other comments, I have tried to add comments in the necessary 
places. Please have a look and let me know in case anything more is required. 
TIA!
   
   
   
   







[GitHub] [tvm] mbaret commented on pull request #6655: [BYOC] Added "include_non_call_ops" parameter to AnnotateTarget pass

2020-12-10 Thread GitBox


mbaret commented on pull request #6655:
URL: https://github.com/apache/tvm/pull/6655#issuecomment-742539127


   I don't agree that this should be separate, because the current behaviour of 
AnnotateTarget is simply incorrect. ACL does not support tuples, so 
AnnotateTarget should not mark them as supported. We shouldn't need a 'fix-up' 
pass after AnnotateTarget to make it correct; it's just a bug in AnnotateTarget 
that has come about from faulty reasoning about how codegens work. As this is 
user-facing and a critical error, I think the priority is accepting a fix now, 
with a refactor to reduce complexity coming later.
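   For context, a rough sketch of the usual BYOC partitioning sequence in which 
AnnotateTarget sits (the `arm_compute_lib` target name and the `mod` module are just 
placeholders):

   ```
from tvm import relay, transform

# Typical BYOC partitioning flow: AnnotateTarget decides which operators
# (and, per this discussion, whether tuples) get marked for the external codegen.
seq = transform.Sequential(
    [
        relay.transform.AnnotateTarget("arm_compute_lib"),
        relay.transform.MergeCompilerRegions(),
        relay.transform.PartitionGraph(),
    ]
)
partitioned_mod = seq(mod)  # `mod` is an existing relay.IRModule
   ```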







[GitHub] [tvm] corehalt opened a new pull request #7080: Add individual test for qnn and byoc

2020-12-10 Thread GitBox


corehalt opened a new pull request #7080:
URL: https://github.com/apache/tvm/pull/7080


   Restored the previously patched `qnn_test.py`
   Added a separate test for qnn graph and byoc flow.
   @masahi @mbrookhart 







[GitHub] [tvm] giuseros commented on a change in pull request #7009: [TFLite] Bugfix - ensure pad calcalution to be in int32

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7009:
URL: https://github.com/apache/tvm/pull/7009#discussion_r540240975



##
File path: python/tvm/relay/frontend/tflite.py
##
@@ -3210,7 +3210,7 @@ def get_pad_value(data, kernel, stride):
 """
 
 out = int(math.ceil(float(data) / float(stride)))
-pad = max(0, (out - 1) * stride + kernel - data)
+pad = max(0, (out - 1) * int(stride) + int(kernel) - int(data))

Review comment:
   Also, do we know why this fails with `int64`? AFAIU from your RFC: 
https://discuss.tvm.apache.org/t/int64-vs-int32-dtype-error/8555 this used to 
work before. 
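   For reference, a small numeric sketch of the padding formula being patched here 
(the values are picked only for illustration):

   ```
import math

def get_pad_value(data, kernel, stride):
    # Mirrors the patched logic: cast to Python ints so the result stays
    # int32-friendly even if the frontend hands us numpy int64 scalars.
    out = int(math.ceil(float(data) / float(stride)))
    return max(0, (out - 1) * int(stride) + int(kernel) - int(data))

# A 224-wide input with a 3-wide kernel and stride 2 needs 1 pixel of padding in total.
assert get_pad_value(224, 3, 2) == 1
   ```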









[GitHub] [tvm] masahi commented on a change in pull request #7080: Add a standalone test for qnn and byoc

2020-12-10 Thread GitBox


masahi commented on a change in pull request #7080:
URL: https://github.com/apache/tvm/pull/7080#discussion_r540280053



##
File path: tests/python/frontend/pytorch/test_qnn_byoc_flow.py
##
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+""" Tests on QNN + BYOC """
+import torch
+
+import tvm
+import tvm.testing
+
+from tvm import relay
+from tvm.relay.dataflow_pattern import wildcard, is_op
+from tvm.relay.op.contrib.register import register_pattern_table
+from tvm.relay.op.contrib.register import get_pattern_table
+
+
+def make_qnn_add_pattern():
+    lhs = wildcard()
+    rhs = wildcard()
+    lhs_scale = wildcard()
+    lhs_zero_point = wildcard()
+    rhs_scale = wildcard()
+    rhs_zero_point = wildcard()
+    output_scale = wildcard()
+    output_zero_point = wildcard()
+    qadd = is_op("qnn.add")(
+        lhs,
+        rhs,
+        lhs_scale,
+        lhs_zero_point,
+        rhs_scale,
+        rhs_zero_point,
+        output_scale,
+        output_zero_point,
+    )
+    return qadd.optional(is_op("clip"))
+
+
+@register_pattern_table("test_table")
+def pattern_table():
+    return [
+        ("qnn_add", make_qnn_add_pattern()),
+    ]
+
+
+def run_byoc_flow(script_module, input_name, ishape):
+    input_shapes = [(input_name, ishape)]
+    mod, params = relay.frontend.from_pytorch(script_module, input_shapes)
+    pattern_table = get_pattern_table("test_table")
+    with tvm.transform.PassContext(opt_level=3):
+        pass_list = [
+            tvm.relay.transform.SimplifyInference(),
+            tvm.relay.transform.MergeComposite(pattern_table),

Review comment:
   Remove the passes after MergeComposite below; they are not relevant to 
https://github.com/apache/tvm/issues/7067









[GitHub] [tvm] masahi commented on pull request #7080: Add a standalone test for qnn and byoc

2020-12-10 Thread GitBox


masahi commented on pull request #7080:
URL: https://github.com/apache/tvm/pull/7080#issuecomment-742608360


   @corehalt I don't think we need to introduce a new file; please add your 
change to qnn_test.py







[GitHub] [tvm] ANSHUMAN87 commented on pull request #7048: [Frontend][TFLite] Densify Op added

2020-12-10 Thread GitBox


ANSHUMAN87 commented on pull request #7048:
URL: https://github.com/apache/tvm/pull/7048#issuecomment-742610846


   > > 
   > 
   > Sorry about the delay in reviewing .
   > 
   > I think there are a couple of design points here that we should consider 
and take a well-informed decision on this direction.
   > 
   > 1. The advantage of keeping the sparse weights representation all the 
way through is something that helps with keeping the weights size low in the 
final binary generated which could help with static size / footprint in the 
final binary at the expense of some runtime (obviously). For usecases where we 
go through this with BYOC or others where there are weight compression 
techniques, this should clear up reasonably ok, however where we fall back to 
traditional tvm compilation on CPUs and GPUs there's no weight compression 
techniques and thus this contributes to rodata bloat or flash bloat.
   > 
   > 2. Even if it doesn't make sense to keep this in the final 
representation to be expanded at the runtime either in the form of an op for 
the Relay VM or a new op in the graph runtime - is this operator something that 
appears in other frameworks and if so does it make sense to keep this behaviour 
common across the stack to allow us to reuse this logic for other frontends as 
well i.e. the various frontends lower to a relay.op.densify () and the result 
of that is something that represents a dense weight tensor. Further 
legalization could end up creating this dense weight tensor ? One of the 
principles in compiler design is to try and bind things as late as possible to 
give us a chance of optimizing and being able to recover easily without too 
much reverse engineering. So this comes back to that as well.
   > 
   > 
   > TBH I would probably prefer #2 as an approach to try and reduce any 
duplications in the frontends and give other compilers or others who use Relay 
a chance to get to the densify op. I don't think we can use it for any of our 
work instantly. If the performance or code size becomes an issue we'll need to 
reconsider a runtime or partial runtime representation of sparse weights at 
which point #1 would be a stepping stone towards it .
   > 
   > These are my thought on the design.
   > 
   > regards
   > Ramana
   
   Thanks @u99127 for your inputs! I concur with most of your arguments.
   I think we have two possible approaches for bringing sparse ConvNet models onto 
TVM:
   1. Keep the sparse weights and convert them to dense at runtime, as sparse 
inference is not complete in TVM currently.
   2. Optimize out the sparse weights by transforming them to dense while 
on-boarding to TVM.
   
   When I went through the first approach I found the performance was too bad, 
and the Sparse_to_Dense op caused stack corruption when fed higher-dimensional 
inputs, so I switched to the second approach, which makes sense from the TFLite 
perspective: the TFLite framework optimizes out the Densify op when it comes to 
sparse inference. As my future work will include sparse inference as well, I 
think at this point it suffices to keep the densified weights at runtime.
   
   However, as I mentioned earlier, if we want to keep it in the TVM runtime, the 
current PR changes support that as well, with the steps I mentioned in my 
previous comment, provided we first fix the Sparse_to_Dense limitations (which I 
am looking into right now). I think keeping a Densify op is not a good option, as 
it would duplicate the Sparse_to_Dense op.
   
   Maybe we can make this op conversion a configurable user option in the 
frontend. By default we optimize out this op and keep the dense weights at 
runtime; when I am ready with an appropriate solution for the Sparse_to_Dense op, 
we can make that the default and leave the first approach as a user choice.
   Let me know your thoughts on this. TIA!
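   For illustration, a minimal sketch of the compile-time densify idea using scipy 
(this is not the actual frontend helper, just the concept of materializing a dense 
constant during import):

   ```
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative stand-in for a sparsely encoded TFLite weight tensor (CSR layout).
indptr = np.array([0, 1, 2])
indices = np.array([2, 0])
values = np.array([0.5, -1.25], dtype="float32")
sparse_weight = csr_matrix((values, indices, indptr), shape=(2, 4))

# "Densify at import time": materialize the dense tensor once, so the runtime
# only ever sees an ordinary constant and no Densify/sparse_to_dense op.
dense_weight = sparse_weight.toarray()
   ```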
   
   







[GitHub] [tvm] ANSHUMAN87 edited a comment on pull request #7048: [Frontend][TFLite] Densify Op added

2020-12-10 Thread GitBox


ANSHUMAN87 edited a comment on pull request #7048:
URL: https://github.com/apache/tvm/pull/7048#issuecomment-742610846


   > > 
   > 
   > Sorry about the delay in reviewing .
   > 
   > I think there are a couple of design points here that we should consider 
and take a well-informed decision on this direction.
   > 
   > 1. The advantage of keeping the sparse weights representation all the 
way through is something that helps with keeping the weights size low in the 
final binary generated which could help with static size / footprint in the 
final binary at the expense of some runtime (obviously). For usecases where we 
go through this with BYOC or others where there are weight compression 
techniques, this should clear up reasonably ok, however where we fall back to 
traditional tvm compilation on CPUs and GPUs there's no weight compression 
techniques and thus this contributes to rodata bloat or flash bloat.
   > 
   > 2. Even if it doesn't make sense to keep this in the final 
representation to be expanded at the runtime either in the form of an op for 
the Relay VM or a new op in the graph runtime - is this operator something that 
appears in other frameworks and if so does it make sense to keep this behaviour 
common across the stack to allow us to reuse this logic for other frontends as 
well i.e. the various frontends lower to a relay.op.densify () and the result 
of that is something that represents a dense weight tensor. Further 
legalization could end up creating this dense weight tensor ? One of the 
principles in compiler design is to try and bind things as late as possible to 
give us a chance of optimizing and being able to recover easily without too 
much reverse engineering. So this comes back to that as well.
   > 
   > 
   > TBH I would probably prefer #2 as an approach to try and reduce any 
duplications in the frontends and give other compilers or others who use Relay 
a chance to get to the densify op. I don't think we can use it for any of our 
work instantly. If the performance or code size becomes an issue we'll need to 
reconsider a runtime or partial runtime representation of sparse weights at 
which point #1 would be a stepping stone towards it .
   > 
   > These are my thought on the design.
   > 
   > regards
   > Ramana
   
   Thanks @u99127 for your inputs! I concur with most of your arguments.
   I think we have two possible approaches for bringing sparse ConvNet models onto 
TVM:
   1. Keep the sparse weights and convert them to dense at runtime, as sparse 
inference is not complete in TVM currently.
   2. Optimize out the sparse weights by transforming them to dense while 
on-boarding to TVM.
   
   When I went through the first approach I found the performance was too bad, 
and the Sparse_to_Dense op caused stack corruption when fed higher-dimensional 
inputs, so I switched to the second approach, which makes sense from the TFLite 
perspective: the TFLite framework optimizes out the Densify op when it comes to 
sparse inference. As my future work will include sparse inference as well, I 
think at this point it suffices to keep the densified weights at runtime.
   
   However, as I mentioned earlier, if we want to keep it in the TVM runtime, the 
current PR changes support that as well, with the steps I mentioned in my 
previous comment, provided we first fix the Sparse_to_Dense limitations (which I 
am looking into right now). I think keeping a Densify op is not a good option, as 
it would duplicate the Sparse_to_Dense op.
   
   Maybe we can make this op conversion a configurable user option in the 
frontend. By default we optimize out this op and keep the dense weights at 
runtime; when I am ready with an appropriate solution for the Sparse_to_Dense op, 
we can make that the default and leave the first approach as a user choice.
   Let me know your thoughts on this. TIA!
   
   







[GitHub] [tvm] leandron commented on a change in pull request #7055: Fix shape init for tvmc onnx frontend

2020-12-10 Thread GitBox


leandron commented on a change in pull request #7055:
URL: https://github.com/apache/tvm/pull/7055#discussion_r540287151



##
File path: python/tvm/driver/tvmc/frontends.py
##
@@ -161,14 +161,11 @@ def load(self, path):
         # pylint: disable=E1101
         model = onnx.load(path)
 
+        shape_dict = {}
         # pylint: disable=E1101
-        name = model.graph.input[0].name
-
-        # pylint: disable=E1101
-        proto_shape = model.graph.input[0].type.tensor_type.shape.dim
-        shape = [d.dim_value for d in proto_shape]
-
-        shape_dict = {name: shape}
+        for i in model.graph.input:
+            # pylint: disable=E1101
+            shape_dict[i.name] = [dim.dim_value for dim in i.type.tensor_type.shape.dim]

Review comment:
   My understanding is that we all agree on two things here:
   1) moving the current logic to `from_onnx`, to be done in the current PR 
(@dlexplorer)
   2) a new feature to include `--input-shape` (in `tvmc compile` and `tvmc 
tune`, I think), to be submitted in a separate PR (@lixiaoquan)
   









[GitHub] [tvm] ANSHUMAN87 edited a comment on pull request #7048: [Frontend][TFLite] Densify Op added

2020-12-10 Thread GitBox


ANSHUMAN87 edited a comment on pull request #7048:
URL: https://github.com/apache/tvm/pull/7048#issuecomment-742610846


   > > 
   > 
   > Sorry about the delay in reviewing .
   > 
   > I think there are a couple of design points here that we should consider 
and take a well-informed decision on this direction.
   > 
   > 1. The advantage of keeping the sparse weights representation all the 
way through is something that helps with keeping the weights size low in the 
final binary generated which could help with static size / footprint in the 
final binary at the expense of some runtime (obviously). For usecases where we 
go through this with BYOC or others where there are weight compression 
techniques, this should clear up reasonably ok, however where we fall back to 
traditional tvm compilation on CPUs and GPUs there's no weight compression 
techniques and thus this contributes to rodata bloat or flash bloat.
   > 
   > 2. Even if it doesn't make sense to keep this in the final 
representation to be expanded at the runtime either in the form of an op for 
the Relay VM or a new op in the graph runtime - is this operator something that 
appears in other frameworks and if so does it make sense to keep this behaviour 
common across the stack to allow us to reuse this logic for other frontends as 
well i.e. the various frontends lower to a relay.op.densify () and the result 
of that is something that represents a dense weight tensor. Further 
legalization could end up creating this dense weight tensor ? One of the 
principles in compiler design is to try and bind things as late as possible to 
give us a chance of optimizing and being able to recover easily without too 
much reverse engineering. So this comes back to that as well.
   > 
   > 
   > TBH I would probably prefer #2 as an approach to try and reduce any 
duplications in the frontends and give other compilers or others who use Relay 
a chance to get to the densify op. I don't think we can use it for any of our 
work instantly. If the performance or code size becomes an issue we'll need to 
reconsider a runtime or partial runtime representation of sparse weights at 
which point #1 would be a stepping stone towards it .
   > 
   > These are my thought on the design.
   > 
   > regards
   > Ramana
   
   Thanks @u99127 for your inputs! I concur with most of your arguments.
   I think we have two possible approaches for bringing sparse ConvNet models onto 
TVM:
   1. Keep the sparse weights and convert them to dense at runtime, as sparse 
inference is not complete in TVM currently.
   2. Optimize out the sparse weights by transforming them to dense while 
on-boarding to TVM.
   
   When I went through the first approach I found the performance was too bad, 
and the Sparse_to_Dense op caused stack corruption when fed higher-dimensional 
inputs, so I switched to the second approach, which makes sense from the TFLite 
perspective: the TFLite framework optimizes out the Densify op when it comes to 
sparse inference. As my future work will include sparse inference as well, I 
think at this point it suffices to keep the densified weights at runtime.
   
   However, as I mentioned earlier, if we want to keep it in the TVM runtime, the 
current PR changes support that as well, with the steps I mentioned in my 
previous comment, provided we first fix the Sparse_to_Dense limitations (which I 
am looking into right now). I think keeping a Densify op is not a good option, as 
it would duplicate the Sparse_to_Dense op.
   
   Maybe we can make this op conversion a configurable user option in the 
frontend. By default we optimize out this op and keep the dense weights at 
runtime; when I am ready with an appropriate solution for the Sparse_to_Dense op, 
we can make that the default and leave the first approach as a user choice, so 
that the user can choose between performance and compression.
   Let me know your thoughts on this. TIA!
   
   







[GitHub] [tvm] leandron commented on a change in pull request #7055: Fix shape init for tvmc onnx frontend

2020-12-10 Thread GitBox


leandron commented on a change in pull request #7055:
URL: https://github.com/apache/tvm/pull/7055#discussion_r540287151



##
File path: python/tvm/driver/tvmc/frontends.py
##
@@ -161,14 +161,11 @@ def load(self, path):
         # pylint: disable=E1101
         model = onnx.load(path)
 
+        shape_dict = {}
         # pylint: disable=E1101
-        name = model.graph.input[0].name
-
-        # pylint: disable=E1101
-        proto_shape = model.graph.input[0].type.tensor_type.shape.dim
-        shape = [d.dim_value for d in proto_shape]
-
-        shape_dict = {name: shape}
+        for i in model.graph.input:
+            # pylint: disable=E1101
+            shape_dict[i.name] = [dim.dim_value for dim in i.type.tensor_type.shape.dim]

Review comment:
   My understanding is that we all agree on two things here:
   1) moving the current logic to `from_onnx`, to be done in the current PR 
(@dlexplorer)
   2) a new feature to include `--input-shape` (in `tvmc compile` and `tvmc 
tune`, I think), to be submitted in a separate PR (@lixiaoquan)
   
   Is that correct?
   









[GitHub] [tvm] dlexplorer commented on a change in pull request #7055: Fix shape init for tvmc onnx frontend

2020-12-10 Thread GitBox


dlexplorer commented on a change in pull request #7055:
URL: https://github.com/apache/tvm/pull/7055#discussion_r540288874



##
File path: python/tvm/driver/tvmc/frontends.py
##
@@ -161,14 +161,11 @@ def load(self, path):
         # pylint: disable=E1101
         model = onnx.load(path)
 
+        shape_dict = {}
         # pylint: disable=E1101
-        name = model.graph.input[0].name
-
-        # pylint: disable=E1101
-        proto_shape = model.graph.input[0].type.tensor_type.shape.dim
-        shape = [d.dim_value for d in proto_shape]
-
-        shape_dict = {name: shape}
+        for i in model.graph.input:
+            # pylint: disable=E1101
+            shape_dict[i.name] = [dim.dim_value for dim in i.type.tensor_type.shape.dim]

Review comment:
   I understood the same. Sorry, I'm a little busy these days and cannot modify the 
PR quickly. I hope to get to it later today.









[tvm] branch main updated (f7e13d7 -> 7a3278a)

2020-12-10 Thread comaniac
This is an automated email from the ASF dual-hosted git repository.

comaniac pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from f7e13d7  [AutoTVM] Compatibility improvement with XGBoost v1.3.0 (#7076)
 add 7a3278a  [Relay][Strategy] Allow cuda cross compilation without physical device. (#7063)

No new revisions were added by this update.

Summary of changes:
 python/tvm/contrib/nvcc.py   | 25 ++---
 python/tvm/relay/op/strategy/cuda.py | 10 +-
 2 files changed, 27 insertions(+), 8 deletions(-)
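
A minimal sketch of what cross-compiling for CUDA without a local GPU might look like 
after this change (the `sm_80` architecture string and the `mod`/`params` variables are 
placeholders):

```
import tvm
from tvm import relay

# With an explicit -arch, the CUDA strategy no longer needs a physical device
# present at compile time to query the compute capability.
target = tvm.target.Target("cuda -arch=sm_80")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)  # mod/params from any frontend
```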



[GitHub] [tvm] comaniac merged pull request #7063: [Relay][Strategy] Allow cuda cross compilation without physical device.

2020-12-10 Thread GitBox


comaniac merged pull request #7063:
URL: https://github.com/apache/tvm/pull/7063


   







[GitHub] [tvm] comaniac commented on pull request #7063: [Relay][Strategy] Allow cuda cross compilation without physical device.

2020-12-10 Thread GitBox


comaniac commented on pull request #7063:
URL: https://github.com/apache/tvm/pull/7063#issuecomment-742638947


   Thanks @jwfromm @anwang2009.







[GitHub] [tvm] vegaluisjose opened a new pull request #7081: [VTA] update 3rdparty submodule

2020-12-10 Thread GitBox


vegaluisjose opened a new pull request #7081:
URL: https://github.com/apache/tvm/pull/7081


   @tmoreau89 







[GitHub] [tvm] leandron commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


leandron commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540331693



##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")

Review comment:
   I agree that adding this feature to `tune`, rather than as a new 
`autoschedule` subcommand, is a good idea, and it simplifies the message for the 
end-user.
   
   In this case, I think we will need a way to enable this autoschedule mode on 
`tvmc tune`. We don't have any other flag-type option on TVMC yet, so it is a 
good opportunity to see which format we want.
   
   Thinking about the name, I think `--enable-[thing]` (e.g. 
`--enable-auto-scheduler`) is clear and generic enough to be extended to other 
flags in the future. Looking into `clang` options, for example, I couldn't find a 
specific standard they use for flag-type options - any thoughts?
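   For illustration, a flag-type option in argparse is as simple as the sketch below 
(the flag spelling follows the proposal here and is still up for debate):

   ```
import argparse

parser = argparse.ArgumentParser(prog="tvmc tune")
# A flag-type option: present -> True, absent -> False (it takes no value).
parser.add_argument(
    "--enable-auto-scheduler",
    action="store_true",
    help="enable tuning the model with the auto-scheduler",
)
args = parser.parse_args(["--enable-auto-scheduler"])
assert args.enable_auto_scheduler is True
   ```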









[GitHub] [tvm] comaniac commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


comaniac commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540342698



##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")

Review comment:
   `--enable-auto-scheduler` looks good to me, or we could use 
`--use-auto-scheduler`, which aligns with the configuration name used in the 
PassContext.
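   For reference, a rough sketch of where that configuration name shows up when 
compiling with auto-scheduler records (it assumes a tuning log `records.json` and a 
`mod`/`params` pair from a frontend):

   ```
import tvm
from tvm import auto_scheduler, relay

# Replaying auto-scheduler records: the PassContext config key is the name
# being referred to above.
with auto_scheduler.ApplyHistoryBest("records.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target="llvm", params=params)
   ```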









[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742658875


   @Laurawly I took the mxnet example you provided and ran it with the debug 
runtime. It required a little bit of editing, since the APIs have changed slightly 
since that tutorial was written. Anyway, this is what I get on my 1070 Ti with 
Thrust enabled.
   
   main:
   ```
   Ops                                Time(us)   Time(%)   Shape
   ---------------------------------  ---------  --------  --------------
   fused_vision_non_max_suppression   139329.0   74.66     (1, 122640, 6)
   fused_vision_get_valid_counts      124.255    0.067     (1, 122640, 6)
   ```
   this PR:
   ```
   fused_vision_get_valid_counts      46138.3    50.891    (1, 122640, 6)
   fused_vision_non_max_suppression   12319.8    13.589    (1, 122640, 6)
   ```
   
   The get_valid_counts function slows down, but I'm actually seeing the total 
runtime of these ops decrease from 139.3ms to 58.5ms.
   
   My modifications to the example can be found here: 
https://gist.github.com/mbrookhart/df25427cbbfb3c73ed16be72c8525610
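   For anyone wanting to reproduce the per-op numbers, a rough sketch of driving the 
debug graph runtime (the `graph_json`, `lib` and `params` variables are assumed to come 
from an earlier relay.build call):

   ```
import tvm
from tvm.contrib.debugger import debug_runtime

# The debug runtime prints a per-operator time/percentage breakdown on run().
m = debug_runtime.create(graph_json, lib, tvm.gpu(0))
m.set_input(**params)
m.run()
   ```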







[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540411800



##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")

Review comment:
   I went for `--enable-autoscheduler` (without the dash). This means that 
I can get rid of a lot of the `common` code. Very good!









[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540412183



##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")
+parser.set_defaults(func=drive_autoschedule)
+add_tuning_options(parser)
+
+parser.add_argument(
+"--cache-line-bytes",
+default=64,
+help="the size of cache line in bytes",
+)
+parser.add_argument(
+"--num-cores",
+default=4,
+help="the number of device cores",
+)
+parser.add_argument(
+"--vector-unit-bytes",
+default=16,
+help="the width of vector units in bytes",
+)

Review comment:
   I used two groups actually, to differentiate between autotuner and 
autoschedule options









[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540412352



##
File path: python/tvm/driver/tvmc/common.py
##
@@ -36,6 +36,93 @@ class TVMCException(Exception):
 """TVMC Exception"""
 
 
+def add_tuning_options(parser):

Review comment:
   Got rid of this









[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540412352



##
File path: python/tvm/driver/tvmc/common.py
##
@@ -36,6 +36,93 @@ class TVMCException(Exception):
 """TVMC Exception"""
 
 
+def add_tuning_options(parser):

Review comment:
   File has been deleted

##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")
+parser.set_defaults(func=drive_autoschedule)
+add_tuning_options(parser)
+
+parser.add_argument(
+"--cache-line-bytes",
+default=64,
+help="the size of cache line in bytes",
+)
+parser.add_argument(
+"--num-cores",
+default=4,
+help="the number of device cores",
+)
+parser.add_argument(
+"--vector-unit-bytes",
+default=16,
+help="the width of vector units in bytes",
+)
+parser.add_argument(
+"--model-format",
+choices=frontends.get_frontend_names(),
+help="specify input model format",
+)
+
+
+def drive_autoschedule(args):
+"""Invoke auto-scheduling with command line arguments
+
+Parameters
+--
+args: argparse.Namespace
+Arguments from command line parser.
+"""
+
+# extra arguments validation before importing the model, so that obvious 
errors
+# are pointed in advance.
+if args.rpc_tracker:
+parsed_url = urlparse("//%s" % args.rpc_tracker)
+rpc_hostname = parsed_url.hostname
+rpc_port = parsed_url.port or 9090
+logger.info("RPC tracker hostname: %s", rpc_hostname)
+logger.info("RPC tracker port: %s", rpc_port)
+
+if not args.rpc_key:
+raise common.TVMCException(
+"need to provide an RPC tracker key (--rpc-key) for remote 
tuning"
+)
+
+target = common.target_from_cli(args.target)
+mod, params = frontends.load_model(args.FILE, args.model_format)
+
+# min_repeat_ms should be:
+# a. the value provided by the user, if any, or
+# b. 0ms in case target is "cpu"; otherwise 1000ms
+if args.min_repeat_ms is not None:
+min_repeat_ms = args.min_repeat_ms
+else:
+min_repeat_ms = 0 if target.keys[0] == "cpu" else 1000
+logger.debug("Default --min-repeat-ms for this target is %s", 
min_repeat_ms)

Review comment:
   File has been deleted









[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540412904



##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")
+parser.set_defaults(func=drive_autoschedule)
+add_tuning_options(parser)
+
+parser.add_argument(
+"--cache-line-bytes",
+default=64,
+help="the size of cache line in bytes",
+)
+parser.add_argument(
+"--num-cores",
+default=4,
+help="the number of device cores",
+)
+parser.add_argument(
+"--vector-unit-bytes",
+default=16,
+help="the width of vector units in bytes",
+)
+parser.add_argument(
+"--model-format",
+choices=frontends.get_frontend_names(),
+help="specify input model format",
+)
+
+
+def drive_autoschedule(args):
+"""Invoke auto-scheduling with command line arguments
+
+Parameters
+--
+args: argparse.Namespace
+Arguments from command line parser.
+"""
+
+# extra arguments validation before importing the model, so that obvious 
errors
+# are pointed in advance.
+if args.rpc_tracker:
+parsed_url = urlparse("//%s" % args.rpc_tracker)
+rpc_hostname = parsed_url.hostname
+rpc_port = parsed_url.port or 9090
+logger.info("RPC tracker hostname: %s", rpc_hostname)
+logger.info("RPC tracker port: %s", rpc_port)
+
+if not args.rpc_key:
+raise common.TVMCException(
+"need to provide an RPC tracker key (--rpc-key) for remote 
tuning"
+)
+
+target = common.target_from_cli(args.target)
+mod, params = frontends.load_model(args.FILE, args.model_format)
+
+# min_repeat_ms should be:
+# a. the value provided by the user, if any, or
+# b. 0ms in case target is "cpu"; otherwise 1000ms
+if args.min_repeat_ms is not None:
+min_repeat_ms = args.min_repeat_ms
+else:
+min_repeat_ms = 0 if target.keys[0] == "cpu" else 1000
+logger.debug("Default --min-repeat-ms for this target is %s", 
min_repeat_ms)
+
+if args.rpc_tracker:
+
+runner = auto_scheduler.RPCRunner(
+key=args.rpc_key,
+host=rpc_hostname,
+port=rpc_port,
+number=args.number,
+repeat=args.repeat,
+n_parallel=args.parallel,
+timeout=args.timeout,
+min_repeat_ms=min_repeat_ms,
+)
+else:
+logger.info("starting localhost tuning")
+runner = auto_scheduler.LocalRunner(
+number=args.number,
+repeat=args.repeat,
+timeout=args.timeout,
+min_repeat_ms=min_repeat_ms,
+)
+
+# Create the autoscheduler tuning options
+tuning_options = auto_scheduler.TuningOptions(
+num_measure_trials=args.trials,
+measure_callbacks=[auto_scheduler.RecordToFile(args.output)],
+runner=runner,
+builder="local",
+early_stopping=args.early_stopping,
+)
+
+# Specify hardware parameters
+hardware_params = HardwareParams(
+args.num_cores, args.vector_unit_bytes, args.cache_line_bytes, None, 
None, None, None, None
+)
+
+# Extract the tasks from the model
+tasks, weights = get_tuning_tasks(
+mod, params, target, target_host, args.desired_layout, hardware_params
+)
+
+# Schedule the tasks (i.e., produce a schedule for each task)
+schedule_tasks(
+tasks,
+weights,
+tuning_options,

[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540413155



##
File path: python/tvm/driver/tvmc/autoscheduler.py
##
@@ -0,0 +1,212 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Provides support to auto-tuning networks using AutoScheduler.
+"""
+import logging
+
+from urllib.parse import urlparse
+
+from tvm import auto_scheduler
+from tvm.auto_scheduler.auto_schedule import HardwareParams
+
+from . import common, frontends
+from .common import add_tuning_options
+from .main import register_parser
+
+
+# pylint: disable=invalid-name
+logger = logging.getLogger("TVMC")
+
+
+@register_parser
+def add_autoscheduler_parser(subparsers):
+""" Include parser for 'autoschedule' subcommand """
+parser = subparsers.add_parser("autoschedule", help="auto-schedule a 
model")
+parser.set_defaults(func=drive_autoschedule)
+add_tuning_options(parser)
+
+parser.add_argument(
+"--cache-line-bytes",
+default=64,
+help="the size of cache line in bytes",
+)
+parser.add_argument(
+"--num-cores",
+default=4,
+help="the number of device cores",
+)
+parser.add_argument(
+"--vector-unit-bytes",
+default=16,
+help="the width of vector units in bytes",
+)
+parser.add_argument(
+"--model-format",
+choices=frontends.get_frontend_names(),
+help="specify input model format",
+)
+
+
+def drive_autoschedule(args):
+"""Invoke auto-scheduling with command line arguments
+
+Parameters
+--
+args: argparse.Namespace
+Arguments from command line parser.
+"""
+
+# extra arguments validation before importing the model, so that obvious 
errors
+# are pointed in advance.
+if args.rpc_tracker:
+parsed_url = urlparse("//%s" % args.rpc_tracker)
+rpc_hostname = parsed_url.hostname
+rpc_port = parsed_url.port or 9090
+logger.info("RPC tracker hostname: %s", rpc_hostname)
+logger.info("RPC tracker port: %s", rpc_port)
+
+if not args.rpc_key:
+raise common.TVMCException(
+"need to provide an RPC tracker key (--rpc-key) for remote 
tuning"
+)
+
+target = common.target_from_cli(args.target)
+mod, params = frontends.load_model(args.FILE, args.model_format)
+
+# min_repeat_ms should be:
+# a. the value provided by the user, if any, or
+# b. 0ms in case target is "cpu"; otherwise 1000ms
+if args.min_repeat_ms is not None:
+min_repeat_ms = args.min_repeat_ms
+else:
+min_repeat_ms = 0 if target.keys[0] == "cpu" else 1000
+logger.debug("Default --min-repeat-ms for this target is %s", 
min_repeat_ms)
+
+if args.rpc_tracker:
+
+runner = auto_scheduler.RPCRunner(
+key=args.rpc_key,
+host=rpc_hostname,
+port=rpc_port,
+number=args.number,
+repeat=args.repeat,
+n_parallel=args.parallel,
+timeout=args.timeout,
+min_repeat_ms=min_repeat_ms,
+)
+else:
+logger.info("starting localhost tuning")
+runner = auto_scheduler.LocalRunner(
+number=args.number,
+repeat=args.repeat,
+timeout=args.timeout,
+min_repeat_ms=min_repeat_ms,
+)
+
+# Create the autoscheduler tuning options
+tuning_options = auto_scheduler.TuningOptions(
+num_measure_trials=args.trials,
+measure_callbacks=[auto_scheduler.RecordToFile(args.output)],
+runner=runner,
+builder="local",
+early_stopping=args.early_stopping,
+)
+
+# Specify hardware parameters
+hardware_params = HardwareParams(
+args.num_cores, args.vector_unit_bytes, args.cache_line_bytes, None, 
None, None, None, None
+)
+
+# Extract the tasks from the model
+tasks, weights = get_tuning_tasks(
+mod, params, target, target_host, args.desired_layout, hardware_params
+)
+
+# Schedule the tasks (i.e., produce a schedule for each task)
+schedule_tasks(
+tasks,
+weights,
+tuning_options,

[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540413465



##
File path: python/tvm/driver/tvmc/compiler.py
##
@@ -82,6 +82,11 @@ def add_compile_parser(subparsers):
 help="path to an auto-tuning log file by AutoTVM. If not presented, "
 "the fallback/tophub configs will be used",
 )
+parser.add_argument(
+"--use-autoscheduler",
+action="store_true",
+help="use the autoscheduler to generate the compute schedules",
+)

Review comment:
   Very nice suggestion, thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540413857



##
File path: tests/python/driver/tvmc/test_autoscheduler.py
##
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import json
+import pytest
+import os
+import tarfile
+
+from os import path
+
+from tvm import auto_scheduler
+from tvm.driver import tvmc
+
+
+def _get_tasks(model):
+mod, params = tvmc.frontends.load_model(model)
+tasks, weights = tvmc.autoscheduler.get_tuning_tasks(mod, params, "llvm")
+return (tasks, weights)
+
+
+def _autoscheduler_test_helper(

Review comment:
   I removed this file. But now the autoscheduler is untested. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] Laurawly commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742728283


   > @Laurawly I took the mxnet example you provided and ran it with the debug 
runtime. It required a little bit of editing, APIs have changed slightly since 
that tutorial was written. Anyway, this is what I get on my 1070 TI with Thrust 
enabled.
   > 
   > main:
   > 
   > ```
   > Ops                                Time(us)   Time(%)  Shape
   > ---------------------------------  ---------  -------  --------------
   > fused_vision_non_max_suppression   139329.0   74.66    (1, 122640, 6)
   > fused_vision_get_valid_counts      124.255    0.067    (1, 122640, 6)
   > ```
   > 
   > this PR:
   > 
   > ```
   > fused_vision_get_valid_counts      46138.3    50.891   (1, 122640, 6)
   > fused_vision_non_max_suppression   12319.8    13.589   (1, 122640, 6)
   > ```
   > 
   > The get valid counts function slow down, but I'm actually seeing the total 
runtime of these ops decrease from 139.3ms to 58.5ms
   > 
   > My modifications to the example can be found here: 
https://gist.github.com/mbrookhart/df25427cbbfb3c73ed16be72c8525610
   
   The time measurement is not as fast as before because of this PR: 
https://github.com/apache/tvm/pull/7005. If you revert that one, you should get 
fairly good performance without any effect on the correctness.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] giuseros commented on pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#issuecomment-742730719


   Hi @comaniac, @leandron,
   Thanks for your comments, the code is now much more sound. Let me sum up the 
issues that are left:
   * It's now hard to make the autotuner and autoscheduler options mutually 
exclusive. If I specify `--tuner` together with `--enable-autoscheduler`, the 
tuner I selected won't be considered. This is because we are using default 
values, so a tuner will always be defined (see the sketch after this list).
   * Extracting tasks. @comaniac, you flagged the function, but this is 
actually how we do it for the autotuner. I guess this is because we want to 
unit-test the function.
   * Testing: I removed the `test_autoscheduler.py` test. When I wrote it, I 
was inspired by `test_autotuner.py`. Can we agree on a set of tests we want to 
execute? @comaniac, I agree we should not test the autotuner itself, but only 
the user interface. But I fear that to test the user interface, we will need to 
test a bit of the functionality as well. What do you guys think?
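   For the first point, one possible approach (purely illustrative, not part of 
this PR) is to check whether `--tuner` was actually typed on the command line, 
since argparse otherwise always fills in the default:

```python
# Minimal sketch: reject --tuner only when the user passed it explicitly,
# because argparse always provides the default value otherwise.
import argparse
import sys


def parse_tuning_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-autoscheduler", action="store_true")
    parser.add_argument("--tuner", default="xgb")
    args = parser.parse_args(argv)
    # True only if --tuner (or --tuner=...) appeared on the command line.
    tuner_given = any(arg.split("=")[0] == "--tuner" for arg in argv)
    if args.enable_autoscheduler and tuner_given:
        parser.error("--tuner is ignored when --enable-autoscheduler is set")
    return args


if __name__ == "__main__":
    parse_tuning_args(sys.argv[1:])
```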



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] Laurawly edited a comment on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly edited a comment on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742728283


   > @Laurawly I took the mxnet example you provided and ran it with the debug 
runtime. It required a little bit of editing, APIs have changed slightly since 
that tutorial was written. Anyway, this is what I get on my 1070 TI with Thrust 
enabled.
   > 
   > main:
   > 
   > ```
   > Ops                                Time(us)   Time(%)  Shape
   > ---------------------------------  ---------  -------  --------------
   > fused_vision_non_max_suppression   139329.0   74.66    (1, 122640, 6)
   > fused_vision_get_valid_counts      124.255    0.067    (1, 122640, 6)
   > ```
   > 
   > this PR:
   > 
   > ```
   > fused_vision_get_valid_counts      46138.3    50.891   (1, 122640, 6)
   > fused_vision_non_max_suppression   12319.8    13.589   (1, 122640, 6)
   > ```
   > 
   > The get valid counts function slow down, but I'm actually seeing the total 
runtime of these ops decrease from 139.3ms to 58.5ms
   > 
   > My modifications to the example can be found here: 
https://gist.github.com/mbrookhart/df25427cbbfb3c73ed16be72c8525610
   
   The time measurement is not as fast as before because of this PR: 
https://github.com/apache/tvm/pull/7005. If you revert that one, you should get 
fairly good performance without any effect on the correctness, because after my 
improvement NMS is no longer a bottleneck of the SSD model, while in your 
measurement it seems it still is. So my point is that your changes to the code 
from PR #7005 here: 
https://github.com/apache/tvm/blob/f332512629e94fd8d761f3dd002fd3aa91673ce0/python/tvm/topi/cuda/nms.py#L470-L483
 should be faster and thus account for the performance improvement. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] comaniac commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


comaniac commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540416089



##
File path: python/tvm/driver/tvmc/autotuner.py
##
@@ -133,6 +127,45 @@ def add_tune_parser(subparsers):
 default=None,
 help="change the data layout of the whole graph",
 )
+parser.add_argument(
+"--enable-autoscheduler",
+help="enable tuning the graph through the autoscheduler",
+action="store_true",
+)
+
+auto_scheduler_group = parser.add_argument_group(
+"Autoscheduler options",
+"Autoscheduler options, used when --enabled-auto-scheduler is 
provided",
+)
+
+auto_scheduler_group.add_argument(
+"--cache-line-bytes",
+type=int,
+default=64,
+help="the size of cache line in bytes",
+)
+auto_scheduler_group.add_argument(
+"--num-cores",
+type=int,
+default=4,
+help="the number of device cores",
+)
+auto_scheduler_group.add_argument(
+"--vector-unit-bytes",
+type=int,
+default=16,
+help="the width of vector units in bytes",
+)
+auto_tuning_group = parser.add_argument_group(
+"Autotuning options",
+"Autotuning options, used when the autoscheduler is not enabled",
+)

Review comment:
   This group can have more options. For example, we could include other 
hardware parameters (there are many more than just these 4). Others, like 
whether to include simple tasks, whether to log the estimated latency over 
time, or even the search policy, can also be considered. To me, it might be 
fine to add more options in follow-up PRs. If that's the preference, leaving a 
TODO here should be fine for now.
   
   cc @merrymercy for suggestions about the nice-to-have options for the 
auto-scheduler in this PR.

##
File path: python/tvm/driver/tvmc/autotuner.py
##
@@ -196,28 +237,55 @@ def drive_tune(args):
 )
 else:
 logger.info("starting localhost tuning")
-runner = autotvm.LocalRunner(
+runner_ctor = (
+auto_scheduler.LocalRunner if args.enable_autoscheduler else 
autotvm.LocalRunner
+)
+runner = runner_ctor(
 number=args.number,
 repeat=args.repeat,
 timeout=args.timeout,
 min_repeat_ms=min_repeat_ms,
 )
 
-tuning_option = {
-"tuner": args.tuner,
-"trials": args.trials,
-"early_stopping": args.early_stopping,
-"measure_option": autotvm.measure_option(
-builder=autotvm.LocalBuilder(build_func="default"), runner=runner
-),
-"tuning_records": args.tuning_records,
-}
-logger.debug(" tuning options: %s", tuning_option)
+if args.enable_autoscheduler:

Review comment:
   The current flow is (extract task) -> (initialize runner) -> (tuning), so 
you have two `if args.enable_autoscheduler` branches. More seriously, `weights` 
is only defined in the `enable_autoscheduler` branch, and this could be a 
potential issue in the future.
   
   I would suggest moving the task extraction here, so that the flow becomes 
(initialize runner) -> (extract task, tuning). In this way, you only need one 
`if args.enable_autoscheduler` branch, and both `tasks` and `weights` are local 
variables used in the branch.
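   A rough sketch of the suggested ordering (a standalone illustration, not the 
PR code; it uses the public `auto_scheduler`/`autotvm` task-extraction helpers 
rather than tvmc's own wrappers):

```python
# Initialize the runner first, then keep task extraction and tuning together
# so `tasks`/`weights` never escape the auto-scheduler branch.
from tvm import auto_scheduler, autotvm


def tune_sketch(args, mod, params, target, runner):
    if args.enable_autoscheduler:
        # `weights` only exists inside this branch.
        tasks, weights = auto_scheduler.extract_tasks(mod["main"], params, target)
        tuning_options = auto_scheduler.TuningOptions(
            num_measure_trials=args.trials,
            measure_callbacks=[auto_scheduler.RecordToFile(args.output)],
            runner=runner,
        )
        auto_scheduler.TaskScheduler(tasks, weights).tune(tuning_options)
    else:
        tasks = autotvm.task.extract_from_program(mod["main"], params, target)
        # ... run the existing AutoTVM tuning loop over `tasks` with `runner` ...
```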

##
File path: python/tvm/driver/tvmc/autotuner.py
##
@@ -196,28 +237,55 @@ def drive_tune(args):
 )
 else:
 logger.info("starting localhost tuning")
-runner = autotvm.LocalRunner(
+runner_ctor = (
+auto_scheduler.LocalRunner if args.enable_autoscheduler else 
autotvm.LocalRunner
+)
+runner = runner_ctor(
 number=args.number,
 repeat=args.repeat,
 timeout=args.timeout,
 min_repeat_ms=min_repeat_ms,
 )
 
-tuning_option = {
-"tuner": args.tuner,
-"trials": args.trials,
-"early_stopping": args.early_stopping,
-"measure_option": autotvm.measure_option(
-builder=autotvm.LocalBuilder(build_func="default"), runner=runner
-),
-"tuning_records": args.tuning_records,
-}
-logger.debug(" tuning options: %s", tuning_option)
+if args.enable_autoscheduler:
+# Create the autoscheduler tuning options
+tuning_options = auto_scheduler.TuningOptions(
+num_measure_trials=args.trials,
+measure_callbacks=[auto_scheduler.RecordToFile(args.output)],
+runner=runner,
+early_stopping=args.early_stopping,
+)
+
+# Specify hardware parameters
+print(type(args.vector_unit_bytes))
+hardware_params = auto_scheduler.HardwareParams(
+args.num_cores, args.vector_unit_bytes, args.cache_line_bytes, 0, 
0, 0, 0, 0
+)
+
+# Schedule the tasks (i.e., produce a schedule for each task

[GitHub] [tvm] Laurawly edited a comment on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly edited a comment on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742728283


   > @Laurawly I took the mxnet example you provided and ran it with the debug 
runtime. It required a little bit of editing, APIs have changed slightly since 
that tutorial was written. Anyway, this is what I get on my 1070 TI with Thrust 
enabled.
   > 
   > main:
   > 
   > ```
   > Ops                                Time(us)   Time(%)  Shape
   > ---------------------------------  ---------  -------  --------------
   > fused_vision_non_max_suppression   139329.0   74.66    (1, 122640, 6)
   > fused_vision_get_valid_counts      124.255    0.067    (1, 122640, 6)
   > ```
   > 
   > this PR:
   > 
   > ```
   > fused_vision_get_valid_counts      46138.3    50.891   (1, 122640, 6)
   > fused_vision_non_max_suppression   12319.8    13.589   (1, 122640, 6)
   > ```
   > 
   > The get valid counts function slow down, but I'm actually seeing the total 
runtime of these ops decrease from 139.3ms to 58.5ms
   > 
   > My modifications to the example can be found here: 
https://gist.github.com/mbrookhart/df25427cbbfb3c73ed16be72c8525610
   
   The time measurement is not as fast as before because of this PR: 
https://github.com/apache/tvm/pull/7005. If you revert that one, you should get 
fairly good performance without any effect on the correctness, because after my 
improvement NMS is no longer a bottleneck of the SSD model, while in your 
measurement it seems it still is. So my point is that your changes to the code 
from PR #7005 here: 
https://github.com/apache/tvm/blob/f332512629e94fd8d761f3dd002fd3aa91673ce0/python/tvm/topi/cuda/nms.py#L470-L483
 
   and here: 
   
https://github.com/apache/tvm/blob/f332512629e94fd8d761f3dd002fd3aa91673ce0/python/tvm/topi/cuda/nms.py#L632-L643
   should be faster and thus account for the performance improvement. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] comaniac commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


comaniac commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540422973



##
File path: python/tvm/driver/tvmc/autotuner.py
##
@@ -253,6 +320,72 @@ def get_tuning_tasks(mod, params, target, 
target_host=None, alter_layout=None):
 return tasks
 
 
+def autoscheduler_get_tuning_tasks(
+mod, params, target, target_host=None, alter_layout=None, 
hardware_params=None
+):
+"""Get the tuning tasks for a given relay module.
+
+Parameters
+--
+mod : tvm.relay.Module
+The relay module from which to extract tuning tasks.
+params : dict
+The params for the relay module.
+target : tvm.target.Target
+The compilation target.
+target_host : str, optional
+The compilation target for the host.
+alter_layout : str, optional
+The layout to convert the graph to. Note, the convert layout
+pass doesn't currently guarantee the whole of the graph will
+be converted to the chosen layout.

Review comment:
   Missing `hardware_params`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] comaniac commented on pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


comaniac commented on pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#issuecomment-742737514


   > Hi @comaniac , @leandron ,
   > Thanks for your comments, now the code is way more sound. I thought of 
summing up here some issues left:
   > 
   > * It's now hard to make autotuner vs autoscheduler options mutually 
exclusive. So if I specify `--tuner` with `--enable-autoscheduling`, the tuner 
I selected won't be considered. This is because we are using default values, so 
a tuner will be always defined.
   
   I'm ok with this for now.
   
   > * Extracting tasks. @comaniac , you flagged the function, but this is 
actually how we do for the autotuner. I guess this is because we want to 
unit-test the function
   
   No worries. It is no longer an issue based on the latest implementation. It 
is reasonable to have separate functions for two frameworks launched by the 
same CLI command.
   
   > * Testing: I removed the `test_autoscheduler.py` test. When I wrote it, I 
was inspired by the `test_autotuner.py`. Can we agree on a set of tests we want 
to execute? @comaniac , I agree we should not test the autotuner, but only the 
user interface. But I fear to test the user interface, we will need to test a 
bit the functionality as well. What do you guys think?
   
   You're right. It's always hard to keep the proper unit-test scope for TVMC. 
We should have a thread to discuss all of them and reach an agreement. cc 
@leandron 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742739538


   So, we seem to be at a bit of an impasse. The implementation that is in 
master is not 100% correct, and it was even less correct before #7005. This PR 
is the best combination of correctness and performance we have at the moment. I 
would like to get it merged to have a correct implementation in place, but I 
would also like to find a way to speed up get_valid_counts; what I have here is 
not great.
   
   I'm happy to spend the next few days trying to improve parallelism in 
get_valid_counts, but given that this is a performance improvement over main 
and it's more correct, I think we shouldn't block merging over the current 
performance; we could always do a second PR to improve performance once we have 
the unit tests in place.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] hypercubestart opened a new pull request #7082: [Rust] LeakyReluAttrsNode, AvgPool2DAttrsNode bindings

2020-12-10 Thread GitBox


hypercubestart opened a new pull request #7082:
URL: https://github.com/apache/tvm/pull/7082


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tkonolige opened a new pull request #7083: [RELAY,TOPI] Threefry PRNG: splittable and stateless

2020-12-10 Thread GitBox


tkonolige opened a new pull request #7083:
URL: https://github.com/apache/tvm/pull/7083


   This PR adds a fast PRNG to Relay for use in dropout and batch norm. The 
PRNG is stateless: for a given key, it always returns the same random number. 
It is also splittable: for a given key, we can split the key to generate two 
new ones.
   
   JAX provides a good explanation of stateless and splittable: 
https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#JAX-PRNG.
   
   @altanh 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] trevor-m commented on pull request #7062: [Frontend][MXNet] Support mode=instance,spatial for l2_normalize

2020-12-10 Thread GitBox


trevor-m commented on pull request #7062:
URL: https://github.com/apache/tvm/pull/7062#issuecomment-742747268


   @icemelon9 @kevinthesun Could you please review? Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] masahi commented on a change in pull request #7080: Add a standalone regression test for running MergeComposite on a QNN graph

2020-12-10 Thread GitBox


masahi commented on a change in pull request #7080:
URL: https://github.com/apache/tvm/pull/7080#discussion_r540446077



##
File path: tests/python/frontend/pytorch/test_qnn_byoc_flow.py
##
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+""" Tests on QNN + BYOC """
+import torch
+
+import tvm
+import tvm.testing
+
+from tvm import relay
+from tvm.relay.dataflow_pattern import wildcard, is_op
+from tvm.relay.op.contrib.register import register_pattern_table
+from tvm.relay.op.contrib.register import get_pattern_table
+
+
+def make_qnn_add_pattern():
+lhs = wildcard()
+rhs = wildcard()
+lhs_scale = wildcard()
+lhs_zero_point = wildcard()
+rhs_scale = wildcard()
+rhs_zero_point = wildcard()
+output_scale = wildcard()
+output_zero_point = wildcard()
+qadd = is_op("qnn.add")(
+lhs,
+rhs,
+lhs_scale,
+lhs_zero_point,
+rhs_scale,
+rhs_zero_point,
+output_scale,
+output_zero_point,
+)
+return qadd.optional(is_op("clip"))
+
+
+@register_pattern_table("test_table")
+def pattern_table():
+return [
+("qnn_add", make_qnn_add_pattern()),
+]
+
+
+def run_byoc_flow(script_module, input_name, ishape):
+input_shapes = [(input_name, ishape)]
+mod, params = relay.frontend.from_pytorch(script_module, input_shapes)
+pattern_table = get_pattern_table("test_table")
+with tvm.transform.PassContext(opt_level=3):
+pass_list = [
+tvm.relay.transform.SimplifyInference(),
+tvm.relay.transform.MergeComposite(pattern_table),
+tvm.relay.transform.AnnotateTarget("test"),
+tvm.relay.transform.MergeCompilerRegions(),
+tvm.relay.transform.PartitionGraph(),
+]
+composite_partition = tvm.transform.Sequential(pass_list)
+partitioned = composite_partition(mod)
+
+
+def test_qnn_byoc_flow():

Review comment:
   `test_qnn_byoc_flow` -> `test_qnn_mergecomposite`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] masahi commented on a change in pull request #7080: Add a standalone regression test for running MergeComposite on a QNN graph

2020-12-10 Thread GitBox


masahi commented on a change in pull request #7080:
URL: https://github.com/apache/tvm/pull/7080#discussion_r540446358



##
File path: tests/python/frontend/pytorch/test_qnn_byoc_flow.py
##
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+""" Tests on QNN + BYOC """
+import torch
+
+import tvm
+import tvm.testing
+
+from tvm import relay
+from tvm.relay.dataflow_pattern import wildcard, is_op
+from tvm.relay.op.contrib.register import register_pattern_table
+from tvm.relay.op.contrib.register import get_pattern_table
+
+
+def make_qnn_add_pattern():
+lhs = wildcard()
+rhs = wildcard()
+lhs_scale = wildcard()
+lhs_zero_point = wildcard()
+rhs_scale = wildcard()
+rhs_zero_point = wildcard()
+output_scale = wildcard()
+output_zero_point = wildcard()
+qadd = is_op("qnn.add")(
+lhs,
+rhs,
+lhs_scale,
+lhs_zero_point,
+rhs_scale,
+rhs_zero_point,
+output_scale,
+output_zero_point,
+)
+return qadd.optional(is_op("clip"))
+
+
+@register_pattern_table("test_table")
+def pattern_table():
+return [
+("qnn_add", make_qnn_add_pattern()),
+]
+
+
+def run_byoc_flow(script_module, input_name, ishape):

Review comment:
   `run_byoc_flow` -> `run_qnn_mergecomposite`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540449511



##
File path: python/tvm/driver/tvmc/autotuner.py
##
@@ -133,6 +127,45 @@ def add_tune_parser(subparsers):
 default=None,
 help="change the data layout of the whole graph",
 )
+parser.add_argument(
+"--enable-autoscheduler",
+help="enable tuning the graph through the autoscheduler",
+action="store_true",
+)
+
+auto_scheduler_group = parser.add_argument_group(
+"Autoscheduler options",
+"Autoscheduler options, used when --enabled-auto-scheduler is 
provided",
+)
+
+auto_scheduler_group.add_argument(
+"--cache-line-bytes",
+type=int,
+default=64,
+help="the size of cache line in bytes",
+)
+auto_scheduler_group.add_argument(
+"--num-cores",
+type=int,
+default=4,
+help="the number of device cores",
+)
+auto_scheduler_group.add_argument(
+"--vector-unit-bytes",
+type=int,
+default=16,
+help="the width of vector units in bytes",
+)
+auto_tuning_group = parser.add_argument_group(
+"Autotuning options",
+"Autotuning options, used when the autoscheduler is not enabled",
+)

Review comment:
   Since this is also for me to familiarize with the Autoscheduler, I would 
prefer to add here anything that you and @merrymercy think relevant in terms of 
Autoscheduler options. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] giuseros commented on a change in pull request #7070: Add autoscheduler support to tvmc

2020-12-10 Thread GitBox


giuseros commented on a change in pull request #7070:
URL: https://github.com/apache/tvm/pull/7070#discussion_r540449511



##
File path: python/tvm/driver/tvmc/autotuner.py
##
@@ -133,6 +127,45 @@ def add_tune_parser(subparsers):
 default=None,
 help="change the data layout of the whole graph",
 )
+parser.add_argument(
+"--enable-autoscheduler",
+help="enable tuning the graph through the autoscheduler",
+action="store_true",
+)
+
+auto_scheduler_group = parser.add_argument_group(
+"Autoscheduler options",
+"Autoscheduler options, used when --enabled-auto-scheduler is 
provided",
+)
+
+auto_scheduler_group.add_argument(
+"--cache-line-bytes",
+type=int,
+default=64,
+help="the size of cache line in bytes",
+)
+auto_scheduler_group.add_argument(
+"--num-cores",
+type=int,
+default=4,
+help="the number of device cores",
+)
+auto_scheduler_group.add_argument(
+"--vector-unit-bytes",
+type=int,
+default=16,
+help="the width of vector units in bytes",
+)
+auto_tuning_group = parser.add_argument_group(
+"Autotuning options",
+"Autotuning options, used when the autoscheduler is not enabled",
+)

Review comment:
   Since this is also for me to familiarize with the Autoscheduler, I would 
prefer to add here anything that you and @merrymercy consider relevant in terms 
of Autoscheduler options. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] dimitraka commented on issue #7066: tvm c++ compil error

2020-12-10 Thread GitBox


dimitraka commented on issue #7066:
URL: https://github.com/apache/tvm/issues/7066#issuecomment-742755419


   How did you solve this issue?
   
   I'm getting an error "recipe for target 'container_test' failed" while 
"Linking CXX executable container_test".



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] dimitraka edited a comment on issue #7066: tvm c++ compil error

2020-12-10 Thread GitBox


dimitraka edited a comment on issue #7066:
URL: https://github.com/apache/tvm/issues/7066#issuecomment-742755419


   How did you solve this issue?
   
   I'm getting an error "recipe for target 'container_test' failed"  while  
"Linking CXX executable container_test"



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] merrymercy commented on pull request #7073: Rollback changes to SSA begin/end scope for Store in C codegen

2020-12-10 Thread GitBox


merrymercy commented on pull request #7073:
URL: https://github.com/apache/tvm/pull/7073#issuecomment-742775769


   LGTM. Please fix the lint error



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] ZihengJiang opened a new pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


ZihengJiang opened a new pull request #7084:
URL: https://github.com/apache/tvm/pull/7084


   This PR supports building a TIR function directly and allows TIR to return a 
value directly. 
   The current approach is to add an intrinsic for `return`.  Example:
   ```python
   def add():
   a = tir.Var("a", "float32") 
   b = tir.Var("b", "float32") 
   c = a + b
   c = tir.call_intrin("float32", "tir.myreturn", c) 
   c = tir.Evaluate(c)
   func = tir.PrimFunc([a, b], c)
   mod = tvm.IRModule({'add': func})
   func = tvm.build(mod['add'])
   out = func(1.0, 2.0)
   print(out)
   ```
   
   In the future, with the tvm script, we should be able to write some like:
   ```python
   @tvm.script
   def add(a, b):
   tir.myreturn(a + b)
   ```
   
   There are two things that need to be discussed:
   - Because `return` is a keyword in Python, I currently use `myreturn`; we 
should pick another name for it.
   - Whether we should support multiple `return`s in a single function.
   
   @tqchen @junrushao1994 @areusch 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tqchen commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


tqchen commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540482265



##
File path: include/tvm/tir/builtin.h
##
@@ -41,6 +41,10 @@ namespace tir {
 
 /*! \brief Collection of builtin intrinsics as ops */
 namespace builtin {
+/*!
+ * \brief Return value.
+ */
+TVM_DLL const Op& myreturn();

Review comment:
   How about return_value





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] ZihengJiang commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


ZihengJiang commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540483035



##
File path: include/tvm/tir/builtin.h
##
@@ -41,6 +41,10 @@ namespace tir {
 
 /*! \brief Collection of builtin intrinsics as ops */
 namespace builtin {
+/*!
+ * \brief Return value.
+ */
+TVM_DLL const Op& myreturn();

Review comment:
   @tqchen  how about `tir.ret`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] altanh commented on pull request #7083: [RELAY,TOPI] Threefry PRNG: splittable and stateless

2020-12-10 Thread GitBox


altanh commented on pull request #7083:
URL: https://github.com/apache/tvm/pull/7083#issuecomment-742788752


   awesome work, cc @tqchen @junrushao1994 @MarisaKirisame @eric-haibin-lin who 
may be interested
   
   I think it may be worth discussing the high- vs. low-level API for this, and 
what namespace it should live in. I wrote a few examples of how we might use 
this here: 
https://discuss.tvm.apache.org/t/rfc-handling-effect-in-tvm-and-relay/5946/25?u=altanh



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] junrushao1994 commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


junrushao1994 commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540486888



##
File path: include/tvm/tir/builtin.h
##
@@ -41,6 +41,10 @@ namespace tir {
 
 /*! \brief Collection of builtin intrinsics as ops */
 namespace builtin {
+/*!
+ * \brief Return value.
+ */
+TVM_DLL const Op& myreturn();

Review comment:
   I think `tir.ret` looks better





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] csullivan commented on pull request #7073: Rollback changes to SSA begin/end scope for Store in C codegen

2020-12-10 Thread GitBox


csullivan commented on pull request #7073:
URL: https://github.com/apache/tvm/pull/7073#issuecomment-742797682


   Thank you. 
   Looks like main is failing the linter after a bad merge. @vegaluisjose fixes 
it here: https://github.com/apache/tvm/pull/7081/files



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] areusch commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


areusch commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540496971



##
File path: python/tvm/driver/build_module.py
##
@@ -159,17 +159,30 @@ def lower(sch, args, name="main", binds=None, 
simple_mode=False):
 lower_phase2 = [x[1] for x in add_lower_pass if x[0] == 2]
 lower_phase3 = [x[1] for x in add_lower_pass if x[0] > 2]
 
+is_tir_schedule = False
+
 # Phase 0
 if isinstance(sch, schedule.Schedule):
 mod = form_irmodule(sch, args, name, binds)
+elif isinstance(sch, tvm.tir.PrimFunc):
+func = sch.with_attr("global_symbol", name)
+if pass_ctx.config.get("tir.restricted_func"):
+func = func.with_attr("tir.noalias", True)
+mod = tvm.IRModule({name: func})
+is_tir_schedule = True
 else:
 mod = sch
 
 pass_list = lower_phase0
 # Phase 1
+pass_list += [tvm.tir.transform.InjectPrefetch()]
+
+if is_tir_schedule:
+pass
+# pass_list += [tvm.tir.transform.BufferFlatten()]

Review comment:
   remove the comment and invert the if

##
File path: src/tir/transforms/make_packed_api.cc
##
@@ -222,6 +275,7 @@ namespace transform {
 
 Pass MakePackedAPI(int num_unpacked_args) {
   auto pass_func = [num_unpacked_args](IRModule m, PassContext ctx) {
+LOG(INFO) << "Before Make Packed API:\n" << m;

Review comment:
   remove

##
File path: src/tir/transforms/make_packed_api.cc
##
@@ -239,6 +293,7 @@ Pass MakePackedAPI(int num_unpacked_args) {
 for (const auto& pair : updates) {
   mptr->AddUnchecked(pair.first, pair.second);
 }
+LOG(INFO) << "After Make Packed API:\n" << m;

Review comment:
   remove

##
File path: src/tir/transforms/make_packed_api.cc
##
@@ -41,6 +41,58 @@
 namespace tvm {
 namespace tir {
 
+class ReturnRewriter : public StmtMutator {
+ public:
+  explicit ReturnRewriter(Var ret_var, Var ret_tcode)
+: ret_var_(ret_var), ret_tcode_(ret_tcode) {}
+
+  Stmt VisitStmt_(const EvaluateNode* node) override {
+Stmt ret = StmtMutator::VisitStmt_(node);
+const EvaluateNode* eval = ret.as<EvaluateNode>();
+CHECK(eval);
+if (const CallNode* call = eval->value.as<CallNode>()) {
+  if (call->op.same_as(builtin::myreturn())) {
+CHECK_EQ(call->args.size(), 1);
+ret = WriteToOut(call->args[0], ret_var_, ret_tcode_);
+  }
+}
+return ret;
+  }
+ private:
+  std::pair<int, PrimExpr> ConvertForFFI(PrimExpr val) {
+DataType dtype = val.dtype();
+if (dtype.is_int() || dtype.is_uint()) {
+  return {kTVMArgInt, Cast(DataType::Int(64), val)};
+} else if (dtype.is_float()) {
+  return {kTVMArgFloat, Cast(DataType::Float(64), val)};
+} else if (dtype.is_void()) {
+  return {kTVMNullptr, val};
+} else {
+  LOG(FATAL) << "data type " << dtype << " not supported yet";

Review comment:
   I think we may need to return at least DLTensor for AOT

##
File path: tests/python/unittest/test_tir_build.py
##
@@ -0,0 +1,16 @@
+import tvm
+from tvm import tir
+
+def add():
+a = tir.Var("a", "float32") 
+b = tir.Var("b", "float32") 
+c = a + b
+c = tir.call_intrin("float32", "tir.myreturn", c) 
+c = tir.Evaluate(c)
+func = tir.PrimFunc([a, b], c)
+mod = tvm.IRModule({'add': func})
+func = tvm.build(mod['add'])
+out = func(1.0, 2.0)
+print(out)

Review comment:
   add an assert here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] Laurawly edited a comment on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly edited a comment on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742728283


   > @Laurawly I took the mxnet example you provided and ran it with the debug 
runtime. It required a little bit of editing, APIs have changed slightly since 
that tutorial was written. Anyway, this is what I get on my 1070 TI with Thrust 
enabled.
   > 
   > main:
   > 
   > ```
   > Ops                                Time(us)   Time(%)  Shape
   > ---------------------------------  ---------  -------  --------------
   > fused_vision_non_max_suppression   139329.0   74.66    (1, 122640, 6)
   > fused_vision_get_valid_counts      124.255    0.067    (1, 122640, 6)
   > ```
   > 
   > this PR:
   > 
   > ```
   > fused_vision_get_valid_counts      46138.3    50.891   (1, 122640, 6)
   > fused_vision_non_max_suppression   12319.8    13.589   (1, 122640, 6)
   > ```
   > 
   > The get valid counts function slow down, but I'm actually seeing the total 
runtime of these ops decrease from 139.3ms to 58.5ms
   > 
   > My modifications to the example can be found here: 
https://gist.github.com/mbrookhart/df25427cbbfb3c73ed16be72c8525610
   
   The time measurement is not as fast as before because of this PR: 
https://github.com/apache/tvm/pull/7005. If you revert that one, you should get 
fairly good performance without any problem with the correctness, because after 
my improvement NMS is no longer a bottleneck of the SSD model, while in your 
measurement it seems it still is. So my point is that your changes to the code 
from PR #7005 here: 
https://github.com/apache/tvm/blob/f332512629e94fd8d761f3dd002fd3aa91673ce0/python/tvm/topi/cuda/nms.py#L470-L483
 
   and here: 
   
https://github.com/apache/tvm/blob/f332512629e94fd8d761f3dd002fd3aa91673ce0/python/tvm/topi/cuda/nms.py#L632-L643
   should be faster and thus account for the performance improvement. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742807888


   @Laurawly I just rewrote get_valid_counts in the way you suggested. I still 
need to better parallelize the sum/conditional scan operation, but this takes 
it to:
   ```
   Ops                                Time(us)   Time(%)  Shape           Inputs  Outputs
   ---------------------------------  ---------  -------  --------------  ------  -------
   fused_vision_non_max_suppression   12105.9    25.723   (1, 122640, 6)
   fused_vision_get_valid_counts      3517.22    7.474    (1, 122640, 6)
   ```
   I'm going to take another look at NMS before I try to parallelize the sum
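   For readers following along, a rough NumPy reference (not the TVM kernel 
itself; the helper name is illustrative) of what get_valid_counts computes, and 
where the conditional scan that still needs parallelizing shows up:

```python
import numpy as np


def get_valid_counts_ref(data, score_threshold, score_index=1):
    """Count boxes whose score passes the threshold and compact them to the front."""
    batch, num_anchors, _ = data.shape
    out = np.full_like(data, -1.0)
    valid_count = np.zeros(batch, dtype="int32")
    for b in range(batch):
        flags = data[b, :, score_index] > score_threshold  # conditional part
        positions = np.cumsum(flags) - 1                   # scan gives each valid box its output slot
        valid_count[b] = int(flags.sum())
        out[b, positions[flags]] = data[b, flags]          # scatter the valid boxes to the front
    return valid_count, out
```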



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] Laurawly commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742808115


   > So, we seem to be at a bit of an impass. The implementation that is in 
master is not 100% correct, and it was even less correct before #7005. This PR 
is the best combination of correctness and performance we have a the moment. I 
would like to get it merged to have a correct implementation in place, but I 
would also like to find a way to speed up get valid counts, what I have here is 
not great.
   > 
   > I'm happy to spend the next few days trying to improve parallelism in 
get_valid_counts, but given that this is a performance improvement over main 
and it's more correct, I think we shouldn't block merging over the current 
performance, we could always do a second PR to improve performance once we have 
the unit tests in place
   
   We have tested the correctness of SSD-related detection models before PR 
#7005 and they are within the correctness margin for customers to ship. I agree 
about the part you updated that PR #7005 had changed for the perf improvement. 
But I do suggest a perf comparison of your implementation and the one before PR 
#7005. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jroesch opened a new pull request #7085: [WIP] Fixes for using Python APIs from Rust.

2020-12-10 Thread GitBox


jroesch opened a new pull request #7085:
URL: https://github.com/apache/tvm/pull/7085


   cc @hypercubestart this is a bigger PR to fix the current bindings and 
enable calling them with the Python code, but it needs a bit of cleanup; there 
are some changes in here that I used to debug segfaults. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742809738


   > We have tested the correctness of ssd related detection models before PR 
#7005 and they are in the correctness margin for customers to ship.
   
   This indicates that the kernel was correct for certain inputs, but not that 
it was correct overall.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jroesch merged pull request #7081: [VTA] update 3rdparty submodule

2020-12-10 Thread GitBox


jroesch merged pull request #7081:
URL: https://github.com/apache/tvm/pull/7081


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[tvm] branch main updated (7a3278a -> 653f697)

2020-12-10 Thread jroesch
This is an automated email from the ASF dual-hosted git repository.

jroesch pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 7a3278a  [Relay][Strategy] Allow cuda cross compilation without 
physical device. (#7063)
 add 653f697  [VTA] update 3rdparty submodule (#7081)

No new revisions were added by this update.

Summary of changes:
 3rdparty/vta-hw  | 2 +-
 python/tvm/relay/op/strategy/cuda.py | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)



[GitHub] [tvm] Laurawly edited a comment on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly edited a comment on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742808115


   > So, we seem to be at a bit of an impass. The implementation that is in 
master is not 100% correct, and it was even less correct before #7005. This PR 
is the best combination of correctness and performance we have a the moment. I 
would like to get it merged to have a correct implementation in place, but I 
would also like to find a way to speed up get valid counts, what I have here is 
not great.
   > 
   > I'm happy to spend the next few days trying to improve parallelism in 
get_valid_counts, but given that this is a performance improvement over main 
and it's more correct, I think we shouldn't block merging over the current 
performance, we could always do a second PR to improve performance once we have 
the unit tests in place
   
   We have tested the correctness of MXNet SSD-related detection models before 
PR #7005 and they are within the correctness margin for customers to ship. I 
agree about the part you updated that PR #7005 had changed for the perf 
improvement. But I do suggest a perf comparison of your implementation and the 
one before PR #7005. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jwfromm opened a new pull request #7086: [Relay][Op] Consolidate reshape and reverse_reshape operators.

2020-12-10 Thread GitBox


jwfromm opened a new pull request #7086:
URL: https://github.com/apache/tvm/pull/7086


   Currently Relay has separate operators for `reshape` and `reverse_reshape`; 
however, in C++ they are both implemented using shared functions. The weird 
part is that `ReshapeAttrs` has a `reverse` attribute that is used for 
`reverse_reshape` but not exposed in the Python API of `reshape`. This often 
causes headaches when trying to create a new `reshape` operator in a pass from 
existing attributes, since the Python API doesn't expect a `reverse` argument. 
As far as I can tell, there's no reason to have a separate `reverse_reshape` 
op. Consolidating the two and exposing `reverse` in Python leads to a much 
cleaner and more obvious API without breaking any tests or use cases.
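   To make the pain point concrete, here is a small sketch; the `reverse=` 
keyword in the commented-out line is the API proposed by this PR, so treat it 
as hypothetical until the change lands:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(10, 5, 4))
# Today this requires a dedicated operator...
y = relay.reverse_reshape(x, newshape=(-1, 0))
# ...while a pass that rebuilds a reshape from ReshapeAttrs has nowhere to put
# attrs.reverse in the Python API. After this PR it could simply be:
# y = relay.reshape(x, newshape=(-1, 0), reverse=True)   # proposed, hypothetical
print(relay.transform.InferType()(tvm.IRModule.from_expr(y)))
```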



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] comaniac opened a new pull request #7087: [Relay] Deformable Conv2D layout conversion

2020-12-10 Thread GitBox


comaniac opened a new pull request #7087:
URL: https://github.com/apache/tvm/pull/7087


   This PR implements the required functions for deformable conv2D to perform 
layout conversion between NHWC and NCHW. Please also be aware of the following 
points in the review:
   
   - I am not sure if simply returning `None` is ok for alter op and legalize 
functions.
   - I suppose we don't need to consider QNN yet because we only have the FP32 
computes in TOPI?
   
   cc @anijain2305 @yzhliu @vinx13 @Laurawly 
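   A hedged sketch of how the new conversion would typically be exercised, 
assuming a Relay module `mod` that already contains `nn.deformable_conv2d` in 
NHWC layout (the exact op registration is whatever this PR adds):

```python
import tvm
from tvm import relay

# Ask ConvertLayout to move deformable conv2d from NHWC to NCHW; "default"
# keeps the kernel layout chosen by the op's registered convert function.
desired_layouts = {"nn.deformable_conv2d": ["NCHW", "default"]}
seq = tvm.transform.Sequential(
    [
        relay.transform.ConvertLayout(desired_layouts),
        relay.transform.InferType(),
    ]
)
# with tvm.transform.PassContext(opt_level=3):
#     mod = seq(mod)
```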



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jwfromm commented on pull request #7086: [Relay][Op] Consolidate reshape and reverse_reshape operators.

2020-12-10 Thread GitBox


jwfromm commented on pull request #7086:
URL: https://github.com/apache/tvm/pull/7086#issuecomment-742836190


   @vinx13 @icemelon9 @electriclilies can you guys take a look at this PR and 
let me know what you think?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] Laurawly edited a comment on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly edited a comment on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742808115


   > So, we seem to be at a bit of an impass. The implementation that is in 
master is not 100% correct, and it was even less correct before #7005. This PR 
is the best combination of correctness and performance we have a the moment. I 
would like to get it merged to have a correct implementation in place, but I 
would also like to find a way to speed up get valid counts, what I have here is 
not great.
   > 
   > I'm happy to spend the next few days trying to improve parallelism in 
get_valid_counts, but given that this is a performance improvement over main 
and it's more correct, I think we shouldn't block merging over the current 
performance, we could always do a second PR to improve performance once we have 
the unit tests in place
   
   We have tested the correctness of MXNet SSD-related detection models before 
PR #7005 and they are within the correctness margin for customers to ship. I do 
suggest a block-by-block perf comparison of your implementation and the one in 
main, and an overall performance comparison with the implementation before PR 
#7005's change to `nms.py`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] trevor-m opened a new pull request #7088: [Frontend][Pytorch] Handle case where output of model is python list

2020-12-10 Thread GitBox


trevor-m opened a new pull request #7088:
URL: https://github.com/apache/tvm/pull/7088


   The `prim::ListConstruct` converter can produce a regular Python list when 
the list is not dynamic:
   
https://github.com/apache/tvm/blob/main/python/tvm/relay/frontend/pytorch.py#L2436-L2439
   
   If that Python list happens to be the output of the whole model, an error 
would occur when the list is passed to `_analysis.free_vars`, because it only 
accepts an Expr. This PR wraps the list in a tuple to solve that.
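   The fix boils down to something like the sketch below (the helper name is 
illustrative, not the frontend's actual code):

```python
from tvm import relay


def wrap_list_output(outputs):
    """If the converted graph output is a plain Python list, wrap it in a
    relay.Tuple so that downstream passes such as free_vars receive an Expr."""
    if isinstance(outputs, list):
        return relay.Tuple(outputs)
    return outputs
```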



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] electriclilies commented on pull request #7086: [Relay][Op] Consolidate reshape and reverse_reshape operators.

2020-12-10 Thread GitBox


electriclilies commented on pull request #7086:
URL: https://github.com/apache/tvm/pull/7086#issuecomment-742841281


   Looks good to me! Thanks for getting this up quickly



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] Laurawly commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


Laurawly commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742845654


   > I'm happy to spend the next few days trying to improve parallelism in
get_valid_counts, but given that this is a performance improvement over main
and it's more correct, I think we shouldn't block merging over the current
performance; we could always do a second PR to improve performance once we have
the unit tests in place.
   
   Why don't you put the new implementation in a separate file, say
`nms_onnx.py`, or in a separate function? Then we can see about merging it back
once enough tests have passed to rule out flakiness in the kernel, and once it
has better parallelism than using only `batch_size` threads.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742861849


   I really don't want the code fragmentation, especially when it's currently 
~9x faster than main. I will keep pounding on speeding it up



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jwfromm opened a new pull request #7089: [Relay][Frontend][Onnx] Add softplus operator conversion to Onnx.

2020-12-10 Thread GitBox


jwfromm opened a new pull request #7089:
URL: https://github.com/apache/tvm/pull/7089


   This little PR adds support for the ONNX Softplus operator.
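
   For reference, a hedged sketch of what such a conversion typically lowers to
(softplus(x) = log(exp(x) + 1)); this is illustrative and not necessarily the
exact converter code in the PR:

   ```
   from tvm import relay

   def softplus(x, dtype="float32"):
       # softplus(x) = log(exp(x) + 1), expressed with existing relay ops
       one = relay.const(1.0, dtype)
       return relay.log(relay.exp(x) + one)
   ```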
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jwfromm commented on pull request #7089: [Relay][Frontend][Onnx] Add softplus operator conversion to Onnx.

2020-12-10 Thread GitBox


jwfromm commented on pull request #7089:
URL: https://github.com/apache/tvm/pull/7089#issuecomment-742864599


   @masahi @CircleSpin can you guys take a look at this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] comaniac opened a new issue #7090: [Bug][Relay] Winograd Conv2D for auto-scheduler errors out when input image is not square

2020-12-10 Thread GitBox


comaniac opened a new issue #7090:
URL: https://github.com/apache/tvm/issues/7090


   ## Description
   
   When compiling a Conv2D with a non-square input image, the weight transform
has a bug that gives the transformed weight the wrong shape. For example, the
input shape I used, (1, 23, 40, 128), results in alpha=4 for the transformed
weight. Based on this value, the inferred kernel size is
`alpha - tile_size + 1 = 4 - 4 + 1 = 1`, which is no longer 3.
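
   For reference, a quick arithmetic check of the relationship above
(illustrative only):

   ```
   # Winograd relation: alpha = tile_size + kernel_size - 1
   tile_size = 4
   kernel_size = 3
   alpha_expected = tile_size + kernel_size - 1   # 6
   alpha_observed = 4                             # seen with the (1, 23, 40, 128) input
   # Recovering the kernel size from the observed alpha:
   # alpha_observed - tile_size + 1 = 4 - 4 + 1 = 1, not 3 -> wrong transformed weight shape
   ```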
   
   ## Error Log
   
   ```
   Exception in thread Thread-1:
   Traceback (most recent call last):
 File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
   self.run()
 File "/usr/lib/python3.6/threading.py", line 864, in run
   self._target(*self._args, **self._kwargs)
 File 
"/home/ubuntu/cody-tvm/python/tvm/auto_scheduler/relay_integration.py", line 
61, in call_all_topi_funcs
   grc.codegen(opt_mod["main"])
 File 
"/home/ubuntu/cody-tvm/python/tvm/relay/backend/graph_runtime_codegen.py", line 
83, in codegen
   self._codegen(func)
 File "tvm/_ffi/_cython/./packed_func.pxi", line 321, in 
tvm._ffi._cy3.core.PackedFuncBase.__call__
 File "tvm/_ffi/_cython/./packed_func.pxi", line 256, in 
tvm._ffi._cy3.core.FuncCall
 File "tvm/_ffi/_cython/./packed_func.pxi", line 245, in 
tvm._ffi._cy3.core.FuncCall3
 File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
   tvm._ffi.base.TVMError: Traceback (most recent call last):
 [bt] (8) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::backend::MemoizedExprTranslator >::VisitExpr(tvm::RelayExpr const&)+0xa9) [0x7fce58f91e69]
 [bt] (7) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::ExprFunctor (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x82) 
[0x7fce58f91be2]
 [bt] (6) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::ExprFunctor (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef 
const&, tvm::relay::ExprFunctor 
(tvm::RelayExpr const&)>*)#6}::_FUN(tvm::runtime::ObjectRef const&, 
tvm::relay::ExprFunctor 
(tvm::RelayExpr const&)>*)+0x27) [0x7fce58f832e7]
 [bt] (5) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode
 const*)+0x14f) [0x7fce58f89a8f]
 [bt] (4) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::backend::MemoizedExprTranslator >::VisitExpr(tvm::RelayExpr const&)+0xa9) [0x7fce58f91e69]
 [bt] (3) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::ExprFunctor (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x82) 
[0x7fce58f91be2]
 [bt] (2) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::ExprFunctor (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef 
const&, tvm::relay::ExprFunctor 
(tvm::RelayExpr const&)>*)#6}::_FUN(tvm::runtime::ObjectRef const&, 
tvm::relay::ExprFunctor 
(tvm::RelayExpr const&)>*)+0x27) [0x7fce58f832e7]
 [bt] (1) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode
 const*)+0x714) [0x7fce58f8a054]
 [bt] (0) /home/ubuntu/cody-tvm/build/libtvm.so(+0x1230f0b) [0x7fce5913cf0b]
 File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in 
tvm._ffi._cy3.core.tvm_callback
 File "/home/ubuntu/cody-tvm/python/tvm/relay/backend/compile_engine.py", 
line 300, in lower_call
   best_impl, outputs = select_implementation(op, call.attrs, inputs, 
ret_type, target)
 File "/home/ubuntu/cody-tvm/python/tvm/relay/backend/compile_engine.py", 
line 205, in select_implementation
   outs = best_plevel_impl.compute(attrs, inputs, out_type)
 File "/home/ubuntu/cody-tvm/python/tvm/relay/op/op.py", line 90, in compute
   return _OpImplementationCompute(self, attrs, inputs, out_type)
 File "tvm/_ffi/_cython/./packed_func.pxi", line 321, in 
tvm._ffi._cy3.core.PackedFuncBase.__call__
 File "tvm/_ffi/_cython/./packed_func.pxi", line 266, in 
tvm._ffi._cy3.core.FuncCall
 File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
 [bt] (3) /home/ubuntu/cody-tvm/build/libtvm.so(TVMFuncCall+0x65) 
[0x7fce59140485]
 [bt] (2) /home/ubuntu/cody-tvm/build/libtvm.so(+0x1140ca8) [0x7fce5904cca8]
 [bt] (1) 
/home/ubuntu/cody-tvm/build/libtvm.so(tvm::relay::OpImplementation::Compute(tvm::Attrs
 const&, tvm::runtime::Array const&, tvm::Type 
const&)+0xb1) [0x7fce5904ca71]
 [bt] (0) /home/ubuntu/cody-tvm/build/libtvm.so(+0x1230f0b) [0x7fce5913cf0b]
 File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in 
tvm._ffi._cy3.core.tvm_callback
 File "/home/ubuntu/cody-tvm/python/tvm/relay/op/strategy/generic.py", line 
214, in _compute_conv2d
   return [topi_compute(*args)]
 File "/home/ubuntu/cody-tvm/python/tvm/topi/nn/conv2d.py", line 1191, in 
conv2d_winograd_nhwc_without_weight_transform
   data, weight, strides, padding, dilation, out_dtype, pre_computed=True
 File "", line 2, in conv2d_winograd_nhwc
 File "/home/ubuntu/cody-tvm/python/tvm/target/generic_func.py",

[GitHub] [tvm] jwfromm opened a new pull request #7091: [Frontent][Relay][Onnx] Allow If shortcut when condition is constant and available.

2020-12-10 Thread GitBox


jwfromm opened a new pull request #7091:
URL: https://github.com/apache/tvm/pull/7091


   I found a goofy model exported from TensorFlow that has a ton of If
statements, but all of their conditions were constant False values. There's no
reason for us to insert an If when we know the value of the condition, so I've
added a check in the ONNX importer for this case.
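
   A minimal sketch of the shortcut (hypothetical helper name, not the exact
importer code):

   ```
   from tvm import relay

   def fold_constant_if(cond, then_expr, else_expr):
       # If the condition is already a relay Constant, pick the branch at import
       # time instead of emitting a relay.If node.
       if isinstance(cond, relay.Constant):
           return then_expr if bool(cond.data.asnumpy()) else else_expr
       return relay.If(cond, then_expr, else_expr)
   ```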



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jwfromm commented on pull request #7091: [Frontend][Relay][Onnx] Allow If shortcut when condition is constant and available.

2020-12-10 Thread GitBox


jwfromm commented on pull request #7091:
URL: https://github.com/apache/tvm/pull/7091#issuecomment-742879720


   @masahi can you take a quick look at this one?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] altanh commented on pull request #7083: [RELAY,TOPI] Threefry PRNG: splittable and stateless

2020-12-10 Thread GitBox


altanh commented on pull request #7083:
URL: https://github.com/apache/tvm/pull/7083#issuecomment-742881312


   Worth noting that `threefry_generate` can only generate output for shapes
whose total size is a multiple of 4; can you update the examples and
documentation in the Relay Python op to reflect this?
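
   For illustration, assuming the op is exposed roughly as in the PR (exact
module path and signatures may differ):

   ```
   from tvm import relay

   key = relay.random.threefry_key(0)
   # OK: 2 * 2 = 4 elements, a multiple of 4
   key_and_rand = relay.random.threefry_generate(key, (2, 2))
   # Not OK per the constraint above: 3 elements is not a multiple of 4
   # relay.random.threefry_generate(key, (3,))
   ```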



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jwfromm closed pull request #7091: [Frontend][Relay][Onnx] Allow If shortcut when condition is constant and available.

2020-12-10 Thread GitBox


jwfromm closed pull request #7091:
URL: https://github.com/apache/tvm/pull/7091


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742884803


   I found a combination of serial and threaded portions of NMS that makes it
fast while still passing the tests:
   ```
   fused_vision_get_valid_counts
   3674.6210.87(1, 122640, 6)
   fused_vision_non_max_suppression 
   402.7911.191(1, 122640, 6)
   ```
   I'll go back to parallelizing the sum/scan on get_valid_counts tomorrow, but 
at this point, this is ~30x faster than main.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tkonolige commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


tkonolige commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540600882



##
File path: python/tvm/driver/build_module.py
##
@@ -159,17 +159,27 @@ def lower(sch, args, name="main", binds=None, 
simple_mode=False):
 lower_phase2 = [x[1] for x in add_lower_pass if x[0] == 2]
 lower_phase3 = [x[1] for x in add_lower_pass if x[0] > 2]
 
+is_tir_schedule = False
+
 # Phase 0
 if isinstance(sch, schedule.Schedule):
 mod = form_irmodule(sch, args, name, binds)
+elif isinstance(sch, tvm.tir.PrimFunc):

Review comment:
   I'm a little confused about how all the changes in this file fit into having
return for TIR?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tqchen commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


tqchen commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540609823



##
File path: python/tvm/driver/build_module.py
##
@@ -159,17 +159,27 @@ def lower(sch, args, name="main", binds=None, 
simple_mode=False):
 lower_phase2 = [x[1] for x in add_lower_pass if x[0] == 2]
 lower_phase3 = [x[1] for x in add_lower_pass if x[0] > 2]
 
+is_tir_schedule = False
+
 # Phase 0
 if isinstance(sch, schedule.Schedule):
 mod = form_irmodule(sch, args, name, binds)
+elif isinstance(sch, tvm.tir.PrimFunc):

Review comment:
   Let us only support build from IRModule not PrimFunc

##
File path: python/tvm/driver/build_module.py
##
@@ -159,17 +159,27 @@ def lower(sch, args, name="main", binds=None, 
simple_mode=False):
 lower_phase2 = [x[1] for x in add_lower_pass if x[0] == 2]
 lower_phase3 = [x[1] for x in add_lower_pass if x[0] > 2]
 
+is_tir_schedule = False
+
 # Phase 0
 if isinstance(sch, schedule.Schedule):
 mod = form_irmodule(sch, args, name, binds)
+elif isinstance(sch, tvm.tir.PrimFunc):
+func = sch.with_attr("global_symbol", name)
+if pass_ctx.config.get("tir.restricted_func"):
+func = func.with_attr("tir.noalias", True)

Review comment:
   The `tir.noalias` and `global_symbol` attributes should be part of the
IRModule before it is passed into build
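
   A hedged sketch of the suggested flow (illustrative, not the PR code): attach
the attributes to the PrimFunc and wrap it in an IRModule before calling
lower/build, rather than special-casing PrimFunc inside lower():

   ```
   import tvm

   def primfunc_to_irmodule(func, name="main"):
       # Attach the symbol and aliasing attributes up front ...
       func = func.with_attr("global_symbol", name)
       func = func.with_attr("tir.noalias", True)
       # ... and hand an IRModule (not a bare PrimFunc) to lower/build.
       return tvm.IRModule({name: func})

   # mod = primfunc_to_irmodule(prim_func)
   # lib = tvm.build(mod, target="llvm")
   ```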





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tmoreau89 merged pull request #7059: #7058 [Tutorial] Import errors in deploy_detection.py and deploy_classification.py

2020-12-10 Thread GitBox


tmoreau89 merged pull request #7059:
URL: https://github.com/apache/tvm/pull/7059


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[tvm] branch main updated (653f697 -> 19e2631)

2020-12-10 Thread moreau
This is an automated email from the ASF dual-hosted git repository.

moreau pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 653f697  [VTA] update 3rdparty submodule (#7081)
 add 19e2631  #7058 [Tutorial] Import errors in deploy_detection.py and 
deploy_classification.py (#7059)

No new revisions were added by this update.

Summary of changes:
 vta/tutorials/frontend/legacy/deploy_detection.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)



[GitHub] [tvm] tmoreau89 commented on pull request #7059: #7058 [Tutorial] Import errors in deploy_detection.py and deploy_classification.py

2020-12-10 Thread GitBox


tmoreau89 commented on pull request #7059:
URL: https://github.com/apache/tvm/pull/7059#issuecomment-742899105


   Thank you @fprotopapa and @liangfu and @vegaluisjose for reviewing the PR!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] ZihengJiang commented on a change in pull request #7084: [WIP][TIR] Support Return in TIR

2020-12-10 Thread GitBox


ZihengJiang commented on a change in pull request #7084:
URL: https://github.com/apache/tvm/pull/7084#discussion_r540622156



##
File path: python/tvm/driver/build_module.py
##
@@ -159,17 +159,27 @@ def lower(sch, args, name="main", binds=None, 
simple_mode=False):
 lower_phase2 = [x[1] for x in add_lower_pass if x[0] == 2]
 lower_phase3 = [x[1] for x in add_lower_pass if x[0] > 2]
 
+is_tir_schedule = False
+
 # Phase 0
 if isinstance(sch, schedule.Schedule):
 mod = form_irmodule(sch, args, name, binds)
+elif isinstance(sch, tvm.tir.PrimFunc):

Review comment:
   @tkonolige this is for building TIR functions





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] masahi commented on a change in pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


masahi commented on a change in pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#discussion_r540627615



##
File path: tests/python/relay/test_op_level5.py
##
@@ -393,8 +393,6 @@ def verify_nms(
 intrp2 = relay.create_executor("debug", ctx=ctx, target=target)
 op_res2 = intrp2.evaluate(func)(x0_data, x1_data, x2_data, x3_data)
 tvm.testing.assert_allclose(op_res2.asnumpy(), ref_res, rtol=1e-5)
-if target == "nvptx":

Review comment:
   I think I removed this condition in 
https://github.com/apache/tvm/pull/7051 
   Why is this diff here? Need rebase?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm-vta] tmoreau89 merged pull request #9: [Hardware][OpenCL] Intelfocl support

2020-12-10 Thread GitBox


tmoreau89 merged pull request #9:
URL: https://github.com/apache/tvm-vta/pull/9


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[tvm-vta] branch main updated: [Hardware][OpenCL] Intelfocl support (#9)

2020-12-10 Thread moreau
This is an automated email from the ASF dual-hosted git repository.

moreau pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-vta.git


The following commit(s) were added to refs/heads/main by this push:
 new 5bd9c6a  [Hardware][OpenCL] Intelfocl support (#9)
5bd9c6a is described below

commit 5bd9c6a8b487234e18069c887b3f6271c97292f7
Author: ZHANG Hao 
AuthorDate: Fri Dec 11 10:12:58 2020 +0800

[Hardware][OpenCL] Intelfocl support (#9)

* - static auto-tune sample config
- add mul, load_int8
- some bugfix for bits width

* Extract hw_spec_const.h out of hw_spec.h
Rename VTA_MEM_ID_ACC_8 to VTA_MEM_ID_ACC_8BIT

* Add OpenCL kernel sources for Intel OpenCL for FPGA devices

* Add driver sources to support Intel OpenCL for FPGA devices

* intelfocl sample configuration for VTA added

* Workaround for Signedness bug in Intel OpenCL for FPGA compiler

* remove some comments

* rename cpp to cc

* change UOP src_idx size to max(inp, acc)

* Move AOCLUtils into 3rdpary directory on TVM

* bump the intelfocl HW_VER to 0.0.2

* Bump all the HW_VER to 0.0.2 as there is a ISA change

* Address cpplint issues

* Fix cpplint errors for indentations

* api to init device from outside

* Split OpenCL init and FPGA setup code

* Add comment for cleanup() callback

* Assert error for unsupported input/weight/accu types

* Add Apache Software Foundation headers

* Address cpplint issues

* Drop dependency on 3rd party library aoclutils, preparing for Xilinx 
support

* Xilinx Vitis does not allow local_work_size to be omitted

* Suppress warnings for deprecated clCreateCommandQueue
(clCreateCommandQueueWithProperties not supported by Xilinx)

* Rename intelfocl_ to oclfpga_ as both Intel & Xilinx are supported

* Rename string literals and code structures for Xilinx Vitis support

* Rename aocx to bitstream as part of Xilinx Vitis support

* Remove obsolete vta-cost python script

* Add comments for MEM_ADDR_IDENTIFIER constant

* Apply CamelCase for function names

* Add comments for OCLFPGADevice member functions

* 2-space indentation for .cl files

* Add README to hardware/intelfocl

* Update README.rst

* Update README.rst

* update to trigger ci

* disable tsim test: quick fix for test fails due to ISA changes

* TESTING

* disable tsim test in docker_bash.sh

* cleanup code

Co-authored-by: Li Jiashu 
---
 config/de10nano_sample.json|   2 +-
 config/fsim_sample.json|   2 +-
 ...{de10nano_sample.json => intelfocl_sample.json} |   4 +-
 config/pynq_sample.json|   2 +-
 config/tsim_sample.json|   2 +-
 config/ultra96_sample.json |   2 +-
 config/vta_config.json |   2 +-
 hardware/intelfocl/Makefile|  65 +
 hardware/intelfocl/README.rst  | 168 
 hardware/intelfocl/src/vta.cl  | 298 +
 hardware/intelfocl/src/vta.h   | 114 
 include/vta/hw_spec.h  | 157 +--
 include/vta/hw_spec_const.h| 173 
 src/oclfpga/oclfpga_device.cc  | 251 +
 src/oclfpga/oclfpga_device.h   |  86 ++
 src/oclfpga/oclfpga_driver.cc  |  96 +++
 src/sim/sim_driver.cc  |  55 +++-
 tests/scripts/docker_bash.sh   |   6 +
 18 files changed, 1326 insertions(+), 159 deletions(-)

diff --git a/config/de10nano_sample.json b/config/de10nano_sample.json
index e4148c3..d8c60f7 100644
--- a/config/de10nano_sample.json
+++ b/config/de10nano_sample.json
@@ -1,6 +1,6 @@
 {
   "TARGET" : "de10nano",
-  "HW_VER" : "0.0.1",
+  "HW_VER" : "0.0.2",
   "LOG_INP_WIDTH" : 3,
   "LOG_WGT_WIDTH" : 3,
   "LOG_ACC_WIDTH" : 5,
diff --git a/config/fsim_sample.json b/config/fsim_sample.json
index 0591bb4..9d7867d 100644
--- a/config/fsim_sample.json
+++ b/config/fsim_sample.json
@@ -1,6 +1,6 @@
 {
   "TARGET" : "sim",
-  "HW_VER" : "0.0.1",
+  "HW_VER" : "0.0.2",
   "LOG_INP_WIDTH" : 3,
   "LOG_WGT_WIDTH" : 3,
   "LOG_ACC_WIDTH" : 5,
diff --git a/config/de10nano_sample.json b/config/intelfocl_sample.json
similarity index 82%
copy from config/de10nano_sample.json
copy to config/intelfocl_sample.json
index e4148c3..4943448 100644
--- a/config/de10nano_sample.json
+++ b/config/intelfocl_sample.json
@@ -1,6 +1,6 @@
 {
-  "TARGET" : "de10nano",
-  "HW_VER" : "0.0.1",
+  "TARGET" : "intelfocl",
+  "HW_VER" : "0.0.2",
   "LOG_INP_WIDTH"

[GitHub] [tvm-vta] tmoreau89 commented on pull request #9: [Hardware][OpenCL] Intelfocl support

2020-12-10 Thread GitBox


tmoreau89 commented on pull request #9:
URL: https://github.com/apache/tvm-vta/pull/9#issuecomment-742919559


   Thank you @zhanghaohit, @remotego, @liangfu, the PR is merged!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on a change in pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on a change in pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#discussion_r540639712



##
File path: tests/python/relay/test_op_level5.py
##
@@ -393,8 +393,6 @@ def verify_nms(
 intrp2 = relay.create_executor("debug", ctx=ctx, target=target)
 op_res2 = intrp2.evaluate(func)(x0_data, x1_data, x2_data, x3_data)
 tvm.testing.assert_allclose(op_res2.asnumpy(), ref_res, rtol=1e-5)
-if target == "nvptx":

Review comment:
   I haven't rebased since Monday, will fix





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tmoreau89 commented on pull request #6126: [VTA][OpenCL] intelfocl

2020-12-10 Thread GitBox


tmoreau89 commented on pull request #6126:
URL: https://github.com/apache/tvm/pull/6126#issuecomment-742921873


   https://github.com/apache/tvm-vta/pull/9 is finally merged! @zhanghaohit, can
you please update the submodule, and let's make sure that the unit tests pass
now that the ISA has changed.
   
   Note: we'll also need to modify the bitstreams for the FPGAs that VTA 
supports. @liangfu are you able to update the bitstreams for DE10 in the next 
week or so?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[tvm] branch main updated (19e2631 -> 1e6e202)

2020-12-10 Thread moreau
This is an automated email from the ASF dual-hosted git repository.

moreau pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git.


from 19e2631  #7058 [Tutorial] Import errors in deploy_detection.py and 
deploy_classification.py (#7059)
 add 1e6e202  [VTA][OpenCL] add device_annot support in graphpack (#6125)

No new revisions were added by this update.

Summary of changes:
 python/tvm/relay/op/_tensor.py|   4 ++
 src/relay/op/annotation/annotation.cc |   7 +-
 vta/python/vta/top/graphpack.py   | 132 +-
 3 files changed, 140 insertions(+), 3 deletions(-)



[GitHub] [tvm] tmoreau89 merged pull request #6125: [VTA][OpenCL] add device_annot support in graphpack

2020-12-10 Thread GitBox


tmoreau89 merged pull request #6125:
URL: https://github.com/apache/tvm/pull/6125


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] mbrookhart commented on pull request #6839: [ONNX] NMS in ONNX

2020-12-10 Thread GitBox


mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742933063


   @Laurawly Commit fd5ce6459, just before #7005, gets me this :/ I don't think 
the perf problem came from 7005
   ```
   fused_vision_non_max_suppression 
   142188.076.13(1, 122640, 6)
   fused_vision_get_valid_counts
   76.874  0.041(1, 122640, 6)
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



