[GitHub] [incubator-tvm] liangfu commented on a change in pull request #5417: [RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7
liangfu commented on a change in pull request #5417: URL: https://github.com/apache/incubator-tvm/pull/5417#discussion_r417774863

## File path: python/tvm/micro/base.py

```diff
@@ -133,44 +152,91 @@ def __exit__(self, exc_type, exc_value, exc_traceback):
         self._exit()


-def create_micro_mod(c_mod, dev_config):
+def _calc_max_workspace_usage(src):
+    # TODO factor in alignment to the calculation (alloc sizes will be aligned up to the word size)
+    alloc_re = re.compile(
+        r'.*\* ?(.+) = (\(.+\))? TVMBackendAllocWorkspace\(.+, .+, \(uint64_t\)(.+), .+, .+\).*')
+    free_re = re.compile(r'.*if \(TVMBackendFreeWorkspace\(.+, .+, (\(void\*\))? (.+)\) != 0\) {.*')
+    max_usage = 0
+    alloc_map = {}
+    for line in src.split("\n"):
+        if line.strip().startswith("//"):
+            continue
+        match = alloc_re.match(line)
+        if match is not None:
+            alloc_map[match.group(1)] = int(match.group(3))
+            max_usage = max(max_usage, sum(alloc_map.values()))
+        else:
+            match = free_re.match(line)
+            if match is not None:
+                print(alloc_map)
+                del alloc_map[match.group(2)]
+    return max_usage
+
+
+def create_micro_mod(c_mod, dev_config, lib_src_paths=None, lib_headers=None,
+                     lib_include_paths=None):
     """Produces a micro module from a given module.

     Parameters
     ----------
-    c_mod : tvm.runtime.Module
+    c_mod : tvm.module.Module
         module with "c" as its target backend

-    dev_config : Dict[str, Any]
-        MicroTVM config dict for the target device
+    lib_src_paths: TODO
+        TODO
+
+    lib_headers: TODO
+        TODO
+
+    lib_include_paths: TODO
+        TODO

     Return
     ------
-    micro_mod : tvm.runtim.Module
+    micro_mod : tvm.module.Module
         micro module for the target device
     """
     temp_dir = _util.tempdir()
     lib_obj_path = temp_dir.relpath("dev_lib.obj")
+    # TODO use dev config to dispatch on the type of C codegen to run through
+    # (e.g., CodeGenCArm, CodeGenCHost, CodeGenCRiscV)
     c_mod.export_library(
         lib_obj_path,
-        fcompile=cross_compiler(dev_config, LibType.OPERATOR))
+        fcompile=cross_compiler(
+            dev_config,
+            LibType.OPERATOR,
+            lib_src_paths=lib_src_paths,
+            lib_headers=lib_headers,
+            lib_include_paths=lib_include_paths))
     micro_mod = tvm.runtime.load_module(lib_obj_path)
     return micro_mod


-def cross_compiler(dev_config, lib_type):
-    """Create a cross-compile function that wraps `create_lib` for a `Binutil` instance.
+def cross_compiler(dev_config, lib_type, lib_src_paths=None, lib_headers=None,
+                   lib_include_paths=None):
+    """Create a cross compile function that wraps `create_lib` for a `Binutil` instance.

     For use in `tvm.runtime.Module.export_library`.

     Parameters
     ----------
-    dev_config : Dict[str, Any]
-        MicroTVM config dict for the target device
+    create_micro_lib : func
+        function for creating MicroTVM libraries for a specific device (e.g.,
+        `tvm.micro.device.get_device_funcs('arm.stm32f746xx')['create_micro_lib']`)

     lib_type : micro.LibType
         whether to compile a MicroTVM runtime or operator library

+    lib_src_paths: TODO
+        TODO
+
+    lib_headers: TODO
+        e.g., `['cmsis_gcc.h', 'arm_math.h']`
+
+    lib_include_paths: TODO
```

Review comment: Please add a meaningful comment here.

## File path: python/tvm/micro/device/riscv_spike.py

```diff
@@ -62,56 +78,31 @@ def default_config(base_addr, server_addr, server_port):
     server_port : int
         port of OpenOCD server to connect to

+    TODO correct type annotation?
+    section_constraints: Optional[Dict[str, Tuple[Number, MemConstraint]]]
+        TODO
```

Review comment: leave a meaningful comment

## File path: python/tvm/micro/device/arm/stm32f746xx.py

```diff
@@ -36,23 +55,40 @@ def create_micro_lib(obj_path, src_path, lib_type, options=None):
     options : Optional[List[str]]
         additional options to pass to GCC

+    lib_src_paths : Optional[List[str]]
+        TODO
```

Review comment: Please put a meaningful comment here as well.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
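The workspace-tracking helper in the base.py diff above scans generated C source for alloc/free calls and records the peak of the live allocation sizes. The same idea can be sketched standalone; this uses simplified regexes and a made-up generated-C snippet, not the PR's exact patterns:

```python
import re

# Simplified stand-ins for the PR's alloc/free patterns (hypothetical).
ALLOC_RE = re.compile(r'.*\* ?(\w+) = .*TVMBackendAllocWorkspace\(.+?\(uint64_t\)(\d+)')
FREE_RE = re.compile(r'.*TVMBackendFreeWorkspace\(.+?, (\w+)\)')

def calc_max_workspace_usage(src):
    """Track peak concurrent workspace bytes over the generated source."""
    max_usage = 0
    live = {}  # pointer name -> allocated bytes
    for line in src.split("\n"):
        if line.strip().startswith("//"):
            continue  # skip comments, as the PR's version does
        m = ALLOC_RE.match(line)
        if m:
            live[m.group(1)] = int(m.group(2))
            max_usage = max(max_usage, sum(live.values()))
            continue
        m = FREE_RE.match(line)
        if m:
            live.pop(m.group(1), None)
    return max_usage

src = """
void* buf0 = TVMBackendAllocWorkspace(1, 0, (uint64_t)256, 2, 32);
void* buf1 = TVMBackendAllocWorkspace(1, 0, (uint64_t)128, 2, 32);
if (TVMBackendFreeWorkspace(1, 0, buf0) != 0) { return -1; }
void* buf2 = TVMBackendAllocWorkspace(1, 0, (uint64_t)64, 2, 32);
"""
print(calc_max_workspace_usage(src))  # 384: buf0 and buf1 are live together
```

As in the review thread, alignment is ignored here; the PR leaves it as a TODO.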
[GitHub] [incubator-tvm-vta] huajsj commented on pull request #7: [pynq_driver] fix device early return
huajsj commented on pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621612870 @remotego, Thanks for the detailed explanation, but for this problem I think you should take a look at the related pynq driver logic. There is actually no "racing condition" between the "processor" and the "FPGA", because they are checking different registers: on line 124, VTA_START is written to the register at offset "0", while on line 134 the logic checks the register at offset VTA_COMPUTE_DONE_RD_OFFSET, which is "24". I think the workaround may work in some scenarios, but if we cannot find the root cause it may not be reliable.
[GitHub] [incubator-tvm-vta] remotego commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
remotego commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621601302 > Perhaps one way to look at this is to start with DISTANCE=3 by default. And if the II>1 for the GEMM, we issue a warning telling the user to increase the distance to 4. The process can repeat until the II of 1 is achieved, and the runtime will be informed of the actual distance so proper checks can be inserted. I love this idea. Our ultimate goal is to achieve II = 1, so it would be great if we could achieve it with some intelligent scripting... The only problem is that we need to introduce a way to feed the results back to the S/W side.
[incubator-tvm] branch master updated (684f2d7 -> 90b08f5)
This is an automated email from the ASF dual-hosted git repository. ziheng pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 684f2d7 [VTA][Runtime] clear sram index in reset (#5470) add 90b08f5 [intrin] a few more math functions (#5468) No new revisions were added by this update.

Summary of changes:
 include/tvm/tir/op.h                     |  6 +++
 python/tvm/tir/__init__.py               |  4 +-
 python/tvm/tir/op.py                     | 80
 src/target/intrin_rule.cc                | 21 +++--
 tests/python/unittest/test_tir_intrin.py |  8 +++-
 5 files changed, 114 insertions(+), 5 deletions(-)
[GitHub] [incubator-tvm] tqchen commented on pull request #5483: [TIR][Printer] text format printer considering future parsing use
tqchen commented on pull request #5483: URL: https://github.com/apache/incubator-tvm/pull/5483#issuecomment-621600022 cc @jroesch @antinucleon @junrushao1994 @Hzfengsy @mbrookhart
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621599822 I'll think some more about this problem overnight, thanks for bringing attention to this issue @zhanghaohit @remotego
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621599637 Perhaps one way to look at this is to start with DISTANCE=3 by default. And if the II>1 for the GEMM, we issue a warning telling the user to increase the distance to 4. The process can repeat until the II of 1 is achieved, and the runtime will be informed of the actual distance so proper checks can be inserted.
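The search loop sketched in this comment (start at DISTANCE=3, warn and retry until II=1, then report the final distance to the runtime) could look like the following; `synthesize` is a hypothetical callback standing in for a hardware synthesis run that reports the achieved II:

```python
def find_min_distance(synthesize, start=3, max_distance=8):
    """Increase the accumulator dependence distance until the GEMM
    pipeline reaches an initiation interval (II) of 1, then return the
    distance so the runtime can insert the matching checks."""
    for distance in range(start, max_distance + 1):
        ii = synthesize(distance)
        if ii == 1:
            return distance
        print(f"warning: II={ii} at distance={distance}; retrying with {distance + 1}")
    raise RuntimeError("could not reach II=1 within the search bound")

# Toy model: II drops to 1 once the distance covers a 4-cycle read-add-write path.
fake_synth = lambda d: 1 if d >= 4 else 2
print(find_min_distance(fake_synth))  # 4
```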
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621599402 So to conclude, I agree with you that we may want this parameter to be larger than 2 cycles if the hardware pipeline becomes more complicated. So the big question here is: how can we ensure that it is set correctly moving forward? For instance, a DISTANCE=2 works for 16x16 but might break for 32x32. Is there a way that the user, as she performs design space exploration, does not have to think about updating this parameter, or at least receives some guidance to update it to the right value?
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621598572 I assume that if you have an FPGA that needs to be clocked at a high frequency, you'll likely end up with more cycles.
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621598201 @remotego I just read your comment that must have been posted as I wrote my reply. Your explanation makes sense, ultimately the critical path in the accumulation is READ -> ADD to GEMM result -> WRITE, which can be more than 2 cycles if we account for the addition logic, and various multiplexers etc.
[GitHub] [incubator-tvm] tqchen commented on pull request #5484: [REFACTOR][RPC][PROCOTOL-CHANGE] Modularize the RPC infra
tqchen commented on pull request #5484: URL: https://github.com/apache/incubator-tvm/pull/5484#issuecomment-621598079 cc @FrozenGene @merrymercy @eqy @ZihengJiang @areusch @ajtulloch @antinucleon @junrushao1994 @jroesch @kazum @yzhliu
[GitHub] [incubator-tvm] tqchen commented on pull request #5484: [REFACTOR][RPC][PROCOTOL-CHANGE] Modularize the RPC infra
tqchen commented on pull request #5484: URL: https://github.com/apache/incubator-tvm/pull/5484#issuecomment-621597882 RFC https://discuss.tvm.ai/t/modularize-rpc-infra/6544
[GitHub] [incubator-tvm] tqchen opened a new pull request #5484: [REFACTOR][RPC][PROCOTOL-CHANGE] Modularize the RPC infra
tqchen opened a new pull request #5484: URL: https://github.com/apache/incubator-tvm/pull/5484 This PR refactors the RPC protocol to make it more modularized.

- RPCSession: represents a set of features that need to be implemented
- RPCEndPoint: an end point that forwards RPCSession requests over a communication channel
- RPCModule: exposes an RPCSession as an rpc device in the TVM Runtime API

In the new design, the local machine is presented as a special case of RPCSession. The remote is just another client session that calls into RPCEndPoint. The RPC communication path is as follows.
```
client -> ClientSession -> EndPoint[client@n0] -> networking[between n0 <=> n1] -> EndPoint[server@n1] -> LocalSession[@n1]
```
Because of the new modular design, we can now chain more sessions together. For example, we can now run the following proxy setup (testcase in test_runtime_rpc.test_session_constructor).
```
client -> ClientSession -> Endpoint[client@n0] -> networking[between n0 <=> n1] -> Endpoint[server@n1] -> ClientSession -> Endpoint[client@n1] -> networking[between n1 <=> n2] -> Endpoint[server@n2] -> LocalSession[@n2]
```
We can also implement other types of sessions. For example, we could make the uTVM session a special case of RPCSession and use the same mechanism for session management. We also add more comments about the internals of the RPC. The communication protocol is simplified using a similar convention as PackedFunc. This allows us to further reduce the number of special remote syscalls. Due to the major improvement and simplification, we are making a non-compatible update to the RPC protocol. This means that the client and server need to be upgraded together in order to function correctly. This PR also introduces a versioning mechanism to the RPC protocol, so that future upgrades will produce more user-friendly error messages.
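The session-chaining idea above can be illustrated with a toy model in plain Python (these classes only mimic the forwarding structure; they are not the actual TVM RPC types): each endpoint forwards to whichever session sits on the other side of its channel, so sessions compose transitively.

```python
class LocalSession:
    """Terminal session: actually executes the request."""
    def call(self, func, *args):
        return func(*args)

class EndPoint:
    """Stands in for the networking hop: forwards a request to the
    session on the other side of the communication channel."""
    def __init__(self, peer_session):
        self.peer_session = peer_session
    def call(self, func, *args):
        return self.peer_session.call(func, *args)

class ClientSession:
    """Client-facing session that issues requests through an endpoint."""
    def __init__(self, endpoint):
        self.endpoint = endpoint
    def call(self, func, *args):
        return self.endpoint.call(func, *args)

# client -> ClientSession -> EndPoint[n0<=>n1] -> ClientSession -> EndPoint[n1<=>n2] -> LocalSession[@n2]
chain = ClientSession(EndPoint(ClientSession(EndPoint(LocalSession()))))
print(chain.call(lambda a, b: a + b, 2, 3))  # 5
```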
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621597493 Thank you for the explanation. Do you mind breaking down what happens in the 32x32 configuration that causes correctness to fail (since the dependence analysis waiver is unbounded)? I agree that there should be a constant for hardware and software to agree on, but I'm not convinced that it should be set by a user; instead it should be derived automatically from the hardware target we choose in pkg_config. This is to ensure that we can achieve an iteration interval of `II=1` for the GEMM pipeline, which is constrained by the SRAM write-to-read latency, and then enforced by the software. The example I can use is that if we say we want to accumulate SRAM at addresses A and B for 3 consecutive cycles, we could do it (1) by updating each address consecutively, or (2) by interlacing the updates.
```
// initial contents of SRAM at address A is 20, and B is 14
// here we assume 2 cycles to propagate a write to being able to read it
t=0: SRAM[A] += 1 // SRAM[A] is 20, SRAM[B] is 14
t=1: SRAM[A] += 1 // SRAM[A] is 20, SRAM[B] is 14
t=2: SRAM[A] += 1 // SRAM[A] is 21, SRAM[B] is 14
t=3: SRAM[B] += 1 // SRAM[A] is 21, SRAM[B] is 14
t=4: SRAM[B] += 1 // SRAM[A] is 22, SRAM[B] is 14
t=5: SRAM[B] += 1 // SRAM[A] is 22, SRAM[B] is 15
t=6: noop         // SRAM[A] is 22, SRAM[B] is 15
t=7: noop         // SRAM[A] is 22, SRAM[B] is 16
// Results are incorrect: A's final value is 22 instead of 23, and B's final value is 16 instead of 17
```
Now in the case where these back-to-back writes are detected by the runtime, we would have an invalid schedule. So assuming the TVM compiler produces a valid schedule, we would end up with interlaced writes to A and B:
```
// initial contents of SRAM at address A is 20, and B is 14
// here we assume 2 cycles to propagate a write to being able to read it
t=0: SRAM[A] += 1 // SRAM[A] is 20, SRAM[B] is 14
t=1: SRAM[B] += 1 // SRAM[A] is 20, SRAM[B] is 14
t=2: SRAM[A] += 1 // SRAM[A] is 21, SRAM[B] is 14
t=3: SRAM[B] += 1 // SRAM[A] is 21, SRAM[B] is 15
t=4: SRAM[A] += 1 // SRAM[A] is 22, SRAM[B] is 15
t=5: SRAM[B] += 1 // SRAM[A] is 22, SRAM[B] is 16
t=6: noop         // SRAM[A] is 23, SRAM[B] is 16
t=7: noop         // SRAM[A] is 23, SRAM[B] is 17
// Results are now correct because the compiler ensured enough wait time between consecutive writes and reads to the same address in SRAM.
```
So the point I'm hoping to make is that this 2-cycle wait time is enforced by the SRAM, and should not be exposed to the user as a parameter in VTA_config.
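The two schedules above can be replayed in a small cycle-level model of an SRAM whose writes become visible on the read port two cycles later (a sketch of the timing argument, not of VTA hardware):

```python
def simulate(schedule, init, latency=2):
    """Apply read-modify-write (+1) ops against a toy SRAM whose writes
    commit `latency` cycles after they are issued. `schedule` lists the
    address touched each cycle, or None for a noop."""
    mem = dict(init)
    pending = []  # (commit_cycle, addr, value) write-back queue
    for t, addr in enumerate(schedule):
        for entry in [p for p in pending if p[0] <= t]:
            mem[entry[1]] = entry[2]   # the write finally lands
            pending.remove(entry)
        if addr is not None:
            # the read port only sees committed state
            pending.append((t + latency, addr, mem[addr] + 1))
    for _, a, v in sorted(pending):    # drain remaining write-backs
        mem[a] = v
    return mem

init = {"A": 20, "B": 14}
back_to_back = ["A", "A", "A", "B", "B", "B", None, None]
interleaved = ["A", "B", "A", "B", "A", "B", None, None]
print(simulate(back_to_back, init))  # {'A': 22, 'B': 16} -- updates lost
print(simulate(interleaved, init))   # {'A': 23, 'B': 17} -- all updates land
```

The outputs reproduce the traces in the comment: consecutive writes to the same address lose increments, interlaced writes do not.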
[GitHub] [incubator-tvm] srkreddy1238 commented on issue #4102: [Relay] Support for Tensorflow 2.0
srkreddy1238 commented on issue #4102: URL: https://github.com/apache/incubator-tvm/issues/4102#issuecomment-621595847 Good to see CI moving to 2.0. In this context: - I have upgraded the TF test cases to use the pure 2.0 API (without compat.v1, tf.graph, tf.session, etc.). - I am trying to simplify the frontend to use model signatures instead of explicit arguments for shapes and output info (in the interest of SavedModel & TF Serving signatures being the officially used export formats for TF). Most of this work is completed (except a few operators failing with the 2.0 API). Hoping to upstream soon once I am cleared by contribution policy approval.
[GitHub] [incubator-tvm] tobegit3hub commented on issue #4464: [RFC] Add TVMDSOOp to integrate any TVM operator with TensorFlow
tobegit3hub commented on issue #4464: URL: https://github.com/apache/incubator-tvm/issues/4464#issuecomment-621593883 The PR has been merged and we will close this issue.
[GitHub] [incubator-tvm-vta] remotego commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
remotego commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621593259 > Thanks @zhanghaohit, for catching this error. I agree with the fact that the dependence should not be hardcoded. However in order to not to add too many parameters in the config file, can we derive the value from the VTA target (i.e. FPGA type) in `pkg_config`? > > Also from what I understand is that the dependence distance is generally a property of the memory part that the FPGA is using. For instance that distance will be different if one uses BRAM vs. Ultra-RAM? And on a different FPGA family this number will change. Correct? Thanks @tmoreau89 for the advice. We also thought of putting the value into pkg_config. However, we later found that the dependence distance is not tightly related to the device family/type. In my opinion, the compiler will decide the II based on multiple factors, but eventually it is based on the number of cycles on the datapath from "Read Mem" -> "Perform Ops" -> "Write Back", as we need to avoid a RAW data hazard. In the "Read Mem" and "Write Back" stages, it is definitely related to the properties of the memory (e.g., device family, URAM vs. BRAM, etc.), but it is also related to the H/W circuit accessing the data, for example the multiplexers needed where there are multiple accesses. The compiler will judge based on the complexity of the overall circuit, and it could add registers (increasing the II) if the desired frequency target cannot otherwise be satisfied. In the "Perform Ops" stage, the cycles needed are mainly related to the op itself; for example, a 32-bit multiplication may require more cycles than an 8-bit multiplication.
[GitHub] [incubator-tvm] spectrometerHBH opened a new pull request #5483: [TIR][Printer] text format printer considering future parsing use
spectrometerHBH opened a new pull request #5483: URL: https://github.com/apache/incubator-tvm/pull/5483

## Several Points

1. Rename vars/size_vars/buffers to use name hints to identify them.
2. A node is either in META or not. If ref A is put in META and ref B is contained in A, then when we need to print ref B we print its META.
3. Now only the `node` in `AttrStmt` may be put in META, except when `node` is `StringImm/Var/Buffer/IterVar`.
4. Print the type of vars when we first encounter them in the tree.
5. This PR hasn't combined relay's printer with tir's printer. I put several choices for a combined API here for discussion.
   - add an `astext` member function for IRModule: we only do mixed printing when printing an IRModule. Keep `relay.astext` and `tir.astext` separate for other uses.
   - merge relay's printer & tir's printer into one printer and use only one API `ir.astext`

cc @tqchen @Hzfengsy

## Several Examples

### Simple

```c++
primfn(A0_1: handle, A1_1: handle, C_1: handle) -> ()
  attr = {"tir.noalias": bool(1), "global_symbol": "main"}
  buffers = {A1: Buffer(A1_2: handle, float32, [m: int32, n: int32], [stride: int32, stride_1: int32], type="auto"),
             C: Buffer(C_2: handle, float32, [m, n], [stride_2: int32, stride_3: int32], type="auto"),
             A0: Buffer(A0_2: handle, float32, [m, n], [stride_4: int32, stride_5: int32], type="auto")}
  buffer_map = {C_1: C, A0_1: A0, A1_1: A1} {
  attr [B.v0: handle] "storage_scope" = "global";
  allocate(B.v0, float32, [n]) {
    attr [B.v1: handle] "storage_scope" = "global";
    allocate(B.v1, float32, [n]) {
      for (i: int32, 0, m) {
        for (j: int32, 0, n) {
          B.v0[j] = (load(float32, A0_2[((i*stride_4) + (j*stride_5))]) + float32(2))
          B.v1[j] = (load(float32, A0_2[((i*stride_4) + (j*stride_5))])*float32(3))
        }
        for (j_1: int32, 0, n) {
          C_2[((i*stride_2) + (j_1*stride_3))] = (load(float32, A1_2[((i*stride) + (j_1*stride_1))]) + load(float32, B.v0[j_1]))
        }
      }
    }
  }
}
```

### Conv on GPU ```c++ primfn(A_1: handle, W_1: handle, B_1: handle) -> () attr =
{"tir.noalias": bool(1), "global_symbol": "cuda"} buffers = {W: Buffer(W_2: handle, float32, [3, 3, 256, 512], []), B: Buffer(B_2: handle, float32, [14, 14, 512, 256], []), A: Buffer(A_2: handle, float32, [14, 14, 256, 256], [])} buffer_map = {B_1: B, A_1: A, W_1: W} { attr [IterVar(blockIdx.z: int32, [(nullptr)], "ThreadIndex", "blockIdx.z")] "thread_extent" = 196; attr [B.local: handle] "storage_scope" = "local"; allocate(B.local, float32, [64]) { attr [Apad.shared: handle] "storage_scope" = "shared"; allocate(Apad.shared, float32, [512]) { attr [W.shared: handle] "storage_scope" = "shared"; allocate(W.shared, float32, [512]) { attr [Apad.shared.local: handle] "storage_scope" = "local"; allocate(Apad.shared.local, float32, [8]) { attr [W.shared.local: handle] "storage_scope" = "local"; allocate(W.shared.local, float32, [8]) { attr [IterVar(blockIdx.y: int32, [(nullptr)], "ThreadIndex", "blockIdx.y")] "thread_extent" = 8; attr [IterVar(blockIdx.x: int32, [(nullptr)], "ThreadIndex", "blockIdx.x")] "thread_extent" = 4; attr [IterVar(threadIdx.y: int32, [0:8], "ThreadIndex", "threadIdx.y")] "thread_extent" = 8; attr [IterVar(threadIdx.x: int32, [0:8], "ThreadIndex", "threadIdx.x")] "thread_extent" = 8 { for (ff.c.init: int32, 0, 4) { for (nn.c.init: int32, 0, 4) { B.local[((ff.c.init*4) + nn.c.init)] = float32(0) B.local[(((ff.c.init*4) + nn.c.init) + 32)] = float32(0) B.local[(((ff.c.init*4) + nn.c.init) + 16)] = float32(0) B.local[(((ff.c.init*4) + nn.c.init) + 48)] = float32(0) } } for (rc.outer: int32, 0, 32) { for (ry: int32, 0, 3) { for (rx: int32, 0, 3) { for (ax3.inner.outer: int32, 0, 2) { Apad.shared[rampthreadIdx.y*64) + (threadIdx.x*8)) + (ax3.inner.outer*4)), 1, 4)] = call("tvm_if_then_else", [1 <= (floordiv(blockIdx.z, 14) + ry)) and ((floordiv(blockIdx.z, 14) + ry) < 15)) and (1 <= (rx + floormod(blockIdx.z, 14 and ((rx + floormod(blockIdx.z, 14)) < 15)), load(float32x4, A_2[ramp((ry*917504) + (blockIdx.z*65536)) + (rx*65536)) + (rc.outer*2048)) + 
(threadIdx.y*256)) + (blockIdx.x*64)) + (threadIdx.x*8)) + (ax3.inner.outer*4)) - 983040), 1, 4)]), broadcast(float32(0), 4)], float32x4, "pure_intrin", 0) } for (ax3.inner.outer_1: int32, 0, 2) {
[GitHub] [incubator-tvm-vta] tmoreau89 commented on a change in pull request #7: [pynq_driver] fix device early return
tmoreau89 commented on a change in pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#discussion_r417732849

## File path: src/pynq/pynq_driver.cc

```diff
@@ -126,6 +127,10 @@ class VTADevice {
     VTAWriteMappedReg(vta_compute_handle_, 0x0, VTA_AUTORESTART);
     VTAWriteMappedReg(vta_store_handle_, 0x0, VTA_AUTORESTART);
+    // Allow device to respond
```

Review comment: Could you add more explanation here to make sure that this sleep is well understood for posterity (the summarized comment by @remotego should suffice)
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #7: [pynq_driver] fix device early return
tmoreau89 commented on pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621590481 Thank you @remotego for this logical explanation. The fix makes sense, and is highly likely not to affect performance negatively.
[GitHub] [incubator-tvm] yzhliu opened a new pull request #5482: [team] add reviewer kparzysz-quic
yzhliu opened a new pull request #5482: URL: https://github.com/apache/incubator-tvm/pull/5482
[GitHub] [incubator-tvm-vta] remotego edited a comment on pull request #7: [pynq_driver] fix device early return
remotego edited a comment on pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621581807 Thanks @huajsj @tmoreau89 for pointing out the potential performance issue with this fix. The root cause of the issue is that on line 124 we instruct the VTA to start calculation by writing to the VTA_START bit of the control register. The FPGA will then respond by changing its VTA_COMPUTE_DONE bit of the control register from '1' to '0', indicating a "busy" state of the VTA hardware, and the VTA will change it back to '1' after the completion of the calculation. The original code created a race condition between the processor code and the FPGA hardware by attempting to check the VTA_COMPUTE_DONE bit immediately after issuing the "start" command. If we have a more powerful processor, the check of the VTA_COMPUTE_DONE bit will happen before the FPGA could ever change it to '0'. As @tmoreau89 mentioned, we are using a polling approach here, and in my opinion a short sleep before checking is quite common in the polling world. Here we proposed a 1us sleep, which translates to around ~250 clock cycles on the FPGA; this would give the FPGA sufficient time to change VTA_COMPUTE_DONE from '1' to '0'. As 250 cycles are way shorter than any meaningful calculation, I believe there should not be any performance impact. I agree with @tmoreau89 that an interrupt is a more robust solution compared to polling, but it is quite a huge task involving changes to both the HW and driver architectures.
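The race described above and the proposed fix can be mimicked with a toy status-register model (timings here are illustrative, not the pynq driver's): the naive poll reads the stale DONE bit and returns early, while a short pre-poll sleep lets DONE drop to 0 first.

```python
class ToyComputeHandle:
    """Toy model of the compute status register: after START is issued,
    DONE stays stale at 1 for a short propagation window, drops to 0
    while the device is busy, and rises again on completion."""
    DONE = 0x1

    def __init__(self, drop_delay=2, finish=10):
        self.t = 0                  # device time, advanced once per read
        self.drop_delay = drop_delay
        self.finish = finish

    def read(self):
        self.t += 1
        if self.t < self.drop_delay:
            return self.DONE        # race window: stale DONE from last run
        if self.t < self.finish:
            return 0                # genuinely busy
        return self.DONE            # computation complete

def wait_for_done(dev, sleep_first=False):
    """Poll until DONE; returns the number of busy polls observed."""
    if sleep_first:
        dev.t += dev.drop_delay     # stands in for the proposed 1us sleep
    polls = 0
    while not (dev.read() & ToyComputeHandle.DONE):
        polls += 1
    return polls

print(wait_for_done(ToyComputeHandle()))                    # 0 -- returned early (the bug)
print(wait_for_done(ToyComputeHandle(), sleep_first=True))  # 7 -- waited out the busy period
```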
[GitHub] [incubator-tvm] zhanghaohit commented on pull request #5471: [VTA][Runtime] fix hardcoded VerifyDep step
zhanghaohit commented on pull request #5471: URL: https://github.com/apache/incubator-tvm/pull/5471#issuecomment-621572608 > Thanks @zhanghaohit for the proposed change, moving away from a hardcoded value is the right move! > > I find the `ACC_DEP_DISTANCE` to be confusing however. Ultimately it's derived from the latency (in cycles) it takes for a write to address `X` to be visible on the read port at address `X`. Maybe we can dub this `SRAM_WRITE_TO_READ_LATENCY` or something along those lines? Thanks @tmoreau89 for the suggestion. I think `VerifyDep` here is to [`verify that we don't write to the same acc_mem index two cycles in a row`](https://github.com/apache/incubator-tvm/blob/684f2d7b89e29b92cdac5389df58ba5a70782c2a/vta/runtime/runtime.cc#L289) (I may also need to change the comments). The `ACC_DEP_DISTANCE` is the same as [here](https://github.com/apache/incubator-tvm-vta/pull/8). Did I get the correct meaning of [`VerifyDep`](https://github.com/apache/incubator-tvm/blob/684f2d7b89e29b92cdac5389df58ba5a70782c2a/vta/runtime/runtime.cc#L290)?
[GitHub] [incubator-tvm] huajsj opened a new pull request #5481: [VTA] Fix VTA compile issue
huajsj opened a new pull request #5481: URL: https://github.com/apache/incubator-tvm/pull/5481 Issue 1: when compiling VTA for a Xilinx FPGA, pynqdriver.cc cannot be found. Issue 2: when running VTA on a PYNQ board, libvta.so fails to load. Solution: fixed the VTA.make logic issue. @tmoreau89 @liangfu, could you help to review? Thanks.
[GitHub] [incubator-tvm-vta] zhanghaohit commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
zhanghaohit commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621567733 > Thanks @zhanghaohit, for catching this error. I agree with the fact that the dependence should not be hardcoded. However in order to not to add too many parameters in the config file, can we derive the value from the VTA target (i.e. FPGA type) in `pkg_config`? > > Also from what I understand is that the dependence distance is generally a property of the memory part that the FPGA is using. For instance that distance will be different if one uses BRAM vs. Ultra-RAM? And on a different FPGA family this number will change. Correct? I think the dependence distance here is defined by the user, to tell the compiler (e.g., Xilinx Vivado) to what extent the software layer (i.e., the TVM code generator) can guarantee there is no acc-memory conflict. For example, if the distance is 4, the software guarantees there is no conflict within 4 iterations, so the compiler can generate hardware code with a smaller `II`; it is not necessary for the compiler to use a larger `II` to eliminate memory conflicts from the hardware perspective. Thus, I also added the check in the [VTA runtime](https://github.com/apache/incubator-tvm/pull/5471/commits/3b0d179de41e4212d94c2920c91004709ab78003) to do the VerifyDep.
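The software-side guarantee described here can be sketched as a check over a trace of acc-memory write indices. This is a hedged sketch of the VerifyDep idea only, not the actual `runtime.cc` implementation, and the distance value is illustrative:

```python
ACC_DEP_DISTANCE = 4  # illustrative; a real value would come from the config

def verify_dep(write_indices):
    """Return True if no acc-memory index repeats within ACC_DEP_DISTANCE
    consecutive writes -- the guarantee the software layer makes so the HLS
    compiler may schedule the pipeline with a smaller II."""
    for i, idx in enumerate(write_indices):
        # Only writes fewer than ACC_DEP_DISTANCE iterations apart conflict.
        for j in range(max(0, i - ACC_DEP_DISTANCE + 1), i):
            if write_indices[j] == idx:
                return False
    return True
```

A repeat exactly `ACC_DEP_DISTANCE` iterations apart is allowed, since the hardware pipeline has drained the earlier write by then.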
[GitHub] [incubator-tvm] Menooker commented on pull request #5357: [Relay] enable blocking format in x86 conv2d and fold scale axis
Menooker commented on pull request #5357: URL: https://github.com/apache/incubator-tvm/pull/5357#issuecomment-621567041

> @Menooker Could you share the script that can reproduce the error? It's a bit weird that `_alter_conv2d_layout` receives a conv2d op with `NCHWc` layout.

My use case is that I want to use `nn.conv2d` with the "NCHWc" layout in the frontend. I noticed that there are `relay.nn.contrib_conv2d_nchwc` and `contrib_depthwise_conv2d_nchwc`. I think users should not need to choose between `conv2d_nchwc` and `depthwise_conv2d_nchwc` manually; I want TVM to decide which to use automatically, just like what TVM now does for `conv2d` with "NCHW". So I use "NCHW16c" in `nn.conv2d`:

```python
from tvm import relay
import tvm

img = relay.var("img", shape=(128, 3, 224, 224, 1), dtype='float')
weight = relay.var("weight", shape=(1, 3, 3, 3, 1, 16), dtype='float')
conv = relay.nn.conv2d(img, weight, strides=(2, 2), padding=(1, 1),
                       dilation=(1, 1), groups=1, channels=16,
                       kernel_size=[3, 3], data_layout='NCHW1c',
                       kernel_layout='OIHW1i16o', out_layout='NCHW16c')
func = relay.Function([img, weight], conv)
target = 'llvm -mcpu=skylake-avx512'
ctx = tvm.context(target, 0)
print(func)
with relay.build_config(opt_level=3):
    graph, lib, params_fwd = relay.build(func, target)
```

Supporting "NCHWc" in `nn.conv2d` will make the frontend cleaner: users can use `nn.conv2d` whether the convolution is depthwise or not and whether the layout is blocked or not, and TVM will choose the implementation (like `depthwise_conv2d_nchwc`) under the hood.
[GitHub] [incubator-tvm] siju-samuel opened a new pull request #5480: [TFLITE]Argmin & Argmax op support
siju-samuel opened a new pull request #5480: URL: https://github.com/apache/incubator-tvm/pull/5480 @anijain2305 @masahi please help to review this PR. Thanks.
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #5: [WIP]Add c++ and python local deploy example
tmoreau89 commented on pull request #5: URL: https://github.com/apache/incubator-tvm-vta/pull/5#issuecomment-621552938 Very exciting development @huajsj ! I'm looking forward to the C++ example!
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #7: [pynq_driver] fix device early return
tmoreau89 commented on pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621552736 Thanks @zhanghaohit for the fix. I'm curious when the early completion occurs as @huajsj pointed out. Ideally the driver shouldn't rely on polling and instead use a more robust mechanism like triggering an interrupt.
[GitHub] [incubator-tvm-vta] tmoreau89 commented on pull request #8: [Hardware][Xilinx] explicitly specify acc dep distance to avoid hidden pitfall
tmoreau89 commented on pull request #8: URL: https://github.com/apache/incubator-tvm-vta/pull/8#issuecomment-621551928 Thanks @zhanghaohit for catching this error. I agree that the dependence should not be hardcoded. However, in order not to add too many parameters to the config file, can we derive the value from the VTA target (i.e., FPGA type) in `pkg_config`? Also, from what I understand, the dependence distance is generally a property of the memory part that the FPGA is using. For instance, that distance will be different if one uses BRAM vs. Ultra-RAM? And on a different FPGA family this number will change. Correct?
[incubator-tvm] branch master updated (046b0d9 -> 684f2d7)
This is an automated email from the ASF dual-hosted git repository. moreau pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from 046b0d9 [BYOC] Bind constant tuples in graph partitioner (#5476) add 684f2d7 [VTA][Runtime] clear sram index in reset (#5470) No new revisions were added by this update. Summary of changes: vta/runtime/runtime.cc | 10 ++ 1 file changed, 10 insertions(+)
[GitHub] [incubator-tvm] anijain2305 opened a new pull request #5479: [Relay-TFLite] Quantized activations
anijain2305 opened a new pull request #5479: URL: https://github.com/apache/incubator-tvm/pull/5479 TBD
[GitHub] [incubator-tvm] anijain2305 commented on pull request #5477: Removing older Object detection TFlite test
anijain2305 commented on pull request #5477: URL: https://github.com/apache/incubator-tvm/pull/5477#issuecomment-621543411 @tqchen This PR is for the web-data PR - https://github.com/dmlc/web-data/pull/247
[GitHub] [incubator-tvm] tqchen commented on pull request #5478: [RUNTIME] Improved Packed FFI for optional.
tqchen commented on pull request #5478: URL: https://github.com/apache/incubator-tvm/pull/5478#issuecomment-621527940 cc @zhiics @ZihengJiang @yzhliu @jroesch @junrushao1994
[GitHub] [incubator-tvm] tqchen opened a new pull request #5478: [RUNTIME] Improved Packed FFI for optional.
tqchen opened a new pull request #5478: URL: https://github.com/apache/incubator-tvm/pull/5478 Allows Optional and Module to be passed with the right type code.
[GitHub] [incubator-tvm] zhiics edited a comment on pull request #5457: [Fix] Add ConstantNode to IsAtomic
zhiics edited a comment on pull request #5457: URL: https://github.com/apache/incubator-tvm/pull/5457#issuecomment-621515297 @MarisaKirisame Sorry for the late response. Yeah, transforming it to ANF should work as well and seems like a better fix. Let me try it. BTW, this would also mean that we may have to execute ANF multiple times for programs executed through the VM, because we do constant folding after ANF in VM optimization. In general, I don't think there would be any problem. Do you see any possible issue?
[GitHub] [incubator-tvm] zhiics commented on pull request #5457: [Fix] Add ConstantNode to IsAtomic
zhiics commented on pull request #5457: URL: https://github.com/apache/incubator-tvm/pull/5457#issuecomment-621515297 @MarisaKirisame Sorry for the late response. Yeah, transforming it to ANF should work as well and seems like a better fix. Let me try it.
[GitHub] [incubator-tvm] kevinthesun commented on pull request #5457: [Fix] Add ConstantNode to IsAtomic
kevinthesun commented on pull request #5457: URL: https://github.com/apache/incubator-tvm/pull/5457#issuecomment-621505667 @MarisaKirisame The current use cases which can trigger this issue are mainly from TensorFlow OD models. For those models, the constant tensors that can be fed into the fold_const pass are mostly small.
[GitHub] [incubator-tvm] anijain2305 opened a new pull request #5477: Removing older Object detection TFlite test
anijain2305 opened a new pull request #5477: URL: https://github.com/apache/incubator-tvm/pull/5477 New test will follow up
[GitHub] [incubator-tvm] kevinthesun commented on a change in pull request #5429: [RELAY][TF] Support symbolic newshape for Reshape
kevinthesun commented on a change in pull request #5429: URL: https://github.com/apache/incubator-tvm/pull/5429#discussion_r417650019 ## File path: python/tvm/relay/_parser.py ## @@ -114,7 +114,10 @@ def convert(self, v): def __call__(self, args, attrs, type_args): if attrs is None: attrs = {} -x = self.operator(*args, **{k: self.convert(v) for k, v in attrs.items()}) +if self.operator is op.reshape: Review comment: I think for now we might want to handle each symbolic op separately, since they may have different attrs.
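One way to handle each symbolic op separately, as suggested, without growing a chain of `if self.operator is ...` checks is a small dispatch table from op name to an attribute-conversion hook. Everything below is a hypothetical sketch; these names do not exist in `_parser.py`.

```python
def _convert_reshape_attrs(attrs):
    # Hypothetical per-op handler: normalize newshape to a list,
    # leaving any symbolic dims inside it untouched.
    attrs = dict(attrs)
    attrs["newshape"] = list(attrs["newshape"])
    return attrs

# Ops whose attrs may carry symbolic values, each with its own handler.
SYMBOLIC_OP_HANDLERS = {
    "reshape": _convert_reshape_attrs,
}

def convert_attrs(op_name, attrs):
    """Dispatch to a per-op handler if one is registered; otherwise fall
    back to the generic conversion (here just a shallow copy)."""
    handler = SYMBOLIC_OP_HANDLERS.get(op_name)
    return handler(attrs) if handler is not None else dict(attrs)
```

This keeps the list of symbolic ops in one place, which partly answers the concern in the next comment about maintaining such a list.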
[GitHub] [incubator-tvm] yongwww commented on a change in pull request #5429: [RELAY][TF] Support symbolic newshape for Reshape
yongwww commented on a change in pull request #5429: URL: https://github.com/apache/incubator-tvm/pull/5429#discussion_r417638357 ## File path: python/tvm/relay/_parser.py ## @@ -114,7 +114,10 @@ def convert(self, v): def __call__(self, args, attrs, type_args): if attrs is None: attrs = {} -x = self.operator(*args, **{k: self.convert(v) for k, v in attrs.items()}) +if self.operator is op.reshape: Review comment: Does this mean we have to maintain a list for symbolic ops here?
[GitHub] [incubator-tvm] anijain2305 commented on a change in pull request #5439: TFlite e2e FP32 Object detection model
anijain2305 commented on a change in pull request #5439: URL: https://github.com/apache/incubator-tvm/pull/5439#discussion_r417583390 ## File path: tests/python/frontend/tflite/test_forward.py ## @@ -1933,21 +1937,41 @@ def test_forward_qnn_mobilenet_v3_net(): # SSD Mobilenet # - -def test_forward_ssd_mobilenet_v1(): -"""Test the SSD Mobilenet V1 TF Lite model.""" -# SSD MobilenetV1 +def test_forward_coco_ssd_mobilenet_v1(): +"""Test the quantized Coco SSD Mobilenet V1 TF Lite model.""" Review comment: remove quantized,
[incubator-tvm] branch master updated: [BYOC] Bind constant tuples in graph partitioner (#5476)
This is an automated email from the ASF dual-hosted git repository. zhic pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git The following commit(s) were added to refs/heads/master by this push: new 046b0d9 [BYOC] Bind constant tuples in graph partitioner (#5476) 046b0d9 is described below commit 046b0d98a08153a4829a12cc81a4fa856be6efcd Author: mbaret <55580676+mba...@users.noreply.github.com> AuthorDate: Wed Apr 29 19:23:15 2020 +0100 [BYOC] Bind constant tuples in graph partitioner (#5476) * Bind constant tuples in the graph partitioner Change-Id: I815b32b5445a536c1837369b04f67dbbb0aed900 * Add partitioning test Change-Id: I3a492ec8d1beab4830214e3bc8da2a7c80771ca4 * Rename test target Change-Id: Ie32f37c1395ff597c0047ad3a93ed04ce3f3125d --- src/relay/transforms/partition_graph.cc | 22 --- tests/python/relay/test_pass_partition_graph.py | 37 + 2 files changed, 55 insertions(+), 4 deletions(-) diff --git a/src/relay/transforms/partition_graph.cc b/src/relay/transforms/partition_graph.cc index 15ad60b..3b0d6bc 100644 --- a/src/relay/transforms/partition_graph.cc +++ b/src/relay/transforms/partition_graph.cc @@ -393,12 +393,26 @@ class Partitioner : public ExprMutator { Array params; Array param_expr; -std::unordered_map params_bind; +Map params_bind; + +auto IsConstant = [](const Expr& expr) { + if (expr->IsInstance()) +return true; + if (expr->IsInstance()) { +auto tuple = expr.as(); +for (const auto& field : tuple->fields) { + if (!field->IsInstance()) +return false; +} +return true; + } + return false; +}; for (auto pair : region_args[region]) { params.push_back(pair.first); - if (const auto* cn = pair.second.as()) { -params_bind[pair.first->name_hint()] = cn->data; + if (IsConstant(pair.second)) { +params_bind.Set(pair.first, pair.second); } else { param_expr.push_back(pair.second); } @@ -428,7 +442,7 @@ class Partitioner : public ExprMutator { // Constant propagation if (!params_bind.empty()) { - 
global_region_func = backend::BindParamsByName(global_region_func, params_bind); + global_region_func = Downcast(relay::Bind(global_region_func, params_bind)); } std::string fname = name; diff --git a/tests/python/relay/test_pass_partition_graph.py b/tests/python/relay/test_pass_partition_graph.py index 2a4fd31..d78b9ea 100644 --- a/tests/python/relay/test_pass_partition_graph.py +++ b/tests/python/relay/test_pass_partition_graph.py @@ -1155,6 +1155,42 @@ def test_duplicate_merge_and_tuplegetitem(): partitioned = seq(mod) assert tvm.ir.structural_equal(partitioned, ref_mod, map_free_vars=True) +def test_constant_tuples(): +@reg.register("qnn.concatenate", "target.const_tuples") +def add(attrs, args): # pylint: disable=unused-variable +return True + +def create_graph(): +a = relay.var('a', shape=(10, 10), dtype="uint8") +b = relay.var('b', shape=(10, 10), dtype="uint8") +a1 = relay.abs(a) + +zeroi = relay.const(1, "int32") +zerof = relay.const(0, "float32") +con = relay.qnn.op.concatenate((a1, b), + input_scales=(zerof, zerof), + input_zero_points=(zeroi, zeroi), + output_scale=zerof, + output_zero_point=zeroi, + axis=1) + +f = relay.Function([a, b], con) +mod = tvm.IRModule.from_expr(f) +return mod + +seq = tvm.transform.Sequential([ +transform.AnnotateTarget("const_tuples"), +transform.MergeCompilerRegions(), +transform.PartitionGraph(), +]) + +partitioned = seq(create_graph()) +concat = partitioned["const_tuples_0"].body +assert type(concat.args[1]) == relay.Tuple +assert type(concat.args[2]) == relay.Tuple +assert type(concat.args[3]) == relay.Constant +assert type(concat.args[4]) == relay.Constant + if __name__ == "__main__": test_multi_node_compiler() test_extern_ccompiler_single_op() @@ -1171,3 +1207,4 @@ if __name__ == "__main__": test_multiple_use_of_an_output() test_duplicate_outputs() test_duplicate_merge_and_tuplegetitem() +test_constant_tuples()
[GitHub] [incubator-tvm-vta] huajsj commented on pull request #7: [pynq_driver] fix device early return
huajsj commented on pull request #7: URL: https://github.com/apache/incubator-tvm-vta/pull/7#issuecomment-621379576 @zhanghaohit, thanks for the contribution. Do we know the root cause of this early-return issue? A sleep-based solution may not always work and would impact performance; we had better avoid that.
[GitHub] [incubator-tvm] tqchen edited a comment on pull request #5436: [TFLite Runtime] Re-enable test for remote execution via RPC
tqchen edited a comment on pull request #5436: URL: https://github.com/apache/incubator-tvm/pull/5436#issuecomment-621364933 NOTE: - the ci-cpu is still blocked by https://github.com/apache/incubator-tvm/issues/5455 - but ci-gpu is now updated to sync with the latest state. If we can update the testcase to pre-check the env before test, we should be able to cover the case already via ci-gpu
[GitHub] [incubator-tvm] tqchen commented on pull request #5436: [TFLite Runtime] Re-enable test for remote execution via RPC
tqchen commented on pull request #5436: URL: https://github.com/apache/incubator-tvm/pull/5436#issuecomment-621364933 NOTE: the ci-cpu is still blocked by https://github.com/apache/incubator-tvm/issues/5455 ci-gpu is now updated. If we can update the testcase to pre-check the env before test, we should be able to cover the case already via ci-gpu
[GitHub] [incubator-tvm] icemelon9 commented on pull request #5357: [Relay] enable blocking format in x86 conv2d and fold scale axis
icemelon9 commented on pull request #5357: URL: https://github.com/apache/incubator-tvm/pull/5357#issuecomment-621364453 @Menooker Could you share the script that can reproduce the error? It's a bit weird that `_alter_conv2d_layout` receives a conv2d op with `NCHWc` layout.
[GitHub] [incubator-tvm] tqchen commented on pull request #5392: [CI] Migrate Tensorflow and Tensorflow lite in CI to 2.1.0
tqchen commented on pull request #5392: URL: https://github.com/apache/incubator-tvm/pull/5392#issuecomment-621363701 NOTE: the mainline ci-gpu now takes effect
[GitHub] [incubator-tvm] kevinthesun commented on a change in pull request #4312: [TOPI][Relay][OP] Dynamic NMS and strided_slice
kevinthesun commented on a change in pull request #4312: URL: https://github.com/apache/incubator-tvm/pull/4312#discussion_r417499071 ## File path: python/tvm/relay/frontend/tensorflow.py ## @@ -612,6 +615,51 @@ def _impl(inputs, attr, params, mod): out = _op.transpose(out, axes=(0, 2, 3, 4, 1)) return out + +def _nms(): +def _impl(inputs, attr, params, mod): +# Get parameter values +max_output_size = int(np.atleast_1d(inputs[2].data.asnumpy().astype("int64"))[0]) +iou_threshold = np.atleast_1d(inputs[3].data.asnumpy())[0] +# score_threshold was introduced from V3 +score_threshold = np.atleast_1d(inputs[4].data.asnumpy())[0] if len(inputs) > 4 else None Review comment: Need to use 0.0 instead of None here?
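The reviewer's suggestion amounts to defaulting `score_threshold` to 0.0 (i.e., keep every box) when the input is absent, rather than passing None downstream. A minimal sketch, with `inputs` as a plain list standing in for the frontend's tensor wrappers:

```python
import numpy as np

def get_score_threshold(inputs):
    """score_threshold was introduced in NMS V3; default to 0.0 for older
    versions instead of propagating None."""
    if len(inputs) > 4:
        return float(np.atleast_1d(inputs[4])[0])
    return 0.0
```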
[GitHub] [incubator-tvm] tqchen commented on issue #4102: [Relay] Support for Tensorflow 2.0
tqchen commented on issue #4102: URL: https://github.com/apache/incubator-tvm/issues/4102#issuecomment-621363297 NOTE: the Mainline CI now uses tensorflow 2.0
[incubator-tvm] branch master updated (f1e87f1 -> 4d8816c)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git. from f1e87f1 [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for… (#5428) add 4d8816c [CI] update ci-gpu to the latest (#5469) No new revisions were added by this update. Summary of changes: Jenkinsfile| 2 +- docker/install/ubuntu_install_mxnet.sh | 2 +- docs/conf.py | 2 +- tests/scripts/task_sphinx_precheck.sh | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-)
[GitHub] [incubator-tvm] zhiics commented on a change in pull request #5360: [QNN] Support CallNode inputs in qnn.concatenate
zhiics commented on a change in pull request #5360: URL: https://github.com/apache/incubator-tvm/pull/5360#discussion_r417496245 ## File path: python/tvm/relay/qnn/op/qnn.py ## @@ -180,15 +180,16 @@ def concatenate(data, The concatenated quantized tensor. """ -data = list(data) -if not data: -raise ValueError("relay.concatenate requires data to be non-empty.") +if isinstance(data, (list, tuple)): +data = Tuple(data) +if isinstance(data, TupleWrapper): Review comment: I think we should use `elif` so that there is no need to check again whether `data` is a list or tuple ## File path: src/relay/qnn/op/concatenate.cc ## @@ -149,8 +149,18 @@ Expr ConcatenateQnnCanonicalize(const Attrs& attrs, const Array& new_args, // If the output qnn params do not match the input qnn params, we can call requantize on the input // expr first, followed by a concatenate on the requantized input exprs. - auto tuple_data = data.as(); - CHECK(tuple_data != nullptr); + Array tuple_exprs; + if (data->IsInstance()) { +tuple_exprs = data.as()->fields; + } + // if the data is a CallNode, use TupleGetItems + if (data->IsInstance()) { Review comment: `else if`
[incubator-tvm] branch master updated: [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for… (#5428)
This is an automated email from the ASF dual-hosted git repository. tqchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-tvm.git The following commit(s) were added to refs/heads/master by this push: new f1e87f1 [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for… (#5428) f1e87f1 is described below commit f1e87f1b18be048183bb8d09f417e320d46f8a34 Author: boh_inspur <61525430+boh-ins...@users.noreply.github.com> AuthorDate: Wed Apr 29 12:22:25 2020 -0500 [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for… (#5428) * [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for "char2" * Add unittest for char2 * vector element load support char2&add some unittest for vector element load * Merge common up logic&Support char3&Add unittest for char3 --- src/target/source/codegen_cuda.cc | 37 --- tests/python/unittest/test_target_codegen_cuda.py | 29 +- 2 files changed, 48 insertions(+), 18 deletions(-) diff --git a/src/target/source/codegen_cuda.cc b/src/target/source/codegen_cuda.cc index 56d7162..a911e6b 100644 --- a/src/target/source/codegen_cuda.cc +++ b/src/target/source/codegen_cuda.cc @@ -272,9 +272,17 @@ void CodeGenCUDA::PrintVecElemLoad( static const char access[] = {'x', 'y', 'z', 'w'}; CHECK(i >= 0 && i < (t.is_float16() ? 8 : 4)); if ((t.is_int()) && t.bits() == 8) { -os << "((char)(" << vec << " >> " << i * 8 << "))"; +if (t.lanes() == 2 || t.lanes() == 3) { + os << vec << "." << access[i % t.lanes()]; +} else { + os << "((char)(" << vec << " >> " << i * 8 << "))"; +} } else if ((t.is_uint()) && t.bits() == 8) { -os << "((unsigned char)(" << vec << " >> " << i * 8 << "))"; +if (t.lanes() == 2 || t.lanes() == 3) { + os << vec << "." << access[i % t.lanes()]; +} else { + os << "((unsigned char)(" << vec << " >> " << i * 8 << "))"; +} } else if (t.is_float16()) { os << "((half2*)(&(" << vec << "." 
<< access[i / 2] << ")))->" << access[i % 2]; @@ -289,12 +297,17 @@ void CodeGenCUDA::PrintVecElemStore( static const char access[] = {'x', 'y', 'z', 'w'}; CHECK(i >= 0 && i < (t.is_float16() ? 8 : 4)); if (t.bits() == 8 && (t.is_int() || t.is_uint())) { -stream << vec << "="; -// Do not read the first undef lane. -if (i != 0) { - stream << vec << " & ~(0x00ff << " << i * 8 << ") |"; +if (t.lanes() == 2 || t.lanes() == 3) { + stream << vec << '.' << access[i % t.lanes()] << "=" + << "(" << value << ");\n"; +} else { + stream << vec << "="; + // Do not read the first undef lane. + if (i != 0) { +stream << vec << " & ~(0x00ff << " << i * 8 << ") |"; + } + stream << "(" << value << " << " << i * 8 << ");\n"; } -stream << "(" << value << " << " << i * 8 << ");\n"; } else if (t.is_float16()) { stream << "((half2*)(&(" << vec << "." << access[i / 2] << ")))->" << access[i % 2] << " = " << value << ";\n"; @@ -789,11 +802,13 @@ void CodeGenCUDA::PrintVecElemLoadExpr( DataType t, int i, const std::string& value, std::ostream& os) { CHECK_GT(t.lanes(), 1); if (t.bits() == 8 && (t.is_int() || t.is_uint())) { -if (i != 0) { - os << "|"; +if (!(t.lanes() == 2 || t.lanes() == 3)) { + if (i != 0) { +os << "|"; + } + os << "((0x00ff << " << i * 8 << ") & (" << value << " << " << i * 8 << "))"; + return; } -os << "((0x00ff << " << i * 8 << ") & (" << value << " << " << i * 8 << "))"; -return; } if (t.is_float16()) { diff --git a/tests/python/unittest/test_target_codegen_cuda.py b/tests/python/unittest/test_target_codegen_cuda.py index 49a7933..50705e8 100644 --- a/tests/python/unittest/test_target_codegen_cuda.py +++ b/tests/python/unittest/test_target_codegen_cuda.py @@ -55,7 +55,12 @@ def test_cuda_vectorize_add(): check_cuda("float32", 64, 2) check_cuda("float32", 64, 3) check_cuda("float32", 64, 4) +check_cuda("int8",64, 2) +check_cuda("int8",64, 3) check_cuda("int8",64, 4) +check_cuda("uint8", 64, 2) +check_cuda("uint8", 64, 3) +check_cuda("uint8", 64, 4) check_cuda("float16", 
64, 2) check_cuda("float16", 64, 4) check_cuda("float16", 64, 6) @@ -112,15 +117,17 @@ def test_cuda_vectorize_load(): b = tvm.nd.empty((n,), B.dtype, ctx) fun(a,b) tvm.testing.assert_allclose(a.asnumpy(), b.asnumpy()) +check_cuda("int8", 64, 2) +check_cuda("int8", 64, 3) +check_cuda("int8", 64, 4) check_cuda("int8", 64, 8) check_cuda("int8", 64, 16) -def test_cuda_make_int8x4(): -def check_cuda(n, value): +def test_cuda_make_int8(): +def check_cuda(n, value, lanes): if not tvm.gpu(0).exist or not tvm.runtime.enabled("cuda"):
[GitHub] [incubator-tvm] tqchen commented on pull request #5428: [CODEGEN][CUDA] Fix a bug when vectorized load&store was involved for…
tqchen commented on pull request #5428: URL: https://github.com/apache/incubator-tvm/pull/5428#issuecomment-621350989 Thanks @wpan11nv @boh-inspur !
[GitHub] [incubator-tvm] mbaret commented on pull request #5360: [QNN] Support CallNode inputs in qnn.concatenate
mbaret commented on pull request #5360: URL: https://github.com/apache/incubator-tvm/pull/5360#issuecomment-621346205 @anijain2305 @zhiics could you take a look at the changes I've made and see if this is now OK?
[GitHub] [incubator-tvm] icemelon9 commented on a change in pull request #5357: [Relay] enable blocking format in x86 conv2d and fold scale axis
icemelon9 commented on a change in pull request #5357: URL: https://github.com/apache/incubator-tvm/pull/5357#discussion_r417472667 ## File path: python/tvm/relay/op/strategy/x86.py ## @@ -84,8 +88,13 @@ def conv2d_strategy_cpu(attrs, inputs, out_type, target): raise ValueError("dilation should be positive value") if groups == 1: -if layout == "NCHW": -assert kernel_layout == "OIHW" +if layout.startswith("NCHW"): Review comment: No, inside `_alter_conv2d_layout`, the `conv2d` layout should still be NCHW, and this function will then convert layout into NCHWc
[incubator-tvm] branch master updated: [TFLITE] Match TFLite shape for SSD custom op (#5473)
This is an automated email from the ASF dual-hosted git repository.

anijain2305 pushed a commit to branch master in repository
https://gitbox.apache.org/repos/asf/incubator-tvm.git

The following commit(s) were added to refs/heads/master by this push:
     new 95a816c  [TFLITE] Match TFLite shape for SSD custom op (#5473)

95a816c is described below

commit 95a816c9078c5cc7cb08d354a069a15f5d18951c
Author: mbaret <55580676+mba...@users.noreply.github.com>
AuthorDate: Wed Apr 29 17:13:16 2020 +0100

    [TFLITE] Match TFLite shape for SSD custom op (#5473)

    This patch ensures that the output shape from TVM's
    Detection_PostProcess is the same as TFLite's and expands
    the unit test to confirm this.

    Change-Id: If5db95741533f131241dfebbaa7708dbd528fe70
---
 python/tvm/relay/frontend/tflite.py          | 13 +
 tests/python/frontend/tflite/test_forward.py |  7 +++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/python/tvm/relay/frontend/tflite.py b/python/tvm/relay/frontend/tflite.py
index b9a1657..66d0ff3 100644
--- a/python/tvm/relay/frontend/tflite.py
+++ b/python/tvm/relay/frontend/tflite.py
@@ -2257,6 +2257,7 @@ class OperatorConverter(object):
         assert len(inputs) == 3, "inputs length should be 3"
         cls_pred = self.get_expr(inputs[1].tensor_idx)
         loc_prob = self.get_expr(inputs[0].tensor_idx)
+        batch_size = inputs[1].tensor.Shape(0)
         anchor_values = self.get_tensor_value(inputs[2])
         anchor_boxes = len(anchor_values)
         anchor_type = self.get_tensor_type_str(inputs[2].tensor.Type())
@@ -2284,7 +2285,7 @@ class OperatorConverter(object):
         loc_prob = _op.concatenate(
             [loc_coords[1], loc_coords[0], loc_coords[3], loc_coords[2]], axis=2
         )
-        loc_prob = _op.reshape(loc_prob, [1, anchor_boxes*4])
+        loc_prob = _op.reshape(loc_prob, [batch_size, anchor_boxes*4])

         # anchor coords are in yxhw format
         # need to convert to ltrb
@@ -2327,10 +2328,14 @@ class OperatorConverter(object):
         ret = _op.vision.non_max_suppression(ret[0], ret[1], **non_max_suppression_attrs)
         ret = _op.vision.get_valid_counts(ret, 0)
         valid_count = ret[0]
+        # keep only the top 'max_detections' rows
+        ret = _op.strided_slice(ret[1],
+                                [0, 0, 0],
+                                [batch_size, custom_options["max_detections"], anchor_boxes])
         # the output needs some reshaping to match tflite
-        ret = _op.split(ret[1], 6, axis=2)
-        cls_ids = ret[0]
-        scores = ret[1]
+        ret = _op.split(ret, 6, axis=2)
+        cls_ids = _op.reshape(ret[0], [batch_size, -1])
+        scores = _op.reshape(ret[1], [batch_size, -1])
         boxes = _op.concatenate([ret[3], ret[2], ret[5], ret[4]], axis=2)
         ret = _expr.TupleWrapper(_expr.Tuple([boxes, cls_ids, scores, valid_count]), size=4)
         return ret
diff --git a/tests/python/frontend/tflite/test_forward.py b/tests/python/frontend/tflite/test_forward.py
index 7ff4c31..bc3f32a 100644
--- a/tests/python/frontend/tflite/test_forward.py
+++ b/tests/python/frontend/tflite/test_forward.py
@@ -1731,7 +1731,14 @@ def test_detection_postprocess():
         ["raw_outputs/box_encodings", "raw_outputs/class_predictions"], num_output=4)
     # check valid count is the same
     assert tvm_output[3] == tflite_output[3]
+    # check all the output shapes are the same
+    assert tvm_output[0].shape == tflite_output[0].shape
+    assert tvm_output[1].shape == tflite_output[1].shape
+    assert tvm_output[2].shape == tflite_output[2].shape
     valid_count = tvm_output[3][0]
+    # only check the valid detections are the same
+    # tvm has a different convention to tflite for invalid detections, it uses all -1s whereas
+    # tflite appears to put in nonsense data instead
     tvm_boxes = tvm_output[0][0][:valid_count]
     tvm_classes = tvm_output[1][0][:valid_count]
     tvm_scores = tvm_output[2][0][:valid_count]
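The expanded test above compares only the first `valid_count` detections, because TVM pads invalid rows with -1 while TFLite leaves arbitrary data there. A pure-Python sketch of that comparison convention, with made-up numbers rather than real model outputs:

```python
# Hypothetical detection outputs for one image: each row is one box.
tvm_boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6], [-1, -1, -1, -1]]
tflite_boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6], [7.3, 0.0, 9.9, 1.2]]
valid_count = 2

# Full arrays differ (the padding rows differ by design), but the
# valid prefix must match exactly.
assert tvm_boxes != tflite_boxes
assert tvm_boxes[:valid_count] == tflite_boxes[:valid_count]
print("valid detections match")
```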
[GitHub] [incubator-tvm] anijain2305 commented on pull request #5473: [TFLITE] Match TFLite shape for SSD custom op
anijain2305 commented on pull request #5473: URL: https://github.com/apache/incubator-tvm/pull/5473#issuecomment-621313035 Thanks @mbaret This is merged!
[GitHub] [incubator-tvm] roastduck edited a comment on pull request #5465: [TIR] Convert if_then_else intrinsics to if-statements
roastduck edited a comment on pull request #5465: URL: https://github.com/apache/incubator-tvm/pull/5465#issuecomment-621285171 Made the pass default when lowering, so that not only the generated device code but also the lowered IR in simple mode will reflect this pass.
[incubator-tvm-vta] branch master updated: [Hardware][Xilinx] utilize uram (#6)
This is an automated email from the ASF dual-hosted git repository.

moreau pushed a commit to branch master in repository
https://gitbox.apache.org/repos/asf/incubator-tvm-vta.git

The following commit(s) were added to refs/heads/master by this push:
     new 22e4b88  [Hardware][Xilinx] utilize uram (#6)

22e4b88 is described below

commit 22e4b88060d26cc4c309cbbbf6fba66770c20e5b
Author: ZHANG Hao
AuthorDate: Wed Apr 29 23:57:04 2020 +0800

    [Hardware][Xilinx] utilize uram (#6)

    Co-authored-by: Zhang Hao
---
 config/vta_config.py               | 8
 hardware/xilinx/scripts/vivado.tcl | 4
 2 files changed, 12 insertions(+)

diff --git a/config/vta_config.py b/config/vta_config.py
index 9bb6d7b..d7ea0df 100644
--- a/config/vta_config.py
+++ b/config/vta_config.py
@@ -79,6 +79,8 @@ def main():
                         help="returns output memory depth")
     parser.add_argument("--get-out-mem-axi-ratio", action="store_true",
                         help="returns ratio between output element width and axi width")
+    parser.add_argument("--get-num-wgt-mem-uram", action="store_true",
+                        help="returns number of weight memory blocks to be implemented on URAM")
     parser.add_argument("--get-axi-cache-bits", action="store_true",
                         help="returns AXI system ARCACHE/AWCACHE hardcoded bit value")
     parser.add_argument("--get-axi-prot-bits", action="store_true",
@@ -193,6 +195,12 @@ def main():
     if args.get_out_mem_axi_ratio:
         print(pkg.out_mem_axi_ratio)

+    if args.get_num_wgt_mem_uram:
+        if hasattr(pkg, 'num_wgt_mem_uram'):
+            print(pkg.num_wgt_mem_uram)
+        else:
+            print(0)
+
     if args.get_axi_cache_bits:
         print(pkg.axi_cache_bits)

diff --git a/hardware/xilinx/scripts/vivado.tcl b/hardware/xilinx/scripts/vivado.tcl
index 1f8f1da..2ed0733 100644
--- a/hardware/xilinx/scripts/vivado.tcl
+++ b/hardware/xilinx/scripts/vivado.tcl
@@ -50,6 +50,7 @@ set wgt_mem_depth [exec python $vta_config --get-wgt-mem-depth]
 set out_part [exec python $vta_config --get-out-mem-banks]
 set out_mem_width [exec python $vta_config --get-out-mem-width]
 set out_mem_depth [exec python $vta_config --get-out-mem-depth]
+set num_wgt_mem_uram [exec python $vta_config --get-num-wgt-mem-uram]

 # AXI bus signals
 set axi_cache [exec python $vta_config --get-axi-cache-bits]
@@ -247,6 +248,9 @@ for {set i 0} {$i < $wgt_part} {incr i} {
     connect_bd_intf_net -intf_net compute_0_wgt_mem_${i}_V_PORTA \
         [get_bd_intf_pins $wgt_mem/BRAM_PORTB] \
         $portb
+    if { $device_family eq "zynq-ultrascale+" && $i < $num_wgt_mem_uram } {
+        set_property -dict [list CONFIG.PRIM_type_to_Implement {URAM}] $wgt_mem
+    }
 }

 # Create and connect out_mem partitions
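The `hasattr` guard added to `vta_config.py` in the commit above lets older package configs that predate the URAM option keep working: the query reports 0 instead of crashing, so downstream tcl scripts always see a valid integer. A minimal stand-alone sketch of the pattern (`PkgConfig` is an illustrative stand-in, not VTA's real class):

```python
class PkgConfig:
    """Stand-in for a package config; real configs may or may not
    define optional fields such as num_wgt_mem_uram."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

def get_num_wgt_mem_uram(pkg):
    # Older configs lack the attribute; fall back to 0 so callers
    # always receive a valid integer.
    return pkg.num_wgt_mem_uram if hasattr(pkg, "num_wgt_mem_uram") else 0

print(get_num_wgt_mem_uram(PkgConfig(num_wgt_mem_uram=4)))  # 4
print(get_num_wgt_mem_uram(PkgConfig()))                    # 0
```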
[GitHub] [incubator-tvm] roastduck commented on pull request #5465: [TIR] Convert if_then_else intrinsics to if-statements
roastduck commented on pull request #5465: URL: https://github.com/apache/incubator-tvm/pull/5465#issuecomment-621285171 Made the pass default when lowering.
[GitHub] [incubator-tvm] roastduck commented on pull request #5465: [TIR] Convert if_then_else intrinsics to if-statements
roastduck commented on pull request #5465: URL: https://github.com/apache/incubator-tvm/pull/5465#issuecomment-621282600

> Given that it is quite a fundamental construct that can lead to concise generated code (e.g., `?:` in CUDA), I think we should keep it throughout the lowering process by default. We can still merge the pass if it is an optional utility.

I agree. I will make it default.

> If the goal is mainly to utilize LoopPartition, perhaps a better way would be to insert likely into the condition and enable LoopPartition to deal with if_then_else.

I think we are unable to insert `likely` automatically, because we don't know whether the condition is likely or unlikely, so `likely` must be written by users. Besides, what do you think about the cascading if_then_else issue above?
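The two forms being debated can be sketched in plain Python, with illustrative names rather than TVM's actual IR: the intrinsic keeps the conditional inside a value-producing expression (like CUDA's `?:`), while the pass in this PR rewrites it into an if-statement, whose explicit condition later passes such as LoopPartition can use to split a loop at the boundary:

```python
def read_expr_form(data, i):
    # Value form: one conditional expression, analogous to the
    # if_then_else intrinsic (or CUDA's ?: operator).
    return data[i] if 0 <= i < len(data) else 0

def read_stmt_form(data, i):
    # Statement form: the same boundary-padded read as an
    # if-statement, making the condition a first-class branch.
    if 0 <= i < len(data):
        return data[i]
    return 0

vals = [10, 20, 30]
assert [read_expr_form(vals, i) for i in range(-1, 4)] == \
       [read_stmt_form(vals, i) for i in range(-1, 4)]
print([read_stmt_form(vals, i) for i in range(-1, 4)])  # [0, 10, 20, 30, 0]
```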
[GitHub] [incubator-tvm] mbaret commented on a change in pull request #5474: [Frontend][TFLite] ADD_N operator
mbaret commented on a change in pull request #5474: URL: https://github.com/apache/incubator-tvm/pull/5474#discussion_r417383328 ## File path: tests/python/frontend/tflite/test_forward.py ## @@ -1896,6 +1896,41 @@ def test_forward_mediapipe_hand_landmark(): tvm.testing.assert_allclose(np.squeeze(tvm_output[i]), np.squeeze(tflite_output[i]), rtol=1e-5, atol=1e-5) +### Review comment: I think this should be moved so it's with the other element-wise tests and we keep the order the tests are run in the same as the order they appear in the file.
[GitHub] [incubator-tvm] mbaret commented on pull request #5409: [BYOC] Don't annotate constants
mbaret commented on pull request #5409: URL: https://github.com/apache/incubator-tvm/pull/5409#issuecomment-621253051 I've implemented an alternative fix in partition_graph.cc by explicitly binding constant tuples (see [5476](https://github.com/apache/incubator-tvm/pull/5476)). This means this PR is no longer required.
[GitHub] [incubator-tvm] mbaret opened a new pull request #5476: [BYOC] Bind constant tuples in graph partitioner
mbaret opened a new pull request #5476: URL: https://github.com/apache/incubator-tvm/pull/5476 Constant tuples were not being propagated into partitions making it impossible to codegen functions like concatenate. This extends the constant binding that happens in PartitionGraph to also include constant tuples. This is an alternative solution to the one presented in this PR: [5409](https://github.com/apache/incubator-tvm/pull/5409).
[GitHub] [incubator-tvm] mbaret commented on pull request #5473: [TFLITE] Match TFLite shape for SSD custom op
mbaret commented on pull request #5473: URL: https://github.com/apache/incubator-tvm/pull/5473#issuecomment-621250794 cc @anijain2305
[GitHub] [incubator-tvm] tqchen edited a comment on pull request #5465: [TIR] Convert if_then_else intrinsics to if-statements
tqchen edited a comment on pull request #5465: URL: https://github.com/apache/incubator-tvm/pull/5465#issuecomment-621242357 Thanks @roastduck. Given that it is quite a fundamental construct that can lead to concise generated code (e.g., `?:` in CUDA), I think we should keep it throughout the lowering process by default. We can still merge the pass if it is an optional utility. If the goal is mainly to utilize LoopPartition, perhaps a better way would be to insert likely into the condition and enable LoopPartition to deal with if_then_else.
[GitHub] [incubator-tvm] tqchen commented on pull request #5465: [TIR] Convert if_then_else intrinsics to if-statements
tqchen commented on pull request #5465: URL: https://github.com/apache/incubator-tvm/pull/5465#issuecomment-621242357 Thanks @roastduck. If the goal is mainly to utilize LoopPartition, perhaps a better way would be to insert likely into the condition and enable LoopPartition to deal with if_then_else, given that it is quite a fundamental construct that can lead to concise generated code (e.g., `?:` in CUDA).
[GitHub] [incubator-tvm] tqchen commented on issue #5446: [NMS] nms operation cost too mush time
tqchen commented on issue #5446: URL: https://github.com/apache/incubator-tvm/issues/5446#issuecomment-621241256 Thanks for starting the discussion. Please open a new discussion thread on https://discuss.tvm.ai/
[GitHub] [incubator-tvm] tqchen commented on issue #5464: [OpenCL] `directly 4 8 bit int in integer` causes compiling error
tqchen commented on issue #5464: URL: https://github.com/apache/incubator-tvm/issues/5464#issuecomment-621240765 cc @kazum @masahi
[GitHub] [incubator-tvm] tqchen commented on pull request #5417: [RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7
tqchen commented on pull request #5417: URL: https://github.com/apache/incubator-tvm/pull/5417#issuecomment-621237313 @weberlo @tmoreau89 @liangfu @u99127 please take another look and http://tvm.apache.org/docs/contribute/code_review.html#approve-and-request-changes-explicitly
[GitHub] [incubator-tvm] siju-samuel opened a new pull request #5475: [FRONTEND][TFLITE]Logical not op support
siju-samuel opened a new pull request #5475: URL: https://github.com/apache/incubator-tvm/pull/5475 Supported Logical Not for TFLite frontend. @FrozenGene @masahi Please help to review and merge this PR. TIA
[GitHub] [incubator-tvm] mbaret commented on a change in pull request #5345: [RELAY] Move frontend utils
mbaret commented on a change in pull request #5345: URL: https://github.com/apache/incubator-tvm/pull/5345#discussion_r417275917 ## File path: python/tvm/relay/qnn/op/legalizations.py ## @@ -20,8 +20,8 @@ import tvm from tvm import relay +import numpy as np Review comment: If you take a look here: https://github.com/apache/incubator-tvm/blob/063ba63a4448432a1e17ec8b793152cd8bba7c2c/docker/install/ubuntu_install_python_package.sh#L24 CI runs with pylint 1.9.4 and I imagine that's where the difference is coming from.
[GitHub] [incubator-tvm] Menooker commented on pull request #5357: [Relay] enable blocking format in x86 conv2d and fold scale axis
Menooker commented on pull request #5357: URL: https://github.com/apache/incubator-tvm/pull/5357#issuecomment-621165175 @icemelon9 If you revert these changes, you will get:
```
  File "/home/menooker/work/mobilenet/tvm/python/tvm/relay/op/nn/_nn.py", line 97, in alter_op_layout_conv2d
    return topi.nn.conv2d_alter_layout(attrs, inputs, tinfos, out_type)
  File "", line 2, in conv2d_alter_layout
  File "/home/menooker/work/mobilenet/tvm/python/tvm/target/generic_func.py", line 267, in dispatch_func
    return dispatch_dict[k](*args, **kwargs)
  File "/home/menooker/work/mobilenet/tvm/topi/python/topi/x86/conv2d_alter_op.py", line 43, in _alter_conv2d_layout
    relay.op.get("nn.conv2d"), attrs, tinfos, out_type, target)
  File "/home/menooker/work/mobilenet/tvm/python/tvm/relay/backend/compile_engine.py", line 183, in select_implementation
    all_impls = get_valid_implementations(op, attrs, inputs, out_type, target)
  File "/home/menooker/work/mobilenet/tvm/python/tvm/relay/backend/compile_engine.py", line 124, in get_valid_implementations
    strategy = fstrategy(attrs, inputs, out_type, target)
  File "/home/menooker/work/mobilenet/tvm/python/tvm/target/generic_func.py", line 45, in __call__
    return _ffi_api.GenericFuncCallFunc(self, *args)
  File "/home/menooker/work/mobilenet/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 213, in __call__
    raise get_last_ffi_error()
  [bt] (3) /home/menooker/work/mobilenet/tvm/build/libtvm.so(TVMFuncCall+0x52) [0x7f4a967f5272]
  [bt] (2) /home/menooker/work/mobilenet/tvm/build/libtvm.so(+0x7b19c8) [0x7f4a963779c8]
  [bt] (1) /home/menooker/work/mobilenet/tvm/build/libtvm.so(tvm::GenericFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x1b6) [0x7f4a96377746]
  [bt] (0) /home/menooker/work/mobilenet/tvm/build/libtvm.so(+0xc2aa43) [0x7f4a967f0a43]
  File "/home/menooker/work/mobilenet/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/menooker/work/mobilenet/tvm/python/tvm/relay/op/strategy/x86.py", line 108, in conv2d_strategy_cpu
    raise RuntimeError("Unsupported conv2d layout {} for x86".format(layout))
RuntimeError: Unsupported conv2d layout NCHW1c for x86
```
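The failure above can be reproduced in miniature: with a strict equality check (a hypothetical, heavily simplified stand-in for `conv2d_strategy_cpu`), the blocked layout produced during `AlterOpLayout` falls through to the error branch:

```python
def conv2d_strategy_cpu_strict(layout):
    # Strict equality: only the plain layout is accepted, so a blocked
    # layout such as "NCHW1c" is rejected.
    if layout == "NCHW":
        return "conv2d_nchw"
    raise RuntimeError("Unsupported conv2d layout {} for x86".format(layout))

try:
    conv2d_strategy_cpu_strict("NCHW1c")
except RuntimeError as err:
    print(err)  # Unsupported conv2d layout NCHW1c for x86
```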
[GitHub] [incubator-tvm] Menooker commented on a change in pull request #5357: [Relay] enable blocking format in x86 conv2d and fold scale axis
Menooker commented on a change in pull request #5357: URL: https://github.com/apache/incubator-tvm/pull/5357#discussion_r417267279 ## File path: python/tvm/relay/op/strategy/x86.py ## @@ -84,8 +88,13 @@ def conv2d_strategy_cpu(attrs, inputs, out_type, target): raise ValueError("dilation should be positive value") if groups == 1: -if layout == "NCHW": -assert kernel_layout == "OIHW" +if layout.startswith("NCHW"): Review comment: Actually the changes here are necessary. `_alter_conv2d_layout` which is used in `AlterOpLayout` pass will call `select_implementation(relay.op.get('nn.conv2d'))` which will finally go to this function `conv2d_strategy_cpu`. And the layout is passed as blocking layout
[GitHub] [incubator-tvm] roastduck commented on a change in pull request #5345: [RELAY] Move frontend utils
roastduck commented on a change in pull request #5345: URL: https://github.com/apache/incubator-tvm/pull/5345#discussion_r417265081 ## File path: python/tvm/relay/qnn/op/legalizations.py ## @@ -20,8 +20,8 @@ import tvm from tvm import relay +import numpy as np Review comment: I run `./tests/scripts/task_lint.sh` directly from the repository root. My `pylint` version:
```
pylint 2.4.4
astroid 2.3.3
Python 3.7.0 (default, Mar 12 2019, 20:10:09) [GCC 7.3.0]
```
[GitHub] [incubator-tvm] roastduck commented on pull request #5465: [TIR] Convert if_then_else intrinsics to if-statements
roastduck commented on pull request #5465: URL: https://github.com/apache/incubator-tvm/pull/5465#issuecomment-621159263 Fixed the breaking `fifo_buffer` test by avoiding passing the same buffer as different arguments to a CUDA kernel.
[GitHub] [incubator-tvm] maheshambule opened a new pull request #5474: [Frontend][TFLite] ADD_N operator
maheshambule opened a new pull request #5474: URL: https://github.com/apache/incubator-tvm/pull/5474
- added support for ADD_N operator
- refactored some of the ADD code
[GitHub] [incubator-tvm] siju-samuel commented on pull request #5472: [TENSORFLOW]TF Addons activations support added
siju-samuel commented on pull request #5472: URL: https://github.com/apache/incubator-tvm/pull/5472#issuecomment-621154395
```
Running relay Tensorflow frontend test...
= test session starts ==
platform linux -- Python 3.6.10, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /workspace
collecting ... collected 20 items / 1 error / 19 selected
ERRORS
__ ERROR collecting tests/python/frontend/tensorflow/test_forward.py ___
ImportError while importing test module '/workspace/tests/python/frontend/tensorflow/test_forward.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/python/frontend/tensorflow/test_forward.py:30: in
    import tensorflow_addons as tfa
E   ModuleNotFoundError: No module named 'tensorflow_addons'
```
@tqchen Could you please help me to install tensorflow addons in the CI docker image for running scripts. TIA
[incubator-tvm] branch master updated: [Frontend][TFLite] support for FILL and SPLIT_V operators (#5330)
This is an automated email from the ASF dual-hosted git repository.

zhaowu pushed a commit to branch master in repository
https://gitbox.apache.org/repos/asf/incubator-tvm.git

The following commit(s) were added to refs/heads/master by this push:
     new 063ba63  [Frontend][TFLite] support for FILL and SPLIT_V operators (#5330)

063ba63 is described below

commit 063ba63a4448432a1e17ec8b793152cd8bba7c2c
Author: Mahesh Ambule <15611578+maheshamb...@users.noreply.github.com>
AuthorDate: Wed Apr 29 16:21:57 2020 +0530

    [Frontend][TFLite] support for FILL and SPLIT_V operators (#5330)

    * tflite spliv ops
    * TFLITE fill and splitv ops
    * TFLITE fill and splitv ops
    * TFLITE fill and splitv ops
    * remove unnecessary operator check
---
 python/tvm/relay/frontend/tflite.py          | 46 ++
 tests/python/frontend/tflite/test_forward.py | 57
 2 files changed, 96 insertions(+), 7 deletions(-)

diff --git a/python/tvm/relay/frontend/tflite.py b/python/tvm/relay/frontend/tflite.py
index 2065d60..b9a1657 100644
--- a/python/tvm/relay/frontend/tflite.py
+++ b/python/tvm/relay/frontend/tflite.py
@@ -78,6 +78,7 @@ class OperatorConverter(object):
             'ELU': self.convert_elu,
             'EQUAL': self.convert_equal,
             'EXP': self.convert_exp,
+            'FILL': self.convert_fill,
             'FLOOR_DIV': self.convert_floor_div,
             'FLOOR_MOD': self.convert_floor_mod,
             'FLOOR': self.convert_floor,
@@ -123,6 +124,7 @@ class OperatorConverter(object):
             'SPACE_TO_BATCH_ND': self.convert_space_to_batch_nd,
             'SPACE_TO_DEPTH': self.convert_space_to_depth,
             'SPLIT': self.convert_split,
+            'SPLIT_V': self.convert_split_v,
             'SQRT': self.convert_sqrt,
             'SQUARE': self.convert_square,
             'SQUARED_DIFFERENCE': self.convert_squared_difference,
@@ -1212,6 +1214,21 @@ class OperatorConverter(object):

         return out

+    def convert_fill(self, op):
+        """Convert TFLite FILL"""
+        input_tensors = self.get_input_tensors(op)
+        assert len(input_tensors) == 2, "input tensors length should be 2"
+
+        if self.has_expr(input_tensors[0].tensor_idx):
+            raise tvm.error.OpNotImplemented("For dims parameter of Fill operator,"
+                                             " only constant values are supported.")
+
+        in_dims = list(self.get_tensor_value(input_tensors[0]))
+        in_value_expr = self.get_expr(input_tensors[1].tensor_idx)
+        out = _op.full(in_value_expr, in_dims)
+
+        return out
+
     def _convert_reduce(self, relay_op, op):
         """Generic method to Convert TFLite MEAN operators"""
         try:
@@ -1617,6 +1634,35 @@ class OperatorConverter(object):

         return out

+    def convert_split_v(self, op):
+        """SPLIT_V implementation."""
+        input_tensors = self.get_input_tensors(op)
+
+        assert len(input_tensors) == 3, "input tensors length should be 3"
+
+        input_tensor = input_tensors[0]
+        input_tensor_idx = input_tensor.tensor_idx
+        in_expr = self.get_expr(input_tensor_idx)
+
+        if self.has_expr(input_tensors[1].tensor_idx):
+            raise tvm.error.OpNotImplemented("For size_splits parameter of SPLIT_V operator, "
+                                             "only constant values are supported.")
+        size_splits = list(self.get_tensor_value(input_tensors[1]))
+        size_splits = tuple(np.cumsum(size_splits)[:-1])
+
+        axis_tensor = input_tensors[2]
+        split_axis = self.get_tensor_value(axis_tensor)
+
+        out = _op.split(in_expr, size_splits, axis=int(split_axis))
+        # Relay does not like a TupleWrapper of 1 element, further this
+        # only shows up with tf1.13 if we use a split with num_splits==1.
+        # In tf 1.14 this doesn't appear as it is automatically a reshape
+        # operation.
+        if isinstance(out, _expr.TupleWrapper) and out.size == 1:
+            out = out[0]
+
+        return out
+
     def convert_slice(self, op):
         """Convert TFLite SLICE"""
         input_tensors = self.get_input_tensors(op)
diff --git a/tests/python/frontend/tflite/test_forward.py b/tests/python/frontend/tflite/test_forward.py
index eb65d82..7ff4c31 100644
--- a/tests/python/frontend/tflite/test_forward.py
+++ b/tests/python/frontend/tflite/test_forward.py
@@ -216,15 +216,19 @@ def with_fused_activation_function(input_tensor, fn_name):
         return math_ops.tanh(input_tensor)
     raise AssertionError("Unknown fused_activation_function {}".format(fn_name))

-def _test_split(in_shape, axis, num_Splits, dtype):
-    '''internal split tester taking as parameters in_shape, number of tensors to split into
-    and dtype (data type)'''
+
+def _test_split(in_shape, axis, num_splits, dtype):
+    """internal split tester taking as parameters in_shape, n