jverma-quic commented on code in PR #12971: URL: https://github.com/apache/tvm/pull/12971#discussion_r988345984
########## src/runtime/hexagon/profiler/README.md: ########## @@ -0,0 +1,152 @@ +<!--- Licensed to the Apache Software Foundation (ASF) under one --> +<!--- or more contributor license agreements. See the NOTICE file --> +<!--- distributed with this work for additional information --> +<!--- regarding copyright ownership. The ASF licenses this file --> +<!--- to you under the Apache License, Version 2.0 (the --> +<!--- "License"); you may not use this file except in compliance --> +<!--- with the License. You may obtain a copy of the License at --> + +<!--- http://www.apache.org/licenses/LICENSE-2.0 --> + +<!--- Unless required by applicable law or agreed to in writing, --> +<!--- software distributed under the License is distributed on an --> +<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY --> +<!--- KIND, either express or implied. See the License for the --> +<!--- specific language governing permissions and limitations --> +<!--- under the License. --> + +# Hexagon lightweight instrumentation based profiling (LWP) + +For Hexagon, LWP can be used to get function and loop level processor cycle count. +This is done by instrumenting the code with profiling builtin calls using a TIR pass. +During codegen, these builtin calls are replaced with the calls to a hexagon specific +handler which records the runtime information into a buffer. +This buffer is written into a JSON file ('lwp.json') which is processed to construct +function and loop level profiling information as a csv file. + +**Note:** During codegen, the profiling builtin calls are ignored for other targets. + +The TIR pass offers several config flags to control the level of instrumentation +as mentioned below: + +1) `lwp_disable_func_prof`: To disable function level profiling. By default, it is +set to 'False', i.e., the function level profiling is enabled. + +2) `instr_siblings`: When enabled, only loops with siblings are instrumented and rest are +ignored. The inner-most loops are always excluded from instrumentation unless overwritten +using `lwp_min_height`. This is done to minimize the adverse effect of instrumentation on +actual performance. By default, it is set to 'True'. + +3) `lwp_max_depth`: To instrument loops up to a certain depth. This flag is effective +only when `instr_siblings` is disabled. By default, it is set to 0. + +4) `lwp_min_height`: To exclude inner loops up to a certain height from instrumentation. +By default, it is set to 1. + +For additional usage information on various config flags, please refer to the tests in +`tests/python/unittest/test_tir_transform_profiling_instr.py` + + +## How to use lightweight profiling with RPC Launcher: + +`tests/python/contrib/test_hexagon/test_launcher.py` contains two tests, `test_lwp` and +`test_lwp_multiple_conv2d`, to demonstrate lightweight profiling usage. + +The steps involved are as follows: + +1) While building a model, set `tir.instrument_lwp` to `True`. + By default, the builtin calls will only be inserted for the loops with siblings. But it + can be altered using LWP config options as described above. +2) Save the binary file as it will be needed to process the profiling data (lwp.json) later. +3) Create `HexagonProfiler` object. It is passed to `get_profile_output` to check if the model was +built with profiling enabled before copying the data from the device. + +``` +with tvm.transform.PassContext(opt_level=3, config={"tir.instrument_lwp": True}): + lowered = tvm.relay.build( + relay_mod, + tvm.target.Target(target_hexagon, host=target_hexagon), + ... + ) + + # Save binary file to post-process lwp output + lowered.get_lib().save(dso_binary_path) + + # Create HexagonProfiler object. It sets the profiling mode based on the PassContext config. + profiler = HexagonProfiler() +``` + +4) Run the model and get profile data (`lwp.json`) from the device (or the simulator): + +**Note:** + +- For on-device runs, 'lwp.json' is generated in the same remote directory where 'tvm_rpc_android' +is copied. This remote path is needed to copy the file from the device and can be found in +'hexagon_server_process["launcher"].workspace'. + +- For the simulator runs, the remote path is not needed as the 'lwp.json' file is generated in the +simulator test output directory. + +``` + remote_path = "" + if android_serial_number is not None and android_serial_number != "simulator": + # Get the workspace on the device to extract lwp output + remote_path = hexagon_server_process["launcher"]._workspace + + # Get profile data (lwp.json) from the device + prof_out = hexagon_launcher.get_profile_output(profiler, hexagon_session, remote_path, temp) Review Comment: I agree. Let me try to do something about it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org