Hello!
I wrote an op composed of four CUDA kernels and now want to optimize it,
so I need to know what fraction of the total run time each kernel takes.
I tried nvprof but could not use it because of permission issues.
Does TVM provide a similar profiling function?
My current test code is as follows:
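import numpy as np
import tvm
from tvm.contrib import graph_runtime

# graph, lib, ctx, params and input_shape come from the earlier build step (not shown)
module = graph_runtime.create(graph, lib, ctx)
data_tvm = tvm.nd.array(np.random.uniform(size=input_shape).astype("float16"))
module.set_input('data', data_tvm)
module.set_input(**params)
module.run()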
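
To make the goal concrete, here is a minimal sketch of the breakdown I am after. It assumes the per-kernel times have already been obtained from some profiler. The helper name `time_shares`, the kernel names, and the timing values are all hypothetical placeholders, not real measurements and not part of TVM.

```python
def time_shares(kernel_times_ms):
    """Return each kernel's share of the total run time, in percent."""
    total = sum(kernel_times_ms.values())
    if total <= 0:
        raise ValueError("total kernel time must be positive")
    return {name: 100.0 * t / total for name, t in kernel_times_ms.items()}


# Hypothetical placeholder timings in milliseconds (not measured values).
example_times = {"kernel_0": 1.0, "kernel_1": 2.0, "kernel_2": 3.0, "kernel_3": 4.0}

shares = time_shares(example_times)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

What is missing is the first step: a way to get the individual kernel times out of the run above.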