Lunderberg opened a new pull request, #15839: URL: https://github.com/apache/tvm/pull/15839
Prior to this commit, the last kernel launch would not be included in a captured CUDA graph. This commit updates `RewriteCUDAGraph` to include the last kernel launch.

The previous implementation assumed that any calls to `R.builtin.alloc_tensor` that remain after `StaticPlanBlockMemory` are dynamic allocations. This is not the case, as the allocation of a static-shaped output tensor may still use `R.builtin.alloc_tensor`. The primary change in this commit is to update `RewriteCUDAGraph` to check for static allocations directly, rather than inferring whether an allocation is static from the operation being used.

This change exposed an additional bug: the previous implementation only checked for output variables if they occurred as part of a `VarBinding`, and not if they occurred as the body of a `SeqExpr`. As a result, a captured CUDA graph whose output was immediately used as the output of the containing Relax function would contain an undefined variable. This commit updates `RewriteCUDAGraph` to operate on a `SeqExpr` rather than a `BindingBlock`, so that the `SeqExprNode::body` may be inspected for output variables.
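The two fixes described above can be illustrated with a small, self-contained Python model. This is only a sketch and is not the TVM implementation (the actual pass is written in C++). The `Binding` and `SeqExpr` classes and the helpers `is_static_alloc`, `outputs_bindings_only` and `outputs_with_body` are hypothetical names invented for this illustration. It assumes a static allocation is one whose shape consists only of integer constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """Toy stand-in for a VarBinding: `var = op(*args)`."""
    var: str
    op: str
    args: tuple = ()
    shape: tuple = ()  # only meaningful for allocation ops


@dataclass
class SeqExpr:
    """Toy stand-in for a SeqExpr: a list of bindings plus a body."""
    bindings: list
    body: str  # name of the variable returned by the function


def is_static_alloc(binding):
    # Inspect the shape itself instead of inferring "dynamic" from the
    # fact that the op is still `R.builtin.alloc_tensor`.
    return (binding.op == "R.builtin.alloc_tensor"
            and all(isinstance(dim, int) for dim in binding.shape))


def outputs_bindings_only(seq, region):
    """Old behaviour (sketch): only later bindings count as uses."""
    defined = {seq.bindings[i].var for i in region}
    used = set()
    for i, binding in enumerate(seq.bindings):
        if i not in region:
            used.update(arg for arg in binding.args if arg in defined)
    return used


def outputs_with_body(seq, region):
    """New behaviour (sketch): the body of the SeqExpr is also a use."""
    defined = {seq.bindings[i].var for i in region}
    used = outputs_bindings_only(seq, region)
    if seq.body in defined:
        used.add(seq.body)
    return used


# A function whose captured region ends with the last kernel launch,
# and whose result is returned directly as the function output.
seq = SeqExpr(
    bindings=[
        Binding("out_buf", "R.builtin.alloc_tensor", shape=(16, 16)),
        Binding("result", "kernel", args=("x", "out_buf")),
    ],
    body="result",
)
region = {0, 1}

static_out = is_static_alloc(seq.bindings[0])
dynamic_out = is_static_alloc(
    Binding("dyn_buf", "R.builtin.alloc_tensor", shape=("n", 16)))
missed = outputs_bindings_only(seq, region)
found = outputs_with_body(seq, region)
```

In this model, `outputs_bindings_only` reports no outputs for the region, so `result` would be left undefined in the enclosing function, whereas `outputs_with_body` reports `result` because it also looks at the body.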