Lunderberg opened a new pull request, #15839:
URL: https://github.com/apache/tvm/pull/15839

   Prior to this commit, the last kernel launch would not be included in
   a captured CUDA graph.  This commit updates `RewriteCUDAGraph` to
   include the last kernel launch.
   
   The previous implementation assumed that any calls to
   `R.builtin.alloc_tensor` that remain after `StaticPlanBlockMemory` are
   dynamic allocations.  This is not the case, as the allocation of a
   static-shaped output tensor may still use `R.builtin.alloc_tensor`.
   The primary change of this commit is to update `RewriteCUDAGraph` to
   check directly whether an allocation is static, rather than inferring
   this from the operation being used.
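
   As a rough illustration of the difference between the two checks, the
   sketch below uses a toy model in plain Python, not the TVM API.  The
   `Alloc` class and both helper functions are hypothetical names
   introduced here, and a string dimension stands in for a symbolic shape
   variable.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Toy model only: an int is a known extent, a str is a symbolic
# shape variable such as "n".  None of these names come from TVM.
Dim = Union[int, str]


@dataclass(frozen=True)
class Alloc:
    op: str                  # name of the allocating operation
    shape: Tuple[Dim, ...]   # shape of the allocated tensor


def is_static_by_op(alloc: Alloc) -> bool:
    """Old-style heuristic: any remaining alloc_tensor is assumed dynamic."""
    return alloc.op != "R.builtin.alloc_tensor"


def is_static_by_shape(alloc: Alloc) -> bool:
    """Direct check: static if and only if every dimension is a known int."""
    return all(isinstance(dim, int) for dim in alloc.shape)


# A static-shaped output tensor that still uses alloc_tensor.
output_alloc = Alloc("R.builtin.alloc_tensor", (16, 16))
# A genuinely dynamic allocation with a symbolic dimension.
dynamic_alloc = Alloc("R.builtin.alloc_tensor", ("n", 16))

print(is_static_by_op(output_alloc))      # heuristic misclassifies it
print(is_static_by_shape(output_alloc))   # direct check accepts it
print(is_static_by_shape(dynamic_alloc))  # direct check rejects it
```

   In this toy model the op-based heuristic treats the static-shaped
   output allocation as dynamic, while the shape-based check separates
   the two cases.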
   
   This change exposed an additional bug: the previous
   implementation only checked for output variables if they occurred as
   part of a `VarBinding`, and not if they occurred as the body of a
   `SeqExpr`.  As a result, a captured CUDA graph whose output was
   immediately used as the output of the containing Relax function would
   contain an undefined variable.  This commit updates `RewriteCUDAGraph`
   to operate on a `SeqExpr` rather than a `BindingBlock`, so that the
   `SeqExprNode::body` may be inspected for output variables.
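
   The effect of inspecting the body can be sketched with another toy
   model in plain Python.  The `Binding` and `SeqExpr` classes below are
   simplified stand-ins rather than the TVM classes, and
   `region_outputs` is a hypothetical helper, not the pass's actual
   implementation.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Binding:
    var: str                # variable defined by this binding
    uses: Tuple[str, ...]   # variables read by the bound expression


@dataclass
class SeqExpr:
    bindings: List[Binding]
    body: str               # variable returned by the sequence


def region_outputs(seq: SeqExpr, region: Set[str], include_body: bool) -> Set[str]:
    """Return the variables defined in `region` that are needed outside it."""
    used_outside: Set[str] = set()
    for binding in seq.bindings:
        if binding.var not in region:
            used_outside.update(binding.uses)
    if include_body:
        # The body is also a use site, even though it is not a binding.
        used_outside.add(seq.body)
    return region & used_outside


# a = f(x); b = g(a); return b, with both bindings captured.
seq = SeqExpr(
    bindings=[Binding("a", ("x",)), Binding("b", ("a",))],
    body="b",
)
captured = {"a", "b"}

print(region_outputs(seq, captured, include_body=False))  # misses "b"
print(region_outputs(seq, captured, include_body=True))   # finds "b"
```

   When only the bindings are scanned, the captured region appears to
   have no outputs, so `b` would be undefined in the enclosing function.
   Treating the body as a use site makes `b` an output of the region.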

