tqchen edited a comment on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-920896447


   So in the above post I tried to summarize the state. Now let me try to share 
some of my thoughts based on the summary.
   
   First of all, R0 and R1 are not that different in nature. Both try to 
introduce two separate scopes that bring different behavior. The main 
question boils down to how we name the "global" scope.
   
   Per allocate semantics, we treat "global" as normal CPU memory, which can 
come from the stack or from a platform-specific allocation. The system can 
choose the best way of doing such lowering. Always lowering to TVMBAW is indeed 
more general for the need of N1. However, the need N0 would favor stack 
allocation when possible. Note that we will likely need a related behavior for 
micro devices as well when generating operator kernels.
   
   While it is OK to differentiate stack-allocated memory from 
platform-specific memory, doing so would bring additional burden to the user 
and would require a significant refactor of the operator implementations.
   
   The main requests so far come from the need of N1. In that case, it would 
be easy for the AOT generator to allocate memory with a special tag 
("global.workspace") that enforces workspace allocation, since in this setting 
there is a single expected behavior.
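   
   To make the intended split concrete, here is a rough sketch of the lowering 
decision described above. It is only an illustration: `choose_lowering` and 
`STACK_LIMIT_BYTES` are hypothetical names, not TVM APIs, and the size 
threshold is an assumed value. The only scope names taken from this discussion 
are "global" and "global.workspace".

```python
# Hypothetical sketch of the proposed semantics; none of these names are TVM APIs.
from typing import Optional

STACK_LIMIT_BYTES = 1024  # assumed threshold, chosen only for illustration


def choose_lowering(scope: str, size_bytes: Optional[int]) -> str:
    """Return "stack" or "workspace" for an allocation in the given scope.

    size_bytes is None when the size is not known at compile time.
    """
    if scope == "global.workspace":
        # Tagged allocations have a single expected behavior: workspace.
        return "workspace"
    if scope == "global":
        # Plain "global" leaves the choice to the system: prefer the stack
        # for small, statically sized buffers, otherwise use the workspace.
        if size_bytes is not None and size_bytes <= STACK_LIMIT_BYTES:
            return "stack"
        return "workspace"
    raise ValueError(f"unhandled storage scope: {scope!r}")
```

   Under these assumptions, an AOT generator would emit "global.workspace" for 
every buffer it needs to control, while operator kernels keep using "global" 
unchanged.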
   
   So my suggestion would be R1+R2, as it helps to resolve the need in a way 
that is compatible with the current semantics and use cases. It will also open 
the door to more scope-dependent optimizations in the future.



