tqchen commented on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-920896447


   So in the above post I tried to summarize the state. Now let me try to share
   some of my thoughts based on the summary.
   
   First of all, R0 and R1 are not that different in nature. Both try to introduce
   two separate scopes that bring different behavior. The main question boils down
   to how we name the "global" scope.
   
   Per the allocate semantics, we treat "global" as normal CPU memory, which can come
   from the stack or from a platform-specific allocation. The system can choose the
   best way of doing such lowering. However, memory that is accessible from the NPU
   is more specialized and could use a special memory tag for differentiation
   purposes.
   
   While it is OK to differentiate stack-allocated memory from platform-specific
   memory, doing so would place an additional burden on the user and would require
   a significant refactor of the operator implementations.
   
   Note that we will likely need related behavior for micro devices as well, to
   address N0. The main requests so far come from N1. In that case, it would be
   easy for the AOT generator to allocate memory with a special tag
   ("global.workspace") that enforces workspace allocation, since in this setting
   there is a single expected behavior.
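
   To make the distinction concrete, here is a minimal sketch of how a lowering
   step could dispatch on the scope tag. It is plain Python and not TVM code: the
   `Allocate` class, `lower_allocate`, and the `MAX_STACK_BYTES` threshold are all
   hypothetical names made up for illustration, and the size-based choice for
   "global" is only one possible policy.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; not a TVM constant.
MAX_STACK_BYTES = 1024


@dataclass
class Allocate:
    """Toy stand-in for an allocation node; not the TVM class."""
    name: str
    nbytes: int
    scope: str = "global"


def lower_allocate(alloc: Allocate, max_stack_bytes: int = MAX_STACK_BYTES) -> str:
    """Return the allocation strategy chosen for `alloc`."""
    if alloc.scope == "global.workspace":
        # The special tag leaves no choice: always use the workspace allocator.
        return "workspace"
    if alloc.scope == "global":
        # Plain "global": the system is free to pick what it considers best.
        # Here the assumed policy is small buffers on the stack, the rest in
        # the workspace.
        return "stack" if alloc.nbytes <= max_stack_bytes else "workspace"
    raise ValueError(f"unhandled storage scope: {alloc.scope!r}")


if __name__ == "__main__":
    for a in (
        Allocate("small", 64),
        Allocate("large", 4096),
        Allocate("aot_buf", 64, scope="global.workspace"),
    ):
        print(a.name, a.scope, "->", lower_allocate(a))
```

   The point is that existing operator code that only says "global" keeps working
   unchanged, while a generator that needs one fixed behavior opts in through the
   more specific tag.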
   
   So my suggestion would be R1+R2, as it helps to resolve the need in a way that
   is compatible with the current semantics and use cases. It will also open the
   door to more scope-dependent optimizations in the future.

