tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-920896447
So in the above post I tried to summarize the state. Now let me share some of my thoughts based on that summary.

First of all, R0 and R1 are not that different in nature. Both try to introduce two separate scopes that bring different behavior. The main question boils down to how we name the "global" scope. Per allocate semantics, we treat "global" as normal CPU memory, which can come from the stack or from a platform-specific allocation. The system can choose the best way of doing such lowering. However, memory that is accessible from the NPU is more specialized and could use a special memory tag for differentiation purposes.

While it is OK to differentiate stack-allocated memory from platform-specific memory, doing so would put an additional burden on the user and would require a significant refactor of the operator implementations. Note that we will likely need related behavior for micro devices as well, for the need of N0.

The main requests so far come from the need of N1. In that case, it would be easy for the AOT generator to allocate memory with a special tag ("global.workspace") that enforces workspace allocation, since in this setting there is a single expected behavior.

So my suggestion would be R1+R2, as it helps to resolve the need in a way that is compatible with the current semantics and use cases. It will also open the door to more scope-dependent optimizations in the future.
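
To make the scope distinction concrete, here is a minimal sketch in plain Python of how a lowering step might treat the two tags. It is not TVM code: `lower_allocate`, `STACK_LIMIT_BYTES`, and the returned strategy names are hypothetical, and it assumes the platform-specific allocation path is the same as the workspace path.

```python
# Hypothetical illustration only; none of these names exist in TVM.
STACK_LIMIT_BYTES = 1024  # assumed threshold, chosen purely for the example


def lower_allocate(scope: str, nbytes: int) -> str:
    """Return the allocation strategy chosen for a buffer with the given scope tag."""
    if scope == "global.workspace":
        # Special tag: workspace allocation is enforced regardless of size.
        return "workspace"
    if scope == "global":
        # Plain "global": the system is free to pick the stack or the
        # platform-specific (workspace) allocation, whichever it judges best.
        return "stack" if nbytes <= STACK_LIMIT_BYTES else "workspace"
    raise ValueError(f"unhandled storage scope: {scope}")
```

Under this sketch, an AOT generator that always needs workspace memory would emit "global.workspace", while existing operator implementations that use "global" would stay unchanged.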