manupa-arm commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-921782373
@tqchen , > To say it in another way, we cannot say that "global" definitely mean no stack allocation. The current issue is in the device "CPU" && 'global' certainly means its definitely stack allocation if its less that heuristic size and not the other way around. > If the code needs to impose additional constraint that the memory must be accessible from a separate device(e.g. NPU), it certainly would require a more specialized constraint that is better spelled out explicitly. > As we can see that this is another kind of flexibility we want to enable here -- flexibility of picking possible backend allocation implementations without over constraining the code generator to a backend specific behavior that is platform dependent (like the case of pinned memory Yes this is something we want eventually and we will be working towards achieving with USMP work. Until we have that, the natural assumption should be in absense of a 'constraint' that the memories are more accessible rather than being less accessible (e.g. stack). Its unfortunate that the current design prefers the latter especially in a absense of a constraint. @mbs-octoml , ### Short term solution : I think you are right, we might want to unblock this using a target-dependent kMaxStackAllocaSize. May I ask why was the default chosen to be this ? https://github.com/apache/tvm/blob/1fd8f610953adc39cbd18d82f4a9e92a11575dfc/include/tvm/runtime/device_api.h#L60-L61 Its interesting because the stack size go up beyond that size as it is just looking on a single allocate at a time. i.e. you could have multiple allocates that are less than < 1024. So the stack usage is not even bounded by the current approach. Therefore, to both unlock us with Ethos-U and also somewhat solve the problem that current micro builds using stack for tensors < 1024 instead of the workspace buffer provided, maybe we should just make kMaxStackAllocaSize=0 (a binary decision rather than a value range). @Mousius @leandron @areusch , this means there is going to be another argument for a simple micro deployment to be added to the already long list of arguments. Something like "--use-external-workspace" ? @tqchen , I still feel it would have been super helpful that kMaxStackAllocaSize is by default zero but with the option of going higher based on a user argument. e.g. --max-stack-alloca-size=1024. It is not very convincing that we are leaving out stack allocation of tensors with the prospect of being optimized by mem2reg without doing any marking (i.e. special storage scope) in a somewhat global form. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org