manupa-arm commented on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-921782373


   @tqchen ,
   
   > To say it in another way, we cannot say that "global" definitely mean no 
stack allocation.
   
   The current issue is that, for the "CPU" device, 'global' definitely means 
stack allocation whenever the size is below the heuristic threshold, and not 
the other way around.
   
   > If the code needs to impose additional constraint that the memory must be 
accessible from a separate device(e.g. NPU), it certainly would require a more 
specialized constraint that is better spelled out explicitly.
   
   > As we can see that this is another kind of flexibility we want to enable 
here -- flexibility of picking possible backend allocation implementations 
without over constraining the code generator to a backend specific behavior 
that is platform dependent (like the case of pinned memory).
   
   Yes, this is something we want eventually, and we will be working towards 
it with the USMP work.
   
   Until we have that, the natural assumption in the absence of a 
'constraint' should be that memories are more accessible rather than less 
accessible (e.g. stack). It's unfortunate that the current design prefers the 
latter, especially in the absence of a constraint.
   
   @mbs-octoml ,
   
   ### Short-term solution:
   I think you are right, we might want to unblock this using a 
target-dependent kMaxStackAllocaSize. 
   
   May I ask why the default was chosen to be this?
   
https://github.com/apache/tvm/blob/1fd8f610953adc39cbd18d82f4a9e92a11575dfc/include/tvm/runtime/device_api.h#L60-L61
   
   It's interesting because the stack usage can go beyond that size, as the 
check only looks at a single allocate at a time, i.e. you could have multiple 
allocates that are each less than 1024. So the stack usage is not even bounded 
by the current approach.
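
To illustrate the point about unbounded stack usage, here is a minimal Python sketch. It is a toy model, not TVM code: `goes_on_stack` and `total_stack_bytes` are hypothetical helpers, the threshold is assumed to be in bytes, and all allocations are assumed to be live at the same time.

```python
# Toy model of a per-allocation stack heuristic; NOT the TVM implementation.
# Assumption: the threshold is compared against each allocation on its own.
MAX_STACK_ALLOCA_SIZE = 1024  # assumed to be in bytes


def goes_on_stack(nbytes, threshold=MAX_STACK_ALLOCA_SIZE):
    """Return True if a single allocation is below the threshold."""
    return 0 < nbytes < threshold


def total_stack_bytes(alloc_sizes, threshold=MAX_STACK_ALLOCA_SIZE):
    """Sum the sizes of all allocations the heuristic places on the stack.

    Assumes every allocation is live simultaneously (worst case).
    """
    return sum(n for n in alloc_sizes if goes_on_stack(n, threshold))


# Three allocations under the threshold, one above it.
allocs = [1000, 1000, 1000, 2048]
print(total_stack_bytes(allocs))               # 3000, well above 1024
print(total_stack_bytes(allocs, threshold=0))  # 0, nothing goes on the stack
```

Each of the first three allocations passes the check individually, yet together they use roughly three times the threshold.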
   
   Therefore, to both unblock us with Ethos-U and also somewhat solve the 
problem that current micro builds use the stack for tensors < 1024 instead of 
the workspace buffer provided, maybe we should just make kMaxStackAllocaSize=0 
(a binary decision rather than a value range).
   
   @Mousius @leandron @areusch , this means another argument will have to be 
added to the already long list of arguments for a simple micro deployment. 
Something like "--use-external-workspace"?
   
   @tqchen , I still feel it would be super helpful if kMaxStackAllocaSize were 
zero by default, with the option of going higher based on a user argument, 
e.g. --max-stack-alloca-size=1024. It is not very convincing that we are 
leaving tensors to be stack-allocated, with the prospect of them being 
optimized by mem2reg, without doing any marking (i.e. a special storage scope) 
and in a somewhat global form.
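
As a rough sketch of what such an opt-in could look like, the Python below models a threshold that defaults to zero and can be raised by the user. The `--max-stack-alloca-size` flag is only the name proposed above and does not exist in any TVM tool as far as this comment states. `build_parser` and `placement` are hypothetical helpers, and sizes are assumed to be in bytes.

```python
import argparse


def build_parser():
    """Parser with a hypothetical flag; the default of 0 disables stack allocation."""
    parser = argparse.ArgumentParser(description="hypothetical build options")
    parser.add_argument(
        "--max-stack-alloca-size",
        type=int,
        default=0,
        help="allocations strictly below this size (assumed bytes) may use the stack",
    )
    return parser


def placement(nbytes, max_stack_alloca_size):
    """Return where a tensor of nbytes would be placed under this toy rule."""
    if 0 < nbytes < max_stack_alloca_size:
        return "stack"
    return "workspace"


# Default: no flag given, so every tensor goes to the workspace buffer.
default_args = build_parser().parse_args([])
print(placement(512, default_args.max_stack_alloca_size))  # workspace

# Opt-in: the user explicitly allows small tensors on the stack.
opt_in_args = build_parser().parse_args(["--max-stack-alloca-size=1024"])
print(placement(512, opt_in_args.max_stack_alloca_size))   # stack
```

With a zero default the choice is binary unless the user asks otherwise, which is the behaviour argued for above.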
   
   
   


