junrushao commented on PR #16111:
URL: https://github.com/apache/tvm/pull/16111#issuecomment-1807330295

   I was thinking about the following case: allocating a tensor of shape
`(n,)`, where the upper bound of `n` is huge, e.g. 128k, while the actual
runtime value of `n` is usually small, e.g. 1k. This could happen as context
lengths become very large in LLMs. In this case, static planning will always
return a tensor of length 128k, no matter what `n` is.
   
   As compiler infra, it cannot always assume runtime use cases without
specific hints/annotations. In our particular case, that means always
allocating upper-bound memory while assuming nothing about its lifetime. In my
view, this could be suboptimal if the caller of the Relax function indefinitely
extends the life span of the returned value: when the caller keeps multiple
tensors with small `n`, each one still holds an upper-bound-sized buffer,
leading to unnecessary overuse of memory.
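
   To put rough numbers on this, here is a back-of-the-envelope sketch in plain
Python. It does not use any TVM API. The element size (4 bytes, i.e. float32),
the count of 8 retained tensors, and all function names are assumptions made
for illustration only.

```python
# Illustrative arithmetic only; none of these names are TVM APIs.
UPPER_BOUND_N = 128 * 1024  # upper bound of n ("128k")
BYTES_PER_ELEM = 4          # assumption: float32 elements


def static_alloc_bytes(n: int) -> int:
    """Upper-bound planning: buffer size depends only on the bound, not on n."""
    if n > UPPER_BOUND_N:
        raise ValueError("n exceeds the declared upper bound")
    return UPPER_BOUND_N * BYTES_PER_ELEM


def dynamic_alloc_bytes(n: int) -> int:
    """Allocation sized by the actual runtime value of n."""
    return n * BYTES_PER_ELEM


def retained_bytes(ns, alloc) -> int:
    """Total memory held when the caller keeps every returned tensor alive."""
    return sum(alloc(n) for n in ns)


# Assumption: the caller keeps 8 results alive, each with n = 1k.
held = [1024] * 8
static_total = retained_bytes(held, static_alloc_bytes)
dynamic_total = retained_bytes(held, dynamic_alloc_bytes)
print(static_total, dynamic_total, static_total // dynamic_total)
```

   Under these assumptions the retained memory differs by the ratio of the
upper bound to the actual `n` (128x here), and the gap grows linearly with the
number of tensors the caller keeps alive.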


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
