junrushao commented on PR #16111: URL: https://github.com/apache/tvm/pull/16111#issuecomment-1807330295
I was thinking about the following case: allocating a tensor of shape `(n,)`, where the upper bound of `n` is huge, e.g. 128k, while the actual runtime value of `n` is usually small, e.g. 1k. This could happen in LLMs as context lengths become very large. In this case, static planning will always return a tensor of length 128k, no matter what `n` is.

As compiler infra, we cannot assume particular runtime use cases without specific hints or annotations. In our particular case, that means always allocating upper-bound memory without assuming anything about its lifetime. From my point of view, this could be suboptimal if the caller of the Relax function indefinitely extends the lifetime of the returned value: when the caller keeps multiple results with small `n`, each of them holds on to an upper-bound-sized allocation, leading to unnecessary overuse of memory.
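To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in plain Python. It does not use any TVM API. The helper `retained_bytes`, the float32 element size, and the count of 100 retained results are all assumptions made up for illustration; only the 128k upper bound and the 1k runtime value come from the case described above.

```python
# Illustrative model only: no TVM APIs, all names here are hypothetical.
UPPER_BOUND = 128 * 1024   # upper bound of n from the example above (128k)
BYTES_PER_ELEM = 4         # assumption: float32 elements


def retained_bytes(runtime_ns, upper_bound=None, bytes_per_elem=BYTES_PER_ELEM):
    """Bytes held by the results that the caller keeps alive.

    upper_bound=None models allocating exactly n elements per result.
    Otherwise every result occupies upper_bound elements, whatever n is.
    """
    if upper_bound is None:
        return sum(n * bytes_per_elem for n in runtime_ns)
    if any(n > upper_bound for n in runtime_ns):
        raise ValueError("runtime n exceeds the declared upper bound")
    return len(runtime_ns) * upper_bound * bytes_per_elem


# Assumption: the caller keeps 100 returned tensors alive, each with n = 1k.
kept = [1024] * 100
static_bytes = retained_bytes(kept, upper_bound=UPPER_BOUND)
dynamic_bytes = retained_bytes(kept)

print(static_bytes // 1024, "KiB held with upper-bound allocation")
print(dynamic_bytes // 1024, "KiB held with exact-size allocation")
print("ratio:", static_bytes // dynamic_bytes)
```

Under these assumptions the retained results hold 50 MiB with upper-bound allocation versus 400 KiB with exact-size allocation, a factor of 128, which is simply the ratio between the upper bound and the actual `n`.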
