junrushao commented on PR #16111:
URL: https://github.com/apache/tvm/pull/16111#issuecomment-1811882801

   Thanks for getting back to me @MasterJH5574! I believe C1 and C2 make our 
points crystal clear: C1 is the case where over-allocated memory is kept for an 
indefinite life span, and C2 is the case of immediate memory recycling with 
the pooled allocator.
   
   Now moving to the discussion on S1, S2 and S3, where S1 and S2 are based on 
static analysis and S3 is purely runtime. The point I'd like to make here is 
that static analysis may not be sufficient once dynamism is involved, and a 
hybrid approach may eventually be desirable instead.
   
   To give a specific example in LLMs:
   - The compiler and the runtime need to work together to find the 
optimal combination of `max_sequence_length` and `prefill_chunk_size` under a 
certain memory constraint;
   - The memory constraint, such as 6GB, is not known at compilation time;
   - Repetitive re-compilation is not desirable.
   
   If the problem were scoped specifically to Llama2-7B, it would be 
relatively easy to resolve: static analysis gives a function 
`f(max_sequence_length, prefill_chunk_size, ...)` that returns an upper bound 
on the RAM needed, and the runtime figures out the maximum 
`max_sequence_length` that satisfies the memory constraint. In this case, 
purely static analysis won't work on its own, and a purely runtime approach 
would more or less lead to memory waste.
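   To make the division of labor concrete, here is a minimal sketch of that hybrid scheme. The memory function, its cost terms, and all constants below are hypothetical placeholders (not actual Llama2-7B figures, nor an existing TVM API): static analysis would emit the upper-bound function, and the runtime binary-searches for the largest `max_sequence_length` that fits the device budget.

   ```python
   # Hypothetical f() produced by static analysis: an upper bound (in bytes)
   # on memory use for a given configuration. The cost model here is made up
   # purely for illustration.
   def memory_upper_bound(max_sequence_length: int, prefill_chunk_size: int) -> int:
       weights = 4 * 2**30                          # fixed model weights (placeholder)
       kv_cache = max_sequence_length * 512 * 1024  # KV cache grows with sequence length
       workspace = prefill_chunk_size * 256 * 1024  # prefill scratch memory
       return weights + kv_cache + workspace


   # Runtime side: given the actual device budget (unknown at compile time),
   # binary-search the largest max_sequence_length whose upper bound fits.
   def max_seq_len_under_budget(budget_bytes: int, prefill_chunk_size: int) -> int:
       lo, hi = 0, 1 << 20
       while lo < hi:
           mid = (lo + hi + 1) // 2
           if memory_upper_bound(mid, prefill_chunk_size) <= budget_bytes:
               lo = mid  # still fits; try a longer sequence
           else:
               hi = mid - 1
       return lo
   ```

   The point of the sketch is that no re-compilation is needed: the compiler ships `f` once, and each deployment solves the constraint against its own memory budget at startup.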
   
   Within the scope of this PR, things don't have to be that complicated, and 
we don't need a perfect solution yet. Agreeing with your assessment: if the 
upstream framework could instruct the compiler to apply a certain 
upper-bound-based approach via function attributes/annotations, that should be 
sufficient for LLM serving so far.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
