zhuwenxi edited a comment on issue #7246:
URL: https://github.com/apache/tvm/issues/7246#issuecomment-759957921


   > Thank you @zhuwenxi! This is indeed an issue that we need to resolve. 
The main problem is that the stack used for the parallel packed call is 
lifted outside of the parallel for block during PackedCall lowering.
   > 
   > We will need to think about ways to improve the packed call handling to 
avoid lifting such allocations outside of the parallel for block.
   
   @tqchen, thanks for the reply!
   
   Despite the race condition in a parallel schedule, I think allocating the 
stack outside of (parallel) loops does have a performance advantage: it 
creates a single shared stack that multiple packed func calls can reuse, so 
they don't each need to create and allocate their own stack.
   
   So my point is that putting the stack allocation outside of the for loop is 
OK; we just need to give special treatment to packed funcs inside parallel for 
loops.
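
To make the idea concrete, here is a minimal Python sketch of the two cases. It is only an illustration, not TVM code: `packed_call`, `run_serial`, and `run_parallel` are hypothetical names, and a one-element list stands in for the packed-call argument stack. In the serial loop the stack is allocated once and reused; in the parallel loop each iteration gets its own stack, so no two workers write to the same buffer.

```python
from concurrent.futures import ThreadPoolExecutor


def packed_call(stack, value):
    """Hypothetical stand-in for a packed func call.

    The argument is written into the stack and then read back,
    which is where a shared stack would race under parallelism.
    """
    stack[0] = value
    return stack[0] * 2


def run_serial(n):
    # Serial loop: the stack is hoisted out of the loop and
    # shared by every call, so it is allocated only once.
    stack = [0]
    return [packed_call(stack, i) for i in range(n)]


def run_parallel(n, workers=4):
    def body(i):
        # Parallel loop: the stack stays inside the loop body,
        # so each iteration works on a private buffer.
        stack = [0]
        return packed_call(stack, i)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))
```

Both functions should return the same results; the only difference is where the stack lives. A per-worker stack (instead of per-iteration) would keep more of the reuse benefit, but that is a further assumption not covered by this sketch.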
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

