zhuwenxi edited a comment on issue #7246: URL: https://github.com/apache/tvm/issues/7246#issuecomment-759957921
> Thank you @zhuwenxi! This is indeed an issue that we need to work to resolve. The main problem was the stack used for the parallel packed call being lifted outside of the parallel for block during PackedCall lowering.
>
> We will need to think about ways to improve the packed call handling to avoid lifting such allocation outside of the parallel for block.

@tqchen, thanks for the reply! Setting aside the race condition in a parallel schedule, I think allocating the stack outside of (parallel) loops does have a performance advantage: it creates one shared stack that multiple packed func calls can reuse, so they don't each need to create and allocate their own. So my point is that putting the stack allocation outside of the for loop is OK; we just need special treatment for packed func calls inside parallel for loops.
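
To make the trade-off concrete, here is a minimal Python sketch of the three situations discussed above. It is not TVM code: `packed_add_one` and the `run_*` helpers are hypothetical names made up for illustration, the "stack" is a one-slot list standing in for the packed call argument stack, and the racy case is simulated with a fixed worst-case interleaving rather than real threads so the outcome is reproducible.

```python
from concurrent.futures import ThreadPoolExecutor


def packed_add_one(arg_stack):
    # Hypothetical stand-in for a packed func: reads its argument from slot 0.
    return arg_stack[0] + 1


def run_serial_shared_stack(values):
    # Stack hoisted outside a serial loop: one allocation, safely reused,
    # because each call finishes before the next iteration overwrites slot 0.
    shared_stack = [None]
    results = []
    for v in values:
        shared_stack[0] = v
        results.append(packed_add_one(shared_stack))
    return results


def run_shared_stack_interleaved(values):
    # Stack hoisted outside a "parallel" loop. We simulate the worst-case
    # interleaving: every iteration writes its argument before any callee reads.
    shared_stack = [None]
    for v in values:
        shared_stack[0] = v
    return [packed_add_one(shared_stack) for _ in values]


def run_per_worker_stack(values, num_workers=4):
    # Special treatment for the parallel case: allocate the stack inside the
    # loop body, so each iteration owns a private copy.
    def body(v):
        local_stack = [v]
        return packed_add_one(local_stack)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(body, values))


values = [10, 20, 30, 40]
serial_result = run_serial_shared_stack(values)
shared_result = run_shared_stack_interleaved(values)
private_result = run_per_worker_stack(values)
```

Under these assumptions, the hoisted stack is correct for the serial loop, every call in the interleaved case sees only the last value written, and the per-iteration stack restores the expected results at the cost of one allocation per iteration.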