zhuwenxi edited a comment on issue #7246: URL: https://github.com/apache/tvm/issues/7246#issuecomment-759976432
> Possible way to resolve the issue:
>
> * Introduce the packed_arg_alloca intrinsic that is only guaranteed to be valid for the specific packed func call
>
> * Skip the lifting alloca step, and keep alloca always next to the func call
> * Update LLVM codegen to insert alloca always to the beginning of the current function block
> * Update StackVM and C codegen to support things accordingly

If I understand correctly, you want to introduce a special "packed_arg_alloca" tir intrinsic and make sure all backends implement it? Correct me if I'm wrong :)

As I mentioned above, the root cause of this problem is that the **tir** lowering for a packed func in a parallel for loop is not thread-safe. So have you considered fixing it at the tir level, using existing TVM IRs? That way, no new tir type and no corresponding backend codegen implementations are required.

This is what I propose to fix the problem: re-allocate the stack next to the packed func call, but only inside the parallel for loop.

![image](https://user-images.githubusercontent.com/4969797/104556407-84444e80-567a-11eb-931f-c24677709786.png)

I've already tried the fix and confirmed that this approach works.

(I understand that the re-allocation violates the SSA constraint, but that can be avoided easily by giving the re-allocated stacks distinct names, such as "stack_value_1", "stack_value_2".)
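
To illustrate why a single hoisted stack is unsafe and why a per-iteration stack avoids the problem, here is a minimal toy model in plain Python. It is only a sketch: it does not use any TVM API, the names `packed_func`, `run_shared_stack`, and `run_per_iteration_stack` are hypothetical, and the thread interleaving is simulated with an explicit schedule so that the outcome is deterministic.

```python
# Toy model only: plain Python lists stand in for the argument stack that the
# packed-func lowering allocates. None of these names are TVM APIs.

def packed_func(stack):
    """Hypothetical packed function: reads its single argument from the stack."""
    return stack[0] * 10


def run_shared_stack(schedule):
    """One stack allocated outside the parallel loop and shared by all iterations."""
    stack_value = [None]              # allocated once, before the loop
    results = {}
    for step, i in schedule:
        if step == "store":
            stack_value[0] = i        # iteration i writes its argument
        else:                         # step == "call"
            results[i] = packed_func(stack_value)
    return results


def run_per_iteration_stack(schedule):
    """A distinct stack per iteration, allocated next to the call."""
    stacks = {}                       # plays the role of stack_value_1, stack_value_2, ...
    results = {}
    for step, i in schedule:
        if step == "store":
            stacks[i] = [i]           # re-allocate for this iteration only
        else:                         # step == "call"
            results[i] = packed_func(stacks[i])
    return results


# An interleaving that two worker threads could produce: both iterations
# store their argument before either one makes the call.
interleaved = [("store", 0), ("store", 1), ("call", 0), ("call", 1)]

shared = run_shared_stack(interleaved)
private = run_per_iteration_stack(interleaved)
print(shared)    # iteration 0 sees iteration 1's argument
print(private)   # each iteration sees its own argument
```

With the shared stack, iteration 0 reads the argument that iteration 1 stored, so both calls return the same value. With one stack per iteration, each call reads only its own argument, which is the property the re-allocation inside the parallel loop is meant to restore.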