@KellenSunderland The problem is that it is impossible to predict which stream will be chosen for the next operator, and issuing waits on all streams would mean that you never get any parallel execution. To choose the right stream to wait on, the wait has to be issued from the second op, not the first (and then you basically end up with this proposal).
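
To make the trade-off concrete, here is a toy model in plain Python. It is only a sketch, not MXNet or CUDA code: the op names, the stream assignment, and both helper functions are hypothetical, and it only tracks explicit cross-stream waits (ordering within a single stream is implicit and left out).

```python
def producer_side_waits(launch_order, stream_of, deps):
    """Waits issued by the FIRST op (the producer).

    The producer does not know which stream its consumer will get, so it
    must make every other stream wait. Every later op on another stream
    ends up ordered after the producer, whether it depends on it or not.
    """
    producers = {p for ps in deps.values() for p in ps}
    forced = set()
    for i, op in enumerate(launch_order):
        if op not in producers:
            continue
        for later in launch_order[i + 1:]:
            if stream_of[later] != stream_of[op]:
                forced.add((op, later))
    return forced


def consumer_side_waits(launch_order, stream_of, deps):
    """Waits issued by the SECOND op (the consumer).

    The consumer's stream is known at this point, so only that stream
    waits, and only on the producers it actually depends on.
    """
    forced = set()
    for op in launch_order:
        for producer in deps.get(op, []):
            if stream_of[producer] != stream_of[op]:
                forced.add((producer, op))
    return forced


# Hypothetical graph: b consumes a's output, c is independent of both.
launch_order = ["a", "b", "c"]
stream_of = {"a": 0, "b": 1, "c": 2}
deps = {"b": ["a"]}

from_first_op = producer_side_waits(launch_order, stream_of, deps)
from_second_op = consumer_side_waits(launch_order, stream_of, deps)
```

In this model, waiting from the first op also orders the independent op `c` behind `a`, while waiting from the second op only adds the `a` to `b` dependency that is actually needed.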
