tqchen commented on PR #16689:
URL: https://github.com/apache/tvm/pull/16689#issuecomment-1995926307

   Agreed that this makes things more relaxed. On the other hand, from the 
device API's point of view, we don't really guarantee sync behavior in other 
DeviceAPIs, e.g. Metal or Vulkan.
   
   
   - In most GPU APIs, copies from/to the host and copies across devices are 
all async (e.g. in Metal or Vulkan).
   - The default CUDA sync behavior of copyfromto was actually limited mainly 
to the default stream.
   
   One possible middle ground is to update CopyTo to always perform a 
StreamSync before it returns. This would preserve the original usage of 
CopyTo while still allowing the low-level device API to use async copies, 
which generally provide more optimization opportunities.
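
To make the layering concrete, here is a toy sketch of the idea. It is not TVM code: `ToyAsyncDevice`, `copy_data_from_to`, `stream_sync`, and `copy_to` are hypothetical names chosen for illustration, and the "stream" is just an in-memory queue. It assumes the low-level copy only enqueues work, and that the high-level wrapper syncs before returning.

```python
from collections import deque


class ToyAsyncDevice:
    """Hypothetical stand-in for a low-level device API with async copies."""

    def __init__(self):
        # Copies that were enqueued on the "stream" but have not run yet.
        self._pending = deque()

    def copy_data_from_to(self, src, dst):
        # Async: only enqueue the copy; dst is not updated at this point.
        self._pending.append((src, dst))

    def stream_sync(self):
        # Drain the queue, performing every pending copy in order.
        while self._pending:
            src, dst = self._pending.popleft()
            dst[:] = src


def copy_to(device, src, dst):
    """High-level copy: enqueue asynchronously, then sync before returning."""
    device.copy_data_from_to(src, dst)
    device.stream_sync()
    return dst
```

In this sketch, a caller of `copy_to` always sees the copied data on return, while code that calls `copy_data_from_to` directly can batch several copies and call `stream_sync` once.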
   
   

