tqchen commented on PR #16689: URL: https://github.com/apache/tvm/pull/16689#issuecomment-1995926307
Indeed, I agree that this makes things more relaxed. On the other hand, from the device API's point of view, we don't really guarantee sync behavior in other DeviceAPIs, e.g. Metal or Vulkan:

- In most GPU APIs, copies from/to host and across devices are both async (e.g. in Metal or Vulkan).
- The default CUDA sync behavior of copyfromto was actually mainly limited to the default stream.

One possible middle ground is to update CopyTo to always run a StreamSync before CopyTo ends. This would preserve the original usage of CopyTo while still allowing the low-level device API to use async copies, which generally provide more optimization opportunities.
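To make the proposed middle ground concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not TVM code: `ToyDeviceAPI`, `copy_data_from_to`, `stream_sync`, and `copy_to` are hypothetical stand-ins for the low-level device API and the high-level CopyTo, and the "stream" is just an in-memory queue.

```python
from collections import deque


class ToyDeviceAPI:
    """Toy stand-in for a low-level device API (hypothetical, not TVM's DeviceAPI)."""

    def __init__(self):
        # Copies that were enqueued on the "stream" but have not run yet.
        self._pending = deque()

    def copy_data_from_to(self, src, dst):
        # Async copy: only enqueue the work and return immediately.
        self._pending.append((src, dst))

    def stream_sync(self):
        # Run every enqueued copy; after this returns, all data is visible.
        while self._pending:
            src, dst = self._pending.popleft()
            dst[:] = src


def copy_to(api, src):
    """High-level copy that keeps the blocking contract callers expect."""
    dst = [None] * len(src)
    api.copy_data_from_to(src, dst)  # may be async at the device level
    api.stream_sync()                # sync before copy_to ends
    return dst


api = ToyDeviceAPI()

# Low-level path: the destination is unchanged until the caller syncs.
raw_dst = [0, 0, 0]
api.copy_data_from_to([1, 2, 3], raw_dst)
before_sync = list(raw_dst)
api.stream_sync()
after_sync = list(raw_dst)

# High-level path: the result is ready as soon as copy_to returns.
result = copy_to(api, [4, 5, 6])
```

In this model, callers of `copy_to` never observe a partially completed copy, while code that talks to the device API directly can still batch several async copies and sync once.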
