Lunderberg opened a new pull request, #16992: URL: https://github.com/apache/tvm/pull/16992
Prior to this commit, using `disco.Session` methods to transfer `NDArray` instances to workers could raise an exception if the `NDArray` was larger than the buffer allocated by the OS for the controller/worker pipe. In these cases, the first call to the `Read` method of `tvm::support::Pipe` would return successfully, but with only the initial bytes of the `NDArray`. Receiving the full `NDArray` requires repeatedly calling the POSIX `read` function.

This commit updates the `Read` and `Write` methods of `tvm::support::Pipe` to call the underlying read/write functions repeatedly until the full `NDArray` has been transferred.

This commit does not add any unit tests, as the existing unit test `tests/python/disco/test_ccl.py::test_attention[nccl-ProcessSession]` requires this change to pass.
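For readers unfamiliar with short reads on pipes, the looping pattern described above can be sketched as follows. This is a minimal illustration and not the code from this PR: `ReadFull`, `WriteFull`, and `RoundTrip` are hypothetical names, and the actual `tvm::support::Pipe` methods may differ in signature and error handling. The sketch assumes a POSIX system and blocking file descriptors.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not TVM's API): keep calling POSIX read() until
// `size` bytes have arrived or the writer closes the pipe (EOF).
// Returns the number of bytes actually read.
inline size_t ReadFull(int fd, void* buf, size_t size) {
  assert(buf != nullptr || size == 0);
  auto* ptr = static_cast<uint8_t*>(buf);
  size_t total = 0;
  while (total < size) {
    ssize_t n = read(fd, ptr + total, size - total);
    if (n == 0) break;  // EOF before the full payload arrived
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal, retry
      throw std::runtime_error("read failed");
    }
    total += static_cast<size_t>(n);  // a short read is normal, keep going
  }
  return total;
}

// Hypothetical helper (not TVM's API): keep calling POSIX write() until
// all `size` bytes have been handed to the kernel.
inline void WriteFull(int fd, const void* buf, size_t size) {
  assert(buf != nullptr || size == 0);
  const auto* ptr = static_cast<const uint8_t*>(buf);
  size_t total = 0;
  while (total < size) {
    ssize_t n = write(fd, ptr + total, size - total);
    if (n < 0 && errno == EINTR) continue;  // interrupted, retry
    if (n <= 0) throw std::runtime_error("write failed");
    total += static_cast<size_t>(n);
  }
}

// Demonstration only: a child process writes `size` bytes into a pipe and
// the parent reads them back, mimicking a controller/worker pair.
// Returns true if the parent received exactly what the child sent.
inline bool RoundTrip(size_t size) {
  int fds[2];
  if (pipe(fds) != 0) return false;

  std::vector<uint8_t> sent(size), received(size);
  for (size_t i = 0; i < size; ++i) sent[i] = static_cast<uint8_t>(i * 31 + 7);

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return false;
  }
  if (pid == 0) {  // child: writer side
    close(fds[0]);
    int status = 0;
    try {
      WriteFull(fds[1], sent.data(), sent.size());
    } catch (const std::exception&) {
      status = 1;
    }
    close(fds[1]);
    _exit(status);
  }

  // parent: reader side
  close(fds[1]);
  size_t got = ReadFull(fds[0], received.data(), received.size());
  close(fds[0]);

  int status = 0;
  waitpid(pid, &status, 0);
  bool child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
  return child_ok && got == size && sent == received;
}
```

Calling `RoundTrip(1 << 20)` sends 1 MiB, which is larger than the default pipe capacity on typical Linux systems, so a single `read` call would usually return only part of the payload. The loop in `ReadFull` is what makes the transfer complete.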