Open MPI uses readv/writev. I am beginning to think that the timeout and permission errors are legit and reflect real conditions. What does re(4) do when it receives a write request while it is busy?
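At the socket level, a write to a busy connection usually shows up not as an error but as a short write or, on a non-blocking descriptor, as EAGAIN; a caller that treats either as fatal will report spurious failures. The following is a minimal sketch of "redoing writev calls" under those assumptions. It is not Open MPI's code: the helper name `writev_all` is hypothetical, it assumes a POSIX system, and it does not split requests larger than IOV_MAX. Errors other than EINTR/EAGAIN (for example ETIMEDOUT, errno 60) are passed back to the caller rather than retried, since retrying does not fix a dead connection.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * writev_all: hypothetical helper, not part of Open MPI.
 * Keeps calling writev(2) until every byte in iov[0..iovcnt-1] is written.
 *  - EINTR: the call was interrupted by a signal, so retry.
 *  - EAGAIN/EWOULDBLOCK: non-blocking fd with a full buffer, so wait
 *    for POLLOUT and retry.
 *  - short write: advance past the bytes already written.
 * NOTE: entries of the caller's iov array are modified in place.
 * Returns 0 on success, -1 with errno set on any other error.
 */
int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                    return -1;
                continue;
            }
            return -1;  /* real error, e.g. ETIMEDOUT: report it */
        }
        /* Drop iovecs that were written completely. */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Trim the first partially written iovec. */
        if (iovcnt > 0 && n > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}
```

Because the loop only gives up on errors it cannot recover from, any error that still reaches the application this way is more likely to reflect a real condition on the connection than a transient "busy" state.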
I am having some luck in redoing writev calls when needed, though there are still some problems.

Dave

On 12/24/19, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
> On Mon, Dec 23, 2019 at 08:17:37AM -0800, Philip Guenther wrote:
>> On Mon, Dec 23, 2019 at 5:04 AM Raymond, David <david.raym...@nmt.edu>
>> wrote:
>>
>> > The "timeout" error was numerically 60. Curiously, boards with RTL
>> > 8111GR chips did not produce these errors, but those with RTL 8111H
>> > chips did. Unfortunately, this chipset seems to be in a lot of newer
>> > motherboards.
>> >
>> > I didn't use ktrace/kdump. The openmpi software returned the error
>> > presented by readv/writev.
>> >
>> > It sounds like the simplest solution at this point is to try
>> > non-Realtek pcie network cards. Any suggestions? How are Intel or
>> > Broadcom cards?
>> >
>>
>> At this point I think you're clearly in the "device driver is buggy"
>> situation. If this device has an in-tree driver (and not something
>> you're compiling locally into your kernel) then you should start a new
>> thread starting with a dmesg and a clear description of the involved
>> hardware.
>
> I don't know what OpenMP uses for communication but re(4) does not return
> errno 60 (ETIMEDOUT). So it seems like it is something else. Also 8111G
> and 8111H are treated the same way in our re(4) driver.
>
> --
> :wq Claudio
>

--
David J. Raymond
david.raym...@nmt.edu
http://physics.nmt.edu/~raymond