Open MPI uses readv/writev.  I am beginning to think that the timeout
and permission errors are legitimate and reflect real conditions.  What
does re(4) do when it receives a write request while it is busy?

I am having some luck retrying writev calls when needed, though
there are still some problems.
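
A minimal sketch of what retrying a writev call can look like, for
reference.  It assumes a stream descriptor (socket or pipe) and treats
short writes, EINTR, and EAGAIN as retryable; writev_all is a
hypothetical helper name for illustration, not part of Open MPI, and
it does not retry on ETIMEDOUT or permission errors.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from Open MPI): write every byte described
 * by iov[0..iovcnt-1] to fd, retrying after short writes, EINTR, and
 * EAGAIN.  The iov array is modified in place.  Returns the total
 * number of bytes written, or -1 with errno set on any other error.
 */
ssize_t
writev_all(int fd, struct iovec *iov, int iovcnt)
{
	ssize_t total = 0;

	while (iovcnt > 0) {
		ssize_t n = writev(fd, iov, iovcnt);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			if (errno == EAGAIN || errno == EWOULDBLOCK) {
				/* Non-blocking fd is full: wait until writable. */
				struct pollfd pfd = { .fd = fd, .events = POLLOUT };

				if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
					return -1;
				continue;
			}
			return -1;		/* not retryable here */
		}
		total += n;

		/* Skip the iovec entries that were written completely. */
		while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
			n -= iov->iov_len;
			iov++;
			iovcnt--;
		}
		/* Advance into a partially written entry, if any. */
		if (iovcnt > 0) {
			iov->iov_base = (char *)iov->iov_base + n;
			iov->iov_len -= n;
		}
	}
	return total;
}
```

Because the loop rewrites the iovec array as it goes, a caller that
needs the original array afterwards should pass in a copy.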

Dave


On 12/24/19, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
> On Mon, Dec 23, 2019 at 08:17:37AM -0800, Philip Guenther wrote:
>> On Mon, Dec 23, 2019 at 5:04 AM Raymond, David <david.raym...@nmt.edu>
>> wrote:
>>
>> > The "timeout" error was numerically 60.  Curiously, boards with RTL
>> > 8111GR chips did not produce these errors, but those with RTL 8111H
>> > chips did.  Unfortunately, this chipset seems to be in a lot of newer
>> > motherboards.
>> >
>> > I didn't use ktrace/kdump.  The openmpi software returned the error
>> > presented by readv/writev.
>> >
>> > It sounds like the simplest solution at this point is to try
>> > non-Realtek pcie network cards.  Any suggestions?  How are Intel or
>> > Broadcom cards?
>> >
>>
>> At this point I think you're clearly in the "device driver is buggy"
>> situation.  If this device has an in-tree driver (and not something
>> you're compiling locally into your kernel) then you should start a new
>> thread starting with a dmesg and a clear description of the involved
>> hardware.
>
> I don't know what Open MPI uses for communication but re(4) does not return
> errno 60 (ETIMEDOUT). So it seems like it is something else. Also 8111G
> and 8111H are treated the same way in our re(4) driver.
>
> --
> :wq Claudio
>


-- 
David J. Raymond
david.raym...@nmt.edu
http://physics.nmt.edu/~raymond
