Hi folks,

My situation is the following: two computers (A and B) running OpenSolaris b131 with Intel 82574L NICs, connected through an HP4208 switch.
Both computers are on the same network.
I have transfers running from computer A to computer B, either through ssh or netcat.

As long as computer B is not too busy, the transfer goes like a charm.
But when B is really busy (doing zfs recv from a local file in this case), the transfer fails in an odd way after some time (tests show somewhere between 10 minutes and 13 hours).

What's odd is that A reports that it could not read from B and closes the connection (no sign of it in netstat), but B still thinks the connection is open. Further, running "kstat -p | grep e1000g | grep -i err" on A shows all zeroes except for the following:
e1000g:1:statistics:Recv_Length_Errors  14
link:0:e1000g1:ierrors  14
e1000g:1:mac:ierrors    14
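
To see when these counters move relative to the failure, the same pipeline could be wrapped so that it prints only the non-zero counters and is run periodically. This is a minimal sketch, not something from the tests above: the function name nonzero_errors and the 60-second interval are assumptions for illustration, and it assumes the usual "name<TAB>value" lines that kstat -p prints.

```shell
#!/bin/sh
# Filter `kstat -p` style lines read from stdin: keep e1000g error
# counters whose value (the last field) is not zero.
# nonzero_errors is a hypothetical helper name, not an existing tool.
nonzero_errors() {
    grep e1000g | grep -i err | awk '$NF != 0'
}

# Possible usage on either host (interval chosen arbitrarily):
#   while :; do date; kstat -p | nonzero_errors; sleep 60; done
```

Logging the output with a timestamp on both A and B would show whether the receive length errors accumulate gradually or appear only around the time the connection drops.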

More details on the test cases are available here:
http://opensolaris.org/jive/thread.jspa?threadID=122977&tstart=0

You can see that Brent Jones mentioned the following CR, but it is marked as a duplicate of something fixed in b131.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510

I did not do any twiddling in e1000g.conf.
Both e1000g interfaces are grouped in an aggregation named trk0.
Per advice of Richard Elling, I disabled LACP and, just to be sure, I unplugged one network cable on each machine.

If any of you has any clue or workaround to try, please share.

Thanks,
Arnaud

_______________________________________________
networking-discuss mailing list
