[networking-discuss] Help needed on big transfers failure with e1000g

Arnaud Brand Wed, 03 Feb 2010 05:39:05 -0800

Hi folks,

My situation is the following: two computers (A and B) running OpenSolaris b131, both with Intel 82574L NICs, connected through an HP4208 switch.

Both computers are on the same network.

I have transfers running from computer A to computer B, either through ssh or netcat.


As long as computer B is not too busy, the transfer goes like a charm.

But when B is really busy (doing a zfs recv from a local file in this case), the transfer fails in an odd way after some time (tests show somewhere between 10 minutes and 13 hours).

What's odd is that A reports that it could not read from B and closes the connection (no sign of it in netstat), but B still thinks the connection is open.

Further, running "kstat -p | grep e1000g | grep -i err" on A shows all zeroes except for the following:

e1000g:1:statistics:Recv_Length_Errors  14
link:0:e1000g1:ierrors  14
e1000g:1:mac:ierrors    14
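
The filter above still lists every matching counter, zeroes included. As a rough sketch (not part of the original test setup), the same check can be run in Python on saved "kstat -p" output so that only non-zero error counters remain. The function name nonzero_error_counters is hypothetical, the parsing assumes the usual "key, whitespace, value" layout of "kstat -p", and the oerrors line in the sample is an illustrative zero entry, not a value taken from the machines described here.

```python
def nonzero_error_counters(kstat_text, pattern="e1000g"):
    """Return {kstat_key: value} for non-zero error counters.

    Mirrors `grep e1000g | grep -i err`, then drops counters equal to zero.
    """
    result = {}
    for line in kstat_text.splitlines():
        # `kstat -p` prints "module:instance:name:statistic<TAB>value"
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, raw = parts
        if pattern not in key or "err" not in key.lower():
            continue
        try:
            value = int(raw.strip())
        except ValueError:
            continue  # ignore statistics that are not plain integers
        if value != 0:
            result[key] = value
    return result


# Sample input: the three lines reported above plus one hypothetical zero counter.
sample = (
    "e1000g:1:statistics:Recv_Length_Errors\t14\n"
    "link:0:e1000g1:ierrors\t14\n"
    "e1000g:1:mac:ierrors\t14\n"
    "e1000g:1:mac:oerrors\t0\n"
)

if __name__ == "__main__":
    for key, value in nonzero_error_counters(sample).items():
        print(key, value)
```

Sampling this periodically during a transfer would show whether the receive-length errors grow around the time the connection drops.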

More details on the test cases are available here:
http://opensolaris.org/jive/thread.jspa?threadID=122977&tstart=0

You can see that Brent Jones mentioned the following CR, but it is marked as a duplicate of something fixed in 131.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510

I did not do any twiddling in e1000g.conf.
Both e1000g interfaces are grouped in an aggregation named trk0.

On Richard Elling's advice, I disabled LACP and, just to be sure, I unplugged one network cable on each machine.


If any of you has any clue or workaround to try, please share.

Thanks,
Arnaud

