I have recently seen the connection reset problem, and no firewall was involved.

I have been running a mapred index build over more than 5TB of arc files, and
I noticed
  SocketException: Connection reset
in 1 of 1070 map tasks during the parse phase; the task was
automatically restarted and succeeded on the second attempt.

The problem is intermittent and is probably related to OS network
tuning.  It would be nice to know more about the transient conditions
that trigger it.

I checked all the nodes I'm using and the slave nodes all have
high numbers of dropped RX packets.  Example:

  RX packets:1126673543 errors:0 dropped:163568 overruns:0 frame:0
  TX packets:871110771 errors:3 dropped:0 overruns:3 carrier:0

No slave node stands out in particular.  The master node, by contrast,
has dropped only 4 RX packets during 57 days of uptime.
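
To compare nodes on more than raw counts, the dropped counter can be turned
into a fraction of received packets. Below is a minimal Python sketch of that
arithmetic, assuming the counters come in the format shown above; the
function name rx_drop_fraction is just an illustrative helper, and whether
dropped packets are also counted under "RX packets" may depend on the driver,
so treat the result as a rough ratio.

```python
import re

# Matches the counters on an "RX packets:... errors:... dropped:..." line.
RX_LINE = re.compile(r"RX packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)")


def rx_drop_fraction(line):
    """Return dropped/packets for one RX line, or None if the line does not match."""
    m = RX_LINE.search(line)
    if m is None:
        return None
    packets, _errors, dropped = (int(g) for g in m.groups())
    # Guard against a freshly booted interface with zero packets.
    return dropped / packets if packets else 0.0


# The slave-node example quoted above.
sample = "  RX packets:1126673543 errors:0 dropped:163568 overruns:0 frame:0"
print(f"{rx_drop_fraction(sample):.4%}")
```

For the example line this works out to roughly 0.015% of received packets,
which is small in relative terms even though the absolute count is far above
the master's.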


Paul
