I have recently seen the connection reset problem, and no firewall was involved.
I have been running a mapred index build over more than 5TB of ARC files, and I noticed a "SocketException: Connection reset" in 1 of 1070 map tasks during the parse phase. The task was automatically restarted and succeeded on the second attempt.

The problem is intermittent and seemingly random, and is probably related to OS network tuning. It would be nice to know more about the transient conditions associated with it.

I checked all the nodes I'm using, and the slave nodes all have high numbers of dropped RX packets. Example:

    RX packets:1126673543 errors:0 dropped:163568 overruns:0 frame:0
    TX packets:871110771 errors:3 dropped:0 overruns:3 carrier:0

No slave node stands out in particular. The master node, by contrast, has dropped only 4 RX packets during 57 days of uptime.

Paul
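To put the counters quoted above in proportion, here is a minimal sketch that parses the two ifconfig counter lines and computes the ratio of dropped to received packets. It was not part of the original job; the helper names `parse_counters` and `drop_ratio` are hypothetical, and it assumes the older net-tools `name:value` output format shown in the example.

```python
import re

# The two counter lines exactly as quoted in the post.
SAMPLE = [
    "RX packets:1126673543 errors:0 dropped:163568 overruns:0 frame:0",
    "TX packets:871110771 errors:3 dropped:0 overruns:3 carrier:0",
]


def parse_counters(line):
    """Split one ifconfig counter line into (direction, {name: value})."""
    direction = line.split()[0]  # "RX" or "TX"
    fields = {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}
    return direction, fields


def drop_ratio(fields):
    """Ratio of the 'dropped' counter to the 'packets' counter (0.0 if no packets)."""
    packets = fields.get("packets", 0)
    return fields.get("dropped", 0) / packets if packets else 0.0


if __name__ == "__main__":
    for line in SAMPLE:
        direction, fields = parse_counters(line)
        print(f"{direction}: dropped/packets = {drop_ratio(fields):.4%}")
```

For the slave node in the example this works out to roughly 0.0145% of RX packets, or about one drop per 6,900 packets received. Running the same calculation on each node would make it easier to confirm that no slave stands out.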