I am running a Hadoop job on a 40-node cluster with about 300 map tasks and about 300 reduce tasks. Most tasks complete within 20 minutes, but a few, typically fewer than 10, run for many hours. When those slow tasks do complete, I see nothing to suggest that the number of bytes or records they read or wrote is significantly different from that of the tasks that run much faster. I sometimes see multiple attempts for a task, usually only two, and the cluster is doing nothing else.
Any suggestions on what to tune?