Scott Oaks created MAPREDUCE-7337:
-------------------------------------
Summary: Task fails while deleting spill files on slow disk
Key: MAPREDUCE-7337
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7337
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: performance
Reporter: Scott Oaks
We sometimes have tasks fail when deleting spill files in this loop (line 2005
of MapTask.java):
{code:java}
for (int i = 0; i < numSpills; i++) {
  rfs.delete(filename[i], true);
}
{code}
During this loop the task reports no progress back to the master server (the
MapReduce ApplicationMaster). If the loop takes too long, the ApplicationMaster
assumes the task has timed out and tells the node agent (NodeManager) to kill
the YARN child.
Typically this is linked to storage problems. We have seen it most often because
of an underlying filesystem bug (contention in the filesystem delete path when
several files are deleted). Even though there is usually an underlying issue, it
would not hurt for the task to mark progress periodically during this loop.
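A minimal sketch of the suggested change, assuming the loop has access to something like Hadoop's `org.apache.hadoop.util.Progressable` (in `MapTask` this would presumably be the task's reporter). So that it compiles and runs without Hadoop, `ProgressReporter` is a hypothetical stand-in for `Progressable`, `SpillCleanup.deleteSpills` is a hypothetical helper, and `java.nio.file.Files.deleteIfExists` stands in for `rfs.delete(filename[i], true)`. The only point it illustrates is calling `progress()` after every delete, so the time since the last progress report is bounded by one slow delete rather than by the whole loop. It would not help if a single delete on its own exceeded `mapreduce.task.timeout`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Sketch: delete spill files while marking progress after each deletion.
 * Not the actual MapTask code; names here are hypothetical.
 */
final class SpillCleanup {

    /** Hypothetical stand-in for org.apache.hadoop.util.Progressable. */
    public interface ProgressReporter {
        void progress();
    }

    /**
     * Deletes each spill file in order and calls reporter.progress() after
     * every deletion attempt. Returns how many files existed and were removed.
     */
    static int deleteSpills(List<Path> spills, ProgressReporter reporter)
            throws IOException {
        int deleted = 0;
        for (Path spill : spills) {
            // Stand-in for rfs.delete(filename[i], true) in MapTask.
            if (Files.deleteIfExists(spill)) {
                deleted++;
            }
            // Mark progress even when this delete was slow or a no-op.
            reporter.progress();
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("spill", ".out");
        Path b = Files.createTempFile("spill", ".out");
        int[] calls = {0};
        int n = deleteSpills(List.of(a, b), () -> calls[0]++);
        System.out.println(n + " deleted, " + calls[0] + " progress calls");
    }
}
```

Running `main` creates two temporary files, deletes them and prints `2 deleted, 2 progress calls`. Reporting on every iteration assumes `progress()` is cheap (typically just a flag update that a separate thread sends later); a time-based throttle that reports only every few seconds would be an alternative if it were not.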
--
This message was sent by Atlassian Jira
(v8.3.4#803005)