Jim Brennan created HADOOP-15548:
------------------------------------
Summary: Randomize local dirs
Key: HADOOP-15548
URL: https://issues.apache.org/jira/browse/HADOOP-15548
Project: Hadoop Common
Issue Type: Bug
Reporter: Jim Brennan
Assignee: Jim Brennan
Shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching a container. Some
applications process these in exactly the same way in every container
(e.g. round-robin), which can cause disks to get unnecessarily overloaded (e.g.
one output file written to the first entry specified in the environment variable).
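
As a rough illustration of the shuffling idea (not the actual patch), the sketch
below randomizes the order of each directory list before it is handed to a
container. It assumes each variable holds a comma-separated list of paths and
that each variable is shuffled independently; the helper names
shuffled_dirs and shuffle_container_env are hypothetical.

```python
import random


def shuffled_dirs(dirs_value, rng=None):
    """Return a comma-separated directory list in random order.

    Assumption: the value is a comma-separated list of paths.
    """
    if rng is None:
        rng = random.Random()
    dirs = [d for d in dirs_value.split(",") if d]
    rng.shuffle(dirs)
    return ",".join(dirs)


def shuffle_container_env(env,
                          keys=("LOCAL_DIRS", "LOG_DIRS", "LOCAL_USER_DIRS"),
                          rng=None):
    """Return a copy of env with each listed directory variable shuffled.

    Hypothetical helper: each variable is shuffled independently and the
    caller's mapping is left untouched.
    """
    out = dict(env)
    for key in keys:
        if key in out:
            out[key] = shuffled_dirs(out[key], rng)
    return out
```

With this in place, an application that always writes to the first entry would
land on a different disk from one container launch to the next, rather than
always hitting the same one.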
There are two paths for local dir allocation, depending on whether the size is
unknown or known. The unknown path already uses a random algorithm. The known
path initializes with a random starting point, and then goes round-robin after
that. When selecting a dir, it increments the last-used index by one and then
checks sequentially until it finds a dir that satisfies the request. The
proposal is to increment by a random value between 1 and num_dirs - 1 instead,
and then check sequentially from there. This should result in a more random
selection in all cases.
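
The proposed known-size selection can be sketched as follows. This is a
simplified model under stated assumptions, not the Hadoop implementation: the
class name KnownSizeDirPicker and the free_bytes list are hypothetical, free
space is not decremented after a pick, and "satisfies the request" is modeled
as having at least the requested number of free bytes.

```python
import random


class KnownSizeDirPicker:
    """Toy model of known-size dir selection with a random increment."""

    def __init__(self, free_bytes, rng=None):
        if not free_bytes:
            raise ValueError("at least one directory is required")
        self.free_bytes = list(free_bytes)
        self.rng = rng if rng is not None else random.Random()
        # Random starting point, as the known-size path is described as doing.
        self.last = self.rng.randrange(len(self.free_bytes))

    def pick(self, size):
        """Return the index of a dir with room for size bytes, or None."""
        n = len(self.free_bytes)
        # Proposed change: advance by a random step in [1, n - 1] rather
        # than by exactly one. With a single dir there is nothing to skip.
        step = self.rng.randint(1, n - 1) if n > 1 else 0
        start = (self.last + step) % n
        # Check sequentially from the new starting point, wrapping around.
        for offset in range(n):
            idx = (start + offset) % n
            if self.free_bytes[idx] >= size:
                self.last = idx
                return idx
        return None
```

Because the step is never 0 or num_dirs, the scan never starts at the dir that
was used last; that dir is only chosen again if no other dir can satisfy the
request.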