Memory sharing across all the Tasks in the Task Tracker to improve the job performance
--------------------------------------------------------------------------------------
Key: MAPREDUCE-2647
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2647
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: tasktracker
Reporter: Devaraj K
Assignee: Devaraj K
If all the tasks (maps/reduces) use the same additional data to execute, each task currently has to load that data into memory individually and then read it. This is redundant effort, because every task repeats the same work. Instead of each task loading the data, the data can be loaded into main memory once and shared by all the tasks.
h5. Proposed Solution:
1. Provide a mechanism to load the data into shared memory and to read that
data from main memory.
2. We can provide a Java API that internally uses a native implementation to read the data from memory. All the maps/reduces can use this API to read the data from main memory.
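A minimal sketch of what such a read API could look like, assuming the shared data lives in a file on the TaskTracker's local disk. Note that this swaps in Java NIO memory-mapped files (`FileChannel.map`) in place of the proposed native implementation. On typical operating systems, every process that maps the same file read-only shares the same page-cache pages, so the data sits in physical memory once no matter how many task JVMs read it. The class name `SharedDataReader` and its methods are hypothetical, not an existing Hadoop API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical read-only view of a data file shared by all tasks on a node.
 * Sketch only: uses a memory-mapped file instead of a native library.
 */
final class SharedDataReader implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    SharedDataReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        // Map the whole file read-only; tasks never modify the shared data.
        this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }

    /** Size of the shared data in bytes. */
    long size() {
        return buffer.capacity();
    }

    /** Copies {@code length} bytes starting at {@code offset} into a new array. */
    byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        // duplicate() gives this call its own position, leaving the shared buffer's position untouched.
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.get(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        // Closing the channel releases the file descriptor; the mapping itself
        // stays valid until the buffer is garbage-collected.
        channel.close();
    }
}
```

In this sketch each task would open one `SharedDataReader` when it starts (for example in its mapper's `setup` method) and close it when it finishes; the data is never copied into the task's own heap unless `read` is called.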
h5. Example:
Suppose that in a map task the IP address is the key, and the task needs to look up the location of that IP address in a local file. In this case each map task has to load the file into main memory, read from it, and close it, and opening, reading, and processing the file takes time in every task. Instead, the file can be loaded once into the TaskTracker's memory, and each task can read from that memory directly.
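To make the example concrete, here is a sketch of the lookup each map task would perform against the shared data, assuming (hypothetically) that the IP-to-location file has been prepared as fixed-width binary records `[startIp:int][endIp:int][locationId:int]`, sorted by `startIp` with non-overlapping ranges. Because the lookup works on a `ByteBuffer`, the same code can run over a memory-mapped buffer shared by all tasks, so no task has to open and parse the file itself. The record layout, `IpLocationTable`, and its methods are illustrative assumptions, and `ipToLong` assumes well-formed dotted IPv4 input.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical IP-to-location table stored as fixed-width records:
 * [startIp:int][endIp:int][locationId:int], sorted by startIp.
 */
final class IpLocationTable {
    private static final int RECORD_BYTES = 12;
    private final ByteBuffer data;

    IpLocationTable(ByteBuffer data) {
        this.data = data;
    }

    /** Converts a dotted IPv4 string such as "10.0.0.1" to an unsigned 32-bit value. */
    static long ipToLong(String ip) {
        long value = 0;
        for (String part : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    /** Binary-searches the records; returns the matching location id, or -1 if none. */
    int lookup(String ip) {
        long target = ipToLong(ip);
        int lo = 0;
        int hi = data.limit() / RECORD_BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int base = mid * RECORD_BYTES;
            // Absolute getInt calls do not move the buffer's position.
            long start = Integer.toUnsignedLong(data.getInt(base));
            long end = Integer.toUnsignedLong(data.getInt(base + 4));
            if (target < start) {
                hi = mid - 1;
            } else if (target > end) {
                lo = mid + 1;
            } else {
                return data.getInt(base + 8);
            }
        }
        return -1;
    }
}
```

Each lookup costs O(log n) reads from memory that is already resident, instead of an open/read/close of the file per task.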
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira