Memory sharing across all the Tasks in the Task Tracker to improve the job performance
--------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2647
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2647
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: tasktracker
            Reporter: Devaraj K
            Assignee: Devaraj K


        If all the tasks (maps/reduces) use the same additional data to execute 
the map/reduce task, each task has to load that data into memory individually 
and read it. Every task repeats the same loading work, which is wasted effort. 
Instead of each task loading the data itself, the data can be loaded into main 
memory once and shared by all the tasks.


h5. Proposed Solution:
1. Provide a mechanism to load the data into shared memory once and to read 
that data from main memory.
2. We can provide a Java API that internally uses a native implementation to 
read the data from shared memory. All the maps/reduces can use this API to 
read the data from main memory.
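The proposal does not fix the mechanism. As one possible sketch, assuming a pure-Java approach in place of the native implementation proposed above, each task JVM could memory-map the same read-only file with java.nio. The operating system typically backs every read-only shared mapping of a file with the same page-cache pages, so the data is held in memory once per node rather than once per task. The class name SharedDataReader and its methods are hypothetical and are not part of Hadoop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical read-only view of a data file that several task JVMs on the
 * same node can share. Each JVM maps the same file; the OS typically backs
 * all read-only mappings with the same physical page-cache pages.
 */
public final class SharedDataReader {
    private final MappedByteBuffer buffer;

    private SharedDataReader(MappedByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Maps the whole file read-only. Files over 2 GB would need several mappings. */
    public static SharedDataReader open(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return new SharedDataReader(mapped);
        } catch (IOException e) {
            throw new UncheckedIOException("cannot map " + file, e);
        }
    }

    /** Size of the shared data in bytes. */
    public int size() {
        return buffer.capacity();
    }

    /** Copies {@code length} bytes starting at {@code offset} out of the mapping. */
    public byte[] read(int offset, int length) {
        // duplicate() gives this call its own position, so concurrent readers
        // in the same JVM never move the shared buffer's position.
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        byte[] out = new byte[length];
        view.get(out);
        return out;
    }
}
```

A task would call SharedDataReader.open(path) once during setup and then read(offset, length) per record; the file location and record layout are left to the job.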


h5. Example:
        Suppose a map task uses an IP address as its key and needs to look up 
the location of that IP address in a local file. In this case, each map task 
has to open the file, load it into main memory, read from it, and close it. 
Opening, reading, and processing the file takes time, and every task pays that 
cost again. Instead, we can load the file once into the TaskTracker's memory, 
and each task can read from that memory directly.
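The example can be sketched in Java as a lookup table that is loaded once and then queried for every map key. The input format (`startIp,endIp,location` with non-overlapping IPv4 ranges), the class name IpLocationTable, and its methods are hypothetical assumptions for illustration. Note that this sketch caches the table per JVM, not in TaskTracker memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical IP-to-location lookup table. Each input line is assumed to
 * look like "startIp,endIp,location" with non-overlapping IPv4 ranges.
 */
public final class IpLocationTable {
    // Cached once per JVM so later callers in the same JVM skip reading the file.
    private static volatile IpLocationTable shared;

    private final long[] starts;      // range starts, sorted ascending
    private final long[] ends;        // inclusive range ends, same order
    private final String[] locations;

    private IpLocationTable(long[] starts, long[] ends, String[] locations) {
        this.starts = starts;
        this.ends = ends;
        this.locations = locations;
    }

    /** Reads the file on the first call; later calls return the cached table. */
    public static IpLocationTable getShared(Path file) {
        IpLocationTable table = shared;
        if (table == null) {
            synchronized (IpLocationTable.class) {
                table = shared;
                if (table == null) {
                    try {
                        table = parse(Files.readAllLines(file, StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    shared = table;
                }
            }
        }
        return table;
    }

    /** Builds a table from "startIp,endIp,location" lines; blank lines are skipped. */
    public static IpLocationTable parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.split(",", 3);
            if (fields.length != 3) throw new IllegalArgumentException("bad line: " + line);
            rows.add(fields);
        }
        rows.sort((a, b) -> Long.compare(toLong(a[0]), toLong(b[0])));
        int n = rows.size();
        long[] starts = new long[n];
        long[] ends = new long[n];
        String[] locations = new String[n];
        for (int i = 0; i < n; i++) {
            String[] row = rows.get(i);
            starts[i] = toLong(row[0]);
            ends[i] = toLong(row[1]);
            locations[i] = row[2].trim();
        }
        return new IpLocationTable(starts, ends, locations);
    }

    /** Returns the location whose range contains {@code ip}, or null if none does. */
    public String lookup(String ip) {
        long key = toLong(ip);
        int i = Arrays.binarySearch(starts, key);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the only range
        // that can contain key is the one just before the insertion point.
        if (i < 0) i = -i - 2;
        return (i >= 0 && key <= ends[i]) ? locations[i] : null;
    }

    /** Converts a dotted-quad IPv4 address to an unsigned 32-bit value. */
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("bad IPv4 address: " + ip);
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet in " + ip);
            value = (value << 8) | octet;
        }
        return value;
    }
}
```

A mapper would call IpLocationTable.getShared(path).lookup(key) for each record. Because each Hadoop task normally runs in its own child JVM, this per-JVM cache only removes the repeated load when one JVM runs several tasks; sharing the table across task JVMs needs an inter-process mechanism such as the shared memory proposed in this issue.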


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
