Depending on the scale of the data, of the two options it would be best to store it in HDFS and use the built-in InputFormats, as that is more scalable.
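To give a rough idea of why this scales: as far as I understand it, the line-oriented built-in formats hand each mapper a byte range (a split) of the file, and each mapper reads only the lines that begin inside its range. Below is a simplified plain-Java simulation of that idea. It is not Hadoop code; the class name SplitDemo, the method readSplit, and the 8-byte split size are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free sketch: simulates how a line-oriented input
// format could assign lines of a names file to byte-range splits.
final class SplitDemo {

    // Returns the lines "owned" by the split [start, end): every line whose
    // first byte falls inside that range. A line that crosses the end of the
    // split is still read in full by the split in which it begins.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte and skip forward past the next newline, so a
            // partial line that started in the previous split is not re-read.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline itself
        }
        List<String> lines = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] names = "Alice\nBob\nCarol\nDave\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 8; // assumed tiny split size, just for the demo
        for (int start = 0; start < names.length; start += splitSize) {
            int end = Math.min(start + splitSize, names.length);
            System.out.println("split [" + start + ", " + end + ") -> "
                    + readSplit(names, start, end));
        }
    }
}
```

Every name ends up in exactly one split, so adding more mappers only means more, smaller ranges to read, rather than every mapper needing its own copy of the whole list.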
If necessary (depending on how the data is stored), build a custom InputFormat as per the API and set it for the job:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html

-- Vijay

----- Original Message ----
> From: maha <m...@umail.ucsb.edu>
> To: common-user <common-user@hadoop.apache.org>
> Sent: Sun, February 6, 2011 5:09:38 PM
> Subject: Mapper reading from local directory or global variable?
>
> Hello,
>
> I'm wondering which option is more efficient to store "People's Names" to be processed by Mappers.
>
> 1. Store it in a global variable declared in the main class?
> 2. Store it in the HDFS to be distributed and read in each map.
>
> Note that the number of mappers until now is around 1000 mappers. Appreciate any thought :)
>
> Thank you,
>
> Maha