If it is relatively small, you can pass it via the JobConf object, storing a serialized version of your dataset. If it is larger, you can pass a serialized version via the distributed cache. Your map task will need to deserialize the object in the configure method.
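As a rough sketch of the JobConf approach (old org.apache.hadoop.mapred API; the "lookup.list" key and the comma-separated encoding are only illustrative, not anything standard):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LookupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Rebuilt once per map task JVM in configure(); not shared across tasks.
      private List<String> lookup;

      // Driver side: conf.set("lookup.list", "a,b,c");
      public void configure(JobConf conf) {
        String raw = conf.get("lookup.list", "");
        lookup = new ArrayList<String>(Arrays.asList(raw.split(",")));
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // The list is available for every (k, v) pair without being recreated.
        if (lookup.contains(value.toString())) {
          output.collect(value, new Text("matched"));
        }
      }
    }

For the distributed cache variant, the driver would call DistributedCache.addCacheFile(...) instead, and configure() would open the local copy of the file and deserialize it rather than reading a conf property.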
None of the above methods give you an object that is write-shared between map tasks. Please remember that the map tasks execute in separate JVMs on distinct machines in the normal MapReduce environment.

On Sat, May 2, 2009 at 10:59 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> How can I create a global variable for each node running my map task? For
> example, a common ArrayList that my map function can access for every k,v
> pair it works on. It doesn't really need to create the ArrayList every time.
>
> If I create it in the main function of the job, the map function gets a
> null pointer exception. Where else can this be created?
>
> Amandeep
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422