You would need to read the data and store it in an internal data structure or, if you don't want to hold it in Java heap space, copy the file to the local file system and mmap it.
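
A minimal sketch of the mmap variant, in plain Java with no Hadoop classes. It assumes the task has already copied the HDFS file to a local path; the class name BlockCache and its methods are hypothetical, made up for illustration. The static field is what keeps the mapping alive for as long as the task JVM is reused.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: holds one memory-mapped local file per JVM.
public final class BlockCache {
    // Static, so the mapping outlives a single task when the JVM is reused.
    private static MappedByteBuffer cached;
    // Counts how often the file was actually mapped (for checking reuse).
    private static int loads = 0;

    private BlockCache() {}

    // Maps the local copy on the first call; later calls reuse the mapping.
    // Assumption: every caller passes the same file, so the path argument
    // is ignored once the cache is populated.
    public static synchronized ByteBuffer get(Path localCopy) throws IOException {
        if (cached == null) {
            try (FileChannel ch = FileChannel.open(localCopy, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed, and the
                // mapped bytes live outside the Java heap.
                cached = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
            loads++;
        }
        // Each caller gets its own position and limit over the shared bytes.
        return cached.duplicate();
    }

    public static synchronized int loadCount() {
        return loads;
    }
}
```

A task would call BlockCache.get(localPath) during setup and read from the returned buffer instead of from its input records.
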
Your map then has to deal with the fact that the data isn't being passed in directly. This is not straightforward to do.

On Sun, May 10, 2009 at 4:53 PM, Matt Bowyer <mattbowy...@googlemail.com> wrote:

> Thanks Jason, how can I get access to the particular block?
>
> do you mean create a static map inside the task (add the values).. and check
> if populated on the next run?
>
> or is there a more elegant/tried&tested solution?
>
> thanks again
>
> On Mon, May 11, 2009 at 12:41 AM, jason hadoop <jason.had...@gmail.com> wrote:
>
> > You can cache the block in your task, in a pinned static variable, when you
> > are reusing the jvms.
> >
> > On Sun, May 10, 2009 at 2:30 PM, Matt Bowyer <mattbowy...@googlemail.com> wrote:
> >
> > > Hi,
> > >
> > > I am trying to do 'on demand map reduce' - something which will return in
> > > reasonable time (a few seconds).
> > >
> > > My dataset is relatively small and can fit into my datanode's memory. Is it
> > > possible to keep a block in the datanode's memory so on the next job the
> > > response will be much quicker? The majority of the time spent during the job
> > > run appears to be during the 'HDFS_BYTES_READ' part of the job. I have tried
> > > using the setNumTasksToExecutePerJvm but the block still seems to be cleared
> > > from memory after the job.
> > >
> > > thanks!
> >
> > --
> > Alpha Chapters of my book on Hadoop are available
> > http://www.apress.com/book/view/9781430219422
> > www.prohadoopbook.com a community for Hadoop Professionals