On Mar 19, 2007, at 10:08 PM, Alejandro Abdelnur wrote:
You could write your word set to a file in DFS, somewhere outside the input directory, and read it at map init time (within the configure() method). You could pass the path to the file as a configuration property.
On a side note, if the files are large (or the maps short), it can make sense to use the local file cache. See org.apache.hadoop.filecache.DistributedCache. In particular, look at setCacheFiles. Basically, you configure it with a URI and the task tracker will copy the file down and cache it locally.
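Here is a rough sketch of the load-once-at-init pattern. It is plain Java, so it runs without Hadoop on the classpath. The class name WordSetFilter and the property name "wordset.path" are made up for illustration, and the file format (one word per line) is an assumption. In a real mapper, load() would be called from configure(JobConf job), with the reader opened either on the DFS path named by the job property or on the local copy that the file cache provides to the task.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Holds a word set that is loaded once per task and consulted for every record. */
final class WordSetFilter {
    private final Set<String> words = new HashSet<>();

    /**
     * Reads one word per line, skipping blank lines.
     * Intended to be called once from configure(JobConf job); the path would come
     * from something like job.get("wordset.path") (hypothetical property name).
     */
    void load(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
    }

    /** Returns the whitespace-separated tokens of a record that are in the word set. */
    List<String> matches(String record) {
        List<String> hits = new ArrayList<>();
        for (String token : record.trim().split("\\s+")) {
            if (words.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }

    /** Number of distinct words loaded so far. */
    int size() {
        return words.size();
    }

    public static void main(String[] args) throws IOException {
        WordSetFilter filter = new WordSetFilter();
        // Stand-in for the file contents; a real task would open the file itself.
        filter.load(new StringReader("hadoop\ncache\n"));
        System.out.println(filter.matches("the hadoop file cache"));
    }
}
```

The point of loading in configure() rather than in map() is that the file is read once per task instead of once per record. With the file cache, that one read is also from local disk rather than from DFS.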
-- Owen
