So would a common use case for the DistributedCache be to store a lookup table for use during a map task?
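
To make the lookup-table idea concrete, here is a rough sketch in plain Java of what I have in mind. It deliberately leaves Hadoop out so it runs on its own: the temp file stands in for the local copy of a cached file, and the class name, the tab-separated format, and the "UNKNOWN" default are all made up for illustration. My assumption is that in a real job the file would be registered with something like DistributedCache.addCacheFile(uri, conf) at submit time, and each task would find its local copy via DistributedCache.getLocalCacheFiles(conf) in configure(), but I have not verified the setup details.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a read-only lookup table loaded once per task,
// then consulted for every record the map function sees.
class LookupTableSketch {

    // Parse "key<TAB>value" lines into an in-memory table.
    // Lines without a tab are skipped.
    static Map<String, String> parseTable(List<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                table.put(parts[0], parts[1]);
            }
        }
        return table;
    }

    // Load the table from the task-local copy of the file. In a Hadoop job
    // this would presumably run once, in the mapper's configure() method.
    static Map<String, String> loadTable(Path localCopy) {
        try {
            return parseTable(Files.readAllLines(localCopy));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The per-record work: replace the code in an "id<TAB>code" record
    // with its looked-up value, or "UNKNOWN" if the code is not in the table.
    static String mapRecord(String record, Map<String, String> table) {
        String[] parts = record.split("\t", 2);
        if (parts.length < 2) {
            return record; // pass malformed records through unchanged
        }
        return parts[0] + "\t" + table.getOrDefault(parts[1], "UNKNOWN");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file that the cache would have copied locally.
        Path localCopy = Files.createTempFile("lookup", ".tsv");
        Files.write(localCopy, Arrays.asList("us\tUnited States", "de\tGermany"));

        Map<String, String> table = loadTable(localCopy);
        for (String record : Arrays.asList("id1\tus", "id2\tde", "id3\tfr")) {
            System.out.println(mapRecord(record, table));
        }
        Files.delete(localCopy);
    }
}
```

The point of the pattern is that the table is read from local disk once per task rather than fetched per record, which only makes sense if the file is read-only for the life of the job.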

On 9/10/07, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
>
> Thanks, Owen and Doug. I am looking at that presentation with fresh eyes,
> Owen, and it's great! If you could toss me the OmniGraffle file for the
> "Process Diagram" on page 10, that would be awesome. That will serve as the
> main diagram for understanding how a job gets run, and I would love to just
> flesh it out a bit more (e.g. throw some data structures on there, and
> some of the other threads that the JobTracker/TaskTracker run). I am also
> chatting with some of our Flash guys to see if we can make the diagram
> dynamic, so that you could drill down on various components.
>
> Much appreciated,
> Jeff
>
> On 9/10/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
> >
> > Owen O'Malley wrote:
> > >
> > > On Sep 9, 2007, at 11:18 PM, Jeff Hammerbacher wrote:
> > >
> > >> What's the DistributedCache for, in words?
> > >
> > > It is for distributing large read-only files that need to be available
> > > to each task in the job. I've added an entry for it at the bottom of
> > >
> > > http://wiki.apache.org/lucene-hadoop/FAQ
> > >
> > > The answer needs more meat about how to set it up, but at least I
> > > started the entry.
> >
> > We should really improve the javadoc for this and link to it. The
> > javadoc should be good reference documentation, but is not currently.
> > The wiki and website should provide "user guide" style documentation,
> > but we should primarily rely on javadoc for reference. Thus the
> > class-level documentation in DistributedCache.java should describe how
> > it's configured, and link to other relevant javadocs (e.g., command line
> > programs that add files to the cache).
> >
> > Doug
