Thanks, Owen and Doug. I am looking at that presentation with fresh eyes, Owen, and it's great! If you could toss me the OmniGraffle file for the "Process Diagram" on page 10, that would be awesome. That will serve as the main diagram for understanding how a job gets run, and I would love to flesh it out a bit more (e.g., add some data structures and some of the other threads that the JobTracker/TaskTracker run). I am also chatting with some of our Flash guys to see if we can make the diagram dynamic, so that you could drill down into the various components.

Much appreciated,
Jeff

On 9/10/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Owen O'Malley wrote:
> >
> > On Sep 9, 2007, at 11:18 PM, Jeff Hammerbacher wrote:
> >
> >> What's the DistributedCache for, in words?
> >
> > It is for distributing large read-only files that need to be available
> > to each task in the job. I've added an entry for it at the bottom of
> >
> > http://wiki.apache.org/lucene-hadoop/FAQ
> >
> > The answer needs more meat about how to set it up, but at least I
> > started the entry.
>
> We should really improve the javadoc for this and link to it. The
> javadoc should be good reference documentation, but is not currently.
> The wiki and website should provide "user guide" style documentation,
> but we should primarily rely on javadoc for reference. Thus the
> class-level documentation in DistributedCache.java should describe how
> its configured, and link to other relevant javadocs (e.g., command line
> programs that add files to the cache).
>
> Doug
>
