Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "FAQ" page has been changed by QwertyManiac.
The comment on this change is: Reading cluster configuration values in Job..
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=89&rev2=90

--------------------------------------------------

  == What is the Distributed Cache used for? ==
  The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework copies the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are copied only once per job and so should not be modified by the application.
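
  A minimal sketch of using the distributed cache from Java; the hdfs: URI and file name below are placeholders, and it assumes the classic org.apache.hadoop.filecache.DistributedCache API:
  {{{
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;

  Configuration conf = new Configuration();
  // Driver side, before job submission: register a read-only HDFS file.
  DistributedCache.addCacheFile(new URI("hdfs:///user/someuser/lookup.dat"), conf);

  // Task side (e.g. in the mapper's setup): the framework has already copied
  // the file to the local disk of the node running the task.
  Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
  }}}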
  
+ == How do I get my MapReduce Java program to read the cluster's configuration rather than just the defaults? ==
+ The configuration property files ({core|mapred|hdfs}-site.xml) available in the various '''conf/''' directories of your Hadoop installation need to be on the '''CLASSPATH''' of your Java application for them to be found and applied. Another way to ensure that a value set on the cluster is not overridden by any job is to mark the property as final; for example:
+ {{{
+ <property>
+   <name>mapreduce.task.io.sort.mb</name>
+   <value>400</value>
+   <final>true</final>
+ </property>
+ }}}
+ 
+ Marking configuration properties as final is common practice for administrators, as noted in the [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/conf/Configuration.html|Configuration]] API docs.
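+ 
+ As a rough sketch of reading a cluster-set value from Java (the conf/ path below is only an assumed install location; your cluster's may differ):
+ {{{
+ import org.apache.hadoop.conf.Configuration;
+ import org.apache.hadoop.fs.Path;
+ 
+ Configuration conf = new Configuration();
+ // Site files found on the CLASSPATH are applied automatically; a file that
+ // is not on the CLASSPATH can also be added explicitly.
+ conf.addResource(new Path("/usr/local/hadoop/conf/mapred-site.xml"));
+ 
+ // Returns the cluster's value (e.g. 400) instead of the shipped default.
+ int sortMb = conf.getInt("mapreduce.task.io.sort.mb", 100);
+ }}}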
+ 
  == Can I create/write to HDFS files directly from map/reduce tasks? ==
  Yes. (Clearly, you want this if you need to create/write to files other than the output file written out by [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)
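
  A quick sketch of writing such a side file from within a task; the output path is purely hypothetical, and in a real job side files are best placed under the task's own output/work directory so speculative execution does not cause collisions:
  {{{
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  // Create and write an extra HDFS file alongside the job's regular output.
  FSDataOutputStream out = fs.create(new Path("/user/someuser/side-output/extra.txt"));
  out.writeUTF("some side data");
  out.close();
  }}}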
  
