Hi Pedro, I can answer a couple of these.
On Tue, Jan 5, 2010 at 5:46 PM, psdc1978 <psdc1...@gmail.com> wrote:

> 1 - What are the difference between the classes:
> org.apache.hadoop.mapred.Reducer.java and
> org.apache.hadoop.mapreduce.Reducer.java? In which case the 2 reducers
> are used?
>
> 2 - The same question for the Mapper.java?

These classes were refactored in 0.20. The older ones (in the mapred
package) were kept to maintain backwards compatibility.

> 4 - What's the purpose of the property in hdfs-site.xml called
> "dfs.replication"?
>
> I've read what is defined in the Hadoop site,
> "dfs.replication - Default block replication. The actual number of
> replications can be specified when the file is created. The default is
> used if replication is not specified in create time. ", but I still
> haven't understand it. Is it in how many machines a file will be
> replicated?

Pretty much. Note that an HDFS file is stored as a collection of large
blocks (64 MB by default), and it is these blocks that are replicated.

Ed
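To illustrate the difference in shape: in the new org.apache.hadoop.mapreduce package, Reducer is a class you extend and reduce() receives an Iterable of values plus a single Context object, whereas in the old org.apache.hadoop.mapred package Reducer is an interface and reduce() takes an Iterator plus separate OutputCollector and Reporter arguments. The sketch below mimics the new-API shape with tiny stand-in types (the Context and Reducer classes here are simplified mocks, not the real Hadoop classes) so it can run without Hadoop on the classpath:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {
  // Simplified stand-in for org.apache.hadoop.mapreduce.Reducer.Context.
  static class Context {
    final Map<String, Integer> out = new LinkedHashMap<>();
    void write(String key, int value) { out.put(key, value); }
  }

  // Simplified stand-in for the new-API base class: an abstract class you
  // extend, not an interface you implement as in the old mapred package.
  static abstract class Reducer<K, V> {
    protected abstract void reduce(K key, Iterable<V> values, Context context);
  }

  // Typical sum reducer written against the new-API shape: values arrive
  // as an Iterable, and results are emitted through the Context.
  static class SumReducer extends Reducer<String, Integer> {
    @Override
    protected void reduce(String key, Iterable<Integer> values, Context context) {
      int sum = 0;
      for (int v : values) sum += v;
      context.write(key, sum);
    }
  }

  public static void main(String[] args) {
    Context ctx = new Context();
    new SumReducer().reduce("word", Arrays.asList(1, 2, 3), ctx);
    System.out.println(ctx.out.get("word")); // prints 6
  }
}
```

A real 0.20 job would extend org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> with exactly this structure.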
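For reference, the property goes in hdfs-site.xml; the value 3 shown below is only an example (it is also the usual out-of-the-box default):

```xml
<!-- hdfs-site.xml: default number of replicas kept of each HDFS block.
     Individual files can override this at create time. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

The replication factor of an existing file can also be changed afterwards from the shell, e.g. `hadoop fs -setrep 2 /path/to/file`.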