Without knowing your exact workload, I'd expect that using a Combiner (if possible),
as Tsuyoshi recommended, should decrease your total shuffle time. You can also
try compressing the map output so that there's less disk and network IO.
Here's an example configuration using Snappy:
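(A rough sketch of the relevant settings, written as job-setup code rather than
XML; these are the old mapred.* property names, so double-check them against your
Hadoop version, and make sure the native Snappy libraries are installed on the
task nodes.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SnappyMapOutputExample {  // hypothetical class name
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Compress intermediate map output so less data hits disk and the
            // network during the shuffle.
            conf.setBoolean("mapred.compress.map.output", true);
            conf.set("mapred.map.output.compression.codec",
                     "org.apache.hadoop.io.compress.SnappyCodec");
            Job job = new Job(conf, "snappy-map-output-example");
            // ... set mapper/reducer/combiner and input/output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }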
Marc, see my inline comments.
On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese <marc.sturl...@gmail.com> wrote:
Hey there,
I have a doubt about reduce tasks and block writes. Does a reduce task always
first write to HDFS on the node where it is placed? (and then these
blocks would be
As Shaswat mentioned previously, your problem may be related to your
configuration.
Is core-site.xml on your classpath? For example, what is the value of
conf.get("fs.default.name")?
Alternatively, you can set this property directly in your code with
conf.set("fs.default.name", ...):
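For example (the NameNode URI below is just a placeholder, not your actual
address):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsDefaultNameCheck {  // hypothetical class name
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the NameNode explicitly instead of relying
            // on core-site.xml being on the classpath; replace host/port with yours.
            conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("FileSystem URI: " + fs.getUri());
            System.out.println("Root exists: " + fs.exists(new Path("/")));
        }
    }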
Akash,
Instead of adding the connector jar to $HADOOP_HOME/lib, you can pass
the connector jar with the -libjars flag when running your MapReduce
job with hadoop jar.
For example: hadoop jar hadoop-examples.jar wordcount -files
cachefile.txt -libjars mylib.jar input output
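One thing to keep in mind (not shown above): -libjars and -files are only
honored if the driver parses the generic options, which usually means running
it through ToolRunner. A minimal sketch of such a driver, with placeholder
class and job names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {  // hypothetical name
        public int run(String[] args) throws Exception {
            // getConf() already has -libjars/-files/-D options applied.
            Job job = new Job(getConf(), "my-job");
            // ... set mapper, reducer, input/output paths as usual ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner runs GenericOptionsParser so flags like -libjars work.
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }

Invoked roughly like: hadoop jar myjob.jar MyDriver -libjars connector.jar
input output (jar and class names here are placeholders).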
~ Minh
Peter, I believe that this will help you:
https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster
DataNode Configuration
By default, the failure of a single dfs.data.dir will cause the HDFS
DataNode process to shut down, which results in the NameNode scheduling
additional replicas
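If I remember right, the knob that page describes is
dfs.datanode.failed.volumes.tolerated, which goes in hdfs-site.xml on each
DataNode; something like the following (the value 1 is just an example) lets a
DataNode survive a single failed volume:

    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
      <description>Number of dfs.data.dir volumes allowed to fail before the
      DataNode process shuts itself down.</description>
    </property>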
Take a look at slide 25:
http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera
It describes a similar error, so hopefully it will help you.
~ Minh
On Tue, Jun 19, 2012 at 10:27 AM, Ellis H. Wilson III el...@cse.psu.edu wrote:
Hi all,
This is my first email to