log files not found

2010-04-01 Thread Raghava Mutharaju
Hi all,

   I am running a series of jobs one after another. While executing the
4th job, the job fails in the reducer: the progress reaches map 100%,
reduce 99%, and then it gives out the following message:

10/04/01 01:04:15 INFO mapred.JobClient: Task Id :
attempt_201003240138_0110_r_18_1, Status : FAILED
Task attempt_201003240138_0110_r_18_1 failed to report status for 602
seconds. Killing!

It makes several more attempts to execute the task but fails with a similar
message. I couldn't get anything out of this error message and wanted to look
at the logs (located in the default directory, ${HADOOP_HOME}/logs), but I
don't find any files that match the timestamp of the job. I also did not find
the history and userlogs directories in the logs folder. Should I look at some
other place for the logs? What could be the possible causes of the above error?
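
For context, the 602 seconds in the message correspond to the task timeout
(mapred.task.timeout, 600000 ms by default): a task is killed if it neither
emits output nor reports progress within that window. A rough sketch (0.20
mapreduce API; the class and method names here are made up for illustration)
of a reducer that reports progress during a long-running step:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch only: a reducer that pings the framework during a long-running
    // step so the TaskTracker does not kill it for failing to report status.
    public class SlowButAliveReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        long n = 0;
        for (IntWritable v : values) {
          doExpensiveWork(v);                 // hypothetical long-running step
          if (++n % 1000 == 0) {
            context.progress();               // tell the framework the task is still alive
            context.setStatus("processed " + n + " values for " + key);
          }
        }
        context.write(key, new IntWritable((int) n));
      }

      private void doExpensiveWork(IntWritable v) {
        // placeholder for whatever makes the reduce slow
      }
    }

Alternatively, mapred.task.timeout can be raised in the job configuration,
though that only hides a genuinely stuck task.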

   I am using Hadoop 0.20.2 and I am running it on a cluster with 14
nodes.

Thank you.

Regards,
Raghava.


reduce takes too long time

2010-04-01 Thread Zheng Lv
Hello Everyone,
One of our jobs has 4 reduce tasks, but we find that one of them runs
normally while the others take far too long.
The following is the normal task's log:
2010-04-01 15:01:48,596 INFO org.apache.hadoop.mapred.Merger: Merging 1
sorted segments
2010-04-01 15:01:48,601 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 1 segments left of total size: 9907055 bytes
2010-04-01 15:01:48,605 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:01:48,622 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:01:48,672 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2010-04-01 15:02:03,744 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_201003301656_0139_r_01_0 is done. And is in the process of
commiting
2010-04-01 15:02:05,756 INFO org.apache.hadoop.mapred.TaskRunner: Task
attempt_201003301656_0139_r_01_0 is allowed to commit now
2010-04-01 15:02:05,762 INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_201003301656_0139_r_01_0' to
/user/root/nginxlog/sessionjob/output/20100401140001-20100401150001
2010-04-01 15:02:05,765 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_201003301656_0139_r_01_0' done.

And the following is the log of one of the slow ones:
2010-04-01 15:01:49,549 INFO org.apache.hadoop.mapred.Merger: Merging 1
sorted segments
2010-04-01 15:01:49,554 INFO org.apache.hadoop.mapred.Merger: Down to the
last merge-pass, with 1 segments left of total size: 9793700 bytes
2010-04-01 15:01:49,563 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:01:49,582 WARN org.apache.hadoop.mapred.JobConf: The variable
mapred.task.maxvmem is no longer used. Instead use mapred.job.map.memory.mb
and mapred.job.reduce.memory.mb
2010-04-01 15:04:49,690 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2010-04-01 15:05:07,103 INFO org.apache.hadoop.mapred.TaskRunner:
Task:attempt_201003301656_0139_r_00_0 is done. And is in the process of
commiting
2010-04-01 15:05:09,114 INFO org.apache.hadoop.mapred.TaskRunner: Task
attempt_201003301656_0139_r_00_0 is allowed to commit now
2010-04-01 15:05:09,120 INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_201003301656_0139_r_00_0' to
/user/root/nginxlog/sessionjob/output/20100401140001-20100401150001
2010-04-01 15:05:09,123 INFO org.apache.hadoop.mapred.TaskRunner: Task
'attempt_201003301656_0139_r_00_0' done.

   It looks like something is waiting before the line 2010-04-01 15:05:07,103 INFO
org.apache.hadoop.mapred.TaskRunner:
Task:attempt_201003301656_0139_r_00_0 is done. And is in the process of
commiting. Any suggestions?
   Regards,
LvZheng


Re: Errors reading lzo-compressed files from Hadoop

2010-04-01 Thread Todd Lipcon
Hey Dmitriy,

This is very interesting (and worrisome in a way!) I'll try to take a look
this afternoon.

-Todd

On Thu, Apr 1, 2010 at 12:16 AM, Dmitriy Ryaboy dmit...@twitter.com wrote:

 Hi folks,
 We write a lot of lzo-compressed files to HDFS -- some via scribe,
 some using internal tools. Occasionally, we discover that the created
 lzo files cannot be read from HDFS -- they get through some (often
 large) portion of the file, and then fail with the following stack
 trace:

 Exception in thread "main" java.lang.InternalError:
 lzo1x_decompress_safe returned:
at
 com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native
 Method)
at
 com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
at
 com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
at
 com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
at
 org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
at java.io.InputStream.read(InputStream.java:85)
at com.twitter.twadoop.jobs.LzoReadTest.main(LzoReadTest.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

 The initial thought is of course that the lzo file is corrupt --
 however, plain-jane lzop is able to read these files. Moreover, if we
 pull the files out of hadoop, uncompress them, compress them again,
 and put them back into HDFS, we can usually read them from HDFS as
 well.

 We've been thinking that this strange behavior is caused by a bug in
 the hadoop-lzo libraries (we use the version with Twitter and Cloudera
 fixes, on github: http://github.com/kevinweil/hadoop-lzo )
 However, today I discovered that using the exact same environment,
 codec, and InputStreams, we can successfully read from the local file
 system, but cannot read from HDFS. This appears to point at possible
 issues in the FSDataInputStream or further down the stack.

 Here's a small test class that tries to read the same file from HDFS
 and from the local FS, and the output of running it on our cluster.
 We are using the CDH2 distribution.

 https://gist.github.com/e1bf7e4327c7aef56303
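 
 (For readers without access to the gist: the real test is the LzoReadTest
 class named in the stack trace above; the following is only a rough sketch of
 the kind of comparison it performs. The class name and file paths here are
 illustrative, and it assumes the hadoop-lzo LzopCodec is on the classpath.)
 
     import java.io.BufferedReader;
     import java.io.InputStream;
     import java.io.InputStreamReader;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;
     import com.hadoop.compression.lzo.LzopCodec;
 
     public class LzoReadSketch {
       public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         LzopCodec codec = new LzopCodec();
         codec.setConf(conf);
 
         // Same file, read once through HDFS and once through the local FS.
         Path hdfsPath  = new Path("hdfs:///data/sample.lzo");    // illustrative path
         Path localPath = new Path("file:///data/sample.lzo");    // illustrative path
         for (Path p : new Path[] { hdfsPath, localPath }) {
           FileSystem fs = p.getFileSystem(conf);
           InputStream in = codec.createInputStream(fs.open(p));
           BufferedReader reader = new BufferedReader(new InputStreamReader(in));
           long lines = 0;
           while (reader.readLine() != null) {
             lines++;
           }
           reader.close();
           System.out.println(p + ": " + lines + " lines");
         }
       }
     }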

 Any ideas on what could be going on?

 Thanks,
 -Dmitriy




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: OutOfMemoryError: Cannot create GC thread. Out of system resources

2010-04-01 Thread Scott Carey
The default size of Java's young GC generation is 1/3 of the heap
(-XX:NewRatio defaults to 2).
You have told it to use 100MB for the in-memory file system, and there is a
default setting of 64MB of sort space.

If -Xmx is 128M, then the above sums to over 200MB and won't fit. Turning down
any of the three above could help, or increasing -Xmx.
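
For illustration only (the right values depend on the workload), shrinking the
in-memory file system and sort buffers, or raising the child heap, would look
roughly like this in the job/site configuration:

  <property>
    <name>fs.inmemory.size.mb</name>
    <value>50</value>
  </property>

  <property>
    <name>io.sort.mb</name>
    <value>32</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx256M</value>
  </property>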

Additionally, when a thread can't be allocated, it could also be due to an
OS-side limit on file system handles per process or per user.
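
A quick way to check those per-process OS limits on the task-tracker nodes
(run as the user the tasks run as; exact limits vary by distribution):

  $ ulimit -u    # max user processes/threads
  $ ulimit -n    # max open file descriptors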


On Mar 31, 2010, at 11:48 AM, Edson Ramiro wrote:

 Hi all,
 
 When I run the pi Hadoop sample I get this error:
 
 10/03/31 15:46:13 WARN mapred.JobClient: Error reading task outputhttp://
 h04.ctinfra.ufpr.br:50060/tasklog?plaintext=true&taskid=attempt_201003311545_0001_r_02_0&filter=stdout
 10/03/31 15:46:13 WARN mapred.JobClient: Error reading task outputhttp://
 h04.ctinfra.ufpr.br:50060/tasklog?plaintext=true&taskid=attempt_201003311545_0001_r_02_0&filter=stderr
 10/03/31 15:46:20 INFO mapred.JobClient: Task Id :
 attempt_201003311545_0001_m_06_1, Status : FAILED
 java.io.IOException: Task process exit with nonzero status of 134.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
 
 Maybe it's because the datanode can't create more threads.
 
 ram...@lcpad:~/hadoop-0.20.2$ cat
 logs/userlogs/attempt_201003311457_0001_r_01_2/stdout
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 # java.lang.OutOfMemoryError: Cannot create GC thread. Out of system
 resources.
 #
 #  Internal Error (gcTaskThread.cpp:38), pid=28840, tid=140010745776400
 #  Error: Cannot create GC thread. Out of system resources.
 #
 # JRE version: 6.0_17-b04
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode
 linux-amd64 )
 # An error report file with more information is saved as:
 #
 /var-host/tmp/hadoop-ramiro/mapred/local/taskTracker/jobcache/job_201003311457_0001/attempt_201003311457_0001_r_01_2/work/hs_err_pid28840.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://java.sun.com/webapps/bugreport/crash.jsp
 #
 
 I configured the limits below, but I'm still getting the same error.
 
  <property>
  <name>fs.inmemory.size.mb</name>
  <value>100</value>
  </property>
 
  <property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx128M</value>
  </property>
 
 Do you know which limit I should configure to fix it?
 
 Thanks in Advance
 
 Edson Ramiro



Re: swapping on hadoop

2010-04-01 Thread Scott Carey

On Apr 1, 2010, at 8:38 AM, Vasilis Liaskovitis wrote:

 
 In this example, what hadoop config parameters do the above 2 buffers
 refer to? io.sort.mb=250, but which parameter does the map side join
 100MB refer to? Are you referring to the split size of the input data
 handled by a single map task? Apart from that question, the example is
 clear to me and useful, thanks.
 

Map side join is just an example of one of many possible use cases where a 
particular map implementation may hold on to some semi-permanent data for the 
whole task.
It could be anything that takes 100MB of heap and holds the data across 
individual calls to map().
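
A rough illustration (the class, the lookup table, and where it is loaded from
are all made up) of the kind of mapper that holds such data for the lifetime
of the task:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
      // Loaded once in setup() and kept for every call to map() -- this is the
      // "semi-permanent" heap usage discussed above (it could easily be ~100MB).
      private Map<String, String> lookupTable = new HashMap<String, String>();

      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        // Hypothetical: populate lookupTable from a side file or the distributed cache.
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String joined = lookupTable.get(value.toString());
        if (joined != null) {
          context.write(value, new Text(joined));
        }
      }
    }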

 
 Quoting Allen: Java takes more RAM than just the heap size.
 Sometimes 2-3x as much.
 Is there a clear indication that Java memory usage extends so far
 beyond its allocated heap? E.g. would java thread stacks really
 account for such a big increase 2x to 3x? Tasks seem to be heavily
 threaded. What are the relevant config options to control number of
 threads within a task?
 

Java typically uses 5MB to 60MB for classloader data (statics, classes) and 
some space for threads, etc.  The default thread stack on most OS's is about 
1MB, and the number of threads for a task process is on the order of a dozen.
Getting 2-3x the space in a java process outside the heap would require either 
a huge thread count, a large native library loaded, or perhaps a non-java 
hadoop job using pipes.
It would be rather obvious in 'top' if you sort by memory (shift-M on linux), 
or vmstat, etc.   To get the current size of the heap of a process, you can use 
jstat or 'kill -3' to create a stack dump and heap summary.
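
For example (the pid is whatever the child task JVM's process id is):

  $ jstat -gc <task_jvm_pid> 5s    # sample heap sizes and GC activity every 5 seconds
  $ kill -3 <task_jvm_pid>         # SIGQUIT: thread dump and heap summary go to the task's stdout log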

 
 With this new setup, I don't normally get swapping for a single job,
 e.g. a terasort or hive job. However, the problem in general is
 exacerbated if one spawns multiple independent hadoop jobs
 simultaneously. I've noticed that JVMs are not re-used across jobs,
 as discussed in an earlier post:
 http://www.mail-archive.com/common-...@hadoop.apache.org/msg01174.html
 This implies that Java memory usage would blow up when submitting
 multiple independent jobs. So this multiple-job scenario sounds more
 susceptible to swapping.
 
The maximum number of map and reduce tasks per node applies no matter how many 
jobs are running.
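
(Those per-node caps are the tasktracker settings, e.g. in mapred-site.xml;
the values below are only an example.)

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>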


 A relevant question is: in production environments, do people run jobs
 in parallel? Or is it that the majority of jobs is a serial pipeline /
 cascade of jobs being run back to back?
 
Jobs are absolutely run in parallel.  I recommend using the fair scheduler with 
no config parameters other than 'assignmultiple = true' as the 'baseline' 
scheduler, and adjust from there accordingly.  The Capacity Scheduler has more 
tuning knobs for dealing with memory constraints if jobs have drastically 
different memory needs.  The out-of-the-box FIFO scheduler tends to have a hard 
time keeping the cluster utilization high when there are multiple jobs to run.
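
A minimal fair-scheduler setup along those lines (assuming the contrib
fairscheduler jar is on the JobTracker's classpath; these are the 0.20
property names) would be roughly:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>

  <property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>true</value>
  </property>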

 thanks,
 
 - Vasilis



Re: Error converting WordCount to v0.20.x

2010-04-01 Thread slim tebourbi
I tried the same thing and noticed that even the map function was
not executed!

here are the logs :

$ hadoop jar wordcount.jar org.stebourbi.hadoop.training.WordCount input
output

10/04/01 23:39:53 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=30
10/04/01 23:39:53 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id

10/04/01 23:39:53 DEBUG mapreduce.JobSubmitter: Configuring job
job_201004012334_0007 with
hdfs://localhost:9000/tmp/hadoop-tebourbi/mapred/staging/tebourbi/.staging/job_201004012334_0007
as the submit dir
10/04/01 23:39:53 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
10/04/01 23:39:53 DEBUG mapreduce.JobSubmitter: default FileSystem:
hdfs://localhost:9000
10/04/01 23:39:54 DEBUG mapreduce.JobSubmitter: Creating splits at
hdfs://localhost:9000/tmp/hadoop-tebourbi/mapred/staging/tebourbi/.staging/job_201004012334_0007
10/04/01 23:39:54 INFO input.FileInputFormat: Total input paths to process :
3
10/04/01 23:39:54 DEBUG input.FileInputFormat: Total # of splits: 3
10/04/01 23:39:54 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
10/04/01 23:39:54 INFO mapreduce.JobSubmitter: number of splits:3
10/04/01 23:39:54 INFO mapreduce.JobSubmitter: adding the following
namenodes' delegation tokens:null
10/04/01 23:39:54 INFO mapreduce.Job: Running job: job_201004012334_0007
10/04/01 23:39:55 INFO mapreduce.Job:  map 0% reduce 0%
10/04/01 23:39:55 INFO mapreduce.Job: Job complete: job_201004012334_0007
10/04/01 23:39:55 INFO mapreduce.Job: Counters: 4
Job Counters
Total time spent by all maps waiting after reserving slots (ms)=0
Total time spent by all reduces waiting after reserving slots (ms)=0
SLOTS_MILLIS_MAPS=0
SLOTS_MILLIS_REDUCES=0


However, the same code works well on eclipse as a simple java program!

Slim.

2010/3/28 Chris Williams chris.d.willi...@gmail.com


 I am working through the WordCount example to get rid of all the
 deprecation
 warnings.  While running it, my reduce function isn't being called.  Any
 ideas?  The code below can also be found here:
 http://gist.github.com/346975

 Thanks!
 Chris

 package hadoop.examples;

 import java.io.IOException;
 import java.util.*;

 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;

 public class WordCount extends Configured implements Tool {

public static class Map extends Mapper<LongWritable, Text, Text,
 IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(LongWritable key, Text value, Context
 context)
throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new
 StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}

public static class Reduce extends Reducer<Text, IntWritable, Text,
 IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values,
 Context
 context)
throws IOException, InterruptedException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
context.write(key, new IntWritable(sum));
}
}

public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new
 WordCount(), args);
System.exit(res);
}

@Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
Job job = new Job(conf, "wordcount");

job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
//job.setCombinerClass(Reduce.class);
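 
 One thing worth checking here (just a guess from reading the code above): in
 the 0.20 mapreduce API the framework calls reduce(KEY, Iterable<VALUE>,
 Context), so a reduce method declared with an Iterator parameter does not
 override it, and the default pass-through reduce runs instead -- which looks
 exactly like "my reduce function isn't being called". Using the same imports
 as the code above, the expected shape is roughly:
 
     public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
             @Override   // with @Override, the compiler flags a mismatched signature
             protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                             throws IOException, InterruptedException {
                     int sum = 0;
                     for (IntWritable value : values) {
                             sum += value.get();
                     }
                     context.write(key, new IntWritable(sum));
             }
     }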


How to Recommission?

2010-04-01 Thread Zhanlei Ma
How do you recommission or decommission DataNode(s) in Hadoop?

Decommission (removing some DataNodes):
On a large cluster, removing one or two data-nodes will not lead to any data
loss, because the name-node will replicate their blocks as soon as it detects
that the nodes are dead. With a large number of nodes getting removed or dying,
the probability of losing data is higher.

Hadoop offers the decommission feature to retire a set of existing data-nodes.
The nodes to be retired should be included in the exclude file, and the exclude
file name should be specified via the configuration parameter dfs.hosts.exclude.
This file should have been specified during namenode startup. It can be a
zero-length file. You must use the full hostname, IP, or ip:port format in this
file. Then the shell command

bin/hadoop dfsadmin -refreshNodes

should be called, which forces the name-node to re-read the exclude file and
start the decommission process.

Decommission does not happen momentarily, since it requires replication of a
potentially large number of blocks, and we do not want the cluster to be
overwhelmed with just this one job. The decommission progress can be monitored
on the name-node Web UI. Until all blocks are replicated, the node will be in
the "Decommission In Progress" state; when decommission is done, the state
changes to "Decommissioned". The nodes can be removed whenever decommission is
finished.
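
In configuration terms, the only piece that has to be in place up front is the
exclude-file pointer (the path below is just an example):

  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/dfs.exclude</value>  <!-- example path -->
  </property>

The hostnames to retire are then added to that file before running
bin/hadoop dfsadmin -refreshNodes, as described above.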



But how do you recommission a node? I would appreciate your help.
Thanks.