Re: hadoop/hive data loading

2011-05-12 Thread Fei Pan
hi, hadoopman

you can put the large data into your hdfs using: hadoop fs -put src dest
and then you can use: alter table xxx add partition(x) location 'dest'
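
For example, a minimal sketch of that flow for daily access logs (the paths,
table name, and partition column here are made up):

hadoop fs -put /local/logs/2011-05-10 /warehouse/access_logs/dt=2011-05-10
hive -e "ALTER TABLE access_logs ADD PARTITION (dt='2011-05-10') LOCATION '/warehouse/access_logs/dt=2011-05-10'"

Because each partition is registered directly over data already sitting in
HDFS, no union all (and no single big merge job) is needed at load time.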



2011/5/11 amit jaiswal amit_...@yahoo.com

 Hi,

 What is the meaning of 'union' over here? Is there a hadoop job with 1
 (or a few) reducers that combines all the data together? Have you tried external
 (dynamic) partitions for combining the data?

 -amit


 - Original Message -
 From: hadoopman hadoop...@gmail.com
 To: common-user@hadoop.apache.org
 Cc:
 Sent: Tuesday, 10 May 2011 11:26 PM
 Subject: hadoop/hive data loading

 When we load data into hive, sometimes we run into situations where the
 load fails and the logs show a heap out-of-memory error.  If I load just a
 few days (or months) of data then there's no problem.  But if I try to load
 two years (for example) of data then I've seen it fail.  Not with every
 feed, but with certain ones.

 Sometimes I've been able to split the data and get it to load.  An example
 of one type of feed I'm working on is the apache web server access logs.
 Generally it works.  But there are times when I need to load more than a few
 months of data, and then I get the memory heap errors in the task logs.

 Generally, how do people load their data into Hive?  We have a process where
 we first copy it to hdfs, then from there we run a staging process to get it
 into hive.  Once that completes we perform a union all and then overwrite the
 table partition.  It's usually during the union all stage that we see these
 errors appear.

 Also, is there a log which tells you which file it fails on?  I can see which
 task/job failed, but I can't find which file it's complaining about.  I figure
 that might help a bit.

 Thanks!




-- 
Stay Hungry. Stay Foolish.


What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-12 Thread Dieter Plaetinck
Hi,
I'm running some experiments using hadoop streaming.
I always get an output_dir/part-00000 file at the end, but I wonder:
when exactly will this filename show up? When it's completely written,
or will it already show up while the mapreduce software is still
writing to it? Is the write atomic?


The reason I'm asking: I have a script which submits +- 200 jobs
to mapreduce, and I have another script collecting the
part-00000 files of all jobs. (Not just once when all experiments are
done; I frequently collect the results of all jobs finished thus far.)

For this, I just do (simplified code):

for i in $(seq 1 200); do
  if ssh $master "bin/hadoop dfs -ls $i/output/part-00000"; then
    ssh $master "bin/hadoop dfs -cat $i/output/part-00000" > output_$i
  fi
done

and I wonder if this is prone to race conditions: is there any chance I
will run this while $i/output/part-00000 is in the process of being
written to, and hence end up with incomplete output_$i files?

If so, what's the proper way to check that the file is really stable?
Fetching the jobtracker webpage and checking whether job $i is finished?

Dieter


Host-address or Hostname

2011-05-12 Thread Matthew John
Hi all,

The String[] returned by InputSplit.getLocations() gives the list
of nodes where the input split resides.
But a node is represented either as the ip-address or as the hostname
(e.g. an entry in the list could be either 10.72.147.109 or mattHDFS1).
Is it possible to make this consistent? I am trying to do some
work by parsing an ID number embedded in the hostname, and this mixed
representation is giving me a lot of problems.

How do I resolve this?

Thanks,
Matthew
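
One way to work around the mixed representation is to normalize every entry
before parsing it; a minimal sketch, assuming reverse DNS resolves
consistently for the cluster nodes (the class name is made up):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNormalizer {
  // Maps whatever getLocations() returns (IP or hostname) to one canonical form.
  public static String toHostname(String location) throws UnknownHostException {
    return InetAddress.getByName(location).getCanonicalHostName();
  }
}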


Question about InputSampler

2011-05-12 Thread Panayotis Antonopoulos

Hello,
I am writing an MR job where the distribution of the keys emitted by the map 
phase is not known beforehand, so I can't create the partitions for the 
TotalOrderPartitioner. I would like to sample those keys to create the 
partitions and then run the job that will process the whole input.

Is the InputSampler the tool I need?
I tried to use it, but I think it doesn't run the samples through the mapper 
class before creating the partitions; 
it just creates the partitions straight from the input. Am I wrong?

Thank you in advance!
Pan
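
For reference, the usual TotalOrderPartitioner setup, sketched here with the
old (org.apache.hadoop.mapred) API: InputSampler does sample the raw input
keys rather than the map output, so it only fits jobs whose map keys match
the input keys. The driver class, key/value types, and partition-file path
below are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.InputSampler;
import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

public class TotalOrderDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TotalOrderDriver.class);
    conf.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(conf, new Path("/tmp/partitions.lst"));

    // Sample roughly 10% of the input records, capped at 10000 samples drawn
    // from at most 10 splits, and write the partition boundaries to the file
    // configured above before launching the real job.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
    InputSampler.writePartitionFile(conf, sampler);

    JobClient.runJob(conf);  // input/output paths omitted in this sketch
  }
}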
  

Re: Host-address or Hostname

2011-05-12 Thread Matthew John
Is it possible to get a Host-address to Host-name mapping in the JIP ?
Someone please help me with this!

Thanks,
Matthew

On Thu, May 12, 2011 at 5:36 PM, Matthew John tmatthewjohn1...@gmail.comwrote:

 Hi all,

 The String[] returned by InputSplit.getLocations() gives the list
 of nodes where the input split resides.
 But a node is represented either as the ip-address or as the hostname
 (e.g. an entry in the list could be either 10.72.147.109 or mattHDFS1).
 Is it possible to make this consistent? I am trying to do some
 work by parsing an ID number embedded in the hostname, and this mixed
 representation is giving me a lot of problems.

 How do I resolve this?

 Thanks,
 Matthew



Error reading task output for benchmark test TESTDFSIO

2011-05-12 Thread Matthew Tice
Hello,

 

I have a four node hadoop cluster running hadoop v.0.20.2 on CentOS 5.6.
Here is my layout:

 

Name01.hadoop.stage (namenode)

Name02.hadoop.stage (sec namenode / jobtracker)

Data01.hadoop.stage (data node)

Data02.hadoop.stage (data node)

 

When trying to run a benchmark test on this newly stood-up cluster I'm
getting errors.  This is the command (run as the hadoop user on my
name01.hadoop.stage node):

 

# /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-0.20.2-test.jar TestDFSIO
-write -nrFiles 1 -fileSize 10

 

Here is the output:

{{BEGIN}}

TestFDSIO.0.0.4

11/05/12 09:35:52 INFO mapred.FileInputFormat: nrFiles = 1

11/05/12 09:35:52 INFO mapred.FileInputFormat: fileSize (MB) = 10

11/05/12 09:35:52 INFO mapred.FileInputFormat: bufferSize = 100

11/05/12 09:35:52 INFO mapred.FileInputFormat: creating control file: 10
mega bytes, 1 files

11/05/12 09:35:52 INFO mapred.FileInputFormat: created control files for: 1
files

11/05/12 09:35:52 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.

11/05/12 09:35:52 INFO mapred.FileInputFormat: Total input paths to process
: 1

11/05/12 09:35:52 INFO mapred.JobClient: Running job: job_201105120935_0001

11/05/12 09:35:53 INFO mapred.JobClient:  map 0% reduce 0%

11/05/12 09:35:59 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_000002_0, Status : FAILED

java.io.IOException: Task process exit with nonzero status of 1.

        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/05/12 09:35:59 WARN mapred.JobClient: Error reading task outputhttp://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000002_0&filter=stdout

11/05/12 09:35:59 WARN mapred.JobClient: Error reading task outputhttp://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000002_0&filter=stderr

11/05/12 09:36:05 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_r_000002_0, Status : FAILED

java.io.IOException: Task process exit with nonzero status of 1.

        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/05/12 09:36:05 WARN mapred.JobClient: Error reading task outputhttp://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_r_000002_0&filter=stdout

11/05/12 09:36:05 WARN mapred.JobClient: Error reading task outputhttp://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_r_000002_0&filter=stderr

11/05/12 09:36:14 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_000002_1, Status : FAILED

java.io.IOException: Task process exit with nonzero status of 1.

        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/05/12 09:36:14 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000002_1&filter=stdout

11/05/12 09:36:14 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000002_1&filter=stderr

11/05/12 09:36:20 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_000002_2, Status : FAILED

java.io.IOException: Task process exit with nonzero status of 1.

        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/05/12 09:36:20 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000002_2&filter=stdout

11/05/12 09:36:20 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000002_2&filter=stderr

11/05/12 09:36:33 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_000001_0, Status : FAILED

java.io.IOException: Task process exit with nonzero status of 1.

        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/05/12 09:36:33 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000001_0&filter=stdout

11/05/12 09:36:33 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_000001_0&filter=stderr

11/05/12 09:36:39 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_r_000001_0, Status : FAILED

java.io.IOException: Task process exit with nonzero status of 1.

        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/05/12 09:36:39 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_r_000001_0&filter=stdout

11/05/12 09:36:39 WARN mapred.JobClient: Error reading task outputhttp://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt

Re: What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-12 Thread Aman
The creation of the part-nnnnn files is atomic. When you run an MR job, these
files are created in the directory output_dir/_temporary and moved to
output_dir after each file is closed for writing. This move is atomic,
so as long as you don't try to read these files from the temporary directory
(which I see you are not) you will be fine. 
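
If your Hadoop build also writes an empty _SUCCESS marker into the output
directory on successful job completion (newer releases do; it is controlled
by mapreduce.fileoutputcommitter.marksuccessfuljobs), the collection loop can
wait for that marker instead of the part file. A sketch:

for i in $(seq 1 200); do
  # Only fetch results once the whole job has committed its output.
  if ssh $master "bin/hadoop dfs -ls $i/output/_SUCCESS" >/dev/null 2>&1; then
    ssh $master "bin/hadoop dfs -cat $i/output/part-00000" > output_$i
  fi
done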





Call to namenode failures

2011-05-12 Thread Sidney Simmons
Hi there,

I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster.
Randomly (periodically), we're getting Call to namenode failures on
tasktrackers causing tasks to fail:

2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner:
attempt_201105090819_059_m_0038_0Child Error
java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local
exception: java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
   at org.apache.hadoop.ipc.Client.call(Client.java:743)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at $Proxy5.getFileInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at $Proxy5.getFileInfo(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
   at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(Unknown Source)
   at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

The namenode log (logging level = INFO) shows the following a few seconds
either side of the above timestamps. Could be relevant or it could be a
coincidence :

2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
   at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
   at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
   at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
   at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
   at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
   at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)

The jobtracker does however have an entry that correlates with the
tasktracker :

2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to
namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
   at org.apache.hadoop.ipc.Client.call(Client.java:743)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at $Proxy1.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
   at
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:208)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:169)
   at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
   at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
   at org.apache.hadoop.mapred.Child.main(Child.java:157)
Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(Unknown Source)
   at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Can anyone give me any pointers on how to start troubleshooting this issue?
It's very sporadic and we haven't been able to reproduce the issue yet in
our lab. After looking through the mailing list archives, some of the
suggestions revolve around the following settings:

dfs.namenode.handler.count 128 (existing 64)
dfs.datanode.handler.count 10 (existing 3)
dfs.datanode.max.xcievers 4096 (existing 256)

Any pointers ?

Thanks in advance

Sid Simmons
Infrastructure Support Specialist


mapper java process not exiting

2011-05-12 Thread Adi
For one long-running job we are noticing that the mapper JVMs do not exit
even after the mapper is done. Any suggestions on why this could be
happening?
The java processes get cleaned up if I do a hadoop job -kill job_id. The
java processes also get cleaned up if I run it in a smaller batch and the job
gets done fairly quickly (say half an hour). For larger inputs the nodes
eventually run out of memory because of these java processes that the
cluster thinks are gone but that haven't been cleaned up yet. I am
suspecting the TaskTrackers are failing to kill the JVMs themselves for
some reason.
The following exceptions can be seen in the hadoop logs.

2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
2011-05-12 13:52:08,071 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such process
2011-05-12 13:52:09,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such process
2011-05-12 13:52:12,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such process
2011-05-12 13:52:13,306 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such process
2011-05-12 13:52:14,996 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such process

2011-05-12 15:51:41,105 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such process
2011-05-12 15:51:43,481 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such process
2011-05-12 15:51:45,916 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such process
2011-05-12 15:52:06,328 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such process
2011-05-12 15:52:34,503 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such process
2011-05-12 15:52:38,607 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such process
2011-05-12 15:52:39,292 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such process
2011-05-12 15:52:46,547 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
Error executing shell command
org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such process

Some other exceptions, also seen in the logs, may or may not be related to the
above problem.
2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:58,977 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:02:04,040 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 37 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:02:09,095 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 100 on 33465 caught: java.nio.channels.ClosedChannelException

Thanks.

-Adi


Call to namenode fails (java.io.EOFException)

2011-05-12 Thread Sidney Simmons
Hi there,

Apologies if this comes through twice, but I sent the mail a few hours
ago and haven't seen it on the mailing list.

I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster.
Randomly (periodically), we're getting Call to namenode failures on
tasktrackers causing tasks to fail:

2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner:
attempt_201105090819_059_m_0038_0Child Error
java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local
exception: java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
   at org.apache.hadoop.ipc.Client.call(Client.java:743)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at $Proxy5.getFileInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at $Proxy5.getFileInfo(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
   at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(Unknown Source)
   at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

The namenode log (logging level = INFO) shows the following a few seconds
either side of the above timestamps. Could be relevant or it could be a
coincidence :

2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
   at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
   at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
   at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
   at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
   at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
   at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)

The jobtracker does however have an entry that correlates with the
tasktracker :

2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to
namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
   at org.apache.hadoop.ipc.Client.call(Client.java:743)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at $Proxy1.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
   at
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:208)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:169)
   at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
   at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
   at org.apache.hadoop.mapred.Child.main(Child.java:157)
Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(Unknown Source)
   at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Can anyone give me any pointers on how to start troubleshooting this issue?
It's very sporadic and we haven't been able to reproduce the issue yet in
our lab. After looking through the mailing list archives, some of the
suggestions revolve around the following settings:

dfs.namenode.handler.count 128 (existing 64)
dfs.datanode.handler.count 10 (existing 3)
dfs.datanode.max.xcievers 4096 (existing 256)

Any pointers ?

Thanks in advance

Sid Simmons
Infrastructure Support Specialist


Re: mapper java process not exiting

2011-05-12 Thread Joey Echeverria
Which version of hadoop are you running?

Are you running on linux?

-Joey

On Thu, May 12, 2011 at 1:39 PM, Adi adi.pan...@gmail.com wrote:
 For one long running job we are noticing that the mapper jvms do not exit
 even after the mapper is done. Any suggestions on why this could be
 happening.
 The java processes get cleaned up if I do a hadoop job -kill job_id. The
 java processes get cleaned up if I run it in a smaller batch and the job
 gets done fairly quickly (say half an hour). For larger inputs the nodes
 eventually run out of memory because of these java processes that the
 cluster thinks are gone but they haven't been cleaned up yet. I am
 suspecting the TaskTrackers are failing to kill JVMs for some reason by
 themselves.
 The following exceptions can be seen in the hadoop logs.

 2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
 2011-05-12 13:52:08,071 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such process
 2011-05-12 13:52:09,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such process
 2011-05-12 13:52:12,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such process
 2011-05-12 13:52:13,306 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such process
 2011-05-12 13:52:14,996 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such process

 2011-05-12 15:51:41,105 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such process
 2011-05-12 15:51:43,481 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such process
 2011-05-12 15:51:45,916 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such process
 2011-05-12 15:52:06,328 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such process
 2011-05-12 15:52:34,503 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such process
 2011-05-12 15:52:38,607 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such process
 2011-05-12 15:52:39,292 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such process
 2011-05-12 15:52:46,547 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such process

 Some other exceptions also seen in the logs may or may not be related to the
 above problem.
 2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
 2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
 2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
 2011-05-12 16:01:58,977 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 28 on 33465 caught: java.nio.channels.ClosedChannelException
 2011-05-12 16:02:04,040 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 37 on 33465 caught: java.nio.channels.ClosedChannelException
 2011-05-12 16:02:09,095 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 100 on 33465 caught: java.nio.channels.ClosedChannelException

 Thanks.

 -Adi




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: mapper java process not exiting

2011-05-12 Thread Adi
Which version of hadoop are you running?

 Hadoop 0.21.0 with some patches.



 Are you running on linux?

 Yes
Linux 2.6.18-238.9.1.el5 #1 SMP  x86_64 x86_64 x86_64 GNU/Linux
java version 1.6.0_21
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

I set up 0.21.0 on another linux box and am not seeing this issue there, as
hadoop is reusing JVMs (as configured).
In the production cluster it is not reusing JVMs and runs out of memory
because mapper JVMs stay alive even after they have ended according to
hadoop.

The production node is a 64-bit OS/JVM. Is there any known issue or
workaround for enabling JVM reuse in 64-bit environments?

Test node is 32 bit:
Linux 2.6.18-194.32.1.el5.centos.plus #1 SMP i686 i686 i386 GNU/Linux
java version 1.6.0_17
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386)
OpenJDK Server VM (build 14.0-b16, mixed mode)

Even if I can just get it to reuse JVMs it will be grrreat.

-Adi
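
For reference, JVM reuse is normally switched on per job with the standard
property below (a sketch for mapred-site.xml; a value of -1 means unlimited
reuse within a job). I'm not aware of it being 64-bit specific:

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>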





 -Joey

 On Thu, May 12, 2011 at 1:39 PM, Adi adi.pan...@gmail.com wrote:
  For one long running job we are noticing that the mapper jvms do not exit
  even after the mapper is done. Any suggestions on why this could be
  happening.
  The java processes get cleaned up if I do a hadoop job -kill job_id. The
  java processes get cleaned up if I run it in a smaller batch and the job
  gets done fairly quickly (say half an hour). For larger inputs the nodes
  eventually run out of memory because of these java processes that the
  cluster thinks are gone but they haven't been cleaned up yet. I am
  suspecting the TaskTrackers are failing to kill JVMs for some reason by
  themselves.
  The following exceptions can be seen in the hadoop logs.
 
  2011-05-12 13:52:04,147 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such
 process
  2011-05-12 13:52:08,071 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such
 process
  2011-05-12 13:52:09,009 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such
 process
  2011-05-12 13:52:12,009 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such
 process
  2011-05-12 13:52:13,306 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such
 process
  2011-05-12 13:52:14,996 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such
 process
 
  2011-05-12 15:51:41,105 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such
 process
  2011-05-12 15:51:43,481 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such
 process
  2011-05-12 15:51:45,916 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such
 process
  2011-05-12 15:52:06,328 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such
 process
  2011-05-12 15:52:34,503 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such
 process
  2011-05-12 15:52:38,607 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such
 process
  2011-05-12 15:52:39,292 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such
 process
  2011-05-12 15:52:46,547 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such
 process
 
  Some other exceptions also seen in the logs may or may not be related to
 the
  above problem.
  2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server
  handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
  2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server
  handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
  2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server
  handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
  2011-05-12 16:01:58,977 INFO 

Datanode doesn't start but there is no exception in the log

2011-05-12 Thread Panayotis Antonopoulos

Hello,
I am trying to set up Hadoop HDFS on a cluster for the first time. So far I was 
using pseudo-distributed mode on my PC at home and everything was working 
perfectly.
The NameNode starts but the DataNode doesn't, and the log contains the 
following:

2011-05-13 04:01:13,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
STARTUP_MSG: 
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = clone1/147.102.4.129
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u0
STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled 
by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
/
2011-05-13 04:01:14,019 INFO org.apache.hadoop.security.UserGroupInformation: 
JAAS Configuration already set up for Hadoop, not re-installing.
2011-05-13 04:01:14,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Registered FSDatasetStatusMBean
2011-05-13 04:01:14,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Opened info server at 50010
2011-05-13 04:01:14,154 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Balancing bandwith is 1048576 bytes/s
2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added global 
filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port returned 
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 50075
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
listener.getLocalPort() returned 50075 
webServer.getConnectors()[0].getLocalPort() returned 50075
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 50075
2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50075
2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=DataNode, sessionId=null
2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting Socket 
Reader #1 for port 50020
2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=DataNode, port=50020
2011-05-13 04:01:14,981 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: 
Initializing RPC Metrics with hostName=DataNode, port=50020
2011-05-13 04:01:14,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
dnRegistration = DatanodeRegistration(clone1:50010, storageID=, infoPort=50075, 
ipcPort=50020)

Does anyone know what might be wrong??

Thank you in advance!
Panagiotis
  

Re: Datanode doesn't start but there is no exception in the log

2011-05-12 Thread Bharath Mundlapudi
Is that all the messages in the datanode log? Do you see any SHUTDOWN message 
also?

-Bharath




From: Panayotis Antonopoulos antonopoulos...@hotmail.com
To: common-user@hadoop.apache.org
Sent: Thursday, May 12, 2011 6:07 PM
Subject: Datanode doesn't start but there is no exception in the log


Hello,
I am trying to set up Hadoop HDFS in a cluster for the first time. So far I was 
using pseudo-distributed mode on my PC at home and everything was working 
perfectly.
The NameNode starts but the DataNode doesn't start and the log contains the 
following:

2011-05-13 04:01:13,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
STARTUP_MSG: 
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = clone1/147.102.4.129
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u0
STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled 
by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
/
2011-05-13 04:01:14,019 INFO org.apache.hadoop.security.UserGroupInformation: 
JAAS Configuration already set up for Hadoop, not re-installing.
2011-05-13 04:01:14,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Registered FSDatasetStatusMBean
2011-05-13 04:01:14,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Opened info server at 50010
2011-05-13 04:01:14,154 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Balancing bandwith is 1048576 bytes/s
2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added global 
filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port returned 
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 50075
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
listener.getLocalPort() returned 50075 
webServer.getConnectors()[0].getLocalPort() returned 50075
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 50075
2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50075
2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=DataNode, sessionId=null
2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting Socket 
Reader #1 for port 50020
2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=DataNode, port=50020
2011-05-13 04:01:14,981 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: 
Initializing RPC Metrics with hostName=DataNode, port=50020
2011-05-13 04:01:14,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
dnRegistration = DatanodeRegistration(clone1:50010, storageID=, infoPort=50075, 
ipcPort=50020)

Does anyone know what might be wrong??

Thank you in advance!
Panagiotis

Re: is it possible to concatenate output files under many reducers?

2011-05-12 Thread Jun Young Kim

yes, that is a general solution to control the number of output files.

however, if you need to control the number of output files dynamically, how 
would you do it?


if the output file name is 'A', 5 output files are needed;
if the output file name is 'B', 10 output files are needed.


is that possible under hadoop?

Junyoung Kim (juneng...@gmail.com)


On 05/12/2011 02:17 PM, Harsh J wrote:

Short, blind answer: You could run 10 reducers.

Otherwise, you'll have to run another job that picks up a few files
per mapper and merges them. But having 60 files shouldn't
really be a problem if they are sufficiently large (at least 80% of a
block size perhaps -- you can tune the # of reducers to achieve this).

On Thu, May 12, 2011 at 6:14 AM, Jun Young Kimjuneng...@gmail.com  wrote:

hi, all.

I have 60 reducers which are generating output files,

from output-r-00000 to output-r-00059.

under this situation, I want to control the count of output files.

for example, is it possible to concatenate all output files down to 10,

from output-r-00001 to output-r-00010?

thanks

--
Junyoung Kim (juneng...@gmail.com)







Re: mapper java process not exiting

2011-05-12 Thread Joey Echeverria
 Hadoop 0.21.0 with some patches.

Hadoop 0.21.0 doesn't get much use, so I'm not sure how much help I can be.

 2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process

Your logs show that Hadoop tried to kill the processes, but the kill
command claimed they didn't exist. The next time you see this problem,
can you check the logs and see whether any of the PIDs that appear in the
logs are in fact still running?

A more likely scenario is that Hadoop's tracking of child VMs is
getting out of sync, but I'm not sure what would cause that.

-Joey

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
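
A sketch of that cross-check, pulling the IDs out of the 'No such process'
warnings and testing them against the live process table (the log file path
is a placeholder):

grep -o 'kill -[0-9]*' tasktracker.log | grep -o '[0-9]*' | sort -u |
while read pid; do
  # Prints any process the TaskTracker thought was gone but is still alive.
  ps -p "$pid" -o pid,etime,cmd --no-headers 2>/dev/null
done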


Re: is it possible to concatenate output files under many reducers?

2011-05-12 Thread Joey Echeverria
You can control the number of reducers by calling
job.setNumReduceTasks() before you launch it.

-Joey

On Thu, May 12, 2011 at 6:33 PM, Jun Young Kim juneng...@gmail.com wrote:
 yes. that is a general solution to control counts of output files.

 however, if you need to control counts of outputs dynamically, how could you
 do?

 if an output file name is 'A', counts of this output files are needed to be
 5.
 if an output file name is 'B', counts of this output files are needed to be
 10.

 is it able to be under hadoop?

 Junyoung Kim (juneng...@gmail.com)


 On 05/12/2011 02:17 PM, Harsh J wrote:

 Short, blind answer: You could run 10 reducers.

 Otherwise, you'll have to run another job that picks up a few files
 per mapper and merges them. But having 60 files shouldn't
 really be a problem if they are sufficiently large (at least 80% of a
 block size perhaps -- you can tune the # of reducers to achieve this).

 On Thu, May 12, 2011 at 6:14 AM, Jun Young Kimjuneng...@gmail.com
  wrote:

 hi, all.

 I have 60 reducers which are generating same output files.

 from output-r-00000 to output-r-00059.

 under this situation, I want to control the count of output files.

 for example, is it possible to concatenate all output files to 10 ?

 from output-r-00001 to output-r-00010.

 thanks

 --
 Junyoung Kim (juneng...@gmail.com)








-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
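
In code, a sketch with the new (org.apache.hadoop.mapreduce) API, which the
output-r-nnnnn file names above suggest is in use (the job name here is a
placeholder):

Job job = new Job(conf, "merge-to-ten");
job.setNumReduceTasks(10);  // yields exactly 10 output files, numbered 00000..00009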


Re: mapper java process not exiting

2011-05-12 Thread Adi
  2011-05-12 13:52:04,147 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
  Error executing shell command
  org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such
 process

 Your logs showed that Hadoop tried to kill processes but the kill
 command claimed they didn't exist. The next time you see this problem,
 can you check the logs and see if any of the PIDs that appear in the
 logs are in fact still running?

 A more likely scenario is that Hadoop's tracking of child VMs is
 getting out of sync, but I'm not sure what would cause that.


Yes, those java processes are in fact still running. And those error messages do
not always show up, just sometimes. But the processes never get cleaned up.

-Adi


Re: mapper java process not exiting

2011-05-12 Thread highpointe
Is there a reason for using OpenJDK and not Sun's JDK?

Also... I believe there were noted issues with the .17 JDK. I will look for a 
link and post it if I can find it. 

Otherwise, I have seen this behaviour before: Hadoop detaches from the JVM 
and stops seeing it.

I think your problem lies in the JDK and not in Hadoop. 


On May 12, 2011 at 8:12 PM, Adi adi.pan...@gmail.com wrote:

 2011-05-12 13:52:04,147 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
 Error executing shell command
 org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such
 process
 
 Your logs showed that Hadoop tried to kill processes but the kill
 command claimed they didn't exist. The next time you see this problem,
 can you check the logs and see if any of the PIDs that appear in the
 logs are in fact still running?
 
 A more likely scenario is that Hadoop's tracking of child VMs is
 getting out of sync, but I'm not sure what would cause that.
 
 
 Yes those java processes are in fact running. And those error messages do
 not always show up. Just sometimes. But the processes never get cleaned up.
 
 -Adi


Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Hi

   I'm using FileInputFormat, which splits files logically according to
their sizes. Can the mapper get a pointer to these splits, and
know which split it is assigned?

   I tried looking at the Reporter class to see how it prints the
logical splits on the UI for each mapper, but it's an interface.

   Eg.
Mapper1:  is assigned the logical split
hdfs://localhost:9000/user/Hadoop/input:23+24
Mapper2:  is assigned the logical split
hdfs://localhost:9000/user/Hadoop/input:0+23

 Then inside map, I want to ask what the logical splits are, get the
two strings above, and know which one my current mapper is assigned.

 Thanks,
Mark


I can't see my messages immediately, and sometimes they don't even arrive. Why?

2011-05-12 Thread Mark question



Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Owen O'Malley
On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com wrote:

 Hi

   I'm using FileInputFormat which will split files logically according to
 their sizes into splits. Can the mapper get a pointer to these splits? and
 know which split it is assigned ?


Look at
http://hadoop.apache.org/common/docs/r0.20.203.0/mapred_tutorial.html#Task+JVM+Reuse

 In particular, map.input.file and map.input.offset are the configuration
parameters that you want.

-- Owen
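
For the "which split am I assigned" part, a sketch that reads the split
directly inside an old-API map(); Reporter.getInputSplit() is valid in map
tasks:

public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
    throws IOException {
  FileSplit split = (FileSplit) reporter.getInputSplit();
  // Rebuilds the string shown on the UI, e.g. hdfs://.../input:23+24
  String assigned = split.getPath() + ":" + split.getStart() + "+" + split.getLength();
  reporter.setStatus("assigned " + assigned);
}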


Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Thanks for the reply Owen, I only knew about map.input.file.

 So there is no way I can see the other possible splits (start+length)? Like
some function that returns the map.input.file and map.input.offset strings
of the other mappers?

Thanks,
Mark

On Thu, May 12, 2011 at 9:08 PM, Owen O'Malley omal...@apache.org wrote:

 On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com
 wrote:

  Hi
 
I'm using FileInputFormat which will split files logically according to
  their sizes into splits. Can the mapper get a pointer to these splits?
 and
  know which split it is assigned ?
 

 Look at

 http://hadoop.apache.org/common/docs/r0.20.203.0/mapred_tutorial.html#Task+JVM+Reuse

  In particular, map.input.file and map.input.offset are the configuration
 parameters that you want.

 -- Owen



Re: how to get user-specified Job name from hadoop for running jobs?

2011-05-12 Thread Mark question
Do you mean by user-specified the job name you set via
JobConf.setJobName("myTask")?
Then using the same object you can recall the name as follows:

JobConf conf;                      // the same JobConf you called setJobName() on
String name = conf.getJobName();

~Cheers
Mark

On Tue, May 10, 2011 at 10:16 AM, Mark Zand mz...@basistech.com wrote:

 While I can get JobStatus with this:

 JobClient client = new JobClient(new JobConf(conf));
 JobStatus[] jobStatuses = client.getAllJobs();


 I don't see any way to get user-specified Job name.

 Please help. Thanks.



Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Owen O'Malley
On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com wrote:

  So there is no way I can see the other possible splits (start+length)?
 like
 some function that returns strings of map.input.file and map.input.offset
 of
 the other mappers ?


No, there isn't any way to do it using the public API.

The only way would be to look under the covers and read the split file
(job.split).

-- Owen


Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Thanks again Owen, hopefully the last one:

   Who's filling in map.input.file and map.input.offset (i.e. which class),
so I can extend it to have a function that returns these strings?

Thanks,
Mark

On Thu, May 12, 2011 at 10:07 PM, Owen O'Malley omal...@apache.org wrote:

 On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com
 wrote:

   So there is no way I can see the other possible splits (start+length)?
  like
  some function that returns strings of map.input.file and map.input.offset
  of
  the other mappers ?
 

 No, there isn't any way to do it using the public API.

 The only way would be to look under the covers and read the split file
 (job.split).

 -- Owen



Re: Call to namenode fails with java.io.EOFException

2011-05-12 Thread Harsh J
One of the reasons I can think of is a version mismatch. You may
want to ensure that the job in question was not carrying a separate
version of Hadoop along with it, perhaps?

On Fri, May 13, 2011 at 12:42 AM, Sidney Simmons
ssimm...@nmitconsulting.co.uk wrote:
 Hi there,

 I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster.
 Randomly (periodically), we're getting Call to namenode failures on
 tasktrackers causing tasks to fail:

 2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner:
 attempt_201105090819_059_m_0038_0Child Error
 java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local
 exception: java.io.EOFException
       at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
       at org.apache.hadoop.ipc.Client.call(Client.java:743)
       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
       at $Proxy5.getFileInfo(Unknown Source)
       at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.lang.reflect.Method.invoke(Unknown Source)
       at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
       at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
       at $Proxy5.getFileInfo(Unknown Source)
       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
       at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
 Caused by: java.io.EOFException
       at java.io.DataInputStream.readInt(Unknown Source)
       at
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

 The namenode log (logging level = INFO) shows the following a few seconds
 either side of the above timestamps. Could be relevant or it could be a
 coincidence :

 2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
       at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
       at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
       at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
       at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
       at
 org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
       at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)

 The jobtracker does however have an entry that correlates with the
 tasktracker :

 2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error
 from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to
 namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
       at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
       at org.apache.hadoop.ipc.Client.call(Client.java:743)
       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
       at $Proxy1.getProtocolVersion(Unknown Source)
       at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
       at
 org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
       at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:208)
       at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:169)
       at
 org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
       at
 org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
       at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
       at org.apache.hadoop.mapred.Child.main(Child.java:157)
 Caused by: java.io.EOFException
       at java.io.DataInputStream.readInt(Unknown Source)
       at
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

 Can anyone give me any pointers on how to start troubleshooting this issue?
 It's very sporadic and we haven't been able to reproduce the issue yet in
 our lab. After looking through the mailing list archives, some of the
 suggestions revolve around the following settings:

 dfs.namenode.handler.count 128 (existing 64)
 dfs.datanode.handler.count 10 (existing 3)
 dfs.datanode.max.xcievers 4096 (existing 256)

 Any pointers ?

 Thanks in advance

 Sid Simmons
 Infrastructure Support Specialist




-- 
Harsh J
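
A quick way to rule that out (hostnames and install path are placeholders):
compare the exact build string on every node, since an RPC EOFException is a
classic symptom of mismatched client/server builds:

for h in namenode datanode1 datanode2; do
  ssh $h "/opt/hadoop/bin/hadoop version | head -1"
done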


Re: Datanode doesn't start but there is no exception in the log

2011-05-12 Thread highpointe
Have you defined the IP of the DN in the slaves file?



Sent from my iPhone

On May 12, 2011, at 7:27 PM, Bharath Mundlapudi bharathw...@yahoo.com wrote:

 Is that all the messages in the datanode log? Do you see any SHUTDOWN message 
 also?
 
 -Bharath
 
 
 
 
 From: Panayotis Antonopoulos antonopoulos...@hotmail.com
 To: common-user@hadoop.apache.org
 Sent: Thursday, May 12, 2011 6:07 PM
 Subject: Datanode doesn't start but there is no exception in the log
 
 
 Hello,
 I am trying to set up Hadoop HDFS in a cluster for the first time. So far I 
 was using pseudo-distributed mode on my PC at home and everything was working 
 perfectly.
 The NameNode starts but the DataNode doesn't start and the log contains the 
 following:
 
 2011-05-13 04:01:13,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 STARTUP_MSG: 
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = clone1/147.102.4.129
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.20.2-cdh3u0
 STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled 
 by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
 /
 2011-05-13 04:01:14,019 INFO org.apache.hadoop.security.UserGroupInformation: 
 JAAS Configuration already set up for Hadoop, not re-installing.
 2011-05-13 04:01:14,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Registered FSDatasetStatusMBean
 2011-05-13 04:01:14,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Opened info server at 50010
 2011-05-13 04:01:14,154 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 Balancing bandwith is 1048576 bytes/s
 2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
 org.mortbay.log.Slf4jLog
 2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added global 
 filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port returned 
 by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening 
 the listener on 50075
 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
 listener.getLocalPort() returned 50075 
 webServer.getConnectors()[0].getLocalPort() returned 50075
 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
 to port 50075
 2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
 2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
 SelectChannelConnector@0.0.0.0:50075
 2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=DataNode, sessionId=null
 2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting Socket 
 Reader #1 for port 50020
 2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
 Initializing RPC Metrics with hostName=DataNode, port=50020
 2011-05-13 04:01:14,981 INFO 
 org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics 
 with hostName=DataNode, port=50020
 2011-05-13 04:01:14,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 dnRegistration = DatanodeRegistration(clone1:50010, storageID=, 
 infoPort=50075, ipcPort=50020)
 
 Does anyone know what might be wrong??
 
 Thank you in advance!
 Panagiotis