Re: hadoop/hive data loading
hi hadoopman, you can put the large data into HDFS using hadoop fs -put src dest, and then use ALTER TABLE xxx ADD PARTITION (x) LOCATION 'dest'.

2011/5/11 amit jaiswal amit_...@yahoo.com:
> Hi, What is the meaning of 'union' here? Is there a hadoop job with one (or a few) reducers that combines all the data together? Have you tried external (dynamic) partitions for combining data? -amit

----- Original Message -----
From: hadoopman hadoop...@gmail.com
To: common-user@hadoop.apache.org
Sent: Tuesday, 10 May 2011 11:26 PM
Subject: hadoop/hive data loading

When we load data into Hive, we sometimes run into situations where the load fails and the logs show a heap out-of-memory error. If I load just a few days (or months) of data, there's no problem. But if I try to load two years of data, for example, I've seen it fail. Not with every feed, but with certain ones. Sometimes I've been able to split the data and get it to load. One feed I'm working on is Apache web server access logs. Generally it works, but when I need to load more than a few months of data, I get the heap errors in the task logs.

Generally, how do people load their data into Hive? We have a process that first copies the data to HDFS, then runs a staging process to get it into Hive. Once that completes, we perform a UNION ALL and then overwrite the table partition. It's usually during the UNION ALL stage that these errors appear.

Also, is there a log that tells you which file it fails on? I can see which task/job failed, but I can't find which file it's complaining about. I figure that might help a bit. Thanks!

-- Stay Hungry. Stay Foolish.
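For reference, the partition-per-load approach suggested in the reply can avoid the UNION ALL step entirely. A minimal sketch, assuming a hypothetical external table access_logs partitioned by a dt string column (table name, column, and paths are illustrative, not from the thread):

```shell
# Sketch: stage one day of apache access logs into HDFS, then attach the
# directory as a Hive partition. Table name, column, and paths are hypothetical.
DAY=2011-05-10
TARGET=/data/access_logs/dt=$DAY

load_day() {
  hadoop fs -mkdir "$TARGET" &&
  hadoop fs -put "access.log.$DAY" "$TARGET"/ &&
  hive -e "ALTER TABLE access_logs ADD PARTITION (dt='$DAY') LOCATION '$TARGET'"
}
# run on a node with the hadoop and hive clients installed: load_day
```

Because each day lands in its own partition, a failed load only affects that day, and no single job ever has to union two years of data at once.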
What exactly are the output_dir/part-00000 semantics (of a streaming job) ?
Hi, I'm running some experiments using hadoop streaming. I always get an output_dir/part-00000 file at the end, but I wonder: when exactly does this filename show up? Only once it's completely written, or already while the mapreduce framework is still writing to it? Is the write atomic? The reason I'm asking: I have a script which submits about 200 jobs to mapreduce, and another script collecting the part-00000 files of all jobs (not just once when all experiments are done; I frequently collect the results of the jobs finished thus far). For this, I just do (simplified code):

for i in $(seq 1 200); do
  if ssh $master bin/hadoop dfs -ls $i/output/part-00000; then
    ssh $master bin/hadoop dfs -cat $i/output/part-00000 > output_$i
  fi
done

and I wonder if this is prone to race conditions: is there any chance I will run this while $i/output/part-00000 is in the process of being written to, and hence end up with incomplete output_$i files? If so, what's the proper way to check that the file is really stable? Fetching the jobtracker webpage and checking whether job $i is finished?

Dieter
Host-address or Hostname
Hi all, the String[] returned by InputSplit.getLocations() gives the list of nodes where the input split resides. But each node is represented as either the IP address or the hostname (e.g., an entry in the list could be 10.72.147.109 or mattHDFS1). Is it possible to make this consistent? I am trying to parse an ID number embedded in the hostname, and this mixed representation is causing me a lot of problems. How do I resolve this? Thanks, Matthew
Question about InputSampler
Hello, I am writing an MR job where the distribution of the keys emitted by the map phase is not known beforehand, so I can't create the partitions for the TotalOrderPartitioner. I would like to sample those keys to create the partitions, and then run the job that processes the whole input. Is the InputSampler the tool I need? I tried to use it, but I think it doesn't run the mapper class over the samples before creating the partitions; it just creates the partitions from the raw input. Am I wrong? Thank you in advance! Pan
Re: Host-address or Hostname
Is it possible to get a host-address to hostname mapping in the JIP? Someone please help me with this! Thanks, Matthew
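One workaround (a sketch, not from the thread) is to normalize every entry returned by getLocations() to a canonical name before parsing the embedded ID, assuming every cluster node is resolvable via DNS or /etc/hosts:

```shell
# Resolve either an IP address or a hostname to the first name registered
# for it in DNS / /etc/hosts, so 10.72.147.109 and mattHDFS1 compare equal.
normalize_host() {
  getent hosts "$1" | awk '{print $2; exit}'
}
# e.g. normalize_host 10.72.147.109  ->  mattHDFS1 (if so registered)
```

This relies on the cluster's name service being consistent; if the forward and reverse mappings disagree, the IDs will still diverge.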
Error reading task output for benchmark test TESTDFSIO
Hello, I have a four node hadoop cluster running hadoop v.0.20.2 on CentOS 5.6. Here is my layout:

Name01.hadoop.stage (namenode)
Name02.hadoop.stage (sec namenode / jobtracker)
Data01.hadoop.stage (data node)
Data02.hadoop.stage (data node)

When trying to run a benchmark test for this newly stood-up cluster, I'm getting errors. This is the command (run as the hadoop user on my name01.hadoop.stage node):

# /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 1 -fileSize 10

Here is the output:

{{BEGIN}}
TestFDSIO.0.0.4
11/05/12 09:35:52 INFO mapred.FileInputFormat: nrFiles = 1
11/05/12 09:35:52 INFO mapred.FileInputFormat: fileSize (MB) = 10
11/05/12 09:35:52 INFO mapred.FileInputFormat: bufferSize = 100
11/05/12 09:35:52 INFO mapred.FileInputFormat: creating control file: 10 mega bytes, 1 files
11/05/12 09:35:52 INFO mapred.FileInputFormat: created control files for: 1 files
11/05/12 09:35:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/05/12 09:35:52 INFO mapred.FileInputFormat: Total input paths to process : 1
11/05/12 09:35:52 INFO mapred.JobClient: Running job: job_201105120935_0001
11/05/12 09:35:53 INFO mapred.JobClient: map 0% reduce 0%
11/05/12 09:35:59 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_02_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/05/12 09:35:59 WARN mapred.JobClient: Error reading task output http://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_02_0&filter=stdout
11/05/12 09:35:59 WARN mapred.JobClient: Error reading task output http://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_02_0&filter=stderr
11/05/12 09:36:05 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_r_02_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/05/12 09:36:05 WARN mapred.JobClient: Error reading task output http://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_r_02_0&filter=stdout
11/05/12 09:36:05 WARN mapred.JobClient: Error reading task output http://data02.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_r_02_0&filter=stderr
11/05/12 09:36:14 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_02_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/05/12 09:36:14 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_02_1&filter=stdout
11/05/12 09:36:14 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_02_1&filter=stderr
11/05/12 09:36:20 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_02_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/05/12 09:36:20 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_02_2&filter=stdout
11/05/12 09:36:20 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_02_2&filter=stderr
11/05/12 09:36:33 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_m_01_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/05/12 09:36:33 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_01_0&filter=stdout
11/05/12 09:36:33 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_m_01_0&filter=stderr
11/05/12 09:36:39 INFO mapred.JobClient: Task Id : attempt_201105120935_0001_r_01_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/05/12 09:36:39 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt_201105120935_0001_r_01_0&filter=stdout
11/05/12 09:36:39 WARN mapred.JobClient: Error reading task output http://data01.hadoop.stage:50060/tasklog?plaintext=true&taskid=attempt
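The "Error reading task output" URLs point at the tasktracker's tasklog servlet; fetching one directly usually reveals the real failure. A sketch of pulling one failed attempt's stderr, using a host and attempt id taken from the output above (the userlogs path is the 0.20 default and may differ on your install):

```shell
# Build the tasklog URL for one failed attempt (values from the output above).
TT=data02.hadoop.stage:50060
ATTEMPT=attempt_201105120935_0001_m_02_0
URL="http://$TT/tasklog?plaintext=true&taskid=$ATTEMPT&filter=stderr"
echo "$URL"        # fetch with: curl -s "$URL"
# or, logged in on the tasktracker node itself:
#   cat /opt/hadoop/logs/userlogs/$ATTEMPT/stderr
```

"Task process exit with nonzero status of 1" with unreadable task logs is often an environment problem on the worker (JAVA_HOME, permissions on the log/tmp directories), so the child's stderr is the first thing worth reading.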
Re: What exactly are the output_dir/part-00000 semantics (of a streaming job) ?
The creation of the part-nnnnn files is atomic. When you run a MR job, these files are created in the directory output_dir/_temporary and moved to output_dir after the file is closed for writing. This move is atomic, so as long as you don't try to read these files from the temporary directory (which I see you are not), you will be fine.
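Given that guarantee, the collector loop from the question only needs to be made idempotent: skip jobs whose output was already fetched, and write each job's output to a local file. A sketch (the master host and 200-job count are from the original post):

```shell
# Collect finished part-00000 files once each; a part file's presence is
# sufficient, because it is renamed out of _temporary only when complete.
collect_outputs() {
  local master=$1 njobs=$2 i
  for i in $(seq 1 "$njobs"); do
    [ -s "output_$i" ] && continue    # already collected
    if ssh "$master" "bin/hadoop dfs -ls $i/output/part-00000" >/dev/null 2>&1; then
      ssh "$master" "bin/hadoop dfs -cat $i/output/part-00000" > "output_$i"
    fi
  done
}
# e.g. collect_outputs my-master 200
```

The non-empty check also guards against a partially-failed earlier fetch leaving a zero-byte file behind.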
Call to namenode failures
Hi there, I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster. Randomly (periodically), we're getting "Call to namenode" failures on tasktrackers, causing tasks to fail:

2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201105090819_059_m_0038_0 Child Error
java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy5.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy5.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

The namenode log (logging level = INFO) shows the following a few seconds either side of the above timestamps. Could be relevant or it could be a coincidence:

2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
        at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)

The jobtracker does, however, have an entry that correlates with the tasktracker:

2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:208)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:169)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at org.apache.hadoop.mapred.Child.main(Child.java:157)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Can anyone give me any pointers on how to start troubleshooting this issue? It's very sporadic and we haven't been able to reproduce it yet in our lab. After looking through the mailing list archives, some of the suggestions revolve around the following settings:

dfs.namenode.handler.count: 128 (existing: 64)
dfs.datanode.handler.count: 10 (existing: 3)
dfs.datanode.max.xcievers: 4096 (existing: 256)

Any pointers?

Thanks in advance
Sid Simmons
Infrastructure Support Specialist
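For reference, the three settings floated above all go in hdfs-site.xml and take effect on daemon restart (values here are just the ones suggested in the mail; note that dfs.datanode.max.xcievers really is spelled with the transposed "ie" in this Hadoop line):

```xml
<property>
  <name>dfs.namenode.handler.count</name>
  <value>128</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Raising the namenode handler count is the usual first step when clients see EOFException on RPC under load, since an overloaded IPC server drops connections that clients then report exactly this way.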
mapper java process not exiting
For one long-running job we are noticing that the mapper JVMs do not exit even after the mapper is done. Any suggestions on why this could be happening? The java processes get cleaned up if I do a hadoop job -kill job_id. The java processes also get cleaned up if I run it in a smaller batch and the job finishes fairly quickly (say, half an hour). For larger inputs, the nodes eventually run out of memory because of these java processes that the cluster thinks are gone but that haven't been cleaned up yet. I suspect the TaskTrackers are failing to kill the JVMs themselves for some reason. The following exceptions can be seen in the hadoop logs:

2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
2011-05-12 13:52:08,071 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such process
2011-05-12 13:52:09,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such process
2011-05-12 13:52:12,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such process
2011-05-12 13:52:13,306 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such process
2011-05-12 13:52:14,996 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such process
2011-05-12 15:51:41,105 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such process
2011-05-12 15:51:43,481 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such process
2011-05-12 15:51:45,916 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such process
2011-05-12 15:52:06,328 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such process
2011-05-12 15:52:34,503 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such process
2011-05-12 15:52:38,607 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such process
2011-05-12 15:52:39,292 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such process
2011-05-12 15:52:46,547 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such process

Some other exceptions also seen in the logs, which may or may not be related to the above problem:

2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:58,977 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:02:04,040 INFO org.apache.hadoop.ipc.Server: IPC Server handler 37 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:02:09,095 INFO org.apache.hadoop.ipc.Server: IPC Server handler 100 on 33465 caught: java.nio.channels.ClosedChannelException

Thanks.
-Adi
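To quantify the leak on a worker node, counting live task-child JVMs is a quick check (a sketch; org.apache.hadoop.mapred.Child is the map/reduce child's main class in this Hadoop line, but grep for whatever your task command lines actually contain):

```shell
# Count java processes that look like map/reduce task children, to compare
# with what the tasktracker web UI says is running on this node.
# The [o] in the pattern stops grep from matching its own command line.
orphans=$(ps -eo pid,etime,args | grep '[o]rg\.apache\.hadoop\.mapred\.Child' | wc -l)
echo "task child JVMs alive: $orphans"
```

The etime column then shows how long each supposedly finished child has been lingering, which is useful evidence when filing a bug.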
Re: mapper java process not exiting
Which version of hadoop are you running? Are you running on linux?

-Joey

On Thu, May 12, 2011 at 1:39 PM, Adi adi.pan...@gmail.com wrote:
> For one long running job we are noticing that the mapper jvms do not exit
> even after the mapper is done. [...]

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
Re: mapper java process not exiting
Which version of hadoop are you running? Hadoop 0.21.0 with some patches. Are you running on linux? Yes Linux 2.6.18-238.9.1.el5 #1 SMP x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_21 Java(TM) SE Runtime Environment (build 1.6.0_21-b06) Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode) I set up 0.21.0 on another linux box and am not seeing this issue as hadoop is reusing JVMs(as configured). In the production cluster it is not re-using JVMs and runs out of memory because of mapper JVMs staying alive even after they have ended according to hadoop. The production node is a 64 bit OS/JVM. Is there any known issue workaround for enabling JVM reuse in 64 bit environments. Test node is 32 bit: Linux 2.6.18-194.32.1.el5.centos.plus #1 SMP i686 i686 i386 GNU/Linux java version 1.6.0_17 OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386) OpenJDK Server VM (build 14.0-b16, mixed mode) Even if I can get it to reuse JVM it will be grrreat. -Adi -Joey On Thu, May 12, 2011 at 1:39 PM, Adi adi.pan...@gmail.com wrote: For one long running job we are noticing that the mapper jvms do not exit even after the mapper is done. Any suggestions on why this could be happening. The java processes get cleaned up if I do a hadoop job -kill job_id. The java processes get cleaned up of I run in it in a smaller batch and the job gets done fairly quickly(say half an hour). For larger inputs the nodes eventually run out of memory because of these java processes that the cluster thinks are gone but they haven't been cleaned up yet. I am suspecting the TaskTrackers are failing to kill JVMs for some reason by themselves. The following exceptions can be seen in the hadoop logs. 
2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process
2011-05-12 13:52:08,071 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -11061: No such process
2011-05-12 13:52:09,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -11151: No such process
2011-05-12 13:52:12,009 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -25057: No such process
2011-05-12 13:52:13,306 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -19805: No such process
2011-05-12 13:52:14,996 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -11103: No such process
2011-05-12 15:51:41,105 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -17202: No such process
2011-05-12 15:51:43,481 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -15981: No such process
2011-05-12 15:51:45,916 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -17931: No such process
2011-05-12 15:52:06,328 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -14867: No such process
2011-05-12 15:52:34,503 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -29376: No such process
2011-05-12 15:52:38,607 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -32491: No such process
2011-05-12 15:52:39,292 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -31529: No such process
2011-05-12 15:52:46,547 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -15140: No such process

Some other exceptions also seen in the logs; they may or may not be related to the above problem:

2011-05-12 16:01:20,534 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:48,869 INFO org.apache.hadoop.ipc.Server: IPC Server handler 80 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:53,922 INFO org.apache.hadoop.ipc.Server: IPC Server handler 59 on 33465 caught: java.nio.channels.ClosedChannelException
2011-05-12 16:01:58,977 INFO
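Editor's note on the JVM-reuse setting discussed above: in the 0.20/0.21 era it is controlled by the mapred.job.reuse.jvm.num.tasks property (renamed mapreduce.job.jvm.numtasks in 0.21, with the old name deprecated but still honored). A sketch of the mapred-site.xml fragment, assuming your tasks are otherwise safe to reuse a JVM:

```xml
<!-- mapred-site.xml: let each task JVM run an unlimited number of
     tasks for the same job (1 = no reuse, the default; -1 = no limit) -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```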
Datanode doesn't start but there is no exception in the log
Hello,
I am trying to set up Hadoop HDFS in a cluster for the first time. So far I was using pseudo-distributed mode on my PC at home and everything was working perfectly. The NameNode starts but the DataNode doesn't start, and the log contains the following:

2011-05-13 04:01:13,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = clone1/147.102.4.129
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2-cdh3u0
STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
2011-05-13 04:01:14,019 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2011-05-13 04:01:14,143 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2011-05-13 04:01:14,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
2011-05-13 04:01:14,154 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
2011-05-13 04:01:14,567 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50020
2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
2011-05-13 04:01:14,981 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
2011-05-13 04:01:14,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(clone1:50010, storageID=, infoPort=50075, ipcPort=50020)

Does anyone know what might be wrong?
Thank you in advance!
Panagiotis
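Editor's note on this symptom: a DataNode log that stops at dnRegistration with an empty storageID usually means the DataNode is stuck registering with (or connecting to) the NameNode. A quick sanity check to run from the DataNode host; the hostname and port below are assumptions, substitute the values from your fs.default.name:

```shell
#!/bin/sh
# Does the NameNode hostname resolve from this DataNode host?
getent hosts namenode-host || echo "namenode-host does not resolve"

# Is the NameNode RPC port reachable? (uses bash's /dev/tcp device)
if (exec 3<>/dev/tcp/namenode-host/9000) 2>/dev/null; then
  echo "NameNode RPC port reachable"
else
  echo "cannot reach namenode-host on port 9000"
fi
```

If either check fails, the fix is in DNS//etc/hosts or a firewall, not in Hadoop configuration.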
Re: Datanode doesn't start but there is no exception in the log
Is that all the messages in the datanode log? Do you see any SHUTDOWN message also?

-Bharath

From: Panayotis Antonopoulos antonopoulos...@hotmail.com
To: common-user@hadoop.apache.org
Sent: Thursday, May 12, 2011 6:07 PM
Subject: Datanode doesn't start but there is no exception in the log

Hello, I am trying to set up Hadoop HDFS in a cluster for the first time. So far I was using pseudo-distributed mode on my PC at home and everything was working perfectly. The NameNode starts but the DataNode doesn't start and the log contains the following:
[... datanode startup log snipped; see the original message above ...]
Does anyone know what might be wrong?? Thank you in advance! Panagiotis
Re: is it possible to concatenate output files under many reducers?
Yes, that is the general way to control the number of output files. However, what if you need to control the number of output files dynamically? For example: if an output file is named 'A', it needs to go into 5 files; if it is named 'B', into 10. Is that possible under hadoop?

Junyoung Kim (juneng...@gmail.com)

On 05/12/2011 02:17 PM, Harsh J wrote:
Short, blind answer: You could run 10 reducers. Otherwise, you'll have to run another job that picks up a few files each in a mapper and merges them out. But having 60 files shouldn't really be a problem if they are sufficiently large (at least 80% of a block size perhaps -- you can tune the # of reducers to achieve this).

On Thu, May 12, 2011 at 6:14 AM, Jun Young Kim juneng...@gmail.com wrote:
hi, all. I have 60 reducers which are generating output files, from output-r-00001 to output-r-00059. Under this situation, I want to control the count of output files. For example, is it possible to concatenate all output files down to 10, from output-r-00001 to output-r-00010? thanks

--
Junyoung Kim (juneng...@gmail.com)
Re: mapper java process not exiting
Hadoop 0.21.0 with some patches.

Hadoop 0.21.0 doesn't get much use, so I'm not sure how much help I can be.

2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process

Your logs showed that Hadoop tried to kill processes but the kill command claimed they didn't exist. The next time you see this problem, can you check the logs and see if any of the PIDs that appear in the logs are in fact still running? A more likely scenario is that Hadoop's tracking of child VMs is getting out of sync, but I'm not sure what would cause that.

-Joey

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
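To make Joey's check concrete: kill -0 sends no signal, it only tests whether a process with that PID exists, so you can take the PIDs straight from those ProcessTree warnings and see whether the task JVMs are really gone. The PIDs below are just the examples from the quoted log:

```shell
#!/bin/sh
# For each PID from the TaskTracker log, report whether a process
# with that PID still exists (kill -0 probes existence only).
for pid in 12545 11061 11151; do
  if kill -0 "$pid" 2>/dev/null; then
    echo "$pid is still alive:"
    ps -o pid,ppid,etime,args -p "$pid"
  else
    echo "$pid is gone"
  fi
done
```

If a PID shows as alive here but Hadoop already logged "No such process" for it, that supports the tracking-out-of-sync theory rather than a plain kill failure.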
Re: is it possible to concatenate output files under many reducers?
You can control the number of reducers by calling job.setNumReduceTasks() before you launch it.

-Joey

On Thu, May 12, 2011 at 6:33 PM, Jun Young Kim juneng...@gmail.com wrote:
yes. that is a general solution to control counts of output files. however, if you need to control counts of outputs dynamically, how could you do? if an output file name is 'A', counts of this output files are needed to be 5. if an output file name is 'B', counts of this output files are needed to be 10. is it able to be under hadoop?
[...]

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
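For the dynamic case, one after-the-fact option is to concatenate the existing part files yourself. Below is a local sketch of the grouping logic only; the dummy files stand in for reducer outputs, and on a real cluster you would hadoop fs -cat each group and hadoop fs -put the result back:

```shell
#!/bin/sh
# Demo setup: 60 dummy part files, as 60 reducers would produce.
cd "$(mktemp -d)" || exit 1
i=0
while [ "$i" -le 59 ]; do
  printf 'record from reducer %d\n' "$i" > "$(printf 'part-r-%05d' "$i")"
  i=$((i + 1))
done

# Merge 60 part files down to 10 by reducer number modulo 10:
# part-r-00000, part-r-00010, ... all land in merged-00000, and so on.
i=0
while [ "$i" -le 59 ]; do
  cat "$(printf 'part-r-%05d' "$i")" >> "$(printf 'merged-%05d' $((i % 10)))"
  i=$((i + 1))
done

ls merged-*
```

The modulo key is arbitrary; if the grouping must depend on the output name ('A' into 5 files, 'B' into 10), the same loop works with a per-name divisor instead of the constant 10.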
Re: mapper java process not exiting
2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process

Your logs showed that Hadoop tried to kill processes but the kill command claimed they didn't exist. The next time you see this problem, can you check the logs and see if any of the PIDs that appear in the logs are in fact still running? A more likely scenario is that Hadoop's tracking of child VMs is getting out of sync, but I'm not sure what would cause that.

Yes those java processes are in fact running. And those error messages do not always show up. Just sometimes. But the processes never get cleaned up.

-Adi
Re: mapper java process not exiting
Is there a reason for using OpenJDK and not Sun's JDK? Also, I believe there were noted issues with the .17 JDK; I will look for a link and post it if I can find it. Otherwise, I have seen this behaviour before: Hadoop detaches from the JVM and stops seeing it. I think your problem lies in the JDK and not in Hadoop.

On May 12, 2011 at 8:12 PM, Adi adi.pan...@gmail.com wrote:
[...]
Yes those java processes are in fact running. And those error messages do not always show up. Just sometimes. But the processes never get cleaned up.
-Adi
Can Mapper get paths of inputSplits ?
Hi,
I'm using FileInputFormat, which splits files logically according to their sizes. Can the mapper get a pointer to these splits, and know which split it is assigned? I tried looking at the Reporter class to see how it prints the logical splits on the UI for each mapper, but it's an interface. E.g.:

Mapper1: is assigned the logical split hdfs://localhost:9000/user/Hadoop/input:23+24
Mapper2: is assigned the logical split hdfs://localhost:9000/user/Hadoop/input:0+23

Then inside map, I want to ask what the logical splits are, get the two strings above, and know which one my current mapper is assigned.

Thanks,
Mark
I can't see my messages immediately, and sometimes they don't even arrive. Why?
Re: Can Mapper get paths of inputSplits ?
On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com wrote: Hi I'm using FileInputFormat which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits? and know which split it is assigned ? Look at http://hadoop.apache.org/common/docs/r0.20.203.0/mapred_tutorial.html#Task+JVM+Reuse In particular, map.input.file and map.input.offset are the configuration parameters that you want. -- Owen
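A side note for streaming users on the parameters Owen mentions: streaming tasks receive the configured job parameters as environment variables, with dots turned into underscores, so map.input.file becomes $map_input_file. A sketch of a mapper using that (the final pipeline simulates locally what the framework would set; in Java you would read the same keys from the JobConf in configure()):

```shell
#!/bin/sh
# Streaming mapper sketch: tag every input record with the split
# (file + offset) this mapper was assigned.
mapper() {
  split_file=${map_input_file:-unknown}
  split_offset=${map_input_offset:-0}
  while IFS= read -r line; do
    printf '%s:%s\t%s\n' "$split_file" "$split_offset" "$line"
  done
}

# Local simulation of the environment Hadoop streaming would provide:
printf 'some record\n' |
  map_input_file=hdfs://localhost:9000/user/Hadoop/input \
  map_input_offset=23 \
  mapper
```

This only tells each mapper about its own split, which matches the limitation Owen describes below in the thread: there is no supported way to enumerate the other mappers' splits.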
Re: Can Mapper get paths of inputSplits ?
Thanks for the reply Owen. I only knew about map.input.file. So there is no way I can see the other possible splits (start+length), like some function that returns the map.input.file and map.input.offset strings of the other mappers?

Thanks,
Mark

On Thu, May 12, 2011 at 9:08 PM, Owen O'Malley omal...@apache.org wrote: On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com wrote: Hi I'm using FileInputFormat which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits? and know which split it is assigned ? Look at http://hadoop.apache.org/common/docs/r0.20.203.0/mapred_tutorial.html#Task+JVM+Reuse In particular, map.input.file and map.input.offset are the configuration parameters that you want. -- Owen
Re: how to get user-specified Job name from hadoop for running jobs?
By user-specified, do you mean the name you set via JobConf.setJobName("myTask")? Then using the same object you can get the name back:

JobConf conf;
conf.getJobName();

~Cheers
Mark

On Tue, May 10, 2011 at 10:16 AM, Mark Zand mz...@basistech.com wrote:
While I can get JobStatus with this:

JobClient client = new JobClient(new JobConf(conf));
JobStatus[] jobStatuses = client.getAllJobs();

I don't see any way to get the user-specified Job name. Please help. Thanks.
Re: Can Mapper get paths of inputSplits ?
On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com wrote: So there is no way I can see the other possible splits (start+length)? like some function that returns strings of map.input.file and map.input.offset of the other mappers ? No, there isn't any way to do it using the public API. The only way would be to look under the covers and read the split file (job.split). -- Owen
Re: Can Mapper get paths of inputSplits ?
Thanks again Owen, hopefully a last question: which class is filling in map.input.file and map.input.offset, so I can extend it to add a function that returns these strings?

Thanks,
Mark

On Thu, May 12, 2011 at 10:07 PM, Owen O'Malley omal...@apache.org wrote: On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com wrote: So there is no way I can see the other possible splits (start+length)? like some function that returns strings of map.input.file and map.input.offset of the other mappers ? No, there isn't any way to do it using the public API. The only way would be to look under the covers and read the split file (job.split). -- Owen
Re: Call to namenode fails with java.io.EOFException
One of the reasons I can think of could be a version mismatch. You may want to ensure that the job in question was not carrying a separate version of Hadoop along with it inside, perhaps?

On Fri, May 13, 2011 at 12:42 AM, Sidney Simmons ssimm...@nmitconsulting.co.uk wrote:
Hi there,
I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster. Randomly (periodically), we're getting Call to namenode failures on tasktrackers causing tasks to fail:

2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201105090819_059_m_0038_0 Child Error
java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
  at org.apache.hadoop.ipc.Client.call(Client.java:743)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
  at $Proxy5.getFileInfo(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
  at $Proxy5.getFileInfo(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(Unknown Source)
  at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

The namenode log (logging level = INFO) shows the following a few seconds either side of the above timestamps. Could be relevant or it could be a coincidence:

2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
  at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
  at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
  at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
  at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
  at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
  at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)

The jobtracker does however have an entry that correlates with the tasktracker:

2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
  at org.apache.hadoop.ipc.Client.call(Client.java:743)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
  at $Proxy1.getProtocolVersion(Unknown Source)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
  at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:208)
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:169)
  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
  at org.apache.hadoop.mapred.Child.main(Child.java:157)
Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(Unknown Source)
  at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Can anyone give me any pointers on how to start troubleshooting this issue? It's very sporadic and we haven't been able to reproduce the issue yet in our lab. After looking through the mailing list archives, some of the suggestions revolve around the following settings:

dfs.namenode.handler.count 128 (existing 64)
dfs.datanode.handler.count 10 (existing 3)
dfs.datanode.max.xcievers 4096 (existing 256)

Any pointers?

Thanks in advance,
Sid Simmons
Infrastructure Support Specialist

--
Harsh J
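On Harsh's version-mismatch theory: a cheap way to rule it out is to fingerprint the Hadoop core jar everywhere a task can run and compare checksums, since identical builds give identical sums. The snippet below demonstrates the comparison on two local stand-in files; in practice you would point md5sum at $HADOOP_HOME/hadoop-*-core.jar on each node (e.g. over ssh) and also at any hadoop jars bundled inside the job jar:

```shell
#!/bin/sh
# Demo: two stand-in "jars" with identical contents, as two nodes
# running the same build would have.
dir=$(mktemp -d) || exit 1
printf 'hadoop 0.20.2 build bits' > "$dir/node1-core.jar"
printf 'hadoop 0.20.2 build bits' > "$dir/node2-core.jar"

# One distinct checksum across all jars => every node runs the same build.
distinct=$(md5sum "$dir"/*.jar | awk '{print $1}' | sort -u | wc -l)
echo "distinct builds found: $distinct"
```

Any count above 1 means at least one node (or the job jar itself) is carrying a different Hadoop build, which would fit the sporadic EOFExceptions.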
Re: Datanode doesn't start but there is no exception in the log
Have you defined the IP of the DN in the slaves file?

Sent from my iPhone

On May 12, 2011, at 7:27 PM, Bharath Mundlapudi bharathw...@yahoo.com wrote:
Is that all the messages in the datanode log? Do you see any SHUTDOWN message also?
-Bharath

From: Panayotis Antonopoulos antonopoulos...@hotmail.com
To: common-user@hadoop.apache.org
Sent: Thursday, May 12, 2011 6:07 PM
Subject: Datanode doesn't start but there is no exception in the log
[... quoted message and datanode startup log snipped; see the original message above ...]