Re: Changing the maximum tasks per node on a per job basis
Your problem seems to surround available memory and over-subscription. If you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to use the CapacityScheduler to address this for you. I once detailed the how-to on a similar question here: http://search-hadoop.com/m/gnFs91yIg1e On Wed, May 22, 2013 at 2:55 PM, Steve Lewis lordjoe2...@gmail.com wrote: I have a series of Hadoop jobs to run - one of my jobs requires larger than standard memory, so I allow the task to use 2GB of memory. When I run some of these jobs the slave nodes are crashing because they run out of swap space. It is not that a slave could not run one, or even 4, of these jobs, but 8 stresses the limits. I could cut mapred.tasktracker.reduce.tasks.maximum for the entire cluster but this cripples the whole cluster for the sake of one of many jobs. It seems to be a very bad design a) to allow the job tracker to keep assigning tasks to a slave that is already getting low on memory b) to allow the user to run jobs capable of crashing nodes on the cluster c) not to allow the user to specify that some jobs need to be limited to a lower value without requiring this limit for every job. Are there plans to fix this?? -- -- Harsh J
dncp_block_verification log
Hi All, On some systems, I noticed that when the block scanner runs, the dncp_block_verification.log.curr file under the block pool gets quite large. Please let me know: i) why is it growing on only some machines? ii) What's the solution? The following link also describes the problem: http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201303.mbox/%3ccajzooycpad5w6cqdteliufy-h9r0pind9f0xelvt2bftwmm...@mail.gmail.com%3E Thanks Brahma Reddy
pauses during startup (maybe network related?)
Hi I'm running hadoop on my local laptop for development and everything works but there's some annoying pauses during the startup which causes the entire hadoop startup process to take up to 4 minutes and I'm wondering what it is and if I can do anything about it. I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2, oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of ram and an SSD so it shouldn't be slow. When the system pauses, there is no cpu usage, no disk usage and no network usage (although I suspect it's waiting for the network to resolve or return something). Here's some snippets from the namenode logs during startup where you can see it just pauses for around 30 seconds or more with out errors or anything : ... 2013-05-23 19:26:37,660 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2013-05-23 19:26:37,676 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2013-05-23 19:27:54,144 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2013-05-23 19:27:54,144 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started ... 2013-05-23 19:27:54,341 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync. 2013-05-23 19:27:54,341 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 2013-05-23 19:28:19,918 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times ... 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server handler 31 on 9000: starting 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010 I already start the system with : export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop system, maybe just 1 or 2 test files less than 100k, and there's no CPU usage so it doesn't look like it's GC thrashing) I should mention again, there's no errors and the system runs fine and relatively speedy once started (considering it's on my laptop). Does anyone know what's causing these pauses? (and how I can get rid of them) Thanks. -- Ted.
Hadoop Rack awareness on virtual system
Hi, Can we create and test Hadoop rack awareness functionality in a VirtualBox system (e.g., on a laptop)? Thanks~
Re: dncp_block_verification log
Hi, What is your HDFS version? I vaguely remember this to be a problem in the 2.0.0 version or so where there was also a block scanner excessive work bug, but I'm not sure what fixed it. I've not seen it appear in the later releases. On Thu, May 23, 2013 at 12:08 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote: Hi All, On some systems, I noticed that when the scanner runs, the dncp_block_verification.log.curr file under the block pool gets quite large .. Please let me know.. i) why it is growing in only some machines..? ii) Wht's solution..? Following links also will describes the problem http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201303.mbox/%3ccajzooycpad5w6cqdteliufy-h9r0pind9f0xelvt2bftwmm...@mail.gmail.com%3E Thanks Brahma Reddy -- Harsh J
RE: dncp_block_verification log
HI Harsh Thanks for reply... I am using hadoop-2.0.1 From: Harsh J [ha...@cloudera.com] Sent: Thursday, May 23, 2013 8:24 PM To: user@hadoop.apache.org Subject: Re: dncp_block_verification log Hi, What is your HDFS version? I vaguely remember this to be a problem in the 2.0.0 version or so where there was also a block scanner excessive work bug, but I'm not sure what fixed it. I've not seen it appear in the later releases. On Thu, May 23, 2013 at 12:08 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.commailto:brahmareddy.batt...@huawei.com wrote: Hi All, On some systems, I noticed that when the scanner runs, the dncp_block_verification.log.curr file under the block pool gets quite large .. Please let me know.. i) why it is growing in only some machines..? ii) Wht's solution..? Following links also will describes the problem http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201303.mbox/%3ccajzooycpad5w6cqdteliufy-h9r0pind9f0xelvt2bftwmm...@mail.gmail.com%3E Thanks Brahma Reddy -- Harsh J
Hadoop Installation Mappers setting
Hi, While installing a Hadoop cluster, how can we calculate the right value for the number of mappers? Thanks~
Out of memory error by Node Manager, and shut down
Hi, I have got the following error in the node manager's log, and it got shut down after about 1 application was run since it was started. Any clue why this occurs... or is this a bug? 2013-05-22 11:53:34,456 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now... java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11 at java.lang.Thread.startImpl(Native Method) at java.lang.Thread.start(Thread.java:887) at java.lang.ProcessInputStream.init(UNIXProcess.java:472) at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157) at java.security.AccessController.doPrivileged(AccessController.java:202) at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137) Thanks, Kishore
Re: Hadoop Installation Mappers setting
Hi I assume the question is on how many slots. It depends on - the child/task JVM size and the available memory. - the available number of cores. Your available memory for tasks is total memory minus memory used for the OS and other services running on your box. Other services include non-Hadoop services as well as Hadoop daemons. Divide the available memory by the child JVM size and that gives the max number of slots. Also check whether a sufficient number of cores is available as well. Regards Bejoy KS Sent from remote device, Please excuse typos -Original Message- From: Jitendra Yadav jeetuyadav200...@gmail.com Date: Thu, 23 May 2013 18:10:38 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: Hadoop Installation Mappers setting Hi, While installing a Hadoop cluster, how can we calculate the right value for the number of mappers? Thanks~
Re: Hadoop Rack awareness on virtual system
You definitely can. Just set a rack topology script on your VMs. Leonid On Thu, May 23, 2013 at 2:50 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, Can we create and test Hadoop rack awareness functionality in a VirtualBox system (e.g., on a laptop)? Thanks~
Hadoop Classpath issue.
Hi Guys, When I try to execute the hadoop fs -ls / command, it returns two extra lines. 226:~# hadoop fs -ls / *common ./* *lib lib* Found 9 items drwxrwxrwx - hdfs supergroup 0 2013-03-07 04:46 /benchmarks drwxr-xr-x - hbase hbase 0 2013-05-23 08:59 /hbase drwxr-xr-x - hdfs supergroup 0 2013-02-20 13:21 /mapred drwxr-xr-x - tech supergroup 0 2013-05-03 05:15 /test drwxrwxrwx - mapred supergroup 0 2013-05-23 09:33 /tmp drwxrwxr-x - hdfs supergroup 0 2013-02-20 16:32 /user drwxr-xr-x - hdfs supergroup 0 2013-02-20 15:10 /var Other machines do not return the extra two lines. Please guide me how to remove these lines. 226:~# /usr/bin/hadoop classpath common ./ lib lib /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//* Please guide me how to fix this. -Dhanasekaran Did I learn something today? If not, I wasted it.
Re: Hadoop Rack awareness on virtual system
Hi Leonid, Thanks for your reply. Could you please give me an example of how to make the topology.sh file? Let's say I have the below slave servers (data nodes): 192.168.45.1 dnode1 192.168.45.2 dnode2 192.168.45.3 dnode3 192.168.45.4 dnode4 192.168.45.5 dnode5 Thanks On Thu, May 23, 2013 at 8:02 PM, Leonid Fedotov lfedo...@hortonworks.com wrote: You definitely can. Just set a rack topology script on your VMs. Leonid On Thu, May 23, 2013 at 2:50 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, Can we create and test Hadoop rack awareness functionality in a VirtualBox system (e.g., on a laptop)? Thanks~
Re: Hadoop Rack awareness on virtual system
An example topology file and script is available on the Wiki at http://wiki.apache.org/hadoop/topology_rack_awareness_scripts On Thu, May 23, 2013 at 8:38 PM, Jitendra Yadav jeetuyadav200...@gmail.comwrote: Hi Leonid, Thanks for you reply. please you please give me an example how to make topology.sh file? Lets say I have below slave servers(data nodes) 192.168.45.1 dnode1 192.168.45.2 dnode2 192.168.45.3 dnode3 192.168.45.4 dnode4 192.168.45.5 dnode5 Thanks On Thu, May 23, 2013 at 8:02 PM, Leonid Fedotov lfedo...@hortonworks.comwrote: You definitely can. Just set rack script on your VMs. Leonid On Thu, May 23, 2013 at 2:50 AM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Hi, Can we create and test hadoop rack awareness functionality in virtual box system(like on laptop .etc)?. Thanks~ -- Harsh J
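For reference, a minimal sketch of such a topology script in Python, along the lines of the scripts on the wiki page above. The IP-to-rack assignments below are hypothetical and only cover the five example datanodes listed earlier in this thread; for a real setup, adjust the table and point the topology script property (topology.script.file.name on 1.x, net.topology.script.file.name on 2.x) at the script.

#!/usr/bin/env python
# Hypothetical rack mapping for the five example datanodes. Hadoop invokes the
# script with one or more IPs/hostnames as arguments and expects one rack path
# per argument on stdout.
import sys

RACKS = {
    "192.168.45.1": "/rack1", "192.168.45.2": "/rack1",
    "192.168.45.3": "/rack2", "192.168.45.4": "/rack2",
    "192.168.45.5": "/rack2",
}
DEFAULT_RACK = "/default-rack"

for node in sys.argv[1:]:
    print(RACKS.get(node, DEFAULT_RACK))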
Re: Out of memory error by Node Manager, and shut down
Looks like the problem is with jvm heap size. Its trying to create a new thread and threads require native memory for internal JVM related things. One of the possible solution is to reduce java heap size(to increase free native memory). Is there any other information about the memory status (malloc debug information etc) on NN? That would give more information about the NN's memory status. Hope this helps. *Pramod N* Bruce Wayne of web @machinelearner https://twitter.com/machinelearner On Thu, May 23, 2013 at 6:42 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have got the following error in node manager's log, and it got shut down, after about 1 application were run after it was started. Any clue why does it occur... or is this a bug? 2013-05-22 11:53:34,456 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now... java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11 at java.lang.Thread.startImpl(Native Method) at java.lang.Thread.start(Thread.java:887) at java.lang.ProcessInputStream.init(UNIXProcess.java:472) at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157) at java.security.AccessController.doPrivileged(AccessController.java:202) at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137) Thanks, Kishore
Re: R for Hadoop
Try Rhipe, it is good. http://amalgjose.wordpress.com/2013/05/05/rhipe-installation/ http://www.datadr.org/ http://amalgjose.wordpress.com/2013/05/05/r-installation-in-linux-platforms/ On Mon, May 20, 2013 at 2:23 PM, sudhakara st sudhakara...@gmail.comwrote: Hi You find good start up materiel for RHadoop here, https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md http://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/ I am also working on RHadoop, mail me if any difficulty in RHadoop. On Mon, May 20, 2013 at 12:20 AM, Marco Shaw marco.s...@gmail.com wrote: You can try to search for Rhadoop using your favourite search engine. I think you are going to have to put in a bit more effort on your own. Marco -- Regards, ...Sudhakara.st
RE: Shuffle phase replication factor
Ling, Thanks for the response! I could use more clarification on item 1. Specifically * mapred.reduce.parallel.copies limits the number of outbound connections for a reducer, but not the inbound connections for a mapper. Does tasktracker.http.threads limit the number of simultaneous inbound connections for a mapper, or only the size of the thread pool servicing the connections? (i.e. is it one thread per inbound connection?). * Who actually creates the listen port for serving up the mapper files? The mapper task? Or something more persistent in MapReduce? Thanks, John From: erlv5...@gmail.com [mailto:erlv5...@gmail.com] On Behalf Of Kun Ling Sent: Wednesday, May 22, 2013 7:50 PM To: user Subject: Re: Shuffle phase replication factor Hi John, 1. for the number of simultaneous connection limitations. You can configure this using the mapred.reduce.parallel.copies flag. the default is 5. 2. For the aggressively disconnect implication, I am afraid it is only a little. Normally, each reducer will connect to each mapper task, and asking for the partions of the map output file. Because there are about 5 simultaneous connections to fetch the map output for each reducer. For a large MR cluster with 1000 node, and a Huge MR job with 1000 Mapper, and 1000 reducer, for each node, there are only about 5 connections. So the imply is only a little. 3. What happens to the pending/ failing coonection, the short answer is: just try to reconnect.There is a List, which maintain all the output of the Mapper that need to copied, and the element will be removed iff the map output is successfully copied. A forever loop will keep on look into the List, and fetch the corrsponding map output. All the above answer is based on the Hadoop 1.0.4 source code, especially the ReduceTask.java file. yours, Ling Kun On Wed, May 22, 2013 at 10:57 PM, John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote: U, is that also the limit for the number of simultaneous connections? In general, one does not need a 1:1 map between threads and connections. If this is the connection limit, does it imply that the client or server side aggressively disconnects after a transfer? What happens to the pending/failing connection attempts that exceed the limit? Thanks! john From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.commailto:rahul.rec@gmail.com] Sent: Wednesday, May 22, 2013 8:52 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Shuffle phase replication factor There are properties/configuration to control the no. of copying threads for copy. tasktracker.http.threads=40 Thanks, Rahul On Wed, May 22, 2013 at 8:16 PM, John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote: This brings up another nagging question I've had for some time. Between HDFS and shuffle, there seems to be the potential for every node connecting to every other node via TCP. Are there explicit mechanisms in place to manage or limit simultaneous connections? Is the protocol simply robust enough to allow a server-side to disconnect at any time to free up slots and the client-side will retry the request? Thanks john From: Shahab Yunus [mailto:shahab.yu...@gmail.commailto:shahab.yu...@gmail.com] Sent: Wednesday, May 22, 2013 8:38 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Shuffle phase replication factor As mentioned by Bertrand, Hadoop, The Definitive Guide, is well... really definitive :) place to start. 
It is pretty thorough for starts and once you are gone through it, the code will start making more sense too. Regards, Shahab On Wed, May 22, 2013 at 10:33 AM, John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote: Oh I see. Does this mean there is another service and TCP listen port for this purpose? Thanks for your indulgence... I would really like to read more about this without bothering the group but not sure where to start to learn these internals other than the code. john From: Kai Voigt [mailto:k...@123.orgmailto:k...@123.org] Sent: Tuesday, May 21, 2013 12:59 PM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Shuffle phase replication factor The map output doesn't get written to HDFS. The map task writes its output to its local disk, the reduce tasks will pull the data through HTTP for further processing. Am 21.05.2013 um 19:57 schrieb John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net: When MapReduce enters shuffle to partition the tuples, I am assuming that it writes intermediate data to HDFS. What replication factor is used for those temporary files? john -- Kai Voigt k...@123.orgmailto:k...@123.org -- http://www.lingcc.com
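To make Kun Ling's copy-loop description above more concrete, here is a rough Python sketch of the idea: a list of pending map outputs, a bounded batch of fetches (analogous to mapred.reduce.parallel.copies), and failed fetches simply going back on the list to be retried. This is only an illustration of the described behavior, not the actual ReduceTask.java logic, and the batch is fetched sequentially here rather than with real threads.

import random, time

PARALLEL_COPIES = 5  # analogous to mapred.reduce.parallel.copies

def fetch(map_output):
    """Pretend to fetch one map output partition over HTTP; may fail."""
    return random.random() > 0.1  # toy model: 90% success

def shuffle(pending):
    # Loop until every map output has been copied successfully.
    while pending:
        batch, pending = pending[:PARALLEL_COPIES], pending[PARALLEL_COPIES:]
        for out in batch:
            if not fetch(out):
                pending.append(out)   # failed copy goes back on the list
        time.sleep(0.01)              # brief pause between rounds

shuffle(["map_%d_part_0" % i for i in range(20)])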
Re: Shuffle phase replication factor
In MR1, the tasktracker serves the mapper files (so that tasks don't have to stick around taking up resources). In MR2, the shuffle service, which lives inside the nodemanager, serves them. -Sandy On Thu, May 23, 2013 at 10:22 AM, John Lilley john.lil...@redpoint.net wrote: Ling, Thanks for the response! I could use more clarification on item 1. Specifically: mapred.reduce.parallel.copies limits the number of outbound connections for a reducer, but not the inbound connections for a mapper. Does tasktracker.http.threads limit the number of simultaneous inbound connections for a mapper, or only the size of the thread pool servicing the connections? (i.e. is it one thread per inbound connection?). Who actually creates the listen port for serving up the mapper files? The mapper task? Or something more persistent in MapReduce? Thanks, John
Re: Is there a way to limit # of hadoop tasks per user at runtime?
You can use capacity scheduler also. In that you can create some queues, each of specific capacity. Then you can submit jobs to that specific queue at runtime or you can configure it as direct submission. On Wed, May 22, 2013 at 3:27 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Mehmet, Are you using MR1 or MR2? The fair scheduler, present in both versions, but configured slightly differently, allows you to limit the number of map and reduce tasks in a queue. The configuration can be updated at runtime by modifying the scheduler's allocations file. It also has a feature that automatically maps jobs to queues based on the user submitted them. Here are links to documentation in MR1 and MR2: http://hadoop.apache.org/docs/stable/fair_scheduler.html http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html -Sandy On Tue, May 21, 2013 at 2:43 PM, Mehmet Belgin mehmet.bel...@oit.gatech.edu wrote: Hi Everyone, I was wondering if there is a way for limiting the number of tasks (map+reduce) *per user* at runtime? Using an environment variable perhaps? I am asking this from a resource provisioning perspective. I am trying to come up with a N-token licensing system for multiple users to use our limited hadoop resources simultaneously. That is, when user A checks out 6 tokens, he/she can only run 6 hadoop tasks. If there is no such thing in hadoop, has anyone tried to integrate hadoop with torque/moab (or any other RM or scheduler)? Any advice in that direction will be appreciated :) Thanks in advance, -Mehmet
Re: Hadoop Installation Mappers setting
I am explaining it more. Suppose your machine has 8 GB of memory. After reserving memory for the operating system and all other processes except the tasktracker, assume you have 4 GB remaining; the remaining process running is the tasktracker. If the child JVM size is 200 MB, then you can define a maximum of 4*1024 MB / 200 MB slots, which is approximately 20. You can divide the slots into mapper and reducer slots as per your requirement. This is just an example that I explained based on my knowledge. On Thu, May 23, 2013 at 7:48 PM, bejoy.had...@gmail.com wrote: Hi I assume the question is on how many slots. It depends on - the child/task JVM size and the available memory. - the available number of cores. Your available memory for tasks is total memory minus memory used for the OS and other services running on your box. Other services include non-Hadoop services as well as Hadoop daemons. Divide the available memory by the child JVM size and that gives the max number of slots. Also check whether a sufficient number of cores is available as well. Regards Bejoy KS Sent from remote device, Please excuse typos -- From: Jitendra Yadav jeetuyadav200...@gmail.com Date: Thu, 23 May 2013 18:10:38 +0530 To: user@hadoop.apache.org ReplyTo: user@hadoop.apache.org Subject: Hadoop Installation Mappers setting Hi, While installing a Hadoop cluster, how can we calculate the right value for the number of mappers? Thanks~
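Written out as a quick back-of-the-envelope calculation, using the example numbers above (which are of course assumptions about one specific box):

total_memory_mb    = 8 * 1024   # example machine from the thread
reserved_for_os_mb = 4 * 1024   # OS, daemons, other services (assumed)
child_jvm_mb       = 200        # per-task child JVM size (assumed)

available_mb = total_memory_mb - reserved_for_os_mb
max_slots    = available_mb // child_jvm_mb
print(max_slots)                # ~20 slots to split between map and reduce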
HDFS data and non-aligned splits
What happens when MR produces data splits, and those splits don't align on block boundaries? I've read that MR will attempt to make data splits near block boundaries to improve data locality, but isn't there always some slop where records straddle the block boundaries, resulting in an extra HDFS connection just to get the half-record in the other block? Does this impact performance? Are there file formats that attempt to enforce data alignment?
SequenceFile sync marker uniqueness
How does SequenceFile guarantee that the sync marker does not appear in the data? John
Re: pauses during startup (maybe network related?)
Hi Ted, 2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times ... 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting There are a couple of relevant activities that happen during namenode startup in between these 2 log statements. It loads the current fsimage (persistent copy of file system metadata), merges in the edits log (transaction log containing all file system metadata changes since the last checkpoint), and then saves back a new fsimage file after that merge. Current versions of the Hadoop codebase will print some information to logs about the volume of activity during this checkpointing process, so I recommend looking for that in your logs to see if this explains it. Depending on whether or not your have a large number of transactions queued since your last checkpoint, this whole process can cause namenode startup to take several minutes. If this becomes a regular problem, then you can run SecondaryNameNode or BackupNode to perform periodic checkpoints in addition to the checkpoint that occurs on namenode restart. This is probably overkill for a dev environment on your laptop though. Hope this helps, Chris Nauroth Hortonworks http://hortonworks.com/ On Thu, May 23, 2013 at 2:49 AM, Ted r6squee...@gmail.com wrote: Hi I'm running hadoop on my local laptop for development and everything works but there's some annoying pauses during the startup which causes the entire hadoop startup process to take up to 4 minutes and I'm wondering what it is and if I can do anything about it. I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2, oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of ram and an SSD so it shouldn't be slow. When the system pauses, there is no cpu usage, no disk usage and no network usage (although I suspect it's waiting for the network to resolve or return something). Here's some snippets from the namenode logs during startup where you can see it just pauses for around 30 seconds or more with out errors or anything : ... 2013-05-23 19:26:37,660 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2013-05-23 19:26:37,676 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2013-05-23 19:27:54,144 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2013-05-23 19:27:54,144 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started ... 2013-05-23 19:27:54,341 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync. 2013-05-23 19:27:54,341 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 2013-05-23 19:28:19,918 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times ... 
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server handler 31 on 9000: starting 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010 I already start the system with : export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop system, maybe just 1 or 2 test files less than 100k, and there's no CPU usage so it doesn't look like it's GC thrashing) I should mention again, there's no errors and the system runs fine and relatively speedy once started (considering it's on my laptop). Does anyone know what's causing these pauses? (and how I can get rid of them) Thanks. -- Ted.
Re: Is there a way to limit # of hadoop tasks per user at runtime?
The only pain point I'd find with CS in a multi-user environment is its limitation of using queue configs. Its non-trivial to configure a queue per user as CS doesn't provide any user level settings (it wasn't designed for that initially), while in FS you get user level limiting settings for free, while also being able to specify pools (for users, or generally for a property, such as queues). On Thu, May 23, 2013 at 10:55 PM, Amal G Jose amalg...@gmail.com wrote: You can use capacity scheduler also. In that you can create some queues, each of specific capacity. Then you can submit jobs to that specific queue at runtime or you can configure it as direct submission. On Wed, May 22, 2013 at 3:27 AM, Sandy Ryza sandy.r...@cloudera.comwrote: Hi Mehmet, Are you using MR1 or MR2? The fair scheduler, present in both versions, but configured slightly differently, allows you to limit the number of map and reduce tasks in a queue. The configuration can be updated at runtime by modifying the scheduler's allocations file. It also has a feature that automatically maps jobs to queues based on the user submitted them. Here are links to documentation in MR1 and MR2: http://hadoop.apache.org/docs/stable/fair_scheduler.html http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html -Sandy On Tue, May 21, 2013 at 2:43 PM, Mehmet Belgin mehmet.bel...@oit.gatech.edu wrote: Hi Everyone, I was wondering if there is a way for limiting the number of tasks (map+reduce) *per user* at runtime? Using an environment variable perhaps? I am asking this from a resource provisioning perspective. I am trying to come up with a N-token licensing system for multiple users to use our limited hadoop resources simultaneously. That is, when user A checks out 6 tokens, he/she can only run 6 hadoop tasks. If there is no such thing in hadoop, has anyone tried to integrate hadoop with torque/moab (or any other RM or scheduler)? Any advice in that direction will be appreciated :) Thanks in advance, -Mehmet -- Harsh J
Re: Hadoop Installation Mappers setting
Hi, Thanks for your clarification. I have one more question. How does cores factor influence slots calculation? Thanks~ On 5/23/13, Amal G Jose amalg...@gmail.com wrote: I am explaining it more. If your machine have 8 GB of memory. After reserving to Operating system and all other processes except tasktracker, you have 4 GB remaining(assume). The remaining process running is tasktracker. If the child jvm size is 200 MB, Then you can define a maximum slots of 4*1024 MB/ 200 MB Which is approximately 20. You can divide the slots into mapper and reducer slots as per your requirement. This is just an example that I explained based on my knowledge. On Thu, May 23, 2013 at 7:48 PM, bejoy.had...@gmail.com wrote: ** Hi I assume the question is on how many slots. It dependents on - the child/task jvm size and the available memory. - available number of cores Your available memory for tasks is total memory - memory used for OS and other services running on your box. Other services include non hadoop services as well as hadoop daemons. Divide the available memory with child jvm size and that would get the max num of slots. Also check whether sufficient number of cores are available as well. Regards Bejoy KS Sent from remote device, Please excuse typos -- *From: * Jitendra Yadav jeetuyadav200...@gmail.com *Date: *Thu, 23 May 2013 18:10:38 +0530 *To: *user@hadoop.apache.org *ReplyTo: * user@hadoop.apache.org *Subject: *Hadoop Installation Mappers setting Hi, While installing hadoop cluster, how we can calculate the exact number of mappers value. Thanks~
Re: HDFS data and non-aligned splits
What happens when MR produces data splits, and those splits don’t align on block boundaries? Answer depends on the file format used here. With any of the formats we ship, nothing happens. but isn’t there always some slop where records straddle the block boundaries, resulting in an extra HDFS connection just to get the half-record in the other block? Yes, but how large is half (or in worst case, the whole) record going to be in size? Does this impact performance? Its more of an extra, minor DN connection. The perf impact is almost zero but the format-free loading is a major win in operations. Comparing to Disco's DDFS for one alternative example, HDFS is much easier here. With Disco you have to manage your chunking during load time, while with HDFS, MR libraries need logic based on http://wiki.apache.org/hadoop/HadoopMapReduce to process those records. You would at most, depending on how large the records are of course, spend reading from a few bytes to a few megabytes over the network. If you use large record sizes, its also a good thing to raise up the file's block size. Are there file formats that attempt to enforce data alignment? I don't think there are any, and there shouldn't be, cause reading them beyond split boundaries is pretty transparent to application writers. Your HDFS reader API doesn't require you to be aware of the split. On Thu, May 23, 2013 at 11:23 PM, John Lilley john.lil...@redpoint.netwrote: What happens when MR produces data splits, and those splits don’t align on block boundaries? I’ve read that MR will attempt to make data splits near block boundaries to improve data locality, but isn’t there always some slop where records straddle the block boundaries, resulting in an extra HDFS connection just to get the half-record in the other block? Does this impact performance? Are there file formats that attempt to enforce data alignment? ** ** -- Harsh J
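A small sketch may help illustrate the wiki logic Harsh links to: a text-style record reader skips everything up to the first record boundary in its split (unless the split starts at offset 0) and keeps reading past the end of its split until it finishes the record it is in, which is why there may be one extra, short read into the next block. This is a simplified stand-in in Python, not Hadoop's actual LineRecordReader.

def read_split_records(f, split_start, split_end):
    """Yield newline-delimited records for one split of file f (opened in
    binary mode). Splits are assumed contiguous: the next split starts at
    split_end and will discard its own first partial line."""
    f.seek(split_start)
    if split_start != 0:
        f.readline()            # discard partial record; the previous split owns it
    while f.tell() <= split_end:
        line = f.readline()     # may read past split_end to finish the last record
        if not line:
            break
        yield line.rstrip(b"\n")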
Re: Hadoop Installation Mappers setting
When you take a mapreduce tasks, you need CPU cycles to do the processing, not just memory. So ideally based on the processor type(hyperthreaded or not) compute the available cores. Then may be compute as, one core for each task slot. Regards Bejoy KS Sent from remote device, Please excuse typos -Original Message- From: Jitendra Yadav jeetuyadav200...@gmail.com Date: Fri, 24 May 2013 00:26:29 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: Re: Hadoop Installation Mappers setting Hi, Thanks for your clarification. I have one more question. How does cores factor influence slots calculation? Thanks~ On 5/23/13, Amal G Jose amalg...@gmail.com wrote: I am explaining it more. If your machine have 8 GB of memory. After reserving to Operating system and all other processes except tasktracker, you have 4 GB remaining(assume). The remaining process running is tasktracker. If the child jvm size is 200 MB, Then you can define a maximum slots of 4*1024 MB/ 200 MB Which is approximately 20. You can divide the slots into mapper and reducer slots as per your requirement. This is just an example that I explained based on my knowledge. On Thu, May 23, 2013 at 7:48 PM, bejoy.had...@gmail.com wrote: ** Hi I assume the question is on how many slots. It dependents on - the child/task jvm size and the available memory. - available number of cores Your available memory for tasks is total memory - memory used for OS and other services running on your box. Other services include non hadoop services as well as hadoop daemons. Divide the available memory with child jvm size and that would get the max num of slots. Also check whether sufficient number of cores are available as well. Regards Bejoy KS Sent from remote device, Please excuse typos -- *From: * Jitendra Yadav jeetuyadav200...@gmail.com *Date: *Thu, 23 May 2013 18:10:38 +0530 *To: *user@hadoop.apache.org *ReplyTo: * user@hadoop.apache.org *Subject: *Hadoop Installation Mappers setting Hi, While installing hadoop cluster, how we can calculate the exact number of mappers value. Thanks~
Re: SequenceFile sync marker uniqueness
SequenceFiles use a 16-byte MD5 digest (computed based on a UID and the writer's ~init time, so pretty random). For the rest of my answer, I'd prefer not to repeat what Martin has already said very well here: http://search-hadoop.com/m/VYVra2krg5t1 (point #2) over on the Avro lists, for the Avro DataFile format, which uses a similar technique. On Thu, May 23, 2013 at 11:34 PM, John Lilley john.lil...@redpoint.net wrote: How does SequenceFile guarantee that the sync marker does not appear in the data? John -- Harsh J
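In other words, per the referenced answer, the marker is not guaranteed never to appear in the data; it is a 16-byte value that is vanishingly unlikely to occur by accident, and a reader that lands mid-file simply scans forward until it finds it. A toy Python sketch of the idea (not the real SequenceFile on-disk layout, whose record framing also lets readers reject false matches):

import hashlib, time, uuid

# A writer-specific 16-byte marker, derived similarly in spirit to
# SequenceFile's (a UID plus roughly the writer's init time).
SYNC = hashlib.md5((uuid.uuid4().hex + str(time.time())).encode()).digest()

def find_next_sync(f):
    """Scan forward from the current position of binary file f to just past
    the next sync marker; return the offset after it, or -1 if none found."""
    window = b""
    while True:
        b = f.read(1)
        if not b:
            return -1
        window = (window + b)[-16:]
        if window == SYNC:
            return f.tell()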
HTTP file server, map output, and other files
Thanks to previous kind answers and more reading in the elephant book, I now understand that mapper tasks place partitioned results into local files that are served up to reducers via HTTP: The output file's partitions are made available to the reducers over HTTP. The maximum number of worker threads used to serve the file partitions is controlled by the tasktracker.http.threads property; this setting is per tasktracker, not per map task slot. The default of 40 may need to be increased for large clusters running large jobs. In MapReduce 2, this property is not applicable because the maximum number of threads used is set automatically based on the number of processors on the machine. (MapReduce 2 uses Netty, which by default allows up to twice as many threads as there are processors.) My question is, for a custom (non-MR) application under YARN, how would I set up my application tasks' output data to be served over HTTP? Is there an API to control this, or are there predefined local folders that will be served up? Once I am finished with the temporary data, how do I request that the files are removed? Thanks John
Re: Hive tmp logs
Clarification This property defines a file on HDFS property namehive.exec.scratchdir/name value /data01/workspace/hive scratch/dir/on/local/linux/disk/value /property From: Sanjay Subramanian sanjay.subraman...@wizecommerce.commailto:sanjay.subraman...@wizecommerce.com Date: Wednesday, May 22, 2013 12:23 PM To: u...@hive.apache.orgmailto:u...@hive.apache.org u...@hive.apache.orgmailto:u...@hive.apache.org Cc: User user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Hive tmp logs property namehive.querylog.location/name value/path/to/hivetmp/dir/on/local/linux/disk/value /property From: Anurag Tangri tangri.anu...@gmail.commailto:tangri.anu...@gmail.com Reply-To: u...@hive.apache.orgmailto:u...@hive.apache.org u...@hive.apache.orgmailto:u...@hive.apache.org Date: Wednesday, May 22, 2013 11:56 AM To: u...@hive.apache.orgmailto:u...@hive.apache.org u...@hive.apache.orgmailto:u...@hive.apache.org Cc: Hive u...@hive.apache.orgmailto:u...@hive.apache.org, User user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Hive tmp logs Hi, You can add Hive query log property in your hive site xml and point to the directory you want. Thanks, Anurag Tangri Sent from my iPhone On May 22, 2013, at 11:53 AM, Raj Hadoop hadoop...@yahoo.commailto:hadoop...@yahoo.com wrote: Hi, My hive job logs are being written to /tmp/hadoop directory. I want to change it to a different location i.e. a sub directory somehere under the 'hadoop' user home directory. How do I change it. Thanks, Ra CONFIDENTIALITY NOTICE == This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
hive.log
How do I set the property in hive-site.xml that defines the local linux directory for hive.log ? Thanks sanjay
MiniDFS Cluster log dir
Hey guys, For testing purposes I am starting up a minicluster using http://hadoop.apache.org/docs/r1.2.0/cli_minicluster.html I was wondering what is a good way to configure the log directory for it. I tried setting hadoop.log.dir or yarn.log.dir but that seems to have no effect. I am specifically trying to access job logs. While trying to access job logs from the job history server pages it complains that Logs not available for attempt_136934344_0001_r_00_0. Aggregation may not be complete, Check back later or try the nodemanager at localhost:62025 I do set yarn.nodemanager.remote-app-log-dir using the -D option while starting up the hadoop cluster but it seems like it does not make use of that at all. Any pointers to help resolve the issue? Regards, Siddhi Mehta
Re: hive.log
Ok figured it out - vi /etc/hive/conf/hive-log4j.properties - Modify this line #hive.log.dir=/tmp/${user.name} hive.log.dir=/data01/workspace/hive/log/${user.name} From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com Reply-To: u...@hive.apache.org Date: Thursday, May 23, 2013 2:56 PM To: u...@hive.apache.org Cc: User user@hadoop.apache.org Subject: hive.log How do I set the property in hive-site.xml that defines the local linux directory for hive.log ? Thanks sanjay
Child Error
Hello, I have a 20 node Hadoop cluster where each node has 8GB memory and an 8-core processor. I sometimes get the following error on a random basis: --- Exception in thread main java.io.IOException: Exception reading file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305231647_0005/jobToken at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135) at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165) at org.apache.hadoop.mapred.Child.main(Child.java:92) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463) at org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1519) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129) ... 2 more Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name at com.sun.security.auth.UnixPrincipal.init(UnixPrincipal.java:70) at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:132) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) .. --- This does not always happen but I see a pattern when the intermediate data is larger, it tends to occur more frequently. In the web log, I can see the following: java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) From what I read online, a possible cause is when there is not enough memory for all JVM's. My mapred site.xml is set up to allocate 1100MB for each child and the maximum number of map and reduce tasks are set to 3 - So 6600MB of the child JVMs + (500MB * 2) for the data node and task tracker (as I set HADOOP_HEAP to 500 MB). I feel like memory is not the cause but I couldn't avoid it so far. In case it helps, here are the relevant sections of my mapred-site.xml --- namemapred.tasktracker.map.tasks.maximum/name value3/value namemapred.tasktracker.reduce.tasks.maximum/name value3/value namemapred.child.java.opts/name value-Xmx1100M -ea -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/soner/value namemapred.reduce.parallel.copies/name value5/value nametasktracker.http.threads/name value80/value --- My jobs still complete most of the time though they occasionally fail and I'm really puzzled at this point. I'd appreciate any help or ideas. Thanks
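Writing out the memory budget described above (all figures are the poster's own settings) makes the accounting explicit:

child_heap_mb  = 1100
map_slots      = 3
reduce_slots   = 3
daemon_heap_mb = 500           # HADOOP_HEAP for datanode and tasktracker

task_total = child_heap_mb * (map_slots + reduce_slots)  # 6600 MB
daemons    = daemon_heap_mb * 2                          # 1000 MB
print(task_total + daemons)    # 7600 MB committed on an 8 GB node, not counting
                               # the OS or JVM overhead beyond -Xmx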
Re: pauses during startup (maybe network related?)
thanks, I'm almost 100% sure it's network related now. What I tested was unpluggin my network :), the entire system starts in just a few seconds. I decided to search on reverse dns in google and I see other people have complained about very slow reverse dns lookups (some related to hadoop / hbase too). I'm not sure why this is happenning yet though. I thought 127.0.0.1 or localhost would have just resolved instantly - but it appears it's some how finding my real IP instead, i.e. 192.168.1.5 seems to show up in the log entries even though all my configurations say localhost/127.0.0.1 and my /etc/hosts file has and entry for localhost/127.0.0.1 I think if I make a /etc/hosts entry for 192.168.1.5 everything will be quick, that's what I'm going to test later. The only problem is I'm on an dynamic IP... I've considered just making entries for all reasonable permutations like 192.168.1.1 through 192.168.1.20... but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost. On 5/24/13, Chris Nauroth cnaur...@hortonworks.com wrote: Hi Ted, 2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times ... 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting There are a couple of relevant activities that happen during namenode startup in between these 2 log statements. It loads the current fsimage (persistent copy of file system metadata), merges in the edits log (transaction log containing all file system metadata changes since the last checkpoint), and then saves back a new fsimage file after that merge. Current versions of the Hadoop codebase will print some information to logs about the volume of activity during this checkpointing process, so I recommend looking for that in your logs to see if this explains it. Depending on whether or not your have a large number of transactions queued since your last checkpoint, this whole process can cause namenode startup to take several minutes. If this becomes a regular problem, then you can run SecondaryNameNode or BackupNode to perform periodic checkpoints in addition to the checkpoint that occurs on namenode restart. This is probably overkill for a dev environment on your laptop though. Hope this helps, Chris Nauroth Hortonworks http://hortonworks.com/ On Thu, May 23, 2013 at 2:49 AM, Ted r6squee...@gmail.com wrote: Hi I'm running hadoop on my local laptop for development and everything works but there's some annoying pauses during the startup which causes the entire hadoop startup process to take up to 4 minutes and I'm wondering what it is and if I can do anything about it. I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2, oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of ram and an SSD so it shouldn't be slow. When the system pauses, there is no cpu usage, no disk usage and no network usage (although I suspect it's waiting for the network to resolve or return something). Here's some snippets from the namenode logs during startup where you can see it just pauses for around 30 seconds or more with out errors or anything : ... 2013-05-23 19:26:37,660 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2013-05-23 19:26:37,676 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 
2013-05-23 19:27:54,144 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2013-05-23 19:27:54,144 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started ... 2013-05-23 19:27:54,341 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync. 2013-05-23 19:27:54,341 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 2013-05-23 19:28:19,918 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times ... 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server handler 31 on 9000: starting 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010 I already
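If you want to confirm the reverse-DNS suspicion before touching /etc/hosts, one quick check is to time the lookups yourself, e.g. with a few lines of Python (192.168.1.5 is just the address that appears in the logs above):

import socket, time

for addr in ("127.0.0.1", "192.168.1.5"):
    start = time.time()
    try:
        name = socket.gethostbyaddr(addr)[0]   # reverse lookup, as the JVM does
    except socket.herror as e:
        name = "lookup failed: %s" % e
    print("%s -> %s (%.2f s)" % (addr, name, time.time() - start))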
Where to begin from??
Hi all, I'm a computer science undergraduate and have recently started to explore Hadoop. I find it very interesting and want to get involved both as a contributor and a developer for this open source project. I have been going through many textbooks related to Hadoop and HDFS, but I still find it very difficult to figure out where a beginner should start before writing his first line of code as a contributor or developer. Also, please tell me what things I compulsorily need to know before I dive into the depths of these things. Thanking you all in anticipation. -- *Lokesh Chandra Basu* B. Tech Computer Science and Engineering Indian Institute of Technology, Roorkee India (GMT +5hr 30min) +91-8267805498
Re: splittable vs seekable compressed formats
I think seekability is a property of the fs, so any file stored in HDFS is seekable. The input stream is seekable and the output stream isn't: FileSystem's input streams implement Seekable. Thanks, Rahul On Thu, May 23, 2013 at 11:01 PM, John Lilley john.lil...@redpoint.net wrote: I've read about splittable compressed formats in Hadoop. Are any of these formats also "seekable" (in other words, able to seek to an absolute location in the uncompressed data)? John
Re: Where to begin from??
I'll be chastised and have mean things said about me for this. Get some experience in IT before you start looking at Hadoop. My reasoning is this: If you don't know how to develop real applications in a non-Hadoop world, you'll struggle a lot to develop with Hadoop. Asking what things you compulsorily need to know is like saying you want to learn computers -- totally worthless! Find a problem to solve and seek to learn the tools you need to solve your problem. Otherwise, your learning is un-applied and somewhat useless. Picture asking a recent acting school graduate to direct the next Star Wars movie. It's almost like that. On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu lokesh.b...@gmail.com wrote: Hi all, I'm a computer science undergraduate and have recently started to explore Hadoop. I find it very interesting and want to get involved both as a contributor and a developer for this open source project. I have been going through many textbooks related to Hadoop and HDFS, but I still find it very difficult to figure out where a beginner should start before writing his first line of code as a contributor or developer. Also, please tell me what things I compulsorily need to know before I dive into the depths of these things. Thanking you all in anticipation. -- *Lokesh Chandra Basu* B. Tech Computer Science and Engineering Indian Institute of Technology, Roorkee India (GMT +5hr 30min) +91-8267805498
Re: Where to begin from??
I agree with Chris…don't worry about what the technology is called Hadoop , Big table, Lucene, Hive….Model the problem and see what the solution could be….that’s very important And Lokesh please don't mind…we are writing to u perhaps stuff that u don't want to hear but its an important real perspective To illustrate what I mean let me give u a few problems to think about and see how u would solve them…. 1. Before Microsoft took over Skype at least this feature used to be there and the feature is like this……u type the name of a person and it used to come back with some search results in milliseconds often searching close to a billion names…….How would u design such a search architecture ? 2. In 2012, say 50 million users (cookie based) searched Macys.com on a SALES weekend and say 20,000 bought $100 dollar shoes. Now this year 2013 on that SALES weekend 60 million users (cookie based) are buying on the website….You want to give a 25% extra reward to only those cookies that were from last year…So u are looking for an intersection set of possibly 20,000 cookies in two sets - 50million and 60 million…..How would u solve this problem within milli seconds ? 3. Last my favorite….The Postal Services department wants to think of new business ideas to avoid bankruptcy…One idea I have is they have zillion small delivery vans that go to each street in the country….Say I lease out the space to BIG wireless phone providers and promise them them that I will mount wireless signal strength measurement systems on these vans and I will provide them data 3 times a day…how will u devise a solution to analyse and store data ? I am sure if u look around in India as well u will see a lot of situations where u want to solve a problem…. As Chris says , think about the problem u want to solve, then model the solutions and pick the best one… On the flip side….I can tell u it will still be a few years till many Banks and Stock trading houses will believe in Cassandra and Hbase for OLTP because that data is critical……If your timeline in Facebook does not show a photo , its possibly OK but if your 1 million deposit I a bank does not show up for days or suddenly vanishes - u r possibly not going to take that lightly….. Ok enough RAMBLING…. Good luck sanjay From: Chris Embree cemb...@gmail.commailto:cemb...@gmail.com Reply-To: user@hadoop.apache.orgmailto:user@hadoop.apache.org user@hadoop.apache.orgmailto:user@hadoop.apache.org, ch...@embree.usmailto:ch...@embree.us ch...@embree.usmailto:ch...@embree.us Date: Thursday, May 23, 2013 7:47 PM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Where to begin from?? I'll be chastised and have mean things said about me for this. Get some experience in IT before you start looking at Hadoop. My reasoning is this: If you don't know how to develop real applications in a Non-Hadoop world, you'll struggle a lot to develop with Hadoop. Asking what things you need to know in compulsory is like saying you want to learn computers -- totally worthless! Find a problem to solve and seek to learn the tools you need to solve your problem. Otherwise, your learning is un-applied and somewhat useless. Picture a recent acting school graduate how to direct the next Star Wars movie. It's almost like that. On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu lokesh.b...@gmail.commailto:lokesh.b...@gmail.com wrote: Hi all, I'm a computer science undergraduate and has recently started to explore about Hadoop. 
Task attempt failed after TaskAttemptListenerImpl ping
Hi hadoop users, I find that one application failed; when I look at the container log, it shows that it just keeps pinging [2]. How does this come about? I'm using YARN and MRv2 (CDH-4.1.2).
[1] resourcemanager.log:
2013-05-24 09:45:07,192 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1369298403742_0144_01_01, NodeId: wxossetl3:29984, NodeHttpAddress: wxossetl3:8042, Resource: memory: 1536, Priority: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@1f, State: NEW, Token: null, Status: container_id {, app_attempt_id {, application_id {, id: 144, cluster_timestamp: 1369298403742, }, attemptId: 1, }, id: 1, }, state: C_NEW, ] for AM appattempt_1369298403742_0144_01
2013-05-24 09:45:07,192 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1369298403742_0144_01 State change from ALLOCATED to LAUNCHED
2013-05-24 09:45:08,186 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1369298403742_0144_01_01 Container Transitioned from ACQUIRED to RUNNING
2013-05-24 09:45:10,533 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1369298403742_0144_01
2013-05-24 09:45:10,533 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop IP=172.16.250.1 OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1369298403742_0144 APPATTEMPTID=appattempt_1369298403742_0144_01
2013-05-24 09:45:10,533 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1369298403742_0144_01 State change from LAUNCHED to RUNNING
2013-05-24 09:45:10,533 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1369298403742_0144 State change from ACCEPTED to RUNNING
[2] container syslog:
2013-05-24 10:00:10,222 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1369298403742_0153_01_01 taskAttempt attempt_1369298403742_0153_m_00_0
2013-05-24 10:00:10,223 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1369298403742_0153_m_00_0] using containerId: [container_1369298403742_0153_01_01 on NM: [wxossetl1:46256]
2013-05-24 10:00:10,223 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: mapreduce.cluster.local.dir for uber task: /tmp/nm-local-dir/usercache/hadoop/appcache/application_1369298403742_0153
2013-05-24 10:00:10,225 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1369298403742_0153_m_00_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2013-05-24 10:00:10,226 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1369298403742_0153_m_00 Task Transitioned from SCHEDULED to RUNNING
2013-05-24 10:00:10,237 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@34e77781
2013-05-24 10:00:13,224 INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from attempt_1369298403742_0153_m_00_0
2013-05-24 10:00:16,225 INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from attempt_1369298403742_0153_m_00_0
2013-05-24 10:00:19,225 INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from attempt_1369298403742_0153_m_00_0
..
Re: Where to begin from??
Hi, With all due respect to the senior members of this list, I wanted to first congratulate Lokesh on his interest in Hadoop. I wonder how many fresh graduates are interested in this technology -- I guess not many. So we have to welcome Lokesh to the Hadoop world. I agree with the seniors... it is good and important to know the real-world problems. But coming to your question -- as per my knowledge -- if you want to learn and shine in Hadoop, know the following compulsorily:
1) Linux
2) Java
3) SQL
Seniors may correct me or add to or modify the above list.
Thanks,
Raj
Re: Hadoop Classpath issue.
Hi, You should check your /usr/bin/hadoop script.
2013/5/23 Dhanasekaran Anbalagan bugcy...@gmail.com
Hi Guys, When I try to execute the hadoop fs -ls / command, it returns two extra lines:
226:~# hadoop fs -ls /
common ./
lib lib
Found 9 items
drwxrwxrwx - hdfs supergroup 0 2013-03-07 04:46 /benchmarks
drwxr-xr-x - hbase hbase 0 2013-05-23 08:59 /hbase
drwxr-xr-x - hdfs supergroup 0 2013-02-20 13:21 /mapred
drwxr-xr-x - tech supergroup 0 2013-05-03 05:15 /test
drwxrwxrwx - mapred supergroup 0 2013-05-23 09:33 /tmp
drwxrwxr-x - hdfs supergroup 0 2013-02-20 16:32 /user
drwxr-xr-x - hdfs supergroup 0 2013-02-20 15:10 /var
On other machines it does not return the extra two lines. Please guide me on how to remove them.
226:~# /usr/bin/hadoop classpath
common ./
lib lib
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
Please guide me on how to fix this.
-Dhanasekaran
Did I learn something today? If not, I wasted it.
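Since the stray "common ./" and "lib lib" lines show up even for hadoop classpath, they are almost certainly being echoed by one of the shell scripts sourced on every hadoop invocation. A minimal way to hunt for them (an editorial sketch, not from the thread; the exact script paths depend on your packaging and may differ on your machines):

# trace the launcher and see which traced command prints the extra lines
bash -x /usr/bin/hadoop classpath 2>&1 | grep -B2 -e '^common' -e '^lib lib'

# or grep the scripts that every invocation sources for leftover debug echoes
grep -n -E '^[[:space:]]*(echo|printf)' /usr/bin/hadoop /etc/hadoop/conf/hadoop-env.sh /usr/lib/hadoop/libexec/hadoop-config.sh

Removing the offending echo (or redirecting it to stderr) should make both hadoop fs -ls and hadoop classpath clean again.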
Re: Where to begin from??
First of all, thank you all. I accept that I don't know much about real-world problems and have to begin from scratch to get some insight into what is actually driving these technologies.
To Chris: I will start by finding and implementing some real-world problem and seeing how these things are implemented in the first place, before I try to do something out of the box.
To Sanjay: Thank you very much for the sample problems to look into before going into much detail.
To Raj: Thank you for the appreciation and the support for my attempt to learn and implement something that is new to me. The things you mentioned -- Linux, Java and SQL -- are very familiar to me, and in fact I have some implementation experience with SQL, PHP, Python and C++. I have made some online event websites and a command-based search engine for small-scale search (without anything as complex as PageRank). I also have some experience with version control systems, as I was trying to qualify for GSoC 2012 (AbiWord, but was unsuccessful).
Right now I just need something like a guide that lets me move from the start and learn as much as I can, because I'm willing to give all the time I have to learn more and more about these things.
Thanking you all for your kind replies and support.
Lokesh Chandra Basu
B. Tech Computer Science and Engineering
Indian Institute of Technology, Roorkee
India (GMT +5hr 30min)
+91-8267805498
Re: Hadoop Classpath issue.
Check your HDFS at namenode:50070 to see if these files are there...
Thanks Regards
∞ Shashwat Shriparv
Re: Task attempt failed after TaskAttemptListenerImpl ping
Assuming you mean the application failed: in MR, a ping message is sent over the TaskUmbilicalProtocol from the task container to the MR AM. A ping is only sent as an alternative, as a self-check, if there is no progress to report from the task. No progress to report for a long time generally means the task has stopped doing work / isn't updating its status / is stuck.
--
Harsh J
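If you want to see what such a task is actually doing while it only pings, one approach (an editorial sketch; the class names are the standard MR2 entry points and the PID is a placeholder) is to take thread dumps of the container JVM on the NodeManager host with the stock JDK tools:

# find the container JVM; uber tasks run inside the MRAppMaster, normal tasks inside YarnChild
jps -l | grep -i -e MRAppMaster -e YarnChild

# take two thread dumps a few seconds apart and compare where the task's main thread sits
jstack <pid> > /tmp/task-stack-1.txt
sleep 10
jstack <pid> > /tmp/task-stack-2.txt

If the main thread is blocked in the same user-code frame in both dumps, it is the job code that is hanging rather than the framework.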
Re: pauses during startup (maybe network related?)
You are spot on about the DNS lookup slowing things down. I've faced the same issue (before I had a local network DNS set up for the WiFi network I use).
> but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost.
There are a few configs you need to additionally change to make a perfect localhost setup. Otherwise, there are defaults in Apache Hadoop that bind to 0.0.0.0 and report the current system hostname (which changes if you get onto a network), causing what you're seeing.
On Fri, May 24, 2013 at 7:42 AM, Ted r6squee...@gmail.com wrote:
Thanks, I'm almost 100% sure it's network related now. What I tested was unplugging my network :) -- the entire system starts in just a few seconds. I decided to search on reverse DNS in Google and I see other people have complained about very slow reverse DNS lookups (some related to Hadoop / HBase too). I'm not sure why this is happening yet, though. I thought 127.0.0.1 or localhost would have just resolved instantly, but it appears it's somehow finding my real IP instead, i.e. 192.168.1.5 seems to show up in the log entries even though all my configurations say localhost/127.0.0.1 and my /etc/hosts file has an entry for localhost/127.0.0.1. I think if I make an /etc/hosts entry for 192.168.1.5 everything will be quick; that's what I'm going to test later. The only problem is I'm on a dynamic IP... I've considered just making entries for all reasonable permutations like 192.168.1.1 through 192.168.1.20... but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost.
On 5/24/13, Chris Nauroth cnaur...@hortonworks.com wrote:
Hi Ted,
2013-05-23 19:28:19,937 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 9000: starting
There are a couple of relevant activities that happen during namenode startup in between these 2 log statements. It loads the current fsimage (the persistent copy of the file system metadata), merges in the edits log (the transaction log containing all file system metadata changes since the last checkpoint), and then saves back a new fsimage file after that merge. Current versions of the Hadoop codebase will print some information to the logs about the volume of activity during this checkpointing process, so I recommend looking for that in your logs to see if this explains it. Depending on whether or not you have a large number of transactions queued since your last checkpoint, this whole process can cause namenode startup to take several minutes. If this becomes a regular problem, then you can run the SecondaryNameNode or BackupNode to perform periodic checkpoints in addition to the checkpoint that occurs on namenode restart. This is probably overkill for a dev environment on your laptop, though.
Hope this helps,
Chris Nauroth
Hortonworks
http://hortonworks.com/
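For the reverse-DNS part of this, a minimal sketch of the usual check and workaround (an editorial addition; 192.168.1.5 is just the address from Ted's logs and the hostname below is illustrative):

# if this takes seconds instead of milliseconds, the resolver is going out to the network
time getent hosts 192.168.1.5

# pinning the LAN address in /etc/hosts keeps the lookup local, alongside the loopback entry:
# 127.0.0.1    localhost
# 192.168.1.5  mylaptop.localdomain mylaptop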
Hadoop 2.0.4: Unable to load native-hadoop library for your platform
Hi, I downloaded Hadoop 2.0.4 and keep getting these errors from the hadoop CLI and MapReduce task logs:
13/05/24 14:34:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I tried adding $HADOOP_HOME/lib/native/* to CLASSPATH and LD_LIBRARY_PATH, but none of these worked. Has anyone had a similar problem? TY!
--
Benjamin Kim
benkimkimben at gmail
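A common first check here (an editorial sketch, not a confirmed fix for this report) is whether the native libraries in the tarball actually match your platform -- the Apache binary tarballs of that era commonly shipped 32-bit native libraries, so on a 64-bit JVM the loader falls back to the built-in Java classes and prints exactly this warning:

# is libhadoop built for your architecture? (paths assume a tarball install with $HADOOP_HOME set)
file $HADOOP_HOME/lib/native/libhadoop.so*
uname -m

# native libs are picked up via java.library.path, not CLASSPATH
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

If file reports a 32-bit ELF object on a 64-bit machine, the usual options are to rebuild the native libraries from source or simply live with the warning, since the built-in Java implementations are functionally equivalent.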
Hint on EOFExceptions on datanodes
On a smallish (10-node) cluster with only 2 mappers per node, after a few minutes EOFExceptions start cropping up on the datanodes; an example is shown below. Any hint on what to tweak/change in Hadoop / cluster settings to make this happier?
2013-05-24 05:03:57,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@1b1accfc): writeBlock blk_7760450154173670997_48372 received exception java.io.EOFException: while trying to read 65557 bytes
2013-05-24 05:03:57,262 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder 0 for Block blk_-3990749197748165818_48331): PacketResponder 0 for block blk_-3990749197748165818_48331 terminating
2013-05-24 05:03:57,460 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@1b1accfc): DatanodeRegistration(10.254.40.79:9200, storageID=DS-1106090267-10.254.40.79-9200-1369343833886, infoPort=9102, ipcPort=9201):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
at java.lang.Thread.run(Thread.java:662)
2013-05-24 05:03:57,261 INFO org.apache.hadoop.hdfs.server.datanode.Dat
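As a hedged editorial checklist (not a confirmed diagnosis for this cluster): the settings people usually look at first when datanodes throw EOFExceptions in writeBlock under concurrent write load are the per-datanode transceiver limit and the open-file limit of the user running the datanode:

# open-file limit for the datanode user (user name is illustrative; busy nodes often need a much higher value)
su - hdfs -c 'ulimit -n'

# Hadoop 1.x transceiver limit; note the historically misspelled property name
# (config directory may differ on your install)
grep -A 1 'dfs.datanode.max.xcievers' /etc/hadoop/conf/hdfs-site.xml

An EOFException in readToBuf can also simply mean the writer on the other end went away mid-block (for example a speculative or killed task), in which case the log entries are noisy but largely harmless.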