Re: How can I control Number of Mappers of a job?
When done, HADOOP-3387 will allow you to do that. In our implementation we can tell Hadoop the exact number of maps, and it will group splits if necessary.

On Fri, Aug 1, 2008 at 5:25 AM, Andreas Kostyrka [EMAIL PROTECTED] wrote:
Well, the only way I've found to reliably fix the number of map tasks is to use compressed input files; that forces hadoop to assign one and only one file to each map task ;) Andreas

On Thursday 31 July 2008 21:30:33 Gopal Gandhi wrote:
Thank you, finally someone is interested in my questions =) My cluster contains more than one machine. Please don't get me wrong :-). I don't want to limit the total mappers on one node (by mapred.map.tasks). What I want is to limit the total mappers for one job. The motivation is that I have 2 jobs to run at the same time; they have the same input data in Hadoop. I found that one job has to wait until the other finishes its mapping. Because the 2 jobs are submitted by 2 different people, I don't want either job to starve. So I want to limit the first job's total mappers so that the 2 jobs will run simultaneously.

----- Original Message ----
From: Goel, Ankur [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Sent: Wednesday, July 30, 2008 10:17:53 PM
Subject: RE: How can I control Number of Mappers of a job?

How big is your cluster? Assuming you are running a single-node cluster: hadoop-default.xml has a parameter 'mapred.map.tasks' that is set to 2, so by default, no matter how many map tasks the framework calculates, only 2 map tasks will execute on a single-node cluster.

-----Original Message-----
From: Gopal Gandhi [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 31, 2008 4:38 AM
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Subject: How can I control Number of Mappers of a job?

The motivation is to control the max number of mappers of a job. For example, the input data is 246MB; divided by 64MB blocks, that gives 4 blocks, so by default 4 mappers will be launched, one per block. What I want is to set the job's max number of mappers to 2, so that 2 mappers are launched first, and when they complete on the first 2 blocks, another 2 mappers start on the remaining 2 blocks. Does Hadoop provide a way?
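A toy model of the split arithmetic may help here. The sketch below is not Hadoop source; it paraphrases the split-size rule used by the old-API FileInputFormat (splitSize = max(minSize, min(goalSize, blockSize)), where goalSize is the total size divided by the mapred.map.tasks hint). It shows why, for the 246MB example above, a hint of 2 maps cannot by itself reduce the job below 4 maps, while raising the minimum split size can:

```python
# Illustrative sketch of Hadoop's split computation (a paraphrase of
# old-API FileInputFormat behaviour, not actual Hadoop code).
def compute_split_size(goal_size, min_size, block_size):
    # The hint can shrink splits toward goal_size, but never below
    # min_size, and never grow them past the block size on its own.
    return max(min_size, min(goal_size, block_size))

def num_splits(total_size, num_map_hint, min_size, block_size):
    goal = total_size // max(1, num_map_hint)
    split = compute_split_size(goal, min_size, block_size)
    return -(-total_size // split)  # ceiling division

MB = 1024 * 1024
# 246MB input, 64MB blocks, default tiny min split:
# the mapred.map.tasks hint of 2 cannot push the split size above
# the block size, so 4 maps are still launched.
print(num_splits(246 * MB, 2, 1, 64 * MB))         # 4
# Raising the minimum split size (mapred.min.split.size) to 128MB
# does cap the job at 2 maps.
print(num_splits(246 * MB, 2, 128 * MB, 64 * MB))  # 2
```

Note this changes how many map tasks exist, not how many run concurrently; throttling concurrency per job is what HADOOP-3387 is about.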
Re: NameNode failover procedure
NFS is problematic, that's for sure. So what if the secondary namenode host, where only the secondary process is running, were itself used as a backup of the edits log via some synchronisation tool? Then we would have a backup in case the primary namenode goes down, so that the namenode could be started there on the secondary namenode's host.

Steve Loughran wrote:
Himanshu Sharma wrote:
NFS seems to be having problems, as NFS locking causes the namenode to hang. Couldn't there be some other way? Say, if the namenode wrote synchronously to the secondary namenode in addition to its local directories, then in case of namenode failover we could start the primary namenode process on the secondary namenode, and the latest checkpointed fsimage would already be there.

NFS shouldn't be used in production datacentres, at least not as the main way that the nodes talk to a common filesystem. That doesn't mean it doesn't get used that way, but when the network plays up, all 1000+ servers suddenly halt on file IO with their logs filling up with NFS warnings. The problem here is that the OS assumes that file IO is local and fast, and NFS tries to recover transparently by blocking for a while, bringing your apps to a halt. It is way better to have the failures visible at the app level and have the app apply whatever policy you want, which is exactly what the DFS clients do when talking to name or data nodes. Say no to NFS.

Alternatives:

* Some HA databases have two servers sharing access to the same disk array at the physical layer, so when the primary node goes down, the secondary can take over. But that assumes that it is never the RAID-5 disk array that is going to fail. If something very bad happens to the RAID controller, that assumption may prove to be false.

* SAN storage arrays that route RAID-backed storage to specific nodes in the cluster. Again, you are hoping that nothing goes wrong behind the scenes.

* Product placement warning: HP extreme storage with CPUs in the rack, http://h71028.www7.hp.com/enterprise/cache/592778-0-0-0-121.html -- I haven't tried bringing up hadoop on one of these, but it would be interesting to see how well it works. Maybe Apache could start having an "approved by hadoop" sticker with a yellow elephant on it to attach to hardware that is known to work.

This also raises a fundamental question: can we run the secondary namenode process on the same node as the primary namenode process without any out-of-memory / heap exceptions? Also, ideally, what should the memory size of the primary namenode be when running alone, and when running alongside the secondary namenode process?

What failures are you planning to deal with? Running the secondary namenode process on the same machine means that you could cope with a process failure, but not a machine failure or network outage. You'd also need the secondary process listening on a second port, so clients would still need to do some kind of handover.

--
View this message in context: http://www.nabble.com/Re%3A-NameNode-failover-procedure-tp18740218p18770460.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Hadoop and Ganglia Meterics
I'm using the newest version, 0.17.1, but I can't make it work (it works with FileContext, but not with GangliaContext). gmond and gmetad are working fine. Hadoop runs on my local machine only. Here is my hadoop-metrics.properties:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=localhost:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=localhost:8649
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=localhost:8649

All the other lines are commented out. Is there any problem with it? Where can I find the patch for this version? Thanks.

Joe Williams wrote:
Ah, yeah, I found that one. :) Patching 'java/org/apache/hadoop/mapred/JobInProgress.java' on 0.17.1. -joe

Jason Venner wrote:
I have only applied this patch as far forward as 0.16.0.

Joe Williams wrote:
Sweet, thanks.

Jason Venner wrote:
Once the patch is applied you should start seeing the ganglia metrics. We do.

Joe Williams wrote:
Once I have the patch applied and have it running, should I see the metrics? Or do I need to do additional work? Thanks. -Joe

Jason Venner wrote:
I applied the patch in the jira to my distro.

Joe Williams wrote:
Thanks Jason. Until this is implemented, how are you pulling stats from Hadoop? -joe

Jason Venner wrote:
Check out https://issues.apache.org/jira/browse/HADOOP-3422

Joe Williams wrote:
I have been attempting to get Hadoop metrics into Ganglia and have been unsuccessful thus far. I have seen this thread (http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200712.mbox/raw/[EMAIL PROTECTED]/) but it didn't help much.

I have set up my properties file like so:

[EMAIL PROTECTED] current]# cat conf/hadoop-metrics.properties
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=127.0.0.1:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=127.0.0.1:8649

And if I 'telnet 127.0.0.1 8649' I receive the Ganglia XML metrics output without any hadoop-specific metrics:

[EMAIL PROTECTED] current]# telnet 127.0.0.1 8649
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
<!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
--SNIP--

Is there more I need to do to get the metrics to show up in this output? Am I doing something incorrectly? Do I need to have a gmetric script run from cron to update the stats? If so, does anyone have a hadoop-specific example of this? Any info would be helpful. Thanks. -Joe

--
Name: Joseph A. Williams
Email: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/Hadoop-and-Ganglia-Meterics-tp18620340p18771561.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Multiple master nodes
I've been wondering about DRBD. Many (5+?) years ago when I looked at DRBD it required too much low-level tinkering and required hardware I did not have. I wonder what it takes to set it up now, and whether there are any Hadoop-specific things you needed to do? Overall, are you happy with DRBD? (You are limited to 2 nodes, right?)

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: paul [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Tuesday, July 29, 2008 2:56:44 PM
Subject: Re: Multiple master nodes

I'm currently running with your option B setup, and it seems to be reliable for me (so far). I use a combination of drbd and various heartbeat/LinuxHA scripts that handle the failover process, including a virtual IP for the namenode. I haven't had any real-world unexpected failures to deal with yet, but all manual testing has had consistent and reliable results.

-paul

On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
Dear Hadoop Community -- I am wondering if it is already possible, or in the plans, to add capability for multiple master nodes. I'm in a situation where I have a master node that may potentially be in a less than ideal execution and networking environment. For this reason, it's possible that the master node could die at any time. On the other hand, the application must always be available. I have other machines accessible to me, but I'm still unclear on the best method to add reliability. Here are a few options that I'm exploring:

a) Create a completely secondary Hadoop cluster that we can flip to when we detect that the master node has died. This will double hardware costs, so if we originally have a 5-node cluster, then we would need to pull 5 more machines out of somewhere for this decision. This is not the preferable choice.

b) Just mirror the master node via other always-available software, such as DRBD for real-time synchronization. Upon detection we could swap to the alternate node.

c) Or, if Hadoop had some functionality already in place, it would be fantastic to take advantage of that. I don't know if anything like this is available, but I could not find anything as of yet. It seems to me, however, that having multiple master nodes would be the direction Hadoop needs to go if it is to be useful in high-availability applications. I was told there are some papers on Amazon's Elastic Computing, which I'm about to look for, that follow this approach.

In any case, could someone with experience in solving this type of problem share how they approached this issue? Thanks!
RE: Running mapred job from remote machine to a pseudo-distributed hadoop
I'll try again: can anyone tell me whether it should be possible to run hadoop in pseudo-distributed mode (i.e. everything on one machine) and then submit a mapred job, using the ToolRunner, from another machine against that hadoop configuration?

Cheers Arv

-----Original Message-----
From: Arv Mistry [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 31, 2008 2:32 PM
To: core-user@hadoop.apache.org
Subject: Running mapred job from remote machine to a pseudo-distributed hadoop

I have hadoop set up in pseudo-distributed mode, i.e. everything on one machine, and I'm trying to submit a hadoop mapred job from another machine to that hadoop setup. At the point that I run the mapred job I get the following error. Any ideas as to what I'm doing wrong? Is this possible in pseudo-distributed mode?

Cheers Arv

INFO | jvm 1| 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR [HadoopJobTool] java.io.IOException: /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such file or directory
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:175)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO | jvm 1| 2008/07/31 14:01:00 | at java.lang.reflect.Method.invoke(Method.java:597)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO | jvm 1| 2008/07/31 14:01:00 |
INFO | jvm 1| 2008/07/31 14:01:00 | org.apache.hadoop.ipc.RemoteException: java.io.IOException: /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such file or directory
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:175)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO | jvm 1| 2008/07/31 14:01:00 | at java.lang.reflect.Method.invoke(Method.java:597)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO | jvm 1| 2008/07/31 14:01:00 |
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.Client.call(Client.java:557)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
INFO | jvm 1| 2008/07/31 14:01:00 | at $Proxy5.submitJob(Unknown Source)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO | jvm 1| 2008/07/31 14:01:00 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO | jvm 1| 2008/07/31 14:01:00 | at java.lang.reflect.Method.invoke(Method.java:597)
INFO | jvm 1| 2008/07/31 14:01:00 | at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
INFO | jvm 1| 2008/07/31
Re: corrupted fsimage and edits
I had the same thing happen to me a few weeks ago. The solution was to modify one of the classes a bit (FSEdits.java or some such) and simply catch + swallow one of the exceptions. This let the NN come up again (at the expense of some data loss). Lohit helped me out and filed a bug. I don't have the issue number handy, but it is in JIRA and still open as of a few days ago.

NN HA seems to be a requirement for a lot of people... I suppose because it's (the only?) SPOF. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Torsten Curdt [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Wednesday, July 30, 2008 2:09:15 PM
Subject: corrupted fsimage and edits

Just a bit of feedback here. One of our hadoop 0.16.4 namenodes had a disk-full incident today. No second backup namenode was in place. Both the fsimage and edits files seem to have gotten corrupted. After quite a bit of debugging and fiddling with a hex editor, we managed to resurrect the files and continue with just minor loss. Thankfully this only happened on a development cluster, not on production. But shouldn't this be something that NEVER happens?

cheers
--
Torsten
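One standard mitigation for this class of failure, not mentioned in the thread itself, is to point dfs.name.dir at several directories, ideally on separate disks or one on a remote mount; the namenode then keeps a full copy of its metadata in each. A hedged sketch (the path names below are hypothetical):

```xml
<!-- hadoop-site.xml fragment; paths are illustrative -->
<property>
  <name>dfs.name.dir</name>
  <!-- comma-separated list: the namenode writes fsimage and edits
       to every directory listed here -->
  <value>/disk1/hadoop/name,/disk2/hadoop/name</value>
</property>
```

A disk-full or single-disk corruption then leaves at least one intact copy of fsimage and edits to recover from.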
RE: java.io.IOException: Cannot allocate memory
Actually, I found the problem was that our operations people had enabled overcommit on memory and restricted it to 50%... lol. Telling them to make it 100% fixed the problem.

-Xavier

-----Original Message-----
From: Taeho Kang [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 31, 2008 6:16 PM
To: core-user@hadoop.apache.org
Subject: Re: java.io.IOException: Cannot allocate memory

Are you using Hadoop Streaming? If so, a subprocess created by a Hadoop Streaming job can take as much memory as it needs. In that case the system will run out of memory, and other processes (e.g. the TaskTracker) may not be able to run properly, or may even be killed by the OS.

/Taeho

On Fri, Aug 1, 2008 at 2:24 AM, Xavier Stevens [EMAIL PROTECTED] wrote:
We're currently running jobs on machines with around 16GB of memory with 8 map tasks per machine. We used to run with max heap set to 2048m. Since we started using version 0.17.1 we've been getting a lot of these errors:

task_200807251330_0042_m_000146_0: Caused by: java.io.IOException: java.io.IOException: Cannot allocate memory
task_200807251330_0042_m_000146_0: at java.lang.UNIXProcess.init(UNIXProcess.java:148)
task_200807251330_0042_m_000146_0: at java.lang.ProcessImpl.start(ProcessImpl.java:65)
task_200807251330_0042_m_000146_0: at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.util.Shell.run(Shell.java:134)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:734)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:272)
task_200807251330_0042_m_000146_0: at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:707)

We haven't changed our heap sizes at all. Has anyone else experienced this? Is there a way around it other than reducing heap sizes excessively low? I've tried all the way down to 1024m max heap and I still get this error.

-Xavier
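For reference, Xavier's fix maps onto the Linux overcommit sysctls. The values below only illustrate what "overcommit restricted to 50%" versus "100%" would look like, not a recommendation. With strict accounting, a large JVM that forks a child (as the DF call in the trace above does) momentarily needs to commit roughly twice its footprint, which is exactly where a 50% ratio bites:

```
# /etc/sysctl.conf fragment (illustrative values)
# 2 = strict accounting: commit limit = swap + RAM * overcommit_ratio/100
vm.overcommit_memory = 2
# percentage of physical RAM counted toward the commit limit
vm.overcommit_ratio = 100
```

Applied with `sysctl -p`; with 16GB of RAM, raising the ratio from 50 to 100 roughly doubles the commit limit, which matches the fix reported above.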
Re: Multiple master nodes
Otis,

The DRBD setup is relatively straightforward now, and the documentation is pretty thorough at http://www.drbd.org/users-guide/. I only run a two-node setup for the masters, so a one-to-many replication scheme is outside of my requirements.

I'm currently running my cluster on CentOS 5, and there are rpms available for DRBD through the extras repository with the following packages:

drbd82.x86_64
kmod-drbd82.x86_64

There's nothing Hadoop-specific, other than starting up the right services in the right order when using heartbeat. (The secondary server does not run its namenode processes while it's in standby mode.) This is no different from many other apps in an HA scenario, so it's hard to even call this Hadoop-specific.

As far as being happy with it: yes, so far I am. I've had enough history of usage with DRBD over the past four years that I'm pretty comfortable with its reliability and performance. I've also done replication of data sets much larger than the namenode's with negligible performance overhead (after the initial sync). Your mileage may vary based on the change rate of your namenode's data, but for our purposes there is little to no concern.

Here are a few more details on my current configuration. I do not use a crossover cable between the nodes, as you'll often see recommended by the documentation and howtos. Instead, since my servers each have two NICs, I use bonding with LACP and use the bond0 device for both my regular traffic and my drbd replication. With this setup, I'd have to lose two NICs (and two switches on my network) in order to have a complete network failure and risk any split brain.

My /etc/drbd.conf is pretty simple:

# drbd.conf example
global { usage-count no; }
resource r0 {
  protocol C;
  syncer { rate 110M; }
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  on grid102.domain.prod {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.6.5.62:7788;
    meta-disk internal;
  }
  on grid101.domain.prod {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   10.6.5.61:7788;
    meta-disk internal;
  }
}
# end drbd.conf

And a single entry in /etc/fstab:

/dev/drbd0  /hadoop  ext3  defaults,noauto  0 0

Obviously there's more to creating the device and file system, but there are pretty clear instructions on this through the user guide. I do most of it through some scripts that I keep around for building cluster masters and nodes in my environment, which the following lines come from:

### start script ###
SOURCE_DIR=/mnt/hadoop/dist
mkdir -p /hadoop
echo "/dev/drbd0  /hadoop  ext3  noauto  1 2" >> /etc/fstab
yum -y install drbd82 kmod-drbd82
/bin/cp $SOURCE_DIR/drbd.conf /etc
chkconfig drbd on
yes | drbdadm create-md r0
service drbd start
# run only on primary, manually
# drbdadm -- --overwrite-data-of-peer primary r0
### end script ###

(fdisk of the volume and mkfs need to be added in there at the end.)

If you have any more questions on the setup, let me know and I'll try to answer them for you.

-paul

On Fri, Aug 1, 2008 at 10:09 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
I've been wondering about DRBD. Many (5+?) years ago when I looked at DRBD it required too much low-level tinkering and required hardware I did not have. I wonder what it takes to set it up now, and whether there are any Hadoop-specific things you needed to do? Overall, are you happy with DRBD? (You are limited to 2 nodes, right?)

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: paul [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Tuesday, July 29, 2008 2:56:44 PM
Subject: Re: Multiple master nodes

I'm currently running with your option B setup, and it seems to be reliable for me (so far). I use a combination of drbd and various heartbeat/LinuxHA scripts that handle the failover process, including a virtual IP for the namenode. I haven't had any real-world unexpected failures to deal with yet, but all manual testing has had consistent and reliable results.

-paul

On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
Dear Hadoop Community -- I am wondering if it is already possible, or in the plans, to add capability for multiple master nodes. I'm in a situation where I have a master node that may potentially be in a less than ideal execution and networking environment. For this reason, it's possible that the master node could die at any time. On the other hand, the application must always be available. I have other machines accessible to me, but I'm still unclear on the best method to add reliability. Here are a few options that I'm exploring:

a) Create a completely secondary Hadoop cluster that we can flip to when we detect that the master node has died. This will double hardware costs, so if we originally have a
Re: Running mapred job from remote machine to a pseudo-distributed hadoop
On Fri, Aug 1, 2008 at 7:13 AM, Arv Mistry [EMAIL PROTECTED] wrote:
I'll try again, can anyone tell me should it be possible to run hadoop in a pseudo-distributed mode (i.e. everything on one machine)

That's not quite what pseudo-distributed mode is. You can run regular hadoop jobs on a cluster that consists of one machine; just change the hostname in your hadoop-site.xml file to the real hostname of your machine. If you've got localhost in the conf, Hadoop is going to use LocalJobRunner, and that may be related to your issue. I may be wrong on this; I haven't spent much time looking at that code. Take a look at ./src/java/org/apache/hadoop/mapred/JobClient.java for what gets kicked off (for 0.17.1 at least).

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
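To make James's suggestion concrete, here is a hedged hadoop-site.xml sketch; the hostname and ports are placeholders, and the property names are the 0.17-era ones:

```xml
<!-- hadoop-site.xml sketch; master.example.com and the ports are
     placeholders, not values from this thread -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example.com:9000/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>
</configuration>
```

With a real hostname here rather than localhost, a remote client using the same configuration should reach the JobTracker over RPC instead of falling back to LocalJobRunner.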
Re: How can I control Number of Mappers of a job?
On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi [EMAIL PROTECTED] wrote:
Thank you, finally someone is interested in my questions =) My cluster contains more than one machine. Please don't get me wrong :-). I don't want to limit the total mappers on one node (by mapred.map.tasks). What I want is to limit the total mappers for one job. The motivation is that I have 2 jobs to run at the same time; they have the same input data in Hadoop. I found that one job has to wait until the other finishes its mapping. Because the 2 jobs are submitted by 2 different people, I don't want either job to starve. So I want to limit the first job's total mappers so that the 2 jobs will run simultaneously.

What about running two different jobtrackers on the same machines, looking at the same DFS files? Never tried it myself, but it might be an approach.

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
Re: How can I control Number of Mappers of a job?
I've talked to a few people that claim to have done this as a way to limit resources for different groups, like developers versus production jobs. I haven't tried it myself yet, but it's getting close to the top of my to-do list.

-paul

On Fri, Aug 1, 2008 at 1:36 PM, James Moore [EMAIL PROTECTED] wrote:
On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi [EMAIL PROTECTED] wrote:
Thank you, finally someone is interested in my questions =) My cluster contains more than one machine. Please don't get me wrong :-). I don't want to limit the total mappers on one node (by mapred.map.tasks). What I want is to limit the total mappers for one job. The motivation is that I have 2 jobs to run at the same time; they have the same input data in Hadoop. I found that one job has to wait until the other finishes its mapping. Because the 2 jobs are submitted by 2 different people, I don't want either job to starve. So I want to limit the first job's total mappers so that the 2 jobs will run simultaneously.

What about running two different jobtrackers on the same machines, looking at the same DFS files? Never tried it myself, but it might be an approach.

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
No locks available error
Hi,

We're getting the following error when starting up hadoop on the cluster:

2008-08-01 14:42:37,334 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = node5.cube.disc.cias.utexas.edu/129.116.113.77
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.4
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-08-01 14:43:37,572 INFO org.apache.hadoop.dfs.Storage: java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
        at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:103)
        at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:236)
        at org.apache.hadoop.dfs.DataNode.init(DataNode.java:162)
        at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
        at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)

This error appears on every data node during startup. We are running version 0.16.4 of hadoop, and the hadoop dfs is NFS-mounted on all the nodes in the cluster. Does anyone know what this error means?

Thanks,
Shirley
Re: No locks available error
Most likely your NFS is not configured to allow file locks. Please enable file locks for your NFS.

Raghu.

Shirley Cohen wrote:
Hi,

We're getting the following error when starting up hadoop on the cluster:

2008-08-01 14:42:37,334 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = node5.cube.disc.cias.utexas.edu/129.116.113.77
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.4
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-08-01 14:43:37,572 INFO org.apache.hadoop.dfs.Storage: java.io.IOException: No locks available
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
        at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:103)
        at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:236)
        at org.apache.hadoop.dfs.DataNode.init(DataNode.java:162)
        at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
        at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)

This error appears on every data node during startup. We are running version 0.16.4 of hadoop, and the hadoop dfs is NFS-mounted on all the nodes in the cluster. Does anyone know what this error means?

Thanks,
Shirley
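A sketch of what "enable file locks" typically means for a Linux NFS client of that era; the server name and export path below are hypothetical:

```
# /etc/fstab entry (hypothetical server and paths):
# note the absence of the "nolock" option, which disables NFS locking
nfsserver:/export/hadoop  /hadoop  nfs  rw,hard,intr  0 0
```

The client-side lock daemon must also be running (on RHEL/CentOS of that vintage, `service nfslock start`, which starts rpc.statd). Note that other replies in this archive advise against NFS-hosted Hadoop storage directories altogether.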
Re: Determining number of mappers and number of input splits
On Wed, Jul 30, 2008 at 11:24 PM, Naama Kraus [EMAIL PROTECTED] wrote:
Hi, I am a bit confused about how the framework determines the number of mappers of a job and the number of input splits. Could anyone summarize?

Take a look at http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Things start to become a little clearer when you think about Hadoop-size datasets. Usually what you care about is tuning the number of simultaneous tasks running on a single machine (one per core? one per hard drive? one per whatever?), and the total number is just "many".

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
Re: How can I control Number of Mappers of a job?
We control the number of map tasks by carefully managing the input split size when we need to. This may require using the MultipleFileInput classes, or aggregating your input files beforehand. You need some aggregation, either by concatenation or via MultipleFileInput, if you have more input files than you want map tasks. The case of 1 mapper per input file requires setting the input split size to Long.MAX_VALUE (see the datajoin classes for examples).

paul wrote:
I've talked to a few people that claim to have done this as a way to limit resources for different groups, like developers versus production jobs. I haven't tried it myself yet, but it's getting close to the top of my to-do list.

-paul

On Fri, Aug 1, 2008 at 1:36 PM, James Moore [EMAIL PROTECTED] wrote:
On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi [EMAIL PROTECTED] wrote:
Thank you, finally someone is interested in my questions =) My cluster contains more than one machine. Please don't get me wrong :-). I don't want to limit the total mappers on one node (by mapred.map.tasks). What I want is to limit the total mappers for one job. The motivation is that I have 2 jobs to run at the same time; they have the same input data in Hadoop. I found that one job has to wait until the other finishes its mapping. Because the 2 jobs are submitted by 2 different people, I don't want either job to starve. So I want to limit the first job's total mappers so that the 2 jobs will run simultaneously.

What about running two different jobtrackers on the same machines, looking at the same DFS files? Never tried it myself, but it might be an approach.

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com

--
Jason Venner
Attributor - Program the Web
http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards; contact if interested.
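The effect Jason describes can be sketched with a toy model of the split arithmetic (a paraphrase of old-API FileInputFormat behaviour, not Hadoop code): pushing the minimum split size up to Long.MAX_VALUE means the computed split size can never be smaller than a file, so every input file comes out as exactly one split, i.e. one mapper per file.

```python
# Illustrative sketch, not Hadoop source: splits are computed per file as
# split_size = max(min_size, min(goal_size, block_size)).
def splits_per_file(file_size, goal_size, min_size, block_size):
    split_size = max(min_size, min(goal_size, block_size))
    return -(-file_size // split_size)  # ceiling division

MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE
files = [100 * MB, 250 * MB, 700 * MB]

# With an effectively unset min split size, large files split by block:
print([splits_per_file(f, f, 1, 64 * MB) for f in files])         # [2, 4, 11]
# With min split size = Long.MAX_VALUE, every file is a single split:
print([splits_per_file(f, f, LONG_MAX, 64 * MB) for f in files])  # [1, 1, 1]
```

This is the per-file counterpart of raising mapred.min.split.size to cap a job's total map count.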
Can a data node failure prevents from writing into HDFS?
Hi,

We are running hadoop (0.16.3) on our production system and are probably still lacking experience. In a nutshell, our system is composed of 14 nodes (each with 8 cores and 4 * 500GB disks) to accommodate both storage and map/reduce jobs. We have a component writing into HDFS outside of that hadoop cluster but within the same data center. At some point this component was not able to write correctly and was getting a lot of IOExceptions:

1216986726631 SEVERE [LogRangeManager$LogRangeConsumer.consumeFromLogRange Failed to writeNextElement for LogRange 6 ts = 121698669000 java.io.IOException: Could not get block locations. Aborting...
1216986726633 SEVERE [LogRangeManager$LogRangeConsumer.consumeFromLogRange Failed to writeNextElement for LogRange 6 ts = 121698669000 java.io.IOException: Could not get block locations. Aborting...
1216986726636 SEVERE [LogRangeManager$LogRangeConsumer.consumeFromLogRange Failed to writeNextElement for LogRange 6 ts = 121698669000 java.io.IOException: Could not get block locations. Aborting...

After checking the cluster we found one datanode that was down due to a hardware issue. Is it possible that a single datanode being down prevents writing into HDFS? I don't know if this is relevant or not, but looking into the logs I also saw these traces, 47 hours earlier:

1216816310525 WARNING [LogRange.close Failed to close the channel for range 12168159 java.io.IOException: All datanodes 10.0.1.173:50010 are bad.

(That's all there was, and at the time I looked at the cluster only one datanode was down.)

Thanks,
S.