Re: How can I control Number of Mappers of a job?

2008-08-01 Thread Alejandro Abdelnur
When it is done, HADOOP-3387 will allow you to do that. In our
implementation we can tell Hadoop the exact number of maps and it will group
splits if necessary.

On Fri, Aug 1, 2008 at 5:25 AM, Andreas Kostyrka [EMAIL PROTECTED] wrote:
 Well, the only way I've found to reliably fix the number of map tasks is
 by using compressed input files; that forces Hadoop to assign one and only
 one file to each map task ;)

 Andreas

 On Thursday 31 July 2008 21:30:33 Gopal Gandhi wrote:
 Thank you, finally someone has interest in my questions =)
 My cluster contains more than one machine. Please don't get me wrong :-). I
 don't want to limit the total mappers on one node (by mapred.map.tasks).
 What I want is to limit the total mappers for one job. The motivation is
 that I have 2 jobs to run at the same time. They have the same input data
 in Hadoop. I found that one job has to wait until the other finishes its
 mapping. Because the 2 jobs are submitted by 2 different people, I don't
 want one job to starve. So I want to limit the first job's total
 mappers so that the 2 jobs will be launched simultaneously.



 - Original Message 
 From: Goel, Ankur [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Cc: [EMAIL PROTECTED]
 Sent: Wednesday, July 30, 2008 10:17:53 PM
 Subject: RE: How can I control Number of Mappers of a job?

 How big is your cluster? Assuming you are running a single-node cluster:
 hadoop-default.xml has a parameter 'mapred.map.tasks' that is set to 2, so
 by default, no matter how many map tasks are calculated by the framework,
 only 2 map tasks will execute on a single-node cluster.

 -Original Message-
 From: Gopal Gandhi [mailto:[EMAIL PROTECTED]
 Sent: Thursday, July 31, 2008 4:38 AM
 To: core-user@hadoop.apache.org
 Cc: [EMAIL PROTECTED]
 Subject: How can I control Number of Mappers of a job?

 The motivation is to control the max # of mappers of a job. For example,
 the input data is 246MB; divided by 64MB that is 4 blocks, so by default
 4 mappers will be launched on the 4 blocks.
 What I want is to set its max # of mappers to 2, so that 2 mappers are
 launched first and, when they complete on the first 2 blocks, another 2
 mappers start on the remaining 2 blocks. Does Hadoop provide a way?
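
For reference, here is a minimal sketch of the knobs discussed in this thread, written against the old org.apache.hadoop.mapred API of the 0.16/0.17 era. Note that setNumMapTasks() (the mapred.map.tasks property) is only a hint; the InputFormat still decides the actual number of splits, so raising the minimum split size is what really reduces the map count. The class name, the 128 MB figure and the exact property spelling are illustrative assumptions rather than anything stated in this thread.

import org.apache.hadoop.mapred.JobConf;

public class MapCountExample {
  public static JobConf configure() {
    JobConf conf = new JobConf(MapCountExample.class);

    // Hint only: the same as setting mapred.map.tasks for this job.
    conf.setNumMapTasks(2);

    // The effective lever for file-based input: a larger minimum split size
    // means fewer, larger splits and therefore fewer map tasks.
    conf.set("mapred.min.split.size", String.valueOf(128L * 1024 * 1024));

    return conf;
  }
}

Neither setting caps how many of a job's map tasks run at once; in this era per-node concurrency comes from the tasktracker's slot settings, so two jobs submitted together will still, by default, run one after the other.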





Re: NameNode failover procedure

2008-08-01 Thread Himanshu Sharma

NFS is problematic, that's for sure. So what if the secondary namenode machine,
where only the secondary process is running, were itself used as a backup of the
edits log file via some synchronisation tool? Then we would have a backup in case
the primary namenode goes down, so that the namenode could be started on the
secondary machine.


Steve Loughran wrote:
 
 Himanshu Sharma wrote:
 NFS seems to be having problems, as NFS locking causes namenode hangups.
 Can't there be any other way? Say the namenode starts writing synchronously
 to the secondary namenode in addition to its local directories; then, in case
 of a namenode failover, we can start the primary namenode process on the
 secondary namenode, and the latest checkpointed fsimage is already there on
 the secondary namenode.
 
 NFS shouldn't be used in production datacentres, at least not as the 
 main way that the nodes talk to a common filesystem.
 
 That doesn't mean it doesn't get used that way, but when the network 
 plays up, all 1000+ servers suddenly halt on file IO with their logs 
 filling up with NFS warnings. The problem here is that the OS assumes 
 that file IO is local and fast, and NFS is trying transparently to 
 recover by blocking for a while, so bringing your apps to a halt. It is 
 way better to have the failures visible at the app level and have the app 
 apply whatever policy you want, which is exactly what the DFS clients do 
 when talking to namenodes or datanodes.
 
 
 say no to NFS.
 
 Alternatives
 
 * Some HA databases have two servers sharing access to the same disk 
 array at the physical layer, so when the 1ary node goes down, the 
 secondary can take over. But that assumes that it is never the RAID-5 
 disk array that is going to fail. If something very bad happens to the 
 RAID controller, that assumption may prove to be false.
 
 * SAN storage arrays to route RAID-backed storage to specific nodes in 
 the cluster. Again, you are hoping that nothing goes wrong behind the 
 scenes.
 
 * Product placement warning: HP extreme storage with CPUs in the rack
 http://h71028.www7.hp.com/enterprise/cache/592778-0-0-0-121.html
 
 I haven't tried bringing up Hadoop on one of these, but it would be 
 interesting to see how well it works. Maybe Apache could start having an 
 "approved by Hadoop" sticker with a yellow elephant on it to attach to 
 hardware that is known to work.
 
 This also raises a fundamental question: can we run the secondary namenode
 process on the same node as the primary namenode process without any
 out-of-memory / heap exceptions? Also, ideally, what should the memory
 size of the primary namenode be when it runs alone, and when it runs with
 the secondary namenode process?
 
 
 What failures are you planning to deal with? Running the secondary node 
 process on the same machine means that you could cope with a process 
 failure, but not machine failure or network outage. You'd also need the 
 2ary process listening on a second port, so clients would still need to 
 do some kind of handover.
 
 
 
 
 




Re: Hadoop and Ganglia Meterics

2008-08-01 Thread savethequeen


I'm using the newest version, 0.17.1, but I can't make it work (it works
with FileContext, but not with GangliaContext). gmond and gmetad are
working fine. Hadoop runs on my local machine only.

here is my hadoop-metrics.properties: 
#
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=localhost:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=localhost:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=localhost:8649
#
All the other lines are commented out.

Is there any problem with it? Where can I find the patch for this version?
Thanks.



Joe Williams wrote:
 
 Ah, yeah, I found that one. :) Patching 
 'java/org/apache/hadoop/mapred/JobInProgress.java' on 0.17.1.
 
 -joe
 
 
 Jason Venner wrote:
 I have only applied this patch as far forward as 0.16.0

 Joe Williams wrote:
 Sweet, thanks.


 Jason Venner wrote:
 Once the patch is applied you should start seeing the ganglia metrics

 We do.


 Joe Williams wrote:
 Once I have the patch applied and have it running, should I see the 
 metrics? Or do I need to do additional work?

 Thanks.
 -Joe


 Jason Venner wrote:
 I applied the patch in the jira to my distro

 Joe Williams wrote:
 Thanks Jason. Until this is implemented, how are you pulling 
 stats from Hadoop?

 -joe


 Jason Venner wrote:
 Check out

 https://issues.apache.org/jira/browse/HADOOP-3422


 Joe Williams wrote:
 I have been attempting to get Hadoop metrics into Ganglia and 
 have been unsuccessful thus far. I have seen this thread 
 (http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200712.mbox/raw/[EMAIL
  PROTECTED]/) 
 but it didn't help much.

 I have setup my properties file like so:

 [EMAIL PROTECTED] current]# cat 
 conf/hadoop-metrics.properties
 dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
 dfs.period=10
 dfs.servers=127.0.0.1:8649

 mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
 mapred.period=10
 mapred.servers=127.0.0.1:8649

 And if I 'telnet 127.0.0.1 8649' I receive the Ganglia XML 
 metrics output without any Hadoop-specific metrics:

 [EMAIL PROTECTED] current]# telnet 127.0.0.1  8649
 Trying 127.0.0.1...
 Connected to localhost (127.0.0.1).
 Escape character is '^]'.
 <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
 <!DOCTYPE GANGLIA_XML [
   <!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
   <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
   <!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
 --SNIP--

 Is there more I need to do to get the metrics to show up in 
 this output, or am I doing something incorrectly? Do I need to 
 have a gmetric script run from cron to update the stats? If so, 
 does anyone have a Hadoop-specific example of this?

 Any info would be helpful.

 Thanks.
 -Joe







 
 -- 
 Name: Joseph A. Williams
 Email: [EMAIL PROTECTED]
 
 
 




Re: Multiple master nodes

2008-08-01 Thread Otis Gospodnetic
I've been wondering about DRBD.  Many (5+?) years ago when I looked at DRBD it 
required too much low-level tinkering and required hardware I did not have.  I 
wonder what it takes to set it up now and if there are any Hadoop-specific 
things you needed to do?  Overall, are you happy with DRBD? (you are limited to 
2 nodes, right?)


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: paul [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Tuesday, July 29, 2008 2:56:44 PM
 Subject: Re: Multiple master nodes
 
 I'm currently running with your option B setup and it seems to be reliable
 for me (so far).  I use a combination of drbd and various heartbeat/LinuxHA
 scripts that handle the failover process, including a virtual IP for the
 namenode.  I haven't had any real-world unexpected failures to deal with,
 yet, but all manual testing has had consistent and reliable results.
 
 
 
 -paul
 
 
 On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
 
  Dear Hadoop Community --
 
  I am wondering if it is already possible or in the plans to add capability
  for multiple master nodes. I'm in a situation where I have a master node
  that may potentially be in a less than ideal execution and networking
  environment. For this reason, it's possible that the master node could die
  at any time. On the other hand, the application must always be available. I
  have other machines available to me, but I'm still unclear on the best
  method for adding reliability.
 
  Here are a few options that I'm exploring:
  a) To create a completely secondary Hadoop cluster that we can flip to when
  we detect that the master node has died. This will double hardware costs,
  so
  if we originally have a 5 node cluster, then we would need to pull 5 more
  machines out of somewhere for this decision. This is not the preferable
  choice.
  b) Just mirror the master node via other always available software, such as
  DRBD for real time synchronization. Upon detection we could swap to the
  alternate node.
  c) Or if Hadoop had some functionality already in place, it would be
  fantastic to be able to take advantage of that. I don't know if anything
  like this is available but I could not find anything as of yet. It seems to
  me, however, that having multiple master nodes would be the direction
  Hadoop
  needs to go if it is to be useful in high availability applications. I was
  told there are some papers on Amazon's Elastic Computing that I'm about to
  look for that follow this approach.
 
  In any case, could someone with experience in solving this type of problem
  share how they approached this issue?
 
  Thanks!
 



RE: Running mapred job from remote machine to a pseudo-distributed hadoop

2008-08-01 Thread Arv Mistry

I'll try again: can anyone tell me whether it should be possible to run Hadoop
in pseudo-distributed mode (i.e. everything on one machine) and then
submit a mapred job, using the ToolRunner, from another machine against that
Hadoop configuration?

Cheers Arv
 


-Original Message-
From: Arv Mistry [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 31, 2008 2:32 PM
To: core-user@hadoop.apache.org
Subject: Running mapred job from remote machine to a pseudo-distributed
hadoop

 
I have Hadoop set up in pseudo-distributed mode, i.e. everything on one
machine, and I'm trying to submit a Hadoop mapred job from another
machine to that Hadoop setup.

At the point that I run the mapred job I get the following error. Any
ideas as to what I'm doing wrong?
Is this possible in a pseudo-distributed mode?

Cheers Arv

INFO   | jvm 1| 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR [HadoopJobTool] java.io.IOException: /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such file or directory
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1| 2008/07/31 14:01:00 |
INFO   | jvm 1| 2008/07/31 14:01:00 | org.apache.hadoop.ipc.RemoteException: java.io.IOException: /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such file or directory
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1| 2008/07/31 14:01:00 |
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.ipc.Client.call(Client.java:557)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at $Proxy5.submitJob(Unknown Source)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1| 2008/07/31 14:01:00 |   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
INFO   | jvm 1| 2008/07/31 

Re: corrupted fsimage and edits

2008-08-01 Thread Otis Gospodnetic
I had the same thing happen to me a few weeks ago.  The solution was to modify 
one of the classes a bit (FSEdits.java or some such) and simply catch + swallow 
one of the exceptions.  This let the NN come up again (at the expense of some 
data loss).  Lohit helped me out and filed a bug.  I don't have the issue number 
handy, but it is in JIRA and still open as of a few days ago.  NN HA seems to 
be a requirement for a lot of people... I suppose because it's (the only?) 
SPOF. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Torsten Curdt [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Wednesday, July 30, 2008 2:09:15 PM
 Subject: corrupted fsimage and edits
 
 Just a bit of a feedback here.
 
 One of our Hadoop 0.16.4 namenodes had a disk-full incident 
 today. No second backup namenode was in place. Both the fsimage and 
 edits files seem to have gotten corrupted. After quite a bit of debugging 
 and fiddling with a hex editor we managed to resurrect the files and 
 continue with only minor loss.
 
 Thankfully this only happened on a development cluster - not on  
 production. But shouldn't that be something that should NEVER happen?
 
 cheers
 --
 Torsten



RE: java.io.IOException: Cannot allocate memory

2008-08-01 Thread Xavier Stevens
Actually I found the problem was our operations people had enabled
overcommit on memory and restricted it to 50%...lol.  Telling them to
make it 100% fixed the problem.

-Xavier


-Original Message-
From: Taeho Kang [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 31, 2008 6:16 PM
To: core-user@hadoop.apache.org
Subject: Re: java.io.IOException: Cannot allocate memory

Are you using HadoopStreaming?

If so, then subprocess created by HadoopStreaming Job can take as much
memory as it needs. In that case, the system will run out of memory and
other processes (e.g. TaskTracker) may not be able to run properly or
even be killed by the OS.

/Taeho

On Fri, Aug 1, 2008 at 2:24 AM, Xavier Stevens
[EMAIL PROTECTED]wrote:

 We're currently running jobs on machines with around 16GB of memory 
 with
 8 map tasks per machine.  We used to run with max heap set to 2048m.
 Since we started using version 0.17.1 we've been getting a lot of 
 these
 errors:

 task_200807251330_0042_m_000146_0: Caused by: java.io.IOException: java.io.IOException: Cannot allocate memory
 task_200807251330_0042_m_000146_0:  at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
 task_200807251330_0042_m_000146_0:  at java.lang.ProcessImpl.start(ProcessImpl.java:65)
 task_200807251330_0042_m_000146_0:  at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.util.Shell.run(Shell.java:134)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:734)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.java:272)
 task_200807251330_0042_m_000146_0:  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:707)

 We haven't changed our heapsizes at all.  Has anyone else experienced 
 this?  Is there a way around it other than reducing heap sizes 
 excessively low?  I've tried all the way down to 1024m max heap and I 
 still get this error.


 -Xavier





Re: Multiple master nodes

2008-08-01 Thread paul
Otis,

The DRBD setup is relatively straightforward now and the documentation is
pretty thorough at http://www.drbd.org/users-guide/.  I only run a two node
setup for the masters, so a one to many replication scheme is outside of my
requirements.  I'm currently running my cluster on CentOS 5 and there are
rpms available for DRBD through the extras repository with the following
packages:

drbd82.x86_64
kmod-drbd82.x86_64


There's nothing Hadoop-specific, other than starting up the right services
in the right order when using heartbeat.  (The secondary server does not run
its namenode processes while it's in standby mode.)  This is no different
from many other apps in an HA scenario, so it's hard to even call this
Hadoop-specific.

As far as being happy with it, yes, so far I am.  I've had enough history of
usage with DRBD over the past four years that I'm pretty comfortable with
its reliability and performance.  I've also done replication of data sets
much larger than the namenode's with negligible performance overhead (after
the initial sync).  Your mileage may vary based on the change rate of your
namenode's data, but for our purposes there is little to no concern.


Here's a few more details on my current configuration...

I do not use a crossover cable between the nodes as you'll often see
recommended by the documentation and howto's.  Instead, since my servers
each have two NICs, I use bonding with LACP and use the bond0 device for
both my regular traffic and my drbd replication.  With this setup, I'd have
to lose two NICs (and two switches on my network) in order to have a
complete network failure and risk any split brain.


My /etc/drbd.conf is pretty simple:

#
# drbd.conf example
#

global { usage-count no; }

resource r0 {

  protocol  C;

  syncer { rate 110M; }

  startup { wfc-timeout 0; degr-wfc-timeout 120; }

  on grid102.domain.prod {
device /dev/drbd0;
disk /dev/sda4;
address 10.6.5.62:7788;
meta-disk internal;
  }

  on grid101.domain.prod {
device /dev/drbd0;
disk /dev/sda4;
address 10.6.5.61:7788;
meta-disk internal;
  }
}

#
# end drbd.conf
#


And a single entry in /etc/fstab:

/dev/drbd0    /hadoop    ext3    defaults,noauto    0 0


Obviously there's more to creating the device and file system, but there are
pretty clear instructions on this through the user guide.  I do most of it
through some scripts that I keep around for building cluster masters and
nodes in my environment, which the following lines come from:


### start script ###

SOURCE_DIR=/mnt/hadoop/dist

mkdir -p /hadoop
echo "/dev/drbd0    /hadoop    ext3    noauto    1 2" >> /etc/fstab

yum -y install drbd82 kmod-drbd82

/bin/cp $SOURCE_DIR/drbd.conf /etc
chkconfig drbd on

yes | drbdadm create-md r0

service drbd start

# run only on primary, manually
# drbdadm -- --overwrite-data-of-peer primary r0

### end script ###

(fdisk of the volume and mkfs need to be added in there at the end)



If you have any more questions on the setup let me know and I'll try to
answer them for you.



-paul



On Fri, Aug 1, 2008 at 10:09 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 I've been wondering about DRBD.  Many (5+?) years ago when I looked at DRBD
 it required too much low-level tinkering and required hardware I did not
 have.  I wonder what it takes to set it up now and if there are any
 Hadoop-specific things you needed to do?  Overall, are you happy with DRBD?
 (you are limited to 2 nodes, right?)


 Thanks,
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: paul [EMAIL PROTECTED]
  To: core-user@hadoop.apache.org
  Sent: Tuesday, July 29, 2008 2:56:44 PM
  Subject: Re: Multiple master nodes
 
  I'm currently running with your option B setup and it seems to be
 reliable
  for me (so far).  I use a combination of drbd and various
 heartbeat/LinuxHA
  scripts that handle the failover process, including a virtual IP for the
  namenode.  I haven't had any real-world unexpected failures to deal with,
  yet, but all manual testing has had consistent and reliable results.
 
 
 
  -paul
 
 
  On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih wrote:
 
   Dear Hadoop Community --
  
   I am wondering if it is already possible or in the plans to add
 capability
   for multiple master nodes. I'm in a situation where I have a master
 node
   that may potentially be in a less than ideal execution and networking
   environment. For this reason, it's possible that the master node could
 die
   at any time. On the other hand, the application must always be
 available. I
   have accessible to me other machines but I'm still unclear on the best
   method to add reliability.
  
   Here are a few options that I'm exploring:
   a) To create a completely secondary Hadoop cluster that we can flip to
 when
   we detect that the master node has died. This will double hardware
 costs,
   so
   if we originally have a 

Re: Running mapred job from remote machine to a pseudo-distributed hadoop

2008-08-01 Thread James Moore
On Fri, Aug 1, 2008 at 7:13 AM, Arv Mistry [EMAIL PROTECTED] wrote:

 I'll try again, can anyone tell me should it be possible to run hadoop
 in a pseudo-distributed mode (i.e. everything on one machine)

That's not quite what pseudo-distributed mode is.  You can run regular
Hadoop jobs on a cluster that consists of one machine; just change the
hostname in your hadoop-site.xml file to the real hostname of your
machine.  If you've got "localhost" in the conf, Hadoop is going to
use LocalJobRunner, and that may be related to your issue.

I may be wrong on this - I haven't spent much time looking at that
code.  Take a look at
./src/java/org/apache/hadoop/mapred/JobClient.java for what gets
kicked off (for 0.17.1 at least).
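
To make that concrete, a hypothetical remote-submission driver might look like the sketch below. The hostname and ports are placeholders and must match the fs.default.name and mapred.job.tracker values in the cluster's hadoop-site.xml (not localhost, and depending on the Hadoop version fs.default.name may or may not carry the hdfs:// scheme); mapper, reducer and input/output settings are omitted.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RemoteSubmit {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RemoteSubmit.class);

    // Point the client at the one-machine cluster by its real hostname.
    conf.set("fs.default.name", "hdfs://hadoop-box.example.com:9000");  // NameNode (placeholder)
    conf.set("mapred.job.tracker", "hadoop-box.example.com:9001");      // JobTracker (placeholder)

    // ... set mapper/reducer classes and input/output paths here ...

    JobClient.runJob(conf);  // submits over RPC to the remote JobTracker
  }
}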

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com


Re: How can I control Number of Mappers of a job?

2008-08-01 Thread James Moore
On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi
[EMAIL PROTECTED] wrote:
 Thank you, finally someone has interests in my questions =)
 My cluster contains more than one machine. Please don't get me wrong :-). I 
 don't want to limit the total mappers in one node (by mapred.map.tasks). What 
 I want is to limit the total mappers for one job. The motivation is that I 
 have 2 jobs to run at the same time. they have the same input data in 
 Hadoop. I found that one job has to wait until the other finishes its 
 mapping. Because the 2 jobs are submitted by 2 different people, I don't want 
 one job to be starving. So I want to limit the first job's total mappers so 
 that the 2 jobs will be launched simultaneously.

What about running two different jobtrackers on the same machines,
looking at the same DFS files?  Never tried it myself, but it might be
an approach.
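
For what it's worth, the client side of that idea is just a different mapred.job.tracker address. Everything below is assumed rather than taken from this thread: the hostnames, port 9011, and the existence of a second JobTracker (with its own mapred.system.dir, mapred.local.dir and TaskTrackers) that shares the same HDFS.

import org.apache.hadoop.mapred.JobConf;

public class SecondTrackerClient {
  // Hypothetical: build a job configuration that targets a second JobTracker
  // which shares the same HDFS as the first one.
  public static JobConf forSecondTracker() {
    JobConf conf = new JobConf(SecondTrackerClient.class);
    conf.set("fs.default.name", "hdfs://master.example.com:9000");  // shared DFS (placeholder)
    conf.set("mapred.job.tracker", "master.example.com:9011");      // second JobTracker (placeholder)
    return conf;
  }
}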

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com


Re: How can I control Number of Mappers of a job?

2008-08-01 Thread paul
I've talked to a few people that claim to have done this as a way to limit
resources for different groups, like developers versus production jobs.
Haven't tried it myself yet, but it's getting close to the top of my to-do
list.


-paul


On Fri, Aug 1, 2008 at 1:36 PM, James Moore [EMAIL PROTECTED] wrote:

 On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi
 [EMAIL PROTECTED] wrote:
  Thank you, finally someone has interests in my questions =)
  My cluster contains more than one machine. Please don't get me wrong :-).
 I don't want to limit the total mappers in one node (by mapred.map.tasks).
 What I want is to limit the total mappers for one job. The motivation is
 that I have 2 jobs to run at the same time. they have the same input data
 in Hadoop. I found that one job has to wait until the other finishes its
 mapping. Because the 2 jobs are submitted by 2 different people, I don't
 want one job to be starving. So I want to limit the first job's total
 mappers so that the 2 jobs will be launched simultaneously.

 What about running two different jobtrackers on the same machines,
 looking at the same DFS files?  Never tried it myself, but it might be
 an approach.

 --
 James Moore | [EMAIL PROTECTED]
 Ruby and Ruby on Rails consulting
 blog.restphone.com



No locks available error

2008-08-01 Thread Shirley Cohen

Hi,

We're getting the following error when starting up hadoop on the  
cluster:


2008-08-01 14:42:37,334 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = node5.cube.disc.cias.utexas.edu/129.116.113.77
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.4
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May  2 00:18:12 UTC 2008
************************************************************/
2008-08-01 14:43:37,572 INFO org.apache.hadoop.dfs.Storage: java.io.IOException: No locks available

at sun.nio.ch.FileChannelImpl.lock0(Native Method)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:103)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:236)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:162)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)

This error appears on every data node during startup.  We are running 
version 0.16.4 of Hadoop, and the Hadoop DFS is NFS-mounted on all the 
nodes in the cluster.


Does anyone know what this error means?

Thanks,

Shirley





Re: No locks available error

2008-08-01 Thread Raghu Angadi


Most likely your NFS is not configured to allow file locks. Please 
enable file locks for your NFS mounts.


Raghu.

Shirley Cohen wrote:

Hi,

We're getting the following error when starting up hadoop on the cluster:

2008-08-01 14:42:37,334 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = node5.cube.disc.cias.utexas.edu/129.116.113.77
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.4
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May  2 00:18:12 UTC 2008
************************************************************/
2008-08-01 14:43:37,572 INFO org.apache.hadoop.dfs.Storage: java.io.IOException: No locks available

at sun.nio.ch.FileChannelImpl.lock0(Native Method)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:103)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:236)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:162)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)

This error appears on every data node during startup.  We are running 
version 0.16.4 of Hadoop, and the Hadoop DFS is NFS-mounted on all the 
nodes in the cluster.


Does anyone know what this error means?

Thanks,

Shirley







Re: Determining number of mappers and number of input splits

2008-08-01 Thread James Moore
On Wed, Jul 30, 2008 at 11:24 PM, Naama Kraus [EMAIL PROTECTED] wrote:
 Hi,

 I am a bit confused of how the framework determines the number of mappers of
 a job and the number of input splits.
 Could anyone summarize ?

Take a look at http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Things start to become a little clearer when you think about
Hadoop-sized datasets.  Usually you care about tuning
the number of simultaneous tasks running on a single machine (one per
core?  one per hard drive? one per whatever?), and the total number
is just "many".
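
The wiki page boils down to a small amount of arithmetic in FileInputFormat. As a rough sketch (paraphrased from the 0.16/0.17-era code, so treat the details as approximate): the mapred.map.tasks hint turns into a goal size of totalInputBytes / requestedMaps, and each file is then chopped into splits whose size follows the rule sketched below.

// Sketch of the old FileInputFormat split-size rule; names are paraphrased.
public class SplitSizeSketch {
  static long computeSplitSize(long goalSize, long minSplitSize, long blockSize) {
    // goalSize     = total input bytes / requested number of maps (mapred.map.tasks)
    // minSplitSize = mapred.min.split.size (defaults to 1)
    // blockSize    = the file's DFS block size (dfs.block.size)
    return Math.max(minSplitSize, Math.min(goalSize, blockSize));
  }
}

With default settings and a reasonably large input, the split size collapses to the DFS block size, which is why the number of maps usually works out to about one per block.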

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com


Re: How can I control Number of Mappers of a job?

2008-08-01 Thread Jason Venner
We control the number of map tasks by carefully managing the input split 
size when we need to.
This may require using the MultipleFileInput classes or aggregating your 
input files beforehand.
You need some aggregation, either by concatenation or via the 
MultipleFileInput classes, if you have more input files than you want map tasks.


The case of 1 mapper per input file requires setting the input split size 
to Long.MAX_VALUE (see the datajoin classes for examples)
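
A minimal sketch of that last trick with the old mapred API (the class name is a placeholder, and the property spelling follows the 0.16/0.17-era FileInputFormat, so treat it as an assumption): pushing the minimum split size above any file size stops the framework from dividing files, so each input file becomes exactly one map task. Long.MAX_VALUE is the constant the Long.MAX_SIZE reference above is pointing at.

import org.apache.hadoop.mapred.JobConf;

public class OneMapPerFile {
  public static JobConf configure() {
    JobConf conf = new JobConf(OneMapPerFile.class);
    // With the minimum split size this large, no file is ever split,
    // so each input file becomes one split and therefore one map task.
    conf.set("mapred.min.split.size", String.valueOf(Long.MAX_VALUE));
    return conf;
  }
}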




paul wrote:

I've talked to a few people that claim to have done this as a way to limit
resources for different groups, like developers versus production jobs.
Haven't tried it myself yet, but it's getting close to the top of my to-do
list.


-paul


On Fri, Aug 1, 2008 at 1:36 PM, James Moore [EMAIL PROTECTED] wrote:

  

On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi
[EMAIL PROTECTED] wrote:


Thank you, finally someone has interests in my questions =)
My cluster contains more than one machine. Please don't get me wrong :-).
  

I don't want to limit the total mappers in one node (by mapred.map.tasks).
What I want is to limit the total mappers for one job. The motivation is
that I have 2 jobs to run at the same time. they have the same input data
in Hadoop. I found that one job has to wait until the other finishes its
mapping. Because the 2 jobs are submitted by 2 different people, I don't
want one job to be starving. So I want to limit the first job's total
mappers so that the 2 jobs will be launched simultaneously.

What about running two different jobtrackers on the same machines,
looking at the same DFS files?  Never tried it myself, but it might be
an approach.

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com




  


--
Jason Venner
Attributor - Program the Web http://www.attributor.com/
Attributor is hiring Hadoop Wranglers and coding wizards, contact if 
interested


Can a data node failure prevents from writing into HDFS?

2008-08-01 Thread steph

Hi,

We are running hadoop (0.16.3) on our production system and are  
probably still lacking experience.



In a nutshell, our system is composed of 14 nodes (each with 8 cores and
4 * 500 GB disks) to accommodate both storage and map/reduce jobs. We have
a component writing into HDFS from outside that Hadoop cluster but within
the same data center.

At some point this component was not able to write correctly and was
getting a lot of IOExceptions:


1216986726631 SEVERE [LogRangeManager$LogRangeConsumer.consumeFromLogRange Failed to writeNextElement for LogRange 6 ts = 121698669000 java.io.IOException: Could not get block locations. Aborting...

1216986726633 SEVERE [LogRangeManager$LogRangeConsumer.consumeFromLogRange Failed to writeNextElement for LogRange 6 ts = 121698669000 java.io.IOException: Could not get block locations. Aborting...

1216986726636 SEVERE [LogRangeManager$LogRangeConsumer.consumeFromLogRange Failed to writeNextElement for LogRange 6 ts = 121698669000 java.io.IOException: Could not get block locations. Aborting...




After checking the cluster we found that one datanode was down
due to a hardware issue.


Is it possible that one datanode being down prevents writing into HDFS?


I don't know if this is relevant or not, but looking into the logs,
I also saw these traces, from 47 hours earlier:

1216816310525 WARNING [LogRange.close Failed to close the channel for range 12168159 java.io.IOException: All datanodes 10.0.1.173:50010 are bad.

(That's all there was, and at the time I looked at the cluster only one
datanode was down.)



Thanks,

S.