Anybody? I am completely stuck here. I have no idea who else I can ask or where I can go for more information. Is there somewhere specific where I should be asking about HOD?
Thank you,
Dave

On Thu, Jun 10, 2010 at 2:56 PM, David Milne <d.n.mi...@gmail.com> wrote:
> Hi there,
>
> I am trying to get Hadoop on Demand up and running, but am having
> problems with the ringmaster not being able to communicate with HDFS.
>
> The output from the hod allocate command ends with this, with full verbosity:
>
> [2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve 'hdfs' service address.
> [2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id 34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
> [2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
> [2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
> [2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate cluster /home/dmilne/hadoop/cluster
> [2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7
>
> I've attached the hodrc file below, but briefly HOD is supposed to
> provision an HDFS cluster as well as a Map/Reduce cluster, and seems
> to be failing to do so. The ringmaster log looks like this:
>
> [2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
> [2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr service: <hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8>
> [2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr addr hdfs: not found
> [2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
> [2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr service: <hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8>
> [2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr addr hdfs: not found
>
> ... and so on, until it gives up
>
> Any ideas why?
> One red flag is that when running the allocate command,
> some of the variables echo-ed back look dodgy:
>
> --gridservice-hdfs.fs_port 0
> --gridservice-hdfs.host localhost
> --gridservice-hdfs.info_port 0
>
> These are not what I specified in the hodrc. Are the port numbers just
> set to 0 because I am not using an external HDFS, or is this a
> problem?
>
> The software versions involved are:
> - Hadoop 0.20.2
> - Python 2.5.2 (no Twisted)
> - Java 1.6.0_20
> - Torque 2.4.5
>
> The hodrc file looks like this:
>
> [hod]
> stream = True
> java-home = /opt/jdk1.6.0_20
> cluster = debian5
> cluster-factor = 1.8
> xrs-port-range = 32768-65536
> debug = 3
> allocate-wait-time = 3600
> temp-dir = /scratch/local/dmilne/hod
>
> [ringmaster]
> register = True
> stream = False
> temp-dir = /scratch/local/dmilne/hod
> log-dir = /scratch/local/dmilne/hod/log
> http-port-range = 8000-9000
> idleness-limit = 864000
> work-dirs = /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
> xrs-port-range = 32768-65536
> debug = 4
>
> [hodring]
> stream = False
> temp-dir = /scratch/local/dmilne/hod
> log-dir = /scratch/local/dmilne/hod/log
> register = True
> java-home = /opt/jdk1.6.0_20
> http-port-range = 8000-9000
> xrs-port-range = 32768-65536
> debug = 4
>
> [resource_manager]
> queue = express
> batch-home = /opt/torque-2.4.5
> id = torque
> options = l:pmem=3812M,W:X="NACCESSPOLICY:SINGLEJOB"
> #env-vars = HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python
>
> [gridservice-mapred]
> external = False
> pkgs = /opt/hadoop-0.20.2
> tracker_port = 8030
> info_port = 50080
>
> [gridservice-hdfs]
> external = False
> pkgs = /opt/hadoop-0.20.2
> fs_port = 8020
> info_port = 50070
>
> Cheers,
> Dave
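
To pin down exactly which echoed values differ from the hodrc, a small comparison script may help. The sketch below is not part of HOD: it is written for Python 3 (newer than the 2.5.2 listed above), the helper names `parse_echoed` and `find_mismatches` are made up for illustration, and it assumes the echoed options always take the `--section.key value` form shown in the message. It only reports differences; it does not say whether a port of 0 is expected when `external = False`.

```python
import configparser

# Section copied from the hodrc quoted above.
HODRC_SNIPPET = """
[gridservice-hdfs]
external = False
pkgs = /opt/hadoop-0.20.2
fs_port = 8020
info_port = 50070
"""

# Options echoed back by the allocate command, as quoted above.
ECHOED = """
--gridservice-hdfs.fs_port 0
--gridservice-hdfs.host localhost
--gridservice-hdfs.info_port 0
"""


def parse_echoed(text):
    """Turn '--section.key value' lines into {(section, key): value}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue
        name, _, value = line[2:].partition(" ")
        section, _, key = name.partition(".")
        result[(section, key)] = value.strip()
    return result


def find_mismatches(hodrc_text, echoed_text):
    """List (section, key, configured, echoed) where the two values differ."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hodrc_text)
    mismatches = []
    for (section, key), echoed in sorted(parse_echoed(echoed_text).items()):
        # Keys that are echoed but absent from the hodrc (e.g. host) are skipped.
        if parser.has_option(section, key):
            configured = parser.get(section, key)
            if configured != echoed:
                mismatches.append((section, key, configured, echoed))
    return mismatches


if __name__ == "__main__":
    for section, key, configured, echoed in find_mismatches(HODRC_SNIPPET, ECHOED):
        print(f"{section}.{key}: hodrc={configured!r} echoed={echoed!r}")
```

Run against the full hodrc and the complete echoed option list, this would show whether the mismatch is limited to the two HDFS ports or affects other sections as well.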