On Monday 14 June 2010 09:51 AM, David Milne wrote:
Ok, thanks Jeff.

This is pretty surprising though. I would have thought many people
would be in my position, where they have to use Hadoop on a general
purpose cluster, and need it to play nice with a resource manager?
What do other people do in this position, if they don't use HOD?
Deprecated normally means there is a better alternative.

- Dave


It isn't formally deprecated though. Maybe we need to deprecate it explicitly; that would help us put up proper documentation about what to use instead.

The short answer is that you start a static cluster on a set of nodes. A static cluster means bringing up the Hadoop daemons on a set of nodes using the startup scripts distributed in the bin/ directory.
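
As a rough illustration of the static-cluster approach, here is a minimal Python sketch that turns a node list into a master host and the contents of conf/slaves. It assumes the node list comes from the resource manager (under Torque, typically the file named by $PBS_NODEFILE, with one line per allocated slot); the function names are hypothetical and nothing here is part of Hadoop or HOD. It does not edit fs.default.name or mapred.job.tracker, which would still have to point at the chosen master.

```python
# Hypothetical helper names; a sketch, not part of Hadoop or HOD.

# Startup scripts shipped in the bin/ directory of Hadoop 0.20.x,
# to be run on the master host once conf/slaves is in place.
START_SCRIPTS = ["bin/start-dfs.sh", "bin/start-mapred.sh"]


def plan_static_cluster(node_lines):
    """Pick a master and slaves from hostnames, one per line.

    Duplicate hostnames (one line per allocated slot) are collapsed,
    keeping first-seen order. The first host becomes the master; the
    rest become slaves. With a single host, it plays both roles.
    """
    hosts = []
    for line in node_lines:
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    if not hosts:
        raise ValueError("node list is empty")
    master = hosts[0]
    slaves = hosts[1:] if len(hosts) > 1 else [master]
    return master, slaves


def slaves_file_text(slaves):
    """Render the text for conf/slaves: one hostname per line."""
    return "\n".join(slaves) + "\n"
```

The output of slaves_file_text would be written to conf/slaves on the master, after which the scripts in START_SCRIPTS are run from the Hadoop install directory on that host.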

That said, there are no changes in HOD in 0.21 and beyond. Deploying 0.21 clusters should mostly work out of the box. Beyond 0.21 it may not work, because HOD generates Hadoop-specific configuration parameters and environment variables itself, and it would need to be updated for the ones that have been removed or changed.

HTH,
+vinod

On Mon, Jun 14, 2010 at 2:39 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:
Hey Dave,

I can't speak for the folks at Yahoo!, but from watching the JIRA, I don't
think HOD is actively used or developed anywhere these days. You're
attempting to use a mostly deprecated project, and hence not receiving any
support on the mailing list.

Thanks,
Jeff

On Sun, Jun 13, 2010 at 7:33 PM, David Milne <d.n.mi...@gmail.com> wrote:

Anybody? I am completely stuck here. I have no idea who else I can ask
or where I can go for more information. Is there somewhere specific
where I should be asking about HOD?

Thank you,
Dave

On Thu, Jun 10, 2010 at 2:56 PM, David Milne <d.n.mi...@gmail.com> wrote:
Hi there,

I am trying to get Hadoop on Demand up and running, but am having
problems with the ringmaster not being able to communicate with HDFS.

The output from the hod allocate command ends with this, with full
verbosity:
[2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve 'hdfs' service address.
[2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id 34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
[2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
[2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
[2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate cluster /home/dmilne/hadoop/cluster
[2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7


I've attached the hodrc file below, but briefly HOD is supposed to
provision an HDFS cluster as well as a Map/Reduce cluster, and seems
to be failing to do so. The ringmaster log looks like this:

[2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr service:<hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8>
[2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr addr hdfs: not found
[2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr service:<hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8>
[2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr addr hdfs: not found

... and so on, until it gives up

Any ideas why? One red flag is that when running the allocate command,
some of the options echoed back look dodgy:

--gridservice-hdfs.fs_port 0
--gridservice-hdfs.host localhost
--gridservice-hdfs.info_port 0

These are not what I specified in the hodrc. Are the port numbers just
set to 0 because I am not using an external HDFS, or is this a
problem?


The software versions involved are:
  - Hadoop 0.20.2
  - Python 2.5.2 (no Twisted)
  - Java 1.6.0_20
  - Torque 2.4.5


The hodrc file looks like this:

[hod]
stream                          = True
java-home                       = /opt/jdk1.6.0_20
cluster                         = debian5
cluster-factor                  = 1.8
xrs-port-range                  = 32768-65536
debug                           = 3
allocate-wait-time              = 3600
temp-dir                        = /scratch/local/dmilne/hod

[ringmaster]
register                        = True
stream                          = False
temp-dir                        = /scratch/local/dmilne/hod
log-dir                         = /scratch/local/dmilne/hod/log
http-port-range                 = 8000-9000
idleness-limit                  = 864000
work-dirs                       = /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
xrs-port-range                  = 32768-65536
debug                           = 4

[hodring]
stream                          = False
temp-dir                        = /scratch/local/dmilne/hod
log-dir                         = /scratch/local/dmilne/hod/log
register                        = True
java-home                       = /opt/jdk1.6.0_20
http-port-range                 = 8000-9000
xrs-port-range                  = 32768-65536
debug                           = 4

[resource_manager]
queue                           = express
batch-home                      = /opt/torque-2.4.5
id                              = torque
options                         = l:pmem=3812M,W:X="NACCESSPOLICY:SINGLEJOB"
#env-vars                       = HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python

[gridservice-mapred]
external                        = False
pkgs                            = /opt/hadoop-0.20.2
tracker_port                    = 8030
info_port                       = 50080

[gridservice-hdfs]
external                        = False
pkgs                            = /opt/hadoop-0.20.2
fs_port                         = 8020
info_port                       = 50070
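
One way to make the mismatch between the hodrc above and the echoed options concrete is to compare them mechanically. The following is a minimal Python sketch under stated assumptions: it uses Python 3's standard configparser (not the Python 2.5 setup listed above), embeds only the [gridservice-hdfs] section and the three echoed lines quoted in this message, and uses hypothetical function names. It only reports differences; it does not say whether a port of 0 is expected when HDFS is not external.

```python
import configparser

# Excerpt of the [gridservice-hdfs] section from the hodrc in this thread.
HODRC_EXCERPT = """
[gridservice-hdfs]
external = False
pkgs = /opt/hadoop-0.20.2
fs_port = 8020
info_port = 50070
"""

# Options echoed back by the allocate command, as quoted in this thread.
ECHOED = """
--gridservice-hdfs.fs_port 0
--gridservice-hdfs.host localhost
--gridservice-hdfs.info_port 0
"""


def parse_echoed(text):
    """Turn '--section.option value' lines into {(section, option): value}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue
        name, _, value = line[2:].partition(" ")
        section, _, option = name.partition(".")
        result[(section, option)] = value.strip()
    return result


def find_mismatches(hodrc_text, echoed_text):
    """List (section, option, configured, echoed) where the two differ.

    'configured' is None when the hodrc does not set the option at all.
    """
    parser = configparser.ConfigParser()
    parser.read_string(hodrc_text)
    mismatches = []
    for (section, option), echoed in sorted(parse_echoed(echoed_text).items()):
        configured = None
        if parser.has_option(section, option):
            configured = parser.get(section, option)
        if configured != echoed:
            mismatches.append((section, option, configured, echoed))
    return mismatches


for row in find_mismatches(HODRC_EXCERPT, ECHOED):
    print("%s.%s: hodrc=%r echoed=%r" % row)
```

For the excerpts above this reports all three echoed options: fs_port and info_port differ from the configured values, and host is not set in the hodrc at all.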

Cheers,
Dave

