Re: Appending and seeking files while writing

2010-06-13 Thread Vidur Goyal
Append is supported in Hadoop 0.20.


 Hi.

 I think this really depends on the append functionality. Any idea whether
 it supports such behaviour now?

 Regards.

 On Fri, Jun 11, 2010 at 10:41 AM, hadooprocks hadoopro...@gmail.com
 wrote:

 Stas,

 I also believe that there should be a seek interface on the write path,
 so that the FS API is complete. FSDataInputStream already supports
 seek(), so FSDataOutputStream should too. For file systems that do not
 support seek on the write path, seek can be a no-op.

 Could you open a JIRA to track this? I am willing to provide the patch
 if you do not have the time to do so.

 thanks
 hadooprocks
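For reference, the asymmetry hadooprocks describes, as a minimal sketch against the 0.20-era FileSystem API (the path and class name are illustrative, not from the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;

    public class SeekDemo {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/demo.txt");    // illustrative path

        FSDataOutputStream out = fs.create(p, true);
        out.writeBytes("0123456789\n");
        // out.seek(5L);   // write path: no such method today; the
        //                 // proposal above would add it, as a no-op for
        //                 // filesystems that cannot support it
        out.close();

        FSDataInputStream in = fs.open(p);
        in.seek(5L);       // read path: seek() is already supported
        in.close();
      }
    }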


  On Thu, Jun 10, 2010 at 5:05 AM, Stas Oskin stas.os...@gmail.com
 wrote:

  Hi.
 
  Was the append functionality finally added to 0.20.1 version?
 
  Also, is the ability to seek within a file being written, and write
  data at another position, also supported?
 
  Thanks in advance!
 





Hadoop 0.20.2 looking *inside* a file in the input path for files?

2010-06-13 Thread suckerfish

Hello, I am a newbie to Hadoop, following the
http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html WordCount
tutorial but trying to update it to use the mapreduce classes instead of
mapred.

However I am getting the following error: 
10/06/13 18:24:50 INFO mapred.JobClient: Task Id :
attempt_201006131625_0023_m_00_0, Status : FAILED
java.io.FileNotFoundException: 123 123 123 (No such file or directory)

123 123 123 is the first line of a file in my specified input path. Anyone
have any idea what is going on? I do not run into this problem when I use
the deprecated mapred classes.

Thanks, 

Yipeng 



Re: Hadoop 0.20.2 looking *inside* a file in the input path for files?

2010-06-13 Thread Ted Yu
See https://issues.apache.org/jira/browse/MAPREDUCE-1734

BTW, it's easier for other people to reproduce your scenario if you post
your code.
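For reference, a bare-bones driver for the new API looks roughly like this (a sketch only; WordCountMapper/WordCountReducer stand in for the usual tutorial classes and are not shown):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Use the mapreduce.lib.input/output classes here, NOT the old
        // mapred.FileInputFormat -- mixing the two packages when porting
        // the tutorial is a common source of strange failures.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }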

On Sun, Jun 13, 2010 at 3:35 AM, suckerfish yip...@gmail.com wrote:


 Hello, I am a newbie to Hadoop, following the
 http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html WordCount
 tutorial but trying to update it to use the mapreduce classes instead of
 mapred.

 However I am getting the following error:
 10/06/13 18:24:50 INFO mapred.JobClient: Task Id :
 attempt_201006131625_0023_m_00_0, Status : FAILED
 java.io.FileNotFoundException: 123 123 123 (No such file or directory)

 123 123 123 is the first line of a file in my specified input path.
 Anyone have any idea what is going on? I do not run into this problem
 when I use the deprecated mapred classes.

 Thanks,

 Yipeng




Re: Appending and seeking files while writing

2010-06-13 Thread Todd Lipcon
On Sun, Jun 13, 2010 at 12:46 AM, Vidur Goyal vi...@students.iiit.ac.in wrote:

 Append is supported in Hadoop 0.20.


Append will be supported in the 0.20-append branch, which is still in
progress. It is NOT supported in vanilla 0.20. You can turn on the config
option but it is dangerous and highly discouraged for real use.

Append will be supported fully in 0.21.

Also, append does *not* add random write. It simply adds the ability to
re-open a file and add more data to the end.
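Concretely, a minimal sketch of what append allows (assuming an append-capable build such as 0.20-append or 0.21; the path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendDemo {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Re-open an existing file for writing at its end. NOT safe on
        // vanilla 0.20 even with the config option turned on.
        FSDataOutputStream out = fs.append(new Path("/logs/events.log"));
        out.writeBytes("one more record\n");  // always lands at the end
        out.close();
        // There is no out.seek(...): existing bytes cannot be rewritten.
      }
    }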

-Todd



  Hi.
 
  I think this really depends on the append functionality. Any idea
  whether it supports such behaviour now?
 
  Regards.
 
  On Fri, Jun 11, 2010 at 10:41 AM, hadooprocks hadoopro...@gmail.com
  wrote:
 
  Stas,
 
  I also believe that there should be a seek interface on the write path,
  so that the FS API is complete. FSDataInputStream already supports
  seek(), so FSDataOutputStream should too. For file systems that do not
  support seek on the write path, seek can be a no-op.
 
  Could you open a JIRA to track this? I am willing to provide the patch
  if you do not have the time to do so.
 
  thanks
  hadooprocks
 
 
   On Thu, Jun 10, 2010 at 5:05 AM, Stas Oskin stas.os...@gmail.com
  wrote:
 
   Hi.
  
   Was the append functionality finally added to 0.20.1 version?
  
   Also, is the ability to seek within a file being written, and write
   data at another position, also supported?
  
   Thanks in advance!
  
 
 




-- 
Todd Lipcon
Software Engineer, Cloudera


problem setting up development environment for hadoop

2010-06-13 Thread Vidur Goyal
Hello All,

I have been trying to set up a development environment for HDFS using this
link http://wiki.apache.org/hadoop/EclipseEnvironment , but the project
gives errors after the build completes. It does not contain certain files.
Please help!

vidur




Is it possible to sort values before they are sent to the reduce function?

2010-06-13 Thread Kevin Tse
Hi,
For each key there might be millions of values (LongWritable), but I only
want to emit the top 20 of these values, sorted in descending order.
So is it possible to sort these values before they enter the reduce phase?

Thank you in advance!
Kevin


Re: Is it possible to sort values before they are sent to the reduce function?

2010-06-13 Thread Alex Kozlov
Hi Kevin, This is a very common technique.  Look for secondary sort in Tom
White's HTDG (Chapter 6).  You'll most likely have to write your own
Partitioner and WritableComparator.  -- Alex K
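For reference, a compact sketch of the pattern with illustrative class names (not the book's): pack the value into a composite key, partition and group on the natural key only, and let the shuffle deliver each key's values largest-first so the reducer can stop after 20.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class SecondarySort {
      // Composite key = (natural key, value); sort order is natural key
      // ascending, then value DESCENDING.
      public static class CompositeKey
          implements WritableComparable<CompositeKey> {
        Text naturalKey = new Text();
        LongWritable value = new LongWritable();
        public Text getNaturalKey() { return naturalKey; }
        public void write(DataOutput out) throws IOException {
          naturalKey.write(out); value.write(out);
        }
        public void readFields(DataInput in) throws IOException {
          naturalKey.readFields(in); value.readFields(in);
        }
        public int compareTo(CompositeKey o) {
          int c = naturalKey.compareTo(o.naturalKey);
          return c != 0 ? c : -value.compareTo(o.value);  // descending
        }
      }

      public static class NaturalKeyPartitioner
          extends Partitioner<CompositeKey, NullWritable> {
        @Override
        public int getPartition(CompositeKey key, NullWritable v, int n) {
          // Partition on the natural key only, so every value for a
          // given key reaches the same reducer.
          return (key.getNaturalKey().hashCode() & Integer.MAX_VALUE) % n;
        }
      }

      public static class NaturalKeyGroupingComparator
          extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
          super(CompositeKey.class, true);
        }
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
          // Group on the natural key only: one reduce() call then sees
          // all values for that key, already sorted descending.
          return ((CompositeKey) a).getNaturalKey()
              .compareTo(((CompositeKey) b).getNaturalKey());
        }
      }
      // Wire up with job.setPartitionerClass(...) and
      // job.setGroupingComparatorClass(...); the reducer just emits the
      // first 20 values it sees for each key and returns.
    }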

On Sun, Jun 13, 2010 at 7:16 PM, Kevin Tse kevintse.on...@gmail.com wrote:

 Hi,
 For each key there might be millions of values (LongWritable), but I only
 want to emit the top 20 of these values, sorted in descending order.
 So is it possible to sort these values before they enter the reduce phase?

 Thank you in advance!
 Kevin



Re: Problems with HOD and HDFS

2010-06-13 Thread David Milne
Anybody? I am completely stuck here. I have no idea who else I can ask
or where I can go for more information. Is there somewhere specific
where I should be asking about HOD?

Thank you,
Dave

On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com wrote:
 Hi there,

 I am trying to get Hadoop on Demand up and running, but am having
 problems with the ringmaster not being able to communicate with HDFS.

 The output from the hod allocate command ends with this, with full verbosity:

 [2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
 'hdfs' service address.
 [2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
 34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
 [2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
 [2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
 [2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
 cluster /home/dmilne/hadoop/cluster
 [2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7


 I've attached the hodrc file below, but briefly HOD is supposed to
 provision an HDFS cluster as well as a Map/Reduce cluster, and seems
 to be failing to do so. The ringmaster log looks like this:

 [2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
 [2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
 service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
 [2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
 addr hdfs: not found
 [2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
 [2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
 service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
 [2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
 addr hdfs: not found

 ... and so on, until it gives up

 Any ideas why? One red flag is that when running the allocate command,
 some of the variables echoed back look dodgy:

 --gridservice-hdfs.fs_port 0
 --gridservice-hdfs.host localhost
 --gridservice-hdfs.info_port 0

 These are not what I specified in the hodrc. Are the port numbers just
 set to 0 because I am not using an external HDFS, or is this a
 problem?


 The software versions involved are:
  - Hadoop 0.20.2
  - Python 2.5.2 (no Twisted)
  - Java 1.6.0_20
  - Torque 2.4.5


 The hodrc file looks like this:

 [hod]
 stream                          = True
 java-home                       = /opt/jdk1.6.0_20
 cluster                         = debian5
 cluster-factor                  = 1.8
 xrs-port-range                  = 32768-65536
 debug                           = 3
 allocate-wait-time              = 3600
 temp-dir                        = /scratch/local/dmilne/hod

 [ringmaster]
 register                        = True
 stream                          = False
 temp-dir                        = /scratch/local/dmilne/hod
 log-dir                         = /scratch/local/dmilne/hod/log
 http-port-range                 = 8000-9000
 idleness-limit                  = 864000
 work-dirs                       =
 /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
 xrs-port-range                  = 32768-65536
 debug                           = 4

 [hodring]
 stream                          = False
 temp-dir                        = /scratch/local/dmilne/hod
 log-dir                         = /scratch/local/dmilne/hod/log
 register                        = True
 java-home                       = /opt/jdk1.6.0_20
 http-port-range                 = 8000-9000
 xrs-port-range                  = 32768-65536
 debug                           = 4

 [resource_manager]
 queue                           = express
 batch-home                      = /opt/torque-2.4.5
 id                              = torque
 options                         = l:pmem=3812M,W:X=NACCESSPOLICY:SINGLEJOB
 #env-vars                       =
 HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python

 [gridservice-mapred]
 external                        = False
 pkgs                            = /opt/hadoop-0.20.2
 tracker_port                    = 8030
 info_port                       = 50080

 [gridservice-hdfs]
 external                        = False
 pkgs                            = /opt/hadoop-0.20.2
 fs_port                         = 8020
 info_port                       = 50070

 Cheers,
 Dave



Re: Problems with HOD and HDFS

2010-06-13 Thread Jeff Hammerbacher
Hey Dave,

I can't speak for the folks at Yahoo!, but from watching the JIRA, I don't
think HOD is actively used or developed anywhere these days. You're
attempting to use a mostly deprecated project, and hence not receiving any
support on the mailing list.

Thanks,
Jeff

On Sun, Jun 13, 2010 at 7:33 PM, David Milne d.n.mi...@gmail.com wrote:

 Anybody? I am completely stuck here. I have no idea who else I can ask
 or where I can go for more information. Is there somewhere specific
 where I should be asking about HOD?

 Thank you,
 Dave

 On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com wrote:
  Hi there,
 
  I am trying to get Hadoop on Demand up and running, but am having
  problems with the ringmaster not being able to communicate with HDFS.
 
  The output from the hod allocate command ends with this, with full
 verbosity:
 
  [2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
  'hdfs' service address.
  [2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
  34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
  [2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
  [2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
  [2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
  cluster /home/dmilne/hadoop/cluster
  [2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7
 
 
  I've attached the hodrc file below, but briefly HOD is supposed to
  provision an HDFS cluster as well as a Map/Reduce cluster, and seems
  to be failing to do so. The ringmaster log looks like this:
 
  [2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name:
 hdfs
  [2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
  service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
  [2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
  addr hdfs: not found
  [2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name:
 hdfs
  [2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
  service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
  [2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
  addr hdfs: not found
 
  ... and so on, until it gives up
 
  Any ideas why? One red flag is that when running the allocate command,
  some of the variables echoed back look dodgy:
 
  --gridservice-hdfs.fs_port 0
  --gridservice-hdfs.host localhost
  --gridservice-hdfs.info_port 0
 
  These are not what I specified in the hodrc. Are the port numbers just
  set to 0 because I am not using an external HDFS, or is this a
  problem?
 
 
  The software versions involved are:
   - Hadoop 0.20.2
   - Python 2.5.2 (no Twisted)
   - Java 1.6.0_20
   - Torque 2.4.5
 
 
  The hodrc file looks like this:
 
  [hod]
  stream  = True
  java-home   = /opt/jdk1.6.0_20
  cluster = debian5
  cluster-factor  = 1.8
  xrs-port-range  = 32768-65536
  debug   = 3
  allocate-wait-time  = 3600
  temp-dir= /scratch/local/dmilne/hod
 
  [ringmaster]
  register= True
  stream  = False
  temp-dir= /scratch/local/dmilne/hod
  log-dir = /scratch/local/dmilne/hod/log
  http-port-range = 8000-9000
  idleness-limit  = 864000
  work-dirs   =
  /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
  xrs-port-range  = 32768-65536
  debug   = 4
 
  [hodring]
  stream  = False
  temp-dir= /scratch/local/dmilne/hod
  log-dir = /scratch/local/dmilne/hod/log
  register= True
  java-home   = /opt/jdk1.6.0_20
  http-port-range = 8000-9000
  xrs-port-range  = 32768-65536
  debug   = 4
 
  [resource_manager]
  queue   = express
  batch-home  = /opt/torque-2.4.5
  id  = torque
  options =
 l:pmem=3812M,W:X=NACCESSPOLICY:SINGLEJOB
  #env-vars   =
  HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python
 
  [gridservice-mapred]
  external= False
  pkgs= /opt/hadoop-0.20.2
  tracker_port= 8030
  info_port   = 50080
 
  [gridservice-hdfs]
  external= False
  pkgs= /opt/hadoop-0.20.2
  fs_port = 8020
  info_port   = 50070
 
  Cheers,
  Dave
 



Re: Is it possible to sort values before they are sent to the reduce function?

2010-06-13 Thread Kevin Tse
Hi Alex,
I was reading Tom's book, but I have not reached chapter 6 yet. I just
read it, and it is really helpful.
Thank you for mentioning it, and thanks also go to Tom.

Kevin

On Mon, Jun 14, 2010 at 10:22 AM, Alex Kozlov ale...@cloudera.com wrote:

 Hi Kevin, This is a very common technique.  Look for secondary sort in Tom
 White's HTDG (Chapter 6).  You'll most likely have to write your own
 Partitioner and WritableComparator.  -- Alex K

 On Sun, Jun 13, 2010 at 7:16 PM, Kevin Tse kevintse.on...@gmail.com
 wrote:

  Hi,
  For each key there might be millions of values (LongWritable), but I
  only want to emit the top 20 of these values, sorted in descending
  order.
  So is it possible to sort these values before they enter the reduce
  phase?
 
  Thank you in advance!
  Kevin
 



Re: Caching in HDFS C API Client

2010-06-13 Thread Arun C Murthy
I'd bet on the Linux file cache. Assuming you wrote the file with the
default replication factor of 3, there is one replica on the local
filesystem, which is what you are reading...


Try writing multiple GBs of data and randomly reading large files to
blow out your file cache?
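For instance, a sketch of such a benchmark: time a sequential read of a file much larger than RAM, so the page cache cannot serve it (Java API used for brevity; the client in question uses libhdfs, where the rough equivalents are hdfsOpenFile()/hdfsRead()):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadBench {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path(args[0]));
        byte[] buf = new byte[1 << 20];            // 1 MB read buffer
        long bytes = 0, t0 = System.nanoTime();
        for (int n; (n = in.read(buf)) > 0; ) bytes += n;
        in.close();
        double secs = (System.nanoTime() - t0) / 1e9;
        System.out.printf("%d bytes in %.1f s = %.1f MB/s%n",
                          bytes, secs, bytes / secs / 1e6);
      }
    }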


Arun

On Jun 11, 2010, at 10:05 AM, Patrick Donnelly wrote:


Hi List,

I need to explain a higher-than-expected throughput (bandwidth) for an
HDFS C API client. Specifically, the client is getting bandwidth
higher than its link rate :). The client first writes a 512 MB
file, then reads the entire file back. The read is what's
getting the higher-than-link-rate bandwidth. I assume this is a
consequence of caching? Is this done by HDFS or by Linux?

Thanks for any help,

--
- Patrick Donnelly




Re: Problems with HOD and HDFS

2010-06-13 Thread David Milne
Ok, thanks Jeff.

This is pretty surprising though. I would have thought many people
would be in my position, where they have to use Hadoop on a
general-purpose cluster and need it to play nice with a resource
manager. What do other people do in this position, if they don't use
HOD? Deprecated normally means there is a better alternative.

- Dave

On Mon, Jun 14, 2010 at 2:39 PM, Jeff Hammerbacher ham...@cloudera.com wrote:
 Hey Dave,

 I can't speak for the folks at Yahoo!, but from watching the JIRA, I don't
 think HOD is actively used or developed anywhere these days. You're
 attempting to use a mostly deprecated project, and hence not receiving any
 support on the mailing list.

 Thanks,
 Jeff

 On Sun, Jun 13, 2010 at 7:33 PM, David Milne d.n.mi...@gmail.com wrote:

 Anybody? I am completely stuck here. I have no idea who else I can ask
 or where I can go for more information. Is there somewhere specific
 where I should be asking about HOD?

 Thank you,
 Dave

 On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com wrote:
  Hi there,
 
  I am trying to get Hadoop on Demand up and running, but am having
  problems with the ringmaster not being able to communicate with HDFS.
 
  The output from the hod allocate command ends with this, with full
 verbosity:
 
  [2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
  'hdfs' service address.
  [2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
  34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
  [2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
  [2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
  [2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
  cluster /home/dmilne/hadoop/cluster
  [2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7
 
 
  I've attached the hodrc file below, but briefly HOD is supposed to
  provision an HDFS cluster as well as a Map/Reduce cluster, and seems
  to be failing to do so. The ringmaster log looks like this:
 
  [2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name:
 hdfs
  [2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
  service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
  [2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
  addr hdfs: not found
  [2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name:
 hdfs
  [2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
  service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
  [2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
  addr hdfs: not found
 
  ... and so on, until it gives up
 
  Any ideas why? One red flag is that when running the allocate command,
  some of the variables echoed back look dodgy:
 
  --gridservice-hdfs.fs_port 0
  --gridservice-hdfs.host localhost
  --gridservice-hdfs.info_port 0
 
  These are not what I specified in the hodrc. Are the port numbers just
  set to 0 because I am not using an external HDFS, or is this a
  problem?
 
 
  The software versions involved are:
   - Hadoop 0.20.2
   - Python 2.5.2 (no Twisted)
   - Java 1.6.0_20
   - Torque 2.4.5
 
 
  The hodrc file looks like this:
 
  [hod]
  stream                          = True
  java-home                       = /opt/jdk1.6.0_20
  cluster                         = debian5
  cluster-factor                  = 1.8
  xrs-port-range                  = 32768-65536
  debug                           = 3
  allocate-wait-time              = 3600
  temp-dir                        = /scratch/local/dmilne/hod
 
  [ringmaster]
  register                        = True
  stream                          = False
  temp-dir                        = /scratch/local/dmilne/hod
  log-dir                         = /scratch/local/dmilne/hod/log
  http-port-range                 = 8000-9000
  idleness-limit                  = 864000
  work-dirs                       =
  /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
  xrs-port-range                  = 32768-65536
  debug                           = 4
 
  [hodring]
  stream                          = False
  temp-dir                        = /scratch/local/dmilne/hod
  log-dir                         = /scratch/local/dmilne/hod/log
  register                        = True
  java-home                       = /opt/jdk1.6.0_20
  http-port-range                 = 8000-9000
  xrs-port-range                  = 32768-65536
  debug                           = 4
 
  [resource_manager]
  queue                           = express
  batch-home                      = /opt/torque-2.4.5
  id                              = torque
  options                         =
 l:pmem=3812M,W:X=NACCESSPOLICY:SINGLEJOB
  #env-vars                       =
  HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python
 
  [gridservice-mapred]
  external                        = False
  pkgs                            = /opt/hadoop-0.20.2
  tracker_port          

Re: Problems with HOD and HDFS

2010-06-13 Thread Vinod KV

On Monday 14 June 2010 08:03 AM, David Milne wrote:

Anybody? I am completely stuck here. I have no idea who else I can ask
or where I can go for more information. Is there somewhere specific
where I should be asking about HOD?

Thank you,
Dave
   


In the ringmaster logs, you should see which node was supposed to run
the Namenode. This can be found above the logs that you've printed. I
can barely remember, but I guess it reads something like getCommand().
Once you find out the node, check the hodring logs there; something
must have gone wrong.


The return code was 7, indicating HDFS failure. See
http://hadoop.apache.org/common/docs/r0.20.0/hod_user_guide.html#The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque,
and check whether you are hitting one of the problems listed there.


HTH,
+vinod



On Thu, Jun 10, 2010 at 2:56 PM, David Milne d.n.mi...@gmail.com wrote:
   

Hi there,

I am trying to get Hadoop on Demand up and running, but am having
problems with the ringmaster not being able to communicate with HDFS.

The output from the hod allocate command ends with this, with full verbosity:

[2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
'hdfs' service address.
[2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
[2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
[2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
[2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
cluster /home/dmilne/hadoop/cluster
[2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7


I've attached the hodrc file below, but briefly HOD is supposed to
provision an HDFS cluster as well as a Map/Reduce cluster, and seems
to be failing to do so. The ringmaster log looks like this:

[2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
[2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found
[2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
service: hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8
[2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found

... and so on, until it gives up

Any ideas why? One red flag is that when running the allocate command,
some of the variables echoed back look dodgy:

--gridservice-hdfs.fs_port 0
--gridservice-hdfs.host localhost
--gridservice-hdfs.info_port 0

These are not what I specified in the hodrc. Are the port numbers just
set to 0 because I am not using an external HDFS, or is this a
problem?


The software versions involved are:
  - Hadoop 0.20.2
  - Python 2.5.2 (no Twisted)
  - Java 1.6.0_20
  - Torque 2.4.5


The hodrc file looks like this:

[hod]
stream  = True
java-home   = /opt/jdk1.6.0_20
cluster = debian5
cluster-factor  = 1.8
xrs-port-range  = 32768-65536
debug   = 3
allocate-wait-time  = 3600
temp-dir= /scratch/local/dmilne/hod

[ringmaster]
register= True
stream  = False
temp-dir= /scratch/local/dmilne/hod
log-dir = /scratch/local/dmilne/hod/log
http-port-range = 8000-9000
idleness-limit  = 864000
work-dirs   =
/scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
xrs-port-range  = 32768-65536
debug   = 4

[hodring]
stream  = False
temp-dir= /scratch/local/dmilne/hod
log-dir = /scratch/local/dmilne/hod/log
register= True
java-home   = /opt/jdk1.6.0_20
http-port-range = 8000-9000
xrs-port-range  = 32768-65536
debug   = 4

[resource_manager]
queue   = express
batch-home  = /opt/torque-2.4.5
id  = torque
options = l:pmem=3812M,W:X=NACCESSPOLICY:SINGLEJOB
#env-vars   =
HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python

[gridservice-mapred]
external= False
pkgs= /opt/hadoop-0.20.2
tracker_port= 8030
info_port   = 50080

[gridservice-hdfs]
external= False
pkgs= /opt/hadoop-0.20.2
fs_port = 8020
info_port   = 50070

Cheers,
Dave