Re: Regarding Capacity Scheduler

2009-05-14 Thread Hemanth Yamijala

Manish,

The pre-emption code in the capacity scheduler was found to require a thorough relook, and due to the inherent complexity of the problem it is likely to have issues of the type you have noticed. We have decided to rework the pre-emption code from scratch, and to this effect have removed it from the 0.20 branch to start afresh.


Thanks
Hemanth

Manish Katyal wrote:

I'm experimenting with the Capacity Scheduler (0.19.0) in a multi-cluster environment. I noticed that, unlike the mappers, the reducers are not pre-empted?

I have two queues (high and low) that are each running big jobs (70+ maps each). The scheduler splits the mappers as per the queues' guaranteed capacity (5/8ths for the high queue and the rest for the low queue). However, the reduce jobs are not interleaved -- the reduce job in the high queue is blocked waiting for the reduce job in the low queue to complete.

Is this a bug or by design?

*Low queue:*
Guaranteed Capacity (%) : 37.5
Guaranteed Capacity Maps : 3
Guaranteed Capacity Reduces : *3*
User Limit : 100
Reclaim Time limit : 300
Number of Running Maps : 3
Number of Running Reduces : *7*
Number of Waiting Maps : 131
Number of Waiting Reduces : 0
Priority Supported : NO

*High queue:*
Guaranteed Capacity (%) : 62.5
Guaranteed Capacity Maps : 5
Guaranteed Capacity Reduces : 5
User Limit : 100
Reclaim Time limit : 300
Number of Running Maps : 4
Number of Running Reduces : *0*
Number of Waiting Maps : 68
Number of Waiting Reduces : *7*
Priority Supported : NO
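For reference, a minimal sketch of how two such queues might be configured for the 0.19 Capacity Scheduler; the property names below follow that release's capacity-scheduler documentation and the values mirror the capacities above, but they should be checked against your installation:

In hadoop-site.xml (declare the queues):

<property>
  <name>mapred.queue.names</name>
  <value>high,low</value>
</property>

In conf/capacity-scheduler.xml (per-queue settings):

<property>
  <name>mapred.capacity-scheduler.queue.high.guaranteed-capacity</name>
  <value>62.5</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.low.guaranteed-capacity</name>
  <value>37.5</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.low.reclaim-time-limit</name>
  <value>300</value>
</property>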

  




Re: HOD questions

2008-12-18 Thread Hemanth Yamijala

Craig,
While HOD does not do this automatically, please note that since you are bringing up a Map/Reduce cluster on the allocated nodes, you can submit map/reduce parameters with which to bring up the cluster when allocating jobs. The relevant option is --gridservice-mapred.server-params (or -M in shorthand). Please refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop 
for details.
I was aware of this, but the issue is that unless you obtain dedicated nodes (as above), this option is not suitable, as it isn't set on a per-node basis. I think it would be fairly straightforward to add to HOD, as I detailed in my initial email, so that it "does the correct thing" out of the box.


True, I did assume you obtained dedicated nodes. It has been simpler to operate HOD in this manner, and if I understand correctly, it would help to solve your requirement as well.
According to hadoop-default.xml, the number of maps is "Typically set to a prime several times greater than number of available hosts." Say that we relax this recommendation to read "Typically set to a NUMBER several times greater than number of available hosts" -- then shouldn't it be straightforward for HOD to set it automatically?



Actually, AFAIK, the number of maps for a job is determined more or less 
exclusively by the M/R framework based on the number of splits. I've 
seen messages on this list before about how the documentation for this 
configuration item is misleading. So, this might actually not make a 
difference at all, whatever is specified.


Thanks
Hemanth


Re: HOD questions

2008-12-17 Thread Hemanth Yamijala

Craig,

Hello,

We have two HOD questions:

(1) For our current Torque PBS setup, the number of nodes requested by HOD (-l nodes=X) corresponds to the number of CPUs allocated; however, these CPUs can be spread across various partially-used or empty nodes. Unfortunately, HOD does not appear to honour the number of processors actually allocated by Torque PBS to that job.


Just FYI, at Yahoo! we've set Torque to allocate separate nodes for the number specified to HOD. In other words, the number corresponds to the number of nodes, not processors. This has proved simpler to manage. I forget the details right now, but I think you can configure Torque to behave like this (i.e. to not treat processors as individual nodes).

For example, a current running HOD session can be viewed in qstat as:
104544.trmaster  user parallel HOD   4178 8  ----  288:0 R 
01:48

  node29/2+node29/1+node29/0+node17/2+node17/1+node18/2+node18/1
  +node19/1

However, on inspection of the Jobtracker UI, it tells us that node19 has "Max Map Tasks" and "Max Reduce Tasks" both set to 2, when I think that node19 should only be allowed one map task.


While HOD does not do this automatically, please note that since you are bringing up a Map/Reduce cluster on the allocated nodes, you can submit map/reduce parameters with which to bring up the cluster when allocating jobs. The relevant option is --gridservice-mapred.server-params (or -M in shorthand). Please refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop 
for details.
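For illustration, a hedged example of passing such a parameter at allocation time (the cluster directory is hypothetical, and note that an -M value applies to every node in the allocated cluster rather than per node; see the user guide above for how to pass multiple parameters):

hod allocate -d ~/hod-clusters/test -n 8 \
    -Mmapred.tasktracker.map.tasks.maximum=2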


I believe that for each node, HOD should determine (using the 
information in the $PBS_NODEFILE), how many CPUs for each node are 
allocated to the HOD job, and then set 
mapred.tasktracker.map.tasks.maximum appropriately on each node.
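As a rough sketch of what is being proposed here (not existing HOD behaviour), the per-node slot count can be derived from the Torque node file, which lists one line per allocated processor:

# count how many processors Torque granted on each host; each count would
# become that host's mapred.tasktracker.map.tasks.maximum
sort "$PBS_NODEFILE" | uniq -c | while read slots host; do
    echo "$host mapred.tasktracker.map.tasks.maximum=$slots"
done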


(2) In our InputFormat, we use numSplits to tell us how many map tasks the job's files should be split into. However, HOD does not override the mapred.map.tasks property (nor mapred.reduce.tasks), whereas they should be set depending on the number of available tasktrackers and/or nodes in the HOD session.


Can this not be submitted via the Hadoop job's configuration ? Again, HOD cannot do this automatically currently. But you could use hod.client-params to set up a client-side hadoop-site.xml that would work like this for all jobs submitted to the cluster.
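For example (a hedged sketch; the exact command-line syntax for hod.client-params should be checked against the HOD user guide for your release, and the cluster directory and values are hypothetical), the map/reduce hints could be written into the client-side hadoop-site.xml at allocation time:

hod allocate -d ~/hod-clusters/test -n 8 \
    --hod.client-params=mapred.map.tasks=53,mapred.reduce.tasks=16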


Hope this helps some.

Thanks
Hemanth


Re: Integrate HADOOP and Map/Reduce paradigm into HPC environment

2008-09-05 Thread Hemanth Yamijala

Hemanth Yamijala wrote:

Filippo Spiga wrote:

This procedure allows me to
- use a persistent HDFS on the whole cluster, placing the namenode on the frontend (always up and running) and datanodes on the other nodes
- submit a lot of jobs to the resource manager transparently without any problem, and manage job priority/reservation with MAUI as simply as other classical HPC jobs
- execute jobtracker and tasktracker services on the nodes chosen by TORQUE (in particular, the first node selected becomes the jobtracker)
- store logs for different users into separate directories
- run only one job at a time (but probably multiple map/reduce jobs can run together, because different jobs use different subsets of nodes)

Probably HOD does what I can do with my raw script... it's possible that I don't understand the user guide well...

  
Filippo, HOD indeed allows you to do all these things, and a little bit more. On the other hand, your script always executes the jobtracker on the first node, which also seems useful to me. It would be nice if you could still try HOD and see if it makes your life simpler in any way. :-)


Some things that HOD does automatically:
- Set up log directories differently for different users
- Port numbers need not be fixed; HOD detects free ports and provisions the services to use them
- Depending on need, you can also use a custom tarball of hadoop to deploy, rather than use a pre-installed version.

Also, since HOD is only a thin wrapper around the resource manager, all policies that you can set up for Maui can automatically apply for HOD-run clusters.
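On the tarball point, an allocation looks roughly like this (a hedged sketch: the paths are hypothetical and the -t shorthand is as described in the HOD user guide):

hod allocate -d ~/hod-clusters/test -n 8 \
    -t /shared/tarballs/hadoop-0.18.0.tar.gz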


Sorry for my english :-P

Regards

2008/9/2 Hemanth Yamijala <[EMAIL PROTECTED]>

 

Allen Wittenauer wrote:

   

On 8/18/08 11:33 AM, "Filippo Spiga" <[EMAIL PROTECTED]> wrote:


 

Well, but I haven't understood how I should configure HOD to work in this manner.

For HDFS I follow this sequence of steps:
- conf/master contains only the master node of my cluster
- conf/slaves contains all nodes
- I start HDFS using bin/start-dfs.sh




   Right, fine...



 

Potentially I would like to allow all nodes to be used for MapReduce.
For HOD, which parameter should I set in contrib/hod/conf/hodrc? Should I change only the gridservice-hdfs section?



I was hoping the HOD folks would answer this question for you, but they are apparently sleeping. :)



  

Woops ! Sorry, I missed this.

   
Anyway, yes, if you point gridservice-hdfs to a static HDFS, it should use that as the -default- HDFS. That doesn't prevent a user from using HOD to create a custom HDFS as part of their job submission.



  

Allen's answer is perfect. Please refer to
http://hadoop.apache.org/core/docs/current/hod_user_guide.html#Using+an+external+HDFS
for more information about how to set up the gridservice-hdfs section to use a static or external HDFS.







  






Re: Integrate HADOOP and Map/Reduce paradigm into HPC environment

2008-09-01 Thread Hemanth Yamijala

Allen Wittenauer wrote:


On 8/18/08 11:33 AM, "Filippo Spiga" <[EMAIL PROTECTED]> wrote:
  

Well, but I haven't understood how I should configure HOD to work in this manner.

For HDFS I follow this sequence of steps:
- conf/master contains only the master node of my cluster
- conf/slaves contains all nodes
- I start HDFS using bin/start-dfs.sh



Right, fine...

  

Potentially I would like to allow all nodes to be used for MapReduce.
For HOD, which parameter should I set in contrib/hod/conf/hodrc? Should I change only the gridservice-hdfs section?



I was hoping the HOD folks would answer this question for you, but they
are apparently sleeping. :)

  

Woops ! Sorry, I missed this.

Anyway, yes, if you point gridservice-hdfs to a static HDFS,  it should
use that as the -default- HDFS. That doesn't prevent a user from using HOD
to create a custom HDFS as part of their job submission.

  
Allen's answer is perfect. Please refer to
http://hadoop.apache.org/core/docs/current/hod_user_guide.html#Using+an+external+HDFS
for more information about how to set up the gridservice-hdfs section to use a static or external HDFS.
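For convenience, a sketch of what that section of the hodrc might look like when pointing at a static HDFS; the option names are those described in the HOD configuration documentation, and the host/ports are placeholders:

[gridservice-hdfs]
external  = True
host      = <namenode hostname>
fs_port   = <namenode port>
info_port = <namenode web UI port>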




Re: HoD and locality of TaskTrackers to data (on DataNodes)

2008-03-23 Thread Hemanth Yamijala

Jiaqi,

Hi,

I have a question about using HoD and the locality of the assigned
TaskTrackers to the data.

Suppose I have a long-running HDFS installation with
TaskTrackers/JobTracker nodes dynamically allocated by HoD, and I
uploaded my data to HDFS prior to running my job/allocating nodes
using "dfs -put". Then, I allocate some nodes and run my job on that
data using HoD. Would the nodes allocated by HoD take into account the
HDFS nodes on which my data resides (e.g. by looking at which
DataNodes hold blocks that belong to the current user)? If the nodes
are just arbitrarily allocated, doesn't that break Hadoop's design
principle of having processing take place near the data?

And if HoD doesn't currently take block location into account when
allocating nodes, are there future plans for that to be incorporated?

  
Excellent point ! HOD does not currently take this into account. We are working on ways in which we can accomplish this using configuration outside HOD (i.e. in Torque / some Hadoop features in 0.17 like HADOOP-1985). I will update this list (and possibly also the documentation) on how this can be set up, after we have some more concrete results.


Thanks
Hemanth


Re: HOD question wrt virtual mapred master node

2008-03-10 Thread Hemanth Yamijala

Jason Venner wrote:
We would like to have the master node be the submit node. Our 
processing nodes tend to be rather beefy machines and we don't want to 
have to use one for a HOD master.


We run a shared DFS, so we are only building virtual mapred nodes via 
hod.


Does anyone have any suggestions on how to do that via HOD?



What do you mean by master node - the node that runs the RingMaster process in HOD ? If yes, then that node is not exclusively reserved for running the RingMaster process. Either a JobTracker or a TaskTracker is going to be brought up on it.


Thanks
Hemanth


Re: [HOD] Collecting MapReduce logs

2008-03-10 Thread Hemanth Yamijala

Luca,

Luca wrote:

Hello everyone,
I wonder what is the meaning of hodring.log-destination-uri versus 
hodring.log-dir. I'd like to collect MapReduce UI logs after a job has 
been run and the only attribute seems to be hod.hadoop-ui-log-dir, in 
the hod section.


log-destination-uri is a config option for uploading hadoop logs after the cluster is deallocated. log-dir is used to store logs generated by the HOD processes themselves. If you want the MapReduce UI logs, hadoop-ui-log-dir is what you want, as you rightly noted.
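Putting the three options side by side (a sketch; the URI and local paths are placeholders, and log-dir can likewise be set in the [ringmaster] section):

[hodring]
# hadoop logs are uploaded here when the cluster is deallocated
log-destination-uri = hdfs://<namenode host>:<port>/user/hod/logs
# the HOD processes write their own logs here
log-dir = /var/log/hod

[hod]
# the MapReduce UI / job history pages are grabbed into this directory
hadoop-ui-log-dir = /home/<user>/hod-ui-logs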


With that attribute specified, logs are all grabbed in that directory, 
producing a large amount of html files. Is there a way to collect 
them, maybe as a .tar.gz, in a place somewhere related to the user?


Sorry, no, we don't have that option yet. In fact going forward, Hadoop 
might solve this problem on its own. HADOOP-2178 seems to be related to 
this, but I haven't looked at it too closely.


Additionally, how do administrators specify variables in these values? 
Which interpreter interprets them? For instance, variables specified 
in a bash fashion like $USER in section hodring or ringmaster work 
well (I guess they are interpreted by bash itself) but if specified in 
the hod section they're not: I tried with

[hod]
hadoop-ui-log-dir=/somedir/$USER
but any hod command fails displaying an error on that line.

We are definitely planning to build this capability, as part of the work 
we will be doing for HADOOP-2849.

Cheers,
Luca





Re: [HOD] Example script

2008-03-05 Thread Hemanth Yamijala

Luca,


#!/bin/bash
hadoop --config /home/luca/hod-test jar \
    /mnt/scratch/hadoop/hadoop-0.16.0-examples.jar wordcount \
    file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out

Can you try removing the --config from this script ? While running 
scripts, HOD automatically allocates a directory and sets 
HADOOP_CONF_DIR to that. So, you don't need to mention that.
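In other words, the script could be trimmed to something like this (a sketch reusing the paths from the original script):

#!/bin/bash
# HOD exports HADOOP_CONF_DIR for script runs, so --config is not needed
hadoop jar /mnt/scratch/hadoop/hadoop-0.16.0-examples.jar wordcount \
    file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out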


That said, I have a very important note: we are changing HOD's user interface a bit based on feedback from some of our users. Before you write too many scripts, it might be worthwhile to wait until Hadoop 0.16.1 is out, which will have a more user-friendly HOD interface. Please look at H-2861 for all details.


Thanks
hemanth



Re: More HOD: is there anyway to get HOD to copy back all of the log files to the submit node?

2008-02-27 Thread Hemanth Yamijala

Jason Venner wrote:
I have found that HOD writes a series of log files to directories on 
the virtual cluster master, if you specify log directories.
The interesting part is figuring out which machine was the virtual 
cluster master, if you have a decent sized pool of machines.


Can you explain what you mean by 'virtual cluster master' ? I guess you could mean the node which ran the 'ringmaster' process - the master process in a hod virtual cluster - or the hadoop 'jobtracker'. Both pieces of information are stored by hod for an allocated cluster. You could retrieve them by using

hod -o list
and
hod -o "info <cluster-dir>"

If the cluster is deallocated, you can still get the ringmaster node by using torque commands:

qstat (to get torque job ids)
and
qstat -n <job id>

This will print a list of nodes; the first one is the ringmaster.


It would be so nice if HOD could copy this back to the submission node.

I suppose we could configure syslog up but then we have to configure 
the syslogd on each of the submit hosts to accept from any compute 
node...


Syslog configuration has a bug which will be addressed in Hadoop 0.16.1.

Thanks
hemanth



Re: HOD question wrt to the virtual cluster log files - where do they end up when the job ends

2008-02-27 Thread Hemanth Yamijala

Jason Venner wrote:
As you have all read from my previous emails, we are still pretty low 
on the HOD learning curve.


That is understandable. It is new software, so we will improve over time with feedback from our users, like you :-)
We are having jobs that terminate and the virtual mapred cluster is 
terminated also. We want access to the log files and job history from 
the virtual cluster.
You've mentioned that you have a persistent DFS. So, you could do the 
following:


Set up the following variable in your hodrc:

[hodring]
log-destination-uri = hdfs://<namenode host>:<port>/user/hod/logs


If you are enabling permissions, you may need to make sure this 
directory is writable by all users who launch hod jobs. When a job is 
completed, hadoop logs will be uploaded to this folder in dfs as zipped 
files. You can access them from there. Hod logs are stored on the local 
file system under the log-dir configured in the [ringmaster] and 
[hodring] sections.
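For example, the upload directory can be created and opened up once by an administrator (a sketch; the path matches the example above, and the chmod is only needed if permissions are enabled):

hadoop dfs -mkdir /user/hod/logs
hadoop dfs -chmod 777 /user/hod/logs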




Our other active research project is why are the virtual clusters 
getting killed.
You mean you allocate, allocation succeeds, and after a while the 
cluster is no longer active ? If yes, please check the following:

- What's the default wallclock time set up in your torque master ? For example:
$ qmgr -c "p q batch"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution

set queue batch resources_default.walltime = 1:00:00


If this time limit is low, you could be hitting it, and the torque server is killing the cluster. You can increase the wallclock time when you allocate a cluster. Use --hod.walltime <value> in the allocate command line:

hod --hod.walltime 3600 -o "allocate .."

(You could also raise the queue default on the torque server; see the qmgr sketch after this list.)

- Another possibility for auto-deallocation is if your cluster is not running any hadoop jobs for a long time. To free up nodes, HOD deallocates such idle clusters automatically.
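On the wallclock point above: if the queue default turns out to be the limit you are hitting, it can be raised once on the torque server, roughly like this (a sketch; queue name and value are examples):

qmgr -c "set queue batch resources_default.walltime = 24:00:00"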




Anyway, how do we get access to the log files and job history from a 
terminated virtual cluster.



History logs also should be uploaded to DFS as explained above.


Note: we run a persistent DFS, and allocate virtual mapred clusters.

Thanks




Re: [HOD] hdfs:///mapredsystem directory

2008-02-27 Thread Hemanth Yamijala

Mahadev Konar wrote:

Hi Luca,
  Can you do an ls -l on /mapredsystem and send the output? According to the permissions model for mapreduce, the system directories created by the jobtracker should be world writable, so permissions should have worked as-is for hod.

  
No, it doesn't appear to be working that way. The mapred system directory specified by HOD for a jobtracker is /mapredsystem/<hostname>. When the job tracker starts, it creates this directory with permissions rwx-wx-wx. It does not clean it up when it stops. But when it is started again on the same node, I think (may be wrong here) it tries to clean up the directory (possibly to handle crashes etc ?). This seems equivalent to issuing a "hadoop dfs -rmr /mapredsystem/<hostname>", which in turn behaves like rm -r in unix. So, it basically has to read the directory to recursively delete it; and because there are no read permissions, it fails.


A simple experiment to simulate the same set of DFS operations using 2 different users, without using mapred, gives the same result. Enabling read permissions allows the delete to work.
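The experiment amounts to something like the following (a sketch; the hostname is a placeholder and the two command groups are run as two different users):

# as the first user: create the directory the way the jobtracker does
hadoop dfs -mkdir /mapredsystem/<hostname>
hadoop dfs -chmod 733 /mapredsystem/<hostname>   # i.e. rwx-wx-wx

# as the second user: the recursive delete fails for lack of read permission
hadoop dfs -rmr /mapredsystem/<hostname>

# after the first user grants read (e.g. chmod 777), the delete succeeds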


There appear several ways we can fix this, and that discussion should go 
on HADOOP-2899. Will post my comments there.


Thanks
hemanth


Re: [HOD] Port for MapReduce UI interface

2008-02-26 Thread Hemanth Yamijala

Luca wrote:


[hod]
xrs-port-range  = 1-11000
http-port-range = 1-11000


the port for the Mapred UI is chosen outside this range.


There's no port range option for Mapred and HDFS sections currently. You 
seem to have a use-case for specifying the range within which HOD should 
try and bind. Please file a JIRA for this.


Thanks
Hemanth


Re: More HOD questions 0.16.0 - debug log enclosed - help with how to debug

2008-02-26 Thread Hemanth Yamijala

Jason Venner wrote:

My hadoop jobs don't start. This is configured to use an existing DFS and to unpack a tarball with a cut-down 0.16.0 config. I have looked in the mom logs on the client machines and am not getting anything meaningful.


What is your hod command line ? Specifically, how did you provide the tarball option ?

Can you attach the log of the hod command, like you did the hodrc ? There are some lines in the output that don't seem complete.

Set your debug option in the [ringmaster] section to 4, and rerun hod (see the snippet below). Under the log-dir specified in the [ringmaster] section you will be able to see a log file corresponding to your jobid. Can you attach that too ? The ringmaster node is the first one allocated by torque for the job, that is, the mother superior for the job.

How is your tarball built ? Can you check that there's no hadoop-env.sh with pre-filled values in it. Look at HADOOP-2860.
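That is, something like the following in the hodrc (the log directory is just a placeholder):

[ringmaster]
debug   = 4
log-dir = /var/log/hod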


Thanks
Hemanth