On 23/09/11 16:09, GOEKE, MATTHEW (AG/1000) wrote:
If you are starting from scratch with no prior Hadoop install experience, I 
would configure standalone, migrate to pseudo-distributed, and then to fully 
distributed, verifying functionality at each step with a simple word count 
run. Also, if you don't mind using the CDH distribution, then SCM / their RPMs 
will greatly simplify both the binary installs and the user creation.
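
For that verification step, the examples jar that ships with each release 
already includes a word count (something like "hadoop jar 
hadoop-*examples*.jar wordcount <in> <out>"), but if you'd rather have a job 
of your own to recompile and rerun at each stage, a minimal driver against 
the 0.20-era "new" API looks roughly like this -- the class name and argument 
handling are just illustrative:

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Minimal word count for smoke-testing each cluster mode
  // (standalone, pseudo-distributed, fully distributed).
  public class WordCount {

    public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);   // emit (word, 1) per token
        }
      }
    }

    public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values,
          Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();           // total occurrences of this word
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "word count");  // 0.20-era constructor
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
      FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Run it on the same small input at each stage; if the counts agree across all 
three modes, the installation step is probably sound.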

Your VM route will most likely work, but I imagine the number of hiccups 
during the migration from that to the real cluster will make it not worth your time.

Matt

-----Original Message-----
From: Merto Mertek [mailto:masmer...@gmail.com]
Sent: Friday, September 23, 2011 10:00 AM
To: common-user@hadoop.apache.org
Subject: Environment consideration for a research on scheduling

Hi,
in the first phase we are planning to set up a small cluster of a few
commodity computers (each with 1GB of RAM, a 200GB disk, ..). The cluster
would run Ubuntu Server 10.10 and a Hadoop build from the 0.20.204 branch
(I had some issues with version 0.20.203 and missing
libraries<http://hadoop-common.472056.n3.nabble.com/Development-enviroment-problems-eclipse-hadoop-0-20-203-td3186022.html#a3188567>).
Would you suggest any other version?

I wouldn't rush to put Ubuntu 10.x on; it makes a good desktop, but RHEL and CentOS are the platforms of choice on the server side.



In the second phase we are planning to analyse, test, and modify some of
the Hadoop schedulers.

The main schedulers used by Y! and FB are heavily tuned for their workloads, and apparently not something you'd want to play with. There is at least one other scheduler in the contrib/ dir to play with.
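
For orientation if you do go down that path: in the 0.20 line the pluggable 
point is the abstract TaskScheduler class in org.apache.hadoop.mapred, and 
the JobTracker picks an implementation from the mapred.jobtracker.taskScheduler 
property; the fair and capacity schedulers both extend it. Very roughly, a 
skeleton looks like the following -- note it follows the 0.20.2-era signatures 
(assignTasks changed across the 0.20.20x security releases, so check your own 
checkout), and the class has to sit in the org.apache.hadoop.mapred package 
because the types involved are package-private:

  package org.apache.hadoop.mapred;

  import java.io.IOException;
  import java.util.Collection;
  import java.util.Collections;
  import java.util.List;

  // Skeleton of a pluggable scheduler for the 0.20 branch. Signatures follow
  // the 0.20.2-era API and may differ in 0.20.20x; verify against your source
  // tree. Selected in mapred-site.xml via:
  //   mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.MyTaskScheduler
  public class MyTaskScheduler extends TaskScheduler {

    @Override
    public void start() throws IOException {
      // Register a JobInProgressListener with taskTrackerManager here
      // to learn about jobs being added, updated, and removed.
    }

    @Override
    public List<Task> assignTasks(TaskTrackerStatus tracker)
        throws IOException {
      // The policy decision: given this tracker's free slots, pick which
      // jobs' map/reduce tasks to launch on this heartbeat. Returning an
      // empty list assigns nothing.
      return Collections.emptyList();
    }

    @Override
    public Collection<JobInProgress> getJobs(String queueName) {
      // Jobs visible in the given queue; a single-queue policy can just
      // return everything it tracks.
      return Collections.emptyList();
    }
  }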

The other thing about scheduling is that you may have a faster development cycle if, instead of working on a real cluster, you simulate one at multiples of real time, using stats collected from your own workload by way of the gridmix2 tools. I've never done scheduling work, but I think there's some stuff there to do that. If not, it's a possible contribution.
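
The shape of that faster-than-real-time idea is just a discrete-event loop 
over a virtual clock: feed in task durations taken from your job history / 
gridmix2 stats and jump the clock from one event to the next, so hours of 
trace replay in seconds. A toy, Hadoop-free sketch of the loop (every name 
in it is made up for illustration; durations would come from a real trace):

  import java.util.ArrayDeque;
  import java.util.PriorityQueue;
  import java.util.Queue;

  // Toy discrete-event simulation of slot scheduling on a virtual clock.
  public class SchedulerSim {

    // A pending task with a trace-derived duration in milliseconds.
    static class SimTask {
      final String jobId;
      final long durationMs;
      SimTask(String jobId, long durationMs) {
        this.jobId = jobId;
        this.durationMs = durationMs;
      }
    }

    public static void main(String[] args) {
      Queue<SimTask> pending = new ArrayDeque<SimTask>();
      pending.add(new SimTask("job_1", 120000));
      pending.add(new SimTask("job_1", 90000));
      pending.add(new SimTask("job_2", 300000));

      // Completion times (virtual ms) of tasks occupying slots right now.
      PriorityQueue<Long> running = new PriorityQueue<Long>();
      final int slots = 2; // cluster capacity under test
      long clock = 0;      // virtual time; jumps from event to event

      while (!pending.isEmpty() || !running.isEmpty()) {
        // The policy under study goes here; this sketch is plain FIFO.
        while (running.size() < slots && !pending.isEmpty()) {
          SimTask t = pending.poll();
          running.add(clock + t.durationMs);
          System.out.printf("t=%dms launch task of %s%n", clock, t.jobId);
        }
        // Jump the clock straight to the next completion: this is why a
        // simulated cluster runs at many multiples of real time.
        clock = running.poll();
        System.out.printf("t=%dms slot freed%n", clock);
      }
      System.out.printf("virtual makespan: %dms%n", clock);
    }
  }

The interesting part to swap out is the FIFO policy in the inner loop; 
replaying a real trace through the policy you want to study gives you 
comparative numbers long before you touch a cluster. If I remember right, 
there was also work on a MapReduce simulator, Mumak, in contrib around this 
time -- worth a search of the JIRA before building your own.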

Be aware that the changes in 0.23+ will change resource scheduling; that may be a better place to do development, with a plan to deploy in 2012. Oh, and get on the mapreduce lists, especially the -dev list, to discuss issues.


The information contained in this email may be subject to the export control 
laws and regulations of the United States, potentially
including but not limited to the Export Administration Regulations (EAR) and 
sanctions regulations issued by the U.S. Department of
Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this 
information you are obligated to comply with all
applicable U.S. export laws and regulations.


I have no idea what that means, but I am not convinced that reading an email forces me to comply with a different country's rules.
