Will your jobs be running night and day, or just over a specified period? Depending on your setup, and on what you mean by "scale back" (CPU vs disk IO vs memory), you could potentially restart your cluster with different settings at different times of the day via cron. A restart will kill any running jobs, so it'll only work if you can find or create a few free minutes. But then:

* you could scale back on CPU by running with HADOOP_NICENESS nonzero (see conf/hadoop-env.sh),
* you could scale back on memory by setting the various process memory limits low in conf/hadoop-site.xml, and
* you could scale back the per-node task load almost entirely by setting the maximum number of mappers and reducers to 1 per node during the day (also in conf/hadoop-site.xml).
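A minimal sketch of the cron-driven switch, assuming a 0.x-style layout (the HADOOP_HOME path, the conf.day/conf.night directory names, and the switch-conf.sh script name are all hypothetical, and the day/night hours are just an example -- adjust for your install):

```shell
#!/bin/sh
# Sketch: pick a Hadoop conf directory by hour of day, then restart the
# cluster with it. conf.day would hold the throttled hadoop-site.xml
# (e.g. mapred.tasktracker.map.tasks.maximum = 1) and a hadoop-env.sh
# with HADOOP_NICENESS set; conf.night holds the full-throttle settings.

HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}   # hypothetical install path

# Emit the conf dir for a given hour (0-23): throttled 08:00-19:59,
# full throttle otherwise.
choose_conf() {
    hour=$1
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]; then
        echo "$HADOOP_HOME/conf.day"
    else
        echo "$HADOOP_HOME/conf.night"
    fi
}

# Restarting kills running jobs, so only do this in a known-free window:
#   "$HADOOP_HOME/bin/stop-all.sh"
#   "$HADOOP_HOME/bin/start-all.sh" --config "$(choose_conf "$(date +%H)")"
#
# Example crontab entries (times are illustrative):
#   0 8  * * *  /opt/hadoop/bin/switch-conf.sh   # throttle for the workday
#   0 20 * * *  /opt/hadoop/bin/switch-conf.sh   # full speed overnight
```

The stop/start lines are commented out because they're disruptive; the hour-based selection is the only part that runs as written.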
Kevin

On Tue, May 19, 2009 at 7:23 AM, Steve Loughran <ste...@apache.org> wrote:
> John Clarke wrote:
>> Hi,
>>
>> I am working on a project that is suited to Hadoop and so want to create a
>> small cluster (only 5 machines!) on our servers. The servers are however
>> used during the day and (mostly) idle at night.
>>
>> So, I want Hadoop to run at full throttle at night and either scale back or
>> suspend itself during certain times.
>
> You could add/remove new task trackers on idle systems, but
> * you don't want to take away datanodes, as there's a risk that data will
>   become unavailable.
> * there's nothing in the scheduler to warn that machines will go away at a
>   certain time
> If you only want to run the cluster at night, I'd just configure the entire
> cluster to go up and down