Hi Stefan,
Yes, 'nice' alone cannot resolve this problem.
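For reference, the niceness was applied via the standard hook in
conf/hadoop-env.sh (a sketch; assumes the stock 0.19/0.20 scripts, where
bin/hadoop-daemon.sh starts each daemon through 'nice -n $HADOOP_NICENESS'):

    # conf/hadoop-env.sh -- scheduling priority for Hadoop daemons
    export HADOOP_NICENESS=10

Child task JVMs inherit this priority from the TaskTracker that forks them.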

Now, each node in my cluster has 8GB of RAM. My Java heap configuration is:

HDFS DataNode: 1GB
HBase RegionServer: 1.5GB
MR TaskTracker: 1GB
MR child: 512MB each (at most 6 concurrent child tasks: 4 map + 2 reduce)

But the memory usage is still tight.
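Adding it up, that is 1 + 1.5 + 1 + 6 x 0.5 = 6.5GB of heap alone; with the
JVMs' non-heap overhead (thread stacks, permgen, direct buffers) the resident
footprint gets close to the full 8GB, leaving little for the OS page cache.
For reference, the settings above live roughly here (a sketch; the env hooks
and property names are the standard ones from the 0.19/0.20 configuration
templates, with values mirroring the list above):

    # conf/hadoop-env.sh -- per-daemon heap caps
    export HADOOP_DATANODE_OPTS="-Xmx1024m $HADOOP_DATANODE_OPTS"
    export HADOOP_TASKTRACKER_OPTS="-Xmx1024m $HADOOP_TASKTRACKER_OPTS"

    # conf/hbase-env.sh -- RegionServer heap, in MB
    export HBASE_HEAPSIZE=1536

    # conf/hadoop-site.xml (XML properties, listed here as comments):
    #   mapred.child.java.opts                  = -Xmx512m
    #   mapred.tasktracker.map.tasks.maximum    = 4
    #   mapred.tasktracker.reduce.tasks.maximum = 2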

Schubert

On Tue, May 12, 2009 at 11:39 AM, Stefan Will <stefan.w...@gmx.net> wrote:

> I'm having similar performance issues and have been running my Hadoop
> processes using a nice level of 10 for a while, and haven't noticed any
> improvement.
>
> In my case, I believe what's happening is that the peak combined RAM usage
> of all the Hadoop task processes and the service processes exceeds the
> amount of RAM on my machines. This in turn causes part of the server
> processes to get paged out to disk while the nightly Hadoop batch processes
> are running. Since the swap space is typically on the same physical disks
> as the DFS and MapReduce working directories, I become heavily IO bound and
> real-time queries pretty much slow down to a crawl.
>
> I think the key is to make absolutely sure that all of your processes fit
> in your available RAM at all times. I'm actually having a hard time
> achieving this, since the virtual memory usage of the JVM is usually way
> higher than the maximum heap size (see my other thread).
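> A quick way to see that gap on a running node (a sketch; plain procps,
> nothing Hadoop-specific) is to compare each JVM's virtual and resident
> sizes against its configured -Xmx:
>
>     ps -o pid,vsz,rss,args -C java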
>
> -- Stefan
>
>
> > From: zsongbo <zson...@gmail.com>
> > Reply-To: <core-user@hadoop.apache.org>
> > Date: Tue, 12 May 2009 10:58:49 +0800
> > To: <core-user@hadoop.apache.org>
> > Subject: Re: How to do load control of MapReduce
> >
> > Thanks Billy, I am trying 'nice' and will report the result later.
> >
> > On Tue, May 12, 2009 at 3:42 AM, Billy Pearson
> > <sa...@pearsonwholesale.com> wrote:
> >
> >> Might try setting the TaskTracker's Linux nice level to, say, 5 or 10,
> >> leaving the DFS and HBase settings at 0.
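> >> For example, on a running node, something like this (a sketch; assumes
> >> pgrep is available and that the TaskTracker class name appears on the
> >> java command line):
> >>
> >>     renice -n 10 -p $(pgrep -f org.apache.hadoop.mapred.TaskTracker)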
> >>
> >> Billy
> >> "zsongbo" <zson...@gmail.com> wrote in message
> >> news:fa03480d0905110549j7f09be13qd434ca41c9f84...@mail.gmail.com...
> >>
> >>> Hi all,
> >>> Suppose we have a large dataset to process with MapReduce. The
> >>> MapReduce job will take as many machine resources as it can get.
> >>>
> >>> So when such a big MapReduce job is running, the cluster becomes very
> >>> busy and can hardly do anything else.
> >>>
> >>> For example, we have an HDFS+MapReduce+HBase cluster.
> >>> There is a large dataset in HDFS to be processed by MapReduce
> >>> periodically, and the workload is CPU- and I/O-heavy. The cluster also
> >>> serves other queries (querying HBase and reading files in HDFS), so
> >>> when the job is running, query latency becomes very long.
> >>>
> >>> Since the MapReduce job is not time-sensitive, I want to control the
> >>> load of MapReduce. Do you have any advice?
> >>>
> >>> Thanks in advance.
> >>> Schubert
> >>>
> >>>
> >>
> >>
>
>
>
