Hi Stefan,

Yes, 'nice' alone cannot resolve this problem. Now, each node in my cluster has 8GB of RAM, and my Java heap configuration is:

HDFS DataNode: 1GB
HBase RegionServer: 1.5GB
MR TaskTracker: 1GB
MR child: 512MB each (at most 6 child tasks: 4 map + 2 reduce)
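For reference, these sizes map onto settings like the following in a 0.19/0.20-era deployment (a sketch only; file and key names may differ in your version):

    # conf/hadoop-env.sh -- heap (in MB) for Hadoop daemons such as
    # the DataNode and TaskTracker
    export HADOOP_HEAPSIZE=1000

    # conf/hbase-env.sh -- RegionServer heap (in MB)
    export HBASE_HEAPSIZE=1500

    <!-- conf/hadoop-site.xml -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>   <!-- heap per map/reduce child JVM -->
    </property>
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>          <!-- concurrent map tasks per node -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>          <!-- concurrent reduce tasks per node -->
    </property>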
But the memory usage is still tight.

Schubert

On Tue, May 12, 2009 at 11:39 AM, Stefan Will <stefan.w...@gmx.net> wrote:

> I'm having similar performance issues and have been running my Hadoop
> processes at a nice level of 10 for a while, and I haven't noticed any
> improvement.
>
> In my case, I believe what's happening is that the peak combined RAM
> usage of all the Hadoop task processes and the service processes
> exceeds the amount of RAM on my machines. This in turn causes some of
> the server processes to get paged out to disk while the nightly Hadoop
> batch processes are running. Since the swap space is typically on the
> same physical disks as the DFS and MapReduce working directories, I'm
> heavily I/O bound, and real-time queries pretty much slow down to a
> crawl.
>
> I think the key is to make absolutely sure that all of your processes
> fit in your available RAM at all times. I'm actually having a hard
> time achieving this, since the virtual memory usage of the JVM is
> usually much higher than the maximum heap size (see my other thread).
>
> -- Stefan
>
>> From: zsongbo <zson...@gmail.com>
>> Reply-To: <core-user@hadoop.apache.org>
>> Date: Tue, 12 May 2009 10:58:49 +0800
>> To: <core-user@hadoop.apache.org>
>> Subject: Re: How to do load control of MapReduce
>>
>> Thanks Billy, I am trying 'nice' and will report the result later.
>>
>> On Tue, May 12, 2009 at 3:42 AM, Billy Pearson
>> <sa...@pearsonwholesale.com> wrote:
>>
>>> You might try setting the TaskTracker's Linux nice level to, say,
>>> 5 or 10, leaving the DFS and HBase daemons at 0.
>>>
>>> Billy
>>>
>>> "zsongbo" <zson...@gmail.com> wrote in message
>>> news:fa03480d0905110549j7f09be13qd434ca41c9f84...@mail.gmail.com...
>>>
>>>> Hi all,
>>>> Now, if we have a large dataset to process with MapReduce, the
>>>> job will take as many machine resources as it can get.
>>>>
>>>> So while such a big MapReduce job is running, the cluster becomes
>>>> very busy and can hardly do anything else.
>>>>
>>>> For example, we have an HDFS+MapReduce+HBase cluster. There is a
>>>> large dataset in HDFS that is processed by MapReduce
>>>> periodically; the workload is CPU- and I/O-heavy. The cluster
>>>> also serves queries (HBase lookups and HDFS file reads), so while
>>>> the job is running, query latency becomes very long.
>>>>
>>>> Since the MapReduce job is not time-sensitive, I want to control
>>>> its load. Do you have any advice?
>>>>
>>>> Thanks in advance.
>>>> Schubert
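A rough budget check of Stefan's point against the figures above (the per-JVM overhead range below is an assumption, not a measurement):

    Heap:     1GB (DN) + 1.5GB (RS) + 1GB (TT) + 6 x 512MB (children) = 6.5GB
    Non-heap: 9 JVMs x roughly 100-300MB each (thread stacks,
              code cache, direct buffers)                            ~= 1-2.5GB
    Total:    roughly 7.5-9GB on an 8GB node

That leaves little or nothing for the OS page cache, which is consistent with the swapping Stefan describes. And a minimal sketch of Billy's nice suggestion, assuming the stock hadoop-daemon.sh pid-file location (HADOOP_PID_DIR defaults to /tmp):

    # conf/hadoop-env.sh -- honored by hadoop-daemon.sh at startup;
    # export it only when starting the tasktracker so the DataNode
    # stays at nice 0
    export HADOOP_NICENESS=10

    # or renice a TaskTracker that is already running; the child task
    # JVMs it spawns inherit its nice level
    renice 10 -p $(cat /tmp/hadoop-$USER-tasktracker.pid)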