Hi all,
When we have a large dataset to process with MapReduce, the job takes as
many machine resources as it can.

So when one such big MapReduce job is running, the cluster becomes very
busy and can do almost nothing else.

For example, we have an HDFS+MapReduce+HBase cluster.
There is a large dataset in HDFS that is processed by MapReduce periodically,
and the workload is CPU and I/O heavy. The cluster also serves queries
(querying HBase and reading files in HDFS). So, when the job is running,
the query latency becomes very long.

Since the MapReduce job is not time sensitive, I want to limit the load it
puts on the cluster. Do you have any advice?

Thanks in advance.
Schubert
