Hi all,

Suppose we have a large dataset to be processed by MapReduce. The job will use as many machine resources as it can.
So when such a big MapReduce job is running, the cluster becomes very busy and can hardly do anything else. For example, we have an HDFS+MapReduce+HBase cluster. There is a large dataset in HDFS that is processed by MapReduce periodically, and the workload is both CPU- and I/O-heavy. The cluster also serves queries (querying HBase and reading files in HDFS). So while the job is running, query latency becomes very long.

Since the MapReduce job is not time-sensitive, I would like to limit the load it puts on the cluster. Do you have any advice?

Thanks in advance.
Schubert