I think the reason for hard-coding it is to protect the Hadoop cluster
from heavy network traffic. If the value is too small, there is
too much network traffic between the Map/Reduce tasks and the MRAppMaster.
See also https://issues.apache.org/jira/browse/MAPREDUCE-4381.
That's why you need to be very careful if you really want to change
the value.
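To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Java. The task count of 3000 is a made-up figure and the class and method names are hypothetical. It assumes each running task sends exactly one status update per interval and ignores all other traffic, so treat it as an illustration only.

```java
/** Rough estimate of status-update load on the MRAppMaster (illustrative only). */
final class ReportLoadSketch {

    /**
     * Updates per second arriving at the MRAppMaster, assuming every running
     * task sends exactly one update per intervalMs (a simplification).
     */
    static double reportsPerSecond(int runningTasks, int intervalMs) {
        return runningTasks * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        int runningTasks = 3000; // hypothetical cluster load, not a measured value
        for (int intervalMs : new int[] {3000, 1000, 100}) {
            System.out.printf("%d ms -> %.0f reports/s%n",
                    intervalMs, reportsPerSecond(runningTasks, intervalMs));
        }
        // prints:
        // 3000 ms -> 1000 reports/s
        // 1000 ms -> 3000 reports/s
        // 100 ms -> 30000 reports/s
    }
}
```

Under these assumptions, the load on the MRAppMaster grows in inverse proportion to the interval.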
The source code is at
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
(lines 532-533):
/** The number of milliseconds between progress reports. */
public static final int PROGRESS_INTERVAL = 3000;
Regards,
Akira
(2014/04/16 22:17), Dharmesh Kakadia wrote:
Hi Akira,
Thanks for the quick reply.
Any particular reason for hard-coding it? Is there a workaround? I want to
be able to get the counters at as fine a granularity as possible. Also, can
you point me to the relevant source code? I am willing to take the issue and
contribute if required.
Thanks,
Dharmesh
On Wed, Apr 16, 2014 at 3:14 PM, Akira AJISAKA
<ajisa...@oss.nttdata.co.jp> wrote:
Moved mapreduce-dev@ to Bcc.
Hi Dharmesh,
The parameter sets the interval at which the client polls the
MRAppMaster for progress, not the interval at which the Map/Reduce
tasks report. The tasks send their progress (including the counter
information) to the MRAppMaster every 3000 milliseconds, and that
interval is hard-coded.
That's why you see a sudden big change in the counter values
even if the parameter is set to a small value.
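The effect described above can be reproduced with a small stand-alone simulation. This is only a sketch: it does not use the Hadoop API, the class and method names are hypothetical, and it assumes a single task that processes records at a steady, made-up rate and reports at t = 0, 3000, 6000, ... ms. A real job has many tasks reporting at different moments.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates a client polling a counter that a task only reports periodically. */
final class CounterPollingSketch {

    // Same value as Task.PROGRESS_INTERVAL mentioned in this thread (milliseconds).
    static final int REPORT_INTERVAL_MS = 3000;

    /**
     * Counter value visible to the client at time nowMs: the task's true
     * count as of its most recent report. The steady processing rate is a
     * hypothetical assumption of this sketch.
     */
    static long visibleCount(long nowMs, long recordsPerSecond) {
        long lastReportMs = (nowMs / REPORT_INTERVAL_MS) * REPORT_INTERVAL_MS;
        return lastReportMs * recordsPerSecond / 1000;
    }

    /** Samples the visible value every pollIntervalMs from t = 0 to durationMs inclusive. */
    static List<Long> poll(long pollIntervalMs, long durationMs, long recordsPerSecond) {
        List<Long> samples = new ArrayList<>();
        for (long t = 0; t <= durationMs; t += pollIntervalMs) {
            samples.add(visibleCount(t, recordsPerSecond));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Poll every 1000 ms for 9 s against a task handling 100 records/s.
        System.out.println(poll(1000, 9000, 100));
        // prints [0, 0, 0, 300, 300, 300, 600, 600, 600, 900]
    }
}
```

Polling more often than every 3000 ms only adds more repeated samples between the jumps; it does not make the values any finer.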
Regards,
Akira
(2014/04/16 15:42), Dharmesh Kakadia wrote:
Hi Akira,
Thanks for the reply, but as I understand it, this is the interval of the
console counter printing. What I am trying to do is:
while (!job.isComplete()) {
    Counters counters = job.getCounters();
    // do some processing on counters
}
This runs fine, but I get the same counter values repeatedly and then
suddenly a big change in the counter values.
For example, getcounters for REDUCE_INPUT_RECORDS returns values like
0
0
..
0
280
280
...
280
516
516
...
516
etc.
I want to get finer-grained values, instead of jumping directly from 280 to
516.
Does that make sense? mapreduce.client.progressmonitor.pollinterval does
not seem to affect it. Any workaround?
Thanks,
Dharmesh
On Tue, Apr 15, 2014 at 7:51 PM, Akira AJISAKA
<ajisa...@oss.nttdata.co.jp> wrote:
Moved to u...@hadoop.apache.org.
You can configure the interval by setting the
"mapreduce.client.progressmonitor.pollinterval" parameter.
The default value is 1000 ms.
For more details, please see
http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml.
Regards,
Akira
(2014/04/15 15:29), Dharmesh Kakadia wrote:
Hi,
What is the update interval of the built-in framework counters? Is it
configurable?
I am trying to collect very fine-grained information about the job
execution and am using counters for that. It would be great if someone could
point me to the documentation/code for it. Thanks in advance.
Thanks,
Dharmesh