get rid of excessive flushes from PipeMapper/Reducer
----------------------------------------------------
Key: HADOOP-3196
URL: https://issues.apache.org/jira/browse/HADOOP-3196
Project: Hadoop Core
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.16.2
Reporter: Joydeep Sen Sarma
There's a flush on the buffered output streams in the mapper/reducer for every
row of data:
  // 2/4 Hadoop to Tool
  if (numExceptions_ == 0) {
    if (!this.ignoreKey) {
      write(key);
      clientOut_.write('\t');
    }
    write(value);
    if (!this.skipNewline) {
      clientOut_.write('\n');
    }
    clientOut_.flush();
  } else {
    numRecSkipped_++;
  }
I tried to measure the impact of removing this flush. The number of context
switches (cs) reported by vmstat shows a marked decline.
With flush (10 second intervals):
 r  b swpd   free  buff   cache si so   bi    bo   in    cs us sy id wa
 4  2  784  23140 83352 3114648  0  0 4819 32397 1175 13220 59 11 13 17
 1  2  784 129724 80704 3075696  0  0 4614 27196 1156 14797 49 11 19 21
 4  0  784  24160 83440 3174880  0  0   96 36070 1337 10976 67 11  9 12
 5  0  784 155872 84400 3158840  0  0  125 44084 1280 11044 68 14 10  8
 2  1  784 365128 87048 2892032  0  0  119 38472 1317 11610 69 14 10  7
Without flush:
 r  b swpd   free  buff   cache si so   bi    bo   in    cs us sy id wa
 5  0  784  24652 56056 3217864  0  0  310 29499 1379  7603 76  9  7  8
 5  3  784 118456 54568 3209992  0  0 3249 33426 1173  6828 63 11 12 14
 0  2  784 227628 54820 3198560  0  0 7840 30063 1146  8899 60 10 15 15
 3  1  784  25608 55048 3313512  0  0 3251 36276 1194  7915 60 10 15 15
 1  2  784 197324 49968 3194572  0  0 4714 35479 1281  8204 62 13 12 13
cs goes down by about 20-30%, but I'm having trouble measuring the overall speed
improvement (too many variables due to speculative execution etc.; a better
benchmark is needed).
In any case, removing the per-row flush can't hurt.
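
To illustrate why the per-row flush matters, here is a minimal standalone sketch. It is not the streaming code itself: FlushCountDemo, CountingSink and run are hypothetical names, and the sink only stands in for the pipe to the external tool. It counts how many separate writes reach the underlying stream when a BufferedOutputStream is flushed after every record versus only when its buffer fills. The 8192-byte buffer size and the record contents are assumptions chosen for the example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FlushCountDemo {

    /** Hypothetical sink standing in for the pipe; counts the writes that reach it. */
    static final class CountingSink extends OutputStream {
        long writeCalls;
        long bytes;

        @Override
        public void write(int b) {
            writeCalls++;
            bytes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            writeCalls++;
            bytes += len;
        }
    }

    /** Writes key/value rows and returns {writes reaching the sink, bytes delivered}. */
    static long[] run(int records, boolean flushEveryRecord) {
        CountingSink sink = new CountingSink();
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);
        byte[] value = "value".getBytes(StandardCharsets.UTF_8);
        // Assumed buffer size; closing the stream flushes whatever is still buffered.
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 8192)) {
            for (int i = 0; i < records; i++) {
                out.write(key);
                out.write('\t');
                out.write(value);
                out.write('\n');
                if (flushEveryRecord) {
                    out.flush(); // pushes this single row down to the sink
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {sink.writeCalls, sink.bytes};
    }

    public static void main(String[] args) {
        long[] perRecord = run(1000, true);
        long[] buffered = run(1000, false);
        System.out.println("flush per record: " + perRecord[0] + " writes, " + perRecord[1] + " bytes");
        System.out.println("buffered only:    " + buffered[0] + " writes, " + buffered[1] + " bytes");
    }
}
```

Both runs deliver the same bytes. With the per-record flush every row reaches the sink as its own write, while the buffered run needs only a few. On a real pipe each of those writes is a system call that can wake the reading process, which is one plausible reason for the context-switch difference in the vmstat output.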