Nick Dimiduk created HBASE-28682:
------------------------------------
Summary: ITBLL and other MR-based integration tests should heartbeat often
Key: HBASE-28682
URL: https://issues.apache.org/jira/browse/HBASE-28682
Project: HBase
Issue Type: Test
Components: integration tests, mapreduce
Reporter: Nick Dimiduk
We have this little note in our ITBLL harness:
{noformat}
// If we cause enough chaos, RPC requests might get into long backoffs. During this
// time, it won't send keep alives to the map/reduce context. So increase the timeout
// a bunch
{noformat}
Investigating, I found that the ITBLL Generator's persist method updates the MR
context progress only once every 100 puts. You'd think that would be enough, but
under chaos it really isn't. What if we update progress with every put? Digging
through the MR source code, it seems that calling the context.progress() method
only sets an AtomicBoolean indicating that a progress update needs to be sent.
The actual sending of progress reports is gated by
{{mapreduce.task.progress-report.interval}}, which defaults to 1% of
{{mapreduce.task.timeout}}. That timeout defaults to 300_000ms, so the report
interval defaults to 3 seconds. So yeah, we should probably set this AtomicBoolean
much more often in chaotic jobs: doing so is effectively free and will improve
reliability.
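To make the gating concrete, here is a minimal Java model of the behavior described above. It is not Hadoop's actual TaskReporter: the class ProgressGateSketch and its methods are hypothetical, and the sketch assumes a report goes out only when the interval has elapsed and the flag has been set since the last tick. It illustrates why thousands of progress() calls between ticks still collapse into a single report.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical model (NOT Hadoop's TaskReporter) of progress reporting:
 * progress() only raises a flag; a report goes out at most once per interval.
 */
final class ProgressGateSketch {
  private final AtomicBoolean progressFlag = new AtomicBoolean(false);
  private final long intervalMillis;
  private long lastTickMillis;
  private int reportsSent = 0;

  ProgressGateSketch(long intervalMillis, long startMillis) {
    this.intervalMillis = intervalMillis;
    this.lastTickMillis = startMillis;
  }

  /**
   * Interval rule from the issue text: an explicit
   * mapreduce.task.progress-report.interval wins, otherwise 1% of
   * mapreduce.task.timeout. Treating a non-positive explicit value as
   * "unset" is an assumption of this sketch.
   */
  static long reportIntervalMillis(long taskTimeoutMillis, long explicitIntervalMillis) {
    return explicitIntervalMillis > 0 ? explicitIntervalMillis : taskTimeoutMillis / 100;
  }

  /** Cheap: only sets the flag, as context.progress() does per the issue text. */
  void progress() {
    progressFlag.set(true);
  }

  /** Called by a (hypothetical) reporter loop; returns true if a report was "sent". */
  boolean maybeReport(long nowMillis) {
    if (nowMillis - lastTickMillis < intervalMillis) {
      return false; // too soon: the flag stays set until the next tick
    }
    lastTickMillis = nowMillis;
    if (progressFlag.getAndSet(false)) {
      reportsSent++;
      return true;
    }
    return false; // no progress since the last tick
  }

  int reportsSent() {
    return reportsSent;
  }
}
```

With the default timeout, reportIntervalMillis(300_000L, 0L) yields 3_000L, and calling progress() 10_000 times before the next tick still produces exactly one report, which is why raising the flag on every put costs little more than an atomic write.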
But still, every put is perhaps excessive. What if we add a pre-flush hook to
(Async)BufferedMutator so that a MR job can set this progress flag right before
the client disappears down into a retry loop? I bet other applications would
find such a hook useful as well.
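A minimal sketch of the proposed pre-flush hook, assuming a simple size-triggered buffer. HookedBufferSketch, mutate, flush and preFlushHook are hypothetical names, not part of the HBase client API; the sink stands in for the RPC send and its retry loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a buffered mutator with a pre-flush hook.
 * NOT the HBase BufferedMutator API; names are made up for illustration.
 */
final class HookedBufferSketch<T> {
  private final List<T> buffer = new ArrayList<>();
  private final int flushThreshold;
  private final Runnable preFlushHook;  // e.g. context::progress in an MR job
  private final Consumer<List<T>> sink; // stands in for the RPC send + retry loop

  HookedBufferSketch(int flushThreshold, Runnable preFlushHook, Consumer<List<T>> sink) {
    this.flushThreshold = flushThreshold;
    this.preFlushHook = preFlushHook;
    this.sink = sink;
  }

  /** Buffers one item and flushes once the threshold is reached. */
  void mutate(T item) {
    buffer.add(item);
    if (buffer.size() >= flushThreshold) {
      flush();
    }
  }

  /** Runs the hook right before handing the batch to the (possibly slow) sink. */
  void flush() {
    if (buffer.isEmpty()) {
      return; // nothing to send, so no hook call either
    }
    preFlushHook.run(); // signal liveness before a potentially long blocking send
    sink.accept(new ArrayList<>(buffer)); // hand off a copy of the batch
    buffer.clear();
  }
}
```

In an MR job the hook could be something like context::progress, so the flag is raised once right before each potentially long flush rather than on every put.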