Nick Dimiduk created HBASE-28682:
------------------------------------

             Summary: ITBLL and other MR-based integration tests should heartbeat often
                 Key: HBASE-28682
                 URL: https://issues.apache.org/jira/browse/HBASE-28682
             Project: HBase
          Issue Type: Test
          Components: integration tests, mapreduce
            Reporter: Nick Dimiduk


We have this little note in our ITBLL harness:
{noformat}
      // If we cause enough chaos, RPC requests might get into long backoffs. During this
      // time, it won't send keep alives to the map/reduce context. So increase the timeout
      // a bunch
{noformat}

Investigating, I found that the ITBLL Generator's persist method updates the MR context progress only every 100 puts. You'd think that would be enough, but under chaos it really isn't. What if we update progress with every put? Digging through the MR source code, it seems that calling {{context.progress()}} only sets an AtomicBoolean indicating that a progress update needs to be sent. Actually sending progress reports is gated by {{mapreduce.task.progress-report.interval}} or, if that is unset, 1% of {{mapreduce.task.timeout}}; with the 300_000 ms default timeout, that is 3 seconds. So yeah, we should probably set this AtomicBoolean much more often in chaotic jobs: doing so is effectively free and will improve reliability.
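To make the argument concrete, here is a toy, deterministic Java simulation of the mechanism as described. It is a sketch under stated assumptions, not Hadoop's actual task reporter: ProgressModel and its methods are hypothetical names, the reporter is modeled as waking once per interval and sending a report only if the flag was set since its previous wake, the interval is taken as 1% of the task timeout, and every put is assumed to take a fixed time including retries.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Toy model (hypothetical, NOT Hadoop's real reporter): progress() only sets a
 * flag; a reporter that wakes once per interval sends a report only if the flag
 * was set since its previous wake, clearing it as it goes.
 */
final class ProgressModel {
  private final AtomicBoolean progressFlag = new AtomicBoolean(false);

  /** Cheap: just flips a flag, which is what the issue says context.progress() does. */
  void progress() { progressFlag.set(true); }

  /** One reporter wake-up; returns true if a progress report would be sent. */
  boolean reporterWake() { return progressFlag.getAndSet(false); }

  /** Assumed fallback: report interval = 1% of the task timeout (300_000 ms -> 3_000 ms). */
  static long reportIntervalMs(long taskTimeoutMs) { return taskTimeoutMs / 100; }

  /**
   * Simulates numPuts puts that each take putLatencyMs (retries included) and calls
   * progress() after every progressEveryN-th put. Returns the longest stretch of
   * simulated time during which no progress report was sent.
   */
  static long longestSilenceMs(long putLatencyMs, int numPuts, int progressEveryN, long taskTimeoutMs) {
    ProgressModel model = new ProgressModel();
    long interval = reportIntervalMs(taskTimeoutMs);
    long now = 0, nextWake = interval, lastReport = 0, longest = 0;
    for (int i = 1; i <= numPuts; i++) {
      now += putLatencyMs;
      // The reporter wakes at every interval boundary that passed while this put was in flight.
      while (nextWake <= now) {
        if (model.reporterWake()) {
          longest = Math.max(longest, nextWake - lastReport);
          lastReport = nextWake;
        }
        nextWake += interval;
      }
      // The put has completed; mark progress if this is one of the chosen puts.
      if (i % progressEveryN == 0) {
        model.progress();
      }
    }
    return Math.max(longest, now - lastReport);
  }
}
```

With an assumed 5 s per put under heavy backoff, flagging progress only every 100th put leaves roughly 500 s without a report, well past the 300_000 ms timeout, while flagging every put keeps the longest gap to about two report intervals (6 s).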

But still, updating on every put is perhaps excessive. What if we add a pre-flush hook to (Async)BufferedMutator so that an MR job can set this progress flag right before the client disappears into a retry loop? I bet other applications would find such a hook useful as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
