Adar Dembo created KUDU-2050:
--------------------------------

             Summary: Avoid peer eviction during block manager startup
                 Key: KUDU-2050
                 URL: https://issues.apache.org/jira/browse/KUDU-2050
             Project: Kudu
          Issue Type: Bug
          Components: fs, tserver
    Affects Versions: 1.4.0
            Reporter: Adar Dembo
            Priority: Critical


In larger deployments we've observed that opening the block manager can take a 
very long time: tens of minutes, and sometimes hours. This is especially true 
as of 1.4, where the log block manager (LBM) tries to optimize on-disk data 
structures during startup.

The default time to Raft peer eviction is 5 minutes. If one node is restarted 
and LBM startup takes over 5 minutes, or if all nodes are restarted and there's 
over 5 minutes of LBM startup time variance across them, the "slow" node could 
have all of its replicas evicted. Besides generating a lot of unnecessary work 
in rereplication, this effectively "defeats" the LBM optimizations in that it 
would have been equally slow (but more efficient) to reformat the node instead.
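
As a rough illustration of the two scenarios above, here is a minimal sketch. 
It is a simplified model, not Kudu code: the function names are hypothetical, 
and the full-restart case compares each node against the fastest node as a 
stand-in for "startup time variance".

```python
# Default time to Raft peer eviction, per the description above.
EVICTION_TIMEOUT_S = 5 * 60


def at_risk_single_restart(lbm_startup_s, timeout_s=EVICTION_TIMEOUT_S):
    """One node restarts while its peers stay up.

    The node is at risk if LBM startup alone outlasts the eviction timeout.
    """
    return lbm_startup_s > timeout_s


def at_risk_full_restart(startup_by_node, timeout_s=EVICTION_TIMEOUT_S):
    """All nodes restart at once.

    Simplifying assumption: a node is at risk if it finishes LBM startup more
    than timeout_s after the fastest node does.
    """
    fastest = min(startup_by_node.values())
    return sorted(
        node
        for node, startup_s in startup_by_node.items()
        if startup_s - fastest > timeout_s
    )
```

For example, with hypothetical startup times of 600, 700, and 1200 seconds, 
only the 1200-second node lags the fastest node by more than 5 minutes.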

So, let's reorder startup such that LBM startup counts towards replica 
bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta 
files can be accessed early to construct bootstrapping replicas, and defer 
opening the block manager until after those replicas exist.
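
The proposed ordering could be sketched roughly as follows. This is a toy 
model only: ToyTabletServer, its methods, and the state names are all 
hypothetical and do not correspond to Kudu's actual FsManager or tablet server 
APIs. The point is just that replicas are already registered as bootstrapping 
before the slow block manager open begins.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTabletServer:
    """Hypothetical stand-in for a tablet server; not Kudu's real classes."""

    tablet_ids: list
    events: list = field(default_factory=list)
    replica_state: dict = field(default_factory=dict)
    states_during_open: dict = field(default_factory=dict)

    def load_metadata(self):
        # Cheap step: read tablet-meta/cmeta files only.
        self.events.append("load_metadata")

    def register_bootstrapping_replicas(self):
        # Replicas exist (in a bootstrapping state) before the LBM is opened.
        for tablet_id in self.tablet_ids:
            self.replica_state[tablet_id] = "BOOTSTRAPPING"
        self.events.append("register_replicas")

    def open_block_manager(self):
        # Slow step: record what the replica states are while it runs.
        self.states_during_open = dict(self.replica_state)
        self.events.append("open_block_manager")

    def finish_bootstrap(self):
        for tablet_id in self.tablet_ids:
            self.replica_state[tablet_id] = "RUNNING"
        self.events.append("finish_bootstrap")

    def start(self):
        # Proposed order: metadata first, block manager deferred.
        self.load_metadata()
        self.register_bootstrapping_replicas()
        self.open_block_manager()
        self.finish_bootstrap()


server = ToyTabletServer(tablet_ids=["t1", "t2"])
server.start()
```

In this model, the time spent in open_block_manager() is spent while every 
replica is already in the bootstrapping state, which is the property the 
reordering is meant to provide.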



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
