Vikas Vishwakarma created HBASE-16499:
-----------------------------------------

             Summary: slow replication for small HBase clusters
                 Key: HBASE-16499
                 URL: https://issues.apache.org/jira/browse/HBASE-16499
             Project: HBase
          Issue Type: Bug
            Reporter: Vikas Vishwakarma
            Assignee: Vikas Vishwakarma


For small clusters (10-20 nodes) we recently observed that replication 
progresses very slowly when we do bulk writes, and a lot of lag accumulates 
in AgeOfLastShipped / SizeOfLogQueue. From the logs we observed that the 
number of threads used for shipping WAL edits in parallel comes from the 
following code in HBaseInterClusterReplicationEndpoint:

int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
    replicationSinkMgr.getSinks().size());
...
for (int i=0; i<n; i++) {
  entryLists.add(new ArrayList<HLog.Entry>(entries.size()/n+1)); // <-- batch size
}
...
for (int i=0; i<entryLists.size(); i++) {
  ...
  // RuntimeExceptions encountered here bubble up and are handled in ReplicationSource
  pool.submit(createReplicator(entryLists.get(i), i)); // <-- concurrency
  futures++;
}

maxThreads is fixed and configurable, and since we take the min of the three 
values, n is effectively decided by replicationSinkMgr.getSinks().size() 
whenever there are enough edits to replicate.
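
To make the effect concrete, here is a minimal standalone sketch of the same 
sizing math (maxThreads, entry count, and sink count below are hypothetical 
example values, not measurements from our clusters):

public class ReplicationThreadMath {
  public static void main(String[] args) {
    // All inputs are hypothetical example values.
    int maxThreads = 10;     // e.g. replication.source.maxthreads
    int entryCount = 5000;   // plenty of edits queued for shipping
    int sinkCount = 1;       // what getSinks().size() can return on a small cluster

    // Same expression as in HBaseInterClusterReplicationEndpoint above.
    int n = Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    System.out.println("shipping threads n = " + n); // prints: shipping threads n = 1
  }
}

Even with thousands of edits queued, the sink count term wins the min and 
replication is effectively serialized.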

replicationSinkMgr.getSinks().size() is in turn decided by

int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);

where ratio comes from

this.ratio = conf.getFloat("replication.source.ratio", 
    DEFAULT_REPLICATION_SOURCE_RATIO);
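
A quick sketch of what that formula yields (the cluster sizes and ratios 
below are just illustrative):

public class SinkCountMath {
  public static void main(String[] args) {
    // Illustrative only: numSinks = ceil(slaveAddresses.size() * ratio)
    int[] clusterSizes = {10, 20, 100};
    float[] ratios = {0.1f, 0.5f};
    for (int size : clusterSizes) {
      for (float ratio : ratios) {
        int numSinks = (int) Math.ceil(size * ratio);
        System.out.println(size + " RS at ratio " + ratio
            + " -> numSinks = " + numSinks);
      }
    }
  }
}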

Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10%, so for small 
clusters of 10-20 RegionServers the value we get for numSinks, and hence n, 
is very small, like 1 or 2. This substantially reduces the pool concurrency 
used for shipping WAL edits in parallel, effectively slowing down replication 
for small clusters and causing a lot of lag accumulation in AgeOfLastShipped. 
Sometimes it takes tens of hours to clear the entire replication queue even 
after the client has finished writing on the source side.

We are running tests with varying values of replication.source.ratio and have 
seen a multi-fold improvement in total replication time (will update the 
results here). I want to propose that we also increase the default value of 
replication.source.ratio so that we have sufficient concurrency even for 
small clusters. It took us a lot of iterations and debugging to figure this 
out, so a slightly higher default will probably save others the trouble.
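
For reference, the ratio can already be overridden per source cluster, 
normally via hbase-site.xml on the source RegionServers. A minimal 
programmatic sketch (the 0.5 below is just an illustrative value, not a 
tested recommendation yet):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RaiseSourceRatio {
  public static void main(String[] args) {
    // Illustrative only: raise the fraction of slave cluster RegionServers
    // considered as replication sinks (default 0.1). In a real deployment
    // this would be set in hbase-site.xml on the source cluster.
    Configuration conf = HBaseConfiguration.create();
    conf.setFloat("replication.source.ratio", 0.5f);
    System.out.println(conf.getFloat("replication.source.ratio", 0.1f)); // 0.5
  }
}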

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
