[ https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Singhi updated HBASE-16499:
----------------------------------
    Attachment: HBASE-16499.patch

> slow replication for small HBase clusters
> -----------------------------------------
>
>                 Key: HBASE-16499
>                 URL: https://issues.apache.org/jira/browse/HBASE-16499
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Vikas Vishwakarma
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16499.patch, HBASE-16499.patch
>
>
> For small clusters (10-20 nodes) we recently observed that replication
> progresses very slowly during bulk writes, and that a lot of lag accumulates
> in AgeOfLastShipped / SizeOfLogQueue. From the logs we observed that the
> number of threads used for shipping WAL edits in parallel comes from the
> following code in HBaseInterClusterReplicationEndpoint:
>     int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>         replicationSinkMgr.getSinks().size());
>     ...
>     for (int i=0; i<n; i++) {
>       entryLists.add(new ArrayList<HLog.Entry>(entries.size()/n+1)); <-- batch size
>     }
>     ...
>     for (int i=0; i<entryLists.size(); i++) {
>       .....
>       // RuntimeExceptions encountered here bubble up and are handled in ReplicationSource
>       pool.submit(createReplicator(entryLists.get(i), i)); <-- concurrency
>       futures++;
>     }
>     }
> maxThreads is fixed and configurable, and since we take the min of the three
> values, n is determined by replicationSinkMgr.getSinks().size() whenever
> there are enough edits to replicate.
> replicationSinkMgr.getSinks().size() is in turn determined by
>     int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is
>     this.ratio = conf.getFloat("replication.source.ratio",
>         DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10%, so for small
> clusters of 10-20 RegionServers the value we get for numSinks, and hence for
> n, is very small: 1 or 2.
> This substantially reduces the pool concurrency used for shipping WAL edits
> in parallel, which slows down replication for small clusters and causes a
> lot of lag to accumulate in AgeOfLastShipped.
> Sometimes it takes tens of hours to clear the entire replication queue,
> even after the client has finished writing on the source side.
> We are running tests that vary replication.source.ratio and have seen a
> multi-fold improvement in total replication time (will update the results
> here). I propose that we also increase the default value of
> replication.source.ratio so that there is sufficient concurrency even for
> small clusters. We only figured this out after many iterations of debugging,
> so a slightly higher default will probably save others the trouble.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
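
To make the arithmetic in the description concrete, here is a minimal standalone sketch that reproduces the two formulas quoted above outside of HBase. The class name ReplicationConcurrencySketch, the method names, and the sample inputs (maxThreads = 10, a batch of 5000 entries, and a ratio of 0.5) are assumptions chosen for illustration only; they are not taken from the patch and 0.5 is not the proposed new default.

```java
// Standalone illustration of the thread-count arithmetic described in the issue.
// This is NOT HBase code: names and sample inputs are hypothetical.
final class ReplicationConcurrencySketch {

    // Mirrors: (int) Math.ceil(slaveAddresses.size() * ratio)
    static int numSinks(int slaveCount, float ratio) {
        return (int) Math.ceil(slaveCount * ratio);
    }

    // Mirrors: Math.min(Math.min(maxThreads, entries.size()/100+1), sinks.size())
    static int shipperThreads(int maxThreads, int entryCount, int numSinks) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), numSinks);
    }

    public static void main(String[] args) {
        int maxThreads = 10;   // assumed value, for illustration
        int entryCount = 5000; // assumed batch size, large enough not to be the limit

        int[] clusterSizes = {10, 20};
        float[] ratios = {0.1f, 0.5f}; // 0.1 is the 10% default cited in the issue

        for (int slaves : clusterSizes) {
            for (float ratio : ratios) {
                int sinks = numSinks(slaves, ratio);
                int n = shipperThreads(maxThreads, entryCount, sinks);
                System.out.println("slaves=" + slaves + " ratio=" + ratio
                        + " numSinks=" + sinks + " n=" + n);
            }
        }
    }
}
```

With these assumed inputs, the 10% ratio gives n = 1 for 10 RegionServers and n = 2 for 20, matching the "1 or 2" reported above, while the illustrative ratio of 0.5 gives n = 5 and n = 10 respectively.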