> On Oct. 13, 2016, 5:02 p.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py, line 64
> > <https://reviews.apache.org/r/52833/diff/1/?file=1534837#file1534837line64>
> >
> >     Let's decrease the sleep time to 10 secs.
>
> Jonathan Hurley wrote:
>     Is there a reason? I am hesitant about changing stuff like this; we've
>     been burned before where customer environments take much longer than we
>     think. HBase, especially, as it rebuilds regions.

Maybe decrease the sleep by a factor of 3 and increase the attempts by the same factor.


- Alejandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52833/#review152531
-----------------------------------------------------------


On Oct. 13, 2016, 3:19 p.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52833/
> -----------------------------------------------------------
>
> (Updated Oct. 13, 2016, 3:19 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez and Nate Cole.
>
>
> Bugs: AMBARI-18590
>     https://issues.apache.org/jira/browse/AMBARI-18590
>
>
> Repository: ambari
>
>
> Description
> -------
>
> During a rolling upgrade, the upgrade orchestration must wait for each
> RegionServer to register with the HBase master before moving on to the next
> RegionServer restart. Registration is asynchronous and may occur several
> minutes after the daemon has actually started.
>
> We have a check now which uses {{hbase shell}} along with {{status 'simple'}}
> to determine if the host has registered by looking for the hostname.
>
> However, if reverse DNS is not enabled, then the output could list IP
> addresses instead of hostnames.
> As a result, the check would always fail during upgrades.
>
> The HBase status command we use is {{status simple}}, which returns output
> like so:
>
> ```
> active master: 10.0.0.8:16000 1475801031124
> 2 backup masters
>     10.0.0.10:16000 1475801061290
>     10.0.0.13:16000 1475801046018
> 2 live servers
>     10.0.0.5:16020 1475798271407
>         requestsPerSecond=0.0, numberOfOnlineRegions=2, usedHeapMB=159,
>         maxHeapMB=7840, numberOfStores=3, numberOfStorefiles=1,
>         storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
>         storefileIndexSizeMB=0, readRequestsCount=14, writeRequestsCount=1,
>         rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
>         totalCompactingKVs=14, currentCompactedKVs=14, compactionProgressPct=1.0,
>         coprocessors=[MultiRowMutationEndpoint, SecureBulkLoadEndpoint]
>     10.0.0.7:16020 1475872741297
>         requestsPerSecond=0.0, numberOfOnlineRegions=1, usedHeapMB=1002,
>         maxHeapMB=7840, numberOfStores=1, numberOfStorefiles=1,
>         storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
>         storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0,
>         rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
>         totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN,
>         coprocessors=[SecureBulkLoadEndpoint]
> 0 dead servers
> Aggregate load: 0, regions: 3
> ```
>
> If this lookup fails for the hostname, we should also try by IP address.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py f1fa80c
>
> Diff: https://reviews.apache.org/r/52833/diff/
>
>
> Testing
> -------
>
> Total run:1133
> Total errors:0
> Total failures:0
> OK
>
>
> Thanks,
>
> Jonathan Hurley
>
>
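
For reference, the hostname-then-IP lookup and the sleep/attempts trade-off discussed above could be sketched roughly as follows. This is only an illustration, not the code in `upgrade.py`: the function names `live_servers` and `wait_for_registration`, the `get_status` callable, and the default values of `tries` and `sleep_seconds` are all hypothetical. It assumes the status text has the layout shown in the description, with `host:port startcode` lines under the "live servers" heading.

```python
import re
import time


def live_servers(status_output):
    """Collect the host part of each 'host:port startcode' line that
    appears between the 'N live servers' and 'N dead servers' headings."""
    hosts = set()
    in_live_section = False
    for line in status_output.splitlines():
        stripped = line.strip()
        if re.match(r"^\d+ live servers$", stripped):
            in_live_section = True
            continue
        if re.match(r"^\d+ dead servers$", stripped):
            in_live_section = False
            continue
        if in_live_section:
            # Metric lines contain no 'host:port number' pattern, so they are skipped.
            match = re.match(r"^(\S+):(\d+) \d+$", stripped)
            if match:
                hosts.add(match.group(1).lower())
    return hosts


def wait_for_registration(get_status, hostname, ip_address=None,
                          tries=30, sleep_seconds=10, sleep=time.sleep):
    """Poll until the host shows up as a live server, by hostname or by IP.

    get_status is a hypothetical callable returning the status text.
    The tries and sleep_seconds defaults are placeholders, not the
    values used by Ambari.
    """
    for attempt in range(tries):
        servers = live_servers(get_status())
        if hostname.lower() in servers:
            return True
        # Fall back to the IP address when reverse DNS is not available.
        if ip_address and ip_address in servers:
            return True
        if attempt < tries - 1:
            sleep(sleep_seconds)
    return False
```

Dividing `sleep_seconds` by 3 while multiplying `tries` by 3 keeps the worst-case wait about the same, so slow environments get the same total time and fast ones are detected sooner. Restricting the match to the "live servers" section also avoids treating a backup master's address as a registered RegionServer.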