On Sun, Feb 5, 2017 at 2:25 AM, Lars George <[email protected]> wrote:
> The next example is wrong too, claiming to show 60 secs, while it
> shows 600 secs (the default value as well).
>
> The question is still, what is a good value for intervals? Anyone here
> that uses the Canary that would like to chime in?
>

I was hanging out with a user where, on a mid-sized cluster with Canary
running with defaults, the regionserver carrying meta was at 100% CPU
because of all the requests from Canary doing repeated full-table Scans.

6 seconds is too short. Seems like a typo that should be 60 seconds. It is
not as though the Canary is going to do anything about it if it finds
something wrong.

S

> On Sat, Feb 4, 2017 at 5:40 PM, Ted Yu <[email protected]> wrote:
>
> > Brief search on HBASE-4393 didn't reveal why the interval was shortened.
> >
> > If you read the first paragraph of:
> > http://hbase.apache.org/book.html#_run_canary_test_as_daemon_mode
> >
> > possibly the reasoning was that canary would exit upon seeing some error
> > (the first time).
> >
> > BTW There was a mismatch in the description for this command: (5 seconds
> > vs. 50000 milliseconds)
> >
> > ${HBASE_HOME}/bin/hbase canary -daemon -interval 50000 -f false
> >
> >
> > On Sat, Feb 4, 2017 at 8:21 AM, Lars George <[email protected]> wrote:
> >
> >> Oh right, Ted. An earlier patch attached to the JIRA had 60 secs, the
> >> last one has 6 secs. Am I reading this right? It hands 6000 into the
> >> Thread.sleep() call, which takes millisecs. So that makes 6 secs
> >> between checks, which seems super short, no? I might just be dull here.
> >>
> >> On Sat, Feb 4, 2017 at 5:00 PM, Ted Yu <[email protected]> wrote:
> >> > For the default interval, if you were looking at:
> >> >
> >> >   private static final long DEFAULT_INTERVAL = 6000;
> >> >
> >> > The above was from:
> >> >
> >> > HBASE-4393 Implement a canary monitoring program
> >> >
> >> > which was integrated on Tue Apr 24 07:20:16 2012
> >> >
> >> > FYI
> >> >
> >> > On Sat, Feb 4, 2017 at 4:06 AM, Lars George <[email protected]> wrote:
> >> >
> >> >> Also, the default interval used to be 60 secs, but is now 6 secs. Does
> >> >> that make sense? Seems awfully short for a default, assuming you have
> >> >> many regions or servers.
> >> >>
> >> >> On Sat, Feb 4, 2017 at 11:54 AM, Lars George <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Looking at the Canary tool, it tries to ensure that all canary test
> >> >> > table regions are spread across all region servers. If that is not the
> >> >> > case, it calls:
> >> >> >
> >> >> >   if (numberOfCoveredServers < numberOfServers) {
> >> >> >     admin.balancer();
> >> >> >   }
> >> >> >
> >> >> > I doubt this will help with the StochasticLoadBalancer, which is known
> >> >> > to consider per-table balancing as only one of many factors. In practice,
> >> >> > the SLB will most likely _not_ distribute the canary regions
> >> >> > sufficiently, leaving gaps in the check. Switching on the per-table
> >> >> > option is discouraged, so that the SLB can do its thing.
> >> >> >
> >> >> > Just pointing it out for vetting.
> >> >> >
> >> >> > Lars
> >> >>
> >> >
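
For anyone skimming the unit confusion in the thread above (6000 handed to
Thread.sleep(), and "5 seconds vs. 50000 milliseconds"), here is a minimal
sketch of the arithmetic. CanaryIntervalSketch and its methods are
hypothetical names used only for illustration; this is not HBase code, and
it makes no claim about which unit the -interval flag expects.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of HBase: keeps the unit of an interval explicit. */
public final class CanaryIntervalSketch {
    // Value quoted in the thread from HBASE-4393; Thread.sleep() takes milliseconds.
    public static final long DEFAULT_INTERVAL_MS = 6000;

    /** Converts a millisecond interval to whole seconds. */
    public static long toSeconds(long intervalMs) {
        return TimeUnit.MILLISECONDS.toSeconds(intervalMs);
    }

    /** Converts an interval in seconds to milliseconds, the unit Thread.sleep() expects. */
    public static long toMillis(long intervalSeconds) {
        return TimeUnit.SECONDS.toMillis(intervalSeconds);
    }

    public static void main(String[] args) {
        // 6000 ms is 6 s between checks, not 60 s.
        System.out.println(DEFAULT_INTERVAL_MS + " ms = " + toSeconds(DEFAULT_INTERVAL_MS) + " s");
        // 50000 ms is 50 s, not the 5 s the description mentioned.
        System.out.println("50000 ms = " + toSeconds(50000) + " s");
        // A 60 s interval would need 60000 passed to Thread.sleep().
        System.out.println("60 s = " + toMillis(60) + " ms");
    }
}
```

Naming the constant with its unit (or going through TimeUnit) is one way to
make a 6000-vs-60000 slip easier to spot in review.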

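
The coverage condition from the first mail in the thread
(numberOfCoveredServers < numberOfServers) can be illustrated with plain
collections. This is only a sketch under assumptions: the region and server
names are made up, CanaryCoverageSketch is a hypothetical class, and nothing
here calls the HBase Admin API or reflects how the Canary actually gathers
its counts.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration, not HBase code: which servers host no canary region? */
public final class CanaryCoverageSketch {

    /** Number of distinct servers hosting at least one canary region. */
    public static int coveredServers(Map<String, String> regionToServer) {
        return new HashSet<>(regionToServer.values()).size();
    }

    /** Servers in the cluster that host no canary region at all. */
    public static Set<String> uncovered(Set<String> allServers,
                                        Map<String, String> regionToServer) {
        Set<String> missing = new TreeSet<>(allServers);
        missing.removeAll(regionToServer.values());
        return missing;
    }

    public static void main(String[] args) {
        // Made-up assignment: three canary regions, two of them on the same server.
        Map<String, String> regionToServer =
            Map.of("region-1", "rs1", "region-2", "rs1", "region-3", "rs2");
        Set<String> allServers = Set.of("rs1", "rs2", "rs3");

        int covered = coveredServers(regionToServer);
        if (covered < allServers.size()) {
            // This is the situation in which the quoted snippet calls admin.balancer().
            System.out.println("Uncovered servers: " + uncovered(allServers, regionToServer));
        }
    }
}
```

Listing the uncovered servers, rather than only comparing counts, shows
where the gap in the check is. Whether a balancer run then closes that gap
is exactly the open question raised above.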