Please keep in mind we are talking about two issues here: 1) the short default interval, and 2) the fact that the canary table regions might not be on all servers.
Has anyone here tried write sniffing on a current cluster with the SLB and seen it work?

Best,
Lars

On Mon, Feb 6, 2017 at 10:38 PM, Enis Söztutar <[email protected]> wrote:

> Open an issue?
> Enis
>
> On Mon, Feb 6, 2017 at 9:39 AM, Stack <[email protected]> wrote:
>
>> On Sun, Feb 5, 2017 at 2:25 AM, Lars George <[email protected]> wrote:
>>
>> > The next example is wrong too, claiming to show 60 secs, while it
>> > shows 600 secs (the default value as well).
>> >
>> > The question is still, what is a good value for intervals? Anyone here
>> > that uses the Canary that would like to chime in?
>> >
>>
>> I was hanging out with a user where on a mid-sized cluster with Canary
>> running with defaults, the regionserver carrying meta was 100% CPU because
>> of all the requests from Canary doing repeated full-table Scans.
>>
>> 6 seconds is too short. Seems like a typo that should be 60 seconds. It is
>> not as though the Canary is going to do anything about it if it finds
>> something wrong.
>>
>> S
>>
>> > On Sat, Feb 4, 2017 at 5:40 PM, Ted Yu <[email protected]> wrote:
>> > > Brief search on HBASE-4393 didn't reveal why the interval was shortened.
>> > >
>> > > If you read the first paragraph of:
>> > > http://hbase.apache.org/book.html#_run_canary_test_as_daemon_mode
>> > >
>> > > possibly the reasoning was that canary would exit upon seeing some error
>> > > (the first time).
>> > >
>> > > BTW There was a mismatch in the description for this command: (5 seconds
>> > > vs. 50000 milliseconds)
>> > >
>> > > ${HBASE_HOME}/bin/hbase canary -daemon -interval 50000 -f false
>> > >
>> > > On Sat, Feb 4, 2017 at 8:21 AM, Lars George <[email protected]> wrote:
>> > >
>> > >> Oh right, Ted. An earlier patch attached to the JIRA had 60 secs, the
>> > >> last one has 6 secs. Am I reading this right? It hands 6000 into the
>> > >> Thread.sleep() call, which takes millisecs. So that makes 6 secs
>> > >> between checks, which seems super short, no? I might just be dull here.
>> > >>
>> > >> On Sat, Feb 4, 2017 at 5:00 PM, Ted Yu <[email protected]> wrote:
>> > >> > For the default interval, if you were looking at:
>> > >> >
>> > >> > private static final long DEFAULT_INTERVAL = 6000;
>> > >> >
>> > >> > The above was from:
>> > >> >
>> > >> > HBASE-4393 Implement a canary monitoring program
>> > >> >
>> > >> > which was integrated on Tue Apr 24 07:20:16 2012
>> > >> >
>> > >> > FYI
>> > >> >
>> > >> > On Sat, Feb 4, 2017 at 4:06 AM, Lars George <[email protected]> wrote:
>> > >> >
>> > >> >> Also, the default interval used to be 60 secs, but is now 6 secs. Does
>> > >> >> that make sense? Seems awfully short for a default, assuming you have
>> > >> >> many regions or servers.
>> > >> >>
>> > >> >> On Sat, Feb 4, 2017 at 11:54 AM, Lars George <[email protected]> wrote:
>> > >> >> > Hi,
>> > >> >> >
>> > >> >> > Looking at the Canary tool, it tries to ensure that all canary test
>> > >> >> > table regions are spread across all region servers. If that is not
>> > >> >> > the case, it calls:
>> > >> >> >
>> > >> >> >     if (numberOfCoveredServers < numberOfServers) {
>> > >> >> >       admin.balancer();
>> > >> >> >     }
>> > >> >> >
>> > >> >> > I doubt this will help with the StochasticLoadBalancer, which is
>> > >> >> > known to consider per-table balancing as only one of many factors.
>> > >> >> > In practice, the SLB will most likely _not_ distribute the canary
>> > >> >> > regions sufficiently, leaving gaps in the check. Switching on the
>> > >> >> > per-table option is discouraged, so as to let the SLB do its thing.
>> > >> >> >
>> > >> >> > Just pointing it out for vetting.
>> > >> >> >
>> > >> >> > Lars
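
For anyone following the unit arithmetic and the coverage check discussed in this thread, here is a minimal sketch. It is not taken from the Canary source. It assumes only what is quoted above: DEFAULT_INTERVAL = 6000 is handed to Thread.sleep(), which takes milliseconds, and the tool compares the number of covered servers against the number of servers. The class name, PROPOSED_INTERVAL_MS, the uncoveredServers helper, and the server names are all hypothetical and used for illustration only.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CanaryIntervalSketch {
    // Value quoted in the thread (from HBASE-4393); Thread.sleep() takes milliseconds.
    public static final long DEFAULT_INTERVAL_MS = 6000;

    // Hypothetical alternative matching the "60 secs" mentioned in the thread.
    public static final long PROPOSED_INTERVAL_MS = 60000;

    // Convert a millisecond interval to whole seconds.
    public static long toSeconds(long millis) {
        return Duration.ofMillis(millis).getSeconds();
    }

    // Hypothetical helper: count the servers that host no canary region.
    // A result above zero corresponds to numberOfCoveredServers < numberOfServers.
    public static int uncoveredServers(List<String> allServers, List<String> canaryRegionHosts) {
        Set<String> uncovered = new HashSet<>(allServers);
        uncovered.removeAll(canaryRegionHosts);
        return uncovered.size();
    }

    public static void main(String[] args) {
        System.out.println("default interval:  " + toSeconds(DEFAULT_INTERVAL_MS) + " s");
        System.out.println("proposed interval: " + toSeconds(PROPOSED_INTERVAL_MS) + " s");

        // Made-up layout: two canary regions landed on rs1, none on rs2.
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
        List<String> hosts = Arrays.asList("rs1", "rs1", "rs3");
        System.out.println("servers without a canary region: " + uncoveredServers(servers, hosts));
    }
}
```

With the made-up layout in main, one server is left without a canary region, which is the kind of gap the original post is worried about when the balancer does not spread the canary table evenly.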
