Re: Canary Test Tool and write sniffing

Lars George Sat, 11 Feb 2017 02:09:32 -0800

Please keep in mind we are talking about two issues here:

1) The short default interval time, and
2) the issue that the canary table regions might not be on all servers.
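
On 1), to make the unit arithmetic concrete, here is a minimal sketch in plain Java. It is not Canary code: the class and helper names are made up for illustration, and only the 6000 value comes from the DEFAULT_INTERVAL line quoted further down.

```java
import java.util.concurrent.TimeUnit;

// Illustration only; hypothetical class, not part of HBase.
class CanaryInterval {
    // The value quoted in this thread; Thread.sleep() interprets it as milliseconds.
    static final long DEFAULT_INTERVAL_MS = 6000L;

    // Convert an interval given in milliseconds to whole seconds.
    static long toSeconds(long intervalMs) {
        return TimeUnit.MILLISECONDS.toSeconds(intervalMs);
    }

    // A 60 second interval written so that the unit cannot be misread.
    static long sixtySecondsInMs() {
        return TimeUnit.SECONDS.toMillis(60);
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_INTERVAL_MS + " ms is "
            + toSeconds(DEFAULT_INTERVAL_MS) + " s between checks");
        System.out.println("60 s would be " + sixtySecondsInMs() + " ms");
    }
}
```

Spelling the unit out via TimeUnit is one way to avoid a dropped zero going unnoticed.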


Has anyone here tried write sniffing on a current cluster with the
SLB and seen it work?
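
On 2), the coverage gap can be stated as a simple set difference. A rough sketch, assuming you already have the list of live servers and a region-to-server mapping for the canary table (all names here are hypothetical, and no HBase API is used):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only; hypothetical class, not part of HBase.
class CanaryCoverage {
    // Returns the servers that host no canary region, i.e. the servers
    // that a write sniff against the canary table would never touch.
    static Set<String> uncoveredServers(Collection<String> allServers,
                                        Map<String, String> regionToServer) {
        Set<String> uncovered = new TreeSet<>(allServers);
        uncovered.removeAll(regionToServer.values());
        return uncovered;
    }

    public static void main(String[] args) {
        // Made-up assignment: two canary regions, both on the same server.
        Map<String, String> regionToServer = new HashMap<>();
        regionToServer.put("canary-region-1", "rs1");
        regionToServer.put("canary-region-2", "rs1");

        Set<String> gap = uncoveredServers(
            Arrays.asList("rs1", "rs2", "rs3"), regionToServer);
        System.out.println("Servers without a canary region: " + gap);
    }
}
```

If that set is non-empty after the balancer has run, the write check is silently skipping those servers.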

Best,
Lars


On Mon, Feb 6, 2017 at 10:38 PM, Enis Söztutar <[email protected]> wrote:
> Open an issue?
> Enis
>
> On Mon, Feb 6, 2017 at 9:39 AM, Stack <[email protected]> wrote:
>
>> On Sun, Feb 5, 2017 at 2:25 AM, Lars George <[email protected]> wrote:
>>
>> > The next example is wrong too, claiming to show 60 secs, while it
>> > shows 600 secs (the default value as well).
>> >
>> > The question is still, what is a good value for intervals? Anyone here
>> > that uses the Canary that would like to chime in?
>> >
>> >
>> I was hanging out with a user where on a mid-sized cluster with Canary
>> running with defaults, the regionserver carrying meta was 100% CPU because
>> of all the requests from Canary doing repeated full-table Scans.
>>
>> 6 seconds is too short. Seems like a typo that should be 60 seconds. It is
>> not as though the Canary is going to do anything about it if it finds
>> something wrong.
>>
>> S
>>
>>
>>
>>
>> > On Sat, Feb 4, 2017 at 5:40 PM, Ted Yu <[email protected]> wrote:
>> > > Brief search on HBASE-4393 didn't reveal why the interval was
>> > > shortened.
>> > >
>> > > If you read the first paragraph of:
>> > > http://hbase.apache.org/book.html#_run_canary_test_as_daemon_mode
>> > >
>> > > possibly the reasoning was that canary would exit upon seeing some
>> > > error (the first time).
>> > >
>> > > BTW There was a mismatch in the description for this command: (5
>> > > seconds vs. 50000 milliseconds)
>> > >
>> > > ${HBASE_HOME}/bin/hbase canary -daemon -interval 50000 -f false
>> > >
>> > >
>> > > On Sat, Feb 4, 2017 at 8:21 AM, Lars George <[email protected]> wrote:
>> > >
>> > >> Oh right, Ted. An earlier patch attached to the JIRA had 60 secs, the
>> > >> last one has 6 secs. Am I reading this right? It hands 6000 into the
>> > >> Thread.sleep() call, which takes millisecs. So that makes 6 secs
>> > >> between checks, which seems super short, no? I might just be dull here.
>> > >>
>> > >> On Sat, Feb 4, 2017 at 5:00 PM, Ted Yu <[email protected]> wrote:
>> > >> > For the default interval, if you were looking at:
>> > >> >
>> > >> >   private static final long DEFAULT_INTERVAL = 6000;
>> > >> >
>> > >> > The above was from:
>> > >> >
>> > >> >     HBASE-4393 Implement a canary monitoring program
>> > >> >
>> > >> > which was integrated on Tue Apr 24 07:20:16 2012
>> > >> >
>> > >> > FYI
>> > >> >
>> > >> > On Sat, Feb 4, 2017 at 4:06 AM, Lars George <[email protected]> wrote:
>> > >> >
>> > >> >> Also, the default interval used to be 60 secs, but is now 6 secs.
>> > >> >> Does that make sense? Seems awfully short for a default, assuming
>> > >> >> you have many regions or servers.
>> > >> >>
>> > >> >> On Sat, Feb 4, 2017 at 11:54 AM, Lars George <[email protected]> wrote:
>> > >> >> > Hi,
>> > >> >> >
>> > >> >> > Looking at the Canary tool, it tries to ensure that all canary
>> > >> >> > test table regions are spread across all region servers. If
>> > >> >> > that is not the case, it calls:
>> > >> >> >
>> > >> >> > if (numberOfCoveredServers < numberOfServers) {
>> > >> >> >   admin.balancer();
>> > >> >> > }
>> > >> >> >
>> > >> >> > I doubt this will help with the StochasticLoadBalancer, which is
>> > >> >> > known to consider per-table balancing as only one of many factors.
>> > >> >> > In practice, the SLB will most likely _not_ distribute the canary
>> > >> >> > regions sufficiently, leaving a gap in the check. Switching on the
>> > >> >> > per-table option is discouraged, so that the SLB can do its thing.
>> > >> >> >
>> > >> >> > Just pointing it out for vetting.
>> > >> >> >
>> > >> >> > Lars
>> > >> >>
>> > >>
>> >
>>
