Yeah... cancel really mostly serves to free up the cluster for the next request. It can also help keep a node from going postal, but that mostly benefits the next query, not the current one.
On Tue, Sep 11, 2012 at 6:54 PM, Jason Frantz <[email protected]> wrote:

> Definitely agree with many of the points in the link.
>
> The PowerDrill paper also mentions a variant of this where each query
> fragment is sent to two machines, and the results for that fragment are
> used from whatever machine responds first. So in that case it's not so much
> a "cancel" as an "ignore".
>
> On Tue, Sep 11, 2012 at 11:37 AM, Ted Dunning <[email protected]> wrote:
>
> > Headed into Thursday's meetup, this paper by Jeff Dean provides a very good
> > description of strategies for getting fast response times with variable
> > quality infrastructure.
> >
> > http://research.google.com/people/jeff/latency.html
> >
> > The key point here is that it is very important to have asynchronous
> > queries with a cancel. Above that level, there needs to be a simple
> > strategy for pushing second versions of queries out to the workers and
> > canceling defunct or redundant queries.
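For illustration, here is a minimal sketch of the "asynchronous query plus cancel" pattern discussed above, written with Python's asyncio. It assumes a query fragment can be sent to either of two replicas; the names `query_replica`, `hedged_query`, `hedge_after`, and the `CANCELLED` list are hypothetical, and the replica work is simulated with `asyncio.sleep` rather than a real cluster call. The primary gets the fragment first; if it has not answered within `hedge_after` seconds, a second copy goes to the backup, the first answer wins, and the loser is cancelled so it stops consuming worker capacity.

```python
import asyncio

# Demo-only record of which replicas observed a cancel (hypothetical).
CANCELLED = []


async def query_replica(replica, fragment, delay):
    """Hypothetical worker call; asyncio.sleep stands in for real work."""
    try:
        await asyncio.sleep(delay)
        return f"{fragment}@{replica}"
    except asyncio.CancelledError:
        # A real worker would abort its scan here and release resources,
        # which is what frees the cluster for the next request.
        CANCELLED.append(replica)
        raise


async def hedged_query(fragment, replicas, delays, hedge_after):
    """Send to the primary; hedge to the backup if it is slow; cancel the loser."""
    tasks = [asyncio.create_task(query_replica(replicas[0], fragment, delays[0]))]
    done, _ = await asyncio.wait(tasks, timeout=hedge_after)
    if not done:
        # Primary is slow: push a second version of the query to the backup.
        tasks.append(
            asyncio.create_task(query_replica(replicas[1], fragment, delays[1]))
        )
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    winner = done.pop()
    for task in tasks:
        if task is not winner:
            task.cancel()  # cancel the redundant query instead of just ignoring it
    # Let cancellations finish; return_exceptions keeps gather from raising.
    await asyncio.gather(*tasks, return_exceptions=True)
    return winner.result()


if __name__ == "__main__":
    # Primary takes 0.5 s, so after 0.05 s the backup is queried and wins.
    print(asyncio.run(hedged_query("frag1", ["a", "b"], [0.5, 0.01], hedge_after=0.05)))
```

Launching both tasks up front, with no hedge delay, gives the two-machines variant Jason describes; the only remaining choice is whether the slower copy is cancelled (as here) or merely ignored and left to run to completion.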
