[GitHub] [accumulo] ctubbsii commented on pull request #2665: Eventually Consistent scans / ScanServer feature

GitBox Wed, 15 Jun 2022 21:31:55 -0700


ctubbsii commented on PR #2665:
URL: https://github.com/apache/accumulo/pull/2665#issuecomment-1157220591

   @keith-turner wrote:
   > I am not sure if you are implying the client side plugin should have 
control over choosing tservers and sservers. If so, I would like to avoid that 
and keep the plugin narrowly scoped to choosing scan servers because of the 
following:

   That was what I was suggesting. The advantage of flattening the two decisions
(choosing between tservers and sservers, and choosing among the sservers) is
that there is only one branching point when you zoom out and look at the server
selection logic, instead of two. Flattening simplifies the bigger picture, but
potentially makes the plugin more complicated. Keeping those decisions separate
makes the zoomed-out view look more complicated, but makes the plugin's job
simpler.

   Having one decision point also enables more complex selection features in
the plugin, such as "I don't care if I get a tserver or a sserver... treat them
equally", or "try a tserver first, but settle for a sserver if the tserver's
load is high". The plugin can't do that if it only makes its decision after the
tserver has already been excluded.
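   To make the single-decision-point idea concrete, here is a minimal Java sketch of the "try a tserver first, but settle for a sserver if the tserver's load is high" policy. All of the types (`ServerKind`, `ServerChoice`, `FlatServerSelector`, `PreferTserverUnlessLoaded`) and the `tserverLoad` input are hypothetical illustrations, not the SPI proposed in this PR.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical types for illustration; not the SPI proposed in this PR.
enum ServerKind { TSERVER, SSERVER }

final class ServerChoice {
  final ServerKind kind;
  final String address;

  ServerChoice(ServerKind kind, String address) {
    this.kind = kind;
    this.address = address;
  }
}

/** One flattened decision point: the plugin may return the tserver or any sserver. */
interface FlatServerSelector {
  ServerChoice select(Supplier<String> tserverLocator, List<String> scanServers,
      double tserverLoad);
}

/** "Try a tserver first, but settle for a sserver if the tserver's load is high." */
final class PreferTserverUnlessLoaded implements FlatServerSelector {
  private final double maxTserverLoad;
  private final AtomicInteger next = new AtomicInteger();

  PreferTserverUnlessLoaded(double maxTserverLoad) {
    this.maxTserverLoad = maxTserverLoad;
  }

  @Override
  public ServerChoice select(Supplier<String> tserverLocator, List<String> scanServers,
      double tserverLoad) {
    if (tserverLoad <= maxTserverLoad || scanServers.isEmpty()) {
      // The tserver is acceptable (or the only option), so locate it now.
      return new ServerChoice(ServerKind.TSERVER, tserverLocator.get());
    }
    // Tserver is overloaded: round-robin across the available scan servers.
    int i = Math.floorMod(next.getAndIncrement(), scanServers.size());
    return new ServerChoice(ServerKind.SSERVER, scanServers.get(i));
  }
}
```

   Round-robin among the sservers is just one possible choice here; the point is that a single plugin sees both options and decides between them.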

   > * Any scan server can be chosen to service a query for a tablet.  Only one 
tserver can be chosen to service a tablet scan.

   The selector plugin does not need to be responsible for all the logic
that identifies the one tserver. It can be provided with a Supplier that 
executes our current logic, so it can have the option of selecting the tserver, 
but without all the complexity of locating it.
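   A sketch of the Supplier idea, assuming a hypothetical `LazyTserverLocation` wrapper around whatever function currently locates the tserver (passed in here as a plain `Supplier<String>`), and assuming one wrapper is created per selection so a cached location cannot go stale across scans. The plugin calls `get()` only if it decides on the tserver, and the existing lookup runs only then.

```java
import java.util.function.Supplier;

// Hypothetical wrapper, not part of Accumulo: hands a selector plugin the existing
// tserver-location logic without exposing how that logic works.
final class LazyTserverLocation implements Supplier<String> {
  private final Supplier<String> locateLogic;
  private String cached; // null until the first lookup that returns a location

  LazyTserverLocation(Supplier<String> locateLogic) {
    this.locateLogic = locateLogic;
  }

  /** Runs the existing lookup on first use only; later calls reuse the result. */
  @Override
  public synchronized String get() {
    if (cached == null) {
      cached = locateLogic.get();
    }
    return cached;
  }
}
```

   A selector that always picks a sserver never calls `get()`, so it pays nothing for the tserver lookup.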

   > * Scan servers have a busy timeout and tservers do not.  The plugin 
specifies the busy timeout to use.
   > * History of busy timeout events is given to the plugin.  This allows it 
to possibly choose a different scan server based on past events.

   I don't think there's any reason a tserver can't have some of those 
features, in case a selector plugin wanted to treat the tserver as another 
possible scan server to choose from. Those features wouldn't be of much use if 
immediate consistency were required... but if it's not required, it would 
certainly be acceptable for a selector to choose the tserver if the sservers 
are busy or unavailable.
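   A minimal sketch of a selector that treats the tserver as a fallback, assuming hypothetical `Consistency` and `BusyAwareSelector` types, a per-sserver count of recent busy-timeout events as the history, and an illustrative `maxBusyEvents` threshold. An empty `Optional` stands for "use the tserver".

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the types and the busy-event threshold are illustrative only.
enum Consistency { IMMEDIATE, EVENTUAL }

final class BusyAwareSelector {
  private final int maxBusyEvents;

  BusyAwareSelector(int maxBusyEvents) {
    this.maxBusyEvents = maxBusyEvents;
  }

  /**
   * @param busyHistory recent busy-timeout events observed per sserver
   * @return the sserver to use, or Optional.empty() meaning "use the tserver"
   */
  Optional<String> select(Consistency consistency, List<String> scanServers,
      Map<String, Integer> busyHistory) {
    if (consistency == Consistency.IMMEDIATE) {
      return Optional.empty(); // only the tserver hosting the tablet is immediately consistent
    }
    String best = null;
    int bestBusy = Integer.MAX_VALUE;
    for (String server : scanServers) {
      int busy = busyHistory.getOrDefault(server, 0);
      if (busy < bestBusy) {
        best = server;
        bestBusy = busy;
      }
    }
    // Eventual consistency: fall back to the tserver when no sserver exists or every
    // sserver has been busy more often than the threshold allows.
    if (best == null || bestBusy > maxBusyEvents) {
      return Optional.empty();
    }
    return Optional.of(best);
  }
}
```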

   I'm also wondering if the "busy timeout" concept can be generalized. For
example, instead of using a queue wait timeout, a sserver could be considered
"busy" if its CPU load were high, or based on some other metric. Computing this
weight could be another SPI added in the future. The first pass could just be
the current "busy timeout"... just under a more generic name, so that later it
need not be a timeout at all, but some other selection weight.
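   One way the generalized weight could look, sketched with a hypothetical `ServerWeigher` interface: the first implementation derives the weight from busy-timeout events, and a later one could use CPU load instead, without the selection code changing. None of these names exist in Accumulo, and the CPU-load figures are assumed to be reported from elsewhere.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical future SPI: a generic "selection weight" instead of a timeout-specific
// concept. Lower weight means "less busy". None of these names exist in Accumulo.
interface ServerWeigher {
  double weight(String server);
}

/** First pass: derive the weight from busy-timeout events, as the current design does. */
final class BusyTimeoutWeigher implements ServerWeigher {
  private final Map<String, Integer> busyTimeouts;

  BusyTimeoutWeigher(Map<String, Integer> busyTimeouts) {
    this.busyTimeouts = busyTimeouts;
  }

  @Override
  public double weight(String server) {
    return busyTimeouts.getOrDefault(server, 0).doubleValue();
  }
}

/** Possible later alternative: weight by reported CPU load in the range 0.0 to 1.0. */
final class CpuLoadWeigher implements ServerWeigher {
  private final Map<String, Double> cpuLoad;

  CpuLoadWeigher(Map<String, Double> cpuLoad) {
    this.cpuLoad = cpuLoad;
  }

  @Override
  public double weight(String server) {
    return cpuLoad.getOrDefault(server, 1.0); // unknown load is treated as fully loaded
  }
}

/** Selection logic stays the same no matter how the weight is computed. */
final class LeastWeightSelector {
  static Optional<String> select(List<String> servers, ServerWeigher weigher) {
    return servers.stream().min(Comparator.comparingDouble(weigher::weight));
  }
}
```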

   > Also the logic for choosing a tserver is not flexible and there is 
basically only one way to do it ATM.

   As explained here, I'm not proposing that we diverge from the current single
way of doing this. I'm only proposing that the selector be allowed to select the
tserver.

   > I think it makes sense to pass the scan exec hint

   I concede this point. The selection of the sserver is still part of the 
overall execution of the scan, and could make use of these hints, even if they 
are not used to determine whether the tserver is selected or the sservers are 
selected.
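   As a sketch of how a sserver selector might consume hints, assuming the hints reach the plugin as a `Map<String, String>` and using a made-up `preferred_group` key to pick a group of sservers (the key, the grouping, and `HintAwareSelector` are all hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the hint key and the sserver grouping are illustrative assumptions.
final class HintAwareSelector {
  static final String GROUP_HINT = "preferred_group"; // made-up hint key

  private final Map<String, List<String>> serversByGroup;
  private final String defaultGroup;

  HintAwareSelector(Map<String, List<String>> serversByGroup, String defaultGroup) {
    this.serversByGroup = serversByGroup;
    this.defaultGroup = defaultGroup;
  }

  /** Narrows the candidate sservers using the scan's execution hints, then picks one. */
  Optional<String> select(Map<String, String> executionHints, String tabletId) {
    String group = executionHints.getOrDefault(GROUP_HINT, defaultGroup);
    List<String> candidates = serversByGroup.getOrDefault(group, List.of());
    if (candidates.isEmpty()) {
      // Unknown or empty group: fall back to the default group.
      candidates = serversByGroup.getOrDefault(defaultGroup, List.of());
    }
    if (candidates.isEmpty()) {
      return Optional.empty();
    }
    // Hashing the tablet id keeps repeated scans of a tablet on the same sserver,
    // as long as the group's membership does not change.
    return Optional.of(candidates.get(Math.floorMod(tabletId.hashCode(), candidates.size())));
  }
}
```

   Hashing the tablet id may help cache reuse on the chosen sserver; a random pick within the group would also work if spreading load matters more.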
