ctubbsii commented on PR #2665: URL: https://github.com/apache/accumulo/pull/2665#issuecomment-1157220591
@keith-turner wrote:

> I am not sure if you are implying the client side plugin should have control over choosing tservers and sservers. If so, I would like to avoid that and keep the plugin narrowly scoped to choosing scan servers because of the following:

That was what I was suggesting. The advantage of flattening the two decisions (choosing between tservers and sservers, and choosing among the sservers) is that there is only one branching point when you zoom out and look at the server selection logic, instead of two. Flattening simplifies the bigger picture, but potentially makes the plugin more complicated. Keeping those decisions separate makes the zoomed-out view look more complicated, but the job of the plugin is simpler.

Having one decision point also enables more complex selection features in the plugin, like "I don't care if I get a tserver or a sserver... treat them equally", or "try a tserver first, but settle for a sserver if the tserver's load is high". The plugin can't do that if it is narrowly focused on a decision after the tserver is excluded.

> * Any scan server can be chosen to service a query for a tablet. Only one tserver can be chosen to service a tablet scan.

The selector plugin does not need to be responsible for all the logic that identifies the one tserver. It can be provided with a Supplier that executes our current logic, so it can have the option of selecting the tserver, but without all the complexity of locating it.

> * Scan servers have a busy timeout and tservers do not. The plugin specifies the busy timeout to use.
> * History of busy timeout events is given to the plugin. This allows it to possibly choose a different scan server based on past events.

I don't think there's any reason a tserver can't have some of those features, in case a selector plugin wanted to treat the tserver as another possible scan server to choose from. Those features wouldn't be of much use if immediate consistency were required...
but if it's not required, it would certainly be acceptable for a selector to choose the tserver if the sservers are busy or unavailable.

I'm also wondering if the "busy timeout" concept can be generalized. For example, instead of a queue wait timeout, a sserver could be considered "busy" if its CPU load was high, or by some other measure. Computing this weight could be another SPI added in the future. The first pass could just be the current "busy timeout", only with a more generic name, so that it doesn't strictly have to be a timeout in the future and could be some other selection weight.

> Also the logic for choosing a tserver is not flexible and there is basically only one way to do it ATM.

As explained here, I'm not proposing that we diverge from the current one way to do this. I'm only proposing that the selector be allowed to select it.

> I think it makes sense to pass the scan exec hint

I concede this point. The selection of the sserver is still part of the overall execution of the scan, and could make use of these hints, even if they are not used to determine whether the tserver is selected or the sservers are selected.
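
To make the single-decision-point idea concrete, here is a rough sketch of what such a selector might look like. None of these names come from the PR: `ServerSelector`, `Choice`, `PreferScanServers`, the weight map, and the host names are all hypothetical, and the "busy weight" (0.0 for idle, 1.0 for saturated) is just one assumed way to generalize the busy timeout. The tserver is passed in as a `Supplier`, so the existing location logic would only run if the selector actually picks the tserver.

```java
import java.util.Map;
import java.util.function.Supplier;

class SelectorSketch {

  /** Hypothetical selection result: the chosen server and whether it is the tserver. */
  static final class Choice {
    final String server;
    final boolean isTserver;

    Choice(String server, boolean isTserver) {
      this.server = server;
      this.isTserver = isTserver;
    }
  }

  /** Hypothetical flattened selector: one decision point covers the tserver and all sservers. */
  interface ServerSelector {
    Choice select(Supplier<String> tserverLocator, Map<String,Double> sserverWeights,
        boolean immediateConsistency);
  }

  /** Prefers the least busy sserver; settles for the tserver when every sserver is too busy. */
  static final class PreferScanServers implements ServerSelector {
    private final double maxWeight; // assumed threshold above which a sserver counts as busy

    PreferScanServers(double maxWeight) {
      this.maxWeight = maxWeight;
    }

    @Override
    public Choice select(Supplier<String> tserverLocator, Map<String,Double> sserverWeights,
        boolean immediateConsistency) {
      // Scan servers are only candidates when immediate consistency is not required.
      if (!immediateConsistency) {
        String best = null;
        double bestWeight = Double.MAX_VALUE;
        for (Map.Entry<String,Double> e : sserverWeights.entrySet()) {
          if (e.getValue() <= maxWeight && e.getValue() < bestWeight) {
            best = e.getKey();
            bestWeight = e.getValue();
          }
        }
        if (best != null) {
          return new Choice(best, false);
        }
      }
      // Only now is the (possibly expensive) tserver location logic invoked.
      return new Choice(tserverLocator.get(), true);
    }
  }

  public static void main(String[] args) {
    ServerSelector selector = new PreferScanServers(0.8);
    Supplier<String> tserver = () -> "tserver-a"; // stands in for the current location logic
    Choice c = selector.select(tserver, Map.of("sserver-1", 0.9, "sserver-2", 0.3), false);
    System.out.println(c.server + " tserver=" + c.isTserver);
  }
}
```

A selector written this way could just as easily express "try the tserver first" by calling the supplier up front. The trade-off is the one described above: the plugin contract is wider, but there is a single place where the server is chosen.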
