[
https://issues.apache.org/jira/browse/LENS-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436362#comment-15436362
]
Rajat Khandelwal commented on LENS-899:
---------------------------------------
I gave it another round of thought and arrived at the following approach. Names
that look like class names are placeholders meant to convey the intended behavior.
h3. Approach
* The Driver interface will have a method that returns a RetryPolicyDecider.
RetryPolicyDecider is a new interface that takes a QueryContext and returns a
BackOffRetryHandler. This lets us plug in something like a driver-level policy
factory, which can look at query errors and return a retry policy for the
query. Here, a retry policy is just a BackOffRetryHandler, which is an existing
interface. Its signature is reproduced below to put the next points in context.
{noformat}
public interface BackOffRetryHandler {
/**
* To know whether operation can be done now.
*
* @param failContext FailureContext holding failures till now.
*
* @return true if operation can be done now, false otherwise.
*/
boolean canTryOpNow(FailureContext failContext);
/**
* Get the time when the operation can be done next.
*
* @param failContext FailureContext holding failures till now.
*
* @return Next operation time in millis since epoch
*/
long getOperationNextTime(FailureContext failContext);
/**
* Whether the operation has exhausted all its retries.
*
* @param failContext FailureContext holding failures till now.
*
* @return true if all retries have been exhausted, false otherwise.
*/
boolean hasExhaustedRetries(FailureContext failContext);
}
{noformat}
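A minimal sketch of how a driver-level decider could plug in, assuming FailureContext has already been replaced by a type parameter (as proposed in this comment). QueryContext here is a stripped-down stand-in, and decidePolicy, RetryAwareDriver, getRetryPolicyDecider, NoRetry, ImmediateRetry and ErrorClassifyingDecider are hypothetical names, not existing Lens APIs.

```java
import java.util.function.Predicate;

// Hypothetical, stripped-down stand-in for Lens' QueryContext.
class QueryContext {
  final String errorMessage;
  final int failedAttempts; // attempts already followed by a retry

  QueryContext(String errorMessage, int failedAttempts) {
    this.errorMessage = errorMessage;
    this.failedAttempts = failedAttempts;
  }
}

// BackOffRetryHandler with FailureContext replaced by a type parameter (assumed).
interface BackOffRetryHandler<FC> {
  boolean canTryOpNow(FC failContext);
  long getOperationNextTime(FC failContext);
  boolean hasExhaustedRetries(FC failContext);
}

// The proposed interface: given a failed query, return its retry policy.
interface RetryPolicyDecider {
  BackOffRetryHandler<QueryContext> decidePolicy(QueryContext query);
}

// Hypothetical driver-side hook exposing the driver's decider.
interface RetryAwareDriver {
  RetryPolicyDecider getRetryPolicyDecider();
}

// Never retries.
class NoRetry implements BackOffRetryHandler<QueryContext> {
  @Override public boolean canTryOpNow(QueryContext q) { return false; }
  @Override public long getOperationNextTime(QueryContext q) { return Long.MAX_VALUE; }
  @Override public boolean hasExhaustedRetries(QueryContext q) { return true; }
}

// Retries without delay until maxRetries retries have been done.
class ImmediateRetry implements BackOffRetryHandler<QueryContext> {
  private final int maxRetries;

  ImmediateRetry(int maxRetries) { this.maxRetries = maxRetries; }

  @Override public boolean canTryOpNow(QueryContext q) { return true; }
  @Override public long getOperationNextTime(QueryContext q) { return System.currentTimeMillis(); }
  @Override public boolean hasExhaustedRetries(QueryContext q) { return q.failedAttempts >= maxRetries; }
}

// Example decider: inspects the query error and retries only errors that the
// supplied predicate classifies as transient (the classification is an assumption).
class ErrorClassifyingDecider implements RetryPolicyDecider {
  private final Predicate<String> isTransient;

  ErrorClassifyingDecider(Predicate<String> isTransient) { this.isTransient = isTransient; }

  @Override
  public BackOffRetryHandler<QueryContext> decidePolicy(QueryContext query) {
    return isTransient.test(query.errorMessage) ? new ImmediateRetry(3) : new NoRetry();
  }
}
```

A driver could then expose its decider with something like RetryAwareDriver driver = () -> new ErrorClassifyingDecider(msg -> msg.contains("timeout")); and the server would consult decidePolicy only when one of that driver's queries fails.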
* On query failure, the retry policy factory of the driver that ran the query is
asked to provide a policy. For example, a policy that does no retries could be
implemented by the following class:
{noformat}
class NoRetryPolicy implements BackOffRetryHandler {
// other methods omitted to highlight the one below
@Override
public boolean hasExhaustedRetries(FailureContext failContext) {
return true;
}
}
{noformat}
FailureContext will be replaced by a generic type, and there will be a class
that wraps QueryContext in a FailureContext-like object. hasExhaustedRetries
can then read the query's current retry state, such as the number of attempts
already made, and decide whether to allow another retry. This is how the
policy controls how many retries are done.
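A sketch of the generified handler together with a wrapper that exposes a query's retry state, assuming hypothetical FailedAttempt, QueryContext and QueryFailureContext types (not existing Lens classes). NoRetryPolicy is the class above with its omitted methods filled in one plausible way, and the hypothetical MaxRetriesPolicy shows hasExhaustedRetries alone capping the number of retries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one failed attempt that was followed by a retry.
class FailedAttempt {
  final long launchTime;
  final long finishTime;
  final String errorMessage;

  FailedAttempt(long launchTime, long finishTime, String errorMessage) {
    this.launchTime = launchTime;
    this.finishTime = finishTime;
    this.errorMessage = errorMessage;
  }
}

// Hypothetical stand-in for QueryContext that remembers its failed attempts.
class QueryContext {
  final List<FailedAttempt> failedAttempts = new ArrayList<>();
}

// BackOffRetryHandler with FailureContext replaced by a type parameter (assumed).
interface BackOffRetryHandler<FC> {
  boolean canTryOpNow(FC failContext);
  long getOperationNextTime(FC failContext);
  boolean hasExhaustedRetries(FC failContext);
}

// Hypothetical wrapper presenting a QueryContext as a failure context.
class QueryFailureContext {
  private final QueryContext query;

  QueryFailureContext(QueryContext query) { this.query = query; }

  // failedAttempts only holds attempts that were retried, so this is retries done so far.
  int retriesDone() { return query.failedAttempts.size(); }

  long lastFailureTime() {
    List<FailedAttempt> attempts = query.failedAttempts;
    return attempts.isEmpty() ? 0L : attempts.get(attempts.size() - 1).finishTime;
  }
}

// No retries: every failure is final.
class NoRetryPolicy implements BackOffRetryHandler<QueryFailureContext> {
  @Override public boolean canTryOpNow(QueryFailureContext fc) { return false; }
  @Override public long getOperationNextTime(QueryFailureContext fc) { return Long.MAX_VALUE; }
  @Override public boolean hasExhaustedRetries(QueryFailureContext fc) { return true; }
}

// Allows up to maxRetries retries, each without delay.
class MaxRetriesPolicy implements BackOffRetryHandler<QueryFailureContext> {
  private final int maxRetries;

  MaxRetriesPolicy(int maxRetries) { this.maxRetries = maxRetries; }

  @Override public boolean canTryOpNow(QueryFailureContext fc) { return true; }
  @Override public long getOperationNextTime(QueryFailureContext fc) { return fc.lastFailureTime(); }
  @Override public boolean hasExhaustedRetries(QueryFailureContext fc) {
    return fc.retriesDone() >= maxRetries;
  }
}
```

With MaxRetriesPolicy(2), a query gets one original attempt plus two retries; the third failure is final.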
* Note that the other two methods of BackOffRetryHandler (canTryOpNow and
getOperationNextTime) can be used to build a more complex policy.
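For instance, canTryOpNow and getOperationNextTime are enough to express exponential backoff. This is a sketch with a hypothetical RetryState failure context and ExponentialBackOffPolicy class; the injected LongSupplier clock is an assumption made so the timing is testable (System::currentTimeMillis would be used in practice).

```java
import java.util.function.LongSupplier;

// Hypothetical failure context: failed attempts so far (including the latest)
// and when the latest one failed, in millis since epoch.
class RetryState {
  final int failures;
  final long lastFailureMillis;

  RetryState(int failures, long lastFailureMillis) {
    this.failures = failures;
    this.lastFailureMillis = lastFailureMillis;
  }
}

// BackOffRetryHandler with FailureContext replaced by a type parameter (assumed).
interface BackOffRetryHandler<FC> {
  boolean canTryOpNow(FC failContext);
  long getOperationNextTime(FC failContext);
  boolean hasExhaustedRetries(FC failContext);
}

// After the n-th failure, wait baseDelayMillis * 2^(n-1), capped at maxDelayMillis.
class ExponentialBackOffPolicy implements BackOffRetryHandler<RetryState> {
  private final long baseDelayMillis;
  private final long maxDelayMillis;
  private final int maxRetries;
  private final LongSupplier clock;

  ExponentialBackOffPolicy(long baseDelayMillis, long maxDelayMillis, int maxRetries,
      LongSupplier clock) {
    this.baseDelayMillis = baseDelayMillis;
    this.maxDelayMillis = maxDelayMillis;
    this.maxRetries = maxRetries;
    this.clock = clock;
  }

  @Override
  public boolean canTryOpNow(RetryState state) {
    return clock.getAsLong() >= getOperationNextTime(state);
  }

  @Override
  public long getOperationNextTime(RetryState state) {
    // Bound the shift so the doubling cannot overflow a long.
    int shift = Math.min(Math.max(state.failures - 1, 0), 30);
    long delay = Math.min(baseDelayMillis << shift, maxDelayMillis);
    return state.lastFailureMillis + delay;
  }

  @Override
  public boolean hasExhaustedRetries(RetryState state) {
    // failures - 1 retries have been done so far.
    return state.failures > maxRetries;
  }
}
```

With a 1000 ms base, the first three retries become eligible 1, 2 and 4 seconds after the corresponding failure; the cap keeps later delays bounded.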
* After getting the policy from the driver, hasExhaustedRetries will be called
to decide whether to retry.
* If we decide to retry, a FailedAttempt will be extracted from the query, the
query's state will be reset to QUEUED, and the query will be put at the front
of queuedQueries. The extracted attempt object will also be serialized to the
failed_attempts table, which is described in earlier comments.
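A sketch of the failure path just described, assuming hypothetical Query, FailedAttempt and QueryFailureHandler types. In-memory lists stand in for the failed_attempts and finished_queries tables, and a Predicate stands in for the policy's hasExhaustedRetries call; none of this is existing Lens code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

enum QueryStatus { QUEUED, RUNNING, FAILED }

// Hypothetical snapshot of one failed attempt (one failed_attempts row).
class FailedAttempt {
  final String queryHandle;
  final long launchTime;
  final long finishTime;
  final String error;

  FailedAttempt(String queryHandle, long launchTime, long finishTime, String error) {
    this.queryHandle = queryHandle;
    this.launchTime = launchTime;
    this.finishTime = finishTime;
    this.error = error;
  }
}

// Hypothetical, minimal mutable query state; it always describes the latest attempt.
class Query {
  final String handle;
  QueryStatus status = QueryStatus.QUEUED;
  long launchTime;
  long finishTime;
  String error;
  final List<FailedAttempt> failedAttempts = new ArrayList<>();

  Query(String handle) { this.handle = handle; }
}

class QueryFailureHandler {
  final Deque<Query> queuedQueries = new ArrayDeque<>();
  final List<FailedAttempt> failedAttemptsTable = new ArrayList<>(); // failed_attempts stand-in
  final List<Query> finishedQueries = new ArrayList<>(); // finished_queries stand-in
  private final Predicate<Query> hasExhaustedRetries; // stand-in for the driver's policy

  QueryFailureHandler(Predicate<Query> hasExhaustedRetries) {
    this.hasExhaustedRetries = hasExhaustedRetries;
  }

  void onFailure(Query query, long finishTime, String error) {
    if (hasExhaustedRetries.test(query)) {
      // No retry: the last attempt stays on the query and is finished as FAILED.
      query.status = QueryStatus.FAILED;
      query.finishTime = finishTime;
      query.error = error;
      finishedQueries.add(query);
      return;
    }
    // Retry: extract and persist the attempt, reset the state, jump the queue.
    FailedAttempt attempt = new FailedAttempt(query.handle, query.launchTime, finishTime, error);
    query.failedAttempts.add(attempt);
    failedAttemptsTable.add(attempt);
    query.status = QueryStatus.QUEUED;
    queuedQueries.addFirst(query);
  }
}
```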
* QuerySubmitter picks a query from queuedQueries and decides, based on
constraints, whether to send it for launch. Depending on that decision, it
either puts the query in the waiting-queries data structure or launches a
QueryLauncher thread for it. Here, the retry policy will be adapted into a
constraint. The adapter will be something like:
{noformat}
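class BackOffPolicyToConstraintAdapter implements QueryLaunchingConstraint {
@Override
public boolean allowsLaunchOf(QueryContext candidateQuery,
EstimatedImmutableQueryCollection launchedQueries) {
// Policy chosen by the driver when this query last failed. Assumes the
// generified handler, with the query itself as the failure context.
BackOffRetryHandler<QueryContext> policy = candidateQuery.getRetryPolicy();
// Allow the launch only once the policy's backoff has elapsed.
return policy.canTryOpNow(candidateQuery);
}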
{noformat}
For queries with at least one failed attempt, this constraint will be checked
before all other constraints. If the policy imposes a backoff that has not yet
elapsed, the query will be put in waitingQueries and some other query will be
launched. The query won't be looked at again until some running query
finishes, at which point queries are moved from waitingQueries back to
queuedQueries. Queries with at least one failed attempt will be moved only if
their backoff policy allows a retry at that moment. Such a query is put at the
front of queuedQueries, ready to be picked by QuerySubmitter in its next
iteration.
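A toy model of this flow, assuming a hypothetical LaunchScheduler class that plays both QuerySubmitter and the waiting-queries selector. The concurrency cap stands in for the other launching constraints, and time is set explicitly through a now field; this is an illustration, not the Lens implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical query: retries done so far and the time its retry policy
// allows the next attempt (as returned by getOperationNextTime).
class Query {
  final String handle;
  final int failedAttempts;
  final long nextAttemptTime;

  Query(String handle, int failedAttempts, long nextAttemptTime) {
    this.handle = handle;
    this.failedAttempts = failedAttempts;
    this.nextAttemptTime = nextAttemptTime;
  }
}

class LaunchScheduler {
  final Deque<Query> queuedQueries = new ArrayDeque<>();
  final List<Query> waitingQueries = new ArrayList<>();
  final List<Query> launchedQueries = new ArrayList<>();
  private final int maxConcurrentQueries;
  long now; // current time in millis, set explicitly to keep the sketch deterministic

  LaunchScheduler(int maxConcurrentQueries) { this.maxConcurrentQueries = maxConcurrentQueries; }

  // The backoff policy adapted to a launching constraint.
  private boolean backOffAllows(Query q) {
    return q.failedAttempts == 0 || now >= q.nextAttemptTime;
  }

  // Stand-in for the other launching constraints.
  private boolean otherConstraintsAllow(Query q) {
    return launchedQueries.size() < maxConcurrentQueries;
  }

  // One QuerySubmitter iteration; && checks the backoff before the other constraints.
  void submitNext() {
    Query q = queuedQueries.pollFirst();
    if (q == null) {
      return;
    }
    if (backOffAllows(q) && otherConstraintsAllow(q)) {
      launchedQueries.add(q); // would start a QueryLauncher thread
    } else {
      waitingQueries.add(q);
    }
  }

  // When a query finishes, move eligible waiting queries back to queuedQueries.
  void onQueryFinished(Query finished) {
    launchedQueries.remove(finished);
    Iterator<Query> it = waitingQueries.iterator();
    while (it.hasNext()) {
      Query q = it.next();
      if (q.failedAttempts == 0) {
        it.remove();
        queuedQueries.addLast(q);
      } else if (backOffAllows(q)) {
        it.remove();
        queuedQueries.addFirst(q); // retries go to the front
      }
    }
  }
}
```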
* The finished_queries table will contain the last attempt. The failed_attempts
table will contain every attempt after which a decision to retry was taken.
This keeps Lens backward compatible: the default policy can be the no-retry
policy, in which case no retries happen, the failed_attempts table stays empty,
and the meaning of the finished_queries columns does not change.
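The split between the two tables can be sketched as follows, assuming a hypothetical Attempt record holding the columns both tables would share and hypothetical PersistedRows and AttemptPersistence helpers. It also shows why a query that was never retried leaves failed_attempts untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical attempt record: the columns both tables would share.
class Attempt {
  final String queryHandle;
  final String status;
  final long launchTime;
  final long finishTime;

  Attempt(String queryHandle, String status, long launchTime, long finishTime) {
    this.queryHandle = queryHandle;
    this.status = status;
    this.launchTime = launchTime;
    this.finishTime = finishTime;
  }
}

// Rows one finished query contributes to each table.
class PersistedRows {
  final Attempt finishedQueriesRow;
  final List<Attempt> failedAttemptsRows;

  PersistedRows(Attempt finishedQueriesRow, List<Attempt> failedAttemptsRows) {
    this.finishedQueriesRow = finishedQueriesRow;
    this.failedAttemptsRows = failedAttemptsRows;
  }
}

class AttemptPersistence {
  // attempts are in chronological order; every attempt but the last was retried.
  static PersistedRows split(List<Attempt> attempts) {
    if (attempts.isEmpty()) {
      throw new IllegalArgumentException("a finished query has at least one attempt");
    }
    int last = attempts.size() - 1;
    return new PersistedRows(attempts.get(last), new ArrayList<>(attempts.subList(0, last)));
  }
}
```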
h3. Complications
Most of these are based on the offline discussions I've had regarding the
approach.
* Analysis of finished queries will become more complex, since the
finished_queries table contains only the final attempt (which might or might
not have been SUCCESSFUL). The definition of wait time will also change. Right
now we simply take launchTime - submissionTime as the wait time of the query.
With retries, the query might spend more time waiting, since each attempt
might have its own wait.
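To make the change concrete, here is one possible multi-attempt definition of wait time, which is an assumption rather than a settled choice: the first attempt waits from submission, and each retry waits from the moment the previous attempt failed and was re-queued. AttemptTimes and WaitTimes are hypothetical names.

```java
import java.util.List;

// Hypothetical per-attempt timing, in millis since epoch.
class AttemptTimes {
  final long launchTime;
  final long finishTime;

  AttemptTimes(long launchTime, long finishTime) {
    this.launchTime = launchTime;
    this.finishTime = finishTime;
  }
}

class WaitTimes {
  // Current definition, computed from the single finished_queries row.
  static long legacyWaitTime(long submissionTime, long launchTime) {
    return launchTime - submissionTime;
  }

  // Assumed multi-attempt definition: sum of the queue time before each attempt.
  static long totalWaitTime(long submissionTime, List<AttemptTimes> attempts) {
    long total = 0;
    long readySince = submissionTime;
    for (AttemptTimes attempt : attempts) {
      total += attempt.launchTime - readySince;
      readySince = attempt.finishTime; // re-queued when this attempt failed
    }
    return total;
  }
}
```

With submission at 0 and attempts running over (10, 50), (80, 120) and (130, 200), totalWaitTime gives 10 + 30 + 10 = 50 ms, while applying the current formula to the final attempt's launch time gives 130 ms, which also counts the two failed runs.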
* Along the same lines, most analyses of finished queries will require a join
between finished_queries and failed_attempts.
* Listing all attempts of a particular query requires a union, since the last
attempt is present only in finished_queries. On the other hand, if we try to
avoid this, we lose the normalization of the Lens database that the proposed
approach has.
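In SQL this is a UNION ALL of the query's failed_attempts rows and its finished_queries row. The same logic over in-memory rows, using a hypothetical AttemptRow type and AttemptHistory helper, looks like this:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row shape shared by failed_attempts and finished_queries.
class AttemptRow {
  final String queryHandle;
  final long launchTime;
  final String status;

  AttemptRow(String queryHandle, long launchTime, String status) {
    this.queryHandle = queryHandle;
    this.launchTime = launchTime;
    this.status = status;
  }
}

class AttemptHistory {
  // Equivalent of UNION ALL over both tables for one query, ordered by launch time.
  static List<AttemptRow> allAttempts(String queryHandle, List<AttemptRow> failedAttemptsTable,
      List<AttemptRow> finishedQueriesTable) {
    List<AttemptRow> result = new ArrayList<>();
    for (AttemptRow row : failedAttemptsTable) {
      if (row.queryHandle.equals(queryHandle)) {
        result.add(row);
      }
    }
    for (AttemptRow row : finishedQueriesTable) {
      if (row.queryHandle.equals(queryHandle)) {
        result.add(row); // the final attempt lives only here
      }
    }
    result.sort(Comparator.comparingLong((AttemptRow r) -> r.launchTime));
    return result;
  }
}
```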
* A backoff policy might refuse to let a query out of waitingQueries while the
server reaches a state in which no queries are running. Then no query
finishes, so nothing is picked from waitingQueries, and the query might remain
stuck there. This should be easy to solve by adding a few guards in
QuerySubmitter and the waiting-queries selector, but it does add some
complexity to the code.
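One shape such a guard could take, sketched with hypothetical WaitingQuery and StuckQueryGuard types: when nothing is running, compute the earliest time any waiting query's backoff expires, so that the selector can be re-run at that time instead of waiting for a finish event that will never come.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical waiting query with the time its backoff policy allows it to run
// (as returned by getOperationNextTime).
class WaitingQuery {
  final String handle;
  final long nextAttemptTime;

  WaitingQuery(String handle, long nextAttemptTime) {
    this.handle = handle;
    this.nextAttemptTime = nextAttemptTime;
  }
}

class StuckQueryGuard {
  // Returns when the waiting-queries selector should run on its own, if ever.
  static OptionalLong nextWakeUpTime(int runningQueries, List<WaitingQuery> waitingQueries) {
    if (runningQueries > 0) {
      return OptionalLong.empty(); // a finishing query will trigger re-selection
    }
    return waitingQueries.stream().mapToLong(w -> w.nextAttemptTime).min();
  }
}
```

The returned time could, for example, be turned into a delay for ScheduledExecutorService.schedule, which re-runs the selector even when no query finishes.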
Thoughts?
> Create Attempt framework
> ------------------------
>
> Key: LENS-899
> URL: https://issues.apache.org/jira/browse/LENS-899
> Project: Apache Lens
> Issue Type: Sub-task
> Components: server
> Reporter: Rajat Khandelwal
> Assignee: Rajat Khandelwal
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)