[
https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281234#comment-16281234
]
James Taylor commented on PHOENIX-4130:
---------------------------------------
Thanks for the patch, [~vincentpoon]. I like this approach. A few
comments/questions:
- Once the index is put in a PENDING_ACTIVE state, who puts it back into an
ACTIVE state if the client retries succeed? Are you counting on the partial
index rebuilder in MetaDataRegionObserver to do that? The downside of that is
that we'd replay some edits that have already been processed, but if we expect
this to be rare perhaps that's ok. An alternative would be for Phoenix to
handle the client retries and set the index status back to ACTIVE if they
succeed. I'm not positive, but we may be able to detect from the client that
retries were done and then we could potentially set the index back to ACTIVE
and clear the INDEX_DISABLE_TIMESTAMP (FYI, see below for a util function that
does this).
- One good side effect of putting the index to a PENDING_ACTIVE state is that
if the client were to crash during the retries, the partial index rebuilder
would do the replay and eventual disable if the replay took too long. If this
were to happen, though, do you think the lag time is too long to continue to
use the index even though it's out of sync with the client (up to 45 minutes
while previously it would have been ~15 seconds)? Another (though more complex)
alternative would be to invent another state like PENDING_DISABLE which would
act the same as PENDING_ACTIVE except on the client we could treat the index as
disabled if the index stays in this state longer than a configurable amount of
time (i.e. the same amount of time we were willing to retry on the server
side). This would be a much shorter window in this case than the default 45
minutes of keeping the index active until the partial index rebuilder gives up
and marks it as permanently disabled.
- One corner case that's not handled is if we're updating indexes within the
RS. This can occur as an optimization when auto commit is on and a DML
statement is operating on a single table (i.e. DELETE FROM T). In this case,
the commits occur in UngroupedAggregateRegionObserver.commit(), unfortunately
outside of the abstraction we have in MutationState.
- We'll want to document that the QueryServices.INDEX_FAILURE_DISABLE_INDEX
config property needs to be set both on the client and server side as we're
relying on that to determine both the client behaviour and the server behaviour
(where before it was only used on the server side). Not a huge deal, but if we
can prevent needing to check it client side (maybe based on information thrown
back in the exception?), then that'd be an improvement IMHO.
- You might want to consider using the IndexUtil.updateIndexState(conn,
fullIndexTableName, PIndexState.DISABLE, 0L) method to disable the index
instead of using the one you wrote. If you use yours, then you'd want to make
sure to put the schema name, table name, and index name in double quotes in
case they're case sensitive (as otherwise you'd get a TableNotFoundException).
{code}
+ private void disableIndex(String dataTableFullName, String indexName)
throws SQLException {
+ logger.info(
+ "Disabling index after hitting max number of index write retries:
" + indexName);
+ String disableIndexDDL = "ALTER INDEX %s ON %s DISABLE";
+ connection.createStatement().execute(String.format(disableIndexDDL,
+ SchemaUtil.getTableNameFromFullName(indexName),
dataTableFullName));
+ }
{code}
> Avoid server retries for mutable indexes
> ----------------------------------------
>
> Key: PHOENIX-4130
> URL: https://issues.apache.org/jira/browse/PHOENIX-4130
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Assignee: Vincent Poon
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4130.v1.master.patch
>
>
> Had some discussions with [~jamestaylor], [~samarthjain], and [~vincentpoon],
> during which I suggested that we can possibly eliminate retry loops happening
> at the server that cause the handler threads to be stuck potentially for
> quite a while (at least multiple seconds to ride over common scenarios like
> splits).
> Instead we can do the retries at the Phoenix client that.
> So:
> # The index updates are not retried on the server. (retries = 0)
> # A failed index update would set the failed index timestamp but leave the
> index enabled.
> # Now the handler thread is done, it throws an appropriate exception back to
> the client.
> # The Phoenix client can now retry. When those retries fail the index is
> disabled (if the policy dictates that) and throw the exception back to its
> caller.
> So no more waiting is needed on the server, handler threads are freed
> immediately.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)