[ 
https://issues.apache.org/jira/browse/SENTRY-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1508:
----------------------------------
    Description: 
MetastorePlugin.java has several implementation issues that need to be fixed by 
this JIRA:

a) The state of MetastorePlugin is not communicated cleanly. The constructor 
initializes successfully regardless of whether HMS cache initialization was 
successful or not. The initThread.run() can easily fail when, say, 
cacheInitializer.createInitialUpdate() fails due to misconfigured HMS, so it is 
quite possible. The state is communicated through 3 variables: boolean 
initComplete, Throwable initError, and UpdateableAuthzPaths authzPaths. 
initComplete is always set to true from finally {} block, so it really 
indicates the end of an _attempt_ to initialize the cache. It, thus, should 
only be used to detect whether the cache initialization is still in-progress, 
not whether initialization has been successful. To determine whether cache 
initialization was successful, initComplete should be used together with 
initError, which is set only when initialization fails. This is not how the 
code implements it in more than one place, which should be clear from analyzing 
a patch.

b) There are several synchronization issues that may lead to the failure of 
synchronizing with the Sentry. They usually have to do with inconsistent use of 
monitor objects to guard access to thread-unsafe objects. E.g. 
authzPaths.updatePartial() uses a lock created just for the scope of a single 
call, which makes it useless for synchronization. A single read-write lock 
should be used instead, as there is one read and one write operation performed 
on authzPaths within MetastorePlugin. 

c) Failure to create authzPaths is not communicated clearly to a caller. When 
it is dereferenced from the public callback APIs, it results in 
NullPointerException. The suggested fix of this issue in SENTRY-1270 avoids 
NullPointerException by detecting the null authzPaths, printing error message, 
and returning. This effectively leads to consistent failure to update local 
cache without notifying the caller. The suggested solution would be not only to 
throw IllegalArgumentException to the caller instead, but to also specify why 
exactly authzPaths == null - due to initialization still being in progress or 
due to its failure.

d) The housekeeping thread running SyncTask to synchronize with Sentry state 
fails when authzPaths == null. However, it keeps printing misleading "#### 
Metastore Plugin cache has not finished initialization." message even in the 
case of a permanent cache initialization failure. It needs to print the real 
cause of this condition, by analyzing authzPaths together with initComplete and 
initError values.

e) getClient() may return null, in which case dereferencing it causes not so 
helpful NullPointerException. Instead, while getClient() may still print error 
message, it should also re-throw an original exception, which would then be 
much easier to debug.

f) Each code fork deserves log message: e.g. addPath() retrurns immediately if 
PathsUpdate.parsePath() returns null - w/o printing any log.

g) in SyncTask.run() - if notificationLock.tryLock() succeeds, yet the next 
line's expression "MetastorePlugin.this.authzPaths == null" is evaluated to 
false, run() exits w/o a chance to call notificationLock.unlock(). All the code 
following lock.tryLock() should be inside try-catch-finally, with finally's 
first line being lock.unlock(), as a general safe pattern.

h) some additional misc synchronization issues may also be addressed by the 
patch.

What's not in the scope: the initial patch honors the original design 
supporting asynchronous synchronization. It means that MetastorePlugin is 
always constructed, even if it's in a broken state. It is up to the public 
callback APIs to properly inform the caller of the broken state.

  was:
MetastorePlugin.java constructor initializes successfully regardless of whether 
initThread.run() was successful. E.g. if cacheInitializer.createInitialUpdate() 
cal inside the constructor fails (in one case in the field it was due to an 
invalid table uri in HMS database, whose parsing produced 
ArrayIndexOutOfBoundsException), initComplete is still assigned to true and 
MetastorePlugin gets constructed. While initError is assigned to non-null value 
in case of failure, it is effectively ignored as will be described later.

MetastorePlugin implements several callbacks to be called on the Hive side, to 
update Sentry state. It is critical functionality for enforcing access control. 
Its callbacks (addPath(), removeAllPaths(), etc) all end up calling 
notifySentryAndApplyLocal() which starts with the following code:
if (initComplete) {
    processUpdate()
}...
which means that processUpdate() is called even when MetastorePlugin 
initialization has failed (i.e. initError != null). There is an option of 
asynchronous initialization, when MetastorePlugin is constructed and returned 
to the caller before initialization is complete, but the code makes no 
distinction between the two cases, as will be shown below.

processUpdate() called regardless of whether MetastorePlugin a) has been 
initialized (sync mode, success), or b) still being initialized (async mode), 
or c) failed to initialized (either mode). It calls (unconditionally) 
applyLocal() and notifySentry(). aplyLocal() calls authzPaths.updatePartial() 
which immediately fails with NullPointerException if authzPaths == null which 
happens if initThread within the constructor fails. 

SENTRY-1270 avoids NullPointerException by execuitng this before dereferencing 
authzPaths inside applyLocal():
if(authzPaths == null) {
LOGGER.error(initializationFailureMsg);
return;
}
However, this leads to hard-to-diagnose behavior, because a) local updates fail 
forever, and b) housekeeping thread running SyncTask to synchronize with 
Sentry-side state fails as well, printing endlessly misleading "#### Metastore 
Plugin cache has not finished initialization." message suggesting its temporary 
condition as opposed to a permanent failure. Failure to sync up with Sentry 
state can lead to very subtle, often intermittent, problems with acls. 
MetastorePlugin and its caller never get fully aware of the permanent nature of 
init failure.

Suggestion:
a) define 3 states in MetastorePlugin: initializing, initialized, init_failed.
2) communicate those states to the caller clearly, by _immediately_ throwing 
some kind of exception from public callback APIs in the cases preventing those 
callback from completing successfully:
    a) for state == initializing, throw some kind of 
InitializationInProgressException. This only applies for asynchronous 
initialization configuration. It is unknown whether it is actually deployed 
anywhere, but the code exists and there is no reason why not to support it. 
Async init makes sense as an optimization, as long as 
initialization-in-progress condition is clearly communicated.
   b) for state == init_failed throw InitializationFailedException of some 
sort. For usability, it should contain the cause - the original exception 
originating either from the constructor or from the initThread.run() started 
from the constructor (in sync or async modes alike).

Although in the sync mode initialization failure could be detected easily by 
constructor failure, we may want to preserve the current design, in which 
constructor always succeeds, so the client code does not need to be aware of 
whether init mode is configured as sync vs async.

Additional cleanup:
1. getClient() is always dereferenced, while it can actually return null 
(exception is swallowed and error is logged). Suggested fix: getClient() may 
still print error message, but it should also re-throw an original exception - 
it is much easier to debug than dealing with inevitable and uninformative 
NullPointerException up the caller stack. General rule - error messages are to 
help with diagnostic, not to replace throwing exception, which is the only way 
to make sure error conditions are properly propagated to the caller.

2. Each code fork deserves log message: e.g. addPath() retrurns immediately if 
PathsUpdate.parsePath() returns null - w/o printing any log.

3. in SyncTask.run() - if notificationLock.tryLock() succeeds, yet the next 
line's expression "MetastorePlugin.this.authzPaths == null" is evaluated to 
false, run() exits w/o a chance to call notificationLock.unlock(). All the code 
following lock.tryLock() should be inside try-catch-finally, with finally's 
first line being lock.unlock(), as a general safe pattern.


> MetastorePlugin.java does not handle properly initialization failure
> --------------------------------------------------------------------
>
>                 Key: SENTRY-1508
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1508
>             Project: Sentry
>          Issue Type: Bug
>            Reporter: Vadim Spector
>            Assignee: Vadim Spector
>            Priority: Critical
>         Attachments: SENTRY-1508.001.patch
>
>
> MetastorePlugin.java has several implementation issues that need to be fixed 
> by this JIRA:
> a) The state of MetastorePlugin is not communicated cleanly. The constructor 
> initializes successfully regardless of whether HMS cache initialization was 
> successful or not. The initThread.run() can easily fail when, say, 
> cacheInitializer.createInitialUpdate() fails due to misconfigured HMS, so it 
> is quite possible. The state is communicated through 3 variables: boolean 
> initComplete, Throwable initError, and UpdateableAuthzPaths authzPaths. 
> initComplete is always set to true from finally {} block, so it really 
> indicates the end of an _attempt_ to initialize the cache. It, thus, should 
> only be used to detect whether the cache initialization is still in-progress, 
> not whether initialization has been successful. To determine whether cache 
> initialization was successful, initComplete should be used together with 
> initError, which is set only when initialization fails. This is not how the 
> code implements it in more than one place, which should be clear from 
> analyzing a patch.
> b) There are several synchronization issues that may lead to the failure of 
> synchronizing with the Sentry. They usually have to do with inconsistent use 
> of monitor objects to guard access to thread-unsafe objects. E.g. 
> authzPaths.updatePartial() uses a lock created just for the scope of a single 
> call, which makes it useless for synchronization. A single read-write lock 
> should be used instead, as there is one read and one write operation 
> performed on authzPaths within MetastorePlugin. 
> c) Failure to create authzPaths is not communicated clearly to a caller. When 
> it is dereferenced from the public callback APIs, it results in 
> NullPointerException. The suggested fix of this issue in SENTRY-1270 avoids 
> NullPointerException by detecting the null authzPaths, printing error 
> message, and returning. This effectively leads to consistent failure to 
> update local cache without notifying the caller. The suggested solution would 
> be not only to throw IllegalArgumentException to the caller instead, but to 
> also specify why exactly authzPaths == null - due to initialization still 
> being in progress or due to its failure.
> d) The housekeeping thread running SyncTask to synchronize with Sentry state 
> fails when authzPaths == null. However, it keeps printing misleading "#### 
> Metastore Plugin cache has not finished initialization." message even in the 
> case of a permanent cache initialization failure. It needs to print the real 
> cause of this condition, by analyzing authzPaths together with initComplete 
> and initError values.
> e) getClient() may return null, in which case dereferencing it causes not so 
> helpful NullPointerException. Instead, while getClient() may still print 
> error message, it should also re-throw an original exception, which would 
> then be much easier to debug.
> f) Each code fork deserves log message: e.g. addPath() retrurns immediately 
> if PathsUpdate.parsePath() returns null - w/o printing any log.
> g) in SyncTask.run() - if notificationLock.tryLock() succeeds, yet the next 
> line's expression "MetastorePlugin.this.authzPaths == null" is evaluated to 
> false, run() exits w/o a chance to call notificationLock.unlock(). All the 
> code following lock.tryLock() should be inside try-catch-finally, with 
> finally's first line being lock.unlock(), as a general safe pattern.
> h) some additional misc synchronization issues may also be addressed by the 
> patch.
> What's not in the scope: the initial patch honors the original design 
> supporting asynchronous synchronization. It means that MetastorePlugin is 
> always constructed, even if it's in a broken state. It is up to the public 
> callback APIs to properly inform the caller of the broken state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to