[jira] [Commented] (SENTRY-1570) One task failure in MetastoreCacheInitializer should not cause HMS not to start

Vadim Spector (JIRA) Mon, 19 Dec 2016 09:36:49 -0800

    [ 
https://issues.apache.org/jira/browse/SENTRY-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761784#comment-15761784
 ]


Vadim Spector commented on SENTRY-1570:
---------------------------------------

[~aihuaxu], [~hahao]

I would be very troubled by this approach. Corrupted/malformed HMS data is a 
serious issue, and it needs to be fixed in the field - ASAP. Ignoring corrupted 
data, even temporarily, can lead to serious security breaches, which is exactly 
what Sentry serves to prevent.

You refer to SENTRY-1564 - in the corresponding escalation that led to this 
patch _thousands_ of tables ended up with no associated SD_ID in SDS tables - 
all due to DB migration. It's a massive failure - we certainly don't want 
Sentry to continue, it would be a disservice to our customers. Instead, we want 
Sentry to provide messages that are as clear as possible, so the client can fix 
the issue ASAP, which SENTRY-1564 as well as some other upstream JIRAs address.

The client did not know or care if SD_ID was missing, until we found it 
and pointed at it. They only cared that ACLs were set as expected, and when 
they were not - they blamed Sentry for it.

If Sentry ignores bad HMS data, we would have more escalations, not fewer. They 
would be of a different kind - complaining that HDFS ACLs are not correct, 
which is much harder to troubleshoot. Plus, it could lead to serious security 
breaches in security-critical deployments - really bad for us and bad for our 
customers.

In my view, there are three parts to the right solution:
a) On the Sentry side - report bad HMS data as clearly as possible. A number of 
JIRAs already address it, but if you think more can be done - I'm all for it.
b) Educate customers on how to recognize an HMS initialization problem and 
where in the logs to look for information about the specific corrupted data.
c) On Hive side - come up with the HMS data validation script, to validate HMS 
data periodically - certainly after big things like DB migration. It should be 
a pretty straightforward series of HSQL queries.
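
To illustrate (c), here is a minimal sketch of one such check: finding tables 
whose SD_ID is missing or does not match any row in SDS. Only SD_ID and SDS 
come from the escalation described above; the other table and column names 
(TBLS, DBS, TBL_NAME, DB_ID, NAME, LOCATION) are assumptions about a simplified 
metastore schema, and the in-memory SQLite database is only a stand-in so the 
example runs on its own. A real script would connect to the actual HMS backing 
database and would likely need more checks than this one.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the HMS backing tables.
# Real metastore schemas have more columns and constraints.
SCHEMA = """
CREATE TABLE DBS  (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS  (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                   SD_ID INTEGER, TBL_NAME TEXT);
"""

# Tables with a NULL SD_ID, or an SD_ID that has no matching SDS row.
ORPHAN_TABLES_QUERY = """
SELECT d.NAME, t.TBL_NAME
FROM TBLS t
LEFT JOIN DBS d ON d.DB_ID = t.DB_ID
LEFT JOIN SDS s ON s.SD_ID = t.SD_ID
WHERE t.SD_ID IS NULL OR s.SD_ID IS NULL
ORDER BY d.NAME, t.TBL_NAME
"""


def find_tables_without_sd(conn):
    """Return (database name, table name) pairs lacking a storage descriptor."""
    return conn.execute(ORPHAN_TABLES_QUERY).fetchall()


def build_demo_db():
    """Create an in-memory database with one healthy and two broken tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO DBS VALUES (1, 'sales')")
    conn.execute(
        "INSERT INTO SDS VALUES (10, 'hdfs://nn/warehouse/sales.db/orders')"
    )
    conn.executemany(
        "INSERT INTO TBLS VALUES (?, ?, ?, ?)",
        [
            (100, 1, 10, "orders"),     # healthy: SD_ID resolves to an SDS row
            (101, 1, None, "refunds"),  # broken: SD_ID is NULL
            (102, 1, 99, "returns"),    # broken: SD_ID has no SDS row
        ],
    )
    return conn


if __name__ == "__main__":
    for db_name, tbl_name in find_tables_without_sd(build_demo_db()):
        print(f"{db_name}.{tbl_name}: no associated storage descriptor")
```

Run after something like a DB migration, a non-empty result from a query of 
this kind would point the customer at the exact tables to repair before Sentry 
ever fails on them.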

These measures alone should be sufficient to prevent escalations due to 
corrupted HMS data in the first place. And we would not be liable at any point 
for compromising customers' data security.

Thanks,
Vadim





> One task failure in MetastoreCacheInitializer should not cause HMS not to 
> start
> -------------------------------------------------------------------------------
>
>                 Key: SENTRY-1570
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1570
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Hive Plugin
>    Affects Versions: 1.5.1
>            Reporter: Aihua Xu
>
> Seems it's common that HMS metastore may have incorrect metadata for one or 
> more databases/tables/partitions due to various reasons (see SENTRY-1564). 
> Right now, HMS will not start properly if MetastoreCacheInitializer fails to 
> initialize authzPath and the whole Hive service becomes unusable. 
> Propose to continue the cache initialization even if there is a task failure. 
> The databases/tables with the issues will be unusable until the authzPath 
> cache for them is built.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
