[
https://issues.apache.org/jira/browse/RANGER-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334032#comment-14334032
]
Alok Lal commented on RANGER-265:
---------------------------------
[~rmani] noted that this fix protects not only attempts to get new connections
but also code that uses the connection to talk to the subject service. For
example, if a rogue service hangs while responding to a lookup query, that
won't lead to connection stacking in the policy manager.
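
To make the behaviour above concrete, here is a minimal sketch of a timed call
that gives up after a deadline and interrupts the worker. It is not the actual
TimedEventUtil code from the patch; the class and method names are hypothetical
and only the standard java.util.concurrent API is assumed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedCallSketch {

    // Hypothetical helper (not Ranger's TimedEventUtil): runs the task on a
    // worker thread and waits at most `timeout` for its result.
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker so a hung connect or lookup does not
            // leave a blocked thread behind for every request.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that finishes in time returns its value.
        System.out.println(callWithTimeout(() -> "connected", 500, TimeUnit.MILLISECONDS));

        // A call that hangs (simulated with sleep) is abandoned at the deadline.
        try {
            callWithTimeout(() -> {
                Thread.sleep(5_000);
                return "too late";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("lookup timed out");
        }
    }
}
```

The same wrapper can guard both the connect call and any later lookup made over
the connection. Interruption only helps if the blocked call actually responds
to it, which not every blocking I/O call does.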
On that note, though, one item to review is the current timeout values.
The current values of 5 or 10 seconds seem like an eternity. A more reasonable
timeout would be a few hundred milliseconds. Note that in the overwhelming
majority of deployments, Ranger-protected services probably reside in
a single data center. Even for cross-colo calls, a few hundred milliseconds
should suffice.
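
As a rough illustration of what a timeout of that size looks like at the socket
level, the sketch below bounds both the connect and the read with the standard
java.net.Socket API. The 300 ms value and the class and method names are
assumptions for illustration only, not values or code taken from Ranger.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ShortTimeoutSketch {

    // Assumed value for illustration: "a few hundred milliseconds".
    static final int TIMEOUT_MS = 300;

    // Returns true if the peer sends at least one byte within the timeout,
    // false if the read times out or the peer closes the stream first.
    static boolean respondsInTime(String host, int port, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Bounds how long the connect attempt itself may block.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            // Bounds each blocking read on the established connection.
            socket.setSoTimeout(timeoutMs);
            try {
                return socket.getInputStream().read() != -1;
            } catch (SocketTimeoutException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener that never writes stands in for a hung service.
        try (ServerSocket silent = new ServerSocket(0)) {
            long start = System.nanoTime();
            boolean ok = respondsInTime("127.0.0.1", silent.getLocalPort(), TIMEOUT_MS);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("responded=" + ok + " after ~" + elapsedMs + " ms");
        }
    }
}
```

With a value like this, a caller waits a fraction of a second on a silent
service, not 5 or 10 seconds.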
Thoughts?
> If Hive repository's connection is setup incorrectly then it can make policy
> manager unresponsive.
> --------------------------------------------------------------------------------------------------
>
> Key: RANGER-265
> URL: https://issues.apache.org/jira/browse/RANGER-265
> Project: Ranger
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Alok Lal
> Assignee: Alok Lal
> Fix For: 0.4.0, 0.5.0
>
> Attachments:
> 0001-RANGER-265-Changed-TimedEventUtil-so-that-it-would-i.patch
>
>
> [Reporting on behalf of Abhay Kulkarni]
> If the connection of a Hive repository is set up incorrectly in such a manner
> that the connect call blocks, i.e. does not error out, then it quickly leads
> to a buildup of connect threads inside the policy manager. Eventually the
> policy manager stops responding to both the service plugins and user requests.
> The cause of the problem seems to be the incomplete implementation of the
> deferred/timed execution class. While the problem was reported on Hive it
> could affect any component. This problem could be fixed at several levels:
> - Server could time out the connection call and then interrupt that thread.
> - Server code that opens client connections to various services could in turn
> change how the critical sections are structured so that this problem does not
> happen and/or connection attempts don't block if someone else is trying to
> get a connection, etc.
> - For now, a fix that could have the widest impact should be considered.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)