[ 
https://issues.apache.org/jira/browse/SENTRY-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289816#comment-16289816
 ] 

Liam Sargent commented on SENTRY-1753:
--------------------------------------

[~akolb] The point of the exponential backoff is to reduce log messages.

bq. What is the value of any delay other then zero? Why do you want tto delay 
connection attempts? This just increases startup latency when Sentry restarts 
after reboot.

This config item currently exists and is set to zero. I left this step in 
because it is part of the current pattern.

bq. 2.) Attempt to connect to HMS,
So far so good...
_if fail due to no service, wait configurable initial delay amount and try 
again.
_

We want the first retry attempts to be quick, within a few seconds of each 
other, but if the service is not responding right away, we should wait a little 
longer before trying again. Using exponential backoff means the retries will 
not be fired off every half a second. It is not necessary to use exponential 
backoff at all, but I think it is a clever way of reducing noise without 
sacrificing. The current stack trace for a single HMSFollower connect failure 
(which occurs every half second for almost 3 minutes on a full restart) looks 
like:


9:45:46.816 AM  WARN    UserGroupInformation    
PriviledgedActionException 
as:sentry/liam-sentry-1.gce.cloudera....@gce.cloudera.com (auth:KERBEROS) 
cause:MetaException(message:Could not connect to meta store using any of the 
URIs provided. Most recent failure: 
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused (Connection refused)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
        at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
        at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114)
        at 
org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106)
        at 
org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177)
        at 
org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
        ... 27 more
)
9:45:46.816 AM  ERROR   HMSFollower     
HMSFollower cannot connect to HMS!!
java.lang.reflect.UndeclaredThrowableException
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1930)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114)
        at 
org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106)
        at 
org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177)
        at 
org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Could not connect to meta store using any of 
the URIs provided. Most recent failure: 
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused (Connection refused)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
        at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
        at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at 
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114)
        at 
org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106)
        at 
org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177)
        at 
org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
        ... 27 more
)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:520)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117)
        at 
org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        ... 11 more

So, we could either reduce the number of failure messages in the average case 
by backing off retry time, or we could make the log message less verbose.

> Make HMSFollower initial delay configurable
> -------------------------------------------
>
>                 Key: SENTRY-1753
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1753
>             Project: Sentry
>          Issue Type: Task
>          Components: Sentry
>    Affects Versions: 2.0.0
>            Reporter: Na Li
>            Assignee: Liam Sargent
>            Priority: Minor
>
> HMSFollower initial delay affects how fast it can get notification from Hive 
> meta store. We are changing it from 60 seconds to 0 second. It may cause some 
> issue. It is better to make this value configurable, so we can adjust it 
> based on testing result.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to