[ https://issues.apache.org/jira/browse/SENTRY-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289816#comment-16289816 ]
Liam Sargent edited comment on SENTRY-1753 at 12/13/17 8:19 PM: ---------------------------------------------------------------- [~akolb] The point of the exponential backoff is to reduce log messages. bq. What is the value of any delay other then zero? Why do you want tto delay connection attempts? This just increases startup latency when Sentry restarts after reboot. This config item currently exists and is set to zero. I left this step in because it is part of the current pattern. bq. 2.) Attempt to connect to HMS, So far so good... _if fail due to no service, wait configurable initial delay amount and try again._ We want the first retry attempts to be quick, within a few seconds of each other, but if the service is not responding right away, we should wait a little longer before trying again. Using exponential backoff means the retries will not be fired off every half a second. It is not necessary to use exponential backoff at all, but I think it is a clever way of reducing noise without sacrificing. The current stack trace for a single HMSFollower connect failure (which occurs every half second for almost 3 minutes on a full restart) looks like: 9:45:46.816 AM WARN UserGroupInformation PriviledgedActionException as:sentry/liam-sentry-1.gce.cloudera....@gce.cloudera.com (auth:KERBEROS) cause:MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused) at org.apache.thrift.transport.TSocket.open(TSocket.java:226) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114) at org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106) at org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177) at org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ... 27 more ) 9:45:46.816 AM ERROR HMSFollower HMSFollower cannot connect to HMS!! java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1930) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114) at org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106) at org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177) at org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused) at org.apache.thrift.transport.TSocket.open(TSocket.java:226) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114) at org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106) at org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177) at org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ... 27 more ) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:520) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) ... 11 more So, we could either reduce the number of failure messages in the average case by backing off retry time, or we could make the log message less verbose. was (Author: liamsargent): [~akolb] The point of the exponential backoff is to reduce log messages. bq. What is the value of any delay other then zero? Why do you want tto delay connection attempts? This just increases startup latency when Sentry restarts after reboot. This config item currently exists and is set to zero. I left this step in because it is part of the current pattern. bq. 2.) Attempt to connect to HMS, So far so good... _if fail due to no service, wait configurable initial delay amount and try again._ We want the first retry attempts to be quick, within a few seconds of each other, but if the service is not responding right away, we should wait a little longer before trying again. Using exponential backoff means the retries will not be fired off every half a second. It is not necessary to use exponential backoff at all, but I think it is a clever way of reducing noise without sacrificing. The current stack trace for a single HMSFollower connect failure (which occurs every half second for almost 3 minutes on a full restart) looks like: 9:45:46.816 AM WARN UserGroupInformation PriviledgedActionException as:sentry/liam-sentry-1.gce.cloudera....@gce.cloudera.com (auth:KERBEROS) cause:MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused) at org.apache.thrift.transport.TSocket.open(TSocket.java:226) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114) at org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106) at org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177) at org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ... 27 more ) 9:45:46.816 AM ERROR HMSFollower HMSFollower cannot connect to HMS!! java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1930) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114) at org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106) at org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177) at org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused) at org.apache.thrift.transport.TSocket.open(TSocket.java:226) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory.connect(HiveSimpleConnectionFactory.java:114) at org.apache.sentry.service.thrift.SentryHMSClient.connect(SentryHMSClient.java:106) at org.apache.sentry.service.thrift.HMSFollower.syncupWithHms(HMSFollower.java:177) at org.apache.sentry.service.thrift.HMSFollower.run(HMSFollower.java:150) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ... 27 more ) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:520) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:252) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:187) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:117) at org.apache.sentry.service.thrift.HiveSimpleConnectionFactory$1.run(HiveSimpleConnectionFactory.java:114) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917) ... 11 more So, we could either reduce the number of failure messages in the average case by backing off retry time, or we could make the log message less verbose. > Make HMSFollower initial delay configurable > ------------------------------------------- > > Key: SENTRY-1753 > URL: https://issues.apache.org/jira/browse/SENTRY-1753 > Project: Sentry > Issue Type: Task > Components: Sentry > Affects Versions: 2.0.0 > Reporter: Na Li > Assignee: Liam Sargent > Priority: Minor > > HMSFollower initial delay affects how fast it can get notification from Hive > meta store. We are changing it from 60 seconds to 0 second. It may cause some > issue. It is better to make this value configurable, so we can adjust it > based on testing result. -- This message was sent by Atlassian JIRA (v6.4.14#64029)