Steven Hancz created SENTRY-1965:
------------------------------------

             Summary: sentry server database database Destroy connection exception (Galera cluster)
                 Key: SENTRY-1965
                 URL: https://issues.apache.org/jira/browse/SENTRY-1965
             Project: Sentry
          Issue Type: Bug
          Components: Sentry
    Affects Versions: 1.5.1
         Environment: Sentry 1.5.1, CDH 5.9.1, Mysql 5.6.35, Galera 25.18
            Reporter: Steven Hancz

We have implemented an HA solution for the Sentry server database. Instead of using a single MySQL server, we have a Galera cluster that is accessed via a DNS load-balanced VIP, so that if one MySQL server stops working the VIP will detect it and send the DB requests to the surviving node. A similar setup is working for the Hive metastore.

However, we noticed that Sentry, just like Spark, uses the BoneCP connection pool to connect to the database. There are some hard-coded configuration options in bonecp-default-config.xml that are causing issues with Sentry:

idleConnectionTestPeriodInMinutes   default 240 minutes
idleMaxAgeInMinutes                 default 60 minutes

Based on this, BoneCP will test each idle connection every 240 minutes, but an idle connection is closed after 60 minutes (second parameter). The connection test will therefore never take place, because the connection is closed after 60 minutes while the test only runs every 240 minutes. In an HA configuration with a VIP, however, you can set the connection timeout and how often to test for target availability.

We had the exact same problem with Hive. There the workaround was to include a second configuration file for BoneCP called bonecp-config.xml, which was added to the Hive server jar. The second config file (bonecp-config.xml) contains:

idleConnectionTestPeriodInMinutes   1
idleMaxAgeInMinutes                 5

So every connection is tested every minute and an idle connection is closed after 5 minutes. But since we test the connections every minute, they are kept alive.

So the question is: how do we enable a similar setting for Sentry?

With the default BoneCP configuration and a Galera cluster in the back end, Sentry returns the following error:

Sep 26, 7:30:39.900 AM ERROR com.jolbox.bonecp.ConnectionTesterThread
Destroy connection exception
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
        at sun.reflect.GeneratedConstructorAccessor41.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
        at com.mysql.jdbc.Util.getInstance(Util.java:387)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:917)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:896)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:885)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
        at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4634)
        at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4263)
        at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1519)
        at com.jolbox.bonecp.ConnectionHandle.internalClose(ConnectionHandle.java:396)
        at com.jolbox.bonecp.ConnectionTesterThread.closeConnection(ConnectionTesterThread.java:155)
        at com.jolbox.bonecp.ConnectionTesterThread.run(ConnectionTesterThread.java:95)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

From our research it appears that the Sentry server uses the BoneCP class in more than one location. Changing the BoneCP parameters for Sentry alone does not appear to be sufficient. The trace file shows that the parameters are not changed and the timeouts are the default BoneCP values. Where else do we have to change the BoneCP config?

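For illustration, a minimal sketch of what a bonecp-config.xml override like the one described for Hive could look like. The two property names and values come from the description above; the surrounding element layout is an assumption that the override mirrors the structure of bonecp-default-config.xml, and should be checked against the BoneCP version actually bundled with Sentry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element layout assumed to match bonecp-default-config.xml -->
<bonecp-config>
  <default-config>
    <!-- Test idle connections every minute instead of every 240 minutes -->
    <property name="idleConnectionTestPeriodInMinutes">1</property>
    <!-- Close connections that have been idle for more than 5 minutes -->
    <property name="idleMaxAgeInMinutes">5</property>
  </default-config>
</bonecp-config>
```

Whether Sentry picks up such a file from its classpath is exactly the open question here, so this shows the intended settings rather than a confirmed fix.
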
Regards,

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)