Oh, and I should correct myself: it doesn't crash. It seems that the management server fences itself because it can't talk to the database.
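For illustration, here is a rough sketch of the kind of retry behaviour asked for in the quoted message below. It is not CloudStack code: `DbRetry`, `SqlWork` and `withRetry` are hypothetical names, and `retryCount` only stands in for the proposed `db.cloud.connection.retry.count` property. It assumes that a lost connection can be recognised by an SQLState in class 08 (connection exception), which is how a 'communications link failure' is usually reported, and that the work passed in obtains a fresh connection on each attempt rather than reusing the dead one.

```java
import java.sql.SQLException;

// Hypothetical helper, not part of CloudStack: retries a unit of DB work
// when the failure looks like a lost connection.
final class DbRetry {

    /** A unit of database work; it should acquire its own connection each time it runs. */
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /** Assumption: SQLState class "08" (connection exception) marks a lost connection. */
    static boolean isConnectionFailure(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    /**
     * Runs the work once, then retries up to retryCount more times if it fails
     * with a connection-class SQLException. Any other SQLException is rethrown
     * immediately. delayMs is the pause between attempts.
     */
    static <T> T withRetry(SqlWork<T> work, int retryCount, long delayMs)
            throws SQLException {
        if (retryCount < 0) {
            throw new IllegalArgumentException("retryCount must be >= 0");
        }
        SQLException last = null;
        for (int attempt = 0; attempt <= retryCount; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isConnectionFailure(e)) {
                    throw e; // not a connection problem, do not retry
                }
                last = e;
                if (attempt < retryCount) {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        throw last;
                    }
                }
            }
        }
        throw last; // every attempt failed with a connection error
    }
}
```

A caller such as the heartbeat would wrap its query in `DbRetry.withRetry(...)` and fence only once the retries are exhausted. Whether that is safe for the clustering logic is a separate question this sketch does not answer.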
On Sun, Jul 7, 2013 at 12:59 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
> Ok. After a cursory look, I've seen that the autoReconnect is kind of
> a bad option for jdbc. I've also found this, which seems kind of hairy
> for what I want to do:
>
> http://dev.mysql.com/doc/refman/5.0/en/connector-j-usagenotes-j2ee-concepts-managing-load-balanced-connections.html
>
> I don't necessarily want to hand off the loadbalancing management to
> the java code, I just want cloudstack to automatically reinitialize
> the database connection when this 'communications link failure'
> occurs, maybe with a db.cloud.connection.retry.count property or
> similar.
>
> On Sun, Jul 7, 2013 at 12:54 AM, Wido den Hollander <w...@widodh.nl> wrote:
>> Hi,
>>
>>
>> On 07/07/2013 08:45 AM, Marcus Sorensen wrote:
>>>
>>> I see that my db.properties has db.cloud.autoReconnect=true, which
>>> translates to setting autoReconnect in the jdbc driver connection in
>>> utils/src/com/cloud/utils/db/Transaction.java. I also see that if I
>>> manually trigger the issue I get:
>>>
>>
>> Just to confirm, I see the same issues. I haven't looked into this yet, but
>> this is also one of the things I want to have fixed.
>>
>> Maybe create an issue for it?
>>
>> Wido
>>
>>
>>> 013-07-07 00:42:50,502 ERROR [cloud.cluster.ClusterManagerImpl]
>>> (Cluster-Heartbeat-1:null) Runtime DB exception
>>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
>>> Communications link failure
>>>
>>> The last packet successfully received from the server was 1,503
>>> milliseconds ago. The last packet sent successfully to the server was
>>> 0 milliseconds ago.
>>>     at sun.reflect.GeneratedConstructorAccessor159.newInstance(Unknown Source)
>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>>>     at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>>>     at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
>>>     at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
>>>     at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
>>>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
>>>     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
>>>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
>>>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
>>>     at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
>>>     at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
>>>     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>     at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:409)
>>>     at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>     at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:350)
>>>     at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>     at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:907)
>>>     at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>     at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:912)
>>>     at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>     at com.cloud.cluster.dao.ManagementServerHostDaoImpl.getActiveList(ManagementServerHostDaoImpl.java:158)
>>>     at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>     at com.cloud.cluster.ClusterManagerImpl.peerScan(ClusterManagerImpl.java:1057)
>>>     at com.cloud.cluster.ClusterManagerImpl.access$1200(ClusterManagerImpl.java:95)
>>>     at com.cloud.cluster.ClusterManagerImpl$4.run(ClusterManagerImpl.java:789)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:679)
>>> Caused by: java.io.EOFException: Can not read response from server.
>>> Expected to read 4 bytes, read 0 bytes before connection was
>>> unexpectedly lost.
>>>     ... 55 more
>>> 2013-07-07 00:42:50,505 ERROR [cloud.cluster.ClusterManagerImpl]
>>> (Cluster-Heartbeat-1:null) DB communication problem detected, fence it
>>>
>>> And I have only to restart cloudstack-management so it can connect to
>>> another member in the loadbalanced multimaster database to get things
>>> running again.
>>>
>>>
>>> On Sun, Jul 7, 2013 at 12:35 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
>>>>
>>>> I've noticed that the cloudstack management server creates persistent
>>>> connections to the database, and crashes if the database connection is
>>>> lost. I haven't looked at the code yet, but I was wondering if anyone
>>>> knew about what was going on here, if it's simply not set up to
>>>> gracefully handle reconnect, or something else. We have a
>>>> multi-master database setup, but cloudstack doesn't take advantage of
>>>> it since it doesn't attempt graceful reconnect, if the particular node
>>>> it connected to on startup goes down, it simply crashes.