Re: database connection resilience

Marcus Sorensen Sun, 07 Jul 2013 21:10:00 -0700

Looks like there's no "db.usage.url.params" either. Is there a reason
for that, or was it just implemented quickly?


On Sun, Jul 7, 2013 at 4:36 PM, Marcus Sorensen <shadow...@gmail.com> wrote:
> I think there are two separate issues here.
>
> 1) The management server uses the database to determine cluster
> membership, and if no database connection can be made, the management
> server fences itself (shuts down). This is good, but when there's
> only one management server (no cluster intended), it seems like a
> problem. Then again, shutting down may be the better choice; I'm not
> sure how the management server would behave after a temporary
> database outage. Opinions would be appreciated. My preference would be
> for a single management server to pick back up where it left off
> rather than dying.
>
> 2) There is no support for the JDBC driver's built-in load-balancing
> features. I have a patch that fixes this, but I noticed a few things
> I'd like some feedback on. Namely, the awsapi database connection
> doesn't have its own settings; instead it uses the same host
> connection settings as the cloud db and the autoReconnect setting from
> the usage database settings. Was this a shortcut, or is there a reason
> for it? The current version of my patch keeps the same approach, but
> since I'm already adding properties to db.properties, we could allow
> dedicated db.awsapi.host and db.awsapi.port settings.
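A minimal Java sketch of the db.properties handling discussed in point 2, assuming comma-separated host lists and a key pattern of db.<name>.host / .port / .name / .autoReconnect / .url.params. Only db.cloud.autoReconnect, db.usage.url.params and the proposed db.awsapi.host / db.awsapi.port come from this thread; the rest of the key layout and the DbUrlBuilder class are hypothetical. With more than one host it emits a Connector/J "jdbc:mysql:loadbalance://" URL, and awsapi falls back to the cloud host and port when its own keys are absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: builds a MySQL JDBC URL from db.properties-style keys.
// Key names other than db.cloud.autoReconnect, db.usage.url.params and the
// proposed db.awsapi.host / db.awsapi.port are assumptions for illustration.
final class DbUrlBuilder {

    static String buildUrl(Properties p, String db) {
        String hostList = p.getProperty("db." + db + ".host");
        String port = p.getProperty("db." + db + ".port");
        // Proposed: awsapi may have its own host/port, otherwise it reuses the cloud db's.
        if ("awsapi".equals(db)) {
            if (hostList == null) hostList = p.getProperty("db.cloud.host");
            if (port == null) port = p.getProperty("db.cloud.port");
        }
        if (hostList == null || port == null) {
            throw new IllegalArgumentException("missing host or port for database '" + db + "'");
        }

        // Accept a comma-separated host list such as "db1,db2"; every host uses the same port.
        List<String> hosts = new ArrayList<>();
        for (String h : hostList.split(",")) {
            String trimmed = h.trim();
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed + ":" + port);
            }
        }
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("empty host list for database '" + db + "'");
        }
        // Connector/J's load-balancing URL scheme only makes sense with several hosts.
        String scheme = hosts.size() > 1 ? "jdbc:mysql:loadbalance://" : "jdbc:mysql://";

        // Optional query parameters; a missing url.params key simply adds nothing.
        List<String> params = new ArrayList<>();
        String autoReconnect = p.getProperty("db." + db + ".autoReconnect");
        if (autoReconnect != null) {
            params.add("autoReconnect=" + autoReconnect);
        }
        String extra = p.getProperty("db." + db + ".url.params");
        if (extra != null && !extra.isEmpty()) {
            params.add(extra);
        }

        String name = p.getProperty("db." + db + ".name", db);
        return scheme + String.join(",", hosts) + "/" + name
                + (params.isEmpty() ? "" : "?" + String.join("&", params));
    }
}
```

Treating url.params as optional for every database means a missing db.usage.url.params just contributes no extra parameters, instead of each database needing its own special case.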
>
> On Sun, Jul 7, 2013 at 1:02 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
>> Oh, and I should correct myself: it doesn't crash. The management
>> server seems to fence itself because it can't talk to the database.
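One way to reconcile fencing with the single-server case is to fence immediately when clustered, but tolerate a few consecutive failed heartbeats when running standalone. The sketch below is hypothetical (FencePolicy and its parameters are not CloudStack code) and only models the decision, not the heartbeat itself.

```java
// Hypothetical policy object; FencePolicy is not part of CloudStack.
// It only decides whether to fence after a failed DB heartbeat.
final class FencePolicy {
    private final boolean clustered;          // are peer management servers expected?
    private final int maxConsecutiveFailures; // failures a standalone server tolerates
    private int consecutiveFailures = 0;

    FencePolicy(boolean clustered, int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be at least 1");
        }
        this.clustered = clustered;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** A successful heartbeat ends the current failure streak. */
    void onHeartbeatSuccess() {
        consecutiveFailures = 0;
    }

    /** Records a failed heartbeat and returns true if the server should fence itself now. */
    boolean onHeartbeatFailure() {
        consecutiveFailures++;
        if (clustered) {
            // With peers present, fence right away so another node can take over.
            return true;
        }
        // Standalone: nobody else can take over, so ride out short outages first.
        return consecutiveFailures >= maxConsecutiveFailures;
    }
}
```

Whether a standalone server can safely resume after the outage is still the open question from point 1; this only keeps it alive long enough to find out.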
>>
>> On Sun, Jul 7, 2013 at 12:59 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
>>> OK. After a cursory look, I've found that autoReconnect is a rather
>>> bad option for JDBC. I've also found this, which seems rather hairy
>>> for what I want to do:
>>>
>>> http://dev.mysql.com/doc/refman/5.0/en/connector-j-usagenotes-j2ee-concepts-managing-load-balanced-connections.html
>>>
>>> I don't necessarily want to hand off load-balancing management to
>>> the Java code; I just want CloudStack to automatically reinitialize
>>> the database connection when this 'communications link failure'
>>> occurs, perhaps controlled by a db.cloud.connection.retry.count
>>> property or similar.
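A minimal sketch of the retry idea, assuming the retry count would come from something like the proposed db.cloud.connection.retry.count. It retries only on SQLSTATE class "08" (connection exceptions; Connector/J reports a communications link failure as 08S01) and calls a caller-supplied reconnect hook between attempts. DbRetry, SqlAction and Reconnector are hypothetical names.

```java
import java.sql.SQLException;

// Hypothetical helper; the retry count is meant to come from something like the
// proposed db.cloud.connection.retry.count property.
final class DbRetry {

    /** A unit of database work that may fail with SQLException. */
    @FunctionalInterface
    interface SqlAction<T> {
        T run() throws SQLException;
    }

    /** Discards the broken connection and opens a fresh one (caller-supplied). */
    @FunctionalInterface
    interface Reconnector {
        void reconnect() throws SQLException;
    }

    /** SQLSTATE class "08" means a connection exception; Connector/J uses 08S01 for link failures. */
    static boolean isConnectionFailure(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    /**
     * Runs the action, reconnecting and retrying up to retryCount extra times on
     * connection failures. Other errors, and a failing reconnect, propagate immediately.
     */
    static <T> T withRetry(SqlAction<T> action, Reconnector reconnector, int retryCount)
            throws SQLException {
        if (retryCount < 0) {
            throw new IllegalArgumentException("retryCount must be >= 0");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return action.run();
            } catch (SQLException e) {
                if (!isConnectionFailure(e) || attempt >= retryCount) {
                    throw e;
                }
                reconnector.reconnect();
            }
        }
    }
}
```

A non-connection error such as a syntax error is rethrown on the first attempt, so retries cannot mask application bugs. Retrying is only safe for work that is idempotent, or that was not part of a transaction lost along with the connection.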
>>>
>>> On Sun, Jul 7, 2013 at 12:54 AM, Wido den Hollander <w...@widodh.nl> wrote:
>>>> Hi,
>>>>
>>>>
>>>> On 07/07/2013 08:45 AM, Marcus Sorensen wrote:
>>>>>
>>>>> I see that my db.properties has db.cloud.autoReconnect=true, which
>>>>> translates to setting autoReconnect on the JDBC driver connection in
>>>>> utils/src/com/cloud/utils/db/Transaction.java. I also see that if I
>>>>> manually trigger the issue, I get:
>>>>>
>>>>
>>>> Just to confirm, I see the same issues. I haven't looked into this yet, but
>>>> this is also one of the things I want to have fixed.
>>>>
>>>> Maybe create an issue for it?
>>>>
>>>> Wido
>>>>
>>>>
>>>>> 2013-07-07 00:42:50,502 ERROR [cloud.cluster.ClusterManagerImpl] (Cluster-Heartbeat-1:null) Runtime DB exception
>>>>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
>>>>>
>>>>> The last packet successfully received from the server was 1,503 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.
>>>>> at sun.reflect.GeneratedConstructorAccessor159.newInstance(Unknown Source)
>>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>>>>> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>>>>> at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
>>>>> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
>>>>> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
>>>>> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
>>>>> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
>>>>> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
>>>>> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
>>>>> at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
>>>>> at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
>>>>> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>>> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>>> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:409)
>>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:350)
>>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>> at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:907)
>>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>> at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:912)
>>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>> at com.cloud.cluster.dao.ManagementServerHostDaoImpl.getActiveList(ManagementServerHostDaoImpl.java:158)
>>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>>> at com.cloud.cluster.ClusterManagerImpl.peerScan(ClusterManagerImpl.java:1057)
>>>>> at com.cloud.cluster.ClusterManagerImpl.access$1200(ClusterManagerImpl.java:95)
>>>>> at com.cloud.cluster.ClusterManagerImpl$4.run(ClusterManagerImpl.java:789)
>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:679)
>>>>> Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
>>>>> ... 55 more
>>>>> 2013-07-07 00:42:50,505 ERROR [cloud.cluster.ClusterManagerImpl] (Cluster-Heartbeat-1:null) DB communication problem detected, fence it
>>>>>
>>>>> And the only fix is to restart cloudstack-management so that it
>>>>> connects to another member of the load-balanced multi-master database
>>>>> and gets things running again.
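The manual restart above effectively switches to another database node. An application-side equivalent, sketched here as an alternative to Connector/J's own failover and loadbalance URLs, is to try each configured host in turn, starting from the one that last worked. HostFailover and its Opener interface are hypothetical; in practice the opener might wrap java.sql.DriverManager.getConnection for each host.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical application-side failover: try each configured DB host in turn,
// starting with the one that last worked. Connector/J's own failover and
// loadbalance URLs are the driver-level alternative.
final class HostFailover<C> {

    /** Opens a connection (or any handle) to one host; supplied by the caller. */
    @FunctionalInterface
    interface Opener<T> {
        T open(String host) throws Exception;
    }

    private final List<String> hosts;
    private final Opener<C> opener;
    private int preferred = 0; // index of the host that last succeeded

    HostFailover(List<String> hosts, Opener<C> opener) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = new ArrayList<>(hosts);
        this.opener = opener;
    }

    /** Returns a connection from the first reachable host, or throws with every failure attached. */
    synchronized C connect() throws Exception {
        Exception failure = new Exception("all database hosts are unreachable");
        for (int i = 0; i < hosts.size(); i++) {
            int idx = (preferred + i) % hosts.size();
            try {
                C connection = opener.open(hosts.get(idx));
                preferred = idx; // stick with this host until it fails
                return connection;
            } catch (Exception e) {
                failure.addSuppressed(e);
            }
        }
        throw failure;
    }

    synchronized String preferredHost() {
        return hosts.get(preferred);
    }
}
```

Starting from the last good host keeps a healthy node in use instead of bouncing back to the first entry on every reconnect.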
>>>>>
>>>>>
>>>>> On Sun, Jul 7, 2013 at 12:35 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
>>>>>>
>>>>>> I've noticed that the CloudStack management server creates persistent
>>>>>> connections to the database and crashes if the database connection is
>>>>>> lost. I haven't looked at the code yet, but I was wondering if anyone
>>>>>> knew what was going on here: whether it's simply not set up to
>>>>>> handle reconnects gracefully, or something else. We have a
>>>>>> multi-master database setup, but CloudStack doesn't take advantage of
>>>>>> it because it doesn't attempt a graceful reconnect: if the particular
>>>>>> node it connected to on startup goes down, it simply crashes.
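Since the server holds long-lived connections, one hedged mitigation is to validate a connection before reusing it with the JDBC 4 method Connection.isValid(int) and transparently reopen it when the check fails. ValidatedConnection and its Opener are hypothetical names; commons-dbcp, which appears in the stack trace, offers comparable pool-level settings (testOnBorrow with a validationQuery).

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical wrapper around one long-lived connection: validate before reuse,
// reopen if the server has gone away. Not CloudStack or DBCP code.
final class ValidatedConnection {

    /** Opens a fresh connection, e.g. via DriverManager.getConnection(url, user, pass). */
    @FunctionalInterface
    interface Opener {
        Connection open() throws SQLException;
    }

    private final Opener opener;
    private final int validationTimeoutSeconds;
    private Connection current;

    ValidatedConnection(Opener opener, int validationTimeoutSeconds) {
        if (validationTimeoutSeconds < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        this.opener = opener;
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    /** Returns the cached connection if it still passes isValid(), otherwise a new one. */
    synchronized Connection get() throws SQLException {
        if (current != null && !current.isClosed() && current.isValid(validationTimeoutSeconds)) {
            return current;
        }
        closeQuietly(current);
        current = opener.open();
        return current;
    }

    private static void closeQuietly(Connection c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (SQLException ignored) {
            // The connection is already broken; nothing useful to do here.
        }
    }
}
```

Validation costs a round trip per checkout, so a pool would typically validate on borrow or on an idle timer rather than before every statement.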
