Re: com.cloud.agent.api.CheckRouterCommand timeout
On 21.06.2018 at 17:08, Daan Hoogland wrote:
> makes sense, well let's hope all breaks soon ;)

I am sure it will break! :D And then I will get back to you with more
questions!

Thanks a lot for taking the time!
Re: com.cloud.agent.api.CheckRouterCommand timeout
Makes sense. Well, let's hope all breaks soon ;)

On Thu, Jun 21, 2018 at 2:15 PM, Melanie Desaive <
m.desa...@heinlein-support.de> wrote:

> I called both stages of the script manually, but at a time when all was
> working as expected and the routers were back to MASTER and BACKUP.
>
> It looked like this:
>
> [root@acs-compute-5 ~]# /opt/cloud/bin/router_proxy.sh checkrouter.sh 169.254.1.178
> Status: BACKUP
>
> root@r-2595-VM:~# /opt/cloud/bin/checkrouter.sh
> Status: BACKUP
Re: com.cloud.agent.api.CheckRouterCommand timeout
Hi Daan,

On 21.06.2018 at 15:29, Daan Hoogland wrote:
> Melanie, attachments get deleted for this list. Your assumption for the
> comm path is right for xen. Did you try and execute the script as it is
> called by the proxy script from the host? and capture the return? We had
> a bad problem with getting the template version in the past on xen, this
> might be similar. That was due to processing of the returned string in
> the script.

I called both stages of the script manually, but at a time when all was
working as expected and the routers were back to MASTER and BACKUP.

It looked like this:

[root@acs-compute-5 ~]# /opt/cloud/bin/router_proxy.sh checkrouter.sh 169.254.1.178
Status: BACKUP

root@r-2595-VM:~# /opt/cloud/bin/checkrouter.sh
Status: BACKUP
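Since the manual check above was only run while everything was healthy, it may help to have something ready that records how long a call takes and what it returns the next time the routers go to UNKNOWN. The following is only a rough sketch, not how CloudStack itself invokes the script: `parse_status` and `timed_check` are hypothetical helper names, the command to run is whatever you would normally type on the host (for example the router_proxy.sh call shown above), and the default of 60 mirrors the "timed out after 60" log message on the assumption that the value is in seconds.

```python
import re
import subprocess
import sys
import time

# Matches the "Status: BACKUP" style line printed by the check shown above.
STATUS_RE = re.compile(r"^Status:[ \t]*(\S+)", re.MULTILINE)


def parse_status(output):
    """Return the word after 'Status:', or 'UNKNOWN' if no such line exists."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else "UNKNOWN"


def timed_check(argv, timeout_s=60.0):
    """Run argv and return (state, elapsed_seconds, timed_out).

    timeout_s defaults to 60 as an assumption taken from the log message
    'timed out after 60'; adjust it to your own setting.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
        state = parse_status(result.stdout)
        timed_out = False
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        state, timed_out = "UNKNOWN", True
    return state, time.monotonic() - start, timed_out


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # real call, e.g. ["/opt/cloud/bin/router_proxy.sh", "checkrouter.sh",
    # "169.254.1.178"] when running on the hypervisor host.
    demo = [sys.executable, "-c", "print('Status: BACKUP')"]
    print(timed_check(demo, timeout_s=10.0))
```

Running it once on the hypervisor host (proxy call) and once on the VR (direct call) during an incident would show which of the two stages is slow or returns something other than a clean "Status:" line.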
Re: com.cloud.agent.api.CheckRouterCommand timeout
Melanie, attachments get deleted for this list. Your assumption for the
comm path is right for Xen. Did you try to execute the script as it is
called by the proxy script from the host, and capture the return? We had a
bad problem with getting the template version in the past on Xen; this
might be similar. That was due to processing of the returned string in the
script.

On Thu, Jun 21, 2018 at 1:16 PM, Melanie Desaive <
m.desa...@heinlein-support.de> wrote:

> With Stephan's help I figured out the following guess for the path of
> connections for the checkrouter command. Could someone please correct
> me if my guess is not correct. ;)
>
> x The management node connects to the XenServer hypervisor host via the
>   management network on port 22 by SSH
> x On the hypervisor host the wrapper script
>   "/opt/cloud/bin/router_proxy.sh" is used to call scripts on system VMs
>   via link-local IP and port 3922
> x On the VR the script "/opt/cloud/bin/checkrouter.sh" does the actual
>   check.
Re: com.cloud.agent.api.CheckRouterCommand timeout
Hi Daan,

thanks for your reply.

The latest occurrence of our VRs going to UNKNOWN resolved itself 24 hours
after it had occurred. Nevertheless I would appreciate some insight into
how the checkRouter command is handled, as I expect the problem to come
back again.

On 21.06.2018 at 10:39, Daan Hoogland wrote:
> Melanie, this depends a bit on the type of hypervisor. The command executes
> the checkrouter.sh script on the virtual router if it reaches it, but it
> seems your problem is before that. I would look at the network first and
> follow the path that the execution takes for your hypervisortype.

With Stephan's help I figured out the following guess for the path of
connections for the checkrouter command. Could someone please correct
me if my guess is not correct. ;)

x The management node connects to the XenServer hypervisor host via the
  management network on port 22 by SSH
x On the hypervisor host the wrapper script
  "/opt/cloud/bin/router_proxy.sh" is used to call scripts on system VMs
  via link-local IP and port 3922
x On the VR the script "/opt/cloud/bin/checkrouter.sh" does the actual
  check.

In our case the API call times out with these log messages:

x Operation timed out: Commands 1063975411966525473 to Host 29 timed
  out after 60
x Unable to update router r-2595-VM's status
x Redundant virtual router (name: r-2595-VM, id: 2595) just switch
  from BACKUP to UNKNOWN

To me it seems that this is a timeout that occurs when ACS management is
waiting for the API call to return. At what stage, (management host <->
virtualization host) or (virtualization host <-> VR), the answer is
delayed is unclear to me. (SSH login from the virtualization host to the
VR via link-local is working all the time.)

It is also unclear to me why both VRs of the respective network stay in
UNKNOWN for 24 hours and are accessible via link-local, but come back
immediately after a reboot.

I am happy for any suggestions or explanations on this topic and will
investigate further as soon as the problem comes back again.

A portion of our management log for the latest occurrence of the problem
is attached to this email.

Greetings,

Melanie

--
Heinlein Support GmbH
Linux: Akademie - Support - Hosting

http://www.heinlein-support.de
Tel: 030 / 40 50 51 - 0
Fax: 030 / 40 50 51 - 19

Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
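To narrow down which leg of the guessed path is slow, one could time a plain TCP connect on each hop. The sketch below is only an illustration under the assumptions of this message (port 22 towards the hypervisor host, port 3922 towards the VR's link-local address); `probe_tcp` and `probe_path` are hypothetical helper names. It checks TCP reachability only, not SSH authentication or the scripts themselves, and the link-local hop can only be probed from the hypervisor host.

```python
import socket
import time


def probe_tcp(host, port, timeout_s=5.0):
    """Try a plain TCP connect and return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        reachable = False
    return reachable, time.monotonic() - start


def probe_path(hops, timeout_s=5.0):
    """Probe every (label, host, port) hop and return a list of results."""
    results = []
    for label, host, port in hops:
        reachable, elapsed = probe_tcp(host, port, timeout_s)
        results.append((label, reachable, round(elapsed, 3)))
    return results
```

Usage would look roughly like `probe_path([("mgmt -> hypervisor (SSH)", "acs-compute-5", 22)])` on the management node and `probe_path([("hypervisor -> VR (link-local)", "169.254.1.178", 3922)])` on the hypervisor host, with the host name and address taken from the shell prompts quoted elsewhere in this thread. Because SSH login to the VR kept working during the incident, a successful probe here would mainly help rule out the network and point towards the command handling itself.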
Re: com.cloud.agent.api.CheckRouterCommand timeout
Melanie, this depends a bit on the type of hypervisor. The command executes
the checkrouter.sh script on the virtual router if it reaches it, but it
seems your problem is before that. I would look at the network first and
follow the path that the execution takes for your hypervisor type.

On Wed, Jun 20, 2018 at 1:53 PM, Melanie Desaive <
m.desa...@heinlein-support.de> wrote:

> we have a recurring problem with our virtual routers. By the log
> messages it seems that com.cloud.agent.api.CheckRouterCommand runs into
> a timeout and therefore switches to UNKNOWN.
>
> I would like to investigate a little further but lack understanding
> about how the checkRouter command is trying to access the virtual router.

--
Daan
com.cloud.agent.api.CheckRouterCommand timeout
Hi all,

we have a recurring problem with our virtual routers. By the log
messages it seems that com.cloud.agent.api.CheckRouterCommand runs into
a timeout and the router state therefore switches to UNKNOWN.

All network traffic through the routers is still working. They can be
accessed by their link-local IP addresses, and the configuration looks
good at first sight. But configuration changes through the CloudStack API
no longer reach the routers. A reboot fixes the problem.

I would like to investigate a little further but lack understanding
about how the checkRouter command is trying to access the virtual router.

Could someone point me to some relevant documentation or give a short
overview of how the connection from CS-Management is done and where such
a timeout could occur?

As background information - the sequence from the management log looks
roughly like this:

---

x Every few seconds the com.cloud.agent.api.CheckRouterCommand returns
  a state BACKUP or MASTER correctly
x When the problem occurs the log messages change. Some snippets below

  x ... Waiting some more time because this is the current command
  x ... Waiting some more time because this is the current command
  x Could not find exception:
    com.cloud.exception.OperationTimedoutException in error code list for
    exceptions
  x Timed out on Seq 28-2352567855348137104
  x Seq 28-2352567855348137104: Cancelling.
  x Operation timed out: Commands 2352567855348137104 to Host 28 timed
    out after 60
  x Unable to update router r-2594-VM's status
  x Redundant virtual router (name: r-2594-VM, id: 2594) just switch
    from MASTER to UNKNOWN

x Those error messages are now repeated for each following
  CheckRouterCommand until the virtual router is rebooted

Greetings,

Melanie
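For anyone wanting to see when the timeouts start and which host and router they affect, the snippets above can be pulled out of the management log with two regular expressions. This is a minimal sketch that assumes the messages appear on a single log line with exactly the wording quoted in this message; real log lines carry timestamps and other prefixes, which the patterns simply skip over. `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns copied from the message wording quoted in this thread.
TIMEOUT_RE = re.compile(
    r"Operation timed out: Commands (\d+) to Host (\d+) timed out after (\d+)"
)
SWITCH_RE = re.compile(
    r"Redundant virtual router \(name: ([^,]+), id: (\d+)\) "
    r"just switch from (\w+) to (\w+)"
)


def summarize(lines):
    """Return (timeouts per host id, list of (router, old, new) transitions)."""
    timeouts_per_host = Counter()
    transitions = []
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            timeouts_per_host[int(timeout.group(2))] += 1
        switch = SWITCH_RE.search(line)
        if switch:
            transitions.append(
                (switch.group(1), switch.group(3), switch.group(4))
            )
    return timeouts_per_host, transitions


if __name__ == "__main__":
    sample = [
        "Operation timed out: Commands 2352567855348137104 to Host 28 "
        "timed out after 60",
        "Redundant virtual router (name: r-2594-VM, id: 2594) just switch "
        "from MASTER to UNKNOWN",
    ]
    print(summarize(sample))
```

Feeding it the lines of the management log (for example via `open(path)`, with the path depending on the installation) gives a quick count of timeouts per hypervisor host and the list of state changes, which helps to tell a single misbehaving host from a problem that follows the router.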