[GitHub] csquire edited a comment on issue #2722: CLOUDSTACK-10310 Fix KVM reboot on storage issue

2018-10-08 Thread GitBox
csquire edited a comment on issue #2722: CLOUDSTACK-10310 Fix KVM reboot on 
storage issue
URL: https://github.com/apache/cloudstack/pull/2722#issuecomment-427867205
 
 
   Sorry, I misspoke in my last comment (now edited to correct it). The 
blocked host doesn't reboot; it just gets marked as `Down`, and its VMs are 
actually still running when the duplicate VMs get provisioned. This may be a 
completely separate issue, but it will still prevent us from using 4.11 in 
production. EDIT: Actually, it looks like it may have been present before 4.11.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] csquire edited a comment on issue #2722: CLOUDSTACK-10310 Fix KVM reboot on storage issue

2018-10-08 Thread GitBox
csquire edited a comment on issue #2722: CLOUDSTACK-10310 Fix KVM reboot on 
storage issue
URL: https://github.com/apache/cloudstack/pull/2722#issuecomment-427859929
 
 
   This PR doesn't seem to completely fix the problem (or maybe this is a 
completely new problem). We installed the RC release with this PR on a test 
system and can still get the KVM host marked as `Down` by using iptables to 
drop outgoing requests to NFS. My investigation shows that the line 
[`storage = conn.storagePoolLookupByUUIDString(uuid);`](https://github.com/apache/cloudstack/blob/4.11/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java#L95) 
blocks indefinitely. As a result, `kvmheartbeat.sh` is never executed, a host 
investigation is started, the host with blocked NFS is marked as `Down`, and 
finally all VMs on that host are rescheduled, resulting in duplicate VMs.
   
   I pulled a thread dump and found that the KVMHAMonitor thread hangs here 
until NFS is unblocked. I haven't dug any deeper yet, though.
   
   ```
   "Thread-20" - Thread t@135
      java.lang.Thread.State: RUNNABLE
       at com.sun.jna.Native.invokePointer(Native Method)
       at com.sun.jna.Function.invokePointer(Function.java:470)
       at com.sun.jna.Function.invoke(Function.java:404)
       at com.sun.jna.Function.invoke(Function.java:315)
       at com.sun.jna.Library$Handler.invoke(Library.java:212)
       at com.sun.proxy.$Proxy3.virStoragePoolLookupByUUIDString(Unknown Source)
       at org.libvirt.Connect.storagePoolLookupByUUIDString(Unknown Source)
       at com.cloud.hypervisor.kvm.resource.KVMHAMonitor$Monitor.runInContext(KVMHAMonitor.java:95)
       - locked <1afb3370> (a java.util.concurrent.ConcurrentHashMap)
       at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
       at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
       at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
       at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
       at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
       at java.lang.Thread.run(Thread.java:748)

      Locked ownable synchronizers:
       - None
   ```
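
   For illustration only, here is a minimal sketch of one way a monitor 
thread could avoid hanging on a call like this: run the lookup on a separate 
thread and wait with a timeout. This is not what the PR does and not 
CloudStack code; `BoundedLookup` and `callWithTimeout` are hypothetical names, 
and the slow `Callable` in `main` merely stands in for the libvirt call. 
Note that a thread stuck in native code may ignore the interrupt, so stuck 
worker threads could still pile up until NFS comes back.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of CloudStack or libvirt-java.
public class BoundedLookup {
    // Daemon threads, so a worker stuck in a blocking call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-lookup");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs a possibly-blocking call and gives up after timeoutMs.
     * An empty result means the call timed out, failed, or returned null.
     */
    public static <T> Optional<T> callWithTimeout(Callable<T> call, long timeoutMs)
            throws InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Best effort only: a thread blocked in native code may not react to the interrupt.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that returns promptly.
        Optional<String> fast = callWithTimeout(() -> "pool", 1000);
        // Stand-in for a lookup that blocks while NFS is unreachable.
        Optional<String> stuck = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return "pool";
        }, 200);
        System.out.println("fast=" + fast.isPresent() + " stuck=" + stuck.isPresent());
    }
}
```

   With something along these lines, the caller could treat an empty result 
as "storage unreachable" and still run its heartbeat logic instead of 
waiting forever. Whether that is the right behavior for KVMHAMonitor is a 
separate question.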

