I'm OK with a symptom fix on our end; if the root cause is in libvirt, we can't do much about that. This is the sort of patch that tends to get pulled into the distributions' regular update cycle, so unless there's more to it and the patch turns out not to be a good fix, I imagine we will see it come through without having to wait for the next point releases. We still have to support existing users who might not be running the latest version, though, so the symptom fix is probably OK as a temporary measure.
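
To make the "refresh only if needed" option concrete, here is a rough sketch of how I read Wei's suggestion: look the volume up first, and refresh the pool only when that lookup misses. It is Python rather than the agent's Java, purely for brevity. FakePool, get_volume and the KeyError on a miss are all made-up stand-ins, not CloudStack or libvirt code; in the real bindings the corresponding calls would be the pool's refresh() and storageVolLookupByName(), which signal a miss with a libvirt exception.

```python
class FakePool:
    """Hypothetical stand-in for a libvirt storage pool handle."""

    def __init__(self, on_disk):
        self.on_disk = set(on_disk)  # volumes that really exist on storage
        self.known = set()           # volumes the pool metadata knows about
        self.refresh_calls = 0       # how often we hit the expensive path

    def refresh(self):
        # Stand-in for the pool refresh call that triggers the libvirt bug
        # when it is issued too often or concurrently.
        self.refresh_calls += 1
        self.known = set(self.on_disk)

    def lookup(self, name):
        # Stand-in for a volume lookup by name; raises on a miss.
        if name not in self.known:
            raise KeyError(name)
        return name


def get_volume(pool, name):
    """Return the volume, refreshing the pool only if the first lookup misses."""
    try:
        return pool.lookup(name)
    except KeyError:
        # The volume may have been created by another host since the last
        # refresh, so refresh once and retry. A second miss propagates.
        pool.refresh()
        return pool.lookup(name)


if __name__ == "__main__":
    pool = FakePool(["vm1.qcow2"])
    get_volume(pool, "vm1.qcow2")   # first access: miss, then one refresh
    get_volume(pool, "vm1.qcow2")   # second access: no refresh needed
    print("refresh calls:", pool.refresh_calls)
```

This only reduces how often refresh is called; it does not stop two hosts from refreshing at the same moment, so it stays a mitigation rather than a fix.
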
On Mon, Jul 15, 2013 at 3:42 PM, Edison Su <edison...@citrix.com> wrote:
> There is a serious issue on
> KVM(https://issues.apache.org/jira/browse/CLOUDSTACK-2729): a libvirt storage
> pool can disappear on KVM host, it's easy to be reproduced in our internal QA
> environment.
> Wei found the root cause, is on the libvirt:
> "
> This is a libvirt issue. I created a ticket for it.
> https://bugzilla.redhat.com/show_bug.cgi?id=977706
> The patch is very simple.
> https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html
> "
> But it's also introduced by CloudStack, as cloudstack will call libvirt
> storage pool refresh method each time when access the storage pool. The code
> is added by commit: 2ffc9907f7b0d371737e39b7649f7af23026f5cf, about less than
> one year ago.
>
> As Wei suggested, we can call storage pool refresh only if needed, it will
> mitigate the issue(It's behavior I did on cloudstack pre-4.0), but it's only
> treat the symptom, not the cause.
> Or add a cluster wide lock, only one guy can access storage pool at one time,
> we can add a file lock on NFS primary storage.
> Any idea/feedback on how to fix this KVM issue?