Re: [RFC]Bypass Libvirt storage pool for NFS
On 03/20/2014 07:51 PM, Nux! wrote:
> On 20.03.2014 18:48, Wido den Hollander wrote:
>> And it just went upstream! How great is that?
>
> Pretty great. That's how open source works. :)
>
>>> Quick question: there is no problem if instead of using NFS directly we use the "shared mount point" option, is there?
>>
>> Probably not, since we would then be mounting the filesystem manually instead of having libvirt do it.
>
> Ok, so worst case scenario people can just fall back on this option.
>
>> I'd say get this patch down into EL6 and I'll try to get it into Ubuntu 14.04.

Done! I just got an e-mail from Bugzilla: it's fixed in libvirt-0.10.2-32.el6.

Wido

> Yeah, fingers crossed on that!
>
> Lucian
RE: [RFC]Bypass Libvirt storage pool for NFS
> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: Thursday, March 20, 2014 11:51 AM
> To: dev@cloudstack.apache.org
> Subject: Re: [RFC]Bypass Libvirt storage pool for NFS
>
> On 20.03.2014 18:48, Wido den Hollander wrote:
> >
> > And it just went upstream! How great is that?
>
> Pretty great. That's how open source works. :)

Thanks, guys, for pushing it :) But fundamentally, I don't think libvirt is doing the right thing here. Libvirt should not need to care about the integrity of the storage pool: the pool is shared by multiple hypervisor hosts, and any one libvirt instance runs on only one of them, so checking the files on the storage pool from there is useless and error-prone. If the integrity of the storage pool is broken, the user should complain to CloudStack or whatever other cloud orchestration layer sits above it.

> >> Quick question: there is no problem if instead of using NFS directly
> >> we use the "shared mount point" option, is there?
> >
> > Probably not, since we would then be mounting the filesystem manually
> > instead of having libvirt do it.
>
> Ok, so worst case scenario people can just fall back on this option.
>
> > I'd say get this patch down into EL6 and I'll try to get it into
> > Ubuntu 14.04.
>
> Yeah, fingers crossed on that!
>
> Lucian
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
Re: [RFC]Bypass Libvirt storage pool for NFS
On 20.03.2014 18:48, Wido den Hollander wrote:
> And it just went upstream! How great is that?

Pretty great. That's how open source works. :)

>> Quick question: there is no problem if instead of using NFS directly we use the "shared mount point" option, is there?
>
> Probably not, since we would then be mounting the filesystem manually instead of having libvirt do it.

Ok, so worst case scenario people can just fall back on this option.

> I'd say get this patch down into EL6 and I'll try to get it into Ubuntu 14.04.

Yeah, fingers crossed on that!

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
Re: [RFC]Bypass Libvirt storage pool for NFS
On 03/20/2014 05:38 PM, Nux! wrote:
> On 19.03.2014 22:48, Edison Su wrote:
>>> -----Original Message-----
>>> From: Nux! [mailto:n...@li.nux.ro]
>>> Sent: Wednesday, March 19, 2014 3:34 PM
>>> To: dev@cloudstack.apache.org
>>> Subject: RE: [RFC]Bypass Libvirt storage pool for NFS
>>>
>>> On 19.03.2014 22:28, Edison Su wrote:
>>>>> Edison, if - with the workarounds in place now - the current version of KVM works OK, then why wouldn't a newer version work just as fine? Just trying to understand this.
>>>>
>>>> That's a long story: there is a bug in libvirt, introduced in a newer version (> 0.9.10), which can make the storage pool disappear.
>>>
>>> Edison, that I understand, but what is the technical reason that prevents using newer KVM? It looks like current KVM works fine on CentOS 6.5 for example which has libvirt 0.10.2.
>>
>> Yes, at first glance, the newer libvirt versions (> 0.9.10) work just fine. But under stress testing, libvirt will complain that the NFS storage pool is missing, and the pool can't be added back unless you shut down all the VMs that are using it. That's what the bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) is all about. In the ACS 4.2/4.3 releases, we only recommend libvirt <= 0.9.10 if primary storage is NFS.
>
> Ok, I'm trying to make some noise in that bz entry, hopefully someone gets annoyed enough to do something about it.

And it just went upstream! How great is that?

> Quick question: there is no problem if instead of using NFS directly we use the "shared mount point" option, is there?

Probably not, since we would then be mounting the filesystem manually instead of having libvirt do it.

I'd say get this patch down into EL6 and I'll try to get it into Ubuntu 14.04.

Wido

> Lucian
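With the "shared mount point" option the NFS export is mounted by hand, so the host itself has to know whether the mount is really there. A minimal sketch of such a check, assuming the Linux /proc/mounts layout (device, mount point, filesystem type, options, dump, pass). The class and method names are hypothetical and are not taken from CloudStack or libvirt.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of CloudStack or libvirt. */
final class MountTable {
    private MountTable() {}

    /**
     * Returns true if one of the given mount-table lines lists mountPoint
     * with filesystem type "nfs" or "nfs4".
     */
    static boolean isNfsMounted(List<String> mountLines, String mountPoint) {
        for (String line : mountLines) {
            // Fields: device, mount point, fs type, options, dump, pass.
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue; // skip blank or malformed lines
            }
            boolean samePath = fields[1].equals(mountPoint);
            boolean isNfs = fields[2].equals("nfs") || fields[2].equals("nfs4");
            if (samePath && isNfs) {
                return true;
            }
        }
        return false;
    }
}
```

On a real host the lines could come from `Files.readAllLines(Paths.get("/proc/mounts"))`. Mount points containing spaces appear escaped as `\040` in that file, which this sketch does not decode.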
RE: [RFC]Bypass Libvirt storage pool for NFS
On 19.03.2014 22:48, Edison Su wrote:
>> -----Original Message-----
>> From: Nux! [mailto:n...@li.nux.ro]
>> Sent: Wednesday, March 19, 2014 3:34 PM
>> To: dev@cloudstack.apache.org
>> Subject: RE: [RFC]Bypass Libvirt storage pool for NFS
>>
>> On 19.03.2014 22:28, Edison Su wrote:
>>>> Edison, if - with the workarounds in place now - the current version of KVM works OK, then why wouldn't a newer version work just as fine? Just trying to understand this.
>>>
>>> That's a long story: there is a bug in libvirt, introduced in a newer version (> 0.9.10), which can make the storage pool disappear.
>>
>> Edison, that I understand, but what is the technical reason that prevents using newer KVM? It looks like current KVM works fine on CentOS 6.5 for example which has libvirt 0.10.2.
>
> Yes, at first glance, the newer libvirt versions (> 0.9.10) work just fine. But under stress testing, libvirt will complain that the NFS storage pool is missing, and the pool can't be added back unless you shut down all the VMs that are using it. That's what the bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) is all about. In the ACS 4.2/4.3 releases, we only recommend libvirt <= 0.9.10 if primary storage is NFS.

Ok, I'm trying to make some noise in that bz entry, hopefully someone gets annoyed enough to do something about it.

Quick question: there is no problem if instead of using NFS directly we use the "shared mount point" option, is there?

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
RE: [RFC]Bypass Libvirt storage pool for NFS
> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: Wednesday, March 19, 2014 3:34 PM
> To: dev@cloudstack.apache.org
> Subject: RE: [RFC]Bypass Libvirt storage pool for NFS
>
> On 19.03.2014 22:28, Edison Su wrote:
> >> Edison, if - with the workarounds in place now - the current version
> >> of KVM works OK, then why wouldn't a newer version work just as fine?
> >> Just trying to understand this.
> >
> > That's a long story: there is a bug in libvirt, introduced in a newer
> > version (> 0.9.10), which can make the storage pool disappear.
>
> Edison, that I understand, but what is the technical reason that prevents
> using newer KVM?
> It looks like current KVM works fine on CentOS 6.5 for example which has
> libvirt 0.10.2.

Yes, at first glance, the newer libvirt versions (> 0.9.10) work just fine. But under stress testing, libvirt will complain that the NFS storage pool is missing, and the pool can't be added back unless you shut down all the VMs that are using it. That's what the bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) is all about. In the ACS 4.2/4.3 releases, we only recommend libvirt <= 0.9.10 if primary storage is NFS.

> Lucian
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
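The "libvirt <= 0.9.10" recommendation only works if versions are compared component by component. A plain string comparison would sort 0.10.2 before 0.9.10 and wrongly treat CentOS 6.5's libvirt as safe. The following is a minimal sketch of such a check, assuming plain dotted numeric versions (a packaging suffix such as "-32.el6" would need stripping first). The class is hypothetical and not part of CloudStack.

```java
/** Hypothetical helper for illustration; not part of CloudStack or libvirt. */
final class LibvirtVersionCheck {
    // Newest libvirt version the thread recommends for NFS primary storage.
    private static final int[] LAST_RECOMMENDED = {0, 9, 10};

    private LibvirtVersionCheck() {}

    /** Parses "major.minor.micro"; missing components count as 0. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** True if version <= 0.9.10, comparing numerically per component. */
    static boolean isRecommendedForNfs(String version) {
        int[] v = parse(version);
        for (int i = 0; i < v.length; i++) {
            if (v[i] != LAST_RECOMMENDED[i]) {
                return v[i] < LAST_RECOMMENDED[i];
            }
        }
        return true; // exactly 0.9.10
    }
}
```

With this rule `isRecommendedForNfs("0.9.10")` is true, while `isRecommendedForNfs("0.10.2")` is false.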
RE: [RFC]Bypass Libvirt storage pool for NFS
On 19.03.2014 22:28, Edison Su wrote:
>> Edison, if - with the workarounds in place now - the current version of KVM works OK, then why wouldn't a newer version work just as fine? Just trying to understand this.
>
> That's a long story: there is a bug in libvirt, introduced in a newer version (> 0.9.10), which can make the storage pool disappear.

Edison, that I understand, but what is the technical reason that prevents using newer KVM? It looks like current KVM works fine on CentOS 6.5 for example which has libvirt 0.10.2.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
RE: [RFC]Bypass Libvirt storage pool for NFS
> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: Wednesday, March 19, 2014 3:07 PM
> To: dev@cloudstack.apache.org
> Subject: RE: [RFC]Bypass Libvirt storage pool for NFS
>
> On 19.03.2014 20:29, Edison Su wrote:
> > It's hard to find the root cause and fix something in libvirt; even when
> > we have found the root cause, it's hard to push the fix into libvirt
> > upstream, not to mention pushing it into downstream distributions like
> > RHEL 6. For example, we have had a fix for the bug
> > https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months
> > now, and there is still no resolution of the issue. Without the fix, we
> > are simply blocked from supporting newer versions of KVM.
>
> Edison, if - with the workarounds in place now - the current version of KVM
> works OK, then why wouldn't a newer version work just as fine?
> Just trying to understand this.

That's a long story: there is a bug in libvirt, introduced in a newer version (> 0.9.10), which can make the storage pool disappear. Wei made a patch to fix it more than half a year ago: https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html. Unfortunately, the patch does not seem to have made it upstream yet. So in order to move forward and support newer versions of KVM, we have to do something in the 4.4 release.

> Lucian
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
RE: [RFC]Bypass Libvirt storage pool for NFS
On 19.03.2014 20:29, Edison Su wrote:
> It's hard to find the root cause and fix something in libvirt; even when we have found the root cause, it's hard to push the fix into libvirt upstream, not to mention pushing it into downstream distributions like RHEL 6. For example, we have had a fix for the bug https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months now, and there is still no resolution of the issue. Without the fix, we are simply blocked from supporting newer versions of KVM.

Edison, if - with the workarounds in place now - the current version of KVM works OK, then why wouldn't a newer version work just as fine? Just trying to understand this.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
Re: [RFC]Bypass Libvirt storage pool for NFS
I'm all for adding a bit of additional smarts to CloudStack so it can work around the current KVM limitations. Waiting for anything to get fixed upstream is affecting deployments NOW, and is a bit utopian. CloudStack seems to be the lower barrier to entry for getting these scenarios addressed.

On Wed, Mar 19, 2014 at 1:29 PM, Edison Su wrote:
> It's hard to find the root cause and fix something in libvirt; even when we have found the root cause, it's hard to push the fix into libvirt upstream, not to mention pushing it into downstream distributions like RHEL 6. For example, we have had a fix for the bug https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months now, and there is still no resolution of the issue. Without the fix, we are simply blocked from supporting newer versions of KVM.
>
> So if the community doesn't like what I proposed, then how about another way: I will write a new implementation of the KVMStoragePool interface, backed by Java/Python/shell script, which won't be enabled by default. It's a simple thing; I don't understand why libvirt makes it so complicated and introduces a lot of pain.
>
> > -----Original Message-----
> > From: Nux! [mailto:n...@li.nux.ro]
> > Sent: Wednesday, March 19, 2014 12:35 PM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [RFC]Bypass Libvirt storage pool for NFS
> >
> > On 19.03.2014 19:01, Wido den Hollander wrote:
> > > On 03/19/2014 07:54 PM, Edison Su wrote:
> > >> I have found many times in QA's testing environment that the libvirt storage pool (created on NFS) frequently goes missing on the KVM host for no apparent reason. It may relate to bug https://bugzilla.redhat.com/show_bug.cgi?id=977706.
> > >> In order to fix this issue, and bug CLOUDSTACK-2729, we added a lot of workarounds to fight with libvirt, such as: if we can't find the storage pool, then create the same pool again, etc. As the storage pool can be lost on the KVM host at any time, it causes a lot of operation errors, such as can't start VM, can't delete volume, etc.
> > >> I want to bypass the libvirt storage pool for NFS, as Java itself already has all the capabilities that libvirt can provide, such as create a file, delete a file, list a directory, etc.; there is no need to add another layer of crap here. In doing so, we won't be blocked by the libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) from supporting newer versions of KVM.
> > >
> > > -1
> > >
> > > I understand the issues which we see here, but imho the way forward is to fix this in libvirt instead of simply going around it.
> > >
> > > We should not try to re-invent the wheel here, but fix the root cause.
> > >
> > > Yes, Java can do a lot, but I think libvirt can do this better.
> > >
> > > For the RBD code I also had a couple of changes go into libvirt recently, and this NFS issue can also be fixed.
> > >
> > > Losing NFS pools in libvirt is most of the time due to a restart of libvirt; they don't magically disappear from libvirt.
> > >
> > > I agree that we should be able to start the pool again even while it's mounted, but that's something we should fix in libvirt.
> > >
> > > Wido
> >
> > -1 and 100% with Wido. If libvirt gets fixed then it would save loads of code in the future and bring other benefits (think support for Xen Project via libvirt etc).
> > Let's push for a libvirt fix instead.
> >
> > My 2 cents,
> > Lucian
> >
> > --
> > Sent from the Delta quadrant using Borg technology!
> >
> > Nux!
> > www.nux.ro
RE: [RFC]Bypass Libvirt storage pool for NFS
It's hard to find the root cause and fix something in libvirt; even when we have found the root cause, it's hard to push the fix into libvirt upstream, not to mention pushing it into downstream distributions like RHEL 6. For example, we have had a fix for the bug https://bugzilla.redhat.com/show_bug.cgi?id=977706 for a few months now, and there is still no resolution of the issue. Without the fix, we are simply blocked from supporting newer versions of KVM.

So if the community doesn't like what I proposed, then how about another way: I will write a new implementation of the KVMStoragePool interface, backed by Java/Python/shell script, which won't be enabled by default. It's a simple thing; I don't understand why libvirt makes it so complicated and introduces a lot of pain.

> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: Wednesday, March 19, 2014 12:35 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [RFC]Bypass Libvirt storage pool for NFS
>
> On 19.03.2014 19:01, Wido den Hollander wrote:
> > On 03/19/2014 07:54 PM, Edison Su wrote:
> >> I have found many times in QA's testing environment that the libvirt storage pool (created on NFS) frequently goes missing on the KVM host for no apparent reason. It may relate to bug https://bugzilla.redhat.com/show_bug.cgi?id=977706.
> >> In order to fix this issue, and bug CLOUDSTACK-2729, we added a lot of workarounds to fight with libvirt, such as: if we can't find the storage pool, then create the same pool again, etc. As the storage pool can be lost on the KVM host at any time, it causes a lot of operation errors, such as can't start VM, can't delete volume, etc.
> >> I want to bypass the libvirt storage pool for NFS, as Java itself already has all the capabilities that libvirt can provide, such as create a file, delete a file, list a directory, etc.; there is no need to add another layer of crap here. In doing so, we won't be blocked by the libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) from supporting newer versions of KVM.
> >
> > -1
> >
> > I understand the issues which we see here, but imho the way forward is to fix this in libvirt instead of simply going around it.
> >
> > We should not try to re-invent the wheel here, but fix the root cause.
> >
> > Yes, Java can do a lot, but I think libvirt can do this better.
> >
> > For the RBD code I also had a couple of changes go into libvirt recently, and this NFS issue can also be fixed.
> >
> > Losing NFS pools in libvirt is most of the time due to a restart of libvirt; they don't magically disappear from libvirt.
> >
> > I agree that we should be able to start the pool again even while it's mounted, but that's something we should fix in libvirt.
> >
> > Wido
>
> -1 and 100% with Wido. If libvirt gets fixed then it would save loads of code in the future and bring other benefits (think support for Xen Project via libvirt etc).
> Let's push for a libvirt fix instead.
>
> My 2 cents,
> Lucian
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
Re: [RFC]Bypass Libvirt storage pool for NFS
On 19.03.2014 19:01, Wido den Hollander wrote:
> On 03/19/2014 07:54 PM, Edison Su wrote:
>> I have found many times in QA's testing environment that the libvirt storage pool (created on NFS) frequently goes missing on the KVM host for no apparent reason. It may relate to bug https://bugzilla.redhat.com/show_bug.cgi?id=977706.
>> In order to fix this issue, and bug CLOUDSTACK-2729, we added a lot of workarounds to fight with libvirt, such as: if we can't find the storage pool, then create the same pool again, etc. As the storage pool can be lost on the KVM host at any time, it causes a lot of operation errors, such as can't start VM, can't delete volume, etc.
>> I want to bypass the libvirt storage pool for NFS, as Java itself already has all the capabilities that libvirt can provide, such as create a file, delete a file, list a directory, etc.; there is no need to add another layer of crap here. In doing so, we won't be blocked by the libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) from supporting newer versions of KVM.
>
> -1
>
> I understand the issues which we see here, but imho the way forward is to fix this in libvirt instead of simply going around it.
>
> We should not try to re-invent the wheel here, but fix the root cause.
>
> Yes, Java can do a lot, but I think libvirt can do this better.
>
> For the RBD code I also had a couple of changes go into libvirt recently, and this NFS issue can also be fixed.
>
> Losing NFS pools in libvirt is most of the time due to a restart of libvirt; they don't magically disappear from libvirt.
>
> I agree that we should be able to start the pool again even while it's mounted, but that's something we should fix in libvirt.
>
> Wido

-1 and 100% with Wido. If libvirt gets fixed then it would save loads of code in the future and bring other benefits (think support for Xen Project via libvirt etc).
Let's push for a libvirt fix instead.

My 2 cents,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
Re: [RFC]Bypass Libvirt storage pool for NFS
On 03/19/2014 07:54 PM, Edison Su wrote:
> I have found many times in QA's testing environment that the libvirt storage pool (created on NFS) frequently goes missing on the KVM host for no apparent reason. It may relate to bug https://bugzilla.redhat.com/show_bug.cgi?id=977706.
> In order to fix this issue, and bug CLOUDSTACK-2729, we added a lot of workarounds to fight with libvirt, such as: if we can't find the storage pool, then create the same pool again, etc. As the storage pool can be lost on the KVM host at any time, it causes a lot of operation errors, such as can't start VM, can't delete volume, etc.
> I want to bypass the libvirt storage pool for NFS, as Java itself already has all the capabilities that libvirt can provide, such as create a file, delete a file, list a directory, etc.; there is no need to add another layer of crap here. In doing so, we won't be blocked by the libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=977706) from supporting newer versions of KVM.

-1

I understand the issues which we see here, but imho the way forward is to fix this in libvirt instead of simply going around it.

We should not try to re-invent the wheel here, but fix the root cause.

Yes, Java can do a lot, but I think libvirt can do this better.

For the RBD code I also had a couple of changes go into libvirt recently, and this NFS issue can also be fixed.

Losing NFS pools in libvirt is most of the time due to a restart of libvirt; they don't magically disappear from libvirt.

I agree that we should be able to start the pool again even while it's mounted, but that's something we should fix in libvirt.

Wido
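For reference, the three operations named in the proposal (create a file, delete a file, list a directory) map onto java.nio.file roughly as follows. This is only a sketch under stated assumptions: the NFS export is already mounted at the given directory, DirectoryPool is a hypothetical class rather than the KVMStoragePool interface or any existing CloudStack code, and it creates empty files rather than real disk images.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical directory-backed pool; not CloudStack's KVMStoragePool. */
final class DirectoryPool {
    private final Path root;

    DirectoryPool(Path root) {
        // Normalise once so the containment check in resolve() is reliable.
        this.root = root.toAbsolutePath().normalize();
    }

    /** Creates an empty file for the volume; fails if it already exists. */
    Path createVolume(String name) throws IOException {
        return Files.createFile(resolve(name));
    }

    /** Deletes the volume file; returns false if there was nothing to delete. */
    boolean deleteVolume(String name) throws IOException {
        return Files.deleteIfExists(resolve(name));
    }

    /** Lists the regular files directly under the pool directory, sorted by name. */
    List<String> listVolumes() throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    // Rejects names such as "../x" that would escape the pool directory.
    private Path resolve(String name) {
        Path candidate = root.resolve(name).normalize();
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("outside pool: " + name);
        }
        return candidate;
    }
}
```

The sketch covers only the file operations from the proposal. It does not address the objections raised in the reply, such as who mounts the export or what happens when libvirt restarts.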