Re: [Gluster-users] 3.7.12 disaster
On 1/07/2016 9:15 AM, Kaleb KEITHLEY wrote:
> There isn't a libglusterfs.a that it could static link to.

Then it shouldn't matter what version it was built against, should it? Unless the function signatures have changed since 3.5.

--
Lindsay Mathieson
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] 3.7.12 disaster
On 06/30/2016 06:53 PM, Lindsay Mathieson wrote:
> On 30/06/2016 10:31 PM, Kaushal M wrote:
>> The pve-qemu-kvm package was last built or updated in January this
>> year [1]. And I think it was built against glusterfs-3.5.2, which is
>> the latest version of glusterfs in the proxmox sources [2].
>> Maybe the pve-qemu-kvm package needs a rebuild.
>
> Does qemu static link libglusterfs?

There isn't a libglusterfs.a that it could static link to. So no.

--
Kaleb
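This is easy to confirm on a node; a minimal sketch (the binary and library paths are typical Debian/Proxmox locations and may differ on your system):

```shell
# qemu links libgfapi/libglusterfs dynamically -- there is no static
# libglusterfs.a it could have baked in at build time.
ldd /usr/bin/qemu-system-x86_64 | grep -Ei 'gfapi|glusterfs'

# The shared objects actually installed are what qemu loads at runtime,
# regardless of which glusterfs version it was built against.
ls -l /usr/lib/x86_64-linux-gnu/libgfapi.so* /usr/lib/x86_64-linux-gnu/libglusterfs.so*
```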
Re: [Gluster-users] 3.7.12 disaster
On 30/06/2016 10:31 PM, Kaushal M wrote:
> The pve-qemu-kvm package was last built or updated in January this
> year [1]. And I think it was built against glusterfs-3.5.2, which is
> the latest version of glusterfs in the proxmox sources [2].
> Maybe the pve-qemu-kvm package needs a rebuild.

Does qemu static link libglusterfs?

--
Lindsay Mathieson
Re: [Gluster-users] 3.7.12 disaster
On 06/30/2016 11:23 AM, Kaleb KEITHLEY wrote:
> On 06/30/2016 11:18 AM, Vijay Bellur wrote:
>> On Thu, Jun 30, 2016 at 8:31 AM, Kaushal M wrote:
>>> On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier wrote:
>>>>> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>>>>
>>>> Yeah, I tried 3.8 when it came out too and I had to use the fuse
>>>> mount point to get the VMs to work. I just assumed proxmox wasn't
>>>> compatible yet with 3.8 (since the menus were a bit wonky anyway)
>>>> but I guess it was the same bug.
>>>
>>> I was able to reproduce the hang as well against 3.7.12.
>>>
>>> I tested by installing the pve-qemu-kvm package from the Proxmox
>>> repositories in a Debian Jessie container, as the default Debian qemu
>>> packages don't link with glusterfs.
>>> I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.
>>>
>>> I tried to create an image on a simple 1-brick gluster volume using
>>> qemu-img.
>>> The qemu-img command succeeded against a 3.7.11 volume, but hung
>>> against 3.7.12, finally timing out and failing after ping-timeout.
>>>
>>> We can at least be happy that this issue isn't due to any bugs in AFR.
>>>
>>> I was testing this with Raghavendra, and we are wondering if this is
>>> possibly a result of changes to libglusterfs and libgfapi that have
>>> been introduced in 3.7.12 and 3.8.
>>> Any app linking with libgfapi also needs to link with libglusterfs.
>>> While we have some sort of versioning for libgfapi, we don't have any
>>> for libglusterfs.
>>> This has caused problems before (I cannot find any links for this
>>> right now though).
>>
>> Did any function signatures change between 3.7.11 and 3.7.12?
>
> In gfapi? No. And (as I'm sure you're aware) they're all versioned, so
> things that linked with the old version-signature continue to do so.
>
> I don't know about libglusterfs.

And I'm not sure I want to suggest that we version libglusterfs for 4.0; but perhaps we ought to?
--
Kaleb
Re: [Gluster-users] 3.7.12 disaster
On 06/30/2016 11:18 AM, Vijay Bellur wrote:
> On Thu, Jun 30, 2016 at 8:31 AM, Kaushal M wrote:
>> On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier wrote:
>>>> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>>>
>>> Yeah, I tried 3.8 when it came out too and I had to use the fuse
>>> mount point to get the VMs to work. I just assumed proxmox wasn't
>>> compatible yet with 3.8 (since the menus were a bit wonky anyway)
>>> but I guess it was the same bug.
>>
>> I was able to reproduce the hang as well against 3.7.12.
>>
>> I tested by installing the pve-qemu-kvm package from the Proxmox
>> repositories in a Debian Jessie container, as the default Debian qemu
>> packages don't link with glusterfs.
>> I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.
>>
>> I tried to create an image on a simple 1-brick gluster volume using
>> qemu-img.
>> The qemu-img command succeeded against a 3.7.11 volume, but hung
>> against 3.7.12, finally timing out and failing after ping-timeout.
>>
>> We can at least be happy that this issue isn't due to any bugs in AFR.
>>
>> I was testing this with Raghavendra, and we are wondering if this is
>> possibly a result of changes to libglusterfs and libgfapi that have
>> been introduced in 3.7.12 and 3.8.
>> Any app linking with libgfapi also needs to link with libglusterfs.
>> While we have some sort of versioning for libgfapi, we don't have any
>> for libglusterfs.
>> This has caused problems before (I cannot find any links for this
>> right now though).
>
> Did any function signatures change between 3.7.11 and 3.7.12?

In gfapi? No. And (as I'm sure you're aware) they're all versioned, so things that linked with the old version-signature continue to do so.

I don't know about libglusterfs.

--
Kaleb
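The versioning difference Kaleb describes can be inspected directly; a sketch, assuming binutils is installed and the libraries live in the usual Debian multiarch path:

```shell
# gfapi exports versioned symbols (tags like GFAPI_3.4.0), so callers
# built against an old version-signature keep binding to it.
objdump -T /usr/lib/x86_64-linux-gnu/libgfapi.so.0 | awk 'NF > 5 {print $(NF-1)}' | sort -u

# libglusterfs symbols typically show only the "Base" tag, i.e. no
# symbol versioning -- the gap being discussed in this thread.
objdump -T /usr/lib/x86_64-linux-gnu/libglusterfs.so.0 | awk 'NF > 5 {print $(NF-1)}' | sort -u
```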
Re: [Gluster-users] 3.7.12 disaster
Kaushal, Raghavendra Talur (CCed) were looking into why libgfapi could be giving a problem. We will get in touch with you as soon as they have something. Please keep the test node until they reach you. Thanks again Lindsay.

On Thu, Jun 30, 2016 at 5:34 PM, Lindsay Mathieson <lindsay.mathie...@gmail.com> wrote:
> On 30/06/2016 2:42 PM, Pranith Kumar Karampuri wrote:
>> Glad that for both of you, things are back to normal. Could one of you
>> help us find what is the problem you are facing with libgfapi, if you
>> have any spare test machines. Otherwise we need to understand proxmox
>> etc which may take a bit more time.
>
> I got a test node running, with a replica 3 volume (3 bricks on the
> same node).
>
> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>
> I can trash this node as needed, happy to build from src and apply patches.
>
> --
> Lindsay Mathieson

--
Pranith
Re: [Gluster-users] 3.7.12 disaster
On Thu, Jun 30, 2016 at 8:31 AM, Kaushal M wrote:
> On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier wrote:
>>> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>>
>> Yeah, I tried 3.8 when it came out too and I had to use the fuse
>> mount point to get the VMs to work. I just assumed proxmox wasn't
>> compatible yet with 3.8 (since the menus were a bit wonky anyway)
>> but I guess it was the same bug.
>
> I was able to reproduce the hang as well against 3.7.12.
>
> I tested by installing the pve-qemu-kvm package from the Proxmox
> repositories in a Debian Jessie container, as the default Debian qemu
> packages don't link with glusterfs.
> I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.
>
> I tried to create an image on a simple 1-brick gluster volume using
> qemu-img.
> The qemu-img command succeeded against a 3.7.11 volume, but hung
> against 3.7.12, finally timing out and failing after ping-timeout.
>
> We can at least be happy that this issue isn't due to any bugs in AFR.
>
> I was testing this with Raghavendra, and we are wondering if this is
> possibly a result of changes to libglusterfs and libgfapi that have
> been introduced in 3.7.12 and 3.8.
> Any app linking with libgfapi also needs to link with libglusterfs.
> While we have some sort of versioning for libgfapi, we don't have any
> for libglusterfs.
> This has caused problems before (I cannot find any links for this
> right now though).

Did any function signatures change between 3.7.11 and 3.7.12?

-Vijay
Re: [Gluster-users] 3.7.12 disaster
On 30/06/2016 10:31 PM, Kaushal M wrote:
> Any app linking with libgfapi also needs to link with libglusterfs.
> While we have some sort of versioning for libgfapi, we don't have any
> for libglusterfs.
> This has caused problems before (I cannot find any links for this
> right now though).
>
> The pve-qemu-kvm package was last built or updated in January this
> year [1]. And I think it was built against glusterfs-3.5.2, which is
> the latest version of glusterfs in the proxmox sources [2].
> Maybe the pve-qemu-kvm package needs a rebuild.

Tricky problem.

--
Lindsay Mathieson
Re: [Gluster-users] 3.7.12 disaster
On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier wrote:
>> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>
> Yeah, I tried 3.8 when it came out too and I had to use the fuse
> mount point to get the VMs to work. I just assumed proxmox wasn't
> compatible yet with 3.8 (since the menus were a bit wonky anyway)
> but I guess it was the same bug.

I was able to reproduce the hang as well against 3.7.12.

I tested by installing the pve-qemu-kvm package from the Proxmox repositories in a Debian Jessie container, as the default Debian qemu packages don't link with glusterfs. I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.

I tried to create an image on a simple 1-brick gluster volume using qemu-img. The qemu-img command succeeded against a 3.7.11 volume, but hung against 3.7.12, finally timing out and failing after ping-timeout.

We can at least be happy that this issue isn't due to any bugs in AFR.

I was testing this with Raghavendra, and we are wondering if this is possibly a result of changes to libglusterfs and libgfapi that have been introduced in 3.7.12 and 3.8. Any app linking with libgfapi also needs to link with libglusterfs. While we have some sort of versioning for libgfapi, we don't have any for libglusterfs. This has caused problems before (I cannot find any links for this right now though).

The pve-qemu-kvm package was last built or updated in January this year [1]. And I think it was built against glusterfs-3.5.2, which is the latest version of glusterfs in the proxmox sources [2]. Maybe the pve-qemu-kvm package needs a rebuild.

We'll continue to try to figure out what the actual issue is though.

~kaushal

> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
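Kaushal's reproduction boils down to a short command sequence; a sketch with placeholder host and brick names (not his exact commands):

```shell
# One-brick test volume, as in the reproduction.
gluster volume create testvol server1:/bricks/b1 force
gluster volume start testvol

# Succeeds promptly against a 3.7.11 volume; against 3.7.12 it hangs,
# then fails once network.ping-timeout (42s by default) expires.
qemu-img create -f qcow2 gluster://server1/testvol/test.qcow2 1G
```

Running the same two-liner against volumes served by each version is enough to bisect whether the hang is client-side or server-side.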
Re: [Gluster-users] 3.7.12 disaster
>> Replicated the problem with 3.7.12 *and* 3.8.0 :(

Yeah, I tried 3.8 when it came out too and I had to use the fuse mount point to get the VMs to work. I just assumed proxmox wasn't compatible yet with 3.8 (since the menus were a bit wonky anyway) but I guess it was the same bug.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Re: [Gluster-users] 3.7.12 disaster
On 30/06/2016 2:42 PM, Pranith Kumar Karampuri wrote:
> Glad that for both of you, things are back to normal. Could one of you
> help us find what is the problem you are facing with libgfapi, if you
> have any spare test machines. Otherwise we need to understand proxmox
> etc which may take a bit more time.

I got a test node running, with a replica 3 volume (3 bricks on the same node).

Replicated the problem with 3.7.12 *and* 3.8.0 :(

I can trash this node as needed, happy to build from src and apply patches.

--
Lindsay Mathieson
Re: [Gluster-users] 3.7.12 disaster
On 06/30/2016 05:47 AM, Kaushal M wrote:
> On Thu, Jun 30, 2016 at 2:29 PM, Lindsay Mathieson wrote:
>> On 30 June 2016 at 18:48, Kaushal M wrote:
>>> I need some quick info from you guys: which packages are you using?
>>> Are you using any of the packages built by the community (i.e. on
>>> download.gluster.org/launchpad/CentOS-storage-sig etc.)?
>>> We are wondering if the issues you are facing are the same that have
>>> been fixed by https://review.gluster.org/14822 .
>>> The packages that have been built by us contain this patch. So if you
>>> are facing problems with these packages, we can be sure it's a new
>>> issue.
>>
>> I'm using the same as Kevin:
>>
>> deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main
>>
>> Could there have been a packaging error perhaps?
>
> These packages should include the patch.
>
> We need to check with Kaleb, who built the packages; he could confirm
> whether it could be a packaging error.
> He is away today/tomorrow, let's hope he checks his mail.

The Debian 3.7.12 packages _do_ have http://review.gluster.org/14822 (a.k.a. http://review.gluster.org/14779)

Jury duty cancelled. I'm working today.

--
Kaleb
Re: [Gluster-users] 3.7.12 disaster
On Thu, Jun 30, 2016 at 2:29 PM, Lindsay Mathieson wrote:
> On 30 June 2016 at 18:48, Kaushal M wrote:
>> I need some quick info from you guys: which packages are you using?
>> Are you using any of the packages built by the community (i.e. on
>> download.gluster.org/launchpad/CentOS-storage-sig etc.)?
>> We are wondering if the issues you are facing are the same that have
>> been fixed by https://review.gluster.org/14822 .
>> The packages that have been built by us contain this patch. So if you
>> are facing problems with these packages, we can be sure it's a new
>> issue.
>
> I'm using the same as Kevin:
>
> deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main
>
> Could there have been a packaging error perhaps?

These packages should include the patch.

We need to check with Kaleb, who built the packages; he could confirm whether it could be a packaging error. He is away today/tomorrow, let's hope he checks his mail.

> --
> Lindsay
Re: [Gluster-users] 3.7.12 disaster
On 30 June 2016 at 18:48, Kaushal M wrote:
> I need some quick info from you guys: which packages are you using?
> Are you using any of the packages built by the community (i.e. on
> download.gluster.org/launchpad/CentOS-storage-sig etc.)?
> We are wondering if the issues you are facing are the same that have
> been fixed by https://review.gluster.org/14822 .
> The packages that have been built by us contain this patch. So if you
> are facing problems with these packages, we can be sure it's a new
> issue.

I'm using the same as Kevin:

deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main

Could there have been a packaging error perhaps?

--
Lindsay
Re: [Gluster-users] 3.7.12 disaster
> I need some quick info from you guys: which packages are you using?
> Are you using any of the packages built by the community (i.e. on
> download.gluster.org/launchpad/CentOS-storage-sig etc.)?
> We are wondering if the issues you are facing are the same that have
> been fixed by https://review.gluster.org/14822 .
> The packages that have been built by us contain this patch. So if you
> are facing problems with these packages, we can be sure it's a new
> issue.

r...@s2.name [hostname]:~ # cat /etc/apt/sources.list.d/gluster.list
deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main

Should include the patch, right?

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
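Worth verifying what actually got installed from that repo line; a sketch (package names follow the glusterfs Debian packaging; the changelog path is an assumption and may differ):

```shell
# Which repo/version apt resolved for each package:
apt-cache policy glusterfs-server glusterfs-client glusterfs-common

# The packaged changelog records which upstream release (and therefore
# which patches) the installed build contains:
zcat /usr/share/doc/glusterfs-common/changelog.Debian.gz | head -20
```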
Re: [Gluster-users] 3.7.12 disaster
On Thu, Jun 30, 2016 at 12:29 PM, Kevin Lemonnier wrote:
>> Glad that for both of you, things are back to normal. Could one of you
>> help us find what is the problem you are facing with libgfapi, if you
>> have any spare test machines. Otherwise we need to understand proxmox
>> etc which may take a bit more time.
>
> Sure, I have my test cluster working now using NFS but I can create
> other VMs using the lib to test if needed. What would you need?
> Unfortunately creating a VM on gluster through the lib doesn't work and
> I don't know how to get the logs of that; the only error I get is this
> in the proxmox logs:
>
> Jun 29 13:26:25 s2.name pvedaemon[2803]: create failed - unable to create image: got lock timeout - aborting command
> Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485296] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-3: server 172.16.0.2:49153 has not responded in the last 42 seconds, disconnecting.
> Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485407] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-4: server 172.16.0.3:49153 has not responded in the last 42 seconds, disconnecting.
> Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485443] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-5: server 172.16.0.50:49153 has not responded in the last 42 seconds, disconnecting.

I need some quick info from you guys: which packages are you using? Are you using any of the packages built by the community (i.e. on download.gluster.org/launchpad/CentOS-storage-sig etc.)? We are wondering if the issues you are facing are the same that have been fixed by https://review.gluster.org/14822 . The packages that have been built by us contain this patch. So if you are facing problems with these packages, we can be sure it's a new issue.
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Re: [Gluster-users] 3.7.12 disaster
> Glad that for both of you, things are back to normal. Could one of you
> help us find what is the problem you are facing with libgfapi, if you
> have any spare test machines. Otherwise we need to understand proxmox
> etc which may take a bit more time.

Sure, I have my test cluster working now using NFS but I can create other VMs using the lib to test if needed. What would you need? Unfortunately creating a VM on gluster through the lib doesn't work and I don't know how to get the logs of that; the only error I get is this in the proxmox logs:

Jun 29 13:26:25 s2.name pvedaemon[2803]: create failed - unable to create image: got lock timeout - aborting command
Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485296] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-3: server 172.16.0.2:49153 has not responded in the last 42 seconds, disconnecting.
Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485407] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-4: server 172.16.0.3:49153 has not responded in the last 42 seconds, disconnecting.
Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485443] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-5: server 172.16.0.50:49153 has not responded in the last 42 seconds, disconnecting.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
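Those "has not responded in the last 42 seconds" lines are the client giving up after the default network.ping-timeout. While debugging, the timeout can be lowered so hung qemu-img runs fail quickly; a sketch using the volume name from this thread (adjust to yours):

```shell
# Fail after ~10s instead of 42s while reproducing the hang.
gluster volume set datastore4 network.ping-timeout 10

# Put it back to the default afterwards -- low values are not suitable
# for production, they make clients disconnect on brief network blips.
gluster volume reset datastore4 network.ping-timeout
```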
Re: [Gluster-users] 3.7.12 disaster
On 30 June 2016 at 14:42, Pranith Kumar Karampuri wrote:
> Could one of you help us find what is the problem you are facing with
> libgfapi, if you have any spare test machines. Otherwise we need to
> understand proxmox etc which may take a bit more time.

I'm looking at getting a basic PC repurposed for testing today - to be exact, a Celeron NUC with 8GB ram. We'll see if it's usable :)

--
Lindsay
Re: [Gluster-users] 3.7.12 disaster
Glad that for both of you, things are back to normal. Could one of you help us find what is the problem you are facing with libgfapi, if you have any spare test machines? Otherwise we need to understand proxmox etc. which may take a bit more time.

On Wed, Jun 29, 2016 at 11:49 PM, Kevin Lemonnier wrote:
>> To me it looks like a libgfapi problem. VM's were working via fuse.
>
> Yeah, as mentioned in other messages I can confirm libgfapi isn't
> working in 3.7.12. I don't know if there was some breaking change
> between 3.7.11 and 3.7.12, but it looks like it's not working anymore,
> at least not in the latest proxmox.
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

--
Pranith
Re: [Gluster-users] 3.7.12 disaster
> To me it looks like a libgfapi problem. VM's were working via fuse.

Yeah, as mentioned in other messages I can confirm libgfapi isn't working in 3.7.12. I don't know if there was some breaking change between 3.7.11 and 3.7.12, but it looks like it's not working anymore, at least not in the latest proxmox.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Re: [Gluster-users] 3.7.12 disaster
On 29/06/2016 10:48 PM, Kevin Lemonnier wrote:
> I do have a problem with libgfapi though, I can't create VMs with qcow
> disks (I get a timeout) and I can create VMs with raw disks, but when I
> try to format them with mkfs.ext4 they shut down without any errors.

Downgraded back to 3.7.11 and got everything working again.

To me it looks like a libgfapi problem. VM's were working via fuse.

--
Lindsay Mathieson
Re: [Gluster-users] 3.7.12 disaster
> Which NFS server are you using? The std one built into proxmox/debian?
> How do you handle redundancy?

I mean the one in gluster; I added the gluster volume as NFS in proxmox instead of as gluster.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Re: [Gluster-users] 3.7.12 disaster
On 29/06/2016 11:29 PM, Kevin Lemonnier wrote:
> I mean the one in gluster; I added the gluster volume as NFS in proxmox
> instead of as gluster.

I didn't realise that worked on debian - is that the ganesha server?

--
Lindsay Mathieson
Re: [Gluster-users] 3.7.12 disaster
On 29/06/2016 10:48 PM, Kevin Lemonnier wrote:
> Just to add some info on that, I did a fresh install of 3.7.12 here
> (without setting that option) and I don't have a problem starting the
> VMs. I do have a problem with libgfapi though, I can't create VMs with
> qcow disks (I get a timeout) and I can create VMs with raw disks, but
> when I try to format them with mkfs.ext4 they shut down without any
> errors. Maybe it's related? Are you using qcow?

Yes, I am.

> I added the volume as NFS and I'm using that without any problem for
> now with both qcow and raw, maybe you could try that, see if at least
> your VMs can boot that way.

Which NFS server are you using? The std one built into proxmox/debian? How do you handle redundancy?

That did suggest to me trying the fuse client, which proxmox automatically sets up. I changed my gfapi storage to shared directory storage pointing to the fuse mount. That is working better - I have several VMs running now and heal info isn't locking up or reporting any issues.

However several other VMs won't start; qemu errors out with: "Could not read qcow2 header: Operation not permitted", which freaked me out till I manually checked the image with qemu-img, which reported it as fine.

Perhaps I need to reboot the cluster again to reset any locks or randomness.

--
Lindsay Mathieson
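The fuse workaround amounts to pointing a plain directory storage at a glusterfs fuse mount instead of going through gfapi; a sketch with this thread's names (the pvesm invocation is illustrative, not taken from Lindsay's actual setup):

```shell
# Mount the volume over fuse (proxmox normally does this itself under
# /mnt/pve/<storeid> for gluster-type storages).
mkdir -p /mnt/glusterfuse
mount -t glusterfs vnb.proxmox.softlog:/datastore4 /mnt/glusterfuse

# Register it as an ordinary directory storage, bypassing libgfapi.
pvesm add dir datastore4-fuse --path /mnt/glusterfuse --content images
```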
Re: [Gluster-users] 3.7.12 disaster
Just to add some info on that, I did a fresh install of 3.7.12 here (without setting that option) and I don't have a problem starting the VMs. I do have a problem with libgfapi though, I can't create VMs with qcow disks (I get a timeout) and I can create VMs with raw disks, but when I try to format them with mkfs.ext4 they shut down without any errors. Maybe it's related? Are you using qcow?

I added the volume as NFS and I'm using that without any problem for now with both qcow and raw, maybe you could try that, see if at least your VMs can boot that way.

On Wed, Jun 29, 2016 at 06:25:44PM +1000, Lindsay Mathieson wrote:
> Was able to shutdown my gluster and clean reboot.
>
> Set:
>
> cluster.shd-max-threads: 4
> cluster.locking-scheme: granular
>
> And started one VM. It got halfway booted and froze. A gluster heal
> info returned "Not able to fetch volfile from glusterd".
>
> I killed the VM, stopped the datastore and all gluster processes, then
> started it back up. Heal info was successful and showed 200+ shards
> being healed.
>
> However a heal info heal-count shows 0 heals.
>
> I stopped the datastore again and reset the settings:
> cluster.shd-max-threads
> cluster.locking-scheme
>
> Waiting for heal to complete before I try again.
>
> Contemplating undoing the upgrade. Can I set the opversion back to 30710?
>
> --
> Lindsay

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Re: [Gluster-users] 3.7.12 disaster
When I try to start a VM the brick log is spammed with the following:

[2016-06-29 12:29:17.844704] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.832 failed [File exists]
[2016-06-29 12:29:18.030093] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.835 failed [File exists]
[2016-06-29 12:29:19.276670] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.478 failed [File exists]
[2016-06-29 12:29:19.915686] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.813 failed [File exists]
[2016-06-29 12:29:20.270403] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.476 failed [File exists]
[2016-06-29 12:29:20.750933] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.505 failed [File exists]
[2016-06-29 12:29:21.175397] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.159 failed [File exists]
[2016-06-29 12:29:21.366887] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.220 failed [File exists]
[2016-06-29 12:29:21.827546] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.640 failed [File exists]
[2016-06-29 12:29:22.329726] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.509 failed [File exists]
[2016-06-29 12:29:22.790829] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.119 failed [File exists]
[2016-06-29 12:29:24.180752] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.508 failed [File exists]

--
Lindsay Mathieson
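When a brick log is this noisy it helps to know how many distinct shards are involved; a small text-processing sketch (the log path is a guess derived from the brick path above):

```shell
# Summarise which shards are hitting mknod EEXIST, and how often each.
grep 'posix_mknod' /var/log/glusterfs/bricks/tank-vmdata-datastore4.log \
  | grep -o '\.shard/[^ ]*' \
  | sort | uniq -c | sort -rn | head
```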
Re: [Gluster-users] 3.7.12 disaster
On Wed, Jun 29, 2016 at 2:41 PM, Lindsay Mathieson wrote:
> No I didn't, but I set those options and rebooted to upgrade the client
> at the same time.

This is the part that confuses me. You can *not* set a 3.7.12 option when 3.7.11 clients are still in play. Something is amiss. Can you give me 'ls -l please?

> Will get the logs
>
> Sent from my Windows 10 phone
>
> From: Anuradha Talur
> Sent: Wednesday, 29 June 2016 7:05 PM
> To: Lindsay Mathieson
> Cc: gluster-users
> Subject: Re: [Gluster-users] 3.7.12 disaster
>
> Lindsay,
>
> Did you see any problems in the setup before you set those options?
>
> Also, could you please share glusterd and glfsheal logs before you
> revert to 3.7.11, so that they can be analyzed?
>
> > From: "Lindsay Mathieson"
> > To: "gluster-users"
> > Sent: Wednesday, June 29, 2016 2:07:30 PM
> > Subject: Re: [Gluster-users] 3.7.12 disaster
> >
> > On 29 June 2016 at 18:30, Lindsay Mathieson wrote:
> > > Same problem again. VM froze and heal info timed out with "Not able
> > > to fetch volfile from glusterd". I'm going to have to revert to 3.7.11
> >
> > Heal process seems to be stuck at the following:
> >
> > gluster v heal datastore4 info
> > Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
> > Status: Connected
> > Number of entries: 0
> >
> > Brick vng.proxmox.softlog:/tank/vmdata/datastore4
> > - Possibly undergoing heal
> > Status: Connected
> > Number of entries: 1
> >
> > Brick vna.proxmox.softlog:/tank/vmdata/datastore4
> > - Possibly undergoing heal
> > Status: Connected
> > Number of entries: 1
> >
> > I'm on my way home now, will be offline for a couple of hours. But
> > for now my cluster is offline.
> >
> > --
> > Lindsay
>
> --
> Thanks,
> Anuradha.
--
Pranith
Re: [Gluster-users] 3.7.12 disaster
Hi Lindsay,

Can you share the glusterd log and the glfsheal log for the volume from the system on which you ran the heal command? This will help understand why the volfile fetch failed. The files will be `/var/log/glusterfs/etc-glusterfs-glusterd.vol.log` and `/var/log/glusterfs/glfsheal-.log`

On Wed, Jun 29, 2016 at 2:26 PM, Lindsay Mathieson wrote:
> Yes, but I hadn't restarted the servers either, so the clients
> (qemu/gfapi) were still 3.7.11 until then.
>
> Still have the same problems after reverting the settings.
>
> Waiting for heal to finish before I revert to 3.7.11
>
> Any advice on the best way to use apt for that?
>
> Sent from my Windows 10 phone
>
> From: Kevin Lemonnier
> Sent: Wednesday, 29 June 2016 6:49 PM
> To: gluster-users@gluster.org
> Subject: Re: [Gluster-users] 3.7.12 disaster
>
>>> cluster.shd-max-threads:4
>>> cluster.locking-scheme:granular
>>
>> So you had no problems before setting that? I'm currently
>> re-installing my test servers, as you can imagine really really hoping
>> 3.7.12 fixes the corruption problem; I hope there isn't a new horrible
>> bug...
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
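On the "best way to use apt" question quoted above, a common downgrade pattern is to switch the repo line back and pin the version explicitly; a sketch (the exact Debian version suffix `-1` is an assumption, check `apt-cache policy` for the real one first):

```shell
# Point apt at the 3.7.11 repo again...
echo 'deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.11/Debian/jessie/apt jessie main' \
    > /etc/apt/sources.list.d/gluster.list
apt-get update

# ...then force the downgrade to the matching versions.
apt-get install glusterfs-server=3.7.11-1 glusterfs-client=3.7.11-1 glusterfs-common=3.7.11-1
```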
Re: [Gluster-users] 3.7.12 disaster
On Wed, Jun 29, 2016 at 2:07 PM, Lindsay Mathieson
<lindsay.mathie...@gmail.com> wrote:
> On 29 June 2016 at 18:30, Lindsay Mathieson wrote:
> > Same problem again. VM froze and heal info timed out with "Not able to
> > fetch volfile from glusterd". I'm going to have to revert to 3.7.11
>
> Heal process seems to be stuck at the following:
>
> gluster v heal datastore4 info
> Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
> Status: Connected
> Number of entries: 0
>
> Brick vng.proxmox.softlog:/tank/vmdata/datastore4
> - Possibly undergoing heal
> Status: Connected
> Number of entries: 1
>
> Brick vna.proxmox.softlog:/tank/vmdata/datastore4
> - Possibly undergoing heal
> Status: Connected
> Number of entries: 1
>
> I'm on my way home now, will be offline for a couple of hours. But for now
> my cluster is offline.

Hi Lindsay,

Something doesn't sound right. If you had set locking-scheme to granular,
the 'Possibly undergoing heal' message should never appear. I think the
problem has something to do with this option in your case. The VMs freezing
suggests that some processes are still doing old-style locking, which can
block the writes. Can you try resetting the locking-scheme option?

> --
> Lindsay

--
Pranith
Re: [Gluster-users] 3.7.12 disaster
No I didn't, but I set those options and rebooted to upgrade the client at
the same time.

Will get the logs.

Sent from my Windows 10 phone

From: Anuradha Talur
Re: [Gluster-users] 3.7.12 disaster
Lindsay,

Did you see any problems in the setup before you set those options?

Also, could you please share the glusterd and glfsheal logs before you
revert to 3.7.11, so that they can be analyzed?

- Original Message -
> From: "Lindsay Mathieson"
> To: "gluster-users"
> Sent: Wednesday, June 29, 2016 2:07:30 PM
> Subject: Re: [Gluster-users] 3.7.12 disaster
>
> On 29 June 2016 at 18:30, Lindsay Mathieson wrote:
> > Same problem again. VM froze and heal info timed out with "Not able to
> > fetch volfile from glusterd". I'm going to have to revert to 3.7.11
>
> Heal process seems to be stuck at the following:
>
> gluster v heal datastore4 info
> Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
> Status: Connected
> Number of entries: 0
>
> Brick vng.proxmox.softlog:/tank/vmdata/datastore4
> - Possibly undergoing heal
> Status: Connected
> Number of entries: 1
>
> Brick vna.proxmox.softlog:/tank/vmdata/datastore4
> - Possibly undergoing heal
> Status: Connected
> Number of entries: 1
>
> I'm on my way home now, will be offline for a couple of hours. But for now
> my cluster is offline.
>
> --
> Lindsay

--
Thanks,
Anuradha.
Re: [Gluster-users] 3.7.12 disaster
Yes, but I hadn't restarted the servers either, so the clients (qemu/gfapi)
were still 3.7.11 until then.

Still have the same problems after reverting the settings.

Waiting for the heal to finish before I revert to 3.7.11.

Any advice on the best way to use apt for that?

Sent from my Windows 10 phone

From: Kevin Lemonnier
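[Editor's note: one common way to do this with apt is a pin that permits the downgrade. This is a sketch, not tested against the Proxmox/gluster.org repos; the exact package names and the version string are assumptions, so check what `apt-cache policy glusterfs-server` reports first. Put something like this in `/etc/apt/preferences.d/glusterfs`:]

```
Package: glusterfs-*
Pin: version 3.7.11*
Pin-Priority: 1001
```

Then `apt-get update && apt-get install glusterfs-server glusterfs-client glusterfs-common`. A Pin-Priority above 1000 is what allows apt to downgrade an already-installed package; alternatively, `apt-get install <package>=<version>` per package achieves the same without a pin.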
Re: [Gluster-users] 3.7.12 disaster
> cluster.shd-max-threads:4
> cluster.locking-scheme:granular

So you had no problems before setting that? I'm currently re-installing my
test servers, as you can imagine really, really hoping 3.7.12 fixes the
corruption problem. I hope there isn't a new horrible bug...

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
Re: [Gluster-users] 3.7.12 disaster
On 29 June 2016 at 18:30, Lindsay Mathieson wrote:
> Same problem again. VM froze and heal info timed out with "Not able to
> fetch volfile from glusterd". I'm going to have to revert to 3.7.11

Heal process seems to be stuck at the following:

gluster v heal datastore4 info
Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
Status: Connected
Number of entries: 0

Brick vng.proxmox.softlog:/tank/vmdata/datastore4
- Possibly undergoing heal
Status: Connected
Number of entries: 1

Brick vna.proxmox.softlog:/tank/vmdata/datastore4
- Possibly undergoing heal
Status: Connected
Number of entries: 1

I'm on my way home now, will be offline for a couple of hours. But for now
my cluster is offline.

--
Lindsay
Re: [Gluster-users] 3.7.12 disaster
On 29 June 2016 at 18:25, Lindsay Mathieson wrote:
> Waiting for the heal to complete before I try again.

Same problem again. VM froze and heal info timed out with "Not able to
fetch volfile from glusterd". I'm going to have to revert to 3.7.11.

--
Lindsay
[Gluster-users] 3.7.12 disaster
Was able to shut down my gluster and do a clean reboot. Set:

cluster.shd-max-threads:4
cluster.locking-scheme:granular

And started one VM. It got halfway booted and froze. A gluster heal info
returned "Not able to fetch volfile from glusterd".

I killed the VM, stopped the datastore and all gluster processes, then
started it back up. Heal info was successful and showed 200+ shards being
healed; however, a heal info heal-count shows 0 heals.

I stopped the datastore again and reset the settings:

cluster.shd-max-threads
cluster.locking-scheme

Waiting for the heal to complete before I try again.

Contemplating undoing the upgrade. Can I set the opversion back to 30710?

--
Lindsay
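[Editor's note: for reference, the set/reset cycle described above corresponds to the gluster CLI invocations sketched below. The volume name datastore4 is taken from the heal output elsewhere in this thread; since the commands need a live cluster, the script only prints them when no gluster CLI is present. Also note that, as far as I know, cluster.op-version is only meant to be raised, never lowered, so setting opversion back to 30710 is unlikely to be supported.]

```shell
# Sketch of the option set/reset cycle from this thread.
# Assumes the volume is named "datastore4" (from the heal info output).
VOL=datastore4

CMDS="gluster volume set $VOL cluster.shd-max-threads 4
gluster volume set $VOL cluster.locking-scheme granular
gluster volume reset $VOL cluster.shd-max-threads
gluster volume reset $VOL cluster.locking-scheme"

printf '%s\n' "$CMDS" | while IFS= read -r cmd; do
    if command -v gluster >/dev/null 2>&1; then
        $cmd                        # run against the live cluster
    else
        echo "would run: $cmd"      # no gluster CLI here: dry-run only
    fi
done
```

The reset lines return the options to their defaults, which is what the
"reset the settings" step above did.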