Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Lindsay Mathieson

On 1/07/2016 9:15 AM, Kaleb KEITHLEY wrote:

There isn't a libglusterfs.a that it could static link to.


Then it shouldn't matter what version it was built against, should it?
Unless the function signatures have changed since 3.5.


--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaleb KEITHLEY
On 06/30/2016 06:53 PM, Lindsay Mathieson wrote:
> On 30/06/2016 10:31 PM, Kaushal M wrote:
>> The pve-qemu-kvm package was last built or updated in January this
>> year[1]. And I think it was built against glusterfs-3.5.2, which is
>> the latest version of glusterfs in the proxmox sources [2].
>> Maybe the pve-qemu-kvm package needs a rebuild.
> 
> Does qemu static link libglusterfs?
> 

There isn't a libglusterfs.a that it could static link to.

So no.

--

Kaleb



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Lindsay Mathieson

On 30/06/2016 10:31 PM, Kaushal M wrote:

The pve-qemu-kvm package was last built or updated in January this
year[1]. And I think it was built against glusterfs-3.5.2, which is
the latest version of glusterfs in the proxmox sources [2].
Maybe the pve-qemu-kvm package needs a rebuild.


Does qemu static link libglusterfs?

--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaleb KEITHLEY
On 06/30/2016 11:23 AM, Kaleb KEITHLEY wrote:
> On 06/30/2016 11:18 AM, Vijay Bellur wrote:
>> On Thu, Jun 30, 2016 at 8:31 AM, Kaushal M  wrote:
>>> On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier  
>>> wrote:
>
> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>

 Yeah, I tried 3.8 when it came out too and I had to use the fuse mount point
 to get the VMs to work. I just assumed proxmox wasn't compatible yet with 3.8
 (since the menus were a bit wonky anyway) but I guess it was the same bug.

>>>
>>> I was able to reproduce the hang as well against 3.7.12.
>>>
>>> I tested by installing the pve-qemu-kvm package from the Proxmox
>>> repositories in a Debian Jessie container, as the default Debian qemu
>>> packages don't link with glusterfs.
>>> I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.
>>>
>>> I tried to create an image on a simple 1 brick gluster volume using 
>>> qemu-img.
>>> The qemu-img command succeeded against a 3.7.11 volume, but hung
>>> against 3.7.12 to finally timeout and fail after ping-timeout.
>>>
>>> We can at least be happy that this issue isn't due to any bugs in AFR.
>>>
>>> I was testing this with Raghavendra, and we are wondering if this is
>>> probably a result of changes to libglusterfs and libgfapi that have
>>> been introduced in 3.7.12 and 3.8.
>>> Any app linking with libgfapi also needs to link with libglusterfs.
>>> While we have some sort of versioning for libgfapi, we don't have any
>>> for libglusterfs.
>>> This has caused problems before (I cannot find any links for this
>>> right now though).
>>>
>>
>> Did any function signatures change between 3.7.11 and 3.7.12?
> 
> In gfapi? No. And (as I'm sure you're aware) they're all versioned, so
> things that linked with the old version-signature continue to do so.
> 
> I don't know about libglusterfs.
> 

And I'm not sure I want to suggest that we version libglusterfs for 4.0;
but perhaps we ought to?
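For anyone who wants to see the difference, the version tags can be inspected on an installed system with objdump; a rough sketch (library paths are illustrative and vary by distro):

    # gfapi exports carry version tags such as GFAPI_3.4.0
    objdump -T /usr/lib/x86_64-linux-gnu/libgfapi.so.0 | grep glfs_init
    # libglusterfs exports carry no such tags (pick any exported symbol, e.g. inode_new)
    objdump -T /usr/lib/x86_64-linux-gnu/libglusterfs.so.0 | grep -w inode_new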

--

Kaleb




Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaleb KEITHLEY
On 06/30/2016 11:18 AM, Vijay Bellur wrote:
> On Thu, Jun 30, 2016 at 8:31 AM, Kaushal M  wrote:
>> On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier  
>> wrote:

 Replicated the problem with 3.7.12 *and* 3.8.0 :(

>>>
>>> Yeah, I tried 3.8 when it came out too and I had to use the fuse mount point
>>> to get the VMs to work. I just assumed proxmox wasn't compatible yet with 3.8
>>> (since the menus were a bit wonky anyway) but I guess it was the same bug.
>>>
>>
>> I was able to reproduce the hang as well against 3.7.12.
>>
>> I tested by installing the pve-qemu-kvm package from the Proxmox
>> repositories in a Debian Jessie container, as the default Debian qemu
>> packages don't link with glusterfs.
>> I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.
>>
>> I tried to create an image on a simple 1 brick gluster volume using qemu-img.
>> The qemu-img command succeeded against a 3.7.11 volume, but hung
>> against 3.7.12 to finally timeout and fail after ping-timeout.
>>
>> We can at least be happy that this issue isn't due to any bugs in AFR.
>>
>> I was testing this with Raghavendra, and we are wondering if this is
>> probably a result of changes to libglusterfs and libgfapi that have
>> been introduced in 3.7.12 and 3.8.
>> Any app linking with libgfapi also needs to link with libglusterfs.
>> While we have some sort of versioning for libgfapi, we don't have any
>> for libglusterfs.
>> This has caused problems before (I cannot find any links for this
>> right now though).
>>
> 
> Did any function signatures change between 3.7.11 and 3.7.12?

In gfapi? No. And (as I'm sure you're aware) they're all versioned, so
things that linked with the old version-signature continue to do so.

I don't know about libglusterfs.

--

Kaleb




Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Pranith Kumar Karampuri
Kaushal and Raghavendra Talur (CCed) were looking into why libgfapi could be
causing a problem. We will get in touch with you as soon as they have
something. Please keep the test node until they reach you. Thanks again,
Lindsay.

On Thu, Jun 30, 2016 at 5:34 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 30/06/2016 2:42 PM, Pranith Kumar Karampuri wrote:
>
>> Glad that for both of you, things are back to normal. Could one of you
>> help us find what is the problem you are facing with libgfapi, if you have
>> any spare test machines. Otherwise we need to understand proxmox etc which
>> may take a bit more time.
>>
>
> I got a test node running, with a replica 3 volumes (3 bricks on same
> node).
>
> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>
>
> I can trash this node as needed, happy to build from src and apply patches.
>
> --
> Lindsay Mathieson
>
>


-- 
Pranith

Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Vijay Bellur
On Thu, Jun 30, 2016 at 8:31 AM, Kaushal M  wrote:
> On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier  wrote:
>>>
>>> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>>>
>>
>> Yeah, I tried 3.8 when it came out too and I had to use the fuse mount point
>> to get the VMs to work. I just assumed proxmox wasn't compatible yet with 3.8
>> (since the menus were a bit wonky anyway) but I guess it was the same bug.
>>
>
> I was able to reproduce the hang as well against 3.7.12.
>
> I tested by installing the pve-qemu-kvm package from the Proxmox
> repositories in a Debian Jessie container, as the default Debian qemu
> packages don't link with glusterfs.
> I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.
>
> I tried to create an image on a simple 1 brick gluster volume using qemu-img.
> The qemu-img command succeeded against a 3.7.11 volume, but hung
> against 3.7.12 to finally timeout and fail after ping-timeout.
>
> We can at least be happy that this issue isn't due to any bugs in AFR.
>
> I was testing this with Raghavendra, and we are wondering if this is
> probably a result of changes to libglusterfs and libgfapi that have
> been introduced in 3.7.12 and 3.8.
> Any app linking with libgfapi also needs to link with libglusterfs.
> While we have some sort of versioning for libgfapi, we don't have any
> for libglusterfs.
> This has caused problems before (I cannot find any links for this
> right now though).
>

Did any function signatures change between 3.7.11 and 3.7.12?

-Vijay


Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Lindsay Mathieson

On 30/06/2016 10:31 PM, Kaushal M wrote:

Any app linking with libgfapi also needs to link with libglusterfs.
While we have some sort of versioning for libgfapi, we don't have any
for libglusterfs.
This has caused problems before (I cannot find any links for this
right now though).

The pve-qemu-kvm package was last built or updated in January this
year[1]. And I think it was built against glusterfs-3.5.2, which is
the latest version of glusterfs in the proxmox sources [2].
Maybe the pve-qemu-kvm package needs a rebuild.


Tricky problem.

--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaushal M
On Thu, Jun 30, 2016 at 5:47 PM, Kevin Lemonnier  wrote:
>>
>> Replicated the problem with 3.7.12 *and* 3.8.0 :(
>>
>
> Yeah, I tried 3.8 when it came out too and I had to use the fuse mount point
> to get the VMs to work. I just assumed proxmox wasn't compatible yet with 3.8
> (since the menus were a bit wonky anyway) but I guess it was the same bug.
>

I was able to reproduce the hang as well against 3.7.12.

I tested by installing the pve-qemu-kvm package from the Proxmox
repositories in a Debian Jessie container, as the default Debian qemu
packages don't link with glusterfs.
I used the 3.7.11 and 3.7.12 gluster repos from download.gluster.org.

I tried to create an image on a simple single-brick gluster volume using qemu-img.
The qemu-img command succeeded against a 3.7.11 volume, but hung
against 3.7.12, finally timing out and failing after ping-timeout.
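For reference, the reproduction boils down to roughly the following; the volume name, brick path and image size are just placeholders:

    # single-brick test volume (force allows a brick under the root filesystem)
    gluster volume create testvol $(hostname):/bricks/testvol force
    gluster volume start testvol
    # succeeds against a 3.7.11 volume, hangs and eventually fails against 3.7.12
    qemu-img create -f qcow2 gluster://localhost/testvol/test.qcow2 1G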

We can at least be happy that this issue isn't due to any bugs in AFR.

I was testing this with Raghavendra, and we are wondering whether this is
a result of changes to libglusterfs and libgfapi that were
introduced in 3.7.12 and 3.8.
Any app linking with libgfapi also needs to link with libglusterfs.
While we have some sort of versioning for libgfapi, we don't have any
for libglusterfs.
This has caused problems before (I cannot find any links for this
right now though).

The pve-qemu-kvm package was last built or updated in January this
year[1]. And I think it was built against glusterfs-3.5.2, which is
the latest version of glusterfs in the proxmox sources [2].
Maybe the pve-qemu-kvm package needs a rebuild.

We'll continue to try to figure out what the actual issue is though.

~kaushal

> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>


Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kevin Lemonnier
> 
> Replicated the problem with 3.7.12 *and* 3.8.0 :(
> 

Yeah, I tried 3.8 when it came out too and I had to use the fuse mount point
to get the VMs to work. I just assumed proxmox wasn't compatible yet with 3.8
(since the menus were a bit wonky anyway) but I guess it was the same bug.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Lindsay Mathieson

On 30/06/2016 2:42 PM, Pranith Kumar Karampuri wrote:
Glad that for both of you, things are back to normal. Could one of you 
help us find what is the problem you are facing with libgfapi, if you 
have any spare test machines. Otherwise we need to understand proxmox 
etc which may take a bit more time.


I got a test node running, with a replica 3 volume (3 bricks on the same node).
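For anyone wanting the same layout, a single-node replica 3 test volume can be created along these lines (hostname and brick paths are just examples; force is needed because all three bricks sit on one host):

    gluster volume create testvol replica 3 \
        testnode:/tank/bricks/b1 testnode:/tank/bricks/b2 testnode:/tank/bricks/b3 force
    gluster volume start testvol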

Replicated the problem with 3.7.12 *and* 3.8.0 :(


I can trash this node as needed, happy to build from src and apply patches.

--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaleb KEITHLEY
On 06/30/2016 05:47 AM, Kaushal M wrote:
> On Thu, Jun 30, 2016 at 2:29 PM, Lindsay Mathieson
>  wrote:
>> On 30 June 2016 at 18:48, Kaushal M  wrote:
>>> I need a quick info from you guys, which packages are you using? Are
>>> you using any of the packages built by the community (ie. on
>>> download.gluster.org/launchpad/CentOS-storage-sig etc.).
>>> We are wondering if the issues you are facing are the same that have
>>> been fixed by https://review.gluster.org/14822 .
>>> The packages that have been built by us, contain this patch. So if you
>>> are facing problems with these packages, we can be sure its a new
>>> issue.
>>
>>
>> I'm using the same as Kevin:
>>
>> deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main
>>
>> Could there have been a packaging error perhaps?
> 
> These packages should include the patch.
> 
> We need to check with Kaleb, who built the packages, to confirm whether
> it could be a packaging error.
> He is away today/tomorrow, let's hope he checks his mail.

The Debian 3.7.12 packages _do_ have http://review.gluster.org/14822
(a.k.a. http://review.gluster.org/14779)

Jury duty cancelled. I'm working today.

--

Kaleb




Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaushal M
On Thu, Jun 30, 2016 at 2:29 PM, Lindsay Mathieson
 wrote:
> On 30 June 2016 at 18:48, Kaushal M  wrote:
>> I need a quick info from you guys, which packages are you using? Are
>> you using any of the packages built by the community (ie. on
>> download.gluster.org/launchpad/CentOS-storage-sig etc.).
>> We are wondering if the issues you are facing are the same that have
>> been fixed by https://review.gluster.org/14822 .
>> The packages that have been built by us, contain this patch. So if you
>> are facing problems with these packages, we can be sure its a new
>> issue.
>
>
> I'm using the same as Kevin:
>
> deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main
>
> Could there have been a packaging error perhaps?

These packages should include the patch.

We need to check with Kaleb, who built the packages, to confirm whether
it could be a packaging error.
He is away today/tomorrow; let's hope he checks his mail.

>
> --
> Lindsay


Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Lindsay Mathieson
On 30 June 2016 at 18:48, Kaushal M  wrote:
> I need a quick info from you guys, which packages are you using? Are
> you using any of the packages built by the community (ie. on
> download.gluster.org/launchpad/CentOS-storage-sig etc.).
> We are wondering if the issues you are facing are the same that have
> been fixed by https://review.gluster.org/14822 .
> The packages that have been built by us, contain this patch. So if you
> are facing problems with these packages, we can be sure its a new
> issue.


I'm using the same as Kevin:

deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main

Could there have been a packaging error perhaps?

-- 
Lindsay


Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kevin Lemonnier
> 
> I need a quick info from you guys, which packages are you using? Are
> you using any of the packages built by the community (ie. on
> download.gluster.org/launchpad/CentOS-storage-sig etc.).
> We are wondering if the issues you are facing are the same that have
> been fixed by https://review.gluster.org/14822 .
> The packages that have been built by us, contain this patch. So if you
> are facing problems with these packages, we can be sure its a new
> issue.
>


r...@s2.name [hostname]:~ # cat /etc/apt/sources.list.d/gluster.list
deb http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.12/Debian/jessie/apt jessie main

Should include the patch, right?
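A quick way to double-check what actually got installed from that repo (standard Debian commands; output obviously differs per host):

    apt-cache policy glusterfs-server glusterfs-client glusterfs-common
    dpkg -l 'glusterfs*' | grep '^ii'
    # the package changelog (if present) shows the upstream release it was built from
    zcat /usr/share/doc/glusterfs-server/changelog.Debian.gz | head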

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-30 Thread Kaushal M
On Thu, Jun 30, 2016 at 12:29 PM, Kevin Lemonnier  wrote:
>>Glad that for both of you, things are back to normal. Could one of you
>>help us find what is the problem you are facing with libgfapi, if you have
>>any spare test machines. Otherwise we need to understand proxmox etc which
>>may take a bit more time.
>
> Sure, I have my test cluster working now using NFS but I can create other VMs
> using the lib to test if needed. What would you need? Unfortunately, creating
> a VM on gluster through the lib doesn't work and I don't know how to get the 
> logs of that,
> the only error I get is this in proxmox logs :
>
> Jun 29 13:26:25 s2.name pvedaemon[2803]: create failed - unable to create 
> image: got lock timeout - aborting command
> Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485296] C 
> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-3: server 
> 172.16.0.2:49153 has not responded in the last 42 seconds, disconnecting.
> Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485407] C 
> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-4: server 
> 172.16.0.3:49153 has not responded in the last 42 seconds, disconnecting.
> Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485443] C 
> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-5: server 
> 172.16.0.50:49153 has not responded in the last 42 seconds, disconnecting.
>

I need some quick info from you: which packages are you using? Are
you using any of the packages built by the community (i.e. from
download.gluster.org, Launchpad, the CentOS Storage SIG, etc.)?
We are wondering if the issues you are facing are the same ones that have
been fixed by https://review.gluster.org/14822 .
The packages that we have built contain this patch. So if you
are facing problems with these packages, we can be sure it's a new
issue.

>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>


Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Kevin Lemonnier
>Glad that for both of you, things are back to normal. Could one of you
>help us find what is the problem you are facing with libgfapi, if you have
>any spare test machines. Otherwise we need to understand proxmox etc which
>may take a bit more time.

Sure, I have my test cluster working now using NFS, but I can create other VMs
using the lib to test if needed. What would you need? Unfortunately, creating
a VM on gluster through the lib doesn't work and I don't know how to get the
logs for that; the only error I get is this in the proxmox logs:

Jun 29 13:26:25 s2.name pvedaemon[2803]: create failed - unable to create image: got lock timeout - aborting command
Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485296] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-3: server 172.16.0.2:49153 has not responded in the last 42 seconds, disconnecting.
Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485407] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-4: server 172.16.0.3:49153 has not responded in the last 42 seconds, disconnecting.
Jun 29 13:26:52 s2.name qemu-img[2811]: [2016-06-29 11:26:52.485443] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gluster-client-5: server 172.16.0.50:49153 has not responded in the last 42 seconds, disconnecting.
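Those messages are the volume's network.ping-timeout (42 seconds here) expiring. If it helps the debugging, the value can be checked and client-side logging turned up with something like the following; the volume name is just an example:

    gluster volume get myvolume network.ping-timeout
    # more verbose gfapi/client logs while reproducing the hang
    gluster volume set myvolume diagnostics.client-log-level DEBUG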


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson
On 30 June 2016 at 14:42, Pranith Kumar Karampuri  wrote:
> Could one of you help us find what is the problem you are facing with
> libgfapi, if you have any spare test machines. Otherwise we need to
> understand proxmox etc which may take a bit more time.


I'm looking at getting a basic PC repurposed for testing today - to be
exact a Celeron NU with 8GB RAM. We'll see if it's usable :)

-- 
Lindsay


Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Pranith Kumar Karampuri
Glad that for both of you things are back to normal. Could one of you help
us find the problem you are facing with libgfapi, if you have any
spare test machines? Otherwise we need to understand proxmox etc., which may
take a bit more time.

On Wed, Jun 29, 2016 at 11:49 PM, Kevin Lemonnier 
wrote:

> >
> > To me it looks like a libgfapi problem. VM's were working via fuse.
> >
>
> Yeah, as mentioned in other messages I can confirm libgfapi isn't working
> in 3.7.12.
> I don't know if there was some breaking change between 3.7.11 and 3.7.12
> but looks like
> it's not working anymore, at least not in the latest proxmox.
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
>



-- 
Pranith

Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Kevin Lemonnier
> 
> To me it looks like a libgfapi problem. VM's were working via fuse.
> 

Yeah, as mentioned in other messages I can confirm libgfapi isn't working in
3.7.12.
I don't know if there was some breaking change between 3.7.11 and 3.7.12, but it
looks like it's not working anymore, at least not in the latest proxmox.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson

On 29/06/2016 10:48 PM, Kevin Lemonnier wrote:

I do have a problem with libgfapi though, I can't create VMs with qcow disks (I
get a timeout)
and I can create VMs with raw disks but when I try to format them with 
mkfs.ext4 they shut down
without any errors.


Downgraded back to 3.7.11 and got everything working again.


To me it looks like a libgfapi problem. VMs were working via fuse.


--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Kevin Lemonnier
> 
> Which NFS server are you using? the std one built into proxmox/debian? 
> how do you handle redundancy?

I mean the one in gluster; I added the gluster volume as NFS in proxmox
instead of as gluster.
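For anyone trying the same workaround: gluster's built-in NFS server speaks NFSv3 only, so a manual mount looks roughly like this (server and volume names are examples, and nfs.disable must be off on the volume):

    mount -t nfs -o vers=3,tcp server1:/myvolume /mnt/myvolume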

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson

On 29/06/2016 11:29 PM, Kevin Lemonnier wrote:

I mean the one in gluster, I added the gluster volume as NFS in proxmox
instead of as gluster.


I didn't realise that worked on Debian. Is that the ganesha server?

--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson

On 29/06/2016 10:48 PM, Kevin Lemonnier wrote:

Just to add some info on that, I did a fresh install of 3.7.12 here (without 
setting that option)
and I don't have a problem starting the VMs.

I do have a problem with libgfapi though, I can't create VMs with qcow disks (I
get a timeout)
and I can create VMs with raw disks but when I try to format them with 
mkfs.ext4 they shut down
without any errors.
Maybe it's related ? Are you using qcow ?


Yes, I am


I added the volume as NFS and I'm using that without any problem for now with 
both qcow and raw, maybe
you could try that, see if at least your VMs can boot that way.


Which NFS server are you using? The standard one built into proxmox/debian?
How do you handle redundancy?



That did suggest to me trying the fuse client, which proxmox
automatically sets up. I changed my gfapi storage to shared directory
storage pointing to the fuse mount. That is working better; I have
several VMs running now and heal info isn't locking up or reporting any
issues.
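The fuse mount proxmox creates is just a normal glusterfs mount; done by hand it would be something like the following (mount point and backup server are illustrative):

    mount -t glusterfs -o backupvolfile-server=vng.proxmox.softlog \
        vnb.proxmox.softlog:/datastore4 /mnt/pve/datastore4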


However several other VMs won't start; qemu errors out with: "Could not
read qcow2 header: Operation not permitted", which freaked me out till I
manually checked the image with qemu-img, which reported it as fine.
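Presumably the manual check was something along these lines, either via the fuse path or straight over gfapi (the image path is illustrative):

    qemu-img info /mnt/pve/datastore4/images/100/vm-100-disk-1.qcow2
    qemu-img check gluster://vnb.proxmox.softlog/datastore4/images/100/vm-100-disk-1.qcow2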


Perhaps I need to reboot the cluster again to reset any locks or randomness.


--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Kevin Lemonnier
Just to add some info on that, I did a fresh install of 3.7.12 here (without 
setting that option)
and I don't have a problem starting the VMs.

I do have a problem with libgfapi though: I can't create VMs with qcow disks (I
get a timeout), and I can create VMs with raw disks but when I try to format
them with mkfs.ext4 they shut down without any errors.
Maybe it's related? Are you using qcow?
I added the volume as NFS and I'm using that without any problem for now with
both qcow and raw; maybe you could try that and see if at least your VMs can
boot that way.

On Wed, Jun 29, 2016 at 06:25:44PM +1000, Lindsay Mathieson wrote:
> Was able to shutdown my gluster and clean reboot.
> 
> set:
> 
> cluster.shd-max-threads:4
> cluster.locking-scheme:granular
> 
> And started one VM. It got halfway booted and froze. A gluster heal
> info returned "Not able to fetch volfile from glusterd"
> 
> I killed the VM, stopped the datastore and all gluster processes, then
> started it back up. Heal info was successful and showed 200+ shards
> being healed.
> 
> however a heal info heal-count shows 0 heals
> 
> I stopped the datastore again and reset the settings:
>   cluster.shd-max-threads
>   cluster.locking-scheme
> 
> Waiting for heal to complete. Before I try again.
> 
> Contemplating undoing the upgrade. Can I set the opversion back to 30710?
> 
> 
> 
> -- 
> Lindsay

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson

When I try to start a VM the brick log is spammed with the following:

[2016-06-29 12:29:17.844704] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.832 failed [File exists]
[2016-06-29 12:29:18.030093] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.835 failed [File exists]
[2016-06-29 12:29:19.276670] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.478 failed [File exists]
[2016-06-29 12:29:19.915686] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.813 failed [File exists]
[2016-06-29 12:29:20.270403] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.476 failed [File exists]
[2016-06-29 12:29:20.750933] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.505 failed [File exists]
[2016-06-29 12:29:21.175397] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.159 failed [File exists]
[2016-06-29 12:29:21.366887] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.220 failed [File exists]
[2016-06-29 12:29:21.827546] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.640 failed [File exists]
[2016-06-29 12:29:22.329726] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.509 failed [File exists]
[2016-06-29 12:29:22.790829] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.119 failed [File exists]
[2016-06-29 12:29:24.180752] E [MSGID: 113022] [posix.c:1244:posix_mknod] 0-datastore4-posix: mknod on /tank/vmdata/datastore4/.shard/b2996a69-f629-4425-9098-e62c25d9f033.508 failed [File exists]



--
Lindsay Mathieson



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Pranith Kumar Karampuri
On Wed, Jun 29, 2016 at 2:41 PM,  wrote:

> No i didn’t, but I set those options and rebooted to upgrade the client at
> the same time.
>

This is the part that confuses me. You can *not* set a 3.7.12 option when
3.7.11 clients are still in play. Something is amiss. Can you give me 'ls
-l  please?


>
>
> Will get the logs
>
>
>
> Sent from my Windows 10 phone
>
>
>
> From: Anuradha Talur
> Sent: Wednesday, 29 June 2016 7:05 PM
> To: Lindsay Mathieson
> Cc: gluster-users
>
> Subject: Re: [Gluster-users] 3.7.12 disaster
>
>
>
> Lindsay,
>
>
>
> Did you see any problems in the setup before you set those options?
>
>
>
> Also, could you please share glusterd and glfsheal logs before you revert
> to 3.7.11,
>
> so that it can be analyzed?
>
>
>
> - Original Message -
>
> > From: "Lindsay Mathieson" 
>
> > To: "gluster-users" 
>
> > Sent: Wednesday, June 29, 2016 2:07:30 PM
>
> > Subject: Re: [Gluster-users] 3.7.12 disaster
>
> >
>
> > On 29 June 2016 at 18:30, Lindsay Mathieson  >
>
> > wrote:
>
> > > Same problem again. VM froze and heal info timed out with "Not able to
>
> > > fetch volfile from glusterd". I'm going to have to revert to 3.7.11
>
> >
>
> >
>
> > Heal process seems to be stuck at the following:
>
> >
>
> > gluster v heal datastore4 info
>
> > Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
>
> > Status: Connected
>
> > Number of entries: 0
>
> >
>
> > Brick vng.proxmox.softlog:/tank/vmdata/datastore4
>
> >  - Possibly undergoing heal
>
> >
>
> > Status: Connected
>
> > Number of entries: 1
>
> >
>
> > Brick vna.proxmox.softlog:/tank/vmdata/datastore4
>
> >  - Possibly undergoing heal
>
> >
>
> > Status: Connected
>
> > Number of entries: 1
>
> >
>
> > I'm on my home now, will be offline for a couple of hours. But for now
>
> > my cluster is offline.
>
> >
>
> > --
>
> > Lindsay
>
>
> >
>
>
>
> --
>
> Thanks,
>
> Anuradha.
>
>
>
>



-- 
Pranith

Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Kaushal M
Hi Lindsay,

Can you share the glusterd log and the glfsheal log for the volume
from the system on which you ran the heal command?
This will help understand why volfile fetch failed.

The files will be `/var/log/glusterfs/etc-glusterfs-glusterd.vol.log`
and `/var/log/glusterfs/glfsheal-.log`



On Wed, Jun 29, 2016 at 2:26 PM,   wrote:
> Yes, but I hadn't restarted the servers either, so the clients (qemu/gfapi)
> were still 3.7.11 until then.
>
>
>
> Still have same problems after reverting the settings.
>
>
>
> Waiting for heal to finish before I revert to 3.7.11
>
>
>
> Any advice on the best way to use apt for that?
>
>
>
> Sent from my Windows 10 phone
>
>
>
> From: Kevin Lemonnier
> Sent: Wednesday, 29 June 2016 6:49 PM
> To: gluster-users@gluster.org
> Subject: Re: [Gluster-users] 3.7.12 disaster
>
>
>
>> cluster.shd-max-threads:4
>
>> cluster.locking-scheme:granular
>
>
>
> So you had no problems before setting that ? I'm currently re-installing my
> test
>
> servers, as you can imagine really really hoping 3.7.12 fixes the corruption
> problem,
>
> I hope there isn't a new horrible bug ..
>
>
>
> --
>
> Kevin Lemonnier
>
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
>
>
>


Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Pranith Kumar Karampuri
On Wed, Jun 29, 2016 at 2:07 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 29 June 2016 at 18:30, Lindsay Mathieson 
> wrote:
> > Same problem again. VM froze and heal info timed out with "Not able to
> > fetch volfile from glusterd". I'm going to have to revert to 3.7.11
>
>
> Heal process seems to be stuck at the following:
>
> gluster v heal datastore4 info
> Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
> Status: Connected
> Number of entries: 0
>
> Brick vng.proxmox.softlog:/tank/vmdata/datastore4
>  - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 1
>
> Brick vna.proxmox.softlog:/tank/vmdata/datastore4
>  - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 1
>
> I'm on my home now, will be offline for a couple of hours. But for now
> my cluster is offline.
>

Hi Lindsay,
  Something doesn't sound right. If you had set locking-scheme to
granular, the 'Possibly undergoing heal' message should never appear. I think the
problem is something to do with this option in your case. VMs
freezing suggests that some processes are still doing old-style locking,
which can block the writes. Can you try resetting the locking-scheme option?
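Resetting both options back to their defaults should just be (volume name as used earlier in the thread):

    gluster volume reset datastore4 cluster.locking-scheme
    gluster volume reset datastore4 cluster.shd-max-threads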



>
> --
> Lindsay
>



-- 
Pranith

Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread lindsay.mathieson
No I didn't, but I set those options and rebooted to upgrade the client at the
same time.

Will get the logs 

Sent from my Windows 10 phone

From: Anuradha Talur

Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Anuradha Talur
Lindsay,

Did you see any problems in the setup before you set those options?

Also, could you please share the glusterd and glfsheal logs before you revert to
3.7.11, so that they can be analyzed?

- Original Message -
> From: "Lindsay Mathieson" 
> To: "gluster-users" 
> Sent: Wednesday, June 29, 2016 2:07:30 PM
> Subject: Re: [Gluster-users] 3.7.12 disaster
> 
> On 29 June 2016 at 18:30, Lindsay Mathieson 
> wrote:
> > Same problem again. VM froze and heal info timed out with "Not able to
> > fetch volfile from glusterd". I'm going to have to revert to 3.7.11
> 
> 
> Heal process seems to be stuck at the following:
> 
> gluster v heal datastore4 info
> Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
> Status: Connected
> Number of entries: 0
> 
> Brick vng.proxmox.softlog:/tank/vmdata/datastore4
>  - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 1
> 
> Brick vna.proxmox.softlog:/tank/vmdata/datastore4
>  - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 1
> 
> I'm on my home now, will be offline for a couple of hours. But for now
> my cluster is offline.
> 
> --
> Lindsay
> 

-- 
Thanks,
Anuradha.


Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread lindsay.mathieson
Yes, but I hadn't restarted the servers either, so the clients (qemu/gfapi) 
were still 3.7.11 until then.

Still have same problems after reverting the settings.

Waiting for heal to finish before I revert to 3.7.11

Any advice on the best way to use apt for that?
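In case it helps, a downgrade with apt is usually just a matter of pointing the sources back at the 3.7.11 repo and pinning the versions explicitly; the exact Debian version strings below are a guess and should be checked with apt-cache policy first:

    # after switching /etc/apt/sources.list.d/gluster.list back to the 3.7.11 repo
    apt-get update
    apt-get install glusterfs-server=3.7.11-1 glusterfs-client=3.7.11-1 glusterfs-common=3.7.11-1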

Sent from my Windows 10 phone

From: Kevin Lemonnier

Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Kevin Lemonnier
> cluster.shd-max-threads:4
> cluster.locking-scheme:granular

So you had no problems before setting that? I'm currently re-installing my test
servers, as you can imagine really really hoping 3.7.12 fixes the corruption
problem. I hope there isn't a new horrible bug...

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson
On 29 June 2016 at 18:30, Lindsay Mathieson  wrote:
> Same problem again. VM froze and heal info timed out with "Not able to
> fetch volfile from glusterd". I'm going to have to revert to 3.7.11


Heal process seems to be stuck at the following:

gluster v heal datastore4 info
Brick vnb.proxmox.softlog:/tank/vmdata/datastore4
Status: Connected
Number of entries: 0

Brick vng.proxmox.softlog:/tank/vmdata/datastore4
 - Possibly undergoing heal

Status: Connected
Number of entries: 1

Brick vna.proxmox.softlog:/tank/vmdata/datastore4
 - Possibly undergoing heal

Status: Connected
Number of entries: 1

I'm on my way home now, will be offline for a couple of hours. But for now
my cluster is offline.

-- 
Lindsay


Re: [Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson
On 29 June 2016 at 18:25, Lindsay Mathieson  wrote:
> Waiting for heal to complete. Before I try again.


Same problem again. VM froze and heal info timed out with "Not able to
fetch volfile from glusterd". I'm going to have to revert to 3.7.11

-- 
Lindsay


[Gluster-users] 3.7.12 disaster

2016-06-29 Thread Lindsay Mathieson
Was able to shut down my gluster and do a clean reboot.

set:

cluster.shd-max-threads:4
cluster.locking-scheme:granular

And started one VM. It got halfway booted and froze. A gluster heal
info returned "Not able to fetch volfile from glusterd"

I killed the VM, stopped the datastore and all gluster processes, then
started it back up. Heal info was successful and showed 200+ shards
being healed.

However, a heal info heal-count shows 0 heals.

I stopped the datastore again and reset the settings:
  cluster.shd-max-threads
  cluster.locking-scheme

Waiting for heal to complete before I try again.

Contemplating undoing the upgrade. Can I set the op-version back to 30710?
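For reference, the op-version each glusterd is running with can be read straight from its config; whether it can be lowered again is a separate question. Something like:

    grep operating-version /var/lib/glusterd/glusterd.info
    # the upgrade direction, e.g. after moving everything to 3.7.12:
    gluster volume set all cluster.op-version 30712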



-- 
Lindsay