Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-21 Thread Adrian Kan
Actually I had the same experience when I was using 3.4.2

https://www.mail-archive.com/gluster-users@gluster.org/msg15850.html

 

If I understand correctly, I should be using FULL heal rather than DIFF for
large VM images?

 

I was not sure whether throttling was working in 3.4.2 or not. I attempted to
recover an entire volume filled with VM images ranging in size from 10G to
500G.

I saw it was recovering 2 images at a time rather than all at once.

Thanks,

Adrian


From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Pranith Kumar
Karampuri
Sent: Tuesday, November 18, 2014 7:02 PM
To: Lindsay Mathieson; gluster-users
Subject: Re: [Gluster-users] glusterfsd process thrashing CPU

 

 

On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:

On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:

On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri
<mailto:pkara...@redhat.com> wrote:

However given the files are tens of GB in size, won't it thrash my
network?

Yes, you are right. I wonder why thrashing of the network has never been
reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed,
sync operations seem to self-throttle. I've not seen them use more than 50%
of bandwidth, and given most setups have a dedicated network for the
servers, maybe they just don't notice if it takes a while?

No, I was not being sarcastic :-). I am genuinely wondering why it has not
been reported until now. Maybe Joe will have more input there; that is the
reason I CCed him.

I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?

Let me tell you a bit more about this issue:
there are two processes which heal the VM images:
1) the self-heal daemon; 2) the mount process.
The self-heal daemon heals one VM image at a time. But the mount process
triggers self-heals for all the opened files (a VM image is nothing but an
opened file from the filesystem's perspective) when a brick goes down and
comes back up.

Thanks, interesting to know.

So we need to come up with a scheme to throttle self-heals
on the mount point to prevent this issue. I will update you as soon as I
come up with a fix. This should not be hard to do. I need some time to
choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

 


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Pranith Kumar Karampuri


On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:

On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:

On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri 

wrote:

However given the files are tens of GB in size, won't it thrash my
network?

Yes, you are right. I wonder why thrashing of the network has never been
reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed,
sync operations seem to self-throttle. I've not seen them use more than 50% of
bandwidth, and given most setups have a dedicated network for the servers,
maybe they just don't notice if it takes a while?
No, I was not being sarcastic :-). I am genuinely wondering why it has not
been reported until now. Maybe Joe will have more input there; that is the
reason I CCed him.



I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?


Let me tell you a bit more about this issue:
there are two processes which heal the VM images:
1) the self-heal daemon; 2) the mount process.
The self-heal daemon heals one VM image at a time. But the mount process
triggers self-heals for all the opened files (a VM image is nothing but an
opened file from the filesystem's perspective) when a brick goes down and
comes back up.


Thanks, interesting to know.


So we need to come up with a scheme to throttle self-heals
on the mount point to prevent this issue. I will update you as soon as I
come up with a fix. This should not be hard to do. I need some time to
choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,





Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Lindsay Mathieson
On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:
> On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
> > On 18 November 2014 17:40, Pranith Kumar Karampuri  
wrote:
> > 
> > However given the files are tens of GB in size, won't it thrash my
> > network?
> 
> Yes, you are right. I wonder why thrashing of the network has never been
> reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed,
sync operations seem to self-throttle. I've not seen them use more than 50% of
bandwidth, and given most setups have a dedicated network for the servers,
maybe they just don't notice if it takes a while?

> I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?

> 
> Let me tell you a bit more about this issue:
> there are two processes which heal the VM images:
> 1) the self-heal daemon; 2) the mount process.
> The self-heal daemon heals one VM image at a time. But the mount process
> triggers self-heals for all the opened files (a VM image is nothing but an
> opened file from the filesystem's perspective) when a brick goes down and
> comes back up.


Thanks, interesting to know.

> So we need to come up with a scheme to throttle self-heals
> on the mount point to prevent this issue. I will update you as soon as I
> come up with a fix. This should not be hard to do. I need some time to
> choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,


-- 
Lindsay


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Pranith Kumar Karampuri


On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri  wrote:

Sorry, didn't see this one. I think this is happening because of the 'diff'
based self-heal, which does full-file checksums; I believe that is the root
cause. Could you execute 'gluster volume set <volname>
cluster.data-self-heal-algorithm full' to prevent this issue in future? This
option will only be effective for the new self-heals triggered after the
command is executed; the ongoing ones will still use the old self-heal mode.

Thanks, makes sense.

However given the files are tens of GB in size, won't it thrash my network?
Yes, you are right. I wonder why thrashing of the network has never been
reported until now.
+Joejulian, who also uses VMs on gluster (for 5 years now?). He uses this
option of full self-heal (that's what I saw in his bug reports).


I still need to think about how best to solve this problem.

Let me tell you a bit more about this issue:
there are two processes which heal the VM images:
1) the self-heal daemon; 2) the mount process.
The self-heal daemon heals one VM image at a time. But the mount process
triggers self-heals for all the opened files (a VM image is nothing but an
opened file from the filesystem's perspective) when a brick goes down and
comes back up. So we need to come up with a scheme to throttle self-heals
on the mount point to prevent this issue. I will update you as soon as I
come up with a fix. This should not be hard to do. I need some time to
choose the best approach. Thanks a lot for bringing up this issue.
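[Editor's note: Lindsay's size-sorted queue suggestion combined with the throttle described above could be sketched roughly as follows. This is purely illustrative Python, not GlusterFS code; the function name and the default limit of two concurrent heals are assumptions for the example.]

```python
import heapq

def heal_batches(pending, max_concurrent=2):
    """Drain a size-sorted self-heal queue in batches.

    pending: list of (path, size_bytes) tuples needing heal.
    Yields lists of at most max_concurrent paths, smallest images first,
    so many small heals can finish before a 500G image monopolises the bricks.
    """
    # Min-heap keyed on file size gives smallest-first ordering.
    heap = [(size, path) for path, size in pending]
    heapq.heapify(heap)
    while heap:
        batch = []
        while heap and len(batch) < max_concurrent:
            _, path = heapq.heappop(heap)
            batch.append(path)
        yield batch
```

For pending images of 60G, 10G and 30G this yields the 10G and 30G images as the first batch and the 60G image alone as the second.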


Pranith



Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Lindsay Mathieson
On 18 November 2014 18:05, Franco Broi  wrote:
>
> Can't see how any of that could account for 1000% CPU unless it's just
> stuck in a loop.


Currently still varying between 400% to 950%

Can glusterfsd be killed without affecting the libgfapi clients (the KVMs)?


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Franco Broi

Can't see how any of that could account for 1000% CPU unless it's just
stuck in a loop.

On Tue, 2014-11-18 at 18:00 +1000, Lindsay Mathieson wrote: 
> On 18 November 2014 17:46, Franco Broi  wrote:
> >
> > Try strace -Ff -e file -p 'glusterfsd pid'
> 
> Thanks, Attached
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users




Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Lindsay Mathieson
On 18 November 2014 17:46, Franco Broi  wrote:
>
> Try strace -Ff -e file -p 'glusterfsd pid'

Thanks, Attached
Process 27115 attached with 25 threads - interrupt to quit
[pid 27122] stat("/mnt/gluster-brick1/datastore", {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 11840] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", 
"system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", 
"system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" 

[pid 29198] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", 
"system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", 
"system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" 

[pid 29197] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", 
"system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", 
"system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" 

[pid 11840] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.glusterfs.dht", 
"\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 
0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.afr.datastore1-client-1", 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 11840] llistxattr("/mnt/gluster-brick1/datastore/", 0x7feae3cfda10, 63) = 
63
[pid 29198] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.glusterfs.dht", 
"\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 
0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.afr.datastore1-client-1" 
[pid 29197] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 29198] <... lgetxattr resumed> , 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" 

[pid 29198] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 29198] llistxattr("/mnt/gluster-brick1/datastore/" 
[pid 29197] <... lgetxattr resumed> , 
"\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 29198] <... llistxattr resumed> , 0x7feae3ffea10, 63) = 63
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 
0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", 
"trusted.afr.datastore1-client-1", 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 29197] llistxattr("/mnt/gluster-brick1/datastore/", 0x7feaf0487a10, 63) = 
63
[pid 11846] lstat("/mnt/gluster-brick1/datastore/images", 
{st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11846] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid" 

[pid 11844] lstat("/mnt/gluster-brick1/datastore/images", 
{st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11844] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid" 

[pid 11845] lstat("/mnt/gluster-brick1/datastore/images", 
{st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11845] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid" 

[pid 11844] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 
16) = 16
[pid 11846] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 
16) = 16
[pid 11845] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 
16) = 16
[pid 11846] lgetxattr("/mnt/gluster-brick1/datastore/images", 
"system.posix_acl_default" 
[pid 11845] lgetxattr("/mnt/gluster-brick1/datastore/images", 
"system.posix_acl_d

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
On 18 November 2014 17:40, Pranith Kumar Karampuri  wrote:
>
> Sorry, didn't see this one. I think this is happening because of the 'diff'
> based self-heal, which does full-file checksums; I believe that is the root
> cause. Could you execute 'gluster volume set <volname>
> cluster.data-self-heal-algorithm full' to prevent this issue in future? This
> option will only be effective for the new self-heals triggered after the
> command is executed; the ongoing ones will still use the old self-heal mode.

Thanks, makes sense.

However given the files are tens of GB in size, won't it thrash my network?


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Franco Broi

Try strace -Ff -e file -p 'glusterfsd pid'

On Tue, 2014-11-18 at 17:42 +1000, Lindsay Mathieson wrote: 
> Sorry, meant to send to the list. strace attached.
> 
> On 18 November 2014 17:35, Pranith Kumar Karampuri  
> wrote:
> >
> > On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
> >>
> >> 2 Node replicate setup,
> >>
> >> Everything has been stable for days until I had occasion to reboot
> >> one of the nodes. Since then (past hour) glusterfsd has been pegging
> >> the CPU(s), utilization ranging from 1% to 1000%!
> >>
> >> On average its around 500%
> >>
> >> This is a vm server, so there are only 27 VM images for a total of
> >> 800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM
> >>
> >> - What does glusterfsd do?
> >>
> >> - What can I do to fix this?
> >
> > Which version of glusterfs are you using? Do you have directories with lots
> > of files?
> >
> > Pranith
> >>
> >>
> >> thanks,
> >>
> >
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users




Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
Sorry, meant to send to the list. strace attached.

On 18 November 2014 17:35, Pranith Kumar Karampuri  wrote:
>
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>>
>> 2 Node replicate setup,
>>
>> Everything has been stable for days until I had occasion to reboot
>> one of the nodes. Since then (past hour) glusterfsd has been pegging
>> the CPU(s), utilization ranging from 1% to 1000%!
>>
>> On average its around 500%
>>
>> This is a vm server, so there are only 27 VM images for a total of
>> 800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM
>>
>> - What does glusterfsd do?
>>
>> - What can I do to fix this?
>
> Which version of glusterfs are you using? Do you have directories with lots
> of files?
>
> Pranith
>>
>>
>> thanks,
>>
>



-- 
Lindsay
execve("/usr/sbin/glusterfsd", ["glusterfsd"], [/* 15 vars */]) = 0
brk(0)  = 0x1e76000
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f365dc72000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=32190, ...}) = 0
mmap(NULL, 32190, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f365dc6a000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\r\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14768, ...}) = 0
mmap(NULL, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365d851000
mprotect(0x7f365d853000, 2097152, PROT_NONE) = 0
mmap(0x7f365da53000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f365da53000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10640, ...}) = 0
mmap(NULL, 2105608, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365d64e000
mprotect(0x7f365d65, 2093056, PROT_NONE) = 0
mmap(0x7f365d84f000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f365d84f000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360>\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=530736, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f365dc69000
mmap(NULL, 2625768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365d3cc000
mprotect(0x7f365d44d000, 2093056, PROT_NONE) = 0
mmap(0x7f365d64c000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8) = 0x7f365d64c000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/libpython2.7.so.1.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\217\4\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=3073448, ...}) = 0
mmap(NULL, 5242520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365cecc000
mprotect(0x7f365d15, 2093056, PROT_NONE) = 0
mmap(0x7f365d34f000, 438272, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x283000) = 0x7f365d34f000
mmap(0x7f365d3ba000, 73368, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365d3ba000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340Q\1\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=557592, ...}) = 0
mmap(NULL, 2666280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365cc41000
mprotect(0x7f365ccc7000, 2097152, PROT_NONE) = 0
mmap(0x7f365cec7000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x86000) = 0x7f365cec7000
mmap(0x7f365cec9000, 12072, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365cec9000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libgfrpc.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360Z\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=105848, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f365dc68000
mmap(NULL, 2201016, PRO

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
Gluster 3.5.2

Very few files - its purely a VM image host, 27 files, 10 - 60GB in size.

seems to be undergoing a heal:

root@vnb:~# gluster volume heal datastore1 info
Brick vnb:/mnt/gluster-brick1/datastore/
/images/108/vm-108-disk-1.qcow2 - Possibly undergoing heal
/images/105/vm-105-disk-1.qcow2 - Possibly undergoing heal
/images/100/vm-100-disk-1.qcow2 - Possibly undergoing heal
/images/401/vm-401-disk-1.qcow2 - Possibly undergoing heal
/images/201/vm-201-disk-1.qcow2 - Possibly undergoing heal
/images/204/vm-204-disk-1.qcow2 - Possibly undergoing heal
/images/102/vm-102-disk-1.qcow2 - Possibly undergoing heal
/images/501/vm-501-disk-1.qcow2 - Possibly undergoing heal
/images/203/vm-203-disk-1.qcow2 - Possibly undergoing heal
/images/106/vm-106-disk-1.qcow2 - Possibly undergoing heal
/images/400/vm-400-disk-1.qcow2 - Possibly undergoing heal
/images/107/vm-107-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 12

Brick vng:/mnt/gluster-brick1/datastore/
 - Possibly undergoing heal
 - Possibly undergoing heal
 - Possibly undergoing heal
Number of entries: 3


What would the gfid entries be?
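[Editor's note: the entries the archive shows blank on the vng brick above are gfids, GlusterFS's internal file UUIDs. On a brick, every regular file also exists as a hard link under the .glusterfs directory, bucketed by the first two byte pairs of its gfid, so a gfid from heal info can be resolved by hand. A small sketch of that layout; the example brick path matches this thread, but the gfid shown in the usage is illustrative.]

```python
import uuid

def gfid_to_backend_path(brick_root: str, gfid: str) -> str:
    """Resolve a gfid to its .glusterfs hard link inside a brick.

    GlusterFS keeps every file reachable a second time as
    <brick>/.glusterfs/<aa>/<bb>/<full-gfid>, where aa and bb are the
    first two hex-byte pairs of the gfid.
    """
    g = str(uuid.UUID(gfid))  # normalises case/format, rejects junk
    return f"{brick_root}/.glusterfs/{g[:2]}/{g[2:4]}/{g}"

# e.g. gfid_to_backend_path("/mnt/gluster-brick1/datastore",
#                           "be7f496c-48b0-43bd-aa41-3d424a36cab1")
```

Running find on the brick with -samefile against the resulting path then reveals the human-readable name (for regular files; directories are stored as symlinks rather than hard links).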

On 18 November 2014 17:35, Pranith Kumar Karampuri  wrote:
>
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>>
>> 2 Node replicate setup,
>>
>> Everything has been stable for days until I had occasion to reboot
>> one of the nodes. Since then (past hour) glusterfsd has been pegging
>> the CPU(s), utilization ranging from 1% to 1000%!
>>
>> On average its around 500%
>>
>> This is a vm server, so there are only 27 VM images for a total of
>> 800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM
>>
>> - What does glusterfsd do?
>>
>> - What can I do to fix this?
>
> Which version of glusterfs are you using? Do you have directories with lots
> of files?
>
> Pranith
>>
>>
>> thanks,
>>
>



-- 
Lindsay


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Pranith Kumar Karampuri


On 11/18/2014 01:05 PM, Pranith Kumar Karampuri wrote:


On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:

2 Node replicate setup,

Everything has been stable for days until I had occasion to reboot
one of the nodes. Since then (past hour) glusterfsd has been pegging
the CPU(s), utilization ranging from 1% to 1000%!

On average its around 500%

This is a vm server, so there are only 27 VM images for a total of
800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM
Sorry, didn't see this one. I think this is happening because of the 'diff'
based self-heal, which does full-file checksums; I believe that is the root
cause. Could you execute 'gluster volume set <volname>
cluster.data-self-heal-algorithm full' to prevent this issue in future? This
option will only be effective for the new self-heals triggered after the
command is executed; the ongoing ones will still use the old self-heal mode.
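[Editor's note: to make the 'diff' vs 'full' trade-off concrete: 'diff' checksums every block on both bricks and copies only the mismatching blocks (cheap on the network, heavy on CPU), while 'full' streams the entire file. A minimal sketch of the two strategies; the 128 KiB block size and MD5 are illustrative assumptions, not GlusterFS internals.]

```python
import hashlib

BLOCK = 128 * 1024  # illustrative block size, not gluster's actual value

def diff_heal(good: bytes, stale: bytearray) -> int:
    """'diff': checksum every block on BOTH copies, copy only mismatches.

    Returns the number of blocks actually sent. The per-block checksumming
    of multi-GB images is the CPU cost discussed in this thread.
    """
    copied = 0
    for off in range(0, len(good), BLOCK):
        src = good[off:off + BLOCK]
        dst = bytes(stale[off:off + BLOCK])
        if hashlib.md5(src).digest() != hashlib.md5(dst).digest():
            stale[off:off + BLOCK] = src
            copied += 1
    return copied

def full_heal(good: bytes, stale: bytearray) -> int:
    """'full': no checksums, just stream every block from the good copy."""
    stale[:] = good
    return (len(good) + BLOCK - 1) // BLOCK  # blocks sent over the wire
```

For a 1 MiB image with one dirty block, diff_heal copies a single block where full_heal ships all eight; the price is two checksums per block, which is where the CPU time goes on tens-of-GB images.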


Pranith


- What does glusterfsd do?

- What can I do to fix this?
Which version of glusterfs are you using? Do you have directories with 
lots of files?


Pranith


thanks,





Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Pranith Kumar Karampuri


On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:

2 Node replicate setup,

Everything has been stable for days until I had occasion to reboot
one of the nodes. Since then (past hour) glusterfsd has been pegging
the CPU(s), utilization ranging from 1% to 1000%!

On average its around 500%

This is a vm server, so there are only 27 VM images for a total of
800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM

- What does glusterfsd do?

- What can I do to fix this?
Which version of glusterfs are you using? Do you have directories with 
lots of files?


Pranith


thanks,





Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Franco Broi

glusterfsd is the filesystem daemon. You could try strace'ing it to
see what it's doing.

On Tue, 2014-11-18 at 17:09 +1000, Lindsay Mathieson wrote: 
> And its happening on both nodes now, they have become near unusable.
> 
> On 18 November 2014 17:03, Lindsay Mathieson
>  wrote:
> > ps. There is very little network traffic happening
> >
> >
> >
> > --
> > Lindsay
> 
> 
> 




Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
And its happening on both nodes now, they have become near unusable.

On 18 November 2014 17:03, Lindsay Mathieson
 wrote:
> ps. There is very little network traffic happening
>
>
>
> --
> Lindsay



-- 
Lindsay


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
ps. There is very little network traffic happening



-- 
Lindsay