Re: [Gluster-users] glusterfsd process thrashing CPU
Actually I had the same experience when I was using 3.4.2:
https://www.mail-archive.com/gluster-users@gluster.org/msg15850.html

If I understand correctly, I should be using FULL heal rather than DIFF for large VM images? I was not sure whether throttling was working in 3.4.2 or not. When I attempted to recover an entire volume filled with VM images ranging in size from 10G to 500G, I saw it recovering 2 images at a time rather than all at once.

Thanks,
Adrian

From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Pranith Kumar Karampuri
Sent: Tuesday, November 18, 2014 7:02 PM
To: Lindsay Mathieson; gluster-users
Subject: Re: [Gluster-users] glusterfsd process thrashing CPU

On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:

On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:

On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri <mailto:pkara...@redhat.com> wrote:

However given the files are tens of GB in size, won't it thrash my network?

Yes, you are right. I wonder why thrashing of the network has never been reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed, sync operations seem to self-throttle; I've not seen them use more than 50% of bandwidth, and given most setups have a dedicated network for the servers, maybe they just don't notice if it takes a while?

No, I was not being sarcastic :-). I am genuinely wondering why it has not been reported until now. Maybe Joe will have more input there; that is the reason I CCed him.

I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?

Let me tell you a bit more about this issue: there are two processes which heal the VM images:
1) the self-heal daemon
2) the mount process
The self-heal daemon heals one VM image at a time, but the mount process triggers self-heals for all the opened files (a VM image is nothing but an opened file from the filesystem's perspective) when a brick goes down and comes back up.

Thanks, interesting to know.

So we need to come up with a scheme to throttle self-heals on the mount point to prevent this issue. I will update you as soon as I come up with a fix. This should not be hard to do; I need some time to choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:
> On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:
>> On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
>>> On 18 November 2014 17:40, Pranith Kumar Karampuri wrote:
>>>
>>> However given the files are tens of GB in size, won't it thrash my network?
>>
>> Yes, you are right. I wonder why thrashing of the network has never been
>> reported until now.
>
> Not sure if you are being sarcastic or not :) But from what I've observed,
> sync operations seem to self-throttle; I've not seen them use more than 50%
> of bandwidth, and given most setups have a dedicated network for the servers,
> maybe they just don't notice if it takes a while?

No, I was not being sarcastic :-). I am genuinely wondering why it has not been reported until now. Maybe Joe will have more input there; that is the reason I CCed him.

>> I still need to think about how best to solve this problem.
>
> Set up an array of queues for self-healing, sorted by size maybe?
>
>> Let me tell you a bit more about this issue: there are two processes which
>> heal the VM images: 1) the self-heal daemon, and 2) the mount process.
>> The self-heal daemon heals one VM image at a time, but the mount process
>> triggers self-heals for all the opened files (a VM image is nothing but an
>> opened file from the filesystem's perspective) when a brick goes down and
>> comes back up.
>
> Thanks, interesting to know.
>
>> So we need to come up with a scheme to throttle self-heals on the mount
>> point to prevent this issue. I will update you as soon as I come up with a
>> fix. This should not be hard to do; I need some time to choose the best
>> approach. Thanks a lot for bringing up this issue.
>
> Thank you for looking at it!
>
> Cheers,
Re: [Gluster-users] glusterfsd process thrashing CPU
On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:
> On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
> > On 18 November 2014 17:40, Pranith Kumar Karampuri wrote:
> >
> > However given the files are tens of GB in size, won't it thrash my
> > network?
>
> Yes, you are right. I wonder why thrashing of the network has never been
> reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed, sync operations seem to self-throttle; I've not seen them use more than 50% of bandwidth, and given most setups have a dedicated network for the servers, maybe they just don't notice if it takes a while?

> I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?

> Let me tell you a bit more about this issue:
> there are two processes which heal the VM images:
> 1) the self-heal daemon, and 2) the mount process.
> The self-heal daemon heals one VM image at a time, but the mount process
> triggers self-heals for all the opened files (a VM image is nothing but an
> opened file from the filesystem's perspective) when a brick goes down and
> comes back up.

Thanks, interesting to know.

> So we need to come up with a scheme to throttle self-heals
> on the mount point to prevent this issue. I will update you as soon as I
> come up with a fix. This should not be hard to do; I need some time to
> choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,
--
Lindsay
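Lindsay's "queues sorted by size" suggestion is not an actual gluster feature, but the ordering part is easy to picture. A toy sketch under that assumption (the paths and file names below are invented for illustration): build the heal queue smallest-first, so short heals finish before the huge images monopolize the bricks.

```shell
# Toy sketch of a size-sorted heal queue (hypothetical; gluster does not
# expose this). Create two dummy "VM images" of different sizes, then
# emit the queue in ascending-size order.
mkdir -p /tmp/brick/images
dd if=/dev/zero of=/tmp/brick/images/small.qcow2 bs=1024 count=1 2>/dev/null
dd if=/dev/zero of=/tmp/brick/images/big.qcow2 bs=1024 count=10 2>/dev/null

# Queue order: size ascending, so small.qcow2 is healed first.
find /tmp/brick/images -type f -printf '%s %p\n' | sort -n | awk '{print $2}'
```

On GNU systems `find -printf '%s %p\n'` prints size and path, which `sort -n` orders numerically; a real implementation would of course have to re-sort as files grow.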
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
> On 18 November 2014 17:40, Pranith Kumar Karampuri wrote:
>> Sorry, didn't see this one. I think this is happening because of the 'diff'
>> based self-heal, which does full-file checksums; I believe that is the root
>> cause. Could you execute 'gluster volume set
>> cluster.data-self-heal-algorithm full' to prevent this issue in the future?
>> But this option will only be effective for the new self-heals that are
>> triggered after the execution of the command. The ongoing ones will still
>> use the old mode of self-heal.
>
> Thanks, makes sense. However given the files are tens of GB in size, won't
> it thrash my network?

Yes, you are right. I wonder why thrashing of the network has never been reported until now. +Joejulian, who also uses VMs on gluster (for 5 years now?). He uses this option of full self-heal (that's what I saw in his bug reports). I still need to think about how best to solve this problem.

Let me tell you a bit more about this issue: there are two processes which heal the VM images:
1) the self-heal daemon
2) the mount process
The self-heal daemon heals one VM image at a time, but the mount process triggers self-heals for all the opened files (a VM image is nothing but an opened file from the filesystem's perspective) when a brick goes down and comes back up. So we need to come up with a scheme to throttle self-heals on the mount point to prevent this issue. I will update you as soon as I come up with a fix. This should not be hard to do; I need some time to choose the best approach. Thanks a lot for bringing up this issue.

Pranith
Re: [Gluster-users] glusterfsd process thrashing CPU
On 18 November 2014 18:05, Franco Broi wrote:
>
> Can't see how any of that could account for 1000% CPU unless it's just
> stuck in a loop.

Currently still varying between 400% and 950%.

Can glusterfsd be killed without affecting the libgfapi clients (the KVMs)?
Re: [Gluster-users] glusterfsd process thrashing CPU
Can't see how any of that could account for 1000% CPU unless it's just stuck in a loop.

On Tue, 2014-11-18 at 18:00 +1000, Lindsay Mathieson wrote:
> On 18 November 2014 17:46, Franco Broi wrote:
> >
> > Try strace -Ff -e file -p 'glusterfsd pid'
>
> Thanks, Attached
Re: [Gluster-users] glusterfsd process thrashing CPU
On 18 November 2014 17:46, Franco Broi wrote:
>
> Try strace -Ff -e file -p 'glusterfsd pid'

Thanks, Attached

Process 27115 attached with 25 threads - interrupt to quit
[pid 27122] stat("/mnt/gluster-brick1/datastore", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 11840] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht"
[pid 29198] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht"
[pid 29197] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht"
[pid 11840] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht", "\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-1", 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 11840] llistxattr("/mnt/gluster-brick1/datastore/", 0x7feae3cfda10, 63) = 63
[pid 29198] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht", "\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-1"
[pid 29197] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 29198] <... lgetxattr resumed> , 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht"
[pid 29198] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 29198] llistxattr("/mnt/gluster-brick1/datastore/"
[pid 29197] <... lgetxattr resumed> , "\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 29198] <... llistxattr resumed> , 0x7feae3ffea10, 63) = 63
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-1", 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 29197] llistxattr("/mnt/gluster-brick1/datastore/", 0x7feaf0487a10, 63) = 63
[pid 11846] lstat("/mnt/gluster-brick1/datastore/images", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11846] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid"
[pid 11844] lstat("/mnt/gluster-brick1/datastore/images", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11844] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid"
[pid 11845] lstat("/mnt/gluster-brick1/datastore/images", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11845] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid"
[pid 11844] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 16) = 16
[pid 11846] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 16) = 16
[pid 11845] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 16) = 16
[pid 11846] lgetxattr("/mnt/gluster-brick1/datastore/images", "system.posix_acl_default"
[pid 11845] lgetxattr("/mnt/gluster-brick1/datastore/images", "system.posix_acl_d
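For anyone trying to make sense of a capture like the one above, a quick way to see which syscalls dominate is to tally them. A minimal sketch, with two lines copied from the trace above standing in for real input (on a live system you would feed the actual strace output file instead):

```shell
# Tally syscalls in an strace capture to see what the process is busy doing.
# Sample lines from the trace above stand in for a real capture file.
cat > /tmp/strace.sample <<'EOF'
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
EOF

# Extract the syscall name after the "[pid N] " prefix and count each kind.
# Prints counts like "2 lgetxattr" and "1 lstat" for the sample above.
sed -n 's/^\[pid [0-9]*\] \([a-z_]*\)(.*/\1/p' /tmp/strace.sample | sort | uniq -c | sort -rn
```

Note that strace itself can produce this summary directly with `-c`, but the pipeline works on an already-saved capture like the one attached here.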
Re: [Gluster-users] glusterfsd process thrashing CPU
On 18 November 2014 17:40, Pranith Kumar Karampuri wrote:
>
> Sorry, didn't see this one. I think this is happening because of the 'diff'
> based self-heal, which does full-file checksums; I believe that is the root
> cause. Could you execute 'gluster volume set
> cluster.data-self-heal-algorithm full' to prevent this issue in the future?
> But this option will only be effective for the new self-heals that are
> triggered after the execution of the command. The ongoing ones will still
> use the old mode of self-heal.

Thanks, makes sense. However given the files are tens of GB in size, won't it thrash my network?
Re: [Gluster-users] glusterfsd process thrashing CPU
Try strace -Ff -e file -p 'glusterfsd pid'

On Tue, 2014-11-18 at 17:42 +1000, Lindsay Mathieson wrote:
> Sorry, meant to send to the list. strace attached.
>
> On 18 November 2014 17:35, Pranith Kumar Karampuri wrote:
> >
> > On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
> >>
> >> 2 Node replicate setup,
> >>
> >> Everything has been stable for days until I had occasion to reboot
> >> one of the nodes. Since then (past hour) glusterfsd has been pegging
> >> the CPU(s), utilization ranging from 1% to 1000%!
> >>
> >> On average it's around 500%
> >>
> >> This is a VM server, so there are only 27 VM images for a total of
> >> 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM
> >>
> >> - What does glusterfsd do?
> >>
> >> - What can I do to fix this?
> >
> > Which version of glusterfs are you using? Do you have directories with
> > lots of files?
> >
> > Pranith
> >>
> >> thanks,
Re: [Gluster-users] glusterfsd process thrashing CPU
Sorry, meant to send to the list. strace attached.

On 18 November 2014 17:35, Pranith Kumar Karampuri wrote:
>
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>>
>> 2 Node replicate setup,
>>
>> Everything has been stable for days until I had occasion to reboot
>> one of the nodes. Since then (past hour) glusterfsd has been pegging
>> the CPU(s), utilization ranging from 1% to 1000%!
>>
>> On average it's around 500%
>>
>> This is a VM server, so there are only 27 VM images for a total of
>> 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM
>>
>> - What does glusterfsd do?
>>
>> - What can I do to fix this?
>
> Which version of glusterfs are you using? Do you have directories with lots
> of files?
>
> Pranith

--
Lindsay

execve("/usr/sbin/glusterfsd", ["glusterfsd"], [/* 15 vars */]) = 0
brk(0) = 0x1e76000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f365dc72000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=32190, ...}) = 0
mmap(NULL, 32190, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f365dc6a000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14768, ...}) = 0
mmap(NULL, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365d851000
mprotect(0x7f365d853000, 2097152, PROT_NONE) = 0
mmap(0x7f365da53000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f365da53000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10640, ...}) = 0
mmap(NULL, 2105608, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365d64e000
mprotect(0x7f365d65, 2093056, PROT_NONE) = 0
mmap(0x7f365d84f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f365d84f000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360>\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=530736, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f365dc69000
mmap(NULL, 2625768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365d3cc000
mprotect(0x7f365d44d000, 2093056, PROT_NONE) = 0
mmap(0x7f365d64c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8) = 0x7f365d64c000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/libpython2.7.so.1.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\217\4\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=3073448, ...}) = 0
mmap(NULL, 5242520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365cecc000
mprotect(0x7f365d15, 2093056, PROT_NONE) = 0
mmap(0x7f365d34f000, 438272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x283000) = 0x7f365d34f000
mmap(0x7f365d3ba000, 73368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365d3ba000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340Q\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=557592, ...}) = 0
mmap(NULL, 2666280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365cc41000
mprotect(0x7f365ccc7000, 2097152, PROT_NONE) = 0
mmap(0x7f365cec7000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x86000) = 0x7f365cec7000
mmap(0x7f365cec9000, 12072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365cec9000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libgfrpc.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360Z\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=105848, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f365dc68000
mmap(NULL, 2201016, PRO
Re: [Gluster-users] glusterfsd process thrashing CPU
Gluster 3.5.2

Very few files - it's purely a VM image host: 27 files, 10 - 60GB in size.

Seems to be undergoing a heal:

root@vnb:~# gluster volume heal datastore1 info
Brick vnb:/mnt/gluster-brick1/datastore/
/images/108/vm-108-disk-1.qcow2 - Possibly undergoing heal
/images/105/vm-105-disk-1.qcow2 - Possibly undergoing heal
/images/100/vm-100-disk-1.qcow2 - Possibly undergoing heal
/images/401/vm-401-disk-1.qcow2 - Possibly undergoing heal
/images/201/vm-201-disk-1.qcow2 - Possibly undergoing heal
/images/204/vm-204-disk-1.qcow2 - Possibly undergoing heal
/images/102/vm-102-disk-1.qcow2 - Possibly undergoing heal
/images/501/vm-501-disk-1.qcow2 - Possibly undergoing heal
/images/203/vm-203-disk-1.qcow2 - Possibly undergoing heal
/images/106/vm-106-disk-1.qcow2 - Possibly undergoing heal
/images/400/vm-400-disk-1.qcow2 - Possibly undergoing heal
/images/107/vm-107-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 12

Brick vng:/mnt/gluster-brick1/datastore/
- Possibly undergoing heal
- Possibly undergoing heal
- Possibly undergoing heal
Number of entries: 3

What would the gfid entries be?

On 18 November 2014 17:35, Pranith Kumar Karampuri wrote:
>
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>>
>> 2 Node replicate setup,
>>
>> Everything has been stable for days until I had occasion to reboot
>> one of the nodes. Since then (past hour) glusterfsd has been pegging
>> the CPU(s), utilization ranging from 1% to 1000%!
>>
>> On average it's around 500%
>>
>> This is a VM server, so there are only 27 VM images for a total of
>> 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM
>>
>> - What does glusterfsd do?
>>
>> - What can I do to fix this?
>
> Which version of glusterfs are you using? Do you have directories with lots
> of files?
>
> Pranith

--
Lindsay
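Heal-info output like the above can also be summarized mechanically when there are many entries. A toy sketch (the sample below embeds two entries in the same format as the real output above; in practice you would pipe in `gluster volume heal <volname> info` directly):

```shell
# Count how many files a heal-info report lists as possibly undergoing
# heal. A two-entry sample in the report's format stands in for the
# real command output.
cat > /tmp/heal.info <<'EOF'
Brick vnb:/mnt/gluster-brick1/datastore/
/images/108/vm-108-disk-1.qcow2 - Possibly undergoing heal
/images/105/vm-105-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 2
EOF

grep -c 'Possibly undergoing heal' /tmp/heal.info   # prints 2
```

Watching that count with `watch` is a simple way to see whether a heal is actually making progress over time.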
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 01:05 PM, Pranith Kumar Karampuri wrote:
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>> 2 Node replicate setup,
>>
>> Everything has been stable for days until I had occasion to reboot
>> one of the nodes. Since then (past hour) glusterfsd has been pegging
>> the CPU(s), utilization ranging from 1% to 1000%!
>>
>> On average it's around 500%
>>
>> This is a VM server, so there are only 27 VM images for a total of
>> 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM

Sorry, didn't see this one. I think this is happening because of the 'diff' based self-heal, which does full-file checksums; I believe that is the root cause. Could you execute 'gluster volume set cluster.data-self-heal-algorithm full' to prevent this issue in the future? But this option will only be effective for the new self-heals that are triggered after the execution of the command. The ongoing ones will still use the old mode of self-heal.

Pranith

>> - What does glusterfsd do?
>>
>> - What can I do to fix this?
>
> Which version of glusterfs are you using? Do you have directories with lots
> of files?
>
> Pranith

>> thanks,
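For anyone landing on this thread later, the suggested change would look something like the following admin-command fragment. It only runs on a gluster node, and the volume name datastore1 is taken from Lindsay's heal output elsewhere in the thread; substitute your own.

```shell
# Switch the data self-heal algorithm from the default 'diff' (checksums
# every block of the file to find differing ranges; CPU heavy on the
# bricks) to 'full' (copies the whole file; heavier on the network).
# 'datastore1' is an example volume name.
gluster volume set datastore1 cluster.data-self-heal-algorithm full

# Confirm the option took effect
gluster volume info datastore1 | grep data-self-heal-algorithm

# To return to the default behaviour later:
gluster volume reset datastore1 cluster.data-self-heal-algorithm
```

As Pranith notes above, the change only applies to self-heals started after the command runs; heals already in flight keep the old algorithm.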
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
> 2 Node replicate setup,
>
> Everything has been stable for days until I had occasion to reboot
> one of the nodes. Since then (past hour) glusterfsd has been pegging
> the CPU(s), utilization ranging from 1% to 1000%!
>
> On average it's around 500%
>
> This is a VM server, so there are only 27 VM images for a total of
> 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM
>
> - What does glusterfsd do?
>
> - What can I do to fix this?

Which version of glusterfs are you using? Do you have directories with lots of files?

Pranith

> thanks,
Re: [Gluster-users] glusterfsd process thrashing CPU
glusterfsd is the filesystem daemon. You could try strace'ing it to see what it's doing.

On Tue, 2014-11-18 at 17:09 +1000, Lindsay Mathieson wrote:
> And it's happening on both nodes now, they have become near unusable.
>
> On 18 November 2014 17:03, Lindsay Mathieson wrote:
> > ps. There is very little network traffic happening
> >
> > --
> > Lindsay
Re: [Gluster-users] glusterfsd process thrashing CPU
And it's happening on both nodes now; they have become near unusable.

On 18 November 2014 17:03, Lindsay Mathieson wrote:
> ps. There is very little network traffic happening
>
> --
> Lindsay

--
Lindsay
Re: [Gluster-users] glusterfsd process thrashing CPU
ps. There is very little network traffic happening

--
Lindsay