Re: [Gluster-users] Apache hung tasks still occur with glusterfs 3.2.1
Hello. Do you have any feedback yet? Was it successful (io-cache disabled, stat-prefetch disabled, io-thread-count increased to 64)? Is/was your problem similar to this one? http://bugs.gluster.com/show_bug.cgi?id=3011

Thanks,
Christopher

On 13.06.2011 19:14, Jiri Lunacek wrote:

Thanks for the tip. I disabled io-cache and stat-prefetch, increased io-thread-count to 64 and rebooted the server to clean off the hung apache processes. We'll see tomorrow.

On 13.6.2011, at 15:58, Justice London wrote:

Disable io-cache and up the threads to 64 and your problems should disappear. They did for me when I made both of these changes.

Justice London

From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Jiri Lunacek
Sent: Monday, June 13, 2011 1:49 AM
To: gluster-users@gluster.org
Subject: [Gluster-users] Apache hung tasks still occur with glusterfs 3.2.1

Hi all.

We have been having problems with hung tasks of apache reading from a glusterfs 2-replica volume ever since upgrading to 3.2.0. The problems were identical to those described here: http://gluster.org/pipermail/gluster-users/2011-May/007697.html

Yesterday we updated to 3.2.1. A good thing is that the hung tasks stopped appearing while gluster is left intact, i.e. when there are no modifications to the gluster configs at all. Today we modified some other volume exported by the same cluster (but not sharing anything with the volume used by the apache process), and, once again, two requests of apache reading from the glusterfs volume are stuck.

Any help with this issue would be very much appreciated, as right now we have to reboot the machine nightly because the processes are stuck in iowait, unkillable. I really do not want to go through the downgrade to 3.1.4, since it seems from the mailing list that it may not go exactly smoothly. We are exporting millions of files and any large operation on the exported filesystem takes days.

I am attaching tech info on the problem.

client: CentOS 5.6, 2.6.18-238.9.1.el5, fuse-2.7.4-8.el5, glusterfs-fuse-3.2.1-1, glusterfs-core-3.2.1-1
servers: CentOS 5.6, 2.6.18-194.32.1.el5, fuse-2.7.4-8.el5, glusterfs-fuse-3.2.1-1, glusterfs-core-3.2.1-1

dmesg:

INFO: task httpd:1246 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 81000101d7a0 0 1246 2394 1247 1191 (NOTLB)
81013ee7dc38 0082 0092 81013ee7dcd8 81013ee7dd04 000a 810144d0f7e0 81019fc28100 308f8b444727 14ee 810144d0f9c8 00038006e608
Call Trace:
[8006ec4e] do_gettimeofday+0x40/0x90 [80028c5a] sync_page+0x0/0x43 [800637ca] io_schedule+0x3f/0x67 [80028c98] sync_page+0x3e/0x43 [8006390e] __wait_on_bit_lock+0x36/0x66 [8003ff27] __lock_page+0x5e/0x64 [800a2921] wake_bit_function+0x0/0x23 [8003fd85] pagevec_lookup+0x17/0x1e [800cc666] invalidate_inode_pages2_range+0x73/0x1bd [8004fc94] finish_wait+0x32/0x5d [884b9798] :fuse:wait_answer_interruptible+0xb6/0xbd [800a28f3] autoremove_wake_function+0x0/0x2e [8009a485] recalc_sigpending+0xe/0x25 [8001decc] sigprocmask+0xb7/0xdb [884bd456] :fuse:fuse_finish_open+0x36/0x62 [884bda11] :fuse:fuse_open_common+0x147/0x158 [884bda22] :fuse:fuse_open+0x0/0x7 [8001eb99] __dentry_open+0xd9/0x1dc [8002766e] do_filp_open+0x2a/0x38 [8001a061] do_sys_open+0x44/0xbe [8005d28d] tracesys+0xd5/0xe0
INFO: task httpd:1837 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 810001004420 0 1837 2394 1856 1289 (NOTLB) 81013c6f9c38 0086 81013c6f9bf8 fffe 810170ce7000 000a 81019c0ae7a0 80311b60 308c0f83d792 0ec4 81019c0ae988 8006e608 Call Trace: [8006ec4e] do_gettimeofday+0x40/0x90 [80028c5a] sync_page+0x0/0x43 [800637ca] io_schedule+0x3f/0x67 [80028c98] sync_page+0x3e/0x43 [8006390e] __wait_on_bit_lock+0x36/0x66 [8003ff27] __lock_page+0x5e/0x64 [800a2921] wake_bit_function+0x0/0x23 [8003fd85] pagevec_lookup+0x17/0x1e [800cc666] invalidate_inode_pages2_range+0x73/0x1bd [8004fc94] finish_wait+0x32/0x5d [884b9798] :fuse:wait_answer_interruptible+0xb6/0xbd [800a28f3] autoremove_wake_function+0x0/0x2e [8009a485] recalc_sigpending+0xe/0x25 [8001decc] sigprocmask+0xb7/0xdb [884bd456] :fuse:fuse_finish_open+0x36/0x62 [884bda11] :fuse:fuse_open_common+0x147/0x158 [884bda22] :fuse:fuse_open+0x0/0x7 [8001eb99] __dentry_open+0xd9/0x1dc [8002766e] do_filp_open+0x2a/0x38 [8001a061] do_sys_open+0x44/0xbe [8005d28d]
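For reference, the workaround discussed in this thread maps onto gluster CLI commands roughly as follows. This is a sketch only: the volume name "webdata" is a placeholder, and the exact option names should be verified against "gluster volume set help" on your release.

    # disable the io-cache and stat-prefetch translators and raise the io thread count
    gluster volume set webdata performance.io-cache off
    gluster volume set webdata performance.stat-prefetch off
    gluster volume set webdata performance.io-thread-count 64

The client mount then needs to pick up the new volfile; in this thread the poster simply rebooted the server to clear the already-hung apache processes.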
Re: [Gluster-users] Apache hung tasks still occur with glusterfs 3.2.1
Can you get us the process state dump of the glusterfs client where httpd is hung? Running "kill -USR1 <pid of the glusterfs client>" will generate /tmp/glusterdump.<pid>, which is the dump file.

Avati

On Mon, Jun 13, 2011 at 2:18 PM, Jiri Lunacek jiri.luna...@hosting90.cz wrote:

Hi all.

We have been having problems with hung tasks of apache reading from a glusterfs 2-replica volume ever since upgrading to 3.2.0. The problems were identical to those described here: http://gluster.org/pipermail/gluster-users/2011-May/007697.html

Yesterday we updated to 3.2.1. A good thing is that the hung tasks stopped appearing while gluster is left intact, i.e. when there are no modifications to the gluster configs at all. Today we modified some other volume exported by the same cluster (but not sharing anything with the volume used by the apache process), and, once again, two requests of apache reading from the glusterfs volume are stuck.

Any help with this issue would be very much appreciated, as right now we have to reboot the machine nightly because the processes are stuck in iowait, unkillable. I really do not want to go through the downgrade to 3.1.4, since it seems from the mailing list that it may not go exactly smoothly. We are exporting millions of files and any large operation on the exported filesystem takes days.

I am attaching tech info on the problem.

client: CentOS 5.6, 2.6.18-238.9.1.el5, fuse-2.7.4-8.el5, glusterfs-fuse-3.2.1-1, glusterfs-core-3.2.1-1
servers: CentOS 5.6, 2.6.18-194.32.1.el5, fuse-2.7.4-8.el5, glusterfs-fuse-3.2.1-1, glusterfs-core-3.2.1-1

dmesg:

INFO: task httpd:1246 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 81000101d7a0 0 1246 2394 1247 1191 (NOTLB)
81013ee7dc38 0082 0092 81013ee7dcd8 81013ee7dd04 000a 810144d0f7e0 81019fc28100 308f8b444727 14ee 810144d0f9c8 00038006e608
Call Trace:
[8006ec4e] do_gettimeofday+0x40/0x90 [80028c5a] sync_page+0x0/0x43 [800637ca] io_schedule+0x3f/0x67 [80028c98] sync_page+0x3e/0x43 [8006390e] __wait_on_bit_lock+0x36/0x66 [8003ff27] __lock_page+0x5e/0x64 [800a2921] wake_bit_function+0x0/0x23 [8003fd85] pagevec_lookup+0x17/0x1e [800cc666] invalidate_inode_pages2_range+0x73/0x1bd [8004fc94] finish_wait+0x32/0x5d [884b9798] :fuse:wait_answer_interruptible+0xb6/0xbd [800a28f3] autoremove_wake_function+0x0/0x2e [8009a485] recalc_sigpending+0xe/0x25 [8001decc] sigprocmask+0xb7/0xdb [884bd456] :fuse:fuse_finish_open+0x36/0x62 [884bda11] :fuse:fuse_open_common+0x147/0x158 [884bda22] :fuse:fuse_open+0x0/0x7 [8001eb99] __dentry_open+0xd9/0x1dc [8002766e] do_filp_open+0x2a/0x38 [8001a061] do_sys_open+0x44/0xbe [8005d28d] tracesys+0xd5/0xe0
INFO: task httpd:1837 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 810001004420 0 1837 2394 1856 1289 (NOTLB) 81013c6f9c38 0086 81013c6f9bf8 fffe 810170ce7000 000a 81019c0ae7a0 80311b60 308c0f83d792 0ec4 81019c0ae988 8006e608 Call Trace: [8006ec4e] do_gettimeofday+0x40/0x90 [80028c5a] sync_page+0x0/0x43 [800637ca] io_schedule+0x3f/0x67 [80028c98] sync_page+0x3e/0x43 [8006390e] __wait_on_bit_lock+0x36/0x66 [8003ff27] __lock_page+0x5e/0x64 [800a2921] wake_bit_function+0x0/0x23 [8003fd85] pagevec_lookup+0x17/0x1e [800cc666] invalidate_inode_pages2_range+0x73/0x1bd [8004fc94] finish_wait+0x32/0x5d [884b9798] :fuse:wait_answer_interruptible+0xb6/0xbd [800a28f3] autoremove_wake_function+0x0/0x2e [8009a485] recalc_sigpending+0xe/0x25 [8001decc] sigprocmask+0xb7/0xdb [884bd456] :fuse:fuse_finish_open+0x36/0x62 [884bda11] :fuse:fuse_open_common+0x147/0x158 [884bda22] :fuse:fuse_open+0x0/0x7 [8001eb99] __dentry_open+0xd9/0x1dc [8002766e] do_filp_open+0x2a/0x38 [8001a061] do_sys_open+0x44/0xbe [8005d28d] tracesys+0xd5/0xe0 INFO: task httpd:383 blocked for more than 120 seconds. echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. httpd D 81019fa21100 0 383 2394 534 (NOTLB) 81013e497c08 0082 810183eb8910 884b9219 81019e41c600 0009 81019b1e2100 81019fa21100 308c0e2c2bfb 00016477 81019b1e22e8 00038006e608 Call Trace: [884b9219] :fuse:flush_bg_queue+0x2b/0x48 [8006ec4e] do_gettimeofday+0x40/0x90
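A concrete way to capture the dump Avati asks for, sketched under the assumption that there is one glusterfs client process per mount (the pgrep step is just one way to find its PID; the dump path is as described above):

    # on the client, find the glusterfs process serving the affected mount point
    pgrep -fl glusterfs
    # send it SIGUSR1 to trigger a state dump
    kill -USR1 <PID>
    # the dump appears as /tmp/glusterdump.<PID>
    ls -l /tmp/glusterdump.*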
Re: [Gluster-users] Crossover cable: single point of failure?
Hi Whit,

Thanks for your reply.

I do know that it's not the Gluster-standard thing to use a crossover link. (Seems to me it's the obvious best way to do it, but it's not a configuration they're committed to.) It's possible that if you were doing your replication over the LAN rather than the crossover that Gluster would handle a disconnected system better. Might be worth testing.

It is still the same, even if no crossover cable is used and all traffic goes through an ethernet switch. The client can't write to the gluster volume anymore. I discovered that the NFS volume seems to be read-only in this state:

client01:~# rm debian-6.0.1a-i386-DVD-1.iso
rm: cannot remove `debian-6.0.1a-i386-DVD-1.iso': Read-only file system

So all traffic goes through one interface (NFS to the client, glusterfs replication, corosync). I can reproduce the issue with the NFS client on VMware ESXi and with the NFS client on my Linux desktop.

My config:

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware

Regards,
Daniel
Re: [Gluster-users] Crossover cable: single point of failure?
Daniel,

Can you confirm whether your backend filesystem is healthy? Can you delete the file from the backend? Gluster does not return EROFS in any of the cases you described. Also, try setting a lower ping-timeout and see if it helps in the crossover-cable failover test.

Avati

On Tue, Jun 14, 2011 at 12:58 PM, Daniel Manser dan...@clienta.ch wrote:

Hi Whit,

Thanks for your reply.

I do know that it's not the Gluster-standard thing to use a crossover link. (Seems to me it's the obvious best way to do it, but it's not a configuration they're committed to.) It's possible that if you were doing your replication over the LAN rather than the crossover that Gluster would handle a disconnected system better. Might be worth testing.

It is still the same, even if no crossover cable is used and all traffic goes through an ethernet switch. The client can't write to the gluster volume anymore. I discovered that the NFS volume seems to be read-only in this state:

client01:~# rm debian-6.0.1a-i386-DVD-1.iso
rm: cannot remove `debian-6.0.1a-i386-DVD-1.iso': Read-only file system

So all traffic goes through one interface (NFS to the client, glusterfs replication, corosync). I can reproduce the issue with the NFS client on VMware ESXi and with the NFS client on my Linux desktop.

My config:

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware

Regards,
Daniel
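The ping-timeout suggestion corresponds to a single volume option. A sketch, using the volume shown later in this thread (the default is 42 seconds, so a small test value makes a dead peer visible much sooner, at the cost of more sensitivity to transient network hiccups):

    # lower the time a client waits before declaring a brick server unreachable
    gluster volume set vmware network.ping-timeout 5
    # confirm it shows up under "Options Reconfigured"
    gluster volume info vmware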
Re: [Gluster-users] Crossover cable: single point of failure?
Hi,

Thanks for your reply.

> Can you confirm whether your backend filesystem is healthy? Can you delete the file from the backend?

I was able to delete files on the server.

> Also, try setting a lower ping-timeout and see if it helps in the crossover-cable failover test.

I set it to 5 seconds, but the result is still the same.

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware
Options Reconfigured:
network.ping-timeout: 5

Daniel
Re: [Gluster-users] Strange errors reading/writing/editing/deleting JPGs, PDFs and PNG from PHP Application
Here is the log. Nothing really stands out. There is one entry from today and the previous log entry was from 6/3.

[alan@app1:10.71.57.82:glusterfs]$ sudo cat /var/log/glusterfs/drives-d1.log
[2011-06-03 18:17:05.160722] W [io-stats.c:1644:init] d1: dangling volume. check volfile
[2011-06-03 18:17:05.160865] W [dict.c:1205:data_to_str] dict: @data=(nil)
[2011-06-03 18:17:05.160897] W [dict.c:1205:data_to_str] dict: @data=(nil)
Given volfile:
+--+
  1: volume d1-client-0
  2:     type protocol/client
  3:     option remote-host 10.198.6.214
  4:     option remote-subvolume /data/d1
  5:     option transport-type tcp
  6: end-volume
  7:
  8: volume d1-client-1
  9:     type protocol/client
 10:     option remote-host 10.195.15.38
 11:     option remote-subvolume /data/d1
 12:     option transport-type tcp
 13: end-volume
 14:
 15: volume d1-replicate-0
 16:     type cluster/replicate
 17:     subvolumes d1-client-0 d1-client-1
 18: end-volume
 19:
 20: volume d1-write-behind
 21:     type performance/write-behind
 22:     option cache-size 4MB
 23:     subvolumes d1-replicate-0
 24: end-volume
 25:
 26: volume d1-read-ahead
 27:     type performance/read-ahead
 28:     subvolumes d1-write-behind
 29: end-volume
 30:
 31: volume d1-io-cache
 32:     type performance/io-cache
 33:     option cache-size 1024MB
 34:     subvolumes d1-read-ahead
 35: end-volume
 36:
 37: volume d1-quick-read
 38:     type performance/quick-read
 39:     option cache-size 1024MB
 40:     subvolumes d1-io-cache
 41: end-volume
 42:
 43: volume d1-stat-prefetch
 44:     type performance/stat-prefetch
 45:     subvolumes d1-quick-read
 46: end-volume
 47:
 48: volume d1
 49:     type debug/io-stats
 50:     subvolumes d1-stat-prefetch
 51: end-volume
+--+
[2011-06-03 18:17:08.676157] I [client-handshake.c:1005:select_server_supported_programs] d1-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-06-03 18:17:08.684299] I [client-handshake.c:1005:select_server_supported_programs] d1-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-06-03 18:17:08.718624] I [client-handshake.c:841:client_setvolume_cbk] d1-client-1: Connected to 10.195.15.38:24009, attached to remote volume '/data/d1'.
[2011-06-03 18:17:08.718687] I [afr-common.c:2572:afr_notify] d1-replicate-0: Subvolume 'd1-client-1' came back up; going online.
[2011-06-03 18:17:08.732772] I [fuse-bridge.c:2821:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.14
[2011-06-03 18:17:08.735602] I [afr-common.c:819:afr_fresh_lookup_cbk] d1-replicate-0: added root inode
[2011-06-03 18:17:08.748443] I [client-handshake.c:841:client_setvolume_cbk] d1-client-0: Connected to 10.198.6.214:24009, attached to remote volume '/data/d1'.
[2011-06-10 06:33:08.255922] W [fuse-bridge.c:2510:fuse_getxattr] glusterfs-fuse: 3480740: GETXATTR (null)/3039291028 (security.capability) (fuse_loc_fill() failed)
[alan@app1:10.71.57.82:glusterfs]$

Forgive me, I'm relatively new to GlusterFS. I'm not sure what level of logging I have set up. How can I tell the level of logging I have configured? Perhaps I could increase it and hopefully capture more detailed information.

Thanks again for the help!

- Alan

Just in case this helps, here are the volume configuration files from the server:

[alan@file1:10.198.6.214:d1]$ sudo cat d1-fuse.vol
volume d1-client-0
    type protocol/client
    option remote-host 10.198.6.214
    option remote-subvolume /data/d1
    option transport-type tcp
end-volume

volume d1-client-1
    type protocol/client
    option remote-host 10.195.15.38
    option remote-subvolume /data/d1
    option transport-type tcp
end-volume

volume d1-replicate-0
    type cluster/replicate
    subvolumes d1-client-0 d1-client-1
end-volume

volume d1-write-behind
    type performance/write-behind
    option cache-size 4MB
    subvolumes d1-replicate-0
end-volume

volume d1-read-ahead
    type performance/read-ahead
    subvolumes d1-write-behind
end-volume

volume d1-io-cache
    type performance/io-cache
    option cache-size 1024MB
    subvolumes d1-read-ahead
end-volume

volume d1-quick-read
    type performance/quick-read
    option cache-size 1024MB
    subvolumes d1-io-cache
end-volume

volume d1-stat-prefetch
    type performance/stat-prefetch
    subvolumes d1-quick-read
end-volume

volume d1
    type debug/io-stats
    subvolumes d1-stat-prefetch
end-volume
[alan@file1:10.198.6.214:d1]$

[alan@file1:10.198.6.214:d1]$ sudo cat d1.10.195.15.38.data-d1.vol
volume d1-posix
    type storage/posix
    option directory /data/d1
end-volume

volume d1-access-control
    type features/access-control
    subvolumes d1-posix
end-volume

volume d1-locks
    type features/locks
    subvolumes d1-access-control
end-volume

volume d1-io-threads
    type
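On the logging question: a sketch of how the client log level could be inspected and raised. The DEBUG value is standard, but whether the diagnostics.client-log-level option and the log-level mount option are available depends on the exact 3.1/3.2 release, so treat both as assumptions to verify; the mount point /drives/d1 is also just a guess based on the log file name.

    # see whether the running client was started with an explicit --log-level (default is NORMAL/INFO)
    ps ax | grep [g]lusterfs

    # raise the client log level through the CLI, if the option exists on this release
    gluster volume set d1 diagnostics.client-log-level DEBUG

    # or remount the client with an explicit log level
    mount -t glusterfs -o log-level=DEBUG 10.198.6.214:/d1 /drives/d1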
Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question
Hello,

To make things clear, what I've done is:

- deploy GlusterFS on 2, 4, 8, 16, 32, 64 and 128 nodes
- run a variant of the MAB benchmark (it's all about compilation of openssl-1.0.0) on 2, 4, 8, 16, 32, 64 and 128 nodes
- use 'pdsh -f 512' to start MAB on all nodes at the same time
- in each experiment, run MAB on each node in a dedicated directory within the glusterfs global namespace (e.g. nodeA used <gluster global namespace>/nodeA/mab files) to avoid a metadata storm on the parent directory inode
- between experiments, destroy and redeploy a completely new GlusterFS setup (and also destroy everything within each brick, i.e. the exported storage dir)

I then compare the average compilation time vs the number of nodes ... and it increases due to the round-robin scheduler that dispatches files on all the bricks:

  2 nodes: Phase_V (s) avg  249.9332121175
  4 nodes: Phase_V (s) avg  262.808117374
  8 nodes: Phase_V (s) avg  293.572061537875
 16 nodes: Phase_V (s) avg  351.436554833375
 32 nodes: Phase_V (s) avg  546.503069517844
 64 nodes: Phase_V (s) avg 1010.61019479478

(Phase V is the compilation itself; the previous phases are about metadata ops.) You can also try to compile a Linux kernel on your own; it is pretty much the same thing.

Now regarding the GlusterFS setup: yes, you're right, there is no replication, so this is a simple striping (on a per-file basis) setup. Each time, I create a glusterfs volume featuring one brick, then I add bricks (one by one) until I reach the number of nodes ... and after that, I start the volume.

Now regarding the 128-brick case: it is when I start the volume that I get a random error telling me that brickX does not respond, and this changes every time I retry to start the volume. So far, I haven't tested with a number of nodes between 64 and 128.

François

On Friday, June 10, 2011 16:38 CEST, Pavan T C t...@gluster.com wrote:

On Wednesday 08 June 2011 06:10 PM, Francois THIEBOLT wrote:

> Hello,
> I'm driving some experiments on Grid'5000 with GlusterFS 3.2 and, as a first point, I've been unable to start a volume featuring 128 bricks (64 is OK). Then, due to the round-robin scheduler, as the number of nodes increases (every node is also a brick), the performance of an application on an individual node decreases!

I would like to understand what you mean by increase of nodes. You have 64 bricks and each brick also acts as a client. So, where is the increase in the number of nodes? Are you referring to the mounts that you are doing?

What is your gluster configuration - I mean, is it a distribute-only, or is it a distributed-replicate setup? [From your command sequence, it should be a pure distribute, but I just want to be sure.]

What is your application like? Is it mostly I/O intensive? It will help if you provide a brief description of typical operations done by your application.

How are you measuring the performance? What parameter determines that you are experiencing a decrease in performance with increase in the number of nodes?

Pavan

> So my question is: how do I STOP the round-robin distribution of files over the bricks within a volume?
>
> *** Setup ***
> - I'm using glusterfs 3.2 built from source
> - every node is both a client node and a brick (storage)
>
> Commands:
> - gluster peer probe each of the 128 nodes
> - gluster volume create myVolume transport tcp 128 bricks:/storage
> - gluster volume start myVolume (fails with 128 bricks!)
> - mount -t glusterfs .. on all nodes
>
> Feel free to tell me how to improve things
>
> François
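For what it's worth, a minimal sketch of the benchmark orchestration described above, assuming a shared mount point of /glusterfs on every node and a driver script named run-mab.sh (both names are placeholders):

    # one MAB run per node, each in its own directory under the shared namespace,
    # so metadata operations do not all hammer a single parent directory inode
    pdsh -f 512 -w node[1-128] 'mkdir -p /glusterfs/$(hostname) && cd /glusterfs/$(hostname) && ~/run-mab.sh'

Note that on a pure distribute (DHT) volume the files created under /glusterfs/<hostname> are still placed across all bricks by filename hash, which is exactly the spreading behaviour the poster is asking how to switch off.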
Re: [Gluster-users] Files present on the backend but have become invisible from clients
Hi Pranith.

Yes, I do see those messages in my mount logs on the client:

root@jc1lnxsamm100:~# fgrep afr-self-heal /var/log/glusterfs/pfs2.log | tail
[2011-06-14 07:30:56.152066] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:35:16.869848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:39:48.500117] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:40:19.312364] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:44:27.714292] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:50:04.691154] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:54:17.853591] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:55:26.876415] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:59:51.702585] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 08:00:08.346056] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
Sent: Tuesday, June 14, 2011 1:28 AM
To: Burnash, James; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

hi James,
    Bricks 3-10 don't have problems. I think bricks 01 and 02 went into a split-brain situation. Could you confirm whether you see the following log entries in your mount's log file:
[afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes.

Pranith.

From: Burnash, James [jburn...@knight.com]
Sent: Monday, June 13, 2011 11:56 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

Hi Pranith.

Here is the revised listing - please notice that bricks g01 and g02 on the two servers (jc1letgfs14 and 15) have what appear to be normal trusted.afr attributes, but the balance of the bricks (3-10) all have =0x. http://pastebin.com/j0hVFTzd

Is this right, or am I looking at this backwards / sideways?

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Burnash, James
Sent: Monday, June 13, 2011 8:28 AM
To: 'Pranith Kumar. Karampuri'; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become invisible from clients

Hi Pranith.

Sorry - last week was a rough one. Disregard that pastebin - I will put up a new one that makes more sense and repost to the list.

James

-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
Sent: Monday, June 13, 2011 1:12 AM
To: Burnash, James; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

hi James,
    I looked at the pastebin sample. I see that all of the attrs are complete zeros. Could you let me know what it is that I am missing.

Pranith

From: gluster-users-boun...@gluster.org [gluster-users-boun...@gluster.org] on behalf of Burnash, James [jburn...@knight.com]
Sent:
Re: [Gluster-users] Apache hung tasks still occur with glusterfs 3.2.1
Hi.

> Hello. Do you have any feedback yet? Was it successful (io-cache disabled, stat-prefetch disabled, io-thread-count increased to 64)?

For now it seems that the workaround has worked. We have not encountered any hung processes on the server since the change (io-cache disabled, stat-prefetch disabled, io-thread-count=64). The only bad effect is the expected one: the pages (mainly lists of several hundred images per page) take a little while longer to load. Of course this is caused by the files not being cached.

> Is/was your problem similar to this one? http://bugs.gluster.com/show_bug.cgi?id=3011

The symptoms were the same. The processes were hung on ioctl; /proc/<pid>/wchan for the PIDs showed sync_page.

I'll experiment a bit more today: I'll set the volume back to the original parameters and wait for a hung process so I can get you the information (/tmp/glusterdump.<pid>). I'll report back later.

Jiri

On 13.06.2011 19:14, Jiri Lunacek wrote:

Thanks for the tip. I disabled io-cache and stat-prefetch, increased io-thread-count to 64 and rebooted the server to clean off the hung apache processes. We'll see tomorrow.

On 13.6.2011, at 15:58, Justice London wrote:

Disable io-cache and up the threads to 64 and your problems should disappear. They did for me when I made both of these changes.

Justice London

From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Jiri Lunacek
Sent: Monday, June 13, 2011 1:49 AM
To: gluster-users@gluster.org
Subject: [Gluster-users] Apache hung tasks still occur with glusterfs 3.2.1

Hi all.

We have been having problems with hung tasks of apache reading from a glusterfs 2-replica volume ever since upgrading to 3.2.0. The problems were identical to those described here: http://gluster.org/pipermail/gluster-users/2011-May/007697.html

Yesterday we updated to 3.2.1. A good thing is that the hung tasks stopped appearing while gluster is left intact, i.e. when there are no modifications to the gluster configs at all. Today we modified some other volume exported by the same cluster (but not sharing anything with the volume used by the apache process), and, once again, two requests of apache reading from the glusterfs volume are stuck.

Any help with this issue would be very much appreciated, as right now we have to reboot the machine nightly because the processes are stuck in iowait, unkillable. I really do not want to go through the downgrade to 3.1.4, since it seems from the mailing list that it may not go exactly smoothly. We are exporting millions of files and any large operation on the exported filesystem takes days.

I am attaching tech info on the problem.

client: CentOS 5.6, 2.6.18-238.9.1.el5, fuse-2.7.4-8.el5, glusterfs-fuse-3.2.1-1, glusterfs-core-3.2.1-1
servers: CentOS 5.6, 2.6.18-194.32.1.el5, fuse-2.7.4-8.el5, glusterfs-fuse-3.2.1-1, glusterfs-core-3.2.1-1

dmesg:

INFO: task httpd:1246 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 81000101d7a0 0 1246 2394 1247 1191 (NOTLB) 81013ee7dc38 0082 0092 81013ee7dcd8 81013ee7dd04 000a 810144d0f7e0 81019fc28100 308f8b444727 14ee 810144d0f9c8 00038006e608 Call Trace: [8006ec4e] do_gettimeofday+0x40/0x90 [80028c5a] sync_page+0x0/0x43 [800637ca] io_schedule+0x3f/0x67 [80028c98] sync_page+0x3e/0x43 [8006390e] __wait_on_bit_lock+0x36/0x66 [8003ff27] __lock_page+0x5e/0x64 [800a2921] wake_bit_function+0x0/0x23 [8003fd85] pagevec_lookup+0x17/0x1e [800cc666] invalidate_inode_pages2_range+0x73/0x1bd [8004fc94] finish_wait+0x32/0x5d [884b9798] :fuse:wait_answer_interruptible+0xb6/0xbd [800a28f3] autoremove_wake_function+0x0/0x2e [8009a485] recalc_sigpending+0xe/0x25 [8001decc] sigprocmask+0xb7/0xdb [884bd456] :fuse:fuse_finish_open+0x36/0x62 [884bda11] :fuse:fuse_open_common+0x147/0x158 [884bda22] :fuse:fuse_open+0x0/0x7 [8001eb99] __dentry_open+0xd9/0x1dc [8002766e] do_filp_open+0x2a/0x38 [8001a061] do_sys_open+0x44/0xbe [8005d28d] tracesys+0xd5/0xe0 INFO: task httpd:1837 blocked for more than 120 seconds. echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. httpd D 810001004420 0 1837 2394 1856 1289 (NOTLB) 81013c6f9c38 0086 81013c6f9bf8 fffe 810170ce7000 000a 81019c0ae7a0 80311b60 308c0f83d792 0ec4 81019c0ae988 8006e608 Call Trace: [8006ec4e] do_gettimeofday+0x40/0x90 [80028c5a] sync_page+0x0/0x43 [800637ca] io_schedule+0x3f/0x67 [80028c98] sync_page+0x3e/0x43
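A quick sketch of the check Jiri describes, for confirming which apache workers are wedged in the kernel (httpd is the process name from this thread; the rest is standard procfs, so no gluster-specific tooling is assumed):

    # list httpd processes in uninterruptible sleep (D state) and the kernel function they are blocked in
    for pid in $(pgrep httpd); do
        state=$(awk '{print $3}' /proc/$pid/stat 2>/dev/null)
        [ "$state" = "D" ] && echo "$pid $(cat /proc/$pid/wchan)"
    done

Processes stuck this way (showing sync_page, as in the traces above) cannot be killed; only clearing the underlying FUSE request, or a reboot, releases them.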
[Gluster-users] read-ahead performance translator tweaking with 3.2.1?
Hi,

Is there a way to tweak the read-ahead settings via the gluster command line? For example:

gluster volume set somevolumename performance.read-ahead 2

Or is this no longer feasible? With read-ahead set to the default of 8, as was the case with the standard volgen-generated configs, the amount of useless reads hitting the bricks is way too high, and on 1 GbE interconnects it causes saturation and performance degradation in no time.

Thanks.

Mohan
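Two possible approaches, both of which should be treated as assumptions to check against 3.2.1 ("gluster volume set help" lists what the CLI actually accepts): the CLI can at least toggle the translator, while the page count itself may only be reachable by editing the volfile.

    # disable the read-ahead translator for the volume entirely
    gluster volume set somevolumename performance.read-ahead off

    # alternatively, lower the page count in the client volfile's read-ahead section, e.g.:
    # volume somevolumename-read-ahead
    #     type performance/read-ahead
    #     option page-count 2
    #     subvolumes somevolumename-write-behind
    # end-volume

Keep in mind that volfiles regenerated by the gluster CLI (for example after any later "gluster volume set") will overwrite manual edits.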
Re: [Gluster-users] Crossover cable: single point of failure?
On Tue, Jun 14, 2011 at 2:51 AM, Daniel Manser dan...@clienta.ch wrote:

> Hi,
> Thanks for your reply.
>
>> Can you confirm whether your backend filesystem is healthy? Can you delete the file from the backend?
>
> I was able to delete files on the server.
>
>> Also, try setting a lower ping-timeout and see if it helps in the crossover-cable failover test.
>
> I set it to 5 seconds, but the result is still the same.

It would be good to get to the bottom of this. Do you see any errors in the server logs? Is it possible to do the same test with no VMware in between, just using bare metal?

> Volume Name: vmware
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster1:/mnt/gvolumes/vmware
> Brick2: gluster2:/mnt/gvolumes/vmware
> Options Reconfigured:
> network.ping-timeout: 5
>
> Daniel
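On the question of server-side errors: a sketch of where to look, assuming a default 3.2 install (the brick log file name is derived from the brick path, so /mnt/gvolumes/vmware would become mnt-gvolumes-vmware.log; adjust to whatever is actually present):

    # on gluster1 and gluster2
    ls /var/log/glusterfs/bricks/
    grep " E " /var/log/glusterfs/bricks/mnt-gvolumes-vmware.log
    # the built-in NFS server has its own log, relevant here since the client mounts over NFS
    grep " E " /var/log/glusterfs/nfs.log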
[Gluster-users] Variable sized bricks replication
Hello,

We've been using glusterfs 3.0 for a while now, and it appears to be quite stable and very useful. The next thing we need to do, however, is to migrate to glusterfs 3.2 to allow for brick additions on the fly without client restarts.

Now, since we are about to completely re-do the whole thing, we should really do distributed replicated volumes, and here I was wondering: can I use different brick sizes for that? For economical reasons, I need to use the hardware on hand, and there is a lot, but the disks are anything from 500GB to 2TB.

Now, how does glusterfs handle the replication here? Will gluster just use another node if one is full? Any experience with that?

cheers,
Philip
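One practical detail when planning this: with a "replica 2" volume, replica pairs are formed from consecutive bricks in the order they are given at create time, so pairing a 500GB disk with a 2TB disk wastes the larger one. A sketch (host names and brick paths are made up):

    # (srv1,srv2) form one 500GB replica pair, (srv3,srv4) a 2TB pair;
    # DHT then distributes files across the two pairs
    gluster volume create bigvol replica 2 transport tcp \
        srv1:/bricks/disk500 srv2:/bricks/disk500 \
        srv3:/bricks/disk2t  srv4:/bricks/disk2t

By default gluster does not rebalance existing data when a brick set fills up, though the cluster.min-free-disk option can steer new files away from bricks above a usage threshold; whether that is enough depends on your file sizes, so it is worth testing before committing the hardware.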
Re: [Gluster-users] Files present on the backend but have become invisible from clients
hi James,
    Could you please check whether any of the file permissions of the files in the directory are mismatched. I also need the output of getfattr -d -m . <filename> for all the files in the following bricks, in that order:

jc1letgfs14:export/read-only/g01
jc1letgfs15:export/read-only/g01
jc1letgfs14:export/read-only/g02
jc1letgfs15:export/read-only/g02

Please also give the ls command output on the mount point so that we can check what files are missing.

Thanks,
Pranith

From: Burnash, James [jburn...@knight.com]
Sent: Tuesday, June 14, 2011 5:37 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

Hi Pranith.

Yes, I do see those messages in my mount logs on the client:

root@jc1lnxsamm100:~# fgrep afr-self-heal /var/log/glusterfs/pfs2.log | tail
[2011-06-14 07:30:56.152066] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:35:16.869848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:39:48.500117] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:40:19.312364] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:44:27.714292] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:50:04.691154] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:54:17.853591] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:55:26.876415] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 07:59:51.702585] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-14 08:00:08.346056] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
Sent: Tuesday, June 14, 2011 1:28 AM
To: Burnash, James; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

hi James,
    Bricks 3-10 don't have problems. I think bricks 01 and 02 went into a split-brain situation. Could you confirm whether you see the following log entries in your mount's log file:
[afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-stress-volume-replicate-0: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes.

Pranith.

From: Burnash, James [jburn...@knight.com]
Sent: Monday, June 13, 2011 11:56 PM
To: Pranith Kumar. Karampuri; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

Hi Pranith.

Here is the revised listing - please notice that bricks g01 and g02 on the two servers (jc1letgfs14 and 15) have what appear to be normal trusted.afr attributes, but the balance of the bricks (3-10) all have =0x. http://pastebin.com/j0hVFTzd

Is this right, or am I looking at this backwards / sideways?

James Burnash
Unix Engineer
Knight Capital Group

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Burnash, James
Sent: Monday, June 13, 2011 8:28 AM
To: 'Pranith Kumar. Karampuri'; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become invisible from clients

Hi Pranith.

Sorry - last week was a rough one. Disregard that pastebin - I will put up a new one that makes more sense and repost to the list.

James

-----Original Message-----
From: Pranith Kumar. Karampuri [mailto:prani...@gluster.com]
Sent: Monday, June 13, 2011 1:12 AM
To: Burnash, James; Jeff Darcy (jda...@redhat.com); gluster-users@gluster.org
Subject: RE: [Gluster-users] Files present on the backend but have become invisible from clients

hi James,
    I looked at the pastebin sample. I see that all of the attrs are complete zeros. Could you let me know what it is that I am missing.

Pranith

From: gluster-users-boun...@gluster.org [gluster-users-boun...@gluster.org] on behalf of Burnash, James [jburn...@knight.com]
Sent:
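A sketch of how the requested attributes could be collected on the two servers (getfattr is the standard tool for this; the -e hex output format and the leading slash on the brick paths are assumptions, and <file> stands for each file or directory being compared):

    # run on jc1letgfs14 and jc1letgfs15, for bricks g01 and g02
    getfattr -d -m . -e hex /export/read-only/g01
    getfattr -d -m . -e hex /export/read-only/g01/<file>
    getfattr -d -m . -e hex /export/read-only/g02
    getfattr -d -m . -e hex /export/read-only/g02/<file>
    # compare the trusted.afr.* values for the same path between the two servers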