Re: [Gluster-users] Gluster 2.6 and infiniband
Hello,

after downgrading the kernel to 2.6.28 (glusterd does not work on 3.2.12, see my previous email) I am not able to run rdma at all. Mounting without rdma (the volume is tcp,rdma) works OK, but the speed maxes out at 150 MB/s. Trying to mount .rdma fails, and the log contains this:

[2012-06-08 03:50:32.442263] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.6
[2012-06-08 03:50:32.451931] W [write-behind.c:3023:init] 0-atlas1-write-behind: disabling write-behind for first 0 bytes
[2012-06-08 03:50:32.455502] E [rdma.c:3969:rdma_init] 0-rpc-transport/rdma: Failed to get infiniband device context
[2012-06-08 03:50:32.455528] E [rdma.c:4813:init] 0-atlas1-client-0: Failed to initialize IB Device
[2012-06-08 03:50:32.455541] E [rpc-transport.c:742:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2012-06-08 03:50:32.44] W [rpc-clnt.c:926:rpc_clnt_connection_init] 0-atlas1-client-0: loading of new rpc-transport failed
[2012-06-08 03:50:32.456355] E [client.c:2095:client_init_rpc] 0-atlas1-client-0: failed to initialize RPC
[2012-06-08 03:50:32.456378] E [xlator.c:1447:xlator_init] 0-atlas1-client-0: Initialization of volume 'atlas1-client-0' failed, review your volfile again
[2012-06-08 03:50:32.456391] E [graph.c:348:glusterfs_graph_init] 0-atlas1-client-0: initializing translator failed
[2012-06-08 03:50:32.456403] E [graph.c:526:glusterfs_graph_activate] 0-graph: init failed
[2012-06-08 03:50:32.456680] W [glusterfsd.c:727:cleanup_and_exit] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f98ecea7175] (--/usr/sbin/glusterfs(mgmt_getspec_cbk+0xc7) [0x4089d7] (--/usr/sbin/glusterfs(glusterfs_process_volfp+0x1a0) [0x406410]))) 0-: received signum (0), shutting down
[2012-06-08 03:50:32.456720] I [fuse-bridge.c:3727:fini] 0-fuse: Unmounting 'mount'.

The Infiniband configuration is the same under the new and the old kernel.

thanks
Matus

2012/6/7 Sabuj Pattanayek sab...@gmail.com:

To make a long story short, I made rdma client connect files and mounted with them directly:

#/etc/glusterd/vols/pirdist/pirdist.rdma-fuse.vol /pirdist glusterfs transport=rdma 0 0
#/etc/glusterd/vols/pirstripe/pirstripe.rdma-fuse.vol /pirstripe glusterfs transport=rdma 0 0

The transport=rdma does nothing here, since the parameters are read from the .vol files. However, you'll see that they're now commented out, because RDMA has been very unstable for us: servers lose their connections to each other, which somehow causes the GbE clients to lose their connections too. IP over IB, on the other hand, is working great, at the expense of some performance versus RDMA, but still much better than GbE.

On Thu, Jun 7, 2012 at 4:25 AM, bxma...@gmail.com wrote:

Hello,

at first it was tcp, then tcp,rdma. You are right that without the tcp definition .rdma does not work. But now I have another problem: whether I try tcp or rdma, even tcp/rdma over a normal 1 Gbit network card (not using IP over Infiniband), I always get the same speed, uploads around 30 MB/s and downloads around 200 MB/s. So I am not sure rdma is working at all. Native Infiniband gives me 3500 MB/s in benchmark tests (ib_rdma_bw).

thanks
Matus

2012/6/7 Amar Tumballi ama...@redhat.com:

On 06/07/2012 02:04 PM, bxma...@gmail.com wrote:

Hello,

I have a problem with gluster 3.2.6 and infiniband. With gluster 3.3 it works OK, but with 3.2.6 I have the following problem: when I try to mount an rdma volume with the command "mount -t glusterfs 192.168.100.1:/atlas1.rdma mount" I get:

[2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.2.6
[2012-06-07 04:30:18.907499] E [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2012-06-07 04:30:18.907592] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/atlas1.rdma)
[2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit] (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9) [0x7f784e2c8bc9] (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975] (--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b]))) 0-: received signum (0), shutting down
[2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse: Unmounting 'mount'.

The same command without .rdma works OK.

Is the volume's transport type only 'rdma', or 'tcp,rdma'? If it is only 'rdma', then appending .rdma to the volume name is not required. Appending .rdma is only needed when a volume has both transport types (i.e. 'tcp,rdma'), because from the client you can decide which transport you want to mount: the plain volume name points to the 'tcp' transport, and appending .rdma points to the rdma transport.

Hope that is clear now.

Regards,
Amar
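For anyone landing on the same trace: "Failed to get infiniband device context" generally means userspace RDMA is broken as a whole, not just for Gluster. A minimal sanity check outside Gluster, assuming the usual libibverbs/OFED tools are installed (module names vary by HCA and distro):

    # verify the kernel-side IB stack is loaded
    lsmod | egrep 'ib_uverbs|rdma_cm|ib_ipoib'

    # list HCAs visible to userspace; if nothing is printed here,
    # glusterfs cannot obtain a device context either
    ibv_devinfo

    # the verbs device nodes should exist
    ls -l /dev/infiniband/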
[Gluster-users] Gluster 2.6 and infiniband
Hello,

I have a problem with gluster 3.2.6 and infiniband. With gluster 3.3 it works OK, but with 3.2.6 I have the following problem: when I try to mount an rdma volume with the command "mount -t glusterfs 192.168.100.1:/atlas1.rdma mount" I get:

[2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.2.6
[2012-06-07 04:30:18.907499] E [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2012-06-07 04:30:18.907592] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/atlas1.rdma)
[2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit] (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9) [0x7f784e2c8bc9] (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975] (--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b]))) 0-: received signum (0), shutting down
[2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse: Unmounting 'mount'.

The same command without .rdma works OK.

thanks
Matus
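The "failed to fetch volume file (key:/atlas1.rdma)" above is what the client logs when it asks for a transport the volume does not serve. A quick check of what the volume was actually created with, assuming the volume name from the mount command:

    gluster volume info atlas1
    # look at the "Transport-type:" line: "tcp" vs "tcp,rdma"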
Re: [Gluster-users] Gluster 2.6 and infiniband
Hello,

at first it was tcp, then tcp,rdma. You are right that without the tcp definition .rdma does not work. But now I have another problem: whether I try tcp or rdma, even tcp/rdma over a normal 1 Gbit network card (not using IP over Infiniband), I always get the same speed, uploads around 30 MB/s and downloads around 200 MB/s. So I am not sure rdma is working at all. Native Infiniband gives me 3500 MB/s in benchmark tests (ib_rdma_bw).

thanks
Matus

2012/6/7 Amar Tumballi ama...@redhat.com:

On 06/07/2012 02:04 PM, bxma...@gmail.com wrote:

Hello,

I have a problem with gluster 3.2.6 and infiniband. With gluster 3.3 it works OK, but with 3.2.6 I have the following problem: when I try to mount an rdma volume with the command "mount -t glusterfs 192.168.100.1:/atlas1.rdma mount" I get:

[2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.2.6
[2012-06-07 04:30:18.907499] E [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2012-06-07 04:30:18.907592] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/atlas1.rdma)
[2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit] (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9) [0x7f784e2c8bc9] (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975] (--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b]))) 0-: received signum (0), shutting down
[2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse: Unmounting 'mount'.

The same command without .rdma works OK.

Is the volume's transport type only 'rdma', or 'tcp,rdma'? If it is only 'rdma', then appending .rdma to the volume name is not required. Appending .rdma is only needed when a volume has both transport types (i.e. 'tcp,rdma'), because from the client you can decide which transport you want to mount: the plain volume name points to the 'tcp' transport, and appending .rdma points to the rdma transport.

Hope that is clear now.

Regards,
Amar
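To make Amar's point concrete, for a 'tcp,rdma' volume the transport is chosen purely by the name the client mounts; a sketch using the volume from this thread (mount points are placeholders):

    # plain volume name: fetches the tcp client volfile
    mount -t glusterfs 192.168.100.1:/atlas1 /mnt/atlas1

    # same volume over the rdma transport (only meaningful for 'tcp,rdma' volumes)
    mount -t glusterfs 192.168.100.1:/atlas1.rdma /mnt/atlas1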
[Gluster-users] Gluster 3.3 and xen 4.1.2 under Kernel 3.2 problem
Hello,

I have the following problem: I am trying to run gluster server 3.3.0 with xen 4.1.2 on kernel 3.2.12. I am running glusterd in dom0, not inside a virtual environment. Glusterd is dying with the following messages:

E [glusterd.c:270:glusterd_check_gsync_present] 0-glusterd: geo-replication module not working as desired
D [glusterd.c:298:glusterd_check_gsync_present] 0-glusterd: Returning -1
E [xlator.c:385:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again

With the hypervisor turned off, the same kernel and exactly the same settings work fine.

thanks
Matus
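This particular failure comes from glusterd's startup check for the geo-replication helper (gsyncd): if running it fails, initialization of the 'management' volume aborts. A hedged pair of checks, where the libexec path below is an assumption that depends on the build prefix (/usr/libexec vs /usr/local/libexec):

    # glusterd effectively runs "gsyncd --version" at init; try it by hand
    /usr/libexec/glusterfs/gsyncd --version

    # if geo-replication is not needed, a build without it skips the check
    ./configure --disable-georeplication && make && make install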
[Gluster-users] Gluster 3.2 configurations + translators
Hello,

I'm a little confused about the gluster configuration interface. I started with gluster 3.2 and did all configuration through the gluster CLI. Now, while looking into how to tune performance, I found fragments of text-based configuration files in many places in the documentation, usually with a warning that they are old and should not be used.

Right now I am trying to turn on io-cache, and some documentation says it needs to be enabled on both the server and the client side. On the server I used "gluster volume set atlas performance.io-cache on", but on the client the gluster command dies with a timeout, or with an error that glusterd is not running. So the question is: how do I correctly configure the client side of gluster? There is very little about this in the gluster 3.2 documentation, and I don't know how much of the 3.1 material still applies. And is there any translator documentation for gluster 3.2?

thanks
Matus
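As a side note, whether a CLI-set option actually took effect can be verified on the server; a minimal sketch, using the volume name from the post:

    gluster volume set atlas performance.io-cache on
    gluster volume info atlas
    # options changed from their defaults are listed under "Options Reconfigured:"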
Re: [Gluster-users] Gluster 3.2 configurations + translators
Hmmm, where can I check whether the client is configured to pull its configuration from the server? On the server I have /etc/glusterd and /etc/gluster, the latter of which looks unused. On the client end there is only /etc/gluster, which is also unused (all defaults).

Matus

2011/9/15 greg_sw...@aotx.uscourts.gov:

gluster-users-boun...@gluster.org wrote on 09/15/2011 08:40:52 AM:

I'm a little confused about the gluster configuration interface. I started with gluster 3.2 and did all configuration through the gluster CLI. Now, while looking into how to tune performance, I found fragments of text-based configuration files in many places in the documentation, usually with a warning that they are old and should not be used. Right now I am trying to turn on io-cache, and some documentation says it needs to be enabled on both the server and the client side. On the server I used "gluster volume set atlas performance.io-cache on", but on the client the gluster command dies with a timeout, or with an error that glusterd is not running. So the question is: how do I correctly configure the client side of gluster? There is very little about this in the gluster 3.2 documentation, and I don't know how much of the 3.1 material still applies. And is there any translator documentation for gluster 3.2?

With the newer versions they are really pushing away from having to manually configure bits. As long as your client is configured to pull its configuration file from the server, then when you run the command on the server the client should get an updated config file. You should be able to look in the client's log file and see that the config file was updated (I don't have an example at the moment). Another way to check is whether the number of connections from the client to the server (netstat -pant | grep gluster | wc -l) increases after you make the change (it should increase by the number of bricks in the volume, I believe).

-greg
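A small sketch of the two checks Greg describes, run on the client; the log file name here is an assumption (by default it is derived from the mount point, e.g. /mnt/atlas becomes mnt-atlas.log):

    # watch the client log for the volfile/graph reload after the server-side "volume set"
    tail -f /var/log/glusterfs/mnt-atlas.log

    # count client-to-brick connections before and after the change
    netstat -pant | grep gluster | wc -l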
Re: [Gluster-users] Top Reset
Hmm, I just found out that the top stats are not changing at all. It looks like it gathered data for some time and then stopped changing... does anyone know how this works? The documentation is very thin on this feature.

thanks
Matus

2011/7/20 bxma...@gmail.com:

Hello,

is there any way to reset a volume's TOP statistics?

thanks
Matus
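For readers of the archive: a reset subcommand for these counters is not obviously present in 3.2; later releases document a clear operation for the top framework. A hedged example, with the volume name assumed and the subcommand's availability in your version to be checked first (e.g. in the output of "gluster help"):

    # zeroes the top counters for the volume (newer releases only)
    gluster volume top atlas clear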
[Gluster-users] Top Reset
Hello,

is there any way to reset a volume's TOP statistics?

thanks
Matus
Re: [Gluster-users] Gluster 3.2.0 and ucarp not working
When a client connects to any gluster node, it automatically receives the list of all the other nodes for that volume.

Matus

On 8.6.2011 8:13, Joshua Baker-LePain jl...@duke.edu wrote:

On Mon, 6 Jun 2011 at 1:30am, Craig Carl wrote:

Matus -
If you are using the Gluster native client (mount -t glusterfs ...) then ucarp/CTDB is NOT required and you should not install it. Always use the real IPs when you are mounting with 'mount -t glusterfs ...'.

Hrm. That wasn't my understanding. Say my fstab line looks like this:

192.168.2.100:/distrep /mnt/distrep glusterfs defaults,_netdev 0 0

Now, let's say that at mount time 192.168.2.100 is down. How does the Gluster native client know which other IP addresses to contact to get the volume file? Is there a way to put multiple hosts in the fstab line?

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
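On Joshua's question: the host named in fstab is only used to fetch the volume file; after that the client connects to every brick directly, so a downed server matters only at mount time. Later releases of mount.glusterfs also accept a fallback volfile server as a mount option; a hedged sketch, since the option name and availability vary by version (it is not present in every 3.x mount script):

    192.168.2.100:/distrep  /mnt/distrep  glusterfs  defaults,_netdev,backupvolfile-server=192.168.2.101  0 0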
Re: [Gluster-users] uninterruptible processes writing to glusterfs share
I have been using kernel 2.6.34 + fuse 2.5.5 + gluster 3.2 from the beginning, and it happened again today ... php-fpm froze and a reboot was the only solution.

Matus

2011/6/7 Markus Fröhlich markus.froehl...@xidras.com:

hi!

there is no relevant output from dmesg, and no entries in the server log, only the one line in the client-server log I already posted.

the glusterfs version on the server had been updated to gfs 3.2.0 more than a month ago. because of the troubles on the backup server, I deleted the whole backup share and started from scratch.

I looked for an update of fuse and upgraded from 2.7.2-61.18.1 to 2.8.5-41.1; maybe this helps. here is the changelog info:

Authors: Miklos Szeredi mik...@szeredi.hu
Distribution: systemsmanagement:baracus / SLE_11_SP1
* Tue Mar 29 2011 db...@novell.com
  - remove the --no-canonicalize usage for suse_version <= 11.3
* Mon Mar 21 2011 co...@novell.com
  - licenses package is about to die
* Thu Feb 17 2011 mszer...@suse.cz
  - In case of failure to add to /etc/mtab don't umount. [bnc#668820] [CVE-2011-0541]
* Tue Nov 16 2010 mszer...@suse.cz
  - Fix symlink attack for mount and umount [bnc#651598]
* Wed Oct 27 2010 mszer...@suse.cz
  - Remove /etc/init.d/boot.fuse [bnc#648843]
* Tue Sep 28 2010 mszer...@suse.cz
  - update to 2.8.5
  * fix option escaping for fusermount [bnc#641480]
* Wed Apr 28 2010 mszer...@suse.cz
  - keep examples and internal docs in devel package (from jnweiger)
* Mon Apr 26 2010 mszer...@suse.cz
  - update to 2.8.4
  * fix checking for symlinks in umount from /tmp
  * fix umounting if /tmp is a symlink

kind regards
markus froehlich

On 06.06.2011 21:19, Anthony J. Biacco wrote:

Could be fuse; check 'dmesg' for kernel module timeouts. In a similar vein, has anyone seen significant performance/reliability differences with different fuse versions? Say, latest source vs. the RHEL distro rpms?

-Tony

-----Original Message-----
From: Mohit Anchlia mohitanch...@gmail.com
Sent: June 06, 2011 1:14 PM
To: Markus Fröhlich markus.froehl...@xidras.com
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] uninterruptible processes writing to glusterfs share

Is there anything in the server logs? Does it follow any particular pattern before going into this mode? Did you upgrade Gluster or is this a new install?

2011/6/6 Markus Fröhlich markus.froehl...@xidras.com:

hi!

sometimes we have hanging uninterruptible processes on some client servers (ps aux stat is D), and on one of them the CPU I/O wait grows to 100% within a few minutes. such processes cannot be killed (kill -9 does not work either), and when you attach strace to one you see nothing and cannot detach again.

there are only two possibilities: killing the glusterfs process (umounting the GFS share) or rebooting the server.

the only log entry I found was on one client, just a single line:
[2011-06-06 10:44:18.593211] I [afr-common.c:581:afr_lookup_collect_xattr] 0-office-data-replicate-0: data self-heal is pending for /pc-partnerbet-public/Promotionaktionen/Mailakquise_2009/Webmaster_2010/HTML/bilder/Thumbs.db.

one of the client servers is a samba server, the other one a backup server based on rsync with millions of small files.

gfs-servers + gfs-clients: SLES11 x86_64, glusterfs V 3.2.0

and here are the configs from server and client:

server config /etc/glusterd/vols/office-data/office-data.gfs-01-01.GFS-office-data02.vol:

volume office-data-posix
    type storage/posix
    option directory /GFS/office-data02
end-volume

volume office-data-access-control
    type features/access-control
    subvolumes office-data-posix
end-volume

volume office-data-locks
    type features/locks
    subvolumes office-data-access-control
end-volume

volume office-data-io-threads
    type performance/io-threads
    subvolumes office-data-locks
end-volume

volume office-data-marker
    type features/marker
    option volume-uuid 3c6e633d-a0bb-4c52-8f05-a2db9bc9c659
    option timestamp-file /etc/glusterd/vols/office-data/marker.tstamp
    option xtime off
    option quota off
    subvolumes office-data-io-threads
end-volume

volume /GFS/office-data02
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes office-data-marker
end-volume

volume office-data-server
    type protocol/server
    option transport-type tcp
    option auth.addr./GFS/office-data02.allow *
    subvolumes /GFS/office-data02
end-volume

--

client config /etc/glusterd/vols/office-data/office-data-fuse.vol:

volume office-data-client-0
    type protocol/client
    option remote-host gfs-01-01
    option remote-subvolume /GFS/office-data02
    option transport-type tcp
end-volume

volume office-data-replicate-0
    type cluster/replicate
    subvolumes office-data-client-0
end-volume

volume office-data-write-behind
    type performance/write-behind
    subvolumes
Re: [Gluster-users] uninterruptible processes writing to glusterfs share
How do I disable the io-cache routine? I will try it and report back :)

thanks
Matus

2011/6/8 Mohit Anchlia mohitanch...@gmail.com:

On Wed, Jun 8, 2011 at 12:29 PM, Justice London jlon...@lawinfo.com wrote:

Hopefully this will help some people... try disabling the io-cache routine in the fuse configurations for your share. Let me know if you need instructions on doing this. It solved all of the lockup issues I was experiencing. I believe there is some sort of as-yet-undetermined memory leak here.

Was there a bug filed? If you think this is a bug, filing it will help others as well.

Justice London

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of bxma...@gmail.com
Sent: Wednesday, June 08, 2011 12:22 PM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] uninterruptible processes writing to glusterfs share

I have been using kernel 2.6.34 + fuse 2.5.5 + gluster 3.2 from the beginning, and it happened again today ... php-fpm froze and a reboot was the only solution.

Matus

2011/6/7 Markus Fröhlich markus.froehl...@xidras.com:

hi!

there is no relevant output from dmesg, and no entries in the server log, only the one line in the client-server log I already posted.

the glusterfs version on the server had been updated to gfs 3.2.0 more than a month ago. because of the troubles on the backup server, I deleted the whole backup share and started from scratch.

I looked for an update of fuse and upgraded from 2.7.2-61.18.1 to 2.8.5-41.1; maybe this helps. here is the changelog info:

Authors: Miklos Szeredi mik...@szeredi.hu
Distribution: systemsmanagement:baracus / SLE_11_SP1
* Tue Mar 29 2011 db...@novell.com
  - remove the --no-canonicalize usage for suse_version <= 11.3
* Mon Mar 21 2011 co...@novell.com
  - licenses package is about to die
* Thu Feb 17 2011 mszer...@suse.cz
  - In case of failure to add to /etc/mtab don't umount. [bnc#668820] [CVE-2011-0541]
* Tue Nov 16 2010 mszer...@suse.cz
  - Fix symlink attack for mount and umount [bnc#651598]
* Wed Oct 27 2010 mszer...@suse.cz
  - Remove /etc/init.d/boot.fuse [bnc#648843]
* Tue Sep 28 2010 mszer...@suse.cz
  - update to 2.8.5
  * fix option escaping for fusermount [bnc#641480]
* Wed Apr 28 2010 mszer...@suse.cz
  - keep examples and internal docs in devel package (from jnweiger)
* Mon Apr 26 2010 mszer...@suse.cz
  - update to 2.8.4
  * fix checking for symlinks in umount from /tmp
  * fix umounting if /tmp is a symlink

kind regards
markus froehlich

On 06.06.2011 21:19, Anthony J. Biacco wrote:

Could be fuse; check 'dmesg' for kernel module timeouts. In a similar vein, has anyone seen significant performance/reliability differences with different fuse versions? Say, latest source vs. the RHEL distro rpms?

-Tony

-----Original Message-----
From: Mohit Anchlia mohitanch...@gmail.com
Sent: June 06, 2011 1:14 PM
To: Markus Fröhlich markus.froehl...@xidras.com
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] uninterruptible processes writing to glusterfs share

Is there anything in the server logs? Does it follow any particular pattern before going into this mode? Did you upgrade Gluster or is this a new install?

2011/6/6 Markus Fröhlich markus.froehl...@xidras.com:

hi!

sometimes we have hanging uninterruptible processes on some client servers (ps aux stat is D), and on one of them the CPU I/O wait grows to 100% within a few minutes. such processes cannot be killed (kill -9 does not work either), and when you attach strace to one you see nothing and cannot detach again.

there are only two possibilities: killing the glusterfs process (umounting the GFS share) or rebooting the server.

the only log entry I found was on one client, just a single line:
[2011-06-06 10:44:18.593211] I [afr-common.c:581:afr_lookup_collect_xattr] 0-office-data-replicate-0: data self-heal is pending for /pc-partnerbet-public/Promotionaktionen/Mailakquise_2009/Webmaster_2010/HTML/bilder/Thumbs.db.

one of the client servers is a samba server, the other one a backup server based on rsync with millions of small files.

gfs-servers + gfs-clients: SLES11 x86_64, glusterfs V 3.2.0

and here are the configs from server and client:

server config /etc/glusterd/vols/office-data/office-data.gfs-01-01.GFS-office-data02.vol:

volume office-data-posix
    type storage/posix
    option directory /GFS/office-data02
end-volume

volume office-data-access-control
    type features/access-control
    subvolumes office-data-posix
end-volume

volume office-data-locks
    type features/locks
    subvolumes office-data-access-control
end-volume

volume office-data-io-threads
    type performance/io-threads
    subvolumes office-data-locks
end-volume

volume office-data-marker
    type features/marker
    option volume-uuid 3c6e633d-a0bb-4c52-8f05-a2db9bc9c659
    option timestamp-file /etc/glusterd
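On the "disabling io-cache" suggestion above: in a 3.x fuse volfile like the one quoted in this thread, io-cache is one translator in the client-side chain (quick-read stacks on io-cache, which stacks on read-ahead). A hedged sketch of the edit, assuming the office-data client volfile from this thread; note that glusterd may regenerate volfiles and overwrite manual edits:

    # before: quick-read is wired to io-cache
    volume office-data-quick-read
        type performance/quick-read
        subvolumes office-data-io-cache
    end-volume

    # after: delete the whole office-data-io-cache block and point
    # quick-read directly at read-ahead, bypassing io-cache
    volume office-data-quick-read
        type performance/quick-read
        subvolumes office-data-read-ahead
    end-volume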
Re: [Gluster-users] uninterruptible processes writing to glusterfs share
I had a similar problem: php-fpm would sometimes hang (once every 1-2 weeks), with the process waiting on some I/O. Gluster itself kept working OK; a server reboot was the only solution. Nothing in the logs, nothing in dmesg. Gluster version 3.2, kernel 2.6.34, running under xen, distribution gentoo. It started after the gluster installation; we never had this problem with the previous openafs (which had many other problems :))).

Matus

2011/6/6 Anthony J. Biacco abia...@formatdynamics.com:

Could be fuse; check 'dmesg' for kernel module timeouts. In a similar vein, has anyone seen significant performance/reliability differences with different fuse versions? Say, latest source vs. the RHEL distro rpms?

-Tony

-----Original Message-----
From: Mohit Anchlia mohitanch...@gmail.com
Sent: June 06, 2011 1:14 PM
To: Markus Fröhlich markus.froehl...@xidras.com
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] uninterruptible processes writing to glusterfs share

Is there anything in the server logs? Does it follow any particular pattern before going into this mode? Did you upgrade Gluster or is this a new install?

2011/6/6 Markus Fröhlich markus.froehl...@xidras.com:

hi!

sometimes we have hanging uninterruptible processes on some client servers (ps aux stat is D), and on one of them the CPU I/O wait grows to 100% within a few minutes. such processes cannot be killed (kill -9 does not work either), and when you attach strace to one you see nothing and cannot detach again.

there are only two possibilities: killing the glusterfs process (umounting the GFS share) or rebooting the server.

the only log entry I found was on one client, just a single line:
[2011-06-06 10:44:18.593211] I [afr-common.c:581:afr_lookup_collect_xattr] 0-office-data-replicate-0: data self-heal is pending for /pc-partnerbet-public/Promotionaktionen/Mailakquise_2009/Webmaster_2010/HTML/bilder/Thumbs.db.

one of the client servers is a samba server, the other one a backup server based on rsync with millions of small files.

gfs-servers + gfs-clients: SLES11 x86_64, glusterfs V 3.2.0

and here are the configs from server and client:

server config /etc/glusterd/vols/office-data/office-data.gfs-01-01.GFS-office-data02.vol:

volume office-data-posix
    type storage/posix
    option directory /GFS/office-data02
end-volume

volume office-data-access-control
    type features/access-control
    subvolumes office-data-posix
end-volume

volume office-data-locks
    type features/locks
    subvolumes office-data-access-control
end-volume

volume office-data-io-threads
    type performance/io-threads
    subvolumes office-data-locks
end-volume

volume office-data-marker
    type features/marker
    option volume-uuid 3c6e633d-a0bb-4c52-8f05-a2db9bc9c659
    option timestamp-file /etc/glusterd/vols/office-data/marker.tstamp
    option xtime off
    option quota off
    subvolumes office-data-io-threads
end-volume

volume /GFS/office-data02
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes office-data-marker
end-volume

volume office-data-server
    type protocol/server
    option transport-type tcp
    option auth.addr./GFS/office-data02.allow *
    subvolumes /GFS/office-data02
end-volume

--

client config /etc/glusterd/vols/office-data/office-data-fuse.vol:

volume office-data-client-0
    type protocol/client
    option remote-host gfs-01-01
    option remote-subvolume /GFS/office-data02
    option transport-type tcp
end-volume

volume office-data-replicate-0
    type cluster/replicate
    subvolumes office-data-client-0
end-volume

volume office-data-write-behind
    type performance/write-behind
    subvolumes office-data-replicate-0
end-volume

volume office-data-read-ahead
    type performance/read-ahead
    subvolumes office-data-write-behind
end-volume

volume office-data-io-cache
    type performance/io-cache
    subvolumes office-data-read-ahead
end-volume

volume office-data-quick-read
    type performance/quick-read
    subvolumes office-data-io-cache
end-volume

volume office-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes office-data-quick-read
end-volume

volume office-data
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes office-data-stat-prefetch
end-volume

--
Kind regards

Markus Fröhlich
Technician

Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0) 2983 201 30503
Fax: +43 (0) 2983 201 305039
Email: markus.froehl...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024
[Gluster-users] Gluster 3.2.0 and ucarp not working
Hello everybody,

I have a problem setting up gluster failover functionality. Following the manual I set up ucarp, which works well (tested with ping/ssh etc.). But when I use the virtual address for the gluster volume mount and I turn off one of the nodes, the machine/gluster freezes until the node is back online.

My virtual IP is 3.200 and the machines' real IPs are 3.233 and 3.5. In the gluster log I can see:

[2011-06-06 02:33:54.230082] I [client-handshake.c:913:client_setvolume_cbk] 0-atlas-client-1: Connected to 192.168.3.233:24009, attached to remote volume '/atlas'.
[2011-06-06 02:33:54.230116] I [afr-common.c:2514:afr_notify] 0-atlas-replicate-0: Subvolume 'atlas-client-1' came back up; going online.
[2011-06-06 02:33:54.237541] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-06-06 02:33:54.237801] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2011-06-06 02:33:54.238757] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-atlas-replicate-0: added root inode
[2011-06-06 02:33:54.272650] I [client-handshake.c:913:client_setvolume_cbk] 0-atlas-client-0: Connected to 192.168.3.5:24009, attached to remote volume '/atlas'.

even though the IP I used at mount time is 3.200. It looks like in the end gluster uses the machines' real IPs even when I connect to the virtual one. Is there a way to turn this behaviour off, or is it just broken?

thanks for any answer
Matus
Re: [Gluster-users] Gluster 3.2.0 and ucarp not working
Hi Craig,

thanks for the answer. I am using replication, which works OK, and the volume is mounted with the -t glusterfs parameter; does that matter? All nodes use real IPs, not virtual ones (for probing etc.); I use the virtual IP only for mounting the volume on the client. I waited over 10 minutes for the volume to wake up, but it never started to work. It never switched to the other node, even though ucarp was already pointing there; there were a lot of recovery messages in the log but no attempt to connect to the second node.

thanks
Matus

2011/6/6 Craig Carl cc...@gluster.com:

Matus -
Gluster has automatic, built-in failover if you are using replica nodes. ucarp is only required if you want highly available NFS mounts. To use ucarp with Gluster you should:

1. Install Gluster and create a replica volume. [1]
   - DO NOT use the virtual IPs when you peer probe or create the volume; that won't work.
   - Set the ping-timeout volume option to 25 seconds. [2]
2. Install and set up ucarp.
3. Mount your NFS clients using the VIPs.
4. Mount your glusterfs clients using the real IP addresses.

We mostly use CTDB because it supports NFS and Samba. I can't attach a document here, but I'll email you directly with the documentation.

[1] http://gluster.com/community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Replicated_Volumes
[2] http://gluster.com/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options

Thanks,
Craig

--
Craig Carl, Senior Systems Engineer | Gluster
408.829.9953 (PST) | http://gluster.com
http://www.gluster.com/gluster-for-aws/
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE5666F925A557DD8

On 6/6/11 12:09 AM, bxma...@gmail.com wrote:

Hello everybody,

I have a problem setting up gluster failover functionality. Following the manual I set up ucarp, which works well (tested with ping/ssh etc.). But when I use the virtual address for the gluster volume mount and I turn off one of the nodes, the machine/gluster freezes until the node is back online.

My virtual IP is 3.200 and the machines' real IPs are 3.233 and 3.5. In the gluster log I can see:

[2011-06-06 02:33:54.230082] I [client-handshake.c:913:client_setvolume_cbk] 0-atlas-client-1: Connected to 192.168.3.233:24009, attached to remote volume '/atlas'.
[2011-06-06 02:33:54.230116] I [afr-common.c:2514:afr_notify] 0-atlas-replicate-0: Subvolume 'atlas-client-1' came back up; going online.
[2011-06-06 02:33:54.237541] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-06-06 02:33:54.237801] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2011-06-06 02:33:54.238757] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-atlas-replicate-0: added root inode
[2011-06-06 02:33:54.272650] I [client-handshake.c:913:client_setvolume_cbk] 0-atlas-client-0: Connected to 192.168.3.5:24009, attached to remote volume '/atlas'.

even though the IP I used at mount time is 3.200. It looks like in the end gluster uses the machines' real IPs even when I connect to the virtual one. Is there a way to turn this behaviour off, or is it just broken?

thanks for any answer
Matus
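In CLI terms, Craig's ping-timeout step and the two mount flavours would look roughly like this; a sketch, assuming the volume name (atlas) and the addresses from this thread:

    # step 1b: shorten how long clients wait before giving up on a dead node
    gluster volume set atlas network.ping-timeout 25

    # step 3: NFS clients mount through the ucarp-managed virtual IP (Gluster NFS is v3 only)
    mount -t nfs -o vers=3 192.168.3.200:/atlas /mnt/atlas

    # step 4: native clients mount against a real server IP; failover is handled client-side
    mount -t glusterfs 192.168.3.233:/atlas /mnt/atlas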
Re: [Gluster-users] Gluster 3.2.0 and ucarp not working
Great, it is working now. The strange thing is that before I set the network.ping-timeout value it never switched to the other node (I waited five minutes and nothing happened), and now it is up and working after 25 seconds. Maybe there is no default value? I don't know :) ... but it is working for me now, thanks a lot.

Matus

2011/6/6 Craig Carl cc...@gluster.com:

Exactly! The default ping-timeout is 42 seconds.

Craig

On 6/6/11 1:50 AM, bxma...@gmail.com wrote:

I see, I'm starting to understand. So in theory it should work fine with a normal IP, and after $ping-timeout seconds it should switch to another node if one is dead, am I right?

2011/6/6 Craig Carl cc...@gluster.com:

Matus -
If you are using the Gluster native client (mount -t glusterfs ...) then ucarp/CTDB is NOT required and you should not install it. Always use the real IPs when you are mounting with 'mount -t glusterfs ...'.

Craig

On 6/6/11 1:16 AM, bxma...@gmail.com wrote:

Hi Craig,

thanks for the answer. I am using replication, which works OK, and the volume is mounted with the -t glusterfs parameter; does that matter? All nodes use real IPs, not virtual ones (for probing etc.); I use the virtual IP only for mounting the volume on the client. I waited over 10 minutes for the volume to wake up, but it never started to work. It never switched to the other node, even though ucarp was already pointing there; there were a lot of recovery messages in the log but no attempt to connect to the second node.

thanks
Matus

2011/6/6 Craig Carl cc...@gluster.com:

Matus -
Gluster has automatic, built-in failover if you are using replica nodes. ucarp is only required if you want highly available NFS mounts. To use ucarp with Gluster you should:

1. Install Gluster and create a replica volume. [1]
   - DO NOT use the virtual IPs when you peer probe or create the volume; that won't work.
   - Set the ping-timeout volume option to 25 seconds. [2]
2. Install and set up ucarp.
3. Mount your NFS clients using the VIPs.
4. Mount your glusterfs clients using the real IP addresses.

We mostly use CTDB because it supports NFS and Samba. I can't attach a document here, but I'll email you directly with the documentation.

[1] http://gluster.com/community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Replicated_Volumes
[2] http://gluster.com/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options

Thanks,
Craig

--
Craig Carl, Senior Systems Engineer | Gluster
408.829.9953 (PST) | http://gluster.com
http://www.gluster.com/gluster-for-aws/
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE5666F925A557DD8

On 6/6/11 12:09 AM, bxma...@gmail.com wrote:

Hello everybody,

I have a problem setting up gluster failover functionality. Following the manual I set up ucarp, which works well (tested with ping/ssh etc.). But when I use the virtual address for the gluster volume mount and I turn off one of the nodes, the machine/gluster freezes until the node is back online.

My virtual IP is 3.200 and the machines' real IPs are 3.233 and 3.5. In the gluster log I can see:

[2011-06-06 02:33:54.230082] I [client-handshake.c:913:client_setvolume_cbk] 0-atlas-client-1: Connected to 192.168.3.233:24009, attached to remote volume '/atlas'.
[2011-06-06 02:33:54.230116] I [afr-common.c:2514:afr_notify] 0-atlas-replicate-0: Subvolume 'atlas-client-1' came back up; going online.
[2011-06-06 02:33:54.237541] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-06-06 02:33:54.237801] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2011-06-06 02:33:54.238757] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-atlas-replicate-0: added root inode
[2011-06-06 02:33:54.272650] I [client-handshake.c:913:client_setvolume_cbk] 0-atlas-client-0: Connected to 192.168.3.5:24009, attached to remote volume '/atlas'.

even though the IP I used at mount time is 3.200. It looks like in the end gluster uses the machines' real IPs even when I connect to the virtual one. Is there a way to turn this behaviour off, or is it just broken?

thanks for any answer
Matus