Re: [Gluster-users] uninterruptible processes writing toglusterfsshare
Hopefully this will help some people... try disabling the io-cache routine in the fuse configurations for your share. Let me know if you need instructions on doing this. It solved all of the lockup issues I was experiencing. I believe there is some sort of as-yet-undetermined memory leak here.

Justice London

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of bxma...@gmail.com
Sent: Wednesday, June 08, 2011 12:22 PM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] uninterruptible processes writing toglusterfsshare

I'm using kernel 2.6.34 + fuse 2.5.5 + gluster 3.2 from the beginning and it happened again today ... php-fpm froze and a reboot was the only solution.

Matus

2011/6/7 Markus Fröhlich markus.froehl...@xidras.com:

Hi!

There is no relevant output from dmesg.
No entries in the server log, only the one line in the client-server log that I already posted.

The glusterfs version on the server was updated to gfs 3.2.0 more than a month ago.
Because of the troubles on the backup server, I deleted the whole backup share and started from scratch.

I looked for an update of fuse and upgraded from 2.7.2-61.18.1 to 2.8.5-41.1. Maybe this helps.

Here is the changelog info:

Authors: Miklos Szeredi mik...@szeredi.hu
Distribution: systemsmanagement:baracus / SLE_11_SP1

* Tue Mar 29 2011 db...@novell.com
- remove the --no-canonicalize usage for suse_version = 11.3
* Mon Mar 21 2011 co...@novell.com
- licenses package is about to die
* Thu Feb 17 2011 mszer...@suse.cz
- In case of failure to add to /etc/mtab don't umount. [bnc#668820] [CVE-2011-0541]
* Tue Nov 16 2010 mszer...@suse.cz
- Fix symlink attack for mount and umount [bnc#651598]
* Wed Oct 27 2010 mszer...@suse.cz
- Remove /etc/init.d/boot.fuse [bnc#648843]
* Tue Sep 28 2010 mszer...@suse.cz
- update to 2.8.5
  * fix option escaping for fusermount [bnc#641480]
* Wed Apr 28 2010 mszer...@suse.cz
- keep examples and internal docs in devel package (from jnweiger)
* Mon Apr 26 2010 mszer...@suse.cz
- update to 2.8.4
  * fix checking for symlinks in umount from /tmp
  * fix umounting if /tmp is a symlink

Kind regards
Markus Froehlich

On 06.06.2011 21:19, Anthony J. Biacco wrote:

Could be fuse, check 'dmesg' for kernel module timeouts.

In a similar vein, has anyone seen significant performance/reliability differences with different fuse versions? Say, latest source vs. RHEL distro RPM versions.

-Tony

-----Original Message-----
From: Mohit Anchlia mohitanch...@gmail.com
Sent: June 06, 2011 1:14 PM
To: Markus Fröhlich markus.froehl...@xidras.com
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] uninterruptible processes writing to glusterfsshare

Is there anything in the server logs? Does it follow any particular pattern before going into this mode?

Did you upgrade Gluster or is this a new install?

2011/6/6 Markus Fröhlich markus.froehl...@xidras.com:

Hi!

Sometimes we have hanging uninterruptible processes on some client-servers ("ps aux" shows stat D), and on one of them the CPU I/O wait grows to 100% within a few minutes.
You are not able to kill such processes; even "kill -9" doesn't work. When you attach to such a process with strace, you won't see anything and you cannot detach again.

There are only two possibilities: killing the glusterfs process (umount the GFS share) or rebooting the server.

The only log entry I found was on one client, just a single line:

[2011-06-06 10:44:18.593211] I [afr-common.c:581:afr_lookup_collect_xattr] 0-office-data-replicate-0: data self-heal is pending for /pc-partnerbet-public/Promotionaktionen/Mailakquise_2009/Webmaster_2010/HTML /bilder/Thumbs.db.

One of the client-servers is a samba server, the other one a backup server based on rsync with millions of small files.

gfs-servers + gfs-clients: SLES11 x86_64, glusterfs V 3.2.0

And here are the configs from server and client:

server config /etc/glusterd/vols/office-data/office-data.gfs-01-01.GFS-office-data02.vol :

volume office-data-posix
    type storage/posix
    option directory /GFS/office-data02
end-volume

volume office-data-access-control
    type features/access-control
    subvolumes office-data-posix
end-volume

volume office-data-locks
    type features/locks
    subvolumes office-data-access-control
end-volume

volume office-data-io-threads
    type performance/io-threads
    subvolumes office-data-locks
end-volume

volume office-data-marker
    type features/marker
    option volume-uuid 3c6e633d-a0bb-4c52-8f05-a2db9bc9c659
    option timestamp-file /etc/glusterd/vols/office-data/marker.tstamp
    option xtime off
    option quota off
    subvolumes office-data-io-threads
end-volume

volume /GFS/office-data02
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes office-data-marker
end-volume

volume office-data
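
The symptom in the original report (processes stuck in state D according to "ps aux") can also be checked from a script. What follows is only a rough sketch, assuming a Linux client where /proc is mounted; the function names parse_stat_line and uninterruptible_processes are made up for this illustration and are not part of any GlusterFS tooling.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # the process exited while we were looking at it
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Run periodically (from cron, for example), this would show whether the same processes stay in state D over several minutes, which is what distinguishes a hang from ordinary short I/O waits.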
Re: [Gluster-users] uninterruptible processes writing toglusterfsshare
On Wed, Jun 8, 2011 at 12:29 PM, Justice London jlon...@lawinfo.com wrote:
> Hopefully this will help some people... try disabling the io-cache routine in the fuse configurations for your share. Let me know if you need instruction on doing this. It solved all of the lockup issues I was experiencing. I believe there is some sort of as-yet-undetermined memory leak here.

Was there a bug filed? If you think this is a bug, filing it will help others as well.
Re: [Gluster-users] uninterruptible processes writing toglusterfsshare
How do I disable the io-cache routine? I will try it and report back :)

thanks

Matus
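
For anyone with the same question: the thread itself does not show the exact steps, so the following is only a sketch of one possible approach. It assumes a hand-edited client (fuse) volfile in which io-cache appears as a block of type performance/io-cache. The helper drop_translator and the volume names in the sample are hypothetical, and the helper discards anything outside "volume ... end-volume" blocks, so comments are lost. On releases whose CLI supports it, "gluster volume set <VOLNAME> performance.io-cache off" may be the simpler route; whether 3.2.0 accepts that key is an assumption to verify.

```python
def drop_translator(volfile, xlator_type="performance/io-cache"):
    """Return volfile text with every block of xlator_type removed.

    Blocks that listed a removed block under "subvolumes" are rewired
    to point at the removed block's own subvolumes instead.
    """
    # Collect each "volume NAME ... end-volume" block as a list of lines.
    blocks, current = [], None
    for line in volfile.splitlines():
        stripped = line.strip()
        if stripped.startswith("volume "):
            current = [line]
        elif stripped == "end-volume" and current is not None:
            current.append(line)
            blocks.append(current)
            current = None
        elif current is not None:
            current.append(line)

    # Separate the blocks to drop from the blocks to keep.
    replaced, kept = {}, []
    for block in blocks:
        name = block[0].split()[1]
        fields = [l.split() for l in block[1:-1] if l.strip()]
        btype = next((f[1] for f in fields if f[0] == "type"), None)
        if btype == xlator_type:
            replaced[name] = next(
                (f[1:] for f in fields if f[0] == "subvolumes"), [])
        else:
            kept.append(block)

    # Re-emit the kept blocks, rewriting references to dropped ones.
    out = []
    for block in kept:
        for line in block:
            words = line.split()
            if words and words[0] == "subvolumes":
                children = []
                for child in words[1:]:
                    children.extend(replaced.get(child, [child]))
                indent = line[: len(line) - len(line.lstrip())]
                line = indent + "subvolumes " + " ".join(children)
            out.append(line)
        out.append("")
    return "\n".join(out).rstrip() + "\n"


# Hypothetical three-block client volfile, for illustration only.
SAMPLE = """volume data-client
    type protocol/client
end-volume

volume data-io-cache
    type performance/io-cache
    subvolumes data-client
end-volume

volume data-stats
    type debug/io-stats
    subvolumes data-io-cache
end-volume
"""

if __name__ == "__main__":
    print(drop_translator(SAMPLE))
```

The edited file would then have to be used for a fresh mount of the share. Keep a copy of the original volfile, since glusterd may regenerate its own volfiles when volume options change.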
Re: [Gluster-users] uninterruptible processes writing toglusterfsshare
I have been working with the gluster support department (under paid support, I should mention)... with no resolution or similar. A bug report specifically was not filed by myself, though.

Justice London