Re: [Gluster-users] Gluster 2.0.2 locking up issues
Hi Daniel,

I see you are using the brick volume from the server side. Did you try splitting it up, so that the client and server run in different processes? That could possibly be causing a problem.

Thanks,
Jasper

On 18 Jun 2009, at 14:18, Daniel Jordan Bambach wrote:

> Well, one of the servers just locked up again (completely). All accesses were occurring on the other machine at the time. [...]
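For what it's worth, the split Jasper describes would mean running glusterfsd with a server-only volfile and mounting through a client-only volfile, roughly as sketched below. This is reconstructed from the config later in this thread; the file names, the local host name "latsrv1", and the split itself are assumptions, not tested config:

```
# server.vol -- exporting side, run by glusterfsd
volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

# client.vol -- mounting side, run by glusterfs; the local brick is now
# reached through protocol/client instead of being linked in-process
# ("latsrv1" is a hypothetical name for the local server)
volume latsrv1
  type protocol/client
  option transport-type tcp
  option remote-host latsrv1
  option remote-subvolume brick
end-volume

volume latsrv2
  type protocol/client
  option transport-type tcp
  option remote-host latsrv2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes latsrv1 latsrv2
  option read-subvolume latsrv1
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  subvolumes afr
end-volume

volume cache
  type performance/io-cache
  option cache-size 32MB
  subvolumes writebehind
end-volume
```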
Re: [Gluster-users] Gluster 2.0.2 locking up issues
Well, one of the servers just locked up again (completely). All accesses were occurring on the other machine at the time. We had a moment when a directory on the still-running server went to 'Device or Resource Busy'; I restarted Gluster on that machine to clear the issue, then noticed the second had died (not sure if it happened at the same time or not).

I'm trying to update the drop_caches value to 3, but it isn't letting me for some reason (permission denied, as root?). Will adding DEBUG to the glusterfs command line give me more information across the whole process, rather than the trace (below), which isn't giving anything away?

[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 162: (loc {path=/www/site/rebuild2008/faber, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 162: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37883182, st_mode=40775, st_nlink=24, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 163: (loc {path=/www/site/rebuild2008/faber/site-media, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 163: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238048, st_mode=40777, st_nlink=21, st_uid=504, st_gid=501, st_rdev=0, st_size=4096, st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 164: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 164: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37884374, st_mode=40777, st_nlink=4, st_uid=504, st_gid=501, st_rdev=0, st_size=114688, st_blksize=4096, st_blocks=240})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 165: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 165: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238105, st_mode=40777, st_nlink=3, st_uid=504, st_gid=501, st_rdev=0, st_size=479232, st_blksize=4096, st_blocks=952})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 166: (loc {path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs/185_jpg_130x400_q85.jpg, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 166: (op_ret=0, ino=0, *buf {st_dev=64768, st_ino=7089866, st_mode=100644, st_nlink=1, st_uid=504, st_gid=501, st_rdev=0, st_size=10919, st_blksize=4096, st_blocks=32})

---ends---

On 18 Jun 2009, at 11:53, Daniel Jordan Bambach wrote:

> [...]

On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:

> [...]
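On the drop_caches point: "permission denied as root" is usually the shell redirection rather than Gluster or the kernel. When the command is run through sudo, the `>` redirect is opened by the invoking, unprivileged shell before sudo ever runs. A generic sketch (the sysctl path is standard Linux; nothing here is Gluster-specific):

```shell
# This fails with "Permission denied" even though echo itself runs as
# root, because the > redirection is opened by the calling shell:
#   sudo echo 3 > /proc/sys/vm/drop_caches
# Perform the write inside a privileged process instead:
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
# or, equivalently, run the whole redirection in a root shell:
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
```

As for DEBUG: the glusterfs binary takes a log-level flag on its command line (--log-level DEBUG, or -L DEBUG, in the 2.0 series, if memory serves), which is much chattier than the trace translator's 'N'-level entries and covers the whole process.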
Re: [Gluster-users] Gluster 2.0.2 locking up issues
Will do, though I recently added those lines in to be explicit about the behaviour (I had no options set before at all, leaving it at the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.

I'm adding:

  volume trace
    type debug/trace
    subvolumes cache
  end-volume

to both sides now as well, so next time it locks up (if it does) perhaps there will be some more info.

Thanks, Shehjar.

On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:

> [...]

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster 2.0.2 locking up issues
Daniel Jordan Bambach wrote:
> I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load. [...]
>
> volume brick
>   type performance/io-threads
>   subvolumes locks
>   option autoscaling on
>   option min-threads 8
>   option max-threads 32
> end-volume

I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again without autoscaling turned on? It is off by default, so you can simply set the number of threads you need with "option thread-count" instead of the three "option" lines above.

Thanks,
Shehjar
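Spelled out, the io-threads volume Shehjar describes would look along these lines (a sketch; the thread-count value shown is only illustrative, matching the stated default of 16):

```
volume brick
  type performance/io-threads
  subvolumes locks
  # illustrative value -- 16 is the stated default
  option thread-count 16
end-volume
```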
[Gluster-users] Gluster 2.0.2 locking up issues
I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load. My current config is below (two servers, AFR'ing each other). I would love to be able to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx. 12GB of files, and to stress-test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any debug information of use, as there is no crash log and no errors in the volume log. Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?

volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
  option autoscaling on
  option min-threads 8
  option max-threads 32
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

volume latsrv2
  type protocol/client
  option transport-type tcp
  option remote-host latsrv2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes brick latsrv2
  option read-subvolume brick
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  subvolumes afr
end-volume

volume cache
  type performance/io-cache
  option cache-size 32MB
  option priority *.pyc:4,*.html:3,*.php:2,*:1
  option cache-timeout 5
  subvolumes writebehind
end-volume