Re: [Gluster-users] Gluster 2.0.2 locking up issues

2009-06-18 Thread Jasper van Wanrooy - Chatventure

Hi Daniel,

I see you are using the brick volume directly on the server side, so the client and server run in the same process. Did you try splitting them into separate processes? Running them combined could possibly be causing the problem.
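A minimal sketch of such a split, reusing the volumes from the config quoted further down (the file names, and the local hostname latsrv1, are placeholders, not taken from the thread); the idea being that a problem in the client-side graph can then no longer take the export process down with it:

# glusterfsd-server.vol -- export side, run by its own glusterfsd process
volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

# glusterfs-client.vol -- mount side, run by the glusterfs mount process;
# the local brick is now reached over protocol/client like the remote one
volume latsrv1
  type protocol/client
  option transport-type tcp
  option remote-host latsrv1
  option remote-subvolume brick
end-volume

volume latsrv2
  type protocol/client
  option transport-type tcp
  option remote-host latsrv2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes latsrv1 latsrv2
end-volume

The write-behind and io-cache translators would stack on top of afr exactly as in the existing client section.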


Thanks, Jasper


On 18 Jun 2009, at 14:18, Daniel Jordan Bambach wrote:


Well one of the servers just locked up again (completely).

All accesses were occurring on the other machine at the time. We had a moment when a directory on the still-running server went to 'Device or Resource Busy'; I restarted Gluster on that machine to clear the issue, then noticed the second had died (not sure whether it happened at the same time or not).


I'm trying to set the drop_caches value (/proc/sys/vm/drop_caches) to 3, but it isn't letting me for some reason (permission denied as root?).


Will adding DEBUG to the glusterfs command line give me more information across the whole process, rather than the trace (below), which isn't giving anything away?



[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 162: (loc  
{path=/www/site/rebuild2008/faber, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 162:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37883182,  
st_mode=40775, st_nlink=24, st_uid=504, st_gid=501, st_rdev=0,  
st_size=4096, st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 163: (loc  
{path=/www/site/rebuild2008/faber/site-media, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 163:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238048,  
st_mode=40777, st_nlink=21, st_uid=504, st_gid=501, st_rdev=0,  
st_size=4096, st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 164: (loc  
{path=/www/site/rebuild2008/faber/site-media/onix-images, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 164:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37884374,  
st_mode=40777, st_nlink=4, st_uid=504, st_gid=501, st_rdev=0,  
st_size=114688, st_blksize=4096, st_blocks=240})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 165: (loc  
{path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs,  
ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 165:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238105,  
st_mode=40777, st_nlink=3, st_uid=504, st_gid=501, st_rdev=0,  
st_size=479232, st_blksize=4096, st_blocks=952})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 166: (loc  
{path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs/ 
185_jpg_130x400_q85.jpg, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 166:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=7089866,  
st_mode=100644, st_nlink=1, st_uid=504, st_gid=501, st_rdev=0,  
st_size=10919, st_blksize=4096, st_blocks=32})

---ends--


On 18 Jun 2009, at 11:53, Daniel Jordan Bambach wrote:

Will do, though I recently added those lines to be explicit about the behaviour (I had no options set before at all, leaving it at the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.


I'm adding:

volume trace
  type debug/trace
  subvolumes cache
end-volume

to both sides now as well, so next time it locks up (if it does) perhaps there will be some more info.


Thanks Shehjar


On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:


Daniel Jordan Bambach wrote:
I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.

My current config is below (two servers, AFR'ing each other).

I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx. 12GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any useful debugging information, as there is no crash log and no errors in the volume log.

Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?

volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
  option autoscaling on
  option min-threads 8
  option max-threads 32
end-volume

I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again, but without autoscaling turned on?

It is off by default, so you can simply set the number of threads you need with:

option thread-count 

...instead of the three "option" lines above.

Thanks
Shehjar

Re: [Gluster-users] Gluster 2.0.2 locking up issues

2009-06-18 Thread Daniel Jordan Bambach

Well one of the servers just locked up again (completely).

All accesses were occurring on the other machine at the time. We had a moment when a directory on the still-running server went to 'Device or Resource Busy'; I restarted Gluster on that machine to clear the issue, then noticed the second had died (not sure whether it happened at the same time or not).


I'm trying to set the drop_caches value (/proc/sys/vm/drop_caches) to 3, but it isn't letting me for some reason (permission denied as root?).
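One common cause of that error is the redirect being performed by an unprivileged shell when using sudo; if that is what is happening here, something like the following normally works (assuming the intent is the kernel's /proc/sys/vm/drop_caches):

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
# or, from a genuine root shell:
#   echo 3 > /proc/sys/vm/drop_caches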


Will adding DEBUG to the glusterfs command line give me more information across the whole process, rather than the trace (below), which isn't giving anything away?
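If it helps, the client log level can be raised on the glusterfs command line, e.g. (the volfile path, log path and mount point below are placeholders):

glusterfs --log-level=DEBUG -f /etc/glusterfs/glusterfs-client.vol -l /var/log/glusterfs/client.log /mnt/gluster

That covers every translator in the client process, whereas the trace translator only logs the calls that pass through the volume it wraps.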



[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 162: (loc  
{path=/www/site/rebuild2008/faber, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 162:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37883182, st_mode=40775,  
st_nlink=24, st_uid=504, st_gid=501, st_rdev=0, st_size=4096,  
st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 163: (loc  
{path=/www/site/rebuild2008/faber/site-media, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 163:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238048, st_mode=40777,  
st_nlink=21, st_uid=504, st_gid=501, st_rdev=0, st_size=4096,  
st_blksize=4096, st_blocks=16})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 164: (loc  
{path=/www/site/rebuild2008/faber/site-media/onix-images, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 164:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=37884374, st_mode=40777,  
st_nlink=4, st_uid=504, st_gid=501, st_rdev=0, st_size=114688,  
st_blksize=4096, st_blocks=240})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 165: (loc  
{path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 165:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=19238105, st_mode=40777,  
st_nlink=3, st_uid=504, st_gid=501, st_rdev=0, st_size=479232,  
st_blksize=4096, st_blocks=952})
[2009-06-18 12:42:08] N [trace.c:1245:trace_lookup] trace: 166: (loc  
{path=/www/site/rebuild2008/faber/site-media/onix-images/thumbs/ 
185_jpg_130x400_q85.jpg, ino=0})
[2009-06-18 12:42:08] N [trace.c:513:trace_lookup_cbk] trace: 166:  
(op_ret=0, ino=0, *buf {st_dev=64768, st_ino=7089866, st_mode=100644,  
st_nlink=1, st_uid=504, st_gid=501, st_rdev=0, st_size=10919,  
st_blksize=4096, st_blocks=32})

---ends--


On 18 Jun 2009, at 11:53, Daniel Jordan Bambach wrote:

Will do, though I recently added those lines to be explicit about the behaviour (I had no options set before at all, leaving it at the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.


I'm adding:

volume trace
 type debug/trace
 subvolumes cache
end-volume

to both sides now as well, so next time it locks up (if it does) perhaps there will be some more info.


Thanks Shehjar


On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:


Daniel Jordan Bambach wrote:
I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.

My current config is below (two servers, AFR'ing each other).

I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx. 12GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any useful debugging information, as there is no crash log and no errors in the volume log.

Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?

volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
  option autoscaling on
  option min-threads 8
  option max-threads 32
end-volume

I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again, but without autoscaling turned on?

It is off by default, so you can simply set the number of threads you need with:

option thread-count 

...instead of the three "option" lines above.

Thanks
Shehjar



volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

volume latsrv2
  type protocol/client
  option transport-type tcp
  option remote-host latsrv2
  option remote-subvolume brick
end-volume

volume afr
typ

Re: [Gluster-users] Gluster 2.0.2 locking up issues

2009-06-18 Thread Daniel Jordan Bambach
Will do, though I recently added those lines to be explicit about the behaviour (I had no options set before at all, leaving it at the default of 16 threads). I will remove them and specify the default of 16 to see if that helps.


I'm adding:

volume trace
  type debug/trace
  subvolumes cache
end-volume

to both sides now as well, so next time it locks up (if it does) perhaps there will be some more info.
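As far as I recall, the glusterfs client uses the top-most volume in the volfile by default, so with "subvolumes cache" the trace volume becomes the new top of the graph; it can also be selected explicitly (volfile path and mount point here are placeholders):

glusterfs --volume-name=trace -f /etc/glusterfs/glusterfs-client.vol /mnt/gluster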


Thanks Shehjar


On 18 Jun 2009, at 11:26, Shehjar Tikoo wrote:


Daniel Jordan Bambach wrote:
I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.

My current config is below (two servers, AFR'ing each other).

I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx. 12GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any useful debugging information, as there is no crash log and no errors in the volume log.

Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?

volume posix
  type storage/posix
  option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  subvolumes locks
  option autoscaling on
  option min-threads 8
  option max-threads 32
end-volume

I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again, but without autoscaling turned on?

It is off by default, so you can simply set the number of threads you need with:

option thread-count 

...instead of the three "option" lines above.

Thanks
Shehjar



volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

volume latsrv2
  type protocol/client
  option transport-type tcp
  option remote-host latsrv2
  option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes brick latsrv2
  option read-subvolume brick
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  subvolumes afr
end-volume

volume cache
  type performance/io-cache
  option cache-size 32MB
  option priority *.pyc:4,*.html:3,*.php:2,*:1
  option cache-timeout 5
  subvolumes writebehind
end-volume


Re: [Gluster-users] Gluster 2.0.2 locking up issues

2009-06-18 Thread Shehjar Tikoo

Daniel Jordan Bambach wrote:
I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.

My current config is below (two servers, AFR'ing each other).

I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx. 12GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any useful debugging information, as there is no crash log and no errors in the volume log.

Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?




volume posix
 type storage/posix
 option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
 type performance/io-threads
 subvolumes locks
 option autoscaling on
 option min-threads 8
 option max-threads 32
end-volume


I see that max-threads will never exceed 32, which is a reasonable value and should work fine in most cases, but considering some of the other reports we've been getting, could you please try again, but without autoscaling turned on?

It is off by default, so you can simply set the number of threads you need with:

option thread-count 

...instead of the three "option" lines above.
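
For example, the io-threads block would then look something like this (the value 8 is purely illustrative, not from the original message):

volume brick
  type performance/io-threads
  subvolumes locks
  option thread-count 8
end-volume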

Thanks
Shehjar



volume server
 type protocol/server
 option transport-type tcp
 option auth.addr.brick.allow *
 subvolumes brick
end-volume

volume latsrv2
 type protocol/client
 option transport-type tcp
 option remote-host latsrv2
 option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes brick latsrv2
  option read-subvolume brick
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  subvolumes afr
end-volume

volume cache
  type performance/io-cache
  option cache-size 32MB
  option priority *.pyc:4,*.html:3,*.php:2,*:1
  option cache-timeout 5
  subvolumes writebehind
end-volume




[Gluster-users] Gluster 2.0.2 locking up issues

2009-06-18 Thread Daniel Jordan Bambach
I'm experiencing various locking-up issues, ranging from Gluster locking up ('ls'-ing the mount hangs) to the whole machine locking up under load.

My current config is below (two servers, AFR'ing each other).

I would love to get to the bottom of this, because it seems very strange that we should see erratic behaviour on such a simple setup.

There is approx. 12GB of files, and to stress test (and heal) I run ls -alR on the mount. This will run for a while and eventually lock up Gluster, and occasionally the machine. I have found that in some cases killing Gluster and re-mounting does not solve the problem (in that perhaps both servers have entered a locked state in some way).

I'm finding it very hard to collect any useful debugging information, as there is no crash log and no errors in the volume log.

Can anyone suggest what I might be able to do to extract more information as to what is occurring at lock-up time?




volume posix
 type storage/posix
 option directory /home/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
 type performance/io-threads
 subvolumes locks
 option autoscaling on
 option min-threads 8
 option max-threads 32
end-volume

volume server
 type protocol/server
 option transport-type tcp
 option auth.addr.brick.allow *
 subvolumes brick
end-volume

volume latsrv2
 type protocol/client
 option transport-type tcp
 option remote-host latsrv2
 option remote-subvolume brick
end-volume

volume afr
  type cluster/replicate
  subvolumes brick latsrv2
  option read-subvolume brick
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 2MB
  subvolumes afr
end-volume

volume cache
  type performance/io-cache
  option cache-size 32MB
  option priority *.pyc:4,*.html:3,*.php:2,*:1
  option cache-timeout 5
  subvolumes writebehind
end-volume


___
Gluster-users mailing list
Gluster-users@gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users