Hi all,

Is the 42s timeout tunable? Should the default be made lower, e.g. 3
seconds?

Thanks.
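If I'm reading Kaushal's reply below right, the 42s is the client
translator's ping timeout, which appears to be settable per volume. A
minimal, untested sketch, assuming the 'puppet' volume from James's
test further down:

# gluster volume set puppet network.ping-timeout 10
# gluster volume info puppet

The second command should show the changed value under "Options
Reconfigured".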
On Tue, Feb 11, 2014 at 3:37 PM, Kaushal M <kshlms...@gmail.com> wrote:
> The 42 second hang is most likely the ping timeout of the client
> translator.
>
> What most likely happened was that the brick on annex3 was being used
> for the read when you pulled its plug. When you pulled the plug, the
> connection between the client and annex3 wasn't gracefully terminated,
> and the client translator still sees the connection as alive. Because
> of this the next fop is also sent to annex3, but it will time out, as
> annex3 is dead. After the timeout happens, the connection is marked as
> dead, and the associated client xlator is marked as down. Since afr
> now knows annex3 is dead, it sends the next fop to annex4, which is
> still alive.
>
> These kinds of unclean connection terminations are only handled by
> request/ping timeouts currently. You could set the ping timeout values
> to be lower, to reduce the detection time.
>
> ~kaushal
>
> On Tue, Feb 11, 2014 at 11:57 AM, Krishnan Parthasarathi
> <kpart...@redhat.com> wrote:
> > James,
> >
> > Could you provide the logs of the mount process, where you see the
> > hang for 42s? My initial guess, seeing 42s, is that the client
> > translator's ping timeout is in play.
> >
> > I would encourage you to report a bug and attach relevant logs.
> > If the issue (observed) turns out to be an acceptable/explicable
> > behavioural quirk of glusterfs, then we could close the bug :-)
> >
> > cheers,
> > Krish
> >
> > ----- Original Message -----
> >> It's been a while since I did some gluster replication testing, so I
> >> spun up a quick cluster *cough, plug* using puppet-gluster+vagrant
> >> (of course) and here are my results.
> >>
> >> * Setup is a 2x2 distributed-replicated cluster
> >> * Hosts are named: annex{1..4}
> >> * Volume name is 'puppet'
> >> * Client VMs mount (FUSE) the volume.
> >>
> >> * On the client:
> >>
> >> # cd /mnt/gluster/puppet/
> >> # dd if=/dev/urandom of=random.51200 count=51200
> >> # sha1sum random.51200
> >> # rsync -v --bwlimit=10 --progress random.51200 root@localhost:/tmp
> >>
> >> * This gives me about an hour to mess with the bricks...
> >> * By looking on the hosts directly, I see that the random.51200 file
> >> is on annex3 and annex4...
> >>
> >> * On annex3:
> >> # poweroff
> >> [host shuts down...]
> >>
> >> * On client1:
> >> # time ls
> >> random.51200
> >>
> >> real 0m42.705s
> >> user 0m0.001s
> >> sys 0m0.002s
> >>
> >> [hangs for about 42 seconds, and then returns successfully...]
> >>
> >> * I then power up annex3, and then pull the plug on annex4. The same
> >> sort of thing happens... It hangs for 42 seconds, but then everything
> >> works as normal. This is of course the cluster timeout value and the
> >> answer to life, the universe and everything.
> >>
> >> Question: Why doesn't glusterfs automatically flip over to using the
> >> other available host right away? If you agree, I'll report this as a
> >> bug. If there's a way to do this, let me know.
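A quick aside on the "looking on the hosts directly" step above: with
a 2x2 distribute-replicate volume the file hashes to exactly one
replica pair, so listing the brick directories on each host shows
which pair to unplug. A rough, untested sketch (the brick path below
is a made-up placeholder; the real paths are listed by 'gluster
volume info puppet'):

# for h in annex{1..4}; do ssh root@$h ls /bricks/puppet/random.51200; done  # placeholder brick path

The two hosts where the ls succeeds hold the replicas.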
> >> Apart from the delay, glad that this is of course still HA ;)
> >>
> >> Cheers,
> >> James
> >> @purpleidea (twitter/irc)
> >> https://ttboj.wordpress.com/

--
Sharuzzaman Ahmat Raslan
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users