On Tue, Jan 24, 2017 at 10:39 AM, Vijay Bellur <vbel...@redhat.com> wrote:
>
> On Thu, Jan 19, 2017 at 8:06 AM, Jeff Darcy <jda...@redhat.com> wrote:
>
>> > The more relevant question would be with TCP_KEEPALIVE and
>> > TCP_USER_TIMEOUT on sockets, do we really need ping-pong framework
>> > in Clients? We might need that in transport/rdma setups, but my
>> > question is concentrating on transport/rdma. In other words would
>> > like to hear why do we need heart-beat mechanism in the first place.
>> > One scenario might be a healthy socket level connection but an
>> > unhealthy brick/client (like a deadlocked one).
>>
>> This is an important case to consider. On the one hand, I think it
>> answers your question about TCP_KEEPALIVE. What we really care about
>> is whether a brick's request queue is moving. In other words, what's
>> the time since the last reply from that brick, and does that time
>> exceed some threshold?
>
> I agree with this.
>
>> On a busy system, we don't even need ping packets to know that. We can
>> just use responses on other requests to set/reset that timer. We only
>> need to send ping packets when our *outbound* queue has remained empty
>> for some fraction of our timeout.
>>
>

Do we need ping packets sent even when the client is not waiting for any
replies? I assume not. If there are no responses to be received and no
requests being sent to a brick, why would a client be interested in the
health of the server/brick?

>
>> However, it's important that our measurements be *end to end* and not
>> just at the transport level. This is particularly true with
>> multiplexing, where multiple bricks will share and contend on various
>> resources. We should ping *through* client and server, with separate
>> translators above and below each. This would give us a true end-to-end
>> ping *for that brick*, and also keep the code nicely modular.
>>
>

Agree with this. My understanding is that the ping framework is a tool to
identify unhealthy bricks (we are interested in bricks because they are
the ones that serve fops).
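
For reference, the timer scheme Jeff describes above (any reply resets the
timer, and a ping is needed only after the outbound side has been idle for
some fraction of the timeout) could be sketched roughly as below. This is
only an illustration in Python, not Gluster code: the class, its method
names, and the 0.5 default fraction are all made up, and skipping pings
while requests are still outstanding is my own simplification.

```python
import time


class BrickLiveness:
    """Hypothetical per-brick liveness tracker driven by reply traffic."""

    def __init__(self, timeout, ping_fraction=0.5, clock=time.monotonic):
        self.timeout = timeout                        # max seconds to wait for any reply
        self.idle_before_ping = timeout * ping_fraction
        self.clock = clock                            # injectable so it can be tested
        now = clock()
        self.last_send = now                          # last request of any kind (fop or ping)
        self.waiting_since = now                      # start of the current wait for a reply
        self.outstanding = 0                          # requests still awaiting a reply

    def on_send(self):
        """Call when any request (fop or ping) is handed to the transport."""
        now = self.clock()
        if self.outstanding == 0:
            self.waiting_since = now                  # a fresh wait starts here
        self.last_send = now
        self.outstanding += 1

    def on_reply(self):
        """Call on any reply; a normal fop reply counts as much as a pong."""
        self.waiting_since = self.clock()             # reset the timer
        self.outstanding = max(0, self.outstanding - 1)

    def should_ping(self):
        """True when nothing is in flight and the outbound side has been idle."""
        idle = self.clock() - self.last_send
        return self.outstanding == 0 and idle >= self.idle_before_ping

    def is_unresponsive(self):
        """True when we have waited a full timeout without a single reply."""
        waited = self.clock() - self.waiting_since
        return self.outstanding > 0 and waited >= self.timeout
```

A caller would check should_ping() and is_unresponsive() periodically.
Note that in this sketch a client with nothing in flight is never declared
unresponsive, so the only thing an idle ping adds is earlier detection.
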
With that understanding, ping-pong should be end to end (to whatever
logical entity constitutes a brick). However, where in the brick xlator
stack should ping packets be answered? Should they go all the way down
to storage/posix?

>
> +1 to this. Having ping, pong xlators immediately above and below protocol
> translators would also address the problem of epoll threads getting blocked
> in gluster's xlator stacks in busy systems.
>
> Having said that, I do see value in Rafi's patch that prompted this
> thread. Would it not help to prioritize ping - pong traffic in all parts of
> the gluster stack including the send queue on the client?
>

I have two concerns here:

1. The responsiveness of a brick to a client invariably includes the
latency of the network and of our own transport's io-queue. Wouldn't
prioritizing ping packets over normal data give us a skewed view of the
brick's responsiveness? For example, on a network with heavy traffic,
ping-pong might be succeeding while fops are moving very slowly. What do
we achieve with a successful ping-pong in that scenario? Also, in the
opposite scenario, where a ping timeout fires and we disconnect the
transport, does our response achieve anything substantially good? Maybe
it helps bring down the latency of syscalls (as experienced by the
application), since our HA translators like afr and EC add the latency
of identifying a disconnect (or a successful fop) to the latency of
syscalls. As developers, many of us keep wondering what we are trying to
achieve with a heartbeat mechanism.

2. Assuming that we want to prioritize ping traffic over normal traffic
(which we logically do today, as ping packets don't traverse the entire
brick xlator stack all the way down to posix and instead short-circuit
at protocol/server), the fix under discussion here is partial (as we
can't prioritize ping traffic ON the WIRE or through the tcp/ip stack).
While I don't have strong objections to it, I feel that it's a partial
solution and might be inconsequential (just a hunch, no data). However,
I can accept the patch if we feel it helps.

> Regards,
> Vijay
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>

-- 
Raghavendra G
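
To make concern 1 concrete, here is a toy sketch of a client send queue
in which pings overtake queued fops. It is not Gluster code and not
Rafi's patch: the SendQueue class, the two priority classes, and the
payload strings are invented for illustration only.

```python
import heapq
import itertools

PING, FOP = 0, 1   # hypothetical priority classes; the lower value is sent first


class SendQueue:
    """Toy client-side send queue that always drains pings before fops."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order within a class

    def push(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._seq), payload))

    def pop(self):
        _kind, _seq, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


q = SendQueue()
for i in range(3):
    q.push(FOP, f"write-{i}")           # fops already waiting in the queue
q.push(PING, "ping")                    # the ping arrives last

order = [q.pop() for _ in range(len(q))]
print(order)
```

The ping is queued last but leaves first, so its round trip says nothing
about how long the three writes sat in the queue. The reordering also
stops at the socket: once bytes are handed to the kernel, the ping waits
its turn like everything else, which is the partial-fix point in
concern 2.
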