On Thu, Jan 19, 2017 at 3:59 PM, Raghavendra G <raghaven...@gluster.com> wrote:
> The more relevant question would be: with TCP_KEEPALIVE and
> TCP_USER_TIMEOUT on sockets, do we really need a ping-pong framework in
> clients? We might need that in transport/rdma setups, but my question is
> concentrating on transport/rdma.

s/concentrating on transport\/rdma/concentrating on transport\/socket/
(I meant transport/socket above.)

> In other words, I would like to hear why we need a heartbeat mechanism
> in the first place. One scenario might be a healthy socket-level
> connection but an unhealthy brick/client (like a deadlocked one). Are
> there enough such realistic scenarios to make ping-pong/heartbeat
> necessary? In what other ways can a brick/client go bad?
>
> On Thu, Jan 19, 2017 at 3:36 PM, Raghavendra G <raghaven...@gluster.com> wrote:
>
>> On Thu, Jan 19, 2017 at 1:50 PM, Mohammed Rafi K C <rkavu...@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> The patch for priority-based ping packets [1] is ready for review. As
>>> Shyam mentioned in his comment on patch set 12, it solves neither the
>>> problem of network congestion nor that of disk latency. It also won't
>>> prioritize the reply to ping packets at the server end (we don't have
>>> a straightforward way to identify the prognum in the reply).
>>>
>>> So my question is: is it worth taking the patch, or do we need to
>>> think through a more generic solution?
>>
>> Though ping requests can take more time to reach the server due to
>> heavy traffic, realistically speaking the common reasons for
>> ping-timer expiry have been either:
>>
>> 1. the client not being able to read the ping response [2], or
>> 2. the server not being able to read the ping request.
>>
>> Speaking about 2 above: just this morning, Kritika, Pranith and I were
>> discussing an issue where they had hit ping-timer expiry in replicated
>> setups when disk usage was high. The reason for this, as Pranith
>> pointed out, was:
>>
>> 1. posix has some fops (like posix_xattrop, posix_fxattrop) which do
>> syscalls while holding a lock on the inode (inode->lock).
>> 2. During high disk usage, syscall latencies were high (sometimes >=
>> the ping-timeout value).
>> 3. Before being handed over to a new thread at the io-threads xlator,
>> a fop gets executed in one of the threads that read incoming messages
>> from the socket. This execution path includes translators like
>> protocol/server, index, quota-enforcer and marker, and these
>> translators might access inode-ctx, which involves locking the inode
>> (inode->lock). Due to this locking, the latency of the syscall gets
>> transferred to the poller thread. Since the poller thread is waiting
>> on inode->lock, it won't be able to read ping requests from the
>> network in time, resulting in ping-timer expiry.
>>
>> I think Kritika is working on a patch to eliminate the locking on the
>> inode in 1 above. We also need to reduce the actual fop execution done
>> in the poller thread; IOW, we need to hand fop execution over to
>> io-threads/syncop-threads as early as we can. [3] helps in this
>> scenario, as it adds the socket back for polling immediately after
>> reading the entire msg but before execution of the fop begins. So even
>> though fop execution is happening in a poller thread, msgs from the
>> same connection can be read in other poller threads in parallel (and
>> we can scale up the number of epoll threads when load is high).
>>
>> Also, note that there is no way we can send an entire ping request as
>> "URGENT" data over the network. So the prioritization in [1] applies
>> only to the queue of messages waiting to be written to the network.
>> Though I suggested [1], the more I think about it, the less relevant
>> it seems.
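To make the socket-level alternative concrete, something like the
following is what I have in mind. A minimal sketch, assuming Linux; the
function name and the specific timeout values are illustrative, not
taken from any existing patch:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static int
    set_transport_timeouts (int sock)
    {
            int          on      = 1;
            int          idle    = 20;    /* secs idle before 1st probe */
            int          intvl   = 2;     /* secs between probes        */
            int          cnt     = 9;     /* failed probes => peer dead */
            unsigned int user_to = 42000; /* ms data may stay unacked   */

            if (setsockopt (sock, SOL_SOCKET, SO_KEEPALIVE, &on,
                            sizeof (on)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
                            sizeof (idle)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
                            sizeof (intvl)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
                            sizeof (cnt)))
                    return -1;
            /* Linux >= 2.6.37: writes fail once data stays unacked for
             * this long, independent of the keepalive settings. */
            return setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                               &user_to, sizeof (user_to));
    }

With both set, a dead or unreachable peer surfaces as a plain socket
error to the transport, with no RPC-level traffic involved; what this
cannot catch is the connected-but-deadlocked brick case above.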
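On the inode->lock issue in points 1-3 above, the problematic pattern
and one possible rearrangement would look roughly like this (inode_stub
and both getxattr_* helpers are hypothetical stand-ins, not the actual
posix xlator code):

    #include <pthread.h>
    #include <sys/xattr.h>

    struct inode_stub {                  /* stand-in for inode_t */
            pthread_mutex_t lock;
            /* ... inode-ctx, refs, etc. ... */
    };

    /* Problematic pattern: the syscall runs under inode->lock, so a
     * slow disk makes every thread contending on the lock (poller
     * threads touching inode-ctx included) wait out its latency. */
    static ssize_t
    getxattr_locked (struct inode_stub *inode, int fd, const char *key,
                     void *value, size_t size)
    {
            ssize_t ret;

            pthread_mutex_lock (&inode->lock);
            ret = fgetxattr (fd, key, value, size); /* can take seconds */
            pthread_mutex_unlock (&inode->lock);

            return ret;
    }

    /* Rearrangement: do the syscall outside the critical section and
     * take the lock only to publish the result. */
    static ssize_t
    getxattr_unlocked (struct inode_stub *inode, int fd, const char *key,
                       void *value, size_t size)
    {
            ssize_t ret = fgetxattr (fd, key, value, size);

            pthread_mutex_lock (&inode->lock);
            /* update cached state in inode-ctx here */
            pthread_mutex_unlock (&inode->lock);

            return ret;
    }

Note that xattrop needs read-modify-write atomicity, so the real fix
likely needs more than just moving the syscall; this only illustrates
where the latency transfer happens.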
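And for [3], the underlying idea can be sketched with EPOLLONESHOT:
re-arm the descriptor after reading a full message but before executing
the fop. read_whole_message() and execute_fop() are hypothetical stubs
standing in for the real socket/rpc layers:

    #include <sys/epoll.h>

    /* Hypothetical stubs for the sketch. */
    static void read_whole_message (int sock) { (void) sock; }
    static void execute_fop (void) { }

    static void
    handle_pollin (int epfd, int sock)
    {
            struct epoll_event ev = {
                    .events  = EPOLLIN | EPOLLONESHOT,
                    .data.fd = sock,
            };

            read_whole_message (sock);

            /* Re-arm the fd first ... */
            epoll_ctl (epfd, EPOLL_CTL_MOD, sock, &ev);

            /* ... then run the fop: even if this is slow, another
             * poller thread can already be reading the next msg (a
             * ping request, say) from the same connection. */
            execute_fop ();
    }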
>> [2] http://review.gluster.org/12402
>> [3] http://review.gluster.org/15036
>>
>>> Note: We could make this patch more generic, so that any packet can
>>> be marked as priority and added to the head of the queue, instead of
>>> just ping packets.
>>>
>>> [1] http://review.gluster.org/#/c/11935/
>>>
>>> Regards,
>>> Rafi KC
>>
>> --
>> Raghavendra G
>
> --
> Raghavendra G

--
Raghavendra G