Hi,

When testing HAST synchronization with both the primary and secondary HAST
instances running on the same host, I ran into an issue where the
synchronization can be very slow:

Apr  9 14:04:04 kopusha hastd[3812]: [test] (primary) Synchronization complete. 
512MB synchronized in 16m38s (525KB/sec).

hastd synchronizes data in MAXPHYS (131072 byte) blocks. The sending side
splits them into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the
receiving side reads the whole block with a single recv() call using the
MSG_WAITALL flag.
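
Roughly, the pattern is the following (a simplified sketch, not the actual
hastd code; the helper names are made up):

/*
 * Simplified sketch of the send/receive pattern described above
 * (not the actual hastd code).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <err.h>

#define BLOCK_SIZE	(128 * 1024)	/* MAXPHYS */
#define CHUNK_SIZE	(32 * 1024)	/* MAX_SEND_SIZE */

/* Sender: one MAXPHYS block goes out as four MAX_SEND_SIZE chunks. */
void
send_block(int fd, const unsigned char *buf)
{
	size_t off;

	for (off = 0; off < BLOCK_SIZE; off += CHUNK_SIZE)
		if (send(fd, buf + off, CHUNK_SIZE, 0) != CHUNK_SIZE)
			err(1, "send");
}

/* Receiver: a single recv() is expected to return the whole block. */
void
recv_block(int fd, unsigned char *buf)
{
	if (recv(fd, buf, BLOCK_SIZE, MSG_WAITALL) != BLOCK_SIZE)
		err(1, "recv");
}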

Sometimes recv() gets stuck: in tcpdump I can see that the sending side sent
all chunks and they were all ACKed, but the receiving thread is still waiting
in recv(). netstat reports a non-empty Recv-Q on the receiving side (with the
number of bytes usually equal to the size of the last sent chunk). It looks
as if the receiving userspace was never informed by the kernel that all the
data had arrived.

I can reproduce the issue with the attached test_MSG_WAITALL.c.
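
The idea of the test is roughly the following (this is only a sketch of the
approach, not the attached file itself: the receive buffer is made smaller
than the MSG_WAITALL request, so that soreceive_generic() has to do the
receive in sections):

/*
 * Sketch of a possible MSG_WAITALL reproducer (not the attached
 * test_MSG_WAITALL.c).  The sender pushes each block in small chunks
 * with pauses; the receiver's buffer is smaller than the block.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE	(128 * 1024)	/* MSG_WAITALL request size */
#define CHUNK_SIZE	(32 * 1024)	/* sender's chunk size */
#define RCVBUF_SIZE	(64 * 1024)	/* smaller than BLOCK_SIZE */
#define ITERATIONS	1000

int
main(void)
{
	struct sockaddr_in sin;
	socklen_t slen;
	static unsigned char buf[BLOCK_SIZE];
	ssize_t n;
	size_t off;
	pid_t pid;
	int lsock, sock, rcvbuf, i;

	lsock = socket(PF_INET, SOCK_STREAM, 0);
	if (lsock < 0)
		err(1, "socket");
	/* Make the receive buffer smaller than the MSG_WAITALL request. */
	rcvbuf = RCVBUF_SIZE;
	if (setsockopt(lsock, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
	    sizeof(rcvbuf)) < 0)
		err(1, "setsockopt");
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* any free port */
	if (bind(lsock, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		err(1, "bind");
	slen = sizeof(sin);
	if (getsockname(lsock, (struct sockaddr *)&sin, &slen) < 0)
		err(1, "getsockname");
	if (listen(lsock, 1) < 0)
		err(1, "listen");

	pid = fork();
	if (pid < 0)
		err(1, "fork");
	if (pid == 0) {
		/* Sender: push every block in small chunks with pauses. */
		sock = socket(PF_INET, SOCK_STREAM, 0);
		if (sock < 0)
			err(1, "socket");
		if (connect(sock, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			err(1, "connect");
		for (i = 0; i < ITERATIONS; i++) {
			for (off = 0; off < BLOCK_SIZE; off += CHUNK_SIZE) {
				if (send(sock, buf + off, CHUNK_SIZE, 0) !=
				    CHUNK_SIZE)
					err(1, "send");
				usleep(1000);
			}
		}
		_exit(0);
	}

	/* Receiver: read whole blocks with MSG_WAITALL; hangs if racy. */
	sock = accept(lsock, NULL, NULL);
	if (sock < 0)
		err(1, "accept");
	for (i = 0; i < ITERATIONS; i++) {
		n = recv(sock, buf, BLOCK_SIZE, MSG_WAITALL);
		if (n != BLOCK_SIZE)
			errx(1, "short read: %zd", n);
		printf("block %d received\n", i);
	}
	return (0);
}

With the pauses between chunks the last chunk tends to arrive while the
receiver is between draining the buffer and blocking again, which seems to
be the window for the race.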

I think the issue is in soreceive_generic(). 

If MSG_WAITALL is set but the request is larger than the receive buffer, it
has to do the receive in sections. So after receiving some data it notifies
the protocol about it (calls pr_usrreqs->pru_rcvd), releasing the so_rcv
lock while doing so. On return it blocks in sbwait(), waiting for the rest of
the data. I think there is a race here: while it is in pr_usrreqs->pru_rcvd
without holding the lock, the rest of the data may arrive, so it should check
for this before calling sbwait().

See the attached uipc_socket.c.soreceive.patch. The patch fixes the issue for
me.

Apr  9 14:16:40 kopusha hastd[2926]: [test] (primary) Synchronization complete. 
512MB synchronized in 4s (128MB/sec).

I observed the problem on STABLE, but I believe the same issue is present in
CURRENT.

BTW, I also tried the optimized version of soreceive(), soreceive_stream().
It does not have this problem, but with it I observed TCP connections getting
stuck in soreceive_stream() when starting firefox (with many tabs) or pidgin
(with many contacts). The processes could only be killed with -9. I did not
investigate this much, though.

-- 
Mikolaj Golub

Attachment: test_MSG_WAITALL.c

Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(revision 220472)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -1836,28 +1836,34 @@ dontblock:
 			/*
 			 * Notify the protocol that some data has been
 			 * drained before blocking.
 			 */
 			if (pr->pr_flags & PR_WANTRCVD) {
 				SOCKBUF_UNLOCK(&so->so_rcv);
 				VNET_SO_ASSERT(so);
 				(*pr->pr_usrreqs->pru_rcvd)(so, flags);
 				SOCKBUF_LOCK(&so->so_rcv);
 			}
 			SBLASTRECORDCHK(&so->so_rcv);
 			SBLASTMBUFCHK(&so->so_rcv);
-			error = sbwait(&so->so_rcv);
-			if (error) {
-				SOCKBUF_UNLOCK(&so->so_rcv);
-				goto release;
+			/*
+			 * We could receive some data while we were notifying
+			 * the protocol. Skip blocking in this case.
+			 */
+			if (so->so_rcv.sb_mb == NULL) {
+				error = sbwait(&so->so_rcv);
+				if (error) {
+					SOCKBUF_UNLOCK(&so->so_rcv);
+					goto release;
+				}
 			}
 			m = so->so_rcv.sb_mb;
 			if (m != NULL)
 				nextrecord = m->m_nextpkt;
 		}
 	}
 
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 	if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 		flags |= MSG_TRUNC;
 		if ((flags & MSG_PEEK) == 0)
 			(void) sbdroprecord_locked(&so->so_rcv);