From: Wenji Wu <[EMAIL PROTECTED]>

Greetings,

For Linux TCP, when a network application makes a system call to move data
from the socket's receive buffer to user space, tcp_recvmsg() is invoked and
the socket is locked. During that period, all incoming packets for that TCP
socket go to the backlog queue without receiving TCP processing. Since the
Linux 2.6 kernel can be preempted mid-task, if the network application's
timeslice expires and it is moved to the expired array while holding the
socket lock, none of the packets on the backlog queue receive TCP processing
until the application resumes execution. If the system is heavily loaded,
TCP can easily run into an RTO (retransmission timeout) on the sender side.

Attached is the changelog for the patch.

best regards,

wenji

Wenji Wu
Network Researcher
Fermilab, MS-368
P.O. Box 500
Batavia, IL, 60510
(Email): [EMAIL PROTECTED]
(O): 001-630-840-4541

- Subject

Potential performance bottleneck for Linux TCP (2.6 Desktop, Low-latency 
Desktop)


- Why the kernel needed patching

For Linux TCP, when a network application makes a system call to move data
from the socket's receive buffer to user space, tcp_recvmsg() is invoked and
the socket is locked. During that period, all incoming packets for that TCP
socket go to the backlog queue without receiving TCP processing. Since the
Linux 2.6 kernel can be preempted mid-task, if the network application's
timeslice expires and it is moved to the expired array while holding the
socket lock, none of the packets on the backlog queue receive TCP processing
until the application resumes execution. If the system is heavily loaded,
TCP can easily run into an RTO (retransmission timeout) on the sender side.
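The queueing behavior described above can be sketched in plain user-space C
(this is an illustration only, not kernel code; the struct and function
names here are invented for the example):

```c
/* Minimal model of a socket whose owner is inside tcp_recvmsg(). */
struct rx_sock {
    int locked;        /* 1 while the receiving process holds the socket */
    int backlog_len;   /* packets parked while the socket is held        */
    int processed;     /* packets that have been through TCP processing  */
};

/* Receive path: process the packet now, or park it on the backlog. */
static void rx_packet(struct rx_sock *sk)
{
    if (sk->locked)
        sk->backlog_len++;   /* deferred: the sender sees no ACK yet */
    else
        sk->processed++;     /* normal processing                    */
}

/* Releasing the socket drains the backlog, as release_sock() does. */
static void unlock_sock(struct rx_sock *sk)
{
    sk->processed += sk->backlog_len;
    sk->backlog_len = 0;
    sk->locked = 0;
}
```

If the owner is descheduled while `locked` is set, every arriving packet
piles up on `backlog_len` unacknowledged, which is exactly the window in
which the sender's retransmission timer can fire.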

- The overall design approach in the patch

The underlying idea is that when there are packets waiting on the prequeue
or backlog queue, the data-receiving process should not be allowed to
release the CPU for long.

- Implementation details

We have modified the Linux process scheduling policy and tcp_recvmsg().

To summarize, the solution works as follows: 

An expired data-receiving process with packets waiting on the backlog queue
or prequeue is moved to the active array, instead of the expired array as
usual. More often than not, the expired data-receiving process will simply
continue to run. Even if it doesn't, the wait before it resumes execution
will be greatly reduced. However, this gives the process extra runs compared
to the other processes in the runqueue.

For the sake of fairness, such a process is labeled with the extra_run_flag.
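The requeueing decision can be sketched as follows (a user-space
illustration; the enum, struct, and field names are invented for this
example, not the patch's actual identifiers):

```c
#include <stdbool.h>

enum run_array { ACTIVE_ARRAY, EXPIRED_ARRAY };

struct sched_task {
    bool in_recvmsg;       /* currently inside tcp_recvmsg()            */
    bool pending_packets;  /* packets on this socket's backlog/prequeue */
    bool extra_run_flag;   /* owes a yield() for its extra run          */
};

/* Stock 2.6: an expired task always goes to the expired array.
 * Patched: a data-receiving task with packets waiting stays on the
 * active array, but is flagged so it can give the time back later. */
static enum run_array requeue_expired(struct sched_task *t)
{
    if (t->in_recvmsg && t->pending_packets) {
        t->extra_run_flag = true;
        return ACTIVE_ARRAY;
    }
    return EXPIRED_ARRAY;
}
```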

Also, consider two facts:

(1) the resumed process will continue its execution within tcp_recvmsg();
(2) tcp_recvmsg() does not return to user space until the prequeue and
backlog queue are drained.

For the sake of fairness, we therefore modified tcp_recvmsg() as follows:
after the prequeue and backlog queue are drained, and before tcp_recvmsg()
returns to user space, any process labeled with the extra_run_flag calls
yield() to explicitly give the CPU back to the other processes in the
runqueue. yield() works by removing the process from the active array (where
it currently sits, because it is running) and inserting it into the expired
array.

Also, to prevent processes in the expired array from starving, a special
rule has been added to the Linux process scheduler (the same rule already
used for interactive processes): an expired process is moved to the expired
array regardless of its status if the processes in the expired array are
starved.
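The tail of the modified tcp_recvmsg() can be sketched like this (again a
user-space illustration under invented names; draining and yield() are
modeled with plain counters):

```c
#include <stdbool.h>

struct drain_sock {
    int prequeue;   /* packets waiting on the prequeue      */
    int backlog;    /* packets waiting on the backlog queue */
};

int yield_calls;                    /* stands in for calls to yield() */

static void fake_yield(void) { yield_calls++; }

/* Returns the number of packets delivered to user space. */
static int recvmsg_tail(struct drain_sock *sk, bool *extra_run_flag)
{
    int copied = 0;

    /* tcp_recvmsg() does not return until both queues are drained. */
    copied += sk->prequeue; sk->prequeue = 0;
    copied += sk->backlog;  sk->backlog  = 0;

    /* Fairness: a task that got an extra run gives the CPU back
     * before returning to user space. */
    if (*extra_run_flag) {
        *extra_run_flag = false;
        fake_yield();
    }
    return copied;
}
```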

Changed files:

/kernel/sched.c
/kernel/fork.c
/include/linux/sched.h
/net/ipv4/tcp.c

- Testing results

The proposed solution trades off a small amount of fairness to resolve the
TCP performance bottleneck; it does not cause any serious fairness issues.

The patch is for the Linux 2.6.14 Desktop and Low-latency Desktop kernels.
