Brian,

I agree. IMHO, tuning defaults for a specific benchmark rarely benefits
the average user.
Perhaps the default should be the receive buffer size divided by the
block size, with an alternate pre-set available to revert to the current
behaviour. (Even then I'd prefer to see the existing limit tunable
within a defined safe range while that pre-set default was selected in
/etc/system.)
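
For illustration, such a pre-set might look like the sketch below in
/etc/system. The module:variable name here is my assumption (the fusion
code may use something like tcp_fusion_rcv_unread_min), not a documented
tunable:

    * Hypothetical: raise the fused-connection unread-block limit.
    * Variable name is illustrative only; check the fusion source.
    set ip:tcp_fusion_rcv_unread_min = 16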

George

*>Date: Wed, 03 Jan 2007 09:09:45 -0500
*>From: Brian Utterback <[EMAIL PROTECTED]>
*>Subject: [networking-discuss] When should fused TCP connections block.
*>To: [email protected]
*>
*>In the interests of open development, I wanted to get the opinions
*>of the OpenSolaris developers on this mailing list.
*>
*>In Solaris 10, Sun introduced the concept of "fused" TCP connections.
*>The idea is that most of the TCP algorithms are designed to deal with
*>unreliable network wires. However, there is no need for all of that
*>baggage when both ends of the connection are on the same machine,
*>since there are no unreliable wires between them. There is no reason
*>to limit the packet flow because of Nagle, silly window syndrome,
*>or anything else; just put the data directly into the receive buffer
*>and have done with it.
*>
*>This was a great idea; however, a slight modification to the standard
*>STREAMS flow control was added for fused connections. This
*>modification placed a restriction on the number of unread data blocks
*>on the queue. In the context of TCP and the kernel, a data block
*>amounts to the data written in a single write syscall, and the queue
*>is the receive buffer. What this means in practical terms is that the
*>producer process can do only 7 write calls without the consumer doing
*>a read; the 8th write will block until the consumer reads.
*>
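To make that limit concrete, here is a minimal sketch (my own, not from
the fusion code) that should expose it on a loopback connection, where
both endpoints fuse. The writer is non-blocking, so if the behaviour is
as described the 8th one-byte write should fail with EWOULDBLOCK rather
than hang:

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int
    main(void)
    {
        struct sockaddr_in sin;
        socklen_t len = sizeof (sin);
        int lsn, wr, rd, i;

        /* Listener on an ephemeral loopback port. */
        lsn = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        (void) bind(lsn, (struct sockaddr *)&sin, sizeof (sin));
        (void) listen(lsn, 1);
        (void) getsockname(lsn, (struct sockaddr *)&sin, &len);

        /* Connect to ourselves; both ends are local, so TCP fuses. */
        wr = socket(AF_INET, SOCK_STREAM, 0);
        (void) connect(wr, (struct sockaddr *)&sin, sizeof (sin));
        rd = accept(lsn, NULL, NULL);

        /* Non-blocking, so the blocked write returns EWOULDBLOCK. */
        (void) fcntl(wr, F_SETFL, O_NONBLOCK);

        /* Each 1-byte write is one "data block" on the receive queue. */
        for (i = 1; i <= 16; i++) {
            if (write(wr, "x", 1) < 0) {
                (void) printf("write %d failed: %s\n",
                    i, strerror(errno));
                break;
            }
            (void) printf("write %d succeeded\n", i);
        }
        (void) close(rd);
        (void) close(wr);
        (void) close(lsn);
        return (0);
    }

Without the fused-connection limit, all 16 writes should succeed, since
the receive buffer is nowhere near full.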
*>This is done to balance process scheduling and prevent the producer
*>from starving the consumer of cycles to read the data. The limit was
*>determined experimentally, by tuning to get good results on an
*>important benchmark.
*>
*>I am distrustful of the reasoning, and very distrustful of the results.
*>You can see how it might improve performance by reducing the latency.
*>If your benchmark has a producer and a consumer, you want the consumer
*>to start consuming as soon as possible, otherwise the startup cost gets
*>high. Also, by having the producer produce a bunch of data and then
*>having the consumer consume it, you have to allocate more data buffers
*>than might otherwise be necessary. But I am not convinced that it
*>should be
*>up to TCP/IP to enforce that. It seems like it should be the job of
*>the scheduler, or the application itself. And tuning to a particular
*>benchmark strikes me as particularly troublesome.
*>
*>Furthermore, it introduces a deadlock situation that did not exist
*>before. Applications that have some knowledge of the size of the
*>records that they deal with often use MSG_PEEK or FIONREAD to query
*>the available data and wait until a full record arrives before reading
*>the data. If the producer writes the record in 8 or more chunks, the
*>producer will block waiting for the consumer, while the consumer never
*>reads because it is still waiting for the rest of the record to arrive.
*>
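The consumer side of that pattern typically looks something like the
sketch below (read_record and reclen are my names, for illustration).
On Solaris, FIONREAD comes from <sys/filio.h>:

    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/filio.h>          /* FIONREAD (Solaris) */

    /*
     * Wait until a full reclen-byte record is queued, then read it.
     * If the producer needs 8 or more writes to emit the record, the
     * fused-connection limit blocks it after 7, avail never reaches
     * reclen, and producer and consumer wait on each other forever.
     */
    ssize_t
    read_record(int fd, char *buf, int reclen)
    {
        int avail = 0;

        while (avail < reclen) {
            if (ioctl(fd, FIONREAD, &avail) < 0)
                return (-1);
            if (avail < reclen)
                (void) usleep(1000); /* real code might poll() */
        }
        return (read(fd, buf, reclen));
    }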
*>Now this same deadlock was always a possibility with the flow control,
*>but as long as the record size was considerably smaller than the receive
*>buffer size, the application never had to worry about it. With this
*>type of blocking, the receive buffer can effectively be as small as
*>8 bytes, making the deadlock a very real possibility.
*>
*>So, I am open to discussion on this. Is this a reasonable approach to
*>context switching between a producer and consumer, or should the
*>scheduler do this better? Perhaps instead of blocking, the process
*>should just lose the rest of its time slice? (I don't know if that
*>is feasible.) Any thoughts on the subject?
*>
*>blu
*>
*>"The genius of you Americans is that you never make clear-cut stupid
*>  moves, only complicated stupid moves which make us wonder at the
*>  possibility that there may be something to them which we are missing."
*>  - Gamal Abdel Nasser
*>----------------------------------------------------------------------
*>Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
*>Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom


George Shepherd
http://clem.uk/~georges/
==============================================================================
   Solaris Revenue Product Engineering:    |  Sun Microsystems
       Core team  -Internet                |  Guillemont Park
   Email: [EMAIL PROTECTED]          |  Camberley GU17 9QG
   Disclaimer: Less is more, more or less  |  United Kingdom 
==============================================================================
