On 5/19/2015 8:54 AM, Harald Barth wrote:
> 
>> 1444 is the number of octets after subtracting the ip/ip6 and udp/udp6
>> headers for a network with MTU of 1500.
> 
> Yes, but here I was on localhost, and the loopback does have an
> MTU of 16436. If the MTU detection code does the right thing(TM).

Unlike IPv6, with IPv4 there is no reliable path MTU detection for
UDP, nor is there a reliable method of delivering fragmented packets.
Rx doesn't even know which interface a packet is going to be sent
over.  What Rx sees is the set of MTUs for all interfaces, and it must
select the smallest of the set.  Sending a packet larger than that
might result in fragmentation and an undeliverable packet.
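
To make the arithmetic concrete, here is a minimal sketch in C (not the
actual Rx code) of clamping the UDP payload to the smallest interface
MTU; the 20/8-byte IPv4/UDP overheads and the 28-byte Rx header are my
assumptions rather than anything stated above:

    #include <stddef.h>

    #define IPV4_HDR 20   /* IPv4 header without options (assumed) */
    #define UDP_HDR   8   /* UDP header                            */
    #define RX_HDR   28   /* Rx packet header (assumed 28 bytes)   */

    /* Rx cannot know which interface the kernel will route a packet
     * over, so it has to assume the worst case: the smallest MTU of
     * any local interface. */
    static size_t
    smallest_if_mtu(const size_t *mtus, size_t n)
    {
        size_t i, min = mtus[0];
        for (i = 1; i < n; i++)
            if (mtus[i] < min)
                min = mtus[i];
        return min;
    }

    /* Largest Rx data payload that avoids IP fragmentation.  With a
     * 1500-byte MTU and the assumptions above this comes out to
     * 1500 - 20 - 8 - 28 = 1444 octets. */
    static size_t
    max_rx_payload(const size_t *mtus, size_t n)
    {
        return smallest_if_mtu(mtus, n) - IPV4_HDR - UDP_HDR - RX_HDR;
    }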

With IPv6 the rules change.

Rx provides MTU advertisements to its peers, but this mechanism is not
helpful when there is more than one router/switch between the two peers:
neither peer has a complete view of the path MTU, and the paths that
packets travel can be different in each direction.
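
To illustrate why the advertisement only constrains the endpoints (the
names below are mine, not the actual Rx fields): each side can clamp to
the smaller of its own figure and what the peer advertised, but a
lower-MTU hop in the middle remains invisible to both, in either
direction.

    /* What two endpoints can agree on from advertisements alone.  A
     * router or switch in the middle with a smaller MTU is not
     * reflected here, and the forward and reverse paths may differ. */
    static size_t
    advertised_limit(size_t local_mtu, size_t peer_advertised_mtu)
    {
        return local_mtu < peer_advertised_mtu ? local_mtu
                                               : peer_advertised_mtu;
    }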

>> Remember that Rx takes
>> advantage of larger network MTUs via the use of Jumbograms which combine
>> multiple Rx packets into one UDP message.
> 
>> Also remember that jumbo
>> grams are disabled by default because of the negative impact that occurs
>> when using them over the public Internet.
> 
> I think there are two mechanisms which may interact:
> 
> 1. packing multiple RX packets into one UDP packet (mostly a good
>    thing unless high packet drop)
> 
> 2. making UDP packets bigger than the MTU (only a good thing if one
>    knows that there is almost no packet drop and reassembly is cheap)

An Rx packet is never larger than 1500 bytes minus the header sizes.  A
jumbogram consists of several of these Rx packet data payloads combined
into one UDP packet, so that if the large packet cannot be delivered the
Rx packets can still be transmitted individually.  This permits the Rx
packet acknowledgment scoreboard to indicate correctly which packets
have been received and which haven't when jumbograms are in use.
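
As a rough sketch of the idea (illustrative types only, not the actual
Rx structures): each payload keeps its own sequence number inside the
jumbogram, so if the large UDP datagram is lost the same payloads can be
retransmitted one at a time, and the receiver's scoreboard always refers
to individual Rx packets.

    #include <stddef.h>

    #define MAX_PKTS_PER_JUMBOGRAM 6   /* illustrative limit */

    struct rx_payload {
        unsigned int seq;    /* sequence number acked by the scoreboard */
        size_t       len;    /* at most one Rx packet's data size       */
        const char  *data;
    };

    /* A jumbogram is just several Rx payloads carried in one UDP
     * datagram; the payloads themselves are unchanged, so they can be
     * resent individually if the jumbogram is dropped. */
    struct rx_jumbogram {
        size_t            npkts;
        struct rx_payload pkts[MAX_PKTS_PER_JUMBOGRAM];
    };

    /* Receiver side: mark each contained packet as received. */
    static void
    ack_jumbogram(const struct rx_jumbogram *j, unsigned char *scoreboard)
    {
        size_t i;
        for (i = 0; i < j->npkts; i++)
            scoreboard[j->pkts[i].seq] = 1;
    }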

>> The biggest cause of the performance problems you are seeing is Rx.
> 
> My newest measurements make me believe that that's actually not the
> case, because my numbers are equally bad when using voldump or tar
> after forcibly throwing away the buffer cache. No Rx involved at all.
> But when I use tar multiple times the buffer cache seems to help; when
> using voldump it does not. I'll have to confirm my measurements with
> more runs (which each take approx. 2.5 h). Some of the numbers I get
> from strace feel fishy, so I have to confirm them again.

One interpretation of your results is that buffering can help with tar
because the data source can fill the buffer, but with Rx the data source
cannot.

>> The
>> window size is small and every time that the sender doesn't have data to
>> send the connection stalls.
> 
> True.
> 
>> Every time the reader performs disk i/o it
>> stops reading from the stream and the Rx connection stalls.
> 
> Does not seem to be a problem in my case because even when I write to
> /dev/null the numbers are the same.

I used "disk i/o" to describe the steps of parsing the input stream,
deciding what to writing, and writing it.   The actual device is less
important.  When the application is performing these steps the Rx
transfer is stalling.
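
A sketch of what I mean, under my own assumptions about the application
(parse_record/write_record are hypothetical placeholders, not Rx API):
the sender can only make progress while the reader is sitting in
rx_Read(); every moment spent parsing or writing is a moment in which
the small window fills and the transfer stalls.

    #include <rx/rx.h>   /* OpenAFS Rx; rx_Read() blocks for data */

    extern void parse_record(const char *buf, int len);  /* hypothetical */
    extern void write_record(const char *buf, int len);  /* hypothetical */

    static void
    drain_call(struct rx_call *call)
    {
        char buf[16384];
        int  n;

        while ((n = rx_Read(call, buf, sizeof(buf))) > 0) {
            parse_record(buf, n);   /* Rx stalls during these calls ... */
            write_record(buf, n);   /* ... once the window is full.     */
        }
    }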

>> Volume moves, releases and dumps are some of the operations in which the
>> AuriStor Rx stack performance changes are quite noticeable.
> 
> ;-)
> 
> Harald.



