Folks,

So for the other folks on the thread, I should clarify: we are
talking about RFC 4755, which is the "connected mode" of IPoIB.
In that mode Linux tries to use a 65520-byte MTU. I am not sure
how they arrived at that 64K - 16 value. Of course it goes
without saying that we need to interoperate too. RFC 4755
specifies a negotiation method: each end advertises the MTU it
wants and both ends use the minimum.
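
A minimal sketch of what that negotiation boils down to (the names
here are mine, not anything in an actual driver), including where
the Linux 65536 - 16 = 65520 figure would land:

    #include <stdint.h>

    /*
     * Hypothetical sketch only -- not actual IPoIB code.  Per RFC 4755,
     * each side advertises the MTU it can accept, and both ends then
     * use the minimum of the two advertised values.
     */
    #define IPOIB_CM_MTU_LINUX  (65536 - 16)   /* 65520, what Linux offers */

    static uint32_t
    ipoib_cm_negotiated_mtu(uint32_t local_mtu, uint32_t remote_mtu)
    {
        return (local_mtu < remote_mtu ? local_mtu : remote_mtu);
    }

    /* e.g. ipoib_cm_negotiated_mtu(65535, IPOIB_CM_MTU_LINUX) == 65520 */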

My hope is that our stack can handle a similarly large value,
around 64K. It sounds like for simplicity (and RFC 4755
recommends this as well) we should try to use a maximum value
that works for both IPv4 and IPv6, which based on the previous
conversation sounds like 64K (though it would get negotiated
down to 65520 when talking to Linux).
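
To spell out the arithmetic behind "works for both" (the limits
are from James's note below): the IPv4 total-length field is 16
bits and includes the header, while the IPv6 payload-length field
is also 16 bits but excludes the 40-byte header, so:

    IPv4:  16-bit total length   -> max datagram = 65535 (header included)
    IPv6:  16-bit payload length -> max datagram = 65535 + 40 = 65575
    common ceiling               =  min(65535, 65575) = 65535  (~64K)
    against Linux                =  min(65535, 65520) = 65520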

But ultimately, it sounds like measurements are the only way
to tell whether we will see a performance drop-off with really
big sizes.

I am not as concerned about checksum stability, since IB has
its own stronger 32-bit end-to-end CRC (plus a 16-bit CRC
covering the much smaller set of fields that change
hop-by-hop).

-ted

Kevin Ge wrote:
> Hi Garrett and James,
> 
> Thanks for your answers. We are working on large MTU support in 
> IPoIB, and we will do more work to explore what MTU size suits 
> the IPoIB case.
> 
> Best Regards
> Kevin Ge
> 
> Garrett D'Amore wrote:
>> James Carlson wrote:
>>  
>>> Kevin Ge writes:
>>>      
>>>> Hi Garrett,
>>>>
>>>> Yes, the IPv4 protocol has a 64K limit. But what about IPv6? Is IPv6 also 
>>>> limited to 64K in Solaris?
>>>>           
>>> It's roughly the same -- 65575 to be exact (because the 16-bit IPv6
>>> payload length field, unlike IPv4, doesn't include the header length).
>>> If you look at the ip6.c source, you'll see:
>>>
>>>             case IP6OPT_JUMBO:
>>>                 if (hdr_type != IPPROTO_HOPOPTS)
>>>                     goto opt_error;
>>>                 goto opt_error; /* XXX Not implemented! */
>>>
>>> ... so we don't have RFC 2675 jumbograms.
>>>
>>> I guess I'd be surprised if it's terribly useful anyway.  Even if you
>>> can arrange to have your applications somehow offer that much in one
>>> go, you're talking about an _incredibly_ tiny reduction of overhead,
>>> at least at the protocol level.  At 65575, the overhead is about
>>> 0.09%.
>>>
>>> Yes, I realize that jumbograms are a _system_ overhead game and not a
>>> protocol game, but I think both have to be balanced.  As the packet
>>> size goes up, both the probability and the cost of a drop increase,
>>> effectively wiping out the benefits.  I'd expect that users need to
>>> avoid both the per-packet overhead imposed by tiny packets at one end
>>> of the spectrum *and* the overall cost imposed by huge ones at the
>>> other end.
>>>
>>> The other small problem is hardware ... Ethernet supporting frames
>>> that long would have clock stability problems, wouldn't it?  I had
>>> thought that the situation today was that anything over 16K or so was
>>> experimental, as in:
>>>
>>>   http://tinyurl.com/3tdav3
>>>
>>> Obviously, Ethernet's not the only game in town, but with little
>>> network-wide support for monsters of that sort, it'd probably be
>>> pretty hard to deploy.
>>>       
>>
>> There are other problems with Ethernet.
>>
>> The frame checksum is not reliable when the size gets too big.  (ISTR 
>> it starts to become unreliable somewhere above 13 or 14K; Google will 
>> tell you more.)
>>
>> Also, frankly, the effort to support extremely large frames adds 
>> quite a lot of complexity to drivers.  Going beyond the typical 8 or 
>> 9K cases may land you on some extremely slow paths in those drivers, 
>> and in the kernel itself.  (Think, for example, of the challenge of 
>> repeatedly allocating contiguous memory for such frames -- or, for 
>> that matter, the IOMMU impact of churning through such frames.)
>>
>> I've not done any measurements myself, but I wouldn't be surprised to 
>> see performance drop (and system overhead increase) when frame sizes 
>> exceed 16k.
>>
>>     -- Garrett
> 

-- 
Ted H. Kim
Sun Microsystems, Inc.                  [EMAIL PROTECTED]
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245                   (310) 341-1120 FAX
_______________________________________________
networking-discuss mailing list
[email protected]