Hi Antony,

I haven't read your draft yet, but I will, and I'll comment if it helps.
FYI, I pinged the text below to Valery over the weekend after reading his
draft (which I liked). Since sending it, I have had a thought about a super
long bitmask potentially becoming fragmented itself; however, I'm not sure
how likely that would be.

Cheers

////

4.2.1.5. Implementation Details

When a sender uses the techniques described in Sections 4.1.1 (randomized
fragment ordering) and 4.1.2 (inter-fragment delays), a receiver cannot
immediately distinguish between fragments that have been lost in transit
and fragments that are still en route due to deliberate pacing by the
sender or reordering in the network. A receiver that sends a Receipt
Status Message (Section 4.2.1.4) prematurely, i.e., before all fragments
have had reasonable time to arrive, will request retransmission of
fragments that are not in fact lost. This can result in unnecessary
duplicate traffic, wasted bandwidth, and in severe cases a feedback loop
of spurious retransmissions that worsens the congestion that the
unilateral techniques were designed to alleviate.

To mitigate this, implementations SHOULD adopt a fragment collection
strategy that accounts for the expected arrival pattern of fragments.
This document defines three approaches, in order of increasing
complexity. An implementation MAY support more than one and allow the
operator to select the appropriate mode based on deployment
characteristics.

4.2.1.5.1. Strict Mode

In Strict Mode, the receiver MUST NOT send a Receipt Status Message until
a configurable hold-down timer has expired after receipt of the first
fragment of a message. The hold-down timer value SHOULD be set by the
operator based on knowledge of the network characteristics between the
peers.

The following guidance is provided for selecting hold-down timer values:

- For low-latency, high-reliability networks (e.g., data centre
  interconnects, enterprise LAN): a hold-down timer of 50-200
  milliseconds is RECOMMENDED.
  On such networks, fragments that have not arrived within this window
  are almost certainly lost.

- For typical Internet paths with moderate latency: a hold-down timer of
  500 milliseconds to 2 seconds is RECOMMENDED.

- For high-latency or bandwidth-constrained links (e.g., satellite
  communications, congested mobile networks): a hold-down timer of 3-10
  seconds or more may be necessary. On such links, propagation delay
  alone can be several hundred milliseconds, and the sender may be
  deliberately pacing fragments over an extended period.

Operators SHOULD set the hold-down timer to at least twice the expected
one-way propagation delay of the link. If no hold-down timer is
configured, the implementation MUST use a default value of no less than
1 second.

4.2.1.5.2. Relaxed Mode

In Relaxed Mode, the receiver tracks the arrival times of incoming
fragments and MUST NOT send a Receipt Status Message while fragments are
still arriving at a steady rate. The receiver SHOULD send a Receipt
Status Message only after a quiescence period during which no new
fragments have been received. The quiescence period SHOULD be set to at
least twice the observed mean inter-arrival time of fragments received so
far in the current exchange. This allows the receiver to adapt to the
sender's actual pacing behaviour without prior configuration.

Relaxed Mode is suitable for deployments where the network
characteristics are unknown or variable, as it requires no operator
configuration. However, it may be slower to react to genuine loss than
Strict Mode with a well-tuned timer.

4.2.1.5.3. Adaptive Mode

In Adaptive Mode, the receiver combines both approaches. It uses a
configurable minimum hold-down timer (as in Section 4.2.1.5.1) and
additionally applies quiescence detection (as in Section 4.2.1.5.2). The
receiver MUST NOT send a Receipt Status Message until both conditions are
met: the hold-down timer has expired AND no new fragments have arrived
for the quiescence period.
This mode is RECOMMENDED for general-purpose implementations as it
provides a safety floor via the timer while adapting to actual network
conditions via quiescence detection.

4.2.1.5.4. Interaction with IKEv2 Retransmission Timers

Implementations that use short initial retransmission timers with
exponential back-off (as is common in deployed IKEv2 implementations)
MUST ensure that the fragment collection hold-down period is considered
when calculating retransmission timeouts. If the sender's retransmission
timer fires before the receiver has had time to collect all fragments and
respond with a Receipt Status Message, the sender will retransmit the
entire message (or the first fragment per Section 4.1.3), defeating the
purpose of selective retransmission.

Specifically, when a sender is transmitting a large, fragmented message
and is aware that selective retransmission may be in use, the sender's
retransmission timer for that exchange SHOULD be set to a value no less
than the time required to transmit all fragments (including any
inter-fragment delays) plus a reasonable allowance for the receiver to
process the fragments and return a Receipt Status Message.

4.2.1.5.5. Considerations for Bandwidth-Constrained and High-Latency
           Networks

On satellite communication links and other high-latency, low-bandwidth
networks, the interaction between the techniques described in this
document requires particular care. These networks combine high
propagation delay (often 250 ms or more one-way for geostationary links),
limited bandwidth that makes congestion from spurious retransmissions
particularly costly, and higher baseline packet loss rates that make
selective retransmission most valuable.

This creates a tension: the receiver benefits most from selective
retransmission (because fragments are more likely to be genuinely lost),
but must also wait longest before requesting it (because fragments take
longest to arrive).
Implementations deployed in these environments SHOULD use Adaptive Mode
with a hold-down timer of at least one full round-trip time of the link
and SHOULD err on the side of caution when in doubt.

4.2.1.5.6. Fragment Count as a Receiver Heuristic

The Total Fragments field in the Encrypted Fragment payload (Section 2.5
of [RFC7383]) is available to the receiver from the moment the first
fragment arrives. This value provides a useful implicit signal that the
receiver MAY use to adjust its fragment collection behaviour without
requiring any protocol extension or negotiation.

A message fragmented into a small number of fragments (e.g., fewer than
20) is likely to be fully transmitted by the sender within a short time
window, even with inter-fragment delays. A message fragmented into a
large number of fragments (e.g., 100 or more) will take substantially
longer to transmit, particularly when the sender is using the
rate-limiting technique of Section 4.1.2. The receiver can use the Total
Fragments value to scale its hold-down timer or quiescence period
accordingly.

The following approach is RECOMMENDED. Implementations SHOULD allow the
operator to configure a per-fragment delay estimate (in milliseconds)
representing the expected inter-fragment spacing used by the sender. The
receiver then calculates an adjusted hold-down timer as:

   adjusted_holddown = base_holddown +
                       (Total_Fragments * per_fragment_delay)

where base_holddown is the hold-down timer value as described above. This
ensures that the receiver's timeout window scales linearly with the
number of fragments in the message being received.

Senders can influence receiver behaviour through their choice of fragment
size, which determines the Total Fragments count. A sender on a
low-latency, high-bandwidth link MAY choose a smaller fragment size
(producing more fragments) if it determines that the receiver or
intermediate network can handle the higher packet rate.
Conversely, a sender on a high-latency or bandwidth-constrained link
(e.g., satellite communication) SHOULD use a larger fragment size where
possible to reduce the total number of fragments, thereby reducing both
the transmission time and the window during which the receiver must wait
before concluding that fragments are missing.

The following guidance is provided for sender fragment size selection
based on network characteristics:

- On low-latency, high-bandwidth networks: the sender MAY use the minimum
  fragment size (i.e., the path MTU minus IKEv2 overhead), as the
  receiver can absorb a high packet rate and the resulting large fragment
  count should not cause excessive delay before selective retransmission
  can engage.

- On moderate-latency Internet paths: the sender SHOULD use the path MTU
  as the fragment size, which is the default behaviour defined in
  [RFC7383].

- On high-latency or bandwidth-constrained links: the sender SHOULD avoid
  producing an unnecessarily large number of fragments. Where the path
  MTU permits, a fragment size larger than the minimum SHOULD be used.
  The trade-off is that larger fragments are more costly to retransmit
  individually if lost, but the reduced fragment count allows the
  receiver to engage selective retransmission sooner with greater
  confidence that gaps represent genuine loss.

Implementations SHOULD allow the operator to configure the fragment size
or to select a network profile (e.g., "low-latency", "internet",
"satellite") that sets appropriate defaults for both the fragment size
and the receiver's hold-down parameters.

4.2.1.5.7. Duplicate Fragment Handling

Regardless of the mode in use, a receiver that has already successfully
processed a fragment and subsequently receives a duplicate (whether from
a spurious retransmission or some form of network duplication) MUST
silently discard the duplicate. Implementations MUST NOT treat receipt of
a duplicate fragment as an error condition.
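
To make the Adaptive Mode rule (4.2.1.5.3) and the fragment-count scaling
(4.2.1.5.6) concrete, here is a rough Python sketch of how a receiver might
decide when it is allowed to send a Receipt Status Message. It is only an
illustration under stated assumptions: the names CollectionConfig,
adjusted_holddown, quiescence_period and may_send_receipt_status are
hypothetical, the min_quiescence floor is an assumption that is not in the
text above, times are in seconds rather than milliseconds, and arrival
timestamps are assumed to be monotonic and in ascending order.

```python
from dataclasses import dataclass


@dataclass
class CollectionConfig:
    """Hypothetical receiver settings; all times are in seconds."""
    base_holddown: float = 1.0        # default floor from 4.2.1.5.1
    per_fragment_delay: float = 0.02  # operator's estimate of sender pacing
    min_quiescence: float = 0.05      # assumed floor, not from the draft text


def adjusted_holddown(cfg: CollectionConfig, total_fragments: int) -> float:
    """base_holddown + (Total_Fragments * per_fragment_delay)."""
    return cfg.base_holddown + total_fragments * cfg.per_fragment_delay


def quiescence_period(cfg: CollectionConfig, arrivals: list[float]) -> float:
    """Twice the observed mean inter-arrival time, with an assumed floor."""
    if len(arrivals) < 2:
        return cfg.min_quiescence
    mean_gap = (arrivals[-1] - arrivals[0]) / (len(arrivals) - 1)
    return max(cfg.min_quiescence, 2.0 * mean_gap)


def may_send_receipt_status(cfg: CollectionConfig, total_fragments: int,
                            arrivals: list[float], now: float) -> bool:
    """Adaptive Mode: hold-down expired AND no fragments for the quiet period."""
    if not arrivals:
        return False  # nothing received yet, so no timer is running
    holddown_expired = now - arrivals[0] >= adjusted_holddown(cfg, total_fragments)
    quiet = now - arrivals[-1] >= quiescence_period(cfg, arrivals)
    return holddown_expired and quiet
```

With the defaults above, a 100-fragment message gets a hold-down of about
3 seconds, and a receiver that is still seeing one fragment per second keeps
waiting even after the hold-down has expired, because the quiescence
condition is not yet met.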

On Wed, Mar 18, 2026 at 3:03 PM Antony Antony <[email protected]> wrote:
>
> Hi Valery,
>
> Thanks for taking the time to present your draft at tomorrow's session.
> I quickly went through your slides; I appreciate you including a
> comparison with our draft. I am sorry for my delayed response!
> I updated our draft back in January, and didn't get around to responding.
>
> Thanks for the numbers in your slides. They give a better picture.
>
> On Thu, Dec 11, 2025 at 03:19:18PM +0300, Valery Smyslov wrote:
> > Hi Antony,
> >
> > please, see inline.
> >
> > > Hi Valery,
> > > Thank you for the detailed feedback.
> > >
> > > I have been looking through the simultaneous-initiation case you
> > > describe, where both peers have just completed an IKE SA rekey and
> > > therefore begin with Message ID 0 on each side. One situation can be
> > > slightly problematic when there are delayed responses; however, I
> > > don't see any case where the proposed ack would fail to advance the
> > > negotiation.
> > >
> > > Still, to make it clear, at the end I am proposing two
> > > direction-specific Notifiers instead of one.
> >
> > This would help. However, it won't work if some future (imaginary) IKE
> > extension makes each exchange use a different key (e.g., as
> > KDF(SK_ex, MSG-ID)).
>
> Once this draft is standardized, any such future (imaginary) extension
> would need to accommodate the existing mechanism regardless. More
> importantly, your proposal has the same property: the Receipt Status
> Message is sent with the same Message ID as the original exchange, so you
> also have two messages sharing a Message ID: the receipt status and the
> actual IKE response. The concern applies equally to both drafts.
>
> As for the Message ID as AEAD counter: yes, implementations need to
> handle this carefully, but it is less of a protocol correctness issue.
> Implementations can track the context and create a monotonic counter as
> IV.
>
> >
> > > Here is how I see the case you described.
> > > I am using CREATE_CHILD_SA as an example. The analysis would be
> > > similar for other exchanges too.
> > >
> > > 1. Simultaneous CREATE_CHILD_SA requests after rekey
> > > In the simplest case:
> > >
> > > ---- IKE SA Rekeyed both ends Message ID 0
> > > Request
> > > Initiator Responder
> > >
> > > MID(0) CREATE_CHILD_SA ----> <------ MID(0) CREATE_CHILD_SA
> > > FACK(MID=0, response flag=1) ---> <------ FACK(MID=0, response flag=1)
> > >
> > > Since each peer knows it has an outstanding request with MID=0,
> > > the received FACK(MID=0,R=1) can be unambiguously associated with its
> > > own outstanding request.
> >
> > Yes.
> >
> > > 2. Case where one peer has advanced its CREATE_CHILD_SA exchange and
> > > the response is lost
> > >
> > > A more interesting scenario is when both peers send the
> > > CREATE_CHILD_SA request, but one peer sends its response and then
> > > advances its internal state, while the response is lost:
> > >
> > > The actual CREATE_CHILD_SA response fragments are lost. And the
> > > initiator responds with FACK(MID=0, response flag)
> > >
> > > MID(0) CREATE_CHILD_SA ----> <------ MID(0) CREATE_CHILD_SA
> > > FACK(MID=0, response flag=1) ---> <------ Partial Retransmit (MID=0)
> > >
> > > MID(0) CREATE_CHILD_SA response flag=1 ---->
> > >
> > > <------MID(0) CREATE_CHILD_SA response
> > > flag=1 ---->
> > > <------ FACK(MID=0, response flag=1)
> > >
> > > Here, once the responder has advanced past CREATE_CHILD_SA, any FACK
> > > it receives later clearly corresponds to the response it sent.
> > > The initiator can correctly attribute that FACK to the outstanding
> > > response it is waiting for.
> >
> > I meant the case:
> >
> > MID(0) CREATE_CHILD_SA ----> <------ MID(0) CREATE_CHILD_SA
> > FACK(MID=0, response flag=1) ---> (1) (delayed)
> >
> > <---- MID(0) CREATE_CHILD_SA response flag=1
> > FACK(MID=0, response flag=1) ---> (2)
> > (1 received)
> >
> > Message (1) is the FACK response to the responder's request while
> > message (2) is the FACK response to responder's response to initiator's
> > request.
> > The responder cannot distinguish these two messages.
> > I agree that making the content different would help (but see above),
> > but in general this is a headache to implement (since it violates the
> > steps the incoming message is processed - it is processed in a context
> > of a particular exchange that is determined before the message is
> > parsed).
> >
> > > 3. Delayed or misordered FACK messages
> > > I agree there are corner cases where a delayed FACK may arrive late
> > > and overlap with another exchange with the same MID, 0 in this case.
> > > However, in these cases processing the FACK as a hint rather than a
> > > state-advancing message does not break protocol correctness.
> > > At worst, a late FACK would simply cause an extra re-transmit of
> > > fragments that already arrived.
> > >
> > > Addressing your core concern: distinguishing request-side vs
> > > response-side acknowledgments
> > >
> > > To address the case where a FACK for a request and a FACK for a
> > > response may look identical (same MID, same exchange type, same R
> > > flag), I agree this could lead to an unnecessary ambiguity in
> > > simultaneous-initiation scenarios.
> > >
> > > To resolve this cleanly, I propose defining two separate Notify
> > > Status Types:
> > >
> > > FRAGMENT_ACK_REQ: acknowledgment of fragments belonging to a request
> > >
> > > FRAGMENT_ACK_RES: acknowledgment of fragments belonging to a response
> > >
> > > These two notifiers would make the semantic direction explicit,
> > > eliminating any ambiguity you describe even in simultaneous exchanges
> > > with identical Message IDs.
> >
> > > more responses below inline.
> > >
> > > On Wed, Nov 26, 2025 at 03:24:15PM +0300, Valery Smyslov wrote:
> > > > HI Antony,
> > > >
> > > > I doubt that this proposal is workable, at least in some situations.
> > > > Consider the IKE SA was just rekeyed, so that each peer starts
> > > > its first exchange with Message ID = 0. And consider they
> > > > simultaneously initiate same exchange, say CREATE_CHILD_SA. And
> > > > consider the response messages need fragmentation. Then the
> > > > "response to response" messages will have the same Message ID (0)
> > > > and the same exchange type and the same "response flag" as the
> > > > regular response message for the other exchange. Moreover, they
> > > > both can have the same content - FRAGMENT_ACK notify.
> > > > It is impossible for the receiver to find the exchange this message
> > > > belongs to.
> > > > (OK, I can imagine a lot of possible approaches in this situation -
> > > > e.g., ignore such messages or process them for both exchanges since
> > > > it is only a hint, but this decreases the value of this extension).
> > > >
> > > > In addition, you have to disable (or somehow tweak) a replay
> > > > protection mechanism in IKEv2 since you should be able to process
> > > > different messages with the same Message ID.
> > > > And you already said that retransmission behavior of responders is
> > > > also changed.
> > > >
> > > > Overall, the proposed solution looks like a protocol hack to me and
> > > > I'm not sure it is so easy to implement (taking into consideration
> > > > all possible cases).
> > > >
> > > > I think that depending on the nature of packet loss and the maximum
> > > > size of the message, several approaches are possible.
> > > >
> > > > 1. If the message size is of few tens of Kbytes (so that the number
> > > >    of fragments is few tens), then the simplest solution would be
> > > >    either to randomize the order fragments are sent when
> > > >    retransmitted (or just shift them) and/or add some small delay
> > > >    (20-50 ms) between sending each fragment. This will cope with
> > > >    situation when network is quickly saturated or the receiver's
> > > >    buffers are too small and receiver's performance is
> > > >    insufficient. In this case only the first few fragments are
> > > >    processed and the rest is dropped. Both solutions (changing the
> > > >    order of fragment and introducing delay) should help. They are
> > > >    both easy to implement and don't require protocol change.
> > >
> > > This is a good idea. Thanks.
> > > Also note RFC7383 states every retransmit must include the first
> > > segment. Our proposal relaxes this requirement when responding to
> > > FRAGMENT_ACK_*, because the first is received.
> >
> > This is incorrect, RFC 7383 does not contain this requirement.
> > RFC 7383 says (or tries to say) that when responder has already sent
> > the (possibly fragmented) response and it receives some (retransmitted
> > or delayed) fragments of the request (which the responder has already
> > processed), then the responder must only re-send its response if the
> > received fragment number is 1 (the first fragment).
> >
> > Thus, the first fragment has a special meaning for the responder when
> > it decides whether to re-send the response, but the initiator is free
> > to send any subset of fragments at any time (as well as the responder).
> >
> > > > 2. If the message size is of several hundreds of Kbytes (so that
> > > >    the number of fragments is few hundreds), then the above
> > > >    approach might not help. In this situation your proposal may not
> > > >    help too, because the size of FRAGMENT_ACK can grow so much,
> > > >    that the message containing it would be fragmented itself. In
> > > >    addition, if the reason of the packet loss is also network
> > > >    saturation or insufficient buffer size on receiver, then even
> > > >    with individual acks the process may still not converge (you
> > > >    still send a lot of extra data with each retransmission, that
> > > >    adds to the problem). In this situation the preferred solution
> > > >    would be to redefine IKE exchanges, perhaps splitting them into
> > > >    two sub-exchanges, where peer send a series of fragments one by
> > > >    one each individually acknowledged (and not all fragments at
> > > >    once).
> > > >
> > > > 3. If the message size is more than 1 Mbyte, then it is not
> > > >    possible to use UDP with IKE fragmentation in its current form
> > > >    regardless of how fragments are sent and acknowledged, because
> > > >    the number of fragments is limited to 2^16, thus TCP should be
> > > >    used.
> > >
> > > Yes. This is out of scope until IKEv2 extends fragment numbers, which
> > > at this point I think is a simple update to RFC7383 to extend "Total
> > > Fragments" and "Fragment Number" to 32-bit numbers from the current
> > > 16 bits. I tried to write it down! The proposed Fragment Ack could
> > > support 32-bit versions as well.
> >
> > I don't think that extending fragments number to 2^32 has practical
> > sense.
> > With 2^16 and the size of fragment around 500 bytes it is enough to
> > transfer 32 Mbytes of data. I'm very skeptical that even with the help
> > of acks but w/o any congestion control transferring that much data will
> > go smoothly.
> >
> > > > And if network just randomly drops packets (I assume there is no
> > > > congestion problems), then your proposal won't help much (in my
> > > > opinion).
> > > >
> > > > I believe we are now at situation #1. Thus I think that simpler
> > > > approaches should help.
> > > > If we sometime reach situation #2 (e.g., if we use Classic McEliece
> > > > with the smallest public keys), then proposals like yours can be
> > > > considered (but I prefer less hacking approaches).
> > >
> > > I am trying to be a bit less of a hack with two notifiers!
> >
> > Thinking more about this I come up to an alternative proposal:
> > https://datatracker.ietf.org/doc/draft-smyslov-ipsecme-ikev2-fragm-large-msg/
> >
> > Comparing to yours it has (as I believe) the following advantages:
> > - request/response semantics is preserved - no "response to response"
> > - retransmission logic is preserved - initiator is always an active side
> > - IKE replay protection is not affected
> > - no layer violation - the extension can be entirely implemented in the
> >   IKE fragmentation code, upper layers (e.g., message parsing and
> >   forming) are not affected
> > - RFC 7383 PMTU discovery is supported
> > - traffic overhead is smaller in most cases (but I agree that not in all)
> > - receipt status messages are protected against replays
> > - no negotiation is needed (not a real advantage, just a feature that
> >   can be changed in future)
> >
> > My proposal also has one small hack (or a trick), but it is not
> > immanent to the proposal, there are several ways how to avoid it (and
> > perhaps it is not needed at all, this is just in case).
>
> The ICV trick is interesting.
> It is smart, and I wonder whether it would be an interop risk: a
> non-supporting peer sees an ICV failure and must decide whether to
> re-check. No negotiation means no clean capability signaling. Using a
> notifier is my preference. I vote to negotiate.
>
> Most of the other points are, in my opinion, a matter of design
> preference, and I have mine. One concrete reason I strongly prefer ranges
> over a bitmap: ranges are far easier to inspect in practice, both in
> Wireshark dissectors and in plain log output, which matters for
> diagnostics and interop testing. A bitmap requires bit-level decoding; a
> (start, count) pair is immediately human-readable.
>
> The remaining concerns you raised are addressed in v3 of our draft:
>
> I am also open to merging the two approaches: keep Valery's ICV trick
> to avoid negotiation, but use Notify payloads with ranges instead of
> a bitmap. This would combine the cleaner diagnostics and human-readable
> encoding of ranges with the no-negotiation property of Valery's design.
>
> Would others in the WG like to weigh in?
>
> Looking forward to tomorrow's presentation, and hoping we have time
> during the session to discuss both drafts.
>
> regards,
> -antony
>
> _______________________________________________
> IPsec mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
