Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-performance

Paul Wouters Thu, 16 Nov 2023 05:31:13 -0800

On Tue, 14 Nov 2023, Valery Smyslov wrote:

I support publication of this draft. I'm glad authors took my points into 
consideration
while preparing the latest version. I do have some comments though.


1. Section 1

  IKEv2 [RFC7296] already allows installing
  multiple Child SAs with identical Traffic Selectors, but it offers no
  method to indicate that the additional Child SA is being requested
  for performance increase reasons and is restricted to some resource
  (queue or CPU).  Without this indication, the peer might not accept
  multi Child SAs with identical Traffic Selectors and might delete one
  of the Child SAs it considered an unwanted duplicate.

There is some inconsistency here. You first say that IKEv2 allows
creating multiple identical Child SAs and then say that implementations
would delete them as unwanted duplicates. Clearly these implementations
violate RFC 7296, and we don't consider broken implementations, do we? :-)
I suggest to remove the last sentence, or to add a clarification.


Done.

2. Section 2

  There are a number of practical reasons why most Implementations have
  to limit a Child SA to only one specific hardware resource, but a key
  limitation is that sharing the crypto state, counters and sequence
  numbers between multiple CPUs is not feasible without a significant
  performance penalty.

Shouldn't it be clarified, that the performance problems arise
if you use an SA by several CPUs at the same time? I don't think
there are problems if you use the SA by several CPUs at different time.
Consider you have an SA with a traffic one packet per hour.
Each time it is processed up by a different CPU, then the resulted
state is stored in the shared memory. So, perhaps

s/one specific hardware resource/one specific hardware resource at any given 
time


I changed it to:

       but a key limitation is that sharing the crypto state, counters
       and sequence numbers between multiple CPUs that are trying to
       use these shared states at the same time is not feasible without
       a significant performance penalty.

3. Section 3

  If multiple Child SAs with the same Traffic Selectors are
  desired, the initiator will add the SA_RESOURCE_INFO notify payload
  to the Exchange negotiating the Child SA (eg IKE_AUTH or
  CREATE_CHILD_SA).  If this initial Child SA will be tied to a
  specific resource, it MAY indicate this by including an identifier in
  the Notification Data.  A responder that is willing to have multple
  Child SAs for the same Traffic Selectors will respond by also adding
  the SA_RESOURCE_INFO notify payload in which it MAY add a non-zero
  notify data payload.

This text is a bit inconsistent with IKEv2 specification.
In my reading the text implies that unless you exchange SA_RESOURCE_INFO,
you cannot initiate multiple SAs with same selector, which is wrong -
you can do it at any time if you follow RFC 7296.
Only if you want to follow this draft (i.e. - associate each Child SA with
some resource, and be able to limit their number with TS_MAX_QUEUE)
you have to negotiate. I think that this subtle thing should be expressed more 
accurately.


Changed to:

        If multiple Child SAs with the same Traffic Selectors that
        are bound to a single resource are desired, [...]

4. Section 3

  These resource-
  specific Child SAs MUST be negotiated with identical Child SA
  properties that were negotiated for the initial Child SA.  This
  includes cryptographic algorithms, Traffic Selectors, Mode (e.g.
  transport mode), compression usage, etc.  However, each Child SA does
  have its own keying material that is individually derived according
  to the regular IKEv2 process.

I think that MUST is over-restrictive if we talk about crypto algorithms.
For crypto algorithms herhaps something along the lines:

SHOULD be negotiated with the same crypto algorithms;
if they differ, then they MUST provide identical level of protection.


I don't agree. We are basically making "clones", so I see no reason why
to make this more complicated by allowing things to be more different in
various ways. The negotiation for the first child and second child
should follow from the same configuration so it should really also end
up returning the same crypto algorithms. Unless you would make choics
based on trigger packet on specific resource, but that's something I
am happy to not create another knob for. If others really feel similar
to Valery, please speak out. We would also than need to make a clear
list of all the knobs that MUST be the same and all the ones that SHOULD
be the same. I'd rather not make that list :P

(I agree that Mode and Traffic Selectors MUST be the same, not sure about 
compression).

5. Section 3

  During the CREATE_CHILD_SA rekey for the Child SA, the
  SA_RESOURCE_INFO notification MAY be included, but regardless of
  whether or not it is included, the rekeyed Child SA MUST be bound to
  the same resource(s) as the Child SA that is being rekeyed.

Isn't binding a local matter? Doesn't peer bother to what resource you bound
your end of an SA? Then why is there this MUST? What happens if I bound new SA
to a different CPU - peer never know this, so how it will check that you follow 
this MUST?
I think that instead of this requirement there should be just a recommendation 
for implementers
(with no BCP14 language).

i
changed MUST to should.

6. Section 4

  A simple distribution could be to install one additional Child SA on
  each CPU.  An implementation MAY ensure that one Child SA can be used
  by all CPUs ...

I believe it should be "can" instead of "MAY", since it is a local matter.


I do think local matters can still have BCP14 language. eg MUST use
strong random, or MUST use a secure time source. I would rather change
this MAY to a SHOULD than to a "can" :P

7. Section 4

  When the number of queue or CPU resources are different between the
  peers, the peer with the least amount of resources MAY decide to not
  install a second outbound Child SA for the same resource as it will
  never use it to send traffic.

Again, I think it should be "can" instead of "MAY", since it is a local matter.


Here I don't really care, so changed to "may" (instead of "can" as it
feels better to me, but someone with better english please speak out
which is better)

8. Section 4

  If per-CPU SADB_ACQUIRE messages are implemented (see Section 6), the
  Traffic Selector (TSi) entry containing the information of the
  trigger packet SHOULD be included in the TS set.  This information
  MAY be used by the peer to select the most optimal target CPU to
  install the additional Child SA on.

The conditional part of the first sentence is too implementation specific:
the SADB_REQUIRE is specific to PF_KEYv2 API and not all implementations use it
(and this specification doesn't depend on it). Just replace it with
some more generic text, like "dynamic creating of per-CPU Child SAs" providing 
a single
a reference to an SADB_ACQUIRE as an example. This is true for Section 6 too,
where SADB_ACQUIRE should only be used as an example and not as
prescribed mechanism.


Fixed.

Then, this SHOULD is already specified in RFC 7296, Section 2.9:

  To enable the responder to choose the appropriate range in this case,
  if the initiator has requested the SA due to a data packet, the
  initiator SHOULD include as the first Traffic Selector in each of TSi
  and TSr a very specific Traffic Selector including the addresses in
  the packet triggering the request.

Why repeat this requirement there? Perhaps just reference Section 2.9 of RFC 
7296?


Because the example is specific to CPU-bound resources here and tweaking
the decision based on that. I added the reference to 2.9 though.

And finally, "MAY" should be "can", since it is a local matter of responder.


Done.

9. Section 5.1, Section 5.2

  *  Protocol ID (1 octet) - MUST be 0.  MUST be ignored if not 0.
  *  SPI Size (1 octet) - MUST be 0.  MUST be ignored if not 0.

What "MUST be ignored if not 0"? The value of the field or the notify message 
itself?


I think it is clear that it is on the bullet line about Protocol ID, and
not that we are not talking about higher level things.

If the non-zero value must be ignored, then why specify that it MUST be 0?


Normally we do that so that future implementations can be written that
won't be dropped by current implementations seeing a new valid value
that older implementations do not support. I looked at RFC7296 Section
3.10 and it uses similar language , eg:

   [protocol id] ......
      If the SPI field is empty, this field MUST be
      sent as zero and MUST be ignored on receipt.

   o  SPI Size (1 octet) - Length in octets of the SPI as defined by the
      IPsec protocol ID or zero if no SPI is applicable.  For a
      notification concerning the IKE SA, the SPI Size MUST be zero and
      the field must be empty.

So I guess 7296 says we cannot ignore a bad SPI Size, but can ignore a
bad protocol ID. But it is unclear on what to do when a notify payload
contains weird things. RFC 7296 Section 2.21 does not make it more clear
on what we should do.

But thinking about this now, I think this is all wrong. The
SA_RESOURCE_INFO notify is about a Child SA, so it should really
have a valid Protocol ID (eg 3 for ESP) and should have a proper
spi size and length specified that matches the Child SA. So I changed
it to:

           Protocol ID (1 octet) - this field MUST contain either (2)
               to indicate AH or (3) to indicate ESP.</t>

           SPI Size (1 octet) - Length in octets of the SPI as defined
               by the IPsec protocol ID.</t>

And how the receiver should interpret the notify message in this case?
I think there is an inconsistency. I suggest to remove the second MUSTs,
thus the receiver will reject the notify if it is has non-zero protocol and SPI,
as it should do with any other such notifications.


I cannot find this guidance in 7296, where did you find it? But as I
said, it does not apply here anyway.

10. Section 5.1

  *  Optional Payload Data.  This value MAY be set to convey the local
     identity of the resource.  The value SHOULD be a unique identifier
     and the peer SHOULD only use it for debugging purposes.

This is an optional field used for debugging purposes. Why any BCP14 language 
here?


optional use can still have BCP14, eg an optional nonce MUST use secure
random :P I changed the first MAY to lowercase, but leaving the SHUOLD
in, as the whole point is to be able to distinguish/recognise the
resource seperately from each other.

And if this is a value, associated with a resource, then how it may be unique
in a situation when peer associates several SAs with a single resource
(e.g. if it has fewer resources then its peer, but doesn't mind creating as many
SAs as its peer wants)?


You can pick a random value or a counter or whatever you want? I'm not
sure we need text to explain that?

Another consideration - if it is used only for debugging,
and its format is not specified, then it is meaningful only for the peer that
sends it and thus it should be an opaque blob with no defined semantics
(like association with a resource) and no requirements on its value.


I changed "value" to "opague data" to make this clearer.

11. Section 6

As I pointed out before, this section is too implementation-specific.
I think that SADB_ACQUIRE should be mentioned only as an example
and the operational considerations should be provided in
API-neutral language.


Fixed with using "packet trigger messages" instead.

12. Section 6

  Implementations supporting per-CPU SAs SHOULD extend their local SPD
  selector, and the mechanism of on-demand negotiation that is
  triggered by traffic to include a CPU (or queue) identifier in their
  SADB_ACQUIRE message from the SPD to the IKE daemon.

I think that using BCP14 language is not justified here (it is not protocol 
behavior).
It should be "should" instead.


I think it is similar to "MUST use secure random source". It is an
assumption or requirement of proper functioning of the specified
protocol.

13. Section 6

  And
  bringing the deleted per-CPU Child SA up again immediately after
  receiving the Delete Notify might cause an infinite loop between the
  peers.  Another issue of not bringing up all its per-CPU Child SAs is
  that if the peer acts similarly, the two peers might end up with only
  the first Child SA without ever activating any per-CPU Child SAs.

I think more should be said about this situation and how to avoid it.
Perhaps advising such implementations to not delete per-CPU Child SAs
if they have not been rekeyed.


I don't like that advise. I would still like to see that idle child SA's
get removed so we don't collect too many useless Child SAs.

Then the second case will never happen.
As for the first case, perhaps advise all implementations to not
delete SAs immediately once they are created...


I don't think that is a problem we have. Even if we use a short idle
timer of a few seconds, at least it is not a wire speed loop request.

14. Section 6

  Implementations might support dynamically moving a per-CPU Child SAs
  from one CPU to another CPU.  If this method is supported,
  implementations must be careful to move both the inbound and outbound
  SAs.

If this is done "on the fly", then the data in SA_RESOURCE_INFO (if it is 
provided)
would become invalid, killing its usefulness for debugging.


Not always. Eg if the postfix mail server process migrates to another
CPU, it might make sense to migrate the Child SA there as well. And
the debugging would still be valid if mail traffic triggered that SA.

 If it is done
as a result of SA rekey, then it contradicts to the last sentence in Section 3
(well, I already suggested to remove it, see comment 5).


I think these are two different atomic processes. rekeying and sa migration.

15. Section 6

  If this method is supported,
  implementations must be careful to move both the inbound and outbound
  SAs.

Is it justified? Why inbound and outbound SAs cannot be bound to a different 
resources?


I don't think that makes sense from a performance point of view?
I'm happy to hear a few more opinions on this.

16. Section 6

  If the IPsec endpoint is a gateway, it can move the inbound SA
  and outbound SA independently from each other.

This contradict to the previous sentence.

It does say "must be careful". The gateway case is different from theendpoint case where the packet is created on the host that is also the

ipsec endpoint.

Perhaps Antony has some changes that would make everyone happier?

17. Section 6

  If the IPsec endpoint
  is the same host responsible for generating the traffic, the inbound
  and outbound SAs SHOULD remain as a pair on the same CPU.

Using BCP14 language is not justified here (it is a local matter and
not a protocol behavior, so this is just a recommendation
for implementers). And some justification for this recommendation
would be helpful.


Again, it would cause performance issues which is at the core of this
protocol, so I think it is justified?

18. Section 6

  If a host
  previously skipped installing an outbound SA because it would be an
  unused duplicate outbound SA, it will have to create and add the
  previously skipped outbound SA to the SAD with the new CPU ID.

This sentence unclear for me. First, I fail to understand why this is needed.
Then, I fail to understand how to do this, in particular -
how to get SPIs and keying material of the skipped (i.e. deleted) SA.
I believe this piece of text should be elaborated.


The idea is that if you move Child SA #1 away from Child SA #2, and
#2 did not install an outbound, the resource might be left without
any outbound which would be wrong/bad.

19. Section 6

  The
  inbound SA may not have CPU ID in the SAD.

I fail to understand what this sentence is related to?


It is a bit unclear to me too. Perhaps Antony can clarify.

20. Section 6

  To support this, the IKE software might have to hold
  on to the key material longer than it normally would, as it might
  actively attempt to destroy key material from memory that the IKE
  daemon no longer needs access to.

In my opinion this requirement is problematic. It highly depends on
concrete architecture. In some cases IKE daemon has never access to the Child SA
keying material - it only provides values for its generating,
which is done in a HSM. And in this case IKE daemon has no control
over this material (and it never needs to have an access to it or control over 
it).


Whatever method used to keep a pointer to the key is what would be
needed. I dont think it matters wether that is a crypto module fd
or just a memory pointer.

So, I think that this should be elaborated, in particular justifying
why do you need to re-create outbound SA which you have already deemed unneeded.


To fill in a hole left on a resource if your moving child sa happens to
be the last outbound SA on that resource, which then ends up with no
outbound because the last remaining child sa did not install its
outbound.

I wonder whether TS_MAX_QUEUE is a permanent or temporary error.


I think it can depend. If too many customers with too many CPUs, perhaps
it is a fairly permanent error. But if customers are removed or CPUs
added, perhaps it can go away again too. If used as a resource
constraint, eg say "each customer up to 1024", then it might be more
or less permanent for that customer.

22. Section 7

  Similar to how an implementation should limit the number of half-open
  SAs to limit the impact of a denial of service attack, an
  implementation SHOULD limit the maximum number of additional Child
  SAs allowed per unique TSi/TSr.

In section 6 you said:

  Peers SHOULD be lenient with the maximum number of Child SAs they
  allow for a given TSi/TSr combination to account for corner cases.

and later

 As additional Child SAs consume  little additional overhead, ...

So, some wordsmithing to make these pieces of text in concert would be helpful.


Happy to hear suggestions. I think the overal idea is "dont try to get
too close to the real number of CPUs, allow at least several times the
number of CPUs of common Big Appliances. Eg mostly use it as a safety
cap.

23. Section 7

  This trust relationship is usually not present for the Remote Access
  VPN type deployments, and allowing per-CPU Child SA's is NOT
  RECOMMENDED in these scenarios.  Therefor, it is also NOT RECOMMENDED
  to allow per-CPU Child SAs per default.

Typo: s/Therefor/Therefore


Fixed.

In general, I don't think that this statement is universally true. If it is a 
corporate
Remote Access scenario, then clients are administered by the same people
as gateways. Thus, I don't think that advising not use this extension
in Remote Access scenario due to the lack of trust to clients is always 
justified.


Isn't that covered by "usually not present". I am thinking of $5/month
customers, not million dollar customers. You have a suggestion for the
text to make it work for you as well?

24. Section 7

  The SA_RESOURCE_INFO notify contains an optional data payload that
  can be used by the peer to identify the Child SA belonging to a
  specific resource.  The notify data SHOULD NOT be an identier that
  can be used to gain information about the hardware.  For example,
  using the CPU number itself as identier might give an attacker
  knowledge which packets are handled by which CPU ID and it might
  optimize a brute force attack against the system.

Isn't it protected by IKE SA? I believe this is only possible with MitM attack
(only initiator is affected) or when CRQC is used with RFC 8784 scenario for 
the initial SA.
In the first case the SA won't be created. The second case is currently 
unrealistic.
Am I missing something? If not, then I suggest to just describe this
threat giving no advices (i.e. remove SHOULD NOT), since it seems to be
theoretical threat. Just let the implementers know that this is possible.


Again, I was thinking of $5/month people that you should not really trust
to share your hardware details with.

Typo: s/identier/identifier (2 instances)


Fixed.

25. Section 9

  This document defines two new IKEv2 Notify Message Type payloads for
  the IANA "IKEv2 Notify Message Types - Status Types" registry.

This is a leftover from the previous version of the draft -
the current version defines only one status type notification.


Fixed.

26. Section 10

Why RFC2367 is a normative reference?


Because I made a mistake. fixed :)

Thanks again for the thorough review !

The -03 should appear shortly. Once we resolve the last few issues we
can do another update.

Paul

_______________________________________________
IPsec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ipsec

Re: [IPsec] WGLC of draft-ietf-ipsecme-multi-sa-performance

Reply via email to