Hi Paul,

thank you for the detailed review.

I would remove most of the speculative text in:

 In IPv4 it makes sense to limit the number of half-open SAs based on
 IP address.  Most IPv4 nodes are either directly attached to the
 Internet using a routable address or are hidden behind a NAT device
 with a single IPv4 external address.  IPv6 networks are currently a
 rarity, so we can only speculate on what their wide deployment will
 be like, but the current thinking is that ISP customers will be
 assigned whole subnets, so we don't expect the kind of NAT deployment
 that is common in IPv4.  For this reason, it makes sense to use a
 64-bit prefix as the basis for rate limiting in IPv6.

And replace it with:

 For IPv6, ISPs assign either a /48 or /64, so it makes sense to
 use a 64-bit prefix as the basis for rate limiting in IPv6.

Do I get you right that you want to remove the following text?

                                                   IPv6 networks are currently a
   rarity, so we can only speculate on what their wide deployment will
   be like, but the current thinking is that ISP customers will be
   assigned whole subnets, so we don't expect the kind of NAT deployment
   that is common in IPv4.
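
To make the per-address versus per-prefix bucketing concrete, here is a minimal sketch in Python. The function name rate_limit_key is hypothetical and not taken from the draft; it only assumes that half-open SAs are counted per full address for IPv4 and per 64-bit prefix for IPv6, as the text above suggests.

```python
import ipaddress

def rate_limit_key(addr: str) -> str:
    """Return the bucket under which half-open SAs from addr are counted."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # IPv4: one bucket per individual address.
        return str(ip)
    # IPv6: one bucket per /64; strict=False masks out the host bits.
    return str(ipaddress.ip_network(f"{ip}/64", strict=False))
```

With this, two initiators inside the same /64 share one counter, while IPv4 peers are counted individually.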

I'm not sure about:

 The number of half-open SAs is easy to measure, but it is also
 worthwhile to measure the number of failed IKE_AUTH exchanges.  If
 possible, both factors should be taken into account when deciding
 which IP address or prefix is considered suspicious.

I'm not sure what measuring the failed IKE_AUTH exchanges gains you?

It allows a responder to decide whether it is under attack.
If the percentage of IKE_AUTH exchanges that failed to decrypt properly
is high enough, then a large fraction of initiators are sending bogus
IKE_AUTH requests, so the responder can assume that they do this for a
reason. In this case the responder can use puzzles in IKE_AUTH (see
Section 7.2).

Whether or not you accept new work should more depend on your own
resources than previously failed attempts, otherwise you risk becoming a

It is based on both. You must maintain some statistics
(number of half-open IKE_SA_INIT, number of failed IKE_AUTH)
and decide whether to use defensive measures
by analyzing these statistics.

If you only take your resource consumption into consideration,
then you would end up punishing legitimate clients when there are so
many of them that you just cannot handle the volume of requests, even
if there are no orphaned IKE_SA_INIT or failed IKE_AUTH exchanges.
In other words, you must distinguish an attack from just a high load.
Statistics help you in this case.

victim of lock-out attacks where an attacker causes so many failures
that legitimate clients would be prevented from initiating IKE.
Especially with CGNAT.

See above. Once you decide that an attack is in progress
(e.g. by monitoring the number of failed IKE_AUTH exchanges within
the last N seconds), you'll turn on IKE_AUTH puzzles or take some other measures.
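
As an illustration only, the monitoring could look like the sketch below. The class name, the thresholds and the decision rule (failure ratio above a limit, with a minimum number of samples) are my assumptions, not something the draft prescribes.

```python
from collections import deque

class AuthFailureMonitor:
    """Tracks IKE_AUTH outcomes seen within the last `window` seconds."""

    def __init__(self, window: float, min_samples: int, max_failure_ratio: float):
        self.window = window
        self.min_samples = min_samples              # hypothetical threshold
        self.max_failure_ratio = max_failure_ratio  # hypothetical threshold
        self.events = deque()                       # (timestamp, failed), oldest first

    def record(self, now: float, failed: bool) -> None:
        self.events.append((now, failed))
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Drop events that fell out of the observation window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()

    def under_attack(self, now: float) -> bool:
        self._expire(now)
        total = len(self.events)
        if total < self.min_samples:
            return False
        failures = sum(1 for _, failed in self.events if failed)
        return failures / total > self.max_failure_ratio
```

Note that a high load consisting only of successful exchanges never trips this check, which is exactly the distinction between an attack and a busy day.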

Next item:

 There are two ways to rate-limit a peer address or prefix:

 1.  Hard Limit - where the number of half-open SAs is capped, and any
     further IKE_SA_INIT requests are rejected.

 2.  Soft Limit - where if a set number of half-open SAs exist for a
     particular address or prefix, any IKE_SA_INIT request will
     require solving a puzzle.

This does not mention the built-in DCOOKIE defense from
the base IKEv2 spec, although it does shortly after:

Cookies are described in the next chapter.

 The cookie mechanism limits the amount of allocated
 state to the size of the bot-net, multiplied by the number of half-
 open SAs allowed per peer address, multiplied by the amount of state
 allocated for each half-open SA.  With typical values this can easily
 reach hundreds of megabytes.

It would be clearer to mention explicitly that the cookie mechanism
prevents spoofed packets from taking up state, thereby limiting [....]

Could you please be more explicit what text you are not happy with?
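
For reference, the bound in the quoted draft text is a plain product of three factors. The sketch below just evaluates it; the numbers are hypothetical values I picked for illustration and do not come from the draft.

```python
def half_open_state_bound(botnet_size: int,
                          half_open_per_peer: int,
                          bytes_per_half_open_sa: int) -> int:
    """Upper bound on memory held by half-open SAs once cookies are required."""
    return botnet_size * half_open_per_peer * bytes_per_half_open_sa

# Hypothetical values: 10,000 bots, 5 half-open SAs allowed per address,
# 10,000 bytes of state per half-open SA.
bound = half_open_state_bound(10_000, 5, 10_000)
print(bound / 1e6, "MB")
```

With these assumed values the bound is 500 MB, which is in line with the "hundreds of megabytes" wording.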

 Note that the Responder SHOULD cache
 tickets for a short time to reject reused tickets (Section 4.3.1),
 and therefore there should be no issue of half-open SAs resulting
 from replayed IKE_SESSION_RESUME messages.
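
A minimal sketch of such a cache of recently used tickets follows. It is only meant for the case where the server keeps no per-ticket state of its own; the class name, the use of a SHA-256 digest as the key and the ttl parameter are my assumptions, not taken from RFC 5723 or the draft.

```python
import hashlib

class UsedTicketCache:
    """Remembers recently presented tickets so that replays can be rejected."""

    def __init__(self, ttl: float):
        self.ttl = ttl    # how long a used ticket is remembered (hypothetical)
        self.seen = {}    # ticket digest -> expiry time

    def check_and_mark(self, ticket: bytes, now: float) -> bool:
        """Return True if the ticket is fresh, False if it was recently used."""
        # Forget entries whose retention time has passed.
        self.seen = {d: exp for d, exp in self.seen.items() if exp > now}
        digest = hashlib.sha256(ticket).digest()
        if digest in self.seen:
            return False
        self.seen[digest] = now + self.ttl
        return True
```

The responder would call check_and_mark() before doing any further work on an IKE_SESSION_RESUME request and drop the request when it returns False.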

I should probably read 5723, but why would one ever respond to an "old"
re-used or unknown session resume ticket? I guess the only use is a
faster failure of lost resumption tickets for real clients? Since just
sending a "go away" response is not more computationally expensive than
creating a "go away" response for an invalid IKE SPI or badly formed IKE
packet, I would say it is not worth implementing a separate list of
recently used tickets.

RFC 5723 describes two kinds of tickets - "ticket by reference"
and "ticket by value". In the former case the server stores
all the information regarding IKE SAs that can be resumed and the ticket is just an "index" in that database. With this approach
the server always knows whether the ticket was already used or not.
With the latter approach all the information regarding the SA
is stored in the ticket itself. The server stores nothing in this case - it just decrypts the presented ticket and resumes the IKE SA.
In this case the server doesn't know whether the ticket
was used before unless it maintains a cache of recently
used tickets.

 If the received IKE_AUTH message failed to decrypt correctly (or
 failed to pass ICV check), then the Responder SHOULD still keep the
 computed SK_* keys, so that if it happened to be an attack, then the
 malicious Initiator cannot get advantage of repeating the attack
 multiple times on a single IKE SA.

Well, it needs to do this anyway in case the attacker is just sending
bogus responses faster than the real client. So I don't think this

Do you mean "bogus requests"? Isn't it a DoS attack?

advice here is warranted - it has nothing to do with ddos.

I think this advice is closely related to DoS protection. You yourself
described the attack two lines above.

 To prevent this kind of attacks the responder should not blindly
 download the whole file.  Instead it SHOULD first read the initial
 few bytes, decode the length of the ASN.1 structure from these bytes,
 and then download no more than the decoded number of bytes.

That seems really bad. If the attacker controls the URL, they can also
put a malicious ASN.1 encoding in the cert. Much better is to [Oh never
mind you write all the things I wrote here already]

OK.
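
For illustration, the length check from the quoted draft text could be sketched as follows. It assumes a DER structure with a single-byte tag (e.g. SEQUENCE, 0x30); the function name and the max_total cap are hypothetical additions of mine.

```python
def der_total_length(header: bytes, max_total: int = 65536) -> int:
    """Return the full encoded size (header plus content) of a DER structure.

    header holds the first few bytes already read from the URL.
    """
    if len(header) < 2:
        raise ValueError("need at least 2 bytes")
    first = header[1]
    if first < 0x80:
        # Short form: this byte is the content length itself.
        total = 2 + first
    else:
        # Long form: the low 7 bits give the number of length bytes.
        n = first & 0x7F
        if n == 0:
            raise ValueError("indefinite length is not valid DER")
        if len(header) < 2 + n:
            raise ValueError("need more header bytes")
        total = 2 + n + int.from_bytes(header[2:2 + n], "big")
    if total > max_total:
        # Never trust the declared length beyond a local limit.
        raise ValueError("declared size exceeds local limit")
    return total
```

The responder would then download at most the returned number of bytes and give up if the check raises.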

 With a typical setup and typical Child SA lifetimes, there
 are typically no more than a few such exchanges, often less.

I don't agree. People put in 1s liveness probes, so that's a lot of IKE
packets.

A liveness check message is about 50 bytes. Even if it is performed
every second, it results in 2 packets/sec and 100 bytes/sec of traffic
per client. Is that a lot?

 Since any endpoint can initiate a new exchange, [...]

I would more explicitly point the AUTH NULL based attacks to its RFC.

Well, I think we tried to follow this approach: there is little specific
to NULL auth; it is just mentioned (and referenced) as one of the
factors that may make DoS attacks easier to mount.

Then focus this document on the possible abuse of legitimate clients.
However, I don't know what I would want to advise. You can put in
maximums for rekeys, reauths, or child sa's, but those should at most
be configuration options, and not hardcoded options in the
implementation - since implementors cannot predict what legitimate
large scale use their code might see.

Sure. And that's why the draft doesn't prescribe any hard limits,
it just lists possible defense measures.

 For that reason, it is NOT RECOMMENDED to
 ever increase the IKEv2 window size above its default value of one
 if the peer uses NULL Authentication.

I'm not sure why here the auth method is used to discriminate. Earlier
it also talked about authenticated clients and launching many exchanges?

Because with NULL auth the peer is not authenticated and we'd rather
limit his/her ability to mount a DoS attack by initiating N exchanges
in parallel, which would increase our peak load. If the peer is
authenticated, then launching N exchanges simultaneously is not an
attack in general. And if the authenticated peer mounts such a DoS
attack, then he/she could be traced down and either out-of-band
measures are taken or the peer's credentials are revoked.

Also, this advice is actually an update to RFC7619/RFC7296 so this
document should state that it updates those RFCs.

Is it really needed? RFC 7296 doesn't deal with NULL auth,
and RFC 7619 does reference this draft in Security Considerations.
What do others think?

 If the peer initiates too many exchanges of any kind,
 implementations can introduce an artificial delay before
 responding to request messages.  This delay would decrease the
 rate the implementation need to process requests from any
 particular peer, making it possible to process requests from the
 others.  The delay should not be too long to avoid causing the IKE
 SA to be deleted on the other end due to timeout.

I am not sure how useful this advice is. Since people use liveness
timeouts of 1s, a malicious peer can always do 1s of exchanges. So if
you want to introduce delays, they should probably delay only
non-liveness exchanges. And liveness exchanges that are more frequent
than 1s should probably just be dropped or rejected.

It doesn't matter what the exchange type is. The intention is to
artificially limit the number of exchanges the malicious peer can
initiate per second. There is a MID window, so the peer cannot initiate
a new exchange until one of the currently active exchanges is completed.
If, for example, the window size is 1, then the malicious peer cannot
initiate a new exchange until we send a response to the current one.
If we send the response immediately, then the malicious peer immediately
initiates another exchange. If we respond in 3 seconds, then it will
have to wait 3 seconds before it can initiate a new exchange (probably
sending retransmissions during that time that we will ignore). That's
the idea. If the malicious peer is so impatient that it tears down the
IKE SA when no response is received within 3 seconds, then it only makes
things worse for itself: it will need to create a new IKE SA from
scratch, passing all the puzzle barriers.
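
A rough sketch of this behaviour, under the assumption of a single IKE SA and a simple per-request timer, is shown below. The class and method names are hypothetical; a real implementation would also resend the cached response to retransmissions that arrive after the response has been sent, which is not modelled here.

```python
class DelayedResponder:
    """Delays each response by a fixed time and ignores retransmissions meanwhile."""

    def __init__(self, delay: float):
        self.delay = delay
        self.pending = {}   # Message ID -> time at which the response is due

    def on_request(self, msg_id: int, now: float) -> str:
        if msg_id in self.pending:
            # Retransmission received during the artificial delay: drop it.
            return "discarded"
        self.pending[msg_id] = now + self.delay
        return "queued"

    def due_responses(self, now: float) -> list:
        """Return the Message IDs whose responses should be sent now."""
        ready = sorted(m for m, due in self.pending.items() if due <= now)
        for m in ready:
            del self.pending[m]
        return ready
```

With a window size of 1 and a delay of 3 seconds, the peer cannot start its next exchange until due_responses() releases the current one, so it is held to roughly one exchange per 3 seconds.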

 Note, that if the Responder
 receives retransmissions of the request message during the delay
 period, the retransmitted messages should be silently discarded.

That also updates RFC 7296, which states that each IKE request should get
an IKE answer.

I don't think an artificial delay is a violation of RFC 7296.
Each IKE request will be answered. RFC 7296 doesn't require that
it is answered immediately (or as soon as the responder can prepare the response).

And these should be very cheap to send anyway. At least
our implementation caches the last sent IKE packet for retransmissions.

Yes, you can count the number of received retransmissions during the
artificial delay and, once the delay is over, send back that number of
identical responses at once. It is cheap. However, I'm not sure it
makes sense.

While the document mentions Fragmentation with respect to puzzles, it
does not mention ddos attacks based on malicious fragmentation packets.
It could be that the base RFC is clear enough, but perhaps this document
should give some advice too?

I think RFC 7383 lists possible DoS attacks in Security Considerations section.
Do you think it's not enough?

Paul

Regards,
Valery.
