[IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

Paul Wouters Tue, 31 May 2016 13:45:13 -0700


This is a partial review of draft-ietf-ipsecme-ddos-protection-06
up to Section 6. I hope to complete the rest in the next few days.


I think this document needs another revision before continuing
(and I would prefer it to be split in two).

Issues / Questions:

   An obvious defense, which is described in Section 4.2, is limiting
   the number of half-open SAs opened by a single peer.  However, since
   all that is required is a single packet, an attacker can use multiple
   spoofed source IP addresses.

I am not sure why this is mentioned here in this way, because the attack
of spoofed source IP is already handled effectively with DOS cookies. I
think it is better to state "bot-nets are large enough that they have
enough unique IP addresses" and avoid talking about spoofing in this
section altogether.


   Stage #3 includes public key operations, typically more than one.

It seems this sentence needs to say that these operations are very
expensive, similar to the description of the "effort" in the previous
sentences for stage #1 and stage #2.

   It seems that the first thing cannot be dealt with at the IKE level.
   It's probably better left to Intrusion Prevention System (IPS)
   technology.

I would rewrite this more authoritatively, and not use the word "seems".

   Depending on the Responder implementation, this can be repeated with
   the same half-open SA.

I don't think this "depends on the implementation". Since any on-path
attacker can spoof rubbish, a Responder MUST ignore the failed packet
and remain ready to accept the real one for a certain amount of time. And
this also applies to this later section in the document:

   If the received IKE_AUTH message failed to decrypt correctly (or
   failed to pass ICV check), then the Responder SHOULD still keep the
   computed SK_* keys, so that if it happened to be an attack, then the
   malicious Initiator cannot get advantage of repeating the attack
   multiple times on a single IKE SA.
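
A minimal sketch of the behaviour I am arguing for, assuming a toy
integrity check (HMAC-SHA256 truncated to 16 bytes) in place of the
negotiated IKEv2 integrity algorithm. The class and method names are
hypothetical and not taken from any implementation. The only point is
that a bad packet is dropped while the half-open SA and its derived keys
stay in place:

```python
import hashlib
import hmac


class HalfOpenSA:
    """Toy half-open IKE SA that keeps its derived key across bad packets."""

    def __init__(self, sk_a: bytes):
        # Assumption: sk_a stands in for the SK_* keys, derived once after
        # the (expensive) Diffie-Hellman computation.
        self.sk_a = sk_a
        self.failed_checks = 0
        self.established = False

    def icv(self, body: bytes) -> bytes:
        # Stand-in ICV: HMAC-SHA256 truncated to 16 bytes.
        return hmac.new(self.sk_a, body, hashlib.sha256).digest()[:16]

    def handle_ike_auth(self, body: bytes, icv: bytes) -> bool:
        if not hmac.compare_digest(self.icv(body), icv):
            # Silently drop the packet; keep the SA and its keys so a
            # repeated forgery does not trigger another key derivation.
            self.failed_checks += 1
            return False
        self.established = True
        return True


sa = HalfOpenSA(sk_a=b"k" * 32)
forged = sa.handle_ike_auth(b"junk", b"\x00" * 16)
genuine = sa.handle_ike_auth(b"real request", sa.icv(b"real request"))
print(forged, genuine, sa.failed_checks)
```

The forged packet costs the Responder one integrity check and nothing
more, and the genuine IKE_AUTH that follows is still accepted.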




   Retransmission policies in practice wait at least one or two seconds
   before retransmitting for the first time.

I'm not sure if this is still true. Libreswan starts at 0.5s and doubles,
and I know that iOS was faster too.

   When not under attack, the half-open SA timeout SHOULD be set high
   enough that the Initiator will have enough time to send multiple
   retransmissions, minimizing the chance of transient network
   congestion causing IKE failure.

I agree, but I'd like to note that this and the text just above mentioning
"several minutes" is kind of archaic. We found a 30 second timeout to be
so common in other implementations that we see no value in keeping an IKE
exchange around for more than 30 seconds. (We do restart and try a new
exchange from scratch for longer; in some configurations we try that
forever.)
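
To put numbers on this, here is a small sketch of a retransmission
schedule that starts at 0.5 s and doubles each time, cut off by a 30 s
exchange timeout. The function name and the uncapped doubling are
assumptions for illustration, not a description of any implementation's
exact timers:

```python
def retransmit_times(initial=0.5, budget=30.0):
    """Offsets (seconds after the first send) of every retransmission
    that still fits inside the exchange timeout `budget`."""
    times = []
    delay = initial
    elapsed = 0.0
    while elapsed + delay <= budget:
        elapsed += delay
        times.append(elapsed)
        delay *= 2  # exponential backoff
    return times


schedule = retransmit_times()
print(schedule)
```

Under these assumptions five retransmissions fit before the 30 s limit,
the last one 15.5 s after the initial send, which is plenty to ride out
transient congestion.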

   For IPv6, ISPs assign between a /48 and a /64, so it makes sense to use
   a 64-bit prefix as the basis for rate limiting in IPv6.

Why does that make sense over using /48? Wouldn't you rather rate limit
some innocent neighbours over not actually defending against the attack?
If puzzles work as advertised, real clients on that /48 should still be
able to connect.
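
To make the difference concrete, here is a sketch of how a Responder
might derive the rate-limiting key from a source address. The function
name is hypothetical and the addresses come from the 2001:db8::/32
documentation range:

```python
import ipaddress


def rate_limit_bucket(addr: str, prefix_len: int) -> str:
    """Return the prefix that addr falls into, used as a rate-limit key."""
    net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(net)


# Two hosts in different /64s of the same /48.
a = "2001:db8:1:1::10"
b = "2001:db8:1:2::10"

print(rate_limit_bucket(a, 64), rate_limit_bucket(b, 64))
print(rate_limit_bucket(a, 48), rate_limit_bucket(b, 48))
```

With /64 keys, an attacker who was assigned a /48 gets 65536 separate
buckets to spread requests over. With /48 keys, all of them share one.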

   Regardless of the type of rate-limiting used, there is a huge
   advantage in blocking the DoS attack using rate-limiting for
   legitimate clients that are away from the attacking nodes.  In such
   cases, adverse impacts caused by the attack or by the measures used
   to counteract the attack can be avoided.

I don't understand this paragraph at all. I guess "rate-limiting for
legitimate clients" just confuses me. I think it might attempt to be
saying "not blocking ranges with no attackers helps real clients", but
it is very unclear.

   to calculate the PRF

One does not "calculate" a PRF. One uses a PRF to calculate something.

The section that starts with "Upon receiving this challenge," seems to
be discussing the pros and cons of this method before it has explained
the method. The reader is forced to either skip this or jump forward to
Section 7 and then come back to this part. I suggest re-ordering some text
to avoid this, or giving a better short summary of the puzzle nature just
before this paragraph.
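
For readers without Section 7 at hand, a generic proof-of-work puzzle can
be sketched as follows. This is not the draft's exact construction:
HMAC-SHA256 as the PRF, the counter-based key search and the function
names are all assumptions made for illustration. It does show the wording
point above: the PRF is used to calculate an output, and the puzzle is to
find an input whose output ends in enough zero bits.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for the negotiated PRF.
    return hmac.new(key, data, hashlib.sha256).digest()


def trailing_zero_bits(value: bytes) -> int:
    n = int.from_bytes(value, "big")
    if n == 0:
        return len(value) * 8
    # n & -n isolates the lowest set bit of n.
    return (n & -n).bit_length() - 1


def solve(cookie: bytes, difficulty: int) -> bytes:
    """Brute-force a key whose PRF output over the cookie ends in
    at least `difficulty` zero bits."""
    counter = 0
    while True:
        key = counter.to_bytes(8, "big")
        if trailing_zero_bits(prf(key, cookie)) >= difficulty:
            return key
        counter += 1


def verify(cookie: bytes, key: bytes, difficulty: int) -> bool:
    # Verification costs one PRF call, however long the search took.
    return trailing_zero_bits(prf(key, cookie)) >= difficulty


cookie = b"example-cookie"
key = solve(cookie, difficulty=8)
print(verify(cookie, key, 8))
```

The asymmetry is what matters: the solver needs about 2^difficulty PRF
calls on average, while the verifier needs one.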

   When the Responder is under attack, it MAY choose to prefer
   previously authenticated peers who present a Session Resumption
   ticket (see [RFC5723] for details).

Why is this only a MAY? Why is it not a SHOULD or MUST?

   The Responder MAY require such
   Initiators to pass a return routability check by including the COOKIE
   notification in the IKE_SESSION_RESUME response message, as allowed
   by Section 4.3.2. of [RFC5723].

Perhaps this should say the responder SHOULD require COOKIEs for resumed
sessions if it also requires COOKIEs for IKE_SA_INIT requests. That is, it
should not give preference to resumed sessions as those could be equally
forged as IKE_SA_INIT requests.

   With a typical setup and typical Child SA lifetimes, there
   are typically no more than a few such exchanges, often less.

(Ignoring the language) I do not believe this is true. This goes back to
the discussion on how often people deploy liveness probes. Implementors
seem to think 30s, while end users want and do configure things like 1s.
I don't think the text about what number of IKE exchanges is typical is
needed, because the text below talks about specific abuse anyway, and not
in terms of just the number of exchanges.

      If the peer creates too many Child SA with the same or overlapping
      Traffic Selectors, implementations can respond with the
      NO_ADDITIONAL_SAS notification.

I think this requires normative language, eg: implementations MUST respond
with a NO_ADDITIONAL_SAS notification. The same for the next bullet item
where it says "implementations can introduce an artificial delay", which
should be like: "MAY introduce an artificial delay" (or even SHOULD, or
rewrite "too many" to "many" and use MAY)


Section 5 switches from talking about "the Responder" to "the implementation".
I think it should be "the Responder" throughout the document.

    the retransmitted messages should be silently discarded.

That should be normative too, MUST be discarded.

NITS:

always bounded -> always bound

"effectively defend" -> defend
(if it was "effective", we wouldn't need puzzles :)

thwart -> prevent or handle or counter?
(thwart is just an odd/uncommon word for non-native English speakers)

The following sentence kind of runs on:

   Generating the IKE_SA_INIT request is cheap, and sending multiple
   such requests can either cause the Responder to allocate too much
   resources and fail, or else if resource allocation is somehow
   throttled, legitimate Initiators would also be prevented from setting
   up IKE SAs.

How about:

   Generating the IKE_SA_INIT request is cheap. Sending large amounts of
   IKE_SA_INIT requests can cause a Responder to use up all its resources.
   If the Responder tries to defend against this by throttling new requests,
   this will also prevent legitimate Initiators from setting up IKE SAs.

Next,

   Yes, there's a stage 4 where the Responder actually creates Child
   SAs, but when talking about (D)DoS, we never get to this stage.

This is rather strange language for an RFC, how about:

   The fourth stage where the Responder creates the Child SA
   is not reached by attackers who cannot pass the authentication
   step.


so it's -> so it is

attempt to either exhaust -> attempt either to exhaust

This should be easy because -> this is easy because

even without changes to the protocol -> without changes to the protocol

Puzzles, introduced in Section 4.4, do the same thing only more of it ->
Puzzles, introduced in Section 4.4, accomplish this goal and more.

They don't have to be so hard -> Puzzles do not have to be so hard

can't -> cannot

it's -> it is

they increase the cost of a half-open SAs for the attacker so that it can
create only a few. ->
puzzles increase the cost of creating half-open SAs so the attacker is
limited in the amount they can create.

Reducing the amount of time an abandoned half-open SA is kept attacks
the issue from the other side. It reduces the value the attacker
gets from managing to create a half-open SA.  ->
Reducing the lifetime of an abandoned half-open SA also reduces the
impact of such attacks.

(I don't much like using commas in numbers, as they mean different things
 in different parts of the world, e.g. 60,000 and 1,000 in this document.)

Reduce the retention time to 3 seconds, and the attacker needs to
create 20,000 half-open SAs per second. ->
If the retention time is reduced to 3 seconds, the attacker would need to
create 20,000 half-open SAs per second to get the same result.
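
The arithmetic behind this sentence, as a sketch. It assumes the 60,000
figure used in the draft is the number of half-open SAs the attacker has
to keep alive at once; the function name is hypothetical:

```python
def required_rate(table_size: int, retention_seconds: float) -> float:
    """Half-open SAs per second needed to keep table_size entries alive
    when each entry expires after retention_seconds."""
    return table_size / retention_seconds


rate = required_rate(60000, 3)
print(rate)
```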

making it more likely to thwart an exhaustion attack against Responder
memory ->

making it more likely that the attacker runs out of memory before the
Responder does.

The attacker has two ways to do better -> The attacker has two alternative
attacks to do better

It seems that the first thing -> It seems that the first alternative

On the other hand, sending an IKE_AUTH request is surprisingly cheap. ->
On the other hand, the second alternative of sending an IKE_AUTH request
is very cheap.

It requires a proper IKE header with the correct IKE SPIs, and it
requires a single Encrypted payload.  The content of the payload
might as well be junk.  ->
It requires generating a proper IKE header with correct IKE SPIs and a
single Encrypted payload. The content of the Encrypted payload is
irrelevant and therefore cheap to generate.

does not check -> fails the integrity check.

Puzzles can make attacks of such sort -> Puzzles make attacks of such sort

Puzzles have their place as part of #4 -> Puzzles are used as a solution
for strategy #4.

Defense Measures while IKE SA is being created ->
Defense Measures while the IKE SA is being created

any IKE_SA_INIT request will require solving a puzzle. ->
any IKE_SA_INIT request will be required to solve a puzzle.

The downside -> The disadvantage
(the other case does use advantage/disadvantage properly, so this is the odd
 one out)

can still effectively DoS the Responder -> can still effectively DDoS the 
Responder.
(there are some more DoS -> DDoS changes that you could make)

to mitigate DoS attack -> to mitigate DoS attacks

the cookie mechanism from -> the cookie mechanism of

   It is loosely based on the proof-of-work technique used
   in Bitcoins [bitcoins].

I think referring to bitcoins is a bit of a stretch and only distracts.

   This sets an upper bound, determined by the
   attacker's CPU, to the number of negotiations it can initiate in a
   unit of time. ->
   Puzzles set an upper bound, determined by the
   attacker's CPU, to the number of negotiations the attacker can initiate in a
   unit of time.

for it to make any difference in mitigating DDoS attacks. -> [remove]

and this fact allows -> and this allows

a malicious peer -> an attacker   (we used attacker all the way up to here
in this document, why change it now?)


Preventing Attacks using "Hash and URL" Certificate Encoding ->
Preventing "Hash and URL" Certificate Encoding attacks

In IKEv2 each side may use "Hash and URL" Certificate Encoding ->
In IKEv2 each side may use the "Hash and URL" Certificate Encoding

a DoS attack on responder -> a DoS attack on the responder

 before continue downloading. -> before continuing to download the file.

See Section 5 of [RFC7383] for details. -> See Section 5 of [RFC7383] for
details on how to mitigate these attacks.

Defense Measures after IKE SA is created -> Defense Measures after an IKE SA is 
created

Once IKE SA is created -> Once an IKE SA is created

there is usually not much traffic over it -> there usually is only a limited
number of IKE messages exchanged.

In most cases this traffic consists of exchanges aimed to create
additional Child SAs, rekey, or delete them and check the liveness of
the peer. ->
This IKE traffic consists of exchanges aimed to create additional Child SAs,
IKE rekeys, IKE deletions and IKE liveness tests.

Such behavior may be caused by buggy implementation, misconfiguration or be
intentional.  The latter becomes more of a real threat if the peer uses NULL
Authentication, described in [RFC7619]. In this case the peer remains
anonymous, allowing it to escape any responsibility for its actions.  ->
Such behavior can be caused by broken implementations, misconfiguration or
an intended attack. Extra care should be taken in the case of NULL
Authentication [RFC7619] where one essentially allows IKE SAs with untrusted
third parties that could be malicious.

See Section 3 of [RFC7619] for details -> See Section 3 of [RFC7619] for 
details on how to mitigate attacks when using NULL Authentication.

The following recommendations for defense against possible DoS attacks after
IKE SA is established are mostly intended for implementations that allow
unauthenticated IKE sessions; however, they may also be useful in other
cases. ->
The following recommendations apply especially for NULL Authenticated IKE
sessions, but also apply to authenticated IKE sessions, with the difference
that in the latter case, the identified peer can be locked out.

then the peer could initiate multiple simultaneous -> peers are able to
initiate multiple simultaneous

that could increase host resource consumption -> that increases host resource 
consumption

Since currently there is no way -> Since there is no way

decrease window size once it was increased -> decrease the window size
once it has been increased

For that reason, it is NOT RECOMMENDED to ever increase the IKEv2 window size
above its default value of one if the peer uses NULL Authentication.->
It is NOT RECOMMENDED to allow an IKEv2 window size greater than one when
NULL Authentication has been used.

If the peer initiates requests to rekey IKE SA or Child SA too
often, implementations can respond to some of these requests with
the TEMPORARY_FAILURE notification, indicating that the request
should be retried after some period of time. ->
If a peer initiates an abusive amount of CREATE_CHILD_SA exchanges, the
Responder SHOULD reply with TEMPORARY_FAILURE notifications indicating
the peer must slow down their requests.

If the peer initiates too many exchanges of any kind, implementations can
introduce an artificial delay before responding to each request message.->
If a peer initiates many exchanges of any kind, the Responder MAY
introduce an artificial delay before responding to the request.
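
One way a Responder could apply the two suggestions above, sketched with
a sliding-window counter per peer. The thresholds, window length, delay
value and all names are assumptions; the draft does not define what
counts as "abusive" or "many":

```python
from collections import defaultdict, deque


class ExchangeLimiter:
    """Decide how to answer a peer based on its recent exchange count."""

    def __init__(self, window=10.0, delay_after=5, fail_after=10, delay=0.2):
        self.window = window            # seconds of history to consider
        self.delay_after = delay_after  # above this count: add a delay
        self.fail_after = fail_after    # above this count: TEMPORARY_FAILURE
        self.delay = delay              # artificial delay in seconds
        self.history = defaultdict(deque)

    def decide(self, peer: str, now: float):
        seen = self.history[peer]
        # Drop timestamps that have fallen out of the window.
        while seen and now - seen[0] >= self.window:
            seen.popleft()
        seen.append(now)
        if len(seen) > self.fail_after:
            return ("TEMPORARY_FAILURE", 0.0)
        if len(seen) > self.delay_after:
            return ("PROCESS", self.delay)
        return ("PROCESS", 0.0)


limiter = ExchangeLimiter()
answers = [limiter.decide("peer-a", now=i * 0.1) for i in range(12)]
print(answers[0], answers[5], answers[11])
```

A well-behaved peer never notices the limiter. A noisy one is first
slowed down and then told to back off, and the counter resets by itself
once the peer goes quiet for a full window.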

"the implementation need" -> the Responder needs

making it possible to process requests from the others -> and frees up
resources on the Responder that can be used for answering legitimate clients.

Note, that if the Responder receives retransmissions -> If the Responder
receives retransmissions

the retransmitted messages should be silently discarded. -> the retransmitted
messages MUST be discarded.

The delay should not be too long to avoid causing the IKE SA to be deleted on the 
other end due to timeout. ->
The delay must be short enough to avoid legitimate peers deleting the IKE
SA due to a timeout.

[ to be continued ]

