Re: [IPsec] I-D Action: draft-ietf-ipsecme-ddos-protection-05.txt

Paul Wouters Sun, 27 Mar 2016 10:42:06 -0700

On Tue, 22 Mar 2016, Valery Smyslov wrote:

Do I get you right that you want to remove the following text?


   IPv6 networks are currently a
   rarity, so we can only speculate on what their wide deployment will
   be like, but the current thinking is that ISP customers will be
   assigned whole subnets, so we don't expect the kind of NAT deployment
   that is common in IPv4.


Yes. Since it is speculative and will quickly go stale in the RFC (we hope :)

 I'm not sure about:

  The number of half-open SAs is easy to measure, but it is also
  worthwhile to measure the number of failed IKE_AUTH exchanges.  If
  possible, both factors should be taken into account when deciding
  which IP address or prefix is considered suspicious.

 I'm not sure what measuring the failed IKE_AUTH exchanges gains you?
It allows a responder to decide whether it is under attack.
If the percentage of IKE_AUTH exchanges that failed to decrypt properly
is high enough, then a large fraction of initiators are sending bogus
IKE_AUTH requests, so the responder can assume that they do this for a
reason. In this case the responder can use puzzles in IKE_AUTH
(see Section 7.2).


Ok, although I'm still not convinced doing this won't make the problem
worse for the legitimate client.

 Whether or not you accept new work should depend more on your own
 resources than previously failed attempts, otherwise you risk becoming a


It is based on both. You must maintain some statistics
(number of half-open IKE_SA_INIT, number of failed IKE_AUTH)
and decide whether to use defensive measures
by analyzing these statistics.


I agree on the half-open SA's, I'm not convinced about measuring failed
IKE_AUTH. Previous failures cost no current resources, so I'm not sure
if it should be used as a metric. The metric should be based on
current resources, which seem already reflected in the number of
half-open SA's you're carrying and the amount of resources those are
using.

If you only take into consideration your resource consumption,
then you would end up punishing legitimate clients in case there are so many
of them that you just cannot handle the volume of requests.


Sure, just like puzzles punish legitimate clients :)

In other words, you must distinguish attack from just a high load.


"just a high load" would not have zillions of half-open SA's.

See above. Once you make a decision that an attack is in progress
(e.g. by monitoring the number of failed IKE_AUTH within
last N seconds), you'll turn on IKE_AUTH puzzles or take some other measures.
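As a rough illustration of the decision rule described here, the sketch below keeps the two statistics mentioned (the number of half-open SAs and the IKE_AUTH failures seen within the last N seconds) and reports an attack when either crosses a threshold. The class name, the thresholds, the minimum sample count and the choice to combine the two signals with "or" are all assumptions made for the example; neither the draft nor this thread prescribes them.

```python
import time
from collections import deque


class AttackDetector:
    """Hypothetical policy: combine half-open SA count and IKE_AUTH failures."""

    def __init__(self, window_seconds=10.0, max_half_open=1000,
                 max_failed_ratio=0.5, min_samples=20):
        # All four values are illustrative assumptions, not recommendations.
        self.window_seconds = window_seconds
        self.max_half_open = max_half_open
        self.max_failed_ratio = max_failed_ratio
        self.min_samples = min_samples
        self.half_open = 0
        self._auth_results = deque()  # (timestamp, failed) pairs, oldest first

    def on_half_open_created(self):
        self.half_open += 1

    def on_half_open_removed(self):
        self.half_open = max(0, self.half_open - 1)

    def on_ike_auth(self, failed, now=None):
        """Record one IKE_AUTH outcome (failed=True if decryption/ICV failed)."""
        now = time.monotonic() if now is None else now
        self._auth_results.append((now, failed))
        self._expire(now)

    def _expire(self, now):
        # Drop results older than the sliding window of N seconds.
        cutoff = now - self.window_seconds
        while self._auth_results and self._auth_results[0][0] < cutoff:
            self._auth_results.popleft()

    def failed_ratio(self, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        total = len(self._auth_results)
        if total < self.min_samples:
            return 0.0  # too few samples to conclude anything
        failed = sum(1 for _, was_failed in self._auth_results if was_failed)
        return failed / total

    def under_attack(self, now=None):
        return (self.half_open > self.max_half_open
                or self.failed_ratio(now) > self.max_failed_ratio)
```

A responder following the other view in this thread, that only current resource use should count, could drop the failure-ratio term and keep the half-open limit alone.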


you seem to think you can save the poor legitimate client in a sea of
botnets. I am not convinced of that. Excluding that, the only thing the
server needs to do when under attack is ensure it does not die. So a
limit on the half-open SA's makes sense _for the server itself_. It's not
doing that to help legitimate clients - those have already lost. They
are drowned out (with or without a sea of puzzles).

  The cookie mechanism limits the amount of allocated
  state to the size of the bot-net, multiplied by the number of half-
  open SAs allowed per peer address, multiplied by the amount of state
  allocated for each half-open SA.  With typical values this can easily
  reach hundreds of megabytes.

 It would be clearer to mention explicitly that the cookie mechanism
 prevents spoofed packets from taking up state, thereby limiting [....]


Could you please be more explicit what text you are not happy with?


I don't think it is obvious enough in the text that cookies prevent
attacks based on source IP spoofing, and that this attack is based on
a network of compromised machines that talk IKE without needing
spoofing. I would also just say "attacker" instead of "bot-net" to keep
it generic.
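The state bound in the quoted draft text is a plain product of three factors, so it is easy to work through with numbers. The sketch below does that; the function name and all three input values are hypothetical, chosen only to show how the product reaches the "hundreds of megabytes" range.

```python
def half_open_state_bound(attacker_hosts, half_open_per_peer, bytes_per_half_open):
    """Upper bound on state once cookies have ruled out spoofed source addresses.

    With cookies in use, every half-open SA must belong to an address the
    attacker really controls, so the bound is the product of the three factors.
    """
    return attacker_hosts * half_open_per_peer * bytes_per_half_open


# Assumed example values: 10,000 attacking hosts, 5 half-open SAs allowed
# per peer address, 4 KiB of state per half-open SA.
bound = half_open_state_bound(10_000, 5, 4096)
print(f"{bound / 2**20:.0f} MiB")
```

With these assumed inputs the bound comes to about 195 MiB.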

  Note that the Responder SHOULD cache
  tickets for a short time to reject reused tickets (Section 4.3.1),
  and therefore there should be no issue of half-open SAs resulting
  from replayed IKE_SESSION_RESUME messages.

 I should probably read 5723, but why would one ever respond to an "old"
 re-used or unknown session resume ticket? I guess the only use is a
 faster failure of lost resumption tickets for real clients? Since just
 sending a "go away" response is not more computationally expensive than
 creating a "go away" response for an invalid IKE SPI or badly formed IKE
 packet, I would say it is not worth implementing a separate list of
 recently used tickets.


RFC 5723 describes two kinds of tickets - "ticket by reference"
and "ticket by value". In the former case the server stores
all the information regarding IKE SAs that can be resumed and the ticket is
just an "index" in that database. With this approach
the server always knows whether the ticket was already used or not.


Ok, so that ticket is not covered in the above text. If there is such
a ticket, it is unused, not re-used.

With the latter approach all the information regarding the SA
is stored in the ticket itself. The server stores nothing in this case - it
just decrypts the presented ticket and resumes the IKE SA.
In this case the server doesn't know whether the ticket
was used before unless it maintains a cache of recently
used tickets.


Ahh okay. Thanks for the information. That makes sense now. But it does
open up another attack. Attackers can flood the responder with bogus
resumption tickets, using up the responder's CPU. But I see that is covered
in RFC 5723 section 9.3, which is currently not referenced in
your current document. Possibly add that for completeness sake?
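To make the "ticket by value" case concrete, here is a minimal sketch of a short-lived cache of recently used tickets. The class name, the 60-second lifetime and the choice to store SHA-256 digests rather than whole tickets are assumptions for illustration; RFC 5723 does not specify a data structure. The ticket is assumed to have been decrypted and validated before this check runs.

```python
import hashlib
import time


class UsedTicketCache:
    """Hypothetical replay cache for "ticket by value" session resumption."""

    def __init__(self, ttl_seconds=60.0):
        # The lifetime is an assumed value; an operator would pick it.
        self.ttl_seconds = ttl_seconds
        self._seen = {}  # ticket digest -> expiry time

    def check_and_record(self, ticket: bytes, now=None) -> bool:
        """Return True for a ticket not seen recently, False for a replay."""
        now = time.monotonic() if now is None else now
        # Forget entries whose lifetime has passed, so the cache stays small.
        self._seen = {d: exp for d, exp in self._seen.items() if exp > now}
        digest = hashlib.sha256(ticket).digest()
        if digest in self._seen:
            return False
        self._seen[digest] = now + self.ttl_seconds
        return True
```

With "ticket by reference" no such cache is needed, since the server-side database entry already records whether the ticket was used.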

  If the received IKE_AUTH message failed to decrypt correctly (or
  failed to pass ICV check), then the Responder SHOULD still keep the
  computed SK_* keys, so that if it happened to be an attack, then the
  malicious Initiator cannot get advantage of repeating the attack
  multiple times on a single IKE SA.

 Well, it needs to do this anyway in case the attacker is just sending
 bogus responses faster than the real client. So I don't think this


Do you mean "bogus requests"? Isn't it a DoS attack?


Yes I meant bogus IKE_AUTH requests.

 advice here is warranted - it has nothing to do with ddos.
I think this advice is closely related to DoS protection. You yourself
described the attack two lines above.


It is an (obvious) attack but not a DDOS attack. eg:

client    IKE_INIT Request          --->
                                    <--- IKE_INIT Response  server
attacker  IKE_AUTH Request (bogus)  --->  [fails]
client    IKE_AUTH Request          --->

I think any implementor should really already handle this case in
general. Any failures of unauthenticated packets must be dropped
and the timeout timer continued to wait for the legitimate response.
That's a core part of the IKEv2 spec, so I don't think that needs
to be repeated in this document.

  To prevent this kind of attacks the responder should not blindly
  download the whole file.  Instead it SHOULD first read the initial
  few bytes, decode the length of the ASN.1 structure from these bytes,
  and then download no more than the decoded number of bytes.

 That seems really bad. If the attacker controls the URL, they can also
 put a malicious ASN.1 encoding in the cert. Much better is to [Oh never
 mind you write all the things I wrote here already]

OK.


Just to clarify, I do think you need to rewrite the text to not blindly
trust the ASN.1 length encoding. Or just remove that advice and stick
to the other items to discuss right after.
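One way to read the point about not blindly trusting the ASN.1 length is to decode the length from the first few bytes and then check it against a local limit before downloading anything further. The sketch below does this for a DER-encoded SEQUENCE, which is the outer structure of an X.509 certificate. The function name and the 64 KiB cap are assumptions for the example, not values from the draft.

```python
def der_total_length(header: bytes, max_total: int = 64 * 1024) -> int:
    """Return the total encoded size (header plus content) of a DER SEQUENCE.

    Raises ValueError if the header is malformed or if the declared size
    exceeds max_total, a locally configured limit (assumed value).
    """
    if len(header) < 2:
        raise ValueError("need at least the tag and first length octet")
    if header[0] != 0x30:
        raise ValueError("expected a SEQUENCE tag (0x30)")
    first = header[1]
    if first < 0x80:
        # Short form: the octet itself is the content length.
        content_len, header_len = first, 2
    else:
        # Long form: low 7 bits give the number of length octets that follow.
        n = first & 0x7F
        if n == 0 or n > 4:
            raise ValueError("indefinite or oversized length form")
        if len(header) < 2 + n:
            raise ValueError("header truncated")
        content_len = int.from_bytes(header[2:2 + n], "big")
        header_len = 2 + n
    total = header_len + content_len
    if total > max_total:
        # The encoded length is attacker-controlled, so cap it locally.
        raise ValueError("declared length exceeds local limit")
    return total
```

The download would then stop at the smaller of the decoded size and the local limit, regardless of what the remote server keeps sending.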

  With a typical setup and typical Child SA lifetimes, there
  are typically no more than a few such exchanges, often less.

 I don't agree. People put in 1s liveness probes, so that's a lot of IKE
 packets.


Liveness check is about 50 bytes. Even if it is performed
every second, it results in 2 packets/sec and 100 bytes/sec of traffic per
client. Is it a lot?


See other discussions. We sadly have a strong demand by operational
people to have really short liveness timers. While as implementors we
have refused < 1s, people often do use 1s timers as a way to do High
Availability. So I think the advice of limiting the number of allowed
responses for an IKE SA in general is dangerous. There are many
unexpected use cases.

  Since any endpoint can initiate a new exchange, [...]

 I would more explicitly point the AUTH NULL based attacks to its RFC.
Well, I think we tried to follow this way: there is little specific to NULL
auth, it is just mentioned (and referenced) as one of the factors that may
make DoS attacks easier to mount.


hmm ok.

 Then focus this document on the possible abuse of legitimate clients.
 However, I don't know what I would want to advise. You can put in
 maximums for rekeys, reauths, or child sa's, but those should at most
 be configuration options, and not hardcoded options in the
 implementation - since implementors cannot predict what legitimate
 large scale use their code might see.


Sure. And that's why the draft doesn't prescribe any hard limits,
it just lists possible defense measures.


right.

  For that reason, it is NOT RECOMMENDED to
  ever increase the IKEv2 window size above its default value of one
  if the peer uses NULL Authentication.

 I'm not sure why here the auth method is used to discriminate. Earlier
 it also talked about authenticated clients and launching many exchanges?

Because with NULL auth the peer is not authenticated and we'd rather limit
his/her ability to mount a DoS attack by initiating N exchanges in parallel,
which would increase our peak load. If the peer is authenticated, then
launching N exchanges simultaneously is not an attack in general. And if the
authenticated peer mounts such a DoS attack, then he/she could be traced
down and either out-of-band measures are taken or the peer's credentials
are revoked.


So you are saying basically that this text should have appeared in the
AUTH NULL RFC, but didn't. Perhaps then a separate section for AUTH NULL
clients could be put in this document, and then also let it update that
RFC?

 Also, this advice is actually an update to RFC7619/RFC7296 so this
 document should list it is updating those RFC's.


Is it really needed? RFC 7296 doesn't deal with NULL auth,
and RFC 7619 does reference this draft in Security Considerations.
What do others think of it?


I'm curious too what others think this is: a "recommendation" or a "change".

  If the peer initiates too many exchanges of any kind,
  implementations can introduce an artificial delay before
  responding to request messages.  This delay would decrease the
  rate the implementation need to process requests from any
  particular peer, making it possible to process requests from the
  others.  The delay should not be too long to avoid causing the IKE
  SA to be deleted on the other end due to timeout.

 I am not sure how useful this advice is. Since people use liveness
 timeouts of 1s, a malicious peer can always do 1s of exchanges. So if
 you want to introduce delays, they should probably delay only
 non-liveness exchanges. And liveness exchanges that are more frequent
 than 1s should probably just be dropped or rejected.

It doesn't matter what the exchange type is. The intention is to artificially
limit the number of exchanges the malicious peer can initiate per second.


See earlier discussion about 1s liveness probes.... It is very common.

There is a MID window, so the peer cannot initiate a new exchange until
one of the currently active exchanges is completed. If, for example, the
window size is 1, then the malicious peer cannot initiate a new
exchange until we send a response to the current one. If we send the
response immediately, then the malicious peer immediately
initiates another exchange. If we respond in 3 seconds, then it will have to
wait 3 seconds before it can initiate a new exchange


Yes, I don't have a problem with this part.

(probably sending retransmissions during that time that we will ignore).
That's an idea. If the malicious peer is so impatient that it tears down the
IKE SA when no response is received within 3 seconds, then it makes things
worse for itself: it will need to create
a new IKE SA from scratch, passing all the puzzle barriers.


Sure.

  Note, that if the Responder
  receives retransmissions of the request message during the delay
  period, the retransmitted messages should be silently discarded.

 That also updates RFC-7296, which states that each IKE request should get
 an IKE answer.


I don't think an artificial delay is a violation of RFC 7296.
Each IKE request will be answered. RFC 7296 doesn't require that
it is answered immediately (or as soon as the responder can prepare the
response).


Yes, you are right. Strike this remark :)
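The delay mechanism discussed in this part of the thread can be sketched as a small bookkeeping class: a request is queued with a due time, retransmissions that arrive during the delay are silently discarded, and the response goes out once the delay has passed. The class and function names and the 3-second delay are assumptions for the example. Handling of retransmissions that arrive after the response was sent (normally answered from the cached response) is not modelled.

```python
class DelayedResponder:
    """Hypothetical sketch of delaying responses to a too-active peer."""

    def __init__(self, delay_seconds=3.0):
        # Assumed value; it must stay below the peer's own timeout.
        self.delay_seconds = delay_seconds
        self._pending = {}  # Message ID -> time when the response is due

    def on_request(self, message_id, now):
        if message_id in self._pending:
            # Retransmission received during the delay period: drop silently.
            return "discarded"
        self._pending[message_id] = now + self.delay_seconds
        return "queued"

    def due_responses(self, now):
        """Return Message IDs whose delay has elapsed and forget them."""
        due = sorted(m for m, t in self._pending.items() if t <= now)
        for m in due:
            del self._pending[m]
        return due


def max_exchange_rate(window_size, delay_seconds):
    """Exchanges per second a peer can complete, given the window and delay."""
    return window_size / delay_seconds
```

With a window size of 1 and a 3-second delay, `max_exchange_rate(1, 3.0)` gives one exchange every three seconds, which is the effect described above. Every request is still answered, only later.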

 While the document mentions Fragmentation with respect to puzzles, it
 does not mention ddos attacks based on malicious fragmentation packets.
 It could be that the base RFC is clear enough, but perhaps this document
 should give some advice too?

I think RFC 7383 lists possible DoS attacks in its Security Considerations
section. Do you think it's not enough?


It is, but you are inconsistent about what you pull in from other RFC's
and what you only reference. See above discussion.

btw. thanks for this discussion. It has raised some interesting
implementation details, some of which I now want to implement :)

Paul

_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
