David Wierbowski writes:
> I'm not sure this makes RFC4718 incorrect.  It just makes it incomplete.

Ok, but that still means we need to find a way to fix that problem
before we can use that solution in IKEv2bis. 

> > This solution might cause peers to stay in live lock state, causing
> > the whole IKE SA to be unusable. I.e. host A starts IKE SA rekey and
> > host B starts Create Child SA. Host B replies NO_PROPOSAL_CHOSEN to
> > host A's IKE SA rekey, and Host A replies NO_ADDITIONAL_SAS to Host
> > B's Create Child SA request. Both ends process replies, and notices
> > they failed, thus both start again, causing both ends to be trying
> > these operations as fast as they can. This situation will stay as it
> > is unless something kicks hosts out of sync.
> >
> > Or returning NO_ADDITIONAL_SAS might cause other end to delete the
> > whole IKE SA and start from scratch.
> 
> I do not like that RFC 4718 used NO_PROPOSAL_CHOSEN as the indicator that
> a rekey is being rejected because there are outstanding requests.  To me
> a new notify would have made sense.

True, but as RFC4718 tried not to modify IKEv2, it could not define a
new error code. In IKEv2bis we can do this, so I think we should
define a new error code, something like "TEMPORAL_FAILURE", which
means there was some kind of temporary error (i.e. the problem will
disappear without anybody changing policy) and the other end should
try again after a short timeout.

This error code could be useful in other places too.

NO_PROPOSAL_CHOSEN indicates that the problem will NOT disappear
unless someone changes something (i.e. proposals, policy, traffic
selectors etc.). So some implementations might (and should) use a much
larger timeout before trying again with exactly the same parameters.
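
A minimal sketch of that distinction, assuming an implementation picks
its retry delay from the error notify it received. TEMPORAL_FAILURE is
only the name proposed in this mail, and the function name and both
delay values are made up for illustration; no RFC specifies them.

```python
from enum import Enum


class Notify(Enum):
    NO_PROPOSAL_CHOSEN = "NO_PROPOSAL_CHOSEN"
    NO_ADDITIONAL_SAS = "NO_ADDITIONAL_SAS"
    # Proposed in this thread only; not defined in RFC4306.
    TEMPORAL_FAILURE = "TEMPORAL_FAILURE"


# Illustrative values only (assumptions, not taken from any specification).
SHORT_RETRY_SECONDS = 5
LONG_RETRY_SECONDS = 600


def retry_delay(notify):
    """Return the seconds to wait before retrying with the same parameters,
    or None if retrying on the same IKE SA is not sensible."""
    if notify is Notify.TEMPORAL_FAILURE:
        # The problem goes away by itself, so a short wait is enough.
        return SHORT_RETRY_SECONDS
    if notify is Notify.NO_PROPOSAL_CHOSEN:
        # Nothing changes unless policy or proposals change, so back off.
        return LONG_RETRY_SECONDS
    # NO_ADDITIONAL_SAS: do not retry on this IKE SA.
    return None
```

Returning None for NO_ADDITIONAL_SAS reflects the view that the
request should not simply be repeated on the same IKE SA.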

> Given that RFC 4718 did use NO_PROPOSAL_CHOSEN it seems to me that
> when HOST A is rekeying the IKE_SA it should assume the peer is busy
> when it receives NO_PROPOSAL_CHOSEN and should continue to attempt
> to periodically rekey the IKE SA again.

Yes.

> I do not agree that when Host B receives NO_ADDITIONAL_SAS that it
> should retry the operation using the same IKE SA.

True, if it follows RFC4306, it should tear down the whole IKE SA and
start from the beginning:
----------------------------------------------------------------------
4.  Conformance Requirements
...
                        If the responder rejects the CREATE_CHILD_SA
   request with a NO_ADDITIONAL_SAS notification, the implementation
   MUST be capable of instead closing the old SA and creating a new one.
...
----------------------------------------------------------------------

This would be a very unfortunate operation in this case, as it would
tear down the whole IKE SA and all the IPsec SAs along with it. I do
not think we can use NO_ADDITIONAL_SAS with the current definition
anywhere else because of this.

If, on the other hand, the host B which receives NO_ADDITIONAL_SAS
does not tear down the whole IKE SA, but decides to keep the existing
IKE SA up and running, there is no text anywhere saying it cannot
start a create child exchange again in the future. Most likely it will
do that whenever the next packet requiring an IPsec SA to be created
is received, thus if there is a constant stream of packets which
require protection it will trigger a new create child exchange
immediately.

If we want host B to put the IKE SA into some kind of on-hold state,
in which no new exchanges can be started on it, when it receives
NO_ADDITIONAL_SAS or when it rejects the IKE SA rekey with
NO_PROPOSAL_CHOSEN (or with the new TEMPORAL_FAILURE), then that needs
to be explicitly mentioned.

> As such I do not think there is a live lock state. What should be
> done is up to the implementation. An implementation could assume the
> other end is in the process of rekeying or deleting the IKE SA and
> delay taking any action or it could take immediate action. If it
> takes immediate action it would need to do so on a new IKE SA.

How long should it delay those operations? Forever? Does that include
DPD? If so, how is the other end going to get rid of the IKE SA if Host
A crashes and forgets everything about the IKE SA, as there will not
be any more exchanges from Host A from then on?

As the behavior of the nodes affects interoperability we should define
what to do in this case. 

> > This is not in RFC4306, this is just one proposal given in RFC4718
> > which might be used, but as I noted above, it can cause live lock
> > loop, thus it is not really acceptable.
> 
> I think it is appropriate to add this to the new draft.  If you are
> concerned about the lock state then a warning should be added stating
> that when you receive NO_ADDITIONAL_SAS that you should not attempt to
> retry that operation on the same IKE SA, although that seems
> self-evident.

Yes, I would want to have some kind of text describing that, and also
describing how long this restriction on retrying stays in effect. I
assume that if the other end does not rekey or delete the IKE SA
within a certain timeout, then the node which received
NO_ADDITIONAL_SAS should delete the IKE SA and start over.
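
As a rough sketch of what such text would require from an
implementation: an on-hold flag per IKE SA plus a deadline after which
the SA is deleted. The class name, method names and timeout parameter
are hypothetical, and the timeout value itself is exactly the thing
that still needs to be specified.

```python
class HeldIkeSa:
    """Tracks the on-hold state of one IKE SA (hypothetical helper)."""

    def __init__(self, hold_timeout_seconds):
        self.hold_timeout_seconds = hold_timeout_seconds
        self.hold_started_at = None  # None means "not on hold"

    def put_on_hold(self, now):
        """Call when NO_ADDITIONAL_SAS is received on this IKE SA."""
        if self.hold_started_at is None:
            self.hold_started_at = now

    def may_start_exchange(self):
        """No new exchanges may be started while the SA is on hold."""
        return self.hold_started_at is None

    def should_delete_and_restart(self, now):
        """True once the peer has had the whole timeout to rekey or delete
        the IKE SA and this SA is still around."""
        if self.hold_started_at is None:
            return False
        return now - self.hold_started_at >= self.hold_timeout_seconds
```

Here `now` is any monotonic timestamp in seconds supplied by the
caller. Whether DPD counts as a "new exchange" in this state is left
open, as in the discussion above.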

> I'm not convinced it is broken, I'm just convinced that if you
> attempt to retry an operation on the same IKE SA that you received
> NO_ADDITIONAL_SAS on that you can get into a lock state.  To reduce that
> concern we can come up with a new REKEYING_IKE_SA notification, but that's
> likely to cause problems with old implementations, so better to stick with
> what RFC 4718 proposed.

Adding new error notifications cannot be a problem for compliant
implementations, as RFC4306 says:

----------------------------------------------------------------------
3.10.1.  Notify Message Types
...
   Types in the range 0 - 16383 are intended for reporting errors.  An
   implementation receiving a Notify payload with one of these types
   that it does not recognize in a response MUST assume that the
   corresponding request has failed entirely.  ...
----------------------------------------------------------------------

Thus every compliant implementation MUST assume that the corresponding
request has failed if it receives an unrecognized error notify in a
response. Thus every implementation should handle new error messages
just as we want, i.e. assume the IKE SA rekey failed.
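
The rule quoted above can be sketched as follows. The 0 - 16383 range
comes from the RFC4306 text; the function names and the idea of passing
in a set of recognized types are assumptions made for illustration.

```python
ERROR_TYPE_MAX = 16383  # RFC4306 3.10.1: types 0 - 16383 report errors


def classify_notify(notify_type, recognized_errors):
    """Classify one Notify Message Type seen in a response."""
    if 0 <= notify_type <= ERROR_TYPE_MAX:
        if notify_type in recognized_errors:
            return "error"
        # Not recognized, but still in the error range.
        return "unrecognized-error"
    return "status"


def request_failed(notify_types, recognized_errors):
    """An error notify in the response, recognized or not, means the
    corresponding request has failed entirely."""
    return any(
        classify_notify(t, recognized_errors) != "status"
        for t in notify_types
    )
```

For example, with only NO_PROPOSAL_CHOSEN (14) and NO_ADDITIONAL_SAS
(35) recognized, a response carrying some new error type in that range
is still treated as a failed request, which is the behavior wanted for
a new error code.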

> 
> > The text above implies that regardless what you do you should be able
> > to allow other end to start exchanges and process them. I.e. IKEv2
> > protocol tries to be specified in such way that both ends can start
> > exchanges at any times and expect them to either fail or succeed and
> > get reply back, but not stay in situation where you do not know,
> > whether other end processed your request or not.
> >
> > If you delete the IKE SA immediately that will happen.
> 
> You can never guarantee you are going to get a response back to a
> request.  I do not see what makes this situation any different.

If I do not get a response back (after a dozen retransmissions over a
period of at least several minutes) to a request in the IKEv2
protocol, that means the IKE SA is dead, and I silently discard it
(i.e. assume the other end is dead).

The only cases where you might not get a reply back are if the other
end has deleted the IKE SA or if the network is broken. In both cases
the correct fix is to remove the IKE SA and start over, and it does
not matter whether your request reached the other end or not.

But that is not true in the rekey case, as some of the operations you
do on the old IKE SA do affect the state of the new IKE SA, thus you
need to know whether the other end processed your request or not.

> I understand that RFC 4718 is just one proposal, but it's one that I
> expect some vendors tried to implement.  I doubt there are many that are
> currently delaying the deletion of the IKE SA.

As our implementation does that, I guess all our customers'
implementations do it... :-)

But I do not think that is an issue here.

Simultaneous rekeys are not things that happen that often (if ever
outside laboratory tests :-), so even if there are old implementations
which do not do what the IKEv2bis document will say, that shouldn't
really matter.

So I think we need to write some text that will work, and not be too
concerned about what current implementations are doing now (I am sure
all implementations out there are going to need minor modifications
anyway when IKEv2bis comes out).

> I'm not convinced yet that RFC 4718 is broken or at least that it cannot
> be made to work.

I think it can be made to work, but I do consider it broken or at
least underspecified as it is now, as it might lead to live locks.
Adding text to it might solve the problem (though until I see the text
and solution I cannot say for sure that it will).

> > Implementation needs to still have the code that detects the
> > simultaneous rekey, and other end might still use this delay, thus you
> > need to be able to cope with the case where this happens.
> > Implementations need to be able to handle both cases regardless
> > whether we use SHOULD or MAY, only thing that is different is whether
> > they allow other end finish exchanges or not.
> 
> Agreed, but I still think delaying the deletion is at most a MAY.

Ok.
-- 
kivi...@iki.fi