[lisp] Benjamin Kaduk's Discuss on draft-ietf-lisp-rfc6830bis-20: (with DISCUSS and COMMENT)

Benjamin Kaduk Wed, 26 Sep 2018 20:45:02 -0700

Benjamin Kaduk has entered the following ballot position for
draft-ietf-lisp-rfc6830bis-20: Discuss


When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-lisp-rfc6830bis/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I have grave concerns about the suitability of LISP as a whole, in its
present form, for advancement to the Standards-Track.  While some of my
concerns are not specific to this document, as the core protocol
(data-plane) spec, it seems an appropriate place to attach them.

I am told, out of band, that the intended deployment model is no longer to
cover the entire Internet (cf. the MISSREF-state
draft-ietf-lisp-introduction's "with LISP, the edge of the Internet and the
core can be logically separated and interconnected by LISP-capable
routers", etc.), and that full Internet-scale operation is no longer a
goal.  However, since that does not seem to be reflected in the current
batch of documents up for IESG review, I am forced to ballot on them
"as-is", namely as targeting global Internet deployment.  The requirements
placed on the mapping system are so stringent as to be arguably
unachievable at Internet scale, though that has more of an
interaction with the control-plane than the data-plane.  It's still in
scope here, though, as part of the overall description of the protocol
flow.

There are almost innumerable downgrade attacks possible, and
the control-plane and data-plane security mechanisms are not normative
dependencies of the current corpus of documents, and as such are not up for
consideration as mitigating the security concerns with the core documents.

Section 3 defines the EID-to-RLOC Database:

   EID-to-RLOC Database:   The EID-to-RLOC Database is a global
      distributed database that contains all known EID-Prefix-to-RLOC
      mappings.  Each potential ETR typically contains a small piece of
      the database: the EID-to-RLOC mappings for the EID-Prefixes
      "behind" the router.  These map to one of the router's own
      globally visible IP addresses.  Note that there MAY be transient
      conditions when the EID-Prefix for the site and Locator-Set for
      each EID-Prefix may not be the same on all ETRs.  This has no
      negative implications, since a partial set of Locators can be
      used.

No compelling architecture for a trustworthy global distributed database
has been presented that I've seen so far, and LISP relies heavily on the
mapping system's database for its functionality.  I am concerned that so
many requirements are placed on the mapping system that it is in effect
unimplementable, in which case it would seem that the architecture as a
whole (that is, for a global Internet-scale system) is not fit for purpose.

Section 4.1's Step (6) only mentions parsing "to check for format
validity".  I think it is appropriate to mention (and refer to) source
authentication checks as well, since bad Map-Reply data can allow all sorts
of attacks to occur.

There are some fairly subtle ordering requirements between the order of
entries in Map-Reply messages and the Locator-Status-Bits in data-plane
traffic (needed for the status bits to be meaningful), which are only
given a minimal treatment in the control-plane document.  The
need for synchronization in interpreting these bits should be mentioned
more prominently in the data-plane document as well.

The usage of the Instance ID does not seem to be adequately covered; from
what I've been able to pick up so far it seems that both source and
destination participants must agree on the meaning of an Instance ID, and
the source and destination EIDs must be in the same Instance.  This does
not seem like it is compatible with Internet scale, especially if there are
only 24 usable bits of Instance ID.

There seem to be a lot of intra-site synchronization requirements, notably
with respect to Map-Version consistency, the contents and ordering of
locator sets for EIDs in the site, etc.; the actual hard requirements for
synchronization within a site should be clearly called out, ideally in a
single location.

The security considerations attempt to defer substantially to the
threat-analysis in RFC 7835, which does not really seem like a complete
threat analysis and does not provide analysis as to what requirements are
placed on the boundaries between the different components of LISP (data
plane, control plane, mapping system, various extensions, etc.).  The
secdir reviewer had some good thoughts in this space.

The security considerations throughout the LISP documents place a heavy
focus on the risk of over-claiming for routing EID-prefixes.  This is a
real concern, to be clear, but it should not overshadow the risk of an
attacker who is able to move traffic around at will, strip security
protections, cause denial of service, alter data-plane payloads, etc.
Similarly, this document's security considerations call out denial of
service as a risk from Map-Cache insertion/spoofing, but the risks from an
attacker being able to read and modify the traffic, perhaps even without
detection, seem a much greater threat to me.

I am not convinced that this protocol meets the current IETF requirements
for the security properties of Standards-Track Protocols without at least
LISP-SEC as a mandatory-to-implement component, and possibly additional or
stronger requirements.  (I did not do a full analysis of the system in the
presence of those security mechanisms, since that is not what is being
presented for review.)

Having an EID that is associated to user-correlatable devices has severe
privacy considerations, but I could not find this mentioned anywhere in all
of the LISP documents I've read so far.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I apologize for the somewhat scattered nature of these comments; there are
a lot of them and I was focusing my time more on trying to understand the
broader system, and the intended security posture, so they did not get as
much clean-up as I would have liked.  (Most of my review was performed on the
-18, though I have tried to update to the -20 as relevant.)


The instance ID provides for organizational correlation, another privacy
exposure.

Is there anything different between an "EID-to-RLOC Map-Request" and just a
"Map-Request"?  (Same question for "Map-Reply", too.)

There's a lot of stuff that seems to work best if there is symmetric
bidirectional traffic, with inline signalling of map version and
reachability changes, though clearly everything is designed to also work
with asymmetric connectivity or unidirectional traffic.  It would be nice
to have a high-level summary in or near the introduction about what kinds
of behavior/performance differences are expected for bidirectional vs.
unidirectional traffic.

Section 2

That's not the 8174 boilerplate; it's more than just adding a cite to the
2119 boilerplate.

Section 3

nit: "An address family that pertains to the Data-Plane." is a sentence
fragment.

   Ingress Tunnel Router (ITR):   An ITR is a router that resides in a
      [...]
      mapping lookup in the destination address field.  Note that this
      destination RLOC MAY be an intermediate, proxy device that has
      better knowledge of the EID-to-RLOC mapping closer to the

A 2119 MAY does not seem necessary here; this is rather a statement of
fact that may not be known to the encapsulating ITR.

      Specifically, when a service provider prepends a LISP header for
      Traffic Engineering purposes, the router that does this is also
      regarded as an ITR.  The outer RLOC the ISP ITR uses can be based
      on the outer destination address (the originating ITR's supplied
      RLOC) or the inner destination address (the originating host's
      supplied EID).

I'm confused here, perhaps in multiple ways.  Are there now *two* LISP
headers on the packet?  Is the "outer RLOC the ISP ITR uses" the source
RLOC or the destination RLOC?

   Negative Mapping Entry:   A negative mapping entry, also known as a
      negative cache entry, is an EID-to-RLOC entry where an EID-Prefix
      is advertised or stored with no RLOCs.  That is, the Locator-Set
      for the EID-to-RLOC entry is empty or has an encoded Locator count
      of 0.

Is "empty" a distinct representation from "locator count of zero"?

Perhaps something of an aside, but the check described for
Route-Returnability is a somewhat weak check, and in some cases could still
be spoofed.  (I don't expect this to surprise anyone, of course, but
perhaps some more qualifiers could be added to the text.)

Section 4

   An additional LISP header MAY be prepended to packets by a TE-ITR
   when re-routing of the path for a packet is desired.  A potential
   use-case for this would be an ISP router that needs to perform
   Traffic Engineering for packets flowing through its network.  In such
   a situation, termed "Recursive Tunneling", an ISP transit acts as an
   additional ITR, and the RLOC it uses for the new prepended header
   would be either a TE-ETR within the ISP (along an intra-ISP traffic
   engineered path) or a TE-ETR within another ISP (an inter-ISP traffic
   engineered path, where an agreement to build such a path exists).

"the RLOC it uses for the new prepended header": again, is this the
destination RLOC (vs. the source RLOC)?

Section 4.1

   o  Map-Replies are sent on the underlying routing system topology
      using the [I-D.ietf-lisp-rfc6833bis] Control-Plane protocol.

Just to check my understanding: is the "underlying routing system topology"
the same as the "underlay"?

Is step (3) just describing more of what step (2) says is "not described in
this example"?

Section 5.3

The word "nonce" is normally used for something used exactly once.
E.g., with some AEAD algorithms, if the same "nonce" input is used for
different encryptions, the entire security of the system is compromised.
It would be better to refer to this field with a different term, given
that "the same nonce can be used for a period of time when encapsulating to
the same ETR".  "Uniquifier" or "random value" might be reasonable choices.

Why is there no discussion of the Map-Version or Instance-ID fields
in this section?

When doing ETR/PETR decapsulation:

   o  The inner-header 'Time to Live' field (or 'Hop Limit' field, in
      the case of IPv6) SHOULD be copied from the outer-header 'Time to
      Live' field, when the Time to Live value of the outer header is
      less than the Time to Live value of the inner header.  Failing to
      perform this check can cause the Time to Live of the inner header
      to increment across encapsulation/decapsulation cycles.  This
      check is also performed when doing initial encapsulation, when a
      packet comes to an ITR or PITR destined for a LISP site.

Er, what is "this check" that is also performed for initial encapsulation?
How are there multiple TTL values to compare?
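
For reference, the decapsulation rule quoted above amounts to taking the
minimum of the two TTL values.  A minimal Python sketch follows; the
function names are hypothetical and are not taken from the draft.

```python
def decap_ttl_checked(inner_ttl: int, outer_ttl: int) -> int:
    """Copy the outer TTL only when it is lower than the inner TTL."""
    return outer_ttl if outer_ttl < inner_ttl else inner_ttl


def decap_ttl_unchecked(inner_ttl: int, outer_ttl: int) -> int:
    """Unconditional copy: what happens if the comparison is skipped."""
    return outer_ttl


# An inner packet near the end of its life, inside a fresh outer header.
inner, outer = 3, 64
print(decap_ttl_checked(inner, outer))    # inner TTL is preserved
print(decap_ttl_unchecked(inner, outer))  # inner TTL grows
```

At decapsulation both headers are present, so the comparison is well
defined.  The sketch does not attempt to model initial encapsulation.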

   o  The inner-header 'Differentiated Services Code Point' (DSCP) field
      (or the 'Traffic Class' field, in the case of IPv6) SHOULD be
      copied from the outer-header DSCP field ('Traffic Class' field, in
      the case of IPv6) to the inner-header.

nit: the first "inner-header" seems like an editing remnant?

Section 7.1

How is this stateless if it involves knowledge about the routers between
the ITR and all possible ETRs (i.e., a set that could change over time)?

Section 8

This 32-bit vs 24-bit thing is pretty hokey for a standards-track
specification (yes, I know that LISP-DDT is not standards track at the
moment).
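
A minimal Python sketch of the width mismatch, assuming the data-plane
header carries a 24-bit Instance ID in the high-order bits of a 32-bit
word shared with 8 Locator-Status-Bits, while the control plane can carry
a 32-bit value.  The layout is an assumption of this sketch and the helper
names are hypothetical.

```python
IID_BITS = 24
IID_MAX = (1 << IID_BITS) - 1  # largest Instance ID the data plane can carry


def pack_iid_lsb(instance_id: int, lsb: int) -> int:
    """Pack a 24-bit Instance ID and 8 status bits into one 32-bit word."""
    if not 0 <= instance_id <= IID_MAX:
        raise ValueError("Instance ID does not fit in 24 bits")
    if not 0 <= lsb <= 0xFF:
        raise ValueError("only 8 Locator-Status-Bits fit next to an Instance ID")
    return (instance_id << 8) | lsb


def unpack_iid_lsb(word: int) -> tuple:
    """Split the 32-bit word back into (instance_id, lsb)."""
    return word >> 8, word & 0xFF


word = pack_iid_lsb(0xABCDEF, 0b00000011)
print(hex(word))
print(unpack_iid_lsb(word))
```

Under these assumptions, any Instance ID above IID_MAX that is valid in a
control-plane message cannot be represented in the data-plane header at
all, which is the inconsistency at issue.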

Section 9

   Alternatively, RLOC information MAY be gleaned from received tunneled

What is this an alternative to?  The list of four options above?

   packets or EID-to-RLOC Map-Request messages.  A "gleaned" Map-Cache
   entry, one learned from the source RLOC of a received encapsulated
   packet, is only stored and used for a few seconds, pending
   verification.  Verification is performed by sending a Map-Request to
   the source EID (the inner-header IP source address) of the received
   encapsulated packet.

The source EID is some random end system, right?  So this relies on some
magic in the ETR to detect that there's a Map-Request and reply directly
instead of passing it on to the EID that won't know what to do with it?
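
The lifetime rule in the quoted text can be sketched as follows.  This is
a minimal Python illustration only: the class name, the field names, and
the 3-second value standing in for "a few seconds" are all hypothetical.

```python
from dataclasses import dataclass

GLEAN_LIFETIME = 3.0  # seconds; hypothetical stand-in for "a few seconds"


@dataclass
class GleanedEntry:
    """Map-Cache entry learned from the source RLOC of a received packet."""
    source_eid: str
    source_rloc: str
    learned_at: float
    verified: bool = False  # set once a Map-Reply confirms the mapping

    def usable(self, now: float) -> bool:
        """Unverified entries expire quickly; verified ones stay usable here."""
        return self.verified or (now - self.learned_at) < GLEAN_LIFETIME


entry = GleanedEntry("10.0.0.5", "192.0.2.1", learned_at=100.0)
print(entry.usable(now=101.0))  # still inside the short window
print(entry.usable(now=104.0))  # expired without verification
```

The sketch leaves open exactly the point raised above: which device
answers the verifying Map-Request that flips `verified` to true.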

Talking about the "R-bit" of the Map-Reply is a detail from 6833bis and
might benefit from an explicit section reference to the other document.

Section 10

What is the "CE" of "CE-based ITRs"?  Presumably Customer Edge, but it
is not marked as well-known at
https://www.rfc-editor.org/materials/abbrev.expansion.txt so expansion is
probably in order.

Again, when we are talking about the internal structure of the Map-Reply, a
detailed section reference to 6833bis is useful.

Modifying LSBs seems like a fine DoS attack vector for an on-path attacker.

   value of 1.  Locator-Status-Bits are associated with a Locator-Set
   per EID-Prefix.  Therefore, when a Locator becomes unreachable, the
   Locator-Status-Bit that corresponds to that Locator's position in the
   list returned by the last Map-Reply will be set to zero for that
   particular EID-Prefix

Doesn't this imply a stateful relationship between the ordering of
Map-Replies and data-plane traffic?
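
To make the coupling concrete, here is a minimal Python sketch, assuming
bit i of the Locator-Status-Bits describes the i-th Locator of the last
Map-Reply.  The addresses are documentation prefixes and the reordering is
a hypothetical failure case, not something taken from the draft.

```python
def usable_locators(locator_set, lsb):
    """Return the locators whose status bit (bit i for entry i) is set."""
    return [rloc for i, rloc in enumerate(locator_set) if (lsb >> i) & 1]


# Ordering the receiver cached from the last Map-Reply.
cached = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]
# Ordering the sender had in mind when it set the bits.
sender = ["198.51.100.1", "192.0.2.1", "203.0.113.1"]

lsb = 0b110  # bit 0 cleared: the sender's entry 0 is down

print(usable_locators(sender, lsb))  # what the sender meant
print(usable_locators(cached, lsb))  # what the receiver infers
```

With mismatched orderings the same bit pattern marks a different Locator
as unreachable, so both sides need to hold the same ordered list for the
bits to mean anything.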

Section 10.1

   Note that "ITR" and "ETR" are relative terms here.  Both devices MUST
   be implementing both ITR and ETR functionality for the echo nonce
   mechanism to operate.

Perhaps they could be given actual names so as to disambiguate which steps
are performed with ITR vs. ETR role?

   The echo-nonce algorithm is bilateral.  That is, if one side sets the
   E-bit and the other side is not enabled for echo-noncing, then the
   echoing of the nonce does not occur and the requesting side may
   erroneously consider the Locator unreachable.  An ITR SHOULD only set
   the E-bit in an encapsulated data packet when it knows the ETR is
   enabled for echo-noncing.  This is conveyed by the E-bit in the RLOC-
   probe Map-Reply message.

Why is this even optional?  If it was mandatory to use, then there would
not be a question.  But at least clarify that the "this" that is conveyed
is whether the peer supports the echo-nonce algorithm.  (Also, subject to
downgrade.)
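
The failure mode in the quoted paragraph can be sketched in a few lines of
Python.  This is an illustration under simplifying assumptions (a single
request/response exchange, no timers); the class and function names are
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Xtr:
    """Hypothetical peer; supports_echo models whether echo-noncing is on."""
    supports_echo: bool

    def return_traffic_nonce(self, nonce: int, e_bit: bool) -> Optional[int]:
        """Nonce carried on return traffic, or None if nothing is echoed."""
        return nonce if (e_bit and self.supports_echo) else None


def locator_seems_reachable(peer: Xtr, nonce: int) -> bool:
    """Requesting side sets the E-bit and waits for its nonce to come back."""
    return peer.return_traffic_nonce(nonce, e_bit=True) == nonce


print(locator_seems_reachable(Xtr(supports_echo=True), 0xA1B2C3))
print(locator_seems_reachable(Xtr(supports_echo=False), 0xA1B2C3))
```

In the second case the Locator is up, but the requesting side concludes
otherwise purely because the peer never echoes.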

Section 13

   When a Locator record is removed from a Locator-Set, ITRs that have
   the mapping cached will not use the removed Locator because the xTRs
   will set the Locator-Status-Bit to 0.  So, even if the Locator is in
   the list, it will not be used.  For new mapping requests, the xTRs
   can set the Locator AFI to 0 (indicating an unspecified address), as
   well as setting the corresponding Locator-Status-Bit to 0.  This
   forces ITRs with old or new mappings to avoid using the removed
   Locator.

The behavior described here seems like it would be better described as "when
a Locator is taken out of service" than "removed from a Locator-Set", since
if it is not in the set at all, it has no index, and no LSB or AFI to set.
Should actually depopulating it like this be forbidden?

I guess the Map Versioning is supposed to help with this, but we need to
nail down the semantics more and/or give a clearer reference to it.

Section 13.1

   An ITR, when it encapsulates packets to ETRs, can convey its own Map-
   Version Number.  This is known as the Source Map-Version Number.

Consider replacing "its own Map-Version Number" with something like "the
Map-Version Number for the LISP site of which it is a part".  Writing this
causes me to note that the semantics of the Map-Version are unclear here:
what is it scoped to?  An EID-Prefix?  An RLOC?  Oh, you say that in the
next paragraph (EID-Prefix).

   A Map-Version Number can be included in Map-Register messages as
   well.  This is a good way for the Map-Server to assure that all ETRs
   for a site registering to it will be synchronized according to Map-
   Version Number.

Huh?  I must be confused how this works.  (Also, wouldn't this be better in
the control plane document which covers Map-Register?)

Section 15

   o  When a tunnel-encapsulated packet is received by an ETR, the outer
      destination address may not be the address of the router.  This
      makes it challenging for the control plane to get packets from the
      hardware.  This may be mitigated by creating special Forwarding
      Information Base (FIB) entries for the EID-Prefixes of EIDs served
      by the ETR (those for which the router provides an RLOC
      translation).  These FIB entries are marked with a flag indicating
      that Control-Plane processing SHOULD be performed.

I assume this is just my lack of background showing, but I'm confused how
it makes sense to mark these for control-plane processing.  Isn't the
control plane much slower, and we're not putting all of the LISP data-plane
traffic onto the slow path?

Section 18

   o  Data-Plane gleaning for creating map-cache entries has been made
      optional.  If any ITR implementations depend or assume the remote
      ETR is gleaning should not do so.

nit: this is ungrammatical; "they should not" or "Any ITR implementations
that depend on or assume that" would fix it.

Section 19.1

Presumably IANA also updated the reference column to point to this
document?


_______________________________________________
lisp mailing list
lisp@ietf.org
https://www.ietf.org/mailman/listinfo/lisp
