Re: [lisp] Benjamin Kaduk's Discuss on draft-ietf-lisp-rfc6830bis-20: (with DISCUSS and COMMENT)

Joel M. Halpern Fri, 28 Sep 2018 15:42:02 -0700

Thank you Benjamin.  This response helps me understand the situation.

I have sent a note to the WG about making LISP-SEC MTI. That kind of change needs WG support.


Yours,
Joel

On 9/28/18 6:03 PM, Benjamin Kaduk wrote:

Hi Joel,


On Wed, Sep 26, 2018 at 11:53:02PM -0400, Joel M. Halpern wrote:

Is there text we can add about the scoping that will change your discuss
into a series of useful comments?


I had attempted to structure my Discuss points so that they would either be
useful comments as is, or rendered moot by a reduced scope.  I guess I can
try to clarify those below.  (To be clear, reducing the scope is only going
to move from "has potentially existentially bad problems" to "has
substantial issues that likely require reengineering to resolve".)

If so, Some indication of how you would like that phrased would help us
address these.


I think Ekr's ballot position on 6833bis has a good summary of the
architecture assumptions that the reduced scope allows us to make.
In order to have the document be able to plausibly make those claims, it
looks like we'd need to at least:
(1) update the Abstract/Introduction to clarify that the EID namespace is
     only defined within a single administrative domain.
(2) (optionally, if it makes sense) mention in the introduction that this
     administrative domain can include transport over other networks in the
     same way that a VPN would function[*], without requiring cooperation
     from or interaction with the other networks' administrators
(3) remove the "global" text from the EID-to-RLOC Database and Map-Cache
     definitions
(4) update the EID-Prefix definition to talk about the local site or
     administrative domain's "address allocation authority"
(5) Take a look at the EID definition to consider whether references to "on
     the public Internet" are still valid, and the text about assignment
     in a hierarchical manner should be revised for the new scope as well.
     Likewise for EID-internal structure that is "not visible to the global
     routing system"

(I stopped skimming and looking for problematic text around section 6)

[*] Ideally this would be done without using the term "VPN" itself, since
I'd like to get a movement going to restrict "VPN" to include
confidentiality (i.e., privacy) protection.  "virtual network" or "overlay
network" may or may not be good candidate replacement terms.

If not, we seem to have a larger problem.


Well, we appear to have five ADs that are supporting making LISP-SEC a
normative reference and thus MTI; I don't know if that scale of change
meets your threshold for a "larger problem".

Yours,
Joel

On 9/26/18 11:44 PM, Benjamin Kaduk wrote:

Benjamin Kaduk has entered the following ballot position for
draft-ietf-lisp-rfc6830bis-20: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-lisp-rfc6830bis/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I have grave concerns about the suitability of LISP as a whole, in its
present form, for advancement to the Standards-Track.  While some of my
concerns are not specific to this document, as the core protocol
(data-plane) spec, it seems an appropriate place to attach them to.

I am told, out of band, that the intended deployment model is no longer to
cover the entire Internet (cf. the MISSREF-state
draft-ietf-lisp-introduction's "with LISP, the edge of the Internet and the
core can be logically separated and interconnected by LISP-capable
routers", etc.), and that full Internet-scale operation is no longer a
goal.  However, since that does not seem to be reflected in the current
batch of documents up for IESG review, I am forced to ballot on them
"as-is", namely as targeting global Internet deployment.  The requirements
placed on the mapping system are so stringent as to be arguably
unachievable at Internet scale, though that arguably has more of an
interaction with the control-plane than the data-plane.  It's still in
scope here, though, as part of the overall description of the protocol
flow.


(rendered moot by scope reduction)

There are an almost innumerable number of downgrade attacks possible, and
the control-plane and data-plane security mechanisms are not normative
dependencies of the current corpus of documents, and as such are not up for
consideration as mitigating the security concerns with the core documents.


The downgrade attacks will probably require some further analysis; LISP-SEC
would protect a lot of the header bits but I think there may be some other
data flows to be looked at.

Section 3 defines the EID-to-RLOC Database:

     EID-to-RLOC Database:   The EID-to-RLOC Database is a global
        distributed database that contains all known EID-Prefix-to-RLOC
        mappings.  Each potential ETR typically contains a small piece of
        the database: the EID-to-RLOC mappings for the EID-Prefixes
        "behind" the router.  These map to one of the router's own
        globally visible IP addresses.  Note that there MAY be transient
        conditions when the EID-Prefix for the site and Locator-Set for
        each EID-Prefix may not be the same on all ETRs.  This has no
        negative implications, since a partial set of Locators can be
        used.

No compelling architecture for a trustworthy global distributed database
has been presented that I've seen so far, and LISP relies heavily on the
mapping system's database for its functionality.  I am concerned that so
many requirements are placed on the mapping system as to make it in effect
unimplementable, in which case it would seem that the architecture as a
whole (that is, for a global Internet-scale system) is not fit for purpose.


(rendered moot by scope reduction)

Section 4.1's Step (6) only mentions parsing "to check for format
validity".  I think it is appropriate to mention (and refer to) source
authentication checks as well, since bad Map-Reply data can allow all sorts
of attacks to occur.


(not affected by scope reduction)

There are some fairly subtle ordering requirements between the order of
entries in Map-Reply messages and the Locator-Status-Bits in data-plane
traffic (so that the status bits are meaningful), which are only given
minimal treatment in the control-plane document.  The
need for synchronization in interpreting these bits should be mentioned
more prominently in the data-plane document as well.


(not affected by scope reduction)


The usage of the Instance ID does not seem to be adequately covered; from
what I've been able to pick up so far it seems that both source and
destination participants must agree on the meaning of an Instance ID, and
the source and destination EIDs must be in the same Instance.  This does
not seem like it is compatible with Internet scale, especially if there are
only 24 usable bits of Instance ID.
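
For scale, the arithmetic behind the 24-bit remark can be written out.  The
following is a minimal sketch, not text from the draft: the function names
are hypothetical, and the "same Instance" check simply restates the reading
given above that source and destination EIDs must share an Instance ID.

```python
# Illustrative only: names are hypothetical, not taken from the LISP drafts.
INSTANCE_ID_BITS = 24  # usable width mentioned in the review comment


def instance_id_space(bits: int = INSTANCE_ID_BITS) -> int:
    """Number of distinct Instance ID values for a field of the given width."""
    return 1 << bits


def is_valid_instance_id(iid: int, bits: int = INSTANCE_ID_BITS) -> bool:
    """True if iid fits in an unsigned field of the given width."""
    return 0 <= iid < (1 << bits)


def same_instance(src_iid: int, dst_iid: int) -> bool:
    """Assumed rule: source and destination EIDs must be in the same Instance."""
    return src_iid == dst_iid


if __name__ == "__main__":
    print(instance_id_space())
    print(is_valid_instance_id(0xFFFFFF), is_valid_instance_id(1 << 24))
```

A 24-bit field allows 16,777,216 values in total, all of which would have to
be agreed between every pair of communicating sites.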


(not affected by scope reduction)


There seem to be a lot of intra-site synchronization requirements, notably
with respect to Map-Version consistency, the contents and ordering of
locator sets for EIDs in the site, etc.; the actual hard requirements for
synchronization within a site should be clearly called out, ideally in a
single location.


(not affected by scope reduction, since ETRs are affected and not just
Map-Servers)


The security considerations attempt to defer substantially to the
threat-analysis in RFC 7835, which does not really seem like a complete
threat analysis and does not provide analysis as to what requirements are
placed on the boundaries between the different components of LISP (data
plane, control plane, mapping system, various extensions, etc.).  The
secdir reviewer had some good thoughts in this space.


(not affected by scope reduction)


The security considerations throughout the LISP documents place a heavy
focus on the risk of over-claiming for routing EID-prefixes.  This is a
real concern, to be clear, but it should not overshadow the risk of an
attacker who is able to move traffic around at will, strip security
protections, cause denial of service, alter data-plane payloads, etc.
Similarly, this document's security considerations call out denial of
service as a risk from Map-Cache insertion/spoofing, but the risks from an
attacker being able to read and modify the traffic, perhaps even without
detection, seem a much greater threat to me.


(not affected by scope reduction)


I am not convinced that this protocol meets the current IETF requirements
for the security properties of Standards-Track Protocols without at least
LISP-SEC as a mandatory-to-implement component, and possibly additional or
stronger requirements.  (I did not do a full analysis of the system in the
presence of those security mechanisms, since that is not what is being
presented for review.)


(noting that LISP-SEC needs to be MTI and analysis performed under the new
assumptions)

Having an EID that is associated to user-correlatable devices has severe
privacy considerations, but I could not find this mentioned anywhere in all
of the LISP documents I've read so far.


(not affected by scope reduction)

-Benjamin



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I apologize for the somewhat scattered nature of these comments; there are
a lot of them and I was focusing my time more on trying to understand the
broader system, and the intended security posture, so they did not get as
much clean-up as I would have liked.  (Most of my review was performed on the
-18, though I have tried to update to the -20 as relevant.)


The instance ID provides for organizational correlation, another privacy
exposure.

Is there anything different between an "EID-to-RLOC Map-Request" and just a
"Map-Request"?  (Same question for "Map-Reply", too.)

There's a lot of stuff that seems to work best if there is symmetric
bidirectional traffic, with inline signalling of map version and
reachability changes, though clearly everything is designed to also work
with asymmetric connectivity or unidirectional traffic.  It would be nice
to have a high-level summary in or near the introduction about what kinds
of behavior/performance differences are expected for bidirectional vs.
unidirectional traffic.

Section 2

That's not the 8174 boilerplate; it's more than just adding a cite to the
2119 boilerplate.

Section 3

nit: "An address family that pertains to the Data-Plane." is a sentence
fragment.

     Ingress Tunnel Router (ITR):   An ITR is a router that resides in a
        [...]
        mapping lookup in the destination address field.  Note that this
        destination RLOC MAY be an intermediate, proxy device that has
        better knowledge of the EID-to-RLOC mapping closer to the

A 2119 MAY doesn't seem necessary here; this reads as a statement of
fact that may not be known to the encapsulating ITR.

        Specifically, when a service provider prepends a LISP header for
        Traffic Engineering purposes, the router that does this is also
        regarded as an ITR.  The outer RLOC the ISP ITR uses can be based
        on the outer destination address (the originating ITR's supplied
        RLOC) or the inner destination address (the originating host's
        supplied EID).

I'm confused here, perhaps in multiple ways.  Are there now *two* LISP
headers on the packet?  Is the "outer RLOC the ISP ITR uses" the source
RLOC or the destination RLOC?

     Negative Mapping Entry:   A negative mapping entry, also known as a
        negative cache entry, is an EID-to-RLOC entry where an EID-Prefix
        is advertised or stored with no RLOCs.  That is, the Locator-Set
        for the EID-to-RLOC entry is empty or has an encoded Locator count
        of 0.

Is "empty" a distinct representation from "locator count of zero"?

Perhaps something of an aside, but the check described for
Route-Returnability is a somewhat weak check, and in some cases could still
be spoofed.  (I don't expect this to surprise anyone, of course, but
perhaps some more qualifiers could be added to the text.)

Section 4

     An additional LISP header MAY be prepended to packets by a TE-ITR
     when re-routing of the path for a packet is desired.  A potential
     use-case for this would be an ISP router that needs to perform
     Traffic Engineering for packets flowing through its network.  In such
     a situation, termed "Recursive Tunneling", an ISP transit acts as an
     additional ITR, and the RLOC it uses for the new prepended header
     would be either a TE-ETR within the ISP (along an intra-ISP traffic
     engineered path) or a TE-ETR within another ISP (an inter-ISP traffic
     engineered path, where an agreement to build such a path exists).

"the RLOC it uses for the new prepended header", again, this is as the
destination RLOC (vs. source RLOC)?

Section 4.1

     o  Map-Replies are sent on the underlying routing system topology
        using the [I-D.ietf-lisp-rfc6833bis] Control-Plane protocol.

Just to check my understanding: is the "underlying routing system topology"
the same as the "underlay"?

Is step (3) just describing more of what step (2) says is "not described in
this example"?

Section 5.3

The word "nonce" is normally used for something used exactly once.
E.g., with some AEAD algorithms, if the same "nonce" input is used for
different encryptions, the entire security of the system is compromised.
It would be better to refer to this field with a different term, given
that "the same nonce can be used for a period of time when encapsulating to
the same ETR".  "Uniquifier" or "random value" might be reasonable choices.

Why is there no discussion of the Map-Version or Instance-ID fields
in this section?

When doing ETR/PETR decapsulation:

     o  The inner-header 'Time to Live' field (or 'Hop Limit' field, in
        the case of IPv6) SHOULD be copied from the outer-header 'Time to
        Live' field, when the Time to Live value of the outer header is
        less than the Time to Live value of the inner header.  Failing to
        perform this check can cause the Time to Live of the inner header
        to increment across encapsulation/decapsulation cycles.  This
        check is also performed when doing initial encapsulation, when a
        packet comes to an ITR or PITR destined for a LISP site.

Er, what is "this check" that is also performed for initial encapsulation?
How are there multiple TTL values to compare?
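
One way to read the decapsulation rule quoted above is as "keep the smaller
of the two values".  The sketch below is only that reading, with hypothetical
function names; the example TTL values are made up for illustration and are
not taken from the draft.

```python
# Illustrative reading of the quoted decapsulation bullet; names are hypothetical.

def decap_inner_ttl(outer_ttl: int, inner_ttl: int) -> int:
    """Copy the outer TTL into the inner header only when it is smaller."""
    return outer_ttl if outer_ttl < inner_ttl else inner_ttl


def decap_inner_ttl_unchecked(outer_ttl: int, inner_ttl: int) -> int:
    """Unconditional copy, shown only to contrast with the checked version."""
    return outer_ttl


if __name__ == "__main__":
    # Made-up values: the outer header arrives with more hops left than the inner.
    print(decap_inner_ttl(60, 10))            # checked copy keeps the inner value
    print(decap_inner_ttl_unchecked(60, 10))  # unchecked copy raises the inner TTL
    print(decap_inner_ttl(5, 10))             # outer is smaller, so it is copied
```

The unchecked variant is the case the draft warns about, where the inner TTL
can grow across encapsulation/decapsulation cycles.  The sketch does not
answer the question of which two values are compared at initial
encapsulation.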

     o  The inner-header 'Differentiated Services Code Point' (DSCP) field
        (or the 'Traffic Class' field, in the case of IPv6) SHOULD be
        copied from the outer-header DSCP field ('Traffic Class' field, in
        the case of IPv6) to the inner-header.

nit: the first "inner-header" seems like an editing remnant?

Section 7.1

How is this stateless if it involves knowledge about the routers between
the ITR and all possible ETRs (i.e., a set that could change over time)?

Section 8

This 32-bit vs 24-bit thing is pretty hokey for a standards-track
specification (yes, I know that LISP-DDT is not standards track at the
moment).

Section 9

     Alternatively, RLOC information MAY be gleaned from received tunneled

What is this an alternative to?  The list of four options above?

     packets or EID-to-RLOC Map-Request messages.  A "gleaned" Map-Cache
     entry, one learned from the source RLOC of a received encapsulated
     packet, is only stored and used for a few seconds, pending
     verification.  Verification is performed by sending a Map-Request to
     the source EID (the inner-header IP source address) of the received
     encapsulated packet.

The source EID is some random end system, right?  So this relies on some
magic in the ETR to detect that there's a Map-Request and reply directly
instead of passing it on to the EID that won't know what to do with it?
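
The lifetime of a gleaned entry, as described in the quoted text, can be
sketched as follows.  This is a minimal illustration under stated
assumptions: the 3-second lifetime is a made-up stand-in for "a few seconds",
the class and method names are hypothetical, and treating verification as
"the Map-Reply lists the gleaned RLOC" is an assumption rather than something
the draft text above states.

```python
# Illustrative only: constants and names are hypothetical, not from the drafts.
from dataclasses import dataclass

GLEAN_LIFETIME_SECONDS = 3.0  # assumed value; the text says only "a few seconds"


@dataclass
class GleanedEntry:
    source_eid: str      # inner-header source address of the received packet
    source_rloc: str     # outer-header source address it was gleaned from
    learned_at: float    # time the entry was gleaned, in seconds
    verified: bool = False

    def usable(self, now: float) -> bool:
        """Unverified entries are usable only for a short window.

        Simplification: a verified entry never expires in this sketch.
        """
        if self.verified:
            return True
        return (now - self.learned_at) < GLEAN_LIFETIME_SECONDS

    def confirm(self, map_reply_rlocs) -> bool:
        """Assumed check: verify only if the Map-Reply lists the gleaned RLOC."""
        if self.source_rloc in map_reply_rlocs:
            self.verified = True
        return self.verified


if __name__ == "__main__":
    entry = GleanedEntry("10.0.0.1", "192.0.2.1", learned_at=100.0)
    print(entry.usable(101.0), entry.usable(104.0))
```

Who answers the verifying Map-Request is left open here, which is the point
of the question above.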

Talking about the "R-bit" of the Map-Reply is a detail from 6833bis and
might benefit from an explicit section reference to the other document.

Section 10

What is the "CE" of "CE-based ITRs"?  Presumably Customer Edge, but it
is not marked as well-known at
https://www.rfc-editor.org/materials/abbrev.expansion.txt so expansion is
probably in order.

Again, when we are talking about the internal structure of the Map-Reply, a
detailed section reference to 6833bis would be useful.

Modifying LSBs seems like a fine DoS attack vector for an on-path attacker.

     value of 1.  Locator-Status-Bits are associated with a Locator-Set
     per EID-Prefix.  Therefore, when a Locator becomes unreachable, the
     Locator-Status-Bit that corresponds to that Locator's position in the
     list returned by the last Map-Reply will be set to zero for that
     particular EID-Prefix

Doesn't this imply a stateful relationship between the ordering of
Map-Replys and data-plane traffic?
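
The coupling can be made concrete with a small sketch.  It assumes that bit i
of the Locator-Status-Bits, counting from the least significant bit, refers
to the i-th Locator in the most recent Map-Reply; the function name and the
addresses (documentation prefixes) are hypothetical.

```python
# Illustrative only: assumes bit i (from the least significant bit) maps to
# the i-th Locator in the last Map-Reply. Names and addresses are made up.
from typing import List


def reachable_locators(locators: List[str], lsb: int) -> List[str]:
    """Return the Locators whose status bit is set, by list position."""
    return [loc for i, loc in enumerate(locators) if (lsb >> i) & 1]


if __name__ == "__main__":
    lsb = 0b01  # same bits carried in the data-plane header in both cases
    as_sent = ["192.0.2.1", "198.51.100.1"]
    reordered = ["198.51.100.1", "192.0.2.1"]
    print(reachable_locators(as_sent, lsb))
    print(reachable_locators(reordered, lsb))
```

With identical bits, the two orderings name different Locators as reachable,
so sender and receiver have to share the ordering from the same Map-Reply.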

Section 10.1

     Note that "ITR" and "ETR" are relative terms here.  Both devices MUST
     be implementing both ITR and ETR functionality for the echo nonce
     mechanism to operate.

Perhaps they could be given actual names so as to disambiguate which steps
are performed with ITR vs. ETR role?

     The echo-nonce algorithm is bilateral.  That is, if one side sets the
     E-bit and the other side is not enabled for echo-noncing, then the
     echoing of the nonce does not occur and the requesting side may
     erroneously consider the Locator unreachable.  An ITR SHOULD only set
     the E-bit in an encapsulated data packet when it knows the ETR is
     enabled for echo-noncing.  This is conveyed by the E-bit in the RLOC-
     probe Map-Reply message.

Why is this even optional?  If it was mandatory to use, then there would
not be a question.  But at least clarify that the "this" that is conveyed
is whether the peer supports the echo-nonce algorithm.  (Also, subject to
downgrade.)

Section 13

     When a Locator record is removed from a Locator-Set, ITRs that have
     the mapping cached will not use the removed Locator because the xTRs
     will set the Locator-Status-Bit to 0.  So, even if the Locator is in
     the list, it will not be used.  For new mapping requests, the xTRs
     can set the Locator AFI to 0 (indicating an unspecified address), as
     well as setting the corresponding Locator-Status-Bit to 0.  This
     forces ITRs with old or new mappings to avoid using the removed
     Locator.

The behavior described here seems like it would be better described as "when
a Locator is taken out of service" than "removed from a Locator-Set", since
if it is not in the set at all, it has no index, and no LSB or AFI to set.
Should actually depopulating it like this be forbidden?

I guess the Map Versioning is supposed to help with this, but we need to
nail down the semantics more and/or give a clearer reference to it.

Section 13.1

     An ITR, when it encapsulates packets to ETRs, can convey its own Map-
     Version Number.  This is known as the Source Map-Version Number.

Consider replacing "its own Map-Version Number" with something like "the
Map-Version number for the LISP site of which it is a part".  Writing this causes me to
note that the semantics of the Map-Version are unclear, here -- what is it
scoped to?  An EID-Prefix?  An RLOC?  Oh, you say that in the next
paragraph (EID-Prefix).

     A Map-Version Number can be included in Map-Register messages as
     well.  This is a good way for the Map-Server to assure that all ETRs
     for a site registering to it will be synchronized according to Map-
     Version Number.

Huh?  I must be confused how this works.  (Also, wouldn't this be better in
the control plane document which covers Map-Register?)

Section 15

     o  When a tunnel-encapsulated packet is received by an ETR, the outer
        destination address may not be the address of the router.  This
        makes it challenging for the control plane to get packets from the
        hardware.  This may be mitigated by creating special Forwarding
        Information Base (FIB) entries for the EID-Prefixes of EIDs served
        by the ETR (those for which the router provides an RLOC
        translation).  These FIB entries are marked with a flag indicating
        that Control-Plane processing SHOULD be performed.

I assume this is just my lack of background showing, but I'm confused how
it makes sense to mark these for control-plane processing.  Isn't the
control plane much slower, and we're not putting all of the LISP data-plane
traffic onto the slow path?

Section 18

     o  Data-Plane gleaning for creating map-cache entries has been made
        optional.  If any ITR implementations depend or assume the remote
        ETR is gleaning should not do so.

nit: this is ungrammatical; "they should not" or "Any ITR implementations
that depend on or assume that" would fix it.

Section 19.1

Presumably IANA also updated the reference column to point to this
document?


