[homenet] Concerns about DNCP

Juliusz Chroboczek Tue, 30 Jun 2015 10:11:54 -0700

Concerns about DNCP
===================

DNCP is an elegant and small protocol that distributes HNCP data across
the Homenet.  DNCP works by flooding a hash of the full network state over
link-local multicast, and synchronising the actual state piecewise by
using link-local unicast request/response pairs.  This document outlines
a number of concerns with DNCP as described in draft-ietf-homenet-dncp-06.


I very strongly support the work that is being done on DNCP and HNCP.
I will be glad to see something very similar to the current drafts adopted
by the Homenet WG.


Packet format
-------------

### There is no header

A DNCP packet consists of just a sequence of TLVs.  This means that there
is no way to version the DNCP protocol.  Should a change in the TLV format
be required, a new UDP port will need to be allocated.  (A new multicast
group is not sufficient, since DNCP also uses unicast).

I recommend adding a fixed-size header with a version number.
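
As a rough illustration of what such a header could look like, here is a minimal sketch in Python. The layout (one octet of version, one octet of flags, a two-octet body length) and the names `HEADER`, `VERSION`, `pack_packet` and `unpack_packet` are purely hypothetical; nothing of the kind appears in the draft.

```python
import struct

# Hypothetical 4-octet header, NOT taken from the draft:
# version (1 octet), flags (1 octet), body length (2 octets, network order).
HEADER = struct.Struct("!BBH")
VERSION = 1


def pack_packet(tlvs: bytes, flags: int = 0) -> bytes:
    """Prepend the hypothetical fixed-size header to a sequence of TLVs."""
    return HEADER.pack(VERSION, flags, len(tlvs)) + tlvs


def unpack_packet(packet: bytes) -> bytes:
    """Check the header and return the TLV body, or raise ValueError."""
    if len(packet) < HEADER.size:
        raise ValueError("truncated header")
    version, _flags, length = HEADER.unpack_from(packet)
    # The version is checked before anything else is interpreted, so a
    # future TLV format change cannot confuse an old receiver.
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    body = packet[HEADER.size:]
    if len(body) != length:
        raise ValueError("length mismatch")
    return body
```

With a header of this kind, a receiver can reject a packet of an unknown version before looking at any TLV, without needing a new UDP port.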

### NODE-ENDPOINT is stateful

The DNCP/HNCP protocol suite uses the elegant technique of using
recursively embedded TLVs: hop-to-hop data is in the packet toplevel,
end-to-end data is in the NODE-STATE TLV, which may in turn contain TLVs
that themselves contain embedded TLVs.  The TLVs are mostly stateless, in
the sense that they can be sent in any order or even in independent
packets.  NODE-ENDPOINT is an exception.

NODE-ENDPOINT identifies the sender of this packet, and applies to all
TLVs in this packet.  The current specification implies that the
NODE-ENDPOINT may appear anywhere in the packet, which would force the
receiver to make two passes over the packet.

Conceptually, NODE-ENDPOINT is a packet header, and it is best treated
that way.  Ideally, the information it contains would be part of the
fixed-size packet header suggested above (but after the version number,
which should be parsable even when the NODE-ENDPOINT format changes).
Alternatively, specify that NODE-ENDPOINT MUST be the very first TLV in
a packet, or at least appear before all currently-defined TLVs, which
merely formalises what existing implementations already do.
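
The following sketch shows how a receiver could parse a packet in a single pass once NODE-ENDPOINT is required to come first. It assumes the usual DNCP TLV layout (16-bit type, 16-bit length, value padded with zeros to a 4-octet boundary); the type code `NODE_ENDPOINT = 3` and the helper names are assumptions made for illustration only.

```python
import struct

NODE_ENDPOINT = 3  # assumed type code, for illustration only


def make_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV, zero-padded to a 4-octet boundary."""
    pad = -len(value) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\0" * pad


def iter_tlvs(data: bytes):
    """Yield (type, value) pairs from a sequence of padded TLVs."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        end = offset + 4 + length
        if end > len(data):
            raise ValueError("truncated TLV body")
        yield tlv_type, data[offset + 4:end]
        offset = (end + 3) & ~3  # skip the padding


def parse_packet(data: bytes):
    """Single pass: the sender is known before any other TLV is handled."""
    tlvs = iter_tlvs(data)
    first = next(tlvs, None)
    if first is None or first[0] != NODE_ENDPOINT:
        raise ValueError("NODE-ENDPOINT must be the first TLV")
    sender = first[1]
    return sender, list(tlvs)
```

If NODE-ENDPOINT could appear anywhere, `parse_packet` would instead have to scan the whole packet for it before it could act on any other TLV.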

### NODE-ENDPOINT is underspecified

It is not clear whether NODE-ENDPOINT is required in all packets, and if
not which TLVs are allowed in a packet without a NODE-ENDPOINT.  Existing
implementations appear to differ in that respect.

### Node data is underspecified

The NODE-STATE TLV carries the end-to-end hash of the "node data".
However, the "node data" is never precisely defined.  For example,
there is padding applied between TLVs (look, Ma, I've saved 4ns parsing my
packet), and it is nowhere specified whether this padding participates in
the hashing (it does).

It turns out that in the absence of fragmentation the "node data" is just
the raw binary data in the NODE-STATE TLV.  This is the reasonable thing
to do, and must be specified in this manner.  It must also be made clear
that this binary data MUST NOT be modified in transit (parsing/reformatting
is not likely to work reliably), and that its hash SHOULD (or is that MAY?)
be validated on reception.
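
A minimal sketch of validation on reception, treating the node data as an opaque byte string. The hash function is defined by the DNCP profile in use; the truncated SHA-256 below is only a placeholder, and the function names are hypothetical.

```python
import hashlib


def node_data_hash(raw: bytes) -> bytes:
    """Hash the node data exactly as received, padding included.

    Placeholder hash (SHA-256 truncated to 8 octets); the real function
    is whatever the DNCP profile specifies.
    """
    return hashlib.sha256(raw).digest()[:8]


def accept_node_state(raw_node_data: bytes, advertised_hash: bytes) -> bool:
    """Accept the data only if it matches the hash carried in NODE-STATE."""
    return node_data_hash(raw_node_data) == advertised_hash


# One TLV with a 1-octet value, with and without its 3 octets of padding.
padded = b"\x00\x20\x00\x01\x41\x00\x00\x00"
unpadded = b"\x00\x20\x00\x01\x41"
```

Since the hash is taken over the raw octets, `padded` and `unpadded` hash differently, which is why a node that parses and re-serialises the data before forwarding it risks breaking validation downstream.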

### Normalisation is apparently useless

DNCP specifies that the TLVs within a node state be sorted.  Since both
the raw binary data and the hash are end-to-end, it is not clear why this
partial normalisation is useful.

At the very least, the draft should make it clear that this normalisation
should not be relied upon, and that peers MUST forward the binary data
unchanged.  I recommend simply dropping the normalisation, although this
will require changes to the fragmentation scheme.

### FRAGMENT-COUNT is stateful in the reverse direction

FRAGMENT-COUNT is stateful, but in the reverse direction: it changes the
interpretation of the enclosing NODE-STATE TLV.  Usually, the NODE-STATE
TLV carries exactly the data being hashed; with a FRAGMENT-COUNT, it
carries only part of that data, and FRAGMENT-COUNT itself is not hashed.

FRAGMENT-COUNT is not end-to-end data, and it doesn't belong within the
NODE-STATE TLV.  It should either be a field in the NODE-STATE TLV itself
(not in an embedded TLV), or put in a TLV that contains the NODE-STATE
TLV.

### Fragmentation is specified at the TLV level

The section about fragmentation is not quite clear to me.  It appears to
specify fragmentation in terms of TLVs (every fragment must consist of
a valid series of TLVs), and appears to assume that the receiver is doing
something smart in order to avoid reassembly timeouts.

Unfortunately, this encoding makes it very difficult to implement
a simpler scheme, where fragmentation and reassembly are TLV-agnostic and
act entirely at the byte-stream level.  In particular, a fragment TLV does
not specify an initial byte offset, and the total length of the
reassembled packet is not known beforehand, which makes buffer management
in the receiver challenging.

I recommend a TLV-agnostic fragmentation scheme, with fragment offsets
counted in octets and a total reassembled size explicitly encoded.
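
To make the suggestion concrete, here is a sketch of byte-level fragmentation and reassembly in Python. Each fragment carries an octet offset and the total reassembled size, so the receiver can allocate its buffer on the first fragment and never needs to look inside the payload. The tuple layout and the names `fragment` and `Reassembly` are hypothetical, not a proposed wire format.

```python
def fragment(data: bytes, max_chunk: int):
    """Split data into (offset, total_size, chunk) tuples."""
    total = len(data)
    return [(off, total, data[off:off + max_chunk])
            for off in range(0, total, max_chunk)]


class Reassembly:
    """Reassemble an opaque byte string; fragments may arrive in any order."""

    def __init__(self, total: int):
        self.buf = bytearray(total)  # size known up front
        # Half-open ranges of octets not yet received.
        self.missing = [(0, total)] if total else []

    def add(self, offset: int, chunk: bytes) -> None:
        end = offset + len(chunk)
        if offset < 0 or end > len(self.buf):
            raise ValueError("fragment out of range")
        self.buf[offset:end] = chunk
        holes = []
        for lo, hi in self.missing:
            if end <= lo or offset >= hi:
                holes.append((lo, hi))      # hole untouched
                continue
            if lo < offset:
                holes.append((lo, offset))  # part before the fragment
            if end < hi:
                holes.append((end, hi))     # part after the fragment
        self.missing = holes

    def complete(self) -> bool:
        return not self.missing

    def data(self) -> bytes:
        if not self.complete():
            raise ValueError("reassembly incomplete")
        return bytes(self.buf)
```

A receiver would create a `Reassembly` from the total size carried in the first fragment it sees, call `add` for each fragment, and hash the result of `data()` once `complete()` returns true.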

### The fragment timeout is not specified

It is not clear when the receiver can discard an incomplete defragmentation
buffer.  This might not be an issue if a TLV-based fragmentation scheme is
used.

### Keep-alive intervals are flooded

The KEEP-ALIVE-INTERVAL is within the node state, and hence flooded across
the network.  This information is of no interest to remote nodes; this TLV
should be at the top level of the packet in order to reduce the amount of
information being flooded.


Protocol dynamics
-----------------

### Need to check destination address

Most of the packets in DNCP have exactly the same meaning whether they are
sent to a unicast or a multicast address.  There is one exception: when
a packet carries an inconsistent network state, the keepalive timer is only
reset if the packet was sent over unicast.  This requires that the receiver
check the destination address of packets, which is non-portable and might
be impossible on some systems.

I recommend removing the requirement to distinguish between unicast and
multicast.  If such distinction is needed, make it explicit in the TLV
contents.

### Not clear when multicasts and unsolicited packets can be sent

As it is written, the draft requires that all TLVs other than
NETWORK-STATE be sent over unicast and does not allow unsolicited packets
other than NETWORK-STATE.  However, the draft authors indicate that
unsolicited multicast packets are allowed.

The draft should specify clearly when multicast and unsolicited packets
are allowed.  In particular, it should mention whether it is legal to
reply to a request over multicast (which may make sense, at least on some
link layers), and which packets can be sent unsolicited over multicast.

### Non-standard port

The draft requires a node to answer requests from a node that does not
publish a node state.  What should happen when such a request comes from
a non-standard port?  Monitoring software may need to run on the same node
as a normal DNCP implementation.

### Non-publishing nodes

A node may want to participate in the full protocol without publishing
a node state in order to reduce the amount of data being flooded.  Doing
this naively might cause persistent state desynchronisation, leading to
repeated resetting of the Trickle timers.

Ideally, the draft would specify exactly which behaviours are allowed for
non-publishing nodes.  At any rate, a warning about the dangers of
non-publishing nodes should be included in the draft.

