Re: [6tisch] Benjamin Kaduk's Discuss on draft-ietf-6tisch-msf-12: (with DISCUSS and COMMENT)

Tengfei Chang Tue, 24 Mar 2020 04:25:10 -0700

Hi Benjamin,

I replied inline starting with '>'


Thanks so much for those detailed comments!

On Wed, Mar 11, 2020 at 6:55 PM Benjamin Kaduk via Datatracker <
nore...@ietf.org> wrote:

> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-6tisch-msf-12: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-6tisch-msf/
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> I'm concerned that the scheduling function for autonomous cells can
> cause an infinite loop in the case of hash collision -- Section 3
> specifies that AutoTxCell always takes precedence over AutoRxCell, but
> if those two cells collide, the corresponding cells on the peer in
> question will also collide.  If both peers try to send at the same time
> and the hashes collide, they will both attempt to transmit indefinitely
> and never be received.
>

> Notice that the AutoTxCell is a shared cell, to which the back-off
mechanism is applied.
> In case there is a collision on that cell, each side will use a back-off
with a different exponent.
> Each side will therefore use the cell as its AutoTxCell at a different time.
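
To illustrate the argument above, here is a minimal sketch of two nodes
contending on the same shared cell. It is not taken from the draft: the
Node class, the MIN_BE/MAX_BE values and the simplified rule (after a
failed transmission, wait a random number of shared cells in
[0, 2^BE - 1]) are assumptions made for illustration only.

```python
import random

MIN_BE = 1  # assumed minimum back-off exponent (illustrative)
MAX_BE = 7  # assumed maximum back-off exponent (illustrative)


class Node:
    """Hypothetical node with one frame pending on a shared cell."""

    def __init__(self, rng):
        self.rng = rng
        self.be = MIN_BE
        self.wait = 0        # shared cells still to skip
        self.pending = True  # one frame waiting to be sent

    def wants_tx(self):
        if not self.pending:
            return False
        if self.wait > 0:
            self.wait -= 1   # skip this cell (the node listens instead)
            return False
        return True

    def on_failure(self):
        # No ACK: widen the window and draw a new random wait.
        self.be = min(self.be + 1, MAX_BE)
        self.wait = self.rng.randint(0, 2 ** self.be - 1)

    def on_success(self):
        self.pending = False
        self.be = MIN_BE


def simulate(seed, max_cells=1000):
    """Return the cell index at which both frames have been delivered."""
    rng = random.Random(seed)
    a, b = Node(rng), Node(rng)
    for cell in range(1, max_cells + 1):
        tx_a, tx_b = a.wants_tx(), b.wants_tx()
        if tx_a and tx_b:    # both transmit: nobody listens, both fail
            a.on_failure()
            b.on_failure()
        elif tx_a:           # b is listening on the colliding cell
            a.on_success()
        elif tx_b:
            b.on_success()
        if not a.pending and not b.pending:
            return cell
    return None
```

In this sketch the first cell always collides, and the deadlock is broken
as soon as the two nodes draw different waits.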

> There seems to be some "passing the buck" going on with respect to
> rate-limiting unauthenticated (join) traffic:
> draft-ietf-6tisch-minimal-security (Section 6.1.1) says that the SF
> "SHOULD NOT allocate additional cells as a result of traffic with code
> point AF43"; this document is implementing a SF, and yet we try to avoid
> the issue, saying that "[t]he at IPv6 layer SHOULD ensure that this join
> traffic is rate-limited before it is passed to 6top sublayer where MSF
> can observe it".  I think we need a clear and consistent story about
> where this rate-limiting is supposed to happen.
>

> Thanks for the comments! This has been discussed in a previous
revision of MSF.
> It is not "passing the buck" but a decision based on the scheduling
function and the security context.
> To avoid a layer violation, upper-layer information is NOT supposed to be
visible to the link layer, where 6P and MSF operate.
> But with regard to security, this seems unavoidable.
> IMO, the scheduling function aims to provide an algorithm that adds/removes
cells according to the traffic.
> The traffic could contain unauthenticated join requests from both normal
devices and malicious devices.
> The function does NOT have enough information to differentiate them.
> We are assuming that some other entity outside of MSF needs to resolve this
issue.

> If we assume that the security info in the IPv6 header is passed to MSF, we
could abandon the rate-limiting approach and simply skip a slot if an
AF43 packet is sent on that slot.
> Hence traffic adaptation would never happen for traffic marked as AF43.
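
One possible reading of "skipping a slot" is sketched below: a cell that
carried an AF43 packet is simply not counted, so it can never push the
usage ratio up. The CellCounters class and its on_cell() method are
hypothetical names; only the DSCP value of AF43 (38) is a known constant.

```python
AF43_DSCP = 38  # AF43 = class 4, drop precedence 3 -> 0b100110


class CellCounters:
    """Hypothetical per-neighbor usage counters."""

    def __init__(self):
        self.num_cells_elapsed = 0
        self.num_cells_used = 0

    def on_cell(self, used, dscp=None):
        """Account for one negotiated cell, ignoring AF43 traffic."""
        if used and dscp == AF43_DSCP:
            return  # join traffic: the cell is skipped entirely
        self.num_cells_elapsed += 1
        if used:
            self.num_cells_used += 1
```

With this rule, a burst of unauthenticated join traffic leaves both
counters untouched and therefore cannot trigger a 6P ADD.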

>
>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> I support Roman's Discuss -- we need more information for this to be a
> useful reference; even what seem to be the official DASFAA 1997
> proceedings (https://dblp.org/db/conf/dasfaa/dasfaa97) do not have an
> associated document).
>
> Basing various scheduling aspects on (a hash of) the EUI64 ties
> functionality to a persistent identifier for a device.  How significant
> a disruption would be incurred if a device periodically changes its
> presented EUI64 for anonymization purposes?
>

> I assume you are talking about a malicious device?
> There is no doubt this will affect the performance of the joining process
for normal devices.
> But normal devices still have a chance to join.
> The join proxy won't be affected either, since the cell will be removed
right after the packet is sent out.

>
> There seems to be a general pattern of "if you don't have a
> 6P-negotiated Tx cell, install and AutoTxCell to send your one message
> and then remove it after sending"; I wonder if it would be easier on the
> reader to consolidate this as a general principle and not repeat the
> details every time it occurs.
>

> Yes, this is the feature of the autonomous cell. I am not sure it would be
easier to understand if stated just once.
> Each add/remove is slightly different, e.g. which node does it, the
parent or the JP?
> I personally feel it is clearer to repeat this every time, for each
type of node, so as to highlight the differences.
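
For reference, the common pattern discussed here (install an AutoTxCell,
send one frame, remove the cell) could be sketched as follows. The
Schedule class, auto_tx_cell() and send_once() are hypothetical names
invented for this illustration, not an API from the draft.

```python
from contextlib import contextmanager


class Schedule:
    """Hypothetical TSCH schedule holding the installed cells."""

    def __init__(self):
        self.cells = set()
        self.sent = []

    def send(self, cell, frame):
        if cell not in self.cells:
            raise RuntimeError("cell is not installed")
        self.sent.append((cell, frame))


@contextmanager
def auto_tx_cell(schedule, neighbor):
    """Install an AutoTxCell to `neighbor` and always remove it afterwards."""
    cell = ("AutoTxCell", neighbor)
    schedule.cells.add(cell)
    try:
        yield cell
    finally:
        schedule.cells.discard(cell)


def send_once(schedule, neighbor, frame):
    """Send a single frame over a temporary AutoTxCell."""
    with auto_tx_cell(schedule, neighbor) as cell:
        schedule.send(cell, frame)
```

What differs between the occurrences in the draft is only who calls it
and for which neighbor (pledge to JP, JP to pledge, node to parent).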

>
> Requirements Language
>
> "NOT RECOMMENDED" is not in the RFC2119 boilerplate (but is a BCP 14
> keyword).
>

> Thanks for pointing this out. It will be removed in the next revision.
> We also updated the reference from RFC2119 to RFC8174.

>
> Section 1
>
>    the 6 steps described in Section 4.  The end state of the join
>    process is that the node is synchronized to the network, has mutually
>    authenticated to the network, has identified a routing parent, and
>
> nit(?): I guess maybe "mutually authenticated with" is more correct for
> the bidirectional operation.
>

> will update in next revision.

>
>    It does so for 3 reasons: to match the link-layer resources to the
>    traffic, to handle changing parent, to handle a schedule collision.
>
> nit: end the list with "or" (or "and"?).
>

> will update in next revision.

>
>    MSF works closely with RPL, specifically the routing parent defined
>    in [RFC6550].  This specification only describes how MSF works with
>    one routing parent, which is phrased as "selected parent".  The
>
> nit: I suggest '''one routing parent; this parent is referred to as the
> "selected parent"'''.
>

> will update in next revision.

>
>    activity of MSF towards to single routing parent is called as a "MSF
>
> nit: "towards the"
>

> will update in next revision.

>
>    *  We added sections on the interface to the minimal 6TiSCH
>       configuration (Section 2), the use of the SIGNAL command
>       (Section 6), the MSF constants (Section 14), the MSF statistics
>       (Section 15).
>
> nit: end the list with "and".
>

> will update in next revision.

>
> Section 2
>
>    In a TSCH network, time is sliced up into time slots.  The time slots
>    are grouped as one of more slotframes which repeat over time.  The
>
> nit(?): should this be "one or more"?
>

> It should be "one or multiple slotframes". Will update in the next revision.

>
>    channel) is indicated as a cell of TSCH schedule.  MSF is one of the
>    policies defining how to manage the TSCH schedule.
>
> nit: if there is only one such policy active at a given time for a given
> network, I suggest "MSF is a policy for managing the TCSH schedule".
> (If multiple policies are active simultaneously, no change is needed.)
>

> As indicated in RFC8480: A node MAY implement multiple SFs and run them
at the same time.
> So MSF is *one of the policies* defining how to manage the TSCH schedule.

>
>    MSF uses the minimal cell for broadcast frames such as Enhanced
>    Beacons (EBs) [IEEE802154] and broadcast DODAG Information Objects
>    (DIOs) [RFC6550].  Cells scheduled by MSF are meant to be used only
>    for unicast frames.
>
> If this paragraph was moved before the previous paragraph, then EB and
> DIO would be defined before their first usage.
>

> Maybe I misunderstand. Do you mean you would prefer to move this
paragraph before the previous one?
> EB and DIO are defined in the references; I am not sure we still need to
define them in MSF.

>
>    bandwidth of minimal cell.  One of the algorithm met the rule is the
>    Trickle timer defined in [RFC6206] which is applied on DIO messages
>    [RFC6550].  However, any such algorithm of limiting the broadcast
>
> nit(?): "One of the algorithms that fulfills this requirement"?
>

> will update accordingly.

>
>    MSF RECOMMENDS the use of 3 slotframes.  MSF schedules autonomous
>    cells at Slotframe 1 (Section 3) and 6P negotiated cells at Slotframe
>    2 (Section 5) , while Slotframe 0 is used for the bootstrap traffic
>    as defined in the Minimal 6TiSCH Configuration.  It is RECOMMENDED to
>    use the same slotframe length for Slotframe 0, 1 and 2.  Thus it is
>
> Perhaps this is just a question of writing style, but if an
> implementation is free to use an alternative SF or a variant of MSF,
> could we not say that "MSF uses 3 slotframts", "MSF uses the same
> slotframe length for", etc.?
>

> Updated to "3 slotframes are used in MSF." and "The same slotframe length
for Slotframe 0, 1 and 2 is RECOMMENDED".

>
> Section 3
>
> Is there any risk of unwanted correlation between slot and channel
> offsets when using the same hash function and input for both
> calculations?
>
>    hash function.  Other optional parameters defined in SAX determine
>    the performance of SAX hash function.  Those parameters could be
>    broadcasted in EB frame or pre-configured.  For interoperability
>    purposes, an example how the hash function is implemented is detailed
>    in Appendix B.
>
> Given the lack of usable reference for [SAX-DASFAA], I assume that the
> content in Appendix B is going to be used as a specification, not just
> an example.
>

> the new reference for SAX is updated in the new revision.

>
>    *  The AutoRxCell MUST always remain scheduled after synchronized.
>
> nit: s/synchronized/synchronization/
>
>    AutoRxCell.  In case of conflicting with a negotiated cell,
>    autonomous cells take precedence over negotiated cell, which is
>    stated in [IEEE802154].  However, when the Slotframe 0, 1 and 2 use
>    the same length value, it is possible for negotiated cell to avoid
>    the collision with AutoRxCell.
>
> Presumably this factors in to the recommendation to have the three
> listed slotframes use the same length, but mentioning it explicitly
> (whether here or where the recommendation is made) might be nice.
>

> it is mentioned before as:  *The same slotframe length for Slotframe 0, 1
and 2 is RECOMMENDED.*

>
> Section 4
>
>    network.  Alternative behaviors may involved, for example, when
>    alternative security solution is used for the network.  Section 4.1
>
> nit: singular/plural mismatch "behaviors"/"solution is used"
>

> will be fixed in next revision.

>
> Section 4.1
>
>    A node implementing MSF SHOULD implement the Minimal Security
>    Framework for 6TiSCH [I-D.ietf-6tisch-minimal-security].  As a
>
> Didn't this get renamed to CoJP?
>

> Thanks for pointing it out! Will update in next revision.

>
> Section 4.2
>
> I a little bit wonder if there is a better description than "available
> frequencies" but don't have one to offer.
>

> The frequency is picked randomly. None of them is preferred over the
others.

>
> Section 4.3
>
>    While the exact behavior is implementation-specific, it is
>    RECOMMENDED that after having received the first EB, a node keeps
>    listen for at most MAX_EB_DELAY seconds until it has received EBs
>    from NUM_NEIGHBOURS_TO_WAIT distinct neighbors, which is defined in
>    [RFC8180].
>
> nit(?): this phrasing implies that only NUM_NEIGHBOURS_TO_WAIT is
> defined in RFC 8180, but MAX_EB_DELAY is also defined there.
>

> The "which" here indicates the whole behavior.
> It will be rephrased  as "This behavior is defined in [RFC8180]".

>
> not-nit: this phrasing is ambiguous as to whether one of MAX_EB_DELAY
> and NUM_NEIGHBOURS_TO_WAIT is sufficient to move to the next step or
> whether both are required.
>

> The two actually describe two situations:
> 1. The node keeps listening; once EBs from NUM_NEIGHBOURS_TO_WAIT neighbors
are received, it stops listening and synchronizes to one of those neighbors.
> 2. If, after the MAX_EB_DELAY timeout, EBs have been received from fewer
than NUM_NEIGHBOURS_TO_WAIT neighbors, it stops listening as well and
synchronizes to the neighbor, or one of the neighbors, it has heard.
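
Put differently, either condition alone is sufficient to stop listening.
A minimal sketch, where the two constant values are placeholders and
should_stop_listening() is a hypothetical helper:

```python
MAX_EB_DELAY = 180          # seconds; placeholder value for illustration
NUM_NEIGHBOURS_TO_WAIT = 2  # placeholder value for illustration


def should_stop_listening(elapsed_s, neighbors_heard):
    """Decide whether to stop listening for EBs.

    elapsed_s: seconds since the first EB was received.
    neighbors_heard: set of distinct neighbors an EB was received from.
    """
    if len(neighbors_heard) >= NUM_NEIGHBOURS_TO_WAIT:
        return True                      # situation 1: enough neighbors
    return elapsed_s >= MAX_EB_DELAY     # situation 2: timeout expired
```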

>
> Section 4.4
>
>    After selected a JP, a node generates a Join Request and installs an
>    AutoTxCell to the JP.  The Join Request is then sent by the pledge to
>    its JP over the AutoTxCell.  The AutoTxCell is removed by the pledge
>
> editorial: I'd suggest s/its JP/its selected JP/
>

> Will be updated in next revision.

>
>    Response is sent out.  The pledge receives the Join Response from its
>    AutoRxCell, thereby learns the keying material used in the network,
>    as well as other configurations, and becomes a "joined node".
>
> nit: maybe "other configuration values" or "other configuration
> settings"?
>

> Will be updated in next revision.

>
> Section 4.6
>
>    Once it has selected a routing parent, the joined node MUST generate
>    a 6P ADD Request and install an AutoTxCell to that parent.  The 6P
>    ADD Request is sent out through the AutoTxCell with the following
>    fields:
>
>    *  CellOptions: set to TX=1,RX=0,SHARED=0
>    *  NumCells: set to 1
>    *  CellList: at least 5 cells, chosen according to Section 8
>
> Is this listing describing the contents of the ADD request or the
> AuthTxCell used to send it?  (I presume the former, in which case I
> suggest to use "containing" or similar in preference to "with".)
>

> yes, it is the former. Will update in the next revision.

>
> Section 5.1
>
>    The goal of MSF is to manage the communication schedule in the 6TiSCH
>    schedule in a distributed manner.  For a node, this translates into
>    monitoring the current usage of the cells it has to the selected
>    parent:
>
> Is this goal strictly limited to traffic "to the selected parent" vs.
> all traffic?
>

> Theoretically, MSF is not limited to traffic to the selected parent; it
could apply to any neighbor.
> However, all the experiments we have run to verify MSF involved
the selected parent only.
> Hence, we state "the selected parent" only here.

>
>    *  If the node determines that the number of link-layer frames it is
>       attempting to exchange with the selected parent per unit of time
>       is larger than the capacity offered by the TSCH negotiated cells
>       it has scheduled with it, the node issues a 6P ADD command to that
>       parent to add cells to the TSCH schedule.
>    *  If the traffic is lower than the capacity, the node issues a 6P
>       DELETE command to that parent to delete cells from the TSCH
>       schedule.
>
> As written, this would potentially lead to oscillation when demand is
> basically at capacity, due to the quantization of capacity.  Perhaps
> some provisioning for hysteresis is appropriate?
>

> Yes. In the MSF cell usage algorithm described later in the document, more
cells are scheduled than are needed.
> The text here only explains the basic concept of this scheduling function.
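
A sketch of how two separate thresholds provide the hysteresis asked
about. MAX_NUM_CELLS appears in Section 14; the two threshold names and
all three values below are assumptions for illustration and should be
checked against the draft.

```python
MAX_NUM_CELLS = 100           # evaluation window, in elapsed cells (assumed)
LIM_NUMCELLSUSED_HIGH = 0.75  # assumed upper threshold
LIM_NUMCELLSUSED_LOW = 0.25   # assumed lower threshold


def adapt(num_cells_elapsed, num_cells_used):
    """Return "ADD", "DELETE", "KEEP", or None if the window is not full."""
    if num_cells_elapsed < MAX_NUM_CELLS:
        return None
    usage = num_cells_used / num_cells_elapsed
    if usage > LIM_NUMCELLSUSED_HIGH:
        return "ADD"
    if usage < LIM_NUMCELLSUSED_LOW:
        return "DELETE"
    return "KEEP"  # dead band between the two thresholds
```

Because a cell is added before usage reaches 100% and removed only when
usage is far lower, demand sitting close to capacity stays in the dead
band instead of oscillating.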

>
>    The cell option of cells listed in CellList in 6P Request frame
>    SHOULD be either Tx=1 only or Rx=1 only.  Both NumCellsElapsed and
>    NumCellsUsed counters can be used to both type of negotiated cells.
>
> Would this be more clear as "(Tx=1,Rx=0) or (Tx=0,Rx=1)"?
>

> Yes it's more clear. Will update in next revision

>
>    *  NumCellsElapsed is incremented by exactly 1 when the current cell
>       is AutoRxCell.
>
> This holds for all peers/parents we're keeping counters for, so the
> AutoRxCell can get "double counted"?
>

> One pair of counters is associated with one neighbor.
> If there are multiple parents, say two, then there are two NumCellsElapsed
counters, one for each of the parents.

>
>    In case that a node booted or disappeared from the network, the cell
>    reserved at the selected parent may be kept in the schedule forever.
>    A clean-up mechanism MUST be provided to resolve this issue.  The
>    clean-up mechanism is implementation-specific.  It could either be a
>    periodic polling to the neighbors the nodes have negotiated cells
>    with, or monitoring the activities on those cells.  The goal is to
>    confirm those negotiated cells are not used anymore by the associated
>    neighbors and remove them from the schedule.
>
> I'm not sure that "monitoring the activities on those cells" is safe
> with the current level of specification; if a node negotiates a 6P
> transmit cell to a parent and uses it only sparingly, with the parent
> eventually reclaiming it due to inactivity, I don't see a mechanism by
> which the node will reliably discover the negotiated cell to be
> nonfunctional and fall back to (e.g.) the corresponding AutoTxCell.  It
> may be most prudent to just not mention that as an example (a "periodic
> polling" procedure does not seem to have the same potential for
> information skew)
>

> Thanks for the comment! I will just remove that sentence from this
paragraph.

>
> Section 5.3
>
>    schedule is executed and the node sends frames to that parent.  When
>    NumTx reaches MAX_NUMTX, both NumTx and NumTxAck MUST be divided by
>    2.  For example, when MAX_NUMTX is set to 256, from NumTx=255 and
>    NumTxAck=127, the counters become NumTx=128 and NumTxAck=64 if one
>    frame is sent to the parent with an Acknowledgment received.  This
>    operation does not change the value of the PDR, but allows the
>    counters to keep incrementing.  The value of MAX_NUMTX is
>    implementation-specific.
>
> Does MAX_NUMTX need to be a power of two (to avoid errors when the
> division occurs)?
>

> Agree, it's better to be a power of two. Will state in the text.
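
The example from the quoted text can be checked with a few lines. This
is a sketch: record_tx() is a hypothetical helper and MAX_NUMTX = 256 is
the value used in the example, not a recommendation.

```python
MAX_NUMTX = 256  # value from the example in Section 5.3


def record_tx(num_tx, num_tx_ack, acked):
    """Update the counters after one transmission; halve both at MAX_NUMTX."""
    num_tx += 1
    if acked:
        num_tx_ack += 1
    if num_tx >= MAX_NUMTX:
        num_tx //= 2       # integer division keeps the PDR ratio
        num_tx_ack //= 2   # (exactly so when both counters are even)
    return num_tx, num_tx_ack
```

Starting from NumTx=255 and NumTxAck=127, one acknowledged frame gives
(256, 128), which is halved to (128, 64); the PDR is 0.5 before and
after.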

>
>    4.  For any other cell, it compares its PDR against that of the cell
>        with the highest PDR.  If the difference is larger than
>        RELOCATE_PDRTHRES, it triggers the relocation of that cell using
>        a 6P RELOCATE command.
>
> The recommended RELOCATE_PDRTHRES is given as "50 %".  Is this
> "difference" performed as a subtraction (so that if the highest PDR is
> less than 50%, no cells can ever be relocated) or a ratio (a PDR that's
> half than the maximum PDR or smaller will trigger relocation)?
>

> The "difference" is computed as a subtraction.
> Yes, if the highest PDR is less than 50%, no cell can be relocated.
> But in that case we can't tell whether the cells suffer from bad link
quality or from collisions.
> If the PDR of all cells is that low, it is highly likely that routing will
be affected and the node will switch to another neighbor.
> In experiments, we never saw the highest PDR stay below 50% all the time.
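
The subtraction rule can be sketched as follows; cells_to_relocate() is
a hypothetical helper, and the threshold is the recommended 50 %
expressed as a fraction.

```python
RELOCATE_PDRTHRES = 0.50  # recommended value, as a fraction


def cells_to_relocate(pdr_by_cell):
    """Return the cells whose PDR is more than the threshold below the best."""
    if not pdr_by_cell:
        return []
    best = max(pdr_by_cell.values())
    return sorted(cell for cell, pdr in pdr_by_cell.items()
                  if best - pdr > RELOCATE_PDRTHRES)
```

With PDRs of 0.9, 0.5 and 0.3, only the last cell is relocated
(0.9 - 0.3 > 0.5). If the best PDR is itself below 0.5, the list is
always empty, which matches the answer above.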

>
> Section 7
>
> Maybe reference Section 17.1 where the allocation will occur?
>

> Will add this in next revision.

>
> Section 8
>
>    *  The slotOffset of a cell in the CellList SHOULD be randomly and
>       uniformly chosen among all the slotOffset values that satisfy the
>       restrictions above.
>    *  The channelOffset of a cell in the CellList SHOULD be randomly and
>       uniformly chosen in [0..numFrequencies], where numFrequencies
>       represents the number of frequencies a node can communicate on.
>
> Do these random selections need to be independent from each other?  (I
> note that the selection for the autonomous cells are not.)
>

> For channelOffset, the values are selected randomly and independently.
> For slotOffset, once a slotOffset has been picked, it can't be picked
again in the next selection.
> This is already indicated in the text as "chosen among all the slotOffset
values *that satisfy the restrictions above*".
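
A sketch of that selection: slotOffsets are drawn without replacement,
channelOffsets independently. build_cell_list() and its parameters are
hypothetical, and the channelOffset range is assumed here to be
0..numFrequencies-1.

```python
import random


def build_cell_list(candidate_slot_offsets, num_frequencies,
                    num_cells=5, rng=random):
    """Pick `num_cells` cells as (slotOffset, channelOffset) pairs.

    candidate_slot_offsets: slotOffsets that already satisfy the
    restrictions of Section 8 (computing them is out of scope here).
    """
    # Without replacement: a slotOffset can appear at most once.
    slots = rng.sample(sorted(candidate_slot_offsets), num_cells)
    # Each channelOffset is drawn independently and may repeat.
    return [(slot, rng.randrange(num_frequencies)) for slot in slots]
```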


> Section 9
>
> Is there a reference for these three parameters (MAXBE, MAXRETRIES,
> SLOTFRAME_LENGTH)?  SLOTFRAME_LENGTH seems new in this document and is
> listed in the table in Section 14, but the other two are not listed
> there.
>

> MAXBE and MAXRETRIES are defined in the IEEE802.15.4 standard.
> Their values vary across network deployments, according to size
and density.
> Hence we didn't give recommended values in this draft.

>
> Section 14
>
> Why is MAX_NUMTX not listed in the table?
>
> Can we really give a recommended NUM_CH_OFFSET value, since this is in
> effect dependent on the number of channels available?
>

> We give a recommended value because this is a parameter used in the SAX
hashing algorithm.
> This doesn't prevent implementers from using other values.

>
> KA_PERIOD is defined but not used elsewhere in the document.
>

> This is a legacy of MSF draft, which we forgot to remove. Will update in
next revision

>
> What are the considerations in using a power of 10 vs. a power of 2 as
> MAX_NUM_CELLS?
>

> We picked a power of 10 simply because it's easy for readers to understand.
Nothing specific.
> Nothing prevents using a power of 2, such as 128.

>
> Section 16
>
>    MSF defines a series of "rules" for the node to follow.  It triggers
>    several actions, that are carried out by the protocols defined in the
>    following specifications: the Minimal IPv6 over the TSCH Mode of IEEE
>    802.15.4e (6TiSCH) Configuration [RFC8180], the 6TiSCH Operation
>
> I'd suggest a brief note that the security considerations of those
> protocols continue to apply (even though it ought to be obvious);
> reading them could help a reader understand the behavior of this
> document as well.
>
>    Sublayer Protocol (6P) [RFC8480], and the Minimal Security Framework
>    for 6TiSCH [I-D.ietf-6tisch-minimal-security].  In particular, MSF
>
> [CoJP again]
>
>    prevent it from receiving the join response.  This situation should
>    be detected through the absence of a particular node from the network
>    and handled by the network administrator through out-of-band means,
>    e.g. by moving the node outside the radio range of the attacker.
>
> "the radio range of the attacker" is not exactly a fixed constant ...
> attackers are not in general bound by legal limits and can increase Tx
> power subject only to their equipment and budget.
>

> Yes, I agree. For action, I will simply remove the example.

>
>    MSF adapts to traffics containing packets from IP layer.  It is
>    possible that the IP packet has a non-zero DSCP (Diffserv Code Point
>    [RFC2597]) value in its IPv6 header.  The decision whether to hand
>
> RFC 2597 is talking more about specifically assured forwarding PHB groups
> than "DSCP codepoint"s per se.
>

> Yes, RFC2474 is the one that defines the DSCP codepoint. Will update the
reference.

>
> Section 18.1
>
> RFC 6206 seems to only be used as an example (Trickle), and could
> probably be informative.
>
> RFC 8505 might also not need to be normative.
>

> They will be moved to informative reference section

>
> Appendix B
>
>    In MSF, the T is replaced by the length slotframe 1.  String s is
>
> nit: "length of"
>
>    2.  sum the value of L_shift(h,l_bit), R_shift(h,r_bit) and ci
>
> Is this addition performed in "infinite precision" integer arithmetic or
> limited to the output width of h, e.g., by modular division?  (It's not
> clear to me whether this is the role T plays or not.)
>

> As far as I know, the sum is used by most of the classic string hashing
functions.
> The deeper reason for using a sum here is more of a mathematics question,
on which I am not an expert :-(
> The T here is used as the modulus, to make sure the result falls into the
range of the slotframe (to pick the slotOffset) or of the available
frequencies (to pick the channelOffset).
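
On the precision question, a sketch of a shift-add-xor hash may help.
It assumes the per-byte structure "sum, XOR with h, reduce modulo T";
the exact step order in Appendix B may differ, and the default values
of h0, l_bit and r_bit are illustrative assumptions.

```python
def sax_hash(data, T, h0=0, l_bit=0, r_bit=1):
    """Shift-add-xor hash of `data` (bytes), reduced to the range 0..T-1."""
    h = h0
    for c in data:
        # Plain integer addition; no truncation is applied to the sum.
        total = (h << l_bit) + (h >> r_bit) + c
        # Reducing modulo T after every byte keeps h below T.
        h = (total ^ h) % T
    return h
```

Since h is reduced modulo T after every byte, the intermediate sum stays
bounded by roughly T * 2^l_bit + T + 255, so no wide arithmetic is
needed in this reading.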

>
>    8.  assign the result of Step 5 to h
>
> The value from step 5 *is* h, so taken literally this says "assign h to
> h" and is not needed.
>

>  Yes, this step is removed in next revision.

Thanks so much for your comments. Will prepare revision 13 to resolve them!

>
>
>
> _______________________________________________
> 6tisch mailing list
> 6tisch@ietf.org
> https://www.ietf.org/mailman/listinfo/6tisch
>
>

-- 
——————————————————————————————————————

Dr. Tengfei, Chang
Postdoctoral Research Engineer, Inria

www.tchang.org/
——————————————————————————————————————
