Hi Benjamin,

I have replied inline; my replies start with '>'.
Thanks so much for those detailed comments!

On Wed, Mar 11, 2020 at 6:55 PM Benjamin Kaduk via Datatracker < nore...@ietf.org> wrote:

> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-6tisch-msf-12: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-6tisch-msf/
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> I'm concerned that the scheduling function for autonomous cells can
> cause an infinite loop in the case of hash collision -- Section 3
> specifies that AutoTxCell always takes precedence over AutoRxCell, but
> if those two cells collide, the corresponding cells on the peer in
> question will also collide. If both peers try to send at the same time
> and the hashes collide, they will both attempt to transmit indefinitely
> and never be received.
>
> Notice that the AutoTxCell is a shared cell, to which the back-off mechanism applies.
> If there is a collision on that cell, each side applies a back-off with a different exponent.
> As a result, each side will use the cell as an AutoTxCell at a different time.
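To make the back-off argument concrete, here is a minimal Python sketch of two nodes whose autonomous cells collide. It is only an illustration under stated assumptions, not the normative TSCH CSMA-CA procedure: the names `Node`, `simulate`, `MIN_BE` and `MAX_BE` are hypothetical, the exponent bounds are illustrative values, and the back-off is simplified to a uniform draw from [0, 2^BE - 1] counted in occurrences of the shared cell.

```python
import random
from typing import Optional

MIN_BE = 1  # assumed minimum back-off exponent (illustrative value)
MAX_BE = 7  # assumed maximum back-off exponent (illustrative value)


class Node:
    """A node with one pending frame for its peer (hypothetical model)."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.pending = True   # one frame waiting to be sent
        self.be = MIN_BE      # current back-off exponent
        self.backoff = 0      # shared cells still to be skipped

    def wants_tx(self) -> bool:
        """Decide whether to use the cell as AutoTxCell in this occurrence."""
        if not self.pending:
            return False
        if self.backoff > 0:
            self.backoff -= 1  # still backing off: listen instead
            return False
        return True

    def on_failure(self) -> None:
        """No ACK: draw a back-off with the current exponent, then widen it."""
        self.backoff = self.rng.randrange(2 ** self.be)
        self.be = min(self.be + 1, MAX_BE)

    def on_success(self) -> None:
        self.pending = False
        self.be = MIN_BE
        self.backoff = 0


def simulate(seed: int, max_cells: int = 10_000) -> Optional[int]:
    """Return the number of cell occurrences until both frames are delivered."""
    rng = random.Random(seed)
    a, b = Node(rng), Node(rng)
    for cell in range(1, max_cells + 1):
        tx_a, tx_b = a.wants_tx(), b.wants_tx()
        if tx_a and tx_b:
            # Both transmit and nobody listens: both frames are lost.
            a.on_failure()
            b.on_failure()
        elif tx_a:
            a.on_success()  # b is backing off, so it listens
        elif tx_b:
            b.on_success()  # a is backing off, so it listens
        if not a.pending and not b.pending:
            return cell
    return None  # not delivered within max_cells


if __name__ == "__main__":
    for seed in range(5):
        print(seed, simulate(seed))
```

In this model the first occurrence always collides, since both nodes have a frame and no back-off yet. The nodes then separate as soon as their random draws differ, so a permanent loop would require identical draws forever.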
>
> There seems to be some "passing the buck" going on with respect to
> rate-limiting unauthenticated (join) traffic:
> draft-ietf-6tisch-minimal-security (Section 6.1.1) says that the SF
> "SHOULD NOT allocate additional cells as a result of traffic with code
> point AF43"; this document is implementing a SF, and yet we try to avoid
> the issue, saying that "[t]he at IPv6 layer SHOULD ensure that this join
> traffic is rate-limited before it is passed to 6top sublayer where MSF
> can observe it". I think we need a clear and consistent story about
> where this rate-limiting is supposed to happen.
>
> Thanks for the comments! This was discussed for some previous revisions of MSF.
> It is not "passing the buck" but a decision based on the scheduling function and the security context.
> To avoid a layer violation, upper-layer information is supposed NOT to be visible to the link layer, where 6P and MSF sit.
> But where security is concerned, this does not seem avoidable.
> IMO, the scheduling function aims to provide an algorithm that adds/removes cells according to traffic.
> The traffic could contain unauthenticated join requests from both normal devices and malicious devices.
> The function does NOT have enough information to differentiate them.
> We are assuming that some other entity outside of MSF needs to resolve this issue.
> If we assume that the security info in the IPv6 header is passed to MSF, we could abandon the rate-limiting approach and simply skip a slot if an AF43 packet is sent on that slot.
> Hence the traffic adaptation never applies to traffic marked as AF43.
>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> I support Roman's Discuss -- we need more information for this to be a
> useful reference; even what seem to be the official DASFAA 1997
> proceedings (https://dblp.org/db/conf/dasfaa/dasfaa97) do not have an
> associated document).
>
> Basing various scheduling aspects on (a hash of) the EUI64 ties
> functionality to a persistent identifier for a device. How significant
> a disruption would be incurred if a device periodically changes its
> presented EUI64 for anonymization purposes?
>
> I assume you are referring to a malicious device?
> There is no doubt that this will affect the performance of the joining process for normal devices.
> But normal devices still have a chance to join.
> The join proxy won't be affected either, since the cell is removed right after the packet is sent out.
>
> There seems to be a general pattern of "if you don't have a
> 6P-negotiated Tx cell, install and AutoTxCell to send your one message
> and then remove it after sending"; I wonder if it would be easier on the
> reader to consolidate this as a general principle and not repeat the
> details every time it occurs.
>
> Yes, this is the feature of autonomous cells. I am not sure it would be easier to understand if stated just once.
> Each case of adding/removing differs slightly, e.g. in which node does it: the parent or the JP?
> I personally feel it is clearer to repeat this every time, for the various types of node, to highlight the differences.
>
> Requirements Language
>
> "NOT RECOMMENDED" is not in the RFC2119 boilerplate (but is a BCP 14
> keyword).
>
> Thanks for pointing this out. It will be removed in the next revision.
> We also updated the reference to RFC8174 instead of RFC2119.
>
> Section 1
>
>    the 6 steps described in Section 4. The end state of the join
>    process is that the node is synchronized to the network, has mutually
>    authenticated to the network, has identified a routing parent, and
>
> nit(?): I guess maybe "mutually authenticated with" is more correct for
> the bidirectional operation.
>
> Will update in the next revision.
>
>    It does so for 3 reasons: to match the link-layer resources to the
>    traffic, to handle changing parent, to handle a schedule collision.
>
> nit: end the list with "or" (or "and"?).
>
> Will update in the next revision.
>
>    MSF works closely with RPL, specifically the routing parent defined
>    in [RFC6550]. This specification only describes how MSF works with
>    one routing parent, which is phrased as "selected parent". The
>
> nit: I suggest '''one routing parent; this parent is referred to as the
> "selected parent"'''.
>
> Will update in the next revision.
>
>    activity of MSF towards to single routing parent is called as a "MSF
>
> nit: "towards the"
>
> Will update in the next revision.
>
>    * We added sections on the interface to the minimal 6TiSCH
>      configuration (Section 2), the use of the SIGNAL command
>      (Section 6), the MSF constants (Section 14), the MSF statistics
>      (Section 15).
>
> nit: end the list with "and".
>
> Will update in the next revision.
>
> Section 2
>
>    In a TSCH network, time is sliced up into time slots. The time slots
>    are grouped as one of more slotframes which repeat over time. The
>
> nit(?): should this be "one or more"?
>
> It should be "one or multiple slotframes". Will update in the next revision.
>
>    channel) is indicated as a cell of TSCH schedule. MSF is one of the
>    policies defining how to manage the TSCH schedule.
>
> nit: if there is only one such policy active at a given time for a given
> network, I suggest "MSF is a policy for managing the TCSH schedule".
> (If multiple policies are active simultaneously, no change is needed.)
>
> As indicated in RFC8480: A node MAY implement multiple SFs and run them at the same time.
> So MSF is *one of the policies* defining how to manage the TSCH schedule.
>
>    MSF uses the minimal cell for broadcast frames such as Enhanced
>    Beacons (EBs) [IEEE802154] and broadcast DODAG Information Objects
>    (DIOs) [RFC6550]. Cells scheduled by MSF are meant to be used only
>    for unicast frames.
>
> If this paragraph was moved before the previous paragraph, then EB and
> DIO would be defined before their first usage.
>
> Maybe I have misunderstood.
> Do you mean you prefer to move this paragraph before the previous one?
> EB and DIO are defined in the references; I am not sure we still need to define them in MSF.
>
>    bandwidth of minimal cell. One of the algorithm met the rule is the
>    Trickle timer defined in [RFC6206] which is applied on DIO messages
>    [RFC6550]. However, any such algorithm of limiting the broadcast
>
> nit(?): "One of the algorithms that fulfills this requirement"?
>
> Will update accordingly.
>
>    MSF RECOMMENDS the use of 3 slotframes. MSF schedules autonomous
>    cells at Slotframe 1 (Section 3) and 6P negotiated cells at Slotframe
>    2 (Section 5) , while Slotframe 0 is used for the bootstrap traffic
>    as defined in the Minimal 6TiSCH Configuration. It is RECOMMENDED to
>    use the same slotframe length for Slotframe 0, 1 and 2. Thus it is
>
> Perhaps this is just a question of writing style, but if an
> implementation is free to use an alternative SF or a variant of MSF,
> could we not say that "MSF uses 3 slotframts", "MSF uses the same
> slotframe length for", etc.?
>
> Updated to "3 slotframes are used in MSF." and "The same slotframe length for Slotframe 0, 1 and 2 is RECOMMENDED".
>
> Section 3
>
> Is there any risk of unwanted correlation between slot and channel
> offsets when using the same hash function and input for both
> calculations?
>
>    hash function. Other optional parameters defined in SAX determine
>    the performance of SAX hash function. Those parameters could be
>    broadcasted in EB frame or pre-configured. For interoperability
>    purposes, an example how the hash function is implemented is detailed
>    in Appendix B.
>
> Given the lack of usable reference for [SAX-DASFAA], I assume that the
> content in Appendix B is going to be used as a specification, not just
> an example.
>
> The reference for SAX is updated in the new revision.
>
>    * The AutoRxCell MUST always remain scheduled after synchronized.
>
> nit: s/synchronized/synchronization/
>
>    AutoRxCell.
>    In case of conflicting with a negotiated cell,
>    autonomous cells take precedence over negotiated cell, which is
>    stated in [IEEE802154]. However, when the Slotframe 0, 1 and 2 use
>    the same length value, it is possible for negotiated cell to avoid
>    the collision with AutoRxCell.
>
> Presumably this factors in to the recommendation to have the three
> listed slotframes use the same length, but mentioning it explicitly
> (whether here or where the recommendation is made) might be nice.
>
> It is mentioned earlier as: *The same slotframe length for Slotframe 0, 1 and 2 is RECOMMENDED.*
>
> Section 4
>
>    network. Alternative behaviors may involved, for example, when
>    alternative security solution is used for the network. Section 4.1
>
> nit: singular/plural mismatch "behaviors"/"solution is used"
>
> Will be fixed in the next revision.
>
> Section 4.1
>
>    A node implementing MSF SHOULD implement the Minimal Security
>    Framework for 6TiSCH [I-D.ietf-6tisch-minimal-security]. As a
>
> Didn't this get renamed to CoJP?
>
> Thanks for pointing it out! Will update in the next revision.
>
> Section 4.2
>
> I a little bit wonder if there is a better description than "available
> frequencies" but don't have one to offer.
>
> The frequency is picked at random. No frequency is preferred over the others.
>
> Section 4.3
>
>    While the exact behavior is implementation-specific, it is
>    RECOMMENDED that after having received the first EB, a node keeps
>    listen for at most MAX_EB_DELAY seconds until it has received EBs
>    from NUM_NEIGHBOURS_TO_WAIT distinct neighbors, which is defined in
>    [RFC8180].
>
> nit(?): this phrasing implies that only NUM_NEIGHBOURS_TO_WAIT is
> defined in RFC 8180, but MAX_EB_DELAY is also defined there.
>
> The "which" here refers to the whole behavior.
> It will be rephrased as "This behavior is defined in [RFC8180]".
>
> not-nit: this phrasing is ambiguous as to whether one of MAX_EB_DELAY
> and NUM_NEIGHBOURS_TO_WAIT is sufficient to move to the next step or
> whether both are required.
>
> The two actually describe two situations:
> 1. The node keeps listening; once EBs from NUM_NEIGHBOURS_TO_WAIT neighbors have been received, it stops listening and synchronizes to one of those neighbors.
> 2. If, after the MAX_EB_DELAY timeout, EBs have been received from fewer than NUM_NEIGHBOURS_TO_WAIT neighbors, it also stops listening and synchronizes to that neighbor or to one of those neighbors.
>
> Section 4.4
>
>    After selected a JP, a node generates a Join Request and installs an
>    AutoTxCell to the JP. The Join Request is then sent by the pledge to
>    its JP over the AutoTxCell. The AutoTxCell is removed by the pledge
>
> editorial: I'd suggest s/its JP/its selected JP/
>
> Will be updated in the next revision.
>
>    Response is sent out. The pledge receives the Join Response from its
>    AutoRxCell, thereby learns the keying material used in the network,
>    as well as other configurations, and becomes a "joined node".
>
> nit: maybe "other configuration values" or "other configuration
> settings"?
>
> Will be updated in the next revision.
>
> Section 4.6
>
>    Once it has selected a routing parent, the joined node MUST generate
>    a 6P ADD Request and install an AutoTxCell to that parent. The 6P
>    ADD Request is sent out through the AutoTxCell with the following
>    fields:
>
>    * CellOptions: set to TX=1,RX=0,SHARED=0
>    * NumCells: set to 1
>    * CellList: at least 5 cells, chosen according to Section 8
>
> Is this listing describing the contents of the ADD request or the
> AuthTxCell used to send it? (I presume the former, in which case I
> suggest to use "containing" or similar in preference to "with".)
>
> Yes, it is the former. Will update in the next revision.
>
> Section 5.1
>
>    The goal of MSF is to manage the communication schedule in the 6TiSCH
>    schedule in a distributed manner.
>    For a node, this translates into
>    monitoring the current usage of the cells it has to the selected
>    parent:
>
> Is this goal strictly limited to traffic "to the selected parent" vs.
> all traffic?
>
> In theory MSF is not limited to traffic to the selected parent; it can apply to any neighbor.
> However, all the experiments we have run to verify MSF involve traffic to the selected parent only.
> Hence we state "the selected parent" only here.
>
>    * If the node determines that the number of link-layer frames it is
>      attempting to exchange with the selected parent per unit of time
>      is larger than the capacity offered by the TSCH negotiated cells
>      it has scheduled with it, the node issues a 6P ADD command to that
>      parent to add cells to the TSCH schedule.
>    * If the traffic is lower than the capacity, the node issues a 6P
>      DELETE command to that parent to delete cells from the TSCH
>      schedule.
>
> As written, this would potentially lead to oscillation when demand is
> basically at capacity, due to the quantization of capacity. Perhaps
> some provisioning for hysteresis is appropriate?
>
> Yes. As the MSF cell usage algorithm described after this text shows, more cells are scheduled than are strictly needed.
> The text here only explains the basic concept of this scheduling function.
>
>    The cell option of cells listed in CellList in 6P Request frame
>    SHOULD be either Tx=1 only or Rx=1 only. Both NumCellsElapsed and
>    NumCellsUsed counters can be used to both type of negotiated cells.
>
> Would this be more clear as "(Tx=1,Rx=0) or (Tx=0,Rx=1)"?
>
> Yes, it is clearer. Will update in the next revision.
>
>    * NumCellsElapsed is incremented by exactly 1 when the current cell
>      is AutoRxCell.
>
> This holds for all peers/parents we're keeping counters for, so the
> AutoRxCell can get "double counted"?
>
> One pair of counters is associated with one neighbor.
> If there are two parents, for example, then there are two NumCellsElapsed counters, one for each parent.
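The per-neighbor bookkeeping can be sketched roughly as follows. This is a hedged illustration rather than the draft's algorithm: `CellCounters`, `counters_for` and `record_cell` are hypothetical names, and the window size and the two usage thresholds are illustrative assumptions that this thread does not state. The gap between the two thresholds is what would provide the hysteresis raised above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MAX_NUM_CELLS = 100           # assumed evaluation window (illustrative)
LIM_NUMCELLSUSED_HIGH = 0.75  # assumed "add a cell" threshold (illustrative)
LIM_NUMCELLSUSED_LOW = 0.25   # assumed "delete a cell" threshold (illustrative)


@dataclass
class CellCounters:
    """One pair of counters, kept separately for each neighbor."""

    num_cells_elapsed: int = 0
    num_cells_used: int = 0

    def record_cell(self, used: bool) -> Optional[str]:
        """Count one elapsed cell; return "ADD", "DELETE" or None."""
        self.num_cells_elapsed += 1
        if used:
            self.num_cells_used += 1
        if self.num_cells_elapsed < MAX_NUM_CELLS:
            return None  # window not complete yet
        ratio = self.num_cells_used / self.num_cells_elapsed
        # Start a new window before reporting the decision.
        self.num_cells_elapsed = 0
        self.num_cells_used = 0
        if ratio > LIM_NUMCELLSUSED_HIGH:
            return "ADD"
        if ratio < LIM_NUMCELLSUSED_LOW:
            return "DELETE"
        return None  # between the thresholds: leave the schedule alone


# Hypothetical registry: neighbor identifier -> its own pair of counters.
counters: Dict[str, CellCounters] = {}


def counters_for(neighbor: str) -> CellCounters:
    """Return the counters of a neighbor, creating them on first use."""
    return counters.setdefault(neighbor, CellCounters())
```

With two parents, `counters_for("parent-A")` and `counters_for("parent-B")` return distinct objects, so an elapsed cell counted for one parent does not change the other parent's counters.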
>
>    In case that a node booted or disappeared from the network, the cell
>    reserved at the selected parent may be kept in the schedule forever.
>    A clean-up mechanism MUST be provided to resolve this issue. The
>    clean-up mechanism is implementation-specific. It could either be a
>    periodic polling to the neighbors the nodes have negotiated cells
>    with, or monitoring the activities on those cells. The goal is to
>    confirm those negotiated cells are not used anymore by the associated
>    neighbors and remove them from the schedule.
>
> I'm not sure that "monitoring the activities on those cells" is safe
> with the current level of specification; if a node negotiates a 6P
> transmit cell to a parent and uses it only sparingly, with the parent
> eventually reclaiming it due to inactivity, I don't see a mechanism by
> which the node will reliably discover the negotiated cell to be
> nonfunctional and fall back to (e.g.) the corresponding AutoTxCell. It
> may be most prudent to just not mention that as an example (a "periodic
> polling" procedure does not seem to have the same potential for
> information skew)
>
> Thanks for the comment! I will just remove that sentence from this paragraph.
>
> Section 5.3
>
>    schedule is executed and the node sends frames to that parent. When
>    NumTx reaches MAX_NUMTX, both NumTx and NumTxAck MUST be divided by
>    2. For example, when MAX_NUMTX is set to 256, from NumTx=255 and
>    NumTxAck=127, the counters become NumTx=128 and NumTxAck=64 if one
>    frame is sent to the parent with an Acknowledgment received. This
>    operation does not change the value of the PDR, but allows the
>    counters to keep incrementing. The value of MAX_NUMTX is
>    implementation-specific.
>
> Does MAX_NUMTX need to be a power of two (to avoid errors when the
> division occurs)?
>
> Agreed, it is better for it to be a power of two. Will state this in the text.
>
>    4. For any other cell, it compares its PDR against that of the cell
>       with the highest PDR.
>       If the difference is larger than
>       RELOCATE_PDRTHRES, it triggers the relocation of that cell using
>       a 6P RELOCATE command.
>
> The recommended RELOCATE_PDRTHRES is given as "50 %". Is this
> "difference" performed as a subtraction (so that if the highest PDR is
> less than 50%, no cells can ever be relocated) or a ratio (a PDR that's
> half than the maximum PDR or smaller will trigger relocation)?
>
> The "difference" is computed as a subtraction.
> Yes, it is true that if the highest PDR is less than 50%, no cell can be relocated.
> But in that case one cannot tell whether those cells suffer from bad link quality or from collisions.
> If the PDR of all cells is that low, it is highly likely that routing will be affected and the node will switch to another neighbor.
> In our experiments, we never encountered a highest PDR that stayed below 50% all the time.
>
> Section 7
>
> Maybe reference Section 17.1 where the allocation will occur?
>
> Will add this in the next revision.
>
> Section 8
>
>    * The slotOffset of a cell in the CellList SHOULD be randomly and
>      uniformly chosen among all the slotOffset values that satisfy the
>      restrictions above.
>    * The channelOffset of a cell in the CellList SHOULD be randomly and
>      uniformly chosen in [0..numFrequencies], where numFrequencies
>      represents the number of frequencies a node can communicate on.
>
> Do these random selections need to be independent from each other? (I
> note that the selection for the autonomous cells are not.)
>
> The channelOffset values are selected randomly and independently.
> The slotOffset values are not: once a slotOffset has been picked, it cannot be selected again the next time a slotOffset is chosen.
> This is already indicated in the text as "chosen among all the slotOffset values *that satisfy the restrictions above*".
>
> Section 9
>
> Is there a reference for these three parameters (MAXBE, MAXRETRIES,
> SLOTFRAME_LENGTH)? SLOTFRAME_LENGTH seems new in this document and is
> listed in the table in Section 14, but the other two are not listed
> there.
>
> MAXBE and MAXRETRIES are defined in the IEEE802.15.4 standard.
> Their values vary between network systems, according to size and density.
> Hence we didn't give a recommended value in this draft.
>
> Section 14
>
> Why is MAX_NUMTX not listed in the table?
>
> Can we really give a recommended NUM_CH_OFFSET value, since this is in
> effect dependent on the number of channels available?
>
> We give a recommended value because this is a parameter used in the SAX hashing algorithm.
> This doesn't prevent implementers from using other values.
>
> KA_PERIOD is defined but not used elsewhere in the document.
>
> This is a leftover from an earlier MSF draft, which we forgot to remove. Will update in the next revision.
>
> What are the considerations in using a power of 10 vs. a power of 2 as
> MAX_NUM_CELLS?
>
> We picked a power of 10 simply because it is easy for the reader to understand. Nothing specific.
> There is no restriction against using a power of 2, such as 128.
>
> Section 16
>
>    MSF defines a series of "rules" for the node to follow. It triggers
>    several actions, that are carried out by the protocols defined in the
>    following specifications: the Minimal IPv6 over the TSCH Mode of IEEE
>    802.15.4e (6TiSCH) Configuration [RFC8180], the 6TiSCH Operation
>
> I'd suggest a brief note that the security considerations of those
> protocols continue to apply (even though it ought to be obvious);
> reading them could help a reader understand the behavior of this
> document as well.
>
>    Sublayer Protocol (6P) [RFC8480], and the Minimal Security Framework
>    for 6TiSCH [I-D.ietf-6tisch-minimal-security]. In particular, MSF
>
> [CoJP again]
>
>    prevent it from receiving the join response. This situation should
>    be detected through the absence of a particular node from the network
>    and handled by the network administrator through out-of-band means,
>    e.g. by moving the node outside the radio range of the attacker.
>
> "the radio range of the attacker" is not exactly a fixed constant ...
> attackers are not in general bound by legal limits and can increase Tx
> power subject only to their equipment and budget.
>
> Yes, I agree. As an action, I will simply remove the example.
>
>    MSF adapts to traffics containing packets from IP layer. It is
>    possible that the IP packet has a non-zero DSCP (Diffserv Code Point
>    [RFC2597]) value in its IPv6 header. The decision whether to hand
>
> RFC 2597 is talking more about specifically assured forwarding PHB groups
> than "DSCP codepoint"s per se.
>
> Yes, RFC2474 is the one that defines the DSCP codepoint. Will update the reference.
>
> Section 18.1
>
> RFC 6206 seems to only be used as an example (Trickle), and could
> probably be informative.
>
> RFC 8505 might also not need to be normative.
>
> They will be moved to the informative references section.
>
> Appendix B
>
>    In MSF, the T is replaced by the length slotframe 1. String s is
>
> nit: "length of"
>
>    2. sum the value of L_shift(h,l_bit), R_shift(h,r_bit) and ci
>
> Is this addition performed in "infinite precision" integer arithmetic or
> limited to the output width of h, e.g., by modular division? (It's not
> clear to me whether this is the role T plays or not.)
>
> As far as I know, a sum like this is used by most classic string hashing functions.
> The deeper reason for using a sum here is more of a mathematical question, on which I am not an expert :-(
> The modulo by T makes sure the result falls within the range of the slotframe (to pick the slotOffset) or of the available frequencies (to pick the channelOffset).
>
>    8. assign the result of Step 5 to h
>
> The value from step 5 *is* h, so taken literally this says "assign h to
> h" and is not needed.
>
> Yes, this step is removed in the next revision.

Thanks so much for your comments. I will prepare revision 13 to resolve them!

>
> _______________________________________________
> 6tisch mailing list
> 6tisch@ietf.org
> https://www.ietf.org/mailman/listinfo/6tisch
>

--
--------------------------------------
Dr. Tengfei, Chang
Postdoctoral Research Engineer, Inria
www.tchang.org/
--------------------------------------