Jon,
Thank you for the detailed comments and for suggesting text offline. Posting here for the benefit of the list. The issues were discussed and resolved in a series of in-person meetings at the IETF, and the comments were incorporated into version 03 of the draft. Answers are inline below, marked ###. A few open items remain and will be addressed in future revisions; they are marked "###open".

Thank you,
Ina

From: pce-boun...@ietf.org [mailto:pce-boun...@ietf.org] On Behalf Of Jonathan Hardwick
Sent: 08 March 2013 15:40
To: draft-ietf-pce-stateful-...@tools.ietf.org
Cc: pce@ietf.org
Subject: [Pce] Comments on draft-ietf-pce-stateful-pce-02

Hi there,

I reviewed your draft. Overall, I think this adds good function. I think that many of the LSP control-related concepts discussed in this draft are inextricably linked with path computation, and so centralising this function in a PCE and extending PCEP in this way makes sense to me.

I have some detailed comments below. Please let me know if any require clarification or you would like to discuss.

Regards
Jon

Taxonomy and Applicability

It would be useful to extend the definitions in s2 to encompass the full taxonomy of stateful PCEs recently discussed on the mailing list (when that is settled).
### open - version 04

It would be helpful to include an applicability statement in each subsection of s3.1.2 to indicate which type of stateful PCE is applicable to which problem. I could have misunderstood, but it seems that examples 3.1.2.3 and 3.1.2.4 both require the PCE to have LSP initiation capability, so they are outside the scope of this draft. If I'm wrong, how can you use LSP delegation to control LSP sequencing?
### will be discussed in the applicability draft

Stateful PCEs and inter-domain LSPs

This is not currently addressed by the document.
It would be useful to have a statement either that it is out of scope or that it will be addressed in a future revision.

On page 6 it says "Within this document, when describing PCE-PCE communications..." but I can't see PCE-PCE communications discussed anywhere. I suggest removing this text.
### Discussed, text updated

LSP delegation

It might seem a little pedantic (!) but I think it would be useful to clarify which PCC "owns" a given LSP, i.e. reports on it & has rights to delegate it.
* The one that sent the original PCReq?
* If the LSP was set up without the benefit of a PCReq, which PCC owns it? Configured by operator?
I also think it's necessary to stipulate that a given LSP is owned (in the above sense) by at most one PCC.
### done

Section 5.5 para 1: why not allow the PCC to delegate at the same time that it synchronizes? Otherwise you force it to send two sets of essentially identical messages.
### Updated accordingly.

Section 5.5 para 2: "an LSP can be delegated to one or more PCEs". Surely not at the same time, though? Please clarify.
### done

Section 5.5.2 para 1: "and MUST ignore any further PCUpd messages". Rather than ignore them, I think it should send a PCErr.
### The issue was rather with PCUpd messages in the queue (received but not yet processed). Text was clarified.

Section 6.2: "A PCC MUST respond with an LSP State Report to each LSP Update Request". It's not clear why this is necessary, particularly in the case of combining LSP updates that this section discusses. It would be better for the PCE server to just accept the most recent LSP Report as the definitive statement of the current LSP state. Is there a reason why the PCE server needs to correlate LSP Reports to LSP Updates? I don't think so, but if there is, there should be some sort of ID in the LSP Update that is reflected back in the LSP Report.
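To make the Section 6.2 suggestion concrete, here is a minimal sketch of a PCE-side LSP-DB in which the most recent PCRpt for an LSP is simply taken as definitive, so no PCUpd/PCRpt correlation is needed. It also enforces the at-most-one-owning-PCC rule from the delegation comments. The class names, the string state values and the optional op_id field are hypothetical and are not taken from the draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LspEntry:
    owner: str                  # the single PCC that reports on and may delegate the LSP
    state: str                  # latest reported state, e.g. "up" or "down" (illustrative)
    last_op_id: Optional[int]   # hypothetical operation ID echoed back in the report


class LspDb:
    """Toy PCE LSP-DB: the most recent PCRpt for an LSP is authoritative."""

    def __init__(self) -> None:
        self._lsps: Dict[str, LspEntry] = {}

    def apply_report(self, pcc: str, lsp_id: str, state: str,
                     op_id: Optional[int] = None) -> bool:
        entry = self._lsps.get(lsp_id)
        if entry is not None and entry.owner != pcc:
            # An LSP is owned by at most one PCC, so a report from another
            # PCC is rejected (a real PCE might answer with a PCErr instead).
            return False
        # No correlation with earlier PCUpds: the newest report overwrites
        # whatever was stored before.
        self._lsps[lsp_id] = LspEntry(owner=pcc, state=state, last_op_id=op_id)
        return True

    def state_of(self, lsp_id: str) -> Optional[str]:
        entry = self._lsps.get(lsp_id)
        return entry.state if entry else None
```

Ownership transfer (for example, after a PCC restarts with a different address) is deliberately not modelled; this sketch just rejects the conflicting report.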
### This will be clarified in 04, when operation IDs are introduced.

If a PCC has been configured to delegate an LSP and a PCE has decided not to accept the delegation, then I think the result could be a large amount of thrashing as the LSP is continually delegated and revoked. The draft should include some text indicating when it is OK for a PCC to re-delegate an LSP after the delegation has been returned to it.
### Done

State Synchronization - General Comments

I wonder whether the NODE-IDENTIFIER is really necessary. To date, PCEP has used the IP address as the sole identifier of a PCEP entity, and although conceptually they are not necessarily the same thing, we shouldn't change it now without good reason. I would prefer the LSP's symbolic path name to be unique within a given administrative domain. Then if a PCC restarts with a different IP address, the PCE server recognises the LSP's symbolic path name and concludes that ownership of the LSP has transferred to a different PCC. Regardless, having introduced the NODE-IDENTIFIER TLV, you need to specify how to deal with the case of two PCCs using the same NODE-IDENTIFIER concurrently (at a minimum, a new PCErr subcode is needed).
### discussed offline, renamed to "redundancy-group"

I can't figure out why you need the INCLUDE-DB-VERSION flag. As far as I can tell it is synonymous with the presence of the LSP-DB-VERSION TLV in the OPEN object. Please let me know if I'm missing something.

I am struggling to make sense of 5.4.1 figure 8. If the PCE does not want to do LSP DB versioning, then how & why can it be providing a previous DB version number in its Open message? Are you saying that the PCE can decide to track version numbers dynamically per session (even different sessions to the same PCC)? What is the basis for that?
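The DB-versioning question may be easier to discuss against a concrete model. Below is a minimal sketch that assumes a single, monotonically increasing LSP-DB version per PCC, advertised by the PCC when the session opens and cached by the PCE per PCC identity. The class and method names are hypothetical, and the draft's actual TLV encoding and flag semantics are not modelled. In this model, a side that does not version forces a full synchronization.

```python
from typing import Dict, Optional


class PccVersionState:
    """PCC side: bump the LSP-DB version on every LSP state change."""

    def __init__(self) -> None:
        self.version = 0

    def on_lsp_change(self) -> int:
        self.version += 1
        return self.version


class PceVersionCache:
    """PCE side: remember the last LSP-DB version learned from each PCC."""

    def __init__(self) -> None:
        self._seen: Dict[str, int] = {}

    def record(self, pcc_id: str, version: int) -> None:
        self._seen[pcc_id] = version

    def needs_full_sync(self, pcc_id: str, pcc_version: Optional[int]) -> bool:
        cached = self._seen.get(pcc_id)
        if pcc_version is None or cached is None:
            # Either the PCC does not version its LSP-DB or the PCE kept no
            # state for it, so the only safe choice is a full synchronization.
            return True
        # Matching versions mean the PCE's copy is current and sync can be skipped.
        return cached != pcc_version
```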
### DB versioning will be further clarified in the next version.

State Synchronization and Graceful Restart

When a PCE restarts, it will synchronize with many PCCs, so it SHOULD implement some procedure to wait for all its sessions to synchronize before it sends any PCUpds or replies to any PCReqs. I think this renders obsolete the requirement for a PCC to delay sending a PCReq until its session has synchronized, since even if it does, it has no way of knowing that the other sessions have synchronized.
### An end-of-sync marker was added in 03.

5.4.1 suggests that a stateful PCE that is not using state synchronization avoidance should purge LSP state and relearn everything. I think this is the wrong choice of words. To allow a PCC to restart gracefully, the PCE should perform a mark-and-sweep of stale state, not temporarily purge all the PCC's LSPs from the LSP-DB. Note that a mark-and-sweep approach requires the PCE to be able to identify the final PCRpt in the synchronization flow from the PCC (the "sync done" message in your diagrams). I think there is currently no way for it to tell, apart from receiving the first PCRpt with SYNC=0, which could be any amount of time later. It would be good to specify an "end of SYNC" marker to be sent immediately at the end of synchronization, either an empty PCRpt with SYNC=0 or a new bit in the LSP TLV.
### done

State Synchronization - Failure Cases

Why must a PCC terminate its session if there is an error during synchronization? How should it recover and regain service? It is better for the stateful PCE to continue with partial information than for it not to get any information at all from this PCC, so I think the session should stay up, perhaps flagged to the operator as "synchronization failed". The PCC may be able to retry its failed PCRpts later.

If the PCE encounters a recoverable error during synchronization (such as memory allocation failure), it should have some mechanism to recover.
One possibility is for it to reset the TCP connection after a suitable time period has passed, triggering a new synchronization. Perhaps better is to reset the connection immediately (freeing resources immediately) but to specify that the PCC should not try to reconnect for a certain time period, perhaps using a new TLV in the CLOSE object.
### still open - next revision

Dealing with PCEP Session failure

What is the purpose of the delegation timeout interval?
### clarified in the doc.

I know that section 5.8 says that cleanup in the event of session failure is for further study, but here are my thoughts.
* On session failure, all delegations are immediately returned to the PCC. The LSPs must be re-delegated after the session comes back up.
* The PCE should run a timer before discarding the state it learned from the PCC, allowing a window for the session to come back up.
* Once the session is back up, graceful restart procedures (discussed above) should be used to recover the relevant portion of the LSP-DB.
* The PCC should attempt to re-establish the session. If it fails, then it MAY temporarily or permanently downgrade the PCE's priority, which may lead it to prefer other PCEs and hence delegate the affected LSPs to them.

Section 5.8: "A permanent PCEP session MUST be established". This is a little imprecise and I would prefer something like the final bullet above.
### done

Backup PCEs versus load-balancing

5.5.4 implies that there is a single active PCE handling all delegations and the other PCEs are idling. This is not necessarily the case; the delegations could be apportioned equally amongst the PCEs, and if any one of them fails, the LSPs that are delegated to it can be re-delegated amongst the remaining PCEs. The benefit is that fewer LSPs are affected by a PCE failure (and resources are used more effectively). The same comment applies to the second-to-last paragraph of s9.1.
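As a rough illustration of the load-balancing alternative to 5.5.4, here is a minimal sketch that apportions delegations round-robin and, on a PCE failure, moves only that PCE's LSPs to the least-loaded survivors. Both policies and the function names are hypothetical choices, not something the draft specifies. The sketch shows that a PCE failure disturbs only the LSPs that were delegated to the failed PCE.

```python
from typing import Dict, List


def apportion(lsps: List[str], pces: List[str]) -> Dict[str, List[str]]:
    """Spread LSP delegations round-robin over the available PCEs."""
    if not pces:
        raise ValueError("at least one PCE is required")
    assignment: Dict[str, List[str]] = {pce: [] for pce in pces}
    for i, lsp in enumerate(lsps):
        assignment[pces[i % len(pces)]].append(lsp)
    return assignment


def redelegate_on_failure(assignment: Dict[str, List[str]],
                          failed: str) -> Dict[str, List[str]]:
    """Move only the failed PCE's LSPs onto the survivors, least-loaded first."""
    orphaned = assignment.get(failed, [])
    # LSPs already delegated to surviving PCEs stay where they are.
    survivors = {pce: list(lsps) for pce, lsps in assignment.items() if pce != failed}
    if not survivors:
        raise ValueError("no surviving PCE to delegate to")
    for lsp in orphaned:
        target = min(survivors, key=lambda pce: len(survivors[pce]))
        survivors[target].append(lsp)
    return survivors
```

A real PCC would also have to respect the delegation-return and re-delegation timing discussed earlier. That timing is not modelled here.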
### Text updated

Stateful PCE priority

I was struck by this in s9.1: "A PCC implementation SHOULD allow the operator to specify delegation priority for PCEs". How does that relate to PrefL, PrefR, PrefS and PrefY in RFCs 5088 and 5089? Is this a new type of preference? If so, this is another extension required to those RFCs.
### open

Symbolic Path Names

As mentioned, I think the symbolic path name should be unique in the administrative domain, not per PCC. If it is possible to include the symbolic path name in messages after the first PCRpt, you need to specify what happens if the symbolic path name changes for a given LSP ID. This is illegal behaviour but must be handled. Suggestion:
* The symbolic path name MAY be provided in subsequent messages to aid network diagnostics.
* It is not significant in those messages and is ignored by the receiver except for logging.
### discussed offline, text changes were made to clarify.

Minor comments

The definition of minimum cut-set on page 6 is incorrect. Here is a correct definition:
* Cut-set: a set of links which, when removed from the network, result in a specific source being isolated from a specific destination.
* Minimum cut-set: a cut-set whose constituent links have the smallest possible summed capacity.

Section 3.1.1 paragraph 2: you should define what the auto-bandwidth system is, and explain why it is "particularly affected".

Section 3.1.2.2 para 2 case (a): could you please expand on where and why the deleterious effects arise?
### the above comments will be discussed as part of the applicability draft

Section 5.3 para 4 line 5 "will be generated": it will only be generated by a device that supports this draft!

Page 33 para 5 "a PCC MUST process all LSP Update Requests": worth pointing out that process != obey.

Page 33 para 6 "triggering the LSP setup of a next LSP": triggering is not in the scope of this draft; perhaps you mean "modifying the properties of".
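Jon's corrected definition (a minimum cut-set is the cut-set with the smallest summed link capacity) matches the max-flow/min-cut theorem, so one way to compute it is with a max-flow algorithm. The sketch below uses Edmonds-Karp and assumes bidirectional links, each with a single positive integer capacity. It illustrates the definition only and is not a procedure taken from the draft.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, int]  # (endpoint, endpoint, capacity); links are bidirectional


def minimum_cut_set(links: List[Link], src: str,
                    dst: str) -> Tuple[int, Set[Tuple[str, str]]]:
    """Return (summed capacity, links) of a minimum cut-set between src and dst.

    By the max-flow/min-cut theorem, the maximum src->dst flow equals the
    smallest summed capacity of any cut-set.
    """
    if src == dst:
        raise ValueError("source and destination must differ")
    # Residual capacities: an undirected link of capacity c starts as c both ways.
    cap: Dict[str, Dict[str, int]] = {}
    for u, v, c in links:
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)
        cap[u][v] += c
        cap[v][u] += c

    flow_value = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent: Dict[str, str] = {src: src}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v, residual in cap.get(u, {}).items():
                if residual > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            break  # no augmenting path left: the flow is maximal
        path = []
        v = dst
        while v != src:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow_value += bottleneck

    # Nodes still reachable from src in the residual graph form the source side.
    # The cut-set is every link with exactly one endpoint on that side.
    reachable = set(parent)
    cut = {(u, v) for u, v, _ in links if (u in reachable) != (v in reachable)}
    return flow_value, cut
```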
### text updated for all the above

Section 7.2 O-bit: since the operator will always have configured an operation status for the LSP, how is the O bit ever used? I think this belongs in the LSP-initiation draft.
### open - O-bit usage will be clarified in the next version

7.2.2 "The value contains the RSVP IPv4 ERROR_SPEC object": in fact it contains the contents of that object minus the common object header, not the whole object. Ditto for IPv6.
### open - error processing will be updated for the next version

7.2.3 final paragraph: "Since a PCE does not send LSP updates to a PCC". Do you mean LSP Reports?
### updated

9.1 first paragraph: "A PCC implementation SHOULD allow the operator to specify multiple candidate PCEs". RFC 5440 already says this; it doesn't need saying again.

9.1 first paragraph: "and a delegation preference for each candidate PCE". You say the same thing again later. Perhaps just delete this whole sentence.
### discussed and left unchanged

9.4 "A PCC implementation SHOULD provide a command to show to which PCEs LSPs are delegated." This is ambiguous. Does it mean show the set of PCEs to which any LSPs have been delegated, or does it mean show, for each LSP, whether it is delegated and, if so, to which PCE?
### text updated

9.6 "It SHOULD also allow sending a notification when a rate threshold is reached." Do you mean a PCNtf or some other type of notification?
### open, will be updated in the next version

Typos

Typo: s3.1.1 para 1 line 6 "it's" -> "its"
Typo: s3.1.1 para 1 line 8 "within in" -> "within"
Typo: s3.1.1 para 3 line 4 "promise of" -> "promise of the"
Typo: s3.1.2.2 para 2 line 10 "but it beyond" -> "but it is beyond"
Typo: page 23 line 1 "PCE or the PCE" -> "PCC or the PCE"
Typo: s5.5.1 para 1 line 6 "when the PCC wishes" -> "when the PCE wishes"
Typo: page 33 para 2 "that a PCC wishes" -> "that a PCE wishes"
Typo: page 33 para 2 "specified the Updare Request" -> "specified in the Update Request"
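This sketch illustrates the 7.2.2 point above: the TLV value should carry the ERROR_SPEC contents, not the whole RSVP object. It assumes the RFC 2205 layout, in which a 4-byte common header (Length, Class-Num = 6, C-Type = 1 for IPv4 or 2 for IPv6) is followed by the error node address, Flags, Error Code and Error Value. The function names are hypothetical, and the draft's TLV type codes are not modelled.

```python
import ipaddress
import struct
from typing import NamedTuple

ERROR_SPEC_CLASS_NUM = 6   # ERROR_SPEC class number from RFC 2205
BODY_LEN = {1: 8, 2: 20}   # body length by C-Type: 1 = IPv4, 2 = IPv6


class ErrorSpec(NamedTuple):
    error_node: str
    flags: int
    error_code: int
    error_value: int


def tlv_value_from_object(obj: bytes) -> bytes:
    """Return the ERROR_SPEC contents minus the 4-byte common object header."""
    if len(obj) < 4:
        raise ValueError("truncated RSVP object")
    length, class_num, c_type = struct.unpack("!HBB", obj[:4])
    if class_num != ERROR_SPEC_CLASS_NUM or c_type not in BODY_LEN:
        raise ValueError("not an IPv4/IPv6 ERROR_SPEC object")
    if length != len(obj) or length - 4 != BODY_LEN[c_type]:
        raise ValueError("ERROR_SPEC length mismatch")
    return obj[4:]


def parse_tlv_value(value: bytes) -> ErrorSpec:
    """Decode a header-less ERROR_SPEC body; its length tells IPv4 from IPv6."""
    if len(value) == 8:
        node = ipaddress.IPv4Address(value[:4])
    elif len(value) == 20:
        node = ipaddress.IPv6Address(value[:16])
    else:
        raise ValueError("unexpected ERROR_SPEC body length")
    # The last four bytes are Flags (1), Error Code (1) and Error Value (2).
    flags, code, err_value = struct.unpack("!BBH", value[-4:])
    return ErrorSpec(str(node), flags, code, err_value)
```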
_______________________________________________
Pce mailing list
Pce@ietf.org
https://www.ietf.org/mailman/listinfo/pce