Re: [Pce] Comments on draft-ietf-pce-stateful-pce-02

Ina Minei Fri, 22 Mar 2013 15:16:59 -0700


Jon,

Thank you for the detailed comments and for suggesting text offline. Posting 
here for the benefit of the list.

The issues were discussed and resolved in a series of in-person meetings at the 
IETF, and the comments were incorporated in version 03 of the draft. Answers are 
inline below, marked ###. A few open items remain and will be addressed in a 
future revision; they are marked "###open".

Thank you,

Ina


From: pce-boun...@ietf.org [mailto:pce-boun...@ietf.org] On Behalf Of Jonathan Hardwick
Sent: 08 March 2013 15:40
To: draft-ietf-pce-stateful-...@tools.ietf.org
Cc: pce@ietf.org
Subject: [Pce] Comments on draft-ietf-pce-stateful-pce-02

Hi there,

I reviewed your draft.  Overall, I think this adds good function.  I think that 
many of the LSP control-related concepts discussed in this draft are 
inextricably linked with path computation, and so centralising this function in 
a PCE and extending PCEP in this way makes sense to me.

I have some detailed comments below.  Please let me know if any require 
clarification or you would like to discuss.

Regards
Jon


Taxonomy and Applicability
It would be useful to extend the definitions in s2 to encompass the full 
taxonomy of stateful PCEs recently discussed on the mailing list (when that is 
settled).
### open - version 04

It would be helpful to include an applicability statement in each subsection of 
s3.1.2 to indicate which type of stateful PCE is applicable to which problem.
I could have misunderstood, but it seems that examples 3.1.2.3 and 3.1.2.4 both 
require the PCE to have LSP initiation capability, so are outside the scope of 
this draft.  If I'm wrong, how can you use LSP delegation to control LSP 
sequencing?
### will be discussed in the applicability draft

Stateful PCEs and inter-domain LSPs
This is not currently addressed by the document.  It would be useful to have a 
statement either that it is out of scope or that it will be addressed in a 
future revision.
On page 6 it says "Within this document, when describing PCE-PCE 
communications..." but I can't see PCE-PCE communications discussed anywhere.  
Suggest removing this text.
### Discussed, text updated

LSP delegation
It might seem a little pedantic (!) but I think it would be useful to clarify 
which PCC "owns" a given LSP i.e. reports on it & has rights to delegate it.

*         The one that sent the original PCReq?

*         If the LSP was set up without the benefit of a PCReq, which PCC owns 
it?  Configured by operator?
I also think it's necessary to stipulate that a given LSP is owned (in the 
above sense) by at most one PCC.
### done

Section 5.5 para 1: why not allow the PCC to delegate at the same time that it 
synchronizes?  Otherwise you force it to send two sets of essentially identical 
messages.
### Updated accordingly.

Section 5.5 para 2: "an LSP can be delegated to one or more PCEs"  Surely not 
at the same time, though?  Please clarify.
### done

Section 5.5.2 para 1: "and MUST ignore  any further PCUpd messages"  Rather 
than ignore them I think it should send a PCErr.
### The issue was rather with PCUpd messages in queue (received but not yet 
processed). Text was clarified.

Section 6.2 "A PCC MUST respond with an LSP State Report to each LSP Update 
Request"  It's not clear why this is necessary.  Particularly in the case of 
combining LSP updates that this section discusses.  It would be better for the 
PCE server to just accept the most recent LSP Report as being the definitive 
statement of the current LSP state.  Is there a reason why the PCE server needs 
to correlate LSP Reports to LSP Updates?  I don't think so, but if there is, 
there should be some sort of ID in the LSP Update that is reflected back in the 
LSP Report.
### This will be clarified in 04, when operation IDs are introduced

If a PCC has been configured to delegate an LSP and a PCE has decided not to 
accept the delegation then I think the result could be a large amount of 
thrashing as the LSP is continually delegated and revoked.  The draft should 
include some text indicating when it is OK for a PCC to re-delegate an LSP 
after the delegation has been returned to it.
### Done
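
One way to bound the thrashing described above is a hold-down timer that grows each time the PCE returns a delegation. The following is a minimal sketch, assuming exponential backoff with a cap. The class name, method names, and timer values are hypothetical and are not taken from the draft.

```python
class RedelegationBackoff:
    """Hold-down timer for re-delegating an LSP after a PCE returned it.

    Illustrative only: the draft does not define these names or values.
    """

    def __init__(self, base_seconds=30.0, max_seconds=3600.0):
        self.base_seconds = base_seconds
        self.max_seconds = max_seconds
        self.rejections = 0      # consecutive returned delegations
        self.not_before = 0.0    # earliest time the PCC may delegate again

    def delegation_returned(self, now):
        """Record that the PCE handed the delegation back at time `now`."""
        delay = min(self.base_seconds * (2 ** self.rejections), self.max_seconds)
        self.rejections += 1
        self.not_before = now + delay
        return delay

    def delegation_accepted(self):
        """Reset the backoff once a delegation sticks."""
        self.rejections = 0
        self.not_before = 0.0

    def may_delegate(self, now):
        """True once the hold-down period has expired."""
        return now >= self.not_before
```

A PCC would call `may_delegate()` before sending a new delegation for the LSP, so a PCE that keeps declining sees progressively fewer attempts.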

State Synchronization - General Comments
I wonder whether the NODE-IDENTIFIER is really necessary.  To date, PCEP has 
used the IP address as the sole identifier of a PCEP entity, and although 
conceptually they are not necessarily the same thing, we shouldn't change it 
now without good reason.  I would prefer the LSP's symbolic path name to be 
unique within a given administrative domain.  Then if a PCC restarts with a 
different IP address, the PCE server recognises the LSP's symbolic path name 
and concludes that ownership of the LSP has transferred to a different PCC.

Regardless, having introduced the NODE-IDENTIFIER TLV, you need to specify how 
to deal with the case of two PCCs using the same NODE-IDENTIFIER concurrently 
(there is at least a new PCErr subcode).
### discussed offline, renamed to "redundancy-group"

I can't figure out why you need the INCLUDE-DB-VERSION flag.  As far as I can 
tell it is synonymous with the presence of the LSP-DB-VERSION TLV in the OPEN 
object.  Please let me know if I'm missing something.

I am struggling to make sense of 5.4.1 figure 8.  If the PCE does not want to 
do LSP DB versioning then how & why can it be providing a previous DB version 
number in its Open message?  Are you saying that the PCE can decide to track 
version numbers dynamically per session (even different sessions to the same 
PCC)?  What is the basis for that?
### DB versioning will be further clarified in the next version


State Synchronization and Graceful Restart
When a PCE restarts, it will synchronize with many PCCs, so it SHOULD implement 
some procedure to wait for all its sessions to synchronize before it sends any 
PCUpds or replies to any PCReqs.  I think this renders obsolete the requirement 
for a PCC to delay sending a PCReq until its session has synchronized, since 
even if it does, it has no way of knowing that the other sessions have 
synchronized.
### An end-of-sync marker was added in 03

5.4.1 suggests that a stateful PCE that is not using state synchronization 
avoidance should purge LSP state and relearn everything.  I think this is the 
wrong choice of words.  To allow a PCC to restart gracefully the PCE should 
perform a mark-and-sweep of stale state, not temporarily purge all the PCC's 
LSPs from the LSP-DB.  Note that a mark-and-sweep approach requires the PCE to 
be able to identify the final PCRpt in the synchronization flow from the PCC 
(the "sync done" message in your diagrams).  I think there is currently no way 
for it to tell, apart from receiving the first PCRpt with SYNC=0, which could 
be any amount of time later.  It would be good to specify an "end of SYNC" 
marker to be sent immediately at the end of synchronization, either an empty 
PCRpt with SYNC=0, or a new bit in the LSP TLV.
### done
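
The mark-and-sweep approach suggested above can be sketched as follows. This is an illustration under stated assumptions, not the draft's procedure: the `LspDb` class, its method names, and the `end_of_sync` argument (standing in for whichever end-of-SYNC marker the draft adopts) are all hypothetical.

```python
class LspDb:
    """Per-PCC LSP state held by a stateful PCE (illustrative structure)."""

    def __init__(self):
        self.lsps = {}       # LSP identifier -> last reported state
        self.stale = set()   # identifiers not yet refreshed in this sync

    def session_restarted(self):
        """Mark phase: keep every LSP, but flag all of them as stale."""
        self.stale = set(self.lsps)

    def on_report(self, lsp_id, state, end_of_sync=False):
        """Handle one PCRpt; the end-of-sync marker carries no LSP."""
        if end_of_sync:
            return self.sweep()
        self.lsps[lsp_id] = state
        self.stale.discard(lsp_id)
        return []

    def sweep(self):
        """Sweep phase: drop LSPs the PCC did not re-report."""
        removed = sorted(self.stale)
        for lsp_id in removed:
            del self.lsps[lsp_id]
        self.stale.clear()
        return removed
```

Because nothing is deleted until the marker arrives, the PCE keeps a usable view of the PCC's LSPs throughout the resynchronization, which a purge-and-relearn approach would not provide.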

State Synchronization - Failure Cases
Why must a PCC terminate its session if there is an error during 
synchronization?  How should it recover and regain service?  It is better for 
the stateful PCE to continue with partial information than for it not to get 
any information at all from this PCC, so I think the session should stay up, 
perhaps flagged to the operator as "synchronization failed".  The PCC may be 
able to retry its failed PCRpts later.

If the PCE encounters a recoverable error during synchronization (such as 
memory allocation failure) it should have some mechanism to recover.  One 
possibility is for it to reset the TCP connection after a suitable time period 
has passed, triggering a new synchronization.  Perhaps better is to reset the 
connection immediately (freeing resources immediately) but specifying that the 
PCC should not try to reconnect for a certain time period, perhaps using a new 
TLV on the CLOSE object.
### still open - next revision

Dealing with PCEP Session failure
What is the purpose of the delegation timeout interval?
### clarified in the doc.

I know that section 5.8 says that cleanup on the event of session failure is 
for further study, but here are my thoughts.

*         On session failure, all delegations are immediately returned to the 
PCC.  The LSPs must be re-delegated after the session comes back up.

*         The PCE should run a timer before discarding its state from the PCC, 
allowing a window for the session to come back up.

*         Once the session is back up, graceful restart procedures (discussed 
above) should be used to recover the relevant portion of the LSP-DB.

*         The PCC should attempt to re-establish the session.  If it fails then 
it MAY temporarily or permanently downgrade the PCE's priority which may lead 
it to prefer other PCEs and hence delegate the affected LSPs to them.

Section 5.8 "A permanent PCEP session MUST be established" This is a little 
imprecise and I would prefer something like the final bullet above.
### done

Backup PCEs versus load-balancing
5.5.4 implies that there is a single active PCE handling all delegations and 
the other PCEs are idling.  This is not necessarily the case: the delegations 
could be apportioned equally amongst the PCEs, and if any one of them fails, 
the LSPs that are delegated to it can be re-delegated amongst the remaining 
PCEs.  The benefit is that fewer LSPs are affected by a PCE failure (and 
resources are used more effectively).

Same comment applies to s9.1 second to last paragraph.
### Text updated
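
The load-balancing alternative described above can be sketched as follows, assuming a simple round-robin split and a least-loaded choice when re-delegating. Both function names and both policies are hypothetical; the draft does not mandate any particular distribution.

```python
def spread_delegations(lsps, pces):
    """Assign LSPs to PCEs round-robin; returns {lsp: pce}."""
    if not pces:
        raise ValueError("no PCE available")
    return {lsp: pces[i % len(pces)] for i, lsp in enumerate(lsps)}


def redelegate_after_failure(delegations, failed_pce, pces):
    """Move only the failed PCE's LSPs, least-loaded survivor first."""
    survivors = [p for p in pces if p != failed_pce]
    if not survivors:
        raise ValueError("no surviving PCE")
    # Count the delegations each surviving PCE already holds.
    load = {p: 0 for p in survivors}
    for pce in delegations.values():
        if pce in load:
            load[pce] += 1
    result = dict(delegations)
    for lsp, pce in delegations.items():
        if pce == failed_pce:
            target = min(survivors, key=lambda p: load[p])
            result[lsp] = target
            load[target] += 1
    return result
```

With three PCEs and six LSPs, the failure of one PCE moves two LSPs and leaves the other four delegations untouched. In the single-active-PCE model, a failure of the active PCE would affect all six.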

Stateful PCE priority
I was struck by this in s9.1: "A PCC implementation SHOULD allow the operator 
to specify delegation priority for PCEs"  How does that relate to PrefL, PrefR, 
PrefS, PrefY in RFCs 5088 and 5089?  Is this a new type of preference?  In 
which case, this is another extension required to those RFCs.
###open

Symbolic Path Names
As mentioned I think it should be unique in the administrative domain, not per 
PCC.

If it is possible to include the symbolic path name in messages after the first 
PCRpt, you need to specify what happens if the symbolic path name changes for a 
given LSP ID.  This is illegal behaviour but must be handled.  Suggestion:

*         Symbolic path name MAY be provided in subsequent messages to aid 
network diagnostics

*         It is not significant in those messages and is ignored by the 
receiver except for logging.
### discussed offline, text changes were made to clarify.
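
The suggested receiver behaviour can be sketched as follows: the name from the first report is the name of record, and a different name in a later message is logged but never applied. The `NameTable` class and logger name are hypothetical, and the sketch assumes the first PCRpt for an LSP always carries the symbolic path name.

```python
import logging

log = logging.getLogger("pce.lspdb")


class NameTable:
    """Tracks the symbolic path name first reported for each LSP ID."""

    def __init__(self):
        self.names = {}

    def on_report(self, lsp_id, symbolic_name=None):
        """Return the name of record; later names are logged, never applied."""
        known = self.names.get(lsp_id)
        if known is None:
            # Assumption: the first report for an LSP must carry its name.
            if symbolic_name is None:
                raise ValueError("first report for an LSP must carry a name")
            self.names[lsp_id] = symbolic_name
            return symbolic_name
        if symbolic_name is not None and symbolic_name != known:
            log.warning("LSP %s: name %r ignored, keeping %r",
                        lsp_id, symbolic_name, known)
        return known
```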

Minor comments
Definition of minimum cut-set on page 6 is incorrect.  Here is a correct 
definition:

*         Cut-set: a set of links which, when removed from the network, result 
in a specific source being isolated from a specific destination.

*         Minimum cut-set: a cut-set whose constituent links have the smallest 
possible summed capacity.
Section 3.1.1 paragraph 2, you should define what the auto-bandwidth system is, 
and explain why it is "particularly affected".
Section 3.1.2.2 para 2 case (a) please could you expand on where and why the 
deleterious effects arise?
### the above comments will be discussed as part of the applicability draft
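
The two cut-set definitions above can be checked directly on a small topology. The following is a brute-force sketch that applies the definitions literally; it is not an efficient algorithm, and the function names and the `(node, node, capacity)` link representation are assumptions made for illustration. Links are treated as bidirectional and are assumed to be distinct tuples.

```python
from itertools import combinations


def is_cut_set(links, removed, src, dst):
    """True if removing `removed` leaves no path from src to dst."""
    remaining = [l for l in links if l not in removed]
    reached, frontier = {src}, [src]
    while frontier:
        node = frontier.pop()
        for a, b, _cap in remaining:
            # Links are treated as bidirectional.
            for here, there in ((a, b), (b, a)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return dst not in reached


def minimum_cut_set(links, src, dst):
    """Exhaustive search; only practical for very small topologies."""
    best, best_cap = None, None
    for size in range(len(links) + 1):
        for removed in combinations(links, size):
            if is_cut_set(links, removed, src, dst):
                cap = sum(c for _a, _b, c in removed)
                if best_cap is None or cap < best_cap:
                    best, best_cap = set(removed), cap
    return best, best_cap
```

Note that the minimum is taken over summed capacity, not over the number of links, so a cut-set with more links can still be the minimum one.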

Section 5.3 para 4 line 5 "will be generated": it will only be generated by a 
device that supports this draft!
Page 33 para 5 "a PCC MUST process all LSP Update Requests" - worth pointing 
out that process != obey.
Page 33 para 6 "triggering the LSP setup of a next LSP" - triggering is not in 
the scope of this draft, perhaps you mean "modifying the properties of".
### text updated for all the above

Section 7.2 O-bit: since the operator will always have configured an operation 
status for the LSP, how is the O bit ever used?  I think this belongs in the 
LSP-initiation draft.
### open - O-bit usage will be clarified in the next version

7.2.2 "The value contains the RSVP IPv4 ERROR_SPEC object " In fact it contains 
the contents of that object minus the common object header, not the whole 
object.  Ditto for IPv6.
### open - error processing will be updated for next version
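
To make the distinction concrete: an RSVP object starts with a 4-byte common header (16-bit Length, 8-bit Class-Num, 8-bit C-Type), and the point above is that the TLV value carries only what follows that header. A minimal sketch for the IPv4 case, assuming the RFC 2205 encoding of ERROR_SPEC (Class-Num 6, C-Type 1); the function name is hypothetical.

```python
import struct

RSVP_OBJECT_HEADER_LEN = 4   # Length (2 bytes), Class-Num (1), C-Type (1)
ERROR_SPEC_CLASS = 6
ERROR_SPEC_CTYPE_IPV4 = 1


def error_spec_body(obj):
    """Return the IPv4 ERROR_SPEC contents without the common object header."""
    length, class_num, c_type = struct.unpack(
        "!HBB", obj[:RSVP_OBJECT_HEADER_LEN])
    if (class_num, c_type) != (ERROR_SPEC_CLASS, ERROR_SPEC_CTYPE_IPV4):
        raise ValueError("not an IPv4 ERROR_SPEC object")
    if length != len(obj):
        raise ValueError("length field does not match object size")
    return obj[RSVP_OBJECT_HEADER_LEN:]
```

For the IPv4 object the remaining 8 bytes are the error node address, flags, error code, and error value.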

7.2.3 final paragraph: "Since a PCE does not send LSP updates  to a PCC" Do you 
mean LSP Reports?
### updated

9.1 first paragraph: "A PCC implementation SHOULD allow the operator to specify 
multiple candidate PCEs" RFC 5440 already says this, doesn't need saying again.
9.1 first paragraph: "and a delegation preference for each candidate PCE" You 
say the same thing again later.  Perhaps just delete this whole sentence.
### discussed and left unchanged

9.4 "A PCC implementation SHOULD provide a command to show to which PCEs LSPs 
are delegated ."  This is ambiguous.  Does it mean show the set of PCEs to 
which any LSPs have been delegated, or does it mean show, for each LSP, whether 
it is delegated and if so, to which PCE?
### text updated

9.6 "It SHOULD also allow sending a notification when a rate threshold is 
reached."  Do you mean a PCNtf or some other type of notification?
### open, will be updated in the next version

Typos
Typo: s3.1.1 para 1 line 6 "it's" -> "its"
Typo: s3.1.1 para 1 line 8 "within in" -> "within"
Typo: s3.1.1 para 3 line 4 "promise of" -> "promise of the"
Typo: s3.1.2.2 para 2 line 10 "but it beyond" -> "but it is beyond"
Typo: page 23 line 1 "PCE or the PCE" -> "PCC or the PCE"
Typo: s5.5.1 para 1 line 6 "when the PCC wishes" -> "when the PCE wishes"
Typo: page 33 para 2 "that a PCC wishes" -> "that a PCE wishes"
Typo: page 33 para 2 "specified the Updare Request" -> "specified in the Update 
Request"

_______________________________________________
Pce mailing list
Pce@ietf.org
https://www.ietf.org/mailman/listinfo/pce
