Hi,

As discussed, it is time to review PCEP from outside the design team. JP asked (well, OK, he begged :-) that at this review I limit myself to major protocol issues, and leave other documentation issues to a later review. I have mainly done that, but I have also included a few process-related comments that I think you should
resolve in the next revision.

Cheers,
Adrian
===
Frontpage authors list.
You need to cut this down to at most 5, but preferably no more than 4 people. I guess it is time to work out who the main editors are, and make the split.
===
Title
I don't think that naming the protocol "Version 1" is appropriate. It is almost a statement of intended failure!
It is still OK to have the protocol version field.
===
Spelling
If you could run the text through a spell-checker, it would really improve the readability and hence the quality of review.
===
"PCEP protocol"
Could you fix that rather annoying phrase!
===
Section 2 needs to include a statement of intent (or otherwise) to satisfy the requirements in RFC4657.
===
I think section 4 belongs after section 5
===
Any reason why the Open, Close, and Keepalive messages do not conform to the naming convention "PCXxxx" of all other messages?
===
Remove RFC2119 language from section 5
===
PCErr is missing from section 5.
Indeed, the use of PCNtfy as shown in figure 5 to express cancellation of a request by a PCE would seem to be the correct usage of PCErr. I suggest a bit of a rationalisation (and perhaps the creation of a new message - PCCncl). Thus PCNtfy becomes a benign notification message, PCErr is used by the PCE to report an error (but not a failure), PCCncl is used by the PCC to cancel a PCReq. It is best if we can resist overloading the semantics of the different messages.
===
Section 5.2.5 would benefit from a statement that it is your intention that the TCP connection can be re-established and recover the context of a session without further exchange of Open messages. I think this is what you intend. I am not sure it is good, and would prefer that the session was also re-established with the exchange of a list of PCReq I-Ds still outstanding OR that a failed TCP connection automatically cleared the session. Otherwise you complicate the protocol and the implementation with handling duplicate request I-Ds and lost requests/responses.
===
In section 6 you imply that a malformed PCErr will cause the generation of a PCErr. I suggest you need to avoid this behavior to prevent duelling error messages!
===
Can you include a reference to the definition of BNF in section 6
===
In 6.1 you need to describe the default settings of the Flags and Reserved fields
===
Do you really support 2^32 byte messages? If this is a conscious decision, then maybe OK.
===
6.3 doesn't mention the use of keepalive to detect TCP connection failure. I think it should.
===
In 6.4 the End-Points object is described as mandatory, but the BNF has it optional
===
In 6.7
 The PCErr message may be sent by a PCC or a PCE in response to a
 request or in an unsolicited manner.
"may"?
===
In the objects you define reserved fields as MUST be set to zero.
I think you need:
- SHOULD be set to zero
- MUST be ignored on receipt
===
In 7.1 it is unclear, but I think the P flag would only have meaning on a request, and the I flag on a response. So actually, you only need one flag (or you have some clarification text to write)
===
You will also need to explain what it means when a PCRep has the I flag set for an object that had the P flag set on the PCReq. As currently phrased this is an encoding error that would require a PCErr, but that would clearly be a waste of time.
===
In fact, why would you ever send a PCErr in response to a PCRep?
===
At the end of 7.1 you need to explain padding rules
===
Why does the Open need a protocol version number? This is in the message header so we all know what protocol is being used/supported. in fact, I think you need to re-write the protocol version support processing and negotiation.
===
Dead timer.
Should it be "SHOULD be set to zero, and MUST be ignored if the keepalive timer is zero"
===
The fields in the Open object are a mess!
Can you bring the flags up with the version to improve the byte- alignment?
===
Note that in the message header the version is four bits but in the Open object it is three bits.
===
In the Open object
 SID (PCEP session-ID - 8 bits): specifies a 2 octet unsigned PCEP
 session number that identifies the current session.
Are you using bit compression?
===
Open object TLVs may be out of the scope of this doc (7.2) but the format needs to be defined here.
===
 When present in an Open message, the OPEN object specifies the
 proposed PCEP session characteristics.  Upon receiving unacceptable
 PCEP session characteristics during the PCEP session initialization
phase, the receiving PCEP peer (PCE) MAY include a PCEP object within the PCErr message so as to propose alternative session characteristic
 values.
Isn't there an issue with duelling Open messages?
You seem to have an independent Open in each direction, each of which is responded with a PCErr, leading to a new Open in each direction. Is it necessary to have this symmetry? After all, the TCP connection is always initiated by the PCC, so can't we also have the Open initiated by the PCC and responded with a PCErr or an Open by the PCE. Keepalives then follow.
===
7.3
Why is the P flag set in an RP object carried in a PCNtf or PCErr? No computation action is required.
===
Why is there a reserved piece of the RP object? Is there a difference between the unused flags bits and the reserved field?
===
7.3.1
 Note that it is not required for a PCE to
 support the priority field: in this case, it is RECOMMENDED to set
 the priority field to 0 by the PCC in the RP object.
Why would it matter what value is set if it is not supported by the PCE? Wouldn't it possibly be helpful if the PCE forwards the request to another PCE?
===
7.3.1
As described here there is no difference between the R and the F bit. Both are used to recompute a path allowing re-use of resources already used by the path. Presumably in the case of the F bit, some information about the failure is also supplied through an exclusion, but this is not mentioned. The behavior at the PCE for these two bits would seem to be identical so unless you can give better separation of the meanings, I suggest you merge the bits.
===
7.3.1
The description of the O-bit is very confusing.
I think the problem lies with the definition of "explicit path"
Loose means that it is acceptable to include loose hops
You have no term to distinguish the use of abstract nodes that contain more than one network node
Explicit just means an ERO
Strict means that only strict hops are allowed
===
7.3.1
You need to think about how the Request ID is handled when the PCC restarts. Are you requiring the PCC to have persistent memory to avoid re-use of a Request ID? (That is how the text reads!)
===
7.3.2
The description of R-bit handling looks broken.
The previous setting of the O-bit is not important because the PCC is able to use RRO to collect information from the network. Thus, the question is actually about how detailed the RRO is. The point you are making remains (that it is hard to compute a proper make- before-break resource-sharing path if you don't know the previous path. [Those of you looking at the inter-area/AS PCE problem space will recognise this as impossible to solve unless stateful PCEs are used or the PCE is provided with a way to issue an enquiry to the ABR/ASBR.]
===
7.3.2
 If the PCC receives a PCRep message that contains a RP object
 referring to an unknown Request-ID-Number, the PCC MUST send a PCErr
 message with Error-Type="Unknown request reference".
This feels like at most a "MAY". Why is there a requirement to respond?
===
7.4
I don't think the description on the inclusion of "failed" constraints in the PCRep message is in agreement with the BNF in section 6.5. The BNF appears to imply that an ERO must be present in order to include, for example, a Metric object. [By the way: I continue to believe that this business of reporting the failed constraints is extremely dubious. Where there are two absolute constraints to be satisfied, it is often impossible to state which of them led to the failure to compute a path.]
===
7.4
How does a PCC tell the difference between a PCRep that contains the "failed" constraints echoed from the PCReq, and modified constraints that could have been satisfied? From the text it appears that you intend the C-bit to make this distinction, but the description of the C-bt does not say this.
===
7.4
Wouldn't it be a good idea to put a reason code in the No-Path object?
===
7.5
Can you add a note to say where the end points of the LSP *are* encoded. That is, although this PCReq is computing a segment only, it is important (at least for policy) to know the LSP end points as well.
===
7.6
First paragraph is hard to parse, but I think it says that the Bandwidth object must always be present in a PCReq! But the second paragraph explains what it means if it is absent.
===
7.6
Can you give a references to the IEEE FP definition.
===
7.7
Please add a third metric: hop count
I think you need to allow the Metric object to be present multiple times, and you have this in the text, but it is missing in the BNF. On the other hand, you need to describe how to interpret multiple Metric objects for the same Type setting
===
7.7
Why do you need the B-bit? The metric value is set to zero with the same semantic.
===
7.7
The text appears to say that only one Metric object may be present with the B-flag cleared. Why?
===
7.8 and 7.9 (and everywhere)
A real nit, but when you write "ERO object" or "RRO object" do you mean EROO and RROO? :-)
===
7.8
You say that the ERO is bounded by 3209, 3473 and 3477 which is fine, but how will an MPLS-TE LSR handle an ERO with an explicit label in it? You also allow future extensions to subobjects. I think this means that the PCReq must include a way for the requester to limit the subobjects that will be returned.
===
7.9
RRO is also used for reporting the path of a failed LSP?
===
7.10
What are optional TLVs for?
Perhaps to carry the additional attributes as defined in RFC4420?
===
7.10
Why do you need an LSPA object without resource affinities? This is pointless. It only exists in 3209 for backwards compatibility.
===
7.10
What is the meaning of holding priority with respect to path computation?
===
7.10
Why are the Flags and Reserved fields different sizes in the two objects?
===
7.10
When computing a path to satisfy a request with the L-bit set, how will the PCE know which links are protected with FRR?
===
7.11
Good to see the IRO
Why is there no XRO?
Exclusion is a standard part of the MPLS-TE MIB (RFC3812 see mplsTunnelHopInclude). If the exclusion can be configured on the headend LSR, I think we need to be able to compute the path.
Besides, this is a requirement in RC4657.
===
7.11
Why is it necessary to list the subobjects for the IRO but not for the ERO or the RRO?
===
7.12.1
I urge caution in the use of "simultaneously" in this section.
===
7.12.1
At the end of this section, you suddenly introduce the term "bundle". I think this is supposed to refer to the presence of multiple requests in a single PCReq message. Note that the SVEC can apply across messages. It would, perhaps be better to explain that requests in the same message are not necessarily associated by an SVEC, and that requests in separate messages may be associated by an SVEC.
===
7.12
You are missing text to describe the use of the SVEC on a PCRep (allowed according to 6.5). It feels like you would only use SVEC in the case of an RP, but the BNF has them as independent.
Please clarify
===
7.12.2
You have MUST NOT for each of your diversity characteristics.
I think you also need "desired" so that you can get "as much diversity as possible". How will you handle the overlap between (for example) node and link diversity? Just needs some text to explain.
===
7.12
Is it possible to end up with a nasty mesh of synchronized requests? Are there any rules to resolve this?
===
10
The state machine is quite hard to parse for reviewing purposes. It would be good to at least name the arrows on the figure. My personal preference is to draw the FSM as a grid because this shows up missing state/event combinations far more easily.

Is the OpenRetry counter just a substate of OpenWait?

The description of OpenRetry is at odds with the error used (1:5). The error says a second unacceptable Open has been received, but the text says that OpenRetry is incremented if the Open is anything except malformed. Can I suggest that the use of a separate state and not a substate would help you resolve this type of issue.

LocalOK also looks like a substate. Ditto RemoteOK. Please don't use substates, they are evil! The flip flop between OpenWait and KeepWait caused by these two variables and the negotiation of parameters is ghastly.

I think that what the state machine shows us is that we do not need independent Open messages in each direction. Anyway, if we *do* need them, we need correct states to manage them.
===
10
 If a malformed PCEP message is received or the TCP connection fails,
 the systems sends a PCEP CLOSE message, the system releases the PCEP
 resources for that PCEP peer, closes the TCP connection and moves to
 the Idle State.
This seems to be in contradiction with text in the main body of the draft and with the existence of error code 6.
===
App A
Is this appendix just present to help you track the requirements you still need to satisfy, or do you intend to leave some requirements unanswered?

BTW, I don't think this is a full list. Route exclusions is not listed.

What are you doing to ensure that the list is complete?
===
I think Kenji's email address may have changed
Same for Jean-Louis



_______________________________________________
Pce mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/pce

Reply via email to