Hi,
As discussed, it is time to review PCEP from outside the design
team. JP asked (well, OK, he begged :-) that at this review I limit
myself to major protocol issues, and leave other documentation
issues to a later review. I have mainly done that, but I have also
included a few process-related comments that I think you should
resolve in the next revision.
Cheers,
Adrian
===
Frontpage authors list.
You need to cut this down to at most 5, but preferably no more than
4 people.
I guess it is time to work out who the main editors are, and make
the split.
===
Title
I don't think that naming the protocol "Version 1" is appropriate.
It is almost a statement of intended failure!
It is still OK to have the protocol version field.
===
Spelling
If you could run the text through a spell-checker, it would really
improve the readability and hence the quality of review.
===
"PCEP protocol"
Could you fix that rather annoying phrase!
===
Section 2 needs to include a statement of intent (or otherwise) to
satisfy the requirements in RFC4657.
===
I think section 4 belongs after section 5
===
Any reason why the Open, Close, and Keepalive messages do not
conform to the naming convention "PCXxxx" of all other messages?
===
Remove RFC2119 language from section 5
===
PCErr is missing from section 5.
Indeed, the use of PCNtfy as shown in figure 5 to express
cancellation of a request by a PCE would seem to be the correct
usage of PCErr.
I suggest a bit of a rationalisation (and perhaps the creation of a
new message - PCCncl).
Thus PCNtfy becomes a benign notification message, PCErr is used by
the PCE to report an error (but not a failure), PCCncl is used by
the PCC to cancel a PCReq.
It is best if we can resist overloading the semantics of the
different messages.
===
Section 5.2.5 would benefit from a statement that it is your
intention that the TCP connection can be re-established and recover
the context of a session without further exchange of Open messages.
I think this is what you intend. I am not sure it is good, and
would prefer that the session was also re-established with the
exchange of a list of PCReq I-Ds still outstanding OR that a failed
TCP connection automatically cleared the session. Otherwise you
complicate the protocol and the implementation with handling
duplicate request I-Ds and lost requests/responses.
===
In section 6 you imply that a malformed PCErr will cause the
generation of a PCErr. I suggest you need to avoid this behavior to
prevent duelling error messages!
===
Can you include a reference to the definition of BNF in section 6
===
In 6.1 you need to describe the default settings of the Flags and
Reserved fields
===
Do you really support 2^32 byte messages? If this is a conscious
decision, then maybe OK.
===
6.3 doesn't mention the use of keepalive to detect TCP connection
failure. I think it should.
===
In 6.4 the End-Points object is described as mandatory, but the BNF
has it optional
===
In 6.7
The PCErr message may be sent by a PCC or a PCE in response to a
request or in an unsolicited manner.
"may"?
===
In the objects you define reserved fields as MUST be set to zero.
I think you need:
- SHOULD be set to zero
- MUST be ignored on receipt
===
In 7.1 it is unclear, but I think the P flag would only have
meaning on a request, and the I flag on a response.
So actually, you only need one flag (or you have some clarification
text to write)
===
You will also need to explain what it means when a PCRep has the I
flag set for an object that had the P flag set on the PCReq. As
currently phrased this is an encoding error that would require a
PCErr, but that would clearly be a waste of time.
===
In fact, why would you ever send a PCErr in response to a PCRep?
===
At the end of 7.1 you need to explain padding rules
===
Why does the Open need a protocol version number? This is in the
message header so we all know what protocol is being used/supported.
in fact, I think you need to re-write the protocol version support
processing and negotiation.
===
Dead timer.
Should it be "SHOULD be set to zero, and MUST be ignored if the
keepalive timer is zero"
===
The fields in the Open object are a mess!
Can you bring the flags up with the version to improve the byte-
alignment?
===
Note that in the message header the version is four bits but in the
Open object it is three bits.
===
In the Open object
SID (PCEP session-ID - 8 bits): specifies a 2 octet unsigned PCEP
session number that identifies the current session.
Are you using bit compression?
===
Open object TLVs may be out of the scope of this doc (7.2) but the
format needs to be defined here.
===
When present in an Open message, the OPEN object specifies the
proposed PCEP session characteristics. Upon receiving unacceptable
PCEP session characteristics during the PCEP session initialization
phase, the receiving PCEP peer (PCE) MAY include a PCEP object
within
the PCErr message so as to propose alternative session
characteristic
values.
Isn't there an issue with duelling Open messages?
You seem to have an independent Open in each direction, each of
which is responded with a PCErr, leading to a new Open in each
direction.
Is it necessary to have this symmetry? After all, the TCP
connection is always initiated by the PCC, so can't we also have
the Open initiated by the PCC and responded with a PCErr or an Open
by the PCE. Keepalives then follow.
===
7.3
Why is the P flag set in an RP object carried in a PCNtf or PCErr?
No computation action is required.
===
Why is there a reserved piece of the RP object? Is there a
difference between the unused flags bits and the reserved field?
===
7.3.1
Note that it is not required for a PCE to
support the priority field: in this case, it is RECOMMENDED to set
the priority field to 0 by the PCC in the RP object.
Why would it matter what value is set if it is not supported by the
PCE?
Wouldn't it possibly be helpful if the PCE forwards the request to
another PCE?
===
7.3.1
As described here there is no difference between the R and the F
bit. Both are used to recompute a path allowing re-use of resources
already used by the path. Presumably in the case of the F bit, some
information about the failure is also supplied through an
exclusion, but this is not mentioned. The behavior at the PCE for
these two bits would seem to be identical so unless you can give
better separation of the meanings, I suggest you merge the bits.
===
7.3.1
The description of the O-bit is very confusing.
I think the problem lies with the definition of "explicit path"
Loose means that it is acceptable to include loose hops
You have no term to distinguish the use of abstract nodes that
contain more than one network node
Explicit just means an ERO
Strict means that only strict hops are allowed
===
7.3.1
You need to think about how the Request ID is handled when the PCC
restarts. Are you requiring the PCC to have persistent memory to
avoid re-use of a Request ID? (That is how the text reads!)
===
7.3.2
The description of R-bit handling looks broken.
The previous setting of the O-bit is not important because the PCC
is able to use RRO to collect information from the network. Thus,
the question is actually about how detailed the RRO is. The point
you are making remains (that it is hard to compute a proper make-
before-break resource-sharing path if you don't know the previous
path. [Those of you looking at the inter-area/AS PCE problem space
will recognise this as impossible to solve unless stateful PCEs are
used or the PCE is provided with a way to issue an enquiry to the
ABR/ASBR.]
===
7.3.2
If the PCC receives a PCRep message that contains a RP object
referring to an unknown Request-ID-Number, the PCC MUST send a PCErr
message with Error-Type="Unknown request reference".
This feels like at most a "MAY". Why is there a requirement to
respond?
===
7.4
I don't think the description on the inclusion of "failed"
constraints in the PCRep message is in agreement with the BNF in
section 6.5. The BNF appears to imply that an ERO must be present
in order to include, for example, a Metric object.
[By the way: I continue to believe that this business of reporting
the failed constraints is extremely dubious. Where there are two
absolute constraints to be satisfied, it is often impossible to
state which of them led to the failure to compute a path.]
===
7.4
How does a PCC tell the difference between a PCRep that contains
the "failed" constraints echoed from the PCReq, and modified
constraints that could have been satisfied? From the text it
appears that you intend the C-bit to make this distinction, but the
description of the C-bt does not say this.
===
7.4
Wouldn't it be a good idea to put a reason code in the No-Path object?
===
7.5
Can you add a note to say where the end points of the LSP *are*
encoded.
That is, although this PCReq is computing a segment only, it is
important (at least for policy) to know the LSP end points as well.
===
7.6
First paragraph is hard to parse, but I think it says that the
Bandwidth object must always be present in a PCReq! But the second
paragraph explains what it means if it is absent.
===
7.6
Can you give a references to the IEEE FP definition.
===
7.7
Please add a third metric: hop count
I think you need to allow the Metric object to be present multiple
times, and you have this in the text, but it is missing in the BNF.
On the other hand, you need to describe how to interpret multiple
Metric objects for the same Type setting
===
7.7
Why do you need the B-bit? The metric value is set to zero with the
same semantic.
===
7.7
The text appears to say that only one Metric object may be present
with the B-flag cleared. Why?
===
7.8 and 7.9 (and everywhere)
A real nit, but when you write "ERO object" or "RRO object" do you
mean EROO and RROO? :-)
===
7.8
You say that the ERO is bounded by 3209, 3473 and 3477 which is
fine, but how will an MPLS-TE LSR handle an ERO with an explicit
label in it? You also allow future extensions to subobjects.
I think this means that the PCReq must include a way for the
requester to limit the subobjects that will be returned.
===
7.9
RRO is also used for reporting the path of a failed LSP?
===
7.10
What are optional TLVs for?
Perhaps to carry the additional attributes as defined in RFC4420?
===
7.10
Why do you need an LSPA object without resource affinities? This is
pointless. It only exists in 3209 for backwards compatibility.
===
7.10
What is the meaning of holding priority with respect to path
computation?
===
7.10
Why are the Flags and Reserved fields different sizes in the two
objects?
===
7.10
When computing a path to satisfy a request with the L-bit set, how
will the PCE know which links are protected with FRR?
===
7.11
Good to see the IRO
Why is there no XRO?
Exclusion is a standard part of the MPLS-TE MIB (RFC3812 see
mplsTunnelHopInclude). If the exclusion can be configured on the
headend LSR, I think we need to be able to compute the path.
Besides, this is a requirement in RC4657.
===
7.11
Why is it necessary to list the subobjects for the IRO but not for
the ERO or the RRO?
===
7.12.1
I urge caution in the use of "simultaneously" in this section.
===
7.12.1
At the end of this section, you suddenly introduce the term
"bundle". I think this is supposed to refer to the presence of
multiple requests in a single PCReq message. Note that the SVEC can
apply across messages. It would, perhaps be better to explain that
requests in the same message are not necessarily associated by an
SVEC, and that requests in separate messages may be associated by
an SVEC.
===
7.12
You are missing text to describe the use of the SVEC on a PCRep
(allowed according to 6.5).
It feels like you would only use SVEC in the case of an RP, but the
BNF has them as independent.
Please clarify
===
7.12.2
You have MUST NOT for each of your diversity characteristics.
I think you also need "desired" so that you can get "as much
diversity as possible".
How will you handle the overlap between (for example) node and link
diversity? Just needs some text to explain.
===
7.12
Is it possible to end up with a nasty mesh of synchronized
requests? Are there any rules to resolve this?
===
10
The state machine is quite hard to parse for reviewing purposes. It
would be good to at least name the arrows on the figure. My
personal preference is to draw the FSM as a grid because this shows
up missing state/event combinations far more easily.
Is the OpenRetry counter just a substate of OpenWait?
The description of OpenRetry is at odds with the error used (1:5).
The error says a second unacceptable Open has been received, but
the text says that OpenRetry is incremented if the Open is anything
except malformed. Can I suggest that the use of a separate state
and not a substate would help you resolve this type of issue.
LocalOK also looks like a substate. Ditto RemoteOK. Please don't
use substates, they are evil! The flip flop between OpenWait and
KeepWait caused by these two variables and the negotiation of
parameters is ghastly.
I think that what the state machine shows us is that we do not need
independent Open messages in each direction. Anyway, if we *do*
need them, we need correct states to manage them.
===
10
If a malformed PCEP message is received or the TCP connection fails,
the systems sends a PCEP CLOSE message, the system releases the PCEP
resources for that PCEP peer, closes the TCP connection and moves to
the Idle State.
This seems to be in contradiction with text in the main body of the
draft and with the existence of error code 6.
===
App A
Is this appendix just present to help you track the requirements
you still need to satisfy, or do you intend to leave some
requirements unanswered?
BTW, I don't think this is a full list. Route exclusions is not
listed.
What are you doing to ensure that the list is complete?
===
I think Kenji's email address may have changed
Same for Jean-Louis
_______________________________________________
Pce mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/pce