Hi Dimitri,

I forgot to mention a key point ...

This ID meets a specific requirement of RFC4927.

5.  Manageability Considerations

   The inter-area application implies some new manageability
   requirements in addition to those already listed in [RFC4657].  The
   PCECP PCC and PCE MIB modules MUST allow recording the proportion of
   inter-area requests and the success rate of inter-area requests.  The
   PCECP PCC MIB module MUST also allow recording the performances of a
   PCE chain (minimum, maximum, and average response times), in case of
   multiple-PCE inter-area path computation.

   It is really important, for diagnostic and troubleshooting reasons,
   to monitor the availability and performances of each PCE of a PCE
   chain used for inter-area path computation.  Particularly, it is
   really important to identify the PCE(s) responsible for a delayed
   reply.

   Hence, a mechanism MUST be defined to monitor the performances of a
   PCE chain.  It MUST allow determining the availability of each PCE of
   the chain as well as its minimum, maximum, and average response
   times.
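
To make the requirement concrete, here is a minimal sketch of the per-PCE bookkeeping it implies (availability plus minimum, maximum, and average response times). The class name `PceStats`, its fields, and the sample values are illustrative assumptions only; nothing here is taken from RFC4927 or from the monitoring ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PceStats:
    """Running statistics for one PCE of a chain (hypothetical structure)."""
    requests: int = 0
    replies: int = 0
    min_ms: Optional[float] = None
    max_ms: Optional[float] = None
    total_ms: float = 0.0

    def record(self, response_ms: Optional[float]) -> None:
        """Record one monitoring request; None means no reply was received."""
        self.requests += 1
        if response_ms is None:
            return
        self.replies += 1
        self.total_ms += response_ms
        self.min_ms = response_ms if self.min_ms is None else min(self.min_ms, response_ms)
        self.max_ms = response_ms if self.max_ms is None else max(self.max_ms, response_ms)

    @property
    def availability(self) -> float:
        """Fraction of monitoring requests that were answered."""
        return self.replies / self.requests if self.requests else 0.0

    @property
    def avg_ms(self) -> Optional[float]:
        """Average response time over answered requests, or None if none."""
        return self.total_ms / self.replies if self.replies else None


# Made-up samples for one PCE: three replies and one lost request.
stats = PceStats()
for sample in (12.0, None, 30.0, 18.0):
    stats.record(sample)
```

One such record per PCE of the chain would be enough to report the figures listed above; how the samples are actually collected is outside the scope of this sketch.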

Thanks.

JP.


Begin forwarded message:

From: JP Vasseur <[EMAIL PROTECTED]>
Date: August 21, 2007 9:45:06 AM EDT
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: [Pce] Re: PCE monitoring doc

Hi,

On Aug 20, 2007, at 3:07 AM, dimitri papadimitriou wrote:

hi,

JP Vasseur wrote:
Hi,
On Aug 16, 2007, at 4:20 AM, dimitri papadimitriou wrote:
hi j-p

as far as i remember, this doc. is still open for discussion from San Diego and Prague mtg discussion.
An ID is always open for discussion as long as it is not an RFC.

just want to be sure this doc. is NOT a done deal as
presented for a couple of meetings.

Not sure what you mean (since it was not presented at all
in Chicago), but that's OK.


here below for the record

<http://www3.ietf.org/proceedings/06nov/minutes/pce.txt>

v03 has been reworked but does not provide answer to the concerns expressed so far - quoting the doc.

> In PCE-based environments, it is critical to monitor the state of the

critical for what ? if computation time is an issue why delegate it (isn't that the safest assumption ?)
Looks like you are quibbling on words here ... Do you want to replace "critical" by "useful", is that your point ?

this does not answer the real question i think (the real
point is why is it "critical" or "useful" to receive this
info from the PCE while a local timer on the PCC can very
simply determine the delay experienced between initiation
of the request and the reception of the response)


The aim of the newly defined messages is of course NOT to determine
the end-to-end delay *but* each delay along the PCE
chain, to identify a potential issue, get statistics, ... If your PCE
chain is made of 2 PCEs, it will tell you what delays have
been experienced along that chain on each PCE (queueing,
processing, ...).

> path computation chain for troubleshooting and performance monitoring purposes:

troubleshooting of what ?
I think that I gave this explanation a few times already ... Suppose that the head-end experiences long response times. I think that we can agree on the fact that this is potentially an issue, in which case the user may want to troubleshoot by issuing such a request in order to determine the location (which PCE of the chain) and the root cause.

does not help. if there are real issues an answer may never
be received and the information received may be erroneous
anyway

note also that it would be rather surprising that PCE and
PCC can not make use of local timers to determine such
kind of delays (as i said the PCC against the PCE and
each PCE against the peering PCE)

The local timer gives you the overall end to end time (already
available on several PCC/PCE implementations) but does not
tell you which delays have been experienced on each element
along the PCE chain. You could have a Telnet session on each
PCE in your network and upon issuing a request try to log all
events and figure this out (try this ...) or have a PCE message that
does it for you from the head-end (this is what this ID is about).
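
To illustrate the difference, here is a rough sketch. The per-PCE figures, the PCE names, and the helper `slowest_pce` are made up for the example, and the sketch ignores propagation delays between the elements. A local timer would only see the single end-to-end number, whereas per-PCE delays point directly at the element responsible.

```python
from typing import Dict, Tuple


def slowest_pce(per_pce_ms: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return (name, total_ms) of the PCE with the largest queueing + processing delay."""
    totals = {name: queueing + processing
              for name, (queueing, processing) in per_pce_ms.items()}
    name = max(totals, key=totals.get)
    return name, totals[name]


# Hypothetical (queueing_ms, processing_ms) reported by each PCE of a 2-PCE chain.
chain = {"pce-a": (2.0, 15.0), "pce-b": (140.0, 20.0)}

# What a local timer on the head-end would roughly observe: one number.
end_to_end_ms = sum(q + p for q, p in chain.values())

# What the per-PCE breakdown adds: the location of the delay.
culprit, delay_ms = slowest_pce(chain)
```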


if there is congestion/troubles how would you ensure the information received back is accurate ?
The information of the waiting times, CPU utilization, ... is provided by the PCE.

The issue of accuracy is no different than for any other information provided for example by the MIB.

if you are after MIB information then i suspect that it
is also available at the SNMP management level ... why
a specific protocol mechanism to correlate it ?

First I did say that you had similar information provided by SNMP.
I was using the example of SNMP to answer your point about
the accuracy issue. Here we do collect statistics about queueing
delays, processing delays and so on ... so does SNMP, this is no
different.

Back to your question "Why a specific protocol mechanism to correlate
it ?". Because you may want to have on-line mechanisms that do
not require an NMS, that's it. Note that this is no different than Ping,
LSPPing, ...


i miss why LSRs shall now start performing troubleshooting
of PCEs ... i thought that PCEs were used to decrease
load on LSRs not the contrary (a PCE can be blocking /
defective only exceptionally, otherwise it results in more
troubles than it really improves performance)


Dimitri, yes the aim of the PCE is to help solve various issues.
Note that in case of inter-area, this is usually not a load issue but
a visibility issue, I guess that you noticed that. But when you operate
a network you also need tools to make sure that things work as
expected and there might be circumstances under which they don't
thus the reason for such tools.

What a strange discussion ... it looks like you're questioning the
usefulness of OAM tools in general ... ? Again look at LSPPing: LSRs
are of course expected to work, but still OAM tools are useful.

> liveness of each element (PCE) involved in the PCE chain,

if i remember well the PCE is a client-server model (fundamental assumption about the PCE approach) hence, why does the client need to know the "chain" of PCE servers ?
Strange question ... Try to operate a network and you'll immediately figure out.

i don't think you have captured the question. the point
is not whether the server shall be accountable but why
the PCC shall be made aware about the "chain"

the notion of the PCE chain is confusing from this point
of view

Because when you use a set of PCEs and you experience an issue
along the PCE chain, you want to know where.


Back to the previous case, consider the case of an inter-domain TE deployment where the number of PCE chains may potentially be large. Isn't it useful during a troubleshooting event for example to know the set of PCEs involved in a PCE chain ? (or to check a particular PCE chain).

well i am not sure to follow the operational process you
are referring to,

I can see that.

if PCCs need to check/monitor and
validate each computed segment per PCE then it is also
better to maintain autonomous segment computation and
prevent the overhead of additional cross-verification

this triggers to me the following point, before devising
methods for troubleshooting and monitoring computational
results it may be appropriate to have a much better view
of the usage of PCE before progressing on "tools"

the same is ongoing at CCAMP where in-depth work has
been started in understanding the needs based on the
operational practice

> detection of potential resource contention states,

at t[0] contention, at t[1] message for perf.mon -> are running conditions identical ? since probabilistically they are not, what is the expectation behind this mechanism ?

would it be possible to have a "curve" of the deterioration of the performance as with an increasing number of computed path the number of "monitoring messages" will also increase ?
Sure but ... this is true for all OAM tools.

nope. it has never been expected that the PCE gets dimensioned
to the total number of computed LSPs over time (except for the
stateful PCEs) meaning that a stateless PCE was only expected
to support a certain number of requests per unit of time (and
this independently of the accumulated number of computed paths)


So ?

I read the comments below and I think that they all fall
into the same category. It looks like you do not think that
on-line tools to gather data about the PCE chain are
useful. Apparently several SPs do think that this is a useful
thing to have. If you have any comments to improve the ID
they are always very much welcome.

Thanks.

JP.


If you use a ping to locate a congestion spot, by the time you get your reply the network state may have changed 10 times ... but if you use it because of a sustained congestion state then the ping may help you to locate the problem.

echo request/reply is not equivalent to resource consumption
measurement / monitoring on certain nodes and number of cycles
required for path computation not sure to really capture the
analogy

The exact same reasoning applies here. Note that you can also
retrieve historical data computed over a period of time (this is a matter of local configuration on the PCE), in which case you could retrieve the averaged computation time (using for example a low-pass filter).
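
As a rough illustration of what "averaged using a low-pass filter" could look like, here is a sketch using an exponentially weighted moving average. The function name `low_pass` and the smoothing factor `alpha` are assumptions for the example; the ID does not mandate any particular filter.

```python
from typing import Iterable, Optional


def low_pass(samples: Iterable[float], alpha: float = 0.2) -> Optional[float]:
    """Exponentially weighted moving average of computation times.

    alpha is the weight given to the newest sample (0 < alpha <= 1);
    a smaller alpha smooths more. Returns None if there are no samples.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    avg: Optional[float] = None
    for x in samples:
        # The first sample seeds the filter; later samples are blended in.
        avg = x if avg is None else alpha * x + (1.0 - alpha) * avg
    return avg


# Made-up computation times in ms: a short spike barely moves the average.
smoothed = low_pass([10.0, 10.0, 50.0, 10.0], alpha=0.2)
```

With such a filter a single transient spike has limited effect on the reported value, while a sustained increase in computation time still shows up.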

knowing the computational interval that could result from the
use of different PC algorithms and the various conditions (both
in terms of input and local resources) i am sure that off-line
statistics would be more suitable than real-time collection per
PCC

> statistics in term of path computation times are examples of such metrics of interest.

interest to who and for which purpose (detection is fine but what is the issue to be solved) ?
Again sorry ... but strange question ... Interest to the user of course.

it is not a strange question it is the initial question: which
problem are you trying to solve then one could try to assess
which method is appropriate

Before fixing a problem it is usually useful to locate the root cause of the problem. This tool may help for that purpose.

hence your problem is the computational limitation of a PCE
what could a PCC do about it ? is sending back a
time value of any use ? (delta between send/receive allows
the PCC to obtain the same kind of information assuming that
the PCE remains "reachable" i.e. no communication channel issue)

like any system, PCE requires suitable planning and dimensioning wrt perf. objectives. i have the impression that these fundamental design steps are skipped.

let's start discussion with this.

side note: the document states "In this document we call a "state metric" a metric that characterizes a PCE state" -> need to define the latter.
Sure.

indeed, this question is still open since end of 2004.

-d.
JP.

thanks,
-d.


JP Vasseur wrote:
Hi,
Just to let you know about an IPR disclosure that has been filed with the IETF in relation to http://tools.ietf.org/id/draft-vasseur-pce-monitoring-03.txt. You can see the disclosure at https://datatracker.ietf.org/ipr/872/ and see the terms offered by the IPR claimant.
Thanks,
JP.
_______________________________________________
Pce mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/pce