From: JP Vasseur <[EMAIL PROTECTED]>
Date: August 21, 2007 9:45:06 AM EDT
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: [Pce] Re: PCE monitoring doc
Hi,
On Aug 20, 2007, at 3:07 AM, dimitri papadimitriou wrote:
hi,
JP Vasseur wrote:
Hi,
On Aug 16, 2007, at 4:20 AM, dimitri papadimitriou wrote:
hi j-p
as far as i remember, this doc. is still open for discussion
from San Diego and Prague mtg discussion.
An ID is always open for discussion as long as it is not an RFC.
just want to be sure this doc. is NOT a done deal, as it has
been presented for the last couple of meetings.
Not sure what you mean (since it was not presented at all
in Chicago), but that's OK.
here below for the record
<http://www3.ietf.org/proceedings/06nov/minutes/pce.txt>
v03 has been reworked but does not provide answer to the
concerns expressed so far - quoting the doc.
> In PCE-based environments, it is critical to monitor the
> state of the
critical for what ? if computation time is an issue why delegate
it (isn't that the safest assumption ?)
It looks like you are quibbling over words here ... Do you want to
replace "critical" with "useful", is that your point ?
this does not answer the real question i think (the real
point is why is it "critical" or "useful" to receive this
info from the PCE while a local timer on the PCC can very
simply determine the delay experienced between initiation
of the request and the reception of the response)
The aim of the newly defined messages is NOT to determine
the end to end delay of course *but* each delay along the PCE
chain, to identify a potential issue, gather statistics, ... If your PCE
chain is made of 2 PCEs, it will tell you what delays have
been experienced along that chain on each PCE (queueing,
processing, ...).
> path computation chain for troubleshooting and performance
> monitoring purposes:
troubleshooting of what ?
I think that I gave this explanation a few times already ...
Suppose that the head-end experiences long response times. I
think that we can agree on the fact that this is
potentially an issue, in which case the user may want to
troubleshoot by issuing such a request in order to determine the
location (which PCE of the chain) and the root cause.
does not help. if there are real issues an answer may never
be received and the information received may be erroneous
anyway
note also that it would be rather surprising that PCE and
PCC can not make use of local timers to determine such
kind of delays (as i said the PCC against the PCE and
each PCE against the peering PCE)
The local timer gives you the overall end to end time (already
available on several PCC/PCE implementations) but does not
tell you which delays have been experienced on each element
along the PCE chain. You could have a Telnet session on each
PCE in your network and upon issuing a request try to log all
events and figure this out (try this ...) or have a PCE message that
does it for you from the head-end (this is what this ID is about).
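To make the distinction concrete, here is a minimal sketch of how per-PCE delays let the head-end locate the slow element, whereas a local timer only yields a total. This is an illustration only and is not taken from the ID: the `PceHopReport` type, its field names and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PceHopReport:
    """Hypothetical per-PCE report; field names are illustrative, not from the ID."""
    pce_id: str
    queueing_ms: float
    processing_ms: float

    @property
    def total_ms(self) -> float:
        # Delay contributed by this single PCE.
        return self.queueing_ms + self.processing_ms


def in_pce_delay_ms(reports):
    """Sum of all per-PCE delays (what a local timer sees, minus transport)."""
    return sum(r.total_ms for r in reports)


def slowest_hop(reports):
    """Return the report of the PCE that contributed the largest delay."""
    return max(reports, key=lambda r: r.total_ms)


# Made-up numbers for a chain of 2 PCEs.
chain = [
    PceHopReport("pce-1", queueing_ms=2.0, processing_ms=15.0),
    PceHopReport("pce-2", queueing_ms=180.0, processing_ms=40.0),
]

worst = slowest_hop(chain)
print(f"total in-PCE delay: {in_pce_delay_ms(chain):.1f} ms")
print(f"slowest element: {worst.pce_id} "
      f"(queueing {worst.queueing_ms:.1f} ms, processing {worst.processing_ms:.1f} ms)")
```

With only the total, the head-end knows the request was slow; with the per-PCE breakdown it can also see that the second PCE's queue is where the time went.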
if there is congestion/troubles how would you ensure the
information received back is accurate ?
The information of the waiting times, CPU utilization, ... is
provided by the PCE.
The issue of accuracy is no different than for any other
information provided for example by the MIB.
if you are looking for MIB information then i suspect that it
is also available at the SNMP management level ... why
a specific protocol mechanism to correlate it ?
First I did say that you had similar information provided by SNMP.
I was using the example of SNMP to answer your point about
the accuracy issue. Here we do collect statistics about queueing
delays, processing delays and so on ... so does SNMP, this is no
different.
Back to your question "Why a specific protocol mechanism to correlate
it ?". Because you may want to have on-line mechanisms that do
not require a NMS system, that's it. Note that this is no different
than Ping, LSPPing, ...
i fail to see why LSRs should now start performing troubleshooting
of PCEs ... i thought that PCEs were used to decrease
load on LSRs, not the contrary (a PCE should be blocking /
defective only exceptionally, otherwise it results in more
trouble than actual performance improvement)
Dimitri, yes the aim of the PCE is to help solve various issues.
Note that in case of inter-area, this is usually not a load issue but
a visibility issue, I guess that you noticed that. But when you
operate a network you also need tools to make sure that things work as
expected and there might be circumstances under which they don't,
thus the reason for such tools.
What a strange discussion ... it looks like you're questioning the
usefulness of OAM tools in general ... ? Again look at LSPPing: LSR
are of course expected to work, but still OAM tools are useful.
> liveness of each element (PCE) involved in the PCE chain,
if i remember well the PCE is a client-server model (fundamental
assumption about the PCE approach) hence, why does the client need
to know the "chain" of PCE servers ?
Strange question ... Try to operate a network and you'll
immediately figure out.
i don't think you have captured the question. the point
is not whether the server shall be accountable but why
the PCC shall be made aware about the "chain"
the notion of the PCE chain is confusing from this point
of view
Because when you use a set of PCEs and you experience an issue
along the PCE chain, you want to know where.
Back to the previous case, consider the case of an inter-domain
TE deployment where the number of PCE chains may
potentially be large. Isn't it useful during a troubleshooting event
for example to know the set of PCEs involved in a PCE chain ? (or
to check a particular PCE chain).
well i am not sure to follow the operational process you
are referring to,
I can see that.
if PCCs need to check/monitor and
validate each computed segment per PCE then it is also
better to maintain autonomous segment computation and
prevent the overhead of additional cross-verification
this triggers to me the following point, before devising
methods for troubleshooting and monitoring computational
results it may be appropriate to have a much better view
of the usage of PCE before progressing on "tools"
the same is ongoing at CCAMP where in-depth work has
been started in understanding the needs based on the
operational practice
> detection of potential resource contention states
at t[0] contention, at t[1] message for perf.mon -> are running
conditions identical ? since probabilistically they are not, what is the
expectation behind this mechanism ?
would it be possible to have a "curve" of the deterioration of
the performance as with an increasing number of computed path
the number of "monitoring messages" will also increase ?
Sure but ... this is true for all OAM tools.
nope. it has never been expected that the PCE gets dimensioned
to the total number of computed LSP over time (except for the
stateful PCEs) meaning that a stateless PCE was only expected
to support a certain number of request per unit of time (and
this independently of the accumulated number of computed paths)
So ?
I read the comments below and I think that they all fall
into the same category. It looks like you do not think that
on-line tools to gather data about the PCE chain are
useful. Apparently several SPs do think that this is a useful
thing to have. If you have any comments to improve the ID
they are always very much welcome.
Thanks.
JP.
If you use a ping to locate a congestion spot at the time you get
your reply the network state may have changed 10 times ... but if
you use it because of a sustained congestion state then the ping
may help you to locate the problem.
echo request/reply is not equivalent to resource consumption
measurement / monitoring on certain nodes and number of cycles
required for path computation; i am not sure i really capture the
analogy
The exact same reasoning applies here. Note that you can also
retrieve historical data computed over a period of time, this is
a matter of local configuration on the PCE, in which case you
could retrieve the averaged (using for example a low pass
filter) computation time.
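As a rough sketch of what "averaged using a low pass filter" could look like on the PCE side: the ID does not mandate any particular filter, so the exponentially weighted moving average below, the function name `low_pass` and the weight `alpha` are assumptions made purely for illustration.

```python
def low_pass(samples, alpha=0.2):
    """Exponentially weighted moving average (a first-order low-pass filter).

    `alpha` is a hypothetical, locally configured weight in (0, 1]:
    small values smooth more, large values follow recent samples more closely.
    Returns the filtered value after each sample.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    averaged = []
    current = None
    for sample in samples:
        # The first sample seeds the filter; later ones are blended in.
        current = sample if current is None else alpha * sample + (1.0 - alpha) * current
        averaged.append(current)
    return averaged


# Made-up path computation times in ms, with one transient spike.
computation_times_ms = [10.0, 10.0, 100.0, 10.0]
print(low_pass(computation_times_ms, alpha=0.5))
```

A single spike moves the averaged value only part of the way, while a sustained increase in computation time would pull it up steadily, which is the sustained-congestion case the ping analogy refers to.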
knowing the computational interval that could result from the
use of different PC algorithms and the various conditions (both
in terms of input and local resources) i am sure that off-line
statistics would be more suitable than real-time collection per
PCC
> statistics in term of path computation times are examples of
> such metrics of interest.
interest to who and for which purpose (detection is fine but
what is the issue to be solved) ?
Again sorry ... but strange question ... Interest to the user of
course.
it is not a strange question it is the initial question: which
problem are you trying to solve then one could try to assess
which method is appropriate
Before fixing a problem it is usually useful to locate the root
cause of the problem. This tool may help for that purpose.
hence your problem is the computational limitation of a PCE.
what could a PCC do about it ? is sending back a
time value of any use ? (the delta between send/receive allows
the PCC to obtain the same kind of information assuming that
the PCE remains "reachable" i.e. no communication channel issue)
like any system, a PCE requires suitable planning and dimensioning
wrt perf. objectives; i have the impression that these fundamental
design steps are skipped.
let's start discussion with this.
side note: the document states "In this document we call a
"state metric" a metric that characterizes a PCE state" -> need
to define the latter.
Sure.
indeed, this question is still open since end of 2004.
-d.
JP.
thanks,
-d.
JP Vasseur wrote:
Hi,
Just to let you know about an IPR disclosure that has been filed
with the IETF in relation to
http://tools.ietf.org/id/draft-vasseur-pce-monitoring-03.txt.
You can see the disclosure at
https://datatracker.ietf.org/ipr/872/ and see the terms offered
by the IPR claimant.
Thanks,
JP.
_______________________________________________
Pce mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/pce