Re: [GROW] Proposed updates to GROW charter

2020-07-29 Thread UTTARO, JAMES
I understand that GROW is specific to Global Routing Operations at the current 
time. Is the intention of the new charter to address BGP as a protocol when 
explicitly dealing with routing state for standalone or inter-connected ... or 
is the scope expanded to include BGP as it is currently deployed to support 
non-routing and hybrid use cases?  

BGP is used across multiple FOUs, e.g. Kompella, EVPN, FlowSpec, BGP-LS 
etc... Each of these presents unique operational challenges. 

Will the charter address these and other FOUs BGP is used for?

Thanks,
Jim Uttaro

-----Original Message-----
From: GROW  On Behalf Of Christopher Morrow
Sent: Wednesday, July 29, 2020 5:17 PM
To: Alvaro Retana 
Cc: grow@ietf.org; Warren Kumari 
Subject: Re: [GROW] Proposed updates to GROW charter

On Wed, Jul 29, 2020 at 8:54 PM Alvaro Retana  wrote:
>
> Job:
>
> Hi!
>
>
> Thanks for addressing my comments!
>
>
> I have just a couple more things:
>
> - The use of "Internet networks" doesn't sound right...perhaps "Internet-
>   connected networks"?   Looking at the rest of the charter, I assume that,

Job and I had a concern (I think we both shared this concern) about
limiting the problems/solutions/monitoring/etc to
'internet-connected'.
I think that 'bgp' (or external routing protocols) are used on both
'internet connected' and 'all the other' IP networks, right?
should the benefits GROW pushes for be limited to 'internet connected' ?

(I agree the current wording could use some help, how do we capture
the idea that any IP network that ends up using a bgp can benefit?)

>   for example, the operations of BGP in a non-Internet-connected network (a
>   data center, for example) is not within the scope of grow.  Is that the
>   intent, or am I reading too much into it?

i think we were hoping to cover both, actually. our wording choice
wasn't great :)

> - [nit] The last goal talks about "preventing malpractice in the global
>   routing system".  I'm not sure documentation can stop someone from doing
>   the wrong thing -- especially if they want to.  Maybe this goal can be
>   worded positively: Document best practices and recommendations to assist
>   in the proper operation of the global routing system.

sounds ok to me :)

> Thanks!
>
> Alvaro.
>
>
>
> On July 29, 2020 at 1:07:06 PM, Job Snijders
> (j...@ntt.net(mailto:j...@ntt.net)) wrote:
>
> > Dear all,
> >
> > Below is a third revision of the charter proposal, we attempted to
> > incorporate all the feedback received so far, specifically Alvaro's.
> >
> > Please let us know your feedback!
> >
> > Kind regards,
> >
> > Job & Chris
> >
> > --
> >
> > Charter for Working Group
> > =========================
> >
> > The purpose of GROW is to consider the operational problems associated
> > with the Internet Protocol (IP) global routing systems, including but
> > not limited to default-free zone routing table growth, effects of the
> > interactions between interior and exterior routing protocols, the effect
> > of address allocation policies, or practices on the global routing
> > system. Where appropriate, GROW documents the operational aspects of
> > measurement, monitoring, policy, routing system security, VPN
> > infrastructures, or safe default behavior of IP routing protocol
> > implementations and deployments.
> >
> > GROW will also advise various working groups, specifically IDR and
> > SIDROPS, with respect to whether it is addressing the relevant
> > operational and routing security requirements of Internet networks,
> > and where appropriate, suggest course corrections. Finally, operational
> > requirements developed in GROW can also be used by any working group
> > chartered with standardizing a next generation inter-domain routing
> > protocol.
> >
> > GOALS
> > -----
> >
> > * Provide stewardship and maintenance for the BGP Monitoring Protocol (BMP)
> > * Provide stewardship and maintenance for the Multi-Threaded Routing
> > Toolkit (MRT) Routing Information Export Format
> > * Document Best Current Practises for operations of the Internet global
> > routing system.
> > * Analyze aspects for supporting new applications, including extending
> > existing routing protocols or creating new ones. This includes risk,
> > interference, and application fit.
> > * Document the operational aspects of securing the Internet routing
> > system, and provide recommendations to other WGs.
> > * Provide documentation to assist in preventing malpractice in the
> > global routing system.
> >
> > Milestones
> > ----------
> >
> > 2020 - "Support for Local RIB in BGP Monitoring Protocol (BMP)" to IESG
> > 2020 - "BMP Peer Up Message Namespace" to IESG
> > 2020 - "Revision to Registration Procedures for Multiple BMP Registries" to 
> > IESG.
> > 2021 - "Document negative consequences of de-aggregating received routes 
> > for traffic engineering purposes" - to IESG
> > 2021 - "TLV support for BMP Route Mo

Re: [GROW] Limiting AS path length?

2019-09-17 Thread UTTARO, JAMES
+1

From: GROW  On Behalf Of Job Snijders
Sent: Monday, September 16, 2019 2:37 PM
To: Jared Mauch 
Cc: Iljitsch van Beijnum ; grow@ietf.org
Subject: Re: [GROW] Limiting AS path length?

From the perspective of the IETF RFC publication process, in the context of 
providing operational guidance, the advice on AS_PATH length probably 
shouldn’t be “limit the path length to avoid vendor bugs”.

Instead, the guidance perhaps should be “please report and fix bugs”, right? :-)

Kind regards,

Job
___
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow


Re: [GROW] Prefix limit ORF: add grace time

2019-04-02 Thread UTTARO, JAMES
+1..

VRF-Limit and Session-Limit are used together to protect a router.. The 
VRF-Limit is imposed so that multiple sessions for a given VPN customer cannot 
overflow the router through the aggregate number of routes coming over all 
sessions for said VPN.
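
As a rough illustration of how the two limits interact, here is a minimal Python sketch. The class name, field names and numbers are hypothetical and not taken from any vendor implementation; it only assumes that the session limit caps each session individually while the VRF limit caps the sum over all sessions of one VPN.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    """Hypothetical model of one VPN's route counters on a PE."""
    vrf_limit: int        # cap on routes summed over all sessions of this VPN
    session_limit: int    # cap on routes learned over any single session
    routes: dict = field(default_factory=dict)  # session name -> route count

    def check(self) -> list:
        """Return the names of the limits that are currently exceeded."""
        exceeded = [
            f"session:{name}"
            for name, count in sorted(self.routes.items())
            if count > self.session_limit
        ]
        if sum(self.routes.values()) > self.vrf_limit:
            exceeded.append("vrf")
        return exceeded


vrf = Vrf(vrf_limit=1000, session_limit=600, routes={"ce1": 550, "ce2": 500})
print(vrf.check())
```

In this example neither session trips its own limit, yet the VPN as a whole exceeds the VRF limit, which is the case the aggregate limit exists to catch.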

Jim Uttaro

From: GROW  On Behalf Of Robert Raszuk
Sent: Tuesday, April 02, 2019 5:35 AM
To: Maria Matějka 
Cc: grow@ietf.org
Subject: Re: [GROW] Prefix limit ORF: add grace time

Hi,

Prefix-limit is a feature which was originally invented to protect BGP table 
size in the case of L3VPN, such that any VPN site with a dynamic routing 
protocol could not overflow the control plane capacity of the PE it is attached to.

A bit later it has been extended to cover also BGP sessions not related to 
VPNs/VRFs.

As such the prefix limit behavior is an independent vendor choice on how to 
implement it. For example there can be a knob not to reset the session at all 
but just log a warning. There can be a knob to auto-restart the session, etc. 
To the best of my knowledge there is no spec defining what prefix limit inbound 
or outbound (if we ever get there) should do.
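
To make the range of behaviours concrete, a small Python sketch follows. The action names are invented for illustration only; no spec defines them, and real implementations differ in both naming and semantics.

```python
from enum import Enum


class OnExceed(Enum):
    """Hypothetical knobs for what happens when the prefix limit is exceeded."""
    WARN_ONLY = "log a warning, keep the session"
    RESET = "tear the session down"
    RESET_AND_RESTART = "tear down, then re-establish after a timer"


def evaluate(prefix_count: int, limit: int, action: OnExceed) -> str:
    """Return 'ok' while within the limit, else the configured action name."""
    if prefix_count <= limit:
        return "ok"
    return action.name.lower()


print(evaluate(700_000, 10, OnExceed.WARN_ONLY))
```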

Job had a presentation during last GROW meeting on that.

Here the draft from Keyur only defines the ORF mechanism on how to communicate 
such a limit between peers. It really does not define how the prefix limit 
should be handled by either side.

Maybe we need a new document to specify it, provided vendors would agree to 
adjust their current zoo of implementations.

Kind regards,
R.

On Tue, Apr 2, 2019 at 11:21 AM Maria Matějka <jan.mate...@nic.cz> wrote:
Thank you for clarifying this. It therefore makes no sense to make it more
complicated in this way.

Anyway, I'd like to have there at least a note saying that if the prefix
limit is too tight, you may get an unwanted bgp session drop simply due
to temporary conditions. Is this reasonable?
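
To illustrate the kind of temporary condition meant here, the following Python sketch counts routes during a make-before-break replacement of eight /48s by their covering /45. It uses only the standard ipaddress module; the helper name peak_route_count is made up for this example, and the prefixes are from the IPv6 documentation range.

```python
import ipaddress

# Eight more-specific prefixes and the single aggregate that replaces them.
specifics = [ipaddress.ip_network(f"2001:db8:{i:x}::/48") for i in range(8)]
aggregate = ipaddress.ip_network("2001:db8::/45")

# Sanity check: the eight /48s collapse exactly into the /45.
assert list(ipaddress.collapse_addresses(specifics)) == [aggregate]


def peak_route_count(old, new):
    """Make-before-break: the new routes are announced before the old ones
    are withdrawn, so for a moment both sets are present at the receiver."""
    return len(set(old) | set(new))


peak = peak_route_count(specifics, [aggregate])
print(peak)  # 9
```

A receiver whose limit was set to exactly the steady-state count of 8 would see the limit exceeded for the duration of the transition, even though the end state holds only one route.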

Thanks
Maria

On 4/1/19 7:54 PM, Robert Raszuk wrote:
> Well you still need to indicate to prefix limit check when it should
> start x2 multiplication and when the specific upgrade which requires it
> is over.
>
> Note that vast majority of customers use prefix limit as a safety fuse
> normally allowing x2 or even x5 under normal operation.
>
> Best,
> R.
>
> On Mon, Apr 1, 2019, 19:50 Maria Matějka <jan.mate...@nic.cz> wrote:
>
> Oops, I forgot to write there that during the grace period the limit
> should only be temporarily raised by a factor of 2 to allow complete
> route exchange, not lifted altogether.
>
> I think this would be much simpler for the users than fiddling with
> the limit more times.
>
> Thx
> Maria
>
> On April 1, 2019 6:30:45 PM GMT+02:00, Robert Raszuk
> <rob...@raszuk.net> wrote:
>
> Hi Maria,
>
> So your suggestion is not to apply this limit at all (ie. have
> unlimited transient) - don't you think that in such a case your
> weakest network elements may just crash if say they expect 10
> and will get 700K prefixes ?
>
> I think what you are asking for can be easily achieved today
> during described migration if you adjust the prefix limit to
> some controlled higher value before your planned policy change
> then simply revert it back. Seems like very safe and problem
> free operation.
>
> Many thx,
> R.
>
> On Mon, Apr 1, 2019 at 5:29 PM Maria Matějka <jan.mate...@nic.cz> wrote:
>
> Hello!
>
> I'd like to suggest adding a grace time to the Prefix Limit ORF-Type.
> Its purpose is to allow temporary overrun of the limit while reloading
> the routes after a policy is changed.
>
> Why: If the peer exports e.g. 2001:db8:0::/48 through 2001:db8:7::/48
> and it wants to replace them with 2001:db8::/45, it first has to add
> the less specific prefix and then drop the more specific prefixes.
> Doing this on a large scale may overrun the limits temporarily, which
> would lead to an unneeded BGP session drop.
>
> Here are the changes to be done to the RFC text:
>
> * append to section 3:
> The "Grace-Time" is a two-byte unsigned integer. It indicates
> the number of seconds for which the Prefix-Limit can be exceeded.
>
> * append to section 4 the Grace-Time directly after the Prefix-Limit
>
> * insert to section 6.1..1 after 2nd paragraph:
> The sending speaker MUST wait for the Grace-Time period before
> taking corrective action. If the peer gets from the
>   

Re: [GROW] A Simple BGP-based Mobile Routing System for the Aeronautical Telecommunications Network

2018-03-14 Thread UTTARO, JAMES
Fred,

Took a read on the draft.. A few comments below..

First, my understanding of this solution is that it is BGP underlay, BGP 
overlay.. Is this correct? If so, what is the ask in Section 6, last 
paragraph? If it is to create multiple instances of BGP, would that be using 
Local-AS, Dual-AS etc… or actually creating two BGP instances, each with 
its own AS? We have had that discussion at IETF and it is similar to an ask I 
have, where I am using BGP 3107 as underlay and 2547, VPN etc… as overlay..

There is no discussion of the actual topology in terms of s-ASBRs and c-ASBRs.. 
Is there some notion of geography, nation, continent etc…?

Sect 3 ( Top of Pg 8 )

“Since ATN/IPS end systems …”
You indicate that airplanes remain in the stub area for an extended period of 
time and therefore intra-domain mobility events are handled in the stub area 
only. How is this possible? There must come a time where the MNP for said 
aircraft will not be available via a given cell tower or cellular provider.  
What am I missing here..

Sect 3 ( Middle of Pg 8 )

It seems that you are going to apply route filters on the c-ASBRs so that a 
given set of them will service a given set of MSPs.

a) Are these filters applied inbound or outbound?

b) Is this generally static, or will operations be mucking with these often? 
More hands equals more problems, in my experience.

c) What happens on a mis-config? What is the blast radius of getting this 
wrong?

Thanks,
Jim Uttaro


From: GROW [mailto:grow-boun...@ietf.org] On Behalf Of Templin, Fred L
Sent: Wednesday, March 14, 2018 2:19 PM
To: Christopher Morrow 
Cc: grow@ietf.org; Saccone, Gregory T ; Gaurav 
Dawra 
Subject: Re: [GROW] A Simple BGP-based Mobile Routing System for the 
Aeronautical Telecommunications Network

Chris,

I forgot to mention that one of the key requirements is that there be no
dynamic routing protocol running over the air-to-ground data links. The
data links we are talking about have data rates as low as 1Mbps and even
lower, and the civil aviation community has declared that the control
message overhead must be kept to a minimum. So, placing a BGP
speaker on-board the airplane would not be acceptable.

Is there interest in having a presentation about this in London next
week?

Thanks - Fred

From: GROW [mailto:grow-boun...@ietf.org] On Behalf Of Templin, Fred L
Sent: Tuesday, March 13, 2018 8:57 AM
To: Christopher Morrow <christopher.mor...@gmail.com>
Cc: grow@ietf.org; Saccone, Gregory T <gregory.t.sacc...@boeing.com>; Gaurav 
Dawra <gdawra.i...@gmail.com>
Subject: Re: [GROW] A Simple BGP-based Mobile Routing System for the 
Aeronautical Telecommunications Network

Hi Chris,

>it's bad for bgp on the global scale, but in a VPN scenario you're talking 
>about ~10k routes? (number of planes concurrently in the air) and transitions 
>at a rate of 100/second? 500/second? (what rate is expected at 10k planes? at 
>100k planes?)

The model is that each airplane gets one or more IPv6 prefixes and acts as a 
mobile network. So, it has a mobile router on board, and uses the IPv6 prefixes 
to number its downstream-attached devices and networks – an airborne Internet 
of Things. The IPv6 prefixes stay the same wherever the plane roams to (more on 
that below). But, the plane’s underlying data link connections can be changing 
very dynamically, e.g., switch from SATCOM to cellular, update QoS due to signal 
fading, etc.

>For quick/dirty numbers:
>https://www.telegraph.co.uk/travel/travel-truths/how-many-planes-are-there-in-the-world/
>
>says there are 25k planes (round numbers) that I think qualify in your 
>pool.

You are very correct to check on the current numbers of planes. For civil 
aviation, we currently see tens of thousands. But, the system should be flexible 
to support several orders of magnitude more than that with the multitudes of 
unmanned aircraft expected to be coming into the airspace in the near future.

>why would you change ip addressing on the plane? having them keep their 
>addressing seems simpler and more conducive to stability, no?

Right, the airplane’s on-board IPv6 prefixes used for downstream IoT addressing
never change. It is the plane’s upstream data link addresses that can change
dynamically, i.e., in the same way that a cellphone’s WiFi and/or 4G addresses
can change.

Again, the design is to keep mobility-related churn out of BGP in the core
of the network and to keep the churn out in the edges of the network.

Thanks - Fred


From: Christopher Morrow [mailto:christopher.mor...@gmail.com]
Sent: Tuesday, March 13, 2018 8:24

Re: [GROW] Route Server ASN stripping hiding considered harmful?

2017-12-19 Thread UTTARO, JAMES
Interesting discussion. Coming from the VPN space, the selection of an AS as a 
global or regional value has been a long discussion.. For better or worse, an AS 
value per region forces a different paradigm than a single global AS.. In the 
VPN space we take great pains not to affect the AS or AS-Path, as it is 
meaningful to customers. IMO AIGP is far more useful for selection for certain 
applications, i.e. 3107..

Jim Uttaro

-----Original Message-----
From: GROW [mailto:grow-boun...@ietf.org] On Behalf Of Jeffrey Haas
Sent: Tuesday, December 19, 2017 3:02 PM
To: Nick Hilliard 
Cc: grow@ietf.org; Job Snijders 
Subject: Re: [GROW] Route Server ASN stripping hiding considered harmful?

On Mon, Dec 18, 2017 at 01:04:03PM +, Nick Hilliard wrote:
> It's also common practice for transit providers to use a single ASN
> spanning the globe e.g. 174, 2914, 3356, etc. What you're describing
> here is an aspect of the fact that that as-pathlen has not been a useful
> determinant for the bgp decision engine for many years.

While somewhat orthogonal to this discussion, path length (and thus
prepending) is about the only useful knob many BGP speakers have to try to
bias incoming traffic.  I suspect you mean something a bit different above.

> Updating rfc4271 would be more productive - and getting IXPs to filter
> ingress bgp feeds by default.

I hope to be retired before that level of "incremental update" is expected
to work in the Internet at large. :-)

-- Jeff

P.S. many providers provide knobs to ignore path length as a consideration.
No spec work is required.



Re: [GROW] Call for GROW WG adoption of grow-overlapping-routes

2012-10-03 Thread UTTARO, JAMES
Russ,

Hmmm, I don't think this is a consistent message.. When I attempted to 
give people rope, i.e. BGP Persistence, the IETF chairs felt that this was too 
much rope?? IMO it takes very little rope to hang oneself, so let's be 
consistent as a starting point...

Jim Uttaro

-----Original Message-----
From: grow-boun...@ietf.org [mailto:grow-boun...@ietf.org] On Behalf Of Russ 
White
Sent: Wednesday, October 03, 2012 10:22 AM
To: Jared Mauch
Cc: grow@ietf.org
Subject: Re: [GROW] Call for GROW WG adoption of grow-overlapping-routes


> The problem as I see it is many of those that operate in the BGP/DFZ don't 
> know what they are doing.

???

Then they shouldn't be using this technique. Or perhaps even running
BGP. Protocols provide rope. It's your choice whether you make good things or
bad things with the rope provided.

:-)

Russ


Re: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

2012-07-08 Thread UTTARO, JAMES
Rob,

Sorry, I meant to respond to you sooner.. In re good/bad paths...

Thanks,
Jim Uttaro

">>> [Jim U>] I guess what I meant was the other paths that are considered good 
would be treated differently.. So an environment where only paths with the 
mal-formed attr are affected by this error condition, as opposed to an 
environment where all paths are affected ( withdrawn ), would create an 
inconsistent view of the "good" paths across AS domains.. So not so much the 
"bad" paths but the "good" paths and how they may be treated differently..

[rjs]: I'm not sure I fully understand here:
- Today: UPDATE is received from element A and found to be erroneous - 
session is reset, downstream do not see any paths where A was the best-path in 
the RIB. 
- With this draft: UPDATE is received from element A, found to be 
erroneous, downstream still see all other paths where A is the best-path in the 
RIB.

[rjs]: I'm not sure that this is so much inconsistency of what the "good" paths 
look like - both the receiving and downstream elements still consider A's paths 
as valid, other than the ones that were included in the erroneous UPDATE. In 
both cases, the NLRI contained in the erroneous UPDATE is also not propagated 
downstream (session reset, or treat-as-withdraw stops the further propagation).
"

I do not know if my concern actually matters.. My thought was that if AS1 
advertises P(1)...P(n) to AS2 and AS3, and AS2 has deployed error handling and 
AS3 has not, then in a mal-formed scenario AS3 would withdraw all paths and AS2 
would only send a withdrawal for the bad paths.. There is no way for AS4 to know 
that the "good" paths from AS2 are actually suspect.. Not sure if it matters, or 
if AS4 could ever actually know without AS2 sending the "good" paths and 
informing downstream peers..
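
The scenario can be sketched in a few lines of Python. This is a toy model under stated assumptions: path names P1..P4 are placeholders, treat-as-withdraw is modelled as dropping only the NLRI carried in the malformed UPDATE, and a session reset is modelled as dropping everything learned from the neighbour. The function name is hypothetical.

```python
def surviving_paths(advertised, malformed, error_handling):
    """Paths a speaker keeps from a neighbour after one malformed UPDATE."""
    if error_handling:
        # treat-as-withdraw: only the NLRI in the bad UPDATE are dropped
        return advertised - malformed
    # no error handling: the session is reset and every path goes away
    return set()


as1_paths = {"P1", "P2", "P3", "P4"}   # what AS1 advertises to AS2 and AS3
bad = {"P3"}                           # NLRI carried in the malformed UPDATE

via_as2 = surviving_paths(as1_paths, bad, error_handling=True)
via_as3 = surviving_paths(as1_paths, bad, error_handling=False)
print(sorted(via_as2), sorted(via_as3))
```

AS4 ends up with three paths via AS2 and none via AS3, and nothing in what it receives tells it why the two views differ.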

-----Original Message-----
From: Rob Shakir [mailto:r...@rob.sh] 
Sent: Thursday, June 28, 2012 4:49 AM
To: UTTARO, JAMES
Cc: 'grow@ietf.org'; 'idr wg'
Subject: Re: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi Jim,

Apologies for the delay in replying to this message. Further discussion in-line 
marked [rjs].

On 25 Jun 2012, at 23:47, UTTARO, JAMES wrote:

> [rjs]: Absolutely, this is the current behaviour. The problem with taking a 
> whole session down in this case is that you now take a risk of inconsistency 
> for all NLRI across that session for the duration that you hold onto the 
> learned NLRI. If one avoids being in the situation where the session is down 
> (e.g., by applying treat-as-withdraw behaviour in cases where one can 
> determine the NLRI) then all other NLRI on the session continue to be updated 
> as they need to be. It is only the NLRI that were included in the erroneous 
> UPDATE that may be affected for looping/black-holing.
> 
> 
>>> [Jim U>] The assumption being that the error was caused by an upstream 
>>> speaker and is therefore not truly indicative of an issue over the session 
>>> where the error manifests itself. This seems to make sense in the IPV4 
>>> case. I am still a bit concerned as I do not understand how the following 
>>> is addressed.
> 

[rjs]: Actually, I think treat-as-withdraw applied more generally than 
optional-transitive only does not necessarily imply that the error was not the 
direct fault of the upstream speaker. However:

> - There is no way of knowing if the adjacent peer is the speaker that is 
> actually responsible for the malformed attr or is coming from an upstream 
> speaker. I can think of no way of knowing this.. Can it be inferred from the 
> notion that an error is of the syntactic or semantic variety?

[rjs]: This is true, only where we have a mechanism such as the partial bit in 
the optional transitive attribute can we infer that the directly attached 
neighbour did not look at the session. What the semantic and critical errors 
that are called out in the draft relate to is the impact of the error on the 
resulting UPDATE message, rather than the direct neighbour being responsible 
for it.
[Jim U>] >>> Yup. I was hoping that it may  be possible to infer.. 

> - There seems to be no threshold at which the session is actually taken out 
> of service. It would seem that some number of these types of errors would 
> indicate a major issue is taking place and should be addressed by severing 
> the speaker that is advertising paths with the malformed attr into the 
> topology. A large number of these errors will create a large number of 
> withdraw messages being generated from many peers. What are your thoughts on 
> how this should be addressed?

[rjs]: This was something that has been discussed on the list previously. There 
are two key questions in this space:

- Do you expect errors 

Re: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

2012-06-25 Thread UTTARO, JAMES
Rob,

Comments In-Line..

Thanks,
Jim Uttaro

-----Original Message-----
From: Rob Shakir [mailto:r...@rob.sh] 
Sent: Sunday, June 24, 2012 5:10 AM
To: UTTARO, JAMES
Cc: 'grow@ietf.org'; 'idr wg'
Subject: Re: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi Jim,

Further comments in-line as [rjs].

On 24 Jun 2012, at 03:06, UTTARO, JAMES wrote:

> 1. Conservative -- any error in messages received from a neighbour is 
> indicative that it is not a viable route to any prefix it advertises. 
> Therefore where error conditions occur, disconnect and remove all routes from 
> the neighbour from the RIB.
> 
> [Jim U>] This is based on the current behavior of BGP.. If the NH is still 
> viable you could allow the session to fall and persist the good state that 
> has already been learned..

[rjs]: Absolutely, this is the current behaviour. The problem with taking a 
whole session down in this case is that you now take a risk of inconsistency 
for all NLRI across that session for the duration that you hold onto the 
learned NLRI. If one avoids being in the situation where the session is down 
(e.g., by applying treat-as-withdraw behaviour in cases where one can determine 
the NLRI) then all other NLRI on the session continue to be updated as they 
need to be. It is only the NLRI that were included in the erroneous UPDATE that 
may be affected for looping/black-holing.


>>[Jim U>] The assumption being that the error was caused by an upstream 
>>speaker and is therefore not truly indicative of an issue over the session 
>>where the error manifests itself. This seems to make sense in the IPV4 case. 
>>I am still a bit concerned as I do not understand how the following is 
>>addressed.

- There is no way of knowing if the adjacent peer is the speaker that is 
actually responsible for the malformed attr or is coming from an upstream 
speaker. I can think of no way of knowing this.. Can it be inferred from the 
notion that an error is of the syntactic or semantic variety?
- There seems to be no threshold at which the session is actually taken out of 
service. It would seem that some number of these types of errors would indicate 
a major issue is taking place and should be addressed by severing the speaker 
that is advertising paths with the malformed attr into the topology. A large 
number of these errors will create a large number of withdraw messages being 
generated from many peers. What are your thoughts on how this should be 
addressed?
>>
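
One way to picture such a threshold is a sliding-window counter, sketched below in Python. This is purely hypothetical: the class name, the parameters and the choice of a time window are assumptions made for illustration, and nothing in this thread says either draft specifies such a mechanism.

```python
from collections import deque


class ErrorThreshold:
    """Hypothetical guard: tolerate a few malformed UPDATEs per time window,
    then fall back to resetting the session."""

    def __init__(self, max_errors: int, window_s: float):
        self.max_errors = max_errors
        self.window_s = window_s
        self._stamps = deque()   # arrival times of recent malformed UPDATEs

    def record(self, now: float) -> bool:
        """Record one malformed UPDATE seen at time `now` (seconds).
        Return True when the session should be taken out of service."""
        self._stamps.append(now)
        # Forget errors that have aged out of the window.
        while self._stamps and now - self._stamps[0] > self.window_s:
            self._stamps.popleft()
        return len(self._stamps) > self.max_errors


guard = ErrorThreshold(max_errors=2, window_s=60.0)
decisions = [guard.record(t) for t in (0.0, 10.0, 20.0, 200.0)]
print(decisions)
```

With these numbers the third error inside one minute trips the guard, while an isolated error much later does not.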

> I would expect all solutions implemented in response to these requirements to 
> be optional. If the risk of incorrectness is unacceptable to you/an operator, 
> then you should absolutely not enable any of these mechanisms. In a number of 
> networks that I have operated, designed and architected, I am prepared to 
> accept the risk of incorrectness, as I consider it acceptable when compared 
> to the risk of complete service outages in terms of impact to my customers 
> during such incidents. At the moment, without the work described through the 
> requirements outlined in this draft I do not have the means to make that 
> call...
> [Jim U>] I do not understand how it is possible to make this configurable on 
> a per session or AS basis..I would think all speakers participating in a 
> routing context would have to adhere to the same rules for a consistent view 
> across domains.. In my reading of the IDR draft it seems that it would be a 
> MUST.. Maybe I should not be considering that IDR draft as the actual 
> realization of the reqs..

[rjs]: The IDR draft is the solution for some of the requirements -- 
particularly those described in Section 3 of the GROW draft.
>>[Jim U>] Got it..

[rjs]: I do not see why this behaviour needs to be consistent across domains?
>>[Jim U>] Can you explain this

[rjs]: Essentially, if I receive an invalid UPDATE message, and apply 
treat-as-withdraw, if the advertising speaker did not know that this was 
erroneous then I end up with a different view of what is in the RIB than the 
advertising speaker does. If this was a prefix I had no other route to, then I 
may black-hole, if it was one where it was a more-specific of some larger 
prefix, then we end up with the potential for loops.
>>[Jim U>] Yes.. I am not sure I like the notion of forwarding loops especially 
>>for large flows.. 
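
The two outcomes described above can be illustrated with a toy longest-prefix-match lookup in Python. The RIB contents, next-hop labels "A" and "B" and the lookup helper are all hypothetical; the prefixes come from the IPv6 documentation range. The sketch only shows what the receiving speaker does after treat-as-withdraw, not the loop itself, which additionally depends on how the other speakers forward.

```python
import ipaddress


def lookup(rib, dst):
    """Longest-prefix match. Returns the next hop, or None (black-hole)."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in rib if addr in net]
    if not matches:
        return None
    return rib[max(matches, key=lambda net: net.prefixlen)]


specific = ipaddress.ip_network("2001:db8:1::/48")   # arrived in the bad UPDATE
covering = ipaddress.ip_network("2001:db8::/32")     # learned from another peer
dst = "2001:db8:1::1"

# Case 1: a covering prefix exists, so traffic silently shifts to it.
rib = {specific: "A", covering: "B"}
before = lookup(rib, dst)
del rib[specific]                 # treat-as-withdraw of the more-specific
fallback = lookup(rib, dst)

# Case 2: no other route exists, so the destination is black-holed.
only = {specific: "A"}
del only[specific]
blackhole = lookup(only, dst)

print(before, fallback, blackhole)
```

In the first case the receiver now forwards via "B" while the advertising speaker still believes the more-specific via "A" is in use, which is the mismatch that opens the door to loops.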

[rjs]: If I am prepared to accept the black-holing or loops for the NLRI in the 
erroneous UPDATE as a risk, in favour of keeping the remaining NLRI working 
(and being updated/withdrawn if they change), then this is a local decision and 
I do not need to imply any behaviour of the neighbouring domains.
>>[Jim U>] I guess what I meant was the other paths that are considered good 
>>would be treated differently.. So in an environment where only paths with t

Re: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

2012-06-23 Thread UTTARO, JAMES
Rob,

Comments In-Line...

Thanks,
Jim Uttaro

-----Original Message-----
From: Rob Shakir [mailto:r...@rob.sh] 
Sent: Saturday, June 23, 2012 6:39 PM
To: UTTARO, JAMES
Cc: 'grow@ietf.org'; 'idr wg'
Subject: Re: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi Jim,

Thanks very much for the detailed review of this document.

On 22 Jun 2012, at 15:56, UTTARO, JAMES wrote:

> Rob,
>  
> Following find my comments..
>  
> Thanks,
> Jim Uttaro
>  
> General Comment,
>  
> From a philosophical perspective I agree with the goals of this draft but I 
> do not agree with an approach that maintains a session in the face of a 
> failure in the machinery. This is a bottom up approach which will always be a 
> day late and a dollar short as we continue to patch what it means to be a 
> viable session.. I believe the proper approach to meet your reqs ( and mine 
> too!!! ) is a top down which does not change the BGP behavior but changes the 
> response to that behavior.. The reality of today's BGP fields of use which 
> are many and varied is that the control and the forwarding paths are 
> orthogonal, the state being carried is not always paths as we usually 
> consider them. Examples include RT-C, Flowspec which are used to create PWs 
> and simulate an IGP etc... I do not believe changing the behavior of BGP 
> session viability machinery to accomplish maintaining valid forwarding in the 
> face of a control plane failure is the correct approach
>  
> The draft addresses a very important error condition ( Update Error ).. 
> This should be expanded to other areas where the failure of a session does 
> not indicate that a subset or all of the routing state learned over said 
> session is invalid.. I think the draft should address this directly.. Are the 
> changes here configurable? Can I turn this off if I do not want this behavior 
> for certain topologies, AFs?

The scope for this draft was particularly to handle errors that occurred in the 
DFZ (and in some private networks) where UPDATE messages with malformed 
contents representing a subset of NLRI carried via a session (1 prefix in 
numerous cases) resulted in complete failure of these sessions. In particular, 
errors where a malformed optional transitive attribute was "tunnelled" across 
multiple speakers who did not parse it were especially destructive in the DFZ. 
Jonathan Oddy, Andy Davidson and I spent some effort analysing one of these 
incidents back in 2008/2009 -- 
http://mailman.nanog.org/pipermail/nanog/2009-January/006816.html. 
[Jim U>] Got it..

One of the complexities in this problem space is that there are multiple 
underlying philosophies as to how such error behaviour should be handled:

1. Conservative -- any error in messages received from a neighbour is 
indicative that it is not a viable route to any prefix it advertises. Therefore 
where error conditions occur, disconnect and remove all routes from the 
neighbour from the RIB.

[Jim U>] This is based on the current behavior of BGP.. If the NH is still 
viable you could allow the session to fall and persist the good state that has 
already been learned..

2. Balanced risk -- some errors are not indicative of an error on the remote 
speaker, or are localised to specific NLRI/prefixes that are propagated via 
this speaker. Therefore take a balanced approach of applying error handling 
mechanisms to those NLRI, but continue to trust the integrity of the speaker 
for the remainder of routing information.

Essentially, this draft's scope was to explain the requirements that exist to 
manage the risk of taking the latter view. At the moment, there is a risk to an 
operator presented by the first (a malformed UPDATE may break the BGP sessions 
that run between their PEs and RRs, or those that propagate their routes to the 
Internet DFZ, and cause a service outage), and there is no option to balance 
the impact of that risk against the impact of BGP being incorrect for a subset 
of prefixes propagated via those sessions.
[Jim U>] Yes I have seen this..

I would expect all solutions implemented in response to these requirements to 
be optional. If the risk of incorrectness is unacceptable to you/an operator, 
then you should absolutely not enable any of these mechanisms. In a number of 
networks that I have operated, designed and architected, I am prepared to 
accept the risk of incorrectness, as I consider it acceptable when compared to 
the risk of complete service outages in terms of impact to my customers during 
such incidents. At the moment, without the work described through the 
requirements outlined in this draft I do not have the means to make that call...
[Jim U>] I do not understand how it is possible to make this configurable on a 
per session or AS basis..I would think all speakers 

Re: [GROW] [Idr] Fwd: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

2012-06-22 Thread UTTARO, JAMES
+1..

Jim Uttaro

From: Enke Chen [mailto:enkec...@cisco.com]
Sent: Friday, June 22, 2012 2:00 PM
To: rob...@raszuk.net
Cc: i...@ietf.org List; grow@ietf.org; UTTARO, JAMES; Enke Chen
Subject: Re: [Idr] Fwd: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi, folks:

It might help the discussion to refresh ourselves about several large outages 
in the last few years that prompted the work on the error handling requirements 
and solutions:

   o issue with AS4_PATH that resulted in session resets multiple hops away 
(two separate incidents)
   o session reset triggered by a single route with a new attribute

I remember that Rob had a presentation at the NANOG on the topic.

-- Enke

On 6/22/12 8:57 AM, Robert Raszuk wrote:
Jim,


We could as easily without any change to BGP use BGP Persistence to
maintain the paths except for the ones that have the invalid
attribute.. This is the simpler method, has the benefit of not
changing BGP, or educating the world on the nuances of the changes
etc...

Why wouldn't we simply let the session fail and then use BGP Persistence
or GR ;)

Please observe that when the session is down you are not receiving withdraws or 
new best paths for those "good" prefixes (maybe 99% of them) which did not have 
any errors in their respective update messages.

Equating it with the persistence proposal is therefore highly incorrect.


I also do not fully understand "treat as withdraw". Does this mean that
the peer which has received an update with P1-PN with a malformed attr then
initiates a withdrawal to all of its peers?  Or does it simply assume that the
paths have been received as a message?  Some sample topologies as to how
this works would be a good addition to this section..

The speaker reacting to an error which can be addressed by "treat-as-withdraw" 
locally invalidates those prefixes received in the update message, runs local 
best path selection and, as a result, if no other path is found, withdraws those 
prefixes from all peers it has previously sent them to.
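
The behaviour described above can be sketched in a few lines of Python. This is a toy model only, not taken from the draft or from any implementation: the `Speaker` class, its `rib_in` and `advertised` tables and the shortest-AS-path tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    # prefix -> {peer: as_path}; a simplified Adj-RIB-In (hypothetical model)
    rib_in: dict = field(default_factory=dict)
    # prefix -> set of peers this speaker has advertised the prefix to
    advertised: dict = field(default_factory=dict)

    def learn(self, peer, prefix, as_path):
        self.rib_in.setdefault(prefix, {})[peer] = as_path

    def best(self, prefix):
        """Toy best path: shortest AS path wins; None if no path is left."""
        paths = self.rib_in.get(prefix, {})
        if not paths:
            return None
        return min(paths.items(), key=lambda kv: (len(kv[1]), kv[0]))

    def treat_as_withdraw(self, peer, prefixes):
        """React to a malformed UPDATE from `peer` carrying `prefixes`.

        The session stays up; only the affected prefixes are invalidated.
        Returns {prefix: peers that now need a withdraw}.
        """
        withdraws = {}
        for prefix in prefixes:
            # Invalidate locally the path learned from the offending peer.
            self.rib_in.get(prefix, {}).pop(peer, None)
            # Re-run best path; withdraw only if no alternative remains.
            if self.best(prefix) is None:
                targets = self.advertised.pop(prefix, set())
                if targets:
                    withdraws[prefix] = targets
        return withdraws


speaker = Speaker()
speaker.learn("peerA", "192.0.2.0/24", [65001, 65010])
speaker.learn("peerB", "192.0.2.0/24", [65002, 65020, 65010])
speaker.learn("peerA", "198.51.100.0/24", [65001])
speaker.advertised = {"192.0.2.0/24": {"peerC"}, "198.51.100.0/24": {"peerC"}}

withdraws = speaker.treat_as_withdraw(
    "peerA", ["192.0.2.0/24", "198.51.100.0/24"]
)
```

In this example 192.0.2.0/24 falls back to the path via peerB and nothing is withdrawn for it, while 198.51.100.0/24 has no alternative and is withdrawn from peerC. Prefixes not named in the malformed update are untouched, which is the difference from letting the session fail.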


I am not in support of solutions which create a scenario where BGP
cannot recover without human intervention.

I think no one is. But we are - I think - not there yet for the routers to 
automatically fix their bugs, but only automatically signalling them the 
requested action ;(.

> Nothing is going to get people's attention like a failed BGP
> Session..

True statement. But the entire assumption behind treat-as-withdraw is that your 
ops scripts parse the syslog messages indicating the issue to NOC with the same 
red color and buzz as bgp session down. Of course you need to rework your ops 
scripts/alarms for that to happen.

Rgs,
R.

PS.

Note that if the main BGP session is down (like in the persistence case) BGP 
Operational Messages can no longer be exchanged between peers as the TCP 
connection could have been reset (if no multisession is used and if we are 
talking about a single SAFI). That just makes the issue worse, especially when you 
do not like to have human intervention.






___
Idr mailing list
i...@ietf.org
https://www.ietf.org/mailman/listinfo/idr

___
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow


[GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

2012-06-22 Thread UTTARO, JAMES
Rob,

Following find my comments..

Thanks,
Jim Uttaro

General Comment,

From a philosophical perspective I agree with the goals of this draft but I do 
not agree with an approach that maintains a session in the face of a failure 
in the machinery. This is a bottom-up approach which will always be a day late 
and a dollar short as we continue to patch what it means to be a viable 
session.. I believe the proper approach to meet your reqs ( and mine too!!! ) 
is a top-down one which does not change the BGP behavior but changes the response 
to that behavior.. The reality of today's BGP fields of use, which are many and 
varied, is that the control and the forwarding paths are orthogonal, and the state 
being carried is not always paths as we usually consider them. Examples 
include RT-C, Flowspec which are used to create PWs and simulate an IGP etc... 
I do not believe changing the behavior of the BGP session viability machinery to 
accomplish maintaining valid forwarding in the face of a control plane failure 
is the correct approach.

The draft addresses a very important error condition ( Update Error ), but 
this should be expanded to other areas where the failure of a session does not 
indicate that a subset or all of the routing state learned over said session is 
invalid.. I think the draft should address this directly.. Are the changes here 
configurable? Can I turn this off if I do not want this behavior for certain 
topologies, AFs?

Abstract

Can the scope be expanded? There are other failure modes, e.g. Timer Expiry, 
which today is not considered a failure mode. In reality what I have seen is 
that timer expiry occurs because BGP threads cannot be serviced in 
a timely manner.  I think it would be best if we could put it all on the table..

I do think that this draft should bound the solution space. At a minimum the 
solutions proposed should meet a baseline set of the operators' criteria in terms 
of managing the network, convergence, persistence, churn, forwarding impact 
etc...

Section 1.1

The following paragraph is based on the premise that the session being "down" 
results in a large impact.. This is certainly true for today's implementations 
which use the session ( Control Plane Construct ) to determine the viability of 
the forwarding state learned over said session. There are cases where the 
session and forwarding are parallel, but in many more cases the control and 
forwarding planes are orthogonal.. I think we need to re-consider this 
assumption for many of the services BGP is being used for and base the response 
to error conditions on this reality..

" Both within Internet and multi-service routing architectures, a
   number of BGP sessions propagate a large proportion of the required
   routing information for network operation.  For Internet routing,
   these are typically BGP sessions which propagate the global routing
   table to an AS - failure of these sessions may have a large impact on
   network service, based on a single erroneous update.  In an multi-
   service environment, typical deployments utilise a small number of
   core-facing BGP sessions, typically towards route reflector devices.
   Failure of these sessions may also result in a large impact to
   network operation.  Clearly, the avoidance of conditions requiring
   these sessions to fail is of great utility to any network operator,
   and provides further motivation for the revision of the existing
   behaviour. "

Section 1.2

Bullet 1..


This is a very interesting point.. As you stated



"  Traditional network architectures would deploy an Interior Gateway

   Protocol (IGP) to carry infrastructure and customer prefixes, with an

   Exterior Gateway Protocol (EGP) such as BGP being utilised to

   propagate these prefixes to other Autonomous Systems. "



In this environment where BGP was predominantly used to advertise state 
between AS domains over dedicated peering points it would make sense that a 
malformed update learned from a peer is a fairly good indication that the peer 
which is originating the update is suspect.. As there are other NHs available 
it would be prudent to not use the suspect session/forwarding path, which in 
these cases were in parallel.. Maybe "treat as withdraw" is appropriate, not 
sure, although the SP should be able to decide the course of action.



I would ask you to consider that there are two cases here.. The first is when a 
speaker learns an update from a peer where the NH for the paths in that update 
is the direct peer ( ASBR, PE ), and the second is when the update is from a 
peer where the NH for the paths in the update is not the peer ( RR ). In the 
former case there is still a case that the original premise holds.. The 
offending egress router should be disconnected from the topology. I don't think 
this is black and white and operators may want the flexibility of determining 
the behavior..



Bullet 2



Totally agree.. This i

Re: [GROW] I-D Action:draft-ietf-grow-diverse-bgp-path-dist-01.txt

2010-06-23 Thread UTTARO, JAMES (ATTLABS)
The approach as I understand it is to deploy multiple channels to
disseminate routing state, be it the 2,3,...nth path to some dest D..
Comments follow...

Jim Uttaro

Section 1.0 

"The parallel route reflector planes solution brings very significant
benefits at a negligible capex and opex deployment price as compared to
the alternative techniques"

A number of points need to be clarified here. The first is that the
SP/Operator needs to deploy N RR planes to disseminate N
paths. Assuming some form of redundancy we would of course have to buy
the RRs or deploy some type of logical routers. How can this be
monetized? Does this approach assume that customers who want fast
restoration, load balancing, mitigation of oscillation would pay for
this? Or does the draft assume that the addl RRs are of such negligible
capex cost that the operator would simply incur the cost? This model
does not usually sit well with the folks that write the checks. From an
opex perspective we are putting in addl planes for each AS that is under
the operator's authority. So we not only need to pay for it, we would need
to establish coherent inter-AS strategies to manage and maintain these addl
RRs.. Additionally the function of these devices is different from a
traditional RR, which implies that OpS needs to be cognizant of the
difference and how they should be managed.. As described in the draft
the 2,3..nth plane need not be as robust as it is not the primary path..
This needs to be understood by OpS in terms of their response to failure
or how to perform maintenance.. We are essentially introducing a new
device from these perspectives...

Section 2.1

"This new requirement has its own memory and processing cost.  Suffice
to say that by the middle of 2009 none of the commercial BGP
implementation can claim to support the new add-path behaviour in
production code, in part because of this resource overhead."

A bit confused by this statement.. My thought on this was that add-paths is
useful for a customer that is advertising multiple paths or at peering
points. In both cases we would anticipate the use of routing policy to
only select a subset of these routes. It is impractical to believe that
we are going to duplicate the same state over and over again on each
plane.. This is not a function of the draft but of how operators deploy the
functionality.. This functionality has been around a long time in VPNV4
services and I believe it will eventually be used for IPV4 services.. 

" The add paths protocol extensions have to be implemented by all the
routers within an AS in order for the system to work correctly."

Pls explain.. Why do you believe this? It is certainly not practical and
I never envisioned a full upgrade across thousands of edges in multiple
AS domains.. The approach we believe we could take is to deploy on a
subset of edges for some set of routes. 

" It is intended as a way to buy more time allowing for a smoother and
gradual migration where router upgrades will be required for perhaps
different reasons.  It will also allow the time required where standard
RP/RE memory size can easily accommodate the associated overhead with
other techniques without any compromises."

This statement seems to conflict with the one above.. Above you state
that it is needed everywhere to work correctly; here the statement is that we
can buy time to gradually migrate.. Why don't we just gradually migrate
and eliminate this middle step??

Section 4

" The proposed solution is based on the use of additional route
reflectors or new functionality enabled on the existing route reflectors
that instead of distributing the best path for each route will
distribute an alternative path other then best. "

Would like to drill down on this a bit.. In the first case, where addl
deployment of RRs is done, I am assuming that these RRs would somehow
prefer the second best path over the first.. How would this be done?
Customers use many different mechanisms to identify primary, secondary,
etc... AS-PATH prepend, Local Pref, IGP cost etc... are all used.. How is
this done on the secondary plane? Regardless of which of these
approaches is taken, changes to the BGP implementation to select a different POI
are needed. But how do you know how a customer is identifying these? Pls
expand on this. It would seem that although the protocol definition does
not change, the operator needs to ensure that this functionality is
constructed the same way across all the vendors.. Will this require
another draft? 

" The best path (main) reflector plane distributes the best path for
each route as it does today.  The second plane distributes the second
best path for each route and so on.  Distribution of N paths for each
route can be achieved by using N reflector planes."

How is this done when it is the IGP cost that is the deciding factor?
Will we have to place the Nth plane corresponding to IGP
correctly in the IGP??
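
To make the IGP cost question concrete, here is a minimal Python sketch. It is not from the draft: the three-step decision process (local pref, AS path length, IGP cost to the next hop) is a deliberate simplification, and the names `rank_paths`, `plane_advertisement`, `cost_from_rr` and `cost_from_pe` are hypothetical.

```python
def rank_paths(paths, igp_cost):
    """Order candidate paths best-first with a toy decision process:
    higher local pref, then shorter AS path, then lower IGP cost to the
    next hop as seen from the router doing the ranking."""
    return sorted(
        paths,
        key=lambda p: (-p["local_pref"], len(p["as_path"]), igp_cost[p["next_hop"]]),
    )


def plane_advertisement(paths, igp_cost, plane):
    """Path that reflector plane `plane` would distribute (1 = best path
    plane, 2 = second best, ...), or None if fewer paths exist."""
    ranked = rank_paths(paths, igp_cost)
    return ranked[plane - 1] if 1 <= plane <= len(ranked) else None


# Two paths that tie on everything except IGP cost.
paths = [
    {"next_hop": "ASBR1", "local_pref": 100, "as_path": [65010]},
    {"next_hop": "ASBR2", "local_pref": 100, "as_path": [65020]},
]

# Assumed IGP costs to each next hop from two different vantage points.
cost_from_rr = {"ASBR1": 10, "ASBR2": 20}   # where the second plane RR sits
cost_from_pe = {"ASBR1": 30, "ASBR2": 5}    # where a client PE sits

second_plane_choice = plane_advertisement(paths, cost_from_rr, 2)
pe_own_best = plane_advertisement(paths, cost_from_pe, 1)
```

With these made-up costs the second plane hands out the path via ASBR2 as "second best", yet from the PE's own position in the IGP that same path is the best one. When IGP cost is the deciding factor, the ordering depends on where in the IGP the ranking router is placed.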

" It is easy to observe that the installation of one or more additional
route re

Re: [GROW] I-D ACTION:draft-ietf-grow-bgp-graceful-shutdown-requirements-02.txt

2010-05-04 Thread UTTARO, JAMES (ATTLABS)
Could the authors provide further clarification on the following.

Is the intent to tie path availability and this functionality? I could
see other uses for path availability ie Load Balancing, Fast
Convergence, Minimize Path Hiding..

Availability of Paths. Is the draft proposing a mechanism for a speaker
to discover alternative paths? There are a number of existing drafts.
BGP Best External and Add-Paths for IPV4, and BGP Best External and a
Unique RD ( Not a draft ) per VRF Context, ensure that we get the second
best paths to an ASBR Speaker ( Either PE or ASBR ). The draft does not
speak to the approach..

Make before Break. Is this a capabilities negotiation on a per session
basis? I would guess so.. 

Will the set of affected prefixes need to be sent in a BGP Withdrawal
message to the RRs from the initiator? If so what is the behavior? It
seems that all speakers would then select ASBR2 ( Your Example ).. So
there would be no change from the RR towards the iBGP topology?

Jim Uttaro 


-Original Message-
From: grow-boun...@ietf.org [mailto:grow-boun...@ietf.org] On Behalf Of
internet-dra...@ietf.org
Sent: Friday, April 30, 2010 5:15 PM
To: i-d-annou...@ietf.org
Cc: grow@ietf.org
Subject: [GROW] I-D ACTION:draft-ietf-grow-bgp-graceful-shutdown-requirements-02.txt

A New Internet-Draft is available from the on-line Internet-Drafts 
directories.
This draft is a work item of the Global Routing Operations Working Group
of the IETF.

Title     : Requirements for the graceful shutdown of BGP sessions
Author(s) : T. Takeda, B. Decraene, P. Francois, C. Pelsser, Z. Ahmad, A. Armengol
Filename  : draft-ietf-grow-bgp-graceful-shutdown-requirements-02.txt
Pages     : 16
Date      : 2010-4-30

   The BGP protocol is heavily used in Service Provider networks both 
   for Internet and BGP/MPLS VPN services. For resiliency purposes, 
   redundant routers and BGP sessions can be deployed to reduce the 
   consequences of an AS Border Router or BGP session breakdown on 
   customers' or peers' traffic. However simply taking down or even 
   bringing up a BGP session for maintenance purposes may still induce 
   connectivity losses during the BGP convergence. This is no more 
   satisfactory for new applications (e.g. voice over IP, on line 
   gaming, VPN). Therefore, a solution is required for the graceful 
   shutdown of a (set of) BGP session(s) in order to limit the amount of 
   traffic loss during a planned shutdown. This document expresses 
   requirements for such a solution.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-grow-bgp-graceful-shutdown-requirements-02.txt

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/



Re: [GROW] BGP Monitoring Protocol

2009-04-08 Thread UTTARO, JAMES, ATTLABS
 

-Original Message-
From: Robert Raszuk [mailto:rob...@raszuk.net] 
Sent: Wednesday, April 08, 2009 11:12 AM
To: UTTARO, JAMES, ATTLABS
Cc: grow@ietf.org
Subject: Re: [GROW] BGP Monitoring Protocol

Hi Jim,

 > I was a bit confused about the "L3VPN Instance Peer". What does it
 > mean to set the Peer Distinguisher to the Route Distinguisher of the
 > L3VPN instance that the peer belongs to?

The EBGP session you are monitoring will belong to either the global table or 
some VRF on a given PE. Adding the RD makes it easier to find out which VRF 
given routes belong to. If the session is to the global table the field 
will be zero.

 > Are you assuming that the RD is different for every instantiation of
 > the VPN on each PE? It is possible that the same RD is used across all
 > PEs that a VPN belongs to.

>>There is no need to make such an assumption. The addition of the RD should
>>be used at the management station along with the Peer BGP ID, which would
>>then be used as a tuple to identify the PE/VRF for a given session. The
>>Peer BGP ID is already part of the BMP header.

My thinking is that the management station cannot use the RD as the
method to uniquely identify a VPN/VRF across all PEs. For VPNA, each PE
participating in VPNA may use a unique RD. This method is quite useful
for convergence and load balancing. If ISPs can configure RDs as unique
per VPN per PE per VRF, or as common, then how can the RD make it easier to
determine the eBGP sessions belonging to a specific VPN? I agree that it
will identify per VRF but is this useful without the VPN context?
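
The two positions can be compared with a small Python sketch. Everything in it is made up for illustration: the BGP IDs, the RD values, and in particular the `rd_to_vpn` table, which stands in for provisioning data that is assumed to live outside BMP.

```python
from collections import defaultdict

# Hypothetical monitored sessions as (peer_bgp_id, peer_distinguisher).
sessions = [
    ("10.0.0.1", "65000:101"),  # VPNA on PE1, RD unique per PE
    ("10.0.0.2", "65000:102"),  # VPNA on PE2, RD unique per PE
    ("10.0.0.3", "65000:200"),  # VPNB on PE3, RD common to all PEs
    ("10.0.0.4", "65000:200"),  # VPNB on PE4, RD common to all PEs
]

# The (Peer BGP ID, RD) tuple is enough to tell sessions/VRFs apart.
session_keys = set(sessions)

# Grouping on the RD alone splits VPNA in two and keeps VPNB together,
# so the RD by itself does not name the VPN.
by_rd = defaultdict(set)
for bgp_id, rd in sessions:
    by_rd[rd].add(bgp_id)

# Assumed provisioning table (not carried in BMP) mapping RD to VPN.
rd_to_vpn = {"65000:101": "VPNA", "65000:102": "VPNA", "65000:200": "VPNB"}

by_vpn = defaultdict(set)
for bgp_id, rd in sessions:
    by_vpn[rd_to_vpn[rd]].add(bgp_id)
```

Under these assumptions the tuple identifies each session's VRF, but collecting the sessions of one VPN needs the extra RD-to-VPN mapping whenever RDs are assigned per PE.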

Cheers,
R.


> "o  Peer Distinguisher (8 bytes): Routers today can have multiple
>   instances (example L3VPNs).  This field is present to distinguish
>   peers that belong to one address domain from the other.
> 
>   If the peer is a "Global Instance Peer", this field is zero
>   filled.  If the peer is a "L3VPN Instance Peer", it is set to the
>   route distinguisher of the particular L3VPN instance that the peer
>   belongs to."
>  
> I was a bit confused about the "L3VPN Instance Peer". What does it mean
> to set the Peer Distinguisher to the Route Distinguisher of the L3VPN
> instance that the peer belongs to? Are you assuming that the RD is
> different for every instantiation of the VPN on each PE? It is possible
> that the same RD is used across all PEs that a VPN belongs to.
>  
> Jim Uttaro


[GROW] BGP Monitoring Protocol

2009-04-08 Thread UTTARO, JAMES, ATTLABS
"o  Peer Distinguisher (8 bytes): Routers today can have multiple
  instances (example L3VPNs).  This field is present to distinguish
  peers that belong to one address domain from the other.

  If the peer is a "Global Instance Peer", this field is zero
  filled.  If the peer is a "L3VPN Instance Peer", it is set to the
  route distinguisher of the particular L3VPN instance that the peer
  belongs to."
 
I was a bit confused about the "L3VPN Instance Peer". What does it mean
to set the Peer Distinguisher to the Route Distinguisher of the L3VPN
instance that the peer belongs to? Are you assuming that the RD is
different for every instantiation of the VPN on each PE? It is possible
that the same RD is used across all PEs that a VPN belongs to.
 
Jim Uttaro

 
 


Re: [GROW] RR reflection to clients

2009-03-30 Thread UTTARO, JAMES, ATTLABS

-Original Message-
From: grow-boun...@ietf.org [mailto:grow-boun...@ietf.org] On Behalf Of
Iljitsch van Beijnum
Sent: Monday, March 30, 2009 9:43 AM
To: Shane Amante
Cc: grow@ietf.org grow@ietf.org
Subject: Re: [GROW] RR reflection to clients

On 27 mrt 2009, at 3:43, Shane Amante wrote:

> Re: your comment on "breaking groupwise processing" ... don't RR's  
> already have to account for not sending updates to a specific peer  
> (or, set of peers) based on various ORF types they've received from  
> that peer (assuming ORF's are enabled)?  Why can't that same code/ 
> logic apply here (i.e.: it's already written, why can't we re-use it  
> here)?

>>Hm, aren't filters on iBGP sessions considered bad style?

>>I don't know what implementations do when peer group members send  
>>ORFs, but it wouldn't surprise me to learn that this would make the  
>>peer group processing advantages disappear.

This is one of the big concerns from an RR perspective when deciding whether to
deploy ORF.

Jim Uttaro


Re: [GROW] RR reflection to clients

2009-03-27 Thread UTTARO, JAMES, ATTLABS
The ACCEPT-OWN functionality is a direct result of a business model that
is based on per route granular stitching between different routing
contexts on PEs within an AS or spread across multiple ASes. This may be
value added centralised services as provided by the service provider or
a middleman service for creating long-standing or temporal communities
of interest. The push back I have heard in re ACCEPT-OWN is that it can
be done by simply configuring the "leaking" of the routes of interest on
the PE where the two VPNs intersect. That is where the source of Route A
from VPNA needs to be leaked into VPNB on the same PE. The issue with
the configuration approach is that it assumes that customers are static
as to where they may advertise routes that they may want stitched. The
fact is the ISP does not know a priori which routing contexts, on which
PEs, and in which AS the stitching can occur. The configuration approach
is a huge operational nightmare in large scale networks that span
multiple regions of the world. We had considered it..

>"So to clarify..  It's ACCEPT_OWN, and it's not what's
>introducing the problem, it's "utilizing" it."

The notion of sending the routes back to the originator is fairly
straightforward when the entire PE is viewed as one routing context. It
may be done as an efficient mechanism for an RR to advertise routes to
multiple peers but it is not the optimal behavior as it requires both
the RR and PEs to process additional routing state. In the VPN case the
PE is really homing multiple mutually exclusive routing contexts. The
ability to create simple operational models to stitch requires that we
can get the routes back to a PE and import them into a different routing
context than the one that originated them in the first place. So "the problem" for
one routing context is actually a requirement for creating the above
value added services. How, and which routes, to send back for importation is
really the discussion.

Jim Uttaro

-Original Message-
From: Danny McPherson [mailto:da...@tcb.net] 
Sent: Thursday, March 26, 2009 2:20 PM
To: Christopher Morrow
Cc: grow@ietf.org; UTTARO, JAMES, ATTLABS
Subject: RR reflection to clients


On Mar 26, 2009, at 12:09 PM, Christopher Morrow wrote:

> Today's WG meeting brought out some contentious discussion around this
> presentation. The summary for a portion of the discussion was that
> local optimizations in BGP mechanisms can often lead to system wide
> systemic issues. One comment was that a particular feature
> (advertise-own) ends up being used in 'internet' context where it's
> inappropriate. In this instance though, often the VPN providers are
> also running 'internet' as a VPN, to lower capex/opex and take
> advantage of their larger 'internet' platforms for smaller 'vpn'
> solutions.

So to clarify..  It's ACCEPT_OWN, and it's not what's
introducing the problem, it's "utilizing" it.

The issue I have *here* is the fact that a client (c)
advertises a prefix (p) to its local RRs.  The RRs reflect
the route back to the client and the client is expected to
poison it based on the Originator ID matching the local
Router ID.

In RFC 1966 the RRs did not reflect the route back to the client
because the client didn't know it was a client, and for
incremental deployment it was required that the RRs not reflect
it back or a routing information loop could occur.  RFC 2796
was updated and allowed an RR to reflect a route back to a
client and the onus was on the client to drop the update.

The problem is that if the client is advertising 100ks of
routes (which is perfectly reasonable for a peering edge
router) to the RRs, (100ks * num_rrs) are reflected back to
the client, and it now has to senselessly process and discard
all of those updates.
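
A rough Python sketch of the arithmetic and of the client-side check described above. The route count and RR count are example numbers only, and the function names and the dictionary layout of an update are hypothetical.

```python
def reflected_back(routes_from_client, num_rrs):
    """Updates a client receives back when every RR reflects the
    client's own routes to it."""
    return routes_from_client * num_rrs


def client_accepts(update, local_router_id):
    """Client-side check: drop a reflected route whose Originator ID
    matches the local Router ID."""
    return update.get("originator_id") != local_router_id


# Example numbers only: a peering edge router with 300k routes and 2 RRs.
wasted_updates = reflected_back(300_000, 2)

own_route = {"prefix": "192.0.2.0/24", "originator_id": "10.0.0.1"}
other_route = {"prefix": "198.51.100.0/24", "originator_id": "10.0.0.9"}
kept = [u["prefix"] for u in (own_route, other_route)
        if client_accepts(u, local_router_id="10.0.0.1")]
```

With these numbers the client receives and parses 600,000 updates only to discard every one of them; the check itself is cheap, the volume is the cost.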

ACCEPT_OWN comes into the picture only because it employs
this fundamentally broken behavior.  IF ACCEPT_OWN was a
function of the RR, and the routes were only reflected back
to the client IF that community was presented, I'd be
perfectly fine with it.

-danny




Re: [GROW] RR reflection to clients

2009-03-26 Thread UTTARO, JAMES, ATTLABS
It seems that the work that a PE has to do to parse the
routes that have been sourced by itself needs to be weighed against the
added complexity and additional work an RR must do to ensure that it does
not send the routes back to said PE. I would guess that the RR would
need to maintain multiple data structures associated with each client
peer, have multiple update groups etc... The main point is which network
device should do the work? My approach has always been to minimize the
work on the RR as it is most important to the viability of the entire
service/network.. The RR infrastructure is more critical and important
than any single PE...

Jim Uttaro 

-Original Message-
From: grow-boun...@ietf.org [mailto:grow-boun...@ietf.org] On Behalf Of
Danny McPherson
Sent: Thursday, March 26, 2009 3:40 PM
To: John G. Scudder
Cc: grow@ietf.org
Subject: Re: [GROW] RR reflection to clients


On Mar 26, 2009, at 1:06 PM, John G. Scudder wrote:

> I think the comment in quotes above demonstrates that at worst,  
> ACCEPT_OWN is tangential to your RR beef.

Unless implementations change their behavior to support this,
which is why I keep associating it.  But yes, in general, I
agree.

> Suppose the RR spec was changed to mandate suppression of own routes  
> being sent back to their originators -- in that case I think we'd be  
> happy to update the ACCEPT_OWN spec as you describe, to explicitly  
> permit sending just the special routes back to their originators.   
> Of course it was unnecessary to spec this in the ACCEPT_OWN doc  
> since as you point out, reality is that the routes are sent anyway.

Agreed..

> So, I agree with Robert (and I think you do too) that this is a  
> distraction from your primary point.

Noted.

-danny
